From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>

Date: Tue 11 Oct 2005 - 19:36:05 EST

"Dimitris Rizopoulos" <dimitris.rizopoulos@med.kuleuven.be> writes:

> As Gabor said, the issue here is that subset.data.frame() evaluates
> the value of the `select' argument in the parent.frame(); Thus, if you
> create a local function within lapply() (or sapply()) it works:

It's more complicated than that: it evaluates the select argument in a named list whose names duplicate those of the data frame (each name bound to its column position), and *then* in parent.frame(). This is convenient for command-line use, because you can specify ranges of variables, as in

dfsub <- subset(dfr,select=c(sex:treat, x_pre:x_24))

but it is quite risky to try and do this inside a function - if you're passing in a variable, the result depends on whether there is a variable of the same name in the data frame! You can probably get around it using substitute() constructions, but I think it is safer to avoid using functions with nonstandard semantics inside functions.
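
The following is a minimal sketch of the behaviour described above, not a definitive account of the internals. The data, and the helper names pick(), pick_index() and pick_subst(), are made up for illustration; only the column names sex, treat, x_pre and x_24 come from the example. The first part mimics, approximately, how current versions of subset.data.frame() evaluate `select`; the second part shows the name clash inside a function and two ways around it.

```r
# Hypothetical data; only the column names sex, treat, x_pre and x_24 come
# from the example above.
dfr <- data.frame(id = 1:2, sex = c("f", "m"), age = c(30, 41),
                  treat = c("A", "B"), x_pre = c(1.1, 2.2),
                  x_12 = c(1.5, 2.0), x_24 = c(1.9, 1.8))

# Approximately what subset.data.frame() does with `select`: bind each column
# name to its position, then evaluate the unevaluated expression in that list,
# falling back to the caller's frame for anything not found there.
nl <- as.list(seq_along(dfr))
names(nl) <- names(dfr)
vars <- eval(quote(c(sex:treat, x_pre:x_24)), nl, parent.frame())
vars   # sex:treat is 2:4 and x_pre:x_24 is 5:7

# The hazard inside a function: the argument `n` is shadowed by a column "n".
pick <- function(d, n) subset(d, select = n)          # hypothetical helper
d1 <- data.frame(a = 1, b = 2)
d2 <- data.frame(a = 1, b = 2, n = 3)
names(pick(d1, "a"))   # "a": `n` is not a column, so the argument is used
names(pick(d2, "a"))   # "n": `n` is a column, so it evaluates to position 3

# Standard evaluation with `[` is unaffected by the column names.
pick_index <- function(d, n) d[, n, drop = FALSE]     # hypothetical helper
names(pick_index(d2, "a"))   # "a"

# One possible substitute() construction: splice the value of `n` into the
# call before subset() sees it, so `select` is the string itself.
pick_subst <- function(d, n) {                        # hypothetical helper
  eval(substitute(subset(d, select = N), list(N = n)))
}
names(pick_subst(d2, "a"))   # "a"
```

Of the two workarounds, plain indexing is the simpler one to reason about, since nothing in it depends on where an expression happens to be evaluated.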

> tt <- function (n) {
>     x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
>     print(lapply(x, function(y, n) subset(y, select = n), n = n))
>     print(sapply(x, function(y, n) subset(y, select = n), n = n))
> }
>
> tt("a")
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/(0)16/336899
> Fax: +32/(0)16/337015
> Web: http://www.med.kuleuven.be/biostat/
>      http://www.student.kuleuven.be/~m0390867/dimitris.htm
>
>
> ----- Original Message -----
> From: "joerg van den hoff" <j.van_den_hoff@fz-rossendorf.de>
> To: "Gabor Grothendieck" <ggrothendieck@gmail.com>; "Thomas Lumley"
> <tlumley@u.washington.edu>
> Cc: "r-help" <r-help@stat.math.ethz.ch>
> Sent: Tuesday, October 11, 2005 10:18 AM
> Subject: Re: [R] problem with lapply(x, subset,...) and variable
> select argument
>
>
> > Gabor Grothendieck wrote:
> >> The problem is that subset looks into its parent frame but in this
> >> case the parent frame is not the environment in tt but the environment
> >> in lapply since tt does not call subset directly but rather lapply
> >> does.
> >>
> >> Try this which is similar except we have added the line beginning
> >> with environment before the print statement.
> >>
> >> tt <- function (n) {
> >>    x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
> >>    environment(lapply) <- environment()
> >>    print(lapply(x, subset, select = n))
> >> }
> >>
> >> n <- "b"
> >> tt("a")
> >>
> >> What this does is create a new version of lapply whose
> >> parent is the environment in tt.
> >>
> >>
> >> On 10/10/05, joerg van den hoff <j.van_den_hoff@fz-rossendorf.de> wrote:
> >>
> >>> I need to extract identically named columns from several data frames in
> >>> a list. the column name is a variable (i.e. not known in advance). the
> >>> whole thing occurs within a function body. I'd like to use lapply with a
> >>> variable 'select' argument.
> >>>
> >>>
> >>> example:
> >>>
> >>> tt <- function (n) {
> >>>    x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
> >>>    for (xx in x) print(subset(xx, select = n)) ### works
> >>>    print (lapply(x, subset, select = a)) ### works
> >>>    print (lapply(x, subset, select = "a")) ### works
> >>>    print (lapply(x, subset, select = n)) ### does not work as intended
> >>> }
> >>> n = "b"
> >>> tt("a") #works (but selects not the intended column)
> >>> rm(n)
> >>> tt("a") #no longer works in the lapply call including variable 'n'
> >>>
> >>>
> >>> question: how can I enforce evaluation of the variable n such that
> >>> the lapply call works? I suspect it has something to do with eval and
> >>> specifying the correct evaluation frame, but how? ....
> >>>
> >>>
> >>> many thanks
> >>>
> >>> joerg
> >>>
> >>> ______________________________________________
> >>> R-help@stat.math.ethz.ch mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide!
> >>> http://www.R-project.org/posting-guide.html
> >>>
> >>
> >>
> >
> > many thanks to thomas and gabor for their help. both solutions solve my
> > problem perfectly.
> >
> > but just as an attempt to improve my understanding of the inner workings
> > of R (similar problems are sure to come up ...) two more questions:
> >
> > 1.
> > why does the call of the "[" function (thomas' solution) behave
> > differently from "subset" in that the lookup of the variable "n" works
> > without providing lapply with the current environment (which is nice)?
> >
> > 2.
> > using 'subset' in this context becomes more cumbersome if sapply is
> > used. it seems that then I need
> > ...
> > environment(sapply) <- environment(lapply) <- environment()
> > sapply(x, subset, select = n)
> > ...
> > to get it working (and that means you must know that sapply uses
> > lapply). or can I somehow avoid the additional explicit definition of
> > the lapply-environment?
> >
> >
> > again: many thanks
> >
> > joerg

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Oct 11 19:46:48 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 18:39:37 EST