Re: [R] problem with lapply(x, subset, ...) and variable select argument

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Tue 11 Oct 2005 - 19:36:05 EST

"Dimitris Rizopoulos" <dimitris.rizopoulos@med.kuleuven.be> writes:

> As Gabor said, the issue here is that subset.data.frame() evaluates
> the value of the `select' argument in the parent.frame(); thus, if you
> create a local function within lapply() (or sapply()) it works:

It's more complicated than that: It evaluates the select argument in a named list with names duplicating those of the data frame, and *then* in parent.frame. This is convenient for command line use, because you can specify ranges of variables as in

  dfsub <- subset(dfr,select=c(sex:treat, x_pre:x_24))

but it is quite risky to try to do this inside a function: if you're passing in a variable, the result depends on whether there is a variable of the same name in the data frame! You can probably get around it using substitute() constructions, but I think it is safer to avoid using functions with nonstandard evaluation semantics inside functions.
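
A minimal sketch of the hazard described above, assuming the current behaviour of subset.data.frame(). The data frames `with_n` and `without_n` and the helpers `pick()` and `pick_std()` are hypothetical names made up for illustration; they do not come from the thread.

```r
# Two toy data frames; only the first has a column that happens to be named "n".
with_n    <- data.frame(a = 1:2, b = 3:4, n = 5:6)
without_n <- data.frame(a = 1:2, b = 3:4)

# Hypothetical helper that passes a variable to 'select'.
# subset() looks up the symbol 'n' among the column names first,
# and only then in the calling frame.
pick <- function(d, n) subset(d, select = n)

names(pick(without_n, "a"))  # "a": 'n' is found in pick()'s frame
names(pick(with_n, "a"))     # "n": the column named 'n' shadows the argument

# Hypothetical helper using standard evaluation via "[" instead;
# 'n' is an ordinary argument here, so column names cannot shadow it.
pick_std <- function(d, n) d[, n, drop = FALSE]

names(pick_std(with_n, "a"))  # "a"

# The same idea applied to a list of data frames inside a function.
tt <- function(n) {
  x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
  lapply(x, function(d) d[, n, drop = FALSE])
}
res <- tt("a")
```

Because "[" evaluates its arguments in the ordinary way, no environment manipulation is needed, and the result does not depend on which column names the data frames happen to have.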

> tt <- function (n) {
> x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
> print(lapply(x, function(y, n) subset(y, select = n), n = n))
> print(sapply(x, function(y, n) subset(y, select = n), n = n))
> }
>
> tt("a")
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/(0)16/336899
> Fax: +32/(0)16/337015
> Web: http://www.med.kuleuven.be/biostat/
> http://www.student.kuleuven.be/~m0390867/dimitris.htm
>
>
>
> ----- Original Message -----
> From: "joerg van den hoff" <j.van_den_hoff@fz-rossendorf.de>
> To: "Gabor Grothendieck" <ggrothendieck@gmail.com>; "Thomas Lumley"
> <tlumley@u.washington.edu>
> Cc: "r-help" <r-help@stat.math.ethz.ch>
> Sent: Tuesday, October 11, 2005 10:18 AM
> Subject: Re: [R] problem with lapply(x, subset,...) and variable
> select argument
>
>
> > Gabor Grothendieck wrote:
> >> The problem is that subset looks into its parent frame but in this
> >> case the parent frame is not the environment in tt but the
> >> environment in lapply since tt does not call subset directly but
> >> rather lapply does.
> >>
> >> Try this which is similar except we have added the line beginning
> >> with environment before the print statement.
> >>
> >> tt <- function (n) {
> >> x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
> >> environment(lapply) <- environment()
> >> print(lapply(x, subset, select = n))
> >> }
> >>
> >> n <- "b"
> >> tt("a")
> >>
> >> What this does is create a new version of lapply whose
> >> parent is the environment in tt.
> >>
> >>
> >> On 10/10/05, joerg van den hoff <j.van_den_hoff@fz-rossendorf.de>
> >> wrote:
> >>
> >>>I need to extract identically named columns from several data
> >>>frames in a list. the column name is a variable (i.e. not known in
> >>>advance). the whole thing occurs within a function body. I'd like to
> >>>use lapply with a variable 'select' argument.
> >>>
> >>>
> >>>example:
> >>>
> >>>tt <- function (n) {
> >>> x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
> >>> for (xx in x) print(subset(xx, select = n)) ### works
> >>> print (lapply(x, subset, select = a)) ### works
> >>> print (lapply(x, subset, select = "a")) ### works
> >>> print (lapply(x, subset, select = n)) ### does not work as intended
> >>>}
> >>>n = "b"
> >>>tt("a") #works (but selects not the intended column)
> >>>rm(n)
> >>>tt("a") #no longer works in the lapply call including variable 'n'
> >>>
> >>>
> >>>question: how can I enforce evaluation of the variable n such that
> >>>the lapply call works? I suspect it has something to do with eval
> >>>and specifying the correct evaluation frame, but how? ....
> >>>
> >>>
> >>>many thanks
> >>>
> >>>joerg
> >>>
> >>>______________________________________________
> >>>R-help@stat.math.ethz.ch mailing list
> >>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>PLEASE do read the posting guide!
> >>>http://www.R-project.org/posting-guide.html
> >>>
> >>
> >>
> >
> > many thanks to thomas and gabor for their help. both solutions solve
> > my problem perfectly.
> >
> > but just as an attempt to improve my understanding of the inner
> > workings of R (similar problems are sure to come up ...) two more
> > questions:
> >
> > 1.
> > why does the call of the "[" function (thomas' solution) behave
> > differently from "subset" in that the lookup of the variable "n"
> > works without providing lapply with the current environment (which
> > is nice)?
> >
> > 2.
> > using 'subset' in this context becomes more cumbersome if sapply is
> > used. it seems that then I need
> > ...
> > environment(sapply) <- environment(lapply) <- environment()
> > sapply(x, subset, select = n)
> > ...
> > to get it working (and that means you must know that sapply uses
> > lapply). or can I somehow avoid the additional explicit definition
> > of the lapply-environment?
> >
> >
> > again: many thanks
> >
> > joerg
>
>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

Received on Tue Oct 11 19:46:48 2005
