From: Claus O'Rourke <claus.orourke_at_gmail.com>

Date: Wed, 16 Mar 2011 16:32:58 +0000

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 16 Mar 2011 - 16:45:47 GMT

Brilliant - that was really useful!

On Tue, Mar 15, 2011 at 3:46 PM, Ista Zahn <izahn_at_psych.rochester.edu> wrote:

> Hi Claus,

>
> On Tue, Mar 15, 2011 at 9:33 AM, Claus O'Rourke <claus.orourke_at_gmail.com> wrote:
>> Hi,
>> I am trying to recursively apply a function to a selection of columns
>> in a dataframe. I've had a look around and from what I have read, I
>> should be using some version of the apply function, but I'm really
>> having some headaches with it.
>
> I would just do it in a loop (see below)
>>
>> Let me be more specific with an example.
>>
>> Say I have a data frame similar to the following
>>
>> A x y z r1 r2 r3 r4
>> 0.1 0.2 0.1 ...
>> 0.1 0.3 ...
>> 0.2 ...
>>
>> i.e., a number of columns, each of the same length, and all containing
>> real numbers. Of these columns, I want to model one variable, say A,
>> as a function of other variables, say x, y, z, and any one of my r1,
>> r2, r3, ... variables.
>>
>> i.e., I want to model
>> A ~ x + y + z + r1
>> A ~ x + y + z + r2
>> ....
>> A ~ x + y + z + rn
>>
>> But where the number of 'r' variables I will have will be large, and I
>> don't know the specific number of these variables in advance.
>>
>> My question first is, how can I select all the columns in a dataframe
>> that have a heading that matches a string pattern?
>
> ?grep
>
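To make the ?grep pointer concrete, here is a minimal sketch of selecting columns by a name pattern. The toy data frame and its column names (including "ratio") are made up for illustration and are not from the thread. It also shows why anchoring the pattern can matter once other column names happen to contain an "r".

```r
# Hypothetical toy data frame; the column names are illustrative only.
dat <- data.frame(A = 1:3, x = 4:6, r1 = 7:9, r2 = 10:12, ratio = 13:15)

# Unanchored pattern: matches every name that contains an "r" anywhere.
loose <- grep("r", names(dat), value = TRUE)

# Anchored pattern: an "r" followed only by digits, nothing else.
r_cols <- grep("^r[0-9]+$", names(dat), value = TRUE)

# Subset the data frame to just the matching columns.
r_data <- dat[, r_cols, drop = FALSE]
```

With value = TRUE, grep returns the matching names rather than their positions, which is convenient when the names are later pasted into a formula.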
>>
>> And then related to this, what would be the best way of repeatedly
>> applying my modelling function to the result?
>
> Well, I don't know about the "best" way. But why not just
>
> set.seed(21)
> dat <- as.data.frame(matrix(rnorm(100000), ncol=100, dimnames=list(
>   1:1000, c("A", "x", "y", "z", paste("r", 1:96, sep="")))))
>
> mods <- list()  # fitted models, keyed by the name of the r column
> # anchored pattern, so only r1, r2, ... match
> for(i in grep("^r[0-9]+$", names(dat), value=TRUE)) {
>   mods[[i]] <- lm(as.formula(paste("A ~ x + y + z +", i)), data=dat)
> }
>
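Since the original question asked about the apply family, here is a sketch of the same idea written with lapply instead of a for loop. It is an alternative under stated assumptions, not the method proposed in the thread: the smaller simulated data set and the helper name fit_one are hypothetical, and reformulate() from the stats package is swapped in for the paste/as.formula step.

```r
set.seed(1)
# Hypothetical small data set: 50 rows, columns A, x, y, z, r1..r6.
dat <- as.data.frame(matrix(rnorm(500), ncol = 10,
  dimnames = list(NULL, c("A", "x", "y", "z", paste0("r", 1:6)))))

# Names of the candidate predictors.
r_cols <- grep("^r[0-9]+$", names(dat), value = TRUE)

# fit_one is an illustrative helper: fit A ~ x + y + z + <r>.
fit_one <- function(r) {
  lm(reformulate(c("x", "y", "z", r), response = "A"), data = dat)
}

# One model per r column, stored in a list named after the columns.
mods <- setNames(lapply(r_cols, fit_one), r_cols)
```

The result has the same shape as the list built by the loop, so mods[["r3"]] or sapply(mods, ...) work the same way on either version.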
> Note that you should be cautious about making any inferences based on
> this kind of method. In the example above 9 r variables are
> "significant" at the .05 level, even though the data were generated
> "randomly":
>
> # row 5 of the coefficient table is the r term, column 4 its p-value
> sort(sapply(mods, function(x) coef(summary(x))[5, 4]))
>
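One common way to act on this caution is to adjust the p-values for the number of models fitted. The sketch below is an assumption about how that could look and is not something suggested in the thread: it uses p.adjust() from the stats package with the Holm and Benjamini-Hochberg methods, on a smaller simulated data set invented for the example.

```r
set.seed(1)
# Hypothetical pure-noise data: 200 rows, columns A, x, y, z, r1..r20.
dat <- as.data.frame(matrix(rnorm(200 * 24), ncol = 24,
  dimnames = list(NULL, c("A", "x", "y", "z", paste0("r", 1:20)))))

r_cols <- grep("^r[0-9]+$", names(dat), value = TRUE)

# Raw p-value of each r term, looked up by name rather than by position.
p_raw <- sapply(r_cols, function(r) {
  fit <- lm(reformulate(c("x", "y", "z", r), response = "A"), data = dat)
  coef(summary(fit))[r, "Pr(>|t|)"]
})

# Adjust for having run length(p_raw) tests.
p_holm <- p.adjust(p_raw, method = "holm")  # controls family-wise error rate
p_bh   <- p.adjust(p_raw, method = "BH")    # controls false discovery rate

# Number of "significant" terms before and after adjustment.
n_sig <- c(raw = sum(p_raw < 0.05),
           holm = sum(p_holm < 0.05),
           BH = sum(p_bh < 0.05))
```

Adjusted p-values are never smaller than the raw ones, so the counts in n_sig can only stay the same or shrink after adjustment.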
> Best,
> Ista
>>
>> Many thanks for any help for this occasional R amateur.
>>
>> Claus
>>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>
Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 16 Mar 2011 - 16:50:22 GMT.
