[R] Aggregating multiple columns

From: Adam D. I. Kramer <adik_at_ilovebacon.org>
Date: Thu, 19 Mar 2009 14:41:55 -0700 (PDT)


Dear colleagues,

         Consider the following data frame:

x <- data.frame(y=rnorm(100),order=rep(1:10,10),subject=rep(1:10,each=10))

         ...it is my goal to aggregate x to compute a linear effect of order for each subject. So, ideally, result would be a vector containing a single number for each subject, representing the linear relationship between y and order.

         I first tried this:

result <- aggregate(x[,1:2], list(subject=x$subject),
             function (z) { lm(y ~ order, data=z)$coefficients[2] }
           )

...because lm(y ~ order, data=x, subset=x$subject==1)$coefficients[2] would
give me the correct term for subject 1 (i.e., that is the number I am actually looking for).

         However, when used on data frames, aggregate() aggregates every COLUMN in x _separately_ using FUN...while lm needs both columns *together.*
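The column-by-column behaviour can be checked directly. The following is only a small sketch: it rebuilds the example data frame (the seed is an arbitrary choice for reproducibility) and asks FUN to report the class of whatever it is handed. If aggregate() passed a data frame, the answer would be "data.frame"; instead each call sees one bare column.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# FUN is called once per column per subject; record what kind of object it receives
seen <- aggregate(x[, 1:2], list(subject = x$subject),
                  function(z) class(z)[1])

print(seen)  # the y column holds "numeric", the order column holds "integer"
```

Since FUN never receives y and order in the same object, a formula such as y ~ order has nothing to resolve against inside it.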

         ...I then turned to tapply, but that is useful only on "atomic objects," and not data frames.
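For what it is worth, tapply can still be pointed at an atomic vector of row indices, with the grouping done by subject and the data frame subset inside the function. This is a sketch of that workaround on the example data, not a claim that it is faster than the approaches below; the name slopes is made up for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# tapply groups the row numbers by subject; each group of rows is then
# used to pull the matching slice of x, so lm sees both columns together
slopes <- tapply(seq_len(nrow(x)), x$subject,
                 function(i) coef(lm(y ~ order, data = x[i, ]))[2])

# compare the first entry with the direct fit for subject 1
direct <- coef(lm(y ~ order, data = x, subset = subject == 1))[2]
print(all.equal(unname(as.vector(slopes)[1]), unname(direct)))
```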

         I have two solutions, which I find inelegant and slow:

  1. result <- sapply(levels(factor(x$subject)), function(z) { lm(y ~ order, data=x, subset=subject==z)$coefficients[2]} )

...this gets the job done, but is very slow.

  2. subjects <- unique(x$subject); result <- numeric(length(subjects));
for (i in seq_along(subjects)) { result[i] <- lm(y ~ order, data=x, subset=subject==subjects[i])$coefficients[2] }

...also does the job, but is also very slow.

Is there a better solution? I miss the speed of tapply and aggregate; the example has only 100 rows and 10 subjects, but the actual data has many more of each.
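One possible direction, sketched here under the assumption that only the slope is needed: split() the data frame by subject so each piece carries both columns, then either fit lm on each piece or skip the model object and use the closed form for a simple-regression slope, cov(order, y) / var(order). The names slopes_lm and slopes_cf are made up for illustration, and whether either variant is fast enough on the real data is not something this sketch establishes.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

pieces <- split(x, x$subject)  # list of 10 small data frames, one per subject

# variant A: fit a model per subject and keep the slope
slopes_lm <- sapply(pieces, function(d) coef(lm(y ~ order, data = d))[["order"]])

# variant B: closed-form slope for a one-predictor regression,
# which avoids building an lm object for every subject
slopes_cf <- sapply(pieces, function(d) cov(d$order, d$y) / var(d$order))

print(all.equal(slopes_lm, slopes_cf))  # the two variants should agree
```

Variant B only reproduces the slope; anything else lm reports (standard errors, residuals) would still need the full fit.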

Cordially,
Adam D. I. Kramer
Ph.D. Candidate, Social and Personality Psychology
University of Oregon



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 19 Mar 2009 - 20:44:46 GMT

