# [R] Aggregating multiple columns

Date: Thu, 19 Mar 2009 14:41:55 -0700 (PDT)

Dear colleagues,

Consider the following data frame:

```
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))
```

...my goal is to aggregate x so as to compute the linear effect of order for each subject. Ideally, result would be a vector containing a single number per subject: the slope of the linear relationship between y and order.

I first tried this:

```
result <- aggregate(x[, 1:2], list(subject = x$subject),
                    function(z) { lm(y ~ order, data = z)$coefficients[2] })
```

...because `lm(y ~ order, data = x, subset = x$subject == 1)$coefficients[2]` gives the correct term for subject 1 (that is, exactly the number I am looking for).
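For a simple regression of y on order, the slope that lm reports equals cov(order, y) / var(order). The sketch below checks this for subject 1 on the example data; the names b_lm, b_cf and s1 are illustrative only.

```r
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# Slope for subject 1 from lm (the number the question is after)
b_lm <- unname(coef(lm(y ~ order, data = x, subset = subject == 1))[2])

# Same slope from the closed form cov(order, y) / var(order)
s1 <- x[x$subject == 1, ]
b_cf <- cov(s1$order, s1$y) / var(s1$order)

all.equal(b_lm, b_cf)
```

This identity is what makes an lm-free, vectorized computation possible when only the slope is needed.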

However, when given a data frame, aggregate() applies FUN to every COLUMN of x _separately_, while lm needs both columns *together*.
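A small check of that behaviour, as a sketch: if FUN simply reports whether it received a data frame, every cell comes back FALSE, because aggregate() passes each column's per-subject slice as a plain vector. The name seen is illustrative.

```r
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# aggregate() hands FUN one column at a time, as a plain vector,
# so FUN never sees y and order together.
seen <- aggregate(x[, c("y", "order")], by = list(subject = x$subject),
                  FUN = function(v) is.data.frame(v))
seen[1:3, ]
```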

...I then turned to tapply, but it works only on atomic vectors ("atomic objects"), not on data frames.
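One common workaround, offered as a sketch rather than a definitive answer: the row numbers seq_len(nrow(x)) are an atomic vector, so tapply() can group *them* by subject, and FUN can then fit lm on just those rows. lm is still called once per subject, so this mainly tidies the code rather than changing the amount of model fitting.

```r
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# Group row numbers (an atomic vector) by subject; FUN receives the row
# numbers for one subject and fits lm on just those rows.
result <- tapply(seq_len(nrow(x)), x$subject, function(i) {
  unname(coef(lm(y ~ order, data = x[i, ]))[2])
})
result
```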

I have two solutions, which I find inelegant and slow:

1. Using sapply:

   ```
   result <- sapply(levels(factor(x$subject)), function(z) {
     lm(y ~ order, data = x, subset = subject == z)$coefficients[2] })
   ```

   ...this gets the job done, but is very slow.

2. Using a for loop:

   ```
   lev <- levels(factor(x$subject))
   result <- numeric(length(lev))
   for (z in seq_along(lev))
     result[z] <- lm(y ~ order, data = x, subset = x$subject == lev[z])$coefficients[2]
   ```

   ...also does the job, but is also very slow.
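Another option, sketched under the assumption that only the slope point estimates are needed: fit a single model with a separate intercept and a separate order slope for each subject. Because each subject's parameters are estimated from its own rows only, the slope estimates match the per-subject fits, while lm is called just once. Standard errors will differ, since this model pools the residual variance across subjects. The subj column is a hypothetical helper added for the illustration.

```r
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))
x$subj <- factor(x$subject)  # hypothetical helper column: subject as a factor

# One fit: a separate intercept and a separate order slope for each subject.
fit <- lm(y ~ 0 + subj + subj:order, data = x)

# Keep only the subject-specific slope terms (named "subj1:order", ...).
slopes <- coef(fit)[grep(":order$", names(coef(fit)))]
names(slopes) <- levels(x$subj)
slopes
```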

Is there a better solution? I miss the speed of tapply and aggregate; the example has only 100 rows and 10 subjects, but the actual data has many more of each.
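If only the slopes are needed, one possible approach avoids lm altogether: compute each subject's slope from the closed form sum((order - mean) * (y - mean)) / sum((order - mean)^2), taking within-subject means with ave() and within-subject sums with rowsum(). Everything is vectorized, so it should scale to many subjects better than one lm call per subject, though no benchmark figures are claimed here. group_slopes is a hypothetical helper name.

```r
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# Hypothetical helper: per-group OLS slope of y on xx via the closed form
# sum((xx - mean(xx)) * (y - mean(y))) / sum((xx - mean(xx))^2).
group_slopes <- function(y, xx, g) {
  dx <- xx - ave(xx, g)          # center xx within each group
  dy <- y - ave(y, g)            # center y within each group
  num <- rowsum(dx * dy, g)      # per-group cross-product sums
  den <- rowsum(dx^2, g)         # per-group sums of squares of xx
  setNames(num[, 1] / den[, 1], rownames(num))
}

result <- group_slopes(x$y, x$order, x$subject)
result
```

The result is a named vector with one slope per subject, which is the shape the question asks for.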

Cordially,