On Thu, 2005-04-21 at 16:31 +0100, jose silva wrote:

> I know this question is very simple, but I cannot figure it out.

> test<- data.frame(year=c(2000,2000,2001,2001),x=c(54,41,90,15), y=c(29,2,92,22), z=c(26,68,46,51))

> I want to sum the vectors x, y and z within each year (2000 and 2001) to obtain this:

> I tried tapply() but it did not work (or I am probably using it wrong).

> Any suggestions?

tapply() is typically used against a single vector, subsetting by one or more factors.
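
To illustrate that point, here is a minimal sketch of tapply() working on one column at a time. The sapply() wrapper that repeats it over every non-year column is my own addition for illustration, not something from the original question, and the object names x.sums and all.sums are hypothetical.

```r
# Data frame from the original question
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29, 2, 92, 22),
                   z = c(26, 68, 46, 51))

# tapply() on a single vector, grouped by year
x.sums <- tapply(test$x, test$year, sum)

# Repeat the same call for each remaining column; sapply() should
# simplify the per-column results into a matrix (years in rows)
all.sums <- sapply(test[, -1], function(col) tapply(col, test$year, sum))
```

This works, but it amounts to re-implementing by hand what the functions below do directly.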

In this case, since you want to get the colSums for more than one column in the data frame, there are a few options:

1. Use by():

> by(test[, -1], test$year, colSums)
test$year: 2000
  x   y   z
 95  31  94

test$year: 2001
  x   y   z
105 114  97

2. Use aggregate():

> aggregate(test[, -1], list(Year = test$year), sum)
  Year   x   y  z
1 2000  95  31 94
2 2001 105 114 97

3. Use split() and then lapply():

> test.s <- split(test, test$year)

> test.s
$"2000"
  year  x  y  z
1 2000 54 29 26
2 2000 41  2 68

$"2001"
  year  x  y  z
3 2001 90 92 46
4 2001 15 22 51

> lapply(test.s, function(x) colSums(x[, -1]))
$"2000"
 x  y  z
95 31 94

$"2001"
  x   y   z
105 114  97

Which you choose may depend upon how you need the output structured for subsequent use.
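
As a rough sketch of what that difference in structure looks like, the three results can be stored and inspected side by side. The object names (res.by, res.agg, res.lst, res.mat) are hypothetical, and the final do.call(rbind, ...) step is just one possible way to flatten the list form, not something required by any of the approaches.

```r
# Data frame from the original question
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29, 2, 92, 22),
                   z = c(26, 68, 46, 51))

# by(): an object of class "by", printed one group at a time
res.by <- by(test[, -1], test$year, colSums)

# aggregate(): a data frame with one row per year
res.agg <- aggregate(test[, -1], list(Year = test$year), sum)

# split() + lapply(): a plain list named by year
res.lst <- lapply(split(test, test$year), function(x) colSums(x[, -1]))

# If a matrix is more convenient, the list elements can be stacked by row
res.mat <- do.call(rbind, res.lst)

class(res.by)
class(res.agg)
class(res.lst)
```

If the sums are going to be merged back with other data or plotted, the data frame from aggregate() is probably the easiest to work with; the list form is convenient when each year is processed further on its own.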

See ?by, ?aggregate, ?lapply and ?split for more information.

Marc Schwartz

