Re: [R] basic question

From: Marc Schwartz <MSchwartz_at_MedAnalytics.com>
Date: Fri 22 Apr 2005 - 02:11:33 EST

On Thu, 2005-04-21 at 16:31 +0100, jose silva wrote:
> I know this question is very simple, but I cannot figure it out.

> I have the data frame:
>
> test <- data.frame(year=c(2000,2000,2001,2001), x=c(54,41,90,15), y=c(29,2,92,22), z=c(26,68,46,51))
>
> test
>   year  x  y  z
> 1 2000 54 29 26
> 2 2000 41  2 68
> 3 2001 90 92 46
> 4 2001 15 22 51

> I want to sum the vectors x, y and z within each year (2000 and 2001) to obtain this:
>
>   year   x   y  z
> 1 2000  95  31 94
> 2 2001 105 114 97
>
> I tried tapply but it did not work (or I am probably doing it wrong).
>
> Any suggestions?

tapply() is typically used against a single vector, subsetting by one or more factors.
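
As a small illustration of that single-vector usage, here is a sketch built on the `test` data frame from the question. The names `x.sums` and `all.sums` are made up for this example, and wrapping tapply() in sapply() is only one possible way to repeat the call over several columns:

```r
# Data frame from the original question
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29, 2, 92, 22),
                   z = c(26, 68, 46, 51))

# tapply() on one vector: sum x within each year
x.sums <- tapply(test$x, test$year, sum)
print(x.sums)

# Repeating the same call for every column except 'year';
# sapply() collects the per-column results into a matrix
all.sums <- sapply(test[, -1], function(col) tapply(col, test$year, sum))
print(all.sums)
```

This works, but it needs one tapply() call per column, which is why the options below are usually more convenient.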

In this case, since you want to get the colSums for more than one column in the data frame, there are a few options:

1. Use by():

> by(test[, -1], test$year, colSums)
test$year: 2000
 x  y  z
95 31 94

test$year: 2001
  x   y   z
105 114  97

2. Use aggregate():

> aggregate(test[, -1], list(Year = test$year), sum)
  Year   x   y  z
1 2000  95  31 94
2 2001 105 114 97

3. Use split() and then lapply():

> test.s <- split(test, test$year)
> test.s

$"2000"
  year  x  y  z
1 2000 54 29 26
2 2000 41  2 68

$"2001"
  year  x  y  z
3 2001 90 92 46
4 2001 15 22 51

> lapply(test.s, function(x) colSums(x[, -1]))
$"2000"
 x  y  z
95 31 94

$"2001"
  x   y   z
105 114  97
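
A possible variation on option 2, not one of the options listed above: more recent versions of R also provide a formula method for aggregate() (as far as I know it was added well after this thread, so treat its availability as an assumption about your R version). A minimal sketch, with `res.f` and `res.dot` as made-up names:

```r
# Data frame from the original question
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29, 2, 92, 22),
                   z = c(26, 68, 46, 51))

# Formula method: sum x, y and z within each level of year
res.f <- aggregate(cbind(x, y, z) ~ year, data = test, FUN = sum)
print(res.f)

# The dot stands for "all other columns", here x, y and z
res.dot <- aggregate(. ~ year, data = test, FUN = sum)
print(res.dot)
```

Both calls return a data frame with one row per year, much like the aggregate() call in option 2, except that the grouping column keeps the name `year`.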

Which you choose may depend upon how you need the output structured for subsequent use.
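
To make the difference in output structure concrete, here is a rough sketch that runs the three approaches and inspects what each returns. The object names (`res.by`, `res.agg`, `res.lst`, `res.mat`) are invented for this example, and the do.call(rbind, ...) step is just one way to stack the per-year vectors into a single matrix:

```r
# Data frame from the original question
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29, 2, 92, 22),
                   z = c(26, 68, 46, 51))

# by(): an object of class "by", one element per year
res.by <- by(test[, -1], test$year, colSums)
print(class(res.by))

# aggregate(): a data frame, one row per year
res.agg <- aggregate(test[, -1], list(Year = test$year), sum)
print(class(res.agg))

# split() + lapply(): a plain list of named numeric vectors
res.lst <- lapply(split(test, test$year), function(d) colSums(d[, -1]))
print(class(res.lst))

# Stack the list elements into a matrix with years as row names
res.mat <- do.call(rbind, res.lst)
print(res.mat)
```

If the result is going into further data frame operations, the aggregate() output is probably the easiest to work with; for per-year processing, the list form may be handier.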

See ?by, ?aggregate, ?lapply and ?split for more information.

HTH, Marc Schwartz


