Re: [R] tapply

From: Marc Schwartz <MSchwartz_at_mn.rr.com>
Date: Tue 21 Jun 2005 - 09:46:58 EST

On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:
> hi,
> i have another question on tapply:
> i have a dataset z like this:
> 5540 389100307391 2600
> 5541 389100307391 2600
> 5542 389100307391 2600
> 5543 389100307391 2600
> 5544 389100307391 2600
> 5546 381300302513 NA
> 5547 387000307470 NA
> 5548 387000307470 NA
> 5549 387000307470 NA
> 5550 387000307470 NA
> 5551 387000307470 NA
> 5552 387000307470 NA
>
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
>
> then, i used
> z1<-z[!is.na(z[[3]],]
> and repeat
> still doesn't work.
>
> please help.

The INDEX argument of tapply() is documented as a "list" of one or more factors, so wrap the grouping vector in list(). See the description of the INDEX argument in ?tapply:

> tapply(z[[3]], list(z[[2]]), sum, na.rm = TRUE)
381300302513 387000307470 389100307391
           0            0        13000

Note that using na.rm = TRUE here yields a misleading value of 0 for the other two groups, whose values are all NA. This is not self-evident unless you know the data.
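
To see where the zeros come from, here is a minimal sketch (the name all_na is made up for illustration). With na.rm = TRUE an all-NA group is reduced to an empty vector, and the sum of an empty vector is 0:

```r
# Hypothetical all-NA group, like the values for 387000307470 above
all_na <- c(NA_real_, NA_real_, NA_real_)

# Without na.rm the NA propagates into the result
res_keep <- sum(all_na)

# With na.rm = TRUE every element is dropped, leaving sum(numeric(0))
res_drop <- sum(all_na, na.rm = TRUE)

print(res_keep)
print(res_drop)
```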

You may be better off with:

> tapply(z[[3]], list(z[[2]]), sum)
381300302513 387000307470 389100307391
          NA           NA        13000

unless your real data is a mix of NA's and measured values.
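
If the real data does mix NA's and measured values, one possible approach (a sketch only, not the only way to do it) is a small wrapper that sums the non-missing values but still returns NA for a group with no measurements at all. The column names and the helper sum_or_na() are invented for this example; only the values come from the rows posted above.

```r
# Rebuild the posted rows; the column names are made up for this sketch
z <- data.frame(
  row    = c(5540:5544, 5546:5552),
  id     = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  amount = c(rep(2600, 5), rep(NA, 7)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum the non-missing values, but return NA
# when the group contains no measured value at all
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

by_id <- tapply(z[[3]], list(z[[2]]), sum_or_na)
print(by_id)
```

A group with some measured values and some NA's then gets the sum of what was measured, while a group that is entirely NA stays NA instead of turning into 0.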

Also see ?complete.cases and ?na.omit for further approaches to dealing with such data sets.
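
As a rough illustration of those two functions, assuming the same made-up column names as a stand-in for the real z: both drop every row that contains an NA, which for this data matches the row filter attempted in the original post once its closing parenthesis is in place.

```r
# Rebuild the posted rows; the column names are made up for this sketch
z <- data.frame(
  row    = c(5540:5544, 5546:5552),
  id     = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  amount = c(rep(2600, 5), rep(NA, 7)),
  stringsAsFactors = FALSE
)

# Keep only the rows with no missing value in any column
z_cc <- z[complete.cases(z), ]

# na.omit() drops the same rows and records them in an "na.action" attribute
z_no <- na.omit(z)

# The filter from the original post, with the parenthesis closed
z_f <- z[!is.na(z[[3]]), ]

res_cc <- tapply(z_cc[[3]], list(z_cc[[2]]), sum)
print(res_cc)
```

Because the id column is a character vector here, groups whose rows were all removed drop out of the result entirely rather than showing up as NA or 0.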

HTH,

Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Jun 21 09:56:09 2005
