From: Marc Schwartz <MSchwartz_at_mn.rr.com>

Date: Tue 21 Jun 2005 - 09:46:58 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Jun 21 09:56:09 2005


On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:

> hi,
>
> i have another question on tapply:
> i have a dataset z like this:
> 5540 389100307391 2600
> 5541 389100307391 2600
> 5542 389100307391 2600
> 5543 389100307391 2600
> 5544 389100307391 2600
> 5546 381300302513 NA
> 5547 387000307470 NA
> 5548 387000307470 NA
> 5549 387000307470 NA
> 5550 387000307470 NA
> 5551 387000307470 NA
> 5552 387000307470 NA
>
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
>
> then, i used
> z1<-z[!is.na(z[[3]],]
> and repeat
> still doesn't work.
>
> please help.

The index vector(s) in tapply() need to be a "list". See the description of the INDEX argument in ?tapply:

> tapply(z[[3]], list(z[[2]]), sum, na.rm = TRUE)
381300302513 387000307470 389100307391
           0            0        13000

Note that using na.rm = TRUE here yields a misleading value of 0 for the other two groups, whose values are all NA's. This is not self-evident unless you know the data.
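The 0 comes from how sum() treats an empty input: once na.rm = TRUE has dropped every value in a group, nothing is left to add, and the sum of nothing is 0. A minimal sketch, using a made-up vector x rather than the poster's data:

```r
# x is a made-up all-NA vector, standing in for one all-NA group.
x <- c(NA_real_, NA_real_, NA_real_)

with_rm    <- sum(x, na.rm = TRUE)  # every element is removed before summing
empty_sum  <- sum(numeric(0))       # the same situation, stated directly
without_rm <- sum(x)                # NA propagates when nothing is removed

print(c(with_rm = with_rm, empty_sum = empty_sum, without_rm = without_rm))
```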

You may be better off with:

> tapply(z[[3]], list(z[[2]]), sum)
381300302513 387000307470 389100307391
          NA           NA        13000

unless your real data is a mix of NA's and measured values.
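For that mixed case, one possible approach is a small helper that returns NA only when every value in a group is missing, and otherwise sums the non-missing values. This is just a sketch: the data frame is rebuilt from the rows quoted above, and the column names (row, id, amount) and the helper sum_or_na are made up for illustration.

```r
# Rebuild the quoted rows. Column names are assumptions, not from the original post.
z <- data.frame(
  row    = c(5540:5544, 5546:5552),
  id     = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  amount = c(rep(2600, 5), rep(NA, 7)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: NA if the whole group is missing,
# otherwise the sum of the non-missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

res <- tapply(z[[3]], list(z[[2]]), sum_or_na)
print(res)
```

With the data shown, this gives the same result as plain sum without na.rm, but a group holding both NA's and measured values would get the sum of its measured values instead of NA.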

Also see ?complete.cases and ?na.omit for further approaches to dealing with such data sets.
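As a rough illustration of those two functions, the sketch below drops the incomplete rows before summing. The data frame is rebuilt from the rows quoted above, and the column names (row, id, amount) are assumptions for illustration only.

```r
# Rebuild the quoted rows. Column names are assumptions, not from the original post.
z <- data.frame(
  row    = c(5540:5544, 5546:5552),
  id     = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  amount = c(rep(2600, 5), rep(NA, 7)),
  stringsAsFactors = FALSE
)

# Keep only the rows that have no NA in any column.
z_cc <- z[complete.cases(z), ]

# na.omit() keeps the same rows and records the dropped ones
# in an "na.action" attribute.
z_no <- na.omit(z)

res <- tapply(z_cc[[3]], list(z_cc[[2]]), sum)
print(res)
```

Note that with this approach the groups whose values are all NA drop out of the result entirely, rather than showing up as NA or 0.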

HTH,

Marc Schwartz


This archive was generated by hypermail 2.1.8: Fri 03 Mar 2006 - 03:32:54 EST