# Re: [R] tapply

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Wed 22 Jun 2005 - 18:13:27 EST

>>>>> "AndyL" == Liaw, Andy <andy_liaw@merck.com>
>>>>>     on Tue, 21 Jun 2005 13:30:54 -0400 writes:

AndyL> Try:
AndyL> > (x <- factor(1:2, levels=1:5))
AndyL> [1] 1 2
AndyL> Levels: 1 2 3 4 5
AndyL> > (x <- x[, drop=TRUE])
AndyL> [1] 1 2
AndyL> Levels: 1 2

or

(x <- factor(1:2, levels=1:5))
(x2 <- factor(x))

which also drops the unused levels.
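
To see why the unused levels matter for tapply, here is a minimal sketch on made-up data. The names `y`, `z`, `claim` and `saved` are illustrative assumptions, not the actual columns from the thread.

```r
# Hypothetical stand-in for the poster's data (names are made up).
y <- data.frame(claim = factor(c("a", "a", "b", "c")),
                saved = c(10, 20, 30, 40))

# Subsetting rows keeps every original level, even ones no longer present.
z <- y[y$claim %in% c("a", "b"), ]
levels(z$claim)                      # "a" "b" "c"

# tapply() makes one cell per level, so the unused level "c" appears as NA.
with_unused <- tapply(z$saved, z$claim, sum)

# Re-creating the factor drops the unused levels
# (z$claim[, drop = TRUE] should do the same).
z$claim <- factor(z$claim)
without_unused <- tapply(z$saved, z$claim, sum)
```

After the levels are dropped, the tapply result should contain only the groups that actually occur in `z`.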
Martin

AndyL> Andy

>> From: Weiwei Shi [mailto:helprhelp@gmail.com]
>>
>> Even before I tried, I already realize it must be true when I read
>> this reply! Great job! thanks, Andy.
>>
>> > str(z)
>> `data.frame': 235 obs. of 2 variables:
>> $ CLAIMNUM : Factor w/ 1907 levels "0","10000001849",..: 1083 1083 1083 1582 1582 1084 1681 1681 1391 1391 ...
>> $ SIU.SAVED: int 475 3000 3000 0 0 4352 0 0 4500 3000 ...
>>
>> So, I have another general question: how to avoid this when I
>> do the matching?
>> In my case, claimnum does not have to be a factor. I think I can use
>> as.integer on it to de-factor it. But I want to know how to do it while
>> keeping it as a factor. btw, what's your way to drop those levels? :)
>>
>> weiwei
>>
>>
>> On 6/21/05, Liaw, Andy <andy_liaw@merck.com> wrote:
>> > What does str(z) say? I suspect the second column is a factor, which,
>> > after the subsetting, has some empty levels. If so, just drop those
>> > levels.
>> >
>> > Andy
>> >
>> > > From: Weiwei Shi
>> > >
>> > > hi
>> > > i tried all the methods suggested above:
>> > > ave and rowsum with "with" function works for my situation. I think
>> > > the problem might not be due to tapply.
>> > > My data z comes from
>> > > z<-y[y[[1]] %in% x[[2]], c(1,9)]
>> > >
>> > > while z is supposed to have no entries for those non-matched
>> > > between x and y.
>> > >
>> > > However, when I run tapply, the result also includes those
>> > > non-matched entries. I used is.na to remove those entries from z
>> > > first and then ran tapply again, but the result is the same: those
>> > > NAs and those non-matched results are still there. That's what I mean
>> > > by "it doesn't work".
>> > >
>> > > Is there something I missed here so that z "implicitly" has some
>> > > "trace" back to y dataset?
>> > >
>> > > thanks,
>> > >
>> > > On 6/20/05, Gabor Grothendieck <ggrothendieck@gmail.com> wrote:
>> > > > On 6/20/05, Weiwei Shi <helprhelp@gmail.com> wrote:
>> > > > > hi,
>> > > > > i have another question on tapply:
>> > > > > i have a dataset z like this:
>> > > > > 5540 389100307391 2600
>> > > > > 5541 389100307391 2600
>> > > > > 5542 389100307391 2600
>> > > > > 5543 389100307391 2600
>> > > > > 5544 389100307391 2600
>> > > > > 5546 381300302513 NA
>> > > > > 5547 387000307470 NA
>> > > > > 5548 387000307470 NA
>> > > > > 5549 387000307470 NA
>> > > > > 5550 387000307470 NA
>> > > > > 5551 387000307470 NA
>> > > > > 5552 387000307470 NA
>> > > > >
>> > > > > I want to sum the column 3 by column 2.
>> > > > > I removed NA by calling:
>> > > > > tapply(z[[3]], z[[2]], sum, na.rm=T)
>> > > > > but it does not work.
>> > > > >
>> > > > > then, i used
>> > > > > z1<-z[!is.na(z[[3]]),]
>> > > > > and repeat
>> > > > > still doesn't work.
>> > > > >
>> > > > >
>> > > >
>> > > > Depending on what you want you may be able to use rowsum:
>> > > >
>> > > > - display only groups that have at least one non-NA with the sum
>> > > > being the sum of the non-NAs:
>> > > >
>> > > > with(na.omit(z), rowsum(V3, V2))
>> > > >
>> > > > - display all groups with the sum being NA if any member is NA:
>> > > >
>> > > > rowsum(z$V3, z$V2)
>> > > >
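
For reference, the two rowsum() calls quoted above can be compared on a small made-up data frame. This is only a sketch: the values and group labels are invented, and only the column names `V2` and `V3` are taken from the quoted code.

```r
# Made-up data using the V2/V3 column names from the quoted rowsum() calls.
z <- data.frame(V2 = c("g1", "g1", "g2", "g3", "g3"),
                V3 = c(2600, 2600, NA, 100, NA))

# Only groups with at least one non-NA value; NA rows are removed first.
non_na_sums <- with(na.omit(z), rowsum(V3, V2))

# All groups; a group's sum is NA as soon as any member is NA.
all_sums <- rowsum(z$V3, z$V2)
```

In this sketch `non_na_sums` should have rows for `g1` and `g3` only, while `all_sums` keeps `g2` and `g3` with NA sums.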
