Re: [R] aggregate vs tapply; is there a middle ground?

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Sun 12 Feb 2006 - 10:37:46 EST

hadley wickham <h.wickham@gmail.com> writes:

> > I faced a similar problem. Here's what I did
> >
> > tmp <-
> > data.frame(A=sample(LETTERS[1:5],10,replace=T),B=sample(letters[1:5],10,replace=T),C=rnorm(10))
> > tmp1 <- with(tmp,aggregate(C,list(A=A,B=B),sum))
> > tmp2 <- expand.grid(A=sort(unique(tmp$A)),B=sort(unique(tmp$B)))
> > merge(tmp2,tmp1,all.x=T)
> >
> > At least fewer than 10 extra lines of code. Anyone with a simpler solution?
>
> Well, you can almost do this with the reshape package:
>
> tmp <-
> data.frame(A=sample(LETTERS[1:5],10,replace=T),B=sample(letters[1:5],10,replace=T),C=rnorm(10))
> a <- recast(tmp, A + B ~ ., sum)
> # see also recast(tmp, A ~ B, sum)
> add.all.combinations(a, row="A", cols = "B")
>
> Where add.all.combinations basically does what you outlined above --
> it would be easy enough to generalise to multiple dimensions.

Anything wrong with

> as.data.frame(with(tmp,as.table(tapply(C,list(A=A,B=B),sum))))

   A B       Freq
1  A a         NA
2  B a -0.2524320
3  C a  3.8539264
4  D a         NA
5  A c  0.7227294
6  B c -0.2694669
7  C c  0.4760957
8  D c         NA
9  A e         NA
10 B e  0.1800500
11 C e         NA
12 D e -1.0350928

(except for the silly column name; responseName="sum" should fix that).
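
For the record, a minimal sketch of the tapply() route with the column name
fixed through responseName. The small fixed data frame is made up for
illustration (it is not the random data from earlier in the thread), so that
the result is reproducible; the name "sum" for the value column is just the
choice suggested above.

```r
# Hypothetical fixed data, standing in for the random 'tmp' used in the thread
tmp <- data.frame(A = c("A", "A", "B", "C"),
                  B = c("a", "b", "a", "b"),
                  C = c(1, 2, 3, 4))

# tapply() returns a matrix over all A x B combinations,
# with NA in the cells that never occur in the data
tab <- with(tmp, tapply(C, list(A = A, B = B), sum))

# as.table() + as.data.frame() gives the long format with one row per cell;
# responseName replaces the default "Freq" column name
res <- as.data.frame(as.table(tab), responseName = "sum")
print(res)
```

This should give one row per A x B combination (six here), with NA in the
sum column for the two combinations that are absent from the data, which is
the same shape as the expand.grid()/merge() construction earlier in the
thread.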

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

Received on Sun Feb 12 10:41:57 2006
