Re: [R] aggregate slow with many rows - alternative?

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Fri 14 Oct 2005 - 10:29:18 EST

Convert dat to a matrix and see if working with the matrix instead of a data frame speeds things up enough.
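
A minimal sketch of what that could look like on the test data from the quoted message. The suggestion above only says to try a matrix; the use of paste() to build a group key and rowsum() to do the summing is an assumption added here for illustration, and the names m, key, first and t.m are made up for the example. It has not been timed on the real 33'000-row data.

```r
# Test data copied from the quoted message
dat <- data.frame(
  Datum     = c(32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699),
  FischerID = c(58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870),
  Anzahl    = c(2, 2, 1, 1, 2, 1, 7, 1, 1, 2)
)

# All columns are numeric, so this gives a plain numeric matrix
m <- as.matrix(dat)

# One key per (Datum, FischerID) combination (assumed approach, not from the thread)
key   <- paste(m[, "Datum"], m[, "FischerID"], sep = "_")
first <- !duplicated(key)   # marks the first row of each group

# Group sums and group sizes, both in order of first appearance
sums <- rowsum(m[, "Anzahl"], key, reorder = FALSE)
cnts <- rowsum(rep(1, nrow(m)), key, reorder = FALSE)

# Assemble the result and sort it the same way as t.a in the original code
t.m <- cbind(m[first, c("Datum", "FischerID"), drop = FALSE],
             Anzahl = as.vector(sums),
             Cnt    = as.vector(cnts))
t.m <- t.m[order(t.m[, "Datum"], t.m[, "FischerID"]), , drop = FALSE]
t.m
```

The result is a matrix rather than a data frame; as.data.frame(t.m) converts it back if needed. Whether this is fast enough on the full data set would have to be checked with system.time().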

On 10/13/05, Hans-Peter <gchappi@gmail.com> wrote:
> Hi,
>
> I use the code below to aggregate / count my test data. It works fine,
> but the problem is with my real data (33'000 rows) where the function
> is really slow (nothing happened in half an hour).
>
> Does anybody know of other functions that I could use?
>
> Thanks,
> Hans-Peter
>
> --------------
> dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656,
> 32656, 32656, 32672, 32672, 32699 ),
> FischerID = c( 58395, 58395, 58395, 88434, 89953, 89953,
> 89953, 64395, 62896, 62870 ),
> Anzahl = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )
> f <- function(x) data.frame( Datum = x[1,1], FischerID = x[1,2],
> Anzahl = sum( x[,3] ), Cnt = dim( x )[1] )
> t.a <- do.call("rbind", by(dat, dat[,1:2], f)) # slow for 33'000 rows
> t.a <- t.a[order( t.a[,1], t.a[,2] ),]
>
> # show data
> dat
> t.a
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

Received on Fri Oct 14 10:33:32 2005