Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

From: Seth Falcon <sfalcon_at_fhcrc.org>
Date: Thu 27 Jul 2006 - 14:20:29 GMT

"Kevin B. Hendricks" <kevin.hendricks@sympatico.ca> writes:
> My first R attempt was a simple
>
> # sort the data.frame gd and the sort key
> sorder <- order(MDPC)
> gd <- gd[sorder,]
> MDPC <- MDPC[sorder]
> attach(gd)
>
> # find the length and sum for each unique sort key
> XN <- by(MVE, MDPC, length)
> XSUM <- by(MVE, MDPC, sum)
> GRPS <- levels(as.factor(MDPC))
>
> Well the ordering and sorting was reasonably fast but the first "by"
> statement was still running 4 hours later on my machine (a dual 2.6
> gig Opteron with 4 gig of main memory). This same snippet of code in
> SAS running on a slower machine takes about 5 minutes of system
> time.

I wonder if split() would be of use here. Once you have sorted the data frame gd and the sort keys MDPC, you could do:

gdList <- split(gd$MVE, MDPC)

xn <- sapply(gdList, length)
xsum <- sapply(gdList, sum)
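
A self-contained sketch of the above, in case it helps to try it on something small. The data here are invented stand-ins for gd and MDPC, and the tapply() and rle() variants are alternatives added for comparison, not something from the original code. The rle() version is only valid because the keys are already sorted.

```r
# Toy stand-ins for the poster's data; the values are invented for illustration.
set.seed(1)
MDPC <- sample(c("a", "b", "c"), 20, replace = TRUE)
gd <- data.frame(MVE = round(runif(20) * 100))

# Sort the data frame and the key together, as in the original post.
# drop = FALSE keeps the one-column toy data frame from collapsing to a vector.
sorder <- order(MDPC)
gd <- gd[sorder, , drop = FALSE]
MDPC <- MDPC[sorder]

# Split the value column by key, then summarise each piece.
gdList <- split(gd$MVE, MDPC)
xn <- vapply(gdList, length, integer(1))
xsum <- vapply(gdList, sum, numeric(1))
grps <- names(gdList)

# tapply() computes the same per-group summaries in one call each.
xn2 <- tapply(gd$MVE, MDPC, length)
xsum2 <- tapply(gd$MVE, MDPC, sum)

# Because the keys are sorted, run lengths give the group sizes directly,
# and differences of the cumulative sum at each run's end give the group sums.
runs <- rle(MDPC)
ends <- cumsum(runs$lengths)
xn3 <- runs$lengths
xsum3 <- diff(c(0, cumsum(gd$MVE)[ends]))
```

vapply() is used in place of sapply() only so that the result type is fixed; with this data sapply() returns the same values.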

+ seth



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri Jul 28 00:27:24 2006
