Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

From: Thomas Lumley
Date: Mon 31 Jul 2006 - 14:19:01 GMT

On Sat, 29 Jul 2006, Kevin B. Hendricks wrote:

> Hi Bill,
>>>> sum : igroupSums
> Okay, after thinking about this ...
> # assumes i is the small integer factor with n levels
> # v is some long vector
> # no sorting required
> igroupSums <- function(v,i) {
>   sums <- rep(0,max(i))
>   for (j in seq_along(v)) {
>     sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
>   }
>   sums
> }
> If written in Fortran or C, this might be faster than using split(). It is
> at least linear in time in the length of vector v.

For sums you should look at rowsum(). It uses a hash table in C and, the last time I looked, was faster than using split(). It returns a vector of the same length as the input, but that would easily be fixed.
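
As a rough illustration of the rowsum() suggestion, here is a minimal sketch comparing it with the split() approach on a small made-up vector. The data and variable names are assumptions for illustration only. As documented, the R-level rowsum() returns a matrix with one row per distinct group value; the remark about output length presumably concerns the underlying C routine, though that reading is an assumption here.

```r
# Illustrative data (made up for this sketch): v is the value vector,
# i is a small-integer grouping vector, as in the quoted igroupSums().
v <- c(1.5, 2.0, 3.5, 4.0, 5.0)
i <- c(1L, 2L, 1L, 3L, 2L)

# rowsum() on a vector returns a one-column matrix with one row per
# distinct group value; rows are sorted by group (reorder = TRUE default).
sums_rowsum <- rowsum(v, i)[, 1]

# The split()-based equivalent, for comparison.
sums_split <- sapply(split(v, i), sum)

print(sums_rowsum)
print(sums_split)
```

Both results are named by group value, so they can be compared directly with all.equal().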

The same approach would work for min, max, range, count, mean, but not for arbitrary functions.
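
To make that concrete, here is a hedged sketch of how count, mean and min could be built from the same kind of single pass. The function names igroupCounts, igroupMeans and igroupMins are hypothetical (not part of base R) and simply follow the quoted igroupSums(); the data are made up. Each statistic only needs a running value per group, which is why the approach carries over. Something like a median needs all of a group's values at once, so it does not fit this pattern.

```r
# Hypothetical helpers, not part of base R. They assume i holds positive
# integers and v is numeric with no missing values.

# Count per group: sum a vector of ones with rowsum().
igroupCounts <- function(i) rowsum(rep(1, length(i)), i)[, 1]

# Mean per group: group sum divided by group count. Both come from
# rowsum() with the same grouping, so the rows line up.
igroupMeans <- function(v, i) rowsum(v, i)[, 1] / igroupCounts(i)

# Min per group: one pass, keeping a running minimum for each group.
# Groups in 1..max(i) that never occur are left as Inf.
igroupMins <- function(v, i) {
  mins <- rep(Inf, max(i))
  for (j in seq_along(v)) {
    if (v[[j]] < mins[[i[[j]]]]) mins[[i[[j]]]] <- v[[j]]
  }
  mins
}

# Made-up example data.
v <- c(1.5, 2.0, 3.5, 4.0, 5.0)
i <- c(1L, 2L, 1L, 3L, 2L)

counts <- igroupCounts(i)
means  <- igroupMeans(v, i)
mins   <- igroupMins(v, i)
```

A max version would mirror igroupMins() with -Inf as the starting value and the comparison reversed, and range is just the pair of the two.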


Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle

Received on Tue Aug 01 00:23:31 2006
