Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Mon 31 Jul 2006 - 14:19:01 GMT

On Sat, 29 Jul 2006, Kevin B. Hendricks wrote:

> Hi Bill,
>
>>>> sum : igroupSums
>
> Okay, after thinking about this ...
>
> # assumes i is a small-integer grouping vector with n levels (codes 1..n)
> # v is some long vector
> # no sorting required
>
> igroupSums <- function(v, i) {
>     sums <- rep(0, max(i))           # one accumulator per group code 1..max(i)
>     for (j in seq_along(v)) {        # single pass over v
>         sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]   # add v[j] to its group's total
>     }
>     sums
> }
>
> If written in Fortran or C, this might be faster than using split(). At
> least it is linear in time in the length of the vector v.
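A quick way to sanity-check the loop is to compare it with the split() approach it is meant to replace. This is a minimal sketch that assumes integer group codes 1..max(i), each of which occurs at least once (split() only keeps codes that actually appear, so with a missing code the two results would differ in length). The data values are made up for illustration.

```r
# Loop-based group sum from the message above, repeated so this block runs on its own
igroupSums <- function(v, i) {
  sums <- rep(0, max(i))                  # one accumulator per group code 1..max(i)
  for (j in seq_along(v)) {
    sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
  }
  sums
}

# Illustrative data: five values spread over groups 1, 2 and 3
v <- c(1.5, 2, 3, 4, 10)
i <- c(1L, 2L, 1L, 3L, 2L)

loop_result  <- igroupSums(v, i)          # unnamed numeric vector, one entry per group
split_result <- sapply(split(v, i), sum)  # same sums, named "1", "2", "3"

all.equal(loop_result, unname(split_result))
```

Both give group totals of 4.5, 12 and 4 here; the loop visits each element once, while split() first builds a list of per-group subvectors.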

For sums, you should look at rowsum(). It uses a hash table in C, and the last time I looked it was faster than using split(). It returns a vector of the same length as the input, but that would easily be fixed.
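A minimal sketch of calling rowsum() on the same kind of data, assuming base R's rowsum(x, group) with its default reorder = TRUE (groups sorted). In current R a vector argument is treated as a one-column matrix, so the result is a matrix with one row per distinct group; taking its first column gives a plain named vector. The data values are illustrative only.

```r
# Illustrative data: five values spread over groups 1, 2 and 3
v <- c(1.5, 2, 3, 4, 10)
i <- c(1L, 2L, 1L, 3L, 2L)

# Sum v within each distinct value of i; v is treated as a one-column matrix
m <- rowsum(v, i)

# Extract the single column as a named numeric vector (names are the group values)
sums <- m[, 1]
sums
```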

The same approach would work for min, max, range, count, mean, but not for arbitrary functions.
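The reason these statistics work is that each can be updated from a small per-group accumulator as every element is visited once, whereas an arbitrary function may need the whole group at once. A minimal sketch in plain R, written in the style of igroupSums(); the name igroupStats is hypothetical, and it assumes integer group codes 1..max(i) and no NA values in v (a code with no members ends up with count 0, min Inf, max -Inf and mean NaN).

```r
# Hypothetical helper (not part of base R): one pass over v, updating
# per-group count, sum, min and max; range is (min, max) and mean is sum / count.
igroupStats <- function(v, i) {
  k     <- max(i)           # groups are coded 1..k
  count <- integer(k)
  sums  <- numeric(k)
  mins  <- rep(Inf, k)      # starting values that any real observation will replace
  maxs  <- rep(-Inf, k)
  for (j in seq_along(v)) {
    g <- i[j]
    count[g] <- count[g] + 1L
    sums[g]  <- sums[g] + v[j]
    if (v[j] < mins[g]) mins[g] <- v[j]
    if (v[j] > maxs[g]) maxs[g] <- v[j]
  }
  data.frame(group = seq_len(k), count = count, sum = sums,
             min = mins, max = maxs, mean = sums / count)
}

# Illustrative data: five values spread over groups 1, 2 and 3
v <- c(1.5, 2, 3, 4, 10)
i <- c(1L, 2L, 1L, 3L, 2L)
res <- igroupStats(v, i)
res
```

A median, by contrast, cannot be maintained this way without keeping every value in the group, which is why the accumulator trick does not extend to arbitrary functions.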

         -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Tue Aug 01 00:23:31 2006