Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

From: Kevin B. Hendricks <kevin.hendricks_at_sympatico.ca>
Date: Sat 29 Jul 2006 - 04:32:21 GMT

Hi Bill,

>>> sum : igroupSums

Okay, after thinking about this ...

# assumes i is the small integer factor with n levels
# v is some long vector
# no sorting required

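igroupSums <- function(v, i) {
    # one accumulator per level; i must hold integer codes in 1..n
    sums <- rep(0, max(i))
    # single pass over v, adding each value to its group's accumulator
    for (j in seq_along(v)) {
        sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
    }
    sums
}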

If written in Fortran or C, this might be faster than using split. It is at least linear in time in the length of vector v. The approach could easily be made parallel across t threads by picking t starting points along v and running this routine on each piece. You could even do it without thread locking, either if the elements of "sums" can be updated atomically, or by giving each piece its own copy of "sums" and then doing a final addition.
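As a rough sketch of the "multiple copies of sums" idea, here is a plain R version that cuts v into contiguous pieces, keeps one private copy of the sums per piece, and adds the copies together at the end. It runs the pieces one after another, so it only illustrates the bookkeeping a threaded Fortran or C version would need, not any speedup. The names igroupSumsChunked and nchunks are made up for illustration, and i is assumed to hold integer codes in 1..n rather than an R factor object.

```r
# Sketch only: igroupSumsChunked and nchunks are hypothetical names.
# Assumes i holds integer codes in 1..n and has the same length as v.
igroupSumsChunked <- function(v, i, nchunks = 4L) {
    stopifnot(length(v) == length(i))
    ngroups <- if (length(i) > 0) max(i) else 0L
    # label each position of v with the contiguous piece it falls in
    piece <- ceiling(seq_along(v) / length(v) * nchunks)
    # one private copy of "sums" per piece (one column each)
    partial <- matrix(0, nrow = ngroups, ncol = nchunks)
    for (p in seq_len(nchunks)) {
        # in a threaded version, each p would be handled by its own thread
        for (j in which(piece == p)) {
            partial[i[[j]], p] <- partial[i[[j]], p] + v[[j]]
        }
    }
    # final addition across the private copies
    rowSums(partial)
}

v <- c(1.5, 2, 3, 4, 5.5, 6)
i <- c(1L, 3L, 1L, 2L, 3L, 3L)
chunked <- igroupSumsChunked(v, i, nchunks = 2L)   # group sums 4.5, 4, 13.5

# split-based baseline for comparison
viaSplit <- unname(vapply(split(v, i), sum, numeric(1)))
all.equal(chunked, viaSplit)
```

Because each piece writes only to its own column, the pieces never touch the same memory, which is what would make the lock-free variant possible; the cost is nchunks copies of the accumulator plus the final rowSums.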

I still think I am missing some obvious way to do this but ...

Am I thinking along the right lines?

Kevin



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat Jul 29 14:36:16 2006
