Re: [Rd] reshape() makes R run out of memory (PR#14121)

From: hadley wickham <h.wickham_at_gmail.com>
Date: Wed, 09 Dec 2009 18:10:45 -0600

> Yes. The culprit would seem to be interaction(), as in
>
>> x <- y <- z <- 1:999
>> i <- interaction(x,y,z, drop=TRUE)
> Error: cannot allocate vector of size 3.7 Gb
>
> which is happening due to the occurrence of three idvar variables. This
> works basically as interaction(x,y,z)[,drop=TRUE], i.e. it first creates a
> factor with 999^3 levels, and removes the empty levels afterward.
>
> In the absence of a better interaction(), you might try making your own
> single idvar as do.call("paste",tbl[,c("ID", "DATE1", "DATE2")]) or so.
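
To make that workaround concrete, here is a minimal sketch. The data frame below is made up for illustration; only the column names ID, DATE1 and DATE2 come from the quoted message, and the "|" separator is my own choice.

```r
# Hypothetical stand-in for `tbl`; the values are invented.
tbl <- data.frame(
  ID    = c(1, 1, 2, 2),
  DATE1 = c("2009-01-01", "2009-01-01", "2009-01-01", "2009-02-01"),
  DATE2 = c("2009-03-01", "2009-03-01", "2009-03-01", "2009-03-01"),
  value = c(10, 11, 12, 13),
  stringsAsFactors = FALSE
)

# Build one string key per row from the three id columns. An explicit
# separator keeps ("1", "23") and ("12", "3") from collapsing to the same key.
tbl$key <- do.call(paste, c(tbl[, c("ID", "DATE1", "DATE2")], sep = "|"))

# Rows 1 and 2 share all three id values, so they get the same key.
n_keys <- length(unique(tbl$key))
```

The resulting key column could then be passed as the single idvar, so that interaction() never sees three variables at once.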

There's also ninteraction() in the plyr package, which has been designed to generate a unique integer for each combination (while maintaining the original order of the data and any missing combinations) as efficiently as possible. It's much faster than interaction(..., drop = TRUE), and I hope it would be faster than paste() as well, since it works with integers rather than strings.
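
As a rough illustration of why an integer-based approach can avoid the blow-up, here is a base R sketch. It is not the plyr implementation: combo_id() is a hypothetical helper written for this message, and its numbering (by first appearance) is an assumption of the sketch rather than anything plyr promises.

```r
# combo_id() is a hypothetical helper, not part of plyr or base R.
# It returns one integer per row identifying the combination of its arguments,
# without ever enumerating the full cross product of levels.
combo_id <- function(...) {
  vars <- list(...)
  # Encode each variable as integer codes 1..k, in order of first appearance
  # (match() treats NA as matching NA, so NA gets a code of its own).
  codes <- lapply(vars, function(v) match(v, unique(v)))
  id <- codes[[1]]
  for (code in codes[-1]) {
    # Combine the running id with the next variable's code. `id - 1` is a
    # double, so this arithmetic does not overflow integer range.
    pair <- (id - 1) * max(code) + code
    # Re-compress to 1..m so ids never exceed the number of observed rows.
    id <- match(pair, unique(pair))
  }
  id
}

# The example from the quoted message: three variables with 999 values each,
# but only 999 combinations actually observed.
x <- y <- z <- 1:999
id <- combo_id(x, y, z)
```

Because the ids are re-compressed after each variable, the work grows with the number of rows rather than with the product of the numbers of levels, which is the property that matters here.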

Hadley

-- 
http://had.co.nz/

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 10 Dec 2009 - 00:18:20 GMT
