[R] comparing reshape's

From: ivo welch <ivowel_at_gmail.com>
Date: Fri, 11 Jun 2010 16:36:01 -0400


I thought I would share the following.

System: Mac Pro 2.26GHz, OSX, 8GB of memory (not a constraint), R 2.11.0, 64bit version.

Task: I have a data set in long format: 2.2 million observations (factor xid, factor yid, variable zcontent), which I want to map into a sparse matrix of 948 columns and 16,350 rows. There are two commonly used functions to accomplish this:

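   library(stats)  # attached by default; shown only to make clear this is base R's reshape()
   # timevar, idvar and direction are arguments to reshape(), not to subset()
   outcome = reshape( subset(mydataframe, select=c(yid,xid,zcontent)), timevar="yid", idvar="xid", direction="wide" )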

takes about 9,600 seconds.
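
The wide layout that reshape() produces is easier to see on a toy example. This is a minimal sketch with made-up data; only the column names (xid, yid, zcontent) come from the post, and the object names toy and wide are purely illustrative.

```r
# Toy long-format data: column names mirror the post, values are invented.
toy <- data.frame(
  xid      = factor(c("a", "a", "b", "b", "c")),
  yid      = factor(c("y1", "y2", "y1", "y2", "y1")),
  zcontent = c(1.5, 2.5, 3.5, 4.5, 5.5)
)

# One row per xid, one zcontent column per level of yid.
# Combinations missing from the long data (here: xid "c", yid "y2") become NA.
wide <- reshape(toy, timevar = "yid", idvar = "xid", direction = "wide")

print(wide)
```

The value columns are named by pasting the measured variable and the yid level together, so the toy result has columns zcontent.y1 and zcontent.y2 next to xid.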

   library(reshape)
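   # id belongs to melt(), not subset(); the subset() call also needs its closing parenthesis
   melted = melt( subset(mydataframe, select=c(yid,xid,zcontent)), id=c("xid", "yid") )

   # cast() takes the melted data frame as its first argument
   outcome = cast( melted, xid ~ yid )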

takes about 875 seconds.

So, for large reshape jobs from long to wide, the reshape package is much faster than base R's reshape(): about 875 versus 9,600 seconds here, roughly an 11-fold difference. YMMV.
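
For anyone who wants to check this on their own machine, the comparison can be sketched with system.time() on synthetic data. This is only an illustration: the sizes n_x and n_y, the random data, and all object names are assumptions and are much smaller than the 2.2 million rows described above, so the timings will not match the figures in this post. The reshape package is used only if it is installed.

```r
set.seed(1)

# Hypothetical sizes, far smaller than the data set in the post.
n_x <- 200
n_y <- 50

# Build every (xid, yid) pair, then keep a random half so the wide result has gaps.
long <- expand.grid(xid = factor(seq_len(n_x)), yid = factor(seq_len(n_y)))
long <- long[sample(nrow(long), nrow(long) %/% 2), ]
long$zcontent <- rnorm(nrow(long))

# Base R: stats::reshape()
t_base <- system.time(
  wide_base <- reshape(long, timevar = "yid", idvar = "xid", direction = "wide")
)["elapsed"]
cat("stats::reshape():", t_base, "seconds\n")

# reshape package: melt() followed by cast(), skipped if the package is missing.
if (requireNamespace("reshape", quietly = TRUE)) {
  t_pkg <- system.time({
    melted   <- reshape::melt(long, id.vars = c("xid", "yid"))
    wide_pkg <- reshape::cast(melted, xid ~ yid)
  })["elapsed"]
  cat("reshape melt() + cast():", t_pkg, "seconds\n")
} else {
  cat("reshape package not installed; skipping melt()/cast() timing\n")
}
```

Increasing n_x and n_y toward the dimensions in the post (16,350 by 948) should show how the gap between the two approaches changes with size.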

/iaw



Ivo Welch (ivo.welch_at_brown.edu, ivo.welch_at_gmail.com)

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 11 Jun 2010 - 20:37:52 GMT