[Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

From: Antonio Piccolboni <antonio_at_piccolboni.info>
Date: Mon, 30 Apr 2012 16:28:04 -0700


Hi,
I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in

system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep =""))})
  user system elapsed
  0.004 0.000 0.004

and then I try to split it

> system.time(split(fd, 1:nrow(fd)))

   user system elapsed
  0.333 0.031 0.415

You will be quick to notice the roughly two orders of magnitude difference in time between creation and conversion. Granted, it's not written anywhere that they should be similar, but the latter seems interpreter-slow to me (split is implemented with an lapply in the data frame case). There is also a memory issue when I hit about 20000 elements (about 3GB had been allocated when I interrupted it). So before I resort to Rcpp, despite the electrifying feeling of approaching the bare metal, and for the sake of getting things done, I thought I would ask the experts. Thanks
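
For illustration, here is a minimal sketch of one way to avoid the per-row data frame subsetting that split does. It assumes each row may be returned as a plain named list rather than as a one-row data frame, which may or may not suit the use case. rows_as_lists is a hypothetical helper name, not a base R function, and no timings are claimed for it here.

```r
# Hypothetical helper (not part of base R): convert a data frame into a
# list with one element per row, each row being a plain named list.
# Assumption: a one-row data frame per element is not required.
rows_as_lists <- function(df) {
  # Map() walks all columns in parallel and builds one list per row,
  # so `[.data.frame` is never called once per row.
  # Caveat: a column named "f" would clash with Map()'s own argument.
  rows <- do.call(Map, c(list(f = function(...) list(...)), df))
  # Map() names the result after the first column when that column is
  # character or has names; drop those names for a consistent result.
  unname(rows)
}

# Small example data, built the same way as in the message above.
fd_small <- data.frame(x = 1:5,
                       y = c(0.5, 1.5, 2.5, 3.5, 4.5),
                       id = paste("x", 1:5, sep = ""),
                       stringsAsFactors = FALSE)
rows <- rows_as_lists(fd_small)
str(rows[[3]])
```

Timing this against split(fd, 1:nrow(fd)) with system.time on the 2000-row data frame would show whether it helps in practice.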

Antonio




R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Tue 01 May 2012 - 00:11:12 GMT
