Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue, 01 May 2012 13:46:50 +0100

On 01/05/2012 00:28, Antonio Piccolboni wrote:
> Hi,
> I was wondering if there is anything more efficient than split to do the
> kind of conversion in the subject. If I create a data frame as in
>
> system.time({fd = data.frame(x = 1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep = ""))})
>    user  system elapsed
>   0.004   0.000   0.004
>
> and then I try to split it
>
>> system.time(split(fd, 1:nrow(fd)))
>    user  system elapsed
>   0.333   0.031   0.415
>
>
> You will be quick to notice the roughly two orders of magnitude difference
> in time between creation and conversion. Granted, it's not written anywhere

Unsurprising when you create three orders of magnitude more data frames, is it? That's a list of 2000 data frames. Try

system.time(for(i in 1:2000) data.frame(x = i, y = rnorm(1), id = paste0("x", i)))
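To make the point concrete, here is a small R sketch (an editorial addition, not from either poster). It rebuilds the example data frame, confirms that split(fd, 1:nrow(fd)) returns 2000 one-row data frames, and spells out the lapply-based approach that the original post attributes to split() for data frames. The set.seed() call and stringsAsFactors = FALSE are assumptions added for reproducibility, and so that id stays character in older R versions.

```r
# Rebuild the example data; set.seed() and stringsAsFactors = FALSE are
# editorial additions (reproducibility; 'id' stays character in old R).
set.seed(1)
fd <- data.frame(x = 1:2000, y = rnorm(2000), id = paste0("x", 1:2000),
                 stringsAsFactors = FALSE)

s <- split(fd, 1:nrow(fd))
length(s)        # 2000: one list element per row
class(s[[1]])    # "data.frame": each element is itself a data frame ...
nrow(s[[1]])     # 1: ... holding a single row

# Roughly what split() does for a data frame: split the row indices,
# then subset the data frame once per group, building a new data frame
# (with its own names, row.names and class attributes) every time.
s2 <- lapply(split(seq_len(nrow(fd)), 1:nrow(fd)),
             function(ind) fd[ind, , drop = FALSE])
all.equal(s, s2)
```

Every one of those 2000 `[.data.frame` calls goes through the full data-frame subsetting machinery. That is plausibly where most of the time goes, and it is the same cost the per-row loop above exposes.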

> that they should be similar, but the latter seems interpreter-slow to me
> (split is implemented with lapply in the data frame case). There is also a
> memory issue when I hit about 20000 elements (allocating 3GB when
> interrupted). So before I resort to Rcpp, despite the electrifying feeling
> of approaching the bare metal, and for the sake of getting things done, I
> thought I would ask the experts. Thanks.

You need to re-think your data structures: 1-row data frames are not sensible.
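One way to act on this advice is sketched below. This is an editorial illustration, not something Ripley prescribed. The idea is to keep the data column-wise and use vectorised operations. If downstream code genuinely needs one record per row, it builds lightweight plain named lists instead of 1-row data frames. The derived column z and the object name rows are hypothetical. The do.call(Map, c(f = list, fd)) idiom walks all columns in parallel.

```r
set.seed(1)
fd <- data.frame(x = 1:2000, y = rnorm(2000), id = paste0("x", 1:2000),
                 stringsAsFactors = FALSE)

# Preferred: stay column-wise and operate on whole vectors at once.
fd$z <- fd$x * fd$y

# If per-row records are really needed, make each one a plain named list.
# do.call(Map, c(f = list, fd)) expands to
#   Map(list, x = fd$x, y = fd$y, id = fd$id, z = fd$z)
# so each call to list() receives one element from every column.
rows <- do.call(Map, c(f = list, fd))

rows[[5]]$id     # "x5"

# Timing for comparison with split(fd, 1:nrow(fd)) on your own machine:
system.time(do.call(Map, c(f = list, fd)))
```

Each element here is a short list without row names or a data.frame class, so it should be much cheaper to create than a 1-row data frame. The actual speed and memory savings will depend on the R version and the machine.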

>
>
> Antonio
>
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Tue 01 May 2012 - 12:49:33 GMT

