[Rd] Lightweight data frame class

From: Vadim Ogranovich <vograno_at_evafunds.com>
Date: Fri 26 Nov 2004 - 10:31:07 EST


Hi,  

As far as I can tell, the data.frame class adds two features to those of lists:
* matrix-like structure via the [,] and [,]<- operators (well, I know these are actually "["(i, j, ...), not "[,]").
* a row names attribute.
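The two points above can be checked directly at the console. The following is only a small illustration; the data frame `df` and its columns are made up for the example and are not from the timings in this post.

```r
# A data frame is stored as a list of equal-length columns
# plus a handful of attributes.
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

is.list(df)                 # the underlying storage is a list
names(attributes(df))       # includes "names", "class" and "row.names"

# Removing the class and the row names leaves an ordinary named list.
plain <- unclass(df)
attr(plain, "row.names") <- NULL
is.list(plain) && is.null(attr(plain, "row.names"))

# Matrix-style indexing is what the data.frame class adds on top.
df[2, "b"]
```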

It seems that the overhead of supporting row names, both computational and RAM-wise, is non-trivial. I frequently subscript data frames, i.e. use [,] on them, and my timings show that the equivalent list operation is about 7 times faster (see below).

On the other hand, at least in my usage pattern, I rarely benefit from the row names attribute, so as far as I am concerned row names are just overhead. (Of course the speed difference may be due to other factors; the only thing I can tell is that subscripting is much slower for data frames than for lists.)

I thought of writing a new class, say lightweight.data.frame, that would be polymorphic with the existing data.frame class. The class would inherit from "list" and implement the [,] and [,]<- operators. It would also implement the "rownames" function, which would return seq(nrow(x)), etc. It should also implement as.data.frame to avoid the overhead of conversion to a full-blown data.frame in calls like lm(y ~ x, data=myLightweightDataframe).
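A minimal sketch of what such a class might look like as an S3 class is given below. Everything in it is an assumption made for illustration: the short class name `lwdf` (standing in for lightweight.data.frame), the constructor, and the choice to generate row names on demand rather than store them. It covers only extraction, not `[<-` or `drop`, and it is not a drop-in replacement for data.frame.

```r
# Hypothetical constructor: a list of equal-length columns with a class tag.
lwdf <- function(...) {
  cols <- list(...)
  if (length(unique(lengths(cols))) > 1L)
    stop("all columns must have the same length")
  structure(cols, class = c("lwdf", "list"))
}

# dim() method, so that nrow() and ncol() work.
dim.lwdf <- function(x) {
  cols <- unclass(x)
  c(if (length(cols)) length(cols[[1L]]) else 0L, length(cols))
}

# Row names are computed on demand instead of being stored.
row.names.lwdf <- function(x) as.character(seq_len(nrow(x)))
dimnames.lwdf <- function(x) list(row.names(x), names(x))

# x[i, j]: subset columns by j, then rows by i. No row names to maintain.
"[.lwdf" <- function(x, i, j, ...) {
  if (nargs() < 3L) {
    # list-style x[j]: select columns only
    return(structure(unclass(x)[i], class = class(x)))
  }
  cols <- unclass(x)
  if (!missing(j)) cols <- cols[j]
  if (!missing(i)) cols <- lapply(cols, function(col) col[i])
  structure(cols, class = class(x))
}

# Conversion attaches the attributes directly, using compact row names.
as.data.frame.lwdf <- function(x, ...) {
  n <- nrow(x)
  cols <- unclass(x)
  attr(cols, "row.names") <- .set_row_names(n)
  class(cols) <- "data.frame"
  cols
}

x <- lwdf(a = 1:5, b = c(10, 20, 30, 40, 50))
sub <- x[-1, ]            # drop the first row
dim(sub)                  # 4 2
sub$b                     # 20 30 40 50
rownames(sub)             # generated, not stored
d <- as.data.frame(x)     # a regular data.frame when one is needed
```

Whether this recovers the speed of the plain list operation would have to be measured; method dispatch and the extra `structure()` call add some cost of their own.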

Has anyone thought of this? Can you see any potential problems?  

Thanks,
Vadim      

P.S. These are the timing results comparing data.frame operations to those of lists:

# make a 1e6 * 5 list
> system.time(x <- lapply(seq(5), function(x) rnorm(1e6)))
[1] 4.46 0.10 4.57 0.00 0.00
# convert it to a data.frame
> system.time(y <- as.data.frame(x))
[1] 49.17 1.25 50.61 0.00 0.00
# do the equivalent of y[-1,] on the list
> i <- seq(2, nrow(y)); system.time(x.sub <- lapply(x, function(x) x[i]))
[1] 0.19 0.15 0.35 0.00 0.00
# do the equivalent of y[-1,] on the data.frame
> i <- seq(2, nrow(y)); system.time(y.sub <- y[i,])
[1] 2.08 0.56 2.64 0.00 0.00
> 2.64/0.35
[1] 7.542857
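For anyone who wants to rerun the comparison, here is a self-contained version of the same steps as a script. It is a sketch with two assumptions that differ from the transcript above: a smaller size `n` so that it finishes quickly, and explicit column names `V1`..`V5`. The timings, and therefore the ratio, will vary by machine and R version, so no expected numbers are given.

```r
n <- 1e5  # assumption: smaller than the 1e6 used in the post

# Build a list of 5 numeric columns and the corresponding data frame.
x <- lapply(seq_len(5), function(k) rnorm(n))
names(x) <- paste0("V", seq_along(x))
y <- as.data.frame(x)

# Drop the first row both ways and time each approach.
i <- seq(2, nrow(y))
t.list <- system.time(x.sub <- lapply(x, function(col) col[i]))
t.df   <- system.time(y.sub <- y[i, ])

# Check that both approaches selected the same values.
same <- isTRUE(all.equal(unname(x.sub), unname(as.list(y.sub)),
                         check.attributes = FALSE))

# Ratio of elapsed times (data frame / list).
ratio <- t.df[["elapsed"]] / t.list[["elapsed"]]
```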



R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri Nov 26 10:38:33 2004
