Re: [Rd] [R] data.frame() size

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Fri 09 Dec 2005 - 17:37:30 GMT

There was nothing attached in the copy that came through to me.

By the way, there was some discussion earlier this year on a light-weight data.frame class but I don't think anyone ever posted any code.
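
For illustration only, and not the light-weight class from that earlier discussion: a minimal sketch, assuming a plain named list of equal-length vectors is an acceptable stand-in when only integer subscripts are needed. The helper name `take` and the size `n` are hypothetical.

```r
# Hypothetical illustration: columns of different types held in a plain list,
# which carries no row names attribute at all.
n <- 1000L
cols <- list(a = integer(n), b = character(n))

# Apply the same integer subscript to every column (hypothetical helper).
take <- function(x, i) lapply(x, function(col) col[i])

sub <- take(cols, 1:3)
```

A list like this loses the printing, subsetting and modelling methods that data.frame provides, so it is a workaround rather than a replacement.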

On 12/9/05, Matthew Dowle <mdowle@concordiafunds.com> wrote:
>
> Hi,
>
> Please see below for post on r-help regarding data.frame() and the
> possibility of dropping rownames, for space and time reasons.
> I've made some changes, attached, and it seems to be working well. I see the
> expected space (90% saved) and time (10 times faster) savings. There are no
> doubt some bugs, and it needs more work and testing, but I thought I would post
> first at this stage.
>
> Could some changes along these lines be made to R ? I'm happy to help with
> testing and further work if required. In the meantime I can work with
> overloaded functions which fixes the problems in my case.
>
> Functions affected:
>
> dim.data.frame
> format.data.frame
> print.data.frame
> data.frame
> [.data.frame
> as.matrix.data.frame
>
> Modified source code attached.
>
> Regards,
> Matthew
>
>
> -----Original Message-----
> From: Matthew Dowle
> Sent: 09 December 2005 09:44
> To: 'Peter Dalgaard'
> Cc: 'r-help@stat.math.ethz.ch'
> Subject: RE: [R] data.frame() size
>
>
>
> That explains it. Thanks. I don't need rownames though, as I'll only ever
> use integer subscripts. Is there any way to drop them, or even better not
> create them in the first place? The memory saved (90%) by not having them
> and 10 times speed up would be very useful. I think I need a data.frame
> rather than a matrix because I have columns of different types in real life.
>
> > rownames(d) = NULL
> Error in "dimnames<-.data.frame"(`*tmp*`, value = list(NULL, c("a", "b" :
> invalid 'dimnames' given for data frame
>
>
> -----Original Message-----
> From: pd@pubhealth.ku.dk [mailto:pd@pubhealth.ku.dk] On Behalf Of Peter
> Dalgaard
> Sent: 08 December 2005 18:57
> To: Matthew Dowle
> Cc: 'r-help@stat.math.ethz.ch'
> Subject: Re: [R] data.frame() size
>
>
> Matthew Dowle <mdowle@concordiafunds.com> writes:
>
> > Hi,
> >
> > In the example below why is d 10 times bigger than m, according to
> > object.size ? It also takes around 10 times as long to create, which
> > fits with object.size() being truthful. gcinfo(TRUE) also indicates a
> > great deal more garbage collector activity caused by data.frame() than
> > matrix().
> >
> > $ R --vanilla
> > ....
> > > nr = 1000000
> > > system.time(m<<-matrix(integer(1), nrow=nr, ncol=2))
> > [1] 0.22 0.01 0.23 0.00 0.00
> > > system.time(d<<-data.frame(a=integer(nr), b=integer(nr)))
> > [1] 2.81 0.20 3.01 0.00 0.00 # 10 times longer
> >
> > > dim(m)
> > [1] 1000000 2
> > > dim(d)
> > [1] 1000000 2 # same dimensions
> >
> > > storage.mode(m)
> > [1] "integer"
> > > sapply(d, storage.mode)
> > a b
> > "integer" "integer" # same storage.mode
> >
> > > object.size(m)/1024^2
> > [1] 7.629616
> > > object.size(d)/1024^2
> > [1] 76.29482 # but 10 times bigger
> >
> > > sum(sapply(d, object.size))/1024^2
> > [1] 7.629501 # or is it? If it's not
> > really 10 times bigger, why 10 times longer above?
>
> Row names!!
>
>
> > r <- as.character(1:1e6)
> > object.size(r)
> [1] 72000056
> > object.size(r)/1024^2
> [1] 68.6646
>
> 'nuff said?
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
>
>
>
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
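
The row-name explanation quoted above can be checked by measuring the two parts separately. This is a rough sketch, assuming a smaller `n` than the thread's 1e6 so that it runs quickly; the variable names are illustrative and the exact byte counts depend on the R version and platform.

```r
# Measure the column data and a character vector of row labels separately.
n <- 10000L                      # assumption: smaller than the 1e6 used in the thread
cols <- list(a = integer(n), b = integer(n))
r <- as.character(seq_len(n))    # labels "1", "2", ..., one string per row

col_bytes <- sum(sapply(cols, object.size))   # the two integer columns
rn_bytes  <- as.numeric(object.size(r))       # the character labels alone

# Ratio of label storage to column storage.
ratio <- rn_bytes / col_bytes
```

Each label is a separate string object, so the labels take considerably more memory than the integers they index.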



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat Dec 10 04:48:28 2005
