Re: [R] data.frame() size

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Fri 09 Dec 2005 - 05:57:29 EST

Matthew Dowle <mdowle@concordiafunds.com> writes:

> Hi,
>
> In the example below, why is d 10 times bigger than m, according to
> object.size()? It also takes around 10 times as long to create, which fits
> with object.size() being truthful. gcinfo(TRUE) also indicates a great deal
> more garbage collector activity caused by data.frame() than matrix().
>
> $ R --vanilla
> ....
> > nr = 1000000
> > system.time(m<<-matrix(integer(1), nrow=nr, ncol=2))
> [1] 0.22 0.01 0.23 0.00 0.00
> > system.time(d<<-data.frame(a=integer(nr), b=integer(nr)))
> [1] 2.81 0.20 3.01 0.00 0.00 # 10 times longer
>
> > dim(m)
> [1] 1000000 2
> > dim(d)
> [1] 1000000 2 # same dimensions
>
> > storage.mode(m)
> [1] "integer"
> > sapply(d, storage.mode)
>         a         b
> "integer" "integer" # same storage.mode
>
> > object.size(m)/1024^2
> [1] 7.629616
> > object.size(d)/1024^2
> [1] 76.29482 # but 10 times bigger
>
> > sum(sapply(d, object.size))/1024^2
> [1] 7.629501 # or is it? If it's not really 10 times bigger,
>              # why 10 times longer above?

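Row names!! data.frame() gives d a character row name for every row,
"1" through "1000000", and sum(sapply(d, object.size)) only adds up
the two columns, so it never counts them.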

> r <- as.character(1:1e6)
> object.size(r)
[1] 72000056
> object.size(r)/1024^2
[1] 68.6646
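A quick check using only the figures quoted in this thread (sizes in MB of 1024^2 bytes; attributing the small leftover of about 0.0007 MB to d's names and class attributes is an assumption): the two columns plus the row-name vector add up to almost exactly object.size(d), and a row name of about 72 bytes next to 2 x 4 bytes of integer data per row gives the factor of 10.

```latex
\begin{aligned}
\text{columns} + \text{row names} &= 7.629501 + 68.6646 = 76.294101
  \approx 76.29482 = \texttt{object.size(d)}/1024^2 \\
\text{bytes per row name} &= 72\,000\,056 / 10^6 \approx 72,
  \qquad \text{data bytes per row} = 2 \times 4 = 8 \\
\frac{\text{bytes per row of } d}{\text{bytes per row of } m}
  &\approx \frac{8 + 72}{8} = 10
\end{aligned}
```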

'nuff said?

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Dec 09 06:19:56 2005
