Re: [Rd] Severe memory problem using split()

From: cstrato <cstrato_at_aon.at>
Date: Tue, 13 Jul 2010 00:00:14 +0200

Dear Martin,

Thank you, you are right, now I get:

 > ann <- read.delim("Hu6800_ann.txt", stringsAsFactors=FALSE)
 > object.size(ann)
2035952 bytes
 > u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
 > object.size(u2p)
1207368 bytes
 > object.size(unlist(u2p))
865176 bytes

Nevertheless, 1.2MB for a list built from just 2 of the 11 columns of a 754KB table still seems rather large?
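Part of that remaining size is structural overhead rather than string data: each element of a list is a complete R vector with its own header (roughly 56 bytes on a 64-bit build), plus the list's names attribute. A minimal sketch with synthetic data (not the Hu6800 annotation):

```r
## Sketch: a list of many short vectors carries one vector header per
## element plus a names attribute, so it is noticeably larger than a
## single vector holding the same strings.
x <- as.character(1:7129)            # ~7000 short synthetic strings
one_vec  <- x                        # one long character vector
many_vec <- split(x, seq_along(x))   # 7129 one-element vectors in a list
object.size(one_vec)                 # a few hundred KB
object.size(many_vec)                # larger: per-element headers + names
```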

Best regards
Christian

On 7/12/10 11:44 PM, Martin Morgan wrote:
> On 07/12/2010 01:45 PM, cstrato wrote:
>> Dear all,
>>
>> With great interest I followed the discussion:
>>
https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html
>> since I am currently facing a similar problem:
>>
>> In a new R session (using xterm) I am importing a simple table
>> "Hu6800_ann.txt" which has a size of 754KB only:
>>
>>> ann <- read.delim("Hu6800_ann.txt")
>>> dim(ann)
>> [1] 7129 11
>>
>>
>> When I call "object.size(ann)" the estimated memory used to store "ann"
>> is already 2MB:
>>
>>> object.size(ann)
>> 2034784 bytes
>>
>>
>> Now I call "split()" and check the estimated memory used, which turns out
>> to be 3.3GB:
>>
>>> u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
>>> object.size(u2p)
>> 3323768120 bytes
>
> I guess things improve with stringsAsFactors=FALSE in read.delim?
>
> Martin
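[Editor's note: Martin's guess points at the likely cause. With the default stringsAsFactors=TRUE, the split column is a factor, and every group returned by split() is itself a factor carrying the complete levels vector of the original column, so the reported size grows roughly quadratically. A hedged sketch with made-up probe IDs, not the actual annotation file:]

```r
## Sketch: splitting a factor yields one factor per group, and each of
## those factors keeps the full levels vector of the original column.
## object.size() does not detect this sharing, so the total balloons.
f <- factor(sprintf("probe%04d", 1:2000))  # hypothetical probe IDs
g <- seq_along(f)                          # one group per element
object.size(split(f, g))                   # every piece holds all 2000 levels
object.size(split(as.character(f), g))     # plain characters: far smaller
```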
>
>>
>> During the R session I am running "top" in another xterm and can see
>> that the memory usage of R increases to about 550MB RSIZE.
>>
>>
>> Now I do:
>>
>>> object.size(unlist(u2p))
>> 894056 bytes
>>
>> It takes about 3 minutes to complete this call and the memory usage of R
>> increases to about 1.3GB RSIZE. Furthermore, during evaluation of this
>> function the free RAM of my Mac decreases to less than 8MB free PhysMem,
>> until it needs to swap memory. When finished, free PhysMem is 734MB but
>> the size of R increased to 577MB RSIZE.
>>
>> Doing "split(ann[,"ProbesetID"], ann[,"UNIT_ID"], drop=TRUE)" did not
>> change the reported object.size(); it only ran faster and used less
>> memory on my Mac.
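[Editor's note: drop=TRUE in split() only removes groups corresponding to unused levels of the *grouping* factor; it does not thin the levels stored inside each returned group, which may be why the size was unchanged. A tiny sketch:]

```r
## drop = TRUE drops empty groups for unused levels of the grouping
## factor; it does not affect the values stored inside each group.
g <- factor(c("a", "b"), levels = c("a", "b", "c"))
length(split(1:2, g))                # 3 groups: "c" is empty
length(split(1:2, g, drop = TRUE))   # 2 groups
```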
>>
>> Do you have any idea what the reason for this behavior is?
>> Why is the size of list "u2p" so large?
>> Do I make any mistake?
>>
>>
>> Here is my sessionInfo on a MacBook Pro with 2GB RAM:
>>
>>> sessionInfo()
>> R version 2.11.1 (2010-05-31)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> Best regards
>> Christian
>> _._._._._._._._._._._._._._._._._._
>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>> V.i.e.n.n.a A.u.s.t.r.i.a
>> e.m.a.i.l: cstrato at aon.at
>> _._._._._._._._._._._._._._._._._._
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>



Received on Mon 12 Jul 2010 - 22:02:19 GMT
