Re: [Rd] Severe memory problem using split()

From: Martin Morgan <mtmorgan_at_fhcrc.org>
Date: Mon, 12 Jul 2010 14:44:00 -0700

On 07/12/2010 01:45 PM, cstrato wrote:
> Dear all,
>
> With great interest I followed the discussion:
> https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html
> since I currently have a similar problem:
>
> In a new R session (using xterm) I am importing a simple table
> "Hu6800_ann.txt" which has a size of 754KB only:
>
>> ann <- read.delim("Hu6800_ann.txt")
>> dim(ann)
> [1] 7129 11
>
>
> When I call "object.size(ann)" the estimated memory used to store "ann"
> is already 2MB:
>
>> object.size(ann)
> 2034784 bytes
>
>
> Now I call "split()" and check the estimated memory used which turns out
> to be 3.3GB:
>
>> u2p <- split(ann[,"ProbesetID"],ann[,"UNIT_ID"])
>> object.size(u2p)
> 3323768120 bytes

I guess things improve with stringsAsFactors=FALSE in read.delim?
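
A minimal sketch of why that might help, assuming the ProbesetID column was read in as a factor (the read.delim default in this R version): split() keeps the full set of levels on every list element, and object.size() counts those levels once per element. The data below are synthetic stand-ins, not the poster's file; n, ids and units are illustrative names only.

```r
# Synthetic stand-in for the table; the real "Hu6800_ann.txt" is not used here.
n <- 1000                                     # illustrative size (assumption)
ids <- sprintf("probe_%04d_at", seq_len(n))   # made-up probeset IDs
units <- seq_len(n)                           # made-up unit IDs, one probeset per unit

# Factor input, as with stringsAsFactors = TRUE.
by_factor <- split(factor(ids), units)
# Character input, as with stringsAsFactors = FALSE.
by_char <- split(ids, units)

# Every element of by_factor is a one-value factor that still carries all n levels.
n_levels <- length(levels(by_factor[[1]]))

size_factor <- as.numeric(object.size(by_factor))
size_char <- as.numeric(object.size(by_char))

cat("levels per element:", n_levels, "\n")
cat("factor list:", size_factor, "bytes\n")
cat("character list:", size_char, "bytes\n")
```

On an existing data frame the same effect should be available without re-reading the file, e.g. split(as.character(ann[,"ProbesetID"]), ann[,"UNIT_ID"]). Note that drop=TRUE only drops unused levels of the grouping factor, not of the values being split, which would be consistent with the unchanged object.size.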

Martin

>
> During the R session I am running "top" in another xterm and can see
> that the memory usage of R increases to about 550MB RSIZE.
>
>
> Now I do:
>
>> object.size(unlist(u2p))
> 894056 bytes
>
> It takes about 3 minutes to complete this call and the memory usage of R
> increases to about 1.3GB RSIZE. Furthermore, during evaluation of this
> function the free RAM of my Mac decreases to less than 8MB free PhysMem,
> until it needs to swap memory. When finished, free PhysMem is 734MB but
> the size of R increased to 577MB RSIZE.
>
> Doing "split(ann[,"ProbesetID"],ann[,"UNIT_ID"],drop=TRUE)" did not
> change the object.size; it only made processing faster and used less
> memory on my Mac.
>
> Do you have any idea what the reason for this behavior is?
> Why is the size of list "u2p" so large?
> Am I making a mistake?
>
>
> Here is my sessionInfo on a MacBook Pro with 2GB RAM:
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> Best regards
> Christian
> _._._._._._._._._._._._._._._._._._
> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
> V.i.e.n.n.a A.u.s.t.r.i.a
> e.m.a.i.l: cstrato at aon.at
> _._._._._._._._._._._._._._._._._._
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

Received on Mon 12 Jul 2010 - 21:45:50 GMT
