[Rd] Severe memory problem using split()

From: cstrato <cstrato_at_aon.at>
Date: Mon, 12 Jul 2010 22:45:43 +0200

Dear all,

I followed the discussion at https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html with great interest, since I currently have a similar problem:

In a new R session (using xterm) I am importing a simple table "Hu6800_ann.txt", which is only 754KB in size:

> ann <- read.delim("Hu6800_ann.txt")
> dim(ann)

[1] 7129 11

When I call "object.size(ann)" the estimated memory used to store "ann" is already 2MB:

> object.size(ann)

2034784 bytes

Now I call "split()" and check the estimated memory used, which turns out to be 3.3GB:

> u2p <- split(ann[,"ProbesetID"],ann[,"UNIT_ID"])
> object.size(u2p)

3323768120 bytes
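
One possible explanation, stated here as an assumption rather than something established in this message: in R 2.11 "read.delim()" converts string columns to factors by default, and splitting a factor gives a list of factors that each keep the complete set of levels. "object.size()" does not detect sharing between list elements, so it would count the levels once per element. The sketch below uses synthetic data (the table size and ID strings are made up for illustration) to show the effect and to compare it with splitting the character values instead:

```r
# Synthetic stand-in for the annotation table; n and the ID strings are
# illustrative assumptions, not values from "Hu6800_ann.txt".
n <- 1000
ann <- data.frame(
  ProbesetID = factor(sprintf("probe_%05d_at", seq_len(n))),
  UNIT_ID    = factor(seq_len(n))
)

# Splitting a factor returns a list of factors; every element keeps
# all n levels even though it holds a single value here.
u2p <- split(ann[, "ProbesetID"], ann[, "UNIT_ID"])
length(u2p[[1]])
nlevels(u2p[[1]])

# Splitting the character values avoids the per-element levels attribute.
u2p_chr <- split(as.character(ann[, "ProbesetID"]), ann[, "UNIT_ID"])

# object.size() counts the levels separately for each list element,
# so the first estimate is far larger than the second.
object.size(u2p)
object.size(u2p_chr)
```

If this assumption holds for the real table, reading it with "stringsAsFactors=FALSE" or wrapping the column in "as.character()" before calling "split()" should give a much smaller estimate.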

During the R session I am running "top" in another xterm and can see that the memory usage of R increases to about 550MB RSIZE.

Now I do:

> object.size(unlist(u2p))

894056 bytes

This call takes about 3 minutes to complete, and the memory usage of R increases to about 1.3GB RSIZE. Furthermore, while the function is being evaluated, the free RAM of my Mac drops to less than 8MB free PhysMem, at which point it has to swap memory. When the call has finished, free PhysMem is 734MB, but the size of R has increased to 577MB RSIZE.

Doing "split(ann[,"ProbesetID"],ann[,"UNIT_ID"],drop=TRUE)" did not change the object.size; it only made processing faster and used less memory on my Mac.
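
For reference, the "drop" argument of "split()" applies to the grouping factor, not to the values being split: it removes groups that do not occur but leaves the levels of each returned factor untouched. That would be consistent with the unchanged object.size, assuming "ProbesetID" is a factor. A small sketch with made-up values:

```r
# Toy data; the values and group labels are purely illustrative.
vals <- factor(c("a", "b", "c"))
grp  <- factor(c("g1", "g1", "g2"), levels = c("g1", "g2", "g3"))

# Without drop, the empty group "g3" is kept as an empty element.
length(split(vals, grp))        # 3

# With drop = TRUE the empty group disappears ...
res <- split(vals, grp, drop = TRUE)
length(res)                     # 2

# ... but each element still carries all levels of vals.
nlevels(res[["g2"]])            # 3

# Re-creating each factor discards its unused levels.
res_small <- lapply(res, function(x) factor(x))
nlevels(res_small[["g2"]])      # 1
```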

Do you have any idea what the reason for this behavior is? Why is the size of the list "u2p" so large?
Am I making a mistake?

Here is my sessionInfo on a MacBook Pro with 2GB RAM:

> sessionInfo()

R version 2.11.1 (2010-05-31)

[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Best regards

C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a           A.u.s.t.r.i.a
e.m.a.i.l:        cstrato at aon.at

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Mon 12 Jul 2010 - 20:52:56 GMT

