Re: [R] large data set, error: cannot allocate vector

From: Robert Citek <rwcitek_at_alum.calberkeley.org>
Date: Wed 10 May 2006 - 04:22:23 EST

On May 8, 2006, at 9:47 AM, Thomas Lumley wrote:
> On Fri, 5 May 2006, Robert Citek wrote:
>> Reloading the 10 MM dataset:
>>
>> R > foo <- read.delim("dataset.010MM.txt")
>>
>> R > object.size(foo)
>> [1] 440000376
>>
>> R > gc()
>>            used  (Mb) gc trigger  (Mb) max used  (Mb)
>> Ncells 10183941 272.0   15023450 401.2 10194267 272.3
>> Vcells 20073146 153.2   53554505 408.6 50086180 382.2
>>
>> Combined, Ncells or Vcells appear to take up about 700 MB of RAM,
>> which is about 25% of the 3 GB available under Linux on 32-bit
>> architecture. Also, removing foo seemed to free up "used" memory,
>> but didn't change the "max used":
>
> No, that's what "max" means. You need gc(reset=TRUE) to reset the
> max.

Yup, that worked (see below). The example in ?gc wasn't that clear to me, so thanks for clarifying. I also found it informative to compare memory use when loading the data into a data.frame (read.delim) versus a vector (scan).
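Thomas's point suggests a simple way to measure the peak memory of a single step: reset the counters with gc(reset=TRUE), run the step, then read the "max used" column. A minimal sketch follows; peak_vcells_mb() is a hypothetical helper (not part of base R) that only looks at Vcells, and the figure it returns includes whatever the session was already using, so treat it as approximate.

```r
# Sketch: peak Vcell memory (in Mb) seen while evaluating one expression.
# peak_vcells_mb() is a hypothetical helper, not part of base R.
peak_vcells_mb <- function(expr) {
  invisible(gc(reset = TRUE))  # reset "max used" to the current usage
  force(expr)                  # evaluate the lazily passed expression now
  g <- gc()                    # "max used" now covers the evaluation above
  g["Vcells", "max used"] * 8 / 2^20  # Vcells are 8 bytes each
}

# 1e6 doubles need about 7.6 Mb, so expect at least that plus the baseline
peak_vcells_mb(numeric(1e6))
```

Only Vcells are counted here; a step that creates many small objects (lists, strings) would also show up in the Ncells row, which this sketch ignores.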

$ cat <<eof | R -q --no-save
gc()
foo <- read.delim("dataset.010MM.txt")

gc()
rm(foo)
gc()
gc(reset=TRUE)

eof

R > gc()

          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 177865  4.8     407500 10.9   350000  9.4
Vcells  72114  0.6     786432  6.0   333941  2.6

R > foo <- read.delim("dataset.010MM.txt")

R > gc()

           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 10179849 271.9   15023450 401.2 10180159 271.9
Vcells 20072448 153.2   47764583 364.5 46849682 357.5

R > rm(foo)

R > gc()

         used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 179910  4.9   12018759 321.0 10181187 271.9
Vcells  72458  0.6   38211666 291.6 46849682 357.5

R > gc(reset=TRUE)

         used (Mb) gc trigger  (Mb) max used (Mb)
Ncells 179920  4.9    9615007 256.8   179920  4.9
Vcells  72482  0.6   30569332 233.3    72482  0.6
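The (Mb) columns follow from fixed cell sizes: each Vcell is 8 bytes, and each Ncell is 28 bytes on a 32-bit build like this one (usually 56 bytes on a 64-bit build). As a rough cross-check, a small sketch using a hypothetical cells_to_mb() helper and the counts from the transcript shows that the growth in used cells after read.delim() adds up to roughly the object.size(foo) figure reported earlier.

```r
# Hypothetical helper: convert gc() cell counts to Mb (2^20 bytes).
# Vcells are 8 bytes; Ncells are 28 bytes on a 32-bit build (as here),
# usually 56 bytes on a 64-bit build.
cells_to_mb <- function(ncells = 0, vcells = 0, ncell_bytes = 28) {
  (ncells * ncell_bytes + vcells * 8) / 2^20
}

# Growth in "used" cells across read.delim(), from the gc() output above
df_mb <- cells_to_mb(ncells = 10179849 - 177865,
                     vcells = 20072448 - 72114)

round(df_mb, 1)             # 419.7
round(440000376 / 2^20, 1)  # object.size(foo) from the earlier run: 419.6
```

The two runs were separate sessions, so the numbers are not expected to match exactly; the point is that both the Ncells and the Vcells growth belong to foo.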

$ cat <<eof | R -q --no-save
gc()
foo <- scan("dataset.010MM.txt")

gc()
rm(foo)
gc()
gc(reset=TRUE)

eof

R > gc()

          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 177865  4.8     407500 10.9   350000  9.4
Vcells  72114  0.6     786432  6.0   333941  2.6

R > foo <- scan("dataset.010MM.txt")
Read 10000000 items

R > gc()

            used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   178230  4.8     407500  10.9   350000   9.4
Vcells 10072185 76.9 26713872 203.9 26456224 201.9

R > rm(foo)

R > gc()

          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 178286  4.8     407500  10.9   350000   9.4
Vcells  72190  0.6   21371097 163.1 26456224 201.9

R > gc(reset=TRUE)

          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells 178296  4.8     407500  10.9   178296  4.8
Vcells  72214  0.6   17096877 130.5    72214  0.6
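In the scan() run, Vcells grew by 10072185 - 72114 = 10000071, which at 8 bytes each is about 76.3 Mb, essentially the 10 million doubles themselves. To see how much of the difference comes from the data.frame wrapper versus how the column is stored, a small self-contained sketch on synthetic data (runif(), not the original dataset.010MM.txt) compares object.size() for a plain vector, a numeric data.frame column, and the same values stored as strings.

```r
set.seed(1)
n <- 1e6
v <- runif(n)                    # plain double vector, as scan() returns
df_num <- data.frame(x = v)      # same values, numeric column
df_chr <- data.frame(x = as.character(v),   # same values stored as strings
                     stringsAsFactors = FALSE)

print(object.size(v), units = "Mb")
print(object.size(df_num), units = "Mb")
print(object.size(df_chr), units = "Mb")
```

For a numeric column the data.frame itself adds only a few hundred bytes, while string storage costs several times more. Running str(foo) after read.delim() would show what type the column in dataset.010MM.txt actually ended up as.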

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software. Distribute FLOSS for Windows, Linux, *BSD, and MacOS X with BitTorrent



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed May 10 04:27:08 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 10 May 2006 - 06:10:04 EST.
