Re: [R] reading a big file

From: Charles C. Berry <cberry_at_tajo.ucsd.edu>
Date: Thu, 24 May 2007 11:30:53 -0700

On Thu, 24 May 2007, Christoph Scherber wrote:

> Dear Remigijus,
>
> You should change memory allocation in Windows XP, as described in
>
> http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021

Probably this will not solve the problem, as the object to be created will need 400 MB and scan() will require additional memory while creating that object. Not to mention that the OS will consume a chunk of the 512 MB of RAM.
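
The 400 MB figure can be checked with a little arithmetic. This is only a rough sketch: the variable names are made up for illustration, and it assumes decimal megabytes and ignores per-object overhead. It also notes that scan() reads doubles unless told otherwise via its what argument, so the plain call in the original post would ask for roughly twice as much.

```r
n <- 1e8                    # number of integers in the file, per the original post
bytes_int <- n * 4          # R stores an integer in 4 bytes
bytes_dbl <- n * 8          # scan() defaults to what = double(), 8 bytes per value
mb_int <- bytes_int / 1e6   # size in MB if stored as integers
mb_dbl <- bytes_dbl / 1e6   # size in MB if stored as doubles
```

Either way, the vector alone is close to or above the 512 MB installed, before counting scan()'s working memory or the OS.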

>
> Hope this helps.
>
> Best wishes
> Christoph
>
>
> --
> Christoph Scherber
> DNPW, Agroecology
> University of Goettingen
> Waldweg 26
> D-37073 Goettingen
>
> +49-(0)551-39-8807
>
>
>
>
> Remigijus Lapinskas schrieb:
>> Dear All,
>>
>> I am on WindowsXP with 512 MB of RAM, R 2.4.0, and I want to read in a
>> big file mln100.txt. The file is 390MB big, it contains a column of 100
>> millions integers.
>>
>>> mln100=scan("mln100.txt")
>> Error: cannot allocate vector of size 512000 Kb
>> In addition: Warning messages:
>> 1: Reached total allocation of 511Mb: see help(memory.size)
>> 2: Reached total allocation of 511Mb: see help(memory.size)
>>
>> In fact, I would be quite happy if I could read, say, every tenth
>> integer (line) of the file. Is it possible to do this?
>>

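To save out the first, eleventh, etc. (shown here for the first ten such values only):

mln.con <- file("tmp.txt", open = "r")  # open once so each read continues where the last one stopped
res <- integer(10)
# each readLines() call consumes 10 lines; keep only the first line of each block
for (i in 1:10) res[i] <- as.integer(readLines(mln.con, n = 10)[1])
close(mln.con)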
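
The same idea can be extended to the whole file by reading in larger blocks and keeping every tenth line of each block. What follows is a minimal sketch, not a tuned solution: read_every_nth, step and chunk are hypothetical names introduced here, and it assumes one integer per line with no blank lines.

```r
# Hypothetical helper: keep every `step`-th line of a one-column text file,
# reading `chunk` lines at a time so the full file is never held in memory.
read_every_nth <- function(path, step = 10L, chunk = 100000L) {
  stopifnot(chunk %% step == 0)   # keeps the stride aligned across chunks
  con <- file(path, open = "r")
  on.exit(close(con))             # close the connection even on error
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break          # end of file reached
    keep <- seq(1L, length(lines), by = step)
    pieces[[length(pieces) + 1L]] <- as.integer(lines[keep])
  }
  if (length(pieces)) unlist(pieces) else integer(0)
}
```

Something like mln10 <- read_every_nth("mln100.txt") would then return lines 1, 11, 21, and so on. For 100 million lines the result has 10 million integers, about 40 MB, which should fit on the machine described, though the final unlist() briefly needs a second copy.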

>> Cheers,
>> Rem
>>
>> ______________________________________________
>> R-help_at_stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901

Received on Thu 24 May 2007 - 18:33:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 24 May 2007 - 19:31:33 GMT.
