[R] read large amount of data

From: Weiwei Shi <helprhelp_at_gmail.com>
Date: Tue 19 Jul 2005 - 01:34:51 EST

I have a dataset of 2194651 rows by 135 columns. All the values are 0, 1, or 2, and the file is bar-delimited.

I used the following approach, which can handle 100,000 lines:

    t <- scan('fv', sep='|', nlines=100000)
    t1 <- matrix(t, nrow=135, ncol=100000)
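A possible refinement of the snippet above, offered only as a minimal sketch: since every value is 0, 1, or 2, scan() can be asked for integers (4 bytes each) instead of its default doubles (8 bytes each), and byrow=TRUE lays out one record per row. The tiny demo file and the names demo_file, n_fields, n_lines, and bytes are assumptions made up for illustration; 'fv' itself is not touched.

```r
# Hypothetical tiny stand-in for 'fv': 4 bar-delimited records of 3 fields each.
demo_file <- tempfile()
writeLines(c("0|1|2", "1|2|0", "2|1|1", "0|2|2"), demo_file)

n_fields <- 3L   # 135 in the real file
n_lines  <- 4L   # 100000 in the snippet above

# what = integer() stores 4 bytes per value instead of the 8 bytes used by
# scan()'s default what = double().
v <- scan(demo_file, what = integer(), sep = "|", nlines = n_lines, quiet = TRUE)

# scan() returns the values record by record, so byrow = TRUE gives one
# record per row; the column-wise fill gives the transpose (one record
# per column).
m   <- matrix(v, nrow = n_lines, ncol = n_fields, byrow = TRUE)
m_t <- matrix(v, nrow = n_fields, ncol = n_lines)
same_data <- identical(m, t(m_t))

# Rough size of the raw values for the full 2194651 x 135 dataset, in bytes.
# This ignores any temporary copies R makes while reading or reshaping.
bytes <- 2194651 * 135 * c(double = 8, integer = 4)
print(bytes / 1e9)   # size in GB (10^9 bytes)
```

By this estimate the raw values alone would take about 2.4 GB as doubles and about 1.2 GB as integers, before counting any intermediate copies.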

I changed my plan to use stratified sampling with replacement (column 2 is my class variable: 1 or 2). The class distribution looks like this:

    awk -F\| '{print $2}' fv | sort | uniq -c
    2162792 1
      31859 2

Is it possible to use R to read the whole dataset and do the stratified sampling? Does that really depend on my memory size? My machine reports:

    Mem:  3111736k total, 1023040k used, 2088696k free, 150160k buffers
    Swap: 4008208k total, 19040k used, 3989168k free, 668892k cached
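One way the stratified sampling might be done without holding the whole file in memory is a two-pass read: first collect only the class column, then draw line numbers per class and re-read the file to pull just those lines. The following is a sketch under assumptions, not a tested solution for 'fv': the generated demo file, chunk_size, and n_per_class are all made up for illustration.

```r
# Hypothetical small stand-in for 'fv': column 2 is the class (1 or 2).
set.seed(1)
demo_file <- tempfile()
n_demo <- 1000L
cls <- c(rep(1L, 950L), rep(2L, 50L))
rows <- paste(sample(0:2, n_demo, replace = TRUE), cls,
              sample(0:2, n_demo, replace = TRUE), sep = "|")
writeLines(rows, demo_file)

chunk_size  <- 200L   # lines read per chunk; an assumed, tunable value
n_per_class <- 30L    # draws per class; also an assumption

# Pass 1: collect only the class column, chunk by chunk.
con <- file(demo_file, open = "r")
classes <- integer(0)
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0L) break
  fields <- strsplit(lines, "|", fixed = TRUE)
  classes <- c(classes, as.integer(vapply(fields, `[`, character(1), 2L)))
}
close(con)

# Draw line numbers with replacement within each class.
picked <- unlist(lapply(split(seq_along(classes), classes), function(idx) {
  idx[sample.int(length(idx), n_per_class, replace = TRUE)]
}), use.names = FALSE)

# Pass 2: re-read the file and keep only the distinct lines that were drawn.
wanted <- sort(unique(picked))
kept <- character(0)
con <- file(demo_file, open = "r")
offset <- 0L
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0L) break
  hit <- wanted[wanted > offset & wanted <= offset + length(lines)]
  kept <- c(kept, lines[hit - offset])
  offset <- offset + length(lines)
}
close(con)

# Expand back to the with-replacement sample and parse into an integer matrix
# with one sampled record per row.
sampled_lines <- kept[match(picked, wanted)]
smp <- matrix(as.integer(unlist(strsplit(sampled_lines, "|", fixed = TRUE))),
              nrow = length(picked), byrow = TRUE)
print(table(smp[, 2]))
```

With this layout only the class vector and the sampled records stay in memory, so the footprint depends on the sample size rather than on the full file.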



Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Jul 19 01:42:24 2005
