[R] "Large" data set: performance issue


From: Till Baumgaertel (till.baumgaertel@epost.de)
Date: Tue 02 Apr 2002 - 22:51:26 EST


Message-id: <3C97A93300008A7B@mail.epost.de>

hi all,

I have to import CSV data sets (with variable names in the first line)
into data.frames. Each is about 12 MB (or more!) with 1823 columns and about
500 rows. The first 22 columns are in "character" mode, the rest are "numeric".

I run R 1.4.1 on a Windows 2000 system.

First I tried read.table(), which works fine for a small number of cases (say,
40). With all cases, the function does not return within an hour (Celeron at
600 MHz, 256 MB RAM).
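
For comparison, here is a minimal sketch of telling the reader the column types up front, so that it does not have to guess them for 1823 columns. It assumes a current version of R, where read.csv() accepts colClasses, nrows and comment.char; whether all of these behave the same way in R 1.4.1 is not something this sketch checks. The tiny file and the column counts (2 character, 2 numeric) are made up for illustration.

```r
# Tiny stand-in file; the real one has 22 character and 1801 numeric columns.
csvFile <- tempfile(fileext = ".csv")
writeLines(c("id,group,x1,x2",
             "a,g1,1.5,2",
             "b,g2,3,4.25"), csvFile)

nChar <- 2  # hypothetical count of leading character columns
nNum  <- 2  # hypothetical count of trailing numeric columns

# One class name per column, in file order.
colTypes <- c(rep("character", nChar), rep("numeric", nNum))

# Declaring the classes skips type guessing; nrows and comment.char = ""
# are optional hints that can reduce work further.
datframe <- read.csv(csvFile, colClasses = colTypes,
                     nrows = 2, comment.char = "")
str(datframe)
```
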

Then I tried scan(), which is almost OK.
I scan() the first line for the variable names, then the rest. The data matrix
gets transposed and as.data.frame()'ed.

The problem is converting the last 1801 variables to "numeric" mode.

I use the following snippet:
# convert columns 23..totCols from character to numeric, one at a time
for (i in 23:totCols) {
        datframe[, i] <- as.numeric(datframe[, i])
}

Each step takes about 2 seconds, which adds up to about an hour in total.
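
A minimal sketch of the same conversion done in one assignment rather than 1801 separate ones, in case the repeated column replacement is what costs the time (that is an assumption, not something measured here). The small data frame is a made-up stand-in, and nChar and numCols are names introduced only for this example.

```r
# Stand-in data: every column arrives as character, as after scan().
nChar   <- 3   # leading columns that stay character (22 in the real data)
totCols <- 8   # total columns (1823 in the real data)
raw <- matrix(as.character(seq_len(5 * totCols)), nrow = 5, ncol = totCols)
datframe <- as.data.frame(raw, stringsAsFactors = FALSE)

# Convert all trailing columns with a single assignment.
numCols <- (nChar + 1):totCols
datframe[numCols] <- lapply(datframe[numCols], as.numeric)

str(datframe)
```
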

I suppose I'm doing something really stupid. For reading the data I use
datfull <- scan(filename, sep = ",", skip = 1, what = "character")
which gives me a transposed matrix of my data (variables in rows).

If it weren't transposed, maybe I could pass a vector to the "what" parameter
giving the appropriate type for each variable?
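
A minimal sketch of that idea, assuming scan() is given a list for "what" with one template element per column ("" for character, 0 for numeric); scan() then returns one vector per column, so no transposing or later conversion is needed. The file contents and the counts nChar and nNum are made up for illustration.

```r
# Tiny stand-in file so the example is self-contained.
csvFile <- tempfile(fileext = ".csv")
writeLines(c("id,group,x1,x2",
             "a,g1,1.5,2",
             "b,g2,3,4.25"), csvFile)

nChar <- 2  # leading character columns (22 in the real file)
nNum  <- 2  # trailing numeric columns (1801 in the real file)

# Variable names come from the first line only.
varNames <- scan(csvFile, what = "", sep = ",", nlines = 1, quiet = TRUE)

# One template per column: "" means character, 0 means numeric.
colTypes <- c(rep(list(""), nChar), rep(list(0), nNum))
names(colTypes) <- varNames

# With a list for `what`, each record is split across the list elements.
cols <- scan(csvFile, what = colTypes, sep = ",", skip = 1, quiet = TRUE)
datframe <- as.data.frame(cols, stringsAsFactors = FALSE)
str(datframe)
```
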

Sorry, but I'm really stuck and don't know how to proceed.

thanks,
Till




