Re: [R] data input strategy - lots of csv files

From: Michael Dondrup <michael.dondrup_at_cebitec.uni-bielefeld.de>
Date: Thu 11 May 2006 - 18:44:51 EST

Hi,
If you use a list to collect (append) all the data.frames returned by read.csv, you don't have to compute variable names like a1...a63; you can just iterate over the contents of the list. With the function dir you can get the names of all files in a directory and read them in a loop.
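
A minimal sketch of what this could look like, assuming each file has the date in its first column and no header row (as in the lines quoted below). The directory csv_dir, the two sample files, and the column names derived from the file names are made up for illustration. Combining the list with Reduce() and merge() on the date column is one possible way to get a single data.frame, not necessarily the best one for your data.

```r
# Build a small throwaway directory of sample files so the sketch runs on its own.
# (Directory name, file names and contents are hypothetical.)
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("01/06/05,23445", "02/06/05,23500"), file.path(csv_dir, "series_a.csv"))
writeLines(c("02/06/05,17", "03/06/05,19"), file.path(csv_dir, "series_b.csv"))

# dir() (an alias for list.files()) returns the matching file names.
files <- dir(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into one list of data frames; no a1 ... a63 variables needed.
frames <- lapply(files, function(f) {
  d <- read.csv(f, header = FALSE, stringsAsFactors = FALSE)
  # First column is the date; name the value column(s) after the file.
  base <- sub("\\.csv$", "", basename(f))
  value_names <- if (ncol(d) == 2) base else paste(base, seq_len(ncol(d) - 1), sep = "_")
  names(d) <- c("date", value_names)
  d
})

# Merge the list pairwise on the date column; all = TRUE keeps every date
# and fills the gaps with NA.
atot <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE), frames)
print(atot)
```

Because the data frames live in a list, there is nothing to clean up with rm() afterwards, and no eval(parse(...)) is needed to address them by number.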

Michael

On Thursday 11 May 2006 10:03, Sean O'Riordain wrote:
> Good morning,
> I have currently 63 .csv files most of which have lines which look like
> 01/06/05,23445
> Though some files have two numbers beside each date. There are
> missing values, and currently the longest file has 318 rows.
>
> (merge() is losing the head and doing runaway memory allocation - but
> that's another question - I'm still trying to pin that issue down and
> make a small repeatable example)
>
> Currently I'm reading in these files with lines like
> a1 <- read.csv("daft_file_name_1.csv",header=F)
> ...
> a63 <- read.csv("another_silly_filename_63.csv",header=F)
>
> and then I'm naming the columns in these like...
> names(a1)[2] <- "silly column name"
> ...
> names(a63)[2] <- "daft column name"
>
> then trying to merge()...
> atot <- merge(a1, a2, all=T)
> and then using language manipulation to loop
> atot <- merge(atot, a3, all=T)
> ...
> atot <- merge(atot, a63, all=T)
> etc...
>
> followed by more language manipulation
> for() {
> rm(a1)
> } etc...
>
> i.e.
> for (i in 2:63) {
> atot <- merge(atot, eval(parse(text=paste("a", i, sep=""))), all=T)
> # eval(parse(text=paste("a",i,"[1] <- NULL",sep="")))
>
> cat("i is ", i, gc(), "\n")
>
> # now delete these 63 temporary objects...
> # e.g. should look like rm(a33)
> eval(parse(text=paste("rm(a",i,")", sep="")))
> }
>
> eventually getting a dataframe with the first column being the date,
> and the subsequent 63 columns being the data... with missing values
> coded as NA...
>
> so my question is... is there a better strategy for reading in lots of
> small files (only a few kbytes each) like that which are timeseries
> with missing data... which doesn't go through the above awkwardness
> (and language manipulation) but still ends up with a nice data.frame
> with NA values correctly coded etc.
>
> Many thanks,
> Sean O'Riordain
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html



Received on Thu May 11 18:59:18 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 11 May 2006 - 20:10:06 EST.
