Re: [R] How to load multiple text files and order by id

From: Dennis Murphy <djmuser_at_gmail.com>
Date: Sat, 05 Mar 2011 23:36:30 -0800

Hi:

This is basically Scott's idea with a few added details.

Let's assume your files have similar names - e.g., they differ only by number.
The example below creates ten files of similar structure to yours. There are then two paths one can follow: (1) put all the files into a specific directory, or
(2) keep them where they are.

This is my current working directory (Win 7):
> getwd()
[1] "C:/Users/Dennis/Documents"

# Create ten files, each with 20 IDs and a random count. The files are then
# written as csv files to the current working directory. This is simply a way
# for me to generate data that in some sense mimics the data you already have.
# You don't need to reproduce this since you already have the file list.
for (i in 1:10) {
    df <- data.frame(id = sprintf('%02d', 1:20),
                     count = rpois(20, 50))
    write.csv(df, file = paste('file_', sprintf('%02d', i), '.csv', sep = ''),
              row.names = FALSE)
}

# Option 1: Move all the files to a separate subdirectory of the current directory.
# I'll call it 'myfiles', because I'm highly imaginative. [If your files have different names
# that are difficult to isolate with a certain string pattern, this is probably the best option.]
# Once the files are moved, I can change the working directory to myfiles:
setwd('myfiles')

# > getwd()
# [1] "C:/Users/Dennis/Documents/myfiles"

# Now, read all the csv files from this directory into a list object. In your case,
# it may be simpler to define a vector of names with list.files() first and check
# that it's right before using lapply, something like
# filelist <- list.files(pattern = '\\.csv$', all.files = FALSE)
# readlist <- lapply(filelist, read.csv, header = TRUE)
# (pattern is a regular expression, so '\\.csv$' matches only names ending in .csv.)
# The line below combines the two.
readlist <- lapply(list.files(pattern = '\\.csv$', all.files = FALSE),
                   read.csv, header = TRUE)
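
The "check that it's right" step might look like the minimal sketch below. It writes two small files plus one stray file to a temporary directory (the directory name, file names and values are assumptions made up for illustration) and compares an anchored pattern with a loose one.

```r
# Hypothetical demo directory; substitute the folder that holds your files.
demo_dir <- file.path(tempdir(), 'check_demo')
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:2) {
    write.csv(data.frame(id = sprintf('%02d', 1:3), count = c(5, 7, 9) + i),
              file = file.path(demo_dir, sprintf('file_%02d.csv', i)),
              row.names = FALSE)
}
# A stray file that an unanchored pattern such as 'csv' would also pick up.
writeLines('not data', file.path(demo_dir, 'csv_notes.txt'))

# '\\.csv$' matches only names that end in a literal '.csv'.
filelist <- list.files(demo_dir, pattern = '\\.csv$', full.names = TRUE)
loose    <- list.files(demo_dir, pattern = 'csv')

print(basename(filelist))          # inspect the names before reading anything
stopifnot(length(filelist) == 2)   # fail early if the count is unexpected

readlist_demo <- lapply(filelist, read.csv, header = TRUE)
```

Here the loose pattern also returns csv_notes.txt, which read.csv() would then try to parse as data, so looking at the vector first is cheap insurance.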

# Assign names count_01 to count_10 to the list components (rationale: these
# are the column names I'll want to use in the final data frame)
names(readlist) <- paste('count', sprintf('%02d', seq_along(readlist)), sep = '_')
# As Scott intimated (but never used :), fire up the plyr and reshape packages:
library(plyr)
library(reshape)
# The first command is equivalent to do.call(rbind, readlist), but the advantage of
# ldply is that it copies over the list component names in a variable named .id as well,
# which as we'll see is very useful...
dtf <- ldply(readlist, rbind)
head(dtf) # to see the first few lines
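
To make it clearer what the .id column buys you, here is a rough base R sketch of the same stacking step without plyr. The toy list and its values are made up; only the shape (a named list of data frames with id and count columns) follows the example above.

```r
# Toy stand-in for readlist: a named list of data frames (made-up values).
readlist_demo <- list(
    count_01 = data.frame(id = c('a', 'b'), count = c(20, 3)),
    count_02 = data.frame(id = c('a', 'b'), count = c(8, 0))
)

# Tag each data frame with its list name, then stack the pieces.
tagged <- Map(function(d, nm) data.frame(.id = nm, d),
              readlist_demo, names(readlist_demo))
dtf_demo <- do.call(rbind, tagged)
rownames(dtf_demo) <- NULL

print(dtf_demo)
```

The result has one row per id per file, with .id recording which list component each row came from. That is the column the reshaping step uses to form the new column names.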

# The cast() function in the reshape package takes our 'long' data in dtf and
# reshapes it to 'wide' form according to the formula - in this case, the rows will
# be the id numbers and the columns will be count_01 - count_10. Fortunately,
# the count is taken as the 'value' variable. (This is made more explicit in the
# reshape2 package, where the corresponding function is dcast() and 'count'
# would be the argument of value.var =, spelled value_var = in the earliest
# reshape2 releases.) But this works:
cast(dtf, id ~ .id)
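
If neither reshape nor reshape2 is installed, the same long-to-wide step can be approximated in base R. The sketch below uses xtabs() on made-up data shaped like dtf. One id is deliberately missing from the second file, to show that the empty cell comes out as 0, as in the merged table in the original question. With reshape2, something like dcast(dtf, id ~ .id, value.var = 'count', fill = 0) should give a comparable layout.

```r
# Toy long-format data shaped like dtf (made-up values); id 'c' is absent
# from the second file, as can happen with real count files.
dtf_demo <- data.frame(
    .id   = c('count_01', 'count_01', 'count_01', 'count_02', 'count_02'),
    id    = c('a', 'b', 'c', 'a', 'b'),
    count = c(20, 3, 75, 8, 0)
)

# One row per id, one column per file. xtabs() sums the counts in each cell
# and leaves 0 where an id/file combination does not occur.
wide_tab <- xtabs(count ~ id + .id, data = dtf_demo)
wide <- as.data.frame.matrix(wide_tab)
wide <- cbind(id = rownames(wide), wide)
rownames(wide) <- NULL

print(wide)
```

Note that xtabs() adds up duplicates, so this only reproduces the cast() result when each id appears at most once per file.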

# Option 2: The files happen to be in the same directory as getwd(), but may be
# mixed in with a bunch of other files. This is the case in my Documents directory.
setwd('..')
getwd()
# [1] "C:/Users/Dennis/Documents"

# I may have other .csv files in this directory, so I'm probably better off trying to
# match 'file_' instead of '.csv'. Otherwise, it's pretty much the same story as above:
list2 <- lapply(list.files(pattern = 'file_', all.files = FALSE),
                read.csv, header = TRUE)
names(list2) <- paste('count', sprintf('%02d', seq_along(list2)), sep = '_')
dtg <- ldply(list2, rbind)
cast(dtg, id ~ .id)
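
One caveat with both options: the names count_01, count_02, ... are assigned by position, which silently assumes the alphabetical order returned by list.files() lines up with the file numbering and that no file is missing. A possible alternative, sketched here with made-up files in a temporary directory (the directory and file names are assumptions), is to derive each label from the file name itself.

```r
# Hypothetical demo files in a temporary directory; note the gap in numbering.
demo_dir <- file.path(tempdir(), 'names_demo')
dir.create(demo_dir, showWarnings = FALSE)
for (i in c(3, 7)) {
    write.csv(data.frame(id = 1:2, count = c(10, 20) * i),
              file = file.path(demo_dir, sprintf('file_%02d.csv', i)),
              row.names = FALSE)
}

files <- list.files(demo_dir, pattern = '^file_.*\\.csv$', full.names = TRUE)
list2_demo <- lapply(files, read.csv, header = TRUE)

# 'file_03.csv' -> 'count_03': strip the extension, swap the prefix.
labels <- sub('^file_', 'count_', tools::file_path_sans_ext(basename(files)))
names(list2_demo) <- labels

print(names(list2_demo))
```

Positional naming would have labelled these two files count_01 and count_02. Deriving the label from the name keeps the link to the original file.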

A third option is to create a separate subdirectory for the data files, copy an R shortcut
into that directory (under Windows, anyway), open the shortcut's Properties and change the
'Start in' field to that directory's path. Then follow Option 1.
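
Whichever option you pick, the files in the original question are whitespace-delimited .txt files rather than csv, so read.table() takes the place of read.csv(). Here is a minimal sketch of that case using merge() instead of plyr/reshape. The two demo files, their names and their values are made up, and it assumes every file has a header line "ID count".

```r
# Hypothetical stand-ins for two of the 36 text files (values are made up).
demo_dir <- file.path(tempdir(), 'txt_demo')
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c('ID count', 'id_00002 20', 'id_00003 3', 'id_00004 75'),
           file.path(demo_dir, 'file1.txt'))
writeLines(c('ID count', 'id_00002 8', 'id_00004 84'),
           file.path(demo_dir, 'file2.txt'))

files <- list.files(demo_dir, pattern = '\\.txt$', full.names = TRUE)

# Read each file and rename its count column after the file it came from.
pieces <- lapply(files, function(f) {
    d <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
    names(d)[2] <- paste('count', tools::file_path_sans_ext(basename(f)),
                         sep = '_')
    d
})

# Full outer join on ID, so an ID missing from one file is kept.
merged <- Reduce(function(x, y) merge(x, y, by = 'ID', all = TRUE), pieces)
merged[is.na(merged)] <- 0      # treat "absent from a file" as a zero count
merged <- merged[order(merged$ID), ]

print(merged)
```

This is probably why cbind() did not work for the original poster: cbind() pastes columns together row by row, so it only lines up if every file lists exactly the same IDs in the same order, whereas merge() matches on the ID values.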

HTH,
Dennis

On Sat, Mar 5, 2011 at 6:39 PM, Richard Green <greener_at_uw.edu> wrote:

> Hello R users,
> I am fairly new to R and was hoping you could point me in the right
> direction I have a set of text files (36).
> Each file has only two columns (id and count) , I am trying to figure out a
> way to load all the files together and
> then have them ordered by id into a matrix data frame. For example
>
> If each txt file has :
> ID count
> id_00002 20
> id_00003 3
>
> A Merged File:
> ID count_file1 count_file2 count_file3 count_file4
> id_00002 20 8 12 5 19 26
> id_00003 3 0 2 0 0 0
> id_00004 75 84 241 149 271 257
>
> Is there a relatively simply way to do that in R? I was trying with <-
> read.table
> and then <- cbind but that does not appear to be working. Any suggestions
> folks have are appreciated.
> Thanks
> -Rich
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Received on Sun 06 Mar 2011 - 07:39:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 06 Mar 2011 - 09:10:19 GMT.
