Re: [R] How to load multiple text files and order by id

From: Dennis Murphy <>
Date: Sat, 05 Mar 2011 23:36:30 -0800


This is basically Scott's idea with a few added details.

Let's assume your files have similar names - e.g., they differ only by number.
The example below creates ten files of similar structure to yours. There are then two paths one can follow: (1) put all the files into a specific directory, or
(2) keep them where they are.

This is my current working directory (Win 7):
> getwd()

[1] "C:/Users/Dennis/Documents"

# Create ten files, each with 20 IDs and a random count. The files are then
# written as csv files to the current working directory. This is simply a way
# for me to generate data that in some sense mimics the data you already have.
# You don't need to reproduce this since you already have the file list.
for (i in 1:10) {
    df <- data.frame(id = sprintf('%02d', 1:20),
                     count = rpois(20, 50))
    write.csv(df, file = paste('file_', sprintf('%02d', i), '.csv', sep = ''),
              row.names = FALSE)
}

# Option 1: Move all the files to a separate subdirectory of the current
# directory - I'll call it 'myfiles', because I'm highly imaginative. [If your
# files have different names that are difficult to isolate with a certain
# string pattern, this is probably the best option.]
# Once the files are moved, I can change the working directory to myfiles:
setwd('myfiles')

# > getwd()
# [1] "C:/Users/Dennis/Documents/myfiles"

# Now, read all the csv files from this directory into a list object - in your case,
# it may be simpler to define a vector of names with list.files() instead and check

# that it's right before using lapply, something like
# filelist <- list.files(pattern = '\\.csv$', all.files = FALSE)
# readlist <- lapply(filelist, read.csv, header = TRUE)
# The line below combines the two.
readlist <- lapply(list.files(pattern = '\\.csv$', all.files = FALSE),
                   read.csv, header = TRUE)
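A quick aside (base R only; the file names below are made up for illustration): the pattern argument of list.files() is a regular expression, not a glob, so an unanchored 'csv' also matches names that merely contain those characters. Anchoring with '\\.csv$' keeps out stray matches:

```r
# grep() on a made-up vector of names shows the difference between an
# unanchored pattern and one anchored to the end of the name.
files <- c("file_01.csv", "file_02.csv", "csv_notes.txt")
grep("csv", files, value = TRUE)      # all three names contain 'csv'
grep("\\.csv$", files, value = TRUE)  # only the two names ending in .csv
```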

# Assign names count_01 to count_10 to the list components (rationale: these
# are the column names I'll want to use in the final data frame)
names(readlist) <- paste('count', sprintf('%02d', seq_along(readlist)), sep = '_')

# As Scott intimated (but never used :), fire up the plyr and reshape packages:
library(plyr)
library(reshape)

# The first command is equivalent to, readlist), but the
# advantage of ldply is that it copies over the list component names in a
# variable named .id as well, which as we'll see is very useful...
dtf <- ldply(readlist, rbind)
head(dtf) # to see the first few lines
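For anyone without plyr installed, here is a base-R sketch of what ldply(readlist, rbind) produces in this situation. lst and dtf0 below are made-up stand-ins for readlist and dtf:

```r
# Stack the data frames in a named list and carry the list names along
# in a .id column - the same shape ldply() returns.
lst <- list(count_01 = data.frame(id = "01", count = 20),
            count_02 = data.frame(id = "01", count = 8))
dtf0 <- do.call(rbind, Map(function(nm, d) cbind(.id = nm, d),
                           names(lst), lst))
dtf0  # columns: .id, id, count; one row per original row
```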

# The cast() function in the reshape package takes our 'long' data in dtf and
# reshapes it to 'wide' form according to the formula - in this case, the rows
# will be the id numbers and the columns will be count_01 - count_10.
# Fortunately, the count is taken as the 'value' variable. (This is made more
# explicit in the reshape2 package, where the corresponding function is dcast()
# and 'count' would be the (quoted) argument of value.var = )...but this works:
cast(dtf, id ~ .id)
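For completeness, the cast() step can also be done with base R's reshape() function; a sketch, where 'long' is a made-up stand-in for dtf (note that reshape() prefixes the new columns with the value-variable name):

```r
# Long-to-wide in base R: idvar gives the rows, timevar gives the columns.
long <- data.frame(.id = rep(c("count_01", "count_02"), each = 2),
                   id  = rep(c("01", "02"), 2),
                   count = c(20, 3, 8, 0))
wide <- reshape(long, direction = "wide", idvar = "id", timevar = ".id")
wide  # columns: id, count.count_01, count.count_02
```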

# Option 2: The files happen to be in the same directory as getwd(), but may be
# mixed in with a bunch of other files. This is the case in my Documents
# directory:
# > getwd()
# [1] "C:/Users/Dennis/Documents"

# I may have other .csv files in this directory, so I'm probably better off trying to
# match 'file_' instead of '.csv'. Otherwise, it's pretty much the same story as above:
list2 <- lapply(list.files(pattern = 'file_', all.files = FALSE),
                read.csv, header = TRUE)
names(list2) <- paste('count', sprintf('%02d', seq_along(list2)), sep = '_')
dtg <- ldply(list2, rbind)
cast(dtg, id ~ .id)
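If you would rather stay in base R altogether, Reduce() with merge() builds the merged table the original poster asked for directly. A sketch, where d1 and d2 are made-up stand-ins for the elements of list2 (each already renamed so its count column is unique):

```r
# Successively merge the data frames on the id column; each merge adds
# one count column, so the result has one row per id and one column
# per file.
d1 <- data.frame(id = c("01", "02"), count_01 = c(20, 3))
d2 <- data.frame(id = c("01", "02"), count_02 = c(8, 0))
merged <- Reduce(function(x, y) merge(x, y, by = "id"), list(d1, d2))
merged  # columns: id, count_01, count_02
```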

A third option is to create a separate subdirectory for the data files, copy an
R shortcut into it (under Windows, at least), open the shortcut's Properties,
and change the 'Start in' field to that directory. Then follow Option 1.


On Sat, Mar 5, 2011 at 6:39 PM, Richard Green <> wrote:

> Hello R users,
> I am fairly new to R and was hoping you could point me in the right
> direction. I have a set of text files (36).
> Each file has only two columns (id and count), and I am trying to figure out
> a way to load all the files together and
> then have them ordered by id into a matrix/data frame. For example


> If each txt file has :
> ID count
> id_00002 20
> id_00003 3

> A Merged File:
> ID count_file1 count_file2 count_file3 count_file4
> id_00002 20 8 12 5 19 26
> id_00003 3 0 2 0 0 0
> id_00004 75 84 241 149 271 257

> Is there a relatively simple way to do that in R? I was trying with <-
> read.table
> and then <- cbind but that does not appear to be working. Any suggestions
> folks have are appreciated.
> Thanks
> -Rich

> [[alternative HTML version deleted]]
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Received on Sun 06 Mar 2011 - 07:39:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 06 Mar 2011 - 09:10:19 GMT.

