Re: [R] how to read in multiple files with unequal number of columns

From: jim holtman <jholtman_at_gmail.com>
Date: Wed, 23 Apr 2008 08:47:18 -0400

Is this what you want? I am assuming that you will read the dataframes into a list and then process them like below:

> # put dataframe in a list -- would have read them in via a list
> x <- list(d, my.fake.data)
> # determine maximum number of columns and then pad out the short one
> # also use the column names of the largest one
>
> col.max <- max(sapply(x, ncol))
> colNames <- lapply(x, function(.data){
+     if (ncol(.data) == col.max) colnames(.data)
+ })[[1]]
> new.data <- lapply(x, function(.data){
+     if (ncol(.data) < col.max){
+         .data[(ncol(.data) + 1):col.max] <- NA
+         colnames(.data) <- colNames
+     }
+     .data
+ })

> all <- do.call(rbind, new.data)
> all

   x  y  fac
1  1  1    B
2  1  2    B
3  1  3    B
4  1  4    B
5  1  5    A
6  1  6    A
7  1  7    C
8  1  8    C
9  1  9    A
10 1 10    C
11 1  2 <NA>
>
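
The steps above could also be wrapped in a small function. This is only a sketch: pad_and_bind is a made-up name, and it assumes (as in the example) that the shorter data frames hold the leading columns of the widest one, in the same order. It takes the column names from the widest data frame wherever that sits in the list, instead of relying on it being the first element.

```r
# Hypothetical helper (not from the original post): pad every data frame
# in a list out to the width of the widest one, then stack them with rbind.
# Assumes the short data frames contain the leading columns of the widest.
pad_and_bind <- function(dfs) {
    widths <- sapply(dfs, ncol)
    col.max <- max(widths)
    # column names of the widest data frame, wherever it is in the list
    col.names <- colnames(dfs[[which.max(widths)]])
    padded <- lapply(dfs, function(.data) {
        if (ncol(.data) < col.max) {
            # add the missing columns, filled with NA
            .data[(ncol(.data) + 1):col.max] <- NA
        }
        colnames(.data) <- col.names
        .data
    })
    do.call(rbind, padded)
}

# same shape of example data as in the original question
d <- data.frame(x = 1, y = 1:10,
                fac = sample(LETTERS[1:3], 10, replace = TRUE))
my.fake.data <- data.frame(x = 1, y = 2)

# the narrow data frame comes first here on purpose
all <- pad_and_bind(list(my.fake.data, d))
dim(all)
```

If the files do not share leading columns, padding by position would line up unrelated columns, so the columns would have to be matched by name some other way.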

On Tue, Apr 22, 2008 at 9:05 AM, Tania Oh <tania.oh_at_bnc.ox.ac.uk> wrote:
> Dear all,
>
> I want to read in 1000 files which contain a varying number of columns.
> For example:
>
> file[1] contains 8 columns (mixture of characters and numbers)
> file[2] contains 16 columns etc
>
> I'm reading everything into one big data frame and when I try rbind, R
> returns an error of
> "Error in rbind(deparse.level, ...) :
> numbers of columns of arguments do not match"
>
>
> Below is my code:
>
> all <- NULL
> all <- as.data.frame(all)
>
> ##read in the contents of the files
> for (f in 1:length(fnames)){
>
>     tmp <- try(read.table(fnames[f], header=F, fill=T, sep="\t"),
>                TRUE)
>
>     if (class(tmp) == "try-error") {
>         next ## skip this file if it's empty/non-existent
>     }else{
>         ## combine all the file contents into one big data frame
>         all <- rbind(all, tmp)
>     }
> }
>
>
> Here is an example of what the data in the files look like:
>
> L3 <- LETTERS[1:3]
> (d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE)))
>
> > str(d)
> 'data.frame': 10 obs. of 3 variables:
> $ x : num 1 1 1 1 1 1 1 1 1 1
> $ y : num 1 2 3 4 5 6 7 8 9 10
> $ fac: Factor w/ 3 levels "A","B","C": 1 3 1 2 2 2 2 1 1 2
>
> my.fake.data <- data.frame(cbind(x=1, y=2))
> > str(my.fake.data)
> 'data.frame': 1 obs. of 2 variables:
> $ x: num 1
> $ y: num 2
>
>
> all <- rbind(d, my.fake.data)
>
> Error in rbind(deparse.level, ...) :
> numbers of columns of arguments do not match
>
>
> I've searched the R-site but couldn't find any relevant solution. I
> might have used the wrong keywords to search, so if this question has
> been answered already, I'd be very grateful if someone could point me
> to the post. Else any help/suggestions would be greatly appreciated.
>
> Many thanks in advance,
> tania
>
> D.Phil student
> Department of Physiology, Anatomy and Genetics
> University of Oxford
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
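
For the reading step itself, one possible way to get the files into a list (rather than growing `all` with rbind inside the loop) is sketched below. read_all is a hypothetical name, and the temporary files exist only to make the example self-contained; the read.table arguments are the ones from the original loop.

```r
# Sketch only: read every file into a list, skipping files that cannot be
# read. `read_all` is a hypothetical helper name; `fnames` is a character
# vector of file names, as in the original post.
read_all <- function(fnames) {
    dfs <- lapply(fnames, function(f) {
        tryCatch(
            read.table(f, header = FALSE, fill = TRUE, sep = "\t"),
            error = function(e) NULL   # empty or non-existent file
        )
    })
    # drop the entries for files that failed to read
    Filter(Negate(is.null), dfs)
}

# small demonstration with temporary files
f1 <- tempfile(); f2 <- tempfile(); f3 <- tempfile()   # f3 is never created
writeLines("a\t1\t2", f1)
writeLines(c("b\t1\t2\t3\t4", "c\t5\t6\t7\t8"), f2)

dfs <- read_all(c(f1, f2, f3))
sapply(dfs, ncol)
```

The resulting list has one data frame per readable file and can then be padded and combined in a single rbind call, which avoids copying the growing data frame on every iteration.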

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Wed 23 Apr 2008 - 12:55:19 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Apr 2008 - 13:30:30 GMT.