Re: [R] how to read in multiple files with unequal number of columns

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Mon, 28 Apr 2008 11:16:36 +0100 (BST)

With the help of some reproducible code from Tania I tracked this down. She started with all=NULL as the first argument, and merge() was failing when there were no common columns and no rows in one of the inputs (as expand.grid failed). I've fixed that in R-patched.
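
As a rough illustration of that case (not the code from the original report; the data frames and column names here are made up), merge() on inputs with no common column names returns the Cartesian product of their rows, so a zero-row input leaves nothing to pair. In a current R release I would expect the second call to return a zero-row data frame rather than an error:

```r
# Two made-up data frames with no column names in common.
x <- data.frame(id = 1:2, label = c("p", "q"), stringsAsFactors = FALSE)
y <- data.frame(score = 1:3, group = c("u", "v", "w"), stringsAsFactors = FALSE)

# With nothing to match on, merge() pairs every row of x with every row of y.
crossed <- merge(x, y)
nrow(crossed)

# A zero-row input (same columns as x) leaves no rows to pair.
x0 <- x[0, ]
none <- merge(x0, y)
nrow(none)
```

So even where merge() succeeds here, it is not a row-wise stacking of the two inputs.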

Using 'all' as the name of your object, when it is both a function in R and an argument name in merge(), is confusing to humans (if not, in this case, to R).
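
For the original aim of stacking files with differing numbers of columns, one possible approach (a sketch only, not something proposed earlier in this thread) is to read each file into a list, pad each data frame with NA columns up to the full set of names, and rbind once at the end. The helpers read_one() and pad_columns(), the object name 'combined' and the temporary test files are all made up for the example:

```r
# Hypothetical helper: read one tab-separated file, or return NULL if it
# cannot be read (for example because it is empty).
read_one <- function(path) {
  tmp <- try(read.table(path, header = FALSE, fill = TRUE, sep = "\t",
                        stringsAsFactors = FALSE), silent = TRUE)
  if (inherits(tmp, "try-error")) NULL else tmp
}

# Hypothetical helper: add NA columns so that df has every name in cols,
# then return the columns in that common order.
pad_columns <- function(df, cols) {
  for (nm in setdiff(cols, names(df))) df[[nm]] <- NA
  df[cols]
}

# Made-up input: a 2-column file, a 4-column file and an empty file.
dir <- tempfile("unequal_cols_")
dir.create(dir)
fnames <- file.path(dir, c("a.txt", "b.txt", "empty.txt"))
writeLines(c("g1\t1", "g2\t2"), fnames[1])
writeLines("g3\t3\t4\tx", fnames[2])
file.create(fnames[3])

# Read everything, drop the files that failed, pad, and bind once.
frames <- Filter(Negate(is.null), lapply(fnames, read_one))
cols <- unique(unlist(lapply(frames, names)))
combined <- do.call(rbind, lapply(frames, pad_columns, cols = cols))
str(combined)
```

This relies on read.table() naming the columns V1, V2, ... when header = FALSE, so that columns are lined up by position; it also avoids growing a data frame inside the loop and avoids reusing the name 'all'.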

On Tue, 22 Apr 2008, Tania Oh wrote:

> Thanks Ingmar,
>
> but when I used merge in :
>
> all <- merge(all, tmp),
>
> I get an error:
>
> Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
> invalid 'times' value
>
> is the error because of the way I initialised 'all'?
> what is the correct way of using merge in this case?
>
> thanks
> tania
>
>
>
>
> On 22 Apr 2008, at 14:12, Ingmar Visser wrote:
>
>> you may be looking for ?merge
>> hth, Ingmar
>>
>> On 22 Apr 2008, at 15:05, Tania Oh wrote:
>>
>>> Dear all,
>>>
>>> I want to read in 1000 files which contain varying numbers of columns.
>>> For example:
>>>
>>> file[1] contains 8 columns (mixture of characters and numbers)
>>> file[2] contains 16 columns etc
>>>
>>> I'm reading everything into one big data frame and when I try
>>> rbind, R
>>> returns an error of
>>> "Error in rbind(deparse.level, ...) :
>>> numbers of columns of arguments do not match"
>>>
>>>
>>> Below is my code:
>>>
>>> all <- NULL
>>> all <- as.data.frame(all)
>>>
>>> ##read in the contents of the files
>>> for (f in seq_along(fnames)) {
>>>
>>> tmp <- try(read.table(fnames[f], header=FALSE, fill=TRUE, sep="\t"),
>>> silent=TRUE)
>>>
>>> if (inherits(tmp, "try-error")) {
>>> next ## skip this file if it's empty/non-existent
>>> } else {
>>> ## combine all the file contents into one big data frame
>>> all <- rbind(all, tmp)
>>> }
>>> }
>>>
>>>
>>> Here is an example of what the data in the files look like:
>>>
>>> L3 <- LETTERS[1:3]
>>> (d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10,
>>> replace=TRUE)))
>>>
>>>> str(d)
>>> 'data.frame': 10 obs. of 3 variables:
>>> $ x : num 1 1 1 1 1 1 1 1 1 1
>>> $ y : num 1 2 3 4 5 6 7 8 9 10
>>> $ fac: Factor w/ 3 levels "A","B","C": 1 3 1 2 2 2 2 1 1 2
>>>
>>> my.fake.data <- data.frame(cbind(x=1, y=2))
>>>> str(my.fake.data)
>>> 'data.frame': 1 obs. of 2 variables:
>>> $ x: num 1
>>> $ y: num 2
>>>
>>>
>>> all <- rbind(d, my.fake.data)
>>>
>>> Error in rbind(deparse.level, ...) :
>>> numbers of columns of arguments do not match
>>>
>>>
>>> I've searched the R-site but couldn't find any relevant solution. I
>>> might have used the wrong keywords to search, so if this question has
>>> been answered already, I'd be very grateful if someone could point me
>>> to the post. Else any help/suggestions would be greatly appreciated.
>>>
>>> Many thanks in advance,
>>> tania
>>>
>>> D.Phil student
>>> Department of Physiology, Anatomy and Genetics
>>> University of Oxford
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> Ingmar Visser
>> Department of Psychology, University of Amsterdam
>> Roetersstraat 15
>> 1018 WB Amsterdam
>> The Netherlands
>> t: +31-20-5256723
>>
>>
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Mon 28 Apr 2008 - 10:44:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 28 Apr 2008 - 11:30:33 GMT.
