Re: [R] How to read only specified columns from a data file

From: Sarah Goslee <sarah.goslee_at_gmail.com>
Date: Wed, 16 Mar 2011 09:19:39 -0400

On Wed, Mar 16, 2011 at 9:07 AM, Luis Ridao <luridao_at_gmail.com> wrote:
> This is my code:
>
> mycols <- rep(NULL, 430) ; mycols[c(1,3:5)] <- rep("numeric", 4) ;
> mycols[c(2)] <- rep("character",1)

rep(NULL, 430) does not give you a vector of length 430; it returns NULL, which has length 0. The indexed assignments then grow mycols only as far as the largest index used, so at the end of this process mycols is of length 5.
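
You can check this at the console. A minimal sketch, using the same assignments as in your code:

```r
# rep() of NULL is still NULL, whatever the count
mycols <- rep(NULL, 430)
is.null(mycols)      # TRUE
length(mycols)       # 0

# Assigning by index grows the vector only as far as the largest index used
mycols[c(1, 3:5)] <- rep("numeric", 4)
mycols[2] <- "character"
length(mycols)       # 5
mycols
```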

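So read.table() does exactly what you've told it: it works out the number of columns from the first five rows of the file and gives the first five columns the classes specified in mycols (an unnamed colClasses vector shorter than the number of columns is recycled across the rest).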
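
The column-count behavior is easy to reproduce on a toy input. The sketch below is only an illustration: the data in txt are made up, and count.fields() plus col.names is just one way (an assumption on my part, not something taken from your code) to tell read.table() how many columns to expect.

```r
# Toy ragged input: the first five lines have at most 3 fields, line 6 has 5
txt <- c("1 a",
         "2 b c",
         "3 d e",
         "4 f g",
         "5 h i",
         "6 j k l m")

# Column count is taken from the first five lines only, so the extra
# fields on line 6 spill over into an additional row
wrapped <- read.table(text = txt, fill = TRUE)
dim(wrapped)   # 7 rows, 3 columns

# Count the fields on every line, then pass enough column names
con <- textConnection(txt)
n <- max(count.fields(con))
close(con)

full <- read.table(text = txt, fill = TRUE,
                   col.names = paste0("V", seq_len(n)))
dim(full)      # 6 rows, 5 columns
```

With a real file you would pass the file name to count.fields() (and the same skip value you give read.table()) instead of a text connection.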

According to the documentation for read.table(), you want the character string "NULL" (which tells it to skip that column) rather than NULL anyway, and rep("NULL", 430) should work as expected.
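
Something along these lines, as a sketch only: the two-line input and the value of ncols are invented for illustration (your file would use 430), and I have not run it against your data.

```r
# Toy input with 7 columns; only the first 5 are wanted
txt <- c("1 log_fy_coff -1.0076 0.11952 1.0000 0.5 0.6",
         "2 log_fy_coff -0.9350 0.11284 0.8896 0.7 0.8")

ncols <- 7                        # hypothetical total; 430 in the original question
mycols <- rep("NULL", ncols)      # the string "NULL" means "skip this column"
mycols[c(1, 3:5)] <- "numeric"
mycols[2] <- "character"

inp <- read.table(text = txt, colClasses = mycols)
str(inp)                          # 2 observations of 5 variables
```

For a file where later rows have more fields than the first five, you would probably also need fill = TRUE and a col.names vector as long as the widest row.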

Sarah

> inp <- read.table(myfile, skip=2, colClasses=mycols, fill=TRUE)
> head(inp)
>
> Best,
> Luis
>
> On Wed, Mar 16, 2011 at 1:03 PM, David Winsemius <dwinsemius_at_comcast.net> wrote:
>>
>> On Mar 16, 2011, at 8:13 AM, Sarah Goslee wrote:
>>
>>> read.table() looks at the first five rows when determining how many
>>> columns there are. If there are more columns in row 7 and you do not
>>> specify that in the read.table() command directly, they will be
>>> wrapped to the next row.
>>>
>>> This was discussed on the list within the last couple weeks.
>>
>> In addition to Sarah's comments, I also note that you did not include your
>> code. I don't think it could have been identical to the code I suggested,
>> which was in turn based on the code you had proposed. So ... what did you do
>> to get that result?
>>
>>
>> --
>> David.
>>
>>>
>>> Sarah
>>>
>>> On Wed, Mar 16, 2011 at 7:54 AM, Luis Ridao <luridao_at_gmail.com> wrote:
>>>>
>>>> David,
>>>>
>>>> Thanks for your tip, but it seems I'm having problems with the number
>>>> of columns R manages to read in. Below is an example of the data read
>>>> in:
>>>>
>>>>> inp[1:20,]
>>>>
>>>>       V1          V2        V3       V4     V5     V6     V7     V8     V9
>>>> 1   1.0000 log_fy_coff -1.007600 0.119520 1.0000     NA            NA     NA
>>>> 2   2.0000 log_fy_coff -0.935010 0.112840 0.8896 1.0000            NA     NA
>>>> 3   3.0000 log_fy_coff -0.876260 0.107500 0.8219 0.8847 1.0000     NA     NA
>>>> 4   4.0000 log_fy_coff -0.683090 0.103030 0.7656 0.8143 0.8747 1.0000     NA
>>>> 5   5.0000 log_fy_coff -0.623500 0.100980 0.7206 0.7636 0.8086 0.8764 1.0000
>>>> 6   6.0000 log_fy_coff -0.583330 0.098978 0.6819 0.7214 0.7615 0.8150 0.8762
>>>> 7   1.0000                    NA       NA     NA     NA            NA     NA
>>>> 8   7.0000 log_fy_coff -0.676790 0.096608 0.6521 0.6892 0.7254 0.7719 0.8148
>>>> 9   0.8717      1.0000        NA       NA     NA     NA            NA     NA
>>>> 10  8.0000 log_fy_coff -0.696060 0.093761 0.6297 0.6654 0.6988 0.7405 0.7750
>>>> 11  0.8116      0.8643  1.000000       NA     NA     NA            NA     NA
>>>> 12  9.0000 log_fy_coff -0.527060 0.089949 0.6003 0.6347 0.6667 0.7060 0.7367
>>>>
>>>> As you can see, there are only 9 columns in inp, and the remaining
>>>> values are read into the following row (see row 7).
>>>> I just don't understand why this is happening (using fill=T does not
>>>> help either).
>>>>
>>>> Best,
>>>> Luis
>>>>
>>>> On Tue, Mar 15, 2011 at 5:15 PM, David Winsemius <dwinsemius_at_comcast.net> wrote:
>>>>>
>>>>> On Mar 15, 2011, at 1:11 PM, <rex.dwyer_at_syngenta.com> wrote:
>>>>>
>>>>>> I think you need to read an introduction to R.
>>>>>> For starters, read.table returns its results as a value, which you
>>>>>> are not saving.
>>>>>> The probable answer to your question:
>>>>>> Read the whole file with read.table, and select the columns you need, e.g.:
>>>>>> tab <- read.table(myfile, skip=2)[,1:5]
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: r-help-bounces_at_r-project.org
>>>>>> [mailto:r-help-bounces_at_r-project.org]
>>>>>> On Behalf Of Luis Ridao
>>>>>> Sent: Tuesday, March 15, 2011 11:53 AM
>>>>>> To: r-help_at_r-project.org
>>>>>> Subject: [R] How to read only specified columns from a data file
>>>>>>
>>>>>> R-help,
>>>>>>
>>>>>> I'm trying to read a data file with plenty of columns.
>>>>>> I just need the first 5, but it does not work when I do something like:
>>>>>>
>>>>>>> mycols <- rep(NULL, 430) ; mycols[c(1:4)] <- NA
>>>>>>> read.table(myfile, skip=2, colClasses=mycols)
>>>>>
>>>>> I would have suggested:
>>>>>
>>>>> mycols <- rep(NULL, 430) ; mycols[1:5] <- rep("numeric", 5)
>>>>> inp <- read.table(myfile, skip=2, colClasses=mycols)
>>>>> head(inp)
>>>>>
>>>>> --
>>>>> David.
>>>>>
>>>>>>
>>>>>> Any suggestions?
>>>>>>
>>>
>>> --

-- 
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 16 Mar 2011 - 13:25:30 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 16 Mar 2011 - 15:00:22 GMT.
