Re: [Rd] Behaviour of read.table with empty columns

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed, 09 May 2007 17:04:58 +0100 (BST)

On Wed, 9 May 2007, John Fox wrote:

> Dear r-devel list members,
>
> I stumbled across the following behaviour of read.table() recently: Suppose
> that I have the data
>
> a " " ""
> "" "" ""
>
> in a file or copied to the clipboard, and issue the command
>
>> DF <- read.table("clipboard")
>> DF
>   V1 V2 V3
> 1  a NA NA
> 2    NA NA
>
>> is.na(DF)
> V1 V2 V3
> [1,] FALSE TRUE TRUE
> [2,] FALSE TRUE TRUE
>
> I was surprised by the NAs. Note that they occur only when a column consists
> entirely of empty strings or strings composed of blanks.
>
> On the other hand
>
>> data.frame(A=c("", "", ""))
> A
> 1
> 2
> 3
>
> works as I would have expected.

How did you expect R to know that "" meant a character column? You are allowed to quote any type of column, so as far as read.table is concerned the column is entirely empty and so its type is unknown. It defaults to the simplest possible type, logical.

The answer, I think, is to use colClasses="character".

It is probably slightly more accurate to say that if colClasses is not given, all columns are read as character columns, and then converted to the simplest possible type. In earlier versions of R you could get NULL columns (if there were no rows at all), but now the simplest is logical.
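
As a rough illustration of the two points above, the following sketch reproduces the example data inline and compares the default conversion with colClasses="character". It assumes a version of R recent enough that read.table() accepts a text= argument (not available in R 2.5.0, where a file or "clipboard" would be used instead), and it calls type.convert() directly as a stand-in for the conversion step that read.table applies. The object names are purely illustrative.

```r
# The two-line example from the original post, supplied inline
# (assumption: read.table(text = ) is available in this R version).
txt <- 'a " " ""\n"" "" ""'

# Default behaviour: fields are read as character, then converted.
# Columns holding only empty or blank strings end up as logical NA.
DF_default <- read.table(text = txt)
str(DF_default)

# Forcing every column to character keeps the empty strings as they are.
DF_char <- read.table(text = txt, colClasses = "character")
str(DF_char)

# The conversion step in isolation: an all-blank character vector
# has no non-missing values, so the simplest type, logical, is chosen.
converted <- type.convert(c("", " "), as.is = TRUE)
class(converted)
```

If only some columns are affected, colClasses can also be given as a vector with one entry per column, using NA for the columns that should still be converted automatically.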

Brian

> A work-around for me was
>
>> DF[is.na(DF)] <- ""
>> DF
> V1 V2 V3
> 1 a
> 2
>
> But, as I said, I found the behaviour of read.table() puzzling.
>
> All this is with R 2.5.0 on a Windows XP Pro SP 2 system.
>
> Comments?
>
> Thanks,
> John
>
> --------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Wed 09 May 2007 - 16:08:02 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 May 2007 - 06:34:31 GMT.