Re: [Rd] read.csv

From: Ted Harding <Ted.Harding_at_manchester.ac.uk>
Date: Sun, 14 Jun 2009 21:21:24 +0100 (BST)


On 14-Jun-09 18:56:01, Gabor Grothendieck wrote:
> If read.csv's colClasses= argument is NOT used then read.csv accepts
> double quoted numerics:
>
> > read.csv(stdin())
> 0: A,B
> 1: "1",1
> 2: "2",2
> 3:
> A B
> 1 1 1
> 2 2 2
>
> However, if colClasses is used then it seems that it does not:
>
> > read.csv(stdin(), colClasses = "numeric")
> 0: A,B
> 1: "1",1
> 2: "2",2
> 3:
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings, :
> scan() expected 'a real', got '"1"'
>
> Is this really intended? I would have expected that a csv file
> in which each field is surrounded with double quotes is acceptable
> in both cases. This may be documented as is yet seems undesirable
> from both a consistency viewpoint and the viewpoint that it should
> be possible to double quote fields in a csv file.

Well, the default for colClasses is NA, for which ?read.csv says:
  [...]
  Possible values are 'NA' (when 'type.convert' is used),
  [...]
and then ?type.convert says:
  This is principally a helper function for 'read.table'. Given a
  character vector, it attempts to convert it to logical, integer,
  numeric or complex, and failing that converts it to factor unless
  'as.is = TRUE'. The first type that can accept all the non-missing
  values is chosen.

It would seem that type 'logical' won't accept integer (naively one might expect 1 --> TRUE, but see experiment below), so the first acceptable type for "1" is integer, and that is what happens. So it is indeed documented (in the R[ecursive] sense of "documented" :))
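As a rough illustration of that selection order, here is a minimal sketch that calls type.convert() directly on a few character vectors. The variable names are made up for the example, and as.is = TRUE is used only so that unconvertible input stays character instead of becoming a factor.

```r
# Sketch: which type does type.convert() settle on for character input?
# All variable names here are illustrative only.

x_int <- type.convert(c("1", "2"), as.is = TRUE)
class(x_int)   # "integer": logical does not accept "1", integer does

x_lgl <- type.convert(c("TRUE", "FALSE"), as.is = TRUE)
class(x_lgl)   # "logical"

x_num <- type.convert(c("1.5", "2"), as.is = TRUE)
class(x_num)   # "numeric": "1.5" rules out integer

x_chr <- type.convert(c("1", "b"), as.is = TRUE)
class(x_chr)   # "character": nothing numeric accepts "b"
```

So a column that arrives as the strings "1" and "2" comes out as integer, which matches the output in the first example above.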

However, presumably when colClasses is used then type.convert() is not called, in which case R sees itself being asked to assign a character entity (the field '"1"', quotes included) to a destination which it has been told shall be numeric, and therefore it fails with the error shown. Nor does as.is help: its default is
  as.is = !stringsAsFactors
but for this ?read.csv says that stringsAsFactors "is overridden bu [sic] 'as.is' and 'colClasses', both of which allow finer control.", so that wouldn't come to the rescue either.
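For anyone wanting to reproduce the two cases without typing at stdin(), the following is a minimal sketch. It assumes a current R in which read.csv() accepts a text= argument. The names csv_lines, d1, res and d2 are invented for the example, and the final step (read as character, then convert) is only one possible workaround, not something proposed in this thread. For what it is worth, current ?read.table notes under 'quote' that quoting is only considered for columns read as character, which would be consistent with the guess above.

```r
# The same three lines that were typed at stdin() in the report.
csv_lines <- c('A,B', '"1",1', '"2",2')

# Case 1: colClasses left at its default (NA). Fields are read as
# character, quotes are stripped, and type.convert() picks the type.
d1 <- read.csv(text = csv_lines)
sapply(d1, class)

# Case 2: colClasses = "numeric". scan() is asked for numbers directly
# and meets the quoted field. The error is caught so the script goes on.
res <- tryCatch(
  read.csv(text = csv_lines, colClasses = "numeric"),
  error = function(e) conditionMessage(e)
)
res

# Possible workaround (an assumption, not from the thread): read every
# column as character so the quotes are handled, then convert by hand.
d2 <- read.csv(text = csv_lines, colClasses = "character")
d2[] <- lapply(d2, as.numeric)
sapply(d2, class)
```

The explicit conversion costs one extra line but keeps control of the column types while still accepting quoted fields.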

Experiment:
  X <- logical(10)
  class(X)
  # [1] "logical"
  X[1] <- 1
  X
  # [1] 1 0 0 0 0 0 0 0 0 0
  class(X)
  # [1] "numeric"
so R has converted X from class 'logical' to class 'numeric' on being asked to assign a number to a logical; but in this case its hands were not tied by colClasses.
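A small follow-up sketch on the "1 --> TRUE" expectation, using only base R coercion functions (the variable names are again invented for the example). It suggests that the number 1 and the string "1" are treated quite differently when a logical is asked for explicitly, and that assignment promotes the container rather than converting the value.

```r
# Explicit coercion to logical: numbers convert, the string "1" does not.
a <- as.logical(1)        # TRUE
b <- as.logical("1")      # NA: "1" is not a recognised logical literal
c_ <- as.logical("TRUE")  # TRUE

# Assignment goes the other way: the vector is promoted to hold the value.
Y <- logical(3)
Y[1] <- 1L
class(Y)                  # "integer"
```

This is in line with type.convert() passing over 'logical' for a column of "1" and "2".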

Or am I missing something?!!

Ted.



E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 14-Jun-09                                       Time: 21:21:22
------------------------------ XFMail ------------------------------

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun 14 Jun 2009 - 20:32:48 GMT
