From: Petr Savicky <savicky_at_cs.cas.cz>

Date: Tue, 16 Jun 2009 20:09:01 +0200

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Tue 16 Jun 2009 - 18:27:37 GMT

On Sun, Jun 14, 2009 at 09:21:24PM +0100, Ted Harding wrote:

> On 14-Jun-09 18:56:01, Gabor Grothendieck wrote:

> > If read.csv's colClasses= argument is NOT used then read.csv accepts
> > double quoted numerics:
> >
> > > read.csv(stdin())
> > 0: A,B
> > 1: "1",1
> > 2: "2",2
> > 3:
> > A B
> > 1 1 1
> > 2 2 2
> >
> > However, if colClasses is used then it seems that it does not:
> >
> > > read.csv(stdin(), colClasses = "numeric")
> > 0: A,B
> > 1: "1",1
> > 2: "2",2
> > 3:
> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> > na.strings, :
> > scan() expected 'a real', got '"1"'
> >
> > Is this really intended? I would have expected that a csv file
> > in which each field is surrounded with double quotes is acceptable
> > in both cases. This may be documented as is yet seems undesirable
> > from both a consistency viewpoint and the viewpoint that it should
> > be possible to double quote fields in a csv file.
>
> Well, the default for colClasses is NA, for which ?read.csv says:
> [...]
> Possible values are 'NA' (when 'type.convert' is used),
> [...]
> and then ?type.convert says:
> This is principally a helper function for 'read.table'. Given a
> character vector, it attempts to convert it to logical, integer,
> numeric or complex, and failing that converts it to factor unless
> 'as.is = TRUE'. The first type that can accept all the non-missing
> values is chosen.
>
> It would seem that type 'logical' won't accept integer (naively one
> might expect 1 --> TRUE, but see experiment below), so the first
> acceptable type for "1" is integer, and that is what happens.
> So it is indeed documented (in the R[ecursive] sense of "documented" :))
>
> However, presumably when colClasses is used then type.convert() is
> not called, in which case R sees itself being asked to assign a
> character entity to a destination which it has been told shall be
> integer, and therefore, since the default for as.is is
> as.is = !stringsAsFactors
> but for this ?read.csv says that stringsAsFactors "is overridden
> bu [sic] 'as.is' and 'colClasses', both of which allow finer
> control.", so that wouldn't come to the rescue either.
>
> Experiment:
> X <-logical(10)
> class(X)
> # [1] "logical"
> X[1]<-1
> X
> # [1] 1 0 0 0 0 0 0 0 0 0
> class(X)
> # [1] "numeric"
> so R has converted X from class 'logical' to class 'numeric'
> on being asked to assign a number to a logical; but in this
> case its hands were not tied by colClasses.
>
> Or am I missing something?!!

In my opinion, you explain how the difference in behavior between

  read.csv(stdin(), colClasses = "numeric")

and

  read.csv(stdin())

arises, but not why it should be so.

The rule "use the smallest type that accepts all non-missing values"
could equally well be applied to the input values either literally or
after removing the quotes. Is there a reason why

  read.csv(stdin())

removes quotes from the input values and

  read.csv(stdin(), colClasses = "numeric")

does not?
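
To make the two readings of that rule concrete, here is a small sketch (not taken from the thread) that applies type.convert() to the same field values once literally and once after stripping the surrounding quotes. It assumes a version of R in which type.convert() takes an as.is argument; the variable names raw, literal and unquoted are only illustrative.

```r
# Field values as they appear in the file, double quotes included.
raw <- c('"1"', '"2"')

# Rule applied literally: the quote characters rule out every numeric
# type, so the values remain character.
literal <- type.convert(raw, as.is = TRUE)

# Rule applied after removing one leading and one trailing double quote.
unquoted <- type.convert(gsub('^"|"$', "", raw), as.is = TRUE)

class(literal)   # character
class(unquoted)  # integer
```

Either reading is a consistent rule; the question above is only why the choice should depend on whether colClasses is given.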

Double-quote characters are part of the definition of a CSV file; see, for
example,

  http://en.wikipedia.org/wiki/Comma_separated_values

where one may find

  Fields may always be enclosed within double-quote characters, whether
  necessary or not.
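
Until the two cases agree, one possible way to read such a file with explicit column types is to read every column as character, for which quotes are stripped, and convert afterwards. The following is only a sketch under assumptions: it relies on the text= argument of read.csv() (not available in every R version), and the names lines, d1 and d2 are made up for the illustration.

```r
# The example data from the thread, with the first column quoted.
lines <- c('A,B', '"1",1', '"2",2')

# Default: fields are read as character, then type.convert() chooses a type.
d1 <- read.csv(text = lines)

# Explicit types: read as character first, then convert each column.
d2 <- read.csv(text = lines, colClasses = "character")
d2[] <- lapply(d2, as.numeric)

str(d1)
str(d2)
```

The conversion step is where one would handle values that are not valid numbers; as.numeric() turns them into NA with a warning.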

Petr.
