Re: [Rd] read.table() errors with tab as separator (PR#9061)

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Wed 05 Jul 2006 - 11:21:29 GMT

Prof Brian Ripley <ripley@stats.ox.ac.uk> writes:

> On Wed, 5 Jul 2006, Peter Dalgaard wrote:
>
> > John.Maindonald@anu.edu.au writes:
> >
> >> (1) read.table(), with sep="\t", identifies 13 out of 1400 records,
> >> in a file with 1400 records of 3 fields each, as having only 2 fields.
> >> This happens under version 2.3.1 for Windows as well as with
> >> R 2.3.1 for Mac OS X, and with R-devel under Mac OS X.
> >> [R version 2.4.0 Under development (unstable) (2006-07-03 r38478)]
> >>
> >> (2) Using read.table() with sep="\t", the first 1569 records only
> >> of a 1821 record file are input. The file has exactly two fields
> >> in each record, and the minimum length of the second field is
> >> 1 character. If however I extract lines 1561 to 1650 from the
> >> file (the file "short.txt" below), all 90 lines are input.
> >
> > Notice that the single quote is a quote character in read.table (as
> > opposed to read.delim, which uses only the double quote, to cater for
> > TAB-separated files from Excel & friends).
> >
> >> [1] "865\tlinear model (lm)! Cook's distance\t152"
> >                                   ^
> >                                  !!!!
> >
> > (This reminds me that we probably should shift the default for
> > comment.char too since it leads to similar issues, but it seems not to
> > be the problem in this case.)
>
> This seems to imply that we should change the default for 'quote': to
> do so could break a lot of scripts. (Given how long the default has been
> comment.char="#", I doubt if we should change that either.)

Sorry, unclear. We already change quote= for read.delim and read.csv, and I was suggesting also to modify the default for comment.char for those functions, but definitely not for read.table.
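The effect of the differing quote= defaults can be sketched roughly as follows. This is only an illustration: the three records are a made-up sample modelled on the "Cook's distance" line quoted above, the object names are arbitrary, and the text= argument assumes a recent R release rather than the 2.3.1 under discussion.

```r
# Hypothetical three-record sample modelled on the line quoted above.
recs <- c("864\tlinear model (lm)\t150",
          "865\tlinear model (lm)! Cook's distance\t152",
          "866\tlinear model (lm)! Cook's statistic\t153")
txt <- paste(recs, collapse = "\n")

# read.table() defaults to quote = "\"'", so an apostrophe opens a quoted
# string that can swallow tabs and newlines. Capture an error, if any,
# instead of stopping.
default_quote <- tryCatch(
  read.table(text = txt, sep = "\t", header = FALSE,
             stringsAsFactors = FALSE),
  error = function(e) e
)

# read.delim() treats only the double quote as a quote character.
delim <- read.delim(text = txt, header = FALSE, stringsAsFactors = FALSE)

# Staying with read.table(): switch quoting off explicitly.
no_quote <- read.table(text = txt, sep = "\t", quote = "", header = FALSE,
                       stringsAsFactors = FALSE)

nrow(delim)      # every record is kept
nrow(no_quote)   # likewise
```

With the default quote set, the read either fails or returns fewer records than the file contains, which matches the symptoms in the report.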

Arguably, those functions are there to handle file formats generated by other programs, and it is unlikely that such programs will generate comment lines starting with #, whereas we have learned that Excel will occasionally generate fields like #NULL#, which mess up the parsing.
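A rough sketch of the comment.char problem, under the same caveats: the records are invented for illustration, the object names are arbitrary, text= assumes a recent R release, and comment.char is passed explicitly so that the example does not depend on any particular version's defaults.

```r
# Hypothetical tab-separated sample with an Excel-style "#NULL#" field.
recs <- c("1\tabc\t10",
          "2\t#NULL#\t20",
          "3\tdef\t30")
txt <- paste(recs, collapse = "\n")

# With comment.char = "#", everything from the "#" to the end of the line is
# dropped, so record 2 comes up short. Capture the error instead of stopping.
with_comment <- tryCatch(
  read.table(text = txt, sep = "\t", quote = "", comment.char = "#",
             header = FALSE, stringsAsFactors = FALSE),
  error = function(e) e
)

# With comment handling switched off, the field is read as ordinary text.
no_comment <- read.table(text = txt, sep = "\t", quote = "",
                         comment.char = "", header = FALSE,
                         stringsAsFactors = FALSE)

no_comment$V2    # the "#NULL#" field survives intact
```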

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Jul 05 21:24:43 2006