Re: [Rd] Bug in read.table?

From: peter dalgaard
Date: Tue, 16 Nov 2010 14:04:16 +0100

On Nov 16, 2010, at 02:59 , Ben Bolker wrote:

> Ben Bolker <bbolker <at>> writes:

>> Ben Bolker <bbolker <at>> writes:

>> Can simplify this still further:
>> a b'c
>> d e'f
>> g h'i
>  This example file leads to duplicate lines.
> Arguably it should have behavior analogous to:

> > scan(what="")
> 1: a b'c
> 3: d e'f
> 5: g h'i
> 7:
> Read 6 items
> [1] "a"   "b'c" "d"   "e'f" "g"   "h'i"

>>> One of the first things that happens in read.table is that
>>> the first few lines are read with readTableHead:
>>>  lines <- .Internal(readTableHead(file, nlines, comment.char, 
>>>       blank.lines.skip, quote, sep))

>> in this case, this reads the first two lines as one line;
>> the single quote at pos. 4 of the first line closes on pos.
>> 4 of the second line, preventing the first new line from
>> ending a line.
>> R then pushes back two copies of the lines that have
>> been read (this is normal behavior; I don't quite follow the
>> logic).
>> The rest of the file is read with scan(), 1 line at a time.
>> However, there is the discrepancy between the way
>> that readTableHead interprets new lines in the middle of
>> quoted strings (it ignores them) and the way that scan()
>> interprets them (it takes them as the end of the quoted string).
>  Ping?
>  I think this counts as a small, but real, bug. Should I go ahead
> and report it as such, or would someone explain why it's not a bug?
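
For anyone who wants to try this, here is a minimal sketch of a reproduction. The temporary file and the object names (f, items, tab, tab_noquote) are illustrative and not part of the original report. What the default read.table() call returns may well depend on the R version, so no output is shown for it.

```r
# Hypothetical reproduction; file and object names are illustrative only.
f <- tempfile(fileext = ".txt")
writeLines(c("a b'c", "d e'f", "g h'i"), f)

# The file on disk has exactly three lines.
n_lines <- length(readLines(f))

# scan() with its default quote handling, reading whitespace-separated strings.
items <- scan(f, what = "", quiet = TRUE)

# read.table() on the same file; an error is captured instead of stopping the script.
tab <- tryCatch(read.table(f, stringsAsFactors = FALSE),
                error = function(e) conditionMessage(e))

print(n_lines)
print(items)
print(tab)

# Switching quote processing off removes the ambiguity altogether.
tab_noquote <- read.table(f, quote = "", stringsAsFactors = FALSE)
print(tab_noquote)

unlink(f)
```

Comparing nrow(tab) with n_lines is a quick check for the duplicated lines described above. The quote = "" call is the workaround for files where an apostrophe is data and not a delimiter.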

I think filing it as a bug can be defended, but it is tricky to pinpoint exactly what the issue is. For example, notice that adding a few spaces changes the behaviour of scan() considerably:

> scan(what="")

1:  a b 'c
1: d e' f
5: g h' i
Read 7 items
[1] "a"      "b"      "c\nd e" "f"      "g"      "h'"     "i"     

(I'm confused... What is it that we really want here?)
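
A non-interactive version of the same experiment, as a rough sketch: the vector name spaced and the result names are made up for illustration, and the input lines are copied from the transcript above (including the leading space on the first line). The first call uses scan()'s default quote handling, so its result should be compared against the transcript and not taken for granted. The second call turns quote handling off.

```r
# Hypothetical names; the three input lines mirror the transcript above.
spaced <- c(" a b 'c", "d e' f", "g h' i")

# Default quoting: a quote character at the start of a field opens a quoted string.
with_quotes <- scan(text = spaced, what = "", quiet = TRUE)

# No quoting: every whitespace-separated token is returned as-is.
without_quotes <- scan(text = spaced, what = "", quote = "", quiet = TRUE)

print(with_quotes)
print(without_quotes)
```

With quote = "" the apostrophes are ordinary characters, so the nine tokens come back unchanged. That makes it a useful baseline when deciding what the quoted case ought to do.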

Also, as you noted originally, beware the "Well don't do that then" aspect...

Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501

Received on Tue 16 Nov 2010 - 13:08:24 GMT
