Re: [Rd] Bug in read.table?

From: Ben Bolker <bbolker_at_gmail.com>
Date: Tue, 16 Nov 2010 01:59:26 +0000 (UTC)

Ben Bolker <bbolker <at> gmail.com> writes:

>
> Ben Bolker <bbolker <at> gmail.com> writes:
>
> >
> >
>
> Can simplify this still further:
>
> a b'c
> d e'f
> g h'i

  Reading this example file with read.table() produces duplicated lines. Arguably it should behave analogously to:

> scan(what="")

1: a b'c
3: d e'f
5: g h'i
7:
Read 6 items

[1] "a" "b'c" "d" "e'f" "g" "h'i"

>
> > One of the first things that happens in read.table is that
> > the first few lines are read with readTableHead:
> >
> > lines <- .Internal(readTableHead(file, nlines, comment.char,
> > blank.lines.skip, quote, sep))
> >
> In this case, this reads the first two lines as one line:
> the single quote at position 4 of the first line is closed by
> the one at position 4 of the second line, so the first newline
> does not end the line.
>
> R then pushes back two copies of the lines that have
> been read (this is normal behavior; I don't quite follow the
> logic).
>
> The rest of the file is read with scan(), 1 line at a time.
> However, there is a discrepancy between the way
> that readTableHead interprets newlines in the middle of
> quoted strings (it ignores them) and the way that scan()
> interprets them (it takes them as the end of the quoted string).
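
  For anyone wanting to reproduce this, here is a minimal sketch. The temporary file name `tf` and the variable names are mine, chosen for illustration. What read.table() returns may well depend on the R version, so no output is shown for it. The call is wrapped in try() in case a given version stops with an error rather than returning a data frame.

```r
# Write the three-line example to a temporary file (tf is an illustrative name).
tf <- tempfile(fileext = ".txt")
writeLines(c("a b'c", "d e'f", "g h'i"), tf)

# read.table() with its default quote = "\"'".
# try() keeps the script going if this call fails in some R version.
tab <- try(read.table(tf, header = FALSE, stringsAsFactors = FALSE))
print(tab)

# scan() on the same file, for comparison with the interactive session above.
items <- scan(tf, what = "")
print(items)
```

  Comparing `tab` with `items` should show whether the head of the file and the rest of it are being tokenized consistently.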

  Ping?
  I think this counts as a small, but real, bug. Should I go ahead and report it as such, or would someone explain why it's not a bug?

  cheers
    Ben Bolker



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 16 Nov 2010 - 02:02:22 GMT

