Re: [Rd] (PR#14103) read.csv confused by newline characters in

From: <ripley_at_stats.ox.ac.uk>
Date: Fri, 04 Dec 2009 13:40:25 +0100 (CET)



It's not to do with pushback per se. That works as one might expect, e.g.

f <- file("test.txt", "r")
pushBack('"A1\nA2"', f)
pushBackLength(f)
scan(f, "", quote='"')

gives "A1\nA2" on a single line, then whatever was in test.txt. Rather, the issue is

         if (header) {
             readLines(file, 1L)          # skip over header

and readLines() is not quote-aware, so it stops at the embedded newline and leaves the rest of the header on the connection. The fix is to read the header again the same way as before.
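
To make the difference concrete, here is a minimal sketch (not the code in readtable.R). The input string is the one from the original report; the variable names txt, first and hdr are made up for illustration. It contrasts a line-oriented read with a quote-aware read of the same header.

```r
# Input from the bug report: a quoted header field containing a newline,
# followed by one data row.
txt <- '"B1\nB2"\n1\n'

# Line-oriented read: readLines() knows nothing about quoting, so it
# returns only the text up to the embedded newline.
con <- textConnection(txt)
first <- readLines(con, 1L)
close(con)

# Quote-aware read: scan() with a quote set keeps the embedded newline
# inside the quoted field and treats it as part of one item.
con <- textConnection(txt)
hdr <- scan(con, what = "", quote = '"', nlines = 1, quiet = TRUE)
close(con)

print(first)
print(hdr)
```

Skipping the header with the first form consumes only part of it, which would explain why B2 turns up again in the data.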

It seems to me that this is esoteric and the fix could tickle similar esoteric constructions, so I am only going to put the fix into R-devel and not the upcoming 2.10.1.

On Wed, 2 Dec 2009, Peter Dalgaard wrote:

> g.russell@eos-solutions.com wrote:
>> Full_Name: George Russell
>> Version: 2.10.0
>> OS: Microsoft Windows XP Service Pack 2
>> Submission from: (NULL) (217.111.3.131)
>>
>>
>> The following code (typed into R --vanilla)
>>
>> testString <- '"B1\nB2"\n1\n'
>> con <- textConnection(testString)
>> tab <- read.csv(con,stringsAsFactors = FALSE)
>>
>> produces a data frame with one row and one column; the name of the column
>> is "B1.B2" (alright so far). However according to
>> print(tab[[1,1]])
>>
>> the value of the entry in the first row and first column is
>>
>> "B2\n1\n"
>>
>> So B2 has somehow got into both the names of the data frame and its entry.
>> Either R is confused or I am. What is going on?
>
> Presumably, read.table is not obeying quotes when removing what it
> thinks is the header line. Another variation is this:
>
>> tab <- read.table(stdin(), head=T)
> 0: "B1
> 0: B2"
> 1: 1
> 2:
>> tab
> B1.B2
> 1 B2"
> 2 1
>
>
> It's somehow connected to the
>
> pushBack(c(lines, lines), file)
>
> bits in readtable.R, but I don't quite get it.
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45) 35327907
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Fri 04 Dec 2009 - 12:44:14 GMT
