[Rd] Wish: change behaviour of header in read.fwf (PR#9252)

From: <gregor.gorjanc_at_bfro.uni-lj.si>
Date: Tue 26 Sep 2006 - 00:01:41 GMT


Hello!

In my opinion, the way read.fwf() handles header is not very useful. Say I have the following data:

col1 col2 col3
 123 123 123
   a b
1234 12 1234
      65.4 4.5

Now, if I want to read these data into R, I cannot use read.table() because of the missing fields.

read.table(file="test.txt")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
        line 3 did not have 3 elements

However, read.fwf() can help me.

read.fwf(file="test.txt", widths=c(5, 6, 5))

     V1 V2 V3
1 col1 col2 col3
2 123 123 123
3    a             b
4 1234     12   1234
5        65.4    4.5

Oops, I need to specify header, and the help page says that the header fields must be separated by sep. The description of sep on the help page says:

     sep: character; the separator used internally; should be a
          character that does not occur in the file (except in the
          header).
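The restriction on sep makes more sense once you see where sep ends up: the traceback below shows read.fwf() calling read.table(file = FILE, header = header, sep = sep, ...). The following is a simplified sketch of that idea, not the actual read.fwf() source. The helper name fwf_via_sep() is hypothetical, and it assumes that each line is cut at fixed positions, the pieces are glued back together with sep, and read.table() then splits them again on sep. Any sep character already present in the data would therefore produce spurious extra fields.

```r
## Hypothetical helper: a simplified sketch of the idea behind read.fwf(),
## NOT its actual source. Each line is cut at fixed character positions,
## the pieces are joined with `sep`, and read.table() splits them on `sep`.
fwf_via_sep <- function(lines, widths, sep = "\t") {
  last  <- cumsum(widths)          # end position of each field
  first <- last - widths + 1       # start position of each field
  glued <- vapply(lines,
                  function(x) paste(substring(x, first, last), collapse = sep),
                  character(1), USE.NAMES = FALSE)
  ## If `sep` already occurred inside a field, this second split would
  ## yield extra columns; hence `sep` must not appear in the data.
  read.table(text = glued, sep = sep, quote = "", strip.white = TRUE,
             colClasses = "character")
}

## Assumed example lines, padded to widths c(5, 6, 5)
txt <- sprintf("%5s%6s%5s", c("123", "1234"), c("123", "12"), c("123", "1234"))
fwf_via_sep(txt, widths = c(5, 6, 5))
```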

This is quite limiting: I never know in advance which characters do not occur in a data file, and even if I do, I have to modify the header in the file accordingly before importing it. Naive use of read.fwf() returns an error:

read.fwf(file="test.txt", widths=c(5, 6, 5), header=TRUE, sep=" ")
Error in read.table(file = FILE, header = header, sep = sep, as.is = as.is, :
        more columns than column names

read.fwf(file="test.txt", widths=c(5, 6, 5), header=TRUE, sep=" ")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
        invalid 'sep' value: must be one byte

I got lost reading the source of read.fwf(), but I think the following idea should be easy to implement, and it would also be consistent with the behaviour of read.table().

<ideaCode>

if(header) {
  ## sep is from read.fwf call
  header <- unlist(strsplit(readLines(con=file, n=1), split=sep))
}
...
## tweaks related to issues with length(header), row.names, ncol(), ...
read.table(..., col.names=header, ...)

</ideaCode>
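Until something like this is implemented, the proposal can be approximated at user level. This is only a sketch under stated assumptions: read_fwf_header() is a hypothetical helper, the header line is assumed to hold white-space separated names (split with scan() rather than on sep), and the example file is generated with sprintf() so that its spacing matches widths = c(5, 6, 5), because the exact spacing of test.txt above is not known. It relies only on the existing skip and col.names arguments of read.fwf() and on strip.white, which read.fwf() passes on to read.table().

```r
## Hypothetical helper (not part of R): emulate the proposed behaviour by
## taking column names from the first line, split on white space rather
## than on `sep`, and parsing only the remaining lines by position.
read_fwf_header <- function(file, widths, ...) {
  ## scan() with its default sep = "" splits on runs of white space
  hdr <- scan(file, what = "", nlines = 1, quiet = TRUE)
  ## skip = 1 keeps the header out of the fixed-width parsing;
  ## strip.white = TRUE (passed on to read.table) trims padding blanks
  read.fwf(file, widths = widths, skip = 1, col.names = hdr,
           strip.white = TRUE, ...)
}

## Assumed example data, padded to match widths = c(5, 6, 5)
tf <- tempfile(fileext = ".txt")
writeLines(c("col1 col2 col3",
             sprintf("%5s%6s%5s",
                     c("123", "a", "1234", ""),
                     c("123", "", "12", "65.4"),
                     c("123", "b", "1234", "4.5"))),
           tf)
dat <- read_fwf_header(tf, widths = c(5, 6, 5))
print(dat)
```

Because the names never pass through read.table()'s sep-based splitting, the header does not constrain the choice of sep.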

I know that the fixed-width format (FWF) is not used much these days, but I would find the proposed change really useful.

-- 
Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty
Zootechnical Department
Groblje 3
SI-1230 Domzale
Slovenia, Europe
PhD student
URI: http://www.bfro.uni-lj.si/MR/ggorjan
mail: gregor.gorjanc <at> bfro.uni-lj.si
tel: +386 (0)1 72 17 861
fax: +386 (0)1 72 17 888
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it, you have no certainty until you try." Sophocles ~ 450 B.C.
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue Sep 26 10:04:41 2006
