[R] read.table: fill=T for header?

From: Philipp Pagel <p.pagel_at_wzw.tum.de>
Date: Wed, 27 Apr 2011 14:15:33 +0200

Dear ExpeRts,

I am trying to read tab-delimited data produced by somewhat brain-dead software that seems to think it's a good idea to have an extra tab character after the last column - except for the header line. As explained in the help page, read.delim now assumes that the first column contains the row.names (which is not even wrong), and as a result all col.names get shifted by one column. Example:

infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'
read.delim(textConnection(infile))

    sample x1
  1      A NA
  2      B NA
  3      A NA
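
The mismatch behind this shift can be made visible by counting fields per line. The following is a minimal sketch using base R's count.fields on the same sample string; the variable name n_fields is only illustrative.

```r
# Same sample data as above: the header has no trailing tab, the data lines do.
infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'

con <- textConnection(infile)
n_fields <- count.fields(con, sep = "\t")
close(con)

# One count per line. The header has one field fewer than the data lines,
# which is the condition under which read.table uses the first column
# as row names.
print(n_fields)
```

If the first count is exactly one less than the others, the file has this particular flaw.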

So I set row.names to NULL because the man page says "Using ‘row.names = NULL’ forces row numbering". Now the row.names really are numbered automatically, but I get a "bonus column":

read.delim(textConnection(infile), row.names=NULL)

    row.names sample x1
  1         1      A NA
  2         2      B NA
  3         3      A NA

Hm - not what I want. I am also a bit puzzled why the extra column is introduced instead of just using the first col.name. At the moment I deal with it by fixing the col.names and dumping the extra column:

dat <- read.delim(textConnection(infile), row.names=NULL)
colnames(dat) <- colnames(dat)[-1]
dat <- dat[-ncol(dat)]

    sample x1
  1      1  A
  2      2  B
  3      3  A
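
One way to avoid renaming columns afterwards might be to strip the trailing tab from the data lines before parsing. This is only a sketch, assuming every data line carries exactly one surplus trailing tab and the header carries none; the helper name read_trailing_tab is hypothetical and not part of R.

```r
# Hypothetical helper (not part of base R): parse tab-delimited text whose
# data lines end in one surplus tab that the header line lacks.
read_trailing_tab <- function(lines, ...) {
  # Remove exactly one trailing tab from each data line; leave the header alone.
  lines[-1] <- sub("\t$", "", lines[-1])
  con <- textConnection(lines)
  on.exit(close(con))
  read.delim(con, ...)
}

infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'
dat <- read_trailing_tab(strsplit(infile, "\n", fixed = TRUE)[[1]])
print(dat)
```

For a file on disk, the lines would come from readLines() instead of strsplit(). Because only one tab is removed per line, a genuinely empty value in the last real column is preserved.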

I worked my way through ?read.delim but could not find an option to deal with these (flawed) files directly. As the opposite situation (i.e. more col.names than data) can be fixed with fill=T I was hoping something like fill.header=T or fill='header' may exist. Did I just not find it or does it not exist? And if it doesn't - does anyone else think it would be a nice item for the wishlist?
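
Short of a dedicated option, the existing header, skip and col.names arguments can probably be combined to get a similar effect with two reads. The sketch below assumes the surplus column is always the last one; the placeholder column name "extra_col" is made up for illustration.

```r
infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'

# Read only the header line to get the real column names.
con <- textConnection(infile)
hdr <- scan(con, what = "", sep = "\t", nlines = 1, quiet = TRUE)
close(con)

# Read the data lines without a header, giving the surplus last column
# a placeholder name ("extra_col" is arbitrary).
con <- textConnection(infile)
dat <- read.delim(con, header = FALSE, skip = 1,
                  col.names = c(hdr, "extra_col"))
close(con)

# Drop the surplus last column.
dat <- dat[-ncol(dat)]
print(dat)
```

This keeps the real column names attached to the right columns, at the cost of reading the header separately.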



Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany

Received on Wed 27 Apr 2011 - 12:22:41 GMT
