[Rd] read.table() errors with tab as separator (PR#9061)

From: <John.Maindonald_at_anu.edu.au>
Date: Wed 05 Jul 2006 - 09:35:01 GMT


(1) In a file of 1400 records with 3 fields each, read.table() with sep="\t" identifies 13 records as having only 2 fields. This happens under R 2.3.1 for Windows as well as R 2.3.1 for Mac OS X, and with R-devel under Mac OS X [R version 2.4.0 Under development (unstable) (2006-07-03 r38478)].

(2) With sep="\t", read.table() inputs only the first 1569 records of an 1821-record file. The file has exactly two fields in each record, and the second field is always at least 1 character long. If, however, I extract lines 1561 to 1650 from the file (the file "short.txt" below), all 90 lines are input.

> webtwo <- "http://www.maths.anu.edu.au/~johnm/testfiles/twotabs.txt"
> xy <- read.table(url(webtwo), sep="\t")
Warning message:
number of items read is not a multiple of the number of columns
> z <- count.fields(url(webtwo), sep="\t")
> table(z)
z
   2    3
  13 1387
> table(sapply(strsplit(readLines(url(webtwo)), split="\t"), length))

    3
1400
> readLines(url(webtwo))[z==2][9:13] # last 5 as a sample (shorter lines)

[1] "865\tlinear model (lm)! Cook's distance\t152"
[2] "1019\tlinear model (lm)! Cook's distance\t177"
[3] "1048\tlinear model (lm)! Cook's distance\t183"
[4] "1082\tlinear model (lm)! Cook's distance\t187"
[5] "1220\tlinear model (lm)! Cook's distance\t214"
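
A possible self-contained check, not part of the original report. It rests on an unconfirmed assumption: that the apostrophe in "Cook's distance" is being treated as a quote character, since read.table() and count.fields() both default to quote = "\"'". The two lines are copied from the sample output above; `sample_lines`, `n_noquote` and `n_default` are illustrative names.

```r
# Two of the affected lines, copied from the sample output above.
sample_lines <- c("865\tlinear model (lm)! Cook's distance\t152",
                  "1019\tlinear model (lm)! Cook's distance\t177")

# Field counts with quote handling switched off (quote = "").
con <- textConnection(sample_lines)
n_noquote <- count.fields(con, sep = "\t", quote = "")
close(con)
print(n_noquote)

# For comparison: field counts with the default quote = "\"'".
# (Assumption under test: the apostrophes change how fields are counted.)
con <- textConnection(sample_lines)
n_default <- count.fields(con, sep = "\t")
close(con)
print(n_default)
```

If the two results differ, passing quote = "" to read.table() would be the natural thing to try on twotabs.txt.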

> weblong <- "http://www.maths.anu.edu.au/~johnm/testfiles/long.txt"
> webshort <- "http://www.maths.anu.edu.au/~johnm/testfiles/short.txt"
> xyLong <- read.table(url(weblong), sep="\t")
> dim(xyLong) # Should be 1821 x 2

[1] 1569 2
> xyShort <- read.table(url(webshort), sep="\t")
> dim(xyShort) # Should be, and will be, 90 x 2
[1] 90 2
> long <- readLines(url(weblong))
> short <- readLines(url(webshort))
> length(long)

[1] 1821
> length(short)

[1] 90
> all(long[1561:1650]==short) # short is lines 1561:1650 of long
[1] TRUE
> ## Moreover strsplit() can pick up the \t's correctly
> lsplit <- strsplit(long, "\t")
> table(sapply(lsplit, length))

    2
1821
> # Try also table(sapply(lsplit, function(x)x[2]))
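
One way to narrow down case (2), offered as a sketch rather than a diagnosis: list the lines containing a character that read.table() treats specially by default (quote = "\"'" and comment.char = "#"). `flag_special()` is a hypothetical helper, and the toy vector stands in for `long`.

```r
# Hypothetical helper: indices of lines containing ', " or #.
flag_special <- function(x) which(grepl("['\"#]", x))

# Toy stand-in for the real data (assumed, not taken from long.txt).
toy <- c("a\tb", "c\td's", "e\t#f", "g\th")
flagged <- flag_special(toy)
print(flagged)
```

On the real data this would be flag_special(long); any flagged lines are candidates for where the record count goes astray.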

--please do not edit the information below--

Version:
platform = powerpc-apple-darwin8.6.0
arch = powerpc
os = darwin8.6.0
system = powerpc, darwin8.6.0
status =
major = 2
minor = 3.1

year = 2006
month = 06
day = 01
svn rev = 38247
language = R
version.string = Version 2.3.1 (2006-06-01)

Locale:
C

Search Path:
.GlobalEnv, package:lattice, package:methods, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, Autoloads, package:base



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Jul 05 19:39:18 2006