Re: [Rd] read.table() can't read in this table (But Splus can)

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Tue, 15 May 2007 09:11:58 -0400

It's the quoting character(s). The following seems to read the file in correctly:

R> DF <- read.table("http://llmpp.nih.gov/DLBCL/NEJM_Web_Fig1data",
+                  header = TRUE, sep = "\t", quote = "")
R> str(DF)
'data.frame': 7399 obs. of 295 variables: [...]
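
As a rough way to check that quoting is the culprit, one can compare per-line field counts with and without quote processing using count.fields(). This is only a sketch on a made-up four-line file: the field values and the temporary file are hypothetical, not taken from the NEJM data, and the stray ' in two rows stands in for whatever quote character appears in the real file.

```r
# Made-up tab-delimited data: data rows 1 and 3 each contain a lone apostrophe.
lines <- c(
  "ID\tNAME\tVALUE",
  "1\tclone x' fragment\t0.1",
  "2\tclone y\t0.2",
  "3\tclone z' fragment\t0.3"
)
tf <- tempfile(fileext = ".txt")
writeLines(lines, tf)

# Default quote = "\"'": an apostrophe opens a quoted string that can run
# across line breaks, so several lines may be merged into one record
# (see ?count.fields for how such lines are reported).
n_default <- count.fields(tf, sep = "\t")

# With quoting disabled, every line is split on tabs only.
n_noquote <- count.fields(tf, sep = "\t", quote = "")

print(n_default)
print(n_noquote)
unlink(tf)
```

If the two vectors differ, the file most likely contains quote characters that should be read literally, and quote = "" is worth trying in read.table().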

If I had to guess, it's the apostrophe used to write "3-prime" or "5-prime" (3', 5') that occurs commonly in biology; by default read.table() treats ' as a quoting character...
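
To illustrate the guess above, here is a minimal sketch with invented gene descriptions (the rows below are hypothetical, not from the NEJM file). It reads the same text twice, once with the default quote = "\"'" and once with quote = "", and compares the number of rows obtained.

```r
# Hypothetical tab-delimited table; two descriptions contain an apostrophe.
lines <- c(
  "UNIQID\tNAME\tVALUE",
  "1\tgene A 3' end\t0.1",
  "2\tgene B\t0.2",
  "3\tgene C 5' end\t0.3"
)

# Default quoting: the apostrophes are taken as string delimiters, so the
# result may have fewer rows than expected, or the call may fail outright.
bad <- try(suppressWarnings(
  read.table(text = lines, header = TRUE, sep = "\t")
), silent = TRUE)
if (is.data.frame(bad)) print(nrow(bad))

# Quoting disabled: apostrophes are ordinary characters.
good <- read.table(text = lines, header = TRUE, sep = "\t", quote = "")
print(nrow(good))
print(as.character(good$NAME))
```

With quote = "" all three data rows come back intact, which matches the behaviour seen on the full file.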

I don't think Mr. 9000 Vax can blame R for this.

Best,
Andy  

From: marc_schwartz_at_comcast.net
>
> On Mon, 2007-05-14 at 23:41 +0200, vax9000@gmail.com wrote:
> > Full_Name: vax, 9000
> > Version: 2.4.0, 2.2.1
> > OS: 2.4.0: Mac OS X; 2.2.1: Linux
> > Submission from: (NULL) (192.35.79.70)
> >
> >
> > To reproduce this bug, first go to the website "http://llmpp.nih.gov/DLBCL/" and
> > download the 14.8M data set "Web Figure 1 Data file". The direct link is
> > "http://llmpp.nih.gov/DLBCL/NEJM_Web_Fig1data". Save it as "datafile.txt".
> >
> > Then, start R, type in command "x <- read.table("datafile.txt", header=TRUE,
> > sep="\t")". The data has 7400 lines, but not all lines could be read in by R.
> >
> > Easier test data set:
> > Use the command "head -n 100 datafile.txt > shortdatafile.txt" to extract the
> > first 100 lines. The R command "x <- read.table("shortdatafile.txt", header=TRUE,
> > sep="\t")" could not read in even these 100 lines of data.
> >
> > But Splus can, with the same command. What is wrong?
>
> Using R version 2.5.0 Patched:
>
> > DF <- read.table("http://llmpp.nih.gov/DLBCL/NEJM_Web_Fig1data",
> +                  header = TRUE, sep = "\t")
> Warning message:
> number of items read is not a multiple of the number of columns
>
>
> So I tried it with 'fill = TRUE' and that seems to work, which suggests
> that perhaps something is going on with the data file structure:
>
> DF <- read.table("http://llmpp.nih.gov/DLBCL/NEJM_Web_Fig1data",
> header = TRUE, sep = "\t", fill = TRUE)
>
> > str(DF)
> 'data.frame': 4734 obs. of 295 variables:
>  $ UNIQID                            : int  27481 17013 24751 27498 27486 30984 17293 28329 27459 27482 ...
>  $ NAME                              : Factor w/ 4040 levels "||*AA037178|Hs.179661|FK506 binding protein 1A (12kD)",..: 3444 3445 3446 3444 3445 657 1788 3121 3119 3119 ...
>  $ MLC94.46_LYM009_de.novo.untreated : num  0.234 0.452 0.405 0.115 0.249 ...
>  $ MLC96.45_LYM186_de.novo.untreated : num  -0.1725 -0.0387 -0.0413 -0.0242 -0.1028 ...
>  $ MLC91.27_LYM427_de.novo.untreated : num  0.200 0.175 0.195 0.223 0.179 ...
>  $ MLC96.84_LYM225_transformed       : num  -0.213 -0.325 -0.200 -0.199 -0.155 ...
>  $ MLC95.43_LYM095_de.novo.untreated : num  -0.1197 0.0038 -0.0213 -0.0705 -0.0755 ...
>  $ MLC91.28_LYM428_de.novo.untreated : num  -0.3729 0.0047 -0.2220 -0.3373 -0.2808 ...
>  $ MLC94.50_LYM004_de.novo.untreated : num  -0.195 -0.224 -0.126 -0.161 -0.199 ...
>  $ MLC95.46_LYM098_de.novo.untreated : num  0.489 0.611 0.577 0.661 0.519 ...
>  $ MLC95.62_LYM114_de.novo.untreated : num  0.390 0.657 0.747 0.723 0.731 ...
>  $ MLC95.85_LYM137_de.novo.untreated : num  -0.277 -0.564 -0.297 -0.140 -0.513 ...
> ..
>
>
> I would update your version of R and then re-try this.
>
> HTH,
>
> Marc Schwartz
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Received on Tue 15 May 2007 - 13:15:51 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 15 May 2007 - 15:04:26 GMT.
