Re: [R] Building package - tab delimited example data issue

From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>
Date: Thu, 06 Dec 2007 11:52:46 +0100

Johannes Graumann wrote:
> Hello,
>
> I'm trying to integrate example data in the shape of a tab delimited ASCII
> file into my package and therefore dropped it into the data subdirectory.
> The build works out just fine, but when I attempt to install I get:
>
> ** building package indices ...
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings, :
> line 1 did not have 500 elements
> Calls: <Anonymous> ... <Anonymous> -> switch -> assign -> read.table -> scan
> Execution halted
> ERROR: installing package indices failed
> ** Removing '/usr/local/lib/R/site-library/MaxQuantUtils'
> ** Restoring previous '/usr/local/lib/R/site-library/MaxQuantUtils'
>
> Accordingly the check delivers:
>
> ...
> * checking whether package 'MaxQuantUtils' can be installed ... ERROR
>
> Can anyone tell me what I'm doing wrong? build/install without the ASCII file
> works just fine.
>
> Joh
>
>
If you had looked at help(data), you would have found a list of the file formats it supports and how each is read. Hint: TAB-delimited files are not among them. *Whitespace*-separated files (.txt, .tab) do work; they are read using read.table(filename, header=TRUE). But whitespace-separated input is not a superset of TAB-delimited data: if there are empty fields, adjacent TABs are collapsed into a single separator and the row ends up with too few fields.
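
A minimal sketch of the empty-field problem, using a made-up three-column TAB-delimited snippet (the data and the names txt, bad and good are purely illustrative). With the default sep = "", which is what data() effectively uses for .txt files, the two adjacent TABs in the second data row collapse into one separator and read.table() stops with the same kind of scan() error as in the question; with sep = "\t" the empty field is kept.

```r
## Hypothetical TAB-delimited data; the second data row has an empty 'name' field.
txt <- "id\tname\tscore\n1\tA\t10\n2\t\t20\n"

## Default sep = "": any run of white space (including several TABs) counts as
## one separator, so the second data row yields only 2 fields and scan() fails.
bad <- try(read.table(text = txt, header = TRUE), silent = TRUE)
inherits(bad, "try-error")   # TRUE

## sep = "\t": each TAB is a separator, so the empty field survives as "".
good <- read.table(text = txt, header = TRUE, sep = "\t")
good
```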

A nice trick is to work out how to read the data at the R prompt and put the relevant code into a mydata.R file (assuming that the actual data file is mydata.txt). That file is executed when the data are loaded (by data(mydata) or when the lazy-load database is built), because .R files take priority over .txt files. It is sourced with the data directory as the working directory, so it can refer to mydata.txt by its bare name.
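
The mechanism can be mimicked outside a package. The sketch below assumes a throw-away directory under tempdir() standing in for the package's data subdirectory, with made-up contents for mydata.txt; it writes a one-line mydata.R that reads the TAB-delimited file with read.delim() and then sources it roughly the way data() treats an .R file, i.e. into a fresh environment with chdir = TRUE. In a real package the two files would simply sit in data/ and data(mydata) would do the sourcing.

```r
## Hypothetical stand-in for a package's data/ directory.
datadir <- file.path(tempdir(), "data")
dir.create(datadir, showWarnings = FALSE)

## The raw TAB-delimited file (made-up contents, one empty field), kept as is.
writeLines(c("id\tname\tscore", "1\tA\t10", "2\t\t20"),
           file.path(datadir, "mydata.txt"))

## The companion mydata.R: ordinary R code that reads the TAB-delimited file.
## A bare file name suffices because the file is sourced from its own directory.
writeLines('mydata <- read.delim("mydata.txt", stringsAsFactors = FALSE)',
           file.path(datadir, "mydata.R"))

## Roughly what data(mydata) does with an .R file: source it into a new
## environment after changing to the directory that contains it.
env <- new.env()
sys.source(file.path(datadir, "mydata.R"), envir = env, chdir = TRUE)
str(env$mydata)
```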

This is quite general and gives a neat way to build data management (recoding, date conversion and the like) into the package while keeping the original data source unchanged:

>more ISwR/data/stroke.R

stroke <- read.csv2("stroke.csv", na.strings=".")
names(stroke) <- tolower(names(stroke))

stroke <-  within(stroke,{
    sex <- factor(sex,levels=0:1,labels=c("Female","Male"))
    dgn <- factor(dgn)
    coma <- factor(coma, levels=0:1, labels=c("No","Yes"))
    minf <- factor(minf, levels=0:1, labels=c("No","Yes"))
    diab <- factor(diab, levels=0:1, labels=c("No","Yes"))
    han <- factor(han, levels=0:1, labels=c("No","Yes"))
    died <- as.Date(died, format="%d.%m.%Y")
    dstr <- as.Date(dstr,format="%d.%m.%Y")
    dead <- !is.na(died) & died < as.Date("1996-01-01")
    died[!dead] <- NA
})

>head ISwR/data/stroke.csv

SEX;DIED;DSTR;AGE;DGN;COMA;DIAB;MINF;HAN
1;7.01.1991;2.01.1991;76;INF;0;0;1;0
1;.;3.01.1991;58;INF;0;0;0;0
1;2.06.1991;8.01.1991;74;INF;0;0;1;1
0;13.01.1991;11.01.1991;77;ICH;0;1;0;1
0;23.01.1996;13.01.1991;76;INF;0;1;0;1
1;13.01.1991;13.01.1991;48;ICH;1;0;0;1
0;1.12.1993;14.01.1991;81;INF;0;0;0;1
1;12.12.1991;14.01.1991;53;INF;0;0;1;1
0;.;15.01.1991;73;ID;0;0;0;1
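
As a check on what stroke.R does, the sketch below applies its reading, date-conversion and censoring steps to the first, second and fifth data rows of the excerpt above (the object names csv and s are illustrative, and the factor recoding is left out for brevity). The missing death date "." becomes NA, and the death dated 23.01.1996, after the 1 January 1996 cut-off, is not counted as a death.

```r
## Three rows from the stroke.csv excerpt: an observed death, a missing
## death date ("."), and a death dated after 1 January 1996.
csv <- "SEX;DIED;DSTR;AGE;DGN;COMA;DIAB;MINF;HAN
1;7.01.1991;2.01.1991;76;INF;0;0;1;0
1;.;3.01.1991;58;INF;0;0;0;0
0;23.01.1996;13.01.1991;76;INF;0;1;0;1"

## read.csv2() uses sep = ";"; "." is declared as the missing-value code.
s <- read.csv2(text = csv, na.strings = ".")
names(s) <- tolower(names(s))

## Same date handling and censoring rule as in stroke.R.
s <- within(s, {
    died <- as.Date(died, format = "%d.%m.%Y")
    dead <- !is.na(died) & died < as.Date("1996-01-01")
    died[!dead] <- NA
})
s[, c("died", "dead")]
```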



-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 06 Dec 2007 - 10:54:55 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 06 Dec 2007 - 12:30:17 GMT.
