Re: [R] Building package - tab delimited example data issue

From: Peter Dalgaard <>
Date: Thu, 06 Dec 2007 13:47:26 +0100

Berwin A Turlach wrote:
> G'day Peter,
> On Thu, 06 Dec 2007 11:52:46 +0100
> Peter Dalgaard <> wrote:
>> If you had looked at help(data), you would have found a list of which
>> file formats it supports and how they are read. Hint: TAB-delimited
>> files are not among them. [...]
> On the other hand, "Writing R Extensions" has stated since a long time
> (and still does):
> The @file{data} subdirectory is for additional data files the package
> makes available for loading using @code{data()}. Currently, data files
> can have one of three types as indicated by their extension: plain R
> code (@file{.R} or @file{.r}), tables (@file{.tab}, @file{.txt}, or
> @file{.csv}), or @code{save()} images (@file{.RData} or @file{.rda}).
> Now in my book, .csv files contain comma separated values, .tab files
> contain values separated by TABs and .txt files are "pure" text files,
> presumably values separated by any kind of white space.
> Thus, I think that the expectation that TAB-delimited file formats
> should work is not unreasonable; I was long time ago bitten by this
> too. Then I realised that the phrase "one of the three types" should
> probably be interpreted as implying that .tab, .txt and .csv files are
> all of the same type and, apparently, should contain values separated
> by whitespace. I admit that I never tested whether .csv files would
> lead to the same problems as TAB delimited .tab files. Rather, I decided
> in the end that the safest option, i.e. to avoid misleading file
> extensions, would be to use .rda files in the future.
Now had you lived in the Western world ... (Hey, what's that? New address!) ... then you would have known better than to have any trust in file extensions. When "they" apparently figured that the .CSV standard was so good that it was even better to have two of them (double standards are twice as good, right?), depending on whether you were in England or in Denmark, I lost faith completely. (In this country you can export to a text file with SAS and then NOT read it with SPSS, and vice versa, on the same Windows machine.)

Actually, R is a bit perverse about .csv too, since it expects a _semicolon_ field separator, but not the comma decimal separator which usually accompanies it. The reason for this is lost in the mists of time -- the datasets in current versions of R do not include any .csv files. There are, however, six .tab files, three of which are not tab-separated, but I don't actually think there was ever a standard to the effect that they should be (.tab just means that it is a _table_).
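
To make the separator issue concrete, here is a minimal sketch. It assumes, following help(data), that data() reads .csv files with read.table(..., header = TRUE, sep = ";") and .tab/.txt files with read.table(..., header = TRUE), i.e. splitting on any whitespace. The file contents and variable names are made up for illustration, and the files are read directly with read.table() rather than through data() in a real package.

```r
# A semicolon-separated .csv, which is what data() is documented to expect.
semi_file <- tempfile(fileext = ".csv")
writeLines(c("name;value", "alpha;1.5", "beta;2.5"), semi_file)
csv_df <- read.table(semi_file, header = TRUE, sep = ";")

# A genuinely comma-separated file read the same way: no semicolons are
# found, so everything lands in a single column.
comma_file <- tempfile(fileext = ".csv")
writeLines(c("name,value", "alpha,1.5", "beta,2.5"), comma_file)
comma_df <- read.table(comma_file, header = TRUE, sep = ";")

# A TAB-delimited file whose first field contains a space.
tab_file <- tempfile(fileext = ".tab")
writeLines(c("name\tvalue", "New York\t1.5", "Boston\t2.5"), tab_file)

# Whitespace splitting (the .tab/.txt rule) sees an inconsistent number of
# fields; capture whatever happens instead of stopping the script.
ws_result <- tryCatch(read.table(tab_file, header = TRUE),
                      error = function(e) e)

# Splitting explicitly on TAB recovers the intended two columns.
tab_df <- read.table(tab_file, header = TRUE, sep = "\t")
```

With these inputs, csv_df and tab_df have the intended two columns, comma_df collapses to one, and ws_result is either an error or a table of a different shape. That is the trap Berwin describes for TAB-delimited files with embedded blanks.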

So, you really need to read the help page for data, which does have the exact info. The passage you cite from the manual could do with a rephrasing, although it probably isn't technically incorrect. As it stands, it reminds me a bit of the old Monty Python sketch:

"Our *three* weapons are fear, surprise, and ruthless efficiency...and an almost fanatical devotion to the Pope.... Our *four* *Amongst* our weapons.... Amongst our weaponry...are such elements as fear, surprise.... I'll come in again"

(There really are 3 data TYPES, but 4 FORMATS and, er, diverse EXTENSIONS)
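
The .rda route that Berwin settled on sidesteps the separator question altogether, since a save() image stores the R object itself. A minimal sketch, with a made-up data frame and a temporary file standing in for a package's data/ directory:

```r
# Hypothetical example data; any R object could be stored the same way.
scores <- data.frame(city = c("New York", "Boston"),
                     value = c(1.5, 2.5))

# In a package this file would live in data/scores.rda.
rda_file <- file.path(tempdir(), "scores.rda")
save(scores, file = rda_file)

# Load into a fresh environment to check what comes back.
restored <- new.env()
load(rda_file, envir = restored)
```

After this, restored$scores is the same data frame that was saved, embedded blank included, with no field separator involved at any point.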


   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (                  FAX: (+45) 35327907

Received on Thu 06 Dec 2007 - 12:52:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 07 Dec 2007 - 03:30:17 GMT.
