[R] Read function that detects format automatically

From: Jeroen Ooms <jeroenooms_at_gmail.com>
Date: Wed, 27 Apr 2011 19:22:17 -0700 (PDT)

I was wondering whether a function exists that automatically tries to detect the format of a data file. For example, if it is an ASCII data file, it would detect appropriate defaults for the read.table() parameters. One could, for example, read the first 10 lines of the file, compare the format of the first line with the others, count the number of dots, colons and semicolons, etc. More generally, one could use the file extension or, if available, the Unix 'file' command to determine the file type if it is not ASCII.
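To make the counting idea concrete, here is a minimal sketch in R. The names guess_sep() and guess_dec() are made up for illustration, the set of candidate separators is an assumption, and the rule used (a separator must occur the same non-zero number of times on every inspected line) is only one possible heuristic. Quoted fields that contain a separator would fool it.

```r
# Count how often the literal string `ch` occurs on each line.
count_char <- function(lines, ch) {
  lengths(regmatches(lines, gregexpr(ch, lines, fixed = TRUE)))
}

# Hypothetical helper: guess the field separator from the first n lines.
# A candidate qualifies if it occurs equally often (and at least once)
# on every non-empty line; the most frequent qualifying candidate wins.
guess_sep <- function(path, n = 10, candidates = c(",", ";", "\t", "|")) {
  lines <- readLines(path, n = n, warn = FALSE)
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) stop("no non-empty lines to inspect")
  best <- NA_character_
  best_count <- 0L
  for (s in candidates) {
    k <- count_char(lines, s)
    if (k[1] > best_count && all(k == k[1])) {
      best <- s
      best_count <- k[1]
    }
  }
  best
}

# Hypothetical helper: guess the decimal mark. If the separator is not a
# comma and numbers like "1,5" occur but none like "1.5", assume ",".
guess_dec <- function(lines, sep) {
  if (identical(sep, ",")) return(".")
  has_comma_number <- any(grepl("[0-9],[0-9]", lines))
  has_dot_number <- any(grepl("[0-9]\\.[0-9]", lines))
  if (has_comma_number && !has_dot_number) "," else "."
}

# Usage on a small semicolon-separated file with decimal commas.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;height;weight", "1;1,80;75,5", "2;1,65;60,2"), tmp)
sep <- guess_sep(tmp)
dec <- guess_dec(readLines(tmp, n = 10), sep)
dat <- read.table(tmp, header = TRUE, sep = sep, dec = dec)
```

Header detection (comparing the first line with the rest) is left out here; it could be added along the same lines.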

I think it should not be very complicated to detect formats with very high accuracy. For most data files, a human statistician can easily see the format by looking at a fragment, so it should be possible to capture these rules in code. It would be nice to have something like a read.magic() function that reads a data file using the appropriate command, regardless of whether the user supplied a csv1, csv2, tab-delimited, Excel, SPSS, Stata, etc. file.
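As a rough illustration of what such a read.magic() might look like, the sketch below dispatches on the file extension only. It is an assumption-laden starting point, not an existing function: the mapping from extensions to readers is made up, it does not inspect file contents, and Excel is omitted because base R has no reader for it. The SPSS and Stata branches rely on the 'foreign' package, and read.dta() only handles older Stata file versions.

```r
# Hypothetical sketch of read.magic(): pick a reader from the extension.
read.magic <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, ...),
    tsv = ,                                  # falls through to 'tab'
    tab = utils::read.delim(path, ...),
    sav = foreign::read.spss(path, to.data.frame = TRUE, ...),
    dta = foreign::read.dta(path, ...),
    stop("unsupported file extension: ", ext)
  )
}

# Usage: write a small CSV file and read it back without naming the reader.
tmp_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), tmp_csv,
          row.names = FALSE)
d <- read.magic(tmp_csv)
```

Telling a csv1 file from a csv2 file cannot be done from the extension alone; that would need a look at the contents.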

I actually started to code something like this, but then I figured that maybe someone else has had the exact same idea.

Received on Thu 28 Apr 2011 - 03:06:24 GMT
