Re: [R] Importing Large Dataset into Excel

From: David Scott <d.scott_at_auckland.ac.nz>
Date: Thu, 13 Dec 2007 00:00:53 +1300 (NZDT)

On Wed, 12 Dec 2007, Peter Dalgaard wrote:

> Philippe Grosjean wrote:
> The problem is often a misspecification of the comment.char argument.
> For read.table(), it defaults to '#'. This means that everywhere you
> have a '#' char in your Excel sheet, the rest of the line is ignored.
> This results in a different number of items per line.
>
> You should better use read.csv() which provides better default arguments
> for your particular problem.
> Best,
>
>
> Or read.delim/read.delim2, which should be even better at TAB-separated files.
>
> In general, be very suspicious of read.table() with such files, not only because of the '#' but also because it expects columns separated by _arbitrary_ amounts of whitespace. I.e., n TABs counts as one, so empty fields are skipped over.
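
To make the two pitfalls above concrete, here is a minimal sketch. The file contents are made up for illustration (they are not from the original poster's data); count.fields(), read.delim() and their arguments are standard functions in R's utils package.

```r
# Made-up TAB-separated data: a '#' inside a value and one empty field.
lines <- c("id\tname\tscore",
           "1\tA#1\t10",
           "2\t\t20")
tf <- tempfile(fileext = ".txt")
writeLines(lines, tf)

# Fields per line as read.table() sees them by default
# (sep = "" means any run of whitespace, comment.char = "#").
n_default <- count.fields(tf)

# Fields per line with TAB as the separator and comments switched off,
# which is what read.delim() uses.
n_tab <- count.fields(tf, sep = "\t", comment.char = "")

print(n_default)
print(n_tab)

# read.delim() keeps the '#' and the empty field.
d <- read.delim(tf)
print(d)
```

With the read.table() defaults the second and third lines should come up one field short (the text after '#' is dropped, and the two TABs collapse into one separator), while the TAB-separated count should be the same on every line.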

I would also be very suspicious of Excel writing .csv files. Looking at one such .csv file in an editor, I found that where the original .xls file had empty fields, Excel did not always add enough commas to make up the correct number of fields. It did for some records but not for others. Excel truly works in mysterious ways.

read.csv() has an argument fill (TRUE by default) which should fix this problem. In my case I was actually reading the .csv file into MySQL, and the solution was to select the whole of the .xls file and format it as text before writing the .csv file.
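
As a rough sketch of how one might check for this in R before importing: the .csv contents below are invented to mimic the short records described above, and the variable names are only illustrative.

```r
# Made-up .csv in which one record is short, as described above.
csv_lines <- c("id,name,score,note",
               "1,A,10,ok",
               "2,B",
               "3,C,30,late")
tf <- tempfile(fileext = ".csv")
writeLines(csv_lines, tf)

# Check the number of fields per line before importing.
n <- count.fields(tf, sep = ",")
print(n)

# Lines (counting the header) whose field count differs from the header's.
short <- which(n != n[1])
print(short)

# read.csv() has fill = TRUE by default, so the short record is padded
# with blank fields (NA in a numeric column).
d <- read.csv(tf)
print(d)
```

One caveat: read.table() and friends decide on the number of columns from the first five lines of input (or from col.names), so a header line with the full set of names, as here, helps fill do the right thing.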

David Scott


David Scott	Department of Statistics, Tamaki Campus
 		The University of Auckland, PB 92019
 		Auckland 1142,    NEW ZEALAND
Phone: +64 9 373 7599 ext 86830		Fax: +64 9 373 7000
Email:	d.scott_at_auckland.ac.nz

Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 12 Dec 2007 - 11:16:44 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 12 Dec 2007 - 11:30:18 GMT.
