"Kevin Wright" <kwright68@gmail.com> wrote in message
news:adf71a630511030726v10f1dc77sa7c760b96a27480f@mail.gmail.com...
> >From my experience (somewhat of a guess):
> Excel uses the first 16 rows of data to determine if a column is numeric
or
> character. The data type which is most common in the first 16 rows will
then
> be used for the whole column.
I ran some experiments trying to force RODBC to read column 1 of my worksheet as character data (the data are mostly numbers with two exceptions, 275a and 275b, as mentioned earlier).
Here's the base code:
> library(RODBC)
> channel <- odbcConnectExcel("U:/efg/lab/R/Krumlauf-Plasmid/construct
list.xls")
> plasmid <- sqlFetch(channel,"Sheet1", as.is=TRUE)
> odbcClose(channel)
> names(plasmid)
[1] "Plasmid Number" "Plasmid" "Concentration" "Comments" "Lost"
When Excel Sheet1 has rows 2:13 as an "X" to attempt to force treatment of column 1 as character data:
> class(plasmid$"Plasmid Number")
[1] "numeric"
> typeof(plasmid$"Plasmid Number")
[1] "double"
> plasmid$"Plasmid Number"[1:20]
[1] NA NA NA NA NA NA NA NA NA NA NA NA 2 3 4 5 6 7 8 9
Why would any software with 12 consecutive "X" character strings "assume" the data are purely numeric?
Add one more "X" so rows 2:14 have an "X" to attempt to force treatment of column 1 as character data:
> class(plasmid$"Plasmid Number")
[1] "character"
> typeof(plasmid$"Plasmid Number")
[1] "character"
> plasmid$"Plasmid Number"[1:20]
[1] "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" NA NA NA NA NA NA NA
So RODBC now recognizes "character" Xs in column 1 and then declares all numbers as invalid? These are incredibly (bad) assumptions.
I say this is a "bug", but it may be an ODBC problem and not one with "R. And if this is not an official "bug", then it's a serious design problem. Minimally, this issue should be described in the R Data Import/Export document, which everyone is told to read before asking a question.
It's frustrating when packages like this work for "toy" problems, and the documentation never mentions the pitfalls of real data.
> The gregmisc bundle has a different read.xls function that uses a Perl
> script (xls2csv) and seems to be safer with mixed-type columns.
> Requires a working version of Perl.
Thanks for this suggestion, but I think I'll just convert the Excel spreadsheet to a .csv and maintain it in that format.
efg
This archive was generated by hypermail 2.1.8 : Fri 04 Nov 2005 - 11:24:48 EST