Re: [R] RODBC, missing values, and Excel

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Thu 13 Jul 2006 - 13:04:55 EST

In thinking about this some more I have a better idea. Use rcom (or RDCOMClient)
to get a list of the sheet names and then use that to determine which sheet you need. Then use read.xls to get it like this assuming that the Excel file and path are C:\test.xls and that one of the sheets in that spreadsheet is xyz. In my version the na.strings had a space at the end so you may need to change the na.strings= setting:

library(rcom)
xls <- "C:\\test.xls"
oxl <- comCreateObject('Excel.Application') comSetProperty(oxl, "Visible", TRUE) # this line optional owb <- comGetProperty(oxl, "Workbooks")
ob <- comInvoke(owb, "Open", xls)
osheets <- comGetProperty(ob, "Worksheets") n <- comGetProperty(osheets, "Count")
ithSheetName <- function(i)

        comGetProperty(comGetProperty(osheets, "Item", i), "Name") sheetNames <- sapply(1:n, ithSheetName)
comInvoke(oxl, "Quit")

library(gdata)
read.xls(xls, match("xyz", sheetNames), na.strings = "na ")

On 7/12/06, Gabor Grothendieck <ggrothendieck@gmail.com> wrote:
> Would it be good enough to just read all the sheets in?
>
> The perl program can do that and although the read.xls R function does not
> interface to that aspect of its functionality its not that difficult to access
> it yourself. Assume your excel file is in \test.xls . Just
> switch to that folder. paste together a command to run the perl
> program, run it, get a list of the file names it produced and read them in:
>
> library(gdata)
> setwd("/")
> cmd <- paste("perl", system.file("perl/xls2csv.pl", package = "gdata"), "test")
> system(cmd)
> ff <- list.files(patt = "test_Sheet.*.csv")
> sapply(ff, read.csv, na.strings = "na ", simplify = FALSE)
>
>
> On 7/12/06, Sundar Dorai-Raj <sundar.dorai-raj@pdf.com> wrote:
> > Hi, Gabor,
> >
> > Thanks for the reply. Perhaps Prof. Ripley will enlighten us as he is
> > the RODBC maintainer.
> >
> > Unfortunately, gdata::read.xls will not work for me (at least I don't
> > think it will) because I need to refer to each worksheet by name and not
> > by number. For example, I need extract data from "Sheet1" and not simply
> > the first sheet.
> >
> > Thanks,
> >
> > --sundar
> >
> > Gabor Grothendieck wrote:
> > > I also got a strange result too (I renamed it sdr.read.xls
> > > to distinguish it from read.xls in gdata and noticed that a
> > > space got into my na's somehow so I used "na " for my
> > > na.strings:
> > >
> > >
> > >>sdr.read.xls("/test.xls", "Sheet2", na.strings = "na ")
> > >
> > > x
> > > 1 <NA>
> > > 2 <NA>
> > > 3 na
> > > 4 na
> > > 5 na
> > > 6 na
> > > 7 <NA>
> > >
> > > I had more success using read.xls in the gdata package.
> > > Note that we need to install perl first if not already present:
> > >
> > >
> > >>library(gdata) # for read.xls
> > >>read.xls("/test.xls", 2, na.strings = "na ")
> > >
> > > x
> > > 1 0.11
> > > 2 0.11
> > > 3 NA
> > > 4 NA
> > > 5 NA
> > > 6 NA
> > > 7 0.11
> > >
> > >
> > >>R.version.string # XP
> > >
> > > [1] "Version 2.3.1 Patched (2006-06-04 r38279)"
> > >
> > >>packageDescription("gdata")$Version
> > >
> > > [1] "2.1.2"
> > >
> > >>packageDescription("RODBC")$Version
> > >
> > > [1] "1.1-7"
> > >
> > >
> > > On 7/12/06, Sundar Dorai-Raj <sundar.dorai-raj@pdf.com> wrote:
> > >
> > >>Hi, all,
> > >>
> > >>I'm trying to use RODBC to read data from Excel. However, I'm having
> > >>trouble converting missing values to NA and rather perplexed by the
> > >>output. Below illustrates my problem:
> > >>
> > >>## DATA - copy to Excel and save as "tmp.xls"
> > >>## tmp.xls!Sheet1
> > >>x
> > >>0.11
> > >>0.11
> > >>na
> > >>na
> > >>na
> > >>0.11
> > >>
> > >>## tmp.xls!Sheet2
> > >>x
> > >>0.11
> > >>0.11
> > >>na
> > >>na
> > >>na
> > >>na
> > >>0.11
> > >>
> > >>## R Code
> > >>read.xls <- function(file, sheet = "Sheet1", ...) {
> > >> require(RODBC)
> > >> channel <- odbcConnectExcel(file)
> > >> sheet <- sprintf("select * from `%s$`", sheet)
> > >> x <- sqlQuery(channel, sheet, ...)
> > >> odbcClose(channel)
> > >> x
> > >>}
> > >>
> > >>read.xls("./tmp.xls", "Sheet1", na.strings = "na")
> > >>## works as expected
> > >># x
> > >>#1 0.11
> > >>#2 0.11
> > >>#3 NA
> > >>#4 NA
> > >>#5 NA
> > >>#6 0.11
> > >>
> > >>read.xls("./tmp.xls", "Sheet2", na.strings = "na")
> > >>## Huh? What happened?
> > >># x
> > >>#1 NA
> > >>#2 NA
> > >>#3 NA
> > >>#4 NA
> > >>#5 NA
> > >>#6 NA
> > >>#7 NA
> > >>
> > >> > sessionInfo()
> > >>Version 2.3.1 (2006-06-01)
> > >>i386-pc-mingw32
> > >>
> > >>attached base packages:
> > >>[1] "methods" "stats" "graphics" "grDevices" "utils" "datasets"
> > >>[7] "base"
> > >>
> > >>other attached packages:
> > >> RODBC
> > >>"1.1-7"
> > >>
> > >>______________________________________________
> > >>R-help@stat.math.ethz.ch mailing list
> > >>https://stat.ethz.ch/mailman/listinfo/r-help
> > >>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> > >>
> > >
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Jul 13 13:11:52 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 13 Jul 2006 - 22:14:14 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.