Re: [R] OT: batch processing XLS files to CSV

From: Prof Brian Ripley <>
Date: Thu, 29 May 2008 10:49:33 +0100 (BST)

I think the issue here is that the Excel spreadsheets as described do not contain simple tables of the form that the various .xls readers described in 'R Data Import/Export' handle. Your example is not applicable (and if it were there are several alternatives in that manual).

I would look into the Perl solutions for parsing .xls files.

On Thu, 29 May 2008, Pascal Pascal wrote:

> Hello,
> I'm not a guru but we have to do reporting with data in Excel files. We use
> "xlsReadWrite" package.
> Below an exemple, with an "Excel.R" file which import "MySheet" sheet from a
> "MyFile.xls" workbook. First column are numeric, Second are integer. Third
> character and so on.
> library(xlsReadWrite)
> tmp <- read.xls( "MyFile.xls", sheet="MySheet", colClasses = c("numeric",
> "integer", "character", "integer"), colNames=T )
> To extract all columns behind a definied column you can use a command like
> this:
> tmp <- tmp[,seq( which(colnames(tmp)=="StartColumn"),ncol(tmp))]) #
>> From column "StartColumn" to the end
> tmp <- tmp[,seq(
> which(colnames(tmp)=="StartColumn"),which(colnames(tmp)=="EndColumn")])
> # From column "StartColumn" to "EndColumn"
> You can call routinely this procedure by a simple batch file (".bat") under
> windows by this command.
> start /B /WAIT "Rcmd" "%Path to R%\bin\Rcmd.exe" BATCH "C:\R\excel.R"
> You'll find more information on xlsReadWrite on CRAN.
> You can also search in CPAN and on the web. There are more than one
> solution to import directly Excel files with Perl.

Indeed, and CRAN package gdata has an interface to one of them.

> Be more precise and give example if you want more help.
> Hope that helps,
> Pascal
> 2008/5/28 Chris Evans <>:
>> Dear R gurus, particularly those of generous M$ tolerance and diverse gifts
>> and knowledge!
>> I have an interesting challenge that I will end up crunching in R involving
>> service usage by patients. Maybe I can do all of it in R but I can't see
>> how yet.
>> My situation is that our IT Department can give me loads of XLS files about
>> patients one of our services have seen. The are one per patient per time
>> period. All the data are in the first sheet of the XLS files and that sheet
>> contains four variable length but fixed format matrices of data:
>> 1) demographics (actually, this is fixed length, one row!);
>> 2) community contacts with services, variable length, rarely zero rows but
>> could be;
>> 3) inpatient admissions, variable length, often zero rows;
>> and CPA information (don't ask what that is!), two rows, fixed format, just
>> to make things tricky, they're spearated by a fixed few junk rows in the xls
>> files. The column format of each block is different.
>> Each block starts with standard label rows so it will be easy to identify
>> these start points and know the format on the rows that follow each one. I
>> could use perl to scan for these and then read the zero to many lines of the
>> data in the matrix and end on finding the next header.
>> I would be fairly happy to do this with perl but would need to convert the
>> xls (xls 2002) files to CSV to get at them in Perl (I think).
>> Anyone out there done anything like this and can give me any advice? I'm
>> sorry, I'm sure there are more specific lists or web resources but I think
>> the skills are here too and if someone can tell me how to do this all in R,
>> I'd be fascinated.
>> Many thanks,
>> Chris
>> --
>> Chris Evans <> Skype: chris-psyctc
>> Professor of Psychotherapy, Nottingham University;
>> Consultant Psychiatrist in Psychotherapy, Notts PDD network;
>> Research Programmes Director, Nottinghamshire NHS Trust;
>> *If I am writing from one of those roles, it will be clear. Otherwise*
>> *my views are my own and not representative of those institutions *
>> ______________________________________________
>> mailing list
>> PLEASE do read the posting guide
>> and provide commented, minimal, self-contained, reproducible code.
> [[alternative HTML version deleted]]
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 29 May 2008 - 10:22:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 29 May 2008 - 11:30:44 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.

list of date sections of archive