Re: [R] OT: batch processing XLS files to CSV

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu, 29 May 2008 10:49:33 +0100 (BST)

I think the issue here is that the Excel spreadsheets as described do not contain simple tables of the form that the various .xls readers described in 'R Data Import/Export' handle. Your example is not applicable (and if it were there are several alternatives in that manual).

I would look into the Perl solutions for parsing .xls files.

On Thu, 29 May 2008, Pascal Pascal wrote:

> Hello,
>
> I'm not a guru but we have to do reporting with data in Excel files. We use
> "xlsReadWrite" package.
>
> Below an exemple, with an "Excel.R" file which import "MySheet" sheet from a
> "MyFile.xls" workbook. First column are numeric, Second are integer. Third
> character and so on.
>
> library(xlsReadWrite)
> tmp <- read.xls( "MyFile.xls", sheet="MySheet", colClasses = c("numeric",
> "integer", "character", "integer"), colNames=T )
>
> To extract all columns behind a definied column you can use a command like
> this:
> tmp <- tmp[,seq( which(colnames(tmp)=="StartColumn"),ncol(tmp))]) #
>> From column "StartColumn" to the end
> tmp <- tmp[,seq(
> which(colnames(tmp)=="StartColumn"),which(colnames(tmp)=="EndColumn")])
> # From column "StartColumn" to "EndColumn"
>
> You can call routinely this procedure by a simple batch file (".bat") under
> windows by this command.
> start /B /WAIT "Rcmd" "%Path to R%\bin\Rcmd.exe" BATCH "C:\R\excel.R"
>
> You'll find more information on xlsReadWrite on CRAN.
>
> You can also search in CPAN and on the web. There are more than one
> solution to import directly Excel files with Perl.

Indeed, and CRAN package gdata has an interface to one of them.

> Be more precise and give example if you want more help.
>
> Hope that helps,
>
> Pascal
>
> 2008/5/28 Chris Evans <chris_at_psyctc.org>:
>
>> Dear R gurus, particularly those of generous M$ tolerance and diverse gifts
>> and knowledge!
>>
>> I have an interesting challenge that I will end up crunching in R involving
>> service usage by patients. Maybe I can do all of it in R but I can't see
>> how yet.
>>
>> My situation is that our IT Department can give me loads of XLS files about
>> patients one of our services have seen. The are one per patient per time
>> period. All the data are in the first sheet of the XLS files and that sheet
>> contains four variable length but fixed format matrices of data:
>> 1) demographics (actually, this is fixed length, one row!);
>> 2) community contacts with services, variable length, rarely zero rows but
>> could be;
>> 3) inpatient admissions, variable length, often zero rows;
>> and CPA information (don't ask what that is!), two rows, fixed format, just
>> to make things tricky, they're spearated by a fixed few junk rows in the xls
>> files. The column format of each block is different.
>>
>> Each block starts with standard label rows so it will be easy to identify
>> these start points and know the format on the rows that follow each one. I
>> could use perl to scan for these and then read the zero to many lines of the
>> data in the matrix and end on finding the next header.
>>
>> I would be fairly happy to do this with perl but would need to convert the
>> xls (xls 2002) files to CSV to get at them in Perl (I think).
>>
>> Anyone out there done anything like this and can give me any advice? I'm
>> sorry, I'm sure there are more specific lists or web resources but I think
>> the skills are here too and if someone can tell me how to do this all in R,
>> I'd be fascinated.
>>
>> Many thanks,
>>
>> Chris
>>
>> --
>> Chris Evans <chris_at_psyctc.org> Skype: chris-psyctc
>> Professor of Psychotherapy, Nottingham University;
>> Consultant Psychiatrist in Psychotherapy, Notts PDD network;
>> Research Programmes Director, Nottinghamshire NHS Trust;
>> *If I am writing from one of those roles, it will be clear. Otherwise*
>> *my views are my own and not representative of those institutions *
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 29 May 2008 - 10:22:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 29 May 2008 - 11:30:44 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive