Re: [R] Reading large files in R

From: Andreas Hary <u08adh_at_hotmail.com>
Date: Tue 09 Aug 2005 - 07:49:09 EST

You can also use the RODBC package to hold the data in a database, say MySQL, and only import it when you do the modelling, e.g.

> library(RODBC)
> library(sspir)
> con <- odbcConnect("MySQL Test")   # DSN for the MySQL database
> data(vandrivers)
> sqlSave(con, dat=vandrivers, append=FALSE)   # write the data frame to the database
> rm(vandrivers)   # drop the in-memory copy
> gc()
> van.call <- sqlQuery(con, 'select * from vandrivers;')   # read it back when needed
> vd <- ssm( y ~ tvar(1) + seatbelt + sumseason(time,12),
+            time=time, family=poisson(link="log"),
+            data=van.call)
> vd$ss$phi["(Intercept)"] <- exp(- 2*3.703307 )
> vd$ss$C0 <- diag(13)*1000
> vd.res <- kfs(vd)
> gc()

In this case I have first saved the vandrivers data in 'MySQL Test', but one can obviously write the data directly to the database. Since the data are kept in the database rather than in the R workspace until they are needed, I find that I can do much larger computations than is otherwise possible. The downside is of course that computations take a bit longer.
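
A minimal sketch of fetching only part of a table at a time, assuming the same 'MySQL Test' DSN and vandrivers table as in the example above. The helper names build_chunk_query and fetch_chunk are hypothetical, and the LIMIT/OFFSET clause is MySQL syntax, so other databases may need a different form.

```r
# Hypothetical helper: build a query for one block of rows.
# LIMIT/OFFSET is MySQL syntax (an assumption about the backend).
build_chunk_query <- function(table, columns = "*", offset = 0, n = 1000) {
  sprintf("select %s from %s limit %d offset %d",
          paste(columns, collapse = ", "), table,
          as.integer(n), as.integer(offset))
}

# Hypothetical helper: fetch one block through an open RODBC connection.
# RODBC is only needed when this function is actually called.
fetch_chunk <- function(con, table, columns = "*", offset = 0, n = 1000) {
  RODBC::sqlQuery(con, build_chunk_query(table, columns, offset, n))
}

# Usage (needs a configured DSN, so it is left commented out):
# con <- RODBC::odbcConnect("MySQL Test")
# first <- fetch_chunk(con, "vandrivers", c("y", "seatbelt"), offset = 0, n = 50)
# RODBC::odbcClose(con)
```

Without an ORDER BY clause the database is free to return rows in any order, so a real chunked read would probably want to sort on a key column.
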
Best wishes,

Andreas



Andreas D Hary
Email: u08adh@hotmail.com
Mobile: 07906860987
Phone: 02076554940

> ... and it is likely that even if you did have enough memory (several times
> the size of the data are generally needed) it would take a very long time.
>
> If you do have enough memory and the data are all of one type -- numeric
> here -- you're better off treating it as a matrix rather than converting it
> to a data frame.
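
The matrix approach could look roughly like the sketch below. It assumes a tab-separated, all-numeric file with one header line and -99 as the missing-value code, as in the original script; the small temporary file and the column names a, b, c are stand-ins for the real data.

```r
# Write a small tab-separated numeric file with a header line (stand-in data).
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t-99\t6", "7\t8\t9"), path)

# Read every value into one numeric vector, then reshape it row by row.
vals <- scan(path, what = numeric(), sep = "\t", skip = 1,
             na.strings = "-99", quiet = TRUE)
m <- matrix(vals, ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

print(dim(m))
print(colMeans(m, na.rm = TRUE))
unlink(path)
```

A numeric matrix is a single vector with a dim attribute, so it skips the per-column copies made when a list is converted to a data frame.
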
>
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>
> "The business of the statistician is to catalyze the scientific learning
> process." - George E. P. Box
>
>
>
>> -----Original Message-----
>> From: r-help-bounces@stat.math.ethz.ch
>> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of
>> Adaikalavan Ramasamy
>> Sent: Monday, August 08, 2005 12:02 PM
>> To: Jean-Pierre Gattuso
>> Cc: r-help@stat.math.ethz.ch
>> Subject: Re: [R] Reading large files in R
>>
>> From the Note section of help("read.delim"):
>>
>> 'read.table' is not the right tool for reading large matrices,
>> especially those with many columns: it is designed to read _data
>> frames_ which may have columns of very different classes. Use
>> 'scan' instead.
>>
>> So I am not sure why you used 'scan', then converted it to a
>> data frame.
>>
>> 1) Can you provide a sample of the data that you are trying to read in?
>> 2) How much memory does your machine have?
>> 3) Try reading in the first few lines using the nmax argument in scan.
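
Points 2 and 3 can be checked with something like the sketch below. The temporary file is a stand-in for the real data, and the size estimate assumes three double-precision columns at 8 bytes per value, using the record count reported in the original post.

```r
# Stand-in file: a header line plus 100 tab-separated numeric records.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", sprintf("%d\t%d\t%d", 1:100, 101:200, 201:300)), path)

# Read only the first 5 records; with a list 'what', nmax counts records.
head5 <- scan(path, what = list(a = 0, b = 0, c = 0), sep = "\t",
              skip = 1, nmax = 5, quiet = TRUE)
str(head5)

# Rough size of the full data in MB: records x columns x 8 bytes.
n_records <- 13669627
est_mb <- n_records * 3 * 8 / 2^20
print(round(est_mb))
unlink(path)
```

One column of that length is about 106794 Kb, which is in line with the size in the reported allocation error, so the failure appears to come from copying a single column during the data.frame conversion.
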
>>
>> Regards, Adai
>>
>>
>>
>> On Mon, 2005-08-08 at 12:50 -0600, Jean-Pierre Gattuso wrote:
>> > Dear R-listers:
>> >
>> > I am trying to work with a big (262 MB) file but apparently reach a
>> > memory limit using R on Mac OS X as well as on a Unix machine.
>> >
>> > This is the script:
>> >
>> > > type=list(a=0,b=0,c=0)
>> > > tmp <- scan(file="coastal_gebco_sandS_blend.txt", what=type,
>> > +     sep="\t", quote="\"", dec=".", skip=1, na.strings="-99",
>> > +     nmax=13669628)
>> > Read 13669627 records
>> > > gebco <- data.frame(tmp)
>> > Error: cannot allocate vector of size 106793 Kb
>> >
>> >
>> > Even tmp does not seem right:
>> >
>> > > summary(tmp)
>> > Error: recursive default argument reference
>> >
>> >
>> > Do you have any suggestion?
>> >
>> > Thanks,
>> > Jean-Pierre Gattuso
>> >
>> > ______________________________________________
>> > R-help@stat.math.ethz.ch mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>> >
>>
>



Received on Tue Aug 09 18:07:19 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:39:45 EST