From: Andreas Hary <u08adh_at_hotmail.com>

Date: Wed 10 Aug 2005 - 07:40:24 EST

Brief correction: it should read

>> van.call <- call('sqlQuery',con,query='select * from vandrivers;')

rather than

>> van.call <- sqlQuery(con,'select * from vandrivers;')

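The latter statement would run the query immediately and load the data into memory as usual, whereas the call object is only evaluated when eval(van.call) is reached.

Best wishes,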

Andreas
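
For anyone who wants to try the difference without a database connection, here is a minimal sketch in base R. Note that slow_query() is a hypothetical stand-in for sqlQuery() and is not part of RODBC; it only records whether it has been run.

```r
# Hypothetical stand-in for sqlQuery(): records that it ran and
# returns a small data frame.
ran <- FALSE
slow_query <- function(query) {
  ran <<- TRUE
  data.frame(y = 1:3)
}

# call() only builds an unevaluated call object; nothing is executed yet.
q.call <- call("slow_query", query = "select * from vandrivers;")
is.call(q.call)    # TRUE
ran.before <- ran  # still FALSE at this point

# The function runs, and the data are created, only when the call is evaluated.
dat <- eval(q.call)
ran                # now TRUE
nrow(dat)          # 3
```

Writing q.call <- slow_query("select * from vandrivers;") instead would run the function at once and store its result, which is the behaviour of the second statement above.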

----- Original Message -----
From: "Andreas Hary" <u08adh@hotmail.com>
To: "Berton Gunter" <gunter.berton@gene.com>; <ramasamy@cancer.org.uk>; "'Jean-Pierre Gattuso'" <gattuso@obs-vlfr.fr>
Cc: <r-help@stat.math.ethz.ch>
Sent: Monday, August 08, 2005 10:49 PM
Subject: Re: [R] Reading large files in R

> You can also use the RODBC package to hold the data in a database, say
> MySQL, and only import it when you do the modelling, e.g.
>
>> library(RODBC)
>> library(sspir)
>> con <- odbcConnect("MySQL Test")
>> data(vandrivers)
>> sqlSave(con,dat=vandrivers,append=FALSE)
>> rm(vandrivers)
>> gc()
>> van.call <- sqlQuery(con,'select * from vandrivers;')
>> vd <- ssm( y ~ tvar(1) + seatbelt + sumseason(time,12),
>>            time=time, family=poisson(link="log"),
>>            data=eval(van.call))
>> vd$ss$phi["(Intercept)"] <- exp(- 2*3.703307 )
>> vd$ss$C0 <- diag(13)*1000
>> vd.res <- kfs(vd)
>> gc()
>
> In this case I have first saved the vandrivers data in 'MySQL Test', but one
> can obviously write the data directly to the database. Since the data is not
> held in memory I find that I can do much larger computations than would
> otherwise be possible. The downside is of course that computations take a bit
> longer.
> Best wishes,
>
> Andreas
>
> =====================
> Andreas D Hary
> Email: u08adh@hotmail.com
> Mobile: 07906860987
> Phone: 02076554940
>
> ----- Original Message -----
> From: "Berton Gunter" <gunter.berton@gene.com>
> To: <ramasamy@cancer.org.uk>; "'Jean-Pierre Gattuso'" <gattuso@obs-vlfr.fr>
> Cc: <r-help@stat.math.ethz.ch>
> Sent: Monday, August 08, 2005 8:35 PM
> Subject: Re: [R] Reading large files in R
>
>> ... and it is likely that even if you did have enough memory (several times
>> the size of the data are generally needed) it would take a very long time.
>>
>> If you do have enough memory and the data are all of one type -- numeric
>> here -- you're better off treating it as a matrix rather than converting it
>> to a data frame.
>>
>> -- Bert Gunter
>> Genentech Non-Clinical Statistics
>> South San Francisco, CA
>>
>> "The business of the statistician is to catalyze the scientific learning
>> process." - George E. P. Box
>>
>>> -----Original Message-----
>>> From: r-help-bounces@stat.math.ethz.ch
>>> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of
>>> Adaikalavan Ramasamy
>>> Sent: Monday, August 08, 2005 12:02 PM
>>> To: Jean-Pierre Gattuso
>>> Cc: r-help@stat.math.ethz.ch
>>> Subject: Re: [R] Reading large files in R
>>>
>>> From the Note section of help("read.delim"):
>>>
>>>   'read.table' is not the right tool for reading large matrices,
>>>   especially those with many columns: it is designed to read _data
>>>   frames_ which may have columns of very different classes. Use
>>>   'scan' instead.
>>>
>>> So I am not sure why you used 'scan' and then converted the result to a
>>> data frame.
>>>
>>> 1) Can you provide a sample of the data that you are trying to read in?
>>> 2) How much memory does your machine have?
>>> 3) Try reading in the first few lines using the nmax argument in scan.
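
The suggestions above (check the first few records with nmax, and keep all-numeric data as a matrix rather than a data frame) can be sketched as follows. This is only an illustration under stated assumptions: the temporary file and its contents are made up and stand in for the real tab-delimited file with one header line and three numeric columns.

```r
# Made-up stand-in for the real file: a header line plus
# tab-separated numeric rows.
f <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc",
             "1\t2\t3",
             "4\t5\t6",
             "7\t8\t9",
             "10\t11\t12"), f)

# Read only the first two records to check that the format is parsed
# as expected. When 'what' is a list, nmax is the maximum number of records.
head2 <- scan(f, what = list(a = 0, b = 0, c = 0), sep = "\t",
              skip = 1, nmax = 2, quiet = TRUE)
str(head2)

# All columns are numeric, so read everything as one numeric vector and
# reshape it into a matrix instead of building a data frame.
x <- scan(f, what = 0, sep = "\t", skip = 1, quiet = TRUE)
m <- matrix(x, ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))
dim(m)

unlink(f)  # remove the temporary file
```

For the real file, the same calls would take the actual file name and the na.strings setting from the original script.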
>>>
>>> Regards, Adai
>>>
>>> On Mon, 2005-08-08 at 12:50 -0600, Jean-Pierre Gattuso wrote:
>>> > Dear R-listers:
>>> >
>>> > I am trying to work with a big (262 Mb) file but apparently reach a
>>> > memory limit using R on a MacOSX as well as on a unix machine.
>>> >
>>> > This is the script:
>>> >
>>> > > type=list(a=0,b=0,c=0)
>>> > > tmp <- scan(file="coastal_gebco_sandS_blend.txt", what=type,
>>> >     sep="\t", quote="\"", dec=".", skip=1, na.strings="-99", nmax=13669628)
>>> > Read 13669627 records
>>> > > gebco <- data.frame(tmp)
>>> > Error: cannot allocate vector of size 106793 Kb
>>> >
>>> > Even tmp does not seem right:
>>> >
>>> > > summary(tmp)
>>> > Error: recursive default argument reference
>>> >
>>> > Do you have any suggestion?
>>> >
>>> > Thanks,
>>> > Jean-Pierre Gattuso
>>> >
>>> > ______________________________________________
>>> > R-help@stat.math.ethz.ch mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
