Re: [R] how to use large data set ?

From: mahesh r <rumahesh45_at_gmail.com>
Date: Thu 20 Jul 2006 - 06:22:44 EST

Hi,
I would like to extend the query posted earlier on using large data sets. I am trying to use rgdal to mine remote-sensing imagery. I have no problem bringing the images into the R environment, but when I try to convert an image to a data.frame I receive a warning from R saying "1: Reached total allocation of 510Mb: see help(memory.size)" and the process terminates. Due to project constraints I have been given a very old 2.4 GHz computer with only 512 MB of RAM. I think R is trying to store the results in RAM, and since the image is very large (some 9 million pixels), it runs out of memory.

My questions are:
1. Is there any way to dump temporary variables to a temp folder on the hard disk (as many programs do) instead of letting R store them in RAM?
2. Could this be done without a connection to a back-end database such as Oracle?
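On question 1, base R can stage data on disk and stream it back in fixed-size blocks, so only one block lives in RAM at a time and no back-end database is needed. Below is a minimal, self-contained sketch: the "image" is simulated random numbers written to a temp file; with real imagery the blocks could instead come from rgdal (e.g. readGDAL's offset/region.dim arguments for reading a sub-window — treat those argument names as an assumption to check against your rgdal version).

```r
n_pixels <- 9e5          # stand-in for the ~9-million-pixel image
block    <- 1e5          # pixels held in RAM at any one time

## First pass: write the data to a temp file block by block
tmp <- tempfile(fileext = ".bin")
con <- file(tmp, "wb")
set.seed(1)
for (i in seq(1, n_pixels, by = block)) {
  writeBin(rnorm(block), con)   # in practice: one block of pixel values
}
close(con)

## Second pass: accumulate a global mean without ever loading it all
con <- file(tmp, "rb")
s <- 0; n <- 0
repeat {
  x <- readBin(con, what = "double", n = block)
  if (length(x) == 0) break     # end of file reached
  s <- s + sum(x)
  n <- n + length(x)
}
close(con)
overall_mean <- s / n
unlink(tmp)                     # clean up the temp file
```

The same two-pass pattern (stage to disk, then stream summaries) extends to variances, histograms, or per-band statistics; the key point is that peak memory is set by `block`, not by the image size.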

Thanks,

Mahesh

On 7/19/06, Greg Snow <Greg.Snow@intermountainmail.org> wrote:
>
> You did not say what analysis you want to do, but many common analyses
> can be done as special cases of regression models, and the biglm
> package can fit those on data too large for memory.
>
> Here is an example that worked for me to get the mean and standard
> deviation by day from an Oracle database with over 23 million rows (I
> had previously set up 'edw' as an ODBC connection to the database under
> Windows; any of the database connection packages should work for you
> though):
>
> library(RODBC)
> library(biglm)
>
> ## 'pass' holds the database password, set earlier in the session
> con <- odbcConnect('edw', uid='glsnow', pwd=pass)
>
> ## send the query; rows are fetched below in chunks of 100,000
> odbcQuery(con, "select ADMSN_WEEKDAY_CD, LOS_DYS from CM.CASEMIX_SMRY")
>
> t1 <- Sys.time()
>
> ## first chunk: clean it and use it to initialise the model
> tmp <- sqlGetResults(con, max=100000)
> names(tmp) <- c("Day","LoS")
> tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
> tmp <- na.omit(tmp)
> tmp <- subset(tmp, LoS > 0)
>
> ff <- log(LoS) ~ Day
> fit <- biglm(ff, tmp)
>
> ## remaining chunks: once the result set is exhausted sqlGetResults()
> ## no longer returns a data frame, so nrow() is NULL and the loop stops
> i <- nrow(tmp)
> while( !is.null(nrow( tmp <- sqlGetResults(con, max=100000) )) ){
>     names(tmp) <- c("Day","LoS")
>     tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
>     tmp <- na.omit(tmp)
>     tmp <- subset(tmp, LoS > 0)
>
>     fit <- update(fit, tmp)    # fold the new chunk into the fit
>
>     i <- i + nrow(tmp)
>     cat(format(i, big.mark=','), " rows processed\n")
> }
>
> summary(fit)
>
> t2 <- Sys.time()
> t2 - t1    # total elapsed time
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow@intermountainmail.org
> (801) 408-8111
>
>
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Yohan CHOUKROUN
> Sent: Wednesday, July 19, 2006 9:42 AM
> To: 'r-help@stat.math.ethz.ch'
> Subject: [R] how to use large data set ?
>
> Hello R users,
>
>
>
> Sorry for my English; I'm French.
>
>
>
> I want to use a large dataset (3 million rows and 70 variables), but I
> don't know how, because my computer crashes quickly (P4 2.8 GHz, 1 GB
> of RAM).
>
> I also have a dual-Xeon machine with 2 GB of RAM, so I want to run the
> computation on that computer and display the results on mine. Both of
> them run Windows XP...
>
>
>
> To put it briefly, I have:
>
>
>
> 1 server with a MySQL database
>
> 1 computer
>
> and I want to use them with a large dataset.
>
>
>
> I'm trying to use RDCOM to connect to the database, and I am installing
> Rpad (though it's hard for me...).
>
>
>
> Are there any other solutions?
>
>
>
> Thanks in advance
>
>
>
>
>
> Yohan C.
>
>
>
> ----------------------------------------------------------------------
> This message is confidential. Its content does not constitute a
> commitment by Soft Computing Group except where provided for in a
> written agreement between you and Soft Computing Group. Any unauthorised
> disclosure, use or dissemination, either whole or partial, is
> prohibited. If you are not the intended recipient of this message,
> please notify the sender immediately.
> ----------------------------------------------------------------------
>
>
>
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
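Greg's chunked-update pattern does not strictly require biglm for simple summaries: the per-day mean and standard deviation he mentions can be accumulated by hand from running sums. A self-contained base-R sketch follows, with a simulated chunk generator standing in for the ODBC fetches (the function name `next_chunk` and the data distribution are illustrative assumptions, not part of the original code):

```r
set.seed(42)
days <- as.character(1:7)
## running per-day sums, sums of squares, and counts
sum1 <- sum2 <- cnt <- setNames(numeric(7), days)

next_chunk <- function() {   # stands in for sqlGetResults(con, max=...)
  data.frame(Day = sample(days, 1000, replace = TRUE),
             LoS = rexp(1000, rate = 0.2))
}

for (k in 1:50) {                    # 50 chunks of 1,000 rows each
  tmp <- next_chunk()
  tmp <- subset(tmp, LoS > 0)       # same filtering as in the ODBC loop
  s  <- tapply(tmp$LoS,   tmp$Day, sum)
  s2 <- tapply(tmp$LoS^2, tmp$Day, sum)
  n  <- tapply(tmp$LoS,   tmp$Day, length)
  sum1[names(s)]  <- sum1[names(s)]  + s
  sum2[names(s2)] <- sum2[names(s2)] + s2
  cnt[names(n)]   <- cnt[names(n)]   + n
}

## finalise: mean and sample SD per day from the accumulated sums
day_mean <- sum1 / cnt
day_sd   <- sqrt((sum2 - cnt * day_mean^2) / (cnt - 1))
```

Only the seven running totals are ever held in memory, so the approach scales to arbitrarily many rows; the trade-off versus biglm is that each new statistic needs its own accumulator.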




R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Thu Jul 20 06:43:33 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 20 Jul 2006 - 08:22:50 EST.
