[R] Re : Large database help

From: justin bem <justin_bem_at_yahoo.fr>
Date: Tue 16 May 2006 - 23:15:08 EST


Try loading your data into MySQL and querying it from R with the RMySQL package.

You can read fixed-width files with read.fwf(). But by my rough calculation your dataset will require about 40GB of RAM (60 million observations x 90 variables x 8 bytes each, if every variable is stored as a numeric). I don't think you'll be able to read the entire thing into R. Maybe look at a subset?

-roger
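
The estimate and the subset suggestion above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the small temporary file stands in for the real 60GB file, the 5-character third field is hypothetical, and the memory figure assumes all 90 variables are stored as 8-byte numerics. Only the positions of V1 (columns 1 to 7) and V2 (columns 8 to 23) come from the original question.

```r
# Rough memory estimate: R stores numeric values as 8-byte doubles.
n_obs  <- 60e6                 # observations, from the original post
n_vars <- 90                   # variables, from the original post
gib    <- n_obs * n_vars * 8 / 2^30   # size in GiB if everything is numeric

# Tiny stand-in for the real file: V1 in columns 1-7, V2 in columns 8-23,
# followed by a hypothetical 5-character field that we do not want.
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("%07d%16.1f%s",
                   1:3, c(1.5, 2.5, 3.5), c("AAAAA", "BBBBB", "CCCCC")),
           tmp)

# A negative width tells read.fwf() to skip that many characters,
# and n caps the number of records read.
subset_df <- read.fwf(tmp, widths = c(7, 16, -5),
                      col.names = c("V1", "V2"), n = 2)
unlink(tmp)

print(round(gib, 1))
print(subset_df)
```

Skipping unwanted fields with negative widths and capping the record count with n keeps the resulting data frame small, although read.fwf() still has to scan every line it reads, so it will be slow on a file of this size.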

Rogerio Porto wrote:
> Hello all.
>
> I have a large .txt file whose variables are in fixed columns,
> i.e., variable V1 goes from columns 1 to 7, V2 from 8 to 23, etc.
> This is a 60GB file with 90 variables and 60 million observations.
>
> I'm working with a Pentium 4, 1GB RAM, Windows XP Pro.
> I tried the following code just to see if I could work with 2 variables
> but it does not seem possible:
> R : Copyright 2005, The R Foundation for Statistical Computing
> Version 2.2.1 (2005-12-20 r36812)
> ISBN 3-900051-07-0
>
> > gc()
>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 169011  4.6     350000  9.4   350000  9.4
> Vcells  62418  0.5     786432  6.0   289957  2.3
> > memory.limit(size=4090)
> NULL
> > memory.limit()
> [1] 4288675840
> > system.time(a<-matrix(runif(1e6),nrow=1))
> [1] 0.28 0.02 2.42 NA NA
> > gc()
>           used (Mb) gc trigger (Mb) max used (Mb)
> Ncells  171344  4.6     350000  9.4   350000  9.4
> Vcells 1063212  8.2    3454398 26.4  4063230 31.0
> > rm(a)
> > ls()
> character(0)
> > system.time(a<-matrix(runif(60e6),nrow=1))
> Error: not possible to allocate vector of size 468750 Kb
> Timing stopped at: 7.32 1.95 83.55 NA NA
> > memory.limit(size=5000)
> Error in memory.size(size) : .....4GB
>
> So my questions are:
> 1) (newbie) how can I read fixed-column text files like this?
> 2) is there a way I can analyze (statistics like correlations, clustering, etc.)
> such a large database without increasing RAM or changing to a 64-bit
> machine, while still using R and not using a sample? How?
>
> Thanks in advance.
>
> Rogerio.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
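
On question 2 in the quoted message, one possible approach for simple statistics such as a correlation is to read the file in chunks through a connection and keep only running sums in memory. The sketch below is not from the thread; it assumes the two variables sit in columns 1 to 7 and 8 to 23 as described in the question, uses a simulated temporary file in place of the real data, and picks an arbitrary chunk size. The one-pass sum formula can lose precision on badly scaled data, so treat it as an illustration rather than a recommended implementation.

```r
# Simulated stand-in file: V1 in columns 1-7, V2 in columns 8-23.
set.seed(1)
x <- round(runif(1000) * 1000)
y <- 2 * x + rnorm(1000)
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("%7d%16.4f", as.integer(x), y), tmp)

# Read the file chunk by chunk, accumulating the sums needed for Pearson's r.
con <- file(tmp, open = "r")
n <- 0; sx <- 0; sy <- 0; sxx <- 0; syy <- 0; sxy <- 0
repeat {
  lines <- readLines(con, n = 100)      # chunk size is an arbitrary choice
  if (length(lines) == 0) break
  v1 <- as.numeric(substr(lines, 1, 7))   # V1: columns 1-7
  v2 <- as.numeric(substr(lines, 8, 23))  # V2: columns 8-23
  n   <- n + length(lines)
  sx  <- sx + sum(v1)
  sy  <- sy + sum(v2)
  sxx <- sxx + sum(v1^2)
  syy <- syy + sum(v2^2)
  sxy <- sxy + sum(v1 * v2)
}
close(con)
unlink(tmp)

# Correlation computed from the running sums only.
r_chunked <- (n * sxy - sx * sy) /
  sqrt((n * sxx - sx^2) * (n * syy - sy^2))
print(r_chunked)
```

Only one chunk of lines is held in memory at a time, so the memory needed depends on the chunk size rather than on the 60 million observations. Methods that need all the data at once, such as most clustering algorithms, do not reduce to running sums this easily.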


Received on Tue May 16 23:18:32 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 17 May 2006 - 04:10:09 EST.