Re: [R] memory management

From: bogdan romocea <br44114_at_gmail.com>
Date: Mon 30 Oct 2006 - 17:00:09 GMT


This was asked before. Collapse each row of the data frame into a single string, e.g. v <- apply(DF,1,function(x) {paste(x,collapse="_")}), then work with the values of that vector (table(), unique(), etc.) rather than comparing rows pairwise. If your data frame is really large, run this in a DBMS.
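
To make the suggestion concrete, here is a minimal sketch on a toy data frame. The column names and values are invented for illustration, and the "_" separator is assumed not to occur in the data (otherwise two different rows could collapse to the same key).

```r
# Toy data frame; column names and values are made up for illustration.
DF <- data.frame(a = c(1, 2, 1, 3),
                 b = c("x", "y", "x", "y"),
                 stringsAsFactors = FALSE)

# One string key per row, as in the apply() call above.
# Assumption: "_" never appears inside the values themselves.
v <- apply(DF, 1, function(x) paste(x, collapse = "_"))

# Vectorised checks on the keys stand in for explicit pairwise identical() calls.
dup_rows   <- which(duplicated(v))  # indices of rows that repeat an earlier row
key_counts <- table(v)              # how often each distinct row occurs
n_distinct <- length(unique(v))     # number of distinct rows

# A column-wise alternative that skips the matrix conversion done by apply().
v2 <- do.call(paste, c(DF, sep = "_"))
```

If only the duplicate rows are needed, duplicated(DF) also works on the data frame directly, without building the key vector first.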

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of
> Federico Calboli
> Sent: Monday, October 30, 2006 11:35 AM
> To: r-help
> Subject: [R] memory management
>
> Hi All,
>
> just a quick (?) question while I wait for my code to run...
>
> I'm comparing the identity of the lines of a dataframe, doing
> all possible
> pairwise comparisons. In doing so I use identical(), but
> that's by the way. I'm
> doing a (not so) quick and dirty check, and subsetting the data as
>
> data[row.numb,]
>
> and
>
> data[a different row,]
>
> I suspect the problem there is that I load into memory the
> whole frame data[,]
> every time, making the biz quite slow and wasteful. As I'm
> idly waiting, I
> thought, had I put every line of data[,] as the item of a
> list, then done my
> pairwise comparisons using the list, would I have had a
> better performance?
>
> (do I win the prize for the most convoluted sentence sent to
> the R-help?)
>
> For the pedants, yes, I know I could kill the process and try
> myself, but the
> spirit of the question is, is there a way of dealing with big
> data *efficiently*?
>
> Best,
>
> Fede
>
> --
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St Mary's Campus
> Norfolk Place, London W2 1PG
>
> Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
>
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Tue Oct 31 04:06:15 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 30 Oct 2006 - 17:30:15 GMT.
