[R] Alternatives to merge for large data sets?

From: Adam D. I. Kramer <adik_at_ilovebacon.org>
Date: Thu 07 Sep 2006 - 06:12:52 GMT

Hello,

I am trying to merge two very large data sets, via

pubbounds.prof <- merge(x = pubbounds, y = prof, by.x = "user",
                        by.y = "userid", all = TRUE, sort = FALSE)

which gives me an error of

Error: cannot allocate vector of size 2962 Kb

I am reasonably sure that this is correct syntax.

The trouble is that pubbounds and prof are large: they are data frames which take up 70 MB and 11 MB respectively when saved as .Rdata files.

I understand from various archive searches that "merge can't handle that," because merge requires on the order of n^2 memory, which I do not have.
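For what it's worth, the quadratic blow-up is easy to reproduce when a key is duplicated on both sides, since merge takes the Cartesian product of the matching rows per key (a toy illustration, not my actual data):

x <- data.frame(user = rep("a", 1000), val.x = 1)
y <- data.frame(userid = rep("a", 1000), val.y = 2)
nrow(merge(x, y, by.x = "user", by.y = "userid"))  # 1000 * 1000 = 1e6 rows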

My question is whether there is an alternative to merge which would carry out the process in a slower, iterative manner, or whether I should just bite the bullet, write.table the data, and use a Perl script to do the job.
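In case it's useful to anyone searching the archives later, here is one way to stay in R: merge in blocks of rows, so that only one block's worth of intermediate result is built at a time. This is a minimal, untested sketch; chunked.merge and chunk.size are made-up names, the column names are the ones from my call above, it assumes x and y share no column names other than the keys, and the final rbind still has to hold the complete result.

chunked.merge <- function(x, y, by.x, by.y, chunk.size = 10000) {
    n <- nrow(x)
    starts <- seq(1, n, by = chunk.size)
    pieces <- vector("list", length(starts))
    for (i in seq_along(starts)) {
        rows <- starts[i]:min(starts[i] + chunk.size - 1, n)
        ## left-join this block of x; all.x = TRUE keeps its unmatched rows
        pieces[[i]] <- merge(x[rows, , drop = FALSE], y,
                             by.x = by.x, by.y = by.y,
                             all.x = TRUE, sort = FALSE)
    }
    out <- do.call("rbind", pieces)
    ## to get the full outer join (all = TRUE), append the rows of y
    ## whose keys never appear in x, with NA in the x-only columns
    leftover <- y[!(y[[by.y]] %in% x[[by.x]]), , drop = FALSE]
    if (nrow(leftover) > 0) {
        names(leftover)[names(leftover) == by.y] <- by.x
        for (nm in setdiff(names(out), names(leftover)))
            leftover[[nm]] <- NA
        out <- rbind(out, leftover[names(out)])
    }
    out
}

pubbounds.prof <- chunked.merge(pubbounds, prof,
                                by.x = "user", by.y = "userid")

If the join is really many-to-one (each userid occurs at most once in prof), match() plus indexing would be cheaper still, e.g. cbind(pubbounds, prof[match(pubbounds$user, prof$userid), setdiff(names(prof), "userid")]), though that gives a left join rather than merge's full outer join.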

Thankful as always,
Adam D. I. Kramer



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu Sep 07 16:18:12 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 07 Sep 2006 - 09:42:18 GMT.
