Re: [R] efficiency in merging two data frames

From: bogdan romocea <br44114_at_gmail.com>
Date: Tue 02 May 2006 - 00:37:38 EST


Another good option is SQL, the fastest and most scalable solution. If you decide to give it a try pay close attention to indexes.

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Steve Miller
> Sent: Monday, May 01, 2006 8:55 AM
> To: 'Guojun Zhu'; r-help@stat.math.ethz.ch
> Subject: Re: [R] efficiency in merging two data frames
>
> I'm sure you'll get ingenious responses to help you optimize
> your R code. I
> deal with similar investment data in even larger numbers
> (e.g. 10 years of
> daily return data for each stock in the Russell 3000), and
> prefer reading
> and consolidating the data in Python using dictionaries and
> lists, then
> either piping the data to R in a read statement (read.table("pipe
> python...")) or using Rpy to write R data frames directly from Python.
> Python is more facile with these basic data manipulations for
> hundreds of
> thousands or even millions of records, and performance is generally
> considerably better.
>
> Steve Miller
>
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Guojun Zhu
> Sent: Monday, May 01, 2006 2:35 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] efficiency in merging two data frames
>
> I have two data sets about lots of companies' stock
> and fiscal data. One is monthly data with about
> 144,000 lines, and the other is quaterly with about
> 56,000. Each data set takes different company code.
> I need to merge these two together. I read both ask
> cvs. And the other file with corresponding firm code.
> Now I have three data sets. return$PERMNO,
> account$GVKEY. id is the data frames of the
> corresponding relation and has both id$PERMNO and
> id$GVKEY. Also, I need to convert the return's month
> into quarter and finally merge two data frames(return
> and account). I end up write a short program for
> this, but it runs very slow. 15+ minutes. Is there
> quick way to do it. Here is my original codes.
>
>
>
> id$fy=rep(0,length(id$PERMNO))
> for (i in 1:length(id$PERMNO))
>
> id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]]
>
> return$GVKEY=rep(0,length(return$PERMNO))
> return$fyy=rep(0,length(return$PERMNO))
> return$fyq=rep(0,length(return$PERMNO))
> for (i in i:length(return$PERMNO)) {
> temp<-id$PERMNO==return$PERMNO[[i]];
> tempmon<-id$fy[temp][[1]];
> if (return$month[[i]]<-tempmon) {
> return$fyy[[i]]<-return$year[[i]];
> return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3;
> }
> else{
> return$fyy[[i]]<-return$year[[i]]+1;
> return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3;
> }
> return$GVKEY[[i]]<-id$GVKEY[temp][[1]];
> }
>
> returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by
> .y<-c("GVKEY",
> "fyy","fyq"))
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Tue May 02 00:45:33 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 02 May 2006 - 02:09:54 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.