# Re: [R] efficiency in merging two data frames

From: Petr Pikal <petr.pikal_at_precheza.cz>
Date: Tue 02 May 2006 - 17:55:10 EST

Hi

and you would better to give up thinking in "for cycles" as a good tool for data manipulation and calculation.

R has usually functions for direct manipulation and calculation with vectors or matrices, vhich are far more effective than calculation in cycle.

I also recommend you to go through Paul Johnsons Rtips which is set of many good ideas how to do some things.

HTH
Petr

On 1 May 2006 at 7:54, Steve Miller wrote:

```From:           	"Steve Miller" <steve.miller@jhu.edu>
To:             	"'Guojun Zhu'" <shmilylemon@yahoo.com>,
<r-help@stat.math.ethz.ch>
Date sent:      	Mon, 1 May 2006 07:54:54 -0500
Subject:        	Re: [R] efficiency in merging two data frames

```

> code. I deal with similar investment data in even larger numbers (e.g.
> 10 years of daily return data for each stock in the Russell 3000), and
> prefer reading and consolidating the data in Python using dictionaries
> and lists, then either piping the data to R in a read statement
> (read.table("pipe python...")) or using Rpy to write R data frames
> directly from Python. Python is more facile with these basic data
> manipulations for hundreds of thousands or even millions of records,
> and performance is generally considerably better.
>
> Steve Miller
>
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Guojun Zhu
> Sent: Monday, May 01, 2006 2:35 AM To: r-help@stat.math.ethz.ch
> Subject: [R] efficiency in merging two data frames
>
> I have two data sets about lots of companies' stock
> and fiscal data. One is monthly data with about
> 144,000 lines, and the other is quaterly with about
> 56,000. Each data set takes different company code.
> I need to merge these two together. I read both ask
> cvs. And the other file with corresponding firm code.
> Now I have three data sets. return\$PERMNO,
> account\$GVKEY. id is the data frames of the
> corresponding relation and has both id\$PERMNO and
> id\$GVKEY. Also, I need to convert the return's month
> into quarter and finally merge two data frames(return
> and account). I end up write a short program for
> this, but it runs very slow. 15+ minutes. Is there
> quick way to do it. Here is my original codes.
>
>
>
> id\$fy=rep(0,length(id\$PERMNO))
> for (i in 1:length(id\$PERMNO))
>
> id\$fy[[i]]<-account\$FYR[id\$GVKEY[[i]]==account\$GVKEY][[1]]
>
> return\$GVKEY=rep(0,length(return\$PERMNO))
> return\$fyy=rep(0,length(return\$PERMNO))
> return\$fyq=rep(0,length(return\$PERMNO))
> for (i in i:length(return\$PERMNO)) {
> temp<-id\$PERMNO==return\$PERMNO[[i]];
> tempmon<-id\$fy[temp][[1]];
> if (return\$month[[i]]<-tempmon) {
> return\$fyy[[i]]<-return\$year[[i]];
> return\$fyq[[i]]<-4-(tempmon-return\$month[[i]])%/%3;
> }
> else{
> return\$fyy[[i]]<-return\$year[[i]]+1;
> return\$fyq[[i]]<-(return\$month[[i]]-tempmon-1)%/%3;
> }
> return\$GVKEY[[i]]<-id\$GVKEY[temp][[1]];
> }
>
> returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by.y<-c("G
> VKEY", "fyy","fyq"))
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html

Petr Pikal
petr.pikal@precheza.cz

R-help@stat.math.ethz.ch mailing list