I'm sure you'll get ingenious responses to help you optimize your R code. I deal with similar investment data in even larger numbers (e.g. 10 years of daily return data for each stock in the Russell 3000), and prefer reading and consolidating the data in Python using dictionaries and lists, then either piping the data to R in a read statement (read.table("pipe python...")) or using Rpy to write R data frames directly from Python. Python is more facile with these basic data manipulations for hundreds of thousands or even millions of records, and performance is generally considerably better.
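As a rough sketch of the dictionary approach, here is how the lookup and quarter conversion might look in Python. It assumes the id and return data arrive as CSV-style rows with the PERMNO, GVKEY, fy, year and month columns used in the question's code. The function names are made up for illustration, treating fy as the fiscal-year-end month is my assumption, and the branch for months on or before that month is a guess mirrored from the `else` arithmetic in the question.

```python
import csv
import io


def build_lookup(id_rows):
    """Map PERMNO -> (GVKEY, fy), keeping the first row seen per PERMNO."""
    lookup = {}
    for row in id_rows:
        # setdefault keeps the first match, like id$GVKEY[temp][[1]] in the R loop
        lookup.setdefault(row["PERMNO"], (row["GVKEY"], int(row["fy"])))
    return lookup


def fiscal_period(year, month, fy_end_month):
    """Return (fyy, fyq); quarters run 0..3, as in the question's arithmetic."""
    if month <= fy_end_month:
        # Assumption: mirror of the else branch for the other half of the year
        return year, (month + 12 - fy_end_month - 1) // 3
    return year + 1, (month - fy_end_month - 1) // 3


def tag_returns(return_rows, lookup):
    """Yield each return row with GVKEY, fyy and fyq added."""
    for row in return_rows:
        match = lookup.get(row["PERMNO"])  # one O(1) dictionary lookup per row
        if match is None:
            continue  # design choice for this sketch: skip unmapped PERMNOs
        gvkey, fy_end_month = match
        fyy, fyq = fiscal_period(int(row["year"]), int(row["month"]), fy_end_month)
        yield {**row, "GVKEY": gvkey, "fyy": fyy, "fyq": fyq}


def to_csv_text(rows):
    """Render the tagged rows as CSV text that R can read back in."""
    rows = list(rows)
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The point of the dictionary is that each return row costs a single hash lookup instead of a scan over every id row, which is where a row-by-row loop spends most of its time. Printing the result of `to_csv_text` to standard output would let R pick it up through the pipe-based read mentioned above, after which the merge with the accounting data can stay in R.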

I have two data sets of stock and fiscal data for a large number of companies. One is monthly, with about 144,000 lines, and the other is quarterly, with about 56,000. Each data set uses a different company code, and I need to merge the two. I read both in as CSV, along with a third file that maps one firm code to the other, so I now have three data frames: return (keyed by return$PERMNO), account (keyed by account$GVKEY), and id, which holds the correspondence and has both id$PERMNO and id$GVKEY. I also need to convert each return's month into a quarter before merging the two data frames (return and account). I ended up writing a short program for this, but it runs very slowly: 15+ minutes. Is there a quicker way to do it? Here is my original code.

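    for (i in seq_along(return$PERMNO)) {
        # Rows of id whose PERMNO matches this return row
        temp <- id$PERMNO == return$PERMNO[[i]]
        tempmon <- id$fy[temp][[1]]
        if (return$month[[i]] <= tempmon) {
            # This branch was lost from the post; reconstructed by symmetry with the else branch
            return$fyy[[i]] <- return$year[[i]]
            return$fyq[[i]] <- (return$month[[i]] + 12 - tempmon - 1) %/% 3
        } else {
            return$fyy[[i]] <- return$year[[i]] + 1
            return$fyq[[i]] <- (return$month[[i]] - tempmon - 1) %/% 3
        }
        return$GVKEY[[i]] <- id$GVKEY[temp][[1]]
    }

    returnnew <- merge(return, account, by.x = c("GVKEY", "fyy", "fyq"), by.y = c("GVKEY", "fyy", "fyq"))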

