[R] For loop gets exponentially slower as dataset gets larger...

From: r user <ruser2006_at_yahoo.com>
Date: Wed 04 Jan 2006 - 03:59:19 EST


I am running R 2.1.1 in a Microsoft Windows XP environment.    

  I have a data frame with three columns and ~2 million rows. The three columns are date_, id, and price. The data is sorted by id and date_.
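If date_ is stored as text in month/day/year form, as the sample data below suggests, ordering on it compares the strings character by character rather than chronologically, so "12/19/2005" comes before "6/24/2005". A minimal sketch, assuming that text format, converts the column with as.Date() before sorting. The three rows are taken from the sample for illustration only.

```r
# Assumption: date_ holds month/day/year text, as in the sample data
data <- data.frame(
  date_ = c("7/4/2005", "12/19/2005", "6/24/2005"),
  id    = c(1635, 1635, 1635),
  price = c(NA, 442.6982, 444.7838),
  stringsAsFactors = FALSE
)

# Sorting the raw text is not chronological: "12/19/2005" < "6/24/2005" because "1" < "6"
print(data[order(data$id, data$date_), ])

# Convert to Date first, then sort by id and date
data$date_ <- as.Date(data$date_, format = "%m/%d/%Y")
data <- data[order(data$id, data$date_), ]
print(data)
```

With real Date values, order() sorts each stock's rows from oldest to newest, which the backward scan over rows relies on.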

(The data frame contains daily prices for several thousand stocks. If a stock did not trade on a particular date, its price is NA.)

  I wish to add a fourth column, next_price. If the current price is not NA, next_price is the current price. If the current price is NA, next_price is the next price at which the stock with the same id trades. If the stock does not trade again, next_price is NA.
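The definition above is a backward fill (carry the next observed price back to earlier rows) that must not cross from one id into the next. Assuming the rows are already sorted by id and date_, a minimal base-R sketch can compute it without an explicit loop. next_price_vec is a hypothetical name, not an existing function.

```r
# Hypothetical helper; assumes rows are sorted by id and then by date
next_price_vec <- function(price, id) {
  n <- length(price)
  pos <- seq_len(n)
  pos[is.na(price)] <- NA                   # keep the row index only where a price exists
  r <- rev(pos)
  k <- cumsum(!is.na(r))                    # traded rows seen so far, scanning from the end
  nxt <- rev(c(NA, r[!is.na(r)])[k + 1])    # index of the next traded row at or after each row
  out <- price[nxt]
  # If the next traded row belongs to a later id, this id never trades again
  out[!is.na(nxt) & id[nxt] != id] <- NA
  out
}

price <- c(1, NA, 3, NA, NA, 6, NA)
id    <- c(1, 1, 1, 1, 2, 2, 2)
print(next_price_vec(price, id))   # expected values: 1 3 3 NA 6 6 NA
```

Because the rows of one id are contiguous, the id check is enough to stop a price from leaking into the previous stock. Usage would be data$next_price <- next_price_vec(data$price, data$id).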

  I wrote the following loop to calculate next_price. It works as intended, but I have one problem. With only 10,000 rows of data, the calculation is very fast. However, when I run the loop on the full 2 million rows, it takes about 1 second per row.

  Why is this happening? What can I do to speed the calculations when running the loop on the full 2 million rows?    

(I am not running low on memory, but my CPU is at 100%.)

  Here is my code and some sample data:    

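  data <- data[order(data$id, data$date_), ]   # sort by stock id, then date
  l <- nrow(data)                              # number of rows
  w <- 3                                       # column index of price
  data[l, w + 1] <- data[l, w]                 # last row: its own price, or NA

  # Walk backwards, carrying the next traded price back within each id
  for (i in (l - 1):1) {
    data[i, w + 1] <- ifelse(!is.na(data[i, w]), data[i, w],
                             ifelse(data[i, 2] == data[i + 1, 2], data[i + 1, w + 1], NA))
  }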
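A likely reason for the slowdown is that each data[i, w + 1] <- ... assignment goes through the data frame replacement method, which can copy the whole data frame. The cost of one iteration then grows with the number of rows, and the total time grows roughly with the square of it (quadratically rather than exponentially). The sketch below keeps the same backward scan but works on plain numeric vectors and assigns the finished column once; next_price_loop is a hypothetical helper.

```r
# Hypothetical helper: the same backward scan as the loop above, on plain vectors
next_price_loop <- function(price, id) {
  n <- length(price)
  out <- rep(NA_real_, n)          # preallocate the result once
  out[n] <- price[n]               # last row: its own price, or NA
  if (n > 1) {
    for (i in (n - 1):1) {
      if (!is.na(price[i])) {
        out[i] <- price[i]         # traded today: next_price is today's price
      } else if (id[i] == id[i + 1]) {
        out[i] <- out[i + 1]       # carry the later price back within the same id
      }                            # otherwise leave NA: this id never trades again
    }
  }
  out
}

# Usage, assuming data is already sorted by id and date_:
# data$next_price <- next_price_loop(data$price, data$id)
```

Indexing a plain vector inside the loop does not go through the data frame method, so each iteration does a constant amount of work.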

  date_       id    price     next_price
  6/24/2005   1635  444.7838  444.7838
  6/27/2005   1635  448.4756  448.4756
  6/28/2005   1635  455.4161  455.4161
  6/29/2005   1635  454.6658  454.6658
  6/30/2005   1635  453.9155  453.9155
  7/1/2005    1635  453.3153  453.3153
  7/4/2005    1635  NA        453.9155
  7/5/2005    1635  453.9155  453.9155
  7/6/2005    1635  453.0152  453.0152
  7/7/2005    1635  452.8651  452.8651
  7/8/2005    1635  456.0163  456.0163
  12/19/2005  1635  442.6982  442.6982
  12/20/2005  1635  446.5159  446.5159
  12/21/2005  1635  452.4714  452.4714
  12/22/2005  1635  451.074   451.074
  12/23/2005  1635  454.6453  454.6453
  12/27/2005  1635  NA        NA
  12/28/2005  1635  NA        NA
  12/1/2003   1881  66.1562   66.1562
  12/2/2003   1881  64.9192   64.9192
  12/3/2003   1881  66.0078   66.0078
  12/4/2003   1881  65.8098   65.8098
  12/5/2003   1881  64.1275   64.1275
  12/8/2003   1881  64.8697   64.8697
  12/9/2003   1881  63.5337   63.5337
  12/10/2003  1881  62.9399   62.9399
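As a check against the sample, a few of its rows can be filled with base R's ave(), which applies a function to each id separately and returns the results in the original row order. fill_backward is a hypothetical helper, and this per-group approach is a sketch, not the method used in the loop above.

```r
# A few rows copied from the sample above (already sorted by id and date_)
sample_rows <- data.frame(
  date_ = c("7/1/2005", "7/4/2005", "7/5/2005", "12/23/2005",
            "12/27/2005", "12/28/2005", "12/1/2003"),
  id    = c(1635, 1635, 1635, 1635, 1635, 1635, 1881),
  price = c(453.3153, NA, 453.9155, 454.6453, NA, NA, 66.1562),
  stringsAsFactors = FALSE
)

# Backward fill of one stock's prices: reverse, carry the last seen value forward, reverse back
fill_backward <- function(p) {
  r <- rev(p)
  k <- cumsum(!is.na(r))             # number of non-NA values seen so far
  rev(c(NA, r[!is.na(r)])[k + 1])    # k == 0 means none seen yet, giving NA
}

# ave() calls fill_backward once per id and reassembles the results in row order
sample_rows$next_price <- ave(sample_rows$price, sample_rows$id, FUN = fill_backward)
print(sample_rows)
```

The next_price values produced for these rows match the next_price column of the sample.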

		



______________________________________________

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Jan 04 04:20:05 2006

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:41:46 EST