From: Guojun Zhu <shmilylemon_at_yahoo.com>

Date: Tue 02 May 2006 - 18:23:42 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Tue May 02 18:27:38 2006

Thank you very much. It rocks. And actually I
discovered that what really slows down the program is

"return$vol.cap[[i]]=mean(VOL[(i-12):(i-1)],na.rm=TRUE)/return$cap[[i]]"

If I take this line out, my original code takes about 10 minutes and halts at the place where the all-NA values show up. It seems R is extra slow for something related to "[[]]". I ended up rewriting this part as a vector addition, which takes a few seconds: not as fast as what you showed, but a huge improvement.

trail.mean <- function(a, n) {
  # mean of the n values before each position; the first n entries are NA
  l <- length(a)
  temp <- a[1:(l - n)]
  for (i in 2:n) temp <- temp + a[i:(l - n + i - 1)]
  c(rep(NA, n), temp / n)
}

return$vol.cap=trail.mean(VOL,12)/return$cap;
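
A minimal sketch to check that trail.mean gives the same numbers as the element-by-element loop. The vector vol is a hypothetical stand-in for the VOL column, and the cumsum variant is an alternative added here for comparison, not part of the original code. Note that, unlike mean(..., na.rm=TRUE), both vectorized versions propagate NA through any window that contains one.

```r
# trail.mean as posted above: mean of the n values preceding each position
trail.mean <- function(a, n) {
  l <- length(a)
  temp <- a[1:(l - n)]
  for (i in 2:n) temp <- temp + a[i:(l - n + i - 1)]
  c(rep(NA, n), temp / n)
}

# Hypothetical stand-in for the VOL column
set.seed(1)
vol <- runif(50)
n <- 12

# Reference: the explicit loop, one window at a time (no na.rm here)
ref <- rep(NA_real_, length(vol))
for (i in (n + 1):length(vol)) ref[i] <- mean(vol[(i - n):(i - 1)])

# Alternative: a window sum is the difference of two cumulative sums
cs <- c(0, cumsum(vol))
l <- length(vol)
alt <- c(rep(NA, n), (cs[(n + 1):l] - cs[1:(l - n)]) / n)

all.equal(trail.mean(vol, n), ref)
all.equal(alt, ref)
```

The cumsum form needs no loop over the window at all, but cumulative sums can lose precision on long series with large values.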

For your first method, if I add alg="exact", runmean actually does work with NAs. Thank you very much. I never thought it could be so fast. It is so tricky though. :)
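
The trick in the first method is that a regression slope over a window can be written with four running means: b = (Mxy - Mx*My) / (Mxx - Mx^2). Below is a rough sketch of that identity in base R, checked against lm() on a single window. It assumes trailing windows of the previous k observations (as in the original loop) rather than runmean's windows, and trailing_mean, x and y are hypothetical names and data made up for the illustration.

```r
# Trailing mean of the k values before each position, via cumulative sums
# (hypothetical helper, not from caTools)
trailing_mean <- function(v, k) {
  cs <- c(0, cumsum(v))
  l <- length(v)
  c(rep(NA, k), (cs[(k + 1):l] - cs[1:(l - k)]) / k)
}

set.seed(1)
k <- 60
x <- rnorm(500)             # hypothetical regressor
y <- 0.5 * x + rnorm(500)   # hypothetical response

Mx  <- trailing_mean(x, k)
My  <- trailing_mean(y, k)
Mxy <- trailing_mean(x * y, k)
Mxx <- trailing_mean(x * x, k)

b <- (Mxy - Mx * My) / (Mxx - Mx * Mx)  # slope for every window at once
a <- My - b * Mx                        # matching intercept

# Spot check one window against lm()
i <- 300
w <- (i - k):(i - 1)
fit <- coef(lm(y[w] ~ x[w]))
all.equal(unname(fit[2]), b[i])
all.equal(unname(fit[1]), a[i])
```

Because every window is handled by vector arithmetic, lm() is never called inside a loop, which is where most of the time goes in the original code.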

I am wondering if there are any materials about the efficiency of R: which commands are quick and which are slow. I am going to read runmean's source code when I have time.
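
One way to find out what is quick and what is slow is to time the candidates with system.time(). The sketch below compares element-by-element assignment into a data frame column with a single vectorized assignment. The data frame df and the vector v are hypothetical stand-ins; the timings depend on the machine and the R version, so none are quoted here.

```r
n <- 5000
set.seed(1)
df <- data.frame(cap = runif(n), vol.cap = NA_real_)  # hypothetical data
v <- runif(n)

# Element-by-element assignment into a data frame column
t_loop <- system.time(
  for (i in seq_len(n)) df$vol.cap[[i]] <- v[i] / df$cap[[i]]
)

# The same result with one vectorized assignment
df2 <- data.frame(cap = df$cap, vol.cap = NA_real_)
t_vec <- system.time(df2$vol.cap <- v / df2$cap)

print(t_loop)
print(t_vec)
all.equal(df$vol.cap, df2$vol.cap)
```

Both versions produce the same column, so any difference in the two timings comes purely from how the assignment is written.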

Thank you again. I was actually thinking of switching to SAS before you guys saved me.

--- Gabor Grothendieck <ggrothendieck@gmail.com> wrote:

> Using runmean from caTools the first one below does
> it in under 1 second but will not handle NAs. The
> second one takes under 15 seconds and handles
> them by replacing them with linear approximations.
> Note that k must be odd.
>
> # 1
>
> library(caTools)
> set.seed(1)
> system.time({
> y <- rnorm(140001)
> x <- as.numeric(seq(y))
> k <- 61
> Mxy <- runmean(x * y, k)
> Mxx <- runmean(x * x, k)
> Mx <- runmean(x, k)
> My <- runmean(y, k)
> b <- (Mxy - Mx * My) / (Mxx - Mx * Mx)
> a <- My - b * Mx
> })
>
> # 2
>
> library(caTools)
> library(zoo)
> set.seed(1)
> system.time({
> y <- rnorm(140000)
> x <- as.numeric(seq(y))
> x[100:200] <- NA
> x <- na.approx(zoo(x))
> y <- zoo(y)
> k <- 61
> Mxy <- runmean(x * y, k)
> Mxx <- runmean(x * x, k)
> Mx <- runmean(x, k)
> My <- runmean(y, k)
> b <- (Mxy - Mx * My) / (Mxx - Mx * Mx)
> a <- My - b * Mx
> })
>
> On 5/1/06, Guojun Zhu <shmilylemon@yahoo.com> wrote:
> > I basically have a long data.frame a, but I only need
> > three columns x,y. Let us say the index of row is t.
> > I need to produce a new column s_t as the linear
> > regression coefficient of (x_(t-60),...,x_(t-1)) on
> > (y_(t-60),...,y_(t-1)). The data is about 140,000
> > rows. I wrote some simple code for this which is super
> > slow; it takes more than 2 hours on a 2.8GHz Intel Duo
> > Core. My friend uses SAS and his code needs only a
> > couple of minutes. I know there must be some more
> > efficient way to write it. Can anyone help me with
> > this? Here is the code.
> >
> > Also, one line produces a completely NA temp$y and the lm
> > function fails on that. How can I make it just produce
> > an NA instead and keep running?
> >
> > attach(return)
> > betat=rep(NA,length(RET))
> > for (i in 61:length(RET)){cat(i," ");
> > if (year[[i]]>=1995){
> > temp<-data.frame(y=RET[(i-60):(i-1)]-riskfree[(i-60):(i-1)],x=sprtrn[(i-60):(i-1)]-riskfree[(i-60):(i-1)])
> > betat[[i]]<-lm(y~x+1,na.action=na.exclude,temp)[[1]][[2]]
> > #if (i%%100==0)
> > cat(i," ");
> > return$vol.cap[[i]]=mean(VOL[(i-12):(i-1)],na.rm=TRUE)/return$cap[[i]]
> > }
> > }

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Tue 02 May 2006 - 20:10:01 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting guide before posting to the list.