Re: [R] leaps

From: Maura E Monville <maura.monville_at_gmail.com>
Date: Tue, 18 Dec 2007 18:23:37 -0600

Thank you very much for the example. I think interactively I could get something.
But my obstacle is to write an R script that processes my set of data automatically.
My difficulty is to extract the information that appears on the screen, when R is operated interactively, from a scripts.

Let me go over some steps to make sure I am doing things right. Assume my data have been read into the matrix xx. Such a matrix contains a number of curve samples. I call each curve a cycle. I am processing one cycle at a time and using regression analysis to find the frequencies in the single cycle. Since I have many, I have to come up with an automatic way to do that. The main phases are:

  1. run a regression analysis including all possible frequencies
  2. use "step" to cut off the non significant frequencies
  3. input remaining frequencies from phase (2) to "regsubsets"

In the following I have cut and pasted some code and the information I get from phase (2) and phase (3). To start with I would like to make sure that I got the output from (3) right. The ouput of (3) tells me that the highest R^2 value was reached after 8 iterations and there are only 8 significant predictors in this model ??? In addition the only significant frequencies
(predictors) left are:

 cos1, cos2, cos4, cos7, sin1,sin2, sin3, sin5 I got this information interactively. But I'm in troubles at extracting it automatically.
Any suggestion ?

Question: Do I have to run "step" in advance of "regsubsets" for a first-pass model pruning or may I run "regsubsets" on the original model bringing in all possible frequencies (88 in all as sums of sinuoids) ?

Thank you so much.

Kind regards,
Maura


  T <- xx$timestamp[end] - xx$timestamp[start]    nsamples <- end +1 - start
   nfr <- ceiling(nsamples/2)
   yy <- xx[start:end,"amplitude"]
   tt <- xx[start:end,"timestamp"]

   cosmat <- matrix(nrow=nsamples,ncol=nfr)
   coscol <- NULL
   sinmat <- matrix(nrow=nsamples,ncol=nfr)
   sincol <- NULL
   for(i in 1:nfr){
     cosmat[,i] <- cos(tt*2*pi*i/T)
     coscol <- c(coscol,paste("cos",i,sep=""))
     sinmat[,i] <- sin(tt*2*pi*i/T)
     sincol <- c(sincol,paste("sin",i,sep=""))
   }
  colnames(cosmat) <- coscol
   colnames(sinmat) <- sincol
   xnam1 <- NULL
   xnam1 <- paste(sep="","cosmat[,",1:nfr,"]",collapse="+")
   xnam2 <- NULL
   xnam2 <- paste(sep="","sinmat[,",1:nfr,"]",collapse="+")
   fmla <- as.formula(paste("yy ~ ", paste(xnam1,"+",xnam2,sep="")))    FTmod <- lm(fmla)
   stepmod <- step(FTmod, direction="both")    summary(stepmod)

*Coefficients:

              Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.237808 0.008728 27.248 < 2e-16 ***
cosmat[, 1] -1.011932 0.012390 -81.675 < 2e-16 *** cosmat[, 2] -0.417265 0.012329 -33.844 < 2e-16 *** cosmat[, 3] 0.020143 0.012318 1.635 0.106604 cosmat[, 4] 0.081425 0.012397 6.568 8.43e-09 *** cosmat[, 6] -0.042804 0.012388 -3.455 0.000951 *** cosmat[, 7] -0.044927 0.012340 -3.641 0.000526 *** cosmat[, 9] 0.039322 0.012411 3.168 0.002298 ** cosmat[, 11] -0.020710 0.012375 -1.673 0.098831 . cosmat[, 14] 0.016393 0.012390 1.323 0.190245 cosmat[, 17] -0.016482 0.012374 -1.332 0.187319 cosmat[, 19] 0.016537 0.012387 1.335 0.186306 sinmat[, 1] -0.460705 0.012296 -37.469 < 2e-16 *** sinmat[, 2] -0.289316 0.012356 -23.416 < 2e-16 *** sinmat[, 3] -0.049610 0.012367 -4.011 0.000153 *** sinmat[, 4] 0.023937 0.012289 1.948 0.055551 . sinmat[, 5] 0.045542 0.012406 3.671 0.000476 *** sinmat[, 6] -0.017782 0.012298 -1.446 0.152790 sinmat[, 12] -0.016093 0.012325 -1.306 0.196038 ---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08135 on 68 degrees of freedom Multiple R-Squared: 0.9932, Adjusted R-squared: 0.9914 F-statistic: 549.8 on 18 and 68 DF, p-value: < 2.2e-16*

newmat <- cosmat[,-c(5,8,10,12,13,15,16,18,20:44)]
newmat <- cbind(newmat,sinmat[,-c(7:11,13:44)])
Regmod <- regsubsets(newmat, yy)

rs <- summary(Regmod)
which.max(rs$adjr)

*[1] 8

*rs$which[which.max(rs$adjr), ]

*(Intercept)        cos1        cos2        cos3        cos4        cos6
       TRUE        TRUE        TRUE       FALSE        TRUE       FALSE
       cos7        cos9       cos11       cos14       cos17       cos19
       TRUE       FALSE       FALSE       FALSE       FALSE       FALSE
       sin1        sin2        sin3        sin4        sin5        sin6
       TRUE        TRUE        TRUE       FALSE        TRUE       FALSE
      sin12
      FALSE*

	[[alternative HTML version deleted]]

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 19 Dec 2007 - 00:28:32 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 19 Dec 2007 - 00:30:19 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.