From: steven wilson <swpt07_at_gmail.com>

Date: Thu, 01 May 2008 00:10:39 -0400

}

}

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 01 May 2008 - 04:22:21 GMT

Date: Thu, 01 May 2008 00:10:39 -0400

The code I sent before had some typos, here is the corrected one:

pca.nipals <- function(X, ncomp, iter = 50, toler = sqrt(.Machine$double.eps))

* # X...data matrix, ncomp...number of components,
*

# iter...maximal number of iterations per component,

# toler...precision tolerance for calculation of components

{

Xh <- scale(X, center = TRUE, scale = FALSE) nr <- 0 T <- NULL; P <- NULL # matrix of scores and loadings for (h in 1:ncomp) { th <- Xh[, 1] ende <- FALSE while (!ende) { nr <- nr + 1 ph <- t(crossprod(th, Xh) * as.vector(1 / crossprod(th))) ph <- ph * as.vector(1/sqrt(crossprod(ph))) thnew <- t(tcrossprod(t(ph), Xh) * as.vector(1/(crossprod(ph)))) prec <- crossprod(th-thnew) th <- thnew if (prec <= (toler^2)) ende <- TRUE if (iter <= nr) ende <- TRUE # didn't converge } Xh <- Xh - tcrossprod(th) T <- cbind(T, th); P <- cbind(P, ph) nr <- 0 } list(T = T, P = P)

}

Thanks again

- Forwarded message ---------- From: steven wilson <swpt07_at_gmail.com> Date: Wed, Apr 30, 2008 at 11:56 PM Subject: efficient code - yet another question To: r-help_at_r-project.org

Dear list members;

The code given below corresponds to the PCA-NIPALS (principal component analysis) algorithm adapted from the nipals function in the package chemometrics. The reason for using NIPALS instead of SVD is the ability of this algorithm to handle missing values, but that's a different story. I've been trying to find a way to improve (if possible) the efficiency of the code, especially when the number of components to calculate is higher than 100. I've been working with a data set of 500 rows x 700 variables. The code gets really slow when the number of PC to calculate is for example 600 (why that number of components?....because I need them to detect outliers using another algorithm). In my old laptop running Win XP and R 2.7.0 (1GB RAM) it takes around 6 or 7 minutes. That shouldn't be a problem for one analysis, but when cross-validation is added the time increases severely.....Although there are several examples on the R help list giving some with 'hints' to improve effciency the truth is that I don't know (or I can't see it) the part of the code that can be improved (but it is clear that the bottle neck is the for and while loops). I would really appreciate any ideas/comments/etc...

Thanks

#################################################################

pca.nipals <- function(X, ncomp, iter = 50, toler = sqrt(.Machine$double.eps))

* # X...data matrix, ncomp...number of components,
*

# iter...maximal number of iterations per component,

# toler...precision tolerance for calculation of components

{

Xh <- scale(X, center = TRUE, scale = FALSE) nr <- 0 T <- NULL; P <- NULL # matrix of scores and loadings for (h in 1:ncomp) { th <- Xh[, 1] ende <- FALSE while (!ende) { nr <- nr + 1 ph <- t(crossprod(th, Xh) * as.vector(1 / crossprod(th))) ph <- ph * as.vector(1/sqrt(crossprod(ph))) thnew <- t(tcrossprod(t(ph), Xh) * as.vector(1/(crossprod(ph)))) prec <- crossprod(th-thnew) th <- thnew if (prec <= (tol^2)) ende <- TRUE if (it <= nr) ende <- TRUE # didn't converge } Xh <- Xh - tcrossprod(th) T <- cbind(T, th); P <- cbind(P, ph) nr <- 0 } list(T = T, P = P)

}

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 01 May 2008 - 04:22:21 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Thu 01 May 2008 - 05:30:34 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*