From: Dana Honeycutt <dana_at_accelrys.com>

Date: Thu 24 Mar 2005 - 11:09:11 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Mar 24 11:17:54 2005

I am working with data sets in which the number and order of columns
may vary, but each column is uniquely identified by its name. E.g.,
one data set might have columns

MW logP Num_Rings Num_H_Donors

while another has columns

Num_Rings Num_Atoms Num_H_Donors logP MW

I would like to be able to perform a principal component analysis (PCA) on one data set and save the PCA object to a file. In a later R session, I would like to load the object and then apply the loadings to a new data set in order to compute the principal component (PC) values for each row of new data.
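A minimal sketch of the save-and-reload part of this workflow, assuming base R's save() and load(). The data, the object name pca_fit, and the temporary file path are hypothetical and used only for illustration.

```r
# Hypothetical training data; only the column names mirror the example above.
set.seed(42)
train <- data.frame(MW = rnorm(10), logP = rnorm(10),
                    Num_Rings = rnorm(10), Num_H_Donors = rnorm(10))

# Fit the PCA and write the fitted object to a file.
pca_fit <- princomp(train)
orig_center <- pca_fit$center          # kept only to check the round trip
pca_file <- tempfile(fileext = ".RData")
save(pca_fit, file = pca_file)

# Simulate a later session: drop the object, then restore it from the file.
rm(pca_fit)
load(pca_file)                         # recreates pca_fit in the workspace
```

load() restores the object under the name it was saved with, so the later session needs to know that name (pca_fit here).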

I am trying to use the princomp method in R to do this. (I started with prcomp, but found that there is no predict method for objects created by prcomp.) The problem is that when using predict on a princomp object, R ignores the names of columns and simply assumes that the column order is the same as in the original data frame used to do the PCA. (This contrasts, for example, with the behavior of a model produced by lm, which is aware of column names in a data frame.)

What I think I need to do is this:

1. After reloading the princomp object, extract the names and order of columns that it expects. (If you look at the loadings for the object, you can see that this info is there, but I would like to get at it directly somehow.)
2. Reorder the columns in the new data set to correspond to this expected order, and remove any extra columns.
3. Use the predict method to predict the PC values for the new data set.
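The three steps above could be sketched as follows. This is one possible approach, not a confirmed answer from the list: it assumes the PCA was fitted on a data frame with named columns, so that the row names of the loadings carry the variable names. The data and the names train, new_data, expected, and aligned are hypothetical.

```r
# Hypothetical data using the two column layouts from the example above.
set.seed(1)
train <- data.frame(MW = rnorm(10), logP = rnorm(10),
                    Num_Rings = rnorm(10), Num_H_Donors = rnorm(10))
pca1 <- princomp(train)

new_data <- data.frame(Num_Rings = rnorm(5), Num_Atoms = rnorm(5),
                       Num_H_Donors = rnorm(5), logP = rnorm(5), MW = rnorm(5))

# Step 1: variable names in the order used when the PCA was fitted.
# names(pca1$center) should hold the same names.
expected <- rownames(loadings(pca1))

# Step 2: stop early if a required column is absent, then select the
# expected columns by name; this reorders them and drops extras.
missing_cols <- setdiff(expected, names(new_data))
if (length(missing_cols) > 0) {
  stop("new data lacks columns: ", paste(missing_cols, collapse = ", "))
}
aligned <- new_data[, expected, drop = FALSE]

# Step 3: compute the PC values for each row of the aligned new data.
scores <- predict(pca1, aligned)
```

Indexing by a character vector of names does the reordering and the removal of extra columns in one step, and the setdiff() check turns a missing column into a clear error rather than a silent mismatch.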

Is this the best approach to achieve what I am attempting?

If so, can anyone tell me how to accomplish steps 1 and 2 above?

Thanks,

Dana Honeycutt

P.S. Here's a script that demonstrates the problem:

x1 <- rnorm(10)

x2 <- rnorm(10)

y <- rnorm(10)

frx <- data.frame(x1,x2)

frxy <- data.frame(x1,x2,y)

lm1 <- lm(y~x1+x2,frxy)

pca1 <- princomp(frx)

rm(x1,x2,y,frx,frxy)

z1 <- rnorm(10)

z2 <- rnorm(10)

frz <- data.frame(z1,z2)

predict(lm1, frz) # gives error: Object "x1" not found

predict(pca1, frz) # gives no error, indicating column names ignored

z3 <- rnorm(10)

fr3z <- data.frame(frz,z3)

predict(pca1,fr3z) # gives error due to unexpected number of columns

loadings(pca1) # shows linear combos of variables corresponding to PCs
