From: Liaw, Andy <andy_liaw_at_merck.com>

Date: Thu 24 Mar 2005 - 13:34:09 GMT

[Re-directing to R-devel, as I think this needs changes to the code.]

Can I suggest a modification to stats:predict.princomp so that it will check for column (variable) names?

In src/library/stats/R/princomp-add.R, insert the following after line 4:

    if (!is.null(cn <- names(object$center))) newdata <- newdata[, cn]

Now Dana's example looks like:

> predict(pca1, frz)
Error in "[.data.frame"(newdata, , names(object$center)) :
        undefined columns selected
> names(frz) <- c("x2", "x1")
> predict(pca1, frz)
        Comp.1      Comp.2
1  -3.29329963 -1.24675774
2   0.15760569  0.09364550
3   1.90206906  0.06292855
4  -0.92968723  0.64356801
5  -1.15298669  0.25451588
6   0.48466884 -0.87611668
7   0.98602646 -0.52156549
8  -1.53126034 -0.96259529
9  -0.79112984 -1.50831648
10  0.02997392 -0.18888807
> names(frz) <- c("x1", "x2")
> predict(pca1, frz)
        Comp.1      Comp.2
1   2.49603051 -2.42516162
2  -0.15633499  0.15754735
3  -1.77400454  0.81118427
4   1.05941012  0.23869214
5   1.11286213 -0.20669206
6  -0.83645436 -0.60720531
7  -1.15932677 -0.08488413
8   0.98526969 -1.47482877
9   0.09070675 -1.68781215
10 -0.14930067 -0.15239717
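
For anyone who wants to try the name check without patching R itself, here is a minimal sketch of the same idea as a standalone function. The name predict_by_name is hypothetical and not part of R; the function assumes the fitted object carries variable names in object$center (as the error message above suggests) and that scores are obtained by centring, scaling, and projecting onto the loadings. The explicit check for missing columns is an addition for a clearer error message, not part of the suggested one-line change.

```r
# Hypothetical helper (not part of R): score new rows on a fitted princomp
# object, selecting columns by name in the spirit of the one-line check.
predict_by_name <- function(object, newdata) {
  cn <- names(object$center)            # variable names stored at fit time
  if (!is.null(cn)) {
    missing_cols <- setdiff(cn, colnames(newdata))
    if (length(missing_cols) > 0)
      stop("newdata lacks column(s): ", paste(missing_cols, collapse = ", "))
    newdata <- newdata[, cn, drop = FALSE]  # reorder columns, drop extras
  }
  # Centre, scale, then project onto the loadings
  scale(as.matrix(newdata), center = object$center, scale = object$scale) %*%
    object$loadings
}

set.seed(1)
frx  <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
pca1 <- princomp(frx)

# Same data with the columns swapped: scores should match the fitted ones
swapped <- frx[, c("x2", "x1")]
print(all.equal(unname(predict_by_name(pca1, swapped)), unname(pca1$scores)))
```

Because columns are picked by name, a data frame with unrelated names (such as z1 and z2) stops with an error instead of silently being scored in positional order.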

Best,

Andy

> From: Dana Honeycutt
>
> I am working with data sets in which the number and order of columns
> may vary, but each column is uniquely identified by its name. E.g.,
> one data set might have columns
>     MW logP Num_Rings Num_H_Donors
> while another has columns
>     Num_Rings Num_Atoms Num_H_Donors logP MW
>
> I would like to be able to perform a principal component analysis (PCA)
> on one data set and save the PCA object to a file. In a later R session,
> I would like to load the object and then apply the loadings to a new
> data set in order to compute the principal component (PC) values for
> each row of new data.
>
> I am trying to use the princomp method in R to do this. (I started
> with prcomp, but found that there is no predict method for objects
> created by prcomp.) The problem is that when using predict on a
> princomp object, R ignores the names of columns and simply assumes
> that the column order is the same as in the original data frame used
> to do the PCA. (This contrasts, for example, with the behavior of a
> model produced by lm, which is aware of column names in a data frame.)
>
> What I think I need to do is this:
>
> 1. After reloading the princomp object, extract the names and order
>    of columns that it expects. (If you look at the loadings for the
>    object, you can see that this info is there, but I would like to
>    get at it directly somehow.)
>
> 2. Reorder the columns in the new data set to correspond to this
>    expected order, and remove any extra columns.
>
> 3. Use the predict method to predict the PC values for the new data set.
>
> Is this the best approach to achieve what I am attempting?
>
> If so, can anyone tell me how to accomplish steps 1 and 2 above?
>
> Thanks,
> Dana Honeycutt
>
> P.S. Here's a script that demonstrates the problem:
>
> x1 <- rnorm(10)
> x2 <- rnorm(10)
> y <- rnorm(10)
>
> frx <- data.frame(x1,x2)
> frxy <- data.frame(x1,x2,y)
>
> lm1 <- lm(y~x1+x2,frxy)
> pca1 <- princomp(frx)
>
> rm(x1,x2,y,frx,frxy)
>
> z1 <- rnorm(10)
> z2 <- rnorm(10)
> frz <- data.frame(z1,z2)
>
> predict(lm1, frz) # gives error: Object "x1" not found
> predict(pca1, frz) # gives no error, indicating column names ignored
>
> z3 <- rnorm(10)
> fr3z <- data.frame(frz,z3)
> predict(pca1,fr3z) # gives error due to unexpected number of columns
>
> loadings(pca1) # shows linear combos of variables corresponding to PCs
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
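
Steps 1 to 3 in the quoted message can also be done at user level, without any change to R. The following is a rough sketch under stated assumptions: the descriptor names are taken from the quoted message, the data are simulated purely for illustration, the file path is a temporary placeholder, and names(pca$center) is assumed to hold the variable names the fitted object expects.

```r
set.seed(42)
n <- 20
# Simulated training data (illustrative values only)
train <- data.frame(MW = rnorm(n, 300, 50), logP = rnorm(n, 2, 1),
                    Num_Rings = rpois(n, 3), Num_H_Donors = rpois(n, 2))
pca <- princomp(train)
train_scores <- pca$scores            # kept only for the comparison below

pca_file <- tempfile(fileext = ".RData")  # placeholder path
save(pca, file = pca_file)
rm(pca)

# ... in a later session ...
load(pca_file)                        # restores the object named 'pca'

# New data: same rows, different column order, plus one extra column
new_data <- data.frame(train[, rev(names(train))],
                       Num_Atoms = rpois(n, 25))

# Step 1: names and order of the columns the object expects
expected <- names(pca$center)

# Step 2: fail early if something is missing, then reorder and drop extras
stopifnot(all(expected %in% names(new_data)))
aligned <- new_data[, expected, drop = FALSE]

# Step 3: compute the PC values for each row of the new data
pcs <- predict(pca, aligned)
print(all.equal(unname(pcs), unname(train_scores)))
```

Since the columns are aligned by name before predict is called, this does not depend on whether predict itself checks names.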

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Fri Mar 25 00:38:51 2005

This archive was generated by hypermail 2.1.8 : Mon 20 Feb 2006 - 03:21:02 GMT