[Rd] RE: [R] Mapping actual to expected columns for princomp object

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Thu 24 Mar 2005 - 13:34:09 GMT


[Re-directing to R-devel, as I think this needs changes to the code.]

Can I suggest a modification to stats:::predict.princomp so that it checks the column (variable) names of newdata?

In src/library/stats/R/princomp-add.R, insert the following after line 4:

    if (!is.null(cn <- names(object$center))) newdata <- newdata[, cn]

Now Dana's example looks like:

> predict(pca1, frz)
Error in "[.data.frame"(newdata, , names(object$center)) :
        undefined columns selected
> names(frz) <- c("x2", "x1")
> predict(pca1, frz)
        Comp.1      Comp.2
1  -3.29329963 -1.24675774
2   0.15760569  0.09364550
3   1.90206906  0.06292855
4  -0.92968723  0.64356801
5  -1.15298669  0.25451588
6   0.48466884 -0.87611668
7   0.98602646 -0.52156549
8  -1.53126034 -0.96259529
9  -0.79112984 -1.50831648
10  0.02997392 -0.18888807
> names(frz) <- c("x1", "x2")
> predict(pca1, frz)
        Comp.1      Comp.2
1   2.49603051 -2.42516162
2  -0.15633499  0.15754735
3  -1.77400454  0.81118427
4   1.05941012  0.23869214
5   1.11286213 -0.20669206
6  -0.83645436 -0.60720531
7  -1.15932677 -0.08488413
8   0.98526969 -1.47482877
9   0.09070675 -1.68781215
10 -0.14930067 -0.15239717
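
To try the proposed check without rebuilding R, the patched method can be written as a stand-alone function. This is a sketch only: predict_princomp_named() is a hypothetical name, and it assumes predict.princomp scores new data as scale(newdata, center, scale) %*% loadings. Unlike the one-line patch above, it adds drop = FALSE so that a fit on a single variable still yields a matrix rather than a vector.

```r
# Hypothetical stand-alone version of predict.princomp with the proposed
# name check. Assumes scores = centred/scaled newdata %*% loadings.
predict_princomp_named <- function(object, newdata) {
  # Select (and reorder) the columns the fit expects, by name;
  # drop = FALSE keeps a one-column selection two-dimensional.
  if (!is.null(cn <- names(object$center)))
    newdata <- newdata[, cn, drop = FALSE]
  # Centre and scale with the training values, then rotate.
  scale(newdata, object$center, object$scale) %*% object$loadings
}

set.seed(1)
frx <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
pca1 <- princomp(frx)

# Same data with the columns reversed and an extra column added.
shuffled <- data.frame(x2 = frx$x2, junk = rnorm(10), x1 = frx$x1)
isTRUE(all.equal(predict_princomp_named(pca1, shuffled), predict(pca1, frx),
                 check.attributes = FALSE))
```

Data lacking a fitted variable fails in "[.data.frame" with "undefined columns selected", as in the transcript, instead of being projected silently.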

Best,
Andy

> From: Dana Honeycutt
>
> I am working with data sets in which the number and order of columns
> may vary, but each column is uniquely identified by its name. E.g.,
> one data set might have columns
> MW logP Num_Rings Num_H_Donors
> while another has columns
> Num_Rings Num_Atoms Num_H_Donors logP MW
>
> I would like to be able to perform a principal component analysis (PCA)
> on one data set and save the PCA object to a file. In a later R session,
> I would like to load the object and then apply the loadings to a new
> data set in order to compute the principal component (PC) values for
> each row of new data.
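
A minimal sketch of the save-and-reload part in current R, assuming base R's saveRDS()/readRDS() (save()/load() would also do); the data frame train, its columns and the file name are illustrative only. The fitted object carries its variable names in names(pca$center), so they survive the round trip:

```r
# Session 1: fit and save. 'train' and the file name are illustrative.
set.seed(2)
train <- data.frame(MW = rnorm(20, 300, 50), logP = rnorm(20, 2))
pca <- princomp(train)
path <- file.path(tempdir(), "pca.rds")
saveRDS(pca, path)

# Session 2 (later): reload; the fitted variable names come back too.
pca_reloaded <- readRDS(path)
names(pca_reloaded$center)
scores <- predict(pca_reloaded, train)
```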
>
> I am trying to use the princomp method in R to do this. (I started
> with prcomp, but found that there is no predict method for objects
> created by prcomp.) The problem is that when using predict on a
> princomp object, R ignores the names of columns and simply assumes
> that the column order is the same as in the original data frame used
> to do the PCA. (This contrasts, for example, with the behavior of a
> model produced by lm, which is aware of column names in a data frame.)
>
> What I think I need to do is this:
>
> 1. After reloading the princomp object, extract the names and order
> of columns that it expects. (If you look at the loadings for the
> object, you can see that this info is there, but I would like to
> get at it directly somehow.)
>
> 2. Reorder the columns in the new data set to correspond to this
> expected order, and remove any extra columns.
>
> 3. Use the predict method to predict the PC values for the
> new data set.
>
> Is this the best approach to achieve what I am attempting?
>
> If so, can anyone tell me how to accomplish steps 1 and 2 above?
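
Steps 1 and 2 can be done in user code without changing R. A minimal sketch, where align_columns() is a hypothetical helper (not part of R) and the column names are borrowed from the example above: step 1 reads the expected names from names(fit$center) (rownames(fit$loadings) holds the same names), step 2 reorders with drop = FALSE and drops extras, and a missing variable gives an explicit error rather than a silently wrong projection. It assumes newdata is a data frame.

```r
# align_columns() is a hypothetical helper, not part of R or stats.
# Assumes newdata is a data frame (names() returns NULL for a matrix).
align_columns <- function(fit, newdata) {
  # Step 1: the variables the fit expects, in fitted order.
  # rownames(fit$loadings) holds the same names.
  expected <- names(fit$center)
  absent <- setdiff(expected, names(newdata))
  if (length(absent) > 0)
    stop("newdata lacks column(s): ", paste(absent, collapse = ", "))
  # Step 2: reorder to match and drop any extra columns.
  newdata[, expected, drop = FALSE]
}

set.seed(3)
fr_old <- data.frame(MW = rnorm(15, 300, 50), logP = rnorm(15, 2),
                     Num_Rings = rpois(15, 3), Num_H_Donors = rpois(15, 2))
pca <- princomp(fr_old)

# Same rows, columns in another order plus an unused column.
fr_new <- data.frame(Num_Rings = fr_old$Num_Rings,
                     Num_Atoms = rpois(15, 25),
                     Num_H_Donors = fr_old$Num_H_Donors,
                     logP = fr_old$logP, MW = fr_old$MW)

# Step 3: predict on the aligned data.
scores <- predict(pca, align_columns(pca, fr_new))
```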
>
> Thanks,
> Dana Honeycutt
>
> P.S. Here's a script that demonstrates the problem:
>
> x1 <- rnorm(10)
> x2 <- rnorm(10)
> y <- rnorm(10)
>
> frx <- data.frame(x1,x2)
> frxy <- data.frame(x1,x2,y)
>
> lm1 <- lm(y~x1+x2,frxy)
> pca1 <- princomp(frx)
>
> rm(x1,x2,y,frx,frxy)
>
> z1 <- rnorm(10)
> z2 <- rnorm(10)
> frz <- data.frame(z1,z2)
>
> predict(lm1, frz) # gives error: Object "x1" not found
> predict(pca1, frz) # gives no error, indicating column names ignored
>
> z3 <- rnorm(10)
> fr3z <- data.frame(frz,z3)
> predict(pca1,fr3z) # gives error due to unexpected number of columns
>
> loadings(pca1) # shows linear combos of variables corresponding to PCs
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>
>



R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri Mar 25 00:38:51 2005
