Re: [R] Scaling in predict.prcomp

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Sun, 20 Apr 2008 16:54:28 +0100 (BST)

On Sun, 20 Apr 2008, Gad Abraham wrote:

> Hi,
>
> Say x.train is a matrix of covariates that I want to do PCA on, so I can
> do regression on its principal components, and x.test is a test set of
> the same covariates on which I want to evaluate the regression fit. I
> would like the covariates to be centred and scaled:
>
> p <- prcomp(x.train, center=TRUE, scale=TRUE)
> x.train.pc <- predict(p)
>
> Now I want to get the PCs from the test set.

The way to do that is to call prcomp() on the test set.

If you want to project new data onto the PCs of the training set (as a set of axes in the data space), you just use predict(p, newdata=).
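
As a minimal sketch of that projection (the simulated matrices, their sizes and the seed below are assumptions made up for illustration, not data from the original question; prcomp()'s formal argument is spelled `scale.`):

```r
# Simulated stand-ins for the poster's x.train and x.test (hypothetical data)
set.seed(1)
x.train <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
x.test  <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)

# Fit the PCA on the training set only, centred and scaled
p <- prcomp(x.train, center = TRUE, scale. = TRUE)

# Scores of the training set itself
x.train.pc <- predict(p)

# Project the test set onto the training set's principal axes
x.test.pc <- predict(p, newdata = x.test)

# One row of scores per test observation, one column per component
dim(x.test.pc)
```

The fitted object p is the only thing passed along with the new data; no centring or scaling options are supplied at prediction time.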

> Should I use the same center and scale vectors from the training set:
>
> x.test.pc <- predict(p, newdata=x.test, center=p$center, scale=p$scale)
>
> or use the test set's own centers and scales:
>
> x.test.pc <- predict(p, newdata=x.test, center=TRUE, scale=TRUE)

I see no evidence that those additional arguments are used.

predict.prcomp uses the origin of the training set's PCs, since it is that coordinate system which you are projecting onto.
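
One way to check this, sketched on simulated data (the matrices and the name `manual` are assumptions for illustration): centre and scale the test set with the training set's stored p$center and p$scale, rotate by p$rotation, and compare with what predict() returns. The suppressWarnings() call is there because some R versions may warn about unused arguments.

```r
# Simulated stand-ins for x.train and x.test (hypothetical data)
set.seed(1)
x.train <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
x.test  <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)

p <- prcomp(x.train, center = TRUE, scale. = TRUE)

# Projection done by hand with the training set's centre and scale
manual <- scale(x.test, center = p$center, scale = p$scale) %*% p$rotation

# Projection via predict(); should agree with the manual version
auto <- predict(p, newdata = x.test)
all.equal(auto, manual, check.attributes = FALSE)

# Passing extra center/scale arguments should not change the result
extra <- suppressWarnings(
  predict(p, newdata = x.test, center = TRUE, scale = TRUE)
)
all.equal(auto, extra, check.attributes = FALSE)
```

If both comparisons report equality, the test set's own means and standard deviations play no part in the projection.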

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sun 20 Apr 2008 - 16:05:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 21 Apr 2008 - 02:30:30 GMT.
