Re: [R] Scaling in predict.prcomp

From: Gad Abraham <gabraham_at_csse.unimelb.edu.au>
Date: Mon, 21 Apr 2008 10:27:32 +1000

Prof Brian Ripley wrote:
> On Sun, 20 Apr 2008, Gad Abraham wrote:
>

>> Hi,
>>
>> Say x.train is a matrix of covariates that I want to do PCA on, so I can
>> do regression on its principal components, and x.test is a test set of
>> the same covariates on which I want to evaluate the regression fit. I
>> would like the covariates to be centred and scaled:
>>
>> p <- prcomp(x.train, center=TRUE, scale=TRUE)
>> x.train.pc <- predict(p)
>>
>> Now I want to get the PCs from the test set.

>
> The way to do that is to call prcomp() on the test set.
>
> If you want to project new data onto the PCs of the training set (as a
> set of axes in the data space), you just use predict(p, newdata=).
>
>> Should I use the same center and scale vectors from the training set:
>>
>> x.test.pc <- predict(p, newdata=x.test, center=p$center, scale=p$scale)
>>
>> or use the test set's own centers and scales:
>>
>> x.test.pc <- predict(p, newdata=x.test, center=TRUE, scale=TRUE)

>
> I see no evidence that those additional arguments are used.
>
> predict.prcomp uses the origin of the training set's PCs, since it is
> that coordinate system which you are projecting onto.
>

I should have looked more carefully. I now see that the code for predict.prcomp does centre and scale the test data using the training data's vectors:

getAnywhere(predict.prcomp)
...
scale(newdata, object$center, object$scale) %*% object$rotation
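For anyone who wants to check this on their own data, here is a minimal sketch. It assumes the built-in USArrests data set and an arbitrary 40/10 split into training and test rows; neither comes from my original problem, and the object names pc.predict, pc.manual and pc.own are only illustrative.

```r
# Assumption: USArrests stands in for the covariates; the split is arbitrary.
x.train <- as.matrix(USArrests[1:40, ])
x.test  <- as.matrix(USArrests[41:50, ])

# PCA on the training set, centred and scaled.
p <- prcomp(x.train, center = TRUE, scale. = TRUE)

# Projection of the test set as done by predict().
pc.predict <- predict(p, newdata = x.test)

# The same projection written out by hand, using the training centre and scale.
pc.manual <- scale(x.test, center = p$center, scale = p$scale) %*% p$rotation

# The two agree up to numerical tolerance.
stopifnot(isTRUE(all.equal(pc.predict, pc.manual, check.attributes = FALSE)))

# Centring and scaling with the test set's own statistics gives other scores.
pc.own <- scale(x.test, center = TRUE, scale = TRUE) %*% p$rotation
stopifnot(!isTRUE(all.equal(pc.predict, pc.own, check.attributes = FALSE)))
```

So no extra center or scale arguments are needed in the call to predict(); the values stored in p$center and p$scale are applied to the new data.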

Thanks,
Gad

-- 
Gad Abraham
Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham_at_csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham

Received on Mon 21 Apr 2008 - 00:31:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 21 Apr 2008 - 01:30:31 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive