[R] a basic question about standardization?

From: Michael <comtech.usa_at_gmail.com>
Date: Wed 15 Feb 2006 - 20:54:01 EST


Hi all,

I have a question about standardization.

Suppose I have training data consisting of a matrix X of size N x p, where N is the number of samples and p is the number of variables in the data set. Y is a response vector of size N x 1, each element corresponding to one row of X.

I standardize X with X1=scale(X, TRUE, TRUE) and center Y with Y1=scale(Y, TRUE, FALSE). Fitting a regression on X1 and Y1 gives me a coefficient vector Beta.
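A minimal sketch of this setup, to make the later questions concrete. The simulated data, the sizes, and the use of lm() without an intercept are assumptions for illustration only; the post does not say which regression method produced Beta. The one thing worth noting is that scale() keeps the column means and standard deviations it used as attributes on its result.

```r
# Hypothetical simulated training data (values are made up for illustration)
set.seed(1)
N <- 50; p <- 3
X <- matrix(rnorm(N * p, mean = 5, sd = 2), N, p)
Y <- drop(1 + X %*% c(2, -1, 0.5) + rnorm(N))

X1 <- scale(X, TRUE, TRUE)    # center each column, then divide by its sd
Y1 <- scale(Y, TRUE, FALSE)   # center only, no rescaling

# scale() records what it subtracted and divided by
x_center <- attr(X1, "scaled:center")   # column means of X
x_scale  <- attr(X1, "scaled:scale")    # column standard deviations of X
y_center <- attr(Y1, "scaled:center")   # mean of Y

# Assumption: ordinary least squares with no intercept on the standardized data
Beta <- coef(lm(as.numeric(Y1) ~ X1 - 1))
```

Keeping x_center, x_scale and y_center around is what makes it possible to move between the standardized and the original domain later.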

I am wondering how I should use this Beta on the test data.

Is this Beta the same as the one we would obtain if we did not standardize the training data?

For the test data, I guess I should not standardize; otherwise, how can my predictions match the non-standardized test Y vector?

If I standardize the test X matrix and Y vector, then I lose the physical meaning of the prediction error. It will be distorted, since at the end of the day I want to see the prediction accuracy in the original non-standardized domain.

If I don't standardize the test X matrix and Y vector, I am not sure whether the Beta obtained through standardization is usable here.

I am also confused by the fact that there is an extra intercept, that is, one additional coefficient called Beta0.

How does that intercept interact with standardizing or not standardizing the data?
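For what it is worth, the algebra can be checked numerically. The sketch below assumes ordinary least squares via lm() and simulated data, neither of which comes from the post, and the names beta_orig, pred_a and pred_b are hypothetical. Under that assumption, writing the fitted model as Y - mean(Y) = sum_j Beta_j * (X_j - mean(X_j)) / sd(X_j) and expanding gives slopes Beta_j / sd(X_j) on the original scale and an intercept Beta0 = mean(Y) - sum_j mean(X_j) * Beta_j / sd(X_j).

```r
# Hypothetical simulated training and test data (values are made up)
set.seed(1)
N <- 50; p <- 3
X <- matrix(rnorm(N * p, mean = 5, sd = 2), N, p)
Y <- drop(1 + X %*% c(2, -1, 0.5) + rnorm(N))
Xtest <- matrix(rnorm(10 * p, mean = 5, sd = 2), 10, p)

# Standardize the training data as in the post
X1 <- scale(X, TRUE, TRUE)
Y1 <- scale(Y, TRUE, FALSE)
x_center <- attr(X1, "scaled:center")
x_scale  <- attr(X1, "scaled:scale")
y_center <- attr(Y1, "scaled:center")

# Assumption: least squares without intercept on the standardized data
Beta <- unname(coef(lm(as.numeric(Y1) ~ X1 - 1)))

# Map the coefficients back to the original units
beta_orig <- Beta / x_scale
Beta0 <- y_center - sum(x_center * beta_orig)

# Option A: transform the test X with the TRAINING means and sds,
# predict in the standardized domain, then add the training mean of Y back
pred_a <- drop(scale(Xtest, center = x_center, scale = x_scale) %*% Beta) + y_center

# Option B: leave the test X untouched and use the back-transformed coefficients
pred_b <- drop(Xtest %*% beta_orig) + Beta0

# Both options give the same predictions, already in the original units of Y
print(all.equal(pred_a, pred_b))

# For least squares, the back-transformed coefficients also match a fit on raw data
raw_fit <- unname(coef(lm(Y ~ X)))
print(all.equal(raw_fit, c(Beta0, beta_orig)))
```

In both options the test Y vector is left untouched, so the prediction error stays in the original domain. The agreement with the raw-data fit holds for plain least squares; for penalized methods such as ridge or lasso, the standardized and unstandardized fits generally differ, although the same back-transformation still applies to the standardized Beta.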

Thanks a lot!




Received on Wed Feb 15 20:59:16 2006
