Re: [R] PCA problem in R

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue 16 Aug 2005 - 03:42:32 EST

On Mon, 15 Aug 2005, Dennis Shea wrote:

> [SNIP]
>>
>>>> On Sat, 13 Aug 2005, Alan Zhao wrote:
>>>>
>>>>> When I have more variables than units, say a 195*10896 matrix with
>>>>> 10896 variables and 195 samples, prcomp gives only 195 principal
>>>>> components. I checked the help, but it does not explain why
>>>>> this happens.
>
> [SNIP]
>
>> Sincerely,
>> Zheng Zhao
>> Aug-14-2005
>> ______________________________________________
>
> Just yesterday I subscribed to r-help because I am planning
> on learning the basics of R ... today. :-)
> Thus, I am not sure about the history of this question.

> The above situation, more variables than samples,
> is commonly encountered in climate studies.
> Consider annual mean temperatures for 195 years
> on a coarse 72 [lat] x 144 [lon] grid [72*144=10368
> spatial variables].

Which are variables and which are samples here? In standard statistical parlance you have 195 variables at 10368 samples. In some fields there are the concepts of R-mode and Q-mode PCA, and you seem to be in Q-mode, which is why you have a transpose.
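To make the convention concrete: prcomp() treats rows as units and columns as variables. The following is a minimal sketch on simulated data (the 5 x 20 matrix X is made up for illustration and is far smaller than the matrices discussed in this thread). It shows how the shape of the result changes when the matrix is transposed.

```r
# Simulated data: 5 units (rows), 20 variables (columns). Purely illustrative.
set.seed(42)
X <- matrix(rnorm(5 * 20), nrow = 5, ncol = 20)

pc  <- prcomp(X)      # rows as units, columns as variables
pct <- prcomp(t(X))   # roles swapped: 20 units, 5 variables

length(pc$sdev)       # number of components returned: min(5, 20)
dim(pc$rotation)      # loadings: one row per variable, one column per component
dim(pct$rotation)     # after transposing: 5 variables, 5 components

# Centering the columns removes one dimension, so with 5 units the last
# standard deviation is zero up to rounding error.
pc$sdev[5]
```

With more variables than units, the number of components is limited by the number of rows, and after centering at most nrow(X) - 1 of them carry any variance.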

> Let S be the number of grid points and T be the number
> of years. I think there is a theorem (?Eckart-Young?)
> which states that the maximum number of unique eigenvalues
> is min(S,T). In your case 195 eigenvalues is correct.

Eigenvalues of what? Eckart-Young is about the SVD, see e.g.

http://voteview.com/ideal_point_Eckart_Young_Theorem.htm

as Googling easily shows. (It is used to prove some of the approximation properties of PCA, e.g. in

http://www.stats.ox.ac.uk/~ripley/MultAnal_MT2004/PCA.pdf)
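As a small numerical illustration of what Eckart-Young says about the SVD: truncating the SVD after k terms gives a best rank-k approximation in the Frobenius norm, and the error equals the root sum of squares of the discarded singular values. This is a sketch on a made-up 6 x 4 matrix, with k = 2 chosen arbitrarily.

```r
# Made-up matrix, for illustration only.
set.seed(1)
A <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)
s <- svd(A)

length(s$d)           # number of singular values: min(nrow(A), ncol(A))

# Rank-k truncation of the SVD.
k <- 2
A_k <- s$u[, 1:k] %*% diag(s$d[1:k], k, k) %*% t(s$v[, 1:k])

# Frobenius norm of the approximation error ...
err <- sqrt(sum((A - A_k)^2))
# ... and the value predicted from the discarded singular values.
bound <- sqrt(sum(s$d[(k + 1):length(s$d)]^2))

all.equal(err, bound)
```

Note that this is a statement about singular values of a matrix, not about eigenvalues of a covariance matrix.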

> I speculate that the underlying function transposes the
> input data matrix and computes the TxT [rather than SxS]
> covariance matrix and solves for the eigenvalues/vectors.
> It then uses a linear transformation to get the results
> for the original input data matrix.
>
> Computationally, the above is much faster and uses less memory.

You speculate incorrectly, even in your Q-mode view of the world. The real point is that it solves a different problem, which is what my answer to the original post was about.
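For what it is worth, ?prcomp documents that the calculation is done by a singular value decomposition of the centered (and possibly scaled) data matrix, not by an eigendecomposition of a covariance matrix. The sketch below checks this on simulated data (X is made up for illustration). It also shows one sense in which the transposed analysis is a different problem: centering acts on the columns of whichever matrix is supplied, so the two analyses do not share singular values. This is an illustration only, not a restatement of the earlier reply, which is not reproduced here.

```r
# Simulated data: 5 units, 20 variables. Purely illustrative.
set.seed(2)
n <- 5
p <- 20
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# prcomp's standard deviations versus a direct SVD of the column-centered data.
pc <- prcomp(X)
Xc <- scale(X, center = TRUE, scale = FALSE)
sd_svd <- svd(Xc)$d / sqrt(n - 1)
same_as_svd <- isTRUE(all.equal(pc$sdev, sd_svd))

# PCA of the transpose centers the other margin, so even after undoing the
# different divisors the leading singular values need not agree.
pc_t <- prcomp(t(X))
d_x  <- pc$sdev[1:(n - 1)] * sqrt(n - 1)
d_tx <- pc_t$sdev[1:(n - 1)] * sqrt(p - 1)
transpose_differs <- !isTRUE(all.equal(d_x, d_tx))

same_as_svd
transpose_differs
```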

> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

It really would be a good idea to do the homework it suggests.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Aug 16 03:48:35 2005
