From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Tue 16 Aug 2005 - 03:42:32 EST

On Mon, 15 Aug 2005, Dennis Shea wrote:

> The above situation, more variables than samples,
> is commonly encountered in climate studies.
> Consider annual mean temperatures for 195 years
> on a coarse 72 [lat] x 144 [lon] grid [72*144=10368
> spatial variables].

Which are variables and which are samples here? In standard statistical parlance you have 195 variables at 10368 samples. In some fields there are the concepts of R-mode and Q-mode PCA, and you seem to be in Q-mode, which is why you have a transpose.
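
To make the two readings concrete, here is a minimal NumPy sketch. The sizes (6 rows, 4 columns) and the random data are made up for illustration and have nothing to do with the climate data above. It treats the same matrix first with columns as variables, then transposed, and counts the non-zero eigenvalues of each covariance matrix. After centring, a data matrix with n rows of observations has rank at most n - 1, so the count is min(n - 1, p) for generic data.

```python
import numpy as np

# Hypothetical toy data: 6 rows and 4 columns (sizes chosen only for illustration).
rng = np.random.default_rng(0)
n_rows, n_cols = 6, 4
X = rng.normal(size=(n_rows, n_cols))

# Reading 1: rows are samples, columns are variables -> 4 x 4 covariance.
cov_cols_as_vars = np.cov(X, rowvar=False)

# Reading 2: the transpose, so the original rows become the variables -> 6 x 6 covariance.
cov_rows_as_vars = np.cov(X.T, rowvar=False)


def count_nonzero_eigenvalues(cov, rel_tol=1e-10):
    """Count eigenvalues that are non-zero relative to the largest one."""
    vals = np.linalg.eigvalsh(cov)  # symmetric matrix, so eigvalsh applies
    return int(np.sum(vals > rel_tol * vals.max()))


n_eig_cols = count_nonzero_eigenvalues(cov_cols_as_vars)  # min(6 - 1, 4)
n_eig_rows = count_nonzero_eigenvalues(cov_rows_as_vars)  # min(4 - 1, 6)

print(cov_cols_as_vars.shape, n_eig_cols)
print(cov_rows_as_vars.shape, n_eig_rows)
```

The two covariance matrices differ in size, and because each reading centres the data along a different dimension, they are not simply two views of one decomposition.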

> Let S be the number of grid points and T be the number
> of years. I think there is a theorem (?Eckart-Young?)
> which states that the maximum number of unique eigenvalues
> is min(S,T). In your case 195 eigenvalues is correct.

Eigenvalues of what? Eckart-Young is about the SVD, see e.g.

http://voteview.com/ideal_point_Eckart_Young_Theorem.htm

as Googling easily shows. (It is used to prove some of the approximation properties of PCA, e.g. in

http://www.stats.ox.ac.uk/~ripley/MultAnal_MT2004/PCA.pdf)
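
For readers who want to see what the theorem says numerically, the following is a small sketch in NumPy, using an arbitrary random 8 x 5 matrix and k = 2 (both are assumptions for illustration only). It checks that truncating the SVD to k terms gives a rank-k matrix whose Frobenius-norm error equals the root sum of squares of the discarded singular values, and that an arbitrary rank-k competitor does no better.

```python
import numpy as np

# Arbitrary test matrix; the size and seed are illustrative assumptions.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 5))

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Rank-k truncation: keep only the k largest singular triplets.
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Eckart-Young (Frobenius norm): the error of the truncation equals
# the root sum of squares of the discarded singular values.
err_truncated = np.linalg.norm(A - A_k, "fro")
err_predicted = np.sqrt(np.sum(s[k:] ** 2))

# Any other rank-k matrix, here a random one, cannot have a smaller error.
B = rng.normal(size=(8, k)) @ rng.normal(size=(k, 5))
err_competitor = np.linalg.norm(A - B, "fro")

print(err_truncated, err_predicted, err_competitor)
```

Note that nothing here refers to eigenvalues of a covariance matrix: the statement concerns low-rank approximation of the matrix itself.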

> I speculate that the underlying function transposes the
> input data matrix and computes the TxT [rather than SxS]
> covariance matrix and solves for the eigenvalues/vectors.
> It then uses a linear transformation to get the results
> for the original input data matrix.
> Computationally, the above is much faster and uses less memory.

You speculate incorrectly, even in your Q-mode view of the world. The real point is that it solves a different problem, which is what my answer to the original post was about.

It really would be a good idea to do the homework it suggests.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

