Re: [R] PCA with not non-negative definite covariance

From: Quin Wills <quin.wills_at_googlemail.com>
Date: Thu 27 Jul 2006 - 19:48:22 EST


Thank you... I will definitely check that out.

Quin

-----Original Message-----
From: Stéphane Dray [mailto:dray@biomserv.univ-lyon1.fr]
Sent: 27 July 2006 09:04 AM
To: Quin Wills
Cc: 'Berton Gunter'; r-help@stat.math.ethz.ch
Subject: Re: [R] PCA with not non-negative definite covariance

As Pierre Bady said, one answer to your question is NIPALS analysis. PCA is usually obtained by diagonalizing a variance-covariance matrix, but it can also be obtained by an iterative procedure that consists of two alternating regressions. NIPALS is an implementation of this iterative procedure and is strictly equivalent to PCA when there are no missing values.
The advantage of NIPALS is that it can be used with missing values. Note, however, that convergence is not always achieved (it depends on the number and distribution of the missing values). You can find a description of the method and the algorithm here:

http://biomserv.univ-lyon1.fr/~dray/articles/SD165.html
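The "two regressions" described above can be sketched in a few lines. This is a minimal Python/NumPy illustration written for this thread. It is not Stéphane Dray's R code, nor necessarily the algorithm exactly as given on the linked page. It assumes that missing values are coded as NaN, that columns are centred on their observed values, and that each regression simply skips the missing cells. The function name `nipals` and its parameters (`max_iter`, `tol`) are hypothetical.

```python
import numpy as np

def nipals(X, n_components=2, max_iter=500, tol=1e-9):
    """Minimal NIPALS PCA sketch. NaN entries are treated as missing and skipped."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    W = observed.astype(float)                 # 1 where observed, 0 where missing
    # Centre each column on its observed values; missing cells are set to 0.
    R = np.where(observed, X - np.nanmean(X, axis=0), 0.0)
    n, p = R.shape
    scores = np.zeros((n, n_components))
    loadings = np.zeros((p, n_components))
    for k in range(n_components):
        # Start from the column with the largest (observed) sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Regression 1: regress each column on the scores t (observed cells only).
            v = (R.T @ t) / (W.T @ t ** 2)
            v /= np.linalg.norm(v)
            # Regression 2: regress each row on the loadings v (observed cells only).
            t_new = (R @ v) / (W @ v ** 2)
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        scores[:, k] = t
        loadings[:, k] = v
        # Deflate: subtract the fitted rank-one term from the observed cells only.
        R = np.where(observed, R - np.outer(t, v), 0.0)
    return scores, loadings

# Usage on hypothetical data with two missing cells.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
X[3, 1] = np.nan
X[10, 2] = np.nan
scores, loadings = nipals(X, n_components=2)
```

With no missing values, the loadings should match the PCA axes from an SVD of the centred data, up to sign. With missing values, convergence is not guaranteed, as noted above. A real analysis should therefore check whether the inner loop actually converged rather than silently accepting the last iterate.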

Sincerely,

Quin Wills wrote:

>My apologies (in response to the last 2 replies). I should write sensibly -
>including subject titles that make grammatical sense.
>
>(1) By "analogous", I mean that using classical MDS with Euclidean distance is
>equivalent to plotting the first "k" principal components.
>(2) Agreed re. distribution assumptions.
>(3) Agreed re. the need to use some kind of imputation for calculating
>distances. I'm thinking of pairwise exclusion for correlation.
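Point (1) can be checked numerically. The sketch below is an illustration in Python/NumPy, not R; `classical_mds` and `pca_scores` are hypothetical helper names. In R itself, `cmdscale()` performs classical MDS. The sketch computes classical (Torgerson) MDS from complete-data Euclidean distances and compares the result with the first k principal-component scores. It assumes there are no missing values. Once distances are computed with pairwise exclusion, the double-centred matrix B need not be positive semi-definite, so the equivalence is no longer guaranteed. The code clips negative eigenvalues to zero, which for complete data only removes rounding noise.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS from a matrix of Euclidean distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]         # indices of the k largest
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def pca_scores(X, k=2):
    """First k principal-component scores via SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))                   # hypothetical complete data
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
Y_mds = classical_mds(D, k=2)
Y_pca = pca_scores(X, k=2)
# Each column agrees up to an arbitrary sign flip.
print(np.allclose(np.abs(Y_mds), np.abs(Y_pca)))   # prints True
```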
>
>As for why I want to do this: it is simply to represent my data graphically.
>
>Quin
>
>
>
>-----Original Message-----
>From: Berton Gunter [mailto:gunter.berton@gene.com]
>Sent: 26 July 2006 05:10 PM
>To: 'Quin Wills'; bady@univ-lyon1.fr
>Cc: r-help@stat.math.ethz.ch
>Subject: RE: [R] PCA with not non-negative definite covariance
>
>Not sure what "completely analogous" means; MDS is nonlinear, PCA is linear.
>
>In any case, the bottom line is that if you have high-dimensional data with
>"many" missing values, you cannot know what the multivariate distribution
>looks like -- and you need a **lot** of data with many variables to usefully
>characterize it anyway. So you must either make some assumptions about what
>the distribution could be (including imputation methodology) or use any of
>the many exploratory techniques available to learn what you can.
>Thermodynamics holds -- you can't get something for nothing (you can't fool
>Mother Nature).
>
>-- Bert Gunter
>Genentech Non-Clinical Statistics
>South San Francisco, CA
>
>"The business of the statistician is to catalyze the scientific learning
>process." - George E. P. Box
>
>
>
>
>
>>-----Original Message-----
>>From: r-help-bounces@stat.math.ethz.ch
>>[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Quin Wills
>>Sent: Wednesday, July 26, 2006 8:44 AM
>>To: bady@univ-lyon1.fr
>>Cc: r-help@stat.math.ethz.ch
>>Subject: Re: [R] PCA with not non-negative definite covariance
>>
>>Thanks.
>>
>>I suppose that another option could be just to use classical
>>multidimensional scaling. As I understand it, this is (if based on a
>>Euclidean measure) completely analogous to PCA, and because it's based
>>explicitly on distances, I could easily exclude the variables with NAs
>>on a pairwise basis when calculating the distances.
>>
>>Quin
>>
>>-----Original Message-----
>>From: bady@univ-lyon1.fr [mailto:bady@univ-lyon1.fr]
>>Sent: 25 July 2006 09:24 AM
>>To: Quin Wills
>>Cc: r-help@stat.math.ethz.ch
>>Subject: Re: [R] PCA with not non-negative definite covariance
>>
>>Hi, hi all,
>>
>>
>>
>>>Am I correct in understanding from the previous discussions on this topic
>>>(a few years back) that, if I have a matrix with missing values, my PCA
>>>options seem dismal if:
>>>(1) I don't want to impute the missing values.
>>>(2) I don't want to completely remove cases with missing values.
>>>(3) I do cov() with use="pairwise.complete.obs", as this produces
>>>negative eigenvalues (which it has in my case!).
>>>
>>(4) Maybe you can use the Non-linear Iterative Partial Least Squares (NIPALS)
>>algorithm (widely used in chemometrics). S. Dray proposes a version of this
>>procedure at http://pbil.univ-lyon1.fr/R/additifs.html.
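Point (3) above is easy to reproduce with a toy example. When each entry of a covariance matrix is estimated from a different subset of rows, the entries need not be mutually consistent, so the matrix can have a negative eigenvalue. The Python/NumPy sketch below mimics the pairwise rule of R's `cov(use = "pairwise.complete.obs")`, as we understand that rule. The helper `pairwise_cov` and the nine-row dataset are hypothetical. The data are constructed so that x1 and x2 move together and x2 and x3 move together, yet x1 and x3 move in opposite directions.

```python
import numpy as np

def pairwise_cov(X):
    """Covariance where entry (i, j) uses only the rows observed in both columns."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    C = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])   # pairwise-complete rows
            xi, xj = X[ok, i], X[ok, j]
            C[i, j] = ((xi - xi.mean()) * (xj - xj.mean())).sum() / (ok.sum() - 1)
    return C

nan = np.nan
# Hypothetical toy data: each pair of columns is observed on a different block of rows.
X = np.array([
    [1, 1, nan], [2, 2, nan], [3, 3, nan],   # x1 and x2 rise together
    [nan, 1, 1], [nan, 2, 2], [nan, 3, 3],   # x2 and x3 rise together
    [1, nan, 3], [2, nan, 2], [3, nan, 1],   # x1 and x3 move in opposite directions
])
C = pairwise_cov(X)
eigenvalues = np.linalg.eigvalsh(C)          # ascending order
print(np.round(eigenvalues, 3))              # approx. [-1.2, 1.8, 1.8]
```

Each pair of columns is well behaved on its own. No complete dataset could produce this matrix, however, because a covariance matrix computed from complete data is always positive semi-definite. That inconsistency is what appears as the negative eigenvalue.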
>>
>>
>>Hope this helps :)
>>
>>
>>Pierre
>>
>>
>>
>>
>>______________________________________________
>>R-help@stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>
>
>

-- 
Stéphane DRAY (dray@biomserv.univ-lyon1.fr )
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57       Fax: 33 4 72 43 13 88
http://biomserv.univ-lyon1.fr/~dray/

Received on Thu Jul 27 19:52:40 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 27 Jul 2006 - 22:16:36 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.