Re: [R] Bootstrapped eigenvector

From: Jari Oksanen <jarioksa_at_sun3.oulu.fi>
Date: Tue 01 Feb 2005 - 00:42:56 EST

Jérôme,

On Sat, 2005-01-29 at 14:14 -0500, Jérôme Lemaître wrote:
> Hello alls,
>
> I found in the literature a technique that has been evaluated as one of the
> more robust to assess statistically the significance of the loadings in a
> PCA: bootstrapping the eigenvector (Jackson, Ecology 1993, 74: 2204-2214;
> Peres-Neto and al. 2003. Ecology 84:2347-2363). However, I'm not able to
> transform by myself the following steps into a R program, yet?
>
> Can someone could help me with this?
>
> I thank you very much by advance.
>
> Here are the steps that I need to perform:
> 1) Resample 1000 times with replacement entire raws from the original data
> sets (7 variables, 126 raws)
> 2) Conduct a PCA on each bootstrapped sample
> 3) To prevent axis reflexion and/or axis reordering in the bootstrap, here
> are two more steps for each bootstrapped sample
> 3a) calculate correlation matrix between the PCA scores of the original and
> those of the bootstrapped sample
> 3b) Examine whether the highest absolute correlation is between the
> corresponding axis for the original and bootstrapped samples. When it is not
> the case, reorder the eigenvectors. This means that if the highest
> correlation is between the first original axis and the second bootstrapped
> axis, the loadings for the second bootstrapped axis and use to estimate the
> confidence interval for the original first PC axis.
> 4) Determine the p value for each loading. Obtained as follow: number of

> loadings >=0 for loadings that were positive in the original matrix divided
> by the number of boostrap samples (1000) and/or number of loadings =<0 for
> loadings that were negative in the original matrix divided by the number of
> boostrap samples (1000).
>
The following function seems to run the analysis like Peres-Neto and others defined:

> netoboot

function (x, permutations=1000, ...)
{

   pcnull <- princomp(x, ...)
   res <- pcnull$loadings
   out <- matrix(0, nrow=nrow(res), ncol=ncol(res))    N <- nrow(x)
   for (i in 1:permutations) {

       pc <- princomp(x[sample(N, replace=TRUE), ], ...)
       pred <- predict(pc, newdata = x)
       r <-  cor(pcnull$scores, pred)
       k <- apply(abs(r), 2, which.max)
       reve <- sign(diag(r[k,]))
       sol <- pc$loadings[ ,k]
       sol <- sweep(sol, 2, reve, "*")
       out <- out + ifelse(res > 0, sol <=  0, sol >= 0)
   }
   out/permutations
}

With typical chemical data, you should pass option cor = TRUE to princomp. Another issue is whether you should use this method. Opinions may be divided here, but I'll let that to the proper Statistician to comment on.

Best wishes, Jari Oksanen

-- 
Jari Oksanen <jarioksa@sun3.oulu.fi>

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jan 31 23:54:58 2005

This archive was generated by hypermail 2.1.8 : Tue 01 Feb 2005 - 01:18:49 EST