Re: [R] Principal component analysis PCA

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Thu, 14 Feb 2008 07:42:14 -0800 (PST)

On Wed, 13 Feb 2008, Wang, Zhaoming (NIH/NCI) [C] wrote:

>
> Try EIGENSTRAT http://www.nature.com/ng/journal/v38/n8/abs/ng1847.html

The same approach as EIGENSTRAT is pretty straightforward in R.

You need to create the covariance matrix of people (rather than of SNPs) for the 0/1/2 genotype at each SNP and take the principal components of that matrix.
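As a rough illustration of that step, here is a minimal sketch on simulated data. The genotype matrix, its dimensions, and the variable names are made up for the example. It also assumes each SNP is centred across individuals before the people-by-people matrix is formed; EIGENSTRAT additionally rescales each SNP, which is left out here.

```r
# Simulated 0/1/2 genotypes: rows are individuals, columns are SNPs.
# (Sizes are illustrative only; the real data would be 115 x 300,000.)
set.seed(1)
n <- 20
p <- 500
geno <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)

# Centre each SNP (column) across individuals (an assumption of this sketch).
x <- scale(geno, center = TRUE, scale = FALSE)

# n x n matrix of covariances between individuals, averaged over SNPs.
cov_people <- tcrossprod(x) / p

# Principal components are the leading eigenvectors of that matrix.
e <- eigen(cov_people, symmetric = TRUE)
pcs <- e$vectors[, 1:2]
```

Only the n x n matrix is decomposed, so the cost of the eigen step depends on the number of individuals rather than the number of SNPs.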

In this case the number of individuals is small enough that you should be able to create the covariance matrix directly by matrix operations. In larger data sets where the entire data matrix doesn't fit in memory, you need some sort of double loop.
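For the larger case, one possible approach (a sketch, not necessarily the loop meant above) is to read the SNPs in blocks and accumulate each block's contribution to the people-by-people matrix. The temporary file, the chunk size, and the one-SNP-per-line layout are assumptions made so the example is self-contained.

```r
# Simulated data written to a temporary file: one SNP per line,
# one tab-separated column per individual (a hypothetical layout).
set.seed(1)
n <- 20
p <- 500
chunk <- 100
geno <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)
f <- tempfile(fileext = ".txt")
write.table(t(geno), f, sep = "\t", row.names = FALSE, col.names = FALSE)

acc <- matrix(0, n, n)            # running n x n sum of cross-products
con <- file(f, open = "r")
for (start in seq(1, p, by = chunk)) {
  k <- min(chunk, p - start + 1)  # number of SNPs in this block
  # Read the next k lines from the open connection: k SNPs x n individuals.
  block <- matrix(scan(con, what = integer(), nlines = k, quiet = TRUE),
                  nrow = k, byrow = TRUE)
  block <- block - rowMeans(block)  # centre each SNP across individuals
  acc <- acc + crossprod(block)     # add this block's n x n contribution
}
close(con)

cov_chunked <- acc / p
e <- eigen(cov_chunked, symmetric = TRUE)
```

At any time only one k x n block and the n x n accumulator are held in memory.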

         -thomas

> Zhaoming
> -----Original Message-----
> From: SNN [mailto:s.nancy1_at_yahoo.com]
> Sent: Wednesday, February 13, 2008 9:14 PM
> To: r-help_at_r-project.org
> Subject: [R] Principal component analysis PCA
>
>
> Hi,
>
> I am trying to run PCA on a set of data with dimension 115*300,000. The
> columns represent the SNPs and the rows represent the individuals, so this
> is what I did.
>
> #load the data
>
> code<-read.table("code.txt", sep='\t', header=F, nrows=300000)
>
> # do PCA #
>
> pr<-prcomp(code, retx=T, center=T)
>
> I am getting the following error message
>
> "Error: cannot allocate vector of size 275.6 Mb"
>
> I tried to increase the memory size :
>
> "memory.size(4000)"
>
> but it did not work. Is there a solution for this, or is there other
> software that can handle large data sets?
>
> Thanks
>
>
> --
> View this message in context:
> http://www.nabble.com/Principal-component-analysis-PCA-tp15472509p154725
> 09.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley_at_u.washington.edu	University of Washington, Seattle

Received on Thu 14 Feb 2008 - 15:47:32 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 14 Feb 2008 - 21:30:14 GMT.
