Re: [R] Fastest way to do HWE.exact test on 100K SNP data?

From: <>
Date: Wed 07 Jun 2006 - 04:21:46 EST

Anna Pluzhnikov <> wrote:
> Hi everyone,

> I'm using the function 'HWE.exact' of 'genetics' package to compute
> p-values of the HWE test. My data set consists of ~600 subjects
> (cases and controls) typed at ~ 10K SNP markers; the test is applied
> separately to cases and controls. The genotypes are stored in a list
> of 'genotype' objects, all.geno, and p-values are calculated inside
> the loop over all SNP markers.

Just to concur with the previous two posters: when I've needed to calculate lots of HWE values, I've done one of two things:

  1. Use a faster test: either the chisq test or a likelihood ratio test. Optionally, you could use the exact test when one of the allele counts is very small, and use an asymptotic test in other cases. I often just use the fast test on everything. Especially since you are doing a permutation analysis on the test values, the exact test may not be buying you anything.
  2. Cache the HWE values. This works great if you can get some reuse of previously calculated values. If you're calculating for ~300 cases and ~300 controls, there would be 600*601/4 or only ~90K possible sets of AA/AB/BB genotype counts assuming complete data (you can flip the counts around so that the "AA" count is always for the more frequent allele); with missing data there can be many more, but most of those possibilities will never be observed under the null hypothesis of HWE. And since you are computing roughly 10 million HWE values, you'll have a lot of reuse of previously calculated values.
Dave

Received on Wed Jun 07 04:25:54 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 07 Jun 2006 - 06:10:24 EST.
