Re: [R] Fast version of Fisher's Exact Test

From: Steve Lianoglou <mailinglist.honeypot_at_gmail.com>
Date: Mon, 11 Apr 2011 13:45:09 -0400

Hi,

On Fri, Apr 8, 2011 at 1:52 PM, Bert Gunter <gunter.berton_at_gene.com> wrote:
> 1. I am not an expert on this.

Definitely me neither, but:

> 2. However, my strong prior would be no, since because it is "exact" it has
> to calculate all the possible configurations and there are a lot to
> calculate with the values of n1 and n2 you gave.

But there are situations where one could get away with an approximation given large enough samples (i.e., large counts in the contingency table), no?

For instance, my "wikipedia-certified statistics course" suggests that with large N, chisq.test should give a "decent" approximation to the p-value. You can play with that as you like.
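
A minimal sketch of that comparison, using only base R. The counts (5 of 10000 vs. 30 of 15000) are just an illustration, and the "expected counts" check is the usual rule of thumb rather than anything rigorous:

```r
## 2x2 table: 5 of 10000 in one group vs. 30 of 15000 in the other
tab <- matrix(c(5, 30, 10000 - 5, 15000 - 30), 2, 2)

p_exact  <- fisher.test(tab)$p.value
p_chisq  <- chisq.test(tab)$p.value                   # Yates-corrected by default
p_chisq0 <- chisq.test(tab, correct = FALSE)$p.value  # no continuity correction

c(exact = p_exact, chisq = p_chisq, chisq_uncorrected = p_chisq0)

## The chi-squared approximation is usually considered shaky when some
## expected counts are small, so it is worth looking at them:
chisq.test(tab)$expected
```

Whether the agreement is close enough depends on how small your p-values get and what you do with them afterwards.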

Also, the function "sage.test" in the "sagenhaft" package uses a "binomial approximation to the Fisher Exact test".

A slight modification from its examples:

R> library(sagenhaft)
R> s <- sage.test(c(0,5,10),c(0,30,50),n1=10000,n2=15000)

## And the fisher.test equivalents:
R> M <- list(matrix(c(0,0,10000-0,15000-0),2,2),
             matrix(c(5,30,10000-5,15000-30),2,2),
             matrix(c(10,50,10000-10,15000-50),2,2))

R> m <- sapply(M, function(m) fisher.test(m)$p.value)

## How close are they to each other?
R> s - m
[1] 0.000000e+00 1.110054e-05 2.916176e-06

You can find the package here:
http://www.bioconductor.org/packages/release/bioc/html/sagenhaft.html
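
If you'd rather not install the package, here is a rough base-R sketch of the general idea behind a binomial approximation. I'm not claiming this is what sage.test does internally; `binom_approx` is a made-up helper, and the approximation assumes the counts x and y are small relative to n1 and n2:

```r
## Conditional on the total x + y, x is approximately
## Binomial(x + y, n1 / (n1 + n2)) when x + y is small next to n1 + n2.
## binom_approx is a hypothetical helper, not part of any package.
binom_approx <- function(x, y, n1, n2) {
  if (x + y == 0) return(1)  # no counts at all: binom.test() needs n >= 1
  binom.test(x, x + y, p = n1 / (n1 + n2))$p.value
}

x <- c(0, 5, 10)
y <- c(0, 30, 50)
b <- mapply(binom_approx, x, y, MoreArgs = list(n1 = 10000, n2 = 15000))

## Exact p-values for the same three tables
f <- mapply(function(x, y) {
  fisher.test(matrix(c(x, y, 10000 - x, 15000 - y), 2, 2))$p.value
}, x, y)

## How far apart are the approximate and exact p-values?
b - f
```

Since binom.test is vectorized over nothing here, the mapply loop is not the fastest possible form; it is only meant to show the logic.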

I guess you (Jim) can judge if it's (i) faster and (ii) appropriate to use in your scenario.
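
For (i), something along these lines would give a rough timing comparison on your own machine. It uses base R's binom.test as a stand-in for the approximation (swap in sage.test if you have sagenhaft installed); the table and the number of repetitions are arbitrary choices:

```r
tab  <- matrix(c(10, 50, 10000 - 10, 15000 - 50), 2, 2)
reps <- 200  # arbitrary; increase for more stable timings

## Exact test, default settings (also computes a confidence interval)
t_exact <- system.time(
  for (i in seq_len(reps)) fisher.test(tab)$p.value
)

## Exact test without the confidence interval for the odds ratio
t_exact_noci <- system.time(
  for (i in seq_len(reps)) fisher.test(tab, conf.int = FALSE)$p.value
)

## Binomial stand-in for the approximation: 10 of 60 counts, p = n1/(n1+n2)
t_binom <- system.time(
  for (i in seq_len(reps)) binom.test(10, 60, p = 10000 / 25000)$p.value
)

timings <- c(exact      = t_exact[["elapsed"]],
             exact_noci = t_exact_noci[["elapsed"]],
             binomial   = t_binom[["elapsed"]])
timings
```

The numbers will vary by machine and by table size, so try it with counts that look like your real data.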

Enjoy,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 11 Apr 2011 - 17:52:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.