Re: [R] Basis of fisher.test

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Fri 13 Jan 2006 - 08:22:08 EST

(Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> writes:

> I want to ascertain the basis of the table ranking,
> i.e. the meaning of "extreme", in Fisher's Exact Test
> as implemented in 'fisher.test', when applied to RxC
> tables which are larger than 2x2.
>
> One can summarise a strategy for the test as follows:
>
> 1) For each table compatible with the margins
> of the observed table, compute the probability
> of this table conditional on the marginal totals.
>
> 2) Rank the possible tables in order of a measure
> of discrepancy between the table and the null
> hypothesis of "no association".
>
> 3) Locate the observed table, and compute the sum
> of the probabilities, computed in (1), for this
> table and more "extreme" tables in the sense of
> the ranking in (2).
>
> The question is: what "measure of discrepancy" is
> used in 'fisher.test' corresponding to stage (2)?
>
> (There are in principle several possibilities, e.g.
> the value of a Pearson chi-squared statistic, large values
> being discrepant; the probability calculated in (1),
> small values being discrepant; ... )
>
> "?fisher.test" says only:
>
> In the one-sided 2 by 2 cases, p-values are obtained
> directly using the hypergeometric distribution.
> Otherwise, computations are based on a C version of
> the FORTRAN subroutine FEXACT which implements the
> network developed by Mehta and Patel (1986) and
> improved by Clarkson, Fan & Joe (1993). The FORTRAN
> code can be obtained from
> <URL: http://www.netlib.org/toms/643>.
>
> I have looked at this FORTRAN code and cannot ascertain
> it from the code itself. However, there is a comment to the
> effect:
>
> c PRE - Table p-value. (Output)
> c PRE is the probability of a more extreme table, where
> c 'extreme' is in a probabilistic sense.
>
> which suggests that the tables are ranked in order of their
> probabilities as computed in (1).
>
> Can anyone confirm definitively what goes on?
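Ted's three steps can be sketched by brute-force enumeration, which is only feasible for small tables (the network algorithm in FEXACT is designed to avoid enumerating every table). This is a minimal Python sketch, not R code and not what fisher.test runs internally; the function names (row_splits, tables_with_margins, table_prob, pearson_chisq, neg_prob, exact_p) are hypothetical, and it assumes all row and column totals are positive. The discrepancy measure for step (2) is passed in as a function, so both candidates Ted mentions can be compared; exact Fractions avoid floating-point ties.

```python
from fractions import Fraction
from math import factorial, prod


def row_splits(total, caps):
    """Yield every way to split `total` into len(caps) non-negative cells,
    with cell j not exceeding caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield [total]
        return
    for x in range(min(total, caps[0]) + 1):
        for tail in row_splits(total - x, caps[1:]):
            yield [x] + tail


def tables_with_margins(row_sums, col_sums):
    """Yield every table (list of rows) with the given row and column totals.
    Assumes the margins are consistent (same grand total)."""
    if len(row_sums) == 1:
        # The last row is forced by what is left of each column total.
        yield [list(col_sums)]
        return
    for row in row_splits(row_sums[0], col_sums):
        left = [c - x for c, x in zip(col_sums, row)]
        for rest in tables_with_margins(row_sums[1:], left):
            yield [row] + rest


def table_prob(table):
    """Step (1): probability of `table` conditional on its margins."""
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    num = prod(factorial(t) for t in rows + cols)
    den = factorial(sum(rows)) * prod(factorial(x) for r in table for x in r)
    return Fraction(num, den)


def pearson_chisq(table):
    """Candidate measure: Pearson chi-squared (larger = more discrepant).
    Assumes every row and column total is positive."""
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    # (O - E)^2 / E with E = r * c / n, rewritten as (O*n - r*c)^2 / (n * r * c)
    return sum(Fraction((table[i][j] * n - rows[i] * cols[j]) ** 2, n * rows[i] * cols[j])
               for i in range(len(rows)) for j in range(len(cols)))


def neg_prob(table):
    """Candidate measure: minus the table probability (less probable = more discrepant)."""
    return -table_prob(table)


def exact_p(observed, discrepancy):
    """Step (3): total probability of tables at least as discrepant as `observed`."""
    rows = [sum(r) for r in observed]
    cols = [sum(col) for col in zip(*observed)]
    d_obs = discrepancy(observed)
    return sum(table_prob(t) for t in tables_with_margins(rows, cols)
               if discrepancy(t) >= d_obs)


obs = [[3, 1, 0], [0, 2, 4]]
print("probability ranking:", float(exact_p(obs, neg_prob)))
print("chi-squared ranking:", float(exact_p(obs, pearson_chisq)))
```

With neg_prob, step (3) sums every table whose probability is at most that of the observed table; with pearson_chisq, the two rankings can order tables differently, so the resulting p-values need not agree on larger tables.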

To my knowledge, it is the "table probability" according to the hypergeometric distribution, i.e. the probability of the table given the marginals. For a 2x2 table with rows (a, b) and (c, d), this can be translated to sampling a+b balls without replacement from a box with a+c white and b+d black balls, and asking for the probability of drawing exactly a white ones.
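Written out, the urn description gives the standard formula below. The cell names a, b, c, d (rows (a, b) and (c, d)) and the RxC notation n_{ij} for cells, n_{i+} and n_{+j} for row and column totals, are notation added here; the second line is the usual RxC generalisation, i.e. the "probability of the table" in step (1) of Ted's outline.

```latex
P(a \mid \text{margins})
  = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{n}{a+b}}
  = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!},
  \qquad n = a+b+c+d

P(\{n_{ij}\} \mid \text{margins})
  = \frac{\prod_i n_{i+}!\,\prod_j n_{+j}!}{n!\,\prod_{i,j} n_{ij}!}
```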

Playing around with dhyper should be instructive.
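As a stand-in for playing with dhyper outside R, here is a small Python sketch. The function dhyper below is hypothetical: it only mirrors the argument order of R's dhyper(x, m, n, k) (x white balls drawn, m white and n black in the urn, k drawn) and is not R's implementation. It lists the probabilities of all 2x2 tables whose row and column totals are all 4 (the layout of the classic tea-tasting example).

```python
from fractions import Fraction
from math import comb


def dhyper(x, m, n, k):
    """Probability of x white balls when k balls are drawn without replacement
    from an urn holding m white and n black balls. Argument order mirrors
    R's dhyper(x, m, n, k); this is a Python stand-in, not R's code."""
    if x < 0 or x > k or x > m or k - x > n:
        return Fraction(0)
    return Fraction(comb(m, x) * comb(n, k - x), comb(m + n, k))


def table_prob_2x2(a, b, c, d):
    """Urn model: draw a+b balls from a box with a+c white and b+d black balls."""
    return dhyper(a, a + c, b + d, a + b)


# All 2x2 tables with row and column totals of 4, indexed by the top-left cell a
# (the remaining cells are then b = 4 - a, c = 4 - a, d = a).
probs = {a: table_prob_2x2(a, 4 - a, 4 - a, a) for a in range(5)}
for a, p in probs.items():
    print(a, p)
# 0 1/70
# 1 8/35
# 2 18/35
# 3 8/35
# 4 1/70
```

In R itself, dhyper(0:4, 4, 4, 4) should give the same five probabilities as floating-point numbers.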

(You're right that the "two-sided" p-values are obtained by summing the probabilities of all tables whose probability is smaller than or equal to that of the observed table. This is the traditional way, but there are alternatives, e.g. tail balancing.)
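For the 2x2 case, the one-sided tails and the traditional two-sided rule can be put side by side. This is a minimal Python sketch under stated assumptions: fisher_2x2_pvalues is a hypothetical helper whose dictionary keys are borrowed from fisher.test's alternative argument, dhyper is a Python stand-in for R's function, and tail balancing is not implemented. Exact Fractions are used so that tables with equal probability compare as exactly equal; a floating-point version would need a small tolerance in that comparison.

```python
from fractions import Fraction
from math import comb


def dhyper(x, m, n, k):
    # Hypergeometric pmf, same argument order as R's dhyper(x, m, n, k).
    if x < 0 or x > k or x > m or k - x > n:
        return Fraction(0)
    return Fraction(comb(m, x) * comb(n, k - x), comb(m + n, k))


def fisher_2x2_pvalues(a, b, c, d):
    """p-values for the 2x2 table with rows (a, b) and (c, d)."""
    m, n, k = a + c, b + d, a + b  # white balls, black balls, balls drawn
    support = range(max(0, k - n), min(k, m) + 1)  # feasible top-left cells
    probs = {x: dhyper(x, m, n, k) for x in support}
    p_obs = probs[a]
    return {
        # One-sided: hypergeometric tails in the top-left cell.
        "less": sum(p for x, p in probs.items() if x <= a),
        "greater": sum(p for x, p in probs.items() if x >= a),
        # Traditional two-sided: all tables no more probable than the observed one.
        "two.sided": sum(p for p in probs.values() if p <= p_obs),
    }


res = fisher_2x2_pvalues(3, 1, 1, 3)
for alt, p in res.items():
    print(alt, p, round(float(p), 4))
# less 69/70 0.9857
# greater 17/70 0.2429
# two.sided 17/35 0.4857
```

These should agree, up to floating point, with fisher.test on the same table with the corresponding alternative.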

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
Received on Fri Jan 13 08:35:38 2006
