From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>

Date: Fri 13 Jan 2006 - 08:22:08 EST

(Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> writes:

> I want to ascertain the basis of the table ranking,
> i.e. the meaning of "extreme", in Fisher's Exact Test
> as implemented in 'fisher.test', when applied to RxC
> tables which are larger than 2x2.
>
> One can summarise a strategy for the test as
>
> 1) For each table compatible with the margins
>    of the observed table, compute the probability
>    of this table conditional on the marginal totals.
>
> 2) Rank the possible tables in order of a measure
>    of discrepancy between the table and the null
>    hypothesis of "no association".
>
> 3) Locate the observed table, and compute the sum
>    of the probabilities, computed in (1), for this
>    table and more "extreme" tables in the sense of
>    the ranking in (2).
>
> The question is: what "measure of discrepancy" is
> used in 'fisher.test' corresponding to stage (2)?
>
> (There are in principle several possibilities, e.g.
> value of a Pearson chi-squared, large values being
> discrepant; the probability calculated in (2),
> small values being discrepant; ... )
>
> "?fisher.test" says only:
>
>   In the one-sided 2 by 2 cases, p-values are obtained
>   directly using the hypergeometric distribution.
>   Otherwise, computations are based on a C version of
>   the FORTRAN subroutine FEXACT which implements the
>   network developed by Mehta and Patel (1986) and
>   improved by Clarkson, Fan & Joe (1993). The FORTRAN
>   code can be obtained from
>   <URL: http://www.netlib.org/toms/643>.
>
> I have had a look at this FORTRAN code, and cannot ascertain
> it from the code itself. However, there is a Comment to the
> effect:
>
>   c PRE - Table p-value. (Output)
>   c PRE is the probability of a more extreme table, where
>   c 'extreme' is in a probabilistic sense.
>
> which suggests that the tables are ranked in order of their
> probabilities as computed in (2).
>
> Can anyone confirm definitively what goes on?

To my knowledge, it is the "table probability" under the hypergeometric distribution, i.e. the probability of the table given the marginals. For a 2x2 table with cells a, b in the first row and c, d in the second, this can be translated to sampling a+b balls without replacement from a box with a+c white and b+d black balls.

Playing around with dhyper should be instructive.
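As a starting point for such playing around, here is a minimal sketch in R. The cell counts are made up for illustration, and the variable names (a, b, cc, d, m, n, k) are mine, not anything taken from fisher.test. It lists the conditional probability of every 2x2 table compatible with the margins, indexed by the value of the top-left cell.

```r
# Hypothetical 2x2 table with cells  a b / c d  (illustrative numbers only)
a  <- 3; b <- 1
cc <- 1; d <- 3      # 'cc' rather than 'c', to avoid masking base::c

m <- a + cc   # white balls in the box (first column total)
n <- b + d    # black balls in the box (second column total)
k <- a + b    # number of balls drawn (first row total)

# Every value the top-left cell can take given the margins
x <- max(0, k - n):min(k, m)

# Conditional probability of each compatible table
probs <- dhyper(x, m, n, k)
names(probs) <- x

# Probability of the observed table
p_obs <- dhyper(a, m, n, k)

print(round(probs, 4))
print(p_obs)
```

With the margins fixed, a 2x2 table is determined by its top-left cell, which is why a single call to dhyper over x covers all compatible tables.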

(You're right that the "two-sided" p values are obtained by summing all table probabilities smaller than or equal to that of the observed table. This is the traditional way, but there are alternatives, e.g. tail balancing.)
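The summing rule can be checked by brute force on a small table larger than 2x2. The following is only a sketch under stated assumptions: the 2x3 counts are invented, table_prob is a hypothetical helper written for this example, and the 1e-7 relative tolerance is my own guard against tables whose probabilities tie with the observed one in floating point. It enumerates every table with the observed margins, sums the probabilities not exceeding that of the observed table, and puts the result next to what fisher.test reports.

```r
# Hypothetical 2x3 table (illustrative data only)
tab <- matrix(c(3, 1, 2,
                0, 4, 2), nrow = 2, byrow = TRUE)
rs <- rowSums(tab)
cs <- colSums(tab)
n  <- sum(tab)

# Conditional probability of the table whose first row is x, given the
# margins (multivariate hypergeometric); the second row is then cs - x
table_prob <- function(x) prod(choose(cs, x)) / choose(n, rs[1])

# Enumerate every first row compatible with the margins
grid <- expand.grid(x1 = 0:cs[1], x2 = 0:cs[2])
grid$x3 <- rs[1] - grid$x1 - grid$x2
grid <- grid[grid$x3 >= 0 & grid$x3 <= cs[3], ]

probs <- apply(as.matrix(grid), 1, table_prob)

# Sum the probabilities of all tables no more probable than the observed one
p_obs    <- table_prob(tab[1, ])
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_fisher <- fisher.test(tab)$p.value
print(c(manual = p_manual, fisher.test = p_fisher))
```

If the two numbers agree, that is consistent with ranking by table probability; a ranking by Pearson chi-squared would in general give a different sum.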

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Fri Jan 13 08:35:38 2006
