Re: [Rd] Possible bug in fisher.test() (PR#14196)

From: Ted Harding <Ted.Harding_at_manchester.ac.uk>
Date: Wed, 27 Jan 2010 18:14:59 +0000 (GMT)


On 27-Jan-10 17:30:10, nhorton_at_smith.edu wrote:

> # is there a bug in the calculation of the odds ratio in fisher.test?
> # Nicholas Horton, nhorton_at_smith.edu Fri Jan 22 08:29:07 EST 2010
>
> x1 = c(rep(0, 244), rep(1, 209))
> x2 = c(rep(0, 177), rep(1, 67), rep(0, 169), rep(1, 40))
>
> or1 = sum(x1==1&x2==1)*sum(x1==0&x2==0)/
> (sum(x1==1&x2==0)*sum(x1==0&x2==1))
>
> library(epitools)
> or2 = oddsratio.wald(x1, x2)$measure[2,1]
>
> or3 = fisher.test(x1, x2)$estimate
>
> # or1=or2 = 0.625276, but or3=0.6259267!
>
> I'm running R 2.10.1 under Mac OS X 10.6.2.
> Nick

Not so. Look closely at ?fisher.test:

Value:
[...]
estimate: an estimate of the odds ratio. Note that the
          _conditional_ Maximum Likelihood Estimate (MLE)
          rather than the unconditional MLE (the sample
          odds ratio) is used. Only present in the 2 by 2 case.

Your or1 (and presumably the epitools value also) is the sample OR.

The conditional MLE is the value of rho (the OR) that maximises the probability of the table *conditional* on the margins.

In this case it differs slightly from the sample OR (by 0.1%). For smaller tables it will tend to differ even more, e.g.

  M1 <- matrix(c(4,7,17,18),nrow=2)
  M1

  #      [,1] [,2]
  # [1,]    4   17
  # [2,]    7   18

  (4*18)/(17*7)
  # [1] 0.605042

  fisher.test(M1)$estimate
  # odds ratio
  # 0.6116235 ## (1.1% larger than sample OR)

  M2 <- matrix(c(1,2,4,5),nrow=2)
  M2

  #      [,1] [,2]
  # [1,]    1    4
  # [2,]    2    5

  (1*5)/(4*2)
  # [1] 0.625

  fisher.test(M2)$estimate
  # odds ratio
  # 0.649423 ## (3.9% larger than sample OR)
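To see how the gap behaves as the counts grow, here is a minimal sketch that multiplies M2 by a few scale factors and prints the relative difference between fisher.test()'s estimate and the sample OR. The helper name rel_gap and the particular scale factors are my own illustrative choices, and I would only expect (not guarantee) the gap to shrink steadily with larger counts.

```r
# Relative gap between the conditional MLE reported by fisher.test()
# and the sample odds ratio, for a 2x2 table of counts.
# 'rel_gap' is a hypothetical helper name used only for this sketch.
rel_gap <- function(tab) {
  sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  cond_mle  <- unname(fisher.test(tab)$estimate)
  (cond_mle - sample_or) / sample_or
}

M2 <- matrix(c(1, 2, 4, 5), nrow = 2)

# Illustrative scale factors (an assumption, not from the original post).
scales <- c(1, 2, 5, 10, 100)
gaps <- sapply(scales, function(s) rel_gap(s * M2))

# One row per scaled table: total count and relative gap.
print(data.frame(scale = scales, n = scales * sum(M2), rel_gap = gaps))
```

Scaling a table leaves the sample OR unchanged, so any change in rel_gap across rows comes from the conditional MLE alone.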

The probability of a table matrix(c(a,b,c,d),nrow=2), given the marginals (a+b), (a+c), (b+d) and hence also (c+d), is a function of the odds ratio only. Again see ?fisher.test:

  "given all marginal totals fixed, the first element of
   the contingency table has a non-central hypergeometric
   distribution with non-centrality parameter given by
   the odds ratio (Fisher, 1935)."
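
For reference, the distribution quoted above can be written out explicitly. The symbols m, n, k and rho are my own notation, not the help page's: for the table matrix(c(a,b,c,d),nrow=2), m and n are the two column totals, k is the first row total, and rho is the odds ratio.

```latex
P(A = a \mid m, n, k;\ \rho)
  = \frac{\binom{m}{a}\binom{n}{k-a}\,\rho^{a}}
         {\sum_{u=\max(0,\,k-n)}^{\min(k,\,m)} \binom{m}{u}\binom{n}{k-u}\,\rho^{u}},
\qquad m = a+b,\quad n = c+d,\quad k = a+c .

% Setting the derivative of the log-probability with respect to rho to zero:
\frac{\partial}{\partial\rho}\,\log P(A = a \mid m, n, k;\ \rho)
  = \frac{a}{\rho} - \frac{\mathrm{E}_{\rho}[A]}{\rho} = 0
\quad\Longleftrightarrow\quad
\mathrm{E}_{\rho}[A] = a .
```

So the conditional MLE is the rho at which the expected value of the first cell, under this distribution, equals the observed count.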

The value of the odds ratio which maximises this probability (for the given observed 'a') is not, in general, the sample OR.
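
As a check on this, here is a minimal sketch that maximises the conditional likelihood numerically for the table in the original post and prints the result next to the sample OR and fisher.test()'s estimate. The function names cond_loglik and cond_mle and the search interval for log(rho) are assumptions of mine; this is not a copy of what fisher.test() does internally, and the two numerical answers should agree only up to optimiser tolerance.

```r
# Conditional log-likelihood of the odds ratio (parameterised as log(rho))
# for a 2x2 table with all margins held fixed. Cell [1,1] then follows
# Fisher's noncentral hypergeometric distribution.
cond_loglik <- function(log_rho, tab) {
  x <- tab[1, 1]
  m <- sum(tab[, 1])              # first column total
  n <- sum(tab[, 2])              # second column total
  k <- sum(tab[1, ])              # first row total
  support <- max(0, k - n):min(k, m)
  # Unnormalised log-weights over every table with these margins
  logw <- dhyper(support, m, n, k, log = TRUE) + support * log_rho
  # Log of the normalising sum, computed stably
  log_norm <- max(logw) + log(sum(exp(logw - max(logw))))
  logw[support == x] - log_norm
}

# Hypothetical helper: maximise the conditional likelihood over log(rho).
# The default interval is an assumption; it does not handle tables whose
# observed cell sits at the edge of the support (MLE of 0 or Inf).
cond_mle <- function(tab, interval = c(-10, 10)) {
  fit <- optimize(cond_loglik, interval = interval, tab = tab,
                  maximum = TRUE, tol = 1e-8)
  exp(fit$maximum)
}

# Counts from the original post, laid out as table(x1, x2)
tab <- matrix(c(177, 169, 67, 40), nrow = 2)
sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

print(c(sample      = sample_or,
        conditional = cond_mle(tab),
        fisher.test = unname(fisher.test(tab)$estimate)))
```

Working on the log(rho) scale keeps the search interval symmetric about an odds ratio of 1.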

Hoping this helps,
Ted.



E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Jan-10                                       Time: 18:14:57
------------------------------ XFMail ------------------------------

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Wed 27 Jan 2010 - 18:19:28 GMT
