Re: [R] strange fisher.test result

From: Ted Harding <ted.harding_at_nessie.mcc.ac.uk>
Date: Mon 02 Apr 2007 - 22:54:24 GMT


On 31-Mar-07 13:36:04, Williams Scott wrote:
> A simple question - using the following fishers test it appears that
> the P value is significant, but the CI includes 1. Is this result
> correct?
>
>> data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)
>
>> fisher.test(data.50p10min)
>
> Fisher's Exact Test for Count Data
> data: data.50p10min
> p-value = 0.03941
>
> alternative hypothesis: true odds ratio is not equal to 1
>
> 95 percent confidence interval:
> 0.9810441 10.7771597
>
> sample estimates:
> odds ratio
> 3.138456

Apologies to Scott for overlooking, in my previous response, that he had in fact given his data in the line

  data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)

(and to Rolf Turner for pointing this out to me nicely, in a private mail).

I shall denote this by X0 <- matrix(c(16,15, 8, 24),nrow=2)

I've done a bit of investigating with the help of R, to try to provide a well-formed explanation of reasons underlying Scott's observation and his question.

>From the above, the marginal totals for his 2x2 table

  a b = 16 8

  c d 15 24

are (rows then columns) 24,39,31,32

These fixed marginals mean that the whole table is determined by the value of a. The following function P.FX() computes the probabilities of all possible tables, conditional on the marginal totals (it is much more transparent than the code for the same purpose in fisher.test()):

P.FX <- function(margs,R){
  n  <- margs[1]+margs[2]
  a  <- (0:n)
  b  <- margs[1]-a
  c  <- margs[3]-a
  d  <- n-a-b-c
  ix <- (a>=0)&(b>=0)&(c>=0)&(d>=0)

  a <- a[ix]; b <- b[ix]; c <-c[ix]; d <- d[ix]

  P <- (R^a)/(exp(lgamma(a+1))*exp(lgamma(b+1))*

              exp(lgamma(c+1))*exp(lgamma(d+1)))   P <- P/sum(P)
  return(cbind(a,b,c,d,P))
}

where 'margs' is the set of marginal totals (in the same row-col order as above), and R = (alpha*delta)/(beta*gamma) is the odds-ratio of the cell probabilities

  alpha beta

  gamma delta

Hence the probabilties for the Null Hypothesis R=1 are

> P <- P.FX(c(24,39,31,32),1)
> P

       a b c d P

 [1,]  0 24 31  8 6.714279e-11
 [2,]  1 23 30  9 5.550471e-09
 [3,]  2 22 29 10 1.914912e-07
 [4,]  3 21 28 11 3.702164e-06
 [5,]  4 20 27 12 4.535151e-05
 [6,]  5 19 26 13 3.767664e-04
 [7,]  6 18 25 14 2.215745e-03
 [8,]  7 17 24 15 9.496050e-03
 [9,]  8 16 23 16 3.026866e-02
[10,]  9 15 22 17 7.280305e-02
[11,] 10 14 21 18 1.334723e-01
[12,] 11 13 20 19 1.877552e-01
[13,] 12 12 19 20 2.034015e-01
[14,] 13 11 18 21 1.698738e-01
[15,] 14 10 17 22 1.092046e-01
[16,] 15  9 16 23 5.381095e-02
[17,] 16  8 15 24 2.017911e-02
[18,] 17  7 14 25 5.697630e-03
[19,] 18  6 13 26 1.193093e-03
[20,] 19  5 12 27 1.814060e-04
[21,] 20  4 11 28 1.943636e-05
[22,] 21  3 10 29 1.404269e-06
[23,] 22  2  9 30 6.383041e-08

[24,] 23 1 8 31 1.611427e-09
[25,] 24 0 7 32 1.678570e-11

Extract a and P by

a <- P[,1]; p <- P[,5]

Reminder: The 2-sided Fisher test p-value is fisher.test(X0)$p.value
[1] 0.03940996

Looking at the code for fisher.test() (which takes some thought!), the values of 'a' are (effectively) considered in order of decreasing probability, and added in until the observed value of 'a' is about to be included; the resulting p-value is the sum of the probabilities of the 'a' values not included (amongst which is the observed value itself):

iy <- order(p,decreasing=TRUE)
cbind(a=a[iy],p=p[iy],pval=rev(cumsum(rev(p[iy]))))

         a p pval

 [1,]   12 2.034015e-01 1.000000e+00
 [2,]   11 1.877552e-01 7.965985e-01
 [3,]   13 1.698738e-01 6.088432e-01
 [4,]   10 1.334723e-01 4.389695e-01
 [5,]   14 1.092046e-01 3.054972e-01
 [6,]    9 7.280305e-02 1.962926e-01
 [7,]   15 5.381095e-02 1.234896e-01
 [8,]    8 3.026866e-02 6.967862e-02
 [9,]   16 2.017911e-02 3.940996e-02 ###### Observed value
[10,]    7 9.496050e-03 1.923085e-02
[11,]   17 5.697630e-03 9.734798e-03
[12,]    6 2.215745e-03 4.037168e-03
[13,]   18 1.193093e-03 1.821423e-03
[14,]    5 3.767664e-04 6.283293e-04
[15,]   19 1.814060e-04 2.515629e-04
[16,]    4 4.535151e-05 7.015687e-05
[17,]   20 1.943636e-05 2.480536e-05
[18,]    3 3.702164e-06 5.369000e-06
[19,]   21 1.404269e-06 1.666837e-06
[20,]    2 1.914912e-07 2.625675e-07
[21,]   22 6.383041e-08 7.107624e-08
[22,]    1 5.550471e-09 7.245826e-09
[23,]   23 1.611427e-09 1.695355e-09

[24,] 0 6.714279e-11 8.392849e-11
[25,] 24 1.678570e-11 1.678570e-11

Here it can be seen that, with "extreme" meaning "towards the bottom of the above listing", the probability of a result at least as extreme as the observed value a=16 is 3.940996e-02, the value actually returned by fisher.test() (see above). The right-hand column gives the "upwards" cumulative sum of the probabilities in the middle column.

Now for the 95% 2-sided Confidence Interval. Finding this is a somewhat different matter, since the above 2-sided approach involves ordering the outcomes "2-sidedly", whereas finding the confidence limits involves a 1-sided approach for each limit.

For the upper limit of the 95% CI, we seek the smallest value of the odds-ratio parameter such that the probability of an 'a' value less than or equal to the observed a=16 is at most (1 - 0.95)/2 = 0.025. Even by hand, one is led quite quickly to the result, varying the value of R on the lines of:

sum(P.FX(c(24,39,31,32),  1)[a<=16,5]) - 0.025
[1] 0.967907
sum(P.FX(c(24,39,31,32),  5)[a<=16,5]) - 0.025
[1] 0.2480870
sum(P.FX(c(24,39,31,32), 10)[a<=16,5]) - 0.025
[1] 0.008515804

.....
sum(P.FX(c(24,39,31,32), 10.75)[a<=16,5]) - 0.025 [1] 0.0002679926
.....
sum(P.FX(c(24,39,31,32), 10.77877506441498)[a<=16,5]) - 0.025 [1] 3.122502e-17

>From fisher.test(X0):
95 percent confidence interval:
  0.9810441 10.7771597

Whereas, from the function P.FX():
sum(P.FX(c(24,39,31,32),10.7771597)[a<=16,5]) [1] 0.02501496

so there's a slight discrepancy here...

Now, for the upper limit of the 95% CI, we seek the largest value of the odds-ratio parameter such that the probability of an 'a' value greater than or equal to the observed a=16 is at most (1 - 0.95)/2 = 0.025. By a similar procedure:

sum(P.FX(c(24,39,31,32), 1.0)[a>=16,5]) - 0.025 [1] 0.002272143
> sum(P.FX(c(24,39,31,32), 0.95)[a>=16,5]) - 0.025
[1] -0.003457482
> sum(P.FX(c(24,39,31,32), 0.98)[a>=16,5]) - 0.025
[1] -0.0001202556
.....
sum(P.FX(c(24,39,31,32), 0.981032895344028)[a>=16,5]) - 0.025 [1] -4.510281e-17

which again is not quite equal to the "0.9810441" from the fisher.test() for which, again:

sum(P.FX(c(24,39,31,32),0.9810441)[a>=16,5]) [1] 0.02500131

so again a slight discrepancy. (By the way, the above manual iteration is a lot quicker than one might at first expect; each of the above convergences took definitely less than minutes, and there is something rewarding about observing the convergence itself!)

I'm not sure what the reason for these slight discrepancies is. Maybe there's a small difference in numerical result between the functions dnhyper() and pnhyper() (defined internally in fisher.test()) and the results from my function P.FX() above; maybe there's s difference in "stopping rule" between the use of uniroot() in fisher.test(), and my manual method above.

Anyway, to come back to Scott's original query, it is clear from the above analysis that the way fisher.test() arrives at its 2-sided p-value is not directly related to the way it arrives at the upper and lower confidence limits.

For the p-value, it works outwards in both directions on the value of 'a', until the observed value is just excluded, using only the Null Hypothesis value of R, the odds ratio. Then the p-value is the sum of the probabilities of the 'a' values excluded on either side. The separate probabilties on each side are not necessarily (and generally are not) equal. Here the contribution from one side (in this case the lower values of 'a') is smaller than the contribution from the other side (the higher values of a), as one can see from the one-sided sums for odds-ratio=1 for the two-sided test:

sum(P.FX(c(24,39,31,2), 1)[a<=7,5])
[1] 0.01213781

sum(P.FX(c(24,39,31,2), 1)[a>=16,5])
[1] 0.02727214

For the confidence limits, the p-value for each limit is fixed to be exactly 0.025, and the odds-ratio is varied until the sum of the ">=a (=16)" probabilities (for the lower limit) is exactly 0.025, and again varied until the sum of the "<=a (=16)" probabilites (for the upper limit) is again exactly 0.025.

So it is not, in fact, a paradox that the p-value for the null hypothesis OR=1 is less than 0.05, while the 95% confidence interval for the OR includes the value 1.

This is not, of course, going to happen in most cases. It is associated with the fact that, in the present case, the p-value for the test is not much below 0.05.

Hoping this helps,
Ted.



E-Mail: (Ted Harding) <ted.harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 02-Apr-07                                       Time: 23:54:20
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Apr 03 08:59:52 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 03 Apr 2007 - 15:00:52 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.