[R] binom.test() query

From: Ted Harding <ted.harding_at_nessie.mcc.ac.uk>
Date: Thu 05 Apr 2007 - 21:35:44 GMT


Hi Folks,
The recent correspondence about "strange fisher.test result", and especially Peter Dalgaard's reply on Tue 03 April 2007 (which I want to investigate further) led me to take a close look at the code for binom.test().

I now have a query!

The code for the two-sided case computes the p-value as follows:

      if (p == 0) (x == 0)
      else
        if (p == 1) (x == n) 
        else {
          relErr <- 1 + 1e-07
          d <- dbinom(x, n, p)
          m <- n * p 
          if (x == m) 1
          else
## PVAL if x < m
            if (x < m) {
              i <- seq(from = ceiling(m), to = n)
              y <- sum(dbinom(i, n, p) <= d * relErr)
   ## here:
              pbinom(x,n,p) + pbinom(n-y,n,p,lower=FALSE)
            }
## PVAL if x > m:
            else {
              i <- seq(from = 0, to = floor(m))
              y <- sum(dbinom(i, n, p) <= d * relErr)
   ## here:
              pbinom(y-1,n,p) + pbinom(x-1,n,p,lower=FALSE)
            }
          }
        }

The case "x < m" will do for my query (since it is the same for the case "x > m").

If I understand the code correctly, when x (the observed binomial number) is less than m (its expected value on the null hypothesis being tested), then the p-value is the total probability in the lower tail, from 0 to x inclusive, plus a "bit".

The "bit" is the probability in the upper tail of those value from (n-y) to n, where y is the number of values of i greater than or equal to (if m is an integer) m such that

   dbinom(i,n,p) <= d*(1 + 1e-07) = dbinom(x,n,p)*(1 + 1e-07).

or, in other words, ignoring the "1e-07", the values of i which have binomial probability (on the NH) less than or equal to the probability of x.

So, for x < m = n*p, the lower-tail contribution to the p-value is the probability that X <= x, while the upper-tail contribution is the probability that (X >= m)&(dbinom(X,n,p) <= dbinom(x,n,p)).

Now that is not so asymmetric as it looks, since for x < m = n*p the values of dbinom(i,n,p) decrease as i decreases for i < x, so in fact the left=hand tail is similarly the probability that (X <= m)&(dbinom(X,n,p) <= dbinom(x,n,p)). In effect, therefore, ignoring the "1e-07", we are simply calculating the probability of "dbinom(X,n,p) <= dbinom(x,n,p)" overall.

But what is the purpose of the "1e-07"? I can see that it might cause a miscalculation if the real intention is to simply sum the probabilties dbinom(X,n,p) which are less than or equal to dbinom(x,n,p).

For example (and I'm being a bit mischievous here, but I'm theoretically entitled to be):

   p<-0.3527785166
   n<-20
   x<-(4:10)

   print(cbind(x,dbinom(x,n,p)),digits=10)

      x
[1,] 4 0.07114577135
[2,] 5 0.12409328342
[3,] 6 0.16909761793
[4,] 7 0.18433877225
[5,] 8 0.16327483787
[6,] 9 0.11866078116
[7,] 10 0.07114577154

Now I run the standard binom.test():

x<-4 ; n<-20 ; p<-0.3527785166 ; binom.test(x,n,p)$p.value
[1] 0.2405347

Next, I modify the code for binom.test so that relErr <- 1 (instead of 1 + 1e-07), and I call this new function binom.test.0 and run it on the same data:

binom.test.0(x,n,p)$p.value
[1] 0.1693889

The difference is the value of dbinom(10,n,p) = 0.07114577...

So the "1e-07" has done its dirty work -- it has added 40% to my p-value! Admittedly it wasn't particularly significant to start with, at 0.17 (in round numbers); but it's much less so at 0.24, and if I wanted I'm sure I could construct a really "significant" example!

Anyway, I dare say the factor (1 + 1e-07) is there as a precaution against some possible numerical mishap (though I can't see what it might be), but the fact that it can make such a difference in special cases means that I think we need to know what it's doing and why -- just in case we don't need whatever it's supposed to be there for.

After all, if I were going to use the values of dbinom(x,n,p) as the ordering of the sample space for the test, I'd simply get PVAL as

sum(dbinom(which(dbinom((0:n),n,p)<=dbinom(x,n,p))-1,n,p))
[1] 0.1693889

(as above!).

Comments?

With thanks,
Ted.



E-Mail: (Ted Harding) <ted.harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 05-Apr-07                                       Time: 22:29:21
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri Apr 06 07:45:48 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 05 Apr 2007 - 22:30:55 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.