From: Ted Harding <ted.harding_at_nessie.mcc.ac.uk>

Date: Thu 05 Apr 2007 - 21:35:44 GMT

print(cbind(x,dbinom(x,n,p)),digits=10)

E-Mail: (Ted Harding) <ted.harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri Apr 06 07:45:48 2007

Date: Thu 05 Apr 2007 - 21:35:44 GMT

Hi Folks,

The recent correspondence about "strange fisher.test result",
and especially Peter Dalgaard's reply on Tue 03 April 2007
(which I want to investigate further) led me to take a close
look at the code for binom.test().

The code for the two-sided case computes the p-value as follows:

if (p == 0) (x == 0) else if (p == 1) (x == n) else { relErr <- 1 + 1e-07 d <- dbinom(x, n, p) m <- n * p if (x == m) 1 else ## PVAL if x < m if (x < m) { i <- seq(from = ceiling(m), to = n) y <- sum(dbinom(i, n, p) <= d * relErr) ## here: pbinom(x,n,p) + pbinom(n-y,n,p,lower=FALSE) } ## PVAL if x > m: else { i <- seq(from = 0, to = floor(m)) y <- sum(dbinom(i, n, p) <= d * relErr) ## here: pbinom(y-1,n,p) + pbinom(x-1,n,p,lower=FALSE) } } }

The case "x < m" will do for my query (since it is the same for the case "x > m").

If I understand the code correctly, when x (the observed binomial number) is less than m (its expected value on the null hypothesis being tested), then the p-value is the total probability in the lower tail, from 0 to x inclusive, plus a "bit".

The "bit" is the probability in the upper tail of those value from (n-y) to n, where y is the number of values of i greater than or equal to (if m is an integer) m such that

dbinom(i,n,p) <= d*(1 + 1e-07) = dbinom(x,n,p)*(1 + 1e-07).

or, in other words, ignoring the "1e-07", the values of i which have binomial probability (on the NH) less than or equal to the probability of x.

So, for x < m = n*p, the lower-tail contribution to the p-value is the probability that X <= x, while the upper-tail contribution is the probability that (X >= m)&(dbinom(X,n,p) <= dbinom(x,n,p)).

Now that is not so asymmetric as it looks, since for x < m = n*p the values of dbinom(i,n,p) decrease as i decreases for i < x, so in fact the left=hand tail is similarly the probability that (X <= m)&(dbinom(X,n,p) <= dbinom(x,n,p)). In effect, therefore, ignoring the "1e-07", we are simply calculating the probability of "dbinom(X,n,p) <= dbinom(x,n,p)" overall.

For example (and I'm being a bit mischievous here, but I'm theoretically entitled to be):

p<-0.3527785166 n<-20 x<-(4:10)

print(cbind(x,dbinom(x,n,p)),digits=10)

x

*[1,] 4 0.07114577135
**[2,] 5 0.12409328342
**[3,] 6 0.16909761793
**[4,] 7 0.18433877225
**[5,] 8 0.16327483787
*

[6,] 9 0.11866078116

[7,] 10 0.07114577154

Now I run the standard binom.test():

x<-4 ; n<-20 ; p<-0.3527785166 ; binom.test(x,n,p)$p.value

[1] 0.2405347

Next, I modify the code for binom.test so that relErr <- 1 (instead of 1 + 1e-07), and I call this new function binom.test.0 and run it on the same data:

binom.test.0(x,n,p)$p.value

[1] 0.1693889

The difference is the value of dbinom(10,n,p) = 0.07114577...

So the "1e-07" has done its dirty work -- it has added 40% to my p-value! Admittedly it wasn't particularly significant to start with, at 0.17 (in round numbers); but it's much less so at 0.24, and if I wanted I'm sure I could construct a really "significant" example!

Anyway, I dare say the factor (1 + 1e-07) is there as a precaution against some possible numerical mishap (though I can't see what it might be), but the fact that it can make such a difference in special cases means that I think we need to know what it's doing and why -- just in case we don't need whatever it's supposed to be there for.

After all, if I were going to use the values of dbinom(x,n,p) as the ordering of the sample space for the test, I'd simply get PVAL as

sum(dbinom(which(dbinom((0:n),n,p)<=dbinom(x,n,p))-1,n,p))

[1] 0.1693889

(as above!).

Comments?

With thanks,

Ted.

E-Mail: (Ted Harding) <ted.harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861

Date: 05-Apr-07 Time: 22:29:21 ------------------------------ XFMail ------------------------------ ______________________________________________R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri Apr 06 07:45:48 2007

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Thu 05 Apr 2007 - 22:30:55 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*