Re: [R] binom.test() query

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Thu 05 Apr 2007 - 22:08:41 GMT

On 4/5/2007 5:35 PM, (Ted Harding) wrote:
> Hi Folks,
> The recent correspondence about "strange fisher.test result",
> and especially Peter Dalgaard's reply on Tue 03 April 2007
> (which I want to investigate further) led me to take a close
> look at the code for binom.test().
>
> I now have a query!
>
> The code for the two-sided case computes the p-value as follows:
>
> if (p == 0) (x == 0)
> else
> if (p == 1) (x == n)
> else {
> relErr <- 1 + 1e-07
> d <- dbinom(x, n, p)
> m <- n * p
> if (x == m) 1
> else
> ## PVAL if x < m
> if (x < m) {
> i <- seq(from = ceiling(m), to = n)
> y <- sum(dbinom(i, n, p) <= d * relErr)
> ## here:
> pbinom(x,n,p) + pbinom(n-y,n,p,lower=FALSE)
> }
> ## PVAL if x > m:
> else {
> i <- seq(from = 0, to = floor(m))
> y <- sum(dbinom(i, n, p) <= d * relErr)
> ## here:
> pbinom(y-1,n,p) + pbinom(x-1,n,p,lower=FALSE)
> }
> }
> }
>
> The case "x < m" will do for my query (since it is the same
> for the case "x > m").
>
> If I understand the code correctly, when x (the observed binomial
> number) is less than m (its expected value on the null hypothesis
> being tested), then the p-value is the total probability in the
> lower tail, from 0 to x inclusive, plus a "bit".
>
> The "bit" is the probability in the upper tail of those value
> from (n-y) to n, where y is the number of values of i greater
> than or equal to (if m is an integer) m such that
>
> dbinom(i,n,p) <= d*(1 + 1e-07) = dbinom(x,n,p)*(1 + 1e-07).
>
> or, in other words, ignoring the "1e-07", the values of i
> which have binomial probability (on the NH) less than or equal
> to the probability of x.
>
> So, for x < m = n*p, the lower-tail contribution to the p-value
> is the probability that X <= x, while the upper-tail contribution
> is the probability that (X >= m)&(dbinom(X,n,p) <= dbinom(x,n,p)).
>
> Now that is not so asymmetric as it looks, since for x < m = n*p
> the values of dbinom(i,n,p) decrease as i decreases for i < x,
> so in fact the left=hand tail is similarly the probability
> that (X <= m)&(dbinom(X,n,p) <= dbinom(x,n,p)). In effect, therefore,
> ignoring the "1e-07", we are simply calculating the probability
> of "dbinom(X,n,p) <= dbinom(x,n,p)" overall.
>
> But what is the purpose of the "1e-07"? I can see that it might
> cause a miscalculation if the real intention is to simply sum
> the probabilties dbinom(X,n,p) which are less than or equal to
> dbinom(x,n,p).

I think you're looking at the deparsed binom.test. For tricky things like this, it's sometimes worth looking at the source

(which in this case is in
https://svn.r-project.org/R/trunk/src/library/stats/R/binom.test.R)

and has the comment:

                            ## Do
                            ##   d <- dbinom(0 : n, n, p)
                            ##   sum(d[d <= dbinom(x, n, p)])
                            ## a bit more efficiently ...
                            ## Note that we need a little fuzz.

So I expect the answer to your question is that the 1e-7 is there to avoid rounding error causing you to incorrectly leaving out an exactly equal probability.

As to whether the difference in your example below matters, I'd say not.   With discrete distributions you don't have a right to expect continuous p-values, so having a decision go the wrong way in a manually constructed nasty example is always going to be a possibility. Exactly equal probabilities will happen when p=0.5, and that's a more important case to get right than p=0.3527785166.

Duncan Murdoch

>
> For example (and I'm being a bit mischievous here, but I'm
> theoretically entitled to be):
>
> p<-0.3527785166
> n<-20
> x<-(4:10)
> print(cbind(x,dbinom(x,n,p)),digits=10)
>
> x
> [1,] 4 0.07114577135
> [2,] 5 0.12409328342
> [3,] 6 0.16909761793
> [4,] 7 0.18433877225
> [5,] 8 0.16327483787
> [6,] 9 0.11866078116
> [7,] 10 0.07114577154
>
> Now I run the standard binom.test():
>
> x<-4 ; n<-20 ; p<-0.3527785166 ; binom.test(x,n,p)$p.value
> [1] 0.2405347
>
>
> Next, I modify the code for binom.test so that relErr <- 1
> (instead of 1 + 1e-07), and I call this new function
> binom.test.0 and run it on the same data:
>
> binom.test.0(x,n,p)$p.value
> [1] 0.1693889
>
> The difference is the value of dbinom(10,n,p) = 0.07114577...
>
> So the "1e-07" has done its dirty work -- it has added 40% to
> my p-value! Admittedly it wasn't particularly significant to
> start with, at 0.17 (in round numbers); but it's much less so
> at 0.24, and if I wanted I'm sure I could construct a really
> "significant" example!
>
> Anyway, I dare say the factor (1 + 1e-07) is there as a precaution
> against some possible numerical mishap (though I can't see what
> it might be), but the fact that it can make such a difference in
> special cases means that I think we need to know what it's doing
> and why -- just in case we don't need whatever it's supposed
> to be there for.
>
> After all, if I were going to use the values of dbinom(x,n,p)
> as the ordering of the sample space for the test, I'd simply
> get PVAL as
>
> sum(dbinom(which(dbinom((0:n),n,p)<=dbinom(x,n,p))-1,n,p))
> [1] 0.1693889
>
> (as above!).
>
> Comments?
>
> With thanks,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding@nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 05-Apr-07 Time: 22:29:21
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri Apr 06 08:13:57 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 05 Apr 2007 - 23:31:22 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.