Re: [Rd] inconsistency in pchisq (PR#7099)

From: <maechler_at_stat.math.ethz.ch>
Date: Sat 17 Jul 2004 - 21:21:45 EST


>>>>> "Richard" == Richard Mott <Richard.Mott@well.ox.ac.uk> >>>>> on Sat, 17 Jul 2004 11:12:23 +0100 (BST) writes:

    Richard> Martin - I agree that the p-values are essentially
    Richard> identical. i raised the issue becuase i can get
    Richard> negative p-values with the ncp=0 version. It is
    Richard> hard to reproduce this problem in the sense that
    Richard> the value of the chi-squared statistic that causes
    Richard> the phenomenom is identical to the one i sent you
    Richard> to 5 sig figures - i don't know how to print the
    Richard> full value in order to send it to you. Negative
    Richard> small probabilities could of course be treated as
    Richard> 0, but this creates problems when i take logs

sure, and much more of a problem, i.e., a clear bug. --> so after all your bug report was well valid [-> CC'ed back to R-bugs]

To see it more extremely, try the following :

> curve(pchisq(x, df=1 , lower=FALSE), 65, 70, ylim=c(-1,4)*4e-16, col=2)
> curve(pchisq(x, df=1, ncp=0, lower=FALSE), 65, 70, add=TRUE)

The reason for this behavior is simply that internal pnchisq(*, lower=FALSE) at the moment simply is equivalent to 1 - pnchisq(*, lower=TRUE), and since the computer epsilon is 2e-16 it's no wonder that cancellation swamps everything for these extreme abscissa values.

Note that we could easily change the case 'ncp=0' to use the central chisq, but that's not the case for e.g. ncp = 0.001.

==> pchisq(*, ncp) definitely needs to be improved.

I have code to do it (using "Wiener germ approximations"), but not finished testing it yet.

    Richard> If you like i can send you the data and program
    Richard> used to generate the problem - the chi-squared
    Richard> value is generated from a call to glm()

    Richard> Richard

    >>>>>>> "Richard" == Richard Mott <rmott@well.ox.ac.uk> on
    >>>>>>> Sat, 17 Jul 2004 01:42:18 +0100 writes:
    >>
    Richard> Martin - thanks - so is it always better to use
    Richard> pchisq(67.60644,df=1,lower.tail=F) if indeed ncp=0     Richard> ?
    >>  Yes, (for the algorithms currently in use; and as I
    >> said, these are destined to be improved).
    >> 
    >> However I do wonder: In which situations would it matter
    >> to have P = 2e-16 vs P = 3e-15 ??
    >> 
    >> Very often everything depends on underlying model
    >> assumptions, and such extreme tail probabilities are
    >> typically extremely dependent, i.e. would vary heavily by
    >> small changes in the underlying model.
    >> 
    >> Regards, Martin

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel Received on Sat Jul 17 21:26:49 2004

This archive was generated by hypermail 2.1.8 : Wed 03 Nov 2004 - 22:45:02 EST