Re: [Rd] apparently incorrect p-values from 2-sided Kolmogorov-Smirnov (PR#14178)

From: <allwrigh_at_maths.ox.ac.uk>
Date: Tue, 05 Jan 2010 13:25:14 +0100 (CET)


Dear Thomas,

Thank you, yes, that sounds good, and I take the point about integer overflow.
Two questions:
(a) Is there some way I can try out the routine with this modification? (I am on a Linux system where I am just a user, so I cannot install new versions of software myself.)
(b) Is there a reference you can give me to a published paper that describes the method used to compute the p-values?

Many thanks,
David.



On Fri, 18 Dec 2009, tlumley_at_u.washington.edu wrote:

>
>
> I've fixed this by adding 0.5/mn to q. The problem (at least in principle)
> with multiplying them all up is integer overflow.
>
> By the time 0.5/mn underflows to zero, missing one value in the distribution
> won't matter.
>
> -thomas
>
>
> On Fri, 18 Dec 2009, David John Allwright wrote:
>
>> Dear Thomas, Right, thank you. Yes, I haven't looked at the source code
>> (because I don't know C) but something like what you mention could well
>> cause the kind of problems I am seeing: a loop being executed one too few
>> or one too many times. And yes, I think those quantities should be
>> multiplied up by m*n to all become integers so we escape rounding error
>> problems. David.
>>
>> ------------------------------------------------------------------------------
>> On Wed, 16 Dec 2009, tlumley_at_u.washington.edu wrote:
>>
>>> On Tue, 15 Dec 2009, allwrigh_at_maths.ox.ac.uk wrote (in part):
>>>
>>>>
>>>> x<-1:5
>>>> y<-c(2.5,4.5)
>>>> ks.test(x,y)
>>>>
>>>> The value of the D_2,5 statistic is calculated as 0.4 correctly, but the
>>>> p-value is stated by R as 1, though in fact it should be 20/21=0.9524
>>>
>>>
>>> What we seem to have here is a rounding error problem.
>>>
>>> In ks.c:psmirnov2x, there is a double loop including
>>>     if(fabs(i / md - j / nd) > q)
>>>         u[j] = 0;
>>>
>>> where md=2, nd=5, and q=3/10.
>>>
>>> Now, to full precision abs(1/2 - 4/5) > 3/10 is false, but at least on
>>> my MacBook it is true in C double precision.
>>>
>>> I'm not sure why the loop is working with doubles, since multiplying by
>>> m*n should make everything an integer.
>>>
>>> -thomas
>>>
>>> Thomas Lumley Assoc. Professor, Biostatistics
>>> tlumley_at_u.washington.edu University of Washington, Seattle
>>>
>>>
>>>
>>
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley_at_u.washington.edu University of Washington, Seattle
>
>
>
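
The rounding problem and the two remedies discussed in the quoted messages can be reproduced outside R. The C sketch below is not the code in ks.c; the function names are hypothetical and chosen only for illustration. It takes k = floor(D*m*n - 1e-7) as an integer, so that q = k/(m*n), and evaluates the boundary test three ways: in doubles as quoted, in doubles with 0.5/(m*n) added to q, and exactly in integers after multiplying through by m*n.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration; none of these names come from ks.c.
 * In all three, k is an integer and the threshold is q = k / (m * n). */

/* The test as quoted in the thread, carried out entirely in doubles. */
int exceeds_double(int i, int j, int m, int n, int k)
{
    assert(m > 0 && n > 0);
    double md = m, nd = n;
    double q = k / (md * nd);
    return fabs(i / md - j / nd) > q;
}

/* Same test with 0.5/(m*n) added to q. |i/m - j/n| is always a multiple
 * of 1/(m*n), so the threshold now sits halfway between two attainable
 * values and small rounding errors cannot change the outcome. */
int exceeds_shifted(int i, int j, int m, int n, int k)
{
    assert(m > 0 && n > 0);
    double md = m, nd = n;
    double q = (k + 0.5) / (md * nd);
    return fabs(i / md - j / nd) > q;
}

/* Exact version: |i/m - j/n| > k/(m*n) if and only if |i*n - j*m| > k.
 * long long is used for the products to reduce the overflow risk. */
int exceeds_integer(int i, int j, int m, int n, int k)
{
    assert(m > 0 && n > 0);
    long long gap = llabs((long long) i * n - (long long) j * m);
    return gap > k;
}
```

For the reported case (m = 2, n = 5, q = 3/10, so k = 3, at i = 1, j = 4), exceeds_integer(1, 4, 2, 5, 3) and exceeds_shifted(1, 4, 2, 5, 3) both return 0. exceeds_double(1, 4, 2, 5, 3) returns 1 when the expression is evaluated in IEEE double precision, which matches what Thomas describes; with extended-precision evaluation the result may differ.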



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 05 Jan 2010 - 12:52:03 GMT

