Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed 23 Feb 2005 - 08:51:51 EST

On Tue, 22 Feb 2005, Duncan Murdoch wrote:

> On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
> <tplate@blackmesacapital.com> wrote :
>
>> Is it a bug that quantile() can return a lower sample quantile for a higher
>> probability?
>>
>>> ##### quantile returns decreasing results with increasing probs (data at
>> the end of the message)
>>> quantile(x2, (0:5)/5)
>> 0% 20% 40% 60% 80%
>> -0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
>> 100%
>> 0.2905596324
>>> ##### the 40% quantile has a lower value than the 20% quantile
>>> diff(quantile(x2, (0:5)/5))
>> 20% 40% 60% 80% 100%
>> 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01
>>>
>>
>> This only happens for type=7:
>>
>>> for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
>> type=type))<0), "\n")
>> 1 FALSE
>> 2 FALSE
>> 3 FALSE
>> 4 FALSE
>> 5 FALSE
>> 6 FALSE
>> 7 TRUE
>> 8 FALSE
>> 9 FALSE
>>>
>>
>> I know this is at the limits of machine precision, but it still seems
>> wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on
>> my machine (according to the R quantile documentation, the S-PLUS
>> calculations correspond to type=7).
>
> I'd say it's not a bug in that rounding error is something you should
> expect in a calculation like this, but it does look wrong. And it
> isn't only restricted to type 7. If you make a vector of copies of
> that bad value, you'll see it in more cases:
>
>> x <- rep(-0.00090419678460984, 602)
>> for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,
> + type=type))<0), "\n")
> 1 FALSE
> 2 FALSE
> 3 FALSE
> 4 FALSE
> 5 TRUE
> 6 TRUE
> 7 TRUE
> 8 FALSE
> 9 TRUE
>
> (at least on Windows). What's happening is that R is doing linear
> interpolation between two equal values, and not getting the same value
> back, because of rounding.
>
> The offending line appears to be this one:
>
> qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
>
> The equivalent calculation in the approx function (which doesn't
> appear to have this problem) is
>
> qs[i] + (x[hi[i]] - qs[i]) * h
>
> Can anyone think of why this would not be better? (The same sort of
> calculation shows up again later in quantile().)

Infinities, where arithmetic is not distributive.

It is done that way to interpolate correctly between Inf and Inf: the second version gives NaN, and that is something that was a problem with the first version of the update to give all those types. I am pretty sure there is a regression test for this.

If you really want to avoid this, I think you need to pre-test for equality.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Feb 23 08:02:51 2005

This archive was generated by hypermail 2.1.8 : Wed 23 Feb 2005 - 08:32:29 EST