Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Wed 23 Feb 2005 - 08:56:21 EST

On Tue, 22 Feb 2005 21:36:20 +0000, Duncan Murdoch <murdoch@stats.uwo.ca> wrote :

>On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
><tplate@blackmesacapital.com> wrote :
>
>>Is it a bug that quantile() can return a lower sample quantile for a higher
>>probability?
>>
>> > ##### quantile returns decreasing results with increasing probs (data at
>>the end of the message)
>> > quantile(x2, (0:5)/5)
>> 0% 20% 40% 60% 80%
>>-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
>> 100%
>> 0.2905596324
>> > ##### the 40% quantile has a lower value than the 20% quantile
>> > diff(quantile(x2, (0:5)/5))
>> 20% 40% 60% 80% 100%
>> 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01
>> >
>>
>>This only happens for type=7:
>>
>> > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
>>type=type))<0), "\n")
>>1 FALSE
>>2 FALSE
>>3 FALSE
>>4 FALSE
>>5 FALSE
>>6 FALSE
>>7 TRUE
>>8 FALSE
>>9 FALSE
>> >
>>
>>I know this is at the limits of machine precision, but it still seems
>>wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on
>>my machine (according to the R quantile documentation, the S-PLUS
>>calculations correspond to type=7).
>
>I'd say it's not a bug in that rounding error is something you should
>expect in a calculation like this, but it does look wrong. And it
>isn't only restricted to type 7. If you make a vector of copies of
>that bad value, you'll see it in more cases:
>
>> x <- rep(-0.00090419678460984, 602)
>> for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,
>+ type=type))<0), "\n")
>1 FALSE
>2 FALSE
>3 FALSE
>4 FALSE
>5 TRUE
>6 TRUE
>7 TRUE
>8 FALSE
>9 TRUE
>
>(at least on Windows). What's happening is that R is doing linear
>interpolation between two equal values, and not getting the same value
>back, because of rounding.
>
>The offending line appears to be this one:
>
>qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
>
>The equivalent calculation in the approx function (which doesn't
>appear to have this problem) is
>
>qs[i] + (x[hi[i]] - qs[i]) * h
>
>Can anyone think of why this would not be better? (The same sort of
>calculation shows up again later in quantile().)

Just looked at the history of this line, and it appears the code is the way it is to avoid an error if the value of the vector is infinite. For example, we now get the right answer

> x <- rep(Inf, 100)
> quantile(x, 0:5/5)

  0% 20% 40% 60% 80% 100%
 Inf Inf Inf Inf Inf Inf

but with my modification above we wouldn't:

> quantile(x, 0:5/5)

  0% 20% 40% 60% 80% 100%
 Inf NaN NaN NaN NaN Inf

Duncan



R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed Feb 23 08:11:33 2005

This archive was generated by hypermail 2.1.8 : Wed 23 Feb 2005 - 08:32:29 EST