From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Wed 23 Feb 2005 - 08:51:51 EST

Date: Wed 23 Feb 2005 - 08:51:51 EST

On Tue, 22 Feb 2005, Duncan Murdoch wrote:

> On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate

*> <tplate@blackmesacapital.com> wrote :
**>
**>> Is it a bug that quantile() can return a lower sample quantile for a higher
**>> probability?
**>>
**>>> ##### quantile returns decreasing results with increasing probs (data at
**>> the end of the message)
**>>> quantile(x2, (0:5)/5)
**>> 0% 20% 40% 60% 80%
**>> -0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
**>> 100%
**>> 0.2905596324
**>>> ##### the 40% quantile has a lower value than the 20% quantile
**>>> diff(quantile(x2, (0:5)/5))
**>> 20% 40% 60% 80% 100%
**>> 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01
**>>>
**>>
**>> This only happens for type=7:
**>>
**>>> for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
**>> type=type))<0), "\n")
**>> 1 FALSE
**>> 2 FALSE
**>> 3 FALSE
**>> 4 FALSE
**>> 5 FALSE
**>> 6 FALSE
**>> 7 TRUE
**>> 8 FALSE
**>> 9 FALSE
**>>>
**>>
**>> I know this is at the limits of machine precision, but it still seems
**>> wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on
**>> my machine (according to the R quantile documentation, the S-PLUS
**>> calculations correspond to type=7).
**>
**> I'd say it's not a bug in that rounding error is something you should
**> expect in a calculation like this, but it does look wrong. And it
**> isn't only restricted to type 7. If you make a vector of copies of
**> that bad value, you'll see it in more cases:
**>
**>> x <- rep(-0.00090419678460984, 602)
**>> for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,
**> + type=type))<0), "\n")
**> 1 FALSE
**> 2 FALSE
**> 3 FALSE
**> 4 FALSE
**> 5 TRUE
**> 6 TRUE
**> 7 TRUE
**> 8 FALSE
**> 9 TRUE
**>
**> (at least on Windows). What's happening is that R is doing linear
**> interpolation between two equal values, and not getting the same value
**> back, because of rounding.
**>
**> The offending line appears to be this one:
**>
**> qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
**>
**> The equivalent calculation in the approx function (which doesn't
**> appear to have this problem) is
**>
**> qs[i] + (x[hi[i]] - qs[i]) * h
**>
**> Can anyone think of why this would not be better? (The same sort of
**> calculation shows up again later in quantile().)
*

Infinities, where arithmetic is not distributive.

It is done that way to interpolate correctly between Inf and Inf: the second version gives NaN, and that is something that was a problem with the first version of the update to give all those types. I am pretty sure there is a regression test for this.

If you really want to avoid this, I think you need to pre-test for equality.

-- Brian D. Ripley, ripley@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-develReceived on Wed Feb 23 08:02:51 2005

*
This archive was generated by hypermail 2.1.8
: Wed 23 Feb 2005 - 08:32:29 EST
*