From: Duncan Murdoch <murdoch_at_stats.uwo.ca>

Date: Wed 23 Feb 2005 - 08:36:20 EST

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed Feb 23 07:43:30 2005

Date: Wed 23 Feb 2005 - 08:36:20 EST

On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate <tplate@blackmesacapital.com> wrote :

>Is it a bug that quantile() can return a lower sample quantile for a higher

*>probability?
**>
**> > ##### quantile returns decreasing results with increasing probs (data at
**>the end of the message)
**> > quantile(x2, (0:5)/5)
**> 0% 20% 40% 60% 80%
**>-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
**> 100%
**> 0.2905596324
**> > ##### the 40% quantile has a lower value than the 20% quantile
**> > diff(quantile(x2, (0:5)/5))
**> 20% 40% 60% 80% 100%
**> 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01
**> >
**>
**>This only happens for type=7:
**>
**> > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
**>type=type))<0), "\n")
**>1 FALSE
**>2 FALSE
**>3 FALSE
**>4 FALSE
**>5 FALSE
**>6 FALSE
**>7 TRUE
**>8 FALSE
**>9 FALSE
**> >
**>
**>I know this is at the limits of machine precision, but it still seems
**>wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on
**>my machine (according to the R quantile documentation, the S-PLUS
**>calculations correspond to type=7).
*

I'd say it's not a bug in that rounding error is something you should expect in a calculation like this, but it does look wrong. And it isn't only restricted to type 7. If you make a vector of copies of that bad value, you'll see it in more cases:

*> x <- rep(-0.00090419678460984, 602)
*

> for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,

+ type=type))<0), "\n")

**1 FALSE
**

**2 FALSE
**

**3 FALSE
**

**4 FALSE
**

**5 TRUE
**

**6 TRUE
**

**7 TRUE
**

**8 FALSE
**

**9 TRUE
**
(at least on Windows). What's happening is that R is doing linear
interpolation between two equal values, and not getting the same value
back, because of rounding.

The offending line appears to be this one:

qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])

The equivalent calculation in the approx function (which doesn't appear to have this problem) is

qs[i] + (x[hi[i]] - qs[i]) * h

Can anyone think of why this would not be better? (The same sort of calculation shows up again later in quantile().)

Duncan Murdoch

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed Feb 23 07:43:30 2005

*
This archive was generated by hypermail 2.1.8
: Fri 18 Mar 2005 - 09:02:58 EST
*