From: Tony Plate <tplate_at_blackmesacapital.com>

Date: Wed 23 Feb 2005 - 09:14:00 EST

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed Feb 23 08:21:57 2005

Date: Wed 23 Feb 2005 - 09:14:00 EST

Thanks for the diagnosis.

The reason I came across this was that I use both S-PLUS and R and I often use the results of quantile() as the breaks for cut(). In S-PLUS, cut() stops with an error if breaks has any decreasing values. Thus this example caused an S-PLUS function to unexpectedly stop with an error. However, cut() in R behaves differently: it sorts its breaks and thus does not object to decreasing values in breaks. Another difference is that cut() in R stops with an error if any breaks are duplicated, which, I guess, means that in R I should use findInterval() instead of cut() for this task. Except that findInterval() in R stops with an error if its breaks are unsorted...

> findInterval(x2, quantile(x2, (0:5)/5)) Error in findInterval(x2, quantile(x2, (0:5)/5)) :

'vec' must be sorted non-decreasingly >

- Tony Plate

At Tuesday 02:36 PM 2/22/2005, Duncan Murdoch wrote:

>On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate

*><tplate@blackmesacapital.com> wrote :
**>
**> >Is it a bug that quantile() can return a lower sample quantile for a higher
**> >probability?
**> >
**> > > ##### quantile returns decreasing results with increasing probs (data at
**> >the end of the message)
**> > > quantile(x2, (0:5)/5)
**> > 0% 20% 40% 60% 80%
**> >-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
**> > 100%
**> > 0.2905596324
**> > > ##### the 40% quantile has a lower value than the 20% quantile
**> > > diff(quantile(x2, (0:5)/5))
**> > 20% 40% 60% 80% 100%
**> > 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01
**> > >
**> >
**> >This only happens for type=7:
**> >
**> > > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
**> >type=type))<0), "\n")
**> >1 FALSE
**> >2 FALSE
**> >3 FALSE
**> >4 FALSE
**> >5 FALSE
**> >6 FALSE
**> >7 TRUE
**> >8 FALSE
**> >9 FALSE
**> > >
**> >
**> >I know this is at the limits of machine precision, but it still seems
**> >wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on
**> >my machine (according to the R quantile documentation, the S-PLUS
**> >calculations correspond to type=7).
**>
**>I'd say it's not a bug in that rounding error is something you should
**>expect in a calculation like this, but it does look wrong. And it
**>isn't only restricted to type 7. If you make a vector of copies of
**>that bad value, you'll see it in more cases:
**>
**> > x <- rep(-0.00090419678460984, 602)
**> > for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,
**>+ type=type))<0), "\n")
**>1 FALSE
**>2 FALSE
**>3 FALSE
**>4 FALSE
**>5 TRUE
**>6 TRUE
**>7 TRUE
**>8 FALSE
**>9 TRUE
**>
**>(at least on Windows). What's happening is that R is doing linear
**>interpolation between two equal values, and not getting the same value
**>back, because of rounding.
**>
**>The offending line appears to be this one:
**>
**>qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
**>
**>The equivalent calculation in the approx function (which doesn't
**>appear to have this problem) is
**>
**>qs[i] + (x[hi[i]] - qs[i]) * h
**>
**>Can anyone think of why this would not be better? (The same sort of
**>calculation shows up again later in quantile().)
**>
**>Duncan Murdoch
*

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed Feb 23 08:21:57 2005

*
This archive was generated by hypermail 2.1.8
: Fri 18 Mar 2005 - 09:02:58 EST
*