# Re: [R] CDF of Sample Quantile

From: peter dalgaard <pdalgd_at_gmail.com>
Date: Tue, 08 Mar 2011 12:23:34 +0100

On Mar 7, 2011, at 19:12 , Bentley Coffey wrote:

```> Just to tie up this thread, I wanted to report my solution:
>
> When (n-1)p is an integer, there is a closed form solution:
> pbinom(j-1,n,...)
>
> When it is not an integer, its fairly easy to approximate the solution by
> interpolating between the closed-form solutions: fitting log(1 - probability
> from closed form solution) on an orthogonal polynomial in n.  This is a
> _very_ fast and fairly accurate solution.
>
> Thanks to all who offered their help...

```

If you have too much time on your hand, Wikipedia has the joint density of two order statistics, from which you could probably proceed to find the marginal density of a linear combination of two neighboring order stats. Just take a large piece of paper and a couple of spare days.... Numerical integration might do the job, with some care.

-p

```>
> On Thu, Feb 17, 2011 at 11:11 PM, Bentley Coffey
> <bentleygcoffey_at_gmail.com>wrote:
>
>>
>> Duncan,
>>
>> I'm not sure how I missed your message. Sorry. What you describe is what I
>> do when (n-1)p is an integer so that R computes the sample quantile using a
>> single order statistic. My later posting has that exact binomial expression
>> in there as a special case. When (n-1)p is not an integer then R uses a
>> weighted average of 2 order statistics, in which case I'm left with my
>> standing problem...
>>
>>
>> On Mon, Feb 14, 2011 at 2:26 PM, Duncan Murdoch <murdoch.duncan_at_gmail.com>wrote:
>>
>>> On 14/02/2011 9:58 AM, Bentley Coffey wrote:
>>>
```

>>>> I need to calculate the probability that a sample quantile will exceed a
>>>> threshold given the size of the iid sample and the parameters describing
>>>> the
>>>> distribution of each observation (normal, in my case). I can compute the
>>>> probability with brute force simulation: simulate a size N sample, apply
>>>> R's
>>>> quantile() function on it, compare it to the threshold, replicate this
>>>> MANY
>>>> times, and count the number of times the sample quantile exceeded the
>>>> threshold (dividing by the total number of replications yields the
>>>> probability of interest). The problem is that the number of replications
>>>> required to get sufficient precision (3 digits say) is so HUGE that this
>>>> takes FOREVER. I have to perform this task so much in my script
>>>> (searching
>>>> over the sample size and repeated for several different distribution
>>>> parameters) that it takes too many hours to run.
>>>>
>>>> I've searched for pre-existing code to do this in R and haven't found
>>>> anything. Perhaps I'm missing something. Is anyone aware of an R function
>>>> to
>>>> compute this probability?
>>>>
>>>> I've tried writing my own code using the fact that R's quantile()
>>>> function
>>>> is a linear combination of 2 order statistics. Basically, I wrote down
>>>> the
>>>> mathematical form of the joint pdf for the 2 order statistics (a function
>>>> of
>>>> the sample size and the distribution parameters) then performed a
>>>> pseudo-Monte Carlo integration (i.e. using Halton Draws rather than R's
>>>> random draws) over the region where the sample quantile exceeds the
>>>> threshold. In theory, this should work and it takes about 1000 times
>>>> fewer
>>>> clock cycles to compute than the Brute Force approach. My problem is that
>>>> there is a significant discrepancy between the results using Brute Force
>>>> and
>>>> using this more efficient approach that I have coded up. I believe that
>>>> the
>>>> problem is numerical error but it could be some programming bug;
>>>> regardless,
>>>> I have been unable to locate the source of this problem and have spent
>>>> over
>>>> 20 hours trying to identify it this weekend. Please, somebody help!!!
>>>>
>>>> So, again, my question: is there code in R for quickly evaluating the CDF
>>>> of
>>>> a Sample Quantile given the sample size and the parameters governing the
>>>> distribution of each iid point in the sample?
>>>>
```>>>
>>> I think the answer to your question is no, but I think it's the wrong
>>> question.  Suppose Xm:n is the mth sample quantile in a sample of size n,
>>> and you want to calculate P(Xm:n > x).  Let X be a draw from the underlying
>>> distribution, and suppose P(X > x) = p.  Then the event Xm:n > x
>>> is the same as the event that out of n independent draws of X, at least
>>> n-m+1 are bigger than x:  a binomial probability.  And R can calculate
>>> binomial probabilities using pbinom().
>>>
>>> Duncan Murdoch
>>>
>>>
>>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.

```
```--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes_at_cbs.dk  Priv: PDalgd_at_gmail.com

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help