Re: [R] Bootstrapping Confidence Intervals for Medians

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Sat 06 Jan 2007 - 08:59:12 GMT

On Sat, 6 Jan 2007, gilbertg@musc.edu wrote:

> I apologize for this post. I am new to R (two days) and I have tried and
> tried to calculate confidence intervals for medians. Can someone help
> me?

Later, you say you want a confidence interval for a difference in medians, which is not the same thing.

For medians, see MASS4 section 5.7 for worked examples and discussion of the pitfalls.

> Here is my data:
>
> institution1
> 0.21
> 0.16
> 0.32
> 0.69
> 1.15
> 0.9
> 0.87
> 0.87
> 0.73
>
> The first four observations compose group 1 and observations 5 through 9
> compose group 2. I would like to create a bootstrapped 90% confidence
> interval on the difference of the medians (n2-n1). I have successfully
> calculated a permutation test.
>
> This shouldn't be as difficult as I am making it, would someone please
> enlighten me?
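The permutation test mentioned above is not shown in the post. As a minimal sketch of one possible version, assuming an exact test over all choose(9, 4) = 126 ways of splitting the nine values into groups of 4 and 5, with a two-sided p-value on the absolute difference of medians, base R is enough. The names obs, group and med_diff are illustrative only.

```r
## Exact permutation test for median(group 2) - median(group 1).
## Assumption: two-sided, based on |difference|; not the poster's own code.
obs   <- c(0.21, 0.16, 0.32, 0.69, 1.15, 0.9, 0.87, 0.87, 0.73)
group <- rep(1:2, c(4, 5))      # observations 1-4 form group 1, 5-9 group 2

med_diff <- function(v, g) median(v[g == 2]) - median(v[g == 1])
t_obs <- med_diff(obs, group)   # 0.87 - 0.265 = 0.605

## every way of choosing which 4 of the 9 observations form group 1
splits <- combn(9, 4)           # 4 x 126 matrix of indices
t_perm <- apply(splits, 2, function(idx) {
  g <- rep(2, 9)
  g[idx] <- 1
  med_diff(obs, g)
})

## small tolerance guards against floating-point ties with t_obs
p_two_sided <- mean(abs(t_perm) >= abs(t_obs) - 1e-12)
p_two_sided
```

Here the four smallest values all fall in group 1, so the observed difference of 0.605 is the largest any split can give. It is matched only by the 6 splits that keep 0.16, 0.21 and 0.32 together in group 1, so the p-value is 6/126, about 0.048.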

It seems to me to be much more difficult than you have made it. We need to know exactly what you mean by

> a bootstrapped 90% confidence interval on the difference of the medians

The 'standard' theory of bootstrap confidence intervals as implemented in e.g. package 'boot' is for a single-sample problem (and it would be pushing its justification very hard to use this for n=9). But you have two samples, and haven't told us how you intend to bootstrap. I guess you mean a stratified bootstrap, sampling with replacement independently from observations 1-4 and 5-9. I don't know of theory for bootstrap confidence intervals from that scenario: do you?
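If the stratified scheme is what is meant, each group is small enough that its bootstrap distribution can be enumerated exactly instead of simulated: there are 4^4 = 256 equally likely ordered resamples of group 1 and 5^5 = 3125 of group 2. A sketch in base R under that assumption follows; the helpers resample_medians and med_dist are hypothetical. (For simulation rather than enumeration, boot() in package 'boot' also accepts a strata argument for this kind of resampling.) This only computes the bootstrap distribution; it does not supply the missing theory for turning it into a confidence interval.

```r
## Exact stratified bootstrap distribution of median(x) - median(y),
## assuming x and y are resampled independently, with replacement.
x <- c(0.21, 0.16, 0.32, 0.69)        # group 1
y <- c(1.15, 0.9, 0.87, 0.87, 0.73)   # group 2

## medians of all n^n equally likely ordered resamples of v
resample_medians <- function(v) {
  n <- length(v)
  idx <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  apply(idx, 1, function(i) median(v[i]))
}

## distribution of the median as (value, probability) pairs;
## keys in thousandths avoid spurious floating-point distinctions
med_dist <- function(v) {
  m <- round(resample_medians(v) * 1000)
  p <- table(m) / length(m)
  data.frame(value = as.numeric(names(p)) / 1000, prob = as.vector(p))
}

dx <- med_dist(x)    # distinct medians of group-1 resamples
dy <- med_dist(y)    # distinct medians of group-2 resamples

## independent resamples: multiply probabilities, then pool equal differences
d     <- round(outer(dx$value, dy$value, "-") * 1000)
pr    <- outer(dx$prob, dy$prob)
exact <- tapply(as.vector(pr), as.vector(d), sum)
names(exact) <- as.numeric(names(exact)) / 1000
round(exact, 4)
```

This gives 10 possible medians for group 1, 4 for group 2, and 39 distinct differences (0.24 - 0.9 and 0.21 - 0.87 coincide at -0.66), ranging from -0.99 to -0.04.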

Beyond this, there are considerable problems with bootstrapping medians in small samples as the median is a non-smooth function of the data and the bootstrap samples take very few values. See for example the galaxies dataset as discussed in MASS4. For the stratified bootstrapping I referred to, there are only a handful of possible values of each of the medians and so the bootstrap distribution is a highly non-uniform one on a few values. E.g.

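x <- c(0.21, 0.16, 0.32, 0.69)         # group 1 (observations 1-4)
y <- c(1.15, 0.9, 0.87, 0.87, 0.73)    # group 2 (observations 5-9)
z <- numeric(10000)

## resample each group independently, with replacement (group 1 minus group 2)
for(i in seq_len(10000))
    z[i] <- median(sample(x, replace=TRUE)) - median(sample(y, replace=TRUE))
table(z)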

z
  -0.99 -0.965  -0.94  -0.91 -0.885  -0.83  -0.74 -0.725 -0.715  -0.71
     33     70     83     27    134     64    129     16    259    317
   -0.7  -0.69 -0.685  -0.66 -0.645 -0.635  -0.63 -0.605  -0.58  -0.57
     43    370    711   1064     70    538    455   1388    424     29
  -0.55 -0.545  -0.52  -0.49 -0.475 -0.465  -0.46  -0.45 -0.445  -0.42
    905     57     79     41     54    119     28    183    146    436
  -0.41 -0.395 -0.365 -0.305  -0.28 -0.225  -0.21  -0.18  -0.04
    100    290    759     13     34     64    117    323     28

You could use that table to give 'basic' or 'percentile' confidence intervals, if you have reason to believe in them.
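As a sketch of reading 90% intervals off such a simulated distribution: the percentile interval takes the 5% and 95% quantiles of the bootstrap differences directly, and the basic interval reflects them about the observed difference t0, giving [2 t0 - q_0.95, 2 t0 - q_0.05]. Assumptions: B = 10000 replicates, an arbitrary seed, and quantile type 1 (the inverse of the empirical CDF), which suits a distribution on so few values. boot.ci() in package 'boot' may interpolate differently, so its answers need not match exactly.

```r
## 90% 'percentile' and 'basic' intervals for median(x) - median(y)
## from a simulated stratified bootstrap (a sketch, not boot.ci()).
set.seed(1)                               # arbitrary seed, for reproducibility
x <- c(0.21, 0.16, 0.32, 0.69)            # group 1
y <- c(1.15, 0.9, 0.87, 0.87, 0.73)       # group 2
B <- 10000
z <- replicate(B, median(sample(x, replace = TRUE)) -
                  median(sample(y, replace = TRUE)))

t0    <- median(x) - median(y)            # observed difference, -0.605
alpha <- 0.10
q     <- quantile(z, c(alpha / 2, 1 - alpha / 2), type = 1, names = FALSE)

percentile <- q                           # [q_0.05, q_0.95]
basic      <- 2 * t0 - rev(q)             # [2 t0 - q_0.95, 2 t0 - q_0.05]
rbind(percentile, basic)
```

Since z is group 1 minus group 2, an interval for n2 - n1 as originally asked is obtained by negating and swapping the endpoints, e.g. -rev(percentile).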

> Greg Gilbert, Faculty Research Associate
> Department of Biostatistics, Bioinformatics, & Epidemiology
> Medical University of South Carolina

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat Jan 06 20:05:08 2007
