From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Sat 06 Jan 2007 - 08:59:12 GMT


On Sat, 6 Jan 2007, gilbertg@musc.edu wrote:

> I apologize for this post. I am new to R (two days) and I have tried and
> tried to calculate confidence intervals for medians. Can someone help
> me?

Later, you say you want a confidence interval for a difference in medians, which is not the same thing.

For medians, see MASS4 section 5.7 for worked examples and discussion of the pitfalls.

> Here is my data:
>
> institution1
> 0.21
> 0.16
> 0.32
> 0.69
> 1.15
> 0.9
> 0.87
> 0.87
> 0.73
>
> The first four observations compose group 1 and observations 5 through 9
> compose group 2. I would like to create a bootstrapped 90% confidence
> interval on the difference of the medians (n2-n1). I have successfully
> calculated a permutation test.
>
> This shouldn't be as difficult as I am making it, would someone please
> enlighten me?

It seems to me to be much more difficult than you have made it. We need to know exactly what you mean by

> a bootstrapped 90% confidence interval on the difference of the medians

The 'standard' theory of bootstrap confidence intervals as implemented in e.g. package 'boot' is for a single-sample problem (and it would be pushing its justification very hard to use this for n=9). But you have two samples, and haven't told us how you intend to bootstrap. I guess you mean a stratified bootstrap, sampling with replacement independently from observations 1-4 and 5-9. I don't know of theory for bootstrap confidence intervals from that scenario: do you?

Beyond this, there are considerable problems with bootstrapping medians in small samples as the median is a non-smooth function of the data and the bootstrap samples take very few values. See for example the galaxies dataset as discussed in MASS4. For the stratified bootstrapping I referred to, there are only a handful of possible values of each of the medians and so the bootstrap distribution is a highly non-uniform one on a few values. E.g.

x <- c(0.21, 0.16, 0.32, 0.69)
y <- c(1.15, 0.9, 0.87, 0.87, 0.73)
z <- numeric(10000)
for(i in seq_len(10000))
    z[i] <- median(sample(x, replace=TRUE)) - median(sample(y, replace=TRUE))
table(z)

  -0.99 -0.965  -0.94  -0.91 -0.885  -0.83  -0.74 -0.725 -0.715  -0.71
     33     70     83     27    134     64    129     16    259    317
   -0.7  -0.69 -0.685  -0.66 -0.645 -0.635  -0.63 -0.605  -0.58  -0.57
     43    370    711   1064     70    538    455   1388    424     29
  -0.55 -0.545  -0.52  -0.49 -0.475 -0.465  -0.46  -0.45 -0.445  -0.42
    905     57     79     41     54    119     28    183    146    436
  -0.41 -0.395 -0.365 -0.305  -0.28 -0.225  -0.21  -0.18  -0.04
    100    290    759     13     34     64    117    323     28
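The small number of distinct values in the simulated counts above can be checked by enumerating every possible resample instead of sampling at random. What follows is only a minimal sketch in base R, not part of the original analysis: the helper all_resamples() is a hypothetical name introduced here, and rounding to three decimals is an assumption made to avoid floating-point noise when comparing values.

```r
x <- c(0.21, 0.16, 0.32, 0.69)
y <- c(1.15, 0.9, 0.87, 0.87, 0.73)

# Hypothetical helper: all n^n equally likely resamples of v, one per row.
all_resamples <- function(v) {
    idx <- as.matrix(expand.grid(rep(list(seq_along(v)), length(v))))
    matrix(v[idx], nrow = nrow(idx))
}

med_x <- apply(all_resamples(x), 1, median)  # 4^4 = 256 resamples
med_y <- apply(all_resamples(y), 1, median)  # 5^5 = 3125 resamples

# Distinct values each bootstrap median can take (rounded, see above).
vals_x <- sort(unique(round(med_x, 3)))
vals_y <- sort(unique(round(med_y, 3)))

# Distinct values of the difference median(x*) - median(y*).
diff_vals <- sort(unique(round(outer(vals_x, vals_y, "-"), 3)))

length(vals_x)     # 10
length(vals_y)     # 4
length(diff_vals)  # 39
```

So the difference can take at most 39 distinct values for these data, which agrees with the number of columns in the simulated counts.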

You could use that table to give 'basic' or 'percentile' confidence intervals, if you have reason to believe in them.
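As a rough sketch of what reading such intervals off the bootstrap distribution might look like, the code below recomputes z as above and takes empirical quantiles. Several things here are assumptions rather than part of the original post: the 90% level (taken from the question), the seed, the names theta_hat, perc_ci and basic_ci, and the use of the default interpolation in quantile(), which may differ slightly from what package 'boot' computes. With so few distinct values in z, the endpoints are necessarily coarse.

```r
x <- c(0.21, 0.16, 0.32, 0.69)
y <- c(1.15, 0.9, 0.87, 0.87, 0.73)

set.seed(1)  # assumption: fixed seed, only for reproducibility
B <- 10000
z <- numeric(B)
for (i in seq_len(B))
    z[i] <- median(sample(x, replace = TRUE)) - median(sample(y, replace = TRUE))

# Observed difference in medians (group 1 minus group 2, as in z).
theta_hat <- median(x) - median(y)

alpha <- 0.10  # assumption: 90% interval, as asked for in the question
q <- quantile(z, c(alpha / 2, 1 - alpha / 2), names = FALSE)

# 'Percentile' interval: the bootstrap quantiles themselves.
perc_ci <- q

# 'Basic' interval: bootstrap quantiles reflected about the estimate.
basic_ci <- 2 * theta_hat - rev(q)

perc_ci
basic_ci
```

Note that z is group 1 minus group 2; for the difference in the other direction (n2-n1), negate both endpoints and swap them. Whether either interval is trustworthy for samples this small is a separate question.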

> Greg Gilbert, Faculty Research Associate
> Department of Biostatistics, Bioinformatics, & Epidemiology
> Medical University of South Carolina

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sat Jan 06 20:05:08 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Sat 06 Jan 2007 - 11:30:25 GMT.
