Re: R-beta: CI for median in function boxplot

Martin Maechler (maechler@stat.math.ethz.ch)
Mon, 6 Apr 1998 10:21:15 +0200


Date: Mon, 6 Apr 1998 10:21:15 +0200
Message-Id: <199804060821.KAA00495@sophie.ethz.ch>
From: Martin Maechler <maechler@stat.math.ethz.ch>
To: p.dalgaard@biostat.ku.dk
Subject: Re: R-beta: CI for median in function boxplot

>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:

    PD> Rick White <rick@stat.ubc.ca> writes:
    >>  I noticed that boxplot computes a 95% CI for the median by using
    >> median +/- 1.58*IQR/sqrt(n)
    >> 
    >> Where does the 1.58 constant come from?
    >> 

    PD> Search me... However, wouldn't it be better in any case to do an
    PD> exact 95% CI based on the binomial distribution? Of course, you
    PD> need at least 6 observations to do that.
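Peter's exact interval can be built from order statistics. The interval (x_(k), x_(n+1-k)) covers the population median with probability 1 - 2 P(B <= k-1), where B ~ Binomial(n, 1/2). The following is a minimal Python sketch that assumes continuous data. It also confirms that n = 6 is the smallest sample size for which a 95% interval exists. The function names are hypothetical and are not part of R.

```python
from math import comb


def binom_cdf_half(k, n):
    """P(B <= k) for B ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


def exact_median_ci(x, conf=0.95):
    """Distribution-free CI for the median from order statistics.

    Returns (x_(k), x_(n+1-k)) using 1-based order statistics, for the largest k
    whose coverage 1 - 2 * P(B <= k - 1) is still >= conf. Returns None if even
    the full range (k = 1) falls short of conf.
    """
    xs = sorted(x)
    n = len(xs)
    best = None
    for k in range(1, n // 2 + 1):
        coverage = 1 - 2 * binom_cdf_half(k - 1, n)
        if coverage >= conf:
            best = k
        else:
            break  # coverage only shrinks as k grows
    if best is None:
        return None
    return xs[best - 1], xs[n - best]
```

For n = 5, even (min, max) has coverage of only 1 - 2/32 = 0.9375. That is why at least 6 observations are needed.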

No, please not yet another definition of the boxplot!
People looking at boxplots should be able to rely on their knowledge of
what a boxplot is.

I don't know the exact history; in any case,
John Tukey devised the boxplot, including the notches, 
and ``1.58 is THE number''.

A very accessible reference on how 1.58 was derived is
Section 3.12, pp. 79--81, of
@Book{VelPH81,
  author = 	{Paul F. Velleman and David C. Hoaglin},
  title = 	{Applications, Basics, and Computing of Exploratory
		  Data Analysis},
  publisher = 	{Duxbury Press, Boston, Massachusetts},
  year = 	1981
}

Here is a ``compact'' summary (if you really want to know ...):

When comparing two normal populations, there are two extreme cases:
in the first, the variances are about equal;
in the other, one variance is much larger than the other.
The corresponding z-tests reject equality of the means when
	abs(mean(x1) - mean(x2)) > 1.96 sqrt(2) sigma_xbar
and
	abs(mean(x1) - mean(x2)) > 1.96         sigma_xbar (the big one),
respectively.

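Two intervals mean(x1) +/- h1 and mean(x2) +/- h2 fail to overlap exactly
when abs(mean(x1) - mean(x2)) > h1 + h2.  With equal half-widths, the first
test therefore corresponds to a CI of
	mean(x) +/- 1.96 sqrt(2) / 2 sigma_xbar =
    =	mean(x) +/- 1.39 sigma_xbar,
while in the second, where the small interval is negligible, the big one must have
	mean(x1) +/- 1.96 sigma_xbar(x1), and likewise for x2.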

An omnibus compromise factor is  (1.39 + 1.96) / 2 ~= 1.7
[``exact'' would be   qnorm(.975)*(1 + sqrt(2)/2)/2 = 1.672934].
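Here is the arithmetic for the two extreme cases and their compromise, written as a small Python sketch. It uses only the standard library, replaces qnorm with statistics.NormalDist().inv_cdf, and expresses everything in units of sigma_xbar. All names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # qnorm(.975), about 1.96


def intervals_overlap(m1, h1, m2, h2):
    """Intervals m1 +/- h1 and m2 +/- h2 overlap unless |m1 - m2| > h1 + h2."""
    return abs(m1 - m2) <= h1 + h2


# Case 1: equal variances. The threshold Z * sqrt(2) is split over two equal half-widths.
h_equal = Z * math.sqrt(2) / 2          # about 1.39
# Case 2: one variance dominates. The big interval alone carries the whole threshold Z.
h_dominant = Z                          # about 1.96
# Omnibus compromise between the two cases (rounded to 1.7 in the derivation).
compromise = (h_equal + h_dominant) / 2
```

With equal half-widths h_equal, two intervals stop overlapping exactly at the equal-variance z-test threshold Z * sqrt(2).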

Now, we also have
	sigma = IQR / 1.349,  [[exact:  IQR / (2*qnorm(3/4)) ]]
and, asymptotically for normal data,
	var(median) = pi/2 * var(arith.mean)
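Both facts are easy to check numerically. The following Python sketch runs a Monte Carlo check. Its sample size, number of replications and seed are arbitrary choices and are not part of the derivation. For finite n, the variance ratio is only approximately pi/2.

```python
import math
import random
import statistics
from statistics import NormalDist

# For a normal distribution, IQR = 2 * qnorm(3/4) * sigma, i.e. about 1.349 * sigma.
IQR_PER_SIGMA = 2 * NormalDist().inv_cdf(0.75)


def median_to_mean_variance_ratio(n=101, reps=4000, seed=1):
    """Monte Carlo estimate of var(median) / var(mean) for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    medians, means = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        medians.append(statistics.median(sample))
        means.append(statistics.fmean(sample))
    return statistics.variance(medians) / statistics.variance(means)


if __name__ == "__main__":
    print(f"IQR / sigma = {IQR_PER_SIGMA:.4f}")
    print(f"var(median) / var(mean) ~ {median_to_mean_variance_ratio():.3f}"
          f"  (pi/2 = {math.pi / 2:.3f})")
```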

Putting the three together:

	"notch length" =  (IQR/1.349) * sqrt(pi/2) * (1.7 / sqrt(n)) =
		       =  1.58  * IQR / sqrt(n),

i.e. 1.58 = sqrt(pi/2)*1.7/1.349  (= 1.579417)

With the ``exact'' constants instead, the value would be

1/(2*qnorm(.75))* sqrt(pi/2) * (qnorm(.975)*(1 + sqrt(2)/2)/2) =  1.554295
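Both constants can be reproduced directly. The Python sketch below uses statistics.NormalDist().inv_cdf in place of R's qnorm, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

N = NormalDist()
iqr_per_sigma = 2 * N.inv_cdf(0.75)                   # sigma = IQR / 1.349
median_se_factor = math.sqrt(math.pi / 2)             # se(median) / se(mean), asymptotically
compromise_exact = N.inv_cdf(0.975) * (1 + math.sqrt(2) / 2) / 2  # about 1.6729

rounded = median_se_factor * 1.7 / 1.349              # with the rounded inputs 1.7 and 1.349
exact = median_se_factor * compromise_exact / iqr_per_sigma

print(f"rounded inputs: {rounded:.6f}")  # about 1.5794
print(f"exact inputs:   {exact:.6f}")    # about 1.5543
```

Almost all of the gap between the two values comes from rounding the compromise factor 1.673 up to 1.7. Replacing 2*qnorm(3/4) by 1.349 barely matters.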

---
So 1.58 ``should be'' 1.554 instead,
but, of course, the main approximation is the compromise between the two
extreme situations anyway.  Rounding the compromise factor up to 1.7 gives the
slightly larger constant, which may be somewhat more realistic for long-tailed,
non-normal data.
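The following sketch shows how the constant is applied to draw a notch. It rests on one assumption: the "IQR" is taken as the spread between Tukey's hinges, computed with the same depth rule as R's fivenum(). The Python function names are hypothetical.

```python
import math


def fivenum(x):
    """Tukey's five-number summary: min, lower hinge, median, upper hinge, max."""
    xs = sorted(x)
    n = len(xs)
    n4 = math.floor((n + 3) / 2) / 2
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Average the order statistics on either side of each (possibly half-integer) depth.
    return [0.5 * (xs[math.floor(d) - 1] + xs[math.ceil(d) - 1]) for d in depths]


def notch(x, const=1.58):
    """Notch interval: median +/- const * (hinge spread) / sqrt(n)."""
    _, lower_hinge, med, upper_hinge, _ = fivenum(x)
    half_width = const * (upper_hinge - lower_hinge) / math.sqrt(len(x))
    return med - half_width, med + half_width
```

For the data 1, ..., 9 the hinges are 3 and 7, so the notch is 5 +/- 1.58*4/3, i.e. roughly (2.893, 7.107).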

----------
PS: Should the above go into the online documentation?

Martin Maechler <maechler@stat.math.ethz.ch>			<><
Seminar fuer Statistik, ETH-Zentrum SOL G1;	Sonneggstr.33
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1086
http://www.stat.math.ethz.ch/~maechler/
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._