From: Spencer Graves <spencer.graves_at_pdf.com>

Date: Tue 12 Jul 2005 - 12:08:39 EST

>>-----Original Message-----
>>From: r-help-bounces@stat.math.ethz.ch
>>[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Ted Harding
>>Sent: Monday, July 11, 2005 2:52 PM
>>To: r-help@stat.math.ethz.ch
>>Subject: Re: [R] Boxplot philosophy {was "Boxplot in R"}
>>
>>On 11-Jul-05 Martin Maechler wrote:
>>

>>>>>>>>"AdaiR" == Adaikalavan Ramasamy <ramasamy@cancer.org.uk>
>>>>>>>> on Mon, 11 Jul 2005 03:04:44 +0100 writes:
>>>
>>> AdaiR> Just an addendum on the philosophical aspect of doing
>>> AdaiR> this. By selecting the 5% and 95% quantiles, you are
>>> AdaiR> always going to get 10% of the data as "extreme" and
>>> AdaiR> these points may not necessarily be outliers. So when
>>> AdaiR> you are comparing information from multiple columns
>>> AdaiR> (i.e. boxplots), it is harder to say which column
>>> AdaiR> contains more extreme values compared to others etc.
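A minimal sketch of the point above, not taken from the original posts: the helper name `frac_outside` and the three test distributions are illustrative assumptions. It checks that whiskers placed at the 5% and 95% sample quantiles flag about 10% of the points regardless of the shape of the data, so the flagged fraction says nothing about which column is more extreme.

```r
# Fraction of points that fall outside the 5% and 95% sample quantiles.
# frac_outside is a hypothetical helper, not part of base R.
frac_outside <- function(x) {
  q <- quantile(x, c(0.05, 0.95))
  mean(x < q[1] | x > q[2])
}

set.seed(1)
samples <- list(
  normal  = rnorm(1000),        # light tails
  uniform = runif(1000),        # no tails at all
  heavy   = rt(1000, df = 2)    # very heavy tails
)

# All three fractions come out at about 0.10, by construction of the rule.
fractions <- sapply(samples, frac_outside)
print(fractions)
```

The heavy-tailed sample contains far more extreme values than the uniform one, yet the quantile rule marks the same share of points in both.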

>>>
>>>Yes, indeed!
>>>
>>>People {and software implementations} have several times provided
>>>differing definitions of how the boxplot whiskers should be defined.
>>>
>>>I strongly believe that this is very often a very bad idea!!
>>>
>>>A boxplot should be a universal means of communication, and so one
>>>should be *VERY* reluctant to redefine the outliers.
>>>
>>>I just found that Matlab (in their statistics toolbox)
>>>does *NOT* use such a silly 5% / 95% definition of the whiskers,
>>>at least not according to their documentation.
>>>That's very good (and I wonder where you, Larry, got the idea of
>>>the 5 / 95 %).
>>>Using such a fixed percentage is really a very inferior idea to
>>>John Tukey's definition {the one in use in all implementations
>>>of S (including R) probably for close to 20 years now}.
>>>
>>>I see one flaw in Tukey's definition {which is shared of course
>>>by any silly "percentage" based ``outlier'' definition}:
>>>
>>> The non-dependency on the sample size.
>>>
>>>If you have 1000 (or even many more) points,
>>>you'll get more and more `outliers' even for perfectly normal data.
>>>
>>>But then, I assume John Tukey would have told us to do more
>>>sophisticated things {maybe things like the "violin plots"} than
>>>boxplots if you have really very many data points and want
>>>to see more features -- or he would have agreed to use
>>> boxplot(*, range = monotone_slowly_growing(n) )
>>>for largish sample sizes n.
>>>
>>>Martin Maechler, ETH Zurich
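A rough sketch of the sample-size point, under stated assumptions. `monotone_slowly_growing(n)` in the post is a placeholder, not an existing R function. The helpers `p_outside` and `range_for_n` below are hypothetical, and holding the expected number of flagged points at its n = 100 level is only one possible choice, not something proposed in the thread. The calculation uses population quartiles of the standard normal, so sample-based counts will differ slightly.

```r
# Probability that a standard normal value falls outside Tukey-style fences
# placed `range` * IQR beyond the quartiles (population quartiles, not sample ones).
p_outside <- function(range = 1.5) {
  q3 <- qnorm(0.75)
  fence <- q3 + range * (2 * q3)   # the IQR of a standard normal is 2 * q3
  2 * pnorm(-fence)
}

# Expected number of flagged points grows linearly with n for a fixed range.
n <- c(100, 1000, 10000, 1e5)
expected <- data.frame(n = n, expected_flagged = n * p_outside(1.5))
print(expected)

# Hypothetical stand-in for monotone_slowly_growing(n): pick the range so that
# the expected flagged count for normal data stays at its value for n0 points.
range_for_n <- function(n, n0 = 100, range0 = 1.5) {
  n <- pmax(n, n0)                         # never go below the default range
  target <- n0 * p_outside(range0)         # expected count held fixed
  q3 <- qnorm(0.75)
  fence <- qnorm(target / (2 * n), lower.tail = FALSE)
  (fence - q3) / (2 * q3)
}

set.seed(2)
x <- rnorm(10000)
out_default  <- boxplot.stats(x, coef = 1.5)$out
out_adaptive <- boxplot.stats(x, coef = range_for_n(length(x)))$out
cat("flagged with range 1.5:     ", length(out_default), "\n")
cat("flagged with adaptive range:", length(out_adaptive), "\n")
```

The same value can be passed to the plotting function as `boxplot(x, range = range_for_n(length(x)))`. The drawback raised elsewhere in the thread still applies: a reader can no longer interpret the whiskers without being told which rule was used.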

>>
>>I happily agree with Martin's essay on Boxplot philosophy.
>>
>>It would certainly confuse boxplot watchers if the interpretation
>>of what they saw had to vary from case to case. The fact that
>>careful (and necessarily detailed) explanations of what was
>>different this time would be necessary in the text would not
>>help much, and would defeat the primary objective of the boxplot
>>which is to present a summary of features of the data in a form
>>which can be grasped visually very quickly indeed.
>>
>>I'm sure many of us have at times felt some frustration at the
>>rigidly precise numerical interpretations which Tukey imposed
>>on the elements of his many EDA techniques; but this did ensure
>>that the viewer really knew, at a glance, what he was looking at.
>>
>>EDA brilliantly combined several aspects of "looking at data":
>>selection of features of the data; highly efficient encoding of
>>these, and of their inter-relationships, into a medium directly
>>adapted to visual perception; robustness (so that the perceptions
>>were not unstable with respect to wondering just what the underlying
>>distribution might be); accessibility (in the sense of being truly
>>understood) to non-theoreticians; and capacity to be implemented on
>>primitive information technology.
>>
>>Indeed, one might say that the "core team" of EDA consists of the
>>techniques for which you need only pencil and paper.
>>
>>Nevertheless, Tukey was no rigid dogmatist. His objective was
>>always to give a good representation of the data, and he would
>>happily shift his ground, or adapt a technique (albeit probably
>>giving it a different name), or devise a new one, if that would
>>be useful for the case in hand.
>>
>>Best wishes to all,
>>Ted.

>>
>>--------------------------------------------------------------------
>>E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk>
>>Fax-to-email: +44 (0)870 094 0861
>>Date: 11-Jul-05 Time: 22:19:47
>>------------------------------ XFMail ------------------------------
>>
>>______________________________________________
>>R-help@stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide!
>>http://www.R-project.org/posting-guide.html
>>


I'll bite: How does one detect bimodality from a boxplot?

> FWIW:
>
> I have been an enthusiastic user of boxplots for decades. Of course, the
> issue of how to handle the whiskers ("outliers") is a valid one, and indeed
> sample size related. Dogma is always dangerous. I got to know John Tukey
> somewhat (I used to chauffeur him to and from meetings with a group of Merck
> statisticians), and I, too, think he would have been the first to agree that
> some flexibility here is wise.
>
> HOWEVER, the chief advantage of boxplots is their simplicity at displaying
> simultaneously and easily **several** important aspects of the data, of
> which outliers are probably the most problematic (as they often result in
> severe distortion of the plots without careful scaling). Even with dozens of
> boxplots, center, scale, and skewness are easy to discern and compare. I
> think this would NOT be true of "violin" plots and other more complex
> versions -- simplicity can be a virtue.
>
> Finally, a tidbit for boxplot aficionados: how does one detect bimodality
> from a boxplot?
>
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>
> "The business of the statistician is to catalyze the scientific learning
> process." - George E. P. Box
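The thread as archived here does not answer the bimodality question. The sketch below illustrates one reading of it, under stated assumptions: a boxplot is drawn from the five-number summary plus any flagged points, so two samples that share those numbers give the same picture whatever happens between the quartiles. The two constructed samples `x_flat` and `x_bimodal` are illustrative and do not come from the posts.

```r
# A flat sample: 201 evenly spaced values on [0, 8].
x_flat <- seq(0, 8, length.out = 201)

# A bimodal sample of the same size: two tight clusters around 2 and 6,
# a single point at 4, and the same minimum and maximum as x_flat.
x_bimodal <- c(0,
               seq(1.5, 2.0, length.out = 49),
               2,
               seq(2.0, 2.5, length.out = 49),
               4,
               seq(5.5, 6.0, length.out = 49),
               6,
               seq(6.0, 6.5, length.out = 49),
               8)

# Both samples have the same five-number summary, so boxplot() draws
# the same box and whiskers for each of them.
print(fivenum(x_flat))
print(fivenum(x_bimodal))

# Counting points in the middle of the range exposes the difference
# that the boxplot hides.
mid_flat    <- sum(x_flat > 3 & x_flat < 5)
mid_bimodal <- sum(x_bimodal > 3 & x_bimodal < 5)
cat("points in (3, 5):", mid_flat, "vs", mid_bimodal, "\n")
```

Plotting `hist(x_bimodal)` or `plot(density(x_bimodal))` next to `boxplot(x_flat, x_bimodal)` shows the two modes that the boxes alone do not.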


--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves@pdf.com
www.pdf.com <http://www.pdf.com>
Tel: 408-938-4420
Fax: 408-280-7915

Received on Tue Jul 12 12:14:52 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:33:30 EST