Re: [R] normality tests [Broadcast]

From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
Date: Fri, 25 May 2007 17:31:41 -0500

Cody_Hamilton_at_Edwards.com wrote:
> Following up on Frank's thought, why is it that parametric tests are so
> much more popular than their non-parametric counterparts? As
> non-parametric tests require fewer assumptions, why aren't they the
> default? The asymptotic relative efficiency of the Wilcoxon test compared
> to the t-test is 0.955 when the data are normal, and yet I still see
> t-tests in the medical literature all the time. Granted, the Wilcoxon
> still requires the assumption of symmetry (I'm curious as to why the
> Wilcoxon is often used when asymmetry is suspected, since the Wilcoxon
> assumes symmetry), but that's less stringent than requiring normally
> distributed data. In a similar vein, one usually sees the mean and
> standard deviation reported as summary statistics for a continuous
> variable; these are not very informative unless you assume the variable
> is normally distributed. However, clinicians often insist that I include
> these figures in reports.
>
> Cody Hamilton, PhD
> Edwards Lifesciences
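For reference, 0.955 is the Pitman asymptotic relative efficiency (ARE) of the two-sample Wilcoxon (Mann-Whitney) test relative to the t-test for a location shift, evaluated at normal data. The standard shift-alternative formula, for a continuous density f with variance σ², shows where the number comes from. The same formula gives 1.5 for double-exponential data, and over all continuous f the ARE never falls below 108/125 ≈ 0.864.

```latex
\begin{aligned}
\mathrm{ARE}(W, t) &= 12\,\sigma^2 \left( \int_{-\infty}^{\infty} f(x)^2 \, dx \right)^2 \\
f \text{ normal} \;\Rightarrow\; \int_{-\infty}^{\infty} f(x)^2 \, dx &= \frac{1}{2\sigma\sqrt{\pi}} \\
\mathrm{ARE}(W, t) &= \frac{12\,\sigma^2}{4\pi\sigma^2} = \frac{3}{\pi} \approx 0.955
\end{aligned}
```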

Well said, Cody. I just want to add that the Wilcoxon test does not assume symmetry if you are interested in testing for stochastic ordering and not just for a mean.

Frank
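Frank's point can be made concrete for the two-sample (rank-sum, Mann-Whitney) form of the test; the symmetry assumption Cody mentions belongs to the one-sample signed-rank version. The rank-sum statistic U, divided by n_x * n_y, estimates P(X > Y) + 0.5 * P(X = Y), so it targets stochastic ordering directly rather than a difference in means. The Python sketch below uses illustrative function names (not from any R or Python library) and computes that quantity two ways, by counting pairs and from midranks, to show they agree.

```python
from itertools import product


def concordance_pairwise(x, y):
    """Estimate P(X > Y) + 0.5 * P(X == Y) by comparing every (x, y) pair."""
    score = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            score += 1.0
        elif xi == yj:
            score += 0.5
    return score / (len(x) * len(y))


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def concordance_from_ranks(x, y):
    """Same quantity via the rank sum: U = R_x - n_x(n_x + 1)/2, divided by n_x * n_y."""
    ranks = midranks(list(x) + list(y))
    n_x, n_y = len(x), len(y)
    u = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    return u / (n_x * n_y)


x = [1, 2, 3]
y = [0, 2]
print(concordance_pairwise(x, y), concordance_from_ranks(x, y))  # 0.75 0.75
```

Nothing in either calculation refers to the shape of the distributions, which is why the interpretation as a test of P(X > Y) = 0.5 does not need symmetry.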

>
> Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
> Sent by: r-help-bounces_at_stat.math.ethz.ch
> 05/25/2007 02:42 PM
>
> To: "Lucke, Joseph F" <Joseph.F.Lucke_at_uth.tmc.edu>
> cc: r-help <r-help_at_stat.math.ethz.ch>
> Subject: Re: [R] normality tests [Broadcast]
>
> Lucke, Joseph F wrote:

>> Most standard tests, such as t-tests and ANOVA, are fairly resistant to
>> non-normality for significance testing. It's the sample means that have
>> to be normal, not the data. The CLT kicks in fairly quickly. Testing
>> for normality prior to choosing a test statistic is generally not a good
>> idea.

>
> I beg to differ, Joseph. I have had many datasets in which the CLT was
> of no use whatsoever, i.e., where bootstrap confidence limits were
> asymmetric because the data were so skewed, and where symmetric
> normality-based confidence intervals had bad coverage in both tails
> (though correct on the average). I see this the opposite way:
> nonparametric tests work fine if normality holds.
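A small simulation illustrates the kind of tail imbalance described above. It is a sketch under assumed conditions, not a reconstruction of the datasets Frank mentions: lognormal(0, 1) data, n = 20, the ordinary 95% t interval with a hard-coded t quantile, and an arbitrary seed. It counts how often the interval lies entirely below versus entirely above the true mean exp(σ²/2).

```python
import math
import random
import statistics

N = 20
T_CRIT = 2.093  # 97.5th percentile of Student's t with N - 1 = 19 df


def t_interval_misses(reps=10000, sigma=1.0, seed=1):
    """Return (share of intervals entirely below the true mean,
    share entirely above it) for 95% t intervals from lognormal(0, sigma) samples."""
    rng = random.Random(seed)
    true_mean = math.exp(sigma ** 2 / 2)  # mean of a lognormal(0, sigma) variable
    below = above = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(0.0, sigma) for _ in range(N)]
        centre = statistics.fmean(sample)
        half_width = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
        if centre + half_width < true_mean:
            below += 1
        elif centre - half_width > true_mean:
            above += 1
    return below / reps, above / reps


miss_below, miss_above = t_interval_misses()
print(f"interval below true mean: {miss_below:.3f}, above: {miss_above:.3f}")
```

With right-skewed data like this, the misses are expected to pile up on one side (the interval sits too low), so the two tail error rates are far from the 2.5% each that a symmetric interval promises.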
>
> Note that the CLT helps with type I error but not so much with type II
> error.
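The type II error point can be sketched the same way. Assumptions (all illustrative): two groups of n = 20 lognormal(0, 1) observations, the second multiplied by exp(shift); a pooled two-sample t-test with a hard-coded t(38) cutoff; and the Wilcoxon rank-sum test via its large-sample normal approximation rather than exact p-values. With shift = 0 both tests should reject close to the nominal 5%; with a real shift, the rank test is expected to reject more often.

```python
import math
import random
import statistics

N = 20            # observations per group
T_CRIT = 2.024    # 97.5th percentile of Student's t with 2N - 2 = 38 df
Z_CRIT = 1.960    # 97.5th percentile of the standard normal


def pooled_t(x, y):
    """Classical equal-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def rank_sum_z(x, y):
    """Wilcoxon rank-sum statistic standardised by its large-sample mean and variance.
    Assumes no ties, which holds with probability one for continuous data."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    r_x = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    nx, ny = len(x), len(y)
    u = r_x - nx * (nx + 1) / 2
    return (u - nx * ny / 2) / math.sqrt(nx * ny * (nx + ny + 1) / 12)


def rejection_rates(shift, reps=4000, seed=2):
    """Share of two-sided 5% rejections for (t-test, rank-sum test)."""
    rng = random.Random(seed)
    rej_t = rej_w = 0
    for _ in range(reps):
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(N)]
        y = [rng.lognormvariate(shift, 1.0) for _ in range(N)]
        rej_t += abs(pooled_t(x, y)) > T_CRIT
        rej_w += abs(rank_sum_z(x, y)) > Z_CRIT
    return rej_t / reps, rej_w / reps


for shift in (0.0, 0.8):
    print(shift, rejection_rates(shift))
```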
>
> Frank
>
>> -----Original Message-----
>> From: r-help-bounces_at_stat.math.ethz.ch
>> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of Liaw, Andy
>> Sent: Friday, May 25, 2007 12:04 PM
>> To: gatemaze_at_gmail.com; Frank E Harrell Jr
>> Cc: r-help
>> Subject: Re: [R] normality tests [Broadcast]
>>
>> From: gatemaze_at_gmail.com
>>> On 25/05/07, Frank E Harrell Jr <f.harrell_at_vanderbilt.edu> wrote:
>>>> gatemaze_at_gmail.com wrote:
>>>>> Hi all,
>>>>>
>>>>> apologies for seeking advice on a general stats question. I've run
>>>>> normality tests using 8 different methods:
>>>>> - Lilliefors
>>>>> - Shapiro-Wilk
>>>>> - Robust Jarque Bera
>>>>> - Jarque Bera
>>>>> - Anderson-Darling
>>>>> - Pearson chi-square
>>>>> - Cramer-von Mises
>>>>> - Shapiro-Francia
>>>>>
>>>>> All show that the null hypothesis that the data come from a normal
>>>>> distro cannot be rejected. Great. However, I don't think it looks
>>>>> nice to report the values of 8 different tests in a report. One note
>>>>> is that my sample size is really tiny (fewer than 20 independent
>>>>> cases). Without wanting to start a flame war, is there any advice on
>>>>> which one(s) would be more appropriate and should be reported (along
>>>>> with a Q-Q plot)? Thank you.
>>>>>
>>>>> Regards,
>>>>>
>>>> Wow - I have so many concerns with that approach that it's hard to know
>>>> where to begin. But first of all, why care about normality? Why not
>>>> use distribution-free methods?
>>>>
>>>> You should examine the power of the tests for n=20.  You'll probably
>>>> find it's not good enough to reach a reliable conclusion.
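Checking power at n = 20, as suggested, takes only a few lines. This sketch assumes the Jarque-Bera test (one of the eight listed) with its asymptotic chi-square(2) cutoff of 5.991; the alternatives, replication count and seed are illustrative choices. A uniform distribution is plainly non-normal, yet with 20 observations the statistic rarely, if ever, exceeds the cutoff, which is why a non-significant result at this sample size is weak evidence of normality.

```python
import random
import statistics

CHI2_CRIT = 5.991  # 95th percentile of chi-square with 2 df


def jarque_bera(sample):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with moment-based skewness S and kurtosis K."""
    n = len(sample)
    m = statistics.fmean(sample)
    m2 = sum((v - m) ** 2 for v in sample) / n
    m3 = sum((v - m) ** 3 for v in sample) / n
    m4 = sum((v - m) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


def rejection_rate(draw, n=20, reps=5000, seed=3):
    """Share of simulated samples of size n for which JB exceeds the 5% cutoff."""
    rng = random.Random(seed)
    hits = sum(jarque_bera([draw(rng) for _ in range(n)]) > CHI2_CRIT for _ in range(reps))
    return hits / reps


print("normal:     ", rejection_rate(lambda r: r.gauss(0.0, 1.0)))
print("uniform:    ", rejection_rate(lambda r: r.uniform(0.0, 1.0)))
print("exponential:", rejection_rate(lambda r: r.expovariate(1.0)))
```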
>>> And wouldn't it be even worse if I used non-parametric tests?
>> I believe what Frank meant was that it's probably better to use a
>> distribution-free procedure to do the real test of interest (if there is
>> one) instead of testing for normality, and then use a test that assumes
>> normality.
>>
>> I guess the question is, what exactly do you want to do with the outcome
>> of the normality tests? If those are going to be used as the basis for
>> deciding which test(s) to do next, then I concur with Frank's
>> reservation.
>>
>> Generally speaking, I do not find goodness-of-fit tests for distributions very
>> useful, mostly for the reason that failure to reject the null is no
>> evidence in favor of the null.  It's difficult for me to imagine why
>> "there's insufficient evidence to show that the data did not come from a
>> normal distribution" would be interesting.
>>
>> Andy
>>
>>
>>>> Frank
>>>>
>>>>
>>>> --
>>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>>                      Department of Biostatistics   Vanderbilt University
>>>
>>> --
>>> yianni
>>>
>>> ______________________________________________
>>> R-help_at_stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>
>>

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Received on Fri 25 May 2007 - 22:45:24 GMT
