Re: [R] normality tests [Broadcast]

From: <Cody_Hamilton_at_edwards.com>
Date: Fri, 25 May 2007 15:28:26 -0700

Following up on Frank's thought, why are parametric tests so much more popular than their non-parametric counterparts? Since non-parametric tests require fewer assumptions, why aren't they the default? The asymptotic relative efficiency of the Wilcoxon test compared to the t-test is 0.955 when the data are normal, and yet I still see t-tests in the medical literature all the time. Granted, the Wilcoxon still requires the assumption of symmetry (I'm curious as to why the Wilcoxon is often used when asymmetry is suspected, given that assumption), but that's less stringent than requiring normally distributed data. In a similar vein, one usually sees the mean and standard deviation reported as summary statistics for a continuous variable; these are not very informative unless you assume the variable is normally distributed. However, clinicians often insist that I include these figures in reports.
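
For anyone who wants to check the 0.955 figure, here is a rough sketch in plain Python (standard library only). It assumes the usual Pitman asymptotic relative efficiency formula for a location shift, ARE = 12 * sigma^2 * (integral of f(x)^2 dx)^2, and evaluates it numerically. The function names and the Laplace comparison are my own additions for illustration, not something from the thread.

```python
import math

def normal_pdf(x):
    # Standard normal density (sigma = 1).
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def laplace_pdf(x):
    # Double-exponential density with scale 1 (variance 2), a heavier-tailed case.
    return 0.5 * math.exp(-abs(x))

def wilcoxon_vs_t_are(pdf, sigma, lo=-40.0, hi=40.0, steps=400_000):
    # Pitman ARE of the Wilcoxon test relative to the t-test:
    #   12 * sigma^2 * (integral of pdf(x)^2 dx)^2
    # The integral is approximated with the midpoint rule on [lo, hi].
    h = (hi - lo) / steps
    integral = sum(pdf(lo + (i + 0.5) * h) ** 2 for i in range(steps)) * h
    return 12.0 * sigma ** 2 * integral ** 2

are_normal = wilcoxon_vs_t_are(normal_pdf, sigma=1.0)
are_laplace = wilcoxon_vs_t_are(laplace_pdf, sigma=math.sqrt(2.0))

print(round(are_normal, 3))   # 0.955, i.e. 3/pi
print(round(are_laplace, 3))
```

Under normality the Wilcoxon gives up only about 5% efficiency, and for the heavier-tailed Laplace case the same formula comes out above 1, meaning the Wilcoxon is the more efficient test there.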

Cody Hamilton, PhD
Edwards Lifesciences

                                                                           
From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
Sent by: r-help-bounces_at_stat.math.ethz.ch
Date: 05/25/2007 02:42 PM
To: "Lucke, Joseph F" <Joseph.F.Lucke_at_uth.tmc.edu>
cc: r-help <r-help_at_stat.math.ethz.ch>
Subject: Re: [R] normality tests [Broadcast]

Lucke, Joseph F wrote:
> Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data. The CLT kicks in fairly quickly. Testing
> for normality prior to choosing a test statistic is generally not a good
> idea.

I beg to differ, Joseph. I have had many datasets in which the CLT was of no use whatsoever, i.e., where bootstrap confidence limits were asymmetric because the data were so skewed, and where symmetric normality-based confidence intervals had bad coverage in both tails (though correct on the average). I see this the opposite way: nonparametric tests work fine if normality holds.
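
To make the tail-coverage point concrete, here is a small simulation sketch in plain Python. Everything in it is an assumption chosen for illustration (lognormal data, n = 30, 5000 replicates, a hard-coded t critical value of about 2.045 for 29 degrees of freedom); it is not one of the datasets mentioned above. It counts how often a symmetric t-based 95% interval for the mean falls entirely below or entirely above the true mean.

```python
import math
import random
import statistics

N = 30                 # sample size per replicate (assumed for illustration)
T_CRIT_DF29 = 2.045    # approx. 97.5th percentile of Student's t with N - 1 = 29 df

def tail_miss_rates(reps=5000, seed=1):
    # Draw skewed (lognormal) samples and record on which side the
    # symmetric t interval misses the true mean.
    rng = random.Random(seed)
    true_mean = math.exp(0.5)   # mean of a lognormal(mu=0, sigma=1) variable
    below = above = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(0.0, 1.0) for _ in range(N)]
        centre = statistics.fmean(sample)
        half_width = T_CRIT_DF29 * statistics.stdev(sample) / math.sqrt(N)
        if centre + half_width < true_mean:
            below += 1          # whole interval lies below the true mean
        elif centre - half_width > true_mean:
            above += 1          # whole interval lies above the true mean
    return below / reps, above / reps

miss_below, miss_above = tail_miss_rates()
print("interval entirely below true mean:", miss_below)
print("interval entirely above true mean:", miss_above)
```

If each tail behaved nominally, both rates would be near 0.025. With data this skewed the two rates come out far apart, which is the same asymmetry a bootstrap interval would show.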

Note that the CLT helps with type I error but not so much with type II error.
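
A rough way to see the type I versus type II distinction is to simulate both tests on skewed data. The sketch below is plain Python under assumptions of my own choosing: two lognormal groups of 30, a shift of 0.7 on the log scale for the alternative, a pooled-variance t-test with a hard-coded critical value of about 2.002 for 58 degrees of freedom, and the normal approximation to the Wilcoxon rank-sum test without a tie correction.

```python
import math
import random
import statistics

N = 30                 # observations per group (assumed for illustration)
T_CRIT_DF58 = 2.002    # approx. two-sided 5% critical value of t with 2N - 2 = 58 df
Z_CRIT = 1.96          # two-sided 5% critical value of the standard normal

def t_rejects(x, y):
    # Pooled-variance two-sample t-test at the 5% level.
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * statistics.variance(x)
              + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return abs(t) > T_CRIT_DF58

def wilcoxon_rejects(x, y):
    # Wilcoxon rank-sum test via the normal approximation (continuous data, no ties).
    n1, n2 = len(x), len(y)
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(rank for rank, (_, group) in enumerate(labelled, start=1) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return abs((rank_sum_x - mean) / math.sqrt(var)) > Z_CRIT

def rejection_rates(log_shift, reps=2000, seed=1):
    # Fraction of replicates in which each test rejects at the 5% level.
    rng = random.Random(seed)
    t_hits = w_hits = 0
    for _ in range(reps):
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(N)]
        y = [rng.lognormvariate(log_shift, 1.0) for _ in range(N)]
        t_hits += t_rejects(x, y)
        w_hits += wilcoxon_rejects(x, y)
    return t_hits / reps, w_hits / reps

size_t, size_w = rejection_rates(0.0)     # null true: type I error rates
power_t, power_w = rejection_rates(0.7)   # null false: power
print("type I error  t:", size_t, " Wilcoxon:", size_w)
print("power         t:", power_t, " Wilcoxon:", power_w)
```

In this setup the t-test's rejection rate under the null stays close to the nominal 5%, but its power under the alternative falls clearly short of the Wilcoxon's.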

Frank

>
> -----Original Message-----
> From: r-help-bounces_at_stat.math.ethz.ch
> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of Liaw, Andy
> Sent: Friday, May 25, 2007 12:04 PM
> To: gatemaze_at_gmail.com; Frank E Harrell Jr
> Cc: r-help
> Subject: Re: [R] normality tests [Broadcast]
>
> From: gatemaze_at_gmail.com
>> On 25/05/07, Frank E Harrell Jr <f.harrell_at_vanderbilt.edu> wrote:
>>> gatemaze_at_gmail.com wrote:
>>>> Hi all,
>>>>
>>>> apologies for seeking advice on a general stats question. I've run
>>>> normality tests using 8 different methods:
>>>> - Lilliefors
>>>> - Shapiro-Wilk
>>>> - Robust Jarque Bera
>>>> - Jarque Bera
>>>> - Anderson-Darling
>>>> - Pearson chi-square
>>>> - Cramer-von Mises
>>>> - Shapiro-Francia
>>>>
>>>> All show that the null hypothesis that the data come from a normal
>>>> distro cannot be rejected. Great. However, I don't think it looks nice
>>>> to report the values of 8 different tests on a report. One note is
>>>> that my sample size is really tiny (less than 20 independent cases).
>>>> Without wanting to start a flame war, is there any advice on which
>>>> one/ones would be more appropriate and should be reported (along with
>>>> a Q-Q plot)? Thank you.
>>>>
>>>> Regards,
>>>>
>>> Wow - I have so many concerns with that approach that it's hard to know
>>> where to begin. But first of all, why care about normality? Why not
>>> use distribution-free methods?
>>>
>>> You should examine the power of the tests for n=20. You'll probably
>>> find it's not good enough to reach a reliable conclusion.
>> And wouldn't it be even worse if I used non-parametric tests?
>
> I believe what Frank meant was that it's probably better to use a
> distribution-free procedure to do the real test of interest (if there is
> one) instead of testing for normality, and then use a test that assumes
> normality.
>
> I guess the question is, what exactly do you want to do with the outcome
> of the normality tests? If those are going to be used as basis for
> deciding which test(s) to do next, then I concur with Frank's
> reservation.
>
> Generally speaking, I do not find goodness-of-fit for distributions very
> useful, mostly for the reason that failure to reject the null is no
> evidence in favor of the null. It's difficult for me to imagine why
> "there's insufficient evidence to show that the data did not come from a
> normal distribution" would be interesting.
>
> Andy
>
>
>>> Frank
>>>
>>>
>>> --
>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>                      Department of Biostatistics   Vanderbilt University
>>
>> --
>> yianni
>>
>> ______________________________________________
>> R-help_at_stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>
>
>

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 25 May 2007 - 22:34:40 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 26 May 2007 - 00:31:04 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.