From: Thomas Lumley <tlumley_at_u.washington.edu>

Date: Mon, 28 May 2007 11:58:37 -0700 (PDT)

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 28 May 2007 - 19:13:05 GMT

On Mon, 28 May 2007, Martin Maechler wrote:

>>>>>> "LuckeJF" == Lucke, Joseph F <Joseph.F.Lucke@uth.tmc.edu>
>>>>>>     on Fri, 25 May 2007 12:29:49 -0500 writes:
>
>    LuckeJF> Most standard tests, such as t-tests and ANOVA,
>    LuckeJF> are fairly resistant to non-normality for
>    LuckeJF> significance testing. It's the sample means that
>    LuckeJF> have to be normal, not the data. The CLT kicks in
>    LuckeJF> fairly quickly.
>
> Even though such statements appear in too many (text)books,
> that's just plain wrong practically:
>
> Even though the *level* of the t-test is resistant to non-normality,
> the power is not at all!! And that makes the t-test NON-robust!

While it is true that this makes the t-test non-robust, it doesn't mean that the statement is just plain wrong practically.

The issue really is more complicated than a lot of people claim (not you specifically, Martin, but upthread and previous threads).

Starting with the demonstrable mathematical facts:

- lots of rank tests are robust in the sense of Huber
- rank tests are optimal for specific location-shift testing problems.
- lots of rank tests have excellent power for location shift alternatives over a wide range of underlying distributions.
- rank tests fail to be transitive when stochastic ordering is not assumed (they are not consistent with any ordering on all distributions)
- rank tests do not lead to confidence intervals unless a location shift or similar one-dimensional family is assumed
- no rank test is uniformly more powerful than any parametric test or vice versa (if we rule out pathological cases)
- there is no rank test that is consistent precisely against a difference in means
- the t-test (and essentially all tests) can be made distribution-free in large samples (for small values of 'large', usually)
- being distribution-free does not guarantee robustness of power (for the t-test or for any other test)
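The non-transitivity point in the list above can be seen in a small simulation. This is only a sketch: the three discrete distributions are a standard "non-transitive dice" construction chosen here for illustration (not taken from the thread), and p.gt is a made-up helper name. For each pair the true probability that the first exceeds the second is 5/9, so with large samples one-sided Wilcoxon tests should prefer A to B, B to C, and C to A.

```r
set.seed(2007)
n <- 2000

# Three distributions with no consistent ordering (illustrative choice)
A <- sample(c(2, 4, 9), n, replace = TRUE)
B <- sample(c(1, 6, 8), n, replace = TRUE)
C <- sample(c(3, 5, 7), n, replace = TRUE)

# Estimated P(first > second); the true value is 5/9 for each pair below
p.gt <- function(x, y) mean(outer(x, y, ">"))
round(c(A.gt.B = p.gt(A, B), B.gt.C = p.gt(B, C), C.gt.A = p.gt(C, A)), 3)

# One-sided Wilcoxon tests: is the first sample shifted to the right?
p.cycle <- c(A.gt.B = wilcox.test(A, B, alternative = "greater")$p.value,
             B.gt.C = wilcox.test(B, C, alternative = "greater")$p.value,
             C.gt.A = wilcox.test(C, A, alternative = "greater")$p.value)
p.cycle < 0.05
```

If all three comparisons come out significant, the test has ranked the three groups in a cycle, which no ordering of distributions can reproduce.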

Now, if we assume stochastic ordering, is the Wilcoxon rank-sum test more or less powerful than the t-test? Everyone knows that this depends on the null hypothesis distribution. Fewer people seem to know that it also depends on the alternative, especially in large samples.

Suppose the alternative of interest is not that the values are uniformly larger by 1 unit, but that 5% of them are about 20 units larger. The Wilcoxon test -- precisely because it gives less weight to outliers -- will have lower power. For example (ObR):

one.sim<-function(n, pct, delta){
   x<-rnorm(n)                          # first group: N(0,1)
   y<-rnorm(n)+delta*rbinom(n,1,pct)    # a fraction pct is shifted by delta
   list(x=x,y=y)
}

# 5% of the values about 20 units larger
mean(replicate(100, {d<-one.sim(100,.05,20); t.test(d$x,d$y)$p.value})<0.05)
mean(replicate(100, {d<-one.sim(100,.05,20); wilcox.test(d$x,d$y)$p.value})<0.05)

# half of the values 1 unit larger
mean(replicate(100, {d<-one.sim(100,.5,1); t.test(d$x,d$y)$p.value})<0.05)
mean(replicate(100, {d<-one.sim(100,.5,1); wilcox.test(d$x,d$y)$p.value})<0.05)

Since both relatively uniform shifts and large shifts of small fractions are genuinely important alternatives in real problems, it is true in practice as well as in theory that neither the Wilcoxon nor the t-test is uniformly superior.
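With only 100 replicates, and separate data sets for each test, the estimates from the simulation above are fairly noisy. A possible variant, offered as a sketch rather than a definitive version, applies both tests to the same simulated data and reports a Monte Carlo standard error. The name power.sim and its nsim and alpha arguments are made up for this illustration; the data-generating model is the same as in one.sim.

```r
set.seed(1)

# Hypothetical helper: estimated power of both tests on the same data sets
power.sim <- function(n, pct, delta, nsim = 1000, alpha = 0.05) {
  p <- replicate(nsim, {
    x <- rnorm(n)
    y <- rnorm(n) + delta * rbinom(n, 1, pct)  # fraction pct shifted by delta
    c(t = t.test(x, y)$p.value, wilcoxon = wilcox.test(x, y)$p.value)
  })
  power <- rowMeans(p < alpha)                 # rejection rate for each test
  rbind(power = power, mc.se = sqrt(power * (1 - power) / nsim))
}

rare.big   <- power.sim(100, 0.05, 20)  # 5% of values about 20 units larger
half.small <- power.sim(100, 0.5, 1)    # half of the values 1 unit larger
rare.big
half.small
```

Under these assumptions the t-test should come out clearly ahead in the first scenario, while the two tests should be much closer in the second.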

This is without even considering violations of stochastic ordering -- which are not just esoteric pathologies, since it is quite plausible for a treatment to benefit some people and harm others. For example, I've seen one paper in which a Wilcoxon test on medical cost data was statistically significant in the *opposite direction* to the difference in means.
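A minimal sketch of how the two tests can point in opposite directions, using made-up numbers (not the data from that paper): suppose lower values are better, the treatment lowers the outcome slightly for 90% of people and raises it by a lot for the remaining 10%. The variable names and the mixture proportions are assumptions chosen for illustration.

```r
set.seed(42)
n <- 500

control <- rnorm(n)
harmed  <- rbinom(n, 1, 0.10)   # 1 = among the 10% who are harmed
treated <- rnorm(n, mean = ifelse(harmed == 1, 20, -0.5))

mean(treated) - mean(control)        # population difference in means is 1.55
mean(outer(treated, control, "<"))   # estimate of P(treated < control), about 0.57

# t-test: is the treated mean larger?  Wilcoxon: do treated values tend to be smaller?
p.t <- t.test(treated, control, alternative = "greater")$p.value
p.w <- wilcox.test(treated, control, alternative = "less")$p.value
c(t.greater = p.t, wilcoxon.less = p.w)
```

With samples this large both one-sided p-values should be small, so the Wilcoxon test favours the treatment while the comparison of means goes against it.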

This has been a long rant, but I keep encountering statisticians who think anyone who ever recommends a t-test just needs to have the number 0.955 quoted to them.

<snip>

>    LuckeJF> Testing for normality prior to choosing a test
>    LuckeJF> statistic is generally not a good idea.
>
> Definitely. Or even: It's a very bad idea ...

I think that's something we can all agree on.

-thomas

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 28 May 2007 - 20:33:24 GMT.
