From: P Ehlers <ehlers_at_math.ucalgary.ca>

Date: Thu 18 Aug 2005 - 18:42:58 EST

>> On Wed, 17 Aug 2005, Greg Hather wrote:

>>
>>> I'm having trouble with the wilcox.test command in R.
>>
>> Are you sure it is not the concepts that are giving 'trouble'?
>> What real problem are you trying to solve here?
>>
>>> To demonstrate the anomalous behavior of wilcox.test, consider
>>>
>>> > wilcox.test(c(1.5,5.5), c(1:10000), exact = F)$p.value
>>> [1] 0.01438390
>>>
>>> > wilcox.test(c(1.5,5.5), c(1:10000), exact = T)$p.value
>>> [1] 6.39808e-07 (this calculation takes noticeably longer).
>>>
>>> > wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value
>>> (R closes/crashes)
>>>
>>> I believe that wilcox.test(c(1.5,5.5), c(1:10000), exact = F)$p.value
>>> yields a bad result because of the normal approximation which R uses
>>> when exact = F.
>>
>> Expecting an approximation to be good in the tail for m=2 is pretty
>> unrealistic. But then so is believing the null hypothesis of a common
>> *continuous* distribution. Why worry about the distribution under a
>> hypothesis that is patently false?
>>
>> People often refer to this class of tests as `distribution-free', but
>> they are not. The Wilcoxon test is designed for power against shift
>> alternatives, but here there appears to be a very large difference in
>> spread. So
>>
>> > wilcox.test(5000+c(1.5,5.5), c(1:10000), exact = T)$p.value
>> [1] 0.9989005
>>
>> even though the two samples differ in important ways.
>>
>>> Any suggestions for how to compute wilcox.test(c(1.5,5.5),
>>> c(1:20000), exact = T)$p.value?
>>
>> I get (current R 2.1.1 on Linux)
>>
>> > wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value
>> [1] 1.59976e-07
>>
>> and no crash. So the suggestion is to use a machine adequate to the
>> task, and that probably means an OS with adequate stack size.
>>
>>> [[alternative HTML version deleted]]
>>
>>> PLEASE do read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>
>> Please do heed it. What version of R and what machine is this? And
>> do take note of the request about HTML mail.
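As a cross-check on the figures quoted in this thread, here is a minimal base-R sketch. It assumes the first sample has exactly two observations and that there are no ties, so the null distribution of the rank sum reduces to counting pairs of ranks. The helper names (count_pairs_le, exact_p_m2, normal_p) are illustrative only and are not part of R or of any package mentioned here.

```r
# Sketch (base R only), assuming length(x) == 2 and no ties.

# Number of rank pairs r1 < r2 drawn from 1..N with r1 + r2 <= s.
count_pairs_le <- function(s, N) {
  r1 <- seq_len(min(N - 1, floor((s - 1) / 2)))
  sum(pmax(0, pmin(N, s - r1) - r1))
}

# Exact two-sided p-value: twice the smaller tail, capped at 1.
exact_p_m2 <- function(x, y) {
  stopifnot(length(x) == 2, anyDuplicated(c(x, y)) == 0)
  N <- length(y) + 2
  s <- sum(rank(c(x, y))[1:2])                 # rank sum of x in the pooled sample
  lower <- count_pairs_le(s, N)                # pairs with rank sum <= s
  upper <- count_pairs_le(2 * (N + 1) - s, N)  # pairs with rank sum >= s, by symmetry
  min(1, 2 * min(lower, upper) / choose(N, 2))
}

# Normal approximation with continuity correction (no tie correction).
normal_p <- function(x, y) {
  m <- length(x)
  n <- length(y)
  W <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2
  z <- W - m * n / 2
  z <- (z - sign(z) * 0.5) / sqrt(m * n * (m + n + 1) / 12)
  2 * pnorm(-abs(z))
}

x <- c(1.5, 5.5)
normal_p(x, 1:10000)           # thread reports 0.01438390 with exact = F
exact_p_m2(x, 1:10000)         # thread reports 6.39808e-07
exact_p_m2(x, 1:20000)         # thread reports 1.59976e-07
exact_p_m2(5000 + x, 1:10000)  # thread reports 0.9989005
```

Because only the pooled ranks of the two observations enter the count, the cost is linear in the sample size and nothing recurses, which is why this special case sidesteps the stack problem discussed in the thread. It is not a general replacement for wilcox.test().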

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Thu Aug 18 18:47:44 2005

P Ehlers wrote:

> Prof Brian Ripley wrote:

>> On Wed, 17 Aug 2005, Greg Hather wrote:

> One could also try wilcox.exact() in package exactRankTests (0.8-11)
> which also gives (with suitable patience)
>
> [1] 1.59976e-07
>
> even on my puny 256M Windows laptop.
>
> Still, it might be worthwhile adding a "don't do something this silly"
> error message to wilcox.test() rather than having it crash R. Low
> priority, IMHO.
>
> Windows XP SP2
> "R version 2.1.1, 2005-08-11"
>
> Peter Ehlers

I should also mention package coin's wilcox_test() which does the job in about a quarter of the time used by exactRankTests.
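For anyone wanting to try both packages side by side, a rough sketch follows. It assumes coin and/or exactRankTests are installed and skips whichever is missing. compare_exact() is a hypothetical helper, and the calls reflect my understanding of the two packages' interfaces, so check the help pages for your installed versions.

```r
# Sketch: exact Wilcoxon p-values from the packages named in this thread,
# with base R as a reference. compare_exact() is a hypothetical helper.
compare_exact <- function(x, y) {
  out <- c(base = wilcox.test(x, y, exact = TRUE)$p.value)
  if (requireNamespace("exactRankTests", quietly = TRUE)) {
    out["exactRankTests"] <-
      exactRankTests::wilcox.exact(x, y, exact = TRUE)$p.value
  }
  if (requireNamespace("coin", quietly = TRUE)) {
    # coin uses a formula interface: response ~ two-level grouping factor
    d <- data.frame(
      value = c(x, y),
      group = factor(rep(c("x", "y"), c(length(x), length(y))))
    )
    fit <- coin::wilcox_test(value ~ group, data = d, distribution = "exact")
    out["coin"] <- as.numeric(coin::pvalue(fit))
  }
  out
}

# Small input, so that the base R exact computation is safe to run too.
compare_exact(c(1.5, 5.5), 1:50)
```

For the 1:20000 case from the thread, drop the base entry if your platform crashes on it and time the two package calls with system.time().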

Peter Ehlers


This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 15:28:34 EST