Re: [R] trouble with wilcox.test

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu 18 Aug 2005 - 19:38:48 EST

If this is stack overflow (and I don't know that yet: when I tried this on Windows the traceback was clearly corrupt, referring to bratio), the issue is that it is impossible to catch such an error, and it is not even AFAIK portably possible to find the stack size limit (or even the current usage) to do some estimates. (The amount of RAM is not relevant.) On Unix-alikes the stack size limit can be controlled from the shell used to launch R so we don't have any a priori knowledge.
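
As a rough illustration of the point about stack limits: later R releases (not the 2.1.1 discussed in this thread) provide Cstack_info() in base, which reports the C stack size and current usage where the platform makes them available. The sketch below only reads those values. The helper name stack_headroom is hypothetical, and the result is NA on platforms where R cannot determine the figures.

```r
# Sketch: report how much of the C stack is free, assuming an R version
# that has Cstack_info(). 'stack_headroom' is a hypothetical helper, not part of R.
stack_headroom <- function() {
  info <- Cstack_info()   # named vector: size, current, direction, eval_depth
  size <- info[["size"]]
  used <- info[["current"]]
  # Either value may be NA when the platform does not expose it.
  if (is.na(size) || is.na(used)) return(NA_real_)
  (size - used) / size    # fraction of the C stack still free
}

stack_headroom()
```

This only observes the limit. On Unix-alikes the limit itself is still set outside R, for example with the shell's ulimit -s before R is launched.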

The underlying code could be rewritten not to use recursion, but that seems not worth the effort involved.
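
To make the non-recursive idea concrete, here is a minimal sketch and not the code used by wilcox.test. It builds the null distribution of the rank sum with an iterative counting loop, so the depth of the C stack does not depend on the sample sizes. It assumes no ties, and the function name exact_ranksum_p is hypothetical.

```r
# Sketch only: exact two-sided Wilcoxon rank-sum p-value without recursion.
# Assumes no ties; 'exact_ranksum_p' is a hypothetical helper, not part of R.
exact_ranksum_p <- function(x, y) {
  m <- length(x); n <- length(y); N <- m + n
  w <- sum(rank(c(x, y))[seq_len(m)])   # observed rank sum of x
  smax <- m * (2 * N - m + 1) / 2       # largest possible rank sum
  # cnt[j + 1, s + 1] = number of j-subsets of the ranks 1..k with sum s
  cnt <- matrix(0, m + 1, smax + 1)
  cnt[1, 1] <- 1                        # the empty subset has sum 0
  for (k in seq_len(N)) {
    # j runs downwards so that rank k is used at most once per subset
    for (j in min(k, m):1) {
      idx <- (k + 1):(smax + 1)
      cnt[j + 1, idx] <- cnt[j + 1, idx] + cnt[j, idx - k]
    }
  }
  p <- cnt[m + 1, ] / choose(N, m)      # null distribution of the rank sum
  lower <- sum(p[seq_len(w + 1)])       # P(W <= w)
  upper <- sum(p[(w + 1):(smax + 1)])   # P(W >= w)
  min(1, 2 * min(lower, upper))
}

# Small check against the built-in exact test
x <- c(1.5, 5.5); y <- 1:50
all.equal(exact_ranksum_p(x, y), wilcox.test(x, y, exact = TRUE)$p.value)
```

For m = 2 the count can also be done by hand. With c(1.5, 5.5) against 1:10000 the observed rank sum is 9, and 16 of the choose(10002, 2) = 50015001 possible rank pairs have a sum of at most 9, so the two-sided value is 2 * 16 / 50015001, about 6.398e-07, in agreement with the figure quoted below.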

All I can see we can do is to put a warning in the help file.

On Thu, 18 Aug 2005, P Ehlers wrote:

>
> Prof Brian Ripley wrote:
>> On Wed, 17 Aug 2005, Greg Hather wrote:
>>
>>
>>> I'm having trouble with the wilcox.test command in R.
>>
>>
>> Are you sure it is not the concepts that are giving 'trouble'?
>> What real problem are you trying to solve here?
>>
>>
>>> To demonstrate the anomalous behavior of wilcox.test, consider
>>>
>>>
>>>> wilcox.test(c(1.5,5.5), c(1:10000), exact = F)$p.value
>>>
>>> [1] 0.01438390
>>>
>>>> wilcox.test(c(1.5,5.5), c(1:10000), exact = T)$p.value
>>>
>>> [1] 6.39808e-07 (this calculation takes noticeably longer).
>>>
>>>> wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value
>>>
>>> (R closes/crashes)
>>>
>>> I believe that wilcox.test(c(1.5,5.5), c(1:10000), exact = F)$p.value
>>> yields a bad result because of the normal approximation which R uses when
>>> exact = F.
>>
>>
>> Expecting an approximation to be good in the tail for m=2 is pretty
>> unrealistic. But then so is believing the null hypothesis of a common
>> *continuous* distribution. Why worry about the distribution under a
>> hypothesis that is patently false?
>>
>> People often refer to this class of tests as `distribution-free', but they
>> are not. The Wilcoxon test is designed for power against shift
>> alternatives, but here there appears to be a very large difference in
>> spread. So
>>
>>
>>> wilcox.test(5000+c(1.5,5.5), c(1:10000), exact = T)$p.value
>>
>> [1] 0.9989005
>>
>> even though the two samples differ in important ways.
>>
>>
>>
>>> Any suggestions for how to compute wilcox.test(c(1.5,5.5), c(1:20000),
>>> exact = T)$p.value?
>>
>>
>> I get (current R 2.1.1 on Linux)
>>
>>
>>> wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value
>>
>> [1] 1.59976e-07
>>
>> and no crash. So the suggestion is to use a machine adequate to the task,
>> and that probably means an OS with adequate stack size.
>>
>>
>>> [[alternative HTML version deleted]]
>>
>>
>>> PLEASE do read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>
>>
>> Please do heed it. What version of R and what machine is this? And do
>> take note of the request about HTML mail.
>>
>
> One could also try wilcox.exact() in package exactRankTests (0.8-11)
> which also gives (with suitable patience)
>
> [1] 1.59976e-07
>
> even on my puny 256M Windows laptop.
>
> Still, it might be worthwhile adding a "don't do something this silly"
> error message to wilcox.test() rather than having it crash R. Low
> priority, IMHO.
>
> Windows XP SP2
> "R version 2.1.1, 2005-08-11"
>
> Peter Ehlers
>
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu Aug 18 19:43:28 2005
