# Re: [R] Randomization tests, grouped data

From: Charles C. Berry <cberry_at_tajo.ucsd.edu>
Date: Fri, 11 Jan 2008 14:50:34 -0800

On Fri, 11 Jan 2008, Johannes Hüsing wrote:

> Tom Backer Johnsen <backer@psych.uib.no> [Fri, Jan 11, 2008 at 06:57:41PM CET]:
> [...]
>>> Is there something that can handle this in R?
>>
>
> Have you considered the coin package?

>
>> After a few hours thinking on and off about the problem, I suspect
>> that the question may be stupid or silly (or both). If that is the
>> case, I would very much like to know why.
>>
>
> I am not quite clear in my thinking anymore, but there are 2^2n
> permutations, of which (2n choose n) happen to yield the same
> effect. These cases are "part of life" and should be counted in
> the permutation test just as well. You might save a little bit of
> computation time by singling these group-preserving permutations
> out, but this is not worth the while at all.
>

It depends (as always...)

Suppose you have two samples of n1 and n2 independent observations, respectively. You wish to do a two-sample test on each of M variables, where M is quite large, and you wish to account for multiplicity in testing. So you construct a permutation test.

If n1 == n2 == 4, there are choose(8,4) == 70 arrangements. By enumerating them all you can get the p-value of your test statistic, and often this is practical.
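
To make the enumeration concrete, here is a minimal sketch in Python (not R, and not code from this thread). The data values, the function name `exact_perm_pvalue`, and the choice of the absolute difference in means as the test statistic are all assumptions made for illustration.

```python
from itertools import combinations

def exact_perm_pvalue(x, y):
    """Two-sided exact p-value for the difference in means,
    obtained by enumerating every assignment of units to group 1."""
    pooled = list(x) + list(y)
    n1, n = len(x), len(pooled)
    total = sum(pooled)

    def stat(idx):
        # absolute difference in group means when group 1 = units in idx
        s1 = sum(pooled[i] for i in idx)
        return abs(s1 / n1 - (total - s1) / (n - n1))

    observed = stat(range(n1))
    stats = [stat(idx) for idx in combinations(range(n), n1)]
    # small tolerance guards against floating-point near-ties
    hits = sum(s >= observed - 1e-12 for s in stats)
    return hits / len(stats), len(stats)

# hypothetical data, n1 == n2 == 4
x = [1.2, 2.3, 1.9, 2.8]
y = [3.1, 3.9, 4.4, 3.6]
p_value, n_arrangements = exact_perm_pvalue(x, y)
print(n_arrangements, p_value)
```

With these made-up values the two groups do not overlap, so only the observed split and its mirror image are as extreme as the data, and the p-value is 2/70.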

But if you sample (say) 70 of the factorial(8) == 40320 permutations at random, you will likely miss some of the 70 distinct arrangements and repeat others. The number 0.632 (that is, 1 - exp(-1)) comes to mind as the fraction of distinct arrangements that will actually show up (see Efron and Tibshirani, An Introduction to the Bootstrap, to check whether this is right).
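
The 0.632 figure can be checked with a rough sketch like the following (Python, written for illustration; `distinct_fraction`, the seed, and the number of replicates are assumptions). Each random permutation is reduced to the set of units that land in the first group, since that is all the test statistic depends on, and the fraction of the 70 distinct arrangements seen in 70 draws is compared with 1 - (1 - 1/70)^70.

```python
import random
from math import comb

def distinct_fraction(n1, n2, draws, rng):
    """Fraction of the choose(n1+n2, n1) arrangements hit at least once
    when `draws` random permutations are sampled."""
    n = n1 + n2
    seen = set()
    for _ in range(draws):
        perm = rng.sample(range(n), n)    # one uniformly random permutation
        seen.add(frozenset(perm[:n1]))    # only group-1 membership matters
    return len(seen) / comb(n, n1)

k = comb(8, 4)                            # number of distinct arrangements
expected = 1 - (1 - 1 / k) ** k           # close to 1 - exp(-1)
rng = random.Random(2008)                 # arbitrary seed for repeatability
reps = 2000
simulated = sum(distinct_fraction(4, 4, k, rng) for _ in range(reps)) / reps
print(round(expected, 3), round(simulated, 3))
```

Both numbers should come out near 0.63, which is consistent with the figure quoted from memory above.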

To get an accurate p-value by sampling from the factorial(8) permutations, you would need a sample much larger than the number of distinct arrangements.

OTOH, if the number of distinct arrangements is too large to enumerate and is much larger than the number you could afford to sample, then sampling from the factorial(n1+n2) permutations and sampling from the choose(n1+n2,n2) arrangements are nearly equivalent. You could use the finite population correction to ascertain just how different they are, I think.
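
One way to read the finite population correction remark is sketched below in Python. This is an interpretation, not something spelled out in the thread: sampling permutations is treated as drawing arrangements with replacement, sampling distinct arrangements as drawing without replacement, and the usual factor sqrt((K - B) / (K - 1)) compares the standard errors for B draws out of K arrangements. The sample sizes are made up.

```python
from math import comb, sqrt

def fpc(K, B):
    """Finite population correction for B draws without replacement
    from a population of K items."""
    return sqrt((K - B) / (K - 1))

# small problem: 35 draws out of choose(8, 4) == 70 arrangements
small = fpc(comb(8, 4), 35)

# large problem: 10000 draws out of choose(40, 20) arrangements
large = fpc(comb(40, 20), 10000)

print(small, large)
```

In the small case the factor is well below 1, so the two sampling schemes differ noticeably; in the large case it is indistinguishable from 1 for practical purposes, which matches the "nearly equivalent" statement.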

HTH, Chuck

> --
> Johannes Hüsing
> mailto:johannes_at_huesing.name
> http://derwisch.wikidot.com
>
> There is something fascinating about science. One gets such wholesale
> returns of conjecture from such a trifling investment of fact.
> (Mark Twain, "Life on the Mississippi")
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.
>

```
Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
```

Received on Fri 11 Jan 2008 - 22:53:47 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 11 Jan 2008 - 23:30:06 GMT.
