Re: [R] Generating groupings of ordered observations

From: Charles C. Berry <cberry_at_tajo.ucsd.edu>
Date: Sat, 21 Jun 2008 17:51:04 -0700

On Sat, 21 Jun 2008, Gavin Simpson wrote:

> On Sat, 2008-06-21 at 17:56 -0400, Gabor Grothendieck wrote:
>> On Sat, Jun 21, 2008 at 12:40 PM, Gavin Simpson <gavin.simpson_at_ucl.ac.uk> wrote:
>>> Dear List,
>>>
>>> I have a problem I'm finding it difficult to make headway with.
>>>
>>> Say I have 6 ordered observations, and I want to find all combinations
>>> of splitting these 6 ordered observations in g groups, where g = 1, ...,
>>> 6. Groups can only be formed by adjacent observations, so observations 1
>>> and 4 can't be in a group on their own, only if 1,2,3&4 are all in the
>>> group.
>>>
> <snip />
>>>
>>> I'd like to be able to do this automagically, for any (reasonable,
>>> small, say n = 10-20) number of observations, n, and for g = 1, ..., n
>>> groups.
>>>
>>> I can't see the pattern here or a way forward. Can anyone suggest an
>>> approach?
>>>
>>
>> Peter Wolf has APL-style encode/decode functions on his web site that
>> can readily be used for this. The output of the encode below are the binary
>> digits expansions of the numbers 0:15, one per column, and the remainder
>> transforms that matrix to the required one (but columns are in a different
>> order than yours):
>
> Thanks for this Gabor. Will take a look.
>
> One additional complication that I failed to mention, is that I would
> like to state the number of groups. So in my example, instead of
> splitting the 6 observations into 1,2,...,6 groups, I would like to get
> only the sets that partition the 6 observations into say 4 groups.
>
> For larger problems, where n is > 100 it is not possible to do the
> required calculations on the choose(n-1, 0:(n-1)) possible combinations.
> Instead we would evaluate the combinations of the n observations
> relating to a small number of groups, say g = 1:10 where n = 100 for
> example.
>
> So Chuck and your solutions answer the question I asked, but I can't see
> how to modify them to set the number of groups to be less than the
> number of observations.

So you want all ways to split an ordered set of n objects into k contiguous groupings with k << n ??

The approach shown in this thread is one approach:

         http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76379.html

but with more than N=30, you will need to typedef an object that has enough bits to represent any possible split.

Also, apropos the warnings (further down) in that thread, you can easily exhaust the memory of a computer. For n=100 and ngroups=10, there are choose(99,9) possibilities which I guess is many terabytes.

HTH, Chuck

>
> All the best,
>
> G
>
>>
>>> source("http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/decodeencode/decodeencode.R")
>>> n <- 6
>>> n1 <- n-1
>>> apply(rbind(1, encode(0:(2^n1-1), rep(2, n1))), 2, cumsum)
>> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
>> [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
>> [,24] [,25] [,26]
>> [1,] 1 1 1 1 1 1 1 1 1 1 1 1
>> 1 1 1 1 1 1 1 1 1 1 1 1
>> 1 1
>> [2,] 1 1 1 1 1 1 1 1 1 1 1 1
>> 1 1 1 1 2 2 2 2 2 2 2 2
>> 2 2
>> [3,] 1 1 1 1 1 1 1 1 2 2 2 2
>> 2 2 2 2 2 2 2 2 2 2 2 2
>> 3 3
>> [4,] 1 1 1 1 2 2 2 2 2 2 2 2
>> 3 3 3 3 2 2 2 2 3 3 3 3
>> 3 3
>> [5,] 1 1 2 2 2 2 3 3 2 2 3 3
>> 3 3 4 4 2 2 3 3 3 3 4 4
>> 3 3
>> [6,] 1 2 2 3 2 3 3 4 2 3 3 4
>> 3 4 4 5 2 3 3 4 3 4 4 5
>> 3 4
>> [,27] [,28] [,29] [,30] [,31] [,32]
>> [1,] 1 1 1 1 1 1
>> [2,] 2 2 2 2 2 2
>> [3,] 3 3 3 3 3 3
>> [4,] 3 3 4 4 4 4
>> [5,] 4 4 4 4 5 5
>> [6,] 4 5 4 5 5 6
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> ECRC, UCL Geography, [f] +44 (0)20 7679 0565
> Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 22 Jun 2008 - 00:57:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 22 Jun 2008 - 01:30:47 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive