# Re: [R] Finding overlaps in vector

From: Johannes Graumann <johannes_graumann_at_web.de>
Date: Fri, 21 Dec 2007 21:04:07 +0100

Thank you very much for this elegant solution to the problem. The reason I still hope for an extension of Jim's code (not the one he responded with in this thread, but the one I actually reference) is that windows of overlap can be asymmetric with that: one can check, e.g., whether values overlap given the constraints that the closest allowed proximity 'down' is 0.5 and 'up' is 0.75. I would greatly appreciate a solution that allows for cluster isolation with that requirement.
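To make the asymmetric requirement concrete, here is a minimal sketch of one possible reading of it: each value x gets a window [x - down, x + up], and values whose windows overlap (directly or through a chain of neighbours) share a cluster. The function name `cluster_by_window`, the window sizes, and the example masses are hypothetical and chosen only for illustration; this is not Jim's code.

```r
# Hypothetical helper: label clusters of values whose windows overlap.
# 'down' and 'up' may be scalars or per-value vectors (e.g. ppm-based).
cluster_by_window <- function(x, down, up) {
  lo <- x - down                 # lower edge of each window
  hi <- x + up                   # upper edge of each window
  ord <- order(lo)
  lo <- lo[ord]
  hi <- hi[ord]
  n <- length(x)
  reach <- cummax(hi)            # furthest upper edge seen so far
  # a new cluster starts where a window begins beyond everything before it
  new_cluster <- c(TRUE, lo[-1] > reach[-n])
  id <- integer(n)
  id[ord] <- cumsum(new_cluster) # map labels back to the input order
  id
}

# constant but asymmetric windows (assumed values: 0.2 down, 0.3 up)
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
id <- cluster_by_window(v, down = 0.2, up = 0.3)
split(v, id)

# ppm-style windows that scale with each value (assumed: 5 ppm down, 10 ppm up)
m <- c(500.000, 500.004, 500.020, 1000.000, 1000.012)
id_ppm <- cluster_by_window(m, down = m * 5e-6, up = m * 10e-6)
```

Because every value carries its own cluster label, 6, 6.45 and 7, 7.1 come out as separate clusters without stepping through the groups afterwards. Note that with constant windows this reduces to a single gap limit of down + up; the asymmetry only matters when the windows vary per value.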

Thanks for your time and insight,

Joh

Gabor Grothendieck wrote:

> This may not be as direct as Jim's in terms of specifying granularity, but
> it uses conventional hierarchical clustering to create the clusters and also
> draws a nice dendrogram for you. I have split the dendrogram at a height of
> 0.5 to define the clusters, but you can change that to whatever granularity
> you like:
>

>> v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
>>
>> # cluster and plot
>> hc <- hclust(dist(v), method = "single")
>> plot(hc, labels = v)
>> cl <- rect.hclust(hc, h = .5, border = "red")
>>
>> # each component of list cl is one cluster. Print them out.
>> for(idx in cl) print(unname(v[idx]))
```
> [1] 8
> [1] 7.0 7.1
> [1] 6.00 6.45
> [1] 5
> [1] 3.00 3.25 3.33 3.75 4.10
> [1] 2
> [1] 1
> [1] 0.00 0.45
```
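For one-dimensional data, single-linkage clusters cut at height h are the same groups you get by sorting the values and splitting wherever the gap between neighbours exceeds h. A minimal base-R sketch of that shortcut follows; the variable names are illustrative, and it additionally drops singleton clusters so that the result matches the groups listed in the original question.

```r
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
gap_limit <- 0.5

s <- sort(v)
# start a new cluster whenever the gap to the previous value exceeds the limit
id <- cumsum(c(TRUE, diff(s) > gap_limit))
clusters <- split(s, id)

# keep only clusters with more than one member
multi <- Filter(function(g) length(g) > 1, clusters)
multi
```

This avoids building the full distance matrix, which may matter for long vectors, but it does not give you the dendrogram.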

>> # a different representation of the clusters
>> vv <- v
>> names(vv) <- ct <- cutree(hc, h = .5)
>> vv
```
>    1    1    2    3    4    4    4    4    4    5    6    6    7    7    8
> 0.00 0.45 1.00 2.00 3.00 3.25 3.33 3.75 4.10 5.00 6.00 6.45 7.00 7.10 8.00
```

> On Dec 21, 2007 4:56 AM, Johannes Graumann <johannes_graumann_at_web.de> wrote:

>> <posted & mailed>
>>
>> Dear all,
>>
>> I'm trying to solve the problem of how to find clusters of values in a
>> vector that are closer than a given value. Illustrated, this might look as
>> follows:
>>
>> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
>>
>> When using '0.5' as the proximity requirement, the following groups would
>> result:
>> 0,0.45
>> 3,3.25,3.33,3.75,4.1
>> 6,6.45
>> 7,7.1
>>
>> Jim Holtman proposed a very elegant solution in
>> http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have
>> modified and perused since he wrote it to me. The beauty of this approach
>> is that it will not only work for constant proximity requirements as
>> above, but also for overlap-windows defined in terms of ppm around each
>> value. Now I have an additional need and have found no way (short of
>> iteratively stepping through all the groups returned) to figure out how to do
>> that with Jim's approach: how to figure out that 6,6.45 and 7,7.1 are
>> separate clusters?
>>
>> Thanks for any hints, Joh
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> http://www.R-project.org/posting-guide.html and provide commented,
>> minimal, self-contained, reproducible code.
>>