Re: [R] Finding overlaps in vector

From: Johannes Graumann <johannes_graumann_at_web.de>
Date: Sat, 22 Dec 2007 14:38:21 +0100

Here's what I finally came up with. Thanks for your help!

Joh

MQUSpotOverlapClusters <- function(
  Series,# Vector of data to be evaluated   distance=0.5,# Maximum distance of clustered data points   minSize=2# Minimum size of clusters returned ){

############################################################################################

# Check prerequisites
  #####################

# Check prerequisites: Series

  if(!(is.numeric(Series) & length(Series) > 1)){     stop("'Series' must be a vector of numerical data.")   }
# Check prerequisites: distance

  if(!(is.numeric(distance) & distance > 0)){     stop("'distance' must be a positive number.")   }
############################################################################################

# Perform clustering
  ####################

  hc <- hclust(dist(Series), method = "single")   hcut <- cutree(hc,h=distance)
  cluster.idx <- c()
  for(i in unique(hcut)){
    members <- which(hcut == i)
    if(length(members) >= minSize){
      cluster.idx <- append(cluster.idx,list(members))     }
  }
  return(cluster.idx)
}

Gabor Grothendieck wrote:

> If we don't need any plotting we don't really need rect.hclust at
> all.  Split the output of cutree, instead.  Continuing from the
> prior code:
> 
>> for(el in split(unname(vv), names(vv))) print(el)
> [1] 0.00 0.45
> [1] 1
> [1] 2
> [1] 3.00 3.25 3.33 3.75 4.10
> [1] 5
> [1] 6.00 6.45
> [1] 7.0 7.1
> [1] 8
> 
> On Dec 21, 2007 3:24 PM, Johannes Graumann <johannes_graumann_at_web.de>
> wrote:
>> Hm, hm, rect.hclust doesn't accept "plot=FALSE" and cutree doesn't retain
>> the indexes of membership ... anyway short of ripping out the guts of
>> rect.hclust to achieve the same result without an active graphics device?
>>
>> Joh
>>
>>
>> >> # cluster and plot
>> >> hc <- hclust(dist(v), method = "single")
>> >> plot(hc, lab = v)
>> >> cl <- rect.hclust(hc, h = .5, border = "red")
>> >>
>> >> # each component of list cl is one cluster.  Print them out.
>> >> for(idx in cl) print(unname(v[idx]))
>> > [1] 8
>> > [1] 7.0 7.1
>> > [1] 6.00 6.45
>> > [1] 5
>> > [1] 3.00 3.25 3.33 3.75 4.10
>> > [1] 2
>> > [1] 1
>> > [1] 0.00 0.45
>> >
>> >> # a different representation of the clusters
>> >> vv <- v
>> >> names(vv) <- ct <- cutree(hc, h = .5)
>> >> vv
>> >    1    1    2    3    4    4    4    4    4    5    6    6    7    7  
>> >     8
>> > 0.00 0.45 1.00 2.00 3.00 3.25 3.33 3.75 4.10 5.00 6.00 6.45 7.00 7.10
>> > 8.00
>> >
>> >
>> > On Dec 21, 2007 4:56 AM, Johannes Graumann <johannes_graumann_at_web.de>
>> > wrote:
>> >> <posted & mailed>
>> >>
>> >> Dear all,
>> >>
>> >> I'm trying to solve the problem, of how to find clusters of values in
>> >> a vector that are closer than a given value. Illustrated this might
>> >> look as follows:
>> >>
>> >> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
>> >>
>> >> When using '0.5' as the proximity requirement, the following groups
>> >> would result:
>> >> 0,0.45
>> >> 3,3.25,3.33,3.75,4.1
>> >> 6,6.45
>> >> 7,7.1
>> >>
>> >> Jim Holtman proposed a very elegant solution in
>> >> http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I
>> >> have modified and perused since he wrote it to me. The beauty of this
>> >> approach is that it will not only work for constant proximity
>> >> requirements as above, but also for overlap-windows defined in terms
>> >> of ppm around each value. Now I have an additional need and have found
>> >> no way (short of iteratively step through all the groups returned) to
>> >> figure out how to do that with Jim's approach: how to figure out that
>> >> 6,6.45 and 7,7.1 are separate clusters?
>> >>
>> >> Thanks for any hints, Joh
>> >>
> 
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 22 Dec 2007 - 13:48:11 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 22 Dec 2007 - 17:30:21 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.