[R] Problems with hclust and/or cutree.

From: Rolf Turner <r.turner_at_auckland.ac.nz>
Date: Fri, 30 May 2008 12:33:13 +1200

I have been attempting to do some work using hclust, and have run into a (possibly subtle) problem.

The background is that I constructed a dissimilarity matrix ``d1''
(it involved something called the ``Jaccard similarity coefficient'';
I won't go into the details unless requested). I then did

	d2 <- as.dist(d1)
	try <- hclust(d2,method="ward")
	plot(try,labels=FALSE)
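For anyone who wants to follow along without the real data, here is a minimal sketch of the same steps. Everything in it is an assumption made for illustration: the binary matrix `x` is simulated, `dist(method = "binary")` stands in for whatever Jaccard-based dissimilarity was actually used, and `"ward.D"` is the name current versions of R use for the method that was called `"ward"` in 2008.

```r
# Hypothetical stand-in data: 30 subjects, 20 binary attributes.
set.seed(1)
n <- 30
x <- matrix(rbinom(n * 20, 1, 0.5), nrow = n)

# dist(method = "binary") gives a Jaccard-type distance for 0/1 rows
# (an assumption; the original d1 was built some other way).
d2 <- dist(x, method = "binary")

# "ward.D" is the current name of the method called "ward" in 2008.
hc <- hclust(d2, method = "ward.D")

# There is one merge height per merge, i.e. n - 1 of them.
length(hc$height)

# Runs of equal or nearly equal heights show up as tiny differences.
summary(diff(hc$height))
```

Looking at `diff(hc$height)` before cutting by height is a cheap way to spot ties and near-ties in the merge heights.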

After looking at the plot, I tried

        mmm <- cutree(try,h=7)

and got the error message

Error in cutree(try, h = 7) :
   the 'height' component of 'tree' is not sorted
   (increasingly); consider applying as.hclust() first

I was much puzzled by this initially, since try is already an ``hclust'' object
(I checked class(try)) but after a substantial amount of hair-tearing
I discovered that the entries of the height component of try are constant
over long stretches. E.g. the first 54 entries are 0 (to the 7 printed decimal
places). This doesn't *seem* to be cause for alarm --- the help says explicitly
that height is a *non-decreasing* sequence (but not necessarily a strictly
increasing one).

I checked

        with(try,all.equal(height,sort(height)))

and got

[1] TRUE

but order(try$height) is NOT equal to 1:745 (note that 746 is the number of subjects
in the data set, so height has 745 entries).
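The two checks can disagree because all.equal() compares up to a numerical tolerance, whereas order() and is.unsorted() compare exactly. A small sketch with made-up numbers (the vector `h` is hypothetical, not the real heights; that cutree() relies on an exact check such as is.unsorted() is my assumption, not something verified against the source here):

```r
# Hypothetical stand-in for try$height: the third entry is smaller
# than the second by about 1e-12, far below printed precision.
h <- c(0, 1e-12, 0, 2, 7.5)

# all.equal() allows a tolerance (about 1.5e-8 by default),
# so h and sort(h) count as "equal".
isTRUE(all.equal(h, sort(h)))

# Exact comparisons see the tiny inversion.
is.unsorted(h)
identical(order(h), seq_along(h))

# Positions where the sequence actually decreases, and by how much.
drops <- which(diff(h) < 0)
drops
diff(h)[drops]
```

Applying the last two lines to the real height vector would show whether the inversions are all of rounding-error size.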

I have done an RSiteSearch() on "cutree" and turned up nothing that seemed relevant.

Finally, I found that if I do

        try$height <- round(try$height,6)

then

        mmm <- cutree(try,h=7)

``works'' (without error).
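One way to gauge what the rounding does is to compare the heights before and after. A sketch, again on a made-up vector `h` rather than the real heights:

```r
# Hypothetical heights with an inversion of rounding-error size.
h <- c(0, 1e-12, 0, 2, 7.5)

# Round to 6 decimal places, as in the workaround above.
h6 <- round(h, 6)

# The rounded sequence is non-decreasing again ...
is.unsorted(h6)

# ... and no height moved by more than this amount.
max(abs(h6 - h))
```

If the largest change is negligible relative to the cut height (7 here), the rounding is unlikely to move any merge across the cut. Note that rounding is not guaranteed to remove every inversion: two heights that sit just either side of a rounding boundary can still end up out of order, so it is worth re-checking is.unsorted() afterwards.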

Are there traps for young players in employing such a strategy? What should I
really worry about?

If anyone wants to try it for themselves with the real distance matrix, I can bundle
it up and email it to them privately.

Thanks for any insights.

        cheers,

                Rolf Turner


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 30 May 2008 - 00:36:18 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 30 May 2008 - 02:30:45 GMT.