[R] hclust(stats) merge matrix interpretation

From: Tarun Kumar Singh <tarun30devraj_at_gmail.com>
Date: Sun 12 Feb 2006 - 00:33:05 EST


Hi,

We are trying to interpret the clusters generated by the hclust function in R's "stats" package. The problem is that hc$order gives the observations in a particular order, but that order is lost when we export the results to a file. Here is the example code and its output:

> hc <- hclust(dist(USArrests), "ave")
> plot(hc)
> hc$label
 [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"
 [5] "California"     "Colorado"       "Connecticut"    "Delaware"
 [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"
[13] "Illinois"       "Indiana"        "Iowa"           "Kansas"
[17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"
[21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"
[25] "Missouri"       "Montana"        "Nebraska"       "Nevada"
[29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"
[33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"
[37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
[41] "South Dakota"   "Tennessee"      "Texas"          "Utah"
[45] "Vermont"        "Virginia"       "Washington"     "West Virginia"
[49] "Wisconsin"      "Wyoming"

> hc$order
 [1]  9 33  5 20  3 31  8  1 18 13 32 22 28  2 24 40 47 37 50 36 46 39 21 30 25
[26]  4 42 10  6 43 12 27 17 26 35 44 14 16  7 38 11 48 19 41 34 45 23 49 15 29
> hc$height
 [1]   2.291288   3.834058   3.929377   6.236986   6.637771   7.355270
 [7]   8.027453   8.537564  10.184218  10.736739  10.771175  11.456439
[13]  12.438692  12.614278  12.878100  13.044922  13.297368  13.352260
[19]  13.896043  14.501034  15.026107  15.122897  15.453120  15.454449
[25]  16.425489  16.891499  18.417331  18.993398  20.198479  20.598507
[31]  21.167192  22.595978  23.972143  26.363428  26.713777  27.779904
[37]  28.012211  28.095803  29.054195  33.117815  38.527912  39.394633
[43]  41.094765  44.283922  44.837933  54.746831  77.605024  89.232093
[49] 152.313999

> hc$merge

      [,1] [,2]
 [1,]  -15  -29
 [2,]  -17  -26
 [3,]  -14  -16
 [4,]  -13  -32
 [5,]  -35  -44
 [6,]  -36  -46
 [7,]   -7  -38
 [8,]  -19  -41
 [9,]  -49    1
[10,]  -50    6
[11,]  -48    8
[12,]  -21  -30
[13,]  -27    2
[14,]   -4  -42
[15,]  -37   10
[16,]  -34  -45
[17,]  -22  -28
[18,]    3    7
[19,]   -3  -31
[20,]   -6  -43
[21,]  -12   13
[22,]    5   18
[23,]  -20   19
[24,]   -1  -18
[25,]  -47   15
[26,]   -8   24
[27,]    4   17
[28,]  -23    9
[29,]  -25   14
[30,]   21   22
[31,]  -24  -40
[32,]  -39   12
[33,]  -10   20
[34,]   26   27
[35,]   25   32
[36,]   16   28
[37,]   -5   23
[38,]   -2   31
[39,]   29   33
[40,]   11   36
[41,]   -9  -33
[42,]   34   38
[43,]  -11   40
[44,]   37   42
[45,]   35   39
[46,]   30   43
[47,]   41   44
[48,]   45   46
[49,]   47   48

plot(hc) draws a dendrogram of the clustered nodes. The ideal solution for us would be a method that generates a matrix with the distance attributes for each node of the dendrogram. Failing that, it would help us a lot if anyone could suggest a way to keep the hc$order ordering intact when exporting.
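
One possible approach to both requests, offered as a minimal sketch rather than a definitive answer: index the labels by hc$order before writing them out, and use cophenetic() from "stats" to obtain the height at which each pair of observations is first joined in the dendrogram. The data frame layout, the column names and the temporary output file are assumptions made for illustration only.

```r
# Rebuild the clustering from the post (USArrests ships with base R).
hc <- hclust(dist(USArrests), "ave")

# Labels in dendrogram (left-to-right) order, stored with their position so
# the ordering survives the export. Column names are illustrative.
ordered <- data.frame(position = seq_along(hc$order),
                      index    = hc$order,
                      label    = hc$labels[hc$order])

out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.csv(ordered, out_file, row.names = FALSE)

# Cophenetic distances: for each pair of observations, the height of the
# merge that first places them in the same cluster.
coph <- as.matrix(cophenetic(hc))

# Same matrix with rows and columns rearranged into dendrogram order.
coph_ordered <- coph[hc$order, hc$order]
```

Reading the file back with read.csv(out_file) returns the rows in the same sequence, so the first row corresponds to hc$order[1].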

Our second problem is how to interpret the matrix returned by hc$merge.
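
For reference, the hclust help page describes the merge matrix as follows: row i records the merge made at step i; a negative entry -j means the single observation j was merged at that step, and a positive entry j refers to the cluster formed at the earlier step j. The sketch below expands a row into the labels it contains; the helper name members is hypothetical and not part of "stats".

```r
hc <- hclust(dist(USArrests), "ave")

# Hypothetical helper: list the observations contained in the cluster
# formed at merge step `step`, following positive entries recursively.
members <- function(hc, step) {
  unlist(lapply(hc$merge[step, ], function(j) {
    if (j < 0) {
      hc$labels[-j]      # negative entry: a single observation
    } else {
      members(hc, j)     # positive entry: cluster built at step j
    }
  }))
}

members(hc, 1)   # row 1 is (-15, -29): observations 15 and 29
members(hc, 9)   # row 9 is (-49, 1): observation 49 plus the step-1 cluster
hc$height[9]     # height at which that merge happens
```

With the output shown above, row 1 joins Iowa (15) and New Hampshire (29), and row 9 adds Wisconsin (49) to that pair.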

Thanking You
-Tarun
