[R] Elbow criterion plots for determining k in hierarchical clustering

From: Guera <jeppesen_becky_at_hotmail.com>
Date: Sat, 08 Mar 2008 15:01:41 -0800 (PST)

Hi There,

I'm running some cluster analyses on a large data set using hclust with Ward's method and Manhattan (city-block) distances. I've created dendrograms to illustrate the clustering, but I would also like a plot for applying the classic elbow criterion to choose the best number of clusters. Ideally I'd like to plot percent variance explained (y axis) against number of clusters (x axis).

Is there a way to do this in base R or the cluster package that I'm missing? As an alternative I've attempted to write a function for the purpose, but I am unable to find a way to determine the within-group variance for each cluster and the total variance (both needed to compute variance explained).

I've only been using R for the last month or so and greatly appreciate any advice you can give me. I've included my code for a subset of the data below (with k = 4 as an example).
Thanks in advance,

# Ward clustering on Manhattan (city-block) distances
HClf_dn <- hclust(dist(model.matrix(~ -1 + f_dn1 + f_dn2 + f_dn3 + f_dn4,
                                    CwdDbh), method = "manhattan"),
                  method = "ward")
plot(HClf_dn, main = "Cluster Dendrogram for Solution HClf_dn",
     xlab = "Observation Number in Data Set CwdDbh",
     sub = "Method=ward; Distance=city-block")
summary(as.factor(cutree(HClf_dn, k = 4)))  # Cluster Sizes
# colMeans gives one mean per variable, i.e. the centroid of each cluster
by(model.matrix(~ -1 + f_dn1 + f_dn2 + f_dn3 + f_dn4, CwdDbh),
   as.factor(cutree(HClf_dn, k = 4)), colMeans)  # Cluster Centroids
biplot(princomp(model.matrix(~ -1 + f_dn1 + f_dn2 + f_dn3 + f_dn4, CwdDbh)),
       xlabs = as.character(cutree(HClf_dn, k = 4)))
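
For reference, a minimal sketch of the plot described above, not a definitive method. It assumes "percent variance explained" means 100 * (1 - WSS / TSS), where TSS is the sum of squared deviations of all observations from the overall column means and WSS is the same quantity computed within each cluster and summed. The data here are simulated stand-ins (CwdDbh is not available), the names x, hc, ss, ks and pve are purely illustrative, and "ward.D" is the name current versions of R use for the method called "ward" above. Note that these sums of squares are Euclidean quantities, so with Manhattan distances the curve is only a descriptive summary of the tree.

```r
# Simulated stand-in data: 50 observations on 4 variables, two loose groups.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 4),
           matrix(rnorm(100, mean = 5), ncol = 4))

# Same clustering setup as in the post (Ward linkage, Manhattan distance).
hc <- hclust(dist(x, method = "manhattan"), method = "ward.D")

# Sum of squared deviations of the rows of 'm' from the column means of 'm'.
ss <- function(m) sum(scale(m, center = TRUE, scale = FALSE)^2)

tss <- ss(x)    # total sum of squares
ks  <- 1:10     # candidate numbers of clusters

# For each k, cut the tree, sum the within-cluster sums of squares,
# and convert to percent variance explained.
pve <- sapply(ks, function(k) {
  cl  <- cutree(hc, k = k)
  wss <- sum(sapply(split(as.data.frame(x), cl),
                    function(g) ss(as.matrix(g))))
  100 * (1 - wss / tss)
})

# Elbow plot: look for the k after which the curve flattens out.
plot(ks, pve, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Percent variance explained")
```

To try this on the data in the post, x would be replaced by the model.matrix(...) result and hc by HClf_dn.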

Rebecca Jeppesen, MSc Candidate
Acadia University
Wolfville, N.S.

Received on Sat 08 Mar 2008 - 23:04:29 GMT
