On Wed, 2006-05-10 at 18:59 +0200, Moritz Lennert wrote:

> Replying to myself for the record:

>
> Moritz Lennert wrote:
> > Hello,
> >
> > Can someone point me to documentation or ideas on how to calculate the
> > centroids of clusters identified with hclust?
> >
> > I would like to be able to choose the number of clusters (in the style of
> > cutree) and then get the centroids of these clusters.
> >
> > This seems like a quite obvious task to me, but I haven't been able to
> > put my hands on a relevant command.

Sorry, Moritz, I meant to reply to your original post, but I accidentally deleted it from my emailer and hadn't had a chance to use the archives to follow up.

fun(swiss.x, cutree(h, 3))

Wrapping the Venables & Ripley version into a function to give the same output as your function:

##
## clust.means - function to find centroids of clusters
## based on example by Venables & Ripley, MASS 4th Ed., page 318 [1]
##
## x = input data as data.frame or matrix
## res.clust = object of class "hclust"
## groups = number of groups to cut dendrogram into
##
## References:
##
## [1] Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics
##     with S. 4th Edition. Springer.

clust.means <- function(x, res.clust, groups) {
    if(!is.matrix(x))
        x <- as.matrix(x)
    means <- tapply(x, list(rep(cutree(res.clust, groups), ncol(x)),
                            col(x)), mean)
    dimnames(means) <- list(NULL, dimnames(x)[[2]])
    return(as.data.frame(means))
}

clust.means(swiss, h, 3)
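
For anyone reading this in the archives, here is a minimal self-contained sketch of the call above. The definitions of swiss.x and h are not shown in this message, so the ones below are assumptions (all columns of the built-in swiss data, and hclust's default linkage on Euclidean distances); substitute your own objects.

```r
## ASSUMPTION: swiss.x and h are stand-ins, not the objects used in the
## original session
swiss.x <- as.matrix(swiss)
h <- hclust(dist(swiss.x))

clust.means <- function(x, res.clust, groups) {
    if (!is.matrix(x))
        x <- as.matrix(x)
    ## one factor for cluster membership (repeated for every column),
    ## one for the column index; tapply() averages each cluster/column cell
    means <- tapply(x, list(rep(cutree(res.clust, groups), ncol(x)),
                            col(x)), mean)
    dimnames(means) <- list(NULL, dimnames(x)[[2]])
    return(as.data.frame(means))
}

cm <- clust.means(swiss.x, h, 3)

## sanity check: the first row should equal the column means of the
## observations that cutree() assigns to cluster 1
g <- cutree(h, 3)
ok <- all.equal(unlist(cm[1, ]),
                colMeans(swiss.x[g == 1, , drop = FALSE]))
print(ok)
```

The result has one row per cluster and one column per variable, so cm[2, ] is the centroid of the second cluster.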

> system.time(for(i in 1:10000) fun(swiss.x, cutree(h, 3)))

[1] 8.917 0.000 9.695 0.000 0.000

> system.time(for(i in 1:10000) clust.means(swiss, h, 3))

[1] 31.642 0.008 35.348 0.000 0.000
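
If speed matters, one alternative worth timing on your own data is rowsum(), which sums the rows within each group; dividing by the group sizes gives the means. This is a different technique from either function above, and I have not timed it here. As before, swiss.x and h below are assumed stand-ins.

```r
## ASSUMPTION: stand-in data and clustering, as the originals are not shown
swiss.x <- as.matrix(swiss)
h <- hclust(dist(swiss.x))
g <- cutree(h, 3)

## rowsum() returns one row of column sums per cluster, ordered by cluster
## label; table(g) gives the cluster sizes in the same order, and the
## division recycles down the rows, so row i is divided by the size of
## cluster i
centroids <- rowsum(swiss.x, g) / as.vector(table(g))

## cross-check against a plain split-and-average
check <- t(sapply(split(as.data.frame(swiss.x), g), colMeans))
same <- all.equal(centroids, check, check.attributes = FALSE)
print(same)
```

To compare it with the two functions above, wrap the rowsum() line in the same system.time() loop.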

HTH

G

> Here's a simple function that does the job for me:

>
> Variables:
>
> data: matrix of original (absolute value) data introduced into hclust or
> HierClust
> clust: result of a 'cutree' call on the results of the hclust or
> HierClust call
>
> Value:
>
> a matrix of relative values of the variables at the centroids of the types
>
>
> function (data, clust) {
>     nvars=length(data[1,])
>     ntypes=max(clust)
>     centroids<-matrix(0,ncol=nvars,nrow=ntypes)
>     for(i in 1:ntypes) {
>         c<-rep(0,nvars)
>         n<-0
>         for(j in names(clust[clust==i])) {
>             n<-n+1
>             c<-c+data[j,]
>         }
>         centroids[i,]<-c/n
>     }
>     rownames(centroids)<-c(1:ntypes)
>     colnames(centroids)<-colnames(data)
>     centroids
> }
>
> Moritz
>

