From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>

Date: Thu 11 May 2006 - 04:17:35 EST


On Wed, 2006-05-10 at 18:59 +0200, Moritz Lennert wrote:

> Replying to myself for the record:

>
> Moritz Lennert wrote:
> > Hello,
> >
> > Can someone point me to documentation or ideas on how to calculate the
> > centroids of clusters identified with hclust?
> >
> > I would like to be able to choose the number of clusters (in the style of
> > cutree) and then get the centroids of these clusters.
> >
> > This seems like a quite obvious task to me, but I haven't been able to
> > put my hands on a relevant command.

Sorry, Moritz, I meant to reply to your original post, but I accidentally
deleted it from my mail client and hadn't had a chance to use the archives
to follow up.

initial <- tapply(swiss.x, list(rep(cutree(h, 3), ncol(swiss.x)),
                                col(swiss.x)), mean)
dimnames(initial) <- list(NULL, dimnames(swiss.x)[[2]])
initial
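For the record, here is a minimal, self-contained sketch of how the tapply() idiom works. The definitions of swiss.x and h are assumptions made for illustration (the swiss data without its Fertility column, and average linkage); they are not taken from this message. tapply() sees the matrix as one long vector, so the group labels are repeated once per column and paired with col() to get one mean per (cluster, variable) cell.

```r
# Assumption: swiss.x is the built-in swiss data without the Fertility column.
swiss.x <- as.matrix(swiss[, -1])

# Assumption: average linkage; the clustering call is not shown in this message.
h <- hclust(dist(swiss.x), method = "average")
grp <- cutree(h, 3)

# Repeat the cluster labels once per column and cross them with the column
# index, so every matrix element has a (cluster, variable) pair.
initial <- tapply(swiss.x,
                  list(rep(grp, ncol(swiss.x)), col(swiss.x)),
                  mean)
dimnames(initial) <- list(NULL, dimnames(swiss.x)[[2]])

# Cross-check: the first row should match the column means of cluster 1.
check1 <- colMeans(swiss.x[grp == 1, , drop = FALSE])
all.equal(unname(initial[1, ]), unname(check1))
```

Each row of initial is the centroid of one cluster, in the order of the cluster labels returned by cutree().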

Which gives almost the same output as your function:

fun <- function (data, clust) {
    nvars=length(data[1,])
    ntypes=max(clust)
    centroids<-matrix(0,ncol=nvars,nrow=ntypes)
    for(i in 1:ntypes) {
        c<-rep(0,nvars)
        n<-0
        for(j in names(clust[clust==i])) {
            n<-n+1
            c<-c+data[j,]
        }
        centroids[i,]<-c/n
    }
    rownames(centroids)<-c(1:ntypes)
    colnames(centroids)<-colnames(data)
    centroids
}

fun(swiss.x, cutree(h, 3))

Wrapping the Venables & Ripley version into a function to give the same output as your function:

##
## clust.means - function to find centroids of clusters
##               based on example by Venables & Ripley, MASS 4thEd, Page 318 [1]
##
## x         = input data as data.frame or matrix
## res.clust = object of class "hclust"
## groups    = number of groups to cut dendrogram into
##
## References:
##
## [1] Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics
##     with S. 4th Edition. Springer.

clust.means <- function(x, res.clust, groups) {
    if(!is.matrix(x))
        x <- as.matrix(x)
    means <- tapply(x, list(rep(cutree(res.clust, groups), ncol(x)),
                            col(x)), mean)
    dimnames(means) <- list(NULL, dimnames(x)[[2]])
    return(as.data.frame(means))
}

clust.means(swiss, h, 3)

> system.time(for(i in 1:10000) fun(swiss.x, cutree(h, 3)))
[1] 8.917 0.000 9.695 0.000 0.000

> system.time(for(i in 1:10000) clust.means(swiss, h, 3))
[1] 31.642 0.008 35.348 0.000 0.000
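If speed matters, another variant that may be worth timing is one built on rowsum(), which computes per-group column sums in a single call. This is only a sketch: the name centroids.rowsum is hypothetical, and swiss.x and h are again assumed definitions (swiss without Fertility, average linkage) rather than the objects used for the timings above.

```r
# Hypothetical helper: cluster centroids via per-group column sums.
centroids.rowsum <- function(x, clust) {
    x <- as.matrix(x)
    sums <- rowsum(x, clust)          # one row per cluster, sorted by label
    sums / as.vector(table(clust))    # divide each row by its cluster size
}

# Assumed setup, for illustration only.
swiss.x <- as.matrix(swiss[, -1])
h <- hclust(dist(swiss.x), method = "average")
grp <- cutree(h, 3)

cen <- centroids.rowsum(swiss.x, grp)

# Cross-check against a direct calculation for cluster 2.
check2 <- colMeans(swiss.x[grp == 2, , drop = FALSE])
all.equal(unname(cen[2, ]), unname(check2))
```

The division works row by row because R recycles the vector of cluster sizes down the columns of the matrix of sums.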

HTH

G

> Here's a simple function that does the job for me:

>
> Variables:
>
> data: matrix of original (absolute value) data introduced into hclust or
> HierClust
> clust: result of a 'cutree' call on the results of the hclust or
> HierClust call
>
> Value:
>
> a matrix of relative values of the variables at the centroids of the types
>
>
> function (data, clust) {
>     nvars=length(data[1,])
>     ntypes=max(clust)
>     centroids<-matrix(0,ncol=nvars,nrow=ntypes)
>     for(i in 1:ntypes) {
>         c<-rep(0,nvars)
>         n<-0
>         for(j in names(clust[clust==i])) {
>             n<-n+1
>             c<-c+data[j,]
>         }
>         centroids[i,]<-c/n
>     }
>     rownames(centroids)<-c(1:ntypes)
>     colnames(centroids)<-colnames(data)
>     centroids
> }
>
> Moritz

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
* Note new Address, Telephone & Fax numbers from 6th April 2006 *
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson
ECRC & ENSIS                  [t] +44 (0)20 7679 0522
UCL Department of Geography   [f] +44 (0)20 7679 0565
Pearson Building              [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street                  [w] http://www.ucl.ac.uk/~ucfagls/cv/
London, UK.                   [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Thu May 11 04:22:35 2006

Archive maintained by Robert King, hosted by the discipline of statistics at
the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Mon 15 May 2006 - 02:10:05 EST.
