From: Tal Galili <tal.galili_at_gmail.com>

Date: Tue, 15 Jun 2010 19:26:59 +0300

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 15 Jun 2010 - 16:29:24 GMT

Hello Hadley, Tormod and everyone else.

I just published a post on my blog, giving the code and presenting an example of its use (on the Iris data set): http://www.r-statistics.com/2010/06/clustergram-a-graph-for-visualizing-cluster-analyses-r-code/
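For readers who want to try the idea on the Iris data before following the link, here is a minimal sketch (not the code from the blog post) of the quantity a clustergram plots: each k-means cluster center projected onto the first principal component, for several values of k. Scaling the columns and the choice of k.range = 2:5 are assumptions made purely for illustration.

```r
# Sketch only: computes the y-positions a clustergram would draw for iris.
# Assumption: the four measurements are scaled first; the blog post may differ.
Data <- scale(as.matrix(iris[, 1:4]))

loadings1 <- princomp(Data)$loadings[, 1]  # loadings of the first principal component
PCA.1 <- Data %*% loadings1                # PC1 score of every observation

set.seed(250)
k.range <- 2:5                             # illustrative range, not from the post
center.scores <- lapply(k.range, function(k) {
  cl <- kmeans(Data, centers = k)
  as.vector(cl$centers %*% loadings1)      # each cluster center projected on PC1
})
names(center.scores) <- k.range

str(center.scores)                         # one vector of k center positions per k
```

Each element of center.scores holds the vertical positions at which the lines for that k would be bundled.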

I welcome any comments (pitfalls, suggestions or ideas) regarding this method of visualizing cluster analysis, in the hope that all of us can learn from each other's knowledge.

And again, thank you Hadley for offering your advice.

Best,

Tal

----------------Contact Details:-------------------------------------------------------
Contact me: Tal.Galili_at_gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Tue, Jun 15, 2010 at 5:04 PM, Tal Galili <tal.galili_at_gmail.com> wrote:

> Hi Hadley,
>
> I wrapped the code into a function.
> I made it so all the lines would always start from the cluster mean.
> And I tried to give more meaning to the colors by assigning each
> observation's color according to the order of its first principal
> component score.
>
> What do you think?
>
> Tal
>
> # -------------------------------
>
> clustergram <- function(Data, k.range = 2:10,
>                         clustering.function = kmeans,
>                         line.width = .004, add.center.points = TRUE)
> {
>   require(colorspace) # for heat_hcl(); loaded before its first use
>
>   n <- dim(Data)[1]
>   PCA.1 <- Data %*% princomp(Data)$loadings[,1] # first principal component of our data
>
>   COL <- heat_hcl(n)[order(PCA.1)] # line colors
>
>   line.width <- rep(line.width, n)
>   Y <- NULL # Y matrix
>   X <- NULL # X matrix
>
>   plot(0, 0, col = "white", xlim = range(k.range), ylim = range(PCA.1),
>        xlab = "Number of clusters (k)",
>        ylab = "Mean of the first principal component by clusters",
>        main = "Clustergram of first principal component mean by k-mean clusters")
>   axis(side = 1, at = k.range)
>   abline(v = k.range, col = "grey")
>
>   centers.points <- list()
>
>   for(k in k.range)
>   {
>     cl <- clustering.function(Data, k)
>     clusters.vec <- cl$cluster
>     # the.centers <- apply(cl$centers,1, mean)
>     the.centers <- cl$centers %*% princomp(Data)$loadings[,1]
>
>     # cumulative offset so that lines within one cluster do not overlap
>     noise <- unlist(tapply(line.width, clusters.vec,
>                cumsum))[order(seq_along(clusters.vec)[order(clusters.vec)])]
>     # noise <- noise - mean(range(noise))
>     y <- the.centers[clusters.vec] + noise
>     Y <- cbind(Y, y)
>     x <- rep(k, length(y))
>     X <- cbind(X, x)
>
>     # index by name so no empty (NULL) entry is created when k.range starts above 1
>     centers.points[[as.character(k)]] <- data.frame(y = the.centers, x = rep(k, k))
>     # points(the.centers ~ rep(k , k), pch = 19, col = "red", cex = 1.5)
>   }
>
>   matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)
>
>   if(add.center.points)
>   {
>     # add points
>     suppressMessages(lapply(centers.points, function(xx) {
>       with(xx, points(y ~ x, pch = 19, col = "red", cex = 1.3))
>       return(1)
>     }))
>   }
>
> }
>
>
> set.seed(250)
> Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
>               matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
> clustergram(Data, k.range = 2:8, line.width = .004, add.center.points = TRUE)
>
> ----------------Contact Details:-------------------------------------------------------
> Contact me: Tal.Galili_at_gmail.com | 972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
> ----------------------------------------------------------------------------------------------
>
> On Tue, Jun 15, 2010 at 4:46 PM, Hadley Wickham <hadley_at_rice.edu> wrote:
>
>> > The glitches are the cases where you would have a bundle of lines belonging
>> > to a specific cluster, but had spaces between them (because the place of one
>> > of the lines was saved for another line that in the meantime moved to
>> > another cluster).
>>
>> I think that display looked just fine!
>>
>> > I just came up with a solution for how to resolve this (After showering, it
>> > tends to help my thinking...) - it is attached at the bottom of this e-mail.
>> > I will later cleanup the code a bit and publish it.
>>
>> I'd also suggest reordering the lines within each cluster mean so that
>> (e.g.) all the lines going from 1a to 2a are all in the same position
>> (i.e. at the top of the bundle of lines, not interspersed throughout).
>>
>> And again, think about using the colour for something useful, maybe
>> the value of the variable that you're averaging over to get the y
>> position.
>>
>> Hadley
>>
>> --
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>> http://had.co.nz/

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 15 Jun 2010 - 17:30:32 GMT.
