# Re: [R] Graphics question: How to create a changing "smudge factor" for overlapping lines?

From: Tal Galili <tal.galili_at_gmail.com>
Date: Tue, 15 Jun 2010 17:04:22 +0300

I wrapped the code into a function.
I made it so that all the lines always start from the cluster mean. I also tried to give the colors more meaning by coloring each observation according to its order on the first principal component.
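To make the coloring idea concrete, here is a minimal sketch of one way to map each observation to a color by its position on the first principal component. The toy data and the names `toy`, `pc1`, `palette.pc1` and `col.by.pc1` are made up for illustration, and base R's `heat.colors()` is used as a stand-in for `heat_hcl()` so the snippet runs without colorspace.

```
# Toy data: two groups of 10 observations in 2 dimensions (illustrative only)
set.seed(1)
toy <- rbind(matrix(rnorm(20, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 1, sd = 0.3), ncol = 2))

# Projection of each observation onto the first principal component
pc1 <- as.vector(toy %*% princomp(toy)$loadings[, 1])

# One color per observation; heat.colors() stands in for colorspace::heat_hcl()
palette.pc1 <- heat.colors(nrow(toy))

# rank() gives each observation its own position among the sorted projections,
# so observation i receives the color at that position in the palette
col.by.pc1 <- palette.pc1[rank(pc1, ties.method = "first")]
```

With this mapping the observation with the smallest projection gets the first palette color and the one with the largest gets the last, so neighboring colors mean neighboring values on the first principal component.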

What do you think?

Tal

```
# -------------------------------

clustergram <- function(Data, k.range = 2:10, clustering.function = kmeans,
                        line.width = .004, add.center.points = TRUE)
{
  require(colorspace)  # for heat_hcl()

  n <- dim(Data)[1]  # number of observations

  loadings.1 <- princomp(Data)$loadings[, 1]
  PCA.1 <- Data %*% loadings.1  # first principal component of our data

  # line colors, by the rank of each observation on the first principal component
  COL <- heat_hcl(n)[rank(PCA.1, ties.method = "first")]

  line.width <- rep(line.width, n)
  Y <- NULL  # Y matrix
  X <- NULL  # X matrix

  plot(0, 0, col = "white", xlim = range(k.range), ylim = range(PCA.1),
       xlab = "Number of clusters (k)",
       ylab = "Mean of the first principal component by clusters",
       main = "Clustergram of first principal component mean by k-mean clusters")
  axis(side = 1, at = k.range)
  abline(v = k.range, col = "grey")

  centers.points <- list()

  for (k in k.range)
  {
    cl <- clustering.function(Data, k)
    clusters.vec <- cl$cluster

    # The line defining the.centers is missing from this copy of the message.
    # Assumed here: each cluster center projected onto the first principal
    # component, which matches the y-axis label above.
    the.centers <- as.vector(cl$centers %*% loadings.1)

    # cumulative offset of each line within its cluster, in the original row order
    noise <- unlist(tapply(line.width, clusters.vec, cumsum))[order(seq_along(clusters.vec)[order(clusters.vec)])]
    # noise <- noise - mean(range(noise))
    y <- the.centers[clusters.vec] + noise
    Y <- cbind(Y, y)
    x <- rep(k, length(y))
    X <- cbind(X, x)

    # keyed by name so that no empty list slots are created when k.range starts above 1
    centers.points[[as.character(k)]] <- data.frame(y = the.centers, x = rep(k, k))
    # points(the.centers ~ rep(k , k), pch = 19, col = "red", cex = 1.5)
  }

  matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)

  if (add.center.points)
  {
    for (xx in centers.points)
      points(y ~ x, data = xx, pch = 19, col = "red", cex = 1.3)
  }
}

set.seed(250)
Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
clustergram(Data, k.range = 2:8, line.width = .004, add.center.points = TRUE)
```
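The `noise` line is the densest part of the function, so here is a small sketch of what it computes: a running offset within each cluster, returned in the original row order. The labels in `clusters.vec` below are made up for illustration, and `ave()` is shown only as an alternative spelling of the same idea, not as what the function itself uses.

```
clusters.vec <- c(2, 1, 2, 2, 1)                      # hypothetical cluster labels
line.width   <- rep(0.004, length(clusters.vec))      # one width per observation

# ave() applies cumsum() within each cluster and keeps the original row order
offset <- ave(line.width, clusters.vec, FUN = cumsum)

# The tapply()/order() idiom: tapply() returns the offsets grouped by cluster,
# and the outer order() call puts them back into the original row order
offset.tapply <- unlist(tapply(line.width, clusters.vec, cumsum))[
  order(seq_along(clusters.vec)[order(clusters.vec)])]
```

Both vectors hold the same offsets: the first line of each cluster sits one `line.width` above the cluster mean, the second two widths above it, and so on, which is what spreads the overlapping lines into a visible bundle.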

```
----------------Contact Details:-------------------------------------------------------
Contact me: Tal.Galili_at_gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
```

On Tue, Jun 15, 2010 at 4:46 PM, Hadley Wickham <hadley_at_rice.edu> wrote:

```> > The glitches are the cases where you would have a bundle of lines belonging
> > to a specific cluster, but had spaces between them (because the place of one
> > of the lines was saved for another line that in the meantime moved to
> > another cluster).
>
> I think that display looked just fine!
>
> > I just came up with a solution for how to resolve this (After showering, it
> > tends to help my thinking...) - it is attached at the bottom of this e-mail.
> > I will later cleanup the code a bit and publish it.
>
> I'd also suggest reordering the lines within each cluster mean so that
> (e.g.) all the lines going from 1a to 2a are all in the same position
> (i.e. at the top of the bundle of lines, not interspersed throughout).
>
> And again, think about using the colour for something useful, maybe
> the value of the variable that you're averaging over to get the y
> position.
>
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University