Re: [R] Graphics question: How to create a changing "smudge factor" for overlapping lines?

From: Tal Galili <tal.galili_at_gmail.com>
Date: Tue, 15 Jun 2010 17:04:22 +0300

Hi Hadley,

I wrapped the code into a function.
I made it so that all the lines always start from the cluster mean. I also tried to give the colors more meaning by coloring each line according to the rank of its observation on the first principal component.
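A minimal sketch of that coloring step, assuming a small random data set and base R's heat.colors() in place of colorspace's heat_hcl(). Indexing the palette with rank() gives each observation the color at its own position along the first principal component. order() returns the inverse permutation, so indexing the palette with it generally hands the colors to the wrong observations.

```r
# Toy data (hypothetical): 10 observations in 2 dimensions
set.seed(1)
Data <- matrix(rnorm(20), ncol = 2)

# Score of each observation on the first principal component
PCA.1 <- drop(Data %*% princomp(Data)$loadings[, 1])

# Base-R palette, used here as a stand-in for colorspace::heat_hcl()
pal <- heat.colors(nrow(Data))

# Observation i gets the palette entry at its rank on PC1:
# the lowest score gets pal[1], the highest gets the last color
COL <- pal[rank(PCA.1, ties.method = "first")]
```

COL can then be passed as the col argument of matlines(), one color per observation's line.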

What do you think?

Tal

# -------------------------------

clustergram <- function(Data, k.range = 2:10, clustering.function = kmeans,
                        line.width = .004, add.center.points = TRUE)
{
  require(colorspace) # provides heat_hcl()

  n <- dim(Data)[1]
  PC.loadings <- princomp(Data)$loadings[, 1]  # loadings of the first principal component
  PCA.1 <- drop(Data %*% PC.loadings)         # first principal component score of each observation

  # line colors: each observation is colored by its rank on the first principal component
  COL <- heat_hcl(n)[rank(PCA.1, ties.method = "first")]

  line.width <- rep(line.width, n)
  Y <- NULL # Y matrix (one column per k)
  X <- NULL # X matrix (one column per k)

  plot(0, 0, type = "n", xaxt = "n", xlim = range(k.range), ylim = range(PCA.1),
       xlab = "Number of clusters (k)",
       ylab = "Mean of the first principal component by clusters",
       main = "Clustergram of first principal component mean by k-means clusters")
  axis(side = 1, at = k.range)
  abline(v = k.range, col = "grey")

  centers.points <- list()

  for (k in k.range)
  {
    cl <- clustering.function(Data, k)
    clusters.vec <- cl$cluster
    # the.centers <- apply(cl$centers, 1, mean)
    the.centers <- drop(cl$centers %*% PC.loadings)  # cluster centers projected onto the first PC

    # stack the lines of each cluster on top of one another: every observation is offset
    # by the cumulative line width of the members of its cluster
    noise <- unlist(tapply(line.width, clusters.vec, cumsum))[order(seq_along(clusters.vec)[order(clusters.vec)])]
    # noise <- noise - mean(range(noise))
    y <- the.centers[clusters.vec] + noise
    Y <- cbind(Y, y)
    x <- rep(k, length(y))
    X <- cbind(X, x)

    # append (rather than index by k) so the list has no empty NULL slots
    centers.points[[length(centers.points) + 1]] <- data.frame(y = the.centers, x = rep(k, k))
    # points(the.centers ~ rep(k, k), pch = 19, col = "red", cex = 1.5)
  }

  matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)

  if (add.center.points)
  {
    # add the cluster centers as red points
    for (cp in centers.points) points(cp$x, cp$y, pch = 19, col = "red", cex = 1.3)
  }
}

set.seed(250)
Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
clustergram(Data, k.range = 2:8, line.width = .004, add.center.points = TRUE)
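The noise term in the loop is what keeps each bundle free of gaps: within each cluster, every observation is offset by the cumulative line width of the cluster members up to and including itself, so a cluster's lines sit directly on top of one another. A minimal sketch of the same computation using base R's ave(), which returns the per-group cumulative sums already in the original observation order; the helper name stack.offsets() is hypothetical.

```r
# Hypothetical helper: within-cluster stacking offsets.
# clusters: cluster id of each observation
# width:    line width (recycled to one value per observation)
stack.offsets <- function(clusters, width) {
  widths <- rep(width, length.out = length(clusters))
  # cumulative width within each cluster, returned in the original order
  ave(widths, clusters, FUN = cumsum)
}

cl <- c(2, 1, 2, 1, 2)
stack.offsets(cl, 0.1)
# 0.1 0.1 0.2 0.2 0.3
```

This gives the same offsets as the unlist(tapply(...))[order(...)] expression, without the manual reordering.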

---------------- Contact Details: ----------------
Contact me: Tal.Galili_at_gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Tue, Jun 15, 2010 at 4:46 PM, Hadley Wickham <hadley_at_rice.edu> wrote:

> > The glitches are the cases where you would have a bundle of lines belonging
> > to a specific cluster, but had spaces between them (because the place of one
> > of the lines was saved for another line that in the meantime moved to
> > another cluster).
>
> I think that display looked just fine!
>
> > I just came up with a solution for how to resolve this (after showering,
> > it tends to help my thinking...) - it is attached at the bottom of this e-mail.
> > I will later clean up the code a bit and publish it.
>
> I'd also suggest reordering the lines within each cluster mean so that
> (e.g.) all the lines going from 1a to 2a are all in the same position
> (i.e. at the top of the bundle of lines, not interspersed throughout).
>
> And again, think about using the colour for something useful, maybe
> the value of the variable that you're averaging over to get the y
> position.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>
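The reordering suggestion could be sketched as follows, assuming the cluster assignments for k and k+1 are both available (for kmeans() these would be the $cluster vectors of two consecutive fits). Before the stacking offsets are computed, the observations of each cluster are sorted by the cluster they move to at the next k, so all lines heading to the same next cluster end up adjacent within the bundle. The helper ordered.offsets() is hypothetical, not part of the posted code.

```r
# Hypothetical helper: stacking offsets with lines grouped by their next-step cluster.
# cl.now:  cluster of each observation at the current k
# cl.next: cluster of each observation at the next k
# width:   line width used as the stacking step
ordered.offsets <- function(cl.now, cl.next, width) {
  n <- length(cl.now)
  # visiting order: by current cluster, then by the cluster each line goes to next
  # (order() leaves any remaining ties in their original order)
  o <- order(cl.now, cl.next)
  offsets <- numeric(n)
  # cumulative width within each current cluster, accumulated in the visiting
  # order, then written back to the original observation positions
  offsets[o] <- ave(rep(width, n), cl.now[o], FUN = cumsum)
  offsets
}

ordered.offsets(cl.now = c(1, 1, 1, 2, 2), cl.next = c(2, 1, 2, 1, 3), width = 1)
# 2 1 3 1 2
```

Here observation 2, the only member of cluster 1 that stays in cluster 1 at the next k, sits at the bottom of its bundle, while observations 1 and 3, which both move to cluster 2, are stacked next to each other. Sorting by cluster id only groups the lines; it does not attempt to minimize crossings between bundles.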


______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 15 Jun 2010 - 14:14:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 15 Jun 2010 - 17:20:32 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
