From: Tal Galili <tal.galili_at_gmail.com>

Date: Tue, 15 Jun 2010 16:22:58 +0300


Hi Hadley,

Thanks for replying.

The glitches are cases where a bundle of lines belonging to one cluster has gaps in it, because a slot in the bundle was reserved for a line that has since moved to another cluster.

I just came up with a way to resolve this (after showering, which tends to help my thinking...). It is attached at the bottom of this e-mail.

I will clean up the code a bit later and publish it.

Best,

Tal

#----------------------------------------

set.seed(100)

# Two Gaussian blobs of 50 points each, in two dimensions
Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(Data) <- c("x", "y")

# noise <- runif(100,0,.05)
line.width <- rep(.004, dim(Data)[1])  # vertical space taken by each line

Y <- NULL
X <- NULL
k.range <- 2:10

plot(0, 0, col = "white", xlim = c(1, 10), ylim = c(-.5, 1.6),
     xlab = "Number of clusters", ylab = "Clusters means",
     main = "(Basic) Clustergram")
axis(side = 1, at = k.range)
abline(v = k.range, col = "grey")

centers.points <- list()

for (k in k.range)
{
  cl <- kmeans(Data, k)
  clusters.vec <- cl$cluster
  the.centers <- apply(cl$centers, 1, mean)

  # Stack the lines within each cluster (cumulative width per cluster),
  # then put the offsets back into the original row order
  noise <- unlist(tapply(line.width, clusters.vec,
                         cumsum))[order(seq_along(clusters.vec)[order(clusters.vec)])]
  noise <- noise - mean(range(noise))

  y <- the.centers[clusters.vec] + noise
  Y <- cbind(Y, y)
  x <- rep(k, length(y))
  X <- cbind(X, x)

  centers.points[[k]] <- data.frame(y = the.centers, x = rep(k, k))
  # points(the.centers ~ rep(k , k), pch = 19, col = "red", cex = 1.5)
}

library(colorspace)  # for rainbow_hcl()
COL <- rainbow_hcl(100)
matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)

# add points
# centers.points[[1]] is NULL because k starts at 2, so only the
# k.range entries are drawn
for (cp in centers.points[k.range])
  points(cp$x, cp$y, pch = 19, col = "red", cex = 1.3)
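To make the glitch and the fix easier to see, here is a minimal sketch on toy data. The six lines, their cluster labels and the names fixed.offset, cluster.offset, same.as.post, gaps.fixed and gaps.cluster are all made up for illustration and are not part of the code above. The "constant jitter" is assumed here to be a fixed slot per line, and ave() is used as a shorter way of writing the tapply()/order() expression.

```r
# Toy illustration (hypothetical data): six lines, constant slot width.
line.width   <- rep(0.004, 6)
clusters.vec <- c(1, 2, 1, 2, 1, 1)  # assumed cluster labels for one value of k

# Constant jitter (assumed form): every line keeps the same fixed slot,
# whichever cluster it currently belongs to.
fixed.offset <- cumsum(line.width)

# Per-cluster offsets: slots are numbered afresh inside every cluster.
cluster.offset <- ave(line.width, clusters.vec, FUN = cumsum)

# The tapply()/order() expression from the code above, written with
# order(order(.)) to restore the original row order.
same.as.post <- unlist(tapply(line.width, clusters.vec, cumsum))[
  order(order(clusters.vec))]

# Distance between neighbouring lines of cluster 1, in units of one slot.
gaps.fixed   <- diff(sort(fixed.offset[clusters.vec == 1])) / 0.004
gaps.cluster <- diff(sort(cluster.offset[clusters.vec == 1])) / 0.004
```

With the fixed slots, the lines of cluster 1 are separated by more than one slot wherever a line of cluster 2 sits between them, which is the gap described earlier. With the per-cluster offsets the lines of each cluster are contiguous, and same.as.post gives the same values as cluster.offset (apart from names).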

----------------Contact Details:-------------------------------------------------------
Contact me: Tal.Galili_at_gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Tue, Jun 15, 2010 at 3:45 PM, Hadley Wickham <hadley_at_rice.edu> wrote:

> > My current solution is to use a constant jitter (based on "seq") on all the
> > k number of clusters, but that causes glitches in the produced image (run my
> > code to see).
>
> What are the glitches? It looks pretty good to me. (I'm not sure if
> the colour does anything apart from make it pretty though).
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>

        [[alternative HTML version deleted]]

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 15 Jun 2010 - 13:25:44 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 15 Jun 2010 - 14:00:33 GMT.
