Re: [R] cluster in R

From: Christian Hennig <chrish_at_stats.ucl.ac.uk>
Date: Wed 18 Oct 2006 - 23:26:48 GMT

Dear Weiwei,

> btw, ?cluster.stats does not work on my Mac machine.
>> version
> _
> platform i386-apple-darwin8.6.1
> arch i386
> os darwin8.6.1
> system i386, darwin8.6.1
> status
> major 2
> minor 3.1
> year 2006
> month 06
> day 01
> svn rev 38247
> language R
> version.string Version 2.3.1 (2006-06-01)

Unfortunately, I don't have access to a Mac, so I can't tell you anything about this.
I always thought that a package should work on all platforms if it passes all the standard package checks. Is that not the case?
(Could anyone else comment on this, please?)

> I have a sample like this
>> dim(dd.df)
> [1] 142 28
>
> and I want to cluster rows;
> first of all, I followed the examples for cluster.stats() by
> d.dd <- dist(dd.df) # use Euclidean
> d.4 <- cutree(hclust(d.dd), 4) # 4 clusters I tried
> cluster.stats(d.dd, d.4) # gives me some results like this:
>
> $cluster.size
> [1] 133 5 2 2
>
> $avg.silwidth
> [1] 0.9857916
>
> but when I tried to use Pearson distance here (by visualization, I think 4
> or 5 clusters are good for Pearson distance), it gave me a very bad
> avg.silwidth:
>
> d.dd <- as.dist(cor(t(x),method="pearson")) # is This correct?
> $cluster.size
> [1] 86 31 6 19
>
> $avg.silwidth
> [1] -0.09543089

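cor can give negative values, which doesn't fit the usual definition of a distance. Moreover, a large correlation means that two rows are similar, whereas a distance should be small for similar rows. I don't know what as.dist does in this case, but I think that, depending on your application, you should rather use 1 minus the absolute value of the correlation, or 1-cor.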
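A minimal sketch of turning Pearson correlations between rows into a dissimilarity that hclust() and cluster.stats() can use. The data in x are hypothetical random numbers with the objects as rows. With 1 - cor, perfectly correlated rows are at distance 0 and perfectly anti-correlated rows at distance 2; with 1 - abs(cor), the sign of the correlation is ignored. Which one is appropriate depends on the application.

```r
# Hypothetical example data: 10 objects (rows) measured on 6 variables.
set.seed(1)
x <- matrix(rnorm(60), nrow = 10)

# cor() correlates columns, so transpose to correlate rows.
r <- cor(t(x), method = "pearson")

# Signed version: correlation +1 -> distance 0, correlation -1 -> distance 2.
d.signed <- as.dist(1 - r)

# Sign-free version: strong correlation of either sign -> small distance (0..1).
d.abs <- as.dist(1 - abs(r))

# Either can be used wherever a "dist" object is expected.
cl <- cutree(hclust(d.signed), 4)
# library(fpc); cluster.stats(d.signed, cl)
```

cluster.stats() lives in the fpc package, so that call is left commented out to keep the sketch dependent only on base R and stats.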

> btw, what's $separation? Where can I find the detailed explanation of
> the output from cluster.stats?

This is documented on the cluster.stats help page:

separation: vector of clusterwise minimum distances of a point in the cluster to a point of another cluster.
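A minimal sketch of that definition in base R, assuming d is a dist object and clustering is a vector of cluster labels in the same order as the objects. The function name separation_sketch and the toy data are hypothetical; this is not the fpc implementation, which may differ in details.

```r
# For each cluster, the minimum distance between a point inside the
# cluster and a point outside it.
separation_sketch <- function(d, clustering) {
  dm <- as.matrix(d)                  # full symmetric distance matrix
  ks <- sort(unique(clustering))      # cluster labels
  sapply(ks, function(k) {
    inside <- clustering == k
    min(dm[inside, !inside, drop = FALSE])
  })
}

# Hypothetical 1-D example with three groups.
pts <- c(0, 1, 10, 11, 30)
d <- dist(pts)
cl <- c(1, 1, 2, 2, 3)
sep <- separation_sketch(d, cl)
print(sep)   # 9 9 19
```

If clustering contains only one cluster, there are no outside points and min() returns Inf with a warning.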

Best regards,
Christian


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Thu Oct 19 09:38:12 2006
