Re: [R] cluster in R

From: Christian Hennig <chrish_at_stats.ucl.ac.uk>
Date: Thu 19 Oct 2006 - 10:37:31 GMT

On Wed, 18 Oct 2006, Weiwei Shi wrote:

> Dear Chris:
>
> I tried to use cor+1, but it still gives me an average silhouette width < 0.

Well, then it seems that the clustering is not that good. I don't know your data, and there is no theoretical reason why the average silhouette width has to be positive. You should read the Kaufman and Rousseeuw book (Finding Groups in Data, 1990) to understand it better.
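To see why a negative average is possible, here is a minimal base-R sketch of the silhouette width as defined by Kaufman and Rousseeuw, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its own cluster and b(i) is the smallest mean distance from point i to the members of another cluster. The function name sil_widths and the toy data are made up for illustration. This is not the code used by cluster.stats, and it assumes at least two clusters.

```r
# Silhouette widths computed directly from the definition.
# d:  a "dist" object
# cl: vector of cluster labels, one per observation (at least two clusters)
sil_widths <- function(d, cl) {
  m <- as.matrix(d)
  n <- nrow(m)
  labs <- sort(unique(cl))
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- cl == cl[i]
    own[i] <- FALSE                      # exclude the point itself
    if (!any(own)) { s[i] <- 0; next }   # singleton cluster: s(i) = 0 by convention
    a <- mean(m[i, own])                 # mean distance within own cluster
    b <- min(sapply(setdiff(labs, cl[i]),
                    function(k) mean(m[i, cl == k])))  # nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  s
}

# Hypothetical toy data: two well-separated groups on a line.
x    <- c(0, 1, 2, 10, 11, 12)
d    <- dist(x)
good <- c(1, 1, 1, 2, 2, 2)   # labels follow the two groups
bad  <- c(1, 2, 1, 2, 1, 2)   # labels cut across the two groups

avg_good <- mean(sil_widths(d, good))
avg_bad  <- mean(sil_widths(d, bad))
```

On this toy example the labelling that follows the two groups gives a clearly positive average, while the labelling that cuts across them gives a negative one, because some points are on average closer to the other cluster than to their own.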

Best wishes,
Christian

>
>> set.seed(1000)
>> t9 <- cor(t(x), method="pearson")+1 # here I add 1
>> t8 <- as.dist(t9)
>> t7 <- cutree(hclust(t8), 4)
>> cluster.stats(t8, t7)$avg.silwidth
> [1] -0.008750826
>> set.seed(1000)
>> t9 <- cor(t(x), method="pearson") # here I do not add 1
>> t8 <- as.dist(t9)
>> t7 <- cutree(hclust(t8), 4)
>> cluster.stats(t8, t7)$avg.silwidth
> [1] -0.09543089
>
> On 10/18/06, Weiwei Shi <helprhelp@gmail.com> wrote:
>> Dear Chris:
>>
>> thanks for the prompt reply!
>>
>> You are right, the Pearson-based dist has negative values there, so I should
>> use cor+1 in my case (since negatively correlated genes should be
>> considered farthest). Thanks.
>>
>> As to ?cluster.stats, I double-checked and found that I need to
>> restart JGR; only then does the help system recognize a
>> newly loaded package, such as fpc in this case.
>>
>> sorry for the confusion,
>>
>> weiwei
>>
>> On 10/18/06, Christian Hennig <chrish@stats.ucl.ac.uk> wrote:
>> > Dear Weiwei,
>> >
>> > > btw, ?cluster.stats does not work on my Mac machine.
>> > >> version
>> > > _
>> > > platform i386-apple-darwin8.6.1
>> > > arch i386
>> > > os darwin8.6.1
>> > > system i386, darwin8.6.1
>> > > status
>> > > major 2
>> > > minor 3.1
>> > > year 2006
>> > > month 06
>> > > day 01
>> > > svn rev 38247
>> > > language R
>> > > version.string Version 2.3.1 (2006-06-01)
>> >
>> > Because I don't have access to a Mac, I can't tell you anything about
>> > this, unfortunately.
>> > I always thought that my package should work on all platforms if it
>> > passes all the standard tests for packages?
>> > (Is there anyone else who could comment on this please?)
>> >
>> > > I have a sample like this
>> > >> dim(dd.df)
>> > > [1] 142 28
>> > >
>> > > and I want to cluster rows;
>> > > first of all, I followed the examples for cluster.stats() by
>> > > d.dd <- dist(dd.df) # use Euclidean
>> > > d.4 <- cutree(hclust(d.dd), 4) # 4 clusters I tried
>> > > cluster.stats(d.dd, d.4) # gives me some results like this:
>> > >
>> > > $cluster.size
>> > > [1] 133 5 2 2
>> > >
>> > > $avg.silwidth
>> > > [1] 0.9857916
>> > >
>> > > but when I tried a Pearson-based dist here (by visualization, I think 4
>> > > or 5 clusters are good for it), it gave me a very bad
>> > > avg.silwidth
>> > >
>> > > d.dd <- as.dist(cor(t(x),method="pearson")) # is this correct?
>> > > $cluster.size
>> > > [1] 86 31 6 19
>> > >
>> > > $avg.silwidth
>> > > [1] -0.09543089
>> >
>> > cor can give negative values, which doesn't fit the usual definition
>> > of a distance. I don't know what as.dist does in this case, but I think
>> > that, depending on your application, you should rather use the absolute
>> > value of the correlation, or 1+cor.
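The options can be compared in a short base-R sketch. It assumes the goal stated elsewhere in the thread, that negatively correlated rows should end up farthest apart. The data and variable names are hypothetical. Note that 1 + cor is largest for strongly positively correlated rows, so when used as a dissimilarity it places those rows farthest apart, whereas 1 - cor places them closest. as.dist() itself only takes the lower triangle of the matrix and, to my knowledge, does not check the sign of the values.

```r
set.seed(1)
# Hypothetical data: 6 rows ("genes") by 20 columns ("samples").
# Rows 1-3 follow a common profile, rows 4-6 follow its mirror image.
base <- rnorm(20)
x <- rbind(base + rnorm(20, sd = 0.1),
           base + rnorm(20, sd = 0.1),
           base + rnorm(20, sd = 0.1),
           -base + rnorm(20, sd = 0.1),
           -base + rnorm(20, sd = 0.1),
           -base + rnorm(20, sd = 0.1))

r <- cor(t(x), method = "pearson")   # row-by-row correlations in [-1, 1]

d_neg_far  <- as.dist(1 - r)         # near 0 for r near 1, near 2 for r near -1
d_neg_near <- as.dist(1 - abs(r))    # treats r = -1 like r = 1
d_sim      <- as.dist(1 + r)         # grows with r, so it orders pairs like a similarity

# Complete-linkage clustering on the dissimilarity that puts
# negatively correlated rows farthest apart.
cl <- cutree(hclust(d_neg_far), k = 2)
```

With 1 - r, the three rows that share a profile fall into one cluster and their mirror images into the other. Which transformation is appropriate still depends on the application, as noted above.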
>> >
>> > > btw, what's $separation? Where can I find a detailed explanation of
>> > > the output from cluster.stats?
>> >
>> > This is documented on the cluster.stats help page:
>> >
>> > separation: vector of clusterwise minimum distances of a point in the
>> > cluster to a point of another cluster.
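The definition quoted above can be worked through by hand in base R. This is only a sketch of that definition, not the fpc implementation; the function name separation_by_hand and the toy data are made up for illustration.

```r
# For each cluster: the smallest distance from any of its points
# to any point that belongs to a different cluster.
# d:  a "dist" object
# cl: vector of cluster labels, one per observation
separation_by_hand <- function(d, cl) {
  m <- as.matrix(d)
  sapply(sort(unique(cl)), function(k) min(m[cl == k, cl != k]))
}

# Hypothetical toy data: three groups on a line.
x   <- c(0, 1, 2, 10, 11, 12, 30)
cl  <- c(1, 1, 1, 2, 2, 2, 3)
sep <- separation_by_hand(dist(x), cl)
```

Here clusters 1 and 2 are separated by the gap between 2 and 10, and cluster 3 by the gap between 12 and 30, so the result has one value per cluster.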
>> >
>> > Best regards,
>> > Christian
>> >
>> >
>> > *** --- ***
>> > Christian Hennig
>> > University College London, Department of Statistical Science
>> > Gower St., London WC1E 6BT, phone +44 207 679 1698
>> > chrish@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
>> >
>>
>>
>> --
>> Weiwei Shi, Ph.D
>> Research Scientist
>> GeneGO, Inc.
>>
>> "Did you always know?"
>> "No, I did not. But I believed..."
>> ---Matrix III
>>


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu Oct 19 21:12:44 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 19 Oct 2006 - 11:30:13 GMT.