# Re: [R] Clustering algorithms don't find obvious clusters

From: Dave Roberts <dvrbts_at_ecology.msu.montana.edu>
Date: Sat, 12 Jun 2010 14:54:26 -0600

Henrik,

Given your initial matrix, daisy(M) compares the rows, so the result tells you which authors are similar or dissimilar in terms of whom they cite. In this case authors 1 and 3 are most similar because they both cite authors 2 and 4 (sqrt(3^2+4^2)=5). Authors 2 and 3 are most different because they each make 6 citations but cite none of the same authors (sqrt(6^2+5^2+1^2)=7.87). Authors 1 and 2 are next most different because 1 makes only 5 citations but shares none with 2 (sqrt(6^2+4^2+1^2)=7.28), etc.
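
The distances quoted here can be checked with a short sketch along the following lines. The object names M and d and the A-D row labels (authors 1-4, labelled as in the original post) are illustrative assumptions; daisy() uses Euclidean distance by default when the input is all numeric.

```r
library(cluster)

# Citation matrix from the original post: rows are citing authors,
# columns are cited authors (A-D correspond to authors 1-4).
M <- matrix(c(0, 4, 0, 1,
              6, 0, 0, 0,
              0, 1, 0, 5,
              0, 0, 4, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Pairwise Euclidean distances between the rows (citing profiles).
d <- as.matrix(daisy(M))
round(d, 2)
# A-C is the smallest distance (5.00), B-C the largest (7.87), A-B next (7.28).
```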

If you want to know which authors are similar in terms of who has cited them, simply transpose the matrix:

daisy(t(M))
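
As a sketch of the transposed case (M, d_cited and the A-D labels are again illustrative assumptions), the columns of M become the profiles being compared, so two authors come out close when the same people cite them:

```r
library(cluster)

# Same citation matrix as in the original post: rows cite, columns are cited.
M <- matrix(c(0, 4, 0, 1,
              6, 0, 0, 0,
              0, 1, 0, 5,
              0, 0, 4, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Transposing makes each row the "cited-by" profile of one author.
d_cited <- as.matrix(daisy(t(M)))
round(d_cited, 2)
# B and D are closest (5.00): both are cited by A and C.
```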

I'm guessing none of this is actually what you are looking for, however, and Etienne's graph-theoretic approach may be more what you have in mind.
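
Neither of the distances above treats the citation counts themselves as proximities. A minimal sketch of one alternative follows; it is an assumption for illustration only, not Henrik's method and not Etienne's suggestion. It adds M to its transpose so that a citation in either direction counts as similarity, turns that into a dissimilarity by subtracting from the largest value, and passes the result to pam() as a dist object. The names S, D and fit are hypothetical.

```r
library(cluster)

# Citation matrix from the original post: rows cite, columns are cited.
M <- matrix(c(0, 4, 0, 1,
              6, 0, 0, 0,
              0, 1, 0, 5,
              0, 0, 4, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Symmetrize: total citations exchanged between each pair of authors.
S <- M + t(M)

# Assumed conversion to a dissimilarity: heavily linked pairs get small values.
D <- max(S) - S
diag(D) <- 0

# pam() treats a "dist" object as a dissimilarity matrix.
fit <- pam(as.dist(D), k = 2)
fit$clustering
# A and B land in one cluster, C and D in the other.
```

Whether subtracting from the maximum is a sensible conversion for a large, sparse matrix is a separate question; it is only one of several possible choices.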

Dave

David W. Roberts                                     office 406-994-4548
Department of Ecology                         email droberts_at_montana.edu
Montana State University
Bozeman, MT 59717-3460

Henrik Aldberg wrote:
> Dave,
>
> I used daisy with the default settings (daisy(M) where M is the matrix).
>
>
> Henrik
>
> On 11 June 2010 21:57, Dave Roberts <dvrbts_at_ecology.msu.montana.edu> wrote:
>
> Henrik,
>
> The clustering algorithms you refer to (and almost all others)
> expect the matrix to be symmetric. They do not seek a
> graph-theoretic solution, but rather proximity in geometric or
> topological space.
>
> How did you convert your matrix to a dissimilarity?
>
> Dave Roberts
>
> Henrik Aldberg wrote:
>
> I have a directed graph which is represented as a matrix of the form
>
> 0 4 0 1
> 6 0 0 0
> 0 1 0 5
> 0 0 4 0
>
> Each row corresponds to an author (A, B, C, D) and the values say how many
> times this author has cited the other authors. Hence the first row says
> that author A has cited author B four times and author D one time. Thus the
> matrix represents two groups of authors, (A,B) and (C,D), who cite each
> other. But there is also a weak link between the groups. In reality this
> matrix is much bigger and very sparse, but it still consists of distinct
> groups of authors.
>
> My problem is that when I cluster the matrix using pam, clara or agnes, the
> algorithms do not find the obvious clusters. I have tried to turn it into
> a dissimilarity matrix before clustering, but that did not help either.
>
> The layout of the clustering is not that important to me; my primary
> interest is to get the right nodes into the right clusters.
>
>
>
> Sincerely
>
>
> Henrik
>
> ______________________________________________
> R-help_at_r-project.org <mailto:R-help_at_r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Received on Sat 12 Jun 2010 - 21:00:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.