Re: [R] Clustering algorithms don't find obvious clusters

From: Joris Meys <jorismeys_at_gmail.com>
Date: Sun, 13 Jun 2010 16:07:44 +0200

Henrik,

the methods you use are NOT applicable to directed graphs; on the contrary, they will split up what you want to put together. In your data, an author never cites himself. Hence, according to the techniques you use, A and B are far more different than B and D.
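To make this concrete, here is a small sketch that computes plain Euclidean distances between the rows of Henrik's matrix. It is written in Python rather than R, and the helper names are hypothetical. As far as I know, this is what daisy(M) does with its default settings for a numeric matrix (metric = "euclidean"), but treat that correspondence as an assumption.

```python
import math

# Henrik's citation matrix: row = citing author, column = cited author.
AUTHORS = ["A", "B", "C", "D"]
M = [
    [0, 4, 0, 1],
    [6, 0, 0, 0],
    [0, 1, 0, 5],
    [0, 0, 4, 0],
]

def row_distance(i, j):
    """Euclidean distance between rows i and j, as a geometric method sees them."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(M[i], M[j])))

def nearest(i):
    """Index of the row closest to row i, excluding i itself."""
    return min((j for j in range(len(M)) if j != i), key=lambda j: row_distance(i, j))

for i in range(len(M)):
    for j in range(i + 1, len(M)):
        print(f"d({AUTHORS[i]},{AUTHORS[j]}) = {row_distance(i, j):.2f}")
print("nearest to A:", AUTHORS[nearest(0)])
```

With these numbers the distance between A and B (about 7.28) exceeds the distance between B and D (about 7.21), and A's nearest row is C. Any method built on row distances therefore tends to pair authors across the two citation groups.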

Please check out Etienne's solution; that is what you want.

Cheers
Joris

On Sat, Jun 12, 2010 at 8:43 PM, Henrik Aldberg <henrik.aldberg_at_gmail.com> wrote:
> Dave,
>
> I used daisy with the default settings (daisy(M) where M is the matrix).
>
>
> Henrik
>
> On 11 June 2010 21:57, Dave Roberts <dvrbts_at_ecology.msu.montana.edu> wrote:
>
>> Henrik,
>>
>>    The clustering algorithms you refer to (and almost all others) expect
>> the matrix to be symmetric.  They do not seek a graph-theoretic solution,
>> but rather proximity in geometric or topological space.
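A minimal illustration of the symmetry point, using the same 4x4 matrix: Henrik's citation counts are not symmetric (A cites B 4 times, B cites A 6 times). One common workaround is to add the matrix to its transpose, so that S[i][j] counts citations in either direction. This is an assumption for illustration; nobody in this thread proposes it. The helper names are hypothetical.

```python
# Henrik's citation matrix (row cites column); it is not symmetric.
M = [
    [0, 4, 0, 1],
    [6, 0, 0, 0],
    [0, 1, 0, 5],
    [0, 0, 4, 0],
]

def is_symmetric(mat):
    """True if mat[i][j] == mat[j][i] for every pair of indices."""
    n = len(mat)
    return all(mat[i][j] == mat[j][i] for i in range(n) for j in range(n))

def symmetrize(mat):
    """Undirected tie strength: citations in either direction, S = M + M^T."""
    n = len(mat)
    return [[mat[i][j] + mat[j][i] for j in range(n)] for i in range(n)]

S = symmetrize(M)
print(is_symmetric(M), is_symmetric(S))
for row in S:
    print(row)
```

In S, the A-B tie is 10 and the C-D tie is 9, while the two cross-group ties (A-D and B-C) are 1 each.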
>>
>>    How did you convert your matrix to a dissimilarity?
>>
>> Dave Roberts
>>
>> Henrik Aldberg wrote:
>>
>>> I have a directed graph which is represented as a matrix of the form
>>>
>>>    0 4 0 1
>>>    6 0 0 0
>>>    0 1 0 5
>>>    0 0 4 0
>>>
>>> Each row corresponds to an author (A, B, C, D), and the values say how
>>> many times this author has cited the other authors. Hence the first row
>>> says that author A has cited author B four times and author D once. Thus
>>> the matrix represents two groups of authors, (A,B) and (C,D), who cite
>>> each other. But there is also a weak link between the groups. In reality
>>> this matrix is much bigger and very sparse, but it still consists of
>>> distinct groups of authors.
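Reading the matrix as a list of directed edges (row = citing author, column = cited author) makes the structure Henrik describes explicit. The sketch below is illustrative Python with a hypothetical helper name. For a very large, sparse matrix, an edge list like this is also more compact than the full matrix.

```python
AUTHORS = ["A", "B", "C", "D"]
M = [
    [0, 4, 0, 1],
    [6, 0, 0, 0],
    [0, 1, 0, 5],
    [0, 0, 4, 0],
]

def citation_edges(mat, names):
    """Directed edges (citing, cited, count) for every nonzero cell."""
    return [
        (names[i], names[j], mat[i][j])
        for i in range(len(mat))
        for j in range(len(mat[i]))
        if mat[i][j] != 0
    ]

for citing, cited, count in citation_edges(M, AUTHORS):
    print(f"{citing} -> {cited}: {count}")
```

The two weak cross-group links are A -> D and C -> B. Every other edge stays inside (A,B) or (C,D).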
>>>
>>>
>>> My problem is that when I cluster the matrix using pam, clara or agnes,
>>> the algorithms do not find the obvious clusters. I have tried to turn it
>>> into a dissimilarity matrix before clustering, but that did not help
>>> either.
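For comparison, here is a sketch of one graph-flavoured alternative. It assumes the symmetrized matrix S = M + M^T as tie strengths and applies single-linkage clustering: it merges along the strongest ties first (Kruskal style) until k clusters remain. This is an illustration I chose; it is not Etienne's solution mentioned in the thread. In R, the roughly analogous step would presumably be hclust() with method = "single" on a dissimilarity such as as.dist(max(S) - S), followed by cutree(). That mapping has not been checked here.

```python
AUTHORS = ["A", "B", "C", "D"]
M = [
    [0, 4, 0, 1],
    [6, 0, 0, 0],
    [0, 1, 0, 5],
    [0, 0, 4, 0],
]
# Symmetrized tie strengths: citations in either direction.
S = [[M[i][j] + M[j][i] for j in range(len(M))] for i in range(len(M))]

def single_linkage(sim, k):
    """Merge clusters along the strongest positive ties until k clusters remain."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every unordered pair with a positive tie, strongest first.
    pairs = sorted(
        ((sim[i][j], i, j) for i in range(n) for j in range(i + 1, n) if sim[i][j] > 0),
        reverse=True,
    )
    clusters = n
    for _, i, j in pairs:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            clusters -= 1

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())

print([[AUTHORS[i] for i in g] for g in single_linkage(S, 2)])
```

On this example, the function returns (A,B) and (C,D). Single linkage is used only because it is the simplest choice. On a large sparse graph it can chain groups together through a few weak links, so a dedicated community-detection method may well be a better fit.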
>>>
>>>
>>> The layout of the clustering is not that important to me; my primary
>>> interest is to get the right nodes into the right clusters.
>>>
>>>
>>>
>>> Sincerely
>>>
>>>
>>> Henrik
>>>
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
>

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys_at_Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

Received on Sun 13 Jun 2010 - 14:09:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 13 Jun 2010 - 14:10:29 GMT.

