Re: [R] finding clusters in a network

From: Gabor Csardi <csardi_at_rmki.kfki.hu>
Date: Fri 05 May 2006 - 23:50:03 EST

Hi,

I would recommend using the igraph package. I think it can be done with the others as well, but I don't know them very well.

If your ids are numeric, be sure that the first id is zero. Then create a matrix of your data with the parentid in the first column and the id in the second and omit the NA's. So you need something like this:

from1 to1
from2 to2

.
.
.

from1 -> to1 is the first directed edge, from2 -> to2 the second, etc. Once you have that (say in a variable 'el'), the rest is pretty straightforward. If you have 'N' different ids:

library(igraph)
g <- graph(t(el), n=N)
cl1 <- clusters(g, mode="weak")
cl2 <- clusters(g, mode="strong")
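For completeness, here is a minimal sketch of how 'el' and 'N' might be prepared in base R. It assumes the observations sit in a data frame 'obs' with columns 'id' and 'parentid' (both names and the toy values are made up for illustration), and that every non-NA parentid also occurs as an id. The ids are remapped to consecutive integers starting at zero, as described above.

```r
# Hypothetical data: the names 'obs', 'id' and 'parentid' are assumptions
obs <- data.frame(id       = c(10, 11, 12, 13, 14),
                  parentid = c(NA, 10, 10, NA, 13))

# Map the original ids to consecutive integers starting at zero
ids <- sort(unique(obs$id))
N   <- length(ids)

# Keep only the rows that have a parent, then build the two-column
# edge list: parent in the first column, child in the second
has_parent <- !is.na(obs$parentid)
el <- cbind(match(obs$parentid[has_parent], ids) - 1,
            match(obs$id[has_parent], ids) - 1)
```

Observations whose parentid is NA contribute no edge, but they are still counted in N, so isolates stay in the graph as single-node clusters.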

These calculate the weakly and strongly connected components of your graph. Strongly connected means that it has to be possible to reach each node from every other node via a directed path. In the weakly connected case, a single undirected path between two nodes (that is, a path that ignores edge directions) is enough for them to be in the same component.
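The difference is easy to see on a toy graph. The sketch below is written for current igraph releases rather than the version described in this message: there, vertex ids start at one instead of zero, and components() is the current name for clusters(). The three-node graph is made up.

```r
library(igraph)

# Toy directed graph (made-up example): 1 -> 2, 2 -> 1, 2 -> 3
el_toy <- rbind(c(1, 2), c(2, 1), c(2, 3))
g_toy  <- graph_from_edgelist(el_toy, directed = TRUE)

weak   <- components(g_toy, mode = "weak")
strong <- components(g_toy, mode = "strong")

weak$no     # number of weakly connected components
strong$no   # number of strongly connected components
```

Ignoring directions, all three nodes hang together, so there is one weak component. Nodes 1 and 2 can reach each other along directed edges, but node 3 has no way back, so there are two strong components.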

cl1$membership contains, for every node, the id of the cluster it belongs to (cluster ids start at zero), and cl1$csize contains the sizes of the clusters. The same applies to cl2.
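Since the question asks for clusters of two or more observations, here is a base R sketch of how they could be picked out from these two vectors. The vectors below are made up, in the shape described above (zero-based cluster ids); with a real result you would use cl1$membership and cl1$csize instead.

```r
# Made-up result in the shape described above: cluster ids start at zero
membership <- c(0, 0, 0, 1, 2, 2)   # one entry per node
csize      <- c(3, 1, 2)            # sizes of clusters 0, 1 and 2

# Ids of the clusters with two or more members
# (which() is one-based, so shift back to zero-based cluster ids)
big <- which(csize >= 2) - 1

# Zero-based node ids, grouped by cluster, for the large clusters only
nodes <- seq_along(membership) - 1
keep  <- membership %in% big
clusters_ge2 <- split(nodes[keep], membership[keep])
```

clusters_ge2 is a list with one element per remaining cluster, named by cluster id; the isolate in cluster 1 is dropped.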

Hope this helps,
Gabor

On Thu, May 04, 2006 at 06:02:14PM -0400, jv37@columbia.edu wrote:
> Hello,
>
> I have data with 1500+ observations. Each observation has two
> attributes _id_, a self identifier, and _parentid_, identifying a
> single association with a previous observation. The value of
> _parentid_ can either be the _id_ value of that single associated
> observation or NA. Different observations can be associated with
> the same previous observation and share the same value for
> _parentid_. In this way, these 1500+ observations form a directed
> graph made up of several disconnected clusters (of various sizes)
> and several isolates. I need to identify these disconnected
> clusters of two or more observations. I have been trying to figure
> out how to use the network, sna, and kinship packages, but, with
> little time before finals to read the relevant literature, I am
> desperate for some helpful advice.
>
> Thank you in advance--
>
> John
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Csardi Gabor <csardi@rmki.kfki.hu>    MTA RMKI, ELTE TTK

Received on Fri May 05 23:54:05 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 06 May 2006 - 02:09:59 EST.
