Re: [R] R and Clusters

From: Gabor Csardi <>
Date: Mon, 7 Jan 2008 15:45:53 +0100

Lorenzo, why can't you actually generate the graph to find the connection components? With the 'igraph' package this is something like:

g <- graph.adjacency( DIST < 0.5, mode="undirected" ) g <- simplify(g)

assuming you have your distance matrix in 'DIST'. If N is too big then you don't really want to create the distance matrix, but use a more sophisticated approach to find the points that are close to each other than your threshold; then create the graph. So why using clustering algorithms?

Btw. from your vector of points you can create the distance matrix by using the 'outer' function.

Btw2. your algorithm for graph generation is similar to geometric random graphs, see function '' in 'igraph'.


On Mon, Jan 07, 2008 at 03:26:57PM +0100, Lorenzo Isella wrote:
> Dear All,
> I hope I am not asking a FAQ. I am dealing with a problem of graph
> theory [connected components in a non-directed graph] and I do not
> want to rediscover the wheel.
> I saw a large number of R packages dealing for instance with the
> k-means method or hierarchical clustering for spatially distributed
> data and I am basically facing a similar problem.
> I am given a set of data which are the positions of particles in 3
> dimensions; I define two particles A and B to be directly connected if
> their Euclidean distance is below a certain threshold d. If A and B
> are directly connected and B and C are directly connected, then A,B
> and C are connected components (physically it means that they are
> members of the same cluster).
> All my N particles then split into k disjointed clusters, each with a
> certain number of connected components, and this is what I want to
> investigate.
> I do not know a priori how many clusters I have (this is my problem
> with e.g. k-means since k is an output for me); the only input is the
> set of 3-dimensional particle positions and a threshold distance.
> The algorithm/package I am looking should return the number of
> clusters and the composition of each cluster, e.g. the fact that the
> second cluster is made up of particles {R,T,L}.
> Consider for instance:
> # a 2-dimensional example
> x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
> matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
> colnames(x) <- c("x", "y")
> How can I then find out how many connected components I have when my
> threshold distance is d=0.5?
> Many thanks
> Lorenzo
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Csardi Gabor <>    MTA RMKI, ELTE TTK

______________________________________________ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 07 Jan 2008 - 14:50:16 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 07 Jan 2008 - 17:30:05 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.

list of date sections of archive