Re: [R] Extracting clusters from Data Frame

From: Gustaf Rydevik
Date: Mon, 10 Dec 2007 15:06:03 +0100

On Dec 10, 2007 2:28 PM, Johannes Graumann wrote:
> Hello,
> I have a large data frame (1006222 rows), which I subject to a crude
> clustering attempt that results in a vector stating whether the datapoint
> represented by a row belongs to a cluster or not. Conceptually this looks
> something like this:
> Value Cluster?
> 0.01 FALSE
> 0.03 TRUE
> 0.04 TRUE
> 0.05 TRUE
> 0.07 FALSE
> ...
> What I'm looking for is an efficient strategy to extract all consecutive
> rows associated with "TRUE" as a single cluster (data.frame
> representation?) without cluttering memory with thousands of data.frames.
> I was thinking of an independent data.frame that would contain a column of
> lists that reference all indexes from the big one which are contained in
> one cluster ...
> Can anyone kindly nudge me and let me know how to deal with this
> efficiently?
> Joh
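The layout sketched in the question, with one entry per cluster that holds the row indexes belonging to it, can be built in base R without creating any per-cluster data frames. What follows is a minimal sketch, assuming the membership flags are a logical vector. The names inCluster, runId, clusterIdx and clusterTab are illustrative only, and the seven-element vector is made-up example data.

```r
## Made-up membership flags: TRUE = row belongs to a cluster.
inCluster <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)

## A new run starts wherever the flag changes; cumsum() turns that into a run ID.
runId <- cumsum(c(TRUE, diff(inCluster) != 0))

## Split the row numbers of the TRUE rows by run ID: one integer vector per cluster.
clusterIdx <- unname(split(which(inCluster), runId[inCluster]))

## Optional: keep them in a data frame with a list column, as the question suggests.
clusterTab <- data.frame(size = lengths(clusterIdx))
clusterTab$rows <- clusterIdx

str(clusterIdx)
```

The rows of cluster k can then be materialised only when needed, e.g. bigDF[clusterIdx[[k]], ], where bigDF is a placeholder for the original data frame.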

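How about:

x <- sample(c(TRUE, FALSE), 100, replace = TRUE)  # stand-in for the logical cluster vector
r <- rle(x)
## c.ndx: last index of each run; c.size: run length; c.type: run value
Cluster <- data.frame(c.ndx = cumsum(r$lengths), c.size = r$lengths, c.type = r$values)
Cluster <- Cluster[Cluster$c.type == TRUE, ]

## Then, to get the indices of all original data belonging to cluster three:
idx <- rev(Cluster[3, "c.ndx"] - seq(length.out = Cluster[3, "c.size"]) + 1)
x[idx]  # the same idx subsets the rows of the original data frame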
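As a check on the two steps above, here they are applied to the five example rows quoted in the question (the trailing "..." is left out). With only one run of TRUE there is a single cluster, so cluster 1 is extracted instead of cluster three; flags and idx are illustrative names.

```r
## Cluster flags of the five example rows in the question.
flags <- c(FALSE, TRUE, TRUE, TRUE, FALSE)

r <- rle(flags)  # lengths 1 3 1, values FALSE TRUE FALSE
Cluster <- data.frame(c.ndx = cumsum(r$lengths),  # last row of each run
                      c.size = r$lengths,         # length of each run
                      c.type = r$values)          # TRUE = run is a cluster
Cluster <- Cluster[Cluster$c.type == TRUE, ]

## Row indices of the first (and only) cluster, counting back from c.ndx.
idx <- rev(Cluster[1, "c.ndx"] - seq(length.out = Cluster[1, "c.size"]) + 1)
print(idx)  # 2 3 4
```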

Not the neatest solution, but I'm sure someone here can improve on it. /Gustaf
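Because each cluster is a contiguous run of rows, one possible refinement is to store only its first and last row and subset the big data frame on demand, which costs two numbers per cluster regardless of cluster size. A minimal sketch follows; df, clusters and the helper getCluster() are hypothetical names, and the data are the question's example values plus two made-up rows so that there are two clusters.

```r
## The question's example values plus two made-up rows, giving two clusters.
df <- data.frame(Value = c(0.01, 0.03, 0.04, 0.05, 0.07, 0.08, 0.09),
                 inCluster = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE))

## Run-length encode the flag column once.
r <- rle(df$inCluster)
ends <- cumsum(r$lengths)       # last row of every run
starts <- ends - r$lengths + 1  # first row of every run

## One row per cluster (TRUE runs only): where it starts and ends.
clusters <- data.frame(start = starts[r$values], end = ends[r$values])

## Hypothetical helper: build only the rows of cluster k.
getCluster <- function(data, clusters, k) {
  data[clusters$start[k]:clusters$end[k], , drop = FALSE]
}

print(clusters)
print(getCluster(df, clusters, 2))
```

Compared with storing every index, this keeps the index table small; the trade-off is that it relies on the clusters staying contiguous in the current row order.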

Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address: Essingetorget 40, 112 66 Stockholm, SE

Received on Mon 10 Dec 2007 - 14:13:36 GMT
