Re: [R] Extracting clusters from Data Frame

From: Johannes Graumann <johannes_graumann_at_web.de>
Date: Tue, 11 Dec 2007 20:36:39 +0100

Gustaf Rydevik wrote:

> On Dec 10, 2007 2:28 PM, Johannes Graumann <johannes_graumann@web.de>
> wrote:
>> Hello,
>>
>> I have a large data frame (1006222 rows), which I subject to a crude
>> clustering attempt that results in a vector stating whether the data point
>> represented by a row belongs to a cluster or not. Conceptually this looks
>> something like this:
>> Value Cluster?
>> 0.01 FALSE
>> 0.03 TRUE
>> 0.04 TRUE
>> 0.05 TRUE
>> 0.07 FALSE
>> ...
>> What I'm looking for is an efficient strategy to extract all consecutive
>> rows associated with "TRUE" as a single cluster (data.frame
>> representation?) without cluttering memory with thousands of data.frames.
>> I was thinking of an independent data.frame that would contain a column
>> of lists that reference all indices from the big one which are contained
>> in one cluster ...
>> Can anyone kindly nudge me and let me know how to deal with this
>> efficiently?
>>
>> Joh
>>
>
> How about:
> orig.data<-sample(c(TRUE,FALSE),100,replace=TRUE)
>
> ## Run-length encode once; c.ndx is the last position of each run
> runs<-rle(orig.data)
> Cluster<-data.frame(c.ndx=cumsum(runs$lengths),c.size=runs$lengths,c.type=runs$values)
> Cluster<-Cluster[Cluster$c.type==TRUE,]
>
> ##Then, to get all original data belonging to cluster three:
> orig.data[rev(Cluster[3,"c.ndx"]-seq(length.out=Cluster[3,"c.size"])+1)]
>
>
> Not the neatest solution, but I'm sure someone here can improve on it.
> /Gustaf
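Building on the rle() idea above, here is a minimal sketch that stores only the first and last row of each TRUE run, so a cluster is materialised as a data.frame only when it is asked for. It assumes the cluster flags are a plain logical column without NAs. The names `df`, `in.cluster`, `clusters` and `get.cluster` and the toy values are hypothetical, chosen only for illustration.

```r
## Toy data standing in for the large data frame (hypothetical values)
df <- data.frame(Value = c(0.01, 0.03, 0.04, 0.05, 0.07, 0.08, 0.09),
                 in.cluster = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE))

## Run-length encode the flags once
runs <- rle(df$in.cluster)
ends <- cumsum(runs$lengths)          # last row of every run
starts <- ends - runs$lengths + 1     # first row of every run

## Keep only the runs of TRUE, i.e. the clusters (assumes no NA flags)
clusters <- data.frame(start = starts[runs$values], end = ends[runs$values])

## Pull out the rows of cluster k only when needed
get.cluster <- function(k) df[clusters$start[k]:clusters$end[k], ]

get.cluster(2)
```

With this layout only two numbers per cluster are kept in memory, and `df` itself is never copied until a specific cluster is extracted.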

Thank you for this example! "rle" was indeed what saved my day!
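The index-list layout described in the original question (one entry per cluster holding the row numbers it contains) can also be derived from rle(). A minimal sketch, assuming a logical flag vector without NAs; `flags`, `run.id`, `cluster.rows` and `cl` are hypothetical names. Each position is labelled with the number of the run it belongs to, and the row numbers of the TRUE positions are then split by that label.

```r
flags <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)

runs <- rle(flags)
## Label every position with the index of the run it falls in
run.id <- rep(seq_along(runs$lengths), runs$lengths)

## One integer vector of row numbers per TRUE run (cluster)
cluster.rows <- unname(split(which(flags), run.id[flags]))

## Optional: keep the lists as a column of a small data.frame via I()
cl <- data.frame(size = lengths(cluster.rows), rows = I(cluster.rows))

cluster.rows[[1]]
```

The rows of cluster k in the big data frame would then be `bigframe[cl$rows[[k]], ]`, where `bigframe` is again a placeholder name.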

Joh



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Tue 11 Dec 2007 - 19:41:30 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 11 Dec 2007 - 20:30:18 GMT.
