[R] rules for optimizing samples in CLARA (size and numbers) ?

From: vincent vicaire <vincent.vicaire_at_gmail.com>
Date: Thu, 03 Jun 2010 12:32:21 +0200


With a dataset of 9000 observations, I have noticed significant variability in the silhouette index when I change the default values of samples (default: 5) and sampsize (default: 40 + 2 * number of clusters) in CLARA.

Are there any rules, based on the number of clusters and observations, for setting the samples and sampsize parameters efficiently, so as to avoid under- and oversampling with CLARA on the one hand while keeping a reasonable running time on the other?

I did not find any rules of this kind on the web (apart from avoiding biased samples...).
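
To make the question concrete, here is a minimal sketch of how the variability could be measured over a small grid of samples and sampsize values. It assumes the cluster package is installed; the synthetic three-cluster data, the grid values and the column names are illustrative stand-ins, not my real dataset or a recommended setting. As far as I understand, the silhouette information stored in a clara object refers to the best sample only, which may be part of why it moves so much when sampsize changes.

```r
library(cluster)

set.seed(1)

# Synthetic stand-in for the 9000-observation dataset (assumption: 3 clusters in 2D)
n <- 9000
k <- 3
x <- rbind(
  matrix(rnorm(n / 3 * 2, mean = 0), ncol = 2),
  matrix(rnorm(n / 3 * 2, mean = 4), ncol = 2),
  matrix(rnorm(n / 3 * 2, mean = 8), ncol = 2)
)

# Illustrative grid: the defaults (5 samples, 40 + 2 * k) plus some larger values
grid <- expand.grid(samples  = c(5, 20, 50),
                    sampsize = c(40 + 2 * k, 200, 500))

res <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  # Time one clara() run for this combination of samples and sampsize
  elapsed <- system.time(
    fit <- clara(x, k,
                 samples  = grid$samples[i],
                 sampsize = grid$sampsize[i],
                 rngR     = TRUE)  # use R's RNG so that set.seed() applies
  )[["elapsed"]]
  data.frame(samples   = grid$samples[i],
             sampsize  = grid$sampsize[i],
             avg_sil   = fit$silinfo$avg.width,  # average silhouette width of the best sample
             objective = fit$objective,          # objective function on the whole dataset
             seconds   = elapsed)
}))

print(res)
```

Comparing the objective and seconds columns across rows gives a rough picture of where a larger samples or sampsize stops paying off, at least on data shaped like this.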

Gratefully yours.


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 03 Jun 2010 - 10:47:57 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Jun 2010 - 11:30:27 GMT.
