Re: [Rd] kmeans

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Mon, 05 Jul 2010 18:54:36 +0200

>>>>> Gabor Grothendieck <ggrothendieck_at_gmail.com>
>>>>>     on Fri, 2 Jul 2010 18:50:28 -0400 writes:

> In kmeans() in stats one gets an error message with the default
> clustering algorithm if centers = 1. It's often useful to calculate
> the sum of squares for 1 cluster, 2 clusters, etc. and this error
> complicates things since one has to treat 1 cluster as a special case.
> A second reason is that easily getting the 1 cluster sum of squares
> makes it easy to calculate the between cluster sum of squares when
> there is more than 1 cluster.

> I suggest adding the line marked ### to the source code of kmeans (the
> other lines shown are just there to illustrate context). Adding this
> line forces kmeans to use the code for algorithm 3 if centers is 1.
> This is useful since unlike the code for the default algorithm, the
> code for algorithm 3 succeeds for centers = 1.

> if(length(centers) == 1) {
>     if (centers == 1) nmeth <- 3 ###
>     k <- centers

I agree that this is a reasonable improvement, and have applied it (+ documentation + example) to the R-devel sources.

Thank you, Gabor.
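
As a minimal sketch of the use case Gabor describes, assuming an R version that includes this change, the within-cluster sum of squares can be computed for k = 1, 2, ... in one loop, with no special case for a single cluster. The data set (USArrests), the range 1:5, the nstart value, and the helper name wss_for_k are choices made for this illustration only.

```r
# Illustrative sketch; assumes a kmeans() that accepts centers = 1.
set.seed(1)                      # kmeans() picks random starting centers
x <- as.matrix(USArrests)

# Hypothetical helper: total within-cluster sum of squares for k clusters.
wss_for_k <- function(k) sum(kmeans(x, centers = k, nstart = 10)$withinss)

wss <- vapply(1:5, wss_for_k, numeric(1))
names(wss) <- 1:5

# With one cluster, the within-cluster SS is the total SS about the column means.
totss <- sum(scale(x, scale = FALSE)^2)
all.equal(unname(wss[1]), totss)
```

The first element of wss is the one-cluster sum of squares, so the same vector can feed a plot of within-cluster SS against k directly.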

> Also note that KMeans in Rcmdr produces a betweenss and a tot.withinss
> and it would be nice if kmeans in stats did that too:

Well, patches (to the R-devel *sources*) are happily accepted.

Martin

> > library(Rcmdr)
> > str(KMeans(USArrests, 3))

> List of 6
>  $ cluster     : Named int [1:50] 1 1 1 2 1 2 3 1 1 2 ...
>   ..- attr(*, "names")= chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
>  $ centers     : num [1:3, 1:4] 11.81 8.21 4.27 272.56 173.29 ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr [1:3] "1" "2" "3"
>   .. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
>  $ withinss    : num [1:3] 19564 9137 19264
>  $ size        : int [1:3] 16 14 20
>  $ tot.withinss: num 47964   <=================
>  $ betweenss   : num 307844  <=================
>  - attr(*, "class")= chr "kmeans"
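
Both flagged quantities can be derived from the withinss component that kmeans() in stats already returns, together with the total sum of squares about the grand mean, which is also the one-cluster sum of squares mentioned earlier in this message. The following is a minimal sketch under that assumption, not the Rcmdr implementation; the helper name add_ss is hypothetical, and the cluster labels and their order depend on the random start.

```r
# Illustrative sketch: derive tot.withinss and betweenss from a kmeans fit.
# add_ss is a hypothetical helper, not part of stats or Rcmdr.
add_ss <- function(fit, x) {
  x <- as.matrix(x)
  totss <- sum(scale(x, scale = FALSE)^2)   # SS about the grand mean (the 1-cluster SS)
  tot.withinss <- sum(fit$withinss)         # pooled within-cluster SS
  list(totss = totss,
       tot.withinss = tot.withinss,
       betweenss = totss - tot.withinss)    # what the clustering accounts for
}

set.seed(1)
fit <- kmeans(USArrests, 3)
ss <- add_ss(fit, USArrests)
str(ss)
```

The sketch relies on the decomposition totss = tot.withinss + betweenss, which holds because each returned center is the mean of its cluster.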

> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Received on Mon 05 Jul 2010 - 17:02:42 GMT
