From: Gabor Grothendieck <ggrothendieck_at_gmail.com>

Date: Fri, 02 Jul 2010 18:50:28 -0400

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri 02 Jul 2010 - 22:53:09 GMT

In kmeans() in stats one gets an error message with the default
clustering algorithm if centers = 1. It's often useful to calculate
the sum of squares for 1 cluster, 2 clusters, etc. and this error
complicates things since one has to treat 1 cluster as a special case.
A second reason is that easily getting the 1 cluster sum of squares
makes it easy to calculate the between cluster sum of squares when
there is more than 1 cluster.
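As a minimal sketch of the workaround described above, the 1-cluster case can be handled separately by computing the total sum of squares directly. The helper name wss and the choices of nstart = 10 and k = 1..5 are illustrative assumptions, not part of kmeans itself.

```r
# Sketch only: treat k = 1 as a special case so it never reaches kmeans().
x <- as.matrix(USArrests)

# With a single cluster, the within-cluster sum of squares is simply the
# total sum of squares about the column means.
totss <- sum(scale(x, scale = FALSE)^2)

# Hypothetical helper: total within-cluster sum of squares for k clusters.
wss <- function(k) {
  if (k == 1) return(totss)                       # the special case
  sum(kmeans(x, centers = k, nstart = 10)$withinss)
}

set.seed(1)
tot.withinss <- sapply(1:5, wss)

# The between-cluster sum of squares follows from the 1-cluster value.
betweenss <- totss - tot.withinss
```

The if (k == 1) branch is exactly the special-casing that would become unnecessary if kmeans accepted centers = 1 with its default algorithm.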

I suggest adding the line marked ### to the source code of kmeans (the other lines shown are just there to illustrate context). Adding this line forces kmeans to use the code for algorithm 3 if centers is 1. This is useful since, unlike the code for the default algorithm, the code for algorithm 3 succeeds for centers = 1.

    if (centers == 1) nmeth <- 3 ###
    k <- centers
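Until such a change is made in stats, roughly the same effect can be had from user code. The sketch below assumes that "algorithm 3" (nmeth 3) corresponds to algorithm = "MacQueen" in the kmeans interface; the wrapper name kmeans1 is hypothetical.

```r
# Sketch only: choose the algorithm from the caller's side when a single
# cluster is requested, instead of patching kmeans() itself.
# Assumption: nmeth 3 inside kmeans() is what algorithm = "MacQueen" selects.
kmeans1 <- function(x, centers, ...) {
  if (length(centers) == 1 && centers == 1) {
    kmeans(x, centers, algorithm = "MacQueen", ...)
  } else {
    kmeans(x, centers, ...)
  }
}

x <- as.matrix(USArrests)
fit1 <- kmeans1(x, 1)   # one cluster containing every row
fit3 <- kmeans1(x, 3)   # falls through to the default algorithm
```

Note that passing an explicit algorithm argument together with centers = 1 would clash with the one the wrapper supplies, so this is a convenience rather than a drop-in replacement.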

Also note that KMeans in Rcmdr produces a betweenss and a tot.withinss and it would be nice if kmeans in stats did that too:

> library(Rcmdr)
> str(KMeans(USArrests, 3))
List of 6
 $ cluster     : Named int [1:50] 1 1 1 2 1 2 3 1 1 2 ...
  ..- attr(*, "names")= chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
 $ centers     : num [1:3, 1:4] 11.81 8.21 4.27 272.56 173.29 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "1" "2" "3"
  .. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
 $ withinss    : num [1:3] 19564 9137 19264
 $ size        : int [1:3] 16 14 20
 $ tot.withinss: num 47964     <=================
 $ betweenss   : num 307844    <=================
 - attr(*, "class")= chr "kmeans"
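For comparison, the two extra components can be derived from what kmeans in stats already returns. This is a sketch under the assumption that the returned centers are the means of their clusters; the variable names are illustrative and nstart = 10 is an arbitrary choice.

```r
# Sketch only: derive tot.withinss and betweenss from a stats::kmeans fit.
x <- as.matrix(USArrests)
set.seed(1)
fit <- kmeans(x, centers = 3, nstart = 10)

# Total within-cluster sum of squares: add up the per-cluster values.
tot.withinss <- sum(fit$withinss)

# Between-cluster sum of squares: size-weighted squared distances of the
# cluster centers from the overall column means.
grand <- colMeans(x)
betweenss <- sum(fit$size * rowSums(sweep(fit$centers, 2, grand)^2))

# The two parts should add up to the total sum of squares.
totss <- sum(sweep(x, 2, grand)^2)
stopifnot(isTRUE(all.equal(totss, tot.withinss + betweenss)))
```

The final check relies on the usual decomposition of the total sum of squares into within-cluster and between-cluster parts.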
