Re: [R] kmeans and incomplete distance matrix concern

From: Ffenics <ffenics2002_at_yahoo.co.uk>
Date: Tue 08 Aug 2006 - 01:43:10 EST


I still don't quite understand. I thought the k-means algorithm went something like this:

Iterate until stable:
    Determine the centroid coordinates
    Determine the distance of each object to the centroids
    Assign each object to the group of its nearest centroid

So why do I not want a distance matrix?
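
For illustration only, here is a rough sketch of that loop written out by hand in R. The data matrix `x`, the choice of two clusters and the starting centroids are all made up for the example; this is not how kmeans() is implemented internally. The distances it needs are from each object to the current centroids, and the centroid update needs the objects' coordinates, which a matrix of object-to-object distances from dist() does not carry.

```r
# Hand-rolled k-means on a small, made-up data matrix (rows = cases, columns = variables).
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))
k <- 2
centers <- x[c(1, nrow(x)), , drop = FALSE]   # arbitrary starting centroids (assumption)

repeat {
  # squared Euclidean distance of every object to every current centroid
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  # group each object with its nearest centroid
  cluster <- max.col(-d2, ties.method = "first")
  # new centroid = column means of the member coordinates (needs the raw data)
  new_centers <- t(sapply(seq_len(k), function(j)
    colMeans(x[cluster == j, , drop = FALSE])))
  if (all(abs(new_centers - centers) < 1e-12)) break   # stable: stop iterating
  centers <- new_centers
}
table(cluster)
```

Note that `d2` has one column per centroid, not one column per object, so it is a different thing from the square matrix you would get from dist().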

Christian Hennig <chrish@stats.ucl.ac.uk> wrote: On Mon, 7 Aug 2006, Ffenics wrote:

> Well then I don't understand, because everything I have read so far suggests that you use the dist() function to create a matrix based on the Euclidean distance and then the kmeans() function.

kmeans requires a data matrix where cases are rows and variables are columns. (If you understand what kmeans does, you should know why - means can't be computed from distances.)
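
A small sketch of the point above, using simulated data (the matrix `x` and the column names `v1`, `v2` are invented for the example). It passes the raw data matrix to kmeans() and then checks that each returned centre is just the column mean of the cases in that cluster, which is a quantity you can only form from coordinates.

```r
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))
colnames(x) <- c("v1", "v2")

fit <- kmeans(x, centers = 2, nstart = 5)   # raw data matrix goes in, not dist(x)

# each centre should be the column mean of the cases assigned to that cluster
manual <- rbind(colMeans(x[fit$cluster == 1, , drop = FALSE]),
                colMeans(x[fit$cluster == 2, , drop = FALSE]))
all.equal(unname(fit$centers), unname(manual))

d <- dist(x)   # pairwise case-to-case distances, the input hclust() expects
length(d)      # choose(40, 2) = 780 distances; the coordinates are gone
```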

I'm not sure about the NA behaviour. I guess NAs produce an error? (Try it out!)
Anyway, I'd think about casewise deletion or imputation if I had to run kmeans on data with missing values.
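
A hedged sketch of both suggestions on made-up data with two values knocked out. The try() call records what kmeans() does with the NAs instead of assuming it. The imputation shown is plain column-mean filling, chosen only because it is short; it is a crude stand-in and not a recommendation over proper imputation methods.

```r
set.seed(7)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))
x[3, 1]  <- NA    # artificial missing values for the example
x[25, 2] <- NA

# "Try it out": capture the outcome of running kmeans() on data with NAs
res <- try(kmeans(x, centers = 2), silent = TRUE)
inherits(res, "try-error")

# Option 1: casewise deletion, keep only rows with no missing values
x_cc   <- x[complete.cases(x), , drop = FALSE]
fit_cc <- kmeans(x_cc, centers = 2, nstart = 5)

# Option 2: crude imputation, replace each NA by its column mean
x_imp <- x
for (j in seq_len(ncol(x_imp))) {
  miss <- is.na(x_imp[, j])
  x_imp[miss, j] <- mean(x_imp[, j], na.rm = TRUE)
}
fit_imp <- kmeans(x_imp, centers = 2, nstart = 5)
```

Casewise deletion drops the two incomplete rows, so `fit_cc$cluster` has 38 entries, while the imputed fit keeps all 40 cases.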

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue Aug 08 01:53:11 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 08 Aug 2006 - 02:21:44 EST.
