[R] kmeans and incomplete distance matrix concern

From: Ffenics <ffenics2002_at_yahoo.co.uk>
Date: Tue 08 Aug 2006 - 00:38:27 EST


Hi there
I have been using R to perform kmeans clustering on a dataset. The data are read in with read.table(), and a matrix (x) is then created from them as follows:

mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2),
              dimnames = list(levels(DF$V1), levels(DF$V2)))
mat[cbind(DF$V1, DF$V2)] <- DF$V3  # write each observed V3 into its (V1, V2) cell

A distance matrix (y) is then computed from this matrix with dist() before the kmeans clustering is performed.
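A note on the last step, with a sketch. stats::kmeans() expects a numeric data matrix (rows = items, columns = variables), not a distance matrix. If it is given the output of dist(), that output would be coerced to an ordinary n-by-n matrix, and each row of distances would be treated as coordinates. Clustering directly from distances is what hclust() does. The sketch below uses hypothetical, fully observed data to show both routes side by side:

```r
set.seed(1)  # kmeans() starts from random centres

# Hypothetical, fully observed data: 6 items measured on 3 variables,
# forming two well-separated groups.
x <- rbind(
  matrix(rnorm(9, mean = 0), nrow = 3),
  matrix(rnorm(9, mean = 10), nrow = 3)
)
rownames(x) <- paste0("item", 1:6)

# k-means works on the coordinates themselves.
km <- kmeans(x, centers = 2, nstart = 10)
km$cluster

# Distance-based alternative: dist() gives pairwise Euclidean distances
# between rows, and hclust() clusters from those distances directly.
y  <- dist(x)
hc <- hclust(y, method = "average")
cutree(hc, k = 2)
```

With groups this well separated, both methods agree. On real data they need not, and the linkage method passed to hclust() (here "average") is itself an assumption.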

My query is this: not all the data for the initial matrix (x) exist, so the matrix is not fully populated, and the empty cells are filled with 0s.
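To make the issue concrete, here is a small, self-contained sketch of the construction above. The data frame DF is a hypothetical stand-in for the read.table() result, using the columns V1, V2 and V3 from the post. One caveat for current R: since R 4.0.0, read.table() no longer converts strings to factors by default. V1 and V2 must therefore be factors (for example via stringsAsFactors = TRUE) for nlevels() and levels() to work. The sketch builds the matrix twice: once zero-filled, as in the post, and once with NA marking the empty cells.

```r
# Hypothetical long-format data: V1 = row label, V2 = column label,
# V3 = value. Row "c" has no entry for column "y".
DF <- data.frame(
  V1 = factor(c("a", "a", "b", "b", "c")),
  V2 = factor(c("x", "y", "x", "y", "x")),
  V3 = c(2, 0, 1, 3, 4)
)

# Construction from the post: every cell starts at 0. cbind() of two
# factors yields their integer codes, i.e. positions in levels().
mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2),
              dimnames = list(levels(DF$V1), levels(DF$V2)))
mat[cbind(DF$V1, DF$V2)] <- DF$V3

# Same construction, but unobserved cells start as NA instead of 0.
matNA <- matrix(NA_real_, nlevels(DF$V1), nlevels(DF$V2),
                dimnames = list(levels(DF$V1), levels(DF$V2)))
matNA[cbind(DF$V1, DF$V2)] <- DF$V3

mat["a", "y"]    # 0: a genuinely observed zero
mat["c", "y"]    # 0: no data at all, yet indistinguishable from the above
matNA["c", "y"]  # NA: the empty cell is now explicitly marked as missing
```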

Could someone please tell me how this may affect the result of dist()? A 0 in a distance matrix means that the two rows being compared are identical, doesn't it? I don't want things clustered together simply because there was no information.
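Part of the answer can be shown directly. The 0s sit in the data matrix x, not in the distance matrix y, so by themselves they do not mean "identical". They do distort dist(), though. Each filled-in 0 is treated as a real measurement, so two sparse rows that share many empty cells can come out closer to each other than to a row they actually agree with. dist() also accepts NA. According to its documentation, missing values are excluded pairwise, and the Euclidean sum is scaled up in proportion to the number of columns used. A pair with no observed column in common gets NA. The toy matrix below is hypothetical:

```r
# Hypothetical toy data: A and B each have one observed value, in
# different columns. C is fully observed and agrees with A and B
# wherever they were observed.
matNA <- rbind(
  A = c(3, NA, NA, NA),
  B = c(NA, NA, NA, 3),
  C = c(3, 3, 3, 3)
)

# Zero-filled version, as produced by matrix(0, ...) in the post.
mat0 <- matNA
mat0[is.na(mat0)] <- 0

d0  <- as.matrix(dist(mat0))   # Euclidean distances on zero-filled data
dNA <- as.matrix(dist(matNA))  # missing values excluded pairwise

round(d0, 3)
# A-B is sqrt(18), about 4.243, while A-C and B-C are sqrt(27), about 5.196:
# A and B look most alike even though they share no observed value.

dNA
# A-C and B-C are 0 (they agree wherever both are observed);
# A-B is NA because the two rows have no observed column in common.
```

Zero-filling is only harmless when an empty cell really does mean a measured value of 0.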

Is this a problem, and if so, are there ways to work around it? Thanks
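One possible workaround, sketched under stated assumptions rather than offered as the definitive fix, has four steps:

1. Build the matrix with NA instead of 0 (matrix(NA_real_, ...) in the construction above).
2. Drop rows that have too few observed values.
3. Let dist() handle the remaining gaps pairwise.
4. Cluster the distances with hclust().

The last step uses hclust() because stats::kmeans() does not accept missing values. The data, the min_obs threshold and the choice of average linkage are all hypothetical, chosen for illustration. Another common route, not shown here, is to impute the missing cells (for example with column means) and then run kmeans() on the completed matrix. How sensible that is depends on why the data are missing.

```r
# Hypothetical matrix with missing cells stored as NA (not 0).
matNA <- rbind(
  p = c(1.0, 1.2, NA,  0.9),
  q = c(1.1, NA,  1.0, 1.0),
  r = c(5.0, 5.2, 4.9, NA),
  s = c(NA,  5.1, 5.0, 4.8),
  t = c(NA,  NA,  NA,  2.0)   # only one observed value
)

# 1. Drop rows with too little information to compare reliably.
#    The threshold (at least 2 observed values) is an arbitrary choice.
min_obs <- 2
keep <- rowSums(!is.na(matNA)) >= min_obs
x <- matNA[keep, , drop = FALSE]

# 2. dist() excludes missing values pairwise and rescales the Euclidean sum.
y <- dist(x)

# 3. hclust() needs every pairwise distance to be defined.
stopifnot(!anyNA(y))

# 4. Cluster on the distances and cut the tree into 2 groups.
hc <- hclust(y, method = "average")
groups <- cutree(hc, k = 2)
groups
```

If two retained rows have no observed column in common, dist() returns NA for that pair and the stopifnot() check fails. Raising min_obs or dropping the offending rows are the obvious remedies.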




R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Tue Aug 08 00:42:46 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 08 Aug 2006 - 06:19:27 EST.
