Re: [R] k-means: should columns in dataset be in same scale?

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed, 23 Apr 2008 06:46:23 +0100 (BST)

k-means uses Euclidean distance, so the scaling of the variables does matter. Whether you want to standardize depends on the application (as it does in most multivariate analysis problems; PCA, for example, has the same issue).
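
To make the point concrete, here is a minimal sketch in plain Python (standard library only) rather than R. Everything in it is an assumption made for illustration: the six data points, the helper names `standardize`, `sq_dist` and `kmeans`, and the fixed starting centers (rows 0 and 3), which are used only to keep the run reproducible. It is a bare-bones Lloyd-style iteration, not the algorithm R's `kmeans()` uses by default. Column 1 has sample variance of about 181 and carries no group structure; column 2 has sample variance of 2.7 and separates the rows into two groups (rows 0-2 versus rows 3-5).

```python
from statistics import mean, stdev

def standardize(rows):
    """Center each column and divide by its sample sd (n - 1 denominator)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sds)) for row in rows]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(rows, start_centers, max_iter=100):
    """Plain Lloyd iteration from fixed starting centers (illustrative only)."""
    centers = list(start_centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in rows
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = tuple(mean(c) for c in zip(*members))
    return labels

# Hypothetical data: (col 1, col 2). Rows 0-2 and rows 3-5 differ only in col 2.
data = [(-14, 0), (1, 0), (16, 0), (-16, 3), (-1, 3), (14, 3)]

# Start from rows 0 and 3, i.e. one row from each col-2 group.
raw_labels = kmeans(data, [data[0], data[3]])

scaled = standardize(data)
scaled_labels = kmeans(scaled, [scaled[0], scaled[3]])

print("raw:   ", raw_labels)
print("scaled:", scaled_labels)
```

On the raw data the partition is driven by column 1, even though the starting centers were taken from the two column-2 groups; after standardizing, the same starting rows recover the column-2 groups. In R the analogous preprocessing would be to pass `scale(x)` instead of `x` to `kmeans()`. Whether that is the right thing to do still depends on the problem, as noted above.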

On Tue, 22 Apr 2008, Johan Jackson wrote:

> Hi all,
>
> Simple question re k-means. If I have a data set with columns that are on
> different scales (say col 1 has var=100 and col 2 var=2), will this make a
> difference to the k-means algorithm? It seems as though it does. If so,
> should we first standardize the columns of the dataset so that each column
> is given equal weight?
>
> JJ

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
Received on Wed 23 Apr 2008 - 05:58:06 GMT