Re: [R] Cluster analysis, factor variables, large data set

From: Hans Ekbrand
Date: Thu, 31 Mar 2011 20:48:02 +0200

On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> Dear Hans,
> clara doesn't require a distance matrix as input (and therefore
> doesn't require you to run daisy), it will work with the raw data
> matrix using Euclidean distances implicitly.
> I can't tell you whether Euclidean distances are appropriate in this
> situation (this depends on the interpretation and variables and
> particularly on how they are scaled), but they may be fine at least
> after some transformation and standardisation of your variables.

The variables are unordered factors, stored as integers 1:9, where

1 means "Full-time employment"
2 means "Part-time employment"
3 means "Student"
4 means "Full-time self-employee"
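
To make the concern concrete, here is a minimal sketch, written in Python rather than R purely for illustration. It compares Euclidean distance on the raw integer codes with a simple matching dissimilarity (the share of variables on which two respondents differ). The function names and the one-variable respondents are assumptions made up for this example; only the four category labels come from the coding above.

```python
import math

# Integer codes as given in the post (codes 5 to 9 are not listed there).
LABELS = {
    1: "Full-time employment",
    2: "Part-time employment",
    3: "Student",
    4: "Full-time self-employee",
}


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def simple_matching(a, b):
    """Fraction of positions where the two category vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical respondents described by a single factor variable.
full_time, part_time, self_employed = [1], [2], [4]

# On the raw codes, the distance depends on the arbitrary numbering.
d_codes_12 = euclidean(full_time, part_time)
d_codes_14 = euclidean(full_time, self_employed)

# Simple matching only asks whether the categories differ.
d_match_12 = simple_matching(full_time, part_time)
d_match_14 = simple_matching(full_time, self_employed)

print(LABELS[1], "vs", LABELS[2], ":", d_codes_12, d_match_12)
print(LABELS[1], "vs", LABELS[4], ":", d_codes_14, d_match_14)
```

On the integer codes, "Full-time employment" ends up three times as far from "Full-time self-employee" as from "Part-time employment", only because of how the categories happen to be numbered. Under simple matching, every pair of different categories is equally far apart.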


Do Euclidean distances make sense on unordered factors coded as integers?

Received on Thu 31 Mar 2011 - 19:06:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 31 Mar 2011 - 19:30:25 GMT.