Re: [R] Cluster analysis, factor variables, large data set

From: Hans Ekbrand <hans_at_sociologi.cjb.net>
Date: Thu, 31 Mar 2011 20:48:02 +0200

On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> Dear Hans,
>
> clara doesn't require a distance matrix as input (and therefore
> doesn't require you to run daisy), it will work with the raw data
> matrix using Euclidean distances implicitly.
> I can't tell you whether Euclidean distances are appropriate in this
> situation (this depends on the interpretation and variables and
> particularly on how they are scaled), but they may be fine at least
> after some transformation and standardisation of your variables.

The variables are unordered factors, stored as integers 1:9, where

1 means "Full-time employment"
2 means "Part-time employment"
3 means "Student"
4 means "Full-time self-employee"

...

Do Euclidean distances make sense on unordered factors coded as integers?
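
To make the question concrete, here is a minimal sketch in base R of what
Euclidean distance implies for such a coding. It uses only the four status
codes listed above as hypothetical toy data (not the real data set), and
the object names (codes, status, ind, d_int, d_ind) are made up for
illustration. For comparison it also computes the distances after recoding
the factor as 0/1 indicator columns, which is one possible transformation
and not necessarily the right one for this data.

```r
# Hypothetical toy data: the four status codes listed above.
labels <- c("Full-time employment", "Part-time employment",
            "Student", "Full-time self-employee")
codes <- 1:4

# Euclidean distance computed directly on the integer codes.
d_int <- as.matrix(dist(codes))
dimnames(d_int) <- list(labels, labels)

# The same categories recoded as 0/1 indicator columns, one per level.
status <- factor(codes, levels = 1:4, labels = labels)
ind <- model.matrix(~ status - 1)
d_ind <- as.matrix(dist(ind))
dimnames(d_ind) <- list(labels, labels)

print(d_int)            # distances depend on the arbitrary code numbers
print(round(d_ind, 3))  # every pair of distinct categories is equally far
```

On the integer codes, "Full-time employment" ends up three times as far
from "Full-time self-employee" as from "Part-time employment", purely
because of how the levels happen to be numbered. With indicator columns,
any two different categories are the same distance apart.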



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 31 Mar 2011 - 19:06:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 31 Mar 2011 - 19:30:25 GMT.
