Re: [R] Cluster on both categorical and numerical data

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Wed, 18 Jun 2008 13:02:45 +0100

On Wed, 2008-06-18 at 12:43 +0100, Gavin Simpson wrote:
> On Wed, 2008-06-18 at 03:45 -0700, Birgitle wrote:
> > You could have a look at library(analogue) , function ?distance
>
> Thanks for the plug Birgit, but (and I say this as the author of
> distance), if you just want to compute a dissimilarity matrix using
> Gower's coefficient for mixed data, use daisy() from recommended package
> cluster because i) as cluster is recommended you don't need to install
> further packages, and ii) I haven't done timings, but daisy() will be
> much faster, and potentially use less memory, than distance() because
> daisy() is in compiled FORTRAN and is doing half the computations that
> distance does, which uses a pure R approach.
>
> distance was written with a very specific use-case in mind; of
> dissimilarities between rows of matrix A and rows of matrix B. That it
> does full dissimilarity matrix computation when provided a single matrix
> is a side effect (one that I intend to keep however).
>
> Eventually, distance will move to compiled C code, but that is
> immediately below "Learn C" on the ever lengthening TODO list ;-)
>
> >
> > and library (cluster), function ?agnes
>
> I think you mean daisy() here. agnes() is for /clustering/.

I meant to say here:

You pass the output from daisy() to agnes() (or to another clustering function that takes a dissimilarity matrix). agnes() itself can't compute the mixed-mode dissimilarity.

hclust() (see ?hclust), in the stats package that ships with R, is another option for doing the clustering once you have the dissimilarity matrix.
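
A minimal sketch of that workflow, in case it helps. The toy data frame `dat` and its column names are made up purely for illustration, and average linkage and k = 3 are arbitrary choices, not recommendations:

```r
library(cluster)  # recommended package, installed with R

# Hypothetical toy data: two numeric variables and one factor
set.seed(1)
dat <- data.frame(
  height = rnorm(10, mean = 170, sd = 10),
  weight = rnorm(10, mean = 70, sd = 8),
  group  = factor(sample(c("a", "b", "c"), 10, replace = TRUE))
)

# Step 1: Gower dissimilarities for the mixed-mode data
d <- daisy(dat, metric = "gower")

# Step 2a: agglomerative clustering with agnes() on the dissimilarities
ag <- agnes(d, method = "average")

# Step 2b: the same idea with hclust(); daisy() output also inherits
# from class "dist", so it can be passed in directly
hc <- hclust(d, method = "average")

# Cut the tree into 3 groups (3 is an arbitrary choice here)
groups <- cutree(hc, k = 3)
```

plot(ag) or plot(hc) should then draw the dendrogram, and as.hclust(ag) converts the agnes() result if you want to use cutree() on it as well.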

G

>
> G
>
> >
> > B.
> >
> >
> > Chua Siang Li wrote:
> > >
> > >
> > > > Hello there. Is there any function in R that can do cluster on a set of
> > > > data that has both categorical and numerical variables? thanks.
> > > siangli
> > > ______________________________________________
> > > R-help_at_r-project.org mailing list
> > >
https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> > -----
> > The art of living is more like wrestling than dancing.
> > (Marcus Aurelius)

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Received on Wed 18 Jun 2008 - 12:10:52 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 18 Jun 2008 - 12:30:44 GMT.
