Re: [R] millions of comparisons, speed wanted

From: Adrian DUSA <>
Date: Sun 18 Dec 2005 - 00:57:09 EST

The daisy function is _very_ good!
I have been able to use it for nominal variables as well, simply by: daisy(input)*ncol(input)
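
A small sketch of why the multiplication works, assuming the columns are stored as factors and contain no NAs (the toy data frame below is hypothetical, not my real data). With only nominal columns, daisy() falls back to Gower's coefficient, which is the share of columns in which two rows differ, so multiplying by ncol(input) gives the number of differing columns:

```r
library(cluster)  # recommended package that provides daisy()

# Hypothetical toy data: three nominal variables stored as factors.
input <- data.frame(
  a = factor(c("x", "x", "y", "y")),
  b = factor(c("p", "q", "p", "q")),
  c = factor(c("u", "u", "u", "v"))
)

# For nominal columns, Gower's coefficient is the fraction of
# columns in which two rows take different values.
d <- daisy(input, metric = "gower")

# Rescale the fractions to a count of differing columns per row pair.
mismatches <- round(as.matrix(d) * ncol(input))

# Cross-check one pair by brute force on the character values.
m <- as.matrix(input)
direct_1_4 <- sum(m[1, ] != m[4, ])
```

With NAs in the data the rescaling is no longer exact, because Gower's formula then divides by the number of columns observed in both rows rather than by ncol(input).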

Now, for a very large number of rows (say 5000), daisy runs for about 3 minutes and goes into swap space. I probably need more RAM (only 512 MB on my computer). But at least I get a result... :)
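
A back-of-the-envelope estimate of the result size, assuming the dissimilarities are stored as 8-byte doubles in the usual lower-triangle form (one value per pair of rows). This counts only the result object; whatever working copies daisy() makes internally come on top, and I have not measured those:

```r
n <- 5000                       # number of rows, as in the case above

# One dissimilarity per unordered pair of rows.
pairs <- n * (n - 1) / 2

# 8 bytes per double-precision value.
bytes <- pairs * 8

# Size of the lower triangle in megabytes (10^6 bytes).
mb_triangle <- bytes / 1e6

# A full n x n matrix, e.g. after as.matrix(), holds n^2 values.
mb_full <- n^2 * 8 / 1e6
```

That comes to about 100 MB for the triangle and 200 MB for a full matrix, which is a large share of 512 MB and would fit with the swapping.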

For relatively small input matrices, it increased the speed by a factor of 3. Way to go!


On 12/16/05, Martin Maechler <> wrote:
> I have not taken the time to look into this example,
> but
> daisy()
> from the (recommended, hence part of R) package 'cluster'
> is more flexible than dist(), particularly in the case of NAs
> and for (a mixture of continuous and) categorical variables.
> It uses a version of Gower's formula in order to deal with NAs
> and asymmetric binary variables. The example below looks like
> a very good match for this problem.
> Regards,
> Martin Maechler, ETH Zurich

Received on Sun Dec 18 06:12:07 2005
