Re: [R] millions of comparisons, speed wanted

From: Adrian DUSA <dusa.adrian_at_gmail.com>
Date: Sun 18 Dec 2005 - 00:57:09 EST

The daisy function is _very_ good!
I have been able to use it for nominal variables as well, simply by: daisy(input)*ncol(input)
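
A minimal sketch of why the rescaling works, assuming every column of the input is a factor: in that case daisy() falls back to Gower's coefficient, which for nominal variables is the proportion of variables on which two rows differ. Multiplying by ncol(input) turns that proportion into a count of mismatches. The toy data frame and the name "mismatches" below are made up for illustration and are not from the original problem.

```r
library(cluster)

# Hypothetical toy data: three rows, three nominal (factor) variables.
input <- data.frame(
  a = factor(c("x", "x", "y")),
  b = factor(c("p", "q", "q")),
  c = factor(c("m", "m", "n"))
)

# daisy() gives the proportion of variables on which two rows differ;
# multiplying by the number of variables gives the number of differences.
mismatches <- as.matrix(daisy(input)) * ncol(input)

print(round(mismatches))
```

Rows 1 and 2 differ only in column b, rows 2 and 3 differ in a and c, and rows 1 and 3 differ in all three columns, so the off-diagonal entries should come out as 1, 2 and 3. Numeric 0/1 columns would be treated as interval-scaled instead, so they would need converting to factors first.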

Now, for a very large number of rows (say 5000), daisy runs for about 3 minutes and has to use swap space. I probably need more RAM (I have only 512 MB on my computer). But at least I get a result... :)

For relatively small input matrices, it made the computation about 3 times faster. Way to go!

Best,
Adrian

On 12/16/05, Martin Maechler <maechler@stat.math.ethz.ch> wrote:
> I have not taken the time to look into this example,
> but
> daisy()
> from the (recommended, hence part of R) package 'cluster'
> is more flexible than dist(), particularly in the case of NAs
> and for (a mixture of continuous and) categorical variables.
>
> It uses a version of Gower's formula in order to deal with NAs
> and asymmetric binary variables. The example below looks like
> a very good match for this problem.
>
> Regards,
> Martin Maechler, ETH Zurich
