Re: [R] Very Slow Gower Similarity Function

From: Jari Oksanen <jari.oksanen_at_oulu.fi>
Date: Tue 19 Apr 2005 - 05:00:10 EST

On 18 Apr 2005, at 20:36, Anon. wrote:

> Jari Oksanen wrote:
>
>>
>> On 18 Apr 2005, at 19:10, Tyler Smith wrote:
>>
>>> Hello,
>>>
>>> I am a relatively new user of R. I have written a basic function to
>>> calculate
>>> the Gower similarity function. I was motivated to do so partly as an
>>> exercise
>>> in learning R, and partly because the existing option (vegdist in
>>> the vegan
>>> package) does not accept missing values.
>>>
>> Speed is the reason to use C instead of R. It should be easy, almost
>> trivial, to modify vegdist.c so that it handles missing values.
>> I guess this handling means ignoring the value pair if one of the
>> values is missing -- which is not so gentle to the metric properties
>> so dear to Gower. Package vegan is designed for ecological community
>> data which generally do not have missing values (except in
>> environmental data), but contributions are welcome.
>>
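For illustration, here is a minimal sketch of what a pairwise-deletion Gower distance might look like in C. It is not the code in vegan's vegdist.c. The function name gower_dist, the row-major n x p layout, coding missing values as NAN, computing column ranges from non-missing values only, and skipping constant columns are all assumptions. It uses the range-standardized form d[jk] = (1/M) sum_i |x[ij] - x[ik]| / (max x[i] - min x[i]), with M counting only the variables where both values are present.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical sketch (not vegan's vegdist.c): Gower distance for an
 * n x p matrix x stored row-major, with missing values coded as NAN.
 * Pairwise deletion: a variable is skipped for a pair of rows if either
 * value is missing, and the sum is divided by the number of variables
 * actually used. d receives a full n x n matrix (row-major).
 * Returns 0 on success, -1 if memory allocation fails. */
int gower_dist(const double *x, int n, int p, double *d)
{
    double *range = malloc((size_t) p * sizeof *range);
    if (range == NULL) return -1;

    /* Column ranges (max - min) over the non-missing values only. */
    for (int j = 0; j < p; j++) {
        double lo = INFINITY, hi = -INFINITY;
        for (int i = 0; i < n; i++) {
            double v = x[i * p + j];
            if (isnan(v)) continue;
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
        range[j] = (hi > lo) ? hi - lo : 0.0;
    }

    for (int a = 0; a < n; a++) {
        d[a * n + a] = 0.0;
        for (int b = a + 1; b < n; b++) {
            double sum = 0.0;
            int used = 0;
            for (int j = 0; j < p; j++) {
                double xa = x[a * p + j], xb = x[b * p + j];
                /* Skip missing pairs; assumption: constant columns too. */
                if (isnan(xa) || isnan(xb) || range[j] == 0.0) continue;
                sum += fabs(xa - xb) / range[j];
                used++;
            }
            /* NAN when the two rows share no usable variable. */
            double dist = used > 0 ? sum / used : NAN;
            d[a * n + b] = d[b * n + a] = dist;
        }
    }
    free(range);
    return 0;
}
```

Because each pair may be averaged over a different subset of variables, the triangle inequality is no longer guaranteed, which is the loss of metric properties mentioned above.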
> The only reason you never see ecological community data with missing
> values is because the ecologists remove those species/sites from their
> Excel sheets before they give it to you to sort out their mess.

Well, ecologists have plenty of missing species in their community data, but these have zero values since they were not observed. I guess some Bob O'Hara is going to have a paper about this in JAE.

> This is actually one of the few things they know how to do in Excel -
> I'm dreading the day when a paper appears in JAE saying that you can
> use Excel to produce P-values.
>
The "A" in "JAE" stands for "Animal": for real things they still have Journal of Ecology.

> To be slightly more serious, as an exercise the OP could consider
> writing a wrapper function in R that removes the missing data and then
> calls vegdist to calculate his Gower similarity index.
>
The looping over pairs happens inside the C code, so pairwise deletion of missing values is difficult to do with a wrapper. With complete.cases (listwise deletion) it is trivial, and then your result would be closer to metric as well.
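Listwise deletion, by contrast, happens before any distances are computed, so a wrapper is enough. In R that is roughly vegdist(x[complete.cases(x), ], method = "gower"). A minimal sketch of the same step in C follows; complete_cases and drop_incomplete are hypothetical names, and missing values are again assumed to be coded as NAN in a row-major n x p matrix.

```c
#include <math.h>
#include <string.h>

/* Hypothetical C analogue of R's complete.cases(): keep[i] is set to 1
 * if row i of the n x p row-major matrix x has no missing value (NAN),
 * else 0. Returns the number of complete rows. */
int complete_cases(const double *x, int n, int p, int *keep)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        keep[i] = 1;
        for (int j = 0; j < p; j++) {
            if (isnan(x[i * p + j])) { keep[i] = 0; break; }
        }
        count += keep[i];
    }
    return count;
}

/* Copy the rows flagged in keep into out (room for n * p doubles),
 * preserving their order. The result can be passed to any distance
 * routine that does not handle missing values. Returns rows copied. */
int drop_incomplete(const double *x, int n, int p, const int *keep,
                    double *out)
{
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (!keep[i]) continue;
        memcpy(out + (size_t) m * p, x + (size_t) i * p,
               (size_t) p * sizeof *x);
        m++;
    }
    return m;
}
```

Every remaining pair is then compared on the same set of variables, which is why this route keeps the metric properties that pairwise deletion can break, at the cost of discarding whole sampling units.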

--
Jari Oksanen, Oulu, Finland

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Apr 19 05:04:38 2005
