# Re: [R] Very Slow Gower Similarity Function

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Tue 19 Apr 2005 - 05:33:58 EST

>>>>> "Tyler" == Tyler Smith <tyler.smith@mail.mcgill.ca>
>>>>> on Mon, 18 Apr 2005 12:10:34 -0400 writes:

Tyler> Hello, I am a relatively new user of R. I have
Tyler> written a basic function to calculate the Gower
Tyler> similarity function. I was motivated to do so partly
Tyler> as an exercise in learning R, and partly because the
Tyler> existing option (vegdist in the vegan package) does
Tyler> not accept missing values.

Tyler> I think I have succeeded - my function gives me the
Tyler> correct values. However, now that I'm starting to use
Tyler> it with real data, I realise it's very slow. It takes
Tyler> more than 45 minutes on my Windows 98 machine (R
Tyler> 2.0.1 Patched (2005-03-29)) with a 185x32 matrix with
Tyler> ca 100 missing values. If anyone can suggest ways to
Tyler> speed up my function I would appreciate it. I suspect
Tyler> having a pair of nested for loops is the problem, but
Tyler> I couldn't figure out how to get rid of them.

Tyler> The function is:

Tyler> ### Gower Similarity Matrix###

Tyler> sGow <- function (mat){
Tyler>
Tyler>   OBJ <- nrow(mat)       # number of objects
Tyler>   MATDESC <- ncol(mat)   # number of descriptors
Tyler>   MRANGE <- apply(mat,2,max,na.rm=T) - apply(mat,2,min,na.rm=T)  # descr ranges
Tyler>   DESCRIPT <- 1:MATDESC  # descriptor index vector
Tyler>   smat <- matrix(1, nrow = OBJ, ncol = OBJ)  # 'empty' similarity matrix
Tyler>
Tyler>   for (i in 1:OBJ){
Tyler>     for (j in i:OBJ){
Tyler>
Tyler>       ## calculate index vector of non-NA descriptors between objects i and j
Tyler>       descvect <- intersect(setdiff(DESCRIPT, DESCRIPT[is.na(mat[i,DESCRIPT])]),
Tyler>                             setdiff(DESCRIPT, DESCRIPT[is.na(mat[j,DESCRIPT])]))
Tyler>
Tyler>       descnum <- length(descvect)  # number of valid descr for i~j comparison
Tyler>
Tyler>       partialsim <- (1 - abs(mat[i,descvect] - mat[j,descvect]) / MRANGE[descvect])
Tyler>
Tyler>       smat[i,j] <- smat[j,i] <- sum(partialsim) / descnum
Tyler>     }
Tyler>   }
Tyler>   smat
Tyler> }

Tyler> Tyler

Tyler> -- Tyler Smith

Tyler> PhD Candidate Plant Science Department McGill University

Tyler> tyler.smith@mail.mcgill.ca
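
For a 185-row matrix, the nested loops in sGow visit 185*186/2 = 17205 pairs of objects, and each visit calls intersect() and setdiff(). One possible way to avoid that is to loop over the 32 descriptors instead and let outer() handle all pairs of objects at once. The following is only a minimal sketch of that idea, not a tested replacement: the name sGowFast is hypothetical, and the code assumes a numeric matrix in which every descriptor has a non-zero range.

```r
## Sketch: Gower similarity with NA handling, looping over descriptors
## (columns) instead of over pairs of objects (rows).
## 'sGowFast' is a hypothetical name, not part of the original post.
## Assumption: 'mat' is numeric and every column has a non-zero range.
sGowFast <- function(mat) {
  mat <- as.matrix(mat)
  n <- nrow(mat)
  ## descriptor ranges, computed as for MRANGE in the original function
  rng <- apply(mat, 2, max, na.rm = TRUE) - apply(mat, 2, min, na.rm = TRUE)
  num <- matrix(0, n, n)  # running sum of partial similarities per pair
  den <- matrix(0, n, n)  # number of descriptors valid for each pair
  for (k in seq_len(ncol(mat))) {
    x <- mat[, k]
    ## n x n partial similarities for descriptor k;
    ## an entry is NA whenever object i or object j is missing on k
    s <- 1 - abs(outer(x, x, "-")) / rng[k]
    ok <- !is.na(s)
    num[ok] <- num[ok] + s[ok]
    den <- den + ok
  }
  num / den
}

## Small hand-made example: 4 objects, 3 descriptors, 2 missing values
m <- matrix(c( 1,  2, NA,  4,
              10, NA, 30, 40,
               5,  5,  6,  8), nrow = 4)
S <- sGowFast(m)
```

The sketch keeps the same definition as the function above: for each pair, the partial similarities are summed over the descriptors that are non-missing for both objects and divided by the number of such descriptors. Comparing its result with sGow on a small matrix using all.equal() is a sensible check before trusting it on real data.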

```    Tyler> ______________________________________________
Tyler> R-help@stat.math.ethz.ch mailing list