# RE: [R] Developing functions

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Thu 01 Jul 2004 - 11:46:16 EST

> From: daniel@sintesys.com.ar
>
> Hi,
> I'm new to R. I'm working with similarity coefficients for clustering
> items. I created one function (coef) to calculate the coefficients from
> two pairs of vectors and then, as an example, the function
> simple_matching, taking a data.frame (X) and using coef in a for loop.
> It works, but I believe it is a bad way to do it (I believe the for loop
> is not necessary). Can somebody suggest anything better?
> Thanks
> Daniel Rozengardt
>
> coef<-function(x1,x2){a<-sum(ifelse(x1==1&x2==1,1,0));
> b<-sum(ifelse(x1==1&x2==0,1,0));
> c<-sum(ifelse(x1==0&x2==1,1,0));
> d<-sum(ifelse(x1==0&x2==0,1,0));
> ret<-cbind(a,b,c,d);
> ret
> }
>
> simple_matching<-function(X) {
> ret<-matrix(ncol=dim(X),nrow=dim(X));
> diag(ret)<-1;
> for (i in 2:length(X[,1])) {
> for (j in i:length(X[,1])) {
> vec<-coef(X[i-1,],X[j,]);
> result<-(vec+vec)/sum(vec);
> ret[i-1,j]<-result;
> ret[j,i-1]<-result}};
> ret}

1. Unless you are putting multiple statements on the same line, there's no need to use ";".
2. `coef` is a bad choice for a function name: there's a built-in generic function by that name in R, for extracting coefficients from fitted model objects, and defining your own masks it. Also, inside that function a, b, c and d are scalars. You don't need to cbind() them; c() works just fine.
3. One of the best strategies for efficiency is to vectorize. Try to formulate the problem in matrix/vector operations as much as possible.
4. The computation looks a bit odd to me. Assuming the data are binary (i.e., all 0s and 1s), you are computing (N11 + N01) / N, where N is the length of the vectors, N11 is the number of 1-1 matches and N01 is the number of 0-1 matches. Are you sure that's what you want to compute?
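To make points 1 and 2 concrete, here is one way the counting helper might be written. It is only a sketch: the name `match_counts` is made up for illustration (chosen so it does not mask `coef`), and it assumes both inputs are 0/1 vectors of the same length.

```r
# Hypothetical replacement for the poster's coef(); the name match_counts
# is an illustrative choice, not something from the original post.
# Assumes x1 and x2 are 0/1 vectors of equal length.
match_counts <- function(x1, x2) {
  # Summing a logical vector counts its TRUE values, so ifelse() is not needed.
  c(a = sum(x1 == 1 & x2 == 1),
    b = sum(x1 == 1 & x2 == 0),
    c = sum(x1 == 0 & x2 == 1),
    d = sum(x1 == 0 & x2 == 0))
}

counts <- match_counts(c(1, 1, 0, 0, 1), c(1, 0, 1, 0, 1))
counts
```

The result is a named vector, so individual counts can be picked out by name (for example `counts["a"]`) rather than by position.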

Here's what I'd do (assuming the input matrix contains all 0s and 1s):

```
simple_matching <- function(X) {
    N11 <- crossprod(t(X))
    N01 <- crossprod(t(X), t(1-X))
    ans <- (N11 + N01) / ncol(X)
    diag(ans) <- 1
    ans
}
```
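If what is actually wanted is the usual simple matching coefficient, (N11 + N00) / N, the same matrix-product idea might be sketched as below. This is an assumption about the intended quantity, not something stated in the original question; the function name `smc` and the small test matrix are hypothetical.

```r
# Sketch of the simple matching coefficient between the rows of a 0/1 matrix.
# Assumption: the target is (N11 + N00) / N, the textbook definition, which
# may or may not be what the original poster intended.
smc <- function(X) {
  X <- as.matrix(X)          # also accept a data frame of 0/1 columns
  N11 <- tcrossprod(X)       # same as crossprod(t(X)): both rows are 1
  N00 <- tcrossprod(1 - X)   # both rows are 0
  (N11 + N00) / ncol(X)
}

# Made-up example data: three items (rows), four binary attributes (columns).
X <- rbind(c(1, 1, 0, 0),
           c(1, 0, 1, 0),
           c(0, 0, 1, 1))
S <- smc(X)

# Cross-check one entry against a direct row-by-row comparison.
direct <- mean(X[1, ] == X[2, ])
S[1, 2]
```

With this definition the diagonal comes out as 1 without being set by hand, since every row agrees with itself in all N positions, and the result is symmetric.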

HTH,
Andy

R-help@stat.math.ethz.ch mailing list