From: Liaw, Andy <andy_liaw_at_merck.com>

Date: Thu 01 Jul 2004 - 11:46:16 EST

> From: daniel@sintesys.com.ar
>
> Hi,
>
> I'm new in R. I'm working with similarity coefficients for clustering
> items. I created one function (coef), to calculate the coefficients from
> two pairs of vectors and then, as an example, the function
> simple_matching, taking a data.frame(X) and using coef in a for loop.
> It works, but I believe it is a bad way to do so (I believe the for loop
> is not necessary). Can somebody suggest anything better?
> Thanks
> Daniel Rozengardt
>
> coef<-function(x1,x2){a<-sum(ifelse(x1==1&x2==1,1,0));
> b<-sum(ifelse(x1==1&x2==0,1,0));
> c<-sum(ifelse(x1==0&x2==1,1,0));
> d<-sum(ifelse(x1==0&x2==0,1,0));
> ret<-cbind(a,b,c,d);
> ret
> }
>
> simple_matching<-function(X) {
> ret<-matrix(ncol=dim(X)[1],nrow=dim(X)[1]);
> diag(ret)<-1;
> for (i in 2:length(X[,1])) {
> for (j in i:length(X[,1])) {
> vec<-coef(X[i-1,],X[j,]);
> result<-(vec[1]+vec[3])/sum(vec);
> ret[i-1,j]<-result;
> ret[j,i-1]<-result}};
> ret}

A few comments first:

- Unless you are putting multiple statements on the same line, there's no need to use ";".
- In `coef' (which is a bad choice for a function name: There's a built-in generic function by that name in R, for extracting coefficients from fitted model objects), a, b, c and d are scalars. You don't need to cbind() them; c() works just fine.
- One of the best strategies for efficiency is to vectorize. Try to formulate the problem in matrix/vector operations as much as possible.
- The computation looks a bit odd to me. Assuming the data are binary (i.e., all 0s and 1s), you are computing (N11 + N01) / N, where N is the length of the vectors, N11 is the number of 1-1 matches and N01 is the number of 0-1 matches. Are you sure that's what you want to compute?
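The first three comments can be illustrated with a minimal sketch of the counting helper. The name match_counts is hypothetical (chosen here only to avoid masking the built-in coef generic), and the inputs are assumed to be 0/1 vectors of equal length:

```r
# Hypothetical rewrite of the counting helper; match_counts is an assumed
# name, not one used in the original post.
match_counts <- function(x1, x2) {
  # Logical values are summed as 0/1, so ifelse() is not needed,
  # and c() is enough to collect four scalars.
  c(a = sum(x1 == 1 & x2 == 1),   # 1-1 matches
    b = sum(x1 == 1 & x2 == 0),   # 1-0 mismatches
    c = sum(x1 == 0 & x2 == 1),   # 0-1 mismatches
    d = sum(x1 == 0 & x2 == 0))   # 0-0 matches
}

# Made-up example data, assumed binary.
x1 <- c(1, 1, 0, 0, 1)
x2 <- c(1, 0, 1, 0, 1)
counts <- match_counts(x1, x2)
print(counts)
```

The four counts always sum to the length of the vectors when the data are strictly 0s and 1s.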

Here's what I'd do (assuming the input matrix contains all 0s and 1s):

simple_matching <- function(X) {
    N11 <- crossprod(t(X))
    N01 <- crossprod(t(X), t(1-X))
    ans <- (N11 + N01) / ncol(X)
    diag(ans) <- 1
    ans
}
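If what is actually wanted is the usual simple matching coefficient, that is, the share of positions on which two items agree, (N11 + N00) / N, the same matrix approach applies. The following is only a sketch under that assumption; smc_matrix is a hypothetical name, and X is assumed to hold only 0s and 1s with one item per row:

```r
# Sketch assuming X contains only 0s and 1s, one item per row.
# smc_matrix is a hypothetical name, not from the original post.
smc_matrix <- function(X) {
  X <- as.matrix(X)
  N11 <- tcrossprod(X)       # [i, j]: positions where rows i and j are both 1
  N00 <- tcrossprod(1 - X)   # [i, j]: positions where rows i and j are both 0
  (N11 + N00) / ncol(X)      # share of positions on which the two rows agree
}

# Made-up example data.
X <- rbind(c(1, 0, 1, 1),
           c(1, 1, 0, 1),
           c(0, 0, 1, 0))
S <- smc_matrix(X)

# Cross-check one entry against the direct definition.
stopifnot(isTRUE(all.equal(S[1, 2], mean(X[1, ] == X[2, ]))))
print(S)
```

Here tcrossprod(X) is X %*% t(X), the same product as crossprod(t(X)). The diagonal comes out as 1 without a separate diag() assignment, since every row agrees with itself at all positions.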

HTH,
Andy

R-help@stat.math.ethz.ch mailing list

https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Thu Jul 01 11:50:17 2004

This archive was generated by hypermail 2.1.8 : Wed 03 Nov 2004 - 22:54:38 EST