RE: [R] Developing functions

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Thu 01 Jul 2004 - 11:46:16 EST


> From: daniel@sintesys.com.ar
>
> Hi,
> I'm new to R. I'm working with similarity coefficients for clustering
> items. I created one function (coef) to calculate the coefficients from
> a pair of vectors and then, as an example, the function simple_matching,
> which takes a data.frame (X) and uses coef in a for loop.
> It works, but I believe it is a bad way to do it (I believe the for
> loop is not necessary). Can somebody suggest anything better?
> Thanks
> Daniel Rozengardt
>
> coef<-function(x1,x2){a<-sum(ifelse(x1==1&x2==1,1,0));
> b<-sum(ifelse(x1==1&x2==0,1,0));
> c<-sum(ifelse(x1==0&x2==1,1,0));
> d<-sum(ifelse(x1==0&x2==0,1,0));
> ret<-cbind(a,b,c,d);
> ret
> }
>
> simple_matching<-function(X) {
> ret<-matrix(ncol=dim(X)[1],nrow=dim(X)[1]);
> diag(ret)<-1;
> for (i in 2:length(X[,1])) {
> for (j in i:length(X[,1])) {
> vec<-coef(X[i-1,],X[j,]);
> result<-(vec[1]+vec[3])/sum(vec);
> ret[i-1,j]<-result;
> ret[j,i-1]<-result}};
> ret}

A few comments first:

  1. Unless you are putting multiple statements on the same line, there's no need to use ";".
  2. `coef' is a bad choice for a function name: there's a built-in generic function by that name in R, for extracting coefficients from fitted model objects. Also, in your `coef', a, b, c and d are scalars, so you don't need to cbind() them; c() works just fine.
  3. One of the best strategies for efficiency is to vectorize. Try to formulate the problem in matrix/vector operations as much as possible.
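  4. The computation looks a bit odd to me. Assuming the data are binary (i.e., all 0s and 1s), you are computing (N11 + N01) / N, where N is the length of the vectors, N11 is the number of positions where both vectors are 1, and N01 is the number of positions where the first vector is 0 and the second is 1. But N11 + N01 is simply the number of 1s in the second vector, so the result doesn't depend on the first vector at all; the usual simple matching coefficient would be (N11 + N00) / N. Are you sure that's what you want to compute?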
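Points 2 to 4 can be illustrated with a small sketch, assuming x1 and x2 are 0/1 vectors of the same length. The name `match_counts` is hypothetical, chosen so as not to mask `stats::coef`. Because comparisons return logical vectors and sum() counts TRUE as 1, ifelse() is not needed, and c() builds the named result. The last two lines show the issue in point 4: N11 + N01 is just the number of 1s in the second vector.

```r
# Hypothetical replacement for `coef`: counts the four 0/1 combinations
# for two binary vectors of equal length. Logical values sum as 0/1,
# so no ifelse() is needed, and c() replaces cbind().
match_counts <- function(x1, x2) {
  c(n11 = sum(x1 == 1 & x2 == 1),
    n10 = sum(x1 == 1 & x2 == 0),
    n01 = sum(x1 == 0 & x2 == 1),
    n00 = sum(x1 == 0 & x2 == 0))
}

x1 <- c(1, 0, 1, 1, 0)
x2 <- c(1, 1, 0, 1, 0)
counts <- match_counts(x1, x2)
counts   # n11 = 2, n10 = 1, n01 = 1, n00 = 1

# The quantity in the original loop, (N11 + N01) / N ...
(counts[["n11"]] + counts[["n01"]]) / length(x1)   # 0.6
# ... is just the proportion of 1s in x2:
mean(x2)                                            # 0.6
```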

Here's what I'd do (assuming the input matrix contains all 0s and 1s):

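simple_matching <- function(X) {
    X <- as.matrix(X)                  # so a data frame works too

    N11 <- crossprod(t(X))             # [i, j]: rows i and j are both 1
    N01 <- crossprod(t(1 - X), t(X))   # [i, j]: row i is 0, row j is 1
    ans <- (N11 + N01) / ncol(X)

    diag(ans) <- 1
    ans
}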
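If the goal was the usual simple matching coefficient, (N11 + N00) / N, the same matrix trick covers it: N00 counts the positions where both rows are 0, i.e. (1 - X) %*% t(1 - X). The sketch below assumes a 0/1 matrix with one item per row; `smc_matrix` is a hypothetical name, and tcrossprod(X) is just a shorter way to write crossprod(t(X)). For binary data the result can be cross-checked against dist(), since the Manhattan distance between two 0/1 rows is their number of mismatches.

```r
# Hypothetical sketch: simple matching coefficient (N11 + N00) / N
# between every pair of rows of a 0/1 matrix (rows = items).
smc_matrix <- function(X) {
  X <- as.matrix(X)
  n11 <- tcrossprod(X)          # X %*% t(X): both rows are 1
  n00 <- tcrossprod(1 - X)      # (1 - X) %*% t(1 - X): both rows are 0
  (n11 + n00) / ncol(X)         # diagonal is 1 automatically
}

X <- rbind(a = c(1, 0, 1, 1, 0),
           b = c(1, 1, 0, 1, 0),
           c = c(0, 0, 1, 1, 1))
S <- smc_matrix(X)

# Cross-check: Manhattan distance between binary rows = number of mismatches
S_dist <- 1 - as.matrix(dist(X, method = "manhattan")) / ncol(X)
all.equal(S, S_dist)   # TRUE
```

Rows b and c agree in only one of the five positions, so S["b", "c"] is 0.2, while rows a and b agree in three, giving 0.6.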

HTH,
Andy



R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Thu Jul 01 11:50:17 2004
