# [R] script problem to obtain pairs of overlap values

From: Rogério Rosa da Silva <rogeriorosas_at_gmail.com>
Date: Wed 12 Jul 2006 - 05:11:53 EST

Dear all,

I wrote some code to estimate the overlap between two kernel density estimates. The script should estimate the overlap between every pair of columns of a data frame. With S sampled species (columns) in my data frame, I want to obtain S(S-1)/2 pairwise overlap values between species. However, the code is not well written (only one overlap value is produced) and I can't find the solution.
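As a small illustration of the S(S-1)/2 count, the pairs of column indices can be enumerated with base R's `combn`. This is only a sketch; `S` and `pairs` are names chosen here for illustration.

```r
S <- 5                      # number of species (columns), as in the example data
pairs <- combn(S, 2)        # 2-row matrix; each column is one pair (i, j) with i < j
n_pairs <- ncol(pairs)      # number of distinct pairs
n_pairs == S * (S - 1) / 2  # the count matches the formula
```

Each column of `pairs` can then be passed to whatever function computes the overlap for one pair of species.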

To illustrate the calculations, I use the data frame "tdon" and the vector of bandwidths "h", which was estimated in another part of the script.

tdon <- data.frame(sp.1=c(5, 9, NA, 5, 11), sp.2=c(4, 2, 4, NA, 11), sp.3=c(5, 4, 2, 6, 13), sp.4=c(3, 11, NA, 5, 3), sp.5=c(2, 5, 2, 9, 9))

> h

[1] 1.047 2.973 0.887 1.520 2.955

Here is the code:

```
for (i in 1:(nbcol-1)) # nbcol<-ncol(tdon)
{
  tdon1 <- tdon[,i]
  tdon11 <- subset(tdon1, tdon1!="NA")
  fctk1 <- function(x)
  {density(tdon11, bw=h[i], kernel="gaussian")$y}

  for (j in (i+1):nbcol)
  {
    tdon2 <- tdon[,j]
    tdon21 <- subset(tdon2, tdon2!="NA")
    fctk2 <- function(x)
    {density(tdon21, bw=h[j], kernel="gaussian")$y}

    diffctk <- function(x)
    {abs(fctk1(x)-fctk2(x))}

    intctk <- approxfun(diffctk(x), rule=2)
    int <- integrate(diffctk,-Inf,Inf)$value
    overlap <- 1 - 0.5* int
  }
}
```

The use of "approxfun" to integrate the difference between the estimated density values (my "diffctk" function) was suggested by Thomas Lumley, but I'm not sure whether I have applied it properly or whether this approach is correct for my problem.
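One possible reading of the "approxfun" suggestion, for a single pair of species, is sketched below. It is not necessarily what was intended. It assumes both densities are evaluated on a shared grid (`lo` to `hi`, names chosen here), each grid is turned into a function of x with `approxfun`, and the absolute difference is integrated over that finite range. The samples are the non-missing values of sp.1 and sp.3 with their bandwidths from "h".

```r
# Non-missing values and bandwidths for sp.1 and sp.3 (from the example data)
x1 <- c(5, 9, 5, 11);     h1 <- 1.047
x2 <- c(5, 4, 2, 6, 13);  h2 <- 0.887

# Shared range (an assumption): wide enough for both Gaussian kernels to decay
lo <- min(x1, x2) - 4 * max(h1, h2)
hi <- max(x1, x2) + 4 * max(h1, h2)

# density() returns a grid of x and y values, not a function of x
d1 <- density(x1, bw = h1, kernel = "gaussian", from = lo, to = hi)
d2 <- density(x2, bw = h2, kernel = "gaussian", from = lo, to = hi)

# approxfun() interpolates each grid into a function; zero outside the grid
f1 <- approxfun(d1$x, d1$y, yleft = 0, yright = 0)
f2 <- approxfun(d2$x, d2$y, yleft = 0, yright = 0)

# Integrate the absolute difference over the shared range
int <- integrate(function(x) abs(f1(x) - f2(x)), lo, hi,
                 subdivisions = 2000L)$value
overlap_13 <- 1 - 0.5 * int
```

The point of the sketch is that `integrate` needs a function that returns one value per input x, which the `$y` component of `density()` alone does not provide.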

I need "overlap" to be a vector of length 10 (S = 5 species gives 5*4/2 = 10 pairs), holding the overlap values for all pairs.
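In the loop as posted, "overlap" is reassigned on every pass, which would explain why only one value survives. A minimal sketch of collecting all 10 values follows. It swaps in a plain trapezoid rule on a shared grid instead of `integrate`, and the helper `pair_overlap` and the range padding of four bandwidths are assumptions made for illustration.

```r
tdon <- data.frame(sp.1 = c(5, 9, NA, 5, 11), sp.2 = c(4, 2, 4, NA, 11),
                   sp.3 = c(5, 4, 2, 6, 13),  sp.4 = c(3, 11, NA, 5, 3),
                   sp.5 = c(2, 5, 2, 9, 9))
h <- c(1.047, 2.973, 0.887, 1.520, 2.955)

# Hypothetical helper: overlap between columns i and j of tdon
pair_overlap <- function(i, j) {
  x1 <- tdon[[i]][!is.na(tdon[[i]])]          # drop missing values
  x2 <- tdon[[j]][!is.na(tdon[[j]])]
  lo <- min(x1, x2) - 4 * max(h[i], h[j])     # shared evaluation range
  hi <- max(x1, x2) + 4 * max(h[i], h[j])
  d1 <- density(x1, bw = h[i], kernel = "gaussian", from = lo, to = hi)
  d2 <- density(x2, bw = h[j], kernel = "gaussian", from = lo, to = hi)
  y  <- abs(d1$y - d2$y)                      # same grid, so subtract directly
  int <- sum(diff(d1$x) * (head(y, -1) + tail(y, -1)) / 2)  # trapezoid rule
  1 - 0.5 * int
}

pairs   <- combn(ncol(tdon), 2)               # all pairs (i, j) with i < j
overlap <- apply(pairs, 2, function(p) pair_overlap(p[1], p[2]))
names(overlap) <- apply(pairs, 2,
                        function(p) paste(names(tdon)[p], collapse = "-"))
length(overlap)                               # one value per pair
```

Naming each element after its pair of species (for example "sp.1-sp.2") keeps the vector readable when S grows.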

Any help or advice on improving this code will be appreciated.

With kind regards,

Rogério

R-help@stat.math.ethz.ch mailing list