# [R] distribution overlap - how to quantify?

From: Doktor, Daniel <d.doktor03_at_imperial.ac.uk>
Date: Thu 25 Jan 2007 - 14:53:03 GMT

Dear R-Users,

My objective is to measure the overlap/divergence of two probability density functions, p1(x) and p2(x). One could apply a chi-square test, or estimate the potential mixture components and then compare the respective means and sigmas, but I was looking for a simpler measure of similarity.
I therefore used the concept of 'intrinsic discrepancy', which is defined as:

\delta\{p_{1}, p_{2}\} = \min
\left\{ \int_{\mathcal{X}} p_{1}(x)\,\log\frac{p_{1}(x)}{p_{2}(x)}\,dx,\ \int_{\mathcal{X}} p_{2}(x)\,\log\frac{p_{2}(x)}{p_{1}(x)}\,dx \right\}

The smaller the delta, the more similar the distributions (0 when they are identical). I implemented this in R as an adaptation of the Kullback-Leibler divergence. The function works, and I get the expected results.
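The original code was not posted, so here is a minimal sketch of how such an implementation might look: estimate both densities on a shared grid with `density()`, approximate each directed KL divergence by a Riemann sum, and take the minimum. All function names (`kl_divergence`, `intrinsic_discrepancy`) and the grid size are illustrative assumptions, not the poster's actual code.

```r
# Discretized KL divergence: sum p * log(p/q) * dx, skipping grid
# cells where either density is numerically zero.
kl_divergence <- function(p, q, dx) {
  ok <- p > 0 & q > 0
  sum(p[ok] * log(p[ok] / q[ok])) * dx
}

# Intrinsic discrepancy of two samples: the minimum of the two
# directed KL divergences between their kernel density estimates.
intrinsic_discrepancy <- function(x1, x2, n = 512) {
  rng <- range(x1, x2)                                # common support
  d1  <- density(x1, from = rng[1], to = rng[2], n = n)
  d2  <- density(x2, from = rng[1], to = rng[2], n = n)
  dx  <- diff(d1$x[1:2])                              # grid spacing
  min(kl_divergence(d1$y, d2$y, dx),
      kl_divergence(d2$y, d1$y, dx))
}

set.seed(1)
intrinsic_discrepancy(rnorm(1000, 0, 1), rnorm(1000, 0.5, 1))
```

With identical samples the estimate is (numerically) zero, and it grows as the distributions separate, matching the behaviour described above.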

The question is how to interpret the results. Obviously a delta of 0.5 reflects more similarity than a delta of 2.5, but how much more? Is there some kind of statistical test for such an index (other than a simulation-based evaluation)?

Daniel

Daniel Doktor
PhD Student
Imperial College
Royal School of Mines Building, DEST, RRAG Prince Consort Road
London SW7 2BP, UK
tel: 0044-(0)20-7589-5111-59276(ext)


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Fri Jan 26 02:09:17 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 25 Jan 2007 - 15:30:30 GMT.
