**From:** Warren R. Greiff (*greiff@mitre.org*)

**Date:** Fri 16 Feb 2001 - 02:51:49 EST

**Next message:** Peter Dalgaard BSA: "Re: [R] Reading single precision floats from binary file" **Previous message:** Thomas Lumley: "Re: [R] deviance vs entropy" **In reply to:** RemoteAPL: "[R] deviance vs entropy" **Next in thread:** RemoteAPL: "Re: [R] deviance vs entropy" **Reply:** RemoteAPL: "Re: [R] deviance vs entropy"

Message-ID: <3A8C0925.89FA0678@mitre.org>

> RemoteAPL wrote:
>
> Hello,
>
> The question looks simple. It's probably even stupid. But I spent several hours searching the Internet and downloaded tons of papers where deviance is mentioned... and haven't found an answer.
>
> Well, it is clear to me why entropy is used when splitting a node of a classification tree. The sense is clear, because entropy is a good old measure of how uniform a distribution is, and we want, ideally, the distribution in a node to represent one class only.
>
> Where does deviance come from at all? I look at the formula and see that the only difference from entropy is the use of the *number* of points in each class, instead of the *probability*, as the multiplier of log(Pik). So it looks like deviance and entropy differ by a factor of 1/N (or 2/N), where N is the total number of cases. Then WHY say "deviance"? Is there a historical reason?
>
> Or, most likely, I do not understand something very basic. Please help.
>
> Thanks,
> Alexander Skomorokhov

I'm not quite sure what you have in mind, but I'm inferring from your comments that by "deviance"

you mean:

SUM p_i log (p_i/q_i) (or 2 SUM p_i log (p_i/q_i))

In information theoretic terms this is:

D(p||q) = SUM p_i log p_i - SUM p_i log q_i = H(p:q) - H(p)

where H(p) is the entropy of p, and H(p:q) is the cross entropy. If q is the uniform distribution over the N classes, then

the cross entropy reduces to:

-SUM p_i log q_i = -SUM p_i log 1/N = log(N) SUM p_i = log(N)

Substituting back, for uniform q the deviance is:

D(p||q) = H(p:q) - H(p) = log(N) - H(p).
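The uniform-q case can be checked numerically. A small sketch (in Python rather than R, purely for illustration, with made-up values for p):

```python
import math

# Illustrative distribution p over N = 4 classes (values are invented)
p = [0.5, 0.25, 0.125, 0.125]
N = len(p)
q = [1.0 / N] * N  # uniform reference distribution

entropy = -sum(pi * math.log(pi) for pi in p)                    # H(p)
cross_entropy = -sum(pi * math.log(qi) for pi, qi in zip(p, q))  # H(p:q)
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))         # D(p||q)

# Cross entropy against the uniform distribution is just log(N),
# so the divergence collapses to log(N) - H(p).
assert abs(cross_entropy - math.log(N)) < 1e-12
assert abs(kl - (math.log(N) - entropy)) < 1e-12
```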

I'm guessing that in the things you've read, when they talk about deviance, q can be (and generally is) something other than the uniform distribution. For example, p is often the empirical

distribution of a data sample, and q is the distribution corresponding to some induced model. Then

D(p||q) is a measure of how far the model is from the observed data.

Note that the cross entropy corresponds to the likelihood, L(data;q), of the data (which has an

empirical distribution of p) having been produced by the induced model q. The entropy corresponds

to the likelihood, L(data;p), of the data having been produced by a saturated model, one which fits

the empirical distribution of the data perfectly. So the deviance:

D(p||q) = log L(data;p) - log L(data;q) = log [L(data;p) / L(data;q)]
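This also connects to the factor of N in your question: with observed class counts n_i, the saturated log-likelihood uses p_i = n_i/N and the model log-likelihood uses q_i, so the (doubled) log-likelihood ratio is exactly 2N times the KL divergence. A hedged numeric sketch, with invented counts and an invented model q:

```python
import math

# Hypothetical class counts at a node, and some induced model q (both invented)
counts = [30, 15, 5]          # n_i
N = sum(counts)               # total number of cases
p = [n / N for n in counts]   # empirical distribution
q = [0.5, 0.3, 0.2]           # model distribution

loglik_saturated = sum(n * math.log(pi) for n, pi in zip(counts, p))
loglik_model = sum(n * math.log(qi) for n, qi in zip(counts, q))

deviance = 2 * (loglik_saturated - loglik_model)
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Deviance equals 2 * N * KL(p||q): counts instead of probabilities
# contribute exactly the factor of N asked about in the question.
assert abs(deviance - 2 * N * kl) < 1e-9
```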

is a measure of how far the model is from being as good as it could be. Statisticians tend to think

in terms of the ratio of likelihoods; Machine Learning folks tend to think in terms of relative entropy (cross_entropy - entropy, i.e. the KL divergence). Statisticians are interested in deviance

because (with the factor of 2) it is asymptotically chi-square for many modeling families. In

information theoretic terms it's nice to think of the deviance as the number of extra bits it

would take to transmit the data for a system assuming the distribution q, relative to a system that

had assumed p, which is the best system for transmitting that particular data set.
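The bits interpretation can be made concrete by working in log base 2: transmitting data from p with a code built for q costs H(p:q) bits per symbol instead of the optimal H(p), and the overhead per symbol is exactly the KL divergence. A small sketch with assumed distributions:

```python
import math

# Assumed source distribution p and the coder's mismatched assumption q
p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

h_p = -sum(pi * math.log2(pi) for pi in p)                   # optimal bits/symbol
h_pq = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))      # bits/symbol under q
extra = h_pq - h_p                                           # coding overhead

kl_bits = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))
# The overhead is exactly the KL divergence measured in bits.
assert abs(extra - kl_bits) < 1e-12
```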

Then again, maybe I've misunderstood you completely. Please set me straight if I have.

-warren


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html

Send "info", "help", or "[un]subscribe"

(in the "body", not the subject !) To: r-help-request@stat.math.ethz.ch

_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


*This archive was generated by hypermail 2b30: Fri 22 Jun 2001 - 18:58:33 EST*