Re: [R] deviance vs entropy


From: RemoteAPL (
Date: Fri 16 Feb 2001 - 10:24:46 EST

Message-ID: <008201c097ae$ddb00d20$0200a8c0@suse>


Thank you for your answer. It gave me some food for thought. Let me ask a few more questions...

> I'm not quite sure what you have in mind, but I'm inferring from your
> comments that by "deviance" you mean:
> -SUM p_i log (p_i/q_i) (or -2 SUM p_i log (p_i/q_i))

I am sorry for my imprecise wording. I meant in particular the deviance that is
calculated when we select the split of a node while building a classification
tree. As far as I know it should be:

 -2 SUM n_i log (p_i)

where n_i is the number of points of class c_i at this node and p_i = n_i/N,
where N is the total number of cases at this node. It probably relates in some
way to what you wrote above. Could you say more about p_i and q_i in your
formula?
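
To make the node formula above concrete, here is a minimal Python sketch. The function names and the class counts are made up for illustration; natural logarithms are assumed, and a class with n_i = 0 is taken to contribute nothing to the sum. It also checks that this deviance is just 2*N times the entropy of the class proportions at the node, which is one way it connects to entropy.

```python
import math

def node_deviance(counts):
    """Node deviance: -2 * SUM n_i * log(p_i), with p_i = n_i / N."""
    total = sum(counts)
    # Empty classes are skipped: 0 * log(0) is treated as 0.
    return -2.0 * sum(n * math.log(n / total) for n in counts if n > 0)

def node_entropy(counts):
    """Entropy (in nats) of the class proportions p_i = n_i / N."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

# Hypothetical node with 30 cases of one class and 10 of another.
counts = [30, 10]
dev = node_deviance(counts)
ent = node_entropy(counts)

# The two quantities differ only by the factor 2 * N.
print(dev, 2 * sum(counts) * ent)
```

A pure node (all cases in one class) gives a deviance of zero under this formula, and the value grows as the classes become more evenly mixed.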

> D(p_i||q_i) = - SUM p_i log p_i + SUM p_i log q_i = H(p) - H(p:q)
> where H(p) is entropy of p, and H(p:q) is the cross entropy. If q is the
> uniform distribution, then the cross entropy reduces to:

I will probably understand this and the following statements once I understand
the first.

> I'm guessing that in the things you've read, when they are talking about
> deviance, q can (and generally is) something other than the uniform
> distribution. For example, p is often the empirical distribution of a data
> sample, and q is the distribution corresponding to some induced model. Then
> D(p||q) is a measure of how far the model is from the observed data.
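
The quoted idea can be sketched numerically as follows. This is only an illustration under stated assumptions: the sample, the class labels, and the model distribution q are invented, and the helper names are hypothetical. The sketch uses the usual non-negative convention SUM p_i log(p_i/q_i) = H(p:q) - H(p), which is the quoted expression with the sign flipped, and it assumes q_i > 0 wherever p_i > 0.

```python
import math
from collections import Counter

def empirical_distribution(sample, classes):
    """Relative frequency of each class in the sample."""
    counts = Counter(sample)
    return [counts[c] / len(sample) for c in classes]

def entropy(p):
    """H(p) = -SUM p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p:q) = -SUM p_i log q_i; requires q_i > 0 where p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """SUM p_i log(p_i / q_i), equal to H(p:q) - H(p) and never negative."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

classes = ["a", "b", "c"]
sample = ["a"] * 6 + ["b"] * 3 + ["c"] * 1   # invented data
p = empirical_distribution(sample, classes)   # empirical distribution
q = [0.5, 0.3, 0.2]                           # hypothetical model distribution

d = kl_divergence(p, q)
print(d, cross_entropy(p, q) - entropy(p))    # the two values agree
```

The divergence is zero when the model reproduces the empirical frequencies exactly and grows as q moves away from p.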

That sounds interesting. Could you please restate this in terms of
classification trees? I mean, what are the "induced model" and the
"corresponding distribution" when we are talking about CART?

> entropy (entropy - cross_entropy, or KL-divergence). Statisticians are
> interested in deviance
What does "KL" stand for?

> because (with the factor of 2) it is asymptotically chi-square for many
> modeling families. In
That is probably the most important argument in its favor.

> information theoretic terms it's nice to think of the deviance as the
> number of bits extra that it would take to transmit the data for a system
> assuming the distribution q, relative to a system that had assumed p, which
> is the best system for transmitting that particular data set.
Very interesting! I must think this over some more.
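
The "extra bits" reading can be checked with a small Python sketch. It assumes ideal code lengths of -log2(q_i) bits per symbol (ignoring rounding to whole bits), and the two distributions below are invented for illustration; the function name is hypothetical.

```python
import math

def extra_bits_per_symbol(p, q):
    """Average extra bits per symbol when coding data distributed as p
    with a code designed for q: SUM p_i * log2(p_i / q_i)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]        # invented empirical distribution of the data
q = [1 / 3, 1 / 3, 1 / 3]    # invented distribution assumed by the coder

# Average bits per symbol with the best code for p, and with the code for q.
bits_with_p = -sum(pi * math.log2(pi) for pi in p)
bits_with_q = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

extra = extra_bits_per_symbol(p, q)
print(bits_with_p, bits_with_q, extra)   # extra == bits_with_q - bits_with_p
```

Multiplying the per-symbol value by the number of cases N gives the total extra bits for the whole data set.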

> Then again, maybe I've misunderstood you completely. Please set me
> straight if I have.
I see that you sit quite straight. I am afraid that I lie horizontally:-)



This archive was generated by hypermail 2b30 : Fri 22 Jun 2001 - 18:58:33 EST