**From:** RemoteAPL (*remoteapl@obninsk.com*)

**Date:** Fri 16 Feb 2001 - 10:24:46 EST

**Next message:** Denis White: "[R] polygon border colors"

**Previous message:** RemoteAPL: "Re: [R] deviance vs entropy"

**In reply to:** Warren R. Greiff: "Re: [R] deviance vs entropy"

Message-ID: <008201c097ae$ddb00d20$0200a8c0@suse>

Warren,

Thank you for your answer. It gave me food for thought. Let me ask a few more questions.

> I'm not quite sure what you have in mind, but I'm inferring from your comments that by "deviance" you mean:
>
> -SUM p_i log (p_i/q_i) (or -2 SUM p_i log (p_i/q_i))

I am sorry for my imprecise language. I meant specifically the deviance that is calculated when we select the split of a node while building a classification tree. As far as I know, it should be:

-2 SUM n_i log (p_i)

where n_i is the number of points of class c_i at this node and p_i = n_i/N, where N is the total number of cases at this node. It is probably related in some way to what you wrote above. Could you say more about p_i and q_i in your formula?
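
To make the node formula concrete, here is a minimal Python sketch. It computes -2 SUM n_i log (p_i) from class counts and checks numerically that it equals 2 N H(p), where H(p) is the entropy of the class proportions in nats. The function names and the example counts are illustrative assumptions and are not taken from any particular tree package.

```python
import math

def node_deviance(counts):
    """Return -2 * SUM n_i * log(p_i), with p_i = n_i / N.

    Classes with zero count are skipped (they contribute nothing).
    """
    total = sum(counts)
    return -2.0 * sum(n * math.log(n / total) for n in counts if n > 0)

def entropy_nats(counts):
    """Return H(p) = -SUM p_i * log(p_i) for the class proportions."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

# Hypothetical node: 30 cases of one class, 10 of another.
counts = [30, 10]
deviance = node_deviance(counts)
scaled_entropy = 2 * sum(counts) * entropy_nats(counts)

# The two numbers agree up to floating-point rounding.
print(deviance, scaled_entropy)
```

Under this definition a pure node (all cases in one class) has deviance zero, and the deviance grows with both the node size and the mixing of classes.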

> D(p_i||q_i) = - SUM p_i log p_i + SUM p_i log q_i = H(p) - H(p:q)
>
> where H(p) is entropy of p, and H(p:q) is the cross entropy. If q is the uniform distribution, then the cross entropy reduces to:

I will probably understand this and the following statements once I understand the first formula.

> I'm guessing that in the things you've read, when they are talking about deviance, q can (and generally is) something other than the uniform distribution. For example, p is often the empirical distribution of a data sample, and q is the distribution corresponding to some induced model. Then D(p||q) is a measure of how far the model is from the observed data.
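
The quoted idea of comparing an empirical distribution p with a model distribution q can be sketched in a few lines of Python. This sketch uses the conventional sign, SUM p_i log (p_i/q_i), which is the negative of the expression quoted at the top of this message. The sample, the class labels, and the model probabilities are made-up assumptions for illustration only.

```python
import math
from collections import Counter

def kl_divergence(p, q):
    """Return SUM p_i * log(p_i / q_i) in nats; terms with p_i == 0 are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical data sample over three classes.
sample = ["a", "a", "a", "b", "b", "c"]
classes = ["a", "b", "c"]

# Empirical distribution p: relative class frequencies in the sample.
freq = Counter(sample)
p = [freq[c] / len(sample) for c in classes]

# Two candidate model distributions q (both assumptions).
q_uniform = [1 / 3, 1 / 3, 1 / 3]
q_model = [0.5, 0.3, 0.2]

print("divergence from uniform:", kl_divergence(p, q_uniform))
print("divergence from model:  ", kl_divergence(p, q_model))
```

The divergence is zero when q equals p and positive otherwise, so a smaller value means the model distribution sits closer to the observed frequencies.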

That sounds interesting. Could you please restate this in terms of classification trees? I mean, what are the "induced model" and the "corresponding distribution" when we are talking about CART?

> entropy (entropy - cross_entropy, or KL-divergence). Statisticians are interested in deviance

What does "KL" stand for?

> because (with the factor of 2) it is asymptotically chi-square for many modeling families. In

That is probably the most important argument in its favor.
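
The role of the factor of 2 can be illustrated with a short Python sketch. For observed counts O_i and a fully specified model with probabilities q_i, the likelihood-ratio statistic G = 2 SUM O_i log (O_i/E_i), with E_i = N q_i, equals 2 N times the divergence SUM p_i log (p_i/q_i) between the observed proportions and the model. It is this scaled quantity that is compared to a chi-square distribution in large samples. The counts and model probabilities below are hypothetical.

```python
import math

def g_statistic(observed, model_probs):
    """Return G = 2 * SUM O_i * log(O_i / E_i), with E_i = N * q_i."""
    n = sum(observed)
    return 2.0 * sum(
        o * math.log(o / (n * q))
        for o, q in zip(observed, model_probs)
        if o > 0
    )

def kl_divergence(p, q):
    """Return SUM p_i * log(p_i / q_i) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical observed class counts and model probabilities.
observed = [18, 22, 20]
model_probs = [0.25, 0.25, 0.5]

n = sum(observed)
proportions = [o / n for o in observed]

g = g_statistic(observed, model_probs)
scaled_kl = 2 * n * kl_divergence(proportions, model_probs)

# Both expressions give the same number up to rounding.
print(g, scaled_kl)
```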

> information theoretic terms it's nice to think of the deviance as the number of bits extra that it would take to transmit the data for a system assuming the distribution q, relative to a system that had assumed p, which is the best system for transmitting that particular data set.

Very interesting! I must think this over some more.
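
The "extra bits" reading can be checked with a small Python sketch that uses base-2 logarithms. The average code length per symbol is the entropy of p when the code is designed for p, and the cross entropy of p and q when the code is designed for q; the difference is the extra cost per symbol. The two distributions are assumptions chosen so that the arithmetic comes out exactly.

```python
import math

def entropy_bits(p):
    """Average bits per symbol with a code matched to p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Average bits per symbol when data follow p but the code assumes q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical true (empirical) distribution and assumed distribution.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

best = entropy_bits(p)              # 1.5 bits per symbol
actual = cross_entropy_bits(p, q)   # 1.75 bits per symbol
extra = actual - best               # 0.25 extra bits per symbol

print(best, actual, extra)
```

Multiplying the per-symbol extra cost by the number of observations gives the total extra bits for the whole data set.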

> Then again, maybe I've misunderstood you completely. Please set me straight if I have.

I see that you sit quite straight. I am afraid that I lie horizontally:-)

Regards,

Alexander.



*This archive was generated by hypermail 2b30: Fri 22 Jun 2001 - 18:58:33 EST*