**From:** array chip (*arrayprofile@yahoo.com*)

**Date:** Wed 07 May 2003 - 02:27:57 EST

**Next message:**Marc Schwartz: "RE: [R] "help.start() and kde konqueror""**Previous message:**Douglas Bates: "[R] Re: [Socsci-stat] Workshop on multilevel modeling in R"

Message-id: <20030506162757.72730.qmail@web41210.mail.yahoo.com>

Hi,

It seems my last post had formatting problem, so

hopefully this time it works:

I am trying to use the proportional-odds model

(cumulative logistic regression) using the "polr"

function in the MASS library of R.

I tried with the dataset of "housing" contained in the

MASS book where "Sat" (3 levels: low, medium, high) is

the dependent variable, "Infl" (low, medium, high),

"Type" (tower, apartment, atrium, terrace) and "Cont"

(low, high) are the predictor variables. And I have

some questions; hope someone could help me out. The

following commands are taken from the MASS book as

well. If you know R, it might be better to understand

my problem, if not, I hope it's still ok - my problem

is actually a statistically problem, not a programming

problem, ignore the codes, but just look at the

output.

For the dataset, the proportional odds model tried to

run cumulative logistic regression of the dependent

variable "Sat" on the 3 predictor variables "Infl",

"Type", and "Cont" without interactions. In R, the

lowest level of each variable is treated as the

reference level.

*> house.plr<-polr(Sat~Infl+Type+Cont,data=housing,
*

weights=Freq)

*> summary(house.plr)
*

Call: polr(formula = Sat ~ Infl + Type + Cont, data =

housing, weights = Freq)

Coefficients:

Value Std. Error t value

InflMedium 0.5663922 0.10465276 5.412109

InflHigh 1.2888137 0.12715609 10.135682

TypeApartment -0.5723552 0.11923800 -4.800107

TypeAtrium -0.3661907 0.15517331 -2.359882

TypeTerrace -1.0910073 0.15148595 -7.202036

ContHigh 0.3602803 0.09553577 3.771156

Intercepts:

Value Std. Error t value

Low|Medium -0.4961 0.1248 -3.9740

Medium|High 0.6907 0.1255 5.5049

Residual Deviance: 3479.149

AIC: 3495.149

I also tried to predict the probabilities of the 3

categories of the dependent variable "Sat" for every

combination of the 3 predictor variables using the

predict function in R:

*> hnames<-lapply(housing[,-5],levels)
*

*>
*

house.pr1<-predict(house.plr,expand.grid(hnames[-1]),type="probs")

*> cbind(expand.grid(hnames[-1]),round(house.pr1,2))
*

Infl Type Cont Low Medium High

1 Low Tower Low 0.38 0.29 0.33

2 Medium Tower Low 0.26 0.27 0.47

3 High Tower Low 0.14 0.21 0.65

4 Low Apartment Low 0.52 0.26 0.22

5 Medium Apartment Low 0.38 0.29 0.33

6 High Apartment Low 0.23 0.26 0.51

7 Low Atrium Low 0.47 0.27 0.26

8 Medium Atrium Low 0.33 0.29 0.38

9 High Atrium Low 0.19 0.25 0.56

10 Low Terrace Low 0.64 0.21 0.14

11 Medium Terrace Low 0.51 0.26 0.23

12 High Terrace Low 0.33 0.29 0.38

13 Low Tower High 0.30 0.28 0.42

14 Medium Tower High 0.19 0.25 0.56

15 High Tower High 0.10 0.17 0.72

16 Low Apartment High 0.43 0.28 0.29

17 Medium Apartment High 0.30 0.28 0.42

18 High Apartment High 0.17 0.23 0.60

19 Low Atrium High 0.38 0.29 0.33

20 Medium Atrium High 0.26 0.27 0.47

21 High Atrium High 0.14 0.21 0.64

22 Low Terrace High 0.56 0.25 0.19

23 Medium Terrace High 0.42 0.28 0.30

24 High Terrace High 0.26 0.27 0.47

What I am confused is that when I tried to reproduce

these predicted probabilities manually using the model

coefficients, I can sometimes get different results:

According to cumulative logistic regression, the model

is:

logit(P(Y<=k | x1,x2,x3) = a + b1*x1 + b2*x2 + b3*x3

for k=1,2,3

For example, for low Infl, Type tower, Cont low (all

on reference level),

logit(P(Sat=low))=P(Sat=low)/(1-P(Sat=low))=-0.4961,

solve for

P(Sat=low)=exp(-0.4961)/(1+exp(-0.4961))=0.38;

and

logit(P(Sat=low, medium)) =

P(Sat=low,medium)/(1-P(Sat=low,medium)) = 0.6907,

solve for P(Sat=low,medium) =

exp(0.6907)/(1+exp(0.6907))=0.67, which is the sum of

0.38 plus 0.29.

The above 2 examples showed that I can reproduce the

predicted probabilities when using the intercept

alone. However, for other combinations of the

predictor variables, I can NOT reproduce the results.

For example, for medium Infl, Type tower, cont low,

logit(P(Sat=low))=P(Sat=low)/(1-P(Sat=low))=-0.4961+0.56639=0.07029,

solve for

P(Sat=low)=exp(0.07029)/(1+exp(0.07029))=0.52; but the

probability using "predict" function is 0.26.

and

logit(P(Sat=low, medium)) =

P(Sat=low,medium)/(1-P(Sat=low,medium)) =

0.6907+0.56639=1.25709, solve for P(Sat=low,medium) =

exp(1.25709)/(1+exp(1.25709))=0.78, which certainly is

NOT the sum of 0.26 plus 0.27, and is not the sum of

0.52 and 0.27.

Did I misunderstand something here? Or the way I used

to reproduce the predicted probabilities is not

correct?

Thanks very much!

______________________________________________

R-help@stat.math.ethz.ch mailing list

https://www.stat.math.ethz.ch/mailman/listinfo/r-help

**Next message:**Marc Schwartz: "RE: [R] "help.start() and kde konqueror""**Previous message:**Douglas Bates: "[R] Re: [Socsci-stat] Workshop on multilevel modeling in R"

*
This archive was generated by hypermail 2.1.3
: Tue 01 Jul 2003 - 09:11:45 EST
*