# RE: [R] How to intepret a factor response model?

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Wed 04 May 2005 - 18:21:11 EST

On 04-May-05 Maciej Bliziński wrote:
> Hello,
>
> I'd like to create a model with a factor-type response variable.
> This is an example:

>

```>> mydata <- data.frame(factor_var = as.factor(c(rep('one', 100),
>> rep('two', 100), rep('three', 100))), real_var = c(rnorm(150),
>> rnorm(150) + 5))
>> summary(mydata)

>  factor_var     real_var

>  one  :100   Min.   :-2.742877
>  three:100   1st Qu.:-0.009493
>  two  :100   Median : 2.361669
>              Mean   : 2.490411
>              3rd Qu.: 4.822394
>              Max.   : 6.924588
>> mymodel = glm(factor_var ~ real_var, family = 'binomial', data =
>> mydata)
>> summary(mymodel)
```

>
> Call:
> glm(formula = factor_var ~ real_var, family = "binomial", data =
> mydata)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -1.7442 -0.6774 0.1849 0.3133 2.1187
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.6798 0.1882 -3.613 0.000303 ***
> real_var 0.8971 0.1066 8.417 < 2e-16 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 381.91 on 299 degrees of freedom
> Residual deviance: 213.31 on 298 degrees of freedom
> AIC: 217.31
>
> Number of Fisher Scoring iterations: 6

Have you noticed that you get identical results with

set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),

rep('two',100), rep('three', 100))),
real.var = c(rnorm(150), rnorm(150) + 5))

mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata) summary(mymodel)

and

set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),

rep('two',200))),real.var = c(rnorm(150),rnorm(150) + 5))

mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata) summary(mymodel)

(I've left out the "summary(mydata)" since these do naturally differ, and I've replaced "factor_var" with "factor.var" and "real_var" with "real.var" because of potential complications with "_"; also "mymodel =" to "mymodel <-").

So I think the interpretation of the results from your first model is that, because of the "family='binomial'", glm is treating "factor.var='one'" as binomial response "0", say, and "factor.var='two'" or "factor.var='three'" as binomial response "1".

You're trying to fit a multinomial response, but you've specified a binomial family to 'glm'. 'glm' does not have a multinomial response family.

You could try 'multinom' from package 'nnet' which fits loglinear models to factor responses with more than 2 levels.

E.g.

library(nnet)
mymodel <- multinom(factor.var ~ real.var,data=mydata)

```   ### weights:  9 (4 variable)
##  initial  value 329.583687
##  iter  10 value 209.780666
##  final  value 209.779951
##  converged
```

summary(mymodel)
```   ## Re-fitting to get Hessian
## Call:
## multinom(formula = factor.var ~ real.var, data = mydata)
##  Coefficients:
##        (Intercept)  real.var
##  three  -3.4262565 1.3838231
##  two    -0.6754253 0.7116955
##
## Std. Errors:
##   (Intercept)  real.var
## three   0.5028541 0.1480138
## two     0.1846827 0.1068821
##
## Residual Deviance: 419.5599
## AIC: 427.5599
##
## Correlation of Coefficients:
##             three:(Intercept) three:real.var two:(Intercept)
## three:real.var  -0.7286258
## two:(Intercept)  0.1986995        -0.1261034
## two:real.var    -0.1411377         0.7012481     -0.3285741

```

This output does suggest a fairly clear interpretation!

Hoping this helps,
Ted.

E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
```Date: 04-May-05                                       Time: 09:18:03
------------------------------ XFMail ------------------------------

______________________________________________
```
R-help@stat.math.ethz.ch mailing list