# Re: [R] Poisson regression in R

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Sun, 02 Mar 2008 09:27:45 +0100

glmstat wrote:
> I have these questions:
> (1) Use Poisson regression to estimate the main effects of car, age, and
> dist (each treated as categorical and modelled using indicator variables)
> and interaction terms.
> (2) It was determined by one study that all the interactions were
> unimportant and decided that age and car could be treated as though they
> were continuous variables. Fit a model incorporating these features and
> compare it with the best model obtained in (1).
>
>
This looks like homework, so only hints are offered.

You don't seem to be using n; consider incorporating an offset (I would expect most texts on Poisson regression to discuss this).
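
To make the hint concrete, here is a minimal sketch of how an offset enters a Poisson model in R. It is not a worked answer to the exercise. The data frame name `claims` and the model names `m_main` and `m_cont` are my own assumptions; the numbers are copied from the table quoted in this message. With `offset(log(n))` the coefficient of log(n) is fixed at 1, so the model describes the claim rate y/n rather than the raw count y.

```r
# Hypothetical data frame built from the quoted table.
# expand.grid varies its first argument fastest, which matches the row
# order of the table: age cycles 1..4 within car, car within dist.
claims <- expand.grid(age = 1:4, car = 1:4, dist = 0:1)
claims$y <- c(65, 65, 52, 310,  98, 159, 175, 877,
              41, 117, 137, 477, 11, 35, 39, 167,
              2, 5, 4, 36,   7, 10, 22, 102,
              5, 7, 16, 63,  0, 6, 8, 33)
claims$n <- c(317, 476, 486, 3259,  486, 1004, 1355, 7660,
              223, 539, 697, 3442,  40, 148, 214, 1019,
              20, 33, 40, 316,  31, 81, 122, 724,
              18, 39, 68, 344,  3, 16, 25, 114)

# Main effects only, all three variables as factors, with log(n) as offset.
m_main <- glm(y ~ factor(car) + factor(age) + factor(dist) + offset(log(n)),
              family = poisson, data = claims)

# age and car entered as numeric scores, same offset.
m_cont <- glm(y ~ age + car + factor(dist) + offset(log(n)),
              family = poisson, data = claims)

# The numeric-score model is nested in the factor model, so a
# likelihood-ratio (deviance) comparison is available.
cmp <- anova(m_cont, m_main, test = "Chisq")
print(cmp)
```

Interaction terms can be added to the first formula in the usual way with `*` or `:`. Whether the second fit agrees with the figures from the earlier study is something to check from `summary(m_cont)` rather than take from this sketch.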

> n is the number of insurance policies
> y is the number of claims
> car is the car in an insurance category
> age is the age of policy holder
> dist is the district where the policy holder lived (1 for London and other
> major cities, and 0 otherwise)
>
> Data:
>
> car age dist y n
> 1 1 0 65 317
> 1 2 0 65 476
> 1 3 0 52 486
> 1 4 0 310 3259
> 2 1 0 98 486
> 2 2 0 159 1004
> 2 3 0 175 1355
> 2 4 0 877 7660
> 3 1 0 41 223
> 3 2 0 117 539
> 3 3 0 137 697
> 3 4 0 477 3442
> 4 1 0 11 40
> 4 2 0 35 148
> 4 3 0 39 214
> 4 4 0 167 1019
> 1 1 1 2 20
> 1 2 1 5 33
> 1 3 1 4 40
> 1 4 1 36 316
> 2 1 1 7 31
> 2 2 1 10 81
> 2 3 1 22 122
> 2 4 1 102 724
> 3 1 1 5 18
> 3 2 1 7 39
> 3 3 1 16 68
> 3 4 1 63 344
> 4 1 1 0 3
> 4 2 1 6 16
> 4 3 1 8 25
> 4 4 1 33 114
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> I need help finding the correct R code to construct models. According to the
> previous study, the model in (2) "is simpler than (1), fits well (deviance =
> 53.11, d.f. = 60, p-value = 0.72) and gives coefficients (standard errors):
> AGE, -0.177 (0.018); CAR, 0.198 (0.021); DIST, 0.210 (0.059)."
>
> As for the first model, I think I should use this code, but I am not sure:
>
>
>> firstmodel<-glm(y~factor(age)*factor(car)*factor(dist),family=poisson)
>>
>
> As for the second model, I used this code, but it produces results that
> contradict what the previous study says (and deleting the intercept does
> not help):
>
>
>> secondmodel<-glm(y~age+car+factor(dist),family=poisson)
>> summary(secondmodel)
>>
> Call:
> glm(formula = y ~ age + car + factor(dist), family = poisson)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -14.0258 -3.3200 -0.6296 2.0575 18.1442
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 3.08222 0.08127 37.92 <2e-16 ***
> age 0.83664 0.02067 40.48 <2e-16 ***
> car -0.16723 0.01612 -10.37 <2e-16 ***
> factor(dist)1 -2.15937 0.05849 -36.92 <2e-16 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for poisson family taken to be 1)
>
> Null deviance: 5660.6 on 31 degrees of freedom
> Residual deviance: 1154.5 on 28 degrees of freedom
> AIC: 1330.8
>
> Number of Fisher Scoring iterations: 5
>

--
O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help