Re: [R] Parameterization puzzle

From: Murray Jorgensen <maj_at_waikato.ac.nz>
Date: Sat 22 Jul 2006 - 08:57:56 EST

Thanks to Brian and Berwin with there help. I faced a double problem in that I not only wanted to fit the model but I also wanted to do so in such a way that it would seem natural for a classroom example.

The moral seems to be that I should do the orthogonal polynomial stuff outside the model formula. Here then is my solution:

pyears <- scan()
18793 52407 10673 43248 5710 28612 2585 12663 1462 5317

deaths <- scan()
2 32 12 104 28 206 28 186 31 102

l <- log(pyears)
Smoke <- gl(2,1,10,labels=c("No","Yes")) Age <- gl(5,2,10,labels=c("35-44","45-54","55-64","65-74","75-84"),

            ordered=TRUE)
mod1.glm <- glm(deaths ~ Age * Smoke + offset(l),family=poisson) summary(mod1.glm)
age <- as.numeric(Age)
age1 <- poly(age,2)[,1]
age2 <- poly(age,2)[,2]
mod2.glm <- aso1.glm <- glm(deaths ~ age1 + age2 + Smoke +

                       age1:Smoke + offset(l),family=poisson)
summary(mod2.glm)

The final summary then comes out looking like this:

               Estimate Std. Error z value Pr(>|z|)
(Intercept)    -5.8368     0.1213 -48.103  < 2e-16 ***
age1            5.3238     0.4129  12.893  < 2e-16 ***
age2           -1.0460     0.1448  -7.223 5.08e-13 ***
SmokeYes        0.5183     0.1262   4.106 4.02e-05 ***
age1:SmokeYes  -1.3755     0.4340  -3.169  0.00153 **


which is just what I wanted.

Cheers, Murray Jorgensen

Prof Brian D Ripley wrote:

> On Fri, 21 Jul 2006, Berwin A Turlach wrote:
> 

>> G'day all,
>>
>>>>>>> "BDR" == Prof Brian Ripley <ripley@stats.ox.ac.uk> writes:
>>
>> BDR> R does not know that poly(age,2) and poly(age,1) are linearly
>> BDR> dependent.
>> Indeed, I also thought that this is the reason of the problem.
>>
>> BDR> (And indeed they only are for some functions 'poly'.)
>> I am surprised about this. Should probably read the help page of
>> 'poly' once more and more carefully.
> 
> My point was perhaps simpler: if you or I or Murray had a function 
> poly() in our workspace, that one would be found, I think.  (I have not 
> checked the ramifications of namespaces here, but I believe that would 
> be the intention, that formulae are evaluated in their defining 
> environment.)  So omly when the model matrix is set up could the linear 
> dependence be known (and there is nothing in the system poly() to tell 
> model.matrix).
> 
> What is sometimes called extrinsic aliasing is left to the fitting 
> function, which seems to be to do a sensible job provided the main 
> effect is in the model.  Indeed, including interactions without main 
> effects (as Murray did) often makes the results hard to interpret.
> 

>> BDR> I cannot reproduce your example ('l' is missing), [...]
>> My guess is that 'l' is 'pyears'. At least, I worked under that
>> assumption.
> 
> Apparently l = log(pyears), which was my uncertain guess.
> 

>> Interestingly, on my machine (using R 2.3.1, 2.2.1 and 2.1.1) I cannot
>> fit any of the Poisson GLM that Murray tried. I always get the error
>> message:
>>
>> Error: no valid set of coefficients has been found: please supply
>> starting values
> 
> Related to the offset, I believe.
> 

>>
>> But I have to investigate this further. I can fit binomial models
>> that give me similar answers.
>>
>> BDR> [...] but perhaps
>> BDR> glm(deaths ~ poly(age,2) + poly(age,1)*Smoke + offset(l),
>> BDR> poisson)
>> BDR> was your intention?
>> In this parameterisation a 'poly(age,1)' term will appear among the
>> coefficients with an estimated value of NA since it is aliased with
>> 'poly(age, 2)1'. So I don't believe that this was Murray's intention.
>>
>> The only suggestion I can come up with is:
>>
>>> summary(glm(cbind(deaths, l-deaths) ~ age*Smoke+I(age^2),
>>> family=binomial))
>>
>> [...]
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -10.79895 0.45149 -23.918 < 2e-16 ***
>> age 2.37892 0.20877 11.395 < 2e-16 ***
>> SmokeYes 1.44573 0.37347 3.871 0.000108 ***
>> I(age^2) -0.19706 0.02749 -7.168 7.6e-13 ***
>> age:SmokeYes -0.30850 0.09756 -3.162 0.001566 **
>>
>> [...]
>>
>> Which doesn't use orthogonal polynomials anymore. But I don't see how
>> you can fit the model that Murray want to fit using orthogonal
>> polynomials given the way R's model language operates.
>>
>> So I guess the Poisson GLM that Murray wants to fit is:
>>
>> glm(deaths~ age*Smoke+I(age^2)+offset(l), family=poisson)
>>
>> Cheers,
>>
>> Berwin
>>
>> ========================== Full address ============================
>> Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr)
>> School of Mathematics and Statistics +61 (8) 6488 3383 (self)
>> The University of Western Australia FAX : +61 (8) 6488 1028
>> 35 Stirling Highway
>> Crawley WA 6009 e-mail: berwin@maths.uwa.edu.au
>> Australia http://www.maths.uwa.edu.au/~berwin
>>
>>
>
-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj@waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441    Mobile 021 1395 862

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat Jul 22 09:06:26 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 22 Jul 2006 - 10:15:15 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.