Re: [R] stepAIC and polynomial terms

From: Robert A LaBudde <ral_at_lcfltd.com>
Date: Sun, 16 Mar 2008 22:50:49 -0500

At 08:50 PM 3/16/2008, caspar wrote:
>Dear all,
>I have a question regarding the use of stepAIC and polynomial
>(quadratic to be specific) terms in a binary logistic regression
>model. I read in McCullagh and Nelder (1989, p. 89) and, as far as I
>remember from my statistics courses, higher-degree polynomial
>effects should not be included without the main effects. If I
>understand this correctly, following a stepwise model selection
>based on AIC should not lead to a model where the main effect of
>some continuous covariate is removed, but the quadratic term is kept.
>The question is, should I keep the quadratic term (note, there are
>other main effects that were retained following the stepwise
>algorithm) in the final model or should I delete it as well and move
>on? Or should I retain the main effect as well?
>
>To picture it, the initial model to which I called stepAIC is:
>
>Call: glm(formula = S ~ FR + Date * age + I(age^2), family =
>logexposure(ExposureDays = DATA$int), data = DATA)
>
>and the final one:
>
>Call: glm(formula = S ~ FR + Date + I(age^2), family =
>logexposure(ExposureDays = DATA$int), data = DATA)
>
>Thanks very much in advance for your thoughts and suggestions,
>
>Caspar

  1. You should only exclude "age" as a linear term if you have a sound causal reason to believe that a pure quadratic component is the only one present. Based on your example, you probably don't have this.
  2. You also need to worry about interactions; your initial model included Date * age.
  3. An alternative to your polynomial approach for a causal variable such as age is to categorize age into 5- or 10-year intervals and see how the fit breaks down across these levels.
  4. You should plot your data vs. age to see what the dependence is. Frequently the curve is flat up to a certain age and linear thereafter. This gives rise to a pseudo-quadratic relationship, which you should be able to fit better with a split at that age plus a linear term.
  5. Think about how age should affect your response before trying models.
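A minimal R sketch of points 1, 3 and 4, under stated assumptions: the data are simulated, a plain binomial logit stands in for the poster's logexposure() family (which is not part of base R), and the flat-then-linear age effect, the 10-year bins and the candidate knots are hypothetical choices. It uses stepAIC() from MASS; passing a lower formula in its scope argument is one way to keep the linear age term in every model the search visits.

```r
# Sketch on SIMULATED data (assumption): binomial logit instead of the custom
# logexposure() family; the true age effect, bins and knots are hypothetical.
library(MASS)  # provides stepAIC()

set.seed(1)
n   <- 500
age <- runif(n, 20, 80)
FR  <- rnorm(n)
eta <- -1 + 0.5 * FR + 0.08 * pmax(age - 50, 0)  # flat up to 50, then linear
dat <- data.frame(S = rbinom(n, 1, plogis(eta)), age = age, FR = FR)

## Point 1: do not let the search keep age^2 while dropping age.
# (a) poly(age, 2) is a single 2-df term, so it is kept or dropped as a unit.
fit_poly <- stepAIC(glm(S ~ FR + poly(age, 2), family = binomial, data = dat),
                    trace = FALSE)
# (b) With separate terms, force the linear term into every candidate model
#     via the lower scope; I(age^2) can still be dropped on its own.
fit_quad <- stepAIC(glm(S ~ FR + age + I(age^2), family = binomial, data = dat),
                    scope = list(upper = ~ FR + age + I(age^2), lower = ~ age),
                    trace = FALSE)

## Points 3 and 4: look at the shape before committing to a polynomial.
dat$age_cat <- cut(dat$age, breaks = seq(20, 80, by = 10), include.lowest = TRUE)
print(round(tapply(dat$S, dat$age_cat, mean), 2))  # observed proportion per bin
fit_cat <- glm(S ~ FR + age_cat, family = binomial, data = dat)

# "Split plus a linear term": slope is zero below knot k, linear above it.
knots <- seq(30, 70, by = 5)
aic_k <- sapply(knots, function(k)
  AIC(glm(S ~ FR + pmax(age - k, 0), family = binomial, data = dat)))
best_k <- knots[which.min(aic_k)]
fit_hinge <- glm(S ~ FR + pmax(age - best_k, 0), family = binomial, data = dat)

# Compare the age codings on the same data (lower AIC is better). The hinge
# AIC does not charge for having searched over the knot.
print(AIC(fit_poly, fit_quad, fit_cat, fit_hinge))
```

Using poly(age, 2) rather than separate age and I(age^2) terms is the simplest way to stop a term-by-term search from breaking the hierarchy the original question asks about; the binned fit and the observed proportions per bin are a quick numeric stand-in for the plot suggested in point 4.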

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral_at_lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 17 Mar 2008 - 02:57:32 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 17 Mar 2008 - 04:30:22 GMT.
