Re: [R] model simplification using Crawley as a guide

From: hadley wickham
Date: Wed, 11 Jun 2008 09:33:55 -0500

On Wed, Jun 11, 2008 at 6:42 AM, Frank E Harrell Jr wrote:
> ChCh wrote:
>> Hello,
>> I have consciously avoided using step() for model simplification in favour
>> of manually updating the model by removing non-significant terms one at a
>> time. I'm using The R Book by M.J. Crawley as a guide. It comes as no
>> surprise that my analysis does not proceed as smoothly as does Crawley's and,
>> being a beginner, I'm struggling with what to do next.
>> I have a model:
>> lm(y~A * B * C)
>> where A is a categorical variable with three levels and B and C are
>> continuous covariates.
>> Following Crawley, I execute the model, then use summary.aov() to identify
>> non-significant terms. I begin deleting non-significant interaction terms
>> one at a time (using update). After each update() statement, I use
>> anova(modelOld,modelNew) to contrast the previous model with the updated
>> one. After removing all the interaction terms, I'm left with:
>> lm(y~ A + B + C)
>> again, using summary.aov() I identify A to be non-significant, so I remove
>> it, leaving:
>> lm(y ~ B + C), in which both B and C are continuous variables.
>> Does it still make sense to use summary.aov() or should I use summary.lm()
>> instead? Has the analysis switched from an ANCOVA to a regression? Both
>> give different results so I'm uncertain which summary to accept.
>> Any help would be appreciated!
> What is the theoretical basis for removing insignificant terms? How will
> you compensate for this in the final analysis (e.g., how do you unbias your
> estimate of sigma squared)?

And in a similar vein, where are your exploratory graphics? How do you know that there is a linear relationship between your response and your predictors? Are the distributional assumptions you are making appropriate?
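
As a rough illustration of the kind of checks I mean, here is a minimal sketch. The data frame `dat` and its values are simulated stand-ins (an assumption, not your data); only the variable names y, A, B and C follow your description. It plots the response against the predictors, fits the additive model, and then looks at the residuals, which is where nonlinearity or non-constant variance would usually show up.

```r
# Hypothetical data: y, A, B and C are simulated stand-ins, not the poster's data.
set.seed(1)
n <- 90
dat <- data.frame(
  A = factor(rep(c("a1", "a2", "a3"), each = n / 3)),  # three-level factor
  B = runif(n, 0, 10),                                 # continuous covariate
  C = rnorm(n, mean = 5, sd = 2)                       # continuous covariate
)
dat$y <- 2 + 0.8 * dat$B - 0.5 * dat$C + rnorm(n)

# Draw to a null device so the sketch also runs non-interactively;
# drop this line (and dev.off() at the end) to see the plots on screen.
pdf(NULL)

# 1. Exploratory graphics: the response against each predictor.
pairs(dat[, c("y", "B", "C")], panel = panel.smooth)
boxplot(y ~ A, data = dat)

# 2. Fit the additive model and draw the standard lm diagnostic plots
#    (residuals vs fitted, normal Q-Q, scale-location, leverage).
fit <- lm(y ~ A + B + C, data = dat)
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# 3. Residuals against each continuous predictor; curvature here
#    suggests the linear term is not adequate.
res <- residuals(fit)
plot(dat$B, res); abline(h = 0, lty = 2)
plot(dat$C, res); abline(h = 0, lty = 2)

invisible(dev.off())
```

With real data you would replace the simulated `dat` with your own data frame and look at these plots before deciding which terms to drop.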



Received on Wed 11 Jun 2008 - 14:47:33 GMT
