Re: [R] model simplification using Crawley as a guide

From: hadley wickham <h.wickham_at_gmail.com>
Date: Wed, 11 Jun 2008 09:33:55 -0500

On Wed, Jun 11, 2008 at 6:42 AM, Frank E Harrell Jr <f.harrell_at_vanderbilt.edu> wrote:
> ChCh wrote:
>>
>> Hello,
>>
>> I have consciously avoided using step() for model simplification in favour
>> of manually updating the model by removing non-significant terms one at a
>> time. I'm using The R Book by M.J. Crawley as a guide. It comes as no
>> surprise that my analysis does not proceed as smoothly as does Crawley's and
>> being a beginner, I'm struggling with what to do next.
>> I have a model:
>>
>> lm(y~A * B * C)
>>
>> where A is a categorical variable with three levels and B and C are
>> continuous covariates.
>>
>> Following Crawley, I execute the model, then use summary.aov() to identify
>> non-significant terms. I begin deleting non-significant interaction terms
>> one at a time (using update). After each update() statement, I use
>> anova(modelOld,modelNew) to contrast the previous model with the updated
>> one. After removing all the interaction terms, I'm left with:
>>
>> lm(y~ A + B + C)
>>
>> again, using summary.aov() I identify A to be non-significant, so I remove
>> it, leaving:
>>
>> lm(y~B + C), where B and C are both continuous variables
>>
>> Does it still make sense to use summary.aov() or should I use summary.lm()
>> instead? Has the analysis switched from an ANCOVA to a regression? Both
>> give different results so I'm uncertain which summary to accept.
>>
>> Any help would be appreciated!
>>
>>
>
> What is the theoretical basis for removing insignificant terms? How will
> you compensate for this in the final analysis (e.g., how do you unbias your
> estimate of sigma squared)?
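
One way to see the concern about sigma squared is a small simulation. The sketch below is only an illustration and is not taken from Crawley or from the posts above: the sample size, the number of candidate predictors, the 0.05 cut-off, and the helper name `backward_eliminate` are all assumptions made for the example. The response is pure noise with a true error variance of 1, so any systematic departure of the estimate from 1 is produced by the selection procedure itself.

```r
# Hypothetical set-up: y is pure noise (true sigma^2 = 1) and none of the
# p candidate predictors has any real effect on it.
set.seed(1)
n <- 40; p <- 15; n_sims <- 200

# Hypothetical helper: repeatedly refit, dropping the predictor with the
# largest p-value, until every remaining predictor has p < alpha.
backward_eliminate <- function(d, alpha = 0.05) {
  keep <- setdiff(names(d), "y")
  repeat {
    rhs <- if (length(keep)) paste(keep, collapse = " + ") else "1"
    fit <- lm(as.formula(paste("y ~", rhs)), data = d)
    if (length(keep) == 0) return(fit)
    pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]  # skip the intercept
    if (all(pvals < alpha)) return(fit)
    keep <- keep[-which.max(pvals)]
  }
}

# For each simulated data set, record the residual variance estimate from
# the full model and from the model left after elimination.
res <- replicate(n_sims, {
  d <- as.data.frame(matrix(rnorm(n * p), n, p))  # columns V1 ... Vp
  d$y <- rnorm(n)
  c(full     = summary(lm(y ~ ., data = d))$sigma^2,
    selected = summary(backward_eliminate(d))$sigma^2)
})

means <- rowMeans(res)
print(means)
```

The full-model estimate is unbiased for the true variance, while the average from the selected models typically comes out smaller, because the terms that get dropped are exactly those that happened to explain little of the noise. Standard errors and p-values reported for the final model inherit that optimism.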

And in a similar vein, where are your exploratory graphics? How do you know that there is a linear relationship between your response and your predictors? Are the distributional assumptions you are making appropriate?
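
A minimal sketch of what such checks might look like in base R follows. The data frame `d` is simulated purely for illustration (its column names simply mirror the y, A, B and C of the model in the question, and the effect sizes are invented), so substitute the real data before drawing any conclusions.

```r
# Hypothetical data with the same structure as the model in the question:
# A is a three-level factor, B and C are continuous covariates.
set.seed(42)
d <- data.frame(
  A = factor(rep(c("a1", "a2", "a3"), each = 20)),
  B = runif(60),
  C = rnorm(60)
)
d$y <- 1 + 2 * d$B - d$C + rnorm(60, sd = 0.5)

# Exploratory graphics: pairwise scatterplots coloured by level of A, then
# y against B within each level of A with a smoother in each panel.
pairs(d[, c("y", "B", "C")], col = as.integer(d$A))
coplot(y ~ B | A, data = d, panel = panel.smooth)

# Residual diagnostics for the fitted model: residuals vs fitted,
# normal Q-Q, scale-location, and residuals vs leverage.
fit <- lm(y ~ A * B * C, data = d)
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

Curvature in the scatterplots or in the residuals-versus-fitted panel would suggest the relationship is not linear, and a Q-Q plot that bends away from the line would cast doubt on the normality assumption.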

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 11 Jun 2008 - 14:47:33 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 14 Jun 2008 - 02:30:47 GMT.
