[R] R help: problems with step function

From: Ping Wang <pwang_at_stat.wisc.edu>
Date: Tue, 13 May 2008 00:23:55 -0500

Dear List Members,

I have encountered two problems when using the step function to select models. To better illustrate the problems, attached is an R image which includes the objects needed to run the code attached. lm.data.frame have factor variables with 3 levels.

The following run shows the first problem. AICs (* and **) are different. I noticed that the Df for rs13482096:rs13483699 is 4, while I think Df should be 6, 2 from rs13483699 and 4 from interactions. When I ran add1 directly, I got Df=6 and AIC 848.75.

> step2.bic.out <- step(step.bic.out, scope=list(lower=scope.lower2,
upper=scope.upper2),

+                       direction="both", k=log(length(step.bic.out$y)),
trace=1)
Start: AIC=841.18
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221
                       Df Deviance    AIC
+ rs13482096:rs13483699  4   216.63 840.63 (*)
<none>                       233.82 841.18
- rs8254221              2   244.08 842.90
- rs13482096             2   245.20 844.31
......

Step: AIC=848.75 (**)
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 +

   rs13482096:rs13483699

> add1(step.bic.out, scope="rs13482096:rs13483699",
k=log(length(step.bic.out$y)))
Single term additions

Model:
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221

                     Df Deviance    AIC
<none>                     233.82 841.18
rs13482096:rs13483699 6 214.28 848.75 (**)

When I used add1 to handle terms to be added together and separately, I got different results. The example is shown below. This might explain the discrepancy shown above.
> add1(step.bic.out, scope=int.terms[11:12], k=log(length(step.bic.out$y)))
Single term additions

Model:
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221

                     Df Deviance    AIC
<none>                     233.82 841.18
rs13479085:rs13475933  6   224.95 863.66
rs13480057:rs13475933 4 226.72 854.62 (***)
> add1(step.bic.out, scope=int.terms[11], k=log(length(step.bic.out$y)))
Single term additions

Model:
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221

                     Df Deviance    AIC
<none>                     233.82 841.18
rs13479085:rs13475933 6 224.95 863.66
> add1(step.bic.out, scope=int.terms[12], k=log(length(step.bic.out$y)))
Single term additions

Model:
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221

                     Df Deviance    AIC
<none>                     233.82 841.18
rs13480057:rs13475933 6 215.95 851.12 (***)

Another problem is that the final model seems to be the first model that satisfies (bAIC >= AIC + 1e-07) if steps haven't used up, rather than the one before that. Please see below.

> formula(step2.bic.out)

pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 +

   rs13482096:rs13483699

> step2.bic.out$anova

 Step Df Deviance Resid. Df Resid. Dev      AIC
1      NA       NA       298   233.8226 841.1784

Any insights would be greatly appreciated. Thanks much !

Ping Wang



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 13 May 2008 - 05:39:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 13 May 2008 - 06:30:37 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive