Re: [R] lm/model.matrix confusion (? bug)

From: Berwin A Turlach <berwin_at_maths.uwa.edu.au>
Date: Wed, 12 Dec 2007 18:36:32 +0800

G'day Mark,

On Wed, 12 Dec 2007 02:05:54 -0800 (PST) Mark Difford <mark_difford_at_yahoo.co.uk> wrote:

> In order to get the same coefficients as we get from the following
[...]
> we need to do the following (if we use model.matrix to specify the
> model)

By why would you want to do this?

> ##
> summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data =
> whiteside) )
>
> That is, we need to take out "two intercepts." Is this "correct"?

Yes.  

> Shouldn't lm check to see if an intercept has been requested as part
> of the model formula?

No, it does not. In the Details section of lm's help page you will find the following:

     A formula has an implied intercept term.  To remove this use
     either 'y ~ x - 1' or 'y ~ 0 + x'.  See 'formula' for more details
     of allowed formulae.

Thus, except if you explicitly ask for a constant term not be included, lm will add a constant term (a column of ones) additionally to what ever you have specified on the right hand side of the formula.

> If I do
> ##
> summary(lm(as.formula(Gas ~ model.matrix (~ Insul/Temp-1,
> data=whiteside)), data=whiteside))
>
> we get a strange model.

Well, you get a model in which not all parameters are identifiable, and a particular parameter that is not identifiable is estimated by NA. I am not sure what is strange about this.

> But the formula part of this model qualifies
> as a valid formula
> ##
> as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside))

Debatable, the above command only shows that it can be coerced into a valid formula. :)

> just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside)
>
> But we know that the _correct_ formula is the following
 

> ##
> as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside) -1)

Why is this formula any more correct than the other one? Both specify exactly the same model. It is just that one does it in an overparameterised way.

> (Sorry, this is getting really long) --- So, my question/confusion
> comes down to wanting to know why lm() doesn't check to see if an
> intercept has been specified when the model has been specified using
> model.matrix.

Because lm() is documented not to check this. If you do not want to have an intercept in the model you have to specifically ask it for.

Also, comparing the output of

        summary( lm(Gas ~ Insul/Temp - 1, data = whiteside) ) and

        summary( lm(Gas ~ Insul/Temp, data = whiteside ) )

you can see that lm() does not check whether there is an implicit intercept in the model. Compare the (Adjusted) R-squared values returned; one case is using the formula for models with no intercept the other one the formula for models with intercept. Similar story with the reported F-statistics.

Cheers,

        Berwin


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 12 Dec 2007 - 10:53:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 12 Dec 2007 - 15:30:18 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.