From: Douglas Bates <bates_at_stat.wisc.edu>

Date: Mon 23 Oct 2006 - 13:23:49 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Oct 24 07:01:24 2006

Date: Mon 23 Oct 2006 - 13:23:49 GMT

On 10/22/06, Lukas Rode <lukas.rode@googlemail.com> wrote:

> Thanks Douglas for the detailed explanation; it was very helpful (I'll

*> definitely give lme4 a try as well).
**> For the record: I also found it helpful to play with model.matrix (e.g. p.
**> 14ff and p.28 of the book by Jose Pinheiro and Douglas) to see how the
**> formulas for fixed and random effects are related to the Laird&Ware'82
**> formulation for mixed-effects models.
**>
**> Now I have one "follow-up" question, about model selection:
**>
**> From your advice, I figure that the suggested procedure is to build a number
**> of models (with differences in fixed and/or random effects) and then
**> comparing them to select the most adequate one.
**> When using anova(model1, model2) to compare two models that differ in both
**> fixed and random effects, I get a warning message: "Fitted objects with
**> different fixed effects. REML comparisons are not meaningful. in:
**> anova.lme(model1, model2)".
*

You can compare models with different fixed-effects specifications if you use method = "ML". The REML criterion does not behave like a likelihood when you change the fixed-effects specification.

Generally if you are using lme (and, as in your case, you have a simple specification like (1|Subject) for the random effects) then you can use anova applied to the model with the main effects and interactions to test for the significance of the interaction term. If you decide to eliminate that term then you can test for the other terms.

With lmer fits I recommend checking a Markov chain Monte Carlo sample from the posterior distribtuion of the parameters to determine which are signification (although this is not terribly well documented at the present time).

> So apparently this means that anova only works for comparing models with

*> same fixed-effects component but different random effects, right?
**> So what is then the correct way to compare models with different fixed
**> and/or random effects?
**> For example, would it be correct to simply choose the model with the lowest
**> AIC/BIC? (this seems to be a better choice than selecting by logLikelihood
**> since AIC/BIC penalize higher #parameters, whereas likelihood does not).
*

Some people like to use AIC/BIC. I find the definition of the cut-off points to be rather arbitrary. Also, one will frequently have the model selection by AIC and by BIC disagree.

> Or is there another, better way?

*>
**> Many thanks so far,
**> Lukas
**>
**>
**>
**> On 10/22/06, Douglas Bates <bates@stat.wisc.edu> wrote:
**> > On 10/21/06, Lukas Rode <lukas.rode@googlemail.com> wrote:
**> > > Dear list,
**> > >
**> > > I'm trying to fit a multilevel (mixed-effects) model using the lme
**> function
**> > > (package nlme) in R 2.4.0. As a mixed-effects newbie I'm neither sure
**> about
**> > > the modeling nor the correct R syntax.
**> >
**> > You may also want to consider using the lmer function from package
**> > lme4. It's a later version of lme for R and is generally faster than
**> > lme. I'll show the syntax for both.
**> >
**> > > My data is structured as follows: For each subject, a quantity Y is
**> measured
**> > > at a number (>= 2) of time points. Moreover, at time point 0
**> ("baseline"), a
**> > > quantity X is measured for each subject (I am interested to see whether
**> X,
**> > > or log(X), is a predictor of Y). I saw in some examples that
**> time-invariant
**> > > predictors should be repeated for all rows of a subject, which is why I
**> > > copied the baseline value of X also to time points > 0.
**> >
**> > That's correct. For both lme and lmer the data need to be in the
**> > 'long' form, also called the 'person-period' form. You can reshape
**> > the data by hand, as you have done, or you can use the 'reshape'
**> > function or the more general capabilities in Hadley Wickham's reshape
**> > package.
**> >
**> > > The resulting data frame looks like this:
**> >
**> > > Grouped Data: Y ~ TIME | Subject
**> > > Y TIME Subject X.Baseline
**> > > 9 0.0 1 1084
**> > > 7 3.7 1 1084
**> > > 11 0.0 2 7150
**> > > 8 9.2 2 7150
**> >
**> > > Intra-subject trajectories of Y very close to linear. I'd like to check
**> > > whether slope (and maybe also offset) of this line are (in part)
**> predicted
**> > > by X.baseline.
**> >
**> > I would say that this question would be addressed as part of the
**> > fixed-effects specification. You would be comparing a model with main
**> > effects for TIME and X.Baseline and their interaction to various
**> > submodels. In the S language formula notation these fixed effects
**> > specifications are
**> >
**> > Y ~ TIME * X.Baseline # main effects for TIME and X.Baseline and
**> > their interaction
**> > Y ~ TIME + X.Baseline + TIME:X.Baseline # equivalent to first model
**> > Y ~ TIME + X.Baseline # main effects but no interaction. That is
**> > X.Baseline does not affect slope
**> > Y ~ TIME + TIME: X.Baseline # X.Baseline affects slope (wrt TIME) but not
**> offset
**> > Y ~ TIME # X.Baseline has no effect on Y
**> >
**> > > Thus, I think the multilevel model specification should be as follows (i
**> =
**> > > subject, j=measurement):
**> > > y_ij = \beta_i + b_i * TIME_ij + \epsilon_ij,
**> > > with
**> > > b_i = \zeta_i0 + \zeta_i1 * X.Baseline
**> > > Is this correct?
**> >
**> > Well that model doesn't have a main effect for X.Baseline so you can't
**> > test for an effect for X.Baseline on the slope.
**> >
**> > > Now, I am completely unsure how to "translate" this into the syntax
**> needed
**> > > by lme.
**> > > Is there any standard procedure on how to get from e.g . the
**> Laird&Ware'82
**> > > matrix model notation to the lme input?
**> >
**> > I think that is more of a question of how to use the model formula
**> > language than a question about lme. In lme the right hand side of the
**> > formula argument generates Laird&Ware's matrix X and the random
**> > specification generates the matrix Z.
**> >
**> > > And, in my case, is the correct model as follows, or am I wrong?
**> > > my.model <- lme( Y ~ TIME, my.data.frame, random=~X.Baseline | Subject )
**> > > [in case this is correct, it is by accident, since I made it up from
**> some
**> > > examples I found without really understanding the translation from the
**> model
**> > > formulation]
**> >
**> > Actually this model specification should throw an error but it doesn't.
**> >
**> > Estimates for the random effects in this model are not well defined
**> > because X.Baseline is constant within Subject. Thus within Subject
**> > the Intercept term and the X.Baseline term are confounded. I would
**> > begin with the model
**> >
**> > lme(Y ~ TIME * X.Baseline, my.data.frame, random = ~ 1|Subject)
**> >
**> > which is equivalent to
**> >
**> > lmer(Y ~ TIME * X.Baseline + (1|Subject), my.data.frame)
**> >
**> > Then examine the estimates for the fixed-effects coefficients and
**> > their standard errors to see what reductions of this model may be
**> > appropriate.
**> >
**> > > I'd greatly appreciate some advice or help.
**> >
**> > I hope this helps.
**> >
**>
**>
*

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Oct 24 07:01:24 2006

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Mon 23 Oct 2006 - 22:30:13 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*