From: John Maindonald <john.maindonald_at_anu.edu.au>

Date: Fri, 16 May 2008 14:28:59 +1000

> Thanks for the questions, Rolf. I completely agree that mixed model

*> specification can be an extremely confusing area.
*

> Let's consider a set of models for the Machines data reproduced (from

*> Milliken and Johnson) in Pinheiro and Bates and available in the MEMSS
*

*> package.
*

> Loading required package: Matrix

*> Loading required package: lattice
*

> Attaching package: 'Matrix'

*> $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3
*

*> 4 ...
*

*> $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
*

*> $ score : num 52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...
*

> We consider the Machine factor to have a fixed set of levels in that

*> we only consider these three machines. The levels of the Worker
*

*> factor represent a sample from the set of potential operators. As you
*

*> might imagine from this description, I now think of the distinction
*

*> between "fixed" and "random" as being associated with the factor, not
*

*> necessarily the "effects".
*

> If you plot these data

> (resulting PDF enclosed) you will see evidence of an interaction.

*> That is, some workers have noticeably different score patterns on the
*

*> three machines than do others. Worker 6 on machine B is the most
*

*> striking example.
*

> One way to model this interaction is to say that there is a random

*> effect for each worker and a separate random effect for each
*

*> worker/machine combination. If the random effects for the
*

*> worker/machine combinations are assumed to be independent with
*

*> constant variance then one expresses the model as
*

*> Linear mixed model fit by REML
*

> Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker)

*> Data: Machines
*

*> AIC BIC logLik deviance REMLdev
*

*> 227.7 239.6 -107.8 225.5 215.7
*

*> Random effects:
*

*> Groups Name Variance Std.Dev.
*

*> Machine:Worker (Intercept) 13.90963 3.72956
*

*> Worker (Intercept) 22.85528 4.78072
*

*> Residual 0.92464 0.96158
*

*> Number of obs: 54, groups: Machine:Worker, 18; Worker, 6
*

> An equivalent formulation is

*> Linear mixed model fit by REML
*

*> Formula: score ~ Machine + (1 | Worker/Machine)
*

*> Data: Machines
*

*> AIC BIC logLik deviance REMLdev
*

*> 227.7 239.6 -107.8 225.5 215.7
*

*> Random effects:
*

*> Groups Name Variance Std.Dev.
*

*> Machine:Worker (Intercept) 13.90963 3.72956
*

*> Worker (Intercept) 22.85528 4.78072
*

> Residual 0.92464 0.96158

> Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

> The expression (1|Worker/Machine) is just "syntactic sugar". It is

*> expanded to (1|Worker) + (1|Machine:Worker) before the model matrices
*

*> are created.
*

> If you want to start with the formula and see what that means for the

*> model then use these rules:
*

> - a term including the '|' operator is a random effects term

*> - if the left-hand side of the '|' operator is 1 then the random
*

*> effects are scalar random effects, one for each level of the factor on
*

*> the right of the '|'
*

*> - random effects associated with different terms are independent
*

*> - random effects associated with different levels of the factor within
*

*> a term are independent
*

*> - the variance of the random effects within the same term is constant
*

> However, there is another mixed-effects model that could make sense

*> for these data. Suppose I consider the variations associated with
*

*> each worker as a vector of length 3 (Machines A, B and C) with a
*

*> symmetric, positive semidefinite 3 by 3 variance-covariance matrix. I
*

*> fit that model as
*

*> Linear mixed model fit by REML
*

> Formula: score ~ Machine + (Machine | Worker)

*> Data: Machines
*

*> AIC BIC logLik deviance REMLdev
*

*> 228.3 248.2 -104.2 216.6 208.3
*

*> Random effects:
*

*> Groups Name Variance Std.Dev. Corr
*

*> Worker (Intercept) 16.64051 4.07928
*

*> MachineB 34.54670 5.87764 0.484
*

*> MachineC 13.61398 3.68971 -0.365 0.297
*

*> Residual 0.92463 0.96158
*

*> Number of obs: 54, groups: Worker, 6
*

> It may be more meaningful to write it as

*> Linear mixed model fit by REML
*

> Formula: score ~ Machine + (0 + Machine | Worker)

*> Data: Machines
*

*> AIC BIC logLik deviance REMLdev
*

*> 228.3 248.2 -104.2 216.6 208.3
*

*> Random effects:
*

*> Groups Name Variance Std.Dev. Corr
*

*> Worker MachineA 16.64097 4.07933
*

*> MachineB 74.39557 8.62529 0.803
*

*> MachineC 19.26646 4.38936 0.623 0.771
*

*> Residual 0.92463 0.96158
*

*> Number of obs: 54, groups: Worker, 6
*

> Now we are fitting 3 variances and 3 covariances for the random

*> effects instead of the previous models which had two variances. The
*

*> difference in the models is exactly what made you pause - the simpler
*

*> model assumes that, conditional on the random effect for the worker,
*

*> the worker/machine random effects are independent and have constant
*

*> variance. In the more general models the worker/machine interactions
*

*> are allowed to be correlated within worker.
*

> It is more common to allow this kind of correlation within subject in

*> models for longitudinal data (the Laird-Ware formulation) where each
*

*> subject has a random effect for the intercept and a random effect for
*

*> the slope with respect to time and these can be correlated. However,
*

*> this type of representation can make sense with a factor on the left
*

*> hand side of the '|' operator, like the Machine factor here. If that
*

*> factor has a large number of levels then the model quickly becomes
*

*> unwieldy because the number of variance-covariance parameters to
*

*> estimate is quadratic in the number of levels of the factor on the
*

*> lhs.
*

> I hope this helps.

*> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
*

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 16 May 2008 - 04:32:27 GMT

Date: Fri, 16 May 2008 14:28:59 +1000

I've been looking back over this discussion. Another model that one can fit using lme is:

> lme(score~Machine, random=list(Worker=pdIdent(~0+Machine)),
+ weights=varIdent(form=~1|Machine), data=Machines)
Linear mixed-effects model fit by REML

Data: Machines

Log-restricted-likelihood: -108.9547

Fixed: score ~ Machine

(Intercept) MachineB MachineC

52.355556 7.966667 13.916667

Random effects:

Formula: ~0 + Machine | Worker

Structure: Multiple of an Identity

MachineA MachineB MachineC Residual StdDev: 6.06209 6.06209 6.06209 1.148581

Variance function:

Structure: Different standard deviations per stratum
Formula: ~1 | Machine

Parameter estimates:

A B C

1.0000000 0.8713263 0.5859709

Number of Observations: 54

Number of Groups: 6

This insists (I think) that conditional on the random effect for the
worker,

the worker/machine random effects be independent,
but allows them to have different variances. I am wondering whether
it is possible to fit such a model using lmer().

[In this example the large estimated correlations suggest that it is not a sensible model.]

John Maindonald email: john.maindonald_at_anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.

On 14 May 2008, at 2:49 AM, Douglas Bates wrote:

> On Mon, May 12, 2008 at 5:39 PM, Rolf Turner

*> <r.turner_at_auckland.ac.nz> wrote:
*

>> >> On 13/05/2008, at 4:09 AM, Douglas Bates wrote: >> >>> I'm entering this discussion late so I may be discussing issues that >>> have already been addressed. >>> >>> As I understand it, Federico, you began by describing a model for >>> data >>> in which two factors have a fixed set of levels and one factor has >>> an >>> extensible, or "random", set of levels and you wanted to fit a model >>> that you described as >>> >>> y ~ effect1 * effect2 * effect3 >>> >>> The problem is that this specification is not complete. >> >> At *last* (as Owl said to Rabbit) we're getting somewhere!!! >> >> I always knew that there was some basic fundamental point >> about this business that I (and I believe many others) were >> simply missing. But I could not for the life of me get anyone >> to explain to me what that point was. Or to put it another >> way, I was never able to frame a question that would illuminate >> just what it was that I wasn't getting. >> >> I now may be at a stage where I can start asking the right >> questions. >> >>> An interaction of factors with fixed levels and a factor with random >>> levels can mean, in the lmer specification, >>> >>> lmer(y ~ effect1 * effect2 + (1| effect3) + (1| >>> effect1:effect2:effect3), >>> ...) >>> >>> or >>> >>> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...) >>> >>> or other variations. When you specify a random effect or an random >>> interaction term you must, either explicitly or implicitly, specify >>> the form of the variance-covariance matrix associated with those >>> random effects. >>> >>> The "advantage" that other software may provide for you is that it >>> chooses the model for you but that, of course, means that you only >>> have the one choice. >> >> Now may I start asking what I hope are questions that will lift >> the fog a bit? >> >> Let us for specificity consider a three-way model with two >> fixed effects and one random effect from the good old >> Rothamstead >> style >> agricultural experiment context: Suppose we have a number of >> species/breeds of wheat (say) and a number of fertilizers. >> These are fixed effects. And we have a number of fields >> (blocks) >> --- a random effect. Each breed-fertilizer combination is >> applied a number of times in each field. We ***assume*** that >> that the field or block effect is homogeneous throughout. This >> may or may not be a ``good'' assumption, but it's not completely >> ridiculous and would often be made in practice. And probably >> *was* made at Rothamstead. The response would be something like >> yield in bushels per acre. >> >> The way that I would write the ``full'' model for this setting, >> in mathematical style is: >> >> Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + >> (alpha.C)_ik >> + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl >> >> The alpha_i and beta_j are parameters corresponding to breed and >> fertilizer >> respectively; the C_k are random effects corresponding to >> fields or >> blocks. >> Any effect ``involving'' C is also random. >> >> The assumptions made by the Package-Which-Must-Not-Be-Named >> are (I >> think) >> that >> >> C_k ~ N(0,sigma_C^2) >> (alpha.C)_ik ~ N(0,sigma_aC^2) >> (beta.C)jk ~ N(0,sigma_bC^2) >> (alpha.beta.C)_ijk ~ N(0,sigma_abC^2) >> E_ijkl ~ N(0,sigma^2) >> >> and these random variables are *all independent*. >> >> Ahhhhhhhh ... perhaps I'm on the way to answering my own >> question. >> Is >> it this assumption of ``all independent'' which is >> questionable? It >> seemed innocent enough when I first learned about this stuff, lo >> these >> many years ago. But .... mmmmmaybe not! >> >> To start with: What would be the lmer syntax to fit the >> foregoing >> (possibly naive) model? I am sorry, but I really cannot get >> my head >> around the syntax of lmer model specification, and I've >> tried. I >> really have. Hard. I know I must be starting from the wrong >> place, >> but I haven't a clue as to what the *right* place to start >> from is. >> And if I'm in that boat, I will wager Euros to pretzels that >> there >> are others in it. I know that I'm not the brightest bulb in the >> chandelier, but I'm not the dullest either. >

> Thanks for the questions, Rolf. I completely agree that mixed model

>

> Let's consider a set of models for the Machines data reproduced (from

> >> library(lme4)

> Loading required package: Matrix

>

> Attaching package: 'Matrix'

> >

> The following object(s) are masked from package:stats :

>

> xtabs

>> data("Machines", package = "MEMSS") >> str(Machines)

> 'data.frame': 54 obs. of 3 variables:

>

> We consider the Machine factor to have a fixed set of levels in that

>

> If you plot these data

>> dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine, >> type = c("g", "p", "a"), xlab = "Efficiency score", ylab = >> "Worker", auto.key = list(columns = 3, lines = TRUE)) >

> (resulting PDF enclosed) you will see evidence of an interaction.

>

> One way to model this interaction is to say that there is a random

> >> print(fm1 <- lmer(score ~ Machine + (1|Worker) + (1| >> Machine:Worker), Machines), corr = FALSE)

> Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker)

>

> Fixed effects:> Estimate Std. Error t value> (Intercept) 52.356 2.486 21.062

> MachineB 7.967 2.177 3.659

> MachineC 13.917 2.177 6.393

>

> An equivalent formulation is

>> print(fm1 <- lmer(score ~ Machine + (1|Worker/Machine), Machines), >> corr = FALSE)

> Residual 0.92464 0.96158

> Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

>

> Fixed effects:> Estimate Std. Error t value> (Intercept) 52.356 2.486 21.062

> MachineB 7.967 2.177 3.659

> MachineC 13.917 2.177 6.393

>

> The expression (1|Worker/Machine) is just "syntactic sugar". It is

>

> If you want to start with the formula and see what that means for the

>

> - a term including the '|' operator is a random effects term

>

> However, there is another mixed-effects model that could make sense

> >> print(fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines), >> corr = FALSE)

> Formula: score ~ Machine + (Machine | Worker)

>

> Fixed effects:> Estimate Std. Error t value> (Intercept) 52.356 1.681 31.151

> MachineB 7.967 2.421 3.291

> MachineC 13.917 1.540 9.037

>

> It may be more meaningful to write it as

> >> print(fm3 <- lmer(score ~ Machine + (0+Machine|Worker), Machines), >> corr = FALSE)

> Formula: score ~ Machine + (0 + Machine | Worker)

>

> Fixed effects:> Estimate Std. Error t value> (Intercept) 52.356 1.681 31.150

> MachineB 7.967 2.421 3.291

> MachineC 13.917 1.540 9.037

>

> Now we are fitting 3 variances and 3 covariances for the random

>

> It is more common to allow this kind of correlation within subject in

>

> I hope this helps.

> >> Having got there: Presuming that I'm more-or-less on the >> right track >> in my foregoing conjecture that it's the over-simple dependence >> structure >> that is the problem with what's delivered by the >> Package-Which-Must-Not-Be-Named, >> how might one go about being less simple-minded? I.e. what >> might be >> some >> more realistic dependence structures, and how would one >> specify these >> in lmer? >> And how would one assess whether the assumed dependence >> structure >> gives >> a reasonable fit to the data? >> >>> If you can describe how many variance components you think should be >>> estimated in your model and what they would represent then I think >>> it >>> will be easier to describe how to fit the model. >> >> How does this fit in with my conjecture (above) about what >> I've been >> missing all these years? Does it fit? How many variance >> components >> are there in the ``naive'' model? It looks like 5 to me ... but >> maybe >> I'm totally out to lunch in what I think I'm understanding at >> this >> stage. >> (And besides --- there are three sorts of statistician; those >> who can >> count, >> and those who can't.) >> >> Thank you for your indulgence. >> >> cheers, >> >> Rolf Turner >> >> ###################################################################### >> Attention:This e-mail message is privileged and confidential. If >> you are not >> theintended recipient please delete the message and notify the >> sender.Any >> views or opinions presented are solely those of the author. >> >> This e-mail has been scanned and cleared by >> MailMarshalwww.marshalsoftware.com >> ###################################################################### >>> R-sig-mixed-models@r-project.org mailing list

> <Machines.pdf>_______________________________________________

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 16 May 2008 - 04:32:27 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Fri 16 May 2008 - 05:30:43 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*