Re: [R] Question about data used to fit the mixed model

From: Douglas Bates <bates_at_stat.wisc.edu>
Date: Tue 01 Aug 2006 - 08:28:53 EST

On 7/29/06, Nantachai Kantanantha <kantanantha@hotmail.com> wrote:
> Hi everyone,
>
> I would like to ask a question regarding the data used to fit a mixed
> model.
>
> I wonder whether the response variable used to fit the mixed model
> (either via "spm" or "lme") must have several observations per subject
> (i.e. Yij, i = 1,..,M, j = 1,.., ni) or whether it can have just one
> observation per subject (i.e. Yi, i = 1,...,M). Since we have to specify
> the groups for the random-effect components, if we have only one
> observation per subject, then each group will have only one observation.

As Harold Doran mentioned in his earlier reply, if you only have one observation in each group you can't estimate the variance parameters in a mixed model, because the random effect for a group is completely confounded with the per-observation noise term for that observation. The model would be of the form

y = X\beta + Z b + \epsilon

for which you would estimate the variance of the components of b and the variance of the components of \epsilon. However, with only one observation per group the number of components in b and in \epsilon would be the same and, by suitably reordering the observations, the matrix Z could be made the identity matrix. Thus the model reduces to

y = X\beta + (b + \epsilon)

and the elements of b are confounded with those of \epsilon: only the sum of the two variances can be estimated, not each variance separately.
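The identifiability problem can be checked numerically. The following is a minimal base R sketch on simulated data (the sample size, covariate, and coefficient values are arbitrary assumptions, not taken from any real study). It builds Z for a grouping factor in which every group occurs exactly once, confirms that Z is the identity matrix, and then evaluates the Gaussian marginal log-likelihood, with \beta held at its true value, for several splits of the same total variance. `loglik` is a hypothetical helper written only for this illustration.

```r
# Minimal sketch on hypothetical simulated data: one observation per group.
set.seed(1)
n <- 6
g <- factor(seq_len(n))              # each group appears exactly once
Z <- model.matrix(~ 0 + g)           # one indicator column per group
stopifnot(all(Z == diag(n)))         # Z is the n x n identity matrix

X <- cbind(1, seq_len(n))            # intercept plus a hypothetical covariate
beta <- c(2, 0.5)                    # assumed "true" fixed effects
y <- drop(X %*% beta) + rnorm(n, sd = sqrt(2))

# Marginal log-likelihood of y for given variances, beta treated as known.
# Var(y) = sigma2_b * Z Z' + sigma2 * I, which equals (sigma2_b + sigma2) * I here.
loglik <- function(sigma2_b, sigma2) {
  V <- sigma2_b * tcrossprod(Z) + sigma2 * diag(n)
  r <- y - drop(X %*% beta)
  logdet <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
  # multivariate normal log-density written out with base R
  -0.5 * (n * log(2 * pi) + logdet + drop(crossprod(r, solve(V, r))))
}

# Three different variance splits, all with the same total of 2
ll_a <- loglik(sigma2_b = 1.5, sigma2 = 0.5)
ll_b <- loglik(sigma2_b = 0.1, sigma2 = 1.9)
ll_c <- loglik(sigma2_b = 1.9, sigma2 = 0.1)
stopifnot(isTRUE(all.equal(ll_a, ll_b)), isTRUE(all.equal(ll_a, ll_c)))
```

Any pair of variances with the same sum gives the same likelihood, so there is nothing in the data that lets a fitting routine choose among them.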

A different version of this question is to ask whether some of the groups can have only a single observation while others have more than one observation. The answer to that is a qualified "yes".

An example of data with different numbers of observations per group is the star data that Harold mentioned. The student identifier in this data set is named "id". If we tabulate the number of observations per student and then tabulate that result, we get the number of students with 1, 2, 3 or 4 observations.

> data("star", package = 'mlmRev')
> table(table(star$id))

   1    2    3    4
4314 2455 1744 3085
> length(unique(star$id))
[1] 11598
> 4314/11598
[1] 0.3719607

This shows that more than a third of the students have data from only a single year.
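The printed counts can also be checked by hand. This base R sketch copies the four counts from the output above, so it runs without mlmRev installed, and computes the share of single-observation students along with the total number of observations those counts imply.

```r
# Counts copied from the table(table(star$id)) output above:
# names = observations per student, values = number of students.
n_students <- c(`1` = 4314, `2` = 2455, `3` = 1744, `4` = 3085)

# Total number of students; should agree with length(unique(star$id))
total_students <- sum(n_students)
stopifnot(total_students == 11598)

# Share of students observed in only a single year
prop_single <- n_students[["1"]] / total_students

# Number of observations implied by the table, and the share of those
# observations that come from single-observation students
n_obs <- sum(as.numeric(names(n_students)) * n_students)
prop_obs_single <- n_students[["1"]] / n_obs
```

From these counts, single-observation students make up about 37% of the students but supply only 4314 of the 26796 observations implied by the table, roughly 16%.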

It is possible to include such students in a mixed model with a random effect for student. It is even possible to include such students in a mixed model with a random intercept and a random slope with respect to time for student. However, such students contribute very little information to the model fit and the "estimates" (actually "predictors") of the random effects for such students are artificially small because they are confounded with the per-observation noise term.
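One way to see why the predictions for these students are small is the shrinkage factor in the simplest case. Assume a random-intercept model in which the fixed effects and both variance components are treated as known. The conditional mean of a student's random intercept is then the student's mean residual multiplied by \sigma_b^2 / (\sigma_b^2 + \sigma^2 / n_i), where n_i is the number of observations for that student. The R sketch below uses the arbitrary values \sigma_b^2 = \sigma^2 = 1 and checks the closed form against the matrix expression. `shrink` and `shrink_matrix` are hypothetical helpers, not functions from lme4 or nlme.

```r
# Shrinkage of the predicted random intercept, treating beta and both
# variances as known (hypothetical values chosen for illustration).
sigma2_b <- 1   # between-student variance (assumed)
sigma2   <- 1   # per-observation noise variance (assumed)

# Closed form: b_hat_i = k(n_i) * (mean residual for student i)
shrink <- function(n_i) sigma2_b / (sigma2_b + sigma2 / n_i)

# Same factor from E[b_i | y_i] = sigma2_b * 1' V_i^{-1} r_i with
# V_i = sigma2_b * J + sigma2 * I, applied to residuals that all equal 1
# (mean residual 1), so the result is k(n_i) itself.
shrink_matrix <- function(n_i) {
  V_i <- sigma2_b * matrix(1, n_i, n_i) + sigma2 * diag(n_i)
  r_i <- rep(1, n_i)
  drop(sigma2_b * crossprod(rep(1, n_i), solve(V_i, r_i)))
}

k <- sapply(1:4, shrink)              # one to four observations per student
k_check <- sapply(1:4, shrink_matrix)
stopifnot(isTRUE(all.equal(k, k_check)))
print(round(k, 3))
```

With these values the factor is 0.5 for a student seen once and 0.8 for a student seen four times, so a single observation's residual is pulled halfway toward zero. A larger noise variance relative to \sigma_b^2 pulls it further.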

So while it can be attractive, when designing an experiment or planning an observational study, to have many groups and few observations per group, such experiments or studies provide very sparse information. Using a mixed model on such data doesn't magically add information to the data. Mixed models are statistical models, not magic.



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue Aug 01 08:36:59 2006

