Re: [R] Can lmer() fit a multilevel model embedded in a regression?

From: Doran, Harold <HDoran_at_air.org>
Date: Wed 24 May 2006 - 08:24:22 EST


I've thought about this a bit and am just short of simulating data, for which I do not have time. But, if there are some available data I would be happy to experiment.

Based on my understanding of the model and data structure, I do think it is possible to estimate using lmer, but I think it may push some limits, especially as the random-effects structure seems to be very large, with many covariances. This can be controlled by assuming independence of the group-level errors, which makes the model more parsimonious and simpler for lmer to estimate (see the Bates article on lmer in R News).
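
To make the size of the problem concrete, here is a small arithmetic sketch in R. It counts the variance-covariance parameters for 82 random coefficients under three assumptions: an unstructured covariance matrix, independent errors with separate variances, and independent errors with one common variance. The variable names are illustrative only.

```r
p <- 82                             # number of food coefficients

# Unstructured covariance matrix: p variances plus p*(p-1)/2 covariances
n_unstructured <- p * (p + 1) / 2   # 3403

# Independent group-level errors, each with its own variance
n_diagonal <- p

# Independent group-level errors sharing a single variance
n_common <- 1

c(unstructured = n_unstructured, diagonal = n_diagonal, common = n_common)
```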

There is a large number of variables, so using as.formula would be wise, but for illustration here is what I think the lmer syntax would look like:

fm1 <- lmer(outcome ~ food_1*folic_1 + food_2*folic_2 + ... + food_82*folic_82 +
                sex + age + (food_1 + food_2 + ... + food_82 | id),
            data, family = binomial(link = 'logit'), method = "Laplace",
            control = list(usePQL = FALSE))
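
As a minimal sketch of the as.formula approach, the formula can be assembled from strings in base R. This assumes the columns are named food_1, ..., food_82 and folic_1, ..., folic_82, together with outcome, sex, age and id, as in the illustrative call above; build_formula is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: assemble the model formula for p foods.
# All column names are assumptions taken from the illustrative call.
build_formula <- function(p) {
  foods  <- paste0("food_", seq_len(p))
  folic  <- paste0("folic_", seq_len(p))
  # Fixed part: food_j * folic_j for each food, joined by "+"
  fixed  <- paste(paste0(foods, "*", folic), collapse = " + ")
  # Random part: one random coefficient per food, grouped by id
  random <- paste0("(", paste(foods, collapse = " + "), " | id)")
  as.formula(paste("outcome ~", fixed, "+ sex + age +", random))
}

f3  <- build_formula(3)    # small case, easy to inspect by eye
f82 <- build_formula(82)   # full-size formula
```

Printing f3 is a quick way to confirm the structure before passing the full-size formula to the fitting function.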

One can assess the reasonableness of the model using the mcmcsamp() function. This returns an object of class mcmc, so all diagnostics can be performed using the various functions in the coda package.

I might suggest experimenting with this code for a much smaller set of columns in the X matrix for foods. I must admit that I think of the model notation slightly differently than it is written in this exchange. My inclination is to think of the model as a linear model with a covariance structure that accounts for correlations in the data by incorporating random effects.

HTH,
Harold

-----Original Message-----

From:	Andrew Gelman [mailto:gelman@stat.columbia.edu]
Sent:	Mon 5/22/2006 11:12 AM
To:	Doran, Harold
Cc:	r-help@stat.math.ethz.ch; reg26@columbia.edu
Subject:	Re: [R] Can lmer() fit a multilevel model embedded in a regression?

Harold,

I think we use slightly different notation (I like to use variance parameters rather than covariance matrices). Let me try to write it in model form:

Data points y_i, i=1,...,800

800 x 84 matrix of predictors, X: for columns j=1,...,82, X_{i,j} is the amount of food j consumed by person i. X_{i,83} is an indicator (1 if male, 0 if female), and X_{i,84} is the age of person i.

Data-level model: Pr (y_i=1) = inverse.logit (X_i*beta), for i=1,...,800, with independent outcomes.

beta is a (column) vector of length 84.

Group-level model: for j=1,...,82: beta_j ~ Normal (gamma_0 + gamma_1 * u_j, sigma^2_{beta}).

u is a vector of length 82, where u_j = folate concentration in food j

gamma_0 and gamma_1 are scalar coefficients (for the group-level model), and sigma_{beta} is the sd of the group-level errors.

It would be hopeless to estimate all the betas using maximum likelihood: with 800 data points and 84 predictors, the results will just be too noisy. But it should be ok using the 2-level model above. The question is: can I fit it in lmer()?
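
Substituting the group-level model into the data-level model shows the structure a fitting routine would have to handle. Writing beta_j = gamma_0 + gamma_1*u_j + eta_j with eta_j ~ Normal(0, sigma^2_{beta}), the food part of the linear predictor is sum_j X_{i,j}*beta_j = gamma_0*sum_j X_{i,j} + gamma_1*sum_j X_{i,j}*u_j + sum_j X_{i,j}*eta_j. That is two ordinary coefficients (on total intake and on total folate) plus 82 independent random coefficients sharing one variance. The R sketch below checks this identity numerically; the simulated X and u and the parameter values are made up for illustration, and eta is a name introduced here for the group-level error.

```r
set.seed(1)
n <- 800; p <- 82                        # sizes from the model above
X <- matrix(rexp(n * p), n, p)           # simulated food intakes (assumption)
u <- runif(p)                            # simulated folate concentrations (assumption)
gamma0 <- 0.1; gamma1 <- 0.5             # arbitrary illustrative values
sigma_beta <- 0.2                        # arbitrary illustrative value
eta <- rnorm(p, 0, sigma_beta)           # group-level errors
beta_food <- gamma0 + gamma1 * u + eta   # group-level model for the 82 betas

# Left side: food part of the linear predictor, X %*% beta
lhs <- drop(X %*% beta_food)

# Right side: total intake term + total folate term + random-coefficient term
rhs <- gamma0 * rowSums(X) + gamma1 * drop(X %*% u) + drop(X %*% eta)

max_abs_diff <- max(abs(lhs - rhs))      # should be zero up to rounding error
```

Setting sigma_beta to 0 removes the last term, leaving only the total intake and total folate predictors.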

Thanks again.
Andrew

Doran, Harold wrote:

> So, in the hierarchical notation, does the model look like this (for
> the linear predictor):
>
> DV = constant + food_1(B_1) + food_2(B_2) + ... + food_82(B_82) +
> sex(B_83) + age(B_84)
> food_1 = gamma_00 + gamma_01(folic) + r_01
> food_2 = gamma_10 + gamma_11(folic) + r_02
> ...
> food_82 = gamma_20 + gamma_21(folic) + r_82
>
> where r_qq ~ N(0, Psi) and Psi is an 82-dimensional covariance matrix.
>
> I usually need to see this in model form as it helps me translate this
> into lmer syntax if it can be estimated. From what I see, this would
> be estimating 82(82+1)/2 = 3403 parameters in the covariance matrix.
>
> What I'm stuck on is below you say it would be hopeless to estimate
> the 82 predictors using ML. But, if I understand the model correctly,
> the multilevel regression still resolves the predictors (fixed
> effects) using ML once estimates of the variances are obtained. So, I
> feel I might still be missing something.
>
>
>
> -----Original Message-----
> From: Andrew Gelman [mailto:gelman@stat.columbia.edu]
> Sent: Sun 5/21/2006 7:35 PM
> To: Doran, Harold
> Cc: r-help@stat.math.ethz.ch; reg26@columbia.edu
> Subject: Re: [R] Can lmer() fit a multilevel model embedded in
> a regression?
>
> Harold,
>
> I get confused by the terms "fixed" and "random". Our first-level model
> (in the simplified version we're discussing here) has 800 data points
> (the persons in the study) and 84 predictors: sex, age, and 82
> coefficients for foods. The second-level model has 82 data points (the
> foods) and two predictors: a constant term and folic acid concentration.
>
> It would be hopeless to estimate the 82 food coefficients via maximum
> likelihood, so the idea is to do a multilevel model, with a regression
> of these coefficients on the constant term and folic acid. The
> group-level model has a residual variance. If the group-level residual
> variance is 0, it's equivalent to ignoring food, and just using total
> folic acid as an individual predictor. If the group-level residual
> variance is infinity, it's equivalent to estimating the original
> regression (with 84 predictors) using least squares.
>
> The difficulty is that the foods aren't "groups" in the usual sense,
> since persons are not nested within foods; rather, each person eats many
> foods, and this is reflected in the X matrix.
>
> Andrew
>
> Doran, Harold wrote:
>
> > OK, I'm piecing this together a bit, sorry I'm not familiar with the
> > article you cite. Let me try and fully understand the issue if you
> > don't mind. Are you estimating each of the 82 foods as fixed effects?
> > If so, in the example below this implies 84 total fixed effects (1 for
> > each food type in the X matrix and then sex and age).
> >
> > I'm assuming that food type is nested within one of the 82 folic acid
> > concentrations and then folic acid is treated as a random effect.
> >
> > Is this accurate?
> >
> >
> > -----Original Message-----
> > From: Andrew Gelman [mailto:gelman@stat.columbia.edu]
> > Sent: Sun 5/21/2006 9:17 AM
> > To: Doran, Harold
> > Cc: r-help@stat.math.ethz.ch; reg26@columbia.edu
> > Subject: Re: [R] Can lmer() fit a multilevel model embedded in
> > a regression?
> >
> > Harold,
> >
> > I'm confused now. Just for concreteness, suppose we have 800 people, 82
> > food items, and one predictor ("folic", the folic acid concentration) at
> > the food-item level. Then DV will be a vector of length 800, foods is
> > an 800 x 82 matrix, sex is a vector of length 800, age is a vector of
> > length 800, and folic is a vector of length 82. The vector of folic
> > acid concentrations in individual diets is then just foods%*%folic,
> > which I can call folic_indiv.
> >
> > How would I fit the model in lmer(), then? There's some bit of
> > understanding that I'm still missing.
> >
> > Thanks.
> > Andrew
> >
> >
> > Doran, Harold wrote:
> >
> > > Prof Gelman:
> > >
> > > I believe the answer is yes. It sounds as though persons are partially
> > > crossed within food items?
> > >
> > > Assuming a logit link, the syntax might follow along the lines of
> > >
> > > fm1 <- lmer(DV ~ foods + sex + age + (1|food_item), data, family =
> > > binomial(link='logit'), method = "Laplace", control = list(usePQL=
> > > FALSE) )
> > >
> > > Maybe this gets you partly there.
> > >
> > > Harold
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: r-help-bounces@stat.math.ethz.ch on behalf of Andrew Gelman
> > > Sent: Sat 5/20/2006 5:49 AM
> > > To: r-help@stat.math.ethz.ch
> > > Cc: reg26@columbia.edu
> > > Subject: [R] Can lmer() fit a multilevel model embedded in a
> > > regression?
> > >
> > > I would like to fit a hierarchical regression model from Witte et al.
> > > (1994; see reference below). It's a logistic regression of a health
> > > outcome on quantities of food intake; the linear predictor has the form,
> > > X*beta + W*gamma,
> > > where X is a matrix of consumption of 82 foods (i.e., the rows of X
> > > represent people in the study, the columns represent different foods,
> > > and X_ij is the amount of food j eaten by person i); and W is a matrix
> > > of some other predictors (sex, age, ...).
> > >
> > > The second stage of the model is a regression of beta on some food-level
> > > predictors.
> > >
> > > Is it possible to fit this model in (the current version of) lmer()?
> > > The challenge is that the persons are _not_ nested within food items,
> > > so it is not a simple multilevel structure.
> > >
> > > We're planning to write a Gibbs sampler and fit the model directly, but
> > > it would be convenient to be able to fit in lmer() as well to check.
> > >
> > > Andrew
> > >
> > > ---
> > >
> > > Reference:
> > >
> > > Witte, J. S., Greenland, S., Hale, R. W., and Bird, C. L. (1994).
> > > Hierarchical regression analysis applied to a study of multiple
> > > dietary exposures and breast cancer. Epidemiology 5, 612-621.
> > >
> > > --
> > > Andrew Gelman
> > > Professor, Department of Statistics
> > > Professor, Department of Political Science
> > > gelman@stat.columbia.edu
> > > www.stat.columbia.edu/~gelman
> > >
> > > Statistics department office:
> > > Social Work Bldg (Amsterdam Ave at 122 St), Room 1016
> > > 212-851-2142
> > > Political Science department office:
> > > International Affairs Bldg (Amsterdam Ave at 118 St), Room 731
> > > 212-854-7075
> > >
> > > Mailing address:
> > > 1255 Amsterdam Ave, Room 1016
> > > Columbia University
> > > New York, NY 10027-5904
> > > 212-851-2142
> > > (fax) 212-851-2164
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
> > >
> > >
> >
>

Received on Wed May 24 08:30:19 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 24 May 2006 - 10:10:19 EST.
