Re: [R] fixed effects following lmer and mcmcsamp - which to present?

From: Douglas Bates <>
Date: Wed 09 Aug 2006 - 01:45:56 EST

On 8/8/06, Henrik Parn <> wrote:
> Dear all,
> I am running a mixed model using lmer. In order to obtain CI of
> individual coefficients I use mcmcsamp. However, I need advice which
> values that are most appropriate to present in result section of a
> paper. I have not used mixed models and lmer so much before so my
> question is probably very naive. However, to avoid to much problems with
> journal editors and referees addicted to p-values, I would appreciate
> advice of which values of the output for the fixed factor that would be
> most appropriate to present in a result section, in order to convince
> them of the p-value free 'lmer-mcmcsamp'-approach!
> Using the example from the help page on lmer:
> fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
> ...I obtain the following for 'Days':
> summary(fm1)
> ...
> Estimate Std. Error t value
> Days 10.4673 1.5458 6.771
> ...and from mcmcsamp:
> summary(mcmcsamp(fm1 , n = 10000))
> 1. Empirical mean and standard deviation for each variable,
> plus standard error of the mean:
> Mean SD Naive SE Time-series SE
> Days 10.4695 1.7354 0.017354 0.015921
> 2. Quantiles for each variable:
> 2.5% 25% 50% 75% 97.5%
> Days 7.0227 9.3395 10.4712 11.5719 13.957
> The standard way of presenting coefficients following a 'non-lmer'
> output is often (beta=..., SE=..., statistic=..., P=...). What would be
> the best equivalent in a 'lmer-mcmcsamp-context'? (beta=..., CI=...) is
> a good start I believe. But which beta? And what else?
> I assume that the a 95% CI in this case would be 7.0227-13.957 (please,
> do correct me I have completely misunderstood!). But which would be the
> corresponding beta? 10.4673?, 10.4695? 10.4712? Is the t-value worth
> presenting or is it 'useless' without corresponding degrees of freedom
> and P-value? If I present the mcmcsamp-CI, does it make sense to present
> any of the three SE obtained in the output above? BTW, I have no idea
> what Naive SE, Time-series SE means. Could not find much in help and
> pdfs to coda or Matrix, or in Google.

I would definitely report the REML value 10.4673 as the estimate. This is a reproducible estimate - the other two are stochastic in that you will get different values from different mcmcsamp runs. However, the reproducibility is high. Notice that they all round to 10.47 and two decimal places is quite enough precision for reporting this value when you consider that a 95% confidence interval is [7.02,13.96].

Another way of evaluating a confidence interval using the coda package is the HPDinterval function. (HPD stands for "Highest Posterior Density") It returns the shortest interval with a 95% probability content in the empirical distribution.

Here are values from a sample that I evaluated

> data(sleepstudy)
> fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
> fs1 <- mcmcsamp(fm1, 10000)
> library(coda)
> summary(fs1)

1. Empirical mean and standard deviation for each variable,

   plus standard error of the mean:

                        Mean     SD Naive SE Time-series SE
Days                 10.4810 1.6910 0.016910       0.015762

2. Quantiles for each variable:

                        2.5%       25%       50%       75%    97.5%
Days                  7.0975    9.3971   10.4851   11.5751   13.810

> fixef(fm1)
(Intercept) Days

  251.40510 10.46729
> HPDinterval(fs1)
                         lower       upper
Days                  7.038076   13.734493
[1] 0.95

If you would like to use a t-distribution to calculate a confidence interval, I would argue that the degrees of freedom for the t should be somewhere between n - p = 178 in this case (n = number of observations, p = number of coefficients in the fixed effects or rank(X) where X is the model matrix for the fixed effects) and n - rank([Z:X]) = 144. (there are 36 random effects and 2 fixed effects but the rank of [Z:X] = 36)

If we use the lower value of 144 we produce a confidence interval of

> 10.4673 + c(-1,1) * qt(0.975, df = 144) * 1.5458
[1] 7.41191 13.52269

Notice that this interval is contained in the HPD interval and in the interval obtained from the 2.5% and 97.5% quantiles of the empirical distribution. I attribute this to the fact that the standard errors are calculated conditional on the variance of the random effects. Thus the t-based confidence interval takes into account imprecision of the estimate of \sigma^2 but it assumes that the variance of the random effects is fixed at the estimated values. (That's not quite true - it is the relative variance that is assumed fixed - but the effect is the same.) The MCMC sample allows all the parameters to vary and I would claim is therefore a better measure of the marginal variability of this parameter.

However, as stated above, the HPD interval is stochastic. I would create more than one sample to check the reproducibility of the intervals. In this case intervals based on a chain of length 10000 are not wonderfully consistent

Days                  6.9389037   13.851661
Days                  6.9968306   13.768851

and I might go to chains of length 50000 to check further

Days                  7.0880422   13.903068
Days                  7.0758948   13.965146

For the benefit of editors of a biological journal I think I would report both the interval derived from the t-distribution and a representative interval from the MCMC samples to show that the MCMC samples show variation in excess of that reported by the t interval.

I hope this helps.

--Doug mailing list PLEASE do read the posting guide and provide commented, minimal, self-contained, reproducible code. Received on Wed Aug 09 01:50:26 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 09 Aug 2006 - 04:18:26 EST.

Mailing list information is available at Please read the posting guide before posting to the list.