From: Joris Meys <jorismeys_at_gmail.com>

Date: Sat, 19 Jun 2010 01:55:53 +0200

Date: Sat, 19 Jun 2010 01:55:53 +0200

So in the end, it always boils down to interpretation.

On Fri, Jun 18, 2010 at 10:29 PM, Carlo Fezzi <c.fezzi_at_uea.ac.uk> wrote:

> Thanks Joris,

*>
**> I understand your point regarding the need for the two models to be
**> nested. So, according to your in the example case the LR test is not
**> appropriate and the two model should be compared with other criteria such
**> as AIC or BIC for example.
**>
**> On the other hand, Simon Wood indicated that such a LR test is
**> (approximately) correct in his previous reply... a am bit confused, which
**> is the correct approach to test the two models? Is the LR test correct
**> only if the parametric model is linear in the x variables maybe? In this
**> case, which is the best appraoch to compare a "gamm" vs a "lme" with
**> quadratic specification?
**>
**> Best wishes,
**>
**> Carlo
**>
**>> Just realized something: You should take into account that the LR test
**>> is actually only valid for _nested_ models. Your models are not
**>> nested. Hence, you shouldn't use the anova function to compare them,
**>> and you shouldn't compare the df. In fact, if you're interested in the
**>> contribution of a term, then using anova to compare the model with
**>> that term and without that term gives you an answer on the hypothesis
**>> whether that term with spline contributes significantly to the model.
**>>
**>>> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML")
**>>
**>>> f3 <- gamm(y ~ x, random = list(id=~1), method="ML" )
**>>
**>>> f4 <- gamm(y ~ 1, random = list(id=~1), method="ML" )
**>>
**>>> anova(f3$lme,f2$lme)
**>> Model df AIC BIC logLik Test L.Ratio p-value
**>> f3$lme 1 4 760 770 -376
**>> f2$lme 2 5 381 394 -186 1 vs 2 380 <.0001
**>>
**>>> anova(f4$lme,f2$lme)
**>> Model df AIC BIC logLik Test L.Ratio p-value
**>> f4$lme 1 3 945 953 -470
**>> f2$lme 2 5 381 394 -186 1 vs 2 568 <.0001
**>>
**>>> anova(f3$lme,f4$lme)
**>> Model df AIC BIC logLik Test L.Ratio p-value
**>> f3$lme 1 4 760 770 -376
**>> f4$lme 2 3 945 953 -470 1 vs 2 188 <.0001
**>>
**>> This is the correct application of a likelihood ratio test. You see
**>> that adding the spline increases the df with 1 compared to the linear
**>> model, as part of the spline gets into the random component. Notice as
**>> well that the interpretation of a test in case of a random component
**>> is not the same as in case of a fixed component. If I understood
**>> correctly, this LR test specifically says something over the effect of
**>> X, without being interested in the shape of the spline. The
**>> "significance of a spline" is a difficult concept anyway, as a spline
**>> can be seen as a form of local regression. It's exactly the use of the
**>> randomization that allows for a general hypothesis about the added
**>> value of the spline, without focusing on its actual shape. Hence the
**>> "freedom" connected to that actual shape should not be used in the df
**>> used to test the general hypothesis.
**>>
**>> Hope this makes sense someway...
**>>
**>> Cheers
**>> Joris
**>>
**>>
**>> On Fri, Jun 18, 2010 at 6:27 PM, Carlo Fezzi <c.fezzi_at_uea.ac.uk> wrote:
**>>> Dear Simon,
**>>>
**>>> thanks a lot for your prompt reply.
**>>>
**>>> Unfortunately I am still confused about which is the correct way to test
**>>> the two models... as you point out: why in my example the two models
**>>> have
**>>> the same degrees of freedom?
**>>>
**>>> Intuitively it seems to me the gamm model is more flexible since, as I
**>>> understand also from you response, it should contain more random effects
**>>> than the other model because some of the smooth function parameters are
**>>> represented as such. This should not be taken into account when testing
**>>> one model vs the other?
**>>>
**>>> Continuing with my example, the two models:
**>>>
**>>> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML")
**>>> f3 <- gamm(y ~ x + I(x^2), random = list(id=~1), method="ML" )
**>>>
**>>> Can be tested with:
**>>>
**>>> anova(f3$lme,f2$lme)
**>>>
**>>> But why are the df the same? Model f2 appears to be more flexible and,
**>>> as
**>>> such, should have more (random) parameters. Should not a test of one
**>>> model
**>>> vs the other take this into account?
**>>>
**>>> Sorry if this may sound dull, many thanks for your help,
**>>>
**>>> Carlo
**>>>
**>>>
**>>>
**>>>> On Wednesday 16 June 2010 20:33, Carlo Fezzi wrote:
**>>>>> Dear all,
**>>>>>
**>>>>> I am using the "mgcv" package by Simon Wood to estimate an additive
**>>>>> mixed
**>>>>> model in which I assume normal distribution for the residuals. I would
**>>>>> like to test this model vs a standard parametric mixed model, such as
**>>>>> the
**>>>>> ones which are possible to estimate with "lme".
**>>>>>
**>>>>> Since the smoothing splines can be written as random effects, is it
**>>>>> correct to use an (approximate) likelihood ratio test for this?
**>>>> -- yes this is ok (subject to the usual caveats about testing on the
**>>>> boundary
**>>>> of the parameter space) but your 2 example models below will have the
**>>>> same
**>>>> number of degrees of freedom!
**>>>>
**>>>>> If so,
**>>>>> which is the correct number of degrees of freedom?
**>>>> --- The edf from the lme object, if you are testing using the log
**>>>> likelihood
**>>>> returned by the lme representation of the model.
**>>>>
**>>>>> Sometime the function
**>>>>> LogLik() seems to provide strange results regarding the number of
**>>>>> degrees
**>>>>> of freedom (df) for the gam, for instance in the example I copied
**>>>>> below
**>>>>> the df for the "gamm" are equal to the ones for the "lme", but the
**>>>>> summary(model.gam) seems to indicate a much higher edf for the gamm.
**>>>> --- the edf for the lme representation of the model counts only the
**>>>> fixed
**>>>> effects + the variance parameters (which includes smoothing
**>>>> parameters).
**>>>> Each
**>>>> smooth typically contributes only one or two fixed effect parameters,
**>>>> with
**>>>> the rest of the coefficients for the smooth treated as random effects.
**>>>>
**>>>> --- the edf for the gam representation of the same model differs in
**>>>> that
**>>>> it
**>>>> also counts the *effective* number of parameters used to represent each
**>>>> smooth: this includes contributions from all those coefficients that
**>>>> the
**>>>> lme
**>>>> representation treated as strictly random.
**>>>>
**>>>> best,
**>>>> Simon
**>>>>
**>>>>
**>>>>> I would be very grateful to anybody who could point out a solution,
**>>>>>
**>>>>> Best wishes,
**>>>>>
**>>>>> Carlo
**>>>>>
**>>>>> Example below:
**>>>>>
**>>>>> ----
**>>>>>
**>>>>> rm(list = ls())
**>>>>> library(mgcv)
**>>>>> library(nlme)
**>>>>>
**>>>>> set.seed(123)
**>>>>>
**>>>>> x <- runif(100,1,10) # regressor
**>>>>> b0 <- rep(rnorm(10,mean=1,sd=2),each=10) # random intercept
**>>>>> id <- rep(1:10, each=10) # identifier
**>>>>>
**>>>>> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1) # dependent variable
**>>>>>
**>>>>> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" ) # lme
**>>>>> model
**>>>>>
**>>>>> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" ) # gamm
**>>>>>
**>>>>> ## same number of "df" according to logLik:
**>>>>> logLik(f1)
**>>>>> logLik(f2$lme)
**>>>>>
**>>>>> ## much higher edf according to summary:
**>>>>> summary(f2$gam)
**>>>>>
**>>>>> -----------
**>>>>>
**>>>>> ______________________________________________
**>>>>> R-help_at_r-project.org mailing list
**>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
**>>>>> PLEASE do read the posting guide
**>>>>> http://www.R-project.org/posting-guide.html and provide commented,
**>>>>> minimal,
**>>>>> self-contained, reproducible code.
**>>>>
**>>>> --
**>>>>> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY
**>>>>> UK
**>>>>> +44 1225 386603 www.maths.bath.ac.uk/~sw283
**>>>>
**>>>
**>>> ______________________________________________
**>>> R-help_at_r-project.org mailing list
**>>> https://stat.ethz.ch/mailman/listinfo/r-help
**>>> PLEASE do read the posting guide
**>>> http://www.R-project.org/posting-guide.html
**>>> and provide commented, minimal, self-contained, reproducible code.
**>>>
**>>
**>>
**>>
**>> --
**>> Joris Meys
**>> Statistical consultant
**>>
**>> Ghent University
**>> Faculty of Bioscience Engineering
**>> Department of Applied mathematics, biometrics and process control
**>>
**>> tel : +32 9 264 59 87
**>> Joris.Meys_at_Ugent.be
**>> -------------------------------
**>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
**>>
**>
**>
**>
*

-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 Joris.Meys_at_Ugent.be ------------------------------- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Fri 18 Jun 2010 - 23:58:44 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Mon 21 Jun 2010 - 18:10:35 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*