Re: [R] testing non-linear component in mgcv:gam

From: John Fox <jfox_at_mcmaster.ca>
Date: Thu 06 Oct 2005 - 02:12:15 EST


Dear Denis,

The chi-square test is formed in analogy to what's done for a GLM: The difference in residual deviance for the nested models is divided by the estimated scale parameter -- i.e., the estimated error variance for a model with normal errors. Otherwise, as you point out, the test would be dependent upon the scale of the response.

John



John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Denis Chabot
> Sent: Wednesday, October 05, 2005 9:04 AM
> To: John Fox
> Cc: R list
> Subject: Re: [R] testing non-linear component in mgcv:gam
>
> Hi John,
>
> Le 05-10-05 à 09:45, John Fox a écrit :
>
> > Dear Denis,
> >
> > Take a closer look at the anova table: The models provide identical
> > fits to the data. The differences in degrees of freedom and
> deviance
> > between the two models are essentially zero, 5.5554e-10 and
> 2.353e-11
> > respectively.
> >
> > I hope this helps,
> > John
> This is one of my difficulties. In some examples I found on
> the web, the difference in deviance is compared directly
> against the chi- squared distribution. But my y variable has
> a very small range (between 0 and 0.5, most of the time) so
> the difference in deviance is always very small and if I
> compared it against the chi-squared distribution as I have
> seen done in examples, the non-linear component would always
> be not significant. Yet it is (with one exception), tested
> with both mgcv:gam and gam:gam. I think the examples I have
> read were wrong in this regard, the "scale" factor seen in
> mgcv output seems to intervene. But exactly how is still
> mysterious to me and I hesitate to judge the size of the
> deviance difference myself.
>
> I agree it is near zero in my example. I guess I need to have
> more experience with these models to better interpret the output...
>
> Denis
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces@stat.math.ethz.ch
> >> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Denis Chabot
> >> Sent: Wednesday, October 05, 2005 8:22 AM
> >> To: r-help@stat.math.ethz.ch
> >> Subject: [R] testing non-linear component in mgcv:gam
> >>
> >> Hi,
> >>
> >> I need further help with my GAMs. Most models I test are very
> >> obviously non-linear. Yet, to be on the safe side, I report the
> >> significance of the smooth (default output of mgcv's
> >> summary.gam) and confirm it deviates significantly from linearity.
> >>
> >> I do the latter by fitting a second model where the same
> predictor is
> >> entered without the s(), and then use anova.gam to compare
> the two. I
> >> thought this was the equivalent of the default output of anova.gam
> >> using package gam instead of mgcv.
> >>
> >> I wonder if this procedure is correct because one of my models
> >> appears to be linear. In fact mgcv estimates df to be
> exactly 1.0 so
> >> I could have stopped there. However I inadvertently repeated the
> >> procedure outlined above. I would have thought in this case the
> >> anova.gam comparing the smooth and the linear fit would
> for sure have
> >> been not significant.
> >> To my surprise, P was 6.18e-09!
> >>
> >> Am I doing something wrong when I attempt to confirm the non-
> >> parametric part a smoother is significant? Here is my example case
> >> where the relationship does appear to be linear:
> >>
> >> library(mgcv)
> >>
> >>> This is mgcv 1.3-7
> >>>
> >> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
> 0.38, 0.62,
> >> 0.88, 1.12,
> >> 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
> >> 3.62, 3.88,
> >> 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,
> >> 6.38, 6.62, 6.88,
> >> 7.12, 8.38, 13.62)
> >> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27,
> 27, 29, 31,
> >> 22, 26, 24, 23,
> >> 15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5,
> 3, 1, 1,
> >> 1, 1, 1) wm.sed <- c(0.000000000, 0.016129032, 0.000000000,
> >> 0.062046512, 0.396459596, 0.189082949,
> >> 0.054757925, 0.142810440, 0.168005168, 0.180804428,
> >> 0.111439628, 0.128799505,
> >> 0.193707937, 0.105921610, 0.103497845, 0.028591837,
> >> 0.217894389, 0.020535469,
> >> 0.080389068, 0.105234450, 0.070213450, 0.050771363,
> >> 0.042074434, 0.102348837,
> >> 0.049748344, 0.019100478, 0.005203125, 0.101711864,
> >> 0.000000000, 0.000000000,
> >> 0.014808824, 0.000000000, 0.222000000, 0.167000000,
> >> 0.000000000, 0.000000000,
> >> 0.000000000)
> >>
> >> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
> >> summary.gam(sed.gam)
> >>
> >>> Family: gaussian
> >>> Link function: identity
> >>>
> >>> Formula:
> >>> wm.sed ~ s(Temp)
> >>>
> >>> Parametric coefficients:
> >>> Estimate Std. Error t value Pr(>|t|)
> >>> (Intercept) 0.08403 0.01347 6.241 3.73e-07 ***
> >>> ---
> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>> Approximate significance of smooth terms:
> >>> edf Est.rank F p-value
> >>> s(Temp) 1 1 13.95 0.000666 ***
> >>> ---
> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>> R-sq.(adj) = 0.554 Deviance explained = 28.5%
> >>> GCV score = 0.09904 Scale est. = 0.093686 n = 37
> >>>
> >>
> >> # testing non-linear contribution
> >> sed.lin <- gam(wm.sed~Temp,weight=N.sets)
> >> summary.gam(sed.lin)
> >>
> >>> Family: gaussian
> >>> Link function: identity
> >>>
> >>> Formula:
> >>> wm.sed ~ Temp
> >>>
> >>> Parametric coefficients:
> >>> Estimate Std. Error t value Pr(>|t|)
> >>> (Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
> >>> Temp -0.023792 0.006369 -3.736 0.000666 ***
> >>> ---
> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>>
> >>> R-sq.(adj) = 0.554 Deviance explained = 28.5%
> >>> GCV score = 0.09904 Scale est. = 0.093686 n = 37
> >>>
> >> anova.gam(sed.lin, sed.gam, test="F")
> >>
> >>> Analysis of Deviance Table
> >>>
> >>> Model 1: wm.sed ~ Temp
> >>> Model 2: wm.sed ~ s(Temp)
> >>> Resid. Df Resid. Dev Df Deviance F Pr(>F)
> >>> 1 3.5000e+01 3.279
> >>> 2 3.5000e+01 3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
> >>>
> >>
> >>
> >> Thanks in advance,
> >>
> >>
> >> Denis Chabot
> >>
> >> ______________________________________________
> >> R-help@stat.math.ethz.ch mailing list
> >>
https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide!
> >> http://www.R-project.org/posting-guide.html
> >>
> >
> >
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Oct 06 02:15:41 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 18:21:11 EST