Re: [R] testing non-linear component in mgcv:gam

From: John Fox <jfox_at_mcmaster.ca>
Date: Thu 06 Oct 2005 - 07:01:58 EST


Dear Denis,

You got me: I would have thought from ?summary.gam that this would be the same as the adjusted R^2 for a linear model. Note, however, that the percentage of deviance explained checks with the R^2 from the linear model, as expected.

Maybe you should address this question to the package author.

Regards,
 John



John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: Denis Chabot [mailto:chabotd@globetrotter.net]
> Sent: Wednesday, October 05, 2005 3:33 PM
> To: John Fox
> Cc: R list
> Subject: Re: [R] testing non-linear component in mgcv:gam
>
> Thank you everyone for your help, but my introduction to GAM
> is turning my brain to mush. I thought the one part of the
> output I understood the best was r-sq (adj), but now even
> this is becoming foggy.
>
> In my original message I mentioned a gam fit that turns out
> to be a linear fit. By curiosity I analysed it with a linear
> predictor only with mgcv package, and then as a linear model.
> The output was identical in both, but the r-sq (adj) was 0.55
> in mgcv and 0.26 in lm. In lm I hope that my interpretation
> that 26% of the variance in y is explained by the linear
> relationship with x is valid. Then what does r2 mean in mgcv?
>
> Denis
> > summary.gam(lin)
>
> Family: gaussian
> Link function: identity
>
> Formula:
> wm.sed ~ Temp
>
> Parametric coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
> Temp -0.023792 0.006369 -3.736 0.000666 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>
> R-sq.(adj) = 0.554 Deviance explained = 28.5%
> GCV score = 0.09904 Scale est. = 0.093686 n = 37
>
>
> > summary(sed.true.lin)
>
> Call:
> lm(formula = wm.sed ~ Temp, weights = N.sets)
>
> Residuals:
> Min 1Q Median 3Q Max
> -0.6138 -0.1312 -0.0325 0.1089 1.1449
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
> Temp -0.023792 0.006369 -3.736 0.000666 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.3061 on 35 degrees of freedom
> Multiple R-Squared: 0.285, Adjusted R-squared: 0.2646
> F-statistic: 13.95 on 1 and 35 DF, p-value: 0.000666
>
>
> Le 05-10-05 à 09:45, John Fox a écrit :
>
> > Dear Denis,
> >
> > Take a closer look at the anova table: The models provide identical
> > fits to the data. The differences in degrees of freedom and
> deviance
> > between the two models are essentially zero, 5.5554e-10 and
> 2.353e-11
> > respectively.
> >
> > I hope this helps,
> > John
> >
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> >
http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces@stat.math.ethz.ch
> >> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Denis Chabot
> >> Sent: Wednesday, October 05, 2005 8:22 AM
> >> To: r-help@stat.math.ethz.ch
> >> Subject: [R] testing non-linear component in mgcv:gam
> >>
> >> Hi,
> >>
> >> I need further help with my GAMs. Most models I test are very
> >> obviously non-linear. Yet, to be on the safe side, I report the
> >> significance of the smooth (default output of mgcv's
> >> summary.gam) and confirm it deviates significantly from linearity.
> >>
> >> I do the latter by fitting a second model where the same
> predictor is
> >> entered without the s(), and then use anova.gam to compare
> the two. I
> >> thought this was the equivalent of the default output of anova.gam
> >> using package gam instead of mgcv.
> >>
> >> I wonder if this procedure is correct because one of my models
> >> appears to be linear. In fact mgcv estimates df to be
> exactly 1.0 so
> >> I could have stopped there. However I inadvertently repeated the
> >> procedure outlined above. I would have thought in this case the
> >> anova.gam comparing the smooth and the linear fit would
> for sure have
> >> been not significant.
> >> To my surprise, P was 6.18e-09!
> >>
> >> Am I doing something wrong when I attempt to confirm the non-
> >> parametric part a smoother is significant? Here is my example case
> >> where the relationship does appear to be linear:
> >>
> >> library(mgcv)
> >>
> >>> This is mgcv 1.3-7
> >>>
> >> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
> 0.38, 0.62,
> >> 0.88, 1.12,
> >> 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
> >> 3.62, 3.88,
> >> 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,
> >> 6.38, 6.62, 6.88,
> >> 7.12, 8.38, 13.62)
> >> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27,
> 27, 29, 31,
> >> 22, 26, 24, 23,
> >> 15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5,
> 3, 1, 1,
> >> 1, 1, 1) wm.sed <- c(0.000000000, 0.016129032, 0.000000000,
> >> 0.062046512, 0.396459596, 0.189082949,
> >> 0.054757925, 0.142810440, 0.168005168, 0.180804428,
> >> 0.111439628, 0.128799505,
> >> 0.193707937, 0.105921610, 0.103497845, 0.028591837,
> >> 0.217894389, 0.020535469,
> >> 0.080389068, 0.105234450, 0.070213450, 0.050771363,
> >> 0.042074434, 0.102348837,
> >> 0.049748344, 0.019100478, 0.005203125, 0.101711864,
> >> 0.000000000, 0.000000000,
> >> 0.014808824, 0.000000000, 0.222000000, 0.167000000,
> >> 0.000000000, 0.000000000,
> >> 0.000000000)
> >>
> >> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
> >> summary.gam(sed.gam)
> >>
> >>> Family: gaussian
> >>> Link function: identity
> >>>
> >>> Formula:
> >>> wm.sed ~ s(Temp)
> >>>
> >>> Parametric coefficients:
> >>> Estimate Std. Error t value Pr(>|t|)
> >>> (Intercept) 0.08403 0.01347 6.241 3.73e-07 ***
> >>> ---
> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>> Approximate significance of smooth terms:
> >>> edf Est.rank F p-value
> >>> s(Temp) 1 1 13.95 0.000666 ***
> >>> ---
> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>> R-sq.(adj) = 0.554 Deviance explained = 28.5%
> >>> GCV score = 0.09904 Scale est. = 0.093686 n = 37
> >>>
> >>
> >> # testing non-linear contribution
> >> sed.lin <- gam(wm.sed~Temp,weight=N.sets)
> >> summary.gam(sed.lin)
> >>
> >>> Family: gaussian
> >>> Link function: identity
> >>>
> >>> Formula:
> >>> wm.sed ~ Temp
> >>>
> >>> Parametric coefficients:
> >>> Estimate Std. Error t value Pr(>|t|)
> >>> (Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
> >>> Temp -0.023792 0.006369 -3.736 0.000666 ***
> >>> ---
> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>>
> >>> R-sq.(adj) = 0.554 Deviance explained = 28.5%
> >>> GCV score = 0.09904 Scale est. = 0.093686 n = 37
> >>>
> >> anova.gam(sed.lin, sed.gam, test="F")
> >>
> >>> Analysis of Deviance Table
> >>>
> >>> Model 1: wm.sed ~ Temp
> >>> Model 2: wm.sed ~ s(Temp)
> >>> Resid. Df Resid. Dev Df Deviance F Pr(>F)

> >>> 1 3.5000e+01 3.279
> >>> 2 3.5000e+01 3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
> >>>
> >>
> >>
> >> Thanks in advance,
> >>
> >>
> >> Denis Chabot
> >>
> >> ______________________________________________
> >> R-help@stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide!
> >> http://www.R-project.org/posting-guide.html
> >>
> >
> >
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Oct 06 07:10:51 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 18:23:20 EST