From: Simon Wood <sw283_at_maths.bath.ac.uk>

Date: Mon 24 Oct 2005 - 23:41:19 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Mon Oct 24 23:54:21 2005

Date: Mon 24 Oct 2005 - 23:41:19 EST

Looks like a bug in mgcv::summary.gam when the model is strictly parametric... I'll take a look and fix it. thanks, Simon

*>
*

> In my original message I mentioned a gam fit that turns out to be a

*> linear fit. By curiosity I analysed it with a linear predictor only
**> with mgcv package, and then as a linear model. The output was
**> identical in both, but the r-sq (adj) was 0.55 in mgcv and 0.26 in
**> lm. In lm I hope that my interpretation that 26% of the variance in y
**> is explained by the linear relationship with x is valid. Then what
**> does r2 mean in mgcv?
**>
**> Denis
**> > summary.gam(lin)
**>
**> Family: gaussian
**> Link function: identity
**>
**> Formula:
**> wm.sed ~ Temp
**>
**> Parametric coefficients:
**> Estimate Std. Error t value Pr(>|t|)
**> (Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
**> Temp -0.023792 0.006369 -3.736 0.000666 ***
**> ---
**> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
**>
**>
**> R-sq.(adj) = 0.554 Deviance explained = 28.5%
**> GCV score = 0.09904 Scale est. = 0.093686 n = 37
**>
**>
**> > summary(sed.true.lin)
**>
**> Call:
**> lm(formula = wm.sed ~ Temp, weights = N.sets)
**>
**> Residuals:
**> Min 1Q Median 3Q Max
**> -0.6138 -0.1312 -0.0325 0.1089 1.1449
**>
**> Coefficients:
**> Estimate Std. Error t value Pr(>|t|)
**> (Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
**> Temp -0.023792 0.006369 -3.736 0.000666 ***
**> ---
**> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
**>
**> Residual standard error: 0.3061 on 35 degrees of freedom
**> Multiple R-Squared: 0.285, Adjusted R-squared: 0.2646
**> F-statistic: 13.95 on 1 and 35 DF, p-value: 0.000666
**>
**>
**> Le 05-10-05 à 09:45, John Fox a écrit :
**>
**> > Dear Denis,
**> >
**> > Take a closer look at the anova table: The models provide identical
**> > fits to
**> > the data. The differences in degrees of freedom and deviance
**> > between the two
**> > models are essentially zero, 5.5554e-10 and 2.353e-11 respectively.
**> >
**> > I hope this helps,
**> > John
**> >
**> > --------------------------------
**> > John Fox
**> > Department of Sociology
**> > McMaster University
**> > Hamilton, Ontario
**> > Canada L8S 4M4
**> > 905-525-9140x23604
**> > http://socserv.mcmaster.ca/jfox
**> > --------------------------------
**> >
**> >
**> >> -----Original Message-----
**> >> From: r-help-bounces@stat.math.ethz.ch
**> >> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Denis Chabot
**> >> Sent: Wednesday, October 05, 2005 8:22 AM
**> >> To: r-help@stat.math.ethz.ch
**> >> Subject: [R] testing non-linear component in mgcv:gam
**> >>
**> >> Hi,
**> >>
**> >> I need further help with my GAMs. Most models I test are very
**> >> obviously non-linear. Yet, to be on the safe side, I report
**> >> the significance of the smooth (default output of mgcv's
**> >> summary.gam) and confirm it deviates significantly from linearity.
**> >>
**> >> I do the latter by fitting a second model where the same
**> >> predictor is entered without the s(), and then use anova.gam
**> >> to compare the two. I thought this was the equivalent of the
**> >> default output of anova.gam using package gam instead of mgcv.
**> >>
**> >> I wonder if this procedure is correct because one of my
**> >> models appears to be linear. In fact mgcv estimates df to be
**> >> exactly 1.0 so I could have stopped there. However I
**> >> inadvertently repeated the procedure outlined above. I would
**> >> have thought in this case the anova.gam comparing the smooth
**> >> and the linear fit would for sure have been not significant.
**> >> To my surprise, P was 6.18e-09!
**> >>
**> >> Am I doing something wrong when I attempt to confirm the non-
**> >> parametric part a smoother is significant? Here is my example
**> >> case where the relationship does appear to be linear:
**> >>
**> >> library(mgcv)
**> >>
**> >>> This is mgcv 1.3-7
**> >>>
**> >> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
**> >> 0.38, 0.62, 0.88, 1.12,
**> >> 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12,
**> >> 3.38, 3.62, 3.88,
**> >> 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88,
**> >> 6.12, 6.38, 6.62, 6.88,
**> >> 7.12, 8.38, 13.62)
**> >> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27,
**> >> 29, 31, 22, 26, 24, 23,
**> >> 15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3,
**> >> 1, 1, 1, 1, 1) wm.sed <- c(0.000000000, 0.016129032,
**> >> 0.000000000, 0.062046512, 0.396459596, 0.189082949,
**> >> 0.054757925, 0.142810440, 0.168005168,
**> >> 0.180804428, 0.111439628, 0.128799505,
**> >> 0.193707937, 0.105921610, 0.103497845,
**> >> 0.028591837, 0.217894389, 0.020535469,
**> >> 0.080389068, 0.105234450, 0.070213450,
**> >> 0.050771363, 0.042074434, 0.102348837,
**> >> 0.049748344, 0.019100478, 0.005203125,
**> >> 0.101711864, 0.000000000, 0.000000000,
**> >> 0.014808824, 0.000000000, 0.222000000,
**> >> 0.167000000, 0.000000000, 0.000000000,
**> >> 0.000000000)
**> >>
**> >> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
**> >> summary.gam(sed.gam)
**> >>
**> >>> Family: gaussian
**> >>> Link function: identity
**> >>>
**> >>> Formula:
**> >>> wm.sed ~ s(Temp)
**> >>>
**> >>> Parametric coefficients:
**> >>> Estimate Std. Error t value Pr(>|t|)
**> >>> (Intercept) 0.08403 0.01347 6.241 3.73e-07 ***
**> >>> ---
**> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**> >>>
**> >>> Approximate significance of smooth terms:
**> >>> edf Est.rank F p-value
**> >>> s(Temp) 1 1 13.95 0.000666 ***
**> >>> ---
**> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**> >>>
**> >>> R-sq.(adj) = 0.554 Deviance explained = 28.5%
**> >>> GCV score = 0.09904 Scale est. = 0.093686 n = 37
**> >>>
**> >>
**> >> # testing non-linear contribution
**> >> sed.lin <- gam(wm.sed~Temp,weight=N.sets)
**> >> summary.gam(sed.lin)
**> >>
**> >>> Family: gaussian
**> >>> Link function: identity
**> >>>
**> >>> Formula:
**> >>> wm.sed ~ Temp
**> >>>
**> >>> Parametric coefficients:
**> >>> Estimate Std. Error t value Pr(>|t|)
**> >>> (Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
**> >>> Temp -0.023792 0.006369 -3.736 0.000666 ***
**> >>> ---
**> >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**> >>>
**> >>>
**> >>> R-sq.(adj) = 0.554 Deviance explained = 28.5%
**> >>> GCV score = 0.09904 Scale est. = 0.093686 n = 37
**> >>>
**> >> anova.gam(sed.lin, sed.gam, test="F")
**> >>
**> >>> Analysis of Deviance Table
**> >>>
**> >>> Model 1: wm.sed ~ Temp
**> >>> Model 2: wm.sed ~ s(Temp)
**> >>> Resid. Df Resid. Dev Df Deviance F Pr(>F)
**> >>> 1 3.5000e+01 3.279
**> >>> 2 3.5000e+01 3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
**> >>>
**> >>
**> >>
**> >> Thanks in advance,
**> >>
**> >>
**> >> Denis Chabot
**> >>
**> >> ______________________________________________
**> >> R-help@stat.math.ethz.ch mailing list
**> >> https://stat.ethz.ch/mailman/listinfo/r-help
**> >> PLEASE do read the posting guide!
**> >> http://www.R-project.org/posting-guide.html
**> >>
**> >
**> >
**>
**> ______________________________________________
**> R-help@stat.math.ethz.ch mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
**>
*

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Mon Oct 24 23:54:21 2005

*
This archive was generated by hypermail 2.1.8
: Tue 25 Oct 2005 - 01:12:46 EST
*