From: Simon Wood <sw283_at_maths.bath.ac.uk>

Date: Mon 24 Oct 2005 - 23:37:30 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Mon Oct 24 23:45:58 2005

Date: Mon 24 Oct 2005 - 23:37:30 EST

In this case the models being compared are really identical, and the P-value is meaningless numerical noise.

If your main focus is hypothesis testing and you really need near exact p-values, then do this sort of testing using unpenalized models. i.e. don't have mgcv::gam estimate the EDF of the smooth, just use `s(...,fx=TRUE)' to estimate it unpenalized. This gives horrible point estimates and excellent p-values.

best,

Simon

*> Hi,
**>
*

> I need further help with my GAMs. Most models I test are very

*> obviously non-linear. Yet, to be on the safe side, I report the
**> significance of the smooth (default output of mgcv's summary.gam) and
**> confirm it deviates significantly from linearity.
**>
**> I do the latter by fitting a second model where the same predictor is
**> entered without the s(), and then use anova.gam to compare the two. I
**> thought this was the equivalent of the default output of anova.gam
**> using package gam instead of mgcv.
**>
**> I wonder if this procedure is correct because one of my models
**> appears to be linear. In fact mgcv estimates df to be exactly 1.0 so
**> I could have stopped there. However I inadvertently repeated the
**> procedure outlined above. I would have thought in this case the
**> anova.gam comparing the smooth and the linear fit would for sure have
**> been not significant. To my surprise, P was 6.18e-09!
**>
**> Am I doing something wrong when I attempt to confirm the non-
**> parametric part a smoother is significant? Here is my example case
**> where the relationship does appear to be linear:
**>
**> library(mgcv)
**> > This is mgcv 1.3-7
**> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 0.38, 0.62,
**> 0.88, 1.12,
**> 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
**> 3.62, 3.88,
**> 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,
**> 6.38, 6.62, 6.88,
**> 7.12, 8.38, 13.62)
**> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,
**> 22, 26, 24, 23,
**> 15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1,
**> 1, 1, 1)
**> wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 0.062046512,
**> 0.396459596, 0.189082949,
**> 0.054757925, 0.142810440, 0.168005168, 0.180804428,
**> 0.111439628, 0.128799505,
**> 0.193707937, 0.105921610, 0.103497845, 0.028591837,
**> 0.217894389, 0.020535469,
**> 0.080389068, 0.105234450, 0.070213450, 0.050771363,
**> 0.042074434, 0.102348837,
**> 0.049748344, 0.019100478, 0.005203125, 0.101711864,
**> 0.000000000, 0.000000000,
**> 0.014808824, 0.000000000, 0.222000000, 0.167000000,
**> 0.000000000, 0.000000000,
**> 0.000000000)
**>
**> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
**> summary.gam(sed.gam)
**> > Family: gaussian
**> > Link function: identity
**> >
**> > Formula:
**> > wm.sed ~ s(Temp)
**> >
**> > Parametric coefficients:
**> > Estimate Std. Error t value Pr(>|t|)
**> > (Intercept) 0.08403 0.01347 6.241 3.73e-07 ***
**> > ---
**> > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
**> >
**> > Approximate significance of smooth terms:
**> > edf Est.rank F p-value
**> > s(Temp) 1 1 13.95 0.000666 ***
**> > ---
**> > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
**> >
**> > R-sq.(adj) = 0.554 Deviance explained = 28.5%
**> > GCV score = 0.09904 Scale est. = 0.093686 n = 37
**>
**> # testing non-linear contribution
**> sed.lin <- gam(wm.sed~Temp,weight=N.sets)
**> summary.gam(sed.lin)
**> > Family: gaussian
**> > Link function: identity
**> >
**> > Formula:
**> > wm.sed ~ Temp
**> >
**> > Parametric coefficients:
**> > Estimate Std. Error t value Pr(>|t|)
**> > (Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
**> > Temp -0.023792 0.006369 -3.736 0.000666 ***
**> > ---
**> > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
**> >
**> >
**> > R-sq.(adj) = 0.554 Deviance explained = 28.5%
**> > GCV score = 0.09904 Scale est. = 0.093686 n = 37
**> anova.gam(sed.lin, sed.gam, test="F")
**> > Analysis of Deviance Table
**> >
**> > Model 1: wm.sed ~ Temp
**> > Model 2: wm.sed ~ s(Temp)
**> > Resid. Df Resid. Dev Df Deviance F Pr(>F)
**> > 1 3.5000e+01 3.279
**> > 2 3.5000e+01 3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
**>
**>
**> Thanks in advance,
**>
**>
**> Denis Chabot
**>
**> ______________________________________________
**> R-help@stat.math.ethz.ch mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
**>
*

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Mon Oct 24 23:45:58 2005

*
This archive was generated by hypermail 2.1.8
: Tue 25 Oct 2005 - 01:12:46 EST
*