From: gordon@stat.ubc.ca
Date: Tue, 14 May 1996 20:39:20 -0700
Message-Id: <9605150339.AA07377@>
To: ihaka@stat.auckland.ac.nz, r-testers@stat.math.ethz.ch
Subject: Re: R-alpha: Glm Problems (from Jim Lindsey).

> From r-testers-owner@stat.math.ethz.ch Tue May 14 18:04:55 1996
> To: r-testers@stat.math.ethz.ch
> Subject: R-alpha: Glm Problems (from Jim Lindsey).
>
> > (2) for a saturated model with glm(), I always obtain
> > Warning: NAs produced in function "pt"
> > Then in the table of coefficients, P(>|t|) is a column of NAs
> > This does not happen for unsaturated models.
>
> This is happening because the p-values are being computed
> for t distributions with 0 degrees of freedom. S doesn't
> address the issue of the distribution of the coefficients
> and just prints "t-statistics" with no p-values.
> Perhaps we should be using the normal distribution rather
> than the t.
>
> I'm a time-series-kinda-guy and what I know about glms came
> out of the Glim manual.
> Help!

In general you should use the normal distribution when the glm dispersion parameter is known (e.g. binomial or Poisson glms). You should use the t distribution when the dispersion is unknown (e.g. normal, gamma or inverse.gaussian). This means that you really can give P-values for the saturated model when the dispersion is known, although they will not be of any interest. A saturated model with unknown dispersion has zero residual degrees of freedom, so there is nothing left from which to estimate the dispersion. You cannot even give standard errors, let alone P-values; both really are NA.

This brings up the issue of known dispersion parameters. Please don't make the same mistakes that S-Plus has made. The S-Plus summary function correctly recognizes that a binomial or Poisson glm has dispersion=1 and that it is known. However, the anova and predict.glm functions are not so smart. The predict.glm function insists on estimating the dispersion using the mean squared Pearson residual. It also computes confidence intervals on the fitted.value scale incorrectly.
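The rule described above can be sketched as follows. Use a normal reference distribution when the dispersion is known and a t distribution on the residual degrees of freedom when it is estimated. A saturated model with unknown dispersion gets NA for both the standard error and the P-value.

This is a minimal illustration in Python, not S or R, and it is not the code of summary.glm in either system. The names `coef_test` and `KNOWN_DISPERSION` are hypothetical. The optional `dispersion` argument (also an assumption) lets a caller declare the dispersion known even for a family where it would normally be estimated. The standard library has no t distribution, so the two-sided t P-value is computed from the regularized incomplete beta function, evaluated by the usual continued fraction (modified Lentz method).

```python
import math

# Hypothetical table for this sketch (not a real library's API):
# families whose dispersion is structurally known, and its value.
KNOWN_DISPERSION = {"binomial": 1.0, "poisson": 1.0}

_TINY = 1e-300  # guards against division by zero in Lentz's method


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated by modified Lentz."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),          # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor applied is close to 1
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a) gives faster convergence here.
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p_normal(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def two_sided_p_t(t, df):
    """P(|T| >= |t|) for Student's t with df > 0 degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def coef_test(estimate, unscaled_se, family, df_residual,
              pearson_ss=None, dispersion=None):
    """Hypothetical helper: (se, statistic, two-sided p) for one coefficient.

    unscaled_se -- standard error computed as if the dispersion were 1.
    pearson_ss  -- sum of squared Pearson residuals, used only when the
                   dispersion has to be estimated.
    dispersion  -- optional user-supplied value, treated as known.
    """
    if dispersion is None:
        dispersion = KNOWN_DISPERSION.get(family)
    if dispersion is not None:
        # Known dispersion: Wald z with a normal reference distribution.
        # Still valid when df_residual == 0 (saturated model).
        se = unscaled_se * math.sqrt(dispersion)
        z = estimate / se
        return se, z, two_sided_p_normal(z)
    if df_residual == 0:
        # Saturated model, unknown dispersion: no residual degrees of freedom
        # to estimate it from, so the se and p-value are both NA.
        return math.nan, math.nan, math.nan
    if pearson_ss is None:
        raise ValueError("pearson_ss is required when dispersion is estimated")
    # Estimated dispersion (Pearson statistic / residual df), t reference.
    se = unscaled_se * math.sqrt(pearson_ss / df_residual)
    t = estimate / se
    return se, t, two_sided_p_t(t, df_residual)
```

For example, `coef_test(1.0, 0.5, "binomial", 0)` still returns a finite z statistic and P-value for a saturated binomial model. `coef_test(1.0, 0.5, "gamma", 0, pearson_ss=0.0)` returns three NaNs, the analogue of NA.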
I have had to rewrite the predict function when using S-Plus to teach logistic or Poisson regression. The anova function can be sort of coerced into doing the right thing by setting test=chi.

The situation is worse for glms whose dispersion just happens to be known, rather than being structurally known. Again, the summary command can handle this with the dispersion=xx option, but the predict and anova commands fail in this case. It would be far more satisfactory in general if glm objects had a component indicating when the dispersion was known. In other words, there needs to be an equivalent of the GLIM $scale command.

Gordon Smyth
-----------------------
Dr Gordon K Smyth
Department of Mathematics
University of Queensland
Brisbane, Q 4072
Australia
(Currently visiting Statistics, University of British Columbia)