Re: R-alpha: Glm Problems (from Jim Lindsey).

gordon@stat.ubc.ca
Tue, 14 May 1996 20:39:20 -0700


From: gordon@stat.ubc.ca
Date: Tue, 14 May 1996 20:39:20 -0700
Message-Id: <9605150339.AA07377@>
To: ihaka@stat.auckland.ac.nz, r-testers@stat.math.ethz.ch
Subject: Re: R-alpha: Glm Problems (from Jim Lindsey).

>  From r-testers-owner@stat.math.ethz.ch Tue May 14 18:04:55 1996
>  To: r-testers@stat.math.ethz.ch
>  Subject: R-alpha: Glm Problems (from Jim Lindsey).
>  
>  > (2) for a saturated model with glm(), I always obtain
>  > Warning: NAs produced in function "pt"
>  > Then in the table of coefficients, P(>|t|) is a column of NAs
>  > This does not happen for unsaturated models.
>  
>  	This is happening because the p-values are being computed
>  	for t distributions with 0 degrees of freedom.  S doesn't
>  	address the issue of the distribution of the coefficients
>  	and just prints "t-statistics" with no p-values.
>  	Perhaps we should be using the normal distribution rather
>  	than the t.
>  
>  	I'm a time-series-kinda-guy and what I know about glms came
>  	out of the Glim manual.
>  	Help!

In general you should be using the normal distribution when the glm
dispersion parameter is known (eg binomial, poisson glm) and the t-dist
when the dispersion is unknown (eg normal, gamma, inverse.gaussian).
This means that you really can give P-values for the saturated
model when the dispersion is known, although they will not be of any
interest.  For the saturated model when the dispersion is unknown you
cannot even give standard errors, let alone P-values; these really are
both NA.
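
As a rough sketch of the rule above (Python, standard library only; the
function names and the family lookup table are made up for illustration
and are not part of R, S-Plus or GLIM):

```python
import math
from typing import Optional

# Families whose dispersion is structurally fixed at 1, as listed above.
KNOWN_DISPERSION_FAMILIES = {"binomial", "poisson"}


def reference_distribution(family: str, residual_df: int,
                           dispersion_known: Optional[bool] = None) -> Optional[str]:
    """Return "normal", "t", or None when no valid p-value exists.

    dispersion_known may be passed explicitly for a family whose dispersion
    just happens to be known; otherwise it is inferred from the family.
    """
    if dispersion_known is None:
        dispersion_known = family in KNOWN_DISPERSION_FAMILIES
    if dispersion_known:
        # Known dispersion: normal reference, even for a saturated model.
        return "normal"
    if residual_df <= 0:
        # Saturated model, unknown dispersion: standard errors and
        # p-values are both NA.
        return None
    return "t"


def normal_two_sided_p(z: float) -> float:
    """Two-sided p-value P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With these definitions, reference_distribution("poisson", 0) gives "normal"
while reference_distribution("gamma", 0) gives None, which matches the two
saturated-model cases described above.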

This brings up the point of known dispersion parameters.  Please don't
make the same mistakes that S-Plus has.

The S-Plus summary function will correctly recognize that a binomial or
poisson glm has dispersion=1 and known; however, the anova and
predict.glm functions are not so smart.  The predict.glm function
insists on estimating the dispersion using the mean squared Pearson
residual.  It also does an incorrect job of computing confidence
intervals on the fitted.value scale.  I have had to rewrite the predict
function when using S-Plus to teach logistic or Poisson regression.
The anova function can be sort of coerced into doing the right thing by
setting test=chi.
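
To make the distinction concrete, here is a minimal sketch (Python, standard
library only) of a Pearson-based dispersion estimate next to the behaviour
argued for here, namely using the known value whenever one exists.  It
assumes the estimate is the Pearson chi-square divided by the residual
degrees of freedom.  The function names and the default Poisson variance
function are assumptions for illustration, not S-Plus code:

```python
from typing import Callable, Optional, Sequence


def pearson_dispersion(y: Sequence[float], mu: Sequence[float], n_params: int,
                       variance: Callable[[float], float] = lambda m: m) -> float:
    """Pearson chi-square divided by the residual degrees of freedom.

    variance is the GLM variance function V(mu); the default V(mu) = mu
    is the Poisson case.
    """
    residual_df = len(y) - n_params
    if residual_df <= 0:
        raise ValueError("no residual degrees of freedom to estimate dispersion")
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / residual_df


def dispersion_for_inference(y: Sequence[float], mu: Sequence[float], n_params: int,
                             known_dispersion: Optional[float] = None) -> float:
    """Use the known dispersion if there is one; estimate it only otherwise."""
    if known_dispersion is not None:
        return known_dispersion
    return pearson_dispersion(y, mu, n_params)
```

For a Poisson or binomial fit one would call dispersion_for_inference with
known_dispersion=1.0, so that standard errors are not rescaled by an
estimate that the model says is unnecessary.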

The situation is worse for glms for which the dispersion just happens to
be known, rather than being structurally known.  Again the summary command
can handle this with the dispersion=xx option.  The predict and anova
commands fail in this case.

It would be far more satisfactory in general if glm objects had a
component indicating when the dispersion was known.  In other words, there
needs to be an equivalent of the GLIM $scale command.
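
One possible shape for such a component, sketched in Python rather than S
(the GlmFit record and both helper functions are hypothetical and only
illustrate the idea): the fitted object carries a flag saying whether its
dispersion is known, and later functions consult that flag instead of
re-estimating.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GlmFit:
    """Hypothetical minimal record of a fitted glm."""
    family: str
    residual_df: int
    dispersion: float
    dispersion_known: bool


def with_known_dispersion(fit: GlmFit, value: float) -> GlmFit:
    """Return a copy of fit with the dispersion fixed at a known value."""
    return replace(fit, dispersion=value, dispersion_known=True)


def anova_test(fit: GlmFit) -> str:
    """Choose the analysis-of-deviance test from the stored flag.

    Known dispersion gives chi-square tests; an estimated dispersion
    gives F tests.
    """
    return "Chisq" if fit.dispersion_known else "F"
```

A gamma fit with an estimated dispersion would get "F" from anova_test, and
the same fit passed through with_known_dispersion would get "Chisq", with no
special coercion needed at the call site.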

Gordon Smyth
-----------------------
Dr Gordon K Smyth
Department of Mathematics
University of Queensland
Brisbane, Q  4072
Australia
(Currently visiting Statistics, University of British Columbia)