R-alpha: Glm Problems (from Jim Lindsey).

Ross Ihaka (ihaka@stat.auckland.ac.nz)
Wed, 15 May 1996 13:03:48 +1200

Message-Id: <199605150103.NAA26162@stat.auckland.ac.nz>
To: r-testers@stat.math.ethz.ch

First some history:
Our glm implementation was written over last summer by Simon Davies,
one of our (former) masters students (we hire the better ones over
summer in an RA capacity).  Simon was pestering me for things to do and
I was rather busy so in a fit of nastiness I told him to go away and
implement glm in R for me.  Much to my surprise he came back a couple
of weeks later saying he'd done it.  In the process he had to program
around some of the shortfalls in the general modelling code, but it
seems to work.  Not bad for someone who had just finished an
introductory course in glms.

Anyway, I relate this because the glm code is a bit of a black box to
both Robert and me.  Fixing it is going to be a bit of a learning
experience.  That said:

> (1) the workspace does not save options (it did in 0.1) such as
>  contr.treatment.

	This is a consequence of our now saving only user data,
	not the entire system state as we previously did.  The
	options are stored in with the system stuff.

	S also appears not to save options between sessions.
	The strategy they suggest is putting an options command
	in your .First function.

	On the other hand, session-to-session memory of the option
	settings seems like a useful thing to have.  Unless anyone
	objects, we will look at making this change.
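	A sketch of the S-style workaround mentioned above: set your
	preferred options inside a .First function so they are restored
	at the start of each session.  The particular contrasts setting
	here is just an example.

```r
# Put this in your startup code / saved image.  .First is run
# automatically at the beginning of a session, so options set here
# survive even though the workspace itself no longer stores them.
.First <- function() {
  options(contrasts = c("contr.treatment", "contr.poly"))
}

# Calling it by hand shows the effect:
.First()
getOption("contrasts")
```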

> (2) for a saturated model with glm(), I always obtain
> Warning: NAs produced in function "pt"
> Then in the table of coefficients, P(>|t|) is a column of NAs
> This does not happen for unsaturated models.

	This is happening because the p-values are being computed
	for t distributions with 0 degrees of freedom.  S doesn't
	address the issue of the distribution of the coefficients
	and just prints "t-statistics" with no p-values.
	Perhaps we should be using the normal distribution rather
	than the t.

	I'm a time-series-kinda-guy and what I know about glms came
	out of the Glim manual.
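	The NA column can be seen directly: pt() with 0 degrees of
	freedom produces NaN.  The p_value helper below is a hypothetical
	sketch of the "use the normal instead" fix, not the actual glm
	printing code.

```r
# With a saturated model the residual df is 0, and the t distribution
# with 0 df is undefined:
pt(1.96, df = 0)   # NaN (with a warning), hence the NA column

# A possible guard (hypothetical helper, not current behaviour):
# fall back to the normal distribution when there are no residual df.
p_value <- function(t, df) {
  if (df > 0) 2 * pt(-abs(t), df) else 2 * pnorm(-abs(t))
}
```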

> (3) with contr.treatment, coefficients are labelled from 1 to k-1 as
> if the last category were baseline, but the estimates correspond to
> categories 2 to k with the first set to zero i.e. baseline

	The numbers printed are in fact contrast indices, not level
	indices.  I think this is compatible with S (or soon will be).
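	To make the distinction concrete: with treatment contrasts the
	first level is the baseline, so the k-1 labelled coefficients
	index the columns of the contrast matrix, and column j
	corresponds to factor level j + 1.

```r
# The contrast matrix for a 3-level factor under contr.treatment:
# 3 rows (one per level), 2 columns (one per contrast).  The first
# level's row is all zeros -- it is the baseline.
cm <- contr.treatment(3)
dim(cm)     # 3 rows, 2 columns
cm[1, ]     # all zero: level 1 is the baseline
</imports>
```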

> (4) with saturated models, 10 iterations are always used with the
> Warning: Algorithm did not converge
> whereas GLIM stops after 2-3 cycles. Of course, after 10 iterations,
> the deviance is much closer to zero than in GLIM
> (5) For seemingly random combinations of data and covariate codings, binomial
> gives
> Error in if(abs((dev-devold)/sum(dev^2)) < control$epsilon & abs((coef
> -coefold)/sum(   : missing value where logical needed.

	Both of these seem to be related to faulty convergence tests.
	For example, what would happen above if sum(dev^2) were equal
	to 0?  I will look closer at this.
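	A guarded version of that relative-change test would look
	something like the sketch below (a hypothetical helper, not the
	actual glm fitting code): when sum(dev^2) is 0, as it can be for
	a saturated fit, the unguarded comparison divides by zero, the
	result is NA, and if() fails with exactly the "missing value
	where logical needed" error reported above.

```r
# Hypothetical guarded convergence test.  If the deviance is exactly
# zero (a perfect, saturated fit) we declare convergence immediately
# instead of dividing by zero and feeding NA/NaN to if().
converged <- function(dev, devold, epsilon = 1e-8) {
  denom <- sum(dev^2)
  if (denom == 0) return(TRUE)            # perfect fit: stop iterating
  abs((dev - devold) / denom) < epsilon
}
```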

These will start to get a workover when I have fixed the X11 problems.

r-testers mailing list -- To (un)subscribe, send
	subscribe	or	unsubscribe
(in the "body", not the subject!)  To: r-testers-request@stat.math.ethz.ch