Date: Wed, 15 May 1996 13:03:48 +1200
From: Ross Ihaka <ihaka@stat.auckland.ac.nz>
Message-Id: <199605150103.NAA26162@stat.auckland.ac.nz>
To: r-testers@stat.math.ethz.ch
Subject: R-alpha: Glm Problems (from Jim Lindsey)

First some history: our glm implementation was written over last summer by
Simon Davies, one of our (former) masters students (we hire the better ones
over summer in an RA capacity). Simon was pestering me for things to do and
I was rather busy, so in a fit of nastiness I told him to go away and
implement glm in R for me. Much to my surprise he came back a couple of
weeks later saying he'd done it. In the process he had to program around
some of the shortfalls in the general modelling code, but it seems to work.
Not bad for someone who had just finished an introductory course in glms.

Anyway, I relate this because the glm code is a bit of a black box to both
Robert and me. Fixing it is going to be a bit of a learning experience.
That said:

> (1) the workspace does not save options (it did in 0.1) such as
> contr.treatment.

This is a consequence of our now saving only user data, not the entire
system state as we previously did. The options are stored in with the
system stuff. S also appears not to save options between sessions; the
strategy they suggest is putting an options command in your .First
function. On the other hand, session-to-session memory of the options
settings seems like a useful thing to have. Unless anyone objects, we will
look at making this change.

> (2) for a saturated model with glm(), I always obtain
> Warning: NAs produced in function "pt"
> Then in the table of coefficients, P(>|t|) is a column of NAs
> This does not happen for unsaturated models.

This is happening because the p-values are being computed for t
distributions with 0 degrees of freedom. S doesn't address the issue of
the distribution of the coefficients and just prints "t-statistics" with
no p-values.
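For what it's worth, the NA column can be reproduced directly, independent of the glm code itself (a minimal sketch, not the actual code path):

```r
## Minimal reproduction of the NA p-values, outside glm:
## a saturated model leaves 0 residual degrees of freedom, and the
## t distribution is undefined for df = 0, so pt() warns and returns
## NaN, which then prints as NA in the coefficient table.
tstat <- 1.96
pval  <- 2 * pt(-abs(tstat), df = 0)  # warns: NAs/NaNs produced
is.nan(pval)                          # TRUE
```

With positive degrees of freedom (an unsaturated model) the same call returns an ordinary p-value, which is why the problem only shows up for saturated fits.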
Perhaps we should be using the normal distribution rather than the t. I'm
a time-series kind of guy and what I know about glms came out of the GLIM
manual. Help!

> (3) with contr.treatment, coefficients are labelled from 1 to k-1 as
> if the last category were baseline, but the estimates correspond to
> categories 2 to k with the first set to zero i.e. baseline

The numbers printed are in fact contrast indices, not level indices. I
think this is compatible with S (or soon will be).

> (4) with saturated models, 10 iterations are always used with the
> Warning: Algorithm did not converge
> whereas GLIM stops after 2-3 cycles. Of course, after 10 iterations,
> the deviance is much closer to zero than in GLIM

> (5) For seemingly random combinations of data and covariate codings, binomial
> gives
> Error in if(abs((dev-devold)/sum(dev^2)) < control$epsilon & abs((coef
> -coefold)/sum( : missing value where logical needed.

Both of these seem to be related to faulty convergence tests. For example,
what would happen above if sum(dev^2) were equal to 0? (The division
yields NaN, and if() cannot branch on a NaN.) I will look closer at this.

These will start to get a workover when I have fixed the X11 problems.

	Ross

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-testers mailing list -- To (un)subscribe, send subscribe or unsubscribe
(in the "body", not the subject !)
To: r-testers-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-