Re: GLMs: show me the model!

From: Murray Jorgensen <maj_at_stats.waikato.ac.nz>
Date: Fri, 20 Feb 2009 13:08:30 +1300

One of the achievements of modern statistical computing is the ability
to express a wide variety of models with just a simple statement in the
language of a statistical package. Patrick is right to be concerned to
'unpack' the statement and know just exactly what it implies about the
distribution of the responses given the parameters and the values of the
covariates.

Too many students seem content to just be able to manipulate model
specification statements without any clear idea of what they mean. To me
a model should be a prescription of how to generate new data conforming
to the model. If you can't program that, you don't understand the model.
Bill Venables has pointed out the historical connection between links
and transformations, which is why links are defined as they are. I do
think, though, that the use of the link function rather than the more
natural inverse link function in a glm plays a part in making the model
more mysterious.
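
As a minimal sketch of what "a prescription of how to generate new
data" could look like, the following Python code (Python is used here
purely for illustration, not R) simulates responses from a Gaussian or
Gamma GLM with a single covariate. The function name simulate_glm, the
coefficients a and b, and the parameters sigma and shape are
illustrative assumptions, not anything taken from glm() itself. Note
that the code only ever needs the inverse link, mu = m(eta).

```python
import math
import random


def simulate_glm(x, a, b, family, inverse_link=math.exp,
                 sigma=1.0, shape=5.0, rng=random):
    """Generate one response per covariate value (hypothetical helper).

    family is "gaussian" or "gamma"; inverse_link maps the linear
    predictor eta to the mean mu.
    """
    ys = []
    for xi in x:
        eta = a + b * xi          # systematic part: linear predictor
        mu = inverse_link(eta)    # mean obtained via the inverse link
        if family == "gaussian":
            # Y ~ N(mu, sigma^2)
            y = rng.gauss(mu, sigma)
        elif family == "gamma":
            # Gamma with shape k and scale mu/k has mean mu
            y = rng.gammavariate(shape, mu / shape)
        else:
            raise ValueError("family must be 'gaussian' or 'gamma'")
        ys.append(y)
    return ys


# Example: Gamma responses with a log link (inverse link = exp)
rng = random.Random(2009)
x = [0.0, 0.5, 1.0, 1.5, 2.0]
print(simulate_glm(x, a=0.2, b=1.0, family="gamma", rng=rng))
```

Swapping inverse_link for another function changes the model without
touching the random part, which is one way to see how the two pieces
of a GLM specification fit together.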

Apart from a number of regrettable typos I am a great fan of Gelman and
Hill's book "Data Analysis Using Regression and Multilevel/Hierarchical
Models". They go to extreme lengths with some of their examples spelling
out the gory details of the models in about 3 equivalent ways. But
this sort of thing has to be done somewhere or users do not have a good
enough understanding of just what they are fitting.

Cheers, Murray Jorgensen

Patrick Cordue wrote:
> I asked a question on GLMs a couple of days ago. In essence I was asking
> "what is the model - please write it down - you know, like for a linear
> model: Y = a + bx + e, where e ~N(0,s^2) - can't we do that for a GLM?"
>
> I come from a modelling background where the first step is to "write down
> the model"; the second step is to look for tools which will provide
> estimates of the unknown parameters; (I am assuming we already have a data
> set). If my model is a GLM, then I can just use glm() in R. So, I wanted to
> know the form of the GLM models for different families and link functions.
> In particular, which implied simple additive errors (Y = mu + e) and which
> implied simple multiplicative errors (Y = mu * e)?
> (where mu = E(Y))
>
> The answer provided by Murray Jorgensen is correct:
>
> "In glms there is no simple characterisation of how the
> systematic and random parts of the model combine to give you the data
> (other than the definition of the glm, of course)."
>
> Clearly for discrete distributions, it makes no sense to look for a
> "building block" error e which can be added/multiplied to/by the expectation
> to provide the response variable. My question was aimed at continuous
> distributions.
>
> Murray Smith (from NIWA) provided some useful comments (see below), which, I
> think, get to the heart of my question.
>
> However, I deduced the following results from first principles:
>
> For the Gaussian family, Y = mu + e where e ~ N(0, s^2) (and E(Y) = mu =
> m(eta) where eta is the linear combination of the explanatory/stimulus
> variables, and m^-1 is the link function) is a GLM. I take this to imply
> that when one fits a model using glm() with a Gaussian family and any link,
> that the implied error structure is additive.
>
> For the Gamma family, Y = mu * e where e ~ Gamma(k, 1/k) is a GLM. I take
> this to imply that when one fits a model using glm() with a Gamma family and
> any link, that the implied error structure is multiplicative.
>
> For the inverse Gaussian family the implied model does not have a simple
> additive or multiplicative error structure (someone might know how to write
> down the model in this case - but not me).
>
> Thanks to everyone who provided comments and references.
>
> --------------------------------------
>
> Murray H. Smith wrote:
>
> "In most GLMs the error is neither multiplicative nor additive. Parameterize
> the 1-parameter error family by the mean (fixing any dispersion or shape
> parameters, which is what pure GLM is with the added constraint that the
> error distribution belongs to a 1-parameter exponential family).
>
> We can only write
> y ~ mu + e or y ~ mu*e
> for e not depending on mu, if mu is a location or scale parameter for the
> error family. I.e.
> y ~ f( y;mu) where f(y;mu) = f(y - mu; mu =0)
> or
> y ~ f( y;mu) where f(y;mu) =1/mu* f(y/mu; mu =1)
>
> The variance function V(mu), the variance expressed as a function of the
> mean, must be constant for an additive error and proportional to mu^2 for
> multiplicative."
>
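
The Gamma claim and the variance-function condition quoted above can
be checked by simulation. The sketch below (Python; the function
names, the seed, and the values of k and mu are illustrative
assumptions) draws Y = mu * e with e ~ Gamma(shape k, scale 1/k), so
that e has mean 1 and variance 1/k, and reports the sample mean of Y
and the ratio var(Y) / mu^2. If the error is multiplicative, that
ratio should stay near 1/k whatever the value of mu.

```python
import random


def gamma_multiplicative_draws(mu, k, n, rng):
    """Draw n values of Y = mu * e, with e ~ Gamma(shape=k, scale=1/k)."""
    return [mu * rng.gammavariate(k, 1.0 / k) for _ in range(n)]


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    v = sum((y - m) ** 2 for y in values) / (n - 1)
    return m, v


rng = random.Random(2009)
k = 4.0
for mu in (1.0, 10.0):
    y = gamma_multiplicative_draws(mu, k, 50_000, rng)
    m, v = mean_and_variance(y)
    # The mean should be close to mu, and v / mu^2 close to 1/k
    print(mu, m, v / mu ** 2)
```
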
>
> --
> -----
> Patrick Cordue
> Director
> Innovative Solutions Ltd
> www.isl-solutions.co.nz
>
>
> ----
>
> FOR INFORMATION ABOUT "ANZSTAT", INCLUDING UNSUBSCRIBING, PLEASE VISIT http://www.maths.uq.edu.au/anzstat/

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj_at_waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441   Mobile 021 0200 8350
Received on Fri Feb 20 2009 - 10:08:42 EST