Re: [R] glm: offset

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Mon, 03 Mar 2008 09:16:41 +0000 (GMT)

On Mon, 3 Mar 2008, Ted.Harding_at_manchester.ac.uk wrote:

> On 03-Mar-08 03:19:01, Wensui Liu wrote:
>> HI, John,
>> my understanding is that you should use log(...) instead of its
>> original scale. Below is the logic in the case of poisson reg.
>> log(y / offset) = x'b
>> => log(y) - log(offset) = x'b
>> => log(y) = x'b + log(offset)
>
> Well, this is where it gets interesting!
> The above statement of the "logic" begs the question (i.e. assumes
> the answer).
>
> I would go according to the general interpretation of "offset"
> in LM and GLM modelling -- an "offset" is
>
> "a quantitative variable whose regression coefficient
> is known to be 1"
> [McCullagh and Nelder (1983) "Generalized Linear Models",
> page 138]

Yes, and that is how it is defined in R too -- see ?offset.
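A small illustration of that definition, as a sketch on simulated data (the names `x`, `z` and `y` are invented here): in an ordinary linear model, putting `offset(z)` in the formula gives the same coefficients as subtracting `z` from the response by hand. That is because the offset's coefficient is fixed at 1 rather than estimated.

```r
# Simulated data, for illustration only.
set.seed(1)
n <- 50
x <- rnorm(n)
z <- runif(n, 0, 10)                # hypothetical offset variable
y <- 2 + 0.5 * x + z + rnorm(n)     # z enters with coefficient exactly 1

# offset(z) is added to the linear predictor with coefficient 1;
# no coefficient is estimated for it.
fit_offset <- lm(y ~ x + offset(z))

# The same model, fitted by moving z to the left-hand side.
fit_subtract <- lm(I(y - z) ~ x)

coef(fit_offset)
coef(fit_subtract)   # same estimates as fit_offset
```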

The issue is more what you want to do with the offset. In a Poisson regression, the offset is most often used to include exposure time, the Poisson model being for log rate. Thus

mu = lambda*T, log(lambda) = Xb

means

log(mu) = Xb + log(T)

is the model for Poisson counts of occurrences in time intervals and hence the offset is log(T).
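The exposure-time model can be sketched on simulated data. The names `expo` (standing for T), `grp` and the rates 0.3 and 0.9 are invented for this example. The sketch fits the model with the offset in the formula and again through glm's `offset` argument; the two forms give the same fit.

```r
# Simulated exposure data; 'expo' (the exposure time T) and 'grp'
# are hypothetical names used only for this sketch.
set.seed(42)
n      <- 2000
grp    <- factor(sample(c("a", "b"), n, replace = TRUE))
expo   <- runif(n, 0.5, 20)                  # exposure time T
lambda <- ifelse(grp == "a", 0.3, 0.9)       # true rate per unit time
counts <- rpois(n, lambda * expo)            # mu = lambda * T

# log(mu) = Xb + log(T): the offset is log(T), with coefficient fixed at 1.
fit1 <- glm(counts ~ grp + offset(log(expo)), family = poisson(link = "log"))

# The offset can equally be supplied through the 'offset' argument.
fit2 <- glm(counts ~ grp, family = poisson(link = "log"), offset = log(expo))

# Coefficients are on the log-rate scale.
exp(coef(fit1)[[1]])     # estimated rate in group "a", near 0.3
exp(sum(coef(fit1)))     # estimated rate in group "b", near 0.9
```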

As ?offset hints, there are examples under ?glm (taken from MASS) and for dataset Insurance in package MASS. One with non-logged offset and one with ....
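The two fits below are a sketch modelled on those help-page examples; the current pages for ?glm and ?Insurance may differ in detail, and the object name `ins.fit` is chosen here for illustration. The anorexia fit uses Prewt itself as the offset. The Insurance fit uses log(Holders), with the number of policy holders acting as exposure.

```r
library(MASS)   # a recommended package, normally installed with R

# Non-logged offset: Gaussian model for post-treatment weight.
# Adding offset(Prewt) makes this equivalent to modelling the
# weight change Postwt - Prewt on Prewt and Treat.
anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

# Logged offset: Poisson model for claim counts, with the number of
# policy holders as exposure, so log(mu) = Xb + log(Holders).
ins.fit <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
               family = poisson, data = Insurance)

summary(anorex.1)
summary(ins.fit)
```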

> Since the GLM for a Poisson regression with log link is to model
>
> L = log(mu) = a + b1*X1 + b2*X2 + ...
>
> where mu is the Poisson mean and X1, X2, ... are the raw
> (untransformed, unless you have other reasons for transforming
> them prior to bringing them into the regression) explanatory
> variables, if X1 is the variable you wish to use as "offset"
> in the above sense then it should be used un-transformed.
> On this basis, the answer to John Sorkin's question should be:
> don't use log(NumUniqPt), use NumUniqPt.
>
> There's a potential confusion here in that presumably
> "NumUniqPt" may be a positive variable whose distribution
> in the data may be skewed, i.e. the sort of variable that
> you may feel urged to take the log of before using it.
>
> But that would be an "other reason" in the sense of my
> comment above.
>
> After all, suppose "NumUniqPt" denoted a variable in the
> data that could take negative values. Would you be happy
> to use log(NumUniqPt) in that case?
>
> Best wishes to all,
> Ted.
>
>
>> On Sun, Mar 2, 2008 at 10:01 PM, John Sorkin
>> <jsorkin_at_grecc.umaryland.edu> wrote:
>>> R 2.6.0
>>> Windows XP
>>>
>>> A question about running a generalized linear model.
>>>
>>> I am running a glm with
>>> (1) a poisson distribution and a log link:
>>> family=poisson(link = "log")
>>> and an offset.
>>> I would like to know if I should express the offset as the log of the
>>> offset value, i.e.
>>> offset=log(NumUniqPt)
>>> or as:
>>> offset=NumUniqPt
>>>
>>> I suspect I need to use the log, but I can't find any discussion of
>>> this in MASS 1994 or on the man page for glm.
>>> Thanks
>>> John
>>>
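John's two options can be compared directly on simulated data. In this sketch `n_pat` is a hypothetical stand-in for NumUniqPt, and the event rate of 0.2 per patient is invented. The counts are generated so that the mean is proportional to `n_pat`, which is the exposure situation described above.

```r
set.seed(7)
n_pat  <- sample(5:30, 300, replace = TRUE)  # hypothetical stand-in for NumUniqPt
events <- rpois(300, 0.2 * n_pat)            # mean proportional to n_pat

# Option 1: offset = log(NumUniqPt), i.e. log(mu) = b0 + log(n_pat)
fit_log <- glm(events ~ 1, family = poisson, offset = log(n_pat))

# Option 2: offset = NumUniqPt, i.e. log(mu) = b0 + n_pat,
# which makes mu proportional to exp(n_pat)
fit_raw <- glm(events ~ 1, family = poisson, offset = n_pat)

exp(coef(fit_log))   # estimated events per patient, near 0.2
logLik(fit_log)
logLik(fit_raw)      # much lower than for fit_log

# Estimating the coefficient of log(n_pat) freely instead of fixing it:
fit_free <- glm(events ~ log(n_pat), family = poisson)
coef(fit_free)[["log(n_pat)"]]   # near 1, the value an offset assumes
```

When the counts really scale with the number of patients, the free coefficient on log(n_pat) comes out close to 1. That is the setting in which log(n_pat) is the appropriate offset.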

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 03 Mar 2008 - 09:21:46 GMT
