Re: [R] poisson regression with robust error variance ('eyestudy

From: Paul Johnson <>
Date: Thu, 08 May 2008 10:28:27 -0500

On Thu, May 8, 2008 at 8:38 AM, Ted Harding <> wrote:
> The below is an old thread:
> On 02-Jun-04 10:52:29, Lutz Ph. Breitling wrote:
> > Dear all,
> >
> > i am trying to redo the 'eyestudy' analysis presented on the site
> >
> > with R (1.9.0), with special interest in the section on "relative
> > risk estimation by poisson regression with robust error variance".
> >
> > so i guess rlm is the function to use. but what is its equivalent
> > to the glm's argument "family" to indicate 'poisson'? or am i
> > somehow totally wrong and this is not applicable here?
There have been several questions about getting robust standard errors in glm lately. I went and read that UCLA website on the RR eye study and the Zou article that uses a glm with robust standard errors.

I don't think "rlm" is the right way to go because that gives different parameter estimates. You need to estimate with glm and then get standard errors that are adjusted for heteroskedasticity. Well, you may wish to use rlm for other reasons, but to replicate that eyestudy project, you need to take the ordinary usual estimates of the b's and adjust the standard errors.

The Zou article does not give the mathematical formulae used to estimate those robust errors, it rather gives a code snippit on using SAS proc GENMOD to get those estimates. Presumably, if we had access to the SAS formula, we could easily get the calculations we need with R. It is a little irksome to me that people think saying "use SAS proc GENMOD" is informative. Rather, we really need to know which TYPE of robust standard error is being considered, since there are about 10 competing formulations.

I started checking various R packages for calculating sandwich variance estimates. There is a package "sandwich" for that purpose, and in the "car" package there is a function hccm that can do so. The hccm in car refuses to take glm objects, but the function vcovHC in "sandwich" will do so. The discussion for sandwich's vcovHC function refers to linear models, and lm objects are used in the examples, but the vignette "sandwich" distributed with the package states "The HAC estimators are already available for generalized linear models (fitted by glm) and robust regression (fitted by rlm in package MASS)." .

As a result, one can get a var/covar matrix from the routines in the sandwich package, but I'm not entirely sure they are match the ones SAS gives. The vcovHC help page has a nice explanation of the differences across sandwich estimators. It says they are all variants on

(X'X)^{-1} X' Omega X (X'X)^{-1}

With different stipulations about Omega. If we knew the stipulation about OMEGA used in the SAS routine, we could try it.

I tried to get the eyestudy data, but the link on the UCLA website is no longer valid.

So I generated some phony 0-1 data:

y <- as.numeric(rnorm(1000) > 0)
x <- rnorm(1000)
> glm1 <- glm(y~x, family=binomial)
> hccm(glm1)
Error in hccm.lm(glm1) : requires unweighted lm
> vcovH

vcovHAC vcovHC
> glm1 <- glm(y~x, family=poisson(link=log))
> vcovHC(glm1)

              (Intercept) x
(Intercept) 1.017588e-03 -3.722456e-05
x -3.722456e-05 9.492517e-04

The default type of estimation the method called HC3, which is recommended by authors of some Monte Carlo research studies. vcovHC calculates White's basic correction if you run:

> vcovHC(glm1, type="HC0")

              (Intercept) x
(Intercept) 1.013508e-03 -3.691381e-05
x -3.691381e-05 9.417839e-04

Compare against the non-robust glm var/covar matrix.

> vcov(glm1)

              (Intercept) x
(Intercept) 0.0020152998 -0.0000778422
x -0.0000778422 0.0018721903

In conclusion, use glm followed by vcovHC and I believe you will find estimates like the ones provided by SAS or Stata. But, without access to their data, I can't be entirely sure.

Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 08 May 2008 - 15:57:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 08 May 2008 - 16:30:36 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.

list of date sections of archive