Re: [R] exponential distribution

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Mon, 09 Jun 2008 06:42:55 +0100 (BST)

What do you mean by 'correct' here? If you do manually what glm() does, you would expect to get the same answer, but that is not independent verification.

As far as I recall either glm nor survreg are calculation the exact variance in this case -- they are both making use of asymptotic theory. glm makes use of the expected (Fisher) information evaluated at the MLE, survreg of the observed information -- those are asymptotically equivalent but there is some evidence preferring the observed information.

As I hinted, I've not checked this for several years and I will leave it to you to study the documentation and code.

On Sun, 8 Jun 2008, Antonio Gasparrini wrote:

> Dear all,
>
> I've tried to solve the Es. 12, cap 4 of "Introduction to GLM" by
> Annette Dobson. It's about the relationship between survival time of
> leukemia patients and blood cell count. I tried to fit a model with
> exponential distribution, first by glm (family gamma and then dispersion
> parameter fixed to 1) and then with survreg. They gave me the same point
> estimates but the standard errors are slightly different. I checked the
> results building the routine manually, and it seems that the glm results
> are the correct one.
>
> #######################################################
> data <- data.frame(y=c(65,156,100,134,16,108,121,4,39,143,56,26,22,1,1,5,65),
> x=c(3.36,2.88,3.63,3.41,3.78,4.02,4.00,4.23,3.73,3.85,3.97,4.51,
> 4.54,5.00,5.00,4.72,5.00))
>
> model1 <- glm(y~x,family=Gamma(link="log"),data)
> summary(model1,dispersion=1)
> model2 <- survreg(Surv(y) ~ x, data, dist="exponential")
> summary(model2)
>
> X <- model.matrix(model1)
> y <- as.vector(data$y)
> b <- as.matrix(c(1,1)) # STARTING VALUES
> iter <- matrix(0,7,2)
> iter[1,] <- b
> for (i in 2:7) {
> W <- diag(rep(1,length(y)))
> z <- X%*%b + (y-exp(X%*%b))*1/exp(X%*%b)
> b <- solve(t(X)%*%W%*%X) %*% (t(X)%*%W%*%z)
> iter[i,] <- b
> }
> summary(model1,dispersion=1)$coef
> summary(model2)
> iter[nrow(temp),]
> sqrt(diag(solve(t(X)%*%W%*%X)))
> #######################################################
>
> Can you explain if this difference is due to an error?
> Thanks in advance
>
>
> Antonio Gasparrini
> Public and Environmental Health Research Unit (PEHRU)
> London School of Hygiene & Tropical Medicine
> Keppel Street, London WC1E 7HT, UK
> Office: 0044 (0)20 79272406 - Mobile: 0044 (0)79 64925523
> http://www.lshtm.ac.uk/people/gasparrini.antonio ( http://www.lshtm.ac.uk/pehru/ )

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 09 Jun 2008 - 05:51:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 09 Jun 2008 - 06:30:39 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive