# Re: [R] Estimate of baseline hazard in survival

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Sat 11 Jun 2005 - 04:47:46 EST

On Fri, 10 Jun 2005, Hanke, Alex wrote:

> Dear All,
> I'm having just a little terminology problem, relating the language used in
> the Hosmer and Lemeshow text on Applied Survival Analysis to that of the
> help that comes with the survival package.
>
> I am trying to back out the values for the baseline hazard, h_o(t_i), for
> each event time or observation time.
> Now survfit(fit)$surv gives me the value of the survival function,
> S(t_i|X_i,B), using mean values of the covariates and the coxph() object
> provides me with the estimate of the linear predictors, exp(X'B).
> If S(t_i|X_i,B)=S_o(t_i)^exp(X_iB) is the expression for the survival
> function
> And
> -ln(S_o(t_i) ) is the expression for the cumulative baseline hazard
> function, H_o(t_i)
> Then by rearranging the expression for the survival function I get the
> following:
> -ln(S_o(t_i) ) = -ln( S(t_i|X_i,B) ) / exp(X_iB)
> = basehaz(fit)/exp(fit$linear.predictors)
> Am I right so far and is there an easier way?

No, and yes.

You are dividing the centered cumulative baseline hazard at each time point by exp(linear predictor) for the person who happened to die at that time, rather than by exp(linear predictor) at the mean covariates.

basehaz(fit, centered=FALSE) will get you the baseline hazard at zero covariates.

You don't even need that. The baseline hazard at zero covariates is constant if and only if the centered baseline hazard is constant, so you could also work with basehaz(fit), which is often more numerically stable.
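
As a rough sketch of the point above: the centered and uncentered cumulative baseline hazards differ only by the constant factor exp(mean covariates times coefficients), so one is linear in time exactly when the other is. The model below (age as the only covariate, fitted to the `lung` data that ships with the survival package) is an assumption chosen purely for illustration, not the model from the original question.

```r
library(survival)

# Illustrative fit only: 'lung' and the single covariate 'age' are assumptions.
fit <- coxph(Surv(time, status) ~ age, data = lung)

H_centered <- basehaz(fit)                    # cumulative hazard at mean covariates
H_zero     <- basehaz(fit, centered = FALSE)  # cumulative hazard at zero covariates

# Both data frames share the same event times; compare where the hazard is positive.
ok    <- H_zero$hazard > 0
ratio <- H_centered$hazard[ok] / H_zero$hazard[ok]

# The ratio should be the same constant at every time point.
scale <- exp(sum(fit$means * coef(fit)))
range(ratio)
scale
```

If the two printed values of `range(ratio)` agree with `scale`, the curves are proportional, and either one can be used to judge linearity.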

> The plot of the cumulative baseline hazard function , H_o(t_i), should be
> linear across time. Once I have, H_o(t_i), to get at h_o(t_i) I then need
> to reverse the cumsum operation. The corresponding plot should have a
> constant baseline hazard over time.

No. Not at all.

Unless you smooth the h_0(t_i), they are completely useless for what you want.

Suppose the hazard rate is constant and you have no covariates in the model and not even any censoring. In that case the increments of the cumulative baseline hazard are 1/n, 1/(n-1), 1/(n-2), ..., 1/2, 1, where n is the sample size. So in this simplest possible case a constant baseline hazard rate leads to h_0(t_i) increasing with t.
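
The 1/n, 1/(n-1), ..., 1 pattern can be checked with a small simulation. This is only a sketch: the sample size, seed, and exponential rate are arbitrary assumptions, and it uses the Nelson-Aalen jumps (events divided by number at risk) from `survfit`, which is what the Breslow-type baseline estimate reduces to when there are no covariates and no ties.

```r
library(survival)

set.seed(1)
n     <- 10
times <- rexp(n, rate = 1)     # constant hazard, no censoring, no covariates

# Surv() with a single argument treats every observation as an event.
sf <- survfit(Surv(times) ~ 1)

# Jump in the cumulative hazard at each ordered event time.
increments <- sf$n.event / sf$n.risk

# Compare with 1/n, 1/(n-1), ..., 1/2, 1.
all.equal(increments, 1 / (n:1))
increments
```

The increments grow steadily even though the data were generated with a constant hazard, because the risk set shrinks by one at every event.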

The proper smoothing is a little tricky, because the failure distribution is skewed and has a boundary at zero, and because of censoring. That's why textbooks often recommend graphing the cumulative hazard to see if it is linear rather than the increments in the cumulative hazard to see if they are constant.
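
A minimal sketch of the graphical check on the cumulative hazard, using simulated data with independent censoring. Everything here is assumed for illustration: the rates, the sample size, and the cutoff of 50 subjects at risk, which simply trims the noisy right tail before drawing a reference line through the origin.

```r
library(survival)

set.seed(42)
n      <- 2000
rate   <- 0.5                          # assumed true constant hazard
event  <- rexp(n, rate)
cens   <- rexp(n, 0.2)                 # assumed independent censoring times
time   <- pmin(event, cens)
status <- as.integer(event <= cens)

sf <- survfit(Surv(time, status) ~ 1)
H  <- cumsum(sf$n.event / sf$n.risk)   # Nelson-Aalen cumulative hazard

# Keep only times with a reasonably large risk set (arbitrary cutoff).
keep <- sf$n.risk >= 50
tt   <- sf$time[keep]
HH   <- H[keep]

# Slope of a line through the origin; roughly the hazard rate if H is linear.
slope <- unname(coef(lm(HH ~ 0 + tt)))

plot(tt, HH, type = "s", xlab = "time", ylab = "cumulative hazard")
abline(0, slope, lty = 2)
slope
```

A step function that hugs the dashed line is consistent with a constant hazard; systematic curvature away from it suggests otherwise.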

-thomas

R-help@stat.math.ethz.ch mailing list