Re: [R] Explaining Survival difference between Stata and R

About this list Date view Thread view Subject view Author view Attachment view

From: Paul Johnson (pauljohn@ku.edu)
Date: Thu 13 May 2004 - 17:10:35 EST


Message-id: <40A31F6B.8050102@ku.edu>

Dear Goran (and others)

I did not know about the "eha" package, but reading the docs, I see many
things I've been looking for, including the parametric hazard model with
Weibull baseline. Thanks for the tip, and the package.

I still don't quite understand your point about the reason that coxph
crashes. Why does the difference of scale between the variables cause
trouble? I understand that a redundant set of variables would cause
singularity, but do not see why differing scales for variables would
cause that. Does the explanation lie in the optimization algorithm
itself? I'll read the eha package coxreg source code. I bet that will
show me what's going on in your fix, anyway.

Still, I wish coxph just handled all that stuff.

Just for fun, here are the Stata results for the same small model that
you report. If I ask stata to use the efron method of breaking ties,
then the results do almost exactly match what you found.

stset yrs2, failure (ratify)
stcox haz_wst pol_free , robust nohr efron

Iteration 0: log pseudo-likelihood = -45.380139
Iteration 1: log pseudo-likelihood = -45.00981
Iteration 2: log pseudo-likelihood = -45.001196
Iteration 3: log pseudo-likelihood = -45.001193
Refining estimates:
Iteration 0: log pseudo-likelihood = -45.001193

Cox regression -- Efron method for ties

No. of subjects = 21 Number of obs =
     21
No. of failures = 21
Time at risk = 78
                                                    Wald chi2(2) =
    1.09
Log pseudo-likelihood = -45.001193 Prob > chi2 =
0.5785

------------------------------------------------------------------------------
              | Robust
           _t | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
      haz_wst | 8.48e-08 8.63e-08 0.98 0.326 -8.43e-08
2.54e-07
     pol_free | .0089626 .1047656 0.09 0.932 -.1963741
.2142994
------------------------------------------------------------------------------

Göran Broström wrote:

> Paul and Thomas,
>
> 'coxph' or the data (I got it from Paul) indeed has a problem:
> ------------------------------------------------------------
> Call:
> coxph(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
>
>
> coef exp(coef) se(coef) z p
> haz.wst 8.53e-08 1 9.47e-08 0.901 0.37
> pol.free NA NA 0.00e+00 NA NA
>
> Likelihood ratio test=0.76 on 1 df, p=0.385 n= 21
> Warning message:
> X matrix deemed to be singular; variable 2 in:
> coxph(Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
> ---------------------------------------------------------------
>
> Is it the data? Let's try 'coxreg' (eha):
> ---------------------------------------------------------------
> Call:
> coxreg(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
>
> Covariate Mean Coef RR Wald p
> haz.wst 2054901 0.000 1.000 0.372
> pol.free 2.090 0.009 1.009 0.958
>
> Events 21
> Total time at risk 78
> Max. log. likelihood -45.001
> LR test statistic 0.76
> Degrees of freedom 2
> Overall p-value 0.684583
> ----------------------------------------------------------------
>
> This worked just fine (Paul, same results as in Stata?). But, we
> seem to have a scaling problem; lok at the means of the covariates!
> Some rescaling gives:
> ----------------------------------------------------------------
> Call:
> coxph(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free,
> data = dat)
>
>
> coef exp(coef) se(coef) z p
> I(haz.wst * 1e-06) 0.08479 1.09 0.095 0.8920 0.37
> pol.free 0.00896 1.01 0.170 0.0526 0.96
>
> Likelihood ratio test=0.76 on 2 df, p=0.685 n= 21
> ----------------------------------------------------------------
> and now 'coxph' gets the same results as 'coxreg'. I don't know about coxph
> for sure, but I do know that coxreg centers all covariates before the NR
> procedure starts. Maybe we also should rescale to unit variance? And of
> course scale back the coefficients and se:s at the end?
>
> BTW, Paul's data is heavily tied. Could be an idea to use a discrete time
> version of the PH model: you can do that with 'mlreg' (eha):
> ---------------------------------------------------------------
> Call:
> mlreg(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free,
> data = dat)
>
> Covariate Mean Coef RR Wald p
> I(haz.wst * 1e-06) 2.055 0.097 1.102 0.320
> pol.free 2.090 0.003 1.003 0.985
>
> Events 21
> Total time at risk 78
> Max. log. likelihood -36.324
> LR test statistic 0.93
> Degrees of freedom 2
> Overall p-value 0.627056
> ----------------------------------------------------------------
> Doesn't make much of a difference, though. Efron's method is quite
> robust. (You might check if there are any differences with 'breslow')
>
> Best,
>
> Göran
>
> [...]

-- 
Paul E. Johnson                       email: pauljohn@ku.edu
Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700

______________________________________________ R-help@stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


About this list Date view Thread view Subject view Author view Attachment view

This archive was generated by hypermail 2.1.3 : Mon 31 May 2004 - 23:05:09 EST