[Rd] fitting problems in coxph.fit

From: Torsten Hothorn <Torsten.Hothorn_at_rzmail.uni-erlangen.de>
Date: Fri 17 Dec 2004 - 02:30:55 EST

Dear Thomas & Dear List,

the fitting function `coxph.fit' called by `coxph' may fail to estimate the regression coefficients when some values of the design matrix are very large. For example

library(survival)

### load example data
load(url("http://www.imbe.med.uni-erlangen.de/~hothorn/coxph_fit.Rda")) method <- "efron"

### copied from `coxph.fit'
coxfit <- .C("coxfit2", iter=as.integer(maxiter),

                   as.integer(n),
                   as.integer(nvar), stime,
                   sstat,
                   x= x[sorted,] ,
                   as.double(offset[sorted] - mean(offset)),
                   as.double(weights),
                   newstrat,
                   means= double(nvar),
                   coef= as.double(init),
                   u = double(nvar),
                   imat= double(nvar*nvar), loglik=double(2),
                   flag=integer(1),
                   double(2*n + 2*nvar*nvar + 3*nvar),
                   as.double(control$eps),
                   as.double(control$toler.chol),
                   sctest=as.double(method=="efron")
                 ,PACKAGE="survival")

produces

R> coxfit$coef
[1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

because (?)

R> summary(x[,1])

     Min. 1st Qu. Median Mean 3rd Qu. Max. 2.378e-01 8.758e+00 5.872e+01 1.640e+04 2.732e+02 4.000e+06

One the other hand

x[,1] <- x[,1]/max(x[,1])

coxfit <- .C("coxfit2", iter=as.integer(maxiter),

                   as.integer(n),
                   as.integer(nvar), stime,
                   sstat,
                   x= x[sorted,] ,
                   as.double(offset[sorted] - mean(offset)),
                   as.double(weights),
                   newstrat,
                   means= double(nvar),
                   coef= as.double(init),
                   u = double(nvar),
                   imat= double(nvar*nvar), loglik=double(2),
                   flag=integer(1),
                   double(2*n + 2*nvar*nvar + 3*nvar),
                   as.double(control$eps),
                   as.double(control$toler.chol),
                   sctest=as.double(method=="efron")
                 ,PACKAGE="survival")

looks much better

R> coxfit$coef
[1] 0.25123203 0.00000000 -0.42595541 -0.04488913 -0.26061995
0.44426458
[7] -0.38954286 -0.43081374 0.79573107 0.48234405 0.94636357
-0.25193465
[13] 1.15619712 0.32651765 -1.06731019 -0.24249939

I nailed the problem down to lines 261ff of `coxfit2.c' where

            zbeta = offset[person];
            for (i=0; i<nvar; i++)
                zbeta += newbeta[i]*covar[i][person];
            risk = exp(zbeta ) * weights[person];

and `zbeta' may become very large and thus `exp' returns `nan'. Of course one could transform the independent variables prior to fitting but I'm not sure if this is the intented solution.

Best,

Torsten



R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri Dec 17 01:39:12 2004

This archive was generated by hypermail 2.1.8 : Fri 17 Dec 2004 - 02:19:40 EST