# Re: [R] Coxph with factors

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Sun 17 Jul 2005 - 01:07:28 EST

On Sat, 16 Jul 2005, Kylie-Anne Richards wrote:

> Thank you for your help.
> ____________________________________________________________
>> In any case, to specify f.pom you need it to be a factor with the same set
>> of levels. You don't say what the lowest level of pom is, but if it is,
>> say, -3:
>>
>> f.pom=factor(-3, levels=seq(-3,2.5, by=0.5))
> ____________________________________________________________
>
> For this particular model, f.pom starts at -5.5 going to 2 in 0.5 increments.
> I seem to have misunderstood your explanation, as R is still producing an
> error.

In the model you showed, there were no factor levels below -2.5. You need to make sure that the levels are the same in the initial data and the data supplied to survfit. Check this with levels().
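
A minimal sketch of that check, using made-up data. The data frame names `train` and `newd` are hypothetical, and pom is assumed to run from -5.5 to 2 in steps of 0.5, as described in the quoted message:

```r
# Hypothetical stand-in for the data used to fit the model:
# pom runs from -5.5 to 2 in steps of 0.5 and is stored as a factor.
pom_values <- seq(-5.5, 2, by = 0.5)
train <- data.frame(f.pom = factor(rep(pom_values, 2), levels = pom_values))

# New data built with a different (shorter) set of levels: this is the mismatch.
bad <- data.frame(f.pom = factor(-3, levels = seq(-3, 2.5, by = 0.5)))
identical(levels(bad$f.pom), levels(train$f.pom))   # FALSE

# New data built by reusing the levels of the original factor.
newd <- data.frame(f.pom = factor(-3, levels = levels(train$f.pom)))
identical(levels(newd$f.pom), levels(train$f.pom))  # TRUE
```

Reusing `levels(train$f.pom)` directly, rather than retyping the sequence, avoids the two sets of levels drifting apart.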

> ____________________________________________________________
>> I would first note that the survival function at zero covariates is not a
>> very useful thing and is usually numerically unstable, and it's often more
>> useful to get the survival function at some reasonable set of covariates.
> ____________________________________________________________
>
> Please correct me if I'm wrong, I was under the impression that the survival
> function at zero covariates gave the baseline distribution. I.e. if given the
> baseline prob.,S_0, at time t, one could calculate the survival prob for
> specified covariates by
> S_0^exp(beta(vo)*specified(vo)+beta(po)*specified(po)+beta(f.pom at the level
> of interest)) for time t.
>
> Since I was unable to get survfit to work with specified covariates, I was
> using the survival probs of the 'avg' covariates, S(t), to determine the
> baseline at time t, i.e.
> S(t)^(1/exp(beta(vo)*mean(vo)+beta(po)*mean(po)+beta(f.pom-5.5)*mean(f.pom-5.5)+beta(f.pom-5.0)*mean(f.pom-5.0)+........).
> And then proceeding as mentioned in the above paragraph (clearly not an
> efficient way of doing things).
>

S(t; z1) = S(t; z2)^exp(beta*(z1-z2))

For convenience of mathematical notation, mathematical statisticians write everything in terms of z2=0, and call this "the baseline". In the real world, though, you are better off with a baseline defined at a covariate value somewhere in the vicinity of the actual data. If, as is often the case, the zero covariate value is a long way from the observed data, both the computation of the survival curve at zero and the transformation to the covariates you want are numerically ill-conditioned.

So, you can use the "baseline" returned by survfit(z2), which is at z2=fit$means, to do anything you can do with the baseline at z=0, and the computations will be more accurate.
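
As a rough illustration, here is a sketch that takes the curve at one reference covariate value and converts it to another using S(t; z1) = S(t; z2)^exp(beta*(z1-z2)), then compares the result with asking survfit for z1 directly. The `lung` data set shipped with the survival package and the covariates `age` and `sex` are stand-ins chosen for the example, not the model from this thread, and the reference values are arbitrary:

```r
library(survival)

# Stand-in model on the lung data shipped with the survival package.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Reference covariates near the observed data (z2) and the target (z1).
z2 <- data.frame(age = 60, sex = 1)
z1 <- data.frame(age = 70, sex = 2)

# Survival curve at the reference value, and directly at the target.
s2        <- survfit(fit, newdata = z2)
s1_direct <- survfit(fit, newdata = z1)

# Difference in linear predictor: beta * (z1 - z2).
shift <- sum(coef(fit) * (unlist(z1) - unlist(z2)))

# Transform the reference curve to the target covariates.
s1_from_s2 <- as.vector(s2$surv)^exp(shift)

# The two should agree up to floating-point error.
all.equal(s1_from_s2, as.vector(s1_direct$surv))
```

In practice it is simpler to pass the covariates of interest to survfit via `newdata`; the manual transformation is shown only to make the relationship between the curves concrete.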

-thomas

R-help@stat.math.ethz.ch mailing list