[R] covariate selection in cox model (counting process)

From: Mayeul KAUFFMANN <mayeul.kauffmann_at_tiscali.fr>
Date: Tue 27 Jul 2004 - 08:28:45 EST


Thank you very much for your time and your answer, Thomas. Like all good answers, it raised new questions for me ;-)

>In the case of recurrent events coxph() is not
> using maximum likelihood or even maximum partial likelihood. It is
> maximising the quantity that (roughly speaking) would be the partial
> likelihood if the covariates explained all the cluster differences.

I could make the events non-repeating by removing each country once it has experienced a war. But I'm not sure this would change the estimation procedure, since it changes only the dataset, not the call coxph(Surv(start,stop,status)~x1+x2+...+cluster(id),robust=TRUE)
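
A minimal sketch of that truncation, assuming the rows are sorted by start within each id. The toy data frame d and the helper column n.prior are made up for illustration:

```r
# Hypothetical toy data in counting-process form: one row per (start, stop] interval.
d <- data.frame(
  id     = c(1, 1, 1, 1, 2, 2),
  start  = c(0, 1, 2, 3, 0, 1),
  stop   = c(1, 2, 3, 4, 1, 2),
  status = c(0, 1, 0, 1, 0, 0)
)

# Number of events each country has already had BEFORE the current interval.
d$n.prior <- ave(d$status, d$id, FUN = function(s) cumsum(s) - s)

# Keep only the intervals up to and including each country's first event.
first.only <- d[d$n.prior == 0, ]
first.only
```

The coxph() call itself would stay the same; only data= would point at first.only.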

I am not sure I understood you correctly: do you really mean "recurrent events" alone, or "any counting process notation (including allowing for recurrent events)"?

I thought the counting process notation did not really differ from the Cox model in R, since Terry M. Therneau (A Package for Survival Analysis in S, April 22, 1996) concludes his mathematical section "3.3 Cox Model" with: "The above notation is derived from the counting process representation [...] It allows very naturally for several extensions to the original Cox model formulation: multiple events per subject, discontinuous intervals of risk [...], left truncation." (I used it to introduce 1. time-dependent covariates, some changing yearly, others irregularly, and 2. left truncation: not all countries existed at the beginning of the study.)
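
For readers unfamiliar with this layout, here is a small sketch of what such data look like. All values are hypothetical: country 1 enters the study late (left truncation) and its covariate x1 changes from one interval to the next.

```r
library(survival)

# Hypothetical data: each row is one interval (start, stop] during which
# the covariate x1 is constant; status = 1 if a war starts at 'stop'.
d <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  start  = c(3650, 4015, 4380, 0, 365),   # country 1 enters at day 3650
  stop   = c(4015, 4380, 4500, 365, 730),
  status = c(0, 0, 1, 0, 0),
  x1     = c(0.2, 0.5, 0.9, 1.1, 1.3)     # time-dependent covariate
)

# With three arguments, Surv() builds a counting-process response.
s <- with(d, Surv(start, stop, status))
attr(s, "type")
```

A country is counted as at risk only between its start and stop times, which is how late entry and changing covariates are both handled by the same formula.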

>In the case of recurrent events coxph() is not
> using maximum likelihood or even maximum partial likelihood.

Then what does fit$loglik give in this case? Is it still a likelihood, or at least a valid criterion to maximise?
If not, how can I get ("manually") the criterion that was maximised?
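
One way to check what fit$loglik contains is to recompute it by hand on simulated data. The sketch below assumes a single covariate and no tied event times; the simulated data and the function manual.loglik are made up for illustration. It evaluates the usual partial log-likelihood formula, treating all rows as independent, and compares the result with what coxph() stores:

```r
library(survival)

set.seed(1)
n <- 200
x <- rnorm(n)
start      <- runif(n, 0, 1)                          # delayed entry
event.time <- start + rexp(n, rate = exp(0.5 * x))
cens.time  <- start + rexp(n, rate = 0.5)
stop   <- pmin(event.time, cens.time)
status <- as.numeric(event.time <= cens.time)
d <- data.frame(id = seq_len(n), start, stop, status, x)

fit  <- coxph(Surv(start, stop, status) ~ x + cluster(id), data = d)
fit0 <- coxph(Surv(start, stop, status) ~ x, data = d)  # same model, no cluster()

# Partial log-likelihood at 'beta': for each event, the linear predictor of
# the failing row minus log(sum of exp(linear predictor)) over the rows at
# risk at that time, i.e. rows with start < t <= stop.
manual.loglik <- function(beta, d) {
  eta <- d$x * beta
  sum(sapply(which(d$status == 1), function(i) {
    at.risk <- d$start < d$stop[i] & d$stop[i] <= d$stop
    eta[i] - log(sum(exp(eta[at.risk])))
  }))
}

c(manual.null   = manual.loglik(0, d),         coxph.null   = fit$loglik[1])
c(manual.fitted = manual.loglik(coef(fit), d), coxph.fitted = fit$loglik[2])
all.equal(fit$loglik, fit0$loglik)
```

If the numbers agree, fit$loglik[1] is this working-independence criterion at beta = 0 and fit$loglik[2] is the same criterion at the fitted coefficients; as far as I can tell, cluster(id) changes only the variance estimate, not the quantity being maximised.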

This matters to me because, as I wrote earlier,
> I created artificial covariates measuring the proximity since some
> events: exp(-days.since.event/a.chosen.parameter).

...and I used fit$loglik to choose a.chosen.parameter from 8 values, for 3 types of events:

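library(survival)
la <- c(263.5, 526.9, 1053.9, 2107.8, 4215.6, 8431.1)  # list of values to choose from
z <- NULL
# fit one model per combination (a1, a2, a3) and store its test statistics
for (a1 in la) for (a2 in la) for (a3 in la) {
  coxtmp <- coxph(Surv(start, stop, status) ~
                    I(exp(-days.since.event.of.type.one / a1))
                  + I(exp(-days.since.event.of.type.two / a2))
                  + I(exp(-days.since.event.of.type.three / a3))
                  + other.time.dependent.covariates
                  + cluster(id),
                  data = x, robust = TRUE)
  # coxtmp$loglik has two elements: the null and the fitted log-likelihood
  z <- rbind(z, c(a1, a2, a3, coxtmp$wald.test, coxtmp$rscore,
                  coxtmp$loglik, coxtmp$score))
}
z <- data.frame(z)
names(z) <- c("a1", "a2", "a3", "wald.test", "rscore",
              "NULLloglik", "loglik", "score")
z[which.max(z$rscore), ]  # combination with the largest robust score statistic
z[which.max(z$loglik), ]  # combination with the largest fitted log-likelihood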

The last two commands almost always gave me the same set c(a1,a2,a3), but for some models they differed significantly.

Which criterion (if any?!) should I use to select the best set c(a1,a2,a3)?

(If you wish to see what the proximity variables look like, run the following code. The dashed lines show the "half-life" of the proximity variable, here 6 months, which is determined by a.chosen.parameter, e.g. a1=la[1].)
#start of code
curve(exp(-(x)/263.5),0,8*365.25,xlab="number of days since last political regime change (dsrc)",ylab="Proximity of political regime change = exp(-dsrc/263.5)",las=1)
axis(1,at=365.25/2, labels="(6 months)"); axis(2,at=seq(0,1,.1),las=1)
lines(c(365.25/2,365.25/2,-110),c(-.05,0.5,0.5),lty="dashed")
#end of code
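
The link between a.chosen.parameter and the half-life follows from solving exp(-t/a) = 0.5, which gives t = a*log(2). A short check, reusing the candidate values of la listed earlier (the names half.life.days and half.life.years are only for illustration):

```r
la <- c(263.5, 526.9, 1053.9, 2107.8, 4215.6, 8431.1)  # candidate values

half.life.days  <- la * log(2)             # t such that exp(-t / a) = 0.5
half.life.years <- half.life.days / 365.25

round(half.life.years, 2)
```

So the six candidate values correspond to half-lives of roughly 0.5, 1, 2, 4, 8 and 16 years.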

Thanks a lot again.

Mayeul KAUFFMANN
Univ. Pierre Mendes France
Grenoble - France



R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Jul 27 08:37:35 2004
