[R] validation, calibration and Design

From: Williams Scott <Scott.Williams_at_petermac.org>
Date: Mon 11 Jul 2005 - 19:03:29 EST


Hi R experts,  

I am trying to do a prognostic model validation study, using cancer survival data. There are 2 data sets: 1500 cases used to develop a nomogram, and another 800 cases used as an independent validation cohort. I have validated the nomogram in the original data (easy with the Design tools), and now want to show that it also performs well on the independent data, using 60 month survival. I would also like to test whether the nomogram differs significantly from an existing model, based on the 60 month survival estimates each one generates (eg by McNemar's test).

Hence, somewhat shortened:  

# using R 2.01 on Windows
library(Design)

# data1: data frame with predictor variables A and B, plus cens and
# time (months) columns
ddist1 <- datadist(data1)
options(datadist="ddist1")

s1 <- Surv(data1$time, data1$cens)
cph.nomo <- cph(s1 ~ A + B, data=data1, surv=TRUE, x=TRUE, y=TRUE, time.inc=60)
survcph <- Survival(cph.nomo)
surv5 <- function(lp) survcph(60, lp)
nomogram(cph.nomo, lp=TRUE, conf.int=FALSE, fun=list(surv5),
         funlabel=c("5 yr DFS"))

# now have a useful nomogram model, with good discrimination and
# calibration when checked with validate and calibrate (not shown)

# ....move on to validation cohort of n=800

# data2: validation data with the same variables A, B, cens, time
# do I need to put data2 into datadist??

s2 <- Surv(data2$time, data2$cens)

# able to derive 60 month estimates of survival using
# one row of predictors per validation case
data2.est5 <- survest(cph.nomo, data.frame(A=data2$A, B=data2$B),
                      times=60, conf.int=0)

# tests discrimination of the model against the observed censored
# validation data
rcorr.cens(data2.est5$surv, s2)

# I can't find a way to use calibrate in this setting though??

# Also, if I have the 5 year estimates for 2 different models, I can
# use rcorr.cens to show discrimination, but which values are
# suitable for a test of difference (eg with McNemar's)?

# I have tried predict / newdata function a number of ways but it
# typically returns an error relating to unequal vector lengths
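On the test of difference between two models, one possibility is to bootstrap the difference in concordance on the validation cohort. This swaps a bootstrap in for McNemar's test, and I do not know that it is the recommended approach. The sketch below is only an illustration under stated assumptions: it uses the survival package rather than Design, the data are simulated, and val, pred.new, pred.old and cindex are made-up names standing in for the validation data and the two sets of 60 month estimates.

```r
library(survival)

set.seed(2)
n <- 800
# Simulated validation cohort (assumption: stands in for data2)
A <- rnorm(n)
B <- rbinom(n, 1, 0.4)
t.event <- rexp(n, rate = 0.01 * exp(0.7 * A + 0.5 * B))
t.cens  <- runif(n, 24, 120)
val <- data.frame(time = pmin(t.event, t.cens),
                  cens = as.numeric(t.event <= t.cens))

# Hypothetical predicted 60 month survival from two models:
# pred.new uses A and B, pred.old uses B only
val$pred.new <- exp(-0.6 * exp(0.7 * A + 0.5 * B))
val$pred.old <- exp(-0.6 * exp(0.5 * B))

# Concordance between one column of predictions and observed survival
cindex <- function(d, col) {
  f <- as.formula(paste("Surv(time, cens) ~", col))
  concordance(f, data = d)$concordance
}

obs.diff <- cindex(val, "pred.new") - cindex(val, "pred.old")

# Resample cases with replacement, keeping both predictions paired
n.boot <- 200
diffs <- replicate(n.boot, {
  d <- val[sample(nrow(val), replace = TRUE), ]
  cindex(d, "pred.new") - cindex(d, "pred.old")
})
ci <- quantile(diffs, c(0.025, 0.975))

print(obs.diff)
print(ci)
```

If the percentile interval in ci excludes zero, that would suggest the two models differ in discrimination on these cases.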

What I can't work out is where to go from here to derive a calibration curve of the predicted 5 year result (data2.est5) against the observed outcome (s2). Or can I do it another way? For example, could I merge the 2 data frames, use rows 1:1500 to build the model and the last 800 rows to validate?  
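To make the calibration question concrete, here is a rough sketch of the kind of result I am after. It is not the Design way of doing it, and everything in it is an assumption for illustration: it uses only the survival package, the two cohorts are simulated by a made-up sim() function, and the validation cases are simply split into fifths of predicted 60 month survival and compared with the Kaplan-Meier estimate in each fifth.

```r
library(survival)

set.seed(1)
# Simulated stand-ins for data1 (development) and data2 (validation)
sim <- function(n) {
  A <- rnorm(n)
  B <- rbinom(n, 1, 0.4)
  t.event <- rexp(n, rate = 0.01 * exp(0.7 * A + 0.5 * B))
  t.cens  <- runif(n, 24, 120)
  data.frame(A = A, B = B,
             time = pmin(t.event, t.cens),
             cens = as.numeric(t.event <= t.cens))
}
data1 <- sim(1500)
data2 <- sim(800)

# Model built on the development cohort only
fit <- coxph(Surv(time, cens) ~ A + B, data = data1)

# Predicted 60 month survival, one value per validation case
sf <- survfit(fit, newdata = data2[, c("A", "B")])
data2$pred5 <- as.numeric(summary(sf, times = 60)$surv)

# Split validation cases into fifths of predicted survival
data2$grp <- cut(data2$pred5, quantile(data2$pred5, 0:5 / 5),
                 include.lowest = TRUE)

# Observed (Kaplan-Meier) 60 month survival within each fifth
km <- survfit(Surv(time, cens) ~ grp, data = data2)
obs5 <- summary(km, times = 60, extend = TRUE)$surv

calib <- data.frame(predicted = as.numeric(tapply(data2$pred5, data2$grp, mean)),
                    observed  = obs5)
print(calib)
```

Plotting calib$predicted against calib$observed and adding abline(0, 1) would then give a crude calibration curve, with points near the line indicating good agreement.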

Obviously I am a novice, and sure to be missing something simple. I have spent countless hours poring over Prof Harrell's text (which is great but doesn't have a specific example of this), the Design help pages and the R news archive with no success, so any help is very much appreciated.  

Scott Williams MD

Peter MacCallum Cancer Centre

Melbourne Australia  


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jul 11 19:10:13 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:33:28 EST