Re: [R] Discrepancy in the PBC data set

From: Terry Therneau
Date: Mon, 24 Nov 2008 07:39:49 -0600 (CST)

  The data set in R is wrong. I've found mistakes on 2 lines in a quick look.   

  The R data set appears to have been taken from the Appendix of Fleming and Harrington, given all the "-9" codes in it. I don't know whether the data is incorrect there as well (someone seems to have borrowed my copy). Note that Tom Fleming originally got the data from me, so I'm fairly confident in calling my Mayo version the authoritative one. I'll make sure this gets fixed.
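For readers working with a copy that carries the "-9" codes: a minimal sketch of recoding them before any modelling, assuming that -9 marks a missing value (the post implies this but does not say so outright). The data frame `toy` and its values are hypothetical and only stand in for the real columns.

```r
# Hypothetical toy data; -9 is assumed to stand for "missing".
toy <- data.frame(id   = 1:4,
                  chol = c(261, -9, 176, 244),
                  trig = c(172, 88, -9, -9))

# Recode every -9 to NA so that modelling functions treat it as missing
# instead of as a real measurement.
toy[toy == -9] <- NA

# Count the missing values per column as a quick sanity check.
colSums(is.na(toy))
```

This blanket recode is only safe when -9 cannot occur as a genuine value in any column; otherwise it would have to be applied column by column.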

  You can grab a correct data set from our department web page. Code is below.   

          Terry Therneau              

pbcurl <-
" at"

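library(survival)   # provides coxph() and Surv()

pbc <- read.table(pbcurl, header=FALSE,
                  col.names=c('id', 'time', 'status', 'trt', 'age', 'sex',
                              'ascites', 'hepato', 'spiders', 'edema',
                              'bili', 'chol', 'albumin', 'copper',
                              'alk.phos', 'ast', 'trig', 'platelet',
                              'protime', 'stage'),
                  na.strings='.')   # assumption: the final argument was lost in the archive; '.' is taken to mark missing values

pbc$age <- pbc$age/365.25   # age is recorded in days; convert to years

newfit <- coxph(Surv(time, status==2) ~ age + edema + log(bili) +
                log(protime) + log(albumin), data=pbc)
newfit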


                coef exp(coef) se(coef)     z       p
age           0.0396    1.0404  0.00767  5.16 2.4e-07
edema         0.8963    2.4505  0.27141  3.30 9.6e-04
log(bili)     0.8636    2.3716  0.08294 10.41 0.0e+00
log(protime)  2.3868   10.8791  0.76851  3.11 1.9e-03
log(albumin) -2.5069    0.0815  0.65292 -3.84 1.2e-04

Likelihood ratio test=231  on 5 df, p=0  n=416 (2 observations deleted due to missingness)

Received on Mon 24 Nov 2008 - 13:41:31 GMT
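As a reading aid for the printed fit, the columns are related in a simple way: exp(coef) is the exponentiated coefficient (the hazard ratio), z is coef divided by se(coef), and p is the two-sided normal tail probability of z. A minimal sketch that recomputes them from the rounded values shown in the output; the variable names are illustrative and small differences in the last digit are expected from rounding.

```r
# Coefficients and standard errors copied from the printed output.
beta <- c(age = 0.0396, edema = 0.8963, "log(bili)" = 0.8636,
          "log(protime)" = 2.3868, "log(albumin)" = -2.5069)
se   <- c(0.00767, 0.27141, 0.08294, 0.76851, 0.65292)

hazard_ratio <- exp(beta)            # the exp(coef) column
z            <- beta / se            # the z column (Wald statistic)
p            <- 2 * pnorm(-abs(z))   # the p column, two-sided

round(cbind(hazard_ratio, z), 2)
signif(p, 2)
```

Read this way, the edema row says that a one-unit increase in edema multiplies the estimated hazard by about 2.45, holding the other covariates fixed.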

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 24 Nov 2008 - 15:30:27 GMT.
