Re: [R] How to calculate confidence interval of C statistic by rcorr.cens

From: <khosoda_at_med.kobe-u.ac.jp>
Date: Mon, 23 May 2011 14:24:20 +0900

Dear Prof. Harrell,

I'm sorry, but I'm afraid I did not fully understand your reply. Do you mean that a method for calculating confidence intervals for Dxy or the C statistic in a logistic model penalized for overfitting has not been established yet, and that what I did is wrong? Could you elaborate, or point me to a reference?

Kohkichi

(11/05/23 4:22), Frank Harrell wrote:
> Hi Kohkichi,
> What we really need to figure out is how to make validate give you
> confidence intervals for Dxy or C while it is penalizing for overfitting.
> Some people have ad hoc solutions for that but nothing is nailed down yet.
> Frank
>
> khosoda wrote:
>>
>> Thank you for your comment, Prof Harrell.
>>
>> I changed the function:
>>
>> CstatisticCI <- function(x) # x is the object returned by rcorr.cens
>> {
>>   # S.D. is the standard error of Dxy; C = (Dxy + 1)/2, so halve it
>>   se <- x["S.D."]/2
>>   Low95 <- x["C Index"] - 1.96*se
>>   Upper95 <- x["C Index"] + 1.96*se
>>
>>   cbind(x["C Index"], Low95, Upper95)
>> }
>>
>> > CstatisticCI(MyModel.lrm.penalized.rcorr)
>>                       Low95   Upper95
>> C Index 0.8222785 0.7195828 0.9249742
>>
>> I obtained a wider CI than the previous, incorrect one.
>> Regarding your comment on overfitting: this is the sample used in model
>> development. However, I applied penalization using pentrace and lrm in the
>> rms package, and the CI above is for the penalized model. The validation
>> results for each model are as follows:
>>
>> First model
>> > validate(MyModel.lrm, bw=F, B=1000)
>>           index.orig training    test optimism index.corrected    n
>> Dxy           0.6385   0.6859  0.6198   0.0661          0.5724 1000
>> R2            0.3745   0.4222  0.3388   0.0834          0.2912 1000
>> Intercept     0.0000   0.0000 -0.1446   0.1446         -0.1446 1000
>> Slope         1.0000   1.0000  0.8266   0.1734          0.8266 1000
>> Emax          0.0000   0.0000  0.0688   0.0688          0.0688 1000
>> D             0.2784   0.3248  0.2474   0.0774          0.2010 1000
>> U            -0.0192  -0.0192  0.0200  -0.0392          0.0200 1000
>> Q             0.2976   0.3440  0.2274   0.1166          0.1810 1000
>> B             0.1265   0.1180  0.1346  -0.0167          0.1431 1000
>> g             1.7010   2.0247  1.5763   0.4484          1.2526 1000
>> gp            0.2414   0.2512  0.2287   0.0225          0.2189 1000
>>
>> penalized model
>> > validate(MyModel.lrm.penalized, bw=F, B=1000)
>>           index.orig training    test optimism index.corrected    n
>> Dxy           0.6446   0.6898  0.6256   0.0642          0.5804 1000
>> R2            0.3335   0.3691  0.3428   0.0264          0.3072 1000
>> Intercept     0.0000   0.0000  0.0752  -0.0752          0.0752 1000
>> Slope         1.0000   1.0000  1.0547  -0.0547          1.0547 1000
>> Emax          0.0000   0.0000  0.0249   0.0249          0.0249 1000
>> D             0.2718   0.2744  0.2507   0.0236          0.2481 1000
>> U            -0.0192  -0.0192 -0.0027  -0.0165         -0.0027 1000
>> Q             0.2910   0.2936  0.2534   0.0402          0.2508 1000
>> B             0.1279   0.1192  0.1336  -0.0144          0.1423 1000
>> g             1.3942   1.5259  1.5799  -0.0540          1.4482 1000
>> gp            0.2141   0.2188  0.2298  -0.0110          0.2251 1000
>>
>> The optimism of the intercept and slope improved from 0.1446 and 0.1734 to
>> -0.0752 and -0.0547, respectively, and Emax improved from 0.0688 to
>> 0.0249. I therefore think overfitting was reduced, at least to some
>> extent, although I am not sure whether the improvement is sufficient.
>>
>> --
>> Kohkichi
>>
>> (11/05/22 23:27), Frank Harrell wrote:
>>> S.D. is the standard deviation (standard error) of Dxy. It already includes
>>> the effective sample size in its computation, so the sqrt(n) term is not
>>> needed. The help file for rcorr.cens has an example where the confidence
>>> interval for C is computed. Note that you are making the strong assumption
>>> that there is no overfitting in the model or that you are evaluating C on a
>>> sample not used in model development.
>>> Frank
>>>
>>>
>>> Kohkichi wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to calculate the 95% confidence interval of the C statistic of a
>>>> logistic regression model using rcorr.cens (Hmisc package, loaded with rms).
>>>> I wrote a brief function for this purpose, as follows:
>>>>
>>>> CstatisticCI <- function(x) # x is the object returned by rcorr.cens
>>>> {
>>>>   se <- x["S.D."]/sqrt(x["n"])
>>>>   Low95 <- x["C Index"] - 1.96*se
>>>>   Upper95 <- x["C Index"] + 1.96*se
>>>>   cbind(x["C Index"], Low95, Upper95)
>>>> }
>>>>
>>>> Then,
>>>>
>>>>> MyModel.lrm.rcorr<- rcorr.cens(x=predict(MyModel.lrm), S=df$outcome)
>>>>> MyModel.lrm.rcorr
>>>>        C Index            Dxy           S.D.              n        missing     uncensored
>>>>      0.8222785      0.6445570      0.1047916    104.0000000      0.0000000    104.0000000
>>>> Relevant Pairs     Concordant      Uncertain
>>>>   3950.0000000   3248.0000000      0.0000000
>>>>
>>>>> CstatisticCI(x5factor_final.lrm.pen.rcorr)
>>>>                       Low95   Upper95
>>>> C Index 0.8222785 0.8021382 0.8424188
>>>>
>>>> I'm not sure what "S.D." in the rcorr.cens output means. Is it the standard
>>>> deviation of "C Index" or the standard deviation of "Dxy"?
>>>> I assumed it is the standard deviation of "C Index" and wrote the code
>>>> above accordingly. Am I right?
>>>>
>>>> Thank you in advance for any help.
>>>>
>>>> --
>>>> Kohkichi Hosoda M.D.
>>>>
>>>> Department of Neurosurgery,
>>>> Kobe University Graduate School of Medicine,
>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>> -----
>>> Frank Harrell
>>> Department of Biostatistics, Vanderbilt University
>>
>>
>
>
>
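
For reference, the corrected CstatisticCI function quoted above relies on the relation C = (Dxy + 1)/2, which means the standard error of C is half the "S.D." that rcorr.cens reports for Dxy. The following is only a minimal sketch of that calculation in base R. The function name c_index_ci and the hand-built vector x are illustrative stand-ins (x copies the values printed in the thread rather than calling rcorr.cens), and the interval assumes a normal approximation and, as Prof. Harrell notes, no overfitting.

```r
# Sketch: normal-approximation CI for the C index from an rcorr.cens-style
# named vector. c_index_ci is a hypothetical helper, not part of Hmisc or rms.
c_index_ci <- function(x, level = 0.95) {
  z   <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  se  <- unname(x["S.D."]) / 2       # S.D. is the SE of Dxy; C = (Dxy + 1)/2
  est <- unname(x["C Index"])
  c(C = est, lower = est - z * se, upper = est + z * se)
}

# Stand-in for rcorr.cens() output, using the values shown in the thread
x <- c("C Index" = 0.8222785, "Dxy" = 0.6445570, "S.D." = 0.1047916, "n" = 104)

ci <- c_index_ci(x)
print(round(ci, 4))
```

With a real fit, x would instead be the vector returned by rcorr.cens. The interval still describes the apparent C on the development sample, not an overfitting-corrected one.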

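The validate() tables quoted above can be read the same way: in the Dxy rows, optimism equals training minus test, and index.corrected equals index.orig minus optimism. The sketch below simply redoes that arithmetic on the Dxy rows copied from the two printouts and converts the corrected Dxy to the C scale. The data frame dxy and the column C.corrected are names introduced here for illustration. It yields point estimates only; it does not give a confidence interval for the corrected index, which is the open problem Prof. Harrell describes.

```r
# Dxy rows copied from the two validate() printouts in the thread
dxy <- data.frame(
  model      = c("first", "penalized"),
  index.orig = c(0.6385, 0.6446),
  training   = c(0.6859, 0.6898),
  test       = c(0.6198, 0.6256)
)

# Optimism: average gap between bootstrap training and test performance
dxy$optimism <- dxy$training - dxy$test

# Optimism-corrected index: apparent index minus estimated optimism
dxy$index.corrected <- dxy$index.orig - dxy$optimism

# Convert the corrected Dxy to the C scale, C = (Dxy + 1)/2
dxy$C.corrected <- (dxy$index.corrected + 1) / 2

print(dxy)
```
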
-- 
*************************************************
 Department of Neurosurgery, Kobe University Graduate School of Medicine
 Kohkichi Hosoda

 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
     Phone: 078-382-5966
     Fax  : 078-382-5979
     E-mail address
         Office: khosoda_at_med.kobe-u.ac.jp
         Home  : khosoda_at_venus.dti.ne.jp

Received on Mon 23 May 2011 - 05:29:53 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 23 May 2011 - 11:10:09 GMT.
