Re: [R] Question on approximations of full logistic regression model

From: Tim Hesterberg <timhesterberg_at_gmail.com>
Date: Mon, 16 May 2011 22:13:48 -0700

My usual rule is that whatever gives the widest confidence intervals in a particular problem is most accurate for that problem :-)

Bootstrap percentile intervals tend to be too narrow. Consider the case of the sample mean; the usual formula CI is

    xbar +- t_alpha sqrt( (1/(n-1)) sum((x_i - xbar)^2)) / sqrt(n)

The bootstrap percentile interval for symmetric data is roughly

    xbar +- z_alpha sqrt( (1/(n )) sum((x_i - xbar)^2)) / sqrt(n)

It is narrower than the formula CI because it uses a z quantile rather than a t quantile, and because the standard deviation uses a divisor of n rather than (n-1).

In stratified sampling, the narrowness factor depends on the stratum sizes, not the overall n.
In regression, estimates for some quantities may be based on a small subset of the data (e.g. coefficients related to rare factor levels).
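
The sample-mean comparison above can be checked numerically. The following is a minimal sketch in base R, assuming hypothetical normal data with n = 10 (the seed, sample size, and number of resamples are arbitrary choices, not from the thread). It computes both intervals and the narrowness factor implied by the two formulas, (z quantile / t quantile) * sqrt((n-1)/n).

```r
# Compare the usual t interval with the bootstrap percentile interval
# for a sample mean. All data here are simulated (hypothetical).
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 10                          # small n makes the effect easy to see
x <- rnorm(n)                    # hypothetical symmetric data
alpha <- 0.05

# Usual formula interval: t quantile, divisor (n - 1) inside sd()
se_formula <- sd(x) / sqrt(n)
ci_t <- mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se_formula

# Bootstrap percentile interval: quantiles of the resampled means
B <- 10000
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
ci_pct <- unname(quantile(boot_means, c(alpha / 2, 1 - alpha / 2)))

# Narrowness factor implied by the two formulas
factor_theory <- qnorm(1 - alpha / 2) / qt(1 - alpha / 2, df = n - 1) *
  sqrt((n - 1) / n)

# Observed ratio of interval widths (percentile / formula)
width_ratio <- diff(ci_pct) / diff(ci_t)

print(rbind(t = ci_t, percentile = ci_pct))
print(c(theory = factor_theory, observed = width_ratio))
```

With n = 10 the theoretical factor is about 0.82; the observed ratio varies with the sample and the resampling noise, and the gap shrinks as n grows.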

This doesn't mean we should give up on the bootstrap. There are remedies for the bootstrap biases; see e.g.

    Hesterberg, Tim C. (2004), Unbiasing the Bootstrap - Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930.
    http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
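
As an illustration of one such remedy, here is a rough sketch of bootknife sampling as I understand it from the paper cited above: each resample first omits one observation at random, then draws n values with replacement from the remaining n-1. The function name `bootknife_means`, the simulated data, and the seed are assumptions made for this example, not code from the paper.

```r
# Bootknife sketch for the sample mean (hypothetical helper, base R only).
bootknife_means <- function(x, B = 10000) {
  n <- length(x)
  replicate(B, {
    kept <- x[-sample.int(n, 1)]                   # jackknife step: omit one point
    mean(sample(kept, size = n, replace = TRUE))   # bootstrap step: draw n from n - 1
  })
}

set.seed(2)                      # arbitrary seed
x <- rnorm(10)                   # hypothetical data
n <- length(x)

se_formula   <- sd(x) / sqrt(n)                                   # divisor (n - 1)
se_bootstrap <- sd(replicate(10000, mean(sample(x, replace = TRUE))))
se_bootknife <- sd(bootknife_means(x))

print(c(formula = se_formula, bootstrap = se_bootstrap, bootknife = se_bootknife))
```

For the sample mean, the ordinary bootstrap standard error should come out smaller than the formula value by roughly sqrt((n-1)/n), while the bootknife value should land close to the formula value, up to resampling noise.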

And other methods have their own biases, particularly in nonlinear applications such as logistic regression.
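
To see how the two kinds of interval can disagree in a logistic regression, one can compare them side by side. This is only a sketch under stated assumptions: the data are simulated (the predictors `x1`, `x2` and the true coefficients are made up), it uses base R `glm` rather than `lrm`/`bootcov` from rms, and it uses plain case resampling with percentile intervals. It shows the widths, and makes no claim about which interval is more accurate.

```r
# Wald vs. bootstrap percentile intervals for logistic regression
# coefficients, on simulated (hypothetical) data.
set.seed(3)                                  # arbitrary seed
n  <- 104                                    # same sample size as in this thread
x1 <- rnorm(n)                               # hypothetical continuous predictor
x2 <- rbinom(n, 1, 0.3)                      # hypothetical binary predictor
y  <- rbinom(n, 1, plogis(-1.5 + 1.0 * x1 + 0.8 * x2))
dat <- data.frame(y, x1, x2)

fit <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Wald intervals: estimate +- z * SE from the usual variance-covariance matrix
ci_wald <- confint.default(fit)

# Case-resampling bootstrap: refit the model on resampled rows
B <- 500
boot_coef <- t(replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  coef(suppressWarnings(glm(y ~ x1 + x2, family = binomial, data = dat[idx, ])))
}))
ci_boot <- t(apply(boot_coef, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE))

print(cbind(ci_wald, ci_boot))
print(cbind(wald_width = ci_wald[, 2] - ci_wald[, 1],
            boot_width = ci_boot[, 2] - ci_boot[, 1]))
```

With few events, some resamples can produce extreme coefficient estimates, which is one reason the bootstrap widths may differ from the Wald widths in either direction.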

Tim Hesterberg

>Thank you for your reply, Prof. Harrell.
>
>I agree with you. Dropping only one variable does not actually help a lot.
>
>I have one more question.
>During analysis of this model I found that the confidence
>intervals (CIs) of some coefficients provided by bootstrapping (bootcov
>function in rms package) were narrower than the CIs provided by the usual
>variance-covariance matrix, while the CIs of other coefficients were wider.
>My data has no cluster structure. I am wondering which CIs are better.
>I guess the bootstrap ones, but is that right?
>
>I would appreciate your help in advance.
>--
>KH
>
>
>
>(11/05/16 12:25), Frank Harrell wrote:
>> I think you are doing this correctly except for one thing. The validation
>> and other inferential calculations should be done on the full model. Use
>> the approximate model to get a simpler nomogram but not to get standard
>> errors. With only dropping one variable you might consider just running the
>> nomogram on the entire model.
>> Frank
>>
>>
>> KH wrote:
>>>
>>> Hi,
>>> I am trying to construct a logistic regression model from my data (104
>>> patients and 25 events). I build a full model consisting of five
>>> predictors with the use of penalization by rms package (lrm, pentrace
>>> etc) because of the events-per-variable issue. Then, I tried to approximate
>>> the full model by a step-down technique, predicting L from all of the
>>> component variables using ordinary least squares (ols in rms package) as
>>> follows. I would like to know whether I am doing this right or not.
>>>
>>>> library(rms)
>>>> plogit<- predict(full.model)
>>>> full.ols<- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
>>>> fastbw(full.ols, aics=1e10)
>>>
>>> Deleted       Chi-Sq d.f.      P Residual d.f.      P    AIC    R2
>>> stenosis        1.41    1 0.2354     1.41    1 0.2354  -0.59 0.991
>>> x2             16.78    1 0.0000    18.19    2 0.0001  14.19 0.882
>>> procedure      26.12    1 0.0000    44.31    3 0.0000  38.31 0.711
>>> ClinicalScore  25.75    1 0.0000    70.06    4 0.0000  62.06 0.544
>>> x1             83.42    1 0.0000   153.49    5 0.0000 143.49 0.000
>>>
>>> Then, I fitted an approximation to the full model using the most important
>>> variables (stopping before R^2 for predictions from the reduced model
>>> against the original Y drops below 0.95), that is, dropping "stenosis".
>>>
>>>> full.ols.approx<- ols(plogit ~ x1+x2+ClinicalScore+procedure)
>>>> full.ols.approx$stats
>>>           n  Model L.R.        d.f.          R2           g       Sigma
>>> 104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622
>>>
>>> This approximate model had R^2 against the full model of 0.99.
>>> Therefore, I updated the original full logistic model dropping
>>> "stenosis" as predictor.
>>>
>>>> full.approx.lrm<- update(full.model, ~ . -stenosis)
>>>
>>>> validate(full.model, bw=F, B=1000)
>>>           index.orig training   test optimism index.corrected    n
>>> Dxy           0.6425   0.7017 0.6131   0.0887          0.5539 1000
>>> R2            0.3270   0.3716 0.3335   0.0382          0.2888 1000
>>> Intercept     0.0000   0.0000 0.0821  -0.0821          0.0821 1000
>>> Slope         1.0000   1.0000 1.0548  -0.0548          1.0548 1000
>>> Emax          0.0000   0.0000 0.0263   0.0263          0.0263 1000
>>>
>>>> validate(full.approx.lrm, bw=F, B=1000)
>>>           index.orig training   test optimism index.corrected    n
>>> Dxy           0.6446   0.6891 0.6265   0.0626          0.5820 1000
>>> R2            0.3245   0.3592 0.3428   0.0164          0.3081 1000
>>> Intercept     0.0000   0.0000 0.1281  -0.1281          0.1281 1000
>>> Slope         1.0000   1.0000 1.1104  -0.1104          1.1104 1000
>>> Emax          0.0000   0.0000 0.0444   0.0444          0.0444 1000
>>>
>>> Validation revealed this approximation was not bad.
>>> Then, I made a nomogram.
>>>
>>>> full.approx.lrm.nom<- nomogram(full.approx.lrm,
>>> fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>> plot(full.approx.lrm.nom)
>>>
>>> Another nomogram using ols model,
>>>
>>>> full.ols.approx.nom<- nomogram(full.ols.approx,
>>> fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>> plot(full.ols.approx.nom)
>>>
>>> These two nomograms are very similar but a little bit different.
>>>
>>> My questions are;
>>>
>>> 1. Am I doing this right?
>>>
>>> 2. Which nomogram is correct?
>>>
>>> I would appreciate your help in advance.
>>>
>>> --
>>> KH
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> -----
>> Frank Harrell
>> Department of Biostatistics, Vanderbilt University
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Question-on-approximations-of-full-logistic-regression-model-tp3524294p3525372.html
>> Sent from the R help mailing list archive at Nabble.com.
>
>
> E-mail address
> Office: khosoda_at_med.kobe-u.ac.jp
> Home : khosoda_at_venus.dti.ne.jp
>
>



Received on Tue 17 May 2011 - 05:16:46 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 18 May 2011 - 13:10:07 GMT.
