From: <khosoda_at_med.kobe-u.ac.jp>

Date: Wed, 18 May 2011 21:54:04 +0900

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 18 May 2011 - 12:59:34 GMT


Thank you for your advice, Tim.

I am reading your paper and the other materials on your website. I could not find an R package for your bootknife method. Is there an R package that implements this procedure?

(11/05/17 14:13), Tim Hesterberg wrote:

> My usual rule is that whatever gives the widest confidence intervals
> in a particular problem is most accurate for that problem :-)
>
> Bootstrap percentile intervals tend to be too narrow.
> Consider the case of the sample mean; the usual formula CI is
>   xbar +- t_alpha sqrt( (1/(n-1)) sum((x_i - xbar)^2)) / sqrt(n)
> The bootstrap percentile interval for symmetric data is roughly
>   xbar +- z_alpha sqrt( (1/(n  )) sum((x_i - xbar)^2)) / sqrt(n)
> It is narrower than the formula CI because
>   * z quantiles rather than t quantiles
>   * standard error uses divisor of n rather than (n-1)
>
> In stratified sampling, the narrowness factor depends on the
> stratum sizes, not the overall n.
> In regression, estimates for some quantities may be based on a small
> subset of the data (e.g. coefficients related to rare factor levels).
>
> This doesn't mean we should give up on the bootstrap.
> There are remedies for the bootstrap biases, see e.g.
>   Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling
>   vs. Smoothing, Proceedings of the Section on Statistics and the
>   Environment, American Statistical Association, 2924-2930.
>   http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
>
> And other methods have their own biases, particularly in nonlinear
> applications such as logistic regression.
>
> Tim Hesterberg
>
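The width comparison in Tim's message can be checked numerically. The following is a minimal sketch in base R, assuming simulated normal data with n = 10; the data, the seed, and all variable names are illustrative assumptions and do not come from the thread. It computes the formula interval, the z-based approximation, and an actual percentile bootstrap interval for the sample mean.

```r
# Compare the usual t interval for a mean with a bootstrap percentile interval.
set.seed(1)                 # arbitrary seed, only for reproducibility
n     <- 10                 # a small n makes the narrowness visible
x     <- rnorm(n)           # simulated sample (assumption, not thread data)
alpha <- 0.05

# Formula CI: t quantile, standard deviation with divisor n - 1
ci_t <- mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)

# Rough form of the percentile interval: z quantile, divisor n
sd_n <- sqrt(mean((x - mean(x))^2))
ci_z <- mean(x) + c(-1, 1) * qnorm(1 - alpha / 2) * sd_n / sqrt(n)

# Actual percentile bootstrap: resample with replacement, take quantiles
B          <- 10000
boot_means <- replicate(B, mean(sample(x, n, replace = TRUE)))
ci_boot    <- unname(quantile(boot_means, c(alpha / 2, 1 - alpha / 2)))

# Narrowness factor implied by the two formulas:
# (z / t) * sqrt((n - 1) / n)
ratio_theory <- qnorm(1 - alpha / 2) / qt(1 - alpha / 2, df = n - 1) *
  sqrt((n - 1) / n)

widths <- c(t = diff(ci_t), z = diff(ci_z), boot = diff(ci_boot))
print(widths)
print(ratio_theory)
```

The ratio of the z-based width to the t-based width equals `ratio_theory` exactly, and it approaches 1 as n grows, which is why the effect matters mainly for small samples or small strata.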

>> Thank you for your reply, Prof. Harrell.
>>
>> I agree with you. Dropping only one variable does not actually help a lot.
>>
>> I have one more question.
>> During analysis of this model I found that the confidence
>> intervals (CIs) of some coefficients provided by bootstrapping (bootcov
>> function in rms package) were narrower than the CIs provided by the usual
>> variance-covariance matrix, while the CIs of other coefficients were wider.
>> My data has no cluster structure. I am wondering which CIs are better.
>> I guess the bootstrap ones, but is that right?
>>
>> I would appreciate your help in advance.
>> --
>> KH
>>
>> (11/05/16 12:25), Frank Harrell wrote:
>>> I think you are doing this correctly except for one thing. The validation
>>> and other inferential calculations should be done on the full model. Use
>>> the approximate model to get a simpler nomogram but not to get standard
>>> errors. With only dropping one variable you might consider just running the
>>> nomogram on the entire model.
>>> Frank
>>>
>>> KH wrote:
>>>>
>>>> Hi,
>>>> I am trying to construct a logistic regression model from my data (104
>>>> patients and 25 events). I built a full model consisting of five
>>>> predictors with the use of penalization by rms package (lrm, pentrace
>>>> etc) because of the events-per-variable issue. Then, I tried to approximate
>>>> the full model by a step-down technique, predicting L from all of the
>>>> component variables using ordinary least squares (ols in rms package) as
>>>> follows. I would like to know whether I am doing this right or not.
>>>>
>>>>> library(rms)
>>>>> plogit<- predict(full.model)
>>>>> full.ols<- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
>>>>> fastbw(full.ols, aics=1e10)
>>>>
>>>>  Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
>>>>  stenosis       1.41  1    0.2354   1.41   1    0.2354  -0.59 0.991
>>>>  x2            16.78  1    0.0000  18.19   2    0.0001  14.19 0.882
>>>>  procedure     26.12  1    0.0000  44.31   3    0.0000  38.31 0.711
>>>>  ClinicalScore 25.75  1    0.0000  70.06   4    0.0000  62.06 0.544
>>>>  x1            83.42  1    0.0000 153.49   5    0.0000 143.49 0.000
>>>>
>>>> Then, I fitted an approximation to the full model using the most important
>>>> variables (R^2 for predictions from the reduced model against the
>>>> original Y drops below 0.95), that is, dropping "stenosis".
>>>>
>>>>> full.ols.approx<- ols(plogit ~ x1+x2+ClinicalScore+procedure)
>>>>> full.ols.approx$stats
>>>>           n  Model L.R.        d.f.          R2           g       Sigma
>>>> 104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622
>>>>
>>>> This approximate model had R^2 against the full model of 0.99.
>>>> Therefore, I updated the original full logistic model, dropping
>>>> "stenosis" as a predictor.
>>>>
>>>>> full.approx.lrm<- update(full.model, ~ . -stenosis)
>>>>
>>>>> validate(full.model, bw=F, B=1000)
>>>>           index.orig training   test optimism index.corrected    n
>>>> Dxy           0.6425   0.7017 0.6131   0.0887          0.5539 1000
>>>> R2            0.3270   0.3716 0.3335   0.0382          0.2888 1000
>>>> Intercept     0.0000   0.0000 0.0821  -0.0821          0.0821 1000
>>>> Slope         1.0000   1.0000 1.0548  -0.0548          1.0548 1000
>>>> Emax          0.0000   0.0000 0.0263   0.0263          0.0263 1000
>>>>
>>>>> validate(full.approx.lrm, bw=F, B=1000)
>>>>           index.orig training   test optimism index.corrected    n
>>>> Dxy           0.6446   0.6891 0.6265   0.0626          0.5820 1000
>>>> R2            0.3245   0.3592 0.3428   0.0164          0.3081 1000
>>>> Intercept     0.0000   0.0000 0.1281  -0.1281          0.1281 1000
>>>> Slope         1.0000   1.0000 1.1104  -0.1104          1.1104 1000
>>>> Emax          0.0000   0.0000 0.0444   0.0444          0.0444 1000
>>>>
>>>> Validation revealed this approximation was not bad.
>>>> Then, I made a nomogram.
>>>>
>>>>> full.approx.lrm.nom<- nomogram(full.approx.lrm,
>>>> fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>>> plot(full.approx.lrm.nom)
>>>>
>>>> Another nomogram using the ols model:
>>>>
>>>>> full.ols.approx.nom<- nomogram(full.ols.approx,
>>>> fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>>> plot(full.ols.approx.nom)
>>>>
>>>> These two nomograms are very similar but a little bit different.
>>>>
>>>> My questions are:
>>>>
>>>> 1. Am I doing this right?
>>>>
>>>> 2. Which nomogram is correct?
>>>>
>>>> I would appreciate your help in advance.
>>>>
>>>> --
>>>> KH
>>>>
>>>
>>> -----
>>> Frank Harrell
>>> Department of Biostatistics, Vanderbilt University
>>> --
>>> View this message in context: http://r.789695.n4.nabble.com/Question-on-approximations-of-full-logistic-regression-model-tp3524294p3525372.html
>>
>> E-mail address
>> Office: khosoda_at_med.kobe-u.ac.jp
>> Home  : khosoda_at_venus.dti.ne.jp
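KH's question about bootstrap intervals versus intervals from the usual variance-covariance matrix can be explored on simulated data. What follows is a rough sketch in base R only. It assumes an unpenalized `glm` fit and a plain case-resampling percentile bootstrap, which is not the same as the penalized `lrm` fit and `bootcov` used in the thread. The predictors, true coefficients, and seed are made up for illustration.

```r
# Wald CIs (from the variance-covariance matrix) versus percentile bootstrap
# CIs for a logistic regression on simulated data.
set.seed(2011)                        # arbitrary seed
n   <- 104                            # same sample size as in the thread
x1  <- rnorm(n)                       # hypothetical continuous predictor
x2  <- rbinom(n, 1, 0.5)              # hypothetical binary predictor
y   <- rbinom(n, 1, plogis(-1.5 + 1.0 * x1 + 0.5 * x2))
dat <- data.frame(y, x1, x2)

fit     <- glm(y ~ x1 + x2, family = binomial, data = dat)
ci_wald <- confint.default(fit)       # estimate +- z * SE from vcov(fit)

# Case-resampling bootstrap: refit the model on resampled rows
B <- 500
boot_coef <- t(replicate(B, {
  idx <- sample(n, replace = TRUE)
  coef(glm(y ~ x1 + x2, family = binomial, data = dat[idx, ]))
}))

# Percentile interval for each coefficient
ci_boot <- t(apply(boot_coef, 2, quantile, probs = c(0.025, 0.975)))

# Interval widths side by side, one row per coefficient
width <- cbind(wald = ci_wald[, 2] - ci_wald[, 1],
               boot = ci_boot[, 2] - ci_boot[, 1])
print(width)
```

Rerunning with different seeds or predictor distributions shows that the ordering of the two widths can differ from one coefficient to another, which is the pattern KH describes. The sketch does not say which interval has better coverage; that would need a simulation over many data sets.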



Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 18 May 2011 - 13:20:07 GMT.
