Re: [R] [Fwd: Re: Coefficients of Logistic Regression from bootstrap - how to get them?]

From: Gustaf Rydevik <gustaf.rydevik_at_gmail.com>
Date: Wed, 23 Jul 2008 15:45:06 +0200

On Wed, Jul 23, 2008 at 3:14 PM, Michal Figurski <figurski_at_mail.med.upenn.edu> wrote:
> I think the argument supporting the use of bootstrap to determine
> coefficients, as opposed to just running linear regression on the whole
> dataset, is the comparison of Rsq and prediction errors between these
> two approaches - page 1502. There's a substantial difference in favor of
> the bootstrap approach.
>
> --
> Michal J. Figurski
>

Are you talking about this passage?

"A commonly used approach for establishing estimation models is to perform a multiple stepwise linear regression on the total set of full AUCs (19). When we used that approach, we obtained a r2 value of 0.74 and a prediction error of 7.6% ± 26.7%, (median, 6.5%; 95% CI, -51.9% to 67.5%), and the model estimated MPA AUC to within 15% of the full value in 56% of the profiles. Our estimation model using the repeated cross-validation approach was significantly better, with a r2 value of 0.862, prediction error of 6.1% ± 19%, (median, 3.0%; 95% CI, -33.1% to 32%), and estimation of MPA AUC to within 15% of the value (when all 12 samples are used to calculate MPA AUC) in 82% of the profiles".

As far as I can tell, they are talking about the disadvantage of using stepwise regression to select the variables in the regression, compared with the bootstrap/CV approach. And this might well be true.

It is the following part of the methods description that seems unmotivated to me: "Once the general model (of the 26) was selected, the proposed regression coefficients were taken as the median of the distribution of regression coefficient values described in step 2."

That is, after having decided on the model that uses C0, C0.5 and C2, they use the median of the bootstrap estimates (which is more or less what the R code I wrote does) instead of fitting that model to the entire data set. I don't see how this could be better, since we can't get any more information out of the data than was there from the beginning. And I believe this is what all the other people on the list are trying to tell you: that it's a step without purpose.

You have to distinguish between finding out which model is best, which the bootstrap can be useful for, and estimating the parameters of the final, chosen model, where bootstrapping several regressions and taking the median is most likely no better than standard regression.
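
To make that last point concrete, here is a rough sketch (in Python rather than R, and not the code I posted earlier in the thread). It assumes a simple one-predictor linear model and simulated data; the names ols_fit, bootstrap_median_fit and simulate, and the true coefficients 2.0 and 0.5, are made up for the illustration. It fits the model once on the whole data set and then takes the median of the coefficients over bootstrap resamples, so the two sets of estimates can be compared side by side.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_median_fit(xs, ys, n_boot=1000, seed=0):
    """Refit on resampled (x, y) pairs and return the median coefficients."""
    rng = random.Random(seed)
    n = len(xs)
    intercepts, slopes = [], []
    for _ in range(n_boot):
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: slope is undefined
        a, b = ols_fit(bx, by)
        intercepts.append(a)
        slopes.append(b)
    return statistics.median(intercepts), statistics.median(slopes)


def simulate(n=50, seed=1):
    """Simulated data (assumption): y = 2.0 + 0.5*x + standard normal noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


xs, ys = simulate()
full = ols_fit(xs, ys)
boot = bootstrap_median_fit(xs, ys)
print("full-data fit    (intercept, slope):", full)
print("bootstrap median (intercept, slope):", boot)
```

In a setup like this the two pairs of numbers should come out very close to each other, with any gap small compared with the sampling uncertainty of the coefficients. That is the sense in which the median step adds nothing for the final model. This is only an illustration on made-up data, not a reanalysis of the paper.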

best regards,

Gustaf

-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 23 Jul 2008 - 15:40:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Jul 2008 - 16:32:10 GMT.
