Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

From: Gustaf Rydevik <gustaf.rydevik_at_gmail.com>
Date: Wed, 23 Jul 2008 16:39:46 +0200

On Wed, Jul 23, 2008 at 4:08 PM, Michal Figurski <figurski_at_mail.med.upenn.edu> wrote:
> Gustaf,
>
> I am sorry, but I don't get the point. Let's just focus on predictive
> performance from the cited passage, that is, the number of values predicted
> within 15% of the original value.
> So, the predictive performance of the model fit on the entire dataset was 56%
> of profiles, while that of the bootstrapped model was 82% of profiles. Well, I
> see a stunning purpose in the bootstrap step here: it turns a useless
> equation into a clinically applicable model!
>
> Honestly, I also can't see how this can be better than fitting on the entire
> dataset, but here you have proof that it is.
>
> I think that another argument supporting this approach is model validation.
> If you fit the model on the entire dataset, you have no data left to validate
> its predictions.
>
> On the other hand, I agree with you that the passage in methods section
> looks awkward.
>
> In my work on a similar problem, which is going to appear in August in Ther
> Drug Monit, I used medians from the beginning, and all the comparisons were
> based on models with median coefficients. I think this is what the authors
> of that paper did, though they may simply have had trouble describing
> it correctly, and unfortunately it passed through the review process unchanged.
>
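To make the "within 15%" criterion concrete, here is a minimal Python sketch. It assumes the criterion means a relative error |predicted - observed| / |observed| of at most 0.15 for each profile; the function name `fraction_within` and the numbers below are hypothetical and not taken from either paper.

```python
from typing import Sequence


def fraction_within(observed: Sequence[float], predicted: Sequence[float],
                    tol: float = 0.15) -> float:
    """Share of predictions whose relative error |pred - obs| / |obs| is at most tol."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1 for obs, pred in zip(observed, predicted)
        if abs(pred - obs) <= tol * abs(obs)  # compare without dividing by obs
    )
    return hits / len(observed)


# Hypothetical values, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [11.0, 25.0, 29.0, 40.5]
print(fraction_within(obs, pred))  # 0.75 (the 20 -> 25 prediction is off by 25%)
```

Multiplying the result by 100 gives a percentage of profiles, the form in which the 56% and 82% figures above are reported.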

Hi,

I believe that you misunderstand the passage. Do you know what multiple stepwise regression is?

Since they used SPSS, I copied the following description from
http://www.visualstatistics.net/SPSS%20workbook/stepwise_multiple_regression.htm

"Stepwise selection is a combination of forward and backward procedures.

Step 1

The first predictor variable is selected in the same way as in forward selection. If the probability associated with the test of significance is less than or equal to the default .05, the predictor variable with the largest correlation with the criterion variable enters the equation first.

Step 2

The second variable is selected based on the highest partial correlation. If it can pass the entry requirement (PIN=.05), it also enters the equation.

Step 3

From this point, stepwise selection differs from forward selection:
the variables already in the equation are examined for removal according to the removal criterion (POUT=.10) as in backward elimination.

Step 4

Variables not in the equation are examined for entry. Variable selection ends when no more variables meet entry and removal criteria."
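The four steps quoted above amount to an enter/remove loop. A minimal Python sketch follows; it is not SPSS's implementation. It assumes the caller supplies a hypothetical `p_value(model_vars, var)` function that returns the p-value of `var`'s coefficient in a model fitted with exactly `model_vars`. For linear regression, entering the candidate with the smallest entry p-value is equivalent to entering the one with the largest partial correlation, since all candidates at a step are tested with the same degrees of freedom. PIN and POUT default to the .05 and .10 quoted above; treating "p > POUT" as the removal rule is an assumption.

```python
from typing import Callable, FrozenSet, List

# Hypothetical interface: p_value(model_vars, var) returns the p-value of
# `var`'s coefficient in a model fitted with exactly `model_vars`.
PValueFn = Callable[[FrozenSet[str], str], float]


def stepwise_select(candidates: List[str], p_value: PValueFn,
                    p_in: float = 0.05, p_out: float = 0.10,
                    max_steps: int = 100) -> List[str]:
    """Forward entry with backward removal, using PIN/POUT thresholds."""
    if p_in >= p_out:
        raise ValueError("p_in must be smaller than p_out to avoid cycling")
    selected: List[str] = []
    for _ in range(max_steps):
        changed = False
        # Entry (steps 1, 2 and 4): the remaining variable with the smallest
        # p-value enters if that p-value is <= PIN.
        remaining = [v for v in candidates if v not in selected]
        if remaining:
            p_enter = {v: p_value(frozenset(selected + [v]), v) for v in remaining}
            best = min(remaining, key=p_enter.__getitem__)
            if p_enter[best] <= p_in:
                selected.append(best)
                changed = True
        # Removal (step 3): the variable in the equation with the largest
        # p-value is dropped if that p-value is > POUT (assumed rule).
        if selected:
            model = frozenset(selected)
            p_stay = {v: p_value(model, v) for v in selected}
            worst = max(selected, key=p_stay.__getitem__)
            if p_stay[worst] > p_out:
                selected.remove(worst)
                changed = True
        # Stop when no variable meets the entry or removal criterion.
        if not changed:
            break
    return selected


# Toy p-values, invented for illustration: "a" enters first, "b" enters
# next and makes "a" non-significant, so "a" is removed; "c" never enters.
TOY = {
    (frozenset({"a"}), "a"): 0.001,
    (frozenset({"b"}), "b"): 0.01,
    (frozenset({"c"}), "c"): 0.30,
    (frozenset({"a", "b"}), "a"): 0.20,
    (frozenset({"a", "b"}), "b"): 0.002,
    (frozenset({"a", "c"}), "c"): 0.40,
    (frozenset({"b", "c"}), "c"): 0.50,
}


def toy_p(model: FrozenSet[str], var: str) -> float:
    return TOY[(model, var)]


print(stepwise_select(["a", "b", "c"], toy_p))  # ['b']
```

The toy run shows why the outcome of the whole selection process matters: a variable that entered early ("a") can be dropped again once another variable is in the equation.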


It is the outcome of this *entire process* (steps 1-4) that they compare with the outcome of their *entire bootstrap/cross-validation/selection process* (steps 1-4 in the methods section), and they find that their approach gives better results.
What you are doing is only step 4 in the article's methods section: estimating the parameters of a model *when you already know which variables to include*. It is the way this step is conducted that I am sceptical about.
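The step under discussion (refitting a model with a fixed set of variables on bootstrap resamples and taking the median of each coefficient) can be sketched in Python as below. This is a minimal illustration, not the paper's or Michal's exact procedure: `bootstrap_median_coefs` accepts any `fit` function returning a list of coefficients, and `fit_simple_ols` is a hypothetical one-predictor least-squares fitter used only for the demo.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float]  # (x, y) pair, for the demo fitter only


def fit_simple_ols(rows: Sequence[Row]) -> List[float]:
    """Hypothetical demo fitter: least-squares [b0, b1] for y = b0 + b1 * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance in this sample")
    b1 = sum((x - mx) * (y - my) for x, y in rows) / sxx
    return [my - b1 * mx, b1]


def bootstrap_median_coefs(rows: Sequence[Row],
                           fit: Callable[[Sequence[Row]], List[float]],
                           n_boot: int = 1000, seed: int = 0) -> List[float]:
    """Refit on n_boot resamples (drawn with replacement) and return
    the median of each coefficient taken separately."""
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        try:
            draws.append(fit(sample))
        except ValueError:
            continue  # skip degenerate resamples (e.g. all x identical)
    if not draws:
        raise RuntimeError("no bootstrap resample could be fitted")
    # zip(*draws) groups the draws by coefficient position.
    return [statistics.median(col) for col in zip(*draws)]


# Demo on noise-free data y = 2 + 3x, where every fit recovers the line.
rows = [(float(x), 2.0 + 3.0 * x) for x in range(10)]
print([round(c, 6) for c in bootstrap_median_coefs(rows, fit_simple_ols, n_boot=200)])
# [2.0, 3.0]
```

Note that the vector of per-coefficient medians need not coincide with the coefficients of any single fitted resample.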

Regards,

Gustaf

-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 23 Jul 2008 - 16:03:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Jul 2008 - 23:32:16 GMT.
