Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

From: Doran, Harold <HDoran_at_air.org>
Date: Tue, 22 Jul 2008 14:29:54 -0400


> install.packages('fortunes')
> library(fortunes)
> fortune(28)

> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz_at_comcast.net]
> Sent: Tuesday, July 22, 2008 1:29 PM
> To: Michal Figurski
> Cc: Doran, Harold; r-help_at_r-project.org; Frank E Harrell Jr;
> Bert Gunter
> Subject: Re: [R] Coefficients of Logistic Regression from
> bootstrap - how to get them?
>
> Michal,
>
> With all due respect, you have openly acknowledged that you
> don't know enough about the subject at hand.
>
> If that is the case, on what basis are you in a position to
> challenge the collective wisdom of those professionals who
> have voluntarily offered *expert* level statistical advice to you?
>
> You have erected a wall around your thinking.
>
> You may choose to use R or any other software application to
> "Git-R-Done". But that does not make it correct.
>
> There are other methods to consider that could be used during
> the model building process itself, rather than on a post-hoc
> basis and I would specifically refer you to Frank's book,
> Regression Modeling Strategies:
>
> http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
>
> Marc Schwartz
>
> on 07/22/2008 09:43 AM Michal Figurski wrote:
> > Hmm...
> >
> > It sounds like ideology to me. I was asking for technical help. I know
> > what I want to do, just don't know how to do it in R. I'll go back to
> > SAS then. Thank you.
> >
> > --
> > Michal J. Figurski
> >
> > Doran, Harold wrote:
> >> I think the answer has been given to you. If you want to continue to
> >> ignore that advice and use bootstrap for point estimates rather than
> >> the properties of those estimates (which is what bootstrap is for)
> >> then you are on your own.
> >>> -----Original Message-----
> >>> From: r-help-bounces_at_r-project.org
> >>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Michal Figurski
> >>> Sent: Tuesday, July 22, 2008 9:52 AM
> >>> To: r-help_at_r-project.org
> >>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap
> >>> - how to get them?
> >>>
> >>> Dear all,
> >>>
> >>> I don't want to argue with anybody about words or about what
> >>> bootstrap is suitable for - I know too little for that.
> >>>
> >>> All I need is help to get the *equation coefficients* optimized by
> >>> bootstrap - either by one of the functions or by simple median.
> >>>
> >>> Please help,
> >>>
> >>> --
> >>> Michal J. Figurski
> >>> HUP, Pathology & Laboratory Medicine Xenobiotics Toxicokinetics
> >>> Research Laboratory 3400 Spruce St. 7 Maloney Philadelphia, PA 19104
> >>> tel. (215) 662-3413
> >>>
> >>> Frank E Harrell Jr wrote:
> >>>> Michal Figurski wrote:
> >>>>> Frank,
> >>>>>
> >>>>> "How does bootstrap improve on that?"
> >>>>>
> >>>>> I don't know, but I have an idea. Since the data in my set are just a
> >>>>> small sample of a big population, then if I use my whole dataset to
> >>>>> obtain max likelihood estimates, these estimates may be best for this
> >>>>> dataset, but far from ideal for the whole population.
> >>>> The bootstrap, being a resampling procedure from your sample, has the
> >>>> same issues about the population as MLEs.
> >>>>
> >>>>> I used bootstrap to virtually increase the size of my dataset, it
> >>>>> should result in estimates more close to that from the population -
> >>>>> isn't it the purpose of bootstrap?
> >>>> No
> >>>>
> >>>>> When I use such median coefficients on another dataset (another
> >>>>> sample from population), the predictions are better than using max
> >>>>> likelihood estimates. I have already tested that and it worked!
> >>>> Then your testing procedure is probably not valid.
> >>>>
> >>>>> I am not a statistician and I don't feel what "overfitting" is, but
> >>>>> it may be just another word for the same idea.
> >>>>>
> >>>>> Nevertheless, I would still like to know how can I get the
> >>>>> coefficients for the model that gives the "nearly unbiased estimates".
> >>>>> I greatly appreciate your help.
> >>>> More info in my book Regression Modeling Strategies.
> >>>>
> >>>> Frank
> >>>>
> >>>>> --
> >>>>> Michal J. Figurski
> >>>>> HUP, Pathology & Laboratory Medicine Xenobiotics Toxicokinetics
> >>>>> Research Laboratory 3400 Spruce St. 7 Maloney Philadelphia, PA
> >>>>> 19104 tel. (215) 662-3413
> >>>>>
> >>>>> Frank E Harrell Jr wrote:
> >>>>>> Michal Figurski wrote:
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> I am trying to optimize my logistic regression model by using
> >>>>>>> bootstrap. I was previously using SAS for this kind of tasks, but I
> >>>>>>> am now switching to R.
> >>>>>>>
> >>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
> >>>>>>> single record composed of the following values: Subject_name,
> >>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
> >>>>>>> numerics are used to predict outcome using LR.
> >>>>>>>
> >>>>>>> In SAS I have written a macro, that was splitting the dataset,
> >>>>>>> running LR on one half of data and making predictions on second
> >>>>>>> half. Then it was collecting the equation coefficients from each
> >>>>>>> iteration of bootstrap. Later I was just taking medians of these
> >>>>>>> coefficients from all iterations, and used them as an optimal model
> >>>>>>> - it really worked well!
> >>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
> >>>>>> from the original fit. How does the bootstrap improve on that?
> >>>>>>
> >>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
> >>>>>>> 'calibrate' functions from package "Design", and I also
> >>>>>>> experimented with function 'sm.binomial.bootstrap' from package
> >>>>>>> "sm". I tried also the function 'boot' from package "boot", though
> >>>>>>> without success
> >>>>>>> - in my case it randomly selected _columns_ from my data frame,
> >>>>>>> while I wanted it to select _rows_.
> >>>>>> validate and calibrate in Design do resampling on the rows
> >>>>>>
> >>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
> >>>>>> model performance, i.e., to correct for overfitting.
> >>>>>>
> >>>>>> Frank Harrell
> >>>>>>
> >>>>>>> Though the main point here is the optimized LR equation. I would
> >>>>>>> appreciate any help on how to extract the LR equation coefficients
> >>>>>>> from any of these bootstrap functions, in the same form as given by
> >>>>>>> 'glm' or 'lrm'.
> >>>>>>>
> >>>>>>> Many thanks in advance!
> >>>>>>>
>
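
For reference, a minimal sketch of row-wise resampling with boot() from package "boot", in line with the advice above: the maximum likelihood fit supplies the point estimates, and the bootstrap describes their variability. The data frame dat, the function coef_fun and all simulated values are assumptions made up for illustration; only the column names follow the description in the original post. It is not the poster's SAS macro and not a recommendation to average or take medians of bootstrap coefficients.

```r
library(boot)  # recommended package, shipped with standard R installations

set.seed(1)

# Simulated stand-in for the data described in the thread: 109 rows,
# three numeric predictors and a binary outcome. All values are made up.
n <- 109
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$numeric1 - 0.6 * dat$numeric2))

# Statistic function: boot() supplies 'idx', a vector of resampled ROW
# indices, so data[idx, ] is one bootstrap sample of whole records.
coef_fun <- function(data, idx) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
             family = binomial, data = data[idx, ])
  coef(fit)
}

b <- boot(dat, coef_fun, R = 500)

b$t0                           # ML coefficients from the original fit, as coef(glm(...))
boot_se <- apply(b$t, 2, sd)   # bootstrap standard errors, one per coefficient
boot.ci(b, type = "perc", index = 2)  # percentile interval for the numeric1 coefficient
```

Here b$t0 holds the coefficients in the same form as given by 'glm', and each row of b$t holds the coefficients from one resample, so the spread of b$t describes the uncertainty of the estimates rather than replacing them.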



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 22 Jul 2008 - 18:42:43 GMT

