From: Doran, Harold <HDoran_at_air.org>

Date: Tue, 22 Jul 2008 14:29:54 -0400

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 22 Jul 2008 - 18:42:43 GMT

> install.packages('fortunes')
> library(fortunes)
> fortune(28)

> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz_at_comcast.net]
> Sent: Tuesday, July 22, 2008 1:29 PM
> To: Michal Figurski
> Cc: Doran, Harold; r-help_at_r-project.org; Frank E Harrell Jr;
> Bert Gunter
> Subject: Re: [R] Coefficients of Logistic Regression from
> bootstrap - how to get them?
>
> Michal,
>
> With all due respect, you have openly acknowledged that you
> don't know enough about the subject at hand.
>
> If that is the case, on what basis are you in a position to
> challenge the collective wisdom of those professionals who
> have voluntarily offered *expert* level statistical advice to you?
>
> You have erected a wall around your thinking.
>
> You may choose to use R or any other software application to
> "Git-R-Done". But that does not make it correct.
>
> There are other methods to consider that could be used during
> the model building process itself, rather than on a post-hoc
> basis and I would specifically refer you to Frank's book,
> Regression Modeling Strategies:
>
> http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
>
> Marc Schwartz
>
> on 07/22/2008 09:43 AM Michal Figurski wrote:
> > Hmm...
> >
> > It sounds like ideology to me. I was asking for technical help. I know
> > what I want to do, just don't know how to do it in R. I'll go back to
> > SAS then. Thank you.
> >
> > --
> > Michal J. Figurski
> >
> > Doran, Harold wrote:
> >> I think the answer has been given to you. If you want to continue to
> >> ignore that advice and use bootstrap for point estimates rather than
> >> the properties of those estimates (which is what bootstrap is for)
> >> then you are on your own.
> >>> -----Original Message-----
> >>> From: r-help-bounces_at_r-project.org
> >>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Michal Figurski
> >>> Sent: Tuesday, July 22, 2008 9:52 AM
> >>> To: r-help_at_r-project.org
> >>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap
> >>> - how to get them?
> >>>
> >>> Dear all,
> >>>
> >>> I don't want to argue with anybody about words or about what
> >>> bootstrap is suitable for - I know too little for that.
> >>>
> >>> All I need is help to get the *equation coefficients* optimized by
> >>> bootstrap - either by one of the functions or by simple median.
> >>>
> >>> Please help,
> >>>
> >>> --
> >>> Michal J. Figurski
> >>> HUP, Pathology & Laboratory Medicine Xenobiotics Toxicokinetics
> >>> Research Laboratory 3400 Spruce St. 7 Maloney Philadelphia, PA 19104
> >>> tel. (215) 662-3413
> >>>
> >>> Frank E Harrell Jr wrote:
> >>>> Michal Figurski wrote:
> >>>>> Frank,
> >>>>>
> >>>>> "How does bootstrap improve on that?"
> >>>>>
> >>>>> I don't know, but I have an idea. Since the data in my set are just a
> >>>>> small sample of a big population, then if I use my whole dataset to
> >>>>> obtain max likelihood estimates, these estimates may be best for this
> >>>>> dataset, but far from ideal for the whole population.
> >>>> The bootstrap, being a resampling procedure from your sample, has the
> >>>> same issues about the population as MLEs.
> >>>>
> >>>>> I used bootstrap to virtually increase the size of my dataset, it
> >>>>> should result in estimates more close to that from the population -
> >>>>> isn't it the purpose of bootstrap?
> >>>> No
> >>>>
> >>>>> When I use such median coefficients on another dataset (another
> >>>>> sample from population), the predictions are better, than using max
> >>>>> likelihood estimates. I have already tested that and it worked!
> >>>> Then your testing procedure is probably not valid.
> >>>>
> >>>>> I am not a statistician and I don't feel what "overfitting" is, but
> >>>>> it may be just another word for the same idea.
> >>>>>
> >>>>> Nevertheless, I would still like to know how can I get the
> >>>>> coefficients for the model that gives the "nearly unbiased estimates".
> >>>>> I greatly appreciate your help.
> >>>> More info in my book Regression Modeling Strategies.
> >>>>
> >>>> Frank
> >>>>
> >>>>> --
> >>>>> Michal J. Figurski
> >>>>> HUP, Pathology & Laboratory Medicine Xenobiotics Toxicokinetics
> >>>>> Research Laboratory 3400 Spruce St. 7 Maloney Philadelphia, PA
> >>>>> 19104 tel. (215) 662-3413
> >>>>>
> >>>>> Frank E Harrell Jr wrote:
> >>>>>> Michal Figurski wrote:
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> I am trying to optimize my logistic regression model by using
> >>>>>>> bootstrap. I was previously using SAS for this kind of tasks, but I
> >>>>>>> am now switching to R.
> >>>>>>>
> >>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
> >>>>>>> single record composed of the following values: Subject_name,
> >>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
> >>>>>>> numerics are used to predict outcome using LR.
> >>>>>>>
> >>>>>>> In SAS I have written a macro, that was splitting the dataset,
> >>>>>>> running LR on one half of data and making predictions on second
> >>>>>>> half. Then it was collecting the equation coefficients from each
> >>>>>>> iteration of bootstrap. Later I was just taking medians of these
> >>>>>>> coefficients from all iterations, and used them as an optimal model
> >>>>>>> - it really worked well!
> >>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
> >>>>>> from the original fit. How does the bootstrap improve on that?
> >>>>>>
> >>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
> >>>>>>> 'calibrate' functions from package "Design", and I also
> >>>>>>> experimented with function 'sm.binomial.bootstrap' from package
> >>>>>>> "sm". I tried also the function 'boot' from package "boot", though
> >>>>>>> without success
> >>>>>>> - in my case it randomly selected _columns_ from my data frame,
> >>>>>>> while I wanted it to select _rows_.
> >>>>>> validate and calibrate in Design do resampling on the rows
> >>>>>>
> >>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
> >>>>>> model performance, i.e., to correct for overfitting.
> >>>>>>
> >>>>>> Frank Harrell
> >>>>>>
> >>>>>>> Though the main point here is the optimized LR equation. I would
> >>>>>>> appreciate any help on how to extract the LR equation coefficients
> >>>>>>> from any of these bootstrap functions, in the same form as given by
> >>>>>>> 'glm' or 'lrm'.
> >>>>>>>
> >>>>>>> Many thanks in advance!
> >>>>>>>
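
On the 'boot' question in the original post (columns resampled instead of rows): a minimal sketch of row resampling with boot(), offered as an illustration rather than as anyone's method from this thread. boot() hands the statistic function the data and a vector of resampled indices; with a data frame, d[i, ] picks rows, whereas d[i] would pick columns, which is one possible cause of the behaviour described (the post does not say what call was used). The data below are simulated, and coef_stat is a hypothetical helper name; only the column names and the 109 rows follow the post. In line with the replies above, the resampled coefficients are used to describe the variability of the maximum likelihood estimates, not to replace them.

```r
library(boot)  # recommended package, shipped with standard R installations

set.seed(1)
n <- 109  # row count taken from the original post; the values are simulated
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$numeric1))

# boot() calls the statistic with the data and a vector of row indices;
# d[i, ] (note the comma) resamples rows, while d[i] would select columns.
coef_stat <- function(d, i) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
             data = d[i, ], family = binomial)
  coef(fit)
}

b <- boot(dat, statistic = coef_stat, R = 200)

b$t0                                  # coefficients from the fit to the full data
apply(b$t, 2, sd)                     # bootstrap standard errors, one per coefficient
boot.ci(b, type = "perc", index = 2)  # percentile interval for the numeric1 coefficient
```

b$t0 holds the coefficients in the same form as coef() on a 'glm' fit, and each row of b$t holds the coefficients from one resample.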

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 22 Jul 2008 - 19:31:58 GMT.
