From: N. Lapidus <n.lapidus_at_gmail.com>

Date: Tue, 22 Jul 2008 17:07:13 +0200

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Tue 22 Jul 2008 - 15:40:52 GMT

Hi Michal,

This paper by John Fox may help you clarify what you are looking for and
carry out your analyses:

http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf
Nael
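
For readers following the thread, here is a minimal sketch in R of the mechanics asked about in the quoted messages: resampling rows with 'boot' and collecting the logistic regression coefficients from each resample. The data are simulated stand-ins; the column names follow the description in the thread, while the simulated values, the seed, the number of resamples (R = 200) and the helper name coef_fun are assumptions made for illustration only. As Frank Harrell notes in the quoted replies, there is no reason to expect the bootstrap medians to improve on the maximum likelihood estimates, so this only shows how the coefficients can be extracted.

```r
library(boot)  # recommended package shipped with R

set.seed(1)
n <- 109  # number of rows mentioned in the thread

# Simulated stand-in for the data described in the thread (hypothetical values)
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$numeric1 - 0.5 * dat$numeric2))

# Maximum likelihood fit on the full sample
fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = dat, family = binomial)

# Statistic for boot(): refit on the resampled ROWS and return the coefficients.
# Note the comma in data[idx, ]; data[idx] would select columns of a data frame.
coef_fun <- function(data, idx) {
  coef(glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = data[idx, ], family = binomial))
}

b <- boot(dat, coef_fun, R = 200)

# b$t is an R x p matrix: one row per resample, columns ordered as coef(fit)
boot_median <- apply(b$t, 2, median)
names(boot_median) <- names(coef(fit))

# Side-by-side comparison of the original fit and the bootstrap medians
print(cbind(mle = coef(fit), boot_median = boot_median))
```

The b$t0 component holds the statistic evaluated on the original data, so it should match coef(fit); boot.ci(b, index = 2, type = "perc") would give a percentile interval for the first slope.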

On Tue, Jul 22, 2008 at 3:51 PM, Michal Figurski <figurski_at_mail.med.upenn.edu> wrote:

> Dear all,
>
> I don't want to argue with anybody about words or about what bootstrap is
> suitable for - I know too little for that.
>
> All I need is help to get the *equation coefficients* optimized by
> bootstrap - either by one of the functions or by simple median.
>
> Please help,
>
> --
> Michal J. Figurski
> HUP, Pathology & Laboratory Medicine
> Xenobiotics Toxicokinetics Research Laboratory
> 3400 Spruce St. 7 Maloney
> Philadelphia, PA 19104
> tel. (215) 662-3413
>
> Frank E Harrell Jr wrote:
>
>> Michal Figurski wrote:
>>
>>> Frank,
>>>
>>> "How does bootstrap improve on that?"
>>>
>>> I don't know, but I have an idea. Since the data in my set are just a
>>> small sample of a big population, then if I use my whole dataset to obtain
>>> max likelihood estimates, these estimates may be best for this dataset, but
>>> far from ideal for the whole population.
>>>
>>
>> The bootstrap, being a resampling procedure from your sample, has the same
>> issues about the population as MLEs.
>>
>>
>>> I used bootstrap to virtually increase the size of my dataset, it should
>>> result in estimates more close to that from the population - isn't it the
>>> purpose of bootstrap?
>>>
>>
>> No
>>
>>
>>> When I use such median coefficients on another dataset (another sample
>>> from population), the predictions are better, than using max likelihood
>>> estimates. I have already tested that and it worked!
>>>
>>
>> Then your testing procedure is probably not valid.
>>
>>
>>> I am not a statistician and I don't feel what "overfitting" is, but it
>>> may be just another word for the same idea.
>>>
>>> Nevertheless, I would still like to know how can I get the coefficients
>>> for the model that gives the "nearly unbiased estimates". I greatly
>>> appreciate your help.
>>>
>>
>> More info in my book Regression Modeling Strategies.
>>
>> Frank
>>
>>
>>> --
>>> Michal J. Figurski
>>> HUP, Pathology & Laboratory Medicine
>>> Xenobiotics Toxicokinetics Research Laboratory
>>> 3400 Spruce St. 7 Maloney
>>> Philadelphia, PA 19104
>>> tel. (215) 662-3413
>>>
>>> Frank E Harrell Jr wrote:
>>>
>>>> Michal Figurski wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am trying to optimize my logistic regression model by using
>>>>> bootstrap. I was previously using SAS for this kind of tasks, but I am now
>>>>> switching to R.
>>>>>
>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>> single record composed of the following values: Subject_name, numeric1,
>>>>> numeric2, numeric3 and outcome (yes or no). All three numerics are used to
>>>>> predict outcome using LR.
>>>>>
>>>>> In SAS I have written a macro, that was splitting the dataset, running
>>>>> LR on one half of data and making predictions on second half. Then it was
>>>>> collecting the equation coefficients from each iteration of bootstrap. Later
>>>>> I was just taking medians of these coefficients from all iterations, and
>>>>> used them as an optimal model - it really worked well!
>>>>>
>>>>
>>>> Why not use maximum likelihood estimation, i.e., the coefficients from
>>>> the original fit. How does the bootstrap improve on that?
>>>>
>>>>
>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>> 'calibrate' functions from package "Design", and I also experimented with
>>>>> function 'sm.binomial.bootstrap' from package "sm". I tried also the
>>>>> function 'boot' from package "boot", though without success - in my case it
>>>>> randomly selected _columns_ from my data frame, while I wanted it to select
>>>>> _rows_.
>>>>>
>>>>
>>>> validate and calibrate in Design do resampling on the rows
>>>>
>>>> Resampling is mainly used to get a nearly unbiased estimate of the model
>>>> performance, i.e., to correct for overfitting.
>>>>
>>>> Frank Harrell
>>>>
>>>>
>>>>> Though the main point here is the optimized LR equation. I would
>>>>> appreciate any help on how to extract the LR equation coefficients from any
>>>>> of these bootstrap functions, in the same form as given by 'glm' or 'lrm'.
>>>>>
>>>>> Many thanks in advance!


Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Tue 22 Jul 2008 - 16:31:53 GMT.
