From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>

Date: Mon, 21 Jul 2008 18:22:26 -0500

Michal Figurski wrote:

> Frank,
>
> "How does bootstrap improve on that?"
>
> I don't know, but I have an idea. Since the data in my set are just a
> small sample of a big population, then if I use my whole dataset to
> obtain max likelihood estimates, these estimates may be best for this
> dataset, but far from ideal for the whole population.

The bootstrap, being a resampling procedure from your sample, has the same issues about the population as MLEs.
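
A small simulation can make this concrete. The sketch below is illustrative only: the data are simulated, the "population" coefficients in true_beta are an assumption made up for the example, and the column names simply mirror those described in the original question. It refits the model on bootstrap resamples of the rows and compares the median bootstrap coefficients with the maximum likelihood estimates from the full sample.

```r
# Hypothetical simulated data: 109 rows, three numeric predictors, binary outcome.
set.seed(1)
n <- 109
d <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
true_beta <- c(-0.5, 1.0, -0.7, 0.3)   # assumed "population" coefficients
lp <- true_beta[1] + drop(as.matrix(d) %*% true_beta[-1])
d$outcome <- rbinom(n, 1, plogis(lp))

# Maximum likelihood fit on the full sample.
fit <- glm(outcome ~ numeric1 + numeric2 + numeric3, family = binomial, data = d)
mle <- coef(fit)

# Refit on B bootstrap resamples of the rows and keep the coefficients.
B <- 500
boot_coefs <- t(replicate(B, {
  idx <- sample(n, replace = TRUE)
  coef(glm(outcome ~ numeric1 + numeric2 + numeric3,
           family = binomial, data = d[idx, ]))
}))
boot_median <- apply(boot_coefs, 2, median)

# Population values, sample MLEs, and bootstrap medians side by side.
print(round(cbind(population = true_beta, mle = mle,
                  boot_median = boot_median), 3))
```

Because every resample is drawn from the same 109 rows, the bootstrap medians would be expected to sit near the sample MLEs rather than nearer the population values; the resamples contain no information that the original sample lacks.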

> I used bootstrap to virtually increase the size of my dataset, it should
> result in estimates more close to that from the population - isn't it
> the purpose of bootstrap?

No

> When I use such median coefficients on another dataset (another sample
> from population), the predictions are better, than using max likelihood
> estimates. I have already tested that and it worked!

Then your testing procedure is probably not valid.

> I am not a statistician and I don't feel what "overfitting" is, but it
> may be just another word for the same idea.
>
> Nevertheless, I would still like to know how can I get the coefficients
> for the model that gives the "nearly unbiased estimates". I greatly
> appreciate your help.

More info in my book Regression Modeling Strategies.
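
As a rough illustration of what "nearly unbiased" refers to, here is a hand-rolled sketch of an optimism-corrected bootstrap for one performance measure (the Brier score). It is not the code used by any package; the data are simulated, and the helper names fit_model and brier are made up for this example. The coefficients stay those of the fit to the full sample; resampling only adjusts the estimate of how well that fit will predict.

```r
# Hypothetical simulated data with the same shape as in the question.
set.seed(3)
n <- 109
d <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
d$outcome <- rbinom(n, 1, plogis(0.8 * d$numeric1 - 0.5 * d$numeric2))

# Illustrative helper: fit the logistic model to a data frame.
fit_model <- function(data) {
  glm(outcome ~ numeric1 + numeric2 + numeric3, family = binomial, data = data)
}

# Brier score: mean squared difference between predicted probability and outcome.
brier <- function(model, data) {
  p <- predict(model, newdata = data, type = "response")
  mean((p - data$outcome)^2)
}

full_fit <- fit_model(d)
apparent <- brier(full_fit, d)   # optimistic: scored on the data used for fitting

B <- 200
optimism <- replicate(B, {
  boot_d <- d[sample(n, replace = TRUE), ]
  boot_fit <- fit_model(boot_d)
  # Score on the model's own resample minus its score on the original sample.
  brier(boot_fit, boot_d) - brier(boot_fit, d)
})

# Subtract the average optimism from the apparent score.
corrected <- apparent - mean(optimism)
print(c(apparent = apparent, optimism = mean(optimism), corrected = corrected))

# The model to report is still the full-sample maximum likelihood fit.
print(coef(full_fit))
```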

Frank

> --
> Michal J. Figurski
> HUP, Pathology & Laboratory Medicine
> Xenobiotics Toxicokinetics Research Laboratory
> 3400 Spruce St. 7 Maloney
> Philadelphia, PA 19104
> tel. (215) 662-3413
>
> Frank E Harrell Jr wrote:

>> Michal Figurski wrote:
>>> Hello all,
>>>
>>> I am trying to optimize my logistic regression model by using
>>> bootstrap. I was previously using SAS for this kind of tasks, but I
>>> am now switching to R.
>>>
>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>> single record composed of the following values: Subject_name,
>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
>>> numerics are used to predict outcome using LR.
>>>
>>> In SAS I have written a macro, that was splitting the dataset,
>>> running LR on one half of data and making predictions on second half.
>>> Then it was collecting the equation coefficients from each iteration
>>> of bootstrap. Later I was just taking medians of these coefficients
>>> from all iterations, and used them as an optimal model - it really
>>> worked well!
>>
>> Why not use maximum likelihood estimation, i.e., the coefficients from
>> the original fit. How does the bootstrap improve on that?
>>
>>> Now I want to do the same in R. I tried to use the 'validate' or
>>> 'calibrate' functions from package "Design", and I also experimented
>>> with function 'sm.binomial.bootstrap' from package "sm". I tried also
>>> the function 'boot' from package "boot", though without success - in
>>> my case it randomly selected _columns_ from my data frame, while I
>>> wanted it to select _rows_.
>>
>> validate and calibrate in Design do resampling on the rows
>>
>> Resampling is mainly used to get a nearly unbiased estimate of the
>> model performance, i.e., to correct for overfitting.
>>
>> Frank Harrell
>>
>>> Though the main point here is the optimized LR equation. I would
>>> appreciate any help on how to extract the LR equation coefficients
>>> from any of these bootstrap functions, in the same form as given by
>>> 'glm' or 'lrm'.
>>>
>>> Many thanks in advance!
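
On the row-versus-column problem with 'boot': one possible cause (a guess, since the original code was not posted) is indexing the data frame as data[indices], which selects columns in R, instead of data[indices, ], which selects rows. The sketch below uses simulated data shaped like the data frame described above; the function name coef_stat is made up for the example. It shows 'boot' resampling rows and returning the coefficients in the same form as coef() gives for a 'glm' fit.

```r
library(boot)  # a recommended package, normally installed with R

# Hypothetical stand-in for the 109-row data frame described in the thread.
set.seed(2)
n <- 109
dat <- data.frame(
  Subject_name = paste0("S", seq_len(n)),
  numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n)
)
p_yes <- plogis(0.8 * dat$numeric1 - 0.5 * dat$numeric2)
dat$outcome <- factor(ifelse(rbinom(n, 1, p_yes) == 1, "yes", "no"))

# boot() calls the statistic with the data and a vector of row indices.
# Note the comma: data[indices, ] picks rows; data[indices] would pick columns.
coef_stat <- function(data, indices) {
  resampled <- data[indices, ]
  coef(glm(outcome ~ numeric1 + numeric2 + numeric3,
           family = binomial, data = resampled))
}

res <- boot(data = dat, statistic = coef_stat, R = 200)

# res$t0: coefficients from the original data (the maximum likelihood fit).
print(res$t0)

# res$t: one row of coefficients per bootstrap resample (200 x 4 here).
boot_coefs <- res$t
colnames(boot_coefs) <- names(res$t0)
print(head(boot_coefs))
```

The spread of the columns of boot_coefs describes the sampling variability of each coefficient; it does not supply a better point estimate than res$t0.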

-- 
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 21 Jul 2008 - 23:26:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 22 Jul 2008 - 14:31:59 GMT.
