Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

From: Michal Figurski <figurski_at_mail.med.upenn.edu>
Date: Mon, 21 Jul 2008 15:28:38 -0400

Frank,

"How does bootstrap improve on that?"

I don't know, but I have an idea. The data in my set are just a small sample from a large population, so if I use the whole dataset to obtain maximum likelihood estimates, those estimates may be the best for this dataset but far from ideal for the whole population.

I used the bootstrap to virtually increase the size of my dataset, which should give estimates closer to those for the population. Isn't that the purpose of the bootstrap?

When I apply these median coefficients to another dataset (another sample from the population), the predictions are better than those from the maximum likelihood estimates. I have already tested that and it worked!

I am not a statistician and I don't have a feel for what "overfitting" is, but it may be just another word for the same idea.

Nevertheless, I would still like to know how I can get the coefficients for the model that gives the "nearly unbiased estimates". I greatly appreciate your help.
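
For illustration, here is a minimal sketch of one way to collect logistic regression coefficients from each bootstrap replicate using only base R. The data are simulated stand-ins (the real data are not shown in this thread), the column names follow the description in the original post, and the number of replicates B is an arbitrary assumption.

```r
# Simulated stand-in for the 109-row data frame described in the thread
# (the real data are not shown, so these values are an assumption).
set.seed(42)
n <- 109
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(-0.2 + 0.8 * dat$numeric1 - 0.5 * dat$numeric2))

model <- outcome ~ numeric1 + numeric2 + numeric3

# Maximum likelihood fit on the full data set.
full_fit <- glm(model, data = dat, family = binomial)

# Bootstrap: resample ROWS with replacement and refit each time.
# Note that dat[idx, ] (with the comma) selects rows; dat[idx] would select columns.
B <- 200  # number of replicates, chosen arbitrarily for illustration
boot_coefs <- t(replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)
  coef(glm(model, data = dat[idx, ], family = binomial))
}))

# boot_coefs has one row per replicate and one column per coefficient,
# with the same names as coef(full_fit).
boot_medians <- apply(boot_coefs, 2, median)

# Side-by-side comparison of the full-data estimates and the bootstrap medians.
print(cbind(full_fit = coef(full_fit), boot_median = boot_medians))
```

This only shows how the coefficients can be extracted in the same form as given by 'glm'. It does not settle whether the medians are preferable to the full-data fit, which is the question raised in the reply quoted below.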

--
Michal J. Figurski
HUP, Pathology & Laboratory Medicine
Xenobiotics Toxicokinetics Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

Frank E Harrell Jr wrote:

> Michal Figurski wrote:
>> Hello all,
>>
>> I am trying to optimize my logistic regression model by using the
>> bootstrap. I was previously using SAS for this kind of task, but I am
>> now switching to R.
>>
>> My data frame consists of 5 columns and has 109 rows. Each row is a
>> single record composed of the following values: Subject_name,
>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
>> numerics are used to predict outcome using LR.
>>
>> In SAS I wrote a macro that split the dataset, ran LR on one half of
>> the data and made predictions on the other half. It then collected the
>> equation coefficients from each iteration of the bootstrap. Later I
>> took the medians of these coefficients over all iterations and used
>> them as the optimal model - it really worked well!
>
> Why not use maximum likelihood estimation, i.e., the coefficients from
> the original fit? How does the bootstrap improve on that?
>
>>
>> Now I want to do the same in R. I tried to use the 'validate' or
>> 'calibrate' functions from package "Design", and I also experimented
>> with function 'sm.binomial.bootstrap' from package "sm". I also tried
>> the function 'boot' from package "boot", though without success - in
>> my case it randomly selected _columns_ from my data frame, while I
>> wanted it to select _rows_.
>
> validate and calibrate in Design do resampling on the rows.
>
> Resampling is mainly used to get a nearly unbiased estimate of the model
> performance, i.e., to correct for overfitting.
>
> Frank Harrell
>
>>
>> Though the main point here is the optimized LR equation. I would
>> appreciate any help on how to extract the LR equation coefficients
>> from any of these bootstrap functions, in the same form as given by
>> 'glm' or 'lrm'.
>>
>> Many thanks in advance!
>>
>
>
______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 21 Jul 2008 - 19:33:03 GMT