Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

From: N. Lapidus <n.lapidus_at_gmail.com>
Date: Tue, 22 Jul 2008 17:07:13 +0200

Hi Michal,
This paper by John Fox may help you clarify what you are looking for and carry out your analyses:
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf

Nael
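
For illustration, here is a minimal sketch of case (row) resampling with 'boot', in the spirit of that appendix rather than copied from it. The simulated data frame 'dat', the helper 'coef_fun' and the choice of 500 replicates are assumptions made up for this example; only the column names follow the description in the original post. The statistic function receives the data and a vector of row indices, and the comma in data[idx, ] is what makes it pick rows: data[idx] on a data frame selects columns.

```r
library(boot)  # recommended package, shipped with standard R installations

set.seed(1)

# Hypothetical stand-in for the 109-row data set described in the thread
n <- 109
dat <- data.frame(numeric1 = rnorm(n),
                  numeric2 = rnorm(n),
                  numeric3 = rnorm(n))
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$numeric1 -
                                   0.6 * dat$numeric2 + 0.4 * dat$numeric3))

# Statistic: refit the model on the resampled rows and return its coefficients.
# Note the comma in data[idx, ]; data[idx] would index columns instead.
coef_fun <- function(data, idx) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
             family = binomial, data = data[idx, ])
  coef(fit)
}

b <- boot(dat, coef_fun, R = 500)

# Per-coefficient medians over the bootstrap replicates
boot_medians <- apply(b$t, 2, median)
names(boot_medians) <- names(b$t0)

# Percentile interval for the second coefficient (numeric1)
boot_ci <- boot.ci(b, type = "perc", index = 2)

print(rbind(mle = b$t0, bootstrap_median = boot_medians))
```

Here b$t0 holds the coefficients from the fit to the original data (the maximum likelihood estimates) and b$t holds one row of coefficients per resample, in the same order as 'glm' reports them. As Frank points out in the quoted messages below, resampling from the sample has the same limitations with respect to the population as the original fit, so the replicates are mainly useful for describing the variability of the estimates.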

On Tue, Jul 22, 2008 at 3:51 PM, Michal Figurski <figurski_at_mail.med.upenn.edu> wrote:

> Dear all,
>
> I don't want to argue with anybody about words or about what bootstrap is
> suitable for - I know too little for that.
>
> All I need is help to get the *equation coefficients* optimized by
> bootstrap - either by one of the functions or by simple median.
>
> Please help,
>
> --
> Michal J. Figurski
> HUP, Pathology & Laboratory Medicine
> Xenobiotics Toxicokinetics Research Laboratory
> 3400 Spruce St. 7 Maloney
> Philadelphia, PA 19104
> tel. (215) 662-3413
>
> Frank E Harrell Jr wrote:
>
>> Michal Figurski wrote:
>>
>>> Frank,
>>>
>>> "How does bootstrap improve on that?"
>>>
>>> I don't know, but I have an idea. Since the data in my set are just a
>>> small sample of a big population, then if I use my whole dataset to obtain
>>> max likelihood estimates, these estimates may be best for this dataset, but
>>> far from ideal for the whole population.
>>>
>>
>> The bootstrap, being a resampling procedure from your sample, has the same
>> issues about the population as MLEs.
>>
>>
>>> I used bootstrap to virtually increase the size of my dataset, so it should
>>> result in estimates closer to those from the population - isn't that the
>>> purpose of bootstrap?
>>>
>>
>> No
>>
>>
>>> When I use such median coefficients on another dataset (another sample
>>> from the population), the predictions are better than with max likelihood
>>> estimates. I have already tested that and it worked!
>>>
>>
>> Then your testing procedure is probably not valid.
>>
>>
>>> I am not a statistician and I don't have a feel for what "overfitting" is,
>>> but it may be just another word for the same idea.
>>>
>>> Nevertheless, I would still like to know how I can get the coefficients
>>> for the model that gives the "nearly unbiased estimates". I greatly
>>> appreciate your help.
>>>
>>
>> More info in my book Regression Modeling Strategies.
>>
>> Frank
>>
>>
>>> --
>>> Michal J. Figurski
>>> HUP, Pathology & Laboratory Medicine
>>> Xenobiotics Toxicokinetics Research Laboratory
>>> 3400 Spruce St. 7 Maloney
>>> Philadelphia, PA 19104
>>> tel. (215) 662-3413
>>>
>>> Frank E Harrell Jr wrote:
>>>
>>>> Michal Figurski wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am trying to optimize my logistic regression model by using
>>>>> bootstrap. I previously used SAS for this kind of task, but I am now
>>>>> switching to R.
>>>>>
>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>> single record composed of the following values: Subject_name, numeric1,
>>>>> numeric2, numeric3 and outcome (yes or no). All three numerics are used to
>>>>> predict outcome using LR.
>>>>>
>>>>> In SAS I wrote a macro that split the dataset, ran LR on one half of
>>>>> the data and made predictions on the second half. It then collected the
>>>>> equation coefficients from each iteration of the bootstrap. Afterwards
>>>>> I simply took the medians of these coefficients over all iterations and
>>>>> used them as an optimal model - it really worked well!
>>>>>
>>>>
>>>> Why not use maximum likelihood estimation, i.e., the coefficients from
>>>> the original fit? How does the bootstrap improve on that?
>>>>
>>>>
>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>> 'calibrate' functions from package "Design", and I also experimented with
>>>>> function 'sm.binomial.bootstrap' from package "sm". I also tried the
>>>>> function 'boot' from package "boot", though without success - in my case it
>>>>> randomly selected _columns_ from my data frame, while I wanted it to select
>>>>> _rows_.
>>>>>
>>>>
>>>> validate and calibrate in Design do resampling on the rows.
>>>>
>>>> Resampling is mainly used to get a nearly unbiased estimate of the model
>>>> performance, i.e., to correct for overfitting.
>>>>
>>>> Frank Harrell
>>>>
>>>>
>>>>> Though the main point here is the optimized LR equation. I would
>>>>> appreciate any help on how to extract the LR equation coefficients from any
>>>>> of these bootstrap functions, in the same form as given by 'glm' or 'lrm'.
>>>>>
>>>>> Many thanks in advance!
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




Received on Tue 22 Jul 2008 - 15:40:52 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 22 Jul 2008 - 16:31:53 GMT.

