Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

From: Michal Figurski <figurski_at_mail.med.upenn.edu>
Date: Tue, 22 Jul 2008 10:43:59 -0400

Hmm...

It sounds like ideology to me. I was asking for technical help. I know what I want to do, just don't know how to do it in R. I'll go back to SAS then. Thank you.

--
Michal J. Figurski

Doran, Harold wrote:
> I think the answer has been given to you. If you want to continue to
> ignore that advice and use bootstrap for point estimates rather than the
> properties of those estimates (which is what bootstrap is for) then you
> are on your own. 
> 

>> -----Original Message-----
>> From: r-help-bounces_at_r-project.org
>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Michal Figurski
>> Sent: Tuesday, July 22, 2008 9:52 AM
>> To: r-help_at_r-project.org
>> Subject: Re: [R] Coefficients of Logistic Regression from
>> bootstrap - how to get them?
>>
>> Dear all,
>>
>> I don't want to argue with anybody about words or about what
>> bootstrap is suitable for - I know too little for that.
>>
>> All I need is help to get the *equation coefficients*
>> optimized by bootstrap - either by one of the functions or by
>> simple median.
>>
>> Please help,
>>
>> --
>> Michal J. Figurski
>> HUP, Pathology & Laboratory Medicine
>> Xenobiotics Toxicokinetics Research Laboratory 3400 Spruce
>> St. 7 Maloney Philadelphia, PA 19104 tel. (215) 662-3413
>>
>> Frank E Harrell Jr wrote:
>>> Michal Figurski wrote:
>>>> Frank,
>>>>
>>>> "How does bootstrap improve on that?"
>>>>
>>>> I don't know, but I have an idea. Since the data in my set are just a
>>>> small sample of a big population, then if I use my whole dataset to
>>>> obtain max likelihood estimates, these estimates may be best for this
>>>> dataset, but far from ideal for the whole population.
>>> The bootstrap, being a resampling procedure from your sample, has the
>>> same issues about the population as MLEs.
>>>
>>>> I used bootstrap to virtually increase the size of my dataset, it
>>>> should result in estimates more close to that from the population -
>>>> isn't it the purpose of bootstrap?
>>> No
>>>
>>>> When I use such median coefficients on another dataset (another
>>>> sample from population), the predictions are better than using max
>>>> likelihood estimates. I have already tested that and it worked!
>>> Then your testing procedure is probably not valid.
>>>
>>>> I am not a statistician and I don't feel what "overfitting" is, but
>>>> it may be just another word for the same idea.
>>>>
>>>> Nevertheless, I would still like to know how can I get the
>>>> coefficients for the model that gives the "nearly unbiased estimates".
>>>> I greatly appreciate your help.
>>> More info in my book Regression Modeling Strategies.
>>>
>>> Frank
>>>
>>>> --
>>>> Michal J. Figurski
>>>> HUP, Pathology & Laboratory Medicine
>>>> Xenobiotics Toxicokinetics Research Laboratory 3400 Spruce St. 7
>>>> Maloney Philadelphia, PA 19104 tel. (215) 662-3413
>>>>
>>>> Frank E Harrell Jr wrote:
>>>>> Michal Figurski wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I am trying to optimize my logistic regression model by using
>>>>>> bootstrap. I was previously using SAS for this kind of tasks, but I
>>>>>> am now switching to R.
>>>>>>
>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>>> single record composed of the following values: Subject_name,
>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
>>>>>> numerics are used to predict outcome using LR.
>>>>>>
>>>>>> In SAS I have written a macro, that was splitting the dataset,
>>>>>> running LR on one half of data and making predictions on second
>>>>>> half. Then it was collecting the equation coefficients from each
>>>>>> iteration of bootstrap. Later I was just taking medians of these
>>>>>> coefficients from all iterations, and used them as an optimal model
>>>>>> - it really worked well!
>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
>>>>> from the original fit? How does the bootstrap improve on that?
>>>>>
>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>>> 'calibrate' functions from package "Design", and I also
>>>>>> experimented with function 'sm.binomial.bootstrap' from package
>>>>>> "sm". I tried also the function 'boot' from package "boot", though
>>>>>> without success - in my case it randomly selected _columns_ from
>>>>>> my data frame, while I wanted it to select _rows_.
>>>>> validate and calibrate in Design do resampling on the rows
>>>>>
>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
>>>>> model performance, i.e., to correct for overfitting.
>>>>>
>>>>> Frank Harrell
>>>>>
>>>>>> Though the main point here is the optimized LR equation. I would
>>>>>> appreciate any help on how to extract the LR equation coefficients
>>>>>> from any of these bootstrap functions, in the same form as given by
>>>>>> 'glm' or 'lrm'.
>>>>>>
>>>>>> Many thanks in advance!
>>>>>>
>>>>>
>>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
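
For reference, the row-resampling step asked about in the thread can be sketched with 'boot' and 'glm'. This is only an illustration under stated assumptions: the data are simulated stand-ins for the 109-row data frame described above (same column names, made-up values), and coef_stat and boot_summary are hypothetical names. boot() calls the statistic with the data and a vector of row indices, so subsetting as data[idx, ] resamples rows; writing data[idx] on a data frame would pick columns instead, which may be what happened in the original attempt. In line with the replies above, the replicates are used here to describe the variability of the maximum likelihood estimates (standard errors, percentile intervals); the medians are shown only for comparison.

```r
library(boot)  # recommended package shipped with standard R installs

set.seed(1)

# ASSUMPTION: simulated stand-in for the 109-row data frame in the thread.
# Column names follow the post; the values are made up for illustration.
n <- 109
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$numeric1 - 0.6 * dat$numeric2))

# Maximum likelihood fit on the full data set: these are the point estimates.
fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = dat, family = binomial)

# Statistic for boot(): receives the data and a vector of resampled row
# indices, refits the model on those rows, and returns the coefficients.
coef_stat <- function(data, idx) {
  coef(glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = data[idx, ], family = binomial))
}

b <- boot(dat, coef_stat, R = 500)

# b$t0 holds the coefficients from the original data;
# b$t has one row per bootstrap replicate and one column per coefficient.
boot_summary <- data.frame(
  mle      = b$t0,
  boot_med = apply(b$t, 2, median),
  boot_se  = apply(b$t, 2, sd)
)
print(boot_summary)

# Percentile interval for the numeric1 coefficient (second element of coef).
print(boot.ci(b, type = "perc", index = 2))
```

Here b$t0 and coef(fit) are the same maximum likelihood coefficients, in the form returned by 'glm'.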
Received on Tue 22 Jul 2008 - 15:16:55 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Jul 2008 - 02:32:18 GMT.

