Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

From: Michal Figurski
Date: Tue, 22 Jul 2008 15:42:25 -0400

Dear Marc and all,

Thank you for all the due respect.

In my first email I tried to explain as explicitly as I could what I am trying to do. I did not invent this procedure; it was already published in this paper:

T. Pawinski, M. Hale, M. Korecka, W.E. Fitzsimmons, L.M. Shaw. Limited Sampling Strategy for the Estimation of Mycophenolic Acid Area under the Curve in Adult Renal Transplant Patients Treated with Concomitant Tacrolimus. Clinical Chemistry 2002; 48(9): 1497-1504.

I only adapted this methodology to work under SAS, and now I am trying to do it in R, because I like R. I need practical advice because I have a practical problem, and I do not understand much of the theoretical discussion about what the bootstrap is or is not suitable for. Apparently I am trying to use it for something other than what the experts are used to...

Honestly, I have not learned anything from this discussion so far; I am just disappointed.

Still, since the discussion has already started, I'd welcome your criticism of this procedure - I just ask that you express it in human language.

Michal J. Figurski

Marc Schwartz wrote:

> Michal,
> With all due respect, you have openly acknowledged that you don't know
> enough about the subject at hand.
> If that is the case, on what basis are you in a position to challenge
> the collective wisdom of those professionals who have voluntarily
> offered *expert* level statistical advice to you?
> You have erected a wall around your thinking.
> You may choose to use R or any other software application to
> "Git-R-Done". But that does not make it correct.
> There are other methods to consider that could be used during the model
> building process itself, rather than on a post-hoc basis and I would
> specifically refer you to Frank's book, Regression Modeling Strategies:
> Marc Schwartz
> on 07/22/2008 09:43 AM Michal Figurski wrote:
>> Hmm...
>>
>> It sounds like ideology to me. I was asking for technical help. I know
>> what I want to do, just don't know how to do it in R. I'll go back to
>> SAS then. Thank you.
>>
>> --
>> Michal J. Figurski
>>
>> Doran, Harold wrote:
>>> I think the answer has been given to you. If you want to continue to
>>> ignore that advice and use bootstrap for point estimates rather than the
>>> properties of those estimates (which is what bootstrap is for) then you
>>> are on your own.
>>>> -----Original Message-----
>>>> From: On Behalf Of Michal Figurski
>>>> Sent: Tuesday, July 22, 2008 9:52 AM
>>>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap
>>>> - how to get them?
>>>>
>>>> Dear all,
>>>>
>>>> I don't want to argue with anybody about words or about what
>>>> bootstrap is suitable for - I know too little for that.
>>>>
>>>> All I need is help to get the *equation coefficients* optimized by
>>>> bootstrap - either by one of the functions or by simple median.
>>>>
>>>> Please help,
>>>>
>>>> --
>>>> Michal J. Figurski
>>>> HUP, Pathology & Laboratory Medicine
>>>> Xenobiotics Toxicokinetics Research Laboratory 3400 Spruce St. 7
>>>> Maloney Philadelphia, PA 19104 tel. (215) 662-3413
>>>>
>>>> Frank E Harrell Jr wrote:
>>>>> Michal Figurski wrote:
>>>>>> Frank,
>>>>>>
>>>>>> "How does bootstrap improve on that?"
>>>>>>
>>>>>> I don't know, but I have an idea. Since the data in my set are just a
>>>>>> small sample of a big population, then if I use my whole dataset to
>>>>>> obtain max likelihood estimates, these estimates may be best for this
>>>>>> dataset, but far from ideal for the whole population.
>>>>> The bootstrap, being a resampling procedure from your sample, has the
>>>>> same issues about the population as MLEs.
>>>>>
>>>>>> I used bootstrap to virtually increase the size of my dataset, it
>>>>>> should result in estimates more close to that from the population -
>>>>>> isn't it the purpose of bootstrap?
>>>>> No
>>>>>
>>>>>> When I use such median coefficients on another dataset (another
>>>>>> sample from population), the predictions are better, than using max
>>>>>> likelihood estimates. I have already tested that and it worked!
>>>>> Then your testing procedure is probably not valid.
>>>>>
>>>>>> I am not a statistician and I don't feel what "overfitting" is, but
>>>>>> it may be just another word for the same idea.
>>>>>>
>>>>>> Nevertheless, I would still like to know how can I get the
>>>>>> coeffcients for the model that gives the "nearly unbiased estimates".
>>>>>> I greatly appreciate your help.
>>>>> More info in my book Regression Modeling Strategies.
>>>>>
>>>>> Frank
>>>>>
>>>>>> --
>>>>>> Michal J. Figurski
>>>>>> HUP, Pathology & Laboratory Medicine
>>>>>> Xenobiotics Toxicokinetics Research Laboratory 3400 Spruce St. 7
>>>>>> Maloney Philadelphia, PA 19104 tel. (215) 662-3413
>>>>>>
>>>>>> Frank E Harrell Jr wrote:
>>>>>>> Michal Figurski wrote:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I am trying to optimize my logistic regression model by using
>>>>>>>> bootstrap. I was previously using SAS for this kind of tasks, but I
>>>>>>>> am now switching to R.
>>>>>>>>
>>>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>>>>> single record composed of the following values: Subject_name,
>>>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
>>>>>>>> numerics are used to predict outcome using LR.
>>>>>>>>
>>>>>>>> In SAS I have written a macro, that was splitting the dataset,
>>>>>>>> running LR on one half of data and making predictions on second
>>>>>>>> half. Then it was collecting the equation coefficients from each
>>>>>>>> iteration of bootstrap. Later I was just taking medians of these
>>>>>>>> coefficients from all iterations, and used them as an optimal model
>>>>>>>> - it really worked well!
>>>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
>>>>>>> from the original fit. How does the bootstrap improve on that?
>>>>>>>
>>>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>>>>> 'calibrate' functions from package "Design", and I also
>>>>>>>> experimented with function 'sm.binomial.bootstrap' from package
>>>>>>>> "sm". I tried also the function 'boot' from package "boot", though
>>>>>>>> without success - in my case it randomly selected _columns_ from
>>>>>>>> my data frame, while I wanted it to select _rows_.
>>>>>>> validate and calibrate in Design do resampling on the rows
>>>>>>>
>>>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
>>>>>>> model performance, i.e., to correct for overfitting.
>>>>>>>
>>>>>>> Frank Harrell
>>>>>>>
>>>>>>>> Though the main point here is the optimized LR equation. I would
>>>>>>>> appreciate any help on how to extract the LR equation coefficients
>>>>>>>> from any of these bootstrap functions, in the same form as given by
>>>>>>>> 'glm' or 'lrm'.
>>>>>>>>
>>>>>>>> Many thanks in advance!
>>>>>>>> ______________________________________________
>>>>>>>> mailing list
>>>>>>>> PLEASE do read the posting guide
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
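
For readers following the thread, here is a minimal base-R sketch of the mechanical step asked about in the original question: resampling rows (not columns), refitting the logistic regression, and collecting the coefficients in the same form as coef(glm(...)). It is an illustration under stated assumptions, not the SAS macro or the published procedure. The data are simulated; the column names follow the description in the post; B = 200 and the seed are arbitrary choices. It uses ordinary bootstrap resampling with replacement rather than the split-half scheme described for the macro. As the replies above point out, the bootstrap is normally used to describe properties of the estimates (such as their variability), so the medians are shown next to the maximum likelihood fit for comparison only.

```r
# Simulated stand-in for the 109-row data frame described in the thread.
# Column names follow the post; all values are made up for illustration.
set.seed(1)
n <- 109
d <- data.frame(
  Subject_name = paste0("S", seq_len(n)),
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
lin <- -0.5 + 0.8 * d$numeric1 - 0.6 * d$numeric2 + 0.4 * d$numeric3
d$outcome <- factor(ifelse(runif(n) < plogis(lin), "yes", "no"),
                    levels = c("no", "yes"))

# Maximum likelihood fit on the full data set (models P(outcome == "yes")).
form <- outcome ~ numeric1 + numeric2 + numeric3
fit  <- glm(form, family = binomial, data = d)

# Refit on B bootstrap resamples of the ROWS and keep the coefficients.
B <- 200  # arbitrary number of resamples for this sketch
boot_coefs <- t(replicate(B, {
  idx <- sample(nrow(d), replace = TRUE)  # row indices, drawn with replacement
  coef(glm(form, family = binomial, data = d[idx, ]))
}))

# One row per resample, one column per coefficient, same order as coef(fit).
boot_median <- apply(boot_coefs, 2, median)
boot_se     <- apply(boot_coefs, 2, sd)
print(cbind(mle = coef(fit), boot_median, boot_se))
```

The key detail is the row subscript d[idx, ]. With the boot package, the statistic function is called as statistic(data, indices); indexing a data frame as data[indices] selects columns, while data[indices, ] selects rows, which may be what went wrong in the attempt described above.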
Received on Tue 22 Jul 2008 - 19:50:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Jul 2008 - 01:32:37 GMT.
