Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

From: Michal Figurski <figurski_at_mail.med.upenn.edu>
Date: Wed, 30 Jul 2008 16:05:07 -0400

Tim,

If I understand correctly, you are saying that one can't improve an estimate of a mean by bootstrapping and averaging the means from many bootstrap samples. As far as I understand (again), you're saying that this way one can only add bias without any improvement...

Well, this contradicts some guides to the bootstrap that I found on the web (I did my homework), for example this one: http://people.revoledu.com/kardi/tutorial/Bootstrap/Lyra/Bootstrap Statistic Mean.htm

It is all confusing, guys... Somebody once said that there are as many opinions on a topic as there are statisticians...

Also, translating your statements into the example of the hammer and the rock, you are saying that one cannot use a hammer to break rocks because it was created to drive nails.

With all due respect, despite my limited knowledge, I do not agree. The big point is that the mean, or standard error, or confidence intervals of the data itself are *meaningless* in the pharmacokinetic dataset. These data are time series of a highly variable quantity that is known to display a peak (or two in the case of Pawinski's paper). It is as if you tried to calculate the mean of a chromatogram (an example for chemists, sorry).

Nevertheless, I thank all of you, experts, for your insight and advice. In the end, I learned a lot, though I keep my initial view. Summarizing your criticism of the procedure described in Pawinski's paper:

Kind regards,

--
Michal J. Figurski


Tim Hesterberg wrote:

> I'll address the question of whether you can use the bootstrap to
> improve estimates, and whether you can use the bootstrap to "virtually
> increase the size of the sample".
>
> Short answer - no, with some exceptions (bumping / Random Forests).
>
> Longer answer:
> Suppose you have data (x1, ..., xn) and a statistic ThetaHat,
> that you take a number of bootstrap samples (all of size n) and
> let ThetaHatBar be the average of those bootstrap statistics from
> those samples.
>
> Is ThetaHatBar better than ThetaHat? Usually not. Usually it
> is worse. You have not collected any new data, you are just using the
> existing data in a different way, that is usually harmful:
> * If the statistic is the sample mean, all this does is to add
> some noise to the estimate
> * If the statistic is nonlinear, this gives an estimate that
> has roughly double the bias, without improving the variance.
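
The two bullet points above can be checked with a small simulation. What follows is a minimal sketch and not code from this thread: it assumes normally distributed data, uses the squared sample mean as a stand-in for "a nonlinear statistic", and the names bootstrap_average, squared_mean and simulate are made up for the illustration.

```python
import random
import statistics


def bootstrap_average(data, stat, n_boot, rng):
    """Average of `stat` over n_boot bootstrap resamples, each of size len(data)."""
    n = len(data)
    total = 0.0
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # sampling with replacement
        total += stat(resample)
    return total / n_boot


def squared_mean(xs):
    """Nonlinear statistic (illustrative choice): plug-in estimate of mu^2."""
    m = statistics.fmean(xs)
    return m * m


def simulate(n=10, n_boot=100, n_trials=1000, mu=0.0, sigma=1.0, seed=1):
    """Compare ThetaHat with the bootstrap average ThetaHatBar over many datasets."""
    rng = random.Random(seed)
    plain_sq, boot_sq, mean_noise = [], [], []
    for _ in range(n_trials):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        # Nonlinear case: the statistic itself vs. its bootstrap average.
        plain_sq.append(squared_mean(data))
        boot_sq.append(bootstrap_average(data, squared_mean, n_boot, rng))
        # Linear case: squared gap between the bootstrap-averaged mean and the
        # sample mean. This gap is pure resampling noise.
        gap = bootstrap_average(data, statistics.fmean, n_boot, rng) - statistics.fmean(data)
        mean_noise.append(gap * gap)
    return {
        "bias_plain": statistics.fmean(plain_sq) - mu * mu,
        "bias_boot_avg": statistics.fmean(boot_sq) - mu * mu,
        "extra_noise_mean": statistics.fmean(mean_noise),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

For this particular statistic the expected values can be worked out by hand: the bias of the squared sample mean is sigma^2/n (0.1 with the defaults), and the bias of its bootstrap average is sigma^2/n + (n-1)*sigma^2/n^2 (0.19 with the defaults), so the simulated numbers should land near those, up to simulation error. The third number is the extra variance that bootstrap averaging adds to the plain sample mean; it shrinks as n_boot grows but never becomes an improvement.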
>
> What are the exceptions? The prime example is tree models (random
> forests) - taking bootstrap averages helps smooth out the
> discontinuities in tree models. For a simple example, suppose that a
> simple linear regression model really holds:
> y = beta x + epsilon
> but that you fit a tree model; the tree model predictions are
> a step function. If you bootstrap the data, the boundaries of
> the step function will differ from one sample to another, so
> the average of the bootstrap samples smears out the steps, getting
> closer to the smooth linear relationship.
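
The tree example above can be sketched in a few lines. This is only an illustration under stated assumptions, not anything posted in the thread: the "tree" is reduced to a single-split regression stump written by hand, the data are simulated from y = beta x + epsilon with beta = 2, and all function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # Minimizing the squared error of the split is equivalent to
        # maximizing this quantity.
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_sum / i, right_sum / (n - i))
    if best is None:  # all x identical: predict the overall mean everywhere
        return (float("inf"), total / n, total / n)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_boot, rng):
    """Fit one stump per bootstrap resample of the (x, y) pairs."""
    n = len(xs)
    stumps = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the step-function predictions of all bootstrap stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def demo(n=60, beta=2.0, noise_sd=0.3, n_boot=200, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, noise_sd) for x in xs]
    single = fit_stump(xs, ys)
    bag = bagged_stumps(xs, ys, n_boot, rng)
    grid = [i / 100 for i in range(101)]
    single_pred = [predict_stump(single, x) for x in grid]
    bagged_pred = [predict_bagged(bag, x) for x in grid]
    # Mean squared distance from the true line beta * x on the grid.
    mse = lambda pred: sum((p - beta * x) ** 2 for p, x in zip(pred, grid)) / len(grid)
    return {
        "single_levels": len({round(p, 9) for p in single_pred}),
        "bagged_levels": len({round(p, 9) for p in bagged_pred}),
        "single_mse": mse(single_pred),
        "bagged_mse": mse(bagged_pred),
    }


if __name__ == "__main__":
    print(demo())
```

The single stump can only take two values, while the bagged prediction takes many intermediate values because the split point moves from one bootstrap sample to the next. Comparing single_mse with bagged_mse shows how far each is from the true line on the grid; the bagged version is typically closer, though how much depends on the noise level and sample size chosen here.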
>
> Aside from such exceptions, the bootstrap is used for inference
> (bias, standard error, confidence intervals), not improving on
> ThetaHat.
>
> Tim Hesterberg
>
>> Hi Doran,
>>
>> Maybe I am wrong, but I think bootstrap is a general resampling method which
>> can be used for different purposes...Usually it works well when you do not
>> have a presentative sample set (maybe with limited number of samples).
>> Therefore, I am positive with Michal...
>>
>> P.S., overfitting, in my opinion, is used to depict when you got a model
>> which is quite specific for the training dataset but cannot be generalized
>> with new samples......
>>
>> Thanks,
>>
>> --Jerry
>> 2008/7/21 Doran, Harold <HDoran_at_air.org>:
>>
>>>> I used bootstrap to virtually increase the size of my
>>>> dataset, it should result in estimates more close to that
>>>> from the population - isn't it the purpose of bootstrap?
>>> No, not really. The bootstrap is a resampling method for variance
>>> estimation. It is often used when there is not an easy way, or a closed
>>> form expression, for estimating the sampling variance of a statistic.
>
______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 30 Jul 2008 - 20:09:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 31 Jul 2008 - 04:33:10 GMT.

