From: Michal Figurski <figurski_at_mail.med.upenn.edu>

Date: Wed, 30 Jul 2008 16:05:07 -0400

Tim,

If I understand correctly, you are saying that one cannot improve an estimate of a mean by bootstrapping and then averaging the means from many bootstrap samples. As far as I understand (again), you are saying that this can only add bias, without any improvement...

Well, this contradicts some guides to the bootstrap that I found on the web (I did my homework), for example this one: http://people.revoledu.com/kardi/tutorial/Bootstrap/Lyra/Bootstrap Statistic Mean.htm

It is all confusing, guys... Somebody once said that there are as many opinions on a topic as there are statisticians...

- Some of you say that this isn't bootstrap at all. On terminology I defer to you completely, because I know too little. Would anyone suggest a name?
- Most of you say that this procedure is not the best one and that there are better ways. I will definitely do my homework on penalized regression, though none of you has actually discredited this methodology. Therefore, though possibly not optimal, it remains valid.
- The criticism of "predictive performance" is that one also has to take into account other important quantities, like bias, variance, etc. Fortunately I did that in my work, using RMSE and log residuals from the validation process. I observed that models with relatively small RMSE and log residuals (compared to other models) usually have good predictive performance, and vice versa. Predictive performance also has a great advantage over RMSE, variance, or anything else suggested here: it is easily understood by non-statisticians. I don't think it is /too simple/ in Einstein's terms; it's just simple.

Kind regards,

-- Michal J. Figurski

Received on Wed 30 Jul 2008 - 20:09:25 GMT

Tim Hesterberg wrote:

> I'll address the question of whether you can use the bootstrap to
> improve estimates, and whether you can use the bootstrap to "virtually
> increase the size of the sample".
>
> Short answer - no, with some exceptions (bumping / Random Forests).
>
> Longer answer:
> Suppose you have data (x1, ..., xn) and a statistic ThetaHat,
> that you take a number of bootstrap samples (all of size n) and
> let ThetaHatBar be the average of those bootstrap statistics from
> those samples.
>
> Is ThetaHatBar better than ThetaHat? Usually not. Usually it
> is worse. You have not collected any new data, you are just using the
> existing data in a different way, that is usually harmful:
> * If the statistic is the sample mean, all this does is to add
>   some noise to the estimate
> * If the statistic is nonlinear, this gives an estimate that
>   has roughly double the bias, without improving the variance.
>
> What are the exceptions? The prime example is tree models (random
> forests) - taking bootstrap averages helps smooth out the
> discontinuities in tree models. For a simple example, suppose that a
> simple linear regression model really holds:
>   y = beta x + epsilon
> but that you fit a tree model; the tree model predictions are
> a step function. If you bootstrap the data, the boundaries of
> the step function will differ from one sample to another, so
> the average of the bootstrap samples smears out the steps, getting
> closer to the smooth linear relationship.
>
> Aside from such exceptions, the bootstrap is used for inference
> (bias, standard error, confidence intervals), not improving on
> ThetaHat.
>
> Tim Hesterberg
>
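
Tim's point about the sample mean can be checked with a small simulation. The following is only a sketch under stated assumptions: it is written in Python rather than R, the data are drawn from a normal distribution, and the function names, sample size, number of bootstrap samples and number of trials are all made up for illustration. It compares the mean squared error of ThetaHat (the plain sample mean) with that of ThetaHatBar (the average of bootstrap means).

```python
import random
import statistics


def bootstrap_mean_average(data, n_boot, rng):
    """ThetaHatBar: average of the sample means of n_boot bootstrap
    resamples, each drawn with replacement and of the same size as data."""
    n = len(data)
    boot_means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.fmean(boot_means)


def compare_estimators(true_mean=10.0, sd=2.0, n=20, n_boot=10,
                       n_trials=2000, seed=1):
    """Simulate many datasets (assumed normal) and return
    (MSE of ThetaHat, MSE of ThetaHatBar) around the true mean."""
    rng = random.Random(seed)
    sq_err_plain = 0.0
    sq_err_boot = 0.0
    for _ in range(n_trials):
        data = [rng.gauss(true_mean, sd) for _ in range(n)]
        theta_hat = statistics.fmean(data)                        # plain sample mean
        theta_hat_bar = bootstrap_mean_average(data, n_boot, rng)  # bootstrap average
        sq_err_plain += (theta_hat - true_mean) ** 2
        sq_err_boot += (theta_hat_bar - true_mean) ** 2
    return sq_err_plain / n_trials, sq_err_boot / n_trials


mse_plain, mse_boot = compare_estimators()
print(f"MSE of ThetaHat    (sample mean):       {mse_plain:.4f}")
print(f"MSE of ThetaHatBar (bootstrap average): {mse_boot:.4f}")
```

The number of bootstrap samples is kept small here (n_boot=10) so that the added resampling noise is easy to see. With more bootstrap samples the noise shrinks and ThetaHatBar approaches ThetaHat, but it does not become better than ThetaHat.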

>> Hi Doran,
>>
>> Maybe I am wrong, but I think bootstrap is a general resampling method which
>> can be used for different purposes...Usually it works well when you do not
>> have a presentative sample set (maybe with limited number of samples).
>> Therefore, I am positive with Michal...
>>
>> P.S., overfitting, in my opinion, is used to depict when you got a model
>> which is quite specific for the training dataset but cannot be generalized
>> with new samples......
>>
>> Thanks,
>>
>> --Jerry
>> 2008/7/21 Doran, Harold <HDoran_at_air.org>:
>>
>>>> I used bootstrap to virtually increase the size of my
>>>> dataset, it should result in estimates more close to that
>>>> from the population - isn't it the purpose of bootstrap?
>>> No, not really. The bootstrap is a resampling method for variance
>>> estimation. It is often used when there is not an easy way, or a closed
>>> form expression, for estimating the sampling variance of a statistic.
>
______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
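
The use that Harold Doran and Tim Hesterberg describe in the quoted messages, the bootstrap as a tool for estimating sampling variability, might look roughly like the sketch below. It is an illustration under assumptions, not anyone's actual code from this thread: it is in Python rather than R, the helper name `bootstrap_se`, the simulated sample and the number of replicates are invented for the example. The standard error of the mean has a closed form to compare against; the standard error of the median is a case where the bootstrap is the convenient route.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error: the standard deviation of the statistic
    across n_boot resamples drawn with replacement, each of size len(data)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


# Hypothetical sample: 100 draws from a standard normal distribution.
rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]

se_mean_boot = bootstrap_se(sample, statistics.fmean)
se_mean_formula = statistics.stdev(sample) / len(sample) ** 0.5  # s / sqrt(n)
se_median_boot = bootstrap_se(sample, statistics.median)

print(f"SE of mean, bootstrap:    {se_mean_boot:.4f}")
print(f"SE of mean, closed form:  {se_mean_formula:.4f}")
print(f"SE of median, bootstrap:  {se_median_boot:.4f}")
```

Note that the point estimates themselves (the mean and the median of the original sample) are left untouched; the resampling is only used to describe how much they would vary.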

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 31 Jul 2008 - 04:33:10 GMT.
