From: Henrik Bengtsson <hb_at_biostat.ucsf.edu>

Date: Thu, 04 Nov 2010 10:59:31 -0700

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 04 Nov 2010 - 18:01:52 GMT

Hi.

On Thu, Nov 4, 2010 at 7:42 AM, Tim Hesterberg <timhesterberg_at_gmail.com> wrote:

> On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb@biostat.ucsf.edu> wrote:
>
>> Hi, consider this one as an FYI, or a seed for further discussion.
>>
>> I am aware that many traps in sample() have been reported over the
>> years. I know that these are also documented in help("sample"). Still,
>> I got bitten by this while writing
>>...
>> All of the above makes sense when one studies the code of sample(), but
>> sample() is indeed dangerous, e.g. imagine how many bootstrap
>> estimates out there are quietly incorrect.
>
> Nonparametric bootstrapping from a sample of size 1 is <always> incorrect.
> If you draw a single observation from a sample of size 1, you get that
> same observation back. This implies zero sampling variability, which
> is wrong. If this single sample represents one stratum or sample in
> a larger problem, it would contribute zero variability to the overall
> result, again wrong.
>
> In general, the ordinary bootstrap underestimates variability in
> small samples. For a sample mean, the ordinary bootstrap corresponds
> to using an estimate of variance equal to (1/n) sum((x - mean(x))^2),
> instead of the usual divisor n-1. In stratified and multi-sample
> applications the variance is similarly biased downward by a factor
> of (n-1)/n.
>
> Three remedies are:
> * draw bootstrap samples of size n-1
> * "bootknife" sampling: omit one observation (a jackknife sample), then
>   draw a bootstrap sample of size n from that
> * bootstrap from a kernel density estimate, with kernel covariance equal
>   to the empirical covariance (with divisor n-1) divided by n.
> The latter two are described in
> Hesterberg, Tim C. (2004), "Unbiasing the Bootstrap-Bootknife Sampling vs.
> Smoothing", Proceedings of the Section on Statistics and the Environment,
> American Statistical Association, 2924-2930.
> http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
>
> All three are undefined for samples of size 1. You need to go to some
> other bootstrap, e.g. a parametric bootstrap with variability estimated
> from other data.
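For the sample mean, the (n-1)/n factor and the effect of the remedies can be checked directly. In the sketch below, $s^2$ is the divisor-$(n-1)$ variance, $\hat\sigma_n^2$ is the divisor-$n$ variance quoted above, and $\operatorname{Var}_*$ (notation assumed here) denotes the exact bootstrap variance, i.e. over resampling with the data held fixed. The usual unbiased estimate of $\operatorname{Var}(\bar x)$ is $s^2/n$.

```latex
\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2,
\qquad
\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2 = \frac{n-1}{n}\,s^2,\\[4pt]
\operatorname{Var}_*\bigl(\bar x^*_{(m)}\bigr) &= \frac{\hat\sigma_n^2}{m}
\qquad\text{(mean of } m \text{ draws with replacement from } x_1,\dots,x_n\text{)},\\[4pt]
\text{ordinary, } m=n:\quad \operatorname{Var}_*\bigl(\bar x^*_{(n)}\bigr) &= \frac{n-1}{n}\cdot\frac{s^2}{n},\\[4pt]
\text{size } m=n-1:\quad \operatorname{Var}_*\bigl(\bar x^*_{(n-1)}\bigr) &= \frac{\hat\sigma_n^2}{n-1} = \frac{s^2}{n},\\[4pt]
\text{smoothed, kernel variance } s^2/n:\quad \operatorname{Var}_*\bigl(\bar x^*\bigr) &= \frac{\hat\sigma_n^2 + s^2/n}{n} = \frac{s^2}{n}.
\end{aligned}
```

With $n = 1$, $\hat\sigma_1^2 = 0$, so the ordinary bootstrap reports zero variance, while $s^2$ (divisor $0$) and a resample of size $n-1 = 0$ are undefined, which matches the point that all three remedies break down for a single observation.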
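A minimal Python sketch of the ordinary bootstrap and the three remedies, applied to the sample mean so that the (n-1)/n effect is visible. The function names, the toy data, and the Gaussian kernel are illustrative assumptions, not code from the thread or from any R package; it uses only the standard library (`random`, `statistics`; `statistics.fmean` needs Python 3.8+).

```python
"""Sketch: ordinary bootstrap vs. three unbiasing remedies, for the mean.

Assumptions (illustrative, not from the thread): univariate data, the
statistic is the sample mean, and the smoothed bootstrap uses a Gaussian
kernel. All function names here are hypothetical.
"""
import random
import statistics


def boot_ordinary(x, rng):
    # Ordinary bootstrap: n draws with replacement from the n observations.
    return rng.choices(x, k=len(x))


def boot_size_n_minus_1(x, rng):
    # Remedy 1: draw bootstrap samples of size n-1 instead of n.
    if len(x) < 2:
        raise ValueError("undefined for n < 2")
    return rng.choices(x, k=len(x) - 1)


def boot_bootknife(x, rng):
    # Remedy 2: omit one observation at random (a jackknife sample),
    # then draw n values with replacement from the remaining n-1.
    n = len(x)
    if n < 2:
        raise ValueError("undefined for n < 2")
    i = rng.randrange(n)
    jack = x[:i] + x[i + 1:]
    return rng.choices(jack, k=n)


def boot_smoothed(x, rng):
    # Remedy 3: resample, then add Gaussian kernel noise whose variance is
    # the divisor-(n-1) sample variance divided by n.
    n = len(x)
    if n < 2:
        raise ValueError("undefined for n < 2")
    h = (statistics.variance(x) / n) ** 0.5
    return [v + rng.gauss(0.0, h) for v in rng.choices(x, k=n)]


def boot_var_of_mean(x, sampler, reps=20000, seed=1):
    # Monte Carlo estimate of the bootstrap variance of the sample mean.
    rng = random.Random(seed)
    means = [statistics.fmean(sampler(x, rng)) for _ in range(reps)]
    return statistics.pvariance(means)


if __name__ == "__main__":
    x = [2.1, 3.4, 1.9, 5.0, 4.2]  # toy data, n = 5
    n = len(x)
    target = statistics.variance(x) / n  # s^2 / n with divisor n-1
    print(f"s^2/n                = {target:.4f}")
    print(f"(n-1)/n * s^2/n      = {(n - 1) / n * target:.4f}")
    for name, f in [("ordinary", boot_ordinary),
                    ("size n-1", boot_size_n_minus_1),
                    ("bootknife", boot_bootknife),
                    ("smoothed", boot_smoothed)]:
        print(f"{name:10s} bootstrap = {boot_var_of_mean(x, f):.4f}")
    # With n = 1 the ordinary bootstrap returns the same value every time.
    print("n = 1, ordinary:", boot_var_of_mean([3.0], boot_ordinary))
```

With the toy data, the ordinary estimate should land near (n-1)/n = 0.8 of s²/n, while the three remedies should land near s²/n itself, up to Monte Carlo noise. The n = 1 call reports zero variance, and the remedies refuse to run for n < 2.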
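The closing suggestion, a parametric bootstrap with variability estimated from other data, can be sketched for a stratified setting. In this hypothetical Python example, the normal model, the pooled within-stratum SD, the estimand (an unweighted average of stratum means), and all function names are assumptions chosen for illustration. A stratum with one observation is resampled parametrically from a normal distribution centred at that observation; strata with two or more observations use the size n-1 remedy.

```python
"""Sketch: a size-1 stratum handled by a parametric bootstrap.

Assumptions (illustrative only): a normal model for singleton strata, with
SD equal to the pooled within-stratum SD of the other strata; the estimand
is the unweighted average of stratum means. Names are hypothetical.
"""
import random
import statistics


def pooled_sd(strata):
    # Pooled within-stratum SD from strata with at least two observations.
    big = [s for s in strata if len(s) >= 2]
    df = sum(len(s) - 1 for s in big)
    if df == 0:
        raise ValueError("no stratum has two or more observations")
    ss = sum((len(s) - 1) * statistics.variance(s) for s in big)
    return (ss / df) ** 0.5


def resample_stratum(s, sigma, rng):
    # Size 1: parametric draw from Normal(observed value, sigma^2).
    # Size >= 2: nonparametric resample of size n-1 (one of the remedies).
    if len(s) == 1:
        return [rng.gauss(s[0], sigma)]
    return rng.choices(s, k=len(s) - 1)


def boot_se_mean_of_means(strata, reps=10000, seed=1):
    # Bootstrap standard error of the average of stratum means.
    rng = random.Random(seed)
    sigma = pooled_sd(strata)
    stats = []
    for _ in range(reps):
        means = [statistics.fmean(resample_stratum(s, sigma, rng))
                 for s in strata]
        stats.append(statistics.fmean(means))
    return statistics.stdev(stats)


if __name__ == "__main__":
    strata = [[1.0, 2.0, 3.0], [4.0, 6.0], [5.0]]  # toy data; last is size 1
    print("pooled SD:", pooled_sd(strata))
    print("bootstrap SE:", boot_se_mean_of_means(strata))
```

Using size n-1 resampling in the larger strata keeps each stratum's contribution near s²/n, so the singleton stratum's parametric variance (its n = 1 analogue, σ²/1) is on the same footing. Whether a pooled SD is a sensible stand-in depends on the application.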

I had a feeling that I was going to be bitten by that attention grabber on bootstrapping. Worse, it may be misleading to some. But honestly, thank you, Tim, for pointing this out and for explaining it all so clearly.

/Henrik

> Tim Hesterberg
