From: Tim Hesterberg <timhesterberg_at_gmail.com>

Date: Thu, 04 Nov 2010 07:42:52 -0700

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Thu 04 Nov 2010 - 14:50:05 GMT

On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb_at_biostat.ucsf.edu> wrote:

> Hi, consider this one as an FYI, or a seed for further discussion.
>
> I am aware that many traps on sample() have been reported over the
> years. I know that these are also documents in help("sample"). Still
> I got bitten by this while writing
> ...
> All of the above makes sense when one study the code of sample(), but
> sample() is indeed dangerous, e.g. imagine how many bootstrap
> estimates out there quietly gets incorrect.

Nonparametric bootstrapping from a sample of size 1 is *always* incorrect. If you draw a single observation from a sample of size 1, you get that same observation back. This implies zero sampling variability, which is wrong. If this single sample represents one stratum or sample in a larger problem, it would contribute zero variability to the overall result, which is again wrong.

In general, the ordinary bootstrap underestimates variability in small samples. For a sample mean, the ordinary bootstrap corresponds to estimating the variance by (1/n) sum((x - mean(x))^2), that is, with a divisor of n instead of n-1. In stratified and multi-sample applications the variance is similarly biased downward, by a factor of (n-1)/n within each stratum or sample.
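
The (n-1)/n factor can be checked exactly on a tiny sample by enumerating every equally likely resample instead of simulating. The following is a minimal sketch in Python using only the standard library; the data values and the function name exact_bootstrap_var_of_mean are made up for illustration.

```python
from itertools import product
from statistics import fmean, pvariance, variance


def exact_bootstrap_var_of_mean(x):
    """Variance of the bootstrap mean, computed over all n**n equally
    likely resamples of size n drawn with replacement from x."""
    n = len(x)
    means = [fmean(resample) for resample in product(x, repeat=n)]
    return pvariance(means)


x = [2.0, 3.0, 7.0, 8.0]          # made-up data, n = 4
n = len(x)

boot_var = exact_bootstrap_var_of_mean(x)
divisor_n = pvariance(x) / n      # (1/n) sum((x - mean(x))^2), divided by n
divisor_n_minus_1 = variance(x) / n   # usual estimate s^2 / n

# boot_var matches divisor_n, and is (n-1)/n times divisor_n_minus_1,
# up to floating-point rounding.
ratio = boot_var / divisor_n_minus_1

# With a single observation there is only one possible resample,
# so the bootstrap reports zero variability.
single_obs_var = exact_bootstrap_var_of_mean([5.0])
```

Enumeration is only practical for very small n (there are n**n resamples), but it removes Monte Carlo noise from the comparison.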

Three remedies are:

* draw bootstrap samples of size n-1

* "bootknife" sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that

* bootstrap from a kernel density estimate, with kernel covariance equal
  to empirical covariance (with divisor n-1) / n.

The latter two are described in

Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf

All three are undefined for samples of size 1. You need to go to some other bootstrap, e.g. a parametric bootstrap with variability estimated from other data.
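
The three remedies can be sketched for univariate data as follows. This is an illustrative Python sketch using only the standard library, not code from the paper: the function names are hypothetical, the kernel in the third remedy is assumed to be normal (the text above fixes only its variance), and the omitted observation in the bootknife is assumed to be chosen uniformly at random. Each function refuses samples of size 1.

```python
import random
from statistics import variance


def _require_two(x):
    # All three remedies need at least two observations.
    if len(x) < 2:
        raise ValueError("need at least 2 observations")


def bootstrap_n_minus_1(x, rng):
    """Ordinary resampling with replacement, but of size n-1."""
    _require_two(x)
    return rng.choices(x, k=len(x) - 1)


def bootknife(x, rng):
    """Omit one observation at random (a jackknife sample), then draw
    a bootstrap sample of size n from the remaining n-1 observations."""
    _require_two(x)
    x = list(x)
    omit = rng.randrange(len(x))
    kept = x[:omit] + x[omit + 1:]
    return rng.choices(kept, k=len(x))


def smoothed_bootstrap(x, rng):
    """Resample with replacement, then add kernel noise whose variance is
    the empirical variance (divisor n-1) divided by n.
    Assumption: a normal kernel."""
    _require_two(x)
    n = len(x)
    kernel_sd = (variance(x) / n) ** 0.5   # variance() uses divisor n-1
    return [xi + rng.gauss(0.0, kernel_sd) for xi in rng.choices(x, k=n)]


rng = random.Random(2010)          # fixed seed, only for reproducibility
x = [2.0, 3.0, 7.0, 8.0]           # made-up data
sample_a = bootstrap_n_minus_1(x, rng)
sample_b = bootknife(x, rng)
sample_c = smoothed_bootstrap(x, rng)
```

Calling any of the three functions on a list of length 1 raises ValueError, which is where one would switch to some other bootstrap, such as a parametric one.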

Tim Hesterberg
