[Rd] Using sample() to sample one value from a single value?

From: Henrik Bengtsson <hb_at_biostat.ucsf.edu>
Date: Wed, 03 Nov 2010 10:54:18 -0700

Hi, consider this one as an FYI, or a seed for further discussion.

I am aware that many traps in sample() have been reported over the years, and I know that these are also documented in help("sample"). Still, I got bitten by this while writing

sample(units, size=length(units));

where 'units' is an index (positive integer) vector. It works as expected (= as I expect) in all cases except for length(units) == 1. I know, this is well known. However, it made me wonder whether it is possible to use sample() to draw a single value from a set containing only one value. I don't think so, unless you draw from a value that is <= 1.
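The trap comes from the branch inside sample() that treats a single number as a population size, so that sample(x) draws from 1:x. A minimal sketch of that test follows; treatedAsCount() is a hypothetical helper, and the condition mirrors the one in recent versions of base R's sample() (typing `sample` at the R prompt shows the exact code for your version; older versions lack the is.finite() part). Note that x == 1 is also treated as a count, but since 1:1 is just 1 the result happens to coincide.

```r
# Hypothetical helper: TRUE when sample(x) would draw from 1:x instead of x.
# Assumption: mirrors the length-1 numeric test in recent base R sample().
treatedAsCount <- function(x) {
  length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1
}

inputs <- list(10, c(10, 10), 1.2, 1, 0.31, -10, "10")
sapply(inputs, treatedAsCount)
# returns TRUE FALSE TRUE TRUE FALSE FALSE FALSE
```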

For instance, you can sample from c(10,10) by doing:

> sample(rep(10, times=2), size=2);

[1] 10 10

but you cannot sample from c(10) by doing:

> sample(rep(10, times=1), size=1);

[1] 9

unless you sample from a value <= 1, e.g.

> sample(rep(0.31, times=1), size=1);
[1] 0.31

> sample(rep(-10, times=1), size=1);
[1] -10
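To check that a result such as the 9 above was drawn from 1:10 rather than from c(10), one can compare both calls under the same seed. This is a small sketch; the seed value is arbitrary, and the comparison assumes both calls consume the random number stream in the same way, which holds because sample(x) on a vector ultimately indexes x with positions drawn from 1:length(x).

```r
# Same seed, two calls: sample(10, size = 1) behaves like sample(1:10, size = 1),
# so getting 10 back is only one of ten equally likely outcomes.
set.seed(123)
a <- sample(10, size = 1)
set.seed(123)
b <- sample(1:10, size = 1)
identical(a, b)   # TRUE
a %in% 1:10       # TRUE
```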

Note also the related issue of sampling from a double vector of length 1, e.g.

> sample(rep(1.2, times=2), size=2);

[1] 1.2 1.2
> sample(rep(1.2, times=1), size=1);

[1] 1

In the latter case, 1.2 is coerced to an integer.
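The coercion amounts to truncation toward zero, as with as.integer(). A minimal sketch of the consequence: a length-one double such as 1.2 becomes 1, so every draw is 1, and 2.9 becomes 2, so sample() returns a permutation of 1:2 rather than anything involving 2.9.

```r
# 1.2 is truncated to 1, so sample() draws from 1:1 and always returns 1.
draws <- replicate(100, sample(1.2, size = 1))
all(draws == 1)               # TRUE

# Likewise 2.9 is truncated to 2: the result is a permutation of 1:2.
sort(sample(2.9, size = 2))   # 1 2
```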

All of the above makes sense when one studies the code of sample(), but sample() is indeed dangerous; e.g., imagine how many bootstrap estimates out there are quietly incorrect.
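To make the bootstrap concern concrete, here is a small sketch with hypothetical data in which one group has a single observation. Resampling the values themselves with sample() turns that group's 10 into ten draws from 1:10, whereas resampling positions with sample.int() keeps it at 10. The data and the helper names naiveBoot() and indexBoot() are made up for this illustration.

```r
# Hypothetical data: group "b" has a single observation.
y <- c(2.5, 3.1, 2.8, 10)
g <- c("a", "a", "a", "b")

# Naive per-group resampling: sample() on the values themselves.
naiveBoot <- function(y, g) {
  sapply(split(y, g), function(v) mean(sample(v, replace = TRUE)))
}

# Index-based resampling: always draws from the observed values.
indexBoot <- function(y, g) {
  sapply(split(y, g), function(v) mean(v[sample.int(length(v), replace = TRUE)]))
}

set.seed(42)
naive <- replicate(200, naiveBoot(y, g)["b"])  # means of draws from 1:10
safe  <- replicate(200, indexBoot(y, g)["b"])  # always exactly 10
```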

In order to cover all cases of length(units), including one, a solution is:

sampleFrom <- function(x, size=length(x), ...) {
  n <- length(x);
  if (n == 1L) {
    # Only one value to draw; never let sample() reinterpret x as 1:x.
    res <- x;
  } else {
    res <- sample(x, size=size, ...);
  }
  res;
} # sampleFrom()

> sampleFrom(rep(10, times=2), size=2);
[1] 10 10

> sampleFrom(rep(10, times=1), size=1);
[1] 10

> sampleFrom(rep(0.31, times=1), size=1);
[1] 0.31

> sampleFrom(rep(-10, times=1), size=1);
[1] -10

> sampleFrom(rep(1.2, times=2), size=2);
[1] 1.2 1.2

> sampleFrom(rep(1.2, times=1), size=1);
[1] 1.2
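An alternative sketch avoids the special case altogether by sampling positions and then indexing, which is the idea behind the resample() helper shown in the Examples section of help("sample") in current R. The name resampleFrom() is hypothetical. Unlike sampleFrom() above, which returns x unchanged whenever length(x) == 1, this version also honours 'size' and 'replace' in the length-one case.

```r
# Hypothetical helper: index-based sampling that never reinterprets x as 1:x.
resampleFrom <- function(x, size = length(x), ...) {
  # sample.int() draws positions; indexing returns the original elements.
  x[sample.int(length(x), size = size, ...)]
}

resampleFrom(10, size = 1)                    # 10
resampleFrom(1.2, size = 3, replace = TRUE)   # 1.2 1.2 1.2
resampleFrom(c(10, 10), size = 2)             # 10 10
```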

I would like to add sampleFrom() to the wishlist of functions available in default R. Alternatively, an argument 'sampleFrom=FALSE' could be added to the existing sample() function; eventually, such an argument could be made TRUE by default.
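How such an opt-in argument might behave can be sketched as a wrapper. This is only an illustration of the proposal: sampleWithOption() is a hypothetical name, and 'sampleFrom' is not an argument of base R's sample(). With sampleFrom=FALSE the wrapper defers to sample() unchanged; with sampleFrom=TRUE it always treats x as the set of values to draw from.

```r
# Hypothetical wrapper illustrating the proposed 'sampleFrom' argument;
# this is NOT part of base R's sample().
sampleWithOption <- function(x, size, replace = FALSE, prob = NULL,
                             sampleFrom = FALSE) {
  if (sampleFrom) {
    # Always draw from the elements of x, whatever its length.
    if (missing(size)) size <- length(x)
    x[sample.int(length(x), size = size, replace = replace, prob = prob)]
  } else if (missing(size)) {
    # Existing behaviour of sample(), with its own default size.
    sample(x, replace = replace, prob = prob)
  } else {
    sample(x, size = size, replace = replace, prob = prob)
  }
}

sampleWithOption(10, size = 1, sampleFrom = TRUE)  # 10
sampleWithOption(10, size = 1)                     # some value in 1:10
```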


R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 03 Nov 2010 - 18:00:17 GMT
