From: Henrik Bengtsson <hb_at_biostat.ucsf.edu>

Date: Wed, 03 Nov 2010 11:19:47 -0700

R-devel_at_r-project.org mailing list

Received on Wed 03 Nov 2010 - 18:21:46 GMT

On Wed, Nov 3, 2010 at 11:07 AM, Henrik Bengtsson <hb_at_biostat.ucsf.edu> wrote:

> On Wed, Nov 3, 2010 at 11:02 AM, Henrique Dallazuanna <wwwhsd@gmail.com> wrote:

>> The resample function in the example section of the sample() help page
>> does that, or not?
>
> Yes, I just noticed that one [at the very end of the example in
> help("sample")]. So, maybe resample() should be a function available
> in R?

So for completeness, this has also been discussed in the R-devel thread '[patch] add is.set parameter to sample()' started on 2010-03-23, cf. http://www.mail-archive.com/r-devel@r-project.org/msg19998.html.
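
For reference, the helper mentioned above is, to the best of my knowledge, defined in the examples of help("sample") as the one-liner below. The calls after it are only my illustration and are not taken from the help page:

```r
# As given in the examples section of help("sample"): index into 'x' via
# sample.int(), so a length-one 'x' is never expanded to 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Illustration only: length-one inputs come back unchanged.
a  <- resample(10, size = 1)          # the value 10, not a draw from 1:10
b  <- resample(1.2, size = 1)         # the value 1.2, not 1
c0 <- resample(c(10, 10), size = 2)   # 10 10
```

Because the draw is made on the indices and not on the values, the length of 'x' no longer changes the meaning of the call.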

/Henrik

>
> /Henrik
>
>>
>> On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb_at_biostat.ucsf.edu> wrote:
>>
>>> Hi, consider this one as an FYI, or a seed for further discussion.
>>>
>>> I am aware that many traps in sample() have been reported over the
>>> years. I know that these are also documented in help("sample"). Still,
>>> I got bitten by this while writing
>>>
>>> sample(units, size=length(units));
>>>
>>> where 'units' is an index (positive integer) vector. It works as I
>>> expect in all cases except for length(units) == 1. I know,
>>> it is well known. However, it made me wonder whether it is possible
>>> to use sample() to draw a single value from a set containing only one
>>> value. I don't think so, unless you draw from a value that is <= 1.
>>>
>>> For instance, you can sample from c(10,10) by doing:
>>>
>>> > sample(rep(10, times=2), size=2);
>>> [1] 10 10
>>>
>>> but you cannot sample from c(10) by doing:
>>>
>>> > sample(rep(10, times=1), size=1);
>>> [1] 9
>>>
>>> unless you sample from a value <= 1, e.g.
>>>
>>> > sample(rep(0.31, times=1), size=1);
>>> [1] 0.31
>>>
>>> > sample(rep(-10, times=1), size=1);
>>> [1] -10
>>>
>>> Note also the related issue of sampling from a double vector of length 1,
>>> e.g.
>>>
>>> > sample(rep(1.2, times=2), size=2);
>>> [1] 1.2 1.2
>>> > sample(rep(1.2, times=1), size=1);
>>> [1] 1
>>>
>>> In the latter case, 1.2 is coerced to an integer.
>>>
>>> All of the above makes sense when one studies the code of sample(), but
>>> sample() is indeed dangerous, e.g. imagine how many bootstrap
>>> estimates out there are quietly incorrect.
>>>
>>>
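
To make the trap concrete, here is a minimal sketch of the condition under which sample() treats its first argument as a count and not as a set. The helper name treatedAsCount() is hypothetical, and the condition is my paraphrase of the behaviour documented in help("sample") (a numeric vector of length one with x >= 1), not a copy of the actual source:

```r
# Hypothetical helper: TRUE when sample(x) would draw from 1:x
# instead of from the elements of 'x' itself.
treatedAsCount <- function(x) {
  length(x) == 1L && is.numeric(x) && x >= 1
}

r1 <- treatedAsCount(10)          # length one and >= 1
r2 <- treatedAsCount(1.2)         # length one and >= 1
r3 <- treatedAsCount(0.31)        # length one but < 1
r4 <- treatedAsCount(-10)         # length one but < 1
r5 <- treatedAsCount(c(10, 10))   # length two
```

Only the first two calls return TRUE, which matches the cases quoted above where sample() did not return the value it was given.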
>>> In order to cover all cases of length(units), including one, a solution is:
>>>
>>> sampleFrom <- function(x, size=length(x), ...) {
>>>   n <- length(x);
>>>   if (n == 1L) {
>>>     # A single element: return it as is, so that sample() never
>>>     # expands it to 1:x
>>>     res <- x;
>>>   } else {
>>>     res <- sample(x, size=size, ...);
>>>   }
>>>   res;
>>> } # sampleFrom()
>>>
>>> > sampleFrom(rep(10, times=2), size=2);
>>> [1] 10 10
>>>
>>> > sampleFrom(rep(10, times=1), size=1);
>>> [1] 10
>>>
>>> > sampleFrom(rep(0.31, times=1), size=1);
>>> [1] 0.31
>>>
>>> > sampleFrom(rep(-10, times=1), size=1);
>>> [1] -10
>>>
>>> > sampleFrom(rep(1.2, times=2), size=2);
>>> [1] 1.2 1.2
>>>
>>> > sampleFrom(rep(1.2, times=1), size=1);
>>> [1] 1.2
>>>
>>>
>>> I want to add sampleFrom() to the wishlist of functions to be
>>> available in default R. Alternatively, one can add an argument
>>> 'sampleFrom=FALSE' to the existing sample() function. Eventually, the
>>> default of such an argument could be changed to TRUE.
>>>
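
The alternative with an extra argument could look roughly like the sketch below. The wrapper name sample2() is hypothetical and only stands in for a modified sample(); the argument name 'sampleFrom' is the one proposed above. Unlike sampleFrom(), this sketch draws on the indices, so 'size' and 'replace' are honoured for length-one input as well:

```r
# Hypothetical wrapper illustrating the proposed 'sampleFrom' argument.
# sampleFrom=FALSE defers to sample() unchanged; sampleFrom=TRUE always
# treats 'x' as the set to draw from, whatever its length.
sample2 <- function(x, size, replace=FALSE, prob=NULL, sampleFrom=FALSE) {
  if (sampleFrom) {
    if (missing(size)) size <- length(x)
    x[sample.int(length(x), size=size, replace=replace, prob=prob)]
  } else if (missing(size)) {
    # Let sample() pick its own default size
    sample(x, replace=replace, prob=prob)
  } else {
    sample(x, size=size, replace=replace, prob=prob)
  }
}

y1 <- sample2(10, size=1, sampleFrom=TRUE)                 # the value 10
y2 <- sample2(1.2, size=3, replace=TRUE, sampleFrom=TRUE)  # 1.2 1.2 1.2
y3 <- sample2(10, size=1)                                  # a draw from 1:10, as with sample()
```

Keeping the default at FALSE leaves existing code untouched, which is why the switch to TRUE is suggested only as a later step.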
>>> /Henrik
>>>
>>
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>>

