Re: [Rd] Using sample() to sample one value from a single value?

From: Henrik Bengtsson <hb_at_biostat.ucsf.edu>
Date: Wed, 03 Nov 2010 11:07:51 -0700

On Wed, Nov 3, 2010 at 11:02 AM, Henrique Dallazuanna <wwwhsd_at_gmail.com> wrote:
> Doesn't the resample function in the examples section of the sample help
> page do that?

Yes, I just noticed that one [at the very end of the examples in help("sample")]. So, maybe resample() should be made available as a function in R?
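For reference, the resample() idiom in the examples of help("sample") draws positions with sample.int() and then subsets x, so a length-one x is never reinterpreted as 1:x. The following is a minimal sketch along those lines; the exact definition in the help page may differ between R versions.

```r
# resample(): draw positions with sample.int() and index into x, so that x is
# always treated as the population, whatever its length
# (a sketch of the idiom shown in the examples of help("sample"))
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(1)
resample(10)                    # always 10: the only value available
resample(1.2)                   # always 1.2: no coercion to integer
resample(c(3, 7, 7), size = 2)  # two of the three elements, without replacement
```

Because sample.int() only ever sees a length, the special case for a single number never applies to the values themselves.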

/Henrik

>
> On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb@biostat.ucsf.edu> wrote:
>
>> Hi, consider this one as an FYI, or a seed for further discussion.
>>
>> I am aware that many traps with sample() have been reported over the
>> years.  I know that these are also documented in help("sample").  Still,
>> I got bitten by this while writing
>>
>> sample(units, size=length(units));
>>
>> where 'units' is an index (positive integer) vector.  It works in all
>> cases as expected (= as I expect) except for length(units) == 1.  I know,
>> it is well known.  However, it made me wonder whether it is possible
>> to use sample() to draw a single value from a set containing only one
>> value.  I don't think so, unless you draw from a value that is <= 1.
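The behaviour described here follows the rule documented in help("sample"): if x has length 1, is numeric and x >= 1, sampling takes place from 1:x. The helper below, effective_population(), is hypothetical (not part of R) and simply returns the population that rule implies, which makes the special case easy to inspect. It assumes the condition used by recent R versions, which additionally require x to be finite.

```r
# Hypothetical helper, not part of R: returns the population that sample(x)
# draws from, following the rule documented in help("sample")
effective_population <- function(x) {
  if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1) {
    1:x   # a single number >= 1 is read as "sample from 1:x"
  } else {
    x     # anything else is treated as the set of values itself
  }
}

effective_population(c(10, 10))  # the values themselves: 10 10
effective_population(10)         # 1:10, not the value 10
effective_population(0.31)       # 0.31, since 0.31 < 1
effective_population(1.2)        # 1:1.2, i.e. just 1
```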
>>
>> For instance, you can sample from c(10,10) by doing:
>>
>> > sample(rep(10, times=2), size=2);
>> [1] 10 10
>>
>> but you cannot sample from c(10) by doing:
>>
>> > sample(rep(10, times=1), size=1);
>> [1] 9
>>
>> unless you sample from a value <= 1, e.g.
>>
>> > sample(rep(0.31, times=1), size=1);
>> [1] 0.31
>>
>> > sample(rep(-10, times=1), size=1);
>> [1] -10
>>
>> Note also the related issue of sampling from a double vector of length 1,
>> e.g.
>>
>> > sample(rep(1.2, times=2), size=2);
>> [1] 1.2 1.2
>> > sample(rep(1.2, times=1), size=1);
>> [1] 1
>>
>> In the latter case, 1.2 is effectively coerced to an integer.
>>
>> All of the above makes sense once one studies the code of sample(), but
>> sample() is indeed dangerous; e.g. imagine how many bootstrap
>> estimates out there are quietly incorrect.
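To make the bootstrap concern concrete, the sketch below resamples a vector that happens to have length one. The functions boot_means_naive() and boot_means_safe() are hypothetical names used only for illustration: the first calls sample(x, replace = TRUE) directly, the second draws indices with sample.int() and subsets x.

```r
# Hypothetical bootstrap helpers, for illustration only.
# Naive version: sample(x, ...) misreads a single number >= 1 as 1:x.
boot_means_naive <- function(x, B = 1000) {
  replicate(B, mean(sample(x, replace = TRUE)))
}

# Index-based version: x is always the population, whatever its length.
boot_means_safe <- function(x, B = 1000) {
  replicate(B, mean(x[sample.int(length(x), replace = TRUE)]))
}

set.seed(42)
x <- 10                       # e.g. a group that ended up with one observation
summary(boot_means_naive(x))  # means of draws from 1:10, centred near 5.5, not 10
summary(boot_means_safe(x))   # every bootstrap mean is exactly 10
```

No error or warning is raised in the naive case, which is exactly why such estimates can go wrong unnoticed.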
>>
>>
>> In order to cover all cases of length(units), including one, a solution is:
>>
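>> sampleFrom <- function(x, size=length(x), ...) {
>>  n <- length(x);
>>  # A numeric 'x' of length one (>= 1) would be read by sample() as 1:x,
>>  # so a single value is returned as is.
>>  # Note that this branch ignores 'size' and '...' (e.g. replace=TRUE).
>>  if (n == 1L) {
>>    res <- x;
>>  } else {
>>    # Two or more values: sample() treats 'x' as the population.
>>    res <- sample(x, size=size, ...);
>>  }
>>  res;
>> } # sampleFrom()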
>>
>> > sampleFrom(rep(10, times=2), size=2);
>> [1] 10 10
>>
>> > sampleFrom(rep(10, times=1), size=1);
>> [1] 10
>>
>> > sampleFrom(rep(0.31, times=1), size=1);
>> [1] 0.31
>>
>> > sampleFrom(rep(-10, times=1), size=1);
>> [1] -10
>>
>> > sampleFrom(rep(1.2, times=2), size=2);
>> [1] 1.2 1.2
>>
>> > sampleFrom(rep(1.2, times=1), size=1);
>> [1] 1.2
>>
>>
>> I would like to add sampleFrom() to the wishlist of functions
>> available in base R.  Alternatively, an argument 'sampleFrom=FALSE'
>> could be added to the existing sample() function.  Eventually, such
>> an argument could be made TRUE by default.
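The proposed 'sampleFrom' argument can be sketched as a stand-alone wrapper, since base R's sample() itself is not changed here. The name sample2() is hypothetical; when sampleFrom = TRUE it always treats x as the set of values to draw from, and otherwise it defers to sample() unchanged.

```r
# Hypothetical wrapper, not part of base R: sketches the 'sampleFrom' argument
# proposed above. With sampleFrom = TRUE, 'x' is always the population,
# even when it has length one.
sample2 <- function(x, size, replace = FALSE, prob = NULL, sampleFrom = FALSE) {
  if (!sampleFrom) {
    # Default: plain sample(), including its length-one special case
    if (missing(size)) return(sample(x, replace = replace, prob = prob))
    return(sample(x, size = size, replace = replace, prob = prob))
  }
  if (missing(size)) size <- length(x)
  # Draw positions, then index into x, so the values and type of x are kept
  x[sample.int(length(x), size = size, replace = replace, prob = prob)]
}

sample2(10, size = 1)                                    # base behaviour: a draw from 1:10
sample2(10, size = 1, sampleFrom = TRUE)                 # always 10
sample2(1.2, sampleFrom = TRUE)                          # always 1.2
sample2(7, size = 3, replace = TRUE, sampleFrom = TRUE)  # 7 7 7
```

Unlike sampleFrom(), this variant also honours 'size' and 'replace' when x has length one.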
>>
>> /Henrik
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
>



Received on Wed 03 Nov 2010 - 18:10:21 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 03 Nov 2010 - 18:40:17 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.
