# Re: [R] Sampling

From: Tim Hesterberg <timh_at_insightful.com>
Date: Wed, 06 Feb 2008 10:49:24 -0800

> I want to generate different samples using the
>following code:
>
>g<-sample(LETTERS[1:2], 24, replace=T)
>
> How can I specify that I need 12 "A"s and 12 "B"s?

I introduced the concept of "sampling with minimal replacement" into the S-PLUS version of sample to handle things like this:

sample(LETTERS[1:2], 24, minimal = T)

This is very useful in variance reduction applications, to approximately stratify but without introducing bias. I'd like to see this in R.
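Base R's sample() has no `minimal` argument, so for the equal-probability case in the question something along the following lines may serve. For exactly 12 of each, `sample(rep(LETTERS[1:2], each = 12))` is enough; the helper below, `sample_minimal`, is a hypothetical name and only a sketch of the general idea, not the S-PLUS implementation.

```r
# Hypothetical helper (not part of base R): sample `size` items from `x`
# so that every item appears either floor(size/n) or ceiling(size/n) times.
sample_minimal <- function(x, size) {
  n <- length(x)
  full <- rep(x, times = size %/% n)      # complete passes through x
  extra <- x[sample.int(n, size %% n)]    # random remainder, no repeats
  pool <- c(full, extra)
  pool[sample.int(length(pool))]          # shuffle the order
}

set.seed(2008)
g <- sample_minimal(LETTERS[1:2], 24)
table(g)   # 12 "A"s and 12 "B"s, in random order
```

Indexing with sample.int() rather than calling sample(x, ...) directly avoids the surprise when `x` is a single number.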

In R, when sample() is called without replacement and with unequal prob, the selection probabilities are not proportional to the specified probabilities.

In contrast, in S-PLUS:
> values <- sapply(1:1000, function(i) sample(1:3, size=2, prob = c(.5, .25, .25)))
> table(values)

   1   2   3
1000 501 499

You can specify minimal = FALSE to get the same behavior as R:
> values <- sapply(1:1000, function(i) sample(1:3, size=2, prob = c(.5, .25, .25), minimal = F))
> table(values)

  1   2   3
844 592 564
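The second set of counts can be checked against theory. R's sample() without replacement draws items one at a time, with the probability of the next item proportional to the weights of the items that remain. Under that scheme the inclusion probabilities for size = 2 can be written down directly; the following is a small sketch (the variable names are mine):

```r
# Exact inclusion probabilities under draw-by-draw sampling without
# replacement, size = 2, with the weights used above.
p <- c(.5, .25, .25)
incl <- sapply(seq_along(p), function(i) {
  others <- setdiff(seq_along(p), i)
  # item i is drawn first, or drawn second after some other item came first
  p[i] + sum(p[others] * p[i] / (1 - p[others]))
})
round(incl, 3)   # 0.833 0.583 0.583

# Simulation in R for comparison with the exact values
set.seed(1)
values <- sapply(1:1000, function(i) sample(1:3, size = 2, prob = p))
table(values) / 1000
```

Selection proportional to prob would instead require inclusion probabilities of size*prob = c(1, .5, .5), which is what the first table shows.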

There is a reason this is associated with the concept of sampling with minimal replacement. Consider for example:

sample(1:4, size = 3, prob = 1:4/10)

The expected frequencies of (1,2,3,4) should be proportional to size*prob = c(.3,.6,.9,1.2). That isn't possible when sampling without replacement, since observation 4 would need an expected count of 1.2 but can appear at most once. Sampling with minimal replacement allows this; observation 4 is included in every sample, and is included twice in 20% of the samples.
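One way to get this behavior in R is sketched below. It is an assumption about how such a sampler could work, not a description of the S-PLUS code: each item gets floor(size*prob) guaranteed copies, and the fractional parts are then used as inclusion probabilities in a systematic sample. The function name `sample_min_prob` is hypothetical.

```r
# Hypothetical helper, not S-PLUS's implementation: draw `size` items so that
# the expected count of x[i] is size * prob[i] and every count is either
# the floor or the ceiling of that expectation.
sample_min_prob <- function(x, size, prob) {
  expected <- size * prob / sum(prob)
  base <- floor(expected)        # copies that every sample must contain
  frac <- expected - base        # leftover inclusion probabilities, each < 1
  k <- size - sum(base)          # number of extra draws still needed
  extra <- integer(0)
  if (k > 0) {
    # Systematic sampling: k points spaced 1 apart on the cumulative scale,
    # so no item can be hit twice and item i is hit with probability frac[i]
    points <- runif(1) + 0:(k - 1)
    extra <- pmin(findInterval(points, cumsum(frac)) + 1, length(x))
  }
  out <- c(rep(seq_along(x), base), extra)
  x[out[sample.int(length(out))]]   # shuffle the order
}

set.seed(42)
counts <- replicate(10000,
  tabulate(sample_min_prob(1:4, size = 3, prob = 1:4/10), nbins = 4))
rowMeans(counts)   # should be close to c(.3, .6, .9, 1.2)
```

Because the systematic step uses a fixed item order, the joint inclusion probabilities are not those of a fully randomized design; permuting the items before the systematic step would be one way to address that.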

Tim Hesterberg

Disclaimer - these are my opinions, not those of my employer.

```
| Tim Hesterberg       Senior Research Scientist       |
| timh_at_insightful.com  Insightful Corp.                |
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|                      www.insightful.com/Hesterberg   |
========================================================
```
I'll teach short courses:
Advanced Programming in S-PLUS: San Antonio TX, March 26-27, 2008.
Bootstrap Methods and Permutation Tests: San Antonio, March 28, 2008.

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 06 Feb 2008 - 19:11:11 GMT
