From: Jonathan P Daily <jdaily_at_usgs.gov>

Date: Wed, 02 Mar 2011 15:00:28 -0500




I apologize if I was not clear in my response. My example mentioned only x1 and x2, and I did not make clear that I was also aware that the original request implies P(x6 = 1 | x1..5 = 1) = 0. I also see that, if he meant sampling with replacement from the set of valid sequences, then sample(rep(1:20, 5), 20) is fine for generating those sequences. My interpretation was that each sequence should itself be built by sampling with replacement until a value's frequency reaches 5, at which point that value is no longer replaced. Hence my suggestion of:

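# Draw many more values than needed, with replacement
bigsamp <- sample(1:20, 100, replace = TRUE)

# Positions of the first 5 occurrences of each value (sort() drops the NAs
# left by values seen fewer than 5 times); keep the earliest 20 positions
idx <- sort(unlist(sapply(1:20, function(x) which(bigsamp == x)[1:5])))[1:20]

samp <- bigsamp[idx]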
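
The same idea can be wrapped in a function so that it can be called once per bootstrap replicate. This is only a sketch: the name capped_sample and its arguments (n, size, cap, oversample) are made up for illustration, and restarting when the oversample turns out too short is a simplification, not something from the original suggestion.

```r
# Hypothetical helper: draw `size` values from 1:n with replacement,
# skipping any draw that would push a value past `cap` occurrences.
capped_sample <- function(n = 20, size = 20, cap = 5, oversample = 5) {
  stopifnot(size <= n * cap)  # otherwise the cap can never be satisfied
  repeat {
    big <- sample(seq_len(n), size * oversample, replace = TRUE)
    # Positions of the first `cap` occurrences of each value, in draw order
    keep <- sort(unlist(lapply(seq_len(n),
                               function(x) head(which(big == x), cap))))
    if (length(keep) >= size) return(big[keep[seq_len(size)]])
    # Too few usable draws (rare): start over with a fresh oversample
  }
}

set.seed(1)
samp <- capped_sample()
table(samp)  # no count should exceed 5
```

Calling replicate(1000, capped_sample()) would then give one resample per column.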

I apologize for my lack of clarity, though after reading the original post I'm not sure which solution the OP was looking for.
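
For comparison, the other two readings discussed in the quoted messages below could be sketched roughly as follows: rejection sampling as Bert describes it, and the shuffle of a five-fold replicate that John suggested. The function names reject_sample and shuffle_sample are hypothetical, and which of these matches the original request depends on the interpretation.

```r
# Rejection sampling: draw `size` values with replacement, discard the whole
# sequence if any value occurs more than `cap` times, and try again.
reject_sample <- function(n = 20, size = 20, cap = 5) {
  repeat {
    x <- sample(seq_len(n), size, replace = TRUE)
    if (max(tabulate(x, nbins = n)) <= cap) return(x)
  }
}

# Shuffle approach: the first `size` elements of a shuffled pool holding
# `cap` copies of each value (sampling without replacement from the pool).
shuffle_sample <- function(n = 20, size = 20, cap = 5) {
  sample(rep(seq_len(n), cap), size)
}

set.seed(1)
boot_reject  <- replicate(1000, reject_sample())   # one resample per column
boot_shuffle <- replicate(1000, shuffle_sample())
```

Either matrix can be checked with apply(boot_reject, 2, function(x) max(table(x))), which should never exceed 5.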

Cheers,

Jon

Jonathan P. Daily

Technician - USGS Leetown Science Center 11649 Leetown Road

Kearneysville WV, 25430

(304) 724-4480

"Is the room still a room when it's empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it."

- Jubal Early, Firefly

Bert Gunter <gunter.berton_at_gene.com> wrote on 03/02/2011 02:42:40 PM:

> Re: [R] bootstrap resampling - simplified
> Bert Gunter to: Jonathan P Daily, 03/02/2011 02:42 PM
> Cc: "Vokey, John", r-help, r-help-bounces
>
> Folks:
>
> On Wed, Mar 2, 2011 at 10:32 AM, Jonathan P Daily <jdaily_at_usgs.gov> wrote:
> > I will point out again that sampling a five-fold replicate of 1:20 is not
> > the same as resampling with replacement,
>
> -- Correct. In sampling with replacement from 1:20 there is positive
> probability of getting all 1's or all 2's, etc. The poster
> specifically said that he wanted 0 probability of such results. So,
> obviously, the poster does NOT want to "sample with replacement from
> 1:20." What he does want (I think) is a re-sample of size n from the
> set of all **vectors** of length 20, each element of which is an
> integer from 1 to 20, and for which no individual values occur more
> than 5 times in the vector. Of course I'm just
> interpreting/paraphrasing the original post (if I got it right), but I
> think doing so makes the nature of the task clearer: one needs to find
> some way to sample with replacement from the space of all such
> **sequences**.
>
> I think it is now clear that one may do so by rejection sampling: i.e.
> sample with replacement from 1:20 and throw away any sequences that
> fail the at most 5 criterion. The sequences that remain are samples of
> size 1 from the population of sequences that satisfy the poster's
> criteria (in theory, anyway; this might tax a pseudo RNG in practice).
> A collection of n such sequences is a bootstrap sample from this
> population. I **think** that's what the poster wants -- and what
> others have already provided. However, maybe this clarifies why it
> works.
>
> If I have made any error in this, **Please** post a message pointing
> out my error. I sometimes get confused about this stuff, too.
>
> Cheers,
> Bert
>
> > although I made an error in
> > reporting probabilities - the P(x2 = 1 | x1 = 1) = 4/99 and not 4/100.
> > When sampling with replacement, P(x2 = 1 | x1 = 1) = P(x2 = 1 | x1 != 1) =
> > 1/20.

> >
> > r-help-bounces_at_r-project.org wrote on 03/02/2011 01:05:01 PM:
> >
> >> Re: [R] bootstrap resampling - simplified
> >> Vokey, John to: r-help, 03/02/2011 01:07 PM
> >> Sent by: r-help-bounces_at_r-project.org
> >>
> >> On 2011-03-02, at 4:00 AM, r-help-request_at_r-project.org wrote:
> >>
> >> > Hello there,
> >> >
> >> > I have a problem concerning bootstrapping in R - especially
> >> > focusing on the resampling part of it. I try to sum it up in a
> >> > simplified way so that I would not confuse anybody.
> >> >
> >> > I have a small database consisting of 20 observations (basically
> >> > numbers from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
> >> >
> >> > I would like to resample this database many times for the
> >> > bootstrap process with the following conditions. Firstly, every
> >> > resampled database should also include 20 observations. Secondly,
> >> > when selecting a number from the above-mentioned 20 numbers, you can
> >> > do this selection with replacement. The difficult part comes now:
> >> > one number can be selected only maximum 5 times. In order to make
> >> > this clear I show you a couple of examples. So the resampled
> >> > databases might be like the following ones:
> >> >
> >> > (1st database) 1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
> >> > 4 different numbers are chosen (1, 2, 3, 4), each selected - for
> >> > the maximum possible - 5 times.
> >> >
> >> > (2nd database) 1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
> >> > Two numbers - 8 and 6 - selected 5 times (the maximum possible
> >> > times), number 1 selected 4 times, the others selected less than 4
> >> > times.
> >> >
> >> > (3rd database) 1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1
> >> > Number 9 chosen for the maximum possible 5 times, number 10, 3, 2,
> >> > 1 chosen for 3 times, number 4 selected twice and number 13
> >> > selected only once.
> >> >
> >> > ...
> >> >
> >> > Anybody knows how to implement my "tricky" condition into one of
> >> > the R functions - that one number can be selected only 5 times at
> >> > most? Are 'boot' and 'bootstrap' packages capable of managing this?
> >> > I guess they are, I just couldn't figure it out yet...
> >> >
> >> > Thanks very much! Best regards,
> >> > Laszlo Bodnar
> >>
> >> Laszlo,
> >> Create a vector consisting of 5 of each number. Then, for each
> >> sample, scramble the order of the items in the vector, and select
> >> the first 20.
> >>
> >> --
> >> Please avoid sending me Word or PowerPoint attachments.
> >> See <http://www.gnu.org/philosophy/no-word-attachments.html>
> >>
> >> -Dr. John R. Vokey
> >>
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
> http://devo.gene.com/groups/devo/depts/ncb/home.shtml

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 02 Mar 2011 - 20:07:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 02 Mar 2011 - 20:40:20 GMT.
