Re: [R] how to control the sampling to make each sample unique

From: Rory Martin <>
Date: Thu, 10 May 2007 09:09:43 -0400

I think you're asking a design question about a Monte Carlo simulation. You have a "population" (size 10,000) from which you're defining an empirical distribution, and you're sampling from this to create pairs of training and test samples.
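To make that design concrete, here is a minimal R sketch of one such Monte Carlo comparison. It is not the poster's setup. It assumes a synthetic population (two predictors and a linear response) in place of the real 10,000 records, a hypothetical `ntrain` of 7,000, two `lm()` fits standing in for the two prediction models, and test-set RMSE as the comparison metric.

```r
# Sketch of the Monte Carlo design: repeated random train/test splits of one
# fixed "population" of records, comparing two models on each split.
# ASSUMPTIONS: synthetic data and two lm() fits stand in for the real dataset
# and the poster's prediction models; ntrain and R are illustrative values.
set.seed(1)
n      <- 10000
ntrain <- 7000            # hypothetical training-set size
R      <- 50              # number of Monte Carlo replicates (shuffles)

# Synthetic population standing in for the 10,000 records
pop <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
pop$y <- 1 + 2 * pop$x1 + 0.5 * pop$x2 + rnorm(n)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

results <- matrix(NA_real_, nrow = R, ncol = 2,
                  dimnames = list(NULL, c("modelA", "modelB")))

for (r in seq_len(R)) {
  train_idx <- sample(n, ntrain)          # ntrain row numbers, no replacement
  train <- pop[train_idx, ]
  test  <- pop[-train_idx, ]              # the remaining n - ntrain rows
  fitA <- lm(y ~ x1, data = train)        # hypothetical model A
  fitB <- lm(y ~ x1 + x2, data = train)   # hypothetical model B
  results[r, ] <- c(rmse(test$y, predict(fitA, test)),
                    rmse(test$y, predict(fitB, test)))
}

colMeans(results)   # average test RMSE of each model across replicates
```

Each row of `results` is one shuffle, so the spread of the two columns (not just their means) shows how much the comparison depends on the particular split.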

You need to ensure that each specific pair of training and test samples is disjoint, meaning they have no observations in common. Normally, you wouldn't want to make the different training samples disjoint from one another, if that's what you meant by them being "unique". Or did you simply mean that no two training samples should be identical?
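A minimal R sketch of both points, under stated assumptions: `make_splits()` is a hypothetical helper (not part of base R), the sizes are illustrative, and the test set is simply every row not drawn for training, so each pair is disjoint by construction. Training sets from different replicates are allowed to overlap; the only cross-replicate check rejects an exact repeat of a whole training set. With 10,000 records there are `choose(10000, ntrain)` possible training sets, so an exact repeat is vanishingly unlikely and the check is cheap insurance rather than a necessity.

```r
# Sketch: R train/test pairs where each pair is disjoint and no two training
# sets are identical. make_splits() is a HYPOTHETICAL helper for illustration.
# Note: the loop never ends if R exceeds choose(n, ntrain), which only matters
# for very small n.
make_splits <- function(n, ntrain, R) {
  splits <- vector("list", R)
  seen   <- character(0)                 # keys of training sets already used
  r <- 0
  while (r < R) {
    train_idx <- sort(sample(n, ntrain)) # sorted so the key ignores order
    key <- paste(train_idx, collapse = ",")
    if (key %in% seen) next              # exact repeat of a training set: redraw
    seen <- c(seen, key)
    r <- r + 1
    splits[[r]] <- list(train = train_idx,
                        test  = setdiff(seq_len(n), train_idx))
  }
  splits
}

splits <- make_splits(n = 10000, ntrain = 7000, R = 20)

# Every pair is disjoint: no row appears in both its training and test set
all(sapply(splits, function(s) length(intersect(s$train, s$test)) == 0))
```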

Rory Martin

> From: HelponR <> Date: Wed, 09 May 2007 17:28:19
> I have a dataset of 10000 records which I want to use to compare two
> prediction models.
> I split the records into a test dataset (size = ntest) and a training dataset
> (size = ntrain). Then I run the two models.
> Now I want to shuffle the data and rerun the models. I want many shuffles.
> I know that the following command
> sample(1:10000, ntrain)
> can pick ntrain numbers from 1 to 10000. Then I just use these rows as the
> training dataset.
> But how can I make sure each run of sample produces different results? I
> want the output to be unique each time. I tested sample() and found it
> usually produces different combinations. But can I control it somehow? Is
> there a better way to write this?

Received on Thu 10 May 2007 - 13:15:02 GMT
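On the "can I control it?" question, a minimal R sketch, assuming a hypothetical `ntrain` of 7,000 and 100 shuffles. `set.seed()` makes the whole sequence of shuffles reproducible; it does not force them to differ, they differ because each `sample()` call consumes fresh random numbers. `replicate()` collects every run's training indices in one matrix (one column per run), and the last line counts runs whose training set repeats an earlier one, ignoring order.

```r
# Sketch: reproducible shuffles with set.seed(), all drawn at once with
# replicate(). ntrain and nruns are HYPOTHETICAL illustrative values.
ntrain <- 7000
nruns  <- 100

set.seed(2007)
idx1 <- replicate(nruns, sample(10000, ntrain))   # ntrain x nruns matrix

set.seed(2007)
idx2 <- replicate(nruns, sample(10000, ntrain))   # same seed, same draws

identical(idx1, idx2)   # TRUE: the shuffles can be regenerated exactly

# Runs whose training set (as an unordered set of rows) repeats an earlier run
sum(duplicated(t(apply(idx1, 2, sort))))
```

Column `j` of `idx1` can then be used as the training rows for run `j`, with `-idx1[, j]` selecting that run's test rows.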

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 May 2007 - 21:32:10 GMT.
