[R] Re: random sampling with levels and with replacement

From: Petr PIKAL <petr.pikal_at_precheza.cz>
Date: Fri, 08 Apr 2011 11:11:01 +0200

Hi

r-help-bounces_at_r-project.org wrote on 08.04.2011 09:31:44:

> Dear all,
> i have a dataset of about 400 records, with a variable that has two levels,
> 40 bad and 360 good, among other variables. how do i come up with 10 random
> samples that have the same composition as the main sample, maintaining the
> 40 bad 360 good, with replacement? i recently discovered that the random samples
> i generated don't maintain the ratio. My code is:
>
> mysample <- final[sample(1:nrow(final), 400,replace=TRUE),]
>
> It does not give me the ratio of 40 bad and 360 good. can anyone give me some
> pointers please?

If you draw 400 items with replacement, you will get the exact proportion of good and bad only by accident. In each draw your chance of getting a bad one is 40/400 (that is, odds of 40 bad to 360 good), but that does not mean that 400 random picks will contain exactly 40 bad items.
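
To put a rough number on this: with each draw being bad with probability 40/400 = 0.1, the count of bad items in 400 draws with replacement follows a binomial distribution. A small sketch using base R's dbinom (the variable names are only for illustration):

```r
n <- 400            # draws per resample
p_bad <- 40 / 400   # chance that a single draw is a bad item

# Probability that one resample contains exactly 40 bad items
p_exact <- dbinom(40, size = n, prob = p_bad)

# Expected count and standard deviation of the number of bad items
mean_bad <- n * p_bad
sd_bad <- sqrt(n * p_bad * (1 - p_bad))

print(c(p_exact = p_exact, mean_bad = mean_bad, sd_bad = sd_bad))
```

The expected count is 40 with a standard deviation of 6, so counts anywhere between roughly 30 and 50 are quite ordinary, and an exact 40 shows up in only a small fraction of resamples.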

If you just want to shuffle your rows, use sampling without replacement.

mysample <- final[sample(1:nrow(final), 400),]

In that case you get the same data but with random row order.

But if you sample with replacement, you will get the original proportion of good and bad items only on average. You can check this, e.g., with

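x <- c(rep("g", 360), rep("b", 40))
res <- rep(NA, 1000)
for (i in 1:1000) {
  # table() sorts the levels alphabetically, so y[1] is "b" and y[2] is "g"
  y <- table(sample(x, 400, replace = TRUE))
  res[i] <- y[1] / y[2]
}
# plot once, after all 1000 bad/good ratios have been collected
hist(res)
abline(v = 40/360, col = 2)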
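
If every resample really has to contain exactly 40 bad and 360 good rows, one possible approach is to sample with replacement separately within each level and then combine the pieces. What follows is only a minimal sketch: the toy data frame, the column names status and value, and the helper stratified_resample are assumptions made for illustration, not taken from the original data.

```r
set.seed(1)  # only to make this sketch reproducible

# Toy stand-in for the poster's data frame 'final';
# the columns 'status' and 'value' are assumed names
final <- data.frame(status = c(rep("bad", 40), rep("good", 360)),
                    value  = rnorm(400))

# Hypothetical helper: resample row indices with replacement
# separately within each level of the grouping column
stratified_resample <- function(df, group) {
  idx_by_level <- split(seq_len(nrow(df)), df[[group]])
  idx <- unlist(lapply(idx_by_level,
                       function(i) i[sample.int(length(i), replace = TRUE)]),
                use.names = FALSE)
  df[idx, , drop = FALSE]
}

# Ten resamples, each built from 40 draws among the bad rows
# and 360 draws among the good rows
samples <- replicate(10, stratified_resample(final, "status"),
                     simplify = FALSE)
print(table(samples[[1]]$status))
```

Each element of samples is a data frame of 400 rows. Individual rows can repeat within a level, but the bad/good counts are fixed by construction.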

Regards
Petr

> Thanks,
> Taby
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Received on Fri 08 Apr 2011 - 09:14:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 08 Apr 2011 - 09:20:28 GMT.
