[R] Permutations and large data sets

From: Chris Miller <chrisamiller_at_gmail.com>
Date: Wed, 12 Nov 2008 16:47:25 -0600


I have 200 samples, with 1 million data points in each. Each data point can have a value from zero to 10, and we can assume that they're normally distributed. If I calculate a sum by drawing one random data point from each sample and adding them, what value does that sum need to be before I can say that it's higher than 95% of the other possible sums (with reasonable probability)?

The brute-force way to do this is to calculate all possible sums, sort them, then find the value 95% of the way through the list. Obviously, this won't work, since the number of possible combinations is astronomical (10^6 choices in each of 200 samples gives 10^1200 sums). So what's the appropriate way to approximate this, using R?

Thanks,

Chris Miller
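
One possible way to approach the question above, sketched in R under stated assumptions: rather than enumerating every sum, draw a large number of random sums and take their empirical 95% quantile. If the draws from the 200 samples are independent, the sum is also approximately normal by the central limit theorem, so a closed-form check can be computed from the per-sample means and variances. The matrix `x`, its simulated contents, and the sizes `n_points` and `n_draws` are placeholders for illustration and are not part of the original post.

```r
set.seed(1)

# Stand-in data (an assumption): 200 samples stored as the columns of a
# matrix. Only 10,000 points per sample are simulated to keep the sketch
# light; the real data would have 1e6 rows per sample.
n_samples <- 200
n_points  <- 1e4
vals <- rnorm(n_points * n_samples, mean = 5, sd = 1.5)
vals <- pmin(pmax(vals, 0), 10)          # clamp to the stated 0..10 range
x <- matrix(vals, nrow = n_points, ncol = n_samples)

# Monte Carlo: each replicate picks one random row per column and adds
# the 200 picked values, i.e. one random "sum" as described in the post.
n_draws <- 20000
one_sum <- function() {
  rows <- sample.int(n_points, n_samples, replace = TRUE)
  sum(x[cbind(rows, seq_len(n_samples))])
}
sums <- replicate(n_draws, one_sum())
q_mc <- quantile(sums, 0.95)             # empirical 95% cutoff

# Normal approximation: the mean of the sum is the sum of the column
# means, and (assuming independent draws) its variance is the sum of
# the column variances.
mu    <- sum(colMeans(x))
sigma <- sqrt(sum(apply(x, 2, var)))
q_clt <- qnorm(0.95, mean = mu, sd = sigma)

print(c(monte_carlo = unname(q_mc), normal_approx = q_clt))
```

The two estimates should agree closely, and raising `n_draws` tightens the Monte Carlo one. With the real data, `x` would be replaced by the actual 1e6 x 200 matrix (roughly 1.6 GB as doubles), or the column means and variances could be computed one sample at a time so that only the normal approximation is needed.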



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Wed 12 Nov 2008 - 22:50:03 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
