[Rd] c/c++ Random Number Generators Benchmarks using OpenMP

From: Karl Forner <karl.forner_at_gmail.com>
Date: Fri, 02 Mar 2012 10:38:54 +0100


Dear R gurus,

I am interested in permutations-based cpu-intensive methods so I had to pay a little attention to Random Number Generators (RNG). For my needs, RNGs have to:

  1. be fast. I profiled my algorithms, and for some the bottleneck was the RNG.
  2. be scalable. Meaning that I want the RNG to remain fast as I add threads.
  3. offer a long cycle length. Some basic generators have a cycle length so low that in a few seconds you can finish it, making further computations useless and redundant
  4. be able to give reproducible results independent of the number of threads used, i.e. I want my program to give the very same exact results using one or 10 threads ( 4) "be good" of course )

I found an implementation that seems to meet my criterion and made a preliminary package to test it.
In the meantime Petr Savicky contacted saying he was about to release a similar package called rngOpenMP.

So I decided to perform some quick benchmarks. The benchmark code is available as a R package "rngBenchmarks" here: https://r-forge.r-project.org/scm/viewvc.php/pkg/?root=gwas-bin-tests but it depends on some unpublished package, like rngOpenMP, and my preliminary package, yet available from the same URL.

As a benchmark I implemented a Monte-Carlo computation of PI. I tried to use the exact same computation method, using a template argument for the RNG, and providing wrappers for the different available RNGs, except for the rngOpenMP that is not instantiable, so I adapted specifically the code.

I included in the benchmark:

My conclusions:

The problem with random_r is that its cycle length according to my manpage is ~ 3E10, enabling for instance only 3 millions permutations of a vector of 10,000 elements,
to be compared with

Leaving the RcppRandomSFMT as best candidate. This implementation also allows multiple seeds, solving my requisite number 4, reproducible results independent of the number of threads, if I use as second seed the task identifier.

Of course I am probably biased, so please tell me if you have some better ideas of benchmarks, tests of correctness, if you'd like some other implementations to be included.

People interested in this topic couldcontact me in order that we collaboratively propose an implementation suiting all needs.

Thanks,

Karl Forner

Annex:

I ran the benchmarks on a linux Intel(R) Xeon(R) with 2 cpus of 4 cores each ( CPU E5520 @ 2.27GHz).

        type threads     n        error    time time_per_chunk
1     lecuyer       1 1e+07 2.105472e-04   1.538     0.00153800
2     lecuyer       1 1e+08 4.441492e-05  15.265     0.00152650
3     lecuyer       1 1e+09 2.026819e-05 153.209     0.00153209
4     lecuyer       2 1e+07 3.182633e-04   0.821     0.00082100
5     lecuyer       2 1e+08 7.375036e-05   7.751     0.00077510
6     lecuyer       2 1e+09 9.290323e-06  76.476     0.00076476
7     lecuyer       4 1e+07 9.630351e-05   0.401     0.00040100
8     lecuyer       4 1e+08 1.263486e-05   3.887     0.00038870
9     lecuyer       4 1e+09 1.151515e-06  38.618     0.00038618
10    lecuyer       8 1e+07 1.239703e-05   0.241     0.00024100
11    lecuyer       8 1e+08 7.894518e-05   2.133     0.00021330
12    lecuyer       8 1e+09 6.782041e-06  20.420     0.00020420
13   random_r       1 1e+07 7.898746e-05   0.137     0.00013700
14   random_r       1 1e+08 4.748343e-05   1.290     0.00012900
15   random_r       1 1e+09 1.685692e-05  12.844     0.00012844
16   random_r       2 1e+07 4.757590e-06   0.095     0.00009500
17   random_r       2 1e+08 7.389450e-05   0.663     0.00006630
18   random_r       2 1e+09 2.913732e-05   6.469     0.00006469
19   random_r       4 1e+07 1.664590e-04   0.037     0.00003700
20   random_r       4 1e+08 1.138106e-04   0.330     0.00003300
21   random_r       4 1e+09 3.734717e-05   3.209     0.00003209
22   random_r       8 1e+07 1.034678e-04   0.051     0.00005100
23   random_r       8 1e+08 4.733472e-05   0.167     0.00001670
24   random_r       8 1e+09 1.985413e-05   1.694     0.00001694
25 rng_openmp       1 1e+07 2.097492e-04   1.231     0.00123100
26 rng_openmp       1 1e+08 7.580436e-05  12.155     0.00121550
27 rng_openmp       1 1e+09 2.772810e-05 120.712     0.00120712
28 rng_openmp       2 1e+07 6.870833e-05   0.609     0.00060900
29 rng_openmp       2 1e+08 1.071006e-04   6.095     0.00060950
30 rng_openmp       2 1e+09 4.296624e-05  61.426     0.00061426
31 rng_openmp       4 1e+07 3.000218e-04   0.312     0.00031200
32 rng_openmp       4 1e+08 6.313562e-05   3.066     0.00030660
33 rng_openmp       4 1e+09 1.436418e-05  30.441     0.00030441
34 rng_openmp       8 1e+07 2.767216e-04   0.191     0.00019100
35 rng_openmp       8 1e+08 9.341252e-06   1.614     0.00016140
36 rng_openmp       8 1e+09 6.874726e-06  16.121     0.00016121
37       sfmt       1 1e+07 2.313942e-04   0.114     0.00011400
38       sfmt       1 1e+08 1.485365e-06   1.186     0.00011860
39       sfmt       1 1e+09 1.083604e-05  11.417     0.00011417
40       sfmt       2 1e+07 1.898866e-04   0.062     0.00006200
41       sfmt       2 1e+08 2.627199e-06   0.573     0.00005730
42       sfmt       2 1e+09 1.295063e-05   5.779     0.00005779
43       sfmt       4 1e+07 1.691328e-04   0.033     0.00003300
44       sfmt       4 1e+08 4.942283e-05   0.315     0.00003150
45       sfmt       4 1e+09 1.232928e-05   2.871     0.00002871
46       sfmt       8 1e+07 1.001232e-04   0.022     0.00002200
47       sfmt       8 1e+08 3.991173e-05   0.149     0.00001490
48       sfmt       8 1e+09 5.775921e-06   1.550     0.00001550

         type threads     n        error    time time_div_threads
1     lecuyer       8 1e+07 6.628918e-05   0.269         0.033625
2     lecuyer       8 1e+08 8.265030e-05   2.033         0.254125
3     lecuyer       8 1e+09 6.422987e-06  20.684         2.585500
4     lecuyer       8 1e+10 1.785588e-06 203.427        25.428375
5    random_r       8 1e+07 2.349934e-04   0.017         0.002125
6    random_r       8 1e+08 6.640785e-05   0.172         0.021500
7    random_r       8 1e+09 2.074285e-05   1.701         0.212625
8    random_r       8 1e+10 2.084929e-05  17.145         2.143125
9  rng_openmp       8 1e+07 2.737931e-04   0.159         0.019875
10 rng_openmp       8 1e+08 7.173399e-07   1.624         0.203000
11 rng_openmp       8 1e+09 7.219774e-06  16.097         2.012125
12 rng_openmp       8 1e+10 4.310556e-06 161.093        20.136625
13       sfmt       8 1e+07 1.332275e-04   0.015         0.001875
14       sfmt       8 1e+08 4.277652e-05   0.149         0.018625
15       sfmt       8 1e+09 3.728551e-06   1.501         0.187625
16       sfmt       8 1e+10 6.365431e-06  15.041         1.880125

	[[alternative HTML version deleted]]

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri 02 Mar 2012 - 09:41:11 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 02 Mar 2012 - 12:40:25 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive