Re: [Rd] portable parallel seeds project: request for critiques

From: Petr Savicky <savicky_at_cs.cas.cz>
Date: Sat, 18 Feb 2012 11:40:02 +0100

On Fri, Feb 17, 2012 at 09:33:33PM -0600, Paul Johnson wrote: [...]
> The seed things I'm using are the 6 integer values from the L'Ecuyer.
> If you run the example script, the verbose option causes some to print
> out. The first 3 seeds in a saved project seeds file looks like:
>
> > projSeeds[[1]]
> [[1]]
> [1] 407 376488316 1939487821 1433925148 -1040698333 579503880
> [7] -624878918
>
> [[2]]
> [1] 407 -1107332181 854177397 1773099324 1774170776 -266687360
> [7] 816955059
>
> [[3]]
> [1] 407 936506900 -1924631332 -1380363206 2109234517 1585239833
> [7] -1559304513
>
> The 407 in the first position is an integer R uses to note the type of
> stream for which the seed is intended, in this case R'Lecuyer.

The paralel streams uses MRG32k3a generator by P.L'Ecuyer, whose original implementation stores its internal state as double precision floating point numbers. The vector .Random.seed consists of integers. Do you know, whether a modified implementation of MRG32k3a is used, which works with integers internally, or the conversion between double and integer is done whenever the state is stored to .Random.seed?  

> I don't have any formal proof that a "good quality hash function"
> would truly create seeds from which independent streams will be drawn.

One of the suggested ways how to generate, say, 2^16 cryptographically secure random numbers, is to use a counter producing a sequence 1, 2, 3, ..., 2^16 and transform these values by a hash function. An example is Fortuna generator

  http://en.wikipedia.org/wiki/Fortuna_(PRNG)

which is also available on CRAN as "randaes". The length of the sequence is limited, since the sequence contains no collisions. If the sequence is too long, this could allow to distinguish it from truly random numbers. After generating 2^16 numbers, the seed is recomputed and another 2^16 numbers are generated.

Such a generator is a very good one. It is not used for simulations, since it is slower (say by a factor of 5) than the linear generators like Mersenne Twister, WELL or MRG family of generators. However, if it is used only for initialization, then the speed is less important.

> There is, however, the proof in the L'Ecuyer paper that one can take
> the long stream and divide it into sections. That's the approach I'm
> taking here. Its the same approach the a parallel package in R
> follows, and parallel frameworks like snow.

According to

  http://www.iro.umontreal.ca/~lecuyer/myftp/papers/streams00.pdf

the sectioning of the stream to substreams is done by jump ahead algorithm. The new seed is far enough in the sequence from the previous seed, so it is guaranteed that the sequence generated from the previous seed is shorter than the jump and the subsequences do not overlap. However, the new seed is computable as a function of the previous seed. If we use this strategy to produce a matrix of seeds

  s_{1,1}, s_{1,2}, ...
  s_{2,1}, s_{2,2}, ...
  s_{3,1}, s_{2,2}, ...

then each s_{i,j} may be computed from s_{1,1} and i, j. In this case, it is sufficient to store s_{1,1}. If we know for each run the indices i,j, then we can compute s_{i,j} by an efficient algorithm.

> The different thing in my approach is that I'm saving one row of seeds
> per simulation "run". So each run can be replicated exactly.
>
> I hope.

Saving .Random.seed should be a safe strategy.

Petr Savicky.



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sat 18 Feb 2012 - 10:42:52 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 19 Feb 2012 - 17:40:19 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive