Re: [Rd] portable parallel seeds project: request for critiques

From: Paul Johnson <pauljohn32_at_gmail.com>
Date: Fri, 17 Feb 2012 21:33:33 -0600

On Fri, Feb 17, 2012 at 5:06 PM, Petr Savicky <savicky_at_cs.cas.cz> wrote:
> On Fri, Feb 17, 2012 at 02:57:26PM -0600, Paul Johnson wrote:
> Hi.
>
> Some of the random number generators allow as a seed a vector,
> not only a single number. This can simplify generating the seeds.
> There can be one seed for each of the 1000 runs and then,
> the rows of the seed matrix can be
>
>  c(seed1, 1), c(seed1, 2), ...
>  c(seed2, 1), c(seed2, 2), ...
>  c(seed3, 1), c(seed3, 2), ...
>  ...
>
Yes, I understand.

The seeds I'm using are the 6 integer values from the L'Ecuyer generator. If you run the example script, the verbose option causes some of them to be printed. The first 3 seeds in a saved project seeds file look like this:

> projSeeds[[1]]

[[1]]
[1]         407   376488316  1939487821  1433925148 -1040698333   579503880
[7]  -624878918

[[2]]
[1]         407 -1107332181   854177397  1773099324  1774170776  -266687360
[7]   816955059

[[3]]
[1]         407   936506900 -1924631332 -1380363206  2109234517  1585239833
[7] -1559304513

The 407 in the first position is an integer R uses to note the type of generator for which the seed is intended, in this case L'Ecuyer.
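To see that layout for yourself, here is a minimal sketch that uses only base R. The seed value 12345 is arbitrary and is not taken from the project. On newer versions of R the first element may carry an extra offset for the sampling method, so it may not print as exactly 407; the low two digits should still identify the generator.

```r
# Switch to the L'Ecuyer-CMRG generator and inspect the seed vector.
RNGkind("L'Ecuyer-CMRG")
set.seed(12345)  # arbitrary seed, for illustration only

seed <- get(".Random.seed", envir = globalenv())
length(seed)     # 1 kind code followed by 6 state integers
seed[1] %% 100   # low two digits encode the uniform generator
seed[-1]         # the 6 integers that define the stream's state
```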

> There could be even only one seed and the matrix can be generated as
>
>  c(seed, 1, 1), c(seed, 1, 2), ...
>  c(seed, 2, 1), c(seed, 2, 2), ...
>  c(seed, 3, 1), c(seed, 3, 2), ...
>
> If the initialization using the vector c(seed, i, j) is done
> with a good quality hash function, the runs will be independent.
>
I don't have any formal proof that a "good quality hash function" would truly create seeds from which independent streams will be drawn.

There is, however, the proof in the L'Ecuyer paper that one can take the long stream and divide it into sections. That's the approach I'm taking here. It's the same approach that the parallel package in R follows, as do parallel frameworks like snow.
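As a rough sketch of what "dividing the long stream into sections" looks like in practice, the parallel package exports nextRNGStream(), which takes a L'Ecuyer-CMRG seed and returns the seed for the start of the next stream. The starting seed and the number of streams below are made up for illustration; this is not the project's own code.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(12345)  # arbitrary starting point, for illustration only

nStreams <- 3    # hypothetical number of streams
streams <- vector("list", nStreams)
s <- get(".Random.seed", envir = globalenv())
for (i in seq_len(nStreams)) {
  streams[[i]] <- s        # record the start of stream i
  s <- nextRNGStream(s)    # jump ahead to the start of the next stream
}

# Each element is a 7-integer seed marking the start of a separate section.
streams
```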

What's different in my approach is that I'm saving one row of seeds per simulation "run", so each run can be replicated exactly.

I hope.
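Here is a minimal sketch of that idea, assuming a seed collection shaped like the projSeeds output above (a list with one element per run, each holding several stream seeds). The names nRuns, streamsPerRun, seedFile and replayRun are hypothetical and are not the project's actual functions; the file is a temporary one.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(234234)  # arbitrary, for illustration only

nRuns <- 4          # hypothetical number of simulation runs
streamsPerRun <- 3  # hypothetical number of streams per run

# Build one "row" of stream seeds for each run.
projSeeds <- vector("list", nRuns)
s <- get(".Random.seed", envir = globalenv())
for (run in seq_len(nRuns)) {
  projSeeds[[run]] <- vector("list", streamsPerRun)
  for (j in seq_len(streamsPerRun)) {
    projSeeds[[run]][[j]] <- s
    s <- nextRNGStream(s)
  }
}

# Save the seeds so that any run can be restarted later.
seedFile <- tempfile(fileext = ".rds")
saveRDS(projSeeds, seedFile)

# Replay a run: restore its first stream seed, then draw from it.
replayRun <- function(run, file) {
  seeds <- readRDS(file)
  assign(".Random.seed", seeds[[run]][[1]], envir = globalenv())
  runif(3)
}

first  <- replayRun(2, seedFile)
second <- replayRun(2, seedFile)
identical(first, second)  # checks that run 2 was replicated exactly
```

Because each run's seeds are stored separately, run 2 can be replayed without first re-running run 1.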

pj

> What is your opinion on this?
>
> An advantage of seeding with a vector is also that there can
> be significantly more initial states of the generator among
> which we select by the seed than 2^32, which is the maximum
> for a single integer seed.
>
> Petr Savicky.
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 18 Feb 2012 - 03:43:11 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 19 Feb 2012 - 19:50:19 GMT.
