Re: [Rd] portable parallel seeds project: request for critiques

From: Petr Savicky <savicky_at_cs.cas.cz>
Date: Wed, 22 Feb 2012 22:15:09 +0100

On Wed, Feb 22, 2012 at 12:17:25PM -0600, Paul Johnson wrote: [...]
> In order for this to be easy for users, I need to put the "init streams"
> and "set current stream" functions into a package, and then streamline
> the process of creating the seed array. My opinion is that CRAN is
> now overflowing with "one function" packages. I don't want to
> create another one just for these two little functions, and I may roll
> them into my general-purpose regression package called "rockchalk".

I am also preparing solutions to this problem. One uses AES to initialize the Mersenne-Twister generator from base R, so it replaces only the set.seed() function. Another is based on the "rlecuyer" package. I suggest that we discuss the possible solutions off-list before submitting anything to CRAN.
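
The AES-based scheme is not described further in this message. As a rough illustration of the general idea only (deriving a complete Mersenne-Twister state from a seed and a stream number through a strong mixing function, instead of through set.seed()), the sketch below substitutes SplitMix64 for AES purely because it fits in a few lines. The substitution, the function name derive_mt_state and its parameters are assumptions for illustration, not the method being proposed.

```c
#include <stdint.h>

#define MT_N 624  /* MT19937 keeps 624 32-bit words of state */

/* SplitMix64 output function; a stand-in for AES in this sketch (assumption). */
static uint64_t mix64(uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical helper: fill an MT19937 state for a given (seed, stream) pair.
   Word i is the high half of the i-th SplitMix64 output started from a
   per-stream key. An all-zero state would be degenerate for MT19937; it is
   not checked here because it is astronomically unlikely with this mixer. */
void derive_mt_state(uint64_t seed, uint64_t stream, uint32_t mt[MT_N])
{
    const uint64_t golden = 0x9E3779B97F4A7C15ULL;  /* SplitMix64 increment */
    uint64_t key = mix64(mix64(seed) ^ stream);     /* one key per (seed, stream) */
    for (int i = 0; i < MT_N; i++)
        mt[i] = (uint32_t) (mix64(key + (uint64_t) (i + 1) * golden) >> 32);
}
```

The resulting 624 words would still have to be placed into .Random.seed in the layout R uses for "Mersenne-Twister", which this sketch does not attempt.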

> One technical issue that has been raised with me is that R parallel's
> implementation of the L'Ecuyer generator is based on integer-valued
> variables, whereas the original L'Ecuyer code uses double-precision real
> variables. But I'm trusting R Core on this: if they code the
> generator in a way that is good enough for R itself, it is good enough
> for me. (And if that's wrong, we'll all find out together :) ).

I do not know of any L'Ecuyer generator in base R. You probably mean the authors of the extension packages that provide these generators.
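
For reference, the L'Ecuyer generator discussed here is presumably MRG32k3a, the generator behind L'Ecuyer's RngStreams code. Its intermediate products stay below 2^53, so they are represented exactly both in 64-bit integers and in doubles, and the two styles can in principle produce identical streams. The sketch below performs one step in 64-bit integer arithmetic; the multipliers and moduli are the published MRG32k3a constants, while the state layout and the names mrg32k3a_state and mrg32k3a_next are illustrative assumptions, not R's source.

```c
#include <stdint.h>

/* Published MRG32k3a moduli and multipliers. */
static const int64_t M1 = 4294967087LL;
static const int64_t M2 = 4294944443LL;
static const int64_t A12 = 1403580LL, A13N = 810728LL;
static const int64_t A21 = 527612LL, A23N = 1370589LL;
static const double NORM = 2.328306549295727688e-10;  /* 1 / (M1 + 1) */

/* Hypothetical state layout: s1 = {x1[n-3], x1[n-2], x1[n-1]},
   s2 = {x2[n-3], x2[n-2], x2[n-1]}. Each triple must lie below its
   modulus and must not be all zero. */
typedef struct {
    int64_t s1[3];
    int64_t s2[3];
} mrg32k3a_state;

/* Advance the state by one step and return a uniform in (0, 1). */
double mrg32k3a_next(mrg32k3a_state *st)
{
    /* x1[n] = (a12 * x1[n-2] - a13n * x1[n-3]) mod m1; |products| < 2^53. */
    int64_t p1 = (A12 * st->s1[1] - A13N * st->s1[0]) % M1;
    if (p1 < 0) p1 += M1;            /* C's % keeps the sign of the dividend */
    st->s1[0] = st->s1[1]; st->s1[1] = st->s1[2]; st->s1[2] = p1;

    /* x2[n] = (a21 * x2[n-1] - a23n * x2[n-3]) mod m2. */
    int64_t p2 = (A21 * st->s2[2] - A23N * st->s2[0]) % M2;
    if (p2 < 0) p2 += M2;
    st->s2[0] = st->s2[1]; st->s2[1] = st->s2[2]; st->s2[2] = p2;

    /* Combine as (x1 - x2) mod m1; a zero difference is mapped to m1,
       so the result is strictly inside (0, 1). */
    int64_t z = p1 - p2;
    if (z <= 0) z += M1;
    return (double) z * NORM;
}
```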

> Josef Leydold (the rstream package author) has argued that R's
> implementation runs more slowly than it ought to. We had some
> correspondence and I tracked a few threads in forums. It appears the
> approach suggested there is roadblocked by some characteristics deep
> down in R and the way random streams are managed. Packages have only
> a limited, tenuous method to replace R's generators with their own
> generators.

To connect a user-defined generator to R, there are two basic entry points, "user_unif_rand" and "user_unif_init". The first lets runif() and similar functions call the generator; the second connects the generator to the set.seed() function. If only one extension package with a generator is loaded in an R session, these entry points are sufficient. If a package provides several generators, as "randtoolbox" does, it is easy to switch between them using functions the package provides for that purpose. I think that having several generator packages loaded at the same time can be useful for developing them, but it is not needed for using them in applications.
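
A minimal sketch of the two entry points, in the shape documented in R's ?Random.user help page, follows. The generator body (a Marsaglia multiply-with-carry step) is only a placeholder, and the Int32 typedef is repeated so that the file compiles without R's headers; in a real package it comes from <R_ext/Random.h>.

```c
/* Normally provided by <R_ext/Random.h>; repeated so the sketch compiles alone. */
typedef unsigned int Int32;

static Int32 seed[2];   /* generator state: two 32-bit words */
static double res;      /* R reads each draw through the returned pointer */

/* Called by runif() and friends: returns a pointer to a double in [0, 1). */
double *user_unif_rand(void)
{
    /* Placeholder generator: multiply-with-carry on the 16-bit halves. */
    seed[0] = 36969 * (seed[0] & 0xFFFFu) + (seed[0] >> 16);
    seed[1] = 18000 * (seed[1] & 0xFFFFu) + (seed[1] >> 16);
    Int32 x = (seed[0] << 16) ^ (seed[1] & 0xFFFFu);
    res = x * 2.3283064365386963e-10;   /* x / 2^32, so res < 1 */
    return &res;
}

/* Called by set.seed() with R's scrambled integer seed. */
void user_unif_init(Int32 seed_in)
{
    if (seed_in == 0) seed_in = 1;      /* an all-zero state would never change */
    seed[0] = seed_in;
    seed[1] = seed_in;
}
```

Once the compiled code is loaded (for example with dyn.load()), RNGkind("user-supplied") should route runif() and set.seed() through these two functions.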

Two further entry points, "user_unif_nseed" and "user_unif_seedloc", allow the generator to support the variable ".Random.seed". The problem is that R copies the internal state of the generator into ".Random.seed" by reading a specific memory location, without notifying the package. So if the state must be transformed into integers before it is stored in ".Random.seed", the transformation cannot be done only when it is needed; the integer form has to be kept up to date all the time.
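
The consequence can be sketched as follows, assuming a hypothetical generator that keeps its working state as a double (here Park and Miller's "minimal standard" generator computed in floating point). Because R reads the integer buffer returned by user_unif_seedloc() without warning, the integer copy is refreshed after every draw rather than on demand.

```c
/* Normally provided by <R_ext/Random.h>; repeated so the sketch compiles alone. */
typedef unsigned int Int32;

static const double MODULUS = 2147483647.0;   /* 2^31 - 1 */
static const double MULT = 16807.0;

static double state = 1.0;          /* working state, kept as a double (assumption) */
static int seed_mirror[1] = {1};    /* integer copy that R reads via user_unif_seedloc() */
static int nseed = 1;
static double res;

double *user_unif_rand(void)
{
    /* state <- 16807 * state mod (2^31 - 1); the product is below 2^53, so it is exact. */
    double p = MULT * state;
    double q = (double) (long long) (p / MODULUS);   /* truncation equals floor since p > 0 */
    state = p - q * MODULUS;

    /* R gives no notice before it copies .Random.seed out of seed_mirror,
       so the integer form is refreshed after every draw, not only when needed. */
    seed_mirror[0] = (int) state;

    res = state / MODULUS;   /* state is in 1 .. 2^31 - 2, so res is in (0, 1) */
    return &res;
}

void user_unif_init(Int32 seed_in)
{
    /* Map the seed into 1 .. 2^31 - 2; zero would be a fixed point of this generator. */
    state = (double) (seed_in % 2147483646u + 1u);
    seed_mirror[0] = (int) state;
}

/* Number of integers R should copy into .Random.seed. */
int *user_unif_nseed(void) { return &nseed; }

/* Where R reads those integers (and writes them back when .Random.seed is
   restored; this sketch does not notice such writes). */
int *user_unif_seedloc(void) { return seed_mirror; }
```

Here the per-draw conversion is a single cast, but for a generator with a large state or a costly transformation it is paid on every call.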

In the package "rngwell19937", I included some code that tries to determine whether the user has changed ".Random.seed". The reason is that most of the state consists of integers and is stored in ".Random.seed", but the state also contains a function pointer, which is not stored. The pointer can be recomputed from ".Random.seed", and this is done whenever the package detects a change of ".Random.seed". This is not a nice solution, so in "randtoolbox" we decided not to support ".Random.seed".
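
The detection approach can be sketched under stated assumptions: a hypothetical toy generator (Marsaglia's xorshift128) whose integer state lives in a buffer that R could expose as .Random.seed, plus an output routine selected through a function pointer derived from that state. The pointer cannot be stored, so each draw compares the buffer with a private snapshot and rebuilds the pointer when they differ. All names are illustrative and do not reflect the actual code in "rngwell19937".

```c
#include <stdint.h>
#include <string.h>

#define NSEED 5   /* word 0: variant code; words 1..4: xorshift128 state */

static int seed_buf[NSEED];    /* what user_unif_seedloc() would hand to R */
static int last_seen[NSEED];   /* private snapshot taken after our own last update */

typedef uint32_t (*output_fn)(uint32_t);
static output_fn output;       /* derived from seed_buf[0]; cannot live in .Random.seed */

static uint32_t out_plain(uint32_t x) { return x; }
static uint32_t out_tempered(uint32_t x)
{
    x ^= x >> 11;
    x ^= (x << 7) & 0x9D2C5680u;
    return x;
}

/* Recompute everything that is not stored in seed_buf. */
static void rebuild_derived(void)
{
    output = (seed_buf[0] & 1) ? out_tempered : out_plain;
}

/* Hypothetical initializer; must be called before the first draw. */
void toy_init(uint32_t s)
{
    seed_buf[0] = (int) (s & 1u);               /* variant code */
    for (int i = 1; i < NSEED; i++) {
        s = 69069u * s + 1u;                    /* successive LCG values differ, so not all zero */
        seed_buf[i] = (int) s;
    }
    rebuild_derived();
    memcpy(last_seen, seed_buf, sizeof seed_buf);
}

double toy_draw(void)
{
    /* A mismatch means someone else (e.g. R restoring .Random.seed) wrote the
       buffer, so the function pointer may be stale. */
    if (memcmp(seed_buf, last_seen, sizeof seed_buf) != 0)
        rebuild_derived();

    /* xorshift128 step on words 1..4. */
    uint32_t t = (uint32_t) seed_buf[1];
    t ^= t << 11;
    uint32_t w = (uint32_t) seed_buf[4];
    w ^= (w >> 19) ^ t ^ (t >> 8);
    seed_buf[1] = seed_buf[2];
    seed_buf[2] = seed_buf[3];
    seed_buf[3] = seed_buf[4];
    seed_buf[4] = (int) w;

    memcpy(last_seen, seed_buf, sizeof seed_buf);
    return output(w) * 2.3283064365386963e-10;  /* divide by 2^32: in [0, 1) */
}
```

The comparison costs time proportional to the whole state on every draw, which is part of what makes this approach unattractive for a generator with a large state.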

I understand that in parallel computing, working with ".Random.seed" is a good paradigm, but if the generator provides other tools for converting its state to an R object and putting it back as the active state, then ".Random.seed" is not strictly necessary.
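
Such tools can be sketched as a pair of functions that copy a generator's state out to a plain integer vector, which a package could hand to R as an ordinary object, and back in again. The generator (xorshift128) and the names toy_rng, toy_get_state and toy_set_state are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical generator object; one per stream, so no global state is needed. */
typedef struct { uint32_t s[4]; } toy_rng;

/* One xorshift128 step; returns a uniform in [0, 1). */
double toy_next(toy_rng *g)
{
    uint32_t t = g->s[0] ^ (g->s[0] << 11);
    g->s[0] = g->s[1];
    g->s[1] = g->s[2];
    g->s[2] = g->s[3];
    g->s[3] = g->s[3] ^ (g->s[3] >> 19) ^ t ^ (t >> 8);
    return g->s[3] * 2.3283064365386963e-10;   /* divide by 2^32 */
}

/* Copy the state out, e.g. into an integer vector returned to R. */
void toy_get_state(const toy_rng *g, uint32_t out[4])
{
    memcpy(out, g->s, sizeof g->s);
}

/* Make a saved state active again. An all-zero state is rejected because
   xorshift would stay at zero forever. Returns 0 on success, -1 otherwise. */
int toy_set_state(toy_rng *g, const uint32_t in[4])
{
    if ((in[0] | in[1] | in[2] | in[3]) == 0)
        return -1;
    memcpy(g->s, in, sizeof g->s);
    return 0;
}
```

With one such object per stream, each parallel task can save and restore its own state without going through the single global ".Random.seed".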

> Parallel Random Number Generation in C++ and R Using RngStream
> Andrew Karl · Randy Eubank · Dennis Young
> http://math.la.asu.edu/~eubank/webpage/rngStreamPaper.pdf

Thank you very much for this link.

All the best, Petr.



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 22 Feb 2012 - 21:30:44 GMT
