Re: [Rd] [R] RNG Cycle and Duplication (PR#12540)

From: Christophe Dutang <dutangc_at_gmail.com>
Date: Sun, 17 Aug 2008 10:05:15 +0200

Hi,

If you want I can put RANARRAY in the 'randtoolbox' package? unless you want in R and not in a package?

In randtoolbox (not the version on cran), I port SFMT and WELL generator respectively from Matsumoto and L'Ecuyer. It could be a good idea to add Knuth's code?

Christophe Dutang

Le 17 août 08 à 00:09, Shengqiao Li a écrit :

>
> Knuth's double version RNG rng-double.c dose a great job. No ties
> were observed for 10M numbers ( totally 2^52 possible different
> values?). In rng-double, double modulo mod_sum replaced the integer
> version mod_diff in the integer version rng.c that is adopted by R.
>
> The integer version uses modulus 2^30. Therefore there are only 2^30
> distinct numbers, which is confirmed by my previous test in R.
>
> If someday Knuth's double version is also included in R, it will be
> great.
>
>
> Shengqiao Li
>
>
> On Fri, 15 Aug 2008, Duncan Murdoch wrote:
>
>> On 15/08/2008 10:28 AM, Shengqiao Li wrote:
>>> Thank you for your reply and for your suggestion. So the note in
>>> man page could be more accurate since for an end user, man page
>>> should be more helpful and source code is mainly for developers.
>>> I was also adviced to use Knuth's double version RANARRAY from
>>> http://www-cs-faculty.stanford.edu/~knuth/programs.html instead of
>>> the integer versions in R. I'm a R user. So why not also include
>>> the double verion in R implementation?
>>
>> You can try it using kind="user-supplied" if you like, but I
>> suspect it's the same as "Knuth-TAOCP-2002".
>>
>> Duncan Murdoch
>>
>>> Thanks again,
>>> ========================================
>>> Shengqiao Li
>>> Research Associate
>>> The Department of Statistics
>>> PO Box 6330
>>> West Virginia University
>>> Morgantown, WV 26506-6330
>>> ========================================
>>> On Fri, 15 Aug 2008, Duncan Murdoch wrote:
>>>> shli_at_stat.wvu.edu wrote:
>>>>> This message is in MIME format. The first part should be
>>>>> readable text,
>>>>> while the remaining parts are likely unreadable without MIME-
>>>>> aware tools.
>>>>> ---559023410-851401618-1218751024=:15885
>>>>> Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
>>>>> Content-Transfer-Encoding: QUOTED-PRINTABLE
>>>>> I didn't describe the problem clearly. It's about the number of
>>>>> distinct=20
>>>>> values. So just ignore cycle issue.
>>>>> My tests were:
>>>>> RNGkind(kind=3D"Knuth-TAOCP");
>>>>> sum(duplicated(runif(1e7))); #return 46552
>>>>> RNGkind(kind=3D"Knuth-TAOCP-2002");
>>>>> sum(duplicated(runif(1e7))); #return 46415
>>>>> #These collision frequency suggested there were 2^30 distinct
>>>>> values by=20
>>>>> birthday problem.
>>>> The birthday problem distribution applies to independent draws,
>>>> but they are only pseudo-independent. I think the only ways to
>>>> know for sure if there are 2^30 values are to look at the code,
>>>> or run through a complete cycle. And you need to determine the
>>>> cycle by looking at .Random.seed, not at the returned value.
>>>>> RNGkind(kind=3D"Marsaglia-Multicarry");
>>>>> sum(duplicated(runif(1e7))); #return 11682
>>>>> RNGkind(kind=3D"Super-Duper");
>>>>> sum(duplicated(runif(1e7))); #return 11542
>>>>> RNGkind(kind=3D"Mersenne-Twister");
>>>>> sum(duplicated(runif(1e7))); #return 11656
>>>>> #These indicated there were 2^32 distinct values, which agrees
>>>>> with the=20
>>>>> help info.
>>>> If there are 2^30 distinct values for the two generators above,
>>>> that also agrees with the documentation.
>>>>> RNGkind(kind=3D"Wichmann-Hill");
>>>>> sum(duplicated(runif(1e7))); #return 0
>>>>> #So for this method, there should be more than 2^32 distinct
>>>>> values.
>>>>> You may not get the exact numbers, but they should be close. So
>>>>> how to=20
>>>>> explain above problem?
>>>> You haven't demonstrated what you claim, but if you look at the
>>>> source, you'll see that in fact the man page is wrong: Wichmann-
>>>> Hill is based on 3 integer values, which each take on
>>>> approximately 15 bits of different values. So Wichmann-Hill could
>>>> take nearly 2^45 different values (actually 30269*30307*30323).
>>>> The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c
>>>> if you want to check the others.
>>>>> I need generate a large sample without any ties, it seems to me=20
>>>>> "Wichmann-Hill" is only choice right now.
>>>> An alternative would be to construct a new value from two (or
>>>> more) runif() values, but be careful that you don't mess up the
>>>> distribution when you do that.
>>>> Duncan Murdoch
>>>>> =
>>>>> 3D
>>>>> =
>>>>> 3D
>>>>> =
>>>>> 3D
>>>>> =
>>>>> 3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>>>> Shengqiao Li
>>>>> The Department of Statistics
>>>>> PO Box 6330
>>>>> West Virginia University
>>>>> Morgantown, WV 26506-6330
>>>>> =
>>>>> 3D
>>>>> =
>>>>> 3D
>>>>> =
>>>>> 3D
>>>>> =
>>>>> 3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>>>> On Thu, 14 Aug 2008, Peter Dalgaard wrote:
>>>>>> Shengqiao Li wrote:
>>>>>>> Hello all,
>>>>>>> =20
>>>>>>> I am generating large samples of random numbers. The RNG help
>>>>>>> page says:=
>>>>> =20
>>>>>>> "All the supplied uniform generators return 32-bit integer
>>>>>>> values that a=
>>>>> re=20
>>>>>>> converted to doubles, so they take at most 2^32 distinct
>>>>>>> values and long=
>>>>> =20
>>>>>>> runs will return duplicated values." But I find that the
>>>>>>> cycles are not =
>>>>> the=20
>>>>>>> same as the 32-bit integer.
>>>>>>> =20
>>>>>>> My test indicated that the cycles for Knuth's methods were
>>>>>>> 2^30 while=20
>>>>>>> Wichmann-Hill's cycle was larger than 2^32! No numbers were
>>>>>>> duplicated i=
>>>>> n=20
>>>>>>> 10M numbers generated by runif using Wichmann-Hill. The other
>>>>>>> three meth=
>>>>> ods=20
>>>>>>> had cycle length of 2^32.
>>>>>>> =20
>>>>>>> So, anybody can explain this? And any improvement to the
>>>>>>> implementation =
>>>>> can=20
>>>>>>> be made to increase the cycle length like the Wichmann-Hill
>>>>>>> method?
>>>>>>> =20
>>>>>> What test? These are not simple linear congruential generators.
>>>>>> Just beca=
>>>>> use=20
>>>>>> you get the same value twice, it doesn't mean that the sequence
>>>>>> is repeat=
>>>>> ing.=20
>>>>>> Perhaps you should read the entire help page rather than just
>>>>>> the note.
>>>>>> --=20
>>>>>> O__ ---- Peter Dalgaard =D8ster Farimagsgade 5,
>>>>>> Entr.B
>>>>>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>>>>>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45)
>>>>>> 35327918
>>>>>> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45)
>>>>>> 35327907
>>>>> ---559023410-851401618-1218751024=:15885--
>>>>> ______________________________________________
>>>>> R-devel_at_r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> ______________________________________________
>>> R-devel_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun 17 Aug 2008 - 08:09:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 17 Aug 2008 - 12:36:57 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive