From: Christophe Dutang <dutangc_at_gmail.com>

Date: Sun, 17 Aug 2008 10:05:15 +0200

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun 17 Aug 2008 - 08:09:24 GMT

Date: Sun, 17 Aug 2008 10:05:15 +0200

Hi,

If you want I can put RANARRAY in the 'randtoolbox' package? unless you want in R and not in a package?

In randtoolbox (not the version on cran), I port SFMT and WELL generator respectively from Matsumoto and L'Ecuyer. It could be a good idea to add Knuth's code?

Christophe Dutang

Le 17 août 08 à 00:09, Shengqiao Li a écrit :

*>
*

> Knuth's double version RNG rng-double.c dose a great job. No ties

*> were observed for 10M numbers ( totally 2^52 possible different
**> values?). In rng-double, double modulo mod_sum replaced the integer
**> version mod_diff in the integer version rng.c that is adopted by R.
**>
**> The integer version uses modulus 2^30. Therefore there are only 2^30
**> distinct numbers, which is confirmed by my previous test in R.
**>
**> If someday Knuth's double version is also included in R, it will be
**> great.
**>
**>
**> Shengqiao Li
**>
**>
**> On Fri, 15 Aug 2008, Duncan Murdoch wrote:
**>
**>> On 15/08/2008 10:28 AM, Shengqiao Li wrote:
**>>> Thank you for your reply and for your suggestion. So the note in
**>>> man page could be more accurate since for an end user, man page
**>>> should be more helpful and source code is mainly for developers.
**>>> I was also adviced to use Knuth's double version RANARRAY from
**>>> http://www-cs-faculty.stanford.edu/~knuth/programs.html instead of
**>>> the integer versions in R. I'm a R user. So why not also include
**>>> the double verion in R implementation?
**>>
**>> You can try it using kind="user-supplied" if you like, but I
**>> suspect it's the same as "Knuth-TAOCP-2002".
**>>
**>> Duncan Murdoch
**>>
**>>> Thanks again,
**>>> ========================================
**>>> Shengqiao Li
**>>> Research Associate
**>>> The Department of Statistics
**>>> PO Box 6330
**>>> West Virginia University
**>>> Morgantown, WV 26506-6330
**>>> ========================================
**>>> On Fri, 15 Aug 2008, Duncan Murdoch wrote:
**>>>> shli_at_stat.wvu.edu wrote:
**>>>>> This message is in MIME format. The first part should be
**>>>>> readable text,
**>>>>> while the remaining parts are likely unreadable without MIME-
**>>>>> aware tools.
**>>>>> ---559023410-851401618-1218751024=:15885
**>>>>> Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
**>>>>> Content-Transfer-Encoding: QUOTED-PRINTABLE
**>>>>> I didn't describe the problem clearly. It's about the number of
**>>>>> distinct=20
**>>>>> values. So just ignore cycle issue.
**>>>>> My tests were:
**>>>>> RNGkind(kind=3D"Knuth-TAOCP");
**>>>>> sum(duplicated(runif(1e7))); #return 46552
**>>>>> RNGkind(kind=3D"Knuth-TAOCP-2002");
**>>>>> sum(duplicated(runif(1e7))); #return 46415
**>>>>> #These collision frequency suggested there were 2^30 distinct
**>>>>> values by=20
**>>>>> birthday problem.
**>>>> The birthday problem distribution applies to independent draws,
**>>>> but they are only pseudo-independent. I think the only ways to
**>>>> know for sure if there are 2^30 values are to look at the code,
**>>>> or run through a complete cycle. And you need to determine the
**>>>> cycle by looking at .Random.seed, not at the returned value.
**>>>>> RNGkind(kind=3D"Marsaglia-Multicarry");
**>>>>> sum(duplicated(runif(1e7))); #return 11682
**>>>>> RNGkind(kind=3D"Super-Duper");
**>>>>> sum(duplicated(runif(1e7))); #return 11542
**>>>>> RNGkind(kind=3D"Mersenne-Twister");
**>>>>> sum(duplicated(runif(1e7))); #return 11656
**>>>>> #These indicated there were 2^32 distinct values, which agrees
**>>>>> with the=20
**>>>>> help info.
**>>>> If there are 2^30 distinct values for the two generators above,
**>>>> that also agrees with the documentation.
**>>>>> RNGkind(kind=3D"Wichmann-Hill");
**>>>>> sum(duplicated(runif(1e7))); #return 0
**>>>>> #So for this method, there should be more than 2^32 distinct
**>>>>> values.
**>>>>> You may not get the exact numbers, but they should be close. So
**>>>>> how to=20
**>>>>> explain above problem?
**>>>> You haven't demonstrated what you claim, but if you look at the
**>>>> source, you'll see that in fact the man page is wrong: Wichmann-
**>>>> Hill is based on 3 integer values, which each take on
**>>>> approximately 15 bits of different values. So Wichmann-Hill could
**>>>> take nearly 2^45 different values (actually 30269*30307*30323).
**>>>> The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c
**>>>> if you want to check the others.
**>>>>> I need generate a large sample without any ties, it seems to me=20
**>>>>> "Wichmann-Hill" is only choice right now.
**>>>> An alternative would be to construct a new value from two (or
**>>>> more) runif() values, but be careful that you don't mess up the
**>>>> distribution when you do that.
**>>>> Duncan Murdoch
**>>>>> =
**>>>>> 3D
**>>>>> =
**>>>>> 3D
**>>>>> =
**>>>>> 3D
**>>>>> =
**>>>>> 3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
**>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
**>>>>> Shengqiao Li
**>>>>> The Department of Statistics
**>>>>> PO Box 6330
**>>>>> West Virginia University
**>>>>> Morgantown, WV 26506-6330
**>>>>> =
**>>>>> 3D
**>>>>> =
**>>>>> 3D
**>>>>> =
**>>>>> 3D
**>>>>> =
**>>>>> 3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
**>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
**>>>>> On Thu, 14 Aug 2008, Peter Dalgaard wrote:
**>>>>>> Shengqiao Li wrote:
**>>>>>>> Hello all,
**>>>>>>> =20
**>>>>>>> I am generating large samples of random numbers. The RNG help
**>>>>>>> page says:=
**>>>>> =20
**>>>>>>> "All the supplied uniform generators return 32-bit integer
**>>>>>>> values that a=
**>>>>> re=20
**>>>>>>> converted to doubles, so they take at most 2^32 distinct
**>>>>>>> values and long=
**>>>>> =20
**>>>>>>> runs will return duplicated values." But I find that the
**>>>>>>> cycles are not =
**>>>>> the=20
**>>>>>>> same as the 32-bit integer.
**>>>>>>> =20
**>>>>>>> My test indicated that the cycles for Knuth's methods were
**>>>>>>> 2^30 while=20
**>>>>>>> Wichmann-Hill's cycle was larger than 2^32! No numbers were
**>>>>>>> duplicated i=
**>>>>> n=20
**>>>>>>> 10M numbers generated by runif using Wichmann-Hill. The other
**>>>>>>> three meth=
**>>>>> ods=20
**>>>>>>> had cycle length of 2^32.
**>>>>>>> =20
**>>>>>>> So, anybody can explain this? And any improvement to the
**>>>>>>> implementation =
**>>>>> can=20
**>>>>>>> be made to increase the cycle length like the Wichmann-Hill
**>>>>>>> method?
**>>>>>>> =20
**>>>>>> What test? These are not simple linear congruential generators.
**>>>>>> Just beca=
**>>>>> use=20
**>>>>>> you get the same value twice, it doesn't mean that the sequence
**>>>>>> is repeat=
**>>>>> ing.=20
**>>>>>> Perhaps you should read the entire help page rather than just
**>>>>>> the note.
**>>>>>> --=20
**>>>>>> O__ ---- Peter Dalgaard =D8ster Farimagsgade 5,
**>>>>>> Entr.B
**>>>>>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
**>>>>>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45)
**>>>>>> 35327918
**>>>>>> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45)
**>>>>>> 35327907
**>>>>> ---559023410-851401618-1218751024=:15885--
**>>>>> ______________________________________________
**>>>>> R-devel_at_r-project.org mailing list
**>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
**>>> ______________________________________________
**>>> R-devel_at_r-project.org mailing list
**>>> https://stat.ethz.ch/mailman/listinfo/r-devel
**>>
**>>
**>
**> ______________________________________________
**> R-devel_at_r-project.org mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-devel
*

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun 17 Aug 2008 - 08:09:24 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Sun 17 Aug 2008 - 12:36:57 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel.
Please read the posting
guide before posting to the list.
*