From: Duncan Murdoch <murdoch_at_stats.uwo.ca>

Date: Fri, 15 Aug 2008 01:58:29 -0400

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri 15 Aug 2008 - 06:18:45 GMT

Date: Fri, 15 Aug 2008 01:58:29 -0400

shli_at_stat.wvu.edu wrote:

> This message is in MIME format. The first part should be readable text,

*> while the remaining parts are likely unreadable without MIME-aware tools.
**>
**> ---559023410-851401618-1218751024=:15885
**> Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
**> Content-Transfer-Encoding: QUOTED-PRINTABLE
**>
**>
**> I didn't describe the problem clearly. It's about the number of distinct=20
**> values. So just ignore cycle issue.
**>
**> My tests were:
**>
**> RNGkind(kind=3D"Knuth-TAOCP");
**> sum(duplicated(runif(1e7))); #return 46552
**>
**> RNGkind(kind=3D"Knuth-TAOCP-2002");
**> sum(duplicated(runif(1e7))); #return 46415
**>
**> #These collision frequency suggested there were 2^30 distinct values by=20
**> birthday problem.
**>
*

The birthday problem distribution applies to independent draws, but they
are only pseudo-independent. I think the only ways to know for sure if
there are 2^30 values are to look at the code, or run through a complete
cycle. And you need to determine the cycle by looking at .Random.seed,
not at the returned value.

*>
*

> RNGkind(kind=3D"Marsaglia-Multicarry");

*> sum(duplicated(runif(1e7))); #return 11682
**>
**> RNGkind(kind=3D"Super-Duper");
**> sum(duplicated(runif(1e7))); #return 11542
**>
**> RNGkind(kind=3D"Mersenne-Twister");
**> sum(duplicated(runif(1e7))); #return 11656
**>
**> #These indicated there were 2^32 distinct values, which agrees with the=20
**> help info.
**>
**>
*

If there are 2^30 distinct values for the two generators above, that also agrees with the documentation.

> RNGkind(kind=3D"Wichmann-Hill");

*> sum(duplicated(runif(1e7))); #return 0
**>
**> #So for this method, there should be more than 2^32 distinct values.
**>
**> You may not get the exact numbers, but they should be close. So how to=20
**> explain above problem?
**>
*

You haven't demonstrated what you claim, but if you look at the source, you'll see that in fact the man page is wrong: Wichmann-Hill is based on 3 integer values, which each take on approximately 15 bits of different values. So Wichmann-Hill could take nearly 2^45 different values (actually 30269*30307*30323).

The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c if you
want to check the others.

> I need generate a large sample without any ties, it seems to me=20

*> "Wichmann-Hill" is only choice right now.
**>
*

An alternative would be to construct a new value from two (or more) runif() values, but be careful that you don't mess up the distribution when you do that.

Duncan Murdoch

> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=

*> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
**> Shengqiao Li
**>
**> The Department of Statistics
**> PO Box 6330
**> West Virginia University
**> Morgantown, WV 26506-6330
**> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
**> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
**>
**> On Thu, 14 Aug 2008, Peter Dalgaard wrote:
**>
**>
**>> Shengqiao Li wrote:
**>>
**>>> Hello all,
**>>> =20
**>>> I am generating large samples of random numbers. The RNG help page says:=
**>>>
**> =20
**>
**>>> "All the supplied uniform generators return 32-bit integer values that a=
**>>>
**> re=20
**>
**>>> converted to doubles, so they take at most 2^32 distinct values and long=
**>>>
**> =20
**>
**>>> runs will return duplicated values." But I find that the cycles are not =
**>>>
**> the=20
**>
**>>> same as the 32-bit integer.
**>>> =20
**>>> My test indicated that the cycles for Knuth's methods were 2^30 while=20
**>>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated i=
**>>>
**> n=20
**>
**>>> 10M numbers generated by runif using Wichmann-Hill. The other three meth=
**>>>
**> ods=20
**>
**>>> had cycle length of 2^32.
**>>> =20
**>>> So, anybody can explain this? And any improvement to the implementation =
**>>>
**> can=20
**>
**>>> be made to increase the cycle length like the Wichmann-Hill method?
**>>> =20
**>>>
**>> What test? These are not simple linear congruential generators. Just beca=
**>>
**> use=20
**>
**>> you get the same value twice, it doesn't mean that the sequence is repeat=
**>>
**> ing.=20
**>
**>> Perhaps you should read the entire help page rather than just the note.
**>>
**>> --=20
**>> O__ ---- Peter Dalgaard =D8ster Farimagsgade 5, Entr.B
**>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
**>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
**>> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45) 35327907
**>>
**>>
**>>
**> ---559023410-851401618-1218751024=:15885--
**>
**> ______________________________________________
**> R-devel_at_r-project.org mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-devel
**>
*

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri 15 Aug 2008 - 06:18:45 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Fri 15 Aug 2008 - 21:37:23 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel.
Please read the posting
guide before posting to the list.
*