Re: [Rd] [R] RNG Cycle and Duplication (PR#12540)

From: <shli_at_stat.wvu.edu>
Date: Thu, 14 Aug 2008 23:45:09 +0200 (CEST)


  This message is in MIME format. The first part should be readable text,   while the remaining parts are likely unreadable without MIME-aware tools.

---559023410-851401618-1218751024=:15885 Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE

I didn't describe the problem clearly. It's about the number of distinct=20 values. So just ignore cycle issue.

My tests were:

RNGkind(kind=3D"Knuth-TAOCP");
sum(duplicated(runif(1e7))); #return 46552

RNGkind(kind=3D"Knuth-TAOCP-2002");
sum(duplicated(runif(1e7))); #return 46415

#These collision frequency suggested there were 2^30 distinct values by=20 birthday problem.

RNGkind(kind=3D"Marsaglia-Multicarry");
sum(duplicated(runif(1e7))); #return 11682

RNGkind(kind=3D"Super-Duper");
sum(duplicated(runif(1e7))); #return 11542

RNGkind(kind=3D"Mersenne-Twister");
sum(duplicated(runif(1e7))); #return 11656

#These indicated there were 2^32 distinct values, which agrees with the=20 help info.

RNGkind(kind=3D"Wichmann-Hill");
sum(duplicated(runif(1e7))); #return 0

#So for this method, there should be more than 2^32 distinct values.

You may not get the exact numbers, but they should be close. So how to=20 explain above problem?

I need generate a large sample without any ties, it seems to me=20 "Wichmann-Hill" is only choice right now.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D Shengqiao Li

The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

On Thu, 14 Aug 2008, Peter Dalgaard wrote:

> Shengqiao Li wrote:

>> Hello all,

>>=20
>> I am generating large samples of random numbers. The RNG help page says:=
=20
>> "All the supplied uniform generators return 32-bit integer values that a=
re=20
>> converted to doubles, so they take at most 2^32 distinct values and long=
=20
>> runs will return duplicated values." But I find that the cycles are not =
the=20
>> same as the 32-bit integer.
>>=20

>> My test indicated that the cycles for Knuth's methods were 2^30 while=20
>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated i=
n=20
>> 10M numbers generated by runif using Wichmann-Hill. The other three meth=
ods=20
>> had cycle length of 2^32.
>>=20
>> So, anybody can explain this? And any improvement to the implementation =
can=20
>> be made to increase the cycle length like the Wichmann-Hill method?
>>=20
> What test? These are not simple linear congruential generators. Just beca= use=20
> you get the same value twice, it doesn't mean that the sequence is repeat= ing.=20
> Perhaps you should read the entire help page rather than just the note.
>
> --=20
>  O__  ---- Peter Dalgaard             =D8ster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907
>
>

---559023410-851401618-1218751024=:15885--

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Thu 14 Aug 2008 - 21:55:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 17 Aug 2008 - 00:36:52 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive