Re: [Rd] [R] RNG Cycle and Duplication (PR#12540)

From: Shengqiao Li <shli_at_stat.wvu.edu>
Date: Fri, 15 Aug 2008 10:52:45 -0400 (EDT)

Professor Ripley,

Thank you for your solution. So the last paragraph of the Note in RNG help page will be updated since Wichmann-Hill is different from other supplied uniform generators in the number of distinct values?

Shengqiao Li

On Fri, 15 Aug 2008, Prof Brian Ripley wrote:

> Remember Wichmann-Hill is a composite generator. Its composition does take
> more than 2^32 distinct values.
>
> You still haven't identifed a problem here. The note is to warn that runif()
> does repeat within a cycle, because people wrote code assuming otherwise. It
> would be a poor use of runif() to rely on the low-order bits, and that's
> standard advice in the field.
>
> For a large sample of uniforms use something like the normal inversion does,
> e.g. 2^(-30) * (runif(N, 0, 2^30) %% 2^30 + runif(N))
>
> Please do leave R-bugs out of this: we already have 4 entries as a result of
> your misunderstandings and false claims.
>
> On Thu, 14 Aug 2008, shli_at_stat.wvu.edu wrote:
>
>> This message is in MIME format. The first part should be readable text,
>> while the remaining parts are likely unreadable without MIME-aware tools.
>>
>> ---559023410-851401618-1218751024=:15885
>> Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
>> Content-Transfer-Encoding: QUOTED-PRINTABLE
>>
>>
>> I didn't describe the problem clearly. It's about the number of distinct=20
>> values. So just ignore cycle issue.
>>
>> My tests were:
>>
>> RNGkind(kind=3D"Knuth-TAOCP");
>> sum(duplicated(runif(1e7))); #return 46552
>>
>> RNGkind(kind=3D"Knuth-TAOCP-2002");
>> sum(duplicated(runif(1e7))); #return 46415
>>
>> #These collision frequency suggested there were 2^30 distinct values by=20
>> birthday problem.
>>
>>
>> RNGkind(kind=3D"Marsaglia-Multicarry");
>> sum(duplicated(runif(1e7))); #return 11682
>>
>> RNGkind(kind=3D"Super-Duper");
>> sum(duplicated(runif(1e7))); #return 11542
>>
>> RNGkind(kind=3D"Mersenne-Twister");
>> sum(duplicated(runif(1e7))); #return 11656
>>
>> #These indicated there were 2^32 distinct values, which agrees with the=20
>> help info.
>>
>> RNGkind(kind=3D"Wichmann-Hill");
>> sum(duplicated(runif(1e7))); #return 0
>>
>> #So for this method, there should be more than 2^32 distinct values.
>>
>> You may not get the exact numbers, but they should be close. So how to=20
>> explain above problem?
>>
>> I need generate a large sample without any ties, it seems to me=20
>> "Wichmann-Hill" is only choice right now.
>>
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> Shengqiao Li
>>
>> The Department of Statistics
>> PO Box 6330
>> West Virginia University
>> Morgantown, WV 26506-6330
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>
>> On Thu, 14 Aug 2008, Peter Dalgaard wrote:
>>
>>> Shengqiao Li wrote:
>>>> Hello all,
>>>> =20
>>>> I am generating large samples of random numbers. The RNG help page says:=
>> =20
>>>> "All the supplied uniform generators return 32-bit integer values that a=
>> re=20
>>>> converted to doubles, so they take at most 2^32 distinct values and long=
>> =20
>>>> runs will return duplicated values." But I find that the cycles are not =
>> the=20
>>>> same as the 32-bit integer.
>>>> =20
>>>> My test indicated that the cycles for Knuth's methods were 2^30 while=20
>>>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated i=
>> n=20
>>>> 10M numbers generated by runif using Wichmann-Hill. The other three meth=
>> ods=20
>>>> had cycle length of 2^32.
>>>> =20
>>>> So, anybody can explain this? And any improvement to the implementation =
>> can=20
>>>> be made to increase the cycle length like the Wichmann-Hill method?
>>>> =20
>>> What test? These are not simple linear congruential generators. Just beca=
>> use=20
>>> you get the same value twice, it doesn't mean that the sequence is repeat=
>> ing.=20
>>> Perhaps you should read the entire help page rather than just the note.
>>>
>>> --=20
>>> O__ ---- Peter Dalgaard =D8ster Farimagsgade 5, Entr.B
>>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
>>> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45) 35327907
>>>
>>>
>> ---559023410-851401618-1218751024=:15885--
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri 15 Aug 2008 - 19:41:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 15 Aug 2008 - 20:38:11 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive