Re: [R] uniform integer RNG 0 to t inclusive

From: Sean O'Riordain <seanpor_at_acm.org>
Date: Thu 21 Sep 2006 - 07:32:15 GMT

Prof Ripley,
You are absolutely correct, this code will not work at all - for starters, M isn't correctly initialized, etc. I edited my code in tinn-r and never ran it before posting... my apologies, I always seem to be too quick off the mark to reply - despite the 4*runif(1) suggestion in the posting guide...

I hadn't realized before what significant difference the replace=TRUE would make to the runtime... Now I can just use the sample() code you suggested and remove my runif() code altogether, as sample(t+1, 1, replace=TRUE) - 1 will work fine with t <- 2e9 which is considerably more that I need.

Thanks again to both Prof Ripley and Duncan Murdoch, Sean O'Riordain
affiliation <- NULL

On 19/09/06, Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:
> On Tue, 19 Sep 2006, Sean O'Riordain wrote:
>
> > Hi Duncan,
> >
> > Thanks for that. In the light of what you've suggested, I'm now using
> > the following:
> >
> > # generate a random integer from 0 to t (inclusive)
> > if (t < 10000000) { # to avoid memory problems...
> > M <- sample(t, 1)
> > } else {
> > while (M > t) {
> > M <- as.integer(urand(1,min=0, max=t+1-.Machine$double.eps))
> > }
> > }
>
> sample(t, 1) is a sample from 1:t, not 0:t.
>
> You need
>
> sample(t+1, 1, replace=TRUE) - 1
>
> which works in all cases up to INT_MAX-1, and beyond that you need to
> worry about the resolution of the RNG (and to use floor not as.integer).
>
> There is no such thing as urand in base R ....
>
> >
> > cheers and Thanks,
> > Sean
> >
> > On 18/09/06, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
> >> On 9/18/2006 3:37 AM, Sean O'Riordain wrote:
> >>> Good morning,
> >>>
> >>> I'm trying to concisely generate a single integer from 0 to n
> >>> inclusive, where n might be of the order of hundreds of millions.
> >>> This will however be used many times during the general procedure, so
> >>> it must be "reasonably efficient" in both memory and time... (at some
> >>> later stage in the development I hope to go vectorized)
> >>>
> >>> The examples I've found through searching RSiteSearch() relating to
> >>> generating random integers say to use : sample(0:n, 1)
> >>> However, when n is "large" this first generates a large sequence 0:n
> >>> before taking a sample of one... this computer doesn't have the memory
> >>> for that!
> >>
> >> You don't need to give the whole vector: just give n, and you'll get
> >> draws from 1:n. The man page is clear on this.
> >>
> >> So what you want is sample(n+1, 1) - 1. (Use "replace=TRUE" if you want
> >> a sample bigger than 1, or you'll get sampling without replacement.)
> >>>
> >>> When I look at the documentation for runif(n, min, max) it states that
> >>> the generated numbers will be min <= x <= max. Note the "<= max"...
> >>
> >> Actually it says that's the range for the uniform density. It's silent
> >> on the range of the output. But it's good defensive programming to
> >> assume that it's possible to get the endpoints.
> >>
> >>>
> >>> How do I generate an x such that the probability of being (the
> >>> integer) max is the same as any other integer from min (an integer) to
> >>> max-1 (an integer) inclusive... My attempt is:
> >>>
> >>> urand.int <- function(n,t) {
> >>> as.integer(runif(n,min=0, max=t+1-.Machine$double.eps))
> >>> }
> >>> # where I've included the parameter n to help testing...
> >>
> >> Because of rounding error, t+1-.Machine$double.eps might be exactly
> >> equal to t+1. I'd suggest using a rejection method if you need to use
> >> this approach: but sample() is better in the cases where as.integer()
> >> will work.
> >>
> >> Duncan Murdoch
> >>>
> >>> is floor() "better" than as.integer?
> >>>
> >>> Is this correct? Is the probability of the integer t the same as the
> >>> integer 1 or 0 etc... I have done some rudimentary testing and this
> >>> appears to work, but power being what it is, I can't see how to
> >>> realistically test this hypothesis.
> >>>
> >>> Or is there a a better way of doing this?
> >>>
> >>> I'm trying to implement an algorithm which samples into an array,
> >>> hence the need for an integer - and yes I know about sample() thanks!
> >>> :-)
> >>>
> >>> { incidentally, I was surprised to note that the maximum value
> >>> returned by summary(integer_vector) is "pretty" and appears to be
> >>> rounded up to a "nice round number", and is not necessarily the same
> >>> as max(integer_vector) where the value is large, i.e. of the order of
> >>> say 50 million }
> >>>
> >>> Is version etc relevant? (I'll want to be portable)
> >>>> version _
> >>> platform i386-pc-mingw32
> >>> arch i386
> >>> os mingw32
> >>> system i386, mingw32
> >>> status
> >>> major 2
> >>> minor 3.1
> >>> year 2006
> >>> month 06
> >>> day 01
> >>> svn rev 38247
> >>> language R
> >>> version.string Version 2.3.1 (2006-06-01)
> >>>
> >>> Many thanks in advance for your help.
> >>> Sean O'Riordain
> >>> affiliation <- NULL
> >>>
> >>> ______________________________________________
> >>> R-help@stat.math.ethz.ch mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Brian D. Ripley, ripley@stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu Sep 21 17:36:52 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 21 Sep 2006 - 08:30:06 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.