From: Sean O'Riordain <seanpor_at_acm.org>

Date: Thu 21 Sep 2006 - 07:32:15 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu Sep 21 17:36:52 2006


Prof Ripley,

You are absolutely correct, this code will not work at all - for
starters, M isn't correctly initialized, etc. I edited my code in
Tinn-R and never ran it before posting... my apologies, I always seem
to be too quick off the mark to reply - despite the 4*runif(1)
suggestion in the posting guide...

I hadn't realized before what a significant difference replace=TRUE would make to the runtime... Now I can just use the sample() code you suggested and remove my runif() code altogether, as sample(t+1, 1, replace=TRUE) - 1 will work fine with t <- 2e9, which is considerably more than I need.
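
As a minimal sketch of the approach settled on here, the sample() call can be wrapped in a small helper. The name rand_int0 and the argument checks are illustrative assumptions, not code from this thread, and sample.int() is used only to make the "draw from 1:n" behaviour explicit.

```r
# Hypothetical helper (not from the thread): draw `size` integers
# uniformly from 0..t inclusive without materialising the vector 0:t.
rand_int0 <- function(t, size = 1) {
  stopifnot(length(t) == 1, is.finite(t), t >= 0, t == floor(t))
  # sample.int(n, ...) draws from 1..n, so shift the result down by one.
  sample.int(t + 1, size, replace = TRUE) - 1
}

set.seed(42)                    # only for reproducibility of the example
x <- rand_int0(2e9, size = 5)   # five values somewhere in 0..2e9
```

Only the requested number of values is allocated, so memory use does not grow with t.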

Thanks again to both Prof Ripley and Duncan Murdoch,
Sean O'Riordain

affiliation <- NULL

On 19/09/06, Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:

> On Tue, 19 Sep 2006, Sean O'Riordain wrote:

*>
**> > Hi Duncan,
**> >
**> > Thanks for that. In the light of what you've suggested, I'm now using
**> > the following:
**> >
**> > # generate a random integer from 0 to t (inclusive)
**> > if (t < 10000000) { # to avoid memory problems...
**> > M <- sample(t, 1)
**> > } else {
**> > while (M > t) {
**> > M <- as.integer(urand(1,min=0, max=t+1-.Machine$double.eps))
**> > }
**> > }
**>
**> sample(t, 1) is a sample from 1:t, not 0:t.
**>
**> You need
**>
**> sample(t+1, 1, replace=TRUE) - 1
**>
**> which works in all cases up to INT_MAX-1, and beyond that you need to
**> worry about the resolution of the RNG (and to use floor not as.integer).
**>
**> There is no such thing as urand in base R ....
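
The two points in this reply can be checked with a short sketch. The variable names are illustrative assumptions, and the small value of t is chosen only to keep the ranges easy to inspect.

```r
set.seed(1)   # only for reproducibility
t <- 3

one_based  <- sample(t, 1000, replace = TRUE)          # drawn from 1..t
zero_based <- sample(t + 1, 1000, replace = TRUE) - 1  # drawn from 0..t

# Beyond the integer range, as.integer() cannot represent the value,
# while floor() keeps it as a whole-number double.
big       <- .Machine$integer.max + 1.5
big_floor <- floor(big)
big_int   <- suppressWarnings(as.integer(big))  # NA (a warning is raised if not suppressed)
```

Inspecting range(one_based) and range(zero_based) shows the off-by-one difference between the two calls.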
**>
**> >
**> > cheers and Thanks,
**> > Sean
**> >
**> > On 18/09/06, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
**> >> On 9/18/2006 3:37 AM, Sean O'Riordain wrote:
**> >>> Good morning,
**> >>>
**> >>> I'm trying to concisely generate a single integer from 0 to n
**> >>> inclusive, where n might be of the order of hundreds of millions.
**> >>> This will however be used many times during the general procedure, so
**> >>> it must be "reasonably efficient" in both memory and time... (at some
**> >>> later stage in the development I hope to go vectorized)
**> >>>
**> >>> The examples I've found through searching RSiteSearch() relating to
**> >>> generating random integers say to use : sample(0:n, 1)
**> >>> However, when n is "large" this first generates a large sequence 0:n
**> >>> before taking a sample of one... this computer doesn't have the memory
**> >>> for that!
**> >>
**> >> You don't need to give the whole vector: just give n, and you'll get
**> >> draws from 1:n. The man page is clear on this.
**> >>
**> >> So what you want is sample(n+1, 1) - 1. (Use "replace=TRUE" if you want
**> >> a sample bigger than 1, or you'll get sampling without replacement.)
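
The effect of "replace=TRUE" mentioned above can be sketched as follows. The variable names are illustrative assumptions, and n is kept tiny so the difference is visible.

```r
set.seed(7)   # only for reproducibility
n <- 5

# Without replacement: each value can appear at most once, so asking
# for n + 1 values returns a permutation of 0..n.
no_repl <- sample(n + 1, n + 1) - 1

# With replacement: draws are independent, so repeats are possible and
# more than n + 1 values may be requested.
with_repl <- sample(n + 1, 20, replace = TRUE) - 1

# Asking for more values than exist, without replacement, is an error.
too_many <- tryCatch(sample(n + 1, 20), error = function(e) "error")
```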
**> >>>
**> >>> When I look at the documentation for runif(n, min, max) it states that
**> >>> the generated numbers will be min <= x <= max. Note the "<= max"...
**> >>
**> >> Actually it says that's the range for the uniform density. It's silent
**> >> on the range of the output. But it's good defensive programming to
**> >> assume that it's possible to get the endpoints.
**> >>
**> >>>
**> >>> How do I generate an x such that the probability of being (the
**> >>> integer) max is the same as any other integer from min (an integer) to
**> >>> max-1 (an integer) inclusive... My attempt is:
**> >>>
**> >>> urand.int <- function(n,t) {
**> >>> as.integer(runif(n,min=0, max=t+1-.Machine$double.eps))
**> >>> }
**> >>> # where I've included the parameter n to help testing...
**> >>
**> >> Because of rounding error, t+1-.Machine$double.eps might be exactly
**> >> equal to t+1. I'd suggest using a rejection method if you need to use
**> >> this approach: but sample() is better in the cases where as.integer()
**> >> will work.
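
A minimal sketch of both remarks follows: first the rounding problem, then one possible rejection method. The function name rand_int_reject is an assumption for illustration, and this is one way to write a rejection sampler rather than code proposed in the thread.

```r
# The subtraction is lost to rounding once t is large:
t <- 1e8
same <- (t + 1 - .Machine$double.eps) == (t + 1)   # TRUE for this t

# Hypothetical rejection sampler: draw, floor, and redraw any value
# that falls outside 0..t (which could only happen at the endpoint).
rand_int_reject <- function(t, size = 1) {
  out <- floor(runif(size, min = 0, max = t + 1))
  bad <- out > t
  while (any(bad)) {
    out[bad] <- floor(runif(sum(bad), min = 0, max = t + 1))
    bad <- out > t
  }
  out
}

set.seed(99)   # only for reproducibility
y <- rand_int_reject(t, size = 10)
```

The loop rarely, if ever, iterates in practice; it is there as the defensive guard against getting the upper endpoint.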
**> >>
**> >> Duncan Murdoch
**> >>>
**> >>> is floor() "better" than as.integer?
**> >>>
**> >>> Is this correct? Is the probability of the integer t the same as the
**> >>> integer 1 or 0 etc... I have done some rudimentary testing and this
**> >>> appears to work, but power being what it is, I can't see how to
**> >>> realistically test this hypothesis.
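
For small t, one way to examine the question is a chi-square goodness-of-fit test on the tabulated draws. This is a sketch under the assumption that t is small enough to tabulate; it says nothing direct about t in the hundreds of millions, and the variable names are illustrative.

```r
set.seed(123)   # only for reproducibility
t <- 9

# Draw many values with the runif()-based approach and count each outcome.
draws  <- floor(runif(1e5, min = 0, max = t + 1))
counts <- table(factor(draws, levels = 0:t))

# With a single vector of counts, chisq.test() tests against equal
# probabilities by default.
p_value <- chisq.test(as.vector(counts))$p.value
```

A very small p-value would suggest that the outcomes are not equally likely; a large one is merely consistent with uniformity.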
**> >>>
**> >>> Or is there a better way of doing this?
**> >>>
**> >>> I'm trying to implement an algorithm which samples into an array,
**> >>> hence the need for an integer - and yes I know about sample() thanks!
**> >>> :-)
**> >>>
**> >>> { incidentally, I was surprised to note that the maximum value
**> >>> returned by summary(integer_vector) is "pretty" and appears to be
**> >>> rounded up to a "nice round number", and is not necessarily the same
**> >>> as max(integer_vector) where the value is large, i.e. of the order of
**> >>> say 50 million }
**> >>>
**> >>> Is version etc relevant? (I'll want to be portable)
**> >>> > version
**> >>>                _
**> >>> platform       i386-pc-mingw32
**> >>> arch           i386
**> >>> os             mingw32
**> >>> system         i386, mingw32
**> >>> status
**> >>> major          2
**> >>> minor          3.1
**> >>> year           2006
**> >>> month          06
**> >>> day            01
**> >>> svn rev        38247
**> >>> language       R
**> >>> version.string Version 2.3.1 (2006-06-01)
**> >>>
**> >>> Many thanks in advance for your help.
**> >>> Sean O'Riordain
**> >>> affiliation <- NULL
**> >>>
**> >>
**> >>
**> >
**> >
**>
**> --
**> Brian D. Ripley, ripley@stats.ox.ac.uk
**> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
**> University of Oxford, Tel: +44 1865 272861 (self)
**> 1 South Parks Road, +44 1865 272866 (PA)
**> Oxford OX1 3TG, UK Fax: +44 1865 272595
**>


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Thu 21 Sep 2006 - 08:30:06 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting guide before posting to the list.