Re: [Rd] efficiency of sample() with prob.

From: Bo Peng <ben.bob_at_gmail.com>
Date: Fri 24 Jun 2005 - 15:32:45 GMT

On 6/24/05, Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:
> `Research' involves looking at all the competitor methods, devising a
> near-optimal strategy and selecting amongst methods according to that
> strategy. It is not a quick fix we are looking for but something that
> will be good for the long term.
>

I am sorry but I am afraid that I do not have enough time and background knowledge
to do a thorough research in this area. I have tried bisection search method and the alias
method, the latter has greatly improved the performance of my bioinformatics application. Since this method is the only one mentioned in Knuth's book, I have no
idea about other alternatives.

Attached is a slightly improved version of the alias method. It may be helpful to people
having similar problems.

Thanks.

--
Bo Peng
Department of Statistics
Rice University.
http://bp6.stat.rice.edu:8080/

/* replace probSampleReplace in src/main/random.c */
static void ProbSampleReplace(int n, double *p, int *perm, int nans, int *ans)
{
    /* allocate memory for a, p, H and L is the second half of a  */
    double * q = Calloc(n, double);
    int * a = Calloc(2*n, int);
    int * LL = a+n, * HH = a+2*n-1;  /* start of H and L vector */
    int * L = LL, * H = HH;          /* end of H and L vector */
    int i, j, k;
    double rU;  /* U[0,1)*n */

    /* set up alias table */
    /* initialize q with n*p0,...n*p_n-1 */
    for(i=0; i<n; ++i)
        q[i] = p[i]*n;

    /* initialize a with indices */
    for(i=0; i<n; ++i)
        a[i] = i;

    /* set up H and L */
    for(i=0; i<n; ++i) {
        if( q[i] >= 1.)
            *H-- = i;
        else
            *L++ = i;
    }

    while( L != LL && H != HH ) {
        j = *(L-1);
        k = *(H+1);
        a[j] = k;
        q[k] += q[j] - 1;
        L--;                                  /* remove j from L */
        if( q[k] < 1. ) {
            *L++ = k;                         /* add k to L */
            ++H;                              /* remove k */
        }
    }

    /* generate sample */
    for (i = 0; i < nans; ++i) {
	rU = unif_rand() * n;

        k = (int)(rU);
        rU -= k;  /* rU becomes rU-[rU] */

        if( rU < q[k] )
            ans[i] = k+1;
        else
            ans[i] = a[k]+1;
    }
    Free(a);
    Free(q); 
}

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat Jun 25 01:35:58 2005

This archive was generated by hypermail 2.1.8 : Mon 20 Feb 2006 - 03:21:09 GMT