Re: [Rd] C function with unknown output length

From: Herve Pages
Date: Wed, 06 Jun 2007 12:20:31 -0700

Vincent Goulet wrote:
> Hi all,
> Could anyone point me to one or more examples in the R sources of a C
> function that is called without knowing in advance what will be the
> length (say) of the output vector?
> To make myself clearer, we have a C function that computes
> probabilities until their sum gets "close enough" to 1. Hence, the
> number of probabilities is not known in advance.

Hi Vincent,

Let's say you want to write a function get_matches(const char * pattern, const char * x) that will find all the occurrences of string 'pattern' in string 'x' and "return" their positions in the form of an array of integers. Of course you don't know in advance how many occurrences you're going to find.

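One possible strategy is to declare get_matches() as follows, so that it returns the number of matches and hands the array of positions back through its first argument: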

      int get_matches(int **pos_ptr, const char * pattern, const char * x)

    Note that pos_ptr is a pointer to an int pointer.

    In get_matches(), start with a temporary buffer of some initial size:

      int get_matches(...)
        int *tmp_pos, tmp_size, old_size, npos = 0;

        tmp_size = 16; /* some initial guess of the number of matches */
        tmp_pos = (int *) S_alloc((long) tmp_size, sizeof(int));

    Then start searching for matches and, every time you find one, store its
    position in tmp_pos[npos] and increment npos.
    When tmp_pos is full (npos == tmp_size), reallocate with:

        old_size = tmp_size;
        tmp_size = 2 * old_size; /* there are many different strategies for this */
        tmp_pos = (int *) S_realloc((char *) tmp_pos, (long) tmp_size,
                                    (long) old_size, sizeof(int));

    Note that there is no need to check whether the calls to S_alloc() or
    S_realloc() succeeded, because these functions raise an error and end the
    call to .Call if they fail. In that case the memory allocated so far is
    freed (as it is on any error or user interrupt).

    When you are done, just return with:

        *pos_ptr = tmp_pos;
        return npos;

    Then call get_matches() like this:

      int *pos, npos;

      npos = get_matches(&pos, pattern, x);

    Note that memory allocation took place in get_matches(), but now you need
    to decide how and when the memory pointed to by 'pos' will be freed.
    In the R environment, this can be addressed by using transient storage
    allocation exclusively, as we did in get_matches(), so the allocated
    memory is automatically reclaimed at the end of the call to .C or .Call.
    Of course, the integers stored in pos have to be moved to a "safe" place
    before .Call returns. Typically this will be done with something like:

      SEXP Call_get_matches(...)
      {
        int *pos, npos;
        SEXP pos_sxp;
        npos = get_matches(&pos, pattern, x);
        PROTECT(pos_sxp = NEW_INTEGER(npos));
        memcpy(INTEGER(pos_sxp), pos, npos * sizeof(int));
        UNPROTECT(1); /* balance the PROTECT above before returning */
        return pos_sxp; /* end of call to .Call */
      }
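To experiment with the growing-buffer logic outside R, the steps above can be sketched in plain C. This is only an illustration under a few assumptions: malloc() and realloc() stand in for S_alloc() and S_realloc(), so failures must be checked by hand and the caller has to free() the result; the initial size of 4 is arbitrary; and positions are 0-based offsets found with strstr(), with overlapping matches counted.

```c
#include <stdlib.h>
#include <string.h>

/* Plain-C stand-in for the strategy above (not the R version):
 * returns the number of matches, or -1 if memory runs out.
 * On success *pos_ptr points to a malloc'ed array the caller must free. */
int get_matches(int **pos_ptr, const char *pattern, const char *x)
{
    int tmp_size = 4;                 /* arbitrary initial guess (assumption) */
    int npos = 0;
    int *tmp_pos = malloc((size_t) tmp_size * sizeof(int));
    size_t plen = strlen(pattern);
    const char *hit = x;

    *pos_ptr = NULL;
    if (tmp_pos == NULL)
        return -1;
    if (plen == 0) {                  /* treat an empty pattern as "no match" */
        *pos_ptr = tmp_pos;
        return 0;
    }
    while ((hit = strstr(hit, pattern)) != NULL) {
        if (npos == tmp_size) {       /* buffer full: double its size */
            int *bigger = realloc(tmp_pos,
                                  2 * (size_t) tmp_size * sizeof(int));
            if (bigger == NULL) {
                free(tmp_pos);
                return -1;
            }
            tmp_pos = bigger;
            tmp_size *= 2;
        }
        tmp_pos[npos++] = (int) (hit - x);  /* 0-based offset of this match */
        hit++;                              /* step on, allowing overlaps */
    }
    *pos_ptr = tmp_pos;
    return npos;
}
```

Inside a .Call entry point the explicit error checks and the free() would go away, since the transient allocators handle both.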

There are many variations around this. One of them is to "share" pos and npos between get_matches and its caller by making them global variables (in this case it is recommended to use 'static' in their declarations but this requires that get_matches and its caller are in the same .c file).
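A rough sketch of that variation, again in plain C so it can be tried outside R. The names find_matches() and count_matches() are hypothetical, malloc()/realloc() stand in for S_alloc()/S_realloc(), and the initial size of 4 is arbitrary. The point is only that 'pos' and 'npos' live at file scope and are marked 'static', so both functions must sit in the same .c file.

```c
#include <stdlib.h>
#include <string.h>

/* Shared between find_matches() and its caller; 'static' keeps them
 * private to this .c file. */
static int *pos = NULL;
static int npos = 0;

/* Fills the file-level 'pos' and 'npos'.
 * Returns 0 on success, -1 if memory runs out. */
static int find_matches(const char *pattern, const char *x)
{
    int size = 4;                     /* arbitrary initial guess (assumption) */
    size_t plen = strlen(pattern);
    const char *hit = x;

    free(pos);                        /* drop the results of a previous call */
    npos = 0;
    pos = malloc((size_t) size * sizeof(int));
    if (pos == NULL)
        return -1;
    if (plen == 0)                    /* treat an empty pattern as "no match" */
        return 0;
    for (; (hit = strstr(hit, pattern)) != NULL; hit++) {
        if (npos == size) {           /* buffer full: double its size */
            int *bigger = realloc(pos, 2 * (size_t) size * sizeof(int));
            if (bigger == NULL)
                return -1;            /* 'pos' still holds npos valid entries */
            pos = bigger;
            size *= 2;
        }
        pos[npos++] = (int) (hit - x);
    }
    return 0;
}

/* A caller in the same file reads the shared variables directly. */
static int count_matches(const char *pattern, const char *x)
{
    return find_matches(pattern, x) == 0 ? npos : -1;
}
```

The trade-off is the usual one for shared state: the signature gets simpler, but the results of one call are overwritten by the next.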

Hope this helps.


> I would like to have an idea what is the best way to handle this
> situation in R.
> Thanks in advance!
> ---
> Vincent Goulet, Associate Professor
> École d'actuariat
> Université Laval, Québec
Received on Wed 06 Jun 2007 - 19:45:15 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 08 Jun 2007 - 06:34:56 GMT.
