Re: [Rd] Embedding R and registering routines

From: Duncan Temple Lang <duncan_at_wald.ucdavis.edu>
Date: Wed, 02 May 2007 10:12:30 -0700

Simon Urbanek wrote:
> Duncan,
>
> I see your point. But in that case Apache is the one managing the
> life of the so, not R, and in many cases unloading the module would
> also mean to unload R in which case the problem doesn't arise.

Not quite. If apache loads mod_perl and mod_R, say, and "we" make routines in mod_perl available to R, when apache unloads mod_perl but not mod_R, we have to tell R that these are no longer available.
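
To make the bookkeeping concrete, here is a minimal sketch in plain C. It is not R's API: register_routine(), unregister_owner(), the fixed-size table and the two stub routines are all hypothetical names made up for illustration. The only point is that each entry remembers which module supplied it, so that when one module goes away its routines can be dropped without touching the others.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer, in the spirit of a DL_FUNC-style handle (assumption). */
typedef void *(*routine_fn)(void);

#define MAX_ROUTINES 64

typedef struct {
    const char *owner;  /* module that supplied the routine, e.g. "mod_perl" */
    const char *name;   /* name the calling side would use */
    routine_fn  fun;    /* address of the routine */
} routine_entry;

static routine_entry table[MAX_ROUTINES];
static size_t n_entries = 0;

/* Stand-ins for routines living in two different modules (hypothetical). */
void *perl_eval_stub(void) { return NULL; }
void *r_handler_stub(void) { return NULL; }

/* Record a routine together with its provenance. Returns 0, or -1 if the table is full. */
int register_routine(const char *owner, const char *name, routine_fn fun)
{
    assert(owner != NULL && name != NULL && fun != NULL);
    if (n_entries >= MAX_ROUTINES)
        return -1;
    table[n_entries].owner = owner;
    table[n_entries].name = name;
    table[n_entries].fun = fun;
    n_entries++;
    return 0;
}

/* Find a routine by name; NULL if nothing is registered under that name. */
routine_fn lookup_routine(const char *name)
{
    for (size_t i = 0; i < n_entries; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].fun;
    return NULL;
}

/* Drop every routine supplied by one owner, compacting the table in place.
   Returns the number of entries removed. */
size_t unregister_owner(const char *owner)
{
    size_t kept = 0;
    for (size_t i = 0; i < n_entries; i++)
        if (strcmp(table[i].owner, owner) != 0)
            table[kept++] = table[i];
    size_t removed = n_entries - kept;
    n_entries = kept;
    return removed;
}
```

With something like this, unloading mod_perl would amount to unregister_owner("mod_perl"), and anything registered on behalf of mod_R stays available.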

> Also that general case requires that R and the embedding application agree
> on the dylib loading method so they can share the handle. This may
> not be trivial across platforms.

If by handle we are talking about the handle to the DLL, then, while I can see some potential complications in strange cases, generally it is not an issue.
The registration mechanism precisely avoids sharing the handle and deals directly with pointers to routines. Indeed, it is getting away from global variables found by name lookup.
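
As a rough illustration of resolving through a table of pointers rather than through a DLL handle, consider the sketch below. It is a stand-alone toy, not R's implementation: call_def, resolve_call(), app_calls and twice() are hypothetical names, and the int (*)(int) signature is an assumption chosen only to keep the example small. The table is NULL-terminated in the same spirit as a callMethods[] array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical routine signature, chosen only for this sketch. */
typedef int (*int_fn)(int);

typedef struct {
    const char *name;   /* name used by the calling side */
    int_fn      fun;    /* address of the routine itself */
    int         nargs;  /* declared number of arguments */
} call_def;

/* A routine defined in the application, not in any shared library. */
int twice(int n) { return n * 2; }

/* NULL-terminated registration table. */
const call_def app_calls[] = {
    {"twice", twice, 1},
    {NULL, NULL, 0}
};

/* Resolve by walking the table. No DLL handle and no symbol-table
   lookup by name in the process image is involved. */
int_fn resolve_call(const call_def *defs, const char *name, int nargs)
{
    assert(defs != NULL && name != NULL);
    for (; defs->name != NULL; defs++)
        if (strcmp(defs->name, name) == 0 && defs->nargs == nargs)
            return defs->fun;
    return NULL;
}
```

The caller only ever sees the pointer handed over at registration time, so the two sides need not agree on how the library that contains the routine was loaded.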

> So on the whole I agree, but I'm not
> quite convinced yet that it's worth the extra effort.. maybe at some
> point ;) ...

Neither am I; just cautious about making things too simple at one step which makes the entire thing more complex in subsequent steps. I think we have data on that ... :-)

>
> Cheers,
> Simon
>
>
> On May 1, 2007, at 5:47 PM, Duncan Temple Lang wrote:
>
> > Simon Urbanek wrote:
> >> Duncan,
> >>
> >> are you going to take care of this? I have a quick solution for R-
> >> devel that adds a special entry if requested.
> >
> > If you want to go ahead, be my guest.
> > I'm somewhat occupied for the next few days...
> >
> >>
> >> I'm not quite convinced that we need as much flexibility as adding
> >> arbitrary DllInfos, because the embedding application is a really
> >> special concept (everything else is dynamically loaded except for the
> >> application). In a sense "base" does that for non-embedded R and the
> >> distinction is that it doesn't allow dynamic lookup. I don't think
> >> adding arbitrary DllInfos is wise, because we would have to expose
> >> DLL handles etc. - do we really want to do that? And as for adding
> >> NULL-handle DLLInfos, there is only one legitimate use and that is
> >> the embedding application, so anything else looks more like abuse to
> >> me... (just a lazy solution to not have to determine the dll). Also the
> >> embedded DllInfo cannot be unloaded by design, so it doesn't need
> >> anything complicated...
> >
> > I agree that we don't necessarily want to expose the entire DllInfo
> > structure (but we don't need to - just a constructor function to
> > create a new instance), and also that the embedded case is
> > special. However, Jeff's example illustrates that it is not as simple
> > as the host application making symbols available to R. In fact, it is
> > not apache that is making the symbols available to R, it is the code
> > in mod_R.so. And it might be that we want to make routines available
> > from a different module dynamically loaded into apache. Now,
> > we can do this by shovelling them all into the "embed" DllInfo,
> > but that is almost the same as putting them all into "base"
> > as we have lost the provenance of the registration.
> > And so if we want to unload an apache module and therefore unregister
> > the routines it provided to R, our life is somewhat more complex.
> >
> > I am not saying that we absolutely need this level of generality.
> > Clearly we have lived without it for a while. However, it does arise
> > in other embedded situations such as when we put R inside
> > Java, Python, Perl, Postgres, ... as each of them can load other
> > .so's. I do believe that we want to and can merge a lot of this
> > inter-system functionality in an increasingly transparent way, and
> > keeping things separate with reflection information is vital for
> > this.
> >
> > And of course, once we make a particular feature such as "add to
> > embed" into R, we are loath to take it out and we live with these
> > constraints for a long time. But in this case, it is not a big deal,
> > so please go ahead if you have the time and want to.
> >
> > Thanks,
> > D.
> >
> >>
> >> Cheers,
> >> Simon
> >>
> >> On May 1, 2007, at 4:24 PM, Duncan Temple Lang wrote:
> >>
> >>> Simon Urbanek wrote:
> >>>> Since I'm not sure I really understand Jeff's question this is just
> >>>> my interpretation, but I think the point was that you may want to
> >>>> register symbols *not* from a DLL but from the embedding
> >>>> application
> >>>> itself (e.g. like R.app GUI that embeds libR registers its entry
> >>>> for
> >>>> quartz.save). I would welcome support for this, because the
> >>>> current
> >>>> dirty hack (don't do this at home, kids!) is to use R_getDllInfo
> >>>> ("base") and append the entry instead of overwriting it. It is an
> >>>> ugly hack, but I don't think we have any API for this. Maybe a
> >>>> worthwhile endeavor would be to simply add something like
> >>>> R_getDllInfo
> >>>> ("embedded") reserved specifically for such purposes (or "R" or
> >>>> whatever...).
> >>>>
> >>>
> >>> I think we are all talking about the same thing and the code
> >>> that I posted does that for a DLL coming from an arbitrary
> >>> package rather than base.
> >>>
> >>> Rather than having yet another global concept, i.e. "embedded", we
> >>> could allow users to add their own R_DllInfo and so allow more
> >>> than one
> >>> in the same session. The only issue is removing them, freeing the
> >>> memory, and so on. But this is relatively easy to do, and various
> >>> implementations suggest themselves.
> >>>
> >>>
> >>> Thanks for the feedback.
> >>>
> >>> D.
> >>>
> >>>
> >>>> Cheers,
> >>>> Simon
> >>>>
> >>>> On May 1, 2007, at 1:56 PM, Duncan Temple Lang wrote:
> >>>>
> >>>>> Jeffrey Horner wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> The use of .Call and the like depends on loading shared
> >>>>>> libraries and
> >>>>>> registering routines from them. Also, .Primitive and .Internal
> >>>>>> depend on
> >>>>>> routines being registered in the R binary. And applications that
> >>>>>> embed R
> >>>>>> can override routines declared in Rinterface.h, but is there a way
> >>>>>> for an
> >>>>>> application embedding R to register other routines defined in the
> >>>>>> application without loading a shared library or re-compiling R?
> >>>>>
> >>>>> I think I understand the question, and if so, the answer is yes!
> >>>>>
> >>>>> I have put some code near the end of the message that illustrates
> >>>>> (tests) this idea.
> >>>>>
> >>>>> The basic idea is that after you initialize R and load your
> >>>>> RApache package with its .so, you can ask for the corresponding
> >>>>> DllInfo object for that RApache.so. (You need the full path.)
> >>>>>
> >>>>> Then, you call R_registerRoutines() with that object as the first
> >>>>> argument and your collection of routines for .C, .Call, .Fortran,
> >>>>> etc.
> >>>>> And then those routines are available to R via the corresponding
> >>>>> interface function.
> >>>>>
> >>>>> This is currently slightly strained in two ways.
> >>>>>
> >>>>> Firstly, R_registerRoutines() just overwrites any existing
> >>>>> registered
> >>>>> entries. So we should have something that allows us to append to
> >>>>> this. We could add something, if this is a worthwhile approach and
> >>>>> others want to chime in with comments.
> >>>>>
> >>>>> Also we are adding these symbols to a table to which they do not
> >>>>> really belong, i.e. pretending they are the same as the
> >>>>> routines in
> >>>>> RApache.so. But it works. Ideally, we would like to be able to
> >>>>> create
> >>>>> and add our own special type of DllInfo. A class system from an
> >>>>> object-oriented language would really help here. But we also
> >>>>> would
> >>>>> need to make this possible via the R API.
> >>>>>
> >>>>>
> >>>>> (Another hacky, unreliable way is using global symbols.
> >>>>> It is possible for R to resolve symbols on some platforms
> >>>>> by looking in the application's global symbol table.
> >>>>> So R could find symbols in the executable. Of course, you load
> >>>>> mod_R.so and so its symbols are not likely to be in the global
> >>>>> symbol table,
> >>>>> as I doubt very much Apache loads modules globally.
> >>>>> And we would also have to bend R slightly to make this work.
> >>>>> )
> >>>>>
> >>>>>
> >>>>> main.c:
> >>>>> -----------------------------
> >>>>> #include <stdio.h>
> >>>>> #include <Rinternals.h>
> >>>>> #include <Rembedded.h>
> >>>>> #include <R_ext/Rdynload.h>
> >>>>>
> >>>>> /* .C routine: the argument arrives as a pointer and is modified in place. */
> >>>>> void
> >>>>> foo(int *x)
> >>>>> {
> >>>>>     fprintf(stderr, "In foo\n");
> >>>>>     *x = 101;
> >>>>> }
> >>>>>
> >>>>> /* .Call routine: takes and returns R objects. */
> >>>>> SEXP
> >>>>> bar(SEXP n)
> >>>>> {
> >>>>>     return(ScalarInteger(INTEGER(n)[0] * 2));
> >>>>> }
> >>>>>
> >>>>> /* Deliberately left out of the registration tables below. */
> >>>>> void
> >>>>> unregistered(void)
> >>>>> {
> >>>>>     fprintf(stderr, "In unregistered\n");
> >>>>> }
> >>>>>
> >>>>> static R_CallMethodDef callMethods[] = {
> >>>>>     {"bar", (DL_FUNC) &bar, 1},
> >>>>>     {NULL, NULL, 0}
> >>>>> };
> >>>>>
> >>>>> static R_CMethodDef cmethods[] = {
> >>>>>     {"foo", (DL_FUNC) &foo, 1}, /* type { INTSXP }*/
> >>>>>     {NULL, NULL, 0}
> >>>>> };
> >>>>>
> >>>>> void
> >>>>> registerApplicationRoutinesWithR(void)
> >>>>> {
> >>>>>     DllInfo *dll;
> >>>>>
> >>>>>     /* The .so must already be loaded in this session for the lookup to succeed. */
> >>>>>     dll = R_getDllInfo("/home/duncan/Rpackage/XML/libs/XML.so");
> >>>>>     if (dll == NULL) {
> >>>>>         fprintf(stderr, "No DllInfo found; is the package loaded?\n");
> >>>>>         return;
> >>>>>     }
> >>>>>     /* Note: this overwrites any routines already registered for the DLL. */
> >>>>>     R_registerRoutines(dll, cmethods, callMethods, NULL, NULL);
> >>>>> }
> >>>>>
> >>>>> int
> >>>>> main(int argc, char *argv[])
> >>>>> {
> >>>>>     int errorOccurred = 0;
> >>>>>     SEXP e;
> >>>>>
> >>>>>     Rf_initEmbeddedR(argc, argv);
> >>>>>     registerApplicationRoutinesWithR();
> >>>>>
> >>>>>     /* Build and evaluate the call source("test.R"). */
> >>>>>     PROTECT(e = allocVector(LANGSXP, 2));
> >>>>>     SETCAR(e, Rf_install("source"));
> >>>>>     SETCAR(CDR(e), mkString("test.R"));
> >>>>>     R_tryEval(e, R_GlobalEnv, &errorOccurred);
> >>>>>     UNPROTECT(1);
> >>>>>
> >>>>>     Rf_endEmbeddedR(0);
> >>>>>     return(0);
> >>>>> }
> >>>>>
> >>>>>
> >>>>> test.R:
> >>>>> ---------------------------
> >>>>> print(.C("foo", x= as.integer(1))$x)
> >>>>> print(.Call("bar", as.integer(3)))
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> GNUmakefile:
> >>>>> -------------------------------------
> >>>>>
> >>>>> CFLAGS=-g -I$(R_HOME)/include
> >>>>>
> >>>>> main: main.o
> >>>>> 	$(CC) -o $@ $^ -L$(R_HOME)/lib -lR
> >>>>>
> >>>>>>
> >>>>>> The only such way I've found that comes close to a solution to
> >>>>>> this is
> >>>>>> creating an RObjectTable and attaching that to the search path.
> >>>>>> Assignments to variables in that environment can call the
> >>>>>> table's get
> >>>>>> routine which is defined in the application, and I think that
> >>>>>> might be
> >>>>>> an interesting solution for a new RApache implementation.
> >>>>>>
> >>>>>> For the RApache Project, the mod_R.c shared library gets loaded
> >>>>>> into
> >>>>>> the apache process and its purpose is to initialize R. Next, it
> >>>>>> calls
> >>>>>> 'library(RApache)' to load RApache.so, a package that implements
> >>>>>> the
> >>>>>> RApache API. This two-library system works, but the
> >>>>>> implementation is
> >>>>>> too complex. I'd like to simplify down to just one shared
> >>>>>> library.
> >>>>>>
> >>>>>> Any comments or suggestions are much appreciated.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Jeff
> >>>>>> --
> >>>>>> http://biostat.mc.vanderbilt.edu/JeffreyHorner
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-devel_at_r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>
> >>>>> --
> >>>>> Duncan Temple Lang duncan_at_wald.ucdavis.edu
> >>>>> Department of Statistics work: (530) 752-4782
> >>>>> 4210 Mathematical Sciences Bldg. fax: (530) 752-7099
> >>>>> One Shields Ave.
> >>>>> University of California at Davis
> >>>>> Davis, CA 95616, USA

-- 
Duncan Temple Lang                duncan_at_wald.ucdavis.edu
Department of Statistics          work:  (530) 752-4782
4210 Mathematical Sciences Bldg.  fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA





Received on Wed 02 May 2007 - 17:47:53 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 May 2007 - 07:34:02 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.