Re: [Rd] How to execute R scripts simultaneously from multiple threads

From: Martin Morgan <>
Date: Thu 04 Jan 2007 - 20:07:09 GMT

Vladimir, Jeff, et al.,

This is more pre-publicity than an immediately available solution, but we've been working on the 'RWebServices' project. The R evaluator and user functions are wrapped in Java, and the Java is exposed as a web service. We use ActiveMQ to broker transactions between the front-end web service and persistent back-end R workers. The workers rely on SJava to wrap and evaluate R.

Some of the features and opportunities of this system are: strongly typed functions in R (using the TypeInfo package); native R-Java translation (using SJava and our own converters); programmatic interface (i.e., as web services; this benefits from use of S4 as a formal class system); scalable computation (through addition of more / specialized workers in ActiveMQ); and access to Java-based tools available for web service deployment. Creating web services can be nearly automatic, once the R functions are appropriately typed. Mostly our focus has been on big-data computation, which might be orthogonal to the needs of the original post.
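To make the "front end hands requests to persistent back-end workers" idea concrete, here is a minimal sketch in Python. It is not RWebServices, ActiveMQ, or SJava code: the WorkerPool class, the JSON line protocol, and the toy "sum" function are all hypothetical stand-ins, and each child Python process merely plays the role that a long-lived R interpreter would play. Dispatch is plain round-robin over pipes rather than a real message broker.

```python
import json
import subprocess
import sys
from itertools import cycle

# Stand-in worker (assumption: in a real system this would be a persistent
# R interpreter). It reads one JSON request per line on stdin and writes
# one JSON reply per line on stdout, staying alive between requests.
WORKER_SRC = r"""
import json, os, sys
for line in sys.stdin:
    req = json.loads(line)
    reply = {"id": req["id"], "pid": os.getpid(), "result": sum(req["args"])}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


class WorkerPool:
    """Hypothetical pool of persistent workers, dispatched round-robin."""

    def __init__(self, size):
        # Start every worker once; they are reused for all later requests.
        self.workers = [
            subprocess.Popen(
                [sys.executable, "-c", WORKER_SRC],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )
            for _ in range(size)
        ]
        self._next = cycle(self.workers)

    def call(self, req_id, args):
        # Send one request down the next worker's pipe and wait for its reply.
        worker = next(self._next)
        worker.stdin.write(json.dumps({"id": req_id, "args": args}) + "\n")
        worker.stdin.flush()
        return json.loads(worker.stdout.readline())

    def close(self):
        # Closing stdin ends each worker's read loop, so it exits cleanly.
        for worker in self.workers:
            worker.stdin.close()
            worker.wait()
            worker.stdout.close()


pool = WorkerPool(size=2)
replies = [pool.call(i, [i, i + 1]) for i in range(4)]
pool.close()

results = [r["result"] for r in replies]
pids = {r["pid"] for r in replies}
print(results, "served by", len(pids), "worker processes")
```

Four requests are served by only two processes, which is the point of keeping workers persistent: interpreter start-up is paid once per worker, and capacity grows by adding workers rather than threads.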

We will provide more information at the Directions in Statistical Computing conference in mid-February, so please drop me a line if you'd like to be kept up-to-date.


Martin T. Morgan
Bioconductor / Computational Biology

Jeffrey Horner <> writes:

> Vladimir Dergachev wrote:
>> On Thursday 04 January 2007 4:54 am, Erik van Zijst wrote:
>>> Vladimir Dergachev wrote:
>>>> On Wednesday 03 January 2007 3:47 am, Erik van Zijst wrote:
>>>>> Apparently the R C-API does not provide a mechanism for parallel
>>>>> execution.
>>>>> It is preferred that the solution is not based on multi-processing (like
>>>>> C/S), because that would introduce IPC overhead.
>>>> One thing to keep in mind is that IPC is very fast in Linux. So unless
>>>> you are making lots of calls to really tiny functions this should not be
>>>> an issue.
>>> Using pipes or shared memory to pass things around to other processes on
>>> the same box is very fast indeed, but if we base our design around
>>> something like RServe which uses TCP it could be significantly slower.
>>> Our R-based system will be running scripts in response to high-volume
>>> real-time stock exchange data, so we expect lots of calls to many tiny
>>> functions indeed.
>> Very interesting :)
>> If you are running RServe on the other box you will need to send data over
>> ethernet anyway (and will probably use TCP). If it is on the same box and you
>> use "localhost" the packets will go over loopback - which would be
>> significantly faster.

> I haven't looked at RServe in a while, but I think that it fires up an R
> interpreter in response to a client request and then sticks around for
> the same client to serve it additional requests. The question is how
> does it manage all the R interpreters with varying demand.
> This issue is solved when you embed R into Apache (using the prefork
> MPM), as the pool of Apache child processes (each with their own R
> interpreter) expands and contracts on demand. Using this with the
> loopback device would be a nice solution:

> Jeff
> --
> ______________________________________________
> mailing list
Received on Fri Jan 05 07:14:08 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 05 Jan 2007 - 10:31:01 GMT.
