Re: [Rd] How to execute R scripts simultaneously from multiple threads

From: Martin Morgan <mtmorgan_at_fhcrc.org>
Date: Thu 04 Jan 2007 - 20:07:09 GMT

Vladimir, Jeff, et al.,

This is more pre-publicity than an immediately available solution, but we've been working on the 'RWebServices' project. The R evaluator and user functions are wrapped in Java, and the Java is exposed as a web service. We use ActiveMQ to broker transactions between the front-end web service and persistent back-end R workers. The workers rely on SJava to wrap and evaluate R.
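
A rough sketch of the broker / persistent-worker pattern described above may help readers picture it. This is not RWebServices or ActiveMQ code. The `Broker` class and the `evaluate` function are hypothetical stand-ins: `evaluate` plays the role of an R call, and Python threads play the role of workers. Because R itself cannot be driven from several threads at once, each real worker would be a separate long-lived process holding its own R interpreter.

```python
import queue
import threading

# Hypothetical stand-in for "evaluate an R function call". In the system
# described, this would run inside an embedded R interpreter that lives for
# the whole lifetime of the worker.
def evaluate(func_name, args):
    if func_name == "sum":
        return sum(args)
    if func_name == "mean":
        return sum(args) / len(args)
    raise ValueError(f"unknown function: {func_name}")

class Broker:
    """Toy broker: one shared request queue, one reply queue per request."""

    def __init__(self, n_workers):
        self.requests = queue.Queue()
        self.workers = [
            threading.Thread(target=self._worker_loop, daemon=True)
            for _ in range(n_workers)
        ]
        for w in self.workers:
            w.start()

    def _worker_loop(self):
        # Persistent worker: started once, then serves requests until told to stop.
        while True:
            item = self.requests.get()
            if item is None:  # shutdown sentinel
                self.requests.task_done()
                return
            func_name, args, reply = item
            try:
                reply.put(("ok", evaluate(func_name, args)))
            except Exception as exc:
                reply.put(("error", str(exc)))
            finally:
                self.requests.task_done()

    def call(self, func_name, args, timeout=5.0):
        """Front-end side: enqueue a request and block until its reply arrives."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((func_name, args, reply))
        status, value = reply.get(timeout=timeout)
        if status == "error":
            raise RuntimeError(value)
        return value

    def shutdown(self):
        # One sentinel per worker; each worker exits after taking one.
        for _ in self.workers:
            self.requests.put(None)
        for w in self.workers:
            w.join()

broker = Broker(n_workers=4)
print(broker.call("sum", [1, 2, 3]))  # 6
print(broker.call("mean", [2, 4]))    # 3.0
broker.shutdown()
```

Scaling in this picture means starting more workers on the shared queue. The front end never needs to know which worker served a request.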

Some of the features and opportunities of this system are: strongly typed functions in R (using the TypeInfo package); native R-Java translation (using SJava and our own converters); a programmatic interface (i.e., as web services, which benefits from using S4 as a formal class system); scalable computation (by adding more, or more specialized, workers in ActiveMQ); and access to the Java-based tools available for web service deployment. Creating web services can be nearly automatic once the R functions are appropriately typed. Our focus has mostly been on big-data computation, which may be orthogonal to the needs of the original post.
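
To illustrate why typing makes service generation "nearly automatic", here is a small analogy in Python. It does not use TypeInfo or any RWebServices API. Python type annotations stand in for TypeInfo's type declarations, and `describe_operation` and the `XSD_TYPES` mapping are hypothetical. Once every argument and the return value carry a declared type, a service-operation description can be derived mechanically. An untyped function is rejected.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python types to XML Schema type names,
# standing in for the R-to-Java converters mentioned above.
XSD_TYPES = {int: "xsd:int", float: "xsd:double", str: "xsd:string", bool: "xsd:boolean"}

def describe_operation(func):
    """Derive a service-operation description from a fully typed function."""
    hints = get_type_hints(func)
    inputs = []
    for name in inspect.signature(func).parameters:
        if name not in hints:
            raise TypeError(f"parameter {name!r} of {func.__name__} is untyped")
        inputs.append((name, XSD_TYPES[hints[name]]))
    if "return" not in hints:
        raise TypeError(f"{func.__name__} has no declared return type")
    return {
        "operation": func.__name__,
        "inputs": inputs,
        "output": XSD_TYPES[hints["return"]],
    }

def scale(x: float, factor: int) -> float:
    return x * factor

print(describe_operation(scale))
# {'operation': 'scale', 'inputs': [('x', 'xsd:double'), ('factor', 'xsd:int')], 'output': 'xsd:double'}
```

The untyped case is where manual work would come back in. With untyped R functions there is nothing to derive the interface from.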

We will provide more information at the Directions in Statistical Computing conference in mid-February, so please drop me a line if you'd like to be kept up-to-date.

Martin

-- 
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org

Jeffrey Horner <jeff.horner@vanderbilt.edu> writes:


> Vladimir Dergachev wrote:
>> On Thursday 04 January 2007 4:54 am, Erik van Zijst wrote:
>>> Vladimir Dergachev wrote:
>>>> On Wednesday 03 January 2007 3:47 am, Erik van Zijst wrote:
>>>>> Apparently the R C-API does not provide a mechanism for parallel
>>>>> execution.
>>>>>
>>>>> It is preferred that the solution is not based on multi-processing (like
>>>>> client/server), because that would introduce IPC overhead.
>>>> One thing to keep in mind is that IPC is very fast in Linux. So unless
>>>> you are making lots of calls to really tiny functions this should not be
>>>> an issue.
>>> Using pipes or shared memory to pass things around to other processes on
>>> the same box is very fast indeed, but if we base our design around
>>> something like RServe which uses TCP it could be significantly slower.
>>> Our R-based system will be running scripts in response to high-volume
>>> real-time stock exchange data, so we expect lots of calls to many tiny
>>> functions indeed.
>>
>> Very interesting :)
>>
>> If you are running RServe on another box you will need to send data over
>> ethernet anyway (and will probably use TCP). If it is on the same box and you
>> use "localhost", the packets will go over loopback, which would be
>> significantly faster.
>
> I haven't looked at RServe in a while, but I think that it fires up an R
> interpreter in response to a client request and then sticks around to
> serve that client's additional requests. The question is how it
> manages all the R interpreters under varying demand.
>
> This issue is solved when you embed R into Apache (using the prefork
> MPM), as the pool of apache child processes (each with their own R
> interpreter) expands and contracts on demand. Using this with the
> loopback device would be a nice solution:
>
> http://biostat.mc.vanderbilt.edu/RApacheProject
>
> Jeff
> --
> http://biostat.mc.vanderbilt.edu/JeffreyHorner
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
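
The point made in the quoted exchange (IPC is cheap unless you make many calls to tiny functions) can be stated as a simple fraction. Let $t_{\text{ipc}}$ be the per-call communication overhead (pipe, shared memory, loopback TCP or network TCP) and $t_{\text{work}}$ the time the R function itself takes. The numbers below are purely hypothetical, chosen only to show the shape of the trade-off:

```latex
f = \frac{t_{\text{ipc}}}{t_{\text{ipc}} + t_{\text{work}}}

% Hypothetical t_ipc = 50 µs:
% t_work = 5 ms:   f = 50 / (50 + 5000) \approx 0.0099   (about 1\%)
% t_work = 10 µs:  f = 50 / (50 + 10)   \approx 0.83     (about 83\%)
```

For large calls the transport hardly matters. For the "lots of calls to many tiny functions" case, $t_{\text{ipc}}$ dominates, so the choice between pipes, shared memory, loopback and network TCP becomes the deciding factor.
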
Received on Fri Jan 05 07:14:08 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 05 Jan 2007 - 10:31:01 GMT.
