[Rd] Pre-compilation and server-side parallel execution

From: Erik van Zijst <r_at_erik.prutser.cx>
Date: Fri 08 Dec 2006 - 14:51:39 GMT


My company operates a platform that distributes real-time financial data from exchanges to users. To extend our services I want to allow users to write and submit custom R scripts to our platform that operate on our streaming data to do real-time analysis.

We have thousands of users deploying scripts, and each script is evaluated repeatedly whenever certain conditions in the stream apply. For example, a script could compute the NASDAQ100 index value each time one of its 100 constituents trades.
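As an illustration of this trigger model, here is a minimal sketch in Python (not R; chosen only to keep the example runnable). The class name `IndexScript`, the method `on_trade`, the three tickers and their weights are all hypothetical, and a plain weighted sum stands in for the real index methodology, which the post does not describe.

```python
from typing import Dict, Optional


class IndexScript:
    """Hypothetical user script: recompute an index whenever a constituent trades."""

    def __init__(self, weights: Dict[str, float]) -> None:
        self.weights = weights                   # constituent symbol -> weight (made up)
        self.last_price: Dict[str, float] = {}   # most recent trade price per symbol

    def on_trade(self, symbol: str, price: float) -> Optional[float]:
        # Trades in symbols outside the index do not trigger the script.
        if symbol not in self.weights:
            return None
        self.last_price[symbol] = price
        # No index value until every constituent has traded at least once.
        if len(self.last_price) < len(self.weights):
            return None
        return sum(self.weights[s] * self.last_price[s] for s in self.weights)


# Illustrative stream with three made-up constituents instead of 100.
script = IndexScript({"AAA": 0.5, "BBB": 0.25, "CCC": 0.25})
values = [
    script.on_trade(sym, px)
    for sym, px in [("AAA", 100.0), ("BBB", 200.0), ("CCC", 400.0), ("AAA", 102.0)]
]
print(values)
```

The first two trades yield no value because not all constituents have a price yet; each later trade in a constituent triggers one re-evaluation.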

Scripts are typically small and execute quickly. Each script is registered once and then repeatedly evaluated with different parameters (possibly several times per second per script). In this context my biggest concern is scalability.

The evaluation engine is a pure server-side component without display abilities. An R script is invoked with parameters and whatever it returns is sent to the user.

Ideally I'd need a C API to interact with the interpreter. I've looked at projects like R/Apache, RServe and RSJava for inspiration and came to the conclusion that all of them work by forking multiple instances of the R engine, where each instance evaluates one script at a time.

As our service must evaluate many different scripts concurrently (isolated from one another), I have the following concerns:

  1. Spawning a pool of engine instances for massively parallel execution is expensive, but might work with lots of memory.
  2. R's native C API [http://cran.r-project.org/doc/manuals/R-exts.html#The-R-API] does not appear to separate parsing from evaluation. When the same script is evaluated 10 times, it is also parsed 10 times.

I'm mostly concerned about the second issue. Our scripts are registered once and continuously evaluated. I want to avoid parsing the same script again each time it is evaluated. Does the engine recognize previously parsed scripts (like Oracle does for SQL queries)?
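The register-once, evaluate-many behaviour asked about here can be sketched as follows. This is a Python illustration of the general pattern only, not of R or its C API; `ScriptRegistry`, `register`, `evaluate` and the `spread` script are hypothetical names introduced for the example.

```python
from typing import Any, Dict


class ScriptRegistry:
    """Hypothetical registry: parse each script once, evaluate it many times."""

    def __init__(self) -> None:
        self._compiled: Dict[str, Any] = {}
        self.parse_count = 0  # number of times source text was actually parsed

    def register(self, script_id: str, source: str) -> None:
        # Parsing/compilation happens here, once per registration.
        self._compiled[script_id] = compile(source, script_id, "eval")
        self.parse_count += 1

    def evaluate(self, script_id: str, params: Dict[str, Any]) -> Any:
        # Each call gets a fresh namespace, so evaluations do not share state.
        # Note: emptying __builtins__ is NOT a real security sandbox.
        namespace = dict(params)
        return eval(self._compiled[script_id], {"__builtins__": {}}, namespace)


registry = ScriptRegistry()
registry.register("spread", "ask - bid")

# Evaluate the same pre-parsed script with different parameters.
results = [
    registry.evaluate("spread", {"bid": bid, "ask": bid + 0.5})
    for bid in (10.0, 11.0, 12.0)
]
print(results, registry.parse_count)
```

The design point is that the parsed form is keyed by script and reused, so the cost of parsing is paid at registration time rather than on every evaluation.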

I am interested to hear your thoughts on my concerns and whether you think R would work in this architecture.

kind regards,
Erik van Zijst

And on the seventh day, He exited from append mode.

R-devel@r-project.org mailing list
Received on Sat Dec 09 01:54:35 2006
