Re: [R] multithreading calling from the rpy Python package

From: Duncan Temple Lang <duncan_at_wald.ucdavis.edu>
Date: Thu 12 Oct 2006 - 16:43:01 GMT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[Taken from below]
> Is this because R itself isn't thread-safe, or maybe the R code I'm
> calling? I've found discussions on "why should we make R thread-safe
> and how" on the website, but there appears to be no date on these
> documents.
>

It is a mixture of two things. Yes, R is not thread-safe, so if two system threads were to access R concurrently, bad things would happen a.s.

It is also an issue when Python is compiled and linked with threaded options and routines from the system, e.g. libpthread, and R is not. When R is dynamically loaded into the Python process, unless R is very carefully compiled, symbols (i.e. routines) that R uses will come from the Python executable, and these may not agree with R's view at compilation. And bad things happen. This depends on your operating system, and it doesn't appear that you have told us what that is. Bad boy :-) This is an issue with Rpy, RSPython, RSPerl, the R Apache module, rJava, ...

I have started down the road of making R thread-safe and threaded on several occasions. I have not committed these extensive changes for a variety of reasons. One is that a lot of R internals would change, and this would have an impact on packages with native code. So we need a way to, at least partially, automate this for package authors. I have been making a lot of progress on that front recently with the RGCCTranslationUnit package, which allows us to examine C/C++ code from within R.

[The following is definitely for R-devel, so anyone replying, please remove the r-help and cc r-devel@r-project.org]

One of the issues that also makes me hesitate in doing this is whether we shouldn't take the time to introduce additional, extensive changes in the architecture of an R-like interpreter, e.g. make it extensible at the native level. For statistical computing to continue to grow, and for all of us to be able to explore newer areas, we probably need to think about building infrastructure for the next 5-10 years and not continue to tweak a model that has been around for 30 years. How we do this requires some serious thought and evaluating the trade-offs between building things ourselves with a small community and leveraging other existing or emerging systems, e.g. Python, Perl6/Parrot, etc.

 My $.02

  D.

René J.V. Bertin wrote:
> Hello,
>
> I don't know if this question ought to go here, or rather on R-devel,
> so please bear with me.
>
> I'm interfacing to R via RPy (rpy.sf.net) and an embedded Python
> interpreter. This is really quite convenient.
>
> I use this approach to calculate the correlation coefficient of 1
> independent dataset (vector) with 4 dependent vectors. It'd be nice if
> that could be done in 4 parallel threads, or even two.
>
> As long as I stick to pure Python code (using equivalents to R
> routines that can be found in Numpy and SciPy), this works fine.
> (Tested on a single-core machine.) However, when I call R functions
> through rpy, a crash will occur at some point, with the error
>
> *** caught segfault ***
> address 0x5164000, cause 'memory not mapped'
>
> (this is on Mac OS X 10.4.8), somewhere in Rf_eval:
> Thread 4 Crashed:
> 0 libR.dylib 0x03676af0 Rf_eval + 128
> 1 libR.dylib 0x03676e6c Rf_eval + 1020
> 2 libR.dylib 0x03677108 Rf_eval + 1688
> 3 libR.dylib 0x03676e6c Rf_eval + 1020
> 4 libR.dylib 0x03677108 Rf_eval + 1688
> 5 libR.dylib 0x03676e6c Rf_eval + 1020
> 6 libR.dylib 0x03677108 Rf_eval + 1688
> 7 libR.dylib 0x03678144 Rf_evalList + 148
> 8 libR.dylib 0x036bb5cc do_internal + 796
> 9 libR.dylib 0x03676fbc Rf_eval + 1356
> 10 libR.dylib 0x0367ad10 Rf_applyClosure + 1120
> 11 libR.dylib 0x03676e3c Rf_eval + 972
> 12 libR.dylib 0x0367ad10 Rf_applyClosure + 1120
> 13 libR.dylib 0x03676e3c Rf_eval + 972
> 14 libR.dylib 0x0367a110 do_if + 48
> 15 libR.dylib 0x03676fbc Rf_eval + 1356
> 16 libR.dylib 0x0367932c do_begin + 108
> 17 libR.dylib 0x03676fbc Rf_eval + 1356
> 18 libR.dylib 0x0367ad10 Rf_applyClosure + 1120
> 19 libR.dylib 0x03676e3c Rf_eval + 972
> 20 libR.dylib 0x0361b7c0 protectedEval + 64
> 21 libR.dylib 0x0361c170 R_ToplevelExec + 544
> 22 libR.dylib 0x0361c22c R_tryEval + 60
> 23 _rpy2031.so 0x032f0b8c do_eval_expr + 108
> 24 _rpy2031.so 0x032ef950 Robj_call + 688
> 25 Python2.5 0x023c6c08 PyObject_Call + 56
> 26 Python2.5 0x024a68ec PyEval_EvalFrameEx + 16844
> 27 Python2.5 0x024a8cf8 PyEval_EvalFrameEx + 26072
> 28 Python2.5 0x024aaef8 PyEval_EvalCodeEx + 3512
> 29 Python2.5 0x024a7ce0 PyEval_EvalFrameEx + 21952
> 30 Python2.5 0x024a8cf8 PyEval_EvalFrameEx + 26072
> 31 Python2.5 0x024aaef8 PyEval_EvalCodeEx + 3512
> 32 Python2.5 0x023fbb88 function_call + 472
> 33 Python2.5 0x023c6c08 PyObject_Call + 56
> 34 Python2.5 0x023d3294 instancemethod_call + 388
> 35 Python2.5 0x023c6c08 PyObject_Call + 56
> 36 Python2.5 0x024a0cf4 PyEval_CallObjectWithKeywords + 276
> 37 Python2.5 0x024f244c t_bootstrap + 60
> 38 libSystem.B.dylib 0x9002b508 _pthread_body + 96
>
>
> Is this because R itself isn't thread-safe, or maybe the R code I'm
> calling? I've found discussions on "why should we make R thread-safe
> and how" on the website, but there appears to be no date on these
> documents.
>
> The R/Python wrapper functions I'm using:
>
> # a variance calculator that returns 0 for vectors that have only 1 non-NaN element:
> def vvar(a):
>     v = rpy.r.var(a, na_rm=True)
>     if isnan(v):
>         return 0
>     return v
>
> # Calculate the Spearman Rho correlation between a and b and return the result
> # as scipy.stats.stats.spearmanr() does:
> R_spearmanr = rpy.r('function(a,b){ kk<-cor.test(a,b,method="spearman"); c( kk$estimate[[1]], kk$p.value) ; }')
>
> I'm taking care to make copies of the arrays I'm correlating when
> initialising the threads. (I can post more of the Python code, if
> required.)
> I'm using R 2.3.1 .
>
> thanks in advance,
> René
>
> (as always, please CC me on replies sent to the list, thanks!)
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

iD8DBQFFLnCV9p/Jzwa2QP4RAkIRAJ9IoVzSThKySLEdriqrIc1ytASqZwCeKtPo dEPN+UBNoItTrz5GgJpdTL8=
=T+1X
-----END PGP SIGNATURE-----



Received on Fri Oct 13 02:47:02 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 12 Oct 2006 - 17:30:09 GMT.
