From: Simon Urbanek <simon.urbanek_at_r-project.org>

Date: Tue, 19 Jul 2011 14:47:46 -0400

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Tue 19 Jul 2011 - 18:50:13 GMT

On Jul 19, 2011, at 11:07 AM, Alireza Mahani wrote:

> Further pursuing my curiosity to measure the efficiency of the R/C++ interface, I
> conducted a simple matrix-vector multiplication test using the .C and .Call
> interfaces in R. In each case, I measured the execution time in R, as well as
> inside the C++ function. Subtracting the two, I came up with a measure of the
> overhead associated with each call. I assume that this overhead would be
> non-existent if the entire code were implemented in C++. [Please let me know
> if my approach is somehow faulty.] The results of this simple test have
> significant implications for my software development approach. I hope others
> find this information useful as well. I have attached the code files at the
> bottom. Below I provide a summary output and interpret the results. [All
> times in the table below are in microseconds.]
>
> m / n / time_.Call_R / time_.Call_C++ / .Call_overhead / time_.C_R / time_.C_C++ / .C_overhead
> 100 / 10 /  3.23 /  1.33 / 1.90 / 16.89 / 0.96 / 15.93
> 200 / 15 /  5.38 /  3.44 / 1.94 / 30.77 / 2.81 / 27.96
> 500 / 20 / 12.48 / 10.40 / 2.08 / 78.12 / 9.37 / 68.75

Your "overhead" calculations are anything but, since they include a loop, value replacement, arithmetic, and unnecessary function calls (you don't need as.double() for .Call - it's more efficient to use coerceVector() or to throw an error, depending on what you want) - none of which have anything to do with .Call.
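For concreteness, a minimal sketch (not from the thread) of what C-side coercion could look like: it assumes R's C API from Rinternals.h, so it only compiles against an R installation, and matvecMultiply_coerced is a hypothetical entry-point name.

```
/* Sketch only: requires R's C API; not standalone.  Coercion is done
   once in C instead of calling as.double() on the R side per call. */
#include <Rinternals.h>

SEXP matvecMultiply_coerced(SEXP A, SEXP x)
{
    /* coerceVector() returns its argument unchanged when it is already
       a REALSXP, so the common case copies nothing. */
    PROTECT(A = coerceVector(A, REALSXP));
    PROTECT(x = coerceVector(x, REALSXP));
    int n = length(x), m = length(A) / n;
    SEXP y = PROTECT(allocVector(REALSXP, m));
    double *rA = REAL(A), *rx = REAL(x), *ry = REAL(y);
    for (int i = 0; i < m; i++) {
        ry[i] = 0.0;
        for (int j = 0; j < n; j++)
            ry[i] += rA[j * m + i] * rx[j];
    }
    UNPROTECT(3);
    return y;
}
```

The alternative Simon mentions - throwing an error - would replace the coerceVector() calls with a check such as `if (!isReal(A)) error("A must be a double vector");`, pushing the coercion cost back onto the caller.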

In addition, if you really cared about overhead, you would avoid symbol lookup millions of times:

foo.c:

#include <Rinternals.h>

SEXP foo() { return R_NilValue; }

> system.time(for(i in 1:1e6) .Call("foo"))

user system elapsed

4.943 0.016 4.960

> foo = getNativeSymbolInfo("foo")$address

> system.time(for(i in 1:1e6) .Call(foo))

user system elapsed

0.363 0.001 0.365

This is very close to just calling a constant closure with the same result:

> f=function() NULL

> system.time(for(i in 1:1e6) f())

user system elapsed

0.238 0.001 0.239

So .Call overhead is essentially negligible - and not what you were measuring ;).

There is only a tiny overhead in forcing the argument promise, but it's constant:

> x=rnorm(1e6)

> system.time(for(i in 1:1e6) .Call(foo,x))

user system elapsed

0.455 0.001 0.456

> x=rnorm(1e9)

> system.time(for(i in 1:1e6) .Call(foo,x))

user system elapsed

0.454 0.000 0.455

and it's again no different than for a closure:

> f=function(x) NULL

> system.time(for(i in 1:1e6) f())

user system elapsed

0.259 0.001 0.259

> system.time(for(i in 1:1e6) f(x))

user system elapsed

0.329 0.001 0.329

so this is inherent to any call. .Call is the fastest you can get, there is nothing faster (other than hacking your code into R internals...).

Cheers,

Simon

> Interpretation:
>
> 1- .Call overhead holds nearly constant, i.e. it is independent of data size.
> This is expected, since .Call works by passing pointers. (How can we explain
> the slight increase in overhead?)
> 2- C++ times for .C are somewhat better than for .Call. This is likely due
> to the overhead associated with unpacking the SEXP pointers in a .Call
> function.
> 3- The overhead for .C dominates the execution time. For a 500x20 matrix,
> the overhead is ~90% of the total time. This means that whenever we need to
> make repeated calls to a C/C++ function from R, and performance is important
> to us, .Call is much preferred to .C, even at modest data sizes.
> 4- Overhead for .C scales sub-linearly with data size. I imagine that this
> overhead can be reduced by registering the functions and using the style
> field in R_CMethodDef to optimize data transfer (per Section 5.4 of
> "Writing R Extensions"), but perhaps not by more than half.
> 5- Even using .Call, the overhead is still a significant percentage of the
> total time, though its contribution decreases with data size (and with the
> compute load of the function). However, in the context of parallelization,
> this overhead puts an unpleasant cap on achievable gains. For example, even
> if the compute time goes down by 100x, if the overhead was even 10% of the
> total time, our effective gain will saturate at 10x. In other words,
> effective parallelization can only be achieved by moving big chunks of the
> code to C/C++ so as to avoid repeated calls from R to C.
>
> I look forward to your comments.
>
> -Alireza
> --------------------------------
> # code file: mvMultiply.r
> # measuring execution time for a matrix-vector multiplication (A %*% x)
> # using the .C and .Call interfaces
> # execution time is measured both in R and inside the C++ functions;
> # subtracting the two is used as an indicator of the overhead associated
> # with each call
>
> rm(list = ls())
> system("R CMD SHLIB mvMultiply.cc")
> dyn.load("mvMultiply.so")
>
> m <- 100     # number of rows in matrix A
> n <- 10      # number of columns in matrix A (= number of rows in vector x)
> N <- 1000000
>
> A <- runif(m * n)
> x <- runif(n)
>
> # measuring .Call
> tCall_c <- 0.0
> t1 <- proc.time()[3]
> for (i in 1:N) {
>   tCall_c <- tCall_c + .Call("matvecMultiply", as.double(A), as.double(x))
> }
> tCall_R <- proc.time()[3] - t1
> cat(".Call - Time measured in R:   ", round(tCall_R, 2), "sec\n")
> cat(".Call - Time measured in C++: ", round(tCall_c, 2), "sec\n")
> cat(".Call - Implied overhead: ", round(tCall_R - tCall_c, 2), "sec",
>     "-> per call: ", 1e6 * (tCall_R - tCall_c) / N, "usec\n")
>
> # measuring .C
> tC_c <- 0.0
> t1 <- proc.time()[3]
> for (i in 1:N) {
>   tC_c <- tC_c + .C("matvecMultiply2", as.double(A), as.double(x),
>                     double(m), as.integer(m), as.integer(n),
>                     t = double(1))$t
> }
> tC_R <- proc.time()[3] - t1
> cat(".C - Time measured in R:   ", round(tC_R, 2), "sec\n")
> cat(".C - Time measured in C++: ", round(tC_c, 2), "sec\n")
> cat(".C - Implied overhead: ", round(tC_R - tC_c, 2), "sec",
>     "-> per call: ", 1e6 * (tC_R - tC_c) / N, "usec\n")
>
> dyn.unload("mvMultiply.so")
> --------------------------------
> // code file: mvMultiply.cc
> #include <Rinternals.h>
> #include <R.h>
> #include <sys/time.h>
>
> extern "C" {
>
> SEXP matvecMultiply(SEXP A, SEXP x) {
>   timeval time1, time2;
>   gettimeofday(&time1, NULL);
>   SEXP execTime_sxp;
>   PROTECT(execTime_sxp = allocVector(REALSXP, 1));
>   double *execTime = REAL(execTime_sxp);
>   double *rA = REAL(A), *rx = REAL(x), *ry;
>   int n = length(x);
>   int m = length(A) / n;
>   SEXP y;
>   PROTECT(y = allocVector(REALSXP, m));
>   ry = REAL(y);
>   for (int i = 0; i < m; i++) {
>     ry[i] = 0.0;
>     for (int j = 0; j < n; j++) {
>       ry[i] += rA[j * m + i] * rx[j];
>     }
>   }
>   UNPROTECT(1);  /* y */
>   gettimeofday(&time2, NULL);
>   *execTime = (time2.tv_sec - time1.tv_sec)
>             + (time2.tv_usec - time1.tv_usec) / 1000000.0;
>   UNPROTECT(1);  /* execTime_sxp */
>   return execTime_sxp;
> }
>
> void matvecMultiply2(double *A, double *x, double *y,
>                      int *m_ptr, int *n_ptr, double *execTime) {
>   timeval time1, time2;
>   gettimeofday(&time1, NULL);
>   int m = *m_ptr;
>   int n = *n_ptr;
>   for (int i = 0; i < m; i++) {
>     y[i] = 0.0;
>     for (int j = 0; j < n; j++) {
>       y[i] += A[j * m + i] * x[j];
>     }
>   }
>   gettimeofday(&time2, NULL);
>   *execTime = (time2.tv_sec - time1.tv_sec)
>             + (time2.tv_usec - time1.tv_usec) / 1000000.0;
> }
>
> }
> --------------------------------
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Measuring-and-comparing-C-and-Call-overhead-tp3678361p3678361.html
> Sent from the R devel mailing list archive at Nabble.com.


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 19 Jul 2011 - 20:00:10 GMT.
