[Rd] Measuring and comparing .C and .Call overhead

From: Alireza Mahani <alireza.s.mahani_at_gmail.com>
Date: Tue, 19 Jul 2011 08:07:35 -0700 (PDT)

Further pursuing my curiosity to measure the efficiency of R/C++ interface, I conducted a simple matrix-vector multiplication test using .C and .Call functions in R. In each case, I measured the execution time in R, as well as inside the C++ function. Subtracting the two, I came up with a measure of overhead associated with each call. I assume that this overhead would be non-existent of the entire code was implemented in C++. [Please let me know if my approach is somehow faulty.] The results of this simple test have significant implications for my software development approach. I hope others find this information useful as well. I have attached the code files in the bottom. Below I provide a summary output, and interpret the results. [All times in table below are in microseconds]



1- .Call overhead holds nearly constant, i.e. independent of data size. This is expected since .Call works by passing pointers. (How can we explain the slight increase in overhead?)
2- C++ times for .C are somewhat better than .Call. This is likely to be due to the overhead associated with unpacking the SEXP pointers in a .Call function.
3- The overhead for .C dominates the execution time. For a 500x20 matrix, the overhead is ~90% of total time. This means that whenever we need to make repeated calls to a C/C++ function from R, and when performance is important to us, .Call is much preferred to .C, even at modest data sizes. 4- Overhead for .C scales sub-linearly with data size. I imagine that this overhead can be reduced through registering the functions and using the style field in R_CMethodDef to optimize data transfer (per Section 5.4 of "Writing R Extensions"), but perhaps not by more than a half. 5- Even using .Call, the overhead is still a significant percentage of total time, though the contribution decreases with data size (and compute load of function). However, in the context of parallelization benefits, this overhead puts an unpleasant cap on achievable gains. For example, even if the compute time goes down by 100x, if overhead was even 10% of the time, our effective gain will saturate at 10x. In other words, effective parallelization can only be achieved by moving big chunks of the code to C/C++ so as to avoid repeated calls from R to C.

I look forward to your comments.


#code file: mvMultiply.r
#measuring execution time for a matrix-vector multiplication (A%*%x) using
.C and .Call functions
#execution time is measured both in R and inside the C++ functions;
subtracting the two is
#used as an indicator of overhead associated with each call

rm(list = ls())
system("R CMD SHLIB mvMultiply.cc")

m <- 100 #number of rows in matrix A
n <- 10 #number of columns in matrix A (= number of rows in vector x)
N <- 1000000

A <- runif(m*n)
x <- runif(n)

# measuring .Call

tCall_c <- 0.0
t1 <- proc.time()[3]
for (i in 1:N) {

        tCall_c <- tCall_c + .Call("matvecMultiply", as.double(A), as.double(x)) }
tCall_R <- proc.time()[3] - t1

cat(".Call - Time measured in R: ", round(tCall_R,2), "sec\n")
cat(".Call - Time measured in C++: ", round(tCall_c,2), "sec\n")
cat(".Call - Implied overhead: ", round(tCall_R,2) - round(tCall_c,2), "sec 
-> per call: ", 1000000*(round(tCall_R,2) - round(tCall_c,2))/N, "usec\n")

# measuring .C

tC_c <- 0.0
t1 <- proc.time()[3]
for (i in 1:N) {

        tC_c <- tC_c + .C("matvecMultiply2", as.double(A), as.double(x), double(c(m)), as.integer(c(m)), as.integer(c(n)), t = double(1))$t }
tC_R <- proc.time()[3] - t1

cat(".C - Time measured in R: ", round(tC_R,2), "sec\n")
cat(".C - Time measured in C++: ", round(tC_c,2), "sec\n")
cat(".C - Implied overhead: ", round(tC_R,2) - round(tC_c,2), "sec  ->  per
call: ", 1000000*(round(tC_R,2) - round(tC_c,2))/N, "usec\n")


#code file: myMultiply.cc
#include <Rinternals.h>
#include <R.h>
#include &lt;sys/time.h&gt;

extern "C" {

SEXP matvecMultiply(SEXP A, SEXP x) {

	timeval time1, time2;
	gettimeofday(&time1, NULL);
	SEXP execTime_sxp;
	PROTECT(execTime_sxp = allocVector(REALSXP, 1));
	double *execTime; execTime = REAL(execTime_sxp);
	double *rA = REAL(A), *rx = REAL(x), *ry;
	int n = length(x);
	int m = length(A) / n;
	SEXP y;
	PROTECT(y = allocVector(REALSXP, m));
	ry = REAL(y);
	for (int i = 0; i < m; i++) {
		ry[i] = 0.0;
		for (int j = 0; j < n; j++) {
			ry[i] += rA[j * m + i] * rx[j];
	gettimeofday(&time2, NULL);
	*execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec -
	return execTime_sxp;


void matvecMultiply2(double *A, double *x, double *y, int *m_ptr, int *n_ptr, double *execTime) {

	timeval time1, time2;
	gettimeofday(&time1, NULL);
	int m = *m_ptr;
	int n = *n_ptr;
	for (int i = 0; i < m; i++) {
		y[i] = 0.0;
		for (int j = 0; j < n; j++) {
			y[i] += A[j * m + i] * x[j];
	gettimeofday(&time2, NULL);
	*execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec -


View this message in context: http://r.789695.n4.nabble.com/Measuring-and-comparing-C-and-Call-overhead-tp3678361p3678361.html
Sent from the R devel mailing list archive at Nabble.com.

R-devel_at_r-project.org mailing list
Received on Tue 19 Jul 2011 - 15:09:26 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 19 Jul 2011 - 19:20:10 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive