From: Douglas Bates <bates_at_stat.wisc.edu>

Date: Tue, 19 Jul 2011 10:09:11 -0500

>> (I am using a LINUX machine)

*>>
*

*>> Jeff,
*

*>>
*

*>> In creating reproducible results, I 'partially' answered my question. I have
*

*>> attached two scripts, 'mvMultiply.r' and 'mvMultiply.cc'. Please copy both
*

*>> files into your chosen directory, then run 'Rscript mvMultiply.r' in that
*

*>> directory while changing the two boolean parameters 'INCLUDE_DATAPREP' and
*

*>> 'ROWMAJOR' to all four permutations. (The variable 'diffVec' is there to
*

*>> verify that the two methods produce the same output vector.)
*

*>>
*

*>> Below are the results that I get, along with discussion (tR and tCall are in
*

*>> sec):
*

*>>
*

*>> INCLUDE_DATAPREP,ROWMAJOR,tR,tCall
*

*>> F,F,13.536,13.875
*

*>> F,T,13.824,14.299
*

*>> T,F,13.688,18.167
*

*>> T,T,13.982,30.730
*

*>>
*

*>> Interpretation: The execution time for the .Call line is nearly identical to
*

*>> the call to R operator '%*%'. Two data preparation lines for the .Call
*

*>> method create the overhead:
*

*>>
*

*>> A <- t(A) (~12sec, or 12usec per call)
*

*>> dim(A) <- dim(A)[1] * dim(A)[2] (~4sec, or 4usec per call)
*

*>>
*

*>> While the first line can be avoided by providing options in c++ function (as
*

*>> is done in the BLAS API), I wonder if the second line can be avoided, aside
*

*>> from the obvious option of rewriting the R scripts to use vectors instead of
*

*>> matrices. But this defies one of the primary advantages of using R, which is
*

*>> succinct, high-level coding. In particular, if one has several matrices as
*

*>> input into a .Call function, then the overhead from matrix-to-vector
*

*>> transformations can add up. To summarize, my questions are:
*

*>>
*

*>> 1- Do the above results seem reasonable to you? Is there a similar penalty
*

*>> in R's '%*%' operator for transforming matrices to vectors before calling
*

*>> BLAS functions?
*

*>> 2- Are there techniques for reducing the above overhead for developers
*

*>> looking to augment their R code with compiled code?
*

*>>
*

*>> Regards,
*

*>> Alireza
*

*>>
*

*>> ---------------------------------------
*

*>> # mvMultiply.r
*

*>> # comparing performance of matrix multiplication in R (using '%*%' operator)
*

*>> vs. calling compiled code (using .Call function)
*

*>> # y [m x 1] = A [m x n] %*% x [n x 1]
*

*>>
*

*>> rm(list = ls())
*

*>> system("R CMD SHLIB mvMultiply.cc")
*

*>> dyn.load("mvMultiply.so")
*

*>>
*

*>> INCLUDE_DATAPREP <- F
*

*>> ROWMAJOR <- F #indicates whether the c++ function treats A in a row-major or
*

*>> column-major fashion
*

*>>
*

*>> m <- 100
*

*>> n <- 10
*

*>> N <- 1000000
*

*>>
*

*>> diffVec <- array(0, dim = N)
*

*>> tR <- 0.0
*

*>> tCall <- 0.0
*

*>> for (i in 1:N) {
*

*>>
*

*>> A <- runif(m*n); dim(A) <- c(m,n)
*

*>> x <- runif(n)
*

*>>
*

*>> t1 <- proc.time()[3]
*

*>> y1 <- A %*% x
*

*>> tR <- tR + proc.time()[3] - t1
*

*>>
*

*>> if (INCLUDE_DATAPREP) {
*

*>> t1 <- proc.time()[3]
*

*>> }
*

*>> if (ROWMAJOR) { #since R will convert matrix to vector in a column-major
*

*>> fashion, if the c++ function expects a row-major format, we need to
*

*>> transpose A before converting it to a vector
*

*>> A <- t(A)
*

*>> }
*

*>> dim(A) <- dim(A)[1] * dim(A)[2]
*

*>> if (!INCLUDE_DATAPREP) {
*

*>> t1 <- proc.time()[3]
*

*>> }
*

*>> y2 <- .Call("matvecMultiply", as.double(A), as.double(x),
*

*>> as.logical(c(ROWMAJOR)))
*

*>> tCall <- tCall + proc.time()[3] - t1
*

*>>
*

*>> diffVec[i] <- max(abs(y2 - y1))
*

*>> }
*

*>> cat("Data prep time for '.Call' included: ", INCLUDE_DATAPREP, "\n")
*

*>> cat("C++ function expects row-major matrix: ", ROWMAJOR, "\n")
*

*>> cat("Time - Using '%*%' operator in R: ", tR, "sec\n")
*

*>> cat("Time - Using '.Call' function: ", tCall, "sec\n")
*

*>> cat("Maximum difference between methods: ", max(diffVec), "\n")
*

*>>
*

*>> dyn.unload("mvMultiply.so")
*

*>> ---------------------------------------
*

*>> # mvMultiply.cc
*

*>> #include <Rinternals.h>
*

*>> #include <R.h>
*

*>>
*

*>> extern "C" {
*

*>>
*

*>> SEXP matvecMultiply(SEXP A, SEXP x, SEXP rowmajor) {
*

*>> double *rA = REAL(A), *rx = REAL(x), *ry;
*

*>> int *rrm = LOGICAL(rowmajor);
*

*>> int n = length(x);
*

*>> int m = length(A) / n;
*

*>> SEXP y;
*

*>> PROTECT(y = allocVector(REALSXP, m));
*

*>> ry = REAL(y);
*

*>> for (int i = 0; i < m; i++) {
*

*>> ry[i] = 0.0;
*

*>> for (int j = 0; j < n; j++) {
*

*>> if (rrm[0] == 1) {
*

*>> ry[i] += rA[i * n + j] * rx[j];
*

*>> } else {
*

*>> ry[i] += rA[j * m + i] * rx[j];
*

*>> }
*

*>> }
*

*>> }
*

*>> UNPROTECT(1);
*

*>> return(y);
*

*>> }
*

*>>
*

*>> }
*

*>>
*

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Tue 19 Jul 2011 - 15:47:51 GMT

Date: Tue, 19 Jul 2011 10:09:11 -0500

I just saw that I left a syntax error in the .R and the first _Rout.txt files. Notice that in the second _Rout.txt file the order of the arguments in the constructors for the MMatrixXd and the MVectorXd are in a different order than in the .R and the first _Rout.txt files. The correct order has the pointer first, then the dimensions. For the first _Rout.txt file this part of the code is not used.

On Tue, Jul 19, 2011 at 10:00 AM, Douglas Bates <bates_at_stat.wisc.edu> wrote:

> On Thu, Jul 14, 2011 at 10:21 AM, Alireza Mahani > <alireza.s.mahani_at_gmail.com> wrote:

>> (I am using a LINUX machine)

> > I realize that you are just beginning to use the .C and .Call > interfaces and your example is therefore a simple one. However, if > you plan to continue with such development it is worthwhile learning > of some of the tools available. I think one of the most important is > the "inline" package that can take a C or C++ code segment as a text > string and go through all the steps of creating and loading a > .Call'able compiled function. > > Second, if you are going to use C++ (the code you show could be C code > as it doesn't use any C++ extensions) then you should look at the Rcpp > package written by Dirk Eddelbuettel and Romain Francois which allows > for comparatively painless interfacing of R objects and C++ objects. > The Rcpp-devel list, which I have copied on this reply, is for > questions related to that system. The inline package allows for > various "plugin" constructions to wrap your code in the appropriate > headers and point the compiler to the locations of header files and > libraries. There are two extensions to Rcpp for numerical linear > algebra in C++, RcppArmadillo and RcppEigen. I show the use of > RcppEigen here. > > Third there are several packages in R that do the busy work of > benchmarking expressions and neatly formulating the results. I use > the rbenchmark package. > > Putting all these together yields the enclosed script and results. > > In Eigen, a MatrixXd object is the equivalent of R's numeric matrix > (similarly MatrixXi for integer and MatrixXcd for complex) and a > VectorXd object is the equivalent of a numeric vector. A "mapped" > matrix or vector is one that uses the storage allocated by R, thereby > avoiding a copy operation (similar to your accessing elements of the > arrays through the pointer returned by REAL()). To adhere to R's > functional programming semantics it is a good idea to declare such > objects as const. The 'as' and 'wrap' functions are provided by Rcpp > with extensions in RcppEigen to the Eigen classes in the development > version. In the released versions of Rcpp and RcppEigen we use > intermediate Rcpp objects. These functions have the advantage of > checking the types of R objects being passed. The Eigen code for > matrix multiplication will check the consistency of dimensions in the > operation. > > Given the size of the matrix you are working with it is not surprising > that interpretation overhead and checking will be a large part of the > elapsed time, hence the relative differences between different methods > of doing the numerical calculation will be small. The operation of > multiplying a 100 x 10 matrix by a 10-vector involves "only" 1000 > floating point operations. Furthermore, each element of the matrix is > used only once so sophisticated methods of manipulating cache contents > won't buy you much. These benchmark results are from a system that > uses Atlas BLAS (basic linear algebra subroutines); other systems will > provide different results. Interestingly, I found on some systems > using R's BLAS, which are not accelerated, the R code is closer in > speed to the code using Eigen. An example is given in the second > version of the output. > ______________________________________________R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Tue 19 Jul 2011 - 15:47:51 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

*
Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.
Archive generated by hypermail 2.2.0, at Tue 19 Jul 2011 - 20:00:10 GMT.
*

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel.
Please read the posting
guide before posting to the list.
*