[Rd] Best practices for writing R functions (really copying)

From: Radford Neal <radford_at_cs.toronto.edu>
Date: Mon, 25 Jul 2011 11:53:23 -0400

Gabriel Becker writes:

  AFAIK R does not automatically copy function arguments. R actually tries very hard to avoid copying while maintaining "pass by value" functionality.

  ... R only copies data when you modify an object, not when you simply pass it to a function.

This is a bit misleading. R tries to avoid copying by maintaining a count of how many references there are to an object, so that x[i] <- 9 can be done without a copy if x is the only reference to the vector. However, it never decrements such counts. As a result, simply passing x to a function that accesses but does not change it will result in x being copied if x[i] is changed after that function returns. An exception is that this usually isn't the case if x is passed to a primitive function. But note that not all standard functions are technically "primitive".
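
To see which functions fall on which side of that line, is.primitive() can be asked directly. The following is only a small sketch; the particular functions listed are an arbitrary choice for illustration, and the variable names (fns, prim) are not from the original discussion.

```r
# Check a few standard functions: primitives are implemented directly
# in C, the others are ordinary R closures.
fns <- c("sqrt", "t", "sum", "mean", "length")

# Look each name up as a function and ask whether it is primitive.
prim <- sapply(fns, function(f) is.primitive(get(f, mode = "function")))

print(prim)
```

Functions reported as FALSE here are the ones for which the caveat about non-primitive functions applies.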

The end result is that it's rather difficult to tell when copying will be done. Try the following test, for example:

  cat("a: "); print(system.time( { A <- matrix(c(1.0,1.1),50000,1000); 0 } ))
  cat("b: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("c: "); print(system.time( { B <- sqrt(A); 0 } ))
  cat("d: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("e: "); print(system.time( { B <- t(A); 0 } ))
  cat("f: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("g: "); print(system.time( { A[1,1]<-7; 0 } ))

You'll find that the times printed after b:, d:, and g: are near zero, but that there is non-negligible time for f:. This is because sqrt is primitive but t is not, so the modification to A after the call t(A) requires that a copy be made. The time for g: is near zero again because the copy made at f: is a fresh object to which A is the only reference.
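
Timing is an indirect way to detect a copy. A more direct check, sketched below, uses tracemem(), which prints a message each time the marked object is duplicated. This assumes an R build with memory profiling enabled (checked here with capabilities("profmem")), and it uses a smaller matrix than the timing test. Whether a message appears at each assignment may depend on the R version, so no expected output is shown.

```r
# Sketch: observe duplication of A directly rather than by timing.
if (capabilities("profmem")) {
  A <- matrix(c(1.0, 1.1), 500, 100)  # smaller than in the timing test
  tracemem(A)                         # mark A so that copies are reported

  B <- sqrt(A)                        # sqrt is primitive
  A[1, 1] <- 7                        # a "tracemem[...]" line here means A was copied

  B <- t(A)                           # t is not primitive
  A[1, 1] <- 7                        # a "tracemem[...]" line here means A was copied

  untracemem(A)                       # stop tracing
} else {
  message("this build of R lacks memory profiling; tracemem() is unavailable")
}
```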

   Radford Neal

