From: Paul Gilbert <pgilbert_at_bank-banque-canada.ca>

Date: Fri 28 Jan 2005 - 06:53:17 EST

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri Jan 28 06:08:23 2005

Date: Fri 28 Jan 2005 - 06:53:17 EST

> z <- matrix(rnorm(20000),10000,2)

> system.time(for (i in 1:1000) apply(z,2,sum)) [1] 13.44 0.48 14.08 0.00 0.00

> system.time(for (i in 1:1000) rep(1,10000) %*% z) [1] 6.46 0.11 6.84 0.00 0.00

which seemed completely contrary to all my childhood teachings. Now

> system.time(for (i in 1:1000) crossprod(rep(1,10000), z)) [1] 1.90 0.12 2.24 0.00 0.00

makes sense because it is suppose to be faster than %*% , but why is
apply so slow?

(And should I go back and change apply in my code everywhere or is this
likely to reverse again?)

Paul Gilbert

> The following is at least as much out of intellectual curiosity

*> as for practical reasons.
**> On reviewing some code written by novices to R, I came
**> across:
**>
**> crossprod(x, y)[1,1]
**>
**> I thought, "That isn't a very S way of saying that, I wonder
**> what the penalty is for using 'crossprod'." To my surprise the
**> penalty was substantially negative. Handily the client had S-PLUS
**> as well -- there the sign of the penalty was as I had expected, but
**> the order of magnitude was off.
**>
**> Here are the timings of 1 million computations on vectors of
**> length 1000. This is under Windows, R version 1.9.1 and S-PLUS
**> 6.2 (on the same machine).
**>
**> Command R S-PLUS
**> sum(x * y) 28.61 97.6
**> crossprod(x, y)[1,1] 6.77 2256.2
**>
**>
**> Another example is when computing the sums of the columns of a
**> matrix. For example:
**>
**> set.seed(1)
**> jjm <- matrix(rnorm(600), 5)
**>
**> Timings for this under Windows 2000 with R version 2.0.1 (on an
**> old chip running at about 0.7Ghz) for 100,000 computations are:
**>
**> apply(jjm, 2, sum) 536.59
**> colSums(jjm) 18.26
**> rep(1,5) %*% jjm 15.41
**> crossprod(rep(1,5), jjm) 13.16
**>
**> (These timings seem to be stable across R versions and on at least
**> one Linux platform.)
**>
**> Andy Liaw showed another example of 'crossprod' being fast a couple
**> days ago on R-help.
**>
**> Questions for those with a more global picture of the code:
**>
**> * Is the speed advantage of 'crossprod' inherent, or is it because
**> more care has been taken with its implementation than the other
**> functions?
**>
**> * Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
**> going to BLAS while 'sum' can't?
**>
**> * Would it make sense to (essentially) use 'crossprod' in
**> 'colSums' and its friends at least for the special case of matrices?
**>
**> Patrick Burns
**>
**> Burns Statistics
**> patrick@burns-stat.com
**> +44 (0)20 8525 0696
**> http://www.burns-stat.com
**> (home of S Poetry and "A Guide for the Unwilling S User")
**>
**> ______________________________________________
**> R-devel@stat.math.ethz.ch mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-devel
**>
*

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri Jan 28 06:08:23 2005

*
This archive was generated by hypermail 2.1.8
: Fri 18 Mar 2005 - 09:02:42 EST
*