Phil Spector
Date: Thu, 10 Mar 2011

To add to William's remarks, another advantage of the apply family of functions is that they avoid growing an object inside a loop, which is very inefficient in R. In other words, without the *apply functions, users might do something like this:

answer = NULL
for(i in 1:nrows)
    answer = rbind(answer,calculateanewrow(i))

or

answer = NULL
for(i in 1:n)
    answer = c(answer,newcalculation(i))
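For comparison, here is a rough sketch of how the same two results might be built without growing anything inside a loop. The bodies of calculateanewrow and newcalculation are made-up stand-ins (the post does not define them), and nrows and n are set to small arbitrary values just so the example runs:

```r
# Hypothetical stand-ins for the per-step calculations named above;
# the real functions are not shown in the post.
calculateanewrow <- function(i) c(i, i^2, i^3)
newcalculation <- function(i) sqrt(i)

nrows <- 5   # assumed sizes, chosen only for illustration
n <- 5

# Matrix case: compute every row first, then bind them together once.
answer_rows <- do.call(rbind, lapply(seq_len(nrows), calculateanewrow))

# Vector case: sapply collects one value per index into a vector.
answer_vec <- sapply(seq_len(n), newcalculation)

# The growing-object versions, kept only to check the results agree.
grown_rows <- NULL
for (i in seq_len(nrows)) grown_rows <- rbind(grown_rows, calculateanewrow(i))
grown_vec <- NULL
for (i in seq_len(n)) grown_vec <- c(grown_vec, newcalculation(i))

all.equal(answer_rows, grown_rows)
all.equal(answer_vec, grown_vec)
```

In both apply-style versions the result is assembled in a single step at the end, so no partially built object is copied over and over.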

So in addition to making your program easier to understand (which is a huge advantage in and of itself), they also help you avoid a programming paradigm that's very inefficient in R:

> mat = matrix(abs(rnorm(10000)),1000,10)
> system.time({answer=NULL;for(i in 1:nrow(mat))answer = rbind(answer,log(mat[i,]))})
   user  system elapsed
  0.052   0.020   0.072
> system.time({answer1 = t(apply(mat,1,log))})
   user  system elapsed
  0.012   0.000   0.012
> all.equal(answer,answer1)
[1] TRUE

That's a speedup of a factor of 6, which gets even bigger as the size of the object increases:

> mat = matrix(abs(rnorm(100000)),10000,10)
> system.time({answer=NULL;for(i in 1:nrow(mat))answer = rbind(answer,log(mat[i,]))})
   user  system elapsed
  5.960   1.524   7.505
> system.time({answer1 = t(apply(mat,1,log))})
   user  system elapsed
  0.120   0.004   0.123
> all.equal(answer,answer1)
[1] TRUE

Now it's a speedup of 60 -- essentially an O(n^2) algorithm competing with an O(n) algorithm.
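One way to see where the quadratic behaviour comes from is to count the rows that have to be written. This is only a back-of-the-envelope sketch: it assumes each rbind call allocates a fresh matrix holding all the rows so far, and the helper names below are invented for the illustration:

```r
# Growing with rbind(): step i produces a new i-row matrix, so the
# total number of rows written is 1 + 2 + ... + n.
rows_copied_growing <- function(n) sum(as.numeric(seq_len(n)))

# Building the result in one pass writes each row a single time.
rows_written_once <- function(n) n

sizes <- c(1000, 10000)   # the two row counts timed above
growing <- sapply(sizes, rows_copied_growing)
once <- sapply(sizes, rows_written_once)

data.frame(n = sizes, growing = growing, once = once,
           ratio = growing / once)
```

Under that assumption the work for the growing loop goes up roughly 100-fold when the number of rows goes up 10-fold, while the one-pass version goes up only 10-fold, which is in line with the speedup moving from about 6 to about 60 in the timings.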

The lack of scalability of this paradigm often leads new users to believe that R can't handle large problems. Learning to use the apply family of functions from the start avoids this misconception.

                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spector_at_stat.berkeley.edu
