Re: [R] using lapply

From: Phil Spector <spector_at_stat.berkeley.edu>
Date: Thu, 10 Mar 2011 09:51:53 -0800 (PST)

To add to William's remarks, another advantage of the apply family of functions is that they avoid growing an object inside a loop, which is very inefficient in R. In other words, without the *apply functions, users might do something like this:

answer = NULL
for(i in 1:nrows)
    answer = rbind(answer,calculateanewrow(i))

or

answer = NULL
for(i in 1:n)
    answer = c(answer,newcalculation(i))
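
For comparison, here is a minimal sketch of the same two patterns written with the apply family. calculateanewrow() and newcalculation() are only placeholders in the examples above, so the definitions below are hypothetical stand-ins, included just so the code runs:

```r
# Hypothetical stand-ins for the placeholder functions above.
calculateanewrow = function(i) c(i, i^2, i^3)   # returns one row of length 3
newcalculation   = function(i) sqrt(i)          # returns a single number

nrows = 5
n = 5

# Compute all the rows first, then bind them once
# instead of once per iteration.
answer = do.call(rbind, lapply(seq_len(nrows), calculateanewrow))

# vapply() returns a plain numeric vector and checks that
# every result has length 1.
answer2 = vapply(seq_len(n), newcalculation, FUN.VALUE = numeric(1))
```

Neither version changes the size of an existing object inside a loop.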

So in addition to making your program easier to understand (which is a huge advantage in and of itself), they also help you avoid a programming paradigm that's very inefficient in R:

> mat = matrix(abs(rnorm(10000)),1000,10)
> system.time({answer=NULL;for(i in 1:nrow(mat))answer = rbind(answer,log(mat[i,]))})
   user  system elapsed
  0.052   0.020   0.072
> system.time({answer1 = t(apply(mat,1,log))})
   user  system elapsed
  0.012   0.000   0.012
> all.equal(answer,answer1)
[1] TRUE

That's a speedup of a factor of 6, which gets even bigger as the size of the object increases:

> mat = matrix(abs(rnorm(100000)),10000,10)
> system.time({answer=NULL;for(i in 1:nrow(mat))answer = rbind(answer,log(mat[i,]))})
   user  system elapsed
  5.960   1.524   7.505
> system.time({answer1 = t(apply(mat,1,log))})
   user  system elapsed
  0.120   0.004   0.123
> all.equal(answer,answer1)
[1] TRUE

Now it's a speedup of 60 -- essentially an O(n^2) algorithm competing with an O(n) algorithm.

The lack of scalability of this paradigm often leads new users to believe that R can't handle large problems. Learning to use the apply family of functions from the start avoids this misconception.
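
To illustrate where the quadratic cost comes from: each rbind() call has to copy all the rows accumulated so far, so adding row i copies roughly i-1 earlier rows, and the total over n rows grows like n^2/2. The sketch below (variable names grown, filled and applied are my own, not from the timings above) compares the growing loop with a loop that allocates the result once and fills it in place, which should scale much like the apply() version:

```r
mat = matrix(abs(rnorm(10000)), 1000, 10)

# Growing the result: every rbind() copies all rows collected so far.
grown = NULL
for (i in seq_len(nrow(mat))) grown = rbind(grown, log(mat[i, ]))

# Preallocating: allocate the full matrix once, then fill each row in place.
filled = matrix(NA_real_, nrow(mat), ncol(mat))
for (i in seq_len(nrow(mat))) filled[i, ] = log(mat[i, ])

# The apply() version from the timings above, for comparison.
applied = t(apply(mat, 1, log))

# All three approaches should give the same matrix.
all.equal(grown, filled)
all.equal(filled, applied)
```

Wrapping each variant in system.time() at a few different sizes is one way to see the difference on your own machine; I have not included timings here since they depend on the hardware and the R version.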

On Thu, 10 Mar 2011, William Dunlap wrote:

>> -----Original Message-----
>> From: r-help-bounces_at_r-project.org
>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of
>> rex.dwyer@syngenta.com
>> Sent: Thursday, March 10, 2011 8:47 AM
>> To: ligges_at_statistik.tu-dortmund.de; arun.kumar.saha_at_gmail.com
>> Cc: r-help_at_r-project.org
>> Subject: Re: [R] using lapply
>>
>> But no one answered Kushan's question about performance
>> implications of for-loop vs lapply.
>> With apologies to George Orwell:
>> "for-loops BAAAAAAD, no loops GOOOOOOD."

>
> While using no loops is faster, lapply has
> a loop in it and isn't much different in
> speed from the equivalent for loop.  The big
> advantage of the *apply functions is that
> they can make your code easier to understand.
> Here are some times for various ways of computing
> log(1:1000000).  This example is probably close
> to a worst-case scenario for the for loop, since
> the time is dominated by the [<- operation.
> Using the various *apply functions can get you a
> speed-up of c. 4x, which is nice, but the vectorized
> log gives a speed-up of c. 15x over the fastest of
> the loops.  I think the for-loop method is ungainly
> because it obscures the flow of the data, but there is
> no accounting for taste.
>
>  > system.time({ val.for <- numeric(1e6);for(i in seq_len(1e6))val.for[i]<-log(i)})
>     user  system elapsed
>     7.03    0.02    7.19
>  > system.time({ val.sapply <- sapply(seq_len(1e6), log) })
>     user  system elapsed
>     6.59    0.03    6.80
>  > system.time({ val.lapply <- unlist(lapply(seq_len(1e6), log)) })
>     user  system elapsed
>     2.48    0.00    2.52
>  > system.time({ val.vapply <- vapply(seq_len(1e6), log, FUN.VALUE=0) })
>     user  system elapsed
>     1.74    0.00    1.76
>  > system.time({ val.log <- log(seq_len(1e6)) })
>     user  system elapsed
>     0.12    0.00    0.12
>  > identical(val.vapply,val.sapply) && identical(val.vapply,val.for) &&
> identical(val.vapply,val.lapply) && identical(val.vapply,val.log)
>  [1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>

>>
>> -----Original Message-----
>> From: r-help-bounces_at_r-project.org
>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Uwe Ligges
>> Sent: Thursday, March 10, 2011 4:38 AM
>> To: Arun Kumar Saha
>> Cc: r-help_at_r-project.org
>> Subject: Re: [R] using lapply
>>
>>
>>
>> On 10.03.2011 08:30, Arun Kumar Saha wrote:
>>> On reply to the post
>>> http://r.789695.n4.nabble.com/using-lapply-td3345268.html
>>
>> Hmmm, can you please reply to the original post and quote it?
>> Your mail was not recognized to be in the same thread as the message of
>> the original poster (and hence I wasted time to answer it again).
>>
>> Thanks,
>> Uwe Ligges
>>
>>
>>
>>
>>> Dear Kushan, this may be a good start:
>>>
>>> ## assuming 'instr.list' is your list object and you are applying
>>> my.strat() function on each element of that list, you can use lapply
>>> function as
>>> lapply(instr.list, function(x) return(my.strat(x)))
>>>
>>> Here the result will again be a list whose length is the same as the
>>> length of your original list 'instr.list'.
>>>
>>> If instead the object returned by my.strat() is a single number,
>>> you might want a vector rather than a list; in that case just use
>>> 'sapply'.
>>>
>>> HTH
>>>
>>> Arun,
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>

Received on Thu 10 Mar 2011 - 17:54:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 Mar 2011 - 18:00:19 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
