Re: [R] Computing sums of the columns of an array

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Sat 06 Aug 2005 - 02:55:06 EST

On 8/5/2005 12:43 PM, Uwe Ligges wrote:

> Duncan Murdoch wrote:
> 
>> On 8/5/2005 12:16 PM, Martin C. Martin wrote:
>> 

>>>Hi,
>>>
>>>I have a 5x731 array A, and I want to compute the sums of the columns.
>>>Currently I do:
>>>
>>>apply(A, 2, sum)
>>>
>>>But it turns out, this is slow: 70% of my CPU time is spent here, even
>>>though there are many complicated steps in my computation.
>>>
>>>Is there a faster way?
>> 
>> 
>> You'd probably do better with matrix multiplication:
>> 
>> rep(1, nrow(A)) %*% A
> 
> 
> No, better use colSums(), which has been optimized for this purpose:
> 
>   A <- matrix(seq(1, 10000000), ncol=10000)
>   system.time(colSums(A))
>   # ~ 0.1 sec.
>   system.time(rep(1, nrow(A)) %*% A)
>   # ~ 0.5 sec.

I didn't claim my solution was the best, only better. :-)

One point of interest: I think your example exaggerates the difference by using a matrix of integers. On my machine I get a ratio similar to yours with the same example:

 > A <- matrix(seq(1, 10000000), ncol=10000)
 > system.time(colSums(A))
[1] 0.08 0.00 0.08 NA NA
 > system.time(rep(1, nrow(A)) %*% A)
[1] 0.25 0.01 0.23 NA NA

but if I make A floating point, there's much less difference:

 > A <- matrix(as.numeric(seq(1, 10000000)), ncol=10000)
 > system.time(colSums(A))
[1] 0.09 0.00 0.09 NA NA
 > system.time(rep(1, nrow(A)) %*% A)
[1] 0.11 0.00 0.12 NA NA
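
A small sketch of the storage-mode point, for anyone who wants to check it. The names A_int, A_dbl and p_int are made up for illustration, and the matrices are tiny so it runs instantly. The likely explanation for the timing gap (an assumption, not something measured here) is that %*% works in double precision and so has to convert an integer matrix first, whereas colSums() does not need that extra copy.

```r
# seq(1, 12) yields integers, so the first matrix has integer storage.
A_int <- matrix(seq(1, 12), ncol = 4)
A_dbl <- matrix(as.numeric(seq(1, 12)), ncol = 4)

typeof(A_int)   # "integer"
typeof(A_dbl)   # "double"

# The product comes back as double even though A_int is integer.
p_int <- rep(1, nrow(A_int)) %*% A_int
typeof(p_int)   # "double"

# Both routes give the same column sums.
all.equal(as.vector(p_int), colSums(A_int))
```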

Still, colSums is the winner in both cases.
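
For reference, a minimal sketch putting the three approaches from this thread side by side on a matrix shaped like the original poster's 5x731 array. The data are random and purely hypothetical, and the names s_apply, s_matmul and s_colsum are made up for illustration. It only checks that the results agree; timings will depend on the machine and R version.

```r
set.seed(1)
# Hypothetical data with the same shape as the 5x731 array in the question.
A <- matrix(runif(5 * 731), nrow = 5)

s_apply  <- apply(A, 2, sum)                    # the original approach
s_matmul <- as.vector(rep(1, nrow(A)) %*% A)    # matrix multiplication
s_colsum <- colSums(A)                          # dedicated function

# All three agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(s_apply, s_matmul)),
          isTRUE(all.equal(s_apply, s_colsum)))
```

Note that the matrix product returns a 1 x ncol(A) matrix rather than a plain vector, hence the as.vector() call when a vector is wanted.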

Duncan Murdoch



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Aug 06 03:07:38 2005
