Re: [Rd] faster base::sequence

From: Romain Francois <romain_at_r-enthusiasts.com>
Date: Sun, 28 Nov 2010 10:56:47 +0100

Le 28/11/10 10:30, Prof Brian Ripley a écrit :
> Is sequence used enough to warrant this? As the help page says
>
> Note that ‘sequence <- function(nvec) unlist(lapply(nvec,
> seq_len))’ and it mainly exists in reverence to the very early
> history of R.

I don't know. Would it be used more if it were more efficient ?

> I regard it as unsafe to assume that NA_INTEGER will always be negative,
> and bear in mind that at some point not so far off R integers (or at
> least lengths) will need to be more than 32-bit.

sure. updated and dressed up as a patch.

I've made it a .Call because I'm not really comfortable with .Internal, etc ...

Do you mean that I should also use something else instead of "int" and "int*". Is there some future proof typedef or macro for the type associated with INTSXP ?

> On Sun, 28 Nov 2010, Romain Francois wrote:
>
>> Hello,
>>
>> Based on yesterday's R-help thread (help: program efficiency), and
>> following Bill's suggestions, it appeared that sequence:
>>
>>> sequence
>> function (nvec)
>> unlist(lapply(nvec, seq_len))
>> <environment: namespace:base>
>>
>> could benefit from being written in C to avoid unnecessary memory
>> allocations.
>>
>> I made this version using inline:
>>
>> require( inline )
>> sequence_c <- local( {
>> fx <- cfunction( signature( x = "integer"), '
>> int n = length(x) ;
>> int* px = INTEGER(x) ;
>> int x_i, s = 0 ;
>> /* error checking */
>> for( int i=0; i<n; i++){
>> x_i = px[i] ;
>> /* this includes the check for NA */
>> if( x_i <= 0 ) error( "needs non negative integer" ) ;
>> s += x_i ;
>> }
>>
>> SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
>> int * p_res = INTEGER(res) ;
>> for( int i=0; i<n; i++){
>> x_i = px[i] ;
>> for( int j=0; j<x_i; j++, p_res++)
>> *p_res = j+1 ;
>> }
>> UNPROTECT(1) ;
>> return res ;
>> ' )
>> function( nvec ){
>> fx( as.integer(nvec) )
>> }
>> })
>>
>>
>> And here are some timings:
>>
>>> x <- 1:10000
>>> system.time( a <- sequence(x ) )
>> utilisateur système écoulé
>> 0.191 0.108 0.298
>>> system.time( b <- sequence_c(x ) )
>> utilisateur système écoulé
>> 0.060 0.063 0.122
>>> identical( a, b )
>> [1] TRUE
>>
>>
>>
>>> system.time( for( i in 1:10000) sequence(1:10) )
>> utilisateur système écoulé
>> 0.119 0.000 0.119
>>>
>>> system.time( for( i in 1:10000) sequence_c(1:10) )
>> utilisateur système écoulé
>> 0.019 0.000 0.019
>>
>>
>> I would write a proper patch if someone from R-core is willing to push
>> it.
>>
>> Romain

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9VOd3l : ZAT! 2010
|- http://bit.ly/c6DzuX : Impressionnism with R
`- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube



______________________________________________ R-devel_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Sun 28 Nov 2010 - 09:59:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 17 Dec 2010 - 14:50:35 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive