Re: [Rd] sequence(c(2, 0, 3)) produces surprising results, would (PR#9813)

From: <bill_at_insightful.com>
Date: Thu, 26 Jul 2007 20:38:41 +0200 (CEST)


On Thu, 26 Jul 2007 bill_at_insightful.com wrote:

> Full_Name: Bill Dunlap
> Version: 2.5.0
> OS: Linux
> Submission from: (NULL) (70.98.76.47)
>
> sequence(nvec) is documented to return
> the concatenation of seq(nvec[i]), for
> i in seq(along=nvec). This produces inconvenient
> (for me) results for 0 inputs.
> > sequence(c(2,0,3)) # would like 1 2 1 2 3, ignore 0
> [1] 1 2 1 0 1 2 3
> Would changing sequence(nvec) to use seq_len(nvec[i])
> instead of the current 1:nvec[i] break much existing code?
>
> On the other hand, almost no one seems to use sequence()
> and it might make more sense to allow seq_len() and seq()
> to accept a vector for length.out and they would return a
> vector of length sum(length.out),
> c(seq_len(length.out[1]), seq_len(length.out[2]), ...)

seq_len() could be changed to do that with the code change below. It does slow down seq_len() in the scalar case:

                             old time    new time
for(i in 1:1e6)seq_len(2)    1.251       1.516
for(i in 1:1e6)seq_len(20)   1.690       1.990
for(i in 1:1e6)seq_len(200)  5.480       5.860

It becomes much faster than sequence() in the vectorized case:

   > unix.time(for(i in 1:1e4)sequence(20:1))
      user  system elapsed
     1.550   0.000   1.557
   > unix.time(for(i in 1:1e4)seq_len(20:1))
      user  system elapsed
     0.070   0.000   0.066
   > identical(sequence(20:1), seq_len(20:1))
   [1] TRUE

My problem cases are where the length.out vector is long and contains small integers (e.g., the output of table() on a vector of mostly unique values).
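For reference, the zero-length behaviour in the original report comes from 1:n counting down when n is less than 1, whereas seq_len(n) yields nothing for n = 0. A minimal C sketch of the two semantics follows; the helper names colon_from_one and seq_len_scalar are hypothetical and are not part of R's API or of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Mimics R's 1:n for an integer n: counts up when n >= 1, down otherwise.
   The caller must supply at least |n - 1| + 1 slots in out.
   Returns the number of elements written. */
int colon_from_one(int n, int *out)
{
    assert(out != NULL);
    int k = 0;
    if (n >= 1) {
        for (int v = 1; v <= n; v++) out[k++] = v;
    } else {
        for (int v = 1; v >= n; v--) out[k++] = v;   /* 1:0 gives 1, 0 */
    }
    return k;
}

/* Mimics seq_len(n) for n >= 0: 1, 2, ..., n, and no elements for n == 0.
   Returns the number of elements written. */
int seq_len_scalar(int n, int *out)
{
    assert(out != NULL && n >= 0);
    int k = 0;
    for (int v = 1; v <= n; v++) out[k++] = v;
    return k;
}
```

With these helpers, colon_from_one(0, buf) writes two elements (1 and 0), which is where the stray "1 0" in sequence(c(2,0,3)) comes from, while seq_len_scalar(0, buf) writes none.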

Index: src/main/seq.c


 SEXP attribute_hidden do_seq_len(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
-    SEXP ans;
-    int i, len, *p;
+    SEXP ans, slengths;
+    int i, *p, anslen, *lens, nlens, ilen, nprotected=0 ;

     checkArity(op, args);

-    len = asInteger(CAR(args));
-    if(len == NA_INTEGER || len < 0)
-	errorcall(call, _("argument must be non-negative"));
-    ans = allocVector(INTSXP, len);
+    slengths = CAR(args);
+    if (TYPEOF(slengths) != INTSXP) {
+    	PROTECT(slengths = coerceVector(CAR(args), INTSXP));
+        nprotected++;
+    }
+    lens = INTEGER(slengths);
+    nlens = LENGTH(slengths);
+    anslen = 0 ;
+    for(ilen=0;ilen<nlens;ilen++) {
+        int len = lens[ilen] ;
+        if(len == NA_INTEGER || len < 0)
+	    errorcall(call, _("argument must be non-negative"));
+        anslen += len ;
+    }
+    ans = allocVector(INTSXP, anslen);

     p = INTEGER(ans);
-    for(i = 0; i < len; i++) p[i] = i+1;
-
+    for(ilen=0;ilen<nlens;ilen++) {
+        int len = lens[ilen] ;
+        for(i = 0; i < len; i++) *p++ = i+1;
+    }
+    if(nprotected>0)
+        UNPROTECT(nprotected);
     return ans;
 }
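The two-pass structure of the patch (sum the lengths, allocate once, then fill) can be tried outside R with a plain C sketch. The function name seq_len_concat, the malloc-based allocation, the return codes, and the overflow guard are assumptions made for illustration and are not part of the patch; inside R the allocation would go through allocVector as in the diff.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Concatenates 1..lens[0], 1..lens[1], ... into a freshly allocated array.
   Returns 0 on success, -1 on an invalid length, -2 if allocation fails.
   On success the caller owns *out and must free() it. */
int seq_len_concat(const int *lens, int nlens, int **out, int *out_len)
{
    assert(lens != NULL || nlens == 0);
    assert(out != NULL && out_len != NULL);

    /* Pass 1: validate every length and accumulate the total size.
       R's NA_INTEGER is INT_MIN, so the negative check also rejects NA. */
    int total = 0;
    for (int j = 0; j < nlens; j++) {
        if (lens[j] < 0) return -1;
        /* Overflow guard: an addition in this sketch, not in the patch. */
        if (lens[j] > INT_MAX - total) return -1;
        total += lens[j];
    }

    /* Allocate at least one element so a zero total still gives a valid pointer. */
    int *buf = malloc((total > 0 ? (size_t)total : 1) * sizeof *buf);
    if (buf == NULL) return -2;

    /* Pass 2: fill each run 1..lens[j]; zero-length entries contribute nothing. */
    int *p = buf;
    for (int j = 0; j < nlens; j++)
        for (int i = 0; i < lens[j]; i++)
            *p++ = i + 1;

    *out = buf;
    *out_len = total;
    return 0;
}
```

Called with lens = {2, 0, 3}, this produces the five values 1 2 1 2 3, the result asked for in the original report. Doing a single allocation after the sizing pass is what avoids the per-element overhead of building many short vectors and concatenating them.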

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 26 Jul 2007 - 18:55:58 GMT
