Re: [R] Efficient way to find consecutive integers in vector?

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Sat, 22 Dec 2007 12:29:05 +0100

>>>>> "TP" == Tony Plate <tplate_at_acm.org>
>>>>>     on Fri, 21 Dec 2007 18:17:18 -0700 writes:

    TP> Martin Maechler wrote:
    >>>>>>> "MS" == Marc Schwartz <marc_schwartz_at_comcast.net>
    >>>>>>> on Thu, 20 Dec 2007 16:33:54 -0600 writes:

>>

    MS> On Thu, 2007-12-20 at 22:43 +0100, Johannes Graumann wrote:
>> >> Hi all,
>> >>
>> >> Does anybody have a magic trick handy to isolate directly consecutive
>> >> integers from something like this:
>> >> c(1,2,3,4,7,8,9,10,12,13)
>> >>
>> >> The result should be, that groups 1-4, 7-10 and 12-13 are consecutive
>> >> integers ...
>> >>
>> >> Thanks for any hints, Joh
>>
    MS> Not fully tested, but here is one possible approach:
>>
>> >> Vec

    MS> [1] 1 2 3 4 7 8 9 10 12 13
>>

    MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))
>>
>> >> Breaks

    MS> [1] 0 4 8 10
>>
>> >> sapply(seq(length(Breaks) - 1),
    MS> function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
    MS> [[1]]
    MS> [1] 1 2 3 4

>>

    MS> [[2]]
    MS> [1] 7 8 9 10
>>

    MS> [[3]]
    MS> [1] 12 13
>>
>>
>>

    MS> For a quick test, I tried it on another vector:
>>
>>

    MS> set.seed(1)
    MS> Vec <- sort(sample(20, 15))
>>
>> >> Vec

    MS> [1] 1 2 3 4 5 6 8 9 10 11 14 15 16 19 20
>>

    MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))
>>
>> >> Breaks

    MS> [1] 0 6 10 13 15
>>
>> >> sapply(seq(length(Breaks) - 1),
    MS> function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
    MS> [[1]]
    MS> [1] 1 2 3 4 5 6

>>

    MS> [[2]]
    MS> [1] 8 9 10 11
>>

    MS> [[3]]
    MS> [1] 14 15 16
>>

    MS> [[4]]
    MS> [1] 19 20
>>
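For comparison, the break-index idea above can also be written without explicit index arithmetic. This is a minimal sketch, not part of Marc's post: the name split_runs is made up for illustration, and it assumes an increasing numeric vector like the ones in the examples above.

```r
# Hypothetical helper (the name is not from the thread): split a vector
# into runs of directly consecutive, increasing integers.
split_runs <- function(x) {
  if (length(x) == 0) return(list())
  # TRUE marks the first element of each run; cumsum() turns those marks
  # into a run id (1, 1, 1, 1, 2, 2, ...) that split() can group by.
  run_id <- cumsum(c(TRUE, diff(x) != 1))
  unname(split(x, run_id))
}

Vec <- c(1, 2, 3, 4, 7, 8, 9, 10, 12, 13)
runs <- split_runs(Vec)
str(runs)
```

Like the Breaks version, this only recognises increasing runs, because it tests diff(x) != 1 rather than the absolute difference.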
>> Seems ok, but ``only works for increasing sequences''.
>> More than 12 years ago, I had encountered the same problem and
>> solved it like this:
>>
>> In package 'sfsmisc', there has been the function inv.seq(),
>> named for "inversion of seq()",
>> which does this too, currently returning an expression,
>> but returning a call in the development version of sfsmisc:
>>
>> Its definition is currently
>>
>> inv.seq <- function(i) {
>>   ## Purpose: 'Inverse seq': Return a short expression for the 'index' `i'
>>   ## --------------------------------------------------------------------
>>   ## Arguments: i: vector of (usually increasing) integers.
>>   ## --------------------------------------------------------------------
>>   ## Author: Martin Maechler, Date: 3 Oct 95, 18:08
>>   ## --------------------------------------------------------------------
>>   ## EXAMPLES: cat(rr <- inv.seq(c(3:12, 20:24, 27, 30:33)),"\n"); eval(rr)
>>   ##           r2 <- inv.seq(c(20:13, 3:12, -1:-4, 27, 30:31)); eval(r2); r2
>>   li <- length(i <- as.integer(i))
>>   if(li == 0) return(expression(NULL))
>>   else if(li == 1) return(as.expression(i))
>>   ##-- now have: length(i) >= 2
>>   di1 <- abs(diff(i)) == 1 #-- those are just simple sequences n1:n2 !
>>   s1 <- i[!c(FALSE,di1)] # beginnings
>>   s2 <- i[!c(di1,FALSE)] # endings
>>
>>   ## using text & parse {cheap and dirty} :
>>   mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
>>   parse(text =
>>         paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
>>         srcfile = NULL)[[1]]
>> }
>>
>> with example code
>>
>> > v <- c(1:10,11,6,5,4,0,1)
>> > (iv <- inv.seq(v))
>> c(1:11, 6:4, 0:1)
>> > stopifnot(identical(eval(iv), as.integer(v)))
>> > iv[[2]]
>> 1:11
>> > str(iv)
>> language c(1:11, 6:4, 0:1)
>> > str(iv[[2]])
>> language 1:11
>> >
>>
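To see where the pieces 1:11, 6:4 and 0:1 come from, the beginnings/endings step of inv.seq() can be run on its own. This sketch only repeats the three lines from the function body on the example vector; the data frame at the end is added purely for display.

```r
v <- c(1:10, 11, 6, 5, 4, 0, 1)
i <- as.integer(v)

# TRUE where neighbouring elements differ by exactly 1, up or down
di1 <- abs(diff(i)) == 1

# An element begins a run if it is NOT one step away from its predecessor,
# and ends a run if it is NOT one step away from its successor.
s1 <- i[!c(FALSE, di1)]  # beginnings
s2 <- i[!c(di1, FALSE)]  # endings

data.frame(start = s1, end = s2)
```

Because the test uses abs(diff(i)), a change of direction inside a run is not detected: for c(1, 2, 1) the three lines give a single run starting and ending at 1, so this approach is best suited to input made of clearly separated runs.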
>>
>> Now, given that this stems from 1995, I should be excused for
>> using parse(text = *) [see fortune(106) if you don't understand].
>>
>> However, doing this differently by constructing the resulting
>> language object directly {using substitute(), as.symbol(),
>> as.expression() ... etc}
>> seems not quite trivial.
>>
>> So here's the Friday afternoon / Christmas break quiz:
>>
>> What's the most elegant way
>> to replace the last statements in inv.seq()
>> ------------------------------------------------------------------------
>> ## using text & parse {cheap and dirty} :
>> mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
>> parse(text =
>> paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
>> srcfile = NULL)[[1]]
>> ------------------------------------------------------------------------
>>
>> by code that does not use parse (or source() or similar) ???
>>
>> I don't have an answer yet, at least not at all an elegant one.
>> And maybe, the solution to the quiz is that there is no elegant
>> solution.

    TP> How about this ? :

>> i <- c(1, 10, 12)
>> j <- c(5, 10, 14)
>> mkseq <- function(i, j) if (i==j) i else call(':', i, j)
>> as.call(c(list(as.name('c')), mapply(i, j, FUN=mkseq)))

Excellent, Tony!
That's just about what I had tried to do myself for half an hour and didn't get around to..

So, I'd say you've clearly won the quiz. Congratulations!

If you can think of an appropriate prize, please say so. Otherwise, if we meet at the next useR! conference in Dortmund, it will be a beer or something like that.

Martin

    TP> c(1:5, 10, 12:14)
>> eval(.Last.value)
    TP> [1] 1 2 3 4 5 10 12 13 14
>>

    TP> -- Tony Plate
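Putting the two halves of the thread together, one possible parse-free variant of inv.seq() might look like the sketch below. The name inv.seq2 is an assumption made here for illustration and is not the version that went into sfsmisc; SIMPLIFY = FALSE is also an addition to Tony's code, so that mapply() always returns a plain list regardless of how many pieces are single numbers.

```r
# Sketch only: inv.seq2 is a hypothetical name, not the sfsmisc function.
inv.seq2 <- function(i) {
  i <- as.integer(i)
  li <- length(i)
  if (li == 0) return(expression(NULL))
  if (li == 1) return(as.expression(i))
  di1 <- abs(diff(i)) == 1
  s1 <- i[!c(FALSE, di1)]  # beginnings
  s2 <- i[!c(di1, FALSE)]  # endings
  # Build each piece as a language object instead of pasting text:
  # a single number stays a constant, a run becomes the call a:b.
  mkseq <- function(a, b) if (a == b) a else call(":", a, b)
  pieces <- mapply(mkseq, s1, s2, SIMPLIFY = FALSE)
  # Prepend the symbol `c` and turn the list into the call c(...)
  as.call(c(list(as.name("c")), pieces))
}

v <- c(1:10, 11, 6, 5, 4, 0, 1)
iv <- inv.seq2(v)
stopifnot(is.call(iv), identical(eval(iv), as.integer(v)))
```

The result is a call built from integer constants, so evaluating it reproduces as.integer(v) exactly, which is the same check used in the example code earlier in the thread.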



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sat 22 Dec 2007 - 11:33:59 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 22 Dec 2007 - 14:00:20 GMT.