# Re: [R] algorithm help

From: Rainer M Krug <r.m.krug_at_gmail.com>
Date: Fri, 07 Jan 2011 09:58:59 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/06/2011 11:57 PM, (Ted Harding) wrote:

```> On 06-Jan-11 22:16:38, array chip wrote:
```

>> Hi, I am seeking help on designing an algorithm to identify the
>> locations of stretches of 1s in a vector of 0s and 1s. Below is
>> an simple example:
>>
>>> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
>> ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
>>
>>> dat
>> a b
>> 1 0 4
>> 2 0 12
>> 3 1 13
>> 4 1 16
>> 5 1 18
>> 6 1 20
>> 7 0 28
>> 8 0 30
>> 9 1 34
>> 10 1 46
>> 11 0 47
>> 12 1 49
>> 13 1 61
>> 14 1 73
>> 15 1 77
>> 16 0 84
>> 17 0 87
>> 18 0 90
>> 19 0 95
>> 20 1 97
>>
>> In this dataset, "b" is sorted and denotes the location for each
>> number in "a".
>> So I would like to find the starting & ending locations for each
>> stretch of 1s within "a", also counting the number of 1s in each
>> stretch as well.
>> Hope the results from the algorithm would be:
>>
>> stretch start end No.of.1s
>> 1 13 20 4
>> 2 34 46 2
>> 3 49 77 4
>> 4 97 97 1
>>
>> I can imagine using for loops can do the job, but I feel it's not a
>> clever way to do this. Is there an efficient algorithm that can do
>> this fast?
>>
>> Thanks for any suggestions.
>> John
```>
> The basic information you need can be got using rle() ("run length
> encoding"). See '?rle'. In your example:
>
>   rle(dat\$a)
>   # Run Length Encoding
>   #   lengths: int [1:8] 2 4 2 2 1 4 4 1
>   #   values : num [1:8] 0 1 0 1 0 1 0 1
>   ## Note: F -> 0, T -> 1
>
> The following has a somewhat twisted logic at the end, and may
> be flawed, but you can probably adapt it!
>
>   L <- rle(dat\$a)\$lengths
>   V <- rle(dat\$a)\$values
>   pos <- c(1,cumsum(L))
>   V1 <- c(-1,V)
>   1+pos[V1==0]
>   # [1]  3  9 12 20
>   ## Positions in the series dat\$a where each run of "T" (i.e. 1)
>   ##   starts

```

A different approach would be to use the diff() function:

Where

is not equal 0, the value is changing from 0 to 1 or one to 0. The indices of the first new value can be found by:

> which(diff(dat\$a)!=0) + 1
[1] 3 7 9 11 12 16 20

where it is changing from 0 to 1 is at

> which(diff(dat\$a)==1) + 1
[1] 3 9 12 20

where it is changing from 1 to 0 is at

> which(diff(dat\$a)==-1) + 1
[1] 7 11 16

By taking into consideration if the first value and the last values are 0 or 1, you can calculate the length.

Cheers,

Rainer

```>
> Hoping this helps,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding_at_wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 06-Jan-11                                       Time: 22:57:44
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.

```
• -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

```Tel:        +33 - (0)9 53 10 27 44
Cell:       +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:      Rainer_at_krugs.de

Skype:      RMkrug
```

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0m1dMACgkQoYgNqgF2egoQbACcCB3iFQ6SKYfL4KVX8AMAN9Gp 1awAn0Z+8KXnOmwCLu61gihc8xZIT++j
=O+xA
-----END PGP SIGNATURE-----

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 07 Jan 2011 - 09:02:55 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 11 Jan 2011 - 12:50:06 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.