From: Rainer M Krug <r.m.krug_at_gmail.com>

Date: Fri, 07 Jan 2011 09:58:59 +0100

>> Hi, I am seeking help on designing an algorithm to identify the

*>> locations of stretches of 1s in a vector of 0s and 1s. Below is
*

*>> an simple example:
*

*>>
*

*>>> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
*

*>> ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
*

*>>
*

*>>> dat
*

*>> a b
*

*>> 1 0 4
*

*>> 2 0 12
*

*>> 3 1 13
*

*>> 4 1 16
*

*>> 5 1 18
*

*>> 6 1 20
*

*>> 7 0 28
*

*>> 8 0 30
*

*>> 9 1 34
*

*>> 10 1 46
*

*>> 11 0 47
*

*>> 12 1 49
*

*>> 13 1 61
*

*>> 14 1 73
*

*>> 15 1 77
*

*>> 16 0 84
*

*>> 17 0 87
*

*>> 18 0 90
*

*>> 19 0 95
*

*>> 20 1 97
*

*>>
*

*>> In this dataset, "b" is sorted and denotes the location for each
*

*>> number in "a".
*

*>> So I would like to find the starting & ending locations for each
*

*>> stretch of 1s within "a", also counting the number of 1s in each
*

*>> stretch as well.
*

*>> Hope the results from the algorithm would be:
*

*>>
*

*>> stretch start end No.of.1s
*

*>> 1 13 20 4
*

*>> 2 34 46 2
*

*>> 3 49 77 4
*

*>> 4 97 97 1
*

*>>
*

*>> I can imagine using for loops can do the job, but I feel it's not a
*

*>> clever way to do this. Is there an efficient algorithm that can do
*

*>> this fast?
*

*>>
*

*>> Thanks for any suggestions.
*

*>> John
*

**-----BEGIN PGP SIGNATURE-----
**

Version: GnuPG v1.4.10 (GNU/Linux)

Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 07 Jan 2011 - 09:02:55 GMT

Date: Fri, 07 Jan 2011 09:58:59 +0100

**-----BEGIN PGP SIGNED MESSAGE-----
**

Hash: SHA1

On 01/06/2011 11:57 PM, (Ted Harding) wrote:

> On 06-Jan-11 22:16:38, array chip wrote:

>> Hi, I am seeking help on designing an algorithm to identify the

> > The basic information you need can be got using rle() ("run length > encoding"). See '?rle'. In your example: > > rle(dat$a) > # Run Length Encoding > # lengths: int [1:8] 2 4 2 2 1 4 4 1 > # values : num [1:8] 0 1 0 1 0 1 0 1 > ## Note: F -> 0, T -> 1 > > The following has a somewhat twisted logic at the end, and may > be flawed, but you can probably adapt it! > > L <- rle(dat$a)$lengths > V <- rle(dat$a)$values > pos <- c(1,cumsum(L)) > V1 <- c(-1,V) > 1+pos[V1==0] > # [1] 3 9 12 20 > ## Positions in the series dat$a where each run of "T" (i.e. 1) > ## starts

A different approach would be to use the diff() function:

Where

> diff(dat$a)

[1] 0 1 0 0 0 -1 0 1 0 -1 1 0 0 0 -1 0 0 0 1

is not equal 0, the value is changing from 0 to 1 or one to 0. The indices of the first new value can be found by:

> which(diff(dat$a)!=0) + 1

[1] 3 7 9 11 12 16 20

where it is changing from 0 to 1 is at

> which(diff(dat$a)==1) + 1

[1] 3 9 12 20

where it is changing from 1 to 0 is at

> which(diff(dat$a)==-1) + 1

[1] 7 11 16

By taking into consideration if the first value and the last values are 0 or 1, you can calculate the length.

Cheers,

Rainer

> > Hoping this helps, > Ted. > > -------------------------------------------------------------------- > E-Mail: (Ted Harding) <ted.harding_at_wlandres.net> > Fax-to-email: +44 (0)870 094 0861 > Date: 06-Jan-11 Time: 22:57:44 > ------------------------------ XFMail ------------------------------ > > ______________________________________________ > R-help_at_r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.

- -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building

Office Suite 2039

Stellenbosch University

Main Campus, Merriman Avenue

Stellenbosch

South Africa

Tel: +33 - (0)9 53 10 27 44 Cell: +27 - (0)8 39 47 90 42 Fax (SA): +27 - (0)8 65 16 27 82 Fax (D) : +49 - (0)3 21 21 25 22 44 Fax (FR): +33 - (0)9 58 10 27 44 email: Rainer_at_krugs.de Skype: RMkrug

Version: GnuPG v1.4.10 (GNU/Linux)

Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0m1dMACgkQoYgNqgF2egoQbACcCB3iFQ6SKYfL4KVX8AMAN9Gp
1awAn0Z+8KXnOmwCLu61gihc8xZIT++j

=O+xA

**-----END PGP SIGNATURE-----
**

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 07 Jan 2011 - 09:02:55 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Tue 11 Jan 2011 - 12:50:06 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*