Re: [R] algorithm help

From: Rainer M Krug <r.m.krug_at_gmail.com>
Date: Fri, 07 Jan 2011 09:58:59 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/06/2011 11:57 PM, (Ted Harding) wrote:

> On 06-Jan-11 22:16:38, array chip wrote:

>> Hi, I am seeking help on designing an algorithm to identify the
>> locations of stretches of 1s in a vector of 0s and 1s. Below is
>> an simple example:
>>
>>> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
>> ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
>>
>>> dat
>> a b
>> 1 0 4
>> 2 0 12
>> 3 1 13
>> 4 1 16
>> 5 1 18
>> 6 1 20
>> 7 0 28
>> 8 0 30
>> 9 1 34
>> 10 1 46
>> 11 0 47
>> 12 1 49
>> 13 1 61
>> 14 1 73
>> 15 1 77
>> 16 0 84
>> 17 0 87
>> 18 0 90
>> 19 0 95
>> 20 1 97
>>
>> In this dataset, "b" is sorted and denotes the location for each
>> number in "a".
>> So I would like to find the starting & ending locations for each
>> stretch of 1s within "a", also counting the number of 1s in each
>> stretch as well.
>> Hope the results from the algorithm would be:
>>
>> stretch start end No.of.1s
>> 1 13 20 4
>> 2 34 46 2
>> 3 49 77 4
>> 4 97 97 1
>>
>> I can imagine using for loops can do the job, but I feel it's not a
>> clever way to do this. Is there an efficient algorithm that can do
>> this fast?
>>
>> Thanks for any suggestions.
>> John
> 
> The basic information you need can be got using rle() ("run length
> encoding"). See '?rle'. In your example:
> 
>   rle(dat$a)
>   # Run Length Encoding
>   #   lengths: int [1:8] 2 4 2 2 1 4 4 1
>   #   values : num [1:8] 0 1 0 1 0 1 0 1
>   ## Note: F -> 0, T -> 1
> 
> The following has a somewhat twisted logic at the end, and may
> be flawed, but you can probably adapt it!
> 
>   L <- rle(dat$a)$lengths
>   V <- rle(dat$a)$values
>   pos <- c(1,cumsum(L))
>   V1 <- c(-1,V)
>   1+pos[V1==0]
>   # [1]  3  9 12 20
>   ## Positions in the series dat$a where each run of "T" (i.e. 1)
>   ##   starts

A different approach would be to use the diff() function:

Where

> diff(dat$a)
 [1] 0 1 0 0 0 -1 0 1 0 -1 1 0 0 0 -1 0 0 0 1

is not equal 0, the value is changing from 0 to 1 or one to 0. The indices of the first new value can be found by:

> which(diff(dat$a)!=0) + 1
[1] 3 7 9 11 12 16 20

where it is changing from 0 to 1 is at

> which(diff(dat$a)==1) + 1
[1] 3 9 12 20

where it is changing from 1 to 0 is at

> which(diff(dat$a)==-1) + 1
[1] 7 11 16

By taking into consideration if the first value and the last values are 0 or 1, you can calculate the length.

Cheers,

Rainer

> 
> Hoping this helps,
> Ted.
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding_at_wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 06-Jan-11                                       Time: 22:57:44
> ------------------------------ XFMail ------------------------------
> 
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Centre of Excellence for Invasion Biology Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:        +33 - (0)9 53 10 27 44
Cell:       +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:      Rainer_at_krugs.de

Skype:      RMkrug

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0m1dMACgkQoYgNqgF2egoQbACcCB3iFQ6SKYfL4KVX8AMAN9Gp 1awAn0Z+8KXnOmwCLu61gihc8xZIT++j
=O+xA
-----END PGP SIGNATURE-----



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 07 Jan 2011 - 09:02:55 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 11 Jan 2011 - 12:50:06 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive