Re: [R] Identifying peaks (or offsets) in a time series

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Tue 25 Jul 2006 - 21:21:29 EST


On 25-Jul-06 Ulrik Stervbo wrote:

> Could Petr Pikal's peaks function
> (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/33097.html)
> be of any use?
> 
> Ulrik
> 
> On 7/24/06, Tauber, Dr E. <et22@leicester.ac.uk> wrote:

>> Dear R-users,
>>
>> We are monitoring the activity of animals during a few days
>> period. The data from each animal (crossing of infra-red beam)
>> are collected as a time series (in 30 min bins). An example is
>> attached below.
>>
>> y <-
>> c(0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,28,27,46,76,77,60,
>> 19,35,55,59,48,87,20,38,82,62,60,85,105,69,109,102,100,101,
>> 116,126,119,63,27,25,15,8,0,0,3,0,0,3,0,0,5,3,0,0,6,1,29,
>> 73,56,56,57,92,34,51,30,76,30,38,47,87,22,0,68,76,94,101,
>> 119,114,115,111,116,134,125,76,23,19,30,2,8,0,3,0,0,0,7,
>> 0,0,0,0,4,0,7,0,21,4,49,51,56,43,55,55,34,48,16,0,61,22,
>> 94,63,102,47,100,96,113,93,109,123,120,124,115,94,96,76,36,3,
>> 0,0,0,0,0,0,2,5,0,0,0,0,2,10,33,34,15,0,47,22,20,33,52,4,
>> 41,45,0,21,18,38,32,21,78,82,72,102,103,118,116,118,114,82,
>> 18,5,21,4,0,14,0,5,2,0,0,2,2,0,0,3,0,2,7,16,13,17,50,0,
>> 48,16,19,34,39,33,3,67,0,68,34,65,84,61,100,85,108,124,
>> 141,139,134,96,54,91,54,12,0,0,0,0,0,0,0,0,0,0,0,4,11,
>> 0,19,27,15,12,20)
>>
>> We would like to have an automatic way, using R, to identify
>> the time point of offset of each bout of activity (i.e. when
>> activity goes down to a minimum value, for a defined duration).
>> In the example above the offset times (the element number)
>> should be approximately: 53, 99, 146, 191, 239, 283, 330
>> (the last bout of activity can be ignores).
>>
>> Any help or advice will be greatly appreciated,
>>
>> Many thanks, Eran

First of all, I find only 255 values in your 'y' above, so I'm not sure whether your "53, 99, 146, 191, 239, 283, 330" is referring to the same series. So this may not be helpful in interpreting what you mean by "goes down to a minimum value for a defined duration".

To interpret that clearly, one needs a specified value for "minimum value" and a specified value for "defined duration".

You can use R to see what happens in the series by methods such as the following.

You can plot the series (with 24-hour indicators) with

  t0<-c(  0, 48, 96,144,192,240)
  y0<-c(  0,  0,  0,  0,  0,  0)
  y1<-c(140,140,140,140,140,140)

  plot(y)
  lines(y)
  for(i in (1:6)){lines(c(t0[i],t0[i]),c(y0[i],y1[i]),col="green")}

Adopting for illustration "minimum value" = 0,

  Z <- 1*(y==0) ## a series of 0 and 1, Z=1 whenever y=0

  Runs<-rle(Z) ## "run length encoding" function -- see ?rle

  R.L<-Runs$lengths
  R.V<-Runs$values
  R.L[R.V==1]   ## gives the lengths of runs of "y=0"
## [1]  5 12  2  2  2  2  1  1  3  4

##[11] 1 1 1 6 4 1 1 1 1 2
##[21] 2 1 1 1 11 1

So you can see that available values for "defined duration" in the series are:

  1 (13 times), 2 (6), 3 (1), 4 (2), 5 (1), 6 (1), 11 (1), 12 (1)

Next, there is clearly an overall 48-point periodicity (which of course corresponds to a 48*30min = 24-hour periodicity), and there are 5 complete 48-point segments in your series:

  1:48, 49:96, 97:144, 145:192, 192:240

with a trailing incomplete segment of 15 points. Overall (see the plot) there are 6 stretches of "very low" activity every 24 hours, so from the above you could think you can obtain these by choosing "defined duration" = 4 (any run of 0 of length at least 4 is a "period of very low activity", since there are 2 (=4) + 1 (=5) + 1 (=6) + 1 (=11) + 1 (=12) = 6 of these. But, as you can see from the plot, this will pick out wrong segments!

This is because the second period of "very low" activity is 0's broken by relatively frequent positive values.

So you need to raise your threshold of "activity" a bit. Now you're venturing into "groping" territory -- since the nice clean "minimum value" = 0 doesn't work, you have to find one which does.

If you go up to "minimum value = 5", you get:

  Z <- 1*(y<=5); Runs<-rle(Z); R.L<-Runs$lengths; R.V<-Runs$values   R.L[R.V==1]
## [1] 19 12 1 1 1 5 6 1 1 1
##[11] 14 1 1 1 1 2 12 1 1 1 12 1

which better captures the 6 periods, except that the third one is split (5,6). Only when you go up to 7 do you finally get

  Z <- 1*(y<=7); Runs<-rle(Z); R.L<-Runs$lengths; R.V<-Runs$values   R.L[R.V==1]
## [1] 19 14 1 1 14 1 1 14 1 1 1 1 2 13 1 1 1 12 1

and now you can find your 6 "low activity" periods where you have 19,14,14,14,13,12.

So, in terms of "activity goes down to a minimum value for a defined period", you need (for this data set) "minimum value" to be at least 7, and "defined period" could be anything at most 12 but greater than 2; in this case therefore, for instance, you can choose say 8 for the latter.

To really locate these periods in the original series, you now need (say)

  R.L

## [1] 19 32 14 15  1 15  1  1 14  1
##[11]  1  9  1 19 14  4  1  5  1  2
##[21]  1 16  1  1  2  1 13  4  1  6
##[31]  1  1  1 17 12  1  1  5

  ix<-(R.V==1)&(R.L>8) ##(Activity<="min val"=7)&(runlength>8)   1*ix

## [1]  1  0  1  0  0  0  0  0  1  0
##[11]  0  0  0  0  1  0  0  0  0  0
##[21]  0  0  0  0  0  0  1  0  0  0
##[31]  0  0  0  0  1  0  0  0

  (1-ix)

## [1]  0  1  0  1  1  1  1  1  0  1
##[11]  1  1  1  1  0  1  1  1  1  1
##[21]  1  1  1  1  1  1  0  1  1  1
##[31]  1  1  1  1  0  1  1  1

So "inactivity" is defined by 1*ix==1, and "activity" by (1-ix)==1.

You can now match "inactivity" to the original series using the "rep" function:

  inact<-rep(1*ix,R.L)

and check this by adding lines to the plot:

  lines(inact,col="blue")

(where the high value "100" corresponds to inactivity). Now anything you want to find out about the periods of "inactivity" (as defined above) ccan be determined from the sequence "inact".

However, it is clear that this approach (which is primarily motivated by trying to clarify your statement of your aims in terms of the data you supplied) is clearly unlikely to be useful when you have many such datasets, since you will have to go into "groping" mode for each dataset, and are likely to get many different answers over your several datasets.

At this point, I think, it is becoming clear that you need to think about what you *really" mean by "actvity", and about how to model it in terms which can be carried over across datasets.

There is a fundamental point about how your data represent real activity -- to what extent is there scope for the animal to be active without breaking the light beam? -- but I'll leave that one aside.

Taking the 24 hours (48*30min) to run from midnight at the start of the series, there is an apparent cycle of activity which, overall, consists of inactivity until somewhere in the (approximate) time range 06:30-09:00, then an increasing trend until a maximum which occurs between 18:00 and 23:00, followed by a rapid decline to a very low level which takes place over some 2-4 hours.

You can get a numerical overview of this variation from day to day by using commands like

  y[1:48]
  y[(1:48)+48*1]
  y[(1:48)+48*2]
  y[(1:48)+48*3]
  y[(1:48)+48*4]
  y[(1:48)+48*5]

It is tempting to try to model this rise and fall by a mathematical curve. To the extent that this well represents the initial stage of the rise, and also the final stage of the decline, you could then solve that curve for the point at which you decide that activity starts and ends.

However, it is apparent from the plot that the rising phase, at least, is not simple. In most cases there is a "midday peak" over approx 10:00-14:00 (of variable duration) followed by a brief "siesta", then the sharp rise to the late night peak.

Also, it seems from this dataset that the "day" of this animal is a bit shorter than 24 hours, since the major peak shifts steadily from 23:00 on the first day to 19:00 on the 5th day, i.e. 1 hour per day. In addition, the magnitude of the "midday peak" decreases steadily over the days, while its distinctiveness tends to increase. Apparently, there is some progressive factor in operation which influences activity.

So (and I can't comment on whether such things are intrinsic to the behaviour of your animal or have been brought about by your experimental setup) a detailed modelling of the activity cycle would need to bring such comsiderations into the model. How to do this, only you can say!

Hoping that this helps!
Ted.



E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 25-Jul-06                                       Time: 12:21:24
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Jul 25 21:25:15 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 25 Jul 2006 - 22:24:14 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.