[R] Up- or downsampling time series in R

From: Brandt, T. (Tobias) <TobiasBr_at_Taquanta.com>
Date: Thu 26 Oct 2006 - 15:46:12 GMT


Hi  

I have data that is sampled (in time) with a certain frequency and I would like to express this time series as a time series of a higher (or lower) frequency with the newly added time points being filled in with NA, 0, or perhaps interpolated. My data might be regularly or irregularly spaced. For example, I might have quarterly data that I would like to handle as a monthly time series with NAs filled in for the missing months.  
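For the quarterly-to-monthly case with regularly spaced data, here is a minimal base-R sketch that does not need zoo. It assumes the input is a regular `ts` whose frequency divides the target frequency evenly. `upsample_ts` is a hypothetical helper, not a function from any package. New time points get `fill` (NA or 0), and interpolation is done afterwards with `approx()`.

```r
# Hypothetical helper (not from any package): upsample a regular ts by
# inserting `fill` between the original observations.
upsample_ts <- function(x, new_frequency, fill = NA) {
  k <- new_frequency / frequency(x)          # new points per old interval
  if (k < 1 || abs(k - round(k)) > 1e-8)
    stop("new_frequency must be an integer multiple of frequency(x)")
  k <- round(k)
  n <- length(x)
  out <- rep(fill, (n - 1) * k + 1)          # grid from first to last observation
  out[seq(1, by = k, length.out = n)] <- as.numeric(x)
  ts(out, start = time(x)[1], frequency = new_frequency)
}

q <- ts(c(10, 12, 11, 15), start = c(2006, 1), frequency = 4)  # made-up quarterly data
m_na   <- upsample_ts(q, 12)               # NA in the inserted months
m_zero <- upsample_ts(q, 12, fill = 0)     # 0 in the inserted months
# Linear interpolation across the NA months (approx() ignores the NA pairs).
m_lin  <- ts(approx(as.numeric(time(m_na)), as.numeric(m_na),
                    xout = as.numeric(time(m_na)))$y,
             start = start(m_na), frequency = 12)
```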

RSiteSearch("upsample") gave one link to a function in the "waveslim" package that I'm not familiar with. It seems to me that this would be a fairly common time series task and thus am hoping to find something in the more common time series packages/classes such as ts, zoo, tseries, etc...  

Here is some example code.  

If I am "lucky" enough that my data is irregularly spaced, then a combination of zoo and ts already accomplishes this task.  

> require(zoo)

[1] TRUE
> dt <- sample(c(1,3,9), 20, replace=TRUE)
> t <- zoo(dt, as.yearmon(Sys.Date()) + cumsum(dt)/12)
> t
Jan 2007 Feb 2007 Nov 2007 Feb 2008 Nov 2008 Dec 2008 Mar 2009 Apr 2009 Jul 2009 Aug 2009
       3        1        9        3        9        1        3        1        3        1
Nov 2009 Feb 2010 Nov 2010 Aug 2011 May 2012 Jun 2012 Jul 2012 Oct 2012 Jul 2013 Aug 2013
       3        3        9        9        9        1        1        3        9        1

> as.ts(t)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2007   3   1  NA  NA  NA  NA  NA  NA  NA  NA   9  NA
2008  NA   3  NA  NA  NA  NA  NA  NA  NA  NA   9   1
2009  NA  NA   3   1  NA  NA   3   1  NA  NA   3  NA
2010  NA   3  NA  NA  NA  NA  NA  NA  NA  NA   9  NA
2011  NA  NA  NA  NA  NA  NA  NA   9  NA  NA  NA  NA
2012  NA  NA  NA  NA   9   1   1  NA  NA   3  NA  NA
2013  NA  NA  NA  NA  NA  NA   9   1
> plot(t)
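What `as.ts()` does with the irregular yearmon series can be mimicked in base R. This also shows that the placement works with ordinary `[]` indexing: convert each time stamp to an integer month count, build a complete monthly grid, and `match()` the observations into it. The sketch assumes the times are fractional years on a monthly scale (year + (month - 1)/12, which is how `yearmon` stores them) and that no month occurs twice. `to_monthly_grid` is a hypothetical name.

```r
# Hypothetical helper: spread irregularly spaced monthly observations onto
# a regular monthly ts, leaving NA in months without an observation.
to_monthly_grid <- function(times, values) {
  months <- round(times * 12)               # integer month counts; rounding absorbs FP error
  grid   <- seq(min(months), max(months))   # every month in the observed span
  out    <- rep(NA_real_, length(grid))
  out[match(months, grid)] <- values        # ordinary [] assignment
  ts(out, start = min(months) / 12, frequency = 12)
}

gaps  <- c(3, 1, 9, 3)                      # months since the previous point
times <- 2006 + 9/12 + cumsum(gaps) / 12    # Jan 2007, Feb 2007, Nov 2007, Feb 2008
g <- to_monthly_grid(times, gaps)
g  # Jan 2007 .. Feb 2008, NA except in the four observed months
```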
 

However, if the data happens to be regularly spaced, upsampling it is not as straightforward.  

> t2 <- zoo(sample(1:3, 20, replace=TRUE), as.yearmon(seq(2000, by=0.5, length=20)))
> t2
Jan 2000 Jul 2000 Jan 2001 Jul 2001 Jan 2002 Jul 2002 Jan 2003 Jul 2003 Jan 2004 Jul 2004
       3        3        2        2        1        3        1        2        3        3
Jan 2005 Jul 2005 Jan 2006 Jul 2006 Jan 2007 Jul 2007 Jan 2008 Jul 2008 Jan 2009 Jul 2009
       2        2        3        3        2        3        3        2        1        3

> (t2.ts <- as.ts(t2))

Time Series:
Start = c(2000, 1)
End = c(2009, 2)
Frequency = 2
 [1] 3 3 2 2 1 3 1 2 3 3 2 2 3 3 2 3 3 2 1 3
> plot(t2)
>
 

I would have expected this to be as simple as changing the frequency attribute of t2.ts to 12, but I could not find out how to do this, or whether it is possible at all.  
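Changing the frequency alone cannot work, because re-creating the series with `ts(..., frequency = 12)` only relabels the existing observations: 20 half-yearly values become 20 consecutive months. Upsampling instead means keeping the time span and spacing the values 12/2 = 6 months apart. Below is a base-R sketch, using fixed values in place of the random `t2`.

```r
# Fixed half-yearly values standing in for the random t2 (Jan 2000 .. Jul 2009).
x <- ts(c(3, 3, 2, 2, 1, 3, 1, 2, 3, 3, 2, 2, 3, 3, 2, 3, 3, 2, 1, 3),
        start = c(2000, 1), frequency = 2)

# Relabelling: the same 20 values, now read as Jan 2000 .. Aug 2001.
relabelled <- ts(as.numeric(x), start = time(x)[1], frequency = 12)

# Upsampling: same span, observation i lands on month 1 + (i - 1) * k.
k <- round(12 / frequency(x))                       # 6 months per half-year
monthly <- ts(rep(NA_real_, (length(x) - 1) * k + 1),
              start = time(x)[1], frequency = 12)   # Jan 2000 .. Jul 2009
monthly[seq(1, by = k, length.out = length(x))] <- as.numeric(x)
```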

So far, the only way around this that I have found is doing it "manually" in the following way:  

> t2.monthly <- zoo(NA, as.yearmon(seq(from=2000, to=2009.5, by=1/12)))
> window(t2.monthly, as.numeric(time(t2))) <- as.numeric(t2) # can this be done using "[]" indexing?
> t2.monthly
Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun 2000 Jul 2000 Aug 2000 Sep 2000 Oct 2000
       3       NA       NA       NA       NA       NA        3       NA       NA       NA
Nov 2000 Dec 2000 Jan 2001 Feb 2001 Mar 2001 Apr 2001 May 2001 Jun 2001 Jul 2001 Aug 2001
      NA       NA        2       NA       NA       NA       NA       NA        2       NA
Sep 2001 Oct 2001 Nov 2001 Dec 2001 Jan 2002 Feb 2002 Mar 2002 Apr 2002 May 2002 Jun 2002
      NA       NA       NA       NA        1       NA       NA       NA       NA       NA
Jul 2002 Aug 2002 Sep 2002 Oct 2002 Nov 2002 Dec 2002 Jan 2003 Feb 2003 Mar 2003 Apr 2003
       3       NA       NA       NA       NA       NA        1       NA       NA       NA
May 2003 Jun 2003 Jul 2003 Aug 2003 Sep 2003 Oct 2003 Nov 2003 Dec 2003 Jan 2004 Feb 2004
      NA       NA        2       NA       NA       NA       NA       NA        3       NA
Mar 2004 Apr 2004 May 2004 Jun 2004 Jul 2004 Aug 2004 Sep 2004 Oct 2004 Nov 2004 Dec 2004
      NA       NA       NA       NA        3       NA       NA       NA       NA       NA
Jan 2005 Feb 2005 Mar 2005 Apr 2005 May 2005 Jun 2005 Jul 2005 Aug 2005 Sep 2005 Oct 2005
       2       NA       NA       NA       NA       NA        2       NA       NA       NA
Nov 2005 Dec 2005 Jan 2006 Feb 2006 Mar 2006 Apr 2006 May 2006 Jun 2006 Jul 2006 Aug 2006
      NA       NA        3       NA       NA       NA       NA       NA        3       NA
Sep 2006 Oct 2006 Nov 2006 Dec 2006 Jan 2007 Feb 2007 Mar 2007 Apr 2007 May 2007 Jun 2007
      NA       NA       NA       NA        2       NA       NA       NA       NA       NA
Jul 2007 Aug 2007 Sep 2007 Oct 2007 Nov 2007 Dec 2007 Jan 2008 Feb 2008 Mar 2008 Apr 2008
       3       NA       NA       NA       NA       NA        3       NA       NA       NA
May 2008 Jun 2008 Jul 2008 Aug 2008 Sep 2008 Oct 2008 Nov 2008 Dec 2008 Jan 2009 Feb 2009
      NA       NA        2       NA       NA       NA       NA       NA        1       NA
Mar 2009 Apr 2009 May 2009 Jun 2009 Jul 2009
      NA       NA       NA       NA        3

> points(t2.monthly, type="p", col="blue")
> lines(na.locf(t2.monthly), col="blue") # as an example of why I might want to do this
>
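`na.locf()` is zoo's last-observation-carried-forward fill. For comparison, here is a base-R sketch of the same idea, plus linear interpolation with `approx()`. `locf` and `linear_fill` are hypothetical helpers that work on plain vectors, so you would wrap the result in `ts()` or `zoo()` to restore the time index. `locf` leaves any NAs before the first observation in place.

```r
# Hypothetical base-R stand-in for last-observation-carried-forward.
locf <- function(v) {
  obs <- v[!is.na(v)]
  idx <- cumsum(!is.na(v))   # how many observations have been seen so far
  idx[idx == 0] <- NA        # nothing observed yet: stays NA
  obs[idx]
}

# Hypothetical helper: linear interpolation between observed points;
# approx()'s default rule = 1 leaves positions outside the observed range NA.
linear_fill <- function(v) {
  i  <- seq_along(v)
  ok <- !is.na(v)
  approx(i[ok], v[ok], xout = i)$y
}

v <- c(NA, 3, NA, NA, 2, NA)
locf(v)         # NA 3 3 3 2 2
linear_fill(v)  # NA, 3, 8/3, 7/3, 2, NA
```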
 

Similarly, it would be nice to be able to downsample a time series conveniently, keeping only every Nth point, or the sum or the average of the previous N points, etc. I can see how that particular case could probably be handled fairly easily with rapply and a subsetting operation, but a convenient wrapper for this would still be nice.  
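Base R's `ts` methods appear to cover these cases already. `window()` with a lower `frequency` thins a series to every Nth point, `aggregate()` with `nfrequency` sums or averages non-overlapping blocks, and `stats::filter()` with `sides = 1` gives a trailing mean of the previous N points. Here is a sketch on made-up monthly data; the object names are illustrative only.

```r
m <- ts(1:24, start = c(2000, 1), frequency = 12)     # two years of made-up monthly data

every_3rd <- window(m, frequency = 4)                  # keep Jan, Apr, Jul, Oct
q_sum     <- aggregate(m, nfrequency = 4, FUN = sum)   # quarterly totals
q_mean    <- aggregate(m, nfrequency = 4, FUN = mean)  # quarterly averages

# Trailing mean of the current and two previous months; first two are NA.
trail_mean <- stats::filter(m, rep(1/3, 3), sides = 1)
```

`aggregate()` uses non-overlapping blocks (one value per quarter), whereas `filter()` produces an overlapping moving window with one value per month.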

Any help would be appreciated. Thanks in advance.  

Tobias




R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri Oct 27 02:22:28 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 26 Oct 2006 - 17:30:17 GMT.
