Re: [R] How to speed up grouping time series, help please

From: Den Alpin <den.alpin_at_gmail.com>
Date: Thu, 07 Apr 2011 12:53:05 +0200

I found an implementation that is faster (by an order of magnitude in my tests) than the one using xts, split and merge (from Joshua). Below I report the two fastest solutions, with code to generate a test case; some work is still needed on column order and naming. The test case is larger than in my previous post, to get more realistic timings.

Any comments or ideas to further speed up the creation of multivariate time series of class xts or timeSeries from a data.frame like the one below are welcome.

Best regards,
Den

An example data.frame (the code below generates it with N = 5 and K = 3):

   ID                DATE     VALUE
14  3 2000-01-01 00:00:03 0.5726334
4   1 2000-01-01 00:00:03 0.8830174
1   1 2000-01-01 00:00:00 0.2875775
15  3 2000-01-01 00:00:04 0.1029247
11  3 2000-01-01 00:00:00 0.9568333
9   2 2000-01-01 00:00:03 0.5514350
7   2 2000-01-01 00:00:01 0.5281055
6   2 2000-01-01 00:00:00 0.0455565
12  3 2000-01-01 00:00:01 0.4533342
8   2 2000-01-01 00:00:02 0.8924190
3   1 2000-01-01 00:00:02 0.4089769
13  3 2000-01-01 00:00:02 0.6775706

What I want is a timeSeries or xts object like this, with one row per timestamp and one column per ID:

                           1         2         3
2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
2000-01-01 00:00:01        NA 0.5281055 0.4533342
2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
2000-01-01 00:00:04        NA        NA 0.1029247
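One further idea, sketched here and not benchmarked: build the wide layout directly with two-column matrix indexing, avoiding both reshape() and merge(). The helper name pivotWide is hypothetical, and the sketch assumes DATE holds "%Y-%m-%d %H:%M:%S" strings (so they sort chronologically) and that each (ID, DATE) pair occurs at most once; with duplicates, the last value wins.

```r
# Hypothetical helper (not from the original post): pivot a long
# data.frame with columns ID, DATE, VALUE into a wide numeric matrix.
# Rows are the sorted unique DATE strings, columns the sorted unique IDs;
# any (DATE, ID) pair missing from the input stays NA.
pivotWide <- function(x) {
  dates <- sort(unique(x$DATE))   # ISO-formatted strings sort chronologically
  ids <- sort(unique(x$ID))
  m <- matrix(NA_real_, nrow = length(dates), ncol = length(ids),
              dimnames = list(dates, as.character(ids)))
  # two-column (row, col) index matrix fills every cell in one vectorised step
  m[cbind(match(x$DATE, dates), match(x$ID, ids))] <- x$VALUE
  m
}

# the small example from above, rows in the same shuffled order
ex <- data.frame(
  ID = c(3, 1, 1, 3, 3, 2, 2, 2, 3, 2, 1, 3),
  DATE = paste("2000-01-01", c("00:00:03", "00:00:03", "00:00:00", "00:00:04",
    "00:00:00", "00:00:03", "00:00:01", "00:00:00", "00:00:01", "00:00:02",
    "00:00:02", "00:00:02")),
  VALUE = c(0.5726334, 0.8830174, 0.2875775, 0.1029247, 0.9568333, 0.5514350,
    0.5281055, 0.0455565, 0.4533342, 0.8924190, 0.4089769, 0.6775706),
  stringsAsFactors = FALSE)
wide <- pivotWide(ex)
print(wide)
```

The matrix can then be wrapped as xts(wide, as.POSIXct(rownames(wide), tz = "GMT")). Because the columns are built from the sorted unique IDs, they come out in numeric ID order.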

# CODE:
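# note: R >= 3.6.0 changed the default sample() algorithm; calling
# RNGkind(sample.kind = "Rounding") before set.seed() may be needed to
# reproduce the exact rows shown above
set.seed(123)
# set N to 5 to reproduce above data.frame
N <- 1000
# set K to 3 to reproduce above data.frame
K <- 10
# long format: K series of N one-second observations each; format() keeps
# the time part even at midnight, so every DATE string has the same layout
X <- data.frame(
  ID = rep(1:K, each = N),
  DATE = format(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), K),
                "%Y-%m-%d %H:%M:%S"),
  VALUE = runif(N*K), stringsAsFactors = FALSE)
# shuffle the rows, then drop 20% of them at random to create gaps
X <- X[sample(1:(N*K), N*K),]
X <- X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),]
str(X)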

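xtsSplit <- function(x)
{
  library(xts)
  # one xts object indexed by time, keeping ID and VALUE as columns
  x <- xts(x[,c("ID","VALUE")], as.POSIXct(x[,"DATE"], tz = "GMT"))
  # one series per ID, then merge() them on the union of timestamps
  return(do.call(merge, split(x$VALUE,x$ID)))
}
# median user time (seconds) over 50 runs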
xtsSplitTime <- replicate(50,
  system.time(xtsSplit(X))[[1]])
median(xtsSplitTime)
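The split/merge idea relies on merge.xts aligning series on the union of their timestamps (an outer join by default) and filling gaps with NA. The same logic can be illustrated in base R with data.frames; the helper name splitMerge is hypothetical and this is only a sketch of the join, not the xts code above.

```r
# Hypothetical base-R illustration of split/merge: one data.frame per ID,
# then a full outer join on DATE, repeated pairwise with Reduce().
splitMerge <- function(x) {
  pieces <- split(x[, c("DATE", "VALUE")], x$ID)
  # rename each VALUE column after its ID so the joined columns stay distinct
  pieces <- Map(function(p, id) setNames(p, c("DATE", id)), pieces, names(pieces))
  # all = TRUE keeps timestamps missing from either side (outer join)
  wide <- Reduce(function(a, b) merge(a, b, by = "DATE", all = TRUE), pieces)
  # explicit chronological order (ISO date strings sort correctly)
  wide <- wide[order(wide$DATE), ]
  rownames(wide) <- NULL
  wide
}

ex <- data.frame(ID = c(2, 1, 1, 2),
  DATE = c("2000-01-01 00:00:01", "2000-01-01 00:00:00",
           "2000-01-01 00:00:02", "2000-01-01 00:00:00"),
  VALUE = c(0.5, 0.1, 0.3, 0.4), stringsAsFactors = FALSE)
res <- splitMerge(ex)
print(res)
```

Each pairwise merge rescans the accumulated result, which is one plausible reason why a single reshaping step can beat repeated merging when K grows.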

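xtsReshape <- function(x)
{
  library(xts)
  # long -> wide: one row per DATE, one VALUE.<ID> column per ID,
  # with columns in the order in which each ID first appears in x
  x <- reshape(x, idvar = "DATE", timevar = "ID", direction = "wide")
  # first column is DATE; the remaining columns become the series
  x <- xts(x[,-1], as.POSIXct(x[,1], tz = "GMT"))
  return(x)
}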
xtsReshapeTime <- replicate(50,
  system.time(xtsReshape(X))[[1]])
median(xtsReshapeTime)
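For the column order and naming that are still open: reshape() names the wide columns "VALUE.<ID>" (its default sep = ".") and orders them by first appearance of each ID in the shuffled data. A possible tidy-up step, sketched with a hypothetical helper tidyWideColumns, sorts them numerically and strips the prefix.

```r
# Hypothetical helper: reorder and rename the columns produced by
# reshape(x, idvar = "DATE", timevar = "ID", direction = "wide").
# Assumes the value column is called VALUE and IDs are numeric.
tidyWideColumns <- function(w) {
  valueCols <- grep("^VALUE\\.", names(w), value = TRUE)
  ids <- sub("^VALUE\\.", "", valueCols)
  ord <- order(as.numeric(ids))        # numeric order, so "10" sorts after "3"
  out <- w[, c("DATE", valueCols[ord])]
  names(out) <- c("DATE", ids[ord])    # plain ID labels, as in the desired output
  out
}

long <- data.frame(ID = c(3, 1, 2, 10, 1),
  DATE = c("2000-01-01 00:00:00", "2000-01-01 00:00:00", "2000-01-01 00:00:01",
           "2000-01-01 00:00:00", "2000-01-01 00:00:01"),
  VALUE = c(0.3, 0.1, 0.2, 1.0, 0.4), stringsAsFactors = FALSE)
w <- reshape(long, idvar = "DATE", timevar = "ID", direction = "wide")
w <- tidyWideColumns(w)
print(w)
```

The tidied frame can then go through the same xts(w[, -1], as.POSIXct(w$DATE, tz = "GMT")) step used in xtsReshape.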



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 07 Apr 2011 - 10:56:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 07 Apr 2011 - 13:30:28 GMT.
