Re: [R] multiple comparisons of time series data

From: Spencer Graves
Date: Mon 29 May 2006 - 06:45:23 EST

PAIRWISE KOLMOGOROV-SMIRNOV: I don't know, but it looks like you could just type "pairwise.t.test" at a command prompt to list its source, copy the code into an R script file, and create a function "pairwise.ks.test" just by replacing the call to "t.test" with a call to "ks.test". Try it. If you have trouble making it work, submit a post on that.
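
A minimal sketch of that idea follows. It is not a function that ships with R: "pairwise.ks.test" is a made-up name, the default adjustment method is an arbitrary choice, and the data are simulated purely for illustration. It mirrors the structure of "pairwise.t.test" by handing a pairwise comparison function to "pairwise.table".

```r
# Sketch of a pairwise KS test, modeled on stats::pairwise.t.test.
# x: numeric vector of observations; g: grouping vector or factor.
# The function name and the default method are assumptions for illustration.
pairwise.ks.test <- function(x, g, p.adjust.method = "holm", ...) {
  g <- factor(g)
  # p-value of the two-sample KS test for group levels i and j
  compare.levels <- function(i, j) {
    xi <- x[as.integer(g) == i]
    xj <- x[as.integer(g) == j]
    ks.test(xi, xj, ...)$p.value
  }
  # pairwise.table() fills the lower triangle and applies p.adjust()
  pairwise.table(compare.levels, levels(g), p.adjust.method)
}

# Hypothetical illustration data: three independent samples, one shifted
set.seed(1)
x <- c(rnorm(50), rnorm(50), rnorm(50, mean = 3))
g <- rep(c("a", "b", "c"), each = 50)
p <- pairwise.ks.test(x, g, p.adjust.method = "bonferroni")
print(p)
```

The result is a lower-triangular matrix of adjusted p-values, one per pair of groups. The example data are independent draws, which is exactly what a time series is not.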

I would NOT do this, however, because "ks.test" assumes samples of INDEPENDENT observations. If you've got time series, I would expect the assumption of independence to be violated, and I would not believe the results of a KS test. If you want to try what I just suggested, please also try it with multiple time series WITHOUT "varying our representation of the stream within the model", preferably several times.
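
One way to see the problem is a small simulation, sketched here under stated assumptions: the AR(1) coefficient of 0.9, the series length, and the number of replications are all arbitrary stand-ins and have nothing to do with the HSPF output. Every pair of series comes from the same process, so a trustworthy test should reject in roughly 5% of the pairs.

```r
set.seed(123)
n_sim <- 200   # number of simulated pairs (arbitrary)
n_obs <- 200   # length of each series (arbitrary)

# Each pair is drawn independently from the SAME AR(1) process, so the
# null hypothesis of identical marginal distributions is true.
p_values <- replicate(n_sim, {
  a <- as.numeric(arima.sim(list(ar = 0.9), n = n_obs))
  b <- as.numeric(arima.sim(list(ar = 0.9), n = n_obs))
  ks.test(a, b)$p.value
})

# With independent observations this fraction should be near 0.05
reject_rate <- mean(p_values < 0.05)
print(reject_rate)
```

If the printed rejection rate is far above 0.05, the serial correlation is making the KS test reject much more often than its nominal level.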

COMPARING MULTIPLE TIME SERIES: If I had k different time series to compare, I might proceed as follows:

  1. Make normal probability plots using, e.g., qqnorm. If the observations did NOT look normal, I'd consider some transformation. If the numbers were all positive, I might consider using the "boxcox" function in library(MASS) to help select one. However, I wouldn't completely believe the results, because this also assumes the observations are independent, and I know they're not.
  2. Try to fit some traditional time series model as described, e.g., in the chapter on time series in Venables and Ripley (2002) Modern Applied Statistics with S (Springer). There are better books on time series, but this is probably the first book I would recommend to anyone using R, and this chapter would be a reasonable start. I'd play with this until I seemed to get sensible fits for nearly all series with the same model and with residuals that looked fairly though not totally (a) white by the Box-Ljung criteria, and (b) normal in normal probability plots. If I saw consistent non-normal behavior in the residuals, it would indicate a problem bigger than I can handle in a brief email like this.
  3. With k different time series, most of the results of "2" could be summarized in k sets of estimated regression coefficients, all for the same model, with estimated standard errors plus whitened residuals. If you had m parameters, each pair of time series could then be summarized into m z-scores = (b.i-b.j)/sqrt(var.b.i+var.b.j), which could then be further converted into m p.values. You would then add the p.value from ks.test, making (m+1) p.values for each of the k*(k-1)/2 = 10 pairs of series with k = 5 series. I'd then feed these 10*(m+1) p.values into "p.adjust" to get an answer. (Note: "pairwise.t.test" calls "pairwise.table", which further calls "p.adjust". I didn't know any of this before I read your post.) I might experiment with the different "methods" for p.adjust, and if I got different answers from the different methods, I might worry about which to believe. The Bonferroni is the simplest and the most widely known and understood, but also perhaps the most conservative. I might tend to believe some of the others more, but if I got different answers, I'd suspect that the case was marginal, and I might want to generate other sets of simulations and try those.
  4. There are other facilities in R for multiple comparisons, e.g., in the multcomp and pgirmess packages. Before I actually undertook steps 1, 2, and 3, above, I might review these packages to familiarize myself more with their contents.
  5. Virginia Tech has an excellent Statistics department with a consulting center. You might try them.
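
Steps 2 and 3 above might be sketched roughly as follows. Everything specific here is an assumption made for illustration: the five series are simulated AR(1) stand-ins rather than model output, the AR(1) model and the Box-Ljung lag of 10 are arbitrary, the KS test is applied to the whitened residuals (one reading of step 3), and the Holm method is just one of the available "p.adjust" methods.

```r
set.seed(42)
k <- 5     # number of series, as in the original question
n <- 300   # series length (arbitrary)

# Hypothetical stand-in data: k simulated AR(1) series
series <- lapply(seq_len(k), function(i) {
  as.numeric(arima.sim(list(ar = 0.6), n = n))
})
names(series) <- paste0("s", seq_len(k))

# Step 2: fit the same model to every series
fits <- lapply(series, function(y) arima(y, order = c(1, 0, 0)))

# Box-Ljung check that the residuals look roughly white
lb_p <- sapply(fits, function(f) {
  Box.test(residuals(f), lag = 10, type = "Ljung-Box", fitdf = 1)$p.value
})

# Step 3: for each pair, m coefficient z-tests plus one KS test on residuals
pairs <- combn(k, 2)
pvals <- apply(pairs, 2, function(ij) {
  fi <- fits[[ij[1]]]
  fj <- fits[[ij[2]]]
  # z = difference in estimates over the standard error of the difference
  z <- (coef(fi) - coef(fj)) / sqrt(diag(vcov(fi)) + diag(vcov(fj)))
  p_coef <- 2 * pnorm(-abs(z))
  p_ks <- ks.test(as.numeric(residuals(fi)), as.numeric(residuals(fj)))$p.value
  c(p_coef, ks = p_ks)
})
colnames(pvals) <- paste(names(series)[pairs[1, ]],
                         names(series)[pairs[2, ]], sep = "-")

# Adjust all (m+1) * k*(k-1)/2 p-values together
p_adj <- matrix(p.adjust(pvals, method = "holm"),
                nrow = nrow(pvals), dimnames = dimnames(pvals))
print(round(p_adj, 3))
```

Here m = 2 (the AR coefficient and the intercept), so there are 3 p-values for each of the 10 pairs. Rerunning the last step with other values of "method" shows how much the conclusions depend on the adjustment.
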
	  Hope this helps,
	  Spencer Graves

Kyle Hall wrote:
> I am interested in a statistical comparison of multiple (5) time series
> generated from modeling software (Hydrologic Simulation Program Fortran). The
> model output simulates daily bacteria concentration in a stream. The multiple
> time series are a result of varying our representation of the stream within
> the model.
> Our main question is: Do the different methods used to represent a stream
> produce different results at a statistically significant level?
> We want to compare each output time series to determine if there is a
> difference before looking into the cause within the model. In a previous
> study, the Kolmogorov-Smirnov k-sample test was used to compare multiple time
> series.
> I am unsure about the strength of the Kolmogorov-Smirnov test and I have set
> out to determine if there are any other tests to compare multiple time
> series.
> I know that R has the ks.test but I am unsure how this test handles multiple
> comparisons. Is there something similar to a pairwise.t.test with a
> Bonferroni correction, only with time series data?
> Does R currently (v 2.3.0) have a comparison test that takes into account the
> strong serial correlation of time series data?
> Kyle Hall
> Graduate Research Assistant
> Biological Systems Engineering
> Virginia Tech
Received on Mon May 29 06:52:27 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 29 May 2006 - 08:10:31 EST.
