From: Earl F. Glynn <efg_at_stowers-institute.org>

Date: Tue 09 Jan 2007 - 17:17:32 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed Jan 10 04:25:03 2007

Date: Tue 09 Jan 2007 - 17:17:32 GMT

"Georg Hoermann" <georg.hoermann@gmx.de> wrote in message
news:45A33EBC.3050503@gmx.de...

> Peter Dalgaard wrote:

>> Earl F. Glynn wrote:

> Thanks a lot for the help. I will post the script when its ready > (an introduction for our biology students to time series, just 8 hours)

I've been working with one of our labs here to find "cyclic" genes from microarray data. The supplement for the paper we published is here (I haven't bothered making the code general enough for a package yet):

http://research.stowers-institute.org/efg/2005/LombScargle/index.htm

Normally we only have about 20 to 50 points in our time series for each gene. With missing data a problem, I used a "Lomb-Scargle" approach to find the periodicity. With Fourier analysis, one must impute any missing data points, but with Lomb-Scargle you just process the data you have without any imputation.

Perhaps you or your students would be interested in the "Numerical
Experiments" on this page

http://research.stowers-institute.org/efg/2005/LombScargle/supplement/NumericalExperiments/index.htm

I was curious how well the Lomb-Scargle technique would work with your data -- see the R code below. Normally the Lomb-Scargle periodogram shows a single peak when there is a single dominant frequency. The Peak Significance curve for all your data is a difficult to interpret, and I'm not sure the statistical tests are valid (without some tweaks) for your size dataset.

I took a random sample of 50 of your ~3000 data points and analyzed those -- see the second code block below. [For 50 data points I know all the assumptions are "good enough" for the statistics being computed.] The periodogram here shows a single peak for period 365.6 days, which has good statistical significance. Other subset samples can show harmonic frequencies, sometimes.

# efg, 9 Jan 2007

air = read.csv("http://www.hydrology.uni-kiel.de/~schorsch/air_temp.csv") #air <- read.csv("air_temp.csv")

TempAirC <- air$T_air

Time <- as.Date(air$Date, "%d.%m.%Y")
N <- length(Time)

# Lomb-Scargle code

source("http://research.stowers-institute.org/efg/2005/LombScargle/R/LombScargle.R")
MAXSPD <<- 1500

unit <<- "day"

M <- N # Usually use factor of 2 or 4, but with large N use 1 instead

# Look at test frequencies corresponding to periods of 200 days to 500 days:

f = 1/T

TestFrequencies <- (1/500) + (1/200 - 1/500) * (1:M / M)

# Use Horne & Baliunas' estimate of independent frequencies

Nindependent <- NHorneBaliunas(length(Time)) # valid for this size?

# Fairly slow with this large dataset

ComputeAndPlotLombScargle(as.numeric(Time), TempAirC,
TestFrequencies, Nindependent,

"Air Temperature [C]")

# Could get good results with fewer points too, say 50 chosen at random

MAXSPD <<- 25

TempAirC <- air$T_air

Time <- as.Date(air$Date, "%d.%m.%Y")

set.seed(19) # For reproducible results
RandomSet <- sample(1:length(Time), 50)

TempAirC <- TempAirC[RandomSet]

Time <-Time[RandomSet]

N <- length(Time)

M <- 4 * N # Usually use factor of 2 or 4

# Look at test frequencies corresponding to periods of 200 days to 500 days:

f = 1/T

TestFrequencies <- (1/500) + (1/200 - 1/500) * (1:M / M)

# Use Horne & Baliunas' estimate of independent frequencies

Nindependent <- NHorneBaliunas(length(Time))

# Very fast to compute for only 50 points

ComputeAndPlotLombScargle(as.numeric(Time), TempAirC,
TestFrequencies, Nindependent,

"Air Temperature [C]")

efg

Earl F. Glynn

Scientific Programmer

Stowers Institute

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed Jan 10 04:25:03 2007

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Tue 09 Jan 2007 - 18:30:31 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*