[R] How to do a time-stratified case-crossover analysis for air pollution data? Unformatted text-version, with an additional note

From: Nilsson Fredrik X <Fredrik.X.Nilsson_at_skane.se>
Date: Fri, 07 Mar 2008 16:08:13 +0100


Dear Experts,

I am trying to do a time-stratified case-crossover analysis on air pollution data and number of myocardial infarctions. In order to avoid model selection bias, I started with a simple simulation.

I'm still not sure if my simulation is right. But the results I get from the "ts-case-crossover" are much more variable than those from a glm.

Is this:

a. Due to the simple relation of log-rate of mi cases being = alpha + beta*pm that the glm results are more precise? 
b. Due to me using method="approximate" instead of default "exact" in the clogit? 
c. Due to the fact that the "approximate" method in clogit use "breslow" for handling ties and not "efron"? 
d. Due to that I've misunderstood how to arrange data when doing a case-crossover, such that there is a way of using "exact", that wouldn't go berserk due to the many ties (see #Berserk script at bottom). 
e. The prize to pay for using a case-crossover analysis? 
f. Or, are my simulation results for glm overly precise due to the positive autocorrelation (changed to -0.9 which made the time-stratifed be more precise, but not the glm). This note was not in the HTML-version. Sorry for that.

It should perhaps be noted that, because of the absence of individual data, the exposure to air-pollution (pm10) is assumed to be common to all individuals on a certain day.

I'd be most grateful for any help and ideas on this matter.

Best regards,

Fredrik Nilsson, PhD

PS. I am aware of the limitations that Whitaker et al. presented in Environmetrics 2007; 18: 157-171, but tried to use the time-stratified case-crossover as it could be a "simple", but fairly correct way of doing this type of analysis.

Simulation script:
library(survival)
n<-2*365
samp<-100
ti<-1:n
ar1<-rnorm(1)
for (i in 2:n){
  ar1[i]<-0.2*ar1[i-1]+rnorm(1)
}

#old version of air pollution
#pm10<-2 + .5*sin(2*pi*ti/365 + 127) + 0.1*ar1

startdate<-"1992-07-01"
date1<-as.Date(startdate)
dates<-date1 + (ti-1)
tyda<-weekdays(dates)
tymo<-months(dates)
tyyear<-as.character(dates)
for (i in 1:n)
{
  ask<-tyyear[i]
  tyyear[i]<-substr(ask,1,4)
}

moeff<-cumsum(c(1,31,28,31,30,31,30,31,31,30,31,30)) a<-as.Date("1994-12-31")
monthnames<-months(a+moeff)
tymofa<-match(tymo,monthnames)

moeff<-.5*sin(2*pi*moeff/365 + 127)

#new version; to have strictly stationary levels of air-pollution
#within strata
pm10<-2 + .5*moeff[tymofa] + 0.1*ar1

#intercept and proportionality to pm10 coefficients of log-rate
al<- -2.5
be<- 1.4

qres<-numeric(samp)
glmres<-qres
gamres<-qres
for(q in 1:samp)
{

rate<-exp(al + be*pm10)
mi<-rpois(length(rate),lambda=rate)

date1<-as.Date(startdate)
dates<-date1 + (ti-1)
tyda<-weekdays(dates)
tymo<-months(dates)
tyyear<-as.character(dates)
for (i in 1:n)
{
  ask<-tyyear[i]
  tyyear[i]<-substr(ask,1,4)
}

#replicate cases

air<-data.frame(mi, tyda, tymo, tyyear, pm10, dates) air$stratn<-as.numeric(strata(air$tyda, air$tymo,air$tyyear)) lest<-unique(air$stratn)

air$nctrl<-0
airbase<-air

#find the number of controls whithin each stratum
for (i in 1:length(lest))
{

   a<-which(air$stratn==lest[i])
   for (j in 1:length(a))
   {
      air$nctrl[a[j]]<-sum(air$mi[a[-j]])
   }

}

#create cases and controls

cami<-rep(1,sum(mi))
ctmi<-rep(0,sum(air$nctrl))

capm<-rep(pm10, mi)
ctpm<-rep(pm10, air$nctrl)

cast<-rep(air$stratn, mi)
ctst<-rep(air$stratn, air$nctrl)

cady<-rep(dates, mi)
ctdy<-rep(dates, air$nctrl)

cases<-c(cami, ctmi)
stranu<-c(cast, ctst)
days<-c(cady, ctdy)
pmva<-c(capm, ctpm)

air2<-data.frame(cases, days, pmva,stranu) air.cl<-clogit(cases~pmva + strata(stranu), method="approximate", data=air2) air.glm<-glm(mi~pm10, family=poisson, data=air)
#air.gam<-gam(mi~s(pm10), family=poisson)
qres[q]<-as.numeric(coef(air.cl))
glmres[q]<-as.numeric(coef(air.glm)[2])
#gamres[q]<-as.numeric(coef(air.gam)[2])

}
par(mfrow=c(2,1))
plot(density(qres))
plot(density(glmres))
fivenum(qres)
fivenum(glmres)

# Berserk script
# this shows that even for very few cases, my intended way of doing TS C-C is quite time-consuming.
#
#case<-c(5,0,2,1)
#airq<-c(10,2,3,3)
#
#nctrl<-0*case
#for (i in 1:length(case))
#{
#  nctrl[i]<-sum(case[-i])
#}
#
#airca<-rep(airq,case)
#airct<-rep(airq,nctrl)
#
#cases<-rep(1,sum(case))
#ctrls<-rep(0,sum(nctrl))
#
#mi<-c(cases,ctrls)
#airq<-c(airca,airct)
#a<-Sys.time()
#air.cl<-clogit(mi~airq)
#b<-Sys.time()
#b-a
#a<-Sys.time()
#air.cl<-clogit(mi~airq, method="approximate")
#b<-Sys.time()
#b-a



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 07 Mar 2008 - 15:23:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 07 Mar 2008 - 16:30:20 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive