[R] How to do a time-stratified case-crossover analysis for air pollution data?

From: Nilsson Fredrik X <Fredrik.X.Nilsson_at_skane.se>
Date: Fri, 07 Mar 2008 15:47:39 +0100


Dear Experts,  

I am trying to do a time-stratified case-crossover analysis on air pollution data and number of myocardial infarctions. In order to avoid model selection bias, I started with a simple simulation.  

I'm still not sure if my simulation is right. But the results I get from the "ts-case-crossover" are much more variable than those from a glm.  

Is this:

  1. Due to the simple relation of log-rate of mi cases being = alpha + beta*pm that the glm results are more precise?
  2. Due to me using method="approximate" instead of default "exact" in the clogit?
  3. Due to the fact that the "approximate" method in clogit use "breslow" for handling ties and not "efron"?
  4. Due to that I've misunderstood how to arrange data when doing a case-crossover, such that there is a way of using "exact", that wouldn't go berserk due to the many ties (see #Berserk script at bottom).
  5. The prize to pay for using a case-crossover analysis?

It should perhaps be noted that, because of the absence of individual data, the exposure to air-pollution (pm10) is assumed to be common to all individuals on a certain day.  

I'd be most grateful for any help and ideas on this matter.  

Best regards,  

Fredrik Nilsson, PhD  

PS. I am aware of the limitations that Whitaker et al. presented in Environmetrics 2007; 18: 157-171, but tried to use the time-stratified case-crossover as it could be a "simple", but fairly correct way of doing this type of analysis.  

Simulation script:

library(survival)

n<-2*365

samp<-100

ti<-1:n

ar1<-rnorm(1)

for (i in 2:n){

  ar1[i]<-0.2*ar1[i-1]+rnorm(1)

}  

#old version of air pollution

#pm10<-2 + .5*sin(2*pi*ti/365 + 127) + 0.1*ar1
 

startdate<-"1992-07-01"

date1<-as.Date(startdate)

dates<-date1 + (ti-1)

tyda<-weekdays(dates)

tymo<-months(dates)

tyyear<-as.character(dates)

for (i in 1:n)

{

  ask<-tyyear[i]

  tyyear[i]<-substr(ask,1,4)

}    

moeff<-cumsum(c(1,31,28,31,30,31,30,31,31,30,31,30))

a<-as.Date("1994-12-31")

monthnames<-months(a+moeff)

tymofa<-match(tymo,monthnames)  

moeff<-.5*sin(2*pi*moeff/365 + 127)  

#new version; to have strictly stationary levels of air-pollution

#within strata

pm10<-2 + .5*moeff[tymofa] + 0.1*ar1  

#intercept and proportionality to pm10 coefficients of log-rate

al<- -2.5

be<- 1.4    

qres<-numeric(samp)

glmres<-qres

gamres<-qres

for(q in 1:samp)

{  

rate<-exp(al + be*pm10)

mi<-rpois(length(rate),lambda=rate)    

date1<-as.Date(startdate)

dates<-date1 + (ti-1)

tyda<-weekdays(dates)

tymo<-months(dates)

tyyear<-as.character(dates)

for (i in 1:n)

{

  ask<-tyyear[i]

  tyyear[i]<-substr(ask,1,4)

}  

#replicate cases

air<-data.frame(mi, tyda, tymo, tyyear, pm10, dates)

air$stratn<-as.numeric(strata(air$tyda, air$tymo,air$tyyear))

lest<-unique(air$stratn)  

air$nctrl<-0

airbase<-air  

#find the number of controls whithin each stratum

for (i in 1:length(lest))

{

   a<-which(air$stratn==lest[i])

   for (j in 1:length(a))

   {

      air$nctrl[a[j]]<-sum(air$mi[a[-j]])

   }

}    

#create cases and controls

cami<-rep(1,sum(mi))

ctmi<-rep(0,sum(air$nctrl))  

capm<-rep(pm10, mi)

ctpm<-rep(pm10, air$nctrl)  

cast<-rep(air$stratn, mi)

ctst<-rep(air$stratn, air$nctrl)  

cady<-rep(dates, mi)

ctdy<-rep(dates, air$nctrl)  

cases<-c(cami, ctmi)

stranu<-c(cast, ctst)

days<-c(cady, ctdy)

pmva<-c(capm, ctpm)  

air2<-data.frame(cases, days, pmva,stranu)

air.cl<-clogit(cases~pmva + strata(stranu), method="approximate", data=air2)

air.glm<-glm(mi~pm10, family=poisson, data=air)

#air.gam<-gam(mi~s(pm10), family=poisson)

qres[q]<-as.numeric(coef(air.cl))

glmres[q]<-as.numeric(coef(air.glm)[2])

#gamres[q]<-as.numeric(coef(air.gam)[2])
 

}

par(mfrow=c(2,1))

plot(density(qres))

plot(density(glmres))

fivenum(qres)

fivenum(glmres)  

# Berserk script

# this shows that even for very few cases, my intended way of doing TS
C-C is quite time-consuming.

#

#case<-c(5,0,2,1)

#airq<-c(10,2,3,3)

#

#nctrl<-0*case

#for (i in 1:length(case))

#{

# nctrl[i]<-sum(case[-i])

#}

#

#airca<-rep(airq,case)

#airct<-rep(airq,nctrl)

#

#cases<-rep(1,sum(case))

#ctrls<-rep(0,sum(nctrl))

#

#mi<-c(cases,ctrls)

#airq<-c(airca,airct)

#a<-Sys.time()

#air.cl<-clogit(mi~airq)

#b<-Sys.time()

#b-a

#a<-Sys.time()

#air.cl<-clogit(mi~airq, method="approximate")

#b<-Sys.time()

#b-a

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 07 Mar 2008 - 14:51:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 07 Mar 2008 - 15:30:20 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive