From: Thomas Lumley <tlumley_at_u.washington.edu>

Date: Thu 29 Jul 2004 - 03:42:01 EST

https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Jul 29 03:50:21 2004


On Wed, 28 Jul 2004, Mayeul KAUFFMANN wrote:

> > If you can get the conditional independence (martingaleness) then, yes,
> > BIC is fine.
> >
> > One way to check might be to see how similar the standard errors are
> > with and without the cluster(id) term.
>

> (Thank you "again !", Thomas.)

>
> At first look, the values seemed very similar (see below, case 2).
> However, to check this without being too subjective, and without a
> specific test, I needed other values to assess the size of the
> differences: what is similar, what is not?
>

I think the econometricians have theory for this (comparing the whole covariance matrices).

-thomas
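
For readers who want to try the check discussed above, here is a minimal sketch in R. It is not part of the original exchange. It assumes the survival package is installed and uses its bundled `bladder2` recurrent-event data as a stand-in for `xstep`; the eigenvalue summary is only one informal way of looking at the whole covariance matrices at once, not the econometric theory alluded to above.

```r
library(survival)

# Illustrative data only: bladder2 ships with survival; it is NOT the xstep
# data from this thread.
fit <- coxph(Surv(start, stop, event) ~ rx + size + number + cluster(id),
             data = bladder2)

# With a cluster() term, coxph keeps both covariance estimates:
#   fit$naive.var : model-based (assumes conditional independence)
#   fit$var       : robust sandwich estimate
naive_se  <- sqrt(diag(fit$naive.var))
robust_se <- sqrt(diag(fit$var))

# Coefficient-by-coefficient ratio; values near 1 mean the two agree.
se_ratio <- robust_se / naive_se
print(round(se_ratio, 3))

# Whole-matrix comparison (an informal diagnostic, not a formal test):
# if the two matrices were equal, solve(naive) %*% robust would be the
# identity and all of its eigenvalues would equal 1.
ev <- Re(eigen(solve(fit$naive.var, fit$var), only.values = TRUE)$values)
print(round(ev, 3))
```

Eigenvalues far from 1 in either direction point to linear combinations of coefficients whose robust and naive variances disagree, which a diagonal-only comparison can miss.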

>
> ===============================================================================
> CASE 1
> I first estimated the model without modeling dependence:
>
> Call:
> coxph(formula = Surv(start, stop, status) ~ cluster(ccode) +
>     pop + pib + pib2 + crois + instab.x1 + instab.autres, data = xstep)
>
>
>                  coef exp(coef) se(coef) robust se     z       p
> pop            0.3606     1.434   0.0978    0.1182  3.05 2.3e-03
> pib           -0.5947     0.552   0.1952    0.1828 -3.25 1.1e-03
> pib2          -0.4104     0.663   0.1452    0.1270 -3.23 1.2e-03
> crois         -0.0592     0.943   0.0245    0.0240 -2.46 1.4e-02
> instab.x1      2.2059     9.079   0.4692    0.4097  5.38 7.3e-08
> instab.autres  0.9550     2.599   0.4700    0.4936  1.93 5.3e-02
>
> Likelihood ratio test=74 on 6 df, p=6.2e-14 n= 7286
>
> There seems to be a strong linear relationship between standard errors
> (se, or naive se) and robust se.
>
> > summary(lm(sqrt(diag(cox1$var))~ sqrt(diag(cox1$naive.var)) -1))
> Coefficients:
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.96103    0.04064   23.65 2.52e-06 ***
> Multiple R-Squared: 0.9911, Adjusted R-squared: 0.9894
>
>
> ===============================================================================
> CASE 2
>
> Then I added a variable (pxcw) measuring the proximity of the previous
> event (1>pxcw>0)
>
> n= 7286
>                  coef exp(coef) se(coef) robust se     z       p
> pxcw           0.9063     2.475   0.4267    0.4349  2.08 3.7e-02
> pop            0.3001     1.350   0.1041    0.1295  2.32 2.0e-02
> pib           -0.5485     0.578   0.2014    0.1799 -3.05 2.3e-03
> pib2          -0.4033     0.668   0.1450    0.1152 -3.50 4.6e-04
> crois         -0.0541     0.947   0.0236    0.0227 -2.38 1.7e-02
> instab.x1      1.9649     7.134   0.4839    0.4753  4.13 3.6e-05
> instab.autres  0.8498     2.339   0.4693    0.4594  1.85 6.4e-02
>
> Likelihood ratio test=78.3 on 7 df, p=3.04e-14 n= 7286
>
>
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.98397    0.02199   44.74 8.35e-09 ***
> Multiple R-Squared: 0.997, Adjusted R-squared: 0.9965
>
> The naive standard errors (se) seem closer to the robust se than they were
> when dependence was not modeled.
> The slope 0.98397 is very close to one, R^2 grew, etc.
> The dependence is high (risk is multiplied by 2.475 the day after an
> event), but conditional independence (given covariates) seems hard to
> reject.
>
>
> ===============================================================================
> CASE 3
> Finally, I compared these results with those without repeated events
> (which gives a smaller dataset). A country is removed as soon as we
> observe its first event.
> (The robust se is still computed, even though the naive se should in fact
> be used here to compute the p-value.)
>
> coxph(formula = Surv(start, stop, status) ~ cluster(ccode) +
>     pop + pib + pib2 + crois + instab.x1 + instab.autres, data =
>     xstep[no.previous.event, ])
>
>                  coef exp(coef) se(coef) robust se     z       p
> pop            0.4236     1.528   0.1030    0.1157  3.66 2.5e-04
> pib           -0.7821     0.457   0.2072    0.1931 -4.05 5.1e-05
> pib2          -0.3069     0.736   0.1477    0.1254 -2.45 1.4e-02
> crois         -0.0432     0.958   0.0281    0.0258 -1.67 9.5e-02
> instab.x1      1.9925     7.334   0.5321    0.3578  5.57 2.6e-08
> instab.autres  1.3571     3.885   0.5428    0.5623  2.41 1.6e-02
>
> Likelihood ratio test=66.7 on 6 df, p=1.99e-12 n=5971 (2466 observations
> deleted due to missing)
>
>
> > summary(lm(sqrt(diag(cox1$var))~ sqrt(diag(cox1$naive.var)) -1))
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.86682    0.07826   11.08 0.000104 ***
> Residual standard error: 0.06328 on 5 degrees of freedom
> Multiple R-Squared: 0.9608, Adjusted R-squared: 0.953
>
>
> There seems to be no evidence that the robust se differs more from the
> naive se in case 2 than in case 3 (or case 1).
> If anything, the two are closer in case 2.
>
> I conclude that conditional independence (martingaleness) cannot be
> rejected in CASE 2, when modeling the dependence between events with a
> covariate.
>
> Mayeul KAUFFMANN
> Univ. Pierre Mendes France
> Grenoble - France
>
>
>
> > > Then, there is still another option. In fact, I already modelled
> > > explicitly the influence of past events with a "proximity of last
> > > event" covariate, assuming the dependence on the last event decreases
> > > at a constant rate (for instance, the proximity covariate varies from
> > > 1 to 0.5 in the first 10 years after an event, then from 0.5 to 0.25
> > > in the next ten years, etc).
> > >
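
The decay described in the quoted paragraph (1 to 0.5 over the first ten years, 0.5 to 0.25 over the next ten) can be read as a half-life of ten years. A minimal sketch in R, assuming the decay is exponential between those points; the function name `proximity` and its arguments are hypothetical and do not come from the original post:

```r
# Hypothetical helper (not from the original post): proximity of the last
# event, assumed to halve every `half_life` years since that event.
proximity <- function(years_since_event, half_life = 10) {
  0.5 ^ (years_since_event / half_life)
}

# 0, 10 and 20 years after an event
proximity(c(0, 10, 20))
```

The three values are 1, 0.5 and 0.25, matching the example in the quoted text. How the covariate is set before a country's first event is not stated in the thread and is left out here.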
> > > With a well-chosen model of the dependence effect, the events
> > > become conditionally independent, so I do not need a +cluster(id)
> > > term and I can use fit$loglik to make a covariate selection based on
> > > BIC, right?
> >
> > If you can get the conditional independence (martingaleness) then, yes,
> > BIC is fine.
> >
> > One way to check might be to see how similar the standard errors are
> > with and without the cluster(id) term.
>
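
As an illustration of using fit$loglik for BIC-based covariate selection, here is a minimal sketch in R. It is not from the original exchange. The helper name `cox_bic` is hypothetical, the data are the `bladder2` example data from the survival package rather than `xstep`, and taking the number of events (rather than the number of rows) as the sample size in the penalty is an assumption, one common convention for Cox models.

```r
library(survival)

# Hypothetical helper: BIC from the partial log-likelihood of a coxph fit.
# Assumption: the penalty uses the number of events as the sample size.
cox_bic <- function(fit) {
  k <- sum(!is.na(coef(fit)))  # number of estimated coefficients
  d <- fit$nevent              # number of events
  -2 * fit$loglik[2] + k * log(d)  # loglik[2] is the fitted model's value
}

# Illustrative data only (bladder2 ships with survival). No cluster() term:
# this presumes conditional independence has already been judged plausible.
fit_small <- coxph(Surv(start, stop, event) ~ rx, data = bladder2)
fit_large <- coxph(Surv(start, stop, event) ~ rx + size + number,
                   data = bladder2)

# Smaller BIC is preferred.
print(c(small = cox_bic(fit_small), large = cox_bic(fit_large)))
```

Both fits must use the same rows for the comparison to be meaningful, so observations with missing values in any candidate covariate should be dropped before fitting.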

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley@u.washington.edu        University of Washington, Seattle

______________________________________________
R-help@stat.math.ethz.ch mailing list

