[R] Effect size of comparison of two levels of a factor in multiple linear regression

From: Christoph Mathys <cmathys_at_bidmc.harvard.edu>
Date: Sun, 3 Feb 2008 16:09:50 +0100

Dear R users,

I have a linear model of the kind

outcome ~ treatment + covariate

where 'treatment' is a factor with three levels ("0", "1", and "2"), and the covariate is continuous. Treatments "1" and "2" both have regression coefficients significantly different from 0 when using treatment contrasts with treatment "0" as the baseline. I would now like to determine effect sizes (akin to Cohen's d in a two-sample comparison) for the comparison to baseline of treatments "1" and "2". I have illustrated a way to do this in the reproducible example below and am grateful for any comments on the soundness of what I'm doing. I have not yet found a way to determine confidence intervals for the effect sizes derived below and would appreciate tips on that.

set.seed(123456) # Make session reproducible

# Set up the treatment factor with three levels and 100 observations
# each

treatment <- factor(c(rep(0, 100), rep(1, 100), rep(2, 100)))

# Simulate outcomes

outcome <- rep(NA, 300)

outcome[treatment==0] <- rnorm(100, 10, 5) # baseline: mean=10, sd=5
outcome[treatment==1] <- rnorm(100, 30, 5) # effect size 4
outcome[treatment==2] <- rnorm(100, 40, 5) # effect size 6

# Check effect sizes (Cohen's d)

cohens.d <- function (x, y) {(mean(x)-mean(y))/sqrt((var(x)+var(y))/2) } cohens.d(outcome[treatment==1], outcome[treatment==0]) [1] 3.984774
cohens.d(outcome[treatment==2], outcome[treatment==0]) [1] 6.167798

# Sometimes standardized regression coefficients are recommended
# for determining effect size but that clearly doesn't work here:
coef(lm(scale(outcome) ~ treatment))
(Intercept) treatment1 treatment2
  -1.233366 1.453152 2.246946
# The reason it doesn't work is that the difference of outcome
# means is divided by the sd of *all* outcomes:
(mean(outcome[treatment==1])-mean(outcome[treatment==0]))/sd(outcome) [1] 1.453152
(mean(outcome[treatment==2])-mean(outcome[treatment==0]))/sd(outcome) [1] 2.246946

# Now, create a situation where Cohen's d is impossible to
# calculate directly by introducing a continuous covariate:
covariate <- 1:300
outcome <- outcome + rnorm(300, covariate, 2) model <- lm(scale(outcome) ~ treatment + scale(covariate)) coef(model)

     (Intercept)       treatment1       treatment2 scale(covariate)
      -0.1720456        0.1994251        0.3167116        0.8753761

# Proposed way to determine effect size: simulate outcomes for each
# treatment level assuming the covariate to have a fixed value (here
# its mean value after standardization: zero)
no.of.sims <- 10000
sims.model <- MCMCregress(model, mcmc=no.of.sims) sims.model[1:2,]

     (Intercept) treatment1 treatment2 scale(covariate)      sigma2
[1,]  -0.1780735  0.2024111  0.3395233        0.8682119 0.002617449
[2,]  -0.1521623  0.1773623  0.2956053        0.8764573 0.003529013
sims.treat0 <- rnorm(no.of.sims, sims.model[,"(Intercept)"], sqrt(sims.model[,"sigma2"]))
sims.treat1 <- rnorm(no.of.sims, sims.model[,"(Intercept)"] + sims.model[,"treatment1"], sqrt(sims.model[,"sigma2"])) sims.treat2 <- rnorm(no.of.sims, sims.model[,"(Intercept)"] + sims.model[,"treatment2"], sqrt(sims.model[,"sigma2"]))

# Calculate Cohen's d for simulated values
cohens.d(sims.treat1, sims.treat0)
[1] 3.683093
cohens.d(sims.treat2, sims.treat0)
[1] 5.782622

These values are reasonably close to the ones (4 and 6) I plugged in at the beginning. It would be even nicer to have a confidence interval for them, but if I bootstrap one out of the simulated outcomes its width depends on the number of simulations and is therefore arbitrary. If anyone knew a better way to get at the effect sizes I'm looking for or how I could also get confidence intervals for them, that would be greatly appreciated.



Christoph Mathys, M.S.
Music and Neuroimaging Laboratory
Beth Israel Deaconess Medical Center
and Harvard Medical School

R-help_at_r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sun 03 Feb 2008 - 15:12:12 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 04 Feb 2008 - 14:00:11 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive