Re: [R] Logistic regression with multiple imputation

From: Chuck Cleland <ccleland_at_optonline.net>
Date: Wed, 30 Jun 2010 06:02:29 -0400

On 6/30/2010 1:14 AM, Daniel Chen wrote:
> Hi,
>
> I am a long time SPSS user but new to R, so please bear with me if my
> questions seem to be too basic for you guys.
>
> I am trying to figure out how to analyze survey data using logistic
> regression with multiple imputation.
>
> I have a survey data of about 200,000 cases and I am trying to predict the
> odds ratio of a dependent variable using 6 categorical independent variables
> (dummy-coded). Approximatively 10% of the cases (~20,000) have missing data
> in one or more of the independent variables. The percentage of missing
> ranges from 0.01% to 10% for the independent variables.
>
> My current thinking is to conduct a logistic regression with multiple
> imputation, but I don't know how to do it in R. I searched the web but
> couldn't find instructions or examples on how to do this. Since SPSS is
> hopeless with missing data, I have to learn to do this in R. I am new to R,
> so I would really appreciate if someone can show me some examples or tell me
> where to find resources.

  Here is an example using the Amelia package to generate imputations and the mitools and mix packages to make the pooled inferences.

titanic <-
read.table("http://lib.stat.cmu.edu/S/Harrell/data/ascii/titanic.txt", sep=',', header=TRUE)

set.seed(4321)

titanic$sex[sample(nrow(titanic), 10)] <- NA
titanic$pclass[sample(nrow(titanic), 10)] <- NA
titanic$survived[sample(nrow(titanic), 10)] <- NA

library(Amelia) # generate multiple imputations
library(mitools) # for MIextract()
library(mix) # for mi.inference()

titanic.amelia <- amelia(subset(titanic, select=c('survived','pclass','sex','age')),

                         m=10, noms=c('survived','pclass','sex'),
emburn=c(500,500))

allimplogreg <- lapply(titanic.amelia$imputations, function(x){glm(survived ~ pclass + sex + age, family=binomial, data = x)})

mice.betas.glm <- MIextract(allimplogreg, fun=function(x){coef(x)}) mice.se.glm <- MIextract(allimplogreg, fun=function(x){sqrt(diag(vcov(x)))})

as.data.frame(mi.inference(mice.betas.glm, mice.se.glm))

# Or using only mitools for pooled inference

betas <- MIextract(allimplogreg, fun=coef) vars <- MIextract(allimplogreg, fun=vcov) summary(MIcombine(betas,vars))

> Thank you!
>
> Daniel
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 30 Jun 2010 - 10:09:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 30 Jun 2010 - 13:10:43 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive