Re: [R] Is there a good package for multiple imputation of missing values in R?

From: Frank E Harrell Jr <>
Date: Mon, 30 Jun 2008 13:25:13 -0500

Robert A LaBudde wrote:
> At 03:02 AM 6/30/2008, Robert A. LaBudde wrote:

>> I'm looking for a package that has a state-of-the-art method of
>> imputation of missing values in a data frame with both continuous and 
>> factor columns.
>> I've found transcan() in 'Hmisc', which appears to be possibly suited 
>> to my needs, but I haven't been able to figure out how to get a new 
>> data frame with the imputed values replaced (I don't have Harrell's
>> book).
>> Any pointers would be appreciated.

> Thanks to "paulandpen", Frank and Shige for suggestions.
> I looked at the packages 'Hmisc', 'mice', 'Amelia' and 'norm'.
> I still haven't mastered the methodology for using aregImpute() in
> 'Hmisc' based on the help information. I think I'll have to get hold of
> Frank's book to see how it's used in a complete example.

It's not in the book; it will be in the 2nd edition someday.
Frank
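
For readers in the same position, here is a minimal sketch of one way aregImpute() can be combined with fit.mult.impute() in 'Hmisc'. It is an illustration under assumptions, not a worked example from the book: the 'felon' data frame is simulated as a stand-in for the real data, the 'age' column is hypothetical (added only so that there is a continuous variable), and n.impute=5 is an arbitrary choice.

```r
library(Hmisc)

set.seed(1)
n <- 300
# Simulated stand-in for 'felon'; 'age' is a hypothetical continuous column
felon <- data.frame(
  hired   = rbinom(n, 1, 0.5),
  felony  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
  natamer = factor(sample(c("no", "yes"), n, replace = TRUE)),
  age     = round(rnorm(n, 35, 10))
)
# Knock out some values so there is something to impute
felon$felony[sample(n, 30)] <- NA
felon$gender[sample(n, 20)] <- NA
felon$age[sample(n, 25)]    <- NA

# Multiple imputation; every variable used in the later model is listed
imp <- aregImpute(~ hired + felony + gender + natamer + age,
                  data = felon, n.impute = 5, pr = FALSE)

# Refit the model on each completed dataset and combine the results
fit <- fit.mult.impute(hired ~ felony + gender + natamer + age,
                       glm, xtrans = imp, data = felon,
                       family = binomial, pr = FALSE)
print(coef(fit))

# One completed data frame (here the first imputation)
filled <- impute.transcan(imp, imputation = 1, data = felon,
                          list.out = TRUE, pr = FALSE, check = FALSE)
felon1 <- felon
felon1[names(filled)] <- filled
```

The last three statements address the original question of getting a data frame back with the imputed values filled in; changing 'imputation' selects a different one of the n.impute realizations.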

> 'Amelia' and 'norm' appear to be focused solely on continuous,
> multivariate normal variables, but my needs typically involve datasets
> with both factors and continuous variables.
> The function mice() in 'mice' appears to best suit my needs, and the
> help file was intelligible, and it works on both factors and continuous
> variables.
> For those in the audience with similar issues, here is a code snippet
> showing how some of these functions work ('felon' is a data frame with
> categorical and continuous predictors of the binary variable 'hired'):
> library('mice') #missing data imputation library for md.pattern(), mice(), complete()
> names(felon) #show variable names
> md.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars
> library('Hmisc') #package for na.pattern() and impute()
> na.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars
> #simple imputation can be done by
> felon2<- felon #make copy
> felon2$felony<- impute(felon2$felony) #impute NAs (most frequent)
> felon2$gender<- impute(felon2$gender) #impute NAs
> felon2$natamer<- impute(felon2$natamer) #impute NAs
> na.pattern(felon2[,1:4]) #show no NAs left in these vars
> fit2<- glm(hired ~ felony + gender + natamer, data=felon2, family=binomial)
> summary(fit2)
> #better, multiple imputation can be done via mice():
> imp<- mice(felon[,1:4]) #do multiple imputation (default is 5 realizations)
> for (iSet in 1:5) { #show results for the 5 imputation datasets
>   fit<- glm(hired ~ felony + gender + natamer,
>     data=complete(imp, iSet), family=binomial) #fit to iSet-th realization
>   print(summary(fit))
> }
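
The loop above prints five separate fits. A possible next step, sketched here under assumptions, is to combine them into a single set of estimates with with() and pool() in 'mice', which applies Rubin's rules. The 'felon' data frame below is simulated as a stand-in for the real data, the seed and m=5 are arbitrary, and a reasonably recent version of 'mice' is assumed.

```r
library(mice)

set.seed(1)
n <- 200
# Simulated stand-in for the 'felon' data frame (not the real data)
felon <- data.frame(
  hired   = rbinom(n, 1, 0.5),
  felony  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
  natamer = factor(sample(c("no", "yes"), n, replace = TRUE))
)
# Knock out some values so there is something to impute
felon$felony[sample(n, 20)]  <- NA
felon$gender[sample(n, 15)]  <- NA
felon$natamer[sample(n, 10)] <- NA

# Five imputed datasets
imp <- mice(felon, m = 5, printFlag = FALSE, seed = 1)

# Fit the same logistic model to each completed dataset
fits <- with(imp, glm(hired ~ felony + gender + natamer, family = binomial))

# Pool estimates and standard errors across the five fits
pooled <- pool(fits)
print(summary(pooled))
```

Pooling gives standard errors that include the between-imputation variability, which the five individual summaries do not show.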
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail:
> Least Cost Formulations, Ltd. URL:
> 824 Timberlake Drive Tel: 757-467-0954
> Virginia Beach, VA 23464-3239 Fax: 757-467-2947
> "Vere scire est per causas scire"
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Received on Mon 30 Jun 2008 - 18:33:36 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 30 Jun 2008 - 19:31:52 GMT.
