Re: [R] missing values imputation


From: A.J. Rossini
Date: Thu 13 May 2004 - 03:23:05 EST


Ted Harding writes:

> On 12-May-04 Rolf Turner wrote:
>> Anne Piotet wrote:
>>> What R functionalities are there to do missing values imputation
>>> (substantial proportion of missing data)? I would prefer to use
>>> maximum likelihood methods; is the EM algorithm implemented? In
>>> which package?
>> The so-called ``EM algorithm'' is ***NOT*** an
>> algorithm. It is a methodology or a unifying concept.
>> It would be impossible to ``implement'' it. (Except
>> possibly by means of some extremely advanced and
>> sophisticated Artificial Intelligence software.)
> Do we understand the same thing by "EM Algorithm"?
> The one I'm thinking of -- formulated under that name by Dempster,
> Laird and Rubin in 1977 ("Maximum likelihood estimation from incomplete
> data via the EM algorithm", JRSS(B) 39, 1-38) -- is indeed an algorithm
> in exactly the same sense as any iterative search for the maximum of a
> function.
> Essentially, in the context of data modelled by an underlying exponential
> family distribution where there is incomplete information about the
> values which have this distribution, it proceeds by
> Start: Choose starting estimates for the parameters of the distribution.
> E: Using the current parameter values, compute the expected values
>    of the sufficient statistics conditional on the observed information.
> M: Solve the maximum-likelihood equations (which are functions of the
>    sufficient statistics) using the expected values computed in (E).
> If sufficiently converged, stop. Otherwise, make the current parameter
> values equal to the values estimated in (M) and return to (E).
> Algorithm, this, or not????
> And where does "extremely advanced and sophisticated Artificial
> Intelligence software" come into it? You can, in some cases, perform
> the above EM algorithm by hand.
> Which "EM Algorithm" are you thinking of?
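
To make the Start/E/M/converge loop quoted above concrete, here is a
minimal Python sketch for one textbook case: bivariate normal data where
x is fully observed and some y values are missing. The function name, the
toy data and the tolerance are illustrative assumptions, not anything
taken from the thread or from an R package.

```python
def em_bivariate_normal(x, y, tol=1e-10, max_iter=10_000):
    """EM for a bivariate normal; x fully observed, y is None where missing.

    Hypothetical helper written for illustration only.
    Returns (mean_x, mean_y, var_x, var_y, cov_xy, iterations).
    """
    n = len(x)
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]

    # x has no missing values, so its ML estimates never change.
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n

    # Start: crude estimates for the y parameters from the complete cases.
    my = sum(yi for _, yi in complete) / len(complete)
    syy = sum((yi - my) ** 2 for _, yi in complete) / len(complete)
    sxy = 0.0

    for iteration in range(1, max_iter + 1):
        # E step: expected sufficient statistics sum(y), sum(y^2), sum(x*y)
        # given the observed data and the current parameters.
        beta = sxy / sxx                    # slope of y on x
        resid_var = syy - sxy ** 2 / sxx    # Var(y | x)
        t_y = t_yy = t_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = my + beta * (xi - mx)      # E[y | x]
                eyy = ey ** 2 + resid_var       # E[y^2 | x]
            else:
                ey, eyy = yi, yi ** 2
            t_y += ey
            t_yy += eyy
            t_xy += xi * ey

        # M step: ML estimates as functions of the sufficient statistics.
        new_my = t_y / n
        new_syy = t_yy / n - new_my ** 2
        new_sxy = t_xy / n - mx * new_my

        change = max(abs(new_my - my), abs(new_syy - syy), abs(new_sxy - sxy))
        my, syy, sxy = new_my, new_syy, new_sxy
        if change < tol:    # sufficiently converged
            break

    return mx, my, sxx, syy, sxy, iteration


# Toy data (made up): three of the eight y values are missing.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, None, 4.1, None, 6.2, 6.8, None]
mx, my, sxx, syy, sxy, iterations = em_bivariate_normal(x, y)
print(f"mean_y={my:.4f} var_y={syy:.4f} cov_xy={sxy:.4f} ({iterations} iterations)")
```

For this particular missingness pattern the ML answer is also available in
closed form (regress y on x over the complete cases, then predict at the
mean of all x), so the EM estimates should agree with it; that makes a
handy sanity check.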

Thanks, Ted :-) -- to extend it a bit, one can imagine using
approximate solutions to the two steps (simulation methods to get the
expected values, a similar range of approaches for the maximization) to
obtain a general (but possibly not robust) computational solution for
the parametric problem. Just plug in a formula for the likelihood and
the sufficient statistics...
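
As a rough illustration of the simulation idea, here is a small Python
sketch of a Monte Carlo E step for right-censored exponential lifetimes.
The expected value needed in the E step is known exactly in this case,
but it is estimated by simulation here to show the pattern. The function
name, the data, the number of draws and the fixed iteration count are
all assumptions made for the example.

```python
import random


def mcem_censored_exponential(observed, censored, n_iter=50, n_draws=2000, seed=0):
    """Monte Carlo EM for the rate of an exponential distribution.

    observed: fully observed lifetimes
    censored: censoring times (true lifetime is only known to exceed these)
    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    n = len(observed) + len(censored)

    # Start: naive rate estimate that ignores the censored units.
    lam = len(observed) / sum(observed)

    for _ in range(n_iter):
        # Approximate E step: the sufficient statistic is the total lifetime.
        # For each censored unit, simulate lifetimes conditional on
        # exceeding c (c plus a fresh exponential draw, by memorylessness)
        # and average them.
        expected_total = sum(observed)
        for c in censored:
            draws = (c + rng.expovariate(lam) for _ in range(n_draws))
            expected_total += sum(draws) / n_draws

        # M step: complete-data ML estimate of the rate.
        lam = n / expected_total

    # A fixed iteration count is used because Monte Carlo noise makes a
    # "change below tolerance" stopping rule unreliable.
    return lam


# Toy data (made up): five observed lifetimes, three units censored at 5.0.
observed = [2.1, 0.7, 3.5, 1.2, 4.0]
censored = [5.0, 5.0, 5.0]
lam_hat = mcem_censored_exponential(observed, censored)
exact_mle = len(observed) / (sum(observed) + sum(censored))
print(f"MCEM estimate: {lam_hat:.4f}, closed-form MLE: {exact_mle:.4f}")
```

The estimate only settles to within simulation noise of the closed-form
MLE, which is one face of the "possibly not robust" caveat: more draws
tighten it, at more cost per iteration.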

Of course, thousands of papers have been written on these variations
(likelihood, specific implementations of the E and M steps).


Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email

