# Re: [R] missing values imputation

From: A.J. Rossini (rossini@blindglobe.net)
Date: Thu 13 May 2004 - 03:23:05 EST

Message-id: <85y8nxo8zq.fsf@servant.blindglobe.net>

(Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> writes:

> On 12-May-04 Rolf Turner wrote:
>> Anne Piotet wrote:
>>
>>> What R functionalities are there to do missing values imputation
>>> (substantial proportion of missing data)? I would prefer to use
>>> maximum likelihood methods; is the EM algorithm implemented? In
>>> which package?
>>
>> The so-called ``EM algorithm'' is ***NOT*** an
>> algorithm. It is a methodology or a unifying concept.
>> It would be impossible to ``implement'' it. (Except
>> possibly by means of some extremely advanced and
>> sophisticated Artificial Intelligence software.)
>
> Do we understand the same thing by "EM Algorithm"?
>
> The one I'm thinking of -- formulated under that name by Dempster,
> Laird and Rubin in 1977 ("Maximum likelihood estimation from incomplete
> data via the EM algorithm", JRSS(B) 39, 1-38) -- is indeed an algorithm
> in exactly the same sense as any iterative search for the maximum of a
> function.
>
> Essentially, in the context of data modelled by an underlying exponential
> family distribution where there is incomplete information about the
> values which have this distribution, it proceeds by
>
> Start: Choose starting estimates for the parameters of the distribution.
>
> E: Using the current parameter values, compute the expected values
>    of the sufficient statistics conditional on the observed information.
>
> M: Solve the maximum-likelihood equations (which are functions of the
>    sufficient statistics) using the expected values computed in (E).
>
> If sufficiently converged, stop. Otherwise, make the current parameter
> values equal to the values estimated in (M) and return to (E).
>
> Algorithm, this, or not????
>
> And where does "extremely advanced and sophisticated Artificial
> Intelligence software" come into it? You can, in some cases, perform
> the above EM algorithm by hand.
>
> Which "EM Algorithm" are you thinking of?
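
As an illustration of the Start/E/M/converged loop Ted describes, here is a minimal sketch in Python for one toy case: exponentially distributed times, some of them right-censored. Everything specific here is an assumption made for the example and not something from the thread: the function name `em_censored_exponential`, the data, the starting value and the tolerance. The sufficient statistic is the sum of the times, and for a censored value `c` the E step uses E[X | X > c] = c + 1/rate.

```python
def em_censored_exponential(times, censored, rate=1.0, tol=1e-10, max_iter=1000):
    """EM for the rate of an exponential distribution with right-censoring.

    times[i] is the observed time; censored[i] is True when the true value
    is only known to exceed times[i]. (Hypothetical helper for illustration.)
    """
    n = len(times)
    for _ in range(max_iter):
        # E step: expected value of the sufficient statistic sum(x),
        # conditional on the observed information and the current rate.
        expected_total = sum(
            t + (1.0 / rate if c else 0.0) for t, c in zip(times, censored)
        )
        # M step: the complete-data ML estimate is n / sum(x).
        new_rate = n / expected_total
        # Stop once the estimate has (sufficiently) stopped moving.
        if abs(new_rate - rate) < tol:
            return new_rate
        rate = new_rate
    return rate


# Made-up data: three fully observed times, two censored at 2.0.
times = [0.5, 1.2, 2.0, 2.0, 0.3]
censored = [False, False, True, True, False]

rate_hat = em_censored_exponential(times, censored)
# For this model the MLE is also available in closed form, which gives a check:
# (number of uncensored observations) / (sum of all recorded times).
closed_form = censored.count(False) / sum(times)
```

In this toy case the iteration is simple enough to carry out by hand, and `rate_hat` should agree with `closed_form` to within the tolerance.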

Thanks, Ted :-) -- to extend it a bit, one can imagine using
approximate solutions to the two steps (simulation methods to get the
expected values, a similar range of approaches for the maximization)
to get a general (but possibly not robust) computational solution for
the parametric problem. Just plug in a formula for the likelihood and
the sufficient statistics...
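
A rough sketch of the simulation idea, in Python, for a toy right-censored exponential model: the E step is approximated by averaging simulated draws instead of using a formula. The function name `mcem_censored_exponential`, the data, the number of draws and the fixed iteration count are all assumptions chosen for illustration. Because the E step is noisy, the estimate only hovers around the maximum and never converges exactly, which is one reason such schemes may not be robust.

```python
import random


def mcem_censored_exponential(times, censored, rate=1.0,
                              n_draws=2000, n_iter=50, seed=0):
    """Monte Carlo EM for an exponential rate with right-censored times.

    Hypothetical helper for illustration: the E step is a simulation average.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(times)
    for _ in range(n_iter):
        # Approximate E step: simulate each censored value from its
        # conditional distribution given X > c, which for the exponential
        # is c plus a fresh exponential draw (memoryless property).
        expected_total = 0.0
        for t, c in zip(times, censored):
            if c:
                draws = [t + rng.expovariate(rate) for _ in range(n_draws)]
                expected_total += sum(draws) / n_draws
            else:
                expected_total += t
        # M step: complete-data ML estimate, here still in closed form.
        rate = n / expected_total
    return rate


# Made-up data: three fully observed times, two censored at 2.0.
times = [0.5, 1.2, 2.0, 2.0, 0.3]
censored = [False, False, True, True, False]

mc_rate = mcem_censored_exponential(times, censored)
# Exact MLE for comparison: uncensored count / sum of recorded times.
exact_rate = censored.count(False) / sum(times)
```

Here `mc_rate` should land near `exact_rate`, with the gap governed by `n_draws`; in a model without a closed-form M step, a numerical optimizer would take the place of the last line of the loop.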

Of course, thousands of papers have been written on these variations
(likelihood, specific implementations of the E and M steps).

best,
-tony

```
--
rossini@u.washington.edu            http://www.analytics.washington.edu/
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
```