Re: [R] handling NA by mean replacement

From: James Reilly <reilly_at_stat.auckland.ac.nz>
Date: Tue 31 Jan 2006 - 17:23:27 EST

Here are a couple of documents that make much the same point (e.g. "mean value imputation is not recommended"), and discuss several alternatives.

http://nces.ed.gov/statprog/2002/appendixb3.asp
http://www2.chass.ncsu.edu/garson/pa765/missing.htm

I think we'd need more information on the context to provide any real advice. Another possible source of help is the Impute mailing list: http://lists.utsouthwestern.edu/mailman/listinfo/impute

Cheers,
James

-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

On 31/01/2006 6:20 a.m., Berton Gunter wrote:

> Lots of other folks will give you the simple answer (hint: ?'[' ?is.na)
>
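For reference, the "simple answer" hinted at above (indexing with `[` plus is.na) might look like the following minimal sketch. The data frame `dat`, its column names, and the helper `impute_mean` are hypothetical and invented for illustration; the sketch assumes every column is numeric.

```r
# Hypothetical example data; the column names are invented for illustration.
dat <- data.frame(a = c(1, NA, 3), b = c(NA, 10, 20))

# Replace the NAs in one numeric vector by the mean of its observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply the helper column by column; assigning into dat[] keeps the
# data frame structure instead of returning a plain list.
dat[] <- lapply(dat, impute_mean)
dat
```

This answers the literal question only; whether the replacement is a good idea is the subject of the rest of the message.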
> Yours is one of those "iceberg" questions -- 2/3 hidden underwater.
>
> Two points:
>
> Point 1: Generally you **don't have to do such replacement** as most of R's
> functions have a na.rm or na.action argument (unfortunately, for historical
> reasons, the argument names and meanings aren't consistent) that does
> basically what you want anyway.
>
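A short sketch of what Point 1 seems to refer to, using base R and the stats package only. The vector `v` and data frame `dat` are made-up illustration data. Note how the same idea is spelled three different ways (na.rm, use, na.action), which is presumably the inconsistency mentioned above.

```r
# Made-up data for illustration.
v <- c(2, 4, NA, 6)
mean(v)                 # NA: the missing value propagates by default
mean(v, na.rm = TRUE)   # mean of the three observed values

dat <- data.frame(y = c(1.1, 2.0, NA, 4.2, 4.9),
                  x = c(1, 2, 3, NA, 5))

# Column means, skipping NAs within each column.
colMeans(dat, na.rm = TRUE)

# cor() uses a differently named argument for the same purpose.
cor(dat$x, dat$y, use = "complete.obs")

# Modelling functions take na.action; na.omit drops incomplete rows.
fit <- lm(y ~ x, data = dat, na.action = na.omit)
nobs(fit)               # number of rows actually used in the fit
```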
> Point 2: Doing what you ask is probably a bad idea, as it creates mythical
> degrees of freedom and biases results --> gives wrong statistical answers.
>
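Point 2 can be illustrated with a toy vector (invented for this sketch): filling in the mean leaves the mean unchanged but shrinks the variance, and the sample size looks larger than the number of values actually observed, so standard errors come out too small.

```r
# Toy data: four observed values and two missing ones.
x <- c(1, 2, 3, 4, NA, NA)

# Mean replacement.
x_imp <- x
x_imp[is.na(x_imp)] <- mean(x, na.rm = TRUE)

var(x, na.rm = TRUE)   # variance of the observed values
var(x_imp)             # smaller: the filled-in values add no spread

n_obs <- sum(!is.na(x))   # values really observed
n_imp <- length(x_imp)    # apparent sample size after imputation

# Standard error of the mean, before and after imputation.
se_obs <- sd(x, na.rm = TRUE) / sqrt(n_obs)
se_imp <- sd(x_imp) / sqrt(n_imp)
c(se_obs = se_obs, se_imp = se_imp)
```

The imputed version reports a smaller standard error even though no new information was collected, which is one way to read "mythical degrees of freedom".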
> As a general matter, handling missing values "correctly" is a difficult
> statistical issue that you may want to avoid if you can (R has plenty of
> packages that can deal with it, but it requires background expertise).
> Honestly, I'm not sure "if you can" makes any sense here (how do you know?),
> but let's just say that I think your potential for mischief is reduced if
> you use R's inbuilt arguments for ignoring missings rather than imputing
> them naively.
>
> Having said that, I believe that clustering procedures, for example, may not
> permit this (but they have builtin missing imputation capabilities of their
> own, do they not?), so you may have to impute. In this case, try to do so
> wisely (e.g. via multiple imputation?).
>
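As a rough sketch of what multiple imputation could look like in R: the example below assumes the add-on package mice is installed. That package is not mentioned anywhere in this thread and is only one of several options. The built-in airquality data set and the model formula are chosen purely for illustration.

```r
# Assumption: the contributed package 'mice' is available; it is not
# part of base R and was not named in the original message.
pooled <- NULL
if (requireNamespace("mice", quietly = TRUE)) {
  # Create m = 5 completed copies of the data set.
  imp <- mice::mice(airquality, m = 5, printFlag = FALSE, seed = 1)

  # Fit the same model to each completed copy.
  fits <- with(imp, lm(Ozone ~ Wind + Temp))

  # Combine the five fits so that the standard errors reflect the
  # uncertainty about the imputed values.
  pooled <- summary(mice::pool(fits))
  print(pooled)
}
```

Unlike mean replacement, the pooled standard errors include the variation between the imputed data sets.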
> Perhaps this will stimulate real experts to offer you some advice. Good
> luck.
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech
>
>> -----Original Message-----
>> From: r-help-bounces@stat.math.ethz.ch
>> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Julie Bernauer
>> Sent: Monday, January 30, 2006 8:50 AM
>> To: r-help@stat.math.ethz.ch
>> Subject: [R] handling NA by mean replacement
>>
>> Hello
>>
>> I am sorry for such a stupid question. Suppose I have a table of data
>> having a lot of NAs and I want to replace those NAs by the mean of the
>> column before NA replacement. How is it possible to do that efficiently?
>>
>> Thanks in advance,
>>
>> Julie
>>
>> --
>> Julie Bernauer
>> Yeast Structural Genomics
>> http://www.genomics.eu.org
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
Received on Tue Jan 31 17:33:47 2006

This archive was generated by hypermail 2.1.8 : Tue 31 Jan 2006 - 21:04:45 EST