RE: [R] Normalization and missing values

From: Berton Gunter <gunter.berton_at_gene.com>
Date: Thu 14 Apr 2005 - 03:07:56 EST


Normalization: see ?scale. More usually, though, standardization is an argument of the clustering function itself: in package "cluster", it is the "stand" argument of the various functions. Other packages may have similar capabilities.
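
A minimal sketch of both routes, in case it helps. The data below are made up for illustration (the column names height_cm and income are hypothetical), and k = 2 is an arbitrary choice. Note that scale() divides by the standard deviation, whereas the "stand" option in package "cluster" divides by the mean absolute deviation, so the two routes need not give identical results.

```r
library(cluster)  # recommended package, normally installed with R

set.seed(1)
# Hypothetical data: two variables measured on very different scales
x <- cbind(height_cm = rnorm(20, mean = 170, sd = 10),
           income    = rnorm(20, mean = 50000, sd = 15000))

# Route 1: standardize explicitly; each column gets mean 0 and sd 1
xs <- scale(x)
col_means <- colMeans(xs)
col_sds   <- apply(xs, 2, sd)

# Route 2: let the clustering function standardize internally
# (mean 0, mean absolute deviation 1 for each column)
fit <- pam(x, k = 2, stand = TRUE)
table(fit$clustering)
```

Either way, the point is that no single variable dominates the dissimilarities merely because of its units.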

Missing Values: A HUGE and COMPLEX issue. One Reference: ANALYSIS OF INCOMPLETE MULTIVARIATE DATA by J.L. Schafer (Chapman and Hall); Donald Rubin has published several books and many papers on this, so anything by him is another good resource.

Setting missings to 0 will clearly produce nonsense: two cases with lots of missings in corresponding coordinates will cluster together when there is no reason for them to do so. Set them to NA instead. However, some clustering routines work only with complete cases, so this might leave you with a data set of size 0. You therefore need clustering methods that can work with missing data, e.g. pam, clara, etc. Even then, of course, one doesn't quite know what to make of two cases that are deemed to be "close" on the basis of, say, 10% of their coordinates being nonmissing in both, as compared to cases that are close based on all coordinates. You can't expect statistical procedures to rescue you from poor data.
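
To make the zero-filling problem concrete, here is a small sketch with three made-up cases (a, b, c are hypothetical, already on a standardized scale). It uses base R's dist() as a stand-in for whatever dissimilarity your clustering routine computes; dist() skips coordinates that are missing in either case and rescales the sum by the proportion of columns used.

```r
# Hypothetical standardized data: cases a and b are observed on one coordinate only
x <- rbind(a = c( 1, NA, NA, NA, NA),
           b = c(-1, NA, NA, NA, NA),
           c = c( 1,  2, -2,  2, -2))

# Zero-filling: a and b look close only because their missing
# coordinates have all been set to the same value
x0 <- x
x0[is.na(x0)] <- 0
d0 <- as.matrix(dist(x0))

# Keeping NA: distances use only coordinates observed in both cases
dna <- as.matrix(dist(x))

# Number of coordinates observed in both cases, for each pair
obs    <- !is.na(x)
shared <- (obs * 1) %*% t(obs * 1)

d0["a", "b"] < d0["a", "c"]    # zero-filled: a is nearer to b than to c
dna["a", "c"] < dna["a", "b"]  # NA-aware: a is nearer to c than to b
shared
```

On the one coordinate they share, a and c agree exactly while a and b are opposite, yet zero-filling reverses that ordering. The shared matrix also shows the caveat above: the a-c "closeness" rests on a single coordinate out of five.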

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box    

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Chris
> Bergstresser
> Sent: Wednesday, April 13, 2005 9:37 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Normalization and missing values
>
> Hi all --
>
> I've got a large dataset which consists of a bunch of different
> scales, and I'm preparing to perform a cluster analysis. I need to
> normalize the data so I can calculate the difference matrix.
> First, I didn't see a function in R which does
> normalization -- did
> I miss it? What's the best way to do it?
> Second, what's the best way to deal with missing values?
> Obviously,
> I could just set them to 0 (the mean of the normalized
> scales), but I'm
> not sure that's the best way.
>
> -- Chris
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list

> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>



Received on Thu Apr 14 03:16:38 2005
