[R] standardization of values before call to pam() or clara()

From: Dylan Beaudette <dylan.beaudette_at_gmail.com>
Date: Tue 23 May 2006 - 10:33:47 EST


I am experimenting with the cluster package, and am starting to scratch my head over the *best* way to standardize my data. Both functions can pre-standardize the columns of a data frame. According to the manual:

Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation.
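
As a rough sketch of how I read that description, the same standardization can be written in base R. The helper name mad_standardize and the toy data are made up for illustration and are not part of the cluster package:

```r
# Hypothetical helper: standardize each column by subtracting its mean and
# dividing by its mean absolute deviation, following the manual's wording.
mad_standardize <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x), "-")   # subtract column means
  mean_abs_dev <- colMeans(abs(centered))     # mean absolute deviation per column
  sweep(centered, 2, mean_abs_dev, "/")       # divide each column by its deviation
}

# Toy data (made up): two variables on very different scales
toy <- data.frame(a = c(1, 2, 3, 4, 5), b = c(100, 300, 200, 500, 400))
z <- mad_standardize(toy)

colMeans(z)        # each column is now centered on 0
colMeans(abs(z))   # and has a mean absolute deviation of 1
```

In pam() and clara() this step is requested with the stand = TRUE argument rather than done by hand.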

This works well when input variables are all in the same units. When I include new variables with a different intrinsic range, the ones with the largest relative values tend to be _weighted_ more heavily. This is certainly not surprising, but it complicates things.

Does there exist a robust technique to effectively re-scale each of the variables, regardless of their intrinsic range, to some set range, say [0, 1]?

I have tried dividing a variable by the maximum value of that variable, but I am not sure if this is statistically correct.
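
To make the comparison concrete, here is a minimal sketch of the two rescalings in base R. The helper name rescale01 and the toy data are made up for illustration, and the sketch assumes every column is numeric and not constant:

```r
# Hypothetical helper: linearly map each column onto [0, 1].
# Assumes every column is numeric and not constant (max > min).
rescale01 <- function(x) {
  x <- as.matrix(x)
  mins <- apply(x, 2, min)
  ranges <- apply(x, 2, max) - mins
  sweep(sweep(x, 2, mins, "-"), 2, ranges, "/")
}

# Toy data (made up): two variables on very different scales
toy <- data.frame(a = c(1, 2, 3, 4, 5), b = c(100, 300, 200, 500, 400))

r <- rescale01(toy)
apply(r, 2, range)   # every column now spans 0 to 1

# Dividing by the maximum alone, for comparison: the largest value becomes 1,
# but the smallest becomes min/max, which is 0 only if the column minimum is 0.
d <- sweep(as.matrix(toy), 2, apply(toy, 2, max), "/")
apply(d, 2, range)
```

Both versions depend on the column minimum and maximum, so a single extreme value changes the scaling of the whole column. That makes neither of them robust in the statistical sense.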

Any ideas or thoughts would be greatly appreciated.


Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis

R-help@stat.math.ethz.ch mailing list
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue May 23 11:33:02 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.