From: Spencer Graves <spencer.graves_at_pdf.com>

Date: Sun 07 Aug 2005 - 12:10:31 EST

I'm not certain what you are asking. PLEASE do read the posting guide! "http://www.R-project.org/posting-guide.html". If you formulate your question in terms of a simple example, showing where you got stuck as suggested in the posting guide, it might help others understand your question and inspire suggestions.

TANSTAAFL = There ain't no such thing as a free lunch (Heinlein, The Moon Is a Harsh Mistress)

spencer graves

Weiwei Shi wrote:

> Dear listers:

> I have an idea for outlier detection and I need to use R to
> implement it first. Here I hope I can get some input from all the
> gurus here.
> I chose a distance-based approach:
> step 1:
> calculate the distance between every pair of rows of a data frame.
> Considering the scaling among different variables, I chose Mahalanobis,
> using variance as the scaler.
> step 2:
> Let k be the number of points in one "cluster". k is decided by
> answering the following question: how many neighbors does a point need
> in order not to be an outlier?
> For each point, get the smallest (k-1) distances from step 1. Among
> the (k-1) distances of each point, take the maximum for that point.
> step 3:
> get the distribution of those maxima over all the points. Thus, the
> multivariate problem becomes a univariate one. The outliers among
> those maxima then identify the outlying points.
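
Steps 1 to 3 above can be sketched in a few lines of R. This is only a rough illustration under some assumptions that are not in the original post: the distances use the sample covariance of the whole data set (which the outliers themselves will inflate), `knn_max_dist` is a hypothetical helper name rather than a function from any package, and the boxplot rule in the last lines is just one possible way to flag large scores.

```r
# Sketch of steps 1-3: score each row by the largest of its (k-1) smallest
# Mahalanobis distances to the other rows.
# knn_max_dist() is a hypothetical helper, not part of any package.
knn_max_dist <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 2, k <= n)
  S <- cov(x)  # assumption: covariance estimated from all rows
  # Step 1: stats::mahalanobis() returns SQUARED distances from each row of x
  # to 'center'; row i of d2 holds the squared distances from row i.
  d2 <- t(sapply(seq_len(n), function(i) mahalanobis(x, center = x[i, ], cov = S)))
  d <- sqrt(pmax(d2, 0))  # guard against tiny negative rounding errors
  diag(d) <- Inf          # exclude each point's distance to itself
  # Step 2: the (k-1)-th smallest distance for each point.
  apply(d, 1, function(row) sort(row)[k - 1])
}

set.seed(1)
x <- rbind(matrix(rnorm(200), ncol = 2), c(8, 8))  # 100 points plus one far away
score <- knn_max_dist(x, k = 5)

# Step 3: the problem is now univariate; inspect the distribution of scores.
boxplot.stats(score)$out  # scores flagged by the usual boxplot rule
which.max(score)          # index of the row with the largest score
```

The full n-by-n matrix is built here for clarity, so memory grows quadratically with the number of rows; for large data a nearest-neighbour search would be the more practical route.
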
> My questions are:
> 1. I don't know whether using Mahalanobis is proper, since most
> clustering algorithms implemented in R (like pam or clara) use
> Euclidean or Manhattan.
> 2. Is there a way to get the Mahalanobis distance matrix for all pairs
> of rows of a data frame or matrix?
> 3. My approach does allow a point belonging to more than one
> k-cluster. Is there similar algorithm in R or published?
> Thanks for any suggestions,
> weiwei
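
On question 2, one possible way to get all pairwise Mahalanobis distances is to "whiten" the data with the Cholesky factor of the covariance matrix and then call `dist()`, since Euclidean distance on the transformed rows equals Mahalanobis distance on the original rows. The sketch below assumes numeric columns and a nonsingular covariance matrix; `pairwise_mahalanobis` is a hypothetical helper name, not an existing function.

```r
# pairwise_mahalanobis() is a hypothetical helper, not part of any package.
# It returns a "dist" object of Mahalanobis distances between all rows of x.
pairwise_mahalanobis <- function(x, S = cov(x)) {
  x <- as.matrix(x)
  R <- chol(S)         # S = t(R) %*% R, with R upper triangular
  z <- x %*% solve(R)  # whitened rows: cov(z) is the identity when S = cov(x)
  dist(z)              # Euclidean on z equals Mahalanobis on x
}

set.seed(42)
x <- matrix(rnorm(60), ncol = 3)
d <- as.matrix(pairwise_mahalanobis(x))

# Check the first row against stats::mahalanobis(), which returns squared
# distances from every row of x to a single centre.
ref <- sqrt(mahalanobis(x, center = x[1, ], cov = cov(x)))
all.equal(unname(d[1, ]), unname(ref))
```

Because the result is an ordinary `dist` object, it can be handed to functions that accept a dissimilarity, for example `cluster::pam()`, which bears on question 1 as well.
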
