Re: [R] some thoughts on outlier detection, need help!

From: Spencer Graves <spencer.graves_at_pdf.com>
Date: Sun 07 Aug 2005 - 12:10:31 EST

          I'm not certain what you are asking. PLEASE do read the posting guide! "http://www.R-project.org/posting-guide.html". If you formulate your question in terms of a simple example, showing where you got stuck as suggested in the posting guide, it might help others understand your question and inspire suggestions.

          TINSTAFL = There is no such thing as a free lunch (Heinlein, The Moon is a Harsh Mistress)

          spencer graves

Weiwei Shi wrote:

> Dear listers:
> I have an idea for outlier detection, and I need to implement it in R
> first. I hope to get some input from all the gurus here.
>
> I chose a distance-based approach:
> step 1:
> Calculate the distance between every pair of rows of a data frame.
> Because the variables are on different scales, I chose the Mahalanobis
> distance, using the variance as the scaler.
>
> step 2:
> Let k be the number of points in one "cluster". k is decided by
> answering the following question: how many neighbors does a point need
> in order not to be an outlier?
>
> For each point, take its smallest (k-1) distances from step 1, and
> keep the maximum of those (k-1) distances for that point.
>
> step 3:
> Get the distribution of those maxima over all the points. Thus, the
> multivariate problem becomes a univariate one: the outliers among
> those maxima identify the outlying points.
>
> My questions are:
> 1. I don't know whether using the Mahalanobis distance is appropriate,
> since most clustering algorithms implemented in R (like pam or clara)
> use Euclidean or Manhattan distance.
> 2. Is there a way to get the Mahalanobis distance matrix between all
> pairs of rows of a data frame or matrix?
> 3. My approach allows a point to belong to more than one k-cluster.
> Is there a similar algorithm in R or in the published literature?
>
> Thanks for any suggestions,
>
> weiwei
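
A minimal sketch for question 2 (and step 1), in plain Python with no third-party libraries so it stays self-contained; the function names are illustrative, not from any package. The Mahalanobis distance between rows x and y is sqrt((x - y)' S^-1 (x - y)), where S is the sample covariance matrix of the data. Scaling each squared difference by that variable's variance alone, as step 1 describes, corresponds to the `diagonal=True` case below, i.e. Euclidean distance on columns standardized by their standard deviations. In R itself, `mahalanobis(x, center, cov)` returns squared distances of every row of `x` from a single `center`, so one way to build the full matrix is to call it once with each row as the center and take square roots.

```python
import math


def covariance(rows):
    """Sample covariance matrix (denominator n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(m)]
    for col in range(size):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[size:] for row in aug]


def mahalanobis_matrix(rows, diagonal=False):
    """Pairwise Mahalanobis distances between all rows.

    With diagonal=True only the variances are kept, which equals Euclidean
    distance on columns standardized by their standard deviations.
    """
    cov = covariance(rows)
    p = len(cov)
    if diagonal:
        cov = [[cov[a][a] if a == b else 0.0 for b in range(p)] for a in range(p)]
    inv = invert(cov)
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [rows[i][k] - rows[j][k] for k in range(p)]
            q = sum(d[a] * inv[a][b] * d[b] for a in range(p) for b in range(p))
            # Clamp tiny negative rounding errors before the square root.
            dist[i][j] = dist[j][i] = math.sqrt(max(q, 0.0))
    return dist
```

Because S^-1 absorbs each variable's units, rescaling a column (say, metres to centimetres) leaves these distances unchanged, which is the property that makes the Mahalanobis distance attractive when variables sit on different scales; Euclidean or Manhattan distance on unscaled data lacks it.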
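
Steps 2 and 3 of the proposal can be sketched independently of the metric, taking any symmetric distance matrix (Mahalanobis, Euclidean or Manhattan) as input; this is plain Python with illustrative names, not an existing R or Python function. The step 2 score of a point is simply its distance to its (k - 1)-th nearest neighbour. The post does not say which univariate rule to use in step 3, so the sketch assumes Tukey's upper boxplot fence, Q3 + 1.5 * IQR, and flags only large scores, since a small maximum means the point has close neighbours. R's `boxplot.stats()` applies a similar 1.5 * IQR rule based on Tukey's hinges, so its results may differ slightly on small samples. Each point's neighbourhood is computed on its own, so one point can fall in many neighbourhoods, as question 3 allows.

```python
import statistics


def k_distances(dist, k):
    """Step 2: for each point, the largest of its (k - 1) smallest distances
    to the other points, i.e. the distance to its (k - 1)-th nearest neighbour.

    dist is a symmetric n x n matrix with zeros on the diagonal. As in the
    post, k counts the point itself, so 2 <= k <= n is required.
    """
    n = len(dist)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of points")
    result = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        result.append(others[k - 2])  # the max of the k - 1 smallest distances
    return result


def flag_outliers(scores, coef=1.5):
    """Step 3 (assumed rule): indices of scores above Q3 + coef * IQR."""
    # Linear-interpolation quartiles; the minimum and maximum are the 0th and 100th percentiles.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    fence = q3 + coef * (q3 - q1)
    return [i for i, s in enumerate(scores) if s > fence]
```

For example, with `pts = [0.0, 1.0, 2.0, 3.0, 4.0, 20.0]` on a line and `D = [[abs(a - b) for b in pts] for a in pts]`, `k_distances(D, 3)` gives `[2.0, 1.0, 1.0, 1.0, 2.0, 17.0]`, and `flag_outliers` on that list returns `[5]`, the isolated point at 20.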

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves@pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sun Aug 07 12:17:47 2005
