[R] a question about LMS and what constitutes outliers

From: Rajarshi Guha <rxg218_at_psu.edu>
Date: Fri 07 Oct 2005 - 04:57:35 EST

I have been using the lqs function with method='lms'. However, the results I get are a little different from those reported by Rousseeuw & Leroy (Robust Regression and Outlier Detection), and I was wondering how to use these results for outlier detection.

I'm using the stackloss dataset, for which the original Rousseeuw et al. program points out that observations 1,2,3,4 and 21 are outliers.

This conclusion is arrived at by testing whether the residual is greater than 2.5 * standard error.

Next I ran lqs as:

library(MASS)
m <- lqs(stackloss[,-4], stackloss[,4], method='lms',
         control=list(psamp=4, nsamp='exact', adjust=TRUE))

(I ran it exhaustively since that was how I ran the original program
from Rousseeuw)

The coefficients obtained from lqs() are more or less identical to those obtained by the original program. However, the scale estimates do not match. I assume that this is because of the per-sample adjustments.
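For reference, a minimal sketch of how the two scale estimates returned by lqs() can be inspected. It assumes lqs() comes from the MASS package and that, for method='lms', m$scale holds two values: the first based on the fitted criterion, the second based on the residuals whose absolute value is below 2.5 times the first. The names s_crit and s_rew are only illustrative.

```r
library(MASS)  # assumed source of lqs()

# Same exhaustive LMS fit as in the post
x <- as.matrix(stackloss[, -4])
y <- stackloss[, 4]
m <- lqs(x, y, method = "lms",
         control = list(psamp = 4, nsamp = "exact", adjust = TRUE))

# Two scale estimates are returned for method = "lms"
s_crit <- m$scale[1]  # based on the fitted criterion
s_rew  <- m$scale[2]  # based on residuals within 2.5 * s_crit

print(coef(m))
print(m$scale)
```

Either value could be the one to compare against the scale printed by the original program; which one corresponds is not something this sketch settles.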

Now if I want to decide whether an observation is an outlier I use the condition

which( abs(m$resid) > 2.5 * m$scale[1] )

and this gives me

 1 2 3 4 8 13 14 20 21
 1 2 3 4 8 13 14 20 21

Now, it includes the original outliers as noted by Rousseeuw, but also 4 extra ones. From a plot of the residuals I can see obs 13, 14 and 20 possibly being regarded as outliers, but 8 seems a stretch.

I tried evaluating the above condition with m$scale[2] but I get the same result. I also tried running lqs() with adjust=FALSE in which case using the above condition obs 1,2,3,4,13,20,21 are regarded as outliers.
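The comparisons described above can be collected in one place. This is only a sketch: flag_outliers() is a hypothetical helper, not part of MASS, and it simply applies the 2.5 * scale cutoff from the post to each fit and each scale estimate.

```r
library(MASS)  # assumed source of lqs()

x <- as.matrix(stackloss[, -4])
y <- stackloss[, 4]

# Hypothetical helper: indices with |residual| > k * chosen scale estimate
flag_outliers <- function(fit, k = 2.5, which_scale = 1) {
  unname(which(abs(fit$residuals) > k * fit$scale[which_scale]))
}

# Exhaustive LMS fits with and without the per-sample adjustment
fit_adj <- lqs(x, y, method = "lms",
               control = list(psamp = 4, nsamp = "exact", adjust = TRUE))
fit_noadj <- lqs(x, y, method = "lms",
                 control = list(psamp = 4, nsamp = "exact", adjust = FALSE))

flags <- list(
  adj_scale1   = flag_outliers(fit_adj,   which_scale = 1),
  adj_scale2   = flag_outliers(fit_adj,   which_scale = 2),
  noadj_scale1 = flag_outliers(fit_noadj, which_scale = 1),
  noadj_scale2 = flag_outliers(fit_noadj, which_scale = 2)
)
print(flags)

# Standardized residuals show how close each observation is to the cutoff
print(round(abs(fit_adj$residuals) / fit_adj$scale[1], 2))
```

Observations whose standardized residual sits just above or below 2.5 are the ones likely to move in and out of the flagged set as the scale estimate changes.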

So my questions are

  1. Am I correct in using the above condition to determine whether an observation is an outlier?
  2. If so, is it correct that lqs() will detect more outliers than noted by the original book/program?


Rajarshi Guha <rxg218_at_psu.edu> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE

After an instrument has been assembled, extra components will be found on the bench.

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Oct 07 05:00:51 2005
