# [R] Re: Hoaglin Outlier Method

From: Kenneth Hobson <khobson_at_aaahawk.com>
Date: Fri 06 May 2005 - 08:57:59 EST

boxplot.stats seems somewhat helpful, but it is not the full answer to my need to eliminate outliers. Any other suggestions?

In the first post I mentioned Appendix A from http://trb.org/publications/nchrp/nchrp_w71.pdf . They used X and Y variables, whereas boxplot.stats uses just one variable. Can boxplot.stats use two variables? X and Y in this case are two samples that are usually from the same population.
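As far as I can tell, boxplot.stats only accepts a single numeric vector, so one workaround would be to call it once per column, and perhaps once more on a derived one-dimensional quantity such as the difference X - Y. The sketch below does that on a made-up data frame. The names `d` and `is_out` and all of the values are hypothetical (this is not the NCHRP data), and I am not claiming this reproduces the Appendix A procedure.

```r
# Hypothetical toy data: two results per lab (made up, not the NCHRP data)
d <- data.frame(Lab = 1:10,
                X = c(1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.95, 1.05, 5.0, 1.0),
                Y = c(1.1, 1.0, 1.0, 1.1, 0.9, 1.2, 1.00, 3.00, 5.1, 1.0))

# Hypothetical helper: TRUE where a value is listed in boxplot.stats()$out
is_out <- function(v, coef = 1.5) v %in% boxplot.stats(v, coef = coef)$out

# boxplot.stats() is univariate, so check X, Y and the difference separately
flag <- is_out(d$X) | is_out(d$Y) | is_out(d$X - d$Y)

d[flag, ]    # labs flagged on at least one of the three checks
d[!flag, ]   # labs that pass all three
```

Checking the difference is only one possible choice; the sum or the mean of the two samples could be passed to `is_out` in the same way.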

I've posted some example code below. The first part is the same as posted earlier but a little easier to paste for testing. The guts of what I tried follow. I used two iterations, and they found 3 of the 4 outliers identified in Appendix A.

```r
# Data from NCHRP Appendix A - http://trb.org/publications/nchrp/nchrp_w71.pdf
T314 <- structure(list(Lab = as.integer(c(1:60)), X = c(4.89, 3.82, 2.57,
2.3, 2.034, 2, 1.97, 1.85, 1.85, 1.85, 1.84, 1.82, 1.82, 1.77, 1.76, 1.67, 1.66,
1.63, 1.62, 1.62, 1.55, 1.54, 1.54, 1.53, 1.53, 1.44, 1.428, 1.42, 1.39, 1.36,
1.35, 1.31, 1.28, 1.24, 1.24, 1.23, 1.22, 1.21, 1.19, 1.18, 1.18, 1.18, 1.17,
1.16, 1.13, 1.13, 1.099, 1.09, 1.09, 1.08, 1.07, 1.05, 0.98, 0.97, 0.84, 0.808,
0.69, 0.63, 0.6, 0.5), Y = c(5.28, 3.82, 2.41, 2.32, 2.211, 1.46, 2.24, 1.91,
1.78, 1.63, 1.81, 1.92, 1.2, 1.67, 1.28, 1.59, 1.45, 2.06, 1.91, 1.19, 1.26,
1.79, 1.39, 1.48, 0.72, 1.29, 1.517, 1.71, 1.12, 1.38, 0.93, 1.36, 1.2, 1.23,
0.71, 1.29, 1.26, 1.48, 1.26, 1.33, 1.21, 1.04, 1.57, 1.42, 1.08, 1.04, 1.33,
1.33, 1.2, 1.05, 1.24, 0.91, 0.99, 1.06, 1.27, 0.702, 0.77, 0.58, 1, 0.38)),
.Names = c("Lab", "X", "Y"), class = "data.frame", row.names = as.character(c(1:60)))

# Eliminate outliers in X, sample 1
bs <- boxplot.stats(T314$X, coef = 1.5)
bs.out <- bs$out
zX <- subset(T314, !T314$X %in% bs.out)
bs.out; nrow(zX); zX

# Eliminate outliers in X, sample 1, again (recheck, in other words)
bs <- boxplot.stats(zX$X, coef = 1.5)
bs.out <- bs$out
zX <- subset(zX, !zX$X %in% bs.out)
bs.out; nrow(zX); zX
```
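The two passes above could also be written as a loop that repeats the check until boxplot.stats reports nothing further. Below is a minimal sketch of that idea. The function name `trim_outliers`, its `max.iter` argument and the `toy` data frame are all made up for illustration, and the toy values are not the NCHRP data. With the real data the call would be `trim_outliers(T314, "X")`.

```r
# Hypothetical helper (name and arguments are mine): drop rows whose `col`
# value is flagged by boxplot.stats(), and repeat until nothing is flagged
# or max.iter passes have been made.
trim_outliers <- function(df, col, coef = 1.5, max.iter = 10) {
  removed <- df[0, , drop = FALSE]        # rows dropped so far (starts empty)
  for (i in seq_len(max.iter)) {
    out <- boxplot.stats(df[[col]], coef = coef)$out
    if (length(out) == 0) break           # nothing flagged on this pass
    bad <- df[[col]] %in% out
    removed <- rbind(removed, df[bad, , drop = FALSE])
    df <- df[!bad, , drop = FALSE]
  }
  list(kept = df, removed = removed)
}

# Toy data, made up for illustration (not the NCHRP values)
toy <- data.frame(Lab = 1:12,
                  X = c(10, 2.7, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9))
res <- trim_outliers(toy, "X")
res$removed      # rows dropped, in the order they were found
nrow(res$kept)   # number of rows that survive
```

In the toy data the value 2.7 is only flagged on the second pass, once 10 has been removed and the hinges have moved. The `max.iter` cap is there because repeated trimming keeps narrowing the fences and can start removing values that are not really unusual.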


R-help@stat.math.ethz.ch mailing list