From: Jim Lemon <bitwrit_at_ozemail.com.au>

Date: Sun 14 Aug 2005 - 07:28:29 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Aug 13 21:28:48 2005

Weiwei Shi wrote:

> Hi, there:
>
> I think i need to re-phrase my question since last time I did not get
> any reply but i think the question is not that hard, probably i did
> not make the question clear:
>
> I want to find cases like
> 35, 90, 330, 330, 335
>
> from the rest which look like
> 3, 3, 3, 3.2, 3.3
> 4, 4.4, 4.5, 4.6, 4.7
> ....
>
> basically there is one (or more) big 'gap' in the case i seek.

Hi Weiwei,

I think your method of defining a central value for the large proportion of values and then setting a criterion for outliers is valid (or at least as valid as many other ways of defining outliers). However, here is a different method: sort the vector of values and then look for a "gap" that is at least a specified multiple (gap.prop) of the mean difference between the smaller values. It returns the first value after the "gap" (easily changed to all the values after). To account for vectors that have negative values, the minimum value is subtracted when calculating "newx" and then added back to the result. For your data, a gap.prop of 20 works, but the default value of 10 doesn't. It also won't work where large values are typical and small ones are the outliers (well, it will still indicate where the "gap" is).

Jim

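find.first.gap<-function(x,gap.prop=10) {
 lenx<-length(x)
 # shift the sorted values so that the smallest is zero
 newx<-sort(x)-min(x)
 not.found<-TRUE
 gap.pos<-2
 # set the initial mean difference to the first difference
 mean.diff<-newx[2]-newx[1]
 while(not.found && gap.pos <= lenx) {
  this.diff<-newx[gap.pos]-newx[gap.pos-1]
  # a zero mean difference cannot be used as a divisor, so skip the test
  if(mean.diff != 0 && this.diff/mean.diff >= gap.prop) not.found<-FALSE
  else {
   gap.pos<-gap.pos+1
   # fold this difference into the running mean difference
   mean.diff<-(this.diff+mean.diff*(gap.pos-1))/gap.pos
  }
 }
 # no difference reached gap.prop times the mean difference
 if(not.found) return(NA)
 # undo the shift to get the first value after the gap
 return(newx[gap.pos]+min(x))
}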
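
The variant mentioned above, returning all the values after the "gap" rather than only the first, could be sketched roughly as follows. This is only a sketch and not part of the original function: the name values.after.gap is made up for illustration, it compares each difference with the plain (unweighted) mean of the differences before it, which is a slightly different running mean from the one in find.first.gap, and the input in the last line is invented test data.

```r
# Hypothetical helper (not from the original post): returns every value
# that lies after the first "gap", or an empty vector if no gap is found.
values.after.gap <- function(x, gap.prop = 10) {
  sx <- sort(x)
  d <- diff(sx)                       # differences between sorted neighbours
  if (length(d) < 2) return(sx[0])    # too few values to compare
  # plain mean of the differences that precede each candidate gap
  prev.mean <- cumsum(d)[-length(d)] / seq_len(length(d) - 1)
  ratio <- d[-1] / prev.mean          # Inf or NaN where prev.mean is zero
  # ignore positions where the mean so far is zero, as find.first.gap does
  hit <- which(is.finite(ratio) & ratio >= gap.prop)
  if (length(hit) == 0) return(sx[0])
  # ratio[j] tests the difference between sx[j+1] and sx[j+2]
  sx[(hit[1] + 2):length(sx)]
}

# invented example data with one obvious gap
values.after.gap(c(1, 2, 3, 4, 100, 101))   # returns 100 101
```

Returning an empty vector when nothing qualifies keeps the result usable with length() instead of needing a check for NA.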

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 15:18:24 EST