Date: Wed, 05 Dec 2007 12:43:50 -0600

**> I think you did not understand ecdf() !!!
**> It *returns* a function,
**> that you can then apply to old (or new) data; see below
**>
**> MS> You can use ls.str() to look into the function environment:
**>
**> >> ls.str(environment(ecdf(x)))
**> MS> f : num 0
**> MS> method : int 2
**> MS> n : int 25
**> MS> x : num [1:25] -2.215 -1.989 -0.836 -0.820 -0.626 ...
**> MS> y : num [1:25] 0.04 0.08 0.12 0.16 0.2 0.24 0.28 0.32 0.36 0.4 ...
**> MS> yleft : num 0
**> MS> yright : num 1
**>
**>
**>
**> MS> You can then use get() or mget() within the function environment to
**> MS> return the requisite values. Something along the lines of the following
**> MS> within the function percentrank():
**>
**> MS> percentrank <- function(x, val)
**> MS> {
**> MS>   env.x <- environment(ecdf(x))
**> MS>   res <- mget(c("x", "y"), env.x)
**> MS>   Ind <- which(sapply(seq(length(res$x)),
**> MS>                       function(i) isTRUE(all.equal(res$x[i], val))))
**> MS>   res$y[Ind]
**> MS> }
**>
**> sorry Marc, but "Yuck !!"
**>
**> - this percentrank() only works when you apply it to original x[i] values
**> - only works for 'val' of length 1
**> - is a complicated hack
**>
**> and absolutely unneeded (see below)
**>
**> MS> Thus:
**>
**> MS> set.seed(1)
**> MS> x <- rnorm(25)
**>
**> >> x
**> MS> [1] -0.62645381 0.18364332 -0.83562861 1.59528080 0.32950777
**> MS> [6] -0.82046838 0.48742905 0.73832471 0.57578135 -0.30538839
**> MS> [11] 1.51178117 0.38984324 -0.62124058 -2.21469989 1.12493092
**> MS> [16] -0.04493361 -0.01619026 0.94383621 0.82122120 0.59390132
**> MS> [21] 0.91897737 0.78213630 0.07456498 -1.98935170 0.61982575
**>
**>
**> >> percentrank(x, 0.48742905)
**> MS> [1] 0.56
**>
**> [gives 0.52 in my version of R ]
**>
**> Well, that is *THE SAME* as using ecdf() the way you
**> should have used it :
**>
**> ecdf(x)(0.48742905)
**>
**> {in two lines, that is
**>
**> mypercR <- ecdf(x)
**> mypercR(0.48742905)
**>
**> which may be easier to understand, if you have never used the
**> nice concept that underlies all of
**>
**> approxfun(), splinefun() or ecdf()
**> }
**>
**> You can also use
**>
**> ecdf(x)(x)
**>
**> and indeed check that it is identical to the convoluted
**> percentrank() function above :
**>
**> > ecdf(x)(0.48742905)
**> [1] 0.52
**> > ecdf(x)(x)
**> [1] 0.20 0.44 0.12 1.00 0.48 0.16 0.56 0.72 0.60 0.28 0.96 0.52 0.24 0.04 0.92
**> [16] 0.32 0.36 0.88 0.80 0.64 0.84 0.76 0.40 0.08 0.68
**> > all(ecdf(x)(x) == sapply(x, function(v) percentrank(x,v)))
**> [1] TRUE
**> >
**>
**>
**> Regards (and apologies for my apparent indignation ;-)
**> by the author of ecdf() ,
**>
**> Martin Maechler, ETH Zurich

Martin,

Thanks for the corrections. In hindsight, having seen the intended use of ecdf() as you describe above, it is clear that my approach in response to David's query was unneeded and "over the top". "Yuck" is quite appropriate... :-)

As I was going through this "exercise", it did seem overly complicated, given R's usual elegant philosophy about such things. I suppose if I had looked at the source for plot.stepfun(), it would have been more evident as to how the y values are acquired.

In reviewing the examples in ?ecdf, I think that an example showing this usage more explicitly, along the lines of the discussion here, would be helpful. It is not crystal clear from the examples that one can use ecdf() in this fashion, though the use of "12 * Fn(tt)" hints at it.

Perhaps:

##-- Simple didactical ecdf example:

x <- rnorm(12)

Fn <- ecdf(x)

Fn

Fn(x) # returns the empirical CDF value (percent rank, as a proportion) for each x

...
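
To spell out the usage discussed above, here is a slightly fuller sketch. It assumes the same seed and sample size as earlier in the thread; the names Fn and v are only illustrative, and this is not meant as the wording for ?ecdf, just a way to see that the returned function accepts arbitrary (and vector) arguments:

```r
set.seed(1)              # same seed as earlier in the thread
x <- rnorm(25)

Fn <- ecdf(x)            # ecdf() returns a function built from x

Fn(x[7])                 # percent rank of one of the original values
Fn(c(-1, 0, 1))          # vectorized; values need not occur in x

# Fn(v) is the proportion of observations that are <= v
v <- 0.5                 # illustrative value, not taken from the thread
all.equal(Fn(v), mean(x <= v))

# With no ties in x, Fn(x) matches the ranks scaled by n
all.equal(Fn(x), rank(x) / length(x))
```

Since Fn is an ordinary function, it can be stored once and reused on old or new data, which is the same idea behind approxfun() and splinefun().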

Thanks again Martin and no offense taken... :-)

Regards,

Marc

