Re: [R] What ECDF function?

From: Shiazy Fuzzy <>
Date: Sun, 10 Jun 2007 00:36:05 +0200

On 6/9/07, Robert A LaBudde <> wrote:
> At 12:57 PM 6/9/2007, Marco wrote:
> ><snip>
> >2. I found various versions of the P-P plot which, instead of using the
> >"ecdf" function, use ((1:n)-0.5)/n
> > After investigation I found there are different definitions of the ECDF
> >(note "i" is the rank):
> > * Kaplan-Meier: i/n
> > * modified Kaplan-Meier: (i-0.5)/n
> > * Median Rank: (i-0.3)/(n+0.4)
> > * Herd-Johnson: i/(n+1)
> > * ...
> > Furthermore, similar expressions are used by "ppoints".
> > So,
> > 2.1 For P-P plot, what shall I use?
> > 2.2 In general why should I prefer one kind of CDF over another one?
> ><snip>
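
For reference, the definitions listed above can be tabulated side by side in R. This is only a sketch: the function name plotting_positions and its column names are made up for illustration, and "i" is taken to be the rank 1..n of the sorted sample.

```r
# Hypothetical helper (not part of base R): tabulate the plotting
# positions quoted above for a sample of size n, with rank i = 1..n.
plotting_positions <- function(n) {
  i <- seq_len(n)
  data.frame(
    i            = i,
    kaplan_meier = i / n,                  # i/n
    modified_km  = (i - 0.5) / n,          # (i-0.5)/n
    median_rank  = (i - 0.3) / (n + 0.4),  # (i-0.3)/(n+0.4)
    herd_johnson = i / (n + 1)             # i/(n+1)
  )
}

pp <- plotting_positions(5)
print(round(pp, 3))
```

Of the four, only i/n reaches exactly 1 at the largest observation; the others stay strictly inside (0, 1), which matters when the positions are fed to a quantile function.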


> This is an age-old debate in statistics. There are many different
> formulas, some of which are optimal for particular distributions.

> Using i/n (which I would call the Kolmogorov method), (i-1)/n or
> i/(n+1) is to be discouraged for general ECDF modeling. These
> correspond in quality to the rectangular rule method of integration
> of the bins, and assume only that the underlying density function is
> piecewise constant. There is no disadvantage to using these methods,
> however, if the pdf has multiple discontinuities.

> I tend to use (i-0.5)/n, which corresponds to integrating with the
> "midpoint rule", which is a 1-point Gaussian quadrature, and which is
> exact for linear behavior with a continuous derivative. It's simple,
> it's accurate, and it is near optimal for a wide range of continuous
> alternatives.

Hmmm I'm a bit confused, but very interested! So you don't use the R "ecdf", do you?
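
To make the comparison concrete for myself, here is a small sketch of the P-P coordinates under both conventions. The simulated normal sample, the seed and the variable names are assumptions for illustration only. R's ecdf() evaluated at the sorted data gives i/n when there are no ties, so the two sets of positions differ by a constant 0.5/n.

```r
set.seed(1)                      # arbitrary seed, illustration only
x <- sort(rnorm(20))             # simulated sample, assumed tie-free
n <- length(x)
i <- seq_len(n)

p_ecdf <- ecdf(x)(x)             # step ECDF at the sorted data: i/n
p_mid  <- (i - 0.5) / n          # midpoint positions

# Theoretical probabilities under a normal fitted by sample mean and sd
p_theo <- pnorm(x, mean = mean(x), sd = sd(x))

# P-P plot using the midpoint positions; swap in p_ecdf to compare
plot(p_theo, p_mid,
     xlab = "Theoretical probability", ylab = "Empirical probability",
     xlim = c(0, 1), ylim = c(0, 1))
abline(0, 1)                     # reference line for a perfect fit
```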

> The formula (i- 3/8)/(n + 1/4) is optimal for the normal
> distribution. However, it is equal to (i-0.5)/n to order 1/n^3, so
> there is no real benefit to using it. Similarly, there is a formula
> (i-0.44)/(n+0.12) for a Gumbel distribution. If you do know for sure
> (don't need to test) the form of the distribution, you're better off
> fitting that distribution function directly and not worrying about the edf.
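
As far as I can tell, these formulas are all members of the family (i - a)/(n + 1 - 2a) that ppoints() exposes through its "a" argument, so a = 3/8 and a = 0.44 reproduce the two expressions quoted above and a = 1/2 gives (i-0.5)/n. A sketch, with n = 50 chosen arbitrarily, that checks this and reports how far each one sits from (i-0.5)/n:

```r
n <- 50                              # arbitrary sample size, illustration only
i <- seq_len(n)

p_mid    <- ppoints(n, a = 1/2)      # (i - 0.5)/n
p_normal <- ppoints(n, a = 3/8)      # (i - 3/8)/(n + 1/4)
p_gumbel <- ppoints(n, a = 0.44)     # (i - 0.44)/(n + 0.12)

# Check ppoints() against the formulas written out by hand
stopifnot(isTRUE(all.equal(p_normal, (i - 3/8) / (n + 1/4))))
stopifnot(isTRUE(all.equal(p_gumbel, (i - 0.44) / (n + 0.12))))

# Largest gap between each variant and the midpoint positions
max(abs(p_normal - p_mid))
max(abs(p_gumbel - p_mid))
```

For n = 50 both gaps are a fraction of 1/n, which is small next to the sampling variability of an edf at that sample size.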


> Also remember that edfs are not very accurate, so the differences
> between these formulae are difficult to justify in practice.

I will bear that in mind! My first interpretation was that using something different from i/n (e.g. i/(n+1))
would make it easier to pick out differences in the tails (maybe...)


> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail:
> Least Cost Formulations, Ltd. URL:
> 824 Timberlake Drive Tel: 757-467-0954
> Virginia Beach, VA 23464-3239 Fax: 757-467-2947


> "Vere scire est per causas scire"

Received on Sat 09 Jun 2007 - 22:42:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 09 Jun 2007 - 23:31:41 GMT.
