Re: [R] What ECDF function?

From: Shiazy Fuzzy <shiazy_at_gmail.com>
Date: Sun, 10 Jun 2007 00:36:05 +0200

On 6/9/07, Robert A LaBudde <ral_at_lcfltd.com> wrote:
> At 12:57 PM 6/9/2007, Marco wrote:
> ><snip>
> >2. I found various versions of the P-P plot that, instead of using the
> >"ecdf" function, use ((1:n)-0.5)/n.
> > After investigating, I found there are different definitions of the ECDF
> >(note "i" is the rank):
> > * Kaplan-Meier: i/n
> > * modified Kaplan-Meier: (i-0.5)/n
> > * Median Rank: (i-0.3)/(n+0.4)
> > * Herd-Johnson: i/(n+1)
> > * ...
> > Furthermore, similar expressions are used by "ppoints".
> > So,
> > 2.1 For P-P plot, what shall I use?
> > 2.2 In general why should I prefer one kind of CDF over another one?
> ><snip>
>
> This is an age-old debate in statistics. There are many different
> formulas, some of which are optimal for particular distributions.
>
> Using i/n (which I would call the Kolmogorov method), (i-1)/n or
> i/(n+1) is to be discouraged for general ECDF modeling. These
> correspond in quality to the rectangular rule method of integration
> of the bins, and assume only that the underlying density function is
> piecewise constant. There is no disadvantage to using these methods,
> however, if the pdf has multiple discontinuities.
>
> I tend to use (i-0.5)/n, which corresponds to integrating with the
> "midpoint rule", which is a 1-point Gaussian quadrature, and which is
> exact for linear behavior with derivative continuous. It's simple,
> it's accurate, and it is near optimal for a wide range of continuous
> alternatives.
>

Hmm, I'm a bit confused, but very interested! So you don't use R's "ecdf" function, do you?
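
For comparison, here is a minimal sketch of how R's ecdf() relates to the (i-0.5)/n positions and to ppoints(). The sample x, the seed and the normal reference distribution are assumptions made up for illustration; none of them come from this thread.

```r
set.seed(1)               # arbitrary seed, only for reproducibility
x <- rnorm(20)            # hypothetical sample (assumption)
n <- length(x)
i <- rank(x)              # rank of each observation (no ties here)

p_ecdf <- ecdf(x)(x)      # step ECDF evaluated at the data, equals i/n
p_mid  <- (i - 0.5) / n   # midpoint positions
p_pp   <- ppoints(n)[i]   # ppoints() uses a = 1/2 for n > 10, a = 3/8 otherwise

# Theoretical probabilities for a P-P plot against a fitted normal
p_theo <- pnorm(x, mean = mean(x), sd = sd(x))

# First three observations in sorted order, all four columns side by side
print(cbind(p_ecdf, p_mid, p_pp, p_theo)[order(x), ][1:3, ])
```

With n = 20, ppoints(n) coincides with (i-0.5)/n, and plot(p_theo, p_mid); abline(0, 1) would draw the P-P plot from these values.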

> The formula (i-3/8)/(n+1/4) is optimal for the normal
> distribution. However, it is equal to (i-0.5)/n to order 1/n^3, so
> there is no real benefit to using it. Similarly, there is a formula
> (i-0.44)/(n+0.12) for a Gumbel distribution. If you do know for sure
> (don't need to test) the form of the distribution, you're better off
> fitting that distribution function directly and not worrying about the edf.
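
How close (i-3/8)/(n+1/4) and (i-0.5)/n really are can be checked numerically. The sketch below assumes n = 100 (my choice, not from the thread) and relies on ppoints(n, a = 3/8) returning (i-3/8)/(n+1/4).

```r
n <- 100                          # hypothetical sample size (assumption)
i <- 1:n

p_normal <- ppoints(n, a = 3/8)   # (i - 3/8) / (n + 1/4)
p_mid    <- (i - 0.5) / n         # midpoint positions

d <- p_normal - p_mid             # pointwise gap between the two formulas
max_gap <- max(abs(d))            # largest gap over all ranks
where   <- which.max(abs(d))      # rank at which the largest gap occurs

print(max_gap)
print(where)
```

In this sketch the gap vanishes at the middle rank and is largest at the extreme ranks, where it equals (n-1)/(8n(n+1/4)), just under 1/(8n). So the two formulas agree most closely in the centre of the sample and least in the tails.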

>

> Also remember that edfs are not very accurate, so the differences
> between these formulae are difficult to justify in practice.
>

I will bear that in mind! My first interpretation was that using something other than i/n (e.g. i/(n+1))
would make it easier to pick out differences in the tails (maybe...)
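
One concrete tail difference I can see: i/n reaches exactly 1 at the largest observation, so a quantile transform there is infinite, while the other formulas stay strictly inside (0, 1). A small sketch, assuming n = 10 and a normal quantile function purely for illustration:

```r
n <- 10                       # hypothetical sample size (assumption)
i <- 1:n

# Plotting positions from the definitions listed earlier in the thread
pos <- cbind(
  "i/n"             = i / n,
  "(i-0.5)/n"       = (i - 0.5) / n,
  "(i-0.3)/(n+0.4)" = (i - 0.3) / (n + 0.4),
  "i/(n+1)"         = i / (n + 1)
)

tails <- pos[c(1, n), ]       # positions at the smallest and largest rank
z_max <- qnorm(pos[n, ])      # normal quantile at the largest rank

print(round(tails, 4))
print(z_max)
```

Only the "i/n" column gives an infinite quantile at the top rank; the remaining columns differ mainly in how far they pull the extreme points away from 0 and 1.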

Regards,

> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral_at_lcfltd.com
> Least Cost Formulations, Ltd. URL: http://lcfltd.com/
> 824 Timberlake Drive Tel: 757-467-0954
> Virginia Beach, VA 23464-3239 Fax: 757-467-2947

>

> "Vere scire est per causas scire"
>

> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Received on Sat 09 Jun 2007 - 22:42:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 09 Jun 2007 - 23:31:41 GMT.
