Re: [R] Fitdistr() versus nls()

From: Prof Brian Ripley <>
Date: Sun 24 Sep 2006 - 07:08:19 GMT

On Sat, 23 Sep 2006, Luca Telloli wrote:

> Hello R-Users,
> I'm new to R so I apologize in advance for any big mistake I might
> be doing. I'm trying to fit a set of samples with some probabilistic
> curve, and I have an important question to ask; in particular I have
> some data, from which I calculate manually the CDF, and then I import
> them into R and try to fit: I have the x values (my original samples)
> and the y values (P(X<x)).
> To attempt the fit I've both fitdistr() and nls(), in the way you
> can read in the piece of code at the end of the email. Because the
> fit with all data doesn't work very well, I decided to take a subset
> of samples randomly chosen (for some random x, the correspondant y is
> chosen).
> The first big problem is that fitdistr and nls, in the way I use
> them in the code, they don't get me similar results. I think I'm
> making a mistake but I can't really understand which one.
> From this first issue, a second one arises because the plot with nls
> is similar (maybe not a great fit bust still...) to my original CDF
> while the plot of fitdistr is basically a straight line of constant
> value y=1. On the other side, the fitdistr outperforms in the
> Kolmogorov-Smirnov test, which for nls has values of probability
> around 0 (not a good score huh?).
> Can u please tell me if there's a major mistake in the code?

Yes: you fit distribution models to x, not to the CDF of x. So the call to fitdistr and those to ks.test should be of somevals.x not somevals.y.

Using nls here is fitting an exponentional to the CDF by least-squares. This is not very sensible for an exponentional distribution where the MLE is known in closed form.

Using a KS test after estimating parameters is invalid unless you make adjustments (ks.test does not), and suitable adjustments are AFAIK only known for ML estimation. (The help page does say so and point you to the literature.)

> Thanks in advance,
> Luca
> ----------------------------------------------------------------
> cdf.all=read.table("all_failures.cdf", header=FALSE, col.names=c
> ("ttr", "cdf"), sep=":" )
> allvals.x=array(t(cdf.all[1]))
> allvals.y=array(t(cdf.all[2]))

What is the intention here? I suspect you want cdf.all[[1]] etc.

> library(MASS)
> plot(allvals.x, allvals.y)
> for(it in 1:100){
> #extract random samples
> random=sort(sample(1:length(allvals.x), 15))
> somevals.x=allvals.x[c(random)]
> somevals.y=allvals.y[c(random)]
> #fit with nls and fitdistr
> fit.exp = fitdistr(somevals.y, "exponential")
> nls.exp <- nls(somevals.y ~ pexp(somevals.x, rate), start=list(rate=.
> 0001), model=TRUE)
> #plot what you get out of the two fits
> lines(allvals.x, pexp(allvals.x, coef(fit.exp)), col=it)
> lines(allvals.x, pexp(allvals.x, coef(nls.exp)), col=it)
> #perform kolmogorov-smirnov test on your fit
> ks.exp.nls = ks.test(somevals.y, "pexp", coef(nls.exp))
> = ks.test(somevals.y, "pexp", coef(fit.exp))
> = max(,$p.value)
> bestval.exp.nls = max(bestval.exp.nls, ks.exp.nls$p.value)
> }
> print(
> print(bestval.exp.nls)
> ----------END OF
> CODE--------------------------------------------------------------------

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
Received on Sun Sep 24 17:15:50 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sun 24 Sep 2006 - 09:30:06 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.