From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Sat, 23 Feb 2008 09:24:21 +0000 (GMT)

Date: Sat, 23 Feb 2008 09:24:21 +0000 (GMT)

Your example code is asserting that the events occurred at the times in 'obs.time', not before those times.

Surv(event = 1) means uncensored. If you try event = 0, the fitter diverges towards an exact fit, as I said it should.

Yes, you will get a good fit to the ECDF, but you are modelling the observation times, not the event times (and asserting that a continuous distribution fits discrete data).

I suggest you seek local statistical help: these are conceptual rather than R issues.

(BTW, using a .com address suggests you are a COMpany and in the absence of a signature block that will influence how much free help you are offered.)

On Sat, 23 Feb 2008, ahimsa campos-arceiz wrote:

> Dear Prof Ripley and Dimitris (cc R-list),

*>
**> thank you very much for your very insightful responses.
**>
**> I've been checking how to use survreg{survival} to fit a left-censored
**> lognormal distribution, and I was surprised to find that results are exactly
**> the same as with fitdistr{MASS}. Here is an example with a larger dataset:
**>
**> #====================
**> # data:
**> data1 <- data.frame(obs.time=c(30,31.98,35.95,37.2,46.4,50.5,54,56,60,62.95,
**> 69.4,71.5,74,76,78,79.25,81.1,84.68,90,95.37,100),
**> reps=c(5,47,80,18,20,32,29,8,29,2,7,8,1,4,3,2,3,3,1,2,1))
**>
**> data2 <- with(data1, data.frame(obs.time=rep(obs.time, reps),
**> event=rep(1,sum(data1$reps))))
**>
**> # 1. using fitdistr to estimate lognormal parameters
**> library(MASS)
**> fit1 <- fitdistr(data2$obs.time, "lognormal"); fit1
**> # meanlog sdlog
**> # 3.80552970 0.29093294
**> # (0.01665877) (0.01177953)
**>
**> # 2. using survreg for left-censored estimation
**> library(survival)
**> fm <- survreg(Surv(obs.time,event,type="left")~1,
**> data=data2, dist="lognormal"); summary(fm)
**> fm$coefficients[[1]]; fm$scale
**> # [1] 3.80553
**> # [1] 0.2909329
**> # results are the identical !!
**>
**> # plotting ecdf and the fitted curve =========
**> # parameters
**> fit1.mean <- fit1$estimate[[1]]
**> fit1.sd <- fit1$estimate[[2]]
**>
**> plot(ecdf(data2$obs.time), verticals=TRUE, do.p=FALSE, lwd=1.5,
**> xlim=c(0,116), ylab="pb", xlab="time")
**> curve(plnorm(x, meanlog=fit1.mean, sdlog=fit1.sd), add=TRUE, col='red',
**> lwd=3)
**> # the fit looks good to me
**>
**> #============= end script =========================
**>
**> I guess this relates to the following sentences from Prof Ripley:
**>
**> ML fitting can be done for censored data. However, I don't think
**> you have a valid description here: it seems you never recorded a time at
**> which the event had not happened, and the most likely fit is a probability
**> mass at zero (since this is a perfect explanation for your data).
**>
**> (which I don't fully understand) and
**>
**> If you have an ECDF, the jumps give you the data so you can just use
**> fitdistr().
**>
**> I have the ECDF and the number of observations is ~150-300. My final
**> understanding is therefore that fitdistr can be used to fit distributions by
**> ML over left-censored data, which is equivalent to fit the distribution
**> using cumulative probabilities (to go back to my original terminology).
**>
**> Is this understanding correct? I would highly appreciate any feedback,
**> especially if I am misunderstanding the way to estimate the function
**> parameters for this type of data.
**>
**> Thank you very much!
**>
**> Ahimsa
**>
**>
**>
**>
**> On Sat, Feb 23, 2008 at 2:35 AM, Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
**> wrote:
**>
**>> On Sat, 23 Feb 2008, ahimsa campos-arceiz wrote:
**>>
**>>> Dear all,
**>>>
**>>> I'm trying to estimate the parameters of a lognormal distribution fitted
**>>> from some data.
**>>>
**>>> The tricky thing is that my data represent the time at which I recorded
**>>> certain events. However, in many cases I don't really know when the
**>> event
**>>> happened. I' only know the time at which I recorded it as already
**>> happened.
**>>
**>> So this is a rather extreme form of censoring.
**>>
**>>> Therefore I want to fit the lognormal from the cumulative distribution
**>>> function (cdf) rather than from the probability distribution function
**>> (pdf).
**>>>
**>>> My understanding is that methods based on Maximum Likelihood (e.g.
**>> fitdistr
**>>> {MASS}) are based on the pdf. Nonlinear least-squares methods seem to be
**>>> based on the cdf... however I was unable to use nls{stat} for lognormal.
**>>
**>> Not so: ML fitting can be done for censored data. However, I don't think
**>> you have a valid description here: it seems you never recorded a time at
**>> which the event had not happened, and the most likely fit is a probability
**>> mass at zero (since this is a perfect explanation for your data).
**>>
**>> To make any progress with censoring, you need to see both positive and
**>> negative events. If you told us that none of these events happened before
**>> t=15, it would be possible to fit the model (although you would need far
**>> more data to get a good fit).
**>>
**>> Generally code to handle censoring is in survival analysis: e.g. survreg()
**>> in package survival. In the terminiology of the latter, all your
**>> observations are left-censored.
**>>
**>>> I found a website that explains how to fit univariate distribution
**>> functions
**>>> based on cumulative probabilities, including a lognormal example, in
**>> Matlab:
**>>>
**>> http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/cdffitdemo.html
**>>>
**>>> and other programs like TableCurve 2D seem to do this too.
**>>
**>> Maybe, but that is a different problem. If you have an ECDF, the jumps
**>> give you the data so you can just use fitdistr(). (And you will see
**>> comparing observed and fitted CDFs in MASS, the book.)
**>>
**>>
**>>> There must be a straightforward method in R which I have overlooked. Any
**>>> suggestion on how can I estimate these parameters in R or helpful
**>> references
**>>> are very much appreciated.
**>>>
**>>> (not sure if it helps but) here is an example of my type of data:
**>>>
**>>> treat.1 <- c(21.67, 21.67, 43.38, 35.50, 32.08, 32.08, 21.67, 21.67,
**>> 41.33,
**>>> 41.33, 41.33, 32.08, 21.67, 22.48, 23.25, 30.00, 26.00, 19.37,
**>> 26.00
**>>> ,
**>>> 32.08, 21.67, 26.00, 26.00, 43.38, 26.00, 21.67, 22.48, 35.50,
**>> 38.30,
**>>>
**>>> 32.08)
**>>>
**>>> treat.2 <- c(35.92, 12.08, 12.08, 30.00, 33.cy73, 35.92, 12.08, 30.00,
**>> 56.00,
**>>> 30.00, 35.92, 33.73, 12.08, 26.00, 54.00, 12.08, 12.08, 35.92,
**>> 35.92
**>>> ,
**>>> 12.08, 33.73, 35.92, 63.20, 30.00, 26.00, 33.73, 23.50, 30.00,
**>> 35.92
**>>> ,
**>>> 30.00)
**>>>
**>>> Thank you very much!
**>>>
**>>> Ahimsa
**>>>
**>>>
**>>> --
**>>> ahimsa campos-arceiz
**>>> www.camposarceiz.com
**>>>
**>>> [[alternative HTML version deleted]]
**>>>
**>>> ______________________________________________
**>>> R-help_at_r-project.org mailing list
**>>> https://stat.ethz.ch/mailman/listinfo/r-help
**>>> PLEASE do read the posting guide
**>> http://www.R-project.org/posting-guide.html
**>>> and provide commented, minimal, self-contained, reproducible code.
**>>>
**>>
**>> --
**>> Brian D. Ripley, ripley_at_stats.ox.ac.uk
**>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/<http://www.stats.ox.ac.uk/%7Eripley/>
**>> University of Oxford, Tel: +44 1865 272861 (self)
**>> 1 South Parks Road, +44 1865 272866 (PA)
**>> Oxford OX1 3TG, UK Fax: +44 1865 272595
**>>
**>
**>
**>
**>
*

-- Brian D. Ripley, ripley_at_stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Sat 23 Feb 2008 - 09:33:44 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Sat 23 Feb 2008 - 10:30:16 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*