Re: [R] Fitting a distribution to peaks in histogram

From: Berton Gunter <gunter.berton_at_gene.com>
Date: Thu 20 Jul 2006 - 03:10:02 EST


With this much data, I think it makes more sense to fit a nonparametric density estimate. ?density does this via a kernel density procedure, but RSiteSearch('nonparametric density') will find many alternatives. The ash and mclust packages are two that come to mind, but there are certainly others.
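
As a rough illustration of the kernel density route, here is a minimal sketch in base R. The data are simulated stand-ins (three normal clusters, an assumption made purely for the example), the 10% height cutoff for discarding minor bumps is arbitrary, and the names x, d, peaks, f and area_12 are hypothetical. It estimates the density with density(), locates the local maxima, and integrates the estimate between the first two peaks with integrate().

```r
set.seed(42)
# Simulated stand-in for ~15000 unbinned events (hypothetical data)
x <- c(rnorm(6000, 200, 15), rnorm(5000, 400, 25), rnorm(4000, 650, 30))

# Kernel density estimate with the default Gaussian kernel and bandwidth
d <- density(x, n = 1024)

# Local maxima: grid points where the slope changes from positive to negative
idx <- which(diff(sign(diff(d$y))) == -2) + 1

# Keep only peaks above an arbitrary 10% of the tallest one (assumption)
idx <- idx[d$y[idx] > 0.1 * max(d$y)]
peaks <- d$x[idx]

# Linear interpolation of the estimate, so it can be passed to integrate()
f <- approxfun(d$x, d$y, yleft = 0, yright = 0)

# Estimated probability mass between the first and second peak
area_12 <- integrate(f, peaks[1], peaks[2], subdivisions = 1000L)$value

print(peaks)
print(area_12)
```

The bandwidth controls how many peaks survive, so it is worth trying a few values of the bw argument before trusting the peak locations.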

Of course, if you must have a parametric fit, then you'll have to fit a mixture of some sort. But when both the number of components and the individual distributions have to be estimated, this is a nontrivial problem, as one runs into identifiability issues and the corresponding convergence problems. V&R's discussion of density estimation in MASS has some useful things to say about these issues, and Ripley's book, "Pattern Recognition and Neural Networks", has even more. As both sources indicate, there's a large literature on this issue and much software.
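
For what a parametric mixture fit might look like, here is a minimal sketch in base R that fits gamma mixtures by maximum likelihood with optim() and compares one against two components. Everything in it is an assumption for illustration: the simulated data, the helper names nll_gamma_mix and fit_gamma_mix, the quantile-based starting values, and the use of BIC as the comparison criterion. It is not taken from MASS or from any package.

```r
set.seed(1)
# Simulated stand-in for the unbinned measurements (hypothetical data)
x <- c(rgamma(600, shape = 20, rate = 2), rgamma(400, shape = 60, rate = 2))

# Negative log-likelihood of a K-component gamma mixture.
# theta = (K log-shapes, K log-rates, K-1 unconstrained weight parameters)
nll_gamma_mix <- function(theta, x, K) {
  shape <- exp(theta[1:K])
  rate  <- exp(theta[(K + 1):(2 * K)])
  w <- exp(c(0, theta[-(1:(2 * K))]))  # first component is the reference
  w <- w / sum(w)
  dens <- 0
  for (k in seq_len(K)) {
    dens <- dens + w[k] * dgamma(x, shape = shape[k], rate = rate[k])
  }
  val <- -sum(log(dens + 1e-300))      # guard against log(0)
  if (is.finite(val)) val else 1e10
}

# Fit by ML; starting values come from method-of-moments estimates
# within K quantile groups (a crude heuristic, not a recommendation)
fit_gamma_mix <- function(x, K) {
  breaks <- quantile(x, seq(0, 1, length.out = K + 1))
  grp <- cut(x, breaks, include.lowest = TRUE, labels = FALSE)
  m <- as.numeric(tapply(x, grp, mean))
  v <- as.numeric(tapply(x, grp, var))
  start <- c(log(m^2 / v), log(m / v), rep(0, K - 1))
  opt <- optim(start, nll_gamma_mix, x = x, K = K,
               method = "Nelder-Mead", control = list(maxit = 5000))
  list(logLik = -opt$value,
       bic = 2 * opt$value + length(start) * log(length(x)),
       par = opt$par)
}

fit1 <- fit_gamma_mix(x, 1)
fit2 <- fit_gamma_mix(x, 2)
print(c(BIC_1 = fit1$bic, BIC_2 = fit2$bic))  # smaller is better
```

Different starting values can lead to different local optima, which is one practical face of the convergence problems mentioned above, so rerunning from several starts is prudent.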

Cheers,
Bert Gunter  

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of hadley wickham
> Sent: Wednesday, July 19, 2006 9:21 AM
> To: Ulrik Stervbo
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Fitting a distribution to peaks in histogram
>
> > I would like to fit a distribution to each of the peaks in a histogram,
> > such as this:
> > http://photos1.blogger.com/blogger/7029/2724/1600/DU145-Bax3-Bcl-xL.png
>
> As a first shot, I'd try fitting a mixture of gamma distributions (say
> 3), plus a constant term for the highest bin. You could do this using
> ML. If the number of peaks is truly unknown, this will be a little
> trickier but still possible, and you could use the LRT to choose between
> them.
>
> > Integrate the area between each two peaks, using the means and widths
> > of the distributions fitted to the two peaks. I will be using the
> > integrate function
>
> Why do you want to do this?
>
> >
> > The histogram is based on approximately 15000 events, which makes
> > Mclust and pam (which both deliver the information I need) less useful.
>
> If you have unbinned data, it would be better (more precise/powerful)
> to use that.
>
> Regards,
>
> Hadley
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Thu Jul 20 03:16:38 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 20 Jul 2006 - 04:22:08 EST.
