Re: [R] Density Estimation

From: Adelchi Azzalini <azzalini_at_stat.unipd.it>
Date: Fri 09 Jun 2006 - 01:35:10 EST

On Wed, 07 Jun 2006 19:54:32 +0200, Pedro Ramirez wrote:

PR> >Not a direct answer to your question, but if you use a logspline
PR> >density estimate rather than a kernel density estimate, then the
PR> >logspline package will help you; it has built-in functions
PR> >dlogspline, qlogspline, and plogspline that do the integrals for
PR> >you.
PR> >
PR> >If you want to stick with the KDE, then you could find the area
PR> >under each of the kernels for the range you are interested in
PR> >(you need to work out the standard deviation from the bandwidth,
PR> >then use pnorm for the default Gaussian kernel), then just sum
PR> >the individual areas, each weighted by 1/n.
PR> >
PR> >Hope this helps,
PR> 
PR> Thanks a lot for your quick help! I think I will follow your first
PR> suggestion (logspline density estimation) instead of summing over
PR> the kernel areas, because at the boundaries of the range, truncated
PR> kernel areas can occur, so I think it is easier to do it with
PR> logsplines. Thanks again for your help!!
PR> 
PR> Pedro
PR> 
PR> 
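
A minimal R sketch of the kernel-area approach quoted above, assuming the default Gaussian kernel of `density()`. In `density()`, `bw` is the standard deviation of each kernel, so no conversion is needed. `kde_prob` is a hypothetical helper name, not a function from any package. Kernels that straddle an interval endpoint need no special handling, because the `pnorm` difference is exactly the part of each kernel that lies inside the interval. Note that this uses the bandwidth chosen by `density()`, which is tuned for estimating the density.

```r
# Pedro's data and the default Gaussian kernel density estimate
x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
kde <- density(x, n = 100)

# Hypothetical helper: estimated P(a < X < b) under a Gaussian KDE.
# The KDE is the average of N(x_i, bw^2) densities, so its area over
# (a, b) is the average of the per-kernel probabilities. A kernel that
# straddles a or b contributes only the part of its area inside (a, b).
kde_prob <- function(a, b, data, bw) {
  mean(pnorm(b, mean = data, sd = bw) - pnorm(a, mean = data, sd = bw))
}

p <- kde_prob(0, 3, x, kde$bw)
print(p)
```

Because every kernel integrates to one, `kde_prob(-Inf, Inf, x, kde$bw)` returns 1.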

Besides the computational aspect, there is a statistical one: the bandwidth that is optimal for estimating the density function is not optimal (and possibly not even sensible) for estimating the distribution function, and the stated problem is equivalent to estimating the distribution function.
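
To spell out the equivalence in standard notation (assuming a Gaussian kernel with bandwidth h, the default in R's density(); phi and Phi denote the standard normal density and distribution function, and the notation is ours rather than the thread's), integrating the density estimate over 0 < x < 3 gives a difference of two values of a smoothed estimate of the distribution function:

```latex
\hat f_h(t) = \frac{1}{n h}\sum_{i=1}^{n} \phi\!\left(\frac{t - x_i}{h}\right),
\qquad
\hat F_h(t) = \int_{-\infty}^{t} \hat f_h(u)\,du
            = \frac{1}{n}\sum_{i=1}^{n} \Phi\!\left(\frac{t - x_i}{h}\right),

\widehat{P}(0 < X < 3) = \int_{0}^{3} \hat f_h(u)\,du = \hat F_h(3) - \hat F_h(0).
```

The target is therefore the estimate of F, and h is best chosen with that target in mind.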

In mathematical terms, the optimal bandwidth for density estimation decreases at rate n^{-1/5}, while the one for distribution function estimation decreases at rate n^{-1/3}, where n is the sample size. In practical terms, one must choose an appreciably smaller bandwidth in the second case than in the first.
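
Up to constants that depend on the kernel and on the unknown distribution, the two rates compare as follows (h_f and h_F are our labels for the density and distribution-function bandwidths):

```latex
h_f \asymp n^{-1/5}, \qquad h_F \asymp n^{-1/3}, \qquad
\frac{h_F}{h_f} \asymp n^{-1/3 + 1/5} = n^{-2/15} \longrightarrow 0 .
```

For example, if the sample size grows from n = 100 to n = 10000, the density bandwidth shrinks by a factor of 100^{-1/5} ≈ 0.398, while the distribution-function bandwidth shrinks by 100^{-1/3} ≈ 0.215.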

best wishes,

Adelchi

PR> 
PR> >
PR> >--
PR> >Gregory (Greg) L. Snow Ph.D.
PR> >Statistical Data Center
PR> >Intermountain Healthcare
PR> >greg.snow@intermountainmail.org
PR> >(801) 408-8111
PR> >
PR> >
PR> >-----Original Message-----
PR> >From: r-help-bounces@stat.math.ethz.ch
PR> >[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Pedro Ramirez
PR> >Sent: Wednesday, June 07, 2006 11:00 AM
PR> >To: r-help@stat.math.ethz.ch
PR> >Subject: [R] Density Estimation
PR> >
PR> >Dear R-list,
PR> >
PR> >I have made a simple kernel density estimation by
PR> >
PR> >x <- c(2,1,3,2,3,0,4,5,10,11,12,11,10)
PR> >kde <- density(x,n=100)
PR> >
PR> >Now I would like to know the estimated probability that a new
PR> >observation falls into the interval 0<x<3.
PR> >
PR> >How can I integrate over the corresponding interval?
PR> >I did not find a corresponding function in several R packages for
PR> >kernel density estimation. I could apply Simpson's rule for
PR> >integrating, but perhaps somebody knows a better solution.
PR> >
PR> >Thanks a lot for your help!
PR> >
PR> >Pedro
PR> >
PR> >______________________________________________
PR> >R-help@stat.math.ethz.ch mailing list
PR> >https://stat.ethz.ch/mailman/listinfo/r-help
PR> >PLEASE do read the posting guide!
PR> >http://www.R-project.org/posting-guide.html
PR> >
PR> 
PR>
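
For the question as originally posed, here is a minimal R sketch that avoids hand-coding Simpson's rule. It interpolates the grid returned by `density()` with `approxfun()` and passes the result to `integrate()`. The variable names are illustrative. The grid-based value is only an approximation that depends on the grid resolution, so it is printed next to the closed-form kernel-area sum for the same Gaussian KDE.

```r
x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
kde <- density(x, n = 100)

# density() returns the estimate only on a grid (kde$x, kde$y).
# Linear interpolation turns the grid into a function of t, and
# integrate() then performs adaptive quadrature over (0, 3).
f_grid <- approxfun(kde$x, kde$y)
p_grid <- integrate(f_grid, lower = 0, upper = 3)$value

# Closed-form area of the same Gaussian KDE (bw is the kernel's sd)
p_exact <- mean(pnorm(3, mean = x, sd = kde$bw) - pnorm(0, mean = x, sd = kde$bw))

print(c(grid = p_grid, exact = p_exact))
```

Both values use the bandwidth that `density()` selects for the density itself.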

Received on Fri Jun 09 01:37:56 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 09 Jun 2006 - 02:11:00 EST.
