[R] Solved (??) Behaviour of integrate (was 'Poisson-lognormal probab ility calculations')

From: <k.jewell_at_campden.co.uk>
Date: Mon, 18 Feb 2008 12:25:26 +0000


Hi Again,

I think I've solved my problem, but please tell me if you think I'm wrong, or you can see a better way!

A plot of the integrand showed a very sharp peak, so I was running into the integrand "feature" mentioned in the note. I resolved it by limiting the range of integration as shown here:



function (x, meanlog = 0, sdlog = 1, sdlim=6, rel.tol = .Machine$double.eps^0.5,...) {
#+

# K. Jewell Feb 2008
# Based on VGAM::dpolono, modified to cover wider range of x
# Some simplification, but major change is to avoid integration problems by
# limiting range of integration to sdlim (default = 6) standard deviations
each side
# of mean of Poisson or lognormal. Also increase default precision of
integration.
# For x in 0:170 (working range of VGAM::dpolono) and default parameters,
max deviation
# from VGAM::dpolono is 3.243314e-05 of VGAM::dpolono result
#-

    require(stats)
    mapply(function(x, meanlog, sdlog, ...){

         lower <- min(max(0, x-sdlim*sqrt(x)), exp(meanlog-sdlim*sdlog))
         upper <- max(x+sdlim*sqrt(x), exp(meanlog+sdlim*sdlog))
         integrate(function(t, x, meanlog, sdlog) 
               exp(dpois(x,t,TRUE) + dlnorm(t, meanlog, sdlog, TRUE)),
              lower = lower, upper = upper, x = x, meanlog = meanlog, sdlog
= sdlog, 
              rel.tol = rel.tol, ...)$value
                                           },
           x, meanlog, sdlog, ...
           )
                                           }
---------------------------------------------------------------

Best regards,

Keith Jewell
mailto:k.jewell_at_campden.co.uk telephone (direct) +44 (0)1386 842055

> -----Original Message-----
> From: Jewell, Keith
> Sent: 15 February 2008 16:48
> To: 'r-help_at_r-project.org'
> Subject: Behaviour of integrate (was 'Poisson-lognormal probability
> calculations')
> 
> Hi again,
> 
> Adding further information to my own query, this function gets to the
> core of the problem, which I think lies in the behaviour of
> 'integrate'.
> -------------------------------------
> function (x, meanlog = 0, sdlog = 1, ...) {
>     require(stats)
>     integrand <- function(t, x, meanlog, sdlog) dpois(x,t)*dlnorm(t,
> meanlog, sdlog)
>     mapply(function(x, meanlog, sdlog, ...)
> #                (1/gamma(x+1))*
>          integrate(function(t, x, meanlog, sdlog)
> #                   gamma(x+1)*
>               integrand(t, x, meanlog, sdlog),
>               lower = 0, upper = Inf, x = x, meanlog = meanlog, sdlog =
> sdlog, ...)$value,
>            x, meanlog, sdlog, ...
>            )
>                                            }
> ----------------------------------------
> Mathematically, the presence or not of the two commented lines should
> make no difference; they multiply the integrand by a constant (with
> respect to the integration), then divide the result by the same
> constant. In practice they make a big difference! I guess they're
> altering the behaviour of the 'integrate'.
> 
> I'd have thought the presence of the lines would worsen the behaviour.
> Without the lines the integrand is reasonably small, the integral is <
> 1. With the lines the limit on the integral is x!,  leading to "non-
> finite function values" for x much > 170, even if we use logs to get
> around the limit on gamma(x).
> 
> In fact with the lines the plot of function(x) v. x looks reasonable
> (but I don't know if the values are correct!!), but without the lines
> it looks silly, I just don't believe it!
> 
> I thought the problem might relate to the note in ?integrate "If the
> function is approximately constant (in particular, zero) over nearly
> all its range it is possible that the result and error estimate may be
> seriously wrong.". I wouldn't really expect multiplication by a large
> constant to fix such errors (??), but the lognormal distribution is
> skew so it might be considered "approximately constant ... over nearly
> all its range". Even
> though ...
> > integrate(dlnorm, 0, Inf)
> 1 with absolute error < 2.5e-07
> ... suggests that this is not the source of the problem,  I tried
> changing variables to integrate over a normally distributed variable:
> ----------------
> function (x, meanlog = 0, sdlog = 1, ...) {
>     require(stats)
>     integrand <- function(t, x, meanlog, sdlog)
> dpois(x,exp(t))*dnorm(t, meanlog, sdlog)
>     mapply(function(x, meanlog, sdlog, ...)
>  #              (1/gamma(x+1))*
>          integrate(function(t, x, meanlog, sdlog)
>  #                  gamma(x+1)*
>               integrand(t, x, meanlog, sdlog),
>               lower = -Inf, upper = Inf, x = x, meanlog = meanlog,
> sdlog = sdlog, ...)$value,
>            x, meanlog, sdlog, ...
>            )
>                                            }
> -----------------------------
> Still no better; with the constants the values look reasonably smooth
> (but are they correct??), without the constants the values are silly.
> 
> I've tried reducing rel.tol and increasing subdivisions, they change
> the behaviour a little but don't "fix" it, and I still get the marked
> difference between the presence and absence of those lines (and I'm
> increasingly unsure whether either answer is correct!).
> 
> I'm still trying. but I really think I'm going nowhere. Has anyone any
> ideas?
> 
> Thanks in advance,
> 
> Keith Jewell
> mailto:k.jewell_at_campden.co.uk telephone (direct) +44 (0)1386 842055
> > -----Original Message-----
> > From: Jewell, Keith
> > Sent: 15 February 2008 11:16
> > To: 'r-help_at_r-project.org'
> > Subject: Poisson-lognormal probability calculations
> >
> > Hi,
> >
> > just for the record, although I don't think it's relevant (!)
> > -------------------------------------
> > > sessionInfo()
> > R version 2.6.0 (2007-10-03)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> > Kingdom.1252;LC_MONETARY=English_United
> > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> >
> > attached base packages:
> > [1] stats4    splines   stats     graphics  grDevices utils
> > datasets
> > methods   base
> >
> > other attached packages:
> > [1] VGAM_0.7-5         xlsReadWrite_1.3.2
> > --------------------------------
> >
> > I'm having some problems with a Poisson-lognormal density (mass?)
> > function.
> >
> > VGAM has the dpolono function, but that doesn't work for x-values
> over
> > 170, and I need to go to *much* bigger numbers. It fails first
> because
> > of gamma overflow, then because of non-finite integrand.
> > ---------------------
> > > VGAM::dpolono(170)
> > [1] 4.808781e-09
> > > VGAM::dpolono(171)
> > [1] 0
> > Warning message:
> > In VGAM::dpolono(171) : value out of range in 'gammafn'
> > > VGAM::dpolono(172)
> > [1] 0
> > Warning message:
> > In VGAM::dpolono(172) : value out of range in 'gammafn'
> > > VGAM::dpolono(173)
> > [1] 0
> > Warning message:
> > In VGAM::dpolono(173) : value out of range in 'gammafn'
> > > VGAM::dpolono(174)
> > Error in integrate(f = integrand, lower = -Inf, upper = Inf, x =
> x[i],
> > :
> >   non-finite function value
> > -------------
> >
> > I tidied up a little (to my eyes only, no offence intended to the
> > estimable authors of VGAM) and avoided the gamma overflow by using
> logs
> > - OK so far, it agrees almost perfectly with VGAM::dpolono for x up
> to
> > 170, and now extends the range up to 173 (wow!!).
> > -------------
> > dApolono <-
> > function (x, meanlog = 0, sdlog = 1, ...) {
> >     require(stats)
> >     integrand <- function(t, x, meanlog, sdlog) exp(-t+x*log(t)-
> > log(sdlog*t*sqrt(2*pi))-0.5*((log(t)-meanlog)/sdlog)^2)
> >     mapply(function(x, meanlog, sdlog, ...){
> >        temp <- try(
> >          integrate(f = integrand, lower = 0, upper = Inf, x = x,
> > meanlog = meanlog, sdlog = sdlog, ...)
> >                   )
> >          ifelse(inherits(temp, "try-error"), NA, exp(log(temp$value)-
> > lgamma(x+1)))
> >                                       }
> >               , x, meanlog, sdlog, ...
> >            )
> >                                            }
> > plot(log(dApolono(0:173)))
> > dApolono(174)
> > -----------------------------
> >
> > Addressing the non-finite integrand, I noticed that the gamma(x+1)
> > divisor was outside the integrand so that as x gets bigger the
> > integrand gets bigger, only to be finally divided by an increasing
> > gamma(x+1). I reasoned that putting the divisor inside the integral
> > might incur a substantial performance hit, but would keep the
> integrand
> > at reasonable values.
> >
> > --------------------------------------
> > dApolono <-
> > function (x, meanlog = 0, sdlog = 1, ...) {
> >     require(stats)
> >     integrand <- function(t, x, meanlog, sdlog) exp(-t+x*log(t)-
> > log(sdlog*t*sqrt(2*pi))-0.5*((log(t)-meanlog)/sdlog)^2-lgamma(x+1))
> >     mapply(function(x, meanlog, sdlog, ...){
> >        temp <- try(
> >          integrate(f = integrand, lower = 0, upper = Inf, x = x,
> > meanlog = meanlog, sdlog = sdlog, ...)
> >                   )
> >          ifelse(inherits(temp, "try-error"), NA, temp$value)
> >                                       }
> >               , x, meanlog, sdlog, ...
> >            )
> >                                            }
> > plot(log(dApolono(0:173)))
> > dApolono(174)
> > dApolono(1E3)
> > -------------------------
> >
> > This avoids the non-finite integrand and gives me answers at much
> > higher x, but now at least some of the answers are wrong (not to say
> > silly), even in the range where the other versions worked. I've tried
> > other variations including changing the variable of integration to
> > log(t) and integrating dpois()*dlnorm(), but I can't fix it; if the
> > factorial (=gamma) is inside the integrand I get silly answers, if
> it's
> > outside I get non-finite integrand.
> >
> > I'm tearing my hair. Can anyone suggest where I may be going wrong?
> Any
> > suggestions at all will be appreciated.
> >
> > Thanks in advance,
> >
> > Keith Jewell
> > mailto:k.jewell_at_campden.co.uk telephone (direct) +44 (0)1386 842055
> >



_____________________________________________________________________
The information in this document and attachments is given after the exercise of all reasonable care and skill in its compilation, preparation and issue, but is provided without liability in its application or use. It may contain privileged information that is exempt from disclosure by law and may be confidential. If you are not the intended recipient you must not copy, distribute or take any action in reliance on it. If you have received this document in error please notify us and delete this message from your system immediately.

Campden & Chorleywood Food Research Association Group; Campden & Chorleywood Food Research Association (company limited by guarantee),registered number 510618); CCFRA Group Services Ltd. (registered number 3841905); and CCFRA Technology Ltd (registered number 3836922), all registered in England and Wales with the registered office at Station Road, Chipping Campden, Gloucestershire, GL55 6LD.

The Group may monitor e-mail traffic data and also the content of e-mail for the purposes of security and staff training.

Any advice given is subject to our normal terms and conditions of trading, a copy of which is available on request.



This e-mail has been scanned for all viruses by MessageL...{{dropped:2}}

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 18 Feb 2008 - 12:29:03 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 18 Feb 2008 - 12:30:14 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive