From: Doran, Harold <HDoran_at_air.org>
Date: Sat 06 May 2006 - 01:54:55 EST

Spencer

Thanks for your thoughts on this. I did a bit of work and ended up with a method (more of a trick, really), but it works. I am certain there are better ways to do this, but here is how I resolved the issue.

The integral I need to evaluate is

\frac{\int_c^{\infty} p(x|\theta)f(\theta)d\theta} {\int_{-\infty}^{\infty} p(x|\theta)f(\theta)d\theta}

Where p(x|\theta) is the likelihood for the Rasch item response theory model and f(\theta) is the population
distribution. Originally, I was using numerical quadrature by summing from -10 to 10 in increments of .01 for the denominator and from c to 10 for the numerator. This worked, but was rather expensive.

Now, the gauss.quad.prob function provides nodes and weights that I can use for the denominator via:

gauss.quad.prob(49, dist = 'normal', mu = mu, sigma = sigma)

Where mu and sigma are values that are estimated from another procedure. But, for the sake of argument, assume N(0,1). My challenge was to get the appropriate nodes and weights for the numerator, since I am going from -Inf to c. So, what I did to get this to work was to get the Gauss-Laguerre quadrature nodes and weights using

gauss.quad(49, kind = 'laguerre')

Then, I do the following to get them to work with a normal distribution:

F(y) = p(y+c) * N(y+c | \mu, \sigma^2)

Where y is the Laguerre node and c is the lower bound of integration (for the -Inf to c case, I use c - y in place of y + c). Subsequently, I plug this into the following

\sum_{i=1}^Q F(y_i) e^{y_i} \cdot w_i

Where i indexes the nodes and w_i is the weight at node y_i.
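As a quick sanity check that this change of variables behaves (a standalone sketch, separate from the functions below), the same weighting recovers a known normal tail probability:

```r
library(statmod)

# Check the Laguerre trick on a known answer: the integral from c to Inf
# of dnorm(t) dt should be 1 - pnorm(c).
gl <- gauss.quad(49, kind = 'laguerre')   # nodes y_i, weights w_i for e^{-y}
c0 <- 0.05
tail <- sum(dnorm(gl$nodes + c0) * exp(gl$nodes) * gl$weights)
tail                                      # close to 1 - pnorm(0.05)
```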

As it turns out, I can evaluate this using only 49 quadrature points (and not the 2001 I was using with equal intervals) and get my result. Both provide the same answer, but CPU time is reduced to basically zero for the quadrature whereas it was about 4 CPU seconds for the equal interval approach. Keep in mind that I am looping through a student achievement data file where this is being applied to about 70,000 students, so time matters here.

Here is how I programmed this. I realize there are much better ways to accomplish this, and any suggestions would be very much appreciated.

library(statmod)

rasch <- function(b, theta){
    1 / (1 + exp(outer(b, theta, '-')))
}

like.mat <- function(x, b, theta){
    rasch(b, theta)^x * (1 - rasch(b, theta))^(1 - x)
}

class.numer <- function(x, b, prof_cut, mu = 0, sigma = 1, aboveQ){

    gauss_numer <- gauss.quad(49, kind = 'laguerre')

    if(!aboveQ){
        mat <- rbind(like.mat(x, b, prof_cut - gauss_numer$nodes),
                     dnorm(prof_cut - gauss_numer$nodes, mean = mu, sd = sigma))
    } else {
        mat <- rbind(like.mat(x, b, gauss_numer$nodes + prof_cut),
                     dnorm(gauss_numer$nodes + prof_cut, mean = mu, sd = sigma))
    }

    f_y <- rbind(apply(mat, 2, prod), exp(gauss_numer$nodes), gauss_numer$weights)

    sum(apply(f_y, 2, prod))
}

class.denom <- function(x, b, mu = 0, sigma = 1){

    gauss_denom <- gauss.quad.prob(49, dist = 'normal', mu = mu, sigma = sigma)
    mat <- rbind(like.mat(x, b, gauss_denom$nodes), gauss_denom$weights)
    sum(apply(mat, 2, prod))
}

class.acc <- function(x, b, prof_cut, mu = 0, sigma = 1, aboveQ = TRUE){

    result <- class.numer(x, b, prof_cut, mu, sigma, aboveQ) / class.denom(x, b, mu, sigma)

    return(result)
}

# test the function
set.seed(1)
response <- c(rep(1,10), rep(0,10))
items <- runif(20, -3,3)
class.acc(response, items, .05, aboveQ = FALSE)
[1] 0.3279967
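The result can also be cross-checked against R's adaptive integrate() on a stripped-down one-item version (a sketch; the item difficulty 0 and cut score 0.05 here are illustrative):

```r
library(statmod)

# One-item cross-check: P(theta < c | x) by adaptive integration versus
# the quadrature trick above (b = 0, a correct response, c = 0.05).
g <- function(theta) (1 / (1 + exp(0 - theta))) * dnorm(theta)
ref <- integrate(g, -Inf, 0.05)$value / integrate(g, -Inf, Inf)$value

gl  <- gauss.quad(49, kind = 'laguerre')
gp  <- gauss.quad.prob(49, dist = 'normal')      # defaults: mu = 0, sigma = 1
num <- sum(g(0.05 - gl$nodes) * exp(gl$nodes) * gl$weights)
den <- sum((1 / (1 + exp(0 - gp$nodes))) * gp$weights)
c(ref, num / den)                                # the two should agree closely
```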

>
> I started this email as a reply to a question from
> Harold Doran on Gaussian quadrature, which
> approximates a definite integral from a to b of f(x)*w(x)*dx
> with a weighted sum i = 1:n of w[i]*f(x[i]), where the
> quadrature points and weights are chosen so the formula is
> exact if f(x) is a polynomial of order at most
> (2*n-1). If a=(-Inf), b=Inf, and w(x) = a normal density, this is
> called Gauss-Hermite quadrature; if the mean and standard
> deviation of the normal are estimated from the data, it is
> sometimes called adaptive Gauss-Hermite. If (a, b) is
> finite and w(x) is a beta distribution on that range, this is
> called Gauss-Jacobi quadrature; for the uniform
> distribution, it is called Gauss-Legendre. For the gamma
> distribution (including exponential as a special case), the
> integral is from 0 to Inf, and it's called Gauss-Laguerre.
> Recent references that I found useful for this are the following:
>
> Kythe and Schaeferkotter (2005) Handbook of
> Computational Methods for Integration (Chapman and Hall)
>
> Evans and Swartz (2000) Approximating Integrals via
> Monte Carlo and Deterministic Methods (Oxford University Press).
>
> In theory, one could use Gauss-Jacobi on any finite
> interval, Gauss-Laguerre on any semi-infinite interval, and
> Gauss-Hermite on any integral over the entire real line.
> However, in my recent attempts to do this, I failed to get
> reasonable rates of convergence for many integrals that I
> thought should be reasonably well behaved.
>
> However, as I mentioned above, my work in this area
> suggested I might more wisely invest my time in spline integration.
>
> Best Wishes,
> spencer graves
>
> Doran, Harold wrote:
>
> > I've written a series of functions that evaluates an integral
> from -inf to a or b to +inf using equally spaced quadrature
> points along a normal distribution from -10 to +10 moving in
> increments of .01. These functions are working and give very
> good approximations, but I think they are computationally
> wasteful as I am evaluating the function at *many* points.
> >
> > Instead, I would prefer to use true Gaussian quadrature and
> evaluate the integral at fewer points. I am working with the
> gauss.quad.prob function and have rewritten my functions to
> use the nodes and weights. But, I believe these are only for
> definite integrals and not for evaluating from -inf to a or b to +inf.
> >
> > From what I understand, I cannot use the tails (e.g., -4.85
> or 4.85 below) to approximate inf. In fact, I've tried it and
> it doesn't work. Instead, I believe one needs special weights
> when the interest is to evaluate an integral from -inf to a etc.
> >
> > I've read through the help and searched the archives and didn't
> see any relevant discussion on whether it is possible to use
> this function to get the weights for -inf to a. Only two hits
> come up in the archive and neither was relevant. I think the
> Laguerre quadrature in gauss.quad.prob might be the right path,
> but I'm uncertain.
> >
> > Is it possible to get the nodes and weights I need from R
> directly, or should I refer to a table that has derived these
> for the normal distribution from another source?
> >
> > thank you
> > Harold
> >
> > gauss.quad.prob(10, dist='normal')
> >
> > $nodes
> >  [1] -4.8594628 -3.5818235 -2.4843258 -1.4659891 -0.4849357  0.4849357  1.4659891  2.4843258  3.5818235
> > [10]  4.8594628
> >
> > $weights
> > [1] 4.310653e-06 7.580709e-04 1.911158e-02 1.354837e-01 3.446423e-01 3.446423e-01 1.354837e-01 1.911158e-02
> > [9] 7.580709e-04 4.310653e-06
> >
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help