Re: [R] fitted.values from zeroinfl (pscl package)

From: Achim Zeileis <Achim.Zeileis_at_wu-wien.ac.at>
Date: Mon, 18 Feb 2008 14:40:07 +0100 (CET)


On Mon, 18 Feb 2008, Sarah J Thomas wrote:

> Hello all:
>
> I have a question regarding the fitted.values returned from the
> zeroinfl() function. The values seem to be nearly identical to those
> fitted.values returned by the ordinary glm(). Why is this, shouldn't
> they be more "zero-inflated"?
>
> I construct a zero-inflated series of counts, called Y, like so:

To make this reproducible, I set the random seed to

set.seed(123)

in advance and then ran your source code

b= as.vector(c(1.5, -2))
g= as.vector(c(-3, 1))
x <- runif(100) # x is the covariate
X <- cbind(1,x)

p <- exp(X%*%g)/(1+exp(X%*%g))
m <- exp(X%*%b) # log-link for the mean process of the Poisson

Y <- rep(0, 100)

u <- runif(100)
for(i in 1:100) {

    if( u[i] < p[i] ) { Y[i] = 0 }
    else { Y[i] <- rpois(1, m[i]) }
}

# now let's compare the fitted.values from zeroinfl()
# and from glm()

z1 <- glm(Y ~ x, family=poisson)
library(pscl) # provides zeroinfl()
z2 <- zeroinfl(Y ~ x|x) # poisson is the default

[snip]

> You can see that they are almost identical... and the fitted.values from
> zeroinfl don't seem to be zero-inflated at all! What is going on?

Well, let's see how zero inflated your observations are:

R> sum(u < p)
[1] 2

Wow, two (!) observations that have been zero-inflated. Let's see what the probability of observing a zero would have been for these two under the Poisson component alone:

R> dpois(0, m[u < p])
[1] 0.3147816 0.1409670

which is not so low, in particular for the first one.
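A rough sketch of why so few observations end up inflated under the posted coefficients. It uses only b and g from the code above; the grid x_grid and the other variable names are mine, introduced for illustration.

```r
# Coefficients taken from the posted simulation code
b <- c(1.5, -2)   # count model (log link)
g <- c(-3, 1)     # zero-inflation model (logit link)

# Illustrative grid over the support of x ~ U(0, 1)
x_grid <- seq(0, 1, by = 0.25)
infl_prob <- plogis(g[1] + g[2] * x_grid)   # P(inflated zero | x)
pois_mean <- exp(b[1] + b[2] * x_grid)      # Poisson mean given x
pois_zero <- dpois(0, pois_mean)            # P(zero from the Poisson part | x)

print(round(cbind(x = x_grid, infl_prob, pois_mean, pois_zero), 3))

# Averages over x ~ U(0, 1), by numerical integration
infl_share <- integrate(function(x) plogis(g[1] + g[2] * x), 0, 1)$value
pois_zero_share <- integrate(
  function(x) dpois(0, exp(b[1] + b[2] * x)), 0, 1
)$value

# Mean under the zero-inflated model relative to the plain Poisson mean:
# E[Y | x] = (1 - infl_prob) * pois_mean
zip_mean_ratio <- 1 - infl_prob

print(c(infl_share = infl_share, pois_zero_share = pois_zero_share))
```

Under these coefficients the inflation probability stays between roughly 0.05 and 0.12, so only about 8 in 100 observations are inflated on average, while the Poisson part alone produces zeros far more often. Both sets of fitted values estimate the conditional mean, which under the zero-inflated model is (1 - p) * m and hence within about 12% of the plain Poisson mean m here.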

Overall, you've got

R> sum(Y < 1)
[1] 23

zeros in that data set, and the expected number of zeros under the fitted Poisson GLM is

R> sum(dpois(0, fitted(z1)))
[1] 23.35615

So you have observed *fewer* zeros than expected by a Poisson GLM. Surely, this is not the kind of data that zero-inflated models have been developed for.
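For contrast, the same check on data where inflation is substantial might look like the sketch below. The sample size, the seed and the inflation coefficients (-0.5, 1) are assumptions of mine, not values from the original post, and the zeroinfl() part only runs if pscl is installed.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 1000
x <- runif(n)

# Hypothetical coefficients: same count model as in the post, but a much
# larger zero-inflation intercept (-0.5 instead of -3)
infl_prob <- plogis(-0.5 + 1 * x)
pois_mean <- exp(1.5 - 2 * x)

# Inflated observations are forced to zero, the rest are Poisson draws
inflated <- runif(n) < infl_prob
y <- ifelse(inflated, 0, rpois(n, pois_mean))

# Observed zeros versus zeros expected under a plain Poisson GLM
fit_pois <- glm(y ~ x, family = poisson)
obs_zeros <- sum(y == 0)
exp_zeros_pois <- sum(dpois(0, fitted(fit_pois)))
print(c(observed = obs_zeros, poisson = exp_zeros_pois))

# Optional: expected zeros under the zero-inflated Poisson fit
if (requireNamespace("pscl", quietly = TRUE)) {
  fit_zip <- pscl::zeroinfl(y ~ x | x, dist = "poisson")
  # Column 1 of the "prob" predictions is P(Y = 0) for each observation
  exp_zeros_zip <- sum(predict(fit_zip, type = "prob")[, 1])
  print(c(zip = exp_zeros_zip))
}
```

With inflation probabilities of roughly 0.4 to 0.6, the Poisson GLM should clearly underpredict the number of zeros, which is the pattern a zero-inflated model is meant to capture.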

> Ultimately I want these fitted.values for a goodness of fit type of test
> to see if the zeroinfl model is needed or not for a given data series.
> With these fitted.values as they are, I am rejecting assumption of a
> zero-inflated model even when the data really are zero-inflated.

Maybe you ought to think about useful data-generating processes first before designing tests or criticizing software...
Z



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 18 Feb 2008 - 13:44:42 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 18 Feb 2008 - 14:00:15 GMT.
