[R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

From: Strubbe Diederik <diederik.strubbe_at_ua.ac.be>
Date: Wed, 02 Jun 2010 12:56:00 +0200


Dear all,

I am using gamlss (package gamlss version 4.0-0, R version 2.10.1, Windows XP Service Pack 3 on an HP EliteBook) to relate bird counts to habitat variables. However, most models fail because “the global deviance is increasing”, and I am not sure what causes this behaviour. The dataset consists of counts of birds (duck) and 5 habitat variables measured in the field (n = 182). The dependent variable (the number of ducks counted) ‘suffers’ from zero-inflation and overdispersion:

> proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> mean <- mean(data$duck)
> var <- var(data$duck)
> proportion_non_zero
[1] 0.1153846
> mean
[1] 1.906593
> var
[1] 37.35587
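
The size of both problems can be read off these three numbers. The following is a minimal sketch in base R that uses only the summary values printed above (the variable names are illustrative, not part of the original session). It relies on two standard facts: a Poisson variable has a variance-to-mean ratio of 1, and a Poisson variable with mean m has a zero fraction of exp(-m).

```r
# Summary values copied from the console output above.
prop_non_zero <- 0.1153846
m <- 1.906593
v <- 37.35587

# Variance-to-mean ratio: 1 under a Poisson model, larger under overdispersion.
dispersion_index <- v / m

# Zero fraction observed in the data versus the zero fraction that a
# Poisson distribution with the same mean would produce.
observed_zero <- 1 - prop_non_zero
poisson_zero <- exp(-m)  # identical to dpois(0, lambda = m)

round(c(dispersion_index = dispersion_index,
        observed_zero = observed_zero,
        poisson_zero = poisson_zero), 3)
```

The variance is roughly 19.6 times the mean, and about 88% of the counts are zero where a Poisson model with this mean would give about 15%.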

(I have no idea how to simulate a zero-inflated overdispersed Poisson variable, but the data used can be found at http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519).
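
For readers without access to the data, one common way to simulate such a variable in base R is to mix structural zeros with negative binomial counts. This is only a sketch: the parameter values (p_zero, mu, size) and the name duck_sim are illustrative assumptions, not estimates from the duck data.

```r
set.seed(1)

n <- 182         # same sample size as in the message
p_zero <- 0.85   # assumed probability of a structural zero
mu <- 12         # assumed mean of the count component
size <- 0.8      # assumed NB dispersion; smaller values mean more overdispersion

# Step 1: decide for each observation whether it is a structural zero.
is_structural_zero <- rbinom(n, size = 1, prob = p_zero) == 1

# Step 2: draw overdispersed counts from a negative binomial distribution.
nb_counts <- rnbinom(n, size = size, mu = mu)

# Step 3: combine the two; structural zeros override the NB draw.
duck_sim <- ifelse(is_structural_zero, 0, nb_counts)

# Same summaries as computed for the real data.
mean(duck_sim != 0)
mean(duck_sim)
var(duck_sim)
```

The count component can itself produce zeros, so the share of zeros in duck_sim is somewhat higher than p_zero.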

First, I create a (strong) pattern in the dataset by adding the squared count to one of the predictors:
data$LFAP200 <- data$LFAP200 + (data$duck*data$duck)

I try to analyze these data by fitting several possible distributions (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I NBI and type II NBII, and zero-inflated negative binomial ZINBI), using cubic splines with df = 3. The best-fitting model will then be chosen on the basis of its AIC.
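
The "fit several candidates, then pick by AIC" workflow can be sketched so that a failing fit does not stop the comparison. The example below is an assumption-laden illustration, not the gamlss analysis itself: it swaps in base R's glm() and MASS::glm.nb() on simulated data so that it runs without the duck data, and the "broken" entry is a hypothetical fitter that always fails, standing in for a model that stops with an error.

```r
library(MASS)  # provides glm.nb(); MASS ships with standard R installations

set.seed(42)
n <- 182
x <- runif(n)
# Illustrative zero-inflated, overdispersed response (not the duck data).
y <- ifelse(rbinom(n, 1, 0.6) == 1, 0,
            rnbinom(n, size = 1, mu = exp(0.5 + 1.5 * x)))
sim <- data.frame(y = y, x = x)

# One fitting function per candidate model.
fitters <- list(
  poisson = function() glm(y ~ x, family = poisson, data = sim),
  negbin  = function() glm.nb(y ~ x, data = sim),
  broken  = function() stop("simulated fitting failure")  # hypothetical
)

# Fit every candidate; a fit that throws an error is recorded as NULL.
fits <- lapply(fitters, function(f) tryCatch(f(), error = function(e) NULL))
fitted_ok <- !vapply(fits, is.null, logical(1))

# Rank the models that could be fitted by AIC (smaller is better).
ranking <- data.frame(model = names(fits)[fitted_ok],
                      AIC = vapply(fits[fitted_ok], AIC, numeric(1)))
ranking <- ranking[order(ranking$AIC), ]
ranking
```

With gamlss, each entry of the list would instead wrap one of the gamlss() calls shown in this message. Note that tryCatch() as used here only catches errors; a fit that merely warns that it has not yet converged would still enter the ranking and would need a separate check.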

However, these models frequently fail to converge, and I am not sure why, and what to do about it. For example:

> model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= PO)
GAMLSS-RS iteration 1: Global Deviance = 1350.623
GAMLSS-RS iteration 2: Global Deviance = 1350.623

> model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= ZIP)

GAMLSS-RS iteration 1: Global Deviance = 326.3819 
GAMLSS-RS iteration 2: Global Deviance = 225.1232 
GAMLSS-RS iteration 3: Global Deviance = 319.9663 
Error in RS() : The global deviance is increasing
 Try different steps for the parameters or the model maybe inappropriate
In addition: There were 14 warnings (use warnings() to see them)

> model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= NBI)
GAMLSS-RS iteration 1: Global Deviance = 291.8607
GAMLSS-RS iteration 2: Global Deviance = 291.3291
######......######
GAMLSS-RS iteration 4: Global Deviance = 291.1135
GAMLSS-RS iteration 20: Global Deviance = 291.107
Warning message:
In RS() : Algorithm RS has not yet converged

> model_NBII <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= NBII)
GAMLSS-RS iteration 1: Global Deviance = 284.5993
GAMLSS-RS iteration 2: Global Deviance = 281.9548
######......######
GAMLSS-RS iteration 5: Global Deviance = 280.7311
GAMLSS-RS iteration 15: Global Deviance = 280.6343

> model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= ZINBI)

GAMLSS-RS iteration 1: Global Deviance = 1672.234 
GAMLSS-RS iteration 2: Global Deviance = 544.742 
GAMLSS-RS iteration 3: Global Deviance = 598.9939 
Error in RS() : The global deviance is increasing
 Try different steps for the parameters or the model maybe inappropriate

Thus, in this case, only the Poisson (PO) and Negative Binomial type I (NBI) converge whereas all other models fail…

My first approach was to drop the smoothing terms from each model, or to further reduce the number of variables, but this does not solve the problem: most models still fail, often with an “Error in RS() : The global deviance is increasing” message.

Given that the dependent variable is zero-inflated and overdispersed, I would expect the Zero-Inflated Negative Binomial (ZINBI) distribution to be the best fit, but the ZINBI fails even in the following very simple examples:

> model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3),data=data,family= ZINBI)

GAMLSS-RS iteration 1: Global Deviance = 3508.533 
GAMLSS-RS iteration 2: Global Deviance = 1117.121 
GAMLSS-RS iteration 3: Global Deviance = 652.5771 
GAMLSS-RS iteration 4: Global Deviance = 632.8885 
GAMLSS-RS iteration 5: Global Deviance = 645.1169 
Error in RS() : The global deviance is increasing
 Try different steps for the parameters or the model maybe inappropriate

> model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family= ZINBI)

GAMLSS-RS iteration 1: Global Deviance = 3831.864 
GAMLSS-RS iteration 2: Global Deviance = 1174.605 
GAMLSS-RS iteration 3: Global Deviance = 562.5428 
GAMLSS-RS iteration 4: Global Deviance = 344.0637 
GAMLSS-RS iteration 5: Global Deviance = 1779.018 
Error in RS() : The global deviance is increasing
 Try different steps for the parameters or the model maybe inappropriate

Any suggestions on how to proceed with this?

Many thanks in advance,

Diederik

Diederik Strubbe
Evolutionary Ecology Group
Department of Biology
University of Antwerp
Groenenborgerlaan 171
2020 Antwerpen, Belgium
tel: +32 3 265 3464




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 02 Jun 2010 - 11:04:43 GMT
