Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

From: Joris Meys <jorismeys_at_gmail.com>
Date: Wed, 02 Jun 2010 13:45:35 +0200

Hi Diederik,

I can't say immediately why your models fail without seeing the data and running a number of tests. You could play around with the other parameters, as the problem might be related to the optimization algorithm.

A different approach would be using the methods in the vegan package. This package contains a set of ordination methods that allow to map environmental variables on the ordination space.

You could also use the hurdle model from the pscl library, but this is not guaranteed to work better. It is not a zero-inflation model, but a two-component model that fits the zeros using eg a binomial and the rest of the counts using eg a poisson or a negative binomial.

Cheers
Joris

On Wed, Jun 2, 2010 at 12:56 PM, Strubbe Diederik <diederik.strubbe_at_ua.ac.be
> wrote:

> Dear all,
>
> I am using gamlss (Package gamlss version 4.0-0, R version 2.10.1, Windows
> XP Service Pack 3 on a HP EliteBook) to relate bird counts to habit
> variables. However, most models fail because “the global deviance is
> increasing” and I am not sure what causes this behaviour. The dataset
> consists of counts of birds (duck) and 5 habit variables measured in the
> field (n= 182). The dependent variable (the number of ducks
> counted)’suffers’ from zero-inflation and overdisperion:
>
> > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> > mean <- mean(data$duck)
> > var <- var(data$duck)
> > proportion_non_zero
> [1] 0.1153846
> > mean
> [1] 1.906593
> > var
> [1] 37.35587
>
> (I have no idea how to simulate a zero-inflated overdispersed Poisson
> variable, but the data used can be found at
> http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519).
>
>
> First, I create a (strong) pattern in the dataset by:
> data$LFAP200 <- data$LFAP200 + (data$duck*data$duck)
>
> I try to analyze these data by fitting several possible distributions
> (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and type II
> NBI NBII and zero-inflated negative binomial ZINBI) while using cubic
> splines with a df=3. The best fitting model will then be choses on the basis
> of its AIC.
>
> However, these models frequently fail to converge, and I am not sure why,
> and what to do about it. For example:
>
> > model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> PO)
> GAMLSS-RS iteration 1: Global Deviance = 1350.623
> GAMLSS-RS iteration 2: Global Deviance = 1350.623
>
> > model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> ZIP)
> GAMLSS-RS iteration 1: Global Deviance = 326.3819
> GAMLSS-RS iteration 2: Global Deviance = 225.1232
> GAMLSS-RS iteration 3: Global Deviance = 319.9663
> Error in RS() : The global deviance is increasing
> Try different steps for the parameters or the model maybe inappropriate
> In addition: There were 14 warnings (use warnings() to see them)
>
> > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> NBI)
> GAMLSS-RS iteration 1: Global Deviance = 291.8607
> GAMLSS-RS iteration 2: Global Deviance = 291.3291
> ######......######
> GAMLSS-RS iteration 4: Global Deviance = 291.1135
> GAMLSS-RS iteration 20: Global Deviance = 291.107
> Warning message:
> In RS() : Algorithm RS has not yet converged
>
> > model_NBII <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> NBII)
> GAMLSS-RS iteration 1: Global Deviance = 284.5993
> GAMLSS-RS iteration 2: Global Deviance = 281.9548
> ######......######
> GAMLSS-RS iteration 5: Global Deviance = 280.7311
> GAMLSS-RS iteration 15: Global Deviance = 280.6343
>
> > model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> ZINBI)
> GAMLSS-RS iteration 1: Global Deviance = 1672.234
> GAMLSS-RS iteration 2: Global Deviance = 544.742
> GAMLSS-RS iteration 3: Global Deviance = 598.9939
> Error in RS() : The global deviance is increasing
> Try different steps for the parameters or the model maybe inappropriate
>
>
> Thus, in this case, only the Poisson (PO) and Negative Binomial type I
> (NBI)converge whereas all other models fail…
>
> My first approach was to omit the smoothing factors for each model, or
> further reduce the number of variables but this does not solve the problem
> and most models fail, often yielding a “Error in RS() : The global deviance
> is increasing” message.
>
> I would think that, given the fact that the dependent variable is
> zero-inflated and overdispersed, that the Zero-Inflated Negative Binomial
> (ZINBI) distribution would be the best fit, but the ZINBI even fails in the
> following very simple examples.
>
> > model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3),data=data,family= ZINBI)
> GAMLSS-RS iteration 1: Global Deviance = 3508.533
> GAMLSS-RS iteration 2: Global Deviance = 1117.121
> GAMLSS-RS iteration 3: Global Deviance = 652.5771
> GAMLSS-RS iteration 4: Global Deviance = 632.8885
> GAMLSS-RS iteration 5: Global Deviance = 645.1169
> Error in RS() : The global deviance is increasing
> Try different steps for the parameters or the model maybe inappropriate
>
> > model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family= ZINBI)
> GAMLSS-RS iteration 1: Global Deviance = 3831.864
> GAMLSS-RS iteration 2: Global Deviance = 1174.605
> GAMLSS-RS iteration 3: Global Deviance = 562.5428
> GAMLSS-RS iteration 4: Global Deviance = 344.0637
> GAMLSS-RS iteration 5: Global Deviance = 1779.018
> Error in RS() : The global deviance is increasing
> Try different steps for the parameters or the model maybe inappropriate
>
>
>
> Any suggestions on how to proceed with this?
>
> Many thanks in advance,
>
>
> Diederik
>
>
> Diederik Strubbe
> Evolutionary Ecology Group
> Department of Biology
> University of Antwerp
> Groenenborgerlaan 171
> 2020 Antwerpen, Belgium
> tel: +32 3 265 3464
>
>
> [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
Joris.Meys_at_Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

	[[alternative HTML version deleted]]


______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Wed 02 Jun 2010 - 11:49:44 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Jun 2010 - 07:50:28 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive