Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

From: Joris Meys <jorismeys_at_gmail.com>
Date: Thu, 03 Jun 2010 17:00:04 +0200

On Thu, Jun 3, 2010 at 9:27 AM, Gavin Simpson <gavin.simpson_at_ucl.ac.uk>wrote:

>
> vegan is probably not too useful here as the response is univariate;
> counts of ducks.
>

If we assume that only one species is counted and of interest for the whole research. I (probably wrongly) assumed that data for multiple species was available.

Without knowledge about the whole research setup it is difficult to say which method is the best, or even which methods are appropriate. VGAM is indeed a powerful tool, but :

> proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
means 182 observations in the dataset

> model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data, family= NBI)

is 5 splines with 3df, an intercept, that's a lot of df for only 182 observations. using VGAM ain't going to help here. I'd reckon that the model itself should be reconsidered, rather than the distribution used to fit the error terms.

Cheers
Joris

> > You could also use the hurdle model from the pscl library, but this is
> not
> > guaranteed to work better. It is not a zero-inflation model, but a
> > two-component model that fits the zeros using eg a binomial and the rest
> of
> > the counts using eg a poisson or a negative binomial.
>
> zeroinfl() in the same (pscl) package will fit ZIP and ZINB models.
> Neither of these will be a GAM though, so if the OP was wanting smooth
> functions of covariates they may not so useful, although bs() in package
> splines might be used to include smooth terms into these functions.
>
> gam() in mgcv has the negative binomial family that might be a useful
> starting point to modelling the overdispersion.
>
> Package vgam has functions to fit ZIP/ZINB or the hurdle models
> mentioned above with smooth functions of covariates.
>
> In any case the OP should probably seek help directly from the gamlss
> package maintainers to get gamlss working on their data or suggest why
> the fitting isn't working.
>
> HTH
>
> G
>
> >
> > Cheers
> > Joris
> >
> > On Wed, Jun 2, 2010 at 12:56 PM, Strubbe Diederik <
> diederik.strubbe_at_ua.ac.be
> > > wrote:
> >
> > > Dear all,
> > >
> > > I am using gamlss (Package gamlss version 4.0-0, R version 2.10.1,
> Windows
> > > XP Service Pack 3 on a HP EliteBook) to relate bird counts to habit
> > > variables. However, most models fail because the global deviance is
> > > increasing and I am not sure what causes this behaviour. The dataset
> > > consists of counts of birds (duck) and 5 habit variables measured in
> the
> > > field (n= 182). The dependent variable (the number of ducks
> > > counted)suffers from zero-inflation and overdisperion:
> > >
> > > > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> > > > mean <- mean(data$duck)
> > > > var <- var(data$duck)
> > > > proportion_non_zero
> > > [1] 0.1153846
> > > > mean
> > > [1] 1.906593
> > > > var
> > > [1] 37.35587
> > >
> > > (I have no idea how to simulate a zero-inflated overdispersed Poisson
> > > variable, but the data used can be found at
> > > http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519).
> > >
> > >
> > > First, I create a (strong) pattern in the dataset by:
> > > data$LFAP200 <- data$LFAP200 + (data$duck*data$duck)
> > >
> > > I try to analyze these data by fitting several possible distributions
> > > (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and
> type II
> > > NBI NBII and zero-inflated negative binomial ZINBI) while using cubic
> > > splines with a df=3. The best fitting model will then be choses on the
> basis
> > > of its AIC.
> > >
> > > However, these models frequently fail to converge, and I am not sure
> why,
> > > and what to do about it. For example:
> > >
> > > > model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3)
> +
> > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> cs(LFAP200,df=3),data=data,family=
> > > PO)
> > > GAMLSS-RS iteration 1: Global Deviance = 1350.623
> > > GAMLSS-RS iteration 2: Global Deviance = 1350.623
> > >
> > > > model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) +
> cs(HHCDI1000,df=3) +
> > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> cs(LFAP200,df=3),data=data,family=
> > > ZIP)
> > > GAMLSS-RS iteration 1: Global Deviance = 326.3819
> > > GAMLSS-RS iteration 2: Global Deviance = 225.1232
> > > GAMLSS-RS iteration 3: Global Deviance = 319.9663
> > > Error in RS() : The global deviance is increasing
> > > Try different steps for the parameters or the model maybe
> inappropriate
> > > In addition: There were 14 warnings (use warnings() to see them)
> > >
> > > > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> cs(LFAP200,df=3),data=data,family=
> > > NBI)
> > > GAMLSS-RS iteration 1: Global Deviance = 291.8607
> > > GAMLSS-RS iteration 2: Global Deviance = 291.3291
> > > ######......######
> > > GAMLSS-RS iteration 4: Global Deviance = 291.1135
> > > GAMLSS-RS iteration 20: Global Deviance = 291.107
> > > Warning message:
> > > In RS() : Algorithm RS has not yet converged
> > >
> > > > model_NBII <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> cs(LFAP200,df=3),data=data,family=
> > > NBII)
> > > GAMLSS-RS iteration 1: Global Deviance = 284.5993
> > > GAMLSS-RS iteration 2: Global Deviance = 281.9548
> > > ######......######
> > > GAMLSS-RS iteration 5: Global Deviance = 280.7311
> > > GAMLSS-RS iteration 15: Global Deviance = 280.6343
> > >
> > > > model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> cs(LFAP200,df=3),data=data,family=
> > > ZINBI)
> > > GAMLSS-RS iteration 1: Global Deviance = 1672.234
> > > GAMLSS-RS iteration 2: Global Deviance = 544.742
> > > GAMLSS-RS iteration 3: Global Deviance = 598.9939
> > > Error in RS() : The global deviance is increasing
> > > Try different steps for the parameters or the model maybe
> inappropriate
> > >
> > >
> > > Thus, in this case, only the Poisson (PO) and Negative Binomial type I
> > > (NBI)converge whereas all other models fail
> > >
> > > My first approach was to omit the smoothing factors for each model, or
> > > further reduce the number of variables but this does not solve the
> problem
> > > and most models fail, often yielding a Error in RS() : The global
> deviance
> > > is increasing message.
> > >
> > > I would think that, given the fact that the dependent variable is
> > > zero-inflated and overdispersed, that the Zero-Inflated Negative
> Binomial
> > > (ZINBI) distribution would be the best fit, but the ZINBI even fails in
> the
> > > following very simple examples.
> > >
> > > > model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3),data=data,family=
> ZINBI)
> > > GAMLSS-RS iteration 1: Global Deviance = 3508.533
> > > GAMLSS-RS iteration 2: Global Deviance = 1117.121
> > > GAMLSS-RS iteration 3: Global Deviance = 652.5771
> > > GAMLSS-RS iteration 4: Global Deviance = 632.8885
> > > GAMLSS-RS iteration 5: Global Deviance = 645.1169
> > > Error in RS() : The global deviance is increasing
> > > Try different steps for the parameters or the model maybe
> inappropriate
> > >
> > > > model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family= ZINBI)
> > > GAMLSS-RS iteration 1: Global Deviance = 3831.864
> > > GAMLSS-RS iteration 2: Global Deviance = 1174.605
> > > GAMLSS-RS iteration 3: Global Deviance = 562.5428
> > > GAMLSS-RS iteration 4: Global Deviance = 344.0637
> > > GAMLSS-RS iteration 5: Global Deviance = 1779.018
> > > Error in RS() : The global deviance is increasing
> > > Try different steps for the parameters or the model maybe
> inappropriate
> > >
> > >
> > >
> > > Any suggestions on how to proceed with this?
> > >
> > > Many thanks in advance,
> > >
> > >
> > > Diederik
> > >
> > >
> > > Diederik Strubbe
> > > Evolutionary Ecology Group
> > > Department of Biology
> > > University of Antwerp
> > > Groenenborgerlaan 171
> > > 2020 Antwerpen, Belgium
> > > tel: +32 3 265 3464
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > >
> > > ______________________________________________
> > > R-help_at_r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> ECRC, UCL Geography, [f] +44 (0)20 7679 0565
> Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/<http://www.ucl.ac.uk/%7Eucfagls/>
> UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
>

-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
Joris.Meys_at_Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

	[[alternative HTML version deleted]]

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 03 Jun 2010 - 15:03:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Jun 2010 - 15:50:27 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive