Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Thu, 03 Jun 2010 16:35:58 +0100

On Thu, 2010-06-03 at 17:00 +0200, Joris Meys wrote:
> On Thu, Jun 3, 2010 at 9:27 AM, Gavin Simpson <gavin.simpson@ucl.ac.uk>wrote:
>
> >
> > vegan is probably not too useful here as the response is univariate;
> > counts of ducks.
> >
>
> If we assume that only one species is counted and of interest for the whole
> research. I (probably wrongly) assumed that data for multiple species was
> available.
>
> Without knowledge about the whole research setup it is difficult to say
> which method is the best, or even which methods are appropriate. VGAM is
> indeed a powerful tool, but :
>
> > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> means 182 observations in the dataset
>
> > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,
> family= NBI)
>
> is 5 splines with 3df, an intercept, that's a lot of df for only 182
> observations. using VGAM ain't going to help here.

How do you know?

> I'd reckon that the model
> itself should be reconsidered, rather than the distribution used to fit the
> error terms.

I was going to mention that too, but the OP did reduce this down to a single smooth and the problem of increasing deviance remained. Hence trying to fit a /similar/ model in other software might give an indication whether the problems are restricted to a single software or a more general issue of the data/problem?

At this stage the OP is stuck not knowing what is wrong; (s)he has nothing to do model checking on etc. Trying zeroinfl() and fitting a parametric model, for example, might be a useful starting point, then move on to models with smoothers if required.

G

>
> Cheers
> Joris
>
>
>
> > > You could also use the hurdle model from the pscl library, but this is
> > not
> > > guaranteed to work better. It is not a zero-inflation model, but a
> > > two-component model that fits the zeros using eg a binomial and the rest
> > of
> > > the counts using eg a poisson or a negative binomial.
> >
> > zeroinfl() in the same (pscl) package will fit ZIP and ZINB models.
> > Neither of these will be a GAM though, so if the OP was wanting smooth
> > functions of covariates they may not so useful, although bs() in package
> > splines might be used to include smooth terms into these functions.
> >
> > gam() in mgcv has the negative binomial family that might be a useful
> > starting point to modelling the overdispersion.
> >
> > Package vgam has functions to fit ZIP/ZINB or the hurdle models
> > mentioned above with smooth functions of covariates.
> >
> > In any case the OP should probably seek help directly from the gamlss
> > package maintainers to get gamlss working on their data or suggest why
> > the fitting isn't working.
> >
> > HTH
> >
> > G
> >
> > >
> > > Cheers
> > > Joris
> > >
> > > On Wed, Jun 2, 2010 at 12:56 PM, Strubbe Diederik <
> > diederik.strubbe_at_ua.ac.be
> > > > wrote:
> > >
> > > > Dear all,
> > > >
> > > > I am using gamlss (Package gamlss version 4.0-0, R version 2.10.1,
> > Windows
> > > > XP Service Pack 3 on a HP EliteBook) to relate bird counts to habit
> > > > variables. However, most models fail because the global deviance is
> > > > increasing and I am not sure what causes this behaviour. The dataset
> > > > consists of counts of birds (duck) and 5 habit variables measured in
> > the
> > > > field (n= 182). The dependent variable (the number of ducks
> > > > counted)suffers from zero-inflation and overdisperion:
> > > >
> > > > > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> > > > > mean <- mean(data$duck)
> > > > > var <- var(data$duck)
> > > > > proportion_non_zero
> > > > [1] 0.1153846
> > > > > mean
> > > > [1] 1.906593
> > > > > var
> > > > [1] 37.35587
> > > >
> > > > (I have no idea how to simulate a zero-inflated overdispersed Poisson
> > > > variable, but the data used can be found at
> > > > http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519).
> > > >
> > > >
> > > > First, I create a (strong) pattern in the dataset by:
> > > > data$LFAP200 <- data$LFAP200 + (data$duck*data$duck)
> > > >
> > > > I try to analyze these data by fitting several possible distributions
> > > > (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and
> > type II
> > > > NBI NBII and zero-inflated negative binomial ZINBI) while using cubic
> > > > splines with a df=3. The best fitting model will then be choses on the
> > basis
> > > > of its AIC.
> > > >
> > > > However, these models frequently fail to converge, and I am not sure
> > why,
> > > > and what to do about it. For example:
> > > >
> > > > > model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3)
> > +
> > > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> > cs(LFAP200,df=3),data=data,family=
> > > > PO)
> > > > GAMLSS-RS iteration 1: Global Deviance = 1350.623
> > > > GAMLSS-RS iteration 2: Global Deviance = 1350.623
> > > >
> > > > > model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) +
> > cs(HHCDI1000,df=3) +
> > > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> > cs(LFAP200,df=3),data=data,family=
> > > > ZIP)
> > > > GAMLSS-RS iteration 1: Global Deviance = 326.3819
> > > > GAMLSS-RS iteration 2: Global Deviance = 225.1232
> > > > GAMLSS-RS iteration 3: Global Deviance = 319.9663
> > > > Error in RS() : The global deviance is increasing
> > > > Try different steps for the parameters or the model maybe
> > inappropriate
> > > > In addition: There were 14 warnings (use warnings() to see them)
> > > >
> > > > > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> > cs(LFAP200,df=3),data=data,family=
> > > > NBI)
> > > > GAMLSS-RS iteration 1: Global Deviance = 291.8607
> > > > GAMLSS-RS iteration 2: Global Deviance = 291.3291
> > > > ######......######
> > > > GAMLSS-RS iteration 4: Global Deviance = 291.1135
> > > > GAMLSS-RS iteration 20: Global Deviance = 291.107
> > > > Warning message:
> > > > In RS() : Algorithm RS has not yet converged
> > > >
> > > > > model_NBII <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> > cs(LFAP200,df=3),data=data,family=
> > > > NBII)
> > > > GAMLSS-RS iteration 1: Global Deviance = 284.5993
> > > > GAMLSS-RS iteration 2: Global Deviance = 281.9548
> > > > ######......######
> > > > GAMLSS-RS iteration 5: Global Deviance = 280.7311
> > > > GAMLSS-RS iteration 15: Global Deviance = 280.6343
> > > >
> > > > > model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> > cs(LFAP200,df=3),data=data,family=
> > > > ZINBI)
> > > > GAMLSS-RS iteration 1: Global Deviance = 1672.234
> > > > GAMLSS-RS iteration 2: Global Deviance = 544.742
> > > > GAMLSS-RS iteration 3: Global Deviance = 598.9939
> > > > Error in RS() : The global deviance is increasing
> > > > Try different steps for the parameters or the model maybe
> > inappropriate
> > > >
> > > >
> > > > Thus, in this case, only the Poisson (PO) and Negative Binomial type I
> > > > (NBI)converge whereas all other models fail
> > > >
> > > > My first approach was to omit the smoothing factors for each model, or
> > > > further reduce the number of variables but this does not solve the
> > problem
> > > > and most models fail, often yielding a Error in RS() : The global
> > deviance
> > > > is increasing message.
> > > >
> > > > I would think that, given the fact that the dependent variable is
> > > > zero-inflated and overdispersed, that the Zero-Inflated Negative
> > Binomial
> > > > (ZINBI) distribution would be the best fit, but the ZINBI even fails in
> > the
> > > > following very simple examples.
> > > >
> > > > > model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3),data=data,family=
> > ZINBI)
> > > > GAMLSS-RS iteration 1: Global Deviance = 3508.533
> > > > GAMLSS-RS iteration 2: Global Deviance = 1117.121
> > > > GAMLSS-RS iteration 3: Global Deviance = 652.5771
> > > > GAMLSS-RS iteration 4: Global Deviance = 632.8885
> > > > GAMLSS-RS iteration 5: Global Deviance = 645.1169
> > > > Error in RS() : The global deviance is increasing
> > > > Try different steps for the parameters or the model maybe
> > inappropriate
> > > >
> > > > > model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family= ZINBI)
> > > > GAMLSS-RS iteration 1: Global Deviance = 3831.864
> > > > GAMLSS-RS iteration 2: Global Deviance = 1174.605
> > > > GAMLSS-RS iteration 3: Global Deviance = 562.5428
> > > > GAMLSS-RS iteration 4: Global Deviance = 344.0637
> > > > GAMLSS-RS iteration 5: Global Deviance = 1779.018
> > > > Error in RS() : The global deviance is increasing
> > > > Try different steps for the parameters or the model maybe
> > inappropriate
> > > >
> > > >
> > > >
> > > > Any suggestions on how to proceed with this?
> > > >
> > > > Many thanks in advance,
> > > >
> > > >
> > > > Diederik
> > > >
> > > >
> > > > Diederik Strubbe
> > > > Evolutionary Ecology Group
> > > > Department of Biology
> > > > University of Antwerp
> > > > Groenenborgerlaan 171
> > > > 2020 Antwerpen, Belgium
> > > > tel: +32 3 265 3464
> > > >
> > > >
> > > > [[alternative HTML version deleted]]
> > > >
> > > >
> > > > ______________________________________________
> > > > R-help_at_r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > > >
> > > >
> > >
> > >
> > > ______________________________________________
> > > R-help_at_r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> > ECRC, UCL Geography, [f] +44 (0)20 7679 0565
> > Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> > Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/<http://www.ucl.ac.uk/%7Eucfagls/>
> > UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >
> >
>
>

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 03 Jun 2010 - 15:36:50 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Jun 2010 - 17:10:29 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive