From: Achim Zeileis <Achim.Zeileis_at_uibk.ac.at>

Date: Tue, 12 Apr 2011 08:45:04 +0200 (CEST)

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Tue 12 Apr 2011 - 07:01:02 GMT


On Mon, 11 Apr 2011, ty ty wrote:

> Hello, dear experts. I don't have much experience in building
> regression models, so sorry if this question is too simple and not
> very interesting.
> Currently I'm working on a model that has to predict the proportion of
> the debt returned by the debtor in some period of time. So the
> dependent variable can be any number between 0 and 1, with a very high
> probability of 0 (if there are no payments); if there are some
> payments, it is very likely to be 1 (all debt paid), although it can be
> any number from 0 to 1.
> Not having much knowledge in this area, I can't think of any
> appropriate model and wasn't able to find much on the Internet. Can
> anyone give me some ideas about possible models, any information
> online, and some R functions and packages that can implement them?
> Thank you in advance for any help.

Beta regression is one way to model proportions in the open unit interval (0, 1). It is available in R in the package "betareg":

http://CRAN.R-project.org/package=betareg
http://www.jstatsoft.org/v34/i02/
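As a minimal illustration (not part of the original reply), a beta regression is fitted with betareg much like a glm. The sketch below uses the GasolineYield data that ship with the package; the formula is illustrative, and it assumes betareg has been installed from CRAN.

```r
## Minimal beta regression sketch; assumes install.packages("betareg") has been run
library("betareg")

## yield: proportion of crude oil converted to gasoline, strictly inside (0, 1)
data("GasolineYield", package = "betareg")

## Mean model on the logit scale (default link) with a constant precision parameter
gy <- betareg(yield ~ batch + temp, data = GasolineYield)
summary(gy)

## Fitted means always stay inside the unit interval
range(fitted(gy))
```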

If 0 and 1 can occur, some authors have suggested scaling the response so that 0 and 1 are avoided. See the paper linked above for an example. If, however, there are many 0s and/or 1s, one might want to take a hurdle- or inflation-type approach. One such approach is implemented in the "gamlss" package:

http://CRAN.R-project.org/package=gamlss
http://www.jstatsoft.org/v23/i07/
http://www.gamlss.org/
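A minimal sketch of the rescaling idea, assuming the transformation (y * (n - 1) + 0.5) / n with n the sample size, which is one common choice for pulling exact 0s and 1s slightly into the open interval. The helper name squeeze01 is hypothetical. With many exact 0s or 1s, such compression piles mass near the boundaries, so it is best reserved for the case of few boundary observations.

```r
## Hypothetical helper: compress a proportion y in [0, 1] into (0, 1).
## Assumed transformation: (y * (n - 1) + 0.5) / n, n = number of observations.
squeeze01 <- function(y) {
  stopifnot(all(y >= 0 & y <= 1))
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y <- c(0, 0, 0.25, 1, 0.6)
ys <- squeeze01(y)
ys  # with n = 5, 0 maps to 0.1 and 1 maps to 0.9
```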

The hurdle approach can be implemented using separate building blocks. First, a binary regression model that captures whether the dependent variable is greater than 0 (i.e., crosses the hurdle): glm(I(y > 0) ~ ..., family = binomial). Second, a beta regression for only the observations in (0, 1) that crossed the hurdle: betareg(y ~ ..., subset = y > 0). This subset assumes that no observation equals exactly 1, because the classic beta regression model is defined only for responses strictly inside (0, 1). A recent technical report introduces such a family of models along with many further techniques (specialized residuals and regression diagnostics) that are not yet available in R:

http://arxiv.org/abs/1103.2372
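The two hurdle building blocks can be sketched on simulated data as follows. The data-generating values, the variable names and the single covariate x are assumptions for illustration only, and betareg must be installed. Because the two parts are fitted separately, the expected proportion for given x combines them as P(y > 0 | x) * E(y | y > 0, x).

```r
## Hurdle sketch: binary part (glm) + beta part (betareg) on simulated data
library("betareg")  # assumes install.packages("betareg") has been run

set.seed(1)
n <- 500
x <- runif(n)
## Assumed truth: any payment occurs with probability plogis(-0.5 + 2 * x)
pay <- rbinom(n, size = 1, prob = plogis(-0.5 + 2 * x))
## Assumed truth: proportion repaid, given payment, is beta with mean mu, precision phi
mu <- plogis(0.5 + x)
phi <- 10
y <- ifelse(pay == 1, rbeta(n, mu * phi, (1 - mu) * phi), 0)
d <- data.frame(y = y, x = x)

## Part 1: does the debtor pay anything at all (cross the hurdle)?
m_bin <- glm(I(y > 0) ~ x, family = binomial, data = d)

## Part 2: beta regression for the positive observations only
m_beta <- betareg(y ~ x, data = d, subset = y > 0)

## Combined mean: E[y | x] = P(y > 0 | x) * E[y | y > 0, x]
nd <- data.frame(x = c(0.2, 0.8))
p_pos <- predict(m_bin, newdata = nd, type = "response")
mu_pos <- predict(m_beta, newdata = nd, type = "response")
expected <- p_pos * mu_pos
expected
```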

Best,

Z

> Ihor.

