Re: [R] "logistic" + "neg binomial" + ...

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Sat 23 Sep 2006 - 15:15:17 GMT


On 22-Sep-06 Ted Harding wrote:
> I've just come across a kind of problem which leads
> me to wonder how to approach it in R.
>
> Basically, each of a set of items is subjected to a series
> of "impacts" until it eventually "fails". The "force"
> of each impact would depend on covariates X,Y say;
> [...]
> ... one could envisage
> something like a logistic model for the probability
> of failure at each impact, leading to a kind of
> generalised "geometric distribution" -- that is,
> the likelihood for each item would be of the form
>
> (1-P[1])*(1-P[2])*...*(1-P[n-1])*P[n]
>
> where P[i] could have a logistic model in terms of
> the values of X[i] and Y[i], and n is the index of
> the impact at which failure occurred. That is then
> a solvable problem.
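
For concreteness, the per-item likelihood quoted above can be written as a short R function on the log scale. This is only a sketch: the name loglik_item and the coefficient layout beta = (intercept, X coefficient, Y coefficient) are assumptions made for illustration, not anything fixed in the original post.

```r
# Log-likelihood for ONE item that survived impacts 1..(n-1) and failed at n.
# beta = c(intercept, coef for X, coef for Y)  (hypothetical layout)
loglik_item <- function(beta, X, Y) {
  n <- length(X)
  P <- plogis(beta[1] + beta[2] * X + beta[3] * Y)  # logistic model for P[i]
  # sum of log(1 - P[i]) over the survived impacts, plus log(P[n])
  sum(log1p(-P[-n])) + log(P[n])
}

# Illustrative values only
beta <- c(-1, 0.5, 0.25)
X <- c(0.1, -0.3, 1.2)
Y <- c(1.0, 0.4, -0.7)
loglik_item(beta, X, Y)
```

Summing loglik_item over items gives the full log-likelihood, which could in principle be handed to optim().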

I may be getting closer, but am well off target still!

Starting with the case of no covariates, one has

   p*(1-p)^(n-1) (n = 1,2,...) or p*(1-p)^y (y = 0,1,...)

which is a particular case of a negative binomial, with "target successes" = 1. In terms of the two-stage model for a negative binomial (see V&R MASS section 7.4), this corresponds to

   (mu^y * theta^theta)/(mu + theta)^(theta + y) * gamma(theta + y)/(gamma(theta)*y!)

with theta = 1 and p = theta/(mu + theta) = 1/(mu + 1).
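
As a quick numerical check of this identity, the following sketch compares p*(1-p)^y with the negative binomial density at theta = 1, using only base R. The value of mu is arbitrary and chosen purely for illustration.

```r
mu <- 2.5                 # arbitrary illustrative mean
theta <- 1
p  <- theta / (mu + theta)
y  <- 0:10

geom_direct <- p * (1 - p)^y                      # p*(1-p)^y
nb_formula  <- (mu^y * theta^theta) / (mu + theta)^(theta + y) *
               gamma(theta + y) / (gamma(theta) * factorial(y))
nb_builtin  <- dnbinom(y, size = theta, mu = mu)  # R's mean parametrisation

all.equal(geom_direct, nb_formula)
all.equal(geom_direct, nb_builtin)
all.equal(geom_direct, dgeom(y, prob = p))
```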

This was in the context of having landed on glm.nb in MASS.

However, glm.nb fits theta, which I would want to fix at 1.

I don't see anything in ?glm.nb which allows theta to be held at a fixed value.
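
One possibility that may be worth checking: MASS also provides a family object, negative.binomial(theta), which can be passed to the ordinary glm() with theta supplied as a known constant. The sketch below uses simulated data; the sample size, covariate x and coefficients are invented for illustration, and it covers only the case of one covariate value per observation.

```r
library(MASS)

set.seed(42)
n_obs <- 2000
x  <- runif(n_obs, -1, 1)                # hypothetical covariate
mu <- exp(0.5 + 0.8 * x)                 # hypothetical true log-linear mean
y  <- rgeom(n_obs, prob = 1 / (1 + mu))  # geometric counts, i.e. NB with theta = 1

# theta is held at 1 rather than estimated
fit <- glm(y ~ x, family = negative.binomial(theta = 1))
coef(fit)

# Standard errors with the dispersion fixed at 1 instead of estimated
summary(fit, dispersion = 1)
```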

The next snag is that it would not be straightforward, as far as I can see, to introduce covariates. The typical data set would be a set of sequences each of the form

   X1 Y1 0
   X2 Y2 0
   .......
   Xn Yn 1

where the value of n is random, so varies from sequence to sequence. In the above negative binomial framework, y=(n-1) and the covariates for that value of y would be the set

 (X1,X2,...,Xn, Y1,Y2,...,Yn)

and therefore of variable length for each observation (i.e. sequence as above, or value of y per sequence). I don't know how one can accommodate a variable length of covariates per observation.
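
A possible way round the variable-length problem, offered as a sketch rather than a settled answer: the likelihood (1-P[1])*...*(1-P[n-1])*P[n] is a product of independent Bernoulli terms, one per impact. Stacking the sequences in "long" form, one row per impact with a 0/1 failure indicator, and fitting an ordinary binomial glm() would then maximise the same likelihood. The data below are simulated; the coefficients, the number of items and the helper sim_item are all hypothetical.

```r
set.seed(1)
b <- c(-2, 1, 0.5)   # hypothetical true (intercept, X, Y) coefficients

# Simulate one item: impacts continue until the first failure
sim_item <- function(id, max_n = 200) {
  rows <- vector("list", max_n)
  for (i in seq_len(max_n)) {
    X <- rnorm(1)
    Y <- rnorm(1)
    P <- plogis(b[1] + b[2] * X + b[3] * Y)
    fail <- rbinom(1, 1, P)
    rows[[i]] <- data.frame(id = id, X = X, Y = Y, fail = fail)
    if (fail == 1) break
  }
  do.call(rbind, rows[seq_len(i)])   # keep only the impacts that occurred
}

# One row per impact; sequences of different lengths simply stack
long <- do.call(rbind, lapply(1:300, sim_item))

fit <- glm(fail ~ X + Y, family = binomial, data = long)
coef(fit)
```

Each row carries its own X and Y, so no observation needs a variable-length covariate vector.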

So it looks as though glm.nb, while thinking along the lines I want, won't fit the bill!

However, other features of glm.nb would be suitable, since

  p/(1-p) = 1/mu

and a logistic model for p therefore means a linear fit to log(mu) (with the signs of the coefficients reversed, since logit(p) = -log(mu)), and glm.nb allows a log link.
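
The relation between the two scales can be confirmed numerically in base R. The mu values are arbitrary.

```r
mu <- c(0.2, 1, 3.7)      # arbitrary illustrative means
p  <- 1 / (mu + 1)

all.equal(p / (1 - p), 1 / mu)   # odds of failure equal 1/mu
all.equal(qlogis(p), -log(mu))   # hence logit(p) = -log(mu)
```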

Comments welcome!
With thanks,
Ted.



E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 23-Sep-06                                       Time: 16:15:14
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun Sep 24 01:19:40 2006

