From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>

Date: Sat 23 Sep 2006 - 15:15:17 GMT

E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun Sep 24 01:19:40 2006


On 22-Sep-06 Ted Harding wrote:

> I've just come across a kind of problem which leads
> me to wonder how to approach it in R.
>
> Basically, each of a set of items is subjected to a series
> of "impacts" until it eventually "fails". The "force"
> of each impact would depend on covariates X,Y say;
> [...]
> ... one could envisage
> something like a logistic model for the probability
> of failure at each impact, leading to a kind of
> generalised "geometric distribution" -- that is,
> the likelihood for each item would be of the form
>
> (1-P[1])*(1-P[2])*...*(1-P[n-1])*P[n]
>
> where P[i] could have a logistic model in terms of
> the values of X[i] and Y[i], and n is the index of
> the impact at which failure occurred. That is then
> a solvable problem.
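For concreteness, the quoted per-item likelihood can be written down directly. This is a minimal Python sketch, not code from any R package: the names logistic and item_loglik are hypothetical, and the form P[i] = 1/(1 + exp(-(b0 + bx*X[i] + by*Y[i]))) is an assumed logistic parameterisation.

```python
import math

def logistic(eta):
    """Inverse logit: map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def item_loglik(beta, xs, ys):
    """Log-likelihood of one item that fails at its last recorded impact.

    xs[i], ys[i] are the covariates X, Y at impact i + 1; beta = (b0, bx, by)
    is an assumed (hypothetical) logistic model P = logistic(b0 + bx*X + by*Y).
    """
    b0, bx, by = beta
    n = len(xs)
    total = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        p = logistic(b0 + bx * x + by * y)
        # impacts 1..n-1 were survived (factor 1 - P[i]);
        # impact n caused the failure (factor P[n])
        total += math.log(p) if i == n - 1 else math.log1p(-p)
    return total

# With all coefficients zero, every P[i] is 0.5, so this is 3 * log(0.5).
print(item_loglik((0.0, 0.0, 0.0), [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

With bx = by = 0 every P[i] equals the same p, and the sum reduces to log(p) + (n-1)*log(1-p), i.e. the geometric case.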

I may be getting closer, but am well off target still!

Starting with the case of no covariates, one has

p*(1-p)^(n-1) (n = 1,2,...) or p*(1-p)^y (y = 0,1,...)

which is a particular case of a negative binomial, with "target successes" = 1. In terms of the two-stage model for a negative binomial (see V&R MASS section 7.4), this corresponds to

(mu^y * theta^theta)/(mu + theta)^(theta + y) * gamma(theta + y)/(gamma(theta) * y!)

with theta = 1 and p = theta/(mu + theta) = 1/(mu + 1).
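A quick numerical check of the theta = 1 reduction, as a Python sketch (nb_pmf and geom_pmf are hypothetical helper names; nb_pmf simply transcribes the formula above on the log scale using lgamma):

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf in the (mu, theta) form quoted above (V&R MASS 7.4)."""
    log_f = (math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
             + y * math.log(mu) + theta * math.log(theta)
             - (theta + y) * math.log(mu + theta))
    return math.exp(log_f)

def geom_pmf(y, p):
    """p * (1 - p)^y for y = 0, 1, 2, ... (number of impacts survived)."""
    return p * (1 - p) ** y

mu = 2.5
p = 1 / (mu + 1)  # p = theta/(mu + theta) with theta = 1
for y in range(10):
    assert math.isclose(nb_pmf(y, mu, 1.0), geom_pmf(y, p), rel_tol=1e-9)
print("theta = 1 negative binomial matches p*(1-p)^y for y = 0..9")
```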

I arrived at this after landing on glm.nb in MASS.

However, glm.nb fits theta, which I would want to fix at 1.

I don't see anything in ?glm.nb which allows theta to be held at a fixed value.
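For what it's worth, once theta is fixed the model is an ordinary GLM, so it can be fitted by iteratively reweighted least squares with no theta step at all. (In R, MASS also supplies a negative.binomial(theta) family for use with glm(), which holds theta at the value given.) The Python sketch below assumes the log link and theta = 1; fit_nb_fixed_theta is a hypothetical name, and the starting values mu = y + 0.5 are an arbitrary choice.

```python
import numpy as np

def fit_nb_fixed_theta(X, y, theta=1.0, max_iter=100, tol=1e-10):
    """IRLS fit of log(mu) = X @ beta for a negative binomial response
    with theta held fixed (theta = 1 is the geometric case)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                      # crude positive starting values
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Var(Y) = mu + mu^2/theta and dmu/deta = mu under the log link,
        # so the IRLS weight is mu^2 / Var(Y) = mu / (1 + mu/theta).
        w = mu / (1.0 + mu / theta)
        z = eta + (y - mu) / mu       # working response for the log link
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta

# Two groups (x = 0 and x = 1): the fitted means should be the group means 1 and 4.
X = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])
y = np.array([0, 1, 2, 3, 4, 5])
beta = fit_nb_fixed_theta(X, y)
print(np.exp(beta[0]), np.exp(beta[0] + beta[1]))
```

Note this still treats each sequence as a single observation with one covariate row, so it does not by itself address the variable-length covariate issue.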

The next snag is that it would not be straightforward, as far as I can see, to introduce covariates. The typical data set would be a set of sequences each of the form

  X1  Y1  0
  X2  Y2  0
  ...
  Xn  Yn  1

where the value of n is random, and so varies from sequence to sequence. In the above negative binomial framework, y = n-1, and the covariates for that value of y would be the set

(X1,X2,...,Xn, Y1,Y2,...,Yn)

and this set is therefore of variable length from one observation to the next (i.e. for each sequence as above, or each value of y). I don't know how one can accommodate a variable number of covariates per observation.
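One possible way round the variable-length problem, and the one the quoted likelihood points to, is to give each impact its own row with a 0/1 "failed" response rather than treating the whole sequence as one observation. The product (1-P[1])*...*(1-P[n-1])*P[n] is then a product of Bernoulli likelihoods, so an ordinary logistic regression on the stacked rows (in R, glm(..., family = binomial)) maximises it, and sequences of different lengths simply contribute different numbers of rows. A Python sketch, with hypothetical helper names and made-up illustrative data:

```python
import numpy as np

def expand_sequences(sequences):
    """Stack per-item sequences of (X, Y, failed) rows into one design matrix.

    Each sequence ends with failed == 1; a sequence of length n contributes
    n rows, so no fixed number of covariates per item is needed.
    """
    rows = [row for seq in sequences for row in seq]
    data = np.array(rows, dtype=float)
    X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
    failed = data[:, 2]
    return X, failed

def fit_logistic(X, r, max_iter=100, tol=1e-10):
    """Maximum likelihood logistic regression by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (r - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Illustrative sequences of (X, Y, failed), of lengths 2, 1, 3, 2, 1, 4.
sequences = [
    [(0.1, 1.0, 0), (0.5, 0.2, 1)],
    [(0.3, 0.4, 1)],
    [(0.2, 0.9, 0), (0.8, 0.1, 0), (0.6, 0.5, 1)],
    [(0.9, 0.3, 0), (0.4, 0.7, 1)],
    [(0.7, 0.8, 1)],
    [(0.5, 0.5, 0), (0.2, 0.3, 0), (0.9, 0.9, 0), (0.6, 0.2, 1)],
]
X, failed = expand_sequences(sequences)
beta = fit_logistic(X, failed)
print(beta)
```

With the intercept alone this reproduces the no-covariate geometric estimate p = (number of items)/(total number of impacts).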

So it looks as though glm.nb, although it follows the lines I am thinking along, won't fit the bill!

However, other features of glm.nb would be suitable, since

p/(1-p) = 1/mu

so that logit(p) = log(p/(1-p)) = -log(mu). A logistic model for p therefore means a linear fit to log(mu) (with the signs of the coefficients reversed), and glm.nb uses a log link.
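The relation is easy to check numerically. A small Python sketch (the helper name is hypothetical) confirming p/(1-p) = 1/mu and logit(p) = -log(mu) when p = 1/(mu + 1), so that coefficients from a log-link fit for mu are the negated logistic coefficients for p:

```python
import math

def logistic_coefs_from_log_mu(beta_log_mu):
    """If log(mu) = X @ beta, then logit(p) = -X @ beta, so negate each coefficient."""
    return [-b for b in beta_log_mu]

for p in (0.05, 0.3, 0.5, 0.9):
    mu = (1 - p) / p                  # inverting p = 1/(mu + 1)
    assert math.isclose(p / (1 - p), 1 / mu)
    assert math.isclose(math.log(p / (1 - p)), -math.log(mu), abs_tol=1e-12)
print(logistic_coefs_from_log_mu([0.2, -1.5]))
```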

Comments welcome!

With thanks,

Ted.


Date: 23-Sep-06 Time: 16:15:14


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Sat 23 Sep 2006 - 16:30:20 GMT.
