Re: [R] "logistic" + "neg binomial" + ...

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Sat 23 Sep 2006 - 11:23:48 GMT


On 23-Sep-06 Prof Brian Ripley wrote:

> On Fri, 22 Sep 2006, Ted.Harding@nessie.mcc.ac.uk wrote:
> 

>> Hi Folks,
>>
>> I've just come across a kind of problem which leads
>> me to wonder how to approach it in R.
>>
>> Basically, each of a set of items is subjected to a series
>> of "impacts" until it eventually "fails". The "force"
>> of each impact would depend on covariates X,Y say;
>> but as a result of preceding impacts an item would be
>> expected to have a "cumulative frailty" such that the
>> probability of failure due to a particular impact would
>> possibly increase according to the series of impacts
>> already survived.
>
> So this is a discrete-time survival model.

Essentially, yes. Though without "cumulative frailty" one need not take into account that each item is repeatedly "hit" until it "breaks" -- it would be the same as if each item were discarded after each impact and replaced by an identical one.

>> Without the "cumulative frailty" one could envisage
>> something like a logistic model for the probability
>> of failure at each impact, leading to a kind of
>> generalised "exponential distribution" -- that is,
>> the likelihood for each item would be of the form
>>
>> (1-P[1])*(1-P[2])*...*(1-P[n-1])*P[n]
>>
>> where P[i] could have a logistic model in terms of
>> the values of X[i] and Y[i], and n is the index of
>> the impact at which failure occurred. That is then
>> a solvable problem.
>>
>> Even so, I'm not (so far) finding in the R resources
>> the appropriate analogue of glm for this kind of
>> model. I dare say a prolonged trawl through the various
>> "survival" resources might lead to something applicable,
>> but ...
>
> What is inadequate about glm itself?

My concern about using glm is that it does not have an option like "family=negbinomial": the binary data are treated as if binomially distributed (a fixed number of trials), rather than as if trials continue until a fixed number of "successes" is reached. For example, consider the two cases (without covariates):

  A. 0 0 0 0 0 0 0 0 0 1
  B. 0 0 0 0 0 0 0 0 0 1

where A results from a fixed number of 10 trials, exactly one of which results in "1", while B results from repeating trials until a "1" is obtained, which here happens on the 10th trial.
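
In simulation terms (purely for illustration), the two sampling schemes correspond to something like

  p <- 0.1
  A <- rbinom(1, size = 10, prob = p)   # case A: count of "1"s in a fixed 10 trials
  B <- rgeom(1, prob = p) + 1           # case B: index of the trial giving the first "1"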

Now, granted, both A and B have the same likelihood function for p (the probability of "1"), namely p*(1-p)^9, so the same MLE of p results, namely p = 1/10. However, they have quite different sampling properties. If, with

  y <- cbind(c(0,0,0,0,0,0,0,0,0,1),c(1,1,1,1,1,1,1,1,1,0))

you fit y ~ 1, the estimate returned by glm is the Intercept, which is the estimated value of t = log(p/(1-p)):

  summary(glm(y~1,family=binomial))
  [...]
  Coefficients:

              Estimate Std. Error z value Pr(>|z|)  
  (Intercept)   -2.197      1.054  -2.085   0.0371 *

and indeed, with t = Intercept, you find that exp(t)/(1+exp(t)) = 0.1.
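
One can check this numerically (using y as defined above):

  t <- coef(glm(y ~ 1, family = binomial))[["(Intercept)"]]
  exp(t)/(1 + exp(t))   # 0.1, i.e. plogis(t)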

However, I would not trust the "Std. Error" from this output! It is computed on the basis that the 0/1 data are binomial with fixed size n = 10, whereas I would want a "Std. Error" computed on the basis that the random variable is n, for fixed r = 1. (Indeed, in both cases the "Std. Error" is not quite what it seems, since in case A the estimate of t has positive probability of being either -inf or +inf, and in case B positive probability of being +inf.)
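
One can of course bypass glm and maximise the intended likelihood directly; a rough sketch for case B (no covariates), using optim and its Hessian:

  ## minus log-likelihood of t = log(p/(1-p)) when the first "1" occurs at trial n = 10:
  ## L(p) = p*(1-p)^9, so -log L = -( t - 10*log(1 + exp(t)) )
  nll <- function(t, n = 10) -(t - n*log(1 + exp(t)))
  fit <- optim(0, nll, method = "BFGS", hessian = TRUE)
  fit$par                     # about -2.197, as before
  sqrt(1/fit$hessian[1, 1])   # observed-information "Std. Error"

Since the two likelihoods are proportional, this of course reproduces glm's point estimate and standard error; the difference I'm worried about lies in the sampling distribution of the estimate under the two schemes, not in the curvature of the observed likelihood.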

Although in the simple cases of A and B one can work directly with p and avoid such problems, once one introduces covariates for each trial it is natural to postulate a logistic model for P[i] in the i-th trial, and it would then be very convenient to use the 'glm' mechanism for this. Again, the same estimated values of the coefficients would result from maximising the likelihood function whether the data are generated as in A or as in B; but, for the same reasons (and even more strongly), I would not trust the SEs of the coefficients as output from glm.

> The log-likelihood is a sum of terms over impacts, so fitting
> logistic models for each impact can be done separately for
> each model.

I'm not sure how to interpret this. I'm envisaging only one model, say logit(P[i]) = a + b*X[i] + c*Y[i] for the i-th impact; "each impact" is a single event, so I wouldn't envisage fitting any model for that single event.
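
For what it's worth, since the likelihood (1-P[1])*...*(1-P[n-1])*P[n] is just a product of Bernoulli terms, one per impact, the point estimates of a, b and c could be obtained from glm itself by expanding the data to one row per impact, along the lines of

  ## 'impacts' is a hypothetical data frame with one row per impact per item,
  ## X and Y the per-impact covariates, and 'failed' a 0/1 indicator which is 1
  ## only on the impact at which the item failed
  fit <- glm(failed ~ X + Y, family = binomial, data = impacts)
  summary(fit)   # estimates of a, b, c in logit(P[i]) = a + b*X[i] + c*Y[i]

with, again, the reservation above about how far to trust the reported standard errors.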

> However, nnet() can fit them simultaneously (and couple
> them if you want).
> 

>> And then there's the cumulative frailty ... !
>
> Add the history to that point into the model.

I'll look further into those suggestions.
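
Presumably, in the set-up sketched above, "adding the history" would amount to extra per-impact covariates summarising what an item has already been through -- something like

  ## hypothetical extra columns in 'impacts': n.prior, the number of impacts already
  ## survived, and cum.force, the cumulative "force" experienced so far
  fit2 <- glm(failed ~ X + Y + n.prior + cum.force, family = binomial, data = impacts)

though whether that adequately captures the kind of "cumulative frailty" I have in mind is something I'll need to think about.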

Thanks,
Ted.



E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 23-Sep-06                                       Time: 12:23:44
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
