Re: [Rd] pbinom with size argument 0 (PR#8560)

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Mon 06 Feb 2006 - 10:10:22 GMT


On 05-Feb-06 uht@dfu.min.dk wrote:
> Hello all
>
> A pragmatic argument for allowing size==0 is the situation where
> the size is in itself a random variable (that's how I stumbled over
> the inconsistency, by the way).
>
> For example, in textbooks on probability it is stated that:
>
> If X is Poisson(lambda), and the conditional
> distribution of Y given X is Binomial(X,p), then
> Y is Poisson(lambda*p).
>
> (cf eg Pitman's "Probability", p. 400)
>
> Clearly this statement requires Binomial(0,p) to be a well-defined
> distribution.
>
> Such statements would be quite convoluted if we did not define
> Binomial(0,p) as a legal (but degenerate) distribution. The same
> applies to codes where the size parameter may attain the value 0.
>
> Just my 2 cents.
>
> Cheers,
>
> Uffe

Uffe's pragmatic argument is of course convincing, at least in the circumstances he refers to. However, Peter Ehlers' posting has re-stimulated the underlying ambiguity I feel about this issue (initially, my feeling was that the probability of a "non-event" should be undefined).

Thus I can envisage different circumstances in which one or the other view could be appropriate.

Uffe observes a Poisson-distributed number of Bernoulli trials and records the number of "successes", with zero if the Poisson distribution says "zero trials". In that case no Bernoulli trial has been carried out, so the issue of what the distribution over its empty set of outcomes should be is irrelevant. However, he can encapsulate this process mathematically by assigning P=1 to the outcome r=0 when n=0, and this may well lead to a more straightforward R program, for instance (which, reading between the lines, may well be what really happened in his case).
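
As a rough numerical check of the textbook statement Uffe quotes, here is a minimal R sketch. The values of lambda and p and the truncation point of the Poisson sum are arbitrary choices of mine, and it assumes a version of R in which dbinom accepts size=0 and puts all mass on r=0:

```r
# Illustrative values only; nothing here comes from Uffe's actual code.
lambda <- 2.5
p      <- 0.3

# Truncate the Poisson sum at n = 200; the tail beyond that is negligible
# for this lambda.
n <- 0:200

# Marginal P(Y = k) = sum over n of P(X = n) * P(Y = k | X = n).
# The n = 0 term uses Binomial(0, p), i.e. P = 1 at r = 0.
marginal_y <- function(k) {
  sum(dpois(n, lambda) * dbinom(k, size = n, prob = p))
}

k   <- 0:10
lhs <- sapply(k, marginal_y)     # mixture of binomials
rhs <- dpois(k, lambda * p)      # Poisson(lambda * p)

all.equal(lhs, rhs)
```

If the n = 0 term were NA or undefined instead, the sum for k = 0 could not be evaluated without special-casing it, which is the convenience Uffe is pointing at.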

On the other hand, suppose I (and maybe Peter Ehlers too) am simulating a study in which random numbers (according to some distribution) of subjects become available, in each "sweep" of the study, for a questionnaire, and the outcome of interest is the number in the "sweep" answering "Yes" to a question. Part of this simulation is to create a database of responses along with concomitant variables. It is possible (and under some circumstances perhaps more likely) that the number of available subjects in a "sweep" is zero -- these people cannot be contacted, say.

Maybe I'm studying a "missing data" situation.

In that case it would be natural to enter "r=NA" in the database for those sweeps which produce no responses. This would denote "missing data". It would also be natural (initially, before embarking on, say, an imputation exercise) to attribute "P=NA" to the probability of "Yes" for such a group, since we do not have any direct information (though we may be able to exploit associations between other variables to obtain indirect information, under certain assumptions).

So maybe one could need implementations of pbinom and dbinom which work differently in different circumstances. But what remains important is that, whichever way they work in given circumstances, they should be consistent with each other.
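
For the "missing data" reading, one way to keep the two functions consistent with each other is to wrap them both in the same way. The following is only a sketch; the names dbinom_na and pbinom_na are hypothetical wrappers of my own and not part of R:

```r
# Hypothetical wrappers: treat size == 0 as "no trials observed" and
# return NA, applying the same rule to the density and the CDF.
na_if_no_trials <- function(out, size) {
  is_empty <- rep_len(size == 0, length(out))
  out[which(is_empty)] <- NA_real_   # which() ignores any NA in size
  out
}

dbinom_na <- function(x, size, prob) {
  na_if_no_trials(dbinom(x, size, prob), size)
}

pbinom_na <- function(q, size, prob) {
  na_if_no_trials(pbinom(q, size, prob), size)
}

# Three sweeps with 0, 2 and 5 available subjects:
size <- c(0, 2, 5)
dbinom_na(1, size, 0.5)
pbinom_na(1, size, 0.5)
```

Because both wrappers go through the same helper, a sweep with no subjects gives NA from both, and any sweep with size > 0 gives the ordinary dbinom and pbinom values.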

Best wishes to all,
Ted.



E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 06-Feb-06                                       Time: 10:10:19
------------------------------ XFMail ------------------------------

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon Feb 06 21:30:15 2006