Re: [R] faster computation of cumulative multinomial distribution

From: Alberto Monteiro <>
Date: Mon 02 Apr 2007 - 13:42:35 GMT

Theo Borm wrote:  

>>> I have a hunch that it won't work as my p vector typically contains
>>> values like:
>> p<-c(0.99, 0.005, 0.003, 0.001, 0.0005, 0.0003, 0.0001, 0.00005,
>> 0.00003, 0.00002).
>> So you should expect that only 1/50000 will be in the low-prob
>> hole in the average?

> Or less. And normally I'll have 30-60 holes, with the "remainder"
> hole containing > 98% of the events.
I think the problem here is not with the low probability associated to the total of the holes, but with the different probabilities associated to the different low-probability holes.
>> Hmmm.... That's right. No way to do a Monte Carlo here!

> Yes. Its back to square one for me. Maybe I should get a bigger
> computer. I heard that IBM is selling some quite fancy stuff....
> Would probably be quite a challenge to coerce R to run on such a
> machine ;-)
I don't think a bigger computer will solve it, after all, you might be dealing with probabilities close to 1:(every atom in the Universe)

Hmmm... Maybe you can break the problem into several (manageable) parts. For example, you can divide the problem into this:

(a) given N shots, how many will hit the lowest-prob hole (the 0.00002)?

This is easy: it's the binomial distribution. There are three cases to be considered here: those that hit zero times (and this will carry a zero forever), those that hit exactly once, and those that hit two or more.

Unless the two or more is not negligible compared to the exactly once, we can treat it as two.

The next holes could be done considering the independence of the conditional probabilities.

For example, hitting exactly once every hole would be
k <- length(p)
prod(dbinom(1, N - (0:(k-2)), p[k - (0:(k-2))]))

BTW, this is exactly
dmultinom(c(N-k+1, rep(1, k-1)), prob=p)

What I am missing here?

Alberto Monteiro mailing list PLEASE do read the posting guide and provide commented, minimal, self-contained, reproducible code. Received on Mon Apr 02 23:47:07 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 02 Apr 2007 - 14:30:36 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.