Re: [R] generate Binomial (not Binary) data

From: Ted Harding <Ted.Harding_at_manchester.ac.uk>
Date: Fri 09 Feb 2007 - 16:21:58 GMT


On 09-Feb-07 Duncan Murdoch wrote:
> On 2/7/2007 8:20 AM, Marc Bernard wrote:

>> Dear All,
>>    
>>   I am looking for an R function or any other reference to generate a
>>   series of correlated Binomial (not a Bernoulli) data. The "bindata"
>>   library can do this for the binary not the binomial case.

>
> Ted asked how you want your series correlated. Another question is how
> you want it "binomial": do you want each value to be binomial with
> parameters conditional on previous values, or do you want each to be
> marginally binomial with some fixed parameters?
>
> I can only think of two trivial solutions that meet both conditions
> simultaneously: the i.i.d. sequence, and a sequence that repeats 0
> indefinitely. I'd be interested in hearing if anyone knows of any
> non-trivial examples of sequences where the distributions are both
> conditionally and marginally binomial, or a proof that there are none.
>
> Duncan Murdoch

I've been thinking a bot more about this issue, and (somewhat on the same olines as Duncan above) have explicitly identified some difficulties.

A.
Gregor Gorjanc, replying to Bernard, wrote:

>> The "bindata" library can do this for the binary not the
>> binomial case.

>
> But binomial is just a sum of Bernoulli vars so this should
> be doable.

Suppose, following Grigor's suggestion, we make up correlated binomial variables as sums of correlated Bernoulli variables.

I.e. we have 2 binomial variabes:

  X = U1 + U2 + ... + Un
  Y = V1 + V2 + ... + Vn

where {U1,...,Un} are independent bernoullis (0/1) with P["1"] = p, and similar for {V1,...,Vn}. Then X and Y are Binomial (n,p).

Suppose that cor(U1,V1,) = r, which depends uniquely on

  P[U1=1 & V1=1] = p11

say, and similarly for (U2,V2), (U3,V3), ...

Then X and Y will be correlated, and their correlation can be calculated in terms of p11. So this is one model that can generate correlated binomial variables.

However, while a bivariate normal (given means and variances) is uniquely defined by the correlation coefficient, this is not the case for a bivariate binomial:

B.
Let U be binomial(m,p), V binomial((n-m),p), W binomial((n-m),p) and all three independent.

Then

  X = U + V is binomial(n,p)
  Y = U + W is binomial(n,p)

and X,Y are correlated, with correlation coefficient (if I've done it right) m/(m+n). So, while in general you cannot choose m (outof n) to get an exact correlation coefficient c, this is a different kind of model to generate correlated binomial variables.

There will be many others (some easier to think up then others).

C. The above concern only correlation between two binomial variables (X,Y). Bernard clarified his original query in a private email to me:

  "Let i indexes a subject and Y_i = (Y_i1, Y_i2,...,Y_iT)    be a vector of binomial variables for subject i such that    Y_it ~ Bin(n,p_t) with t = 1,2, ....T.    A simple correlation I would like to have is :    corr(Y_ij, Y_ik) = c for all (j,k)"

(I think the index i is irrelevant to the present query).

This could be done on the lines of case (B) above with

  Y_i1 = U + V1
  Y_i2 = U + V2
  ....
  Y_iT = U + VT

provided the desired value c is adequately approximated by m/(m+n).

Alternatively, you can start off from a situation inspired by (A):

Let W = U + V | U+V <=1, i.e. X is the sum of two independent bernoullis conditional on excluding the case where both are 1.

Then W is bernoulli with

  P[W=0] = q^12/(q^2 + 2*p*q), P[W=1] = 2*p*q/(q^2 + 2*p*q)

where p = P[U=1] = P[V=1] and q = (1-p). Let P[W=1] = p11.

Then create binomial(n,p11) variables Y_i1, Y_i2, ... , Y_iT by

  Y_i1 = (U1 + V11) + (U2 + V12) + ... + (Un + V1n)   Y_i2 = (U1 + V21) + (U2 + V22) + ... + (Un + V2n)   ....
  Y_iT = (U1 + VT1) + (U2 + VT2) + ... + (Un + VTn)

where each (Ui + Vij) is conditional on U+V <= 1 as in (B)..

Then we have pairwise equi-correlated variables Y_i1,...,Y_iT.

Although the two methods described above (i.e. in (C)) give pairwise equicorrelated binomial variables, I suspect that their higher-order moments, and indeed the joint distribution of (Y_ir,Y_is), will not be the same in the two methods.

So, as I see it, there is not a unique answet to the question. [Primitive versions of the above led me to ask the question

 How do you want your series of binomial datato be "correlated"?]

Anyone wiser than I am?

Ted.



E-Mail: (Ted Harding) <Ted.Harding@manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 09-Feb-07                                       Time: 16:13:14
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat Feb 10 03:40:04 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 09 Feb 2007 - 19:31:40 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.