From: Petr Savicky <savicky_at_praha1.ff.cuni.cz>

Date: Mon, 25 Apr 2011 11:58:43 +0200

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 25 Apr 2011 - 10:00:32 GMT


On Sun, Apr 24, 2011 at 07:00:26PM -0400, Shane Phillips wrote:

> Hi, R-Helpers!

> I have a dataframe that contains a binomial variable. I need to add another random variable drawn from a normal distribution with a specific mean and standard deviation. This variable also needs to be correlated with the existing binomial variable with a specific correlation (say .75). Any ideas?

Hi.

If X, Y are dependent random variables and we want to generate y, so that (x, y) is a pair from their joint distribution with known x, then y should be generated from the conditional distribution P(Y|X=x). If the probability P(X=x) is not too small, then this may be done by rejection sampling: Generate pairs (X, Y) until the condition X=x is satisfied and use the corresponding Y.
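A minimal sketch of the rejection step, assuming some function that draws one pair from the joint distribution is available. The names draw_y_given_x and toy_pair are hypothetical, and the toy joint distribution (X = 1 exactly when Y > 0) is only there to make the example runnable.

```r
# Hypothetical helper: draw one y from P(Y | X = x) by rejection.
# draw_pair() must return a list with elements x and y drawn from the joint law.
draw_y_given_x <- function(x, draw_pair, max_tries = 1e5) {
  for (i in seq_len(max_tries)) {
    pair <- draw_pair()
    if (pair$x == x) return(pair$y)   # accept the first pair with X == x
  }
  stop("no pair accepted; P(X = x) may be too small for rejection sampling")
}

# Toy joint distribution, for illustration only: X = 1 exactly when Y > 0.
toy_pair <- function() {
  y <- rnorm(1)
  list(x = as.integer(y > 0), y = y)
}

set.seed(1)
y1 <- replicate(200, draw_y_given_x(1L, toy_pair))
summary(y1)
```

The expected number of tries per accepted value is 1 / P(X = x), which is why the method is only practical when that probability is not too small.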

It remains to generate pairs (X, Y), where Y is a normal variable and X a binomial one. The parameters of Y are known, the parameters of X should be chosen somehow and the correlation of X and Y is known. I suggest the following. Compute the distribution of X as a vector of probabilities p_0, ..., p_n (see ?dbinom). Find a nondecreasing function f() from reals to {0, ..., n} such that f(Y) has distribution p_0, ..., p_n. The function may be determined by a sequence of cutpoints a_1, ..., a_n defining f(y) as follows:

  y                f(y)
  (-infty, a_1)    0
  [a_1, a_2)       1
  ...
  [a_n, infty)     n

For each i, the cutpoint a_i is the (p_0 + ... + p_{i-1})-quantile of Y (see ?qnorm). See ?cut for computing f().
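The construction of the cutpoints and of f() can be sketched as follows. The values of n, prob, mu and sigma are assumptions chosen for illustration; they are not taken from the original question.

```r
# Assumed parameters, for illustration only
n <- 5; prob <- 0.3        # X ~ Binomial(n, prob)
mu <- 10; sigma <- 2       # Y ~ Normal(mu, sigma)

p <- dbinom(0:n, size = n, prob = prob)             # p_0, ..., p_n
a <- qnorm(cumsum(p)[1:n], mean = mu, sd = sigma)   # cutpoints a_1, ..., a_n

# f() maps y to 0..n using the left-closed intervals [a_i, a_{i+1})
f <- function(y) {
  as.integer(cut(y, breaks = c(-Inf, a, Inf), right = FALSE)) - 1L
}

set.seed(1)
y <- rnorm(1e5, mean = mu, sd = sigma)
x <- f(y)

# Empirical distribution of f(Y) next to the target binomial probabilities
empirical <- tabulate(x + 1L, nbins = n + 1L) / length(x)
round(rbind(empirical = empirical, target = p), 3)
```

Since cut() with right = FALSE uses intervals closed on the left, the result matches the table above.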

The pair (f(Y), Y) has the required marginal distributions and, in my opinion, the maximal possible correlation. If this correlation is lower than the requested one, then I think there is no solution.
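Whether a requested correlation is attainable can be checked before generating anything. For the pair (f(Y), Y), the covariance works out to sigma times the sum of the standard normal density at the standardised cutpoints, so the correlation depends only on the binomial parameters. This closed form is not stated in the post above; the sketch below includes a simulation as a cross-check. The name max_cor and the parameter values are assumptions.

```r
# Correlation of (f(Y), Y) for X ~ Binomial(n, prob) and normal Y
# (hypothetical helper; the value does not depend on the mean or sd of Y)
max_cor <- function(n, prob) {
  p <- dbinom(0:n, size = n, prob = prob)
  z <- qnorm(cumsum(p)[1:n])                 # standardised cutpoints
  sum(dnorm(z)) / sqrt(n * prob * (1 - prob))
}

max_cor(1, 0.5)   # sqrt(2/pi), about 0.798, so .75 is just attainable here

# Cross-check by simulation with a standard normal Y (assumed n and prob)
set.seed(1)
n <- 4; prob <- 0.3
p <- dbinom(0:n, size = n, prob = prob)
a <- qnorm(cumsum(p)[1:n])
y <- rnorm(1e5)
x <- as.integer(cut(y, breaks = c(-Inf, a, Inf), right = FALSE)) - 1L
c(formula = max_cor(n, prob), simulated = cor(x, y))
```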

If the correlation of (f(Y), Y) is at least the required one, then use a mixture of the distribution (f(Y), Y) and (X, Y), where X has the required marginal distribution of X, but is generated independently from Y. The mixture parameter may be determined as a solution of an equation with one variable.
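A sketch of the mixture, under assumed parameter values. Because both components have the same marginals, the correlation of the mixture is the mixture weight times the correlation of (f(Y), Y), so in this particular construction the equation for the weight is linear. Here the correlation of (f(Y), Y) is simply estimated from the simulated sample.

```r
set.seed(42)

# Assumed parameters, for illustration only
n <- 1; prob <- 0.5        # X ~ Binomial(n, prob)
mu <- 100; sigma <- 15     # Y ~ Normal(mu, sigma)
rho <- 0.75                # requested correlation
N <- 1e5                   # number of pairs to generate

p <- dbinom(0:n, size = n, prob = prob)
a <- qnorm(cumsum(p)[1:n], mean = mu, sd = sigma)   # cutpoints a_1, ..., a_n

y <- rnorm(N, mean = mu, sd = sigma)
x_dep <- as.integer(cut(y, breaks = c(-Inf, a, Inf), right = FALSE)) - 1L  # f(Y)
x_ind <- rbinom(N, size = n, prob = prob)           # independent of Y

rho_max <- cor(x_dep, y)          # estimated correlation of (f(Y), Y)
stopifnot(rho <= rho_max)         # otherwise the request is not attainable
w <- rho / rho_max                # mixture weight of the dependent component

# With probability w use f(Y), otherwise the independent binomial draw
x <- ifelse(runif(N) < w, x_dep, x_ind)

c(weight = w, correlation = cor(x, y), mean_x = mean(x), mean_y = mean(y))
```

To attach y values to x values that already exist in a data frame, these simulated pairs can serve as the joint distribution in the rejection step described earlier.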

Hope this helps.

Petr Savicky.


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 25 Apr 2011 - 10:40:33 GMT.
