Re: [R] Generating correlated data from uniform distribution

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Sat 02 Jul 2005 - 21:22:19 EST


On 02-Jul-05 Peter Dalgaard wrote:
> "Jim Brennan" <jfbrennan@rogers.com> writes:
>
>> OK now I am skeptical especially when you say in a weird way:-)
>> This may be OK but look at plot(x,y) and I am suspicious. Is it still
>> alright with this kind of relationship?
>
> ...
>> N <- 10000
>> rho <- .6
>> x <- runif(N, -.5,.5)
>> y <- x * sample(c(1,-1), N, replace=T, prob=c((1+rho)/2,(1-rho)/2))
>
> Well, the covariance is (everything has mean zero, of course)
>
> E(XY) = (1+rho)/2*EX^2 + (1-rho)/2*E(X*-X) = rho*EX^2
>
> The marginal distribution of Y is a mixture of two identical uniforms
> (X and -X) so is uniform and in particular has the same variance as X.
>
> In summary, EXY/sqrt(EX^2EY^2) == rho
>
> So as I said, it satisfies the formal requirements. X and Y are
> uniformly distributed and their correlation is rho.
>
> If for nothing else, I suppose that this example is good for
> demonstrating that independence and uncorrelatedness are not the same
> thing.

That was a nice sneaky solution! I was toying with something similar, but less sneaky, until I saw Peter's, on the lines of

  x <- runif(2*N, -0.5, 0.5); ix <- (N-k):(N+k); y <- x; y[ix] <- -y[ix]

(which makes the same point about independence and correlation). The larger k as a fraction of N, the more you swing from rho = 1 to rho = -1, but you cannot achieve, as Peter did, an arbitrary correlation coefficient rho since the value depends on k which can only take discrete values.
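As a rough check, here is a minimal sketch that simulates both sign-flipping constructions. Flipping the sign of a fraction p of the values gives E(XY) = (1 - 2p)*EX^2, so the block version with 2k+1 flips out of 2N should land near 1 - (2k+1)/N. The seed, the value of k and the helper name flip_block are illustrative assumptions, not taken from the thread.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
N   <- 10000
rho <- 0.6

# Peter's construction: flip the sign of each x independently
# with probability (1 - rho)/2
x <- runif(N, -0.5, 0.5)
s <- sample(c(1, -1), N, replace = TRUE, prob = c((1 + rho)/2, (1 - rho)/2))
y <- x * s
r_mix <- cor(x, y)

# Block variant: flip a fixed block of 2k+1 of the 2N values
# (flip_block is a hypothetical helper name)
flip_block <- function(N, k) {
  x  <- runif(2 * N, -0.5, 0.5)
  ix <- (N - k):(N + k)
  y  <- x
  y[ix] <- -y[ix]
  cor(x, y)
}

k        <- 2000
r_block  <- flip_block(N, k)
r_approx <- 1 - (2 * k + 1) / N  # large-sample value, a multiple of 1/N

c(r_mix = r_mix, r_block = r_block, r_approx = r_approx)
```

Since r_approx can only move in steps of 2/N as k changes, this also shows why the block version cannot hit an arbitrary rho exactly.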

Another approach which leads to a less "special" joint distribution is

  x<-sort(runif(N, -0.5,0.5)); y<-sort(runif(N, -0.5,0.5))

followed by a rho-dependent permutation of y. I'm still pondering a way of choosing the permutation so as to get a desired rho.
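One possible choice of permutation, offered only as a sketch and not as anything proposed in the thread: keep the sorted pairing on a random fraction rho of the positions and shuffle y among the remaining positions. The matched pairs contribute roughly EX^2 each and the shuffled pairs roughly zero, so the sample correlation should come out near rho. This assumes 0 <= rho <= 1 (a negative target would presumably start from the reversed pairing); the seed and variable names are arbitrary.

```r
set.seed(2)                      # arbitrary seed, for reproducibility only
N   <- 10000
rho <- 0.6                       # target, assumed to lie in [0, 1]

x <- sort(runif(N, -0.5, 0.5))
y <- sort(runif(N, -0.5, 0.5))
y_sorted <- y                    # kept only to confirm y is merely permuted

# Shuffle y within a random subset of positions of size (1 - rho) * N;
# all other positions keep the sorted (identity) pairing.
m <- round((1 - rho) * N)
S <- sample.int(N, m)
y[S] <- y[S][sample.int(m)]

r_perm <- cor(x, y)
r_perm
```

Because y is only permuted, its marginal sample is unchanged; only the pairing with x differs.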

The extremes are the identity, which for a given sample will give as close as you can get to rho = +1, and reversal, which gives as close as you can get to rho = -1.

However, the maximum theoretical rho which you can get (as opposed to what is possible for particular samples, which may get arbitrarily close to +1) depends on N. For instance, with N=3, it looks as though the theoretical rho is about 0.9 with the "identity" permutation (for N=1000, however, just about all samples give rho > 0.99).
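The dependence on N can be explored by simulation. The sketch below computes the sample correlation of independently sorted uniform samples under the identity and reversal pairings, for N = 3 and N = 1000. The helper name sorted_cor, the seed and the replicate counts are illustrative assumptions, and no particular value for N = 3 is claimed here.

```r
set.seed(3)                      # arbitrary seed, for reproducibility only

# Sample correlation of two independently sorted uniform samples,
# paired by the identity permutation or, if reverse = TRUE, by reversal
# (sorted_cor is a hypothetical helper name)
sorted_cor <- function(N, reverse = FALSE) {
  x <- sort(runif(N, -0.5, 0.5))
  y <- sort(runif(N, -0.5, 0.5), decreasing = reverse)
  cor(x, y)
}

r3_id     <- replicate(2000, sorted_cor(3))
r1000_id  <- replicate(200, sorted_cor(1000))
r1000_rev <- replicate(200, sorted_cor(1000, reverse = TRUE))

mean(r3_id)             # average sample correlation for N = 3
mean(r1000_id > 0.99)   # share of N = 1000 samples with rho > 0.99
range(r1000_rev)        # reversal: values close to -1
```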

I smell a source of interesting exam questions ...

Over to you!

Best wishes,
Ted.



E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 02-Jul-05                                       Time: 12:22:09

Received on Sat Jul 02 21:45:57 2005