# Re: [R] Generating correlated data from uniform distribution

From: Greg Snow <greg.snow_at_ihc.com>
Date: Wed 06 Jul 2005 - 02:34:39 EST

Here is an approach using 'optim' and simulated annealing:

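```r
# Two independent uniform samples, each sorted, so the starting pairing
# y[ord] with ord = 1:1000 is almost perfectly correlated
x <- sort(runif(1000))
y <- sort(runif(1000))

ord <- 1:1000

# Objective: squared distance between the achieved and the target (0.6) correlation
target <- function(ord){ ( cor(x, y[ord]) - 0.6 ) ^2 }

# Candidate generator: swap any two positions of the permutation
new.point <- function(ord){
    tmp <- sample(length(ord), 2)
    ord[tmp] <- ord[rev(tmp)]
    ord
}

# Alternative generator: swap two positions at most 100 apart (smaller, local moves)
new.point2 <- function(ord){
    tmp <- sample(length(ord) -100, 1)
    tmp2 <- sample(100, 1)
    ord[ c(tmp, tmp+tmp2) ] <- ord[ c(tmp+tmp2, tmp) ]
    ord
}

# With method="SANN", optim() uses its third argument (gr) to generate
# the next candidate point
res <- optim(ord, target, new.point, method="SANN",
    control = list(maxit=6000, temp=2000, trace=TRUE))

res2 <- optim(ord, target, new.point2, method="SANN",
    control = list(maxit=60000, temp=200, trace=TRUE))

# First run: re-pair y with x and inspect the result
y <- y[res$par]

par(mfrow=c(2,2))
hist(x)
hist(y)
plot(x,y)
cor(x,y)

# Second run: res2$par indexes the *sorted* y, so re-sort y first
y <- sort(y)[res2$par]

par(mfrow=c(2,2))
hist(x)
hist(y)
plot(x,y)
cor(x,y)
```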
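A reduced, self-contained sketch of the same idea, handy for checking it quickly. The smaller sample (`n = 200`), the lower `maxit`, the fixed seed and the helper name `swap2` are illustrative assumptions, not part of the post. Because every candidate move only swaps two entries of the permutation, `y` is merely re-paired with `x`: its values, and therefore its uniform marginal, are unchanged. As far as I can tell from R's implementation, with `temp = 2000` and objective values below 1 nearly every proposed swap is accepted, so the run behaves much like a random walk of swaps away from the sorted pairing, and `optim` reports the best point it visited along the way.

```r
# Hypothetical reduced example: n, maxit and the seed are arbitrary choices
set.seed(42)
n   <- 200
rho <- 0.6
x <- sort(runif(n))
y <- sort(runif(n))

# Squared distance between achieved and target correlation
target <- function(ord) (cor(x, y[ord]) - rho)^2

# Hypothetical candidate generator: swap two randomly chosen positions
swap2 <- function(ord) {
    i <- sample(length(ord), 2)
    ord[i] <- ord[rev(i)]
    ord
}

res   <- optim(seq_len(n), target, swap2, method = "SANN",
               control = list(maxit = 3000, temp = 2000))
y.new <- y[res$par]

cor(x, y.new)                # should be close to rho
all.equal(sort(y.new), y)    # same values as y, only re-paired with x
```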

Hope this helps,

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow@ihc.com
(801) 408-8111

>>> "Jim Brennan" <jfbrennan@rogers.com> 07/01/05 05:25PM >>>

OK, now I am skeptical, especially when you say "in a weird way" :-) This may be OK, but look at plot(x,y) and I am suspicious. Is it still all right with this kind of relationship?

For large N, it appears that Spencer's method returns a slightly lower correlation for the uniforms than for the normals, so maybe there is a problem!?!

Hope we are all learning something and Menghui gets/has what he wants. :-)

-----Original Message-----
From: pd@pubhealth.ku.dk [mailto:pd@pubhealth.ku.dk] On Behalf Of Peter Dalgaard
Sent: July 1, 2005 6:59 PM
To: Jim Brennan
Cc: 'Tony Plate'; 'Menghui Chen'; r-help@stat.math.ethz.ch
Subject: Re: [R] Generating correlated data from uniform distribution

"Jim Brennan" <jfbrennan@rogers.com> writes:

> Yes, you are right; I guess this works only for normal data. Free advice
> sometimes comes with too little consideration :-)

Worth every cent...

Hmm, but is it? Or rather, what is the relation between the correlation of the normals and that of the transformed variables? Looks nontrivial to me.
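Assuming the transformation under discussion is `pnorm()` applied to correlated standard normals, the relation has a closed form: if (Z1, Z2) is bivariate normal with correlation r, then pnorm(Z1) and pnorm(Z2) are U(0,1) with correlation (6/pi) * asin(r/2), which is Spearman's rank correlation of the normals. For r = 0.6 this is about 0.582, slightly below 0.6, in line with what Jim reports. Inverting it, normals with r = 2 * sin(pi * rho / 6) (about 0.618 for rho = 0.6) give uniforms with correlation rho. A minimal base-R sketch; the seed, the sample size and the helper name `unif_pair` are arbitrary assumptions:

```r
set.seed(1)
N   <- 1e5
rho <- 0.6                      # target correlation for the uniforms

# Hypothetical helper: correlated standard normals mapped to U(0,1) by pnorm()
unif_pair <- function(N, r) {
    z1 <- rnorm(N)
    z2 <- r * z1 + sqrt(1 - r^2) * rnorm(N)   # cor(z1, z2) = r
    cbind(u1 = pnorm(z1), u2 = pnorm(z2))
}

naive    <- unif_pair(N, rho)                   # normals correlated at rho itself
adjusted <- unif_pair(N, 2 * sin(pi * rho / 6)) # r chosen so the uniforms hit rho

cor(naive)[1, 2]       # near (6/pi) * asin(rho/2), about 0.582
cor(adjusted)[1, 2]    # near rho
```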

Incidentally, here's a way that satisfies the criteria, but in a rather weird way:

```r
N <- 10000
rho <- .6
x <- runif(N, -.5, .5)
# Flip the sign of each x with probability (1-rho)/2
y <- x * sample(c(1,-1), N, replace=TRUE, prob=c((1+rho)/2,(1-rho)/2))
```
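Why this construction hits the target: write S = +1 or -1 for the random sign, drawn independently of X with P(S = 1) = (1+rho)/2. Then E[S] = rho and S^2 = 1, so Cov(X, SX) = E[S] * Var(X) = rho * Var(X) and Var(SX) = Var(X), giving a correlation of exactly rho. Because X is symmetric about 0, SX has the same U(-0.5, 0.5) distribution. The catch, which is what Jim spotted, is that |y| = |x| always, so `plot(x, y)` shows an "X" shape rather than a cloud. A short check; the seed and sample size are arbitrary assumptions:

```r
set.seed(2)
N   <- 1e5
rho <- 0.6
x <- runif(N, -0.5, 0.5)
s <- sample(c(1, -1), N, replace = TRUE, prob = c((1 + rho) / 2, (1 - rho) / 2))
y <- x * s

mean(s)                      # estimates E[S] = rho
cor(x, y)                    # close to rho
range(y)                     # stays inside (-0.5, 0.5)
all.equal(abs(y), abs(x))    # only the sign differs, hence the "X" plot
```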

```
--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907
```

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help