Re: [R] Generate a serie of new vars that correlate with existing var

From: Nguyen Dinh Nguyen <n.nguyen_at_garvan.org.au>
Date: Tue 03 Apr 2007 - 22:51:48 GMT


Dear Greg,
Thanks a million!
"As good as it gets" :)
All the best
Nguyen

-----Original Message-----
From: Greg Snow [mailto:Greg.Snow@intermountainmail.org]
Sent: Wednesday, April 04, 2007 1:46 AM
To: Nguyen Dinh Nguyen; r-help@stat.math.ethz.ch
Subject: RE: [R] Generate a serie of new vars that correlate with existing var

Here is one way to do it:

# create the initial x variable
x1 <- rnorm(100, 15, 5)

# x2, x3, and x4 in a matrix, these will be modified to meet the criteria
x234 <- scale(matrix( rnorm(300), ncol=3 ))

# put all into 1 matrix for simplicity
x1234 <- cbind(scale(x1),x234)

# find the current correlation matrix
c1 <- var(x1234)

# cholesky decomposition to get independence
chol1 <- solve(chol(c1))

newx <- x1234 %*% chol1

# check that we have independence and x1 unchanged
zapsmall(cor(newx))
all.equal( x1234[,1], newx[,1] )

# create new correlation structure (zeros can be replaced with other r vals)
newc <- matrix(
  c(1  , 0.4, 0.5, 0.6,
    0.4, 1  , 0  , 0  ,
    0.5, 0  , 1  , 0  ,
    0.6, 0  , 0  , 1  ), ncol=4 )

# check that it is positive definite
eigen(newc)
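
The eigen() call above prints the eigenvalues so they can be checked by eye. As a small sketch of a programmatic check (the helper name `is_pos_def` and its tolerance are illustrative assumptions, not part of the original reply), one could test that every eigenvalue is strictly positive:

```r
# Hypothetical helper: TRUE when a symmetric matrix is positive definite.
# The tolerance is an arbitrary illustrative choice.
is_pos_def <- function(m, tol = 1e-8) {
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

# Same target correlation matrix as in the reply, repeated so the
# snippet runs on its own.
newc <- matrix(
  c(1  , 0.4, 0.5, 0.6,
    0.4, 1  , 0  , 0  ,
    0.5, 0  , 1  , 0  ,
    0.6, 0  , 0  , 1  ), ncol = 4)

is_pos_def(newc)
```

With the other correlations left at zero, this matrix has eigenvalues 1, 1, and 1 +/- sqrt(0.4^2 + 0.5^2 + 0.6^2), so it is positive definite only while the squared correlations with x1 sum to less than 1 (here 0.77).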

chol2 <- chol(newc)

finalx <- newx %*% chol2 * sd(x1) + mean(x1)

# verify success
mean(x1)
colMeans(finalx)

sd(x1)
apply(finalx, 2, sd)

zapsmall(cor(finalx))
pairs(finalx)

all.equal(x1, finalx[,1])
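
The steps above could also be wrapped in a reusable function. This is only a sketch under stated assumptions: the name `make_correlated`, its arguments, the demo variable names, and the use of set.seed() are illustrative and not part of the original reply. It takes the existing variable and the desired correlations with it, and leaves the new variables uncorrelated with one another.

```r
# Hypothetical wrapper around the steps above.
# x: existing numeric vector
# r: desired correlations of the new variables with x
make_correlated <- function(x, r) {
  k <- length(r)
  n <- length(x)
  # standardized x plus k columns of standardized noise
  z <- cbind(scale(x), scale(matrix(rnorm(n * k), ncol = k)))
  # remove the sample correlations so the columns are exactly uncorrelated
  z <- z %*% solve(chol(var(z)))
  # target correlation matrix: r in the first row/column, zeros elsewhere
  target <- diag(k + 1)
  target[1, -1] <- r
  target[-1, 1] <- r
  # chol() stops with an error if target is not positive definite
  z %*% chol(target) * sd(x) + mean(x)
}

set.seed(1)                       # only to make the demo repeatable
x_demo <- rnorm(100, 15, 5)
res <- make_correlated(x_demo, c(0.4, 0.5, 0.6))

zapsmall(cor(res))                # first row/column should show 0.4, 0.5, 0.6
all.equal(x_demo, res[, 1])       # the original variable is left unchanged
```

Because the noise columns are decorrelated in the sample before the new structure is imposed, the correlations, means, and SDs match the targets up to floating-point error rather than only approximately. The sketch assumes x has more observations than there are columns in total.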

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow@intermountainmail.org
(801) 408-8111
 
 


> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Nguyen
> Dinh Nguyen
> Sent: Sunday, April 01, 2007 7:47 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Generate a serie of new vars that correlate with
> existing var
>
> Dear R helpers,
> I have a var (let's call it X1) with an approximately Normal
> distribution (say, mean=15, SD=5).
> I want to generate a series of additional vars X2, X3,
> X4... such that the correlation between X2 and X1 is 0.4, X3 and
> X1 is 0.5, X4 and X1 is 0.6, and so on, with the condition that all
> variables X2, X3, X4... have the same mean and SD as X1.
> Any help would be appreciated.
> Regards
> Nguyen
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Received on Wed Apr 04 08:52:49 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 04 Apr 2007 - 02:31:12 GMT.
