# [R] Simulate Correlated data from complex sample

From: Doran, Harold <HDoran_at_air.org>
Date: Fri 02 Dec 2005 - 03:41:59 EST

Dear List:

I have created some code to simulate data from a complex sample where 5000 students are nested in 50 schools. My code returns a dataframe with a variable representing student achievement at a single time point. My actual code for creating this is below.

What I would like to do is generate a second column of data that is correlated with the first at .8 and has the same means within each school. I do not think I can use mvrnorm() from MASS or simulate() in the Matrix package for this, at least not in a way I can currently see.

A very basic example would be to first create a vector (s1) and then generate a second one whose correlation with the first is set by the user.
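
To make the basic case concrete, here is a minimal sketch of one way it might be done; it is not taken from an existing function. The names `rho`, `e`, `z1`, `s2` and `s2_exact`, the sample size, and the seed are all assumptions chosen for illustration. The first version gives a correlation of rho in expectation only; the second removes from the noise whatever is linearly related to s1, so the sample correlation matches rho up to rounding.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
rho <- 0.8                           # user-defined target correlation
s1  <- rnorm(1000)                   # the existing vector
e   <- rnorm(length(s1))             # independent noise

# Standardize s1, then mix it with noise: correlation is rho in expectation
z1 <- (s1 - mean(s1)) / sd(s1)
s2 <- rho * z1 + sqrt(1 - rho^2) * e

# Exact version: residualize the noise on s1 and rescale it to unit sd
e_res    <- resid(lm(e ~ s1))
s2_exact <- rho * z1 + sqrt(1 - rho^2) * e_res / sd(e_res)

cor(s1, s2)        # near rho, varies from sample to sample
cor(s1, s2_exact)  # rho, up to floating-point error
```

Note that this ignores the school structure entirely, so it does not by itself keep the school means equal.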

In my example below, the variable I want to replicate is data$theta.

I think I could go through the exercise of writing code that would do this, but I suspect there is a smarter and easier function for doing so. I've used RSiteSearch() a bit, but the keywords I'm using aren't turning up results that I can use. I may be missing something very simple and transparent.

Any thoughts are much appreciated,
Harold
Ver 2.2
Windows XP

```
N   <- 5000 # Number of students
J   <- 50   # Number of schools
N_j <- N/J  # Number of students in each school
a_g <- c(0,.5,1) # This is the growth vector

# Step 1 -- create psi for base grade
rps <- rep(N_j, J)
v_gk <- rep(rnorm(J, 0, sqrt(.01) ), rps) # school effect, repeated for each student
v_gik <- rnorm(N, 0, sqrt(.99))           # student effect

# Organize into a dataframe
# Note: a_g (length 3) is recycled over the N students; R warns that N is not a multiple of 3
data <- data.frame(schid = rep(1:J, rps), stuid = 1:N, cbind(v_gk, v_gik), psi = v_gk + v_gik + a_g)

# Now create theta
B_g <- .95 # This is correlation between within-grade trait and vertical trait
w_gk <- 0 # fixed at zero for now
data$w_gik <- rnorm(N, 0, sqrt(.0975))
data$theta <- (B_g * data$psi) + w_gk + data$w_gik
```
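
For the nested case, one possible approach (a sketch under stated assumptions, not an established function) is to split theta into school means plus within-school deviations, build new deviations correlated with the old ones, and add the school means back. The block below uses a simplified stand-in for `data$theta` without the growth vector; `theta2`, `r_w`, `SB`, `SW`, `m`, `d1` and `e` are names introduced here for illustration. The within-school correlation `r_w` is chosen so that the overall correlation comes out at `rho`.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
J <- 50; N_j <- 100; N <- J * N_j
schid <- rep(1:J, each = N_j)

# Simplified stand-in for data$theta: school effect plus student effect
theta <- rep(rnorm(J, 0, sqrt(.01)), each = N_j) + rnorm(N, 0, sqrt(.99))

rho <- 0.8                            # target overall correlation

m  <- ave(theta, schid)               # school means, repeated per student
d1 <- theta - m                       # within-school deviations
SB <- sum((m - mean(theta))^2)        # between-school sum of squares
SW <- sum(d1^2)                       # within-school sum of squares

# Within-school correlation needed so that the overall correlation is rho
r_w <- (rho * (SB + SW) - SB) / SW
stopifnot(abs(r_w) <= 1)              # fails if rho is unattainable given SB

# Noise: centered within school, made orthogonal to d1, rescaled to match SW
e <- rnorm(N)
e <- e - ave(e, schid)
e <- e - d1 * sum(e * d1) / SW
e <- e * sqrt(SW / sum(e^2))

theta2 <- m + r_w * d1 + sqrt(1 - r_w^2) * e

cor(theta, theta2)                    # rho, up to floating-point error
max(abs(ave(theta2, schid) - m))      # school means agree up to rounding
```

Because the new deviations sum to zero within each school, the school means of `theta2` match those of `theta`. The correlation is fixed exactly in the sample; dropping the orthogonalizing and rescaling steps would give a version that hits `rho` only on average.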


R-help@stat.math.ethz.ch mailing list