[R] Simulate data with binary outcome

From: Steve Frost <S.Frost_at_uws.edu.au>
Date: Wed, 16 Jul 2008 15:40:24 +1000


Dear R-Users,

I wish to simulate a binary outcome data set with predictors (in the example below: age, sex and systolic BP). Is there a way to set the frequency of the outcome (y) to, say, 5% (versus the 0.1% I get when using the seed below)?

# Example R-code based on Frank Harrell's Design help files

library(Hmisc)
library(rms)   # datadist() and lrm() come from rms, the successor to Design
n <- 1000
set.seed(123456)

age <- runif(n, 60, 90)
sbp <- rnorm(n, 120, 15)
sex <- factor(sample(c('female','male'), n, replace = TRUE))

# Specify population model for log odds that CHD = Yes

L  <- 0.4*(sex == 'male') +
      0.045*(age) +
      0.05*(sbp)

# Simulate binary y to have Prob(y = 1) = 1/[1+exp(-L)]

y <- ifelse(runif(n) < plogis(L), 1, 0)
table(y)

ddist <- datadist(sex,age,sbp)
options(datadist = 'ddist')

fit <- lrm(y ~ sex + age + sbp)

summary(fit)
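
One possible way to get an outcome frequency near 5% is sketched below. It is not taken from the Design help files. It assumes the slopes in L stay as they are and that only an intercept is added. Without an intercept, L averages roughly 9.6 here, so plogis(L) is very close to 1 for every subject. The names b0, target and mean_prob_gap are made up for this illustration, and the search interval passed to uniroot() is just a wide guess that brackets the root.

```r
# Sketch: pick an intercept b0 so that the average Prob(y = 1) is about 5%.
# b0, target and mean_prob_gap are illustrative names (assumptions).
set.seed(123456)
n   <- 1000
age <- runif(n, 60, 90)
sbp <- rnorm(n, 120, 15)
sex <- factor(sample(c('female', 'male'), n, replace = TRUE))

# Linear predictor without an intercept, as in the example above
L <- 0.4*(sex == 'male') + 0.045*age + 0.05*sbp

target <- 0.05

# Gap between the mean predicted probability and the target frequency
mean_prob_gap <- function(b0) mean(plogis(b0 + L)) - target

# Solve for the intercept; c(-50, 50) is an assumed bracketing interval
b0 <- uniroot(mean_prob_gap, interval = c(-50, 50), tol = 1e-8)$root

# Simulate the outcome from the shifted linear predictor
y <- rbinom(n, 1, plogis(b0 + L))

mean(plogis(b0 + L))   # expected frequency, matched to the target
table(y)               # observed counts vary around the target
```

The expected frequency is matched to the target, but the observed proportion in any one simulated data set will still vary by chance around 5%. Shifting only the intercept leaves the odds ratios for sex, age and sbp unchanged.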



Steve Frost MPH
University of Western Sydney
Building 7
Campbelltown Campus
Locked Bag 1797
PENRITH SOUTH DC 1797
Phone 61+ 2 4620 3415
Mobile 0407 291088
Fax 61+ 2 4625 4252


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 16 Jul 2008 - 05:44:05 GMT

