# RE: [R] Using R to illustrate the Central Limit Theorem

From: Bliese, Paul D LTC USAMH <paul.bliese_at_us.army.mil>
Date: Fri 13 May 2005 - 19:59:58 EST

Below is one option based on Bill Venables code. Note that to do this I had to start with a k of 2.

N <- 10000
for(k in 2:20) {

graphics.off()
par(mfrow = c(2,2), pty = "s")

```    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 1")
hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 2")
m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k))
hist(m, breaks = "FD", xlim = c(-4,4), main = k,
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
```
pu <- par("usr")[1:2]
x <- seq(pu, pu, len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")     abline(0, 1, col = "red")
Sys.sleep(3)
}

By the way, I should probably know this but what is the logic of the "sqrt(12*k)" part of the example? Obviously as k increases the mean will approach .5 in a uniform distribution, so runif(k)-.5 will be close to zero, and sqrt(12*k) increases as k increases. Why 12, though?

PB

This thread was very timely for me since I will be teaching an introductory
biostats course in the fall, including a session on CLT. I have shamelessly
taken Dr. Venables' slick piece of code and wrapped it in a function so that
I can vary the maximum sample size at will. I then created functions based
on the first to sample from the exponential and the chi-squaredistributions.
Lastly, I created a function to sample from a pdf with a parabolic shape (confirming in the process just how rusty my calculus is :-) ). My code is
below for all to do with as they please.

Now, at the risk of asking a really obvious question ...

In my final function, I used the inverse probability integral transformation
to sample from my parabolic distribution. I create a local function rparab
shown here:

rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}

It seems that in my version of R (2.0.1) on Linux, that calculating the cube
root of a negative number using ^(1/3) returns NaN. I looked at the help in
the arithmetic operators and did help.search("cube root"), help.search("root")
and help.search("cube") and recognised no alternatives. So I used an ifelse() to
deal with the negatives. Have I missed something really elementary?

Thanks

clt.unif <- function(n=20) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k) hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Uniform(0,1), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2]
x <- seq(pu, pu, len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.exp <- function(n=20,theta=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rexp(N*k,1/theta), N, k)) - theta)/sqrt(theta^2/k) hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("exp(",theta,"), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2]
x <- seq(pu, pu, len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.chisq <- function(n=20,nu=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rchisq(N*k,nu), N, k)) - nu)/sqrt(2*nu/k) hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Chi-Square(",nu,"), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2]
x <- seq(pu, pu, len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.parab <- function(n=20) {
rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}

N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rparab(N*k), N, k)))/sqrt(3/(5*k)) hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2]
x <- seq(pu, pu, len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red")
Sys.sleep(1)
}
}

Bill.Venables@csiro.au wrote:

```>Here's a bit of a refinement on Ted's first suggestion.
>
>
> N <- 10000
> graphics.off()
> par(mfrow = c(1,2), pty = "s")
> for(k in 1:20) {
>    m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k)
>    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
>            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
>    pu <- par("usr")[1:2]
>    x <- seq(pu, pu, len = 500)
>    lines(x, dnorm(x), col = "red")
>    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
>    abline(0, 1, col = "red")
>    Sys.sleep(1)
>  }
>
>

```
```--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe@utoronto.ca  Tel: 416.946.8081  Fax: 416.971.2462

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help