RE: [R] Using R to illustrate the Central Limit Theorem

From: Bliese, Paul D LTC USAMH <paul.bliese_at_us.army.mil>
Date: Fri 13 May 2005 - 19:59:58 EST


Interesting thread. The graphics are great, the only thing that might be worth doing for teaching purposes would be to illustrate the original distribution that is being averaged 1000 times.

Below is one option based on Bill Venables code. Note that to do this I had to start with a k of 2.

N <- 10000
 for(k in 2:20) {

    graphics.off()
    par(mfrow = c(2,2), pty = "s")

    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 1")
    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 2")
    m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k))
    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
    pu <- par("usr")[1:2]
    x <- seq(pu[1], pu[2], len = 500)
    lines(x, dnorm(x), col = "red")
    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")     abline(0, 1, col = "red")
    Sys.sleep(3)
  }

By the way, I should probably know this but what is the logic of the "sqrt(12*k)" part of the example? Obviously as k increases the mean will approach .5 in a uniform distribution, so runif(k)-.5 will be close to zero, and sqrt(12*k) increases as k increases. Why 12, though?

PB

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Kevin E. Thorpe Sent: Friday, May 13, 2005 12:03 AM
To: Bill.Venables@csiro.au
Cc: ted.harding@nessie.mcc.ac.uk; r-help@stat.math.ethz.ch Subject: Re: [R] Using R to illustrate the Central Limit Theorem

This thread was very timely for me since I will be teaching an introductory
biostats course in the fall, including a session on CLT. I have shamelessly
taken Dr. Venables' slick piece of code and wrapped it in a function so that
I can vary the maximum sample size at will. I then created functions based
on the first to sample from the exponential and the chi-squaredistributions.
Lastly, I created a function to sample from a pdf with a parabolic shape (confirming in the process just how rusty my calculus is :-) ). My code is
below for all to do with as they please.

Now, at the risk of asking a really obvious question ...

In my final function, I used the inverse probability integral transformation
to sample from my parabolic distribution. I create a local function rparab
shown here:

rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}

It seems that in my version of R (2.0.1) on Linux, that calculating the cube
root of a negative number using ^(1/3) returns NaN. I looked at the help in
the arithmetic operators and did help.search("cube root"), help.search("root")
and help.search("cube") and recognised no alternatives. So I used an ifelse() to
deal with the negatives. Have I missed something really elementary?

Thanks

clt.unif <- function(n=20) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k) hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Uniform(0,1), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.exp <- function(n=20,theta=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rexp(N*k,1/theta), N, k)) - theta)/sqrt(theta^2/k) hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("exp(",theta,"), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.chisq <- function(n=20,nu=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rchisq(N*k,nu), N, k)) - nu)/sqrt(2*nu/k) hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Chi-Square(",nu,"), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.parab <- function(n=20) {
rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}

N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rparab(N*k), N, k)))/sqrt(3/(5*k)) hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red")
Sys.sleep(1)
}
}

Bill.Venables@csiro.au wrote:

>Here's a bit of a refinement on Ted's first suggestion.
>
> 
> N <- 10000
> graphics.off()
> par(mfrow = c(1,2), pty = "s")
> for(k in 1:20) {
>    m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k)
>    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
>            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
>    pu <- par("usr")[1:2]
>    x <- seq(pu[1], pu[2], len = 500)
>    lines(x, dnorm(x), col = "red")
>    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
>    abline(0, 1, col = "red")
>    Sys.sleep(1)
>  }
>  
>


-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe@utoronto.ca  Tel: 416.946.8081  Fax: 416.971.2462

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri May 13 20:25:07 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:45 EST