Re: [R] adding the mean and standard deviation to boxplots

From: HBaize <HBaize_at_buttecounty.net>
Date: Wed, 23 Jul 2008 14:52:17 -0700 (PDT)

Fernando,
I don't have time to do all that you asked, but here is some code that makes violin plots with mean, median, and 95% CI. I like this plot very much, even if boxplot purists think it is horrible :-)

I think the boxplot was developed before we had computing power. Now we can show the detail of the distribution easily. This code uses the library "UsingR" written by John Verzani.

Real R wizards will find my code to be crude. It could be done with more elegance, but it works :-)
Note that I varied the sample sizes to show how they affect the width of the 95% CI.
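As a rough illustration of the sample-size effect, here is a small sketch of the half-width of a t-based 95% CI for the three sample sizes used in this post. The helper name `ci_half_width` and the fixed standard deviation of 250 are assumptions made only for illustration; the data generated in this post use a different standard deviation for each group.

```r
# Half-width of a t-based 95% CI for a mean: qt(0.975, n - 1) * sd / sqrt(n)
# ci_half_width is a hypothetical helper, not part of any package.
ci_half_width <- function(n, sd) {
  qt(0.975, df = n - 1) * sd / sqrt(n)
}

n <- c(25, 50, 100)                       # the three sample sizes used in the post
half_width <- ci_half_width(n, sd = 250)  # sd = 250 is an arbitrary illustrative value

# The half-width shrinks as n grows, so the CI bars get shorter
data.frame(n = n, half_width = round(half_width, 1))
```

Holding the standard deviation fixed isolates the effect of n; in the plot itself the bars also differ because each group has its own spread.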

HTH

```r
## Create three random data vectors
a <- rnorm(25, 2500, 300)
b <- rnorm(50, 3500, 250)
c <- rnorm(100, 4000, 200)

## Convert data vectors to dataframes
adf <- data.frame(Group = " A ", Measure = a)
bdf <- data.frame(Group = " B ", Measure = b)
cdf <- data.frame(Group = " C ", Measure = c)

## Combine into a dataframe using rbind
abcData <- rbind(adf, bdf, cdf)
attach(abcData)

## Load the UsingR library for violin plots
library(UsingR)

## Run boxplot to find statistics, but don't draw the boxplots
S <- boxplot(Measure ~ Group, plot = FALSE)

## Draw violin plots
simple.violinplot(Measure ~ Group,
                  col = "lightblue")

title(main = "Just Random Test Data",
      sub = "A, B, & C",
      cex.main = 1.5,
      cex.sub = 1.3)

## Define locations for additional chart elements
at <- c(1:length(S$names))

## Mark the median values with filled dark green squares
points(at, S$stats[3, ], pch = 22, cex = 1.2, bg = "darkgreen")

## Get Group means and plot them using a diamond plot symbol
## IMPORTANT -- must add the missing values removal: na.rm=TRUE
## if there is any missing data.
means <- by(Measure, Group, mean, na.rm = TRUE)
points(at, means, pch = 23, cex = 1.2, bg = "red")

##- Get CIs -##
## create standard error function--
se <- function(x) {
  y <- x[!is.na(x)]
  sqrt(var(as.vector(y)) / length(y))
}

## create length function for non-missing values
lngth <- function(x) {
  y <- x[!is.na(x)]
  length(y)
}

## Compute vectors of standard error and n
Hse <- by(Measure, Group, se)
Hn <- by(Measure, Group, lngth)

## compute 95% CIs and store in vectors
civ.u <- means + qt(.975, df = Hn - 1) * Hse  # Upper bound CI
civ.l <- means + qt(.025, df = Hn - 1) * Hse  # Lower bound CI

## Draw CI, first vertical line, then upper and lower horizontal
segments(at, civ.u, at, civ.l, lty = "solid", lwd = 2, col = "red")
segments(at - 0.1, civ.u, at + 0.1, civ.u, lty = "solid", lwd = 2, col = "red")
segments(at - 0.1, civ.l, at + 0.1, civ.l, lty = "solid", lwd = 2, col = "red")

## Draw Mean values to the left edge of each violinplot
text(at - 0.1, means, labels = formatC(means, format = "f", digits = 1),
     pos = 2, cex = 1, col = "red")

## Draw Median values to the right edge of each violinplot
text(at + 0.1, S$stats[3, ], labels = formatC(S$stats[3, ],
     format = "f", digits = 1), pos = 4, cex = 1, col = "darkgreen")

## Print "n" under the name of measure
mtext(S$n, side = 1, at = at, cex = .75, line = 2.5)

## End
```
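For anyone who would rather avoid `attach()` and `by()`, the same per-group 95% limits can probably be obtained with `tapply()`, which returns plain named vectors. The following is only a sketch under stated assumptions: the data frame name `dat`, the seed, and the unpadded group labels are illustrative and not part of the code above. It also cross-checks one group against the interval reported by `t.test()`.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Illustrative data with the same sizes, means and SDs as in the post
dat <- data.frame(
  Group   = rep(c("A", "B", "C"), times = c(25, 50, 100)),
  Measure = c(rnorm(25, 2500, 300), rnorm(50, 3500, 250), rnorm(100, 4000, 200))
)

# Per-group summaries, ignoring missing values
grp_mean <- tapply(dat$Measure, dat$Group, mean, na.rm = TRUE)
grp_sd   <- tapply(dat$Measure, dat$Group, sd, na.rm = TRUE)
grp_n    <- tapply(dat$Measure, dat$Group, function(x) sum(!is.na(x)))
grp_se   <- grp_sd / sqrt(grp_n)

# t-based 95% confidence limits for each group mean
t_crit <- qt(0.975, df = grp_n - 1)
ci <- cbind(lower = grp_mean - t_crit * grp_se,
            upper = grp_mean + t_crit * grp_se)
ci

# Cross-check group A against the interval computed by t.test()
check_A <- t.test(dat$Measure[dat$Group == "A"])$conf.int
all.equal(as.numeric(check_A), unname(ci["A", ]))
```

The `lower` and `upper` columns could then be passed to `segments()` in place of `civ.l` and `civ.u`.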

Fernando Marmolejo-Ramos wrote:

```
>
> Dear users,
>
> This is a message I was directing to Harold Baize, but because I pressed
> the wrong button the message got lost, grrrr!!!
>
> So I'm doing it all over again:
>
> Let's suppose I have three batches of data:
>
> a <- rnorm(50,2500,300)
> b <- rnorm(50,3500,250)
> c <- rnorm(50,4000,200)
>
> # Now I want to plot them as boxplots and violin plots
> require(vioplot)
> vioplot (a,b,c, horizontal=T, col="white")
> boxplot (a,b,c, horizontal=T, col="white")
>
> As we know, boxplots show the least and greatest values, the lower and
> upper quartiles, the median, and outliers (when present).
>
> However, for some data it is not the MEDIAN that is important but the MEAN.
> Also, it is more relevant to show ERROR BARS instead of quartiles.
>
> So, how could I see (for the batches of data I introduced above)...
>
> 1. a boxplot showing the MEAN and the SD instead of the lower/upper
>    quartile?
> 2. a boxplot showing the MEAN and the STANDARD ERROR OF THE MEAN instead
>    of the lower/upper quartile?
> 3. a boxplot showing the MEAN and the 95% CI instead of the lower/upper
>    quartile?
>
> (I think in all these cases it is preferable to have visual access to, or
> to have a line that shows, the LEAST and the GREATEST VALUES.)
>
> In other words, the ERROR BARS (95% CI, SD, SE) proposed here would take
> the place of the boxes usually used to represent the lower/upper quartile.
>
> Now, the big question: is it possible to implement all this jazz in
> violin plots as well?
>
> How could that be done?
>
> Cheers,
>
> Fernando
>
```
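The three variants asked about in the quoted message (mean with SD, with SE, and with 95% CI, plus a line from the least to the greatest value) can be sketched in base graphics alone. This is one possible approach, not an established recipe: the helper name `draw_bars`, the seed, and the panel layout are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
batches <- list(a = rnorm(50, 2500, 300),
                b = rnorm(50, 3500, 250),
                c = rnorm(50, 4000, 200))

m   <- sapply(batches, mean)                        # group means
s   <- sapply(batches, sd)                          # standard deviations
se  <- s / sqrt(lengths(batches))                   # standard errors of the mean
ci  <- qt(0.975, df = lengths(batches) - 1) * se    # 95% CI half-widths
rng <- sapply(batches, range)                       # row 1 = min, row 2 = max

# draw_bars is a hypothetical helper: a point at the mean, a capped bar of
# +/- 'half' around it, and a dotted line from the least to the greatest value.
draw_bars <- function(centre, half, lo, hi, label) {
  x <- seq_along(centre)
  plot(x, centre, pch = 19, xaxt = "n", xlab = "", ylab = "value", main = label,
       xlim = c(0.5, length(centre) + 0.5),
       ylim = range(lo, hi, centre - half, centre + half))
  axis(1, at = x, labels = names(centre))
  segments(x, lo, x, hi, lty = "dotted")
  arrows(x, centre - half, x, centre + half,
         angle = 90, code = 3, length = 0.08, lwd = 2)
}

op <- par(mfrow = c(1, 3))
draw_bars(m, s,  rng[1, ], rng[2, ], "Mean +/- SD")
draw_bars(m, se, rng[1, ], rng[2, ], "Mean +/- SE")
draw_bars(m, ci, rng[1, ], rng[2, ], "Mean with 95% CI")
par(op)
```

The same `arrows()` call could be drawn on top of an existing violin plot instead of a fresh `plot()`, much as the reply above overlays `segments()` on its violins.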
Received on Wed 23 Jul 2008 - 21:56:40 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Jul 2008 - 22:32:51 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.