# [R] brewing stats

From: paul sorenson (sosman) <sourceforge_at_metrak.com>
Date: Sun 23 Oct 2005 - 21:52:21 EST

I guess this isn't so much of a help request as a show-and-tell from a non-statistician homebrewer who has been fumbling around with R. If nothing else, it provides yet another data set. I hope it is not out of line.

Anyway, the plots I have produced are at

The polling method is fairly simple: it's just one of those multiple-choice polls you can create on various web forums.

The poll was prompted by the ongoing claim from fly spargers that "their" method is more efficient, but I had never seen data to support that. I thought maybe it was a bit of snobbery.

Maybe they are right. However, if I conveniently ignore that annoying bump on the left of the batch sparge histogram, the two groups start to look very similar.

I was going to go out on a limb and say I learn heaps from reading the posts here, so please don't ruin my delusion too much if my output violates all principles of good statistics. OTOH, if you can suggest other cool-looking graphs, please feel free. The more difficult to pronounce the names are, the better :-)

The data set is below; each efficiency value is the low end of its 5-point bin:

```
method	efficiency	count	source
fly	95	0	bb
fly	90	0	bb
fly	85	2	bb
fly	80	8	bb
fly	75	13	bb
fly	70	8	bb
fly	65	3	bb
fly	60	0	bb
fly	55	0	bb
batch	95	0	bb
batch	90	0	bb
batch	85	4	bb
batch	80	3	bb
batch	75	15	bb
batch	70	10	bb
batch	65	6	bb
batch	60	7	bb
batch	55	1	bb

```
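Since each row is a bin count rather than a single vote, per-method totals and means can also be read straight off the table without expanding it first. A minimal sketch, assuming the table is typed in by hand as a data frame (`polls`, `n_votes` and `mean_eff` are hypothetical names) and that the centre of each bin (low end + 2.5) stands in for every vote in that bin:

```r
# Binned poll data from the table above; efficiency is the low end of a 5-point bin
polls = data.frame(
  method     = rep(c("fly", "batch"), each = 9),
  efficiency = rep(seq(95, 55, by = -5), times = 2),
  count      = c(0, 0, 2, 8, 13, 8, 3, 0, 0,
                 0, 0, 4, 3, 15, 10, 6, 7, 1)
)

# Assumption: every vote in a bin sits at the bin centre
polls$centre = polls$efficiency + 2.5

# Number of responses per method
n_votes = sapply(split(polls$count, polls$method), sum)

# Count-weighted mean efficiency per method
mean_eff = sapply(split(polls, polls$method),
                  function(d) weighted.mean(d$centre, d$count))

print(n_votes)
print(mean_eff)
```

With these counts this gives 34 fly and 46 batch responses, with weighted means of roughly 77.2 and 73.6, the same group means that the expanded one-row-per-vote data produces.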

And the R code:

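```r
# Crunch some stuff with brewboard (and similar) polls.
# Read the data set above, saved as a whitespace-separated text file
# ("brewing.txt" is a placeholder name, not from the original post).
x = read.table("brewing.txt", header = TRUE)

# Shift each value from the low end to the centre of its 5-point bin
x$efficiency = x$efficiency + 2.5

# Ignore rows with no votes (NA); zero counts are fine, because rep()
# simply makes no copies of those rows
y = x[which(!is.na(x$count)), ]

# Expand the counts into one row per vote
r = rep(row.names(y), y$count)
z = y[r, ]
z$count = 1
```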
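The expansion step relies on `rep()` repeating each row label `count` times. A zero count therefore just produces no rows, whereas an `NA` count makes `rep()` stop with an error, which is why those rows are filtered out first. A toy sketch with made-up counts (the data frame `toy` and the other names are hypothetical, not poll data), using row indices instead of row names:

```r
# Hypothetical toy bins: the middle bin has zero votes
toy = data.frame(efficiency = c(72.5, 77.5, 82.5), count = c(2, 0, 3))

# Repeat each row index 'count' times; the zero-count row vanishes
idx = rep(seq_len(nrow(toy)), toy$count)
votes = toy[idx, ]
votes$count = 1

print(idx)           # each row index repeated 'count' times
print(nrow(votes))   # one row per vote

# An NA count cannot be expanded: rep() signals an error
bad = try(rep(1:2, c(1, NA)), silent = TRUE)
print(inherits(bad, "try-error"))
```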

```r
# Four-panel overview: responses, per-method bars, boxplot, histogram
par(mfrow = c(2, 2))
barplot(table(z$method), main = "number of responses")
barplot(table(z$method, z$efficiency), beside = TRUE, legend.text = TRUE,
        main = "Mash efficiency by method",
        sub = "paul sorenson 2005 brewiki.org")
boxplot(efficiency ~ method, data = z, main = "Mash efficiency")
z.h = hist(z$efficiency, freq = FALSE,
           main = "Efficiencies,\n all methods combined", xlab = "efficiency")
z.md = max(z.h$density)   # peak density of the histogram
lines(density(z$efficiency, bw = 3.0), col = 'blue')
#qqnorm(z$efficiency)

# Welch two-sample t-test, fly vs batch
t.test(efficiency ~ method, data = z)
#by(z, z$method, summary)

# Per-method summaries
zs = split(z, z$method)
summary(zs$batch)
summary(zs$fly)
```
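A t-test treats the binned responses as continuous measurements. One alternative worth trying, swapped in here rather than taken from the analysis above, is the Wilcoxon rank-sum test, which only uses the ordering of the efficiencies. A sketch under the same bin-centre assumption, re-typing the counts so it runs on its own (`vote_df` and the other names are hypothetical); `exact = FALSE` requests the normal approximation directly, since the heavily tied bin values rule out an exact p-value:

```r
# Binned counts from the poll table; bin centres = low end + 2.5
centres = seq(95, 55, by = -5) + 2.5
fly_n   = c(0, 0, 2, 8, 13, 8, 3, 0, 0)
batch_n = c(0, 0, 4, 3, 15, 10, 6, 7, 1)

# One row per vote
vote_df = data.frame(
  method     = rep(c("fly", "batch"), times = c(sum(fly_n), sum(batch_n))),
  efficiency = c(rep(centres, fly_n), rep(centres, batch_n))
)

# Rank-based comparison of the two sparging methods
wt = wilcox.test(efficiency ~ method, data = vote_df, exact = FALSE)
print(wt)
```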

```r
# Fit a normal distribution and overlay it on the histogram
library(MASS)
z.fit = fitdistr(z$efficiency, 'normal')
q = 55:95 + 2.5
lines(q, dnorm(q, z.fit$estimate['mean'], z.fit$estimate['sd']), col = 'red')
legend('topleft', legend = c('density', 'fitted'), col = c('blue', 'red'),
       lwd = 1, inset = 0.05)
```
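For a normal distribution the maximum-likelihood fit has a closed form: the estimated mean is the sample mean, and the estimated sd is sqrt(sum((x - mean)^2) / n), with divisor n rather than the n - 1 used by `sd()`. A small sketch that checks `fitdistr()` against those formulas on all votes combined, again re-typing the counts and using bin centres (names are hypothetical):

```r
library(MASS)

# All votes combined, placed at bin centres (low end + 2.5)
centres = seq(95, 55, by = -5) + 2.5
fly_n   = c(0, 0, 2, 8, 13, 8, 3, 0, 0)
batch_n = c(0, 0, 4, 3, 15, 10, 6, 7, 1)
eff = rep(centres, fly_n + batch_n)

fit = fitdistr(eff, "normal")

# Closed-form maximum-likelihood estimates for a normal distribution
n      = length(eff)
mu_hat = mean(eff)
sd_hat = sqrt(sum((eff - mu_hat)^2) / n)   # divisor n, not n - 1

print(fit$estimate)
print(c(mean = mu_hat, sd = sd_hat))
print(sd(eff))                             # slightly larger: divisor n - 1
```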

```r
# Filter out lowball values: drop the 55 and 60 bins (now centred at 57.5 and 62.5)
z.f = z[which(z$efficiency >= 65), ]
by(z.f, z.f$method, summary)
```
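To put a number on "if I ignore that bump", one option is to run the same Welch t-test on all votes and on the trimmed votes and compare the p-values. A sketch, re-typing the counts so it stands alone (`all_votes`, `trimmed` and the other names are hypothetical). Dropping bins after seeing the data is a post hoc choice, so the trimmed result is best treated as exploratory.

```r
# Binned counts expanded to one row per vote at bin centres (low end + 2.5)
centres = seq(95, 55, by = -5) + 2.5
fly_n   = c(0, 0, 2, 8, 13, 8, 3, 0, 0)
batch_n = c(0, 0, 4, 3, 15, 10, 6, 7, 1)
all_votes = data.frame(
  method     = rep(c("fly", "batch"), times = c(sum(fly_n), sum(batch_n))),
  efficiency = c(rep(centres, fly_n), rep(centres, batch_n))
)

# Same cut-off as above: drop the bins centred at 57.5 and 62.5
trimmed = subset(all_votes, efficiency >= 65)

# Welch t-tests with and without the low bins
p_all     = t.test(efficiency ~ method, data = all_votes)$p.value
p_trimmed = t.test(efficiency ~ method, data = trimmed)$p.value
print(c(all = p_all, trimmed = p_trimmed))

# Group means after trimming
print(tapply(trimmed$efficiency, trimmed$method, mean))
```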

R-help@stat.math.ethz.ch mailing list