# Re: [R] difficulties computing a simple anova

From: Greg Snow <Greg.Snow_at_imail.org>
Date: Thu, 31 Jan 2008 11:52:35 -0700

Here are a couple of ways:

tmp.df <- data.frame( group=factor(sample(LETTERS[1:3], 25, replace=TRUE)),   val = rnorm(25, 50, 5) )

tmp.df <- transform(tmp.df, ssq = ave(val, group,

FUN=function(x) x-mean(x))^2 )
# another way
tmp.means <- tapply(tmp.df\$val, tmp.df\$group, FUN=mean) tmp.df\$ssq2 <- as.vector( ( tmp.df\$val - tmp.means[tmp.df\$group] ) ^2 ) # check that they both worked
with( tmp.df, all.equal(ssq,ssq2) )

Hope this helps,

From: r-help-bounces_at_r-project.org on behalf of Will Holcomb Sent: Thu 1/31/2008 8:10 AM
To: Simon Blomberg
Cc: r-help
Subject: Re: [R] difficulties computing a simple anova

Excellent. I was able to do the analysis with no problem. I just had to do some twiddling with the inputs to get things from the CSV to the format you described.

When the data is in that format, I would like to be able to say, "create a column which is each square of the difference of each response with the mean of all the responses for that drug." Is there a simple way to do that? What I am doing currently is:

squares = c()
for(drug in unique(drugs)) {

responses = subset(spiderdata, Drug == drug, select = Response)    squares = append(squares, (responses - mean(responses)) ^ 2) }
spiderdata <- cbind(spiderdata, SquaresWithin = squares)

This works because the elements are sorted by drug and the drugs vector has the drug names in the same order. If my data were mixed up however, this method would misalign results. Is there any way to say something like:

spiderdata\$SquaresWithin = (spiderdata\$Response - mean(subset(spiderdata, Drug == *spiderdata\$Drug of current row*, select = Response))) ^ 2

Just using Drug == spiderdata\$Drug doesn't seem to work and I think it is because Drug is a factor?

I suppose I could always sort on the element I was generating a new column for and cause them to align, but I was wondering if the syntax would allow me to refer to elements of the row a computation. I was trying it with tapply which it sounds like should work, but this is not the method:

sq_within = function(row) { (row\$Response - mean(subset(spiderdata, row = row\$Drug))) ^ 2 }
tapply(spiderdata, spiderdata\$Drug, fun = sq_within)

It doesn't like the first and second arguments to be different lengths. The first argument is the data to be iterated over though and I think my understanding of how the function works is somehow fundamentally off. I've had a hard time finding clear examples where tapply isn't used with a function that generates a different number of outputs than there are inputs.

Thank you again for the help. I'm liking R, just still getting the hang of some of the conceptual stuff.

Will

P.S. You don't have to be vague for the sake of my homework. The assignment was to do this in SAS which I've already completed. I don't plan on owning SAS though and I would like to be able to do statistics in the future, so learning R is just for my benefit.

On Jan 30, 2008 8:11 PM, Simon Blomberg <s.blomberg1_at_uq.edu.au> wrote:

> Your data setup is wrong. You have one factor (Drug) with 3 levels
> (Zoloft, Naltrexone, Valium). So your data should be:
>
> spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"),
> each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12,
> 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
>
> You should be able to take it from there. (Since this is a homework
> problem, I'm being intentionally vague.)
>
> cheers,
>
> Simon.
>
> On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote:
> > My grasp of R and statistics are both seriously lacking, so if this
> question
> > is completely naive, I apologize in advance. I've hunted for a couple
> hours
> > on the internet and none of the methods I've found have produced the
> result
> > I'm looking for.
> >
> > I'm currently a student in a Statistics class and we are learning the
> ANOVA.
> > We had to do one by hand and then reproduce our work in SAS. I really
> like
> > the idea of understanding R however and would like to reproduce the
> solution
> > in R if possible.
> >
> > Where I'm at now is this little program:
> > http://odin.himinbi.org/classes/psy304b/spider_analysis.r
> >
> > The program calculates an anova manually (correctly, I'm pretty sure, it
> > agrees with the same numbers in excel). The answer that it comes up with
> > doesn't agree with any of the numbers I can get either the aov or anova
> > functions to produce.
> >
> > Can anyone help me with simply the method to compute a one-way anova?
> Well,
> > specifically how to replicate the sort of anova people learn in an intro
> to
> > statistics class. All of the degrees of freedom are off from what I
> expect
> > them to be (they're all 1).
> >
> > (The original problem, should it help in understanding my question, is
> at:
> > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it
> will
> > likely look pretty funky if your browser doesn't support mathml (firefox
> > does).)
> >
> > Will
> >
> > The program is as follows:
> >
> > library(foreign)
> > # spiderdata <- read.csv("spider_data.csv")
> >
> > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6),
> > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17),
> > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
> >
> > summary(spiderdata)
> >
> > # Compute a one-way ANOVA by hand
> >
> > J = length(spiderdata)
> >
> > sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2)
> > for(j in 2:J) {
> > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2)
> > }
> > sqdata
> >
> > N = 0
> > for(j in 1:J) {
> > N = N + length(sqdata[[j]])
> > }
> >
> > SSW = sum(sqdata)
> > MSW = SSW / (N - J)
> > SSB = 0
> > for(j in 1:(length(spiderdata))) {
> > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] -
> > (sum(spiderdata) / N)) ^ 2)
> > }
> > MSB = SSB / (J - 1)
> >
> > F = MSB / MSW
> > f_prob = pf(F, J - 1, N - J)
> > reject_point = qf(.95, J - 1, N - J)
> >
> > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F,
> ",
> > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "")
> >
> > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata))
> > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> --
> Simon Blomberg, BSc (Hons), PhD, MAppStat.
> Lecturer and Consultant Statistician
> Faculty of Biological and Chemical Sciences
> The University of Queensland
> St. Lucia Queensland 4072
> Australia
> Room 320 Goddard Building (8)
> T: +61 7 3365 2506
> http://www.uq.edu.au/~uqsblomb <http://www.uq.edu.au/%7Euqsblomb>
> email: S.Blomberg1_at_uq.edu.au
>
> Policies:
> 1. I will NOT analyse your data for you.
>
> The combination of some data and an aching desire for
> be extracted from a given body of data. - John Tukey.
>
>

[[alternative HTML version deleted]]

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 31 Jan 2008 - 18:59:12 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 01 Feb 2008 - 00:30:10 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.