Re: [R] difficulties computing a simple anova

From: Will Holcomb <wholcomb_at_gmail.com>
Date: Thu, 31 Jan 2008 17:11:31 -0600

That's excellent. I somehow missed that variable names were case sensitive. I was using all lowercase names and it was throwing things off. The error I would get using fun instead of FUN was along the lines of:

function (x)
x - mean(x)
<environment: 0x84be554>
Error in unique.default(x) : unique() applies only to vectors Execution halted

Which was a bit misleading since it looked like the function was being used. This solution was exactly the sort of thing I was hoping was possible though. Thank you. I talked to my prof about the fact I like R and will never own a copy of SAS, and I get to do my work in R this semester, so y'all may well hear from me again. =)

Will

On Jan 31, 2008 12:52 PM, Greg Snow <Greg.Snow_at_imail.org> wrote:

> Here are a couple of ways:
>
> tmp.df <- data.frame( group=factor(sample(LETTERS[1:3], 25,
> replace=TRUE)),
> val = rnorm(25, 50, 5) )
>
> tmp.df <- transform(tmp.df, ssq = ave(val, group,
> FUN=function(x) x-mean(x))^2 )
> # another way
> tmp.means <- tapply(tmp.df$val, tmp.df$group, FUN=mean)
> tmp.df$ssq2 <- as.vector( ( tmp.df$val - tmp.means[tmp.df$group] ) ^2 )
> # check that they both worked
> with( tmp.df, all.equal(ssq,ssq2) )
>
> Hope this helps,
>
> ------------------------------
> *From:* r-help-bounces_at_r-project.org on behalf of Will Holcomb
> *Sent:* Thu 1/31/2008 8:10 AM
> *To:* Simon Blomberg
> *Cc:* r-help
> *Subject:* Re: [R] difficulties computing a simple anova
>
> Excellent. I was able to do the analysis with no problem. I just had to
> do
> some twiddling with the inputs to get things from the CSV to the format
> you
> described.
>
> When the data is in that format, I would like to be able to say, "create a
> column which is each square of the difference of each response with the
> mean
> of all the responses for that drug." Is there a simple way to do that?
> What
> I am doing currently is:
>
> squares = c()
> for(drug in unique(drugs)) {
> responses = subset(spiderdata, Drug == drug, select = Response)
> squares = append(squares, (responses - mean(responses)) ^ 2)
> }
> spiderdata <- cbind(spiderdata, SquaresWithin = squares)
>
> This works because the elements are sorted by drug and the drugs vector
> has
> the drug names in the same order. If my data were mixed up however, this
> method would misalign results. Is there any way to say something like:
>
> spiderdata$SquaresWithin = (spiderdata$Response - mean(subset(spiderdata,
> Drug == *spiderdata$Drug of current row*, select = Response))) ^ 2
>
> Just using Drug == spiderdata$Drug doesn't seem to work and I think it is
> because Drug is a factor?
>
> I suppose I could always sort on the element I was generating a new column
> for and cause them to align, but I was wondering if the syntax would allow
> me to refer to elements of the row a computation. I was trying it with
> tapply which it sounds like should work, but this is not the method:
>
> sq_within = function(row) { (row$Response - mean(subset(spiderdata, row =
> row$Drug))) ^ 2 }
> tapply(spiderdata, spiderdata$Drug, fun = sq_within)
>
> It doesn't like the first and second arguments to be different lengths.
> The
> first argument is the data to be iterated over though and I think my
> understanding of how the function works is somehow fundamentally off. I've
> had a hard time finding clear examples where tapply isn't used with a
> function that generates a different number of outputs than there are
> inputs.
>
> Thank you again for the help. I'm liking R, just still getting the hang of
> some of the conceptual stuff.
>
> Will
>
> P.S. You don't have to be vague for the sake of my homework. The
> assignment
> was to do this in SAS which I've already completed. I don't plan on owning
> SAS though and I would like to be able to do statistics in the future, so
> learning R is just for my benefit.
>
> On Jan 30, 2008 8:11 PM, Simon Blomberg <s.blomberg1_at_uq.edu.au> wrote:
>
> > Your data setup is wrong. You have one factor (Drug) with 3 levels
> > (Zoloft, Naltrexone, Valium). So your data should be:
> >
> > spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"),
> > each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12,
> > 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
> >
> > You should be able to take it from there. (Since this is a homework
> > problem, I'm being intentionally vague.)
> >
> > cheers,
> >
> > Simon.
> >
> > On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote:
> > > My grasp of R and statistics are both seriously lacking, so if this
> > question
> > > is completely naive, I apologize in advance. I've hunted for a couple
> > hours
> > > on the internet and none of the methods I've found have produced the
> > result
> > > I'm looking for.
> > >
> > > I'm currently a student in a Statistics class and we are learning the
> > ANOVA.
> > > We had to do one by hand and then reproduce our work in SAS. I really
> > like
> > > the idea of understanding R however and would like to reproduce the
> > solution
> > > in R if possible.
> > >
> > > Where I'm at now is this little program:
> > > http://odin.himinbi.org/classes/psy304b/spider_analysis.r
> > >
> > > The program calculates an anova manually (correctly, I'm pretty sure,
> it
> > > agrees with the same numbers in excel). The answer that it comes up
> with
> > > doesn't agree with any of the numbers I can get either the aov or
> anova
> > > functions to produce.
> > >
> > > Can anyone help me with simply the method to compute a one-way anova?
> > Well,
> > > specifically how to replicate the sort of anova people learn in an
> intro
> > to
> > > statistics class. All of the degrees of freedom are off from what I
> > expect
> > > them to be (they're all 1).
> > >
> > > (The original problem, should it help in understanding my question, is
> > at:
> > > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it
> > will
> > > likely look pretty funky if your browser doesn't support mathml
> (firefox
> > > does).)
> > >
> > > Will
> > >
> > > The program is as follows:
> > >
> > > library(foreign)
> > > # spiderdata <- read.csv("spider_data.csv")
> > >
> > > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7,
> 6),
> > > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17),
> > > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
> > >
> > > summary(spiderdata)
> > >
> > > # Compute a one-way ANOVA by hand
> > >
> > > J = length(spiderdata)
> > >
> > > sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2)
> > > for(j in 2:J) {
> > > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2)
> > > }
> > > sqdata
> > >
> > > N = 0
> > > for(j in 1:J) {
> > > N = N + length(sqdata[[j]])
> > > }
> > >
> > > SSW = sum(sqdata)
> > > MSW = SSW / (N - J)
> > > SSB = 0
> > > for(j in 1:(length(spiderdata))) {
> > > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] -
> > > (sum(spiderdata) / N)) ^ 2)
> > > }
> > > MSB = SSB / (J - 1)
> > >
> > > F = MSB / MSW
> > > f_prob = pf(F, J - 1, N - J)
> > > reject_point = qf(.95, J - 1, N - J)
> > >
> > > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:",
> F,
> > ",
> > > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "")
> > >
> > > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata))
> > > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help_at_r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > --
> > Simon Blomberg, BSc (Hons), PhD, MAppStat.
> > Lecturer and Consultant Statistician
> > Faculty of Biological and Chemical Sciences
> > The University of Queensland
> > St. Lucia Queensland 4072
> > Australia
> > Room 320 Goddard Building (8)
> > T: +61 7 3365 2506
> > http://www.uq.edu.au/~uqsblomb <http://www.uq.edu.au/%7Euqsblomb> <
> http://www.uq.edu.au/%7Euqsblomb>
> > email: S.Blomberg1_at_uq.edu.au
> >
> > Policies:
> > 1. I will NOT analyse your data for you.
> > 2. Your deadline is your problem.
> >
> > The combination of some data and an aching desire for
> > an answer does not ensure that a reasonable answer can
> > be extracted from a given body of data. - John Tukey.
> >
> >
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 31 Jan 2008 - 23:22:05 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 31 Jan 2008 - 23:30:11 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive