[R] Interpretation of call to aov()

From: JeeBee <JeeBee_at_troefpunt.nl>
Date: Sat 05 Aug 2006 - 22:15:58 EST


Hi all,

I have been reading about aov() at
http://www.psych.upenn.edu/~baron/rpsych/rpsych.html and http://davidmlane.com/hyperstat/intro_ANOVA.html, and I am trying to use this test on experiments with my simulator.

What I would like ANOVA to tell me is whether the differences I see when plotting the mean performance per method are significant, and also whether this depends on the problem size (bigger is more complex).
I would be very grateful if someone more mathematically skilled on this list could tell me whether I am drawing the correct conclusions.

> data
    performance method problem
1   146780.0000      -f     960
2     4654.0000      -f     160
3    45840.0000      -f     320
4    54750.0000      -f     320
5    91750.0000      -f     480
6     7452.0000      -f     160
7     8866.0000      -f     160
8     8513.0000      -f     160
9   139520.0000      -f     960
10   85380.0000      -f     480

<snip>

> str(data)
`data.frame': 419 obs. of 3 variables:
 $ performance: num  146780   4654  45840  54750  91750 ...
 $ method     : Factor w/ 7 levels "-f","-f -q","-h0 -r0",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ problem    : int  960 160 320 320 480 160 160 160 960 480 ...


> summary(aov(performance ~ method * problem, data=data))
                Df     Sum Sq    Mean Sq F value    Pr(>F)
method           6 3.3185e+11 5.5308e+10  416.91 < 2.2e-16 ***
problem          1 5.7141e+11 5.7141e+11 4307.26 < 2.2e-16 ***
method:problem   6 9.8891e+10 1.6482e+10  124.24 < 2.2e-16 ***
Residuals      405 5.3728e+10 1.3266e+08
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
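A note on the problem term: str() shows problem as an int, so aov() fits it as a numeric covariate. That is why it gets a single degree of freedom in the output above (a linear trend in problem size) and method:problem gets 6 (a separate slope for each method). If the problem sizes should instead be treated as unordered categories, wrapping the variable in factor() gives one degree of freedom per additional level. Which is appropriate is a modelling choice. The sketch below is only an illustration on made-up data: the name sim, the seven method labels, the four sizes (taken from the sizes visible in the snippet) and all performance values are assumptions, not the poster's data.

```r
# Hypothetical data shaped like the post's data frame: 7 made-up
# methods x 4 problem sizes x 15 runs (all values are invented)
set.seed(1)
sim <- expand.grid(method  = factor(paste0("m", 1:7)),
                   problem = c(160L, 320L, 480L, 960L),
                   rep     = 1:15)
sim$performance <- 40 * sim$problem * as.integer(sim$method) +
  rnorm(nrow(sim), sd = 5000)

# problem as integer: a numeric covariate with 1 df (linear trend),
# and method:problem has 6 df (a separate slope per method)
fit_num <- aov(performance ~ method * problem, data = sim)
summary(fit_num)

# problem as factor: 3 df for the four sizes, 18 df for the interaction
fit_fac <- aov(performance ~ method * factor(problem), data = sim)
summary(fit_fac)
```

With 420 simulated runs, the numeric fit leaves 406 residual degrees of freedom and the factor fit leaves 392, because the factor version spends more degrees of freedom on the problem and interaction terms.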

I interpret these results as follows:

-1- The performance depends on the chosen method.
If I compute the overall mean performance for each method, these means
are significantly different. This means that the method with the
greatest mean is significantly better than at least some of the other
methods (and not worse than any other method).
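The overall F test for method indicates that not all method means are equal; pairwise comparisons are a common way to see which specific methods differ from each other. A minimal sketch of one such follow-up, assuming simulated data (sim and its values are hypothetical) and problem treated as a factor: tapply() gives the marginal mean per method, and TukeyHSD() gives family-wise adjusted intervals and p-values for all 21 pairs of 7 methods. When the interaction is significant, these marginal comparisons average over problem sizes, so they are best read alongside the per-size means.

```r
# Hypothetical data: 7 made-up methods, 4 problem sizes, 15 runs each
set.seed(2)
sim <- expand.grid(method  = factor(paste0("m", 1:7)),
                   problem = factor(c(160, 320, 480, 960)),
                   rep     = 1:15)
sim$performance <- 1000 * as.integer(sim$method) +
  50 * as.numeric(as.character(sim$problem)) +
  rnorm(nrow(sim), sd = 2000)

# Marginal mean performance per method, largest first
method_means <- with(sim, tapply(performance, method, mean))
sort(method_means, decreasing = TRUE)

# Tukey HSD on the method term: all pairwise differences with
# family-wise adjusted intervals and p-values (averaged over problem)
fit <- aov(performance ~ method * problem, data = sim)
tk  <- TukeyHSD(fit, which = "method")
tk$method
```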

-2- The performance depends on the problem complexity.
This is not so interesting. In my setting it is trivial that performance
is worse for more complex problems.

-3- There is an interaction between method and complexity. In other
words, one cannot rank the methods from bad to good without taking the
problem complexity into account (for simple problems method A might be
the best, while for complex problems another method might be better).
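Point 3 can be inspected directly by tabulating the mean performance of each method at each problem size and ranking the methods within each size. A ranking that changes from one size to another is one form the interaction can take (an interaction can also be significant without any lines crossing). A minimal sketch with hypothetical simulated data (three made-up methods m1 to m3 whose lines are constructed to cross; not the poster's data). interaction.plot() draws the same cell means as lines; it is wrapped in interactive() so the script also runs in batch mode.

```r
# Hypothetical data: 3 made-up methods whose performance grows with
# problem size at different rates, so their lines cross
set.seed(3)
sim <- expand.grid(method  = factor(paste0("m", 1:3)),
                   problem = c(160, 320, 480, 960),
                   rep     = 1:10)
intercept <- c(0, 15000, 35000)  # per-method offset (made up)
slope     <- c(150, 100, 60)     # per-method growth rate (made up)
m <- as.integer(sim$method)
sim$performance <- intercept[m] + slope[m] * sim$problem +
  rnorm(nrow(sim), sd = 2000)

# Mean performance for every problem size (rows) x method (columns)
cell_means <- with(sim, tapply(performance, list(problem, method), mean))
round(cell_means)

# Rank of each method within each problem size (1 = smallest mean)
ranks <- t(apply(cell_means, 1, rank))
ranks

# Non-parallel lines indicate interaction; only drawn in interactive use
if (interactive()) {
  with(sim, interaction.plot(factor(problem), method, performance))
}
```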

I have not used Error() in my call to aov().
I have seen it used as Error(subj/(shape * color)),
but I do not have subjects. Or rather, I believe I have only one, which
is my simulator. Am I correct about that? Or should I use something like
Error(method * problem)?

Thanks in advance,
JeeBee.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat Aug 05 22:23:46 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sun 06 Aug 2006 - 04:18:14 EST.
