From: Rolf Turner <r.turner_at_auckland.ac.nz>

Date: Thu, 13 Mar 2008 10:33:10 +1300

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 12 Mar 2008 - 21:37:24 GMT

I thought your question was well expressed and that you followed the posting guide better than most.

I'm no expert on such issues, but I'd like to kick in a few opinions (with which others may disagree).

(1) All of the anova stuff is based on the assumption of homogeneity of variance. However, my understanding is that the model is quite robust to violations of this assumption. Problems may arise if some cells have small sample sizes and those small samples are associated with large variances. Otherwise there is not all that much to worry about.

(2) The Tukey test is indeed based on the assumption of equal sample sizes. The version of the test for unbalanced data is an approximation; my understanding is that it's a pretty good approximation.

(3) For multiple comparisons after applying the Kruskal-Wallis test: experts on non-parametric statistics may know about more powerful methods, but I would be inclined simply to apply a Bonferroni correction to a collection of pairwise tests (e.g. wilcox.test). Just multiply the p-values by the number of pairwise comparisons, k-choose-2, where k is the number of groups (3-choose-2 = 3 in your toy example), capping the results at 1.

(4) Generally speaking, I would say that if a classical test and a non-parametric test give different answers, then your data are being coy about revealing their true import. I would have very little faith in either answer, and would claim that you really need more data.

Unfortunately this need can rarely be satisfied. If you have to make a decision one way or the other, then you should go with the non-parametric answer.

(5) Finally, my universal prescription is: ``When in doubt, simulate.'' I.e. simulate multiple data sets on the basis of models fitted to, or related to, your real data. Run the possible tests on the simulated data sets. Since these data are simulated, you know what the right answer is. Count up how often you get the right answer. Such an exercise can be quite revealing.
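Regarding point (2): R's documentation for TukeyHSD says it incorporates an adjustment for sample size that gives sensible intervals for mildly unbalanced designs. The minimal sketch below assumes that adjustment is the usual Tukey-Kramer form, in which each pair of groups gets its own standard error built from the pooled residual variance and both group sizes, and checks a hand computation against TukeyHSD on the toy data from the original question. The names grp, mse, crit and half_width are illustrative only.

```r
# Toy data from the original question: 10, 3 and 7 observations per group
a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
b <- c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,30,40,50,50,60)
z <- data.frame(grp = factor(a), b = b)

fit <- aov(b ~ grp, data = z)
hsd <- TukeyHSD(fit, "grp", conf.level = 0.95)$grp   # rows "2-1", "3-1", "3-2"

# Tukey-Kramer by hand (assumed form): pooled residual variance (MSE) and a
# per-pair standard error that uses both group sizes n_i and n_j
mse  <- sum(residuals(fit)^2) / df.residual(fit)
n    <- setNames(as.vector(table(z$grp)), levels(z$grp))
crit <- qtukey(0.95, nmeans = nlevels(z$grp), df = df.residual(fit))

pairs <- combn(levels(z$grp), 2)   # columns: ("1","2"), ("1","3"), ("2","3")
half_width <- apply(pairs, 2, function(p) {
  crit * sqrt(mse / 2 * (1 / n[[p[1]]] + 1 / n[[p[2]]]))
})
names(half_width) <- paste(pairs[2, ], pairs[1, ], sep = "-")  # match TukeyHSD row names

# Side by side: hand computation vs. TukeyHSD's (upper limit - estimated difference)
cbind(by_hand  = half_width,
      TukeyHSD = hsd[names(half_width), "upr"] - hsd[names(half_width), "diff"])
```

With equal group sizes the per-pair term reduces to mse / n, i.e. the original Tukey HSD; with 10, 3 and 7 observations the three intervals get different widths, the 3-vs-2 comparison (7 and 3 observations) being the widest.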
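Regarding point (3): a minimal R sketch of Bonferroni-corrected pairwise Wilcoxon tests on the toy data. It applies the correction by hand (multiply by k-choose-2, cap at 1) and then shows the stats-package shortcuts p.adjust() and pairwise.wilcox.test(), which should agree with it. Passing exact = FALSE to request the normal approximation is my assumption, not part of the advice above; it is used because the toy data contain a tie (two 50s), so exact p-values cannot be computed for every pair anyway.

```r
a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
b <- c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,30,40,50,50,60)
grp <- factor(a)

k <- nlevels(grp)
m <- choose(k, 2)                 # number of pairwise comparisons: 3 when k = 3

# Raw two-sample Wilcoxon p-value for every pair of groups
pairs <- combn(levels(grp), 2)
raw_p <- apply(pairs, 2, function(p) {
  wilcox.test(b[grp == p[1]], b[grp == p[2]], exact = FALSE)$p.value
})
names(raw_p) <- paste(pairs[1, ], pairs[2, ], sep = " vs ")

# Bonferroni by hand: multiply by the number of comparisons, cap at 1
bonf_p <- pmin(1, raw_p * m)
print(bonf_p)

# The same correction via the stats package
p.adjust(raw_p, method = "bonferroni")
pairwise.wilcox.test(b, grp, p.adjust.method = "bonferroni", exact = FALSE)
```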
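Regarding point (5): a minimal simulation sketch. Purely for illustration, it assumes normal errors, a common mean in all three groups (so the right answer is "no difference"), and each group's standard deviation set to the sample SD of the corresponding toy group, giving unequal spreads and unbalanced sizes like those in the question. It counts how often the classical F test and the Kruskal-Wallis test reject at the 5% level; a well-calibrated test should reject about 5% of the time. The seed, the 1000 replicates and the normal model are arbitrary choices of mine.

```r
set.seed(2008)   # arbitrary seed, for reproducibility

a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
b <- c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,30,40,50,50,60)
grp <- factor(a)
sds <- tapply(b, grp, sd)        # per-group SDs of the toy data

n_sim  <- 1000
reject <- matrix(NA, nrow = n_sim, ncol = 2,
                 dimnames = list(NULL, c("anova_F", "kruskal")))

for (s in seq_len(n_sim)) {
  # Null model: equal means (0) but the toy data's unequal spreads and group sizes
  y <- rnorm(length(grp), mean = 0, sd = sds[as.integer(grp)])
  d <- data.frame(y = y, grp = grp)
  reject[s, "anova_F"] <- summary(aov(y ~ grp, data = d))[[1]][["Pr(>F)"]][1] < 0.05
  reject[s, "kruskal"] <- kruskal.test(y ~ grp, data = d)$p.value < 0.05
}

# Estimated type I error rates; values far from 0.05 signal a miscalibrated test
colMeans(reject)
```

Giving the groups different means in the rnorm() call would turn the same loop into a power estimate.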

HTH
cheers,

Rolf Turner

On 13/03/2008, at 9:19 AM, eugen pircalabelu wrote:

> Hi,
>
> My data were only a toy example that matched the real situation with real data, but I could not have posted the entire data set, so I gave a self-contained example of what I thought was my problem. Of course you can see with the naked eye that the data are unbalanced (this was done intentionally), but as I said, this was only a toy example mimicking a problem from a real data set.
>
> Thank you and have a great day ahead!
>
> David Hewitt <dhewitt37_at_gmail.com> wrote:
>
>> I have the following problem: how appropriate is my aov model under the violation of anova assumptions?
>>
>> Example:
>> a<-c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
>> b<-c(101,1010,200,300,400, 202, 121, 234, 55,555,66,76,88,34,239,
>>      30, 40, 50,50,60)
>> z<-data.frame(a, b)
>> fligner.test(z$b, factor(z$a))
>> aov(z$b~factor(z$a))->ll
>> TukeyHSD(ll)
>>
>> Now from the aov I found that my model is unbalanced, and from the Fligner test I found out that the assumption of homogeneity of variances is rejected. Could my Tukey comparison be a valid one under these violations? From what I read, the Tukey test is valid only when the model is balanced and when the assumption of homogeneity of variances is not rejected; am I wrong? Can anyone tell me what would be the correct test in this case?
>>
>> Doing a non-parametric Kruskal-Wallis test would give me a different result. But what would be the correct multiple comparison test in this case?
>>
>
> You shouldn't have needed aov to tell you that the data (not the model) are unbalanced. I could see that without running the code! Seriously, you might need to think more about the type of model you're using, and what you want to know, and then consider how to estimate the effect sizes of interest.
>
> -----
> David Hewitt
> Virginia Institute of Marine Science
> http://www.vims.edu/fish/students/dhewitt/
