Re: [R] question for aov and kruskal

From: Rolf Turner <r.turner_at_auckland.ac.nz>
Date: Thu, 13 Mar 2008 10:33:10 +1300

I thought your question was well expressed and that you followed the posting guide better than most.

I'm no expert on such issues, but I'd like to kick in a few opinions (with which others may disagree).

(1) All of the anova stuff is based on the assumption of homogeneity
     of variance. However my understanding is that the model is quite robust
     to violations of this assumption.  Problems may arise if there are small
     sample sizes in some cells and if the small samples are associated with
     large variances.  Otherwise there is not all that much of a worry.

(2) The Tukey test is indeed based on the assumption of equal sample
     sizes.  The version of the test for unbalanced data is an approximation.
     My understanding is that it's a pretty good approximation.

(3) For multiple comparisons after applying the Kruskal-Wallis test: Experts
     on non-parametric statistics may know about more powerful methods, but
     I would be inclined simply to apply a Bonferroni correction to a collection
     of pairwise tests (e.g. wilcox.test). Just multiply the p-values by
     the number of pairwise comparisons, k-choose-2 where k is the number of
     groups (= 3-choose-2 = 3 in your toy example).
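
A minimal sketch of the Bonferroni idea in point (3), using the toy data from the original question. The variable names (g, pairs, p_raw, p_bonf) are only illustrative, and exact = FALSE is my own choice to sidestep the warnings that the tied values would otherwise produce. The adjusted p-values are capped at 1, which is also what p.adjust does.

```r
# Toy data from the original question (quoted further down in this message).
a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
b <- c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,30,40,50,50,60)
g <- factor(a)

# All pairs of group levels: choose(3, 2) = 3 columns.
pairs  <- combn(levels(g), 2)
n_comp <- ncol(pairs)

# Unadjusted two-sample Wilcoxon p-value for each pair.
# exact = FALSE uses the normal approximation (the data contain ties).
p_raw <- apply(pairs, 2, function(pr) {
  wilcox.test(b[g == pr[1]], b[g == pr[2]], exact = FALSE)$p.value
})
names(p_raw) <- apply(pairs, 2, paste, collapse = "-")

# Bonferroni: multiply by the number of comparisons, capped at 1.
p_bonf <- pmin(1, p_raw * n_comp)
p_bonf

# The same adjustment via the built-in helper in the stats package.
pairwise.wilcox.test(b, g, p.adjust.method = "bonferroni", exact = FALSE)
```

The hand-rolled version and pairwise.wilcox.test should agree; the former just makes the "multiply by k-choose-2" step explicit.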

(4) Generally speaking I would say that if a classical test and a non-parametric
     test give different answers, then your data are being coy about revealing
     their true import. I would have very little faith in either answer, and
     would claim that you really need more data.

     Unfortunately this need can rarely be satisfied. If you have to make a
     decision one way or another, then you should go with the non-parametric
     answer.

(5) Finally, my universal prescription is: ``When in doubt, simulate.''

     I.e. simulate multiple data sets on the basis of models fitted to,
     or related to, your real data.  Run the possible tests on the simulated
     data sets.  Since these data are simulated, you know what the right
     answer is.  Count up how often you get the right answer.

     Such an exercise can be quite revealing.
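
A rough sketch of what such a simulation might look like. Everything specific here is an assumption for illustration: the group sizes, means and standard deviations only loosely echo the unbalanced toy example, normal errors are assumed, and the 0.05 cutoff and 1000 replicates are arbitrary. In practice you would plug in values from models fitted to your real data.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical settings (assumptions, not estimates from the real data):
n     <- c(10, 3, 7)      # unbalanced group sizes
mu    <- c(300, 80, 70)   # assumed true group means
sigma <- c(250, 10, 70)   # assumed true group standard deviations
g     <- factor(rep(1:3, times = n))

# Simulate one data set with the given means; return both tests' p-values.
one_run <- function(means) {
  y <- rnorm(sum(n), mean = rep(means, times = n), sd = rep(sigma, times = n))
  c(aov     = summary(aov(y ~ g))[[1]][["Pr(>F)"]][1],
    kruskal = kruskal.test(y ~ g)$p.value)
}

n_sim <- 1000

# Case 1: the group means really differ, so rejecting is the right answer.
p_alt <- replicate(n_sim, one_run(mu))
rowMeans(p_alt < 0.05)    # proportion of correct rejections per test

# Case 2: all means equal (variances still unequal), so rejecting is wrong.
p_null <- replicate(n_sim, one_run(rep(150, 3)))
rowMeans(p_null < 0.05)   # proportion of false rejections per test
```

Comparing the two sets of proportions gives a feel for how each test behaves under conditions like yours, which is the point of the exercise.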

HTH                 cheers,

                        Rolf Turner

On 13/03/2008, at 9:19 AM, eugen pircalabelu wrote:

> Hi,
>
> My data was only a toy example that matched the real situation,
> with real data, but I could not have posted the entire data set and
> so I gave a self-contained example of what I thought was my
> problem. Of course you can see with the naked eye that the data is
> unbalanced (this was done intentionally), but like I said this was
> only a toy example, mimicking a problem from a real data set.
>
> Thank you and have a great ahead!
>
>
> David Hewitt <dhewitt37_at_gmail.com> wrote:
>
>
>> I have the following problem: how appropriate is my aov model under the
>> violation of anova assumptions?
>>
>> Example:
>> a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
>> b <- c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,30,40,50,50,60)
>> z <- data.frame(a, b)
>> fligner.test(z$b, factor(z$a))
>> ll <- aov(z$b ~ factor(z$a))
>> TukeyHSD(ll)
>>
>> Now from the aov I found that my model is unbalanced, and from the
>> Fligner test I found out that the assumption of homogeneity of variances
>> is rejected. Could my Tukey comparison be a valid one under these
>> violations? From what I read the Tukey test is valid only when the model
>> is balanced and when the assumption of homogeneity of variances is not
>> rejected, am I wrong? Can anyone tell me what would be the correct test in
>> this case?
>>
>> Doing a non-parametric Kruskal-Wallis test would give me a different
>> result. But what would be the correct multiple comparison test in this
>> case?
>>
>
> You shouldn't have needed aov to tell you that the data (not the model) are
> unbalanced. I could see that without running the code! Seriously, you might
> need to think more about the type of model you're using, and what you want
> to know, and then consider how to estimate the effect sizes of interest.
>
>
> -----
> David Hewitt
> Virginia Institute of Marine Science
> http://www.vims.edu/fish/students/dhewitt/
> --
> View this message in context: http://www.nabble.com/question-for-aov-and-kruskal-tp15955385p15976643.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Received on Wed 12 Mar 2008 - 21:37:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 12 Mar 2008 - 22:30:21 GMT.
