# Re: [R] Lack of independence in anova()

From: Douglas Bates <dmbates_at_gmail.com>
Date: Tue 05 Jul 2005 - 02:12:46 EST

I have already had email exchanges off-list with Phillip Good pointing out that the independence property that he wishes to establish by simulation is a consequence of orthogonality of the column span of the row contrasts and the interaction contrasts. If you set the contrasts option to a set of orthogonal contrasts such as contr.helmert or contr.sum, which has no effect on the results of the anova, this is easily established.

> build

function(size, v = rnorm(sum(size))) {

```    col=c(rep(0,size),rep(1,size),rep(2,size),rep(3,size),
rep(0,size),rep(1,size),rep(2,size),rep(3,size))
row=c(rep(0,size+size+size+size),rep(1,size+size
```
+size+size))
return(data.frame(c=factor(col), r=factor(row),yield=v)) }
> size

 3 3 3 0 3 3 3 0
> set.seed(1234321)
> vv <- build(size)
> vv

c r yield

```1  0 0  1.2369081
2  0 0  1.5616230
3  0 0  1.8396185
4  1 0  0.3173245
5  1 0  1.0715115
6  1 0 -1.1459955
7  2 0  0.2488894
8  2 0  0.1158625
9  2 0  2.6200816
10 0 1  1.2624048
11 0 1 -0.9862578
12 0 1 -0.3235653
13 1 1  0.2039706
14 1 1 -1.4574576
15 1 1  1.9158713
16 2 1 -2.0333909
17 2 1  1.0050272
18 2 1  0.6789184
```

> options(contrasts = c('contr.helmert', 'contr.poly'))
> crossprod(model.matrix(~c*r, vv))
```            (Intercept) c1 c2 r1 c1:r1 c2:r1
(Intercept)          18  0  0  0     0     0
c1                    0 12  0  0     0     0
c2                    0  0 36  0     0     0
r1                    0  0  0 18     0     0
c1:r1                 0  0  0  0    12     0
c2:r1                 0  0  0  0     0    36

```

On 7/4/05, Phillip Good <pigood@verizon.net> wrote:
> If the observations are normally distributed and the 2xk design is
> balanced, theory requires that the tests for interaction and row effects be
> independent. In my program, appended below, this would translate to cntT
> (approx)= cntR*cntI/N if all R routines were functioning correctly. They
> aren't.
>
> sim2=function(size,N,p){
> cntR=0
> cntC=0
> cntI=0
> cntT=0
> cntP=0
> for(i in 1:N){
> #generate data
> v=gendata(size)
> #analyze after build(ing) design containing data
> lm.out=lm(yield~c*r,build(size,v))
> av.out=anova(lm.out)
> #if column effect is significant, increment cntC
> if (av.out[]<=p)cntC=cntC+1
> #if row effect is significant, increment cntR
> if (av.out[]<=p){
> cntR=cntR+1
> tmp = 1
> }
> else tmp =0
> if (av.out[]<=p){
> #if interaction is significant, increment cntI
> cntI=cntI+1
> #if both interaction and row effect are significant, increment cntT
> cntT=cntT + tmp
> }
> }
> list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
> }
>
> build=function(size,v){
> #size is a vector containing the sample sizes
> col=c(rep(0,size),rep(1,size),rep(2,size),rep(3,size),
> rep(0,size),rep(1,size),rep(2,size),rep(3,size))
> row=c(rep(0,size+size+size+size),rep(1,size+size
> +size+size))
> return(data.frame(c=factor(col), r=factor(row),yield=v))
> }
>
> gendata=function(size){
> ssize=sum(size);
> return (rnorm(ssize))
> }
>
> #Example
> size=c(3,3,3,0,3,3,3,0)
> sim2(size,10000,10,.16)
>
>
>
> Phillip Good
> Huntington Beach CA
>
>
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help