From: Spencer Graves <spencer.graves_at_pdf.com>

Date: Thu 07 Jul 2005 - 03:08:29 EST

*>
*

*> c r
*

*> 1 1 1
*

*> 2 1 1
*

*> 3 1 1
*

*> 4 2 1
*

*> 5 2 1
*

*> 6 2 1
*

*> 7 3 1
*

*> 8 3 1
*

*> 9 3 1
*

*> 10 1 2
*

*> 11 1 2
*

*> 12 1 2
*

*> 13 2 2
*

*> 14 2 2
*

*> 15 2 2
*

*> 16 3 2
*

*> 17 3 2
*

> 18 3 2

*>
*

*>
*

> [1] 0 1 1 2 3 3

*>
*

*>
*

> asgn, sum)))

*> [1] 20.61 0.01 20.61 0.00 0.00
*

*>
*

*> 3 2.6259806 3.5504584 1.645215 0.1197238 0.85361018 4.53895212
*

*> 4 9.1942755 8.2122693 4.863392 5.4413571 2.03715439 22.94815118
*

*>
*

*> The rows of that array correspond to the sum of squares for the
*

*> Intercept (which we generally ignore), the factor 'c', the factor 'r',
*

*> their interaction and the residuals.
*

*>
*

*> As you can see that took about 21 seconds on my system, which I expect
*

*> is a bit faster than your simulation ran.
*

*>
*

*> Because I set the seed to a known value I can reproduce the results
*

*> for the first few simulations to check that the sums of squares are
*

*> correct. Remember that the original fit (fm1) is not included in the
*

*> table.
*

*>
*

*>
*

*>
*

> Analysis of Variance Table

*>
*

*> Response: rnorm(18)
*

*> Df Sum Sq Mean Sq F value Pr(>F)
*

*> c 2 0.5825 0.2913 0.3801 0.6917
*

*> r 1 0.2613 0.2613 0.3410 0.5701
*

*> c:r 2 2.6260 1.3130 1.7137 0.2215
*

*> Residuals 12 9.1943 0.7662
*

*>
*

*> You can continue this process if you wish further verification that
*

*> the results correspond to the fitted models.
*

*>
*

*> You can get the p-values for the F-tests as
*

*>
*

*>
*

*>
*

**> FALSE),
**

*> + pr = pf((res[3,]/1)/(res[5,]/12), 1, 12, low =
*

*> FALSE),
*

*> + pint = pf((res[4,]/2)/(res[5,]/12), 2, 12, low =
*

*> FALSE))
*

*>
*

*> Again you can check this for the first few by hand.
*

*>
*

*>
*

*> 4 0.57775850 0.13506561 0.8775828
*

*> 5 0.06363138 0.16124100 0.1224806
*

*>
*

*> If you wish you could then check marginal distributions using
*

*> techniques like an empirical density plot.
*

*>
*

*>
*

*>
*

*>
*

> At this point I would recommend checking the joint distribution but if

*> you want to choose a specific level and check the contingency table
*

*> that could be done as
*

*>
*

*>
*

*>
*

*> The summary method for an xtabs object provides a test of independence
*

*>
*

*>
*

*>
*

*> Call: xtabs(formula = ~I(pr < 0.16) + I(pint < 0.16), data = pvalsF)
*

*> Number of cases in table: 10000
*

*> Number of factors: 2
*

*> Test for independence of all factors:
*

> Chisq = 51.6, df = 1, p-value = 6.798e-13

*>
*

*> for which you can see the puzzling result. However, if you use the
*

*> chisquared test based on the known residual variance of 1, you get
*

*>
*

*>
*

*>
*

*> 4 0.7706757 0.28059805 0.9418946
*

*> 5 0.5523983 0.53843951 0.6525907
*

*>
*

*>
*

*>
*

*> Call: xtabs(formula = ~I(pr < 0.16) + I(pint < 0.16), data = pvalsC)
*

*> Number of cases in table: 10000
*

*> Number of factors: 2
*

*> Test for independence of all factors:
*

> Chisq = 0.25041, df = 1, p-value = 0.6168

*>
*

*>
*

*>
*

*> On 7/4/05, Phillip Good <pigood@verizon.net> wrote:
*

*>
*

*>
*

> worry

*>
*

*>
*

> effects

*>
*

*>
*

*> cntT
*

*>
*

*>
*

*> They
*

*>
*

*>
*

> http://www.R-project.org/posting-guide.html

*>
*

*>
*

*> ______________________________________________
*

*> R-help@stat.math.ethz.ch mailing list
*

*> https://stat.ethz.ch/mailman/listinfo/r-help
*

*> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
*

Date: Thu 07 Jul 2005 - 03:08:29 EST

I'm confused: I understood Doug to be describing a traditional, normal theory ANOVA with k rows in the table for different effects plus a (k+1)st for residuals and the mean squares (MS) column are all indpendent chi-squares scaled by the same unknown sigma^2. The k statistics in the F column are ratios of independent chi-squares with the same independent denominator chi-square. How can they be indpendent?

spencer graves

p.s. How is the method of synchronized permutations relevant to a
traditional, normal theory ANOVA?

Phillip Good wrote:

> Do you or Lumley have a citation for this conclusion? Most people go

*> forward with the ANOV on the basis that the various tests are independent.
**>
**> Phillip Good
**> P.S. Tests based on the method of synchronized permutations are
**> independent.
**>
**> -----Original Message-----
**> From: Douglas Bates [mailto:dmbates@gmail.com]
**> Sent: Monday, July 04, 2005 1:44 PM
**> To: pigood@verizon.net
**> Cc: r-help
**> Subject: Re: [R] Lack of independence in anova()
**>
**>
**> I wrote up a note on how to do a more efficient simulation of the
**> p-values from anova then discovered to my surprise that the chi-square
**> test for independence of the significance of the F-tests indicated
**> that they were not independent. I was stumped by this but fortunately
**> Thomas Lumley came to my rescue with an explanation. There is no
**> reason why the results of the F tests should be independent. The
**> numerators are independent but the denominator is the same for both
**> tests. When, due to random variation, the denominator is small, then
**> the p-values for both tests will tend to be small. If, instead of
**> F-tests you use chi-square tests then you do see independence.
**>
**> Here is the note on the simulation.
**>
**> There are several things that could be done to speed the simulation of
**> the p-values of an anova like this under the null distribution.
**>
**> If you examine the structure of a fitted lm object (use the function
**> str()) you will see that there are components called `qr', `effects'
**> and `assign'. You can verify by experimentation that `qr' and
**> `assign' depend only on the experimental design. Furthermore, the
**> `effects' vector can be reproduced as qr.qty(qrstr, y).
**>
**> The sums of squares for the different terms in the model are the sums
**> of squares of elements of the effects vector as indexed by the assign
**> vector. The residual sum of squares is the sum of squares of the
**> remaining elements of the assign vector. You can generate 10000
**> replications of the calculations of the relevant sums of squares as
**>
**>
*

>>set.seed(1234321) >>vv <- data.frame(c = gl(3,3,18), r = gl(2,9,18)) >>vv

> 18 3 2

>>fm1 <- lm(rnorm(18) ~ c*r, vv) >>fm1$assign

> [1] 0 1 1 2 3 3

>>asgn <- c(fm1$assign, rep(4, 12)) >>system.time(res <- replicate(10000, tapply(qr.qty(fm1$qr, rnorm(18))^2,

> asgn, sum)))

>>res[,1:6]

>

> [,1] [,2] [,3] [,4] [,5] [,6]

> 0 0.4783121 0.3048634 0.713689 0.6937838 0.03649023 2.63392426> 1 0.5825213 1.4756395 1.127018 0.5209751 1.18697199 3.32972093> 2 0.2612723 3.6396106 0.547506 1.1641910 0.37843963 0.03411672

>>set.seed(1234321) >>fm1 <- lm(rnorm(18) ~ c*r, vv) >>anova(fm2 <- lm(rnorm(18) ~ c*r, vv))

> Analysis of Variance Table

>>pvalsF <- data.frame(pc = pf((res[2,]/2)/(res[5,]/12), 2, 12, low =

>>pvalsF[1:5,]

>> pc pr pint

> 1 0.69171238 0.57006574 0.2214847

> 2 0.37102129 0.03975286 0.1158059> 3 0.28634939 0.26771167 0.1740633

>>library(lattice) >>densityplot(~ pc, pvals)

> At this point I would recommend checking the joint distribution but if

>>xtabs(~ I(pr < 0.16) + I(pint < 0.16), pvalsF)

>> I(pint < 0.16)> I(pr < 0.16) FALSE TRUE

> FALSE 7204 1240

> TRUE 1215 341

>>summary(xtabs(~ I(pr < 0.16) + I(pint < 0.16), pvalsF))

> Chisq = 51.6, df = 1, p-value = 6.798e-13

>>pvalsC <- data.frame(pc = pchisq(res[2,], 2, low = FALSE),

>

> + pr = pchisq(res[3,], 1, low = FALSE),

> + pint = pchisq(res[4,], 2, low = FALSE))

>>pvalsC[1:5,]

>> pc pr pint

> 1 0.7473209 0.60924741 0.2690144

> 2 0.4781553 0.05642013 0.1694446> 3 0.5692081 0.45933855 0.4392846

>>xtabs(~ I(pr < 0.16) + I(pint < 0.16), pvalsC)

>> I(pint < 0.16)> I(pr < 0.16) FALSE TRUE

> FALSE 7121 1319

> TRUE 1324 236

>>summary(xtabs(~ I(pr < 0.16) + I(pint < 0.16), pvalsC))

> Chisq = 0.25041, df = 1, p-value = 0.6168

>>To my surprise, the R functions I employed did NOT verify the independence >>property. I've no quarrel with the theory--it's the R functions that

> worry

>>me. >> >>pG >> >>-----Original Message----- >>From: Douglas Bates [mailto:dmbates@gmail.com] >>Sent: Monday, July 04, 2005 9:13 AM >>To: pigood@verizon.net; r-help >>Subject: Re: [R] Lack of independence in anova() >> >> >>I have already had email exchanges off-list with Phillip Good pointing >>out that the independence property that he wishes to establish by >>simulation is a consequence of orthogonality of the column span of the >>row contrasts and the interaction contrasts. If you set the contrasts >>option to a set of orthogonal contrasts such as contr.helmert or >>contr.sum, which has no effect on the results of the anova, this is >>easily established. >> >> >>>build >> >>function(size, v = rnorm(sum(size))) { >> col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]), >> rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8])) >> row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6] >> +size[7]+size[8])) >> return(data.frame(c=factor(col), r=factor(row),yield=v)) >>} >> >>>size >> >>[1] 3 3 3 0 3 3 3 0 >> >>>set.seed(1234321) >>>vv <- build(size) >>>vv >> >> c r yield >>1 0 0 1.2369081 >>2 0 0 1.5616230 >>3 0 0 1.8396185 >>4 1 0 0.3173245 >>5 1 0 1.0715115 >>6 1 0 -1.1459955 >>7 2 0 0.2488894 >>8 2 0 0.1158625 >>9 2 0 2.6200816 >>10 0 1 1.2624048 >>11 0 1 -0.9862578 >>12 0 1 -0.3235653 >>13 1 1 0.2039706 >>14 1 1 -1.4574576 >>15 1 1 1.9158713 >>16 2 1 -2.0333909 >>17 2 1 1.0050272 >>18 2 1 0.6789184 >> >>>options(contrasts = c('contr.helmert', 'contr.poly')) >>>crossprod(model.matrix(~c*r, vv)) >> >> (Intercept) c1 c2 r1 c1:r1 c2:r1 >>(Intercept) 18 0 0 0 0 0 >>c1 0 12 0 0 0 0 >>c2 0 0 36 0 0 0 >>r1 0 0 0 18 0 0 >>c1:r1 0 0 0 0 12 0 >>c2:r1 0 0 0 0 0 36 >> >> >>On 7/4/05, Phillip Good <pigood@verizon.net> wrote: >> >>> If the observations are normally distributed and the 2xk design is >>>balanced, theory requires that the tests for interaction and row

> effects

>>be >> >>>independent. In my program, appended below, this would translate to

>>>(approx)= cntR*cntI/N if all R routines were functioning correctly.

>>>aren't. >>> >>>sim2=function(size,N,p){ >>> cntR=0 >>> cntC=0 >>> cntI=0 >>> cntT=0 >>> cntP=0 >>> for(i in 1:N){ >>> #generate data >>> v=gendata(size) >>> #analyze after build(ing) design containing data >>> lm.out=lm(yield~c*r,build(size,v)) >>> av.out=anova(lm.out) >>> #if column effect is significant, increment cntC >>> if (av.out[[5]][1]<=p)cntC=cntC+1 >>> #if row effect is significant, increment cntR >>> if (av.out[[5]][2]<=p){ >>> cntR=cntR+1 >>> tmp = 1 >>> } >>> else tmp =0 >>> if (av.out[[5]][3]<=p){ >>> #if interaction is significant, increment cntI >>> cntI=cntI+1 >>> #if both interaction and row effect are significant, increment >> >>cntT >> >>> cntT=cntT + tmp >>> } >>> } >>> list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT) >>>} >>> >>>build=function(size,v){ >>>#size is a vector containing the sample sizes >>>col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]), >>>rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8])) >>>row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6] >>>+size[7]+size[8])) >>>return(data.frame(c=factor(col), r=factor(row),yield=v)) >>>} >>> >>>gendata=function(size){ >>> ssize=sum(size); >>> return (rnorm(ssize)) >>>} >>> >>>#Example >>> size=c(3,3,3,0,3,3,3,0) >>> sim2(size,10000,10,.16) >>> >>> >>> >>>Phillip Good >>>Huntington Beach CA >>> >>> >>> >>>______________________________________________ >>>R-help@stat.math.ethz.ch mailing list >>>https://stat.ethz.ch/mailman/listinfo/r-help >>>PLEASE do read the posting guide! >> >>http://www.R-project.org/posting-guide.html >> >>> >>______________________________________________ >>R-help@stat.math.ethz.ch mailing list >>https://stat.ethz.ch/mailman/listinfo/r-help >>PLEASE do read the posting guide!

> http://www.R-project.org/posting-guide.html

-- Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA spencer.graves@pdf.com www.pdf.com <http://www.pdf.com> Tel: 408-938-4420 Fax: 408-280-7915 ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.htmlReceived on Thu Jul 07 03:14:31 2005

*
This archive was generated by hypermail 2.1.8
: Fri 03 Mar 2006 - 03:33:17 EST
*