Re: [R] summary.manova rank deficiency error + data

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Wed, 13 Aug 2008 18:04:41 +0200

Pedro Mardones wrote:
> Thanks for the reply. The SAS output is attached but seems to me that
> doesn't correspond to the wihtin-row contrasts as you suggested. By
> the way, yes the data are highly correlated, in fact each row
> correspond to the first part of a signal vector. Thanks anyway....
> PM
>
Agreed. I tried disabling the check that causes R to protest, and then it gives similar DF but not quite the same statistics, quite possibly due to numerical instabilities in one or both systems. (You can easily try yourself, just do anova.mlm <- stats::anova.mlm and edit the qr() call inside.)

 > anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks") Analysis of Variance Table

            Df    Wilks approx F num Df den Df Pr(>F)   
(Intercept)  1 0.002537  1887.24      5     24 <2e-16 ***
GROUP        2     0.62     1.29     10     48 0.2616   
Residuals   28                                          
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



>
> The GLM Procedure
> Multivariate Analysis of Variance
> E = Error SSCP Matrix
> y1 y2 y3
> y4 y5
> y1 0.0353518799 0.035256904 0.0351327804
> 0.0349749601 0.0347868018
> y2 0.035256904 0.0351627227 0.0350395053
> 0.0348827098 0.0346956744
> y3 0.0351327804 0.0350395053 0.0349173343
> 0.0347617352 0.0345760232
> y4 0.0349749601 0.0348827098 0.0347617352
> 0.0346075203 0.0344233531
> y5 0.0347868018 0.0346956744 0.0345760232
> 0.0344233531 0.0342409225
>
> Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
> DF = 28 y1 y2 y3
> y4 y5
> y1 1.000000 0.999992 0.999967
> 0.999921 0.999852
> <.0001 <.0001
> <.0001 <.0001
> y2 0.999992 1.000000 0.999991
> 0.999963 0.999911
> <.0001 <.0001
> <.0001 <.0001
> y3 0.999967 0.999991 1.000000
> 0.999990 0.999958
> <.0001 <.0001
> <.0001 <.0001
> y4 0.999921 0.999963 0.999990
> 1.000000 0.999989
> <.0001 <.0001 <.0001
> <.0001
> y5 0.999852 0.999911 0.999958
> 0.999989 1.000000
> <.0001 <.0001 <.0001 <.0001
>
> The SAS System 10:33 Wednesday, August 13, 2008 8
> The GLM Procedure
> Multivariate Analysis of Variance
> H = Type III SSCP Matrix for group
> y1 y2 y3
> y4 y5
> y1 0.0023822408 0.002365848 0.0023471328
> 0.0023261249 0.0023030993
> y2 0.002365848 0.0023495679 0.0023309816
> 0.0023101183 0.0022872511
> y3 0.0023471328 0.0023309816 0.0023125426
> 0.0022918453 0.0022691608
> y4 0.0023261249 0.0023101183 0.0022918453
> 0.0022713359 0.0022488593
> y5 0.0023030993 0.0022872511 0.0022691608
> 0.0022488593 0.0022266141
>
> Characteristic Roots and Vectors of: E Inverse * H, where
> H = Type III SSCP Matrix for group
> E = Error SSCP Matrix
> Characteristic Characteristic Vector V'EV=1
> Root Percent y1 y2 y3
> y4 y5
> 0.41840103 71.72 -7542.628 17131.814 5347.394
> -31627.317 16700.100
> 0.16496011 28.28 -4180.854 -4413.446 32096.035
> -35545.204 12040.697
> 0.00000001 0.00 -41004.875 107291.004 -95905.664
> 32641.189 -3028.470
> 0.00000000 0.00 -416.226 -111.206 410.721
> 295.193 -171.953
> 0.00000000 0.00 -14678.651 5787.997 54718.250
> -69055.249 23218.580
>
> MANOVA Test Criteria and F Approximations for the Hypothesis of No
> Overall group Effect
> H = Type III SSCP Matrix for group
> E = Error SSCP Matrix
> S=2 M=1 N=11
> Statistic Value F Value Num DF
> Den DF Pr > F
> Wilks' Lambda 0.60518744 1.37 10
> 48 0.2227
> Pillai's Trace 0.43658228 1.40 10
> 50 0.2095
> Hotelling-Lawley Trace 0.58336114 1.37 10
> 33.362 0.2385
> Roy's Greatest Root 0.41840103 2.09 5
> 25 0.1000
>
>
>
>
>
>
>
>
>
>
> On Wed, Aug 13, 2008 at 4:34 AM, Peter Dalgaard
> <p.dalgaard_at_biostat.ku.dk> wrote:
>
>> Pedro Mardones wrote:
>>
>>> Dear R-users;
>>>
>>> Previously I posted a question about the problem of rank deficiency in
>>> summary.manova. As somebody suggested, I'm attaching a small part of
>>> the data set.
>>>
>>> #***************************************************
>>>
>>> "test" <-
>>>
>>> structure(.Data = list(structure(.Data = c(rep(1,3),rep(2,18),rep(3,10)),
>>> levels = c("1", "2", "3"),
>>> class = "factor")
>>>
>>>
>>> ,c(0.181829,0.090159,0.115824,0.112804,0.134650,0.249136,0.163144,0.122012,0.157554,0.126283,
>>>
>>> 0.105344,0.125125,0.126232,0.084317,0.092836,0.108546,0.159165,0.121620,0.142326,0.122770,
>>>
>>> 0.117480,0.153762,0.156551,0.185058,0.161651,0.182331,0.139531,0.188101,0.103196,0.116877,0.113733)
>>>
>>>
>>> ,c(0.181445,0.090254,0.115840,0.112863,0.134610,0.249003,0.163116,0.122135,0.157206,0.126129,
>>>
>>> 0.105302,0.124917,0.126243,0.084455,0.092818,0.108458,0.158769,0.121244,0.141981,0.122595,
>>>
>>> 0.117556,0.153507,0.156308,0.184644,0.161421,0.181999,0.139376,0.187708,0.103126,0.116615,0.113746)
>>>
>>>
>>> ,c(0.181058,0.090426,0.115926,0.113022,0.134632,0.248845,0.163140,0.122331,0.156871,0.126023,
>>>
>>> 0.105335,0.124757,0.126325,0.084690,0.092885,0.108455,0.158386,0.120913,0.141676,0.122492,
>>>
>>> 0.117707,0.153293,0.156095,0.184242,0.161214,0.181670,0.139271,0.187318,0.103129,0.116421,0.113826)
>>>
>>>
>>> ,c(0.180692,0.090704,0.116110,0.113319,0.134745,0.248678,0.163256,0.122637,0.156581,0.125998,
>>>
>>> 0.105479,0.124686,0.126514,0.085066,0.093088,0.108587,0.158040,0.120674,0.141446,0.122488,
>>>
>>> 0.117972,0.153150,0.155954,0.183885,0.161063,0.181383,0.139251,0.186956,0.103232,0.116351,0.114001)
>>>
>>>
>>> ,c(0.180353,0.091088,0.116392,0.113753,0.134965,0.248520,0.163475,0.123046,0.156354,0.126067,
>>>
>>> 0.105726,0.124713,0.126821,0.085584,0.093432,0.108858,0.157742,0.120533,0.141309,0.122595,
>>>
>>> 0.118340,0.153088,0.155897,0.183582,0.160975,0.181143,0.139314,0.186636,0.103449,0.116415,0.114275)
>>> )
>>> ,names = c("GROUP", "Y1", "Y2", "Y3", "Y4","Y5")
>>> ,row.names = seq(1:31)
>>> ,class = "data.frame"
>>> )
>>>
>>> summary(manova(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
>>>
>>> #Error in summary.manova(manova(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP, test),
>>> :
>>> residuals have rank 3 < 5
>>>
>>> #***************************************************
>>>
>>> What I don't understand is why SAS returns no errors using PROC GLM
>>> for the same data set. Is because PROC GLM doesn't take into account
>>> problems of rank deficiency? So, should I trust manova instead of PROC
>>> GLM output? I know it can be a touchy question but I would like to
>>> receive some insights.
>>> Thanks
>>> PM
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> What you have here is extremely correlated data:
>>
>>
>>> (V <- estVar(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test)))
>>>
>> Y1 Y2 Y3 Y4 Y5
>> Y1 0.001262567 0.001259177 0.001254746 0.001249106 0.001242385
>> Y2 0.001259177 0.001255814 0.001251416 0.001245812 0.001239132
>> Y3 0.001254746 0.001251416 0.001247055 0.001241494 0.001234861
>> Y4 0.001249106 0.001245812 0.001241494 0.001235983 0.001229405
>> Y5 0.001242385 0.001239132 0.001234861 0.001229405 0.001222889
>>
>>> eigen(V)
>>>
>> $values
>> [1] 6.224077e-03 2.313066e-07 3.499837e-10 4.259125e-12 1.334146e-12
>>
>> $vectors
>> [,1] [,2] [,3] [,4] [,5]
>> [1,] 0.4503756 0.61213579 0.5204920 -0.3485941 0.1732681
>> [2,] 0.4491807 0.32333236 -0.1873653 0.5929444 -0.5540795
>> [3,] 0.4476157 0.01442094 -0.5498688 0.1272921 0.6934503
>> [4,] 0.4456201 -0.31202109 -0.3198606 -0.6557557 -0.4144143
>> [5,] 0.4432397 -0.65052351 0.5378809 0.2840428 0.1017918
>>
>> Notice the more than 9 orders of magnitude between the eigenvalues.
>>
>> I think that what is happening is that what SAS calls MANOVA is actually
>> looking at within-row contrasts, which effectively removes the largest
>> eigenvalue. In R, the equivalent would be
>>
>>
>>> anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), X=~1, test = "Wilks")
>>>
>> Analysis of Variance Table
>>
>>
>> Contrasts orthogonal to
>> ~1
>>
>> Df Wilks approx F num Df den Df Pr(>F)
>> (Intercept) 1 0.037 164.873 4 25 <2e-16 ***
>> GROUP 2 0.701 1.215 8 50 0.3098
>> Residuals 28
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> or (this could be computationally more precice, but in fact it gives the
>> same result)
>>
>>
>>> anova(lm(cbind(Y2,Y3,Y4,Y5)-Y1~GROUP, test), test = "Wilks")
>>>
>> --
>> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
>> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45) 35327907
>>
>>
>>
-- O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45) 35327907 ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Wed 13 Aug 2008 - 16:08:51 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 14 Aug 2008 - 08:33:45 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive