Re: [R] dealing with multicollinearity

From: John Sorkin <jsorkin_at_grecc.umaryland.edu>
Date: Mon 11 Apr 2005 - 22:43:35 EST


Manuel,
The problem you describe does not sound like it is due to multicollinearity. I say this because your variance inflation factor is modest (1.1) and, more importantly, the correlation between your independent variables (x1 and x2) is modest, -0.25. I suspect the problem is due to one or more observations having a disproportionately large influence on your coefficients. I suggest you plot your residuals against your predicted values. I would also do a formal analysis of the influence each observation has on the reported coefficients; you might start by computing Cook's distance for each observation.
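In R, a minimal sketch of these diagnostics (assuming your fitted model object is called A, as in the code you posted):

plot(fitted(A), resid(A), xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, lty = 2)

cd <- cooks.distance(A)          # Cook's distance for each observation
plot(cd, type = "h", ylab = "Cook's distance")
which(cd > 4 / length(cd))       # a common rule-of-thumb cutoff

summary(influence.measures(A))   # flags potentially influential points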

I hope this has helped.  

John  

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC  

University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524  

410-605-7119
- NOTE NEW EMAIL ADDRESS:
jsorkin@grecc.umaryland.edu

>>> Manuel Gutierrez <manuel_gutierrez_lopez@yahoo.es> 4/11/2005 6:22:55 AM >>>

I have a linear model, y ~ x1 + x2, of some data in which the coefficient for x1 is higher than I would have expected from theory (0.88 observed vs. 0.7 expected). I wondered whether this could be an artifact of x1 and x2 being correlated, even though the variance inflation factor is not very high (1.065). I used perturbation analysis to evaluate collinearity:

> library(perturb)
> P <- perturb(A, pvars = c("x1", "x2"), prange = c(1, 1))
> summary(P)

Perturb variables:

x1         normal(0,1) 
x2         normal(0,1) 

Impact of perturbations on coefficients:
            mean     s.d.     min      max     
(Intercept)  -26.067    0.270  -27.235  -25.481
x1             0.726    0.025    0.672    0.882
x2             0.060    0.011    0.037    0.082
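For reference, a VIF can also be computed with vif() from the car package, assuming it is installed:

> library(car)
> vif(A)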

I get a mean for x1 of 0.726, which is closer to what theory predicts.
I am not a statistical expert, so I would like to know whether my evaluation of the effects of collinearity is correct and, if so, what solutions there are for obtaining a reliable linear model.
Thanks,
Manuel

Some more detailed information:

> A<-lm(y~x1+x2)
> summary(A)

Call:
lm(formula = y ~ x1 + x2)

Residuals:

      Min        1Q    Median        3Q       Max 
-4.221946 -0.484055 -0.004762  0.397508  2.542769

Coefficients:

             Estimate Std. Error t value Pr(>|t|)

(Intercept) -27.23472    0.27996 -97.282  < 2e-16 ***
x1            0.88202    0.02475  35.639  < 2e-16 ***
x2            0.08180    0.01239   6.604 2.53e-10 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 0.823 on 241 degrees of freedom
Multiple R-Squared: 0.8411,    Adjusted R-squared: 0.8398

F-statistic: 637.8 on 2 and 241 DF,  p-value: < 2.2e-16 


> cor.test(x1,x2)
        Pearson's product-moment correlation

data:  x1 and x2 
t = -3.9924, df = 242, p-value = 8.678e-05 
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 -0.3628424 -0.1269618 
sample estimates:
      cor 
-0.248584 
Received on Tue Apr 12 09:45:44 2005
