Re: [R] linear models and colinear variables...

From: Jonathan Baron
Date: Thu 01 Jul 2004 - 09:47:17 EST

On 06/30/04 16:32, Peter Gaffney wrote:
>I'm having some issues on both conceptual and
>technical levels for selecting the right combination
>of variables for this model I'm working on. The basic,
>all inclusive form looks like
>lm(mic ~ B * D * S * U * V * ICU)

When you do this, you are including all the interaction terms. In a model formula, * expands to the main effects plus every interaction among them, whereas + gives main effects only. That might make sense under some circumstances, for example if you are just trying to get the best model and you plan to eliminate higher-order interactions that are not significant, but usually it does more to obscure the interesting effects than to display them.
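
To see how much the formula expands, here is a small sketch. The names y, a, b, c are placeholders, not your variables, and no data are needed just to expand a formula:

```r
# Expand formulas into their term labels (placeholder names y, a, b, c).
full <- attr(terms(y ~ a * b * c), "term.labels")       # main effects + all interactions
main <- attr(terms(y ~ a + b + c), "term.labels")       # main effects only
two  <- attr(terms(y ~ (a + b + c)^2), "term.labels")   # main effects + two-way interactions

print(full)
print(main)
print(two)

# The six-variable formula from the question expands to 2^6 - 1 = 63 terms.
n_terms <- length(attr(terms(mic ~ B * D * S * U * V * ICU), "term.labels"))
print(n_terms)
```

The (a + b + c)^2 form is one way to cap the interaction order without writing every term by hand.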

>My suspicion is that there's a large degree of
>colinearity in some of these variables that serves to
>reduce the total effect of either of a nearly colinear
>pair to an insignificant level; my hope is that
>removing one of a mostly colinear group would allow
>the other variables' possibly significant effects to
>be measured.

There may be colinearity, but the most likely problem is that you are including too many interactions, at too high a level. Inclusion of nonsignificant interaction terms often turns significant main effects into nonsignificant effects.
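
One way to check whether the higher-order terms earn their place is to compare nested models. The sketch below uses simulated data that I made up for illustration (only main effects are real in it), so the variable names and effect sizes are assumptions, not anything from your data:

```r
# Simulated data: y depends on a and b only; there are no true interactions.
set.seed(1)
n <- 200
d <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
d$y <- d$a + 0.5 * d$b + rnorm(n)

fit_main <- lm(y ~ a + b + c, data = d)   # main effects only
fit_full <- lm(y ~ a * b * c, data = d)   # adds all two- and three-way interactions

# F test of whether the interaction terms, taken together, improve the fit.
print(anova(fit_main, fit_full))

# Compare the main-effect rows of the two coefficient tables side by side.
print(summary(fit_main)$coefficients)
print(summary(fit_full)$coefficients)
```

With factors and the default treatment contrasts, the "main effect" rows in the full model describe effects at the reference levels of the other factors, which is part of why they can look so different from the main-effects-only fit.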

>Question 1) Is this legitimate at all? Can I do
>regression using the entire data set over only
>selected factors while ignoring others?
>(Admittedly I only just got my Bachelor's in math; the
>gaps in my knowledge here are profound and

If you select predictors on the basis of which ones are significant, then the final significance levels usually don't mean much. Remember, on average about 1 test in 20 will be significant at .05 even if you are using random numbers.
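
The point about random numbers is easy to check by simulation. This is only a sketch: the sample size, the number of predictors, and the number of replications are arbitrary choices of mine.

```r
# Regress pure noise on 20 pure-noise predictors, many times over,
# and record how often a coefficient comes out "significant" at .05.
set.seed(42)
n <- 100    # observations per simulated data set (arbitrary)
k <- 20     # number of random predictors (arbitrary)
reps <- 200 # number of simulated data sets (arbitrary)

one_run <- function() {
  X <- as.data.frame(matrix(rnorm(n * k), n, k))
  X$y <- rnorm(n)                      # response unrelated to any predictor
  fit <- lm(y ~ ., data = X)
  p <- summary(fit)$coefficients[-1, "Pr(>|t|)"]  # drop the intercept row
  sum(p < 0.05)                        # count of false positives in this run
}

hits <- replicate(reps, one_run())
rate <- sum(hits) / (reps * k)
print(rate)   # should land near 0.05
```

Keeping only the "significant" predictors from a run like this and refitting would produce a model that looks convincing and means nothing.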

>Question 2) How do I go about selecting possible
>colinear explanatory variables?

If there is colinearity, then what to do about it depends on the substance of the questions you are asking. Some options are to combine variables, do some sort of factor analysis and use factors rather than variables as predictors, use the most meaningful of the variables that are colinear, or just live with it, if the substantive issues rule out the other options. (I'm sure there are other solutions that others might point out.)
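
A rough sketch of a few of these options, on simulated numeric data that I constructed so that x1 and x2 are nearly colinear. All names here are made up. I use principal components (prcomp) as a simple stand-in for factor analysis, which is not the same thing, and none of this applies directly to categorical predictors:

```r
# Simulated data: x2 is x1 plus a little noise, so the pair is nearly colinear.
set.seed(7)
n  <- 150
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
y  <- x1 + x3 + rnorm(n)
d  <- data.frame(y, x1, x2, x3)

# Look at the pairwise correlations among the predictors.
print(round(cor(d[, c("x1", "x2", "x3")]), 2))

# Live with it: both colinear predictors in the model.
fit_both <- lm(y ~ x1 + x2 + x3, data = d)

# Combine variables: average the standardized colinear pair into one composite.
d$x12 <- rowMeans(scale(d[, c("x1", "x2")]))
fit_comb <- lm(y ~ x12 + x3, data = d)

# Use components rather than variables as predictors (PCA, not factor analysis).
pc <- prcomp(d[, c("x1", "x2", "x3")], scale. = TRUE)
d$pc1 <- pc$x[, 1]
d$pc2 <- pc$x[, 2]
fit_pc <- lm(y ~ pc1 + pc2, data = d)

print(summary(fit_both)$coefficients)
print(summary(fit_comb)$coefficients)
```

In the fit with both x1 and x2, expect large standard errors on each of the pair. The composite should be estimated much more precisely.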

>I had originally thought I'd just make a matrix of
>coefficients of colinearity for each pair of variables
>and iteratively re-run the model until I got the
>results I wanted, but I can't really figure out how to
>do this. In addition, I'm not sure how to do this in
>the model syntax once I've actually decided on some
>variables to exclude.
>For instance, supposing I wanted to run the model as
>above without the variable
>Bstaph.aureus:Dvan:Sr:U:ICU. What I tried was
>lm(mic ~ B * D * S * U * V * ICU -
>Obviously this doesn't work because the variable name
>Bstaph.aureus:Dvan:Sr:U:ICU hasn't been recognized
>yet. How do I do this? My best guess so far is to

Not clear what you mean here.

>build and define each of the variables like
>Bstaph.aureus:Dvan:Sr:U:ICU by hand with some
>imperative/iterative style programming using some kind
>of string generation system. This sounds like a royal
>pain, and is something I'd rather avoid doing if at
>all possible.
>Any suggestions? :-D
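
If the goal is to drop a whole interaction term, no string generation should be needed: a term can be subtracted in the formula, or removed from a fitted model with update(). Names like Bstaph.aureus:Dvan:Sr:U:ICU look like coefficient labels for particular factor levels, and as far as I know the formula interface removes whole terms, not single coefficients. A sketch on made-up data with three two-level factors (the level names and the random response are assumptions for illustration):

```r
# Made-up balanced design: three two-level factors, 10 replicates per cell.
set.seed(3)
d <- expand.grid(B = c("b1", "b2"), D = c("d1", "d2"), S = c("s1", "s2"),
                 replicate = 1:10)
d$mic <- rnorm(nrow(d))   # random response, for illustration only

fit  <- lm(mic ~ B * D * S, data = d)

# Remove the three-way interaction term from the fitted model.
fit2 <- update(fit, . ~ . - B:D:S)
labels2 <- attr(terms(fit2), "term.labels")
print(labels2)

# Equivalent: subtract the term directly in the formula.
fit3 <- lm(mic ~ B * D * S - B:D:S, data = d)
print(names(coef(fit3)))
```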


Jonathan Baron, Professor of Psychology, University of Pennsylvania

Received on Thu Jul 01 10:03:03 2004
