John Sorkin wrote:
> Be very careful!
> When regression is performed by steps, you often will not get the same
> results as you would get from a single multivariable regression. The
> explanation for this is not simple, but a simplified explanation is that
> when you do your first regression,
> y=f(x1)
> all the variance that x1 can account for is sucked up by x1, including
> any variance it shares with x2, leaving little variance to be accounted
> for by your second regression, residuals=f(x2). In contrast, when you
> perform a multivariable regression, y=f(x1,x2), the total variance is
> apportioned between x1 and x2.
> John
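
A minimal sketch of the point above, using only the Python standard library. The data are made up and noise-free (y = 1 + 2*x1 + 3*x2 exactly), and the helper names (simple_fit, joint_fit) are hypothetical, chosen only for this illustration. It compares the two-step fit described in this thread with a single regression on both predictors when x1 and x2 are correlated.

```python
# Made-up, noise-free data for illustration: y = 1 + 2*x1 + 3*x2 exactly.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]          # positively correlated with x1
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]


def mean(v):
    return sum(v) / len(v)


def center(v):
    m = mean(v)
    return [t - m for t in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simple_fit(x, y):
    """OLS of y on x with an intercept; returns (intercept, slope)."""
    xc, yc = center(x), center(y)
    slope = dot(xc, yc) / dot(xc, xc)
    return mean(y) - slope * mean(x), slope


def joint_fit(x1, x2, y):
    """OLS of y on x1 and x2 with an intercept (2x2 normal equations on centered data)."""
    c1, c2, cy = center(x1), center(x2), center(y)
    s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return mean(y) - b1 * mean(x1) - b2 * mean(x2), b1, b2


# Two-step procedure: y on x1 (with intercept), then residuals on x2 (no constant).
a_step, b1_step = simple_fit(x1, y)
resid = [yi - a_step - b1_step * xi for xi, yi in zip(x1, y)]
b2_step = dot(x2, resid) / dot(x2, x2)

# Single regression on both predictors.
a_joint, b1_joint, b2_joint = joint_fit(x1, x2, y)

print("two-step:", round(b1_step, 3), round(b2_step, 3))
print("joint:   ", round(b1_joint, 3), round(b2_joint, 3))
```

With these particular numbers the joint fit recovers the coefficients 2 and 3, while the two-step fit gives roughly 3.11 for x1 and 0.49 for x2: the first step credits x1 with the part of x2's effect that is correlated with x1, and little is left for x2 in the second step.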
> I saw this type of model in some of my company's projects.
> To simplify:
> Y is regressed on X1 and X2, but the regression is done in two steps.
> First, Y is regressed on X1 with an intercept; then the residuals from
> that first step are regressed on X2 without a constant. The reason for
> doing so is that some observations have X1 data but no X2 data, so I guess
> the person wants to use as much information as possible to get the
> coefficient for X1, and then use the subset of residuals that have X2 data
> to capture what is left to be explained by X2.
> But my concern is: should we consider the correlation between X1 and X2?
> If residuals from the first step are used, then the X1 effect has been
> removed from Y. What does it then mean to regress those residuals on X2,
> which is itself correlated with X1? Should X2 be adjusted for X1 too
> (regress X2 on X1 and use the residuals)?
> What if both X1 and X2 are dummy variables? Dummy variables can have a
> meaningful correlation, too, right?
>
> Thanks a lot!
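
On the question of adjusting X2 for X1: a rough sketch, again in standard-library Python with made-up, noise-free data, of what happens when both Y and X2 are first regressed on X1 and the two sets of residuals are then regressed on each other. This is the partialling-out result usually called the Frisch-Waugh-Lovell theorem; the helper names (residuals, partial_slope) are hypothetical. The same check is repeated with two correlated dummy variables.

```python
def mean(v):
    return sum(v) / len(v)


def center(v):
    m = mean(v)
    return [t - m for t in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residuals(x, y):
    """Residuals from OLS of y on x with an intercept."""
    xc, yc = center(x), center(y)
    slope = dot(xc, yc) / dot(xc, xc)
    return [b - slope * a for a, b in zip(xc, yc)]


def partial_slope(x1, x2, y):
    """Slope from regressing (y adjusted for x1) on (x2 adjusted for x1).

    No constant is needed: both residual vectors already have mean zero.
    """
    e_y = residuals(x1, y)
    e_x2 = residuals(x1, x2)
    return dot(e_x2, e_y) / dot(e_x2, e_x2)


# Continuous example (made-up data): y = 1 + 2*x1 + 3*x2 exactly.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
b2_adjusted = partial_slope(x1, x2, y)

# Dummy-variable example (made-up data): yd = 1 + 2*d1 + 3*d2 exactly.
d1 = [0, 0, 0, 1, 1, 1]
d2 = [0, 0, 1, 0, 1, 1]
yd = [1 + 2 * a + 3 * b for a, b in zip(d1, d2)]
c1, c2 = center(d1), center(d2)
r_dummies = dot(c1, c2) / (dot(c1, c1) * dot(c2, c2)) ** 0.5
b2_dummy_adjusted = partial_slope(d1, d2, yd)

print("x2 coefficient after adjusting x2 for x1:", round(b2_adjusted, 3))
print("correlation between the dummies:", round(r_dummies, 3))
print("d2 coefficient after adjusting d2 for d1:", round(b2_dummy_adjusted, 3))
```

In both cases the adjusted slope equals the coefficient a single regression on both predictors would give (3 here), and the two dummies have a correlation of 1/3, so dummy variables can indeed be correlated. The equivalence is exact only when every step uses the same observations; if the first step is fitted on a larger sample than the second, as in the setup described above, it would hold only approximately.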
