# [R] Endogenous variables in ordinal logistic (or probit) regression

From: Paul Johnson <pauljohn32_at_gmail.com>
Date: Wed, 09 Apr 2008 17:35:07 -0500

A student brought this question to me and I can't find any articles or examples that are directly on point.

Suppose there are 2 ordinal logistic regression models, and one wants to set them into a simultaneous equation framework. Y1 might be a 4 category scale about how much the respondent likes the American Flag and Y2 might be how much the respondent likes the Republican Party in America.

By the usual simultaneous equation argument, one should not simply run two separate polr models,

polr(Y1 ~ Y2 + X1 + X2)

and

polr(Y2 ~ Y1 + X1 + X2)

because Y1 and Y2 are endogenous. Where does the problem arise? Thinking back to the theoretical model, there are unmeasured scale variables y1* and y2* that are determined by

y1* = b0 + b1 * y2 + b2 * X1 + b3 * X2 + e1 and

y2* = c0 + c1 * y1 + c2 * X1 + c3 * X2 + e2

y1* and y2* are not observed; we see only the categorical outputs Y1 and Y2, which correspond to

```
Y1 = 0   if        y1* < pi1
Y1 = 1   if pi1 <= y1* < pi2
Y1 = 2   if pi2 <= y1* < pi3
Y1 = 3   if pi3 <= y1*
```

and similarly for Y2.
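To make the threshold rule concrete, here is a small R sketch. The cut point values and the helper name `to_category` are made up for illustration; only the mapping rule itself comes from the description above.

```r
# Hypothetical cut points pi1 < pi2 < pi3 (values chosen only for illustration)
cutpoints <- c(-1, 0, 1)

# Map a latent score y* to the observed category 0..3.
# findInterval() counts how many cut points are <= the value, which matches
# the rule  pi_k <= y* < pi_(k+1)  ->  Y = k  (and Y = 0 below pi1).
to_category <- function(ystar, cuts) findInterval(ystar, cuts)

ystar_example <- c(-2.0, -1.0, -0.3, 0.0, 0.99, 1.0, 2.5)
Y_example <- to_category(ystar_example, cutpoints)
print(data.frame(ystar = ystar_example, Y = Y_example))
```

Values that fall exactly on a cut point go into the higher category, as in the `<=` on the left side of each inequality.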

Since e1 is "going into" y1*, and y1* "goes into" y2*, there is a good chance that the error term e1 is correlated with y2*, and therefore with the observed Y2 that appears as a regressor in the Y1 equation.
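A quick simulation can illustrate the point. This is only a sketch under simplifying assumptions that are not in the equations above: the latent scores (rather than the observed categories) enter each other's equation, the intercepts are zero, the errors are independent logistic draws, and all coefficient values are invented.

```r
set.seed(42)
n  <- 5000
X1 <- rnorm(n)
X2 <- rnorm(n)
e1 <- rlogis(n)   # error in the y1* equation
e2 <- rlogis(n)   # error in the y2* equation, independent of e1

# Made-up structural coefficients (b0 = c0 = 0 for simplicity)
b1 <- 0.5; b2 <- 0.5; b3 <- 0.5
c1 <- 0.5; c2 <- 0.5; c3 <- 0.5

# Exogenous part of each equation plus its own error
A1 <- b2 * X1 + b3 * X2 + e1
A2 <- c2 * X1 + c3 * X2 + e2

# Solve the two equations jointly (reduced form):
#   y1* = b1 * y2* + A1   and   y2* = c1 * y1* + A2
d      <- 1 - b1 * c1
y1star <- (A1 + b1 * A2) / d
y2star <- (A2 + c1 * A1) / d

# e1 shows up inside y2*, so the two are correlated even though
# e1 and e2 were drawn independently
cor_e1_y2star <- cor(e1, y2star)
print(cor_e1_y2star)
```

Under these assumptions the correlation is clearly positive whenever c1 > 0, which is exactly the situation in which treating Y2 as an ordinary regressor is suspect.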

Running

polr(Y1 ~ Y2 + X1 + X2)

in isolation might give badly biased estimates.

I have found a well-developed literature that deals with this question when one of the Y's is dichotomous.

Rivers, Douglas and Quang H. Vuong. 1988. Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models. Journal of Econometrics 39: 347-366.

Alvarez, R. Michael and Garrett Glasgow. 1999. Two-Stage Estimation of Nonrecursive Choice Models. Political Analysis 8: 11-24.

I have not found anybody who has estimated one of these models with R, however, and was hoping to get an example from someone.

I would also like to know if there is likely to be a problem extending the estimation framework to two multi-category dependent variables. In particular, I'm curious to know if one estimates a first stage model of Y1 as in

polr(Y1 ~ X1 + X2 + Z1)

to estimate predicted values of y1* (y1*-hat, which I believe is the estimated value of the linear predictor), what would be the properties of the second stage parameter estimates from the regression that uses it as the instrumental variable

polr(Y2 ~ y1*-hat + X1 + X2)
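The mechanics of that two-stage idea can be sketched in R as follows. This is not a validated estimator: the data are simulated with invented coefficients, Z1 is assumed to affect y1* only, the latent scores are assumed to enter each other's equation, and the second-stage standard errors reported by polr do not account for y1*-hat being an estimate. The sketch assumes polr from the MASS package and uses the `lp` component of the fitted object as the linear predictor.

```r
library(MASS)  # provides polr()

set.seed(123)
n  <- 2000
X1 <- rnorm(n); X2 <- rnorm(n); Z1 <- rnorm(n)
e1 <- rlogis(n); e2 <- rlogis(n)

# Made-up simultaneous system on the latent scale:
#   y1* = b1 * y2* + 0.5*X1 + 0.5*X2 + 1.0*Z1 + e1
#   y2* = c1 * y1* + 0.5*X1 - 0.5*X2 + e2
b1 <- 0.4; c1 <- 0.6
A1 <- 0.5 * X1 + 0.5 * X2 + 1.0 * Z1 + e1
A2 <- 0.5 * X1 - 0.5 * X2 + e2
y1star <- (A1 + b1 * A2) / (1 - b1 * c1)   # reduced form for y1*
y2star <- c1 * y1star + A2

# Cut each latent score into 4 ordered categories at its sample quartiles
cut4 <- function(v) {
  cut(v, breaks = c(-Inf, quantile(v, c(0.25, 0.50, 0.75)), Inf),
      labels = as.character(0:3), ordered_result = TRUE)
}
dat <- data.frame(Y1 = cut4(y1star), Y2 = cut4(y2star), X1, X2, Z1)

# Stage 1: ordinal logit of Y1 on the exogenous variables and the instrument
stage1 <- polr(Y1 ~ X1 + X2 + Z1, data = dat, method = "logistic")
dat$y1hat <- stage1$lp   # estimated linear predictor, i.e. y1*-hat

# Stage 2: ordinal logit of Y2 with y1*-hat in place of Y1
stage2 <- polr(Y2 ~ y1hat + X1 + X2, data = dat, method = "logistic")
print(coef(stage2))
```

Because y1*-hat is a linear combination of X1, X2 and Z1, the second stage is only identified through the excluded variable Z1; without it, y1hat would be perfectly collinear with X1 and X2.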

As far as I can tell, this instrumental variables approach is the only realistic way to do this.

I am aware of some articles that claim that a multi-category logistic regression essentially boils down to a series of dichotomous logits, in the sense that the dependent variable can be thought of as a sequence of questions: "are you in group 0 or group 1", "are you in group 1 or group 2", and so forth.

Cole, Stephen R, Paul D. Allison, and Cande V. Ananth. 2004. Estimation of Cumulative Odds Ratios. AEP 14(3): 172-178. (AEP = Annals of Epidemiology)

Following that approach, one could convert the data into the cumulative logistic format and then proceed with the methods proposed for binary dependent variables. I'm cautious about that approach because the results are not equivalent to maximum likelihood as would be obtained from polr, for example, and I don't quite see the strength of building on that approach.
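For concreteness, one possible reading of "cumulative logistic format" is a set of binary indicators for Y >= k, one per split point, stacked into long form. The sketch below uses a tiny invented vector and hypothetical column names; it shows only the data conversion, not the estimation step from the cited article.

```r
# Tiny made-up ordinal outcome with categories 0..3
Y <- c(0, 1, 2, 3, 1, 2)

# One binary indicator per cumulative split: Y >= 1, Y >= 2, Y >= 3
splits <- 1:3
cum_splits <- sapply(splits, function(k) as.integer(Y >= k))
colnames(cum_splits) <- paste0("Y_ge_", splits)

# Stack into long form: one row per (respondent, split) pair
long <- data.frame(
  id    = rep(seq_along(Y), times = length(splits)),
  split = rep(splits, each = length(Y)),
  above = as.vector(cum_splits)
)
print(cum_splits)
```

Each respondent now contributes three correlated binary records, which is one reason the results need not match the maximum likelihood fit from polr.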

PJ

```
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
```
