IV with Endogenous Controls

Applied papers that rely on instrumental variables often focus on a single parameter of interest: the coefficient on an endogenous variable. With an IV for that variable, one obtains the IV estimate of this coefficient. But what about all those other "exogenous" variables in the model?

Usually their coefficient estimates are ignored because they are not "of interest." What if one of those exogenous control variables is really endogenous as well? The typical response: "I don't care about that coefficient, so it doesn't matter." But does it matter? Um, perhaps yes!

Let's say the true model is

y = b1*x1 + b2*x2 + e

and both x1 and x2 are endogenous. You only care about b1. You have an IV for x1, call it z. For z to be valid, we know it must be correlated with x1 and uncorrelated with e. But ... it also needs to be (in all likelihood) uncorrelated with x2!

If your IV is correlated with an endogenous regressor that is incorrectly treated as exogenous, then your IV estimate of b1 will likely be inconsistent (remember, IV is always biased in finite samples, so consistency is the property at stake). So, if z and x2 are correlated, IV is unlikely to work!
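To see where the inconsistency comes from, here is a short derivation sketch. It assumes the model y = b1*x1 + b2*x2 + e with all variables mean zero, Cov(z,e)=0, and a just-identified IV estimator that uses (z, x2) as instruments for (x1, x2). The notation is mine, not taken from the code linked at the end of the post.

```latex
\operatorname{plim}\,\hat{b}_{1,IV} - b_1
  = -\,\frac{\operatorname{Cov}(z,x_2)\,\operatorname{Cov}(x_2,e)}
            {\operatorname{Cov}(z,x_1)\operatorname{Var}(x_2)
             - \operatorname{Cov}(z,x_2)\operatorname{Cov}(x_1,x_2)}
```

The right-hand side is zero if either Cov(z,x2)=0 or Cov(x2,e)=0. When both are nonzero, the inconsistency grows as Cov(z,x1) shrinks, which is why a weak instrument makes things worse.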

Consider the following simulations. The DGP has 2 controls, both "equally" endogenous and uncorrelated with each other. The instrument z is correlated with x1 and not with e. Everything is standard normal and the true value of b1=1.

First set of results (based on 250,000 sims) has Corr(z,x2)=0. IV is consistent even though x2 is incorrectly treated as exogenous.

Now, let's have z be correlated with x2 as well (albeit less strongly than z is with x1). IV is inconsistent (although less so than OLS in this particular DGP).

Uh oh. This is not looking good. What if z is only weakly correlated with x1, but still correlated with x2? Then, you might want to look away.
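The three scenarios above can be sketched in a few lines of Python. This is not the author's code (that is linked at the end of the post). It is a minimal illustration under assumptions of my own: b2=1, Corr(x1,e)=Corr(x2,e)=0.5, the specific values of Corr(z,x1) and Corr(z,x2) shown, and far fewer replications than the 250,000 used in the post. The function name `simulate_iv` is hypothetical. Medians are reported because the just-identified IV estimator can have very heavy tails.

```python
import numpy as np

def simulate_iv(corr_z_x1, corr_z_x2, corr_x_e=0.5, n=1000, reps=500, seed=0):
    """Return median OLS and IV estimates of b1 across replications."""
    rng = np.random.default_rng(seed)
    b1, b2 = 1.0, 1.0  # b1 = 1 as in the post; b2 = 1 is an assumption
    # Assumed correlation matrix for (x1, x2, z, e): x1 and x2 are
    # uncorrelated, both are correlated with e, and z is uncorrelated with e.
    R = np.array([
        [1.0,       0.0,       corr_z_x1, corr_x_e],
        [0.0,       1.0,       corr_z_x2, corr_x_e],
        [corr_z_x1, corr_z_x2, 1.0,       0.0],
        [corr_x_e,  corr_x_e,  0.0,       1.0],
    ])
    chol = np.linalg.cholesky(R)
    ols, iv = [], []
    for _ in range(reps):
        # Rows of the draw have correlation matrix R
        x1, x2, z, e = (rng.standard_normal((n, 4)) @ chol.T).T
        y = b1 * x1 + b2 * x2 + e
        X = np.column_stack([np.ones(n), x1, x2])  # regressors
        Z = np.column_stack([np.ones(n), z, x2])   # x2 treated as exogenous
        ols.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
        iv.append(np.linalg.solve(Z.T @ X, Z.T @ y)[1])  # just-identified IV
    return float(np.median(ols)), float(np.median(iv))

scenarios = {
    "Corr(z,x2)=0": (0.5, 0.0),
    "Corr(z,x2)>0": (0.5, 0.25),
    "weak z, Corr(z,x2)>0": (0.1, 0.25),
}
for label, (c1, c2) in scenarios.items():
    ols_b1, iv_b1 = simulate_iv(c1, c2)
    print(f"{label}: OLS b1 = {ols_b1:.3f}, IV b1 = {iv_b1:.3f}")
```

With these assumed values, the IV estimate should sit near 1 only in the first scenario and drift further from 1 as the instrument weakens. The exact numbers depend on the assumed correlations and will not match the post's results.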

So, do NOT ignore the endogeneity of controls just because you don't care about them.

One final thought. At this point, you might be thinking you can game the system by just dropping x2 from the model since you don't care about it anyway.

If you omit x2 from the model and it belongs, then the error term becomes e'=(e+b2*x2). Since x2 is part of the error term, z is no longer a valid instrument since it is likely correlated with e' through x2. And, once again, that's why you should not ask how the sausage is made.
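The same point in one line, as a sketch: assume the true model is y = b1*x1 + b2*x2 + e, Cov(z,e)=0, and x2 is dropped so that y is instrumented on x1 alone using z. The simple IV estimator then converges to

```latex
\operatorname{plim}\,\hat{b}_{1,IV}
  = \frac{\operatorname{Cov}(z,y)}{\operatorname{Cov}(z,x_1)}
  = b_1 + b_2\,\frac{\operatorname{Cov}(z,x_2)}{\operatorname{Cov}(z,x_1)}
```

So dropping x2 only helps if b2=0 or Cov(z,x2)=0, and in the first case x2 never belonged in the model to begin with.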

Note: Code is available here: http://faculty.smu.edu/millimet/blog.html

