Today is one of the best days of the year ... opening day for baseball (in the US)! In honor of the return to the diamond, and ending my own offseason, I decided it was a great day to blow off some of my duties and write another blog post. Not coincidentally, the topic very much relates to a line by the character Brickma, the pitching coach of the Chicago Cubs in the classic film Rookie of the Year.
“I wrap the cake up in my vomit bag, and voila!... Breakfast! Conservation, Managing resources... that is the key to Baseball.”
-Brickma
Of course, Brickma also got himself locked in the equipment cage when it was game time!
So, let’s play some ball and do this. The topic here is instrumental variables (IV). [Insert eye roll here]
I have been teaching IV in my class for the last few lectures. It is a fascinating estimator: how it works, its relation to other estimators, how finicky it can be, etc. Yes, many people hate it, but they are misinformed and/or don't appreciate the mechanics under the hood. As many have said before ...
and it does a disservice to those just learning to hear any estimator disparaged without explanation [... except diff-in-diff LOL]. The fact of the matter is that IV works GREAT as long as the model is correctly specified and the sample is large. Most researchers who express distaste for IV do so not because of the estimator itself, but because they do not believe they or others can correctly specify the model. And when the model is incorrectly specified (or the required assumptions hold only weakly), things can go haywire in a hurry.
This is a bit unfair to the estimator.
IV may break down a bit more starkly when things go awry, but that is hardly unique to the IV estimator. For some reason, many, many applied researchers are willing to accept the parallel trends assumption but dispute any IV assumption. [This despite the great recent paper by Roth & Sant'Anna (2023) on the relationship between functional form and identification in diff-in-diff.] But, alas, I digress.
The point of this blog post is to alert readers to an overlooked paper in the IV literature. It relates to finding valid instruments, the most contentious part of applied papers relying on IV. The paper, Brückner (2013), is an applied paper. However, it proposes a creative (and understated) IV approach that, to my knowledge, had not been used before.
The model in the paper involves two simultaneously determined variables, say Y1 and Y2. Assume the data-generating process (DGP) is the following:
Y1 = a + b*Y2 + e1
Y2 = c + d*Y1 + f*W + e2
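A minimal Python sketch of drawing data from this DGP may help make the simultaneity concrete. The parameter values (a = 1, b = 0.5, c = 2, d = 0.8, f = 1.5), the independent standard normal draws for W, e1, and e2, and the helper name simulate are illustrative assumptions, not values from Brückner (2013). Because Y1 and Y2 determine each other, the sketch first solves for Y1 by substituting the Y2 equation into the Y1 equation (which requires b*d ≠ 1) and then computes Y2 from its own equation.

```python
import random


def simulate(n_obs, a=1.0, b=0.5, c=2.0, d=0.8, f=1.5, seed=0):
    """Draw n_obs observations from Y1 = a + b*Y2 + e1, Y2 = c + d*Y1 + f*W + e2.

    W, e1, and e2 are independent standard normals (an illustrative assumption).
    Returns the lists W, Y1, Y2, e1, e2.
    """
    if b * d == 1.0:
        raise ValueError("b*d = 1: the system has no unique solution")
    rng = random.Random(seed)
    W, Y1, Y2, E1, E2 = [], [], [], [], []
    for _ in range(n_obs):
        w = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # Substitute the Y2 equation into the Y1 equation and solve for Y1 ...
        y1 = ((a + b * c) + b * f * w + (e1 + b * e2)) / (1.0 - b * d)
        # ... then Y2 follows from its own structural equation.
        y2 = c + d * y1 + f * w + e2
        W.append(w)
        Y1.append(y1)
        Y2.append(y2)
        E1.append(e1)
        E2.append(e2)
    return W, Y1, Y2, E1, E2


W, Y1, Y2, E1, E2 = simulate(1_000)

# Check that both structural equations hold in the simulated data (up to rounding).
gap1 = max(abs(y1 - (1.0 + 0.5 * y2 + e1)) for y1, y2, e1 in zip(Y1, Y2, E1))
gap2 = max(abs(y2 - (2.0 + 0.8 * y1 + 1.5 * w + e2))
           for y1, y2, w, e2 in zip(Y1, Y2, W, E2))
print(f"largest residual gaps: {gap1:.2e}, {gap2:.2e}")
```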
Thus, Y1 depends on Y2 and unobserved factors in e1. Y2 depends on Y1, exogenous observed factors W, and unobserved factors e2. The focus of the paper is on the parameter d, the effect of Y1 on Y2. As (should be) well known, when two variables are simultaneously determined, as in this setup, Ordinary Least Squares (OLS) estimation of either equation will produce biased (and inconsistent) estimates.
For example, the OLS estimate of d is biased because changes in e2 ⇒ changes in Y2 ⇒ changes in Y1 (if b≠0) ⇒ Cov(Y1,e2) ≠ 0.
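The direction of this bias can be checked by simulation. The sketch below uses an illustrative version of the DGP (hypothetical values a = 1, b = 0.5, c = 2, d = 0.8, f = 1.5, with independent standard normal W, e1, e2) and regresses Y2 on a constant, Y1, and W by OLS. With unit-variance errors, the probability limit of the OLS estimate of d works out to d + b(1 - bd)/(1 + b^2), which is 0.8 + 0.24 = 1.04 under these assumed values, so the estimate should land near 1.04 rather than 0.8.

```python
import random


def simulate(n_obs, a=1.0, b=0.5, c=2.0, d=0.8, f=1.5, seed=0):
    """Draw (W, Y1, Y2) from the simultaneous DGP with standard normal W, e1, e2."""
    rng = random.Random(seed)
    W, Y1, Y2 = [], [], []
    for _ in range(n_obs):
        w, e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = ((a + b * c) + b * f * w + (e1 + b * e2)) / (1.0 - b * d)
        W.append(w)
        Y1.append(y1)
        Y2.append(c + d * y1 + f * w + e2)
    return W, Y1, Y2


def cov(x, y):
    """Sample covariance, dividing by n (the divisor cancels in the ratios below)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


def ols_two_slopes(y, x1, x2):
    """OLS slopes of y on (1, x1, x2), solving the 2x2 normal equations by Cramer's rule."""
    s11, s12, s22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


W, Y1, Y2 = simulate(100_000)
d_ols, f_ols = ols_two_slopes(Y2, Y1, W)
print(f"OLS estimate of d: {d_ols:.3f} (true d = 0.8)")
```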
To obtain a consistent estimate of d, a researcher may turn to IV. This requires an instrument for Y1 in the second (Y2) equation above. To be a valid instrument, we require a variable Z that is (strongly) correlated with Y1, uncorrelated with e2, and excluded from the Y2 equation. The DGP above implies the first-stage equation for Y1. In particular, the reduced form for Y1 is
Y1 = [1/(1-b*d)]*[(a + b*c) + b*f*W + (e1 + b*e2)]
which can be written as
Y1 = g + h*W + e3
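For anyone who wants the intermediate algebra, substituting the Y2 equation into the Y1 equation and collecting terms gives the reduced form and the implied definitions of g, h, and e3 (assuming bd ≠ 1):

```latex
\begin{aligned}
Y_1 &= a + b\,(c + d\,Y_1 + f\,W + e_2) + e_1 \\
(1 - bd)\,Y_1 &= (a + bc) + bf\,W + (e_1 + b\,e_2) \\
Y_1 &= \underbrace{\frac{a + bc}{1 - bd}}_{g}
     + \underbrace{\frac{bf}{1 - bd}}_{h}\,W
     + \underbrace{\frac{e_1 + b\,e_2}{1 - bd}}_{e_3}
\end{aligned}
```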
Thus, the reduced form consistent with the DGP implies that Y1 depends only on the exogenous factors W (plus the error e3). However, W is in the structural equation for Y2 according to the DGP and therefore is not a valid instrument. Hence, according to the reduced form, there is no valid instrument for Y1!
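A quick simulation illustrates why W cannot do double duty. The sketch below, under illustrative assumptions (hypothetical values a = 1, b = 0.5, c = 2, d = 0.8, f = 1.5 and independent standard normal W, e1, e2), uses W as the instrument for Y1 while wrongly dropping W from the Y2 equation. From the two reduced forms, Cov(W,Y2)/Cov(W,Y1) = [f/(1-bd)]/[b*f/(1-bd)] = 1/b, so this "estimate" converges to 1/b = 2, not to d = 0.8. Keeping W in the Y2 equation, as the DGP requires, leaves no instrument for Y1 at all.

```python
import random


def simulate(n_obs, a=1.0, b=0.5, c=2.0, d=0.8, f=1.5, seed=0):
    """Draw (W, Y1, Y2) from the simultaneous DGP with standard normal W, e1, e2."""
    rng = random.Random(seed)
    W, Y1, Y2 = [], [], []
    for _ in range(n_obs):
        w, e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = ((a + b * c) + b * f * w + (e1 + b * e2)) / (1.0 - b * d)
        W.append(w)
        Y1.append(y1)
        Y2.append(c + d * y1 + f * w + e2)
    return W, Y1, Y2


def cov(x, y):
    """Sample covariance, dividing by n (the divisor cancels in the ratio below)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


W, Y1, Y2 = simulate(100_000)

# Misspecified IV: instrument Y1 with W in a Y2 equation that (wrongly) omits W.
d_naive = cov(W, Y2) / cov(W, Y1)
print(f"naive IV 'estimate' of d: {d_naive:.3f} (true d = 0.8, 1/b = 2.0)")
```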
But, to quote another sports personality, Lee Corso, "not so fast, my friend!" It would be a rookie mistake to throw up your hands, put the research project in the garbage, and rethink your life choices.
Brückner posits that while the reduced form for Y1 is not useful for identification of d, the structural model for Y1 is. Specifically, if one can consistently estimate the structural model for Y1, then one can net out the problematic part of the variation in Y1. Formally, given a consistent estimate of b,
Z = Y1 - b*Y2 = a + e1
is the part of Y1 that is not due to Y2. Brückner then proposes to use Z as an instrument for Y1. It is likely to be strongly correlated with Y1 (since e1 likely contains a lot of variation), it is exogenous IF Cov(e1,e2) = 0, and it is excluded from the Y2 equation as Z only affects Y2 indirectly through Y1.
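The claimed properties of Z can be checked in a simulation. This sketch treats a and b as known (an oracle assumption standing in for the consistent estimate discussed next) and uses hypothetical values a = 1, b = 0.5, c = 2, d = 0.8, f = 1.5 with independent standard normal W, e1, e2. With the true b, Z equals a + e1 exactly, its correlation with Y1 should be sizable (about 0.74 under these assumed values), and its correlation with e2 should be close to zero.

```python
import math
import random


def simulate(n_obs, a=1.0, b=0.5, c=2.0, d=0.8, f=1.5, seed=0):
    """Draw (W, Y1, Y2, e1, e2) from the simultaneous DGP with standard normal W, e1, e2."""
    rng = random.Random(seed)
    W, Y1, Y2, E1, E2 = [], [], [], [], []
    for _ in range(n_obs):
        w, e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = ((a + b * c) + b * f * w + (e1 + b * e2)) / (1.0 - b * d)
        W.append(w)
        Y1.append(y1)
        Y2.append(c + d * y1 + f * w + e2)
        E1.append(e1)
        E2.append(e2)
    return W, Y1, Y2, E1, E2


def corr(x, y):
    """Sample Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


A_TRUE, B_TRUE = 1.0, 0.5  # oracle: treat the Y1-equation parameters as known
W, Y1, Y2, E1, E2 = simulate(100_000, a=A_TRUE, b=B_TRUE)

# Net out the part of Y1 that is due to Y2.
Z = [y1 - B_TRUE * y2 for y1, y2 in zip(Y1, Y2)]

max_gap = max(abs(z - (A_TRUE + e1)) for z, e1 in zip(Z, E1))  # Z = a + e1
print(f"corr(Z, Y1) = {corr(Z, Y1):.3f}, corr(Z, e2) = {corr(Z, E2):.3f}, gap = {max_gap:.1e}")
```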
So, now the problem turns to obtaining a consistent estimate of b. Well, again, because of the simultaneity issue, Y2 is endogenous in the structural model for Y1. However, following the DGP, the reduced form for Y2 becomes
Y2 = m + n*W + e4
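Doing the same substitution for Y2 shows where the condition on f comes from: the coefficient n on W is proportional to f, so W shifts Y2 only if f ≠ 0 (again assuming bd ≠ 1).

```latex
\begin{aligned}
Y_2 &= c + d\,(a + b\,Y_2 + e_1) + f\,W + e_2 \\
(1 - bd)\,Y_2 &= (c + da) + f\,W + (e_2 + d\,e_1) \\
Y_2 &= \underbrace{\frac{c + da}{1 - bd}}_{m}
     + \underbrace{\frac{f}{1 - bd}}_{n}\,W
     + \underbrace{\frac{e_2 + d\,e_1}{1 - bd}}_{e_4}
\end{aligned}
```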
Since W does not appear in the structural equation for Y1, the variables in W are available as instruments for Y2 in that equation, provided f≠0. IV then yields a consistent estimate of b, which lets us construct Z, which in turn enables IV estimation of the structural model for Y2.
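Putting the pieces together, here is a minimal sketch of the three-step procedure as described in this post (an illustration, not code from Brückner (2013)): IV for b using W, construct Z = Y1 - b_hat*Y2, then IV for the Y2 equation using Z as the instrument for Y1 and W as its own instrument. The DGP, the parameter values (a = 1, b = 0.5, c = 2, d = 0.8, f = 1.5), and the function name two_step_iv are hypothetical, and the sketch returns point estimates only, with no standard errors.

```python
import random


def simulate(n_obs, a=1.0, b=0.5, c=2.0, d=0.8, f=1.5, seed=0):
    """Draw (W, Y1, Y2) from the simultaneous DGP with standard normal W, e1, e2."""
    rng = random.Random(seed)
    W, Y1, Y2 = [], [], []
    for _ in range(n_obs):
        w, e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = ((a + b * c) + b * f * w + (e1 + b * e2)) / (1.0 - b * d)
        W.append(w)
        Y1.append(y1)
        Y2.append(c + d * y1 + f * w + e2)
    return W, Y1, Y2


def cov(x, y):
    """Sample covariance, dividing by n (the divisor cancels in every ratio below)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


def two_step_iv(W, Y1, Y2):
    """Point estimates (b_hat, d_hat, f_hat) from the three steps described above."""
    # Step 1: IV for b in Y1 = a + b*Y2 + e1, with W as the instrument for Y2.
    b_hat = cov(W, Y1) / cov(W, Y2)
    # Step 2: net out the Y2-driven part of Y1 to build the instrument Z.
    Z = [y1 - b_hat * y2 for y1, y2 in zip(Y1, Y2)]
    # Step 3: just-identified IV for (d, f) in Y2 = c + d*Y1 + f*W + e2, with
    # instruments (Z, W) for regressors (Y1, W). Solve the moment equations
    #   cov(Z, Y2) = d*cov(Z, Y1) + f*cov(Z, W)
    #   cov(W, Y2) = d*cov(W, Y1) + f*cov(W, W)
    p, q, r1 = cov(Z, Y1), cov(Z, W), cov(Z, Y2)
    s, t, r2 = cov(W, Y1), cov(W, W), cov(W, Y2)
    det = p * t - q * s
    d_hat = (r1 * t - q * r2) / det
    f_hat = (p * r2 - s * r1) / det
    return b_hat, d_hat, f_hat


W, Y1, Y2 = simulate(100_000)
b_hat, d_hat, f_hat = two_step_iv(W, Y1, Y2)
print(f"b_hat = {b_hat:.3f}, d_hat = {d_hat:.3f}, f_hat = {f_hat:.3f}")
```

One design note: because b_hat is the ratio Cov(W,Y1)/Cov(W,Y2), the sample covariance of W and Z is exactly zero, so the 2x2 system decouples and d_hat reduces to Cov(Z,Y2)/Cov(Z,Y1). The general solve is kept so the structure of the second-stage IV stays visible.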
What's going on here and why is this at all interesting to anyone but me?
In this setup, the objective is to consistently estimate the effect of one variable (Y1) on another (Y2). OLS is inconsistent because the two variables are jointly determined. At first glance, IV is not possible either, because there is no exogenous exclusion restriction in the first-stage Y1 equation. But all hope is not lost! If there is an exclusion restriction in the equation of interest (Y2), meaning a variable such as W that enters the Y2 equation but is excluded from the Y1 equation, then an IV estimator can be constructed. In other words, you need not restrict yourself to looking for exclusion restrictions in the Y1 equation ... exclusion restrictions in the Y2 equation work as well.
This is a very cool tool! And if you don't like it when it is implemented, blame the worker, not the tool... because the tool does require the critical assumption that e1 and e2 are uncorrelated, i.e., Cov(e1,e2) = 0. Thus, this technique works only in situations where endogeneity arises solely from simultaneity, not from unobserved heterogeneity that affects both Y1 and Y2.
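The sketch below illustrates this dependence on Cov(e1,e2) = 0 by running the three-step procedure described above on data with errors correlated at rho = 0.5 (an assumed value), alongside the rho = 0 case. All parameter values (a = 1, b = 0.5, c = 2, d = 0.8, f = 1.5), the unit error variances, and the helper names are hypothetical. W remains a valid instrument for b, so b_hat still converges to 0.5, but Z now carries e1, which is correlated with e2, and the estimate of d converges to (rho + d)/(1 + b*rho) = 1.3/1.25 = 1.04 instead of 0.8.

```python
import math
import random


def simulate(n_obs, a=1.0, b=0.5, c=2.0, d=0.8, f=1.5, rho=0.0, seed=0):
    """Draw (W, Y1, Y2); e1 and e2 are unit-variance normals with correlation rho."""
    rng = random.Random(seed)
    W, Y1, Y2 = [], [], []
    for _ in range(n_obs):
        w = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)
        # Build e2 with Corr(e1, e2) = rho and Var(e2) = 1.
        e2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y1 = ((a + b * c) + b * f * w + (e1 + b * e2)) / (1.0 - b * d)
        W.append(w)
        Y1.append(y1)
        Y2.append(c + d * y1 + f * w + e2)
    return W, Y1, Y2


def cov(x, y):
    """Sample covariance, dividing by n (the divisor cancels in every ratio below)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


def two_step_iv(W, Y1, Y2):
    """IV for b with W, build Z = Y1 - b_hat*Y2, then IV for (d, f) with (Z, W)."""
    b_hat = cov(W, Y1) / cov(W, Y2)
    Z = [y1 - b_hat * y2 for y1, y2 in zip(Y1, Y2)]
    p, q, r1 = cov(Z, Y1), cov(Z, W), cov(Z, Y2)
    s, t, r2 = cov(W, Y1), cov(W, W), cov(W, Y2)
    det = p * t - q * s
    return b_hat, (r1 * t - q * r2) / det, (p * r2 - s * r1) / det


# Same seed for both cases, so only the error correlation differs.
results = {rho: two_step_iv(*simulate(100_000, rho=rho)) for rho in (0.0, 0.5)}
for rho, (b_hat, d_hat, _) in results.items():
    print(f"rho = {rho}: b_hat = {b_hat:.3f}, d_hat = {d_hat:.3f} (true d = 0.8)")
```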
But if this case applies to you, you can be like Brickma and manage the resources at your disposal for maximum enjoyment, just like in baseball!
Time to go watch some baseball. See you next offseason (hopefully sooner).
UPDATE (3.31.23)
The very smart Sal Navarro pointed me to Matzkin (2016), who proves identification in an entirely nonparametric version of the setup here. Moreover, since identification requires Cov(e1,e2) = 0, this is related to the literature on identification via covariance restrictions (CR), although here the CR is needed in addition to the exclusion restriction on W. My two dissertation chapters use a slightly different setup and achieve identification solely via CR (Millimet (2000), Pitt et al. (2003)).