I know, I know, but hear me out. I was thinking about a lack of instruments this week because I stumbled across a few new papers by Jan Kiviet (some with co-authors) that address the issue of endogeneity. And these papers do not require, you guessed it, an instrument.
Kiviet (2020a,b) and Kiviet and Kripfganz (2020) develop and apply the estimation approach. Kripfganz and Kiviet (2020) outline a new Stata command, kinkyreg, that makes the estimator immediately accessible to applied researchers. Well, researchers who are not beholden to R, at least!
I am still processing the finer points of the approach, but let me sketch the basic idea. Take a simple (linear in parameters) model with a single regressor,
y = a + bx + e.
Ordinary Least Squares (OLS) provides an unbiased and consistent estimate of b if ρ = Corr(x,e) equals zero. Alternatively, Instrumental Variables (IV) provides a biased but consistent estimate of b if Corr(z,x) is not equal to zero and Corr(z,e) equals zero, where z, the instrument, is an observed variable.
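To make the two conditions concrete, here is a small Python simulation under a made-up data-generating process (the coefficients, the seed, and the names simulate, mean, and cov are illustrative assumptions, not anything from the papers). Because x and e share the shock v, Corr(x,e) ≠ 0; z shifts x but is drawn independently of e. OLS should settle near 1 + Cov(x,e)/Var(x) = 1 + 0.5/2 = 1.25, while the just-identified IV ratio Cov(z,y)/Cov(z,x) should settle near the true b = 1.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    # Sample covariance dividing by n; the normalisation cancels in the ratios below
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def simulate(n, b=1.0, seed=0):
    """Hypothetical DGP: x shares the shock v with e (so x is endogenous),
    while z moves x but is independent of e (a valid instrument)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    e = [0.5 * vi + rng.gauss(0, 1) for vi in v]
    x = [zi + vi for zi, vi in zip(z, v)]
    y = [2.0 + b * xi + ei for xi, ei in zip(x, e)]
    return y, x, z


y, x, z = simulate(50_000)
b_ols = cov(x, y) / cov(x, x)  # OLS slope of y on x (with an intercept)
b_iv = cov(z, y) / cov(z, x)   # just-identified IV estimator: Cov(z,y)/Cov(z,x)
print(f"OLS: {b_ols:.3f}   IV: {b_iv:.3f}   (true b = 1.0)")
```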
I have discussed the pros and cons of IV on several occasions. See here, here, here, and here. Aside from the fact that IV is biased in finite samples, it is now well appreciated that IV can behave poorly, even asymptotically, when instruments are only weakly correlated with the endogenous regressor (i.e., Corr(z,x) ≈ 0); that the condition Corr(z,e) = 0 is untestable with a single instrument; and that IV can be highly misleading if any of the requirements for a valid instrument are violated.
One result of this has been the development of many new approaches that assess the sensitivity of IV estimates to deviations from the strict conditions required for consistency. See my prior post here. However, the more typical result has been to treat IV like the Rodney Dangerfield of estimators.
Well, Kiviet's new papers are no exception. The idea is to recover a consistent estimate of b when x is endogenous, without relying on an instrument. A cappella style.
The solution is actually quite simple and is similar in spirit to many of the methods mentioned above that assess the sensitivity of IV estimates to deviations from the conditions required for consistency. Recall, the model is
y = a + bx + e.
Let b-hat represent the OLS estimate of b. Its probability limit (loosely, the value it settles on as N grows) is
plim b-hat = b + Cov(x,e)/Var(x).
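A quick way to see this formula at work is to simulate data in which e is visible. The sketch below uses a hypothetical design (b = 1, Cov(x,e) = 0.5, Var(x) = 2, so the bias term is 0.25); the names mean and cov are illustrative helpers. In any sample, Cov(x,y) = b·Var(x) + Cov(x,e) holds exactly because the intercept drops out of covariances, so b-hat equals b plus the sample analogue of the bias term up to rounding, and subtracting that term recovers b. With real data, of course, e is never observed, which is the whole problem.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    # Sample covariance dividing by n
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


# Hypothetical DGP: b = 1, Cov(x,e) = 0.5, Var(x) = 2, so the bias term is 0.25
rng = random.Random(1)
n, b = 50_000, 1.0
v = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) + vi for vi in v]
e = [0.5 * vi + rng.gauss(0, 1) for vi in v]
y = [2.0 + b * xi + ei for xi, ei in zip(x, e)]

b_hat = cov(x, y) / cov(x, x)  # OLS slope
bias = cov(x, e) / cov(x, x)   # computable only because the simulation reveals e
print(f"b_hat = {b_hat:.3f}, b + bias = {b + bias:.3f}, b_hat - bias = {b_hat - bias:.6f}")
```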
The bias, Cov(x,e)/Var(x), is non-zero when x is endogenous. If this bias term were estimable, then a bias-corrected OLS estimate, b-bc, given by
b-bc = b-hat - Cov(x,e)/Var(x),
would be consistent for b. However, while Var(x) is directly estimable from the data, Cov(x,e) is not, because e is unobserved. Moreover, as covariances are unbounded, even thinking about plausible values for this term is not straightforward.
But we can rewrite the bias term as
Cov(x,e)/Var(x) = ρSD(e)SD(x)/Var(x) = ρSD(e)/SD(x).
Now, at least we know that ρ must lie in the interval [-1,1]. What about SD(e)? Well, one might think about estimating it from the variance of the OLS residuals. However, this is not a consistent estimate, since the OLS estimate of b is itself inconsistent when ρ ≠ 0. Kiviet shows that a consistent estimate also depends on the value of ρ. Specifically,
Var(e) = (SSE/N)/(1-ρ^2),
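where SSE is the usual OLS sum of squared residuals and N is the sample size. Intuitively, OLS attributes part of e to x, so the residuals understate Var(e) by the factor (1-ρ^2). Combining these last two statements, we can write the bias as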
Cov(x,e)/Var(x) = ρSQRT[(SSE/N)/((1-ρ^2)Var(x))]
where the only unknown is ρ.
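The formula translates directly into a short function. This is my own sketch, not the kinkyreg implementation: b_bc, ols, and the simulated design are hypothetical, and the sketch ignores the inference developed in the papers. Because the simulation reveals e, it can check two things: that SSE/N equals (1-ρ^2)Var(e) when ρ is the sample correlation of x and e, and that plugging in that ρ returns b. Both hold exactly in-sample, up to rounding, since the OLS residuals are (y - ȳ) - b-hat(x - x̄).

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    # Sample covariance dividing by N, matching SSE/N in the text
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def ols(y, x):
    """Regress y on a constant and x; return (b_hat, SSE)."""
    b_hat = cov(x, y) / cov(x, x)
    a_hat = mean(y) - b_hat * mean(x)
    sse = sum((yi - a_hat - b_hat * xi) ** 2 for yi, xi in zip(y, x))
    return b_hat, sse


def b_bc(rho, b_hat, sse, n, var_x):
    """Bias-corrected slope for an assumed rho = Corr(x,e), with -1 < rho < 1:
    b_hat - rho * sqrt[(SSE/N) / ((1 - rho^2) * Var(x))]."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly inside (-1, 1)")
    return b_hat - rho * math.sqrt((sse / n) / ((1.0 - rho ** 2) * var_x))


# Hypothetical DGP: true b = 1 and Corr(x,e) = 0.5/sqrt(2.5), about 0.32
rng = random.Random(2)
n = 50_000
v = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) + vi for vi in v]
e = [0.5 * vi + rng.gauss(0, 1) for vi in v]
y = [2.0 + xi + ei for xi, ei in zip(x, e)]

b_hat, sse = ols(y, x)
var_x = cov(x, x)
rho_true = cov(x, e) / math.sqrt(var_x * cov(e, e))  # known here only because e is simulated
print(f"OLS: {b_hat:.3f}")
print(f"SSE/N = {sse / n:.3f} vs (1 - rho^2) * Var(e) = {(1 - rho_true ** 2) * cov(e, e):.3f}")
print(f"b-bc at the true rho: {b_bc(rho_true, b_hat, sse, n, var_x):.6f}")
```

Setting rho = 0 returns b-hat unchanged, which is the usual OLS assumption written in this notation.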
So, what can one do when there is an unknown parameter? There are two options. First, one can guess a value for ρ and recover the corresponding bias-corrected estimate, b-bc(ρ). However, this is no more satisfying than IV. IV proceeds by grabbing some variable, z, and guessing that Cov(z,e) = 0. Guessing a specific value for ρ = Corr(x,e) is no more appealing.
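Second, one can posit a range of plausible values of ρ and then report the range of the estimates (and the union of the confidence intervals). This is a partial identification approach. Note that since ρ must lie in the interval [-1,1], examining the bias-corrected estimates over this entire interval assures one (subject to the usual sampling error) that the true value lies within. The catch is that the correction term grows without bound as ρ approaches ±1 (note the 1-ρ^2 in the denominator), so bounds over the full interval are uninformative; the approach has bite only when a narrower range of ρ is plausible.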
This is the strategy advocated in the papers by Kiviet and implemented in the kinkyreg command. They call their procedure kinky least squares (KLS).
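Because ρ/SQRT(1-ρ^2) is strictly increasing on (-1,1), b-bc(ρ) is strictly decreasing in ρ, so the range of estimates over any interval of ρ comes from its two endpoints. The sketch below uses that fact. It is a hypothetical illustration, not the kinkyreg code: the function names and the OLS summary numbers are made up, and it reports point-estimate bounds only, without the union of confidence intervals.

```python
import math


def b_bc(rho, b_hat, sse, n, var_x):
    """Bias-corrected OLS slope for an assumed rho = Corr(x,e), -1 < rho < 1."""
    return b_hat - rho * math.sqrt((sse / n) / ((1.0 - rho ** 2) * var_x))


def kls_style_bounds(rho_lo, rho_hi, b_hat, sse, n, var_x):
    """Range of b-bc(rho) over rho in [rho_lo, rho_hi].

    rho / sqrt(1 - rho^2) is strictly increasing on (-1, 1), so b-bc(rho) is
    strictly decreasing and the bounds come from the two endpoints of the range."""
    if not -1.0 < rho_lo <= rho_hi < 1.0:
        raise ValueError("need -1 < rho_lo <= rho_hi < 1")
    return b_bc(rho_hi, b_hat, sse, n, var_x), b_bc(rho_lo, b_hat, sse, n, var_x)


# Hypothetical OLS output (not taken from any real data set)
b_hat, sse, n, var_x = 1.25, 10_000.0, 10_000, 2.0
for r in (0.1, 0.3, 0.5, 0.9, 0.99):
    lo, hi = kls_style_bounds(-r, r, b_hat, sse, n, var_x)
    print(f"rho in [{-r:+.2f}, {r:+.2f}]  ->  b in [{lo:.3f}, {hi:.3f}]")
```

Printing the bounds for progressively wider symmetric intervals shows how quickly they open up as |ρ| nears 1.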
In the papers, they discuss the behavior of their bias-corrected approach, situations with multiple endogenous regressors, and partial identification of the coefficients on exogenous covariates that may be in the model.
As I said at the outset, I am still processing this new work. But I do find it noteworthy that the Kiviet papers do not cite Krauth (2016), which, at first blush, seems quite similar. Krauth also works with OLS, but his approach has more of the flavor of Altonji et al. (2005). Krauth includes exogenous controls, w, in the model from the start,
y = a + bx + wg + e,
where w is a vector of exogenous controls and g is the corresponding coefficient vector, and expresses
Corr(x,e) = λCorr(x,wg).
If λ = 0, then x is exogenous and OLS is unbiased. If λ = 1, then this is the case where selection on unobserved variables is equal to selection on observed variables as in Altonji et al. (2005). And so on. Krauth then partially identifies b by considering a range of plausible values for λ. Krauth is thus also expressing the bias term as a function of a single, unknown parameter in a way that makes it reasonably easy to think about plausible values of this parameter. Sound familiar?
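Krauth's restriction can also be turned into a few lines of code. The sketch below is a brute-force illustration of the identification logic only, not the rcr implementation, and it assumes a single exogenous control w (uncorrelated with e) to keep the algebra short; moments, implied_lambda, and identified_set are hypothetical names, and the data are simulated. For each candidate value of b (beta in the code) on a grid, it recovers g by regressing y - bx on w, computes the λ that candidate implies, and keeps the candidates whose λ falls in an assumed plausible range, here [0, 0.6].

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    # Sample covariance dividing by n
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def moments(y, x, w):
    """Sample (co)variances used below, for a single control w."""
    return {"xx": cov(x, x), "xy": cov(x, y), "xw": cov(x, w),
            "yy": cov(y, y), "yw": cov(y, w), "ww": cov(w, w)}


def implied_lambda(beta, m):
    """lambda = Corr(x,e) / Corr(x, g*w) implied by a candidate beta, assuming Cov(w,e) = 0.

    Given beta, g is the slope from regressing y - beta*x on w, and e is what is left over.
    Returns None where the ratio is undefined."""
    g = (m["yw"] - beta * m["xw"]) / m["ww"]
    if abs(g) < 1e-12 or m["xw"] == 0:
        return None  # Corr(x, g*w) is zero or undefined
    cov_xe = m["xy"] - beta * m["xx"] - g * m["xw"]
    var_e = m["yy"] - 2 * beta * m["xy"] + beta ** 2 * m["xx"] - g ** 2 * m["ww"]
    if var_e <= 0:
        return None
    corr_xe = cov_xe / math.sqrt(m["xx"] * var_e)
    corr_x_index = math.copysign(1.0, g) * m["xw"] / math.sqrt(m["xx"] * m["ww"])
    return corr_xe / corr_x_index


def identified_set(lam_lo, lam_hi, m, grid):
    """Grid values of beta whose implied lambda lies in [lam_lo, lam_hi] (not necessarily an interval)."""
    out = []
    for beta in grid:
        lam = implied_lambda(beta, m)
        if lam is not None and lam_lo <= lam <= lam_hi:
            out.append(beta)
    return out


# Hypothetical DGP: true beta = 1, g = 1, and w is uncorrelated with e
rng = random.Random(3)
n = 20_000
w = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
x = [wi + vi for wi, vi in zip(w, v)]
e = [0.5 * vi + rng.gauss(0, 1) for vi in v]
y = [2.0 + xi + wi + ei for xi, wi, ei in zip(x, w, e)]

m = moments(y, x, w)
grid = [round(-1.0 + 0.01 * k, 2) for k in range(501)]  # beta from -1.00 to 4.00
kept = identified_set(0.0, 0.6, m, grid)

# Collapse consecutive grid points into [start, end] runs to expose any gaps
runs = []
for beta in kept:
    if runs and abs(beta - runs[-1][1] - 0.01) < 1e-9:
        runs[-1][1] = beta
    else:
        runs.append([beta, beta])
print(f"implied lambda at the true beta: {implied_lambda(1.0, m):.3f}")
print("beta values consistent with lambda in [0, 0.6]:", runs)
```

In this made-up design, the population value of λ at the true b is 0.5/SQRT(1.25) ≈ 0.45, and λ = 0 picks out the OLS coefficient from the regression that includes w. Working through the population algebra, the set for λ in [0, 0.6] is roughly [0.75, 1.5] plus a sliver (2, 2.25] just past the point where g changes sign, so the printed runs should show a gap. That is why the sketch returns grid points rather than just a minimum and maximum.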
The best part is that Krauth also has code available in Stata; the command name is rcr.
So, on this final night of Hanukkah, this is my present to Khoa Vu, who let me know on Twitter how much he was missing reading new blog posts! Hopefully this one brings a little music into your lives.
Let's get through the rest of 2020 together.
References
Kiviet, J.F. (2020a), "Testing the Impossible: Identifying Exclusion Restrictions," Journal of Econometrics, 218, 294-316
Kiviet, J.F. (2020b), "Instrument-Free Inference Under Confined Regressor Endogeneity and Mild Regularity," unpublished manuscript
Kiviet, J.F. and S. Kripfganz (2020), "Reassessment of Classic Case Studies in Labor Economics with New Instrument-Free Methods," unpublished manuscript
Krauth, B. (2016), "Bounding a Linear Causal Effect Using Relative Correlation Restrictions," Journal of Econometric Methods, 5, 117-141
Kripfganz, S. and J.F. Kiviet (2020), "kinkyreg: Instrument-Free Inference for Linear Regression Models with Endogenous Regressors," unpublished manuscript