a cappella

As the year 2020 finally -- FINALLY! -- nears an end, hope is on the horizon. That person currently residing in the White House is about to be evicted, and a vaccine is en route. Someday soon, we may actually get to see one another in person.


I apologize to all those in hearing distance (which, thanks to social distancing, is not many), but I feel like singing! Singing a cappella style.


Why a cappella? I am glad you asked. A cappella music is music to the ears of empirical researchers.  

Because no instruments are required!


I know, I know, but hear me out. I was thinking about a lack of instruments this week because I stumbled across a few new papers by Jan Kiviet (some with co-authors) that address the issue of endogeneity. And, these papers do not require, you guessed it, an instrument. 
Kiviet (2020a,b) and Kiviet and Kripfganz (2020) develop and apply the estimation approach. Kripfganz and Kiviet (2020) outline a new Stata command, kinkyreg, that makes the estimator immediately accessible to applied researchers. Well, researchers who are not beholden to R, at least!


I am still processing the finer points of the approach, but let me sketch the basic idea. Take a simple (linear in parameters) model with a single regressor,

y = a + bx + e.

Ordinary Least Squares (OLS) provides an unbiased and consistent estimate of b if ρ = Corr(x,e) equals zero. Alternatively, Instrumental Variables (IV) provides a biased but consistent estimate of b if Corr(z,x) is not equal to zero and Corr(z,e) equals zero, where z -- the instrument -- is an observed variable.
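To make the OLS-versus-IV contrast concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: b = 2, a first stage x = z + v, a correlation of 0.5 between the first-stage error v and e, and normal draws throughout. The helper names (cov, simulate) are hypothetical. Under this design Corr(x,e) is not zero, so OLS should settle near b + Cov(x,e)/Var(x) = 2 + 0.5/2 = 2.25, while the just-identified IV estimator Cov(z,y)/Cov(z,x) should settle near 2.

```python
import math
import random


def cov(u, v):
    """Sample covariance (dividing by n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (c - mv) for a, c in zip(u, v)) / len(u)


def simulate(n, b, corr_ve, pi, seed=0):
    """Simulate y = 1 + b*x + e with x = pi*z + v and Corr(v, e) = corr_ve.

    z is a valid instrument here: it moves x (through pi) and is independent of e.
    """
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n):
        zi, vi = rng.gauss(0, 1), rng.gauss(0, 1)
        ei = corr_ve * vi + math.sqrt(1 - corr_ve**2) * rng.gauss(0, 1)
        xi = pi * zi + vi
        x.append(xi)
        y.append(1.0 + b * xi + ei)
        z.append(zi)
    return x, y, z


x, y, z = simulate(n=50_000, b=2.0, corr_ve=0.5, pi=1.0)
b_ols = cov(x, y) / cov(x, x)  # OLS slope: inconsistent because Corr(x, e) != 0
b_iv = cov(z, y) / cov(z, x)   # just-identified IV (Wald) estimator
print(f"OLS: {b_ols:.3f}  IV: {b_iv:.3f}")
```

Shrinking pi toward zero in this sketch reproduces the weak-instrument problem: Cov(z,x) approaches zero and the IV estimate becomes erratic.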

I have discussed the pros and cons of IV on several occasions. See here, here, here, and here. Aside from the fact that IV is biased in finite samples, it is now well appreciated that IV can behave poorly, even asymptotically, when instruments are only weakly correlated with the endogenous regressor (i.e., Corr(z,x) ≈ 0); that the condition that Corr(z,e) equals zero is untestable with a single instrument; and that IV can be highly misleading if any of the requirements for a valid instrument are violated.

One result has been the development of many new approaches that assess the sensitivity of IV estimates to deviations from the strict conditions required for consistency. See my prior post here. More typically, however, the result has been to treat IV like the Rodney Dangerfield of estimators.


Well, Kiviet's new papers are no exception. The idea is to recover an unbiased and consistent estimate of b when x is endogenous without reliance on an instrument. A cappella style.

The solution is actually quite simple and is similar in spirit to many of the methods mentioned above that assess the sensitivity of IV estimates to deviations from the conditions required for consistency. Recall that the model is

y = a + bx + e.

Let b-hat represent the OLS estimate of b. Its probability limit, the value it converges to as the sample grows, is

plim b-hat = b + Cov(x,e)/Var(x).

The bias, Cov(x,e)/Var(x), is non-zero when x is endogenous. If this bias term were estimable, then a bias-corrected OLS estimate, b-bc, given by

b-bc = b-hat - Cov(x,e)/Var(x),

would be consistent for b. However, while Var(x) is directly estimable from the data, Cov(x,e) is not. Moreover, because covariances are unbounded, even thinking about plausible values for this term is not straightforward.
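A small simulation shows why the correction would work if only e were observed. This is a sketch under assumed values (b = 2, standard normal x, Var(e) = 1, Corr(x,e) = 0.4). The errors e are available here only because the data are simulated. Because sample covariances are linear, the in-sample identity b-hat = b + Cov(x,e)/Var(x) holds exactly, so subtracting the sample bias recovers b up to floating-point rounding.

```python
import math
import random


def cov(u, v):
    """Sample covariance (dividing by n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (c - mv) for a, c in zip(u, v)) / len(u)


rng = random.Random(42)
n, a, b, rho = 50_000, 1.0, 2.0, 0.4

# Standard normal x and an error e with Var(e) = 1 and Corr(x, e) = rho
x = [rng.gauss(0, 1) for _ in range(n)]
e = [rho * xi + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for xi in x]
y = [a + b * xi + ei for xi, ei in zip(x, e)]

b_hat = cov(x, y) / cov(x, x)  # OLS slope, pulled toward b + rho = 2.4
bias = cov(x, e) / cov(x, x)   # Cov(x,e)/Var(x): requires the true errors e
b_bc = b_hat - bias            # infeasible bias-corrected estimate

print(f"b-hat = {b_hat:.3f}, bias = {bias:.3f}, b-bc = {b_bc:.6f}")
```

In real data the middle line cannot be computed, which is exactly the problem the rest of the argument addresses.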

But we can rewrite the bias term as

Cov(x,e)/Var(x) = ρSD(e)SD(x)/Var(x) = ρSD(e)/SD(x).

Now, at least we know that ρ must lie in the interval [-1,1]. What about SD(e)? Well, one might think about estimating it from the variance of the OLS residuals. However, because the OLS estimate of b is biased, part of e is wrongly attributed to x, so the residual variance understates Var(e) even in large samples. Instead, Kiviet shows that a consistent estimate of Var(e) also depends on the value of ρ. Specifically,

Var(e) = (SSE/N)/(1-ρ^2),

where SSE is the usual OLS sum of squared residuals and N is the sample size. Combining these last two statements, we can write the bias as

Cov(x,e)/Var(x) = ρSQRT[(SSE/N)/((1-ρ^2)Var(x))]

where the only unknown is ρ. 
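The step attributed to Kiviet, that SSE/N understates Var(e) by the factor 1-ρ^2, can be checked with a short large-sample derivation. This is my own sketch rather than the derivation in the papers. The shorthand β for the bias term and û_i for the i-th OLS residual are introduced here only for the derivation.

```latex
\begin{aligned}
\operatorname{plim}\hat b &= b + \beta, \qquad \beta \equiv \frac{\operatorname{Cov}(x,e)}{\operatorname{Var}(x)} \\
\hat u_i &\xrightarrow{p} e_i - \mathrm{E}[e] - \beta\,(x_i - \mathrm{E}[x]) \\
\operatorname{plim}\frac{SSE}{N} &= \operatorname{Var}(e) - 2\beta\operatorname{Cov}(x,e) + \beta^2\operatorname{Var}(x) \\
&= \operatorname{Var}(e) - \frac{\operatorname{Cov}(x,e)^2}{\operatorname{Var}(x)} \\
&= \operatorname{Var}(e) - \rho^2\operatorname{Var}(e) = (1-\rho^2)\operatorname{Var}(e).
\end{aligned}
```

Rearranging the last line gives Var(e) = plim(SSE/N)/(1-ρ^2). Replacing the probability limit with SSE/N itself yields the estimate above, and ρ times the square root of Var(e)/Var(x) gives the bias formula.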

So, what can one do when there is an unknown parameter? There are two options. First, one can guess a value for ρ and recover the corresponding bias-corrected estimate, b-bc(ρ). However, this is no more satisfying than IV. IV proceeds by grabbing some variable, z, and guessing that Cov(z,e) = 0. Guessing that Corr(x,e) = ρ is no more appealing.
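For a single regressor, the first option takes only a few lines. The sketch below computes b-bc(ρ) from the formula above, with all moments dividing by N. The function name bias_corrected_slope is hypothetical. It is not the kinkyreg implementation, and it ignores inference entirely. The data are simulated with b = 2 and Corr(x,e) = 0.4, so ρ = 0 returns plain OLS (near 2.4), ρ = 0.2 should give roughly 2.21, and the true ρ should give roughly 2.

```python
import math
import random


def bias_corrected_slope(x, y, rho):
    """Return b-bc(rho) = b-hat - rho * sqrt((SSE/N) / ((1 - rho^2) * Var(x)))
    for y = a + b*x + e under the assumption Corr(x, e) = rho.

    Moments divide by N, matching the SSE/N expression in the text.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    b_hat = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n / var_x
    a_hat = my - b_hat * mx
    sse = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    return b_hat - rho * math.sqrt((sse / n) / ((1 - rho**2) * var_x))


# Simulated data where the truth is known: b = 2 and Corr(x, e) = 0.4
rng = random.Random(1)
true_rho = 0.4
x = [rng.gauss(0, 1) for _ in range(50_000)]
e = [true_rho * xi + math.sqrt(1 - true_rho**2) * rng.gauss(0, 1) for xi in x]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

for rho in (0.0, 0.2, true_rho):
    print(rho, round(bias_corrected_slope(x, y, rho), 3))
```

The catch is visible in the last line: the answer is only as good as the guessed ρ.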


Second, one can posit a range of plausible values of ρ and then report the range of the resulting estimates (and the union of the confidence intervals). This is a partial identification approach. Note that since ρ must lie in the interval [-1,1], examining the bias-corrected estimates over this entire interval assures one (subject to the usual sampling error) that the true value lies within the reported range.

This is the strategy advocated in the papers by Kiviet and implemented in the accompanying Stata code. They call their procedure kinky least squares (KLS).
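The partial identification step can be sketched for the single-regressor case. The sketch reports point-estimate bounds only, with no confidence intervals, and the function name kls_style_bounds is hypothetical rather than part of kinkyreg. Because ρ/SQRT(1-ρ^2) is increasing in ρ, b-bc(ρ) is decreasing, so the bounds over [ρ_lo, ρ_hi] sit at the two endpoints and no grid search is needed. The simulated data again assume b = 2 and Corr(x,e) = 0.4.

```python
import math
import random


def kls_style_bounds(x, y, rho_lo, rho_hi):
    """Range of b-bc(rho) for rho in [rho_lo, rho_hi], with -1 < rho_lo <= rho_hi < 1.

    b-bc(rho) = b-hat - rho * sqrt((SSE/N) / ((1 - rho^2) * Var(x))) is
    decreasing in rho, so the extremes are attained at the endpoints.
    Point estimates only: no confidence intervals.
    """
    if not -1.0 < rho_lo <= rho_hi < 1.0:
        raise ValueError("need -1 < rho_lo <= rho_hi < 1")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    b_hat = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n / var_x
    a_hat = my - b_hat * mx
    sse_n = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y)) / n

    def b_bc(rho):
        return b_hat - rho * math.sqrt(sse_n / ((1 - rho**2) * var_x))

    return b_bc(rho_hi), b_bc(rho_lo)  # (lower bound, upper bound)


rng = random.Random(3)
true_rho = 0.4  # unknown in practice; used here only to build the data
x = [rng.gauss(0, 1) for _ in range(50_000)]
e = [true_rho * xi + math.sqrt(1 - true_rho**2) * rng.gauss(0, 1) for xi in x]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

for lo, hi in [(0.0, 0.6), (-0.6, 0.6), (-0.99, 0.99)]:
    lower, upper = kls_style_bounds(x, y, lo, hi)
    print(f"rho in [{lo}, {hi}]: b in [{lower:.2f}, {upper:.2f}]")
```

With these data the bounds should come out near [1.7, 2.4] for ρ in [0, 0.6], but near [-4.0, 8.8] for ρ in [-0.99, 0.99]. The correction term grows without bound as ρ approaches -1 or 1, so covering nearly all of [-1,1] guarantees the truth is inside at the cost of very wide bounds. Informative answers therefore require a defensible, narrower range for ρ.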


In the papers, they discuss the behavior of their bias-corrected approach, situations with multiple endogenous regressors, and partial identification of the coefficients on exogenous covariates that may be in the model. 


As I said at the outset, I am still processing this new work. But I do find it noteworthy that the Kiviet papers do not cite Krauth (2016), which, at first blush, seems quite similar. Krauth also considers OLS, but his approach has more of the flavor of Altonji et al. (2005). Krauth includes exogenous controls, w, in the model from the start:

y = a + bx + wg + e,

where w is a vector of exogenous controls and g is the corresponding coefficient vector. He then expresses the correlation between x and e as

Corr(x,e) = λCorr(x,wg).

If λ = 0, then x is exogenous and OLS is unbiased. If λ = 1, then selection on unobserved variables equals selection on observed variables, as in Altonji et al. (2005). And so on. Krauth then partially identifies b by considering a range of plausible values for λ. Krauth is thus also expressing the bias term as a function of a single unknown parameter, in a way that makes it reasonably easy to think about plausible values of that parameter. Sound familiar?
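Here is a brute-force sketch of the relative correlation idea. It is my own illustration, not the algorithm in Krauth's rcr command, and it rests on several assumptions. There is a single control w that is uncorrelated with e. There is no inference. The range λ in [0, 1] is chosen purely for illustration. The helper lambda_curve is hypothetical. For each candidate b, it backs out the g and e that would hold if b were true, computes the implied λ = Corr(x,e)/Corr(x,wg), and keeps the candidates whose λ falls in the chosen range. In the simulated design below, the population λ is 0.5/SQRT(1.25) ≈ 0.45, and the population bounds for λ in [0, 1] work out to b in [1.5, 2.25].

```python
import math
import random


def cov(u, v):
    """Sample covariance (dividing by n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (c - mv) for a, c in zip(u, v)) / len(u)


def lambda_curve(x, w, y):
    """Hypothetical helper: return b -> implied lambda = Corr(x, e) / Corr(x, w*g)
    for y = a + b*x + w*g + e with ONE control w assumed uncorrelated with e.

    For a candidate b, g is recovered by regressing y - b*x on w; by linearity
    that regression's residual is r_y - b*r_x, where r_y and r_x are y and x
    residualized on w, so every moment below is computed only once.
    """
    var_w = cov(w, w)
    g_y, g_x = cov(w, y) / var_w, cov(w, x) / var_w
    mw = sum(w) / len(w)
    r_y = [yi - g_y * (wi - mw) for yi, wi in zip(y, w)]
    r_x = [xi - g_x * (wi - mw) for xi, wi in zip(x, w)]
    sd_x = math.sqrt(cov(x, x))
    c_xry, c_xrx = cov(x, r_y), cov(x, r_x)
    v_yy, v_yx, v_xx = cov(r_y, r_y), cov(r_y, r_x), cov(r_x, r_x)
    corr_xw = cov(x, w) / (sd_x * math.sqrt(var_w))

    def implied_lambda(b):
        g = g_y - b * g_x                          # implied coefficient on w
        var_e = v_yy - 2 * b * v_yx + b * b * v_xx
        corr_xe = (c_xry - b * c_xrx) / (sd_x * math.sqrt(var_e))
        corr_xwg = math.copysign(1.0, g) * corr_xw  # Corr(x, w*g); undefined if g == 0
        return corr_xe / corr_xwg

    return implied_lambda


rng = random.Random(7)
x, w, y = [], [], []
for _ in range(20_000):
    wi, s = rng.gauss(0, 1), rng.gauss(0, 1)  # s: unobserved confounder
    xi = wi + s + rng.gauss(0, 1)             # x depends on w and s
    ei = 0.5 * s + rng.gauss(0, 1)            # e depends on s, not on w
    x.append(xi)
    w.append(wi)
    y.append(1.0 + 2.0 * xi + 1.0 * wi + ei)  # true b = 2, g = 1

lam = lambda_curve(x, w, y)
grid = [1.0 + k / 100 for k in range(151)]        # candidate b from 1.0 to 2.5
kept = [b for b in grid if 0.0 <= lam(b) <= 1.0]  # restrict lambda to [0, 1]
print(f"implied lambda at b = 2: {lam(2.0):.3f}")
print(f"b in [{min(kept):.2f}, {max(kept):.2f}]")
```

The grid stops at 2.5 on purpose. In this design the implied g hits zero at b = 3, where λ is undefined and its sign flips.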


The best part is that Krauth also has Stata code available: the command rcr.

So, on this final night of Hanukkah, this is my present to Khoa Vu, who let me know on Twitter how much he was missing reading new blog posts! Hopefully this one brings a little music into your lives.


Let's get through the rest of 2020 together.


References

Altonji, J.G., T.E. Elder, and C.R. Taber (2005), "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools," Journal of Political Economy, 113, 151-184

Kiviet, J.F. (2020a), "Testing the Impossible: Identifying Exclusion Restrictions," Journal of Econometrics, 218, 294-316

Kiviet, J.F. (2020b), "Instrument-Free Inference Under Confined Regressor Endogeneity and Mild Regularity," unpublished manuscript

Kiviet, J.F. and S. Kripfganz (2020), "Reassessment of Classic Case Studies in Labor Economics with New Instrument-Free Methods," unpublished manuscript

Krauth, B. (2016), "Bounding a Linear Causal Effect Using Relative Correlation Restrictions," Journal of Econometric Methods, 5, 117-141

Kripfganz, S. and J.F. Kiviet (2020), "kinkyreg: Instrument-Free Inference for Linear Regression Models with Endogenous Regressors," unpublished manuscript


