(Rational) Addictions

I have an addiction. Perhaps more than one. But one for sure. And, so, when this image made its rounds on Twitter this week, with everyone adding their favorite econometric caption to what this person is winking about, I could not resist. I tried. I failed. I relented. I tweeted what I think she is saying:

"No measurement error."

Low-hanging fruit, I know. Still, I feel compelled to shout "measurement error" until researchers listen more.

Then, as only an enabler can do, my addiction was fed when Carlos Carpio Ochoa tweeted a response asking for a blog post on addressing measurement error and endogeneity in a binary covariate, such as treatment assignment. Of course, I do not possess the willpower to resist. So, here we are.

In a very early post, I discuss the difficulties created by measurement error in an otherwise exogenous binary covariate. Simulations show that an apparently valid instrument does not yield consistent estimates of the true coefficient in this case. The intuition behind this result is that measurement error in a binary variable cannot be classical measurement error: a true 1 can only be misreported as a 0, and a true 0 only as a 1, so the measurement error must be negatively correlated with the truth (Black et al. 2000). Because the truth is correlated with the measurement error, any instrument that is correlated with the truth will also be correlated with the measurement error, rendering the instrument invalid.
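A few lines of algebra make the Black et al. (2000) point concrete. Write the error as u = D - D*, let p = Pr(D* = 1), and let fp and fn be the false-positive and false-negative rates (names introduced here only for illustration). Then Cov(u, D*) = -p(1 - p)(fp + fn), which can never be positive and is strictly negative whenever there is any misclassification and 0 < p < 1. The Python sketch below computes that population covariance and checks it by simulation; the parameter values are hypothetical.

```python
import random


def error_truth_cov(p, fp, fn):
    """Population Cov(u, D*) for the error u = D - D* in a binary variable.

    p  = Pr(D* = 1)
    fp = Pr(D = 1 | D* = 0)   (false-positive rate)
    fn = Pr(D = 0 | D* = 1)   (false-negative rate)
    """
    # u = +1 for a false positive, -1 for a false negative, 0 otherwise.
    e_u = (1 - p) * fp - p * fn
    # u * D* is nonzero (= -1) only for a false negative.
    e_u_dstar = -p * fn
    return e_u_dstar - e_u * p  # Cov = E[u D*] - E[u] E[D*]


def simulated_cov(p, fp, fn, n=100_000, seed=1):
    """Monte Carlo check of error_truth_cov using the sample covariance."""
    rng = random.Random(seed)
    us, ds = [], []
    for _ in range(n):
        d_star = 1 if rng.random() < p else 0
        flip_prob = fn if d_star == 1 else fp
        d = 1 - d_star if rng.random() < flip_prob else d_star
        us.append(d - d_star)
        ds.append(d_star)
    mean_u, mean_d = sum(us) / n, sum(ds) / n
    return sum((u - mean_u) * (d - mean_d) for u, d in zip(us, ds)) / n


print(error_truth_cov(0.4, 0.05, 0.10))  # about -0.036
print(simulated_cov(0.4, 0.05, 0.10))    # close to the line above
```

Since the error is negatively correlated with D*, anything correlated with D*, including a would-be instrument, is generally correlated with u as well.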

Well, it's one thing to point out a problem. It's another to provide a solution. As I mentioned in the prior post (without going into detail), Nguimkeu et al. (2019) provide a solution that applied researchers should get to know. Their estimator allows for both endogeneity in the true binary covariate and (one-sided) measurement error. Even better, it is trivial to estimate in Stata.

The data-generating process (DGP) is assumed to be

y = xβ + αD* + ε

D* = I(zθ + ν > 0)

δ = I(wγ + υ > 0)

D = δD*

where y is the outcome, D* is the (unobserved) true binary covariate, δ is an (unobserved) indicator for correct measurement, D is the observed binary covariate, and I() is the indicator function. The formulation places some strong restrictions on the DGP and strong data requirements on the researcher.

Specifically, it assumes the measurement error is one-sided: if D* = 0, then D = 0, but if D* = 1, then D may be either 0 or 1. In other words, the model allows for the presence of false negatives, but not false positives. This may be reasonable in situations where there is stigma associated with D* and, therefore, individuals are only likely to misreport the absence of D* and not its presence. Aside from this, identification requires two instruments: a variable in z that is not included in x and a variable in w that is not in z. Finally, ε needs to be independent of x and z (but can be correlated with w); ν and υ need to be independent of x, z, and w.
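To make the DGP concrete, here is a minimal Python simulation of it. Every parameter value, and the choice of z = (1, z1) and w = (1, w1) with single excluded variables z1 and w1, is a hypothetical illustration, not something taken from Nguimkeu et al. (2019). Setting Corr(ε, ν) = rho makes D* endogenous, and because D = δD*, the simulated data contain false negatives but never false positives.

```python
import math
import random


def simulate(n, alpha=2.0, beta=1.0, theta=(0.0, 1.0), gamma=(1.0, 1.0),
             rho=0.5, seed=42):
    """Draw n observations from the DGP above.

    Hypothetical set-up: x, z1, w1 are independent N(0, 1); z = (1, z1) and
    w = (1, w1), so z1 is the instrument excluded from x and w1 the variable
    in w excluded from z. Corr(eps, nu) = rho makes D* endogenous, while
    upsilon is independent of everything else.
    Returns tuples (y, x, z1, w1, d_star, d).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, z1, w1 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        nu, upsilon = rng.gauss(0, 1), rng.gauss(0, 1)
        eps = rho * nu + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        d_star = int(theta[0] + theta[1] * z1 + nu > 0)      # true treatment
        delta = int(gamma[0] + gamma[1] * w1 + upsilon > 0)  # 1 = reported correctly
        d = delta * d_star                                   # observed treatment
        y = beta * x + alpha * d_star + eps
        rows.append((y, x, z1, w1, d_star, d))
    return rows


data = simulate(10_000)
share_true = sum(r[4] for r in data) / len(data)
share_obs = sum(r[5] for r in data) / len(data)
print(share_true, share_obs)  # observed share is smaller: false negatives only
```

Only y, x, z1, w1, and d would be available to a researcher; d_star is kept here so the one-sided error can be checked directly.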

Given this setup, estimation is accomplished using a simple two-step approach. In the first step, Poirier's (1980) partial observability model is estimated under the assumption that the errors are jointly normal. This model is similar to a bivariate probit, but with only a single outcome. In this case, we have

D = δD* = I(zθ + ν > 0, wγ + υ > 0)

which is estimable by maximum likelihood. In Stata, this is estimated using the -biprobit, partial- command.
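For intuition about what -biprobit, partial- maximizes, here is a Python sketch of the partial observability log-likelihood. It makes one simplifying assumption that the general model does not: ν and υ are independent standard normals, so Pr(D = 1 | z, w) = Φ(zθ)Φ(wγ). With correlated errors, the product is replaced by the bivariate normal CDF. The sketch only evaluates the likelihood; an actual estimator would hand it to a numerical optimizer. The data and parameter values are hypothetical.

```python
import math
import random


def norm_cdf(t):
    """Standard normal CDF, Phi(t), via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def partial_obs_loglik(params, z1s, w1s, ds):
    """Log-likelihood of D = I(z*theta + nu > 0, w*gamma + upsilon > 0),
    with z = (1, z1), w = (1, w1) and params = (theta0, theta1, gamma0, gamma1).

    SIMPLIFYING ASSUMPTION: nu and upsilon are independent N(0, 1), so
    Pr(D = 1 | z, w) = Phi(z*theta) * Phi(w*gamma).
    """
    t0, t1, g0, g1 = params
    ll = 0.0
    for z1, w1, d in zip(z1s, w1s, ds):
        p = norm_cdf(t0 + t1 * z1) * norm_cdf(g0 + g1 * w1)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        ll += math.log(p) if d == 1 else math.log(1.0 - p)
    return ll


# Hypothetical data generated with theta = (0, 1) and gamma = (1, 1).
rng = random.Random(0)
z1s, w1s, ds = [], [], []
for _ in range(5_000):
    z1, w1 = rng.gauss(0, 1), rng.gauss(0, 1)
    d_star = z1 + rng.gauss(0, 1) > 0
    delta = 1 + w1 + rng.gauss(0, 1) > 0
    z1s.append(z1)
    w1s.append(w1)
    ds.append(int(d_star and delta))

ll_true = partial_obs_loglik((0.0, 1.0, 1.0, 1.0), z1s, w1s, ds)
ll_wrong = partial_obs_loglik((0.5, 0.5, 0.0, 2.0), z1s, w1s, ds)
print(ll_true, ll_wrong)
```

With these simulated draws, the log-likelihood at the values used to generate the data should exceed its value at the arbitrary alternative; note that only D, not D* or δ, enters the likelihood.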

In the second step,

y = xβ + αD*-hat + ε

is estimated by Ordinary Least Squares (OLS) where

D*-hat = Φ(zθ-hat)

and Φ() is the standard normal CDF. Appropriate standard errors can be obtained by bootstrapping the two-step estimator. The estimate of α is consistent under the required assumptions.
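The second step is just OLS with Φ(zθ-hat) in place of D*. It works because, when ε is independent of x and z, E[y | x, z] = xβ + αΦ(zθ), even though ε is correlated with ν. A self-contained Python sketch follows, with two stated shortcuts: the true θ is plugged in as a stand-in for the first-step estimate, and the data (α = 2, β = 1, Corr(ε, ν) = 0.5) are hypothetical. The observed D does not appear in this step at all; it enters only through θ-hat. Bootstrapped standard errors would repeat both steps on each resample and are not shown.

```python
import math
import random


def norm_cdf(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def second_step(ys, xs, z1s, theta_hat):
    """Regress y on (1, x, Phi(z*theta_hat)); the last coefficient estimates alpha."""
    t0, t1 = theta_hat
    X = [(1.0, x, norm_cdf(t0 + t1 * z1)) for x, z1 in zip(xs, z1s)]
    return ols(X, ys)


# Hypothetical data: alpha = 2, beta = 1, Corr(eps, nu) = 0.5.
rng = random.Random(7)
ys, xs, z1s = [], [], []
for _ in range(20_000):
    x, z1, nu = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    eps = 0.5 * nu + math.sqrt(1 - 0.5 ** 2) * rng.gauss(0, 1)
    d_star = int(z1 + nu > 0)
    ys.append(1.0 * x + 2.0 * d_star + eps)
    xs.append(x)
    z1s.append(z1)

# Stand-in for the first step: plug in the true theta = (0, 1).
coefs = second_step(ys, xs, z1s, theta_hat=(0.0, 1.0))
print(coefs)  # [intercept, beta_hat, alpha_hat]; alpha_hat should be near 2
```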

Indeed. And, while the authors do note the strength of the requirements for identification, they state: "To our knowledge, this paper is the first attempt to provide point estimates of treatment effects in the context of endogenous misreporting of a binary treatment variable."

That brings us to another possible solution when dealing with a mismeasured and endogenous binary covariate: partial identification. The general idea of partial identification is also covered in a prior post. Partial identification is my second addiction after measurement error. So, what happens when you put the two together?

To get a rough idea of the partial identification approach, consider the population Average Treatment Effect (ATE) of D* given by

E[Y(1) - Y(0)],

where Y(1) and Y(0) are the potential outcomes associated with treatment (denoted by D* = 1) and non-treatment (D* = 0). Focus on one component of this parameter, E[Y(1)]; the treatment of E[Y(0)] is analogous. By the law of total probability, E[Y(1)] can be written as

E[Y(1) | D* = 1]*Pr(D* = 1) + E[Y(1) | D* = 0]*Pr(D* = 0).

In principle, E[Y(1) | D* = 1], Pr(D* = 1), and Pr(D* = 0) can be estimated from the data, since Y = Y(1) for everyone with D* = 1. However, even in the absence of measurement error, E[Y(1) | D* = 0] can never be observed. This is the problem of the missing counterfactual. One can impose strong assumptions (e.g., independence or conditional independence assumptions) to point identify this quantity.

In a partial identification approach, E[Y(1)] may be bounded under weaker assumptions by replacing E[Y(1) | D* = 0] with upper and lower bounds based on more justifiable assumptions such as ones that allow for non-random selection into the treatment on the basis of unobserved variables. See, e.g., Manski (1990) and Lechner (1999).
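As the simplest case, suppose only that the outcome is known to lie in an interval [K_0, K_1] (notation introduced here). Replacing the unobserved E[Y(1) | D* = 0] with its logical extremes gives the worst-case bounds of Manski (1990):

```latex
\begin{aligned}
E[Y(1)] &\ge E[Y(1) \mid D^*=1]\Pr(D^*=1) + K_0 \Pr(D^*=0), \\
E[Y(1)] &\le E[Y(1) \mid D^*=1]\Pr(D^*=1) + K_1 \Pr(D^*=0).
\end{aligned}
```

These bounds impose nothing on how individuals select into treatment; their width, (K_1 - K_0)Pr(D* = 0), is exactly the weight on the missing counterfactual.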

Yes. Yes it is, Adam Goldberg. We can do the same to obtain bounds on E[Y(0)]. Bounds on the ATE are then obtained trivially using these bounds on the separate components.
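A minimal Python sketch of the combination step for the worst-case version, assuming the outcome lies in [k0, k1] and that the true treatment D* is observed. The lower bound on the ATE is the lower bound on E[Y(1)] minus the upper bound on E[Y(0)]; the upper bound is the reverse. The toy data are made up.

```python
def worst_case_ate_bounds(ys, ds, k0=0.0, k1=1.0):
    """Worst-case (Manski-style) bounds on E[Y(1) - Y(0)] when Y lies in
    [k0, k1] and ds holds the true treatment D* (no misreporting)."""
    n = len(ys)
    n1 = sum(ds)
    p1, p0 = n1 / n, (n - n1) / n
    m1 = sum(y for y, d in zip(ys, ds) if d == 1) / n1        # E[Y | D* = 1]
    m0 = sum(y for y, d in zip(ys, ds) if d == 0) / (n - n1)  # E[Y | D* = 0]
    # Unobserved counterfactual means replaced by the extremes k0 and k1.
    ey1 = (m1 * p1 + k0 * p0, m1 * p1 + k1 * p0)  # bounds on E[Y(1)]
    ey0 = (m0 * p0 + k0 * p1, m0 * p0 + k1 * p1)  # bounds on E[Y(0)]
    return ey1[0] - ey0[1], ey1[1] - ey0[0]


ys = [1, 0, 1, 1, 0, 0, 1, 0]
ds = [1, 1, 1, 0, 0, 0, 1, 0]
print(worst_case_ate_bounds(ys, ds))  # (-0.25, 0.75)
```

Whatever the data, these worst-case bounds have width k1 - k0 and always contain zero, which is why assumptions on selection (or on the ATE itself) are needed to sign the effect.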

Well, what if D* is unobserved and instead we observe D? In this case, none of the components of E[Y(1)] or E[Y(0)] are observed. However, assuming the outcome is also binary, we can rewrite each of these terms as functions of observable quantities and four misclassification probabilities:

Probability of a false positive with Y = 1
Probability of a false negative with Y = 1
Probability of a false positive with Y = 0
Probability of a false negative with Y = 0
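To see where the algebra comes from, write the four misclassification probabilities as joint probabilities (a labelling chosen here for illustration): π⁺_y = Pr(Y = y, D = 1, D* = 0) for false positives and π⁻_y = Pr(Y = y, D = 0, D* = 1) for false negatives. Splitting each latent cell by the value of D gives

```latex
\begin{aligned}
\Pr(Y=y, D^*=1) &= \Pr(Y=y, D=1) - \pi^{+}_{y} + \pi^{-}_{y}, \\
\Pr(Y=y, D^*=0) &= \Pr(Y=y, D=0) + \pi^{+}_{y} - \pi^{-}_{y}, \qquad y \in \{0, 1\}.
\end{aligned}
```

With a binary outcome, E[Y(1) | D* = 1]Pr(D* = 1) = Pr(Y = 1, D* = 1) and Pr(D* = 1) = Pr(Y = 1, D* = 1) + Pr(Y = 0, D* = 1), so every term other than the missing counterfactual becomes an observable probability plus or minus some of the π's.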

By considering different assumptions on these misclassification probabilities, it is easy enough (just some algebra) to derive bounds on the ATE. The bounds can be further tightened by considering assumptions on the treatment selection mechanism and/or the ATE itself.
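As one illustration, the Python sketch below assumes only that the four misclassification probabilities, treated here as joint probabilities such as Pr(Y = 1, D = 1, D* = 0), sum to at most a hypothetical cap q_max. For each admissible set of errors it rebuilds the latent joint distribution of (Y, D*), computes the worst-case ATE bounds for a binary outcome, and keeps the most extreme values. Because the bounds are linear in the error probabilities, a closed form exists; a coarse grid search is used only to keep the logic transparent. The observed probabilities are made up.

```python
from itertools import product


def ate_bounds_misreported(p_obs, q_max, steps=10):
    """Worst-case ATE bounds for a binary outcome when treatment may be misreported.

    p_obs[(y, d)] = observed Pr(Y = y, D = d). Misclassification
    probabilities are joint probabilities (an assumption of this sketch):
      fp_y = Pr(Y = y, D = 1, D* = 0)   false positives
      fn_y = Pr(Y = y, D = 0, D* = 1)   false negatives
    and are assumed only to sum to at most q_max.
    """
    grid = sorted({q_max * i / steps for i in range(steps + 1)})
    lo, hi = float("inf"), float("-inf")
    for fp1, fn1, fp0, fn0 in product(grid, repeat=4):
        if fp1 + fn1 + fp0 + fn0 > q_max + 1e-12:
            continue  # violates the cap on total misclassification
        if (fp1 > p_obs[(1, 1)] or fn1 > p_obs[(1, 0)]
                or fp0 > p_obs[(0, 1)] or fn0 > p_obs[(0, 0)]):
            continue  # cannot move more mass than an observed cell holds
        # Latent joint probabilities Pr(Y = y, D* = d).
        s11 = p_obs[(1, 1)] - fp1 + fn1
        s10 = p_obs[(1, 0)] + fp1 - fn1
        s01 = p_obs[(0, 1)] - fp0 + fn0
        s00 = p_obs[(0, 0)] + fp0 - fn0
        pr_treated, pr_untreated = s11 + s01, s10 + s00
        # Worst-case bounds with Y in {0, 1}.
        ey1 = (s11, s11 + pr_untreated)  # bounds on E[Y(1)]
        ey0 = (s10, s10 + pr_treated)    # bounds on E[Y(0)]
        lo = min(lo, ey1[0] - ey0[1])
        hi = max(hi, ey1[1] - ey0[0])
    return lo, hi


p_obs = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.4}
print(ate_bounds_misreported(p_obs, 0.0))   # no misreporting allowed
print(ate_bounds_misreported(p_obs, 0.05))  # bounds widen
```

Allowing misreporting can only widen the bounds; assumptions on selection or on the sign of the ATE would then be layered on top to tighten them.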

Examples of this approach include Kreider et al. (2012) and Millimet & Roy (2015). The best news? Yup. It's easy to do in Stata. See McCarthy et al. (2015).

Partial identification is perfectly suited for assessing the impact of a binary covariate when the variable is endogenous and potentially misreported. Of course, the downside is that partial identification is not point identification. But point identification under faulty assumptions is not identification either. Researchers must be wary of their own addiction.

To incredible certitude.

References

Black, D.A., M.C. Berger, and F.A. Scott (2000), "Bounding Parameter Estimates with Nonclassical Measurement Error," Journal of the American Statistical Association, 95, 739-748.

Kreider, B., J.V. Pepper, C. Gundersen, and D. Jolliffe (2012), "Identifying the Effects of SNAP (Food Stamps) on Child Health Outcomes when Participation is Endogenous and Misreported," Journal of the American Statistical Association, 107, 958-975.

Lechner, M. (1999), "Nonparametric Bounds on Employment and Income Effects of Continuous Vocational Training in East Germany," Econometrics Journal, 2, 1-28.

Manski, C.F. (1990), "Nonparametric Bounds on Treatment Effects," American Economic Review, 80, 319-323.

McCarthy, I., D.L. Millimet, and M. Roy (2015), "Bounding Treatment Effects: Stata Command for the Partial Identification of the Average Treatment Effect with Endogenous and Misreported Treatment Assignment," Stata Journal, 15, 411-436.

Millimet, D.L. and M. Roy (2015), "Partial Identification of the Long-Run Causal Effect of Food Security on Child Health," Empirical Economics, 48, 83-141.

Nguimkeu, P., A. Denteh, and R. Tchernis (2019), "On the Estimation of Treatment Effects with Endogenous Misreporting," Journal of Econometrics, 208, 487-506.

How the (Econometric) Sausage is Made
