Don't Taunt Me
In what I guess could only have been done to taunt me after my previous post on not completely dismissing an estimator, this week Twitter decided to offer an (admittedly humorous) sucker punch at another estimator: propensity score matching (PSM).
But, I was thankful for the dig at PSM. As a new department chair, I am a bit worried about having the time and mental energy to keep up my blogging. At the start of the pandemic, I kind of sort of promised to blog every week to help keep the spirits up. Little did I know, five months later, there would still be no end in sight.
Turns out, the dig at PSM was just the kick in the pants I needed to motivate a new post.
PSM, and matching in general, have drawn the ire of many applied researchers for several years now. As with Instrumental Variables (IV) in my prior post, I am not entirely sure why. And, there is probably a whole new cohort of applied researchers who have no idea why and are too nervous to ask.
I think there are a few possible reasons for the disdain with which many view PSM. First, the estimator came seemingly out of nowhere in the late 1990s in economics. Being the new, shiny toy, it immediately became fairly popular. As part of this quick rise to fame, many researchers did not fully appreciate how the estimator works. This led many to claim that the estimator is a cure for endogeneity. As this is clearly not true, it led to a backlash. But, this backlash is misplaced in my opinion. The backlash should not be against the estimator per se, but rather the appropriation of the estimator by researchers for a purpose for which it is not designed. See my tweet here!
Second, there is a sense that PSM is really no different than Ordinary Least Squares (OLS). This is sort of like the Linear Probability Model vs. Probit/Logit debate.
While there are times when PSM and OLS yield similar estimates, there are other times when this is not true. In particular, when there is limited overlap in the covariates between the treatment and control groups, OLS relies on extrapolation while PSM does not (a toy illustration follows below).
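To make the overlap point concrete, here is a toy sketch of my own (not from any of the papers discussed here): the outcome is nonlinear in a covariate whose distributions barely overlap across groups, so OLS with a linear control extrapolates, while matching on the propensity score with a caliper confines the comparison to the common support. The data-generating process, caliper width, and all names are illustrative assumptions.

```python
# Toy illustration (my construction): poor overlap + nonlinear outcome.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
n = 500
x = np.concatenate([rng.normal(0, 1, n), rng.normal(2.5, 1, n)])  # limited overlap
t = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
y = np.sin(2 * x) + 2.0 * t + rng.normal(0, 0.5, 2 * n)           # true effect = 2

# OLS with a linear control for x extrapolates the linear fit
ols = sm.OLS(y, sm.add_constant(np.column_stack([t, x]))).fit()

# PSM: 1-nearest-neighbor matching on a logit propensity score, with a caliper
# so treated units without a comparable control are dropped (common support)
ps = LogisticRegression(max_iter=1000).fit(x[:, None], t).predict_proba(x[:, None])[:, 1]
nn = NearestNeighbors(n_neighbors=1).fit(ps[t == 0][:, None])
dist, idx = nn.kneighbors(ps[t == 1][:, None])
keep = dist.ravel() < 0.05
att = (y[t == 1][keep] - y[t == 0][idx.ravel()[keep]]).mean()

print(f"OLS: {ols.params[1]:.2f}, PSM (caliper): {att:.2f}, truth: 2.00")
```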
Third, there is the King & Nielsen (2019) article entitled "Why Propensity Scores Should Not Be Used for Matching." Well, that doesn't sound good. I won't get into the details, but the critique applies mainly to certain matching algorithms. See this nice presentation on the paper by Ben Jann at a ... Stata Users Meeting. Stata to the rescue again!
Finally, as with Instrumental Variables (IV) in my prior post, I think the main objection to PSM is that it relies on identifying assumptions that many feel are not credible. In this case, the Conditional Independence Assumption (CIA). But, I have responses.
1. This does not explain why PSM gets more grief than OLS, which relies on CIA and functional form assumptions.
2. This does not invalidate PSM. Any estimator is only useful in situations where its identifying assumptions hold.
More importantly, though, many empirical researchers are perhaps unaware of the handful of methods that enable us to learn something even if the required assumption for PSM does not hold exactly. This is identical to my prior IV post. Horseshoes, hand grenades, and all that again.
Over the past two-plus decades, there has been a steady stream of methods developed that allow one to relax the CIA and assess the robustness of the original PSM (or OLS) estimate.
In one strand of the literature, the methods are built upon the following logic: The CIA requires the absence of unobserved variables that are correlated with both treatment assignment and potential outcomes. So, how much selection into treatment on the basis of unobserved variables correlated with potential outcomes would there have to be to alter the conclusion of the PSM estimator?
Rosenbaum (2002) provides an early means of addressing this question. The method, which has come to be known as Rosenbaum Bounds, assesses the change in the p-value associated with the null hypothesis that the treatment effect is zero as greater selection on unobserved variables is allowed. This selection is quantified by asking how much the odds ratio of being treated is allowed to differ for two observationally identical agents. If the estimated treatment effect remains statistically significant despite the odds ratio deviating significantly from one, then the PSM estimate is deemed to be robust to hidden bias.
In Stata, this can be implemented with the command -rbounds- (see DiPrete and Gangl, 2004). A different command, -mhbounds-, is recommended when the outcome is binary.
It is important to note that these methods are only useful if the initial PSM estimate obtained under CIA is statistically significant. In other words, this approach is useful when the worry is that a statistically significant estimate is driven by an unobserved confounder. If a null effect is found and the worry is that this is due to an unobserved confounder masking a non-zero treatment effect, this method is not useful (to my knowledge).
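For intuition about the mechanics, here is a minimal sketch for a matched-pairs design with a continuous outcome, using the Wilcoxon signed-rank version of the bounds. The pair differences and the Gamma grid are placeholders, and the Stata commands above remain the practical route.

```python
# Minimal sketch of Rosenbaum bounds for matched pairs (Wilcoxon signed-rank).
import numpy as np
from scipy.stats import norm, rankdata

def rosenbaum_bounds(d, gammas):
    """Worst-case (upper-bound) one-sided p-values for H0: no effect, per Gamma."""
    d = np.asarray(d, dtype=float)
    d = d[d != 0]                       # drop zero-difference pairs
    ranks = rankdata(np.abs(d))
    T = ranks[d > 0].sum()              # Wilcoxon signed-rank statistic
    S = len(d)
    out = {}
    for g in gammas:
        p_plus = g / (1.0 + g)          # max prob a pair favors treatment
        mu = p_plus * S * (S + 1) / 2.0
        var = p_plus * (1 - p_plus) * S * (S + 1) * (2 * S + 1) / 6.0
        z = (T - mu) / np.sqrt(var)
        out[g] = 1 - norm.cdf(z)        # normal approximation
    return out

# e.g., rosenbaum_bounds(d, gammas=[1.0, 1.5, 2.0, 3.0]); at Gamma = 1 this
# reproduces the usual signed-rank test, and the p-value bound rises with Gamma.
```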
A similar but alternative approach is developed in Ichino et al. (2008). In a cool twist, the authors assume that CIA does not hold given the observed variables, X. Instead, the authors assume that CIA would hold if both X and U were observed, where U is an unobserved variable correlated with both treatment assignment and potential outcomes. What did we learn to do in the previous post when our lives are difficult, but would instead be much easier if something that is unknown were known?
That's right, we make it up! I mean, we do a grid search! Or, in this case, we simulate it (where simulations are a form of a grid search over the space where we think U comes from ... imprecise math terminology, I'm sure, but whatever).
Specifically, the authors suggest specifying parameters that govern the distribution of U to ensure that it is correlated with both treatment assignment and potential outcomes, simulating U from this distribution, and then re-doing PSM with the set of covariates augmented to include U. Comparison of this estimate (obtained over many simulations) with the initial PSM estimate reveals the sensitivity of the initial estimate to the unobserved confounder. Neat, huh?
The trick lies in how one specifies the distribution of U. One approach is what the authors refer to as a killer confounder. Sounds perfect for 2020!
In this scenario, the researcher searches for a distribution of U that drives the PSM estimate of the treatment effect to zero. Then, the researcher can assess the plausibility of this distribution. Best part? That's right ... in Stata as -sensatt-.
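In rough Python terms, the procedure might look like the sketch below. This is my own rendering, not the authors' code (-sensatt- is the real implementation): U is binary with researcher-chosen cell probabilities p[i][j] = P(U=1 | T=i, Y=j), and the matching step is plain one-nearest-neighbor matching on a logit propensity score.

```python
# Sketch of the Ichino, Mealli, and Nannicini (2008) simulation idea (mine,
# not the authors' code): draw U, add it to the covariates, re-run PSM, average.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_att(y, t, X):
    """ATT via 1-nearest-neighbor matching on the propensity score."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[t == 0].reshape(-1, 1))
    idx = nn.kneighbors(ps[t == 1].reshape(-1, 1), return_distance=False).ravel()
    return (y[t == 1] - y[t == 0][idx]).mean()

def simulated_confounder_att(y, t, X, p, n_sims=200, seed=None):
    """Average ATT when a simulated binary U with cells p[i][j] is 'observed'."""
    rng = np.random.default_rng(seed)
    y01 = (y > np.median(y)).astype(int)   # binarize Y to index the cells
    atts = []
    for _ in range(n_sims):
        u = rng.binomial(1, [p[ti][yi] for ti, yi in zip(t, y01)])
        atts.append(psm_att(y, t, np.column_stack([X, u])))
    return np.mean(atts)
```

The killer-confounder exercise then amounts to searching over the p[i][j] values for a configuration that drives the average ATT to zero, and asking whether such a U is plausible.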
Similar methods are also available for models estimated via regression rather than PSM. Altonji et al. (2005) offer two approaches in this regard. The first is applicable when the outcome is binary. In this case, to estimate the effect of a binary treatment on a binary outcome, a bivariate probit model can be used under the assumption of bivariate normality. CIA then corresponds to the assumption that the correlation between the errors is zero. To relax CIA, the authors suggest simply constraining the correlation to values other than zero and seeing how the estimated treatment effect varies. Once one knows the values of this correlation for which the treatment effect is statistically significant, one can assess the plausibility of these values. Again, this can be done easily in Stata using -biprobit- along with Stata's constraints option.
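Outside of Stata, one could trace out the same exercise by maximizing a recursive bivariate probit likelihood with the error correlation rho pinned to each value on a grid. Below is a minimal sketch under that interpretation; the variable names and grid are my assumptions, and -biprobit- with constraints remains the turnkey option.

```python
# Sketch (mine, not Stata's -biprobit-): recursive bivariate probit with rho
# held fixed; re-estimate over a grid of rho values and trace out tau.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def neg_loglik(params, y, t, X, rho):
    """Negative log-likelihood; y, t binary arrays, X includes a constant."""
    k = X.shape[1]
    a, b, tau = params[:k], params[k:2 * k], params[2 * k]
    ll = 0.0
    for i in range(len(y)):
        s_y, s_t = 2 * y[i] - 1, 2 * t[i] - 1   # sign flips for the CDF
        r = s_y * s_t * rho
        p = multivariate_normal.cdf(
            [s_y * (X[i] @ b + tau * t[i]), s_t * (X[i] @ a)],
            mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]])
        ll += np.log(max(p, 1e-300))
    return -ll

def tau_at_rho(y, t, X, rho):
    """Treatment effect on the latent outcome index, holding rho fixed."""
    start = np.zeros(2 * X.shape[1] + 1)
    res = minimize(neg_loglik, start, args=(y, t, X, rho), method="BFGS")
    return res.x[-1]

# e.g., taus = [tau_at_rho(y, t, X, r) for r in (-0.4, -0.2, 0.0, 0.2, 0.4)]
```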
When the outcome is binary or continuous, the authors propose a different approach. The method is based on the omitted variable bias formula. This bias is not generally known. Here, the authors show how to estimate it under the assumption that selection on unobserved variables is equal to selection on observed variables and under the null of no treatment effect. While a bit abstract, it is actually quite interesting to think about what this means. But, I won't digress here. In this scenario, one can compute the omitted variable bias. By then comparing the magnitude of the bias to the size of the initial treatment effect estimate, one can infer how selection on unobserved variables must compare to selection on observed variables in order to fully explain the initial treatment effect estimate. The researcher can then assess whether this is plausible.
Without going into detail, note that Oster (2019) and Cinelli & Hazlett (2020) extend this approach further.
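To give a flavor of the Oster (2019) extension, her widely used approximation adjusts the controlled coefficient using the movement in coefficients and R-squareds between an uncontrolled ("short") and a controlled ("long") regression. A minimal sketch, with all inputs as placeholders:

```python
# Oster's (2019) approximate bias-adjusted coefficient: beta_dot, r_dot come
# from the regression without controls; beta_tilde, r_tilde from the regression
# with controls; delta is the assumed ratio of selection on unobservables to
# selection on observables; r_max is the hypothesized maximum R-squared.
def oster_beta_star(beta_tilde, r_tilde, beta_dot, r_dot, delta=1.0, r_max=1.0):
    return beta_tilde - delta * (beta_dot - beta_tilde) * (r_max - r_tilde) / (r_tilde - r_dot)

# e.g., oster_beta_star(beta_tilde=0.8, r_tilde=0.30, beta_dot=1.2, r_dot=0.10)
# asks whether the adjusted coefficient retains its sign under delta = 1.
```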
As I repeatedly say, methods such as these are not only highly fascinating and informative about the application at hand, but they also help distinguish your research from the crowd by applying new econometric techniques.
[You. Not me.]
To sum up, this first strand of the literature contains methods that allow one to assess how much selection on unobserved variables is needed to destroy your initial finding. Another strand of the literature contains methods that allow you to estimate the treatment effect under a pre-specified amount of selection on unobserved variables.
The first example comes from the Ichino et al. (2008) simulation approach discussed above. Here, rather than simulating a killer confounder and then assessing its plausibility, the authors suggest simulating a calibrated confounder and estimating the treatment effect with this variable now observed. Specifically, a calibrated confounder is one that is simulated from a distribution that mimics an observed confounder in X. Thus, this method asks what the estimated treatment effect would be if there were an unobserved confounder with the same distribution as an observed confounder. Again, this is available in Stata as part of the command listed above.
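In the notation of the earlier simulation sketch (my notation, not the authors'), calibrating the cell probabilities to an observed binary covariate w could look like the following, after which the result feeds straight into the simulated-confounder routine above.

```python
# Calibrated-confounder variant of the earlier sketch: choose the cells so U
# mimics an observed binary covariate w, i.e. p[i][j] = Pr(w=1 | T=i, Y=j).
# Inputs are numpy integer arrays: w and t binary, y01 the binarized outcome.
def calibrate_cells(w, t, y01):
    return [[w[(t == i) & (y01 == j)].mean() for j in (0, 1)] for i in (0, 1)]
```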
More recently, Masten & Poirier (2018) develop a partial identification approach that bounds the treatment effect under a pre-specified departure from CIA, which they refer to as partial CIA. The bounds are functions of a parameter, chosen by the researcher, that governs the extent to which CIA is allowed to be violated.
Best part? Nope, not quite. Masten's website says Stata code is forthcoming, but ... it has been saying that for a while.
Finally, in a completely different strand of the literature, I would be remiss if I did not mention my own paper with Rusty Tchernis. In Millimet & Tchernis (2013) we do two things. First, we propose a way to trim the sample based on the propensity score prior to doing PSM in order to focus on the sub-sample where the bias from the failure of CIA is minimized. Second, we propose a bias-corrected PSM estimator that accounts for a failure of CIA. Sounds cool, no? Well ... it's not.
I am not actually a big fan. (Some) others like it, and that's great. The paper suffers from two shortcomings. First, it makes strong distributional assumptions. Second, the bias-minimizing estimator has the undesirable property of moving the goalposts to a different parameter that can be estimated with less bias. One still learns something, but this is not ideal.
The other reason for my dislike of the paper is that it took an incredibly long time to publish at JAE. Four referees and multiple rounds of revision. The referees and editor were fantastic as I recall; it was just a long haul. By the time it was over with, I truly never wanted to think about this paper again. I guess we can call this post-publication traumatic stress disorder! Such is the publication process.
At the end of the day, as with IV, PSM and matching more generally are useful tools for the job under a particular set of assumptions.
When there is concern that those assumptions may not hold in a particular application, there are more options available than casting the estimator aside and moving on. The econometrics literature is full of fascinating, and usable, approaches that allow the researcher to use the method and then assess what changes as the identifying assumptions are relaxed. Even if the researcher is convinced of the identifying assumptions, utilizing these methods seems advisable anyway.
Bottom line. Never say never.
References
Altonji, J.G., T.E. Elder, and C.R. Taber (2005), "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools," Journal of Political Economy, 113, 151-184
Cinelli, C. and C. Hazlett (2020), "Making Sense of Sensitivity: Extending Omitted Variable Bias," Journal of the Royal Statistical Society, Series B, 82, 39-67
DiPrete, T. and M. Gangl (2004), "Assessing Bias in the Estimation of Causal Effects: Rosenbaum Bounds on Matching Estimators and Instrumental Variables Estimation with Imperfect Instruments," Sociological Methodology, 34, 271-310
Ichino, A., F. Mealli, and T. Nannicini (2008), "From Temporary Help Jobs to Permanent Employment: What Can We Learn from Matching Estimators and Their Sensitivity?" Journal of Applied Econometrics, 23, 305-332
King, G. and R. Nielsen (2019), "Why Propensity Scores Should Not Be Used for Matching," Political Analysis, 27, 435-454
Masten, M. and A. Poirier (2018), "Identification of Treatment Effects under Conditional Partial Independence," Econometrica, 86, 317-351
Millimet, D.L. and R. Tchernis (2013), "Estimating Treatment Effects Without an Exclusion Restriction: With an Application to the School Breakfast Program," Journal of Applied Econometrics, 28, 982-1017
Oster, E. (2019), "Unobservable Selection and Coefficient Stability: Theory and Evidence," Journal of Business & Economic Statistics, 37, 187-204
Rosenbaum, P.R. (2002), Observational Studies, 2nd ed., New York: Springer