Yet Another Post on LPM and Probit?


Binary choice models. Statistical models used to analyze the determinants of a binary outcome. Employed or not employed. Enrolled in school or not enrolled. Defaulted on a loan or not. WTO member or not. Democratic or not. And on and on. The analysis of binary outcomes is frequent and important in economics, political science, sociology, epidemiology, and others. So, we should strive to get it right.

Getting it "right" has meant a lengthy debate - that has seemingly gone on ad nauseam - over the choice between a linear probability model (LPM) and a probit or logit model. 


For those unaware, LPM is a fancy name for "I am going to use OLS even though the dependent variable is binary, but I want to feel special!" 



Probit and logit, on the other hand, are estimated via maximum likelihood (the original ML). Now that we have covered the basics, here is the twist. This is most definitely NOT another blog post on the relative merits of LPM and probit/logit.
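For readers who want to see what "estimated via maximum likelihood" involves, here is a minimal Python sketch for a logit with an intercept and one regressor. It assumes the logistic CDF F(z) = 1/(1+exp(-z)) for F() (the models are defined formally below) and maximizes the Bernoulli log-likelihood, the sum over observations of y*log F(xb) + (1-y)*log(1-F(xb)), with plain Newton-Raphson. The data and the helper names (`log_likelihood`, `fit_logit`) are made up for illustration; in practice you would use Stata's `logit` command or a statistics library.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logistic CDF: the F() used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0: float, b1: float, xs, ys, cdf=logistic_cdf) -> float:
    """Bernoulli log-likelihood: sum of y*log F(xb) + (1 - y)*log(1 - F(xb))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = cdf(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit(xs, ys, iterations: int = 50):
    """Newton-Raphson for a logit with an intercept (b0) and one regressor (b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix (minus the Hessian)
        for x, y in zip(xs, ys):
            p = logistic_cdf(b0 + b1 * x)
            w = p * (1.0 - p)    # logistic density evaluated at xb
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1].
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up toy data; 0s and 1s overlap in x, so the MLE exists (no perfect separation).
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

b0_hat, b1_hat = fit_logit(xs, ys)
print("estimates:", b0_hat, b1_hat)
print("log-likelihood at MLE:", log_likelihood(b0_hat, b1_hat, xs, ys))
print("log-likelihood at b = 0:", log_likelihood(0.0, 0.0, xs, ys))
```

Passing a normal CDF as `cdf` gives the probit log-likelihood, although the Newton updates in `fit_logit` are specific to the logit.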



The relative merits of each are important and deserve study. For the uninitiated, I refer you to lengthy Twitter battles and excellent blog posts elsewhere. But here I want to argue that economists (and perhaps researchers in other disciplines, if they do not do so already) should add OTHER estimators of binary choice models to their toolbox!



Admittedly, I am no expert, but think of the fun new Twitter battles this could create. To understand these other models and why they might be useful, let's refresh our memory on the classics: LPM, probit, and logit.

The LPM posits that the data-generating process (DGP) is

Pr(y=1|x) = xb

The marginal effect of x is constant and given by b. This constancy is also behind one of the shortcomings of the LPM: because the fitted line never bends, predicted probabilities can lie outside the unit interval.
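A toy illustration of the unit-interval problem, written as a hedged Python sketch: the data are hypothetical (a binary outcome that switches on once x > 0), and `ols_simple` is a hand-rolled closed-form OLS with one regressor, not any library's API. The fitted LPM has a single slope, so every observation gets the same marginal effect, and the fitted values at the ends of the x range escape [0, 1].

```python
def ols_simple(xs, ys):
    """Closed-form OLS of y on a constant and one regressor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical toy data: a binary outcome that switches on once x > 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [0, 0, 0, 0, 1, 1, 1]
intercept, slope = ols_simple(xs, ys)

# LPM fitted "probabilities" are intercept + slope*x; the marginal effect is slope everywhere.
fitted = [intercept + slope * x for x in xs]
print("constant marginal effect:", slope)
print("fitted at x = -3 and x = 3:", fitted[0], fitted[-1])
```

Here the slope is 6/28 (about 0.214), and the fitted values at x = -3 and x = 3 are -3/14 and 15/14, both outside the unit interval.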

The probit and logit models posit that the DGP is

Pr(y=1|x) = F(xb)

where F() is the standard normal cumulative distribution function (CDF) in a probit model and the logistic CDF in a logit model. The use of a CDF for the choice of F() ensures that the probabilities are contained in the unit interval. The marginal effect of a unit change in one of the covariates, x_j, is given by

dPr(y=1|x)/dx_j = F'(xb)*b_j

In contrast to the LPM, the marginal effects - which are scaled versions of b - are necessarily observation-specific; they vary with the full covariate vector, x. This is well-known. 
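The formula dPr(y=1|x)/dx_j = F'(xb)*b_j is easy to code directly. The Python sketch below assumes hypothetical coefficients and two hypothetical observations; `marginal_effect` is an illustrative helper, not a library function. The same b_j produces different marginal effects for the two observations because their index values xb differ.

```python
import math

def normal_pdf(z: float) -> float:
    """Standard normal density: F'() for the probit model."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logistic_pdf(z: float) -> float:
    """Logistic density: F'() for the logit model, equal to F(z) * (1 - F(z))."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

def marginal_effect(x, b, j, pdf) -> float:
    """dPr(y=1|x)/dx_j = F'(xb) * b_j for a single observation x."""
    xb = sum(xi * bi for xi, bi in zip(x, b))
    return pdf(xb) * b[j]

# Hypothetical coefficients on (constant, x_1, x_2), shared by two observations.
b = [0.2, 0.5, -0.3]
obs_a = [1.0, 0.0, 0.0]   # xb = 0.2
obs_b = [1.0, 2.0, 1.0]   # xb = 0.2 + 1.0 - 0.3 = 0.9

for label, x in (("obs_a", obs_a), ("obs_b", obs_b)):
    probit_me = marginal_effect(x, b, 1, normal_pdf)
    logit_me = marginal_effect(x, b, 1, logistic_pdf)
    print(label, "probit:", round(probit_me, 4), "logit:", round(logit_me, 4))
```

Within a single observation, the ratio of the marginal effects of x_1 and x_2 is just b_1/b_2, which is the sense in which the marginal effects are scaled versions of b.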

But what else can we say about the marginal effects in the probit and logit models? For starters, for a given b_j, the marginal effect of x_j is largest in magnitude where F'(xb) is maximized. In both the probit and logit models, this occurs when xb = 0, or Pr(y=1|x) = 0.5. In the probit model, F'(0) = 1/sqrt(2*pi), or roughly 0.4. In the logit model, F'(0) = 0.25. This implies that these models restrict the largest marginal effects to occur for observations whose initial probability of y = 1 is one-half. Conversely, these models restrict the marginal effects to be essentially zero for observations whose initial probability of y = 1 is close to zero or one. These are behavioral restrictions that are embedded in the model without a second thought.
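A quick numerical check of these claims, as a Python sketch: a grid search over xb in [-5, 5] (via a hypothetical helper, `peak_location`) for the maximizer of each density. Both peak at xb = 0, where F'(0) is about 0.3989 for the probit and exactly 0.25 for the logit. By xb = -4, which corresponds to Pr(y=1|x) of roughly 0.00003 in the probit and 0.018 in the logit, the densities are small relative to those peaks.

```python
import math

def normal_pdf(z: float) -> float:
    """Standard normal density: F'() for the probit model."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logistic_pdf(z: float) -> float:
    """Logistic density: F'() for the logit model."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

def peak_location(pdf, lo: float = -5.0, hi: float = 5.0, steps: int = 10_000) -> float:
    """Grid search for the index value xb at which the density F'(xb) is largest."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=pdf)

for name, pdf in (("probit", normal_pdf), ("logit", logistic_pdf)):
    z_star = peak_location(pdf)
    print(name, "peak at xb =", z_star,
          "F'(peak) =", round(pdf(z_star), 4),
          "F'(-4) =", round(pdf(-4.0), 4))
```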

What else? We also know that F'(xb) is symmetric about zero, which implies that the marginal effects are symmetric. For example, the effect of a unit change in x_j is identical when xb = -c and xb = c, for every c. Because F(-c) = 1 - F(c) in both models, these two index values correspond to initial probabilities of p and 1-p. Thus, these models restrict the marginal effects to be identical for observations with initial probabilities of y = 1 equal to p and 1-p, for every p. Again, this is a typically overlooked behavioral restriction.
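The symmetry can be checked directly. This Python sketch assumes a hypothetical coefficient b_j = 0.5, maps an initial probability p back to its index value xb (via `statistics.NormalDist().inv_cdf` for the probit and the log-odds for the logit), and compares the marginal effect at p with the one at 1-p. The helper names are illustrative only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def probit_density_at_prob(p: float) -> float:
    """F'(xb) for the probit, evaluated at the xb where Pr(y=1|x) = p."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p))

def logit_density_at_prob(p: float) -> float:
    """F'(xb) for the logit, evaluated at the xb where Pr(y=1|x) = p (xb is the log-odds)."""
    xb = math.log(p / (1.0 - p))
    q = 1.0 / (1.0 + math.exp(-xb))
    return q * (1.0 - q)

b_j = 0.5  # hypothetical coefficient on x_j
for p in (0.05, 0.2, 0.35):
    print(f"p = {p}: probit {probit_density_at_prob(p) * b_j:.6f} vs {probit_density_at_prob(1 - p) * b_j:.6f}; "
          f"logit {logit_density_at_prob(p) * b_j:.6f} vs {logit_density_at_prob(1 - p) * b_j:.6f}")
```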

So, what can we do if we don't wish to choose a model where the marginal effects are constant across observations (LPM) or a priori restricted to be maximized at initial probabilities of one-half and symmetric away from one-half (probit and logit)? 



We move beyond LPM, probit, and logit by choosing a different functional form for F() that does not impose symmetry and whose derivative is not necessarily maximized when F(xb) = 0.5. Two options are available in - gasp! - Stata: the complementary log-log (cloglog) model and the skewed logit (aka scobit) model (Nagler 1994). How can you go wrong with a name like scobit?

The cloglog model sets F(xb) = 1 - exp[-exp(xb)]. The scobit is sort of like a Box-Cox model: it adds an extra parameter to be estimated, which allows the data to inform us about the choice of F(). Here, F(xb) = 1 - [1+exp(xb)]^(-a), where a > 0 is estimated along with b. When a = 1, the scobit reduces to the usual logit model. For other values, we can get asymmetric and funky shapes for F().
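To see the asymmetry, the Python sketch below codes both CDFs and their densities and finds, by grid search, the probability Pr(y=1|x) at which each density peaks. The function names are hypothetical. Setting the derivative of log F'(xb) to zero gives closed-form benchmarks: the cloglog density peaks at xb = 0, where F = 1 - exp(-1), about 0.632, and the scobit density peaks at xb = -ln(a), where F = 1 - [a/(1+a)]^a, which equals 0.5 only when a = 1.

```python
import math

def cloglog_cdf(z: float) -> float:
    """Complementary log-log CDF: F(z) = 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))

def cloglog_pdf(z: float) -> float:
    """Derivative of the cloglog CDF."""
    return math.exp(z) * math.exp(-math.exp(z))

def scobit_cdf(z: float, a: float) -> float:
    """Scobit CDF: F(z) = 1 - (1 + exp(z))^(-a), with a > 0; a = 1 is the logit."""
    return 1.0 - (1.0 + math.exp(z)) ** (-a)

def scobit_pdf(z: float, a: float) -> float:
    """Derivative of the scobit CDF with respect to z."""
    return a * math.exp(z) * (1.0 + math.exp(z)) ** (-a - 1.0)

def logistic_cdf(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def prob_at_peak(cdf, pdf, lo: float = -10.0, hi: float = 10.0, steps: int = 20_000) -> float:
    """Probability F(xb) at the grid point where the density F'(xb) is largest."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    z_star = max(grid, key=pdf)
    return cdf(z_star)

print("cloglog:", round(prob_at_peak(cloglog_cdf, cloglog_pdf), 3))
for a in (0.5, 1.0, 2.0):
    peak_p = prob_at_peak(lambda z: scobit_cdf(z, a), lambda z: scobit_pdf(z, a))
    print(f"scobit a = {a}:", round(peak_p, 3))
```

With a = 0.5 the largest marginal effects occur at a probability of about 0.42, and with a = 2 at about 0.56; as a grows, the peak probability approaches the cloglog value of 1 - exp(-1).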


Some of these alternative models are used more frequently in what is referred to as the "rare events" literature, which models a binary outcome when, without loss of generality, 1s are very rare. Examples include bankruptcies among firms or individuals, or infant mortality in a developed country. However, given the potential for these models to relax the behavioral restrictions embedded in the LPM, probit, and logit models, it seems they ought to have a more prominent place in many other contexts as well.

Let the debate begin!


References

Nagler, J. (1994), "Scobit: An Alternative Estimator to Logit and Probit," American Journal of Political Science, 38, 230-255.



