Faulty Logic?

Ahhhh, summer has arrived. For a department chair, it is a welcome relief. Not that the "endless stream of crap," as I describe it, has ceased. But at least I can deal with it poolside.


Alas, something has already disturbed my summer of contentment. A few colleagues in the SMU business school came to me with some questions regarding the econometrics for a paper they are working on. While following the existing literature, they noticed something hinky in their results and came to me to try to make sense of it.

If I am right, the issue is very simple, yet it reveals a lack of econometric knowledge in the existing literature. Of course, it could be that I am the one lacking the appropriate knowledge. You, my dedicated readers, can let me know.


Simplifying the issue a bit, the general research question is whether individuals in positions of power within a firm (or other organization) matter. The prior literature focuses on managers and CEOs. Do these workers have an "effect" on firm outcomes once firm-specific factors are accounted for? Alternatively, one might ask whether principals in schools matter once school-specific factors are accounted for.

Let's stick to the CEO example since the paper they brought to my attention is concerned with this question. The idea is that each firm has certain observed attributes and unobserved (but time-invariant) attributes, such as corporate culture, that affect firm outcomes. Conditional on these observed and unobserved firm-specific factors, do CEOs matter?

At first blush, this is relatively straightforward to answer given data on firms i over time t, where firm i at time t is led by CEO j. In this case, one might wish to estimate the following linear regression

y_it = a_i + b_j(it) + cX_it + e_it

where a_i are firm fixed effects (FEs), b_j(it) are CEO FEs, X_it are time-varying, firm-specific observed controls, and e_it is an idiosyncratic error term. A joint test of 

H_o : b_j = 0 for all j

is a test of whether CEOs matter conditional on observed and time-invariant unobserved firm-specific factors.


The first potential issue with this approach -- which is not the focus of this post -- is that the firm and CEO FEs must not be perfectly collinear. For example, if each firm has one and only one CEO during the sample period, then there is no way to disentangle the CEO effect from the firm effect. If firms have multiple CEOs during the sample period, and CEOs only work at one firm during the sample period, then the firm FE will capture, among other things, the average effect of the CEOs working at the firm. The CEO FEs are identified, and are interpreted as deviations from the firm FE which, in part, reflects the average CEO effect. Testing whether there is within-firm variation in the CEO FEs will provide evidence of whether CEOs matter. Alternatively, if CEOs move across firms during the sample period, then it is possible to disentangle the firm and CEO FEs. This is just the now-famous AKM approach in economics (see also Bonhomme et al. (2023)).

This final approach is what the literature relies on, even going further to examine instances where CEO movement is arguably exogenous. So, yes, we are back to estimating the following linear regression

y_it = a_i + b_j(it) + cX_it + e_it

and conducting a joint test of 

H_o : b_j = 0 for all j .
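To make this concrete, here is a quick sketch (my own toy setup, not anyone's actual data) of estimating the two-way FE regression and forming the joint F-test by comparing restricted and unrestricted sums of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 20, 8                              # n firms, n CEOs, T periods
firm = np.repeat(np.arange(n), T)
period = np.tile(np.arange(T), n)

# CEO i runs firm i for t < 4, then moves to firm i+1 (mod n); a single cycle
# keeps the firm-CEO mobility network connected, so both FE sets are identified
ceo = np.where(period < 4, firm, (firm - 1) % n)

a = rng.normal(0, 1, n)                   # firm FEs
b = rng.normal(0, 1, n)                   # CEO FEs: nonzero, so CEOs "matter"
x = rng.normal(0, 1, n * T)               # time-varying control
y = a[firm] + b[ceo] + 1.0 * x + rng.normal(0, 1, n * T)

def dummies(idx, k):
    D = np.zeros((idx.size, k))
    D[np.arange(idx.size), idx] = 1.0
    return D

def ssr(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

# Unrestricted: intercept, x, firm dummies, CEO dummies (drop one of each)
X_u = np.column_stack([np.ones(n * T), x, dummies(firm, n)[:, 1:],
                       dummies(ceo, n)[:, 1:]])
# Restricted: impose H_0 that all CEO FEs are equal (drop the CEO dummies)
X_r = np.column_stack([np.ones(n * T), x, dummies(firm, n)[:, 1:]])

q = n - 1                                 # restrictions under H_0
df = n * T - X_u.shape[1]
F = ((ssr(X_r, y) - ssr(X_u, y)) / q) / (ssr(X_u, y) / df)
print(round(F, 1))                        # large F => reject H_0
```

With the CEO effects truly non-zero, the F-statistic comes out comfortably above conventional critical values.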

All is good, right?

 


Fee et al. (2013), in a Review of Financial Studies article, cast doubt on this approach. The authors perform the following exercise. 

1.  Restrict the sample to firms (and years) whose CEO is in place at the beginning of the sample and then moves to a different firm in the sample. For example, CEO 1 is at firm A for the first part of the sample and then at firm B for the remainder of the sample.

2. Estimate the linear regression model above and test for the joint significance of the CEO FEs.

3. Create fake data by randomizing the firm each CEO moves to. For example, while in the actual data CEO 1 moved from firm A to firm B, in the randomized data CEO 1 may be shown moving to firm C.

4. Estimate the same regression model and conduct the same test of joint significance. 

5. Repeat steps 3 and 4 many times.

6. Assess the results.

This approach sounds like a good idea. If the finding that the CEO FEs are jointly significant in the actual data is not spurious, then one should hope to fail to reject the null of no CEO FEs in the randomized data (allowing for Type I error). Of course this should be the case! If I assign CEOs to firms at which they did not actually work, I should not find that the CEOs matter if my empirical approach is valid!
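For concreteness, the scramble exercise in steps 1-5 can be sketched as follows (again a toy setup of my own, not the authors' data or code, and 20 scrambles rather than 1,000 to keep it fast):

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 30, 8                              # n firms, n CEOs, T periods
firm = np.repeat(np.arange(n), T)
period = np.tile(np.arange(T), n)
# actual data: CEO i runs firm i for t < 4, then moves to firm i+1 (mod n)
ceo_true = np.where(period < 4, firm, (firm - 1) % n)

a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)   # firm and CEO FEs
x = rng.normal(0.5 * b[ceo_true], 1)              # control correlated with CEO
y = a[firm] + b[ceo_true] + x + rng.normal(0, 1, n * T)

def dummies(idx, k):
    D = np.zeros((idx.size, k))
    D[np.arange(idx.size), idx] = 1.0
    return D

def ceo_F(ceo):
    # joint F-test of the CEO FEs via restricted vs. unrestricted SSR
    ssr = lambda X: np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    X_r = np.column_stack([np.ones(n * T), x, dummies(firm, n)[:, 1:]])
    X_u = np.column_stack([X_r, dummies(ceo, n)[:, 1:]])
    q, df = n - 1, n * T - X_u.shape[1]
    return ((ssr(X_r) - ssr(X_u)) / q) / (ssr(X_u) / df)

F_actual = ceo_F(ceo_true)

# steps 3-5: randomize each CEO's destination firm, re-test, repeat
F_fake = []
for _ in range(20):
    dest = rng.permutation(n)             # dest[i] = firm CEO i "moves" to
    F_fake.append(ceo_F(np.where(period < 4, firm, np.argsort(dest)[firm])))

print(round(F_actual, 1), round(float(np.median(F_fake)), 1))
```

(Running this, both the actual and the median scrambled F-statistics come out well above conventional critical values in this toy setup.)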

The authors believe this logic. They state (p. 593):

"[W]e randomly assign each CEO-to-CEO mover to a different hiring firm than the one he or she actually joins. We scramble the data 1,000 times, run the same regression for each scramble, and calculate the median F-statistic and p-value for the CEO fixed effects. As we report in the second row of Table 8, the results using this sample continue to indicate very significant CEO-style effects, even though we have assigned executives to firms that they did not actually join. In some cases, median F-statistics are larger for the scrambled sample than for the actual sample."

Well ...



This test is nonsense. Why? Intuitively, it boils down to the fact that FEs are intercepts and not slope coefficients. Consider a simple linear regression model

y_i = a + bx_i + e_i .

The OLS formulas are

b-hat = Cov(x,y)/Var(x)
a-hat = y-bar - b-hat*x-bar

where y-bar and x-bar are sample means. Suppose a,b ≠ 0 and that a-hat and b-hat are statistically significant at conventional levels. Now suppose we "scramble" the data so that the covariate value for observation i is now given by x_j for some j ≠ i. In each scrambled data set, Cov(x,y) ≈ 0 and so b-hat is close to zero. Repeating this process many times is what is known as randomization inference.

Notice that this randomization "works" for the slope coefficient. What about the intercept? Well, in each scrambled data set, b-hat ≈ 0, which means that a-hat ≈ y-bar. If y-bar ≠ 0, then the estimated intercept will remain statistically significant in all scrambled data sets.
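A tiny numerical check of this point (toy numbers of my own):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(0, 1, n)
y = 3.0 + 2.0 * x + rng.normal(0, 1, n)      # true a = 3, b = 2

def ols(x, y):
    # textbook formulas: b-hat = Cov(x,y)/Var(x), a-hat = y-bar - b-hat * x-bar
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - b * x.mean(), b

a_hat, b_hat = ols(x, y)                     # close to (3, 2)

# "scramble": permute x across observations, destroying the x-y pairing
a_s, b_s = ols(rng.permutation(x), y)
# b_s collapses toward 0, so a_s collapses toward y-bar -- which is far from 0
print(round(b_s, 2), round(a_s, 2), round(y.mean(), 2))
```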


Does this mean that the original OLS estimate of a is flawed somehow? No! It means that randomization inference (as implemented here) is not applicable to the intercept, only the slope. 

In panel data, the same logic carries over. In the standard FE model

y_it = a_i + b*x_it + e_it

a-hat_i = y-bar_i - b-hat*x-bar_i .

In this case, the estimated FEs have a similar form to the pooled OLS case. The only difference is that the means of y and x are computed within each unit i, not over the full sample. Now, if we "scramble" the covariate, a-hat_i ≈ y-bar_i, and the joint significance of the unit FEs tells us nothing about the validity of the econometric approach.
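The same check in the panel setting (again my own toy numbers), using the within estimator:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 50, 10
unit = np.repeat(np.arange(n), T)
a = rng.normal(10, 2, n)                  # unit FEs, deliberately far from zero
x = rng.normal(0, 1, n * T)
y = a[unit] + 1.0 * x + rng.normal(0, 1, n * T)   # true slope b = 1

def fe_estimates(x, y, unit):
    # within estimator: demean by unit, then a-hat_i = y-bar_i - b-hat * x-bar_i
    counts = np.bincount(unit)
    xbar, ybar = np.bincount(unit, x) / counts, np.bincount(unit, y) / counts
    xd, yd = x - xbar[unit], y - ybar[unit]
    b_hat = (xd @ yd) / (xd @ xd)
    return b_hat, ybar - b_hat * xbar

b1, a1 = fe_estimates(x, y, unit)         # b1 near 1, a1 near the true a_i
b2, a2 = fe_estimates(rng.permutation(x), y, unit)

# scrambling x wipes out the slope, but the estimated FEs just become the unit
# means of y, which are nowhere near zero -- a joint test would still "reject"
print(round(b2, 2), round(float(np.mean(np.abs(a2))), 1))
```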

Now, the authors in the RFS paper are not scrambling the covariates, but instead are scrambling which firm a CEO is assigned to. Well, it's more complex, but no different. Scrambling the CEOs will likely alter b-hat since the unobserved CEO heterogeneity is no longer properly controlled for. So, b-hat will be biased in the scrambled data sets. The corresponding CEO FEs, however, will still be

a-hat_j(i) = y-bar_i - b-hat*x-bar_i 

where a-hat_j(i) is the estimated FE for CEO j that was scrambled and assigned to firm i. It should be apparent that the joint statistical significance of the a-hat_j(i) tells us nothing about the validity of the underlying model.


I did a real quick simulation to illustrate this. The (true) DGP is

y_it = a_i + 5*b_j(it) + x_it + e_it,     i,j = 1,...,100; t = 1,...,10

x_it ~ N(0.5*a_i + 2*b_j(it), 2)

a_i, b_j ~ N(10,2)

So, there are 100 firms and 100 CEOs observed over 10 periods. CEO i is assigned to firm i for periods 1-4. After period 4, all CEOs move and are assigned to a new firm at random, so CEO i works at some firm j ≠ i for the remainder of the sample.
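Here is a sketch of one draw from this DGP (my own reconstruction, not the actual code for the post; the error distribution is not stated above, so e_it ~ N(0,1) is an assumption, and I read N(m, s) as mean m, standard deviation s):

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 100, 10                       # 100 firms, 100 CEOs, 10 periods
firm = np.repeat(np.arange(n), T)
period = np.tile(np.arange(T), n)    # periods 0-3 correspond to "1-4" in the text

a = rng.normal(10, 2, n)             # firm FEs:  a_i ~ N(10, 2)
b = rng.normal(10, 2, n)             # CEO FEs:   b_j ~ N(10, 2)

# after period 4, every CEO moves to a randomly chosen *different* firm,
# so draw permutations until none has a fixed point (a derangement)
while True:
    dest = rng.permutation(n)        # dest[i] = firm CEO i moves to
    if np.all(dest != np.arange(n)):
        break
inv = np.argsort(dest)               # inv[f] = CEO who moves to firm f
ceo = np.where(period < 4, firm, inv[firm])

x = rng.normal(0.5 * a[firm] + 2.0 * b[ceo], 2.0)
e = rng.normal(0, 1, n * T)          # error distribution: my assumption
y = a[firm] + 5.0 * b[ceo] + x + e   # true coefficient on x is 1
```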

For comparison, I scramble each data set in two ways.

1.  While in the actual data CEO i moves to firm j, in the scrambled data CEO i is assigned to a firm at random after period 4. CEO i continues to be assigned to firm i for the first four time periods.

2. CEO i is assigned to a randomly chosen firm j in the first four periods and a randomly chosen different firm for the remainder of the sample.

In Experiment 1, CEO i is assigned to the correct firm for t=1,...,4 and an incorrect firm for t=5,...,10. In Experiment 2, CEO i is assigned to one incorrect firm for t=1,...,4 and a different incorrect firm for t=5,...,10.

I replicate this 1000 times. In other words, I generate 1000 correct data sets and scramble each one once according to Experiment 1 and once according to Experiment 2. Median results are reported below, where beta-hat is the coefficient on x (true value = 1).

Correct Data:   beta-hat = 1.000,   F-stat (CEO FEs) = 162,   p-value = 0.000

Experiment 1:   beta-hat = 1.190,   F-stat (CEO FEs) = 49,   p-value = 0.000

Experiment 2:   beta-hat = 1.178,   F-stat (CEO FEs) = 51,   p-value = 0.000

What is going on here? I generated the data so that y-bar_j -- which has to net out both the effect of x and the firm FEs -- is non-zero. Thus, the estimated CEO FEs are statistically significant even when the data are scrambled.

Code is available here.

Now that summer is upon us, be sure to take time for yourself. You have earned the right to kick back and brush up on some econometrics. Godspeed.



  


 
