Nonclassical Classics

#EconTwitter once again showed this week why it is not a giant waste of time. In a Twitter discussion started by the wonderful author of The MixTape himself (available here for the low, low price of $35.00), I became aware of a purported phenomenon discovered in psychology over 20 years ago: the Dunning-Kruger Effect. And, given how long 2020 has lasted, it's really about 100 years old.


The Dunning-Kruger Effect was the subject of an article and another blog post this week, and that caught Scott's attention. This, in turn, led Grant McDermott to suggest I enter the fray. See, the aforementioned blog post made its point using R and, since Grant and I are statistical software nemeses, he asked for my Stata take.


Of course, I was easily intrigued because it turns out that measurement error lies at the heart of the alleged Dunning-Kruger Effect. Nonclassical measurement error to boot! So, I threw all my other work to the side and dove right in. I am a sucker for a good measurement error controversy. Plus, it turns out that the Dunning-Kruger Effect not only enjoys its own Wikipedia page, not only won the authors an (Ig) Nobel Prize, but also has its own song in the Incompetence Opera.


I like! I can see the headline on Broadway now: 

#EconometricSausage: The Opera


Anyway, enough dreaming. Let's get to the point. The essence of the Dunning-Kruger Effect is the following graph


The graph is derived from individual-level data on two variables: an "objective" measure of skill and a self-reported "subjective" assessment of skill. Refer to these variables as skill, S, and perception, P, respectively. Individuals are then grouped into quartiles based on S. Finally, the mean percentile values of S and P for each quartile (based on S) are plotted in the graph.

Easy. However, the implication is anything but. The authors interpret this graph as indicating that unskilled individuals do not know they are unskilled; they overstate their ability. High skilled individuals do the reverse, but to a lesser extent.


The Dunning-Kruger Effect is simple and intuitive. It's even believable. Who among us has not been guilty of or encountered others engaging in an inflated sense of self-worth? We are, after all, economists!



But, alas, the conclusions drawn by Dunning and Kruger have been called into question. To see why, we need to understand how measurement error works. The model that Dunning and Kruger have in mind is

P = S + u,

where u is nonclassical measurement error because Cov(S,u) < 0. In other words, S is an individual's true skill, P is the individual's misguided perceived skill, and u is measurement error. Since low-skilled (high-skilled) individuals overstate (understate) their true skill, u is negatively correlated with S. This makes u mean-reverting measurement error, a special case of nonclassical measurement error.


If the raw data on P contain mean-reverting measurement error, then the Dunning-Kruger Effect would have a stronger leg to stand on. However, the above graph taken from the original paper does not plot the raw data; it plots percentile ranks of the data. Why does this matter? It matters because percentile ranks are bounded variables; they must lie between zero and 100. And, bounded variables can never suffer from classical measurement error. Measurement error in bounded variables must, by definition, be negatively correlated with the truth. Thus, by converting S and P into percentile ranks, the Dunning-Kruger Effect is a foregone conclusion.


This is true even if the original measurement error, u, is classical. This is what the simulation in R in the prior blog post shows. This is what I show in my Stata simulations. Before I do, let's make sure we understand the intuition. To be clear, S and P are the raw measures of true and perceived skill. The difference between them, u, is the measurement error. Define the percentile ranks of S and P as

R_s = 100 ⋅ F_s(S)
R_p = 100 ⋅ F_p(P) = 100 ⋅ F_s(S) + v = R_s + v

where F_s and F_p are the cumulative distribution functions (CDFs) of S and P, respectively, and 

v = 100 ⋅ [F_p(P) - F_s(S)].

As S → -∞, F_s(S) → 0. Since F_p(P) ≥ 0, v ≥ 0 in the limit. As S → ∞, F_s(S) → 1. Since F_p(P) ≤ 1, v ≤ 0 in the limit. In words, someone ranked at the very bottom can only be placed too high, and someone ranked at the very top can only be placed too low. Thus, regardless of the properties of u -- the measurement error in P -- the measurement error in R_p, v, must be negatively correlated with R_s. However, the Dunning-Kruger Effect, as interpreted, is really a statement about the properties of u, not v. Yet, the graph above tells us nothing about u.
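For anyone who prefers to see this numerically rather than in the limit, here is a minimal Python sketch (it is not the Stata code behind this post). It assumes S and u are independent standard normals, a sample of 100,000, and empirical percentile ranks; the helper name percentile_rank is made up for illustration.

```python
import numpy as np

def percentile_rank(x):
    """Empirical percentile rank in (0, 100]: 100 * rank / n."""
    ranks = np.argsort(np.argsort(x)) + 1   # ranks 1..n
    return 100.0 * ranks / len(x)

rng = np.random.default_rng(0)   # arbitrary seed (assumption)
n = 100_000

S = rng.normal(0.0, 1.0, n)      # true skill
u = rng.normal(0.0, 1.0, n)      # classical error: drawn independently of S
P = S + u                        # perceived skill

R_s = percentile_rank(S)
R_p = percentile_rank(P)
v = R_p - R_s                    # measurement error in the ranks

corr_raw = np.corrcoef(S, u)[0, 1]
corr_rank = np.corrcoef(R_s, v)[0, 1]
print(f"corr(S, u)   = {corr_raw: .3f}")
print(f"corr(R_s, v) = {corr_rank: .3f}")
```

The first correlation should be close to zero by construction, while the second should be clearly negative, even though nothing about u depends on S.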


To illustrate this, I simulated data sets with N=1000 from the following data-generating process (DGP):

S ~ N(0,1)
P = S + u
u ~ N(0,σ)
σ = ρ ⋅ {exp[max(S) - S]} + κ,

where ρ ∈ {0, 0.005, ..., 0.050} and κ ∈ {0.1, 1, 100}. For each combination of ρ and κ, I simulated 250 data sets. Note, the measurement error, u, is symmetrically distributed around zero and uncorrelated with S; however, the variance is decreasing in S if ρ > 0. Thus, in all cases, all individuals are just as likely to understate their skill as overstate their skill. The code is available here.
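For readers without Stata, the DGP above can be sketched in Python roughly as follows. This is my own illustrative translation, not the linked code. It assumes σ is the standard deviation (not the variance) of u, uses empirical percentile ranks, and the function names and seed are hypothetical.

```python
import numpy as np

def simulate_once(rho, kappa, n, rng):
    """One data set; returns mean R_s (row 0) and mean R_p (row 1) by quartile of S."""
    S = rng.normal(0.0, 1.0, n)
    # Assumption: sigma is read as the standard deviation of u.
    sigma = rho * np.exp(S.max() - S) + kappa
    P = S + rng.normal(0.0, sigma)            # u has mean zero for everyone
    rank_s = np.argsort(np.argsort(S))        # 0..n-1
    rank_p = np.argsort(np.argsort(P))
    R_s = 100.0 * (rank_s + 1) / n
    R_p = 100.0 * (rank_p + 1) / n
    quartile = rank_s * 4 // n                # 0..3, quartiles based on S
    return np.array([[R[quartile == q].mean() for q in range(4)]
                     for R in (R_s, R_p)])

def dunning_kruger_means(rho, kappa, reps=250, n=1000, seed=0):
    """Average the quartile means over `reps` simulated data sets."""
    rng = np.random.default_rng(seed)
    return np.mean([simulate_once(rho, kappa, n, rng) for _ in range(reps)], axis=0)

for rho in (0.0, 0.025, 0.05):
    means = dunning_kruger_means(rho, kappa=0.1)
    print(f"rho = {rho:.3f}")
    print("  mean R_s by quartile:", np.round(means[0], 1))
    print("  mean R_p by quartile:", np.round(means[1], 1))
```

Each call returns a 2 x 4 array, so plotting row 0 and row 1 against the quartile number gives a graph in the Dunning-Kruger format.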

For κ = 0.1, here is the Dunning-Kruger graph averaged over the 250 repetitions


The simulations show that the percentile rank of perceived skill, R_p, varies less across quartiles of actual skill as ρ increases. But, even when ρ = 0, the Dunning-Kruger Effect is visible. What if, instead of plotting percentile ranks on the y-axis, Dunning and Kruger had plotted the raw data, S and P? Here is the corresponding graph


Looks funny, no? Well, that's because the average value of P perfectly aligns with the plot for S since the measurement error is mean zero in every DGP. Yet, we see the Dunning-Kruger Effect when we examine the data in percentile ranks, since the underlying classical measurement error in actual values must become nonclassical measurement error for the ranks.
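The raw-data version of the exercise is easy to check in a few lines of Python. As before, this is a hedged sketch and not the Stata code behind the graphs: σ is assumed to be a standard deviation, and the function name raw_quartile_means and the seed are made up for illustration.

```python
import numpy as np

def raw_quartile_means(rho, kappa, reps=250, n=1000, seed=0):
    """Mean of raw S (row 0) and raw P (row 1) by quartile of S, averaged over reps."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(reps):
        S = rng.normal(0.0, 1.0, n)
        # Assumption: sigma is read as the standard deviation of u.
        sigma = rho * np.exp(S.max() - S) + kappa
        P = S + rng.normal(0.0, sigma)
        quartile = np.argsort(np.argsort(S)) * 4 // n   # 0..3, based on S
        results.append([[x[quartile == q].mean() for q in range(4)]
                        for x in (S, P)])
    return np.mean(results, axis=0)

raw = raw_quartile_means(rho=0.0, kappa=0.1)
print("mean S by quartile:", np.round(raw[0], 3))
print("mean P by quartile:", np.round(raw[1], 3))
```

Because u has mean zero at every value of S, the two rows should agree up to simulation noise, which is why the two lines sit on top of each other in the raw-data graph.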

As a final exercise, I increase the value of κ to 100. 


With κ this large, P is dominated by the (classical) measurement error; the signal-to-noise ratio is essentially zero. Thus, ranks of individual perceived skill are uniformly distributed even conditional on actual skill. As a result, the average percentile rank of perceived skill, R_p, is 50 in every quartile for every value of ρ. Thus, we see complete mean reversion in ranks from purely classical measurement error in the underlying values.



Note, this exercise does not disprove the Dunning-Kruger Effect; it may very well exist. It just means that the authors' presentation of the data cannot prove its existence. To do so requires the authors to present the raw data -- actual and perceived skill -- not percentile ranks of each.

But, this does bring me to another point. The careful reader will notice something fishy about the original Dunning-Kruger plot above: the mean perceived percentile rank in every quartile is above 50, the median. This cannot happen if the data are generated as I described above. For instance, the mean percentile rank is 50 in each quartile in my graph with κ = 100, and it is less than 50 in the lower quartiles and greater than 50 in the upper quartiles when κ = 0.1.

The reason for Dunning and Kruger's result above is that their data contain information on S, but not on P. Instead, they contain data on R_p directly, obtained by asking individuals to self-report not their perceived actual skill, P, but rather their perceived ranking of their skill, R_p. P itself is missing in their data. Because they are not directly converting P into R_p, there is nothing that prevents all individuals from guessing that they are above the median.


This is not a distinction without a difference. Asking individuals to guess R_p requires them to first guess their actual skill, S, and then guess the distribution of S in the population. The Dunning and Kruger graph above can also be generated in a world where all individuals correctly know their own skill, but observe the population distribution of skill with error.

To illustrate this, I simulated data sets with N=1000 from the following data-generating process (DGP):

S ~ N(0,1)
R_s =  100 ⋅ F(S,0,1)
R_p =  100 ⋅ F(S,-v,1)
v ~ N(μ,1)
μ = κ ⋅ {exp[max(S) - S]},

where F(x,0,1) is Pr(X < x) when X ~ N(0,1). Thus, R_s is the rank of actual skill since the true distribution of skill is standard normal. However, R_p is the rank of S derived from a normal distribution with mean -v and unit variance. If v differs from zero, then R_p will be mismeasured even though S is known. I set κ ∈ {0, 0.01, 0.02} and simulate 250 data sets under each value. Note, even when κ = 0, R_p will still generally differ from R_s since v is a random variable.
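Here is a rough Python rendering of this second DGP, again as a sketch under stated assumptions and not the Stata code behind the graph. It assumes v is drawn separately for each individual and that F(S,-v,1) is the N(-v,1) CDF evaluated at S, which equals the standard normal CDF at S + v. The function names and seed are hypothetical.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def simulate_once(kappa, n, rng):
    """One data set; returns mean R_s (row 0) and mean R_p (row 1) by quartile of S."""
    S = rng.normal(0.0, 1.0, n)
    mu = kappa * np.exp(S.max() - S)
    v = rng.normal(mu, 1.0)                   # assumption: one draw of v per individual
    R_s = 100.0 * std_normal_cdf(S)           # rank under the true N(0, 1) population
    R_p = 100.0 * std_normal_cdf(S + v)       # rank under a misperceived N(-v, 1) population
    quartile = np.argsort(np.argsort(S)) * 4 // n   # 0..3, based on S
    return np.array([[R[quartile == q].mean() for q in range(4)]
                     for R in (R_s, R_p)])

def perceived_rank_means(kappa, reps=250, n=1000, seed=0):
    """Average the quartile means over `reps` simulated data sets."""
    rng = np.random.default_rng(seed)
    return np.mean([simulate_once(kappa, n, rng) for _ in range(reps)], axis=0)

for kappa in (0.0, 0.01, 0.02):
    means = perceived_rank_means(kappa)
    print(f"kappa = {kappa:.2f}")
    print("  mean R_s by quartile:", np.round(means[0], 1))
    print("  mean R_p by quartile:", np.round(means[1], 1))
```

Note that S enters R_p without any error here; all of the gap between R_p and R_s comes from individuals misjudging where the population is centered.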

Here is the Dunning-Kruger graph averaged over the 250 repetitions


The results, particularly with κ = 0.01, look remarkably similar to the original Dunning and Kruger graph.


The problem is forgetting about something that Griliches (1985) wrote more than a decade before the Dunning and Kruger paper. He wrote:

"[A]ny serious data analysis has to consider at least two data generation components: the economic behavior model describing the stimulus-response behavior of the economic actors and the measurement model describing how and when this behavior was recorded and summarized. While it is usual to focus our attention on the former, a complete analysis must consider them both."

The Dunning-Kruger Effect should serve as a good reminder that, as empirical researchers, we must not give short shrift to issues of measurement lest we overstate our own intelligence.



References

Griliches, Z. (1985), "Data and Econometricians--The Uneasy Alliance," American Economic Review, 75, 196-200

Kruger, J. and D. Dunning (1999), "Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments," Journal of Personality and Social Psychology, 77, 1121-1134
