Life or Death

In a recent post, I wrote about the choices we make in statistical analysis along a continuum from simple to complex methods. Methods, of course, can be simple or complex along a number of different dimensions, and methods that seem simple may actually be quite complex and vice versa. However, a recurring theme in a lot of empirical work is the desire to represent complex phenomena with a single number.


We do this with estimands, such as when we represent heterogeneous causal effects with a single average treatment effect on the treated (ATT), for example. But, we also do this quite frequently with data. We measure complex things such as "inequality" with a single number such as the Gini coefficient, or "democracy" using a polity index, or "corruption" using Transparency International's Corruption Perceptions Index, or "development" using the Human Development Index.

When we attempt to boil complex things down to a single number, this creates a number of issues. First, it sets us up for criticism by others who think that a different single number is more appropriate. This is particularly stress-inducing for young researchers.


But, something all young researchers must learn:

You can make some of the people happy some of the time, but you will never, ever, make all of the people happy all of the time.


Second, it may sidetrack us from our actual research goals and send us down a rabbit hole pondering the "perfect" summary value. No one, and I mean no one, gets a Ph.D. and devotes their life to research to think long, deep thoughts about how to represent complex, real world phenomena with a single number.

 

Third, indices run the risk that people will use and abuse them without fully comprehending their intended use.

This final point is what motivated me to scrap my plans for this morning and hastily write this post. As I got to the office this morning, the first thing I did was check the news online at CNN. Big mistake. I was already in a foul mood driving into the office thinking about yesterday's news and contemplating whether I need to let it out as a rant using this blog as a forum even though it has nothing to do with econometrics. But, then I looked at today's headlines. 

The headline article was about Ukraine receiving criticism for an alleged attack against Russia on Russian soil. And?! What, Russia can attack Ukraine on its soil, unprovoked, but how dare they respond in kind? As I discussed with the great Dan Collier on Twitter -- yesterday, was it? -- much of the current predicament in the US can be attributed to the "good" people being backed into taking the high road while the "bad" people continue to use their badness to destroy this world.

Then I turned to the next headline: The good, bad and ugly about BMI

And, I was filled with more rage as I read the article. And, so, here we find ourselves. 

The article is a multi-pronged attack -- from doctors, psychologists, etc. -- on another index, the Body Mass Index (BMI). Like the examples listed above, BMI is a well-known index used to represent something complex -- one's health -- with a single number. Yet, the article misses the larger point. No single number can ever represent something so complex as one's health. 


But, the article is written as if this is breaking news. 

The problem is not with the index, BMI. It is not with the person that created the index. According to Wikipedia:

"Adolphe Quetelet, a Belgian astronomer, mathematician, statistician, and sociologist, devised the basis of the BMI between 1830 and 1850 as he developed what he called "social physics". Quetelet himself never intended for the index, then called the Quetelet Index, to be used as a means of medical assessment." 

... never intended to be used as a means of medical assessment. And, also, although the index was created nearly two centuries ago, people are still criticizing Quetelet's work. Let that sink in, all you young researchers out there! Your goal should not be to conduct research that is immune to all criticism. Rather, your goal should be to conduct research that is important enough that people are still criticizing it more than a century later.

But, back to the main point. The problem is not the index. The problem is with those who misinterpret the index as something it is not. Like most (all?) things in statistics, math, data, ... these things are pure, they are logical, they are true by definition. It is only when humans get involved, that errors are introduced. The Quetelet Index, or BMI, is what it purports to be. No more, no less. It becomes flawed when humans want to ascribe something else to it. 

The article at CNN should be entitled: The good, bad, and ugly about the misapplication of the BMI by (some) humans.


What would that article look like? It would (IMHO) discuss two important things. First, how human desire for simple and quick answers leads one astray. Want to know if someone is healthy or not? Well, it's gonna take a bit more work than: (i) stand on a scale, (ii) stand against the wall, and (iii) lemme grab my calculator. We are the microwave generation, but there is no quick way to measure someone's health. 


Second, how the US educational system has failed to properly teach about measurement error.


Yes, you heard me. If humans were properly instructed in measurement error, then they would know that one should think about health and BMI in the following terms:

Health = BMI + measurement error

It's really that simple. Viewing the world through this simple lens would immediately convey the fact that BMI is not meant to be a complete representation of the complex phenomenon known as health. 


Not only does an even rudimentary appreciation for measurement error avoid the human error that is the real message in the original CNN article, but it also makes clear that scrapping BMI and trying to come up with another index in its place will not get one anywhere. 


Actually, let me correct that last statement. Devoting time to creating a "better" index will accomplish something. 



It will waste your time. Not coincidentally, I wrote a paper on this topic a few years back: "Covariate Measurement and Endogeneity" in Economics Letters in 2015. It has ... two citations. One of which is mine.



[Yes, this is my job and I get paid to do it.]

The paper was motivated by my research at the time on the causal effect of environmental regulation on various outcomes. "Regulation" is also a complex phenomenon that is difficult to measure, but which has been captured by a variety of indices proposed by researchers. However, in this literature, one worries that regulation is endogenous due to simultaneity or unobserved heterogeneity even if it were measured accurately. As a result, studies in this area typically resort to Instrumental Variables (IV). However, it is well-known that IV may also be a potential solution to measurement error.

This got me thinking. 



Researchers -- smart researchers -- were investing a lot of time attempting to come up with a "better" measure of environmental regulation. But, since one was going to use IV to estimate the model anyway, is there a benefit to coming up with a "better" measure? It would not circumvent the need to use IV. So, perhaps not.

Well, it turns out that that is indeed the case. IV suffers from finite sample bias, but it is consistent. So, asymptotically, the reliability ratio of the index is irrelevant. In finite samples, the bias depends on the strength of the instruments and the reliability ratio (under classical measurement error). However, the bias is not monotonically decreasing in the reliability ratio; improving the accuracy of the index may actually exacerbate the bias. The punchline from the paper was that researchers ought to spend their time finding stronger instruments, not a less mismeasured index.
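For readers who want to see the asymptotic part of this argument in action, here is a minimal simulation sketch. It is not the setup from the paper; it is a toy design I am assuming purely for illustration: one regressor, one instrument, classical measurement error, and no other source of endogeneity. The function name `simulate`, the parameter values, and the sample size are all hypothetical. With a large sample, OLS on the mismeasured regressor should land near the true coefficient times the reliability ratio, while IV should land near the true coefficient regardless of the reliability ratio. It does not attempt to reproduce the finite-sample, non-monotonic bias result.

```python
import numpy as np


def simulate(reliability, n=200_000, beta=1.0, pi=0.5, seed=0):
    """Return (OLS, IV) slope estimates under classical measurement error.

    All design choices here are illustrative assumptions:
    reliability = var(true x) / var(observed x), and pi is the
    first-stage coefficient on the instrument.
    """
    rng = np.random.default_rng(seed)

    z = rng.normal(size=n)                    # instrument, variance 1
    x_true = pi * z + rng.normal(size=n)      # true (unobserved) regressor
    var_true = pi**2 + 1.0                    # population variance of x_true

    # Choose the noise variance so the reliability ratio hits the target.
    var_u = var_true * (1.0 - reliability) / reliability
    u = rng.normal(scale=np.sqrt(var_u), size=n)
    x_obs = x_true + u                        # mismeasured index

    y = beta * x_true + rng.normal(size=n)    # outcome depends on the truth

    # OLS slope of y on the mismeasured regressor (attenuated toward zero).
    ols = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
    # Simple IV (Wald) estimator: cov(z, y) / cov(z, x_obs).
    iv = np.cov(z, y)[0, 1] / np.cov(z, x_obs)[0, 1]
    return ols, iv


if __name__ == "__main__":
    for r in (0.9, 0.7, 0.5):
        ols, iv = simulate(r)
        print(f"reliability={r:.1f}  OLS={ols:.3f}  IV={iv:.3f}")
```

In this toy design, making the index noisier pulls the OLS estimate further toward zero but leaves the large-sample IV estimate roughly where it was, which is the sense in which the reliability ratio is irrelevant asymptotically.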

In the CNN article, humans ought to stop trying to find a single, simple number to measure someone's health. And, humans ought to know a thing or two about measurement error.

Our lives may depend on it.

