Using numbers to describe the world is fraught with peril. The problem is not that you can prove anything with statistics [1] (you can’t), nor the Mark Twain [2] quip equating statistics with lies, but rather that numbers unaccompanied by correct reference points [3] tend to be very misleading. Hayek’s famous remark that aggregates conceal [4] is merely a subset of this larger mistake. The major concern is reporting one number (an average, a percentage, a growth statistic) when the correct, more fully elaborated story includes that number in combination with other numbers.
In politics (and the media tasked with covering it), the stakes of portraying numbers in one’s own favor are very high, and so we see this mistake all the time. Let me address what may be its most common form: failing to divide by the correct denominator, a mistake disguised by the frequent and careless use of percentages.
Normally, claims expressed in percent such as “87.2% of American households have a computer [5]” have straightforward meanings; the word ‘percent’ literally means “by the hundred.” [6] If there were 100 American households, we would know that about 87 of them have a computer. Our brains correctly read “almost all American households”, since 87.2 is the vast majority of 100. If we have some additional information about the number of households (119 million) [5], we can quickly establish that some 104 million households have a computer; the remaining 12.8% (around 15 million) don’t. We know this because the two groups are mutually exclusive and together cover every household (either your household has a computer or it doesn’t), so the percentages sum to a hundred (87.2+12.8=100). Easy.
Dealing with non-negative numbers such as car owners, votes, revenues or incomes this way is rarely a problem, and percentages are ideally suited to the task; they don’t require the reader to know the number of households in order to gauge the meaning of the 87.2% figure. Numbers reported in percent give us a natural reference point: households with computers divided by all households.
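The household arithmetic above is a single multiplication and a subtraction. A minimal Python sketch, using the article’s 87.2% figure and its rounded count of 119 million households:

```python
# Figures from the text: 87.2% of roughly 119 million U.S. households have a computer.
TOTAL_HOUSEHOLDS = 119_000_000
SHARE_WITH_COMPUTER = 0.872

with_computer = TOTAL_HOUSEHOLDS * SHARE_WITH_COMPUTER
# The two groups are mutually exclusive and cover every household,
# so everyone not in the first group is in the second.
without_computer = TOTAL_HOUSEHOLDS - with_computer

print(f"with a computer:    {with_computer / 1e6:.0f} million")     # 104 million
print(f"without a computer: {without_computer / 1e6:.0f} million")  # 15 million
```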
This changes quickly when the numbers you’re describing can be negative, as with net job creation or income growth. Because such changes can turn negative for even large parts of a population, the usual meaning of “percent” is lost. The intuition is this: when some components of a total are negative and you divide each positive component by the (smaller) net total, the positive components’ shares can easily sum to more than a hundred.
Here’s an illustration of a hypothetical economy of three people (A, B, and C), whose combined earnings in the first year are $300 (distributed as $50 for A, $100 for B, and $150 for C). In year two, A increased his income to $70 (a 40% increase), B earned only $80 (a 20% reduction), and C’s income went from $150 to $180 (a 20% increase). Total income in our economy grew by 10% (A+B+C equaled $300 in year one and $330 in year two), a total income gain of $30. If we divide A’s income gain ($20) by the total income gain ($30), we find that two-thirds, or 66.7%, of the income gain went to A. If we similarly divide C’s income gain ($30) by the total income gain ($30), we find that 100% of the year’s income gain went to C. How in the world can C have gotten the entire income gain when we just said that A received two-thirds of it?
| Earner | Year 1 | Year 2 | Income change (%) | Income change ($) | Share of total income gain |
|---|---|---|---|---|---|
| A | $50 | $70 | +40% | +$20 | +66.7% |
| B | $100 | $80 | -20% | -$20 | -66.7% |
| C | $150 | $180 | +20% | +$30 | +100% |
| Sum | $300 | $330 | +10% | +$30 | 100% |
Any data presented this way would quickly raise questions: clearly, something is wrong with the statement that A and C together received 166.7% of the total income gain. Because B’s income loss distorts the picture, the percentages reflecting A’s and C’s shares of the income gain no longer mean what they usually mean.
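To check the table, here is a short Python sketch that recomputes each earner’s share of the net income gain. The incomes are the hypothetical ones above; the variable names are illustrative.

```python
# Year-one and year-two incomes for the three hypothetical earners.
incomes = {"A": (50, 70), "B": (100, 80), "C": (150, 180)}

# Absolute change per earner, and the net change for the whole economy.
gains = {name: y2 - y1 for name, (y1, y2) in incomes.items()}
net_gain = sum(gains.values())  # 20 - 20 + 30 = 30

# Each earner's "share" of the net gain, in percent.
shares = {name: 100 * g / net_gain for name, g in gains.items()}

for name, share in shares.items():
    print(f"{name}: {share:+.1f}%")  # A: +66.7%, B: -66.7%, C: +100.0%

print(f"A + C:     {shares['A'] + shares['C']:.1f}%")  # 166.7%, positive shares only
print(f"A + B + C: {sum(shares.values()):.1f}%")       # 100.0%, only the full set sums to 100
```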
What Has Gone Wrong Here?
As the table shows, A’s income gain of $20 exactly cancels out B’s loss of $20, so any statement about a single person’s share of income growth is seriously misleading. The shares of income growth sum to a hundred only when we include B’s negative share; if we care only about the top earners [7] (in this case C), we could easily (and erroneously) conclude that the rich captured all the income gain.
Let’s use another example to make the mistake even more blatantly obvious. If you remember Mitt Romney’s job creation debate in the run-up to the 2012 election, you’ll recognize it as a great illustration of using incorrect denominators. The RNC and the Romney Campaign [8] calculated that the net job loss from January 2009 to March 2012 was 740,000 [9], and that the net job loss for women during the same period was 683,000. A quick calculation shows that women accounted for 92.3% of the net job losses (683,000/740,000=0.923). Of course, we can make this even more absurd by comparing February 2009 to March 2012 [10] instead, a period in which the net job loss was only 16,000 jobs. On net, women lost almost half a million jobs during that period, which yields a result of around 3100% of all job losses! The straightforward meaning of “percent” as a fraction of a hundred has entirely disappeared.
Because “net job losses” are jobs lost minus jobs created, the total nets out gains elsewhere in the economy and can be far smaller than any single group’s losses; it is entirely the wrong denominator for women’s net job losses. “Percent” no longer carries its usual straightforward meaning, yet it still conveys that meaning to the uninformed reader.
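The campaign’s percentages are plain division. A short Python sketch using the figures quoted above; since the text gives only “almost half a million” for women’s losses in the second period, the second calculation does not use an exact count but works backwards from the ~3100% figure to show it is consistent with that description.

```python
# Net job losses quoted above, January 2009 to March 2012.
total_net_loss = 740_000
women_net_loss = 683_000
share = women_net_loss / total_net_loss
print(f"women's 'share' of net job losses: {share:.1%}")  # 92.3%

# February 2009 to March 2012: the total net loss was only 16,000 jobs.
# Working backwards, a 'share' of about 3100% implies this many net losses for women:
total_net_loss_feb = 16_000
implied_women_loss = 31.0 * total_net_loss_feb
print(f"implied net job losses for women: {implied_women_loss:,.0f}")  # 496,000
```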
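The mechanism can be sketched in a few lines of Python. The group figures below are hypothetical, not actual employment data, chosen only to show that one group’s “share” of a net total balloons as gains elsewhere push the net toward zero; `share_of_net` is an illustrative helper, not a standard function.

```python
from typing import Sequence


def share_of_net(part: float, parts: Sequence[float]) -> float:
    """Return `part` as a percentage of the net total of `parts` (the naive calculation)."""
    return 100 * part / sum(parts)


# Hypothetical net job changes (negative = net loss); not real data.
women = -500_000
for men in (-240_000, 0, 250_000, 484_000):
    net = women + men
    share = share_of_net(women, [women, men])
    print(f"men {men:+,}, net {net:+,}: women's 'share' of net losses {share:,.0f}%")
```

Women’s own losses never change, yet their “share” climbs from 68% to 100%, 200% and finally 3,125% as the hypothetical gains for men shrink the net loss from 740,000 to 16,000.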
Percentage-Flingers Abound
Sure, anybody with even a rudimentary understanding of statistics knows not to report selected percentages when the underlying set of numbers [11] can be negative. Yet the skilled deceiver or the careless fool can still say things like “100% of the income gain between year one and year two went to earner C”, and the reader will interpret “100%” with its usual meaning (“all”). But whenever the full set includes negative numbers, a share of the net total is no longer a portion of a hundred: individual shares can exceed 100% or fall below zero.
Whether through malice or ignorance, economists and media pundits fall into this trap all the time. CNBC [12] reported last year that “in nine states, the income growth of the top 1 percent was half or more of all income growth in that time period.” A few years ago, the economist Joseph Stiglitz [13], known for his work on inequality, wrote that “all the growth in recent decades — and more — has gone to those at the top”. A recent working paper by Emmanuel Saez [14], a U.C. Berkeley economist who really should know better, summarized its findings thus: the “top 1% families captured 49% of total real income growth per family from 2009-2017”. Coming out of the Great Recession, the British magazine The Economist [15] remarked that “around 95% of the increase in American income since 2009 has gone to the top 1%”. Of course, leftist politicians make these kinds of claims [16] on a regular basis, but they are statistically unsound.
In his successful book How Not to Be Wrong, mathematics professor Jordan Ellenberg concluded that “the combination of positive and negative allows you, if you’re not careful, to tell a fake story, in which the whole work of job creation in the tradeable sector was done by [a single] industry.”
Reporting percentages when the underlying dataset includes negative numbers removes the standard meaning of percent. 90% of something that includes negative numbers no longer means “almost all” or “a very large majority.”
Numbers presented alone can indeed conceal or mislead; “Always Be Comparing Thy Number” [17] ought to be a statistical commandment. While numbers reported in percentages do provide a useful reference for the reader, dividing by the wrong denominator, or by a net total that includes negative numbers, seriously distorts the story. Be wary when statistical deceivers or careless fools fling percentages around. Sometimes they really shouldn’t.
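One way to act on this rule in practice is to decline to report a share whenever the components being summed have mixed signs. A minimal Python sketch; `percent_share` is a hypothetical helper, and returning `None` is just one design choice (raising an exception would work equally well).

```python
from typing import Optional, Sequence


def percent_share(part: float, parts: Sequence[float]) -> Optional[float]:
    """Return `part` as a percentage of sum(parts), or None if that share would mislead.

    A share reads as "so many out of a hundred" only when every component has the
    same sign. With mixed signs, individual shares can exceed 100% or drop below 0%,
    so this helper declines to report one.
    """
    has_positive = any(p > 0 for p in parts)
    has_negative = any(p < 0 for p in parts)
    total = sum(parts)
    if (has_positive and has_negative) or total == 0:
        return None
    return 100 * part / total


print(percent_share(872, [872, 128]))    # per 1,000 households, with/without a computer: 87.2
print(percent_share(30, [20, -20, 30]))  # income changes of earners A, B, C: None
```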