Covid-19 and The Great Data Fallacy of 2020

03 . 17 . 20 Insights

“I wish it need not have happened in my time,” said Frodo.
“So do I,” said Gandalf, “and so do all who live to see such times. But that is not for them to decide. All we have to decide is what to do with the time that is given us.”

― J.R.R. Tolkien, The Fellowship of the Ring 


We live in unprecedented times, and like most people, I’ve spent the last few weeks trying to make sense of the changing world around me. But as I’ve examined the stream of data embedded in the ceaseless reports on Covid-19, I have been hectored by nagging concerns that I began expressing privately many weeks ago and that have now crescendoed into this editorial.

First, a brief review of basic statistical methods. For our purposes, we’ll want to think carefully about sampling. Sampling is the selection or polling of a subset of a total population from which we attempt to estimate characteristics of the entire population. We resort to sampling as an indicative measure of a population when it is too difficult to survey the entire population. Perhaps the form of sampling most of us are familiar with is political polling, where some number of persons, perhaps 5,000 likely voters, are asked who they intend to vote for, and from their responses an estimate of the share of votes each candidate will receive is derived. Thus, if 1,000 respondents say they will vote for Candidate A and 4,000 respondents say they will vote for Candidate B, we do not conclude that Candidate A will receive 1,000 votes. Instead, we conclude that approximately 20% of likely voters will vote for Candidate A. So, if there are 10,000,000 voters, this implies that roughly 2,000,000 (20% of 10,000,000) voters will vote for Candidate A and roughly 8,000,000 (80% of 10,000,000) will vote for Candidate B. Sampling isn’t perfect, and depending on numerous variables (e.g. sample size) there is an associated margin of error that represents the expected range of likely miscalculation in our statistical estimate, but with proper sampling methods the margin of error is usually relatively low.
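The polling arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `poll_estimate` is hypothetical, the poll is treated as a simple random sample, and the margin of error uses the usual 95% normal approximation (z = 1.96), none of which is specified in the example itself.

```python
import math

def poll_estimate(votes_for, sample_size, electorate, z=1.96):
    """Extrapolate a poll result to the full electorate.

    Assumes a simple random sample and a normal approximation
    for the margin of error (z = 1.96 for roughly 95% confidence).
    """
    share = votes_for / sample_size
    # Standard error of a sample proportion, scaled by z.
    margin = z * math.sqrt(share * (1 - share) / sample_size)
    projected_votes = share * electorate
    return share, margin, projected_votes

# Figures from the example: 1,000 of 5,000 respondents back Candidate A,
# and there are 10,000,000 voters in total.
share, margin, projected = poll_estimate(1_000, 5_000, 10_000_000)
print(f"Candidate A share: {share:.1%} +/- {margin:.1%}")
print(f"Projected votes:   {projected:,.0f}")
```

Under these assumptions the margin of error comes out at a little over one percentage point, which is the sense in which a properly drawn sample of 5,000 can speak for ten million voters.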

Sampling is not a cookie cutter methodology. It can be divided into two main approaches: (1) Probability Samples and (2) Non-Probability Samples. Probability Samples are composed in such a way that the likelihood that any given member of the total population will be selected can be calculated. This means that we know the content of our sample pool. An example of this might be a telephone survey based on random digit dialing, where pollsters have mapped relevant demographic data to each phone number and know the odds of each number being dialed. By contrast, Non-Probability Samples do not employ this randomization feature, and so they are subject to various types of response and selection biases. For example, if a customer service hotline provides the option to take a survey at the end of the call (an opt-in survey), it may be that very unsatisfied customers are much more likely to participate in the survey than satisfied customers, thus skewing the sample results more negative than the actual views of the customer population. Critically, in this case, pollsters do not know what type of customers make up their sample base, because the likely selection bias means the sample is not random.
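The opt-in survey problem can be made concrete with a small simulation. Every number here is an illustrative assumption rather than real data: 20% of customers are unsatisfied, unsatisfied customers opt in half the time, and satisfied customers opt in one time in ten. The function name `simulate_surveys` is likewise hypothetical.

```python
import random

def simulate_surveys(n_customers=100_000, unhappy_share=0.20,
                     p_respond_unhappy=0.50, p_respond_happy=0.10,
                     sample_size=5_000, seed=0):
    """Compare a probability sample with an opt-in sample (all inputs assumed)."""
    rng = random.Random(seed)
    # True marks an unsatisfied customer, False a satisfied one.
    customers = [rng.random() < unhappy_share for _ in range(n_customers)]

    # Probability sample: every customer has the same known chance of selection.
    random_sample = rng.sample(customers, sample_size)

    # Opt-in sample: unsatisfied customers choose to respond far more often.
    opt_in = [c for c in customers
              if rng.random() < (p_respond_unhappy if c else p_respond_happy)]

    true_rate = sum(customers) / len(customers)
    random_rate = sum(random_sample) / sample_size
    opt_in_rate = sum(opt_in) / len(opt_in)
    return true_rate, random_rate, opt_in_rate

true_rate, random_rate, opt_in_rate = simulate_surveys()
print(f"True unsatisfied share:      {true_rate:.1%}")
print(f"Probability-sample estimate: {random_rate:.1%}")
print(f"Opt-in survey estimate:      {opt_in_rate:.1%}")
```

With these assumed response rates the random sample lands close to the true share, while the opt-in survey reports that more than half of customers are unsatisfied, even though only about one in five actually is.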

So, with our minds freshly oriented towards statistics, let’s pivot back to the Covid-19 data. As we are by now all aware, people are being urged to “bend the curve” of infection growth by limiting the types of social contact that allow viruses to rapidly spread. Bending the curve is, ultimately, an attempt to lower peak hospital utilization rates so that fatalities are minimized. Thus, our most critical data points are hospital utilization rate, ICU utilization rate, critical care capacity, and mortality rate, but all the other data points are proximate causes of the mortality rate. So, perhaps it might make sense to start there.

On March 3rd, 2020, the reported death rate for Covid-19 was calculated at 3.42%. This was based on 90,883 globally reported cases and 3,110 deaths. This is the number that has been bothering me for the last few weeks. First, it’s important to note that most cold and flu-like viruses impact disparate people in different ways. For example, in SARS-CoV, the virus associated with the original SARS outbreak in 2002 – 2003, only ~15% of reported cases required serious medical intervention. This implies that, at minimum, 85% of cases were atypical (presenting as asymptomatic or with mild symptoms). We are just now starting to receive early estimates of SARS-CoV-2 that suggest that up to 90% of cases may be atypical.
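For reference, the 3.42% figure is simply reported deaths divided by reported cases. A minimal sketch using the two numbers quoted above (the function name is just illustrative):

```python
def case_fatality_rate(deaths, reported_cases):
    """Deaths as a fraction of *reported* cases only."""
    return deaths / reported_cases

# March 3rd, 2020 figures quoted in the text.
cfr = case_fatality_rate(deaths=3_110, reported_cases=90_883)
print(f"Reported death rate: {cfr:.2%}")  # prints "Reported death rate: 3.42%"
```

Note that the denominator counts only cases that were tested and reported, so the result is only as representative as the sample of people who were tested.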

Recall our brief review of various types of selection and response bias. If most Covid-19 cases are mild or have no symptoms – particularly in the young, as the data suggest – then it seems likely that our 90,883 reported cases skew towards the most severe infections. This is because in a limited testing environment there will be a selection bias toward testing those who must be hospitalized versus mild or asymptomatic cases, where people simply remain home and untested. Because it is all but certain that most cases are atypical, deriving accurate mortality rates depends on creating a Probability Sample diverse enough to capture the actual infection rate. But this has, shockingly, not happened. Thus, what has widely been reported is the death rate for severe, known cases of Covid-19 and not the actual mortality rate of total infections.

The distinction I am describing is the difference between the reported Case Fatality Rate (CFR) of 3.42% and the global Infection Fatality Rate (IFR). That the infection rate is likely being severely underreported makes sense when you consider the R0 (the basic reproduction number, a measure of contagiousness) of the virus. A virus with an R0 of 2.2 – 2.6, such as SARS-CoV-2, that has emerged across the globe and has been on the loose for well over 60 days should have created a number of total infections at least an order of magnitude higher than the 90,883 cases in the March 3rd data. After adjusting for lagging indicators, the global death rate is probably no higher than ~0.4%. For comparison, the typical death rate of the flu is estimated to be ~0.1%. The implication that Covid-19 is perhaps up to 4 times deadlier than the common flu is serious, but a far cry from being 34 times deadlier, as has been widely misreported. I am deeply concerned that these two very different fatality metrics seem to have been widely conflated in subsequent policy and reporting. This is not the first time that the media and experts have gotten it wrong. For example, the initial reports for H1N1 “swine flu” suggested an IFR of 1.0%. That turned out to be inaccurate by a factor of 5, with the IFR eventually adjusted downward to ~0.2%. In the end, the math suggests that SARS-CoV-2 has either a lower R0 or a lower IFR than is being reported. Given its worldwide spread, the most likely conclusion is that the R0 estimates are generally correct and the IFR is much lower than is generally perceived.
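To see how sensitive the fatality rate is to undercounting, here is a rough sketch that divides the same reported deaths by a larger, assumed number of true infections. The undercount factors are hypothetical inputs chosen for illustration, not measurements, and the sketch ignores the adjustment for lagging indicators mentioned above.

```python
REPORTED_CASES = 90_883    # March 3rd, 2020 figure from the text
REPORTED_DEATHS = 3_110    # March 3rd, 2020 figure from the text
FLU_DEATH_RATE = 0.001     # the ~0.1% flu estimate cited in the text

def implied_ifr(deaths, reported_cases, undercount_factor):
    """Fatality rate if true infections = reported cases * undercount_factor."""
    return deaths / (reported_cases * undercount_factor)

# Hypothetical undercount factors; 10 corresponds to "an order of magnitude".
for factor in (1, 5, 10, 20):
    ifr = implied_ifr(REPORTED_DEATHS, REPORTED_CASES, factor)
    print(f"{factor:>2}x undercount -> fatality rate {ifr:.2%}, "
          f"{ifr / FLU_DEATH_RATE:.1f}x the flu estimate")
```

With no undercount the ratio to the flu estimate is about 34, which is the widely reported comparison; with a tenfold undercount it falls to between 3 and 4, in line with the argument above.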

This then brings us to the ongoing concern over hospital utilization rates. Here is the best data I was able to find for domestic hospital metrics along with my estimates of the US’s ability to increase critical care options before May when Covid-19 will begin its 3-month peak:

USA Hospital Data

I’ve created a model for domestic Covid-19 infection rates assuming an infection rate of 32% with an 86% atypical infection rate. Hospital utilization rates were taken from the recent Imperial College London study, which appears to wildly overestimate IFR but has good data for how severe cases of Covid-19 behave. The summary of pertinent data points from that model is as follows:

Covid-19 Summary Analysis
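The two headline inputs of the model (a 32% infection rate, of which 86% are atypical) can be turned into rough case counts. This is not a reconstruction of the model itself. In particular, the US population of 330 million is an assumed round number that does not come from the text, and the function name is hypothetical.

```python
US_POPULATION = 330_000_000  # assumption for illustration, not from the text
INFECTION_RATE = 0.32        # model input quoted in the text
ATYPICAL_SHARE = 0.86        # model input quoted in the text

def infection_breakdown(population, infection_rate, atypical_share):
    """Split projected infections into atypical (mild or asymptomatic) and the rest."""
    infected = population * infection_rate
    atypical = infected * atypical_share
    typical = infected - atypical
    return infected, atypical, typical

infected, atypical, typical = infection_breakdown(
    US_POPULATION, INFECTION_RATE, ATYPICAL_SHARE)
print(f"Projected infections:       {infected:,.0f}")
print(f"Atypical (mild/no symptoms): {atypical:,.0f}")
print(f"Remaining cases:            {typical:,.0f}")
```

Only the last group would be candidates for medical attention at all, and the hospital utilization rates taken from the Imperial College London study would apply on top of that figure.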

The initial implication of this projection is that we have enough resources to cover the influx of Covid-19, but it’s not quite that simple. Our critical care facilities are not evenly spread throughout the country and given the nature of viral contagion patterns it’s likely that some facilities will be overwhelmed while others remain underutilized. Thus, aside from the need to gather sufficient samples to determine actual infection and mortality rates, mass testing can also aid in predicting viral spread for the appropriate allocation and mobilization of resources such as ventilators, masks, and staff. Critically, mass testing can also detect transformations in viral behavior due to mutation, allowing us to better adjust to potential changes in R0 or lethality. Covid-19 may not be as serious a threat as is widely believed, but we quite literally can’t afford to operate blindly.

In the coming days, we’re likely to face an onslaught of locally or federally mandated shelter in place orders. The corresponding impact on the economy will depend on the level of nuance employed within these decrees. We can extrapolate from the available data on severe cases that Covid-19 is primarily a disease that affects the elderly and, to a lesser extent, those with relevant comorbidities such as obesity, compromised immune systems, or existing respiratory ailments. Thus, it is my recommendation that we impose stringent mandatory isolation for anyone over the age of 70 (and any family members that wish to stay with or care for them) in conjunction with stimulus funds to offset the personal and economic impacts of selective quarantine. Luckily, folks in the most vulnerable age brackets are largely already out of the labor force (less than ~10% of Americans over the age of 75 continue to work), and some portion of this work could be maintained from home, so the total personal and overall economic impact would be minimized. Widespread testing and serological surveys should also be implemented, and any person that tests positive for Covid-19 should be placed under mandatory quarantine until they test negative twice. Mandatory hygiene protocols for all businesses should also be implemented and strictly enforced. Finally, supplies of ventilators, masks, and increased critical care capacity should be amassed and intelligently mobilized in conjunction with testing data to increase capacity on demand and in anticipation of viral peaks within community clusters. These supplies can then be stockpiled in preparation for the next inevitable pandemic, with government purchases providing the ancillary benefit of a small, but potentially meaningful, economic stimulus.

Unfortunately, it is my expectation that we will soon see widespread coarse-grained blanket shutdowns of businesses and services that will rain down unprecedented destruction onto the domestic and global economy. If this happens, my estimation is that we will see an approximately 6 – 8 million increase in US unemployment claims by May 2020, though it could be double or more those numbers depending on the scope and duration of the shutdown. Under such a scenario, significant economic stimulus would be necessary, and I would expect what would then be a prudent $1.25T – $1.75T stimulus package to rapidly pass through Congress. The most obvious deleterious effect of such a massive stimulus will likely be that it exacerbates an already alarmingly strained debt to GDP ratio (currently at 107%). Devaluation of the currency and inflation, perhaps even hyperinflation, would naturally be all the more likely. As for commercial real estate, if sweeping shelter in place orders do come to pass, retail, hospitality, and to a lesser extent senior housing and work force housing will be severely impacted over at least a 3 – 6 month period, with retail perhaps never fully recovering. Small office would also be on the short-term chopping block. Lenders will likely be incentivized to make significant short-term debt modifications to help reeling sponsors make it through a mandated economic shutdown. The forward damage to retail would likely be a boon for industrial assets as the economy continues to transition away from traditional brick and mortar retail and toward digital storefronts. Industrial buildings might even see more demand if the notion of increasing domestic manufacturing and supply chain / logistics capacity as a matter of national security continues to gain traction, though this seems somewhat doubtful.

If, instead, we employ surgical isolation plans focused on the elderly and immunocompromised populations, we can likely keep unemployment below 10% while also providing superior protection for those most at risk. This would allow herd immunity to build without overwhelming the hospital system and limit stimulus and reparations to approximately $500MM, with less social disruption and faster economic recovery.

Covid-19 is rightfully a serious concern, but it is a concern that must be kept in context. If my projections are correct, we can expect an upper bound of approximately 162,000 domestic fatalities from Covid-19 before any threat to overwhelm the hospital system is fully ameliorated, with 77% of these fatalities occurring in people over the age of 65. With improved treatment methods derived from those countries in the early path of the virus, domestic fatality figures are likely to meaningfully decrease. While these numbers are deeply concerning and we have a duty to protect the young, elderly, and infirm, it might be helpful to put them in context. The following table represents the leading causes of death in the United States with an overlay for my upper bound projection of Covid-19 fatalities:

USA Leading Causes of Death

This chart raises a question: Why do we tolerate 55,000 deaths from communicable influenza, 170,000 accidental injuries (35,000 of which are car accidents) and 250,000 preventable medical errors every year? Presumably we do so because the systems and processes that lead to these fatalities create a net positive increase to social utility. So, while it’s true that cars and roads inevitably lead to accident fatalities, they also provide the infrastructure and low transportation cost that allow for any number of otherwise unattainable beneficial economic activities ranging from research to manufacture to shipping, distribution, and point of sale transaction. While the current medical system is imperfect and leads to error induced fatalities, decreased costs from limited regulatory burden and the corresponding expanded coverage create a net benefit to society. Do we need to have so many vehicular fatalities? Probably not. A mandatory $400 roll cage would probably dramatically cut down on vehicular deaths. Do we need to tolerate 250,000 medical errors? Probably not. There is very likely a cost-effective means to at least reduce these errors. And yet, every year, a quarter million Americans, of all ages, needlessly die. We are used to these errors. Comfortable with them. In many cases utterly ignorant that they even exist. But would we destroy the global economy, and all of the immense benefits and advances dependent on its existence, to save the people who die from these preventable deaths? Would we make things dramatically worse for everyone by halting progress and enervating our societies to save an affected few? It is, after all, the economy that provides food and medicine and police and fire and sanitation and courts and prisons and the EPA and transportation and heat and electricity and roads and parks and education and internet and entertainment and communications, and so on.

Would we see hundreds of millions die and billions live a lower quality of life by eliminating forward progress and the life changing medical and technological advances all predicated on a functioning global economy to save 50,000 Americans and 1.25 million people annually from the flu? Of course not. So then, why is our response to Covid-19 so different? Where is the line? I’m not certain that I know the answer to that question, and these sorts of philosophical inquiries hearken back to a long-standing debate between utilitarian and deontological ethics, but it is the mark of superficial thinking to so readily dismiss the long-term economic ramifications of our decisions as if those considerations don’t also matter, or perhaps even ultimately matter more. What I do know is that there is no free lunch. We will pay for Covid-19 and the likely forthcoming government mismanagement one way or another.

But there is more going on here than merely properly indexing various safety hazards. Human psychology has evolved so that familiar fatalities that are not so perceptually exotic as “Covid-19 from the Wuhan province of China” induce less internal apprehension. This is because our personal threat detection mechanisms, largely predicated on attentional salience, are not as easily triggered by the familiar, especially if those things have no personal impact on our lives. Media framing effects have also played a large role in public response to Covid-19. Framing the story as “a strain of the flu that is three to four times more deadly than average” would create a very different response than “an unknown disease from China is killing thousands of people.” Natural selection only cares if you live long enough to spread your genes, so selection probably favors those who fear immediate threats more so than long-term or future threats. Dying from lung cancer 30 years from now doesn’t keep you from procreating today in the same way as a hungry lion or an invisible virus. So, our salience predicated fear mechanism initially responds more powerfully to hungry lions and viruses than it does to the dangers of smoking.

Similarly, threats that trigger emotional cognition are more likely to influence our consciously controlled rational processes, especially if they engage personal conditions. Thus, normative moral judgment (incorrectly) changes when something scary that might kill my loved ones or me right now grabs our attention. Accordingly, human beings tend to overreact to immediate threats while often ignoring far-flung potential threats like medical errors and influenza, even though these already kill, and will undoubtedly go on silently killing, far more people than Covid-19 ever will. So then, it is likely no surprise that when considering our response to Covid-19 we are, by our very nature, much more concerned about the virus epidemic of today (even with a 99.6%+ survival rate) than the millions of people who will lose their jobs and homes or the inevitable cuts to schools and entitlements that will ultimately negatively impact far more lives when the US debt finally spirals out of control; an event which I fully expect the government to unnecessarily hasten with blunt, ineffective, pork-laden stimulus predicated on a poorly thought out shelter in place order. The rallying cry of “Every life is worth saving” will likely be conveniently relegated to the realm of subconscious cognitive dissonance when this epidemic inexorably ends. 12.5 million people will die over the next decade from the flu and 2.5 million Americans will needlessly die from medical errors over the same time span. No one will have destroyed the economy to save them and no one has bothered to do so prior to now. Influenza deaths are, apparently, not peculiar enough to merit such drastic measures.

It is not heartless to resist our natural tendency to focus on short-term threats in order to consider the best way to optimize the long-run health of our society. Many of the issues that have tested tensions between economic stability and various notions of equality will not be so easily argued for in the face of several trillion dollars of new debt and a hobbled economy. Luckily, in this case, the solution that is best for long-term socioeconomic well-being should also provide the best outcome for those most at risk. They are not mutually exclusive considerations, but I fear that our government will use a blunt sword instead of an edged scalpel in its war against Covid-19.

Indeed, as of the time of this writing, it appears as though every major government is ill-prepared to deal with this pandemic. Many of us have some vague notion that the world’s most powerful nations have think tanks where smart folks work diligently to conceive of the best response to various disaster scenarios. Surely, they know the ideal course of action if a meteor of size X and speed Y is going to hit the earth at angle Z in 10 days or if aliens land on the White House lawn or if there is a global pandemic like Covid-19. However, it unfortunately seems clear that either no such plans exist or else they have been executed extremely poorly.

Accordingly, governments appear to be overreacting in slow and unsophisticated ways to a highly contagious coronavirus with an infection fatality rate of 3.4% when they should be quickly reacting with sophistication and nuance to a highly contagious coronavirus with a < 0.4% infection fatality rate. The extent and duration of any future “solutions” and the specific problem they are targeting (i.e. the serious threat from the actual pandemic vs the overstated apocalyptic threat from the fantastical pandemic) may have future generations wondering why we cratered the economy for want of gathering data and drawing conclusions from proper, eminently attainable, statistical data; wondering why we chose fear and hollow, contradictory, moral platitudes instead of rational action and sober analysis; wondering why they can’t have health care or social security or a robust national defense because our leaders and politicians were more concerned about the PR of mitigating plausible blame than the hard work of cost-effective crisis management.