Using publicly available data from emerging research on COVID-19, this brief was written and reviewed by the Coronavirus-19 Outbreak Response Experts (CORE-19) at the University of Tennessee, Knoxville. This brief may serve as a guide to understanding the various modeling approaches for predicting COVID-19 caseloads.
Three common modeling approaches for COVID-19 projections and forecasts
Mathematical (compartmental) models
- Divide an estimated population into different groups (e.g., susceptible, exposed, etc.).
- Apply a set of logical rules and mathematical equations to capture how people move from one group (i.e., compartment) to another, using assumptions about the disease process, social mixing, public health policies, and other factors.
- Examples include SEIR/SIR models (Susceptible, Exposed, Infected, and Recovered/Removed).

Statistical (curve-fitting/extrapolation) models
- Infer trends about an epidemic at a given location by looking at its current condition and then applying a mathematical approximation of the likely future epidemic path.
- Can be based on observed COVID-19 data alone, on experiences in other locations, and/or on assumptions about the population, transmission, and the public health policies/remedies in place.

Agent-based models
- Create a computer-generated population/community, then simulate the interactions and resulting spread of disease among the individuals (i.e., agents) in that population.
- Based on assumptions and/or rules about individuals’ social behaviors and risks, movement and spatial patterns, and the health interventions and policies in place.
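The compartmental approach can be sketched in a few lines of code. The example below is a minimal illustration, not a fitted COVID-19 model: the population size and the parameter values (transmission rate, incubation and infectious periods) are hypothetical, chosen only to show how people flow between the S, E, I, and R groups.

```python
def seir(pop=1_000_000, beta=0.5, sigma=0.2, gamma=0.1, days=160):
    """Daily-step SEIR sketch.

    beta  = transmission rate, sigma = 1/incubation period,
    gamma = 1/infectious period (all values hypothetical).
    """
    s, e, i, r = pop - 1.0, 0.0, 1.0, 0.0  # start with one infected person
    history = []
    for _ in range(days):
        new_exposed = beta * s * i / pop   # S -> E: contacts with infectious people
        new_infectious = sigma * e         # E -> I: incubation ends
        new_recovered = gamma * i          # I -> R: recovery or removal
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history

trajectory = seir()
peak_infected = max(i for _, _, i, _ in trajectory)
print(f"peak simultaneous infections: {peak_infected:,.0f}")
```

Changing any one assumption (say, the transmission rate) reshapes the whole curve, which is why the assumptions discussed below matter so much.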
Causes of mismatch between model predictions/forecasts and caseloads
Disclaimer: Some models are simply poorly built. The field of epidemiology has developed demonstrably good modeling practices, and the discussion below assumes models that follow these baseline-validated practices.
Model Assumptions and Specifications
Models quantify our understanding either of how a system itself works (mathematical models) or of patterns in the outcomes systems produce (statistical models). Each model makes specific abstractions and includes specific elements, and small differences in these choices can lead to large differences in predictions. When a model fails to agree with observations, that can reflect that it did not include all the critical elements. Good modeling practice uses many different models to see which elements matter in which cases.
Many elements are included in models as constants (or as quantities that change in only one prescribed way), but in the real world they change. If the model were updated as those elements changed, it would remain reasonably accurate, but modelers are often not asked to re-run the model. For example:
- Models that did not include shelter-at-home would overestimate spread and start to seem very wrong about 2 weeks after shelter-at-home orders were issued.
- Models that did include shelter-at-home, but not that it would end after only a few weeks, would underestimate spread and start to seem very wrong about 2 weeks after shelter-at-home orders were relaxed.
- Models that assumed a constant probability of death among hospitalized patients did not take into account that medical practice gained knowledge about how to help COVID-19 patients over the last 3 months (e.g., prone positioning instead of early ventilation).
- Models that assumed that “coming out of shelter-in-place” meant “returning to what was normal in January” did not take into account that many people have lost their jobs and/or are still hesitant to interact in public, and that many of the places they used to go are still closed, so patterns of daily life remain very different.
This does not mean these models did not help; it means they captured a different scenario and reality diverged from that scenario. One of the best ways to use models is for modelers to work closely with policy makers to constantly revise and re-run scenarios as things change.
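The shelter-at-home examples above can be made concrete with a small simulation. The sketch below uses an SIR-style model with hypothetical rates (it is not calibrated to any real outbreak) and simply lets the transmission rate drop when an order is issued and rise again when it is relaxed. Holding the rate constant reproduces the over- and under-estimates described in the bullets.

```python
def cumulative_infections(days=120, beta_schedule=None, gamma=0.1, pop=100_000):
    """Daily SIR-style simulation; beta_schedule maps day -> transmission
    rate. All numbers are hypothetical, for illustration only."""
    beta_schedule = beta_schedule or (lambda d: 0.4)  # default: no intervention
    s, i, r = pop - 10.0, 10.0, 0.0
    totals = []
    for d in range(days):
        new_i = beta_schedule(d) * s * i / pop  # S -> I
        new_r = gamma * i                       # I -> R
        s -= new_i
        i += new_i - new_r
        r += new_r
        totals.append(pop - s)                  # everyone ever infected
    return totals

no_order   = cumulative_infections()  # model ignores shelter-at-home entirely
with_order = cumulative_infections(   # order issued on day 30, never relaxed
    beta_schedule=lambda d: 0.4 if d < 30 else 0.1)
relaxed    = cumulative_infections(   # order issued day 30, relaxed day 70
    beta_schedule=lambda d: 0.4 if d < 30 else (0.1 if d < 70 else 0.4))

print(no_order[-1], with_order[-1], relaxed[-1])
```

If "reality" is the relaxed scenario, the no-order model overestimates spread and the permanent-order model underestimates it, exactly the mismatch described above.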
Data Issues
Several issues may arise with the data used to create models. The following pertain primarily to statistical models (i.e., curve-fitting/extrapolation models); mathematical and agent-based models have analogous challenges that manifest in different ways.
- Data inadequacy: Testing in the early phase of the pandemic was inadequate. As a consequence, model parameters estimated from such sparse and unreliable data were not accurate.
- Different standards: Different standards and calculations are used to report on COVID-19. For example, the confirmed cases in the Johns Hopkins Coronavirus Resource Center dataset include presumptive positive cases, while other local sources may not have counted presumed positives as confirmed cases. As another example, Johns Hopkins bases recovered cases on local media reports, which are inconsistent from one source to another.
- Differences between diagnosis dates and report dates: There are lag times between diagnosis dates and the dates diagnoses are recorded into the reporting system (i.e., report dates). The lag times can be anywhere between a couple of days and two weeks, and they vary unsystematically from one lab, county, and state to another. Without information on these lag times, models that run on daily data do not start from accurate inputs.
- Different and changing protocols: State governments and health departments set their own protocols for managing COVID-19 testing, including collecting, organizing, and reporting data. Those protocols may change on the fly, causing inconsistencies in the data.
- Mistakes in test reporting: On May 21, 2020, The Atlantic reported that the CDC was combining test results that diagnose current coronavirus infections with test results that measure whether someone has ever had the virus. Pennsylvania, Georgia, Texas, and a number of other states were doing the same.
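The diagnosis-versus-report lag can be illustrated directly. In the sketch below, the daily diagnosis counts and the 2-to-14-day lag distribution are invented for illustration; the point is that unsystematic reporting delays shift and smear the curve a model actually sees.

```python
import random

random.seed(0)

# Hypothetical true diagnoses per day (a single outbreak hump).
true_daily = [0] * 5 + [10, 30, 80, 120, 90, 50, 20, 10, 5] + [0] * 20

# Each diagnosis is reported 2-14 days later, with the lag drawn at random,
# standing in for the unsystematic lab/county/state delays described above.
reported = [0] * (len(true_daily) + 14)
for day, count in enumerate(true_daily):
    for _ in range(count):
        lag = random.randint(2, 14)
        reported[day + lag] += 1

true_peak = true_daily.index(max(true_daily))
reported_peak = reported.index(max(reported))
print(f"true peak day: {true_peak}, reported peak day: {reported_peak}")
```

No cases are lost, yet a model fit to the reported series sees a later, flatter peak than actually occurred.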
The Random Nature of the Phenomenon
This issue is the most important for epidemics, including the COVID-19 pandemic. The world includes random chance, and so can models. To understand the role of chance in a model, we run it many times and look at the distribution of outcomes; the world, however, only happens once. If a model of rolling a die predicts an expected value of 3.5 and you then roll a 1, it looks as though the model was wrong, but it was not. Epidemics make this worse: each roll of the die affects the next, so an early deviation from the expected average makes agreement with the average trajectory less and less likely over time. The following example illustrates this randomness:
Imagine each sick person is a die. Into a healthy population we place one sick person and say that if they roll a 1 they infect no one, if they roll a 6 they infect 20 people, and if they roll any other number they infect 2 people. Here are three scenarios:
- The person rolls a 1. The infection dies out, and we must wait for a new infected person to enter the population before there can be any spread of disease.
- The person rolls a 2-5. Now there are two new people who each roll their own die, and the infection is likely to spread, because it is less likely that BOTH of the next two people roll a 1 than that the first person alone rolled one.
- The person rolls a 6. Now there are 20 new people who each roll a die. It is MUCH less likely that they will ALL roll a 1, so the infection is nearly guaranteed to spread.
In this example, there is one model. Let’s say the model is completely correct (the dice rolls exactly determine the number of new infections, and each die is completely fair). Even so, because of the random chance involved in rolling dice, the predicted trajectories of infections will be wildly different from run to run.
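The dice example can be run as an actual simulation. The sketch below implements exactly the rules described (roll 1: no infections; roll 6: 20 infections; otherwise: 2 infections), caps a run at 10,000 cases, and repeats it 1,000 times. Even though the "model" is exactly correct, a substantial fraction of runs fizzle out while the rest explode.

```python
import random

def outbreak(max_cases=10_000, rng=random):
    """One run of the dice epidemic, starting from a single sick person."""
    active, total = 1, 1
    while active and total < max_cases:
        next_active = 0
        for _ in range(active):          # each sick person rolls one die
            roll = rng.randint(1, 6)
            if roll == 6:
                next_active += 20        # superspreading event
            elif roll != 1:
                next_active += 2         # typical transmission
            # roll == 1: infects no one
        total += next_active
        active = next_active
    return total

random.seed(1)
sizes = [outbreak() for _ in range(1000)]
died_out = sum(1 for s in sizes if s < 100)
print(f"runs that fizzled (<100 cases): {died_out}/1000")
```

Same model, same starting conditions, wildly different outcomes; only the distribution across many runs is predictable, not any single trajectory.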
The best practice for reliability in modeling is to use models built from different perspectives (statistical, mathematical, and conceptual) and see whether they converge on what should happen next. This does not mean you need a rat’s nest of disagreeing experts; it means you ask in which cases the models disagree and what differences drive the disagreement.
Misunderstanding of Models
Problems often arise when users misunderstand a model’s purpose, capacity, and results.
- Purpose: Models used to make decisions that avoid bad outcomes are self-defeating (rather than self-fulfilling). If a model says “a heavy object falling toward your head will hit you” and you step to the left, the prediction turns out technically wrong (it said you would be hit and you were not), but not because the model failed: it succeeded in its goal of telling you how to make its bad prediction wrong (i.e., step to the left).
- Time frame and spatial unit: Results from a statistical model intended for short-term forecasts are not the same as those from a mechanistic model investigating future scenarios. Likewise, a model intended to forecast an entire state should not be used to derive county-level outcomes.
- Results: Models account for and present uncertainty in their results via confidence intervals, much like the cone in hurricane forecasting. However, users often look only at the center line and assume it is the model’s single, crisp output. (This is related to the “Random Nature of the Phenomenon” described above.)
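The cone-versus-center-line point can be sketched numerically: run a stochastic projection many times and report a percentile band alongside the median, rather than a single number. The growth model below is invented (a noisy daily growth factor with hypothetical parameters), purely to show the mechanics.

```python
import random
import statistics

def project_cases(days=30, start=100, rng=random):
    """One stochastic run: cases grow by an uncertain factor each day."""
    cases = float(start)
    for _ in range(days):
        growth = rng.gauss(1.05, 0.03)  # hypothetical uncertain growth factor
        cases *= max(growth, 0.0)
    return cases

random.seed(42)
runs = sorted(project_cases() for _ in range(2000))

center = statistics.median(runs)                 # the "center line"
low = runs[int(0.05 * len(runs))]                # 5th percentile
high = runs[int(0.95 * len(runs))]               # 95th percentile
print(f"day-30 cases: median {center:,.0f}, 90% band [{low:,.0f}, {high:,.0f}]")
```

Reading only the median discards exactly the uncertainty the band is there to communicate; two models with the same center line can have very different cones.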