Speaker
Description
The increasing frequency of infectious disease outbreaks, driven by changing Earth systems and globalization, underscores the need for reliable short-term forecasting methods to inform public health decision-making. However, most short-term case count forecasting approaches depend on extensive historical data for model training and are therefore poorly suited for emerging pathogens, early epidemic phases, or situations where episodic selection drives the emergence of new genetic variants, resulting in unexpected case surges.
To address this limitation, we developed a novel forecasting framework that leverages synthetic data from fast-evolving pathogens. Using the phylodynamic simulator MutAntiGen, we generated a comprehensive synthetic library of coupled viral evolution and epidemiological dynamics by systematically varying evolutionary, epidemiological, and immunological parameters. These simulations produce phylodynamic time series of overall and variant-specific cases across outbreak scenarios.
We used these synthetic time series, together with historical non-COVID-19 respiratory surveillance data, to pre-train a transformer-based forecasting model. We then showed that integrating synthetic phylodynamic time series with observed respiratory data substantially improved short-term forecast accuracy of U.S. state-level COVID-19 cases relative to models trained on observed data alone, and outperformed real-time COVID-19 forecasting efforts. We further showed that total case count forecasts were more accurate when composed of individual forecasts for variant-specific cases. Improvements were greatest near epidemic peaks and during epidemic decline.
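The composition step mentioned above, building the total case forecast from variant-specific forecasts, can be illustrated with a minimal sketch. It assumes point forecasts on a shared horizon that are simply summed. The abstract does not specify the aggregation used, so the function names, variant labels, and numbers below are hypothetical and purely illustrative.

```python
def compose_total_forecast(variant_forecasts):
    """Sum per-variant point forecasts into one total-case forecast.

    variant_forecasts maps a variant label to a list of forecast case
    counts, one per horizon step. Plain summation is an assumption here,
    not necessarily the aggregation used in the study.
    """
    horizons = {len(values) for values in variant_forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all variant forecasts must cover the same horizon")
    return [sum(step) for step in zip(*variant_forecasts.values())]


def mean_absolute_error(forecast, observed):
    """Average absolute difference between forecast and observed counts."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


# Hypothetical numbers: one variant declining while another takes over.
variant_forecasts = {
    "variant_a": [120.0, 100.0, 80.0, 60.0],
    "variant_b": [30.0, 60.0, 110.0, 180.0],
}
total = compose_total_forecast(variant_forecasts)
```

Forecasting each variant separately lets a declining variant and a growing one follow their own trajectories before being combined, which is one plausible reason a composed forecast could track turning points better than a single aggregate model.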
By using synthetic outbreaks for model training, our framework enables forecasting for fast-evolving pathogens while mitigating data limitations, making it valuable in low-data and early-epidemic settings. The framework can also be readily extended to incorporate genomic leading indicators of epidemic growth, such as nucleotide-based population genetic summaries and pathogen-specific functional features, an extension made increasingly relevant by the growing availability of pathogen genomic data in near real time.