Research of the kind that made bird flu transmissible between mammals is not illegal. Governments have broadly defunded such work since the COVID-19 pandemic, but private labs face little oversight, which leaves a significant biosecurity blind spot.
Petabytes of observational DNA sequence data exist, but they are insufficient for the next wave of AI. Powerful, functional models need causal data, generated by experiments that systematically test function, and the scarcity of such data is the current bottleneck.
About 80% of DNA synthesis companies voluntarily screen orders for dangerous pathogen sequences, but screening is not mandatory. This leaves a glaring loophole: a malicious actor can simply place an order with one of the remaining 20% of companies that skip this critical safety check.
The world has no passive, global alert system for emerging pathogens comparable to military radar for missiles. We currently rely on a slow, reactive process in which sick patients present symptoms at hospitals, which significantly delays detection and response, as was the case with COVID-19.
Nation-states are unlikely to develop pandemic-level bioweapons because they cannot easily control them or protect their own populations. The primary threat comes from extremist groups or lone actors who are not motivated by rational self-preservation. This distinction is a critical insight for threat modeling.
Controlling open-source AI models directly is intractable. The proposed strategy is instead to control the small, expensive-to-produce functional datasets they train on. This preserves the beneficial open-source ecosystem while preventing the dissemination of dangerous capabilities like viral design.
AI capabilities are rapidly advancing beyond theory. Today's frontier models can troubleshoot complex laboratory experiments from a simple cell phone picture, often outperforming human PhDs. This dramatically lowers the barrier to entry for conducting sophisticated biological research.
Unlike nuclear deterrence, there is no single theory of victory for biosecurity. The most effective approach is a layered strategy combining four pillars: Delay (e.g., data controls), Deter (e.g., treaties), Detect (e.g., wastewater monitoring), and Defend (e.g., far-UV sterilization).
The operational plan for secure data control involves "Trusted Research Environments" (TREs). In this model, researchers bring their code to the data's secure location to run analyses, rather than downloading the sensitive data itself. This allows for valuable research while preventing leakage.
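The "bring the code to the data" flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real TRE: the class name `TrustedResearchEnvironment`, the `run` method, the output-size limit, and the toy records are all hypothetical, and a real environment would also need access approval, sandboxing, and auditing, none of which is shown here.

```python
class TrustedResearchEnvironment:
    """Hypothetical TRE: keeps records in place and runs submitted analyses on them."""

    def __init__(self, records, max_output_values=10):
        self._records = list(records)                 # sensitive data never leaves this object
        self._max_output_values = max_output_values   # assumed disclosure limit

    def run(self, analysis):
        """Run the researcher's function next to the data; release only small numeric summaries."""
        result = analysis(self._records)
        if not isinstance(result, dict) or len(result) > self._max_output_values:
            raise ValueError("output rejected: not a small summary")
        for key, value in result.items():
            # bool is a subclass of int, so it is excluded explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"output rejected: {key!r} is not a number")
        return result


def mean_length(records):
    """An acceptable analysis: returns aggregate statistics only."""
    lengths = [len(r["sequence"]) for r in records]
    return {"n": len(lengths), "mean_length": sum(lengths) / len(lengths)}


def exfiltrate(records):
    """An unacceptable analysis: tries to return the raw data."""
    return {"dump": [r["sequence"] for r in records]}


# Toy placeholder records, not real biological data.
tre = TrustedResearchEnvironment([{"sequence": "ACGT"}, {"sequence": "ACGTAC"}])
summary = tre.run(mean_length)
```

Here `tre.run(mean_length)` returns a summary, while `tre.run(exfiltrate)` raises `ValueError` because its output is not a number. The output check is deliberately crude; it only illustrates the idea that results, not data, cross the boundary.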
The computational design of a COVID-19 vaccine took only days. The true, months-long bottlenecks are physical: clinical trials, regulatory approval, and distribution. The greatest potential for AI in pandemic response is to accelerate these costly, real-world processes, not the initial design phase.
Research on bio-foundation models like EVO2 and ESM3 shows that strategically excluding key datasets (e.g., sequences of viruses that infect humans) dramatically reduces a model's performance on dangerous tasks, often to random chance, without harming its useful scientific capabilities.
A biosecurity data-level (BDL) framework, modeled after biosafety levels for labs, would keep 99% of biological data open-access. Only the top 1% of data, that which links pathogen sequences to dangerous properties like transmissibility, would face restrictions such as a use-approval requirement.
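The tiering rule described above can be written as a small classification function. The sketch below is only an illustration: the source does not specify how many levels the framework has or how datasets would be labeled, so the two tiers, the `Dataset` fields, and the `classify` function are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """Two illustrative tiers; the actual number of levels is an assumption."""
    OPEN = "open-access"
    RESTRICTED = "use-approval required"


@dataclass(frozen=True)
class Dataset:
    """Hypothetical metadata a curator might attach to a dataset."""
    name: str
    has_pathogen_sequences: bool
    measures_dangerous_property: bool  # e.g. transmissibility


def classify(ds: Dataset) -> Access:
    """Restrict only data that links pathogen sequences to a dangerous property."""
    if ds.has_pathogen_sequences and ds.measures_dangerous_property:
        return Access.RESTRICTED
    return Access.OPEN


# Placeholder examples, not real datasets.
examples = [
    Dataset("plant genomes", False, False),
    Dataset("pathogen sequences only", True, False),
    Dataset("pathogen sequences with transmissibility assays", True, True),
]
tiers = [classify(ds) for ds in examples]
```

Under this rule only the third example is restricted. Requiring both conditions is what keeps the restricted tier small: sequence data alone, or functional measurements on non-pathogens, stays open.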
