Mozi: Governed Autonomy for Drug Discovery LLM Agents

Mozi is a novel dual-layer architecture designed to transform large language models into reliable, governed co-scientists for high-stakes drug discovery. The system features a Control Plane for governed tool-use and a Workflow Plane for stateful, composable skill graphs, addressing critical bottlenecks of unconstrained tool use and poor long-horizon reliability. Evaluation on the PharmaBench benchmark shows superior orchestration accuracy, with case studies demonstrating its ability to generate competitive in silico drug candidates.

Researchers have introduced Mozi, a dual-layer architecture designed to turn large language models into reliable, governed co-scientists for high-stakes domains such as drug discovery. The work targets two bottlenecks that have kept autonomous AI agents from being safely deployed in scientific pipelines: unconstrained tool use and poor long-horizon reliability. Its stated aim is to combine the flexibility of generative AI with the deterministic rigor of computational biology.

Key Takeaways

  • Mozi is a new architecture for tool-augmented LLM agents, featuring a Control Plane for governed tool-use and a Workflow Plane for stateful, composable skill graphs.
  • It is specifically engineered to solve reliability and governance issues in long-horizon, dependency-heavy scientific tasks like drug discovery, from Target Identification to Lead Optimization.
  • The system enforces role-based tool isolation, uses reflection-based replanning, and integrates strategic human-in-the-loop checkpoints to prevent error accumulation.
  • Evaluation on the PharmaBench benchmark shows superior orchestration accuracy over existing baselines, with end-to-end case studies demonstrating its ability to generate competitive in silico drug candidates.
  • The core design principle is "free-form reasoning for safe tasks, structured execution for long-horizon pipelines," aiming to provide built-in robustness and full trace-level auditability.

Architecting Reliability: The Dual-Layer Mozi Framework

The Mozi architecture is a direct response to the observed failure modes of current LLM agents in scientific workflows, where early-stage hallucinations can multiplicatively compound into downstream failures. Its innovation lies in a two-layer design that separates governance from execution. Layer A, the Control Plane, establishes a governed supervisor-worker hierarchy. This layer is responsible for enforcing role-based tool isolation, limiting execution to pre-defined, constrained action spaces, and driving reflection-based replanning when agents encounter uncertainty or errors.
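To make role-based tool isolation concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Supervisor` class, the role names, the tool names, and the stub return values are all hypothetical and chosen only for illustration. The idea it shows is that every tool call passes through a supervisor that checks the caller's role against an allow-list and records the outcome in a trace.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple


class ToolNotPermitted(Exception):
    """Raised when a worker requests a tool outside its role's action space."""


@dataclass
class Supervisor:
    # Hypothetical registry: tool name -> callable.
    tools: Dict[str, Callable[..., object]]
    # Hypothetical policy: role name -> names of the tools that role may call.
    policy: Dict[str, FrozenSet[str]]
    # Trace of (role, tool, outcome) entries, kept for later auditing.
    trace: List[Tuple[str, str, str]] = field(default_factory=list)

    def dispatch(self, role: str, tool: str, **kwargs):
        """Run `tool` for `role` only if the policy allows it."""
        allowed = self.policy.get(role, frozenset())
        if tool not in allowed:
            self.trace.append((role, tool, "denied"))
            raise ToolNotPermitted(f"{role!r} may not call {tool!r}")
        result = self.tools[tool](**kwargs)
        self.trace.append((role, tool, "ok"))
        return result


# Stub tools with placeholder outputs; real tools would wrap actual software.
supervisor = Supervisor(
    tools={
        "search_literature": lambda query: f"results for {query}",
        "run_docking": lambda smiles: {"smiles": smiles, "score": 0.0},
    },
    policy={
        "target_analyst": frozenset({"search_literature"}),
        "screening_chemist": frozenset({"run_docking"}),
    },
)

print(supervisor.dispatch("target_analyst", "search_literature", query="kinase"))
try:
    supervisor.dispatch("target_analyst", "run_docking", smiles="CCO")
except ToolNotPermitted as exc:
    print("blocked:", exc)
```

Keeping the policy as data rather than as prompt text means the constraint holds even when the model asks for a tool it should not use, and the denied attempt still shows up in the trace. Reflection-based replanning is not modeled here.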

Layer B, the Workflow Plane, operationalizes the actual scientific pipeline. It encodes canonical drug discovery stages—such as Target Identification, Hit Discovery, and Lead Optimization—as stateful, composable skill graphs. This layer integrates strict data contracts between different modules and inserts strategic human-in-the-loop (HITL) checkpoints at high-uncertainty decision boundaries. This structured approach ensures scientific validity is safeguarded throughout the multi-step process, transforming the LLM from what the authors term a "fragile conversationalist" into a reliable, governed co-scientist.
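The Workflow Plane description can be illustrated with a small sketch, again under stated assumptions: the `Skill` dataclass, the `execute` function, the stage functions, and their placeholder outputs are hypothetical and are not taken from the Mozi codebase. The sketch orders skills by their declared dependencies using the standard-library `graphlib.TopologicalSorter`, carries state from one stage to the next, and pauses at any skill marked as a human-in-the-loop checkpoint.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]


@dataclass
class Skill:
    name: str
    run: Callable[[State], State]   # reads shared state, returns new keys
    depends_on: Tuple[str, ...] = ()
    checkpoint: bool = False        # ask a human before continuing


def execute(skills: List[Skill], state: State,
            approve: Callable[[str, State], bool]) -> State:
    """Run skills in dependency order; stop if a checkpoint is rejected."""
    by_name = {s.name: s for s in skills}
    graph = {s.name: set(s.depends_on) for s in skills}
    state = dict(state)
    for name in TopologicalSorter(graph).static_order():
        skill = by_name[name]
        state.update(skill.run(state))
        state["completed"] = [*state.get("completed", []), name]
        if skill.checkpoint and not approve(name, state):
            state["halted_at"] = name
            break
    return state


# Placeholder stages; the returned values are illustrative, not real data.
pipeline = [
    Skill("target_identification", lambda s: {"target": "TARGET_X"}),
    Skill("hit_discovery", lambda s: {"hits": ["CCO", "CCN"]},
          depends_on=("target_identification",), checkpoint=True),
    Skill("lead_optimization", lambda s: {"lead": s["hits"][0]},
          depends_on=("hit_discovery",)),
]

final = execute(pipeline, {}, approve=lambda name, state: True)
print(final["completed"])
```

Passing `approve` in as a callback keeps the sketch testable: a real deployment would block on a reviewer's decision, while a test can approve or reject automatically.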

Industry Context & Analysis

Mozi enters a competitive landscape in which reliability is the central challenge for autonomous AI agents. Open-ended agent frameworks such as AutoGPT or LangChain offer great flexibility but often drift or fail in long tasks; Mozi instead imposes a strict, domain-specific structure. Its approach is closer to Microsoft's AutoGen framework, which also uses multi-agent conversations, but Mozi adds a governed workflow layer tailored for scientific rigor. This reflects a broader industry shift from general-purpose chatbots to specialized, verifiable AI systems for enterprise and scientific use.

The evaluation on PharmaBench is significant, as it provides a concrete, curated benchmark for biomedical agents—a domain lacking in standardized tests compared to code (HumanEval) or general knowledge (MMLU). Demonstrating superior "orchestration accuracy" suggests Mozi better manages the sequencing and execution of complex tool chains than prior art. Furthermore, its focus on navigating "massive chemical spaces" and enforcing "stringent toxicity filters" directly tackles the scale and safety requirements of real-world drug discovery, where libraries can contain billions of molecules and a single toxic compound can derail years of research.

Technically, the integration of "stateful, composable skill graphs" is a crucial advancement. It moves beyond simple linear prompting or retrieval-augmented generation (RAG) to a more robust, graph-based execution model where the state and outputs of one node (e.g., a protein-ligand docking simulation) become validated inputs for the next (e.g., ADMET property prediction). This structure is essential for reproducibility and audit trails, which are non-negotiable in regulated industries like pharmaceuticals.
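The hand-off between nodes described above could look roughly like the following. This is a sketch under assumptions, not Mozi's actual data contract: `DockingResult`, `hand_off`, and the stub ADMET function are hypothetical names. It shows one way a downstream node can refuse malformed upstream output while every hand-off, accepted or rejected, is appended to an audit log.

```python
import math
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DockingResult:
    """Hypothetical contract for what a docking node hands downstream."""
    smiles: str
    score: float  # docking score; units depend on the docking tool

    def __post_init__(self) -> None:
        if not isinstance(self.smiles, str) or not self.smiles:
            raise ValueError("smiles must be a non-empty string")
        if not isinstance(self.score, (int, float)) or not math.isfinite(self.score):
            raise ValueError("score must be a finite number")


def hand_off(raw: Dict[str, object],
             downstream: Callable[[DockingResult], object],
             audit: List[dict]) -> object:
    """Validate raw docking output, log the outcome, then call downstream."""
    try:
        result = DockingResult(**raw)
    except (TypeError, ValueError) as exc:
        audit.append({"node": "docking", "status": "rejected", "reason": str(exc)})
        raise
    audit.append({"node": "docking", "status": "accepted", **asdict(result)})
    return downstream(result)


def predict_admet_stub(result: DockingResult) -> dict:
    """Stand-in for an ADMET predictor; returns no real prediction."""
    return {"smiles": result.smiles, "admet": "not computed"}


audit_log: List[dict] = []
print(hand_off({"smiles": "CCO", "score": 0.0}, predict_admet_stub, audit_log))
print(audit_log)
```

Because validation happens at the boundary, a hallucinated or truncated upstream value fails loudly at the hand-off instead of flowing silently into later stages.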

What This Means Going Forward

The immediate beneficiaries of this research are computational biologists and pharmaceutical R&D teams. Mozi provides a blueprint for integrating LLMs into existing discovery platforms, such as Schrödinger's software suite or the OpenEye toolkits, not as black-box idea generators but as governed orchestrators of deterministic computational tools. This could significantly accelerate early-stage discovery by automating literature-based target prioritization, virtual screening workflows, and multi-parameter optimization, while maintaining the scientific rigor and auditability required for regulatory submission.

Looking ahead, the principles behind Mozi—governed tool use, structured skill graphs, and strategic human oversight—are likely to propagate beyond drug discovery into other high-stakes, long-horizon domains. These include materials science, chip design, and complex financial modeling. The success of such systems will hinge on the creation of more domain-specific benchmarks like PharmaBench to drive measurable progress. The key trend to watch is the convergence of generative AI's creative potential with the deterministic, verifiable world of scientific computing, moving the industry from demonstration to deployment. Mozi represents a substantial step in proving that this convergence is not only possible but necessary for AI to deliver tangible, trustworthy value in the hardest scientific problems.
