Author: Denis Avetisyan
A new agentic architecture is streamlining the scientific process by automatically translating research goals into fully executable workflows.

This work presents a system separating intent extraction from deterministic workflow generation, using large language models and curated domain expertise to enhance reproducibility.
Despite advances in scientific workflow systems, a critical gap remains in translating high-level research questions into executable computational pipelines. This paper, ‘From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation’, introduces an agentic architecture that bridges this divide by decoupling semantic interpretation from deterministic workflow generation. The approach leverages large language models alongside domain-expert-authored knowledge to ensure reproducibility and significantly reduce computational overhead, achieving query completion with LLM costs under $0.001. Could this architecture unlock a new era of automated scientific discovery, enabling researchers to focus on hypothesis generation rather than implementation?
The Rigidity of Traditional Scientific Inquiry
Historically, scientific investigation has relied on workflows that, while thorough, demand extensive manual setup and customization. This rigidity presents a considerable obstacle to iterative exploration, as researchers often find themselves constrained by the time and effort required to reconfigure analyses for slightly altered questions or datasets. Such workflows typically involve a series of discrete steps, each requiring specific parameter adjustments and file conversions, effectively slowing the pace of discovery. The inherent inflexibility not only limits the ability to rapidly test hypotheses but also discourages the spontaneous investigation of unexpected results, as adapting the pipeline to accommodate new avenues of inquiry can be a substantial undertaking. Consequently, a significant portion of research time is devoted to maintaining the workflow itself, rather than focusing on the scientific problem at hand.
A significant impediment to accelerating scientific discovery lies in the disproportionate amount of time researchers dedicate to infrastructure management rather than core research activities. Modern scientific endeavors increasingly demand expertise not only in the specific field of study, but also in areas such as data storage, computational resources, and software maintenance. This creates a bottleneck where valuable researcher time is diverted from hypothesis generation, experimentation, and analysis, toward the often tedious tasks of system administration and troubleshooting. Consequently, projects can be delayed, insights lost, and the overall pace of innovation hampered, as the effort spent on the research tools often outweighs the time spent using them for actual discovery.
The surge in genomic data, often encapsulated in formats like VCF files, presents a significant computational hurdle for modern research. These files, detailing genetic variations, can quickly grow to enormous sizes – terabytes are increasingly common – straining traditional data processing pipelines. This isn’t merely a question of storage; the complexity of the data itself – millions of variants per individual, intricate relationships between genes, and the need for precise statistical analysis – demands adaptable computational solutions. Static workflows struggle to accommodate this scale and intricacy, requiring researchers to build and maintain increasingly complex infrastructure simply to manage the data. Consequently, scalable solutions – those leveraging cloud computing, parallel processing, and automated data management – are no longer optional but essential for unlocking the potential of genomic information and accelerating discoveries in personalized medicine and evolutionary biology.

An Agentic Architecture: Decoupling Intent from Execution
The system utilizes a three-layer Agentic Architecture designed to translate natural language research questions into executable computational workflows. This architecture consists of the Semantic Layer, responsible for interpreting the user’s intent; the Deterministic Layer, which converts that intent into a defined workflow plan; and the Knowledge Layer, which provides the necessary data and tools for execution. By decoupling intent interpretation from workflow construction and execution, this layered approach enables dynamic workflow generation adaptable to varied research inquiries and computational environments. The architecture facilitates a modular system, allowing for independent updates and improvements to each layer without affecting overall functionality.
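The layered decoupling described above can be sketched as a simple pipeline. This is an illustrative sketch only: the class and function names (`ResearchIntent`, `semantic_layer`, and so on) are assumptions, not identifiers from the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class ResearchIntent:
    """Structured output of the Semantic Layer."""
    entities: dict
    constraints: dict

@dataclass
class WorkflowPlan:
    """Deterministic, executable plan produced from an intent."""
    steps: list

def semantic_layer(question: str) -> ResearchIntent:
    # In the real system an LLM interprets the question; stubbed here.
    return ResearchIntent(entities={"query": question}, constraints={})

def deterministic_layer(intent: ResearchIntent) -> WorkflowPlan:
    # Maps the structured intent to concrete computational steps.
    return WorkflowPlan(steps=[("extract", intent.entities), ("analyze", {})])

def run(question: str) -> WorkflowPlan:
    # Intent interpretation and workflow construction stay independent,
    # so either layer can be swapped out without touching the other.
    return deterministic_layer(semantic_layer(question))

plan = run("Allele frequencies of rs123 in EUR populations?")
```

The key property the sketch captures is that only `semantic_layer` is probabilistic; everything downstream of the structured intent is deterministic and therefore reproducible.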
The Semantic Layer utilizes a Large Language Model (LLM) to process user-defined research questions, referred to as Research Intent. This processing involves natural language understanding to identify the core requirements of the query and convert them into a standardized, machine-readable format. Specifically, the LLM extracts key entities, relationships, and constraints, then encodes these elements into a structured representation – a knowledge graph or similar data structure. This structured output serves as the foundational blueprint for subsequent workflow generation, ensuring that the computational processes accurately reflect the initial research objectives. The layer’s output is not merely a textual interpretation, but a formalized representation suitable for algorithmic manipulation and workflow construction.
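As a concrete illustration of what such a machine-readable intent might look like, consider a hypothetical genomics query. The field names and schema below are assumptions for illustration; the paper does not publish its exact intent format.

```python
import json

# Hypothetical structured intent the Semantic Layer might emit for
# "What are the allele frequencies in BRCA1 for European populations?"
intent = {
    "dataset": "1000genomes",
    "operation": "allele_frequency",
    "region": {"chrom": "17", "start": 41196312, "end": 41277500},
    "populations": ["EUR"],
}

# Unlike free text, this blueprint round-trips losslessly and can be
# validated and manipulated algorithmically before any execution.
blob = json.dumps(intent, sort_keys=True)
restored = json.loads(blob)
assert restored == intent
```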
The Deterministic Layer receives the structured intent representation from the Semantic Layer and converts it into an executable workflow plan. This transformation involves mapping the abstract intent to specific computational steps, selecting appropriate tools or functions for each step, and defining the data flow between them. The resulting plan is a formalized sequence of operations, detailing the exact order of execution and the required inputs and outputs for each component. This ensures reproducibility and allows for automated execution without further semantic interpretation, effectively bridging the gap between the user’s request and computational action.
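A minimal sketch of that intent-to-plan mapping follows; the tool names and the single-operation lookup table are illustrative assumptions, not the paper's actual composer logic.

```python
# Hypothetical expansion table: each abstract operation maps to a fixed,
# ordered sequence of concrete tools, so the same intent always yields
# the same plan (the reproducibility guarantee of the layer).
TOOL_FOR_OP = {
    "allele_frequency": ["fetch_region", "compute_frequencies", "annotate"],
}

def compose_plan(intent: dict) -> list:
    """Expand an intent's operation into an ordered, executable step list."""
    steps = []
    for tool in TOOL_FOR_OP[intent["operation"]]:
        steps.append({"tool": tool, "inputs": intent.get("region", {})})
    return steps

plan = compose_plan({
    "operation": "allele_frequency",
    "region": {"chrom": "17", "start": 41196312, "end": 41277500},
})
```

Because `compose_plan` contains no model calls, running it twice on the same intent produces byte-identical plans, which is what allows automated execution without further semantic interpretation.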
Encoding Domain Expertise: The Role of Skills
The Knowledge Layer is structured around discrete units called ‘Skills,’ which are authored and maintained by subject matter experts. These Skills are plain-text markdown documents designed to encapsulate specific domain knowledge. Each Skill contains three primary components: vocabulary mappings to standardize terminology and enable accurate intent extraction; parameter constraints defining acceptable input ranges for workflow execution; and optimization strategies outlining preferred methods for achieving desired outcomes. This structured approach allows for granular knowledge representation and facilitates the integration of expertise directly into automated workflows.
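To make the three-component structure concrete, here is a sketch of what a Skill document and a minimal parser for it might look like. The section headings and the parsing convention are assumptions based on the description above, not the paper's actual Skill format.

```python
# Hypothetical Skill document with the three components described:
# vocabulary mappings, parameter constraints, optimization strategies.
SKILL_MD = """\
# Skill: allele-frequency
## Vocabulary
- "frequency of variant" -> allele_frequency
## Parameter constraints
- max_region_size: 5000000
## Optimization strategies
- prefer region extraction over full-file scans
"""

def parse_sections(md: str) -> dict:
    """Group bullet items under their '## ' section headings."""
    sections, current = {}, None
    for line in md.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current and line.strip().startswith("- "):
            sections[current].append(line.strip()[2:])
    return sections

sections = parse_sections(SKILL_MD)
```

Keeping Skills as plain-text markdown means a domain expert can edit them in any editor and review changes through ordinary version control, with no code changes required.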
The Workflow Composer leverages Skills – curated domain expertise encoded as markdown documents – to generate optimized workflow plans. Specifically, these Skills provide critical information for intent extraction, resulting in an 83% full-match accuracy rate when paired with the Claude Opus model. This accuracy is achieved through vocabulary Skills which define permissible terms and relationships, guiding the Composer towards precise interpretation of user requests and the subsequent construction of effective workflows. This methodology ensures workflows are not only accurate in addressing the intended query but also efficient in their execution by avoiding unnecessary or incorrect steps.
Externalizing domain knowledge through the Knowledge Layer facilitates rapid adaptation to evolving research needs and datasets by decoupling core logic from specific subject matter expertise. This architecture allows for modification of Skills – the markdown documents containing vocabulary, constraints, and strategies – without necessitating alterations to the underlying Workflow Composer codebase. Consequently, updates to address new research questions or incorporate novel datasets can be implemented through Skill updates alone, significantly reducing development time and associated costs compared to traditional methods requiring extensive code refactoring. This approach prioritizes agility and maintainability, enabling faster iteration and deployment of updated workflows.
Scalable Execution and Data Optimization: A Matter of Efficiency
The system’s workflows are designed for execution within a Kubernetes environment, leveraging the Hyperflow Workflow Management System to ensure both scalability and dependability. This infrastructure allows for dynamic resource allocation, automatically adjusting to workload demands and ensuring consistent performance even with a high volume of concurrent requests. By containerizing each workflow step, the system achieves robust fault tolerance; individual failures do not compromise the entire process, and workflows can be seamlessly restarted or redistributed. The combination of Kubernetes and Hyperflow WMS establishes a resilient and adaptable foundation, capable of handling complex data processing tasks with efficiency and reliability, ultimately enabling the system to scale effectively alongside growing data and user needs.
Efficient data handling is central to streamlined workflow execution, and techniques like Tabix extraction significantly minimize data transfer requirements. This approach strategically focuses on retrieving only the necessary data subsets, bypassing the need to process entire datasets. By intelligently indexing and accessing specific data regions, Tabix extraction drastically reduces the volume of information moved during computation. The result is a marked decrease in processing time and associated costs, enabling faster insights and more economical operation of complex analytical workflows. This optimization is particularly impactful when dealing with large-scale genomic or biomedical datasets, where minimizing data transfer is crucial for scalability and feasibility.
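The principle behind Tabix-style extraction can be shown with a pure-Python analogy: a sorted index over record positions lets a query touch only the records overlapping a region, instead of scanning the whole file. This in-memory sketch illustrates the idea only; real Tabix operates on bgzip-compressed, coordinate-sorted VCF files with an on-disk index.

```python
import bisect

# Toy dataset: one record per kilobase over a 1 Mb region.
records = [(pos, f"variant@{pos}") for pos in range(0, 1_000_000, 1000)]
starts = [pos for pos, _ in records]  # sorted position index

def fetch(start: int, end: int) -> list:
    """Return only the records with position in [start, end)."""
    lo = bisect.bisect_left(starts, start)
    hi = bisect.bisect_left(starts, end)
    return records[lo:hi]

# Retrieves 5 records instead of scanning all 1000.
subset = fetch(10_000, 15_000)
```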
The system employs a strategy of deferred workflow generation to dramatically enhance resource efficiency. Rather than pre-defining a rigid workflow, the system dynamically regenerates the workflow’s directed acyclic graph (DAG) based on the precise data volumes encountered and the available infrastructure resources at runtime. This adaptive approach allows for significant optimization, eliminating unnecessary data processing steps and minimizing data transfer. Benchmarking indicates this technique achieves up to a 92% reduction in data transfer requirements, leading to faster execution times and reduced computational costs by avoiding the movement of irrelevant or redundant information.
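Deferred generation can be sketched as a DAG builder that inspects actual input size before choosing steps. The threshold, step names, and decision rule here are illustrative assumptions, not values from the paper.

```python
# Assumed cutoff for illustration: above this, copying the whole input
# is wasteful and a region-level extraction step is emitted instead.
FULL_TRANSFER_LIMIT_MB = 100

def build_dag(input_size_mb: int, region: dict) -> list:
    """Regenerate the plan at runtime from observed data volume."""
    if region and input_size_mb > FULL_TRANSFER_LIMIT_MB:
        # Large input with a known region of interest: extract only
        # the needed slice, avoiding most of the data transfer.
        return [("region_extract", region), ("analyze", {})]
    # Small input: transferring the whole file is cheap enough.
    return [("copy_full_file", {}), ("analyze", {})]

small = build_dag(10, {"chrom": "1"})
large = build_dag(50_000, {"chrom": "1"})
```

Because the DAG is rebuilt per run, the same research question can yield different (but equally correct) plans on different datasets, which is where the reported transfer savings come from.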
Workflow processing demonstrates remarkably consistent performance characteristics, with Large Language Model (LLM) query overhead maintained between 11 and 14 seconds irrespective of query intricacy. This efficiency is coupled with an exceptionally low execution cost, consistently remaining below $0.001 per query. This predictable latency and minimal expense are achieved through a combination of optimized infrastructure and efficient data handling, enabling rapid and affordable processing of complex requests. The system’s ability to deliver stable performance across varying workloads is a key factor in its scalability and cost-effectiveness, paving the way for broader application and accessibility.
Demonstration and Future Directions: Expanding the Scope of Automated Discovery
The successful automation of the 1000 Genomes Workflow signifies a substantial advancement in genomic data processing capabilities. This complex pipeline, traditionally requiring significant manual intervention and computational resources, was executed entirely through an automated system, validating the architecture’s capacity to manage large-scale genomic datasets. The demonstration involved the complete processing of genomic data, from initial quality control to variant calling and annotation, without human oversight at any stage. This achievement not only streamlines the analysis process but also reduces the potential for human error, offering a reliable and reproducible framework for future genomic studies and opening doors to faster scientific discovery.
The automation of complex genomic workflows represents a significant shift in research capabilities, allowing scientists to prioritize biological insight over technical hurdles. By embedding specialized domain knowledge directly into the workflow construction process, this architecture bypasses the traditionally time-consuming and resource-intensive task of manual pipeline development. This streamlined approach not only accelerates the pace of discovery, but also democratizes access to advanced genomic analysis, enabling researchers with varying levels of computational expertise to effectively explore and interpret large-scale datasets. Consequently, valuable time and resources previously dedicated to infrastructure management are now liberated for hypothesis generation, data interpretation, and ultimately, groundbreaking scientific advancements.
The architecture underpinning this workflow automation system is poised for substantial expansion, with ongoing development concentrating on broadening the repertoire of supported Skills – the modular components enabling specific analytical tasks. Future iterations will prioritize enhanced adaptability of the Workflow Composer, allowing it to dynamically adjust to evolving research questions and diverse datasets without requiring extensive manual reconfiguration. Beyond genomics, the team intends to apply this flexible framework to other scientific disciplines, potentially streamlining complex analyses in fields such as proteomics, metabolomics, and even climate modeling, thereby accelerating discovery across a wider spectrum of scientific inquiry.
The accurate translation of research intent into executable workflows relies heavily on specialized knowledge; studies reveal that when deprived of domain-specific vocabulary Skills, the performance of even advanced language models like Claude Opus diminishes significantly, achieving only 44% accuracy in intent extraction. This demonstrates that simply possessing general language proficiency is insufficient for navigating the complexities of genomic data analysis; a deep understanding of biological terms, genomic concepts, and established workflows is crucial for correctly interpreting researcher goals. Consequently, incorporating these vocabulary Skills is not merely an optimization, but a fundamental requirement for building reliable and effective automated scientific pipelines, ensuring that computational resources are directed towards meaningful and valid analyses.
The pursuit of automated scientific workflows, as detailed in this paper, echoes a fundamental principle of mathematical rigor. The agentic architecture strives for deterministic generation, separating intention from execution, a pursuit akin to establishing invariants in a complex system. As Andrey Kolmogorov stated, “The most important thing in science is to prove, not to guess.” This sentiment perfectly encapsulates the paper’s core idea: it is not sufficient for a workflow to simply produce results; it must be demonstrably correct and reproducible, built on a foundation of provable steps rather than probabilistic approximations. The knowledge layer and domain expertise incorporated into the agentic design serve as the axioms from which these proofs can be constructed, ensuring that the ‘magic’ isn’t magic at all, but a transparent, logical consequence.
The Road Ahead
The presented architecture, while a demonstrable step toward automating scientific inquiry, merely shifts the locus of potential failure. The reliance on Large Language Models for intent extraction introduces a probabilistic element that, however skillfully masked by deterministic workflow generation, remains fundamentally unsatisfying. True rigor demands provability, not merely high correlation with expressed intent. The knowledge layer, though authored by domain experts, is itself a distillation of imperfect observations and, consequently, a source of bias. Minimizing redundancy within this layer is paramount; every superfluous axiom introduces a vector for error propagation.
Future work must prioritize formal verification of both intent translation and workflow execution. The current paradigm treats the LLM as a black box; a more elegant solution would involve embedding formal logic directly into the model’s architecture, allowing for demonstrable correctness. Furthermore, the system’s capacity to discover new knowledge, rather than merely executing pre-defined workflows, remains an open challenge. A system capable of identifying inconsistencies within the knowledge layer, and proposing verifiable amendments, would represent a genuine advancement.
Ultimately, the pursuit of automated science is not about accelerating the production of data, but about refining the process of logical deduction. The goal is not to mimic the human scientist, with all their inherent fallibilities, but to surpass them with a system predicated on mathematical purity. Any other approach is, at best, a pragmatic compromise.
Original article: https://arxiv.org/pdf/2604.21910.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/