Beyond Pixels: AI Agents Tackle 3D Brain Scans Without Training

Author: Denis Avetisyan


A new approach uses sophisticated AI agents powered by large language models to analyze complex neuro-radiological images, offering a powerful alternative to traditional, data-hungry methods.

Agentic brain MRI analysis operates on exemplary cases to provide a comprehensive overview of its functionality.

Multi-agent systems leveraging large language models demonstrate training-free 3D brain MRI analysis and outperform single-agent architectures in complex neuro-radiological workflows.

Despite advances in visual question answering, current large language models struggle with the intrinsic 3D spatial reasoning required for direct analysis of volumetric medical images. This limitation motivates the work ‘Agentic Large Language Models for Training-Free Neuro-Radiological Image Analysis’, which introduces a training-free agentic pipeline leveraging LLMs to orchestrate specialized tools for automated brain MRI analysis. Our results demonstrate that this approach successfully executes complex neuro-radiological workflows, from preprocessing and pathology segmentation to longitudinal response assessment, without requiring model training, and that collaborative, multi-agent architectures outperform single-agent approaches. Could this paradigm shift unlock truly autonomous medical image analysis, and what further gains are possible through improved agent collaboration and tool integration?


The Burden of Interpretation: Bottlenecks in Neuro-Radiology

Neuro-radiology faces a significant challenge as current workflows rely heavily on manual image interpretation and data extraction, creating a bottleneck in diagnostic throughput. Radiologists spend considerable time on repetitive tasks – such as lesion detection, segmentation, and volumetry – which detracts from more complex diagnostic reasoning and patient care. This manual effort not only limits the number of patients who can be assessed within a given timeframe, but also introduces the potential for inter-reader variability and human error. Consequently, healthcare systems bear increasing costs associated with extended reading times, the need for multiple expert opinions, and potential delays in treatment decisions, highlighting an urgent need for automation to streamline these processes and improve efficiency.

The exponential growth of brain MRI data presents a significant challenge to modern neuro-radiology. Healthcare systems are facing an unprecedented influx of images, far outpacing the capacity of manual review. Efficiently extracting clinically relevant information – identifying subtle anomalies indicative of stroke, tumor growth, or neurodegenerative disease – requires scalable automation. Current limitations in human throughput not only delay diagnoses but also increase operational costs and potentially compromise patient outcomes. Consequently, research is heavily focused on developing artificial intelligence-driven tools capable of rapidly processing these complex datasets, prioritizing critical findings, and ultimately augmenting the radiologist’s expertise to meet the demands of this ever-increasing data volume.

Conventional neuroimaging analysis relies on pipelines built for specific research questions or imaging protocols, creating a significant hurdle as clinical needs and technology advance. These systems often require substantial manual recalibration or complete rebuilding to accommodate new scan types, altered data formats, or evolving diagnostic criteria. This inflexibility not only slows down the adoption of cutting-edge imaging techniques but also introduces potential inconsistencies in data processing across different clinical sites and over time. Consequently, researchers and clinicians face challenges in leveraging the full potential of neuroimaging data for both individual patient care and large-scale studies, highlighting the necessity for adaptable and automated solutions that can seamlessly integrate with changing workflows and diverse datasets.

Orchestrating Intelligence: An Agentic Framework for Neuro-Radiology

The Agentic AI Framework is a novel system designed for neuro-radiology applications that utilizes Large Language Models (LLMs) to coordinate the execution of specialized analytical tools without requiring task-specific training data. This framework operates by leveraging pre-trained LLMs as the core reasoning engine, enabling it to interpret clinical questions and dynamically select appropriate tools for analysis. Rather than directly training the LLM on radiology data, the system relies on the LLM’s inherent capabilities to understand instructions and interface with existing, independently developed tools. This approach facilitates rapid deployment and adaptability to new analytical capabilities without the need for extensive retraining, providing a flexible and scalable solution for complex neuro-radiological assessments.
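The core idea, an LLM selecting and invoking pre-built tools rather than being trained on radiology data, can be sketched as a tool registry plus a dispatcher. This is a minimal illustration, not the paper's implementation: the tool names are invented, and the `plan` function stands in for the LLM planner that would interpret the clinical question.

```python
# Minimal sketch of training-free tool orchestration: specialized tools
# register themselves by name, and a dispatcher executes whichever tool
# the planner selects. In the real system an LLM produces the tool call
# from the clinical question; here `plan` is a trivial stand-in.

from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator that registers a function as a callable tool."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return wrap

@tool("skull_strip")
def skull_strip(scan: str) -> str:
    return f"{scan} (brain-extracted)"

@tool("segment_pathology")
def segment_pathology(scan: str) -> str:
    return f"tumor mask for {scan}"

def plan(question: str) -> str:
    # Stand-in for the LLM planner mapping a question to a tool name.
    return "segment_pathology" if "tumor" in question.lower() else "skull_strip"

def answer(question: str, scan: str) -> str:
    tool_name = plan(question)        # planner chooses the tool
    return TOOLS[tool_name](scan)     # dispatcher executes it

print(answer("Is there a tumor?", "t1.nii.gz"))
```

Because the tools are independent of the planner, new analytical capabilities can be added by registering another function, which mirrors the framework's claim of adaptability without retraining.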

The Agentic AI Framework utilizes a multi-agent architecture wherein each agent is dedicated to a discrete analytical function within the neuro-radiology workflow. This design promotes modularity by allowing individual agents to be developed, tested, and updated independently without impacting the overall system. Scalability is achieved through the ability to easily add new agents representing additional analytical capabilities or to replicate existing agents to handle increased workloads. The decomposition of complex tasks into specialized agent functions enables parallel processing and efficient resource allocation, contributing to the framework’s adaptability and performance in varying clinical scenarios.

The Orchestrator Architecture functions as a central planning component within the Agentic AI Framework, dynamically assigning tasks to specialized domain expert agents based on the specific clinical question presented. This task assignment is facilitated through two primary methods: Agents-as-Tools, where agents are directly invoked to perform specific functions, and Handoffs, enabling agents to pass intermediate results and control to one another. Empirical results indicate that multi-agent configurations employing Handoffs consistently achieve superior performance compared to both single-agent systems and architectures relying solely on an Orchestrator for task management, demonstrating the benefits of collaborative, iterative problem-solving.
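The distinction between the two coordination methods can be made concrete with a small sketch. The agent names and messages below are illustrative placeholders, not the paper's actual agents: in Agents-as-Tools the orchestrator retains control and calls agents like functions, while in Handoffs each agent names its successor.

```python
# Hedged sketch contrasting Agents-as-Tools with Handoffs.
# Agents and their outputs are invented for illustration.

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Result:
    text: str
    handoff_to: Optional[str] = None  # set when an agent passes control on

def preprocess_agent(task: str) -> Result:
    return Result(f"preprocessed({task})", handoff_to="segmentation")

def segmentation_agent(task: str) -> Result:
    return Result(f"segmented({task})")

AGENTS: Dict[str, Callable[[str], Result]] = {
    "preprocess": preprocess_agent,
    "segmentation": segmentation_agent,
}

def run_as_tools(task: str) -> List[str]:
    """Agents-as-Tools: the orchestrator invokes each agent and keeps control."""
    trace = []
    for name in ("preprocess", "segmentation"):  # orchestrator's fixed plan
        trace.append(AGENTS[name](task).text)
    return trace

def run_with_handoffs(task: str, start: str = "preprocess") -> List[str]:
    """Handoffs: each agent decides who runs next; control flows between agents."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        result = AGENTS[current](task)
        trace.append(result.text)
        current = result.handoff_to
    return trace
```

The handoff loop lets intermediate results and control travel together, which is one plausible reading of why the paper finds iterative, collaborative configurations perform best.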

The image illustrates various configurations for a multi-agent system, showcasing different potential arrangements of interacting agents.
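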

From Image to Insight: Validating the Analytical Pipeline

The 3D Brain MRI Analysis pipeline leverages a suite of established tools to process neuroimaging data. ANTSPy facilitates advanced image registration techniques, aligning scans for comparative analysis. SynthStrip performs automated skull stripping, removing non-brain tissue to focus computational resources. BraTS Orchestrator manages the overall workflow, integrating these processes and ensuring reproducibility. Finally, PyRadiomics is utilized for high-throughput feature extraction, quantifying imaging characteristics relevant to neurological studies. This combination enables comprehensive and automated analysis of brain MRI data, from initial preprocessing to quantitative biomarker derivation.
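The sequence above can be sketched as a command builder that strings the public CLIs together. File names are placeholders and the mask input is assumed to exist; the flags follow the documented interfaces of SynthStrip (FreeSurfer), ANTs, and PyRadiomics, but should be verified against the installed versions before any real use.

```python
# Illustrative sketch: assemble the preprocessing and feature-extraction
# steps as shell commands. Paths are hypothetical placeholders.

from pathlib import Path
from typing import List

def build_pipeline(t1: str, baseline: str, workdir: str = "work") -> List[str]:
    w = Path(workdir)
    stripped = w / "t1_stripped.nii.gz"
    reg_prefix = w / "t1_to_baseline_"
    features = w / "features.csv"
    return [
        # 1. Skull stripping with SynthStrip
        f"mri_synthstrip -i {t1} -o {stripped}",
        # 2. Registration to the baseline scan with ANTs (quick SyN)
        f"antsRegistrationSyNQuick.sh -d 3 -f {baseline} -m {stripped} -o {reg_prefix}",
        # 3. Radiomic feature extraction (assumes a pathology mask exists)
        f"pyradiomics {reg_prefix}Warped.nii.gz {w / 'mask.nii.gz'} -o {features} -f csv",
    ]

for cmd in build_pipeline("t1.nii.gz", "baseline.nii.gz"):
    print(cmd)
```

In the framework itself, an agent would emit and execute such calls dynamically rather than following a fixed script.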

The pipeline employs SynthSeg for the automated segmentation of 32 distinct anatomical regions within the brain MRI scans. This segmentation process delivers detailed structural information critical for subsequent quantitative analysis, enabling the calculation of volumes, shapes, and spatial relationships between these regions. The resulting data provides a basis for characterizing brain morphology and identifying potential structural anomalies, ultimately supporting investigations into neurological conditions and disease progression. SynthSeg’s output serves as a foundational dataset for downstream feature extraction and statistical modeling.
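Once a label map exists, per-region volumetry reduces to counting voxels per label and scaling by voxel size. The sketch below uses a synthetic label map; SynthSeg's actual label values follow the FreeSurfer lookup table, and the voxel size would come from the image header.

```python
# Sketch: deriving per-region volumes from a SynthSeg-style label map.
# The toy label map and voxel size below are synthetic.

import numpy as np

def region_volumes_ml(labels: np.ndarray, voxel_mm3: float) -> dict:
    """Voxel count per label times voxel volume, converted to millilitres."""
    ids, counts = np.unique(labels, return_counts=True)
    return {int(i): float(c) * voxel_mm3 / 1000.0
            for i, c in zip(ids, counts) if i != 0}  # label 0 = background

# Toy 3D label map: a 10x10x10 block of label 17 inside background.
vol = np.zeros((20, 20, 20), dtype=np.int32)
vol[5:15, 5:15, 5:15] = 17
print(region_volumes_ml(vol, voxel_mm3=1.0))  # {17: 1.0}
```

The same counting generalizes to all 32 segmented regions, and comparing the resulting volumes across timepoints is what underpins longitudinal response assessment.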

Performance of the 3D Brain MRI analysis pipeline was objectively evaluated on a dedicated Brain MRI VQA Dataset, using metrics such as Tool-Call Fidelity and Output Quality to assess accuracy and robustness. Benchmarking showed that a single agent completed Task 1 in an average of 2 actions, versus 3.45 for the agents-as-tools design and 4.9 for the orchestrator design, indicating greater efficiency in action count. Additionally, a near-perfect Inclusion Rate was achieved on Task 3 when using GPT-5.1, indicating that relevant findings were reliably identified and included in the analysis.
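A metric like Inclusion Rate can be sketched as the fraction of expected findings that appear in the agent's report. The matching rule below (case-insensitive substring) and the example findings are assumptions for illustration, not the paper's exact scoring procedure.

```python
# Hedged sketch of an Inclusion Rate metric: fraction of expected
# findings present in the generated report. Matching rule is assumed.

def inclusion_rate(report: str, expected: list) -> float:
    text = report.lower()
    hits = sum(1 for item in expected if item.lower() in text)
    return hits / len(expected) if expected else 1.0

report = "Enhancing lesion in the left frontal lobe with surrounding edema."
print(inclusion_rate(report, ["enhancing lesion", "edema", "midline shift"]))
```

Here two of three expected findings are present, giving a rate of about 0.67; a near-perfect score means almost every expected finding appears in the output.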

Beyond the Horizon: Generalizability and the Future of Intelligent Neuro-Radiology

The framework’s adaptability was rigorously tested through evaluation with a diverse range of leading Large Language Models, including GPT-5.1, Gemini 3 Pro, and Claude Sonnet 4.5. This comprehensive assessment confirmed the framework isn’t limited to a specific model architecture or training dataset; it successfully integrated and functioned effectively across these varied platforms. Demonstrating performance consistency regardless of the underlying LLM highlights its potential for broad implementation and future-proofing against rapid advancements in artificial intelligence, ensuring continued functionality even as new models emerge and older ones are superseded.

The architecture distinguishes itself through its inherent adaptability, representing a departure from rigid, pre-defined clinical workflows. This agentic system isn’t limited by static pipelines; instead, it functions as a dynamic platform capable of seamlessly incorporating new tools and resources as they become available. Each agent operates with a specific skillset, and the system’s orchestration allows for fluid collaboration and handoff between them, responding effectively to the ever-changing demands of clinical practice. This flexibility promises a future where diagnostic and treatment strategies can be rapidly updated and personalized, leveraging the latest advancements in medical technology and research without requiring fundamental system overhauls.

The architecture is poised for advancement towards increasingly sophisticated clinical applications, with future iterations designed to accommodate more intricate workflows and individualized patient care. Initial evaluations reveal that effort scales with task complexity: a single agent averaged 5.28 actions for a moderately complex task, rising to 11.85 actions for a significantly more demanding scenario. Importantly, alternative approaches employing multiple agents or handoffs, alongside a central orchestrator, consistently required a greater number of actions to achieve comparable results, suggesting that optimizing single-agent efficiency remains a crucial avenue for streamlining clinical decision support and, ultimately, improving patient outcomes through more efficient and personalized treatment strategies.

The pursuit of sophisticated image analysis, as detailed in this work, echoes a fundamental design principle: elegance arising from deep understanding. The system’s ability to orchestrate complex 3D brain MRI analysis without task-specific training highlights a harmonious interplay between large language models and neuro-radiological workflows. As David Marr observed, “A good theory should not only explain the data but also predict it.” This agentic approach, demonstrating performance exceeding single-agent systems, suggests a predictive capability born from a well-structured, interconnected architecture – a testament to the power of form following function. The multi-agent system doesn’t simply process images; it understands the relationships within them, creating an interface that, if not singing, certainly hums with efficiency.

Where Do We Go From Here?

The demonstrated capacity for agentic systems to navigate the complexities of neuro-radiological analysis without explicit training is… economical, if nothing else. It speaks to a latent structure within these large language models, a kind of pre-existing competence that merely requires skillful prompting to emerge. However, to mistake this competence for understanding would be a characteristic human error. The current architecture, while demonstrating a clear benefit from multi-agent collaboration, remains a brittle construction. A single, unanticipated imaging artifact, a subtle variation in protocol, could easily disrupt the carefully orchestrated workflow.

The pursuit of robustness demands a move beyond clever prompting. Future work must address the question of grounding – anchoring these agentic systems in a more fundamental understanding of anatomical principles and pathological processes. The emphasis should shift from what is being done to why it is being done. Furthermore, the current reliance on a shared contextual window, while effective, feels… provisional. A more elegant solution might involve a persistent, evolving knowledge graph, allowing agents to build upon past experiences and refine their analytical strategies.

Ultimately, the true test of this approach will not be its ability to replicate the performance of trained models, but its capacity to surpass it. To do so requires acknowledging that intelligence, even artificial intelligence, is not merely about pattern recognition, but about the elegant resolution of ambiguity. Consistency in approach is, after all, a form of empathy for those who will follow.


Original article: https://arxiv.org/pdf/2604.16729.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-04-21 15:07