AI Learns to Spot the Unseen: Diagnosing Rare Brain Diseases

Author: Denis Avetisyan

A new AI framework leverages medical knowledge and image analysis to improve the detection of difficult-to-diagnose neurological conditions.

A novel diagnostic system probes the limits of medical knowledge retrieval, coordinating agents that integrate external data to achieve up to a 10.2% accuracy gain in rare disease diagnosis, a measurable improvement over baseline methods.

Researchers developed RADAR, an agentic AI system that combines large language models with the Radiopaedia database to enhance diagnostic reasoning from MRI scans of rare brain disorders.

Despite advances in medical imaging, artificial intelligence often struggles with rare diseases because training data are scarce, much as clinicians must turn to case reports and literature when they encounter unfamiliar findings. This work, ‘Learning to reason about rare diseases through retrieval-augmented agents’, introduces RADAR, an agentic system that enhances diagnostic reasoning by integrating large language models with external medical knowledge. Our approach demonstrates up to a 10.2% performance gain on a benchmark dataset of rare brain disorders, alongside improved interpretability through literature-grounded explanations. Could retrieval-augmented reasoning unlock more robust and explainable AI for the long tail of medical imaging challenges?

The Ghost in the Machine: Diagnosing the Undiagnosable

Diagnosing rare diseases from brain MRI presents a significant challenge, demanding specialized expertise and considerable time. Subtle and atypical presentations complicate the process, increasing the risk of diagnostic errors and delays. Traditional approaches are hampered by the sheer volume of medical literature and by subjective image interpretation. Clinicians must synthesize information from numerous sources, relying on experience that is prone to bias. This cognitive burden makes rare disease signatures even harder to identify in complex neuroimaging data.

The study compares four diagnostic reasoning setups (single-agent, collaborative, challenger, and the retrieval-augmented framework RADAR) to explore how information exchange and access to external knowledge can refine diagnostic accuracy.

This diagnostic bottleneck delays treatment and potentially causes irreversible damage. Innovative solutions augmenting clinical expertise and accelerating diagnosis are paramount. Perhaps the true illness isn’t the disease, but the signal lost in the noise of certainty.

Decoding the Signal: RADAR and the Augmented Mind

RADAR is a retrieval-augmented framework enhancing LLM diagnostic capabilities in medical contexts. It integrates LLM reasoning with access to a comprehensive external knowledge base, addressing inherent LLM limitations regarding specialized information. A key component is the RAG Agent, dynamically retrieving pertinent medical information relevant to patient data. This agent identifies crucial details and formulates targeted queries, ensuring the LLM receives contextualized, up-to-date information.
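To make the retrieve-then-reason flow concrete, here is a minimal sketch of how retrieved passages might be packed into the LLM's prompt before it reasons about a case. The `Passage` type, the `build_augmented_prompt` helper, and the prompt wording are all hypothetical; the source does not publish RADAR's actual prompts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Passage:
    """A retrieved text chunk plus the title of the article it came from (hypothetical type)."""
    source_title: str
    text: str


def build_augmented_prompt(case_findings: str, passages: list[Passage], max_passages: int = 5) -> str:
    """Place retrieved reference text ahead of the case description.

    Hypothetical format: the ordering and wording are illustrative
    assumptions, not RADAR's real prompt.
    """
    context_lines = []
    for i, p in enumerate(passages[:max_passages], start=1):
        # Number each passage so the model can cite it in its explanation.
        context_lines.append(f"[{i}] {p.source_title}: {p.text}")
    context = "\n".join(context_lines) if context_lines else "(no reference text retrieved)"
    return (
        "You are assisting with a neuroradiology differential diagnosis.\n"
        "Reference material:\n"
        f"{context}\n\n"
        f"Case findings: {case_findings}\n"
        "List the five most likely diagnoses, most likely first, "
        "and cite the reference numbers that support each one."
    )
```

Numbering the passages is one simple way to let the model tie each diagnosis back to specific literature, which is the kind of literature-grounded explanation the article describes.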

Example diagnoses generated by RADAR, presented alongside the corresponding ground-truth diagnoses for comparison.

By grounding the language model's reasoning in a vast, current medical knowledge base, RADAR reduces inaccurate or unsupported answers and improves diagnostic precision and efficiency, supporting informed clinical decision-making.

The Anatomy of Insight: Knowledge Retrieval within RADAR

The RAG Agent employs Query Generation to precisely target searches within Radiopaedia. This component dynamically formulates queries based on user input, optimizing retrieval for relevant data. It avoids broad searches, focusing on specific aspects of the request to enhance accuracy.
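The idea of narrow, targeted queries can be illustrated with a deliberately simple stand-in. In RADAR the agent formulates queries with an LLM; the sketch below instead splits a case description into clauses and turns each finding into its own query. `generate_queries` and the modality suffix are hypothetical choices for illustration only.

```python
import re


def generate_queries(case_description: str, modality: str = "MRI", max_queries: int = 4) -> list:
    """Turn each clause of a case description into one narrow search query.

    Stand-in heuristic: RADAR uses an LLM for this step; here each
    clause is simply treated as one finding.
    """
    # Split on sentence and clause boundaries (periods, semicolons, newlines).
    clauses = re.split(r"[.;\n]+", case_description)
    queries = []
    seen = set()
    for clause in clauses:
        # Collapse whitespace, trim stray commas, normalise case.
        finding = " ".join(clause.split()).strip(" ,").lower()
        if len(finding.split()) < 2 or finding in seen:
            continue  # skip empty or one-word fragments and duplicates
        seen.add(finding)
        queries.append(f"{finding} {modality}")
        if len(queries) == max_queries:
            break
    return queries
```

For example, "Bilateral thalamic T2 hyperintensity. Restricted diffusion; fever" yields two focused queries, one per imaging finding, rather than a single broad search over the whole description.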

Knowledge Retrieval utilizes the all-MiniLM-L6-v2 Sentence Transformer to create vector embeddings of both the query and Radiopaedia content, representing semantic meaning for efficient comparison. A FAISS Index stores and rapidly searches these embeddings, facilitating fast similarity searches.
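The embed-then-index step can be sketched with the standard library alone. Two assumptions keep it self-contained: `toy_embed` is a hashed bag-of-words stand-in for all-MiniLM-L6-v2 (it captures word overlap, not meaning, but mirrors that model's 384-dimensional output), and `FlatIndex` is a hypothetical brute-force store playing the role of a FAISS flat index. With the real libraries, a common pattern is to encode with `normalize_embeddings=True` and store the vectors in `faiss.IndexFlatIP`, so that inner product equals cosine similarity; the source does not say which FAISS index type RADAR uses.

```python
from __future__ import annotations

import hashlib
import math
import re

DIM = 384  # all-MiniLM-L6-v2 also produces 384-dimensional vectors


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised (stand-in for a sentence encoder)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # sha256 gives a stable bucket across runs (built-in hash() is salted per process).
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


class FlatIndex:
    """Exact-search index in the spirit of a FAISS flat index (hypothetical class).

    Stored vectors are unit length, so inner product equals cosine similarity.
    """

    def __init__(self, dim: int = DIM):
        self.dim = dim
        self.vectors: list[list[float]] = []
        self.chunks: list[str] = []

    def add(self, chunks: list[str]) -> None:
        # Embed each text chunk once, at indexing time.
        for chunk in chunks:
            self.vectors.append(toy_embed(chunk, self.dim))
            self.chunks.append(chunk)

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        # Compare the query against every stored vector (brute force).
        q = toy_embed(query, self.dim)
        scored = [
            (sum(a * b for a, b in zip(q, v)), chunk)
            for v, chunk in zip(self.vectors, self.chunks)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

Embedding the corpus once and reusing the stored vectors for every query is what makes repeated searches cheap; a real FAISS index adds optimised (and optionally approximate) search on top of the same idea.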

Top-k retrieval ranks Radiopaedia text chunks by the cosine similarity between each chunk's embedding and the query embedding, then keeps only the k highest-scoring chunks, which bounds computational cost and prevents information overload.
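The ranking rule itself is short. Below, `cosine_similarity` computes cos(a, b) = (a · b) / (‖a‖ ‖b‖), and `top_k_chunks` (a hypothetical helper name) keeps the k best chunks using `heapq.nlargest`. The vectors are plain Python lists standing in for real sentence embeddings, and k = 3 is an arbitrary default, since the source does not state RADAR's value of k.

```python
from __future__ import annotations

import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    chunks: list[str],
    k: int = 3,
) -> list[tuple[float, str]]:
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = ((cosine_similarity(query_vec, v), c) for v, c in zip(chunk_vecs, chunks))
    # nlargest keeps at most k candidates in memory: O(n log k) over n chunks.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

Because cosine similarity ignores vector length, a chunk is ranked by the direction of its embedding rather than by how long the text is; capping the result at k is what keeps the LLM's context small.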

Confession of Errors: Validating RADAR’s Diagnostic Prowess

RADAR was evaluated on the NOVA Dataset, a collection of brain MRI scans representing a variety of rare diseases. Results demonstrate that RADAR significantly improves diagnostic accuracy compared to baseline LLMs like GPT-4o, achieving a Top-1 accuracy of 54.40%, comparable to resident neuroradiologists (48–52%).

RADAR achieved a 7.97% improvement in Top-1 accuracy over Qwen3-32B and a 10.19% improvement in Top-5 accuracy over DeepSeek-R1-70B. These gains highlight the power of retrieval-augmented generation for rare disease diagnosis: a bug in the system of delayed diagnosis, now confessing its design sins.
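Top-1 and Top-5 accuracy count a case as correct when the ground-truth diagnosis appears among the model's first one or five ranked diagnoses. A minimal sketch follows; `top_k_accuracy` is a hypothetical helper, and it assumes exact, case-insensitive string matching, whereas free-text diagnoses usually need synonym handling or expert review in practice (the source does not detail how NOVA answers were matched).

```python
from __future__ import annotations


def top_k_accuracy(predictions: list[list[str]], ground_truth: list[str], k: int) -> float:
    """Percentage of cases whose true diagnosis appears in the first k ranked predictions."""
    if len(predictions) != len(ground_truth):
        raise ValueError("need one ranked prediction list per case")
    if not ground_truth:
        return 0.0
    # A case is a hit if the normalised true label matches any of its top-k predictions.
    hits = sum(
        truth.strip().lower() in (p.strip().lower() for p in ranked[:k])
        for ranked, truth in zip(predictions, ground_truth)
    )
    return 100.0 * hits / len(ground_truth)
```

Top-5 accuracy can never be lower than Top-1 on the same predictions, since every Top-1 hit is also a Top-5 hit. Subtracting two systems' scores from this function gives their gap in percentage points.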

Rewriting the Code: Towards Intelligent Diagnostic Systems

RADAR’s modular design facilitates seamless integration with existing tools and data sources, crucial for comprehensive assessments. Current development centers on expanding the knowledge base, incorporating newly discovered rare diseases, refining algorithms, and improving handling of ambiguous information. The goal is to enhance diagnostic accuracy and broaden coverage to encompass more rare genetic conditions.

This technology has the potential to reshape the diagnostic process, leading to quicker, more precise diagnoses and improved patient outcomes. By accelerating the identification of rare diseases, clinicians can initiate timely interventions and personalized treatment plans, enhancing the quality of life for affected individuals.

The pursuit within this research mirrors a fundamental principle of inquiry: understanding through rigorous testing. RADAR, as an agentic AI framework, doesn’t simply accept the limitations of its initial training; it actively seeks external knowledge, challenging the boundaries of what it ‘knows’ about rare brain disorders. This echoes Alan Turing’s sentiment: “Sometimes people who are unhappy tend to look for a person to blame.” While seemingly disparate, the quote highlights a drive to resolve uncertainty, a drive mirrored in RADAR’s diagnostic reasoning process. The system doesn’t passively observe MRI scans; it actively retrieves and integrates information, systematically dismantling the complexity of each case to arrive at a reasoned conclusion, constantly probing the edges of its understanding.

What’s Next?

The construction of RADAR, as presented, isn’t a culmination, but a controlled demolition of assumptions. The system doesn’t simply answer diagnostic questions; it reveals the fragility of current knowledge representation. The reliance on curated resources like Radiopaedia, while pragmatic, raises the question: how much of diagnostic expertise remains tacit, unwritten, and therefore inaccessible even to sophisticated language models? The true limitation isn’t the AI’s reasoning ability, but the completeness (and inherent biases) of the data it consumes.

Future iterations shouldn’t focus solely on increasing accuracy metrics. A more fruitful, if unsettling, path lies in deliberately probing the system’s failure modes. Where does RADAR confidently misdiagnose? What edge cases expose the gaps in its knowledge? These aren’t bugs to be fixed, but opportunities to map the boundaries of medical understanding. The aim isn’t to create an infallible diagnostic tool, but to build a system that systematically reveals what isn’t known.

Ultimately, the value of agentic AI in this domain may not reside in replacing clinicians, but in functioning as a cognitive stress test for the entire field. By forcing explicit articulation of diagnostic reasoning, these systems expose the underlying heuristics, assumptions, and gaps in medical knowledge. It’s a process of reverse-engineering expertise, and, like any good hack, it requires breaking things to truly understand how they work.

Original article: https://arxiv.org/pdf/2511.04720.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/