Author: Denis Avetisyan
New research details an AI system designed to improve medical diagnoses by actively seeking clarifying information from patients, rather than relying solely on initial reports.
MedClarify leverages an agentic framework and Bayesian updating to refine differential diagnoses by strategically asking case-specific follow-up questions.
While large language models show promise in medical diagnosis, they often struggle with the iterative reasoning inherent in clinical practice, where diagnoses rarely emerge from initial presentations alone. To address this, we introduce MedClarify, an information-seeking AI agent for medical diagnosis that proactively asks targeted, case-specific follow-up questions. By computing differential diagnoses and selecting the questions that maximize expected information gain, MedClarify reduces diagnostic uncertainty and improves accuracy by approximately 27 percentage points compared to a standard single-shot LLM baseline. Could this agentic framework pave the way for more effective and nuanced dialogues with medical LLMs, mirroring the complexities of real-world clinical reasoning?
Deconstructing Diagnosis: The Limits of Certainty
The diagnostic process frequently encounters challenges when initial patient presentations are complex and lack clear indicators of underlying disease. This ambiguity stems from the inherent overlap in symptoms across multiple conditions, the subjective nature of patient-reported experiences, and the limitations of current medical knowledge. Consequently, physicians often face a range of possible diagnoses, each with varying probabilities, requiring further investigation to narrow the field. The human body rarely presents a textbook case; variations in individual physiology, co-morbidities, and even lifestyle factors contribute to atypical presentations that can confound even experienced clinicians. Addressing this diagnostic uncertainty isn’t about eliminating it entirely – that’s often impossible – but rather about systematically reducing the number of plausible explanations through careful history-taking, targeted physical examinations, and judicious use of diagnostic testing.
Diagnostic uncertainty isn’t merely an intellectual challenge for clinicians; it directly impacts patient well-being and healthcare economics. Prolonged periods without a firm diagnosis frequently result in delayed initiation of appropriate therapy, allowing conditions to worsen and potentially necessitating more aggressive, and more costly, interventions later on. This diagnostic odyssey also contributes significantly to escalating healthcare expenditures through repeated testing, specialist consultations, and extended hospital stays. Beyond the financial burden, the anxiety and psychological distress experienced by patients facing ambiguous medical situations represent a substantial, often overlooked, adverse outcome. Ultimately, minimizing diagnostic delays isn’t simply about achieving clinical accuracy; it’s about safeguarding patient health and optimizing resource allocation within the healthcare system.
The pursuit of an accurate diagnosis isn’t simply about accumulating data, but rather a deliberate process of minimizing uncertainty through precisely chosen investigations. Clinicians effectively operate by formulating initial hypotheses and then strategically seeking information that either supports or refutes those possibilities. This targeted approach, often guided by principles of Bayesian reasoning, prioritizes tests and inquiries that will yield the most diagnostic value, narrowing the range of potential conditions with each carefully selected step. By systematically addressing the most likely explanations and actively ruling out alternatives, diagnostic uncertainty is progressively reduced, ultimately leading to a more confident and timely intervention. This iterative process, prioritizing relevant data over exhaustive testing, is central to effective clinical decision-making and improved patient care.
MedClarify: An Intelligence That Questions
MedClarify distinguishes itself from traditional symptom checkers by utilizing a Large Language Model (LLM) to actively solicit further patient information. Rather than simply providing potential diagnoses based on initially reported symptoms – a passive approach – the system generates context-relevant follow-up questions designed to clarify ambiguous inputs and uncover additional pertinent details. This LLM-driven questioning isn’t pre-scripted; the model dynamically constructs inquiries based on the evolving understanding of the patient’s case, allowing for a more nuanced and targeted information-gathering process. The generated questions aim to differentiate between conditions sharing similar initial presentations and ultimately improve the accuracy of differential diagnosis.
MedClarify utilizes Bayesian Updating, a statistical method, to continuously refine the probability of different diagnoses as patient information is gathered. Initially, the system assigns prior probabilities to a range of potential conditions. With each reported symptom or response, these probabilities are updated using Bayes’ Theorem, which incorporates the likelihood of observing that specific information given each diagnosis. The resulting posterior probabilities represent the system’s current belief in each diagnosis, given all available evidence. This iterative process allows MedClarify to move beyond simple symptom matching and dynamically adjust its assessment of risk, providing a more nuanced and accurate diagnostic evaluation. Formally, the posterior probability [latex]P(D|S)[/latex] – the probability of diagnosis D given symptom S – is calculated as [latex]P(D|S) = \frac{P(S|D)P(D)}{P(S)}[/latex], where [latex]P(S|D)[/latex] is the likelihood of observing symptom S if diagnosis D is true, [latex]P(D)[/latex] is the prior probability of diagnosis D, and [latex]P(S)[/latex] is the probability of observing symptom S.
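The update rule above can be made concrete with a minimal Python sketch. It is an illustration of Bayes' theorem over a small set of candidate diagnoses, not MedClarify's actual implementation; the function name `bayes_update`, the diagnosis labels, and every probability below are hypothetical values chosen only for the example.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(D|S) for every diagnosis D.

    prior:      dict mapping diagnosis -> P(D)
    likelihood: dict mapping diagnosis -> P(S|D) for the observed answer S
    """
    # Numerator of Bayes' theorem for each diagnosis: P(S|D) * P(D)
    unnormalised = {d: likelihood[d] * p for d, p in prior.items()}
    # Denominator P(S), obtained by summing the numerators over all diagnoses
    evidence = sum(unnormalised.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under every diagnosis")
    return {d: v / evidence for d, v in unnormalised.items()}


# Hypothetical numbers, for illustration only (not clinical data).
prior = {"influenza": 0.5, "strep throat": 0.3, "mononucleosis": 0.2}
likelihood = {"influenza": 0.9, "strep throat": 0.6, "mononucleosis": 0.5}  # P(fever | D)
posterior = bayes_update(prior, likelihood)
```

Because the denominator is computed as the sum of the numerators, the posterior always sums to one, and the posterior from one answer can be passed back in as the prior for the next.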
MedClarify’s questioning strategy is not pre-defined but is instead guided by information theory to maximize diagnostic efficiency. The system calculates diagnostic entropy – a measure of uncertainty regarding the possible diagnoses – and prioritizes inquiries designed to yield evidence that will most significantly reduce this entropy. This is achieved by evaluating the expected reduction in entropy for each potential question, considering the probabilities of different diagnoses and the likelihood of obtaining specific responses. Consequently, MedClarify dynamically selects questions that offer the greatest potential to differentiate between competing hypotheses, focusing on the data points that will most rapidly converge towards a likely diagnosis and avoid redundant or uninformative lines of inquiry.
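The selection rule described above, choosing the question with the largest expected reduction in entropy, can be sketched as follows. This is a simplified illustration under the assumption that each candidate question has a small set of discrete answers with known likelihoods; the names `expected_information_gain` and `select_question` and all numbers are hypothetical, and the source does not specify how MedClarify estimates these likelihoods.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as dict -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(prior, answer_likelihoods):
    """Expected entropy reduction from asking one question.

    answer_likelihoods: dict answer -> dict diagnosis -> P(answer | diagnosis)
    """
    expected_posterior_entropy = 0.0
    for lik in answer_likelihoods.values():
        # Probability of this answer, marginalised over diagnoses
        p_answer = sum(lik[d] * prior[d] for d in prior)
        if p_answer == 0:
            continue
        # Posterior over diagnoses if this answer were observed
        posterior = {d: lik[d] * prior[d] / p_answer for d in prior}
        expected_posterior_entropy += p_answer * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy


def select_question(prior, questions):
    """Return the question whose answers are expected to reduce entropy most."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))


# Hypothetical example: two diagnoses, two candidate questions.
prior = {"A": 0.5, "B": 0.5}
questions = {
    "discriminating": {"yes": {"A": 0.9, "B": 0.1}, "no": {"A": 0.1, "B": 0.9}},
    "uninformative": {"yes": {"A": 0.5, "B": 0.5}, "no": {"A": 0.5, "B": 0.5}},
}
best = select_question(prior, questions)
```

A question whose answers are equally likely under every diagnosis has zero expected gain, which is why such a rule steers away from redundant lines of inquiry.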
Simulating Reality: Rigorous Testing Through Agentic Interaction
The Agentic Evaluation Framework was constructed to model doctor-patient interactions through the implementation of three distinct agents. The Patient Agent simulates a patient presenting symptoms and responding to queries. The Update Agent manages the iterative questioning process, formulating follow-up questions based on the patient’s responses and the current diagnostic hypothesis. Finally, the Evaluator Agent assesses the diagnostic accuracy at each step and determines convergence, providing a quantitative measure of performance. This agent-based approach allows for controlled experimentation and systematic evaluation of diagnostic capabilities, moving beyond static, single-turn interactions.
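A rough sketch of how three such agents might interact in a multi-turn loop is shown below. In the paper's framework the agents are LLM-driven; here each one is replaced by a trivial stand-in so the control flow is visible. All class names, the fixed question list, the `toy_diagnose` rule, and the case data are assumptions made for illustration, not details taken from the source.

```python
from dataclasses import dataclass, field


@dataclass
class PatientAgent:
    """Answers questions from a hidden case record (stand-in for a simulated patient)."""
    case: dict

    def answer(self, question):
        return self.case.get(question, "unknown")


@dataclass
class UpdateAgent:
    """Picks the next question; this placeholder policy just walks a fixed list."""
    questions: list
    asked: list = field(default_factory=list)

    def next_question(self):
        for q in self.questions:
            if q not in self.asked:
                self.asked.append(q)
                return q
        return None  # nothing left to ask


@dataclass
class EvaluatorAgent:
    """Scores the current diagnosis against the ground-truth label."""
    truth: str

    def is_correct(self, diagnosis):
        return diagnosis == self.truth


def toy_diagnose(evidence):
    # Hypothetical rule standing in for the diagnostic model.
    if evidence.get("sore throat") == "yes" and evidence.get("cough") == "no":
        return "strep throat"
    return "influenza"


def run_episode(patient, updater, evaluator, diagnose, max_turns=5):
    """Run one simulated consultation and record correctness after every turn."""
    evidence, trace = {}, []
    for _ in range(max_turns):
        question = updater.next_question()
        if question is None:
            break
        evidence[question] = patient.answer(question)
        trace.append(evaluator.is_correct(diagnose(evidence)))
    return trace


trace = run_episode(
    PatientAgent({"fever": "yes", "cough": "no", "sore throat": "yes"}),
    UpdateAgent(["fever", "cough", "sore throat"]),
    EvaluatorAgent("strep throat"),
    toy_diagnose,
)
```

Recording correctness per turn, rather than only at the end, is what lets a framework of this kind report how accuracy changes as more questions are asked.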
The Agentic Evaluation Framework was utilized to quantify MedClarify’s diagnostic performance through iterative questioning. Evaluation results demonstrate a 26.9 percentage point increase in diagnostic accuracy when MedClarify employed this agentic, iterative approach, as compared to a single-shot Large Language Model (LLM) baseline. This improvement indicates that MedClarify’s ability to refine its understanding of a patient’s condition through sequential questioning significantly enhances its diagnostic capabilities beyond those of a non-iterative LLM approach. The baseline LLM was prompted with the initial patient information only, and did not engage in follow-up questioning.
To assess performance with imperfect data, we utilized Feature Masking during agentic simulation, systematically removing portions of the initial patient data provided to the system. This process simulates the real-world scenario where clinicians often lack complete patient histories or test results. Evaluation revealed a 5.7 percentage point increase in diagnostic accuracy for MedClarify, when compared to a baseline system employing non-adaptive, or “naïve,” follow-up questioning to elicit missing information. This improvement demonstrates the system’s enhanced resilience and ability to effectively gather necessary data despite incomplete initial inputs, highlighting its practical applicability in clinical settings.
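Feature masking of this kind can be sketched in a few lines. The version below hides a random fraction of a case's features and returns both the visible part (what the system starts with) and the hidden part (what follow-up questions would need to recover). The function name `mask_features`, the masking fraction, and the sample case are hypothetical; the source does not state how features were chosen for removal.

```python
import random


def mask_features(case, mask_fraction, rng):
    """Split a case into (visible, hidden) by hiding a random fraction of features."""
    keys = sorted(case)  # sort so results are reproducible for a given seed
    n_hidden = round(len(keys) * mask_fraction)
    hidden_keys = set(rng.sample(keys, n_hidden))
    visible = {k: v for k, v in case.items() if k not in hidden_keys}
    hidden = {k: v for k, v in case.items() if k in hidden_keys}
    return visible, hidden


# Hypothetical case record, for illustration only.
case = {"age": 34, "fever": "yes", "cough": "no", "sore throat": "yes"}
visible, hidden = mask_features(case, mask_fraction=0.5, rng=random.Random(0))
```

Passing in a seeded `random.Random` instance keeps the masking reproducible across runs, which matters when comparing questioning strategies on identical incomplete inputs.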
Beyond Diagnosis: A System That Learns to Ask the Right Questions
MedClarify signals a fundamental change in the diagnostic process, moving beyond systems that simply deliver a diagnosis to those that actively solicit clarifying information from patients. This approach acknowledges the inherent complexities of medical presentation and the potential for ambiguity in symptom reporting. By engaging in a dialogue to pinpoint specific details, MedClarify aims to reduce diagnostic errors stemming from incomplete or misinterpreted information – a common source of misdiagnosis. The system’s capacity to request further details isn’t merely about gathering data; it’s about mirroring the iterative questioning employed by experienced clinicians, ultimately leading to more accurate assessments and, consequently, improved patient outcomes through targeted and effective treatment plans.
A critical component of MedClarify’s reliability lies in its application of Temperature Scaling, a technique designed to refine the confidence levels associated with its diagnostic suggestions. Without such calibration, large language models often exhibit overconfidence, assigning high probabilities to incorrect answers – a potentially dangerous trait in medical contexts. This method adjusts the model’s output so that predicted probabilities more accurately reflect the true likelihood of a diagnosis. Testing showed a substantial improvement in calibration, with Temperature Scaling achieving a 56.8% reduction in Expected Calibration Error (ECE) when compared to a standard, uncalibrated multi-turn baseline. This improved calibration reduces the risk of misleadingly confident, yet inaccurate, diagnoses, bolstering the system’s trustworthiness and paving the way for safer clinical integration.
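For readers unfamiliar with the two terms, the sketch below shows the standard textbook form of each: temperature scaling divides the scores by a temperature T before the softmax, and expected calibration error is the bin-weighted gap between average confidence and accuracy. How MedClarify obtains its scores and fits T is not described here, so the logits, the temperature value, and the bin count are assumptions for illustration; in practice T is usually fitted on held-out data.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; T > 1 flattens the distribution, T < 1 sharpens it."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - mean confidence| over confidence bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical scores for three candidate diagnoses.
logits = [4.0, 1.0, 0.0]
uncalibrated = softmax(logits)                 # sharply peaked
calibrated = softmax(logits, temperature=2.0)  # same ranking, softer confidence

# Four predictions made at 90% confidence, only half of them correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
```

Temperature scaling leaves the ranking of diagnoses unchanged and only rescales how confident the model claims to be, which is why it can lower ECE without altering which diagnosis is ranked first.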
Further development of MedClarify prioritizes seamless integration with existing electronic health record systems, aiming to transform it from a research tool into a practical clinical asset. This integration will enable automated data retrieval and contextualization, streamlining the diagnostic process and reducing the burden on healthcare professionals. Simultaneously, efforts are underway to broaden the system’s expertise beyond its current scope, with plans to incorporate knowledge from a more diverse range of medical specialties. This expansion will not only increase the breadth of diagnosable conditions, but also enhance MedClarify’s ability to address the complexities of multi-morbidity, ultimately fostering more holistic and accurate patient care.
MedClarify embodies a systematic dismantling of diagnostic assumptions. The agent doesn’t simply accept presented data; it actively challenges the initial state through targeted questioning, mirroring a process of controlled deconstruction. This pursuit of clarity resonates with Alan Turing’s insight: “Sometimes people who are unhappy tend to look at the world as if through a grey veil.” MedClarify attempts to lift that veil, recognizing that complete information is rarely offered upfront. By probing for specifics, the agent minimizes ambiguity – a direct application of Bayesian updating – and refines its assessment, effectively reverse-engineering the patient’s condition from fragmented clues. Every exploit starts with a question, not with intent, and MedClarify’s questioning approach exemplifies this principle.
Beyond the Symptoms
MedClarify represents a predictable, yet crucial, step: acknowledging that even the most verbose large language model operates with a fundamentally incomplete picture. The system doesn’t diagnose; it dismantles uncertainty, peeling back layers of ‘not-quite-right’ with targeted inquiry. But the true challenge isn’t replicating a competent physician’s questioning style; it’s understanding why certain questions yield more information than others. The current framework treats information gain as a measurable quantity, a logical outcome of well-formed queries. Yet the human element (intuition, subtle cues, the art of building patient trust to elicit complete answers) remains stubbornly opaque.
Future iterations will inevitably focus on refining the Bayesian updating mechanisms, improving the agent’s ability to prioritize relevant data, and expanding the knowledge base. However, a more radical approach might involve treating the patient not as a data source, but as a complex system itself. What if the agent’s questions were designed to perturb the system, observing the resulting changes to reveal hidden connections? It’s a shift from passive information gathering to active experimentation: a deliberate attempt to break the black box of the human body to understand how it functions.
Ultimately, the success of such systems won’t be measured by diagnostic accuracy alone. It will be determined by their ability to expose the limits of current medical knowledge, forcing a reevaluation of established assumptions. A truly intelligent agent doesn’t simply find answers; it identifies the questions that have yet to be asked.
Original article: https://arxiv.org/pdf/2602.17308.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-21 03:30