Author: Denis Avetisyan
Researchers have developed an intelligent framework that combines artificial intelligence with medical knowledge to improve the accuracy and consistency of Polycystic Ovary Syndrome diagnosis.

A multi-agent system leveraging large language models and a knowledge graph enables guideline-based reasoning and differential exclusion for more effective PCOS diagnosis.
Although Polycystic Ovary Syndrome (PCOS) affects approximately 10% of reproductive-aged women, accurate and interpretable diagnosis remains a challenge because existing machine learning approaches are limited and domain-specific AI tools are lacking. This paper introduces Mapis (A Knowledge-Graph Grounded Multi-Agent Framework for Evidence-Based PCOS Diagnosis), a system that leverages a structured knowledge graph and collaborative agents to simulate guideline-based clinical reasoning. Extensive evaluations demonstrate that Mapis significantly outperforms existing methods, achieving up to 13.56% higher accuracy on clinical datasets and paving the way for more transparent and reliable diagnostic tools. Could this framework represent a paradigm shift towards knowledge-grounded AI in complex medical domains?
Unveiling the Diagnostic Challenges of PCOS
Polycystic Ovary Syndrome (PCOS) affects an estimated 6 to 12 percent of women of reproductive age, making it one of the most common endocrine disorders globally. Despite its prevalence, a significant number of cases remain undiagnosed, largely due to the condition’s heterogeneous nature and the often-subtle presentation of symptoms. This underdiagnosis poses substantial health risks, as PCOS is linked to long-term complications including insulin resistance, type 2 diabetes, cardiovascular disease, and endometrial cancer. Furthermore, the disorder frequently contributes to infertility and can have a profound impact on a woman’s quality of life, highlighting the critical need for increased awareness and improved diagnostic strategies to reach those currently without a diagnosis and provide timely intervention.
The current standard for diagnosing Polycystic Ovary Syndrome (PCOS), the Rotterdam Criteria, while widely used, presents inherent challenges due to its reliance on interpretation. This criterion set requires the presence of at least two out of three features – irregular ovulation, clinical or biochemical signs of hyperandrogenism, and polycystic ovaries observed via ultrasound – leaving room for variability between clinicians. The subjective nature of assessing these features, particularly the appearance of polycystic ovaries on ultrasound – which can be influenced by the equipment used and the technician’s expertise – contributes to inconsistencies in diagnosis. Consequently, individuals with mild presentations or atypical symptoms may be overlooked, while others might receive a PCOS diagnosis despite not fully meeting the criteria, highlighting the need for more objective and standardized diagnostic tools to improve accuracy and reduce misdiagnosis rates.
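The two-out-of-three rule described above is simple enough to write down directly. The following is a minimal sketch of that rule only, not code from Mapis; the class and field names are invented for illustration, and each feature is assumed to have already been judged present or absent by a clinician.

```python
from dataclasses import dataclass


@dataclass
class RotterdamFindings:
    # Hypothetical field names; each flag is a clinician's yes/no judgment.
    ovulatory_dysfunction: bool
    hyperandrogenism: bool    # clinical or biochemical
    polycystic_ovaries: bool  # seen on ultrasound


def meets_rotterdam(findings: RotterdamFindings) -> bool:
    """Return True when at least two of the three features are present."""
    present = sum([
        findings.ovulatory_dysfunction,
        findings.hyperandrogenism,
        findings.polycystic_ovaries,
    ])
    return present >= 2
```

The rule itself is trivial; the variability the article describes comes from deciding whether each flag should be set in the first place.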
The varied ways Polycystic Ovary Syndrome (PCOS) presents clinically pose a significant hurdle to timely and accurate diagnosis. Symptoms aren’t uniform; some individuals experience infrequent periods and excess androgens, while others grapple primarily with infertility or metabolic disturbances, and many exhibit a combination of these and other issues. This heterogeneity means one patient’s experience can differ drastically from another’s, making it difficult to apply rigid diagnostic criteria. Consequently, there’s a growing need for a more nuanced, standardized approach that moves beyond simply checking off criteria to encompass a broader evaluation of hormonal profiles, metabolic health, and individual symptom presentation. A robust diagnostic framework could improve early detection, facilitate personalized treatment plans, and ultimately lessen the long-term health risks associated with this complex endocrine disorder.
Mapis: A System for Orchestrating PCOS Diagnosis
Mapis is a multi-agent system (MAS) developed to simulate the diagnostic process for Polycystic Ovary Syndrome (PCOS). The system’s architecture is designed to replicate the sequential steps a clinician undertakes when evaluating a patient for PCOS, including history taking, physical examination, and laboratory testing. Each agent within Mapis represents a specific role in this workflow – for example, an agent dedicated to reviewing patient history, another to interpreting lab results, and a final agent responsible for formulating a diagnosis based on the combined evidence. This distributed approach aims to improve diagnostic accuracy by reducing individual cognitive load and facilitating a more comprehensive evaluation, while also increasing efficiency through parallel processing of information.
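To make the role-per-agent idea concrete, here is a toy sketch in which each agent is a plain function that reads a shared case record and adds its findings. The agent names, record keys, and the sequential hand-off are all assumptions made for illustration; the article does not specify how Mapis agents actually communicate.

```python
from typing import Callable, Dict, List

Case = Dict[str, object]
Agent = Callable[[Case], Case]


def history_agent(case: Case) -> Case:
    # Hypothetical: passes through a flag taken from the patient history.
    return {"irregular_cycles": bool(case.get("irregular_cycles", False))}


def lab_agent(case: Case) -> Case:
    # Hypothetical: passes through a flag derived from lab results.
    return {"androgen_excess": bool(case.get("androgen_excess", False))}


def diagnosis_agent(case: Case) -> Case:
    # Combines whatever the earlier agents wrote into the record.
    evidence = [case.get("irregular_cycles"), case.get("androgen_excess")]
    return {"evidence_count": sum(1 for item in evidence if item)}


def run_pipeline(case: Case, agents: List[Agent]) -> Case:
    """Run agents in order, merging each agent's output into the record."""
    record = dict(case)
    for agent in agents:
        record.update(agent(record))
    return record
```

Calling `run_pipeline(case, [history_agent, lab_agent, diagnosis_agent])` returns the original record plus each agent's contribution, so the final agent's output can be traced back to the intermediate findings.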
Mapis utilizes a domain Knowledge Graph to represent established clinical guidelines for Polycystic Ovary Syndrome (PCOS) diagnosis. This Knowledge Graph serves as a structured repository of medical knowledge, encoding relationships between symptoms, lab results, and diagnostic criteria. By grounding its reasoning in this pre-defined knowledge base, Mapis mitigates the risk of “hallucinations” – the generation of factually incorrect or unsupported statements – commonly observed in large language models. The Knowledge Graph ensures that diagnostic inferences are aligned with accepted medical practice and provides a traceable pathway for validating each conclusion, enhancing both the reliability and interpretability of the system’s output.
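As a rough illustration of what a "traceable pathway" through a knowledge graph can look like, the sketch below stores a handful of subject-predicate-object triples and searches for the chain that links a finding to a diagnosis. The node names, relation names, and the breadth-first search are assumptions for this example and are not taken from the Mapis knowledge graph.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy graph; every node and relation name here is invented for illustration.
KG: Set[Triple] = {
    ("hirsutism", "is_sign_of", "hyperandrogenism"),
    ("elevated_free_testosterone", "is_sign_of", "hyperandrogenism"),
    ("hyperandrogenism", "is_criterion_for", "PCOS"),
    ("oligo_anovulation", "is_criterion_for", "PCOS"),
}


def supporting_path(finding: str, target: str) -> List[Triple]:
    """Breadth-first search for triples linking finding to target, else []."""
    frontier = [(finding, [])]
    seen = {finding}
    while frontier:
        node, path = frontier.pop(0)
        if node == target:
            return path
        for subj, pred, obj in sorted(KG):
            if subj == node and obj not in seen:
                seen.add(obj)
                frontier.append((obj, path + [(subj, pred, obj)]))
    return []
```

A finding with no path to the target returns an empty list, which is one simple way a system could refuse to assert a link the graph does not support.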
Clinical data preprocessing within Mapis involves several key steps to convert unstructured Electronic Health Record (EHR) data into a standardized, agent-readable format. This process begins with extracting relevant data points – including patient demographics, medical history, lab results, and medication lists – from diverse data sources and formats commonly found in EHR systems, such as free-text notes, coded values, and image reports. Next, data cleaning addresses inconsistencies, errors, and missing values through techniques like imputation and outlier detection. Finally, the cleaned data undergoes transformation and normalization, converting it into a structured format – specifically, a set of facts represented in Resource Description Framework (RDF) – that facilitates reasoning and knowledge sharing among the Mapis agents. This structured representation enables the agents to effectively utilize the domain Knowledge Graph and perform accurate diagnostic inferences.
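The final transformation step can be pictured as turning one cleaned record into a list of facts. The sketch below uses plain Python tuples in place of real RDF, skips missing values instead of imputing them, and invents the `patient:` prefix and `has_` predicate naming; none of these details come from the paper.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def record_to_triples(
    patient_id: str, record: Dict[str, Optional[object]]
) -> List[Triple]:
    """Convert one flat record into (subject, predicate, object) facts."""
    triples: List[Triple] = []
    for field, value in sorted(record.items()):
        if value is None or value == "":
            continue  # missing values are dropped, not imputed, in this sketch
        # Normalize the field name into a predicate such as "has_free_testosterone".
        predicate = "has_" + field.strip().lower().replace(" ", "_")
        triples.append((f"patient:{patient_id}", predicate, str(value).strip()))
    return triples
```

For example, `record_to_triples("p1", {"Age": 29, "Free Testosterone": None})` yields a single fact for age and silently omits the missing lab value.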

Deconstructing the Diagnosis with Agent-Based Reasoning
The Gynecological Endocrine Agent evaluates hormonal profiles to identify imbalances characteristic of Polycystic Ovary Syndrome (PCOS). This assessment specifically focuses on androgen levels, measuring both Free Testosterone and Dehydroepiandrosterone Sulfate (DHEA-S). Elevated levels of these androgens, alongside clinical or biochemical evidence of hyperandrogenism – such as hirsutism, acne, or alopecia – contribute to the diagnostic criteria for PCOS. The agent correlates these hormonal data with patient-reported symptoms and other diagnostic findings to establish the degree of androgen excess and its potential contribution to the overall clinical picture.
The Radiology Agent utilizes ultrasound imaging to assess Polycystic Ovarian Morphology (POM), a key diagnostic feature of Polycystic Ovary Syndrome (PCOS). This evaluation focuses on identifying multiple small follicles (often loosely called cysts), typically defined as twelve or more follicles measuring 2-9 mm in diameter per ovary. POM is not present in every PCOS patient, and the number and size of follicles can vary. The agent’s assessment includes measuring ovarian volume and documenting the distribution and characteristics of any observed follicular structures, contributing to a comprehensive evaluation alongside hormonal data and differential diagnoses.
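The follicle-count threshold quoted above translates into a short check. This sketch assumes follicle diameters have already been measured and listed per ovary, and that meeting the threshold in either ovary is sufficient; the function names are hypothetical, and ovarian volume is deliberately left out because the article gives no cutoff for it.

```python
from typing import Sequence


def count_qualifying_follicles(diameters_mm: Sequence[float]) -> int:
    """Count follicles whose diameter falls in the 2-9 mm range."""
    return sum(1 for d in diameters_mm if 2.0 <= d <= 9.0)


def has_polycystic_morphology(
    left_mm: Sequence[float],
    right_mm: Sequence[float],
    min_follicles: int = 12,
) -> bool:
    """True if either ovary has at least min_follicles qualifying follicles."""
    return any(
        count_qualifying_follicles(ovary) >= min_follicles
        for ovary in (left_mm, right_mm)
    )
```

Keeping `min_follicles` as a parameter makes it easy to apply a different cutoff if the guideline in use specifies one.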
The Exclusion Agent utilizes differential diagnosis as a critical component of Polycystic Ovary Syndrome (PCOS) assessment, systematically ruling out conditions that may mimic PCOS symptoms. This process involves evaluating and excluding alternative diagnoses such as congenital adrenal hyperplasia, androgen-secreting tumors, thyroid dysfunction, and hyperprolactinemia. By applying established diagnostic criteria and leveraging patient data – including hormonal profiles and imaging results – the Exclusion Agent minimizes false positive diagnoses and enhances diagnostic specificity. This ensures that a PCOS diagnosis is only confirmed after reasonably excluding other potential etiologies for hyperandrogenism and ovulatory dysfunction.
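The exclusion step amounts to a gate: a case that meets the diagnostic criteria is confirmed only when every listed mimic has been ruled out. The sketch below encodes that gate using the four conditions named above; the function names and the idea of passing in a set of already-excluded conditions are assumptions for illustration, and how each condition is actually ruled out is outside the sketch.

```python
from typing import Iterable, List

# The four alternative diagnoses named in the article.
MIMICS = (
    "congenital adrenal hyperplasia",
    "androgen-secreting tumor",
    "thyroid dysfunction",
    "hyperprolactinemia",
)


def unresolved_mimics(ruled_out: Iterable[str]) -> List[str]:
    """Return the mimicking conditions that have not yet been excluded."""
    excluded = set(ruled_out)
    return [condition for condition in MIMICS if condition not in excluded]


def confirm_pcos(meets_criteria: bool, ruled_out: Iterable[str]) -> bool:
    """Confirm only if criteria are met and no mimic remains unresolved."""
    return meets_criteria and not unresolved_mimics(ruled_out)
```

Returning the list of unresolved conditions, instead of only a yes/no answer, gives the clinician a concrete list of what still needs to be checked.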
Augmenting LLMs: Precision and Knowledge in PCOS Diagnosis
Mapis leverages the power of Large Language Models, but crucially, it doesn’t rely on the models’ inherent knowledge alone. Instead, it employs Retrieval-Augmented Generation (RAG) – a technique that dynamically retrieves relevant information from a dedicated knowledge source before the model formulates its response. This ensures that the clinical reports generated are not simply based on potentially outdated or inaccurate data stored within the model itself, but are firmly grounded in verified evidence. The system synthesizes findings by first identifying pertinent details, then using the LLM to craft a coherent and transparent report, effectively combining the reasoning abilities of the model with the reliability of external knowledge – a crucial step towards trustworthy AI in healthcare.
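The retrieve-then-generate pattern can be sketched without any model at all. In the toy version below, retrieval is a simple word-overlap ranking and "generation" stops at building the prompt that would be sent to an LLM; the scoring method, prompt wording, and function names are all assumptions, and a real system would typically use embedding-based retrieval.

```python
from typing import List


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Rank passages by how many query words they share; return the top k."""
    query_words = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved evidence."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return f"Use only the evidence below.\n{context}\nQuestion: {query}"
```

The key design point is the order of operations: evidence is selected first, and the model only sees the question together with that evidence.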
The reliability of Large Language Models in clinical contexts is substantially improved by integrating them with a dedicated Knowledge Graph. This approach moves beyond the LLM’s inherent limitations – its potential to generate plausible but factually incorrect statements – by providing a verified source of truth. Rather than relying solely on patterns learned from vast datasets, the LLM consults the Knowledge Graph to confirm and contextualize information before inclusion in the clinical report. This grounding process significantly reduces the risk of hallucinations or the introduction of irrelevant details, bolstering the fidelity and trustworthiness of the generated synthesis. Consequently, the system delivers more accurate and clinically relevant reports, enhancing its value as a decision-support tool and minimizing potential errors in patient care.
Recent evaluations demonstrate Mapis’s substantial advancement in Polycystic Ovary Syndrome (PCOS) diagnosis, consistently exceeding the performance of current state-of-the-art methods. On a private clinical dataset, the system achieved improvements of up to 6.05% in the F1-score and 5.88% in diagnostic accuracy. Notably, when integrated with GPT-4.1, Mapis attained an impressive accuracy of 91.76% and an F1-score of 93.52% on the same dataset, indicating a highly reliable and precise diagnostic capability. Despite its comprehensive analysis, the system maintains a practical processing time, with an average latency of 39.8 seconds per case, suggesting its potential for real-world clinical application.
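For readers less familiar with the two metrics reported above, both can be computed from the counts of true and false positives and negatives. The sketch below shows the standard definitions; the counts in the usage note are made up and are not the study's data.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in count form."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

With illustrative counts of 8 true positives, 1 false positive, 9 true negatives, and 2 false negatives, accuracy is 17/20 and F1 is 16/19. Because F1 ignores true negatives, it can differ noticeably from accuracy when the classes are imbalanced.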
The framework detailed in this work, Mapis, embodies a systemic approach to clinical diagnosis. It prioritizes understanding the interconnectedness of symptoms and guidelines, mirroring a holistic view of health. This resonates with Donald Davies’ observation that “Structure dictates behavior.” Mapis doesn’t merely process data; its knowledge graph and multi-agent system establish a structure that guides the diagnostic process, ensuring adherence to evidence-based guidelines and enabling a more accurate differential exclusion of conditions that mimic PCOS. The system’s design acknowledges that a change in one area, such as a new symptom or a guideline update, impacts the entire framework, emphasizing the importance of structural integrity for reliable outcomes.
Future Directions
The pursuit of diagnostic rigor, as exemplified by Mapis, inevitably reveals the inherent fragility of formalized systems. While the framework demonstrates a capacity for guideline-compliant reasoning, it simultaneously highlights the limitations of translating nuanced clinical judgment into algorithmic processes. The reliance on a knowledge graph, however robust, still necessitates careful curation and continuous updating – a perpetual task given the evolving landscape of medical understanding. The true challenge lies not merely in achieving accurate diagnoses, but in gracefully handling ambiguity and uncertainty, qualities presently beyond the reach of even the most sophisticated models.
Future iterations should focus less on chasing marginal gains in diagnostic accuracy and more on developing mechanisms for self-assessment and error propagation. A system capable of articulating its own confidence levels, and flagging areas of potential misinterpretation, would be far more valuable than one simply delivering a ‘correct’ answer. Moreover, the integration of longitudinal patient data and individual variability – moving beyond population-level guidelines – presents a substantial, yet crucial, hurdle.
The elegance of a system is often masked by its complexity. Good architecture is invisible until it breaks, and only then is the true cost of decisions visible.
Original article: https://arxiv.org/pdf/2512.15398.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2025-12-19 05:53