AI Explains Itself: Building Narratives from Machine Learning

Author: Denis Avetisyan


Researchers are leveraging the power of multi-agent systems and large language models to automatically generate clear, human-understandable explanations of complex AI decisions.

The Coherence Agent identifies issues within narrative text, highlighting them for focused revision, and provides corresponding feedback to improve overall textual consistency.

This review examines an agentic approach to generating faithful and coherent XAI-narratives, utilizing SHAP values and LLMs to bridge the gap between model prediction and human comprehension.

Despite advances in Explainable AI (XAI), translating complex model decisions into truly accessible explanations remains a significant challenge. This work, ‘An Agentic Approach to Generating XAI-Narratives’, proposes a multi-agent framework leveraging Large Language Models (LLMs) to automatically generate and refine human-understandable narratives of machine learning predictions, prioritizing both faithfulness to the model and linguistic coherence. Evaluations across five LLMs and tabular datasets demonstrate that iterative refinement by critic agents substantially improves narrative faithfulness, reducing unfaithful explanations by up to 90%, and that an ensemble strategy further enhances performance. Could this agentic approach unlock a new era of transparent and trustworthy AI systems capable of communicating complex reasoning to diverse audiences?


From Obscurity to Insight: The Need for Narrative Transparency in AI

Although techniques like SHAP values offer a powerful means of dissecting model decision-making processes and identifying influential features, the resulting outputs frequently present a significant hurdle for those lacking a strong technical background. These methods typically express explanations as numerical values or complex visualizations, demanding specialized knowledge to interpret effectively. Consequently, the valuable insights into why a model arrived at a particular conclusion remain obscured for many potential users, including domain experts and those directly impacted by the model’s predictions. This disconnect hinders the practical application of XAI, as understanding is a prerequisite for trust and responsible implementation, and technical complexity often overshadows the core reasoning behind the model’s behavior.

The utility of even the most accurate machine learning model hinges not just on its predictive power, but on the ability to convey why it makes certain decisions. Effective communication of these internal reasoning processes is increasingly vital for fostering trust, particularly when models influence critical areas like healthcare, finance, or criminal justice. Presenting raw data or technical feature attributions often falls short; explanations must transcend technical jargon and be readily understandable to those directly affected by the model’s output. This necessitates a move toward human-interpretable explanations – insights framed in a way that aligns with human cognition and allows for informed decision-making, ultimately bridging the gap between algorithmic precision and human understanding.

The human mind doesn’t process information as a list of isolated facts; instead, it seeks patterns and constructs meaning through stories. Consequently, simply identifying which features most influenced a machine learning model – while valuable – falls short of true explanation. Effective Explainable AI (XAI) demands more than feature importance scores; it requires translating those technical details into a cohesive narrative. This means framing the model’s reasoning in terms of cause and effect, connecting the inputs to the output through a logical and understandable sequence of events. By constructing these explanatory stories, complex algorithmic decisions become accessible, fostering trust and enabling stakeholders to grasp why a model arrived at a particular conclusion, rather than merely what the conclusion is.

The Narrator, when generating narratives from SHAP input like in this example from the Student dataset, can introduce faithfulness issues such as incorrect feature values (e.g., reporting ‘goout’ as 5 instead of 4), reversed feature signs (e.g., ‘Walc’ labeled as positive instead of negative), and inaccurate feature importance rankings (as seen with ‘goout’ and ‘failures’), all highlighted in red.

Orchestrating Clarity: An Agentic Approach to Narrative XAI

An Agentic AI approach to explainable AI (XAI) narrative generation utilizes a Multi-Agent System (MAS) to automate the process of creating human-understandable explanations. This involves decomposing the complex task of narrative construction into discrete sub-tasks handled by independent agents. The MAS framework enables parallel processing and specialization, allowing each agent to focus on a specific aspect of the narrative, such as content generation, factual accuracy, and linguistic coherence. This distributed architecture contrasts with monolithic approaches where a single model handles all aspects of explanation generation, and aims to improve both the quality and scalability of XAI narratives.

The Agentic AI system utilizes a three-agent architecture to automate XAI narrative generation. The Narrator agent is responsible for the initial drafting of the explanation, translating model behavior into human-readable text. The Faithful Evaluator agent assesses the narrative’s fidelity to the underlying model’s reasoning, identifying and flagging any inaccuracies or misrepresentations. Finally, the Coherence Agent focuses on linguistic refinement, ensuring the narrative is grammatically correct, stylistically consistent, and easily understandable, thereby improving overall readability and clarity.
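The division of labor among the three agents, together with the iterative refinement loop described below, can be sketched in a few lines of Python. This is a minimal schematic, not the paper's implementation: the `Critique` structure, the callable agents, and the round budget are illustrative assumptions standing in for LLM-backed components.

```python
from dataclasses import dataclass

@dataclass
class Critique:
    """Hypothetical critic output: a pass/fail verdict plus feedback text."""
    passed: bool
    feedback: str

def refine_narrative(narrator, faithful_evaluator, coherence_agent,
                     shap_input, max_rounds=3):
    """Draft a narrative, then iterate: both critics review each draft,
    and their combined feedback drives the next revision until both
    pass or the round budget (the stopping criterion) is exhausted."""
    narrative = narrator(shap_input, feedback=None)  # initial draft
    for _ in range(max_rounds):
        faith = faithful_evaluator(narrative, shap_input)
        coherence = coherence_agent(narrative)
        if faith.passed and coherence.passed:
            break  # stopping criterion: both critics are satisfied
        combined = "\n".join([faith.feedback, coherence.feedback])
        narrative = narrator(shap_input, feedback=combined)
    return narrative
```

Injecting the agents as plain callables keeps the loop agnostic to which LLM backs each role, which mirrors the modularity argument made for the multi-agent design.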

Distributing narrative generation tasks across specialized agents – a Narrator, Faithful Evaluator, and Coherence Agent – offers significant advantages over monolithic approaches. Monolithic systems, handling all aspects of narrative creation within a single module, often struggle with complexity and limited scalability as narrative requirements evolve. In contrast, the multi-agent system facilitates modularity; each agent focuses on a specific sub-problem, enabling independent development, testing, and refinement. This decomposition improves robustness by isolating failures and reducing the impact of errors. Furthermore, the system’s architecture inherently supports scalability; individual agents can be replicated or enhanced without requiring modification of the entire system, allowing for increased throughput and the handling of more complex explanatory scenarios.

An iterative process refines narratives by combining faithfulness critiques from the Faithful Evaluator and coherence feedback from the Coherence Agent, allowing the Narrator to generate increasingly improved text until a defined stopping criterion is met.

Validating the Logic: Metrics for Faithful XAI Narratives

The Faithful Evaluator utilizes a suite of automatic metrics to assess the fidelity of generated XAI narratives relative to the source SHAP values. Rank Accuracy measures the correct ordering of feature importance as presented in the narrative compared to the SHAP values. Sign Accuracy determines the percentage of features for which the narrative correctly identifies the direction of influence (positive or negative) as indicated by the SHAP values. Finally, Value Accuracy quantifies how closely the magnitude of feature importance described in the narrative aligns with the absolute values derived from the SHAP calculations; these metrics provide quantifiable measures of faithfulness beyond qualitative assessment.
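The three metrics can be made concrete with a short sketch. The exact definitions in the paper may differ; the functions below simply follow the descriptions above, treating each metric as a fraction of features the narrative gets right against the source SHAP values.

```python
import numpy as np

def rank_accuracy(narrative_order, shap_values):
    """Fraction of features the narrative places at the correct rank,
    where the true ranking orders features by |SHAP value|, descending."""
    true_order = sorted(shap_values, key=lambda f: -abs(shap_values[f]))
    return np.mean([a == b for a, b in zip(narrative_order, true_order)])

def sign_accuracy(narrative_signs, shap_values):
    """Fraction of features whose stated direction of influence (+1/-1)
    matches the sign of the corresponding SHAP value."""
    return np.mean([np.sign(shap_values[f]) == s
                    for f, s in narrative_signs.items()])

def value_accuracy(reported_values, source_values):
    """Fraction of values the narrative reports exactly as they appear
    in the source data (e.g. catching 'goout' reported as 5 instead of 4)."""
    return np.mean([source_values[f] == v
                    for f, v in reported_values.items()])
```

A usage example on the Student-dataset features mentioned earlier: with SHAP values `{"goout": -0.42, "failures": -0.30, "Walc": -0.11}`, a narrative ordering the features as `["failures", "goout", "Walc"]` scores a rank accuracy of 1/3, since only `Walc` sits at its correct rank.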

Initial evaluation of the generated XAI narratives, conducted prior to refinement, yielded accuracy scores between 0.900 and 0.958 when assessed against the corresponding SHAP values using Automatic Evaluation metrics – specifically Rank Accuracy, Sign Accuracy, and Value Accuracy. These scores, representing the degree of alignment between the narrative and the feature importance data, establish a strong baseline performance and indicate the feasibility of further improvement through iterative refinement of the narrative generation process. The range suggests consistent performance across the chosen evaluation metrics, providing a reliable foundation for measuring subsequent gains achieved through optimization of the LLM and agentic system designs.

In specific configurations, an ensemble method achieved 100% faithfulness as measured by alignment between generated XAI narratives and the underlying SHAP values. This result was obtained using a critic-rule design implemented with the DeepSeek-V3.2-Exp language model. The ensemble approach likely combines multiple evaluation perspectives or narrative generation strategies to rigorously validate the fidelity of explanations, effectively eliminating discrepancies between the narrative and the model’s feature importance attribution.

Following initial evaluations, Round-2 testing demonstrated significant gains in the faithfulness and accuracy of generated XAI narratives. Across a range of Large Language Models (LLMs) and agentic system designs, accuracy metrics reached up to 0.999, as measured by the Faithful Evaluator’s Automatic Evaluation suite. This improvement indicates the efficacy of the implemented refinement strategies in aligning narrative explanations with the underlying SHAP values and confirms the potential for generating highly faithful explanations through iterative development and optimization of the XAI pipeline.

Extraction error represents a significant source of inaccuracy in XAI narrative generation, originating from limitations in the information retrieval process. When the system fails to accurately identify and retrieve the relevant data points needed to justify a model’s prediction, the resulting narrative will inherently be flawed. This error isn’t a deficiency in the narrative generation itself, but rather a problem with the foundational data supplied to the generator. Mitigation strategies must therefore focus on improving the robustness and precision of the information retrieval component, potentially through techniques like query refinement, expanded data source coverage, or the implementation of confidence scoring to filter unreliable data before it’s used to construct the explanation.
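The confidence-scoring mitigation mentioned above can be sketched as a simple gate between extraction and narrative construction. The claim structure and threshold here are assumptions for illustration, not part of the paper's pipeline.

```python
def filter_extractions(claims, threshold=0.8):
    """Split extracted claims (each a dict carrying an extraction
    confidence score) into those reliable enough to feed the narrative
    generator and those dropped as likely extraction errors."""
    kept = [c for c in claims if c["confidence"] >= threshold]
    dropped = [c for c in claims if c["confidence"] < threshold]
    return kept, dropped
```

Filtering before generation confines the failure mode: a dropped claim yields an incomplete but honest narrative, whereas an unreliable claim passed through yields a confidently wrong one.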

The Faithful Evaluator successfully extracts all mistakes from the SHAP input table (highlighted in red) and provides directional feedback to the Narrator for revision, as demonstrated by the output of the Faithful Critic.

Beyond Automation: Human-Centered Validation of AI Narratives

Generated narratives undergo rigorous scrutiny beyond automated metrics through the implementation of Human-Centered Evaluation techniques. These methods, including detailed user surveys and meticulous manual inspection, directly assess whether explanations are not only technically correct, but also genuinely understandable and persuasive for intended audiences. This approach prioritizes practical usefulness, moving beyond simple accuracy to determine if the narratives effectively convey information and facilitate informed decision-making. By focusing on the human experience of interpreting these generated texts, researchers gain crucial insights into areas requiring refinement and ensure the final output resonates with, and is actionable for, real-world users.

Complementing automated metrics, a human-centered evaluation rigorously assesses whether generated narratives truly resonate with intended audiences. This process moves beyond simply detecting grammatical correctness or factual accuracy to determine if explanations are genuinely understandable and, crucially, actionable. By directly soliciting feedback from representative users, researchers can pinpoint specific areas where narratives falter – perhaps due to jargon, insufficient context, or illogical sequencing. Such insights facilitate targeted refinements, ensuring explanations not only convey information but also empower recipients to effectively apply that knowledge, ultimately bridging the gap between data and informed decision-making.

To streamline the often lengthy process of human evaluation, researchers are increasingly leveraging Large Language Models (LLMs) as automated initial assessors of narrative quality. This “LLM-as-a-Judge” approach doesn’t replace human judgment entirely, but instead functions as a powerful filter, rapidly identifying narratives that meet baseline standards for coherence, relevance, and clarity. By pre-screening generated content, LLM-as-a-Judge significantly reduces the workload for human evaluators, allowing them to concentrate on more nuanced assessments and complex cases. This acceleration not only improves efficiency but also facilitates more iterative development cycles, enabling researchers to refine narrative generation models with greater speed and precision.
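The routing logic of such a pre-screening filter is straightforward to sketch. In the snippet below, `judge` is an injected callable standing in for an actual LLM call that rates a narrative on a rubric; the rubric, scale, and threshold are illustrative assumptions, chosen so the filtering step can be shown without committing to any particular LLM API.

```python
def prescreen(narratives, judge, min_score=4):
    """LLM-as-a-Judge triage: narratives whose judge score clears the
    threshold proceed to human review; the rest are routed back for
    regeneration, sparing human evaluators the weakest drafts."""
    to_humans, to_regenerate = [], []
    for narrative in narratives:
        score = judge(narrative)  # e.g. an integer rating on a 1-5 rubric
        if score >= min_score:
            to_humans.append(narrative)
        else:
            to_regenerate.append(narrative)
    return to_humans, to_regenerate
```

Because the judge is a plain callable, the same triage harness works whether the scorer is a frontier LLM, a cheaper model, or a heuristic baseline used during development.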

A detailed qualitative analysis of generated narratives pinpointed specific coherence issues that hindered comprehension, revealing patterns beyond what automated metrics could detect. This in-depth review focused on instances of illogical transitions, abrupt topic shifts, and insufficient contextual linking between sentences, ultimately providing actionable insights for refinement. By addressing these identified weaknesses – such as strengthening pronoun references and explicitly stating causal relationships – researchers were able to significantly improve the narrative flow and clarity, ensuring explanations were not only factually correct but also easily understood by the intended audience. The targeted approach, guided by qualitative feedback, demonstrated that enhancing coherence is crucial for effective communication and building user trust in AI-generated content.

The Faithful Evaluator processes narratives to extract feature information (rank, sign, and value) and identifies errors, which are highlighted for reporting.

The pursuit of coherent and faithful explanations, as demonstrated in this research, echoes a fundamental principle of systems design. The work highlights how an agentic approach, allowing for iterative refinement and negotiation between components, can yield more robust and understandable narratives from complex machine learning models. This mirrors the idea that infrastructure should evolve without rebuilding the entire block. As Ken Thompson aptly stated, “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” The research implicitly acknowledges this; a system built on negotiation and incremental improvement offers inherent debuggability and adaptability, allowing explanations to be refined without requiring a complete overhaul of the underlying model or narrative generation process.

Where Do We Go From Here?

The pursuit of explainable AI often feels like an exercise in applied rhetoric. This work, by framing explanation as a negotiation between agents, subtly highlights the fundamental question: what are systems actually optimizing for? Faithfulness to the model’s internal logic is crucial, certainly, but it is insufficient. Coherence, as demonstrated, demands a narrative structure, a telling of a story. Yet, the ultimate metric remains elusive – explanation for whom, and toward what end? A truly agentic system must model not just the machine learning model, but the cognitive state of the explainer’s audience.

Future iterations should resist the temptation to equate complexity with insight. Simplicity is not minimalism; it is the discipline of distinguishing the essential from the accidental. Current methods still largely rely on post-hoc rationalization, applying narratives after a prediction. A more elegant approach would integrate explainability directly into the model’s learning process, building inherently interpretable structures from the outset. This requires a shift in perspective: from explaining what a model did, to understanding why it learned to do it.

Ultimately, the success of XAI will not be measured by the fluency of generated narratives, nor even by metrics of faithfulness and coherence. It will be determined by whether these explanations genuinely facilitate better decision-making, fostering trust and accountability in increasingly complex systems. The challenge, then, lies not in generating more explanations, but in crafting explanations that matter.


Original article: https://arxiv.org/pdf/2603.20003.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-03-23 20:38