Author: Denis Avetisyan
A new analysis challenges the notion of ‘interpretable’ AI as uncovering inherent model logic, instead framing it as a dynamic process shaped by interactions between humans and machines.
This paper applies Karen Barad’s agential realism to explore Explainable AI (XAI) as an emergent material-discursive practice co-constituted through entanglement.
Conventional approaches to Explainable AI (XAI) often presume interpretability as a process of revealing pre-existing structures within a model, yet this relies on unexamined ontological assumptions. In ‘Emergent, not Immanent: A Baradian Reading of Explainable AI’, we challenge this view by applying Karen Barad’s agential realism to recast XAI not as disclosure, but as a material-discursive performance emerging from the entangled relations between AI systems, human interpreters, and their contextual apparatus. This framework understands interpretation as co-constituted by these interactions, rather than residing inherently within the model itself. How might embracing this emergent perspective reshape the design of XAI interfaces and foster more ethically grounded, responsive AI systems?
The Fragility of Representation: A Foundation in Crisis
Much of traditional artificial intelligence operates under the principles of Representationalism, a philosophical stance asserting a distinct boundary between a model and the reality it attempts to capture. This approach assumes knowledge can be neatly encoded into symbolic structures, effectively creating a static ‘map’ of the world. However, this separation often results in systems that are surprisingly fragile; slight deviations from expected inputs, or encounters with genuinely novel situations, can lead to dramatic failures. The resulting models, while potentially accurate within limited domains, lack the adaptability and robustness of natural intelligence, proving opaque in their reasoning and difficult to generalize beyond carefully curated datasets. This brittleness stems from the inherent difficulty of perfectly representing a complex and dynamic world with static symbols, ultimately hindering the development of truly intelligent systems.
Conventional artificial intelligence frequently falters when faced with the nuances of real-world data because it struggles to handle ambiguity and context-dependent meaning. A system reliant on precise representation requires clearly defined inputs, yet much of human understanding – and the data humans generate – is inherently imprecise and dependent on surrounding circumstances. This limitation impacts the ability of these systems to generalize beyond their training data; a model trained on a specific dataset may perform poorly when confronted with variations or novel situations it hasn’t explicitly encountered. The inability to effectively incorporate context leads to brittle performance, where even slight alterations in input can result in unpredictable and unreliable outputs, hindering the development of truly intelligent and adaptable machines.
The growing field of Explainable AI, or XAI, serves as a critical indicator of the limitations inherent in traditional, representational artificial intelligence models. While these systems excel at pattern recognition, their ‘black box’ nature often obscures how decisions are reached, hindering trust and reliable application, particularly in high-stakes domains. The demand for explanations isn’t merely about transparency; it reveals that purely representational approaches struggle to encapsulate the nuanced, contextual factors that drive complex cognition. Simply mapping inputs to outputs proves insufficient when faced with ambiguity or novelty, demonstrating that a faithful account of decision-making requires more than just a static representation of knowledge; it necessitates an understanding of the dynamic processes by which that knowledge is applied and refined.
Current artificial intelligence paradigms frequently prioritize the representation of knowledge – essentially, creating static models of the world – but this approach increasingly reveals its limitations. This paper proposes a fundamental shift in perspective, advocating for research focused on how knowledge actually emerges through interaction and process, rather than simply being encoded as discrete data. The authors introduce a novel onto-epistemological framework designed to model this dynamic process, moving beyond the confines of traditional representationalism. This framework doesn’t seek to perfectly mirror reality, but to understand the conditions under which reliable and adaptable knowledge arises within a system, potentially unlocking more robust and genuinely intelligent AI capable of navigating ambiguity and generalizing beyond pre-defined parameters. The implications extend beyond technical improvements, suggesting a re-evaluation of the very foundations of knowledge itself within artificial systems.
Beyond Static Maps: Embracing Relationality
Agential Realism challenges Representationalism’s foundational assumption that knowledge is formed through the accurate mirroring of pre-existing entities. Instead, it proposes that knowledge emerges from ongoing, dynamic interactions between multiple agencies – encompassing both human and non-human elements. This perspective fundamentally reframes the process of ‘knowing’ as a performative practice, where knowledge isn’t found within a subject or accurately represented from an object, but rather arises through the specific configurations and material-discursive entanglements of interacting components. Consequently, emphasis shifts from identifying internal states or external realities to analyzing the relational processes through which distinctions and patterns are continually enacted and modified.
Within Agential Realism, diffraction functions as an epistemic approach that prioritizes the processes of knowing rather than focusing solely on known entities. This means knowledge isn’t discovered by accurately reflecting a pre-existing reality, but rather emerges through patterns created by the interaction and ‘interference’ of different observational perspectives and material conditions. Specifically, diffraction examines how these interactions generate distinguishable patterns – differences – that are crucial to knowledge production. It moves away from seeking a singular, objective truth and instead acknowledges that knowing is always a relational activity, shaped by the specific configuration of the observing systems and the phenomena being observed. Therefore, analyzing these patterns of difference, and how they arise from relational dynamics, becomes central to understanding how knowledge is formed.
Within the framework of Agential Realism, Explainable AI (XAI) shifts its focus from identifying and revealing pre-existing internal model representations to analyzing the dynamic relational patterns that produce explanations. This approach posits that an AI’s ‘explanation’ isn’t a depiction of an internal state, but rather a material-discursive performance arising from the entanglement of various agencies – the AI system, the data, the observer, and the environment. Consequently, interpretability is not about ‘seeing inside’ the model, but about tracing these patterns of interaction and difference to understand how knowledge is co-constituted through these relations, emphasizing the process of ‘knowing’ over the ‘known’.
Current approaches to Explainable AI (XAI) often prioritize the disclosure of pre-existing internal model structures, assuming a fixed, knowable representation to be revealed. This work proposes a departure from this view, positing that interpretability is not a property of the model itself, but rather an emergent phenomenon. It is a material-discursive performance actively co-constituted through the entangled interactions of various agencies – including the model, the data, the explainer, and the observer. This perspective frames interpretability not as a static unveiling, but as a dynamic, relational process where meaning arises from the specific configuration and interaction of these elements, rather than from the discovery of an inherent, fixed structure.
Feature Attribution: Tracing Relational Pathways
Feature attribution methods are designed to decompose a model’s prediction and assign importance scores to each input feature. Techniques such as Saliency Maps visualize which pixels in an image most strongly influence the model’s classification decision. SHAP (SHapley Additive exPlanations) values utilize concepts from game theory to quantify each feature’s contribution to the prediction, considering all possible feature combinations. LIME (Local Interpretable Model-agnostic Explanations) approximates the model locally with a simpler, interpretable model to estimate feature importance around a specific prediction. These methods aim to provide insights into the model’s decision-making process by identifying the features that have the most significant impact on the output.
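To make the game-theoretic idea concrete, the following is a minimal sketch of a Monte Carlo, Shapley-style attribution for a generic scoring function. The function names and the toy linear model are illustrative assumptions, not the exact algorithm implemented by the SHAP library.

```python
# Minimal Monte Carlo approximation of Shapley-style feature attributions.
# `model` is any callable mapping a 1-D feature vector to a scalar score;
# the sampling scheme is a generic sketch, not the SHAP library's algorithm.
import numpy as np

def shapley_attribution(model, x, background, n_samples=200, seed=0):
    """Estimate each feature's contribution to model(x) relative to a background input."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    contributions = np.zeros(n_features)
    for _ in range(n_samples):
        order = rng.permutation(n_features)          # random order of "joining the coalition"
        z = background.copy()                        # start from the reference input
        prev_score = model(z)
        for j in order:                              # switch features on one at a time
            z[j] = x[j]
            score = model(z)
            contributions[j] += score - prev_score   # marginal contribution of feature j
            prev_score = score
    return contributions / n_samples

# Toy check: for a linear model the attributions recover the weights exactly.
weights = np.array([2.0, -1.0, 0.5])
linear_model = lambda v: float(weights @ v)
print(shapley_attribution(linear_model, np.ones(3), np.zeros(3)))  # ~[ 2.  -1.   0.5]
```

For non-linear models the estimate only converges as the number of sampled orderings grows, which is one reason practical SHAP implementations rely on model-specific shortcuts.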
Grad-CAM (Gradient-weighted Class Activation Mapping) generates saliency maps by utilizing the gradients of the target concept flowing into the final convolutional layer of a convolutional neural network. Specifically, it calculates the gradient of the score for a specific class with respect to the feature maps of the final convolutional layer. These gradients are then globally average-pooled to obtain the importance weights for each feature map. The weighted combination of these feature maps then forms a coarse localization map, highlighting the regions in the input image that contribute most strongly to the predicted class. This method does not require access to the model’s training data or architectural modifications, making it broadly applicable for visualizing learned representations.
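For readers who prefer code to prose, a minimal PyTorch sketch of those steps might look as follows, assuming a torchvision ResNet-18 whose final convolutional block serves as the target layer; the variable names and the random stand-in image are illustrative assumptions.

```python
# Minimal Grad-CAM sketch: capture the final conv block's feature maps, take the
# gradient of the class score with respect to them, pool the gradients into
# per-channel weights, and form a weighted, ReLU'd localization map.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
activations = {}
model.layer4.register_forward_hook(lambda m, i, o: activations.update(a=o))  # final conv block

image = torch.randn(1, 3, 224, 224)              # stand-in for a preprocessed input image
scores = model(image)
class_idx = scores.argmax(dim=1).item()

# Gradient of the chosen class score with respect to the final conv feature maps.
grads = torch.autograd.grad(scores[0, class_idx], activations["a"])[0]

weights = grads.mean(dim=(2, 3), keepdim=True)                     # global-average-pooled gradients
cam = F.relu((weights * activations["a"].detach()).sum(dim=1))     # weighted sum of feature maps
cam = F.interpolate(cam.unsqueeze(1), size=image.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)           # normalized [0, 1] heatmap
```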
Feature attribution methods move beyond simple feature importance ranking by attempting to map the internal logic of a model. Rather than identifying features that correlate with predictions, these techniques aim to reveal how specific input features contribute to the final output through a series of internal computations. This involves analyzing the flow of information as it passes through layers of the model, identifying which neurons and connections are most activated by particular inputs, and subsequently determining the contribution of each feature to those activations. Consequently, feature attribution provides insights into the relational dependencies learned by the model, effectively tracing the pathways through which input features influence the prediction process, and offering a more nuanced understanding of the model’s decision-making rationale.
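One simple way to approximate this kind of pathway tracing is to capture intermediate activations with forward hooks and ask which units a given input excites most strongly. The sketch below does this for a torchvision ResNet-18; the layer names are assumptions chosen purely for illustration.

```python
# Illustrative sketch: trace which channels in each intermediate layer respond
# most strongly to one input, using forward hooks to capture activations.
import torch
from torchvision import models

model = models.resnet18(weights=None).eval()
captured = {}

def capture(name):
    # Store each layer's output so per-channel activation strength can be inspected.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(model, name).register_forward_hook(capture(name))

x = torch.randn(1, 3, 224, 224)          # stand-in input
with torch.no_grad():
    model(x)

for name, act in captured.items():
    per_channel = act.mean(dim=(0, 2, 3))            # crude per-channel activation strength
    top = per_channel.topk(3).indices.tolist()
    print(f"{name}: most active channels {top}")
```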
The reliability of feature attribution methods is fundamentally linked to the accuracy of assumptions made about the model’s internal mechanisms. These methods operate under specific premises – for example, that feature importance can be approximated by gradient magnitudes, or that local linear approximations accurately reflect model behavior. When these assumptions are violated, attribution results may be misleading or fail to accurately represent the true influence of features. Furthermore, the interaction between features – where the effect of one feature depends on the value of another – is often not fully captured by methods that assess feature importance in isolation. Therefore, a thorough understanding of the model’s architecture, training data, and potential feature dependencies is crucial for interpreting attribution results and ensuring their validity.
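A toy example makes the interaction caveat tangible: in an XOR-style task each feature is useless on its own, yet the pair is fully predictive, so any attribution that evaluates features in isolation will understate their joint importance. The scikit-learn setup below is a hypothetical illustration, not drawn from the paper.

```python
# XOR-style target: neither feature predicts y alone, but together they do,
# so per-feature evaluation in isolation misses the interaction entirely.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 2)).astype(float)
y = np.logical_xor(X[:, 0] == 1, X[:, 1] == 1).astype(int)     # pure interaction

joint = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("both features:", round(joint.score(X, y), 2))            # ~1.0

for j in range(2):
    solo = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[:, [j]], y)
    print(f"feature {j} alone:", round(solo.score(X[:, [j]], y), 2))  # ~0.5
```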
Interactive Exploration: Toward Ethical Responsibility
Traditional Explainable AI (XAI) often presents static explanations, offering a snapshot of a model’s reasoning. However, Interactive XAI fundamentally shifts this paradigm by enabling users to directly engage with the explanatory process. Instead of passively receiving an explanation, individuals can actively manipulate inputs, adjust parameters, and observe the resulting changes in both the model’s predictions and its corresponding explanation. This dynamic interplay fosters a more intuitive and nuanced understanding of the model’s behavior, moving beyond simply knowing what the model decided to understanding why it decided it, and how sensitive that decision is to different factors. By allowing users to ‘play’ with the explanation, Interactive XAI transforms the process from one of interpretation to one of exploration and discovery, ultimately building greater trust and facilitating more informed decision-making.
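In its simplest form, this kind of ‘what-if’ interaction amounts to perturbing an input, recomputing both the prediction and its attribution, and comparing the two views. The sketch below illustrates the loop with a toy linear model whose attribution rule is exact; every name in it is an illustrative assumption.

```python
# Sketch of an interactive "what-if" loop: the user nudges one input and
# immediately sees how the prediction and a simple attribution change.
import numpy as np

weights = np.array([0.8, -0.5, 0.3])                  # toy linear "model"
predict = lambda v: float(weights @ v)
baseline = np.zeros(3)
attribute = lambda v: weights * (v - baseline)        # exact attribution for a linear model

x = np.array([1.0, 2.0, 0.5])
print("prediction:", predict(x), "attribution:", attribute(x))

# What-if interaction: double feature 1 and observe the shift in both the
# output and the per-feature explanation.
x_whatif = x.copy()
x_whatif[1] *= 2
print("prediction:", predict(x_whatif), "attribution:", attribute(x_whatif))
```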
The emergence of Interactive Explainable AI (XAI) extends beyond simply interpreting model decisions; it actively invites users into a collaborative creative process, as exemplified by techniques like text-to-music generation. These systems don’t merely produce music based on textual prompts; they allow users to iteratively refine those prompts, explore the resulting sonic landscapes, and directly influence the AI’s compositional choices. This isn’t passive consumption, but a dynamic partnership where human creativity and artificial intelligence converge. By exposing the underlying logic of the generative model – what musical features are associated with specific textual descriptions – users gain a deeper understanding of both the AI and their own creative preferences, fostering a novel form of co-creation and pushing the boundaries of artistic expression. The technology demonstrates XAI’s potential to democratize creative tools and empower individuals to realize musical ideas in previously unimaginable ways.
Interactive Explainable AI interfaces are increasingly leveraging the principles of Seamful Design, a strategy that deliberately highlights the boundaries and imperfections within a system rather than concealing them. This approach moves away from the traditional goal of seamless user experience, instead embracing moments of visible ‘friction’ – gaps in explanation, areas of uncertainty, or even intentional discontinuities – to actively encourage users to probe deeper and formulate their own understanding. By making the AI’s reasoning process less opaque and more visibly constructed, these interfaces promote critical engagement and facilitate a more nuanced appreciation of the model’s capabilities and limitations, ultimately fostering trust through transparency rather than illusion.
The advancement of Explainable AI (XAI) necessitates a concurrent and robust commitment to ethical responsibility. While XAI aims to demystify complex algorithms, its deployment carries potential societal implications that demand careful consideration. Algorithmic transparency, though valuable, does not inherently guarantee fairness or prevent unintended biases from being perpetuated or even amplified. Developers and deployers must proactively address potential harms, including discriminatory outcomes, privacy violations, and the erosion of trust in automated systems. A truly ethical approach requires ongoing monitoring, rigorous evaluation for unintended consequences, and a commitment to accountability – ensuring that XAI serves to empower, rather than disadvantage, individuals and communities. Ignoring these crucial considerations risks undermining the benefits of XAI and exacerbating existing societal inequalities.
Toward Mechanistic Understanding: Reflective Practice
Mechanistic Interpretability constitutes an ambitious, sustained research effort dedicated to reverse-engineering the functional components within neural networks. Rather than treating these networks as opaque ‘black boxes’, this program seeks to uncover the explicit algorithms and data structures they implicitly implement. This involves not simply identifying what a network does, but meticulously detailing how it accomplishes its tasks – pinpointing specific neurons or circuits responsible for particular computations. The ultimate goal is to move beyond correlational understanding towards a causal model of network behavior, enabling researchers to predict responses to novel inputs, diagnose failure modes, and ultimately, build more reliable and trustworthy artificial intelligence systems. This pursuit necessitates developing new techniques for visualizing, analyzing, and manipulating internal network representations, and requires a commitment to long-term investigation given the complexity of modern neural architectures.
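A rudimentary version of this component-level probing is the ablation experiment: silence one hidden unit, re-run the forward pass, and measure how the output shifts. The sketch below applies this to a small multilayer perceptron; the architecture and unit indices are illustrative assumptions rather than any published circuit analysis.

```python
# Minimal ablation probe: zero out one hidden unit at a time and measure how
# much the network's output changes, as a crude attribution of computation.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
x = torch.randn(1, 4)

with torch.no_grad():
    baseline = model(x)

def ablate_unit(unit):
    # Forward hook that silences a single hidden unit during the forward pass.
    def hook(module, inputs, output):
        output[:, unit] = 0.0
        return output
    return hook

for unit in range(8):
    handle = model[1].register_forward_hook(ablate_unit(unit))   # hook after the ReLU
    with torch.no_grad():
        ablated = model(x)
    handle.remove()
    effect = (baseline - ablated).abs().sum().item()
    print(f"unit {unit}: output shift {effect:.4f}")
```

Units whose ablation barely moves the output are, by this crude measure, doing little work for this input; scaling such causal interventions to real architectures is the hard part of the research program described above.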
The pursuit of explainable artificial intelligence (XAI) benefits from considering distinct epistemological viewpoints, specifically those of refraction and reflection. A ‘refractive’ epistemology posits that XAI tools essentially make visible structures already present within a neural network – illuminating the inherent logic without fundamentally altering it. Conversely, a ‘reflective’ epistemology acknowledges that any explanation is, by necessity, an approximation – a constructed representation rather than a perfect mirror of the network’s inner workings. Neither perspective is wholly sufficient on its own; a robust understanding of interpretability requires appreciating that XAI tools both reveal and, simultaneously, reshape what they attempt to explain, offering a valuable, if imperfect, window into complex algorithmic processes.
A critical perspective on explainable artificial intelligence (XAI) necessitates the adoption of a Reflective epistemological stance, as uncritical acceptance of XAI outputs can foster dangerous overconfidence. Rather than viewing XAI tools as objective mirrors reflecting a network’s ‘true’ inner workings, this approach acknowledges that explanations are inherently approximations, constructed through specific methods and carrying the limitations of those methods. This isn’t a dismissal of XAI’s utility, but a call for nuanced interpretation; explanations should be seen as models of the model, useful for understanding general trends but potentially misleading when treated as definitive representations of the underlying algorithm. Embracing this reflective posture encourages cautious evaluation, rigorous testing of explanations, and a persistent awareness that any interpretation remains provisional and subject to revision as understanding evolves.
This research introduces a novel onto-epistemological framework grounded in agential realism, fundamentally reshaping the understanding of neural network interpretability. Traditional approaches often presume a pre-existing ‘true’ algorithm within the network, which XAI tools attempt to reveal; however, this work posits that the network’s ‘algorithm’ isn’t a fixed entity prior to interaction, but rather emerges through its dynamic interactions with data and the explanatory tools themselves. By embracing agential realism, the study moves beyond seeking a faithful representation of an internal structure and instead focuses on the network’s capacity to enact patterns and respond to inquiries. This shift acknowledges that interpretability isn’t about discovering a hidden truth, but about understanding the network as an active agent whose behavior is constituted through its ongoing interactions – a perspective with profound implications for how XAI tools are developed and evaluated.
The pursuit of Explainable AI, as detailed in the article, echoes a fundamental principle of mathematical rigor. The paper posits interpretability not as discovery, but as emergence: a performance arising from the entanglement of actors. This aligns with the idea that a robust solution isn’t merely ‘working’ on presented data, but demonstrable under all conditions. As Ken Thompson famously stated, “The best things that happen to a program are when you don’t write them.” This sentiment isn’t about laziness; it’s about achieving elegance through minimal, provable components. The article’s focus on material-discursive practices suggests that unnecessary complexity obfuscates the underlying logic, much like poorly written code. A truly interpretable model, akin to an elegant algorithm, reveals its structure not through post-hoc explanations, but through inherent, verifiable properties: what remains invariant as ‘N’ approaches infinity.
What’s Next?
The insistence on emergent, rather than immanent, interpretability shifts the burden of proof. It is no longer sufficient to demonstrate a model’s internal consistency – a neatness, if you will – but to account for the very conditions of its legibility. This necessitates a move beyond feature importance scores and saliency maps, admirable as attempts at visualization may be. If interpretability is performance, then the apparatus of evaluation must expand to encompass the full material-discursive assemblage: the data, the algorithms, the human interpreters, and the very metrics used to judge ‘explanation’ itself.
A persistent challenge remains: how to rigorously define and measure entanglement within such assemblages. The paper rightly points toward diffraction as a methodological tool, but diffraction reveals difference, not necessarily understanding. One suspects that much of the current pursuit of XAI resembles a particularly elaborate form of pattern recognition – identifying features that seem to align with human reasoning, rather than establishing genuine ontological congruence. If it feels like magic, the invariant remains hidden.
Future work must therefore prioritize the development of formal languages for describing these entangled practices. A provable account of interpretability – one grounded in mathematical principles rather than empirical observation – remains the ultimate goal. The question is not simply whether a model appears explainable, but whether its legibility can be demonstrated as an inherent property of its material-discursive constitution.
Original article: https://arxiv.org/pdf/2601.15029.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/