Beyond the Algorithm: Auditing Decisions with Human Self-Description

Author: Denis Avetisyan


A new approach measures the alignment between how algorithms represent individuals and how those individuals represent themselves, offering a crucial check on algorithmic fairness.

Representation Fidelity provides a method for auditing algorithmic decision-making by quantifying the distance between externally-defined representations and individual self-descriptions.

Algorithmic decision-making increasingly impacts human lives, yet validating the grounds upon which these decisions rest remains a critical challenge. This paper, ‘Representation Fidelity: Auditing Algorithmic Decisions About Humans Using Self-Descriptions’, introduces a novel approach, Representation Fidelity, to audit such systems by quantifying the distance between externally-defined representations and individuals’ self-descriptions. We demonstrate that discrepancies between these representations reveal the degree to which algorithmic assessments are grounded in reasonable, individual-specific characteristics, and introduce the Loan-Granting Self-Representations Corpus 2025 for benchmarking. Could wider adoption of Representation Fidelity metrics foster more transparent and equitable algorithmic systems?


The Illusion of Representation: Beyond Prediction to Understanding

Contemporary algorithmic decision systems routinely construct intricate representations of individuals to facilitate predictions and automate judgments. However, the very process of translating human characteristics into quantifiable data can introduce opacity and, critically, amplify existing societal biases. These representations, often derived from numerous data points and processed through complex machine learning models, frequently lack transparency – the mechanisms transforming raw data into these profiles remain obscured. This lack of interpretability isn’t merely a technical challenge; it raises serious concerns about fairness, as biases embedded within the training data or the algorithms themselves can be unwittingly encoded into these representations, leading to discriminatory outcomes. Consequently, individuals may be unfairly categorized or denied opportunities based not on their actual merits, but on distorted or prejudiced digital profiles.

Current validation practices for algorithmic systems predominantly prioritize predictive accuracy – how well an AI forecasts an outcome – but often neglect a far more fundamental concern: representational fidelity. This means that while a system might successfully predict a characteristic, the internal representation it uses to do so may not actually reflect the nuanced reality of the individual being assessed. A model could, for example, accurately predict credit risk without understanding the underlying financial circumstances, relying instead on proxy variables that encode societal biases. Consequently, even highly accurate systems can perpetuate unfair or discriminatory outcomes if the representations they construct are distorted, incomplete, or fail to capture the full complexity of the individuals they are intended to serve. Ensuring representational fidelity, therefore, is not merely a technical refinement but a critical step towards building truly ethical and trustworthy AI.

The limitations in validating AI representations extend beyond mere inaccuracies; they strike at the core of responsible AI development. A system’s inability to accurately reflect the individuals it assesses introduces the potential for systemic unfairness, where decisions are based not on genuine characteristics, but on distorted or incomplete data proxies. This lack of fidelity directly impedes explainability, as justifications for algorithmic outcomes become obscured by the opaque transformation from individual attributes to numerical representations. Consequently, the ethical deployment of these systems is jeopardized, demanding a shift in focus towards representational faithfulness – ensuring AI not only predicts accurately, but also portrays individuals with integrity and nuance before rendering any decision that impacts their lives.

Eliciting the Self: Anchoring Algorithms in Narrative

Large Language Models (LLMs) are utilized to elicit “Self-Descriptions” directly from individuals, functioning as a method for capturing personally-defined characteristics and contextual circumstances. This process involves prompting individuals to generate free-text narratives regarding their own attributes, experiences, and perspectives. The resulting text is then processed and stored as a representation of the individual’s self-perception. Unlike traditional methods relying on pre-defined categories or structured questionnaires, LLM-generated Self-Descriptions offer a flexible and open-ended format, allowing for the expression of complex and nuanced self-understandings. The LLM acts as an interface, translating individual input into a textual form suitable for analysis and comparison with algorithmic representations.
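The elicitation step can be sketched as a simple prompt template. The paper does not publish its exact prompt, so the wording and the `build_self_description_prompt` helper below are illustrative assumptions, not the authors' instrument:

```python
def build_self_description_prompt(decision_context: str) -> str:
    """Assemble an open-ended prompt asking an individual for a free-text
    self-description. The wording here is illustrative only."""
    return (
        "You are assisting with an audit of an automated decision.\n"
        f"Decision context: {decision_context}\n"
        "In your own words, describe your circumstances, characteristics, "
        "and anything you consider relevant to this decision. "
        "There is no required format or length."
    )

prompt = build_self_description_prompt("loan application")
```

Keeping the prompt open-ended preserves the free-text character of Self-Descriptions; any structure is imposed later, at analysis time, rather than forced into pre-defined categories up front.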

Traditional methods of assessing representation quality rely heavily on structured data – quantifiable metrics like demographic statistics or labeled attributes. However, these approaches often fail to capture the complexities of individual self-perception and lived experience. Utilizing natural language generation allows for the evaluation of representation based on qualitative, nuanced descriptions of individuals themselves. This shifts the focus from algorithmic accuracy against predefined categories to how well a system can articulate an individual’s self-defined characteristics and circumstances, providing a more human-centric and comprehensive measure of representational fidelity. This qualitative assessment is not intended to replace structured data evaluations, but to supplement them with insights into the subjective experience of being represented.

Establishing a verifiable link between algorithmic representations and individuals relies on anchoring those representations in self-reported narratives. This process moves beyond reliance on proxy data or externally-defined attributes by directly incorporating an individual’s own description of their characteristics and circumstances. The generated ‘Self-Description’ acts as a reference point; algorithmic outputs can then be evaluated for consistency with this narrative, providing a measurable and auditable connection. Discrepancies between the algorithmic representation and the individual’s self-description can be identified and addressed, ensuring the representation accurately reflects the person it intends to model and allowing for iterative refinement of the underlying algorithms.

Measuring the Distance: A Fidelity Metric Rooted in Semantics

To quantify the alignment between algorithmic representations and self-perception, we utilize Word Mover’s Distance (WMD). WMD calculates the minimum cumulative distance required to ‘move’ the embeddings of words in one text to match the words in another, effectively measuring semantic dissimilarity. We employ pre-trained GloVe embeddings – specifically, 50-dimensional vectors – to represent each word in both the algorithmic input features and the generated Self-Descriptions. The resulting WMD score, therefore, represents the semantic distance between these two representations; lower scores indicate greater similarity, while higher scores denote a larger divergence in meaning. This approach allows for a numerical assessment of how closely the algorithmic representation captures the essence of an individual’s self-described characteristics.
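As a minimal sketch of the idea, the snippet below computes the relaxed WMD lower bound (each word's mass moves to its single nearest word in the other document) over toy 3-dimensional vectors. The paper's pipeline solves the full optimal-transport problem over 50-dimensional pre-trained GloVe embeddings, so the vocabulary and vectors here are stand-ins:

```python
import numpy as np

# Toy 3-d word vectors; the real pipeline uses 50-d pre-trained GloVe.
EMB = {
    "stable": np.array([1.0, 0.1, 0.0]),
    "steady": np.array([0.9, 0.2, 0.0]),
    "income": np.array([0.0, 1.0, 0.1]),
    "salary": np.array([0.1, 0.9, 0.1]),
    "debt":   np.array([0.0, 0.0, 1.0]),
}

def relaxed_wmd(doc_a, doc_b):
    """Relaxed Word Mover's Distance lower bound: each word's mass moves
    to its nearest word in the other document; the max over both
    directions bounds the true (optimal-transport) WMD from below."""
    def one_way(src, dst):
        # Uniform word weights; minimum Euclidean distance per word.
        return sum(
            min(np.linalg.norm(EMB[s] - EMB[d]) for d in dst) for s in src
        ) / len(src)
    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))

close = relaxed_wmd(["stable", "income"], ["steady", "salary"])
far = relaxed_wmd(["stable", "income"], ["debt"])
```

Identical documents score exactly zero, and near-paraphrases score lower than unrelated text, which is the property the fidelity metric relies on.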

Representation Mismatch, as quantified by our fidelity metric, assesses the divergence between algorithmic and self-reported representations of an individual. Specifically, the metric calculates the semantic distance (using Word Mover’s Distance on GloVe embeddings) between the input features used by the algorithm and the textual Self-Descriptions provided by the individual. A larger distance indicates a greater discrepancy, suggesting the algorithm’s internal representation does not adequately capture how the individual perceives and describes themselves. This quantifiable measure allows for systematic identification of instances where algorithmic representations deviate from self-perceptions, enabling analysis of potential biases or inaccuracies in the algorithmic model.
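One way to operationalize such an audit, sketched below under assumed feature names (the German Credit Dataset's actual attributes are coded differently), is to serialize the algorithm's input features into text and flag individuals whose distance to their self-description exceeds a threshold. The token-overlap distance here is a trivial stand-in for the embedding-based Word Mover's Distance:

```python
def serialize_features(features: dict) -> str:
    """Render the algorithm's input features as text so they can be
    compared against a free-text self-description."""
    return " ".join(f"{k.replace('_', ' ')} {v}" for k, v in features.items())

def flag_mismatches(records, distance_fn, threshold):
    """Return indices of individuals whose feature/self-description
    distance exceeds the audit threshold."""
    return [
        i for i, (feats, self_desc) in enumerate(records)
        if distance_fn(serialize_features(feats), self_desc) > threshold
    ]

def toy_distance(a, b):
    """Jaccard token distance: a placeholder for an embedding distance."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / len(ta | tb)

records = [
    ({"employment": "stable", "housing": "own"},
     "I have stable employment and own my home"),
    ({"employment": "stable"},
     "I recently changed careers and support two dependents"),
]
flagged = flag_mismatches(records, toy_distance, threshold=0.9)
```

The second individual is flagged: their self-description contains circumstances the serialized features never mention, which is exactly the kind of gap the fidelity metric is meant to surface.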

Analysis of the German Credit Dataset revealed a Pearson correlation coefficient of [latex]r = 0.5[/latex] between the Word Mover’s Distance – quantifying the semantic difference between algorithmic and self-reported representations – and the volume of additional, manually identified information present in the self-descriptions. This statistically significant, moderate positive correlation validates the feasibility of using embedding distance as a quantifiable metric for ‘Representation Mismatch’. Specifically, larger distances between embeddings corresponded to a tendency for individuals to provide more supplementary details in their self-descriptions, suggesting instances where the algorithmic representation failed to fully capture the individual’s self-perception, necessitating further elaboration.
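The validation step itself is a straightforward correlation check. The numbers below are invented purely to illustrate the procedure; only the reported figure of r = 0.5 on the German Credit Dataset comes from the paper:

```python
import numpy as np

# Invented illustration: per-individual WMD scores paired with the count
# of manually identified extra details in each self-description.
wmd_scores = np.array([0.8, 1.1, 1.5, 1.9, 2.4, 2.6])
extra_details = np.array([0, 1, 1, 2, 2, 3])

# Pearson correlation between embedding distance and supplementary detail.
r = np.corrcoef(wmd_scores, extra_details)[0, 1]
```

A positive r indicates that larger representation distances co-occur with more supplementary self-reported detail, which is the pattern the paper treats as evidence that the metric tracks genuine representational gaps.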

The Echo of Representation: Implications for Trustworthy AI

Algorithmic validation typically centers on predictive performance – how accurately a system forecasts outcomes. However, representation fidelity introduces a crucial, complementary layer of scrutiny. This approach assesses whether the internal representation of data within an algorithm – its understanding of individuals or concepts – aligns with the nuanced reality of what it represents. A system might achieve high predictive accuracy while still harboring distorted or incomplete understandings, potentially perpetuating harmful biases or overlooking critical details. By evaluating representation fidelity, researchers and developers can move beyond simply asking ‘does it work?’ to also understanding ‘what does the algorithm think is happening?’, fostering more robust, equitable, and trustworthy artificial intelligence systems. This shift acknowledges that a system’s internal worldview fundamentally shapes its decisions, even if those decisions appear superficially correct.

The identification of systematic discrepancies between how algorithms represent individuals and how those individuals perceive themselves – a typology of representation mismatches – offers a powerful pathway towards mitigating algorithmic bias. This framework moves beyond simply assessing predictive accuracy to examine how algorithms construct understandings of people, revealing potential distortions in those representations. By categorizing these mismatches – for instance, discrepancies in stated values, expressed preferences, or perceived attributes – researchers can pinpoint the specific areas where algorithmic representations diverge from self-perception. Consequently, this detailed understanding enables targeted interventions, such as refining training data, adjusting algorithmic parameters, or incorporating fairness constraints, ultimately fostering more equitable and responsible AI systems. The ability to systematically analyze and address these representational gaps promises a significant advancement in ensuring that algorithmic decision-making aligns with individual realities and values.

The process of discerning relevant information within self-descriptions presents a significant challenge, as evidenced by the inter-annotator agreement analysis which yielded an F1 score of 0.765 under relaxed criteria, but a considerably lower 0.340 with stricter evaluation. This discrepancy underscores the inherent subjectivity and nuance involved in interpreting self-perception data, even among human observers. Consequently, future research must prioritize expanding this analytical framework beyond the current application and investigate alternative methodologies for capturing self-perception, potentially leveraging computational techniques to reduce reliance on manual annotation. Such advancements are not merely academic exercises; they are crucial steps toward building more responsible and reliable artificial intelligence systems that accurately reflect and respect individual self-representation.
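The gap between the two agreement scores can be reproduced with a small span-matching sketch. The paper's exact matching criteria are not restated here, so the following assumes 'relaxed' means any character overlap between two annotators' spans while 'strict' means exact boundary agreement:

```python
def span_f1(gold, pred, strict):
    """F1 between two annotators' (start, end) spans. Strict requires
    identical boundaries; relaxed accepts any character overlap."""
    def match(a, b):
        return a == b if strict else not (a[1] <= b[0] or b[1] <= a[0])
    tp = sum(any(match(p, g) for g in gold) for p in pred)
    precision = tp / len(pred) if pred else 0.0
    recall = (sum(any(match(g, p) for p in pred) for g in gold) / len(gold)
              if gold else 0.0)
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# One annotator's spans vs. the other's: one near-miss, one exact match,
# one spurious span.
gold = [(0, 10), (20, 30)]
pred = [(2, 8), (20, 30), (40, 45)]
relaxed = span_f1(gold, pred, strict=False)
exact = span_f1(gold, pred, strict=True)
```

Even a single boundary disagreement drops the strict score sharply while leaving the relaxed score high, mirroring the 0.765 versus 0.340 gap reported above.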

The pursuit of Representation Fidelity, as outlined in this work, echoes a fundamental truth about complex systems. It acknowledges that attempts to perfectly define a human through algorithmic input – to create a static, external representation – are inherently flawed. This resonates deeply with the observation that ‘every architectural choice is a prophecy of future failure.’ The distance between externally-prescribed representations and self-descriptions isn’t merely a technical problem to be minimized; it’s an inevitable consequence of reducing a dynamic, self-aware entity to a set of data points. The study correctly frames this mismatch not as a bug, but as a characteristic of all systems attempting to model reality. Order, in this context, is simply a temporary reprieve, a cache built before the next inevitable outage of perfect representation.

What’s Next?

The pursuit of ‘Representation Fidelity’ illuminates a fundamental tension: systems designed to categorize humans invariably define those categories for them. This work rightly shifts attention from the algorithmic black box to the input itself, but merely measuring the distance between prescribed representations and self-description is a diagnostic, not a cure. The true challenge lies in acknowledging that any representation is, at best, a provisional map of an ever-shifting territory. Monitoring is the art of fearing consciously; the observed discrepancies aren’t errors to be corrected, but signals of inevitable divergence.

Future investigations must address the inherent instability of ‘self-description’ itself. What constitutes a meaningful self-portrait changes with context, time, and the very act of observation. The field needs to move beyond static comparisons, exploring dynamic models that account for the evolution of individual characteristics and the complex interplay between external categorization and internal identity.

True resilience begins where certainty ends. The goal isn’t to eliminate representation mismatch, a fool’s errand, but to build systems capable of gracefully absorbing it. The architecture should anticipate its own failures, providing mechanisms for renegotiating representations, challenging assumptions, and ultimately, relinquishing control. That’s not a bug – it’s a revelation.


Original article: https://arxiv.org/pdf/2603.05136.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-08 18:54