Author: Denis Avetisyan
A new study examines how well existing AI ethics tools help developers address potential biases and harms in language technologies.
Researchers evaluated Model Cards, ALTAI, FactSheets, and Harms Modeling through interviews with developers building Portuguese language models, finding that the tools offer limited support for language-specific nuance and lean heavily on developer expertise.
Despite growing awareness of responsible AI development, effectively evaluating and implementing ethical guidelines for language models remains a significant challenge. This is addressed in ‘Evaluation of AI Ethics Tools in Language Models: A Developers’ Perspective Case Study’, which assesses the practical utility of four prominent AI Ethics Tools (Model Cards, ALTAI, FactSheets, and Harms Modeling) through interviews with developers building models for the Portuguese language. The research reveals these tools can guide broad ethical considerations, but fall short in addressing language-specific nuances and require substantial developer expertise to identify potential harms. Will a more granular, context-aware approach to AI ethics tooling be necessary to truly foster responsible language model development?
The Inevitable Documentation Gap: Ethics and the Rise of AI
The accelerating integration of artificial intelligence into daily life has sparked a crucial shift towards prioritizing ethical considerations during its development. Increasingly, stakeholders recognize that AI systems, while offering immense potential benefits, are not inherently neutral; they can perpetuate and even amplify existing societal biases, leading to discriminatory outcomes or unforeseen harms. This growing awareness demands a proactive approach, moving beyond reactive mitigation to embedding ethical principles – fairness, accountability, transparency, and safety – directly into the design, training, and deployment of AI. Such foresight is no longer simply a matter of responsible innovation, but a necessity for fostering public trust and ensuring that these powerful technologies serve humanity equitably and sustainably.
Despite a surge in ethical frameworks for artificial intelligence, translating principles into practice faces considerable hurdles, primarily stemming from inadequate documentation. Many organizations are developing guidelines addressing bias, fairness, and accountability, yet these remain largely aspirational without robust, transparent records of development processes, training data, and algorithmic decision-making. This lack of comprehensive documentation creates a significant challenge for independent audits, risk assessment, and ultimately, ensuring responsible AI deployment. Without standardized and accessible records, verifying adherence to ethical principles becomes exceedingly difficult, hindering the ability to identify and mitigate potential harms before they manifest. The current situation necessitates a shift towards prioritizing ‘explainable AI’ not just in design, but also in the meticulous recording of its creation and function.
The burgeoning field of artificial intelligence is hampered by a critical deficiency in its supporting documentation, creating substantial obstacles to responsible development and deployment. Current practices exhibit a lack of standardized formats and consistently applied metrics, meaning that assessments of algorithmic bias, data provenance, and potential societal impacts are often incomplete or incomparable. This inconsistency extends to accessibility; crucial information regarding model limitations, training datasets, and intended use cases is frequently buried within technical reports, proprietary code, or inaccessible databases. Consequently, independent audits become exceedingly difficult, hindering effective risk assessment and impeding accountability when AI systems produce harmful or unexpected outcomes, ultimately slowing the progress towards trustworthy and ethical AI.
Standardizing the Echo: ModelCards, FactSheets, and the Illusion of Control
ModelCards provide a standardized documentation format for reporting key information about machine learning models, facilitating transparency and accountability. These documents typically include details on a model’s intended use, training data, evaluation metrics, limitations, and potential biases. The framework encourages reporting both quantitative performance data and qualitative considerations, such as ethical risks and fairness evaluations. By providing a consistent structure, ModelCards enable stakeholders – including developers, researchers, and end-users – to better understand a model’s capabilities and potential impacts, thereby promoting responsible AI development and deployment. The format is designed to be easily shared and version-controlled, supporting iterative model improvement and ongoing monitoring.
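To make the structure concrete, here is a minimal sketch of a model card represented as version-controllable data. The field names and example values are illustrative assumptions, not a fixed schema from the ModelCards framework:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ModelCard:
    """Minimal model card sketch; field names are illustrative, not a standard."""
    model_name: str
    version: str
    intended_use: str
    out_of_scope_use: str
    training_data: str
    evaluation_metrics: dict   # metric name -> value
    limitations: list
    ethical_considerations: list

card = ModelCard(
    model_name="pt-news-classifier",
    version="1.2.0",
    intended_use="Topic classification of Portuguese news articles.",
    out_of_scope_use="Moderation decisions without human review.",
    training_data="Public Brazilian and European Portuguese news corpora (2015-2023).",
    evaluation_metrics={"macro_f1": 0.87, "accuracy": 0.91},
    limitations=["Lower accuracy on African Portuguese varieties."],
    ethical_considerations=["Possible regional bias between pt-BR and pt-PT sources."],
)

# Serialize to JSON so the card can be versioned alongside the model artifact.
print(json.dumps(asdict(card), indent=2, ensure_ascii=False))
```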
FactSheets establish trust in AI systems by relying on supplier declarations – formally documented statements detailing system capabilities, limitations, performance metrics, and intended use cases. These declarations move beyond high-level descriptions by requiring specific, verifiable claims about the AI’s behavior and the data used in its development. The process involves suppliers attesting to the accuracy of this information, providing a basis for independent validation and accountability. Crucially, FactSheets are designed to facilitate due diligence by downstream users and regulators, enabling informed decision-making and risk assessment based on declared characteristics rather than solely relying on black-box evaluations.
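As a rough sketch of how supplier declarations might be captured in practice, the snippet below pairs each claim with the evidence a downstream auditor could check. The class and field names are hypothetical, not part of the FactSheets specification:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Declaration:
    """One supplier claim; 'evidence' points at how the claim can be verified."""
    question: str
    answer: str
    evidence: str  # e.g. a report, test log, or dataset reference

@dataclass
class FactSheet:
    supplier: str
    system_name: str
    issued: date
    declarations: list

sheet = FactSheet(
    supplier="Example AI Lda.",
    system_name="pt-summarizer",
    issued=date(2025, 6, 1),
    declarations=[
        Declaration(
            question="What is the intended use of the system?",
            answer="Abstractive summarization of Portuguese legal documents.",
            evidence="Product requirements document v3.",
        ),
        Declaration(
            question="How was performance measured?",
            answer="ROUGE-L of 0.42 on a held-out pt-PT legal corpus.",
            evidence="Evaluation report 2025-05, reproducible test suite.",
        ),
    ],
)

# Downstream users can audit each declaration against its cited evidence.
for d in sheet.declarations:
    print(f"{d.question}\n  -> {d.answer} [evidence: {d.evidence}]")
```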
ALTAI (the Assessment List for Trustworthy Artificial Intelligence, developed by the European Commission’s High-Level Expert Group on AI) offers a detailed checklist designed to comprehensively assess risks associated with AI systems, covering areas such as fairness, privacy, security, and accountability. However, the tool has received criticism for its substantial length and complex terminology, which can hinder usability and make it difficult for non-expert stakeholders to apply the checklist effectively during risk evaluation. While intended to be exhaustive, this lack of readability poses a challenge to widespread adoption and consistent application across organizations and AI development teams.
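A checklist of this kind is straightforward to represent programmatically, which also makes coverage measurable. The sketch below groups questions under three of ALTAI’s seven requirements; the question wording is invented for the example, not quoted from the official assessment list:

```python
# Checklist-style assessment in the spirit of ALTAI. The requirement names
# follow ALTAI's trustworthy-AI requirements; the questions are illustrative.
CHECKLIST = {
    "Human agency and oversight": [
        "Can a human override the system's decisions?",
    ],
    "Privacy and data governance": [
        "Is personal data in the training set documented and minimized?",
    ],
    "Diversity, non-discrimination and fairness": [
        "Has performance been measured across relevant user groups?",
    ],
}

def coverage(answers: dict) -> float:
    """Fraction of checklist questions answered 'yes'."""
    total = sum(len(qs) for qs in CHECKLIST.values())
    yes = sum(
        1
        for req, qs in CHECKLIST.items()
        for q in qs
        if answers.get((req, q)) == "yes"
    )
    return yes / total

answers = {
    ("Human agency and oversight",
     "Can a human override the system's decisions?"): "yes",
}
print(f"Checklist coverage: {coverage(answers):.0%}")  # -> 33%
```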
The utility of documentation methods – including ModelCards, FactSheets, and ALTAI – is directly correlated to their integration throughout the entire AI system development lifecycle. Sporadic or post-hoc implementation of these tools yields limited benefit; consistent application, beginning in the initial planning phases and continuing through deployment and monitoring, is crucial for establishing a reliable record of model characteristics, intended use, limitations, and potential risks. This sustained effort facilitates ongoing evaluation, allows for effective mitigation of identified issues, and supports responsible AI governance by providing auditable evidence of due diligence. Without consistent implementation, these methods become static reports rather than dynamic components of a robust AI risk management framework.
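One lightweight way to enforce this lifecycle-wide discipline is to treat documentation as an append-only log keyed to development phases, flagging phases that have no record yet. The phase names below are an assumed breakdown, not a prescribed standard:

```python
from datetime import datetime, timezone

# Assumed phase breakdown; adjust to the team's actual lifecycle.
PHASES = ["design", "data_collection", "training",
          "evaluation", "deployment", "monitoring"]

class DocumentationLog:
    """Append-only record of documentation entries, keyed by lifecycle phase."""

    def __init__(self):
        self.entries = []

    def record(self, phase: str, note: str) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self.entries.append({
            "phase": phase,
            "note": note,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def undocumented_phases(self) -> list:
        seen = {e["phase"] for e in self.entries}
        return [p for p in PHASES if p not in seen]

log = DocumentationLog()
log.record("design", "Intended use limited to pt-BR customer-support chat.")
log.record("training", "Corpus filtered for PII; filter recall not yet measured.")
print("Phases still lacking documentation:", log.undocumented_phases())
```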
The Developer’s Burden: Usability and the Myth of Ethical Integration
Successful integration of AI ethics tools into development workflows requires rigorous usability evaluation. Developer adoption is not solely determined by a tool’s technical capabilities, but also by its ease of use, clarity of documentation, and perceived value in addressing practical ethical concerns. Insufficient usability can lead to developers bypassing these tools, even if they acknowledge their potential benefits. Evaluation should focus on factors such as the time required to utilize the tool, the clarity of its outputs, and the extent to which it integrates with existing development environments and processes. Prioritizing developer experience is therefore critical for maximizing the impact of AI ethics tools and fostering responsible AI development practices.
Effective implementation of AI documentation methods – including ModelCards, FactSheets, and ALTAI – is contingent on understanding developer workflows and identifying practical obstacles to adoption. Research indicates that while developers generally find these tools helpful, their utility is maximized when feedback informs iterative improvements to documentation format and content. Specifically, developers can pinpoint areas where documentation is unclear, incomplete, or fails to address real-world implementation challenges. Gathering developer perspectives allows for the streamlining of documentation processes and the creation of more actionable guidance, ultimately increasing the likelihood that these tools will be integrated into standard AI development practices.
Determining the effectiveness of AI ethics tools (AIETs) in both identifying and mitigating potential harms is a critical step prior to widespread implementation. Justification for integrating these tools into development workflows requires empirical evidence demonstrating a measurable reduction in ethical risks associated with language models. Assessment must move beyond simply flagging potential issues to evaluating the tools’ capacity to guide developers toward concrete mitigation strategies and demonstrably improve model safety. This evaluation should include metrics related to the accuracy of harm identification, the completeness of mitigation recommendations, and the usability of the tools in practical development scenarios, especially when applied to models serving languages with distinctive characteristics, such as Portuguese.
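A simple way to quantify harm-identification accuracy is to compare tool-guided findings against a reference set of harms annotated by domain experts, as in the sketch below. The harm labels are hypothetical:

```python
# Hypothetical harm labels; in a real study these would come from
# expert annotation and tool-guided developer reviews.
expert_harms = {"regional dialect bias", "toxic generation", "privacy leakage"}
tool_flagged = {"regional dialect bias", "privacy leakage", "licensing ambiguity"}

true_positives = tool_flagged & expert_harms
precision = len(true_positives) / len(tool_flagged)  # 2/3: one spurious flag
recall = len(true_positives) / len(expert_harms)     # 2/3: one harm missed

print(f"precision={precision:.2f} recall={recall:.2f}")
# Low recall means the tool misses harms experts consider real; low
# precision means it buries developers in spurious flags.
```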
Research indicates a high degree of effectiveness for AI Ethics Tools (AIETs) when used as guides for identifying broad ethical considerations within language models. Specifically, 100% of developers participating in the study reported finding these tools helpful for this purpose. This suggests a strong perceived value and practical utility of AIETs in prompting consideration of ethical implications during language model development. The findings are based on developer evaluations of various AIET methods, and demonstrate a consensus on their usefulness as a starting point for ethical analysis.
Developer evaluations indicate ModelCards are the most highly regarded AI ethics documentation method tested, achieving an average score of 4.1 out of 5. A majority, 80% of participating developers, specifically identified ModelCards as the ‘best’ tool overall. This assessment suggests a strong preference for the format and information presented within ModelCards, potentially due to their comprehensive nature or ease of integration into existing development workflows. The scoring was based on developer perceptions of usefulness in identifying and mitigating ethical concerns related to language models.
Harms Modeling received an average developer score of 3.6, positioning it as the second most helpful AI Ethics Tool (AIET) evaluated in the study. While ModelCards achieved a slightly higher score, Harms Modeling was specifically identified by participants as being particularly effective in the identification of potential ethical considerations within language models. This suggests that, although all evaluated AIETs provide value as guides, Harms Modeling offers a distinct advantage when the primary goal is to proactively recognize and categorize possible harms before further analysis or mitigation efforts are undertaken.
This research specifically examines the performance of AI ethics tools – including ModelCards, FactSheets, and ALTAI – when applied to Portuguese Language Models (PLMs). PLMs introduce unique challenges not typically found in English-language models due to linguistic complexities, cultural nuances, and differing societal norms reflected in the training data. These factors can impact the identification and mitigation of potential harms, requiring tailored evaluation metrics and potentially different approaches to ethical assessment compared to those used for English models. The study aims to determine the effectiveness of current AIETs in addressing these specific challenges within the context of Portuguese language technology.
The Inevitable Audit: Integrating Ethics into the Lifecycle of Language
Effective deployment of crucial documentation tools such as ModelCards, FactSheets, and ALTAI isn’t simply an add-on; it requires fundamental integration throughout the entire language model development lifecycle. These methods aren’t post-hoc reports, but rather continuous processes woven into design, training, and evaluation phases. Early incorporation allows developers to systematically track datasets, model capabilities, limitations, and potential biases, fostering a proactive approach to responsible AI. This seamless integration ensures documentation remains current, accurate, and genuinely reflects the model’s behavior, moving beyond superficial transparency to deliver genuinely actionable insights for stakeholders and enabling continuous improvement throughout the model’s lifespan.
The creation of detailed, accessible documentation is paramount to establishing both accountability and trust in artificial intelligence systems, a necessity amplified when these systems are designed for specific cultural contexts. Language models tailored for languages like Portuguese, with their nuanced grammatical structures and cultural idioms, require particularly thorough documentation outlining training data, potential biases, and limitations in performance across diverse dialects and social groups. This transparency allows for external scrutiny, enabling researchers and users to identify and address potential harms stemming from inaccurate or culturally insensitive outputs. By openly detailing the model’s development process and known weaknesses, developers demonstrate a commitment to responsible AI practices and invite collaborative efforts to refine and improve these increasingly powerful technologies, ultimately fostering greater public confidence.
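In practice, this could mean reporting evaluation results per language variety rather than as a single aggregate, so that gaps between, say, Brazilian and Mozambican Portuguese stay visible. The variety codes and scores below are invented for illustration:

```python
# Illustrative per-variety F1 scores; codes and values are invented.
per_variety_f1 = {
    "pt-BR": 0.89,  # Brazilian Portuguese
    "pt-PT": 0.84,  # European Portuguese
    "pt-AO": 0.71,  # Angolan Portuguese
    "pt-MZ": 0.69,  # Mozambican Portuguese
}

mean_f1 = sum(per_variety_f1.values()) / len(per_variety_f1)
worst_variety, worst_f1 = min(per_variety_f1.items(), key=lambda kv: kv[1])

print(f"mean F1 = {mean_f1:.2f}; weakest variety: {worst_variety} ({worst_f1:.2f})")
# Reporting only the mean (0.78 here) would hide the 20-point gap
# between the best- and worst-served varieties.
```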
Anticipating and addressing ethical implications throughout the development of language models is crucial for minimizing potential harms and fostering responsible AI. This proactive approach moves beyond simply reacting to issues as they arise; instead, developers can systematically identify biases, ensure fairness, and protect user privacy before deployment. By embedding ethical considerations into the design process – from data collection and model training to evaluation and ongoing monitoring – creators can build systems that are more robust, reliable, and aligned with societal values. This foresight not only reduces the risk of unintended consequences, such as discriminatory outputs or the spread of misinformation, but also cultivates greater public trust and encourages the beneficial adoption of this powerful technology.
The adoption of a robust ethical framework for language model development extends far beyond the successful completion of a single project. By prioritizing transparency and accountability, developers establish a replicable model for responsible AI innovation throughout the industry. This proactive approach signals a commitment to mitigating potential harms, such as bias amplification or the spread of misinformation, and fosters a culture where ethical considerations are integral to the entire development lifecycle. Consequently, this sets a new standard, encouraging peer organizations to adopt similar practices and ultimately accelerating the widespread creation of AI systems built on a foundation of trust and societal benefit.
The evaluation reveals a predictable pattern. These AI Ethics Tools, while intending to mitigate harm, operate as incomplete prophecies. Developers find guidance in frameworks like Model Cards and FactSheets, yet the tools lack the necessary granularity for Portuguese, a language rich in nuance. It echoes a familiar truth: systems aren’t built, they’re grown, and growth demands adaptation. As Ada Lovelace observed, “The Analytical Engine has no pretensions whatever to originate anything.” These tools, like the Engine, require human expertise to interpret and apply ethical considerations – they offer a structure, not a solution. The documentation, inevitably, will reflect the limitations of the foresight applied.
The Path Ahead
The evaluation reveals a familiar pattern: tools intended to constrain systems instead become documentation of their inevitable drift. These AI Ethics Tools, while offering a framework for initial consideration, operate under the assumption that ethical concerns are static, addressable with checklists. The study rightly highlights the need for linguistic sensitivity, but this is merely a symptom. The core problem isn’t a lack of nuance in the tools, but the belief that ‘responsible AI’ can be engineered, rather than continuously cultivated.
Long stability in ethical assessment is not a sign of success, but a harbinger of unaddressed systemic risks. The developer expertise required to effectively use these tools suggests they are not safeguards for the naive, but rather sophisticated instruments for those already aware of the potential pitfalls. The field must move beyond identifying harms and toward understanding the evolution of those harms as models interact with a world they were never truly designed to comprehend.
Future work should focus not on building better tools, but on developing methods for observing and interpreting the emergent properties of these systems. The goal isn’t to prevent failure – that is an illusion – but to build resilience, to anticipate the unexpected shapes these models will take, and to mitigate the consequences when they inevitably deviate from their intended course.
Original article: https://arxiv.org/pdf/2512.15791.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/