Author: Denis Avetisyan
Researchers demonstrate a community-driven framework for ethically integrating artificial intelligence into Hawaiian language assessment, balancing psychometric rigor with cultural preservation.

This work details a human-in-the-loop system leveraging large language models to augment assessment workflows while prioritizing Indigenous data sovereignty and culturally responsive measurement.
Traditional psychometric practices often struggle to reconcile rigorous measurement with the nuanced integrity of Indigenous languages and cultural contexts. This paper, ‘Bridging Psychometric and Content Development Practices with AI: A Community-Based Workflow for Augmenting Hawaiian Language Assessments’, details a community-based workflow that ethically integrates artificial intelligence to enhance the analysis of Hawaiian-medium educational assessments. Findings demonstrate that large language models, governed by a robust AI policy framework and human oversight, can accelerate psychometric analysis while upholding cultural authority and linguistic precision. Could this model offer a replicable pathway for responsible AI integration in Indigenous-language educational measurement more broadly?
Deconstructing Assessment: Reclaiming Knowledge Systems
The KĀʻEO program fundamentally reimagines assessment practices to center Hawaiian language and cultural knowledge. Recognizing the limitations of standardized testing, which is often designed around Western cognitive models and linguistic structures, KĀʻEO advocates for methods deeply rooted in Hawaiian epistemology and ways of knowing. This shift prioritizes performance-based tasks, observation, and portfolios that authentically capture a student’s competencies within a cultural context. Assessments are collaboratively designed with educators, cultural experts, and community members to ensure relevance and validity, moving beyond mere quantification of skills to a holistic evaluation of a student’s growth as a culturally grounded individual. The program emphasizes that true assessment isn’t simply measuring what a student knows, but rather demonstrating how they know it, within the rich tapestry of Hawaiian culture and language.
While established psychometric analyses provide valuable frameworks for evaluating performance, their uncritical application can introduce subtle yet significant biases when assessing diverse populations. These tools, often normed on Westernized samples and predicated on specific linguistic structures, may not accurately capture the knowledge, skills, and abilities of individuals from cultures with differing communication styles or worldviews. For instance, a test reliant on direct questioning might disadvantage a respondent raised in a culture emphasizing indirectness and contextual understanding. Similarly, linguistic nuances (idioms, metaphors, and culturally bound references) can be misinterpreted, leading to inaccurate scoring. Therefore, adapting these established methods through careful translation, culturally sensitive item development, and the incorporation of qualitative data is crucial to ensure assessments truly measure competence, rather than reflecting cultural mismatch or linguistic barriers.
The pursuit of equitable assessment necessitates a fundamental shift towards Indigenous Data Sovereignty, recognizing that communities retain ultimate authority over the collection, ownership, and application of information about themselves. This principle demands more than simply including diverse perspectives; it requires actively centering Indigenous knowledge systems and protocols within the entire assessment process. Ethical stewardship of cultural knowledge, therefore, isn’t about preserving data from communities, but facilitating their self-determination in how that knowledge is utilized, interpreted, and disseminated. Assessments designed with this ethos prioritize community benefit, ensure culturally appropriate methodologies, and actively safeguard against the misappropriation or harmful application of sensitive cultural information, ultimately fostering a system where evaluation serves to empower, rather than perpetuate existing inequalities.
AI as a Lens: Iterative Refinement within KĀʻEO
The AI Lab is implementing a Design-Based Research (DBR) methodology to iteratively improve assessment item development within the KĀʻEO framework. DBR involves cycles of design, implementation, analysis, and refinement, allowing for continuous improvement based on empirical data. This approach moves beyond traditional, linear item creation by integrating AI tools to accelerate the process and provide data-driven insights. By focusing on iterative refinement within the KĀʻEO framework, a system emphasizing culturally relevant and rigorous assessment, the Lab aims to enhance the quality and validity of assessment items while simultaneously building a deeper understanding of effective item design principles.
NotebookLM streamlines the initial analysis of assessment items through document-grounded retrieval-augmented generation (RAG). This process involves retrieving relevant information directly from authoritative KĀʻEO documents to contextualize and validate each item. By grounding the analysis in these source materials, NotebookLM ensures that item evaluations are consistently aligned with established KĀʻEO guidelines and content standards. The system automatically cross-references item content with the KĀʻEO framework, flagging potential discrepancies or areas requiring further review, and thereby improving the accuracy and efficiency of the item analysis phase.
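The grounding step can be illustrated with a minimal sketch. This is not NotebookLM’s internal implementation: the term-overlap scoring stands in for real embedding-based retrieval, and the guideline text, function names, and prompt wording are hypothetical.

```python
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for embedding similarity."""
    return Counter(re.findall(r"\w+", text.lower()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank guideline chunks by term overlap with the query, keep the top k."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: sum((tokenize(c) & q).values()), reverse=True)
    return ranked[:k]

def grounded_prompt(item_text: str, guideline_chunks: list[str]) -> str:
    """Build a prompt that constrains the model to retrieved guideline passages."""
    context = "\n".join(f"- {c}" for c in retrieve(item_text, guideline_chunks))
    return (
        "Evaluate this assessment item ONLY against the guideline excerpts below.\n"
        f"Guidelines:\n{context}\n"
        f"Item: {item_text}\n"
        "Flag any mismatch with the guidelines."
    )
```

Grounding the prompt in retrieved source passages is what keeps the model’s evaluation tethered to the authoritative documents rather than its general training data.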
Claude 3.5 Sonnet was utilized to convert outputs from complex psychometric analyses into concise briefs accessible to item developers, thereby streamlining the item refinement process. This application facilitated the efficient review of 58 previously flagged assessment items spanning Hawaiian Language Arts, Mathematics, and Science. The briefs generated by Claude 3.5 Sonnet provided developers with readily understandable summaries of psychometric data, enabling quicker identification of areas for improvement and reducing the time required for iterative refinement. This approach moved the team from raw data analysis to actionable development guidance.
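To give a feel for what translating psychometric output into a developer-facing brief involves, here is a sketch using fixed screening rules; the field names and thresholds are illustrative assumptions, not the program’s actual cut points, and the real workflow used Claude 3.5 Sonnet to produce richer narrative summaries than rules like these can.

```python
def item_brief(item_id: str, stats: dict) -> str:
    """Summarize flagged psychometric statistics in plain language for item
    developers. Thresholds are illustrative, not KĀʻEO's actual cut points."""
    notes = []
    if stats["p_value"] < 0.25:  # p-value here = proportion answering correctly
        notes.append(f"very hard (p = {stats['p_value']:.2f}); "
                     "check for construct-irrelevant difficulty")
    elif stats["p_value"] > 0.90:
        notes.append(f"very easy (p = {stats['p_value']:.2f})")
    if stats["discrimination"] < 0.15:
        notes.append(f"low discrimination ({stats['discrimination']:.2f}); "
                     "high and low scorers perform alike")
    return f"Item {item_id}: " + ("; ".join(notes) if notes else "no flags") + "."
```

The point of the brief, whether rule-based or model-generated, is that a content developer reads an actionable sentence rather than a table of coefficients.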
The Human Firewall: Safeguarding Linguistic Authenticity
The KĀʻEO AI Principles and Policy Framework establishes guidelines for the responsible implementation of artificial intelligence, specifically prioritizing human oversight in the evaluation of AI-produced content. This framework recognizes that while AI tools can assist in language processing and content generation, ultimate validation requires the nuanced judgment of human experts. The policy mandates that all AI-generated outputs undergo review by individuals possessing relevant linguistic and cultural knowledge to ensure accuracy, appropriateness, and adherence to established standards. This human-centered approach is considered fundamental to mitigating potential errors, biases, and cultural insensitivity inherent in automated systems, and is a core tenet of the KĀʻEO initiative.
Human-in-the-Loop (HITL) systems are essential for maintaining Linguistic Integrity within the KĀʻEO AI framework by providing crucial verification of AI-generated Hawaiian language outputs. These systems integrate human expertise, specifically native speakers and cultural specialists, directly into the assessment process to confirm both the grammatical accuracy and the cultural authenticity of the language used. This verification extends beyond simple error detection; HITL review assesses nuances in meaning, contextual appropriateness, and adherence to established linguistic conventions of the Hawaiian language. The implementation of HITL ensures that AI-generated content aligns with the intended meaning and respects the cultural significance embedded within the Hawaiian language, a critical safeguard against inaccuracies or misrepresentations.
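The gating logic of such a pipeline can be sketched as follows; the structure and names are hypothetical, showing only the invariant the framework enforces: nothing is published without explicit human approval.

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    """An AI-generated Hawaiian-language draft awaiting human verification."""
    text: str
    approved: bool = False
    reviewer_notes: list[str] = field(default_factory=list)

def hitl_review(draft: Draft, reviewer_ok: bool, notes: str = "") -> Draft:
    """Record a human reviewer's verdict on a draft."""
    if notes:
        draft.reviewer_notes.append(notes)
    draft.approved = reviewer_ok
    return draft

def publishable(drafts: list[Draft]) -> list[Draft]:
    """Only drafts carrying explicit human approval leave the pipeline."""
    return [d for d in drafts if d.approved]
```

Making approval an explicit, default-false field means the safe state is the default: an unreviewed draft can never slip through the filter.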
While psychometric analysis, enhanced by AI tools, delivers quantifiable data regarding AI-generated content, assessment of cultural appropriateness necessitates qualitative review by cultural experts. In the analysis of the 58 flagged items, 6 AI-generated briefs passed expert review without requiring substantive revisions, indicating a degree of initial linguistic and cultural alignment while confirming that most outputs still needed human refinement. This highlights the importance of combining both quantitative and qualitative evaluation methods to ensure the responsible and accurate application of AI in contexts requiring cultural sensitivity and linguistic integrity.
Beyond Compliance: Embedding CARE Principles into Assessment
The KĀʻEO AI Principles and Policy Framework represents a significant advancement in ethical assessment practices by proactively integrating the foundational CARE Principles (Collective Benefit, Authority to Control, Responsibility, and Ethics) into the design and implementation of AI-assisted evaluation systems. This framework doesn’t simply apply these principles as an afterthought; instead, it fundamentally reorients the development process, ensuring that AI tools serve the specific needs and priorities of the communities they assess. By centering Indigenous values from the outset, KĀʻEO establishes a robust mechanism for safeguarding data sovereignty, promoting equitable outcomes, and fostering trust in AI technologies within educational contexts. This holistic approach moves beyond conventional ethical guidelines, creating a proactive system that prioritizes community wellbeing and responsible innovation.
The KĀʻEO AI program prioritizes Indigenous Data Sovereignty, ensuring the Hawaiian community maintains ultimate authority over their data and its applications. This commitment extends beyond simple data ownership; it fundamentally shapes the program’s development, aligning it with Hawaiian values and priorities throughout every stage, from data collection and analysis to the interpretation and use of assessment results. By centering community governance, the AI Lab actively avoids the historical pitfalls of research conducted on Indigenous communities, instead fostering a collaborative partnership where data serves to empower and uplift the Hawaiian language revitalization efforts. This approach guarantees the KĀʻEO program remains culturally relevant, ethically sound, and directly responsive to the needs and aspirations of the community it serves, safeguarding both cultural heritage and the integrity of the assessment process.
The KĀʻEO AI program demonstrates that culturally-grounded AI development can simultaneously improve assessment validity and uphold Indigenous rights. Rigorous analysis of assessment items revealed specific areas for improvement; notably, two items displayed exceptionally low discrimination indices (0.122 and 0.03, respectively), which would likely have gone undetected without this enhanced validation process rooted in the CARE Principles. This meticulous approach, extending beyond mere technical accuracy to prioritize collective benefit and responsible data stewardship, establishes a replicable model for ethical AI implementation in other Indigenous language revitalization efforts, proving that technological advancement and cultural preservation can, and should, proceed in tandem.
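For context, one common way to compute an item discrimination index is the upper-lower method: the proportion of high scorers answering an item correctly minus the proportion of low scorers. A near-zero value like 0.03 means the item barely distinguishes strong from weak students. The article does not specify which formulation KĀʻEO used, so this sketch is illustrative only.

```python
def discrimination_index(item_correct: list[int],
                         total_scores: list[float],
                         frac: float = 0.27) -> float:
    """Upper-lower discrimination index: proportion correct among the top
    `frac` of scorers minus the proportion correct among the bottom `frac`.
    The 27% group size is a conventional choice, not necessarily KĀʻEO's."""
    n = max(1, round(len(total_scores) * frac))
    order = sorted(range(len(total_scores)), key=total_scores.__getitem__)
    lower, upper = order[:n], order[-n:]
    p = lambda idx: sum(item_correct[i] for i in idx) / len(idx)
    return p(upper) - p(lower)
```

An index near 1.0 means the item cleanly separates high and low performers; an index near 0 (as with the two flagged items) means it adds essentially no measurement information.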
The work detailed in this paper embodies a spirit of playful inquiry, meticulously dissecting established assessment practices to reveal underlying assumptions. It acknowledges that seemingly rigid systems – in this case, psychometric analysis – are ultimately constructions, susceptible to re-evaluation and improvement. This resonates with Nathan Myhrvold’s observation: “Software is a gas; it expands to fill the available computing power.” Just as software stretches to utilize resources, this project demonstrates how AI, when thoughtfully integrated and governed by a community prioritizing Indigenous data sovereignty, expands the possibilities for culturally responsive measurement. The human-in-the-loop approach isn’t about replacing expertise, but rather augmenting it, finding new ways to ‘fill the available’ analytical power while upholding ethical considerations.
What’s Next?
The successful integration demonstrated here isn’t about building better assessments; it’s about exposing the inherent fragility of the construct itself. The workflow, while robust, merely highlights how much of ‘proficiency’ is an artifact of the questions asked, the scoring rubrics applied, and, now, the algorithmic biases embedded within the LLMs. This isn’t a failure, but a necessary demolition of assumed truths. The real exploit of comprehension isn’t the AI’s ability to parse language, but the system’s capacity to force a reckoning with what ‘knowing’ a language actually means.
Future work must move beyond simply mitigating bias, a perpetual game of whack-a-mole, and confront the fundamental limitations of applying psychometric models to culturally situated knowledge. The commitment to Indigenous data sovereignty isn’t merely ethical; it’s a methodological imperative. The challenge lies in developing assessment frameworks that defer to community expertise, using AI not as an evaluator, but as a tool for surfacing and validating already-held understandings.
Ultimately, the field needs to consider whether ‘assessment’, as traditionally conceived, is the appropriate goal. Perhaps the true innovation lies in building systems that facilitate language revitalization, using AI to create dynamic, personalized learning experiences grounded in cultural context. That would be a truly disruptive exploit: one that bypasses the limitations of measurement altogether.
Original article: https://arxiv.org/pdf/2512.17140.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/