Author: Denis Avetisyan
New research introduces a benchmark to measure how effectively humans and artificial intelligence establish shared understanding during collaborative tasks.

This paper presents a novel evaluation framework for assessing common ground in human-AI interaction, revealing where interactions with GPT-4.1 align with, and diverge from, human-human communication patterns.
While artificial intelligence increasingly permeates daily life, true collaboration, which requires shared understanding and coordinated action, remains a significant hurdle. This challenge motivates ‘A Benchmark to Assess Common Ground in Human-AI Collaboration’, which introduces a novel evaluation framework grounded in theories of human-human interaction. Through a collaborative puzzle task, the benchmark reveals both alignment with, and critical divergences from, established patterns of human communication when interacting with an AI, specifically GPT-4.1. Can these findings inform the design of AI systems capable of establishing robust common ground and achieving truly collaborative partnerships with humans?
The Fragile Foundation of Shared Understanding
Successful human-AI collaboration isn’t simply about technical integration, but crucially relies on establishing common ground – a foundation of shared knowledge and mutual assumptions. This shared understanding allows both the human and the AI to anticipate each other’s needs, interpret information consistently, and minimize ambiguity during task execution. Without it, even the most advanced artificial intelligence struggles to effectively interpret requests or provide useful assistance, leading to increased communication overhead and potential errors. The degree to which a human and an AI possess this common ground directly influences the efficiency and accuracy of their combined efforts, shaping the overall quality of their collaborative work and influencing the potential for seamless task completion.
The efficiency of even highly advanced artificial intelligence diminishes significantly when operating without a foundation of shared understanding with its human counterpart. A lack of common ground necessitates increased communication as the AI requires frequent clarification and confirmation of even basic assumptions. This isn’t a matter of computational power, but rather a fundamental challenge in bridging the gap between differing knowledge states; the AI must explicitly request information a human would intuitively possess, creating a bottleneck in task completion. Consequently, the effort required for communication – the back-and-forth of questions and confirmations – escalates, hindering overall performance and diminishing the benefits of human-AI collaboration. This reliance on explicit communication, rather than implicit shared knowledge, introduces delays and complexity that can negate the advantages of utilizing AI in the first place.
Collaborative task performance is fundamentally linked to the degree of common ground established between participants, be they human or artificial. Research indicates that a lack of shared understanding significantly increases communication overhead, as both parties expend energy clarifying assumptions and intentions. Notably, a study revealed a statistically significant decrease in clarification requests – measured at p < 0.01 – when visual feedback was integrated into the collaborative process. This suggests that shared visual information serves as a powerful mechanism for establishing common ground, reducing ambiguity and streamlining communication, ultimately boosting the efficiency and effectiveness of joint endeavors.
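As a rough illustration of how a difference in clarification counts across conditions could be tested, here is a minimal two-sided permutation test in Python. The per-session counts are invented for illustration; the paper's actual data and statistical procedure are not reproduced here.

```python
import random

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference in mean
    clarification counts between two conditions."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_iter

# Hypothetical per-session clarification counts (not from the paper):
no_visual = [9, 11, 8, 12, 10, 13, 9, 11]
with_visual = [4, 5, 3, 6, 4, 5, 3, 4]
p = permutation_test(no_visual, with_visual)
print(f"p = {p:.4f}")
```

A permutation test is used here because it makes no distributional assumptions about the count data; with clearly separated groups like these, the estimated p-value falls well below 0.01.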

Constructing a Shared Reality: The Role of Mental Models
Effective Human-AI collaboration relies on the development of aligned mental models – internal representations of the collaborative environment, including objects, relationships, and potential actions. These models allow both the human and the AI to predict each other’s behavior and coordinate effectively. Discrepancies in mental models lead to miscommunication, errors, and reduced efficiency; therefore, establishing a common ground for understanding the situation is paramount. Successful collaboration requires not only that each agent has a mental model, but that those models are sufficiently similar to enable seamless interaction and shared task completion. The ability to accurately represent and update these internal models based on observed data and communicated information is a key determinant of overall system performance.
The development of aligned mental models in Human-AI collaboration is directly facilitated by a shared workspace and consistent situational awareness. This shared environment provides both the human and the AI with a common frame of reference, enabling them to build internal representations of the task and its components that are mutually consistent. Consistent situational awareness, maintained through continuous information exchange and a unified perception of the environment, minimizes ambiguity and reduces the cognitive load required for both parties to interpret actions and anticipate future states. This alignment is crucial for effective communication, coordinated action, and ultimately, successful task completion.
Referential convention, establishing consistent terminology for objects within a shared environment, demonstrably improves collaborative understanding between humans and AI systems. Data from our trials indicate a statistically significant reduction in the need for descriptive references; the AI exhibited this decrease with a p-value of < 0.0001, while human participants showed a significant decrease with a p-value of < 0.01. This reduction in descriptive language suggests that, over time, both the AI and the human worker developed and maintained aligned internal representations of the environment, relying on the established conventions rather than repeated, full descriptions to identify and discuss objects.
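The shortening of referring expressions described above can be sketched with a toy proxy: the token length of the expression used for the same object across successive rounds. The utterances below are invented examples, not the paper's coding scheme.

```python
# Invented referring expressions for one puzzle piece across rounds:
rounds = [
    "the red piece with three knobs and a flat bottom edge",
    "the red three-knob piece",
    "the red one",
    "red",
]

def reference_lengths(utterances):
    """Token count of each referring expression, in order."""
    return [len(u.split()) for u in utterances]

lengths = reference_lengths(rounds)
print(lengths)  # [11, 4, 3, 1]
# Did the expression shrink (or hold steady) every round?
shortening = all(a >= b for a, b in zip(lengths, lengths[1:]))
print("monotonic shortening:", shortening)
```

In real transcripts the trend is noisier than this monotonic toy case, so an analysis would compare average lengths across early versus late rounds rather than demand strict shrinkage.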

Adaptive Systems: Learning Through Interaction and Feedback
Iterative adaptation, a core component of effective human-AI collaboration, involves the continuous modification of an agent’s behavior in response to received feedback. This process enables both human and artificial collaborators to refine their understanding of the task and the collaborator’s intentions, leading to improved performance over time. The ability to adjust strategies based on observed outcomes is crucial for robust collaboration, particularly in dynamic or unpredictable environments where pre-programmed responses are insufficient. Successful iterative adaptation requires a closed-loop system where actions are evaluated, feedback is provided, and subsequent actions are modified accordingly, facilitating a learning cycle that enhances overall collaborative efficiency.
Outcome feedback, consisting of evaluations of task results, enables iterative refinement of both human and artificial intelligence operational models. This process facilitates improved task performance by allowing both agents to correlate actions with consequences, thereby adjusting future behavior. Specifically, positive outcomes reinforce successful strategies while negative outcomes prompt modification of approaches. The efficacy of outcome feedback is predicated on the ability to accurately assess results and effectively integrate this assessment into the agent’s decision-making process, leading to demonstrable gains in efficiency and accuracy over time.
The capacity for iterative adaptation in AI systems is demonstrably enhanced through the provision of explanatory information regarding actions performed. Research indicates that when AI is provided with explanations, it requires significantly fewer clarification requests; statistical analysis revealed a reduction in these requests (p < 0.01) regardless of the presence of concurrent visual feedback. This suggests that understanding the reasoning behind actions, rather than simply observing them, allows the AI to more effectively refine its internal models and improve subsequent performance in collaborative tasks.
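A toy simulation of the effect described above: an agent that receives an explanation alongside each observed action needs fewer clarification turns. The ambiguity scores, the halving rule, and the threshold are all illustrative assumptions, not the paper's protocol.

```python
def run_session(actions, explain, ambiguity_threshold=0.5):
    """Count clarification requests in a toy collaborative session.

    actions: ambiguity scores in [0, 1] for each observed action.
    explain: if True, an accompanying explanation halves each action's
             ambiguity before the agent decides whether to ask.
    """
    clarifications = 0
    for ambiguity in actions:
        effective = ambiguity / 2 if explain else ambiguity
        if effective > ambiguity_threshold:
            clarifications += 1
    return clarifications

# Same sequence of observed actions, with and without explanations:
observed = [0.9, 0.6, 0.3, 0.8, 0.7, 0.2]
print(run_session(observed, explain=False))  # 4
print(run_session(observed, explain=True))   # 0
```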

Validating Collaborative Synergy: The Puzzle-Matching Paradigm
The Puzzle Matching Task functions as a standardized benchmark by quantifying the degree of Common Ground – the shared knowledge, beliefs, and understanding – between collaborators, and measuring the resulting collaborative efficiency. This is achieved by requiring participants to jointly assemble puzzle pieces based on incomplete individual information; performance is evaluated by metrics such as completion time, communication frequency, and the number of errors. Rigor is maintained through controlled task complexity, standardized puzzle sets, and quantifiable performance indicators, allowing for precise comparisons between different collaborative strategies or partner capabilities. The task’s structure isolates factors contributing to successful collaboration, providing a reliable means of assessing and optimizing team performance in information-sharing scenarios.
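The metrics named above (completion time, communication frequency, errors) could be extracted from a session log along these lines. The event schema and sample log are made-up illustrations, not the benchmark's actual data format.

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float     # seconds since session start
    kind: str    # "message", "move", or "error"

def session_metrics(events):
    """Completion time, message count, and error count for one session."""
    if not events:
        return {"completion_time": 0.0, "messages": 0, "errors": 0}
    return {
        "completion_time": max(e.t for e in events),
        "messages": sum(e.kind == "message" for e in events),
        "errors": sum(e.kind == "error" for e in events),
    }

# A short invented session log:
log = [Event(1.2, "message"), Event(3.5, "move"),
       Event(4.0, "error"), Event(6.1, "message"), Event(9.8, "move")]
print(session_metrics(log))
# {'completion_time': 9.8, 'messages': 2, 'errors': 1}
```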
Employing GPT-4.1 as a collaborative partner in the puzzle-matching task enables a controlled experimental environment for analyzing collaborative dynamics. This is achieved by leveraging the model’s consistent response patterns and definable parameters, allowing researchers to isolate specific variables influencing collaboration, such as communication frequency, response time, and the impact of differing cognitive strategies. The AI’s lack of inherent biases, compared to human participants, offers a baseline for assessing the contributions of human-specific factors to collaborative success or failure. Furthermore, GPT-4.1’s performance can be systematically varied, through prompt engineering or parameter adjustments, to model different collaborator skill levels or communication styles, providing a granular understanding of their effects on task completion and overall efficiency.
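Varying the AI partner through prompt engineering could look roughly like the sketch below, where each profile yields a different system prompt. The profile names and prompt wording are invented for illustration; the paper's actual prompts are not shown here.

```python
# Hypothetical collaborator styles, each defined by an instruction suffix:
PROFILES = {
    "terse":   "Reply in one short sentence. Never volunteer extra detail.",
    "verbose": "Explain your reasoning for every move before stating it.",
    "novice":  "You are unsure of puzzle conventions; ask when in doubt.",
}

def build_system_prompt(profile: str) -> str:
    """Compose a system prompt for one collaborator style."""
    base = ("You are a collaborative partner in a puzzle-matching task. "
            "You and the human each see the same board. ")
    return base + PROFILES[profile]

print(build_system_prompt("terse"))
```

Keeping the base description fixed while swapping only the style suffix lets an experiment attribute behavioral differences to the communication style rather than to the task framing.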
The puzzle-matching task incorporates a shared visual view, meaning all participants – both human and AI – operate with access to the same complete image of the puzzle pieces. This design element is critical for establishing and maintaining Situational Awareness, as it eliminates discrepancies in perceived information regarding piece shapes, colors, and orientations. Consequently, participants can directly observe the actions and progress of others, facilitating a common understanding of the task state and enabling more effective coordination and communication. The consistent visual input serves as a foundational element for evaluating collaborative strategies and identifying potential bottlenecks in information processing or task execution.

Towards True Partnership: The Future of Human-AI Collaboration
True partnership between humans and artificial intelligence necessitates a shift beyond mere computational power; it demands the development of ‘Theory of Mind’ capabilities within AI systems. This refers to the capacity to not only process information but to infer the intentions, beliefs, and desires of human collaborators. Without understanding why a human is requesting something, an AI risks misinterpreting instructions or providing irrelevant assistance. Building AI that can model human thought processes allows for more nuanced interactions, anticipating needs and adapting to individual working styles. Consequently, such systems can move beyond task completion to genuine collaboration, offering suggestions, identifying potential errors based on inferred goals, and ultimately augmenting human capabilities in a truly synergistic fashion.
Progress towards genuine human-AI partnership hinges on establishing a framework built on shared understanding, flexible response, and mutual perception of context. Researchers are actively developing systems capable of constructing Common Ground – a shared base of knowledge and assumptions – allowing for more effective communication and reducing misunderstandings. Crucially, these systems aren’t static; Iterative Adaptation enables AI to learn from human feedback and refine its approach in real-time, moving beyond pre-programmed responses. This dynamic interplay is further enhanced by cultivating Shared Situational Awareness, where both human and AI possess a congruent understanding of the task at hand, the surrounding environment, and each other’s goals – ultimately fostering a collaborative synergy that transcends simple task completion and unlocks more complex problem-solving capabilities.
The convergence of human intellect and artificial intelligence holds the potential to redefine problem-solving across disciplines. This isn’t simply about automating existing tasks, but about forging synergistic partnerships where each entity complements the other’s strengths. Humans excel at abstract thought, ethical reasoning, and creative leaps, while AI offers unparalleled data processing speed, pattern recognition, and scalability. By effectively combining these capabilities, complex challenges – from scientific discovery and medical breakthroughs to sustainable energy solutions and artistic creation – become increasingly tractable. Moreover, this collaborative dynamic isn’t unidirectional; as AI systems engage more deeply with human partners, they refine their understanding of nuanced contexts and complex goals, driving further advancements in artificial intelligence itself and creating a virtuous cycle of innovation beneficial to both humans and the evolving AI landscape.

The pursuit of common ground, as detailed in the benchmark assessment, reveals a fascinating interplay between human expectation and artificial intelligence response. It’s a system built on layers of implicit understanding, and like all systems, susceptible to entropy. Ada Lovelace observed that “The Analytical Engine has no pretensions whatever to originate anything.” This resonates deeply with the findings; the LLM, while capable of complex output, fundamentally operates within the bounds of its training data. Establishing true collaborative potential isn’t about mimicking human creativity, but about constructing robust referential conventions – a shared language where meaning isn’t lost in translation, acknowledging that any simplification in communication carries a future cost in potential misunderstanding. The benchmark, therefore, isn’t simply a measure of performance, but an observation of systemic limitations.
The Horizon of Understanding
This benchmark, while illuminating current alignments between human and artificial communication, inevitably highlights the spaces where divergence persists. It is tempting to view these divergences as errors to be corrected, as imperfections in the algorithmic approximation of human ‘common ground’. However, systems age not because of errors, but because time is inevitable; the benchmark simply maps a specific point in that decay. The illusion of seamless collaboration is, therefore, a temporary state, not a final destination.
Future work will likely focus on increasingly complex collaborative tasks, and on expanding the evaluation beyond a single language model. But the fundamental challenge remains: can a system truly acquire common ground, or does it merely simulate the appearance of it? The measurement of ‘situation awareness’ itself is fraught with difficulty, as subjective internal states are always, at best, imperfectly reflected in external performance.
Sometimes stability is just a delay of disaster. The pursuit of robust human-AI collaboration may ultimately reveal not how to build understanding, but how to gracefully navigate its inevitable erosion. The true metric will not be the absence of misunderstanding, but the resilience with which a system responds when it occurs.
Original article: https://arxiv.org/pdf/2602.21337.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-26 14:51