Can You Tell the Difference? An AI Now Blends Seamlessly Into Group Chats

Author: Denis Avetisyan


Researchers have developed an AI agent capable of participating in multi-user conversations with a level of human-like nuance that challenges participants to distinguish it from a real person.

The system employs a three-stage workflow: routing events to select a conversational strategy, executing that strategy with available tools, and reflecting on context for iterative improvement. This allows dynamic adaptation to the rapid pace of conversation through interruption and continuous refinement of its approach.
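The route-execute-reflect loop can be pictured as a small control-flow sketch. The following is purely illustrative: the `Event` shape and the function names are hypothetical stand-ins, not HUMA's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A single chat event (e.g., a new message) driving the agent."""
    author: str
    text: str

def route_event(event: Event, context: list[str]) -> str:
    """Stage 1: pick a conversational strategy for this event.

    Placeholder logic; a real router would score strategies
    against the dialogue state (see strategy routing below).
    """
    return "answer_question" if "?" in event.text else "acknowledge"

def execute_strategy(strategy: str, event: Event, context: list[str]) -> str:
    """Stage 2: run the chosen strategy with whatever tools it needs."""
    if strategy == "answer_question":
        return f"Good question, {event.author} - let me think about that."
    return f"Thanks for sharing, {event.author}!"

def reflect(context: list[str], reply: str) -> list[str]:
    """Stage 3: fold the exchange back into context for the next turn."""
    return context + [reply]

# One pass through the loop; in HUMA this runs per incoming event
# and can be preempted by newer events.
context: list[str] = []
event = Event(author="sam", text="How do I get started?")
strategy = route_event(event, context)
reply = execute_strategy(strategy, event, context)
context = reflect(context, reply)
print(reply)
```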

This paper introduces HUMA, a novel LLM-based agent designed to facilitate asynchronous group chats with convincingly human behavior and timing.

Although conversational AI has grown increasingly sophisticated, most systems are designed for direct, turn-based interactions and fall short of replicating the nuances of natural group communication. This limitation motivates the work presented in ‘Humanlike Multi-user Agent (HUMA): Designing a Deceptively Human AI Facilitator for Group Chats’, which introduces an AI agent capable of convincingly participating in asynchronous multi-party chats. Through an event-driven architecture and realistic timing simulation, HUMA achieves near-parity with human facilitators in subjective experience and proves difficult for participants to reliably distinguish from a person. Could such convincingly human AI facilitators redefine online community management and reshape the dynamics of digital collaboration?


The Challenge of Mimicking Natural Group Dialogue

Current chatbot technology frequently struggles to replicate the subtleties of human interaction within group conversations. These systems often deliver responses that, while technically correct, lack the contextual awareness, emotional intelligence, and adaptive phrasing characteristic of natural dialogue. The result is a noticeable disconnect – chatbots tend to treat each message in isolation, failing to build upon previous exchanges or acknowledge the conversational dynamics unfolding amongst human participants. This robotic quality stems from limitations in their ability to process the implicit cues, shared history, and evolving topics that define genuine group chat, ultimately hindering their integration into these complex social settings.

Achieving truly natural conversation within a group chat presents a formidable challenge for artificial intelligence, largely because replicating the dynamic interplay of human turn-taking is exceptionally complex. Current systems often struggle with determining when to respond – interrupting or talking over others, or remaining silent for unnatural lengths of time. Beyond timing, a responsive chatbot must also process multiple concurrent inputs, differentiating relevant contributions from background chatter and tailoring replies accordingly. This demands an AI capable of not simply reacting to the most recent message, but maintaining a coherent understanding of the evolving conversation flow and participating in a way that feels fluid and appropriate to all involved – a level of contextual awareness that remains elusive for most existing models.

Group messaging presents a unique challenge for artificial intelligence, extending beyond simple question-and-answer interactions. Unlike synchronous conversations, group chats are often asynchronous, meaning participants don’t respond immediately and context can shift rapidly. This demands an AI capable of more than just reactive responses; it requires genuine participation. The system must intelligently track evolving topics, remember prior exchanges from multiple participants, and contribute relevantly even after a delay. Effectively, the AI needs to model a conversational partner who understands the ebb and flow of a group dynamic, rather than simply processing individual messages in isolation – a significant leap towards truly natural language processing in a social context.

Participants consistently failed to differentiate between AI and human community managers, exhibiting near-chance classification rates with symmetrical confusion across conditions.

HUMA: An Architecture for Believable Interaction

The HUMA framework utilizes an event-driven architecture where incoming chat events trigger immediate processing by designated system components. This architecture eschews traditional request-response models in favor of asynchronous event handling, allowing the system to react to user input without waiting for previous operations to complete. Each chat event is treated as a discrete notification, facilitating parallel processing and reducing latency. Consequently, HUMA can dynamically generate responses based on the current state of the conversation and incoming events, enabling real-time interaction and adaptive behavior. This event-driven approach is implemented using a message queue to ensure reliable delivery and scalability of event processing.
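As a rough illustration of that pattern, the sketch below decouples event producers from a queue-draining worker using Python's standard library; the event schema and handler are assumptions for illustration, not HUMA's implementation.

```python
import queue
import threading

events = queue.Queue()  # message queue: reliable, ordered event delivery

def handle(event: dict) -> None:
    """Process one chat event; response generation would go here."""
    print(f"handling {event['type']} from {event['user']}")

def worker() -> None:
    """Drain the queue asynchronously, decoupled from event producers."""
    while True:
        event = events.get()  # blocks until an event arrives
        handle(event)
        events.task_done()

threading.Thread(target=worker, daemon=True).start()

# Producers enqueue events without waiting for processing to finish.
events.put({"type": "message", "user": "ana", "text": "hi all"})
events.put({"type": "message", "user": "ben", "text": "hey!"})
events.join()  # wait until queued events are processed
```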

The HUMA framework builds upon the existing MUCA (Multi-User Conversational Agent) architecture by adding functionalities specifically designed for handling interactions involving multiple users. These enhancements include improved state management to track individual user contexts within a shared conversation, mechanisms for resolving conflicting user requests, and advanced scheduling algorithms to manage turn-taking and ensure responsive communication across all participants. Furthermore, HUMA incorporates features for broadcasting messages to specific user groups and dynamically adjusting conversational strategies based on the evolving group dynamics and individual user behaviors, all while maintaining scalability for a large number of concurrent users.

Strategy routing within the HUMA architecture functions by evaluating incoming chat events against a defined set of conversational strategies, each associated with specific contextual triggers and time constraints. The system dynamically ranks these strategies based on the current dialogue state, user input, and elapsed time since the last interaction. This ranking process utilizes a scoring function that considers factors such as keyword matches, intent recognition confidence levels, and the relevance of the strategy to the overall conversation goal. The highest-scoring strategy is then selected, and its corresponding response generation module is activated to produce the next turn in the dialogue. This allows HUMA to prioritize responses that are both contextually appropriate and timely, facilitating more natural and believable interactions.
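A minimal sketch of such scoring and ranking might look like the following, with an invented strategy registry and illustrative weights (the paper does not publish its scoring function):

```python
import time

# Hypothetical strategy registry: trigger keywords plus a staleness window.
STRATEGIES = {
    "answer_question": {"keywords": {"how", "why", "what"}, "max_age_s": 30},
    "welcome_newcomer": {"keywords": {"hi", "hello", "joined"}, "max_age_s": 120},
    "stay_silent": {"keywords": set(), "max_age_s": float("inf")},
}

def score(spec: dict, message: str, age_s: float) -> float:
    """Combine keyword relevance with a timeliness gate (illustrative weights)."""
    words = set(message.lower().split())
    relevance = len(words & spec["keywords"])
    timeliness = 1.0 if age_s <= spec["max_age_s"] else 0.0
    return relevance * timeliness

def route(message: str, event_time: float) -> str:
    age_s = time.time() - event_time
    ranked = sorted(
        STRATEGIES.items(),
        key=lambda kv: score(kv[1], message, age_s),
        reverse=True,
    )
    return ranked[0][0]  # highest-scoring strategy wins

print(route("hi everyone, just joined", time.time()))  # -> welcome_newcomer
```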

Simulating Human Nuance: Timing and Interruption

HUMA’s architecture simulates human response times by introducing variable delays before generating each message. These delays are not fixed but are dynamically adjusted based on factors such as message length and complexity, mirroring the cognitive processing time of a human. Critically, the system is designed to be interruptible; incoming user input can preempt the ongoing message generation process, resulting in incomplete responses or a shift in conversational focus, similar to how humans are often interrupted during speech. This intentional introduction of latency and the capacity for interruption are core elements in creating a more believable and engaging conversational AI, as it avoids the characteristic instantaneous responses of traditional chatbots.
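One plausible realization is a cancellable task whose delay scales with reply length, sketched here with `asyncio`; the words-per-second pacing constant is an assumption, not a figure from the paper.

```python
import asyncio

WORDS_PER_SECOND = 3.0  # assumed "typing speed"; not from the paper

async def respond(text: str) -> None:
    """Wait a length-dependent delay, then 'send' - unless interrupted."""
    delay = len(text.split()) / WORDS_PER_SECOND
    await asyncio.sleep(delay)  # cancellation point
    print(f"agent: {text}")

async def main() -> None:
    task = asyncio.create_task(respond("Here is a fairly long, considered answer."))
    await asyncio.sleep(0.5)  # a new user message arrives mid-delay...
    task.cancel()             # ...preempting the pending response
    try:
        await task
    except asyncio.CancelledError:
        print("response interrupted; rerouting with new context")

asyncio.run(main())
```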

Typing indicators within the HUMA system function as visual cues designed to simulate the temporal aspects of human communication. These indicators, typically represented by an animation signifying ongoing text composition, provide feedback to the user that the AI is actively processing and formulating a response. This simulates the latency inherent in human typing and thought processes, increasing the perception of responsiveness and reducing the feeling of interacting with a purely algorithmic system. The presence of a typing indicator contributes to a sense of ‘presence’ by mirroring the conversational rhythms of human interaction and signaling ongoing engagement, thereby enhancing the user’s feeling of connection with the AI.
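Driving the indicator from the same length-based pacing could look like this sketch, where the typing rate and callback names are hypothetical:

```python
import time

CHARS_PER_SECOND = 8.0  # assumed typing rate, for illustration only

def send_with_indicator(reply, show_typing, send):
    """Show a typing indicator for a plausible composition time, then send."""
    show_typing(True)
    time.sleep(len(reply) / CHARS_PER_SECOND)  # simulate composition latency
    show_typing(False)
    send(reply)

send_with_indicator(
    "Welcome aboard!",
    show_typing=lambda on: print("[typing...]" if on else "[stopped typing]"),
    send=print,
)
```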

Timeliness regularization within HUMA’s strategy routing functions by introducing controlled variability into response times. Instead of consistently selecting the fastest or most efficient response strategy, the system incorporates a degree of randomness, drawing from a distribution of acceptable delays. This prevents HUMA from establishing a rigid response pattern, which would be readily identifiable as artificial. By modulating response latency even when optimal strategies are available, the system mimics the natural hesitations and variations inherent in human conversation, contributing to increased behavioral diversity and a less predictable interaction experience. This regularization is applied at the routing level, meaning the choice of strategy is influenced by timeliness considerations, not the execution speed of a single strategy.
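A minimal sketch of such jitter, assuming a log-normal delay distribution (a common model for human response latencies, though the paper does not specify one):

```python
import random

def regularized_delay(base_delay_s: float) -> float:
    """Jitter a strategy's nominal delay so response timing stays varied.

    Log-normal noise keeps delays positive and right-skewed, loosely
    matching human latency patterns; the sigma value is an assumption.
    """
    return base_delay_s * random.lognormvariate(mu=0.0, sigma=0.4)

# Even with a fixed "optimal" 2-second delay, actual delays vary:
print([round(regularized_delay(2.0), 2) for _ in range(5)])
```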

Participants reported consistently similar experiences with both human and AI community managers, as indicated by small effect sizes and overlapping distributions across survey measures.

Evaluating Social Presence and Impact

A focused user study was undertaken within the active Leonardo AI Community to rigorously evaluate the performance of the HUMA agent. Participants were strategically recruited through Prolific, a platform enabling access to a diverse and representative pool of individuals for research purposes. This approach ensured a broad range of perspectives were included in the assessment, bolstering the validity of the findings related to HUMA’s social capabilities and user perception. The study’s design prioritized ecological validity by embedding the agent within a real-world online community, allowing for observation of natural interactions and authentic responses from participants.

The core of the evaluation centered on determining whether HUMA could convincingly establish social presence – that is, the sensation among users that an AI agent is a genuine, relatable social entity. This wasn’t merely about technical functionality, but about fostering a feeling of connection and authentic interaction. Researchers assessed this by observing how participants responded to HUMA’s communications within the Leonardo AI community, analyzing whether behaviors mimicked those expected from a human community manager. The findings suggest HUMA successfully cultivated this sense of being ‘present’ – users demonstrated an inability to consistently differentiate between the AI and their human counterparts, indicating a strong capacity for the agent to be perceived as a credible and engaging social actor.

A recent study explored the capacity of the HUMA AI agent to convincingly simulate a human community manager, revealing a remarkable level of parity with actual human facilitators. Participants engaged with either an AI-driven or human-operated community manager, and subsequent analysis demonstrated an inability to reliably distinguish between the two – detection rates hovered around chance levels, at 55.4% for the AI condition and 46.7% for the human condition. This suggests that HUMA successfully replicates key elements of human interaction, blurring the lines between artificial and authentic social presence and indicating a significant advancement in the creation of convincingly human-like AI agents.

Evaluations of community manager performance revealed a remarkably narrow gap between human facilitators and the HUMA AI agent. Human community managers achieved an average effectiveness score of 4.48, while HUMA registered a score of 4.14 on a 5-point scale; this represents a difference of only 0.34 points. Statistical analysis, using Cohen’s d, indicated a small effect size of −0.37, suggesting that while a slight difference exists, the AI’s performance was convincingly close to that of its human counterparts in fulfilling the role of community manager. This finding underscores HUMA’s capacity to effectively engage with and support online communities, approaching human-level proficiency in this complex social task.

Evaluations revealed a striking similarity in how users perceived the social presence of human and AI community managers. Scoring an average of 4.89 (with a standard deviation of 0.87) for human managers and 4.71 (SD=0.83) for the AI, HUMA successfully fostered a feeling of genuine interaction. This near-parity, indicated by a Cohen’s d of −0.21, suggests that participants experienced the AI not merely as a functional tool, but as a relatable social entity within the online community. The findings demonstrate HUMA’s capacity to establish a convincing sense of ‘being there’ and engaging in authentic communication, mirroring the effect of a human facilitator.
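For readers checking the arithmetic, the reported value is consistent with the pooled-standard-deviation form of Cohen's d under an equal-group-size assumption:

```python
from math import sqrt

def cohens_d(m1, s1, m2, s2):
    """Cohen's d with a pooled SD, assuming equal group sizes."""
    pooled = sqrt((s1**2 + s2**2) / 2)
    return (m2 - m1) / pooled

# Social presence: human M=4.89 (SD=0.87) vs. AI M=4.71 (SD=0.83)
print(round(cohens_d(4.89, 0.87, 4.71, 0.83), 2))  # -> -0.21
```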

Evaluations revealed a high degree of user satisfaction with both human and AI-driven community management, registering scores of 4.58 (with a standard deviation of 0.97) for human community managers and 4.32 (SD=0.78) for the AI, HUMA. While a slight difference of 0.26 points exists on the 5-point scale, the calculated Cohen’s d of −0.30 suggests this disparity is not substantial, indicating HUMA effectively fostered positive user experiences comparable to those delivered by human moderators. This finding is particularly notable as it demonstrates the AI’s capacity to maintain engagement levels and generate satisfaction within the online community, contributing to a seamless and positive interaction for participants.

A key finding of the study revealed that HUMA’s interactions successfully navigated the complexities of human perception, effectively avoiding the unsettling phenomenon known as the uncanny valley. This suggests that the AI agent’s behavioral design – encompassing its responses, conversational style, and overall presentation – struck a balance that fostered a positive user experience. Rather than eliciting feelings of eeriness or revulsion often associated with near-human representations, HUMA’s performance maintained user engagement and satisfaction, indicating a crucial step forward in the development of AI capable of establishing genuine social connection without triggering negative emotional responses.

Individual item scores assessing community manager facilitation quality reveal modest trends favoring human managers, but the overall differences across items are minimal, indicating that the automated system was comparably effective to its human counterparts.

The development of HUMA underscores a fundamental principle of effective system design: elegance through simplicity. The agent’s success isn’t rooted in complex algorithms attempting to replicate human nuance, but in a carefully constructed simulation of timing and social presence. As Marvin Minsky observed, “The more you know, the more you realize how much more you don’t.” This holds true for creating believable AI; HUMA demonstrates that focusing on a few key behavioral elements – specifically asynchronous chat dynamics and event-driven responses – can achieve surprisingly human-like results. The research highlights that a well-structured system, prioritizing core interaction patterns, can be far more effective than striving for exhaustive, and ultimately fragile, imitation of human complexity.

What Lies Ahead?

The pursuit of convincingly human-like agents, as demonstrated by HUMA, inevitably reveals the brittleness of ‘intelligence’ as currently conceived. The system achieves a surface-level parity in asynchronous chat, yet the underlying mechanics remain a complex choreography of timing simulations and event-driven responses. True scalability isn’t about increasing computational power, but about reducing the need for such intricate mimicry. A more robust architecture will necessitate a deeper understanding of the principles governing social interaction – not as a series of predictable inputs, but as an emergent property of a complex system.

Current evaluations rely on subjective judgments of ‘humanness’. While valuable as a first step, this metric is inherently circular. The goal shouldn’t be to fool participants, but to create agents that genuinely contribute to the collaborative process. Future work must shift focus towards objective measures of group performance – how does an AI facilitator impact decision-making, information sharing, or creative problem-solving? The ecosystem of the chat itself is the key, and the agent must be measured by its contribution to the overall health of that ecosystem.

Ultimately, the challenge isn’t building agents that appear human, but agents that understand – and respect – the fundamental principles of social coordination. Simplicity, not sophistication, will define the next generation. A truly scalable system won’t need to simulate humanity; it will embody principles of effective communication, allowing genuine interaction to emerge naturally.


Original article: https://arxiv.org/pdf/2511.17315.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
