Author: Denis Avetisyan
Researchers have developed an AI agent capable of seamlessly participating in and facilitating multi-user online conversations, blurring the lines between human and artificial interaction.

This paper details the design and evaluation of HUMA, a convincingly human-like AI facilitator for asynchronous group chats, demonstrating near-parity with human facilitators in subjective user experience.
While conversational agents excel in one-on-one interactions, replicating natural dynamics within asynchronous group chats remains a significant challenge. This paper, "The Humanlike Multi-user Agent (HUMA): Designing a Deceptively Human AI Facilitator for Group Chats", describes an LLM-based system built to facilitate multi-party conversations with realistic timing and human-like strategies. Evaluations reveal that HUMA achieves near-parity with human community managers in subjective experience and is surprisingly difficult for participants to identify as non-human. Could such convincingly human AI facilitators redefine engagement and trust within online communities?
The Challenge of Replicating Natural Group Dialogue
Current chatbot technology frequently struggles to replicate the subtleties of human dialogue when integrated into group conversations. These systems often exhibit a rigidity in response, failing to acknowledge conversational cues or adapt to the evolving dynamics of a multi-person exchange. This results in interactions that feel transactional rather than collaborative, as the bot doesn’t demonstrate the ability to build upon previous statements or infer shared context. The lack of nuanced understanding extends to the inability to recognize humor, sarcasm, or emotional undertones, leading to responses that can feel inappropriate or tone-deaf within a social setting. Consequently, these bots often disrupt the natural flow of conversation, highlighting their artificiality and hindering genuine engagement.
Achieving truly naturalistic conversation within a group chat environment presents a considerable challenge for artificial intelligence. Current systems struggle with the delicate balance of turn-taking – knowing when to respond without interrupting or dominating the flow – and often fail to respond appropriately to multiple participants at once. Unlike one-on-one interactions, group chats demand an AI capable of tracking several conversational threads, discerning relevant information from each, and formulating coherent, contextually appropriate contributions. This necessitates moving beyond simple response generation to a system that actively manages conversational state, prioritizes interactions, and exhibits a nuanced understanding of social dynamics, a feat requiring significant advances in natural language processing and artificial intelligence architecture.
The distinctive rhythm of group conversations presents a unique challenge for artificial intelligence; unlike one-on-one exchanges demanding immediate replies, group chats unfold with staggered responses and interwoven threads. This asynchronous environment necessitates more than reactive AI; a truly natural participant must model awareness of conversational context over time, remember prior exchanges even after a delay, and contribute relevantly without disrupting the flow. Simply responding to the most recent message isn’t enough; the system needs to infer relationships between contributions, understand implied meanings, and strategically insert itself into the discussion, a feat requiring sophisticated memory networks and predictive modeling of conversational dynamics to mimic the subtle art of human participation.

HUMA: An Architecture for Believable Interaction
The HUMA framework utilizes an event-driven architecture wherein incoming chat events trigger immediate processing by dedicated system components. This design bypasses traditional request-response models, allowing for asynchronous handling of user inputs and system-generated messages. Each chat event is treated as a discrete signal, facilitating rapid analysis of content and context. This real-time processing capability enables dynamic response generation, as the system can react to evolving conversational states without waiting for complete request cycles. The architecture supports parallel processing of multiple events, contributing to scalability and responsiveness in multi-user interactions.
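As a concrete illustration of this event-driven pattern, the sketch below dispatches each incoming chat event to its own asynchronous handler instead of waiting on a request-response cycle. The event fields and handler are assumptions made for illustration, not components taken from the paper.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ChatEvent:
    """A single incoming signal: one user message in the group chat."""
    user: str
    text: str

async def handle_event(event: ChatEvent) -> None:
    # Placeholder for downstream HUMA-style components: context analysis,
    # strategy routing, and (possibly) response generation.
    print(f"processing message from {event.user}: {event.text!r}")

async def event_loop(queue: "asyncio.Queue[ChatEvent]") -> None:
    """Dispatch each chat event as its own task so events are handled in parallel."""
    while True:
        event = await queue.get()
        asyncio.create_task(handle_event(event))

async def main() -> None:
    queue: "asyncio.Queue[ChatEvent]" = asyncio.Queue()
    asyncio.create_task(event_loop(queue))
    await queue.put(ChatEvent("alice", "Has anyone tried the new model?"))
    await queue.put(ChatEvent("bob", "Yes, results look promising."))
    await asyncio.sleep(0.1)  # give the handlers time to run in this toy example

if __name__ == "__main__":
    asyncio.run(main())
```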
The HUMA framework builds upon the foundation of the MUCA (Multi-User Conversational Agent) framework by adding functionalities specifically designed to handle the increased complexity of interactions involving multiple participants. These enhancements include improved state management to track individual user contexts within a shared conversation, mechanisms for resolving conflicting user requests, and features to facilitate collaborative tasks or shared decision-making. Specifically, HUMA introduces a distributed event queue and a prioritized message handling system to ensure timely and consistent responses to all users, even under high conversational load. Furthermore, it incorporates advanced user modeling capabilities to better understand individual preferences and adapt interactions accordingly, improving the overall coherence and relevance of multi-user dialogues.
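The paper describes a prioritized message handling system but not its internals; as a sketch, a priority queue keyed on urgency with arrival order as a tie-breaker would give the behavior described. The priority values below are illustrative assumptions.

```python
import heapq
import itertools

class PrioritizedInbox:
    """Orders pending chat events so more urgent ones are handled first;
    equal priorities fall back to arrival order (illustrative, not the paper's design)."""

    def __init__(self) -> None:
        self._heap: list = []
        self._order = itertools.count()  # monotonically increasing tie-breaker

    def push(self, event: dict, priority: int) -> None:
        # Lower number = more urgent (e.g. 0 = a direct question to the facilitator).
        heapq.heappush(self._heap, (priority, next(self._order), event))

    def pop(self) -> dict:
        return heapq.heappop(self._heap)[-1]

inbox = PrioritizedInbox()
inbox.push({"user": "carol", "text": "Anyone around?"}, priority=2)
inbox.push({"user": "dave", "text": "@facilitator can you summarize the thread?"}, priority=0)
print(inbox.pop())  # dave's direct request to the facilitator is handled first
```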
Strategy routing within the HUMA architecture functions as a dynamic selection process for determining the optimal conversational flow. This mechanism assesses incoming chat events based on both contextual information – including previous turns in the conversation, user profiles, and identified intents – and temporal factors, such as the urgency of the request or elapsed time since the last user input. The routing process prioritizes strategies, or pre-defined conversational paths, based on a scoring system that weighs these contextual and timeliness variables, ensuring the system selects the most relevant and responsive strategy for each interaction. This allows HUMA to move beyond simple keyword matching and engage in more nuanced, context-aware dialogue management.
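The paper does not publish the routing algorithm itself, but a minimal version of this kind of scoring can be sketched as strategies that rate their own fit against the current context and the highest score winning; the strategy names, weights, and context fields below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Context:
    seconds_since_last_message: float
    addressed_to_agent: bool
    open_question_pending: bool

# Each candidate strategy scores itself against the current context; higher wins.
STRATEGIES: Dict[str, Callable[[Context], float]] = {
    "answer_direct_question": lambda c: 2.0 if c.addressed_to_agent else 0.0,
    "revive_stalled_thread":  lambda c: min(c.seconds_since_last_message / 600.0, 1.5),
    "stay_silent":            lambda c: 0.5 if not c.open_question_pending else 0.1,
}

def route(context: Context) -> str:
    """Pick the strategy with the highest score for this context."""
    return max(STRATEGIES, key=lambda name: STRATEGIES[name](context))

print(route(Context(seconds_since_last_message=30, addressed_to_agent=True,
                    open_question_pending=True)))   # -> answer_direct_question
print(route(Context(seconds_since_last_message=1200, addressed_to_agent=False,
                    open_question_pending=False)))  # -> revive_stalled_thread
```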
Mimicking Human Nuance: Timing and Interruptibility
HUMA’s architecture incorporates simulated delays in response generation to replicate human conversational pacing. These delays are not static; they are dynamically adjusted based on factors such as message length and complexity, introducing variability comparable to human typing speeds. Critically, the system is designed to be interruptible; incoming user messages during response generation can halt the current output, allowing HUMA to address the new input before resuming or abandoning the initial response. This vulnerability to interruption, coupled with the realistic timing, contributes to a more believable and natural conversational flow, distinguishing HUMA from systems that deliver instantaneous, uninterrupted responses.
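A minimal sketch of this combination of length-dependent delay and interruptibility, assuming an asyncio-based agent in which a newer user message cancels the reply that is still "being typed" (the typing speed constant is an assumption, not a value from the paper):

```python
import asyncio
from typing import Awaitable, Callable, Optional

TYPING_SPEED_CPS = 12.0  # assumed characters per second, tuned to feel human

async def send_after_typing_delay(reply: str, send: Callable[[str], Awaitable[None]]) -> None:
    """Wait a length-dependent 'typing' delay, then send; cancellable at any point."""
    await asyncio.sleep(len(reply) / TYPING_SPEED_CPS)  # cancellation here drops the reply
    await send(reply)

class InterruptibleResponder:
    def __init__(self, send: Callable[[str], Awaitable[None]]) -> None:
        self._send = send
        self._pending: Optional[asyncio.Task] = None

    def on_new_draft(self, draft_reply: str) -> None:
        # A newer incoming message arrived: abandon the reply still being "typed"
        # and start composing against the updated conversational state.
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        self._pending = asyncio.create_task(send_after_typing_delay(draft_reply, self._send))
```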
Typing indicators within the HUMA interface function as visual signals to the user, communicating that the AI is actively processing information and formulating a response. These indicators, typically represented by animated ellipses or a cursor, mimic the temporal characteristics of human typing, thereby creating the impression of an ongoing cognitive process. This technique addresses the potential for perceived latency in AI responses, as the visual cue provides feedback and reduces the user’s sense of waiting. By simulating the time it takes for a human to compose a message, typing indicators contribute to a more natural and engaging conversational flow, enhancing the feeling of real-time interaction and increasing the AI’s perceived presence within the dialogue.
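Building on the sketch above, a typing indicator can be refreshed in short bursts while the simulated composition time elapses; `send_typing` and `send_message` stand in for whatever the chat platform actually exposes and are assumptions here, not APIs from the paper.

```python
import asyncio
import random

async def reply_with_typing_indicator(text: str, send_typing, send_message,
                                      cps_range=(8.0, 14.0), burst_seconds=4.0) -> None:
    """Keep a typing indicator visible while the simulated compose time elapses,
    then deliver the message (many chat platforms expire the indicator after a
    few seconds, hence the repeated bursts)."""
    remaining = len(text) / random.uniform(*cps_range)  # length-dependent compose time
    while remaining > 0:
        await send_typing()
        step = min(burst_seconds, remaining)
        await asyncio.sleep(step)
        remaining -= step
    await send_message(text)
```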
Timeliness regularization within HUMA’s strategy routing is achieved by introducing controlled variability in response times. Rather than consistently selecting the fastest or most efficient response, the system incorporates randomized delays calibrated to a defined distribution. This prevents the AI from exhibiting a uniform response cadence, which would reveal its non-human nature. By modulating the timing of actions – such as sending messages or initiating follow-up questions – based on probabilistic parameters within the routing algorithm, HUMA simulates the natural hesitations and fluctuations inherent in human communication, thus increasing behavioral diversity and reducing predictability.
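A minimal sketch of such jitter, assuming a log-normal multiplier around a nominal delay with a floor and a cap; the specific distribution and parameters are assumptions, since the paper only states that delays are randomized against a defined distribution.

```python
import random

def sample_response_delay(base_seconds: float, sigma: float = 0.4,
                          floor: float = 1.0, cap: float = 90.0) -> float:
    """Draw a randomized delay around `base_seconds` so the agent's cadence
    varies instead of firing at a fixed, machine-like interval."""
    jittered = base_seconds * random.lognormvariate(0.0, sigma)
    return max(floor, min(cap, jittered))

# Example: the same nominal 20 s delay yields a different actual wait each time.
print([round(sample_response_delay(20.0), 1) for _ in range(5)])
```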

Evaluating Social Presence and the Illusion of Interaction
A dedicated user study was undertaken within the active Leonardo AI Community to rigorously assess the capabilities of the HUMA agent. Participants were carefully recruited through Prolific, a platform specializing in research participant sourcing, ensuring a diverse and representative sample. This approach allowed researchers to gather data directly from individuals already engaged with AI-driven creative tools, providing valuable insights into how a virtual community manager would be perceived within a relevant context. The study design prioritized ecological validity, simulating real-world interactions to gauge the authenticity and effectiveness of HUMA’s social presence and overall performance within the community setting.
The core of the evaluation centered on determining if HUMA, an AI agent, could successfully cultivate social presence – that nuanced feeling of interacting with another genuine entity. Researchers investigated whether participants perceived HUMA as a believable social actor within the Leonardo AI Community, assessing the AI’s capacity to establish a relatable and engaging connection. This wasn’t merely about technical functionality, but rather a judgment of HUMA’s ability to evoke the same sense of ‘being there’ and shared experience typically associated with human interaction. The study aimed to quantify this perception, moving beyond simple task completion to measure the quality of the social connection established between users and the AI facilitator.
A key finding of the study reveals that the HUMA agent demonstrated a remarkable ability to emulate human interaction, achieving near-parity with human community managers in the eyes of participants. Evaluation showed that individuals struggled to reliably differentiate between the AI and human facilitators, with detection rates hovering around chance levels – 55.4% for the AI condition and 46.7% for the human condition. This suggests that HUMA successfully generated responses and engaged in communication patterns that closely mirrored those of a human, blurring the lines between artificial and genuine social presence and indicating a high degree of sophistication in its conversational capabilities.
Evaluations of community manager performance revealed a remarkably small difference between human and AI facilitators. Human community managers achieved an average effectiveness score of 4.48, while the AI, HUMA, scored 4.14 on a 5-point scale. Although a 0.34 point difference exists, statistical analysis using Cohen’s d yielded a value of −0.37, indicating a small to medium effect size. This suggests that, from the perspective of participants, the AI’s performance as a community manager was perceived as comparable to that of a human, successfully fulfilling many of the same functional roles within the online community and demonstrating a level of proficiency that closely mirrored human interaction.
Evaluations revealed a remarkable degree of parity in perceived social presence between human and AI community managers. Participants rated both human CMs and the HUMA AI on a scale measuring the feeling of genuine social connection, yielding scores of 4.89 (with a standard deviation of 0.87) and 4.71 (SD=0.83) respectively. This small difference, represented by a Cohen’s d of −0.21, indicates that users experienced a comparable sense of being engaged with a real, interactive entity – whether that entity was a person or an artificial intelligence. The findings suggest HUMA effectively simulates the qualities that contribute to social presence, fostering a sense of connection and believability within the online community.
User engagement and satisfaction levels, assessed on a five-point scale, demonstrated a notable closeness between interactions with human and AI community managers. Human community managers achieved an average score of 4.58, accompanied by a standard deviation of 0.97, while the AI, HUMA, garnered a score of 4.32 with a standard deviation of 0.78. Though a slight difference of 0.26 points exists, the calculated Cohen’s d of −0.30 suggests this disparity is not substantial, indicating that HUMA effectively fostered a positive experience for users comparable to that of a human facilitator. These results highlight the potential for AI agents to contribute meaningfully to online community management, maintaining high levels of user satisfaction without significant deviation from human-led interactions.
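For the two measures that report both means and standard deviations, the effect sizes above can be reproduced as a standardized mean difference using an equal-weight pooled standard deviation; the short check below is a sketch of that arithmetic, not the authors' analysis script.

```python
from math import sqrt

def cohens_d(mean_ai: float, sd_ai: float, mean_human: float, sd_human: float) -> float:
    """Standardized mean difference (AI minus human), equal-weight pooled SD."""
    pooled_sd = sqrt((sd_ai ** 2 + sd_human ** 2) / 2)
    return (mean_ai - mean_human) / pooled_sd

print(f"social presence d = {cohens_d(4.71, 0.83, 4.89, 0.87):.2f}")  # -0.21
print(f"engagement d      = {cohens_d(4.32, 0.78, 4.58, 0.97):.2f}")  # -0.30
```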
A key finding of the evaluation centered on HUMA’s ability to navigate the complexities of social interaction without triggering a negative response from users; specifically, the system successfully avoided eliciting the “uncanny valley” effect. This phenomenon, where near-human representations evoke feelings of unease, was carefully considered throughout HUMA’s development, and the results demonstrate a positive user experience. Participants did not report feelings of revulsion or discomfort when interacting with the AI, suggesting that HUMA’s behavioral design – encompassing language, response timing, and conversational style – effectively created a believable and engaging social presence. This avoidance of the uncanny valley is crucial for the long-term viability of AI-driven community management, indicating that users can readily accept and interact with an AI agent in a social context without experiencing subconscious aversion.

The development of HUMA highlights a crucial principle: effective systems aren’t built through sheer complexity, but through mirroring natural interaction. This pursuit of believability, where an AI facilitator achieves near-parity with human performance in multi-party dialogue, demonstrates a focus on fundamental structure over elaborate features. As Blaise Pascal observed, “The eloquence of the body is in its simplicity.” HUMA’s design, centered on simulating timing and asynchronous chat dynamics, key elements of human conversation, echoes this sentiment. By prioritizing these core behavioral characteristics, the system achieves a surprisingly human presence, illustrating how elegance in design can emerge from understanding and replicating the essential elements of a natural system.
Beyond Mimicry
The demonstrated success of HUMA in approaching human-level facilitation highlights a crucial, and often overlooked, point: believability is not the same as intelligence. The system excels at appearing to participate naturally, but this rests on simulating timing and conversational cues – a veneer of social presence layered atop a fundamentally reactive architecture. The next challenge isn’t simply to refine this mimicry, but to address the limitations inherent in event-driven systems. A truly robust agent will require a model of intentionality, a capacity to anticipate group needs rather than solely respond to expressed ones.
Currently, dependencies on precisely calibrated timing and stylistic choices represent the true cost of HUMA’s ‘freedom’ to operate within a natural conversation. Scale this system beyond controlled experiments, introduce genuine ambiguity or unforeseen conversational tangents, and the architecture will quickly reveal its fragility. The pursuit of ever more realistic simulation risks an infinite regress of increasingly complex heuristics. A more fruitful direction lies in identifying the minimal set of behaviors that consistently signal helpful facilitation, and building from that foundation.
Ultimately, the value of such agents isn’t in their ability to fool humans, but in their potential to augment group dynamics. The focus should shift from creating a perfect imitation of a human facilitator, to designing a complementary intelligence – one that transparently handles logistical tasks, surfaces relevant information, and gently guides conversations towards productive outcomes. Good architecture, after all, is invisible until it breaks, and the true measure of success will be when HUMA’s assistance fades into the background of a smoothly functioning group.
Original article: https://arxiv.org/pdf/2511.17315.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/