Author: Denis Avetisyan
Researchers are exploring how to imbue teams of robots with consistent personalities and long-term memory to foster more natural and engaging interactions with humans.

This review details an LLM-driven multi-agent framework for personalized human-robot interaction, emphasizing the importance of coordinated behavior and consistent robot identities.
While multi-robot systems promise enhanced utility in social environments, existing approaches often overlook the impact of individual robot identity on user perception and coordination. To address this, we present ‘M2HRI: An LLM-Driven Multimodal Multi-Agent Framework for Personalized Human-Robot Interaction’, a novel framework leveraging large language models to imbue each robot with a distinct personality and long-term memory, guided by a centralized coordination mechanism. Our user study ([latex]n=105[/latex]) demonstrates that LLM-driven personality and memory significantly improve interaction quality and personalization, while coordinated behavior reduces redundancy. How can we further refine these principles to create truly adaptive and engaging multi-agent human-robot interactions in complex, real-world scenarios?
The Illusion of Understanding: Why Robots Still Feel Alien
Human-robot interaction frequently suffers from a perceived awkwardness stemming from an agent’s incomplete grasp of the situation at hand. Robots often struggle to interpret subtle cues – a fleeting glance, an unfinished sentence, or the broader environmental context – which humans effortlessly utilize to navigate social interactions. This limitation leads to responses that, while technically correct, can feel tonally inappropriate or simply miss the mark. Current systems typically focus on direct command execution, rather than inferring intent or anticipating needs, resulting in interactions that lack the fluidity and responsiveness characteristic of human-to-human communication. Consequently, even functionally capable robots can appear clumsy or insensitive, hindering the development of genuine collaboration and trust.
As human-robot interaction expands beyond single agents, a significant challenge emerges in orchestrating cohesive and understandable multi-agent systems. Coordinating the behaviors of multiple robots – or a blend of robots and humans – demands more than simply task allocation; it requires anticipating the actions of others, resolving potential conflicts, and presenting a unified, predictable front to human collaborators. The complexity escalates rapidly with each additional agent, introducing combinatorial challenges in communication, planning, and execution. Maintaining coherent interactions hinges on developing methods for shared situational awareness and the ability to dynamically adapt to unforeseen circumstances, preventing the system from appearing chaotic or unresponsive. Successfully navigating this scaling problem is crucial for deploying robotic teams in real-world scenarios, from collaborative manufacturing to complex search and rescue operations.

Building the Echo Chamber: A Framework for Simulated Sentience
The M2HRI framework addresses multi-agent interaction through the combined analysis of multiple data streams, specifically integrating both natural language processing (NLP) and multimodal perception. This integration allows agents to process information beyond text, including visual data, audio cues, and other sensory inputs. By combining NLP for understanding and generating language with multimodal perception for interpreting non-linguistic signals, M2HRI facilitates more nuanced and comprehensive interaction between agents. This holistic approach enables agents to better understand the intent, state, and environment of other agents, leading to more effective collaboration and communication.
The Agent Identity Profile within the M2HRI framework is constructed from three core memory systems. Semantic memory stores general world knowledge and factual information relevant to the agent’s operation. Episodic memory records specific experiences and events, creating a personal history for the agent. Finally, working memory provides a temporary storage and manipulation space for information currently being processed, enabling the agent to maintain context and respond to dynamic situations. This layered memory architecture allows for both long-term knowledge retention and short-term contextual awareness, forming a comprehensive representation of the agent’s identity and facilitating intelligent interactions.
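As a hypothetical illustration of this layered architecture (class and field names are assumptions for this sketch, not details taken from the paper), the three memory systems might be organized as:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AgentIdentityProfile:
    """Sketch of the three-part memory: semantic (facts), episodic
    (experiences), and working (short-lived context)."""
    semantic: dict = field(default_factory=dict)   # general world knowledge
    episodic: list = field(default_factory=list)   # chronological experience log
    working: deque = field(default_factory=lambda: deque(maxlen=8))  # recent context

    def remember_event(self, event: str) -> None:
        # New experiences enter both long-term episodic memory and the
        # bounded working-memory window.
        self.episodic.append(event)
        self.working.append(event)

    def recall(self, keyword: str) -> list:
        # Naive keyword lookup over episodic memory; a real system would
        # likely use embedding-based retrieval instead.
        return [e for e in self.episodic if keyword in e]

profile = AgentIdentityProfile(semantic={"owner_name": "Alex"})
profile.remember_event("user Alex asked for jazz music")
print(profile.recall("jazz"))  # -> ['user Alex asked for jazz music']
```

The bounded `deque` stands in for working memory's limited capacity, while the unbounded episodic list preserves long-term history.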
The M2HRI framework utilizes Large Language Models (LLMs) and Vision-Language Models (VLMs) as core components for interpreting and responding to environmental input. LLMs process and generate textual data, enabling contextual understanding and coherent response formulation. VLMs extend this capability to multimodal data, specifically integrating visual information with language processing. This integration allows the framework to analyze complex sensory data, including images and video, and generate responses grounded in both visual perception and linguistic understanding. The combination of LLMs and VLMs facilitates the processing of diverse data streams, enabling agents to perceive, interpret, and react to their surroundings in a contextually relevant manner.
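A minimal sketch of how such a pipeline might ground a language model's prompt in both visual and identity context (the `vlm_describe` stub and the prompt format are assumptions for illustration, not the paper's implementation):

```python
def vlm_describe(image_bytes: bytes) -> str:
    """Stub standing in for a vision-language model call that turns an
    image into a scene description."""
    return "a person pointing at the red cup on the table"

def build_llm_prompt(utterance: str, image_bytes: bytes, profile_summary: str) -> str:
    # Fuse the VLM's scene description, the agent's identity, and the
    # user's words into a single grounded prompt for the LLM.
    scene = vlm_describe(image_bytes)
    return (
        f"Scene: {scene}\n"
        f"Agent identity: {profile_summary}\n"
        f"User said: {utterance}\n"
        "Respond in character, grounded in the scene."
    )

print(build_llm_prompt("Which one do I mean?", b"", "curious, extraverted"))
```

The point of the sketch is the fusion step: non-linguistic perception is serialized into text so a single LLM call can condition on all modalities at once.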
The Illusion of Coherence: Memory, Personality, and Controlled Responses
Within the M2HRI framework, centralized coordination is implemented to regulate conversational turn-taking and maintain dialogue coherence among multiple agents. This system functions by actively managing which agent is allocated the conversational “floor” at any given time, preventing interruptions and ensuring a logical progression of topics. The centralized approach allows for global awareness of the conversation state, enabling the system to anticipate potential conflicts or overlaps in agent responses. This contrasts with decentralized methods where agents operate independently, potentially leading to disjointed or repetitive dialogue. Effective turn-taking, facilitated by this centralized coordination, is a critical component in establishing a natural and fluid conversational experience.
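A floor-allocation mechanism of this kind can be sketched as a simple round-robin coordinator (a hypothetical illustration; the paper does not specify its scheduling policy):

```python
class CentralCoordinator:
    """One agent holds the conversational floor at a time; the
    coordinator's global view prevents overlapping responses."""
    def __init__(self, agent_ids):
        self.queue = list(agent_ids)  # turn order
        self.floor = None             # agent currently speaking, if any

    def request_floor(self, agent_id) -> bool:
        # Grant the floor only if it is free and this agent is next in line.
        if self.floor is None and self.queue and self.queue[0] == agent_id:
            self.floor = agent_id
            return True
        return False

    def release_floor(self, agent_id) -> None:
        # Free the floor and rotate the turn order.
        if self.floor == agent_id:
            self.floor = None
            self.queue.append(self.queue.pop(0))

coord = CentralCoordinator(["robot_a", "robot_b"])
print(coord.request_floor("robot_b"))  # -> False (not its turn)
print(coord.request_floor("robot_a"))  # -> True
coord.release_floor("robot_a")
print(coord.request_floor("robot_b"))  # -> True
```

Because all requests pass through one coordinator, conflicts are resolved before any agent speaks, unlike decentralized schemes where overlap is detected only after it happens.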
Agent response appropriateness within the M2HRI framework is determined by the Agent Identity Profile, a construct based on the Big Five Factor Model of personality – Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Each agent is assigned values for these traits, which then influence the selection of responses from a potentially broad range of options. The framework utilizes these trait scores to filter and prioritize responses, ensuring that the agent’s output is consistent with its defined personality and therefore perceived as appropriate to the ongoing interaction. This approach allows for the creation of agents with distinct conversational styles and predictable behavioral patterns, enhancing the realism and believability of the interaction.
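One way such trait-based filtering could work, sketched under the assumption that candidate responses carry their own trait profiles and are scored by distance to the agent's assigned Big Five values (the paper does not detail its scoring function):

```python
TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")

def response_fit(candidate_traits: dict, agent_traits: dict) -> float:
    # L1 distance over the five traits; 1/(1+d) maps it into (0, 1],
    # so a smaller distance means a better personality fit.
    d = sum(abs(candidate_traits.get(t, 0.5) - agent_traits.get(t, 0.5))
            for t in TRAITS)
    return 1.0 / (1.0 + d)

def pick_response(candidates: list, agent_traits: dict) -> str:
    # candidates: list of (text, trait_profile) pairs
    return max(candidates, key=lambda c: response_fit(c[1], agent_traits))[0]

agent = {"openness": 0.9, "extraversion": 0.8}
options = [
    ("Sure, let's try something new!", {"openness": 0.9, "extraversion": 0.8}),
    ("I'd rather stick to the plan.", {"openness": 0.2, "extraversion": 0.3}),
]
print(pick_response(options, agent))  # -> Sure, let's try something new!
```

Fixing the trait vector per agent is what yields the consistent, predictable conversational style the framework aims for.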
Agent memory is a critical component of the M2HRI framework: integrating it yielded statistically significant improvements (p < 0.001) in recall accuracy during response generation compared to conditions without memory. This improvement is evidenced by the framework’s ability to maintain contextual awareness throughout interactions, leading to more relevant and coherent dialogue. The inclusion of robust agent memory also significantly improved (p < 0.001) the system’s ability to identify and respond appropriately to individual user preferences, indicating a heightened level of personalized interaction.
Overlap Avoidance, a metric used to assess the quality of multi-agent coordination, demonstrated a very large effect size of 2.04 when comparing conditions with and without coordination mechanisms implemented within the M2HRI framework. This indicates a substantial reduction in instances of agents generating concurrent or immediately successive responses, thereby improving dialogue coherence. The effect size was calculated based on comparative analysis of agent response timings and content, with significantly fewer overlapping utterances observed in the coordinated condition. This metric is critical as high levels of overlap negatively impact user experience and indicate a lack of effective turn-taking management between agents.
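An effect size of this magnitude is consistent with a standardized mean difference such as Cohen's d; the sketch below shows one common pooled-SD formulation (the paper's exact effect-size measure and underlying data are not given, so the ratings here are toy values):

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference with a pooled standard deviation,
    one common effect-size definition."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Toy overlap-avoidance ratings: coordinated vs. uncoordinated sessions.
with_coord = [4.8, 4.6, 4.9, 4.7, 4.5]
without_coord = [3.1, 2.9, 3.4, 3.0, 3.2]
print(round(cohens_d(with_coord, without_coord), 2))
```

By convention, d around 0.8 is already "large", so a value of 2.04 indicates the coordinated and uncoordinated conditions barely overlap in their rating distributions.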
![Evaluations of memory and coordination reveal statistically significant improvements [latex]p < .01[/latex] and [latex]p < .001[/latex] across recall accuracy, preference awareness, naturalness, conversational flow, response appropriateness, and overlap avoidance when comparing the with- and without-condition means (±1 SD).](https://arxiv.org/html/2604.11975v1/images/memory_coordination_results.png)
The Inevitable Emergence of Chaos: A System Doomed to Predictable Failure
The M2HRI framework establishes a platform for investigating the nuanced interactions between robots, moving beyond simple, pre-programmed sequences to facilitate genuinely dynamic multi-agent systems. This capability stems from the framework’s capacity to model individual robot behaviors – informed by internal states like memory and personality – and then observe how these behaviors converge and diverge when robots operate in a shared environment. Consequently, researchers can analyze emergent dynamics – unexpected patterns of collaboration, competition, or even avoidance – that arise from these interactions, offering insights into collective intelligence and social behaviors. The framework doesn’t merely simulate robot-robot communication; it provides a means to study how individual ‘personalities’ influence group dynamics and the overall effectiveness of multi-agent systems in achieving complex goals.
The M2HRI framework represents a significant departure from traditional human-robot interaction by prioritizing adaptability and personalization. Rather than relying on pre-programmed responses, the system incorporates elements of memory, allowing agents to learn from past interactions and refine their behavior. This, combined with the modeling of distinct personality traits, enables robots to exhibit consistent, yet nuanced, individual characteristics. Crucially, these individual personalities aren’t merely cosmetic; they influence how agents coordinate with each other and respond to humans, fostering more natural and intuitive interactions. The result is a move away from predictable, scripted behaviors toward a dynamic interplay where robots genuinely respond to, and learn from, their environment and collaborators, paving the way for more engaging and effective human-robot partnerships.
The M2HRI framework demonstrably enables robots to not only express discernible personalities, but also to have those expressions reliably recognized by human observers. Rigorous testing revealed 93.3% accuracy in identifying agent traits relating to Openness and Extraversion – a result far exceeding chance and indicating a robust correlation between programmed behaviors and perceived personality. Further bolstering these findings, human evaluations assigned a mean score of 4.73 (on a Likert scale) to agent distinguishability based on openness, a statistically significant result ([latex]p < 0.001[/latex]) that confirms the system’s capacity to create genuinely differentiated and recognizable robotic agents.
The M2HRI framework’s potential extends beyond current demonstrations, with upcoming investigations centered on increasing the number of interacting agents within a single simulation. This scaling effort aims to reveal how emergent dynamics shift and become more complex as agent populations grow, potentially uncovering novel coordination strategies and unforeseen social behaviors. Simultaneously, researchers intend to translate the framework’s capabilities to practical applications, exploring its utility in environments like collaborative robotics in manufacturing, assistive technologies for elderly care, and even training simulations for emergency response teams, ultimately assessing its robustness and adaptability in real-world, unpredictable settings.
The pursuit of coherent multi-robot interaction, as detailed in this framework, echoes a familiar truth: systems evolve, they aren’t constructed. M2HRI’s emphasis on personality and long-term memory isn’t about building a robot’s character, but cultivating it through interaction. As Robert Tarjan observed, “Algorithms are the key to unlocking the potential of computers, but it’s the people who create and use them that truly make the difference.” This sentiment applies perfectly; the architecture provides the scaffolding, but the richness of the interaction, the illusion of genuine personality, emerges from the complex interplay of agents and humans, a dynamic far beyond any initial design. The framework’s coordinated approach merely acknowledges that even chaos benefits from a guiding hand, a temporary order amidst inevitable failures.
What Comes Next?
The framework presented here, with its emphasis on simulated individuality and centralized coordination, doesn’t solve the problem of human-robot interaction; it merely shifts the failure modes. A robot with personality is still a robot, and a carefully constructed personality will inevitably chafe against the unpredictable nature of genuine social exchange. The system attempts to model coherence; true coherence emerges from messy, unscripted interaction, and any attempt to predefine it feels… brittle. Each successful deployment is a beautifully crafted illusion, destined to be broken by the first unforeseen circumstance.
Future work will undoubtedly focus on scaling these systems: more robots, more complex environments, more nuanced personalities. But the fundamental limitation remains: the illusion of agency requires constant maintenance. The long-term memory component, while promising, merely delays the inevitable drift toward incoherence. The real challenge isn’t building better memories, but accepting that forgetting, and the resulting re-negotiation of identity, is intrinsic to social life.
Perhaps the most fruitful avenue for exploration lies not in making robots more human, but in acknowledging their inherent otherness. A system that embraces its robotic nature, signaling its limitations and operating with transparent artificiality, might, paradoxically, achieve a more authentic and sustainable form of interaction. The goal shouldn’t be to build a perfect simulacrum of a social being, but to create a partner capable of navigating the spaces between human and machine.
Original article: https://arxiv.org/pdf/2604.11975.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-04-15 09:06