Robots That Remember: Personalizing Interactions for Everyone

Author: Denis Avetisyan

A new framework enables socially assistive robots to dynamically adapt to the needs of multiple users, creating more natural and effective long-term interactions.

HARMONI establishes a framework for conversational AI by integrating multimodal perception, world modeling, user profiling, and contextual generation to produce personalized responses grounded in both the immediate conversational context and long-term speaker characteristics, effectively simulating a nuanced understanding of interactive dialogue.

This paper introduces HARMONI, a multimodal personalization system for multi-user human-robot interaction, addressing challenges in user modeling, long-term memory, and ethical AI considerations.

Existing human-robot interaction systems struggle to maintain personalized and adaptive engagement over time, particularly when interacting with multiple users. This paper introduces HARMONI, a multimodal personalization framework that leverages large language models to let socially assistive robots adapt dynamically to individual needs in complex, multi-user environments. Through integrated perception, world modeling, user profiling, and response generation, HARMONI improves user modeling accuracy and personalization quality, as validated through both quantitative evaluations and a real-world nursing home study. Could this approach pave the way for more nuanced and effective long-term human-robot collaboration in diverse social settings?

Navigating the Complexities of Multi-User Interaction

Much of the existing research in Human-Robot Interaction has centered on scenarios involving a single human and a single robot, which leaves its applicability to real-world settings poorly understood. This focus overlooks the complexities that arise when several people interact with a robot at once, a common situation in homes, workplaces, and public spaces. Multi-user dynamics raise challenges such as managing competing requests, interpreting ambiguous commands from different speakers, and ensuring equitable access to the robot’s capabilities. As a result, robots designed for one-on-one interaction often struggle in group settings, frustrating users and slowing their integration into everyday life. Closing this gap requires robotic systems that can handle the nuances of social interaction and accommodate the diverse needs of multiple users concurrently.

To integrate into human environments, robots must manage the individual needs and expectations of everyone present at the same time. Responding to the most recent command is not enough: the robot must maintain a distinct profile for each user, remembering preferences such as communication style or task priorities. It must also track its conversational history with each person, so that responses stay contextually relevant and neither repeat information nor contradict earlier statements. This requires algorithms that can discern intent, resolve conflicting requests, and keep the experience coherent and personalized for every participant, moving beyond single-user paradigms toward genuinely collaborative robotic assistance.

Current systems are largely designed for one-on-one engagement, so multi-user interaction demands a significant leap in capability. A robot operating in a group must not only discern individual user profiles (expertise, communication style, even emotional state) but also synthesize them into a coherent understanding of the group dynamic. This means tracking multiple conversational threads, resolving conflicting requests, and giving each participant a consistent, relevant response. The challenge extends beyond task allocation: an effective multi-user robot needs social intelligence, proactively managing expectations and ensuring equitable participation, which in turn demands robust models of human behavior and a nuanced reading of social cues.

The robot successfully manages a complex, multi-turn dialogue by dynamically switching conversational context between users and accurately resolving references like “the appointment” through maintained state tracking.
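One simple way to picture the state tracking in this example is a per-user record of the most recently mentioned entity of each type, so that “the appointment” resolves differently depending on who is speaking. The sketch below is hypothetical: the class names, the entity types, and the two-word “the <type>” matching rule are illustrative assumptions, not HARMONI’s actual mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogueState:
    # Most recent entity mentioned by this user, keyed by entity type (hypothetical).
    last_entity: Dict[str, str] = field(default_factory=dict)


class ReferenceResolver:
    """Resolves definite references like 'the appointment' per user (illustrative)."""

    def __init__(self) -> None:
        # One independent state per user, so speakers' mentions never mix.
        self.states: Dict[str, DialogueState] = {}

    def observe(self, user_id: str, entity_type: str, value: str) -> None:
        # Record that this user just mentioned an entity of the given type.
        state = self.states.setdefault(user_id, DialogueState())
        state.last_entity[entity_type] = value

    def resolve(self, user_id: str, phrase: str) -> Optional[str]:
        # Only handles the two-word pattern "the <entity_type>".
        words = phrase.lower().split()
        if len(words) != 2 or words[0] != "the":
            return None
        state = self.states.get(user_id)
        if state is None:
            return None
        return state.last_entity.get(words[1])


resolver = ReferenceResolver()
resolver.observe("anna", "appointment", "dentist, Tuesday 10:00")
resolver.observe("ben", "appointment", "physiotherapy, Friday 14:00")
print(resolver.resolve("anna", "the appointment"))  # dentist, Tuesday 10:00
print(resolver.resolve("ben", "the appointment"))   # physiotherapy, Friday 14:00
```

Because each user has an independent state, Ben’s appointment never overwrites Anna’s, even when the two conversations interleave.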

HARMONI: A Framework for Personalized and Adaptive Interaction

The HARMONI framework utilizes a multimodal approach to personalization in human-robot interaction (HRI) scenarios involving multiple users. This involves integrating data from various sensory inputs – including visual, auditory, and potentially physiological signals – to build a comprehensive understanding of each user. Adaptability is achieved through continuous monitoring of user behavior and preferences, allowing the robot to dynamically adjust its actions and communication strategies. Crucially, HARMONI explicitly incorporates ethical considerations into the personalization process, implementing mechanisms to safeguard user privacy, ensure safety, and mitigate potential biases in the robot’s responses. This balance between personalized interaction and responsible AI is a core design principle of the framework.

The HARMONI framework utilizes a modular architecture to enable complex human-robot interaction. The perception module gathers data from various sensors to understand the environment and user state. This data feeds into the world modeling module, which constructs and maintains a representation of the surrounding environment, including objects and spatial relationships. Simultaneously, the user modeling module builds a profile of the individual user, tracking preferences, capabilities, and emotional state. Finally, the response generation module synthesizes appropriate robot behaviors – including speech, gestures, and movement – based on the outputs of the other modules, facilitating a nuanced and contextually relevant interaction.
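The data flow between the four modules can be sketched as one processing step per utterance. This is a minimal illustration under stated assumptions: the `Percept` fields, the emotion-based tone rule, and the string-returning `generate_response` stub are hypothetical placeholders for real speech recognition, speaker identification, and an LLM call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Percept:
    # Output of the perception module (hypothetical fields).
    speaker_id: str
    transcript: str
    emotion: str


@dataclass
class WorldModel:
    # Situational state; here only the utterances heard so far.
    recent: List[Tuple[str, str]] = field(default_factory=list)

    def update(self, p: Percept) -> None:
        self.recent.append((p.speaker_id, p.transcript))


@dataclass
class UserModel:
    # Per-user profiles, updated with the latest observed emotional state.
    profiles: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def update(self, p: Percept) -> Dict[str, str]:
        profile = self.profiles.setdefault(p.speaker_id, {})
        profile["emotion"] = p.emotion
        return profile


def perceive(speaker_id: str, text: str, emotion: str) -> Percept:
    # Stand-in for speech recognition, speaker ID, and affect recognition.
    return Percept(speaker_id, text.strip(), emotion)


def generate_response(p: Percept, world: WorldModel, profile: Dict[str, str]) -> str:
    # Stand-in for an LLM call; shows which inputs reach the generator.
    name = profile.get("name", p.speaker_id)
    tone = "gently" if profile.get("emotion") == "sad" else "cheerfully"
    return f"[{tone}] Reply to {name} re '{p.transcript}' (context turns: {len(world.recent)})"


def step(world: WorldModel, users: UserModel, speaker_id: str, text: str, emotion: str) -> str:
    p = perceive(speaker_id, text, emotion)        # 1. perception
    world.update(p)                                # 2. world modeling
    profile = users.update(p)                      # 3. user modeling
    return generate_response(p, world, profile)    # 4. response generation
```

Keeping the modules behind narrow interfaces like these means any one of them (for example, the generator) can be swapped without touching the others.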

The Personalization Strategy within the HARMONI framework operates by continuously monitoring user characteristics – including behavioral patterns, emotional states derived from multimodal sensor data, and explicitly stated preferences – to modify robot behavior in real-time. This adjustment occurs through a rules-based system and machine learning algorithms that map observed user data to specific robot actions, such as altering speech rate, modifying physical proximity, or adapting task complexity. The strategy doesn’t rely on pre-defined user profiles, but rather builds an ongoing model of the user’s current state, allowing for nuanced and context-aware interaction. This dynamic adaptation aims to optimize the human-robot interaction for individual users, improving both efficiency and user experience.
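A rules-based layer of this kind can be sketched as a function from an observed user state to behavior parameters. The feature names, default values, and thresholds below are hypothetical and chosen only for illustration; the machine-learning component mentioned above is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserState:
    # Hypothetical observed features; real values would come from multimodal sensing.
    hearing_difficulty: bool
    emotion: str             # e.g. "calm", "anxious"
    recent_task_errors: int


@dataclass(frozen=True)
class BehaviorParams:
    speech_rate_wpm: int     # words per minute for text-to-speech
    distance_m: float        # preferred interpersonal distance in metres
    task_level: int          # 1 (simplest) .. 3 (hardest)


DEFAULT = BehaviorParams(speech_rate_wpm=150, distance_m=1.2, task_level=2)


def adapt(state: UserState, base: BehaviorParams = DEFAULT) -> BehaviorParams:
    """Apply simple rules to the default behavior (thresholds are illustrative)."""
    rate, dist, level = base.speech_rate_wpm, base.distance_m, base.task_level
    if state.hearing_difficulty:
        rate -= 30                      # slow down for users with hearing difficulty
    if state.emotion == "anxious":
        dist += 0.3                     # give anxious users more personal space
    if state.recent_task_errors >= 2:
        level = max(1, level - 1)       # simplify the task after repeated errors
    elif state.recent_task_errors == 0 and state.emotion == "calm":
        level = min(3, level + 1)       # offer more challenge when things go well
    return BehaviorParams(rate, round(dist, 2), level)


print(adapt(UserState(hearing_difficulty=True, emotion="anxious", recent_task_errors=3)))
```

Rules like these are transparent and easy to audit; a learned model could replace or refine the thresholds without changing the function’s interface.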

The HARMONI framework incorporates ethical considerations as a primary design principle, implementing mechanisms to balance personalized interaction with user privacy and safety. This is achieved through data minimization techniques, limiting the collection and storage of personally identifiable information. Furthermore, the framework utilizes differential privacy methods where appropriate to obfuscate individual user data while still enabling effective personalization. Safety protocols are integrated into the response generation module, ensuring that robot actions and communication remain within pre-defined boundaries and do not pose a risk to the user’s physical or psychological wellbeing. Explicit consent mechanisms and transparency regarding data usage are also incorporated to maintain user autonomy and build trust.
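The two privacy techniques named above can be illustrated in a few lines, assuming an allowlist-based form of data minimization and the Laplace mechanism, a standard way to achieve ε-differential privacy for counting queries. Neither the field names nor the choice of mechanism comes from the paper; both are illustrative assumptions.

```python
import random
from typing import Any, Dict

# Hypothetical allowlist: only fields that personalization actually needs.
ALLOWED_FIELDS = {"preferred_name", "speech_rate", "topics_of_interest"}


def minimize(record: Dict[str, Any]) -> Dict[str, Any]:
    # Data minimization: drop every field that is not explicitly allowed.
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Laplace mechanism for a counting query (sensitivity 1): scale = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
print(minimize({"preferred_name": "Anna", "room_number": "12B", "speech_rate": "slow"}))
print(private_count(10, epsilon=0.5, rng=rng))
```

Smaller values of epsilon give stronger privacy at the cost of noisier released counts.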

This user interface architecture integrates perception, user modeling with long-term memory, and context-aware response generation, supported by a world model that maintains short-term conversational context, to provide explainable and adaptable multi-user interactions.

Real-Time Memory Management: Sustaining Coherent Interaction

HARMONI’s Real-Time Multi-User Memory Management system is designed to sustain conversational coherence in environments with concurrent user interactions. This is achieved through continuous tracking of individual user states and the maintenance of distinct memory spaces for each. The system employs efficient data structures and algorithms to facilitate rapid access and modification of user-related information, preventing data collisions and ensuring that responses are tailored to the correct individual. Scalability is a key component, enabling the system to handle an increasing number of concurrent users without significant performance degradation, and ensuring consistent responsiveness across all interactions.
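Distinct memory spaces for concurrent users can be sketched as a store keyed by user id and guarded by a lock, so that simultaneous updates from different conversations never interleave or land in the wrong user’s record. `MultiUserMemory` is a hypothetical name, and a single global lock is the simplest choice rather than necessarily the most scalable one.

```python
import threading
from collections import defaultdict
from typing import Dict, List


class MultiUserMemory:
    """Hypothetical store that keeps a separate memory space per user id."""

    def __init__(self) -> None:
        self._spaces: Dict[str, List[str]] = defaultdict(list)
        self._lock = threading.Lock()   # serializes concurrent readers and writers

    def append(self, user_id: str, item: str) -> None:
        with self._lock:
            self._spaces[user_id].append(item)

    def read(self, user_id: str) -> List[str]:
        with self._lock:
            # Return a copy so callers cannot mutate the stored list;
            # .get() avoids creating empty spaces for unknown users.
            return list(self._spaces.get(user_id, []))
```

At larger scale, one lock per user (or a concurrent key-value store) would let unrelated conversations proceed without contending for the same lock.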

HARMONI’s memory management system differentiates between short-term and long-term storage to facilitate adaptive interactions. Short-term memory functions as a buffer, retaining information directly related to the current conversational exchange, including recent user inputs and system responses. This allows the system to maintain coherence within the immediate dialogue. Conversely, long-term memory stores persistent data associated with individual users, encompassing profile details, stated preferences, and historical interaction data. This user-specific information is retrieved as needed to personalize responses and tailor the conversation to each user’s established characteristics, independent of the current exchange.
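The two tiers can be sketched for a single user as a bounded buffer plus a persistent key-value store. The class name, the default window of six turns, and the session-reset behavior are assumptions made for illustration.

```python
from collections import deque
from typing import Dict


class UserMemory:
    """Hypothetical two-tier memory for a single user."""

    def __init__(self, window: int = 6) -> None:
        # Short-term: bounded buffer of the current exchange; oldest turns fall off.
        self.short_term = deque(maxlen=window)
        # Long-term: persistent facts such as profile details and stated preferences.
        self.long_term: Dict[str, str] = {}

    def add_turn(self, speaker: str, text: str) -> None:
        self.short_term.append((speaker, text))

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def end_session(self) -> None:
        # The conversational buffer is discarded; long-term facts persist.
        self.short_term.clear()
```

Using `deque(maxlen=...)` keeps the short-term buffer at a fixed size without any manual eviction code.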

HARMONI’s ability to deliver contextually relevant and personalized responses is achieved through the dynamic integration of short-term and long-term memory components. Short-term memory, focused on the immediate conversational exchange, is continuously updated with each user interaction. This information is then cross-referenced with data retrieved from long-term memory, which stores persistent user profiles and preferences. The framework employs algorithms to prioritize and synthesize information from both sources, enabling it to tailor responses not only to the current dialogue but also to the individual user’s established characteristics and history. This process occurs in real-time, minimizing latency and ensuring a fluid and adaptive user experience.
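The merging step can be sketched as building a prompt context from the recent turns plus only the most relevant long-term facts. Word-overlap scoring is a deliberately simple, hypothetical stand-in for whatever prioritization HARMONI actually uses; it echoes the idea of selectively retrieving relevant profile features rather than passing the full profile.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    # Lowercase alphabetic words only.
    return set(re.findall(r"[a-z]+", text.lower()))


def build_context(utterance: str,
                  short_term: List[Tuple[str, str]],
                  long_term: Dict[str, str],
                  k: int = 2) -> str:
    """Merge recent turns with up to k profile facts relevant to the utterance."""
    query = tokens(utterance)
    # Rank long-term facts by word overlap with the current utterance.
    ranked = sorted(
        long_term.items(),
        key=lambda kv: len(query & tokens(kv[0] + " " + kv[1])),
        reverse=True,
    )
    # Keep the top k, but drop facts with no overlap at all.
    relevant = [f"{key}: {val}" for key, val in ranked[:k]
                if query & tokens(key + " " + val)]
    history = [f"{who}: {text}" for who, text in short_term]
    return "\n".join(["# Profile"] + relevant + ["# Recent turns"] + history
                     + ["# Current", utterance])
```

Trimming the profile this way keeps the prompt short, which can help both latency and focus; a production system would more likely use embedding similarity than raw word overlap.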

Experimental validation of the real-time memory management system was conducted using complex multi-user interaction scenarios. Results indicated quantifiable performance improvements across key areas: user detection accuracy increased by 15%, average profile retrieval time decreased by 22%, and response generation latency was reduced by 10%. These gains were measured using a benchmark dataset of 500 simulated multi-user conversations, with performance evaluated against a baseline system lacking dynamic memory integration. Statistical analysis confirmed the observed improvements were significant with a p-value of < 0.05 for all measured metrics.
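For reference, assuming each reported figure is a change relative to the baseline (the text does not say whether the accuracy gain is relative or in percentage points), the percentages follow the usual definition below, where m is the value of a metric. Under that reading, the two latency figures mean HARMONI needs 78% of the baseline’s profile retrieval time and 90% of its response generation time.

```latex
\Delta = \frac{m_{\text{HARMONI}} - m_{\text{baseline}}}{m_{\text{baseline}}} \times 100\%
\qquad\Rightarrow\qquad
t^{\text{retrieval}}_{\text{HARMONI}} = 0.78\, t^{\text{retrieval}}_{\text{baseline}},
\quad
t^{\text{response}}_{\text{HARMONI}} = 0.90\, t^{\text{response}}_{\text{baseline}}
```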

PersonaFeedback improves both reply quality and latency by selectively retrieving relevant user profile features, outperforming both baseline inference and full profile utilization as evaluated by an LLM judge.

Validating HARMONI: Leveraging LLM-Based Evaluation

HARMONI’s response quality and relevance were evaluated with an ‘LLM-as-Judge’ technique, in which large language models score the outputs. Multiple LLMs were prompted to assess HARMONI-generated responses against predefined criteria for answer quality and contextual appropriateness. This approach provides a scalable, consistent, and automated basis for comparative analysis, quantifying HARMONI’s performance relative to other language models without extensive human annotation. The judges’ scores were then used to benchmark HARMONI’s capabilities and identify areas for improvement.
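A judging loop of this kind can be sketched as one shared prompt sent to several judge models whose parsed scores are averaged. The prompt wording, the 1 to 10 scale, and the `Score: <n>` reply format are assumptions; each judge is left as an abstract callable rather than tied to any specific LLM client API.

```python
import re
from statistics import mean
from typing import Callable, List, Optional

# A judge takes a prompt and returns the model's raw text reply.
# In practice it would wrap an LLM client; here it is left abstract (hypothetical).
Judge = Callable[[str], str]

PROMPT = (
    "You are evaluating a social robot's reply.\n"
    "Conversation context:\n{context}\n\n"
    "Robot reply:\n{reply}\n\n"
    "Rate answer quality and contextual appropriateness from 1 to 10. "
    "Answer in the form 'Score: <n>'."
)


def parse_score(text: str) -> Optional[int]:
    # Accept only an integer score in the expected range.
    match = re.search(r"Score:\s*(\d+)", text)
    if match is None:
        return None
    n = int(match.group(1))
    return n if 1 <= n <= 10 else None


def judge_reply(context: str, reply: str, judges: List[Judge]) -> float:
    """Average the valid scores returned by several judge models."""
    prompt = PROMPT.format(context=context, reply=reply)
    scores = []
    for judge in judges:
        score = parse_score(judge(prompt))
        if score is not None:
            scores.append(score)
    if not scores:
        raise ValueError("no judge returned a parsable score")
    return mean(scores)
```

Averaging over several judges reduces the influence of any single model’s quirks, and discarding unparsable replies keeps malformed outputs from silently skewing the mean.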

Evaluation of the HARMONI framework included a comparative analysis against a diverse set of Large Language Models (LLMs). Specifically, HARMONI-generated responses were benchmarked against outputs from Gemma3-4B, Gemma3-12B, LLaMA3-70B, GPT-4o, and Mistral-Nemo-12B. This selection spans open-weight models (Gemma3, LLaMA3, and Mistral-Nemo-12B) and the proprietary GPT-4o, covering a range of model sizes and architectures for a broad performance assessment.

Under this ‘LLM-as-Judge’ evaluation, HARMONI’s responses were consistently rated as higher quality and more contextually appropriate than those of the baseline models (Gemma3-4B, Gemma3-12B, LLaMA3-70B, GPT-4o, and Mistral-Nemo-12B). The judges’ answer-quality scores showed consistent improvements over each baseline, and the advantage in response quality was statistically significant.

A usability evaluation of the HARMONI framework used the System Usability Scale (SUS) with 20 participants aged 65 to 91. The SUS measures perceived ease of use and satisfaction through ten statements, each rated on a five-point Likert scale, and yields a single quantitative usability score. The results indicate that participants found the framework user-friendly and effective, supporting its practicality and accessibility for older adults.
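For readers unfamiliar with the SUS, its standard scoring rule converts the ten Likert responses into a single score from 0 to 100: positively worded odd items contribute (response - 1), negatively worded even items contribute (5 - response), and the sum is multiplied by 2.5. The function below implements that standard rule; it is not taken from the paper.

```python
from typing import List


def sus_score(responses: List[int]) -> float:
    """Score one participant's SUS questionnaire (10 items, each rated 1..5)."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs exactly ten responses between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5   # rescale the 0..40 raw sum to 0..100


print(sus_score([4, 2] * 5))  # 75.0
```

A study-level result is the mean of the per-participant scores; a score of around 68 is commonly cited as the average across published SUS studies.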

Employing a two-inference approach with GPT-4o consistently improves performance, as evidenced by higher ROUGE scores, increased session similarity, and better integration of short- and long-term memory, while also reducing latency through parallelized processing.
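If the two inferences are independent of each other, running them concurrently makes turn latency roughly that of the slower call rather than the sum of both. The sketch below assumes, hypothetically, that one call drafts the reply and the other extracts a profile update; both are stubbed with `asyncio.sleep` in place of real GPT-4o requests.

```python
import asyncio
import time
from typing import Dict, Tuple


async def generate_reply(context: str) -> str:
    # Hypothetical stand-in for an LLM call that drafts the spoken reply.
    await asyncio.sleep(0.2)
    return f"Reply based on: {context}"


async def extract_profile_update(utterance: str) -> Dict[str, str]:
    # Hypothetical stand-in for a second LLM call that updates long-term memory.
    await asyncio.sleep(0.2)
    return {"last_topic": utterance}


async def handle_turn(context: str, utterance: str) -> Tuple[str, Dict[str, str]]:
    # The two calls do not depend on each other, so run them concurrently:
    # total latency is about max(t1, t2) instead of t1 + t2.
    reply, update = await asyncio.gather(
        generate_reply(context), extract_profile_update(utterance)
    )
    return reply, update


start = time.perf_counter()
print(asyncio.run(handle_turn("user asked about gardening", "gardening")))
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

If the second call needed the first call’s output, this parallelization would not apply and the calls would have to run in sequence.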

Towards More Empathetic and Engaging Social Robotics

The HARMONI framework advances Socially Assistive Robotics (SAR) by moving beyond one-size-fits-all interactions. It tailors responses and behaviors to individual users, recognizing and adapting to their needs and preferences in real time. It is also not limited to one-on-one exchanges: it handles multi-user scenarios, navigating group dynamics so that each participant receives appropriate, personalized support. This combination of individualization and group awareness could change how robots assist in areas such as healthcare, education, and collaborative work, making human-robot interaction more natural and effective.

The HARMONI framework aims to tailor robotic assistance to a wide spectrum of individual needs, moving beyond generalized interactions toward genuinely supportive experiences. It does so by modeling user states and preferences, so that robots can adapt their behavior and communication style for better engagement. Personalization of this kind is expected to increase user acceptance and may improve outcomes in areas such as rehabilitation, education, and elder care. By fostering stronger connections, HARMONI supports a collaborative dynamic in which individuals feel empowered and supported, making interactions with robotic assistants more effective and satisfying.

The development of HARMONI places significant emphasis on establishing a foundation for responsible innovation within human-robot interaction. Recognizing the sensitive nature of social robotics, the framework is designed with robust ethical considerations and stringent user privacy protocols integrated throughout its architecture. This proactive approach extends beyond mere compliance with regulations; it actively seeks to anticipate and mitigate potential risks associated with data collection, algorithmic bias, and the potential for misuse. By prioritizing user control over personal information and ensuring transparency in robotic behavior, HARMONI aims to foster trust and acceptance, paving the way for socially assistive robots that genuinely enhance human well-being without compromising individual rights or dignity.

Ongoing development surrounding the HARMONI framework prioritizes scaling its functionality to address increasingly intricate real-world interactions. Researchers are actively investigating how to equip the system with the capacity to navigate dynamic, multi-person environments, adapting to nuanced social cues and managing unpredictable behaviors. Exploration extends beyond current applications, with planned studies focusing on HARMONI’s potential within educational settings, healthcare facilities supporting long-term patient care, and even collaborative workspaces where human-robot teams can enhance productivity and well-being. This expansion aims not only to broaden the scope of socially assistive robotics, but also to rigorously evaluate HARMONI’s adaptability and robustness across a diverse spectrum of application domains, paving the way for genuinely helpful and engaging robotic companions.

The HARMONI framework, with its focus on dynamic adaptation in multi-user human-robot interaction, echoes a fundamental tenet of complex systems. It suggests that successful interaction isn’t about imposing a rigid structure, but about fostering a responsive, evolving relationship. As Henri Poincaré observed, “It is through science that we arrive at certainty, and through art that we arrive at enjoyment.” HARMONI, in its pursuit of personalized and ethically-sound assistance, attempts to bridge that divide – employing the ‘science’ of multimodal perception and LLMs to create an interaction that is not just functional, but genuinely enjoyable for each user. If the system looks clever, it’s probably fragile; HARMONI’s strength lies in its adaptability, a testament to prioritizing the whole system over isolated, impressive features.

Beyond Adaptation: Charting a Course for Truly Collaborative Robotics

The presented framework, HARMONI, rightly addresses the critical need for personalization within multi-user human-robot interaction. However, the pursuit of dynamic adaptation raises a fundamental question: what, precisely, are these systems optimizing for? Is it merely behavioral compliance, or a more nuanced enhancement of collective well-being? The elegance of a system lies not in the complexity of its responses, but in the clarity of its underlying goals. A responsive robot is not necessarily an effective collaborator.

The long-term memory component, while necessary, invites further scrutiny. Data accrues, models refine, yet the spectre of algorithmic bias and the potential for reinforcing existing social inequalities remain. Simplicity is not minimalism; it is the discipline of distinguishing the essential from the accidental. Future work must focus on robust mechanisms for transparency and accountability, ensuring these systems serve as equitable partners, not subtle perpetuators of pre-existing conditions.

Ultimately, the true challenge transcends technical sophistication. It resides in constructing a framework that prioritizes not just the ‘what’ of interaction, but the ‘why’. A system capable of learning and adapting is valuable, but one designed with a clear understanding of its ethical obligations and its role within a larger social structure is truly significant. The path forward demands a move from merely responsive robots to genuinely collaborative partners.

Original article: https://arxiv.org/pdf/2601.19839.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/