Learning by Showing: Robots That Ask for the Best Guidance

Author: Denis Avetisyan

A new algorithm helps robots actively solicit feedback from humans to quickly learn complex tasks and improve the teaching experience.

CMA-ES-IG consistently generates higher-quality robot trajectories than baseline algorithms across diverse simulated environments. Its trajectory queries are better even in early iterations, and the advantage holds across different representation spaces.

This paper introduces CMA-ES-IG, a method combining Covariance Matrix Adaptation Evolution Strategy with information gain to optimize robot trajectory learning through efficient human preference elicitation.

Effective human-robot interaction requires adaptation to individual user preferences, yet current preference learning techniques often prioritize efficiency over the user experience itself. This work, ‘Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG’, introduces an algorithm that explicitly balances learning performance with user-friendliness by generating robot behaviors designed to be both informative and perceptually distinct. By combining Covariance Matrix Adaptation Evolution Strategy with Information Gain, CMA-ES-IG scales to high-dimensional preference spaces while remaining robust to noisy feedback and, crucially, being preferred by non-expert users. Will this approach pave the way for more intuitive and effective robot teaching methods that truly center the human in the loop?

The Challenge of Intuitive Robotics

Conventional robotics often operates on a paradigm of pre-programmed actions, where a robot executes a fixed sequence of behaviors regardless of the user’s unique needs or desires. This approach, while effective in structured environments, struggles when faced with the nuances of human interaction and individual preferences. A robot designed for assistance might, for example, consistently offer help in a manner that feels intrusive or inefficient to a particular user, simply because its actions aren’t tailored to that person’s specific working style or physical capabilities. The rigidity of these pre-defined routines limits the robot’s ability to provide truly intuitive and personalized support, hindering its integration into everyday life and underscoring the need for more adaptive control systems.

The development of truly intuitive robots hinges on their ability to incorporate human feedback, moving beyond rigid pre-programming. A robot’s responsiveness isn’t simply about completing a task, but about how it completes that task in relation to an individual’s needs and expectations. Research demonstrates that robots capable of learning from subtle cues – a slight hesitation, a verbal correction, or even a change in posture – can dramatically improve user satisfaction and build trust. This integration of feedback necessitates algorithms that can efficiently process nuanced human input, adapt behaviors in real-time, and ultimately create a collaborative experience where the robot anticipates and responds to the user’s intent, rather than requiring explicit instructions for every action.

Effectively soliciting user feedback for adaptive robot behavior requires a delicate balance. Current research highlights the difficulty of capturing nuanced preferences without creating a frustrating or cumbersome experience for the human partner. Simply asking for explicit ratings after each robot action proves inefficient and often disrupts the natural flow of interaction. Consequently, scientists are exploring implicit methods – analyzing subtle cues like gaze direction, micro-expressions, and even physiological signals – to infer user satisfaction and intent. The challenge lies in developing algorithms that can accurately interpret these ambiguous signals in real-time, filtering out noise and adapting to individual differences, ultimately enabling robots to learn and respond in a truly personalized and unobtrusive manner.

Participants in a user study expressed preferences for Blossom’s gesture-based affective signaling and JACO’s item-handing behavior through a ranking interface.

CMA-ES-IG: A System for Learning User Intent

CMA-ES-IG utilizes Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a derivative-free optimization algorithm known for its robustness in non-convex and noisy landscapes, in conjunction with the Information Gain principle. CMA-ES iteratively refines a population of candidate solutions by adapting the covariance matrix of a multivariate normal distribution, allowing it to efficiently explore high-dimensional search spaces. Information Gain, in this context, quantifies the reduction in uncertainty regarding user preferences achieved by executing a particular trajectory. By integrating these two concepts, CMA-ES-IG biases the optimization process towards trajectories that yield the highest Information Gain, effectively balancing exploration and exploitation to accelerate learning of desired behaviors.
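To make the sample-and-adapt loop concrete, here is a minimal Python sketch of one evolution-strategy generation. It is a deliberately simplified stand-in, not full CMA-ES and not the paper's implementation: it omits step-size control, evolution paths, and recombination weights. The function name `es_generation`, the population sizes, and the quadratic reward are assumptions made for illustration.

```python
import numpy as np

def es_generation(mean, cov, reward_fn, rng, pop_size=20, n_elite=10):
    """One generation of a simplified evolution strategy (not full CMA-ES)."""
    # Sample candidate solutions from the current multivariate normal.
    candidates = rng.multivariate_normal(mean, cov, size=pop_size)
    rewards = np.array([reward_fn(c) for c in candidates])
    # Keep the highest-reward candidates.
    elite = candidates[np.argsort(rewards)[::-1][:n_elite]]
    # Move the mean toward the elite set.
    new_mean = elite.mean(axis=0)
    # Re-estimate the covariance around the OLD mean (similar in spirit to the
    # rank-mu update in CMA-ES), so the distribution stretches along the
    # direction of recent progress. The small diagonal term avoids degeneracy.
    centered = elite - mean
    new_cov = centered.T @ centered / n_elite + 1e-8 * np.eye(len(mean))
    return new_mean, new_cov

# Hypothetical reward: negative squared distance to a target feature vector.
target = np.array([1.0, -2.0, 0.5])
reward = lambda x: -np.sum((x - target) ** 2)

rng = np.random.default_rng(0)
mean, cov = np.zeros(3), np.eye(3)
for _ in range(30):
    mean, cov = es_generation(mean, cov, reward, rng)
print(np.round(mean, 2))  # the mean should have moved toward the target
```

In a preference-learning setting the reward would not be known in advance; it would be estimated from the user's feedback, which is where the information-gain term comes in.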

The CMA-ES-IG algorithm utilizes an exploration strategy based on maximizing information gain regarding user preferences. This is achieved by quantifying the uncertainty in the user’s evaluation of different robot behaviors and then actively selecting trajectories predicted to reduce this uncertainty most effectively. The algorithm does not randomly sample behaviors; instead, it prioritizes those expected to yield the highest information gain, as determined by metrics related to the variance or entropy of predicted user preferences. This targeted exploration allows the system to efficiently refine its understanding of the user’s desired behavior, focusing on areas of the behavior space where the user’s preferences are least certain.
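One common way to score a candidate query is the mutual information between the user's answer and the unknown preference weights, estimated from samples of the current belief. The sketch below assumes a linear reward over trajectory features and a softmax model of which option the user picks first. Both are simplifying assumptions rather than details confirmed by the source, and all function names are hypothetical.

```python
import numpy as np

def choice_probs(features, w_samples):
    """P(user picks option k | w) under a softmax choice model.

    features:  (K, d) feature vectors of the K trajectories in the query
    w_samples: (M, d) samples from the current belief over preference weights
    returns:   (M, K) choice probabilities, one row per sampled w
    """
    logits = w_samples @ features.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def entropy(p, axis=-1):
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

def information_gain(features, w_samples):
    """Sample-based estimate of I(choice; w) = H(E_w[p]) - E_w[H(p)]."""
    p = choice_probs(features, w_samples)
    return entropy(p.mean(axis=0)) - entropy(p, axis=1).mean()

rng = np.random.default_rng(0)
w_samples = rng.normal(size=(500, 2))               # broad prior belief
identical = np.array([[1.0, 0.0], [1.0, 0.0]])      # indistinguishable options
distinct = np.array([[1.0, 0.0], [-1.0, 0.0]])      # clearly different options
ig_same = information_gain(identical, w_samples)
ig_diff = information_gain(distinct, w_samples)
print(ig_same, ig_diff)
```

A query made of identical trajectories carries no information, because every answer is equally likely whatever the user prefers, whereas a query of clearly different trajectories scores higher.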

CMA-ES-IG reduces the number of user interactions needed to establish preferred robot behaviors by prioritizing trajectory exploration based on information gain. This active learning approach focuses computational effort on areas of the behavioral space where user feedback will most effectively refine the robot’s understanding of preferences. Experimental results demonstrate that CMA-ES-IG achieves higher alignment with user preferences, as quantified by the Area Under the Curve (AUC) metric, compared to methods that rely on random or uniformly distributed trajectory sampling; this improved alignment is achieved with fewer required interactions from the user.
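As an illustration of how an AUC-style score rewards fast learning, the sketch below integrates an alignment curve over the number of queries. It assumes alignment is the cosine similarity between estimated and true preference weights and that the area is normalized by the number of intervals. Both choices are assumptions for illustration, and the two curves are made-up numbers, not results from the paper.

```python
import numpy as np

def alignment(w_est, w_true):
    """Cosine similarity between estimated and true preference weights."""
    return float(w_est @ w_true / (np.linalg.norm(w_est) * np.linalg.norm(w_true)))

def alignment_auc(alignments):
    """Trapezoidal area under an alignment-vs-query curve, normalized to [min, max] of the curve."""
    a = np.asarray(alignments, dtype=float)
    area = np.sum((a[1:] + a[:-1]) / 2.0)  # unit spacing between queries
    return area / (len(a) - 1)

# Illustrative curves only: both learners end at the same alignment,
# but one gets there in fewer queries.
fast_learner = [0.0, 0.8, 0.9, 1.0]
slow_learner = [0.0, 0.2, 0.5, 1.0]
auc_fast = alignment_auc(fast_learner)
auc_slow = alignment_auc(slow_learner)
print(auc_fast, auc_slow)
```

Two methods that reach the same final alignment can therefore have very different AUC values, which is why the metric captures how few interactions a method needs.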

Different query generation techniques explore a trajectory feature space to estimate user preference, with standard CMA-ES maximizing reward but hindering distinguishability, information gain prioritizing distinguishable trajectories at the cost of reward, and CMA-ES-IG effectively balancing both objectives.

Validating Preference Elicitation Through Ranking

Trajectory Ranking was implemented as the primary method for collecting user preference data due to its ability to mimic real-world path selection scenarios. This approach presents users with multiple potential trajectories and requests a ranking of those options from most to least preferred. This method offers a more intuitive and efficient feedback mechanism compared to pairwise comparisons or absolute scoring, as users directly express relative preferences. The ranked list of trajectories then serves as input for preference learning algorithms, allowing for the inference of user-specific cost functions or reward signals that govern path selection behavior. The resulting data is easily interpretable and readily facilitates the construction of personalized route guidance systems or trajectory optimization algorithms.

The Plackett-Luce model, a statistical model for ranking data, was used to analyze user-provided trajectory rankings. The model assigns each trajectory a positive ‘score’ reflecting its attractiveness and derives ranking probabilities from those scores. In the two-item case, the probability [latex]P(i > j)[/latex] that trajectory [latex]i[/latex] is ranked higher than trajectory [latex]j[/latex] is [latex]P(i > j) = \frac{s_i}{s_i + s_j}[/latex], where [latex]s_i[/latex] and [latex]s_j[/latex] are the scores of trajectories [latex]i[/latex] and [latex]j[/latex], respectively. Applying this model to the collected rankings made it possible to infer quantitative user preferences and provided a statistical basis for comparing and optimizing trajectory generation algorithms. Maximum Likelihood Estimation was employed to determine the trajectory scores [latex]s_i[/latex] from the observed ranking data.
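For a full ranking [latex]\sigma[/latex] of [latex]K[/latex] trajectories, the standard Plackett-Luce likelihood picks items one at a time from those remaining: [latex]P(\sigma) = \prod_{k=1}^{K} \frac{s_{\sigma_k}}{\sum_{j=k}^{K} s_{\sigma_j}}[/latex], which reduces to the pairwise expression when [latex]K = 2[/latex]. The Python sketch below fits the scores by plain gradient ascent on the log-scores [latex]\theta_i = \log s_i[/latex]. The optimizer, learning rate, and toy rankings are assumptions for illustration; the source does not say how the maximum-likelihood fit was carried out.

```python
import numpy as np

def pl_log_likelihood(theta, rankings):
    """Plackett-Luce log-likelihood. theta: log-scores; rankings: best-first index lists."""
    total = 0.0
    for r in rankings:
        r = np.asarray(r)
        for k in range(len(r) - 1):          # the last pick has probability 1
            rest = theta[r[k:]]
            log_norm = rest.max() + np.log(np.exp(rest - rest.max()).sum())
            total += theta[r[k]] - log_norm
    return total

def pl_fit(rankings, n_items, lr=0.1, steps=500):
    """Maximum-likelihood log-scores via gradient ascent (a simple, assumed optimizer)."""
    theta = np.zeros(n_items)
    for _ in range(steps):
        grad = np.zeros(n_items)
        for r in rankings:
            r = np.asarray(r)
            for k in range(len(r) - 1):
                rest = r[k:]                 # items not yet placed (unique indices)
                p = np.exp(theta[rest] - theta[rest].max())
                p /= p.sum()
                grad[r[k]] += 1.0            # the item actually chosen at stage k
                grad[rest] -= p              # expected choice under the model
        theta += lr * grad / len(rankings)
        theta -= theta.mean()                # scores are identifiable only up to scale
    return theta

# Toy data: trajectory 0 is usually ranked first, trajectory 2 usually last.
rankings = [[0, 1, 2], [0, 2, 1], [1, 0, 2], [0, 1, 2]]
theta = pl_fit(rankings, n_items=3)
scores = np.exp(theta)
p_0_over_1 = scores[0] / (scores[0] + scores[1])
print(np.round(scores, 3), round(p_0_over_1, 3))
```

Working in log-scores keeps every [latex]s_i[/latex] positive without explicit constraints, and re-centering [latex]\theta[/latex] each step fixes the arbitrary overall scale.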

Non-inferiority testing established that the proposed trajectory ranking approach achieves performance comparable to existing methods while substantially reducing the number of user interactions required to elicit preferences. Quantitative evaluation, using Area Under the Curve (AUC) as a metric for trajectory quality, demonstrated superior performance. Simulations further indicated that the CMA-ES-IG algorithm consistently minimizes cumulative regret, as measured by AUC, particularly in problems with more than 10 dimensions ([latex]d > 10[/latex]). These results suggest increased efficiency and effectiveness in preference-based trajectory optimization compared to current state-of-the-art techniques.

CMA-ES-IG consistently generates higher-quality trajectories, as indicated by average user reward, outperforming both CMA-ES and Infogain across all dimensions.

Toward Intuitive and Adaptive Robotic Companions

Robotic assistants are becoming increasingly capable through behavioral adaptation, a process where robots refine their actions based on real-time user feedback. Covariance Matrix Adaptation Evolution Strategy with Information Gain (CMA-ES-IG) is a key enabler of this continuous learning. Unlike traditional methods that require explicit reprogramming, CMA-ES-IG allows a robot to autonomously adjust its parameters – governing movement, force, or timing – in response to user feedback such as corrections or stated preferences. This is achieved through an evolutionary process where the robot explores different behavioral strategies, evaluates their success based on user interaction, and iteratively improves its performance. User studies have shown that robots employing CMA-ES-IG are perceived as both more adaptive and easier to use, paving the way for more intuitive and effective human-robot collaboration.

The capacity of robot assistants to operate effectively in diverse and unpredictable settings is significantly bolstered by representation learning. This approach employs techniques, such as Autoencoders, to distill complex sensory inputs – like visual data or joint angles – into a condensed, meaningful representation. Rather than treating each new situation as entirely novel, the robot can leverage these learned representations to identify underlying similarities with previously encountered scenarios. Consequently, the system exhibits improved generalization capabilities, allowing it to adapt its behavior to unfamiliar tasks and environments with greater efficiency and robustness. This learned abstraction moves the robot beyond rote memorization, fostering a form of “understanding” that enables it to apply existing knowledge to solve new problems, ultimately leading to more versatile and dependable assistance.

Efforts to build truly helpful robot assistants are increasingly focused on fostering natural and intuitive interactions, and recent studies demonstrate the impact of incorporating social cues like gesture. Researchers have shown that robots capable of performing tasks such as physical handover – safely and effectively transferring objects to a human – are perceived as significantly more engaging. Critically, user studies comparing different behavioral adaptation algorithms revealed that a system utilizing CMA-ES-IG consistently received higher ratings for ease of use when compared to traditional CMA-ES methods. Furthermore, participants found the CMA-ES-IG system to be markedly more adaptive to their needs than a system relying on Infogain learning, suggesting that these advancements aren’t merely cosmetic but contribute to a demonstrably improved user experience and a more seamless integration of robotic assistance into everyday life.

User study results indicate that CMA-ES-IG was consistently preferred over other algorithms when teaching robots user preferences.

Expanding the Horizon: Quality Diversity and Beyond

The capacity for a robot to thrive in unpredictable settings hinges not simply on executing a single task well, but on possessing a broad skillset. Quality Diversity (QD) techniques address this need by encouraging the evolution of a diverse collection of behaviors, rather than optimizing for a single, best solution. This approach allows the robot to maintain a ‘repertoire’ of skills, enabling it to adapt quickly and effectively to changing circumstances. Instead of relearning from scratch when faced with a novel situation, the robot can draw upon its existing behavioral diversity, selecting or combining skills to meet new challenges. Such flexibility translates to increased robustness – a QD-equipped robot is less likely to fail catastrophically when encountering unexpected obstacles or variations in its environment, fostering a more reliable and adaptable machine.
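To illustrate the idea of a behavioral repertoire, the sketch below implements a minimal version of MAP-Elites, one well-known Quality Diversity algorithm. It is chosen here purely as an example; the source does not name a specific QD method. The fitness function, the one-dimensional behavior descriptor, and all parameter values are hypothetical.

```python
import numpy as np

def map_elites(fitness_fn, descriptor_fn, dim, bins=10, iters=2000, n_init=100, seed=0):
    """Minimal MAP-Elites: keep the best solution found in each behavior cell.

    descriptor_fn must return a value in [0, 1]; solutions live in [0, 1]^dim.
    """
    rng = np.random.default_rng(seed)
    archive = {}  # cell index -> (fitness, solution)
    for i in range(iters):
        if i < n_init or not archive:
            # Bootstrap the archive with uniformly random solutions.
            x = rng.uniform(0.0, 1.0, dim)
        else:
            # Mutate a randomly chosen elite from the archive.
            keys = list(archive)
            parent = archive[keys[rng.integers(len(keys))]][1]
            x = np.clip(parent + rng.normal(0.0, 0.1, dim), 0.0, 1.0)
        cell = min(int(descriptor_fn(x) * bins), bins - 1)
        f = fitness_fn(x)
        # Store the solution if its cell is empty or it beats the current elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)
    return archive

# Hypothetical task: fitness favors the center of the space,
# while the behavior descriptor is simply the first coordinate.
archive = map_elites(
    fitness_fn=lambda x: -np.sum((x - 0.5) ** 2),
    descriptor_fn=lambda x: x[0],
    dim=2,
)
print(len(archive), "behavior cells filled")
```

The result is not a single optimum but one good solution per behavior cell, which is the kind of repertoire a robot could draw on when circumstances change.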

Understanding the sustained impact of tailored robotic behavior on human-robot interaction necessitates continued investigation. While initial studies demonstrate promising acceptance of personalized approaches – such as the consistently high ranking of the CMA-ES-IG algorithm – the longevity of this positive reception remains an open question. Future research should focus on longitudinal studies to assess whether continued adaptation of a robot’s behavior fosters deepening trust and satisfaction, or if, conversely, it leads to user frustration or a sense of unpredictability. Critical to this exploration is identifying the boundaries of acceptable personalization; determining how much adaptation is beneficial, and at what point it may erode the user’s sense of control or comfort. Ultimately, a comprehensive understanding of these long-term effects will be crucial for designing robots that are not simply functional, but genuinely welcomed as collaborative partners in everyday life.

The development of adaptable robotic behavior signifies a shift towards robots functioning as genuine collaborators, rather than simply automated tools. Recent research highlights this progression, demonstrating that algorithms capable of generating diverse and effective behavioral strategies are not merely theoretical constructs, but demonstrably preferred by human users. Specifically, the CMA-ES-IG algorithm consistently received the highest user rankings, indicating a strong acceptance of its approach to robotic interaction. This preference suggests that users value robots capable of varied and nuanced responses, fostering a sense of partnership and shared understanding – a crucial step in building truly collaborative robotic systems for the future.

Users trained robots to recognize behavioral preferences by ranking trajectories for physical object manipulation ([latex]a[/latex]) and emotive gestures ([latex]b[/latex]), effectively bridging the gap between human intention and robotic action.

The pursuit of efficient robotic trajectories, as detailed in this work, highlights a fundamental truth about complex systems. The algorithm, CMA-ES-IG, doesn’t merely find a path; it actively seeks representations that minimize the cognitive load on the human evaluator – a subtle acknowledgement that the ‘whole’ of the human-robot interaction dictates the success of any single trajectory. As Marvin Minsky observed, “The more of a method that is based on cleverness, the more likely it is to fail.” This research avoids unnecessary complexity, prioritizing a clear, evaluable search space over brute-force optimization. If the system looks clever, it’s probably fragile; CMA-ES-IG aims for robustness through representational simplicity, recognizing that structure dictates behavior, both for the robot and the person guiding it.

Beyond the Trajectory

The presented work, while demonstrating a pragmatic improvement in robot teaching through efficient trajectory exploration, merely skirts the edges of a deeper challenge. Optimizing for evaluability – minimizing the cognitive load on the human demonstrator – implicitly acknowledges the system’s dependence on an external intelligence. Every new dependency is the hidden cost of freedom. The algorithm refines the interface, but does not address the fundamental question of how a machine might autonomously define ‘good’ behavior, independent of constant human assessment.

Future work must move beyond optimizing the search process and consider the representation of behavioral primitives themselves. The current approach, focused on trajectories, remains tethered to low-level control. A more robust system would likely decompose complex tasks into higher-level abstractions, allowing for compositional generalization and reducing the reliance on continuous, nuanced human feedback. This shift demands exploration into alternative representation spaces – perhaps inspired by developmental robotics or intrinsic motivation – where the robot actively structures its own learning landscape.

Ultimately, the elegance of a system lies not in its ability to mimic, but in its capacity to synthesize. The presented algorithm represents a step towards more intuitive human-robot interaction, yet the true measure of progress will be a machine capable of not just learning what is desired, but understanding why.

Original article: https://arxiv.org/pdf/2603.09011.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/