Author: Denis Avetisyan
A new method empowers robots to acquire and generalize skills by dynamically incorporating human guidance and successful code examples into a reusable knowledge base.

This paper introduces MEMO, a system for improving neuro-symbolic policies through skill clustering, retrieval-augmented generation, and the aggregation of human feedback into a skillbook for enhanced robot learning and generalization.
While neuro-symbolic policies offer a promising framework for general robot manipulation, their performance remains constrained by the limited repertoire of available skills. This paper, ‘From Local Corrections to Generalized Skills: Improving Neuro-Symbolic Policies with MEMO’, introduces a method for dynamically expanding these skill sets by leveraging human feedback to create a retrieval-augmented skillbook. By clustering corrections and synthesizing generalized skill templates, the approach enables robots to generate novel skills from multi-task human guidance. Can this approach unlock more robust and adaptable robotic systems capable of learning from nuanced human instruction and generalizing to previously unseen tasks?
The Illusion of Control: Why Robots Still Need Us
Conventional robotic systems often depend on meticulously pre-programmed skills, a methodology that severely restricts their capacity to respond effectively to unforeseen circumstances. This approach, while successful in highly structured environments, falters when confronted with the inherent variability of the real world. Each new situation, even a minor deviation from the programmed parameters, can require significant re-engineering or complete task failure. The rigidity stems from the robot’s inability to generalize learned behaviors to novel scenarios; it operates based on explicit instructions rather than adaptable principles. Consequently, these systems struggle with even slight environmental changes, highlighting a critical need for robotic platforms capable of independent learning and dynamic skill acquisition to navigate the complexities of unstructured environments.
Robotics frequently encounters the “long tail” of real-world complexity – a vast number of infrequent, unpredictable scenarios that lie beyond the scope of typical training data. Current robotic systems, while proficient in well-rehearsed tasks, often falter when faced with even slight deviations from these established parameters. This necessitates extensive and time-consuming retraining for each new circumstance, creating a significant bottleneck in deployment and adaptability. The issue isn’t necessarily a lack of core capability, but rather the impracticality of anticipating and programming responses for every conceivable variation. Consequently, even minor changes in environment, object properties, or task requirements can trigger substantial performance degradation, highlighting the need for more robust and generalized learning approaches that move beyond rigid pre-programming.
Current robotic systems often demand exhaustive, step-by-step demonstrations for each new task, proving inefficient and impractical for real-world application. Researchers are now focused on developing methods that allow robots to learn incrementally from sparse human input – subtle corrections, high-level instructions, or even simple preferences – rather than requiring complete, pre-programmed sequences. This approach, a form of learning from demonstration under limited guidance, seeks to mimic the human ability to acquire skills through observation and refinement, enabling robots to adapt to unforeseen circumstances and continuously improve performance with minimal intervention. The aim is to bridge the gap between rigid automation and true autonomy, fostering robots capable of navigating the complexities of dynamic environments and assisting humans in increasingly versatile roles.

Building a Digital Memory: The Robot’s Skillbook
A Skillbook functions as a repository for robot experiences, recording both the robot’s actions and the associated feedback received from human operators or the environment. This data is stored as episodic memories, detailing specific situations and outcomes. The core principle is to move beyond static programming by capturing a continuous stream of interaction data, allowing the robot to build a knowledge base over time. Each entry within the Skillbook typically includes sensor data, action commands executed by the robot, and a corresponding reward or correction signal representing the quality of the outcome. This allows for subsequent analysis and generalization of learned behaviors, forming the basis for adaptive and robust performance in novel situations.
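The episodic structure described above can be sketched as a simple data model. The field names and `Skillbook` class below are illustrative assumptions for exposition, not MEMO's actual schema:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class SkillbookEntry:
    """One episodic memory: what the robot saw, did, and how it went.
    Field names are illustrative, not the paper's schema."""
    task_description: str    # natural-language goal for the episode
    observations: list[Any]  # sensor snapshots (e.g. camera frames, poses)
    actions: list[str]       # executed action commands or code snippets
    feedback: str            # human correction or environment signal
    success: bool            # outcome label used for later filtering

@dataclass
class Skillbook:
    """Append-only store of interaction episodes."""
    entries: list[SkillbookEntry] = field(default_factory=list)

    def record(self, entry: SkillbookEntry) -> None:
        self.entries.append(entry)

    def successes(self) -> list[SkillbookEntry]:
        # Successful episodes are the candidates for skill generalization.
        return [e for e in self.entries if e.success]
```

Keeping the store append-only preserves the full interaction history, so later analysis can revisit failures as well as successes.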
The MEMO methodology leverages the Skillbook by employing a Language Model to analyze stored robot experiences and identify similarities between them. This clustering process groups experiences based on shared characteristics, allowing the system to abstract common patterns and generalize learned skills beyond the specific conditions in which they were originally acquired. The Language Model assesses the textual data associated with each experience – including task descriptions, sensor data summaries, and human feedback – to create vector embeddings representing semantic similarity. Experiences with close proximity in this vector space are then clustered, forming the basis for reusable skill representations and enabling transfer learning to novel, but related, scenarios.
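The embed-then-cluster step can be illustrated with a toy sketch. The bag-of-words `embed` function below stands in for a learned embedding model, and the greedy single-pass grouping stands in for whatever clustering MEMO actually uses; the similarity threshold is an arbitrary assumption:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(descriptions: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single-pass clustering: join an experience to the first
    cluster whose representative is similar enough, else start a new one."""
    clusters: list[list[str]] = []
    for desc in descriptions:
        vec = embed(desc)
        for group in clusters:
            if cosine(vec, embed(group[0])) >= threshold:
                group.append(desc)
                break
        else:
            clusters.append([desc])
    return clusters

experiences = [
    "pick up the red block and place it on the tray",
    "pick up the blue block and place it on the tray",
    "open the top drawer of the cabinet",
]
groups = cluster(experiences)  # two pick-and-place episodes group together
```

The two pick-and-place episodes land in one cluster while the drawer episode forms its own, which is the kind of grouping that later seeds a generalized "pick and place" skill template.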
Dynamically expanding a robot’s Skillbook – a repository of past experiences and associated feedback – enables adaptation to novel situations without necessitating complete retraining. This approach contrasts with traditional robotic systems requiring extensive re-programming for each new environment or task variation. By continuously adding new experiences to the Skillbook, the robot incrementally builds a broader knowledge base. This allows the system to leverage previously learned skills, even if not directly applicable, and generalize them to address unforeseen circumstances. The result is a more robust system capable of maintaining performance across a wider range of operating conditions and exhibiting improved resilience to unexpected events, thereby reducing downtime and enhancing overall reliability.
Skillbook clustering facilitates the development of generalized skills by grouping similar experiences based on shared characteristics. This process moves beyond storing individual, specific instances of successful actions; instead, it identifies underlying patterns and abstracts them into reusable skill representations. By consolidating redundant information, clustering minimizes storage requirements and computational load. The resulting abstract skills are not tied to the precise conditions of the original experiences, enabling the robot to apply them effectively across a wider range of novel situations and significantly improving task efficiency by avoiding the need to relearn variations of the same fundamental skill.

The Hybrid Approach: Combining Logic and Perception
A neuro-symbolic approach integrates the pattern recognition capabilities of neural networks with the logical reasoning of symbolic systems to address limitations inherent in each method when applied to robotic environments. Neural networks excel at perceptual tasks like object detection and scene understanding, but struggle with generalization and explainability. Symbolic reasoning provides structured knowledge representation and logical inference, yet requires manually defined rules and struggles with noisy or incomplete data. By combining these paradigms, robots can leverage neural networks for perception and symbolic systems for planning and decision-making, resulting in improved robustness, adaptability, and the ability to operate in complex, dynamic environments. This integration facilitates both high-level reasoning about tasks and low-level control of robotic actuators.
Direct Robot Correction (DROC) is a technique that integrates human feedback directly into a robot’s operational loop to improve performance. This method stores context-specific corrections – adjustments made by a human demonstrator to a robot’s actions – as parameterized updates to the robot’s policy. These corrections are not treated as isolated instances, but are generalized and applied to similar future situations encountered by the robot. The system learns to predict when and how to apply these context-aware corrections, effectively refining the robot’s behavior over time and allowing it to adapt to nuanced environmental factors or task variations that were not explicitly programmed. This approach contrasts with traditional reinforcement learning by leveraging immediate human guidance instead of requiring extensive trial-and-error exploration.
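A minimal sketch of context-aware correction reuse in the spirit of DROC: corrections are stored with symbolic context features and replayed when a sufficiently similar context recurs. The feature representation, Jaccard similarity, and threshold are all assumptions for illustration, not the method's actual mechanics:

```python
from dataclasses import dataclass

@dataclass
class Correction:
    context: set[str]  # symbolic features of the situation (illustrative)
    adjustment: str    # the human-provided fix, e.g. a parameter tweak

class CorrectionMemory:
    """Store human corrections keyed by context; retrieve them for
    similar future situations instead of treating each as one-off."""
    def __init__(self, min_overlap: float = 0.5):
        self.memory: list[Correction] = []
        self.min_overlap = min_overlap

    def store(self, context: set[str], adjustment: str) -> None:
        self.memory.append(Correction(context, adjustment))

    def retrieve(self, context: set[str]) -> list[str]:
        hits = []
        for c in self.memory:
            # Jaccard overlap between stored and current context features.
            overlap = len(c.context & context) / len(c.context | context)
            if overlap >= self.min_overlap:
                hits.append(c.adjustment)
        return hits
```

A correction recorded for grasping a mug by its left-facing handle would then fire again when the handle faces right, while an unrelated drawer-opening context retrieves nothing.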
Foundation Models facilitate robotic task execution by breaking down complex goals into a sequence of manageable sub-steps. This decomposition allows for a modular approach to skill application, where each sub-step corresponds to a specific, executable action or procedure. By representing tasks as structured sequences, these models enable robots to leverage pre-trained knowledge and generalize to novel situations more effectively. This contrasts with monolithic action policies, as it allows for targeted refinement of individual sub-steps and facilitates the combination of existing skills to address more intricate challenges. The resulting framework supports hierarchical planning and allows robots to reason about task dependencies and preconditions, ultimately improving performance and robustness.
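The decomposition idea can be sketched as a goal-to-plan mapping. Here a hard-coded lookup table stands in for the foundation model's planner, and the skill names and arguments are hypothetical:

```python
def decompose(goal: str) -> list[tuple[str, dict]]:
    """Toy planner standing in for a foundation model: maps a known
    high-level goal to an ordered sequence of (skill, arguments) sub-steps."""
    plans = {
        "put the mug in the sink": [
            ("locate",  {"object": "mug"}),
            ("grasp",   {"object": "mug"}),
            ("move_to", {"target": "sink"}),
            ("release", {}),
        ],
    }
    if goal not in plans:
        raise ValueError(f"no plan for goal: {goal!r}")
    return plans[goal]

steps = decompose("put the mug in the sink")
```

Because each sub-step is an explicit, named skill invocation, an individual step can be refined or swapped without touching the rest of the plan, which is the modularity the paragraph describes.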
Vision and Language Models (VLMs), including RT-2 and Octo, function as essential components in robotic systems by translating perceptual input into a structured, machine-readable format and interpreting natural language instructions. These models utilize techniques to generate Scene Graphs, which represent the objects within an environment and their relationships to each other, providing a contextual understanding of the robot’s surroundings. Simultaneously, VLMs process task descriptions – expressed in natural language – to determine the desired outcome or goal. This dual capability allows robots to connect language-based commands with visual observations, enabling them to perform tasks specified through both modalities; RT-2, for example, demonstrates the ability to generalize to novel visual categories based on language instruction, while Octo showcases proficiency in multi-modal reasoning and task execution.
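A scene graph of the kind a VLM might emit can be represented as objects plus typed relations. The relation vocabulary below ("on", "inside") is illustrative, not a fixed standard:

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Objects and typed (subject, relation, object) triples describing
    the robot's surroundings, as a VLM might produce from an image."""
    objects: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, subj: str, rel: str, obj: str) -> None:
        self.objects.update({subj, obj})
        self.relations.add((subj, rel, obj))

    def query(self, rel: str) -> list[tuple[str, str]]:
        # All (subject, object) pairs connected by the given relation.
        return [(s, o) for (s, r, o) in self.relations if r == rel]

graph = SceneGraph()
graph.add("mug", "on", "table")
graph.add("spoon", "inside", "mug")
```

A planner can then ground a language command like "take the spoon out of the mug" by querying the graph for the relevant relation before acting.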

The Rise of the Autonomous Agent: Robots That Teach Themselves
Agentic systems represent a paradigm shift in robotics, moving beyond pre-programmed instructions towards autonomous skill development. Systems like Voyager don’t simply execute tasks; they actively generate and store new, executable programs as reusable skills through a process of iterative self-improvement. This means the system isn’t limited to behaviors explicitly coded by humans; it can explore, experiment, and build its own repertoire of actions. Each successful attempt, or even a partially successful one, becomes a learning opportunity, with the resulting program stored for future use. This capability allows the system to accumulate knowledge and adapt to novel situations, essentially evolving its skillset over time and demonstrating a form of algorithmic self-teaching. The stored skills then become building blocks for tackling increasingly complex challenges, fostering a cycle of continuous learning and adaptation.
The development of ProgPrompt represents a significant step towards building truly adaptable robotic systems through the compositional creation of skills. Rather than pre-programming every possible action, ProgPrompt allows a robot to construct complex behaviors from a library of fundamental “skill primitives”: basic actions like grasping, moving, or observing. This modular approach mirrors how humans learn, assembling simple movements into intricate sequences. By intelligently combining these primitives, robots can generate novel behaviors to address unforeseen challenges, moving beyond the limitations of fixed programming. The resulting flexibility is not simply about executing a wider range of actions, but about the capacity to create new actions on demand, fostering a level of autonomy previously unattainable in robotics.
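The compositional idea can be sketched in a few lines: primitives are plain functions, and a new skill is just an ordered composition of them. The primitive names and string-trace representation are assumptions for illustration, not ProgPrompt's actual API:

```python
# Illustrative primitives; each returns a trace of the action it performs.
def grasp(obj: str) -> str:
    return f"grasp({obj})"

def move_to(target: str) -> str:
    return f"move_to({target})"

def release() -> str:
    return "release()"

def compose(*steps):
    """Assemble a new skill as a sequence of zero-argument primitive calls."""
    def skill():
        return [step() for step in steps]
    return skill

# A novel behavior assembled on demand from existing primitives.
put_away = compose(
    lambda: grasp("cup"),
    lambda: move_to("shelf"),
    lambda: release(),
)
trace = put_away()
```

The composed `put_away` skill can itself be stored and reused as a building block, which is how a small primitive library yields an open-ended behavior repertoire.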
PragmaBot represents a significant leap in robotic autonomy through its capacity for experiential learning. This system doesn’t simply execute programs; it actively analyzes the consequences of its actions, essentially ‘reflecting’ on what worked and what didn’t. By storing these outcomes as learned experiences – a form of internal knowledge base – PragmaBot refines its future behavior without explicit reprogramming. This self-assessment allows the robot to build upon past successes and avoid repeating errors, leading to increasingly efficient and adaptable performance in novel situations. The ability to internalize and leverage experience distinguishes PragmaBot, paving the way for robotic systems that continuously improve and operate with greater independence.
Recent innovations in robotic skill generation are yielding demonstrably improved performance in real-world applications. Studies reveal that agentic robotic systems, capable of autonomously developing and storing learned behaviors, now achieve a 78% success rate when confronted with tasks they have never encountered before. This substantial figure highlights a critical shift from pre-programmed automation to genuinely adaptive robotics, where systems can independently problem-solve and refine their capabilities. The ability to generalize learned skills to novel situations represents a significant leap towards more versatile and reliable robotic assistants, promising increased efficiency and broader applicability across diverse operational environments.

The pursuit of generalized skills, as detailed in this work with MEMO, feels predictably optimistic. It’s a familiar dance: elegant architectures designed to sidestep the inevitable entropy of production systems. The idea of a ‘skillbook’ – a repository of successful code aggregated from human feedback – is, in essence, just a more sophisticated form of the patch collection every engineer secretly maintains. As Grace Hopper famously said, “It’s easier to ask forgiveness than it is to get permission.” This sentiment rings true; MEMO attempts to build a framework for proactively acquiring skills, but one suspects the real innovation lies in how gracefully it handles the failures: the skills that don’t quite transfer, the edge cases the clustering algorithms miss. The system doesn’t prevent tech debt; it merely organizes it into neatly labeled categories.
What’s Next?
The ambition to build systems that accumulate skills, as MEMO attempts, inevitably runs into the realities of deployment. A skillbook, however elegantly clustered, is still just data – and data degrades. The question isn’t simply whether a robot can learn a new skill, but whether that skill remains reliable after weeks, months, or years of operation in unpredictable environments. Successful retrieval-augmented generation relies on successful examples; the edge cases, the subtly different scenarios, will always accumulate and require correction. Expect the “skill clustering” to become an increasingly complex process of damage control.
Furthermore, the reliance on human feedback introduces its own set of problems. Feedback is expensive, inconsistent, and frequently contradictory. A system that scales skill acquisition to match the complexity of the real world will need to address the bottleneck of human-in-the-loop refinement, perhaps by embracing imperfect automation or actively modeling the biases inherent in the feedback itself. The promise of a continually expanding skillset will likely be tempered by the practicalities of maintaining its quality.
It’s tempting to frame this as progress towards ‘general’ intelligence, but a more honest assessment acknowledges that each successful skill added is merely another layer of complexity. If code looks perfect, no one has deployed it yet. The field will likely spend the next decade not achieving ‘general’ skills, but building increasingly robust systems for managing the inevitable entropy of accumulated knowledge.
Original article: https://arxiv.org/pdf/2603.04560.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-07 12:34