Author: Denis Avetisyan
A new control framework empowers robots to seamlessly combine locomotion and manipulation tasks, opening doors for more versatile and adaptable robotic systems.

This work presents Sumo, a hierarchical control approach integrating pre-trained reinforcement learning with sample-based model predictive control for robust whole-body loco-manipulation in quadruped and humanoid robots.
Achieving robust and adaptable whole-body manipulation remains a challenge for legged robots operating in complex, real-world environments. This paper introduces ‘Sumo: Dynamic and Generalizable Whole-Body Loco-Manipulation’, a hierarchical control framework that combines pre-trained reinforcement learning with real-time sample-based model predictive control to enable dynamic and versatile loco-manipulation on both quadrupedal and humanoid platforms. Demonstrated through tasks like uprighting heavy tires and dragging barriers exceeding the robot’s size, Sumo achieves generalization across diverse objects and scenarios without task-specific tuning. Could this approach unlock a new era of adaptable robotic agents capable of seamlessly interacting with and modifying their surroundings?
The Illusion of Control: Why Robots Still Can’t Handle Reality
Conventional robotic control strategies often prove inadequate when addressing the intricacies of loco-manipulation – the coordinated execution of movement and object interaction. These methods typically rely on meticulously crafted mathematical models that describe a robot’s dynamics and its environment. However, creating accurate models for complex, real-world scenarios is exceptionally challenging, and even slight discrepancies can lead to instability or failure. Furthermore, these systems demand extensive, task-specific tuning – a process of painstakingly adjusting numerous control parameters to achieve desired performance. This sensitivity limits their adaptability; a robot expertly calibrated for one task may struggle significantly when presented with even minor variations or unexpected disturbances, hindering its ability to operate reliably in dynamic and unstructured environments.
Robotic systems designed for real-world interaction frequently encounter environments that deviate from idealized conditions, causing performance degradation. Unexpected disturbances – a slippery floor, an object shifting during grasping, or even subtle changes in payload weight – can disrupt carefully programmed movements and lead to instability. Traditional control architectures, heavily reliant on precise environmental models, struggle to compensate for these unforeseen events. This limitation significantly hinders a robot’s adaptability; a system proficient in a controlled laboratory setting may falter when deployed in a dynamic, unstructured environment. Consequently, achieving true robotic versatility demands control strategies that prioritize robustness and the ability to gracefully recover from disturbances, rather than solely focusing on precise trajectory tracking.
Traditional robotic control relies heavily on pre-programmed sequences and accurate environmental models, a strategy that proves brittle when confronted with the unpredictable nature of real-world tasks. To overcome this limitation, researchers are increasingly focused on control methodologies that embrace, rather than resist, inherent instability within robotic systems. These approaches, often leveraging techniques like Model Predictive Control and reinforcement learning, allow robots to dynamically adjust to disturbances and uncertainties. Instead of striving for perfect precision, the goal is to create systems that can recover from imbalances and maintain functionality even when faced with unexpected events. This shift towards adaptable control promises a new generation of robots capable of navigating complex environments and performing intricate tasks with a level of robustness previously unattainable, moving beyond rigidly defined operations toward truly versatile performance.
The seamless integration of locomotion and manipulation is paramount for robots operating in real-world scenarios. Achieving proficient loco-manipulation isn’t simply a matter of executing each action independently; rather, it requires a sophisticated interplay where movement and object interaction are continuously adjusted in response to one another. A robot must anticipate how its movements will affect its ability to manipulate objects, and conversely, how manipulating an object will influence its stability and trajectory. This coordination demands precise timing, force control, and a predictive capability that allows the robot to react effectively to disturbances and maintain balance while actively engaging with its environment. Successfully navigating and interacting with complex surroundings hinges on this synergistic relationship between how a robot moves and what it does with its hands – or end-effectors.

Sumo: A Hybrid Approach to Masking Inevitable Failure
The Sumo framework integrates pre-trained reinforcement learning (RL) policies with online, sample-based Model Predictive Control (MPC) to achieve robust and adaptable robot control. The RL component provides an initial, generalized policy trained offline, offering a foundation for whole-body movement and efficient task initiation. This pre-trained policy is then refined in real-time by the MPC, which utilizes sampled predictions of the robot’s future states to optimize actions based on a defined cost function. By combining the exploration capabilities and generalization of RL with the precise, optimization-based control of MPC, the framework aims to overcome limitations inherent in each individual approach, resulting in a system that is both robust to disturbances and capable of high-performance task execution.
The Sumo framework utilizes a pre-trained ‘Generalist Policy’ as the initial control source for whole-body movements. This policy, trained offline, provides a robust and generalized skillset applicable to a range of tasks, thereby circumventing the need for random exploration during online operation. By establishing a functional baseline, the Generalist Policy significantly accelerates task initialization and improves sample efficiency. This pre-training process equips the system with a foundational understanding of dynamics and control, enabling it to quickly adapt to new scenarios and perform complex maneuvers with greater stability and reliability compared to systems reliant solely on online learning.
Sample-Based Model Predictive Control (MPC) within the Sumo framework operates by repeatedly solving an optimization problem over a finite prediction horizon. At each time step, the MPC utilizes a dynamic model to predict the system’s future states based on a set of candidate control actions. These actions are then evaluated against defined ‘Task Objectives’ using a cost function, which quantifies performance metrics such as tracking error, energy consumption, or obstacle avoidance. The control action minimizing this cost is implemented, and the process repeats, allowing the system to adapt to environmental changes and refine its behavior in real-time without requiring explicit re-training of the underlying dynamic model. This approach enables robust performance even in uncertain or dynamic environments by continuously re-planning based on the latest observations.
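The re-planning loop described above can be sketched as a simple random-shooting variant of sample-based MPC. The dynamics model, cost terms, and Gaussian sampling distribution below are illustrative assumptions for a toy double-integrator, not the paper’s actual formulation:

```python
import numpy as np

def sample_based_mpc(x0, dynamics, cost, horizon=10, n_samples=256,
                     action_dim=1, rng=None):
    """One re-planning step: sample candidate action sequences, roll each
    out through the dynamics model, and return the first action of the
    lowest-cost sequence (receding-horizon control)."""
    rng = rng or np.random.default_rng(0)
    # Candidate action sequences: (n_samples, horizon, action_dim)
    candidates = rng.normal(0.0, 1.0, size=(n_samples, horizon, action_dim))
    costs = np.empty(n_samples)
    for i, seq in enumerate(candidates):
        x, total = x0, 0.0
        for u in seq:                 # forward-simulate the model
            x = dynamics(x, u)
            total += cost(x, u)       # accumulate stage cost
        costs[i] = total
    return candidates[np.argmin(costs)][0]   # execute only the first action

# Toy double-integrator: drive position and velocity toward the origin.
dyn = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])
stage_cost = lambda x, u: x[0]**2 + 0.1 * x[1]**2 + 0.01 * u[0]**2
u0 = sample_based_mpc(np.array([1.0, 0.0]), dyn, stage_cost)
```

In a real controller this step runs at every control tick, so only the first action of the winning sequence is ever executed before the next re-plan.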
The Sumo framework utilizes a cost function to numerically evaluate the performance of actions selected by the Model Predictive Control (MPC) algorithm. This function assigns a scalar value representing the degree to which a proposed action satisfies defined ‘Task Objectives’, such as minimizing tracking error, maximizing speed, or conserving energy. The MPC then iteratively solves an optimization problem – minimizing this cost function over a prediction horizon – to determine the optimal sequence of control actions. The specific formulation of the cost function includes weighted terms for each objective, allowing designers to prioritize certain aspects of performance over others and tune the system’s behavior. The resulting cost value provides a quantitative metric for comparing different control strategies and guiding the MPC towards achieving the desired outcomes.
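A weighted multi-objective cost of the kind described above might look as follows; the specific terms, the clearance penalty, and the weights are hypothetical illustrations, not Sumo’s actual cost function:

```python
import numpy as np

def task_cost(x, u, x_ref, weights):
    """Weighted sum of task objectives evaluated at one predicted state.
    Each weight lets a designer prioritize one objective over another."""
    w_track, w_effort, w_obs = weights
    tracking = np.sum((x - x_ref) ** 2)       # tracking error
    effort = np.sum(u ** 2)                   # actuation / energy effort
    obstacle = max(0.0, 0.5 - x[0]) ** 2      # hypothetical clearance penalty
    return w_track * tracking + w_effort * effort + w_obs * obstacle
```

Raising `w_track` relative to `w_effort` trades energy efficiency for tighter reference tracking, which is exactly the tuning knob the paragraph above describes.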
From Simulation to Reality: A Carefully Constructed Illusion
Sumo’s capability in contact-rich manipulation stems from its design to explicitly model and control the complex interactions occurring at contact surfaces. This allows the robot to respond dynamically to variations in object pose, surface friction, and external disturbances during manipulation tasks. Unlike systems that treat contacts as simple constraints, Sumo predicts contact forces and torques, enabling nuanced control of object motion and preventing instability. This is crucial for tasks requiring precise force application or adapting to uncertain object properties, resulting in a more robust and reliable manipulation performance.
The Sumo framework incorporates motion capture data to refine the accuracy of its object models. This process involves recording the precise movements of objects during manipulation and using that data to build a more realistic and detailed representation within the simulation environment. By improving the fidelity of the object model – specifically, its dynamic properties like mass distribution, friction coefficients, and collision shapes – the framework enables the development of more effective manipulation strategies. These refined models allow for more accurate prediction of object behavior during simulated interactions, which directly translates to improved performance when deploying those strategies on a physical robot.
Sumo incorporates reinforcement learning (RL) to enable robots to autonomously refine their manipulation strategies. This is achieved by defining a reward function that incentivizes successful task completion and penalizes failures or inefficient movements. Through iterative interaction with the environment – both simulated and real-world – the robot’s control policy is updated based on the received rewards, using algorithms such as Proximal Policy Optimization. This process allows the robot to learn complex manipulation skills, adapt to variations in object properties and environmental conditions, and continuously improve its performance over time without explicit programming for each scenario.
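A shaped reward of the kind such an RL pipeline would optimize can be sketched as below; the dense progress term, effort penalty, and success bonus are generic assumptions, not the reward actually used in the paper:

```python
import numpy as np

def manipulation_reward(obj_pos, obj_goal, joint_vel, success_bonus=10.0):
    """Illustrative shaped reward: dense progress toward the goal,
    an effort penalty, and a sparse bonus on task success."""
    dist = np.linalg.norm(obj_pos - obj_goal)
    progress = -dist                          # closer to the goal is better
    effort = -0.01 * np.sum(joint_vel ** 2)   # discourage wasteful motion
    bonus = success_bonus if dist < 0.05 else 0.0
    return progress + effort + bonus
```

An on-policy algorithm such as PPO would then update the control policy to maximize the expected discounted sum of this reward over episodes.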
The Sumo framework specifically addresses the challenge of sim-to-real transfer, enabling robotic control policies trained in simulation to function effectively in physical environments. Validation experiments yielded a 100% success rate across multiple manipulation tasks – including tire uprighting and box pushing – each tested with ten individual trials. This performance indicates a robust transfer of learned behaviors from the simulated environment to real-world operation, minimizing the need for extensive retraining or adaptation in the physical world.

The Inevitable March of Automation: Or, How We Pretend It’s Going Well
The Sumo framework represents a significant advancement in robotic control, enabling robots to perform intricate, coordinated movements crucial for modern logistics and manufacturing environments. This capability extends beyond pre-programmed routines, allowing for dynamic adjustments in response to changing conditions and unexpected obstacles. Robots powered by Sumo can autonomously manage complex manipulation tasks, such as precisely placing components on an assembly line or efficiently sorting packages, that previously required significant human intervention. The framework’s strength lies in its ability to seamlessly integrate motion planning, control, and adaptation, paving the way for more versatile and reliable robotic systems capable of handling the demands of fast-paced, unstructured workplaces and ultimately boosting productivity and efficiency.
The Sumo framework achieves enhanced robotic autonomy through a synergistic blend of pre-trained intelligence and on-the-fly responsiveness. Rather than relying solely on pre-programmed instructions, the system leverages learned policies – essentially, a robot’s accumulated experience – to initiate tasks. However, recognizing that real-world environments are rarely predictable, the framework incorporates real-time adaptation, allowing the robot to dynamically adjust its actions based on immediate sensory input. This capability is crucial for navigating unstructured settings – warehouses, construction sites, or even disaster zones – where unexpected obstacles or variations in terrain are commonplace. By continuously refining its approach, the robot maintains reliable performance even when faced with unforeseen challenges, marking a significant step toward truly versatile and independent robotic operation.
Achieving robust robotic control in complex scenarios demands precise parameter tuning, and this work leverages the Cross-Entropy Method to optimize the parameters governing Model Predictive Control (MPC). This optimization process ensures not only stable operation but also efficient task execution, as demonstrated through rigorous testing involving tire manipulation. Specifically, robots utilizing this optimized MPC achieved an average completion time of 9.2 ± 4.7 seconds when righting an overturned tire – a surprisingly swift action for a complex robotic task. Further highlighting the system’s capabilities, the robots successfully stacked a tire atop another in all trials (10 out of 10 attempts), showcasing a level of dexterity previously challenging for autonomous systems. This precise control, enabled by the Cross-Entropy Method, represents a significant step towards more reliable and adaptable robotic performance in real-world applications.
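The Cross-Entropy Method mentioned above fits a sampling distribution to the best-performing candidates and resamples around them. A minimal sketch, applied here to a toy quadratic objective rather than the paper’s actual MPC parameters:

```python
import numpy as np

def cross_entropy_optimize(objective, dim, iters=20, pop=64,
                           elite_frac=0.1, rng=None):
    """Cross-Entropy Method: maintain a diagonal Gaussian over parameter
    vectors, and repeatedly refit it to the elite (lowest-cost) samples."""
    rng = rng or np.random.default_rng(0)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([objective(s) for s in samples])
        elite = samples[np.argsort(scores)[:n_elite]]   # keep the best
        mu = elite.mean(axis=0)                         # refit the Gaussian
        sigma = elite.std(axis=0) + 1e-6                # avoid collapse to zero
    return mu

# Example: recover the minimizer of a quadratic, which sits at [2, -1].
best = cross_entropy_optimize(lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2, dim=2)
```

For MPC tuning, `objective` would instead run a closed-loop rollout with the candidate parameters and return a task-performance cost.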
The development of robotic systems capable of fluid interaction with complex environments represents a significant step toward widespread human-robot collaboration. Recent advancements showcase a pathway to achieving this seamless integration, evidenced by a demonstrated ability to perform tasks requiring both navigation and physical manipulation. Specifically, testing revealed a 99% success rate in dragging a crowd barrier – a task demanding adaptable locomotion and force control – highlighting the potential for these robots to assist in real-world scenarios like event management or disaster response. This level of performance suggests a future where robots can reliably and safely operate alongside humans, augmenting their capabilities and streamlining a variety of labor-intensive processes.
The pursuit of elegant control frameworks, as demonstrated by Sumo’s hierarchical approach, often feels like building sandcastles against the tide. This work attempts to bridge the gap between learned policies and real-time adaptation, a laudable goal, yet one inevitably met with the unpredictable nature of production environments. It’s a constant recalibration, a patching of assumptions. As Edsger W. Dijkstra observed, “It’s always time to start improving the things that aren’t working.” Sumo, with its integration of reinforcement learning and sample-based MPC, embodies that iterative process – a refinement built upon the recognition that perfect models are a fiction, and robust loco-manipulation requires a willingness to embrace, and compensate for, imperfection. The system will inevitably require patching; the question is whether those patches are anticipated within the framework’s design.
The Road Ahead
This Sumo framework, elegantly layering learned policies atop model predictive control, feels… familiar. The history of robotics is littered with attempts to bridge the reality gap between simulation and, well, reality. Pre-trained policies are merely a new coat of paint on the old problem of distribution shift. Production, as always, will find a way to introduce edge cases the simulations missed: a slightly uneven floor, a rogue cable, a particularly ambitious dust bunny.
The true test isn’t whether this works in a lab, but how gracefully it fails in the wild. Future work will inevitably focus on more robust adaptation, perhaps through online learning or better uncertainty quantification. But let’s be honest: that’s just kicking the can down the road. The underlying tension between learning and control isn’t solved, merely re-packaged. Everything new is old again, just renamed and still broken.
One suspects the next iteration will involve even more layers of abstraction, chasing an ever-elusive ‘general’ solution. Perhaps the real breakthrough won’t be in the algorithms themselves, but in accepting that a little bit of controlled clumsiness is simply inherent to embodied intelligence. After all, robots, like people, rarely achieve perfection. They just learn to pick themselves up after falling.
Original article: https://arxiv.org/pdf/2604.08508.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-04-10 07:58