Humanoid Robots Learn to Lift and Move Heavy Loads

Author: Denis Avetisyan


Researchers have developed a reinforcement learning framework allowing a humanoid robot to perform complex loco-manipulation tasks involving significant force exertion.

Training a robotic policy with a targeted sampling strategy – focused on a target velocity of $0.7\,\text{m/s}$ and zero target force – enables improved generalization across varying force levels, preventing convergence toward high-force scenarios that degrade locomotion performance under low-force conditions.
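
The sampling idea above can be made concrete with a short sketch. Assuming uniform command ranges and an explicit anchor probability (both invented here for illustration; the paper's actual ranges and proportions are not reproduced), biased command sampling might look like this in Python:

```python
import numpy as np

# Minimal sketch of the targeted sampling strategy described above; all
# ranges, names, and the 0.2 anchor probability are illustrative, not
# taken from the paper.
VEL_RANGE = (0.0, 1.0)      # commanded forward velocity, m/s
FORCE_RANGE = (0.0, 100.0)  # commanded external force, N
ANCHOR = (0.7, 0.0)         # target velocity 0.7 m/s, zero target force
ANCHOR_PROB = 0.2           # hypothetical fraction of anchored samples

def sample_command(rng: np.random.Generator) -> tuple[float, float]:
    """Draw a (velocity, force) command, biased toward the anchor.

    Oversampling the zero-force anchor keeps low-force locomotion in the
    training distribution, so the policy does not overfit to high-force
    pushing at the expense of ordinary walking.
    """
    if rng.random() < ANCHOR_PROB:
        return ANCHOR
    vel = rng.uniform(*VEL_RANGE)
    force = rng.uniform(*FORCE_RANGE)
    return vel, force
```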

A three-policy system enables whole-body control and curriculum learning for robust force-capable manipulation of heavy objects, demonstrated with a 112.8 kg cart.

Despite advances in humanoid robotics, reliably combining dexterous manipulation with robust force control remains a challenge for high-load industrial applications. This work introduces a novel framework for ‘Kinematics-Aware Multi-Policy Reinforcement Learning for Force-Capable Humanoid Loco-Manipulation’, presenting a three-policy reinforcement learning approach to address this gap. By decoupling upper and lower body control with a delta-command policy, and leveraging kinematic priors and force-based curriculum learning, the framework enables a humanoid robot to proactively interact with its environment. Could this approach unlock truly versatile and powerful human-robot collaboration in demanding industrial settings?


The Essence of Embodied Interaction

Truly versatile robotic manipulation extends far beyond the precise charting of movement paths. While trajectory planning dictates where a robot arm should go, achieving robust performance in real-world scenarios necessitates a dynamic interplay between locomotion and force control. A robot must be able to adjust its physical positioning – its gait and balance – to access objects, counteract disturbances, and maintain stability during interactions. Simultaneously, sensitive force control allows it to modulate the pressure applied during grasping and assembly, preventing damage to both the manipulated object and the robot itself. Without this integrated approach, robots remain limited to highly structured environments and predictable tasks, unable to navigate the complexities and uncertainties inherent in dynamic, unstructured settings.

Conventional robotic systems often falter when faced with real-world complexity due to a fragmented approach to motion and force control. Historically, locomotion and manipulation have been treated as largely independent problems, with separate algorithms governing each. This separation results in brittle performance; a robot might successfully execute a pre-programmed sequence in a controlled environment, but struggle when confronted with unexpected obstacles, variations in object pose, or dynamic changes in its surroundings. The lack of seamless integration necessitates extensive, and often impractical, re-programming for even minor deviations from ideal conditions, hindering adaptability and ultimately limiting the robot’s effectiveness in unstructured or unpredictable scenarios. Consequently, achieving truly robust and efficient robotic performance demands a unified framework capable of coordinating locomotion and manipulation in a fluid and responsive manner.

The Unitree G1 humanoid robot presents a uniquely advantageous platform for advancing the field of loco-manipulation, bridging the gap between mobile robotics and dexterous manipulation. Its dynamic bipedal locomotion, combined with versatile arms, allows researchers to develop and rigorously test policies that integrate movement and force control in complex, real-world scenarios. This platform’s robustness and adaptability are crucial for simulating the unpredictable demands of industrial environments, where robots must navigate obstacles while simultaneously performing intricate tasks. By leveraging the G1, scientists are able to move beyond simulations and evaluate the practicality of algorithms designed for a fully functional, mobile manipulator – a system poised to revolutionize automation in manufacturing, logistics, and beyond.

A delta-command policy successfully stabilized the end-effector pose during robotic manipulation, as demonstrated by experiments with a 4 kg payload and during dynamic movements like squatting with a mounted gimbal, preventing downward displacement observed without the policy.

Learning Through Integrated Action

Reinforcement learning was utilized to develop independent control policies for the robot’s lower and upper body. The lower-body policy governs both locomotion and the application of forces, enabling the robot to maintain stability and exert external forces. Simultaneously, a separate reinforcement learning policy was trained for the upper body, focusing on manipulation tasks. This dual-policy approach allows for the coordinated execution of complex behaviors requiring both movement and interaction with the environment, such as carrying objects while simultaneously applying force to another object. The policies are trained concurrently, but are distinct, allowing for specialization and efficient learning of individual skills.
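
A minimal sketch may help make the decoupling concrete. Assuming each policy is simply a callable from observations to joint-position targets for its own joint group (the class and names below are illustrative, not the authors' code), the whole-body command can be assembled like so:

```python
import numpy as np

# Schematic of the decoupled control loop: two specialized policies act
# on disjoint joint groups and their actions are merged at each step.
class WholeBodyController:
    def __init__(self, lower_policy, upper_policy, n_leg_joints, n_arm_joints):
        self.lower_policy = lower_policy   # locomotion + force exertion
        self.upper_policy = upper_policy   # arm manipulation
        self.n_leg = n_leg_joints
        self.n_arm = n_arm_joints

    def step(self, obs_lower, obs_upper):
        # Each policy commands only its own joints; the two action
        # vectors are concatenated into one whole-body target.
        leg_targets = self.lower_policy(obs_lower)   # shape (n_leg,)
        arm_targets = self.upper_policy(obs_upper)   # shape (n_arm,)
        return np.concatenate([leg_targets, arm_targets])
```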

The lower-body policy was trained using curriculum learning, a method that sequences tasks from simple to complex to improve learning efficiency and stability. Initial training focused on basic locomotion skills, establishing a foundational ability to move and maintain balance. Subsequent stages progressively introduced force control challenges, requiring the robot to exert and regulate forces with its lower body. This incremental approach allowed the policy to first master fundamental movements before tackling more demanding tasks involving active force application, ultimately enabling the robot to handle dynamic interactions with its environment and support complex manipulation scenarios.
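
One common way to implement such a schedule is to gate the sampled force range on a performance criterion. The sketch below assumes progress is measured against a tracking-error threshold; the thresholds and increments are placeholders rather than the paper's actual schedule:

```python
# Sketch of a force-based curriculum: the commanded force range starts
# at zero (pure locomotion) and widens only as tracking improves. The
# numeric values are invented for illustration.
class ForceCurriculum:
    def __init__(self, max_force=120.0, step=10.0, error_tol=0.1):
        self.force_limit = 0.0    # start with locomotion only, no force task
        self.max_force = max_force
        self.step = step
        self.error_tol = error_tol

    def update(self, mean_tracking_error: float) -> float:
        # Widen the sampled force range only once the policy tracks the
        # current range well, so each stage builds on a mastered skill.
        if mean_tracking_error < self.error_tol:
            self.force_limit = min(self.force_limit + self.step, self.max_force)
        return self.force_limit
```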

The integrated reinforcement learning approach successfully trained the robot to perform a complex cart-manipulation task, demonstrating coordinated locomotion and manipulation capabilities. Specifically, the robot achieved stable bipedal locomotion while simultaneously carrying a 4 kg object and exerting force on a cart with a total mass of 112.8 kg. This task required the robot to dynamically adjust its gait and force application to maintain balance and effectively move the loaded cart, validating the efficacy of the learned policy in a challenging, real-world scenario.

This system architecture integrates upper-body, lower-body, and delta-command policies to enable coordinated loco-manipulation, with filled circles denoting data outputs and hollow circles indicating no output.

Robustness Through Simulation and Refinement

Training within the IsaacSim robotic simulation platform facilitates rapid data collection and policy optimization by enabling parallel simulation and automated data logging. This approach bypasses the limitations and costs associated with real-world data acquisition, such as time, equipment wear, and potential safety concerns. IsaacSim’s physics engine allows for the generation of diverse training scenarios and facilitates the evaluation of policy performance across a range of conditions. The platform supports programmatic control of simulation parameters, enabling systematic exploration of the policy’s parameter space and accelerating the optimization process. Furthermore, the generated synthetic data is perfectly labeled, removing the need for manual annotation and streamlining the training pipeline for reinforcement learning and imitation learning algorithms.
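
In spirit, the training loop steps many simulations in lockstep and logs every transition. The sketch below uses a generic gym-style batched environment as a stand-in; it is not the actual IsaacSim API:

```python
import numpy as np

# Generic sketch of batched rollout collection in the style of a
# vectorized simulator; `env` is a hypothetical stand-in with a
# gym-like batched interface, not the real IsaacSim bindings.
def collect_rollout(env, policy, horizon: int):
    """Step all parallel simulations in lockstep and log transitions."""
    obs = env.reset()                           # shape (num_envs, obs_dim)
    buffer = []
    for _ in range(horizon):
        actions = policy(obs)                   # shape (num_envs, act_dim)
        next_obs, rewards, dones, _ = env.step(actions)
        buffer.append((obs, actions, rewards, dones))
        obs = next_obs
    return buffer                               # horizon entries of batched tuples
```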

Domain randomization systematically varies simulation parameters during training to create a diverse dataset of synthetic experiences. This process includes randomizing physical properties such as friction, mass, and damping, as well as visual characteristics like textures and lighting conditions. By training policies across this broad distribution of simulated environments, the resulting agent develops increased robustness to variations encountered in real-world deployment. This approach effectively bridges the reality gap, reducing the need for extensive real-world training data and improving the policy’s ability to generalize to previously unseen conditions, ultimately enhancing performance and reliability.
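
A typical implementation samples a fresh set of physical parameters at each episode reset. The ranges below are illustrative placeholders, not the values used in this work:

```python
import numpy as np

# Sketch of per-episode domain randomization over the physical
# properties named above; every range here is an invented example.
def randomize_physics(rng: np.random.Generator) -> dict:
    """Sample one set of physical parameters for a training episode."""
    return {
        "friction":   rng.uniform(0.4, 1.2),   # ground contact friction
        "added_mass": rng.uniform(-1.0, 3.0),  # kg perturbation on links
        "damping":    rng.uniform(0.8, 1.2),   # joint damping scale
        "push_force": rng.uniform(0.0, 50.0),  # random external push, N
    }
```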

Implementation of the Delta-Command Policy resulted in quantifiable improvements to end-effector stability during dynamic locomotion. Specifically, testing demonstrated a 57% reduction in oscillation amplitude, indicating a decrease in unwanted movements of the end-effector. Furthermore, the policy maintained a mean end-effector pose error of only 1.5 cm while performing the squatting maneuver, demonstrating precise control and minimal deviation from the desired trajectory. These metrics were obtained through repeatable simulations and provide objective data on the policy’s performance characteristics.
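
Metrics like these can be computed post hoc from logged end-effector trajectories. The following is an analysis sketch, not the authors' evaluation code:

```python
import numpy as np

# How stability metrics of this kind can be derived from logged data.
def oscillation_amplitude(z: np.ndarray) -> float:
    """Peak-to-peak deviation of end-effector height about its mean."""
    return float(np.ptp(z - z.mean()))

def mean_pose_error(pos: np.ndarray, target: np.ndarray) -> float:
    """Mean Euclidean distance between actual and desired poses (m)."""
    return float(np.linalg.norm(pos - target, axis=-1).mean())

# Example: relative reduction from enabling the delta-command policy.
# reduction = 1 - oscillation_amplitude(z_with) / oscillation_amplitude(z_without)
```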

The delta-command policy (red) demonstrably improves end-effector height control during robot locomotion at 0.7 m/s, as compared to its absence (blue), while accounting for torso height variation (black).

Towards Adaptive and Versatile Robotic Systems

Recent research showcases a robotic system capable of dynamically interacting with its environment through the manipulation of significant loads; the system successfully demonstrated the ability to push and pull a cart weighing 112.8 kg. This achievement isn’t simply about brute force, but a sophisticated integration of multiple robotic competencies. The robot coordinates its locomotion – the ability to move effectively – with precise force control, ensuring stable interaction with the cart, and nuanced manipulation to guide its movement. This coordinated approach allows the robot to not only overcome the inertia of a heavy load but also to maintain balance and adapt to external disturbances, representing a substantial step towards more versatile and practical robotic agents capable of assisting in real-world logistics and material handling scenarios.

The Delta-Command Policy functions as an advanced supervisory layer, significantly augmenting the robot’s existing upper-body control capabilities. Rather than dictating precise motor commands, this policy generates high-level tracking targets – desired positions and velocities – which the lower-level controllers then execute. This abstraction is crucial; it allows the robot to respond dynamically to changing conditions and complex task demands without requiring constant, detailed re-programming. By focusing on where the robot should be, rather than how to get there, the Delta-Command Policy facilitates smoother, more adaptable movements and enables the robot to effectively coordinate locomotion, force application, and manipulation – ultimately expanding the range of tasks it can reliably perform in real-world settings.
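
Conceptually, the policy can be viewed as producing a bounded offset on top of a nominal target, which a low-level controller then tracks. In the sketch below, `delta_policy` and the 5 cm clipping bound are hypothetical stand-ins:

```python
import numpy as np

# Conceptual sketch of a delta-command layer: a learned policy outputs
# a small correction to the nominal end-effector target rather than raw
# motor commands.
def corrected_target(nominal_target: np.ndarray,
                     obs: np.ndarray,
                     delta_policy,
                     max_delta: float = 0.05) -> np.ndarray:
    """Apply a bounded learned offset (clipped here to 5 cm) so the arm
    compensates for torso motion while tracking the nominal target."""
    delta = np.clip(delta_policy(obs), -max_delta, max_delta)
    return nominal_target + delta
```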

The development of robotic systems capable of navigating and interacting with real-world settings demands a move beyond constrained laboratory environments. This research contributes to that progression by demonstrating a robotic agent that effectively integrates locomotion, force control, and manipulation skills – abilities crucial for handling dynamic and unpredictable scenarios. Successfully managing a heavy payload while navigating suggests a pathway towards robots that can assist in logistics, construction, or even disaster relief, where adaptability and physical strength are paramount. Ultimately, this work signifies a crucial incremental step towards versatile robotic agents capable of undertaking a broader spectrum of complex tasks in environments that are, by their nature, unstructured and challenging.

Experiments successfully demonstrated the robot’s ability to both push and pull a cart, showcasing its versatile manipulation skills.

The pursuit of robust loco-manipulation, as demonstrated in this work, necessitates a surgical approach to control. Abstraction layers must prove their utility; superfluous complexity hinders rather than helps. This research champions a curriculum learning strategy, incrementally increasing task difficulty – a method echoing the principle of building intuition. Barbara Liskov aptly stated, “It’s one of the challenges of software development that we often don’t know what we need until we see it working.” The successful execution of high-load manipulation – pushing a 112.8 kg cart – validates this approach, showcasing a system where functionality precedes elaborate design, aligning with the ethos that code should be as self-evident as gravity.

What Lies Ahead?

The demonstrated capacity for force-capable loco-manipulation, while a necessary step, merely reframes the central challenge. The successful exertion of force – moving a 112.8 kg cart – is not, in itself, intelligence. It is a symptom of it. The true limitation remains not in the robot’s actuators, but in the impoverishment of the state space. Current curricula, even those leveraging reinforcement learning, are predicated on a fundamentally incomplete understanding of environmental interaction. To address this, future work must prioritize the development of predictive models capable of anticipating, rather than reacting to, external disturbances.

The architecture’s reliance on discrete policies, however effective, introduces a combinatorial burden. Scaling this framework to more complex scenarios will necessitate a transition towards continuous, parameterized control. The elegance of a delta-command policy is diminished by its inherent rigidity. Exploration of latent-space representations, allowing for nuanced adaptation to unforeseen circumstances, offers a potential, though computationally demanding, avenue for improvement. Violence against attention is unnecessary; yet the current paradigm demands increasingly complex curricula to compensate for a lack of inherent adaptability.

Ultimately, the pursuit of ‘general’ loco-manipulation will necessitate a reckoning with the problem of embodiment. The robot’s physical form is not merely a constraint, but a crucial component of its cognitive architecture. Density of meaning is the new minimalism; simplification must occur not through abstraction, but through a deeper understanding of the interplay between morphology, dynamics, and environmental affordances.


Original article: https://arxiv.org/pdf/2511.21169.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
