Steady Hands: Teaching Robots to Carry the Load

Author: Denis Avetisyan


New research details a robust locomotion framework enabling humanoid robots to reliably transport objects in challenging industrial settings.

The system integrates perception-driven kinematic control of upper-body manipulation-informed by six-dimensional object pose estimation-with a residual reinforcement learning policy governing lower-body locomotion, utilizing height-conditioned offsets and a history-based state estimator to infer base velocity, height, and latent disturbances induced by both load and manipulation-a design acknowledging that all dynamic systems inevitably bear the mark of prior states and external forces.
The system integrates perception-driven kinematic control of upper-body manipulation-informed by six-dimensional object pose estimation-with a residual reinforcement learning policy governing lower-body locomotion, utilizing height-conditioned offsets and a history-based state estimator to infer base velocity, height, and latent disturbances induced by both load and manipulation-a design acknowledging that all dynamic systems inevitably bear the mark of prior states and external forces.

This work presents a reinforcement learning and kinematic control approach to stable, height-conditioned box handling for partially observable humanoid robots.

Achieving stable locomotion in humanoid robots is fundamentally challenged by the dynamic coupling between payload and whole-body motion, particularly in industrial settings. This paper, ‘Load-Aware Locomotion Control for Humanoid Robots in Industrial Transportation Tasks’, introduces a novel framework that integrates reinforcement learning for lower-body control with kinematics-based manipulation, enabling robust load-carrying in complex environments. By leveraging height-conditioned offsets, history-based state estimation, and kinematics-based references, the approach achieves faster training and accurate height tracking without requiring fine-tuning in the real world. Could this decoupled yet coordinated loco-manipulation architecture pave the way for more adaptable and efficient humanoid robots in demanding industrial applications?


The Inevitable Drift: Confronting Dynamic Locomotion

Conventional robotic locomotion systems frequently encounter difficulties when operating outside of carefully controlled settings. These systems typically rely on precise models of the environment and the robot’s own mechanics, a reliance that quickly breaks down in the face of uneven terrain, unexpected obstacles, or even minor shifts in payload. This inherent fragility stems from an inability to effectively compensate for disturbances – a gust of wind, a slippery surface, or an unanticipated load – which can destabilize the robot and halt progress. Consequently, these robots exhibit limited adaptability, requiring significant pre-programming for specific scenarios and struggling to generalize to novel situations, hindering their potential for real-world applications requiring robust and flexible movement.

Robust locomotion isn’t simply about moving forward; it demands a system’s ability to function effectively even when information is incomplete and external factors are unpredictable. A truly adaptable robot must contend with partial observability – the inability to fully perceive its environment or its own internal state – and uncertain payloads, meaning the weight and distribution of carried objects may be unknown or variable. This presents a significant engineering challenge, as traditional control systems rely on precise data. Consequently, research focuses on developing algorithms that can estimate missing information, predict potential disturbances, and adjust movements in real-time to maintain stability and efficiency, even under imperfect conditions. The goal is not flawless knowledge, but rather a system capable of skillful improvisation and resilient performance despite inherent uncertainties.

Existing locomotion control systems frequently struggle when confronted with the realities of dynamic environments, primarily due to a limited capacity for real-time adaptation. While theoretical models often assume perfect knowledge of the robot’s state and payload, practical applications demand responsiveness to unforeseen changes – an uneven terrain, a shifting center of gravity, or an unexpectedly heavy load. The inability to effectively integrate sensory feedback – data from force sensors, accelerometers, and vision systems – hinders a robot’s capacity to compensate for these disturbances. Consequently, current methods often rely on pre-programmed responses or slow, reactive adjustments, resulting in instability, inefficient energy expenditure, and a diminished ability to navigate complex and unpredictable landscapes. Addressing this limitation necessitates the development of control algorithms capable of rapidly processing sensory information and dynamically modifying gait parameters to maintain balance and optimize performance under varying conditions.

This robot successfully demonstrated loco-manipulation in a realistic industrial setting-reconstructed from the World Humanoid Robot Games-by both transporting boxes directly and performing depalletizing while carrying a [latex]55[/latex] kg payload.
This robot successfully demonstrated loco-manipulation in a realistic industrial setting-reconstructed from the World Humanoid Robot Games-by both transporting boxes directly and performing depalletizing while carrying a [latex]55[/latex] kg payload.

Modular Resilience: Decoupled Control for Adaptability

Decoupled control in robotics achieves modularity by independently managing the control of various locomotion subsystems – such as joint torques, foot placement, and body posture – rather than employing a monolithic control scheme. This separation allows for targeted adjustments to individual components without requiring recalculation of the entire locomotion plan. The resulting flexibility facilitates adaptation to unexpected disturbances and allows robots to execute a wider range of movements. Furthermore, decoupled architectures simplify the design and debugging process, as each subsystem can be tested and refined in isolation before integration. This modularity also enables easier scaling and reconfiguration of robotic systems to suit different tasks or environments.

Decoupled control systems utilize kinematic control to establish desired robot motions, which are then refined through the integration of tactile feedback. This feedback, typically provided by force/torque sensors or tactile arrays, allows the robot to detect unexpected external forces-such as pushes, collisions, or uneven terrain-and dynamically adjust its trajectory. The system calculates corrective actions based on the magnitude and direction of these forces, enabling it to maintain balance and stability during complex maneuvers. This reactive capability is crucial for robust locomotion in unpredictable environments and allows for adaptation to disturbances that would otherwise compromise stability or cause a fall. The combination of pre-planned kinematic trajectories and real-time tactile response significantly improves a robot’s ability to navigate challenging and dynamic scenarios.

The WoCoCo (Whole-body Contact Control) framework improves robot adaptability by breaking down complex locomotion tasks into a sequence of simpler contact stages. This decomposition allows for the independent planning and control of each stage, focusing on maintaining stability and achieving desired foot placements during transitions between stances. By sequentially addressing contact events – initial contact, stance phase, and lift-off – the framework enables robots to negotiate uneven or unpredictable terrain without requiring a globally optimal solution. This staged approach reduces computational complexity and allows for real-time adaptation to external disturbances or unexpected ground features, ultimately enhancing robustness in challenging environments.

Reconstructing the Present: State Estimation and Dynamic Stability

History-based state estimators address the challenge of partial observability by leveraging temporal data. These estimators do not rely solely on current sensor readings; instead, they integrate past observations into a dynamic model. This model, often incorporating latent representations learned from training data, predicts the current system state based on its historical trajectory. By maintaining an internal representation of the system’s past, the estimator can infer unobserved states and reduce the impact of sensor noise or occlusions. This approach is particularly effective in dynamic systems where the state evolves continuously and is influenced by previous states and actions.

Kinematics-based locomotion references and height-conditioned offset control work in conjunction to maintain accurate tracking of desired trajectories and effectively compensate for external loads. Locomotion references define the intended path and velocity of the system, while height-conditioned offset control adjusts the center of mass to counteract disturbances and maintain stability. This control scheme utilizes the kinematic model to predict future states and proactively adjust joint torques, allowing for precise tracking even when subjected to unpredictable forces or changes in payload. The height-conditioned aspect specifically modulates the offset to maintain a consistent center of gravity, crucial for dynamic balancing and preventing falls or instability during locomotion.

State estimation performance was quantified using Mean Squared Error (MSE) across three degrees of freedom during dynamic locomotion. Specifically, the system achieved an MSE of 2.15 x 10-3 for base velocity in the x-direction (vx), 8.86 x 10-4 for base velocity in the y-direction (vy), and 10-6 for base height. These values indicate a high degree of accuracy in estimating the robot’s state variables, even while undergoing dynamic movements, and demonstrate the efficacy of the implemented state estimation methods for precise control and balancing.

The reported state estimation and dynamic stability results were obtained using a floating-base articulated system. This robotic configuration is characterized by unrestrained movement in its base, granting six degrees of freedom – three translational and three rotational – crucial for maintaining balance during dynamic locomotion. Unlike fixed-base systems, the floating base allows the robot to actively react to external disturbances and center of mass shifts without being constrained by a fixed ground connection. This increased kinematic flexibility is essential for implementing the necessary control strategies to counteract destabilizing forces and accurately track desired trajectories, ultimately enabling stable and precise dynamic balancing capabilities.

Under dynamic motion, the history-based state estimator accurately predicts base velocity in both the [latex]v_x[/latex] and [latex]v_y[/latex] directions, as well as base height, as demonstrated by its low mean squared error.
Under dynamic motion, the history-based state estimator accurately predicts base velocity in both the [latex]v_x[/latex] and [latex]v_y[/latex] directions, as well as base height, as demonstrated by its low mean squared error.

Bridging the Divide: Sim-to-Real Transfer and Future Directions

Domain randomization addresses the persistent challenge of transferring robotic skills learned in simulation to the complexities of the real world. This technique intentionally introduces variability within the simulation environment – altering parameters like friction, lighting, object textures, and even the robot’s own mass and joint properties – effectively creating a diverse training landscape. By exposing the robot to a wide range of simulated conditions, the resulting control policies become inherently more robust and less sensitive to the inevitable discrepancies between the simulation and reality. Consequently, a robot trained with domain randomization is far more likely to generalize successfully to unpredictable real-world scenarios, demonstrating improved performance and reliability without requiring extensive retraining or manual adjustments in the physical environment.

Recent advancements in robotics have culminated in a demonstrable transfer of skills from simulation to a physical humanoid robot tasked with industrial box transportation. This framework successfully enabled a robot to perform the complex motions required to move boxes weighing up to 55kg in a real-world environment, bridging the persistent gap between virtual training and physical execution. The achievement highlights the efficacy of the developed system in handling substantial payloads while navigating the inherent uncertainties of a non-simulated workspace. This successful sim-to-real transfer signifies a crucial step towards deploying humanoid robots in demanding industrial settings, paving the way for increased automation and efficiency in logistics and manufacturing.

Continued refinement of robotic systems benefits significantly from the integration of optimization-based frameworks and reinforcement learning algorithms, enabling ongoing performance gains and increased adaptability. These approaches move beyond pre-programmed behaviors, allowing robots to learn from experience and fine-tune their actions in response to changing conditions or unforeseen circumstances. Optimization techniques systematically search for the best possible control parameters, while reinforcement learning encourages exploration and rewards successful outcomes, fostering a cycle of continuous improvement. This paradigm is particularly valuable in complex, real-world scenarios where static programming proves insufficient, and the ability to learn and adapt is crucial for sustained, reliable operation – effectively allowing the robot to become more proficient with each repetition of a task or encounter with a novel situation.

Effective industrial box handling by humanoid robots demands precise coordination across numerous degrees of freedom, and whole-body control strategies provide the necessary framework to achieve this. Rather than treating individual joints in isolation, this approach considers the robot’s entire system – encompassing arms, legs, torso, and even foot placement – as a unified mechanism. By simultaneously optimizing the movement of all these components, the robot can maintain balance and stability while manipulating heavy payloads, such as the 55kg boxes used in this study. This holistic control allows for efficient trajectory planning, minimizing wasted energy and maximizing speed, while also enabling the robot to adapt to unexpected disturbances or changes in the environment, ensuring robust and reliable performance in real-world industrial settings.

Training comparisons reveal that different offset and reference formulations impact both the convergence of episode reward and the efficiency of episode length over environment steps.
Training comparisons reveal that different offset and reference formulations impact both the convergence of episode reward and the efficiency of episode length over environment steps.

The pursuit of robust locomotion, as detailed in this work, mirrors a natural process of adaptation. Systems, whether biological or robotic, inevitably encounter imperfect information and external disturbances. This research addresses partial observability by employing history-based state estimation, allowing the humanoid robot to navigate industrial tasks despite incomplete sensory data. As Blaise Pascal observed, “The eloquence of a man does not depend on the words he knows, but on the things he thinks.” Similarly, the robot’s success isn’t solely defined by the complexity of its algorithms, but by its capacity to ‘think’ – or rather, to estimate and respond – to an ever-changing environment, allowing the system to age gracefully even under load.

What Lies Ahead?

This work demonstrates a functional coupling of learned and pre-programmed control, a common trajectory for robotic systems. However, any improvement ages faster than expected; the demonstrated robustness, while significant, will inevitably erode as operational environments shift and task demands evolve. The partial observability addressed here isn’t a solved problem, merely a mitigated one. Future iterations will likely necessitate increasingly sophisticated methods for anticipating and compensating for sensory limitations, pushing the boundaries of state estimation and prediction.

The core framework’s reliance on height-conditioned offsets and kinematics-based references represents a localized optimization. Scaling this approach-extending it to more complex payloads, dynamic environments, or collaborative scenarios-will expose inherent limitations. Rollback, the journey back along the arrow of time, will become increasingly difficult as the system navigates greater degrees of freedom and unpredictable interactions. The true test lies not in achieving initial stability, but in maintaining it over prolonged operation.

Ultimately, the field must confront the inevitability of decay. The question isn’t whether these systems will fail, but how they will fail, and whether those failures can be anticipated and gracefully accommodated. A focus on inherent resilience-designing for degradation rather than striving for perpetual optimization-may prove a more fruitful avenue for long-term progress.


Original article: https://arxiv.org/pdf/2603.14308.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

See also:

2026-03-17 15:11