Author: Denis Avetisyan
Researchers have developed a quadrupedal robot capable of surprisingly efficient locomotion using passive wheels, driven by a novel co-design approach.
![A bilevel co-design framework iteratively refines both the physical design [latex]\mathbf{d}_{next}[/latex] and control policy [latex]\pi_{\theta}[/latex] for quadrupedal skating, leveraging Bayesian Optimization to propose designs and Reinforcement Learning to optimize policies, ultimately maximizing skating performance as measured by [latex]J(\pi_{\theta}^{*},\mathbf{d},\mathcal{T})[/latex].](https://arxiv.org/html/2603.18408v1/x2.png)
A reinforcement learning and Bayesian optimization framework enables the discovery of optimal hardware and control policies for versatile quadrupedal skating.
Achieving efficient and versatile locomotion remains a key challenge for quadrupedal robots, particularly when exploring dynamic modalities like skating. This is addressed in ‘Efficient and Versatile Quadrupedal Skating: Optimal Co-design via Reinforcement Learning and Bayesian Optimization’, which presents a novel hardware-control co-design framework leveraging reinforcement learning and Bayesian optimization to enable roller skating with passive wheels. The resulting designs and control policies not only surpass human-engineered baselines but also exhibit complex behaviors like hockey stops and self-aligning motion, demonstrating a significant leap in dynamic quadrupedal locomotion. Could this co-design approach unlock similarly advanced capabilities in other challenging robotic domains?
The Illusion of Control: Reimagining Quadrupedal Locomotion
Conventional quadrupedal locomotion, while versatile, faces inherent limitations when navigating smooth surfaces. The cyclical process of lifting, advancing, and placing each leg requires significant energy expenditure, as friction constantly opposes motion. This is particularly true on surfaces like polished floors or ice, where the lack of traction exacerbates the problem, leading to slower speeds and reduced efficiency. Each footfall becomes a compromise between stability and momentum, demanding precise muscle control and constant adjustments to maintain balance. Consequently, achieving rapid and sustained movement relies on overcoming these frictional forces, a challenge that often restricts the overall performance of legged robots and biological quadrupeds alike.
Quadrupedal Skating represents a departure from conventional legged locomotion, integrating the advantages of both walking and rolling gaits. This innovative approach equips a quadrupedal robot with passive wheels attached to its limbs, allowing for a seamless transition between traditional walking and a faster, more energy-efficient rolling motion. By dynamically adjusting leg posture and wheel engagement, the robot can navigate varied terrains, maintaining stability while capitalizing on the reduced friction of rolling on smooth surfaces. Initial tests demonstrate that Quadrupedal Skating significantly enhances speed and reduces energy expenditure compared to purely legged locomotion, suggesting a promising avenue for robotics applications requiring swift and efficient traversal across diverse environments.

Designing for Inevitable Compromise: A Bilevel Optimization
A bilevel optimization framework was implemented to address the interdependent challenges of hardware design and control policy development for Quadrupedal Skating. This approach treats the robot’s physical parameters as the upper-level variables and the control policy as the lower-level variables. By simultaneously optimizing both, the framework avoids sub-optimal solutions that would arise from designing hardware and control sequentially. The outer loop adjusts the hardware configuration, while the inner loop trains a corresponding control policy, enabling the system to identify designs that are both controllable and conducive to achieving the desired skating behavior. This contrasts with traditional methods where hardware is fixed prior to control design, limiting the potential for synergistic improvements.
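The structure of this bilevel loop can be sketched as follows. This is a minimal stand-in, not the paper's implementation: random search replaces the Bayesian Optimization outer loop, a toy quadratic reward replaces the simulated skating return, and `train_policy` is a hypothetical placeholder for the PPO inner loop.

```python
import random

def train_policy(design, iters=50):
    """Inner loop (stand-in for PPO training): returns the best return
    found for this design. The quadratic objective is a hypothetical
    stand-in for the true simulated skating reward."""
    best = float("-inf")
    for _ in range(iters):
        theta = random.uniform(-1.0, 1.0)        # toy policy parameter
        reward = -(design - 0.3) ** 2 - 0.1 * theta ** 2
        best = max(best, reward)
    return best

def codesign(n_outer=20):
    """Outer loop: propose a design, train a policy for it, keep the
    best pair. Random search stands in for Bayesian Optimization."""
    best_design, best_J = None, float("-inf")
    for _ in range(n_outer):
        d = random.uniform(-1.0, 1.0)            # e.g. normalized wheel yaw angle
        J = train_policy(d)                      # J(pi*_theta, d)
        if J > best_J:
            best_design, best_J = d, J
    return best_design, best_J
```

The key structural point is that the design variable `d` is held fixed while the inner loop optimizes the policy, and only the converged return `J` flows back to the outer loop.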
Bayesian Optimization is employed as the outer loop optimization algorithm to navigate the quadruped robot’s design space, specifically targeting parameters such as the Wheel Yaw Installation Angle. This method is selected for its efficiency in handling non-convex and computationally expensive optimization problems, characteristic of robotic design. Bayesian Optimization constructs a probabilistic surrogate model of the objective function – in this case, performance metrics derived from Reinforcement Learning training in the inner loop – and uses an acquisition function to balance exploration and exploitation when proposing new hardware configurations. This allows the algorithm to intelligently sample promising designs while minimizing the number of full Reinforcement Learning training runs required to identify optimal or near-optimal hardware parameters.
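The exploration–exploitation trade-off mentioned above is handled by the acquisition function. A common closed-form choice is Expected Improvement; the sketch below shows it for a maximization problem, given the surrogate's posterior mean and standard deviation at a candidate design (the paper does not specify which acquisition function it uses, so this is illustrative):

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected Improvement (maximization) at a candidate design,
    given the surrogate posterior mean `mu` and std `sigma`, the best
    objective value `best` seen so far, and exploration margin `xi`."""
    if sigma <= 0.0:
        return 0.0                                # no uncertainty, no expected gain
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best - xi) * cdf + sigma * pdf
```

Designs with either a high predicted mean (exploitation) or high predicted uncertainty (exploration) score well, which is what lets the outer loop avoid exhaustively training an RL policy for every candidate hardware configuration.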
Proximal Policy Optimization (PPO) serves as the control learning algorithm within the bilevel optimization framework. For each proposed hardware configuration evaluated by the outer loop’s Bayesian Optimization, PPO trains a control policy to maximize cumulative reward on the quadrupedal skating task. PPO is employed due to its stability and sample efficiency, achieved through a clipped surrogate objective function that prevents overly large policy updates. This ensures that the learned control policy remains robust and avoids catastrophic performance degradation during training, even with variations in the hardware parameters being tested. The resulting policy, optimized for a specific hardware design, is then evaluated to provide feedback to the outer loop, guiding the search for optimal hardware-control co-design.
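The clipped surrogate objective that gives PPO its stability can be written per sample as follows, where `ratio` is the probability ratio between the new and old policies for the taken action:

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for one sample:
    L = min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    Taking the min makes the bound pessimistic, so large policy
    updates are never rewarded."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage the objective stops growing once the ratio exceeds `1 + eps`, and with a negative advantage the unclipped (worse) term is kept, which is exactly what prevents the overly large policy updates described above.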
![A naive [latex]\psi = 0^\circ[/latex] configuration on the Unitree Go1 and many other quadrupedal robots results in an uncontrollable forward velocity due to inherent leg kinematic limitations.](https://arxiv.org/html/2603.18408v1/x3.png)
Mapping the Real: Accurate State Estimation and Validation
Accurate state estimation is fundamental to robotic control and optimization algorithms, providing the necessary data for feedback loops and trajectory planning. Our system utilizes OpenVINS, an open-source visual-inertial navigation system, for real-time pose tracking. OpenVINS fuses data from onboard inertial measurement units (IMUs) and cameras to estimate the robot’s six-degree-of-freedom pose – position and orientation – with high accuracy and low latency. This enables robust control even in challenging environments and provides reliable ground truth data for evaluating the performance of higher-level algorithms, such as the quadrupedal skating locomotion demonstrated in this work.
The OptiTrack motion capture system was implemented to provide ground truth data for validating the accuracy of the robot’s state estimation and for performance analysis. This system utilizes infrared cameras to track the 3D positions and orientations of reflective markers placed on the robot. Data is captured at a rate of 200 Hz, providing a high-resolution time series of the robot’s true pose. Captured data is then processed using OptiTrack Motive software, and exported in a format compatible with our robot state estimation and analysis pipelines. The precision of the OptiTrack system, with a reported accuracy of 0.1mm, enables reliable assessment of state estimation errors and performance metrics.
Performance of the Quadrupedal Skating gait was quantitatively assessed using the Cost of Transport (CoT) metric, which represents the energy expended per unit distance traveled. Experimental results demonstrate a 14.6% reduction in average CoT compared to a baseline gait obtained from a traditional one-dimensional (1D) co-design. This reduction indicates improved energetic efficiency during locomotion, suggesting that the proposed skating gait minimizes energy cost for the quadrupedal robot. The CoT was calculated by dividing the total energy consumption – measured via onboard current sensors and estimated motor efficiency – by the total distance traveled, averaged across multiple trials.
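For reference, the standard dimensionless form of CoT additionally normalizes by the robot's weight; a minimal sketch (the article's energy-per-distance variant simply drops the `mass_kg * g` factor):

```python
def cost_of_transport(energy_j, mass_kg, distance_m, g=9.81):
    """Dimensionless Cost of Transport: energy expended per unit
    weight per unit distance. Lower is more efficient."""
    return energy_j / (mass_kg * g * distance_m)
```

A hypothetical 10 kg robot spending 981 J to travel 10 m would have a CoT of 1.0; a 14.6% reduction would bring the same traverse down to roughly 0.85.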

Beyond Simple Movement: The Illusion of Agency
The robotic system’s design extends beyond simple movement, incorporating an optimized control policy that facilitates complex maneuvers like ‘Self-Aligning Motion’. This capability allows the robot to autonomously reorient itself, maintaining stability and improving overall efficiency during operation. By actively counteracting external disturbances and proactively adjusting its posture, the system minimizes energy expenditure and ensures consistent performance even in challenging environments. This self-correcting behavior is achieved through a combination of precise hardware control and a learned policy that anticipates and mitigates potential instabilities, representing a significant advancement in robotic agility and robustness.
The robotic system exhibits a notable capacity for dynamic locomotion, most clearly demonstrated through the execution of a ‘Hockey Stop’ maneuver. This rapid deceleration, akin to a player halting on ice, is achieved not through direct motor commands, but by specifying desired velocities in a global, world-frame coordinate system. This approach bypasses the complexities of translating desired motions into individual motor actions, allowing for remarkably swift and precise control. The implementation showcases the system’s responsiveness and authority over its movements, highlighting the effectiveness of world-frame velocity commands in facilitating complex, high-agility behaviors beyond simple forward motion.
The integration of both world frame and base frame command strategies with the learned control policy yielded a significant performance improvement in dynamic maneuvers. Specifically, researchers demonstrated a 50% reduction in ‘Hockey Stop Time’ when utilizing the world frame command policy compared to the traditionally employed base frame approach. This substantial decrease highlights the system’s enhanced responsiveness and ability to execute complex movements with greater efficiency. The success of this integration underscores the potential for leveraging world frame commands to optimize control in robotic systems requiring rapid deceleration and directional changes, ultimately paving the way for more agile and versatile applications.
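The practical difference between the two command frames comes down to a planar rotation: a world-frame velocity command must be expressed in the robot's base frame (via its current yaw) before the policy acts on it. A minimal sketch of that transform, under the usual z-up yaw convention (the paper's exact frame conventions are not given here):

```python
import math

def world_to_base_velocity(vx_w, vy_w, yaw):
    """Rotate a world-frame planar velocity command into the robot's
    base frame, given the robot's current yaw about the vertical axis.
    This is the transpose of the base-to-world rotation R_z(yaw)."""
    c, s = math.cos(yaw), math.sin(yaw)
    vx_b = c * vx_w + s * vy_w
    vy_b = -s * vx_w + c * vy_w
    return vx_b, vy_b
```

For example, a robot that has rotated 90° left and receives a world-frame "keep moving along world +x" command sees that as a pure sideways velocity in its own base frame, which is why a world-frame hockey-stop command remains consistent as the body spins during the maneuver.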

The Inevitable Horizon: Toward Adaptable and Resilient Locomotion
Future research endeavors are concentrating on integrating the ‘Virtual Leg Vector’ directly into the reinforcement learning algorithms that govern quadrupedal robot locomotion. This innovative approach treats the vector – representing a safe, reachable leg position – not simply as a target, but as a limiting factor during the learning process. By establishing this constraint, the robot is actively discouraged from attempting movements that would result in overextension or instability, effectively promoting a more robust and energy-efficient gait. This proactive safety measure is expected to significantly enhance the robot’s ability to maintain balance and navigate unpredictable terrains, leading to more reliable performance in real-world applications and minimizing the need for extensive recovery maneuvers after disturbances.
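One simple way to realize such a constraint, sketched below under the assumption that the virtual leg vector is the hip-to-foot displacement (the paper's actual formulation inside the RL objective may differ), is to project any commanded foot target back onto the reachable sphere around the hip:

```python
import math

def constrain_leg_target(target, hip, max_reach):
    """Project a commanded foot position onto the sphere of radius
    `max_reach` around the hip, keeping the 'virtual leg vector'
    (hip-to-foot displacement) within a safe extension."""
    vec = [t - h for t, h in zip(target, hip)]
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_reach or norm == 0.0:
        return list(target)                      # already safe
    scale = max_reach / norm
    return [h + v * scale for h, v in zip(hip, vec)]
```

Applied inside the action pipeline, such a projection makes overextended leg configurations unreachable by construction rather than merely penalized, which matches the idea of treating the vector as a limiting factor during learning.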
The current research lays the groundwork for a new generation of quadrupedal robots capable of navigating significantly more challenging environments. Future iterations aim to move beyond static, predictable surfaces and embrace the complexities of real-world terrain – including uneven ground, obstacles, and even dynamically changing conditions. This expansion necessitates advanced perception and control algorithms, enabling the robot to not only react to its surroundings but also proactively adapt its gait and body posture. Ultimately, the goal is to create a truly versatile machine, one that can traverse rubble-strewn landscapes, ascend steep inclines, and maintain stability in unpredictable, dynamic settings, mirroring the agility and robustness of natural animal locomotion.
The development of robots capable of efficient and adaptable locomotion holds significant potential for operation in complex, real-world environments. Current robotic systems often struggle with unpredictable terrain or dynamic conditions, leading to energy inefficiency and limited maneuverability. However, by prioritizing strategies that optimize movement based on environmental feedback and internal stability, researchers are paving the way for a new generation of robots. These advancements aren’t merely about speed or strength; they focus on intelligent, energy-conscious navigation – enabling robots to traverse uneven surfaces, adapt to shifting landscapes, and maintain balance with minimal energy expenditure. This ultimately promises robots capable of prolonged operation in diverse and challenging scenarios, from search and rescue missions to environmental monitoring and logistical support.

The pursuit of efficient locomotion, as demonstrated by this co-design framework, echoes a fundamental truth about complex systems. It isn’t simply about designing a solution, but cultivating one that adapts and thrives within inherent uncertainties. This work, blending reinforcement learning and Bayesian Optimization to achieve quadrupedal skating, reveals that the most robust systems aren’t static blueprints, but emergent properties of continuous refinement. As Isaac Newton observed, “If I have seen further it is by standing on the shoulders of giants.” This holds true here; the system builds upon prior knowledge, iteratively improving through interaction and optimization, acknowledging that even the most elegant architecture is merely a temporary respite from the inevitable entropy. Order, in this case, is just a cache between two outages, constantly being renegotiated with each new interaction and adaptation.
The Turning Wheel
This work, focused on quadrupedal skating, reveals a familiar truth: every optimization is merely a temporary reprieve. The co-design framework, marrying reinforcement learning with Bayesian Optimization, finds local minima in the space of hardware and control. But the ground shifts. Future iterations will inevitably demand solutions to the problem of generalization – of adapting to terrains unforeseen during the initial search. Every dependency on a specific wheel material, a precise actuator torque, is a promise made to the past, a constraint on future evolution.
The elegance of passive wheels should not obscure a deeper point. Control, as traditionally conceived, is an illusion demanding service level agreements. The system will, in time, begin fixing itself. The real challenge lies not in imposing order, but in cultivating resilience – in designing for the inevitable cascade of failures that will test the limits of this elegant, rolling architecture.
One anticipates a move beyond efficiency, toward systems that anticipate instability. The cycle continues: build, break, adapt. The wheel turns, not toward a perfect solution, but toward a more interesting set of problems. And within those problems, a deeper understanding of the limits of design itself.
Original article: https://arxiv.org/pdf/2603.18408.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-20 21:59