Robot Tennis Player Learns from Human Mistakes

Author: Denis Avetisyan


Researchers have developed a system that allows a humanoid robot to acquire athletic tennis skills by learning from imperfect motion capture data of human players.

A system called LATENT learns robust tennis play by first pre-training a motion tracker on imperfect human data, then building a correctable latent action space through online distillation, and finally training a high-level policy to refine and combine these latent actions – a policy subsequently transferred to real-world scenarios using dynamics randomization and observation noise.

The system, called LATENT, utilizes a correctable latent action space and robust sim-to-real transfer techniques to overcome the challenges of learning from noisy human data.

Reproducing the dynamic athleticism of human tennis players on humanoid robots remains a significant challenge, particularly due to the scarcity of complete and accurate motion data for robotic imitation. This work introduces LATENT, a system described in ‘Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data’, which learns robust tennis skills by leveraging imperfect human motion fragments – prioritizing learnable priors over complete datasets. By constructing a correctable latent action space and employing techniques for sim-to-real transfer, LATENT enables a humanoid robot to consistently strike and return incoming balls with natural motion. Could this approach unlock more accessible and efficient methods for teaching complex, dynamic skills to robots across a range of athletic domains?


The Challenge of Embodied Skill

The replication of human athletic ability in robotics, particularly in dynamic sports like tennis, presents a formidable engineering hurdle. Current methods often falter not because of a lack of computational power, but due to the ‘sim-to-real’ gap – the difficulty of translating learned behaviors from simulated environments to the unpredictable complexities of a physical robot and real-world conditions. This transfer is complicated by subtle variations in robot mechanics, sensor noise, and the inherent uncertainties of interacting with a moving ball and a variable playing surface. Consequently, a tennis-playing robot must contend with discrepancies between the idealized digital model and the messy realities of physical execution, requiring robust algorithms capable of adapting to unforeseen circumstances and maintaining performance despite imperfect data and imprecise control.

Replicating nuanced athletic movements presents a considerable hurdle for robotic systems due to the sheer complexity of human motion. Traditional approaches to robotic control often falter when confronted with the high dimensionality of athletic skills; a single tennis serve, for instance, involves coordinating dozens of degrees of freedom across the body. Furthermore, natural human movement isn’t perfectly consistent – there’s inherent variability in timing, force, and trajectory – and capturing this variability requires enormous datasets and exceptionally precise control mechanisms. Existing methods struggle to generalize from limited or imperfect data, demanding painstakingly curated examples and often failing to adapt to even slight deviations from the training conditions. This reliance on extensive data and rigid control severely limits the robot’s ability to perform reliably in dynamic, real-world scenarios.

Replicating the nuanced movements of human athletes presents a considerable hurdle for robotic systems, largely because these systems typically require complete and flawless data sets for effective learning. However, real-world athletic performance is rarely pristine; it’s characterized by subtle errors, incomplete motions, and variations in technique. Consequently, research focuses on developing algorithms that can effectively interpret and learn from this ‘noisy’ data. These systems must be able to extrapolate complete actions from fragmented observations, identify the underlying intent behind imperfect motions, and generalize learned skills to novel situations. This ability to learn from imperfection is crucial, as it mirrors the way humans themselves acquire athletic skills – through practice, adaptation, and refinement despite inherent inconsistencies in execution.

Heatmaps demonstrate that the learned policy facilitates effective and adaptive court coverage as the robot responds to increasing numbers of consecutive ball returns, highlighting its ability to maintain position throughout extended rallies.

A Latent Space for Skill Acquisition

LATENT is a system designed to enable humanoid robots to learn athletic tennis skills through the implementation of a latent action space. This space functions as a compressed representation of fundamental movements, or primitive skills, necessary for playing tennis. By operating within this reduced dimensionality, LATENT aims to simplify the control problem and facilitate the acquisition of complex behaviors. The system moves away from directly controlling low-level motor commands and instead learns to manipulate this latent space, effectively allowing the robot to learn how to move rather than explicitly defining each joint angle and velocity.

LATENT utilizes a Motion Tracker to capture human tennis demonstrations, providing data for a Variational Autoencoder (VAE). The VAE is then trained to encode this complex motion data into a lower-dimensional, continuous latent space. This process effectively distills the essential parameters of the demonstrated skill, creating a compact representation that captures variations in movement while discarding noise and irrelevant details. The resulting latent space allows for efficient manipulation and generalization of the learned skill, enabling the robot to explore and adapt to new situations.
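The core of this step is the VAE’s reparameterization trick, which lets a motion fragment be compressed into a sampled latent action while keeping the sample differentiable. A minimal NumPy sketch follows; the linear encoder, the 60/8 dimensions, and all variable names are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(motion, W_mu, W_logvar):
    """Toy linear encoder: map a flattened motion fragment to the
    mean and log-variance of a Gaussian over the latent space."""
    return motion @ W_mu, motion @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the reparameterization trick),
    keeping the sample differentiable w.r.t. mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Assumed toy dimensions: a 60-dim motion fragment, an 8-dim latent.
motion_dim, latent_dim = 60, 8
W_mu = rng.standard_normal((motion_dim, latent_dim)) * 0.1
W_logvar = rng.standard_normal((motion_dim, latent_dim)) * 0.1

fragment = rng.standard_normal(motion_dim)  # one noisy human motion fragment
mu, logvar = encode(fragment, W_mu, W_logvar)
z = reparameterize(mu, logvar)
print(z.shape)  # (8,)
```

A real implementation would use nonlinear networks and train with a reconstruction loss plus a KL regularizer; the sketch only shows the encode-and-sample path.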

The utilization of a latent action space allows for generalization from limited datasets and imperfect data collection, addressing a key challenge in robotics where acquiring extensive, flawless motion capture is often impractical. By encoding observed human motions into a lower-dimensional, continuous latent space, the system can interpolate between learned actions and extrapolate to novel situations not explicitly present in the training data. This is achieved through the Variational Autoencoder (VAE) which learns a probabilistic representation of the motion, enabling the robot to handle variations in execution and adapt to discrepancies between simulation and the physical world during skill transfer to the platform. The resulting robustness improves performance when applying learned skills in real-world scenarios with inherent noise and uncertainty.
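One consequence of a smooth, continuous latent space is that interpolation between two learned actions can yield plausible in-between motions. A hedged sketch, with stand-in latent vectors rather than real encodings:

```python
import numpy as np

def interpolate_latents(z_a, z_b, t):
    """Linear interpolation between two latent actions; when the latent
    space is smooth, intermediate points decode to blended motions."""
    return (1.0 - t) * z_a + t * z_b

# Stand-ins for two encoded strokes (illustrative, not real encodings).
z_fore = np.ones(8)
z_back = -np.ones(8)
z_mid = interpolate_latents(z_fore, z_back, 0.5)
print(z_mid)  # midpoint of the two latents: all zeros here
```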

Real-world rally footage demonstrates the system's ability to handle diverse tennis return behaviors, encompassing varied strokes, footwork, and full-body coordination.

Policy Learning Through Reinforcement

The high-level policy utilizes the Proximal Policy Optimization (PPO) algorithm, a model-free, on-policy reinforcement learning method, to learn a control strategy. This policy operates within a pre-defined latent action space, effectively treating lower-level motor skills as building blocks. Rather than directly commanding motor torques, the PPO policy learns to compose sequences of these primitive skills and to correct their execution based on observed state. The latent space representation allows for generalization to novel situations and simplifies the control problem by abstracting away low-level details, enabling the robot to focus on higher-level task planning and adaptation. The policy’s parameters are iteratively updated through interaction with the environment, maximizing a reward function designed to encourage successful task completion and efficient skill utilization.
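The clipped surrogate objective is the defining piece of PPO: updates are limited by taking the pessimistic minimum of the raw and clipped policy-ratio terms. A minimal sketch of that objective (the 0.2 clip range is PPO’s common default, not a value reported here):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: the minimum of the unclipped and clipped
    policy-ratio terms, so large ratios earn no extra credit."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

# A ratio far above 1+eps gains nothing beyond the clipped value.
print(ppo_clip_objective(np.array([1.5]), np.array([1.0])))  # [1.2]
```

In LATENT’s setting the policy’s actions are latent vectors rather than raw torques, but the optimization machinery is the same.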

To enhance the policy’s ability to generalize to unobserved conditions, training incorporates both Dynamics Randomization and Observation Noise. Dynamics Randomization involves randomly varying physical parameters of the simulated environment – such as mass, friction, and damping – during each training episode. This forces the policy to learn control strategies that are less sensitive to specific parameter values. Simultaneously, Observation Noise is added to the robot’s perceived state, simulating inaccuracies in sensor readings and introducing uncertainty in the observed environment. The magnitude of this noise is randomly varied to promote robustness against noisy sensor data. These techniques collectively expose the policy to a broader range of plausible scenarios, improving its performance and reliability in real-world deployments.
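The two techniques can be sketched as per-episode parameter sampling plus per-step sensor corruption. All ranges and parameter names below are illustrative assumptions; the paper’s actual randomization bounds are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)

def randomize_dynamics():
    """Sample simulated physical parameters once per training episode
    (ranges are made up for illustration)."""
    return {
        "mass_scale": rng.uniform(0.8, 1.2),
        "friction": rng.uniform(0.5, 1.5),
        "damping_scale": rng.uniform(0.9, 1.1),
    }

def noisy_observation(state, sigma_max=0.05):
    """Corrupt the observed state with zero-mean Gaussian noise whose
    magnitude is itself randomly drawn, per the text."""
    sigma = rng.uniform(0.0, sigma_max)
    return state + rng.normal(0.0, sigma, size=state.shape)

params = randomize_dynamics()      # applied to the simulator each episode
obs = noisy_observation(np.zeros(10))  # applied to each sensor reading
```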

A Latent Action Barrier is implemented during reinforcement learning to regulate the exploration of the latent action space and promote the development of physically plausible robot behaviors. This barrier functions as a constraint on the output of the policy network, penalizing actions that deviate significantly from a predefined distribution of typical robot motions. Specifically, the barrier enforces a limit on the magnitude of changes in latent variables between successive time steps, effectively smoothing the learned policy and preventing jerky or unnatural movements. This constraint improves sample efficiency and accelerates skill acquisition by focusing exploration on a subset of actions likely to yield stable and coordinated robot behavior.
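The barrier described above amounts to a penalty that is zero while successive latent actions stay close and grows once the step exceeds a threshold. A minimal sketch under assumed threshold and weight values (not taken from the paper):

```python
import numpy as np

def latent_barrier_penalty(z_prev, z_next, max_step=0.5, weight=10.0):
    """Penalize latent jumps larger than max_step between time steps;
    inside the barrier the penalty is exactly zero."""
    step = np.linalg.norm(z_next - z_prev)
    return weight * max(0.0, step - max_step) ** 2

# Small steps are free; a large jump across the latent space is penalized.
print(latent_barrier_penalty(np.zeros(8), np.zeros(8)))  # 0.0
```

Added to the PPO reward, such a term steers exploration toward smooth, physically plausible motion without hard-clipping the action space.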

Latent Action Barriers (LAB) provide a mechanism to constrain robot actions within safe and desired behavioral ranges, effectively guiding exploration and preventing unintended outcomes.

Demonstrating Resilience and Skill Transfer

Experiments conducted on a Unitree G1 humanoid robot have validated the efficacy of LATENT in acquiring and performing both forehand and backhand tennis strokes. The system successfully translates learned latent representations into precise motor commands, enabling the robot to execute these complex movements with a degree of dexterity previously unattainable. This demonstration highlights LATENT’s capacity to not only learn distinct skills – in this case, two fundamentally different tennis strokes – but also to embody them physically through a robotic platform. The ability to perform both forehands and backhands showcases the system’s versatility and potential for more complex, dynamic applications beyond simple repetitive tasks, representing a significant step towards creating robots capable of nuanced and adaptive physical interactions.

Rigorous evaluation of the system’s performance centered on two key metrics: Success Rate and Distance Error. Success Rate quantified the proportion of attempted strokes successfully returned within the court boundaries, while Distance Error measured the average deviation between the robot’s ball landing point and the ideal target location. Comparative analysis consistently revealed substantial improvements over established baseline methods; the system achieved a notably higher Success Rate and significantly reduced Distance Error across a range of experimental conditions. These quantitative results underscore the efficacy of the approach in not only executing strokes, but also in achieving precision and consistency – crucial elements for robust and skillful performance in a dynamic, interactive setting.
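The two metrics are straightforward to compute from per-trial outcomes. A sketch, assuming landing points and targets as 2-D court coordinates and, as an assumption, averaging distance error over in-bounds returns only:

```python
import numpy as np

def evaluate(landings, targets, in_bounds):
    """landings, targets: (N, 2) ball landing / target coordinates (m);
    in_bounds: (N,) booleans, True if the return landed in the court."""
    success_rate = float(np.mean(in_bounds))
    errs = np.linalg.norm(landings - targets, axis=1)
    # Assumption: distance error averaged over successful returns only.
    distance_error = float(np.mean(errs[in_bounds])) if in_bounds.any() else float("nan")
    return success_rate, distance_error

# Toy data: two returns, both in bounds, 0 m and 5 m from target.
landings = np.array([[0.0, 0.0], [3.0, 4.0]])
targets = np.zeros((2, 2))
sr, de = evaluate(landings, targets, np.array([True, True]))
print(sr, de)  # 1.0 2.5
```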

The developed system showcases a remarkable degree of resilience and flexibility in dynamic scenarios. Through extensive robot-robot self-play, the system consistently achieved up to 25 consecutive rallies, indicating a high level of stability and learned coordination. Importantly, this performance was maintained even when the system was trained using imperfect human motion data – demonstrating its capacity to generalize and adapt to real-world input that is often noisy or incomplete. This adaptability suggests the system isn’t simply memorizing specific motions, but instead learning underlying principles of tennis, allowing it to successfully navigate variations in technique and maintain a consistent level of play.

Robot self-play in simulation demonstrates consistent rallies, as visualized by opposing robots on the court and confirmed by the distribution of consecutive rally counts across numerous games.

The pursuit of robotic athleticism, as demonstrated by LATENT, echoes a sentiment akin to Paul Erdős’s belief that “a mathematician knows each number between 1 and 100.” This isn’t about rote memorization, but rather a fundamental understanding of the underlying principles. LATENT doesn’t simply copy human motion; it constructs a ‘correctable latent action space’ – a distilled representation of the essential elements for a successful tennis stroke. This parallels the mathematician’s grasp of numerical relationships, allowing for adaptation and problem-solving even with ‘imperfect data’. The system strives for elegance, reducing complexity to its core components, mirroring the beauty found in a concise mathematical proof. It’s a testament to the power of abstraction and a rejection of unnecessary clutter, ultimately aiming for a refined, almost minimalist, expression of athletic skill.

Further Vectors

The presented work addresses a practical necessity: extracting utility from imperfect data. The elegance of constructing a correctable latent action space is not the destination, but rather a mitigation of the inherent noise in translating biological motion to robotic control. Future iterations will necessarily confront the question of what constitutes ‘correctable’. The system currently assumes a fidelity to the original human data, but a more ambitious approach would involve identifying and discarding demonstrably suboptimal elements within the human performance itself – a form of robotic distillation of athletic skill.

Sim-to-real transfer remains, predictably, a constriction. The techniques employed offer robustness, yet introduce further layers of abstraction. The ultimate test will not be replication of human motion, but the emergence of novel, robotically-optimized strategies. A tennis-playing robot that mimics human error is merely a curiosity; one that transcends it, a demonstration of genuine intelligence.

Unnecessary complexity is violence against attention. The field now faces a choice: pursue ever-more-realistic simulations, or embrace the fundamental differences between biological and mechanical systems. Density of meaning is the new minimalism. The pursuit of perfect replication is, ultimately, a distraction. The true potential lies in leveraging robotic capabilities to achieve performance unattainable by humans, even if the resulting motion appears… unfamiliar.


Original article: https://arxiv.org/pdf/2603.12686.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-16 22:16