Author: Denis Avetisyan
Researchers have developed a new neural framework that enables a single model to adapt to and control robots with vastly different physical builds.

AdaMorph achieves unified motion retargeting by decoupling semantic intent from morphology using adaptive normalization and a physics-compatible representation.
Despite advances in robotics, transferring human motion to robots with diverse morphologies remains a significant challenge due to kinematic and dynamic discrepancies. This paper introduces AdaMorph: Unified Motion Retargeting via Embodiment-Aware Adaptive Transformers, a novel framework that unifies robot control across heterogeneous embodiments by decoupling semantic intent from morphological execution. AdaMorph utilizes Adaptive Layer Normalization to dynamically modulate feature spaces, enabling a single model to generate plausible motions for robots with varying topologies. Could this approach unlock truly generalizable robot control, allowing robots to learn from and replicate a wider range of human behaviors?
The Challenge of Diverse Robotic Forms
Conventional motion retargeting techniques, while effective for robots closely resembling the original motion capture subject, encounter significant hurdles when applied to machines with drastically different physical builds. The process typically involves painstakingly adjusting parameters and re-mapping movements for each new robot morphology, a time-consuming and often imprecise endeavor. This per-robot adaptation stems from the fundamental challenge of translating human motion, intended for a specific skeletal structure, onto kinematic chains with varying link lengths, joint configurations, and degrees of freedom. Consequently, researchers face a substantial bottleneck in scaling robotic applications, as the adaptation effort grows with every additional robot platform. This laborious process limits the widespread deployment of sophisticated human-inspired movements across a broader range of robotic systems, hindering progress in areas like collaborative robotics and automated assistance.
The faithful reproduction of human motion on robots presents a significant challenge due to discrepancies in kinematic structure; conventional motion retargeting techniques frequently struggle to maintain the subtleties of natural movement when transferred to robots with differing joint configurations and link lengths. These methods often produce movements that appear stiff, unnatural, or even physically implausible for the target robot, as they fail to account for how variations in morphology affect the feasibility and aesthetic quality of the motion. The loss of nuance extends beyond simple posture; critical details like the timing, acceleration, and coordination of movements are often distorted, resulting in robotic performances that lack the fluidity and expressiveness of the original human demonstration. Consequently, a key area of research focuses on developing algorithms capable of adapting and reinterpreting human motion in a way that respects the unique biomechanics and limitations of each robotic platform.
The pursuit of truly versatile robotics hinges on developing motion control frameworks that transcend the limitations of single-robot adaptation. Current approaches often require painstaking recalibration for each new robotic morphology, hindering widespread deployment and scalability. A unified framework, however, promises a paradigm shift: one where a single control system can effectively govern a diverse array of robots, irrespective of their kinematic structure or physical characteristics. This generalization isn’t merely about convenience; it’s about unlocking the potential for robots to operate seamlessly in complex, unpredictable environments, and to collaborate effectively with both humans and other machines. Such a system would rely on abstracting motion into core principles, allowing for intelligent adaptation and transfer across platforms, ultimately fostering a future where robotic capabilities are limited only by the imagination.

AdaMorph: A Framework for Unified Motion Control
AdaMorph employs a Transformer architecture, specifically leveraging self-attention mechanisms, to model temporal dependencies within motion sequences. This allows the system to capture long-range correlations and intricate patterns present in human motion data, exceeding the capabilities of recurrent or convolutional networks for this task. The Transformer’s encoder-decoder structure processes input motion sequences and generates corresponding output motions, with multi-head attention enabling parallel processing of motion features at different temporal scales. This architecture facilitates learning complex, non-linear motion patterns and allows for generalization to novel motions and robot embodiments, as the learned representations are not limited by fixed receptive fields or sequential processing constraints.
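To make the pattern concrete, here is a minimal PyTorch sketch of an encoder-decoder Transformer over motion sequences. This illustrates the general architecture described above rather than the authors’ code; the feature and model dimensions are placeholder assumptions.

```python
import torch
import torch.nn as nn

class MotionTransformer(nn.Module):
    """Minimal encoder-decoder over motion sequences (batch, time, features)."""
    def __init__(self, feat_dim=64, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)
        self.out_proj = nn.Linear(d_model, feat_dim)
        # batch_first=True keeps tensors shaped (batch, time, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            batch_first=True)

    def forward(self, src_motion, tgt_motion):
        # Encoder self-attention captures long-range temporal correlations;
        # the decoder cross-attends to the encoded sequence.
        out = self.transformer(self.in_proj(src_motion), self.in_proj(tgt_motion))
        return self.out_proj(out)

model = MotionTransformer()
clip = torch.randn(2, 120, 64)        # 2 clips, 120 frames, 64-dim features
print(model(clip, clip).shape)        # torch.Size([2, 120, 64])
```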
The AdaMorph framework employs an Intent Encoder to address the challenge of transferring motion between robots with varying morphologies. This encoder processes human motion capture data and generates a latent intent vector, effectively decoupling the intention of the movement from the specific kinematic details of the human performer. This intent vector serves as a consistent input to the motion decoder for all target robots, regardless of their differing link lengths, joint configurations, or degrees of freedom. By representing motion in terms of intent rather than raw joint angles, the system achieves morphology-agnostic motion transfer, allowing a single intent to be realized across a diverse range of robotic embodiments without requiring per-robot retargeting or extensive adaptation procedures.
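A hypothetical sketch of such an encoder follows: human mocap features go in, and a morphology-agnostic intent vector per frame comes out. Names and dimensions (`human_dim`, `intent_dim`) are assumptions for illustration, not values from the paper.

```python
import torch
import torch.nn as nn

class IntentEncoder(nn.Module):
    """Maps human motion features to a latent, morphology-agnostic intent."""
    def __init__(self, human_dim=63, d_model=256, intent_dim=128, n_layers=3):
        super().__init__()
        self.proj = nn.Linear(human_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_intent = nn.Linear(d_model, intent_dim)

    def forward(self, human_motion):
        # human_motion: (batch, time, human_dim) raw mocap features
        h = self.encoder(self.proj(human_motion))
        # Per-frame intent; the same vector feeds the decoder for every robot.
        return self.to_intent(h)      # (batch, time, intent_dim)
```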
Adaptive Layer Normalization (AdaLN) facilitates robot motion adaptation by modulating learned motion features based on a robot’s kinematic structure. Specifically, AdaLN incorporates robot embodiment embeddings, vector representations of a robot’s physical parameters such as link lengths and joint limits, into the layer normalization process. This allows the network to dynamically adjust the feature statistics (mean and variance) at each layer, effectively scaling and shifting the activations. By conditioning the normalization on the embodiment embedding, the system can generate motions appropriate for robots with differing morphologies without requiring retraining or manual adjustments to the motion data; the same underlying motion features are transformed to suit each robot’s unique configuration.
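The mechanism can be sketched in a few lines, in the spirit of AdaLN layers used elsewhere in the transformer literature; the paper’s exact parameterization may differ.

```python
import torch
import torch.nn as nn

class AdaLN(nn.Module):
    """LayerNorm whose scale/shift are predicted from a robot embedding."""
    def __init__(self, d_model=256, embed_dim=64):
        super().__init__()
        # Affine-free LayerNorm: the affine parameters come from the robot.
        self.norm = nn.LayerNorm(d_model, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(embed_dim, 2 * d_model)

    def forward(self, x, robot_embed):
        # x: (batch, time, d_model); robot_embed: (batch, embed_dim)
        scale, shift = self.to_scale_shift(robot_embed).chunk(2, dim=-1)
        # Broadcast the per-robot modulation across all time steps.
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```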
The AdaMorph framework employs static robot prompts and cross-attention mechanisms during the motion decoding phase to ensure robot-specific motion generation. Each robot is associated with a unique, fixed prompt vector representing its kinematic properties. This prompt is concatenated with the latent motion features at each decoding step. Simultaneously, a cross-attention module allows the decoder to selectively attend to the robot prompt while generating each frame of motion. This process effectively conditions the generated motion on the target robot’s embodiment, facilitating the transfer of a common intent to diverse kinematic structures without requiring retraining or fine-tuning for each robot.
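A plausible sketch of this conditioning step appears below. The prompt length, the residual wiring, and the choice of twelve prompt slots (matching the twelve embodiments evaluated later) are assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class PromptCrossAttention(nn.Module):
    """Decoder block attending to a fixed per-robot prompt."""
    def __init__(self, d_model=256, n_heads=8, n_robots=12, prompt_len=4):
        super().__init__()
        # One learned, static prompt per robot embodiment.
        self.prompts = nn.Parameter(torch.randn(n_robots, prompt_len, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, motion_feats, robot_id):
        # motion_feats: (batch, time, d_model); robot_id: (batch,) long tensor
        prompt = self.prompts[robot_id]          # (batch, prompt_len, d_model)
        # Queries come from motion features; keys/values from the prompt.
        attended, _ = self.attn(motion_feats, prompt, prompt)
        return motion_feats + attended           # residual conditioning
```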

Motion Representation Through Base Frames and Physics
AdaMorph utilizes a Base-Frame Representation to decouple human motion from global coordinates, expressing it instead as relative velocities and articulations within a local coordinate system. This standardization is achieved by defining a base frame attached to a key joint, typically the root of the human skeleton, and representing all subsequent motion as changes relative to this frame. Specifically, motion is characterized by $\Delta \mathbf{p}$ representing local linear velocity and $\Delta \mathbf{q}$ representing local angular velocity. By operating in this local space, AdaMorph effectively normalizes motion data, reducing the complexity of transferring movements between different robotic embodiments and simplifying the optimization process by minimizing the impact of global positional differences.
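For intuition, the conversion from a global root trajectory to base-frame linear velocities can be written as a short NumPy routine. This is a generic sketch of the idea, assuming per-frame root rotations are available as matrices; the paper’s exact formulation may differ.

```python
import numpy as np

def to_base_frame_velocities(root_pos, root_rot, dt=1/30):
    """Express root motion as linear velocities in the local base frame.

    root_pos: (T, 3) global positions; root_rot: (T, 3, 3) global rotations.
    Returns (T-1, 3): per-frame velocity rotated into the preceding base frame.
    """
    dp_world = np.diff(root_pos, axis=0) / dt       # global deltas per frame
    # v_local[t] = R[t]^T @ dp_world[t]; angular velocity is handled
    # analogously from relative rotations between consecutive frames.
    return np.einsum('tij,tj->ti', root_rot[:-1].transpose(0, 2, 1), dp_world)
```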
AdaMorph utilizes Local Linear Velocity $\mathbf{v}_{local}$ and Local Angular Velocity $\boldsymbol{\omega}_{local}$ as the primary control parameters for motion synthesis. These velocities are defined in the local coordinate frame of each end-effector or joint, providing a frame-independent representation of motion. By directly controlling these local velocities, the system decouples motion from global positional constraints and robot base location, enabling transferability across different embodiments and environments. This localized control scheme allows for precise manipulation of each degree of freedom, ensuring accurate trajectory tracking and reducing the computational complexity associated with inverse kinematics in a global frame. The system then integrates these local velocities over time to generate smooth and physically plausible motions.
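The integration step can be illustrated with a simple forward roll-out, using Rodrigues’ formula to turn local angular increments into rotations. This is a generic sketch, not the framework’s differentiable integrator.

```python
import numpy as np

def integrate_local_velocities(v_local, w_local, p0, R0, dt=1/30):
    """Roll local linear/angular velocities into a global root trajectory."""
    def exp_so3(w):
        # Rodrigues' formula: axis-angle vector -> rotation matrix.
        theta = np.linalg.norm(w)
        if theta < 1e-8:
            return np.eye(3)
        k = w / theta
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

    p, R, traj = p0.copy(), R0.copy(), [p0.copy()]
    for v, w in zip(v_local, w_local):
        p = p + R @ (v * dt)       # local velocity rotated into the world frame
        R = R @ exp_so3(w * dt)    # local angular increment applied on the right
        traj.append(p.copy())
    return np.array(traj)
```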
Physics-Constrained Optimization within AdaMorph leverages techniques to ensure generated motions adhere to physical plausibility over extended durations. This is achieved through Differentiable Integration, enabling gradient-based optimization of motion trajectories while respecting the laws of physics; traditional discrete time-stepping methods are avoided to allow for continuous optimization. Furthermore, the framework incorporates $SO(3)$ Projection, which constrains the rotational component of the motion to remain within the special orthogonal group $SO(3)$, preventing unrealistic or physically impossible orientations and ensuring rotational consistency throughout the generated sequence. This combination of techniques results in long-horizon consistency, where motions remain stable and believable over time, and realistic dynamics, accurately simulating physical interactions and forces.
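The $SO(3)$ projection step has a standard closed form: the nearest rotation matrix (in Frobenius norm) to a drifted matrix is obtained via SVD. A minimal NumPy version, shown as a generic technique rather than the paper’s implementation:

```python
import numpy as np

def project_to_so3(M):
    """Project a near-rotation 3x3 matrix back onto SO(3) via SVD."""
    U, _, Vt = np.linalg.svd(M)
    # det correction keeps the result a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt

drifted = np.eye(3) + 0.05 * np.random.randn(3, 3)   # drifted off the manifold
R = project_to_so3(drifted)
print(np.allclose(R @ R.T, np.eye(3)))               # True: orthogonal again
print(np.isclose(np.linalg.det(R), 1.0))             # True: proper rotation
```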
Embodiment-Specific Output Adapters are a crucial component in AdaMorph, responsible for translating the generalized, shared latent representation of motion into a format compatible with a specific robot’s kinematic and dynamic constraints. These adapters utilize robot-specific parameters – including link lengths, joint limits, and actuator characteristics – to project the latent space into actuator commands or desired end-effector trajectories. This projection ensures that the transferred motion is physically realizable on the target robot and accounts for differences in morphology and actuation. The adapters effectively decouple the motion planning from the robot’s physical embodiment, allowing a single latent representation to drive diverse robotic platforms with minimal retraining.
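One simple way to realize such adapters is a small per-robot output head over the shared latent, clamped to that robot’s joint limits. The degree-of-freedom counts and limits below are placeholders, and the real adapters may be considerably richer.

```python
import torch
import torch.nn as nn

class OutputAdapters(nn.Module):
    """One lightweight head per robot, projecting shared latents to joints."""
    def __init__(self, d_model=256, dof_per_robot=(23, 19, 27)):
        super().__init__()
        # Each embodiment has its own number of actuated degrees of freedom.
        self.heads = nn.ModuleList(
            nn.Linear(d_model, dof) for dof in dof_per_robot)
        # Per-robot joint limits (min, max) in radians; placeholder values.
        self.limits = [(-3.14, 3.14)] * len(dof_per_robot)

    def forward(self, latent, robot_id):
        # latent: (batch, time, d_model) shared motion representation
        lo, hi = self.limits[robot_id]
        return self.heads[robot_id](latent).clamp(lo, hi)
```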

Zero-Shot Generalization and the Future of Robotic Control
AdaMorph exhibits a remarkable capacity for zero-shot generalization, effectively transferring learned motion skills to robots it has never encountered during training, requiring no task-specific fine-tuning for new robotic embodiments. This capability stems from the framework’s design, which prioritizes learning adaptable motion primitives rather than memorizing specific robot configurations. Consequently, a single, trained AdaMorph model can govern the movements of diverse robotic platforms, ranging in size, morphology, and actuator types, with minimal performance degradation. The system achieves this by abstracting away robot-specific details, focusing instead on the underlying principles of motion and allowing for seamless adaptation to previously unseen hardware, significantly streamlining the robotic deployment process and fostering a new level of robotic versatility.
Rigorous evaluation of the AdaMorph framework demonstrates a remarkable degree of alignment between human-intended movements and robotic execution, as quantified by the Pearson Correlation Coefficient (PCC). Across testing with the Unitree G1 and H1 humanoid robots, the system consistently achieved a PCC of approximately 0.95, indicating a very strong positive correlation. This high score signifies that the robot’s movements closely mirror those of a human operator, even without any specific training for those robots. The PCC, a statistical measure of linear correlation, provides a robust and objective assessment of motion fidelity, validating the framework’s ability to effectively transfer learned behaviors to new robotic platforms and suggesting a highly intuitive level of control.
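The metric itself is straightforward to compute. As a sanity check, an executed trajectory that closely tracks its reference yields a PCC near 1, consistent with the ~0.95 scores reported above (the signals below are synthetic, for illustration only):

```python
import numpy as np

def pearson_cc(x, y):
    """Pearson correlation between reference and executed motion signals."""
    x, y = np.asarray(x, float).ravel(), np.asarray(y, float).ravel()
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

t = np.linspace(0, 2 * np.pi, 200)
reference = np.sin(t)                                 # intended motion signal
executed = np.sin(t) + 0.05 * np.random.randn(200)    # slightly noisy tracking
print(round(pearson_cc(reference, executed), 3))      # close to 1.0
```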
The AdaMorph framework consistently demonstrates a remarkable ability to transfer learned motions across a diverse range of robotic platforms. Rigorous evaluation, utilizing the Pearson Correlation Coefficient (PCC), reveals that the system achieves a high degree of fidelity, with PCC values exceeding 0.85 in both root velocity and whole-body activity consistency across all twelve tested robot embodiments. This signifies that the transferred motions not only match the intended speed and direction but also maintain a natural and coordinated posture throughout execution, regardless of the robot’s physical structure. Such a high level of consistency suggests the framework effectively captures the underlying principles of locomotion, enabling seamless adaptation to varying morphologies and reducing the need for robot-specific adjustments.
The adaptability of the AdaMorph framework extends to robots with significantly different physical designs, as demonstrated by consistently high Pearson Correlation Coefficients (PCC) achieved across diverse embodiments. Even when applied to the Hightorque Hi and Kuavo S45, robots exhibiting substantial morphological distinctions, the framework maintains a PCC of 0.8 or greater. This result underscores the robustness of the approach, indicating its capacity to generalize motion control beyond superficial similarities in robotic structure. The ability to transfer learned behaviors effectively to robots with varied anatomies dramatically simplifies the deployment process and suggests a pathway towards creating universally adaptable robotic systems, capable of operating across a broad spectrum of platforms without extensive per-robot customization.
Traditional robotic systems often demand extensive, robot-specific adaptation, a process of meticulous fine-tuning for each new embodiment, which significantly hinders rapid deployment and scalability. However, this framework circumvents that limitation by enabling robots to generalize learned motions to unseen platforms without any additional training. This capability drastically reduces the time and resources traditionally required to get a robot operational in a new form, effectively decoupling motion learning from specific hardware. Consequently, roboticists can focus on developing versatile behaviors, confident that those behaviors will transfer seamlessly across a diverse range of robotic morphologies, promising a future of faster prototyping and wider robotic application.
The development of this adaptable robotic framework promises a future where robots more naturally integrate into human environments. By minimizing the need for extensive, robot-specific programming, the system facilitates the creation of machines capable of responding to a wider range of commands and situations without requiring retraining. This inherent flexibility allows for more intuitive interaction, as robots can generalize learned motions to novel bodies and tasks, effectively bridging the gap between human intention and robotic action. Consequently, these systems are poised to become more responsive partners in collaborative tasks, assistive technologies, and everyday human-robot interactions, ultimately fostering a more seamless and productive coexistence.

The pursuit of unified control, as demonstrated by AdaMorph, echoes a fundamental tenet of elegant design. The framework’s decoupling of semantic intent from morphological execution, achieved through Adaptive Layer Normalization, exemplifies a dedication to clarity. It is a refinement, not an accretion. As Edsger W. Dijkstra stated, “It is quite remarkable how much can be accomplished if one is not bothered by having to consider the consequences.” AdaMorph doesn’t attempt to account for every possible robotic variation; instead, it establishes a streamlined pathway for transferring motion, trusting in the underlying physics-based simulation to handle the specifics. This prioritization of core principles over exhaustive detail is indicative of a truly thoughtful approach to complex systems.
What Remains?
The pursuit of a unified control framework, as demonstrated by AdaMorph, inevitably reveals the persistent gulf between representation and reality. Decoupling semantic intent from morphological execution is a necessary simplification, a graceful retreat from the intractable complexity of embodied intelligence. The efficacy of Adaptive Layer Normalization suggests that a degree of morphological abstraction is attainable, but this abstraction comes at a cost. The question is not merely whether a model can generalize across embodiments, but whether it can do so without discarding the very nuances that define those embodiments.
Future work will likely focus on minimizing this loss of fidelity. The integration of richer physics-based priors, beyond the current simulation compatibility, seems crucial. Yet, the temptation to add complexity must be resisted. True progress may lie not in building more elaborate models, but in discovering more elegant ways to represent what is already there. The ultimate test will not be the number of embodiments controlled, but the subtlety with which each is animated.
One wonders if the quest for a universal retargeting framework is, in some sense, a category error. Perhaps the true challenge is not to create a single model that controls everything, but to create a system that learns to appreciate the unique affordances of each embodiment: a system that understands that sometimes, the most intelligent response is to simply adapt.
Original article: https://arxiv.org/pdf/2601.07284.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/