Author: Denis Avetisyan
Researchers have developed a neural simulator that learns to accurately model the complex dynamics of deformable objects manipulated by robots, bridging the gap between simulation and real-world performance.

SoMA reconstructs and simulates soft-body interactions using Gaussian splats and force-driven dynamics, enabling stable, long-horizon simulations for real-to-sim transfer.
Accurate simulation of deformable objects remains a key bottleneck in transferring robotic manipulation skills from simulation to the real world. This paper introduces ‘SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation’, a novel approach leveraging 3D Gaussian splats to learn and simulate the dynamics of soft bodies interacting with robots. SoMA achieves stable, long-horizon simulation by representing object states in a learned latent space and directly modeling robot-object forces, improving resimulation accuracy by 20%. Could this neural simulation framework unlock more robust and adaptable robotic systems capable of complex, real-world tasks?
The Inevitable Decay of Simulation: A Necessary Imperfection
The creation of realistic and responsive deformable object simulations is paramount for advancements in both robotics and virtual reality, yet current methodologies frequently fall short in delivering both accuracy and computational speed. Traditional approaches, often relying on mesh-based models or rigid body approximations, struggle to faithfully reproduce the nuanced behaviors of materials like cloth, rubber, or biological tissues. This limitation stems from the immense computational cost associated with precisely modeling continuous deformations and internal forces, making real-time interaction impractical. Consequently, simulations often exhibit visual artifacts, instability, or a disconnect from physical reality, hindering the development of robust robotic manipulation systems and truly immersive virtual experiences. Overcoming these challenges necessitates innovative techniques that prioritize efficiency without sacrificing the fidelity required for believable and predictable object behavior.
Current methods for simulating deformable objects, such as cloth, fluids, or soft tissues, often stumble when faced with the intricate dance of forces at play. These techniques frequently oversimplify the interactions between an object’s internal structure and external forces, resulting in simulations that appear visually unconvincing or, more critically, become numerically unstable. A seemingly minor perturbation (a gentle breeze on a cloth, for example) can trigger exaggerated, unrealistic responses as the simulation struggles to maintain physical plausibility. This occurs because accurately representing the complex interplay of tension, compression, bending, and collision forces, all while maintaining computational efficiency, remains a substantial hurdle. The resulting instability limits the ability to create truly immersive virtual experiences or reliable robotic interactions with the physical world, necessitating advanced approaches that prioritize both fidelity and robustness.
The difficulty in realistically simulating deformable objects presents a considerable hurdle for applications demanding physical interaction. Current methods often fall short in replicating the nuanced behavior of materials like cloth, rubber, or even human tissue, leading to simulations that appear unnatural or, critically, behave unpredictably during interaction. This instability hinders the development of robust robotic systems capable of manipulating soft objects and limits the immersive quality of virtual reality experiences. Consequently, researchers are actively pursuing novel modeling techniques, shifting away from purely data-driven approaches and towards hybrid methods that integrate physical principles with learned representations to achieve both fidelity and computational efficiency, a crucial step towards seamless and believable real-world interaction within digital environments.
![SoMA consistently produces stable, long-horizon simulations of diverse soft-body objects (rope, cloth, and doll) under both familiar and novel robot manipulations, unlike PhysTwin and GausSim which struggle with complex interactions and exhibit real-to-sim discrepancies.](https://arxiv.org/html/2602.02402v1/x3.png)
Gaussian Splats: A Continuous Representation of Impermanence
3D Gaussian Splatting represents a scene as a collection of 3D Gaussians, each defined by its position, covariance, opacity, and color. This approach contrasts with traditional mesh-based methods which discretize geometry using polygons, often requiring significant memory and computational resources for complex scenes. Gaussians offer a continuous, differentiable representation, allowing for efficient rendering and reconstruction. The covariance matrix of each Gaussian defines its shape and orientation, enabling the capture of fine geometric details. Furthermore, the opacity value controls the contribution of each Gaussian to the final rendered image, facilitating the representation of transparent and semi-transparent surfaces. This continuous representation allows for higher quality reconstructions with fewer parameters compared to discrete mesh representations, leading to reduced storage requirements and faster rendering speeds.
3D Gaussian Splatting represents objects as a collection of 3D Gaussians, each defined by its position, covariance, and opacity. This differs from discrete representations like meshes or voxels by providing a continuous, differentiable signal. Rendering projects each 3D Gaussian into screen space as a 2D Gaussian and alpha-composites the resulting splats front to back, reproducing view-dependent effects such as specular highlights and transparency. The computational efficiency stems from this splatting-based rasterization, which avoids the per-ray network queries of volumetric neural rendering. Furthermore, the number of Gaussians required to represent a scene can be significantly lower than the number of triangles in a comparable mesh, leading to reduced memory usage and faster processing times. This approach effectively balances rendering quality with computational cost, making it suitable for real-time applications and large-scale scene reconstruction.
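To make the splat formulation concrete, the sketch below evaluates one screen-space 2D Gaussian's contribution to a pixel and alpha-composites a front-to-back list of splats. This is a minimal pure-Python illustration of the standard splatting math under assumed names (`gaussian_weight`, `composite`), not the paper's implementation, and it treats color as a single scalar channel for brevity.

```python
import math

def gaussian_weight(pixel, mean, cov, opacity):
    """Contribution of one 2D Gaussian splat at a pixel:
    opacity * exp(-0.5 * d^T cov^{-1} d), the standard splat footprint.
    cov is a symmetric 2x2 covariance [[a, b], [b, c]] with positive determinant."""
    dx = pixel[0] - mean[0]
    dy = pixel[1] - mean[1]
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    det = a * c - b * b
    # Closed-form inverse of the 2x2 covariance.
    inv_a, inv_b, inv_c = c / det, -b / det, a / det
    mahalanobis = dx * (inv_a * dx + inv_b * dy) + dy * (inv_b * dx + inv_c * dy)
    return opacity * math.exp(-0.5 * mahalanobis)

def composite(weights_colors):
    """Front-to-back alpha compositing of (weight, color) pairs:
    each splat contributes weight * color scaled by remaining transmittance."""
    color, transmittance = 0.0, 1.0
    for w, col in weights_colors:
        color += transmittance * w * col
        transmittance *= (1.0 - w)
    return color
```

At the Gaussian's mean the weight reduces to the raw opacity, and two half-opaque white splats composite to 0.75 rather than 1.0, which is the behavior that lets splatting represent semi-transparent surfaces.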
The integration of 3D Gaussian Splatting with established 3D reconstruction pipelines, specifically Simultaneous Localization and Mapping (SLAM) and COLMAP, enables the creation of detailed and accurate 3D models. SLAM algorithms provide the initial sparse or dense reconstruction, which is then refined and represented using 3D Gaussian Splats. COLMAP, a structure-from-motion and multi-view stereo pipeline, can generate a point cloud or mesh that serves as input for Gaussian Splatting, converting it into a more efficient and photorealistic representation. This combination leverages the strengths of both approaches: SLAM/COLMAP for robust initial reconstruction and Gaussian Splats for high-quality rendering and efficient storage, resulting in complete 3D scenes with improved geometric fidelity and visual appearance.
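As a concrete illustration, the sparse reconstruction that seeds Gaussian Splatting is commonly produced with COLMAP's standard command-line pipeline. The sketch below assumes input frames in an `images/` directory and a database at `db.db`; the paths and the choice of exhaustive matching are illustrative, not taken from the paper.

```shell
# Extract features from the input frames into a COLMAP database.
colmap feature_extractor --database_path db.db --image_path images

# Match features across all image pairs (adequate for small capture sessions).
colmap exhaustive_matcher --database_path db.db

# Run incremental structure-from-motion to recover camera poses and a sparse cloud.
mkdir -p sparse
colmap mapper --database_path db.db --image_path images --output_path sparse
```

The recovered camera poses and sparse points then initialize the positions of the 3D Gaussians for the splatting stage.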
SoMA: Modeling the Inevitable Drift from Reality
SoMA addresses the challenge of transferring robotic skills learned in the real world to simulation environments, specifically for interactions involving deformable objects. Traditional simulation often fails due to inaccuracies in modeling these objects, leading to discrepancies between simulated and real-world behavior. SoMA employs a neural network-based approach to learn a mapping from real-world observations to simulated states, effectively creating a more realistic and accurate simulation of deformable object dynamics. This allows robots to train in simulation and then reliably execute those learned skills when deployed in the physical world, reducing the need for extensive real-world training and improving robustness.
SoMA utilizes Gaussian Splats to represent the continuous, deformable state of objects, offering a memory-efficient alternative to traditional mesh-based or particle-based methods. These splats, defined by their mean, covariance, and opacity, allow for a differentiable representation suitable for gradient-based optimization. A Hierarchical Graph Network (HGN) then processes this state representation, enabling efficient propagation of information between splats and across the object’s geometry. The HGN’s hierarchical structure reduces computational complexity by abstracting local details while preserving global context, allowing SoMA to simulate complex deformations with reduced computational cost and improved stability compared to methods that operate directly on the splat representation without a hierarchical structure.
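The hierarchical idea can be sketched in a few lines: fine per-splat features are pooled into coarse clusters, messages are exchanged at the coarse level, and the result is broadcast back to the fine nodes. This is a toy scalar-feature illustration of hierarchical message passing; the function names and the mean aggregation are our assumptions, and SoMA's actual HGN is a learned network.

```python
def pool(features, assignment, n_clusters):
    """Average fine-node features into coarse clusters (one HGN level down)."""
    sums = [0.0] * n_clusters
    counts = [0] * n_clusters
    for feat, cluster in zip(features, assignment):
        sums[cluster] += feat
        counts[cluster] += 1
    return [s / n for s, n in zip(sums, counts)]

def coarse_message_pass(coarse, neighbors):
    """One aggregation round on the coarse graph: each node becomes the
    mean of itself and its neighbors, spreading information globally."""
    return [
        (coarse[i] + sum(coarse[j] for j in nbrs)) / (1 + len(nbrs))
        for i, nbrs in enumerate(neighbors)
    ]

def unpool(coarse, assignment):
    """Broadcast coarse features back to the fine splats (one level up)."""
    return [coarse[c] for c in assignment]
```

Because one coarse round touches every cluster, information crosses the whole object in far fewer steps than flat message passing over thousands of splats, which is the source of the complexity reduction described above.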
Robot-Conditioned Real-to-Sim (R2S) Mapping within SoMA establishes a causal link between robot actions and simulated object dynamics. This is achieved by conditioning the simulation on the robot’s state and actions, effectively modeling how the robot’s interactions directly influence the behavior of deformable objects. Specifically, the system learns a mapping that predicts the change in object state [latex] \Delta s [/latex] given the robot’s action [latex] a [/latex] and the current object state [latex] s [/latex]. This conditioning ensures the simulated dynamics are not simply a general prediction of object behavior, but are specifically tied to, and consistent with, the robot’s physical interactions, improving the fidelity of real-to-sim transfer.
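The conditioning above implies an autoregressive rollout: at each step the learned mapping predicts a state change from the current state and the robot's action, and the state is advanced by that change. The sketch below uses a linear stand-in for the learned predictor; the weights `w_s` and `w_a` are placeholders for the trained network, not part of SoMA.

```python
def predict_delta(state, action, w_s, w_a):
    """Stand-in for the learned mapping (s, a) -> delta_s.
    A real model would be a neural network; here it is linear per dimension."""
    return [w_s * s + w_a * a for s, a in zip(state, action)]

def rollout(state, actions, w_s=0.0, w_a=1.0):
    """Autoregressive simulation: s_{t+1} = s_t + delta_s_t, with each
    delta conditioned on the robot action taken at step t."""
    trajectory = [list(state)]
    for action in actions:
        delta = predict_delta(trajectory[-1], action, w_s, w_a)
        trajectory.append([s + d for s, d in zip(trajectory[-1], delta)])
    return trajectory
```

The key structural point is that the action enters every step of the prediction, so the simulated object cannot drift into dynamics inconsistent with what the robot actually did.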
Validating the Illusion: Measuring the Fidelity of Decay
The accuracy of SoMA is substantiated through a rigorous evaluation process employing established metrics for image and depth data. Specifically, researchers utilized Absolute Relative Error to quantify discrepancies, alongside Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) to assess perceptual fidelity. These comprehensive tests demonstrate SoMA’s ability to generate significantly more accurate simulations than existing methodologies, consistently achieving improved scores across all evaluated parameters and validating its advancements in realistic data generation. The use of these metrics provides a quantifiable basis for understanding the improvements SoMA delivers in both numerical precision and visual quality.
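For reference, the two metrics with simple closed-form definitions can be computed directly; the sketch below implements Absolute Relative Error (typically applied to depth maps) and PSNR on flattened value lists. SSIM and LPIPS need windowed statistics and a pretrained network respectively, so they are omitted here; the function names are ours.

```python
import math

def abs_rel_error(pred, gt):
    """Mean of |pred - gt| / gt over valid (gt > 0) values,
    the standard Absolute Relative Error used for depth evaluation."""
    terms = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    return sum(terms) / len(terms)

def psnr(pred, gt, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for signals in [0, max_val]:
    10 * log10(max_val^2 / MSE). Higher is better; identical inputs diverge."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Note the opposite polarities: a lower Absolute Relative Error and LPIPS indicate better agreement, while higher PSNR and SSIM do.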
Evaluations demonstrate that SoMA significantly advances the state-of-the-art in simulation fidelity, achieving a 20% improvement in overall performance when measuring both RGB imagery and depth data compared to previously established methods. This substantial gain isn’t merely incremental; it represents a marked leap in the accuracy and realism of generated scenes. Rigorous testing, utilizing metrics designed to assess perceptual quality and geometric precision, consistently highlights SoMA’s capacity to produce simulations that more closely mirror real-world conditions. The improvement extends to complex scenarios, suggesting a robust architecture capable of handling intricate details and challenging conditions, and paving the way for more reliable and immersive virtual experiences.
The fidelity of SoMA’s simulations is significantly bolstered by the integration of physics-inspired consistency constraints and a multi-resolution training regime. These constraints enforce plausible physical behaviors within the generated scenes, preventing visually jarring inconsistencies and enhancing overall realism. Simultaneously, multi-resolution training allows the system to learn robust features at varying levels of detail, enabling it to reconstruct complex scenes efficiently and maintain stability even with limited input data. This combined approach ensures that the simulated environments not only appear realistic but also adhere to fundamental physical principles, resulting in more convincing and dependable outputs for applications ranging from robotics to virtual reality.
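The two ingredients can be illustrated with toy one-dimensional stand-ins: an L1 loss summed over an average-pooled pyramid (multi-resolution training) and a penalty on deviation from a rest edge length (a physics-inspired consistency constraint, here a simple stretch term). Both functions are our simplifications for illustration, not the paper's actual losses.

```python
def downsample(signal, factor):
    """Average-pool a 1D signal by `factor`, a stand-in for pyramid levels."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]

def multires_loss(pred, gt, levels=(1, 2)):
    """Sum of mean-L1 losses across resolutions, so coarse structure and
    fine detail are both supervised during training."""
    total = 0.0
    for factor in levels:
        p, g = downsample(pred, factor), downsample(gt, factor)
        total += sum(abs(a - b) for a, b in zip(p, g)) / len(p)
    return total

def stretch_penalty(positions, rest_length):
    """Physics-inspired constraint on a 1D chain of points: penalize
    squared deviation of each edge length from its rest length."""
    return sum((abs(b - a) - rest_length) ** 2
               for a, b in zip(positions, positions[1:]))
```

A chain already at its rest spacing incurs zero penalty, so the constraint only activates when the simulation produces implausible stretching or compression.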
SoMA distinguishes itself through its occlusion-aware reconstruction capabilities, a critical advancement for generating realistic simulations, particularly within complex environments. Traditional methods often struggle when portions of a scene are hidden from view, leading to incomplete or inaccurate reconstructions. SoMA, however, intelligently infers information about occluded regions, leveraging contextual understanding and learned priors to fill in missing data with high fidelity. This allows the system to create more complete and robust representations of the scene, even when faced with partial observations. The result is a simulation that maintains visual consistency and plausibility, even as objects move behind one another or are temporarily hidden from view, significantly enhancing the overall realism and believability of the generated content.
![The demonstrated approach achieves robust performance across diverse robotic manipulation tasks, including multi-view object recognition, T-shirt folding, and the challenging PhysTwin dataset.](https://arxiv.org/html/2602.02402v1/x4.png)
Towards Impermanent Intelligence: Accepting the Inevitable Shift
The convergence of neural dynamics modeling and real-to-sim transfer methodologies, such as SoMA, is poised to redefine the landscape of embodied artificial intelligence and robot learning. This integration allows for the creation of robotic systems capable of not just performing tasks, but understanding the underlying physical principles governing their environment. By accurately representing the dynamic interplay between a robot’s actions and the world around it, including factors like friction, inertia, and material properties, simulations become far more realistic. Consequently, robots can train in these virtual environments, mastering intricate skills involving deformable objects and complex interactions, and then seamlessly execute those skills in the physical world with minimal adaptation, overcoming the traditional limitations of transferring learned behaviors between simulation and reality. This approach promises a future where robots are not simply pre-programmed, but genuinely embodied, possessing a deep understanding of their physical selves and the environments they inhabit.
The pursuit of robotic dexterity hinges on bridging the reality gap, the discrepancy between simulated environments and the physical world. Recent advances demonstrate that highly accurate physical simulations allow robots to acquire intricate manipulation skills, like grasping and folding deformable objects, entirely within a virtual space. This is achieved by modeling not just the robot’s movements, but also the complex physics of interaction, including friction, contact forces, and material properties. Crucially, these learned skills aren’t confined to the simulation; techniques such as SoMA facilitate a seamless transfer of the robot’s ‘knowledge’ to its physical counterpart, enabling it to perform the same tasks in the real world with minimal retraining. This approach drastically reduces the time and resources required for robot learning, bypassing the need for extensive real-world experimentation and paving the way for robots capable of mastering increasingly complex tasks.
The development of robotic systems skilled at handling deformable objects (think clothing, ropes, or even biological tissues) represents a significant leap towards truly versatile artificial intelligence. Current robotic manipulation often struggles with the inherent unpredictability of these materials, demanding systems that can adapt in real-time to changing conditions. This research directly addresses this challenge by fostering the creation of robots capable of not just executing pre-programmed actions, but of understanding how objects respond to force and interacting with them in a nuanced, robust manner. Such advancements promise applications ranging from automated surgery and advanced manufacturing to assistive robotics and even exploration of challenging environments, ultimately yielding systems that are more intelligent, adaptable, and resilient in complex, dynamic settings.
The presented SoMA framework inherently acknowledges the transient nature of any modeled system. It doesn’t strive for a perfect, static representation of deformable object dynamics, but rather constructs a simulation that, while remarkably stable over extended horizons, exists within the constraints of learned Gaussian splat representations. This approach mirrors a fundamental principle: stability is an illusion cached by time. SoMA doesn’t attempt to perfectly replicate reality, an impossible feat, but focuses on generating sufficient fidelity for robotic manipulation tasks, accepting that the simulation, like all systems, will eventually succumb to the effects of latency and approximation. The system, therefore, ages gracefully within its defined parameters.
What Lies Ahead?
The pursuit of accurate simulation inevitably encounters the inherent ephemerality of physical systems. SoMA offers a compelling method for reconstructing and propagating dynamics, but even learned representations are subject to the subtle degradations of time. The fidelity achieved with Gaussian splats is noteworthy; however, the system, like all systems, will eventually encounter scenarios where the learned model diverges from the increasingly complex reality it attempts to mirror. The challenge isn’t simply to refine the simulation, but to accept that perfect replication is an asymptotic goal: a direction, not a destination.
Future work will likely focus on extending the scope of interaction, moving beyond single objects to more intricate assemblies. Yet, a potentially more fruitful avenue lies in understanding how these simulations age. Observing the points of failure, the accumulation of error, and the emergent behaviors born from imperfect reconstruction could reveal deeper principles about the nature of physical systems themselves. Sometimes observing the process is better than trying to speed it up.
Ultimately, SoMA, and systems like it, learn to age gracefully. The long-horizon stability achieved is not a triumph over decay, but a measured accommodation of it. The field should perhaps shift its focus from striving for absolute accuracy to cultivating robustness, building simulations that, rather than perfectly predicting the future, can reliably navigate an uncertain present.
Original article: https://arxiv.org/pdf/2602.02402.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-03 13:07