Author: Denis Avetisyan
A new system leverages advanced 3D reconstruction and data co-training to generate highly realistic simulated environments for assessing the performance of general-purpose robot policies.

PolaRiS establishes a scalable real-to-sim evaluation pipeline with strong correlation to real-world robot performance using Gaussian Splatting and simulation data co-training.
Accurate and efficient benchmarking remains a key challenge in robotics, particularly when evaluating increasingly versatile, generalist policies. This paper introduces PolaRiS, a scalable real-to-sim framework for high-fidelity robot evaluation, which leverages neural reconstruction from real-world video to create interactive simulation environments. We demonstrate that PolaRiS evaluations exhibit a significantly stronger correlation with real-world performance than existing simulated benchmarks, enabled by a simple co-training recipe that bridges the reality gap. Could this approach pave the way for more democratized and distributed evaluation of the next generation of robotic foundation models?
Decoding Reality: The Challenge of Robotic Generalization
The pursuit of robotic agents capable of operating reliably in novel settings presents a persistent challenge for researchers. While robots can often achieve impressive performance within controlled laboratory conditions, their abilities frequently degrade when deployed in the unpredictable complexity of the real world. This discrepancy arises from the infinite variability inherent in unseen environments – variations in lighting, surface textures, object arrangements, and unexpected disturbances can all disrupt a robot’s pre-programmed behaviors. Consequently, a robot trained to navigate one specific room may struggle significantly in a slightly different space, highlighting the limitations of current approaches to robotic generalization and the need for more adaptable and robust learning algorithms.
The persistent challenge of transferring robotic skills learned in simulation to real-world application frequently stems from mismatches in physical dynamics. While simulations offer a cost-effective and safe training ground, they inevitably simplify the complexities of reality – friction, material properties, sensor noise, and even subtle variations in gravity are difficult to perfectly replicate. This discrepancy leads to what is known as the “sim-to-real gap,” where policies that perform flawlessly in the virtual environment falter or fail when deployed on a physical robot. The robot may struggle with tasks requiring precise force control, delicate manipulation, or navigation across uneven terrain, as the learned behaviors are predicated on inaccurate physical models. Consequently, researchers are actively exploring techniques to bridge this gap, including domain randomization – intentionally varying simulation parameters – and domain adaptation – modifying the learned policy to account for real-world differences.
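Domain randomization, mentioned above, amounts to resampling the simulator's physical parameters every episode so that a policy cannot overfit to any single configuration. As a minimal sketch (the parameter names and ranges below are illustrative assumptions, not values from the paper):

```python
import random

def randomized_sim_params():
    """Sample one episode's simulation parameters.
    Ranges are illustrative, not taken from the paper."""
    return {
        "friction":     random.uniform(0.4, 1.2),   # surface friction coefficient
        "object_mass":  random.uniform(0.05, 0.5),  # kg
        "light_gain":   random.uniform(0.7, 1.3),   # brightness multiplier
        "sensor_noise": random.gauss(0.0, 0.01),    # additive observation noise
    }

# A policy trained across many such draws must learn features that
# survive the variation, which is the point of the technique.
episodes = [randomized_sim_params() for _ in range(100)]
```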
Successfully deploying robots beyond controlled laboratory settings demands a concerted effort to reconcile the differences between simulation and reality. Beyond domain randomization and domain adaptation, research is increasingly focused on meta-learning approaches, where robots learn how to learn, enabling faster adaptation to novel environments and improved generalization. Ultimately, the ability to transfer skills learned in simulation to the unpredictable nuances of the real world is a crucial step toward truly versatile and autonomous robotic systems.
Policies crafted solely within simulated environments often exhibit a troubling fragility when confronted with the unpredictable nuances of the physical world. This brittleness stems from an inability to account for discrepancies – often subtle – between the idealized conditions of simulation and the messy reality of sensor noise, imperfect actuators, and unforeseen environmental factors. A robot trained exclusively on simulated data may perform flawlessly in its digital training ground, but falter dramatically when tasked with the same objective in a real-world setting. The resulting policies lack the robustness and adaptability needed to generalize beyond the precise conditions they were designed for, hindering their practical application and necessitating more sophisticated approaches to bridge the sim-to-real gap.

Constructing Verisimilitude: PolaRiS, A High-Fidelity Simulation Framework
PolaRiS is a simulation framework developed to address the need for realistic environments in robot policy evaluation. Traditional simulation methods often struggle to accurately represent complex scenes, hindering the transfer of learned policies to real-world applications. PolaRiS distinguishes itself by focusing on high-fidelity rendering and physically plausible environments, allowing for more robust and reliable testing of robotic algorithms before deployment. The framework is designed to facilitate the evaluation of robot behaviors across a variety of scenarios, enabling researchers and developers to refine and optimize policies in a controlled and repeatable manner. This approach aims to bridge the reality gap commonly observed in robotic simulations, ultimately improving the performance and adaptability of robotic systems.
PolaRiS utilizes 2D Gaussian Splatting, a neural reconstruction technique, to create detailed 3D scene representations from input imagery. This method represents a scene as a collection of 2D Gaussian primitives, each defined by its position, scale, orientation, opacity, and view-dependent appearance. By learning the parameters of these Gaussians from multiple views, PolaRiS can synthesize novel views of the scene with high visual fidelity and view-dependent effects. The technique is efficient because a scene can be rendered from a relatively small number of Gaussians, enabling real-time or near-real-time rendering at a high level of detail, and it typically surpasses traditional mesh- or voxel-based reconstruction in both quality and speed.
PolaRiS establishes accurate scale and world coordinates within the simulated environment through the inclusion of ChArUco boards. These boards, containing readily detectable fiducial markers, are placed within the captured scene data. During reconstruction, the known physical dimensions of the ChArUco board serve as a ground truth scaling factor. The detected corners of the ChArUco board in the image data are then used to define the origin and orientation of the world coordinate frame, effectively aligning the simulated environment with real-world measurements. This process ensures that distances and sizes within the simulation accurately reflect the physical world, which is critical for realistic robot policy evaluation.
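The paper does not publish its calibration code, but the metric-alignment step described above can be sketched as fitting a similarity transform (scale, rotation, translation) between board corners triangulated in the unscaled reconstruction and their known physical positions, via the standard Umeyama procedure. The board dimensions and transform below are made-up example values:

```python
import numpy as np

def fit_similarity(src, dst):
    """Umeyama alignment: find scale s, rotation R, translation t
    minimizing ||s * R @ src_i + t - dst_i||^2 over N x 3 point sets."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # guard against reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = (D * np.diag(S)).sum() / var_src    # trace(diag(D) @ S) / variance
    t = mu_d - s * R @ mu_s
    return s, R, t

# Known metric positions of four board corners (an illustrative
# 0.24 m x 0.18 m board lying in the world XY plane).
board_m = np.array([[0.0, 0.0, 0.0], [0.24, 0.0, 0.0],
                    [0.24, 0.18, 0.0], [0.0, 0.18, 0.0]])
# The same corners as they might come out of an unscaled reconstruction:
# rotated 90 degrees about z, scaled by an arbitrary 3.7, and translated.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
recon = 3.7 * board_m @ Rz.T + np.array([1.0, 2.0, 0.5])

s, R, t = fit_similarity(recon, board_m)
aligned = s * recon @ R.T + t   # reconstruction mapped into the metric world frame
```

Because the board's physical size is known, the recovered scale `s` converts reconstruction units to meters, and the recovered rotation and translation pin down the world coordinate frame.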
PolaRiS achieves computational efficiency and visual realism by representing the simulated environment as a collection of oriented planar Gaussian disks. Each disk is defined by its position, orientation, scale, and opacity, and rendering accumulates the contributions of these disks, yielding a differentiable representation suitable for both rendering and optimization. This significantly reduces computational cost compared to traditional mesh-based rendering, avoiding complex triangle intersections and texture mapping. Because each disk is planar and oriented, the representation also provides well-defined surface normals, contributing to a visually plausible simulation with a reduced rendering time and memory footprint.
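To make the accumulation step concrete, here is a minimal sketch of how splatting renderers typically blend overlapping Gaussian contributions at one pixel: each splat contributes an alpha given by its opacity times a 2D Gaussian falloff, composited front to back. This is a generic illustration of the technique, not PolaRiS's renderer:

```python
import numpy as np

def gaussian_alpha(px, center, inv_cov, opacity):
    """Screen-space alpha of one splat at pixel px: opacity times the
    2D Gaussian falloff exp(-0.5 * d^T Sigma^{-1} d)."""
    d = px - center
    return opacity * np.exp(-0.5 * d @ inv_cov @ d)

def composite_pixel(alphas, colors):
    """Front-to-back alpha compositing of depth-sorted splat contributions
    (nearest first). Returns the blended RGB pixel color."""
    color = np.zeros(3)
    transmittance = 1.0
    for a, c in zip(alphas, colors):
        color += transmittance * a * c
        transmittance *= 1.0 - a
        if transmittance < 1e-4:   # early termination, common in splatting
            break
    return color

# Two overlapping splats at one pixel, nearest first (illustrative values).
px = np.array([5.0, 5.0])
inv_cov = np.linalg.inv(np.array([[4.0, 0.0], [0.0, 2.0]]))
a_near = gaussian_alpha(px, np.array([5.0, 5.0]), inv_cov, 0.8)  # at its center
a_far = gaussian_alpha(px, np.array([6.0, 5.0]), inv_cov, 0.9)
out = composite_pixel([a_near, a_far],
                      [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])])
```

The nearer red splat dominates; the farther blue one only contributes through the remaining transmittance, which is what makes occlusion fall out of the compositing order.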

Closing the Loop: Validating Real-to-Sim Transfer with Hardware-in-the-Loop Testing
PolaRiS is designed for compatibility with Hardware-in-the-Loop (HIL) testing, enabling real-time interaction between simulated environments and physical robotic systems. This integration is achieved through standardized communication interfaces, allowing policies developed and trained in simulation to be deployed and evaluated on actual robotic hardware. The HIL setup facilitates closed-loop testing where sensor data from the physical robot is fed back into the simulation, and control commands are sent from the simulation to the robot. This bidirectional data flow validates the transfer of learned policies from the virtual to the real world, identifying discrepancies and enabling iterative refinement of control algorithms before deployment on a physical platform.
Within this HIL setup, PolaRiS deploys trained policies onto a physical robot operating concurrently with the simulation, enabling real-time comparison of the policy's performance in both environments. Sensor data from the physical robot is fed back into the simulation, closing the loop so that the policy reacts to realistic conditions and physical limitations. This makes it possible to assess performance metrics such as trajectory-tracking error and task completion rate under realistic conditions, identifying discrepancies and potential failure points before actual deployment and providing a more comprehensive evaluation than simulation alone.
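The closed loop described above can be sketched as a simple cycle: read the hardware state, mirror it in the simulated observation, query the policy, and send the resulting command back to the robot. All class and method names below are hypothetical stand-ins for the real hardware and renderer interfaces:

```python
import numpy as np

class StubRobot:
    """Stand-in for the physical-robot interface (hypothetical API)."""
    def __init__(self):
        self.joints = np.zeros(7)
    def read_joints(self):
        return self.joints.copy()
    def command_joints(self, target):
        # Toy first-order dynamics: move halfway to the target each step.
        self.joints += 0.5 * (target - self.joints)

class StubSim:
    """Stand-in for the neural-reconstruction renderer (hypothetical API)."""
    def render(self, joints):
        # A real system would rasterize the splat scene from the robot state;
        # here we just echo the state back as a fake observation.
        return {"image": None, "proprio": joints}

def hil_step(robot, sim, policy):
    """One closed-loop HIL cycle: hardware state -> simulated observation
    -> policy action -> hardware command."""
    joints = robot.read_joints()      # sensor data from the physical robot
    obs = sim.render(joints)          # mirror the state in simulation
    action = policy(obs)              # policy acts on the simulated observation
    robot.command_joints(action)      # command flows back to the hardware
    return joints, action

robot, sim = StubRobot(), StubSim()
policy = lambda obs: np.full(7, 0.3)  # constant-target toy policy
for _ in range(20):
    hil_step(robot, sim, policy)
```

The bidirectional data flow is the point: discrepancies between what the policy expects from the rendered observation and what the hardware actually does surface directly in this loop.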
The DROID platform is a standardized robot manipulation setup widely used for collecting demonstrations and evaluating generalist manipulation policies, built around a Franka arm with a parallel-jaw gripper plus wrist-mounted and external cameras. Because many labs share the same hardware configuration and open-source control stack, it serves as a natural target for validating policies developed within the PolaRiS framework: its standardization facilitates rapid prototyping and iterative testing, enabling a streamlined workflow for transferring policies from simulation to a physical robot for real-world validation.
Joint Position Control (JPC) is a core capability within the PolaRiS framework, facilitating accurate and repeatable robot movements in both simulated and real-world environments. JPC operates by commanding individual joint angles of the robotic platform – such as the DROID Platform – to achieve desired end-effector positions and orientations. This is accomplished through the use of feedback loops, where the actual joint angles are measured and compared to the commanded values, with corrections applied via motor control signals. Precise JPC is essential for validating policies transferred from simulation to the physical robot, ensuring that learned behaviors translate effectively and reliably to real-world manipulation tasks. The system supports both position and velocity control modes, allowing for trade-offs between accuracy and speed based on the specific application requirements.
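The feedback loop described above, measure the joint angles, compare them to the commanded values, and apply a correction, is essentially PD control per joint. A minimal sketch under toy unit-inertia dynamics (the gains and the dynamics model are illustrative assumptions, not the DROID controller):

```python
import numpy as np

def jpc_step(q, dq, q_cmd, kp=40.0, kd=6.0, dt=0.01):
    """One PD feedback step for joint position control: correction is
    proportional to position error, damped by joint velocity."""
    tau = kp * (q_cmd - q) - kd * dq   # feedback correction (torque)
    dq = dq + tau * dt                 # toy unit-inertia joint dynamics
    q = q + dq * dt                    # semi-implicit Euler integration
    return q, dq

q = np.zeros(3)                        # measured joint angles (rad)
dq = np.zeros(3)                       # joint velocities (rad/s)
q_cmd = np.array([0.5, -0.2, 1.0])     # commanded joint positions (rad)
for _ in range(2000):                  # 20 s of simulated control at 100 Hz
    q, dq = jpc_step(q, dq, q_cmd)
```

Raising `kp` tightens tracking at the cost of oscillation, while `kd` damps it; the position/velocity trade-off mentioned above is exactly this tuning.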

Refining the Model: Co-Training and Improving Correlation with Real-World Performance
To bridge the gap between simulated training and real-world application, the PolaRiS framework utilizes a co-training procedure that incorporates a limited set of demonstrations gathered from actual robotic execution. This refinement stage allows the initially simulated policy to adapt to the nuances of physical reality, addressing discrepancies arising from imperfect simulation. By learning from these real-world examples, the policy adjusts its parameters to better align with observed behaviors, enhancing its robustness and overall performance in unpredictable environments. This targeted learning approach proves particularly effective when dealing with the complexities of physical interaction, where even subtle differences between simulation and reality can significantly impact performance.
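A common way to implement such co-training is to draw each batch mostly from the large simulated dataset while reserving a fixed fraction for the small pool of real demonstrations. The mixing ratio and batch shape below are illustrative assumptions, not the paper's recipe:

```python
import random

def cotrain_batches(sim_data, real_data, real_fraction=0.25,
                    batch_size=8, steps=4):
    """Yield mixed batches: mostly simulation data, with a fixed
    fraction of real-world demonstrations (sampled with replacement,
    since the real pool is small)."""
    n_real = max(1, int(round(batch_size * real_fraction)))
    for _ in range(steps):
        batch = (random.choices(real_data, k=n_real) +
                 random.choices(sim_data, k=batch_size - n_real))
        random.shuffle(batch)
        yield batch

sim_data = [("sim", i) for i in range(1000)]   # plentiful simulated episodes
real_data = [("real", i) for i in range(20)]   # only a handful of real demos
batches = list(cotrain_batches(sim_data, real_data))
```

Oversampling the real demonstrations keeps their gradient signal from being drowned out by the much larger simulated set, which is the usual rationale for a fixed per-batch fraction.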
The refinement of the initial policy, facilitated by co-training, is crucial for translating simulated success into real-world applicability. This process doesn’t simply optimize for a single environment; it actively bolsters the policy’s ability to generalize. By exposing the policy to a limited set of real-world demonstrations, the framework identifies and corrects discrepancies between the simulation and reality, effectively reducing the ‘sim-to-real’ gap. Consequently, the policy becomes more resilient to unforeseen variations in lighting, texture, and object positioning, enhancing its robustness and ensuring reliable performance even when faced with conditions not explicitly encountered during training. This iterative fine-tuning results in a policy capable of adapting and maintaining consistent behavior across diverse and unpredictable environments.
The foundation of PolaRiS’s policy learning lies in the generation of comprehensive simulation data. This data, created entirely within the PolaRiS framework, serves as the primary input for initial policy training, effectively establishing a robust baseline before any real-world interaction. By training policies extensively in simulation, the system develops a strong understanding of the task dynamics and potential scenarios. This approach significantly reduces the need for costly and time-consuming real-world data collection, and allows for rapid iteration and experimentation with different policy configurations. The fidelity of this simulated environment is crucial, enabling the learned policies to transfer effectively to the complexities of real-world application, ultimately boosting performance and adaptability.
The PolaRiS framework demonstrates a remarkable ability to generalize learned policies to new, real-world scenarios, as evidenced by a strong correlation between simulation and reality. Specifically, the system achieves a Pearson correlation coefficient of up to 0.98 when comparing policy performance in simulated environments to its execution in the real world. This high degree of alignment extends to established benchmarks; PolaRiS also exhibits a Pearson correlation coefficient of 0.9 with the RoboArena benchmark, signifying its capacity for effective zero-shot evaluation – the ability to perform well in environments it has never directly experienced. These results highlight the framework’s potential for developing robotic systems that can adapt and function reliably without extensive retraining in each new setting.
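A correlation figure like the 0.98 above is computed by evaluating each policy in both settings and correlating the paired success rates. The sketch below shows the calculation with made-up numbers (the success rates are not from the paper):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up per-policy success rates: simulated vs. real-world evaluation.
sim_success  = [0.10, 0.35, 0.50, 0.62, 0.80]
real_success = [0.15, 0.30, 0.55, 0.60, 0.85]
r = pearson_r(sim_success, real_success)
```

An `r` near 1.0 means the simulated benchmark preserves the ranking of policies, which is what makes it useful as a stand-in for costly real-world evaluation.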

The pursuit of robust generalist robot policies, as detailed in PolaRiS, necessitates a relentless questioning of simulated environments. One must dismantle assumptions about fidelity and actively seek discrepancies between the virtual and the real. Grace Hopper aptly stated, “It’s easier to ask forgiveness than it is to get permission.” This ethos perfectly encapsulates the PolaRiS approach; the system doesn’t simply accept simulation as a proxy for reality but actively constructs higher-fidelity environments through real-world data co-training. By prioritizing reconstruction and iterative refinement, PolaRiS embodies a willingness to ‘break’ the limitations of traditional simulation, ultimately pushing the boundaries of real-to-sim transfer and policy generalization.
Beyond the Mirror: Future Exploits in Simulation
PolaRiS establishes a compelling link between simulated and real-world robotic performance, but correlation isn't causation; it's merely a useful constraint. The system meticulously reconstructs reality, yet every exploit starts with a question, not with intent. Future work shouldn't focus solely on refining fidelity, but on actively violating the assumptions baked into these reconstructions. What happens when the simulator is deliberately fed adversarial data, or when it's asked to model physics beyond its design parameters? The true test of a generalist policy isn't its success in a polished simulation, but its graceful failure when that simulation breaks down.
Current evaluations still rely on pre-defined tasks and environments. A more rigorous approach would involve open-ended exploration, allowing policies to define their own challenges within the simulated world. This shifts the focus from validating performance to revealing the limitations of both the policy and the simulation itself. The current paradigm builds increasingly complex mirrors; the next step requires shattering them to see what remains.
Ultimately, the value of systems like PolaRiS lies not in their ability to predict real-world success, but in their capacity to highlight the inherent unpredictability of complex systems. A perfect simulation is a sterile one; it is in the imperfections, the unexpected interactions, that true intelligence, and truly robust policies, will emerge.
Original article: https://arxiv.org/pdf/2512.16881.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2025-12-20 23:58