Robots Find Their Bearings: A Learned Approach to Multi-Robot Localization

Author: Denis Avetisyan

Researchers have developed a new system that allows teams of robots to accurately determine their positions relative to each other, even with limited visibility and noisy sensor data.

The multi-robot localization network uses a graph neural network to associate prior and detected bearings, predicting relative positions together with their uncertainty (represented as covariances). A differentiable, parallel graph optimization then refines these estimates by jointly correcting errors, and gradients propagate back through it to train the network.

Mr. Virgil leverages graph neural networks and differentiable optimization for robust data association and state estimation in multi-robot systems utilizing visual-inertial and UWB ranging data.

Achieving robust multi-robot localization remains challenging due to difficulties in reliably associating detections across agents. This paper introduces ‘Mr. Virgil: Learning Multi-robot Visual-range Relative Localization’, a novel end-to-end framework that learns to associate ultra-wideband ranging and visual detections using graph neural networks. By integrating robust data association with differentiable pose graph optimization, Mr. Virgil delivers accurate and stable pose estimation, even with occlusion and limited observations. Could this learned approach unlock more scalable and resilient multi-robot systems for complex, real-world deployments?

The Challenge of Scalable Multi-Robot Localization

For teams of robots to collaborate on tasks, whether in warehouses, disaster response, or exploration, each robot must know its position relative to the others. This relative localization becomes harder as teams grow and environments become more complex. Many existing approaches rely on a central processor that gathers data from every robot, which creates bottlenecks and a single point of failure and limits scalability and responsiveness in large deployments. Others use simplified models of robot motion and sensing that neglect sensor noise, dynamic obstacles, and non-linear kinematics, so their estimates drift or become inconsistent in dynamic environments.

The core difficulty is the combinatorial growth of possible relationships as the number of robots increases: deciding which detection belongs to which robot, and how far away that robot is, becomes rapidly harder. Association errors accumulate into localization drift and ultimately undermine the team's ability to coordinate and reach shared objectives. Distributed, robust, and ambiguity-resistant relative localization therefore remains a central hurdle for practical multi-robot systems.
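To make the combinatorial growth concrete, the short snippet below counts candidate associations between detections and robot identities. It is a back-of-the-envelope illustration, not part of Mr. Virgil. A full one-to-one matching of n detections to n robots has n! candidates. If any detection may be spurious and any robot may be occluded (partial assignment), the count is larger still, because k matched pairs can be chosen in C(n_det, k) · C(n_rob, k) · k! ways. Both counts grow faster than exponentially. The function names are illustrative.

```python
from math import comb, factorial

def full_assignments(n: int) -> int:
    """One-to-one matchings of n detections to n robots: n!."""
    return factorial(n)

def partial_assignments(n_det: int, n_rob: int) -> int:
    """Matchings in which a detection may stay unmatched (false positive)
    and a robot may stay undetected (occlusion).
    Choosing k matched pairs: C(n_det, k) * C(n_rob, k) * k! ways."""
    return sum(comb(n_det, k) * comb(n_rob, k) * factorial(k)
               for k in range(min(n_det, n_rob) + 1))

for n in (3, 5, 10):
    print(n, full_assignments(n), partial_assignments(n, n))
```

This is why exhaustive search over associations does not scale, and why the approaches discussed below replace it with optimization.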

Increasing the number of robots in the simulated random forest environment heightens the likelihood of occlusions, as demonstrated by the isolated robot (yellow ellipse) within a group of mutually observable robots (green ellipses).

Mr. Virgil: A Decentralized Vision for Team Localization

Mr. Virgil is a complete multi-robot localization system designed to operate without a central coordinator. Each robot estimates its pose relative to the teammates within its visual sensing range, using its own observations of them. This decentralized design improves robustness and scalability compared with centralized localization. Localization combines inter-robot observations (visual detections and UWB range measurements between robots) with subsequent pose graph optimization. Because these measurements are taken between the robots themselves, the system needs no external infrastructure such as fixed UWB anchors or motion capture systems.
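A UWB range combined with a camera bearing is enough to place a neighbour relative to the observer. The sketch below shows that conversion and a first-order covariance for the result. The covariance is analogous to the uncertainty the paper's network outputs, but here it is computed analytically rather than learned. The sketch makes several assumptions that are not taken from the paper: a planar (2D) setting, a bearing already expressed in the observer's body frame, and independent Gaussian noise on range and bearing. The function name is hypothetical.

```python
import numpy as np

def range_bearing_to_relative(r, theta, sigma_r, sigma_theta):
    """Map one range (m) and bearing (rad), both in the observer's body
    frame, to a 2D relative position and its first-order covariance."""
    c, s = np.cos(theta), np.sin(theta)
    p = np.array([r * c, r * s])
    # Jacobian of (x, y) = (r cos(theta), r sin(theta)) w.r.t. (r, theta)
    J = np.array([[c, -r * s],
                  [s,  r * c]])
    R = np.diag([sigma_r ** 2, sigma_theta ** 2])   # independent noise (assumed)
    return p, J @ R @ J.T

# Example: a neighbour 4 m away at 30 degrees, 5 cm range noise, 1 degree bearing noise
p, cov = range_bearing_to_relative(4.0, np.deg2rad(30.0), 0.05, np.deg2rad(1.0))
```

The bearing term contributes roughly r·σθ of cross-range uncertainty. At 4 m, a 1° bearing error (about 7 cm) already outweighs the 5 cm range error, so distant neighbours get elongated uncertainty ellipses.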

Data association, the process of matching each observation to the landmark or robot that produced it, is critical for multi-robot localization. Mr. Virgil handles partial assignment, where some observations have no valid match and some robots go undetected, with a strategy based on the Sinkhorn algorithm. Sinkhorn solves an entropy-regularized version of the assignment problem. It iteratively rescales the rows and columns of an exponentiated score matrix until their sums match the expected number of associations per detection and per robot. The result is a soft assignment matrix that approximates the minimum-cost assignment. Framing data association as this optimization avoids expensive combinatorial search and supports real-time performance even with many robots and observations. Because the soft assignment spreads probability across plausible matches, it is also more robust to noisy sensor data and ambiguous observations.
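The following is a minimal log-domain Sinkhorn sketch for partial assignment. It assumes the common "dustbin" construction popularized by SuperGlue: an extra row and column collect detections that match no robot and robots that were not detected. Whether Mr. Virgil handles partial assignment exactly this way is an assumption. The function name and the parameters `dustbin_score`, `eps`, and `n_iters` are hypothetical, and scores are similarities (higher means a better match).

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def sinkhorn_with_dustbin(scores, dustbin_score=0.0, eps=0.1, n_iters=100):
    """Soft assignment between M detections (rows) and N robots (columns).
    An extra dustbin row/column absorbs unmatched detections and undetected
    robots, so the assignment may be partial. Runs in the log domain."""
    M, N = scores.shape
    S = np.full((M + 1, N + 1), dustbin_score, dtype=float)
    S[:M, :N] = scores
    # Each real detection/robot carries unit mass; each dustbin can absorb
    # all of the other side's mass, so both marginals sum to M + N.
    log_a = np.log(np.append(np.ones(M), N))
    log_b = np.log(np.append(np.ones(N), M))
    K = S / eps                                  # log of the kernel exp(S / eps)
    u, v = np.zeros(M + 1), np.zeros(N + 1)
    for _ in range(n_iters):
        u = log_a - logsumexp(K + v[None, :], axis=1)   # rescale rows
        v = log_b - logsumexp(K + u[:, None], axis=0)   # rescale columns
    return np.exp(K + u[:, None] + v[None, :])          # (M+1) x (N+1) plan

# Toy example: 3 detections, 2 robots; detection 2 fits neither robot.
scores = np.array([[ 1.0, -1.0],
                   [-1.0,  1.0],
                   [-1.0, -1.0]])
P = sinkhorn_with_dustbin(scores, dustbin_score=0.0, eps=0.1)
best = P[:-1].argmax(axis=1)                            # best column, incl. dustbin
matches = np.where(best < scores.shape[1], best, -1)    # -1 means "unmatched"
```

Here detections 0 and 1 are assigned to robots 0 and 1, while detection 2 lands in the dustbin and is left unmatched. Smaller `eps` gives sharper (closer to hard) assignments at the cost of slower convergence.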

The front-end network is implemented in LibTorch, the C++ API of PyTorch. This lets the trained network run directly inside the C++ system without the overhead of a Python interpreter, which matters for real-time localization. LibTorch supports the parallel computation used for the network and the graph optimization on commodity hardware, so the system can process visual data and produce localization estimates fast enough for dynamic robotic applications. Using LibTorch also simplifies deployment across platforms and integration with existing C++ robotics frameworks.

Refining Localization Through Optimized Data Association

Mr. Virgil's data association starts with Bearing Matching. For each detection, the bearing (the direction angle at which the detected robot appears) is compared with the bearing predicted for each robot from the current pose estimates. The algorithm computes this bearing error for every detection-robot pair and accepts a pair as an initial correspondence when the error falls below a threshold. This gives a computationally cheap initial guess for later refinement and favors detections that deviate least from their expected directions. It also shrinks the search space for pose estimation.
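The thresholded bearing comparison can be sketched as follows. The sketch makes assumptions that the summary does not state: a 2D setting, an observer pose taken from prior estimates, angle differences wrapped to [-π, π], and a greedy smallest-error-first matching that keeps each detection and each robot at most once. The function names and the threshold value are hypothetical.

```python
import numpy as np

def wrap_angle(a):
    """Wrap angles to [-pi, pi]."""
    return np.arctan2(np.sin(a), np.cos(a))

def predicted_bearings(observer_xy, observer_yaw, robots_xy):
    """Bearing of each robot's prior position, in the observer's body frame."""
    d = robots_xy - observer_xy
    return wrap_angle(np.arctan2(d[:, 1], d[:, 0]) - observer_yaw)

def bearing_match(detected, predicted, threshold):
    """Greedy gated matching on absolute bearing error.
    Returns (detection_index, robot_index) pairs."""
    err = np.abs(wrap_angle(detected[:, None] - predicted[None, :]))  # (D, R)
    pairs, used_d, used_r = [], set(), set()
    for flat in np.argsort(err, axis=None):          # smallest error first
        i, j = (int(k) for k in np.unravel_index(flat, err.shape))
        if err[i, j] > threshold:
            break                                    # all remaining errors are larger
        if i in used_d or j in used_r:
            continue
        pairs.append((i, j))
        used_d.add(i)
        used_r.add(j)
    return pairs

observer_xy, observer_yaw = np.array([0.0, 0.0]), 0.0
robots_xy = np.array([[2.0, 0.0], [0.0, 3.0], [-2.0, -2.0]])   # prior estimates
detected = np.array([np.pi / 2 + 0.02, 0.01, 1.0])              # measured bearings
pred = predicted_bearings(observer_xy, observer_yaw, robots_xy)
pairs = bearing_match(detected, pred, threshold=np.deg2rad(5.0))
```

Detections 0 and 1 are matched to robots 1 and 0. The third detection is rejected as spurious, and robot 2 stays unmatched, as it would if it were occluded.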

The Sinkhorn algorithm then refines the initial associations from Bearing Matching. It solves an entropy-regularized optimal transport problem over a cost matrix derived from the bearing errors, iteratively rescaling the association weights so that each detection and each robot receives its expected total weight. These marginal constraints enforce global consistency and discourage one robot from absorbing several detections. The entropy term makes the problem smooth and fast to solve, and its strength controls how sharply each detection's weight concentrates on a single robot pose. The result is a more accurate and reliable pose graph.
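For reference, the standard entropy-regularized formulation that Sinkhorn solves is given below. The notation is generic rather than the paper's: C is the cost matrix (for example, built from bearing errors), a and b are the expected association masses per detection and per robot, ε is the regularization weight, and ⊘ denotes element-wise division.

```latex
\begin{aligned}
P^\star &= \arg\min_{P \ge 0}\; \sum_{i,j} C_{ij} P_{ij} \;-\; \varepsilon\, H(P),
\qquad H(P) = -\sum_{i,j} P_{ij}\left(\log P_{ij} - 1\right),\\
&\quad \text{subject to}\quad P\mathbf{1} = \mathbf{a}, \qquad P^{\top}\mathbf{1} = \mathbf{b}.\\[4pt]
P^\star &= \operatorname{diag}(\mathbf{u})\, \mathbf{K}\, \operatorname{diag}(\mathbf{v}),
\qquad K_{ij} = \exp\!\left(-C_{ij}/\varepsilon\right),\\
\mathbf{u} &\leftarrow \mathbf{a} \oslash (\mathbf{K}\mathbf{v}), \qquad
\mathbf{v} \leftarrow \mathbf{b} \oslash (\mathbf{K}^{\top}\mathbf{u}).
\end{aligned}
```

The constraints on P1 and Pᵀ1 supply the global consistency, while ε sets the sharpness. A larger ε spreads each detection's weight over several robots. As ε → 0, the solution approaches a minimum-cost plan, which is a hard one-to-one assignment when that plan is unique and all marginals equal one.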

The Ceres Solver then performs non-linear least-squares optimization to refine the relative poses produced by data association. It minimizes the reprojection error between the observed detections and the locations predicted from the robots' trajectories. In simulation, this yields a root-mean-square error (RMSE) of 3.9 cm in the estimated relative poses. RMSE is the square root of the mean squared distance between estimated and true robot positions, so it summarizes the typical position error after optimization and serves as a metric for overall system performance.
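Ceres is a C++ library, so the sketch below substitutes a hand-rolled Gauss-Newton loop in NumPy to illustrate the same kind of non-linear least-squares refinement, followed by the RMSE computation. It makes several simplifying assumptions that differ from the paper. It is 2D, and robot orientations are taken as known (for example, from visual-inertial odometry) so that bearings share one frame. It uses range and bearing residuals in place of the reprojection error described above, and robot 0 is pinned at the origin to fix the gauge. All names and the toy numbers are hypothetical, and the resulting RMSE is unrelated to the paper's 3.9 cm.

```python
import numpy as np

def wrap(a):
    """Wrap an angle to [-pi, pi]."""
    return np.arctan2(np.sin(a), np.cos(a))

def residuals(x, meas, sigma_r, sigma_b):
    """Whitened range and bearing residuals. x stacks (x, y) of robots
    1..N-1; robot 0 is pinned at the origin to fix the gauge."""
    P = np.vstack([[0.0, 0.0], x.reshape(-1, 2)])
    res = []
    for i, j, r, b in meas:
        d = P[j] - P[i]
        res.append((np.hypot(d[0], d[1]) - r) / sigma_r)
        res.append(wrap(np.arctan2(d[1], d[0]) - b) / sigma_b)
    return np.array(res)

def gauss_newton(x0, meas, sigma_r, sigma_b, iters=30, h=1e-6):
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        r0 = residuals(x, meas, sigma_r, sigma_b)
        J = np.empty((r0.size, x.size))
        for k in range(x.size):                          # forward-difference Jacobian
            dx = np.zeros_like(x)
            dx[k] = h
            J[:, k] = (residuals(x + dx, meas, sigma_r, sigma_b) - r0) / h
        step = np.linalg.lstsq(J, -r0, rcond=None)[0]    # Gauss-Newton step
        x += step
        if np.linalg.norm(step) < 1e-10:
            break
    return x

def rmse(est, truth):
    """Square root of the mean squared Euclidean position error."""
    return float(np.sqrt(np.mean(np.sum((est - truth) ** 2, axis=1))))

rng = np.random.default_rng(0)
truth = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0], [-2.0, 2.5]])
pairs = [(0, 1), (0, 2), (1, 2), (2, 3), (0, 3)]
sigma_r, sigma_b = 0.05, np.deg2rad(1.0)
meas = []
for i, j in pairs:
    d = truth[j] - truth[i]
    meas.append((i, j,
                 np.hypot(d[0], d[1]) + rng.normal(0.0, sigma_r),
                 np.arctan2(d[1], d[0]) + rng.normal(0.0, sigma_b)))
x0 = (truth[1:] + rng.normal(0.0, 0.3, size=truth[1:].shape)).ravel()  # rough guess
est = gauss_newton(x0, meas, sigma_r, sigma_b).reshape(-1, 2)
print(f"RMSE: {100 * rmse(est, truth[1:]):.1f} cm")
```

Dividing each residual by its noise level weights range and bearing errors consistently. This is the same role Ceres loss and cost functions play in a production back end.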

Expanding the Horizon: Versatility in Sensor Integration

Mr. Virgil's architecture is designed to be sensor agnostic. It is not tied to one localization technology and can incorporate data from several sources:

ultra-wideband (UWB) systems, which provide precise ranging;
visual fiducial markers such as AprilTag, which offer robust pose estimation even with limited visibility;
infrared (IR) LED-based tracking such as CREPES, which enables localization in environments where other technologies struggle.

Combining these sources lets the system draw on the strengths of each rather than being limited by any single sensor. Because localization data is treated as an abstract input, new or emerging sensing modalities can be integrated with little change, which broadens the system's operational scope in complex environments.
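Treating localization data as an abstract input can be pictured as a small adapter layer. In this layer, every sensor emits the same measurement record, and the back end never needs to know which sensor produced it. The sketch below is purely illustrative, and all class and field names are hypothetical rather than Mr. Virgil's API. It assumes that each robot carries one fiducial tag whose ID equals the robot's index. The anonymous-bearing adapter stands for any detector that sees a robot without knowing which one; it is not meant to describe CREPES specifically.

```python
import math
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class RelativeMeasurement:
    observer: int
    target: Optional[int]          # None when the sensor gives no identity
    range_m: Optional[float]       # None when the sensor gives no range
    bearing_rad: Optional[float]   # None when the sensor gives no bearing

class RelativeSensor(Protocol):
    def measurements(self) -> list[RelativeMeasurement]: ...

class UwbRanging:
    """Peer-to-peer UWB: identified ranges, no bearing."""
    def __init__(self, ranges: dict[tuple[int, int], float]):
        self.ranges = ranges                      # {(observer, target): metres}
    def measurements(self) -> list[RelativeMeasurement]:
        return [RelativeMeasurement(o, t, r, None) for (o, t), r in self.ranges.items()]

class TagDetector:
    """Fiducial tags: identified position (x, y) in the camera frame,
    from which range and bearing follow. Assumes tag_id == robot index."""
    def __init__(self, detections: list[tuple[int, int, float, float]]):
        self.detections = detections              # (observer, tag_id, x, y)
    def measurements(self) -> list[RelativeMeasurement]:
        return [RelativeMeasurement(o, tid, math.hypot(x, y), math.atan2(y, x))
                for o, tid, x, y in self.detections]

class AnonymousBearings:
    """A detector that sees *a* robot but not *which* one."""
    def __init__(self, bearings: list[tuple[int, float]]):
        self.bearings = bearings                  # (observer, bearing_rad)
    def measurements(self) -> list[RelativeMeasurement]:
        return [RelativeMeasurement(o, None, None, b) for o, b in self.bearings]

def collect(sensors: list[RelativeSensor]) -> list[RelativeMeasurement]:
    """Pool measurements from every sensor into one list for the back end."""
    return [m for s in sensors for m in s.measurements()]

meas = collect([UwbRanging({(0, 1): 2.5}),
                TagDetector([(0, 2, 3.0, 4.0)]),
                AnonymousBearings([(0, 0.3)])])
needs_association = [m for m in meas if m.target is None]   # goes to data association
```

Only measurements without an identity need the data-association stage, which is where a learned matcher earns its keep.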

Evaluations in simulated environments show that Mr. Virgil achieves higher localization accuracy and robustness than the baseline methods PVO and Simple Match. In particular, it maintains precise positioning under significant sensor noise and partial occlusion, conditions that frequently challenge robotic navigation in real-world settings. This advantage is attributed to the system's data fusion, which combines information from multiple sensor modalities and can reduce the impact of individual sensor failures or inaccuracies. These simulation results suggest that Mr. Virgil is a reliable localization solution for robots operating in complex, unpredictable environments.

Mr. Virgil takes a holistic approach to multi-robot localization. Its focus is not merely fusing sensor data but associating that data consistently across multiple agents. This mirrors a principle of complex systems: changing one component, in this case data association, affects the entire architecture. As Blaise Pascal observed, "I hold it equally impossible to know the parts without knowing the whole, and to know the whole without knowing the parts in detail." Mr. Virgil's end-to-end learning, particularly its use of graph neural networks, reflects this idea. By reasoning over the relationships between robots and observations rather than treating sensor readings in isolation, the system remains robust even with limited or occluded data. The differentiable optimization reinforces this by treating the entire localization problem as one interconnected unit.

What Lies Ahead?

The presentation of Mr. Virgil, while a step toward more resilient multi-robot localization, implicitly highlights a persistent question: what are systems of this type actually optimizing for? Accurate trajectories are often the stated goal, but true coordination demands a richer understanding of environmental uncertainty – not merely a precise map, but a probabilistic assessment of knowable unknowns. The system’s reliance on visual-inertial fusion and UWB ranging, though effective, exposes the enduring challenge of sensor integration; a robust architecture must account for inevitable failures, not simply mask them with clever filtering.

Future work should resist the temptation toward ever-increasing complexity. Simplicity is not minimalism; it is the discipline of distinguishing the essential from the accidental. Data association, even with graph neural networks, remains a brittle point. Perhaps a shift from explicitly identifying correspondences to learning a representation that implicitly encodes relationships would prove more fruitful. The current paradigm excels at refining existing estimates; less attention is paid to detecting fundamental localization failures, a critical oversight in genuinely autonomous systems.

Ultimately, the true measure of progress will not be incremental improvements in accuracy, but a fundamental rethinking of how robots perceive and interact with their surroundings. A system that can gracefully degrade in the face of ambiguity, and prioritize informed exploration over perfect reconstruction, will be far more valuable than one that strives for unattainable precision. The elegance lies not in solving the problem completely, but in understanding its inherent limitations.

Original article: https://arxiv.org/pdf/2512.10540.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
