Coordinated Action: Agents Learn to Work Together Through Relative Positioning

Author: Denis Avetisyan


A new method allows multi-agent systems to discover coordinated behaviors by focusing on how agents perceive each other’s movements and positions within a shared environment.

The system discerned the first five significant movement patterns from raw joint data by combining individual-agent analysis with two novel methods focused on relative positioning (a process exemplified by conditioning on states at coordinates (1,4) and (1,7)), revealing how complex team behaviors emerge from fundamental, interconnected movements.

This paper introduces a novel approach to multi-agent reinforcement learning that uses inter-agent relative state abstraction and a ‘Fermat state’ representation to enable the discovery of joint options for improved performance in decentralized, partially observable environments.

While temporally extended actions enhance single-agent planning, coordinating behaviors in multi-agent systems is hampered by the exponential growth of the joint state space. This challenge is addressed in ‘Discovering Coordinated Joint Options via Inter-Agent Relative Dynamics’ by introducing a novel approach to multi-agent option discovery based on an abstraction of inter-agent relative states. The core contribution is a new state representation, built around a ‘Fermat state’ that quantifies team alignment, which enables the learning of strongly coordinated joint options via a neural graph Laplacian estimator. Could this method unlock more robust and scalable coordination in complex, decentralized environments requiring nuanced teamwork?


The Inevitable Expansion of Complexity

Conventional methods in multi-agent systems often falter when faced with increasing complexity due to the inherent growth of the joint state space. As each agent adds its own state variables and potential actions, the total number of possible system states expands exponentially, quickly exceeding computational resources. This phenomenon, a core limitation in areas like robotics, game theory, and distributed artificial intelligence, creates a significant bottleneck for scalability and real-time responsiveness. Consequently, coordinating even a modest number of agents becomes prohibitively difficult, hindering the development of truly intelligent and adaptive multi-agent systems capable of operating in dynamic, real-world environments. The sheer size of this state space necessitates the exploration of novel techniques for representation and reasoning, moving beyond exhaustive search or overly simplified models.
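
The scale of the problem is easy to make concrete. The minimal Python sketch below (an illustration, not code from the paper) counts joint configurations when each of n agents occupies one cell of a 15×15 grid, the environment used in the roll-outs shown later:

```python
# Minimal illustration of joint state space growth: n agents, each with
# |S| individual states, give |S|**n joint configurations.

def joint_state_count(per_agent_states: int, n_agents: int) -> int:
    """Size of the joint state space when agents' states are combined."""
    return per_agent_states ** n_agents

# A 15x15 grid gives each agent 225 positions.
for n in range(1, 5):
    print(n, "agent(s):", joint_state_count(225, n), "joint states")
# -> 225, 50625, 11390625, 2562890625 joint states for 1..4 agents
```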

Achieving seamless coordination among multiple agents demands more than simply reacting to immediate observations; it necessitates a complex process of inferring the intentions of others and predicting their subsequent actions. This ‘theory of mind’ – the ability to model the beliefs and desires of other agents – becomes acutely challenging in real-world scenarios where complete information is rarely available. Partially-observable environments introduce uncertainty, forcing agents to maintain probabilistic beliefs about the hidden states of the world and the internal states of their peers. Consequently, the computational burden of reasoning about these nested uncertainties grows exponentially with the number of agents and the depth of future predictions, creating a significant bottleneck for scalable and efficient multi-agent systems. Overcoming this computational hurdle is crucial for deploying coordinated agents in dynamic and unpredictable settings.

Many current multi-agent systems operate under the unrealistic premise of complete information, where each agent possesses a full understanding of the environment and the actions of others. Alternatively, these systems employ significant simplifications – such as assuming limited agent capabilities or predictable behaviors – to make the computational problem tractable. However, this reliance on full observability or simplified assumptions severely restricts their effectiveness in real-world applications characterized by partial information, dynamic environments, and unpredictable agent actions. The inability to account for hidden states, imperfect sensing, or the complex interplay of strategic agents ultimately limits the scalability and robustness of these approaches when confronted with the intricacies of genuine multi-agent scenarios.

The escalating complexity of multi-agent systems demands a shift from conventional approaches to representing and reasoning about interactions. Current methods, often predicated on complete information or overly simplistic models, falter when faced with the nuances of real-world scenarios – partial observability, dynamic environments, and the inherent uncertainty of other agents’ actions. A novel paradigm must prioritize scalable representations of joint states and intentions, potentially leveraging techniques like factored representations, belief propagation, or approximate inference to navigate the exponential growth of the state space. This emerging framework envisions agents not merely reacting to immediate stimuli, but proactively modeling each other’s reasoning processes and anticipating future collaborative states, ultimately enabling more robust and efficient coordination in complex, unpredictable environments.

The agent discovers options using inter-agent relative representations derived from a state factorization function $g$, a Fermat encoder $\varphi$, a state distance encoder $d_{\theta}$, and a graph Laplacian eigenvector approximator $\mu$.

Distilling Essence: The Geometry of Coordination

Inter-Agent Relative State Abstraction is a technique for representing the combined state of multiple agents within a reduced-dimensional, latent space. This method moves beyond absolute state definitions by focusing on relationships between agents, rather than their individual positions or attributes. The core principle is to define a compact representation centered on states where agents exhibit maximal alignment – that is, states where their relative positions and intentions are most coordinated. This approach aims to distill the essential information needed for multi-agent reasoning, effectively reducing the complexity of the overall state space while preserving critical inter-agent dynamics. The resulting abstraction facilitates more efficient planning and decision-making in complex multi-agent systems by focusing on the relative state rather than the absolute state of each agent.
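
As a rough intuition for what such an abstraction discards, the hypothetical sketch below describes each agent only by its offsets to the others, dropping the absolute frame entirely; the paper's factorization function $g$ and Fermat-centered encoding are considerably richer than this.

```python
import numpy as np

# Hypothetical sketch: one simple way to build an inter-agent *relative*
# representation is to describe each agent by its offsets to the others,
# discarding the absolute frame.

def relative_representation(positions: np.ndarray) -> np.ndarray:
    """positions: (n_agents, dim) absolute coordinates.
    Returns (n_agents, n_agents, dim) pairwise offsets p_j - p_i."""
    return positions[None, :, :] - positions[:, None, :]

joint_state = np.array([[1.0, 4.0], [1.0, 7.0]])   # two agents on a grid
print(relative_representation(joint_state)[0, 1])   # offset from agent 0 to 1 -> [0. 3.]
```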

The Inter-Agent Relative State Abstraction utilizes a ‘Fermat State’ as the central reference point within the latent space representing the multi-agent system’s state. This Fermat State is defined as the state minimizing the sum of weighted distances to all other agents; mathematically, given agent positions $x_i$, the Fermat State is $x^{*} = \arg\min_{x} \sum_{i} w_{i} \lVert x - x_{i} \rVert$, where the $w_i$ are weighting coefficients. By centering the state representation around this point, the abstraction effectively normalizes relative positions and, crucially, provides a meaningful baseline for inferring intentions based on deviations from the Fermat State, simplifying subsequent planning and decision-making processes.
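
For readers who want the underlying geometry, the sketch below approximates the weighted Fermat point with Weiszfeld's iteration, a standard method for the geometric median; the paper estimates the related quantities with learned encoders rather than solving this optimization directly.

```python
import numpy as np

# Weighted Fermat point (geometric median): the point minimizing the weighted
# sum of distances to all agents, approximated with Weiszfeld's iteration.

def fermat_point(points: np.ndarray, weights=None, iters: int = 100) -> np.ndarray:
    """points: (n, dim); returns argmin_x sum_i w_i * ||x - points_i||."""
    n = len(points)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    x = points.mean(axis=0)                       # start at the centroid
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        d = np.maximum(d, 1e-9)                   # avoid division by zero
        coef = w / d
        x = (coef[:, None] * points).sum(axis=0) / coef.sum()
    return x

agents = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
print(fermat_point(agents))   # roughly [0.85, 0.85] for equal weights
```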

State similarity estimation is achieved through the combination of Successor Distances and a Temporal Distance Encoder. Successor Distances, calculated as the expected number of steps to reach a terminal state from a given state under a specific policy, provide a metric for state proximity based on future trajectories. The Temporal Distance Encoder then processes these Successor Distances, along with observed state transitions, to capture the temporal relationships between states. This encoding allows the system to differentiate between states that may have similar immediate features but lead to different long-term outcomes, effectively modeling the dynamics of the environment and agent interactions. The resulting representation facilitates efficient comparison of states based on both their current configuration and anticipated future evolution.
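
A crude stand-in for what the temporal distance encoder is asked to approximate is a Monte Carlo estimate of first-hit times between states under the behavior policy; the sketch below (toy states and random-walk data, not the paper's setup) computes exactly that table.

```python
from collections import defaultdict
import random

# From rolled-out trajectories, estimate the expected number of steps to first
# reach state t from state s. A learned distance encoder d_theta would be
# trained to generalize such quantities to unseen state pairs.

def successor_distance_table(trajectories):
    """trajectories: list of state sequences. Returns {(s, t): mean first-hit steps}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for traj in trajectories:
        for i, s in enumerate(traj):
            seen = set()
            for j in range(i + 1, len(traj)):
                t = traj[j]
                if t not in seen:            # first visit of t after time i
                    seen.add(t)
                    sums[(s, t)] += j - i
                    counts[(s, t)] += 1
    return {k: sums[k] / counts[k] for k in sums}

random.seed(0)
trajs = []                                   # toy random walks on states 0..4
for _ in range(200):
    s, traj = 0, [0]
    for _ in range(20):
        s = max(0, min(4, s + random.choice([-1, 1])))
        traj.append(s)
    trajs.append(traj)

dist = successor_distance_table(trajs)
print(round(dist[(0, 2)], 2))                # expected first-hit time from 0 to 2
```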

Dimensionality reduction via Inter-Agent Relative State Abstraction facilitates more efficient planning and decision-making by representing the multi-agent state in a lower-dimensional latent space. This compact representation decreases computational complexity in both state estimation and action selection processes. Reduced dimensionality directly translates to fewer variables to consider during planning, enabling faster search algorithms and reduced memory requirements. Furthermore, a lower-dimensional state space allows for more effective generalization to unseen scenarios and improved scalability to larger multi-agent systems, as the ‘curse of dimensionality’ is mitigated.

Policy roll-outs in a 15×15 grid environment with four agents demonstrate action choices (arrows), final states (colored circles), and a white circle indicating the estimated Fermat state, with the corresponding estimated Fermat distances displayed as bars; further state initializations are detailed in Appendix A.7.

Unearthing Coordinated Behavior: Spectral Echoes of Action

Eigenoptions are discovered through analysis of the state transition graph representing the multi-agent environment. Specifically, the graph Laplacian of this state transition graph is utilized to identify temporally extended actions. The graph Laplacian, a matrix representing connectivity and relationships between states, provides a spectral decomposition that reveals inherent macro-actions present within the environment’s dynamics. These macro-actions, termed Eigenoptions, correspond to eigenvectors of the Laplacian and represent coordinated behaviors that efficiently transition the agents between states. The ALLO method is then employed to derive these Eigenoptions from the spectral decomposition, effectively extracting reusable, coordinated action primitives.
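
To ground the terminology, the sketch below runs the textbook version of this idea on a toy single-agent grid: build the state graph, take the Laplacian $L = D - A$, and turn a low-frequency eigenvector into an intrinsic reward. The paper replaces the exact eigendecomposition with a neural estimator and works over the inter-agent relative representation, so treat this only as an illustration of the eigenoption mechanics.

```python
import numpy as np

# Toy eigenoption construction on a 5x5 grid: adjacency A, Laplacian L = D - A,
# and an intrinsic reward r(s, s') = e[s'] - e[s] from a low-frequency eigenvector.

SIZE = 5
def idx(r, c):
    return r * SIZE + c

A = np.zeros((SIZE * SIZE, SIZE * SIZE))
for r in range(SIZE):
    for c in range(SIZE):
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < SIZE and 0 <= nc < SIZE:
                A[idx(r, c), idx(nr, nc)] = 1.0

L = np.diag(A.sum(axis=1)) - A                   # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)             # eigenvalues in ascending order
e = eigvecs[:, 1]                                # first non-constant eigenvector

def intrinsic_reward(s, s_next):
    """The option's policy is trained to follow this eigenvector over the graph."""
    return e[s_next] - e[s]

print(round(intrinsic_reward(idx(0, 0), idx(0, 1)), 4))
```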

Eigenoptions are computed using the ALLO method, which learns approximations of the graph Laplacian’s eigenvectors from observed state transition data; each eigenvector defines an intrinsic reward, and the corresponding option policy is trained to maximize the cumulative intrinsic reward it induces. Further refinement is achieved through the Kronecker product, which systematically combines individual agents’ representations, effectively scaling single-agent behaviors to multi-agent scenarios. This composition yields joint options and enables coordinated macro-actions to emerge without requiring explicit specification of inter-agent dependencies.
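
The role of the Kronecker product can be seen in isolation: it composes per-agent vectors into a single vector indexed by joint states, without ever materializing a spectral computation over the joint space. The snippet below uses random stand-ins for the single-agent eigenvectors and simply checks the indexing identity.

```python
import numpy as np

# Composing a joint-state vector from per-agent vectors via the Kronecker
# product. Indexing below assumes the joint state is the pair (s1, s2).

n = 25                                  # states per agent (e.g., a 5x5 grid)
e1 = np.random.randn(n)                 # stand-ins for single-agent eigenvectors
e2 = np.random.randn(n)

joint_eig = np.kron(e1, e2)             # length n*n vector over joint states
s1, s2 = 7, 8                           # per-agent states
print(np.isclose(joint_eig[s1 * n + s2], e1[s1] * e2[s2]))   # True
```

The practical payoff is that the combination scales with the size of the per-agent representations rather than with the exponentially large joint state space.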

The method facilitates the learning of temporally extended actions by directly addressing the dynamics of the multi-agent system. Unlike traditional approaches that rely on pre-defined action sequences or fixed-duration actions, this technique enables agents to discover actions whose duration and execution are intrinsically linked to the state transitions observed within the environment. This adaptation is achieved through the analysis of the state transition graph, allowing the system to identify recurring patterns and derive actions that are optimally aligned with the observed environmental dynamics, improving efficiency and coordination among agents.

The implementation of Eigenoptions facilitates more efficient execution of complex, coordinated behaviors in multi-agent systems compared to reliance on primitive actions. This efficiency stems from the ability of Eigenoptions to represent temporally extended actions, effectively reducing the action space and computational demands during task execution. Empirical results demonstrate that agents utilizing Eigenoptions achieve significant performance gains in downstream tasks, characterized by reductions in task completion time and improvements in success rates. These gains are attributable to the optimized action sequences and reduced planning complexity enabled by the discovery and application of coordinated macro-actions.

Eigenoption discovery on raw joint states, utilizing both Kronecker product and inter-agent relative representations, yields five eigenvectors revealing key coordination patterns for agents positioned at coordinates (7,7) and (7,8).

MacDec-POMDP: Asynchronous Execution and the Illusion of Synchrony

The research integrates the discovered joint options as macro-actions within the established MacDec-POMDP framework, which extends the Dec-POMDP to facilitate more complex and efficient multi-agent coordination. This extension allows individual agents to utilize temporally extended actions – known as macro-actions – without requiring strict synchronization with other agents. By embracing an asynchronous execution scheme, the system avoids bottlenecks and enhances responsiveness, enabling agents to pursue their objectives concurrently. This approach effectively moves beyond immediate, reactive behaviors and empowers agents to plan and execute sequences of actions tailored to their specific roles and the demands of the environment, ultimately fostering more robust and adaptable multi-agent systems.

The MacDec-POMDP framework facilitates a significant advancement in multi-agent coordination by enabling the simultaneous execution of Joint Options – pre-defined sequences of actions undertaken collaboratively by multiple agents. Rather than relying on strict sequential execution where one agent completes an action before another begins, this parallel approach dramatically improves system responsiveness, particularly in dynamic environments. By allowing agents to initiate and carry out their portions of a joint plan concurrently, the framework minimizes delays and maximizes efficiency. This is crucial for complex tasks demanding real-time adaptation and swift, coordinated responses, ultimately leading to more robust and effective multi-agent systems capable of tackling intricate challenges.
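
A hypothetical sketch of this execution scheme (toy environment and illustrative names, not the paper's code) looks roughly as follows: each agent runs its current macro-action until that macro-action's own termination condition fires, then selects a new one, without waiting for its teammate.

```python
class MacroAction:
    def __init__(self, policy, termination):
        self.policy = policy              # local observation -> primitive action
        self.termination = termination    # local observation -> bool (finished?)

class ToyLineEnv:
    """Two agents on a line; each agent observes only its own position."""
    def reset(self):
        self.pos = [0, 10]
        return list(self.pos)
    def step(self, actions):
        self.pos = [p + a for p, a in zip(self.pos, actions)]
        done = self.pos[0] >= self.pos[1]            # the agents have met
        return list(self.pos), [0.0, 0.0], done

move_right = MacroAction(lambda o: +1, lambda o: o >= 5)
move_left  = MacroAction(lambda o: -1, lambda o: o <= 5)
hold       = MacroAction(lambda o: 0,  lambda o: True)

def choose_macro(agent_id, obs):
    """High-level chooser; a learned policy would go here."""
    return hold if obs == 5 else (move_right if agent_id == 0 else move_left)

env = ToyLineEnv()
obs = env.reset()
current = [choose_macro(i, obs[i]) for i in range(2)]
for _ in range(20):
    actions = [current[i].policy(obs[i]) for i in range(2)]
    obs, rewards, done = env.step(actions)
    for i in range(2):
        if current[i].termination(obs[i]):           # agents terminate independently
            current[i] = choose_macro(i, obs[i])
    if done:
        break
print("final positions:", obs)                       # -> [5, 5]
```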

Within the MacDec-POMDP framework, each agent benefits from a Recurrent Neural Network (RNN) based decentralized policy, allowing for individualized learning strategies. This approach moves beyond uniform control, enabling each agent to develop a policy specifically attuned to its unique role and the overarching objectives of the multi-agent system. The RNN component is crucial, as it allows agents to maintain an internal state, effectively remembering past interactions and adapting their behavior based on the evolving dynamics of the environment and the actions of other agents. Consequently, this tailored policy learning fosters more efficient coordination and responsiveness, ultimately contributing to enhanced performance in complex, collaborative tasks – a significant advantage demonstrated in scenarios like Level-Based Foraging and Overcooked.
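
A minimal sketch of such a policy, using PyTorch for concreteness (the layer sizes and architecture here are assumptions, not the paper's configuration), could look like this:

```python
import torch
import torch.nn as nn

# One independent recurrent policy per agent: local observation + hidden state
# in, distribution over actions (or options) out.

class RecurrentAgentPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)   # memory over the episode
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, hidden):
        x = torch.relu(self.encoder(obs))
        hidden = self.rnn(x, hidden)
        logits = self.head(hidden)
        return logits, hidden

n_agents, obs_dim, n_actions = 4, 10, 6
policies = [RecurrentAgentPolicy(obs_dim, n_actions) for _ in range(n_agents)]
hiddens = [torch.zeros(1, 64) for _ in range(n_agents)]

local_obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
for i in range(n_agents):
    logits, hiddens[i] = policies[i](local_obs[i], hiddens[i])
    action = torch.distributions.Categorical(logits=logits).sample()
    print("agent", i, "action", action.item())
```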

The implementation of an Options Framework is central to building multi-agent systems capable of complex, temporally-extended actions; it provides the essential tools for managing these ‘options’ – pre-defined sequences of actions treated as single units. This approach moves beyond immediate reactions, enabling agents to plan and execute coordinated strategies over longer horizons, resulting in more robust and efficient performance. Demonstrated in challenging environments like Level-Based Foraging and the cooperative game Overcooked, this framework facilitates significant improvements in task completion; agents utilizing temporally extended actions consistently outperform those relying on purely reactive behaviors, highlighting the value of strategic foresight and coordinated execution in multi-agent systems.
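
In the classical options framework an option is a triple: an initiation set, an intra-option policy, and a termination condition. The sketch below writes that triple down directly, with a hypothetical foraging-flavored example; the field names and state dictionary are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable

# The classical option triple (I, pi, beta).

@dataclass
class Option:
    initiation: Callable[[dict], bool]     # I: can this option start in state s?
    policy: Callable[[dict], int]          # pi: primitive action to take in s
    termination: Callable[[dict], float]   # beta: probability of stopping in s

# Hypothetical "approach the nearest item" option for a foraging-style grid.
approach_item = Option(
    initiation=lambda s: s["items_visible"] > 0,
    policy=lambda s: s["direction_to_nearest_item"],
    termination=lambda s: 1.0 if s["on_item"] else 0.0,
)

state = {"items_visible": 2, "direction_to_nearest_item": 3, "on_item": False}
if approach_item.initiation(state):
    action = approach_item.policy(state)   # primitive action toward the item
    print("take action", action)
```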

Eigenvectors discovered from successive CEO cycles are integrated into the agent’s action space to guide exploration in subsequent phases.

The pursuit of coordinated behavior in multi-agent systems often resembles cultivating a garden rather than constructing a machine. This paper’s exploration of inter-agent relative dynamics and the ‘Fermat state’ acknowledges this inherent complexity. It doesn’t seek to build coordination, but to create conditions where it can grow from the interactions of agents. As Brian Kernighan observed, “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” Similarly, overly complex approaches to multi-agent coordination can become brittle. The elegance of this work lies in its abstraction of relative states, fostering a more forgiving system where agents can adapt and learn coordinated joint options despite partial observability.

What Lies Ahead?

The pursuit of coordinated action, as demonstrated by this work, inevitably reveals the fragility inherent in any attempt to define coordination. The ‘Fermat state’ offers a compelling representation, a snapshot of relative positioning, but it is merely a snapshot. Environments shift, agents miscalculate, and the elegant geometry collapses into the noise of execution. A system that never breaks is, after all, a system that has ceased to learn.

The true challenge isn’t discovering options – it’s cultivating the capacity to lose them gracefully. Current approaches, even those leveraging relative dynamics, tend toward brittle specialization. Future work should investigate how agents can dynamically renegotiate their understandings of ‘joint options,’ acknowledging that any shared plan is, at best, a temporary truce with entropy. The ideal isn’t seamless cooperation, but robust adaptation to its inevitable failure.

Perfection leaves no room for people – or, in this case, agents. The field will likely move beyond the search for optimal strategies and toward mechanisms that encourage controlled exploration of sub-optimal ones. The most interesting systems won’t be those that solve problems, but those that learn to live within them, constantly dissolving and reforming in response to the unpredictable currents of a shared world.


Original article: https://arxiv.org/pdf/2512.24827.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
