Building Better Teammates: AI Learns to Collaborate On the Fly

Author: Denis Avetisyan


New research demonstrates how artificial intelligence can dynamically adapt to different collaborators in complex team environments, achieving strong performance through learned teammate modeling.

The system models teammate behavior by transforming observed interaction trajectories into prototype feature vectors and matching them against predefined rubrics. This enables a large language model to classify teammate types, either directly from the rubric or, in a refined approach, by additionally conditioning on retrieved exemplar trajectories, and then to select an optimal response policy from a library of trained strategies, achieving adaptive collaboration.

This work introduces ReCoLLAB, a retrieval-augmented generation framework leveraging large language models for cooperative ad-hoc teammate modeling in multi-agent reinforcement learning environments like Overcooked.

Adapting to unseen collaborators remains a key challenge in multi-agent systems, often hindered by brittle, model-based approaches. This paper introduces ReCoLLAB: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling, a novel framework leveraging large language models and retrieval-augmented generation to infer teammate behavior and facilitate effective coordination. Experiments in the cooperative Overcooked environment demonstrate that ReCoLLAB consistently improves adaptation and achieves Pareto-optimal performance trade-offs. Could this approach unlock more robust and flexible teamwork in increasingly complex real-world scenarios?


The Challenge of Unseen Collaboration: A Fundamental Limitation

Conventional Multi-Agent Reinforcement Learning (MARL) systems frequently falter when faced with the unpredictability of new teammates, a limitation stemming from their reliance on pre-established coordination patterns. These systems are typically trained assuming a fixed team composition, rendering them inefficient, and sometimes entirely ineffective, when encountering agents with differing strategies or behaviors. The core issue lies in the difficulty of generalizing learned policies to unseen collaborators; a team expertly coordinated during training can quickly devolve into chaos when an unfamiliar agent enters the dynamic. This ‘cold-start’ problem presents a significant hurdle for deploying MARL in real-world scenarios, like robotics or autonomous vehicles, where constant shifts in team membership are commonplace and pre-training with every possible teammate is simply impractical.

The inherent difficulty of establishing effective teamwork with unfamiliar agents presents a significant hurdle in dynamic, real-world scenarios – a challenge known as the ‘cold-start’ problem. When agents encounter novel teammates without a history of coordinated training, performance often suffers due to an inability to predict or respond to differing behaviors and strategies. This necessitates the development of adaptive strategies that move beyond reliance on pre-programmed coordination or extensive shared learning. Instead, robust solutions must prioritize rapid assessment of new teammates, allowing agents to quickly infer intentions and adjust their own actions accordingly. Overcoming this cold-start limitation is crucial for deploying multi-agent systems in environments characterized by frequent team composition changes, such as robotic swarms, autonomous vehicles, and complex resource management systems.

Effective teamwork amongst artificial agents hinges on a crucial ability: the rapid construction of behavioral models for novel collaborators. Rather than relying on pre-programmed coordination, successful multi-agent systems must dynamically assess and predict the actions of previously unseen teammates. This necessitates agents that can observe, interpret, and extrapolate from limited interactions, essentially building an internal simulation of another’s decision-making process. Such predictive capabilities allow for proactive adaptation, enabling agents to anticipate potential conflicts or opportunities and adjust their own strategies accordingly – a vital skill in environments characterized by constant change and unpredictable team compositions. The speed and accuracy with which these models are formed directly impacts the efficiency and robustness of the overall collaborative effort, paving the way for truly flexible and resilient multi-agent systems.

Current approaches to multi-agent teamwork frequently demand substantial pre-training or the implementation of shared policies – strategies that prove largely ineffective when agents encounter unfamiliar collaborators. These methods often require agents to learn a rigid, collective behavior, failing to account for the inherent variability of new team members and the need for on-the-fly adaptation. The impracticality stems from the computational cost of exhaustive training for every possible teammate configuration, and the inflexibility of shared policies which restrict individual agents from leveraging their unique capabilities or responding creatively to unforeseen circumstances. Consequently, these limitations hinder the deployment of multi-agent systems in dynamic, real-world environments where seamless integration with previously unseen agents is paramount, and pre-defined coordination is often impossible.

ReCoLLAB consistently achieves performance near or on the Pareto frontier, demonstrating a strong balance between teammate-type classification accuracy and cumulative reward.

Modeling the Unknown: A Teammate-Centric Perspective

Teammate Modeling is the process by which agents in collaborative scenarios develop internal representations of the policies or underlying types of their partners. This capability is critical for effective teamwork, particularly in ad-hoc settings where prior knowledge of teammate behavior is limited or nonexistent. These representations aren’t necessarily complete recreations of the teammate’s decision-making process, but rather functional models sufficient to predict likely actions given the current state of the environment. The accuracy of these models directly impacts an agent’s ability to anticipate, coordinate, and adapt its own behavior, ultimately influencing overall team performance. Different approaches to teammate modeling vary in complexity and the types of information they utilize, ranging from simple frequency counts of observed actions to more sophisticated probabilistic inference techniques.

Several methods exist for inferring teammate behavior in multi-agent systems, each employing distinct computational approaches. PLASTIC utilizes Bayesian belief updates to maintain a probability distribution over a teammate’s policy based on observed actions. M3RL (Model-based Multi-agent Reinforcement Learning) focuses on learning models of other agents to predict their future behavior. TEAMSTER employs a contract-based framework where agents negotiate and commit to specific roles and actions, facilitating coordination. NAHT (N-Agent Ad Hoc Teamwork) addresses settings in which a set of controlled agents must coordinate with varying numbers and types of uncontrolled teammates. These approaches differ in their representation of teammate behavior – from probabilistic beliefs to explicit contracts and learned models – and consequently, in their computational demands and adaptability to dynamic environments.
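To make the belief-update style of method concrete, the following is a minimal sketch in the spirit of PLASTIC’s polynomial-weights update. The three hypothesized teammate types, the likelihood numbers, and the learning rate `eta` are illustrative assumptions, not values from any of the cited papers:

```python
import numpy as np

def update_belief(belief, likelihoods, eta=0.2):
    """One polynomial-weights update of the belief over teammate types,
    given each type's likelihood of the observed action."""
    # Penalize each type in proportion to how unlikely the observed
    # action was under that type's policy, then renormalize.
    loss = 1.0 - np.asarray(likelihoods, dtype=float)
    belief = np.asarray(belief, dtype=float) * (1.0 - eta * loss)
    return belief / belief.sum()

# Three hypothesized teammate types, initially equally likely.
# The observed action is most likely under type 1, so mass shifts there.
belief = np.array([1 / 3, 1 / 3, 1 / 3])
belief = update_belief(belief, likelihoods=[0.1, 0.8, 0.3])
print(belief)
```

Repeated over a trajectory, updates like this concentrate probability on the type whose policy best explains the teammate’s actions, which is the quantity a modeling agent can then condition its own policy on.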

The capacity to anticipate teammate actions and dynamically adjust strategies is central to improved multi-agent system performance. By inferring likely behaviors, agents can move beyond reactive responses and proactively plan actions that complement those of their teammates. This predictive capability facilitates tighter coordination, reduces redundant effort, and enables the exploitation of complementary skillsets. Consequently, agents employing these techniques demonstrate enhanced efficiency in task completion, increased robustness to unforeseen circumstances, and a measurable improvement in overall system performance metrics, particularly in collaborative scenarios requiring shared goals and synchronized actions.

Current teammate modeling techniques – including PLASTIC, M3RL, TEAMSTER, and NAHT – predominantly function as independent systems, limiting their capacity to integrate external data sources or shared knowledge. This isolation prevents these methods from fully capitalizing on contextual information such as the overall task structure, environmental constraints, or historical interaction data. Consequently, performance gains are often constrained by the model’s reliance on limited observations of teammate behavior, hindering adaptability to novel situations or complex task dynamics. A unifying framework capable of incorporating these broader contextual cues remains an open research challenge.

Both teammate-type classification accuracy and episodic returns improve with an increasing number of retrieved exemplars <span class="katex-eq" data-katex-display="false">k</span>, indicating the benefit of incorporating more relevant past experiences.

Leveraging Large Language Models as World Models for Collaboration

The CoLLAB framework distinguishes itself by employing Large Language Models (LLMs) not simply as response generators, but as ‘world models’ capable of representing and classifying collaborative teammate types. This is achieved by prompting the LLM with behavioral descriptions of teammates, allowing it to infer characteristics and categorize them based on observed actions within a collaborative environment. This approach moves beyond traditional agent modeling which often relies on hand-engineered features or limited state spaces; instead, CoLLAB leverages the LLM’s inherent capacity for pattern recognition and contextual understanding to dynamically classify teammate behavior without explicit programming for each type.

The CoLLAB framework categorizes teammate behavior by formulating prompts for Large Language Models (LLMs) that include descriptive details of observed actions. These prompts enable the LLM to classify teammates into distinct types, which are then used to dynamically adjust collaborative strategies. The classification is based on the LLM’s ability to infer behavioral patterns from the provided descriptions, allowing for a nuanced understanding of teammate roles and tendencies. This facilitates the selection of optimal interaction protocols tailored to each teammate’s identified type, thereby improving overall team performance in collaborative tasks.
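A minimal sketch of how such a classification prompt might be assembled is shown below. The rubric entries, type names, and wording are hypothetical stand-ins for illustration; the paper’s actual rubrics and prompt templates may differ:

```python
# Hypothetical behavior rubric mapping teammate-type names to short
# behavioral descriptions (illustrative, not taken from the paper).
RUBRIC = {
    "onion_specialist": "Mostly fetches onions and loads pots.",
    "plating_specialist": "Mostly picks up dishes and serves soups.",
    "generalist": "Alternates between ingredient fetching and serving.",
}

def build_prompt(behavior_summary: str) -> str:
    """Assemble a rubric-grounded classification prompt for the LLM."""
    rubric_text = "\n".join(f"- {name}: {desc}" for name, desc in RUBRIC.items())
    return (
        "You are classifying a teammate in Overcooked.\n"
        f"Teammate types:\n{rubric_text}\n\n"
        f"Observed behavior: {behavior_summary}\n"
        "Answer with exactly one type name."
    )

prompt = build_prompt("Picked up 7 onions, loaded 3 pots, never touched a dish.")
print(prompt)
```

The key design choice is that the LLM is asked to match an observed-behavior summary against explicit rubric descriptions, rather than to free-associate a label, which keeps its output constrained to the known type vocabulary.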

Structured World Models within the CoLLAB framework represent the collaborative environment and agent interactions as a formalized, machine-readable structure. This allows the Large Language Model (LLM) to not simply process textual descriptions of behavior, but to reason about the state of the environment, the roles of different agents within it, and the likely consequences of actions. The model captures elements such as object locations, agent positions, and task-relevant information, providing a contextual basis for understanding teammate behavior and predicting future actions. This structured representation is critical for enabling the LLM to effectively categorize teammate types and formulate appropriate collaborative strategies, exceeding the performance of methods relying solely on behavioral descriptions.

The ReCoLLAB framework builds upon the CoLLAB approach by integrating Retrieval-Augmented Generation (RAG) to improve teammate-type classification accuracy within the Overcooked environment. Empirical results demonstrate that ReCoLLAB consistently outperforms both PLASTIC and Logistic Regression in this task. Performance optimization studies indicate that utilizing between 3 and 5 retrieved exemplars during the RAG process yields the most significant gains; increasing the number of retrieved exemplars beyond this range does not substantially improve classification accuracy.
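The retrieval step can be sketched as nearest-neighbor search over prototype feature vectors. This toy version uses cosine similarity against an in-memory library; the `retrieve_exemplars` helper and the vectors themselves are illustrative assumptions, not the paper’s implementation:

```python
import numpy as np

def retrieve_exemplars(query_vec, exemplar_vecs, k=3):
    """Return indices of the k exemplar trajectories whose prototype
    feature vectors are most cosine-similar to the query."""
    E = np.asarray(exemplar_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)[:k].tolist()

# Toy library of four exemplar feature vectors; the query points in
# roughly the same direction as rows 0 and 2.
library = [[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0], [0, 0, 1]]
print(retrieve_exemplars([1, 0.05, 0], library, k=2))
```

The retrieved indices would then select the exemplar trajectories whose descriptions are appended to the classification prompt, which is where the reported sweet spot of 3 to 5 exemplars applies.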

Validation and Performance in Dynamic, Unpredictable Environments

The CoLLAB and ReCoLLAB frameworks were rigorously tested and validated using the cooperative game ‘Overcooked’, which serves as a standard benchmark for evaluating multi-agent coordination algorithms. Overcooked presents a complex, dynamic environment requiring agents to effectively communicate and collaborate to achieve shared goals within time constraints. Successful performance in this environment demonstrates the frameworks’ ability to handle the challenges of partial observability, stochasticity, and the need for rapid adaptation – characteristics common to real-world multi-agent systems. The game’s established metrics and comparative baselines allow for quantifiable assessment of CoLLAB and ReCoLLAB’s efficacy against existing approaches in multi-agent reinforcement learning.

Comparative analysis reveals that the CoLLAB and ReCoLLAB frameworks consistently outperform the Independent Proximal Policy Optimization (PPO) algorithm in multi-agent reinforcement learning (MARL) scenarios involving dynamically changing team compositions. Specifically, performance gains are most pronounced when agents experience frequent alterations in teammate identities and skill levels. This improvement stems from the ability of CoLLAB and ReCoLLAB to model and adapt to these variations, unlike Independent PPO, which treats each agent as independent and lacks mechanisms for explicitly accounting for teammate dynamics. Quantitative results demonstrate statistically significant increases in cumulative reward and task completion rates when employing CoLLAB and ReCoLLAB in environments with fluctuating team membership.

The CoLLAB framework’s teammate categorization is refined through the implementation of a Behavior Rubric, which leverages Mutual Information to quantify the statistical dependence between an agent’s actions and the resulting changes in the environment. This allows for a more nuanced understanding of teammate behavior than simple observation of actions alone. By calculating the Mutual Information between an agent’s actions and the observed state transitions, the rubric effectively identifies which actions are most informative about a teammate’s contribution. This information is then used to categorize teammates into distinct behavioral profiles, enabling CoLLAB to tailor its cooperative strategy and improve overall performance by better anticipating and responding to teammate actions.
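A plug-in estimate of the mutual information between actions and observed state transitions can be computed directly from empirical counts. The sketch below illustrates the quantity the rubric relies on; the action and transition labels are hypothetical Overcooked-flavored examples, not data from the paper:

```python
from collections import Counter
import math

def mutual_information(pairs):
    """Plug-in estimate of I(A; S') in bits from (action, transition) pairs."""
    n = len(pairs)
    pa = Counter(a for a, _ in pairs)      # marginal action counts
    ps = Counter(s for _, s in pairs)      # marginal transition counts
    pas = Counter(pairs)                   # joint counts
    mi = 0.0
    for (a, s), c in pas.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((pa[a] / n) * (ps[s] / n)))
    return mi

# Perfectly informative: each action deterministically yields one transition.
det = [("chop", "onion_chopped"), ("serve", "soup_served")] * 10
# Uninformative: transitions are independent of the action taken.
rand = [("chop", "onion_chopped"), ("chop", "soup_served"),
        ("serve", "onion_chopped"), ("serve", "soup_served")] * 5
print(mutual_information(det), mutual_information(rand))  # 1.0 vs 0.0
```

Actions with high mutual information against environment changes are exactly the ones worth keying a behavioral rubric on, since they reveal what a teammate is actually accomplishing.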

Evaluations of the ReCoLLAB framework demonstrate performance at least competitive with, and frequently exceeding, that of baseline Multi-Agent Reinforcement Learning (MARL) algorithms in ad-hoc teamwork scenarios. Specifically, ReCoLLAB achieves comparable or superior cumulative reward when agents are dynamically composed into teams. Empirical results indicate an optimal ‘probe length’ of 20 timesteps for both maximizing classification accuracy of teammate behaviors and achieving peak cumulative reward during collaborative tasks. This probe length represents the duration of observed agent behavior used for categorization and subsequent coordination within the ReCoLLAB framework.
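To make the probe concrete: one simple way to summarize the first 20 timesteps of observed behavior is a normalized action-frequency vector, a crude stand-in for a prototype feature vector. The action set and helper below are illustrative assumptions, not the paper’s feature design:

```python
from collections import Counter

# Hypothetical Overcooked-style discrete action vocabulary.
ACTIONS = ["up", "down", "left", "right", "stay", "interact"]

def prototype_features(trajectory, probe_len=20):
    """Summarize the first probe_len observed actions as a
    normalized action-frequency vector."""
    window = trajectory[:probe_len]
    counts = Counter(window)
    total = max(len(window), 1)
    return [counts[a] / total for a in ACTIONS]

# Only the first 20 steps fall inside the probe window.
traj = ["interact"] * 8 + ["right"] * 6 + ["up"] * 6 + ["stay"] * 30
print(prototype_features(traj))
```

A probe length of 20 trades off two pressures: too short a window yields noisy frequency estimates, while too long a window delays the switch to a type-matched response policy.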

Towards Truly Adaptive and Intelligent Multi-Agent Systems

The convergence of large language models (LLMs) and multi-agent reinforcement learning (MARL) signifies a pivotal advancement in the pursuit of genuinely adaptive and intelligent systems. Traditionally, MARL agents required extensive training within specific team compositions to achieve effective collaboration. However, integrating LLMs allows agents to leverage pre-existing knowledge and natural language understanding, enabling them to rapidly model the intentions, capabilities, and communication styles of novel teammates. This capacity for ‘zero-shot’ teammate adaptation circumvents the limitations of conventional MARL, fostering more flexible and robust multi-agent systems capable of operating effectively in dynamic and unpredictable environments. The resulting systems exhibit enhanced coordination, improved task allocation, and a heightened ability to generalize learned behaviors to previously unseen scenarios, paving the way for more sophisticated and autonomous collective intelligence.

The capacity for rapid teammate adaptation represents a pivotal advancement in multi-agent reinforcement learning. Traditionally, agents required extensive joint training to function effectively as a team; however, integrating large language models allows for a paradigm shift. Agents can now leverage the knowledge embedded within these models to quickly understand the capabilities, intentions, and even communication styles of novel teammates. This unlocks possibilities previously constrained by the need for extensive pre-training, suggesting applications in scenarios like disaster response, where teams must form dynamically, or complex industrial automation, where robotic agents frequently integrate and reconfigure. The resulting systems promise increased flexibility, resilience, and ultimately, the ability to tackle more complex and unpredictable challenges in real-world settings.

Continued innovation in multi-agent reinforcement learning hinges on leveraging the evolving capabilities of large language models. Future studies are poised to investigate more nuanced LLM architectures, moving beyond current implementations to incorporate models with enhanced reasoning and contextual understanding. Crucially, research will focus on refining prompting strategies, the methods used to guide LLM behavior, to facilitate more accurate teammate modeling. This involves developing prompts that encourage agents to infer teammate intentions, predict actions, and adapt collaborative strategies dynamically. By optimizing both the LLM itself and the techniques used to interact with it, researchers aim to unlock more sophisticated coordination patterns, ultimately leading to multi-agent systems capable of tackling increasingly complex and unpredictable real-world challenges.

Successfully translating Large Language Model-based Multi-Agent Reinforcement Learning (MARL) into practical application necessitates the creation of frameworks capable of handling real-world complexity. Current systems often struggle with scalability – maintaining performance as the number of agents or environmental variables increases – and robustness, meaning consistent operation despite unforeseen circumstances or noisy data. Future development hinges on addressing these limitations through innovations in distributed computing, efficient communication protocols between agents, and techniques for mitigating the computational demands of LLMs. Such frameworks must also prioritize adaptability, allowing agents to learn and refine their strategies continuously within dynamic environments. Without these advancements, the potential of LLM-enhanced MARL to tackle intricate problems in areas like robotics, resource management, and autonomous systems will remain largely unrealized.

The pursuit of robust teammate modeling, as demonstrated by ReCoLLAB, echoes a fundamental principle of computational elegance. The framework’s reliance on retrieval-augmented generation to infer teammate types aligns with the need for provable strategies, not merely those succeeding on limited test cases. As Brian Kernighan aptly stated, “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” ReCoLLAB’s methodical approach to understanding and adapting to teammate behavior, grounded in observable data and LLM inference, offers a more debuggable, and ultimately more reliable, path to cooperative intelligence than purely reactive systems. The system doesn’t just work in Overcooked; it demonstrates a discernible logic in its adaptation.

Beyond Collaboration: Charting a Course for Rigor

The presented work, while demonstrating a functional approach to ad-hoc teammate modeling, merely skirts the edges of a truly provable system. The reliance on Large Language Models, inherently stochastic and opaque, introduces a fundamental limitation. While retrieval-augmented generation mitigates some drift, it does not guarantee consistent, logically sound inferences about teammate behavior. A rigorous solution necessitates a formalization of teammate ‘types’ – not as linguistic descriptions, but as sets of verifiable properties within the game’s state space. Only then can one prove, rather than merely observe, adaptive behavior.

Future investigation should focus on replacing the LLM with a deductive engine. Imagine a system capable of inferring optimal collaborative strategies based on a formal model of the environment and teammate capabilities – a system where performance isn’t measured by empirical success, but by mathematical certainty. The current approach treats ‘teammate modeling’ as pattern recognition; the next stage demands a transition to logical deduction. The question isn’t whether the system appears to collaborate, but whether its actions are demonstrably correct, given the defined parameters.

Furthermore, the exclusive focus on Overcooked, however charming, limits generalizability. A truly robust framework must extend beyond a single, constrained environment. The ultimate challenge lies in constructing a system that can a priori determine the principles of effective collaboration in any multi-agent scenario, independent of specific game mechanics. To claim progress, one must move beyond observed performance and towards demonstrable, provable correctness.


Original article: https://arxiv.org/pdf/2512.22129.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
