Author: Denis Avetisyan
A new framework decouples search planning and knowledge management, enabling more stable and effective agent-based information retrieval.

This paper introduces M-ASK, a multi-agent system that leverages dense rewards and concise context to optimize agentic search and address the training instability inherent in monolithic architectures.
While agentic search – leveraging Large Language Models to interleave reasoning and tool use – holds promise for complex information seeking, current monolithic architectures suffer from instability stemming from unconstrained reasoning and challenges in credit assignment. This paper introduces ‘Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search’, presenting M-ASK, a novel framework that decouples search planning from knowledge management via specialized agents. By employing turn-level rewards and a focus on concise context, M-ASK demonstrably improves both answer accuracy and training stability on multi-hop question answering benchmarks. Could this multi-agent approach unlock more robust and scalable agentic systems for tackling increasingly complex information challenges?
Decoding the Limits of Conventional Search
Conventional search technologies excel at retrieving information directly linked to keywords, proving highly effective for simple, fact-based inquiries. However, when confronted with questions demanding nuanced interpretation, synthesis of multiple sources, or multi-step logical deduction, these methods frequently falter. The core limitation lies in their inability to grasp the intent behind complex queries – instead of understanding the reasoning process required, they prioritize keyword matching. This often results in a deluge of superficially relevant documents, overwhelming the user and obscuring the genuinely pertinent information. Consequently, tasks requiring deep reasoning – such as diagnosing a complex medical condition based on a patient’s history, or comparing the strategic implications of competing geopolitical events – typically exceed the capabilities of traditional search, highlighting a critical need for systems capable of ‘thinking’ through problems rather than simply retrieving data.
The core challenge in deep reasoning for search lies in sustaining a logical progression of thought throughout a prolonged inquiry. Unlike simple keyword matches, complex questions demand a series of interconnected inferences, yet current systems often treat each search step in isolation. This fragmented approach leads to an accumulation of potentially irrelevant information – information overload – as the system struggles to retain context and prioritize pertinent details. Consequently, the final answer may be inaccurate, not because of a failure to find all the necessary information, but because the system couldn’t effectively synthesize it into a coherent whole, losing the thread of the initial reasoning process and drawing flawed conclusions from a disorganized collection of facts.
Iterative Retrieval-Augmented Generation (RAG) systems, while promising for complex queries, frequently encounter difficulties due to the ‘long-horizon credit assignment problem’. This challenge arises because determining the value of each individual search step within a multi-stage reasoning process proves remarkably difficult. When a RAG system ultimately arrives at a correct or incorrect answer, it’s often unclear which specific retrievals were genuinely helpful – or even detrimental – to the final outcome. This ambiguity hinders effective learning and improvement; the system struggles to refine its search strategies because it cannot reliably attribute success or failure to particular actions taken during the extended search process. Consequently, optimizing these systems requires navigating a complex landscape where the impact of early search steps may only become apparent much later, making it hard to isolate and reinforce beneficial behaviors.

Architecting Intelligence: Introducing M-ASK
M-ASK achieves a separation of search planning and knowledge management through the implementation of a multi-agent system. This system distributes the search process across specialized agents, each responsible for a distinct task. Rather than a monolithic approach, individual agents focus on specific functions, such as initiating the search, formulating queries, extracting relevant evidence from search results, and updating the system’s overall knowledge. This decomposition allows for modularity and scalability; agents can be updated or replaced without impacting the entire system, and new functionalities can be added by introducing specialized agents. The framework relies on inter-agent communication to coordinate these tasks and maintain a consistent search trajectory.
The M-ASK framework’s ‘Knowledge State’ functions as a centralized, dynamic record of the search process. It explicitly stores posed questions, both answered and unanswered, along with the corresponding evidence retrieved. Critically, the Knowledge State also incorporates predicted answers, representing the system’s current hypotheses before evidence confirmation. This data structure is not merely a historical log; it is actively maintained and shared amongst all agents within the M-ASK system, enabling coordinated search and preventing redundant queries. The inclusion of predicted answers allows agents to prioritize evidence gathering and refine search strategies based on evolving understanding, ultimately facilitating a more efficient and focused information retrieval process.
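A minimal sketch of how such a Knowledge State might be represented in Python. The field names and methods below (sub_questions, predicted_answer, open_questions) are illustrative assumptions, not the paper’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SubQuestion:
    """One sub-question tracked by the Knowledge State (illustrative)."""
    text: str
    predicted_answer: str | None = None  # current hypothesis, pre-confirmation
    evidence: list[str] = field(default_factory=list)  # retrieved snippets
    answered: bool = False

@dataclass
class KnowledgeState:
    """Shared, dynamic record of the search process (schema assumed)."""
    main_question: str
    sub_questions: list[SubQuestion] = field(default_factory=list)

    def open_questions(self) -> list[SubQuestion]:
        # Agents poll this to avoid issuing redundant queries.
        return [q for q in self.sub_questions if not q.answered]
```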
The M-ASK framework’s functionality is distributed among four core agents. The Planning Agent initiates the search process by defining initial goals and constraints. The Search Agent is responsible for translating these goals into specific queries suitable for knowledge sources. Following query execution, the Summary Agent extracts relevant evidence from the search results. Finally, the Update Agent integrates this extracted evidence into the shared Knowledge State, refining the system’s understanding of the search topic and informing subsequent query generation. This division of labor allows for a modular and adaptable search process.
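Under the same assumptions, the division of labor could be orchestrated as a simple turn loop; the agent interfaces below are hypothetical stand-ins for the Planning, Search, Summary, and Update agents, reusing the KnowledgeState sketch above:

```python
def run_m_ask(question: str, agents, max_turns: int = 8) -> str:
    """Hypothetical orchestration of the four M-ASK agents (sketch only)."""
    state = KnowledgeState(main_question=question)
    agents.planner.initialize(state)                 # define goals and sub-questions
    for _ in range(max_turns):
        open_qs = state.open_questions()
        if not open_qs:
            break                                    # all sub-questions resolved
        target = open_qs[0]
        query = agents.searcher.formulate_query(state, target)
        results = agents.searcher.execute(query)     # hit the knowledge source
        evidence = agents.summarizer.extract(results, target)
        agents.updater.integrate(state, target, evidence)  # refine Knowledge State
    return agents.planner.final_answer(state)
```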

Refining the Signal: Dense Rewards and Parameter Sharing
Traditional reinforcement learning often relies on sparse rewards, providing feedback only upon completion of a task, which makes the long-horizon credit assignment problem acute: it is difficult to determine which actions, taken many steps earlier, contributed to a delayed outcome. M-ASK instead employs ‘Turn-Specific Dense Rewards’, assigning a reward value to each intermediate action in the search process rather than only at the terminal state. Because the agent receives a signal at every turn, it can directly associate actions with their immediate consequences, learn effective search strategies without propagating reward across numerous intermediate steps, and converge faster on complex question answering tasks.
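A minimal sketch of the sparse-versus-dense distinction, assuming a toy per-turn scoring rule (crediting a turn whose retrieved evidence mentions the gold answer); the paper’s actual reward function is not reproduced here:

```python
def sparse_rewards(num_turns: int, final_answer: str, gold: str) -> list[float]:
    # Feedback only at the terminal state: every earlier turn gets zero.
    return [0.0] * (num_turns - 1) + [1.0 if final_answer == gold else 0.0]

def dense_rewards(turns: list[dict], gold: str) -> list[float]:
    # Turn-specific feedback: score each turn by whether the evidence it
    # retrieved mentions the gold answer (a hypothetical proxy signal).
    return [1.0 if gold.lower() in turn["evidence"].lower() else 0.0
            for turn in turns]
```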
M-ASK employs parameter sharing across its ensemble of agents to improve training efficiency and generalization capability. This technique reduces the total number of trainable parameters by forcing multiple agents to utilize a common set of parameters, thereby decreasing computational cost and data requirements. The resulting model achieves a state-of-the-art average F1 score of 50.09 when evaluated across multiple Question Answering benchmarks, demonstrating the effectiveness of this approach in enhancing performance and resource utilization.
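One common way to realize parameter sharing, presented here as an assumption about M-ASK’s setup rather than its confirmed design, is to route every agent role through a single policy model and distinguish roles only by the prompt:

```python
class SharedPolicy:
    """Sketch: one set of trainable parameters serves every agent role."""

    def __init__(self, model):
        self.model = model  # a single LLM whose weights all agents share

    def act(self, role: str, state_text: str) -> str:
        # The role is injected via the prompt rather than separate weights,
        # so gradient updates from any role's turns improve all roles.
        prompt = f"[ROLE: {role}]\n{state_text}"
        return self.model.generate(prompt)  # hypothetical generate() interface
```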
Evaluations on the HotpotQA benchmark demonstrate that M-ASK achieves a performance improvement of +5.82 points over existing question answering methods. This gain reflects the model’s enhanced ability to answer multi-hop reasoning questions that require retrieving and combining evidence from multiple supporting documents, a setting HotpotQA is specifically designed to test.
Beyond Retrieval: Robustness, Synthesis, and the Future of Agentic Search
The efficacy of M-ASK lies in its ability to actively combat the pervasive issue of ‘search noise’ – the influx of irrelevant or inaccurate information that often plagues retrieval-augmented generation (RAG) systems. Rather than passively accepting all retrieved data, M-ASK employs a dynamic knowledge refinement process, continuously evaluating and prioritizing evidence based on its relevance to the query. This is achieved through a multi-agent system where specialized agents collaborate to filter, validate, and synthesize information, effectively distilling a concise and accurate knowledge state. By focusing solely on pertinent evidence, M-ASK significantly diminishes the influence of misleading or extraneous data, leading to more reliable and focused responses, and ultimately enhancing the performance of agentic search tasks.
M-ASK builds upon the foundation of iterative Retrieval-Augmented Generation (RAG) systems, moving beyond simple question answering to facilitate genuinely proactive information seeking. Unlike traditional RAG which responds to specific queries, M-ASK enables agents to independently explore a knowledge space, formulating their own sub-questions and dynamically refining their search strategy. This is achieved through a multi-agent system where individual agents collaborate, each responsible for a specific aspect of the search – such as evidence gathering, evaluation, or synthesis – ultimately constructing a coherent and comprehensive understanding of the topic. The framework doesn’t merely retrieve relevant documents; it actively synthesizes information from multiple sources, identifying connections and resolving inconsistencies to produce novel insights and complex answers that extend beyond the scope of any single document.
A key advancement offered by the M-ASK framework lies in its exceptional training stability. Unlike conventional retrieval-augmented generation (RAG) systems – such as Search-r1, which frequently experiences a 90% training collapse rate – M-ASK consistently achieves a 0% collapse rate. This resilience stems from the framework’s continuous knowledge refinement and focused evidence selection, preventing the agent from losing coherence or diverging during the learning process. The ability to maintain a stable training trajectory is crucial for building reliable and consistently performing agentic search systems, allowing for more complex and nuanced information synthesis without the risk of catastrophic failure during adaptation.
Unlike traditional retrieval-augmented generation (RAG) systems, which often suffer from exponentially growing context windows as they accumulate information, the M-ASK framework actively curates its knowledge base. This is achieved through a dynamic state space that prioritizes relevant evidence and discards redundant or irrelevant information. Consequently, M-ASK avoids the computational bottlenecks and performance degradation associated with monolithic frameworks – systems that attempt to retain all retrieved data. By maintaining a concise and focused knowledge state, the framework ensures efficient processing and sustained performance even with complex, multi-turn interactions and expansive information landscapes, offering a scalable solution for agentic search and beyond.
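A sketch of how such curation might work, reusing the KnowledgeState sketch above; the relevance scorer and retention threshold are hypothetical, illustrating the idea of a bounded knowledge state rather than M-ASK’s actual mechanism:

```python
def curate(state: KnowledgeState, score, keep_top: int = 3) -> None:
    """Keep only the most relevant, non-duplicate evidence per sub-question."""
    for q in state.sub_questions:
        unique = list(dict.fromkeys(q.evidence))  # drop exact duplicates
        ranked = sorted(unique, key=lambda s: score(q.text, s), reverse=True)
        q.evidence = ranked[:keep_top]            # bound context growth per turn
```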
The M-ASK architecture transcends the limitations of typical search-focused frameworks, offering a versatile foundation for tackling a diverse range of cognitive tasks. Beyond simply retrieving information, the system’s collaborative agent design facilitates nuanced understanding and reasoning applicable to complex problem solving. This adaptability stems from the framework’s ability to dynamically refine its knowledge state and prioritize relevant evidence, enabling it to effectively address challenges in areas such as intricate question answering, data analysis requiring synthesis from multiple sources, and even scenarios demanding strategic planning and decision-making. The core principles of iterative refinement and focused knowledge representation are not bound to search; they represent a broadly applicable methodology for building robust and intelligent systems capable of navigating complexity across numerous domains.
The presented framework, M-ASK, actively challenges the conventional monolithic approach to agentic search. It dissects the process into distinct components – search planning and knowledge management – a deliberate act of deconstruction mirroring a core tenet of rigorous inquiry. This echoes Henri Poincaré’s sentiment: “Mathematics is the art of giving reasons.” M-ASK doesn’t simply apply mathematics, but embodies its spirit by systematically breaking down a complex problem into manageable parts, optimizing each element through dense rewards and concise context, and then reconstructing a more robust and stable system. The result isn’t merely functionality; it’s demonstrably reasoned performance – a testament to understanding through intelligent disassembly.
Pushing the Boundaries
The decoupling of search planning and knowledge management, as demonstrated by M-ASK, isn’t merely an architectural refinement – it’s a tacit acknowledgement that current agentic systems often choke on their own accumulated ‘wisdom’. The framework’s reliance on dense rewards and concise context suggests a fundamental tension: the more an agent ‘knows’, the harder it becomes to apply that knowledge effectively. Future work must rigorously investigate this trade-off, perhaps exploring methods for actively forgetting or distilling information, rather than simply accumulating it. The current emphasis on performance benchmarks feels…comfortable. A truly disruptive path involves deliberately introducing controlled instability, probing the limits of what constitutes ‘knowledge’ in an agentic context.
One anticipates that scaling M-ASK will reveal new failure modes, not necessarily related to planning or knowledge, but to the negotiation between agents. The framework currently addresses instability; it does not inherently account for adversarial behaviour within the multi-agent system itself. It remains to be seen whether the pursuit of optimal collective search will inevitably lead to emergent ‘politics’ amongst the agents, or if a sufficiently robust reward structure can preemptively constrain such dynamics.
Ultimately, the most intriguing question isn’t whether M-ASK improves agentic search, but whether it simply postpones the inevitable confrontation with the inherent limits of formal systems. The architecture invites further disassembly, further fracturing of assumed coherences. The goal, one suspects, shouldn’t be stability, but a controlled descent into productive chaos.
Original article: https://arxiv.org/pdf/2601.04703.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/