Smarter Teams, Smaller Budgets: Building Efficient AI Systems

Author: Denis Avetisyan


A new framework optimizes the selection and collaboration of large language models to maximize performance while dramatically reducing computational costs.

BAMAS constructs a cost-conscious multi-agent system by strategically allocating large language models (provisioning an optimally priced ensemble) and determining the most effective collaborative structure to direct task completion, acknowledging that even elegant architectures inevitably accrue practical costs.

This paper introduces BAMAS, a system for structuring budget-aware multi-agent systems using integer linear programming and reinforcement learning to achieve cost-efficient topology selection.

While large language model (LLM)-based multi-agent systems show promise for complex task solving, their practical deployment is often hindered by escalating costs as scale increases. This paper introduces BAMAS: Structuring Budget-Aware Multi-Agent Systems, a novel framework designed to address this challenge by optimizing both LLM selection and collaboration topology under explicit budget constraints. BAMAS leverages integer linear programming and reinforcement learning to achieve comparable task performance with cost reductions of up to 86%. Could this approach unlock more widespread adoption of LLM-powered multi-agent systems in resource-limited environments?


The Illusion of Collaboration: Why More Agents Aren’t Always Better

The potential of multi-agent systems lies in their ability to break down complex problems into manageable tasks, assigning them to specialized large language models (LLMs) and combining the results. However, realizing this promise is hampered by significant coordination challenges; simply connecting multiple LLMs does not guarantee effective collaboration. Each LLM possesses unique strengths and weaknesses, requiring sophisticated mechanisms to ensure seamless communication, prevent redundant efforts, and resolve conflicting outputs. Without careful orchestration, these systems can suffer from communication overhead, inconsistent reasoning, and a lack of overall strategic direction, ultimately limiting their performance on intricate tasks despite the individual capabilities of the constituent models. The core difficulty resides in establishing a cohesive workflow where diverse LLMs act not as independent entities, but as a unified team pursuing a common objective.

Current multi-agent system frameworks, including popular options like AutoGen, MetaGPT, and ChatDev, frequently emphasize maximizing parallel task execution, often at the expense of genuine strategic collaboration. While these systems excel at distributing workload across multiple large language models, they often lack the nuanced coordination required for complex problem-solving. This prioritization of quantity over quality of interaction frequently leads to diminished returns; agents may duplicate effort, work at cross-purposes, or fail to effectively build upon each other’s contributions. The result is a system that, despite its computational power, underperforms due to a lack of cohesive strategy and intelligent task allocation, highlighting a critical need for frameworks that prioritize synergistic interaction over simple parallelism.

Current methods for coordinating large language models in multi-agent systems frequently struggle with an inefficient trade-off between computational expense and overall effectiveness. While increasing the number of agents can offer a degree of parallelism, it doesn’t necessarily translate to smarter or more efficient problem-solving, often leading to escalating costs without proportional gains in performance. The Budget-Aware Multi-Agent System (BAMAS) addresses this critical gap by dynamically adjusting resource allocation and agent engagement based on real-time performance analysis. Initial results indicate BAMAS can achieve substantial cost reductions, potentially up to 86%, compared to existing orchestration frameworks, suggesting a pathway towards scalable and economically viable multi-agent systems without sacrificing the quality of their output.

Collaboration topologies vary significantly across datasets and allocated budgets.

BAMAS: A Pragmatic Approach to Multi-Agent Efficiency

BAMAS presents a new framework for building multi-agent systems focused on resource efficiency. This framework addresses both the selection of Large Language Models (LLMs) for specific tasks and the configuration of how those LLMs collaborate. Rather than utilizing a fixed set of LLMs or a static collaboration structure, BAMAS dynamically optimizes these two elements in tandem. This optimization process aims to maximize performance within defined budgetary limitations, allowing for the construction of cost-effective multi-agent systems capable of tackling complex problems. The system moves beyond simply choosing the “best” LLM and instead seeks the optimal combination of LLMs and their interaction patterns to achieve a desired outcome.

BAMAS employs Integer Linear Programming (ILP) to determine the optimal allocation of Large Language Models (LLMs) – specifically DeepSeek-V3 and GPT-4.1 Nano – within a multi-agent system, while strictly observing predefined budgetary limitations. The ILP formulation defines decision variables representing the assignment of each subtask to a specific LLM, with objective functions minimizing total cost and maximizing performance. Constraints within the model enforce the budget limit, ensure each subtask is assigned to exactly one LLM, and may incorporate LLM-specific cost and performance characteristics. This optimization process yields a cost-effective LLM assignment that balances computational expense with desired accuracy, enabling the system to operate efficiently under resource constraints.
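To make the allocation step concrete, the sketch below enumerates all assignments of subtasks to LLMs and keeps the highest-scoring one that fits the budget. The per-call cost and expected-score figures are invented for illustration, and the brute-force search is a stand-in for the paper's actual ILP formulation, which would scale to larger instances via a solver.

```python
from itertools import product

# Hypothetical per-call cost and expected-accuracy profiles; the paper's
# real LLM characteristics and budget units are not specified here.
llms = {
    "deepseek-v3": {"cost": 3.0, "score": 0.90},
    "gpt-4.1-nano": {"cost": 1.0, "score": 0.80},
}

def allocate(num_subtasks, budget):
    """Pick one LLM per subtask to maximize total expected score
    subject to the budget -- a brute-force stand-in for the ILP."""
    best, best_score = None, -1.0
    for assignment in product(llms, repeat=num_subtasks):
        cost = sum(llms[m]["cost"] for m in assignment)
        if cost > budget:
            continue  # budget constraint: skip infeasible assignments
        score = sum(llms[m]["score"] for m in assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best

print(allocate(3, budget=7.0))
# → ('deepseek-v3', 'deepseek-v3', 'gpt-4.1-nano')
```

With a budget of 7, two subtasks can afford the stronger model and one falls back to the cheaper one; shrink the budget to 3 and all three subtasks are assigned the cheaper model, mirroring how the ILP trades accuracy for cost.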

The BAMAS Topology Selection Policy operates by dynamically configuring the collaborative structure of Large Language Models (LLMs) based on the requirements of a given task. This adaptive approach contrasts with static topologies, allowing the system to optimize LLM interactions for both performance and cost. Evaluations demonstrate that BAMAS achieves accuracy levels comparable to state-of-the-art multi-agent systems, while simultaneously reducing computational costs by up to 86%. This cost reduction is realized through efficient allocation of resources and minimization of unnecessary LLM interactions, as determined by the policy during task execution.

BAMAS consistently achieves lower cost and higher accuracy than baseline approaches across all tested datasets.

Learning to Collaborate: Dynamic Topology Selection via Reinforcement Learning

The Topology Selection Policy within BAMAS utilizes Reinforcement Learning (RL) to dynamically determine the most effective collaboration strategy between agents. Unlike static topology assignments, the RL-based policy allows the system to learn and adapt its network configuration based on observed performance and task demands. This learning process involves defining a reward function that quantifies successful task completion, enabling the RL agent to explore different topologies – Linear, Star, and Feedback – and iteratively refine its selection process. The policy considers agent capabilities and task requirements as state variables within the RL framework, optimizing for metrics such as completion time, resource utilization, and solution quality. Consequently, the system moves beyond pre-defined configurations to achieve improved performance through data-driven topology adaptation.
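The essence of such a learned policy can be illustrated with a toy epsilon-greedy bandit that picks among the three topologies and updates its value estimates from observed rewards. This is a deliberately simplified stand-in: the paper's policy conditions on task and agent features as state, and its reward shaping and training procedure are not reproduced here.

```python
import random

TOPOLOGIES = ["linear", "star", "feedback"]

class TopologyPolicy:
    """Epsilon-greedy bandit over collaboration topologies -- a toy
    sketch of an RL topology selector, not BAMAS's actual policy."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {t: 0 for t in TOPOLOGIES}
        self.values = {t: 0.0 for t in TOPOLOGIES}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(TOPOLOGIES)  # explore
        return max(self.values, key=self.values.get)  # exploit

    def update(self, topology, reward):
        # Incremental mean of the rewards observed for this topology.
        self.counts[topology] += 1
        n = self.counts[topology]
        self.values[topology] += (reward - self.values[topology]) / n

random.seed(0)
policy = TopologyPolicy()
# Simulated environment: assume feedback yields the highest reward here.
true_reward = {"linear": 0.5, "star": 0.6, "feedback": 0.8}
for _ in range(500):
    t = policy.select()
    policy.update(t, true_reward[t] + random.gauss(0, 0.05))

print(max(policy.values, key=policy.values.get))
```

After a few hundred episodes the policy's value estimates converge and it exploits the topology that has paid off best, which is the data-driven adaptation the passage above describes.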

BAMAS incorporates three primary network topologies to facilitate multi-agent collaboration: Linear Topology, Star Topology, and Feedback Topology. Linear Topology arranges agents in a sequential chain, suitable for tasks requiring ordered processing or relaying of information. Star Topology designates a central agent to coordinate and distribute tasks to others, enabling efficient centralized control. Feedback Topology allows agents to iteratively refine solutions through reciprocal communication and assessment, beneficial for complex problem solving and consensus building. The selection of an appropriate topology is determined by the specific task demands and the capabilities of the agents involved, allowing BAMAS to dynamically adapt its communication structure for optimal performance.
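The three communication patterns can be captured as directed message-passing edges between agents. The helper below is purely illustrative (the function name and edge representation are not BAMAS's API); it shows how each topology constrains who talks to whom.

```python
def topology_edges(kind, agents):
    """Directed message-passing edges for the three collaboration
    patterns described above (illustrative, not BAMAS's actual API)."""
    if kind == "linear":
        # Sequential chain: each agent hands its output to the next.
        return [(agents[i], agents[i + 1]) for i in range(len(agents) - 1)]
    if kind == "star":
        # A central coordinator fans tasks out to every other agent.
        hub, spokes = agents[0], agents[1:]
        return [(hub, s) for s in spokes]
    if kind == "feedback":
        # Reciprocal links let agents critique each other's drafts.
        return [(a, b) for a in agents for b in agents if a != b]
    raise ValueError(f"unknown topology: {kind}")

agents = ["A", "B", "C"]
print(topology_edges("linear", agents))  # [('A', 'B'), ('B', 'C')]
print(topology_edges("star", agents))    # [('A', 'B'), ('A', 'C')]
```

Note how the edge count grows from linear (n-1) through star (n-1, but centralized) to feedback (n·(n-1)): richer coordination costs more LLM calls, which is exactly the trade-off the topology policy must weigh.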

Planner-Driven Topology in BAMAS facilitates centralized coordination by designating a single agent as the primary planner, responsible for task decomposition and assignment to other agents. This contrasts with topologies like Linear, Star, and Feedback, which prioritize parallel execution or iterative refinement of solutions through distributed agent interactions. In Planner-Driven Topology, agents execute plans received from the central planner, reducing inter-agent negotiation but potentially creating a single point of failure or bottleneck. The other supported topologies allow for more robust and scalable solutions, particularly in dynamic environments, by distributing the planning and execution responsibilities.

Putting BAMAS to the Test: Performance Across Diverse Benchmarks

BAMAS underwent rigorous evaluation across a spectrum of challenging datasets – GSM8K, MATH, and MBPP – to assess its capabilities in diverse problem-solving domains. This testing revealed a strong aptitude for grade school mathematics, as demonstrated by performance on GSM8K, alongside a capacity for advanced reasoning tasks inherent in the MATH dataset. Further, BAMAS showcased proficiency in practical coding challenges through its performance on the MBPP benchmark, which centers on Python programming problems. The results collectively indicate that BAMAS isn’t limited to a single skill, but rather exhibits a versatile intelligence capable of tackling quantitative and qualitative problems requiring both computational and logical thinking.

BAMAS demonstrates remarkable proficiency in grade school mathematics, achieving 95.3% accuracy on the challenging GSM8K dataset with a budget of 1625. This performance places it in direct competition with AutoGen, a leading model in the field, which achieves a slightly higher score of 95.4%. The close proximity of these results highlights BAMAS’s ability to effectively solve complex, multi-step math problems commonly encountered in primary and secondary education, showcasing a significant advancement in automated reasoning and problem-solving capabilities within this domain.

Evaluations on the MBPP dataset reveal BAMAS achieves a compelling 82.6% accuracy in solving Python programming problems, placing it on par with current state-of-the-art models which average 82.2%. This performance indicates BAMAS possesses a robust capacity for code generation and logical reasoning necessary to tackle intermediate-level programming challenges. The close alignment with leading benchmarks underscores the model’s potential as a practical tool for automated software development and problem-solving, suggesting it can reliably produce functional and correct code for a wide range of tasks.

Recent evaluations demonstrate that BAMAS achieves a leading accuracy of 81.2% on the challenging MATH dataset, given a computational budget of 2000. This performance notably surpasses that of existing approaches, which currently average 77.6% accuracy on the same benchmark. The MATH dataset requires complex, multi-step reasoning to solve mathematical problems, highlighting BAMAS’s advanced capabilities in this domain. This improved performance suggests a substantial step forward in the ability of AI models to tackle sophisticated mathematical challenges, potentially offering new tools for both educational and research applications.

A key indicator of BAMAS’s robust performance lies in its remarkable budgetary efficiency and task completion rate. Across diverse benchmark datasets, the model demonstrated an exceptionally low incidence of exceeding allocated computational resources – notably, completing all tasks within budget on the GSM8K dataset. This signifies BAMAS’s ability to effectively prioritize and allocate its reasoning capacity, preventing unnecessary computation and ensuring reliable results even with constrained resources. The model’s consistent performance within budgetary limits distinguishes it from many contemporary systems, highlighting its practical applicability and potential for real-world deployment where resource management is critical.

The pursuit of elegant architectures in multi-agent systems, as detailed in this work on BAMAS, inevitably encounters the harsh realities of production. The framework’s focus on budget constraints and cost-efficient LLM selection highlights a pragmatic compromise – a system designed not for theoretical perfection, but for survival within resource limitations. As Marvin Minsky observed, “Problems that seem almost impossible can often be solved by breaking them into smaller parts.” BAMAS embodies this principle, dissecting the complex challenge of multi-agent collaboration into manageable components of cost and performance, acknowledging that even the most optimized topology will, one day, be optimized back into a new compromise.

What’s Next?

The pursuit of cost-efficient multi-agent systems, as exemplified by BAMAS, will inevitably encounter the realities of scale. The current formulation, reliant on integer linear programming for topology selection, hints at computational bottlenecks as agent numbers increase. One anticipates a shift towards heuristic approximations, trading optimality for speed, a familiar compromise. The elegance of optimizing LLM collaboration risks being overshadowed by the sheer unpredictability of LLM behavior itself; tests are a form of faith, not certainty.

A critical, largely unaddressed problem lies in the dynamic nature of budgetary constraints. BAMAS appears to assume a static cost model. Production rarely affords such luxuries. Future work must grapple with fluctuating LLM pricing, variable compute costs, and the ever-present possibility of a critical service becoming unexpectedly expensive. A system that performs beautifully on a benchmark today may simply cease to function tomorrow.

Ultimately, the true measure of this framework, and others like it, won’t be its theoretical performance, but its resilience. A system designed to minimize cost is only valuable if it doesn’t crash on Mondays. The field will likely see a divergence: one path towards increasingly sophisticated optimization, the other towards robust, albeit imperfect, systems that simply work in the face of inevitable chaos.


Original article: https://arxiv.org/pdf/2511.21572.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2025-11-30 20:55