Author: Denis Avetisyan
A new approach to inventory management combines the reasoning power of large language models with the precision of operations research to dramatically improve efficiency.
Hybrid agentic workflows leveraging human-in-the-loop decision support achieve over 30% cost reduction in inventory control compared to end-to-end AI solutions.
Despite advances in operations research, effective inventory control remains a challenge for many businesses lacking specialized expertise. This paper, ‘Ask, Clarify, Optimize: Human-LLM Agent Collaboration for Smarter Inventory Control’, investigates a hybrid approach leveraging Large Language Models (LLMs) not as direct solvers, but as intelligent interfaces that decouple semantic understanding from rigorous mathematical optimization. Our results demonstrate that this agentic framework reduces total inventory costs by over 30% compared to end-to-end LLM solutions, revealing a fundamental bottleneck in computational reasoning rather than informational access. Could this paradigm – separating natural language interaction from solver-based policies – unlock broader accessibility to advanced decision support systems across diverse operational domains?
The Illusion of Predictability in Inventory Control
Conventional inventory policies frequently operate on the basis of predictable demand and consistent lead times – assumptions rarely met in practice. These methods often treat demand as a stable average, overlooking natural fluctuations and seasonal variations, while lead times – the delay between ordering and receiving goods – are typically estimated as fixed values. This simplification, though easing calculations, leads to suboptimal outcomes; policies designed for a static environment struggle when confronted with real-world volatility. Consequently, businesses may find themselves consistently overestimating or underestimating required stock levels, resulting in increased holding costs, expedited shipping expenses, and the potential for lost sales due to stockouts – a clear indication that reliance on these simplified models can significantly impact profitability and customer satisfaction.
Conventional inventory control systems frequently falter when confronted with the unpredictable nature of modern supply chains. Demand fluctuates due to evolving consumer preferences, unforeseen events, and promotional activities, while lead times – the duration between order placement and receipt – are rarely fixed, impacted by transportation delays, supplier issues, and geopolitical instability. This inherent uncertainty forces businesses to either maintain excessively large safety stocks to mitigate stockout risks – tying up capital and incurring storage costs – or risk lost sales and damaged customer relationships due to insufficient inventory. The consequences extend beyond immediate financial losses, potentially eroding brand loyalty and market share as competitors with more responsive supply chains gain an advantage. Ultimately, reliance on deterministic models in a stochastic world leads to increased operational expenses and a heightened vulnerability to disruptions.
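As a concrete illustration of what the deterministic assumption costs, the sketch below (with hypothetical demand parameters) compares a reorder point covering only average lead-time demand against one with a safety-stock buffer; under normally distributed demand, the former stocks out in roughly half of all replenishment cycles.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical parameters: daily demand ~ N(mu, sigma), fixed lead time in days.
mu, sigma, lead_time = 100.0, 30.0, 4
service_level = 0.95  # target probability of no stockout during lead time

# Deterministic reorder point: covers only the *average* lead-time demand,
# so actual demand exceeds it about half the time.
rop_deterministic = mu * lead_time

# Stochastic reorder point: adds safety stock sized to demand variability.
z = NormalDist().inv_cdf(service_level)
safety_stock = z * sigma * sqrt(lead_time)
rop_stochastic = rop_deterministic + safety_stock

print(round(rop_deterministic), round(rop_stochastic))  # 400 vs 499
```

The gap between the two reorder points is exactly the capital a naive policy either fails to hold (risking stockouts) or holds for the wrong reasons.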
A Framework for Intelligent Inventory Decisions
The Hybrid Agentic Framework integrates the capabilities of Large Language Models (LLMs) and Operations Research (OR) solvers to create a decision-support system. LLMs provide semantic understanding, enabling the interpretation of complex, often unstructured, inputs and the extraction of relevant information. This information is then passed to OR solvers, which apply rigorous mathematical optimization techniques to identify optimal or near-optimal solutions. By separating semantic understanding from policy computation, the framework leverages the strengths of both approaches: the flexibility and natural language processing abilities of LLMs, and the guaranteed optimality and scalability of OR methods. This combined approach aims to address complex decision-making problems where both qualitative understanding and quantitative optimization are critical.
Separating information extraction and policy computation within the Hybrid Agentic Framework facilitates improved system flexibility and scalability. Traditionally, many decision-support systems integrate these functions, creating a monolithic structure that limits adaptability to changes in data sources or decision-making logic. By decoupling these processes, the framework allows for independent updates and modifications to either the information extraction component – responsible for processing natural language inputs and identifying relevant data – or the policy computation component, which utilizes Operations Research solvers to determine optimal actions. This modularity supports easier integration of new data sources, algorithms, and problem domains without requiring a complete system overhaul, and enables parallel processing of these distinct tasks for enhanced performance and scalability with increasing data volumes and computational demands.
The Natural Language Interface (NLI) facilitates user interaction with the Hybrid Agentic Framework by translating human language into a structured format understandable by the system. This is achieved through the use of Large Language Models (LLMs) which parse user requests expressed in natural language, identifying key entities, relationships, and intended actions. The NLI then converts this information into a standardized query or set of parameters suitable for input into the Operations Research (OR) solvers. This decoupling of input method from the underlying computational engine allows for greater accessibility and ease of use, enabling users without specialized knowledge of OR techniques to leverage the framework’s capabilities. The NLI supports a range of input types, including questions, commands, and declarative statements, and provides a feedback mechanism to clarify ambiguous requests or confirm understanding.
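A minimal sketch of the NLI’s contract, assuming the LLM is prompted to emit JSON: the model call itself is omitted and the field names are illustrative, but the pattern – validating free-form model output into a typed structure before it reaches the solver – is the key point.

```python
import json
from dataclasses import dataclass

@dataclass
class InventoryQuery:
    """Structured parameters handed from the NLI to the OR solver."""
    holding_cost: float
    stockout_cost: float
    mean_demand: float
    lead_time_days: int

def parse_llm_output(raw: str) -> InventoryQuery:
    """Validate the JSON an LLM is prompted to emit for a user request.
    Raises on missing or malformed fields instead of passing bad data on."""
    fields = json.loads(raw)
    return InventoryQuery(
        holding_cost=float(fields["holding_cost"]),
        stockout_cost=float(fields["stockout_cost"]),
        mean_demand=float(fields["mean_demand"]),
        lead_time_days=int(fields["lead_time_days"]),
    )

# e.g. a hypothetical model response to: "We pay $2/unit/week to hold stock,
# a lost sale costs about $15, we sell ~120 units/day, resupply takes 5 days."
raw = '{"holding_cost": 2.0, "stockout_cost": 15.0, "mean_demand": 120, "lead_time_days": 5}'
query = parse_llm_output(raw)
print(query)
```

The validation layer is what keeps a hallucinated or malformed extraction from silently reaching the optimization stage.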
Anchoring Decisions in Rigor: Mitigating Uncertainty
Large Language Models (LLMs), despite their proficiency in natural language processing, are susceptible to generating factually incorrect or illogical statements, commonly referred to as ‘hallucinations’. This inherent limitation poses a significant risk when applying LLMs to decision-making processes, particularly in areas like inventory management. Incorrect data generated by an LLM can directly translate into suboptimal inventory policies, leading to outcomes such as overstocking, stockouts, and increased operational costs. The probability of hallucination is not consistently predictable and can vary based on the complexity of the request and the training data used to develop the LLM. Therefore, relying solely on LLM-generated recommendations without validation can compromise the reliability and effectiveness of inventory control systems.
The system employs a rigorous optimization process utilizing Operations Research (OR) solvers to generate inventory policies. These solvers, based on established mathematical principles, formulate the inventory management problem as a defined objective function subject to a set of constraints reflecting factors such as demand, capacity, and costs. The optimization process identifies the policy that maximizes or minimizes this objective function – for example, minimizing total cost or maximizing service level – while strictly adhering to the specified constraints. This approach ensures that all recommendations are mathematically sound and demonstrably optimal given the input data, mitigating the risk of illogical or infeasible policies that could arise from LLM-generated outputs alone. The solvers used can include linear programming, mixed-integer programming, and other suitable techniques depending on the complexity of the inventory problem.
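For a single-item, single-period case, the optimization step reduces to the classical newsvendor formula; the sketch below (costs are hypothetical) shows how the solver side produces a provably cost-minimizing base-stock level rather than a plausible-sounding guess.

```python
from statistics import NormalDist

def optimal_base_stock(mu, sigma, underage_cost, overage_cost):
    """Newsvendor solution: the base-stock level minimizing expected
    holding-plus-stockout cost under normally distributed demand.
    Optimal level is the critical-fractile quantile of the demand CDF."""
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    return NormalDist(mu, sigma).inv_cdf(critical_ratio)

# Hypothetical costs: losing a sale (underage) costs 9, holding a leftover
# unit (overage) costs 1, so the optimum covers the 90th demand percentile.
s_star = optimal_base_stock(mu=500, sigma=80, underage_cost=9, overage_cost=1)
print(round(s_star))  # well above the mean of 500
```

Unlike an LLM-generated number, this quantity is derivable, auditable, and guaranteed optimal given the inputs.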
The system incorporates a ‘Human Imitator’ – a Large Language Model (LLM) that has been fine-tuned to emulate the decision-making processes of a rational inventory manager. This LLM doesn’t directly generate policies, but instead provides realistic and plausible inputs – such as demand forecasts, cost estimations, and risk tolerances – to the underlying optimization solver. By simulating a human expert, the Human Imitator introduces constraints and considerations that might otherwise be absent in a purely data-driven optimization, resulting in more practical and implementable inventory control policies. The use of a fine-tuned LLM allows for the injection of nuanced, context-aware inputs, improving the robustness and reliability of the overall system by mitigating the impact of potentially unrealistic or incomplete data.
The system employs a tiered agent architecture for reliable control policy generation. The Information Extraction Agent first processes raw data inputs, ensuring data integrity and format consistency for subsequent analysis. This processed data is then fed to the Optimization Agent, which utilizes Operations Research (OR) solvers to compute optimal inventory policies based on predefined constraints and objectives. Finally, the Policy Interpretation Agent translates the computationally-derived policies – which may involve complex mathematical formulations – into clear, actionable guidance for implementation, bridging the gap between algorithmic output and practical application.
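The three-agent pipeline can be wired together as in this minimal sketch, in which each agent is reduced to a plain function and the extraction step is a stub standing in for the LLM call; names and parameters are illustrative, not the paper’s implementation.

```python
from statistics import NormalDist

def information_extraction_agent(raw_request: dict) -> dict:
    """Normalize and type-check inputs before they reach the solver."""
    return {
        "mu": float(raw_request["mean_daily_demand"]),
        "sigma": float(raw_request["demand_std"]),
        "service_level": float(raw_request.get("service_level", 0.95)),
    }

def optimization_agent(params: dict) -> float:
    """Compute a base-stock level meeting the requested service level."""
    return NormalDist(params["mu"], params["sigma"]).inv_cdf(params["service_level"])

def policy_interpretation_agent(level: float) -> str:
    """Translate the numeric policy into actionable guidance."""
    return f"Order up to {round(level)} units at each review."

request = {"mean_daily_demand": 200, "demand_std": 40, "service_level": 0.98}
advice = policy_interpretation_agent(
    optimization_agent(information_extraction_agent(request))
)
print(advice)
```

Each stage can be swapped independently – a different solver, a different LLM prompt – without disturbing its neighbors, which is the point of the tiered design.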
Beyond Optimization: Towards Adaptable and Scalable Solutions
The system accommodates a versatile spectrum of inventory management strategies, ranging from well-established techniques such as Base-Stock and the (s, S) policy – which define reorder points and order quantities – to cutting-edge solutions leveraging Deep Reinforcement Learning (DRL). This integration allows for a nuanced approach to inventory control; traditional methods provide a solid foundation, while DRL introduces adaptability and optimization in dynamic environments. By employing algorithms that learn through trial and error, the framework can discover policies that surpass the performance of static, pre-defined rules, particularly when facing unpredictable demand or supply chain disruptions. This combination ensures both stability and responsiveness, enabling businesses to maintain optimal inventory levels and minimize associated costs.
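To make the (s, S) policy concrete, here is a small simulation under simplifying assumptions (hypothetical uniform demand, immediate delivery, lost sales): whenever inventory falls to the reorder point s or below, the policy orders up to S.

```python
import random

def simulate_sS(s, S, periods=10_000, holding=1.0, stockout=9.0, seed=0):
    """Average per-period cost of an (s, S) policy: reorder up to S
    whenever stock is at or below s (delivery assumed immediate)."""
    rng = random.Random(seed)
    inventory, total_cost = S, 0.0
    for _ in range(periods):
        if inventory <= s:
            inventory = S                  # order up to S
        demand = rng.randint(80, 120)      # hypothetical uniform demand
        sold = min(inventory, demand)
        total_cost += stockout * (demand - sold)  # lost-sales penalty
        inventory -= sold
        total_cost += holding * inventory         # end-of-period holding
    return total_cost / periods

# A wider (s, S) gap trades reorder frequency against holding cost.
print(round(simulate_sS(s=120, S=300), 1))
```

Note that with s set at or above the maximum per-period demand, the policy never stocks out, so all remaining cost is holding cost.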
Deep Reinforcement Learning serves as a core component, enabling the system to navigate the intricacies of inventory management in realistically uncertain conditions. Building upon established techniques like Dynamic Programming and Approximate Stochastic Control, this approach moves beyond pre-programmed rules to allow the system to learn optimal policies through trial and error. Instead of relying on static models, the agent interacts with a simulated environment, receiving rewards or penalties based on its decisions – effectively training itself to minimize costs and maximize efficiency. This adaptive learning capability is particularly valuable in dynamic markets where demand fluctuates and supply chains are subject to disruption, allowing the system to continually refine its strategies and maintain optimal performance even as conditions change.
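The trial-and-error idea can be illustrated with tabular Q-learning on a toy inventory problem; this is a sketch of the learning mechanism only, not the paper’s DRL method, and all reward coefficients are hypothetical.

```python
import random

random.seed(1)
CAP, ACTIONS = 10, range(11)   # states: on-hand stock 0..10; actions: order 0..10
Q = {(s, a): 0.0 for s in range(CAP + 1) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(stock, order):
    """One period: receive order, observe demand, earn margin minus
    holding and ordering costs (all coefficients hypothetical)."""
    stock = min(CAP, stock + order)
    demand = random.randint(0, 6)
    sold = min(stock, demand)
    reward = 5.0 * sold - 1.0 * (stock - sold) - 2.0 * order
    return stock - sold, reward

state = 0
for _ in range(50_000):
    # epsilon-greedy exploration
    if random.random() < eps:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt, r = step(state, action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
    state = nxt

# Greedy learned policy: how much to order at each stock level.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(CAP + 1)}
print(policy)
```

No reorder point is ever specified; the agent discovers one from reward feedback alone, which is what makes the approach attractive when demand dynamics are unknown or shifting.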
The framework’s architecture intentionally separates the processes of interpreting business requirements – the ‘semantic understanding’ – from the complex calculations needed to determine optimal inventory levels. This decoupling is central to its scalability; as market dynamics shift or business needs evolve, only the semantic interpretation component requires modification, leaving the underlying mathematical optimization engine untouched. This modular design allows the system to readily adapt to new products, fluctuating demand patterns, or altered cost structures without requiring a complete overhaul of the optimization algorithms. Consequently, the framework isn’t simply a static solution, but a dynamic, responsive system capable of maintaining efficiency and cost-effectiveness in an ever-changing commercial landscape.
Rigorous testing of the proposed hybrid agentic framework revealed a substantial reduction in total inventory costs, achieving a 32.1% decrease when compared to a baseline system utilizing an interactive GPT-4o model. This outcome signifies a marked improvement in cost efficiency, suggesting the framework’s ability to optimize inventory management beyond current state-of-the-art approaches. The observed reduction isn’t merely statistical; it translates directly into significant savings for businesses grappling with complex supply chains and fluctuating demand. This performance underscores the potential of the framework to not only adapt to, but proactively improve upon, existing inventory control strategies, offering a pathway to leaner operations and increased profitability.
Rigorous testing of the proposed framework revealed a compelling performance advantage, achieving a 97.1% win rate against a strong interactive baseline powered by GPT-4o. This consistently high success rate demonstrates the system’s capacity to not merely suggest improvements, but to reliably formulate inventory control policies that yield superior outcomes across a broad spectrum of simulated scenarios. The substantial margin of victory indicates a fundamental optimization in the approach, moving beyond incremental gains to establish a dominant strategy in managing inventory costs and responsiveness to fluctuating demand. This level of consistent outperformance suggests the framework possesses a robust adaptability and offers a tangible path towards significantly enhanced operational efficiency.
The pursuit of optimized inventory control, as detailed within this study, benefits from a deliberate separation of cognitive tasks. The framework’s success hinges on isolating semantic understanding – a domain readily addressed by large language models – from the rigorous demands of stochastic optimization. This echoes Isaac Newton’s sentiment: “If I have seen further it is by standing on the shoulders of giants.” The current work builds upon established operations research techniques – the ‘giants’ – and leverages LLMs not to replace them, but to enhance their utility. The 30% cost reduction demonstrates that clarity – in this case, a clear division of labor – yields quantifiable improvements, and represents a mercy to complex logistical systems.
The Road Ahead
The demonstrated decoupling of linguistic interpretation from numerical optimization offers a clarifying principle. The prevailing tendency to demand end-to-end solutions from large language models – to have them do rather than inform – now appears increasingly profligate. The thirty percent cost reduction achieved here isn’t merely a quantitative improvement; it’s an argument for intellectual economy. The value lies not in the complexity added, but in the unnecessary layers removed.
Remaining, however, is the stubborn reality of data dependence. The framework, while robust, still requires historical data to calibrate the optimization solvers. Future work must address the challenge of ‘cold start’ scenarios, where such data is scarce or nonexistent. More fundamentally, the division of labor between language model and solver feels… provisional. A true synthesis – an architecture where linguistic nuance directly informs, and constrains, the optimization process – remains elusive.
Ultimately, the question isn’t whether large language models can manage inventory, but whether they can help humans understand the fundamental uncertainties inherent in any complex system. The true measure of progress will not be cost reduction alone, but a willingness to embrace the irreducible complexity, and to design solutions that acknowledge, rather than attempt to eliminate, it.
Original article: https://arxiv.org/pdf/2601.00121.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/