Reasoning Without Retraining: A New Approach to AI Planning

Author: Denis Avetisyan


Researchers have developed a framework that empowers AI agents to plan and act more effectively by leveraging past experiences without requiring costly model updates.

The system navigates complex decision-making through a framework where Monte Carlo Tree Search identifies promising reasoning paths, distilling these into foundational, context-independent elements, softly hinted at by the current state, and ultimately grounded into decisive actions [latex]a_{t}[/latex].

SGA-MCTS decouples planning from execution using offline Monte Carlo Tree Search and reusable, de-lexicalized State-Goal-Action atoms, enabling efficient and robust tool use.

Complex, multi-step reasoning remains a significant challenge for large language model (LLM) agents, often requiring a trade-off between computationally expensive search-based planning and the limited generalization of supervised fine-tuning. To address this, we introduce ‘SGA-MCTS: Decoupling Planning from Execution via Training-Free Atomic Experience Retrieval’, a novel framework that casts LLM planning as non-parametric retrieval via offline Monte Carlo Tree Search (MCTS) and reusable, de-lexicalized State-Goal-Action (SGA) atoms. This approach enables frozen, open-weight models to achieve System 2 reasoning depth at System 1 inference speeds, matching state-of-the-art performance without task-specific training. Could this decoupling of planning and execution unlock truly scalable and real-time autonomous agents capable of tackling increasingly complex real-world problems?


The Fragility of Pattern Recognition: LLM Agents and the Limits of Memory

Large Language Model (LLM) agents, despite exhibiting remarkable capabilities, frequently encounter limitations when tasked with intricate, multi-step problem-solving that demands tool utilization. These agents primarily operate by retrieving and applying patterns learned during their extensive training – a process known as parametric knowledge. However, this approach struggles when faced with novel scenarios or tool combinations not explicitly encountered before. The inherent reliance on memorized associations hinders their ability to adapt to unforeseen circumstances or creatively sequence tools to achieve a goal. Consequently, even relatively simple tasks requiring nuanced planning or iterative refinement can prove challenging, as the agents lack the capacity to reason beyond the confines of their pre-existing knowledge base and often fail to generalize learned behaviors to new contexts.

Current large language model (LLM) agents frequently exhibit a limited capacity to transfer learned reasoning skills to novel situations. While proficient within the specific tasks they were trained on, performance often degrades substantially when confronted with even minor variations in problem framing or environmental context. This brittleness stems from a reliance on memorized patterns rather than genuine understanding of underlying principles; agents struggle to adapt previously successful strategies to new challenges requiring nuanced application or creative problem-solving. Consequently, developing agents capable of robust generalization – that is, applying reasoning flexibly across diverse tasks and environments – remains a significant hurdle in achieving truly intelligent and adaptable artificial intelligence.

Current large language model (LLM) agents frequently intertwine the processes of planning and execution, leading to brittle performance when faced with unexpected obstacles or complex scenarios. This fusion limits their ability to adapt and recover from errors during task completion. Separating these functions – allowing the agent to first devise a comprehensive plan and then execute it step-by-step – offers a pathway to more robust problem-solving. A decoupled architecture enables agents to evaluate potential actions before committing to them, anticipate challenges, and dynamically adjust their plans as needed. This approach mirrors human cognitive strategies, where deliberate planning precedes action, and allows for a more efficient allocation of resources and a greater capacity to generalize learned skills across diverse environments. By prioritizing planning as a distinct phase, LLM agents can move beyond simply recalling patterns to truly reasoning through problems.

SGA-MCTS: Abstracting Reasoning Through Experience

SGA-MCTS establishes a computational framework that separates the processes of planning and execution, mirroring the Dual-Process Theory of Cognition which posits two distinct systems: a fast, intuitive system and a slower, deliberative system. In this model, planning is handled through Monte Carlo Tree Search (MCTS) which explores potential action sequences, while execution is abstracted from these explorations. This decoupling enables the system to reuse previously computed plans and adapt to new situations without re-planning from scratch. The framework aims to improve efficiency and generalization by isolating the reasoning process from the specifics of the environment or task, allowing for the transfer of learned knowledge across different scenarios.
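As a minimal sketch of the decoupled plan-then-execute loop described above, the snippet below uses hypothetical stand-ins for the slow, deliberative planner and the fast executor; neither function reflects the paper's actual implementation.

```python
# Sketch of decoupled planning vs. execution. `offline_plan` stands in for
# the slow, deliberative search (System 2); `execute_step` is the fast,
# reactive executor (System 1). Both are illustrative assumptions.
def offline_plan(task):
    # Placeholder for MCTS: return a fixed action sequence for the task.
    return ["lookup", "compute", "answer"]

def execute_step(action, context):
    # Fast executor: apply one action to the running context.
    return context + [action]

def run(task):
    plan = offline_plan(task)   # planning happens once, offline
    context = []
    for action in plan:         # execution is cheap and sequential
        context = execute_step(action, context)
    return context

print(run("demo"))  # ['lookup', 'compute', 'answer']
```

The point of the separation is that `offline_plan` can be arbitrarily expensive without slowing down the per-step execution loop.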

SGA-MCTS utilizes State-Goal-Action (SGA) Atoms to represent abstracted execution traces, effectively decoupling planning from the specifics of execution. These atoms consist of a pre-condition state, a desired goal state, and the action that transitions between them. Critically, this abstraction process removes surface-level details like object identities, precise coordinates, or timing information, focusing instead on the underlying causal relationship between states and actions. The resulting SGA atoms are therefore generalizable and reusable across diverse scenarios, representing a core causal logic independent of specific environmental configurations. This allows the system to reason about what needs to be achieved, rather than how it is achieved in a particular instance.
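As a rough illustration, an SGA atom can be thought of as a typed triple of pre-condition state, goal, and action. The field names and slot syntax below are assumptions for exposition, not the paper's actual schema.

```python
from dataclasses import dataclass

# Hypothetical representation of a State-Goal-Action (SGA) atom.
# Slot syntax like <city:CITY> is invented for illustration.
@dataclass(frozen=True)
class SGAAtom:
    state: str   # abstracted pre-condition
    goal: str    # abstracted desired outcome
    action: str  # action schema with typed slots instead of literals

atom = SGAAtom(
    state="user requested weather for <city:CITY>",
    goal="forecast for <city:CITY> is known",
    action="call_weather_api(location=<city:CITY>)",
)
print(atom.action)  # call_weather_api(location=<city:CITY>)
```

Because the atom contains no concrete city, the same triple applies to any weather query, which is exactly the reuse the abstraction is after.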

The abstraction step transforms Monte Carlo Tree Search (MCTS) trajectories into reusable reasoning primitives by identifying and generalizing frequently occurring state-action patterns. It extracts sequences of states, goals, and actions from MCTS simulations and represents them as abstract ‘SGA Atoms’. A schema component provides a set of predefined relationships that guide the generalization process, enabling the system to recognize isomorphic situations despite variations in surface-level details. By applying these schemas, the abstraction effectively compresses the raw MCTS exploration data into a more concise and transferable format, facilitating the reuse of learned reasoning patterns across different tasks and environments.

SGA-MCTS demonstrates significant data compression capabilities by representing MCTS exploration trajectories as reusable State-Goal-Action (SGA) atoms. Empirical results indicate a compression ratio of 6.9x when comparing the storage requirements of raw MCTS data to the abstracted SGA representation. This compression is achieved by eliminating redundant information present in multiple MCTS rollouts and retaining only the essential causal relationships defined by the SGA atoms, leading to a substantial reduction in memory footprint without loss of reasoning capability.

A decreasing relationship between ΔPass Rate and dataset tool familiarity indicates that SGA-MCTS excels at generalizing abstract reasoning to novel tools in out-of-distribution scenarios.

From Experience to Abstraction: De-lexicalization and Retrieval

De-lexicalization is a process used to enhance the generalization capabilities of reasoning systems by abstracting away from specific details within execution data. This is achieved by systematically replacing concrete entities – such as specific object names or locations – with typed slots that indicate the entity’s role or category. For example, instead of recording “move the red block to the table,” a de-lexicalized representation would record “move the block to the surface.” This substitution allows the system to reason about actions and relationships independent of particular instances, facilitating the application of learned patterns to novel situations with different, but structurally similar, entities. The resulting generalized patterns are more robust and adaptable than those based on specific, instance-level data.
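A toy sketch of de-lexicalization, assuming a simple entity-to-slot lookup table; the table contents and slot names are invented for illustration, not taken from the paper.

```python
import re

# Hypothetical entity table mapping concrete entities to typed slots.
ENTITY_TYPES = {
    "red block": "<OBJECT>",
    "blue block": "<OBJECT>",
    "table": "<SURFACE>",
    "shelf": "<SURFACE>",
}

def delexicalize(trace: str) -> str:
    # Replace each known concrete entity with its typed slot.
    for entity, slot in ENTITY_TYPES.items():
        trace = re.sub(re.escape(entity), slot, trace)
    return trace

print(delexicalize("move the red block to the table"))
# move the <OBJECT> to the <SURFACE>
```

Both "move the red block to the table" and "move the blue block to the shelf" collapse to the same generalized pattern, which is what lets a single stored experience cover structurally similar situations.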

Action de-lexicalization improves generalization capabilities by replacing specific parameters within action definitions with typed slots. This masking of parameters allows a single, generalized action schema to be applied across diverse scenarios with varying object instances. Rather than requiring a separate action definition for each specific object, the system can infer the correct instantiation at runtime based on the current context and available slot fillers. This parameter abstraction increases the robustness and adaptability of the system, enabling it to perform tasks with novel or previously unseen objects without requiring retraining or modification of the core action set.
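The inverse step, re-instantiating a generalized action schema at runtime, might look like the following sketch; the schema syntax and helper name are assumptions for illustration.

```python
# Hypothetical runtime slot filling: one generalized action schema is
# instantiated with whatever bindings the current context provides.
def instantiate(schema: str, bindings: dict) -> str:
    for slot, value in bindings.items():
        schema = schema.replace(slot, value)
    return schema

schema = "move(<OBJECT>, <SURFACE>)"
print(instantiate(schema, {"<OBJECT>": "green mug", "<SURFACE>": "desk"}))
# move(green mug, desk)
```

The same `schema` serves any object-surface pair, so novel objects require only new bindings, not a new action definition.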

Hybrid Symbolic-Semantic Retrieval addresses the challenge of efficiently identifying pertinent State-Goal-Action (SGA) atoms for task completion by integrating both semantic understanding and symbolic feasibility checks. This approach moves beyond purely symbolic or semantic methods by leveraging semantic embeddings to initially narrow the search space to actions conceptually related to the current task. Subsequently, a symbolic reasoning component validates these semantically similar actions based on their feasibility within the current environment state and task constraints. This dual-filtering process significantly improves retrieval accuracy and reduces computational overhead compared to methods relying solely on either symbolic or semantic analysis, enabling the selection of the most relevant SGA atoms for subsequent action generation.
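One way to sketch the two-stage filter is with toy bag-of-words "embeddings" and a stand-in feasibility predicate; both are assumptions for illustration, not the paper's retrieval stack.

```python
import math

def embed(text):
    # Toy embedding: a bag-of-words vector as a dict of word weights.
    return {w: 1.0 for w in text.lower().split()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, atoms, feasible, k=2):
    # Stage 1: semantic filter, ranking atoms by similarity to the query.
    ranked = sorted(atoms, key=lambda a: cosine(embed(query), embed(a)),
                    reverse=True)
    # Stage 2: symbolic filter, keeping only feasible atoms.
    return [a for a in ranked if feasible(a)][:k]

atoms = ["fetch weather forecast", "send email message", "fetch stock price"]
hits = retrieve("weather forecast today", atoms,
                feasible=lambda a: "email" not in a)
print(hits[0])  # fetch weather forecast
```

The semantic pass keeps the candidate list short; the feasibility pass then discards anything the current state cannot actually support.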

The implemented retrieval mechanism demonstrates a substantial reduction in token consumption during task execution, achieving a 76% decrease compared to baseline models reliant on extensive reasoning processes. Specifically, this translates to an average of approximately 2,080 fewer tokens required per task. Beyond efficiency gains, the retrieved information functions as “soft hints” during action generation, guiding the model’s selection of appropriate actions without rigidly prescribing a specific outcome. This approach balances computational cost with performance, enabling more efficient and adaptable task completion.

Unlike raw text retrieval, which suffers performance degradation with increasing [latex]K[/latex] due to noise, our de-lexicalized approach demonstrates robust and sustained performance as the retrieval size [latex]K[/latex] increases.

Validation and Broad Applicability Across Benchmarks

SGA-MCTS demonstrates a notable capacity for navigating intricate task landscapes, as evidenced by its successful implementation and rigorous testing on demanding benchmarks such as StableToolbench and ToolHop. These platforms present agents with complex challenges requiring sequential decision-making and skillful tool utilization – scenarios designed to push the boundaries of current LLM agent capabilities. The framework’s performance across these benchmarks isn’t merely functional; it signifies an ability to generalize learned strategies and adapt to diverse problem structures, suggesting a core strength in its approach to planning and execution. This successful application validates the design principles of SGA-MCTS and highlights its potential for broader deployment in real-world applications requiring intelligent automation and problem-solving.

SGA-MCTS exhibits notable capabilities in navigating the intricacies of multi-turn dialogues, as evidenced by its strong performance on the BFCL v3 benchmark. This challenging test assesses an agent’s ability to accurately track the evolving state of a conversation, requiring it to remember previous interactions and infer user intent across multiple exchanges. The framework’s success on BFCL v3 indicates a capacity for maintaining contextual awareness, a critical element for building effective and engaging conversational agents. Unlike systems reliant on memorizing entire dialogue histories, SGA-MCTS efficiently manages state through its decoupled planning and execution process, allowing it to focus on relevant information and adapt to changing conversational dynamics with greater precision.

SGA-MCTS enhances the capabilities of Large Language Model (LLM) agents by strategically separating the processes of planning and execution. Traditionally, LLM agents often intertwine these functions, leading to brittle performance when faced with unforeseen circumstances or complex tasks. This framework, however, allows for more deliberate consideration of potential action sequences before any action is taken, and independent refinement of each step during execution. By isolating these two critical phases, SGA-MCTS fosters a system that is not only more adaptable to dynamic environments but also exhibits greater generalizability across diverse toolsets and benchmarks, effectively building agents capable of robust problem-solving rather than simply memorizing specific solutions.

Evaluations on demanding benchmarks reveal that SGA-MCTS attains a 44.79% success rate, demonstrating a substantial 13.86% performance gain over its initial, zero-shot configuration. This achievement is particularly noteworthy as the framework reaches a level of performance comparable to that of closed-source, proprietary systems such as GPT-5, all without requiring any adjustments to its underlying parameters. The results suggest a significant step towards creating more capable and adaptable LLM agents through decoupled planning and execution, offering a competitive level of performance achieved through algorithmic innovation rather than sheer model scale.

Evaluations reveal a clear correlation between experience volume and performance gains within the SGA-MCTS framework; initial success rates of 42.3% – achieved with just two training examples – steadily increased to 45.4% as the number of examples grew to 246. This progression underscores the data efficiency inherent in the system’s core ‘SGA atoms’, demonstrating an ability to rapidly learn and improve from relatively limited experiential data. The observed performance lift highlights a key advantage of this approach, suggesting that even modest increases in training data can yield substantial gains in the robustness and reliability of LLM-based agent tool use.

SGA atoms exhibit strong data efficiency and continued performance gains, improving from 42.3% with limited experience ([latex]N=2[/latex]) to 45.4% with expanded data ([latex]N=246[/latex]).

The pursuit of adaptable systems, as demonstrated by SGA-MCTS, echoes a fundamental truth about complexity. This framework, which decouples planning from execution via reusable experience atoms, isn’t about avoiding the accrual of technical debt – the inevitable cost of simplification – but about managing it proactively. As John von Neumann observed, “There is no possibility of absolute certainty in this world.” The system acknowledges inherent uncertainty by externalizing experience, allowing the agent to draw upon a reservoir of past interactions rather than relying solely on parametric learning. This echoes the concept of non-parametric learning within the article, where the system’s ‘memory’ – its experience replay buffer – dictates its adaptability, accepting that perfect foresight is unattainable and graceful decay is the goal.

What Lies Ahead?

The decoupling of planning and execution, as demonstrated by SGA-MCTS, offers a temporary reprieve, not a resolution. Systems invariably degrade; the question isn’t if stored experience will become irrelevant, but when. The current reliance on offline MCTS for experience generation feels contained. A static archive, however extensive, eventually becomes a fossil record, incapable of adapting to genuinely novel situations. The elegance of de-lexicalization only delays the inevitable drift between representation and reality.

Future work will likely focus on dynamic experience acquisition – systems that continuously refine their atomic representations, not through fine-tuning the language model itself, but through increasingly sophisticated methods of self-observation and contextual recalibration. But even this feels like rearranging deck chairs. The core challenge remains: how to maintain a useful model of the world when the world refuses to remain consistent.

Stability, it seems, is often merely a prolonged period of unobserved decay. The pursuit of robust LLM agents isn’t about building systems that avoid error, but systems that fail gracefully, and perhaps, learn something from the wreckage. The true test won’t be flawless execution, but the capacity to absorb the entropy inherent in complex environments.


Original article: https://arxiv.org/pdf/2604.14712.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
