Author: Denis Avetisyan
A new system decouples pipeline logic from execution, allowing language model workflows to dynamically adapt based on semantic understanding.
![Credo enables the declaration of logical pipelines, beliefs, and policies via its APIs, after which an execution engine dynamically rewrites and enacts these pipelines, adapting to runtime beliefs and matched policies to ensure consistent and informed action: a process grounded in the principle that behavior stems from logically derived, belief-driven execution [latex] B \rightarrow P \rightarrow A [/latex].](https://arxiv.org/html/2604.14401v1/x2.png)
Credo enables adaptive LLM pipelines by separating logical definition from physical execution via declarative beliefs and policies.
While agentic AI systems increasingly navigate complex, evolving environments, their reliance on imperative control and ephemeral memory hinders adaptability and verifiability. This limitation motivates the development of ‘Credo: Declarative Control of LLM Pipelines via Beliefs and Policies’, a system that decouples pipeline logic from execution by representing semantic state as beliefs and regulating behavior through declarative policies. This approach enables dynamic, auditable, and composable LLM pipelines that adjust configurations at runtime without code modification. Could this belief-driven paradigm unlock truly robust and transparent agentic systems capable of continuous learning and reliable decision-making?
The Challenge of Complex Reasoning
Despite the remarkable capabilities of Large Language Models (LLMs) across a spectrum of applications, a persistent challenge lies in their ability to effectively address complex queries that demand multi-step reasoning. These models, while proficient at identifying patterns and generating text, often struggle when confronted with questions requiring the synthesis of information from multiple sources or the application of logical inference. The difficulty stems from the inherent limitations in how LLMs process and retain information, frequently leading to inaccuracies or incomplete responses when faced with tasks exceeding simple recall or pattern matching. This is not a matter of lacking data, but rather a difficulty in using the data to build a coherent and logical pathway to a correct answer, hindering their performance on tasks demanding more than superficial understanding.
Conventional information retrieval systems, such as those employing the BM25 algorithm, prioritize keyword matching and often struggle with the subtleties of complex queries. While computationally efficient, this approach frequently overlooks the contextual meaning and relationships between concepts, resulting in a relatively low accuracy rate of 45% when addressing multifaceted questions. In contrast, newer methodologies, like that demonstrated by Credo, emphasize a deeper understanding of semantic intent and achieve a significantly improved accuracy of 72% – representing a remarkable 27-point performance increase. This substantial gain highlights the limitations of purely lexical matching and underscores the growing importance of semantic understanding in modern information retrieval.
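The gap between lexical and semantic matching is easy to see in code. The sketch below implements the standard BM25 scoring formula in plain Python and shows it assigning zero relevance to a paraphrased but on-topic document; the query and documents are invented for illustration.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score tokenized documents against a query with BM25 (lexical only)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency for each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        s = 0.0
        for t in query_terms:
            tf = d.count(t)
            if tf == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "the central bank raised interest rates".split(),
    "borrowing costs went up after the fed decision".split(),
]
# A query phrased with different vocabulary scores 0.0 on the semantically
# relevant second document -- the core weakness of purely lexical retrieval.
print(bm25_scores("interest rate hike".split(), docs))
```

The second document answers the query but shares no query terms, so BM25 cannot rank it at all; a semantic retriever would.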

Constructing Adaptive LLM Pipelines: A Necessary Architecture
LLM pipelines establish a structured framework for information processing by chaining together individual language model calls and associated data transformations. These pipelines facilitate complex tasks beyond the capabilities of a single LLM interaction. Several systems aid in pipeline construction, notably LOTUS, which provides a modular and extensible architecture for defining and executing LLM-based workflows, and DocETL, a tool specifically designed for extracting, transforming, and loading data to and from LLMs. These tools offer features such as data ingestion, pre-processing, LLM integration, post-processing, and output formatting, enabling developers to build robust and scalable applications leveraging large language models.
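As a deliberately minimal illustration of such chaining, the sketch below composes stages as plain functions over a shared state dictionary. The stage names are hypothetical stand-ins, not the APIs of LOTUS or DocETL, and the "LLM call" is a stub.

```python
from typing import Callable, List

# A pipeline stage is a function from state to state.
Stage = Callable[[dict], dict]

def run_pipeline(stages: List[Stage], state: dict) -> dict:
    """Run each stage in order, threading the state dict through."""
    for stage in stages:
        state = stage(state)
    return state

# Stub stages standing in for ingestion, an LLM call, and output formatting.
def ingest(state):
    state["chunks"] = state["raw"].split(". ")
    return state

def llm_answer(state):
    # Placeholder for a real model call; here we just echo the first chunk.
    state["answer"] = state["chunks"][0]
    return state

def format_output(state):
    state["answer"] = state["answer"].strip() + "."
    return state

result = run_pipeline(
    [ingest, llm_answer, format_output],
    {"raw": "Credo separates logic from execution. It adapts at runtime."},
)
print(result["answer"])  # -> Credo separates logic from execution.
```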
Traditional, static Large Language Model (LLM) pipelines are limited in their ability to handle queries requiring differing levels of complexity or reasoning. These pipelines execute a pre-defined sequence of operations regardless of the input, which can lead to inefficiencies or inaccurate results when faced with nuanced or multi-faceted questions. Dynamic approaches, such as those utilizing ReAct or AutoGen, address this limitation by enabling iterative and adaptive reasoning within the pipeline. ReAct has the LLM interleave observation, reasoning, and action, allowing dynamic adjustments to the processing flow based on intermediate results. AutoGen extends this by enabling multiple LLM agents to collaborate on complex tasks, breaking queries into smaller, manageable steps and improving overall accuracy and reliability on complex queries.
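The ReAct pattern described above can be sketched as a loop that alternates model decisions with tool observations. Here the "model" and the tool are scripted stubs, purely to show the control flow; a real system would call an LLM and real tools at each step.

```python
# Schematic ReAct-style loop: the model alternates reasoning with tool calls
# ("actions") until it decides to emit a final answer.
def scripted_model(history):
    """Stub for an LLM: decides the next action from the transcript so far."""
    if "Observation: 42" in history:
        return ("finish", "The answer is 42")
    return ("lookup", "meaning of life")

def lookup_tool(query):
    return "42"  # stand-in for a retrieval or search tool

def react_loop(max_steps=5):
    history = ""
    for _ in range(max_steps):
        action, arg = scripted_model(history)
        if action == "finish":
            return arg
        observation = lookup_tool(arg)
        history += f"Action: {action}({arg})\nObservation: {observation}\n"
    return None

print(react_loop())  # -> The answer is 42
```

The key property is that the second model call sees the first observation, so the flow adapts to intermediate results rather than following a fixed sequence.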
RouteLLM functions as a query complexity scoring mechanism designed to facilitate dynamic LLM pipeline selection. It operates by assigning a numerical score to incoming queries based on characteristics like length, the presence of multiple entities, and the identified reasoning steps required for a satisfactory response. This score is then utilized to route the query to a pre-defined pipeline configuration optimized for that specific complexity level; simpler queries might utilize a streamlined, single-step pipeline, while more complex queries are directed to multi-step reasoning pipelines leveraging frameworks like ReAct or AutoGen. The lightweight nature of RouteLLM allows for rapid assessment and routing, minimizing latency while maximizing the effectiveness of the overall LLM pipeline system.
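A toy version of complexity-based routing is sketched below. The hand-written heuristics (length, conjunctions, reasoning cues) and thresholds are illustrative only; RouteLLM itself uses a learned router rather than rules like these.

```python
def complexity_score(query: str) -> int:
    """Heuristic complexity score (illustrative only; a real router is learned)."""
    score = 0
    words = query.lower().split()
    score += len(words) // 10                                        # query length
    score += sum(w in words for w in ("and", "versus", "compare"))   # multiple entities
    score += sum(c in query.lower() for c in ("why", "how", "explain"))  # reasoning cues
    return score

def route(query: str) -> str:
    """Send simple queries to a streamlined pipeline, complex ones to multi-step."""
    return "multi_step_pipeline" if complexity_score(query) >= 2 else "single_step_pipeline"

print(route("What is the capital of France?"))
# -> single_step_pipeline
print(route("Compare revenue growth and explain why margins diverged across both firms"))
# -> multi_step_pipeline
```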
Credo: Declarative Control and Adaptive Reasoning in Practice
Credo utilizes a declarative control mechanism for Large Language Model (LLM) Pipelines, diverging from traditional imperative approaches. This is achieved through the definition of Beliefs – factual statements representing the system’s understanding – and Policies – rules specifying how the pipeline should behave based on those Beliefs. By explicitly defining these elements, Credo enables precise control over pipeline execution and facilitates adaptability to changing information or requirements without requiring code modifications. The system interprets these declarative statements to govern the flow of data and the application of operators within the LLM Pipeline, effectively allowing the system’s behavior to be determined by what needs to be done, rather than how it should be done.
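A minimal sketch of the belief/policy split might look as follows; the class and field names are hypothetical and do not reflect Credo's actual API. Beliefs are plain facts, and a policy pairs a predicate over those facts with a declared pipeline adjustment.

```python
from dataclasses import dataclass, field

# Hypothetical data model for beliefs and policies; Credo's real API may differ.
@dataclass
class BeliefStore:
    facts: dict = field(default_factory=dict)

    def assert_belief(self, key, value):
        self.facts[key] = value

@dataclass
class Policy:
    name: str
    condition: callable   # predicate over the belief store
    action: str           # declared pipeline adjustment to apply when triggered

beliefs = BeliefStore()
beliefs.assert_belief("retrieval_sufficient", False)

policies = [
    Policy("retry_retrieval",
           condition=lambda b: not b.facts.get("retrieval_sufficient", True),
           action="add_operator:dense_retrieval"),
]

# The system decides *what* to do from declarations, not from imperative code.
triggered = [p.action for p in policies if p.condition(beliefs)]
print(triggered)  # -> ['add_operator:dense_retrieval']
```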
The Credo Reactive Execution Engine functions by continuously assessing defined policies against the current state of system Beliefs. This assessment triggers dynamic pipeline modification, encompassing both Operator Rewrite and Pipeline Rewrite. Operator Rewrite involves adjusting individual operator parameters or selections within the pipeline, while Pipeline Rewrite facilitates the addition, removal, or reordering of entire operators. These modifications are performed iteratively, responding to changes in Beliefs – such as updated retrieved evidence or refined reasoning steps – to optimize pipeline behavior without requiring manual intervention or retraining. This adaptive process ensures the pipeline remains aligned with the desired objectives and current information context.
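The evaluate-and-rewrite cycle can be sketched as a single reactive step over a list of operators, with both rewrite styles the paragraph names. All operator and belief names here are illustrative, not Credo's.

```python
# Schematic reactive step: re-evaluate policies against current beliefs and
# rewrite the operator list accordingly.
def reactive_step(pipeline, beliefs, policies):
    for policy in policies:
        if policy["when"](beliefs):
            pipeline = policy["rewrite"](pipeline)
    return pipeline

pipeline = ["bm25_retrieve", "generate"]
beliefs = {"evidence_weak": True}

policies = [
    {   # Operator rewrite: swap a single operator in place.
        "when": lambda b: b.get("evidence_weak"),
        "rewrite": lambda p: ["dense_retrieve" if op == "bm25_retrieve" else op
                              for op in p],
    },
    {   # Pipeline rewrite: insert a verification operator before generation.
        "when": lambda b: b.get("evidence_weak"),
        "rewrite": lambda p: p[:-1] + ["verify_groundedness", p[-1]],
    },
]

print(reactive_step(pipeline, beliefs, policies))
# -> ['dense_retrieve', 'verify_groundedness', 'generate']
```

Running this step again after beliefs change (e.g. once evidence is judged sufficient) would leave the pipeline untouched, which is the sense in which execution tracks beliefs rather than a fixed script.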
Credo’s control mechanisms facilitate both answer refinement and factual verification. Through a process termed Self-Refinement, the system iteratively improves response quality. Simultaneously, Credo ensures Groundedness by confirming that all generated content is directly supported by retrieved evidence, thereby mitigating the risk of Hallucinations. Benchmarking demonstrates Credo achieves 72% accuracy in this combined process, significantly exceeding the 50% accuracy of a strong 27B-parameter baseline model utilizing BM25 retrieval.
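A crude stand-in for such a groundedness check is token overlap between the answer and the retrieved evidence; real systems would use an entailment or judge model, so treat this purely as a sketch of the idea.

```python
# Flag answers whose content words are not covered by the retrieved evidence.
def grounded(answer: str, evidence: list, threshold: float = 0.5) -> bool:
    evidence_words = set(" ".join(evidence).lower().split())
    # Ignore very short function words when measuring coverage.
    answer_words = [w for w in answer.lower().split() if len(w) > 3]
    if not answer_words:
        return True
    overlap = sum(w in evidence_words for w in answer_words) / len(answer_words)
    return overlap >= threshold

evidence = ["revenue grew 12 percent in fiscal 2023"]
print(grounded("revenue grew 12 percent", evidence))      # -> True
print(grounded("profit doubled last quarter", evidence))  # -> False
```

The second answer is rejected because nothing in the evidence supports it, which is exactly the hallucination case the check is meant to catch.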

Validating Adaptive Pipelines and Charting Future Directions
Retrieval-to-Generation pipelines are increasingly leveraged for complex question answering, and systems like Credo demonstrate substantial performance gains when integrated into these architectures. Recent evaluations on the challenging FinanceBench dataset reveal Credo’s ability to surpass existing methods, achieving 72% accuracy against a 65% baseline attained by a 12B-parameter language model paired with dense retrieval and carefully tuned chain-of-thought prompting. This improvement suggests that adaptive pipeline control, facilitated by systems like Credo, effectively refines the information flow and reasoning process within these models, leading to more accurate and reliable answers for intricate queries. The results highlight the potential of dynamically adjusting retrieval and generation strategies to optimize performance in knowledge-intensive tasks.
A critical component of validating the adaptive pipeline architecture lies in its ability to assess retrieval quality with a high degree of granularity, achieved through the implementation of CRAG – a comprehensive retrieval assessment framework. Unlike traditional metrics that offer a holistic, but often opaque, view of retrieval performance, CRAG dissects the process, evaluating retrievals based on their relevance, sufficiency, and novelty. This detailed analysis allows researchers to pinpoint specific strengths and weaknesses within the retrieval component, enabling targeted improvements and a deeper understanding of how effectively the system identifies and delivers supporting evidence to the Large Language Model. By meticulously examining retrieval at this granular level, CRAG not only confirms the overall effectiveness of the adaptive pipeline but also provides actionable insights for future optimization and refinement.
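One way to picture per-retrieval assessment along the three axes named above is sketched below. The lexical scoring functions are stand-ins for the model-based judgments a real assessor would make; only the axes themselves come from the text.

```python
# Illustrative per-retrieval assessment: relevance, sufficiency, novelty.
def assess(query, passage, seen_passages):
    q = set(query.lower().split())
    p = set(passage.lower().split())
    relevance = len(q & p) / len(q)                     # query-term coverage
    sufficiency = 1.0 if len(p) >= 5 else len(p) / 5    # enough content to answer?
    # Novelty drops as the passage overlaps with already-retrieved passages.
    novelty = 1.0 - max((len(p & set(s.lower().split())) / len(p)
                         for s in seen_passages), default=0.0)
    return {"relevance": round(relevance, 2),
            "sufficiency": round(sufficiency, 2),
            "novelty": round(novelty, 2)}

scores = assess("net income 2023",
                "net income rose sharply in 2023 on higher margins",
                ["net income rose sharply in 2023 on higher margins"])
print(scores)  # relevant and sufficient, but zero novelty: a redundant retrieval
```

Splitting the assessment this way is what makes the diagnosis actionable: this retrieval looks perfect on a holistic score, yet the per-axis view reveals it adds no new evidence.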
Current research indicates that the future of Large Language Model (LLM) Pipelines lies in refining the systems that govern them, specifically through more expressive Policies and Beliefs. This approach allows for increasingly granular control over the LLM’s reasoning process, moving beyond simple prompting techniques. Evidence of this potential is demonstrated by Credo, a system achieving 72% accuracy on complex question answering – a significant improvement over the 65% attained by a comparable 12B parameter model utilizing dense retrieval and meticulously crafted chain-of-thought prompting. Further development in this area promises even more sophisticated LLM behavior, enabling nuanced responses and improved performance across a wider range of tasks by giving the system a more robust internal framework for decision-making.
The pursuit of robust LLM pipelines, as exemplified by Credo, demands a foundation built on predictable and provable logic. The system’s separation of pipeline definition from execution mirrors a commitment to mathematical purity: a pipeline isn’t merely ‘working’ if it produces desired outputs on specific tests; it must be demonstrably correct across its semantic state. As Grace Hopper wisely stated, “It’s easier to ask forgiveness than it is to get permission.” This resonates with Credo’s adaptive execution: the system doesn’t require pre-defined paths for every eventuality, but dynamically adjusts, embracing a reactive approach to control and prioritizing a logically sound foundation over rigid predetermination.
What’s Next?
The separation of logical specification from physical execution, as demonstrated by Credo, presents a clear, if understated, advancement. However, the current formulation relies on explicitly defined beliefs and policies. A truly elegant solution would demand a system capable of inferring these from the pipeline’s inherent semantics – a formal derivation of control logic from the problem statement itself. The asymptotic complexity of belief space exploration remains a significant obstacle; naïve enumeration is unsustainable, and any heuristic approach introduces the spectre of suboptimal solutions.
Furthermore, the current architecture treats beliefs as immutable during a single execution. This is a pragmatic simplification, but it ignores the potential for self-correction. A future system could incorporate mechanisms for belief refinement based on observed pipeline performance, effectively implementing a form of online learning. The challenge, naturally, lies in preventing catastrophic belief drift – a deviation from truth that invalidates the entire control scheme. A provably convergent belief update algorithm is not merely desirable; it is essential.
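As a toy illustration of a convergent online update (not an algorithm from the paper), an averaging rule with a decaying step size provably converges to the empirical success rate while shrinking the influence of any single noisy observation:

```python
# Online belief update: with step size 1/n the belief equals the running mean
# of the observations, so it converges to the true rate. This sketches the
# kind of convergence guarantee the text calls for; it does not address
# adversarial or non-stationary drift.
def update_belief(belief, observation, step):
    return belief + step * (observation - belief)

belief = 0.5  # initial belief about pipeline success rate
observations = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1] * 20   # empirical rate 0.8
for n, obs in enumerate(observations, start=1):
    belief = update_belief(belief, obs, step=1.0 / n)
print(round(belief, 2))  # -> 0.8
```

A fixed step size would instead track a drifting rate at the cost of never fully converging; that tension is precisely the open design question the paragraph raises.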
Finally, the question of expressiveness lingers. While the declarative approach offers advantages in modularity and maintainability, it remains to be seen whether it can fully capture the nuances of complex LLM interactions. The boundary between what is declarable and what requires imperative control is not yet well-defined, and exploring this limit will be crucial for realizing the full potential of belief-driven systems.
Original article: https://arxiv.org/pdf/2604.14401.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-04-19 14:35