Predictive Process Monitoring Gets a Multi-Agent Boost

Author: Denis Avetisyan


A new framework combines the power of large language models with a multi-agent system to significantly improve the accuracy and flexibility of predicting outcomes in ongoing processes.

The system constructs a predictive model of ongoing processes by transforming event logs into semantic narratives stored in a vector-based memory, then employs a multi-agent framework – comprising predictor agents and a decision-making assistant fused into a single entity – to generate anticipatory assessments of future states.

This work introduces a Retrieval-Augmented Generation approach leveraging multi-agent systems for enhanced Work-in-Progress prediction using event log data and narrative encoding.

Accurate anticipation of workload fluctuations remains a persistent challenge in predictive process monitoring, despite advances in time-series forecasting. This paper introduces ‘A Multi-Agent Retrieval-Augmented Framework for Work-in-Progress Prediction’, a novel approach that synergistically combines retrieval-augmented generation with a collaborative multi-agent system to enhance both the accuracy and adaptability of Work-in-Progress (WiP) prediction. By leveraging historical event data encoded as semantic narratives and enabling dynamic reasoning between specialized agents, the framework achieves competitive performance, reducing Mean Absolute Percentage Error to as low as 1.50% on benchmark datasets and outperforming traditional methods. Could this integration of large language models and multi-agent systems unlock a new paradigm for robust and interpretable process intelligence?


Beyond Prediction: Decoding the Whispers in Process Data

Current predictive models, such as Recurrent Neural Networks and Transformer Architectures, demonstrate proficiency in identifying patterns and dependencies within sequential process data. However, these approaches often treat events as isolated data points, focusing on when something occurs rather than what it signifies. While adept at extrapolating from past sequences – predicting the next step based on established order – they struggle with interpreting the underlying meaning of those steps. This limitation hinders their ability to anticipate deviations caused by unforeseen circumstances or subtle shifts in operational context, as the models lack the capacity to understand the implications of each event beyond its immediate temporal relationship with others. Consequently, predictions can be inaccurate when faced with novel situations or nuanced changes in Work-in-Progress, highlighting the need for systems that can move beyond mere pattern recognition to true semantic understanding.

Current predictive models, while proficient at identifying patterns in process data, frequently struggle with the subtle shifts inherent in real-world workflows. These models often treat each event as a discrete data point, overlooking the contextual relationships that define Work-in-Progress (WiP). Consequently, they may fail to anticipate how a minor deviation – such as a slight delay in material delivery or a change in operator assignment – can cascade into larger disruptions. This limitation hinders proactive adaptation to dynamic workloads, forcing systems to react after a problem emerges rather than preemptively adjusting to prevent it. The inability to discern nuanced changes in WiP therefore restricts the potential for truly intelligent and responsive process control, necessitating a move beyond purely sequential analysis.

Predictive models operating on process data frequently excel at identifying that an event has occurred, but struggle with interpreting its significance within a broader operational context. A move towards semantic understanding – discerning what is happening, not simply registering a change – promises a substantial leap in predictive accuracy. By incorporating knowledge about the meaning of events, relationships between processes, and the implications of specific states, models can move beyond reactive forecasting to proactive anticipation. This approach allows systems to infer potential disruptions, identify emerging bottlenecks, and adapt to dynamic workloads with a level of nuance currently beyond the reach of traditional sequential analysis. Ultimately, enriching models with semantic information facilitates a shift from pattern recognition to genuine comprehension, enabling more robust and reliable predictions in complex operational environments.

Comparing prediction accuracy on the BPIC13 dataset, the Multi-Agent model demonstrates the lowest Mean Absolute Percentage Error (MAPE) for predicting Work-in-Progress (WiP) compared to Daily Memory, Weekday-Aware, and Windowed Context approaches.

From Data Streams to Narratives: Reconstructing the Operational Story

Narrative Encoding addresses the limitations of directly analyzing raw event logs by converting process traces – sequences of actions within a system – into human-readable textual narratives. This transformation isn’t simply a reformatting of data; it involves structuring the event sequence to explicitly represent both the temporal order of events and their semantic meaning. Specifically, the process captures not just that an event occurred, but when it occurred in relation to other events, and what that event signified within the overall process. This structured textual representation facilitates analysis beyond traditional log mining, enabling the application of natural language processing techniques and improving the interpretability of predictive models by providing contextual information typically lost in numerical data.
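The transformation can be illustrated with a minimal sketch; the field names and the narrative template here are hypothetical, chosen only to show how event order and elapsed time are made explicit in text, and do not reproduce the paper's actual encoding.

```python
from datetime import datetime

# Hypothetical event-log rows: (case_id, activity, timestamp).
events = [
    ("case-17", "Ticket opened", "2014-03-03 09:12"),
    ("case-17", "Assigned to operator", "2014-03-03 09:40"),
    ("case-17", "Ticket resolved", "2014-03-04 15:05"),
]

def encode_narrative(trace):
    """Render an ordered trace as a short textual narrative that keeps
    both the event order and the elapsed time between steps."""
    fmt = "%Y-%m-%d %H:%M"
    lines = []
    prev = None
    for case_id, activity, ts in trace:
        t = datetime.strptime(ts, fmt)
        if prev is None:
            lines.append(f"In {case_id}, the process began with '{activity}'.")
        else:
            hours = (t - prev).total_seconds() / 3600
            lines.append(f"About {hours:.1f} hours later, '{activity}' occurred.")
        prev = t
    return " ".join(lines)

narrative = encode_narrative(events)
print(narrative)
```

The resulting sentence sequence carries exactly the temporal and semantic context that a numeric log representation discards, which is what makes it amenable to language-model analysis.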

SNAP and LUPIN are methodologies designed to generate textual narratives from process execution logs, utilizing large language models (LLMs) like GPT-3.5-Turbo. SNAP employs a prompting strategy that decomposes process traces into individual events and then uses the LLM to articulate these events in a coherent sequence. LUPIN, conversely, focuses on identifying patterns and dependencies within the logs before constructing the narrative, allowing for more contextualized storytelling. Both methods rely on carefully crafted prompts to guide the LLM in transforming raw log data – including timestamps, event names, and associated data – into human-readable accounts of process execution, effectively translating operational data into a structured narrative format.
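A rough sketch of the decompose-then-narrate prompting pattern might look like the following; the prompt wording is illustrative, not the published SNAP or LUPIN template.

```python
# Illustrative sketch of a SNAP-style prompt: decompose the trace into
# one line per event, then ask the model to narrate the sequence.
def build_snap_prompt(trace_events):
    """Assemble a narration prompt from (activity, timestamp) pairs."""
    event_lines = "\n".join(
        f"{i + 1}. {activity} at {ts}"
        for i, (activity, ts) in enumerate(trace_events)
    )
    return (
        "The following events were extracted from a process execution log:\n"
        f"{event_lines}\n"
        "Describe this trace as a coherent narrative, preserving the order "
        "of events and noting the time elapsed between them."
    )

prompt = build_snap_prompt([
    ("Incident reported", "2013-04-01 08:00"),
    ("Escalated to 2nd line", "2013-04-01 10:30"),
])
print(prompt)
```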

Encoding process data into a narrative format facilitates prediction robustness and interpretability beyond traditional statistical modeling. Statistical methods often identify correlations without explaining why an event occurs, limiting their ability to generalize to unseen scenarios. Narrative encoding, by structuring process traces as text, introduces contextual information and causal relationships that language models can leverage. This allows for predictions based on understanding the sequence of events and their underlying meaning, improving performance in scenarios with limited data or evolving processes. Furthermore, the textual format enables human review and validation of the reasoning behind predictions, increasing trust and facilitating debugging of the predictive system.

Orchestrating Intelligence: A Multi-Agent Framework for WiP Prediction

The proposed framework combines Retrieval-Augmented Generation (RAG) with an agentic Large Language Model (LLM) architecture to leverage narrative encoding for improved Work-in-Progress (WiP) prediction. RAG enables the LLM to access and incorporate relevant historical narratives during prediction, grounding its output in concrete past instances rather than relying solely on parametric knowledge. This is achieved by retrieving pertinent narratives from a vector-based memory store using techniques like LlamaIndex, and then augmenting the LLM’s input with this retrieved context. The agentic architecture further decomposes the prediction task into specialized roles, allowing each agent to focus on specific aspects of the narrative and temporal data, thereby enhancing the overall predictive capability by encoding information from multiple perspectives.

The framework employs three distinct Predictor Agents to generate Work in Progress (WiP) forecasts from varying analytical perspectives. The Daily Memory Agent focuses on recent activity, utilizing short-term patterns to predict immediate needs. The Weekday-Aware Agent incorporates the known cyclical variations in workflow associated with specific days of the week, accounting for predictable fluctuations. Finally, the Windowed Agent analyzes WiP trends over user-defined periods, identifying longer-term patterns and potential shifts in demand. Each agent operates independently, producing a separate forecast which is subsequently consolidated by the Fusion Agent to create a comprehensive WiP prediction.
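As a rough illustration, the three perspectives can be mimicked with simple numeric rules over a toy WiP history; in the actual framework each agent is an LLM reasoning over retrieved narratives, so these functions are stand-ins, and the window sizes are assumptions.

```python
from statistics import mean

# Hypothetical daily WiP counts for two weeks, indexed 0..13
# (weekday = index % 7 in this toy setup).
history = dict(enumerate(
    [30, 34, 33, 31, 28, 20, 18,
     32, 35, 34, 30, 29, 21, 19]
))

def daily_memory_agent(history, k=3):
    """Forecast from the most recent k days (short-term pattern)."""
    return mean(history[i] for i in sorted(history)[-k:])

def weekday_aware_agent(history, target_weekday):
    """Forecast from past days sharing the target weekday."""
    return mean(w for i, w in history.items() if i % 7 == target_weekday)

def windowed_agent(history, window=7):
    """Forecast from a user-defined trailing window (longer-term trend)."""
    return mean(history[i] for i in sorted(history)[-window:])

# Predicting day 14 (weekday 0 in this toy indexing):
forecasts = {
    "daily": daily_memory_agent(history),
    "weekday": weekday_aware_agent(history, target_weekday=0),
    "windowed": windowed_agent(history),
}
print(forecasts)
```

Each rule deliberately sees a different slice of the history, which is the property the real agents share: independent forecasts from distinct analytical perspectives.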

The Fusion Agent serves as the central component for consolidating predictions from the Daily Memory, Weekday-Aware, and Windowed Predictor Agents. It employs ReAct (Reason + Act) reasoning, an iterative process where the agent generates a rationale for its actions and then executes those actions to refine its understanding and ultimately synthesize a comprehensive Work-in-Progress (WiP) forecast. This involves analyzing the individual predictions, identifying areas of agreement and disagreement, and leveraging the reasoning process to resolve discrepancies and generate a unified output. The ReAct framework enables the Fusion Agent to dynamically adjust its reasoning based on the retrieved information and the outputs of the Predictor Agents, resulting in a more robust and accurate WiP prediction than any single agent could produce independently.
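The consolidation step can be sketched numerically; the real Fusion Agent performs LLM-based ReAct reasoning rather than a fixed rule, so the disagreement threshold and the median fallback below are illustrative assumptions only.

```python
from statistics import median

# Stand-in for the Fusion Agent's consolidation: inspect agreement
# between predictor outputs, then synthesize a single forecast.
def fuse(forecasts, disagreement_threshold=5.0):
    values = list(forecasts.values())
    spread = max(values) - min(values)
    if spread <= disagreement_threshold:
        # Agents broadly agree: a simple average suffices.
        return sum(values) / len(values)
    # Agents disagree: take the median, which discards the outlier.
    return median(values)

# Hypothetical outputs from the three predictor agents:
fused = fuse({"daily": 23.0, "weekday": 31.0, "windowed": 28.6})
print(fused)
```

The point of the sketch is the structure, not the arithmetic: reason about how much the individual predictions agree, then act on that assessment to produce one unified WiP forecast.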

The system utilizes vector-based memory to store and efficiently retrieve past incident narratives. These narratives are embedded as vectors using techniques like sentence transformers, allowing for semantic similarity searches. LlamaIndex serves as the data framework, indexing these vector embeddings and providing an interface for querying relevant past cases based on the current incident description. This retrieval process grounds the WiP predictions in concrete examples, moving beyond purely generative forecasting and enabling the agents to leverage historical data when formulating their predictions. The combination of vector storage and LlamaIndex’s indexing capabilities significantly reduces retrieval latency and improves the relevance of the retrieved narratives.
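A bag-of-words sketch conveys the retrieval idea; the actual system uses sentence-transformer embeddings indexed through LlamaIndex, which this toy example does not reproduce, and the stored narratives here are invented.

```python
import math
from collections import Counter

def vectorize(text):
    """Toy embedding: a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical past-incident narratives held in memory.
memory = [
    "Monday morning saw a surge of incident tickets after a server outage",
    "A quiet Friday with few new cases and steady resolution times",
]

def retrieve(query, memory, k=1):
    """Return the k stored narratives most similar to the query."""
    qv = vectorize(query)
    ranked = sorted(memory, key=lambda n: cosine(qv, vectorize(n)), reverse=True)
    return ranked[:k]

print(retrieve("incident surge after outage on Monday", memory))
```

Swapping the count vectors for dense sentence embeddings and the linear scan for an approximate-nearest-neighbor index is what makes the same pattern fast and semantically robust at scale.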

Comparing model performance on the Helpdesk dataset, the bottom panel shows that Daily Memory, Weekday-Aware, Windowed Context, and Multi-Agent models exhibit varying Mean Absolute Percentage Error (MAPE) when predicting Work in Progress (WiP) as shown in the top panel.

Beyond Accuracy: Measuring Impact and Reshaping Operational Dynamics

Rigorous evaluation of the proposed framework utilized established metrics – Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE) – to quantify predictive accuracy. Results demonstrate a substantial improvement over traditional methods in forecasting Work-in-Progress (WiP). Specifically, the framework achieved a remarkably low MAPE of 1.50% on the BPIC13 Incidents dataset, surpassing all benchmark models, with a corresponding MAE of 9.45 on that dataset. Further validation on the Helpdesk dataset yielded a MAPE of 2.91%, solidifying the framework’s reliability and precision in predicting future workloads and enabling data-driven decision-making.
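The two reported metrics have standard definitions, sketched below; the series used here is an invented toy example, and the paper's 1.50% and 2.91% figures come from its benchmarks, not from this data.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)

def mae(actual, predicted):
    """Mean Absolute Error, in the units of the series (WiP cases)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy WiP series for illustration.
actual = [100, 120, 110]
predicted = [98, 123, 109]
print(round(mape(actual, predicted), 2))
print(round(mae(actual, predicted), 2))
```

Note that MAPE is scale-free (a percentage), while MAE is expressed in the units of the quantity being forecast, which is why the two are reported side by side.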

The framework’s utility extends beyond simple prediction, incorporating Decision-Making Assistant Agents to translate complex data into actionable insights. These agents, exemplified by the Trend Analyst, don’t merely present forecasts; they actively distill high-level signals from the predicted workflow patterns, clarifying why certain outcomes are anticipated. This interpretability is crucial, as it allows stakeholders to understand the underlying drivers of predicted events, enabling more informed and strategic decision-making. By surfacing these key trends, the agents guide proactive resource allocation and facilitate adaptive responses to changing conditions, moving beyond reactive problem-solving towards a more predictive and efficient operational model.

The framework’s predictive power stems from a novel approach to incorporating semantic understanding into Work-in-Progress (WiP) forecasting. By employing narrative encoding, raw event data is transformed into a structured, human-readable format that captures the context and relationships between process steps. This enriched data then fuels a multi-agent system, where each agent specializes in extracting specific semantic signals – such as identifying critical resources or potential bottlenecks – and contributes to a more holistic and nuanced prediction. This integration of ‘meaning’ dramatically improves the framework’s ability to handle noisy or incomplete data, resulting in predictions that are demonstrably more robust and reliable than those produced by traditional, purely statistical methods. Consequently, the system is less susceptible to minor variations in operational conditions and better equipped to accurately anticipate future workloads.

Taken together, these benchmark results – a 1.50% MAPE and a 9.45 MAE on BPIC13 Incidents, surpassing all baseline models, alongside a 2.91% MAPE on Helpdesk – confirm the multi-agent Retrieval-Augmented Generation (RAG) framework’s capacity for accurate and reliable Work-in-Progress (WiP) prediction across diverse operational scenarios, and its potential to facilitate data-driven decision-making.

The culmination of this predictive framework lies in its ability to fundamentally reshape resource allocation and operational responsiveness. By accurately forecasting future workloads, organizations can move beyond reactive problem-solving towards proactive strategies, ensuring optimal staffing levels and preemptively addressing potential bottlenecks. This shift minimizes wasted resources, reduces operational costs, and maximizes overall efficiency; instead of simply responding to demand, businesses can anticipate it. The framework facilitates a dynamic alignment between available resources and projected needs, fostering a resilient and adaptable operational model capable of navigating fluctuating demands and complex scenarios, ultimately leading to substantial gains in productivity and a strengthened competitive advantage.

The pursuit of accurate Work-in-Progress (WiP) prediction, as detailed in this framework, inherently involves a controlled dismantling of assumptions. The system doesn’t simply accept event log data; it actively probes, retrieves, and reconstructs understanding through the multi-agent interactions. This resonates deeply with Donald Knuth’s observation: “Premature optimization is the root of all evil.” The framework’s iterative refinement – constantly adjusting narrative encoding and retrieval strategies – is not about achieving a perfect, static model. It’s about embracing the imperfection inherent in prediction and iteratively improving upon it, acknowledging that each ‘patch’ – each model adjustment – is a confession of the limits of current knowledge and a step towards deeper understanding. The best hack, in this case, isn’t a single elegant solution, but a system designed to continually learn from its errors.

Beyond the Horizon

The framework presented here functions as a targeted exploit of comprehension, revealing the potential of multi-agent systems to navigate the inherently noisy landscape of work-in-progress prediction. However, the current iteration remains tethered to the quality of its event logs – a predictable limitation. The true challenge isn’t simply predicting what will happen, but gracefully handling the inevitable discrepancies between model expectation and observed reality. Future work must actively probe the boundaries of this predictive capability, deliberately introducing adversarial noise into the event logs to assess the system’s robustness – and, more importantly, its failure modes.

A natural progression lies in expanding the scope of ‘narrative encoding’. Currently, this function serves as a translation layer. It could, however, be engineered as an active agent itself, constructing counterfactual scenarios to evaluate the confidence intervals of its predictions. This moves beyond simple prediction toward a form of ‘cognitive simulation’ – a system capable of reasoning about why a process might deviate from its expected course.

Ultimately, the field isn’t concerned with building perfect predictors. Perfection is a static target, and reality is relentlessly dynamic. The value resides in creating systems that can continuously re-engineer their understanding, adapting to the inevitable anomalies and exploiting the subtle patterns hidden within the chaos. The goal, then, isn’t prediction, but persistent, iterative refinement of the model of reality itself.


Original article: https://arxiv.org/pdf/2512.19841.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2025-12-24 23:55