Orchestrating Care with Intelligent Agents

Author: Denis Avetisyan


A new system architecture aims to bring the power of large language models to dynamic clinical workflows with enhanced safety and coordination.

The proposed Agentic Operating System for hospitals demonstrates that a unified infrastructure (leveraging least-privilege execution, coordinated document mutation, manifest-guided memory retrieval, and ad-hoc skill composition) can effectively address the full spectrum of clinical needs, from scheduled care and reactive workflows to longitudinal analysis, multi-agent coordination, and critical infrastructure requirements, all without necessitating bespoke architectural extensions.

This review details the design of an agentic operating system for healthcare, focusing on restricted execution, document-mutation coordination, and page-indexed memory.

Despite rapid advances in large language models, deploying autonomous agents in healthcare remains challenging due to limitations in reliability, security, and long-term memory. This work, ‘When OpenClaw Meets Hospital: Toward an Agentic Operating System for Dynamic Clinical Workflows’, proposes a novel architecture built upon the open-source OpenClaw framework, designed to address these hurdles through a restricted execution environment, document-centric interaction, and page-indexed memory. The resulting system facilitates safe and auditable coordination of clinical workflows by structuring agent capabilities as discrete, curated skills. Could such an ‘Agentic Operating System’ ultimately redefine how healthcare leverages the power of artificial intelligence to enhance patient care and operational efficiency?


The Illusion of Access: Why Data Remains Locked Away

The prevailing methods of navigating Electronic Health Records often fall short in delivering clinicians the precise information needed for effective patient care. Traditional search functionalities, reliant on keyword matching and structured data fields, frequently fail to account for the nuanced and often implicit details embedded within physician notes, lab reports, and imaging studies. This necessitates laborious manual review of extensive patient histories, consuming valuable time and potentially leading to critical information being overlooked. Consequently, clinicians face a significant challenge in efficiently synthesizing a comprehensive understanding of a patient’s medical journey, hindering timely and informed decision-making and potentially impacting patient outcomes. The complexity arises not only from the sheer volume of data, but also from its varied formats and the lack of standardized terminology across different healthcare providers and systems.

The increasing reliance on longitudinal patient history (a comprehensive record of an individual’s medical journey) is paradoxically hampered by the very data it comprises. Healthcare providers are often confronted with a deluge of information, much of which exists as unstructured text (notes, radiology reports, and discharge summaries) rather than readily searchable data points. This presents a significant obstacle to timely clinical decision-making, as extracting relevant insights requires considerable effort and can delay diagnosis or treatment. The sheer volume, coupled with the lack of standardized formatting, means crucial details can remain hidden within a patient’s record, potentially leading to incomplete assessments and suboptimal care. Consequently, innovative approaches to data organization, natural language processing, and information retrieval are essential to unlock the full potential of longitudinal patient history and facilitate more efficient, informed healthcare.

This page-indexed memory architecture enables efficient, interpretable retrieval of clinical information by organizing data into a hierarchical document tree and leveraging language-reasoning to navigate and selectively read manifests (summaries of child nodes) without relying on vector embeddings.

Beyond the Vector Hype: A More Structured Approach

Flat vector retrieval methods represent clinical data, such as patient records or medical literature, as points in a high-dimensional space, enabling similarity searches based on vector distance. However, this approach struggles to capture the inherent hierarchical and relational complexities within clinical information. Clinical data is often structured with multiple levels of abstraction – for example, a patient has diagnoses, which are linked to symptoms, which are associated with underlying conditions. Representing this as a single flat vector loses crucial contextual information and relationships between these entities. Consequently, flat vector search can yield results that are superficially similar but clinically irrelevant, as it fails to account for the semantic relationships that define meaningful connections within the data.

Progressive Disclosure Retrieval utilizes Manifest Files to enable a hierarchical exploration of documents, differing from flat vector retrieval methods. These Manifest Files act as summaries or indices, outlining the content of larger documents or document segments. Instead of searching the entire corpus at once, the system first queries the Manifest Files to identify relevant high-level sections. This narrows the search space, and only then are the detailed contents of those specific sections retrieved and processed. This two-stage process reduces computational cost and latency, particularly when dealing with extensive datasets, by prioritizing the retrieval of pertinent information based on initial manifest analysis.

Page-Indexed Memory Architecture organizes data as a tree-structured document, enabling efficient retrieval by navigating this hierarchy rather than performing exhaustive searches. This architecture optimizes manifest maintenance – the process of updating metadata that reflects data changes – by limiting the scope of required LLM calls. Specifically, following a data mutation event, the LLM is invoked to update the manifest only for nodes along the path from the modified node to the root of the tree. This results in a computational complexity of [latex]O(L)[/latex], where [latex]L[/latex] represents the depth of the modified node within the tree; shallower modifications require fewer LLM calls, significantly reducing maintenance overhead compared to systems requiring full manifest recalculations.
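A short sketch shows why maintenance is [latex]O(L)[/latex]: after a mutation, only ancestors on the root path are re-summarized. The `PageNode` class and the `summarize` stand-in (which replaces an actual LLM call) are illustrative assumptions, not the paper's implementation.

```python
class PageNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.manifest = ""
        if parent:
            parent.children.append(self)

def summarize(node):
    # Stand-in for one LLM call that re-summarizes a node's children.
    return "; ".join(c.name for c in node.children)

def update_manifests(mutated):
    """Refresh manifests only along the path from the mutated node to the root.

    Returns the number of (simulated) LLM calls: exactly the depth L of the
    mutated node. Siblings and unrelated subtrees are never touched."""
    calls = 0
    node = mutated.parent
    while node is not None:
        node.manifest = summarize(node)
        calls += 1
        node = node.parent
    return calls
```

A mutation at depth 2 therefore costs two summarization calls, regardless of how many sibling documents the tree contains.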

The Agentic Operating System for Hospital utilizes a three-layered architecture (agent, interface, and OS enforcement) to ensure secure, auditable interactions between roles (patient, clinician, etc.) via document-based communication and kernel-level access controls, preventing direct agent-to-agent communication.

LLM Agents: More Promise Than Reality (So Far)

LLM Agent Architecture fundamentally extends the capabilities of large language models beyond text completion by integrating three core components: reasoning, planning, and tool invocation. Reasoning allows the agent to interpret information and draw logical conclusions; planning enables the decomposition of a complex goal into a sequence of actionable steps; and tool invocation provides the mechanism to interact with external systems and APIs. This architecture allows agents to execute multi-step workflows, such as scheduling appointments, retrieving information from databases, or controlling physical devices, effectively bridging the gap between natural language understanding and real-world action. Unlike traditional LLMs limited to generating text based on input prompts, LLM Agents can autonomously achieve defined objectives through iterative planning and execution, utilizing tools to gather data and modify their approach as needed.

Recent advancements in Large Language Model (LLM) agents, specifically through methods like ReAct, Reflexion, and Generative Agents, enable capabilities beyond single-turn responses by facilitating multi-step interactions with external environments. ReAct combines reasoning and acting, allowing the agent to observe, think, and act iteratively. Reflexion improves performance through self-reflection on past actions and error analysis. Generative Agents create simulated entities with memory and the ability to interact with users and other agents. These approaches commonly involve cycles of observation, planning, and action execution, utilizing tools and APIs to gather information or effect changes in the external world, thereby demonstrating potential for autonomous task completion and complex problem-solving.
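The observe-think-act cycle common to these approaches can be sketched as a loop. This is a minimal ReAct-style skeleton, not any framework's actual API: the `llm` decision function, the `tools` registry, and the transcript format are all assumptions for illustration.

```python
def react_loop(question, llm, tools, max_steps=5):
    """Minimal ReAct-style loop: the model alternates reasoning with tool use.

    `llm` maps the transcript so far to either ("act", tool_name, arg)
    or ("finish", answer); `tools` maps tool names to callables.
    Both interfaces are assumptions for this sketch."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        decision = llm(transcript)
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, arg = decision
        observation = tools[tool_name](arg)          # act on the environment
        transcript.append(f"Action: {tool_name}({arg})")
        transcript.append(f"Observation: {observation}")  # feed result back
    return None  # step budget exhausted without an answer
```

The `max_steps` bound is the usual safeguard against the loop failing to terminate; Reflexion-style self-critique would add a reflection step between the observation and the next decision.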

Retrieval-Augmented Generation (RAG) improves Large Language Model (LLM) performance by incorporating information retrieved from external knowledge sources. Rather than relying solely on parameters learned during training, RAG systems access and utilize data from repositories like Electronic Health Records (EHRs) during the generation process. This allows LLMs to provide more accurate, up-to-date, and contextually relevant responses, particularly in domains requiring specialized or frequently updated information. The process typically involves retrieving relevant documents based on a user query, then incorporating the content of those documents into the prompt provided to the LLM, effectively grounding the LLM’s response in external evidence.
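The retrieve-then-prompt pattern is simple to sketch. The toy term-overlap retriever and the prompt template below are illustrative assumptions; production RAG systems typically use embedding-based or hybrid retrievers over the EHR store.

```python
def retrieve(query, corpus, k=2):
    """Toy lexical retriever: rank documents by query-term overlap."""
    def score(doc):
        return sum(word in doc.lower() for word in query.lower().split())
    return sorted(corpus, key=score, reverse=True)[:k]

def build_rag_prompt(query, corpus):
    """Ground the model's answer in retrieved evidence by placing it in the prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

The resulting prompt is what gets sent to the LLM, so the model's response is constrained by the retrieved records rather than by whatever its training parameters happen to encode.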

The Illusion of Control: Agentic Operating Systems and the Need for Rigor

The Agentic Operating System for Hospital functions as a foundational infrastructure layer specifically engineered to facilitate the coordinated deployment of autonomous agents. This system provides the necessary services and protocols for managing agent lifecycles, including provisioning, execution, and monitoring, within a hospital environment. It is designed to handle the complexities of distributing tasks across multiple agents and ensuring their synchronized operation, enabling scalable and resilient automation of various hospital processes. The architecture prioritizes a modular design, allowing for the integration of diverse agent types and the adaptation to evolving operational requirements without requiring significant system-wide modifications.

The Agentic Operating System prioritizes secure operation through the implementation of Least-Privilege Execution and a Document Mutation Model. Least-Privilege Execution restricts agents to only the permissions necessary to complete their designated tasks, minimizing potential damage from compromised or malicious agents. The Document Mutation Model governs how agents interact with and modify data, ensuring all changes are tracked and controlled. This model avoids direct, in-place modifications, instead favoring the creation of new document versions, enabling rollback and comprehensive auditing of data transformations.
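The interplay of the two mechanisms can be sketched in a few lines. The `DocumentStore` class, the string-based grant format, and the agent dictionaries are hypothetical simplifications; the actual system enforces access at the kernel level rather than in application code.

```python
import copy

class DocumentStore:
    """Versioned store: mutations append new versions instead of editing in place,
    and every access is checked against the agent's explicit grants."""

    def __init__(self):
        self.versions = {}   # doc_id -> list of versions, newest last

    def write(self, agent, doc_id, content):
        # Least-privilege check: the agent must hold an explicit write grant.
        if "write:" + doc_id not in agent["grants"]:
            raise PermissionError(f"{agent['name']} lacks write access to {doc_id}")
        # Document mutation model: append a new version, never modify in place.
        self.versions.setdefault(doc_id, []).append(copy.deepcopy(content))

    def read(self, agent, doc_id, version=-1):
        if "read:" + doc_id not in agent["grants"]:
            raise PermissionError(f"{agent['name']} lacks read access to {doc_id}")
        return self.versions[doc_id][version]
```

Because earlier versions are retained, any write can be audited or rolled back by reading an older index, which is the property the mutation model is designed to guarantee.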

The system utilizes an append-only mutation log to guarantee data consistency and provide a complete audit trail of all agent actions. This log records every state change, enabling deterministic replay and forensic analysis. Critically, manifest updates are maintained at a cost of [latex]O(L)[/latex], where [latex]L[/latex] is the depth of the modified node in the document tree. This linear-in-depth scaling minimizes computational overhead during updates, avoiding the need to reprocess entire datasets or sibling nodes and significantly reducing maintenance cost as the system scales.
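The audit-and-replay property follows directly from the append-only discipline: current state is never stored authoritatively, only derived. A minimal sketch, with a hypothetical entry schema and two example operations:

```python
class MutationLog:
    """Append-only log of agent actions; state is recovered by deterministic replay."""

    def __init__(self):
        self.entries = []          # entries are only ever appended, never edited

    def append(self, agent, op, key, value=None):
        self.entries.append({"agent": agent, "op": op, "key": key, "value": value})

    def replay(self):
        """Rebuild current state by applying every entry in order."""
        state = {}
        for entry in self.entries:
            if entry["op"] == "set":
                state[entry["key"]] = entry["value"]
            elif entry["op"] == "delete":
                state.pop(entry["key"], None)
        return state
```

Replaying a prefix of the log reconstructs any historical state, and the retained `agent` field on each entry is what makes forensic attribution of every change possible.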

The Next Iteration: Hoping for True Understanding

Recent advancements in Retrieval-Augmented Generation (RAG) systems are increasingly leveraging the power of knowledge graphs, as exemplified by approaches like GraphRAG and HippoRAG. These systems move beyond simple keyword searches by representing information as interconnected entities and relationships, allowing for more nuanced and context-aware retrieval. Rather than merely finding documents containing specific terms, these graph-based RAG models can infer connections and retrieve information based on semantic meaning and logical relationships within the knowledge graph. This capability is particularly valuable when dealing with complex queries or when the desired information is not explicitly stated but implied through connections between different concepts. By structuring knowledge in a graph format, these models aim to overcome limitations of traditional RAG systems, delivering more relevant and insightful responses and ultimately enhancing the overall performance of large language models.

MemGPT addresses the limitations of traditional Retrieval-Augmented Generation (RAG) systems when dealing with extensive knowledge bases by implementing a hierarchical memory model directly inspired by operating system virtual memory paging. This innovative approach segments long-term memory into ‘pages’ which are loaded and unloaded as needed, mirroring how an operating system manages RAM. Rather than retrieving from a monolithic knowledge store, MemGPT dynamically accesses only the relevant memory pages for a given query, dramatically reducing computational costs and latency. This ‘lazy loading’ strategy enables the system to effectively handle significantly larger datasets – potentially exceeding millions of tokens – without being constrained by the memory limitations of typical RAG implementations. Furthermore, this hierarchical structure facilitates more focused and efficient retrieval, improving the quality and relevance of generated responses by prioritizing information directly pertinent to the current context.
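The paging analogy can be made concrete with a small sketch. This is not MemGPT's actual API; the `PagedMemory` class and the LRU eviction policy are assumptions chosen to mirror the OS metaphor.

```python
from collections import OrderedDict

class PagedMemory:
    """Sketch of OS-style paging for agent memory: a small in-context window
    ("RAM") backed by a larger archive ("disk"), with LRU eviction
    (an assumption for this sketch, not MemGPT's exact policy)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.context = OrderedDict()   # pages currently in the prompt window
        self.archive = {}              # full long-term store

    def store(self, page_id, text):
        self.archive[page_id] = text

    def recall(self, page_id):
        """Page a memory into context on demand, evicting the stalest page if full."""
        if page_id not in self.context:
            if len(self.context) >= self.capacity:
                self.context.popitem(last=False)   # evict least recently used
            self.context[page_id] = self.archive[page_id]
        self.context.move_to_end(page_id)
        return self.context[page_id]
```

The archive can grow without bound while the context window stays fixed, which is exactly the decoupling that lets a paged design sidestep the prompt-size limits of monolithic RAG.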

RAPTOR reimagines information retrieval through a recursive approach to summarization, constructing a tree-like structure where each node represents an abstractive distillation of the content below it. This hierarchical organization allows the system to navigate complex knowledge domains with greater efficiency; instead of sifting through lengthy documents, RAPTOR can quickly pinpoint relevant information by traversing the summary tree, focusing on increasingly specific nodes. The innovation lies in its ability to not simply retrieve relevant passages, but to synthesize understanding at multiple levels of abstraction, ultimately promising more intelligent assistance that goes beyond keyword matching and towards genuine comprehension of user needs and complex queries. By building this recursive structure, RAPTOR effectively creates a ‘cognitive map’ of the knowledge base, enabling it to reason about information and deliver more nuanced and insightful responses.
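The recursive construction can be sketched in a few lines. The fixed-fanout grouping and the `summarize` callable (standing in for an LLM abstractive summarizer; RAPTOR itself clusters chunks rather than grouping by position) are simplifying assumptions.

```python
def build_summary_tree(chunks, summarize, fanout=2):
    """Recursively summarize groups of chunks into parent nodes, RAPTOR-style.

    `summarize` is a stand-in for an LLM summarizer: any callable mapping
    a list of texts to a single distilled string."""
    level = [{"text": chunk, "children": []} for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            next_level.append({
                "text": summarize([node["text"] for node in group]),
                "children": group,
            })
        level = next_level
    return level[0]   # root: the most abstract summary of the whole corpus
```

Retrieval then descends from the root, comparing the query against progressively more specific summaries, so a query can be answered at whichever level of abstraction actually matches it.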

The pursuit of agentic operating systems, as detailed in this work concerning OpenClaw and clinical workflows, inevitably courts complexity. It’s a familiar pattern: elegant architectures designed to manage dynamic processes, only to be humbled by the sheer unpredictability of production. As Donald Knuth observed, “Premature optimization is the root of all evil.” This research, with its focus on restricted execution environments and document-mutation coordination, isn’t necessarily about building the fastest system, but one that avoids catastrophic failure when faced with the messy reality of real-world data and clinical practice. Better a cautiously designed system, even if slightly slower, than a brittle one optimized for theoretical perfection.

What Comes Next?

The architecture detailed within offers a predictable progression: more layers of abstraction between a Large Language Model and actual patient care. The presented solutions – restricted execution, coordinated document mutation, page-indexed memory – address immediate concerns, but merely shift the failure modes. Tests are, after all, a form of faith, not certainty. The inevitable edge cases – the ambiguous chart note, the conflicting medication order, the unexpected interaction between agents – will not be solved by better infrastructure alone. They will be discovered in production, usually on a Friday.

Future work will undoubtedly focus on scaling these systems, deploying ever more agents to manage increasingly complex workflows. But the real challenge isn’t scale; it’s brittleness. Each added layer of automation creates new opportunities for cascading errors. The question isn’t whether these systems can automate clinical work, but whether they can degrade gracefully when they inevitably fail.

The promise of an ‘agentic operating system’ implies a level of robustness rarely seen in software, let alone systems interacting with human lives. One suspects the field will spend the next decade learning that elegance is a liability, and that the most successful systems will be the ones that are messily, stubbornly, just good enough to avoid complete collapse.


Original article: https://arxiv.org/pdf/2603.11721.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-15 14:08