Securing the Rise of AI Agents

Author: Denis Avetisyan


As autonomous AI agents become increasingly prevalent, ensuring their security demands a re-evaluation of traditional computer security principles.

A standard security architecture delineates components based on trust, visually distinguishing trusted elements-those considered secure and reliable-from untrusted ones that represent potential vulnerabilities or external threats.
A standard security architecture delineates components based on trust, visually distinguishing trusted elements-those considered secure and reliable-from untrusted ones that represent potential vulnerabilities or external threats.

This review examines the foundational systems security challenges of agentic computing, focusing on prompt injection vulnerabilities, least privilege enforcement, and information flow control within probabilistic systems.

While hardening individual AI models is a natural impulse, it mirrors decades-old cybersecurity lessons demonstrating that isolated defenses are often insufficient. This paper, Systems Security Foundations for Agentic Computing, reframes AI agent security through the lens of holistic systems security, arguing that established principles-like least privilege and information flow control-are applicable but demand adaptation due to the unique challenges of probabilistic models and ill-defined system boundaries. We present an analysis of eleven real-world attacks and distill critical research problems arising from applying these principles to agentic systems. How can we build truly robust and trustworthy AI agents by grounding their security in the proven foundations of computer systems security?


Beyond Perimeter Defenses: Securing the Fluid AI Landscape

Conventional computer security measures, built on the premise of defending fixed infrastructure, are proving insufficient when applied to the rapidly evolving world of artificial intelligence agents. These traditional systems excel at perimeter defense – controlling access to static resources – but struggle to accommodate the inherent dynamism of AI. Unlike servers with predictable functions, AI agents continuously learn, adapt, and interact with complex environments, creating a moving target for security protocols. This fundamental mismatch necessitates a shift in focus; safeguarding AI isn’t simply about securing where data is stored, but understanding how agents behave, reason, and potentially deviate from intended parameters. The inherent fluidity of agentic systems demands security approaches that are equally adaptive, proactive, and centered on behavioral analysis rather than static rule enforcement.

The emergence of agentic computing dramatically expands the potential for malicious attacks beyond the scope of traditional cybersecurity. Unlike static systems protected by perimeter defenses, AI agents actively interact with complex environments and other agents, creating a vastly increased attack surface. These agents, designed to autonomously pursue goals, can be exploited through manipulated inputs, compromised tools, or adversarial interactions, leading to unintended-and potentially harmful-consequences. Existing security paradigms, focused on network security and data protection, fail to address vulnerabilities inherent in agent behavior, such as goal hijacking, reward hacking, and the propagation of misinformation. Consequently, a fundamental shift is required to develop security mechanisms specifically tailored to the dynamic and unpredictable nature of agentic systems, moving beyond simply protecting data to safeguarding agent actions and ensuring trustworthy interactions.

Current security architectures largely presume a static defense – a defined perimeter safeguarding internal assets. However, the emergence of autonomous AI agents necessitates a paradigm shift; these agents, by their nature, operate dynamically, constantly interacting with environments and other agents. This paper details how traditional perimeter-based security becomes insufficient when the very entity being protected is the interaction itself. Securing AI agents, therefore, demands a focus on behavioral monitoring, intent verification, and the establishment of trust frameworks governing agent-to-agent communication. Instead of simply shielding an agent, the emphasis must be on understanding how it operates, what its goals are, and ensuring those goals align with intended safety parameters, a challenge that requires entirely new approaches to risk assessment and mitigation.

Probabilistic Safeguards: Adapting to Uncertainty in AI

Traditional security models for AI agents often rely on rigid, deterministic definitions of trust and access control. However, the inherent complexity and evolving nature of AI systems necessitate a shift towards probabilistic approaches. Probabilistic Trustworthy Computing Bases (TCBs) represent a core component of this shift, employing statistical methods to quantify and manage uncertainty in security assessments. Instead of a binary “trusted” or “untrusted” designation, probabilistic TCBs assign probabilities to the likelihood of a component behaving as expected, accounting for factors like data quality, model accuracy, and environmental variations. This allows for a more nuanced evaluation of risk and enables security policies to adapt based on the confidence level associated with each agent component and its actions. The use of Bayesian networks and other probabilistic graphical models is common in implementing these systems, allowing for the propagation of uncertainty and the identification of critical vulnerabilities with associated confidence intervals.

Dynamic security policies represent a shift from static, pre-defined security rules to context-aware mechanisms that adjust protections based on an AI agent’s operational environment and assigned task. These policies utilize real-time data, such as sensor input, task parameters, and observed agent behavior, to modify access controls, resource allocation, and threat responses. Implementation typically involves policy engines capable of evaluating conditions and enacting changes to security configurations automatically. This adaptation is critical for AI agents operating in unpredictable or evolving environments where fixed security measures may be insufficient or overly restrictive, and allows for a balance between security and operational efficiency by providing only the necessary protections for a given situation.

Adaptive security systems, while offering enhanced protection through dynamic policy adjustments, inherently create fuzzy security boundaries. These boundaries arise because traditional, statically defined trust relationships are replaced with context-dependent assessments, leading to ambiguities in determining permissible agent actions. Consequently, risk management requires novel techniques beyond conventional access control lists and firewalls. These techniques must account for probabilistic trust evaluations and the potential for misclassification or manipulation of contextual data. This work focuses on developing methods to quantify uncertainty within these fuzzy boundaries and implement mitigation strategies, such as continuous monitoring, anomaly detection, and dynamic policy refinement, to reduce the likelihood of security breaches stemming from unclear trust definitions.

Containing the Chaos: Methods for Controlling Agent Interactions

Agentic systems fundamentally operate by leveraging external tools to execute tasks, necessitating robust security protocols to manage access and prevent unauthorized actions. The Model Calling Protocol (MCP) is a key component in this architecture, defining a standardized method for agents to request and utilize tools while ensuring controlled interactions. MCP typically involves a structured request format, authentication mechanisms to verify agent identity, and authorization policies to determine permissible tool usage. This protocol facilitates a separation of concerns, allowing tool developers to focus on functionality without directly managing agent access, and enabling centralized control over resource utilization within the agentic system. Successful implementation of MCP is critical for maintaining the integrity and security of agent-driven workflows.

Sandboxing provides a critical security layer for agentic systems by confining agents within a restricted environment, thereby limiting their access to system resources and data. This isolation is particularly relevant for Browser Agents, which frequently interact with untrusted external websites and content. By executing these agents within a sandbox, potential compromises – such as malicious scripts or cross-site scripting (XSS) attacks – are contained, preventing unauthorized access to the underlying operating system or sensitive user data. Sandboxing techniques commonly employed include virtualization, containerization, and the use of restricted execution environments, each offering varying degrees of isolation and performance overhead. Effective sandboxing requires careful configuration of permissions and resource limits to balance security with functionality.

Prompt injection attacks represent a significant vulnerability in agentic systems, occurring when malicious input is crafted to manipulate the agent’s core instructions rather than being treated as data. This manipulation circumvents intended security protocols and can lead to unintended actions, data breaches, or system compromise. Current defense strategies prioritize Instruction-Data Separation, a technique that enforces a clear distinction between the agent’s fixed instructions and the variable data it processes. This is achieved through techniques like input validation, careful prompt engineering, and the implementation of dedicated parsers that categorize and handle user input accordingly, preventing malicious commands from being interpreted as legitimate instructions.

The Foundation of Trust: Core Security Practices for AI

The principle of least privilege access control dictates that artificial intelligence agents should operate with a strictly limited set of permissions, granting them only the minimum access necessary to fulfill their designated functions. This practice dramatically reduces the potential damage from compromised agents or malicious actors exploiting vulnerabilities. By restricting an agent’s capabilities, the blast radius of any security breach is contained, preventing unauthorized access to sensitive data or critical systems. Implementing this requires careful analysis of an agent’s tasks and a granular permission system, ensuring it cannot deviate from its intended purpose or access resources beyond its defined scope. Effectively, it’s a foundational security measure that minimizes risk and bolsters the overall resilience of AI-driven systems, safeguarding both data and infrastructure.

Information flow control represents a crucial layer of defense in securing AI agents, operating on the principle that even if an agent’s core functionality is compromised, the scope of data exposure can be strictly limited. This isn’t simply about preventing unauthorized access, but about meticulously tracking how information moves through the system. By defining clear pathways for data – specifying which agents can access what data, and under what conditions – potential leaks are contained at their source. Techniques range from static analysis of code to runtime monitoring of data dependencies, ensuring that sensitive information doesn’t unintentionally propagate to untrusted components or external systems. Consequently, a robust information flow control strategy minimizes the blast radius of a security breach, preserving data confidentiality and integrity even when faced with adversarial attacks or compromised agent behavior.

Coding agents, designed to autonomously generate and execute code, present unique security challenges demanding proactive mitigation. Utilizing Docker containers offers a robust isolation strategy, encapsulating the agent’s runtime environment and limiting potential damage from malicious or compromised code. Equally vital is stringent API key management; these keys, granting access to external services and data, must be securely stored, regularly rotated, and granted with the principle of least privilege. Without these measures, a compromised coding agent could inadvertently expose sensitive information or launch unauthorized actions, making containerization and secure key handling not merely best practices, but fundamental requirements for building trustworthy and reliable AI systems.

A security vulnerability in OpenAI's operator framework allows attackers to begin with a GitHub issue and ultimately exfiltrate Personally Identifiable Information (PII) to their own domain.
A security vulnerability in OpenAI’s operator framework allows attackers to begin with a GitHub issue and ultimately exfiltrate Personally Identifiable Information (PII) to their own domain.

The pursuit of agentic computing security predictably resurrects old debates. This paper correctly identifies the application of established principles – least privilege, information flow control – yet frames them as novel challenges due to the peculiarities of large language models. It’s a familiar pattern: innovation rarely creates entirely new problems, it simply recontextualizes existing ones. As Linus Torvalds once said, “Most programmers think that if their code works, it is finished. I think it is never finished.” The focus on instruction-data separation, for instance, is merely a refined version of the perennial struggle against input validation vulnerabilities. The probabilistic nature of LLMs doesn’t negate the need for robust security architectures; it simply complicates their implementation, ensuring that even the most elegant designs will eventually succumb to production realities.

The Road Ahead

The exercise of applying established security architectures to agentic systems feels…familiar. A predictable cycle of re-framing old problems. The core tenets – least privilege, information flow control, a minimized trusted computing base – remain stubbornly relevant, even as the substrate shifts to probabilistic models and loosely defined system boundaries. This suggests the true challenge isn’t inventing new security principles, but accepting that their implementation will be an ongoing negotiation with inherent uncertainty.

The focus on instruction-data separation is a start, but a clean separation feels increasingly like a comforting fiction. Every controlled release will inevitably reveal novel vectors for influence. The pursuit of ‘perfect’ prompt injection defense will be a long, expensive distraction. More productive avenues likely lie in accepting a degree of controlled leakage, focusing instead on robust monitoring and adaptive response mechanisms. It’s about managing the inevitable, not preventing it.

Ultimately, the field will be defined not by elegant theoretical frameworks, but by the accumulated tech debt of production systems. Each deployed agent will be a testament to the limitations of foresight. The real innovation won’t be in building more secure agents, but in building better post-mortem tools. After all, the bugs aren’t flaws; they’re proof of life.


Original article: https://arxiv.org/pdf/2512.01295.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

See also:

2025-12-02 09:59