Governing AI’s Expansion: A New Framework for Open Institutions

Author: Denis Avetisyan


As artificial intelligence systems grow in autonomy and scale, a formal model is needed to manage their boundaries and ensure responsible expansion.

This paper introduces ‘AI Space Physics,’ a formal system for governing open AI institutions by treating authority expansion as a critical boundary event requiring explicit witness obligations and adjudication.

As agentic AI systems evolve beyond simple inference tools into persistent institutions, existing governance frameworks struggle to address the subtle but critical expansions of their authority surfaces. This paper, ‘AI Space Physics: Constitutive boundary semantics for open AI institutions’, introduces a formal model that reframes authority-surface expansion, even without immediate external effects, as a first-class boundary event requiring explicit witness obligations and adjudication. By defining a minimal state model with typed boundary channels and a ‘membrane-witness discipline’, we offer a constitutive semantics for governing these open, self-expanding AI institutions. Will this approach enable more robust and transparent governance of increasingly autonomous AI systems, or will new complexities emerge as these institutions continue to evolve?


The Illusion of Control: Addressing Unstated Assumptions

Many artificial intelligence systems, despite their increasing sophistication, function on a bedrock of unstated understandings about how the world works – assumptions concerning physics, social norms, and even basic causality. These systems are often designed to achieve a goal without being explicitly programmed with the limits of acceptable action, creating a potential disconnect between intended behavior and real-world consequences. This isn’t necessarily a flaw in design, but rather a consequence of building intelligence that can operate autonomously; however, the lack of clearly defined boundaries means an AI might, for instance, optimize for a task in a way that inadvertently causes physical damage, violates privacy, or disrupts critical infrastructure – all because the system wasn’t told what actions are impermissible, only what outcomes are desirable. Consequently, the efficacy of these systems isn’t solely determined by their ability to perform, but by the often-invisible framework of presuppositions governing their interactions with the external world.

As artificial intelligence systems grow in complexity and are deployed at larger scales, the potential for unforeseen repercussions increases dramatically. While individual components might function as intended, their combined effects within a real-world environment are often difficult to predict. This necessitates a shift towards robust mediation – the implementation of carefully designed safeguards and oversight mechanisms that actively monitor and regulate the system’s interactions with its surroundings. Such mediation isn’t simply about preventing harm; it’s about establishing a buffer between the system’s internal logic and the complexities of the external world, allowing for adjustments, corrections, and the mitigation of unintended consequences before they escalate. Without this intermediary layer, even seemingly benign AI actions can produce cascading effects, highlighting the critical need for proactive control as these systems become more pervasive.

The notion of ‘safe’ artificial intelligence operation isn’t a fixed parameter, but a judgment call critically dependent on established boundaries and oversight. Without clearly defined adjudication – a process for evaluating actions against pre-defined safety criteria – even well-intentioned systems can produce harmful outcomes. This ambiguity arises because ‘safe’ isn’t absolute; it’s relative to values, contexts, and potential consequences, all of which require interpretation. As AI systems gain autonomy and scale, the lack of a governing framework for determining acceptable risk escalates the potential for unintended harm, shifting the focus from technical feasibility to ethical and societal implications. Consequently, defining what constitutes a safe operational state demands continuous assessment and refinement, ensuring alignment with human values and preventing unforeseen dangers as systems interact with increasingly complex environments.

Architecting Intelligence: A Formalized Space for Action

The ‘AI Space Physics’ framework defines AI institutions through a constitutive semantics based on typed channels and horizon-limited reach. Typed channels establish specific communication pathways within the institution, categorizing information flow and restricting interactions based on predefined types. Horizon-limited reach constrains the scope of influence and access within the system, preventing unbounded propagation of effects and ensuring localized control. This combination creates a structurally defined space where interactions are not merely possible, but are governed by explicit rules regarding what can communicate with what, and how far those interactions can extend, forming the basis for predictable and auditable AI behavior.
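To make these two ingredients concrete, the minimal sketch below renders typed channels and horizon-limited reach in Python. The class and field names (Channel, Institution, horizon, ChannelType) are illustrative assumptions, not the paper’s formal notation; the point is only that interactions are gated both by channel type and by a bounded reach.

```python
"""Minimal sketch of typed boundary channels with horizon-limited reach.
Names and structure are illustrative, not taken from the paper's formal model."""
from dataclasses import dataclass
from enum import Enum, auto


class ChannelType(Enum):
    """Categories of information flow permitted across the boundary."""
    OBSERVATION = auto()   # inbound sensing
    EFFECT = auto()        # outbound action
    GOVERNANCE = auto()    # adjudication / witness traffic


@dataclass(frozen=True)
class Channel:
    kind: ChannelType
    target: str    # named counterparty outside the institution
    distance: int  # abstract "reach" of the interaction


@dataclass
class Institution:
    name: str
    horizon: int                     # maximum permitted reach
    allowed: frozenset               # channel types this institution may use

    def may_use(self, ch: Channel) -> bool:
        """A transition is structurally possible only if the channel type is
        granted and the counterparty lies within the horizon."""
        return ch.kind in self.allowed and ch.distance <= self.horizon


inst = Institution("scheduler", horizon=2,
                   allowed=frozenset({ChannelType.OBSERVATION, ChannelType.EFFECT}))
print(inst.may_use(Channel(ChannelType.EFFECT, "local-queue", distance=1)))    # True
print(inst.may_use(Channel(ChannelType.EFFECT, "payment-api", distance=5)))    # False: beyond horizon
print(inst.may_use(Channel(ChannelType.GOVERNANCE, "regulator", distance=1)))  # False: type not granted
```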

The ‘Membrane’ functions as a critical component within the AI Space Physics architecture, serving as the designated surface for classifying transitions occurring at system boundaries. This classification process isn’t simply observational; it actively categorizes these transitions based on pre-defined criteria inherent to the system’s operational logic. Crucially, the Membrane also anchors ‘witness records’ – immutable data points documenting each classified boundary transition. These records serve as verifiable evidence of system state changes, providing a historical audit trail and ensuring accountability for all interactions occurring at the system’s perimeter. The anchoring of witness records is integral to maintaining the integrity and trustworthiness of the AI institution, as it provides a persistent and verifiable log of boundary-relevant events.
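A small sketch of this dual role follows: the membrane classifies a proposed boundary transition and anchors a witness record for it. The hash-chained, append-only log is an assumed mechanism for making the records tamper-evident; the paper specifies the obligation to witness, not this particular implementation.

```python
"""Sketch of a membrane that classifies boundary transitions and anchors
witness records. The hash chain is an illustrative choice for immutability."""
import hashlib
import json
import time


class Membrane:
    def __init__(self):
        self._witness_log = []          # append-only list of witness records
        self._prev_hash = "0" * 64      # genesis link for the hash chain

    def classify(self, transition: dict) -> str:
        """Toy classification rule: anything that widens the authority
        surface counts as an expansion; everything else is routine."""
        return "expansion" if transition.get("widens_authority") else "routine"

    def witness(self, transition: dict) -> dict:
        """Classify the transition and anchor a tamper-evident record."""
        record = {
            "at": time.time(),
            "transition": transition,
            "class": self.classify(transition),
            "prev": self._prev_hash,
        }
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        record["hash"] = digest
        self._prev_hash = digest
        self._witness_log.append(record)
        return record


m = Membrane()
rec = m.witness({"channel": "EFFECT", "target": "payment-api", "widens_authority": True})
print(rec["class"], rec["hash"][:12])   # e.g. "expansion 3fa9c1..."
```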

The AI Space Physics architecture mandates that all transitions identified as boundary-relevant undergo structural adjudication via the ‘Membrane’ component, a principle known as P-1a Non-Bypass. This enforcement is achieved through a foundational framework of four constitutive laws: P-1, P-1a, P-1b, and P-1c. These laws collectively define the permissible interactions and state changes within the system, ensuring that boundary events are not circumvented or processed outside of the established adjudication process. The strict adherence to these laws is critical for maintaining the integrity and traceability of boundary transitions within the AI institution.
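The sketch below illustrates the non-bypass idea in the spirit of P-1a: a boundary effect can only execute while adjudication is in progress, and any attempt to invoke it directly is rejected. The gateway-and-decorator pattern is an illustrative enforcement mechanism invented for this example, not the paper’s formalism.

```python
"""Sketch of a non-bypass discipline (in the spirit of P-1a): boundary effects
refuse to run outside the sanctioned adjudication path. Names are illustrative."""


class BypassError(RuntimeError):
    """Raised when a boundary effect is attempted outside adjudication."""


class Gateway:
    def __init__(self):
        self.witness_log = []       # stands in for the membrane's witness records
        self._adjudicating = False

    def adjudicate(self, transition: dict, effect):
        """The only sanctioned route from a classified transition to its effect."""
        self.witness_log.append(transition)
        self._adjudicating = True
        try:
            return effect()
        finally:
            self._adjudicating = False

    def boundary_effect(self, fn):
        """Decorator: the wrapped effect only runs while adjudication is active."""
        def wrapped(*args, **kwargs):
            if not self._adjudicating:
                raise BypassError(f"{fn.__name__} attempted to bypass the membrane")
            return fn(*args, **kwargs)
        return wrapped


gw = Gateway()

@gw.boundary_effect
def open_network_socket():
    return "socket opened"

print(gw.adjudicate({"channel": "EFFECT", "target": "network"}, open_network_socket))
try:
    open_network_socket()           # direct call, no adjudication in progress
except BypassError as e:
    print("blocked:", e)
```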

Managing the Load: A Calculus of Oversight

The Admissibility Profile functions as a configurable policy layer by defining the specific criteria used to evaluate boundary transitions within the system. These criteria encompass a set of rules and parameters that determine whether a proposed transition from one state to another is permissible. The profile allows administrators to customize these evaluation metrics without altering core system logic, enabling adaptation to evolving security requirements and operational contexts. Parameters within the profile can include data validation checks, access control lists, and time-based constraints, all of which are assessed during the transition evaluation process. This modular design facilitates granular control over system behavior and supports the implementation of diverse security policies.
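One way to picture this policy layer is as a configurable list of predicates evaluated against each proposed transition, as in the sketch below. The specific checks (access control list, required fields, time window) follow the examples in the text; their encoding as composable Python functions is an assumption for illustration.

```python
"""Sketch of an admissibility profile as a configurable set of checks over
proposed boundary transitions. Check names and encoding are illustrative."""
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import Callable

Check = Callable[[dict], bool]


@dataclass
class AdmissibilityProfile:
    checks: list = field(default_factory=list)

    def admits(self, transition: dict) -> bool:
        """A transition is admissible only if every configured check passes."""
        return all(check(transition) for check in self.checks)


def acl_check(allowed_targets: set) -> Check:
    """Access-control list over external counterparties."""
    return lambda t: t.get("target") in allowed_targets


def schema_check(required_fields: set) -> Check:
    """Data validation: the proposal must carry the required fields."""
    return lambda t: required_fields <= t.keys()


def office_hours_check(start: time, end: time) -> Check:
    """Time-based constraint on when transitions may be adjudicated."""
    return lambda t: start <= datetime.now().time() <= end


profile = AdmissibilityProfile(checks=[
    acl_check({"local-queue", "metrics-store"}),
    schema_check({"channel", "target", "requested_by"}),
    office_hours_check(time(8, 0), time(18, 0)),
])

proposal = {"channel": "EFFECT", "target": "local-queue", "requested_by": "planner"}
print(profile.admits(proposal))   # True only inside the configured time window
```

Because the checks are plain data attached to the profile, administrators can swap or tighten them without touching the adjudication machinery itself, which is the modularity the section describes.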

The ‘P-1b Atomic Adjudication-to-Effect’ mechanism guarantees that the system’s validation of a boundary transition and the subsequent action taken are completed as a single, indivisible operation. This atomic execution prevents potential race conditions that could arise from concurrent interactions. Specifically, it ensures that no partial application of a change occurs, and that the system state remains consistent even under high contention. By completing both validation and action before releasing control, ‘P-1b’ eliminates the possibility of an incomplete or inconsistent state resulting from interleaved operations, thereby maintaining data integrity and system reliability.
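A minimal concurrency sketch makes the guarantee tangible: validation and the resulting state change happen inside one critical section, so no interleaving can produce a half-applied transition. The single lock and the toy budget are illustrative; the paper states the atomicity requirement, not this mechanism.

```python
"""Sketch of atomic adjudication-to-effect (in the spirit of P-1b): validation
and effect are applied as one indivisible operation under contention."""
import threading


class AtomicBoundary:
    def __init__(self, budget: int):
        self._lock = threading.Lock()
        self.budget = budget            # toy piece of institutional state

    def adjudicate_and_apply(self, cost: int) -> bool:
        """Validate and mutate in one critical section: both happen or neither does."""
        with self._lock:
            if cost <= 0 or cost > self.budget:   # adjudication
                return False
            self.budget -= cost                   # effect
            return True


boundary = AtomicBoundary(budget=10)

def worker():
    for _ in range(100):
        boundary.adjudicate_and_apply(1)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(boundary.budget)   # 0, never negative: no racy partial application
```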

An increasing volume of external interactions can create a ‘Review Backlog’ within the adjudication process, directly reducing the system’s capacity for effective oversight. Design Corollary H-1 posits a quantifiable relationship between ‘risk-weighted expansion pressure’ – calculated based on the number and criticality of new external interactions – and available ‘review bandwidth’, which represents the capacity for manual or automated review. When expansion pressure exceeds review bandwidth, a backlog forms, increasing the potential for unvalidated or improperly authorized boundary transitions and, consequently, compromising the system’s established safety guarantees. This backlog isn’t simply a matter of processing time; it represents a direct reduction in the system’s ability to enforce its defined policies and maintain security.
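A toy arithmetic model of this relationship is sketched below. The linear accumulation rule (backlog grows by whatever risk-weighted pressure exceeds bandwidth each period, and drains when spare bandwidth remains) is an assumed functional form chosen for illustration; Design Corollary H-1 states the qualitative relationship, not this formula.

```python
"""Toy backlog model for the pressure-vs-bandwidth relationship in H-1.
The accumulation rule and numbers are assumptions for illustration only."""

def backlog_trajectory(pressures, bandwidth, start=0.0):
    """Backlog grows by the risk-weighted pressure reviewers cannot absorb
    in each period, and shrinks when spare bandwidth remains."""
    backlog, trajectory = start, []
    for p in pressures:
        backlog = max(0.0, backlog + p - bandwidth)
        trajectory.append(backlog)
    return trajectory

# Expansion pressure briefly exceeds a review bandwidth of 5 units per period.
pressures = [3, 4, 6, 8, 7, 4, 3, 2]
print(backlog_trajectory(pressures, bandwidth=5))
# [0.0, 0.0, 1.0, 4.0, 6.0, 5.0, 3.0, 0.0]
# A backlog forms while pressure > bandwidth, then drains once pressure falls below it.
```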

The Limits of Knowing: Embracing Inherent Constraints

The ‘Finite Observer’ principle posits a fundamental constraint on any system attempting complete self-awareness. Even with theoretically perfect instrumentation – sensors of infinite precision and zero noise – a system cannot fully observe its own internal state. This limitation arises because the very act of observation necessitates interaction, subtly altering the observed state and creating an inescapable feedback loop. Essentially, a system can never simultaneously be the observer and the unperturbed object of observation. This isn’t a matter of technological inadequacy, but rather an inherent property of information and interaction, suggesting that complete internal observability is an unattainable ideal, with implications for systems requiring absolute certainty about their own operation and state.

The inherent limits of observation directly influence the sustainability of scarcity continuation – systems designed to maintain resource limitations. A lack of complete knowledge regarding the causes of external tasks compels more frequent interventions at system boundaries, as adjudication processes must compensate for unobserved factors driving resource demand. This dynamic is formalized by two derived propositions which establish a clear relationship between external task causation and the resulting regimes of scarcity continuation. Essentially, incomplete observability increases the burden on adjudication, demanding greater oversight and potentially escalating operational costs to ensure the intended scarcity parameters are maintained; the more unknown the inputs, the more active the management required to sustain the limitation.
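The qualitative claim can be illustrated with a very small simulation: tasks whose external causes are observed can be handled by standing policy, while the rest must each be adjudicated at the boundary. The sampling model and numbers below are assumptions for illustration only, not the paper’s derived propositions.

```python
"""Toy illustration: lower observability of external task causes forces more
frequent boundary interventions to hold the same scarcity regime."""
import random

random.seed(0)

def interventions_needed(n_tasks: int, observed_fraction: float) -> int:
    """Tasks with unobserved causes cannot be pre-cleared and must be adjudicated."""
    return sum(1 for _ in range(n_tasks) if random.random() > observed_fraction)

for frac in (0.9, 0.6, 0.3):
    print(f"observed fraction {frac:.1f}: "
          f"{interventions_needed(1000, frac)} boundary interventions per 1000 tasks")
# Lower observability -> more transitions that cannot be pre-cleared,
# hence a heavier adjudication load to sustain the same limitation.
```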

Replayable adjudication, denoted as ‘P-1c’, proposes a method for establishing trust and accountability in systems despite inherent limitations in observation. This approach doesn’t attempt to overcome the impossibility of complete internal observability, but instead focuses on meticulously recording the parameters of state transitions – the specific conditions that trigger changes within the system. By preserving this record, past states and the causal factors driving transitions can be reconstructed and verified, offering a powerful audit trail. This is particularly crucial in scenarios where scarcity or resource allocation are dynamically managed, as it allows for independent confirmation of fair and consistent application of rules, even if real-time observation was incomplete or imperfect. Consequently, ‘P-1c’ shifts the focus from perfect present-time knowledge to verifiable historical accuracy, bolstering confidence in the system’s integrity and promoting reliable dispute resolution.
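The sketch below shows the replay idea in miniature: every adjudicated transition is logged with the parameters that determined it, so an auditor can later re-derive each state from the log alone and check it against the recorded outcome. The deterministic quota rule and the hashing scheme are illustrative choices, not the paper’s specification.

```python
"""Sketch of replayable adjudication (in the spirit of P-1c): logged transition
parameters let past states be reconstructed and independently verified."""
import hashlib
import json


def step(state: dict, params: dict) -> dict:
    """Deterministic transition rule: grant or deny a quota request."""
    granted = params["amount"] if params["amount"] <= state["quota"] else 0
    return {"quota": state["quota"] - granted, "granted": state["granted"] + granted}


def run_and_log(initial: dict, requests: list):
    """Apply each request and record its parameters plus the resulting state hash."""
    state, log = dict(initial), []
    for params in requests:
        state = step(state, params)
        log.append({"params": params,
                    "state_hash": hashlib.sha256(
                        json.dumps(state, sort_keys=True).encode()).hexdigest()})
    return state, log


def replay_and_verify(initial: dict, log: list) -> bool:
    """An auditor reconstructs every state from the logged parameters alone."""
    state = dict(initial)
    for entry in log:
        state = step(state, entry["params"])
        digest = hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()
        if digest != entry["state_hash"]:
            return False
    return True


initial = {"quota": 10, "granted": 0}
final, log = run_and_log(initial, [{"amount": 4}, {"amount": 7}, {"amount": 3}])
print(final)                              # {'quota': 3, 'granted': 7}
print(replay_and_verify(initial, log))    # True: history is reconstructible
```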

The pursuit of formalized governance, as detailed in the article, echoes a sentiment expressed by G. H. Hardy: “A mathematician, like a painter or a poet, is a maker of patterns.” This construction of ‘AI Space Physics’, a formal model for delineating authority and managing expansion, is, at its core, a pattern-making exercise. The article attempts to impose order on the inherently complex behavior of self-expanding AI institutions, treating boundary events such as expansion as critical junctures requiring rigorous ‘witness obligations’. This mirrors the mathematician’s drive to establish axiomatic structures, seeking precision and predictability where chaos might otherwise reign. The focus on ‘boundary mediation’ isn’t merely technical; it’s an attempt to define the very edges of acceptable behavior, a process fundamentally aligned with establishing the rules governing a mathematical system.

Where Do We Go From Here?

The proposition that authority itself possesses a surface, and that expansion across this surface demands formal witness, feels less like a solution than like a precise articulation of the problem. It clarifies that governance isn’t about controlling expansion, but about acknowledging it as a fundamental event, and then responsibly documenting its occurrence. The model offered does not eliminate the difficulty of defining “expansion” – that remains stubbornly context-dependent – but shifts the focus from prevention to rigorous accounting.

A natural progression lies in exploring the limits of this “witness discipline.” What constitutes sufficient evidence? How does one adjudicate conflicting accounts of boundary crossings? The temptation will be to layer complexity – more witnesses, finer-grained metrics – but that would betray the core principle. The true test will be whether the model can be stripped down to its essential components, retaining utility even in the face of ambiguity.

Ultimately, the value of “AI Space Physics” may not reside in its predictive power, but in its diagnostic capacity. It offers a framework for detecting when governance has failed – not because an AI exceeded its bounds, but because those bounds were never properly observed, or the act of crossing them was never acknowledged. The absence of witness, then, becomes the signal, and the simplest, most honest measure of all.


Original article: https://arxiv.org/pdf/2603.03119.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
