Author: Denis Avetisyan
Researchers are exploring how to combine the power of large language models with web technologies to create dynamic and controllable virtual worlds for artificial intelligence agents.

This paper introduces Web World Models, a novel architecture for constructing persistent, deterministic environments that enable robust agent reasoning and procedural generation.
Creating truly persistent and adaptable environments for AI agents remains a core challenge, often forcing a trade-off between rigid, database-driven systems and the unbounded but unpredictable nature of fully generative models. This paper introduces Web World Models (WWMs), an architecture that bridges this gap by leveraging ordinary web code to define consistent world states and “physics,” while employing large language models to generate dynamic narratives and high-level decisions. We demonstrate that this approach, built on a realistic web stack, enables the creation of scalable, controllable, and open-ended environments, from infinite travel atlases to complex simulations. Could web infrastructure itself become a fundamental substrate for building the next generation of intelligent, embodied agents?
The Illusion of Consistency: Limitations of Current LLMs
Despite their remarkable ability to generate human-quality text, Large Language Models often falter when tasked with constructing and maintaining a coherent world. These models, trained on vast datasets of text, excel at mimicking patterns and relationships, but lack a grounding in real-world consistency or the capacity for robust reasoning. Consequently, narratives produced by LLMs can suffer from internal contradictions, illogical events, or a shifting of established details: a character’s profession might change mid-story, or previously defined rules of the environment may be broken without explanation. This isn’t a failure of creativity, but rather a limitation of their architecture; the models prioritize statistical likelihood over factual accuracy or narrative integrity, making consistent world-building and dependable reasoning enduring challenges in the pursuit of truly immersive and believable artificial intelligence.
Despite the remarkable advancements in Large Language Models (LLMs) and their capacity to generate seemingly endless creative content, simply increasing their scale proves inadequate for building genuinely persistent and interactive environments. The core issue lies in a lack of inherent consistency; LLMs, trained on vast datasets of disconnected text, often struggle to maintain a coherent world state or remember details across extended interactions. This absence of ‘grounding’ – a connection to real-world knowledge or a defined set of rules – results in outputs that can be internally contradictory or nonsensical within the context of the simulated environment. Consequently, even the most powerful LLMs require supplementary mechanisms – such as memory systems, knowledge graphs, or external databases – to reliably track information and ensure logical coherence, moving beyond mere text generation towards true interactive simulation.

Decoupling Imagination: The Architecture of Web World Models
Web World Models represent a significant architectural shift in interactive application development by separating the foundational logic of a simulated environment from the generative capabilities of Large Language Models. Traditionally, application state and behavior are tightly coupled, making dynamic content creation and consistent world operation challenging. This new paradigm distinctly defines a deterministic core – responsible for managing state, physics, and core mechanics – and an ‘imagination’ layer driven by LLMs. This decoupling allows LLMs to contribute descriptive text, narratives, and dynamic elements without directly controlling the underlying world’s fundamental behavior, ensuring predictable system operation even with variable LLM outputs.
Decoupling world physics from imaginative content generation via Web World Models facilitates consistent state management by isolating the deterministic elements responsible for maintaining world consistency from the non-deterministic outputs of Large Language Models. This architecture ensures predictable behavior because the core world state is governed by code, unaffected by the potentially variable descriptive content generated by the LLM. Consequently, even with dynamic content creation, the system maintains a stable and trackable world state, preventing inconsistencies and enabling reliable interactions and simulations. The LLM’s output, while influencing the description of the world, does not directly alter the underlying, deterministic rules governing its functionality.
Web World Models leverage a dual-system architecture founded on deterministic code and large language models. The deterministic code component functions as the foundational layer, responsible for consistently updating and maintaining the state of the virtual environment based on defined rules and inputs. This ensures predictable and repeatable outcomes for all interactions within the simulation. Complementing this, large language models are employed to generate descriptive text and dynamic content, enriching the user experience without altering the underlying stability of the simulated environment. This separation of concerns allows for creative expression through the LLM while guaranteeing a reliable and consistent world state managed by the deterministic code.
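As a rough illustration of this split, consider the following TypeScript sketch; the types and function names (WorldState, applyAction, describeScene) and the llm client are invented here for illustration and are not taken from the paper’s implementation. State transitions live entirely in plain code, while the LLM only narrates a state it is handed.

```typescript
// Deterministic core: world state transitions are pure functions of
// (state, action). No LLM output ever touches this layer.
interface WorldState {
  tick: number;
  playerLocation: string;
  inventory: string[];
}

type Action =
  | { kind: "move"; to: string }
  | { kind: "pickUp"; item: string };

function applyAction(state: WorldState, action: Action): WorldState {
  switch (action.kind) {
    case "move":
      return { ...state, tick: state.tick + 1, playerLocation: action.to };
    case "pickUp":
      return {
        ...state,
        tick: state.tick + 1,
        inventory: [...state.inventory, action.item],
      };
  }
}

// Imagination layer: the LLM only describes the state it receives;
// its output is display text and never feeds back into applyAction.
async function describeScene(
  state: WorldState,
  llm: { complete: (prompt: string) => Promise<string> } // hypothetical client
): Promise<string> {
  return llm.complete(
    `Describe, in two sentences, a traveller standing at ${state.playerLocation} ` +
      `carrying ${state.inventory.join(", ") || "nothing"}.`
  );
}
```

Because applyAction is pure and synchronous, the world behaves identically whether the LLM responds brilliantly, slowly, or not at all.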

Applications Demonstrated: Scaling Interactive Worlds
Web World Models (WWMs) facilitate the creation of applications capable of generating large-scale, explorable content without requiring pre-authored assets for every element. Applications such as Infinite Travel Atlas, Galaxy Travel Atlas, and WWMPedia leverage this capability to construct expansive virtual environments and knowledge repositories. These applications dynamically generate content based on user interaction and underlying data, effectively creating infinitely large worlds or datasets. The WWMs achieve this by defining rules and relationships that govern the generation of content, rather than storing the content itself, allowing for exploration of virtually limitless spaces and information.
AI Alchemy and Cosmic Voyager leverage agent reasoning to perform complex simulations entirely within the deterministic framework of Web World Models. This means all simulation outcomes are predictable given the initial state and agent actions, eliminating randomness. Agents within these applications are defined by their objectives and permissible actions, interacting with the simulated environment through defined interfaces. The deterministic nature allows for precise control and repeatability of experiments, crucial for applications like virtual chemical synthesis in AI Alchemy or celestial mechanics modeling in Cosmic Voyager. These simulations do not rely on probabilistic methods; instead, agent behavior and environmental responses are governed by explicit rules and logical operations, ensuring consistent results for identical inputs.
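To make the idea concrete, here is a hypothetical sketch of such a rule-driven step in TypeScript; the element names and combination table are invented for illustration and are not the actual rules of AI Alchemy. Because every outcome is an explicit table lookup, identical inputs always yield identical states.

```typescript
// Hypothetical deterministic rule table: agent actions map to outcomes
// by explicit rules, with no probabilistic sampling anywhere in the step.
type Element = "fire" | "water" | "earth" | "air" | "steam" | "mud" | "dust";

// Keys are the two inputs in sorted order, so lookup is order-independent.
const combine: Record<string, Element | undefined> = {
  "fire+water": "steam",
  "earth+water": "mud",
  "air+earth": "dust",
};

interface AlchemyState {
  discovered: Element[];
}

function step(state: AlchemyState, a: Element, b: Element): AlchemyState {
  const key = [a, b].sort().join("+");
  const result = combine[key];
  if (!result || state.discovered.includes(result)) {
    return state; // unknown or already-known combination: state unchanged
  }
  return { discovered: [...state.discovered, result] };
}
```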
Applications leveraging Web World Models achieve scalability through procedural hashing and typed interfaces. Procedural hashing allows for the generation of content on demand, avoiding the need to pre-store vast datasets; unique identifiers are created algorithmically, enabling access to content without requiring explicit storage of all possible states. This is coupled with typed interfaces, which define strict data structures for communication between components and ensure interoperability. These interfaces facilitate seamless data exchange, allowing different modules to interact reliably and efficiently, regardless of their underlying implementation, and contributing to a modular and scalable system architecture.
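A minimal sketch of procedural hashing follows, assuming a simple FNV-1a string hash and an invented Place interface (neither is claimed to match the paper’s code): any coordinate in an unbounded grid resolves to the same record every time, without that record ever being stored.

```typescript
// Typed interface for generated content; the shape is fixed even though
// no instance is stored ahead of time.
interface Place {
  id: string;
  name: string;
  climate: "arid" | "temperate" | "tropical" | "polar";
  population: number;
}

// Small deterministic string hash (FNV-1a); the same key always yields
// the same 32-bit value, so generation is reproducible and cacheable.
function fnv1a(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function generatePlace(x: number, y: number): Place {
  const id = `place:${x}:${y}`;
  const h = fnv1a(id);
  const climates: Place["climate"][] = ["arid", "temperate", "tropical", "polar"];
  return {
    id,
    name: `Settlement ${h.toString(36)}`,
    climate: climates[h % climates.length],
    population: 100 + (h % 1_000_000),
  };
}

// Any coordinate resolves to the same Place on every call, with no
// database row backing it.
const sameTwice =
  JSON.stringify(generatePlace(42, -7)) === JSON.stringify(generatePlace(42, -7)); // true
```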

Resilience by Design: Graceful Degradation and Deterministic Consistency
A notable strength of Web World Models lies in their ability to maintain functionality even when faced with unpredictable large language model (LLM) performance. Rather than failing outright when an LLM call times out or returns an error, the system is designed for graceful degradation. This means the experience doesn’t simply halt; instead, the simulation adapts, potentially offering a slightly simplified interaction or utilizing cached data to keep the world responsive. This resilience is achieved through careful architectural choices, prioritizing a consistent user experience over complete reliance on always-available LLM responses. Consequently, users remain engaged with the virtual environment, even during temporary disruptions, which is crucial for applications demanding high reliability and consistent performance.
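One plausible way to implement this fallback path, sketched in TypeScript with a hypothetical llm.complete() client: if the model call times out or fails, the world answers from a cache or a plain template instead of erroring out.

```typescript
// Minimal sketch of graceful degradation under an unreliable LLM.
const descriptionCache = new Map<string, string>();

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("LLM timeout")), ms)
    ),
  ]);
}

async function describeOrDegrade(
  key: string,
  llm: { complete: (prompt: string) => Promise<string> } // hypothetical client
): Promise<string> {
  try {
    const text = await withTimeout(llm.complete(`Describe ${key}`), 3000);
    descriptionCache.set(key, text); // remember good responses for reuse
    return text;
  } catch {
    // Degrade: reuse the last good description, or a plain template,
    // so the world stays responsive while the LLM is unavailable.
    return descriptionCache.get(key) ?? `You arrive at ${key}.`;
  }
}
```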
The foundation of a believable and expansive simulated world within Web World Models rests on deterministic generation. This means that, given the same initial conditions and inputs, the system will consistently produce identical outputs. This isn’t merely about aesthetic uniformity; it’s a critical requirement for scalability. By eliminating randomness, the system avoids branching possibilities that would exponentially increase computational demands. Deterministic behavior enables efficient caching, streamlined debugging, and reliable reproduction of events – essential for handling a growing number of users and complex interactions within the simulation. Without this consistency, the simulated world would become unpredictable and ultimately unsustainable, undermining the user’s sense of immersion and trust.
The foundation of this system’s dependability rests on established web technologies. Utilizing TypeScript brings enhanced code quality and maintainability through static typing, minimizing runtime errors and facilitating large-scale development. Furthermore, the implementation of HTTP streaming allows for a responsive user experience by delivering information progressively, rather than waiting for the entire simulation to render. This approach not only reduces perceived latency but also enables the system to gracefully handle fluctuating network conditions and varying computational loads. By embracing these proven technologies, the architecture ensures both a robust and performant experience, capable of consistently delivering a functional and engaging simulation even under challenging circumstances.
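As an illustrative sketch (using the standard Web Streams API available in modern runtimes, not the paper’s actual server code), a handler can send the deterministic page shell immediately and append LLM-generated prose as it arrives:

```typescript
// Progressive delivery: the deterministic structure streams first, and
// generated narrative follows when (and if) the model responds.
function streamRegion(
  regionName: string,
  llm: { complete: (prompt: string) => Promise<string> } // hypothetical client
): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      // 1. Deterministic shell: available instantly, independent of the LLM.
      controller.enqueue(encoder.encode(`<h1>${regionName}</h1>`));
      // 2. Generated narrative: appended as soon as it arrives.
      try {
        const prose = await llm.complete(`Write a short travel note on ${regionName}.`);
        controller.enqueue(encoder.encode(`<p>${prose}</p>`));
      } catch {
        controller.enqueue(encoder.encode(`<p>Notes for this region are being written.</p>`));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "content-type": "text/html; charset=utf-8" },
  });
}
```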

The Future: Neuro-Symbolic AI and the Convergence of Reasoning
Web World Models signify a crucial advancement toward Neuro-Symbolic Artificial Intelligence, a paradigm that strategically merges the capabilities of neural networks with the precision of symbolic reasoning. Traditionally, neural networks excel at pattern recognition and learning from vast datasets, but often lack the ability to explain how they arrive at conclusions. Conversely, symbolic reasoning provides logical, interpretable steps, yet struggles with the ambiguity and complexity of real-world data. Web World Models bridge this gap by enabling systems to learn from data – much like neural networks – and then represent that knowledge in a structured, symbolic format. This allows for both creative generation and verifiable reasoning, leading to AI that is not only intelligent but also transparent and reliable – a necessary step towards truly immersive and interactive digital experiences.
The convergence of neural networks and symbolic reasoning yields systems distinguished by a unique blend of capabilities. Unlike traditional neural networks, often considered ‘black boxes’, systems built on this neuro-symbolic synergy support interpretable decision-making, where the rationale behind an action can be traced and understood. This transparency directly fosters reliability, as errors are more easily identified and corrected. Furthermore, these integrated systems transcend simple pattern recognition; they exhibit the capacity for complex reasoning, enabling them to solve problems requiring logical deduction, planning, and adaptation to novel situations – essentially bridging the gap between data-driven insights and human-like cognitive abilities.
The convergence of neural networks and symbolic reasoning promises a future where interactive digital environments transcend current limitations, offering experiences indistinguishable from reality. These next-generation worlds will not simply respond to user input, but genuinely understand intent and context, fostering dynamic narratives and personalized interactions. Imagine virtual spaces capable of learning from user behavior, adapting to individual preferences, and even generating novel content on the fly – all grounded in logical consistency and interpretable reasoning. This neuro-symbolic synergy allows for the creation of truly immersive experiences, where the boundary between the physical and the digital blurs, opening doors to unprecedented opportunities in entertainment, education, and beyond. The potential extends to creating digital twins of real-world environments with intelligent agents that can simulate, predict, and optimize complex systems, fundamentally changing how individuals interact with technology and the world around them.

The pursuit of deterministic systems, as exemplified by Web World Models, aligns with a fundamental principle of computational elegance. Ken Thompson once stated, “Software is only ever truly complete when it’s been shipped,” a sentiment that resonates with the WWM’s focus on creating shipped environments: persistent, controllable worlds for agent interaction. This isn’t merely about functionality; it’s about building systems where behavior is predictable and verifiable, mirroring the mathematical purity sought in elegant algorithms. The WWM architecture, by grounding agent reasoning in a deterministic web framework, strives for a completeness that transcends mere empirical testing, aiming instead for provable scalability and reliability, a cornerstone of truly elegant design.
Where Do We Go From Here?
The construction of Web World Models, while a pragmatic step toward scalable agent environments, merely relocates the fundamental challenge: ensuring deterministic behavior within a probabilistic system. The allure of procedural generation, coupled with large language models, should not distract from the fact that true control demands provable state transitions. Every interaction, every rendered element, introduces a potential source of non-determinism, and the architecture, as presented, offers no guarantee against emergent, unpredictable behavior. Minimizing redundancy in the environment’s construction is paramount; any superfluous element is an abstraction leak waiting to manifest as an illogical agent action.
Future work must concentrate on formal verification of these generated worlds. The current reliance on empirical testing – observing whether an agent “succeeds” – is philosophically unsatisfying and practically insufficient. A rigorous mathematical framework for specifying world constraints and validating agent interactions is essential. Only then can the promise of truly controllable, persistent environments be realized, moving beyond the illusion of intelligence to genuine, demonstrable reasoning.
The pursuit of neuro-symbolic integration should not be conflated with simply appending a symbolic layer onto a stochastic process. The elegance of a solution lies not in its complexity, but in its simplicity. The most powerful environments will be those built on the fewest assumptions, the most rigorous constraints, and the most ruthlessly pruned redundancies.
Original article: https://arxiv.org/pdf/2512.23676.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/