AI Takes the Helm: Mastering Nuclear Reactor Control

Author: Denis Avetisyan


A new approach to artificial intelligence demonstrates reliable power control of a nuclear reactor through data-driven learning and closed-loop simulation.

An integrated framework for agentic physical AI demonstrates that scaling foundation models from 1K to 100K nuclear reactor scenarios induces qualitative phase transitions: precision rises from 26.2% to 92%, variance collapses by 500×, and policy entropy compresses from 1.38 to 0.89 nats. A two-phase curriculum leveraging CPT and LoRA stabilizes agentic policies by separating domain structure from task specialization, concentrating 76% of actions on single strategies despite limited training frequency, and ultimately achieving closed-loop control within specified tolerances in a physics-constrained environment <span class="katex-eq" data-katex-display="false">\mathcal{M}\_{\text{feas}}</span>.

Researchers develop a domain-specific foundation model, trained with a physics-aligned policy and scaled data, to achieve a stable control manifold for reactor power management.

Despite advances in artificial intelligence, reliably controlling complex physical systems remains a significant challenge, particularly given the limitations of scaling general-purpose foundation models to tasks demanding precise quantitative reasoning. This is addressed in ‘Agentic Physical AI toward a Domain-Specific Foundation Model for Nuclear Reactor Control’, which demonstrates that a compact, physics-validated language model can learn a robust control policy for nuclear reactor power management. By scaling a synthetic dataset and employing closed-loop validation, the authors induce a phase transition toward stable, low-variance execution, revealing an emergent concentration on optimal control strategies. Could this approach, which prioritizes physics-aligned learning over perceptual imitation, represent a viable pathway toward safe and reliable AI for critical infrastructure?


The Inevitable Limits of Parameter-Driven Control

Conventional nuclear reactor control historically centers on the meticulous adjustment of operational parameters – such as control rod position, coolant flow, and power level – often implemented through single-bank control strategies. This approach, while seemingly direct, proves inherently limited by the reactor’s complex and nonlinear dynamics. Maintaining stable operation requires extensive and often painstaking tuning of these parameters, a process that struggles to account for the myriad interactions within the core and the potential for unforeseen disturbances. The reliance on precise parameter matching creates a rigid system, ill-equipped to adapt to changing conditions or efficiently respond to unexpected events, ultimately hindering optimal performance and increasing the risk of instability.

Conventional nuclear reactor control systems, while historically effective, are increasingly challenged by the intricate and often unpredictable nature of reactor dynamics. Maintaining stability isn’t simply a matter of setting parameters; it demands continual, meticulous adjustment as the reactor responds to shifting conditions and internal feedback loops. This reliance on precise parameter tuning arises from the fact that reactors aren’t static systems – neutron flux, temperature distributions, and coolant flow all interact in non-linear ways. Consequently, engineers must painstakingly calibrate control mechanisms for every operating scenario, a process that is both time-consuming and prone to error, especially when facing unforeseen disturbances or complex transients. The inherent difficulty in modeling and predicting these dynamic interactions necessitates a level of fine-tuning that pushes the limits of traditional control strategies.

Conventional nuclear reactor operation prioritizes the precise manipulation of control parameters – coolant flow, rod position, and so on – to maintain desired operating conditions. However, this approach struggles with the reactor’s inherent complexity and can necessitate extensive, and often brittle, tuning. A fundamentally different strategy – outcome-centric control – instead focuses directly on achieving desired results, such as stable power output or safe shutdown, irrespective of the specific parameter settings. Recent investigations utilizing this paradigm, trained across 100,000 simulated scenarios, demonstrate a 97.4% success rate in closed-loop control, suggesting that prioritizing outcomes over meticulous parameter matching offers a pathway to more robust and adaptable reactor management. This shift represents a move away from prescriptive control and towards a system capable of intelligently navigating the complexities of reactor dynamics.

Increasing the dataset size to 100K samples enables the emergence of a robust, high-precision control policy, demonstrated by a significant jump in validation success (<span class="katex-eq" data-katex-display="false">26.2\%</span> to <span class="katex-eq" data-katex-display="false">92\%</span>) and consistent performance across operational regimes.

Beyond Reactive Control: The Promise of Physical AI

Traditional reactor control systems focus on maintaining specific parameter setpoints – temperature, pressure, neutron flux – through feedback loops. Physical AI represents a departure from this approach, instead directly optimizing for desired physical outcomes within the reactor core, such as sustained energy production or efficient isotope generation. This shift means the AI isn’t simply reacting to deviations from pre-defined values; it’s learning complex control policies that directly influence the reactor’s physical state to achieve broader operational goals, even if that requires temporary deviations from nominal parameter settings. This outcome-centric methodology allows for greater adaptability and optimization of overall reactor performance beyond the limitations of precise, but ultimately inflexible, parameter control.

The implementation of complex reactor control policies utilizes the SmolLM2-360M language model, a 360 million parameter system trained to interpret reactor states and predict optimal control actions. This model diverges from traditional control systems by learning directly from data, enabling it to identify and execute strategies beyond pre-programmed responses. SmolLM2-360M is trained on a dataset of simulated reactor operations, allowing it to generalize to unseen conditions and autonomously refine control parameters to achieve desired outcomes. The model’s architecture facilitates the translation of high-level objectives – such as maximizing energy output or maintaining stability – into precise actuator commands, effectively functioning as a learned control algorithm.

Traditional reactor control systems, typically employing Proportional-Integral-Derivative (PID) controllers, focus on maintaining specific parameter setpoints, limiting their ability to respond to dynamic operational changes. In contrast, Physical AI prioritizes achieving desired reactor performance outcomes, enabling adaptation to fluctuating conditions without explicit reprogramming. Benchmarking demonstrates a significant performance difference: Physical AI achieved a 97.4% closed-loop success rate in maintaining optimal reactor states, while conventional PID controllers attained only 43.8% under identical operational scenarios. This substantial improvement is attributed to the AI’s capacity to learn and implement complex control policies that optimize overall reactor performance rather than individual parameter regulation.
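To make the PID baseline concrete, the sketch below runs a textbook discrete PID loop against a toy first-order plant. The plant model, the gains, and the time step are illustrative assumptions for this sketch, not the paper’s reactor simulator or its tuned baseline controller.

```python
# Minimal PID sketch. The first-order "plant" is a toy stand-in for a
# reactor power channel; kp/ki/kd are illustrative choices, not the
# paper's benchmark configuration.

def pid_step(error, state, kp=2.0, ki=0.5, kd=0.1, dt=0.1):
    """One discrete PID update; state = (integral, previous error)."""
    integral, prev_error = state
    integral += error * dt
    derivative = (error - prev_error) / dt
    u = kp * error + ki * integral + kd * derivative
    return u, (integral, error)

def run_closed_loop(setpoint=1.0, power=0.5, steps=200, tau=2.0, dt=0.1):
    """Drive a first-order lag plant toward the power setpoint."""
    state = (0.0, 0.0)
    for _ in range(steps):
        u, state = pid_step(setpoint - power, state, dt=dt)
        power += dt * (u - power) / tau  # toy plant dynamics
    return power

print(f"terminal power: {run_closed_loop():.3f}")  # settles near 1.0
```

The contrast drawn in the text is that such a controller regulates one error signal per loop, whereas the learned policy optimizes the episode-level outcome directly.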

Agentic AI demonstrates substantially improved robustness and success rates (<span class="katex-eq" data-katex-display="false">97.4\%</span> versus <span class="katex-eq" data-katex-display="false">43.8\%</span> for PID control), particularly under large power changes, by effectively mitigating catastrophic errors and managing tail risk, as evidenced by its superior error distribution and cumulative distribution function.

Building the Intelligent Core: Data, Training, and Validation

Effective training of the SmolLM2-360M controller necessitates the application of Offline Data Scaling to expand the training dataset beyond initially available operational data. This technique generates synthetic data representing a broader spectrum of reactor states and control actions, increasing the model’s exposure to diverse scenarios. Scaling is achieved through techniques like trajectory augmentation and noise injection, effectively increasing the dataset size without requiring new real-world data collection. The resulting expanded dataset is crucial for improving the controller’s generalization capability and robustness, allowing it to effectively manage a wider range of operational conditions and unforeseen events within the reactor simulation environment.
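The noise-injection half of this scaling step can be sketched as follows. The trajectory format (state–action pairs) and the noise scale are assumptions for illustration, not the paper’s exact data pipeline.

```python
# Sketch of noise-injection augmentation for offline data scaling: each
# recorded trajectory spawns perturbed copies that widen state coverage
# without new real-world data collection. Format and sigma are assumed.
import random

def augment(trajectory, copies=3, sigma=0.02, seed=0):
    """Return the original trajectory plus `copies` noisy variants."""
    rng = random.Random(seed)
    out = [trajectory]
    for _ in range(copies):
        out.append([
            ([s + rng.gauss(0.0, sigma) for s in state], action)
            for state, action in trajectory
        ])
    return out

base = [([0.95, 1.02], "raise_bank"), ([1.00, 1.00], "hold")]
dataset = augment(base, copies=3)
print(len(dataset))  # → 4 (1 original + 3 augmented trajectories)
```

Trajectory augmentation would work analogously, resampling or recombining whole episodes rather than perturbing individual states.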

The SmolLM2-360M controller is trained using a two-phase curriculum that separates domain structure from task specialization. The first phase applies continued pretraining (CPT) to instill the structure of the domain, establishing a foundation before any control-specific objectives are introduced. The model then undergoes adaptation with Low-Rank Adaptation (LoRA), a parameter-efficient technique that allows specialization in reactor control without retraining the entire model. LoRA minimizes computational cost and data requirements during this second phase, enabling the model to apply its domain foundation to the nuances of reactor operation and command interpretation.
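The parameter-efficiency argument behind LoRA can be made concrete with a minimal from-scratch sketch of a low-rank adapted linear layer. The dimensions, rank, and scaling below are illustrative, not SmolLM2-360M’s actual configuration.

```python
# Sketch of the low-rank adaptation idea: a frozen weight W is augmented
# with a trainable rank-r update B @ A, so phase-two specialization
# touches only r*(d_in + d_out) parameters instead of d_in*d_out.
import numpy as np

class LoRALinear:
    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
        self.A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, r))               # trainable up-projection, init 0
        self.scale = alpha / r

    def __call__(self, x):
        # B starts at zero, so the adapted layer initially equals the base layer
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(d_in=16, d_out=8)
x = np.ones(16)
print(np.allclose(layer(x), layer.W @ x))  # → True before adapter training
```

Here the adapter holds 96 trainable values against 128 in the frozen base layer; at transformer scale the same ratio shrinks to a small fraction of a percent.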

Rigorous evaluation of the SmolLM2-360M controller utilizes closed-loop validation performed within a reactor simulation environment. Performance is quantitatively assessed using Power Change Tolerance metrics, which define acceptable operational limits. This validation process demonstrably reduces the 95th percentile terminal power error (Q95) – a key indicator of control accuracy – from approximately 40% when training on limited data to 1%. This substantial reduction in Q95 signifies a significant improvement in the controller’s ability to maintain stable and accurate reactor power levels during operation, and confirms its safe operational characteristics as verified by the simulation environment.
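A Q95-style metric of this kind can be computed directly from validation episodes, as in the sketch below. The error samples are synthetic and illustrative; the paper’s figures (roughly 40% down to 1%) come from its own closed-loop runs.

```python
# Sketch of a Q95 metric: the 95th percentile of absolute relative
# terminal power error across validation episodes, in percent.
import numpy as np

def q95_terminal_error(terminal_powers, targets):
    """95th-percentile absolute relative error, in percent."""
    errors = np.abs(np.asarray(terminal_powers) - np.asarray(targets))
    rel = 100.0 * errors / np.asarray(targets)
    return float(np.percentile(rel, 95))

# Synthetic example: most episodes land within about 1% of target
rng = np.random.default_rng(0)
targets = np.full(1000, 100.0)
powers = targets + rng.normal(0.0, 0.5, size=1000)
print(f"Q95 = {q95_terminal_error(powers, targets):.2f}%")
```

Reporting a high quantile rather than a mean is what ties the metric to tail risk: a controller can only score well if its worst episodes, not just its typical ones, stay inside tolerance.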

Model scale drives strategic specialization, evolving from uniformly low success rates at 1K to selective excellence at 10K and near-perfect discrimination at 100K, indicating the learning of a hierarchical risk structure that prioritizes safe operation and deploys higher-risk strategies only when necessary.

Towards Physics-Aligned Control: A Foundation for Robustness

Physics-Aligned Control emerges from a novel integration of a Domain-Specific Foundation Model within the established Physical AI framework, resulting in systems exhibiting markedly stable and predictable behaviors. This approach diverges from conventional methods by embedding a deep understanding of physical principles directly into the control architecture; the Foundation Model learns the underlying dynamics of the environment, enabling proactive adaptation and robust performance even in the face of unforeseen disturbances. By grounding control decisions in learned physics, the system minimizes the need for reactive corrections, leading to smoother, more efficient movements and a significant reduction in instability. This inherent stability isn’t merely a characteristic of the model’s training, but a fundamental property arising from its physics-informed foundation, allowing for generalization to novel situations and reliable performance across a wider range of conditions.

Attempts to directly apply Vision-Language Models (VLMs) to closed-loop physical control systems consistently fell short due to their reliance on correlational reasoning rather than a fundamental understanding of physics. While VLMs excel at associating images with text, this approach proves inadequate when precise, real-time adjustments are required to maintain stability and achieve desired outcomes in a dynamic physical environment. Unlike VLMs, which struggle with extrapolating beyond observed data, a physics-aligned approach grounds control decisions in established physical principles, enabling robust performance even in unforeseen circumstances. This distinction highlights the critical need for models that not only perceive the physical world, but also understand its underlying mechanics to effectively govern its behavior, a requirement that correlational models inherently lack.

The implementation of Simultaneous Control represents a significant advancement in reactor control systems, leveraging multiple, independent control banks to achieve enhanced flexibility and responsiveness beyond the capabilities of traditional single-bank approaches. Rigorous testing reveals a super-linear scaling exponent of <span class="katex-eq" data-katex-display="false">\alpha = 1.24</span> at a tolerance of ±1%, demonstrating that performance doesn’t simply improve with more data, but undergoes a qualitative shift towards markedly superior control. This model’s robustness is further substantiated by its exceptional transfer success rate, exceeding 94% when deployed within the PyRK simulator, suggesting a capacity for reliable adaptation to new and unseen environments and tasks.

PyRK consistently outperforms the variable window approach in parsing and validation success rates, demonstrating greater robustness across maneuver regimes and exhibiting significantly lower error variance, while the variable window approach prioritizes flexible time-window handling over strict accuracy.

The pursuit of agentic physical AI, as demonstrated in this work, echoes a sentiment articulated by Ada Lovelace: “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.” This research doesn’t seek to create intelligence ex nihilo, but rather to meticulously instruct a system, a compact language model, to navigate the complex control manifold of a nuclear reactor. The two-phase curriculum and data scaling aren’t about granting the AI creativity, but about refining its ability to execute a physics-aligned policy with increasing reliability. The emphasis on closed-loop validation signifies an acknowledgement that even sophisticated architecture, without a grounding in empirical observation, remains fragile and ephemeral: a system destined to decay without graceful aging.

What’s Next?

The demonstration of a compact agentic system navigating the control manifold of a nuclear reactor presents not a culmination, but a sharpening of the inevitable. Uptime, even within simulation, is merely a temporary reprieve from entropy. The core challenge now isn’t achieving control (any system, given enough constraint, can exhibit temporary order) but understanding the limits of learned physics. The induced stability, while promising, remains a cached illusion, its duration inversely proportional to the complexity of unmodeled phenomena.

Future work will undoubtedly explore the scaling of both data and model complexity. However, a more pressing question involves the characterization of failure modes. What subtle perturbations, currently masked by the curriculum, will reveal the brittleness of this learned control? The path isn’t simply toward larger models, but toward models that explicitly acknowledge their own ignorance: systems that can quantify the latency inherent in every request for action, and adapt accordingly.

Ultimately, the field must move beyond seeking perfect control and embrace the inherent stochasticity of physical systems. The true metric isn’t the absence of error, but the system’s capacity to gracefully degrade, to trade performance for continued operation as the inevitable decay of the control manifold progresses. The objective is not to prevent the system from aging, but to ensure it does so with a degree of predictable resilience.


Original article: https://arxiv.org/pdf/2512.23292.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-12-30 16:37