Author: Denis Avetisyan
New research reveals deep generative models can independently infer the underlying Hamiltonian dynamics of complex systems, going beyond simple statistical approximation.
![Without relying on predefined physical principles, a deep generative model operating on a dense attention graph autonomously derived the underlying structure of a one-dimensional spin chain, including sequence identities and frustrated short-range interactions. By algebraically inverting neural predictions to collapse onto an exact Hamiltonian basis, the model effectively functions as a direct force estimator via the equivalence between its diffusion score field and the thermodynamic restoring force, [latex]\mathbf{s}_{\theta}\equiv-\beta\nabla H[/latex].](https://arxiv.org/html/2604.20821v1/x1.png)
Researchers demonstrate that deep generative models, guided by physical priors, autonomously recover the Hamiltonian governing many-body systems through score matching and Riemannian geometry.
The remarkable success of deep generative models in representing complex physical systems raises a fundamental question: do these networks simply memorize training data, or do they genuinely learn underlying physical laws? In the work ‘Autonomous Emergence of Hamiltonian in Deep Generative Models’, we introduce a framework to rigorously extract implicit physical interactions from these models, establishing an equivalence between score fields and thermodynamic forces. This allows us to demonstrate that a trained neural network can autonomously recover the microscopic Hamiltonian parameters governing a frustrated spin glass system, achieving 99.7% cosine similarity with ground truth values without any energetic priors. Does this represent a pathway towards discovering physical laws directly from data, and what are the limits of this emergent understanding?
The Illusion of Control: Reverse Engineering Physical Reality
For centuries, the core endeavor of physics has been predictive: given a set of microscopic rules governing the interactions of particles, scientists have sought to explain and anticipate the macroscopic behaviors that emerge. However, a growing field, Inverse Statistical Mechanics, proposes a compelling reversal of this traditional approach. Instead of starting with the rules and predicting the outcome, this discipline attempts to infer those very rules – the underlying Hamiltonian, or energy function – simply by observing the system’s equilibrium state. This isn’t merely an academic exercise; understanding how macroscopic patterns arise from hidden microscopic laws promises breakthroughs in diverse areas, from materials science and condensed matter physics to biology and even economics, though extracting those fundamental laws from complex observed data remains a formidable computational and inferential challenge.
Inverse Statistical Mechanics, while theoretically promising, confronts substantial computational hurdles. Determining the microscopic rules governing a system solely from its macroscopic equilibrium state demands solving equations that are often analytically intractable, even for relatively simple models. The sheer dimensionality of the possible Hamiltonian landscapes presents a significant challenge, requiring sophisticated algorithms and substantial computing power to navigate effectively. Moreover, reliably distinguishing the ‘true’ Hamiltonian from a multitude of others that could produce similar observed behavior necessitates the development of robust inference methods capable of quantifying uncertainty and avoiding spurious solutions. This reliance on statistical inference introduces complexities in validating results and ensuring the recovered microscopic laws accurately reflect the underlying physical reality, pushing the boundaries of current computational and statistical techniques.
The pursuit of understanding complex systems-from protein folding to the dynamics of financial markets-hinges on identifying the underlying Hamiltonian, the mathematical expression defining the system’s total energy and thus its behavior. Successfully reconstructing this Hamiltonian from observed equilibrium states represents a formidable challenge, as infinitely many microscopic interactions can yield the same macroscopic properties. This ambiguity necessitates sophisticated inference methods and computational power to navigate the vast landscape of possible Hamiltonians, effectively distinguishing plausible models from those that merely mimic observed data. While traditional statistical mechanics predicts outcomes from known rules, this inverse problem requires deducing the rules themselves, a task complicated by the inherent limitations in extracting detailed microscopic information from macroscopic observations and the potential for non-unique solutions. Progress in this area promises not only a deeper comprehension of established systems, but also the capacity to model and predict the behavior of entirely new, complex phenomena.
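The inverse mapping is easiest to see in a toy setting. The sketch below (illustrative only, not the paper's method) recovers the couplings of a tiny Ising chain directly from its exact Boltzmann distribution, exploiting the fact that log-probabilities are linear in the Hamiltonian parameters:

```python
import itertools
import numpy as np

# Toy inverse problem: recover nearest-neighbour couplings J_i of a
# 4-spin Ising chain from its exact Boltzmann distribution.
# H(s) = -sum_i J_i * s_i * s_{i+1}, with spins s_i in {-1, +1}.

rng = np.random.default_rng(0)
N, beta = 4, 1.0
J_true = rng.normal(size=N - 1)                    # hidden microscopic rules

states = np.array(list(itertools.product([-1, 1], repeat=N)))
features = states[:, :-1] * states[:, 1:]          # s_i * s_{i+1} per state
energies = -features @ J_true
log_p = -beta * energies
log_p -= np.log(np.sum(np.exp(log_p)))             # normalise (subtract log Z)

# log p(s) = beta * features @ J - log Z is linear in J, so least squares
# (with an intercept column absorbing -log Z) recovers the couplings exactly.
design = np.hstack([features, np.ones((len(states), 1))])
coef, *_ = np.linalg.lstsq(design, log_p, rcond=None)
J_inferred = coef[:-1] / beta

print(np.allclose(J_inferred, J_true, atol=1e-8))  # True
```

In realistic systems the distribution is only available through a finite set of samples, and the state space cannot be enumerated, which is precisely what makes the inference hard.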
![The fully connected equivariant attention network autonomously learns a sparse, localized interaction basis, demonstrated by a sharp drop in absolute interaction weights [latex]|W_{ij}|[/latex] near a cutoff distance and a rapidly decaying tail of residual weights in logarithmic scale, consistent with thermal smearing at [latex]T_0 = 0.05[/latex].](https://arxiv.org/html/2604.20821v1/x2.png)
Symmetry as a Crutch: Imposing Order on Chaos
The O(3) Equivariant Attention Architecture is designed to remain consistent under three-dimensional rotations. Rotational symmetry is built directly into the attention mechanism, so that outputs transform predictably when the input is rotated: vector-valued quantities rotate along with the input, while scalar quantities, such as energies, remain unchanged. This property is critical because many physical systems exhibit rotational symmetry, and enforcing it as a prior can significantly improve the model's ability to generalize and make accurate predictions. The use of O(3) equivariant layers ensures that the learned features respect these symmetries, yielding more physically plausible and robust results than standard attention mechanisms.
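What equivariance means can be checked numerically. The hypothetical toy vector field below (not the attention network itself) depends on direction only through the input vector, so rotating the input rotates the output:

```python
import numpy as np

# Numerical check of O(3) equivariance for a toy vector field:
# f(x) = g(|x|) * x, so f(R x) = R f(x) for any rotation R.
# (Hypothetical toy map, not the paper's attention network.)

def f(x):
    return np.tanh(np.linalg.norm(x)) * x  # scalar gate times the vector

rng = np.random.default_rng(1)
x = rng.normal(size=3)

# Random rotation via QR decomposition (orthogonal matrix, det fixed to +1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1

print(np.allclose(f(Q @ x), Q @ f(x)))  # True
```

An equivariant network is built from layers that each pass this test, so the composition passes it as well.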
The training dataset consists of configurations generated from a one-dimensional Spin Glass model. This system is defined by a disordered arrangement of spins with competing ferromagnetic and antiferromagnetic interactions, resulting in a “frustrated” energy landscape with numerous local minima. The inherent complexity of navigating this landscape-where minimizing energy at one spin can increase it at another-makes the 1D Spin Glass an effective benchmark for evaluating the inference capabilities of the O(3) Equivariant Attention Architecture. The model’s ability to accurately predict spin configurations within this frustrated system demonstrates its capacity to learn and generalize from complex, disordered data.
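One standard way to realize such a frustrated O(3) chain (a sketch; the paper's exact couplings may differ) is to combine disordered nearest-neighbour bonds with competing next-nearest-neighbour ones, so that no spin arrangement satisfies every bond at once:

```python
import numpy as np

# Illustrative frustrated 1D O(3) spin-glass energy (not the paper's exact
# Hamiltonian): unit 3-vectors S_i with disordered first-neighbour bonds J1
# competing against antiferromagnetic second-neighbour bonds J2.
# H = -sum_i J1_i (S_i . S_{i+1}) - sum_i J2_i (S_i . S_{i+2})

def chain_energy(spins, J1, J2):
    e1 = np.sum(J1 * np.einsum("id,id->i", spins[:-1], spins[1:]))
    e2 = np.sum(J2 * np.einsum("id,id->i", spins[:-2], spins[2:]))
    return -(e1 + e2)

rng = np.random.default_rng(2)
N = 16
J1 = rng.normal(size=N - 1)              # mixed-sign ferro/antiferro bonds
J2 = -np.abs(rng.normal(size=N - 2))     # antiferro bonds competing with J1
spins = rng.normal(size=(N, 3))
spins /= np.linalg.norm(spins, axis=1, keepdims=True)  # unit spins on the sphere

print(round(chain_energy(spins, J1, J2), 3))
```

Because the energy depends only on dot products, it is invariant under a global rotation of all spins, exactly the O(3) symmetry the architecture is built to respect.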
The employed Riemannian Diffusion Model leverages the analytical correspondence between its score field and the Thermodynamic Restoring Force. This direct relationship simplifies the inference process by allowing gradients to be computed directly from the underlying physics of the system, eliminating the need for complex sampling or iterative refinement typically associated with diffusion models. Specifically, the score function, [latex]\nabla_{x} \log p(x)[/latex], directly represents the force that would restore the system to a stable equilibrium, effectively guiding the diffusion process towards probable states and accelerating convergence. This analytical connection provides computational efficiency and improved sample quality compared to traditional diffusion-based inference methods.
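The equivalence [latex]\mathbf{s}_{\theta}\equiv-\beta\nabla H[/latex] holds exactly for any Boltzmann density [latex]p(x)\propto e^{-\beta H(x)}[/latex], since the normalizing constant drops out of the gradient of the log. A finite-difference check on a toy quadratic Hamiltonian makes this concrete:

```python
import numpy as np

# Check score(x) = grad log p(x) = -beta * grad H(x) for a Boltzmann
# density, using a toy quadratic Hamiltonian H(x) = 0.5 x^T A x.
# (Illustrative; the paper applies this identity to a learned score.)

beta = 2.0
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])       # symmetric positive definite

def grad_H(x):
    return A @ x

def score(x):
    # For p(x) ∝ exp(-beta * H(x)), grad log p = -beta * grad H exactly,
    # because log Z is a constant and vanishes under differentiation.
    return -beta * grad_H(x)

def log_p_unnorm(x):
    return -beta * 0.5 * x @ A @ x

# Central finite differences of the unnormalised log-density
x = np.array([0.3, -1.2])
eps = 1e-6
fd = np.array([
    (log_p_unnorm(x + eps * e) - log_p_unnorm(x - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(fd, score(x), atol=1e-5))  # True
```

This is why a score-matching objective, trained only on samples, implicitly learns the thermodynamic force field up to the factor [latex]-\beta[/latex].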
![Analysis of a frustrated 1D [latex]\mathbf{O(3)}[/latex] spin glass dataset reveals a distribution of sequence lengths and bond strengths that induce geometrically frustrated, non-collinear conformations with a well-defined gradient field suitable for diffusion model training.](https://arxiv.org/html/2604.20821v1/x3.png)
The Devil’s in the Details: Validating the Reconstruction
Overdetermined Linear Inversion is employed to estimate the Hamiltonian parameters based on predictions from a neural network model of the force field. This technique solves a system of equations where the number of equations exceeds the number of unknowns, providing a more stable and robust parameter estimation. To further refine the inference and address potential ill-conditioning, Tangent Projection is integrated. This method constrains the inferred parameters to remain within the subspace spanned by the tangent vectors of the force field, effectively regularizing the solution and preventing divergence from physically plausible configurations. The resulting parameters represent the coefficients defining the potential energy surface and, consequently, the forces acting within the system.
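The two steps can be sketched with hypothetical shapes and names (the actual basis construction is the paper's): the network's force predictions are assumed linear in the Hamiltonian parameters through a design matrix, and only force components tangent to the spin sphere are physical.

```python
import numpy as np

# Overdetermined linear inversion (sketch): forces F are assumed linear in
# the Hamiltonian parameters theta via a design matrix Phi, F ≈ Phi @ theta,
# with many more equations than unknowns, solved by least squares.

rng = np.random.default_rng(4)
M, P = 200, 5                        # many equations, few unknowns
Phi = rng.normal(size=(M, P))
theta_true = rng.normal(size=P)
F = Phi @ theta_true + 1e-3 * rng.normal(size=M)   # noisy "network" forces

theta_hat, *_ = np.linalg.lstsq(Phi, F, rcond=None)

# Tangent projection for spherical degrees of freedom: only the component
# of a force orthogonal to the unit spin n is physical, f_tan = (I - n n^T) f.
def tangent_project(f, n):
    n = n / np.linalg.norm(n)
    return f - np.dot(f, n) * n

n = np.array([0.0, 0.0, 1.0])
f = np.array([1.0, 2.0, 3.0])
print(tangent_project(f, n))   # [1. 2. 0.]
```

Projecting before inversion keeps the least-squares problem confined to physically meaningful directions, which regularizes the solution as described above.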
To assess the accuracy of the inferred Hamiltonian parameters, both Cosine Similarity and the coefficient of determination, [latex]R^2[/latex], were utilized. Cosine Similarity measures the angle between the inferred and ground-truth parameter vectors, yielding a value between -1 and 1, where 1 indicates perfect alignment. [latex]R^2[/latex] quantifies the proportion of variance in the ground-truth force field explained by the inferred Hamiltonian parameters; it is bounded above by 1, with values near 1 indicating a good fit and values at or below 0 indicating a fit no better than predicting the mean. These metrics provide complementary measures of the model's performance, evaluating both the directional accuracy of the inferred parameters and their ability to reproduce the observed force field dynamics.
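Both metrics are a few lines of NumPy in their standard definitions:

```python
import numpy as np

# The two validation metrics in their standard forms.

def cosine_similarity(a, b):
    # Angle between two vectors: 1 means same direction, scale is ignored.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def r_squared(y_true, y_pred):
    # Fraction of variance explained; can go negative for a poor fit.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

a = np.array([3.0, 4.0])
print(cosine_similarity(a, 2 * a))   # 1.0 (direction identical, scale ignored)
print(r_squared(a, a))               # 1.0 (perfect fit)
```

Cosine similarity's scale invariance is why it is paired with [latex]R^2[/latex]: the first checks direction, the second checks that magnitudes of the reconstructed forces are right as well.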
Replica Exchange Monte Carlo (REMC) was implemented to efficiently sample conformational space and generate the thermal-equilibrium snapshots needed for both training and validation of the Hamiltonian inference model. The method runs multiple replicas of the system, each at a different temperature, and periodically exchanges configurations between adjacent temperature levels according to a Metropolis criterion. The von Mises-Fisher distribution, the natural distribution for unit vectors on the sphere, ensures proper sampling of the spin manifold, which is particularly crucial for systems with complex potential energy surfaces. By generating a diverse set of equilibrated configurations, REMC provides a robust dataset for accurately determining the Hamiltonian parameters and assessing the model's predictive capabilities.
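The replica-swap acceptance rule at the heart of REMC is standard and can be sketched directly:

```python
import numpy as np

# Replica-swap step of REMC (sketch): two replicas at inverse temperatures
# beta_i and beta_j exchange configurations with probability
# min(1, exp((beta_i - beta_j) * (E_i - E_j))), which preserves detailed
# balance across the whole temperature ladder.

def swap_probability(beta_i, beta_j, E_i, E_j):
    return min(1.0, np.exp((beta_i - beta_j) * (E_i - E_j)))

# A hot replica (small beta) holding a low-energy state swaps freely
# with a colder replica stuck at higher energy:
print(swap_probability(0.5, 2.0, E_i=-3.0, E_j=-1.0))            # 1.0
# The reverse arrangement is exponentially suppressed:
print(round(swap_probability(0.5, 2.0, E_i=-1.0, E_j=-3.0), 4))  # 0.0498
```

These swaps let low-temperature replicas escape local minima of the frustrated landscape by routing configurations through high-temperature replicas, which is why REMC equilibrates spin glasses far faster than single-temperature sampling.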
Spherical Brownian Motion (SBM) is implemented to model the stochastic dynamics occurring on the constraint manifold during Hamiltonian inference. This approach treats the system’s state as evolving randomly on the surface of a hypersphere, accounting for the geometric constraints imposed by the Hamiltonian. The use of SBM facilitates robust sampling by preventing the system from violating these constraints during the inference process, particularly crucial when dealing with high-dimensional parameter spaces. The diffusion coefficient within the SBM framework is adjusted to maintain detailed balance, ensuring proper convergence of the Markov Chain Monte Carlo sampling used for parameter estimation. This method provides a statistically sound basis for generating training data and validating the inferred Hamiltonian parameters.
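A projected Euler-Maruyama step is one standard way to realize such sphere-constrained dynamics (a sketch, not necessarily the paper's discretization): inject noise in the tangent plane, then retract back onto the sphere.

```python
import numpy as np

# One projected Euler-Maruyama step of Brownian motion on the unit sphere:
# sample Gaussian noise, remove its component along the current point
# (so the move is tangent to the manifold), step, and renormalise.

def sbm_step(x, dt, rng):
    noise = rng.normal(size=x.shape) * np.sqrt(dt)
    tangent = noise - np.dot(noise, x) * x   # project noise off the normal
    x_new = x + tangent
    return x_new / np.linalg.norm(x_new)     # retract back onto the sphere

rng = np.random.default_rng(5)
x = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    x = sbm_step(x, dt=1e-3, rng=rng)

print(np.isclose(np.linalg.norm(x), 1.0))    # True: constraint preserved
```

The retraction guarantees that no sample ever leaves the constraint manifold, which is the property the text identifies as crucial in high-dimensional parameter spaces.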
Quantitative analysis confirms a high degree of fidelity between the inferred Hamiltonian parameters and the established ground truth. Specifically, cosine similarity measurements yielded a value of 99.7%, indicating near-perfect alignment in the direction of the parameter vectors. Furthermore, the model accounts for 87% of the variance observed in the force field, as determined by R-squared calculations. These metrics collectively demonstrate the model’s capacity to accurately reconstruct the underlying Hamiltonian dynamics from network predictions, validating the efficacy of the inference process.
Beyond Prediction: A Glimmer of Understanding
The research presents a departure from traditional deep learning applications, which largely focus on predicting outcomes from existing data. This framework instead actively seeks to uncover the underlying physical laws governing a system, effectively functioning as an automated scientific discovery tool. By analyzing data from interacting spin models, the deep learning architecture doesn’t simply map inputs to outputs, but infers the rules governing those interactions – identifying, for instance, the effective couplings between spins. This capability extends beyond mere pattern recognition; the model can generalize these discovered laws to previously unseen configurations, suggesting a pathway towards building AI systems that don’t just learn from data, but understand the principles that generate it – a crucial step towards tackling complex scientific challenges where explicit equations are unknown or intractable.
Rigorous evaluation of the developed method involved a direct comparison with AlphaFold 3, a state-of-the-art deep learning system renowned for its structural prediction capabilities. This benchmarking, conducted within the context of the Ising spin model, revealed compelling efficacy of the new approach. The results demonstrate not simply predictive power, but a capacity to accurately infer the underlying physical laws governing the system – a capability exceeding that of existing methods. Specifically, the new technique showcased superior performance in identifying key parameters and phase transitions within this model, suggesting its potential as a powerful tool for uncovering hidden relationships in complex physical systems and offering a pathway beyond purely predictive modeling.
A critical component of this research involved rigorously testing the model’s ability to perform Out-of-Distribution (OOD) prediction, effectively gauging its generalization prowess beyond the training data. This assessment moved beyond simply confirming accuracy on familiar configurations; instead, the system was challenged with previously unseen scenarios to determine its robustness and predictive reliability in novel contexts. Successful OOD prediction indicates the learned internal representation captures fundamental physical principles, rather than memorizing specific instances. This capacity is vital for real-world applicability, as physical systems invariably encounter configurations outside of those used for initial training, and a model’s ability to accurately extrapolate to these unseen states is paramount for trustworthy predictions and discovery.
The innovative Information Symmetry Mapping (ISM) framework extends far beyond the study of spin glasses, offering a powerful new methodology applicable to a diverse range of complex systems. Researchers anticipate ISM will prove invaluable in materials science, potentially accelerating the discovery of novel materials with tailored properties by predicting stable configurations and identifying emergent behaviors. Simultaneously, the framework holds promise for biophysics, where understanding protein folding, molecular interactions, and cellular dynamics relies heavily on deciphering intricate, multi-body relationships – challenges ISM is uniquely positioned to address. By effectively learning the underlying symmetries governing these systems, ISM doesn’t merely predict outcomes, but begins to reveal the fundamental principles at play, offering a path toward deeper understanding and potentially revolutionary advancements in both fields.
The pursuit of elegant frameworks within deep generative models feels increasingly like building sandcastles against the tide. This work, demonstrating autonomous Hamiltonian discovery, isn’t about achieving perfect theoretical alignment-it’s about observing what actually emerges when systems are pushed to their limits. As Jean-Paul Sartre noted, “Existence precedes essence.” The model doesn’t begin with a pre-defined Hamiltonian; it creates one through interaction with data. This echoes the findings-the Hamiltonian isn’t imposed, it’s inferred, a practical outcome forged from the messy reality of complex systems. Tests, of course, are merely a momentary stay against the inevitable entropy of production deployments.
What’s Next?
The autonomous discovery of Hamiltonians within deep generative models presents a predictable escalation. The elegance of inferring physical laws from data will inevitably collide with the messiness of production systems. Current implementations, while theoretically satisfying, remain constrained by the simplicity of the modeled systems. The true test will be scaling these methods to encompass genuinely complex phenomena – turbulent flows, protein folding, the stock market – where the underlying Hamiltonian, if it exists at all, is likely a monster of infinite terms.
A critical path lies in addressing the limitations of the priors themselves. Injecting physical constraints is currently a manual process, a form of algorithmic hand-holding. Future work must explore methods for learning the appropriate priors, a meta-inference problem that threatens to introduce even more layers of abstraction. Anything that promises to simplify life adds another layer of abstraction. And, naturally, the resulting models will require extensive validation – a process for which adequate metrics, let alone labeled data, are conspicuously absent.
The field now faces a choice: pursue ever-more-sophisticated inference algorithms, or invest in the unglamorous work of building robust, scalable simulation infrastructure. The former offers the allure of intellectual novelty, while the latter is simply… necessary. Documentation is a myth invented by managers, but reliable benchmarks are not. CI is the temple – one prays nothing breaks.
Original article: https://arxiv.org/pdf/2604.20821.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-04-23 22:30