Uncovering Hidden Laws of Motion

Author: Denis Avetisyan

A new Bayesian framework systematically searches for the underlying equations governing dynamic systems, even when key physical principles are unknown.

A bioreactor system leverages machine learning to model missing physical phenomena: a neural network (orange dots) and Bayesian symbolic regression (green dashed lines) both predict aspects not captured by the traditional Monod equation [latex] \mu = \mu_{max} \frac{S}{K_S + S} [/latex] (solid blue line).

This work introduces a Bayesian approach to symbolic regression using Reversible Jump MCMC to quantify uncertainty in discovered models of missing physics.

Model-based systems often struggle when underlying physical laws are incomplete or unknown. This limitation motivates the work presented in ‘Bayesian Inference for Missing Physics’, which addresses the challenge of discovering missing dynamics in complex systems. By combining universal differential equations with Bayesian symbolic regression and Reversible Jump Markov Chain Monte Carlo, this paper provides a principled framework for quantifying uncertainty in recovered model structures, moving beyond point estimates to a full posterior distribution. Can this approach enable more robust and reliable modeling of dynamical systems, particularly in scenarios with limited or noisy data?

Bridging the Gap: The Challenge of Missing Physics

Many scientific and engineering models rest on differential equations, which are mathematical statements of how a system changes over time. These equations often need terms for physical processes that are poorly understood or too expensive to compute. For instance, models of turbulent fluid flow, complex chemical reactions, or biological population dynamics frequently rely on approximations because the precise forces, rates, or interactions are unknown. The broad structure of the equation may be known, but the specific functional forms and parameter values that govern how the change unfolds are not. The resulting models are useful yet incomplete, and they can be inaccurate when predicting real-world behavior. Closing this gap between mathematical formalism and physical reality is necessary to improve the fidelity of simulations.

Conventional approaches to modeling complex systems rely on fully specified equations, and they become unreliable when critical terms, such as subtle interactions or unmeasured forces, are omitted. The resulting discrepancies are not random statistical errors. They are structural, because the model cannot represent the true dynamics of the system. Predictions from such incomplete models can therefore diverge substantially from observations, particularly over long time horizons or when the system undergoes substantial change. Robust prediction requires techniques that can approximate the influence of these ‘missing’ components while accounting for the uncertainty involved.
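The following sketch illustrates how an omitted term produces structural rather than random error. It integrates a Lotka-Volterra predator-prey model twice, once with its interaction terms and once without them. The parameter values, initial state, and function names are hypothetical choices made for illustration, not values taken from the paper.

```python
# Hypothetical illustration: a "true" Lotka-Volterra model versus an incomplete
# model that omits the predator-prey interaction terms.

def lotka_volterra(state, alpha=1.3, beta=0.9, gamma=0.8, delta=1.8, interaction=True):
    """Right-hand side (dx/dt, dy/dt); interaction=False drops the x*y terms."""
    x, y = state
    xy = x * y if interaction else 0.0
    return (alpha * x - beta * xy, gamma * xy - delta * y)


def rk4(f, state, dt, n_steps, **kwargs):
    """Classic fourth-order Runge-Kutta; returns all states including the start."""
    traj = [tuple(state)]
    for _ in range(n_steps):
        s = traj[-1]
        k1 = f(s, **kwargs)
        k2 = f(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k1)), **kwargs)
        k3 = f(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k2)), **kwargs)
        k4 = f(tuple(si + dt * ki for si, ki in zip(s, k3)), **kwargs)
        traj.append(tuple(
            si + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for si, a, b, c, d in zip(s, k1, k2, k3, k4)
        ))
    return traj


dt, n_steps, u0 = 0.01, 500, (0.44, 4.6)
true_traj = rk4(lotka_volterra, u0, dt, n_steps)
incomplete_traj = rk4(lotka_volterra, u0, dt, n_steps, interaction=False)

# Largest absolute state error at each time step
errors = [max(abs(p - q) for p, q in zip(s, t)) for s, t in zip(true_traj, incomplete_traj)]
print(f"max error at t=0.1: {errors[10]:.3f}")
print(f"max error at t=5.0: {errors[n_steps]:.3f}")
```

The incomplete model has no interaction term, so its prey population grows exponentially while the true system oscillates. As a result, the gap widens with the time horizon instead of averaging out.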

The ability to infer ‘missing physics’ therefore matters well beyond theory, because it directly affects predictive accuracy across many scientific fields. When crucial elements of a system’s behavior are unknown, standard modeling techniques yield inaccurate forecasts and limited understanding. Inferring these absent components yields more robust and reliable simulations. Such components might be subtle interactions in fluid dynamics, uncharacterized reaction pathways in chemistry, or hidden variables in ecological models. Beyond refining existing models, this capability makes it possible to predict outcomes where earlier approaches failed, with potential applications ranging from climate forecasting and materials science to drug discovery and economic analysis.

Both a neural network and Bayesian symbolic regression (10 draws) predict the states of a Lotka-Volterra system from noisy measurements, closely matching the true values (solid lines).

Unveiling the Hidden: Bayesian Symbolic Regression as a Powerful Synthesis

Bayesian Symbolic Regression (BSR) automates the identification of missing terms in differential equations. Traditional methods require candidate terms to be specified by hand, which is laborious and prone to human error. BSR instead searches automatically through a vast space of possible mathematical expressions. Classical symbolic regression typically performs this search with genetic programming, whereas this work uses Reversible Jump MCMC. Bayesian inference scores each candidate by how probable the observed data are under it. Terms that improve the fit are retained and those that do not are discarded, so the missing part of the differential equation can be reconstructed from limited information. Because overly complex expressions are penalized, the procedure guards against overfitting and favors parsimonious equations.
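The scoring idea can be sketched with a deliberately small stand-in. Instead of searching expression trees with genetic programming or Reversible Jump MCMC, the code below enumerates a fixed, hypothetical library of candidate expressions. It fits each coefficient by least squares and ranks the candidates with a BIC-style score whose penalty grows with expression size. The data are synthetic noisy samples of a Monod-shaped term. The score is only a rough proxy for a Bayesian evidence calculation and is not the paper's method.

```python
import math
import random

rng = random.Random(0)
S = [0.1 + 0.1 * i for i in range(50)]                                # hypothetical substrate levels
y = [0.5 * s / (1.0 + s) + rng.gauss(0.0, 0.01) for s in S]           # noisy "missing term"

# Candidate library: (label, basis function g(S), complexity = node count of a*g(S))
CANDIDATES = [
    ("a", lambda s: 1.0, 1),
    ("a*S", lambda s: s, 3),
    ("a*S^2", lambda s: s * s, 5),
    ("a*S/(1+S)", lambda s: s / (1.0 + s), 7),
    ("a*(1-exp(-S))", lambda s: 1.0 - math.exp(-s), 7),
]


def fit_and_score(basis, complexity):
    """Closed-form least-squares fit of y ~ a*g(S), then a BIC-style score (lower is better)."""
    g = [basis(s) for s in S]
    a = sum(gi * yi for gi, yi in zip(g, y)) / sum(gi * gi for gi in g)
    rss = sum((yi - a * gi) ** 2 for gi, yi in zip(g, y))
    n = len(y)
    score = n * math.log(rss / n) + complexity * math.log(n)
    return a, score


ranked = sorted(
    ((label, *fit_and_score(f, c)) for label, f, c in CANDIDATES),
    key=lambda row: row[2],
)
for label, a, score in ranked:
    print(f"{label:15s} a={a:.3f} score={score:.1f}")
```

The penalty term does the job the text attributes to the Bayesian treatment. A more complex expression must improve the fit enough to pay for its extra nodes.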

Bayesian Symbolic Regression combines Symbolic Regression, which identifies mathematical expressions that describe relationships in data, with Bayesian Inference. The combination does more than find one equation: it yields a probability distribution over possible equations. Instead of a single ‘best’ equation, the method produces a posterior distribution over the model space, which quantifies confidence in each identified term and in the model as a whole. Bayes’ Theorem achieves this by combining a prior distribution, reflecting existing knowledge, with the likelihood of the data under each model. The resulting posterior represents updated beliefs about both the model structure and its parameters. The result is not just a model [latex] \hat{y} = f(x) [/latex] but a predictive distribution [latex] p(y|x,D) [/latex] that accounts for model uncertainty.
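The posterior over candidate models [latex]\mathcal{M}_k[/latex] and the resulting predictive distribution can be written as follows. Here [latex]\theta_k[/latex] are the parameters of model [latex]k[/latex] and [latex]D[/latex] is the observed data. This is standard notation, not necessarily the paper's.

```latex
\begin{aligned}
p(\mathcal{M}_k \mid D) &= \frac{p(D \mid \mathcal{M}_k)\, p(\mathcal{M}_k)}{\sum_j p(D \mid \mathcal{M}_j)\, p(\mathcal{M}_j)}, \\
p(D \mid \mathcal{M}_k) &= \int p(D \mid \theta_k, \mathcal{M}_k)\, p(\theta_k \mid \mathcal{M}_k)\, d\theta_k, \\
p(y \mid x, D) &= \sum_k p(y \mid x, \mathcal{M}_k, D)\, p(\mathcal{M}_k \mid D).
\end{aligned}
```

The marginal likelihood integrates over parameters, so a model with many loosely constrained parameters is automatically penalized. The predictive distribution averages over models instead of committing to a single one.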

Prior distributions let Bayesian Symbolic Regression incorporate existing knowledge about the system being modeled. These priors can express preferences for specific functional forms, constraints on parameter values, or expectations about the relative importance of different terms. Assigning higher probability to plausible expressions concentrates the search on promising regions of the model space and mitigates overfitting. The result is more robust and generalizable equations, particularly with noisy or limited datasets. The posterior then favors solutions that agree with the prior beliefs while remaining consistent with the observed data.
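A complexity prior can be illustrated with a hypothetical geometric prior on expression size, under which each additional tree node multiplies the prior mass by a factor q. The sketch combines this prior with illustrative log-evidence values for a simple and a more complex expression. The numbers, function names, and prior form are assumptions for illustration, not the paper's settings.

```python
import math


def log_complexity_prior(complexity, q):
    """Hypothetical geometric prior: each extra tree node multiplies prior mass by q (0 < q < 1).
    Left unnormalized; the shared normalizing constant cancels in the posterior below."""
    return complexity * math.log(q)


def posterior_over_models(log_evidence, complexities, q):
    """Normalized posterior model probabilities via a numerically stable log-sum-exp."""
    log_post = [le + log_complexity_prior(c, q) for le, c in zip(log_evidence, complexities)]
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]


log_evidence = [-10.0, -9.0]   # hypothetical: simple vs. complex expression
sizes = [3, 7]                 # node counts of the two expressions
weak = posterior_over_models(log_evidence, sizes, q=0.9)    # weak preference for simplicity
strong = posterior_over_models(log_evidence, sizes, q=0.5)  # strong preference for simplicity
print("weak prior:  ", [round(p, 3) for p in weak])
print("strong prior:", [round(p, 3) for p in strong])
```

Under the weak prior the better-fitting complex expression is favored, and under the strong prior the simpler one wins. The prior reshapes the posterior even though the data are unchanged.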

Embracing Learnable Physics: Universal Differential Equations and Implementation

Universal Differential Equations (UDEs) embed Neural Networks directly in the structure of differential equations to model systems whose physical laws are incompletely understood or unknown. Rather than fixing every term of the equation in advance, a UDE represents the unknown physics as a learnable function parameterized by a Neural Network. The network can enter the equation as a coefficient, as the functional form of an existing term, or as an entirely new term. An example of the resulting equation is [latex]\frac{du}{dt} = f(t, u) + NN_\theta(t, u)[/latex], where [latex]f[/latex] collects the known physics and [latex]NN_\theta[/latex] is a neural network with weights [latex]\theta[/latex]. This equation is solved with standard numerical methods for differential equations. The weights are fitted by minimizing the mismatch between the simulated solution and the observed data, with gradients obtained by differentiating through the solver. The model thus learns the unknown physics while remaining a differential equation, which gives a data-driven approach to scientific modeling.
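The structure of a UDE can be sketched in a few lines of code. The right-hand side is a known term plus a learned term, the system is integrated with an ordinary solver, and a trajectory loss measures the mismatch with data. The example assumes a Lotka-Volterra system whose linear growth and decay terms are known and whose interaction terms are missing. The tiny untrained network, the parameter values, and all function names are hypothetical. The training step, which would differentiate the loss through the solver, is omitted.

```python
import math
import random

ALPHA, DELTA = 1.3, 1.8  # hypothetical known growth/decay rates


def make_mlp(n_in, n_hidden, n_out, seed=0):
    """Random weights for a one-hidden-layer tanh network (hypothetical helper)."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return W1, b1, W2, b2


def mlp(params, u):
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * ui for w, ui in zip(row, u)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]


def ude_rhs(u, learned_term):
    """du/dt = known physics + learned correction (the UDE structure)."""
    x, y = u
    known = [ALPHA * x, -DELTA * y]
    return [k + m for k, m in zip(known, learned_term(u))]


def simulate(u0, learned_term, dt, n_steps):
    """Explicit Euler integration, kept simple for the sketch."""
    traj = [list(u0)]
    for _ in range(n_steps):
        u = traj[-1]
        du = ude_rhs(u, learned_term)
        traj.append([ui + dt * di for ui, di in zip(u, du)])
    return traj


def trajectory_loss(learned_term, u0, data, dt):
    """Mean squared mismatch between simulated and observed states."""
    pred = simulate(u0, learned_term, dt, len(data) - 1)
    return sum((p - d) ** 2 for ps, ds in zip(pred, data) for p, d in zip(ps, ds)) / len(data)


# Synthetic "observations" generated with the true (normally unknown) interaction terms
true_interaction = lambda u: [-0.9 * u[0] * u[1], 0.8 * u[0] * u[1]]
u0, dt = [0.44, 4.6], 0.01
data = simulate(u0, true_interaction, dt, 300)

params = make_mlp(2, 8, 2)
untrained = lambda u: mlp(params, u)
print("loss, untrained network:", trajectory_loss(untrained, u0, data, dt))
print("loss, true interaction: ", trajectory_loss(true_interaction, u0, data, dt))
```

Substituting the true interaction term for the network reproduces the synthetic data exactly, giving zero loss. Training a UDE means adjusting the network weights to push the loss toward that value.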

Approaches such as physics-informed neural networks (PINNs) typically assume that the governing equations are fully specified, or they require researchers to choose the functional form of any missing terms by hand. Universal Differential Equations (UDEs) instead learn the structure of these unknown terms directly from observed data. A neural network embedded in the differential equation represents the missing physics, and its parameters are optimized so that solutions of the augmented equation match the data. Because no functional form for the missing term has to be fixed in advance, the model can capture potentially complex relationships without explicit feature engineering. Consequently, UDEs offer greater flexibility and can potentially model a wider range of physical phenomena than methods that rely on predefined functional forms.

The implementation uses the DynamicExpressions.jl package in the Julia programming language. The package represents symbolic expressions, such as candidate terms of a differential equation, as tree structures. These trees support efficient evaluation and automatic differentiation, and they can be modified dynamically as the search proposes new terms. DynamicExpressions.jl provides tools for composing mathematical operations and simplifying expressions. It also evaluates quantities such as [latex]\frac{dy}{dt}[/latex] and other derivatives efficiently, which is critical when fitting models to observational data. Its focus on performance and flexibility lets the approach scale from simple to complex physical systems.
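The following Python sketch makes the tree representation concrete. It builds the Monod expression as a binary expression tree, evaluates it, counts its nodes (a natural complexity measure), and then mutates a constant in place. It is a minimal analogue written for illustration. The `Node` class and its helpers are hypothetical and do not reproduce the DynamicExpressions.jl API.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Hypothetical minimal expression tree; not the DynamicExpressions.jl API.
BINARY_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


@dataclass
class Node:
    op: Optional[str] = None        # operator symbol for internal nodes
    value: Optional[float] = None   # constant leaf
    feature: Optional[str] = None   # variable leaf, e.g. "S"
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def evaluate(self, env):
        """Recursively evaluate the tree given variable values in env."""
        if self.feature is not None:
            return env[self.feature]
        if self.value is not None:
            return self.value
        return BINARY_OPS[self.op](self.left.evaluate(env), self.right.evaluate(env))

    def size(self):
        """Number of nodes, a simple complexity measure."""
        return 1 + sum(c.size() for c in (self.left, self.right) if c is not None)

    def __str__(self):
        if self.feature is not None:
            return self.feature
        if self.value is not None:
            return repr(self.value)
        return f"({self.left} {self.op} {self.right})"


def const(v):
    return Node(value=v)


def var(name):
    return Node(feature=name)


# Monod term mu_max * S / (K_S + S) with illustrative constants mu_max = 0.5, K_S = 1.0
monod = Node("/", left=Node("*", left=const(0.5), right=var("S")),
             right=Node("+", left=const(1.0), right=var("S")))
print(monod, "size:", monod.size(), "value at S=1:", monod.evaluate({"S": 1.0}))

# Dynamic modification: change K_S in place without rebuilding the tree
monod.right.left.value = 2.0
print(monod, "value at S=1:", monod.evaluate({"S": 1.0}))
```

Because the tree is an ordinary data structure, a search procedure can swap constants, operators, or whole subtrees and re-evaluate without generating new source code.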

From Inference to Impact: Validation and Real-World Application

Bayesian Symbolic Regression can discern the equations governing biological processes, as demonstrated by its application to a fed-batch bioreactor. The approach recovered the well-established Monod equation, which relates microbial growth rate to the concentration of a limiting substrate: [latex]\mu = \frac{\mu_{max}S}{K_S + S}[/latex]. Here [latex]\mu[/latex] is the specific growth rate, [latex]\mu_{max}[/latex] is the maximum specific growth rate, [latex]S[/latex] is the substrate concentration, and [latex]K_S[/latex] is the half-saturation constant. The Monod equation governs the dynamics of numerous bioprocesses. Recovering it directly from data, without fixing the model structure in advance, shows that the method can identify the underlying physics of complex systems. This supports more accurate process monitoring and control in biotechnological applications.
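Three limits that follow directly from the Monod formula explain the name of the half-saturation constant and the saturating shape of the curve. The constant is written here as [latex]K_S[/latex], as in the figure caption.

```latex
\begin{aligned}
\mu(K_S) &= \frac{\mu_{max} K_S}{K_S + K_S} = \frac{\mu_{max}}{2}, \\
\mu(S) &\approx \frac{\mu_{max}}{K_S}\, S \quad \text{for } S \ll K_S, \\
\mu(S) &\to \mu_{max} \quad \text{as } S \to \infty.
\end{aligned}
```

The growth rate is roughly first order in [latex]S[/latex] at low substrate and saturates at high substrate. [latex]K_S[/latex] marks the substrate level at which the growth rate reaches half its maximum.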

A well-planned experimental strategy is essential for discerning the principles that govern a complex biological system such as a fed-batch bioreactor. The researchers found that designed experiments, which vary input parameters strategically, substantially improve recovery of the true mathematical relationships, such as the Monod equation describing microbial growth, compared with randomly chosen experiments. The goal is not merely a better fit to existing data but correct identification of the form of the governing equations themselves. The study demonstrates that sufficient, purposefully collected data make it possible to infer the missing physics. This improves model accuracy and offers a path toward more effective process control and optimization in biomanufacturing.
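Local D-optimal design is one standard way to make "strategically varying input parameters" precise. It chooses substrate levels that maximize the determinant of the Fisher information for the Monod parameters. The sketch compares a design spread around [latex]K_S[/latex] with one clustered at high substrate, where growth is saturated. The criterion, the nominal parameter values, and the noise level are assumptions made for illustration, and the paper's own design strategy may differ.

```python
# Hypothetical local D-optimal comparison for the Monod law mu = mu_max * S / (K_S + S).

def monod_sensitivities(S, mu_max, K_S):
    """Partial derivatives of mu with respect to mu_max and K_S."""
    d_mu_max = S / (K_S + S)
    d_K_S = -mu_max * S / (K_S + S) ** 2
    return d_mu_max, d_K_S


def fisher_det(design, mu_max=0.5, K_S=1.0, sigma=0.01):
    """Determinant of the 2x2 Fisher information, assuming i.i.d. Gaussian noise."""
    a = b = c = 0.0
    for S in design:
        g1, g2 = monod_sensitivities(S, mu_max, K_S)
        a += g1 * g1
        b += g1 * g2
        c += g2 * g2
    # Information matrix is [[a, b], [b, c]] / sigma^2, so its determinant is (ac - b^2) / sigma^4
    return (a * c - b * b) / sigma ** 4


spread = [0.2, 0.5, 1.0, 2.0, 5.0, 10.0]      # substrate levels around K_S
saturated = [8.0, 8.4, 8.8, 9.2, 9.6, 10.0]   # all levels far above K_S
print(f"det FIM, spread design:    {fisher_det(spread):.3e}")
print(f"det FIM, saturated design: {fisher_det(saturated):.3e}")
```

At saturating substrate, the sensitivities to [latex]\mu_{max}[/latex] and [latex]K_S[/latex] are nearly proportional. The information matrix is therefore close to singular and [latex]K_S[/latex] is poorly identified. Spreading the design around [latex]K_S[/latex] breaks that collinearity.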

Recovering governing equations, particularly rational functions, from limited experimental data is a substantial advance in dynamic system modeling. Rather than simply fitting the data, the approach infers the physical relationships that drive the system’s behavior. Combined with designed experiments, it reconstructs the dynamics far more reliably than with random experimentation, even from sparse datasets, which opens a path toward improved process control and optimization. Identifying the core mechanisms, rather than relying on purely empirical correlations, supports more robust predictions, more efficient use of resources, and a deeper understanding of the bioprocess itself. These gains point toward more intelligent and adaptable biotechnological systems.

This bioreactor system integrates state estimation and key measurements to monitor and control biological processes.

The search for universal models, as in this work on Bayesian inference for missing physics, reflects a long-standing human endeavor: finding the principles that govern complex systems. The research explicitly accounts for uncertainty in model selection, a crucial step that is often neglected. It recalls Blaise Pascal’s observation: “The eloquence of angels is silence.” The algorithms presented do not claim absolute truth. Instead, they offer a probabilistic framework that quantifies the plausibility of competing physical laws. Biases in model selection reflect the choices of those who build the models. This method takes a responsible approach to automating discovery, recognizing that the recovered model structure is only as good as its priors and data. Acknowledging uncertainty is then not a limitation but a form of intellectual honesty.

What Lies Ahead?

The pursuit of missing physics through Bayesian symbolic regression reveals less a path to complete knowledge than a carefully mapped boundary of ignorance. This work demonstrates a rigorous method for quantifying uncertainty in model discovery. That step is often skipped when the goal is to find a single model rather than to characterize the space of plausible models. However, the computational demands of Reversible Jump MCMC remain a significant obstacle, particularly as system complexity increases. Scaling these methods will require not merely faster hardware but fundamentally new algorithms for exploring the model space efficiently.
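A toy reversible jump sampler shows where the cost comes from. It moves between two nested, hypothetical regression models, y = a x and y = a x + b x^2, by proposing to add or drop the b term. The new coefficient is drawn from its own prior, both jump directions are chosen with equal probability, and the two models have equal prior probability. Under these conditions the acceptance ratio of a jump reduces to a likelihood ratio. In the actual method the models are expression trees, and every likelihood evaluation requires solving the differential equation, so each of the many iterations costs far more than here. The data, priors, step size, and names are assumptions for illustration.

```python
import math
import random

rng = random.Random(1)
SIGMA = 0.1                                    # known noise level (assumption)
xs = [-1.0 + 2.0 * i / 19 for i in range(20)]  # symmetric grid, so x and x^2 are orthogonal
ys = [1.0 * x + 0.8 * x * x + rng.gauss(0.0, SIGMA) for x in xs]  # synthetic data from M2


def log_lik(a, b):
    """Gaussian log-likelihood; b is None for M1 (y = a x), a float for M2 (y = a x + b x^2)."""
    bb = 0.0 if b is None else b
    return -sum((y - a * x - bb * x * x) ** 2 for x, y in zip(xs, ys)) / (2.0 * SIGMA ** 2)


def log_prior(theta):
    """Standard normal prior on each coefficient, up to an additive constant."""
    return -0.5 * theta * theta


def log_post(a, b):
    # Equal prior probability on M1 and M2, so the model prior cancels in every ratio
    lp = log_lik(a, b) + log_prior(a)
    return lp if b is None else lp + log_prior(b)


def accept(log_ratio):
    return rng.random() < math.exp(min(0.0, log_ratio))


def rjmcmc(n_iter=20000, step=0.05):
    a, b = 0.0, None  # start in the simpler model M1
    visits_m2 = []
    for _ in range(n_iter):
        if rng.random() < 0.5:
            # Within-model move: random-walk Metropolis on the active coefficients
            a_new = a + rng.gauss(0.0, step)
            b_new = None if b is None else b + rng.gauss(0.0, step)
            if accept(log_post(a_new, b_new) - log_post(a, b)):
                a, b = a_new, b_new
        elif b is None:
            # Birth move: draw b from its N(0, 1) prior, so prior and proposal densities cancel
            b_new = rng.gauss(0.0, 1.0)
            if accept(log_lik(a, b_new) - log_lik(a, None)):
                b = b_new
        else:
            # Death move: drop b (the exact reverse of the birth move)
            if accept(log_lik(a, None) - log_lik(a, b)):
                b = None
        visits_m2.append(b is not None)
    return visits_m2


visits = rjmcmc()
burned = visits[5000:]
print(f"estimated P(M2 | data) = {sum(burned) / len(burned):.3f}")
```

The fraction of post-burn-in iterations spent in each model estimates that model's posterior probability. Accurate estimates may require many iterations, each of which needs at least one likelihood evaluation.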

A critical, and often unacknowledged, limitation lies in the inductive biases embedded within both the chosen basis functions and the prior distributions. Data itself is neutral; the recovered models are, inescapably, reflections of the assumptions made before encountering the data. Future work must explicitly address the sensitivity of results to these prior choices, perhaps through the development of methods for learning priors from data or employing techniques that allow for a more flexible and data-driven exploration of model space.

Ultimately, the question isn’t simply whether a system can be modeled, but how it is modeled, and what values are implicitly encoded in that representation. Tools without values are weapons, and the automation of scientific discovery demands a corresponding commitment to transparency, accountability, and a critical awareness of the worldview embedded within every algorithm.

Original article: https://arxiv.org/pdf/2603.14918.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
