Decoding the Weather Machine: How AI Reveals Internal Physics

Author: Denis Avetisyan


A new study shows that the inner workings of a leading data-driven weather model can be understood by extracting and interpreting the features it learns from data.

GraphCast distills complex atmospheric dynamics into interpretable features by transforming dense, unreadable internal representations into sparse linear combinations of known variables, exposing the underlying logic of its predictive process and revealing how the model internally abstracts environmental conditions.

Sparse autoencoders uncover interpretable physical representations within the GraphCast model, enabling causal analysis of its predictions.

Despite the recent success of data-driven physics models like GraphCast in weather forecasting, their internal computational processes remain largely opaque. This work, ‘Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features’, addresses this challenge by applying interpretability techniques, specifically sparse autoencoders, to dissect the model’s internal representations. We demonstrate that these techniques can successfully extract interpretable features corresponding to physically meaningful phenomena, ranging from tropical cyclones to seasonal behavior. Could this approach unlock a new era of trustworthy, scientifically valuable, and discoverable data-driven models for complex physical systems?


Decoding the Forecast: Beyond Prediction to Understanding

Even with increasingly accurate weather predictions, a fundamental challenge persists: deciphering the reasoning behind those forecasts. Modern weather models, while adept at processing vast datasets and simulating atmospheric behavior, often operate as complex ‘black boxes’. While a model might correctly predict rainfall, for instance, pinpointing why it predicted that specific amount, location, and timing remains difficult. This lack of interpretability isn’t merely an academic concern; it hinders the refinement of these models, limits the ability to identify and correct systematic errors, and ultimately erodes trust in forecasts, particularly when dealing with high-impact weather events. The predictive power exists, but understanding the causal links within these simulations is crucial for transforming forecasts into actionable insights and building truly reliable predictive systems.

Current weather prediction models, while remarkably adept at forecasting outcomes, often present a significant challenge in explaining why those predictions are made. Traditional interpretability methods – techniques designed to unpack the reasoning behind a model’s decision – struggle to keep pace with the increasing complexity of these systems. This disconnect between accuracy and understanding hinders not only trust in the forecasts, particularly during high-impact events, but also the crucial process of model refinement. Without a clear grasp of the underlying physical mechanisms driving a prediction, identifying and correcting biases or improving the model’s representation of atmospheric processes becomes exceedingly difficult, limiting the potential for continued advancement in weather forecasting capabilities.

Atmospheric dynamics present a uniquely intricate challenge to predictive modeling, necessitating a shift away from purely statistical, ‘black box’ approaches. The weather isn’t simply a pattern to be recognized, but a cascading system of fluid mechanics, thermodynamics, and radiative transfer – interactions so numerous and nonlinear that even slight variations in initial conditions can yield dramatically different outcomes. Consequently, simply knowing a model predicts rain isn’t enough; understanding why it predicts rain – tracing the causal pathways through atmospheric variables – is crucial for building confidence, identifying biases, and ultimately improving forecast skill. This demands new techniques in model interpretation, focusing on methods that can dissect the complex interplay of factors driving weather patterns and reveal the physical reasoning behind each prediction, rather than treating the model as an opaque oracle.

Logistic regression probes trained on individual features effectively distinguish between the presence of tropical cyclones and atmospheric rivers, as demonstrated by their respective F1 scores.

Unveiling the Model’s Language: Node Embeddings and Feature Discovery

GraphCast employs a graph neural network (GNN) to model the Earth’s atmosphere as a discrete network. This network consists of a spatially distributed set of nodes, where each node represents a location on the atmospheric grid. The relationships between these nodes – representing physical interactions such as air movement and energy transfer – are encoded as node embeddings. These embeddings are multi-dimensional vector representations learned by the GNN, capturing the relevant characteristics of each node and its connections to neighboring nodes. The GNN processes these node embeddings to predict future atmospheric states, effectively learning a representation of the atmosphere where relationships are explicitly modeled through the connections and learned weights within the graph structure.
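
A toy sketch may make this concrete. The mesh, weights, and dimensions below are hypothetical placeholders rather than GraphCast’s actual architecture; the point is only that each node holds an embedding vector and one message-passing step updates it from its neighbours through learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy mesh: 6 grid nodes, each with an 8-dimensional embedding.
num_nodes, dim = 6, 8
embeddings = rng.normal(size=(num_nodes, dim))

# Edges of the mesh: which nodes exchange information.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

# Learned weight matrices (random placeholders here).
W_self = rng.normal(size=(dim, dim)) / np.sqrt(dim)
W_msg = rng.normal(size=(dim, dim)) / np.sqrt(dim)

def message_passing_step(h):
    """One simplified GNN update: mix each node's own embedding with the
    mean of its neighbours' embeddings through the learned weights."""
    agg = np.zeros_like(h)
    degree = np.zeros(len(h))
    for i, j in edges:
        agg[i] += h[j]
        agg[j] += h[i]
        degree[i] += 1
        degree[j] += 1
    agg /= degree[:, None]
    return np.tanh(h @ W_self + agg @ W_msg)

updated = message_passing_step(embeddings)
print(updated.shape)  # (6, 8): a new embedding per node
```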

A sparse autoencoder is trained on GraphCast’s high-dimensional node embeddings – numerical representations of atmospheric states at specific locations – to decompose them into a set of interpretable features. The autoencoder learns to reconstruct the original embeddings from a compressed, sparse representation, and the sparsity constraint forces it to find a minimal set of features that capture the most salient information in the atmospheric data. By analyzing the learned features, researchers can identify the underlying patterns and physical variables the model prioritizes when representing and predicting weather phenomena, effectively revealing the core elements of GraphCast’s internal understanding of atmospheric states.
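
A minimal sparse-autoencoder sketch in PyTorch illustrates the decomposition; the embedding width, feature count, and L1 penalty weight here are assumed for illustration and need not match the study’s setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decompose dense embeddings into a wider, mostly-zero feature vector."""
    def __init__(self, embed_dim=512, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, n_features)
        self.decoder = nn.Linear(n_features, embed_dim)

    def forward(self, e):
        f = torch.relu(self.encoder(e))   # sparse feature activations
        e_hat = self.decoder(f)           # linear reconstruction of the embedding
        return e_hat, f

# Hypothetical training step on a batch of captured node embeddings.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
embeddings = torch.randn(256, 512)        # placeholder for GraphCast activations

e_hat, f = sae(embeddings)
recon_loss = ((e_hat - embeddings) ** 2).mean()
sparsity_loss = f.abs().mean()            # L1 penalty pushes most features to zero
loss = recon_loss + 1e-3 * sparsity_loss

opt.zero_grad()
loss.backward()
opt.step()
```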

The Linear Representation Hypothesis posits that the atmospheric state embeddings generated by GraphCast can be accurately reconstructed through a linear combination of the discovered features. This implies that the features represent independent, fundamental aspects of atmospheric conditions, and their weighted sum effectively recreates the original high-dimensional embedding. Mathematically, this can be expressed as $\hat{e} = Wf$, where $\hat{e}$ is the reconstructed embedding, $f$ is the feature vector, and $W$ is a weight matrix defining the linear combination. The ability to reconstruct embeddings with high fidelity via this linear mapping is critical, as it allows researchers to interpret the learned features and understand how GraphCast internally represents and processes atmospheric data without treating the model as a “black box”.
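
A small numerical illustration of the hypothesis (dimensions and activations invented for the example): a handful of active features, combined linearly through a decoder matrix, reproduce the embedding as a sum of scaled feature directions.

```python
import numpy as np

rng = np.random.default_rng(1)

embed_dim, n_features = 512, 4096
W = rng.normal(size=(embed_dim, n_features)) / np.sqrt(n_features)  # feature directions

# A sparse feature vector: only a few features are active at this node.
f = np.zeros(n_features)
active = rng.choice(n_features, size=5, replace=False)
f[active] = rng.uniform(0.5, 2.0, size=5)

# Linear Representation Hypothesis: the embedding is (approximately) W f.
e_hat = W @ f

# Equivalently, each active feature contributes one direction scaled by its activation.
contributions = sum(f[k] * W[:, k] for k in active)
assert np.allclose(e_hat, contributions)
```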

The process of discovering the features used by GraphCast is critical because it moves beyond simply observing the model’s predictive accuracy to understanding how those predictions are generated. GraphCast, as a neural network, learns to represent atmospheric states through numerical embeddings; however, these embeddings are not inherently interpretable. By disentangling these embeddings and identifying the underlying features they encode, researchers can establish a connection between the model’s internal representation and known physical phenomena. This interpretability is essential for verifying the model’s physical consistency, diagnosing potential biases, and ultimately increasing confidence in its long-term reliability and utility for weather forecasting and climate modeling.

GraphCast’s grid representation exhibits spurious activations, indicating potential grid-locking of features.

Probing for Physical Reality: Validating Model Consistency

Feature modification techniques within GraphCast involve the targeted alteration of discovered feature activations to assess their impact on model predictions. Specifically, these techniques selectively amplify or attenuate the influence of individual features – increasing or decreasing their contribution to the overall forecast. This is achieved by modifying the activation values of learned feature maps during the forward pass of the model, allowing researchers to isolate the effect of specific features on the predicted weather variables. By systematically varying the strength of these features, the model’s sensitivity to physically relevant signals can be quantified and validated.
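
Reusing the hypothetical autoencoder sketched earlier, the steering step might look as follows; the hook point, feature index, and scale factor are illustrative placeholders, not the study’s actual procedure.

```python
import torch

@torch.no_grad()
def steer_embedding(e, encoder, decoder, feature_idx, scale):
    """Scale one learned feature's activation and map back to embedding space.

    e           : (batch, embed_dim) node embeddings captured mid-forward-pass
    feature_idx : index of the feature to amplify (scale > 1) or suppress (scale < 1)
    """
    f = torch.relu(encoder(e))        # sparse feature activations
    f[:, feature_idx] *= scale        # targeted amplification / attenuation
    return decoder(f)                 # modified embedding is fed back into the model

# Hypothetical usage: double the activation of a presumed 'hurricane' feature.
# e_mod = steer_embedding(node_embeddings, sae.encoder, sae.decoder, feature_idx=1337, scale=2.0)
```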

Feature modification techniques are used to systematically alter discovered features within the GraphCast model, and the resulting changes in model output are then analyzed to determine adherence to established physical laws. Specifically, the model’s response is evaluated against the principles of hydrostatic balance – the vertical equilibrium between pressure gradient force and gravity – as well as the conservation of mass and force balance. Deviations from these principles indicate potential inconsistencies in the model’s learned representations. This probing process involves controlled perturbations of input features and observation of corresponding changes in predicted physical quantities to quantify the model’s physical plausibility.
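
As a concrete example of one such check, the sketch below evaluates the hydrostatic-balance residual dp/dz = -ρg on a single vertical column; the profile is a rough standard-atmosphere placeholder rather than model output.

```python
import numpy as np

G = 9.81      # gravitational acceleration, m s^-2
R_D = 287.05  # gas constant for dry air, J kg^-1 K^-1

def hydrostatic_residual(pressure, temperature, height):
    """Residual of dp/dz = -rho * g on one vertical column.

    pressure    : (levels,) Pa, predicted pressure at each level
    temperature : (levels,) K
    height      : (levels,) m, geometric height of each level
    Returns one residual per layer; values near zero indicate hydrostatic balance.
    """
    rho = pressure / (R_D * temperature)            # ideal-gas density
    dp_dz = np.diff(pressure) / np.diff(height)     # finite-difference vertical gradient
    rho_mid = 0.5 * (rho[:-1] + rho[1:])            # density at layer midpoints
    return dp_dz + rho_mid * G

# Hypothetical column: an approximate standard-atmosphere profile.
z = np.array([0.0, 1000.0, 2000.0, 3000.0])
T = np.array([288.0, 281.5, 275.0, 268.5])
p = np.array([101325.0, 89875.0, 79495.0, 70109.0])
print(hydrostatic_residual(p, T, z))  # small residuals => physically consistent column
```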

Investigation of specific weather phenomena, including Tropical Cyclones and Atmospheric Rivers, was conducted by analyzing GraphCast’s internal feature representations. This involved examining how the model encodes characteristics associated with these events – such as cyclonic rotation, concentrated moisture transport, and associated pressure gradients – within its learned features. By correlating feature activations with known properties of these phenomena, researchers assessed the model’s ability to identify and represent key aspects of their structure and behavior. This approach enabled a detailed understanding of how GraphCast internally models complex meteorological events, going beyond overall forecast accuracy to reveal the underlying representational strategies employed by the model.

Evaluation of GraphCast demonstrates the model’s capacity to learn physically plausible representations of weather phenomena. Specifically, analysis reveals maintenance of physical consistency in predicted states, indicating adherence to established meteorological principles. This is quantitatively supported by a tropical cyclone detection F1-score of 0.48, representing the harmonic mean of precision and recall in identifying these complex weather systems. This score indicates a balance between minimizing false positives and false negatives in tropical cyclone detection, validating the model’s ability to accurately represent and predict the formation and behavior of these events.
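
A probe of the kind shown in the earlier figure can be sketched with scikit-learn: a logistic regression trained on a single feature’s activations and scored with F1. The data below are synthetic placeholders, not the study’s activations or labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Hypothetical data: one feature's activation per node, plus a label saying
# whether a tropical cyclone was present at that node and time.
n_samples = 5000
labels = rng.integers(0, 2, size=n_samples)
activation = rng.normal(loc=labels * 1.5, scale=1.0, size=n_samples)  # feature fires more when a TC is present

X_train, X_test, y_train, y_test = train_test_split(
    activation.reshape(-1, 1), labels, test_size=0.25, random_state=0)

probe = LogisticRegression().fit(X_train, y_train)
print(f1_score(y_test, probe.predict(X_test)))  # single-feature probe quality, cf. the 0.48 TC score
```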

Modifying a specific hurricane feature within the GraphCast model predictably alters hurricane strength forecasts and maintains physical consistency, as demonstrated by changes in maximum wind speeds, mean sea level pressure, path trajectories, and a continued balance between pressure gradients and azimuthal wind speeds.

Recognizing the Model’s Blind Spots: Addressing Spurious Features

Model analysis has revealed the presence of ‘Grid-Locked Features’, a phenomenon where certain activations within the system are demonstrably linked to the model’s computational grid rather than genuine atmospheric processes. These features arise because the model, in attempting to represent continuous phenomena, can inadvertently prioritize grid alignment over physical plausibility, leading to patterns that reflect the discretization scheme itself. Consequently, interpretations based solely on these activations would be spurious, potentially misrepresenting actual climate signals and introducing systematic errors into analyses of variables like sea ice extent and atmospheric river detection. Identifying these artifacts is crucial, as their presence underscores the need for robust validation techniques and careful consideration of model limitations when drawing conclusions about complex climate systems.
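
One simple, hypothetical diagnostic for such artifacts is to measure how much of a feature’s spatial variance sits at the grid scale itself, for example with a Fourier transform along the longitude axis; the sketch below only illustrates the idea and is not the authors’ method.

```python
import numpy as np

def grid_locking_score(activation_map):
    """Crude diagnostic: fraction of spectral power at the highest resolvable
    (grid-scale) frequency along the longitude axis. A value near 1 suggests
    the feature tracks the grid itself rather than a smooth physical field.

    activation_map : (n_lat, n_lon) feature activations on the lat-lon grid
    """
    spectrum = np.abs(np.fft.rfft(activation_map, axis=1)) ** 2
    return spectrum[:, -1].sum() / spectrum.sum()

# A pattern that flips sign at every longitude column scores near 1 (grid-locked);
# a smooth planetary-scale wave scores near 0.
lon = np.tile(np.arange(64), (32, 1))                    # hypothetical 32 x 64 lat-lon grid
print(grid_locking_score(np.cos(np.pi * lon)))           # ~1.0
print(grid_locking_score(np.cos(2 * np.pi * lon / 64)))  # ~0.0
```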

The identification of grid-locked and other spurious features within climate model outputs underscores a critical need for diligent analytical practices. Simply accepting model results at face value can lead to misinterpretations of atmospheric processes and flawed conclusions about climate change. Robustness in model interpretation isn’t achieved through increased complexity, but through careful scrutiny – actively seeking out and mitigating these artifacts. Strategies range from refined data analysis techniques designed to isolate genuine signals from grid-induced noise, to the development of alternative model architectures less susceptible to these spurious patterns. Ultimately, acknowledging and addressing such limitations is paramount for building trust in climate model predictions and ensuring informed decision-making regarding our changing planet.

The presence of spurious features isn’t confined to a single climate region; analysis demonstrates their manifestation in both Arctic and Antarctic sea ice extent data. This discovery underscores the pervasive nature of these artifacts, indicating that model interpretations relying on such data require careful scrutiny across diverse geographical locations. The consistent appearance of these grid-locked features, irrespective of polar region, suggests a systematic bias inherent in the model’s structure, rather than a localized error. Consequently, researchers must implement mitigation strategies not just for specific areas of interest, but as a standard practice when interpreting model outputs globally, to avoid drawing inaccurate conclusions about climate trends in any zone.

Despite the identification of grid-locked features within the model, a robust consistency in atmospheric river detection emerged from testing. The F1-score, a measure of a test’s accuracy, remained approximately stable across a range of hyperparameter settings, indicating the model’s ability to reliably identify these crucial weather systems even with the presence of these spurious artifacts. This suggests that while the internal workings of the model may be influenced by the grid structure, the overall performance in recognizing atmospheric rivers isn’t substantially compromised. The finding highlights an interesting resilience within the model and implies that the identified features, though noteworthy, do not fundamentally undermine its predictive capabilities in this specific application.

GraphCast learns features exhibiting oscillations across multiple timescales – including daily, seasonal, and annual patterns – that correspond to real-world phenomena like sea ice extent, desert heating, precipitation patterns, and rainforest activity, despite having no direct access to this information.

The pursuit of mechanistic understanding in data-driven models, as demonstrated by this work on GraphCast, inevitably exposes the limitations of purely rational interpretations. The model doesn’t ‘think’ through physics; it identifies and amplifies patterns, effectively translating observed correlations into numerical outputs. This echoes a fundamental truth about human behavior – we aren’t governed by logic, but by ingrained habits and emotional responses. As Sergey Sobolev aptly stated, “The most dangerous illusion is to believe that you are thinking rationally when you are not.” The extraction of interpretable features via sparse autoencoders merely reveals the complex algorithms at play, mirroring how predictable flaws drive investment decisions, and confirming that even the most sophisticated systems are built upon imperfect foundations.

The Algorithm Remembers

The pursuit of interpretability in data-driven models isn’t about finding truth, but about constructing a plausible narrative for a system that operates according to its own internal logic. This work, revealing emergent physical features within GraphCast through sparse autoencoders, offers a glimpse into that logic – but it’s a glimpse refracted through the lens of human expectation. The extracted features resemble physical phenomena because the researchers, inevitably, sought resemblance. The model doesn’t ‘understand’ pressure gradients; it manipulates numbers in a way that, to human observers, correlates with pressure gradients.

The next step isn’t simply refining the autoencoder, but acknowledging the inherent limitations of feature extraction. The assumption that interpretable features equate to causal mechanisms requires rigorous testing. Can these extracted representations genuinely predict model behavior under novel conditions, or are they merely post-hoc justifications? A focus on counterfactual analysis – perturbing internal activations and observing the downstream effects – will be critical. The real challenge lies not in seeing what the model represents, but in understanding why it represents it that way, a question rooted less in physics and more in the quirks of the training data and the architecture itself.

Ultimately, this line of inquiry suggests a subtle shift in perspective. The goal isn’t to build models that mimic understanding, but to build tools that allow humans to navigate the opaque internal world of the algorithm – to predict its biases, its failures, and its unexpected strengths. The algorithm doesn’t forget; it simply prioritizes patterns based on an optimization function, and those priorities will always be alien, if not incomprehensible, to the beings who created it.


Original article: https://arxiv.org/pdf/2512.24440.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
