Decoding Extreme Weather with AI

Author: Denis Avetisyan


A new intelligent agent framework is automating the complex analysis of severe weather events, offering a scalable solution for meteorologists and researchers.

EWE elucidates the origins of extreme events through a reasoning process akin to human cognition, iteratively accessing data and employing a physics-based diagnostic toolkit to discern causal factors.

This paper introduces Extreme Weather Expert (EWE), an agentic system integrating large language models, meteorological tools, and closed-loop reasoning for automated diagnosis of extreme weather.

Despite advances in weather prediction, diagnosing the physical mechanisms that drive extreme events remains a critical bottleneck: it demands expert analysis and slows scientific progress. The paper, “EWE: An Agentic Framework for Extreme Weather Analysis,” introduces an intelligent agent designed to automate this traditionally labor-intensive diagnostic process. By integrating large language models with a tailored meteorological toolkit and employing closed-loop reasoning, EWE autonomously analyzes raw data and generates interpretable multimodal visualizations. Could this framework not only accelerate discovery but also democratize access to expertise for regions disproportionately impacted by extreme weather?


Decoding Extremes: The Challenge of Attribution

Establishing a definitive link between any single extreme weather event – a heatwave, flood, or drought – and long-term climate change presents a complex scientific challenge. While the overall warming trend is unequivocally established, discerning the degree to which climate change influenced a specific event requires sophisticated analysis. Natural variability plays a substantial role in weather patterns, meaning that many extreme events could have occurred in some form even without human-induced warming. The difficulty lies in quantifying how much more likely or intense the event became because of climate change, which requires researchers to disentangle the effects of natural fluctuations from the signal of a changing climate. This process isn’t simply a matter of proving causation, but rather of determining the fraction of an event’s probability or magnitude attributable to anthropogenic forcing, a task complicated by limited historical data and the chaotic nature of weather systems.

Historically, determining the extent to which climate change influenced an extreme weather event – such as a heatwave, flood, or drought – demanded painstaking analysis. These traditional diagnostic methods relied heavily on running complex climate models multiple times, each simulation requiring significant computational resources and weeks or even months to complete. The process wasn’t merely time-consuming; it necessitated substantial manual effort from climate scientists to meticulously compare the simulations, assess probabilities, and ultimately, attribute the event to specific climate factors. This slow turnaround hindered the ability of policymakers and emergency responders to effectively prepare for and respond to disasters, creating a critical need for more streamlined and automated approaches to extreme event attribution.

The accelerating frequency and intensity of extreme weather events demand a paradigm shift in how attribution science is conducted. Traditional methods, while rigorous, often prove too slow to inform immediate disaster response or proactive preparedness measures; a detailed analysis may emerge weeks or months after the event, diminishing its practical value. Consequently, there is a growing imperative for automated systems capable of rapidly assessing the influence of climate change on specific events – such as heatwaves, floods, or droughts. These systems, leveraging advanced modeling and statistical techniques, aim to provide near real-time assessments, enabling quicker deployment of resources, more effective early warning systems, and ultimately, a reduction in the vulnerability of communities facing increasingly severe weather impacts. This swift analysis is not simply about understanding the past, but about building resilience for the future.

This workflow demonstrates how the agent analyzes extreme precipitation events by iteratively processing its thoughts, actions, observations, and interpretations.

EWE: An Intelligent Agent for Weather Diagnostics

The Extreme Weather Expert (EWE) is an intelligent agent framework engineered to automate the diagnostic analysis of extreme weather occurrences. This system functions by receiving observational data related to weather events and then processing this information to identify the contributing factors and characteristics of the event. EWE is designed to operate without direct human intervention in the diagnostic process, offering a scalable solution for monitoring and understanding high-impact weather. The framework’s architecture supports the analysis of diverse data sources, including satellite imagery, radar data, and surface observations, to build a comprehensive diagnostic assessment.

Knowledge-Enhanced Planning within the EWE framework facilitates the breakdown of complex weather diagnostic tasks into a series of discrete, achievable sub-goals. This decomposition mirrors the analytical process employed by human experts, who address intricate meteorological scenarios by isolating and evaluating specific contributing factors. EWE utilizes a knowledge base incorporating meteorological principles and observational data to inform the planning process, enabling the agent to formulate a sequence of analytical steps – such as identifying key weather features, assessing data quality, and formulating hypotheses – each designed to contribute to the overall diagnostic objective. The agent dynamically adjusts this plan based on intermediate results and newly acquired information, allowing for flexible and efficient problem-solving.

The EWE system utilizes a Self-Evolving Closed-Loop Reasoning process where analytical methods are continuously refined through iterative observation and validation against weather data. This process enables the agent to autonomously improve its diagnostic capabilities without explicit reprogramming. Performance evaluations demonstrate quantifiable improvements resulting from this system; synoptic-scale analysis achieved a performance gain of up to 0.239, while mesoscale analysis saw improvements up to 0.213. These gains are measured against established benchmarks for extreme weather event diagnosis and represent the system’s ability to enhance analytical accuracy over time.
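The loop of thoughts, actions, observations, and interpretations can be pictured with a small sketch. This is not the paper's implementation: the function `run_closed_loop`, the `Step` record, and the scripted stand-ins for the language model and the toolkit are all hypothetical names chosen for illustration. In a real agent, `think` would call an LLM and `act` would execute generated analysis code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    thought: str
    action: str
    observation: str
    interpretation: str


def run_closed_loop(
    request: str,
    think: Callable[[str, List[Step]], Tuple[str, str]],
    act: Callable[[str], str],
    interpret: Callable[[str, str], str],
    is_done: Callable[[List[Step]], bool],
    max_steps: int = 5,
) -> List[Step]:
    """Iterate thought -> action -> observation -> interpretation.

    Each new thought sees all earlier steps, which is what closes the loop.
    """
    steps: List[Step] = []
    for _ in range(max_steps):
        thought, action = think(request, steps)
        observation = act(action)
        interpretation = interpret(thought, observation)
        steps.append(Step(thought, action, observation, interpretation))
        if is_done(steps):
            break
    return steps


def demo() -> List[Step]:
    # Scripted stand-ins (assumptions): a fixed plan replaces the LLM planner.
    plan = ["plot 500 hPa geopotential height", "compute IVT", "write report"]

    def think(request: str, steps: List[Step]) -> Tuple[str, str]:
        action = plan[len(steps)]
        return f"Next, I should {action}.", action

    def act(action: str) -> str:
        return f"result of '{action}'"

    def interpret(thought: str, observation: str) -> str:
        return f"noted: {observation}"

    def is_done(steps: List[Step]) -> bool:
        return len(steps) == len(plan)

    return run_closed_loop(
        "Diagnose the heavy rainfall event", think, act, interpret, is_done
    )
```

Passing the full step history back into `think` is the design choice that matters here: it lets a later thought react to an earlier observation, and `max_steps` bounds the loop if the stopping test never fires.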

The EWE framework facilitates self-evolving, closed-loop reasoning from user requests to generate comprehensive analytical reports.

The Meteorological Foundation: Data and Equations

The Meteorological Toolkit leverages the ERA5 reanalysis dataset, a globally complete and consistent record of hourly estimates from 1940 to near-real time. Developed by the European Centre for Medium-Range Weather Forecasts (ECMWF), ERA5 merges a vast array of historical observations – including surface, balloon, radar, and satellite data – with a state-of-the-art atmospheric model. This results in a comprehensive dataset covering variables such as temperature, wind speed, precipitation, and humidity, available at a horizontal resolution of about 31 km and encompassing land, ocean, and atmosphere. The dataset is publicly accessible and regularly updated, providing a robust foundation for weather analysis and forecasting within the EWE framework.

The Meteorological Toolkit utilizes established diagnostic equations to quantify atmospheric phenomena, prominently featuring calculations for Integrated Vapor Transport (IVT). IVT represents the total water vapor transported horizontally through an atmospheric column, calculated as the mass-weighted vertical integral of the horizontal vapor flux. Specifically, $IVT = \frac{1}{g} \left| \int_{p_{top}}^{p_{bottom}} q \, \mathbf{v} \, dp \right|$, where $g$ is the gravitational acceleration, $q$ is specific humidity, $\mathbf{v}$ is the horizontal wind vector, and the integral is taken over pressure from the top level $p_{top}$ to the bottom level $p_{bottom}$; with pressure in Pa, the result has units of kg m$^{-1}$ s$^{-1}$. High IVT values are strongly correlated with the presence and intensity of atmospheric rivers, making it a crucial metric for identifying these significant precipitation events.
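As a rough illustration of the IVT diagnostic, the sketch below integrates the vapor flux over pressure for a single column with NumPy. It uses the standard definition, which divides the pressure integral by the gravitational acceleration $g$ and takes the magnitude of the resulting vector. The function name `integrated_vapor_transport` and the idealized profile in the usage note are assumptions made for this example, not part of the EWE toolkit.

```python
import numpy as np

G = 9.80665  # standard gravitational acceleration, m s^-2


def integrated_vapor_transport(q, u, v, p_pa):
    """Return the IVT magnitude (kg m^-1 s^-1) for one atmospheric column.

    q    : specific humidity on each level (kg/kg)
    u, v : eastward and northward wind on each level (m/s)
    p_pa : pressure of each level (Pa); any ordering is accepted
    """
    p = np.asarray(p_pa, dtype=float)
    order = np.argsort(p)  # integrate from low pressure (top) to high (bottom)
    p = p[order]
    qu = (np.asarray(q, dtype=float) * np.asarray(u, dtype=float))[order]
    qv = (np.asarray(q, dtype=float) * np.asarray(v, dtype=float))[order]

    dp = np.diff(p)
    # Trapezoidal rule for each component of the vapor flux.
    ivt_u = np.sum(0.5 * (qu[1:] + qu[:-1]) * dp) / G
    ivt_v = np.sum(0.5 * (qv[1:] + qv[:-1]) * dp) / G
    return float(np.hypot(ivt_u, ivt_v))


def idealized_column():
    """Hypothetical profile: uniform 5 g/kg humidity, 20 m/s westerly wind."""
    p = np.array([300, 400, 500, 700, 850, 1000]) * 100.0  # hPa -> Pa
    q = np.full(p.shape, 0.005)
    u = np.full(p.shape, 20.0)
    v = np.zeros(p.shape)
    return q, u, v, p
```

Calling `integrated_vapor_transport(*idealized_column())` gives roughly 714 kg m^-1 s^-1, since a constant flux of 0.1 m/s integrated over 70,000 Pa and divided by g is 7000 / 9.80665. A threshold of 250 kg m^-1 s^-1 is commonly used in the literature to flag atmospheric rivers, so this idealized column would be well above it.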

The EWE Meteorological Toolkit integrates foundation models – specifically Pangu-Weather, GraphCast, and FengWu – to enhance forecasting and provide Explainable AI (XAI) capabilities. These models move beyond traditional numerical weather prediction by leveraging machine learning to improve accuracy and provide insights into forecast reasoning. Independent evaluation using Claude-4-Sonnet yielded a Report Generation Score of 0.950, indicating a high degree of coherence and factual correctness in the reports generated by the system. This scoring metric assesses the system’s ability to synthesize meteorological data into understandable and reliable summaries.

LLMs refine human-drafted checklists for evaluating analytical reports generated during extreme weather events, improving assessment criteria across all event types.

Validating the System: A Step-wise Evaluation

The Step-wise Evaluation Metric provides a comprehensive assessment of the Extreme Weather Expert (EWE) system by gauging performance throughout its complete operational sequence. Unlike evaluations focused solely on final outputs, this metric traces accuracy from the initial code generation – where algorithms are created to process weather data – through to the ultimate extraction of meaningful meteorological insights. This holistic approach ensures that any shortcomings, whether in coding logic or interpretive analysis, are identified and addressed. By scrutinizing each stage of the workflow, the metric delivers a nuanced understanding of EWE’s capabilities and limitations, ultimately bolstering the reliability and trustworthiness of the system’s weather-related conclusions.
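One simple way to picture a stage-by-stage score is as the fraction of checklist items passed at each stage, kept separate rather than collapsed into a single number. The sketch below takes that view. The stage names, the boolean checklist results, and both function names are hypothetical; in the paper the judgments come from LLM and MLLM evaluators, and its exact scoring formula may differ.

```python
from typing import Dict, List


def stepwise_scores(checks: Dict[str, List[bool]]) -> Dict[str, float]:
    """Score each workflow stage as the fraction of checklist items passed."""
    scores: Dict[str, float] = {}
    for stage, results in checks.items():
        if not results:
            raise ValueError(f"stage '{stage}' has no checklist items")
        scores[stage] = sum(results) / len(results)
    return scores


def weakest_stage(scores: Dict[str, float]) -> str:
    """Return the stage with the lowest score, i.e. where to look first."""
    return min(scores, key=scores.get)


# Hypothetical judgments for one analysis run.
example_checks = {
    "code_generation": [True, True, True, False],
    "insight_extraction": [True, False],
}
```

With `example_checks`, code generation scores 0.75 and insight extraction 0.5, so `weakest_stage` points at the interpretive step. Keeping the stages separate is what lets a failure be traced to coding logic or to interpretation instead of being hidden in an overall average.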

The system’s validation process uniquely employs the reasoning capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) to move beyond simple accuracy checks. These models don’t merely verify if an analysis arrives at the correct numerical result; they also assess the quality of the reasoning itself. By examining the generated code and accompanying textual explanations, the LLMs and MLLMs determine if the analysis is logically sound, clearly articulated, and readily interpretable by a human expert. This dual evaluation, of both correctness and clarity, ensures that the system’s outputs are not only accurate but also trustworthy and easily understood, representing a significant advancement in automated meteorological insight generation.

To guarantee trustworthy results, the Extreme Weather Expert (EWE) system incorporates dedicated auditing components. The Code Auditor rigorously examines the generated code for logical errors, adherence to best practices, and potential vulnerabilities, ensuring the analytical pipeline functions as intended. Complementing this is the Content Auditor, which focuses on the clarity and interpretability of the extracted meteorological insights, verifying that conclusions are logically supported by the data and presented in a readily understandable manner. These dual checks – one for computational integrity and the other for analytical coherence – are critical for establishing confidence in EWE’s outputs and facilitating informed decision-making based on its findings, particularly when dealing with complex and potentially impactful weather events.
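To make the idea of a code audit concrete, here is a deliberately small static check built on Python's standard `ast` module. It is a stand-in, not EWE's Code Auditor, which presumably reasons about the code far more broadly. The function name `audit_code` and the two rules it applies (flagging `eval`/`exec` calls and bare `except` clauses) are assumptions chosen for illustration.

```python
import ast
from typing import List

RISKY_CALLS = {"eval", "exec"}  # illustrative rule set, not exhaustive


def audit_code(source: str) -> List[str]:
    """Return a list of findings for a piece of generated Python code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        # Code that does not parse cannot run, so report and stop.
        return [f"syntax error on line {err.lineno}: {err.msg}"]

    findings: List[str] = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append(f"line {node.lineno}: call to {node.func.id}()")
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(f"line {node.lineno}: bare except hides errors")
    return findings
```

An empty list means the snippet passed these particular checks, nothing more. A cheap static pass like this could plausibly run before any model-based review, so that code which cannot even parse is rejected early.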

A rigorous evaluation of the automated workflow relies on curated datasets sourced from internationally recognized authorities – EM-DAT for disaster events, reports from the World Meteorological Organization, and the comprehensive categorization within the IPCC AR6 assessment. This foundation enables a nuanced assessment of analytical performance, as demonstrated by scores of 0.827 for Synoptic-scale Analysis and 0.782 for Mesoscale Analysis. These metrics, determined by the Claude-4-Sonnet LLM, reflect the system’s capacity to accurately interpret complex meteorological data and provide reliable insights into weather-related events, validating its utility for diverse applications ranging from disaster preparedness to climate monitoring.

The development of EWE, as detailed in the article, underscores a fundamental principle of complex system design. It’s not merely about assembling tools – large language models and meteorological datasets – but orchestrating them within a cohesive, closed-loop reasoning process. This holistic approach resonates with Robert Tarjan’s observation: “Complexity is not a bug, it’s a feature.” EWE embraces this complexity, transforming the traditionally labor-intensive task of extreme weather analysis into a scalable, automated system. The framework’s success stems from acknowledging that understanding extreme weather necessitates integrating diverse data sources and reasoning capabilities, thereby reflecting a system where structure dictates behavior.

What’s Next?

The introduction of an agentic framework like Extreme Weather Expert (EWE) does not, as some might presume, solve the problem of extreme weather analysis. Rather, it relocates the inherent difficulties. The system’s architecture, while promising a scalable approach to diagnosis, merely externalizes the complexity previously contained within the expertise of a human analyst. Every optimization – the tighter integration of meteorological tools, the refinement of the closed-loop reasoning – inevitably introduces new tension points, new opportunities for systematic error. The illusion of complete automation is a persistent, and often misleading, goal.

Future work will undoubtedly focus on expanding the multimodal reasoning capabilities of such systems. However, a more pressing concern lies in understanding the limits of that reasoning. What classes of weather events remain fundamentally resistant to algorithmic diagnosis? Where does the need for human intuition – that messy, non-quantifiable element – remain essential? The true measure of progress will not be the system’s ability to mimic human analysis, but to reveal the underlying structure of meteorological phenomena in ways previously inaccessible.

Ultimately, the architecture is the system’s behavior over time. The pursuit of increasingly sophisticated agents must be tempered with a recognition that the most elegant solutions are often the simplest, and that true understanding emerges not from brute-force computation, but from a deep appreciation for the interconnectedness of meteorological systems.


Original article: https://arxiv.org/pdf/2511.21444.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2025-11-27 15:33