Author: Denis Avetisyan
A new hierarchical system utilizes artificial intelligence to autonomously explore and analyze vast archives of geoscientific data, accelerating discovery in the Earth sciences.

This paper presents PANGAEA-GPT, a multi-agent system leveraging large language models for intelligent data retrieval, integration, and analysis within a major geoscientific repository.
Despite the increasing volume of Earth science data, a significant portion remains underutilized, hindering impactful scientific discovery. This paper introduces ‘A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives’, presenting PANGAEA-GPT, a novel framework leveraging large language models within a multi-agent system to autonomously query, integrate, and analyze heterogeneous data from large geoscientific repositories. By implementing a Supervisor-Worker architecture with robust error handling and self-correction, PANGAEA-GPT enables complex, multi-step workflows with minimal human intervention. Could this approach unlock the full potential of existing Earth science datasets and accelerate future research?
Unveiling Earth’s Complexity: A Paradigm Shift in Data Analysis
Historically, Earth science has been constrained by the laborious process of manually compiling and interpreting data from diverse sources – satellite imagery, geological surveys, climate models, and more. This reliance on human effort creates significant bottlenecks, slowing the pace of discovery and limiting the ability to address complex, large-scale environmental challenges. Researchers often spend considerable time simply preparing data for analysis, rather than extracting meaningful insights. Consequently, scaling investigations to encompass broader geographic areas or longer timeframes becomes exceedingly difficult, hindering comprehensive understanding of Earth’s dynamic systems and impeding proactive responses to global change. This manual approach struggles to keep pace with the exponential growth of available environmental data, necessitating innovative solutions to unlock its full potential.
The proliferation of environmental sensors, satellite imagery, and computational models has generated an unprecedented deluge of data, far exceeding the capacity of traditional analytical methods. This exponential growth isn’t simply a matter of volume; the data originates from diverse sources, utilizes varying formats, and often contains inherent uncertainties and biases. Consequently, effectively integrating and interpreting this complex web of information demands sophisticated automated systems. These intelligent tools must not only process vast datasets but also identify patterns, establish correlations, and extrapolate meaningful insights – tasks previously requiring extensive manual effort from Earth science experts. The need for such systems stems from the limitations of human-driven analysis in the face of this complexity, hindering timely responses to critical environmental challenges and slowing the pace of scientific discovery.
PANGAEA-GPT represents a novel approach to Earth science data analysis through a hierarchical multi-agent system, designed to autonomously integrate and interpret complex environmental datasets. This system functions by distributing tasks among specialized agents, each focusing on a specific aspect of data processing – from initial data acquisition and cleaning to advanced pattern recognition and predictive modeling. The architecture allows for efficient scaling and adaptation to diverse data types and scientific questions, ultimately accelerating the pace of discovery. Rigorous testing, including a search benchmark, demonstrates the system’s effectiveness, yielding a mean score of 8.14/10 and highlighting its potential to transform how Earth scientists approach complex challenges and extract meaningful insights from the planet’s vast data resources.
![The Oceanographer Agent enhanced the PANGAEA biological dataset by integrating 4D hydrographic data ([latex]\theta/S[/latex]) from GLORYS12V1, revealing correlations between [latex]Aglantha digitale[/latex] abundance and specific water masses as visualized in the temperature-salinity diagram.](https://arxiv.org/html/2602.21351v1/figs/fig_5.png)
Orchestrating Expertise: A Symphony of Specialized Agents
The Supervisor Agent functions as the core control mechanism within the agentic system, responsible for breaking down high-level user requests into a sequence of discrete sub-tasks. This decomposition process allows the Supervisor to dynamically assign each sub-task to the most appropriate specialist agent, leveraging their specific capabilities. Rather than a single agent handling all aspects of a complex query, the Supervisor facilitates a workflow where agents collaborate, each contributing their expertise to achieve the overall objective. This task allocation is not static; the Supervisor continually monitors progress and adjusts assignments as needed to optimize performance and ensure efficient resource utilization.
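The decomposition-and-routing pattern described above can be sketched in a few lines. The agent names mirror those in the paper, but the dataclasses, registry keys, and the hard-coded plan are illustrative assumptions, not the actual implementation (which would use an LLM to produce the sub-task list):

```python
# Hypothetical sketch of Supervisor task decomposition and routing.
from dataclasses import dataclass, field

@dataclass
class SubTask:
    description: str
    data_kind: str  # e.g. "ocean", "ecology", "tabular"

@dataclass
class Supervisor:
    # Maps a data kind to the specialist agent responsible for it.
    registry: dict = field(default_factory=lambda: {
        "ocean": "OceanographerAgent",
        "ecology": "EcologistAgent",
        "tabular": "DataFrameAgent",
    })

    def decompose(self, request: str) -> list:
        # A real system would call an LLM here; we hard-code one example plan.
        return [
            SubTask("fetch hydrographic profiles", "ocean"),
            SubTask("merge with species counts", "tabular"),
        ]

    def route(self, task: SubTask) -> str:
        return self.registry[task.data_kind]

sup = Supervisor()
plan = sup.decompose("correlate jellyfish abundance with water masses")
assignments = [(t.description, sup.route(t)) for t in plan]
```

Keeping the registry as data rather than code is what lets the Supervisor re-assign work dynamically as agents become available.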
Specialist agents are designed with specific competencies for processing distinct data types and performing targeted analyses. The Oceanographer Agent, for example, is equipped to handle oceanic data formats and execute analyses relevant to marine science, while the Ecologist Agent focuses on ecological datasets and related modeling. The DataFrame Agent specializes in tabular data manipulation and analysis, leveraging libraries like Pandas for efficient processing. This specialization extends to the agents’ internal tools and algorithms, allowing each to perform its designated tasks with greater accuracy and speed compared to a generalized agent attempting the same operations.
The system’s modular architecture enables parallel processing of tasks by distributing workload among specialist agents, resulting in significant reductions in overall analysis time. In benchmark testing, the Agentic Search achieved a precision score of 8.53, indicating the accuracy of identified parameters, and a parameter coverage score of 8.99, demonstrating the breadth of parameters successfully addressed. This performance is directly attributable to the efficient resource allocation facilitated by the agent-based design, allowing for concurrent execution of specialized analyses on diverse data types.
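The concurrent execution that this modularity enables can be sketched with a standard thread pool; the worker functions below are stand-ins for the specialist agents, not the system's actual code:

```python
# Sketch of parallel sub-task execution across specialist agents.
from concurrent.futures import ThreadPoolExecutor

def oceanographer(task):
    return f"ocean result for {task}"

def ecologist(task):
    return f"ecology result for {task}"

tasks = [(oceanographer, "currents"), (ecologist, "abundance")]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit all sub-tasks at once; each runs independently.
    futures = [pool.submit(fn, arg) for fn, arg in tasks]
    results = [f.result() for f in futures]
```

Because the agents operate on disjoint data types, their sub-tasks rarely contend for the same resources, which is what makes this kind of fan-out effective.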

Data Ingestion and Processing: From Raw Signals to Actionable Knowledge
The Oceanographer Agent utilizes data sourced from the Copernicus Marine Service and ERA5, both recognized providers of comprehensive Earth observation data. Data is primarily accessed in NetCDF and Zarr formats, chosen for their efficiency in storing and retrieving large, multi-dimensional datasets common in oceanographic and meteorological applications. NetCDF’s structure allows for self-describing, portable data, while Zarr offers cloud-native storage and parallel read/write capabilities, facilitating scalable data access for analysis and processing. These formats minimize data transfer times and storage requirements, enabling the agent to efficiently process extensive datasets related to ocean currents, sea surface temperature, and atmospheric conditions.
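The efficiency gain from chunked formats like Zarr comes from reading only the chunks a query overlaps. A minimal sketch of that chunk arithmetic (the axis and chunk sizes are invented for illustration):

```python
# Which chunks does a slice of a chunked array actually touch?
import math

def chunks_for_slice(start: int, stop: int, chunk_size: int) -> range:
    """Return the chunk indices a half-open slice [start, stop) overlaps."""
    return range(start // chunk_size, math.ceil(stop / chunk_size))

# A 10 000-step time axis stored in chunks of 1 000 steps:
needed = list(chunks_for_slice(2500, 4200, 1000))  # only chunks 2, 3 and 4
```

Of the ten chunks on disk, only three must be fetched, which is why a subset query over a multi-terabyte archive can complete in seconds.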
Earthmover serves as the primary data access layer for the ERA5 dataset, a comprehensive reanalysis dataset provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). This component handles the retrieval of atmospheric, land, and oceanic climate variables, providing the Oceanographer Agent with necessary environmental context. Crucially, all code generated by the agents, intended to process or analyze this data, is executed within a sandboxed environment. This sandboxed execution prevents potentially malicious or erroneous code from accessing system resources or compromising data integrity, ensuring a secure and controlled operational framework. The implementation utilizes containerization technologies to isolate the execution context and enforce resource limitations.
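A process-level analogue of this sandboxing can be sketched with the standard library. The real system uses containerization; the version below only isolates the interpreter and enforces a wall-clock timeout, and is meant as a conceptual sketch, not a production sandbox:

```python
# Minimal sketch: run untrusted code in a separate, isolated process.
import subprocess
import sys

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Execute Python source in a child process with a timeout.
    A real sandbox would additionally cap memory/CPU and drop
    filesystem and network access (e.g. via containers)."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout.strip()

out = run_sandboxed("print(2 + 2)")
```

The key design point carries over to the container setting: agent-generated code never shares an address space, or resource budget, with the orchestrator.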
The Oceanographer Agent utilizes ReAct prompting in conjunction with the GPT-5.2 language model to iteratively refine data queries, improving the precision and pertinence of retrieved information. This approach was demonstrated in Scenario 1, where analysis revealed a Spearman correlation coefficient of -0.47 (p=0.033) between ocean current speed and microplastic concentration. This statistically significant correlation – indicating a moderate negative relationship – suggests that faster currents are associated with lower concentrations of microplastics in the specified scenario, a finding derived through the iterative query process enabled by ReAct prompting.
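In practice an analysis agent would compute such a coefficient with a library call like `scipy.stats.spearmanr`; a pure-Python version makes the computation itself visible (rank both series, then take the Pearson correlation of the ranks):

```python
# Spearman rank correlation, written out from first principles.
def rankdata(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative data only: current speed up, concentration down.
rho = spearman([0.1, 0.4, 0.7, 1.2], [80, 60, 35, 10])
```

For perfectly monotone-decreasing data the coefficient is exactly -1; the reported -0.47 indicates a moderate, noisier version of the same negative trend.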

Visualizing and Validating Results: Ensuring Data Integrity and Clarity
The creation of compelling data visualizations is significantly enhanced through the use of a Retrieval-Augmented Generation (RAG) Index within the Visualization Agent. This innovative approach doesn’t simply generate plots; it first retrieves relevant information – including established visualization best practices, data characteristics, and appropriate chart types – from a curated knowledge base. This retrieved context then informs the generation process, ensuring that the resulting visualizations are not only aesthetically pleasing but also accurately and effectively communicate the underlying data’s story. By combining generative capabilities with a strong foundation of verified knowledge, the RAG Index allows for the creation of informative and insightful plots that go beyond simple data representation, revealing nuanced patterns and facilitating deeper understanding. The system dynamically selects the most suitable visual encoding based on the data’s properties, moving beyond pre-defined templates to deliver truly data-driven visualizations.
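The retrieve-then-generate step can be illustrated with a toy keyword-overlap retriever. The knowledge-base entries below are invented, and a real RAG index would use dense embeddings rather than token overlap, but the flow is the same: fetch the best-matching guidance, then condition generation on it:

```python
# Toy retriever standing in for the Visualization Agent's RAG index.
def tokenize(text: str) -> set:
    return set(text.lower().split())

KNOWLEDGE_BASE = [
    "temperature salinity diagram: scatter plot with isopycnal contours",
    "time series: line plot with explicit units on both axes",
    "species abundance: log-scaled bar or bubble chart",
]

def retrieve(query: str, k: int = 1) -> list:
    # Rank entries by how many query tokens they share.
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(tokenize(doc) & tokenize(query)),
        reverse=True,
    )
    return scored[:k]

best = retrieve("plot temperature versus salinity")
```

The retrieved entry then steers chart-type selection, so the generator proposes a T-S scatter rather than, say, a default line plot.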
The integrity of data visualization relies heavily on automated quality checks, and this system employs a Visual Quality Control Loop to rigorously assess generated plots. This loop doesn’t simply verify aesthetic elements; it confirms that visual conventions – such as appropriate axis labeling, clear legends, and logical scaling – are consistently applied. More critically, the loop validates the accuracy of the visual representation by comparing the plot’s depiction of data trends and values against the original dataset. Discrepancies trigger automatic adjustments or flags for review, ensuring that the final visualization faithfully reflects the underlying data and avoids misleading interpretations. This process is crucial for maintaining scientific rigor and building trust in the presented findings, as a flawed visualization, however aesthetically pleasing, can invalidate research conclusions.
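A check-repair-recheck loop of this kind can be sketched over a plot specification; the spec fields and repair rules here are illustrative assumptions, not the system's actual check list:

```python
# Minimal sketch of a visual quality-control loop on a plot spec.
def qc_check(spec: dict) -> list:
    issues = []
    if not spec.get("xlabel"):
        issues.append("missing x-axis label")
    if not spec.get("ylabel"):
        issues.append("missing y-axis label")
    if spec.get("has_legend") is False and spec.get("n_series", 1) > 1:
        issues.append("multiple series without a legend")
    return issues

def qc_loop(spec: dict, fix) -> dict:
    # Re-check after every repair until the spec passes all checks.
    while (issues := qc_check(spec)):
        spec = fix(spec, issues)
    return spec

def auto_fix(spec, issues):
    repaired = dict(spec)
    if "missing x-axis label" in issues:
        repaired["xlabel"] = "time"
    if "missing y-axis label" in issues:
        repaired["ylabel"] = "value"
    if "multiple series without a legend" in issues:
        repaired["has_legend"] = True
    return repaired

clean = qc_loop({"n_series": 2, "has_legend": False}, auto_fix)
```

The loop structure matters more than the individual checks: every repair is re-validated, so a fix can never silently introduce a new violation.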
The culmination of data analysis lies not simply in generating visualizations, but in translating complex findings into accessible scientific narratives. This is achieved through a dedicated Writer Agent, which synthesizes results from both the Visualization and Quality Control loops into a coherent and easily understood report. The agent doesn’t merely describe the plots; it articulates the meaning behind them, providing concise explanations of key findings and their implications. This process ensures that the insights derived from data are not lost in visual complexity, but are instead presented as a clear, logical argument, suitable for scientific communication and furthering understanding. By focusing on clarity and conciseness, the Writer Agent transforms raw data into a compelling and informative story.
Robustness and Error Correction: Building a Resilient Analytical Framework
When the primary GPT-5.2 agent encounters challenges in processing complex Earth science data or resolving ambiguities, the system intelligently redirects the task to the Wise Agent, powered by Claude Opus. This secondary agent functions as a critical error-correction layer, leveraging a distinct architectural approach to re-examine the problematic input and generate an alternative response. The Wise Agent isn’t merely a redundant system; it represents a deliberate design choice to enhance overall reliability. By employing a different large language model, the system minimizes the risk of systematic errors that might arise from limitations inherent in a single model. This fallback mechanism ensures that even intricate queries or nuanced datasets receive a robust evaluation, ultimately improving the accuracy and trustworthiness of the Earth science insights generated.
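The routing logic behind this fallback is a simple try-primary, fall-back-on-failure pattern. The agent functions below are stubs (real calls would go to the respective model APIs), and the confidence threshold is an invented illustration:

```python
# Sketch of primary/fallback model routing with an error-correction layer.
def primary_agent(query: str) -> tuple:
    # Stub for the primary model; real code would call its API.
    if "ambiguous" in query:
        raise ValueError("cannot resolve query")
    return ("primary answer", 0.9)

def wise_agent(query: str) -> tuple:
    # Stub for the secondary model used as the fallback.
    return ("fallback answer", 0.8)

def answer(query: str, min_confidence: float = 0.5) -> str:
    try:
        text, confidence = primary_agent(query)
        if confidence >= min_confidence:
            return text
    except ValueError:
        pass  # fall through to the secondary model
    text, _ = wise_agent(query)
    return text

easy = answer("simple query")
hard = answer("ambiguous query")
```

Using an architecturally different model for the fallback, rather than retrying the same one, is what reduces the chance that both layers share the same blind spot.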
The system’s foundation for information access relies on Elasticsearch, a distributed, RESTful search and analytics engine renowned for its speed, scalability, and reliability. This technology functions as a crucial baseline, enabling efficient retrieval of vast datasets necessary for complex Earth science analyses. Elasticsearch doesn’t simply locate data; it indexes it, allowing for nuanced queries and rapid identification of relevant information, even within unstructured text. This robust search capability is vital for ensuring the accuracy and consistency of the entire system, as it provides a dependable source of truth against which the primary agent’s responses can be validated, and contributes to the system’s overall resilience when faced with challenging or ambiguous inputs.
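A baseline retrieval request of this kind is expressed in the Elasticsearch Query DSL. The index and field names below are invented for illustration; only the structure (a `bool` query combining full-text `match` with a `range` filter) follows the standard DSL:

```python
# Sketch of an Elasticsearch Query DSL body for baseline dataset retrieval.
def build_query(text: str, south: float, north: float) -> dict:
    return {
        "query": {
            "bool": {
                # Full-text relevance scoring on the abstract field...
                "must": [{"match": {"abstract": text}}],
                # ...restricted to a latitude band, without affecting scores.
                "filter": [{"range": {"latitude": {"gte": south, "lte": north}}}],
            }
        },
        "size": 10,
    }

body = build_query("microplastic concentration", 60.0, 80.0)
# With the official Python client this would be sent as something like:
#   es.search(index="pangaea-datasets", body=body)
```

Putting the spatial constraint in `filter` rather than `must` keeps it out of relevance scoring and lets Elasticsearch cache it across queries.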
The system’s architecture deliberately integrates multiple layers of processing to guarantee both resilience and analytical accuracy across complex Earth science investigations. This approach extends beyond simple redundancy; it allows for cross-validation of results and correction of errors stemming from any single component. Demonstrated in Scenario 4, this robust design facilitated the statistically significant identification – with a p-value of less than 0.0014 – of a notable disparity in biodiversity between Western and Eastern transects. Such precision opens avenues for detailed ecological studies, refined environmental monitoring, and improved predictive modeling, ultimately enabling a deeper understanding of our planet’s intricate systems and fostering data-driven conservation efforts.
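The article does not name the statistical test behind the p < 0.0014 result; a rank-based two-sample statistic such as the Mann-Whitney U is one conventional choice for comparing biodiversity samples, and its core computation fits in a few lines (the richness counts below are invented):

```python
# Mann-Whitney U statistic, written out directly (ties count half).
def mann_whitney_u(a, b):
    """U for sample a versus b: pairs where a's value exceeds b's."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

west = [14, 17, 21, 19]  # illustrative species-richness counts
east = [8, 11, 9, 12]
u_west = mann_whitney_u(west, east)  # 16 of 16 pairs: complete separation
```

In practice one would call `scipy.stats.mannwhitneyu` to obtain the p-value as well; the hand-rolled version only shows what the statistic measures.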

The pursuit of automated scientific discovery, as demonstrated by PANGAEA-GPT, necessitates a relentless focus on correctness and provability. The system’s hierarchical multi-agent approach, designed to intelligently navigate and integrate complex geoscientific data, echoes a commitment to algorithmic purity. As Grace Hopper famously stated, “It’s easier to ask forgiveness than it is to get permission.” This resonates with the innovative spirit of PANGAEA-GPT, which proactively explores data relationships and automates analyses, prioritizing efficient data curation and knowledge extraction even if it means circumventing traditional, manual workflows. The system’s ability to retrieve and synthesize information from disparate sources demands a foundation built on demonstrable accuracy, not merely functional operation.
What’s Next?
The presented system, while demonstrating a capacity for automated geoscience analysis, merely scratches the surface of a fundamentally unresolved problem: the reliable extraction of truth from inherently noisy data. The elegance of a multi-agent architecture rests on the precision of its components; a single flawed logical step propagated through the hierarchy yields results indistinguishable from randomness. Reproducibility, the cornerstone of any scientific endeavor, remains a challenge when LLMs, by their probabilistic nature, introduce a degree of non-determinism. The current implementation mitigates this through constrained prompting, but a truly robust system demands formal verification of each agent’s reasoning – a pursuit akin to squaring the circle.
Future work must address the issue of ‘hallucination’ – the generation of plausible but factually incorrect information – not as a statistical anomaly to be averaged out, but as a logical impossibility to be prevented. Furthermore, the reliance on curated data repositories, while pragmatic, introduces a bias; a truly autonomous system should be capable of assessing data quality and flagging inconsistencies, operating independently of human intervention. The ambition is not simply to accelerate analysis, but to establish a framework where computational results possess an inherent, mathematically demonstrable certainty.
Ultimately, the pursuit of automated geoscience is a philosophical exercise. It forces a reckoning with the limitations of approximation and the enduring need for rigorous, provable solutions. The system detailed herein is a step, albeit a small one, towards a future where algorithms do not merely ‘work’ but prove their correctness – a standard to which all scientific computation should aspire.
Original article: https://arxiv.org/pdf/2602.21351.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-26 20:02