Author: Denis Avetisyan
This review explores how machine learning techniques are transforming our understanding of galaxy clusters and the complex physics governing their formation and evolution.

A comprehensive overview of machine learning applications to galaxy cluster mass estimation, cosmological analysis, and the challenges posed by baryonic physics and simulation limitations.
Establishing robust cosmological constraints from galaxy clusters is challenged by complex astrophysical processes and projection effects. This review, ‘Machine Learning applications to Galaxy Clusters’, surveys the burgeoning field of artificial intelligence techniques applied to cluster research, demonstrating their capacity to refine mass estimation and probe intricate phenomena like mergers and baryonic physics. Recent advances leverage simulations to train models capable of capturing non-linear features beyond traditional methods, yet require careful attention to systematic uncertainties and interpretability. Can a synergistic approach combining flexible simulations with advanced machine learning fully unlock the precision cosmology potential of next-generation surveys?
Cosmic Behemoths: Probing the Universe’s Foundations
Galaxy clusters represent the most massive, gravitationally bound structures in the universe, serving as pivotal probes into both cosmological principles and the enigmatic nature of dark matter. These colossal assemblies, containing hundreds or even thousands of galaxies, are held together by their combined gravity, with the majority of their mass existing not in visible matter, but in dark matter and hot gas. The distribution and evolution of these clusters are directly influenced by the underlying cosmology – the composition, geometry, and ultimate fate of the universe – making them ideal laboratories for testing cosmological models. Furthermore, the sheer mass of galaxy clusters amplifies the gravitational effects of dark matter, allowing scientists to map its distribution and better understand its properties, which remain largely unknown even though dark matter makes up approximately 85% of the universe’s matter. Consequently, investigations into galaxy cluster formation and evolution are central to advancing our understanding of the cosmos.
Galaxy clusters aren’t simply collections of galaxies; they are immersed within the Intracluster Medium (ICM), a diffuse, incredibly hot plasma that constitutes the vast majority of the cluster’s baryonic mass. Reaching temperatures of tens to hundreds of millions of kelvin, this plasma emits strongly in the X-ray spectrum, providing a crucial observational window into the cluster’s overall properties. The ICM’s thermal energy represents a significant fraction of the cluster’s total energy budget and is exquisitely sensitive to the cluster’s gravitational potential and formation history. Consequently, detailed studies of the ICM – including its temperature, density, and chemical composition – offer invaluable insights into the processes governing structure formation in the universe and the elusive nature of dark matter, as the plasma’s behavior is directly influenced by the cluster’s underlying gravitational field.
Establishing a robust connection between a galaxy cluster’s total mass and readily observable characteristics – such as its X-ray luminosity or the temperature of its Intracluster Medium – is paramount to cosmological studies. However, accurately determining this mass presents a significant challenge. Traditional methods, like employing the virial theorem or modeling the hydrostatic equilibrium of the ICM, are susceptible to systematic errors and rely on simplifying assumptions about the cluster’s internal state. These limitations can lead to substantial uncertainties in mass estimates, hindering precise cosmological inferences. Consequently, researchers are actively pursuing novel approaches, including weak gravitational lensing and dynamical tracer analyses, to refine mass measurements and overcome the shortcomings of conventional techniques, ultimately seeking a more reliable ‘mass indicator’ for these cosmic behemoths.
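To make the hydrostatic-equilibrium approach concrete, here is a minimal sketch of the standard hydrostatic mass estimate, M(<r) = -(kT r / (G μ m_p)) (d ln n / d ln r + d ln T / d ln r). It is not taken from the review: the isothermal beta-model density profile, the mean molecular weight of 0.6, and the function names are all assumptions chosen for illustration.

```python
import math

# Physical constants in SI units.
G = 6.674e-11              # gravitational constant, m^3 kg^-1 s^-2
M_PROTON = 1.6726e-27      # proton mass, kg
KEV_IN_JOULES = 1.602e-16  # 1 keV expressed in joules
KPC_IN_M = 3.0857e19       # 1 kiloparsec expressed in metres
MSUN_IN_KG = 1.989e30      # solar mass, kg
MU = 0.6                   # assumed mean molecular weight of ionized ICM gas


def beta_model_log_slope(r_kpc, r_core_kpc, beta):
    """d ln(n) / d ln(r) for an isothermal beta-model gas density profile."""
    x2 = (r_kpc / r_core_kpc) ** 2
    return -3.0 * beta * x2 / (1.0 + x2)


def hydrostatic_mass_msun(r_kpc, kT_keV, dlnn_dlnr, dlnT_dlnr=0.0):
    """Mass enclosed within r (solar masses), assuming hydrostatic equilibrium.

    The default dlnT_dlnr=0 corresponds to an isothermal gas.
    """
    r_m = r_kpc * KPC_IN_M
    kT = kT_keV * KEV_IN_JOULES
    mass_kg = -(kT * r_m) / (G * MU * M_PROTON) * (dlnn_dlnr + dlnT_dlnr)
    return mass_kg / MSUN_IN_KG


# Hypothetical cluster: 6 keV gas, 200 kpc core radius, beta = 0.67.
slope = beta_model_log_slope(1000.0, 200.0, 0.67)
print(f"M(<1 Mpc) ~ {hydrostatic_mass_msun(1000.0, 6.0, slope):.2e} Msun")
```

The estimate depends only on the gas temperature and the logarithmic slopes of the profiles, which is why departures from equilibrium (for example, in merging clusters) translate directly into mass bias.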

Simulating the Cosmos: The Limits of Our Models
Hydrodynamical simulations, such as The Magneticum Simulations and The Three Hundred Project, represent a computational astrophysics technique focused on modeling the formation and evolution of galaxy clusters. These simulations combine N-body methods – which track the gravitational interactions of dark matter and stellar particles – with hydrodynamics equations to model the behavior of baryonic matter, including gas. By numerically solving these coupled equations, researchers can track processes like gas cooling, star formation, and the impact of active galactic nuclei (AGN) feedback on the overall cluster properties. These simulations produce large datasets containing the positions, velocities, and physical properties of millions to billions of particles, allowing for detailed analysis of cluster morphology, mass distribution, and observable characteristics.
Cosmological hydrodynamical simulations model the universe by combining N-body simulations, which calculate gravitational interactions between dark matter particles, with hydrodynamical equations that describe the behavior of gas. The N-body component tracks the gravitational forces acting on a large number of particles, while the hydrodynamical component simulates gas properties like density, temperature, and velocity, including radiative processes. This combined approach allows researchers to study the formation and evolution of structures like galaxies and galaxy clusters. However, accurately resolving the relevant physical scales and including all necessary physics – such as star formation, feedback from active galactic nuclei, and metal enrichment – requires enormous computational resources. The computational cost scales rapidly with resolution, necessitating the use of supercomputers and often limiting the size and duration of simulations.
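The N-body component described above can be illustrated with a toy example. The sketch below is not how Magneticum or The Three Hundred compute gravity (production codes use tree or particle-mesh solvers); it assumes direct pairwise summation with a softening length, units where G = 1, and a kick-drift-kick leapfrog integrator, purely to show the idea of advancing particles under mutual gravity.

```python
import math


def accelerations(positions, masses, softening=0.01, G=1.0):
    """Direct-summation gravitational acceleration on every particle."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            # Softening keeps the force finite during close encounters.
            r2 = sum(c * c for c in d) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc


def leapfrog_step(positions, velocities, masses, dt):
    """Advance the system in place by one kick-drift-kick leapfrog step."""
    acc = accelerations(positions, masses)
    for i in range(len(positions)):
        for k in range(3):
            velocities[i][k] += 0.5 * dt * acc[i][k]  # first half kick
            positions[i][k] += dt * velocities[i][k]  # drift
    acc = accelerations(positions, masses)
    for i in range(len(positions)):
        for k in range(3):
            velocities[i][k] += 0.5 * dt * acc[i][k]  # second half kick
    return positions, velocities


# Two equal masses on a (nearly) circular orbit around their centre of mass.
pos = [[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
vel = [[0.0, -0.5, 0.0], [0.0, 0.5, 0.0]]
mass = [0.5, 0.5]
for _ in range(100):
    leapfrog_step(pos, vel, mass, dt=0.01)
print("separation after 100 steps:", math.dist(pos[0], pos[1]))
```

The double loop costs O(N²) per step, which hints at why the computational cost grows so quickly with particle number and resolution.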
The CAMELS (Cosmology and Astrophysics with MachinE Learning Simulations) project seeks to improve the accuracy of hydrodynamical simulations by systematically varying input parameters and refining calibration techniques. Despite these efforts to enhance simulation fidelity, computational limitations remain a significant challenge. Critically, traditional methods for analyzing simulation outputs, particularly those focused on mass estimation of structures like galaxy clusters, exhibit substantially higher scatter – meaning a wider range of possible mass values for a given cluster – than those achievable using contemporary machine learning approaches. This increased scatter limits the precision of cosmological inferences derived from these simulations and motivates the development of more accurate analytical techniques.

Machine Learning: New Eyes on Cosmic Structure
Machine learning techniques are increasingly utilized to analyze the substantial datasets generated by astrophysical simulations and observations of galaxy clusters. Specifically, data derived from X-ray emission and the Sunyaev-Zel’dovich (SZ) effect – both sensitive probes of the hot intracluster gas – benefit from these analytical approaches. The complexity and high dimensionality of these datasets, coupled with weak or noisy signals, necessitate the use of algorithms capable of identifying subtle patterns and correlations that are difficult to detect with traditional methods. This includes the processing of both simulated data, used for algorithm training and validation, and observational data, enabling more comprehensive and efficient analysis of cluster properties and cosmological parameters.
Machine learning techniques, including Random Forest, Support Vector Machines, and Convolutional Neural Networks, facilitate the identification of complex patterns within astronomical datasets and enable the prediction of cluster properties. Random Forest and Support Vector Machines utilize ensemble learning and kernel functions, respectively, to classify and regress cluster characteristics based on input features derived from X-ray emission and Sunyaev-Zel’dovich effect observations. Convolutional Neural Networks, a deep learning approach, excel at automatically learning hierarchical representations from data, proving particularly effective when analyzing image-like data and extracting spatial correlations within galaxy clusters; this automated feature extraction bypasses the need for manually defined parameters, enhancing predictive power and robustness.
Machine learning techniques facilitate improved calibration of scaling relations – empirical relationships between galaxy cluster properties like mass, X-ray luminosity, and Sunyaev-Zel’dovich signal – by providing a more efficient and accurate means of determining cluster masses. Traditional methods for calibrating these relations often rely on assumptions or limited data, leading to uncertainties in mass estimation. Implementation of machine learning algorithms, specifically those capable of identifying complex data patterns, has demonstrated the potential to reduce scatter in mass estimates by a factor of 1.5 to 2.0 relative to conventional calibration approaches. This reduction in scatter directly translates to increased precision in cosmological parameter estimation and improved statistical power in large-scale structure studies.
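What "scatter" in a scaling relation means can be sketched with a few lines of code. The example below fits a power law in log space by ordinary least squares and reports the RMS residual in dex. The mock catalog is synthetic: its slope, normalization, and 0.1 dex scatter are arbitrary illustrative values, not results from the review, and the function names are hypothetical.

```python
import math
import random


def make_mock_catalog(n, slope=0.6, intercept=14.0, scatter_dex=0.1, seed=0):
    """Synthetic (observable, mass) pairs; every number here is illustrative."""
    rng = random.Random(seed)
    observable, mass = [], []
    for _ in range(n):
        log_obs = rng.uniform(-1.0, 1.0)
        log_mass = intercept + slope * log_obs + rng.gauss(0.0, scatter_dex)
        observable.append(10.0 ** log_obs)
        mass.append(10.0 ** log_mass)
    return observable, mass


def fit_power_law(observable, mass):
    """Least-squares fit of log10(M) = intercept + slope * log10(observable)."""
    x = [math.log10(v) for v in observable]
    y = [math.log10(v) for v in mass]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def log_scatter(observable, mass, slope, intercept):
    """RMS residual about the fitted relation, in dex."""
    residuals = [
        math.log10(m) - (intercept + slope * math.log10(o))
        for o, m in zip(observable, mass)
    ]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


obs, mass = make_mock_catalog(500)
slope, intercept = fit_power_law(obs, mass)
print("slope:", slope, "scatter [dex]:", log_scatter(obs, mass, slope, intercept))
```

The same residual statistic can be computed for the predictions of any mass estimator, which is how a reduction in scatter relative to a conventional scaling relation would be quantified.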
![A convolutional neural network successfully identified merging galaxy clusters in both SZ effect and X-ray images by accurately delineating the [latex]R_{200}[/latex] radius, as demonstrated by Arendt et al. (2024).](https://arxiv.org/html/2605.21991v1/figures/ArendtMerger.png)
Beyond Pixels: Mapping the Cosmic Web with Graph Neural Networks
The intricate relationships between galaxies, particularly within dense clusters and the expansive cosmic web, present a unique analytical challenge. Traditional methods often struggle to capture the full complexity of these gravitational interactions and interdependencies. However, Graph Neural Networks (GNNs) offer a powerful alternative by explicitly modeling these connections; galaxies and clusters are treated as nodes within a network, and their relationships – defined by gravity, gas flows, and shared cosmic environments – become the edges. This approach allows the network to learn directly from the structure of the cosmic web, identifying patterns and dependencies that might be missed by conventional analyses. By leveraging the relational information inherent in the distribution of galaxies, GNNs can effectively map and understand the large-scale structure of the universe, offering insights into galaxy formation, dark matter distribution, and the evolution of the cosmos.
Galaxy clusters, vast collections of gravitationally bound galaxies, aren’t isolated entities; they exist within a complex network shaped by ongoing gravitational interactions and the flow of gas throughout the cosmic web. Researchers are now leveraging Graph Neural Networks (GNNs) to model this interconnectedness, representing each cluster as a node and the relationships between them – dictated by gravity and gas dynamics – as edges. This approach moves beyond traditional methods that treat clusters in isolation, allowing the GNN to learn intricate dependencies and predict properties based on the cluster’s network context. By considering how clusters influence and are influenced by their neighbors, GNNs can uncover subtle correlations and provide a more holistic understanding of large-scale structure formation and evolution, ultimately offering improved estimations of cluster masses and their internal dynamics.
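The node-and-edge picture can be sketched in a few lines. The example below builds a k-nearest-neighbour graph from 3D positions and applies one round of mean-aggregation message passing, the basic operation inside a GNN layer. It is only an illustration: real GNNs learn weight matrices and apply nonlinearities, whereas the fixed scalar weights here are untrained placeholders, and the k-nearest-neighbour rule for edges is an assumption rather than the construction used in any specific study.

```python
import math


def knn_edges(positions, k):
    """Map each node index to the indices of its k nearest neighbours."""
    edges = {}
    for i, p in enumerate(positions):
        by_distance = sorted(
            (math.dist(p, q), j) for j, q in enumerate(positions) if j != i
        )
        edges[i] = [j for _, j in by_distance[:k]]
    return edges


def message_passing_step(features, edges, w_self=0.5, w_neigh=0.5):
    """One round of mean-aggregation message passing.

    Each node's new feature vector mixes its own features with the average
    of its neighbours' features. Requires every node to have >= 1 neighbour.
    """
    updated = []
    for i, own in enumerate(features):
        neighbours = edges[i]
        mean = [
            sum(features[j][d] for j in neighbours) / len(neighbours)
            for d in range(len(own))
        ]
        updated.append([w_self * a + w_neigh * b for a, b in zip(own, mean)])
    return updated


# Hypothetical nodes: three close neighbours and one distant object.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)]
features = [[1.0], [2.0], [3.0], [4.0]]  # e.g. one scalar property per node
edges = knn_edges(positions, k=2)
print(message_passing_step(features, edges))
```

Stacking several such rounds lets information propagate beyond immediate neighbours, which is what allows a node's prediction to depend on its wider network context.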
Galaxy clusters are not simply collections of galaxies; they are embedded within vast halos of diffuse light, known as Intracluster Light (ICL). Recent studies demonstrate that incorporating this ICL as a connected element within Graph Neural Networks (GNNs) significantly refines the understanding of a cluster’s overall gravitational potential. By treating the ICL as an integral part of the cluster’s network, rather than a detached phenomenon, GNNs can more accurately estimate the total mass, including dark matter. This approach yields substantial improvements in predicting the ICL mass fraction itself – a key indicator of cluster formation and evolution – offering a more holistic and precise assessment of these cosmic structures than traditional methods.

The application of machine learning to galaxy cluster research, as detailed in this work, reveals a fascinating interplay between observational data and theoretical modeling. This pursuit echoes a deeper truth about scientific inquiry itself. As Grigori Perelman once stated, “It is better to remain silent than to say something meaningless.” The methodologies described – models trained on simulations, with careful attention to systematic uncertainties and interpretability – demonstrate a rigorous approach to minimizing ‘meaningless’ assertions. Comparison of model predictions with X-ray and SZ data demonstrates both the limitations and achievements of current simulations, acknowledging the inherent challenges in peering beyond the horizon of our current understanding. The work, therefore, isn’t simply about refining mass estimations; it’s about honestly assessing the boundaries of knowledge.
The Horizon Beckons
The application of machine learning to galaxy clusters offers a seductive promise: to wring meaningful insight from data increasingly complex and voluminous. This work, and others like it, demonstrate the potential for these techniques to refine mass estimations, a cornerstone of cosmological inference. Yet, it’s a potential constantly shadowed by the limitations of the tools themselves, and more critically, by the simulations upon which they are trained. These simulations, elegant as they may be, remain imperfect representations of the universe – and any conclusions drawn from them are, at best, approximations.
The true challenge isn’t building more sophisticated algorithms, but confronting the irreducible uncertainties inherent in baryonic physics. Gas, stars, and active galactic nuclei introduce complexities that simulations struggle to fully capture, creating systematic errors that machine learning, however clever, cannot magically resolve. It’s a humbling reminder that physics is the art of guessing under cosmic pressure, and a beautiful model is still a model, not reality.
Future work will undoubtedly refine these methods, pushing the boundaries of precision. But a truly significant leap forward requires a deeper understanding of the underlying astrophysics, a willingness to question the assumptions embedded in our simulations, and an acceptance that, like all theories, this one may ultimately vanish beyond the event horizon of observational constraint.
Original article: https://arxiv.org/pdf/2605.21991.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-05-24 11:57