Author: Denis Avetisyan
A new perspective argues that focusing on the synthesis process itself, rather than just material structure, is key to accelerating materials discovery.

This review highlights the potential of AI-driven synthesis protocol-property relationships to bridge the ‘synthesizability gap’ and enable autonomous materials design.
Despite advances in artificial intelligence for materials discovery, a critical gap persists between computationally predicted structures and their actual synthesizability. The perspective presented in ‘Beyond Structure: Revolutionising Materials Discovery via AI-Driven Synthesis Protocol-Property Relationships’ argues that overcoming this ‘synthesizability gap’ requires a shift from structure-centric to synthesis-centric approaches, treating executable synthesis protocols, rather than just atomic configurations, as the primary design variables. This framework, defined by the causal relationship [latex]P \rightarrow X \rightarrow y[/latex] (protocol to structure to properties), necessitates machine-readable protocol representation, generative modelling of reaction pathways, and closed-loop optimisation with experimental feedback. Could this synthesis-first paradigm unlock a new era of autonomous materials innovation and accelerate the discovery of sustainable, functional materials?
The Illusion of Predictability: Beyond Structure-Property Relationships
For decades, the field of materials science has been fundamentally guided by the Structure-Property Paradigm, a principle asserting a direct correlation between a material’s atomic arrangement and its macroscopic characteristics. This approach posits that by meticulously controlling the organization of atoms – whether in crystalline lattices, amorphous networks, or complex composites – scientists can predictably engineer desired properties like strength, conductivity, or optical behavior. Historically, this has involved iterative cycles of synthesis, characterization, and refinement, often driven by intuition and empirical observation. While immensely successful in yielding countless materials innovations, the traditional paradigm is increasingly challenged by the sheer complexity of modern materials and the vastness of compositional space, prompting a search for more efficient and predictive discovery methods that build upon – and extend – this foundational principle.
Computational materials science has been revolutionized by high-throughput Density Functional Theory (DFT) calculations, which systematically explore vast chemical spaces to predict material properties. This approach relies heavily on large-scale databases – notably the Materials Project, AFLOW, and the Open Quantum Materials Database (OQMD) – that provide pre-calculated data and facilitate efficient screening of potential candidates. However, even with these advancements and increasingly powerful computing infrastructure, DFT remains computationally expensive, particularly when dealing with complex materials or requiring high accuracy. The calculations scale unfavorably with system size and complexity, limiting the ability to fully explore all possible material combinations and hindering the discovery of truly novel compounds. Consequently, researchers continually seek more efficient algorithms and approximations, alongside advancements in hardware, to overcome these computational bottlenecks and accelerate materials discovery.
Despite increasingly sophisticated computational materials science, a stark disconnect remains between prediction and realization. While high-throughput calculations can now rapidly screen vast chemical spaces and propose novel materials with desired properties, fewer than 2% of these computationally derived candidates are ever successfully synthesized and verified in the laboratory. This ‘Synthesizability Gap’ isn’t merely a matter of experimental difficulty; it highlights fundamental limitations in current predictive models. These models often fail to fully account for the complex kinetic and thermodynamic factors governing material formation, overlooking crucial aspects of chemical reactivity, phase stability under realistic conditions, and the practical challenges of controlling stoichiometry during synthesis. Bridging this gap requires not only advancements in computational power, but also the integration of materials informatics, machine learning, and a deeper understanding of the chemical principles that govern how atoms assemble into stable, realizable structures.
![A closed-loop discovery cycle integrates heterogeneous data, AI/ML modeling (including updating of [latex]P \to X[/latex] and [latex]X \to y[/latex] models), and automated execution to enable time-resolved characterization and optimization.](https://arxiv.org/html/2605.00313v1/integrated_ecosystem.png)
Inverting the Process: Prioritizing Synthesis
The conventional approach to chemical synthesis prioritizes target molecule design, with synthetic routes determined subsequently. The Synthesis-First Paradigm inverts this process, establishing synthesis protocols – reaction types, reagents, and conditions – as the initial design variables. This means exploring synthetic accessibility before defining the target molecule, effectively shifting the focus from what can be made to how things are made. By treating synthesis as a proactive design element, researchers aim to expand chemical space exploration and overcome limitations imposed by retrospective synthetic analysis, ultimately enabling the discovery of novel compounds previously considered inaccessible.
Protocol Representation involves encoding chemical synthesis knowledge – including reactants, reagents, conditions, and transformations – into a structured, machine-readable format. This representation is not simply a database of reactions; it captures the rules governing chemical transformations, allowing algorithms to understand and predict synthetic feasibility. Common approaches utilize graph-based representations where atoms are nodes and bonds are edges, augmented with attributes describing reaction conditions and stereochemistry. A standardized Protocol Representation facilitates computational exploration of chemical space by enabling algorithms to search for, evaluate, and propose novel synthetic pathways based on established chemical principles, rather than relying solely on previously reported examples. This structured data is essential for training and validating generative models used in automated synthesis design.
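As a purely illustrative sketch of what a machine-readable protocol might look like, the snippet below encodes a synthesis route as an ordered list of typed steps that can be serialised for algorithmic consumption. The schema, field names, and the example reagents and conditions are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class Step:
    """One action in a synthesis protocol (hypothetical schema)."""
    action: str                      # e.g. "add", "heat", "stir"
    params: dict = field(default_factory=dict)

@dataclass
class Protocol:
    """A machine-readable protocol: a target plus ordered steps."""
    target: str
    steps: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise to JSON so algorithms (and robots) can consume it."""
        return json.dumps(asdict(self), sort_keys=True)

# Example: a toy solid-state-style route (illustrative numbers only)
p = Protocol(
    target="LiCoO2",
    steps=[
        Step("add", {"reagent": "Li2CO3", "amount_g": 1.0}),
        Step("add", {"reagent": "Co3O4", "amount_g": 2.1}),
        Step("heat", {"temperature_C": 850, "duration_h": 12}),
    ],
)
print(p.to_json())
```

A graph-based representation would add explicit nodes for species and edges for transformations; this flat step list is simply the smallest machine-readable starting point.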
Generative models are essential for de novo synthesis planning due to their ability to propose potential reaction sequences. Variational Autoencoders (VAEs) learn compressed latent representations of molecular structures and reactions, enabling the generation of novel compounds and pathways. Generative Adversarial Networks (GANs) utilize a competitive process between a generator and discriminator to create realistic and synthetically plausible routes. Diffusion models, inspired by non-equilibrium thermodynamics, progressively add noise to data and then learn to reverse the process, effectively generating new synthetic schemes. Autoregressive models predict subsequent reaction steps based on preceding ones, offering a sequential approach to pathway construction. Each of these model types contributes unique strengths to the challenge of efficiently exploring chemical space and identifying viable synthetic routes to target molecules.
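Of these model families, the autoregressive view is the easiest to sketch without a deep-learning stack. The toy below replaces a trained network with bigram transition counts over an invented step vocabulary, but the sampling loop (propose the next step conditioned on the previous one, until an end token) has the same shape as a real autoregressive pathway generator. Corpus and vocabulary are assumptions for illustration.

```python
import random
from collections import defaultdict

# Toy corpus of tokenized synthesis routes (hypothetical vocabulary)
routes = [
    ["dissolve", "mix", "heat", "filter", "dry", "END"],
    ["dissolve", "stir", "heat", "cool", "filter", "END"],
    ["mix", "heat", "anneal", "cool", "END"],
]

# Fit bigram transition counts: a minimal stand-in for a learned model
counts = defaultdict(lambda: defaultdict(int))
for route in routes:
    prev = "START"
    for tok in route:
        counts[prev][tok] += 1
        prev = tok

def sample_route(rng, max_len=10):
    """Sample a step sequence token by token, autoregressive-style."""
    out, prev = [], "START"
    for _ in range(max_len):
        nxt = counts[prev]
        tok = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if tok == "END":
            break
        out.append(tok)
        prev = tok
    return out

print(sample_route(random.Random(0)))
```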
Intelligent Exploration: Navigating the Synthesis Landscape
Bayesian Optimization (BO) addresses the challenge of efficiently searching the vast and often non-convex space of chemical synthesis protocols. BO employs a probabilistic surrogate model, typically a Gaussian Process, to approximate the unknown objective function – for example, reaction yield or selectivity – based on a limited number of evaluated experiments. This surrogate model, combined with an acquisition function, guides the selection of the next experiment to perform, balancing exploration of uncertain regions with exploitation of areas predicted to yield high performance. The acquisition function, such as Probability of Improvement or Expected Improvement, quantifies the desirability of evaluating a given set of synthesis conditions, allowing BO to iteratively refine the surrogate model and converge on optimal or near-optimal protocols with fewer experimental iterations than traditional methods like grid search or random sampling.
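The loop described above can be made concrete in a few dozen lines. The NumPy-only sketch below fits a Gaussian-process surrogate with an RBF kernel and selects experiments by Expected Improvement, maximising a made-up one-dimensional ‘yield’ landscape; the kernel length-scale, noise level, and toy objective are all assumptions for illustration, not the paper's setup.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.3):
    """Squared-exponential (RBF) kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Gaussian-process posterior mean and std at test points Xs."""
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI acquisition for maximisation: E[max(f - best, 0)]."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    return (mu - best) * cdf + sigma * pdf

# Hidden "reaction yield" landscape, standing in for a real experiment
f = lambda x: np.exp(-(x - 0.7) ** 2 / 0.02)

X = np.array([0.1, 0.5, 0.9])          # initial experiments
y = f(X)
grid = np.linspace(0.0, 1.0, 201)      # candidate synthesis conditions
for _ in range(10):                    # fit surrogate -> acquire -> evaluate
    mu, sd = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sd, y.max()))]
    X, y = np.append(X, x_next), np.append(y, f(x_next))

print(f"best condition ~ {X[np.argmax(y)]:.3f}, yield {y.max():.3f}")
```

The same fit-acquire-evaluate cycle is what a self-driving laboratory closes in hardware, with the robot replacing the call to `f`.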
Self-Driving Laboratories (SDLs) represent a closed-loop system integrating robotic automation with machine learning algorithms to accelerate materials synthesis and discovery. These systems automate experimental tasks including reagent handling, reaction monitoring via spectroscopic techniques, and product analysis, thereby removing manual intervention and increasing throughput. Computationally proposed synthesis conditions are directly executed by the robotic system, and the resulting experimental data is fed back into the machine learning models for iterative refinement of predictions. This automation enables rapid validation or rejection of hypotheses, leading to a significant reduction in experimental time and resource consumption compared to traditional, manual experimentation. SDLs facilitate the exploration of vast chemical spaces and optimization of reaction parameters with minimal human oversight, offering a pathway to accelerated materials innovation.
Active Learning and Reinforcement Learning (RL) methodologies enhance automated synthesis exploration by strategically prioritizing experimental investigations. Active Learning algorithms iteratively select the most informative experiments to minimize uncertainty in predictive models, reducing the total number of required trials. In contrast, RL approaches treat the synthesis protocol optimization as a sequential decision-making problem; an agent learns to navigate the synthesis space by maximizing a defined reward function, such as reaction yield or selectivity, through trial and error. Both techniques, when integrated into self-driving laboratory platforms, facilitate accelerated protocol optimization by focusing experimental resources on the most promising areas of the synthesis landscape, thereby surpassing the efficiency of purely randomized or grid-search based approaches.
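As a minimal illustration of the active-learning side, the sketch below selects each next ‘experiment’ by an uncertainty proxy (distance from previously measured points, shaped like a GP variance), reproducing the core behaviour of uncertainty sampling: the loop repeatedly queries the least-explored region. The response function and length-scale are invented for the demo.

```python
import numpy as np

f = lambda x: np.sin(3 * x)                # hidden response (stand-in for yield)
pool = np.linspace(0.0, 2.0, 101)          # candidate experiments
X, y = [0.0, 2.0], [f(0.0), f(2.0)]        # two seed measurements

def uncertainty(pool, X, ls=0.2):
    """Distance-based uncertainty proxy: high far from any measured point."""
    d2 = (pool[:, None] - np.array(X)[None, :]) ** 2
    return 1.0 - np.exp(-d2.min(axis=1) / ls**2)

picked = []
for _ in range(6):                         # active-learning loop
    x_new = pool[int(np.argmax(uncertainty(pool, X)))]  # most informative
    X.append(float(x_new)); y.append(float(f(x_new)))
    picked.append(float(x_new))
print(picked)
```

A reinforcement-learning formulation would instead treat each protocol step as an action and the final yield as reward; the shared idea is that experiments are chosen by the algorithm, not enumerated in advance.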
Physics-Informed Neural Networks (PINNs) enhance predictive accuracy and efficiency in synthesis space exploration by integrating fundamental physical principles directly into the network architecture. These networks are trained not only on experimental data, but also on the governing equations – such as those derived from CALPHAD (CALculation of PHAse Diagrams) – ensuring predictions adhere to known physical constraints. Multi-fidelity learning techniques further optimize training by utilizing data of varying computational cost; lower-fidelity data, generated quickly, provide broad coverage of the search space, while higher-fidelity, more accurate data refine predictions in promising regions. This approach reduces the reliance on large, computationally expensive datasets and improves generalization performance, especially when dealing with limited experimental data or complex, multi-parameter systems.
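A physics-informed loss can be sketched without a neural network at all: the essential idea is a data-misfit term plus the residual of a governing equation. Below, a simple first-order decay law stands in for CALPHAD-style constraints (an assumption for illustration, not the paper's setup); the physics term steers the fitted rate toward the known constant even on noisy data.

```python
import numpy as np

# Noisy measurements of a decaying quantity (e.g. reactant concentration)
k = 1.5                                     # known rate constant (the "physics")
x = np.linspace(0.0, 2.0, 21)
rng = np.random.default_rng(0)
y_obs = np.exp(-k * x) + 0.01 * rng.normal(size=x.size)

def pinn_loss(params, lam=1.0):
    """Data misfit + physics residual for the model y = a*exp(b*x).
    Physics constraint: dy/dx + k*y = 0 (first-order decay)."""
    a, b = params
    y = a * np.exp(b * x)
    dydx = a * b * np.exp(b * x)            # analytic derivative of the model
    data = np.mean((y - y_obs) ** 2)
    physics = np.mean((dydx + k * y) ** 2)
    return data + lam * physics

# Coarse grid search over (a, b): the physics term pins b near -k
grid = [(a, b) for a in np.linspace(0.5, 1.5, 21)
               for b in np.linspace(-3.0, 0.0, 31)]
best = min(grid, key=pinn_loss)
print(best)
```

In a real PINN the grid search is replaced by gradient descent on network weights, and multi-fidelity training mixes cheap approximate data into the same composite loss.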
Beyond Prediction: Realizing the Promise of Synthesis
Traditional materials discovery often prioritizes predicting materials with desirable properties, leading to a significant bottleneck when attempting actual synthesis – fewer than 2% of computationally designed materials are ever successfully created. The Synthesis-First Paradigm offers a crucial shift in approach, prioritizing the prediction of materials that are synthesizable – meaning they can realistically be made in a laboratory setting. By incorporating synthetic accessibility as a core criterion during the materials design process, this paradigm effectively narrows the search space to compounds achievable with current or near-future experimental techniques. This focus not only increases the likelihood of realizing predicted materials, but also streamlines the innovation cycle, allowing researchers to rapidly test, refine, and ultimately deploy novel compounds with targeted functionalities, fostering progress in fields ranging from renewable energy to advanced manufacturing.
The advent of self-driving laboratories is poised to revolutionize materials science, dramatically compressing the timeline from computational prediction to physical realization. These facilities integrate robotic systems with machine learning algorithms, enabling autonomous experimentation, data analysis, and iterative refinement of material synthesis protocols. By automating repetitive tasks and intelligently exploring vast compositional spaces, these labs circumvent the bottlenecks of traditional, manual research. The resulting acceleration isn’t merely incremental; it facilitates the rapid prototyping and optimization of materials with targeted properties, fostering breakthroughs in areas like renewable energy storage, advanced electronics, and sustainable manufacturing. This closed-loop system, where experiments inform algorithms and algorithms guide experiments, promises a future where materials innovation is limited only by the boundaries of scientific imagination, rather than the constraints of time and resources.
The promise of a synthesis-first approach to materials discovery lies in its potential to access a previously unattainable wealth of materials designed for specific purposes. Current methods often predict materials computationally, yet the vast majority remain unrealized due to synthetic challenges; this new paradigm prioritizes synthesizability from the outset, effectively opening doors to a reservoir of compounds with tailored electrical, mechanical, or thermal properties. Consequently, advancements in automated laboratories and machine learning, guided by this principle, could yield breakthroughs in crucial areas like renewable energy storage – developing more efficient batteries or solar cells – and sustainable manufacturing – creating lighter, stronger, and more durable materials with reduced environmental impact. This isn’t merely about discovering new substances, but about engineering materials to directly address pressing global challenges, offering solutions previously considered beyond reach.
The current landscape of materials discovery is hampered by a stark realization gap: fewer than 2% of materials predicted by computational methods are actually synthesized and verified. This low rate stems from challenges in translating theoretical designs into practical, reproducible laboratory procedures. However, a shift towards a ‘Synthesis-First’ paradigm, coupled with automated laboratories and machine learning, offers a pathway to dramatically increase this success rate. By prioritizing synthesizability during the design phase and leveraging automation to rapidly test and refine materials, researchers anticipate a significant acceleration in materials innovation, unlocking a far greater proportion of the theoretically possible material space and bringing promising new technologies closer to reality.
The pursuit of novel materials, as detailed in this perspective, often fixates on the idealized structure, neglecting the chaotic reality of bringing something into being. It’s a curious thing, this focus on the destination while ignoring the path. As Confucius observed, “The gem cannot be polished without friction, nor man perfected without trials.” The ‘synthesizability gap’ isn’t merely a technical hurdle; it’s a consequence of treating synthesis protocols as afterthoughts, not as fundamental design variables. The models, these spells cast upon data, will continue to fail until they acknowledge the friction, the imperfections and the noise, inherent in the process of creation. Beautiful predictions remain illusions if they cannot withstand the trials of the laboratory.
What’s Next?
The insistent push toward synthesis-centric artificial intelligence feels less like a breakthrough and more like acknowledging a longstanding debt. For too long, the field has treated synthesis as an afterthought, a messy implementation detail. Now, the algorithms demand protocols as first-class citizens, but this merely shifts the burden. The true challenge isn’t representing how something is made, but predicting which protocols will actually yield anything sensible – or, crucially, which won’t. Expect a surge in negative results, carefully curated and fed back into the learning loops; noise, after all, is just truth without funding.
The ‘synthesizability gap’ won’t be closed with bigger models; it will be narrowed by embracing the inherent messiness of materials creation. Current approaches still assume a level of control that rarely exists. Future work must account for stochasticity, imperfect reagent purity, and the subtle, undocumented variations in experimental setups. Consider this not a problem to be solved, but a fundamental property to be modeled – a dance with chaos, not a quest for order.
Ultimately, the success of this paradigm won’t be measured in optimized structures, but in the volume of failed syntheses the algorithms accurately predict. A machine that can tell you what won’t work is far more valuable than one that occasionally stumbles upon something novel. It’s a humbling thought: perhaps the highest form of materials discovery is knowing when to stop.
Original article: https://arxiv.org/pdf/2605.00313.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/