Author: Denis Avetisyan
A new analysis reveals the accelerating integration of powerful AI models into the research process, but also exposes critical gaps in tracking their impact.

This review examines the increasing adoption of foundation models in scientific research, addressing challenges in measurement and advocating for open access and transparency.
While the increasing prevalence of artificial intelligence in scientific discovery is widely acknowledged, quantifying its true impact remains a challenge. This is addressed in ‘The Rapid Growth of AI Foundation Model Usage in Science’, a large-scale analysis revealing near-exponential growth in the adoption of these models, particularly in fields like linguistics, computer science, and engineering, with open-weight models dominating current usage. The findings suggest a widening gap between the scale of models developed by AI builders and those utilized by scientists, potentially limiting the full benefits of AI-enabled research, as studies leveraging larger models demonstrate greater impact. Will bridging this gap, and fostering greater transparency in model accessibility, be crucial for realizing the full potential of AI to accelerate scientific progress?
Data Deluge and the Promise of Foundation Models
The relentless expansion of scientific data, coupled with its increasing complexity, presents a formidable challenge to traditional research methodologies. Historically, scientists have relied on hypothesis-driven experimentation and meticulous analysis of relatively contained datasets. However, modern science frequently deals with information streams orders of magnitude larger and more intricate – genomic sequences, climate simulations, astronomical surveys, and social network interactions, for example. This data deluge often overwhelms conventional analytical techniques, hindering the ability to discern meaningful patterns and relationships. Consequently, a pressing need has emerged for innovative approaches capable of effectively processing, interpreting, and extracting knowledge from these massive, high-dimensional datasets, pushing the boundaries of what is computationally and analytically feasible.
Originally developed to understand and generate human language, foundation models are rapidly extending their influence into the realm of scientific inquiry. These powerful algorithms, trained on massive datasets, demonstrate an ability to identify patterns and relationships previously hidden within complex scientific data – ranging from genomic sequences and material properties to astronomical observations and climate simulations. Unlike traditional, task-specific models, foundation models can be adapted to a wide array of scientific challenges with minimal retraining, offering a significant acceleration of the research process. This versatility promises to reshape fields like drug discovery, materials science, and environmental modeling, enabling scientists to tackle increasingly complex problems and fostering a new era of data-driven scientific exploration.
Foundation models are fundamentally reshaping scientific inquiry through their capacity to discern patterns and extract knowledge from previously intractable volumes of data. These models, trained on broad datasets, don’t require task-specific programming, allowing researchers to apply them across diverse scientific challenges – from materials discovery and drug design to climate modeling and genomic analysis. While still in its early stages, adoption is demonstrably increasing; as of 2024, nearly one percent of all scientific publications report the utilization or customization of foundation models, a figure indicating a rapidly accelerating trend and a fundamental shift in how scientific discovery is conducted. This growing integration suggests a future where these models become indispensable tools for researchers seeking to navigate the complexities of modern data-rich science and unlock new insights at an unprecedented pace.

Access Strategies: Open Weights vs. Restricted APIs
Foundation models are primarily deployed in two distinct access formats. Open-Weight Models provide users with complete access to the model’s parameters, enabling local execution, modification, and full customization of the model’s behavior. In contrast, Restricted-Access Models are hosted remotely and accessed through Application Programming Interfaces (APIs). This API-based access allows for use of the model’s capabilities without requiring local infrastructure or direct parameter manipulation, but limits the user’s ability to modify the model itself. The choice between these two deployment methods impacts the level of control, customization, and computational resources required for utilizing foundation model technology.
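The control difference between the two deployment formats can be made concrete with a minimal sketch. This is not any real vendor's interface; both classes and their members are hypothetical stand-ins, illustrating only that open weights are locally inspectable and modifiable while a restricted API exposes nothing but the model's outputs.

```python
# Hypothetical illustration of the two access modes; no real API is modeled.
class OpenWeightModel:
    """Weights live locally: they can be read, saved, and modified."""
    def __init__(self, weights: dict[str, float]):
        self.weights = weights              # full parameter access

    def generate(self, prompt: str) -> str:
        return f"output for {prompt!r} (bias={self.weights['bias']})"


class RestrictedAPIModel:
    """Only the provider-exposed generate() call is available."""
    def __init__(self):
        self.__params = {"bias": 0.5}       # hidden behind the provider's service

    def generate(self, prompt: str) -> str:
        return f"output for {prompt!r}"


local = OpenWeightModel({"bias": 0.5})
local.weights["bias"] = 0.9                 # fine-tuning-style modification is possible
hosted = RestrictedAPIModel()               # no parameter access; outputs only
print(local.generate("x"), "|", hosted.generate("x"))
```

The asymmetry shown here is exactly the trade-off described above: the hosted model is simpler to consume, but any research direction that requires touching the parameters is closed off.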
Model customization, enabled by Open-Weight foundation models, allows researchers to adapt pre-trained models to address specific scientific challenges. This tailoring process involves techniques such as fine-tuning, where model weights are adjusted using task-specific datasets, or parameter-efficient tuning methods. While still nascent, the adoption of customized foundation models in published research is growing; current data indicates that 0.4% of all publications now utilize these customized models, representing a measurable increase in their application to scientific inquiry and suggesting a potential trend towards greater specialization and optimization in model use.
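The parameter-efficient tuning mentioned above can be sketched in a few lines. The toy example below is in the spirit of low-rank adaptation: a frozen base weight matrix `W` is left untouched, and only a small low-rank correction `A @ B` is trained toward task-specific targets. All dimensions and data here are synthetic, chosen purely for illustration.

```python
# Minimal low-rank adapter sketch (synthetic data, hypothetical dimensions).
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 8, 2, 64                  # feature dim, adapter rank, sample count

W = rng.normal(size=(d, d))         # frozen pre-trained weight matrix
A = np.zeros((d, r))                # trainable factors; A @ B starts at zero
B = rng.normal(size=(r, d))

X = rng.normal(size=(n, d))
Y = X @ (W + 0.3 * rng.normal(size=(d, d)))   # task-specific targets

lr, losses = 0.01, []
for _ in range(200):
    err = X @ (W + A @ B) - Y       # base output plus low-rank correction
    losses.append(float(np.mean(err ** 2)))
    A -= lr * X.T @ err @ B.T / n   # gradient steps touch only A and B;
    B -= lr * A.T @ X.T @ err / n   # W itself is never updated

print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Only `2 * d * r` adapter parameters are trained instead of the full `d * d`, which is the point of the technique: adaptation at a fraction of the compute and storage cost of full fine-tuning.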
Restricted-Access Foundation Models, typically accessed through Application Programming Interfaces (APIs), prioritize ease of implementation and reduced computational demands for users. However, this convenience comes at the cost of adaptability; researchers are limited to the functionalities and parameters exposed by the API provider and cannot modify the underlying model weights. This restriction inhibits the exploration of novel research directions that require bespoke model architectures or fine-tuning approaches, potentially limiting the scope of scientific inquiry and the development of specialized solutions beyond the provider’s intended use cases.

Measuring Scientific Impact: Beyond Simple Citation Counts
Citation analysis systematically evaluates the impact of research by examining the frequency with which scholarly works are cited by other publications. Resources like the Semantic Scholar Academic Graph (SSAG) provide large-scale, machine-readable data on citations, enabling quantitative assessments of influence. The SSAG, constructed from data on over 200 million papers, allows researchers to trace citation networks, identify key publications within a field, and measure the reach of specific ideas. This methodology moves beyond simple counts to incorporate network characteristics, providing a more nuanced understanding of research impact than relying solely on journal prestige or author reputation. Data derived from citation analysis can be used to inform funding decisions, evaluate researcher performance, and identify emerging trends in scientific literature.
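At its core, the citation counting described above is an in-degree computation over a citation graph. The toy sketch below stands in for what a resource like the Semantic Scholar Academic Graph provides at scale; the paper IDs are invented.

```python
# Toy citation graph: each key lists the papers it cites (outgoing references).
from collections import Counter

references = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_a", "paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_a"],
}

# A paper's citation count is its in-degree in this graph.
citation_counts = Counter(
    cited for refs in references.values() for cited in refs
)

print(citation_counts.most_common())
```

Network-aware measures (key publications, reach of ideas) extend this same graph with structure beyond raw in-degree, which is what distinguishes citation analysis from simple counting.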
Citation Count represents the number of times a publication has been referenced in subsequent scholarly works, serving as a direct indicator of its influence within the research community. Journal Impact Factor (JIF), calculated annually by Clarivate, measures the average number of citations received in a particular year by papers published in that journal during the two preceding years. While JIF is often used as a proxy for the relative importance of a journal, it is important to note that it is a journal-level metric and does not reflect the impact of individual articles. Both metrics are susceptible to various biases, including self-citation and field-specific citation patterns, and should be interpreted in conjunction with other indicators of research quality and impact.
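The JIF definition above reduces to a simple ratio, sketched below with made-up counts: citations received in year Y by items the journal published in years Y-1 and Y-2, divided by the number of citable items published in those two years.

```python
# Journal Impact Factor as a ratio; the example counts are invented.
def impact_factor(citations_to_prev_two_years: int,
                  citable_items_prev_two_years: int) -> float:
    """JIF for year Y: citations in Y to items from Y-1 and Y-2,
    divided by citable items published in Y-1 and Y-2."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# e.g. 450 citations in 2024 to the 300 papers a journal published in 2022-2023
jif_2024 = impact_factor(450, 300)
print(jif_2024)  # 1.5
```

Note that this single ratio is what makes JIF a journal-level metric: the 450 citations may be concentrated on a handful of the 300 papers, which is why it cannot be read as the impact of any individual article.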
Analysis of citation networks alongside data on Foundation Model (FM) utilization demonstrates a growing disparity in resource access within the scientific community. Current data, as of 2024, indicates that organizations building FMs utilize models with a median size 26 times larger than those utilized by organizations adopting and applying these models. This gap suggests a concentration of computational resources among a smaller group of builders, potentially influencing research directions and creating an asymmetry in the ability to conduct computationally intensive scientific inquiry. Tracking citation patterns alongside FM usage allows for the identification of research areas most reliant on large models, and helps quantify the extent to which resource limitations might affect research outcomes for adopting organizations.
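The builder/adopter gap is a ratio of medians over model parameter counts. The sketch below uses hypothetical size lists to show the computation; the paper's reported figure, a median ratio of roughly 26x as of 2024, comes from its own large-scale data, not from these numbers.

```python
# Ratio-of-medians sketch; the parameter counts below are hypothetical.
from statistics import median

builder_model_sizes = [70e9, 180e9, 340e9, 405e9]   # parameters per model
adopter_model_sizes = [7e9, 8e9, 13e9, 70e9]

gap = median(builder_model_sizes) / median(adopter_model_sizes)
print(f"median size gap: {gap:.1f}x")
```

Using medians rather than means keeps the statistic robust to a few extremely large frontier models on the builder side.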

Discipline-Specific Impact: From Biology to Chemistry
Foundation models are rapidly becoming indispensable tools across diverse scientific disciplines, extending far beyond their origins in natural language processing. These models, trained on vast datasets, demonstrate an unprecedented ability to generalize and adapt to new tasks without extensive task-specific training. This capability is revolutionizing fields like biology and chemistry, where researchers are leveraging foundation models to analyze complex data, predict molecular properties, and accelerate the discovery of novel materials and therapeutics. The increasing adoption rates, a remarkable 309% annual growth in biology and a 168% three-year compound annual growth rate in chemistry, underscore a significant shift towards this paradigm, suggesting that foundation models are not merely a trend, but a fundamental change in how scientific research is conducted and innovation is achieved.
The integration of foundation models is rapidly transforming biological research, offering powerful new tools for analyzing the increasingly complex datasets generated by modern experimentation. These models excel at identifying patterns and relationships within genomic, proteomic, and imaging data, significantly accelerating the pace of discovery. This capability is proving particularly valuable in drug discovery, where foundation models can predict the efficacy and safety of potential drug candidates, streamlining the traditionally lengthy and expensive process. Reflecting this impact, adoption of foundation models within the biological sciences has surged, experiencing an impressive 309% annual growth rate and signaling a fundamental shift in how biological questions are approached and answered.
Foundation models are rapidly transforming chemical research by enabling the accurate prediction of molecular properties and accelerating the design of novel materials. This computational leap bypasses the limitations of traditional methods, allowing scientists to virtually screen countless compounds for desired characteristics – such as stability, reactivity, or specific electronic properties – before committing to costly and time-consuming laboratory synthesis. The impact is reflected in the field’s impressive growth, with foundation model usage in chemistry experiencing a remarkable 168% compound annual growth rate over the past three years. This surge indicates a fundamental shift towards in silico experimentation, promising to unlock new frontiers in materials science, drug discovery, and sustainable chemistry through data-driven innovation.
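For readers unpacking the growth figures quoted in this section, the standard CAGR formula is worth stating explicitly. The helper below uses hypothetical start/end counts; a 168% three-year CAGR corresponds to usage multiplying by about 2.68 each year, or roughly 19x over the full three years.

```python
# Compound annual growth rate helper; example numbers are hypothetical.
def cagr(start: float, end: float, years: float) -> float:
    """CAGR as a fraction, e.g. 1.68 means 168% per year."""
    return (end / start) ** (1 / years) - 1

total_growth = (1 + 1.68) ** 3       # three years at a 168% CAGR
print(f"{total_growth:.1f}x over three years")
```

The same formula applied in reverse recovers the annual rate from endpoint counts, which is how multi-year adoption series are typically summarized as a single percentage.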
Analysis of foundation model adoption reveals a distinctly collaborative research landscape, with team size playing a crucial role in successful implementation. Current data indicates that Linguistics departments are leading the charge, boasting a 34% adoption rate – a figure significantly higher than that of Computer Science and Engineering, which currently stand at 18% and 4.6% respectively. This disparity suggests that interdisciplinary teams, potentially leveraging linguistic expertise in data analysis and model interpretation, are proving particularly effective. Further research into the composition and size of these leading teams will be vital to understanding how best to foster widespread innovation and unlock the full potential of foundation models across all scientific disciplines.
The study meticulously charts the rise of foundation models in scientific literature, yet one suspects these metrics only capture the surface. Ada Lovelace observed that “The Analytical Engine has no pretensions whatever to originate anything.” This rings painfully true; these models accelerate existing research, but true innovation still demands human insight. The paper acknowledges difficulties in precisely quantifying adoption – a polite way of saying that chasing citation counts is a fool’s errand. Production research will inevitably find new ways to leverage – and break – these elegant tools, rendering any neatly calculated adoption rate obsolete before the next deployment cycle. It’s a useful snapshot, but hardly a prediction.
What’s Next?
The citation counts will, of course, continue to climb. Each new pre-print will boast a foundation model somewhere in its methodology, regardless of genuine necessity. The question isn’t whether these models will be used, but whether their use constitutes actual progress, or merely a more efficient means of generating plausible narratives. The bug tracker, in this case the peer-review system, will fill with reports of subtle biases and unacknowledged limitations. The metrics of ‘adoption’ outlined in this work are, at best, a snapshot of the hype cycle, not a measure of scientific impact.
Future work will inevitably focus on ‘explainability’ and ‘trustworthiness’ – buzzwords designed to soothe anxieties about opaque algorithms. These efforts will likely yield incremental improvements, but the fundamental problem remains: these models are exceptionally good at seeming to understand, and terrible at actually doing so. The illusion of intelligence is a powerful force, and disentangling it from genuine discovery will be a decades-long undertaking.
It’s not a deployment; it’s a letting go. The field will need to grapple with the inevitable consequences of scaling these systems: the amplification of existing inequalities, the erosion of critical thinking, and the increasing difficulty of distinguishing signal from noise. The next citation analysis won’t measure progress, it will measure the rate at which we’ve outsourced our curiosity.
Original article: https://arxiv.org/pdf/2511.21739.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/