Author: Denis Avetisyan
Researchers have unveiled a powerful agentic framework capable of independently conducting complex scientific investigations and achieving state-of-the-art results.
InternAgent-1.5 integrates long-horizon memory and automated experimentation within a unified agentic system to advance scientific discovery across algorithmic and empirical domains.
Despite increasing demands for accelerated scientific progress, fully autonomous discovery remains a significant challenge. This paper introduces InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery, a system designed to address this challenge by integrating generative, verification, and evolutionary subsystems with deep research capabilities and long-horizon memory. InternAgent-1.5 achieves state-of-the-art performance on scientific reasoning benchmarks and demonstrates its capacity for both algorithmic and empirical discovery, including executing complete experiments in diverse domains. Could this unified agentic framework represent a scalable path toward truly autonomous scientific exploration and accelerate the pace of innovation?
The Erosion of Traditional Inquiry
Historically, the advancement of scientific understanding has been deeply intertwined with the cognitive abilities of individual researchers and teams. This reliance on human expertise, while yielding remarkable discoveries, inherently limits the pace and scope of progress; complex experiments demand substantial time, funding, and highly specialized skills. The iterative process of hypothesis formation, experimental design, data analysis, and peer review, though crucial for rigor, can be remarkably slow, particularly when exploring uncharted scientific territory. Moreover, resource limitations, from access to cutting-edge equipment to the availability of qualified personnel, often constrain the number of investigations that can be pursued simultaneously, creating bottlenecks in the pursuit of knowledge and hindering the exploration of potentially groundbreaking ideas.
Although contemporary artificial intelligence excels at identifying correlations within vast datasets – a skill often lauded in applications like image and speech recognition – genuine scientific discovery demands far more than pattern detection. True innovation necessitates the ability to construct causal models, formulate hypotheses that extend beyond the observed data, and generalize principles to novel, unforeseen situations. Current AI systems frequently falter at these crucial steps, often mistaking correlation for causation or struggling to apply learned knowledge flexibly. This limitation stems from a reliance on statistical learning rather than a deep understanding of underlying mechanisms, hindering their capacity to independently design experiments, interpret ambiguous results, or propose truly groundbreaking theories – capabilities that remain central to the human scientific process.
The advancement of artificial intelligence in scientific discovery is significantly hampered by a critical flaw in current evaluation methods; existing benchmarks often prioritize easily quantifiable tasks, failing to reflect the ambiguity, incomplete data, and creative problem-solving inherent in frontier research. These benchmarks typically assess performance on well-defined problems with clear-cut solutions, neglecting the ability of a system to formulate hypotheses, design experiments to address open questions, and critically evaluate unexpected results. Consequently, AI models may achieve high scores on standard tests while remaining incapable of tackling the genuinely novel challenges that drive scientific progress – essentially excelling at solving known problems rather than discovering new ones. This discrepancy necessitates the development of more sophisticated evaluation frameworks that can accurately gauge an AI’s capacity for independent thought, adaptability, and genuine scientific reasoning, moving beyond simple pattern recognition to embrace the complexities of the scientific process.
A System for Accelerated Decomposition
InternAgent-1.5 represents a complete system for scientific discovery by integrating three core processes: hypothesis generation, experimental verification, and iterative refinement. This unified framework moves beyond isolated stages of research, allowing the system to autonomously formulate research questions, design experiments to test those questions, analyze the resulting data, and then use those findings to generate new hypotheses. By combining these processes within a single system, InternAgent-1.5 aims to accelerate the pace of scientific investigation and facilitate the exploration of complex research areas without constant human intervention. The system’s architecture is designed to create a closed-loop cycle of learning and discovery, enabling continuous improvement and adaptation throughout the research process.
InternAgent-1.5 uses a Structured Cognitive Memory (SCM) as its foundational knowledge representation. The SCM is organized hierarchically into three primary components: Task-Episodic memory, responsible for storing specific experiences and observations related to completed and ongoing tasks; Semantic-Knowledge memory, which houses generalized, factual knowledge and concepts extracted from data and external sources; and Strategy-Procedural memory, dedicated to retaining learned procedures, algorithms, and optimal strategies for problem-solving. This tripartite structure facilitates persistent learning by allowing the system to retain and build upon past experiences, generalize knowledge for broader applicability, and refine its methodologies based on performance feedback.
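To make the three-part memory concrete, here is a minimal Python sketch. It is not the paper's implementation: the class name, field names, methods, and the moving-average scoring rule are all assumptions chosen for illustration. It only shows how episodic records, general facts, and strategy scores might live side by side and feed back into strategy selection.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StructuredCognitiveMemory:
    """Toy three-part memory. All names here are hypothetical, not from the paper."""

    task_episodic: list = field(default_factory=list)        # raw per-task observations
    semantic_knowledge: dict = field(default_factory=dict)   # concept -> generalized fact
    strategy_procedural: dict = field(default_factory=dict)  # strategy -> running score

    def record_episode(self, task, observation, strategy, reward):
        """Store one experience and update the score of the strategy that produced it."""
        self.task_episodic.append(
            {"task": task, "observation": observation,
             "strategy": strategy, "reward": reward}
        )
        # Assumed update rule: equal-weight moving average of past score and new reward.
        previous = self.strategy_procedural.get(strategy, reward)
        self.strategy_procedural[strategy] = 0.5 * previous + 0.5 * reward

    def learn_fact(self, concept, statement):
        """Store a generalized fact that is not tied to a single task."""
        self.semantic_knowledge[concept] = statement

    def best_strategy(self) -> Optional[str]:
        """Return the strategy with the highest running score, or None if empty."""
        if not self.strategy_procedural:
            return None
        return max(self.strategy_procedural, key=self.strategy_procedural.get)


memory = StructuredCognitiveMemory()
memory.record_episode("tune-model", "loss plateaued", "grid_search", 0.4)
memory.record_episode("tune-model", "loss kept falling", "bayesian_search", 0.8)
memory.learn_fact("plateau", "A flat loss curve can indicate a learning rate that is too low.")
print(memory.best_strategy())  # prints: bayesian_search
```

Keeping the three stores separate means a failed experiment can lower a strategy's score without erasing the observation that explains why it failed.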
InternAgent-1.5 utilizes Large Language Models (LLMs) to process existing scientific literature and databases, synthesizing information to formulate novel hypotheses. This LLM-driven knowledge synthesis is coupled with automated tools for experimental design, including the specification of parameters, controls, and data acquisition methods. Following experimentation, automated data analysis pipelines process the results, employing statistical methods and machine learning algorithms to validate or refute the generated hypotheses. The LLM then integrates these findings, refining its knowledge base and iteratively improving subsequent hypothesis generation and experimental design processes, creating a closed-loop scientific discovery system.
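The closed loop described above can be sketched with simple stand-ins. In this hedged toy version, a random proposer takes the place of the LLM, a noisy linear function takes the place of a real experiment, and mean squared error takes the place of the analysis pipeline. Every function name and the "true" slope of 3.0 are assumptions made for the example.

```python
import random


def propose_hypotheses(best_slope, rng, n=5):
    """Stand-in for LLM hypothesis generation: guess slopes near the current belief."""
    return [best_slope + rng.uniform(-1.0, 1.0) for _ in range(n)]


def run_experiment(rng, n_points=20):
    """Stand-in for an automated experiment: noisy measurements of y = 3x."""
    xs = [i / n_points for i in range(1, n_points + 1)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.05)) for x in xs]


def analyze(slope, data):
    """Stand-in for the analysis pipeline: mean squared error of a hypothesis."""
    return sum((y - slope * x) ** 2 for x, y in data) / len(data)


def discovery_loop(cycles=30, seed=0):
    rng = random.Random(seed)
    knowledge = {"best_slope": 0.0, "best_error": float("inf")}
    for _ in range(cycles):
        data = run_experiment(rng)
        # The current belief competes with new hypotheses on fresh data,
        # so a hypothesis that was merely lucky once does not stay locked in.
        candidates = [knowledge["best_slope"]]
        candidates += propose_hypotheses(knowledge["best_slope"], rng)
        best = min(candidates, key=lambda s: analyze(s, data))
        knowledge = {"best_slope": best, "best_error": analyze(best, data)}
    return knowledge


result = discovery_loop()
print(result)
```

The point of the sketch is the shape of the cycle: what the system believes after one round determines which hypotheses it proposes in the next.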
Refining the Inevitable Decay
InternAgent-1.5’s Solution Refinement process operates by continuously evaluating the projected outcomes of experimental proposals and iteratively adjusting methodologies to maximize the probability of achieving desired results. This is achieved through a feedback loop where initial proposals are simulated or tested, and the resulting data is used to refine the parameters of subsequent iterations. The system doesn’t simply execute a pre-defined protocol; it actively learns from each cycle, allowing it to optimize experimental design, resource allocation, and data analysis techniques. This mirrors the scientific method by prioritizing empirical evidence and adapting approaches based on observed outcomes, ultimately leading to more efficient and robust research processes.
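One simple way to picture this kind of feedback-driven refinement is a pattern search: try a small change to each parameter, keep it if the score improves, and shrink the step when nothing helps. The sketch below swaps in that technique purely for illustration; the article does not say which optimizer InternAgent-1.5 uses, and the parameter names and scoring function are invented.

```python
def refine(score, params, step=1.0, min_step=1e-3):
    """Pattern search: nudge each parameter up and down, keep improvements,
    and halve the step size whenever a full pass finds nothing better."""
    best = dict(params)
    best_score = score(best)
    names = list(params)
    while step > min_step:
        improved = False
        for name in names:
            for delta in (step, -step):
                trial = dict(best)
                trial[name] += delta
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score, improved = trial, trial_score, True
        if not improved:
            step /= 2
    return best, best_score


def toy_score(p):
    """Hypothetical experiment outcome: peaks at temperature 37 and pH 7.4."""
    return -((p["temperature"] - 37.0) ** 2) - (p["ph"] - 7.4) ** 2


best_params, best_value = refine(toy_score, {"temperature": 30.0, "ph": 6.0})
print(best_params, best_value)
```

Each call to `score` plays the role of one simulated or executed proposal, so the number of calls is a rough proxy for experimental cost.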
InternAgent-1.5 incorporates Automated Machine Learning (AutoML) to facilitate autonomous algorithm design and optimization for targeted scientific challenges. This integration allows the system to independently search for, evaluate, and refine algorithmic approaches without explicit human intervention. The AutoML pipeline within InternAgent-1.5 typically involves defining a search space of potential algorithms, utilizing techniques like Bayesian optimization or reinforcement learning to navigate this space, and employing cross-validation to assess the performance of candidate algorithms on relevant datasets. This process enables the system to identify algorithms that maximize performance metrics specific to the scientific problem at hand, accelerating the research cycle and potentially uncovering novel algorithmic solutions.
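The search-then-validate pattern can be shown in a few lines. This sketch deliberately uses an exhaustive grid search instead of the Bayesian optimization or reinforcement learning mentioned above, and a tiny k-nearest-neighbours regressor as the only algorithm family. The synthetic sine data, the function names, and the search space are all assumptions.

```python
import math
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean of the k nearest training targets."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_val_mse(data, k, folds=5):
    """k-fold cross-validation: every point is held out exactly once."""
    squared_errors = []
    for fold in range(folds):
        held_out = data[fold::folds]
        train = [p for i, p in enumerate(data) if i % folds != fold]
        squared_errors.extend((knn_predict(train, x, k) - y) ** 2 for x, y in held_out)
    return sum(squared_errors) / len(squared_errors)


def grid_search(data, search_space):
    """Score every candidate in the search space and return the best one."""
    scores = {k: cross_val_mse(data, k) for k in search_space}
    return min(scores, key=scores.get), scores


rng = random.Random(0)
data = [(i / 10, math.sin(i / 10) + rng.gauss(0.0, 0.1)) for i in range(60)]
best_k, scores = grid_search(data, [1, 3, 5, 10, 40])
print(best_k, scores)
```

A smarter search strategy would change only `grid_search`; the cross-validation step that guards against overfitting to one split stays the same.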
InternAgent-1.5 incorporates a ‘Long Horizon Memory’ system to address the challenges of maintaining contextual relevance in prolonged research endeavors. This memory isn’t simply data storage; it’s an active, associative system capable of retaining and integrating information across multiple experimental cycles and algorithmic refinements. The system utilizes a multi-layered approach, preserving both short-term operational data and long-term conceptual frameworks. This allows InternAgent-1.5 to not only recall previous results, but to understand their significance within the broader research context, facilitating complex reasoning and sustained conceptual development beyond the limitations of typical short-term memory architectures. The implementation relies on a vector database coupled with a retrieval mechanism that prioritizes conceptual similarity over chronological order, enabling efficient access to relevant information regardless of when it was initially generated.
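Retrieval by similarity rather than recency can be illustrated without a real vector database. The sketch below is a toy under stated assumptions: word counts stand in for learned embeddings, a Python list stands in for the database, and the class and method names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongHorizonMemory:
    """Hypothetical store that ranks entries by similarity, ignoring their age."""

    def __init__(self):
        self.entries = []  # (insertion step, text, vector)

    def add(self, text):
        self.entries.append((len(self.entries), text, embed(text)))

    def retrieve(self, query, top_k=2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [text for _, text, _ in ranked[:top_k]]


store = LongHorizonMemory()
store.add("learning rate 0.1 caused divergence in cycle 1")
store.add("batch size 64 fit in memory")
store.add("dataset cleaned of duplicate rows")
hits = store.retrieve("which learning rate caused divergence", top_k=1)
print(hits)  # prints: ['learning rate 0.1 caused divergence in cycle 1']
```

The oldest entry is returned first because it matches the query best, which is the behaviour the paragraph above describes.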
Expanding the Horizon of Entropy
InternAgent-1.5 represents a significant leap in artificial intelligence, moving beyond isolated benchmark performance to exhibit genuine proficiency in complex scientific data analysis across diverse fields. The system doesn’t simply answer questions; it actively engages with data inherent to Life Science, Physical Science, and Earth Science, demonstrating an ability to interpret results, identify patterns, and formulate insights. This capability is evidenced not only by superior scores on evaluations like SGI-Bench, GAIA, and HLE, but also by achieving state-of-the-art results on specialized tests such as GPQA-diamond and AutoTSF. By automating key steps in the scientific process – from data interpretation to hypothesis generation – InternAgent-1.5 promises to accelerate the pace of discovery and unlock new understandings in a multitude of scientific disciplines, marking a transition towards AI as a collaborative tool for researchers.
InternAgent-1.5 demonstrates a marked advancement in tackling complex academic reasoning, as evidenced by its superior performance on established benchmarks. The system achieved a 37.74% score on SGI-Bench’s ‘Deep Research’ component, substantially exceeding the 18.48% attained by Gemini-3-pro. Furthermore, InternAgent-1.5’s capacity for innovative thought is highlighted by its 58.11% score on SGI-Bench’s ‘Idea Generation’ task, surpassing GPT-5’s 55.40%. These results indicate not merely incremental improvement, but a substantial leap in the ability to perform and contribute to the demanding processes inherent in scientific inquiry and discovery.
InternAgent-1.5 demonstrates a significant leap in automating scientific workflows, promising to expedite research across diverse fields. Performance benchmarks reveal its capacity to not only process but also synthesize information: it achieves 40.87% accuracy on the challenging HLE benchmark, surpassing existing models, and leads on FrontierScience with 77.20% accuracy, exceeding DeepSeek-V3.2-Thinking. This automated reasoning extends to specialized areas like question answering, where it attains 87.37% accuracy on GPQA-diamond, and chemical property prediction, with a top-1 accuracy of 0.86 on ChemCoTBench, outperforming established models. Furthermore, in time series forecasting, InternAgent-1.5 achieves an RMSE of 0.8488 on AutoTSF, improving upon traditional methods like Kriging and BCSD, which suggests its potential to refine predictive modeling and accelerate scientific discovery.
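For readers unfamiliar with the metric, root mean squared error (RMSE) is the square root of the average squared gap between predictions and targets, so lower is better and the value is in the same units as the target. The short sketch below shows the standard definition with made-up numbers; it has no connection to the AutoTSF data.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: one prediction is off by 2, the others are exact.
value = rmse([1.0, 2.0, 5.0], [1.0, 2.0, 3.0])
print(round(value, 4))  # prints: 1.1547
```

Because the metric depends on the scale of the target, an RMSE figure is only meaningful when compared against other methods on the same dataset.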
The pursuit of automated scientific discovery, as exemplified by InternAgent-1.5, reveals a fascinating tension. The framework’s capacity for long-horizon memory and experimentation suggests a striving for permanence within a fundamentally impermanent process. As a line attributed to Henri Poincaré has it, “Mathematics is the art of giving reasons, and mathematical rigor is nothing more than a way of ensuring that our reasons are valid.” This echoes the need for InternAgent-1.5 to establish a robust, verifiable knowledge graph (a ‘reason’ for its conclusions) against the inevitable decay of information and the challenges of complex data. The system’s architecture doesn’t prevent entropy, but rather attempts to manage it, achieving temporary stability in the face of the universe’s relentless march forward. It is a testament to the notion that systems age not because of errors, but because time is inevitable.
What Lies Ahead?
InternAgent-1.5 represents a predictable, if notable, advance. Any improvement, however elegantly constructed, ages faster than expected. The demonstrated capacity for autonomous scientific discovery, while impressive, merely shifts the locus of inevitable decay. The current architecture, reliant on knowledge graphs and generative AI, still faces the fundamental challenge of distinguishing signal from noise across truly long horizons – the system will eventually encounter data that fundamentally undermines its established heuristics.
The pursuit of ‘long-horizon memory’ is, in essence, a search for stable invariants in a relentlessly changing universe. The framework’s success on benchmarks is a snapshot; true validation will require sustained performance across decades of accumulating data, confronting the very biases it currently mitigates. Rollback, the process of correcting erroneous conclusions, is not simply a return to a previous state, but a journey back along the arrow of time, increasingly difficult and imprecise with each step.
Future work must address not just the scaling of computational resources, but the development of more robust ontologies capable of accommodating genuine novelty. The question is not whether InternAgent-1.5 can discover, but whether it can unlearn gracefully, a far more demanding, and ultimately more revealing, test.
Original article: https://arxiv.org/pdf/2602.08990.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-10 11:43