Author: Denis Avetisyan
A new human-centered tool aims to transform the laborious process of synthesizing research, shifting focus from tedious tasks to strategic knowledge discovery.

This paper introduces Arc, an integrated platform designed to reduce cognitive load and enhance exploration in systematic literature reviews through responsible AI assistance.
Despite the foundational role of Systematic Literature Reviews (SLRs) in advancing scientific knowledge, researchers are often burdened by fragmented tools and overwhelming data volumes. This paper, ‘From Toil to Thought: Designing for Strategic Exploration and Responsible AI in Systematic Literature Reviews’, details the development and evaluation of Arc, a novel environment designed to alleviate cognitive load and foster more strategic exploration during the SLR process. Our findings suggest that an integrated, human-in-the-loop approach, leveraging transparent AI assistance and external representations, can shift researchers’ focus from administrative tasks to insightful knowledge synthesis. How can we further refine such tools not only to streamline the SLR process, but also to ensure responsible and verifiable AI contributions to the evolving landscape of scholarly work?
The Weight of Evidence: Challenges in Knowledge Synthesis
Systematic Literature Reviews (SLRs) form the bedrock of evidence-based practice across numerous disciplines, offering a comprehensive and transparent synthesis of existing research on a specific question. However, the very thoroughness that defines an SLR imposes substantial demands on both time and resources. Each review necessitates exhaustive searching of multiple databases, meticulous screening of potentially relevant studies – often numbering in the thousands – and rigorous data extraction followed by quality assessment. This process, while essential for minimizing bias and ensuring reliability, can require months, or even years, of dedicated effort from experienced researchers. The intensive nature of SLRs frequently presents a barrier to timely knowledge translation, hindering the rapid application of research findings to real-world problems in areas such as healthcare, policy-making, and environmental management.
The sheer volume of published scientific research presents a significant obstacle to timely knowledge synthesis. Exponential growth in scholarly output, driven by increased research funding, collaborative efforts, and the proliferation of journals, far outpaces the capacity of traditional review methods. Manual screening of articles, a cornerstone of systematic reviews, becomes increasingly impractical as the literature expands, leading to substantial delays in translating research findings into actionable insights and clinical practice. This bottleneck not only hinders progress in various fields but also risks perpetuating outdated or incomplete understandings, as reviews struggle to keep pace with the latest evidence. Consequently, researchers are actively exploring automated tools and innovative approaches to efficiently navigate and synthesize the ever-growing body of scientific literature.
The pursuit of robust and reliable conclusions in systematic literature reviews necessitates adherence to stringent methodological guidelines, such as those outlined in the PRISMA Statement, but this commitment introduces considerable procedural complexity. PRISMA, and similar frameworks, demand meticulous documentation of search strategies, transparent inclusion/exclusion criteria, and rigorous assessment of study quality and risk of bias. While these requirements are essential for minimizing errors and ensuring the validity of findings, they significantly increase the time and expertise required to conduct a review. Detailed reporting protocols, comprehensive data extraction, and the often-subjective evaluation of methodological flaws all contribute to a more resource-intensive process, creating a substantial challenge for researchers attempting to synthesize the rapidly expanding body of scientific knowledge.

Arc: An Integrated System for Streamlined Reviews
Arc is a unified platform designed to facilitate and enhance the process of Systematic Literature Reviews (SLRs). Unlike fragmented workflows relying on disparate tools, Arc integrates key SLR stages – including search, screening, data extraction, and reporting – into a single system. This integration aims to reduce manual effort, minimize errors associated with data transfer between applications, and improve overall review efficiency. The system is built as a ‘design probe’, meaning it’s intended to be iteratively developed and refined based on user feedback and evolving best practices in SLR methodology. By offering a centralized and adaptable environment, Arc supports the rigorous and reproducible nature of SLRs while addressing common challenges related to workflow management and data synthesis.
AI-Assisted Screening within Arc utilizes Large Language Models (LLMs) to expedite the initial assessment of research papers during Systematic Literature Reviews. This feature analyzes paper abstracts and titles, then suggests relevance tags – such as ‘included’, ‘excluded’, or ‘uncertain’ – to assist reviewers. By providing these suggestions, the LLMs reduce the time required for manual screening, enabling faster identification of potentially relevant studies. It is important to note that these are suggestions only, and human reviewers retain full control over the final relevance determination.
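The paper does not publish Arc’s internal prompts or model choices, but the mechanism is easy to picture. The Python sketch below shows one plausible shape for LLM-based relevance tagging; `call_llm` is a stand-in for any chat-completion client, and the keyword heuristic inside it exists only so the example runs end to end.

```python
# Hypothetical sketch of LLM-assisted screening; Arc's internals are not
# published, and all names here are illustrative.
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    abstract: str

PROMPT = (
    "You are screening papers for a systematic review on {topic}.\n"
    "Given the title and abstract below, answer with exactly one tag: "
    "included, excluded, or uncertain.\n\n"
    "Title: {title}\nAbstract: {abstract}"
)

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; swap in your provider's client.
    # A trivial keyword heuristic keeps the sketch self-contained.
    text = prompt.lower()
    if "randomized" in text or "systematic" in text:
        return "included"
    return "uncertain"

def suggest_tag(paper: Paper, topic: str) -> str:
    """Return a *suggested* tag; the human reviewer makes the final call."""
    raw = call_llm(PROMPT.format(topic=topic, title=paper.title,
                                 abstract=paper.abstract))
    tag = raw.strip().lower()
    # Fall back to "uncertain" if the model answers off-format.
    return tag if tag in {"included", "excluded", "uncertain"} else "uncertain"
```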
Arc’s Human-in-the-Loop design prioritizes human reviewer control throughout the Systematic Literature Review process. While Arc utilizes AI, specifically Large Language Models, to assist with tasks like relevance tagging, all AI suggestions are presented to human reviewers for validation and modification. This approach avoids fully automated screening, allowing experts to apply nuanced judgment and domain-specific knowledge to ensure accuracy and minimize bias. Reviewers can accept, reject, or edit AI-generated tags, and the system learns from these corrections to improve future suggestions. This iterative feedback loop ensures that human expertise remains central to the review’s integrity and quality, rather than relying solely on algorithmic outputs.
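A minimal sketch of what such a feedback loop might store, assuming (hypothetically) that each AI suggestion is logged beside the reviewer’s final decision so that disagreements can seed few-shot examples for later prompts; none of these names come from Arc itself.

```python
# Illustrative human-in-the-loop record keeping; field and class names
# are assumptions, not Arc's API.
from dataclasses import dataclass, field

@dataclass
class ScreeningDecision:
    paper_id: str
    ai_suggestion: str    # "included" / "excluded" / "uncertain"
    human_decision: str   # reviewer's final, authoritative tag

@dataclass
class ReviewLog:
    decisions: list[ScreeningDecision] = field(default_factory=list)

    def record(self, d: ScreeningDecision) -> None:
        self.decisions.append(d)

    def corrections(self) -> list[ScreeningDecision]:
        # Cases where the reviewer overrode the AI: natural candidates
        # for few-shot examples in future screening prompts.
        return [d for d in self.decisions
                if d.human_decision != d.ai_suggestion]
```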
Arc’s Multi-Database Search functionality moves beyond the limitations of single-database systematic reviews by simultaneously querying multiple scholarly databases – including PubMed, Scopus, and Web of Science – through a unified interface. This capability significantly broadens the scope of literature discovery, reducing the risk of publication bias and increasing the comprehensiveness of the evidence base. The system then de-duplicates search results, presenting a consolidated list of records for review, and supports the export of search strings for transparent and reproducible research.
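How de-duplication across databases works is not specified in the paper; a common approach, sketched below under the assumption that each source yields records with an optional DOI and a title, is to key on the DOI when present and on a normalized title otherwise.

```python
# Sketch of cross-database de-duplication; record field names ("doi",
# "title") are illustrative assumptions.
import re

def _norm_title(title: str) -> str:
    # Lowercase and collapse non-alphanumerics so formatting differences
    # between databases don't defeat matching.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        # Prefer the DOI as a stable key; fall back to normalized title.
        key = (rec.get("doi") or "").lower() or _norm_title(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# e.g. merged = deduplicate(pubmed_hits + scopus_hits + wos_hits)
```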

Validating Arc: User Experience and Efficiency Gains
Prior to Arc’s development, an exploratory design study was conducted to ascertain the specific needs and challenges faced by researchers during literature review. This study employed a combination of interviews and observational methods to identify key pain points, including difficulties in efficiently filtering large volumes of search results, the cognitive burden associated with tracking relevant papers, and the time-consuming nature of snowball sampling. Findings indicated a need for a system that facilitates iterative query refinement, supports transparent comparison of search results, and minimizes mental effort during the information retrieval process. The results of this study directly informed the design and functionality of Arc, shaping features intended to address the identified researcher needs and improve overall research efficiency.
A user study was conducted to comparatively evaluate Arc against a baseline manual research process, focusing on three key metrics: usability, efficiency, and subjective workload. Usability was assessed through task completion rates and error analysis, while efficiency was measured via task completion time. Subjective workload was quantified using the NASA Task Load Index (NASA-TLX), which gauges perceived cognitive load and mental demand. The study involved researchers performing representative tasks – paper filtering and snowballing – using both Arc and the baseline method, allowing for a direct comparison of performance and subjective experience.
The Arc system incorporates an Iterative Search Comparison feature which allows researchers to view and assess the impact of modifications to their search queries in real-time. This functionality presents results based on the current query alongside results from previous iterations, facilitating a direct comparison of retrieval sets. Researchers can therefore immediately determine whether alterations to keywords, filters, or Boolean operators are improving the relevance and precision of the search results, enabling rapid refinement of their information retrieval strategy and minimizing time spent on unproductive searches.
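In code terms, comparing two query iterations reduces to set arithmetic over the retrieved record identifiers. The sketch below is illustrative rather than Arc’s actual implementation.

```python
# Diff two search iterations by record ID to show what a query
# refinement gained, lost, and kept.
def compare_iterations(prev_ids: set[str], curr_ids: set[str]) -> dict:
    return {
        "added":   sorted(curr_ids - prev_ids),  # newly retrieved
        "dropped": sorted(prev_ids - curr_ids),  # lost by the refinement
        "kept":    sorted(curr_ids & prev_ids),  # stable core
    }

# e.g. after narrowing a query:
#   delta = compare_iterations(run1_ids, run2_ids)
# A large "dropped" set with few "added" records suggests the new
# filter was too strict.
```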
Quantitative evaluation of Arc demonstrated an 18% reduction in the time required for paper filtering tasks compared to a manual baseline approach. Concurrently, the NASA Task Load Index (NASA-TLX) – a subjective measure of cognitive demand – yielded significantly lower scores for Arc users (mean = 2.75) versus baseline participants (mean = 3.67). This statistically significant difference in NASA-TLX scores indicates a measurable reduction in perceived mental effort and cognitive load when utilizing Arc for literature review, supporting its potential to improve researcher efficiency and reduce fatigue.
User study data indicates that Arc significantly improved efficiency in the snowballing task, reducing average completion time to approximately one minute. This represents a substantial decrease compared to the baseline manual method, which required an average of 10.68 minutes to complete the same task. This roughly tenfold reduction in time suggests Arc’s features effectively streamline the process of identifying and incorporating relevant research from cited sources, offering a considerable time saving for researchers.
Arc’s design incorporates principles of cognitive load management to minimize mental effort during literature review. Specifically, the interface prioritizes information density and reduces extraneous visual elements to decrease processing demands. Features like the Iterative Search Comparison tool allow researchers to directly assess the impact of query refinements, reducing the need for repeated full searches and associated mental tracking. Furthermore, streamlined workflows for tasks such as paper filtering and snowball sampling, demonstrated by an 18% reduction in filtering time and a decrease in NASA-TLX scores from 3.67 to 2.75, contribute to a diminished cognitive burden and improved researcher focus.

Towards Living Systematic Reviews and Open Science
The conventional systematic review, a snapshot in time, often becomes outdated before publication due to the constant influx of new research. Arc addresses this limitation by facilitating Living Systematic Reviews – a dynamic process of continuous updating. Rather than a one-time endeavor, Arc enables automated surveillance of relevant databases, identifying and incorporating new studies as they emerge. This iterative approach ensures that evidence syntheses remain current and reflect the most up-to-date understanding of a topic, providing a more reliable foundation for decision-making in fields like healthcare and policy. By shifting from static reviews to living knowledge resources, Arc promises to drastically reduce the time lag between research and its application, ultimately accelerating the pace of scientific progress and improving outcomes.
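The paper does not describe the update machinery, but a living-review pass can be pictured as re-running a saved search and keeping only unseen records. In the hypothetical sketch below, `run_search` stands in for whatever database client is used; scheduling and reviewer notification are omitted.

```python
# Illustrative living-review update pass; all names are assumptions.
import datetime as dt
from typing import Callable

def update_pass(run_search: Callable, saved_query: str,
                known_ids: set[str], since: dt.date) -> list[dict]:
    """Re-run the saved search and return records not yet screened.
    How `since` is applied depends on the backing database."""
    fresh = []
    for rec in run_search(saved_query, since=since):
        if rec["id"] not in known_ids:
            known_ids.add(rec["id"])
            fresh.append(rec)  # route to the screening queue
    return fresh
```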
The foundational design of Arc prioritizes transparency and collaborative science by actively supporting Open Science principles. The system’s architecture isn’t simply about automating tasks; it’s built to maximize data accessibility, ensuring that all stages of the systematic review process – from search strategies to data extraction and analysis – are openly available for scrutiny and validation. This commitment extends to facilitating reproducibility, as the platform logs each step, making it possible for others to independently verify findings and build upon existing research. By openly sharing workflows, code, and extracted data, Arc fosters a more collaborative and trustworthy scientific ecosystem, moving beyond traditional, often opaque, research practices and accelerating the pace of discovery.
The core of Arc’s operational capacity lies in its robust Application Programming Interface (API), a critical component enabling seamless interaction with a vast network of databases and online resources. This API functions as a digital intermediary, automating the typically laborious processes of data querying and extraction from sources like PubMed, Cochrane Library, and other relevant repositories. By utilizing standardized requests, the API efficiently retrieves research findings, minimizing manual effort and the potential for human error. This automated data pipeline not only accelerates the Systematic Literature Review (SLR) process but also facilitates consistent and reproducible results, allowing researchers to focus on analysis and interpretation rather than data collection, and ultimately driving more efficient knowledge synthesis.
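Arc’s own API is not public, but PubMed’s E-utilities illustrate the kind of programmatic retrieval described here. The example below calls the real `esearch` endpoint; only the query term is made up.

```python
# Query NCBI's public E-utilities API for PubMed IDs matching a term.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_ids(term: str, retmax: int = 100) -> list[str]:
    """Return up to `retmax` PubMed IDs for `term` via esearch."""
    resp = requests.get(EUTILS, params={
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }, timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

# e.g. pubmed_ids('"systematic review"[ti] AND automation')
```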
The Arc system significantly diminishes the traditionally laborious process of conducting Systematic Literature Reviews (SLRs), thereby accelerating the pace of scientific advancement. By automating key stages – such as database searching, data extraction, and initial synthesis – Arc frees researchers from repetitive tasks, allowing them to focus on critical appraisal and nuanced interpretation of evidence. This streamlined workflow not only reduces the time required to complete an SLR, but also enhances the potential for living reviews – continuously updated syntheses that reflect the most current understanding of a topic. Consequently, knowledge translation becomes more efficient, enabling evidence-based insights to reach practitioners and policymakers with greater speed and impact, ultimately fostering more informed decision-making and driving innovation across diverse fields.

The development of Arc, as detailed in the paper, emphasizes a holistic approach to Systematic Literature Reviews. The tool isn’t simply about automating tasks; it’s about reshaping the entire research workflow to minimize cognitive burden and foster deeper insights. This resonates with G.H. Hardy’s assertion: “Mathematics may be compared to a tool, and the only question about it is whether it does its job or not.” Arc, much like a well-crafted mathematical instrument, aims to ‘do its job’ of facilitating knowledge synthesis not through brute force, but through elegant design and a careful consideration of how each component interacts with the overall system. The paper underscores that modifying one aspect of the review process, such as information retrieval, has ripple effects, demanding a unified and thoughtful architecture, mirroring the interconnectedness Hardy implicitly acknowledges.
Beyond the Horizon
The presentation of Arc suggests a familiar pattern: tooling, while necessary, rarely addresses fundamental issues of knowledge work. The systematic literature review, despite its noble aims, often resembles an archaeological dig – painstakingly unearthing fragments, then struggling to reconstruct the original city. Arc offers a more navigable infrastructure, but the true challenge lies not in faster excavation, but in evolving the very blueprints for knowledge synthesis. The field must move beyond simply automating existing workflows, toward designs that support genuinely strategic exploration.
A crucial limitation remains the reliance on pre-defined search parameters. Current systems excel at finding what is known to be relevant, but falter when faced with the genuinely novel. Future iterations must incorporate mechanisms for serendipity – for allowing researchers to stumble upon unexpected connections, to build bridges between disparate fields. The ideal system doesn’t simply deliver information; it fosters insight.
Ultimately, the longevity of such tools depends on their ability to adapt. Infrastructure should evolve without rebuilding the entire block. The focus should shift toward modularity and interoperability, allowing researchers to seamlessly integrate new methods and data sources. The goal isn’t a finished product, but a continuously refined ecosystem – a living map of the ever-expanding landscape of knowledge.
Original article: https://arxiv.org/pdf/2603.05514.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/