How Scientists Actually Use AI Assistants

Author: Denis Avetisyan


A new dataset reveals detailed insights into how researchers are interacting with and leveraging artificial intelligence to accelerate their work.

The study documented two distinct interfaces of the Asta system, detailed in the work of Singh et al. (2025) and developed by the Allen Institute for Artificial Intelligence (2025), to explore the evolving architecture of intelligent systems and their inherent susceptibility to the natural processes of decay and adaptation.

The Asta Interaction Dataset provides a large-scale analysis of user behavior with AI-powered scientific question answering tools and their integration into research workflows.

Despite the increasing integration of artificial intelligence into scientific workflows, a comprehensive understanding of how researchers actually interact with these tools remains surprisingly limited. This paper addresses this gap by presenting and analyzing the Asta Interaction Dataset, a large-scale log of over 200,000 user queries and interactions with an LLM-powered research assistant encompassing literature discovery and question answering. Our analysis reveals that users treat these systems as collaborative partners, submitting complex queries and engaging with generated outputs in non-linear ways, and that experience fosters more targeted inquiry alongside deeper citation analysis. How can these insights inform the design of more effective and user-centered AI assistants for the future of scientific discovery?


The Erosion of Context in Scholarly Search

The prevailing methods of academic search frequently prioritize keyword matching, a technique that often overlooks the subtle context and intricate connections embedded within research literature. This approach treats words as isolated units, neglecting the ways in which concepts are defined, debated, and related across different studies. Consequently, valuable insights (particularly those expressed through paraphrasing, implication, or the exploration of tangential ideas) can remain hidden from researchers. The limitations of keyword searches become especially apparent when investigating interdisciplinary topics or emerging fields, where terminology may be inconsistent or concepts are not yet firmly established, leading to incomplete or misleading results despite a high volume of returned papers.

The contemporary landscape of academic research presents a significant challenge: an overwhelming abundance of information. Researchers routinely face not a scarcity of data, but a deluge, making the process of identifying genuinely relevant studies remarkably time-consuming. This information overload isn’t simply a matter of quantity; the difficulty lies in synthesizing findings across numerous papers, each presenting a fragmented piece of a larger puzzle. Traditional methods often require researchers to manually comb through abstracts, introductions, and conclusions, attempting to reconcile conflicting results or identify subtle connections. Consequently, workflows become inefficient, hindering the pace of discovery and potentially leading to overlooked insights as researchers struggle to connect the dots amidst the sheer volume of published work.

The persistence of established search tools inadvertently cultivates a cognitive bias known as functional fixedness within the research process. This phenomenon describes a tendency to rely on familiar search paradigms – typically keyword-based systems – even when more innovative approaches might yield superior results. Researchers, accustomed to these tools, can become mentally constrained, failing to recognize the potential of alternative methods such as semantic search or network analysis. Consequently, exploration of more effective research strategies is hampered, and opportunities to uncover hidden connections or synthesize information in novel ways are lost. This isn’t a failure of the tools themselves, but rather a cognitive consequence of their long-term dominance, illustrating how ingrained habits can impede scientific progress.

Keyword queries dominate, though Asta users favor natural-language phrasing more than S2 users do; while both groups primarily seek broad exploration or concept explanation, S2 users show a stronger focus on retrieving specific papers.

Asta: Reclaiming Meaning Through Semantic Understanding

Asta provides researchers with two primary interfaces, PaperFinder and ScholarQA, both built upon Large Language Model (LLM)-based Retrieval and Synthesis techniques. PaperFinder functions as an enhanced search tool, delivering ranked lists of relevant academic papers accompanied by concise summaries to aid in initial assessment. ScholarQA goes further, generating structured reports that directly address specific scientific questions posed by the user. This is achieved by not only retrieving relevant documents but also by synthesizing information from those documents using LLMs, offering a consolidated and focused answer rather than simply a list of sources. Both interfaces are designed to improve research efficiency by reducing the time spent sifting through numerous publications and accelerating the process of knowledge discovery.

PaperFinder and ScholarQA represent distinct approaches to knowledge access within the Asta platform. PaperFinder functions as a focused search tool, presenting researchers with a ranked list of relevant publications accompanied by concise summaries to facilitate rapid assessment of content. In contrast, ScholarQA operates as a question-answering system; it generates comprehensive, structured reports directly addressing specific scientific inquiries. These reports synthesize information extracted from multiple sources, offering a consolidated response rather than a list of individual papers. This difference in output format caters to different user needs – exploratory research with PaperFinder versus focused investigation with ScholarQA.

Both the PaperFinder and ScholarQA interfaces within Asta utilize Semantic Scholar as the primary source for initial document retrieval. However, the core functionality extends beyond simple keyword searches; Large Language Models (LLMs) are then applied to refine these results based on semantic understanding. This LLM-driven refinement includes re-ranking papers for relevance and, crucially, synthesizing information across multiple documents to produce novel insights. The LLMs do not simply return source documents; they generate summaries and structured responses, effectively performing knowledge distillation from the retrieved corpus.
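The retrieve-then-rerank pattern described above can be sketched in miniature. Everything here is illustrative: the corpus is synthetic, and the overlap-based `score` function merely stands in for the LLM relevance judgment the real system would obtain from a model call.

```python
# Toy retrieve-then-rerank pipeline: a first-pass keyword retrieval
# followed by a second-pass relevance scoring step. The overlap score
# is a stand-in for an LLM relevance judgment.

def retrieve(query, corpus, k=10):
    """First pass: keep documents sharing at least one query term."""
    terms = set(query.lower().split())
    hits = [doc for doc in corpus if terms & set(doc["abstract"].lower().split())]
    return hits[:k]

def rerank(query, docs):
    """Second pass: re-order candidates by a relevance score.
    (A real system would ask an LLM for this score.)"""
    terms = set(query.lower().split())
    def score(doc):
        words = doc["abstract"].lower().split()
        return sum(w in terms for w in words) / max(len(words), 1)
    return sorted(docs, key=score, reverse=True)

corpus = [
    {"id": "p1", "abstract": "graph neural networks for molecules"},
    {"id": "p2", "abstract": "neural networks applied to protein folding"},
    {"id": "p3", "abstract": "a survey of reinforcement learning"},
]
ranked = rerank("neural networks", retrieve("neural networks", corpus))
print([d["id"] for d in ranked])  # ['p1', 'p2']
```

The two-stage design matters because the cheap first pass narrows millions of candidates to a handful, so the expensive second pass (an LLM call per candidate in the real system) stays affordable.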

The distribution of research intents on Asta reveals the typical tasks performed within each field of study.

Decoding Researcher Behavior: The Asta Interaction Dataset

The Asta Interaction Dataset consists of over 200,000 anonymized user queries, providing a substantial resource for behavioral analysis. Data points include ‘Session Duration’, measured in seconds, and detailed ‘Search Criteria’ which are categorized by query type and associated keywords. Additional captured metrics encompass user interaction events – such as document clicks and abstract views – and system-level data concerning response times. This comprehensive logging enables researchers to reconstruct user search sessions and quantitatively assess the effectiveness of PaperFinder and ScholarQA in addressing diverse information needs. The dataset’s scale facilitates statistically significant conclusions regarding user behavior patterns and system performance characteristics.
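Reconstructing sessions from such a log is straightforward in principle. The sketch below assumes a record layout (field names, ISO timestamps) chosen purely for illustration; the actual dataset schema will differ.

```python
# Minimal sketch of reconstructing per-user session durations from an
# interaction log. The record layout is assumed for illustration.
from datetime import datetime

log = [
    {"user": "u1", "event": "query",         "ts": "2025-03-01T10:00:00"},
    {"user": "u1", "event": "click",         "ts": "2025-03-01T10:03:20"},
    {"user": "u2", "event": "query",         "ts": "2025-03-01T11:00:00"},
    {"user": "u2", "event": "abstract_view", "ts": "2025-03-01T11:01:00"},
]

def session_durations(log):
    """Session duration in seconds: last event minus first, per user."""
    spans = {}
    for rec in log:
        t = datetime.fromisoformat(rec["ts"])
        first, last = spans.get(rec["user"], (t, t))
        spans[rec["user"]] = (min(first, t), max(last, t))
    return {u: (last - first).total_seconds() for u, (first, last) in spans.items()}

print(session_durations(log))  # {'u1': 200.0, 'u2': 60.0}
```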

The Asta Interaction Dataset facilitates the analysis of user behavior across varying levels of experience, categorized into ‘User Experience Stages’. This categorization allows for the observation of distinct interaction patterns with both PaperFinder and ScholarQA. Data indicates that novice users tend to utilize broader search criteria within PaperFinder, while experienced users demonstrate a preference for more specific queries and a higher propensity to engage with ScholarQA’s structured answer format. These differences are measurable through metrics such as session duration, query complexity, and the frequency of utilizing advanced search features, providing insights into how user expertise influences information-seeking behavior on each platform.

Analysis of the Asta Interaction Dataset indicates a statistically significant correlation between user queries expressing ‘Abstract Intent’ – those seeking conceptual understanding – and increased engagement with the ScholarQA platform. Users initiating searches indicative of abstract intent exhibited a median session duration of 8 minutes on ScholarQA, double the 4-minute median observed for PaperFinder. This suggests ScholarQA effectively addresses user needs for structured information and detailed explanations of complex topics, while PaperFinder is more frequently utilized for quicker, fact-based searches. The extended session durations on ScholarQA imply users are not only finding relevant results, but also spending considerable time exploring the provided information.
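The median comparison above can be reproduced with the standard library. The per-session values below are synthetic; only the resulting medians (8 minutes for ScholarQA, 4 for PaperFinder) echo the figures reported in the text.

```python
# Median session duration per interface, on synthetic data chosen so
# the medians match the figures quoted above (8 min vs. 4 min).
from statistics import median

sessions = {
    "ScholarQA":   [5, 8, 8, 12, 20],  # minutes, synthetic
    "PaperFinder": [2, 3, 4, 6, 9],
}
medians = {tool: median(mins) for tool, mins in sessions.items()}
print(medians)  # {'ScholarQA': 8, 'PaperFinder': 4}
assert medians["ScholarQA"] == 2 * medians["PaperFinder"]
```

Medians are the right summary here: session-duration distributions are heavily right-skewed, so a mean would be dominated by a few very long sessions.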

User engagement with PaperFinder (PF) reports decreases over time, likely due to their passive consumption, while engagement with ScholarQA (SQA) reports increases, as users must click to access the associated web content (shown with 95% confidence intervals).

Measuring Resonance: Click-Through Rates and the Dynamics of Engagement

Asta’s success hinges on effectively connecting researchers with pertinent information, a connection quantified by the click-through rate (CTR) on its generated report summaries. This metric serves as a direct indicator of both user engagement and the relevance of the synthesized content; a higher CTR suggests users find the summaries valuable and representative of their information needs. By meticulously tracking which summaries attract clicks, developers can refine Asta’s algorithms to prioritize the most useful information and improve the overall quality of its synthesis. Consequently, the CTR isn’t merely a number, but a dynamic signal guiding the evolution of the platform and ensuring it remains a powerful tool for navigating the complexities of scientific literature.
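The metric itself is simple: clicks on a report summary divided by impressions of that summary. The event structure below is hypothetical, but the ratio is the standard CTR definition described above.

```python
# Click-through rate: clicks divided by impressions. Event field names
# are illustrative, not taken from the actual dataset.
def click_through_rate(events):
    impressions = sum(1 for e in events if e["type"] == "impression")
    clicks = sum(1 for e in events if e["type"] == "click")
    return clicks / impressions if impressions else 0.0

events = [
    {"type": "impression"}, {"type": "impression"},
    {"type": "impression"}, {"type": "impression"},
    {"type": "click"},
]
print(click_through_rate(events))  # 0.25
```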

A detailed analysis of click-through rates, segmented by the complexity of user queries and varying levels of prior experience with Asta’s interface, reveals crucial insights into the system’s usability. Investigations demonstrate that users posing more nuanced or specialized questions often exhibit different engagement patterns compared to those with broader inquiries; for example, highly specific queries may require more refined presentation of synthesized results to maximize relevance. Furthermore, observing how novice users interact with the interface, contrasted with experienced researchers, highlights areas where intuitive design can be improved to accelerate knowledge discovery. These granular analyses don’t merely measure success; they pinpoint specific interface elements that either facilitate or hinder effective information retrieval, guiding iterative design improvements and ensuring Asta consistently meets the evolving needs of the scientific community.

Analysis of user behavior reveals a strong tendency to view generated reports not as one-time outputs, but as persistent resources for ongoing research; the revisitation rate stands at 50.5% for ScholarQA and 42.1% for PaperFinder. This suggests users are building knowledge incrementally, returning to synthesized findings as they delve deeper into complex topics. Furthermore, nearly 18.8% of users actively refine their initial queries after receiving a report, indicating an iterative exploration process where AI-generated summaries stimulate further investigation and a more nuanced understanding of the scientific landscape. This pattern of revisitation and refinement highlights the potential for these tools to facilitate a dynamic, rather than static, approach to knowledge discovery.
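A revisitation rate of this kind can be computed as the share of distinct reports opened more than once. The log below is synthetic; the rates quoted above (50.5% for ScholarQA, 42.1% for PaperFinder) come from the real dataset.

```python
# Sketch of the revisitation metric: the fraction of distinct reports
# that were opened on more than one occasion. Log format is assumed.
from collections import Counter

def revisitation_rate(report_opens):
    """Fraction of distinct reports opened more than once."""
    counts = Counter(report_opens)
    revisited = sum(1 for n in counts.values() if n > 1)
    return revisited / len(counts) if counts else 0.0

opens = ["r1", "r2", "r1", "r3", "r2", "r4"]  # report ids, synthetic
print(revisitation_rate(opens))  # r1 and r2 revisited, out of 4 -> 0.5
```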

The increasing availability of synthesized knowledge, facilitated by AI-powered tools, represents a paradigm shift in scientific research. Rather than relying solely on exhaustive literature reviews, researchers can now leverage these tools to efficiently access condensed, relevant information, accelerating discovery and fostering innovation. Analysis of user engagement metrics, such as click-through and revisitation rates, demonstrates that scientists are not simply receiving information, but actively integrating these tools into their iterative research workflows. This suggests a move toward a more dynamic research process, where AI serves as a collaborative partner, enhancing human intellect and ultimately reshaping how knowledge is created and disseminated within the scientific community.

User engagement analysis across platforms reveals that section expansion is the most frequent action, exceeding feedback mechanisms, and is positively correlated with overall satisfaction, suggesting users actively explore content before providing evaluation.

The Asta Interaction Dataset, as detailed in the study, reveals a fascinating dynamic within scientific workflows. Researchers aren’t simply seeking answers; they’re engaging in a conversation, iteratively refining their queries and assessing the assistant’s responses. This mirrors the inherent lifecycle of any system, a concept Brian Kernighan aptly captured when he stated, “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” Just as code requires continuous refinement, so too does the interaction with these AI tools, highlighting that even the most advanced systems are subject to ongoing adaptation and improvement within the context of user needs and expectations. The dataset provides a crucial snapshot of this evolutionary process.

What Lies Ahead?

The Asta Interaction Dataset offers a snapshot, not a stasis. It reveals patterns in the early adoption of AI research assistants, but those patterns will inevitably shift. Current analyses illuminate how researchers currently frame questions and interpret responses; however, the tool itself will reshape the questions asked. Technical debt accrues even in the most elegant systems: the initial utility of easily answered queries will give way to the more complex task of discerning meaningful answers from an increasingly verbose output. Uptime, that rare phase of temporal harmony, cannot be guaranteed; the system will degrade, and adaptation will be crucial.

Future work must address the inherent asymmetry between tool capability and user expectation. The dataset points to areas where the assistant excels at simple information retrieval but falls short in nuanced reasoning. The true challenge isn't simply improving accuracy, but understanding how researchers integrate these tools into existing workflows, and where the friction lies.

Ultimately, this is a study in erosion. The initial enthusiasm will weather, revealing the underlying bedrock of human cognitive processes. The dataset provides a valuable baseline for measuring that decay and, perhaps, for designing systems that age with a semblance of grace.


Original article: https://arxiv.org/pdf/2602.23335.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-02-27 14:21