Author: Denis Avetisyan
A new wave of AI-powered web agents is automating the process of gathering legal information, potentially bridging the access-to-justice gap for underserved populations.

This review explores the development and application of multimodal large language model-based web agents for automated legal information retrieval and task completion.
Despite the increasing availability of online legal information, navigating complex websites and forms remains a significant barrier to access to justice. This paper introduces “LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents,” a framework employing multimodal large language models to create autonomous web agents capable of bridging this gap. The system achieves an 84.4% average success rate on real-world legal tasks, demonstrating high-level autonomy in complex web interactions. Could such agentic AI systems fundamentally reshape how citizens engage with the legal system and ultimately expand access to justice for all?
The Echo of Unmet Need
A pervasive imbalance characterizes the landscape of justice, where substantial legal needs frequently go unmet due to limitations in accessing vital information. This disparity isn’t simply a matter of inconvenience; it actively hinders equitable outcomes across numerous facets of life, from housing and employment to family law and consumer rights. Individuals lacking the resources to navigate complex legal systems, or even simply understand their rights, find themselves at a distinct disadvantage. Consequently, legal vulnerabilities disproportionately affect marginalized communities and those with limited financial means, perpetuating cycles of inequality. The inability to access clear, reliable legal information doesn’t necessarily indicate a lack of valid claims; rather, it reflects a systemic barrier that prevents individuals from asserting those claims effectively, ultimately undermining the very foundations of a just society.
For many individuals, navigating the legal system is hindered not by a lack of rights, but by a practical inability to access relevant information. Traditional legal research demands considerable time, often requiring hours of sifting through case law, statutes, and regulations – a luxury unavailable to those already facing hardship. Beyond time constraints, the expense associated with legal databases, professional legal counsel, and even simply traveling to legal resource centers creates a significant financial barrier. This disparity disproportionately affects low-income communities, those with limited digital literacy, and individuals residing in areas with scarce legal services, effectively denying them equal access to justice and perpetuating systemic inequities. The complexities and costs inherent in conventional legal research therefore serve to widen the gap between legal need and effective recourse, underscoring the urgency for more accessible solutions.
The potential for automated, web-based legal tools to broaden access to justice hinges critically on advances in artificial intelligence. These platforms aim to simplify complex legal information and provide guidance to individuals who cannot afford traditional legal counsel, yet their effectiveness relies on AI’s capacity to accurately interpret legal language, understand nuanced case details, and deliver relevant, reliable results. Current challenges include developing algorithms that can discern ambiguity in legal texts, navigate the ever-changing landscape of laws and precedents, and avoid perpetuating existing biases present in legal data. Successfully overcoming these hurdles requires not only sophisticated natural language processing and machine learning techniques, but also a commitment to transparency and accountability in the design and deployment of these increasingly influential tools, ensuring equitable outcomes for all users.

The Automated Advocate: A Framework Emerges
LegalWebAgent is a newly developed framework designed to automate complex legal tasks that require interaction with web-based resources. It utilizes multimodal Large Language Models (LLMs), enabling it to process both text and visual information from webpages. This approach allows the system to understand and respond to legal documents, forms, and databases accessible online. The framework’s core functionality centers on the LLM’s ability to interpret task requests and translate them into a sequence of web interactions, effectively performing legal research, data extraction, and form completion without direct human intervention. The integration of multimodal LLMs distinguishes it from traditional web automation tools by enabling a higher degree of understanding and adaptability in navigating and interpreting complex legal web content.
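To make this concrete, the loop below sketches how such an agent might operate: observe the page (DOM text plus a screenshot for the multimodal model), let the model choose the next action, and execute it. The `llm.next_action` interface and the action object are illustrative stand-ins, not the paper’s actual API.

```python
# High-level sketch of an observe-decide-act agent loop; `llm` and the
# action object are hypothetical stand-ins, not LegalWebAgent's real API.
def run_agent(task: str, page, llm, max_steps: int = 20) -> str | None:
    for _ in range(max_steps):
        # Observe: capture both the DOM and a screenshot so a multimodal
        # model can reason over text and visual layout together.
        observation = {"html": page.content(),
                       "screenshot": page.screenshot()}
        # Decide: the LLM maps (task, current page state) to the next action.
        action = llm.next_action(task, observation)
        if action.name == "done":
            return action.answer        # task completed, return the result
        # Act: execute the chosen web interaction on the live page.
        action.execute(page)
    return None                          # step budget exhausted without success
```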
LegalWebAgent’s autonomous operation is achieved through implementation of the Browser-Use framework, which facilitates iterative browsing and information utilization. The system employs Playwright, a reliable web automation library, to programmatically control a web browser, enabling actions such as navigating to specified URLs, filling forms, and clicking on interactive elements. This allows LegalWebAgent to independently access and interact with legal websites, extracting data and completing tasks without direct human intervention. Playwright’s capabilities provide the foundation for simulating user behavior and handling dynamic content commonly found on modern web pages, ensuring consistent and repeatable performance.
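A minimal sketch of the kind of Playwright-driven interaction described above appears below; the URL and CSS selectors are illustrative, not drawn from the paper.

```python
# Minimal Playwright sketch of the browsing primitives LegalWebAgent relies
# on; the URL and selectors are illustrative, not from the paper.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.org/legal-aid")           # hypothetical legal-aid site
    page.fill("input[name='query']", "tenant rights")    # populate a search form
    page.click("button[type='submit']")                  # submit the form
    page.wait_for_load_state("networkidle")              # let dynamic content settle
    results = page.inner_text("#results")                # extract rendered text
    browser.close()
```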
LegalWebAgent utilizes a predefined Action Library to execute web-based tasks autonomously. This library consists of a set of discrete actions, including simulating user clicks on webpage elements, inputting text into form fields, and performing scrolling operations to navigate content. These actions are not hardcoded but are instead dynamically selected and chained together based on the task requirements and the current webpage state. The Action Library enables the system to mimic human web interaction by executing a sequence of standardized operations, facilitating access to and extraction of information from legal websites without requiring manual intervention. Each action within the library is designed to be modular and reusable, contributing to the system’s adaptability and scalability.
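One plausible shape for such an Action Library is sketched below; the action names, handlers, and registry are assumptions for illustration rather than the paper’s actual interface.

```python
# A plausible shape for a discrete, modular Action Library; the names and
# dispatch scheme are assumptions, not LegalWebAgent's documented API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    handler: Callable  # executes one web interaction given a Playwright page

def click(page, selector: str):
    page.click(selector)                 # simulate a user click on an element

def type_text(page, selector: str, text: str):
    page.fill(selector, text)            # input text into a form field

def scroll(page, pixels: int):
    page.mouse.wheel(0, pixels)          # scroll the viewport vertically

# The LLM selects from this registry and chains actions at runtime.
ACTION_LIBRARY = {
    "click": Action("click", click),
    "type": Action("type", type_text),
    "scroll": Action("scroll", scroll),
}
```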
LegalWebAgent improves data extraction accuracy and system resilience by combining HTML and visual analysis of webpages. HTML analysis parses the document’s structural elements, identifying tags and attributes to locate relevant information. However, reliance on HTML alone is insufficient due to increasingly complex website designs and the use of images containing text. Therefore, LegalWebAgent incorporates visual analysis, utilizing Optical Character Recognition (OCR) to extract text embedded within images and computer vision techniques to identify elements not explicitly defined in the HTML structure. This dual approach allows the system to accurately interpret content regardless of its presentation, mitigating issues caused by dynamic content, image-based text, or poorly structured HTML.
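The dual-analysis idea can be sketched as follows: run the structural (HTML) pass first, then fall back to OCR on a screenshot when the text lives inside images. The specific libraries here (BeautifulSoup, pytesseract) are assumptions for illustration, not necessarily the paper’s stack.

```python
# Sketch of dual HTML/visual extraction: parse the DOM first, fall back to
# OCR on a screenshot when text is image-embedded. Library choices are
# assumptions, not the paper's stack.
from bs4 import BeautifulSoup
import pytesseract
from PIL import Image

def extract_text(page) -> str:
    # Structural pass: pull visible text out of the parsed HTML.
    soup = BeautifulSoup(page.content(), "html.parser")
    html_text = soup.get_text(separator=" ", strip=True)
    if html_text:
        return html_text
    # Visual fallback: OCR a full-page screenshot for image-embedded text.
    page.screenshot(path="snapshot.png", full_page=True)
    return pytesseract.image_to_string(Image.open("snapshot.png"))
```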

Empirical Validation: A Test of Capabilities
The evaluation process utilized the LegalWebAgent framework to assess the performance of three leading Large Language Models (LLMs): GPT-4o, Claude-Sonnet-4, and DeepSeek-v3.1. This framework provides a standardized environment for testing LLMs on complex tasks, specifically within the domain of legal research. The selection of these models represents a cross-section of current state-of-the-art capabilities in natural language processing and reasoning. LegalWebAgent facilitated a comparative analysis, allowing for consistent measurement of each model’s ability to perform defined legal web-based tasks and providing a basis for quantifying performance metrics like task success rate, runtime, and token consumption.
The evaluation dataset comprised complex search tasks specifically designed to replicate the demands of legal research within the Québec Civil Law system. These tasks were not simple keyword searches, but required the LLMs to synthesize information from multiple web sources, interpret legal terminology, and identify relevant precedents to address nuanced legal questions. The complexity was introduced through tasks demanding the identification of specific articles within the Civil Code of Québec, the application of legal principles to hypothetical scenarios, and the differentiation between similar but distinct legal concepts, mirroring the cognitive load experienced by legal professionals conducting research.
The evaluation of multimodal Large Language Models (LLMs) within the LegalWebAgent framework utilized two primary performance indicators: Task Success Rate and Token Consumption. Task Success Rate quantifies the percentage of complex search tasks, grounded in Québec Civil Law, that were completed accurately by each model. This metric directly assesses the LLM’s ability to provide correct and relevant information in response to legal research scenarios. Token Consumption, measured in thousands of tokens (k), represents the computational cost associated with processing each task; lower token consumption indicates greater efficiency in utilizing computational resources and potentially faster response times. These indicators were chosen to provide a comprehensive assessment of both the effectiveness and efficiency of each LLM in an automated legal web task context.
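For concreteness, the two indicators might be computed over an evaluation run as follows; the per-task record format is a hypothetical convenience, not the paper’s schema.

```python
# How the two reported indicators might be computed over an evaluation run;
# the record format is a hypothetical convenience, not the paper's schema.
def task_success_rate(records: list[dict]) -> float:
    """Percentage of tasks whose final answer was judged correct."""
    return 100.0 * sum(r["success"] for r in records) / len(records)

def avg_token_consumption(records: list[dict]) -> float:
    """Mean tokens per task, reported in thousands (k)."""
    return sum(r["tokens"] for r in records) / len(records) / 1000.0

runs = [{"success": True, "tokens": 21_500},
        {"success": False, "tokens": 18_200},
        {"success": True, "tokens": 20_300}]
print(f"{task_success_rate(runs):.1f}% success, "
      f"{avg_token_consumption(runs):.1f}k tokens/task")
```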
Evaluated on complex legal search tasks grounded in Québec Civil Law, the LegalWebAgent system achieved Task Success Rates of up to 86.7%, averaging 84.4% across the three models. Performance varied between the Large Language Models (LLMs) integrated within the framework: both GPT-4o and DeepSeek-v3.1 attained a Task Success Rate of 86.7%, while Claude-Sonnet-4 demonstrated a lower success rate of 80.0% on the same dataset. These results indicate a high level of automation potential for legal web tasks using the LegalWebAgent system, with GPT-4o and DeepSeek-v3.1 exhibiting comparable accuracy in completing the defined tasks.
Performance benchmarking within the LegalWebAgent framework revealed significant differences in efficiency between the evaluated Large Language Models. GPT-4o completed tasks with an average runtime of 90 seconds, a substantial improvement over Claude-Sonnet-4’s 416 seconds and DeepSeek-v3.1’s 730 seconds. Furthermore, GPT-4o’s token consumption was markedly lower, utilizing 20,000 tokens per task compared to the 227,000 tokens consumed by Claude-Sonnet-4 and the 195,000 tokens used by DeepSeek-v3.1. These metrics indicate that GPT-4o not only processed requests more quickly but also did so with considerably greater resource efficiency within the tested legal research context.

The Horizon of Automation: Limitations and Future Growth
Investigations into the performance of Large Language Models (LLMs) reveal a consistent decline in Task Success Rate as search complexity increases. Specifically, when presented with queries demanding multiple filters or conditional stipulations, the models demonstrably struggle to maintain accuracy and completeness. This suggests an inherent limitation in their capacity to effectively parse and synthesize nuanced information, hindering their ability to navigate intricate search landscapes. The observed reduction in performance isn’t simply a matter of increased processing time; rather, it indicates a fundamental challenge in reasoning and information retrieval when faced with complex criteria, prompting further research into more robust query processing techniques and enhanced model architectures.
The observed limitations in handling complex search tasks highlight a critical need to advance the reasoning capabilities of large language models (LLMs). Current architectures often struggle with queries demanding multiple filters or nuanced conditions, suggesting a deficiency in their ability to effectively process and synthesize information. Consequently, research must prioritize developing more sophisticated task decomposition strategies – methods allowing LLMs to break down complex problems into smaller, more manageable sub-tasks. This approach, mirroring human problem-solving, could involve identifying relevant information, establishing logical connections, and iteratively refining search parameters. Ultimately, bolstering both reasoning and decomposition skills will be paramount to unlocking the full potential of LLMs in tackling intricate challenges and delivering consistently reliable results.
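As a sketch of what such decomposition might look like in practice, the snippet below asks a model to split a multi-filter legal query into ordered sub-tasks before any browsing begins; the prompt wording and the `llm_call` hook are hypothetical, not taken from the paper.

```python
# Hypothetical task-decomposition step: ask the LLM to split a complex,
# multi-filter query into ordered sub-tasks. `llm_call` is a stand-in for
# any chat-completion client; nothing here is from the paper.
DECOMPOSE_PROMPT = """Break the following legal research task into numbered
sub-tasks, each applying a single filter or condition:

Task: {task}
"""

def decompose(task: str, llm_call) -> list[str]:
    plan = llm_call(DECOMPOSE_PROMPT.format(task=task))
    # Keep only numbered lines such as "1. Find the governing article".
    return [line.split(". ", 1)[1]
            for line in plan.splitlines()
            if line[:1].isdigit() and ". " in line]
```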
The integration of multimodal Large Language Models (LLMs) into the LegalWebAgent framework demonstrates a promising avenue for the automation of legal research processes. By processing and understanding information from diverse sources – including text, images, and potentially audio/video evidence – these LLMs move beyond traditional keyword-based searches. This capability allows for a more nuanced comprehension of legal issues and the identification of relevant precedents, statutes, and case law that might otherwise be missed. Consequently, LegalWebAgent not only streamlines the research process for legal professionals but also holds the potential to broaden access to justice by making legal information more readily available and understandable to the public, ultimately reducing barriers to legal recourse and promoting a more equitable legal system.
Continued development centers on bolstering the LegalWebAgent framework through iterative refinement and a dedicated focus on improving the underlying large language model’s reasoning capabilities. Researchers intend to move beyond current limitations by implementing more nuanced algorithms for complex query decomposition and knowledge synthesis. This will not only enhance performance within established legal research tasks but also facilitate expansion into previously inaccessible areas of law, such as regulatory compliance and intellectual property analysis. Ultimately, the goal is to create a versatile and adaptive system capable of addressing a broad spectrum of legal challenges and democratizing access to vital legal information across diverse domains.
The pursuit of automated web interaction, as demonstrated by LegalWebAgent, feels less like construction and more like tending a garden. One nurtures the potential within the system, knowing full well that the initial design will inevitably succumb to the pressures of a changing digital landscape. As Donald Knuth observed, “Premature optimization is the root of all evil.” This rings true; focusing solely on immediate efficiency overlooks the inherent dynamism of legal websites and the need for adaptable agents. The architecture isn’t a solution, but a compromise frozen in time, a temporary scaffolding against the relentless tide of technological evolution. The system will change, dependencies will remain, and the garden will require constant tending.
What Lies Ahead?
The construction of ‘LegalWebAgent’ – and all such agentic systems – reveals less a solution to access to justice, and more a postponement of inevitable brittleness. Each successfully automated task on a legal website is, simultaneously, a prophecy of its failure. Websites shift, schemas evolve, and the carefully crafted prompts that coax information from these models will, with predictable regularity, return nonsense. The true metric isn’t task completion rate, but the speed of decay.
Future work will inevitably focus on ‘robustness,’ a comforting illusion. The pursuit of generalized web interaction, independent of site-specific knowledge, is a phantom. These agents aren’t navigating the web; they’re memorizing specific instances. The interesting question isn’t how to make them adaptable, but how to design for graceful degradation – systems that admit their own limitations and revert to human oversight before dispensing incorrect legal guidance.
Ultimately, the value may lie not in automating legal research, but in lowering the bar for human contribution. Perhaps these models become sophisticated scaffolding, assisting paralegals and pro bono workers, rather than replacing them. A tool for augmenting human capacity, not for chasing the unsustainable dream of fully automated justice. The system isn’t the goal; it’s a temporary reprieve from complexity.
Original article: https://arxiv.org/pdf/2512.04105.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/