Author: Denis Avetisyan
Researchers are developing AI systems that verify claims by going beyond text analysis alone, integrating visual information with robust reasoning capabilities.

This review introduces AgentFact, an agent-based framework, and RW-Post, a new dataset designed to advance explainable multimodal fact-checking using large language models.
Despite advances in automated fact-checking, verifying multimodal claims, those incorporating both text and images, remains a significant challenge due to limited reasoning capabilities and shallow evidence utilization. This work, ‘Multimodal Fact-Checking: An Agent-based Approach’, addresses this gap by introducing RW-Post, a new dataset enabling explainable verification, alongside AgentFact, an agent-based framework designed to emulate human fact-checking workflows. Through strategic evidence retrieval, visual analysis, and iterative reasoning, AgentFact substantially improves both the accuracy and interpretability of multimodal claim verification. Can this approach pave the way for more robust and trustworthy automated systems capable of discerning truth in an increasingly complex information landscape?
The Erosion of Truth in the Modern Information Landscape
The sheer volume of information circulating online presents a significant challenge to effective fact-checking. Existing methods, often designed for simple, text-based assertions, are increasingly strained by the complexity of modern misinformation. Claims are no longer confined to straightforward statements; they frequently involve nuanced arguments, statistical data, or combinations of text, images, and video – known as multimodal content. This shift demands fact-checking systems capable of not only identifying false statements but also analyzing the relationships between different information types and assessing the validity of complex reasoning. Current approaches struggle with this, often relying on manual review or superficial analysis, leaving a considerable gap in the ability to reliably debunk sophisticated disinformation campaigns and maintain public trust in online information.
Conventional fact-checking often delivers a verdict – true or false – without illuminating the reasoning behind it, creating a significant barrier to public trust. This opacity stems from methodologies that prioritize simply debunking claims rather than dissecting the evidence and logic underpinning them. Without a clear explanation of how a conclusion was reached, audiences are left to rely on faith in the fact-checker’s authority, rather than independent evaluation of the information itself. This lack of transparency is particularly problematic with complex issues, where nuanced arguments require detailed scrutiny. Consequently, even accurate fact-checks can fail to persuade those who disagree, or may be dismissed as biased, hindering the broader goal of fostering informed public discourse and critical thinking.

A Modular Architecture for Rigorous Fact Verification
AgentFact utilizes an agent-based system architecture to address the complexities of automated fact-checking. This involves decomposing the fact-checking process into discrete tasks, each assigned to a specialized agent. These agents operate autonomously, communicating and coordinating to verify claims. Specifically, agents are designed for functions such as claim analysis, evidence retrieval from external sources (e.g., web searches, knowledge bases), and logical reasoning over retrieved information. This modular design allows for scalability and facilitates the incorporation of new verification techniques by simply adding or modifying individual agent components without disrupting the entire system.
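The paper does not publish an implementation, but the modular design it describes can be illustrated with a short sketch. The class and field names below (ClaimState, FactCheckPipeline, and so on) are hypothetical rather than taken from AgentFact itself; the sketch only shows how independent agents sharing one narrow interface can be composed, swapped, or extended without disturbing the rest of the pipeline.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class ClaimState:
    """Shared state passed between agents during verification."""
    claim_text: str
    sub_claims: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)
    verdict: str | None = None
    explanation: str | None = None

class Agent(Protocol):
    """Every specialised agent exposes the same narrow interface."""
    def run(self, state: ClaimState) -> ClaimState: ...

class FactCheckPipeline:
    """Coordinates agents in sequence; adding a new verification technique
    means adding one more Agent, which is the point of the modular design."""
    def __init__(self, agents: list[Agent]):
        self.agents = agents

    def verify(self, claim: str) -> ClaimState:
        state = ClaimState(claim_text=claim)
        for agent in self.agents:
            state = agent.run(state)
        return state
```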
The AgentFact framework utilizes a Strategy Planning Agent to initially dissect incoming claims into smaller, verifiable components. This decomposition process informs the subsequent actions of specialized agents, specifically directing external evidence retrieval based on identified key assertions. Retrieved evidence is then processed by reasoning agents which apply logical inference and information synthesis techniques to determine the veracity of each claim component. The Strategy Planning Agent dynamically adjusts the verification process based on the complexity of the claim and the availability of supporting evidence, ensuring focused and efficient fact-checking.
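Continuing the hypothetical sketch above, a planning agent might delegate decomposition to an LLM and fall back gracefully when the output cannot be parsed. The prompt wording and class name here are assumptions for illustration, not the paper's own; `llm` stands in for any callable that maps a prompt string to a completion string.

```python
import json

DECOMPOSE_PROMPT = """Split the claim into independently verifiable sub-claims.
Return a JSON list of strings.

Claim: {claim}
"""

class StrategyPlanningAgent:
    """Breaks a claim into sub-claims that later agents verify one by one."""
    def __init__(self, llm):
        self.llm = llm

    def run(self, state):
        raw = self.llm(DECOMPOSE_PROMPT.format(claim=state.claim_text))
        try:
            state.sub_claims = json.loads(raw)
        except json.JSONDecodeError:
            # Fall back to treating the whole claim as a single unit.
            state.sub_claims = [state.claim_text]
        return state
```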
The Explanation Generation Agent within AgentFact is responsible for consolidating the outputs of the fact-checking pipeline – including retrieved evidence and the reasoning applied to that evidence – into a coherent, natural language explanation. This agent employs techniques in natural language generation to articulate the verification process in a manner accessible to human reviewers, detailing how the system arrived at its conclusion regarding a claim’s veracity. The resulting explanation isn’t merely a summary of findings, but a step-by-step account of the reasoning, allowing for auditability and facilitating user trust by demonstrating the system’s logical progression from claim to verification result.
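As an illustration only, such an explanation agent could be a thin wrapper that feeds the accumulated evidence and verdict back into the LLM and asks for a numbered, step-by-step rationale. The prompt text below is invented for the sketch, not quoted from the paper.

```python
EXPLAIN_PROMPT = """You are writing a fact-check report.
Claim: {claim}
Verdict: {verdict}
Evidence:
{evidence}

Explain step by step how the evidence supports the verdict,
citing each evidence item by its number."""

class ExplanationGenerationAgent:
    """Turns the pipeline's intermediate outputs into a readable rationale."""
    def __init__(self, llm):
        self.llm = llm

    def run(self, state):
        evidence_block = "\n".join(
            f"[{i + 1}] {e}" for i, e in enumerate(state.evidence)
        )
        state.explanation = self.llm(EXPLAIN_PROMPT.format(
            claim=state.claim_text,
            verdict=state.verdict,
            evidence=evidence_block,
        ))
        return state
```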

Harnessing LLMs with Structured Reasoning for Robust Verification
AgentFact utilizes Large Language Models (LLMs) to capitalize on their inherent abilities in natural language processing, including semantic understanding and text generation. However, the framework moves beyond solely relying on LLM capabilities by integrating structured reasoning techniques. This approach addresses limitations of LLMs, such as potential factual inaccuracies or a lack of transparent justification, by enforcing a logical sequence of steps in evidence evaluation and claim verification. The combination of LLM power with structured reasoning allows AgentFact to not only generate responses but also to articulate how those responses are derived from supporting evidence, enhancing both reliability and explainability.
AgentFact employs dedicated agents for external knowledge retrieval to substantiate claim verification. The ‘Text Evidence Retrieval Agent’ utilizes search engines and knowledge bases to locate relevant textual evidence, while the ‘Image Retrieval Agent’ focuses on visually corroborating information through image searches. These agents operate in parallel, sourcing evidence from diverse and reliable sources to provide contextual support for claims. Retrieved evidence is then used to assess the factual consistency of a claim, enabling the system to identify potential inaccuracies or biases and provide a verifiable justification for its conclusions. The effectiveness of these agents directly impacts the overall reliability and trustworthiness of the AgentFact framework.
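A minimal sketch of running the two retrieval agents in parallel might look like the following. Here `search_text` and `search_images` are placeholders for whatever search, knowledge-base, or reverse-image-search backend a deployment actually uses; they are not APIs named in the paper.

```python
from concurrent.futures import ThreadPoolExecutor

class EvidenceRetrievalStage:
    """Runs text and image retrieval concurrently and merges the results."""
    def __init__(self, search_text, search_images, top_k: int = 5):
        self.search_text = search_text      # e.g. web search / knowledge base
        self.search_images = search_images  # e.g. reverse image search
        self.top_k = top_k

    def run(self, state):
        with ThreadPoolExecutor(max_workers=2) as pool:
            text_future = pool.submit(self.search_text, state.claim_text, self.top_k)
            image_future = pool.submit(self.search_images, state.claim_text, self.top_k)
            state.evidence.extend(text_future.result())
            state.evidence.extend(image_future.result())
        return state
```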
Chain-of-Thought (CoT) prompting and Question-Guided prompting are employed to enhance the reasoning capabilities of the agents within the framework. CoT prompting involves structuring prompts to elicit a step-by-step reasoning process from the LLM, rather than a direct answer, allowing for intermediate logical inferences to be examined. Question-Guided prompting further refines this process by framing the reasoning as a series of targeted questions, directing the LLM to focus on relevant information and build a justification incrementally. These techniques improve the robustness of the generated justifications by explicitly outlining the reasoning pathway and enabling verification of each step, contributing to more reliable and explainable outputs.
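The exact prompts are not reproduced in this review, so the templates below are only rough approximations of the two styles: one elicits free-form step-by-step reasoning, the other walks the model through fixed guiding questions before it commits to a verdict.

```python
COT_PROMPT = """Claim: {claim}
Evidence:
{evidence}

Think step by step. List each inference you make from the evidence,
then end with a single line: VERDICT: supported / refuted / not enough info."""

QUESTION_GUIDED_PROMPT = """Claim: {claim}
Evidence:
{evidence}

Answer the following questions one at a time before giving a verdict:
1. What does the claim assert, precisely?
2. Which evidence items are relevant, and what does each one say?
3. Does any evidence contradict the claim?
4. Is the remaining support sufficient to accept the claim?
Finish with a single line: VERDICT: supported / refuted / not enough info."""

def verdict_from(response: str) -> str:
    """Pull the final VERDICT line out of a step-by-step response."""
    for line in reversed(response.strip().splitlines()):
        if line.upper().startswith("VERDICT:"):
            return line.split(":", 1)[1].strip()
    return "not enough info"
```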
Evaluations demonstrate that the AgentFact framework achieves state-of-the-art performance across multiple datasets designed for fact verification. Specifically, AgentFact surpasses the performance of existing models including LEMMA, DEFAME, and Sniffer, as measured by standard metrics such as accuracy and F1-score. These results indicate a significant advancement in the ability to accurately verify claims and provide supporting evidence, consistently outperforming previously established benchmarks in the field of automated fact verification.
Knowledge distillation is employed within the AgentFact framework to create smaller, more computationally efficient models without significantly reducing accuracy. This technique involves training a smaller “student” model to mimic the behavior of a larger, pre-trained “teacher” model. The student model learns to replicate not only the teacher’s predictions but also the probability distribution of those predictions, effectively transferring the knowledge embedded within the larger model. This results in a model with a reduced parameter count and faster inference speed, enabling deployment on resource-constrained devices or within systems requiring low latency, while maintaining competitive performance on fact verification tasks.
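A standard formulation of this objective, shown here as a generic PyTorch sketch rather than the paper's exact training recipe, mixes a softened KL term against the teacher's distribution with ordinary cross-entropy on the gold verdict labels.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Soft-label distillation: the student matches the teacher's softened
    probability distribution (KL term) while still fitting the ground-truth
    verdict labels (cross-entropy term)."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    ce = F.cross_entropy(student_logits, labels)
    # The T^2 factor keeps the soft-target gradients on the same scale
    # as the hard-label term across different temperatures.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce
```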

Towards a Comprehensive System for Real-World Information Integrity
AgentFact distinguishes itself through an ability to assess the veracity of claims presented not just as text, but also through visual data like images and videos. This multimodal approach is critical for navigating the complexities of modern information, where misleading visuals frequently accompany – or even supplant – textual narratives. Beyond simple verification, the system actively addresses the pervasive issue of Out-Of-Context (OOC) detection, identifying instances where visual content is divorced from its original meaning or presented with deceptive intent. By cross-referencing visual evidence with supporting documentation and contextual information, AgentFact aims to provide a more comprehensive and reliable assessment of factual claims in an increasingly visually-driven world.
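As a rough illustration of how such cross-referencing might be operationalised (the prompt and function below are assumptions made for this sketch, not the paper's method), the contexts in which an image previously appeared, for instance returned by a reverse image search, can be compared against the claim's framing:

```python
OOC_PROMPT = """A post pairs an image with a statement.
Statement: {claim}
Contexts in which the image previously appeared:
{prior_contexts}

Do these earlier contexts match the statement's time, place, and subject?
Answer OOC if the image is being reused out of context, otherwise IN-CONTEXT,
and give a one-sentence reason."""

def check_out_of_context(llm, claim, prior_contexts):
    """Flag an image as out of context when its earlier appearances
    disagree with the framing the claim gives it."""
    block = "\n".join(f"- {c}" for c in prior_contexts)
    return llm(OOC_PROMPT.format(claim=claim, prior_contexts=block))
```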
AgentFact addresses the challenge of complex claim verification through a process called Claim Decomposition. Rather than attempting to evaluate an entire statement at once, the system strategically breaks it down into a series of smaller, more easily analyzed sub-claims. This modular approach allows AgentFact to pinpoint specific factual components within a larger assertion, facilitating targeted evidence retrieval and reasoning. By tackling each sub-claim individually, the system improves both the accuracy and efficiency of its verification process, ultimately enhancing its ability to assess the overall factual soundness of intricate statements that would otherwise be difficult to evaluate comprehensively.
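One simple way to combine per-sub-claim results, sketched here as an assumed aggregation rule rather than a description of AgentFact's own, is a conservative logical merge:

```python
def aggregate_verdicts(sub_verdicts: list[str]) -> str:
    """Combine per-sub-claim verdicts into an overall label.
    A single refuted component is enough to refute the whole claim;
    every component must be supported for the claim to be supported."""
    if any(v == "refuted" for v in sub_verdicts):
        return "refuted"
    if all(v == "supported" for v in sub_verdicts):
        return "supported"
    return "not enough info"
```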
Rigorous human evaluation served as a critical benchmark for assessing AgentFact’s reasoning capabilities, revealing a demonstrable improvement in factual soundness. Trained annotators carefully reviewed the system’s responses, judging whether the conclusions drawn were logically supported by the evidence provided and accurately reflected the presented information. This assessment moved beyond simple accuracy metrics to focus on the quality of reasoning – whether the system not only arrived at a correct answer, but did so through a factually grounded and logically coherent process. The results indicated a significant positive shift in the system’s ability to avoid unsupported claims and maintain fidelity to the source material, bolstering confidence in its potential for reliable knowledge synthesis and information verification.
The system’s ability to verify claims is demonstrably linked to the complexity of the information needed, with performance decreasing as the number of supporting evidence items required increases. Specifically, while AgentFact performs robustly with claims substantiated by fewer evidence pieces, accuracy diminishes when verification necessitates integrating information from four to six distinct sources. This suggests a limitation in the current architecture’s capacity to effectively synthesize and reconcile a larger volume of evidence, potentially due to increased computational demands or challenges in discerning the most relevant information from a more extensive dataset. Further refinement of the system’s evidence aggregation and reasoning mechanisms is therefore crucial for tackling highly complex claims that demand comprehensive information synthesis.

The pursuit of robust fact-checking, as detailed in this work concerning AgentFact and the RW-Post dataset, echoes a fundamental tenet of mathematical rigor. It’s not simply about detecting falsehoods, but establishing verifiable proof of a claim’s validity, or lack thereof. This resonates with David Hilbert’s assertion: “One must be able to say in a finite number of steps that a problem is unsolvable.” While the scope differs (Hilbert addressed foundational limits of computation, and this research tackles practical verification), both emphasize the necessity of demonstrably conclusive reasoning. The AgentFact framework, by prioritizing evidence grounding and explainability, moves beyond empirical success to approach a more mathematically sound basis for truth assessment. The ability to trace a claim’s verification back to concrete evidence provides the ‘finite steps’ needed for a conclusive determination.
Beyond Verification: Charting a Course for Truth
The introduction of RW-Post and AgentFact represents a step, albeit a small one, toward a more rigorous science of truth. However, the field remains fundamentally challenged by the inherent ambiguity of natural language and the limitations of current representational systems. The focus, thus far, has been on detecting falsehoods; a more compelling, and considerably more difficult, endeavor lies in constructing systems capable of genuinely understanding the underlying semantics of a claim and its evidentiary support – a pursuit demanding more than mere correlation of features.
The current agent-based approach, while offering a degree of modularity, still relies on the fallible foundation of large language models. True progress necessitates a shift away from probabilistic reasoning and toward formal, provable systems. The aim should not be to mimic human cognition, with its inherent biases and inconsistencies, but to create a computational logic that transcends them. A system that can demonstrate, not simply assert, the veracity of a claim.
Future work must address the issue of context – not merely the surrounding text, but the broader web of knowledge that imbues a statement with meaning. This demands a move beyond isolated fact-checking toward a system capable of reasoning about beliefs, intentions, and the very nature of evidence itself. The pursuit of ‘explainable AI’ is, ultimately, a pursuit of logical completeness – a system where every inference is traceable to a first principle.
Original article: https://arxiv.org/pdf/2512.22933.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/