Can AI Solve Unsolved Math Problems?

Author: Denis Avetisyan


A new study demonstrates the potential of large language models to assist in mathematical discovery, tackling long-standing challenges like the Erdős problems.

Researchers applied a large language model, Aletheia, to explore conjectures and verify potential solutions in the field of combinatorial number theory.

Despite the longstanding challenge of automating mathematical discovery, this paper, ‘Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems’, investigates the potential of large language models to address open conjectures. By systematically evaluating 700 problems from Bloom’s Erdős Problems database, we demonstrate that AI can both verify existing solutions and uncover novel approaches, resolving 13 previously ‘Open’ problems, often revealing that their status stemmed from obscurity rather than inherent difficulty. However, this work also highlights critical issues surrounding originality and the potential for ‘subconscious plagiarism’ when applying AI to mathematical research. Could this hybrid AI-human approach fundamentally reshape the process of mathematical exploration and validation?


The Endless Frontier: Why Problems Persist

Mathematics, despite centuries of advancement, remains punctuated by enduring enigmas – conjectures known as ‘Open Problems’ that have defied resolution by the brightest minds. These aren’t simply gaps in knowledge; they represent fundamental challenges that have resisted decades, even centuries, of rigorous effort. Examples range from the Riemann Hypothesis, concerning the distribution of prime numbers, to P versus NP, a question at the heart of computational complexity. The persistence of these problems isn’t due to a lack of effort, but rather to their inherent difficulty, often stemming from subtle complexities or a need for entirely new mathematical tools. Their continued resistance serves as a potent reminder of the boundless nature of mathematical inquiry and the limits of current understanding, fueling ongoing research and inspiring new generations of mathematicians to confront these formidable challenges.

Many mathematical challenges aren’t overcome due to a lack of fundamental tools, but rather because the number of possibilities to examine grows at an astonishing rate. This “combinatorial explosion” means that even problems seemingly simple to state can require evaluating [latex]n![/latex] potential solutions, a count that becomes astronomical even when [latex]n[/latex] is modest. Consequently, brute-force approaches become computationally intractable, and even clever optimizations struggle to prune the search space effectively. The sheer scale of possibilities often obscures the underlying patterns or principles that would lead to a resolution, creating a barrier that resists even the most dedicated mathematical inquiry and highlighting the need for fundamentally new approaches to problem-solving.

Many unsolved mathematical problems present challenges not because of a lack of foundational tools, but due to the sheer scale of possibilities that must be considered. Traditional approaches, reliant on systematic exploration or clever deduction, quickly become overwhelmed by what is known as ‘combinatorial explosion’ – a rapid increase in the number of potential solutions that exceeds the capacity of even the most powerful computers. For example, attempting to verify a conjecture involving arrangements of numbers or patterns quickly leads to a number of cases that grow factorially, rendering exhaustive search impractical. This limitation isn’t a sign of intellectual failure, but rather a fundamental constraint; the problem space is simply too large to navigate using conventional techniques, necessitating novel approaches such as probabilistic methods, heuristics, or the leveraging of entirely new mathematical frameworks to make even incremental progress.
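To make that scale concrete, a few lines of Python (an illustrative sketch, not code from the paper) show how quickly [latex]n![/latex] outgrows any realistic computing budget:

```python
import math

# Number of candidate orderings a brute-force search over n items must examine.
for n in (10, 20, 30, 60):
    print(f"n = {n:>2}: n! = {math.factorial(n):.3e}")

# n = 10: n! = 3.629e+06
# n = 20: n! = 2.433e+18
# n = 30: n! = 2.653e+32
# n = 60: n! = 8.321e+81  (already more cases than atoms in the observable universe)
```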

Aletheia: Chasing Proofs with a Digital Mind

Aletheia is a mathematics research agent leveraging the capabilities of the Gemini Deep Think large language model. This implementation utilizes Gemini’s advanced reasoning and pattern recognition abilities, specifically tailored for mathematical exploration. The agent’s architecture allows it to process and synthesize complex mathematical statements, definitions, and previously published research. Gemini Deep Think provides the foundational computational power and knowledge base necessary for Aletheia to operate as an autonomous research tool, enabling it to analyze mathematical problems and formulate potential solutions without direct human guidance.

Aletheia operates within the ErdosProblems.com database, a curated collection of open mathematical problems and conjectures. Its systematic analysis involves parsing problem statements, identifying key variables and relationships, and then applying a suite of algorithmic techniques to explore potential solution pathways. This process includes the generation of lemmas, the exploration of analogous problems, and the construction of potential proof sketches. The system doesn’t rely on pre-existing proof libraries; instead, it attempts to formulate novel approaches by analyzing the conjecture’s structure and constraints, effectively creating a search space of potential arguments for evaluation.

Unlike conventional automated theorem provers, which operate by checking the validity of user-supplied proofs or exhaustively searching pre-defined proof spaces, Aletheia is designed to autonomously discover new proofs for mathematical conjectures. This is achieved through a process of hypothesis generation and exploration, where the system formulates potential proof strategies based on its understanding of mathematical principles and then attempts to construct a logical argument supporting the conjecture. This contrasts with verification systems that require a complete proof structure to be provided as input; Aletheia aims to build the proof from first principles, effectively acting as an independent mathematical researcher.
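As a rough mental model of this generate-and-check loop, consider the sketch below. It is hypothetical: the function names, the placeholder LLM call, and the screening step are assumptions for illustration, not Aletheia’s actual interface or the paper’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    strategy: str   # e.g. "induction on n", "reduce to a known Pell equation"
    argument: str   # natural-language proof sketch produced by the model
    verdict: str    # "verified", "refuted", or "inconclusive"

def query_model(prompt: str) -> str:
    """Placeholder for a call to a large language model; nothing real is wired up here."""
    raise NotImplementedError("connect a model of your choice")

def check_argument(argument: str) -> str:
    """Placeholder for the expensive screening step: human review, a numerical
    search for counterexamples, or formalization in a proof assistant such as Lean."""
    return "inconclusive"

def explore(conjecture: str, max_attempts: int = 5) -> list[Attempt]:
    """Generate candidate proof strategies for a conjecture and screen each one."""
    attempts: list[Attempt] = []
    for _ in range(max_attempts):
        strategy = query_model(f"Propose a proof strategy for: {conjecture}")
        argument = query_model(f"Develop this strategy into a full argument: {strategy}")
        attempts.append(Attempt(strategy, argument, check_argument(argument)))
        if attempts[-1].verdict == "verified":
            break
    return attempts
```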

Formal Verification: When Rigor Must Reign

Formal verification establishes the absolute correctness of a system or component through mathematically rigorous proof. Unlike empirical testing, which can only demonstrate the presence of errors, formal verification aims to prove the absence of errors within a defined specification. This is achieved by constructing a formal model of the system and then using logical deduction – often employing theorem provers or model checkers – to demonstrate that the model satisfies the stated properties. The resulting proof provides a level of assurance unattainable through other validation methods, as it relies on the consistency of the underlying mathematical system itself. A successful verification demonstrates that, given the axioms and rules of logic used, the system will behave as specified under all possible conditions, effectively eliminating the possibility of certain classes of bugs.

Lean is a proof assistant and formal verification platform built on dependent type theory (the calculus of inductive constructions). It allows users to express mathematical definitions, theorems, and proofs as computer-verifiable data. The core of Lean is its powerful tactic language, enabling automated proof construction and simplification. Its type system can express complex mathematical structures and relationships, including inductive types and higher-order logic. Its capabilities extend to verifying properties of software and hardware, and the system features a growing library of formally verified mathematical theorems and algorithms. Lean’s output is a fully formalized proof, ensuring the logical correctness of the verified argument and offering a high degree of confidence in the underlying mathematical statements.
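For a sense of what “computer-verifiable data” means in practice, here is a minimal Lean 4 fragment (illustrative only, unrelated to the Erdős problems): if it compiles, the kernel has checked every step.

```lean
-- n + 0 = n holds by computation on Lean's definition of addition,
-- so the proof term is just `rfl` (reflexivity).
theorem add_zero_example (n : Nat) : n + 0 = n := rfl

-- 0 + n = n is not definitional; here it is discharged by the library
-- lemma Nat.zero_add, itself a kernel-checked proof.
theorem zero_add_example (n : Nat) : 0 + n = n := Nat.zero_add n

-- Concrete arithmetic facts are verified by evaluation inside the kernel.
example : 7 * 6 = 42 := rfl
```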

Formal verification, despite its substantial computational demands, serves as a complementary technique to Aletheia’s exploratory methods by providing a definitive assessment of proposed solutions. Aletheia facilitates rapid prototyping and hypothesis generation, but lacks the capacity to prove correctness. Formal verification addresses this limitation by utilizing mathematical proofs to establish the absolute validity of a system’s design. This dual approach – exploratory development followed by rigorous formal checks – enhances confidence in the overall system. The computational cost is mitigated by applying formal methods selectively, focusing on critical components or areas where Aletheia’s results require independent validation. This targeted application optimizes resource utilization while ensuring a high degree of assurance in the final product.

Geometric Shadows and Diophantine Secrets

The seemingly abstract realm of Diophantine equations – those demanding integer solutions – often reveals a hidden geometric structure. This connection isn’t merely aesthetic; it provides powerful tools for tackling problems that resist purely algebraic approaches. Researchers are increasingly translating these number-theoretic challenges into questions about the arrangement of points in space, allowing them to leverage geometric intuition and techniques. For example, finding integer solutions to certain equations can be reinterpreted as determining if specific geometric configurations are possible. This interplay has proven fruitful in areas like number theory and cryptography, where geometric insights offer new avenues for both proving theorems and designing algorithms. The translation allows for visual and spatial reasoning to be applied to problems previously considered purely numerical, leading to breakthroughs in understanding these long-standing mathematical puzzles.
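A classical illustration of this translation (standard material, not a result of the paper): integer solutions of [latex]x^2 + y^2 = z^2[/latex] with [latex]z \neq 0[/latex] correspond to rational points [latex](x/z, y/z)[/latex] on the unit circle, so a question about Pythagorean triples becomes a question about where a line of rational slope through [latex](-1, 0)[/latex] meets the circle.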

Pell equations, for all their abstraction, bear directly on the construction of point sets and their geometric properties. Research demonstrates that solutions to these equations can be used to define coordinates for points in space, allowing the creation of configurations with precisely controlled distances between them. This connection directly influences the ‘Transfinite Diameter’ – not the largest distance within a set, but a limiting geometric mean of pairwise distances closely tied to logarithmic capacity – since the structure dictated by the Pell equation constrains how spread out such point sets can be. By manipulating the parameters within the Pell equation, researchers can systematically explore and construct point sets exhibiting specific distance characteristics, offering insights into problems related to sphere packing, error-correcting codes, and the broader field of discrete geometry. The study of these configurations not only deepens understanding of the Pell equation itself but also provides a powerful tool for investigating geometric problems in higher dimensions.
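To ground the Pell-equation side, a short Python sketch (illustrative only; it does not reproduce the paper’s point-set construction) enumerates solutions of [latex]x^2 - d\,y^2 = 1[/latex] starting from the fundamental solution:

```python
import math

def pell_solutions(d: int, count: int = 5) -> list[tuple[int, int]]:
    """First `count` positive solutions (x, y) of x^2 - d*y^2 = 1 for non-square d."""
    # Find the fundamental solution by brute force over y (fine for small d;
    # continued fractions are the right tool for larger d).
    y = 1
    while True:
        x_sq = d * y * y + 1
        x = math.isqrt(x_sq)
        if x * x == x_sq:
            break
        y += 1
    x1, y1 = x, y
    solutions, xk, yk = [], x1, y1
    for _ in range(count):
        solutions.append((xk, yk))
        # Multiplying (xk + yk*sqrt(d)) by (x1 + y1*sqrt(d)) yields the next solution.
        xk, yk = xk * x1 + d * yk * y1, xk * y1 + yk * x1
    return solutions

print(pell_solutions(2))
# [(3, 2), (17, 12), (99, 70), (577, 408), (3363, 2378)]
```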

The concept of logarithmic capacity provides a crucial lens through which to analyze the distribution of points in multi-dimensional space, directly impacting the solvability of Diophantine equations. Recent research rigorously establishes that [latex]g_d(n)/d^{n-1}[/latex] tends to [latex]1/(n-1)![/latex] for [latex]n≥2[/latex] as the dimension [latex]d[/latex] grows, quantifying the number of points necessary to generate at least [latex]n[/latex] distinct distances within a [latex]d[/latex]-dimensional space. This finding illuminates the inherent ‘spread’ required of these point sets and offers a powerful tool for evaluating potential solutions to long-standing mathematical problems. Furthermore, the base case [latex]g_d(1) = 2[/latex] confirms that a minimum of two points is required to define a single distance, laying a foundational element for understanding higher-dimensional configurations.
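Read concretely, and assuming (as the normalization suggests) that the limit is taken as [latex]d[/latex] grows, the statement says [latex]g_d(n) \approx d^{n-1}/(n-1)![/latex] for large [latex]d[/latex]: roughly [latex]d[/latex] points before a second distinct distance becomes unavoidable ([latex]n=2[/latex]), roughly [latex]d^2/2[/latex] before a third ([latex]n=3[/latex]), and so on.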

The pursuit of automated theorem proving, as detailed in this study of the Erdős problems, feels… familiar. It’s a fresh coat of paint on a very old machine. This paper highlights Aletheia’s attempts to verify and even discover mathematical truths, but one is reminded of the inherent limitations. As Donald Knuth observed, “Premature optimization is the root of all evil.” This pursuit of AI-assisted mathematics, while seemingly novel, risks prioritizing elegant solutions over robust, verifiable ones. The system can suggest paths, but ultimately, human oversight remains crucial: a tedious necessity, perhaps, but one that prevents the machine from confidently building castles on sand. The core idea of tackling s-distance sets with AI is interesting, but it’s a complex problem that demands more than just statistical inference.

What’s Next?

The application of large language models to Erdős problems, or any established mathematical domain, reveals less a revolution than a new source of elegantly phrased, and potentially misleading, heuristics. Aletheia, and systems like it, will undoubtedly generate conjectures. Many will be previously known, some will be trivially false, and a statistically insignificant few will require actual verification. The real cost, predictably, isn’t computational; it’s human time spent distinguishing signal from noise. The “semi-autonomous” designation feels generous; it implies a degree of independent thought that currently exists only in marketing materials.

Future work will inevitably focus on automating the verification process, but that merely shifts the bottleneck. Even if a machine can prove a conjecture, understanding why it’s true – gaining genuine mathematical insight – remains firmly in the human domain. Expect a proliferation of tools that efficiently re-discover existing results, presented as novel breakthroughs. The field will measure progress not in solved problems, but in the sophistication of its automated truth-finders.

If code looks perfect, no one has deployed it yet. The current excitement around AI-assisted mathematics should be tempered by the understanding that any framework, no matter how theoretically sound, will ultimately be judged by its practical failures. The true test won’t be whether these models can find problems, but whether they can withstand the relentless pressure of production mathematics, where elegance is secondary to correctness, and the most valuable contribution is often a well-documented bug fix.


Original article: https://arxiv.org/pdf/2601.22401.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-02-02 16:45