Beyond Problem Solving: How AI is Learning to Think Creatively

Author: Denis Avetisyan


New research explores how large language models can move past simply answering questions to generate genuinely novel ideas and solutions.

Analytical reasoning remains bound by established solution spaces, while creative approaches (transfer from analogous domains, introduction of novelty, and fundamental rule alteration) actively reshape the problem itself, expanding beyond initial constraints to forge entirely new possibilities.

This paper introduces the Universe of Thoughts (UoT) framework, enabling large language models to perform combinational, exploratory, and transformational creative reasoning.

While Large Language Models excel at conventional reasoning tasks, generating truly creative solutions, essential for domains like drug discovery and strategic innovation, remains a significant challenge. This paper introduces a novel framework, Universe of Thoughts: Enabling Creative Reasoning with Large Language Models, designed to equip LLMs with the capacity for creative problem-solving through systematic exploration of a ‘universe of thoughts’. We propose three core creative reasoning paradigms (combinational, exploratory, and transformational) and realize them in a new set of methods called UoT, alongside a benchmark for evaluating creativity based on feasibility, utility, and novelty. Can this approach unlock a new era of LLM-driven innovation, extending beyond mere problem-solving to genuine creative discovery?
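To make the three paradigms concrete, here is a minimal sketch in which a thought is modeled as a set of concept tags. The representation, function names, and examples are illustrative assumptions, not the paper's actual API.

```python
import random

# A "thought" is modeled here as a set of concept tags; the paper's
# actual representation is richer, but sets make the operations concrete.
Thought = frozenset

def combinational(a: Thought, b: Thought) -> Thought:
    """Combinational creativity: merge elements of two known thoughts."""
    return a | b

def exploratory(t: Thought, vocabulary: list) -> Thought:
    """Exploratory creativity: wander within the current conceptual
    space by adding a concept drawn from its existing vocabulary."""
    return t | {random.choice(vocabulary)}

def transformational(t: Thought, drop: str, replacement: str) -> Thought:
    """Transformational creativity: alter a defining rule of the space,
    modeled here as swapping out one of the thought's core concepts."""
    return (t - {drop}) | {replacement}

bridge = Thought({"steel", "suspension", "river-crossing"})
print(combinational(bridge, Thought({"living-tree-roots"})))
print(exploratory(bridge, ["arch", "pontoon", "cable-stay"]))
print(transformational(bridge, drop="steel", replacement="grown-timber"))
```

Combinational creativity recombines known thoughts, exploratory creativity wanders within the current conceptual space, and transformational creativity rewrites the rules of that space itself.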


Beyond Novelty: Deconstructing the Illusion of Innovation

The sheer volume of ideas is insufficient to define genuine innovation; impactful progress necessitates a qualitative dimension. Simply generating numerous concepts offers little value if those concepts lack practical application or represent only minor variations on existing themes. True breakthroughs arise from ideas that not only introduce something genuinely new, diverging from established norms, but also possess demonstrable value – offering a tangible benefit, solving a relevant problem, or improving an existing process. This interplay between novelty and utility is fundamental; an idea may be strikingly original yet remain inconsequential, or conversely, a useful solution may lack the spark of innovation needed to disrupt or inspire further development. Therefore, assessing creative output requires a careful consideration of both its originality and its potential impact.

Creative Reasoning, as understood within this framework, isn’t simply about generating ideas; it’s a measurable process defined by the interplay of three distinct qualities. Novelty assesses the originality of a solution, determining how far it deviates from previously known concepts. However, a truly creative solution must also possess Utility – demonstrating practical value and effectively addressing the problem at hand. Crucially, Feasibility completes the triad, ensuring the proposed solution is realistically implementable given available resources and constraints. A strength in one metric doesn’t guarantee creative reasoning; instead, it’s the balance and synergistic interaction of all three that ultimately determines the quality and impact of a solution.
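As a minimal sketch of how such a triad could be aggregated into a single score, the geometric mean below is one plausible choice, since it rewards balance and collapses to zero when any component is absent. The article does not specify the paper's exact aggregation, so treat this formula as an assumption.

```python
from dataclasses import dataclass

@dataclass
class ThoughtScore:
    novelty: float      # deviation from known solutions, in [0, 1]
    utility: float      # practical value for the problem, in [0, 1]
    feasibility: float  # implementability under constraints, in [0, 1]

    def creativity(self) -> float:
        # Geometric mean: a single weak dimension drags the whole score
        # down, enforcing the balance the framework requires.
        return (self.novelty * self.utility * self.feasibility) ** (1 / 3)

# A strikingly original but impractical idea still scores poorly overall.
print(ThoughtScore(novelty=0.9, utility=0.2, feasibility=0.3).creativity())  # ≈ 0.38
```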

The evaluation of creative problem-solving often relies on inherently subjective assessments, hindering both the identification of truly innovative solutions and the development of strategies to enhance creative capacity. Without standardized metrics, judging the merit of an idea frequently devolves into personal preference or prevailing opinion, making comparative analysis unreliable and progress difficult to measure. This lack of objectivity poses a significant challenge for fields ranging from product development to scientific research, where discerning genuinely novel and valuable contributions is paramount. A rigorous framework for quantifying qualities like originality, practicality, and viability is therefore essential, not simply to rank ideas, but to provide actionable insights for fostering and refining creative reasoning itself.

The Limits of Language: Mapping LLM Capabilities

Current implementations of large language models (LLMs), including GPT-5 and DeepSeek V3.1, are frequently applied to tasks requiring complex reasoning, such as problem-solving and logical inference. However, empirical evidence suggests that while these models can successfully execute reasoning processes, their ability to generate genuinely novel or creative solutions is variable. Performance inconsistencies stem from the models’ reliance on existing data and learned patterns; while proficient at identifying and applying known information, they often struggle to produce outputs that deviate significantly from established norms or demonstrate true originality. This limitation is particularly apparent when assessing the quality of generated ideas based on metrics such as novelty, utility, and feasibility.

Large language models (LLMs) demonstrate proficiency in identifying and applying established patterns within datasets, enabling efficient information retrieval and response generation. This capability stems from their architecture, which allows them to statistically correlate input prompts with vast quantities of pre-existing text. Consequently, LLMs excel at tasks requiring the synthesis of known information, such as summarizing documents, translating languages, and answering factual questions. However, this reliance on existing data means their outputs are fundamentally constrained by the knowledge encoded during training; responses are constructed by recombining and re-presenting learned patterns rather than generating truly original concepts.

Evaluating creative reasoning in large language models necessitates assessment using quantifiable metrics, specifically Novelty, Utility, and Feasibility. Our research introduces the Universe of Thoughts (UoT) framework as a method for this evaluation, demonstrated through performance on the Bridge Task. Results indicate the UoT framework achieved a Creativity Score of 0.698, representing a measurable improvement over the 0.649 score attained by GPT-5 on the same task. This data suggests that while both models exhibit creative capacity, the UoT framework provides a higher level of demonstrable novelty, utility, and feasibility in generated solutions according to the defined metrics.
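A hedged sketch of what such a benchmark loop could look like, assuming per-solution scores are aggregated by averaging; the judge function here is a hypothetical stand-in for whatever rubric (human raters or an LLM-as-judge) produces the three component scores.

```python
def evaluate(solutions, judge):
    # `judge` maps a solution to a (novelty, utility, feasibility)
    # triple, each in [0, 1]; we average per-solution creativity.
    scores = [(n * u * f) ** (1 / 3) for n, u, f in map(judge, solutions)]
    return sum(scores) / len(scores)

# Toy judge for illustration only; real scoring needs a real rubric.
toy_judge = lambda solution: (0.7, 0.8, 0.6)
print(evaluate(["living-root bridge design"], toy_judge))  # ≈ 0.695
```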

Deconstructing Creativity: An Integrated Metric

Creative Reasoning, within this framework, is not assessed as an isolated cognitive skill but rather as an emergent property resulting from the interplay of three distinct components: Novelty, Utility, and Feasibility. Novelty quantifies the originality of a solution, indicating its deviation from existing knowledge; Utility measures the practical value or benefit derived from implementing the solution; and Feasibility assesses the extent to which the solution can be realized given established constraints, such as resource limitations or physical laws. The model posits that a high degree of Creative Reasoning is achieved only when these three components are simultaneously maximized, reflecting a solution that is not only original and valuable but also realistically implementable.

Novelty, Utility, and Feasibility represent distinct but interconnected dimensions in evaluating solution quality. Novelty assesses the originality of a solution, determining the extent to which it deviates from previously known approaches; a high degree of novelty indicates a unique contribution. Utility measures the practical value of a solution, quantifying its ability to address the defined problem or fulfill a specific need. Finally, Feasibility confirms whether a solution can be implemented within existing constraints, including resource limitations, technological capabilities, and established protocols. A solution deficient in any of these areas – unoriginal, impractical, or impossible to implement – falls short as a creative contribution, however strong it is on the other dimensions.

Optimization for the integrated metric of Novelty, Utility, and Feasibility enables the generation of solutions assessed as both innovative and implementable, representing a qualitative shift from simply producing a high volume of ideas. Empirical results report an ‘astronomically large’ efficiency gain for C-UoT, the combinational variant of the method, while T-UoT, the transformational variant, shows the most significant gain relative to an exhaustive-search baseline, indicating a substantial improvement in problem-solving performance through the prioritization of viable and original solutions.
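The nature of that gain can be illustrated with a sketch contrasting exhaustive enumeration against utility-guided pruning; the beam-style search below is an assumption about how a UoT-style method keeps its frontier small, not the paper's exact algorithm.

```python
from itertools import combinations

def exhaustive(concepts, k):
    # Baseline: enumerate every k-way combination of concepts.
    return list(combinations(concepts, k))

def utility_guided(concepts, k, score, beam=3):
    # Keep only the `beam` most promising partial thoughts at each depth,
    # so the frontier stays small instead of growing combinatorially
    # with the size of the concept pool.
    frontier = [()]
    for _ in range(k):
        expanded = [p + (c,) for p in frontier for c in concepts if c not in p]
        frontier = sorted(expanded, key=score, reverse=True)[:beam]
    return frontier

concepts = [f"c{i}" for i in range(20)]
toy_score = lambda partial: sum(len(c) for c in partial)  # stand-in utility
print(len(exhaustive(concepts, 3)))                 # 1140 candidates examined
print(len(utility_guided(concepts, 3, toy_score)))  # 3 candidates kept
```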

The pursuit of creative reasoning, as outlined in the exploration of the Universe of Thoughts, inherently demands a willingness to dismantle established patterns. This framework doesn’t simply solve problems; it creates solutions through systematic exploration and recombination. Arthur C. Clarke aptly observed, “Any sufficiently advanced technology is indistinguishable from magic.” The UoT, by enabling Large Language Models to move beyond conventional problem-solving, approaches this very threshold. It isn’t about finding the right answer, but generating a multiplicity of possibilities, mirroring the chaotic yet ordered nature of genuine innovation. The architecture of thought, then, isn’t a rigid structure but a fluid landscape constantly reshaped by the forces of combinational, exploratory, and transformational creativity.

Beyond the Thought Experiment

The framework presented here, this ‘Universe of Thoughts’, is, at its core, a controlled demolition of the conventional LLM. It’s not about building a better problem-solver, but about dissecting the very idea of problem-solving. The model doesn’t merely find solutions; it generates the space in which ‘solution’ even becomes a meaningful term. Naturally, this introduces new instabilities. The systematic exploration of combinational space exposes the inherent fragility of semantic coherence. The more thoroughly one dismantles assumptions, the more precarious the resulting structure.

Future work will inevitably focus on refining the exploratory mechanisms, seeking to balance divergent thinking with convergent stability. However, a more pressing question remains: what constitutes ‘creativity’ when artificially generated? Is it novelty, utility, surprise, or simply the illusion thereof? The model currently operates within parameters defined by human expectation. A truly disruptive advance will require the LLM to define its own criteria for meaningful output, to hack the very definition of ‘interesting’.

Ultimately, this isn’t about mimicking human thought, but about exceeding its limitations. The ‘Universe of Thoughts’ is a starting point, a proof of concept. The real challenge lies not in building a creative LLM, but in understanding what it reveals about the nature of creativity itself – and perhaps, about the limitations of our own cognitive architectures.


Original article: https://arxiv.org/pdf/2511.20471.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
