AI-Powered Slide Design: From Text to Teaching

Author: Denis Avetisyan


A new framework harnesses the power of artificial intelligence to automatically create informative and engaging slide presentations for educational settings.

A system orchestrates automated presentation creation through a three-stage pipeline (content retrieval coordinated by a central agent, translation into LaTeX Beamer code, and enhancement with visuals and commentary), using feedback loops to improve adaptability and keep quality consistent throughout the process.

SlideBot leverages large language models, retrieval-augmented generation, and cognitive load theory to produce high-quality, pedagogically sound multi-modal presentations.

While large language models show promise in automating educational tasks, generating effective and reliable multi-modal presentations remains a significant challenge. The paper "SlideBot: A Multi-Agent Framework for Generating Informative, Reliable, Multi-Modal Presentations" introduces a modular, multi-agent system that addresses these limitations by integrating LLMs with retrieval, structured planning, and code generation. By grounding content in external sources and applying cognitive learning principles, SlideBot is reported to improve the conceptual accuracy and instructional value of generated slides. Could this framework ultimately redefine slide preparation and broaden access to high-quality educational materials?


Decoding Cognitive Load: The Limits of Attention

The human mind possesses a limited capacity for processing information, a reality often overlooked in conventional learning environments. Traditional methods, such as densely packed lectures or text-heavy slides, frequently overwhelm this capacity by introducing extraneous cognitive load – mental effort devoted to processing irrelevant details rather than the core learning material. This overload doesn’t simply slow down comprehension; it actively hinders the ability to build lasting knowledge. The brain, faced with excessive stimulation, struggles to filter essential information from the noise, diminishing available resources for deep processing and knowledge construction. Consequently, learners may experience frustration, reduced motivation, and ultimately, impaired retention, even when presented with valuable content. Optimizing learning, therefore, requires a deliberate shift towards minimizing distractions and streamlining information presentation to align with the mind’s inherent processing limitations.

Cognitive Load Theory proposes that learning isn’t simply about the amount of information presented, but rather how that information is processed by working memory. This theory categorizes cognitive load into three distinct types: intrinsic, which is the inherent difficulty of the material itself; germane, representing the deep processing and schema construction that leads to genuine understanding; and extraneous, which arises from poorly designed instructional materials or unnecessary distractions. Critically, extraneous cognitive load actively impedes learning by consuming valuable mental resources that could otherwise be devoted to germane processing. When instructional designs force learners to contend with irrelevant details or confusing presentations, the capacity for building lasting knowledge is significantly diminished, highlighting the importance of minimizing extraneous load to optimize the learning experience.

Contrary to intuition, increasing the sheer volume of learning materials doesn’t necessarily enhance understanding. Research demonstrates that effective knowledge transfer isn’t about more content, but about optimizing cognitive effort. The key lies in minimizing extraneous cognitive load – distractions and inefficient presentation hindering processing – and maximizing germane load, which represents the deep processing crucial for building lasting schemas. This means prioritizing clarity, reducing unnecessary complexity, and encouraging active thinking, such as through problem-solving and elaboration, to facilitate genuine learning rather than simply overwhelming working memory. A focus on quality over quantity, and fostering deep processing, ultimately unlocks more effective and enduring knowledge acquisition.

Unlike Copilot and GPT-4o, which produce superficial slides lacking depth and citations, SlideBot leverages information retrieval and content enhancement to generate focused, pedagogically sound LaTeX Beamer presentations, as demonstrated on the topic of Manifold Learning.

SlideBot: An Agentic System for Controlled Information Flow

SlideBot utilizes an agentic framework to address the complexities of automated presentation creation by breaking down the overall process into discrete, manageable tasks. This decomposition allows for specialization; rather than a single model attempting to handle all aspects of presentation design and content generation, individual agents are assigned specific roles. These roles include planning the presentation’s structure, retrieving relevant information, generating content, enhancing visual elements, and moderating the final output. This modular approach improves efficiency, allows for targeted optimization of each task, and facilitates easier debugging and maintenance of the system compared to monolithic approaches to presentation generation.

Retrieval-Augmented Generation (RAG) is implemented within SlideBot to mitigate the risks of hallucination and ensure factual accuracy in generated presentation content. This process involves initially retrieving relevant documents from a specified knowledge base—which can include internal documentation, web resources, or curated datasets—based on the user’s presentation topic. The language model then utilizes both the retrieved content and the original prompt to formulate responses, effectively grounding the generated text in verifiable sources. By incorporating this retrieval step, SlideBot minimizes the generation of unsupported claims and enhances the relevance of the presentation material to the provided knowledge base, improving overall content quality and trustworthiness.
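The retrieve-then-generate step described above can be sketched in a few lines. This is a minimal illustration, not SlideBot's implementation: the word-overlap ranking, the `retrieve` and `build_grounded_prompt` helpers, and the prompt wording are all assumptions made for the example, and the call to a language model is deliberately left out.

```python
def retrieve(query, knowledge_base, k=2):
    """Rank passages by how many query words they share (toy relevance measure)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(query_words & set(doc["text"].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(topic, passages):
    """Combine the user's topic with retrieved passages so the model can cite them."""
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, 1)
    )
    return (
        f"Write slide content about: {topic}\n"
        "Use only the numbered passages below and cite them as [n].\n"
        "If the passages do not cover a point, say so instead of guessing.\n\n"
        f"{context}"
    )


# Hypothetical knowledge base; real entries would come from the retrieval stage.
knowledge_base = [
    {"source": "doc-a", "text": "Manifold learning finds low-dimensional structure in data"},
    {"source": "doc-b", "text": "Gradient descent minimizes a loss function"},
    {"source": "doc-c", "text": "Isomap is a manifold method based on geodesic distances"},
]

passages = retrieve("manifold learning", knowledge_base, k=2)
prompt = build_grounded_prompt("Manifold Learning", passages)
```

The resulting prompt carries both the original request and the retrieved text, which is what lets the generated slide content point back to a source.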

SlideBot’s functionality is structured around five modular agents. The Planner agent decomposes the user’s presentation request into a series of actionable steps and defines the overall structure. The Retriever agent then sources relevant information from external knowledge bases as the retrieval half of Retrieval-Augmented Generation (RAG). The Code Generator agent transforms the retrieved content into LaTeX Beamer code. The Enhancer agent improves the visual and textual quality of the generated slides. Finally, the Moderator agent oversees the entire process, ensuring coherence, accuracy, and adherence to the initial presentation goals.
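One way to picture this division of labor is as a set of functions that pass a shared state object along. The sketch below is an assumption-laden illustration rather than the authors' code: `DeckState`, the stubbed agent bodies, and the retry loop driven by the moderator are hypothetical, and the real agents would call an LLM and a retriever where these stubs return placeholder text.

```python
from dataclasses import dataclass, field


@dataclass
class DeckState:
    topic: str
    outline: list = field(default_factory=list)   # slide titles
    sources: dict = field(default_factory=dict)   # slide title -> passages
    frames: list = field(default_factory=list)    # Beamer frame strings
    issues: list = field(default_factory=list)    # problems found by the moderator


def planner(state):
    # Stub: a real planner would ask an LLM for an outline.
    state.outline = [f"What is {state.topic}?", f"Why {state.topic} matters", "Summary"]


def retriever(state):
    # Stub: a real retriever would query an external knowledge base.
    for title in state.outline:
        state.sources[title] = [f"Placeholder passage about: {title}"]


def code_generator(state):
    # Turns each planned slide into a Beamer frame (no LaTeX escaping in this sketch).
    state.frames = []
    for title in state.outline:
        items = "\n".join(f"  \\item {p}" for p in state.sources.get(title, []))
        state.frames.append(
            f"\\begin{{frame}}{{{title}}}\n\\begin{{itemize}}\n{items}\n"
            f"\\end{{itemize}}\n\\end{{frame}}"
        )


def enhancer(state):
    # Stub: attaches a speaker note to each frame.
    state.frames = [
        frame + f"\n\\note{{Talking point: {title}}}"
        for frame, title in zip(state.frames, state.outline)
    ]


def moderator(state):
    # Returns a list of problems; an empty list means the deck passes.
    issues = []
    if len(state.frames) != len(state.outline):
        issues.append("frame count does not match outline")
    for title in state.outline:
        if not state.sources.get(title):
            issues.append(f"no sources for slide: {title}")
    return issues


def run_pipeline(topic, max_rounds=2):
    state = DeckState(topic=topic)
    planner(state)
    for _ in range(max_rounds):  # feedback loop: retry while the moderator objects
        retriever(state)
        code_generator(state)
        enhancer(state)
        state.issues = moderator(state)
        if not state.issues:
            break
    return state


deck = run_pipeline("Manifold Learning")
```

Keeping each role behind its own function is what makes the modular claim concrete: any single stage can be replaced or tested without touching the others.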

Student and expert evaluations reveal that SlideBot consistently outperforms direct prompting with GPT-4o across key presentation quality metrics, including explanation style, structure, credibility, conceptual accuracy, and topic coverage.

Content Curation and Retrieval: The Foundation of Intelligent Slides

The Retriever Agent uses the arXiv Application Programming Interface (API) to search a continuously updated collection of scholarly papers in fields including physics, mathematics, computer science, quantitative biology, quantitative finance, and statistics. The API returns paper metadata, such as titles, abstracts, and links to the full text, which lets the agent retrieve research relevant to each slide. Sourcing from arXiv grounds the slides in preprints and published research, which supports the credibility and timeliness of the generated content. Because access is programmatic, retrieval and integration of scientific literature can be automated within the slide generation process.
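As a rough illustration of what programmatic access looks like, the sketch below builds a query URL for the public arXiv API endpoint and parses the Atom feed it returns. The helper names, the choice of the `all:` search field, and the sample feed are assumptions for the example; how SlideBot actually phrases its queries is not stated in the article. No network request is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # the API responds with an Atom feed


def build_query_url(topic, max_results=5):
    """Build a search URL; the quoted phrase form of `all:` is an illustrative choice."""
    params = {
        "search_query": f'all:"{topic}"',
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_ENDPOINT}?{urlencode(params)}"


def parse_feed(atom_xml):
    """Extract id, title, and abstract from each entry of an Atom response."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        papers.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            # Titles and abstracts may be wrapped across lines; collapse whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "summary": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
        })
    return papers


url = build_query_url("manifold learning", max_results=3)

# Hand-written sample in the Atom format, standing in for a real response.
sample_feed = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example
      Paper Title</title>
    <summary>A placeholder abstract.</summary>
  </entry>
</feed>"""

papers = parse_feed(sample_feed)
```

In a live setting the URL would be fetched with an HTTP client and the response body passed to `parse_feed`; the abstracts are then candidates for ranking against each slide's topic.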

BM25 is a ranking function used to estimate the relevance of documents to a given search query. It operates on the principle that documents containing query terms are more relevant, but accounts for term frequency and document length to prevent bias towards longer documents. The algorithm calculates a score based on the term frequency ($TF$) within a document, inverse document frequency ($IDF$) – reflecting the rarity of terms across the corpus – and document length normalization. Specifically, BM25 uses parameters $k_1$ and $b$ to control term frequency saturation and document length normalization, respectively. Higher scores indicate greater relevance, allowing the system to prioritize and deliver the most pertinent content for each slide based on the query.
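The scoring rule described above can be written out compactly. The sketch below uses one common formulation of BM25 (the smoothed IDF variant found in several open-source search libraries); the whitespace tokenizer, the default values $k_1 = 1.5$ and $b = 0.75$, and the sample documents are assumptions, since the article does not say which variant or parameters SlideBot uses.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query.

    score(D, Q) = sum over query terms q of
        IDF(q) * tf(q, D) * (k1 + 1) / (tf(q, D) + k1 * (1 - b + b * |D| / avgdl))
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger weight; the +1 keeps the IDF positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # k1 caps the benefit of repeated terms; b scales the length penalty.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "manifold learning reduces the dimensionality of data",
    "gradient descent optimizes a loss function",
    "graphs can approximate distances on a manifold",
]
scores = bm25_scores("manifold learning", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Setting `b=0` switches length normalization off entirely, while a larger `k1` lets repeated occurrences of a term keep adding to the score for longer before saturating.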

The system’s content retrieval process is designed to reduce cognitive load by prioritizing essential information for each slide, rather than presenting comprehensive but potentially overwhelming material. Expert review data indicates a 0.86 improvement in Conceptual Accuracy – a metric assessing the clarity and correctness of the concepts presented – when this streamlined approach is used. The result is consistent with the idea that reducing extraneous information supports comprehension, although it does not by itself establish a direct causal link to knowledge retention.

The Enhancer tool accurately translates a prewritten confusion matrix macro into its corresponding rendered output, demonstrating successful code generation.

From Framework to Output: Beamer Integration and Visual Coherence

SlideBot distinguishes itself through its integration with Beamer, a powerful LaTeX package renowned for crafting visually polished and structurally sound presentations. This choice moves beyond simple slide generation, embedding each presentation within a framework designed for professional quality and consistent formatting. Beamer’s capabilities allow for precise control over typography, color schemes, and layout, ensuring a cohesive aesthetic across all slides. The result is not merely a collection of bullet points, but a presentation that leverages mathematical notation – such as $E=mc^2$ – and complex diagrams with clarity and elegance. By prioritizing a robust underlying structure, SlideBot delivers presentations that enhance comprehension and maintain audience engagement through a refined and consistent visual experience.

The system’s use of LaTeX and the Beamer package isn’t merely about aesthetic polish; it fundamentally empowers instructors with granular control over presentation content. Beyond the automatically generated framework, educators can readily modify every element – from text and images to equations like $E=mc^2$ and complex diagrams – to align precisely with specific pedagogical goals. This adaptability extends to branding and institutional requirements, allowing for seamless integration of logos, color schemes, and preferred citation styles. Consequently, instructors aren’t constrained by a rigid template but instead possess a highly malleable foundation upon which to build presentations that are not only visually coherent but also uniquely tailored to enhance student understanding and achieve targeted learning outcomes.
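Because the output is ordinary LaTeX source, both generating it and editing it afterwards are straightforward. The sketch below shows one way a deck could be rendered to Beamer from structured slide data; the `Slide` class, the `render_deck` function, and the `Madrid` theme default are illustrative assumptions and not part of SlideBot as described. Titles are escaped as plain text, while bullets are emitted verbatim so that math such as $E=mc^2$ passes through untouched.

```python
from dataclasses import dataclass

_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape characters that have special meaning in LaTeX."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


@dataclass
class Slide:
    title: str     # plain text, escaped on output
    bullets: list  # LaTeX snippets, emitted verbatim so math survives


def render_deck(title, slides, theme="Madrid"):
    """Return a complete Beamer document as a string."""
    lines = [
        r"\documentclass{beamer}",
        rf"\usetheme{{{theme}}}",
        rf"\title{{{escape_latex(title)}}}",
        r"\begin{document}",
        r"\frame{\titlepage}",
    ]
    for slide in slides:
        lines.append(rf"\begin{{frame}}{{{escape_latex(slide.title)}}}")
        if slide.bullets:  # an empty itemize environment would not compile
            lines.append(r"\begin{itemize}")
            lines.extend(rf"  \item {bullet}" for bullet in slide.bullets)
            lines.append(r"\end{itemize}")
        lines.append(r"\end{frame}")
    lines.append(r"\end{document}")
    return "\n".join(lines)


deck_source = render_deck(
    "Manifold Learning",
    [Slide("Cost & Benefit", [r"Mass-energy equivalence: $E=mc^2$"])],
)
```

An instructor who wants a different look only needs to change the `\usetheme` line or add a logo command in the preamble of the generated file; the frames themselves stay untouched.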

SlideBot distinguishes itself from conventional presentation creation tools by employing an agentic approach, focusing on the deliberate structuring of information rather than simply responding to direct requests. This methodology demonstrably enhances learning outcomes; student surveys reveal a significant $1.71$ improvement in the clarity of explanations presented, coupled with a $1.58$ increase in overall suitability of the material. Importantly, the agentic design fosters heightened credibility, as indicated by a $2.42$ point increase in student assessments, suggesting that a thoughtfully organized presentation, prioritizing coherence and minimizing extraneous details, substantially impacts perception and knowledge retention.

The Enhancer system translates prewritten pseudocode into functional code, as demonstrated by the example macro and its rendered output.

The pursuit of automated presentation generation, as exemplified by SlideBot, isn’t merely about efficiency—it’s a deliberate dismantling of established pedagogical norms. The framework’s reliance on retrieval-augmented generation and cognitive load theory represents a systematic probing of how information is best conveyed. One might observe, as Marvin Minsky famously stated, “The more we learn about intelligence, the more we realize how much we don’t know about it.” SlideBot, in its attempt to distill complex topics into digestible slides, exposes the inherent imperfections in our understanding of effective teaching and learning – each generated slide a testament to the ongoing refinement of knowledge itself. The system doesn’t simply create presentations; it iteratively challenges our assumptions about communication.

What Breaks Down Next?

SlideBot, as presented, operates under the implicit assumption that ‘informative’ and ‘reliable’ are universally definable qualities within educational content. But what happens when the system encounters deliberately contradictory or ambiguous source material? The framework currently prioritizes synthesis; a challenge lies in building in mechanisms for detecting and flagging epistemological uncertainty, rather than smoothing it over with confident, yet potentially misleading, presentations. One could envision adversarial attacks, not on the LLM itself, but on the retrieval stage – seeding the knowledge base with subtly incorrect information to observe how gracefully the system fails.

The current reliance on cognitive load theory as a guiding principle is… convenient. It offers quantifiable metrics, but simplifies the messy reality of human learning. Future iterations should explore the introduction of learner-specific models – tailoring not just the content of the slides, but the very structure of the presentation to individual cognitive profiles. What if the system deliberately introduces controlled ‘cognitive friction’ – elements designed to encourage deeper processing, even at the expense of immediate comprehension?

Ultimately, SlideBot exemplifies a broader trend: automating the ‘telling’ of knowledge. The more interesting question isn’t whether the system can generate a good presentation, but whether it can expose the inherent limitations of the knowledge it presents. Can it design slides that highlight what isn’t known, rather than simply what is? That would be a truly disruptive – and likely unsettling – application of this technology.


Original article: https://arxiv.org/pdf/2511.09804.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2025-11-17 02:28