Context is King: Smarter AI for Coding Education

Author: Denis Avetisyan


New research shows that grounding AI coding assistants in specific project context dramatically improves their ability to help students learn and understand code.

AI tool adoption isn’t uniform across a project’s lifecycle; rather, integration ebbs and flows with each phase, suggesting that successful implementation isn’t about blanket coverage but strategic alignment with evolving needs.

Integrating a locally deployed, repository-aware Large Language Model enhances AI assistance for code comprehension and project-specific tasks in software engineering education.

The increasing prevalence of generative AI presents both opportunity and challenge for modern software engineering education. This paper, ‘Learning to Code with Context: A Study-Based Approach’, details a study investigating the effective integration of AI assistance within a university-level programming project. Our findings demonstrate that a locally deployed, repository-aware Large Language Model (LLM) significantly improves the relevance and quality of AI support, particularly for tasks requiring code comprehension and project-specific context. How can these insights inform the development of curricula that equip students to responsibly and effectively leverage AI throughout the software development lifecycle?


The Evolving Landscape of Code and Cognition

The advent of generative artificial intelligence is fundamentally reshaping how students approach programming projects. No longer solely reliant on independent problem-solving and manual coding, students are increasingly incorporating AI tools – such as code completion engines and automated debugging assistants – into their workflows. This integration manifests as a shift from crafting code line-by-line to iteratively refining suggestions provided by AI, and from exhaustive debugging to leveraging AI-driven error detection. While this promises increased efficiency and accessibility, it also introduces a new dynamic where students must learn to effectively prompt, evaluate, and integrate AI-generated content, fundamentally altering the skills required for successful software development and demanding a re-evaluation of traditional pedagogical approaches.

The swift integration of generative artificial intelligence into programming education, while offering exciting possibilities, isn’t without potential drawbacks. Initial observations suggest that students utilizing these tools without sufficient guidance may produce code that, while syntactically correct, lacks a deep understanding of the underlying principles. This can manifest as an inability to debug effectively, adapt solutions to novel problems, or even fully grasp the logic of the code they submit. Consequently, educators face the critical task of carefully evaluating the impact of these tools, not simply on the final product, but on the learning process itself. A rigorous assessment of code quality, coupled with methods to gauge conceptual understanding, is therefore paramount to ensure that AI serves as a catalyst for genuine skill development, rather than a shortcut that hinders long-term competency.

A comprehensive understanding of how students are integrating – and potentially misusing – generative AI tools is now paramount to maximizing the benefits of these technologies in education. Current research indicates that while these tools offer significant potential to streamline workflows and enhance learning, unguided adoption can introduce substantial risks. Specifically, students may encounter instances of “hallucination,” where the AI generates factually incorrect or nonsensical code, or exhibit “tool misuse,” applying the AI inappropriately to tasks requiring fundamental understanding. Investigating the prevalence of these issues, alongside a detailed analysis of student experiences, is essential for developing targeted interventions and pedagogical strategies that harness the power of AI while safeguarding the integrity of the learning process and fostering genuine skill development.

AI tools demonstrate varying levels of helpfulness throughout different project phases.

Contextual Awareness: Grounding AI in the Codebase

A Repository-Aware LLM Assistant was developed to improve the quality of code suggestions for students by integrating Retrieval-Augmented Generation (RAG). This approach combines the predictive capabilities of a Large Language Model (LLM) with information retrieved from a designated code repository. Specifically, the system first identifies relevant code segments and documentation within the repository based on the student’s query. This retrieved context is then provided to the LLM as input, enabling it to generate code suggestions grounded in the project’s existing codebase. The implementation aims to address limitations of standalone LLMs by reducing the generation of inaccurate or irrelevant code, and increasing the relevance of suggestions to the specific project context.
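
As a rough sketch of the retrieval half of such a pipeline, the snippet below indexes a repository and returns the chunks most similar to a student’s question. The embedding model, file filters, and whole-file chunking used here are illustrative assumptions for a minimal example, not the authors’ implementation.

```python
# Minimal retrieval sketch for a repository-aware assistant (illustrative only).
# Assumes a small local embedding model; file types and chunking are placeholders.
from pathlib import Path
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any locally hosted embedder works

def index_repository(repo_root: str):
    """Embed source and documentation files found in the repository."""
    chunks = []
    for path in Path(repo_root).rglob("*"):
        if path.is_file() and path.suffix in {".py", ".java", ".md"}:
            chunks.append({"file": str(path), "text": path.read_text(errors="ignore")})
    vectors = embedder.encode([c["text"] for c in chunks], convert_to_tensor=True)
    return chunks, vectors

def retrieve(query: str, chunks, vectors, k: int = 3):
    """Return the k repository chunks most similar to the student's query."""
    query_vec = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, vectors)[0]
    top = scores.topk(k).indices.tolist()
    return [chunks[i] for i in top]
```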

Local deployment of the LLM assistant provides several key operational advantages. By hosting the model and associated data on-premise or within a private cloud environment, organizations maintain complete control over data access and processing, addressing data privacy concerns and ensuring compliance with relevant regulations. This configuration also guarantees reproducibility of results, as the model’s behavior is isolated from external dependencies and updates. Furthermore, local deployment facilitates fine-grained control over the AI’s behavior through parameter adjustments, custom training, and the implementation of specific guardrails, enabling tailored performance and alignment with specific organizational requirements.
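
To make “local deployment” concrete, the sketch below shows how an assistant might query a model served entirely on institutional hardware through an OpenAI-compatible endpoint such as the one vLLM exposes. The host, port, model name, and prompt are placeholders, not the study’s actual configuration.

```python
# Illustrative only: query a locally hosted model via an OpenAI-compatible endpoint,
# e.g. a server started with `vllm serve <model-name>` on the institution's own hardware.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local server; requests never leave the premises
    api_key="not-needed-locally",         # placeholder; a local server typically ignores it
)

response = client.chat.completions.create(
    model="deepseek-coder-v2",            # placeholder model name
    temperature=0.2,                      # one of the parameters kept under local control
    messages=[
        {"role": "system", "content": "You are a course assistant for the Battleship project."},
        {"role": "user", "content": "Explain what the Board class in our repository does."},
    ],
)
print(response.choices[0].message.content)
```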

The Repository-Aware LLM Assistant employs Contextual Grounding by directly accessing and incorporating relevant code and documentation from a specified project repository during the generation of responses. This process mitigates inaccuracies and reduces the occurrence of “hallucinations” – the generation of factually incorrect or nonsensical information – commonly observed in baseline Large Language Models (LLMs). By grounding responses in a defined knowledge base, the assistant provides more accurate and contextually appropriate code suggestions and explanations, thereby improving overall reliability and reducing the defect rate associated with LLM-generated content.
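
The grounding step itself can be pictured as prompt assembly: retrieved repository excerpts are placed in front of the model together with an instruction to answer only from them. The helper below is a hypothetical illustration of that idea; the instruction wording and truncation limit are assumptions, not the system’s actual prompt.

```python
# Hypothetical grounding step: paste retrieved repository chunks into the system prompt
# and instruct the model to answer only from them (wording is an illustrative example).
def build_grounded_prompt(question: str, retrieved_chunks: list[dict]) -> list[dict]:
    context = "\n\n".join(
        f"### {c['file']}\n{c['text'][:2000]}" for c in retrieved_chunks
    )
    system = (
        "Answer using only the repository excerpts below. "
        "If the excerpts do not contain the answer, say so instead of guessing.\n\n"
        + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

A message list built this way would then be sent to the locally served model shown in the previous sketch, keeping both retrieval and generation inside the project’s own environment.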

A local deployment stack leveraging Open WebUI, Docling, and vLLM on a DGX-H100 host enables chat, knowledge retrieval, tool use, and document grounding with GitLab repository integration.

The Battleship Project: A Testbed for LLM Performance

The Battleship Project served as the testing ground for evaluating the performance of several large language models (LLMs), including DeepSeek-Coder-V2, Qwen3-235B, and Mistral-Large-Instruct. These models were evaluated against the project’s requirements to assess their capabilities in a practical coding scenario. The selection of these specific LLMs allowed for a comparative analysis, ranging from a medium-sized model like DeepSeek-Coder-V2 (23.6B parameters) to larger models such as Qwen3-235B and Mistral-Large-Instruct. The project provided a standardized benchmark for measuring the models’ ability to generate, understand, and modify code within a defined context.

A user study was conducted to assess the usability of the tool and its effect on the quality of code produced by students. Data collection involved gathering direct feedback from students utilizing the assistant on the Battleship Project. This feedback encompassed subjective evaluations of the tool’s interface and workflow, as well as objective measurements of code correctness and efficiency. Analysis of student responses and code samples was performed to identify areas of strength and weakness in the tool’s design and functionality, ultimately informing iterative improvements and refinements.

Analysis of the Battleship Project identified a Defect Catalog encompassing common errors such as hallucination and tool misuse; the assistant demonstrated mitigation capabilities for these issues. Specifically, models utilizing repository awareness exhibited a reduction in the ‘Defect Rate (Hallucination)’ across all tested instances. Despite being a 23.6 billion parameter model – considered medium-sized within the test group – DeepSeek-Coder-V2 consistently generated usable outputs, indicating strong performance relative to its size. A persistent defect, ‘Defect Rate (Missing resource entry)’, was identified, but analysis suggests this stems from limitations within the retrieval-to-action pipeline, rather than inherent flaws in the language models themselves.
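
To make the Defect Catalog idea concrete, a tally over assistant responses might look like the hypothetical snippet below. The category names follow the article; the bookkeeping and the example numbers are purely illustrative, not the study’s evaluation code.

```python
# Hypothetical defect tally; categories follow the article's Defect Catalog,
# the data structure and rate computation are illustrative only.
from collections import Counter

DEFECT_CATEGORIES = ("hallucination", "tool_misuse", "missing_resource_entry")

def defect_rate(observations: list) -> dict:
    """observations: one entry per assistant response, None if no defect was found."""
    counts = Counter(obs for obs in observations if obs is not None)
    total = len(observations)
    return {cat: counts.get(cat, 0) / total for cat in DEFECT_CATEGORIES}

# Example: 10 responses with one hallucination and two missing resource entries.
print(defect_rate([None] * 7 + ["hallucination",
                                "missing_resource_entry",
                                "missing_resource_entry"]))
```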

Participants generally perceived the AI as useful for the project and expressed a willingness to utilize it in future endeavors.

Toward Adaptive Systems: Shaping AI for Educational Resilience

The demonstrable effectiveness of this AI-assisted learning system underscores a critical principle: successful integration hinges on contextual tailoring. Rather than applying generalized AI solutions, this research highlights the benefits of tools designed with the specific nuances of the learning environment in mind. By focusing on the particular needs and challenges within a defined educational setting, the system achieved a higher degree of accuracy and relevance in its support of students. This approach minimizes the potential for irrelevant or inaccurate responses, fostering a more productive and engaging learning experience. The positive outcomes suggest that future development should prioritize customization and adaptation, recognizing that a ‘one-size-fits-all’ approach to AI in education is unlikely to yield optimal results.

Prioritizing contextual grounding and local deployment represents a crucial strategy for responsible integration of artificial intelligence in education. By ensuring AI systems are deeply aware of the specific learning environment – including curriculum, student needs, and institutional resources – the potential for generating irrelevant, inaccurate, or even harmful content is significantly reduced. Local deployment further minimizes risks by keeping data processing within the educational institution’s control, addressing privacy concerns and enabling greater customization. This approach doesn’t simply leverage AI’s capabilities, but rather shapes them to align with pedagogical goals, ultimately maximizing the positive impact on student learning and fostering a more trustworthy and effective educational experience.

Ongoing development centers on a comprehensive refinement of the system’s error identification and correction capabilities, specifically through an expanded “Defect Catalog.” This catalog will meticulously document frequently occurring issues, allowing for the implementation of proactive strategies to mitigate them and enhance the learning experience. A key area of focus is reducing the “Defect Rate (Missing resource entry),” which currently hinders the seamless transition from information retrieval to actionable learning steps; improvements here promise a more fluid and effective educational process by ensuring students can readily access all necessary materials and complete tasks without interruption. The ultimate goal is a self-improving system capable of anticipating and resolving common errors, fostering a more robust and user-friendly AI-assisted learning environment.

The study illuminates a familiar pattern: systems, even those designed to assist in learning, are inherently subject to the pressures of context and evolution. As architectures mature, their relevance hinges not on inherent perfection, but on graceful adaptation. This research, focused on repository-aware LLMs, demonstrates a commitment to acknowledging that improvements age faster than comprehension allows: a locally deployed model isn’t simply a tool, but an attempt to anchor assistance within the specific ecosystem it serves. John von Neumann observed, “The best way to predict the future is to invent it.” This sentiment resonates with the proactive approach taken here; rather than passively accepting the limitations of general-purpose models, the authors actively shape an architecture optimized for the present, understanding that continuous refinement is the only sustainable path forward.

What Lies Ahead?

This work establishes a benchmark, a point on the timeline, for localized, context-aware AI assistance in software engineering education. However, the chronicle of this system’s development will inevitably reveal entropy. The current iteration, while promising, remains tethered to the quality of its training data and the limitations of retrieval-augmented generation. Future efforts must address the inherent fragility of knowledge representation; a repository-aware LLM is only as astute as the history it logs.

A critical, unresolved question concerns the scaling of such systems. Maintaining local LLM deployments for each student, or even each project, introduces logistical complexities. The true test lies not merely in improving code comprehension, but in fostering genuine creative problem-solving. Can an AI, even one steeped in contextual awareness, truly assist in the generation of novel algorithms, or will it remain a sophisticated echo of past solutions?

The field now faces a choice: pursue ever-larger models, chasing diminishing returns in generalization, or focus on building systems that age gracefully, systems that can adapt, learn, and refine their understanding over time, not simply accumulate data. The ultimate metric isn’t accuracy, but resilience: the capacity to remain useful as the landscape of software development continually shifts.


Original article: https://arxiv.org/pdf/2512.05242.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
