Beyond Algorithms: Reimagining Recommendations with Large Language Models

Author: Denis Avetisyan


A new framework, OpenOneRec, leverages generative AI to build more flexible and performant recommendation systems.

The OneRec series models utilize a post-training pipeline to refine performance and optimize functionality.

OpenOneRec introduces a novel benchmark, RecIF-Bench, and open-source models demonstrating state-of-the-art performance and strong transfer learning capabilities for instruction-following recommendation.

Despite advances in unifying recommendation pipelines, current systems remain limited by isolated data and a lack of general intelligence: they are proficient in pattern matching but lack reasoning and instruction-following ability. The OpenOneRec Technical Report addresses this gap by introducing a framework leveraging large language models, alongside the RecIF-Bench benchmark and a substantial open-source dataset, to evaluate and advance holistic recommender capabilities. This work demonstrates state-of-the-art performance with the OneRec-Foundation models, which achieve significant gains on both the RecIF-Bench and Amazon benchmarks, and showcases predictable scaling alongside mitigated catastrophic forgetting. Could this approach pave the way for truly intelligent recommender systems capable of more nuanced and effective user engagement?


Beyond Superficiality: Addressing the Limits of Conventional Recommendations

Conventional recommendation systems frequently encounter limitations when dealing with new users or items – a challenge known as the ‘cold-start’ problem. These systems typically rely on historical interaction data, making it difficult to provide relevant suggestions without sufficient prior information. Furthermore, traditional approaches often treat user preferences and item characteristics as isolated data points, failing to capture the nuanced relationships and underlying semantics that drive human decision-making. This reliance on simplistic data representation hinders their ability to understand why a user might prefer a particular item, limiting the accuracy and personalization of recommendations. Consequently, these systems struggle to effectively connect users with items that align with their complex and evolving needs, often resorting to popularity-based suggestions or superficial matching based on limited attributes.

The limitations of conventional recommendation systems, particularly their struggle with novel users or items and an inability to fully grasp nuanced preferences, are increasingly addressed by the capabilities of Large Language Models. These models, pretrained on vast amounts of text data, demonstrate an inherent understanding of natural language that allows them to interpret user intent far beyond simple keyword matching. However, directly applying LLMs to recommendation isn’t seamless; they require specific adaptation through techniques like fine-tuning or prompt engineering to effectively translate language understanding into relevant item suggestions. This involves training the models on recommendation-specific datasets and optimizing their output for ranking and personalization, ensuring the rich semantic information gleaned from language translates into actionable and satisfying recommendations.

Successfully integrating Large Language Models into recommendation systems isn’t simply a matter of model size; it demands a nuanced approach to computational scaling. Research indicates that the performance of these models doesn’t increase linearly with added parameters or data. Instead, effective scaling follows specific power laws: a model’s size (N) exhibits a scaling exponent of 0.44, while the token budget (D) – representing the amount of text the model processes – scales with an exponent of 0.56. These exponents are critical because they demonstrate that simply increasing model size or data volume yields diminishing returns; optimized scaling requires a more strategic allocation of resources, prioritizing increases in the token budget to maximize performance gains while managing the substantial computational costs associated with these complex models.

Analysis of recommendation models reveals scaling laws where training loss decreases with increasing compute, and optimal model and token budgets scale as <span class="katex-eq" data-katex-display="false">C^{0.44}</span> and <span class="katex-eq" data-katex-display="false">C^{0.56}</span> respectively, with compute budget <span class="katex-eq" data-katex-display="false">C</span>.
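
Expressed as code, the compute-optimal split can be read off directly from those exponents. The sketch below assumes the simple power-law form N* ∝ C^0.44 and D* ∝ C^0.56 with hypothetical normalization constants, since the fitted coefficients from the report are not reproduced here.

```python
# Minimal sketch of the reported compute-optimal scaling rule.
# The normalization constants k_n and k_d are hypothetical placeholders;
# only the exponents (0.44 for model size, 0.56 for token budget) come
# from the scaling analysis described above.

def optimal_allocation(compute_budget: float, k_n: float = 1.0, k_d: float = 1.0):
    """Split a compute budget C into model size N* and token budget D*."""
    n_params = k_n * compute_budget ** 0.44   # optimal parameter count
    n_tokens = k_d * compute_budget ** 0.56   # optimal training tokens
    return n_params, n_tokens

# Doubling compute grows the token budget faster (2**0.56 ≈ 1.47x)
# than the model size (2**0.44 ≈ 1.36x), matching the "prioritize tokens" point.
for c in (1e20, 2e20, 4e20):
    n, d = optimal_allocation(c)
    print(f"C={c:.0e}  N*={n:.3e}  D*={d:.3e}")
```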

A Unified Framework: OpenOneRec for Holistic Recommendation

OpenOneRec provides a complete workflow for developing Large Language Model (LLM)-based recommendation systems, structured around three core phases: data processing, pre-training, and post-training. The data processing stage focuses on preparing heterogeneous user and item data for model input, including feature engineering and sequence construction. Pre-training involves training LLMs to understand both general knowledge and recommendation-specific patterns through a co-pretraining strategy. Finally, the post-training phase fine-tunes the pre-trained models using specific recommendation tasks, such as predicting user preferences or ranking items, optimizing performance for deployment in real-world scenarios. This holistic approach aims to address the entire lifecycle of building and deploying LLM-powered recommendation systems.
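
To make the shape of that workflow concrete, here is a minimal skeleton with placeholder function names and bodies; it illustrates the three phases rather than reproducing OpenOneRec's actual API.

```python
# Skeleton of the three-phase workflow described above. Function names,
# signatures, and bodies are illustrative stand-ins, not OpenOneRec code.

def process_data(raw_interactions):
    """Turn raw logs into per-user interaction sequences for model input."""
    return [{"user": user, "history": items} for user, items in raw_interactions.items()]

def pretrain(model, general_corpus, rec_sequences):
    """Co-pretrain on general text mixed with recommendation sequences."""
    raise NotImplementedError("stand-in for the pre-training stage")

def post_train(model, labeled_tasks):
    """Supervised fine-tuning and RL on recommendation-specific objectives."""
    raise NotImplementedError("stand-in for the post-training stage")

sequences = process_data({"u1": ["item_3", "item_7"], "u2": ["item_1"]})
print(sequences)
```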

The OneRec-Foundation model series utilizes the Qwen architecture, a transformer-based language model known for its strong performance and scalability. These models are available in multiple parameter sizes – 7B, 14B, and 72B – to accommodate varying computational resources and performance requirements. The Qwen architecture employs techniques such as rotary positional embeddings and grouped-query attention to improve efficiency and handle long sequences effectively, crucial for modeling user behavior and item characteristics within recommendation systems. These pre-trained foundation models serve as a robust starting point, enabling efficient fine-tuning for specific recommendation tasks and datasets.
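
As a rough illustration of how grouped-query attention shares a small set of key/value heads across many query heads, consider the following PyTorch sketch; the head counts and dimensions are illustrative and not Qwen's actual configuration.

```python
# Grouped-query attention sketch: a few KV heads are repeated so that each
# group of query heads attends against shared keys and values, shrinking the
# KV projection (and KV cache) relative to full multi-head attention.
import torch

def grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2):
    b, t, d = x.shape
    head_dim = d // n_q_heads
    q = (x @ wq).view(b, t, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Repeat each KV head so it is shared by a group of query heads.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim ** 0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, t, d)

d_model = 512
x = torch.randn(2, 16, d_model)
wq = torch.randn(d_model, d_model)
wk = torch.randn(d_model, d_model // 4)   # fewer KV heads -> smaller KV projection
wv = torch.randn(d_model, d_model // 4)
print(grouped_query_attention(x, wq, wk, wv).shape)  # torch.Size([2, 16, 512])
```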

OpenOneRec utilizes a co-pretraining strategy to improve recommendation model performance by concurrently training on both broad general knowledge corpora and datasets specific to user-item interactions. This approach contrasts with traditional pre-training methods that focus solely on one data source. By integrating general knowledge, the model develops a more robust understanding of entities and relationships, while exposure to recommendation-specific data allows it to directly learn patterns of user behavior and item characteristics. The simultaneous learning process facilitates knowledge transfer between domains, resulting in enhanced generalization capability and improved accuracy in predicting user preferences and recommending relevant items.
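
A minimal way to picture co-pretraining is a sampler that interleaves the two corpora at a fixed ratio. The sketch below uses a hypothetical mix ratio and toy corpora; the actual data mixture used by OpenOneRec is not specified here.

```python
# Mixed-domain sampling sketch for co-pretraining: each batch draws a fixed
# fraction of examples from recommendation data and the rest from a general
# text corpus. The 30% ratio is an assumption for illustration only.
import random

def co_pretraining_batches(general_corpus, rec_corpus,
                           rec_ratio=0.3, batch_size=8, n_batches=10):
    """Yield batches drawing roughly rec_ratio of examples from rec data."""
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            source = rec_corpus if random.random() < rec_ratio else general_corpus
            batch.append(random.choice(source))
        yield batch

general = [f"general_doc_{i}" for i in range(100)]
rec = [f"user_item_sequence_{i}" for i in range(100)]
for batch in co_pretraining_batches(general, rec, n_batches=2):
    print(batch)
```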

OneRec-Foundation leverages item-text alignment and mixed-domain co-pretraining for initial learning, followed by supervised fine-tuning and reinforcement learning with knowledge distillation to achieve robust recommendation performance and cross-domain transferability, as validated on RecIF-Bench and Amazon datasets.
OneRec-Foundation leverages item-text alignment and mixed-domain co-pretraining for initial learning, followed by supervised fine-tuning and reinforcement learning with knowledge distillation to achieve robust recommendation performance and cross-domain transferability, as validated on RecIF-Bench and Amazon datasets.

Refinement Through Optimization: Post-Training Techniques for Enhanced Performance

Supervised Fine-tuning within OpenOneRec’s post-training optimization utilizes existing labeled datasets to further train the model after initial pre-training. This process adjusts model weights to minimize the difference between predicted outputs and known correct labels, improving performance on specific tasks or datasets. The labeled data provides direct feedback, enabling the model to refine its understanding of input features and their relationship to desired outputs. This is particularly effective when addressing data distributions or user behaviors not fully captured during pre-training, and can significantly enhance metrics like precision, recall, and click-through rate on targeted recommendations.
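
In practice, a supervised fine-tuning step reduces to a cross-entropy update on labeled target tokens. The following PyTorch sketch uses a toy stand-in model and masks non-target positions with -100; it is illustrative rather than the OneRec training code.

```python
# One supervised fine-tuning step: next-token cross-entropy against labeled
# targets, with -100 marking positions excluded from the loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sft_step(model, optimizer, input_ids, labels):
    """Run a single gradient step and return the scalar loss."""
    logits = model(input_ids)                          # (batch, seq_len, vocab)
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy stand-in for a language model: embedding followed by a vocab projection.
vocab = 1000
toy_model = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))
optimizer = torch.optim.AdamW(toy_model.parameters(), lr=1e-4)

input_ids = torch.randint(0, vocab, (4, 32))
labels = input_ids.clone()
labels[:, :16] = -100                                  # supervise only the response tokens
print(sft_step(toy_model, optimizer, input_ids, labels))
```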

On-policy distillation in OpenOneRec trains a student model to mimic the output distribution of a pre-trained teacher model on sequences sampled from the student’s own policy, keeping the supervision aligned with the data the student actually generates. This process transfers knowledge about action selection and state representation, improving generalization, particularly when labeled data is limited. Because the student is typically smaller than the teacher, it is also more computationally efficient at inference time, reducing latency and resource consumption without significant performance degradation. The distillation loss measures the divergence between student and teacher output distributions, here via a reverse KL term, potentially combined with regularization terms that encourage knowledge transfer and prevent overfitting.
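
A minimal version of that objective is a token-level reverse KL between the student's and teacher's output distributions on student-sampled sequences, as sketched below with placeholder logit tensors standing in for the two models' outputs.

```python
# Reverse KL distillation loss KL(student || teacher), computed per token and
# averaged. The random logits are placeholders for student/teacher forward
# passes over sequences sampled from the student policy.
import torch
import torch.nn.functional as F

def reverse_kl_distillation_loss(student_logits, teacher_logits):
    """Mean token-level reverse KL divergence between the two distributions."""
    log_p_student = F.log_softmax(student_logits, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits, dim=-1)
    p_student = log_p_student.exp()
    # KL(student || teacher) = sum_x p_s(x) * (log p_s(x) - log p_t(x))
    kl = (p_student * (log_p_student - log_p_teacher)).sum(dim=-1)
    return kl.mean()

student_logits = torch.randn(4, 32, 5_000)   # (batch, seq_len, vocab)
teacher_logits = torch.randn(4, 32, 5_000)
print(reverse_kl_distillation_loss(student_logits, teacher_logits))
```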

Following initial training, OpenOneRec utilizes Reinforcement Learning (RL) with the Group Relative Policy Optimization (GRPO) algorithm to further refine model performance. GRPO optimizes the recommendation model by directly maximizing cumulative rewards derived from user interactions. This involves defining a reward function that quantifies the value of a recommendation, typically based on metrics such as click-through rate, conversion rate, or time spent on an item. The algorithm then iteratively adjusts the model’s policy – its strategy for selecting recommendations – to increase expected reward. Rather than relying on a separate value model, GRPO estimates each sampled recommendation’s advantage relative to a group of responses for the same prompt, enabling efficient policy-gradient updates from real-world user feedback.
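
One distinguishing detail of GRPO is that group-relative advantage: several responses are sampled per prompt and each reward is normalized against its group's statistics. The sketch below shows only that normalization step, with illustrative reward values; it is not the full GRPO update.

```python
# Group-relative advantage used in GRPO: rewards for a group of sampled
# responses to the same prompt are standardized within the group, so no
# separate value model is needed. Reward values below are illustrative.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    """rewards: (n_prompts, group_size) -> advantages of the same shape."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four sampled recommendations each, rewarded e.g. by a click signal.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))
```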

On-policy distillation via policy gradient optimizes a student policy model by iteratively refining its trajectory sampling, guided by reverse <span class="katex-eq" data-katex-display="false">KL</span> divergence feedback from a teacher model, to maximize reward.

Beyond Correlation: Bridging Semantics for Robust and Adaptable Recommendations

OpenOneRec introduces Itemic-Text Alignment, a technique designed to forge a direct link between item representations and their corresponding natural language descriptions. This process moves beyond traditional methods that treat these as separate entities; instead, the system learns to understand the semantic relationships between an item and how it’s described. By aligning these two modalities, the model develops a richer, more nuanced understanding of each item – not just its inherent characteristics, but also how it’s perceived and understood by users through language. This allows the system to capture subtle distinctions and contextual information, ultimately improving the relevance and quality of recommendations by leveraging the descriptive power of language alongside item features.
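
One common way to realize such an alignment is a symmetric contrastive (InfoNCE-style) loss between paired item and description embeddings. The sketch below assumes that form with placeholder encoders and a hypothetical temperature; the article does not specify the exact objective used.

```python
# Item-text alignment sketch: a symmetric InfoNCE loss that pulls each item
# embedding toward the embedding of its own description and away from other
# descriptions in the batch. Embeddings here are random placeholders for the
# outputs of an item encoder and a text encoder.
import torch
import torch.nn.functional as F

def item_text_alignment_loss(item_emb, text_emb, temperature=0.07):
    """item_emb, text_emb: (batch, dim) paired representations."""
    item_emb = F.normalize(item_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = item_emb @ text_emb.t() / temperature     # (batch, batch) similarities
    targets = torch.arange(item_emb.size(0))           # matched pairs on the diagonal
    # Align in both directions: item -> text and text -> item.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

item_emb = torch.randn(16, 256)
text_emb = torch.randn(16, 256)
print(item_text_alignment_loss(item_emb, text_emb))
```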

The ability to discern semantic relationships between items and their descriptions allows the model to excel in scenarios where data is limited, commonly known as the cold-start problem. By understanding the underlying meaning of an item – beyond simple keyword matching – the system can effectively recommend relevant products even with minimal user interaction data. This nuanced comprehension extends beyond simply finding similar items; it facilitates the identification of items that align with a user’s expressed preferences or the contextual meaning of their search, resulting in more personalized and insightful recommendations. Consequently, users benefit from a more diverse and relevant selection, while the system demonstrates enhanced adaptability and predictive accuracy in dynamic recommendation landscapes.

Comprehensive evaluation of the framework utilized the RecIF-Bench benchmark, a rigorous testing ground for recommendation systems, and yielded state-of-the-art performance metrics. Results demonstrated an average improvement of 26.8% in Recall@10 when assessed against the Amazon benchmark, signifying a substantial advancement in recommendation accuracy. This metric, focusing on the ability to successfully recommend relevant items within the top ten predictions, highlights the framework’s enhanced capacity to deliver pertinent suggestions to users. The observed gains underscore the efficacy of the implemented techniques in addressing the challenges of real-world recommendation scenarios and establishing a new standard for performance in the field.

Achieving state-of-the-art performance on recommendation benchmarks like RecIF-Bench and demonstrating strong cross-domain transferability on Amazon datasets, our model effectively balances specialized task performance with the retention of general knowledge.

The pursuit of elegant solutions permeates the OpenOneRec framework, mirroring a dedication to distilling complexity. This focus on parsimony aligns with Paul Erdős’s assertion: “A mathematician knows a lot of things, but a good mathematician knows which ones to use.” The framework’s emphasis on transfer learning, showcased by state-of-the-art performance on RecIF-Bench, exemplifies this principle. Rather than reinventing the wheel with each new recommendation task, OpenOneRec leverages pre-trained large language models, selectively applying existing knowledge to achieve optimal results. The result isn’t merely about achieving high accuracy, but about doing so with the minimum necessary computational overhead, a testament to the beauty of simplicity in generative AI.

What Lies Ahead?

The current work establishes a functional, if predictably complex, interface between large language models and recommendation tasks. The demonstrated performance, while notable, merely shifts the central problem. It does not solve it. The efficacy of transfer learning, highlighted by OpenOneRec, suggests the true challenge lies not in model architecture, but in curating datasets that expose fundamental relational structures, not superficial correlations. The benchmark, RecIF-Bench, is a necessary, though inevitably transient, step; future iterations must prioritize robustness to adversarial examples and account for the non-stationarity inherent in user behavior.

The framework’s reliance on instruction following, while elegant, introduces a dependency on prompt engineering. This is a brittle solution. The field must move beyond coaxing emergent behavior from opaque models and toward systems grounded in explicit knowledge representation. A truly lossless compression of user preference would not require constant prompting, but internal, verifiable reasoning.

Ultimately, the question is not whether large language models can build recommendations, but whether they necessitate a fundamental re-evaluation of what a recommendation is. The pursuit of increasingly accurate predictions risks obscuring the more meaningful goal: facilitating genuine discovery, not merely mirroring existing biases. The next iteration should focus not on maximizing clicks, but on minimizing regret.


Original article: https://arxiv.org/pdf/2512.24762.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
