Beyond Lists: Smarter Recommendations with Generative Slate Planning

Author: Denis Avetisyan

A new framework, HiGR, leverages advanced generative techniques to move beyond simple item lists and create more diverse and satisfying recommendation slates for users.

HiGR employs a framework centered on semantic tokenization (achieved through a CRQ-VAE) and hierarchical slate decoding, enabling coarse-to-fine generation with global planning and item-specific refinement. An ORPO-based preference alignment module further improves output quality by iteratively optimizing against user feedback.

HiGR utilizes hierarchical planning, contrastive quantization, and preference alignment to improve the efficiency and quality of generative slate recommendation systems.

While generative models show promise in slate recommendation (presenting users with ranked lists of items), existing approaches struggle with inefficient decoding and semantically entangled item representations. To address these limitations, we introduce HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment. This novel framework decouples slate generation into hierarchical planning and item-level decoding, and it uses contrastive quantization to give item representations a clearer semantic structure. Experiments on a large-scale media platform demonstrate that HiGR achieves over 10% improvement in offline recommendation quality with a 5x speedup. Online A/B testing also shows significant gains in user engagement. Could this hierarchical approach unlock further advancements in personalized and efficient content delivery across diverse online platforms?

Beyond Simple Prediction: Embracing Generative Recommendation

Early recommendation systems frequently relied on collaborative filtering or content-based approaches, methods that often simplify the intricacies of human preference. These systems typically analyze past interactions – purchases, ratings, or clicks – to predict future behavior, but struggle to account for nuanced relationships between items or the evolving nature of user tastes. A core limitation arises from their inability to fully capture the context surrounding each interaction; a user might choose an item not simply because they inherently prefer it, but due to external factors like current trends, social influence, or even momentary mood. Consequently, these systems frequently produce recommendations that are either too predictable, lacking in novelty, or fail to truly reflect the user’s underlying – and often complex – desires. The inability to model these intricate connections limits their capacity to deliver genuinely personalized and satisfying experiences.

While sequential recommendation models excel at predicting the next item a user will interact with based on their history, their strength can become a limitation when aiming for discovery. These models, trained to maximize predictive accuracy, often reinforce existing preferences, leading to a “filter bubble” effect and hindering the introduction of genuinely novel or serendipitous recommendations. The very algorithms designed to understand user patterns can inadvertently restrict exposure to items outside of those established patterns, reducing the potential for users to explore new interests or stumble upon unexpected delights. This inherent conservatism stems from the optimization process – minimizing prediction error favors items similar to those already consumed, rather than promoting diversity or challenging user expectations.

Generative recommendation represents a fundamental shift in how recommendation systems are designed, moving away from predicting ratings or ranking items towards directly generating sequences of recommendations. Instead of classifying user-item interactions, this paradigm treats the task as a language modeling problem – akin to how large language models generate text. The system learns the underlying patterns of user behavior and item characteristics, then generates a plausible sequence of items a user might interact with next. This allows for more diverse and nuanced recommendations, as the model isn’t constrained by pre-defined item lists or similarity metrics; it can creatively combine preferences and suggest items a user might not have discovered otherwise. By framing recommendation as a generation task, researchers are unlocking the potential for systems that can proactively anticipate user needs and curate truly personalized experiences, extending beyond simply reflecting past behavior.
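The language-modeling analogy can be made concrete with a deliberately tiny sketch. A bigram model counts which item follows which in user histories, then generates a recommendation sequence one item at a time. Real generative recommenders, HiGR included, use large neural sequence models over learned item tokens. The counting model, the function names (`train_bigram`, `generate`), and the toy histories below are illustrative assumptions only.

```python
from collections import Counter, defaultdict


def train_bigram(histories):
    """Count item-to-item transitions: a toy stand-in for a learned sequence model."""
    transitions = defaultdict(Counter)
    for seq in histories:
        for prev, nxt in zip(seq, seq[1:]):
            transitions[prev][nxt] += 1
    return transitions


def generate(transitions, start, length):
    """Greedily extend a sequence one item at a time, never repeating an item."""
    seq = [start]
    while len(seq) < length:
        # Candidate next items observed after the last item, excluding ones already emitted.
        candidates = [(count, item) for item, count in transitions.get(seq[-1], {}).items()
                      if item not in seq]
        if not candidates:
            break  # nothing observed after this item: stop early
        # Highest count wins; ties go to the alphabetically first item for determinism.
        _, best = min(candidates, key=lambda c: (-c[0], c[1]))
        seq.append(best)
    return seq


histories = [["a", "b", "c"], ["a", "b", "d"], ["b", "c", "e"]]
model = train_bigram(histories)
print(generate(model, "a", 4))  # ['a', 'b', 'c', 'e']
```

A neural generator replaces the counts with learned next-token probabilities, and it typically samples or beam-searches rather than decoding greedily.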

These generative systems increasingly build on large language models. The models are pre-trained on massive text and code corpora. They are then adapted to treat user behavior and item characteristics as elements of a sequential narrative rather than as discrete data points, so that generated item lists read less like algorithmic selections and more like continuations of a user’s established preferences. Framing recommendation as language modeling lets the system produce diverse, context-aware suggestions instead of only predicting the next interaction. The hoped-for result is a shift from reactive recommendation, which responds to past actions, to proactive curation that anticipates future interests and fosters deeper engagement.

The contrastive semantic ID pipeline establishes a method for identifying and distinguishing semantic information.

HiGR: A Hierarchical Approach to Refined Slate Generation

The Hierarchical Generative Slate Recommendation (HiGR) framework is a new approach to generating recommendation slates designed to overcome limitations in existing methods. Traditional slate generation often struggles with both computational efficiency and the creation of diverse, yet relevant, recommendations. HiGR addresses these challenges by structuring the generation process hierarchically, allowing for a coarse-to-fine refinement of potential slate compositions. This decomposition enables the system to explore a wider range of item combinations while maintaining a feasible computational cost, ultimately aiming to deliver higher-quality recommendation slates to users.

HiGR employs a Hierarchical Slate Decoder to generate recommendation slates through a two-stage process. Initially, a coarse-grained stage determines the high-level characteristics of the slate, such as overall theme or category distribution. This is followed by a fine-grained stage that populates the slate with specific items, conditioned on the output of the coarse stage. This decomposition improves computational efficiency by reducing the search space for relevant items and enhances diversity by allowing for controlled variation at both levels of the hierarchy. The hierarchical structure facilitates the generation of slates that balance relevance with novelty, addressing limitations of single-stage slate generation methods.
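The coarse-to-fine split can be sketched with two small functions. A coarse stage allots slate positions to categories, and a fine stage fills each position with an item from the assigned category. In HiGR both stages are learned decoders. The round-robin planner, the scores, and the names (`plan_slate`, `fill_slate`) here are hypothetical stand-ins. They are meant only to show how a plan narrows the item search at each position.

```python
def plan_slate(category_scores, slate_size, max_per_category):
    """Coarse stage: allot slate positions to categories, round-robin by score."""
    ranked = sorted(category_scores, key=category_scores.get, reverse=True)
    plan = []
    while len(plan) < slate_size:
        added = False
        for cat in ranked:
            if len(plan) < slate_size and plan.count(cat) < max_per_category:
                plan.append(cat)
                added = True
        if not added:
            break  # every category hit its cap before the slate was full
    return plan


def fill_slate(plan, items_by_category):
    """Fine stage: for each planned position, pick the best unused item in that category."""
    used, slate = set(), []
    for cat in plan:
        pool = [(score, item) for item, score in items_by_category.get(cat, [])
                if item not in used]
        if pool:
            _, best = max(pool)
            used.add(best)
            slate.append(best)
    return slate


category_scores = {"news": 0.9, "sports": 0.7, "music": 0.4}
items_by_category = {
    "news": [("n1", 0.8), ("n2", 0.6)],
    "sports": [("s1", 0.9), ("s2", 0.5)],
    "music": [("m1", 0.7)],
}
plan = plan_slate(category_scores, slate_size=5, max_per_category=2)
print(plan)                                 # ['news', 'sports', 'music', 'news', 'sports']
print(fill_slate(plan, items_by_category))  # ['n1', 's1', 'm1', 'n2', 's2']
```

Each fine-stage step scores only the items in one category instead of the whole catalog. The per-position search space therefore shrinks, which is the intuition behind the efficiency argument.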

HiGR uses a CRQ-VAE, a contrastive variant of the residual-quantized variational autoencoder (RQ-VAE), for semantic tokenization. Each item is represented as a short sequence of discrete codes derived from its semantic content rather than from surface-level features. The tokenizer reaches a Semantic Consistency of 66.47%, a measure of how coherently the learned codes group semantically related items. It also achieves a Collision Rate of 2.37%, the proportion of items whose semantic ID coincides with another item’s and therefore cannot be told apart. This low rate is attributed to the contrastive learning component, which explicitly pushes apart the representations of distinct items during encoding.
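Residual quantization is the mechanism behind RQ-VAE-style tokenizers, and it fits in a few lines. At each level, the encoder picks the codeword nearest to the current residual and subtracts it, so an item becomes a tuple of codes. The sketch also computes one common definition of collision rate: one minus the ratio of unique IDs to items. The hand-picked codebooks stand in for learned ones, and the contrastive term is omitted. HiGR’s exact metric definition may differ; all of this is assumed for illustration.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)))


def residual_quantize(vec, codebooks):
    """Map a vector to a tuple of codes, one per level, quantizing the running residual."""
    residual, codes = list(vec), []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        codes.append(k)
        # Subtract the chosen codeword; the next level encodes what is left over.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return tuple(codes)


def collision_rate(semantic_ids):
    """1 - unique IDs / items: the fraction of items that reuse an already-taken ID."""
    return 1 - len(set(semantic_ids)) / len(semantic_ids)


# Hand-picked two-level codebooks standing in for learned ones.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # level 1: coarse
    [[0.0, 0.0], [0.2, -0.2]],  # level 2: fine
]
items = [[1.0, 1.0], [1.2, 0.8], [0.1, 0.0], [0.0, 0.1]]
ids = [residual_quantize(v, codebooks) for v in items]
print(ids)                  # [(1, 0), (1, 1), (0, 0), (0, 0)]
print(collision_rate(ids))  # 0.25: the last two items share an ID
```

Adding levels or codewords lowers collisions but lengthens every ID. A contrastive objective is one way to separate distinct items without growing the codebooks.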

HiGR’s ability to generate slates that are both relevant and surprising stems from modeling item relationships beyond simple co-occurrence. Representing items as semantic tokens lets the framework identify nuanced connections that traditional methods may overlook. It can therefore include items that are not immediately obvious matches but share underlying semantics with the user’s preferences, which increases slate diversity. The resulting slates balance familiarity, through highly relevant items, with novelty. The model can propose somewhat surprising items that still remain conceptually aligned with the user’s interests.
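With hierarchical semantic IDs, items that share leading codes tend to be semantically close. This is one intuition behind recommending items that are related but not obvious. The hypothetical helper below ranks catalog items by the length of the code prefix they share with a query item. HiGR learns such relations inside its model rather than applying a hand-written rule, and the catalog and IDs here are made up.

```python
def shared_prefix_len(a, b):
    """Number of leading codes two semantic IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def related_items(query, catalog, min_prefix=1):
    """Rank other catalog items by shared prefix with the query item's semantic ID."""
    qid = catalog[query]
    scored = [(shared_prefix_len(qid, sid), name)
              for name, sid in catalog.items() if name != query]
    scored.sort(key=lambda t: (-t[0], t[1]))  # longest shared prefix first, then by name
    return [name for n, name in scored if n >= min_prefix]


# Hypothetical three-level semantic IDs.
catalog = {
    "jazz_live": (3, 1, 4),
    "jazz_studio": (3, 1, 7),
    "blues_doc": (3, 5, 2),
    "cooking": (8, 0, 1),
}
print(related_items("jazz_live", catalog))  # ['jazz_studio', 'blues_doc']
```

The `blues_doc` item matches only on the coarsest code. It is the kind of loosely related but non-obvious candidate a slate can use for novelty.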

HiGR demonstrates superior efficiency compared to OneRec.

Listwise Alignment: Optimizing for Holistic User Engagement

Listwise Preference Alignment within HiGR functions by optimizing the quality of recommendation slates – the complete set of items presented to a user – based on implicit feedback signals. This approach moves beyond optimizing individual item rankings and instead focuses on the overall user experience with the entire slate. Implicit feedback, such as video views, watch time, and request counts, is leveraged to determine slate-level quality. The system learns to prioritize slates that maximize these metrics, effectively aligning recommendations with user preferences as demonstrated through their interactions, without requiring explicit ratings or preferences.
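Implicit feedback can be turned into slate-level preference pairs in two steps. First, score each logged slate by a weighted sum of views, watch time, and requests. Then emit (chosen, rejected) pairs whose scores differ by more than a margin. The weights, the margin, and the `SlateLog` record are hypothetical; the source does not state how HiGR combines these signals.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class SlateLog:
    """Aggregated implicit feedback for one exposed slate (hypothetical schema)."""
    slate_id: str
    views: int
    watch_time_s: float
    requests: int


def slate_reward(log, w_views=1.0, w_watch=0.01, w_requests=0.5):
    """Weighted sum of implicit signals; the weights are illustrative, not from the paper."""
    return w_views * log.views + w_watch * log.watch_time_s + w_requests * log.requests


def preference_pairs(logs, min_margin=0.5):
    """Return (chosen, rejected) slate ids for pairs whose reward gap exceeds min_margin."""
    pairs = []
    for a, b in combinations(logs, 2):
        ra, rb = slate_reward(a), slate_reward(b)
        if abs(ra - rb) > min_margin:  # skip near-ties: too noisy to call a preference
            pairs.append((a.slate_id, b.slate_id) if ra > rb else (b.slate_id, a.slate_id))
    return pairs


logs = [
    SlateLog("A", views=3, watch_time_s=120.0, requests=2),  # reward about 5.2
    SlateLog("B", views=1, watch_time_s=30.0, requests=1),   # reward about 1.8
    SlateLog("C", views=2, watch_time_s=300.0, requests=0),  # reward about 5.0
]
print(preference_pairs(logs))  # [('A', 'B'), ('C', 'B')]
```

Slates A and C differ by less than the margin, so no pair is formed between them. This keeps weak, noisy preferences out of the alignment data.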

HiGR draws on three established preference optimization techniques to refine recommendation quality: Direct Preference Optimization (DPO), Simple Preference Optimization (SimPO), and Odds Ratio Preference Optimization (ORPO). Each method trains the model directly on pairs of preferred and dispreferred outputs. Here those pairs are derived from implicit feedback, such as which slates users engage with more.

- DPO uses a loss derived from the Bradley-Terry model, raising the likelihood of the preferred output relative to a frozen reference model.
- SimPO removes the reference model. It uses the length-normalized log-likelihood as an implicit reward, together with a target reward margin.
- ORPO also needs no reference model. It adds an odds-ratio penalty, which favors the preferred output’s odds over the dispreferred one’s, to the standard supervised likelihood loss.

Integrating these techniques lets HiGR go beyond traditional ranking metrics and align generated recommendations with observed user preferences at the slate level.
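For reference, the three objectives can be written over scalar sequence log-likelihoods. The sketch below follows the published forms of DPO, SimPO, and ORPO. How HiGR computes the log-likelihood of a slate, and which hyperparameters it uses, are assumptions here; the defaults shown are arbitrary.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """DPO: Bradley-Terry loss on policy-vs-reference log-likelihood ratios.

    pol_* / ref_*: summed log-likelihoods of the chosen (w) and rejected (l)
    outputs under the trained policy and the frozen reference model.
    """
    margin = beta * ((pol_w - ref_w) - (pol_l - ref_l))
    return -log_sigmoid(margin)


def simpo_loss(pol_w, pol_l, len_w, len_l, beta=2.0, gamma=0.5):
    """SimPO: reference-free, length-normalized reward with a target margin gamma."""
    margin = beta * pol_w / len_w - beta * pol_l / len_l - gamma
    return -log_sigmoid(margin)


def orpo_loss(pol_w, pol_l, len_w, len_l, lam=0.1):
    """ORPO: supervised NLL on the chosen output plus a log-odds-ratio penalty.

    Average log-likelihoods must be strictly negative (p < 1) for the odds to exist.
    """
    avg_w, avg_l = pol_w / len_w, pol_l / len_l

    def log_odds(avg_logp):
        # log(p / (1 - p)) with p = exp(avg_logp)
        return avg_logp - math.log1p(-math.exp(avg_logp))

    nll = -avg_w
    return nll - lam * log_sigmoid(log_odds(avg_w) - log_odds(avg_l))


# The chosen slate is more likely than the rejected one under the policy.
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))  # below log(2): chosen already favored vs. reference
print(simpo_loss(-4.0, -6.0, 2, 2))
print(orpo_loss(-2.0, -6.0, 2, 2))
```

DPO needs a second (reference) model at training time, whereas SimPO and ORPO do not. This is one practical reason a reference-free method such as ORPO can be attractive at recommendation scale.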

Online A/B testing of HiGR demonstrated statistically significant improvements in key user engagement metrics. Specifically, the implementation resulted in a 1.73% increase in Average Video Views, indicating a greater propensity for users to initiate video playback with HiGR recommendations. Average Watch Time increased by 1.22%, suggesting users were more engaged with the content served. Furthermore, the Average Request Count saw a 1.57% increase, reflecting a higher overall level of user interaction with the recommendation system.
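The A/B figures are relative lifts: the change in a metric’s mean divided by the control mean. The sketch below computes that lift and a Welch t-statistic on per-user samples. The sample values are invented, and production A/B analyses often use more elaborate variance estimates. Treat this as an illustration of the arithmetic, not of HiGR’s evaluation procedure.

```python
import math
from statistics import mean, variance


def relative_lift(control, treatment):
    """(treatment mean - control mean) / control mean."""
    return (mean(treatment) - mean(control)) / mean(control)


def welch_t(control, treatment):
    """Welch's t-statistic for a difference in means with unequal variances."""
    se = math.sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    return (mean(treatment) - mean(control)) / se


# Invented per-user video-view counts, for illustration only.
control = [10, 12, 11, 9, 13]
treatment = [11, 12, 12, 10, 13]
print(f"lift = {relative_lift(control, treatment):.2%}")  # lift = 5.45%
print(f"t = {welch_t(control, treatment):.2f}")           # t = 0.69
```

A 1.73% lift in Average Video Views means the treatment mean is 1.0173 times the control mean. A claim of statistical significance rests on the t-statistic compared against a t-distribution, with a sample far larger than this toy one.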

In short, Listwise Preference Alignment optimizes for user appeal, not diversity alone. It learns from implicit feedback through DPO, Simple Preference Optimization (SimPO), or ORPO, favoring the slates users demonstrably prefer. As a result, HiGR moves beyond merely varied results toward content aligned with individual tastes. The A/B gains in video views (+1.73%), watch time (+1.22%), and request count (+1.57%) support this.

HiGR consistently improves both convergence loss and NDCG@5 as model size increases, demonstrating its scalability.
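NDCG@5, the offline metric in the scaling curve, rewards placing highly relevant items near the top of a slate. The minimal sketch below uses linear gains; some implementations use 2^rel - 1 instead. The relevance labels are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain: rel / log2(rank + 1), rank from 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given order divided by DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance of items in the order a model ranked them (illustrative labels).
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], 5), 3))  # 0.861
```

A perfectly ordered slate scores 1.0. Pushing a relevant item further down costs more the closer it started to the top.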

Expanding the Horizon: HiGR’s Broad Applicability and Future Potential

HiGR establishes a new benchmark in generative recommendation, consistently surpassing established models such as ListCVAE, SASRec, BERT4Rec, TIGER, and HSTU across multiple datasets. According to the evaluation, its hierarchical generation process and refined preference alignment improve both recommendation accuracy and diversity. HiGR predicts user preferences more effectively and generates more relevant and engaging item sequences, suggesting a better grasp of the nuances of individual user behavior. Its consistent advantage over these baselines highlights the framework’s robustness and its potential for adoption in diverse recommendation applications.

The HiGR model distinguishes itself through a highly adaptable architecture, intentionally designed for straightforward incorporation of established and emerging recommendation techniques. This modularity facilitates the seamless integration of methods like contrastive learning, which enhances representation quality by drawing similar items closer and dissimilar ones further apart, and self-attention mechanisms, enabling the model to weigh the importance of different items within a user’s interaction history. By not being rigidly tied to a specific algorithmic approach, HiGR allows researchers and developers to readily experiment with and benefit from advancements in the field, customizing the model to optimize performance across a spectrum of datasets and recommendation tasks. This flexibility promises a sustained advantage, ensuring HiGR remains at the forefront of generative recommendation technology as the landscape evolves.
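The contrastive idea (pull similar items together, push dissimilar ones apart) is commonly implemented as an InfoNCE loss. The sketch below computes it for one anchor, using cosine similarity and a temperature. Whether HiGR’s contrastive components use this exact form is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: cross-entropy of picking the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exp for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]
near, far, opposite = [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]
# Low loss when the positive really is the most similar item...
print(info_nce(anchor, near, [far, opposite]))
# ...high loss when the "positive" is less similar than one of the negatives.
print(info_nce(anchor, far, [near, opposite]))
```

Minimizing this loss over many anchors moves positives toward their anchors and negatives away. A lower temperature sharpens how strongly near-miss negatives are penalized.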

The HiGR framework distinguishes itself through adaptability, successfully navigating a wide spectrum of recommendation challenges and data characteristics. Unlike models tailored to specific datasets or interaction types, HiGR’s architecture readily accommodates varying user behaviors, item attributes, and feedback modalities. Evaluations across multiple benchmark datasets – encompassing scenarios from e-commerce and movie recommendations to music streaming – demonstrate consistent performance gains. This versatility stems from its capacity to model hierarchical preferences and align recommendations with nuanced user interests, regardless of the underlying data structure. Consequently, HiGR offers a robust and generalizable solution, poised to enhance recommendation systems across a multitude of application domains and evolving data landscapes.

Evaluations confirm that HiGR’s hierarchical structure and preference alignment strategies demonstrably improve recommendation quality. By organizing user interactions and item attributes into multiple levels of abstraction, the model captures nuanced relationships often missed by traditional methods. This allows HiGR to not only predict what a user might like, but also why they might like it, leading to recommendations that are more aligned with individual tastes. The resulting increase in relevance and novelty fosters a more engaging user experience, potentially increasing interaction and satisfaction. Ultimately, this research suggests a pathway towards recommendation systems that move beyond simple prediction to deliver truly personalized content discovery, promising a future where recommendations feel intuitive and tailored to each user’s unique profile.

The development of HiGR exemplifies a systemic approach to slate recommendation, reflecting the belief that structure dictates behavior. The framework’s hierarchical planning and multi-objective preference alignment aren’t simply added features. They are integral components defining the system’s operational logic. As Donald Davies observed, “The difficulty is not in making systems complex, but in making them simple.” HiGR strives for this simplicity by breaking the recommendation process into manageable, interconnected layers. This improves efficiency without sacrificing quality, a testament to elegant design emerging from clarity and a holistic understanding of the recommendation ecosystem. The contrastive quantization technique, in particular, highlights the importance of well-defined boundaries within a complex structure.

Where Do We Go From Here?

The pursuit of generative slate recommendation, as exemplified by HiGR, inevitably encounters the fundamental constraints of any system attempting to model complex human preference. While hierarchical decoding offers a pragmatic approach to mitigating computational expense, it merely trades one set of limitations for another. The structure is the behavior, after all, and a coarser hierarchy implies a certain blindness to nuanced detail. If the system looks clever, it’s probably fragile. The elegance of contrastive quantization should not obscure the fact that it’s a form of controlled information loss – a necessary sacrifice, perhaps, but a sacrifice nonetheless.

Future work will likely focus on refining the objective functions used for preference alignment. Current approaches treat preference as a signal to be optimized, but rarely address the inherent instability of that signal itself. Human tastes are not static; they are fluid, contradictory, and frequently illogical. A truly robust system will need to incorporate models of cognitive dissonance and shifting priorities.

Ultimately, the field faces a choice: pursue ever-more-complex architectures in the hope of capturing increasingly subtle patterns, or embrace a more minimalist approach, acknowledging that some degree of approximation is unavoidable. Architecture is the art of choosing what to sacrifice. The former risks overfitting to spurious correlations; the latter, a kind of elegant resignation. Either way, the real challenge lies not in generating recommendations, but in understanding the limitations of the very notion of “relevance.”

Original article: https://arxiv.org/pdf/2512.24787.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
