Author: Denis Avetisyan
A new generative AI system is moving beyond simply predicting purchases to actively interpreting user intent and tailoring recommendations accordingly.

SIGMA, a semantic-grounded, instruction-driven generative recommender, leverages large language models and multi-task learning to improve recommendation accuracy and versatility at AliExpress.
While current generative recommendation systems often struggle to adapt to evolving user preferences and diverse business needs, this paper introduces ‘SIGMA: A Semantic-Grounded Instruction-Driven Generative Multi-Task Recommender at AliExpress’, a novel approach leveraging large language models and instruction-following to address these limitations. SIGMA grounds item representations in a unified semantic space, utilizes a hybrid tokenization method, and is trained on a large-scale multi-task dataset to fulfill varied recommendation demands with improved accuracy and diversity. Through extensive offline evaluation and online A/B testing at AliExpress, SIGMA demonstrates significant performance gains – but how can these semantic and instruction-driven techniques be further generalized to other complex recommendation scenarios?
Beyond the Echo Chamber: Cultivating Discovery in Recommendation Systems
Conventional recommendation systems frequently operate by identifying items similar to those a user has previously interacted with, a process that, while efficient, inherently restricts the potential for discovery. This reliance on item-to-item comparisons often results in a feedback loop, reinforcing existing preferences and limiting exposure to genuinely novel or unexpected content. Consequently, users may find themselves presented with increasingly narrow selections, hindering serendipitous encounters and diminishing the overall personalization experience. The system’s inability to move beyond established patterns can stifle exploration, leading to a sense of predictability rather than a feeling of tailored discovery, and ultimately failing to fully capture the nuances of individual tastes.
Traditional recommendation systems frequently falter when encountering new users or items – a challenge known as the cold-start problem – because they rely on historical interactions to find similarities. This limitation extends beyond simply identifying what’s already popular; these systems struggle to decipher the nuanced and evolving preferences of individuals. A user’s tastes aren’t static, and often extend beyond easily categorized patterns; they may involve complex combinations of attributes or latent interests not readily apparent from past behavior. Consequently, conventional methods often deliver predictable, and sometimes irrelevant, suggestions, failing to truly personalize the experience or introduce users to genuinely novel content that aligns with their deeper, unexpressed needs.
Generative recommendation systems represent a paradigm shift from simply retrieving existing items to actively constructing personalized content identifiers. Instead of matching users to pre-defined products, these models learn the underlying patterns of user preference and directly synthesize new item representations – effectively “imagining” items a user might enjoy. This approach bypasses the limitations of traditional methods, which struggle with novelty and often reinforce existing biases. By generating unique identifiers, the system can address the cold-start problem – recommending items even with limited user history – and deliver a level of personalization previously unattainable, as the system isn’t constrained by a fixed catalog but can dynamically create suggestions tailored to individual tastes. The potential lies in moving beyond predicting what a user will like, to proactively crafting experiences designed for them.
SIGMA: A System for Semantic Grounding and Generative Recommendation
SIGMA utilizes Large Language Models (LLMs) to create unique identifiers, termed “item tokens,” for each item in the recommendation catalog. These tokens are generated by prompting the LLM with item descriptions and attributes, effectively representing each item as a sequence of tokens understood by the model. This approach allows SIGMA to move beyond traditional methods relying on explicit item features or collaborative filtering; the LLM’s generative capabilities enable the system to synthesize representations even for items with limited interaction data. Consequently, the generated item tokens facilitate personalized recommendations by allowing the LLM to understand item semantics and relationships, and promote diversity by exploring a wider range of potentially relevant items beyond those typically associated with a user’s historical interactions.
SIGMA’s Multi-View Alignment Framework addresses the challenge of representing diverse data types – natural language descriptions, structured world knowledge, and item-specific attributes – within a common representational space. This is achieved by projecting each view (text, knowledge graphs, and item features) into a shared latent space where semantic relationships can be directly compared and contrasted. The framework utilizes contrastive learning to minimize the distance between semantically similar entities across these views while maximizing the distance between dissimilar ones. This alignment process ensures that recommendations are not solely based on superficial similarities but reflect a deeper understanding of item characteristics and user preferences, leading to improved semantic consistency and relevance in generated recommendations.
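As a concrete illustration of the alignment objective, the sketch below computes a symmetric contrastive loss between two views of the same items. All function names, the temperature value, and the use of cosine similarity are illustrative assumptions; SIGMA's actual projection networks and loss weighting are not specified here.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def multiview_contrastive_loss(text_emb, attr_emb, temperature=0.1):
    """Symmetric contrastive loss aligning two views of the same items.

    Row i of each matrix is assumed to describe the same item, so positives
    sit on the diagonal of the cross-view similarity matrix.
    """
    t = l2_normalize(np.asarray(text_emb, dtype=float))
    a = l2_normalize(np.asarray(attr_emb, dtype=float))
    logits = t @ a.T / temperature          # (n_items, n_items) similarities
    idx = np.arange(len(logits))

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)        # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()             # -log p(positive)

    # average the text->attributes and attributes->text directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Correctly paired views should produce a markedly lower loss than shuffled pairings, which is what drives the two embedding spaces into alignment during training.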
Item Tokenization within SIGMA addresses the incompatibility between LLM input requirements and typical item representations. Raw item data, such as product titles or descriptions, is converted into discrete tokens understood by the LLM. This process involves utilizing a tokenizer – a subword-based algorithm – to break down item information into a sequence of integer IDs. These IDs represent individual tokens within the LLM’s vocabulary, enabling the model to process and understand item characteristics. The tokenization process is crucial for representing items in a format that facilitates semantic alignment and subsequent generative recommendation tasks; without it, the LLM cannot effectively reason about item attributes or generate relevant recommendations.
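A minimal sketch of this ID-mapping step, using a toy whitespace vocabulary in place of a real subword tokenizer. A production system would use a trained BPE-style tokenizer; every name below is illustrative.

```python
def build_vocab(corpus):
    """Assign an integer ID to every whitespace-delimited token in the corpus.

    A stand-in for training a subword (e.g. BPE) tokenizer; ID 0 is reserved
    for out-of-vocabulary tokens.
    """
    vocab = {"<unk>": 0}
    for text in corpus:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def tokenize_item(text, vocab):
    """Convert raw item text into the integer ID sequence an LLM consumes."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]
```

For example, `tokenize_item("Wireless Mouse", vocab)` yields the ID sequence the LLM reasons over, while any unseen word falls back to the `<unk>` ID.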

Refining the Generative Engine: Fine-tuning and Knowledge Transfer
SIGMA employs Multi-Task Supervised Fine-tuning to specialize a pre-trained Large Language Model (LLM) for recommendation tasks. This process involves simultaneously training the LLM on multiple recommendation-related objectives, such as predicting user preferences and item relevance, using labeled data. By optimizing for these diverse, yet interconnected, tasks concurrently, the model learns a more robust and generalized representation of user-item interactions. This approach avoids task-specific overfitting and improves the model’s ability to generalize to unseen data, ultimately enhancing recommendation performance compared to single-task fine-tuning.
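One common way to realize multi-task fine-tuning is to optimize a weighted sum of per-task losses each step. The paper's weighting scheme is not given here, so the task names and default weights below are purely illustrative.

```python
def multitask_objective(task_losses, task_weights=None):
    """Combine per-task supervised losses into a single training objective.

    task_losses:  dict mapping task name -> scalar loss on the current batch
    task_weights: optional dict mapping task name -> weight (default 1.0),
                  so some objectives can count more heavily than others
    """
    task_weights = task_weights or {}
    return sum(task_weights.get(name, 1.0) * loss
               for name, loss in task_losses.items())
```

Because the gradients of all tasks flow through the same backbone, the shared representation is pushed to satisfy every objective at once rather than overfitting to any single one.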
Knowledge Distillation is implemented within SIGMA to leverage insights from pre-trained ranking models. This process transfers knowledge by training the model to mimic the softened probability distributions – outputs – of a teacher model, rather than solely relying on hard labels. The similarity between the teacher and student model outputs is quantified using the Kullback-Leibler (KL) Divergence Loss [latex]D_{KL}(P||Q)[/latex], where P represents the teacher model’s output distribution and Q the student model’s. Minimizing this loss encourages the student model to replicate the teacher’s decision-making process, effectively transferring learned ranking capabilities and improving performance without requiring retraining from scratch on the original dataset.
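The distillation term can be sketched directly from the definition of [latex]D_{KL}(P||Q)[/latex], with softened teacher and student distributions. The temperature value is illustrative, and real implementations typically also scale the term by the squared temperature and mix in a hard-label loss.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                           # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) between two discrete distributions."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Penalize the student for deviating from the teacher's softened outputs."""
    p = softmax(teacher_logits, temperature)  # teacher distribution P
    q = softmax(student_logits, temperature)  # student distribution Q
    return kl_divergence(p, q)
```

The loss is zero exactly when the student reproduces the teacher's distribution and grows as the two diverge, which is what transfers the teacher's ranking behavior.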
Contrastive Learning within SIGMA employs the InfoNCE Loss function to refine semantic representations of items and users, thereby enhancing recommendation performance. InfoNCE, or Noise Contrastive Estimation, operates by discriminating between positive pairs – representing relevant user-item interactions – and negative pairs constructed from randomly sampled items. The loss function encourages the model to maximize the similarity between positive pairs while minimizing similarity for negative pairs. This process effectively aligns embeddings in a semantic space, leading to improved retrieval of relevant items and a reduction in irrelevant recommendations. The optimization process directly addresses the challenge of semantic ambiguity, ensuring that similar items are clustered closely together in the embedding space and dissimilar items are well-separated.
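A per-user InfoNCE term with one positive item and sampled negatives might look like the sketch below. Cosine similarity and the 0.07 temperature are common defaults, not details taken from the paper.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(user_emb, pos_item_emb, neg_item_embs, temperature=0.07):
    """InfoNCE loss for one user: one positive item vs. sampled negatives.

    The loss is the negative log-probability of picking the positive item
    out of {positive} + negatives under a softmax over similarities.
    """
    logits = np.array(
        [cosine(user_emb, pos_item_emb) / temperature]
        + [cosine(user_emb, n) / temperature for n in neg_item_embs]
    )
    logits = logits - logits.max()            # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))
```

Minimizing this loss pulls the user embedding toward its positive item and away from the sampled negatives, producing the clustered, well-separated embedding space described above.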

From Representation to Impact: Efficient Serving and Real-World Validation
SIGMA employs a Hybrid Item Tokenization strategy to overcome challenges in representing the vast and diverse catalog of items typically found in e-commerce. This approach leverages an RQ-VAE (residual-quantized variational autoencoder) to compress complex item semantic representations into compact SID (Semantic Item ID) tokens. Unlike traditional methods that might rely on one-hot encoding or high-dimensional embeddings, the RQ-VAE learns a lower-dimensional latent space, effectively quantizing item features while preserving crucial semantic information. This quantization isn’t merely about reducing storage; it allows for efficient similarity comparisons and faster retrieval during real-time serving. By distilling item characteristics into these SID tokens, SIGMA facilitates scalable and performant recommendation systems capable of handling massive catalogs and delivering personalized experiences with minimal latency.
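The residual-quantization step that turns an item embedding into a multi-level SID can be sketched as follows. The fixed toy codebooks stand in for codebooks an RQ-VAE would learn end-to-end; all names are illustrative.

```python
import numpy as np

def residual_quantize(vec, codebooks):
    """Greedy residual quantization: each level encodes the residual left by
    the previous level, yielding one code index per level (the SID digits)."""
    residual = np.asarray(vec, dtype=float)
    codes = []
    for cb in codebooks:                      # cb: (codebook_size, dim) array
        idx = int(np.linalg.norm(cb - residual, axis=1).argmin())
        codes.append(idx)
        residual = residual - cb[idx]         # pass what's left to next level
    return codes

def reconstruct(codes, codebooks):
    """Sum the chosen codewords to approximate the original embedding."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))
```

Each item is thus reduced to a short tuple of small integers, which is what makes storage compact and nearest-neighbor style comparisons cheap at serving time.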
The system generates refined item identifiers through a carefully orchestrated three-step process. Initially, candidate identifiers are produced, capturing a broad range of potential item representations. These candidates are then subjected to a filtering stage, eliminating improbable or redundant options. Crucially, an Adaptive Probabilistic Fusion technique intelligently combines the strengths of multiple filtering approaches; instead of relying on a single method, the system dynamically weights each filter’s contribution based on the specific item being processed. This nuanced approach ensures that the final identifier accurately reflects the item’s characteristics while remaining computationally efficient, ultimately improving the precision of item representation and enabling faster, more relevant recommendations.
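The paper's Adaptive Probabilistic Fusion is not specified in detail here; one plausible reading, sketched below, blends several filters' candidate scores using softmax weights derived from per-item reliability estimates. All names and the weighting scheme are assumptions.

```python
import numpy as np

def fuse_scores(filter_scores, reliabilities):
    """Blend several filters' candidate scores with softmax-derived weights.

    filter_scores: (n_filters, n_candidates) score matrix, one row per filter
    reliabilities: (n_filters,) per-item trust estimates; higher means this
                   filter is weighted more heavily for the item at hand
    """
    r = np.asarray(reliabilities, dtype=float)
    w = np.exp(r - r.max())
    w = w / w.sum()                           # probabilistic filter weights
    return w @ np.asarray(filter_scores, dtype=float)
```

Because the reliabilities can vary per item, the fusion adapts: the filter judged most trustworthy for a given item dominates that item's final scores.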
Recent online A/B testing conducted at AliExpress reveals that the implementation of SIGMA yielded substantial improvements across key performance indicators. Specifically, the system facilitated a +2.80% increase in overall order volume, signaling heightened user engagement and purchasing activity. This translated into a notable +3.84% rise in conversion rate, demonstrating a more effective transition from browsing to completed transactions. Furthermore, the gross merchandise volume (GMV) experienced a significant boost of +7.84%, highlighting the system’s ability to drive increased revenue. Importantly, SIGMA also fostered greater product discovery, as evidenced by a +2.47% expansion in the breadth of purchased categories, indicating users were exploring a wider variety of items – collectively demonstrating a significant and positive real-world impact on the e-commerce platform.
The development of SIGMA highlights a crucial principle: systems break along invisible boundaries – if one can’t see them, pain is coming. This research demonstrates that simply scaling large language models isn’t enough; true progress demands semantic alignment between the model and the underlying data, specifically through innovative item tokenization. SIGMA’s multi-task learning approach anticipates weaknesses by recognizing that diverse recommendation demands require a unified understanding of user intent and item characteristics. As David Hilbert observed, “One must be able to say at all times what one knows and what one does not.” SIGMA embodies this principle, explicitly modeling both known and unknown preferences to provide more robust and versatile recommendations. The system’s success at AliExpress underscores that a holistic, structurally sound approach is essential for building truly intelligent recommender systems.
Future Directions
The emergence of SIGMA, and systems like it, highlights a predictable tension. The pursuit of increasingly granular control over recommendation – shaping outputs with instruction, aligning semantics – risks obscuring a fundamental truth: effective recommendation isn’t about generating preference, but recognizing it. The architecture, while demonstrating clear gains, remains inherently reliant on the quality of the underlying language model and the fidelity of item tokenization. Future work must address the brittleness inherent in these components – the subtle shifts in language, the ever-expanding catalog – rather than simply adding layers of generative complexity.
A more fruitful avenue may lie in a re-evaluation of the “multi-task” paradigm itself. The assumption that diverse recommendation demands can be unified under a single generative framework feels, at best, optimistic. Perhaps the true elegance isn’t in a universal model, but in a system that gracefully delegates tasks – recognizing when generative approaches are appropriate, and when simpler, more robust methods will suffice. The cost of generalization, after all, is often a loss of specificity.
Ultimately, the challenge isn’t to build a system that can recommend anything, but one that understands when not to. A truly intelligent recommender doesn’t simply respond to instruction; it anticipates need, respects boundaries, and acknowledges the inherent limits of its own knowledge. The focus should shift from generating outputs to structuring information – a principle as applicable to algorithms as it is to life itself.
Original article: https://arxiv.org/pdf/2602.22913.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/