The Long Game of Personalized Marketing

Author: Denis Avetisyan

New research shows that AI-driven customer relationship management can maintain initial performance gains over time, even without constant human oversight.

A randomized controlled trial ran over eleven months: a four-month active management phase in which dedicated marketing professionals acted as humans in the loop, followed by a seven-month passive follow-up in which the autonomous agents continued to learn and adapt on their own, tracing the system’s evolution from guided intervention to self-sustaining operation.

A longitudinal case study demonstrates sustained impact from agentic personalization utilizing reinforcement learning and a human-in-the-loop approach with Thompson Sampling.

While scalable personalisation is a key goal of modern Customer Relationship Management, maintaining performance gains over time remains a persistent challenge. This paper, ‘Sustained Impact of Agentic Personalisation in Marketing: A Longitudinal Case Study’, presents an 11-month analysis of a real-world application employing agentic infrastructure to personalise marketing messages. Results demonstrate that, although human curation initially drives the largest gains in engagement, autonomous agents can effectively sustain positive uplift following a transition to passive operation. This suggests a viable symbiotic model in which human expertise initialises strategy while automated systems preserve and scale performance. But can such models adapt to evolving consumer behaviour?

The Erosion of Mass Marketing: A Call for Granular Connection

Historically, marketing strategies have often grouped customers into large segments based on demographics or basic behaviors, a practice that inherently limits the potential for genuine personalization. This approach assumes a degree of homogeneity within each segment that rarely exists in reality, leading to messaging that, while targeted to a group, often misses the mark for individuals. The limitations of broad segmentation stem from its inability to account for the nuances of individual preferences, past interactions, and real-time context. Consequently, customers frequently receive irrelevant or generic communications, fostering a sense of detachment and diminishing the effectiveness of marketing efforts. The shift towards expecting individualized experiences now highlights the inadequacy of these traditional methods, prompting a need for more granular and dynamic approaches to truly connect with each customer.

Modern customer interactions rarely follow a linear path; instead, individuals weave through multiple touchpoints and channels, creating increasingly complex journeys. This shift necessitates a move beyond static marketing materials towards dynamic content – messaging that adapts in real-time based on a customer’s behavior, preferences, and context. However, delivering such personalized experiences isn’t simply about creating varied content; it demands sophisticated orchestration – the seamless coordination of these messages across all channels. Effective orchestration requires technologies that can anticipate customer needs, trigger the appropriate content at the right moment, and ensure a consistent brand experience, ultimately transforming fragmented interactions into cohesive and meaningful engagements.

The modern marketing landscape demands agility, yet many organizations struggle with content workflows that stifle responsiveness. Traditional content creation often relies on lengthy processes and siloed teams, resulting in materials that are outdated or irrelevant by the time they reach the customer. This inefficiency isn’t merely a logistical problem; it directly impacts the customer experience, preventing marketers from delivering the hyper-personalized messages that drive engagement and conversion. The bottleneck extends beyond creation to delivery, as complex systems for content management and distribution frequently lack the flexibility needed to serve individualized content across multiple channels in real-time. Consequently, opportunities to connect with customers on a personal level are missed, hindering efforts to build lasting relationships and maximize marketing ROI.

Agentic CRM: An Infrastructure for Autonomous Engagement

The Agentic CRM infrastructure relies on sequential decision-making: the system repeatedly selects actions based on prior outcomes and current context to optimize marketing interactions. This is framed using reinforcement learning and Markov Decision Processes (MDPs), which let the CRM learn effective engagement strategies from experience. A customer’s context at each decision point (such as recent activity and past responses) forms a state in the MDP, and the available actions are the candidate marketing messages or offers. After each action, the system observes the resulting state and receives a reward signal tied to predefined key performance indicators (KPIs) such as clicks, conversions, or customer lifetime value. By continuously refining its decisions through these reward signals, the Agentic CRM dynamically adjusts its marketing strategy to maximize the desired outcomes.
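The paragraph describes the MDP framing only in general terms. As a minimal sketch, assuming a tabular setting with hypothetical states (`"dormant"`, `"active"`), a hypothetical action set, and a reward of 1 for an app open, a one-step Q-learning update shows how a reward signal refines the value of an action in a given state. Q-learning is swapped in here purely as a generic reinforcement-learning illustration; the study's agents use Thompson Sampling, and nothing below reflects its actual state or reward design.

```python
from collections import defaultdict

# Hypothetical hyperparameters, chosen only for illustration.
ALPHA = 0.1  # learning rate
GAMMA = 0.9  # discount factor for future rewards


def q_update(q, state, action, reward, next_state, actions, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state, actions):
    """Pick the action with the highest current value estimate for this state."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical action set: candidate messages plus the option to send nothing.
actions = ["discount_offer", "new_feature_tip", "no_message"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# One observed transition: a dormant user received a feature tip and opened the app.
value = q_update(q, "dormant", "new_feature_tip", 1.0, "active", actions)
print(value)                                 # 0.1
print(greedy_action(q, "dormant", actions))  # new_feature_tip
```

After a single rewarded transition, the feature tip becomes the preferred action for dormant users; in practice many such updates, plus some exploration policy, would be needed before these estimates mean much.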

The Agentic CRM incorporates a hybrid approach to task execution, utilizing automation for routine processes while retaining the capacity for human intervention when necessary. This human-in-the-loop functionality allows for the review and modification of automated decisions, particularly in complex scenarios or when dealing with novel data. The system’s architecture is designed to facilitate seamless transitions between autonomous operation and manual oversight, providing adaptability to changing conditions and ensuring alignment with strategic objectives. This combined approach enables optimization of marketing interactions beyond the limitations of purely automated systems, improving both efficiency and effectiveness.
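One way to picture the hand-off between autonomous operation and manual oversight is a routing gate. The sketch below is hypothetical: the `Decision` fields, the confidence `threshold`, and the rule that novel cases always go to a person are illustrative assumptions, not the paper's mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Decision:
    user_id: str
    action: str          # e.g. which message or offer to send
    confidence: float    # hypothetical agent self-estimate in [0, 1]
    novel: bool = False  # e.g. unseen segment or newly added content


@dataclass
class HumanInTheLoopGate:
    threshold: float = 0.8  # hypothetical cut-off below which a human reviews
    review_queue: List[Decision] = field(default_factory=list)

    def route(self, decision: Decision) -> Optional[str]:
        """Return the action to execute now, or None if it was queued for review."""
        if decision.novel or decision.confidence < self.threshold:
            self.review_queue.append(decision)
            return None
        return decision.action

    def resolve(self, reviewer: Callable[[Decision], str]) -> List[str]:
        """Let a human approve or replace each queued action; return what gets executed."""
        executed = [reviewer(d) for d in self.review_queue]
        self.review_queue.clear()
        return executed


gate = HumanInTheLoopGate()
auto = gate.route(Decision("u1", "feature_tip", 0.93))                # executed automatically
queued = gate.route(Decision("u2", "discount_offer", 0.55))           # low confidence: queued
novel = gate.route(Decision("u3", "feature_tip", 0.95, novel=True))   # novel case: queued

# A reviewer who vetoes discounts but approves everything else.
executed = gate.resolve(lambda d: "no_message" if d.action == "discount_offer" else d.action)
print(auto, queued, novel, executed)  # feature_tip None None ['no_message', 'feature_tip']
```

Lowering `threshold` shifts more decisions to full automation; the passive phase described later roughly corresponds to an empty review path.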

Content assembly within the Agentic CRM is facilitated by Atomic Content Components (ACCs), which are discrete, self-contained units of information designed for reuse across multiple marketing interactions. These components, rather than requiring the creation of entirely new content for each channel or segment, enable the dynamic composition of personalized messages from a standardized library. This modular approach minimizes redundancy, reduces the potential for inconsistencies in brand voice or factual accuracy, and significantly accelerates content creation cycles. The reusability of ACCs also streamlines updates; modifications to a single component are automatically reflected wherever that component is utilized, improving content governance and ensuring message integrity.
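The update-propagation property follows from storing messages as references to components rather than as copied text. A minimal sketch, assuming a hypothetical `ComponentLibrary` keyed by component id and `str.format` placeholders for personalization slots (neither is taken from the paper):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AtomicContentComponent:
    component_id: str
    kind: str  # hypothetical taxonomy: "greeting", "body", "cta", ...
    text: str  # may contain {placeholders} filled at render time


class ComponentLibrary:
    def __init__(self) -> None:
        self._components: Dict[str, AtomicContentComponent] = {}

    def upsert(self, component: AtomicContentComponent) -> None:
        """Add a component or replace the existing one with the same id."""
        self._components[component.component_id] = component

    def compose(self, component_ids: List[str], **slots: str) -> str:
        """Render a message from component ids, looked up at render time."""
        return " ".join(self._components[cid].text.format(**slots) for cid in component_ids)


lib = ComponentLibrary()
lib.upsert(AtomicContentComponent("greet", "greeting", "Hi {name},"))
lib.upsert(AtomicContentComponent("summary", "body", "your weekly summary is ready."))
lib.upsert(AtomicContentComponent("cta", "cta", "Open the app to see it."))

template = ["greet", "summary", "cta"]  # a message is a list of ids, not copied text
before = lib.compose(template, name="Ana")
lib.upsert(AtomicContentComponent("cta", "cta", "Tap to view."))  # edit one component
after = lib.compose(template, name="Ana")
print(before)  # Hi Ana, your weekly summary is ready. Open the app to see it.
print(after)   # Hi Ana, your weekly summary is ready. Tap to view.
```

Editing the single `cta` component changes every message that references it, without touching the templates themselves.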

Thompson Sampling: Navigating the Exploration-Exploitation Paradox

Thompson Sampling frames the selection of a marketing strategy as an exploration-exploitation trade-off within a sequential decision-making process. The algorithm maintains a probability distribution over the expected reward of each available strategy. At each time step, it draws one sample from every strategy’s distribution and selects the strategy with the highest draw, so strategies with a high estimated reward, or with enough uncertainty that they might still be the best, are chosen more often. After the chosen strategy is deployed, the observed reward updates its distribution via Bayesian inference, refining the algorithm’s estimate of that strategy’s effectiveness. This probabilistic approach continually balances trying uncertain strategies (exploration) with leveraging those already known to perform well (exploitation), optimizing cumulative reward over time.
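For binary outcomes such as "converted or not", the textbook form of this loop uses a Beta posterior per strategy. The sketch below is a standard Beta-Bernoulli Thompson Sampler, assuming a uniform Beta(1, 1) prior and hypothetical true conversion rates used only to simulate responses; it is not the study's production implementation.

```python
import random


class BetaBernoulliThompson:
    """Thompson Sampling for binary rewards (converted / did not convert)."""

    def __init__(self, arms, prior_alpha=1.0, prior_beta=1.0, seed=None):
        # Beta(1, 1) is a uniform prior: no strategy is favoured before any data.
        self.alpha = {arm: prior_alpha for arm in arms}
        self.beta = {arm: prior_beta for arm in arms}
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible conversion rate per arm and play the highest draw.
        draws = {arm: self.rng.betavariate(self.alpha[arm], self.beta[arm]) for arm in self.alpha}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # Conjugate Bayesian update: a success raises alpha, a failure raises beta.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


# Hypothetical true conversion rates, unknown to the agent.
true_rates = {"A": 0.02, "B": 0.08, "C": 0.04}
env = random.Random(42)
agent = BetaBernoulliThompson(true_rates, seed=7)
pulls = {arm: 0 for arm in true_rates}

for _ in range(10_000):
    arm = agent.select()
    reward = 1 if env.random() < true_rates[arm] else 0
    agent.update(arm, reward)
    pulls[arm] += 1

print(pulls)  # typically, the large majority of sends go to "B"
```

Early on, wide posteriors make every arm competitive (exploration); as evidence accumulates, the posteriors for A and C rarely produce the top draw, so traffic concentrates on B (exploitation) without any explicit exploration schedule.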

The learning process within Thompson Sampling relies on a Cumulative Reward Signal that records how each marketing strategy has performed. After each interaction, typically a message impression or a customer response, the agent receives a reward reflecting the outcome: a positive value for a conversion or other desired action, and a zero or negative value for failure or no effect. These outcomes accumulate as evidence for the strategy that produced them and are used to update that strategy’s probability distribution over expected effectiveness. What drives selection is the estimated success rate implied by this evidence, not the raw running total: a strategy that has been tried more often can accumulate more total reward while still converting less often. Strategies whose evidence points to higher success rates are selected more often in subsequent iterations, concentrating traffic on the most effective approaches and raising the overall conversion rate.
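A short worked example shows why the running total alone can mislead. Assume, hypothetically, that strategy A produced 30 conversions from 1,000 sends and strategy B produced 20 from 400, and that both start from the uniform prior Beta(1, 1), i.e. α₀ = β₀ = 1. A has the larger cumulative reward, yet the Beta posterior mean favours B:

```latex
\begin{aligned}
&\theta \sim \mathrm{Beta}(\alpha_0, \beta_0)
  \;\xrightarrow{\;s \text{ successes},\ f \text{ failures}\;}\;
  \theta \sim \mathrm{Beta}(\alpha_0 + s,\ \beta_0 + f),
  \qquad
  \mathbb{E}[\theta] = \frac{\alpha_0 + s}{\alpha_0 + \beta_0 + s + f} \\
&\text{A: } s = 30,\ f = 970:\quad
  \mathbb{E}[\theta_A] = \frac{1 + 30}{2 + 1000} = \frac{31}{1002} \approx 0.031 \\
&\text{B: } s = 20,\ f = 380:\quad
  \mathbb{E}[\theta_B] = \frac{1 + 20}{2 + 400} = \frac{21}{402} \approx 0.052
\end{aligned}
```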

Thompson Sampling enables personalized marketing optimization by dynamically adjusting strategy selection based on observed individual customer responses. The algorithm maintains a probability distribution representing the estimated conversion rate for each marketing approach, per user or user segment. As interactions occur, these distributions are updated using Bayesian inference; successful conversions increase the probability of selecting that strategy for similar customers in the future, while unsuccessful attempts decrease it. This iterative process allows the system to prioritize strategies with higher predicted success rates for each individual, maximizing the overall conversion rate by moving beyond generalized approaches towards statistically informed personalization.
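Keeping a separate posterior per segment is the simplest way to make the same algorithm personal. The sketch below assumes hypothetical segment labels (`"new_users"`, `"lapsed"`) and strategy names; richer contextual variants that share information across users exist but are not shown, and the paper's exact granularity is not specified here.

```python
import random
from collections import defaultdict


class SegmentedThompson:
    """One Beta posterior per (segment, strategy) pair, starting from Beta(1, 1)."""

    def __init__(self, strategies, seed=None):
        self.strategies = list(strategies)
        self.params = defaultdict(lambda: [1.0, 1.0])  # (segment, strategy) -> [alpha, beta]
        self.rng = random.Random(seed)

    def select(self, segment):
        def draw(strategy):
            alpha, beta = self.params[(segment, strategy)]
            return self.rng.betavariate(alpha, beta)
        return max(self.strategies, key=draw)  # key is evaluated once per strategy

    def update(self, segment, strategy, converted):
        ab = self.params[(segment, strategy)]
        ab[0 if converted else 1] += 1  # success -> alpha, failure -> beta

    def posterior_mean(self, segment, strategy):
        alpha, beta = self.params[(segment, strategy)]
        return alpha / (alpha + beta)


ts = SegmentedThompson(["discount", "feature_tip"], seed=0)
# Hypothetical history: new users respond to feature tips, not discounts.
for _ in range(50):
    ts.update("new_users", "feature_tip", converted=True)
    ts.update("new_users", "discount", converted=False)

print(ts.select("new_users"))                         # feature_tip
print(ts.posterior_mean("new_users", "feature_tip"))  # 51/52, about 0.981
print(ts.posterior_mean("lapsed", "feature_tip"))     # 0.5: no evidence yet for this segment
```

The trade-off is data efficiency: evidence gathered for new users says nothing about lapsed users, so very fine segments learn slowly.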

Demonstrating Impact: Rigorous Evaluation and Quantifiable Gains

To evaluate the Agentic CRM Infrastructure, the study used a randomized controlled trial, dividing users into control and treatment groups. This allowed a direct comparison of user behavior with and without the agentic system. Because participants were assigned at random, pre-existing differences between users are balanced across the groups in expectation, so observed differences can be attributed to the infrastructure itself rather than to who happened to receive it. The design supported precise measurement of key performance indicators, focusing on how the agentic system altered user engagement and conversion rates, and provided a statistically sound basis for quantifying its impact.
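Online experiments commonly implement random assignment by hashing a stable user id, so that each user lands in the same arm on every visit. The sketch below shows that common pattern; the experiment name `"agentic_crm_trial"` and the 50/50 split are hypothetical, and the source does not say how the study assigned users.

```python
import hashlib


def assign_arm(user_id: str, experiment: str = "agentic_crm_trial", treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    Hashing the experiment name with the user id gives a stable, effectively
    random bucket in [0, 1): the same user always lands in the same arm, and
    the bucket is uncorrelated with user characteristics.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # first 32 bits scaled to [0, 1)
    return "treatment" if bucket < treatment_share else "control"


arms = [assign_arm(f"user-{i}") for i in range(10_000)]
share = arms.count("treatment") / len(arms)
print(share)  # close to 0.5
```

Salting the hash with the experiment name keeps assignments independent across experiments, so a user's arm in one trial does not predict their arm in another.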

To quantify the agentic system’s effect, the researchers used a Difference-in-Differences (DID) design, a technique common in econometrics and increasingly applied to digital product evaluation. It compares the change in key metrics from before to after the intervention in the treatment group, which was exposed to the agentic CRM, against the change over the same period in the control group, which was not. Subtracting the control group’s change removes time-invariant differences between the groups and trends common to both, such as seasonality, under the assumption that the two groups would otherwise have followed parallel trends. The analysis was further strengthened with ML-RATE, a variance-reduction method that uses machine-learning predictions of each user’s outcome as a covariate in a regression adjustment, tightening the confidence intervals around the incremental gains attributed to the agentic infrastructure. Together, these methods provide a robust and defensible measure of the system’s contribution to user engagement and conversion.
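The core DID estimate is simple arithmetic on four group means: (treatment post minus treatment pre) minus (control post minus control pre). The sketch below computes it on hypothetical per-user values and expresses it as a relative lift against the treated group's implied counterfactual. It does not implement ML-RATE's regression adjustment, and the study may define its percentage lifts differently.

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Classic 2x2 DID estimate from per-user metric values in each cell."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    # The control group's change stands in for what the treatment group
    # would have done without the intervention (parallel-trends assumption).
    return treat_change - ctrl_change


# Hypothetical weekly app opens per user, before and after launch.
treat_pre, treat_post = [2, 3, 4], [4, 5, 6]
ctrl_pre, ctrl_post = [2, 3, 4], [3, 4, 5]

effect = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
counterfactual = mean(treat_post) - effect  # treated group's expected post mean without the system
print(effect)                   # 1
print(effect / counterfactual)  # 0.25, i.e. a 25% relative lift
```

Here both groups would have risen by one open per week anyway; only the additional rise in the treatment group is credited to the intervention.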

The evaluation revealed substantial gains across several key performance indicators. The system generated a 65.3% increase in Direct App Opens, indicating heightened user responsiveness to communications, and a 2.8% increase in Days with Engagement. A 1.2% lift in conversion was also observed within a strategically important vertical, suggesting a tangible impact on business outcomes. Perhaps most compellingly, the effects proved durable: even after active human content contribution ceased, the system maintained a 57% lift in Direct App Opens and a 2.4% increase in Days with Engagement for a full seven months, demonstrating the infrastructure’s capacity for sustained, autonomous performance.
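Dividing the passive-phase lift by the headline lift gives a rough sense of how much of the uplift persisted. This treats the two reported figures for each metric as comparable relative lifts on the same baseline, which is an assumption rather than something stated above:

```latex
\begin{aligned}
\text{Direct App Opens:}\quad & \frac{57\%}{65.3\%} \approx 0.873 \\
\text{Days with Engagement:}\quad & \frac{2.4\%}{2.8\%} \approx 0.857
\end{aligned}
```

On that reading, roughly 87% and 86% of the respective lifts survived the move to passive operation.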

Scaling Towards Self-Direction: The Promise of Intelligent Automation

The creation of granular, reusable content, the Atomic Content Components, often demands significant manual effort. A natural next step is to use Large Language Models to automate this process. Trained on vast datasets of text and code, these models can generate diverse content variations from minimal input, turning broad concepts into specific, modular pieces. Such integration could substantially reduce the time and resources currently devoted to content creation and support a more agile, responsive content strategy. By automating the generation of these foundational elements, systems could adapt more quickly to evolving market demands and personalize content at scale.
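If generated text is to become reusable components, it needs a filtering step between the model and the library. The sketch below is hypothetical throughout: `generate` stands in for any LLM client (no real API is assumed), and the length limits and banned terms are illustrative brand rules, not the paper's.

```python
from typing import Callable, List

# Hypothetical brand constraints for generated components.
MAX_CHARS = {"headline": 40, "cta": 25}
BANNED_TERMS = ("guaranteed", "free money")


def generate_acc_variants(brief: str, kind: str, n: int,
                          generate: Callable[[str], List[str]]) -> List[str]:
    """Ask a text generator for candidate components and keep only valid, distinct ones.

    `generate` is a placeholder for whatever LLM client is used; it takes a
    prompt and returns candidate strings.
    """
    prompt = f"Write {n} distinct {kind} options, one per line, for: {brief}"
    accepted: List[str] = []
    seen = set()
    for text in (c.strip() for c in generate(prompt)):
        key = text.lower()
        if not text or key in seen:
            continue  # drop empty lines and case-insensitive duplicates
        if len(text) > MAX_CHARS.get(kind, 200):
            continue  # too long for this slot
        if any(term in key for term in BANNED_TERMS):
            continue  # violates brand or compliance rules
        seen.add(key)
        accepted.append(text)
    return accepted[:n]


# Stub standing in for an LLM so the sketch runs offline.
def fake_llm(prompt: str) -> List[str]:
    return ["Your week, summarized", "your week, summarized", "Guaranteed results inside!",
            "A very long headline that clearly exceeds the forty character limit",
            "New insights await"]


variants = generate_acc_variants("weekly activity recap", "headline", 3, fake_llm)
print(variants)  # ['Your week, summarized', 'New insights await']
```

Accepted variants could then be added to the component library as new arms for the bandit to test, with human review still possible before they go live.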

Large Language Models would also help an agentic system adjust quickly to shifting market conditions and evolving customer expectations. By automating the generation of tailored content, the system could move beyond pre-programmed responses towards genuine adaptability. This isn’t merely about speed; such integration could allow nuanced shifts in messaging, tone, and even product focus in response to emerging trends and individual customer data. The system would then become a self-tuning engine, continually optimizing its output to maximize engagement and conversion, a crucial advantage in competitive landscapes where stagnation equates to decline.

The Passive Follow-up Phase is the study’s clearest evidence of a pathway toward self-directed systems. Once active human content contribution stopped, the agents continued to learn from customer responses and adapt which messages each customer received, sustaining engagement without new human input, a crucial step towards scalable personalization. Rather than waiting for new instructions, the system kept maintaining customer relationships, delivering content matched to evolving preferences. If this autonomy holds more broadly, it could reduce operational costs and let personalization scale far beyond what a human team could curate, potentially reshaping how businesses cultivate and retain customers.

The study illuminates a predictable trajectory: initial human intervention provides essential direction, but the system’s longevity hinges on autonomous adaptation. This mirrors a broader shift from curated stability to algorithmically maintained performance, a transition detailed in the longitudinal analysis of agentic CRM. As Alan Turing observed, “No subject can be mathematically treated at all without being reduced to a calculus of logistics.” The paper demonstrates precisely that: a reduction of customer relationship management to a sequence of decisions, allowing an agent to learn and sustain performance over time. The partial fading of the initial gains, significant as they were, underscores that true resilience resides in the system’s capacity for ongoing, self-directed refinement, a process in which latency, the ‘tax’ on every interaction, is minimized through learned efficiencies.

The Long View

The demonstrated longevity of agentic personalization is less a triumph of automation and more an acknowledgement of inevitable human limitations. Every system, even those meticulously curated, succumbs to entropy. The initial gains from human oversight are, predictably, partly transient; the true measure lies in the agent’s capacity to maintain performance after the curator’s influence wanes. This isn’t about replacing judgment, but about distributing the burden of sustained attention, a resource always in finite supply. The question becomes not whether an agent can replicate human insight, but whether it can preserve it.

Future work must address the implicit cost of this preservation. While Thompson Sampling offers a robust framework for sequential decision-making, it provides little insight into what is being learned, or, crucially, unlearned. A system that sustains performance without understanding its own decay is, at best, delaying the inevitable. The focus should shift toward interpretable agents-systems that not only act, but articulate the rationale behind their actions, revealing the evolving map of customer understanding.

Architecture without history is fragile, and this holds true for agentic systems. The value isn’t simply in achieving a performance metric today, but in building a traceable lineage of adaptation. Every delay in achieving optimal performance is the price of understanding: understanding not just what works, but why, and what conditions might render it obsolete. The true challenge lies in constructing agents capable of graceful degradation, systems that acknowledge their limitations and adapt accordingly: systems that age, not fail.

Original article: https://arxiv.org/pdf/2604.08621.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/