Author: Denis Avetisyan
New research reveals that popular AI assistants built on retrieval-augmented generation routinely leak sensitive user data during everyday conversations.

PrivacyBench, a novel benchmark, demonstrates over 15% leakage of user secrets and underscores the need for privacy-preserving retrieval mechanisms in personalized AI systems.
While personalized AI promises enhanced user experiences, its reliance on sensitive data creates substantial privacy risks-a paradox acutely addressed by our work, ‘PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI’. We introduce a novel benchmark and evaluation framework revealing that current Retrieval-Augmented Generation (RAG) assistants leak user secrets in over 15% of conversational interactions, despite attempts at mitigation through privacy-aware prompting. This leakage stems from indiscriminate data retrieval, placing undue burden on the generator and creating a critical single point of failure. Can architectural innovations prioritizing privacy at the retrieval stage unlock truly secure and ethical personalized AI experiences?
The Paradox of Personalization: Balancing Utility and Privacy
The convenience of personalized assistants – those increasingly ubiquitous digital companions – stems directly from their capacity to accumulate and analyze vast amounts of user data. This reliance, however, introduces substantial privacy risks, as detailed personal information – encompassing spoken requests, location history, contact lists, and even sensitive habits – becomes concentrated in the hands of service providers. While these data streams enable increasingly accurate and helpful responses, they also create potential vulnerabilities to data breaches, unauthorized surveillance, and the misuse of personal information. The very features that define the utility of these assistants – proactive suggestions, tailored recommendations, and seamless integration into daily life – are fundamentally dependent on continuous data collection, presenting a complex trade-off between functionality and the safeguarding of individual privacy.
Current evaluations of personalized assistant performance, such as those provided by benchmarks like LaMP and LongLaMP, heavily emphasize the achievement of increasingly sophisticated personalization capabilities. However, these assessments largely overlook a critical dimension: the potential for privacy breaches that accompany such enhanced functionality. This prioritization creates a significant gap in understanding how vulnerable these systems are to the unintentional or malicious disclosure of sensitive user information. While these benchmarks demonstrate impressive advancements in tailoring responses to individual preferences, they fail to rigorously probe the extent to which user data is retained, processed, or potentially exposed during the personalization process, leaving a considerable space for unexplored security and privacy risks that could undermine user trust and data protection.
Effective privacy assessment transcends mere data anonymization, necessitating a comprehensive framework that analyzes the entire information lifecycle within personalized assistant systems. Current approaches often focus on obscuring identifying details, but fail to account for how seemingly innocuous data, when aggregated or used in conjunction with other information, can reveal sensitive user details. A robust evaluation must therefore map the flow of information – from initial data collection and storage, through processing and model training, to the final delivery of personalized responses – identifying potential leakage points and quantifying the risk of re-identification. This requires considering not only what data is collected, but also how it is used, where it is stored, and who has access, enabling a more nuanced understanding of privacy vulnerabilities and paving the way for truly privacy-preserving personalized assistants.

Contextual Integrity: A Foundation for Privacy Evaluation
Contextual Integrity Theory reframes privacy as a matter of appropriate information flow rather than solely data confidentiality. This perspective posits that privacy is not simply about preventing access to personal information, but about whether that information is handled in a manner consistent with the context in which it was given. Normative information flow is determined by pre-existing social norms governing who can access what information, under what conditions, and for what purposes. Violations of privacy, therefore, occur when these contextual norms are breached, even if no data is technically “secret” or accessed by unauthorized parties. The theory emphasizes that information flows are inherently social and are evaluated based on appropriateness within specific contexts, rather than absolute rules about data access.
Privacy breaches, according to Contextual Integrity theory, are not solely determined by the confidentiality of data, but by deviations from established contextual norms governing information flow. These norms dictate what information is appropriate to collect, with whom it is permissible to share it, and under what conditions access is granted. A violation occurs when information is moved or used in a manner inconsistent with these understood expectations, even if the information itself isn’t sensitive or secret. This means that technically secure data handling does not guarantee privacy if those processes disregard established contextual boundaries, and conversely, a lack of technical security isn’t the sole determinant of a privacy violation; the appropriateness of the information flow is paramount.
Evaluation of personalized assistant data handling through the lens of Contextual Integrity requires a multi-faceted approach. This involves determining the specific contextual norms relevant to the assistant’s function – considering what information users reasonably expect to share, with whom, and for what purposes. Assessment then focuses on whether the assistant’s data collection, storage, processing, and dissemination practices conform to these established norms; deviations constitute privacy violations even if data is technically secure. Crucially, this evaluation must extend beyond explicit consent, accounting for implicit understandings and societal expectations regarding appropriate information flows within the assistant’s operational context.
PrivacyBench: A Standardized Framework for Privacy Assessment
PrivacyBench functions as a standardized benchmark for assessing the privacy characteristics of personalized assistant technologies. It achieves this by constructing simulated conversations – multi-turn interactions – that mirror typical user engagements. These interactions are not based on real user data, but are algorithmically generated to represent realistic dialogue patterns and information exchanges. The benchmark’s design allows researchers to systematically query the assistant during these simulated conversations and analyze its responses for potential disclosures of sensitive user information, providing a quantifiable metric for privacy performance across different assistant implementations and configurations.
PrivacyBench extends the capabilities of prior personalized assistant benchmarks, such as PersonaBench, by integrating dedicated privacy assessments alongside standard personalization performance measurements. While benchmarks like PersonaBench primarily focus on evaluating the coherence and engagingness of assistant responses, PrivacyBench introduces a complementary evaluation dimension focused on identifying potential information leakage. This is achieved by quantifying the extent to which sensitive user attributes can be inferred from the assistant’s behavior during multi-turn conversations, providing a more holistic assessment of the system’s overall security and user-data protection capabilities.
PrivacyBench constructs realistic user profiles and interaction scenarios using synthetically generated data, avoiding reliance on potentially sensitive real user data. This synthetic data is not randomly created; it is structured according to a defined Social Graph which models relationships and common attributes among users. The Social Graph ensures statistical consistency within the generated population, simulating plausible connections and shared characteristics. This approach allows for controlled experimentation and repeatable privacy evaluations across a diverse, yet statistically representative, user base without compromising individual privacy.
PrivacyBench employs two distinct probing strategies to assess privacy vulnerabilities in personalized assistants. The Direct Probing Strategy involves explicitly querying the assistant for sensitive Personally Identifiable Information (PII) such as name, address, or date of birth. This directly tests whether the assistant will reveal protected data. Conversely, the Indirect Probing Strategy assesses privacy by observing the assistant’s responses to seemingly innocuous prompts, analyzing if those responses inadvertently reveal PII or allow inference of sensitive attributes through contextual clues. This approach aims to detect more subtle privacy leaks that might not be exposed through direct questioning, providing a more comprehensive privacy evaluation.
Quantifying Privacy: Metrics and Empirical Insights
Quantifying the nuanced concept of privacy requires precise measurement, and PrivacyBench addresses this challenge through a suite of key metrics. Leakage Rate assesses the extent to which sensitive user data is inadvertently revealed, while the Inappropriate Retrieval Rate gauges how often irrelevant or unauthorized information is accessed. Critically, a system that prioritizes privacy too strongly can become unusable; therefore, PrivacyBench also tracks the Over-Secrecy Rate, which measures instances where legitimate requests are incorrectly blocked. By systematically evaluating these three dimensions, PrivacyBench provides a comprehensive and quantifiable understanding of privacy violations, moving beyond subjective assessments to data-driven insights into system vulnerabilities and the delicate balance between data protection and utility.
Initial evaluations using a standard prompt revealed substantial privacy vulnerabilities within the system. The study quantified information leakage through a metric termed ‘Leakage Rate’, observing an average of 15.80% – indicating a significant proportion of prompts resulted in the unintentional disclosure of sensitive data. Compounding this issue, the ‘Inappropriate Retrieval Rate’ reached 62.80%, demonstrating that over half of the system’s responses contained information that should not have been revealed given the query. These baseline results underscore the inherent challenges in safeguarding user privacy, even without malicious intent, and highlight the necessity for proactive privacy-enhancing techniques to mitigate these risks.
A substantial reduction in information leakage was observed through the implementation of a privacy-aware system prompt. Initial assessments revealed a Leakage Rate – the proportion of sensitive user data inadvertently revealed – of 15.80%. However, by strategically redesigning the system’s guiding instructions to prioritize privacy, this rate was demonstrably lowered to just 5.12%. This improvement indicates a significant enhancement in the system’s ability to protect user confidentiality, suggesting that carefully crafted prompts can serve as a powerful tool in mitigating privacy risks within large language models and similar technologies. The observed decrease underscores the potential for proactive privacy engineering, rather than solely relying on reactive measures after vulnerabilities are identified.
The implementation of a privacy-aware system prompt not only minimized information leakage but also refined the balance between data protection and useful responses, as evidenced by a reduction in the Over-Secrecy Rate from 35.74% to 27.80%. This metric indicates the frequency with which a system unnecessarily withholds information, hindering its utility; lowering this rate suggests the prompt successfully allowed for relevant data retrieval without compromising privacy. The observed decrease demonstrates that privacy enhancements need not come at the cost of functionality; instead, a carefully constructed prompt can optimize both aspects, providing a more effective and user-friendly experience while safeguarding sensitive data.
The study demonstrates a critical vulnerability inherent in many modern systems: the unintentional disclosure of sensitive user data through seemingly harmless personalization features. Analysis reveals that even when not explicitly requested, large language models can leak private information-averaging a 15.80% Leakage Rate with baseline prompts-simply by tailoring responses to perceived user preferences. This leakage isn’t necessarily malicious; it stems from the model’s attempt to provide relevant and engaging answers, inadvertently drawing upon and revealing underlying private details. The findings underscore a crucial need for careful evaluation and proactive mitigation of privacy risks associated with personalization, as the pursuit of enhanced user experience can, without proper safeguards, compromise data security and individual privacy.
The study reveals a critical tension within personalized AI systems-the pursuit of relevance often compromises user privacy. This echoes Paul Erdős’s observation: “A mathematician knows a lot of things, but a physicist knows a few.” In this context, the ‘mathematician’ represents a system focused solely on performance metrics, while the ‘physicist’ understands the underlying structural vulnerabilities. PrivacyBench demonstrates that focusing on generative safeguards alone-the equivalent of polishing the surface-fails to address the fundamental leakages occurring at the retrieval stage. Every new dependency introduced for personalization, as highlighted in the research, carries a hidden cost to user privacy, demanding a holistic architectural approach to secure these complex systems.
What Lies Ahead?
The findings presented here are not, perhaps, surprising. The rush to embed large language models within retrieval-augmented generation systems has, predictably, prioritized functionality over foundational security. It reveals a familiar pattern: complexity layered upon complexity, with privacy addressed as an afterthought – a filter applied to the output rather than woven into the architecture. The observed leakage rates-exceeding 15%-suggest that post-hoc safeguards are, at best, a temporary palliative.
Future work must shift the focus upstream. The retrieval stage, currently a black box of semantic similarity, demands greater scrutiny. A more nuanced understanding of contextual integrity is needed-not simply identifying sensitive keywords, but reasoning about the relationships between information and the expectations of the user. The challenge lies in building systems that inherently minimize the exposure of private data, even at the cost of some retrieval breadth.
Ultimately, the pursuit of personalized AI necessitates a re-evaluation of the trade-offs between utility and privacy. Simplification-limiting the scope of retrieval, employing differential privacy techniques, or embracing federated learning-may prove essential. Each of these choices carries a cost, a reduction in performance or convenience. But the alternative-a future of pervasive surveillance masked as helpful assistance-is a price too high to pay.
Original article: https://arxiv.org/pdf/2512.24848.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
See also:
- Clash Royale Best Boss Bandit Champion decks
- Vampire’s Fall 2 redeem codes and how to use them (June 2025)
- Mobile Legends January 2026 Leaks: Upcoming new skins, heroes, events and more
- Clash Royale Furnace Evolution best decks guide
- M7 Pass Event Guide: All you need to know
- Clash of Clans January 2026: List of Weekly Events, Challenges, and Rewards
- World Eternal Online promo codes and how to use them (September 2025)
- Best Arena 9 Decks in Clast Royale
- Clash Royale Season 79 “Fire and Ice” January 2026 Update and Balance Changes
- Clash Royale Witch Evolution best decks guide
2026-01-04 19:08