Turning Wearable Data into Actionable Insights

Author: Denis Avetisyan

A new AI system automates the process of identifying meaningful health indicators from the constant stream of data generated by wearable sensors.

CoDaS leverages multi-agent systems and causal inference to accelerate biomarker discovery and clinical validation from wearable sensor data.

Despite the increasing volume of physiological data collected from wearable sensors, translating this continuous stream into clinically actionable biomarkers remains a significant challenge. This work introduces CoDaS (AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors), a multi-agent system designed to automate and accelerate biomarker discovery through iterative hypothesis generation, statistical analysis, and validation. Across multiple cohorts ([latex]N = 9{,}279[/latex] participant-observations), CoDaS identified 66 candidate digital biomarkers, including circadian instability features and a cardiovascular fitness index, and demonstrated modest improvements in predictive performance for depression and insulin resistance. Could this systematic, traceable approach unlock the full potential of wearable data for personalized health monitoring and early disease detection?

Deciphering the Noise: Biomarker Discovery in the Age of Wearable Sensors

The pursuit of reliable biomarkers – measurable indicators of health or disease – faces increasing difficulty with the proliferation of data from wearable sensors. While devices like smartwatches and fitness trackers generate a wealth of information on physiological signals and daily behaviors, traditional analytical methods often struggle to discern genuine biological signals from the noise inherent in real-world data collection. This challenge stems not only from the sheer volume of data – encompassing heart rate, sleep patterns, activity levels, and more – but also from its inherent complexity, including individual variability, environmental factors, and sensor inaccuracies. Consequently, studies frequently report spurious correlations or unreliable biomarkers, necessitating more sophisticated computational tools capable of filtering noise, identifying meaningful patterns, and validating potential indicators with greater rigor. The promise of personalized medicine hinges on overcoming these hurdles and extracting truly insightful information from the growing stream of wearable data.

The proliferation of wearable sensors and large-scale longitudinal studies, such as the Digital Wellbeing Study and GLOBEM Study, has generated an unprecedented volume of time-series data, presenting a significant challenge to traditional biomarker discovery methods. These datasets, continuously tracking physiological and behavioral metrics over extended periods, far exceed the capacity of manual analysis and conventional statistical techniques. Consequently, researchers are actively developing novel computational approaches – including machine learning algorithms and advanced data mining strategies – to effectively process, profile, and interpret this complex information. The goal is not simply to store the data, but to extract meaningful signals indicative of health states and predict potential risks, demanding tools capable of handling high dimensionality, temporal dependencies, and individual variability inherent in these continuous streams of information.

Current analytical methods are proving inadequate for identifying reliable biomarkers in this growing stream of wearable data. Existing techniques struggle to sift efficiently through the volume of time-series data, which hinders both the formulation of testable hypotheses and the rigorous validation needed to confirm true biomarkers. The limitation is not only one of computational speed: the manual effort required to analyze even a modest dataset is substantial, with estimates suggesting 37 person-days of work for a single, focused investigation. Automated approaches are therefore essential, not merely desirable, for translating biomarker discoveries into meaningful improvements in healthcare.

CoDaS is a closed-loop system that leverages continuous physiological data, large language models, and deterministic code to iteratively refine candidate biomarkers from natural-language research directives, achieving competitive performance on benchmarks and receiving high scores in blinded human expert evaluations.

A Multi-Agent System for Autonomous Biomarker Discovery

CoDaS employs a Multi-Agent System (MAS) architecture wherein biomarker discovery tasks are decomposed and distributed across multiple autonomous agents. This approach mirrors the parallel processing observed in biological systems, enhancing computational efficiency and scalability. Each agent within CoDaS is responsible for a specific sub-task, such as data acquisition, feature selection, or model evaluation, and operates independently while coordinating with other agents through defined communication protocols. The MAS framework allows for concurrent execution of these sub-tasks, significantly reducing the overall time required for biomarker discovery compared to traditional sequential methods. This distribution of workload also provides inherent fault tolerance, as the failure of one agent does not necessarily halt the entire process.
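The decomposition described above can be sketched in a few lines of Python. This is only an illustration of concurrent sub-tasks with fault isolation, not the CoDaS implementation: the agent names, the run_agent helper, and the placeholder payloads are all hypothetical, and the source does not describe the system's actual agent interfaces or communication protocols.

```python
import asyncio

async def run_agent(name, work, fail=False):
    """Hypothetical agent: performs one sub-task and reports its result."""
    await asyncio.sleep(0)  # yield control; stands in for real I/O or model calls
    if fail:
        raise RuntimeError(f"{name} failed")
    return name, work()

async def run_pipeline():
    # Sub-task names follow the text; the payloads are placeholders.
    tasks = [
        run_agent("data_acquisition", lambda: "data loaded"),
        run_agent("feature_selection", lambda: "features ranked", fail=True),
        run_agent("model_evaluation", lambda: "baseline scored"),
    ]
    # return_exceptions=True keeps one agent's failure from cancelling the rest.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    completed = {r[0]: r[1] for r in results if not isinstance(r, Exception)}
    failed = [str(r) for r in results if isinstance(r, Exception)]
    return completed, failed

completed, failed = asyncio.run(run_pipeline())
print(completed)
print(failed)
```

In this toy run the simulated failure of one agent is recorded while the other two still return results. A real pipeline would also need to handle dependencies between sub-tasks, which this sketch deliberately ignores.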

CoDaS incorporates automated Data Profiling and Hypothesis Generation as central components of its biomarker discovery process. Data Profiling involves the systematic examination of datasets to identify data characteristics, including distributions, relationships, and anomalies. This analysis informs the Hypothesis Generation module, which leverages these identified patterns to formulate potential biological explanations – specifically, hypotheses linking data features to biomarker potential. The system automatically constructs these hypotheses, detailing the observed data patterns and proposing mechanistic links to relevant biological processes, thereby reducing the need for manual curation and accelerating the exploratory phase of biomarker research.
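A minimal sketch of how profiling output might feed hypothesis generation is shown below. It covers only the statistical half of the step (summaries and a Pearson correlation) and leaves out the mechanistic reasoning that the text attributes to the system. The function names, the 0.5 threshold, the feature names, and the toy values are assumptions made for illustration.

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def profile(features, outcome):
    """Summarize each feature and its linear association with the outcome."""
    report = {}
    for name, values in features.items():
        # Drop observations where the feature is missing.
        pairs = [(v, o) for v, o in zip(values, outcome) if v is not None]
        xs, ys = zip(*pairs)
        report[name] = {
            "n_missing": len(values) - len(xs),
            "mean": mean(xs),
            "sd": stdev(xs),
            "r_with_outcome": pearson(xs, ys),
        }
    return report

def generate_hypotheses(report, min_abs_r=0.5):
    """Turn strong associations into plain-language candidate hypotheses."""
    hypotheses = []
    for name, stats in report.items():
        r = stats["r_with_outcome"]
        if abs(r) >= min_abs_r:
            direction = "higher" if r > 0 else "lower"
            hypotheses.append(
                f"Higher {name} is associated with {direction} outcome scores (r={r:.2f})"
            )
    return hypotheses

# Toy, made-up values for illustration only.
features = {
    "sleep_onset_sd_hours": [0.3, 0.5, 0.9, 1.4, None, 1.8],
    "daily_steps_thousands": [8.0, 9.0, 7.0, 9.0, 8.0, 8.0],
}
outcome = [2, 4, 7, 10, 6, 14]
report = profile(features, outcome)
hypotheses = generate_hypotheses(report)
print(hypotheses)
```

Only the first feature clears the threshold here, so a single candidate hypothesis is produced. Any such candidate would still have to pass the validation steps described later.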

CoDaS uses the Gemini-3.1 Pro and Gemini-3 Flash large language models to support reasoning and to speed up iterative analytical tasks. A complete automated analysis runs in 6 to 8 hours, compared with the estimated 37 person-days required to achieve comparable results through manual data analysis and hypothesis testing.

CoDaS is an autonomous research system that leverages a coordinated network of specialized agents to iteratively discover and validate statistically robust biomarkers, culminating in a reproducible draft manuscript grounded in a deterministic fact sheet and protected against spurious discoveries through leakage prevention, adversarial validation, and rigorous statistical testing with FDR correction.

Fortifying Validity: Eliminating Spurious Signals Through Adversarial Testing

CoDaS utilizes an adversarial validation process wherein two agent-based systems, a ‘critic’ and a ‘defender’, are deployed to rigorously assess potential biomarkers. The ‘defender’ agent proposes biomarker candidates, while the ‘critic’ agent actively attempts to disprove their validity through targeted counter-examples and data perturbations. This iterative process, resembling a game-theoretic scenario, subjects each biomarker to continuous challenge, exposing vulnerabilities and weaknesses in its predictive capability. By forcing the biomarker to withstand adversarial attacks, CoDaS aims to identify robust signals and eliminate spurious correlations that might otherwise appear significant in standard validation procedures.
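One simple way to picture the critic's role is a label-permutation test: the defender claims that a feature separates cases from controls, and the critic checks how often randomly shuffled labels produce a separation at least as large. This is a stand-in technique chosen for illustration. The source does not specify how the CoDaS critic constructs its counter-examples or perturbations, and the function names and data below are made up.

```python
import random
from statistics import mean

def abs_mean_diff(values, labels):
    """Defender's claim: the feature separates cases (1) from controls (0)."""
    cases = [v for v, l in zip(values, labels) if l == 1]
    controls = [v for v, l in zip(values, labels) if l == 0]
    return abs(mean(cases) - mean(controls))

def critic_permutation_p(values, labels, n_perm=2000, seed=0):
    """Critic's challenge: how often do shuffled labels do at least as well?"""
    rng = random.Random(seed)
    observed = abs_mean_diff(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_mean_diff(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up data: 10 controls followed by 10 cases.
labels = [0] * 10 + [1] * 10
strong = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1, 0.9, 1.0,
          2.0, 2.3, 1.9, 2.1, 2.2, 1.8, 2.4, 2.0, 2.1, 1.9]
weak = [1.0, 2.1, 0.9, 1.9, 1.0, 2.2, 1.3, 2.0, 0.9, 1.8,
        2.0, 1.1, 1.9, 1.0, 2.2, 0.8, 2.4, 1.2, 2.1, 0.9]

p_strong = critic_permutation_p(strong, labels)
p_weak = critic_permutation_p(weak, labels)
print(p_strong, p_weak)
```

The candidate with a real group difference survives the challenge with a small p-value, while the one whose apparent difference is noise does not.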

Statistical validation within the CoDaS framework employs established methods to determine the likelihood that identified biomarkers represent genuine biological signals rather than random occurrences. This assessment incorporates techniques such as p-value calculation, confidence interval estimation, and effect size determination to quantify the strength and reliability of biomarker associations. Rigorous statistical thresholds are applied to minimize false positive rates, and validation is performed on independent datasets whenever possible to ensure generalizability. Furthermore, the robustness of biomarkers is evaluated through sensitivity analyses, assessing how results change with variations in data preprocessing, statistical modeling choices, and the inclusion/exclusion of specific covariates, ultimately increasing confidence in the identified signals.
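When many candidate biomarkers are tested at once, per-test thresholds alone do not control false discoveries, which is why the system description mentions FDR correction. The sketch below implements the Benjamini-Hochberg step-up procedure, a common choice for this purpose. Whether CoDaS uses this exact procedure is an assumption, and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Invented p-values for eight candidate biomarkers.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p_values)
print(rejected)
```

Five of these p-values fall below an unadjusted 0.05 threshold, but only the two smallest survive the correction. That gap shows how quickly nominal significance erodes once multiplicity is accounted for.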

The WEAR-ME (Wearable Evaluation of Acute Response – Multi-dimensional Evaluation) study serves as a critical dataset for validating CoDaS biomarker candidates. It pairs continuously monitored wearable sensor data with comprehensive clinical panel assessments, which establishes a ground truth for comparison. An independent expert review of CoDaS outputs using the WEAR-ME dataset demonstrated an 86% non-rejection rate, meaning that 86% of the biomarkers identified by CoDaS were supported by the clinical panel data. This is a statistically significant improvement over the 0% non-rejection rate achieved by the Biomni platform when assessed with the same dataset.

CoDaS demonstrates competitive performance across a suite of benchmarks, including clinical reasoning (HealthBench), real-world data analysis (DSBench), data science code generation (DataSciBench), quantitative reasoning (DSGym), and hypothesis generation (DiscoveryBench), suggesting it possesses the analytical capabilities necessary for autonomous biomarker discovery.

Translating Data into Insight: From Sleep Variability to Cardiovascular Fitness

Recent research utilizing the CoDaS system indicates a compelling link between subtle shifts in sleep patterns and the potential for depressive disorders. This innovative approach identifies Sleep Variability – deviations from an individual’s typical sleep schedule, including timing and duration – as a promising biomarker for early detection. Unlike traditional diagnostic methods, CoDaS offers a non-invasive means of assessment, relying on data-driven analysis to pinpoint changes that might otherwise go unnoticed. The system’s ability to quantify these variations provides a valuable tool for clinicians, potentially enabling earlier intervention and more personalized treatment strategies for individuals at risk of developing depression. This advancement moves beyond simply noting sleep quantity, focusing instead on the quality and consistency of sleep as a crucial indicator of mental wellbeing.
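To make "sleep variability" concrete, the sketch below computes the night-to-night standard deviation of sleep onset time and sleep duration. This is one plausible operationalization, not necessarily the one used in the paper. The function names, the noon-anchored time mapping, and the example nights are assumptions.

```python
from statistics import stdev

def hours_after_noon(hhmm):
    """Map a clock time to hours since 12:00 so times around midnight stay contiguous."""
    h, m = map(int, hhmm.split(":"))
    return ((h - 12) % 24) + m / 60

def sleep_variability(nights):
    """nights: list of (sleep_onset 'HH:MM', duration_hours). Returns SDs in hours."""
    onsets = [hours_after_noon(onset) for onset, _ in nights]
    durations = [duration for _, duration in nights]
    return {"onset_sd_h": stdev(onsets), "duration_sd_h": stdev(durations)}

# Made-up nights for two hypothetical participants.
regular = [("23:00", 7.5), ("23:15", 7.25), ("22:45", 7.5), ("23:00", 7.75), ("23:10", 7.5)]
irregular = [("22:00", 9.0), ("01:30", 5.0), ("23:45", 7.0), ("03:00", 4.5), ("21:30", 8.5)]

print(sleep_variability(regular))
print(sleep_variability(irregular))
```

Anchoring times at noon avoids the artifact where 23:30 and 00:30 look 23 hours apart. Both participants here average between 6.5 and 7.5 hours of sleep, yet their variability differs sharply, which is the distinction between sleep quantity and sleep consistency that the paragraph above draws.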

Cardiovascular fitness emerges as a central determinant of metabolic health, according to recent analyses. The system’s findings reveal a strong correlation between an individual’s aerobic capacity and key metabolic markers, including insulin sensitivity, lipid profiles, and glucose regulation. This relationship underscores the potential for tailoring interventions – from exercise prescriptions to dietary adjustments – based on a person’s current fitness level and metabolic risk. By accurately assessing cardiovascular health, the system facilitates a shift towards preventative, personalized medicine, enabling proactive strategies to mitigate the development of metabolic disorders and improve overall well-being. This precision approach promises to move beyond generalized recommendations, fostering more effective and sustainable health outcomes.

The capacity to convert complex datasets into practical knowledge is demonstrated by recent findings from the integrated analysis of multiple studies. This research shows that the CoDaS system does not merely collect data but distills it into actionable insights, meaning the information gained can directly inform interventions and strategies. Crucially, the consistency of these evaluations is high, as evidenced by an intraclass correlation coefficient (ICC) of 0.888. This level of inter-rater reliability indicates that the assessments of CoDaS outputs reflect consistent, reproducible judgments rather than the subjective interpretation of any single rater, bolstering the system's potential for widespread application and clinical utility.
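For readers unfamiliar with the ICC, the sketch below computes one common variant, ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single rater), from a matrix of items by raters. The source does not say which ICC form produced the reported 0.888, so the choice of variant is an assumption. The ratings are the small worked example from Shrout and Fleiss's 1979 paper, not data from this study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per rated item, one column per rater.
    """
    n = len(ratings)      # number of rated items
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Six items scored by four raters (textbook example, not study data).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(round(icc, 2))  # about 0.29 for this matrix
```

The low value in this example comes from large systematic differences between raters. A value near 0.888 would instead require raters to agree closely on both ordering and absolute level.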

Despite differences in operationalization and data sources, two independent cohorts, DWB and GLOBEM, converge on circadian instability, specifically through sleep variability, as a consistent biomarker of depression severity, as evidenced by monotonically increasing variability across severity bands.
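The "monotonically increasing" pattern can be checked mechanically: average the variability measure within each severity band and confirm that each band's mean exceeds the previous one. The band labels and values below are hypothetical placeholders, and the paper's actual banding and statistics are not reproduced here.

```python
from statistics import mean

def band_means(records, band_order):
    """records: list of (severity_band, variability). Returns per-band means in order."""
    return [mean([v for b, v in records if b == band]) for band in band_order]

def is_monotonic_increasing(values):
    """True when every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# Hypothetical severity bands and made-up sleep variability values (hours).
band_order = ["minimal", "mild", "moderate", "severe"]
records = [
    ("minimal", 0.6), ("minimal", 0.8),
    ("mild", 0.9), ("mild", 1.1),
    ("moderate", 1.3), ("moderate", 1.5),
    ("severe", 1.9), ("severe", 2.1),
]

means = band_means(records, band_order)
print(means, is_monotonic_increasing(means))
```

A strict ordering of band means is a coarse check. It says nothing about effect size or uncertainty, so it complements the statistical tests rather than replacing them.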

The development of CoDaS exemplifies a holistic approach to biomarker discovery, one that recalls G. H. Hardy's remark: “A mathematician, like a painter or a poet, is a maker of patterns.” This system isn’t merely assembling data points; it’s crafting a coherent understanding from the complex patterns inherent in wearable sensor data. The multi-agent architecture, which supports hypothesis generation and causal inference, suggests that scalable solutions arise not from brute-force computation but from clear, well-defined interactions between components. Each agent, functioning within the broader system, contributes to a more robust and insightful pattern, much like the elegance found in a well-proven mathematical theorem.

What Lies Ahead?

The automation of biomarker discovery, as demonstrated by CoDaS, presents a seductive illusion of progress. It is tempting to view this as a simple scaling of existing methods, yet the true challenge resides not in processing more data, but in acknowledging the inherent limitations of the data itself. Wearable sensors, for all their convenience, offer a severely filtered view of biological reality; the signal is rarely the substance. Future work must therefore prioritize not simply the refinement of algorithms, but the development of methods to account for – and even model – the very biases introduced by these convenient proxies.

The multi-agent approach is a promising architectural choice, mirroring the complex, distributed nature of physiological systems. However, the current emphasis on agent collaboration should be balanced with a critical examination of potential emergent behaviors – unintended consequences arising from the interaction of these automated ‘scientists’. A system capable of generating hypotheses also requires mechanisms for rigorously assessing their plausibility beyond statistical significance, demanding a deeper integration of domain knowledge and causal reasoning.

Ultimately, the success of such systems will not be measured by their ability to discover biomarkers, but by their capacity to guide focused, impactful clinical investigation. The true test lies not in replacing the human scientist, but in augmenting their intuition – offering a curated landscape of possibilities, rather than a deluge of statistically significant noise. Simplification, after all, always carries a cost.

Original article: https://arxiv.org/pdf/2604.14615.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
