Decoding Daily Rhythms: AI Spots Repeating Patterns in Human Activity

Author: Denis Avetisyan

Researchers have developed a new approach to automatically identify long-term, repeating sequences of actions, offering insights into how people structure their days and detect unusual behavior.

A dataset of 580 long-term human activity workflows, spanning factory production, exercise routines, and transportation schedules, demonstrates the challenge of building compact yet comprehensive benchmarks for modeling real-world periodic tasks.

This work introduces a novel unsupervised learning method and benchmark for discovering periodic workflows from spatiotemporal data, improving performance in activity recognition and anomaly detection.

While short-term, repetitive human actions are well-studied, understanding long-term periodic workflows remains a significant challenge. This is addressed in ‘Unsupervised Discovery of Long-Term Spatiotemporal Periodic Workflows in Human Activities’, which introduces a new benchmark dataset of 580 multimodal activity sequences and a training-free baseline method for their unsupervised detection. Experiments demonstrate substantial performance gains over existing approaches in tasks including period detection, completion tracking, and anomaly identification, all without requiring labeled data. Could this work pave the way for more adaptable and efficient activity recognition systems in diverse real-world applications?

Whispers of Routine: Uncovering Temporal Structure in Human Activity

Human activity, from daily routines to complex tasks, frequently exhibits repeating sequences – workflows that define how actions unfold over time. However, discerning these patterns presents a significant challenge for existing computational methods. Traditional techniques often falter when confronted with the inherent complexities of real-world data, particularly the long-term dependencies that characterize extended workflows. These methods typically require substantial amounts of meticulously labeled data, a limitation that restricts their broad applicability and makes them impractical for many scenarios. The difficulty arises from the variability within these repetitions; human actions are rarely perfectly consistent, introducing noise and subtle variations that obscure the underlying structure. Consequently, a more adaptable and sophisticated approach is needed to effectively identify and model these recurring patterns in human behavior, unlocking opportunities for proactive systems and intelligent automation.

Current methodologies for discerning patterns in human activity frequently falter when confronted with dependencies spanning extended periods – a significant limitation given the inherently sequential nature of many tasks. These techniques often necessitate substantial volumes of meticulously labeled data, a requirement that proves both costly and impractical in real-world scenarios where such annotations are scarce or unavailable. The reliance on supervised learning, therefore, restricts the broad applicability of these approaches, hindering their deployment in dynamic, uncontrolled environments where the ability to adapt to novel situations without constant retraining is paramount. Consequently, a shift towards unsupervised or self-supervised learning paradigms is crucial to unlock the full potential of activity recognition and enable more versatile, scalable solutions.

The ability to discern repeating patterns in human activity holds substantial promise for enhancing everyday life through proactive technologies and improved safety measures. Recognizing these periodic structures – from daily routines like commuting to work or preparing meals, to longer-term cycles of exercise or project management – enables systems to anticipate needs and offer assistance before it is explicitly requested. Beyond convenience, accurate detection of these patterns forms the bedrock of effective anomaly detection; deviations from established routines can signal unusual events, such as falls for elderly individuals, security breaches in digital systems, or even the onset of illness, allowing for timely intervention and potentially mitigating negative consequences. Consequently, research focused on robustly identifying these temporal structures isn’t simply an academic exercise, but a crucial step towards building truly intelligent and responsive environments.

Current activity recognition systems often fall short in mirroring the complexities of human behavior, largely due to their reliance on simplified models and pre-defined patterns. These systems struggle with the inherent variability in how individuals perform tasks – slight deviations in timing, order, or even the inclusion of sub-tasks can lead to misclassification. Consequently, there’s a growing need for methods that can autonomously learn these intricate temporal structures without requiring extensive, manually labeled datasets. A robust, unsupervised approach promises to overcome these limitations by directly inferring patterns from raw behavioral data, allowing systems to adapt to individual differences and capture the full spectrum of human activity – from the predictable routines to the spontaneous improvisations that define daily life.

Our method establishes workflow periods by first constructing an activity transcript with soft tokens to define an initial window size, then sequentially mining this data to identify period boundaries.

Decoding the Sequence: Introducing a Novel Unsupervised Workflow Detection Method

The baseline methodology utilizes a combined soft and hard tokenization approach to represent activity frames as sequences of discrete events for feature extraction. Hard tokenization segments activity data into predefined, fixed-duration intervals, while soft tokenization employs a sliding window with variable-length segments based on detected changes in activity. This dual approach enables the system to capture both temporally precise events and broader activity phases. Specifically, accelerometer and gyroscope data are first processed to identify significant changes in movement, forming the basis for soft tokenization. Simultaneously, the activity stream is divided into fixed-length intervals – typically 100ms – constituting the hard tokens. These tokens are then combined and represented as a feature vector, incorporating statistical measures such as mean, standard deviation, and entropy, to create a compact and informative representation of each activity frame suitable for subsequent pattern recognition.
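The two tokenization styles described above can be sketched in a few lines. This is a minimal illustration on a one-dimensional signal, not the paper's implementation: the function names `hard_tokens`, `soft_tokens`, `entropy`, and `features`, the jump-threshold rule for soft boundaries, and the histogram-based entropy are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def hard_tokens(signal, width):
    """Split the signal into fixed-width windows (a trailing remainder is dropped)."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, width)]


def soft_tokens(signal, threshold):
    """Start a new variable-length segment whenever consecutive samples
    differ by more than `threshold` (an assumed change-detection rule)."""
    if not signal:
        return []
    segments = [[signal[0]]]
    for prev, cur in zip(signal, signal[1:]):
        if abs(cur - prev) > threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def entropy(values, bins=4):
    """Shannon entropy in bits of a histogram over the token's value range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def features(token):
    """Summarize one token as (mean, population standard deviation, entropy)."""
    return (mean(token), pstdev(token), entropy(token))
```

Hard tokens keep timing regular, while soft tokens follow the signal: on a trace that sits near 0, jumps to about 5, and drops back, `soft_tokens` with a threshold of 1.0 yields three segments regardless of how long each phase lasts.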

Frequency domain analysis, specifically utilizing the Fast Fourier Transform (FFT), is implemented to detect recurring patterns within activity time series data. This process transforms the data from the time domain into the frequency domain, allowing for the identification of dominant frequencies that correspond to the cyclical nature of specific tasks. The amplitude of each frequency component indicates the strength of that particular cycle, while the inverse of the frequency provides an estimate of the activity cycle duration. By analyzing the spectral power density, the system can effectively pinpoint and quantify periodic behaviors, even in the presence of noise or variations in execution speed. This enables the estimation of typical task completion times and the identification of anomalies deviating from established cycles.
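As a rough sketch of the frequency-domain idea, the snippet below estimates a dominant period from the strongest spectral peak. It computes the discrete Fourier transform naively with the standard library, which gives the same spectrum an FFT would, only more slowly; the function name `dominant_period` and the choice to report the period in samples are assumptions for illustration.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest non-zero frequency,
    or None if the signal has no variation."""
    n = len(signal)
    avg = sum(signal) / n
    centered = [x - avg for x in signal]  # remove the constant (DC) component

    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        # k-th DFT coefficient: correlation with a complex sinusoid of k cycles
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power

    if best_k == 0:
        return None
    # A peak at k cycles per n samples corresponds to a period of n / k samples
    return n / best_k
```

The resolution of this estimate is limited by the signal length: only periods of the form n / k can be reported, so long cycles in short recordings are estimated coarsely.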

A sequential mining algorithm, specifically designed for time-series data, is applied to the tokenized activity data to identify frequently occurring sequences of actions. This process involves establishing a minimum support threshold – the percentage of activity instances required for a sequence to be considered common – and then iteratively discovering sequences exceeding this threshold. The algorithm utilizes techniques like the Apriori principle to efficiently prune the search space, focusing on sequence extensions that meet the support criteria. The resulting frequent sequences represent the underlying workflow structures, providing a concise and interpretable representation of typical activity patterns within the dataset. These discovered patterns can then be used for activity recognition, anomaly detection, or process optimization.
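The support threshold and Apriori-style pruning can be illustrated with a small miner for contiguous token runs. This is a simplified sketch under stated assumptions, not the paper's algorithm: patterns are contiguous subsequences, support is the fraction of sequences containing the pattern at least once, and the names `contains` and `frequent_sequences` are hypothetical.

```python
def contains(sequence, pattern):
    """True if `pattern` (a tuple) occurs as a contiguous run in `sequence`."""
    m = len(pattern)
    return any(
        tuple(sequence[i:i + m]) == pattern
        for i in range(len(sequence) - m + 1)
    )


def frequent_sequences(database, min_support):
    """Map each frequent contiguous pattern to its support.

    Support is the fraction of sequences in `database` that contain the pattern.
    """
    n = len(database)
    alphabet = sorted({tok for seq in database for tok in seq})
    frequent = {}
    level = [(tok,) for tok in alphabet]  # candidate patterns of length 1

    while level:
        survivors = []
        for pattern in level:
            support = sum(contains(seq, pattern) for seq in database) / n
            if support >= min_support:
                frequent[pattern] = support
                survivors.append(pattern)

        # Apriori pruning: a longer pattern can only be frequent if both its
        # prefix and its suffix of the current length are frequent.
        survivor_set = set(survivors)
        level = [
            pattern + (tok,)
            for pattern in survivors
            for tok in alphabet
            if pattern[1:] + (tok,) in survivor_set
        ]
    return frequent
```

For example, with the four sequences `ABCD`, `ABCE`, `XABC`, and `ABD` (one token per letter) and a minimum support of 0.75, the longest pattern that survives is `A, B, C`, which appears in three of the four sequences.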

Spatiotemporal feature extraction enhances workflow detection accuracy by representing movement data as a combination of spatial and temporal characteristics. This involves analyzing the location of an actor within an environment and the sequence of locations over time. Specifically, features are derived from both positional data – such as $x$, $y$, and $z$ coordinates – and temporal information like velocity, acceleration, and duration of stay at particular locations. By combining these spatial and temporal dimensions, the method creates a more robust representation of activity that is less susceptible to noise and variations in execution speed, ultimately improving the precision of workflow identification compared to methods relying solely on spatial or temporal data.
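To make the feature list concrete, here is a small sketch that derives speed, acceleration, and dwell time from a timestamped 3D track. The track layout `(t, x, y, z)`, the function names, and the rule that an interval counts as "dwelling" only when both of its endpoints lie inside the region are assumptions chosen for the example.

```python
import math


def speeds(track):
    """Speed over each consecutive pair of samples in a list of (t, x, y, z)."""
    return [
        math.dist(p0, p1) / (t1 - t0)
        for (t0, *p0), (t1, *p1) in zip(track, track[1:])
    ]


def accelerations(track):
    """Change in speed between neighbouring intervals, divided by the time
    between the interval midpoints."""
    v = speeds(track)
    times = [t for t, *_ in track]
    mids = [(a + b) / 2 for a, b in zip(times, times[1:])]
    return [
        (v1 - v0) / (m1 - m0)
        for v0, v1, m0, m1 in zip(v, v[1:], mids, mids[1:])
    ]


def dwell_time(track, center, radius):
    """Total duration of intervals whose endpoints both lie within
    `radius` of `center`."""
    total = 0.0
    for (t0, *p0), (t1, *p1) in zip(track, track[1:]):
        if math.dist(p0, center) <= radius and math.dist(p1, center) <= radius:
            total += t1 - t0
    return total
```

For a track that rests at the origin for one second and then moves 5 units per second, `speeds` returns a zero followed by fives, and `dwell_time` around the origin reports the one second of rest.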

Our baseline analysis effectively visualizes production workflows, as demonstrated in the supplementary video.

The Benchmark of Time: Rigorous Evaluation with the Long-Term Periodic Workflow Benchmark

The Long-Term Periodic Workflow Benchmark comprises multiple datasets designed to evaluate the performance of algorithms on tasks involving extended temporal dependencies. These datasets include sequences of 3D body pose estimations, capturing human movement over time, and indoor/outdoor trajectories representing the paths of objects or agents within an environment. The benchmark’s datasets are characterized by varying lengths of periodic patterns, ranging from short, immediately repeating actions to long-term, complex behaviors that unfold over extended durations. This diversity is intended to provide a robust evaluation platform, distinguishing between methods capable of capturing both immediate and sustained periodicities.

Evaluation of the proposed method uses two primary quantitative metrics: temporal Intersection over Union (tIoU) and Mean Absolute Percentage Error (MAPE). tIoU assesses boundary detection by measuring the overlap between predicted and ground-truth period intervals, providing a measure of localization performance. MAPE, expressed as a percentage, quantifies the error in period counting; it is the mean of the absolute percentage differences between predicted and actual period counts. Higher tIoU indicates better boundary detection, while lower MAPE indicates better period counting. These metrics allow for a precise, numerical comparison against both unsupervised and supervised baseline methods.
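Both metrics are simple to compute. The sketch below uses their standard definitions, with intervals given as `(start, end)` pairs; the function names `tiou` and `mape` are chosen for illustration, and the benchmark's own evaluation code may aggregate over periods differently.

```python
def tiou(pred, truth):
    """Temporal IoU of two (start, end) intervals: overlap length / union length."""
    (ps, pe), (ts, te) = pred, truth
    inter = max(0.0, min(pe, te) - max(ps, ts))
    union = (pe - ps) + (te - ts) - inter
    return inter / union if union > 0 else 0.0


def mape(predicted, actual):
    """Mean absolute percentage error between predicted and actual counts.

    Assumes every actual count is non-zero.
    """
    errors = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)
```

As a worked case, a predicted period of `(0, 10)` against a true period of `(5, 15)` overlaps for 5 units out of a 15-unit union, giving a tIoU of one third. Predicting 12 and 8 periods where the true counts are both 10 gives a MAPE of 20%.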

Evaluation on Task 1, period counting within the Long-Term Periodic Workflow Benchmark, demonstrated a Mean Absolute Percentage Error (MAPE) of less than 50% using our method. This represents a substantial improvement over traditional unsupervised methods, which exhibited a MAPE exceeding 2500%, an error more than 50 times larger. MAPE here is the average percentage difference between predicted and actual period counts across the benchmark dataset, so the gap reflects markedly more accurate identification of repetitive patterns within sequential data.

Anomaly detection, evaluated as Task 3 within the Long-Term Periodic Workflow Benchmark, showed our method achieving superior performance when contrasted with fully supervised approaches. This was determined by analyzing the method’s ability to accurately identify deviations from established periodic patterns within the datasets, specifically 3D body pose sequences and indoor/outdoor trajectories. Quantitative results demonstrated a statistically significant improvement in detection accuracy – measured by precision and recall – compared to supervised models trained on comparable datasets, indicating a greater capacity to generalize and identify novel anomalous behaviors without requiring labeled anomaly examples.

Evaluation on the S-PAD benchmark, specifically the RepCount dataset, demonstrates the capability of our approach to accurately identify and count periodic movements across varying timescales. RepCount assesses performance on sequences exhibiting both short-duration repetitions, such as individual exercise repetitions, and longer-term periodic activities. Our method consistently achieves high accuracy on this benchmark, indicating robust performance irrespective of the length of the periodic structure being analyzed. This capability is critical for applications requiring the analysis of both rapid and sustained cyclical behaviors, such as activity recognition and human performance monitoring.

Increasing the value of $K$ results in more detailed and extended workflows for the same activity sequence, as demonstrated by the inclusion of additional tokens beyond those skipped.

Whispers into Action: Implications and Future Directions for Adaptive Systems

The capacity to discern extended, repeating patterns in daily activities unlocks the potential for truly proactive assistance systems. By accurately identifying these long-term workflows – such as a user’s consistent morning routine or predictable evening tasks – technology can move beyond simple reactivity and begin to anticipate needs before they are explicitly expressed. This predictive capability extends to a range of applications, from automatically adjusting home environments to preparing necessary tools or information, effectively streamlining a user’s day and minimizing cognitive load. The benefit isn’t merely convenience; it’s the creation of systems that learn and adapt to individual rhythms, offering support that feels intuitive and genuinely helpful, particularly for individuals with changing needs or those requiring assistance with complex tasks.

The ability to establish a baseline of typical activity patterns offers a powerful means of detecting anomalies, and consequently, identifying unusual or potentially dangerous events. This methodology’s capacity to recognize deviations from established norms is particularly relevant in applications demanding real-time vigilance, such as fall detection for elderly or mobility-impaired individuals. Similarly, in security monitoring, unexpected behavioral patterns – a door left open at an unusual hour, or movement in a restricted area – can be flagged as potential threats. By precisely characterizing what constitutes ‘normal’ behavior, the system becomes adept at highlighting instances that fall outside of these parameters, providing early warnings or triggering appropriate responses and enhancing safety and security measures across diverse contexts.
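One simple way to turn a baseline of normal behavior into an anomaly flag is to compare each observed cycle duration against a robust estimate of the typical one. The sketch below is only an illustration of that idea, not the method evaluated in the paper: the name `flag_anomalies`, the use of the median absolute deviation, and the default tolerance of 3 are all assumptions.

```python
from statistics import median


def flag_anomalies(durations, tolerance=3.0):
    """Return indices of cycle durations that deviate from the median by
    more than `tolerance` times the median absolute deviation (MAD)."""
    typical = median(durations)
    mad = median([abs(d - typical) for d in durations])
    if mad == 0:
        # All typical cycles are identical: anything different is unusual
        return [i for i, d in enumerate(durations) if d != typical]
    return [
        i for i, d in enumerate(durations)
        if abs(d - typical) > tolerance * mad
    ]
```

With cycle durations of 60, 62, 59, 61, 60, 140, and 58 seconds, only the 140-second cycle is flagged. Using the median rather than the mean keeps that single outlier from shifting the baseline it is compared against.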

The conventional development of activity recognition systems typically demands extensive, manually-labeled datasets – a process that is both time-consuming and expensive. This methodology circumvents that limitation by operating without the need for pre-labeled data, representing a substantial reduction in both deployment cost and required effort. By autonomously learning patterns directly from raw sensor data, the system obviates the need for human annotation, opening avenues for broader accessibility and scalability. This unsupervised approach not only accelerates the development cycle but also facilitates real-world applications in dynamic and unpredictable environments where obtaining labeled data is impractical or impossible, promising a more adaptable and cost-effective future for intelligent systems.

Ongoing research aims to broaden the applicability of this methodology beyond controlled settings, addressing the challenges posed by real-world complexity and constant change. Future iterations will incorporate techniques for handling unpredictable events, adapting to evolving user behaviors, and processing data from multiple, heterogeneous sensors. This expansion isn’t merely about increasing the number of detectable activities; it’s about building systems capable of learning and generalizing – moving beyond simple recognition to genuine understanding. The ultimate goal is to create truly intelligent systems that proactively adjust to their environment and seamlessly integrate into daily life, offering assistance and support that is both relevant and anticipatory.

The pursuit of discerning patterns within the chaos of human activity echoes a fundamental tenet of understanding itself. This work, focused on the unsupervised discovery of long-term periodic workflows, doesn’t seek to define activity, but to coax forth the inherent rhythms hidden within spatiotemporal data. It’s as if the algorithms aren’t building models, but rather listening for echoes. As Geoffrey Hinton once observed, “The world isn’t discrete; we just ran out of float precision.” This rings true; the attempt to categorize and label feels inherently limiting. Instead, this research embraces the granularity of movement, allowing the emergence of workflows through tokenization and anomaly detection, suggesting that true insight lies not in precision, but in acknowledging the continuous, fluid nature of existence itself.

What’s Next?

The pursuit of predictable patterns in the chaos of human activity yields, predictably, more chaos. This work demonstrates a facility for finding routines (a clever trick, certainly) but sidesteps the more unsettling question of what those routines actually mean. Any model that successfully predicts behavior is, at its core, a system for reinforcing the expected. The true test won’t be in identifying completed workflows, but in detecting the subtle deviations, the glitches in the matrix, that signal genuine novelty. A perfect score on anomaly detection isn’t progress; it’s evidence of a sufficiently constrained observation space.

The tokenization approach, while effective, feels… fragile. It’s a spell woven from location and time, easily broken by a change in context. Future work must grapple with the inherent ambiguity of human action-the fact that any activity can be interpreted in an infinite number of ways. Perhaps the focus should shift from identifying discrete workflows to modeling the probabilities of transitions between states, embracing the inherent uncertainty rather than attempting to suppress it.
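The transition-based alternative suggested here can be illustrated with a first-order Markov estimate, in which each state's outgoing probabilities are the normalized counts of what followed it. This is a minimal sketch of that general idea rather than anything proposed in the paper; the function name `transition_probabilities` is hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate P(next state | current state) from one observed state sequence."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for cur, followers in counts.items()
    }
```

For the sequence A, B, A, B, A, C, state B is always followed by A, while A leads to B two times out of three and to C once. A rarely seen transition such as A to C then shows up as a low-probability event rather than a hard violation of a fixed workflow.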

Ultimately, the value of this research isn’t in automating the recognition of routines, but in highlighting how little we understand them. If the hypothesis held up, one didn’t dig deep enough. The benchmark established here is merely a provocation: an invitation to build models that are not merely predictive, but truly surprising. Anything less is simply confirmation bias dressed as intelligence.

Original article: https://arxiv.org/pdf/2511.14945.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
