Seeing Eye to Eye: Tracking Human Awareness in Robot Teams

Author: Denis Avetisyan


A new dataset and methodology allow researchers to measure how quickly humans process information when collaborating remotely with robots.

The interface presents a single, focused view of robotic operations, allowing an operator to monitor robot status and actions within the environment, while simultaneously verifying detected signs of casualties and their precise location – a design prioritizing situational awareness over comprehensive data display.

This paper introduces the HRI-SA dataset, a multimodal resource for assessing human situational awareness latency during remote human-robot teaming using eye-tracking and machine learning.

Maintaining robust situational awareness is critical in complex human-robot teams, yet operators frequently experience lapses, particularly under high cognitive load. To address this limitation, we introduce ‘HRI-SA: A Multimodal Dataset for Online Assessment of Human Situational Awareness during Remote Human-Robot Teaming’, a novel resource comprising eye-tracking, biosignals, and interaction data collected during a realistic search-and-rescue scenario with [latex]\mathcal{N}=30[/latex] participants. Our analysis demonstrates the feasibility of detecting perceptual situational awareness latency – the time between an event requiring assistance and the operator’s response – using readily available eye-tracking features and machine learning, achieving up to 80.38% F1-score. Will this dataset accelerate the development of adaptive interfaces that proactively support operators and enhance team performance in dynamic, real-world applications?


The Fragile Consensus: Shared Awareness in Human-Robot Teams

Successful collaboration between humans and robots fundamentally depends on a shared understanding of their surroundings – a concept known as Situational Awareness. However, this shared awareness is frequently compromised by inherent delays in how operators perceive and interpret information. These delays aren’t simply about reaction time; they encompass the entire cognitive process of recognizing a change in the environment, understanding its significance, and projecting its future impact. Consequently, even a seemingly minor lag in an operator’s comprehension can disrupt the seamless coordination vital for effective Human-Robot Teaming, potentially leading to miscommunication, inefficient task execution, and, in critical scenarios, compromised safety. Bridging this gap requires not only faster data transmission but also innovative methods for presenting information in a way that minimizes cognitive load and accelerates the operator’s ability to build and maintain an accurate mental model of the situation.

Discrepancies between an operator’s mental model of a situation and the actual state of the environment – known as gaps in situational awareness – represent a significant vulnerability in human-robot teams. These gaps aren’t merely cognitive missteps; they directly translate into performance deficits and heighten the risk of critical errors. A robot operating under the assumption of a correctly understood environment, while the operator harbors a flawed perception, can lead to miscoordinated actions, delayed responses, and potentially hazardous outcomes. The severity of these effects is compounded in dynamic scenarios where rapid adaptation is crucial; even brief lapses in shared understanding can escalate into significant problems, underscoring the need for robust methods to identify and mitigate these awareness gaps.

Conventional methods of gauging situational awareness – such as relying on post-activity questionnaires or interrupting ongoing tasks for verbal reports – frequently fall short when applied to dynamic, real-world scenarios. These techniques offer only a snapshot in time, failing to capture the temporal element of awareness; they cannot reveal how and when critical information is actually perceived and integrated by a human operator. This creates a significant challenge because delays in comprehension, even brief ones, can cascade into errors in judgment, particularly when coordinating with robotic teammates. Consequently, researchers are increasingly focused on developing more nuanced assessment tools – like eye-tracking and physiological monitoring – capable of continuously measuring the operator’s cognitive state and pinpointing the precise moments when awareness lags behind the evolving environment, ultimately improving the safety and efficacy of human-robot collaboration.

The efficiency and, crucially, the safety of human-robot collaboration hinge on minimizing the delay – the latency – between when critical information arises and when the human operator fully comprehends it. Research indicates this cognitive lag isn’t simply about reaction time; it encompasses the entire process of perceiving a change in the environment, recognizing its significance, and integrating it into a coherent understanding of the situation. Prolonged latency can create a dangerous disconnect, where the robot acts on outdated or incomplete information, or the operator fails to provide timely guidance. Consequently, investigations are increasingly focused on developing methods to quantify this latency – employing techniques like eye-tracking and neuroimaging – and, more importantly, on designing systems that proactively reduce it, perhaps through predictive displays or adaptive automation that anticipates operator needs and preemptively highlights crucial data.

The user study utilized a tunnel and cave environment where the robot required teleoperation in constrained spaces and waypoint guidance at intersections; efficient tunnel exploration yielded object detection in the order a–f, while the cave allowed for parallel object detection via two robots (a–i and j–n).

Quantifying the Drift: A Dataset for Situational Awareness Measurement

The HRI-SA Dataset was developed to provide objective measurement of Situational Awareness (SA) in Human-Robot Interaction (HRI) scenarios. Data collection occurred within a Simulated Human-Robot Teaming (HRT) Environment, allowing for controlled experimentation and repeatable conditions. The dataset incorporates multimodal data streams, specifically eye-tracking metrics – including gaze position and fixation duration – physiological biosignals such as heart rate variability and electrodermal activity, and detailed records of robot state, including position, actions, and communicated intentions. This combination of data sources facilitates the analysis of cognitive processes related to SA, moving beyond subjective self-reports to quantifiable metrics.

The HRI-SA Dataset facilitates the differentiation between two distinct types of situational awareness (SA) latency: Perceptual SA Latency and Comprehending SA Latency. Perceptual SA Latency specifically measures the time delay between an environmental change and its initial recognition by the operator; it quantifies the delay in detecting a change in state. Comprehending SA Latency, conversely, measures the subsequent delay between recognizing the change and fully understanding its meaning or implications for the current task. By capturing multimodal data correlated to operator response times, the dataset allows for the separate analysis and quantification of these two cognitive processes, providing a more granular understanding of SA breakdowns than traditional, holistic measures.
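The split between the two latencies can be illustrated with a minimal sketch. The timestamp names below (`t_event`, `t_first_fixation`, `t_response`) are illustrative placeholders, not the dataset's actual schema:

```python
# Sketch: splitting total response delay into perceptual and
# comprehending SA latency from three logged timestamps.
# Field names are hypothetical, not the HRI-SA schema.

def sa_latencies(t_event: float, t_first_fixation: float, t_response: float):
    """t_event: when the environmental change occurred;
    t_first_fixation: when the operator's gaze first landed on it;
    t_response: when the operator acted on its implications."""
    perceptual = t_first_fixation - t_event        # delay in noticing the change
    comprehending = t_response - t_first_fixation  # delay in understanding it
    return perceptual, comprehending

# Example: event at t=10.0 s, first fixation at 11.2 s, response at 13.7 s
p, c = sa_latencies(10.0, 11.2, 13.7)
# p: perceptual latency (~1.2 s), c: comprehending latency (~2.5 s)
```

In practice the "first fixation" would come from the eye-tracking stream and the response from the interaction log, which is precisely what a multimodal dataset makes possible.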

Classification of Perceptual Situational Awareness Latency (PSAL) was performed using three machine learning models: Logistic Regression, Decision Trees, and Support Vector Machines. The models were trained and evaluated on the HRI-SA Dataset, resulting in an overall F1-score of 80.38% for PSAL detection. This metric represents a balanced measure of precision and recall, indicating the model’s ability to accurately identify instances of delayed perceptual recognition. Performance varied between models, but the aggregate score demonstrates the feasibility of automated PSAL detection using multimodal data as a proxy for cognitive state.

The performance of the latency detection models is directly correlated to the quality and variety of the input data, specifically eye-tracking metrics and biosignals. These physiological signals function as quantifiable proxies for internal cognitive states such as workload and attentional focus, allowing the algorithms to infer perceptual processing delays. In detecting Perceptual SA Latency (PSAL), the model demonstrated a precision of 72.89%, indicating the proportion of correctly identified PSAL instances out of all instances flagged as such. Complementing this, the model achieved a recall of 88.91%, representing the proportion of actual PSAL instances correctly identified by the model. These metrics demonstrate the model’s ability to both minimize false positives and maximize the detection of genuine perceptual latency events.
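As a sanity check, the F1-score is the harmonic mean of precision and recall. Plugging in the reported PSAL precision and recall gives roughly 0.80, consistent with the reported best overall F1 of 80.38% (which may aggregate across folds or models):

```python
# F1 as the harmonic mean of the reported PSAL precision and recall.
precision = 0.7289
recall = 0.8891
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # ≈ 0.8011
```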

Mapping the Response: Algorithms for Latency Classification

A comprehensive evaluation was conducted utilizing several machine learning algorithms to identify optimal methods for classifying latency in operator performance. Algorithms included in the assessment were Naive Bayes, K-Nearest Neighbors, Random Forest, and AdaBoost. This comparative analysis aimed to determine which algorithm best distinguished between different latency levels based on input features. The selection of these algorithms was predicated on their varying approaches to classification – including probabilistic, distance-based, and ensemble methods – to ensure a robust exploration of potential solutions for automated latency identification.

Evaluation of multiple machine learning models – including Naive Bayes, K-Nearest Neighbors, Random Forest, and AdaBoost – demonstrated the technical viability of automated latency classification. These models successfully distinguished between varying degrees of operator delay, indicating that quantifiable metrics related to situational awareness response times can be derived through algorithmic analysis of operator data. The achieved performance, with the highest-performing model reaching an F1-score of 80.38%, suggests that real-time identification of delays impacting operator performance is achievable, allowing for potential intervention or system adaptation to mitigate negative effects.

Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) were investigated as preprocessing steps to reduce the number of input features used in latency classification. Both techniques aim to maximize the separability between classes while minimizing within-class variance. LDA assumes normally distributed data and equal covariance matrices across all classes, projecting data onto a lower-dimensional space using linear combinations of features. QDA, conversely, allows for different covariance matrices for each class, enabling more flexible decision boundaries but also increasing model complexity and the risk of overfitting. Evaluation focused on whether dimensionality reduction via LDA or QDA improved the performance of subsequent classification algorithms, specifically in terms of metrics like F1-score and classification accuracy.
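The LDA projection step can be sketched in pure NumPy. This is a stand-in for the paper's (unspecified) preprocessing, using synthetic toy data rather than eye-tracking features; note that QDA, which fits per-class covariances, yields quadratic decision boundaries rather than a single projection direction:

```python
# Minimal Fisher LDA sketch for binary latency labels: project features
# onto the one direction maximizing between-class separation relative to
# within-class scatter. Toy data, not the HRI-SA feature set.
import numpy as np

def fisher_lda_direction(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return the unit-norm 1-D Fisher discriminant direction."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix, summed over both classes
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)  # direction ∝ Sw^{-1} (m1 - m0)
    return w / np.linalg.norm(w)

# Toy data: two 5-D clusters, separated along the first feature
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 5)),
               rng.normal(size=(100, 5)) + [3.0, 0, 0, 0, 0]])
y = np.array([0] * 100 + [1] * 100)

w = fisher_lda_direction(X, y)
z = X @ w  # 1-D projected features fed to the downstream classifier
```

With two classes, LDA can reduce the input to at most one dimension, which is why overly aggressive reduction can discard information a flexible classifier would have used.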

Evaluation of multiple machine learning algorithms for latency classification revealed a Random Forest model, utilizing both eye-tracking data and contextual features, achieved a peak F1-score of 80.38%. This performance represents a statistically significant improvement over models relying solely on eye-tracking features, which produced a lower F1-score of 67.63%. The F1-score, calculated as the harmonic mean of precision and recall, provides a balanced measure of the model’s accuracy in identifying latency events, and the observed difference indicates the contextual features substantially contribute to improved classification performance.
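The best-performing setup above amounts to concatenating the two feature groups and fitting a Random Forest. A hedged sketch on synthetic placeholder data (the feature names and label construction are illustrative, not the HRI-SA schema):

```python
# Sketch: Random Forest on concatenated eye-tracking + contextual
# features. All data here is synthetic; real features might include
# fixation duration and saccade rate (eye-tracking) or robot state
# and task phase (context).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 400
eye = rng.normal(size=(n, 6))   # placeholder eye-tracking features
ctx = rng.normal(size=(n, 4))   # placeholder contextual features
X = np.hstack([eye, ctx])       # combined feature vector per time window
y = (eye[:, 0] + ctx[:, 0] > 0).astype(int)  # synthetic latency label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"F1 on held-out split: {f1_score(y_te, model.predict(X_te)):.2f}")
```

The gap the paper reports between eye-tracking-only (67.63%) and combined (80.38%) models corresponds to simply widening `X` with contextual columns, which the forest can then split on.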

Beyond Measurement: Adaptive Automation and the Future of HRI

The capacity to pinpoint lapses in situational awareness (SA) latency is poised to revolutionize human-robot interaction through adaptive automation. Instead of rigidly following pre-programmed instructions, a system capable of recognizing when an operator’s cognitive grasp of the situation diminishes can dynamically adjust its level of autonomy. When SA latency is detected, the system might offer increased assistance – perhaps by highlighting critical information, suggesting alternative courses of action, or even temporarily assuming greater control – effectively buffering the operator from potential errors. Conversely, as the operator’s SA recovers, the system can seamlessly relinquish control, fostering a collaborative partnership where automation supports, rather than supplants, human expertise. This proactive tailoring of automation levels promises not only to enhance safety and efficiency, but also to reduce operator workload and improve overall system performance in dynamic and demanding environments.

Traditional methods of evaluating team situational awareness, such as the SAGAT and SART techniques, rely on periodic ‘freeze-probe’ assessments – moments where operations pause to gauge understanding. However, these snapshots offer only a limited view of cognitive state, potentially missing crucial shifts in awareness during dynamic scenarios. Integrating real-time situational awareness monitoring provides a continuous stream of data, complementing these established methods and building a far more nuanced picture of team performance. This combined approach allows for identification of not just what the team knows at a given moment, but how their understanding evolves over time, revealing patterns and potential vulnerabilities previously obscured by infrequent assessments. The result is a more robust and comprehensive evaluation, critical for optimizing team training and enhancing collaborative effectiveness in complex human-robot interactions.

The capacity to anticipate and address lapses in situational awareness holds significant promise for improving performance in high-stakes human-robot interaction. In challenging environments like search and rescue operations, where time is critical and information is often incomplete, a system capable of proactively adjusting assistance levels based on the operator’s cognitive state can dramatically enhance both safety and efficiency. Similarly, during disaster response or remote operations – such as deep-sea exploration or hazardous material handling – this predictive capability minimizes the risk of errors stemming from cognitive overload or attentional tunneling. By preemptively offering support, such as simplifying displays, providing critical information, or even temporarily assuming control, the technology fosters a more resilient and effective partnership between human and robot, ultimately leading to better outcomes in complex and dynamic situations.

Investigations are increasingly directed toward embedding real-time situational awareness (SA) monitoring within the control architecture of human-robot teams. This integration aims to move beyond simple performance assessment and enable genuinely collaborative systems, where the robot dynamically adjusts its actions based on the operator’s cognitive state. Such closed-loop control systems will leverage SA data to proactively offer assistance, redistribute workload, or even temporarily assume control when detecting lapses in operator attention or understanding. The envisioned outcome is a synergistic partnership, optimizing team performance and resilience in dynamic, complex environments – ultimately fostering a more intuitive and effective interaction between humans and robotic collaborators.

The pursuit of quantifiable metrics, such as detecting situational awareness latency as explored in this research, often feels like chasing a phantom. One attempts to distill the fluid, messy reality of human-robot teaming into neat, measurable units. It echoes a familiar pattern: building systems predicated on the belief that increased complexity equates to improved control. As Linus Torvalds observed, “Talk is cheap. Show me the code.” This dataset, HRI-SA, isn’t merely a collection of data points; it’s an attempt to ground abstract concepts in observable behavior. However, one must remember that even the most meticulously crafted architecture – in this case, a multimodal dataset – is ultimately a prophecy of future limitations, a temporary scaffolding against the inevitable entropy of real-world interaction. The goal isn’t perfect prediction, but a pragmatic understanding of how systems evolve and fail.

The Horizon Beckons

This work, like all attempts to quantify the human-machine partnership, reveals less about control and more about the illusion of it. The HRI-SA dataset offers a snapshot of perceptual latency, a fleeting measure of where attention was rather than where understanding resides. Every new metric, every carefully calibrated eye-tracker, merely refines the map of a territory constantly reshaped by unforeseen events. The promise of predictive models for situational awareness is a siren song; they will inevitably reflect the biases of their training, becoming brittle in the face of true novelty.

The true challenge isn’t detecting when awareness lags, but accepting that it always will. Future efforts should not focus solely on minimizing latency, but on building systems resilient to its inevitable presence. Consider the architecture not as a command structure, but as a scaffolding for graceful degradation. Systems that anticipate their own failures, that allow for ambiguous inputs, and that prioritize human override will prove more valuable than those striving for perfect prediction.

The field chases a phantom: a perfectly aware teammate. It would be wiser to acknowledge that order is just a temporary cache between failures, and to design for a future where the most reliable constant is the unexpected. The value isn’t in anticipating every scenario, but in cultivating the ability to adapt when the inevitable occurs.


Original article: https://arxiv.org/pdf/2603.18344.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
