Author: Denis Avetisyan
A new dataset reveals how people react to medical robot errors and what recovery strategies they prefer, paving the way for more intuitive and trustworthy robotic assistants.

Researchers introduce RFM-HRI, a multimodal dataset analyzing human-robot interaction during item retrieval tasks, focusing on failure scenarios and preferred recovery methods.
Despite the increasing deployment of robots in real-world settings, understanding human responses to inevitable interaction failures remains a critical gap in human-robot interaction. To address this, we introduce ‘RFM-HRI: A Multimodal Dataset of Medical Robot Failure, User Reaction and Recovery Preferences for Item Retrieval Tasks’, a multimodal dataset captured through Wizard-of-Oz studies in laboratory and hospital environments, documenting human reactions to four types of robot failure during item retrieval. Our analysis reveals that failures significantly degrade affective state and perceived control, eliciting confusion, annoyance, and frustration, while also demonstrating a preference for transparent verbal recovery strategies. How can these insights inform the development of more robust and user-friendly robots capable of gracefully handling failures and maintaining user trust in safety-critical applications?
The Inevitable Stumble: Understanding Robotic Fallibility
The promise of seamless human-robot collaboration often stumbles on the reality of robotic fallibility during commonplace activities. When robots misinterpret instructions, struggle with object recognition, or exhibit unreliable performance, it erodes user confidence and sparks frustration. This isn’t merely a matter of inconvenience; repeated failures actively diminish a user’s trust in the robot’s capabilities, creating a negative feedback loop where hesitancy replaces cooperation. Consequently, even minor glitches can significantly impede the development of truly collaborative systems, as individuals become less willing to rely on a partner perceived as unpredictable or unreliable, hindering the potential benefits of human-robot teams in various settings.
Robot malfunctions in human-robot interaction aren’t typically dramatic breakdowns, but rather subtle errors that erode user confidence. These failures frequently present as difficulties in natural language processing – the robot misinterprets spoken commands or responds with garbled speech. Beyond communication, robots struggle with accurate object recognition and retrieval; an incorrect item search, even for a common household object, disrupts the collaborative flow and signals unreliability. Such seemingly minor incidents collectively degrade the user experience, fostering frustration and ultimately hindering the development of truly effective and trusted human-robot partnerships. The cumulative effect of these failures suggests that improving robotic accuracy in everyday tasks is paramount to broader acceptance and integration into daily life.
Simulating Imperfection: A Controlled Environment for Failure Analysis
A Wizard-of-Oz protocol was utilized to emulate autonomous robotic operation while allowing for the controlled introduction of specific failure modes. This involved a human operator remotely controlling a physical robot, designated the ‘Crash Cart Robot’, to mimic expected behaviors during an item retrieval task. This approach enabled systematic induction of pre-defined failures, such as timing errors or simulated component malfunctions, in a repeatable and observable environment. The method facilitated investigation of human-robot interaction under duress, providing a platform to assess user responses to various robotic failures without relying on the unpredictable nature of actual system malfunctions during early-stage testing.
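The core of such a protocol is that failure conditions are scheduled in advance rather than occurring naturally. A minimal sketch of how trial plans with pre-scheduled failure modes might be generated is shown below; the failure labels and the 50% default rate are illustrative assumptions, not the study's actual taxonomy or design.

```python
import random

# Hypothetical failure-mode labels; the paper's exact taxonomy is not
# reproduced here, so these names are illustrative only.
FAILURE_MODES = ["timing_delay", "wrong_item", "speech_garbled", "no_response"]

def plan_trial(trial_id, failure_rate=0.5, seed=None):
    """Return a Wizard-of-Oz trial plan: either a nominal run or a
    pre-scheduled failure that the human operator will enact."""
    rng = random.Random(seed if seed is not None else trial_id)
    if rng.random() < failure_rate:
        return {"trial": trial_id, "condition": "failure",
                "mode": rng.choice(FAILURE_MODES)}
    return {"trial": trial_id, "condition": "nominal", "mode": None}

plan = plan_trial(7, failure_rate=1.0)
print(plan["condition"], plan["mode"])
```

Seeding the generator per trial keeps the schedule reproducible, which matters when failures must be repeatable across participants.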
The experimental protocol utilized a ‘Crash Cart Robot’ to systematically introduce predefined errors, specifically timing failures, during an ‘Item Retrieval Task’. This allowed for controlled manipulation of robot performance and observation of resultant user interactions. A total of 189 trials were conducted, providing a substantial dataset for analysis. Each trial involved a participant attempting to retrieve a specified item while the robot, operating under the controlled failure conditions, assisted or hindered the process. Data collected during these trials focused on user responses to the induced errors, quantifying aspects like response time, error rate, and corrective actions taken.
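The per-trial measures mentioned above can be computed from timestamped event logs. The sketch below assumes a simple (timestamp, label) log format and event labels of my own choosing; the dataset's actual annotation scheme may differ.

```python
def trial_metrics(events):
    """Compute simple per-trial measures from a timestamped event log.
    `events` is a list of (t_seconds, label) tuples; the label names
    here are assumptions, not the dataset's actual annotation scheme."""
    t0 = events[0][0]
    response = next((t for t, lab in events if lab == "user_response"), None)
    errors = sum(1 for _, lab in events if lab == "robot_error")
    corrections = sum(1 for _, lab in events if lab == "user_correction")
    return {
        "response_time_s": None if response is None else response - t0,
        "error_count": errors,
        "correction_count": corrections,
    }

log = [(0.0, "task_start"), (2.4, "robot_error"),
       (3.1, "user_response"), (4.0, "user_correction")]
print(trial_metrics(log))
```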
Failure detection was implemented to accurately identify instances of robot malfunction during the 189 trials of the Item Retrieval Task. This process involved monitoring key performance indicators and utilizing pre-defined thresholds to flag deviations indicating a failure. Data from these detected failures then served as the basis for analyzing user responses, with a participant pool of 41 individuals comprised of both trained healthcare workers and individuals without specific robotics or medical expertise. This dual demographic allowed for comparative analysis of failure response strategies across varying levels of experience and technical understanding.
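Threshold-based flagging of the kind described can be sketched in a few lines. The indicator names and limits below are placeholders; the article does not publish the study's actual detection rules.

```python
def detect_failures(samples, thresholds):
    """Flag timesteps where a monitored performance indicator exceeds
    its pre-defined threshold. Returns (index, indicator) pairs."""
    flagged = []
    for i, sample in enumerate(samples):
        for key, limit in thresholds.items():
            if sample.get(key, 0.0) > limit:
                flagged.append((i, key))
    return flagged

# Hypothetical indicators: grasp latency and speech-recognition error rate.
thresholds = {"grasp_delay_s": 5.0, "asr_error_rate": 0.3}
stream = [{"grasp_delay_s": 1.2, "asr_error_rate": 0.1},
          {"grasp_delay_s": 7.5, "asr_error_rate": 0.1},
          {"grasp_delay_s": 2.0, "asr_error_rate": 0.4}]
print(detect_failures(stream, thresholds))
# -> [(1, 'grasp_delay_s'), (2, 'asr_error_rate')]
```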

Decoding the Human Response: Multimodal Assessment of Emotional States
The RFM-HRI Dataset is a multimodal resource designed to capture comprehensive user reaction data. It integrates three primary data streams: visual information derived from facial expression analysis, acoustic features extracted through speech recognition, and direct user responses recorded during Human-Robot Interaction (HRI) scenarios. The dataset consists of synchronized recordings of these modalities, allowing for a holistic assessment of user states. Data collection focused on capturing user behavior during both successful and failed task completion, providing a basis for analyzing emotional responses to varying interaction outcomes. The dataset’s structure enables the application of data fusion techniques to correlate facial cues, vocal characteristics, and reported user experiences.
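One way to picture a synchronized record combining the three streams is the following sketch; the field names are assumptions on my part, since the article does not reproduce the released schema.

```python
from dataclasses import dataclass, field

@dataclass
class TrialRecord:
    """One synchronized RFM-HRI-style record (illustrative schema only)."""
    trial_id: int
    outcome: str                     # "success" or a failure-type label
    face_frames: list = field(default_factory=list)   # per-frame expression scores
    speech_transcript: str = ""      # ASR output for the trial
    user_responses: dict = field(default_factory=dict)  # questionnaire items

rec = TrialRecord(trial_id=1, outcome="timing_failure",
                  speech_transcript="why did it stop?")
print(rec.outcome, len(rec.face_frames))
```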
The RFM-HRI dataset underwent analysis utilizing Facial Expression Analysis and Speech Recognition techniques to quantify user emotional states. Facial Expression Analysis employed algorithms to identify and categorize expressions indicative of confusion or frustration, such as brow furrowing or lip corner depression. Simultaneously, Speech Recognition transcribed user utterances, and subsequent analysis focused on linguistic cues – including pauses, speech rate, and specific keywords – associated with cognitive load and negative affect. The combination of these two modalities provided complementary data points for assessing the intensity and type of user emotional response during Human-Robot Interaction.
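The linguistic cues listed above (pauses, speech rate, keywords) can be extracted from a time-aligned transcript. The heuristics below, including the 500 ms pause threshold and the keyword list, are illustrative assumptions rather than the paper's feature set.

```python
def speech_cues(words):
    """Extract coarse cues from a time-aligned transcript.
    `words` is a list of (start_s, end_s, token) triples."""
    frustration_keywords = {"wrong", "again", "no", "stop"}  # assumed list
    duration = words[-1][1] - words[0][0]
    rate = len(words) / duration if duration > 0 else 0.0
    pauses = sum(1 for (_, e1, _), (s2, _, _) in zip(words, words[1:])
                 if s2 - e1 > 0.5)   # inter-word gaps longer than 500 ms
    hits = sum(1 for _, _, tok in words if tok.lower() in frustration_keywords)
    return {"speech_rate_wps": rate, "long_pauses": pauses, "keyword_hits": hits}

aligned = [(0.0, 0.3, "that"), (0.4, 0.7, "is"), (1.5, 1.8, "wrong")]
print(speech_cues(aligned))
```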
Multimodal data fusion was employed to integrate data streams from facial expression analysis, speech recognition, and user response data, allowing for a composite assessment of emotional state during instances of system failure. This integration facilitated the identification of statistically significant differences (p < 0.001) in emotional response between successful and failed human-robot interactions. The fusion process enhanced the robustness of emotional state detection by leveraging complementary information from multiple modalities, mitigating the limitations inherent in relying on a single data source. This approach enabled a more accurate and nuanced understanding of user reactions to failure events than could be achieved through unimodal analysis.
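A minimal form of this idea is weighted late fusion of per-modality scores. The weights below are placeholders; the article does not state the actual fusion scheme used in the RFM-HRI analysis.

```python
def fuse_scores(face_score, speech_score, survey_score,
                weights=(0.4, 0.3, 0.3)):
    """Weighted late fusion of per-modality negative-affect scores in [0, 1].
    The weights are placeholders, not the study's actual parameters."""
    scores = (face_score, speech_score, survey_score)
    return sum(w * s for w, s in zip(weights, scores))

# One modality can miss what the others catch: the face is near-neutral
# here, but speech and self-report pull the fused score up.
print(round(fuse_scores(0.1, 0.8, 0.9), 2))
# -> 0.55
```

Late fusion is the simplest option; feature-level fusion or learned combination would exploit cross-modal correlations that fixed weights cannot.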

Beyond Error Correction: Towards Transparent Recovery and Collaborative Partnership
Research demonstrates that a robot’s ability to articulate its failures and subsequent recovery efforts – termed a ‘Verbal Recovery Strategy’ – is crucial for maintaining positive user perception. The study reveals that when a robot encounters difficulty, openly communicating the issue and the steps taken to resolve it significantly influences how humans perceive the robot’s competence and trustworthiness. Rather than attempting to mask errors, a proactive verbal approach fosters a sense of transparency, allowing users to understand the situation and maintain confidence in the robot’s overall functionality. This suggests that effective communication during failure scenarios is not simply about problem-solving, but also about managing the human-robot interaction and building a collaborative partnership.
Transparent Recovery represents a paradigm shift in how robots address failures, moving beyond silent attempts at correction or simply halting operation. This approach centers on proactive communication, where a robot explicitly acknowledges an issue to its human partner, detailing not only that a failure occurred, but also how it intends to resolve it. By verbalizing the recovery process – for example, stating “I am re-attempting to grasp the object” or “My path is blocked; I am recalculating” – the robot fosters understanding and predictability. This openness allows humans to maintain appropriate levels of situation awareness, reducing frustration and building trust, as they are kept informed of the robot’s internal state and actions during challenging circumstances. Ultimately, Transparent Recovery aims to create a collaborative environment where failures are not disruptive events, but rather opportunities for shared problem-solving.
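In the spirit of the example utterances above, transparent recovery messages can be generated from simple templates that always name the problem and the next step. The failure labels and wording below are my own assumptions, not the study's protocol.

```python
# Illustrative templates: each acknowledges the failure and states the
# recovery action, per the transparency principle described in the text.
RECOVERY_TEMPLATES = {
    "grasp_failed": "I could not pick up the {item}. I am re-attempting to grasp it.",
    "path_blocked": "My path to the {item} is blocked. I am recalculating a route.",
    "asr_unclear": "I did not understand the request. Could you repeat the item name?",
}

def recovery_message(failure_type, item="object"):
    """Return a transparent recovery utterance; unknown failure types fall
    back to a generic but still honest message."""
    template = RECOVERY_TEMPLATES.get(
        failure_type, "Something went wrong. I am pausing and asking for help.")
    return template.format(item=item)

print(recovery_message("path_blocked", item="syringe"))
# -> My path to the syringe is blocked. I am recalculating a route.
```

The fallback branch matters: silence on an unanticipated failure is exactly the opaque behavior the study argues against.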
Research indicates a significant preference for verbal communication when robots encounter and resolve failures, with 63.9% of observed recovery instances relying on spoken strategies. This finding underscores the critical role of transparency in fostering positive human-robot interactions. By clearly articulating the nature of a problem and the steps taken to rectify it, robots can mitigate user frustration and maintain trust. This approach moves beyond simply correcting errors, instead prioritizing a collaborative dynamic where users understand the robot’s internal state and recovery process. Ultimately, this work suggests that prioritizing clear, verbal communication during failures is fundamental to building robust, user-friendly robotic systems and enhancing long-term human-robot collaboration.
The RFM-HRI dataset meticulously charts the inevitable decay inherent in any system – in this case, the interaction between humans and robots. Every instance of failure, as documented within the dataset, is a signal from time, revealing the limitations of current designs and prompting a need for refinement. The study’s focus on recovery strategies acknowledges that perfect execution is an illusion; instead, the emphasis shifts to graceful degradation. As Donald Davies observed, “The real problem is not to build systems to last forever, but to design them to be easily replaced.” This principle resonates deeply with the dataset’s contribution, offering valuable data for building robots that, while not immune to failure, can navigate it with transparency and elicit appropriate user responses.
What Lies Ahead?
The introduction of RFM-HRI marks not an arrival, but an acknowledgement of inevitable system entropy. Every retrieval completed flawlessly is merely a postponement of the eventual failure, and this dataset begins to chart the terrain of those moments. The preference for verbal transparency discovered within the study suggests a human need not for avoidance of error, which is ultimately impossible, but for anticipation. Future work should not focus solely on minimizing failures, but on maximizing the utility of their occurrence as opportunities for recalibration of both robot and user expectations.
A critical, and presently under-explored, dimension concerns the timescale of these interactions. The dataset captures immediate responses, yet the long-term effects of repeated, even gracefully handled, failures remain largely unknown. Does consistent transparency breed trust, or simply a learned helplessness? Architecture without history is fragile, and any truly robust system must account for the accumulated weight of past performance, both successes and shortcomings.
Ultimately, the value of RFM-HRI resides not in what it reveals about current human-robot interaction, but in the questions it compels. Every delay is the price of understanding, and the path toward genuinely resilient healthcare robotics demands a willingness to embrace the inherent imperfections of complex systems – and to learn from them, before they demand a reckoning.
Original article: https://arxiv.org/pdf/2603.05641.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-09 21:46