Teaching Robots to Hesitate: A New Dataset for Expressive Movement

Author: Denis Avetisyan


Researchers have created a multi-modal dataset of human-guided robot motions, allowing robots to learn how to communicate uncertainty and intent through subtle pauses and adjustments.

A dancer instructs a Franka Emika Panda manipulator through kinesthetic teaching, guiding the robot to produce a deliberate trajectory towards a Jenga tower, demonstrating an approach to robot programming centered on physical guidance and iterative refinement.

The Dance2Hesitate dataset captures kinesthetic teaching from dancers, providing data for robots to learn functional expressivity and improve human-robot interaction.

Effective human-robot collaboration requires robots to signal their internal states, yet conveying hesitancy, a crucial element of natural communication, remains a significant challenge because hesitancy is embodied and context-dependent. To address this, we introduce ‘Dance2Hesitate: A Multi-Modal Dataset of Dancer-Taught Hesitancy for Understandable Robot Motion’, a novel multi-modal dataset capturing both kinesthetic teaching demonstrations and human motion capture of hesitant reaching movements. This resource comprises 70 whole-body trajectories and 84 upper-limb trajectories across three graded hesitancy levels, enabling researchers to benchmark and develop more intuitive and predictable robot behaviors. How can we leverage this dataset to build robots that communicate uncertainty in a way that fosters trust and improves collaborative task performance?


The Nuances of Interaction: Beyond Functional Robotics

For decades, the field of robotics has largely centered on achieving optimal performance in terms of speed, accuracy, and repeatability. This emphasis on efficiency has, however, inadvertently led to a neglect of the subtle, yet crucial, aspects of communication inherent in human interaction. Robots are typically designed to execute tasks, not to express intent or uncertainty. Consequently, robotic movements often appear abrupt, mechanical, and lacking in the social cues that humans instinctively use to coordinate actions and build trust. This prioritization of function over expressiveness presents a significant obstacle to seamless human-robot collaboration, especially in dynamic environments where ambiguity and adaptability are paramount, limiting the potential for robots to truly integrate into human-centered workflows.

Human communication extends far beyond spoken words; a wealth of information is conveyed through subtle bodily cues, including micro-movements, shifts in posture, and even variations in gaze. These non-verbal signals intuitively communicate a person’s intentions, levels of confidence, and degrees of uncertainty, allowing for seamless social interaction. Current robotic systems, however, typically lack the capacity to either express or interpret these nuanced cues. While a robot might flawlessly execute a programmed task, it often does so without conveying how certain it is of its actions, or signaling potential difficulties. This deficiency hinders effective collaboration, as humans are naturally inclined to respond to these subtle signals when interacting with one another, and struggle to build rapport or anticipate the actions of a robot that appears rigidly deterministic.

Effective human-robot collaboration falters when robots operate without the ability to signal uncertainty or intent, especially within dynamic, real-world scenarios. Current robotic systems, optimized for precise execution, often lack the expressive capabilities humans instinctively use – a slight hesitation, a broadened gaze, or a modulated movement – to convey nuanced information during joint tasks. In complex environments, such as disaster response or collaborative manufacturing, this absence of nonverbal communication creates ambiguity and increases the cognitive load on human partners who must constantly interpret a robot’s actions without insight into its internal state. Consequently, collaborative efficiency decreases, and the potential for errors rises as humans struggle to anticipate robotic behavior and adapt to unforeseen circumstances, highlighting the critical need for robots capable of communicating beyond simple task completion.

Mapping Human Hesitation: Data Acquisition and Analysis

RGB-D motion capture was utilized to record detailed human movement data, with a specific emphasis on capturing the subtle characteristics of hesitant actions. This technology combines standard RGB video with depth data, allowing for the accurate reconstruction of three-dimensional human pose and movement in space. The resulting data provides a comprehensive record of kinematic information, including joint positions, velocities, and accelerations, crucial for analyzing the nuanced physical expressions associated with uncertainty and indecision. This approach allows for quantitative assessment of hesitation beyond simple behavioral observation, enabling precise measurement and characterization of subtle motor patterns.
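The article does not include the processing pipeline itself, but the core computation is standard: given a sequence of 3D keypoints, velocities and accelerations follow from finite differences. A minimal sketch, assuming a `(frames, joints, xyz)` array layout and the dataset's 30 Hz sampling rate (the function name is ours):

```python
import numpy as np

def derive_kinematics(positions: np.ndarray, fps: float = 30.0):
    """Estimate joint velocities and accelerations from 3D keypoints.

    positions: array of shape (T, J, 3): T frames, J joints, xyz coordinates.
    Returns (velocities, accelerations), each time-aligned with the input.
    A hypothetical helper, not the authors' code.
    """
    dt = 1.0 / fps
    velocities = np.gradient(positions, dt, axis=0)      # central differences
    accelerations = np.gradient(velocities, dt, axis=0)  # second derivative
    return velocities, accelerations
```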

A participant pool of 14 dancers (2 male, 12 female) generated data exhibiting hesitant actions. Participants performed tasks centered around manipulating a Jenga tower, a physical setup selected to elicit subtle, observable expressions of uncertainty and cautious movement. This controlled environment allowed for consistent capture of whole-body and upper-limb trajectories as performers navigated the challenge of removing blocks without toppling the tower, providing a standardized basis for analyzing the kinematic characteristics of hesitancy.

A dataset of human movement was compiled consisting of 70 full-body trajectories and 84 upper-limb trajectories, captured across three defined levels of hesitancy. Data processing leveraged the OpenPose keypoint detection system, supplemented by custom algorithms, with a minimum keypoint confidence threshold of 0.30 to ensure data accuracy. The resulting motion capture data was sampled at a rate of 30 frames per second, providing a high-resolution temporal representation of subject movements for detailed analysis and characterization of hesitant behavior.
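As an illustration of how the reported 0.30 confidence threshold might be applied, the sketch below masks out low-confidence OpenPose detections so that downstream smoothing can treat them as missing data (the function is hypothetical; the array layout mirrors OpenPose's per-joint (x, y, confidence) output):

```python
import numpy as np

CONF_THRESHOLD = 0.30  # minimum keypoint confidence reported in the paper

def filter_keypoints(keypoints: np.ndarray) -> np.ndarray:
    """Replace low-confidence OpenPose detections with NaN.

    keypoints: (T, J, 3) array of (x, y, confidence) per joint per frame.
    Returns a (T, J, 2) coordinate array with unreliable joints masked,
    ready for interpolation or smoothing. Illustrative, not the paper's code.
    """
    coords = keypoints[..., :2].astype(float)
    coords[keypoints[..., 2] < CONF_THRESHOLD] = np.nan
    return coords
```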

Data was collected using a kinesthetic teaching station to demonstrate desired motions and a motion capture station to record the robot’s movements.

Formalizing Expressivity: Modeling Hesitant Motion

A formalized hesitation profile was developed utilizing acceleration data to quantify and characterize hesitant movement. This profile defines hesitancy not as a pause in motion, but as a pattern of repeated accelerations and decelerations before completing a trajectory. Specifically, the profile measures the amplitude, frequency, and duration of these acceleration peaks and valleys to distinguish between varying degrees of hesitancy. The resulting data allows for the objective comparison of trajectories exhibiting different levels of hesitation and serves as a basis for generating synthetic hesitant motions. This approach contrasts with methods based on velocity or position, which can be less sensitive to the nuanced dynamics of hesitant behavior.
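As a rough sketch of how such a profile could be computed from data, the function below summarises a one-dimensional acceleration-magnitude signal by the count, mean amplitude, and rate of its acceleration bursts. The peak-picking parameter is an illustrative placeholder, not a value taken from the paper:

```python
import numpy as np
from scipy.signal import find_peaks

def hesitation_profile(accel_mag: np.ndarray, fps: float = 30.0) -> dict:
    """Summarise an acceleration-magnitude signal as a hesitation profile.

    Repeated acceleration/deceleration bursts, rather than outright pauses,
    are treated here as the signature of hesitant motion.
    """
    peaks, _ = find_peaks(accel_mag, prominence=0.1)  # prominence is a guess
    duration_s = len(accel_mag) / fps
    return {
        "n_bursts": len(peaks),
        "mean_amplitude": float(accel_mag[peaks].mean()) if len(peaks) else 0.0,
        "burst_rate_hz": len(peaks) / duration_s,
    }
```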

A Variational Autoencoder (VAE) was implemented to create a compressed, latent representation of observed hesitant motions. This approach allows for the decoupling of movement characteristics into a lower-dimensional space, facilitating the generation of novel, yet plausible, expressions of hesitancy. The VAE architecture learns a probabilistic mapping between the observed motion data and this latent space, enabling the sampling of new points within the latent space and their subsequent decoding into diverse hesitant movements. By training the VAE on collected kinesthetic teaching data, the model captures the underlying statistical distribution of hesitant behaviors, resulting in generated motions that exhibit realistic variations in speed, acceleration, and trajectory.
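A minimal PyTorch version of this idea, assuming flattened fixed-length trajectory vectors (the paper's actual architecture, dimensions, and training regime are not specified here), might look like:

```python
import torch
import torch.nn as nn

class TrajectoryVAE(nn.Module):
    """Sketch of a VAE over flattened trajectories; layer sizes are arbitrary."""

    def __init__(self, input_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to the standard normal prior.
    recon_err = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl
```

Once trained, sampling a latent vector from the standard normal prior and passing it through the decoder yields novel trajectories that stay on the learned manifold of hesitant motion.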

Implementation of the learned hesitancy model involved integration with a Franka Emika Panda robotic manipulator utilizing kinesthetic teaching. A dataset of 66 trajectories was collected, representing demonstrations of hesitant motion across three distinct levels of expressed hesitancy. This kinesthetic teaching approach enabled the transfer of learned behaviors from human demonstration to the robotic platform, allowing the robot to reproduce the nuanced movements characteristic of varying degrees of hesitancy. The resulting dataset provided the basis for training and validating the model’s ability to generate and execute hesitant motions on a physical robotic system.
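The dataset's on-disk layout is not described in this article; a consumer of the kinesthetic demonstrations might nonetheless organise them along these lines (paths, file format, and the 0/1/2 level encoding are our assumptions; the seven-dimensional joint vector reflects the Panda's seven revolute joints):

```python
from dataclasses import dataclass
from pathlib import Path
import numpy as np

@dataclass
class Demonstration:
    """One kinesthetically taught trajectory (file layout is hypothetical)."""
    joint_positions: np.ndarray  # (T, 7): the Panda arm has 7 revolute joints
    hesitancy_level: int         # 0 = none, 1 = moderate, 2 = strong

def load_demonstrations(root: Path) -> list[Demonstration]:
    """Collect all demonstrations, grouped by assumed level_{0,1,2} folders."""
    demos = []
    for level in (0, 1, 2):
        for f in sorted((root / f"level_{level}").glob("*.npy")):
            demos.append(Demonstration(np.load(f), level))
    return demos
```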

Trust Through Transparency: Validating Robot Expressivity

Recent research confirms that robots can effectively communicate uncertainty through non-verbal cues, specifically by conveying hesitation in their movements. Qualitative assessments revealed that human collaborators readily perceived and correctly interpreted these subtle signals, indicating a capacity for nuanced communication beyond simple task execution. This ability is crucial for building trust and fostering effective human-robot teamwork, as it allows the robot to signal when its internal state reflects doubt or a need for clarification. By visibly expressing uncertainty, the system avoids presenting potentially inaccurate information as definitive, ultimately promoting safer and more reliable collaboration scenarios – particularly in complex or unpredictable environments.

Researchers assessed the emotional impact of a robot’s expressive movements by utilizing the Valence-Arousal-Dominance (VAD) model, a widely accepted framework for charting emotional states. This analysis revealed a significant correlation between the robot’s intended expressions – such as happiness, sadness, or uncertainty – and the emotional responses registered in human observers. Specifically, movements designed to convey positive emotions consistently elicited high valence and arousal scores, while those intended to signal negative states resulted in lower values. This suggests the robot isn’t simply performing movements, but effectively communicating emotional cues that are appropriately perceived and processed by humans, a crucial step towards building trust and facilitating seamless collaboration.
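One generic way to quantify such agreement, not necessarily the statistical procedure used in the study, is to rank-correlate intended against observer-rated scores per VAD dimension:

```python
import numpy as np
from scipy.stats import spearmanr

def vad_agreement(intended: np.ndarray, rated: np.ndarray) -> dict:
    """Spearman rank correlation between intended and rated VAD scores.

    intended, rated: (N, 3) arrays of (valence, arousal, dominance) per
    trial. A hypothetical analysis helper, not code from the study.
    """
    dims = ("valence", "arousal", "dominance")
    return {d: spearmanr(intended[:, i], rated[:, i])[0]
            for i, d in enumerate(dims)}
```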

The design of this robotic system prioritized not simply what the robot did, but how it moved, leveraging the principles of Laban-Effort analysis, a method for describing and categorizing human movement qualities. This approach moved beyond purely functional kinematics, intentionally shaping robotic gestures to convey subtle emotional cues. By incorporating elements like weight, space, and flow, the research team aimed to create movements that resonated with human observers on an aesthetic and emotional level, fostering a sense of natural interaction. The result is a robot capable of expressing hesitation or uncertainty not through verbal cues, but through the quality of its physical movements, a critical step towards building truly collaborative and trustworthy human-robot partnerships.

The work detailed in ‘Dance2Hesitate’ highlights a crucial element often overlooked in robotics: the communication of intent beyond precise execution. The dataset’s focus on hesitant motions, captured through kinesthetic teaching, reveals that effective interaction isn’t solely about what a robot does, but how it signals its uncertainty. This echoes a core tenet of robust system design; structure dictates behavior, and the inclusion of subtle cues, like a slight pause or an altered trajectory, creates a more understandable, and therefore reliable, system. As Linus Torvalds once stated, “Talk is cheap. Show me the code.” This dataset provides the ‘code’, the nuanced movement data, necessary to build robots that don’t just perform tasks, but communicate their internal state through their actions.

The Road Ahead

The creation of ‘Dance2Hesitate’ feels less like an arrival and more like a well-calibrated starting point. The dataset addresses a genuine scarcity – nuanced human demonstration of uncertainty in motion – but it also highlights a deeper, and predictably thorny, issue. If the system looks clever, it’s probably fragile. Demonstrating how a robot should express hesitancy is demonstrably easier than imbuing it with the appropriate contextual awareness to know when. The captured data represents skillful performance, but skill is often a mask for a cascade of unarticulated assumptions.

Future work will undoubtedly focus on bridging this gap – on developing algorithms that can translate kinematic data into something resembling genuine communicative intent. However, a more fundamental challenge lies in acknowledging that ‘functional expressivity’ is not merely a feature to be added, but a constraint to be accepted. Architecture is the art of choosing what to sacrifice; a robot capable of expressing the full spectrum of human uncertainty will likely be a chaotic and unpredictable entity.

The true test will not be whether a robot can convincingly mimic hesitancy, but whether it can leverage it to foster more robust and, crucially, more trustworthy interactions. The goal, after all, isn’t to build a robot that merely appears uncertain, but one that allows a human to accurately predict its next, perhaps tentative, step.


Original article: https://arxiv.org/pdf/2603.10166.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-12 17:07