Teaching Robots to Feel: Smarter Manipulation Through Language and Touch

Author: Denis Avetisyan


Researchers have developed a new approach that combines visual understanding, language guidance, and adaptable force control to enable robots to perform complex manipulation tasks with greater safety and precision.

The system integrates visual observation, language instruction, and force feedback to dynamically adjust impedance parameters [latex]\mathcal{K,D}[/latex], enabling a variable impedance controller to execute adaptable and safe contact-rich manipulation.

This work introduces CompliantVLA-adaptor, a method leveraging Vision-Language-Action models and variable impedance control for robust contact-rich manipulation.

While recent advances in robotic manipulation leverage the power of vision-language models, a critical gap remains in ensuring safe and compliant physical interactions. This work introduces CompliantVLA-adaptor: VLM-Guided Variable Impedance Action for Safe Contact-Rich Manipulation, an approach that augments state-of-the-art Vision-Language-Action (VLA) systems with variable impedance control informed by large language models. By adapting stiffness and damping parameters based on task context and real-time force feedback, the authors demonstrate improved success rates and fewer force violations in complex contact-rich scenarios, with overall success rising from 9.86% to 17.29%. Could this VLM-guided impedance control pave the way for more robust and versatile robotic systems capable of seamlessly interacting with uncertain environments?


The Fragility of Action: Bridging Perception and Reality

Contemporary Vision-Language-Action (VLA) models, despite advancements in artificial intelligence, consistently demonstrate fragility when confronted with the unpredictable nature of physical interactions. These systems, trained primarily on curated datasets, often exhibit a disconnect between perception and action in real-world scenarios. Subtle variations in object pose, lighting conditions, or unexpected contact forces can lead to significant performance degradation, causing robotic systems to falter during even seemingly simple tasks. The core issue lies in the difficulty of generalizing learned behaviors from simulated or limited environments to the continuous and often chaotic reality of physical manipulation, hindering the development of truly robust and adaptable robotic agents.

Current robotic manipulation systems, guided by vision-language-action models, often falter when confronted with the subtleties of real-world interactions. These systems struggle to perceive and respond to nuanced forces – the slight resistance when pressing parts together, the delicate balance needed when lifting a fragile object, or the unexpected push from an uneven surface. This limitation becomes critically apparent in tasks demanding precision, such as assembly or delicate handling, where even minor miscalculations can lead to failure or damage. The inability to adapt to unpredictable environmental factors – a slippery surface, a slightly misaligned component, or an unforeseen obstruction – further exacerbates the problem, hindering the deployment of these robots in dynamic, unstructured settings. Consequently, achieving truly robust robotic manipulation requires a significant leap in the ability to perceive, understand, and react to the complex interplay of forces and environmental conditions.

Underlying these failures is a limited grasp of the physics that governs contact. Despite advances in vision and language processing, current systems cannot accurately predict the consequences of actions involving force, friction, and deformation, which are crucial in tasks requiring precision or adaptability. Consequently, robots may apply excessive force, damaging objects or creating unsafe conditions, or they may fail to exert enough force to complete an assembly or manipulation. The resulting manipulations are frequently ineffective, requiring human intervention or breaking down under even minor changes in the environment or in object properties. Addressing this limitation calls for a deeper integration of physics-based modeling and learning algorithms, so that robots not only see and plan but also account for how forces propagate and interact during physical contact.

Current Vision-Language-Action (VLA) systems, lacking force-awareness, pose safety risks during physical tasks with contact or uncertainty, highlighting a key area for improvement in their deployment.

Adaptive Compliance: Augmenting Action with Context

CompliantVLA-adaptor is a method designed to enhance the safety and adaptability of existing Vision-Language-Action (VLA) robotic systems. It does so by integrating variable impedance control, which lets the robot dynamically adjust its stiffness and damping. Rather than requiring modifications to the underlying VLA model or control architecture, CompliantVLA-adaptor functions as an augmentation layer that modulates the robot’s response to external forces and uncertainties encountered during task execution. The aim is to improve robustness in dynamic and unpredictable environments by allowing the robot to react more flexibly to contact and disturbances.

CompliantVLA-adaptor uses the GPT-4o-mini language model to dynamically adjust impedance control parameters in robotic systems. The system prompts the model with information describing the current task context, including object properties and desired interaction behaviors. The model then outputs values for the stiffness (proportional) and damping (derivative) gains, and potentially other gains, used in the robot’s impedance controller. These parameters are applied to the robot’s control loop, modulating its stiffness and damping to suit the specific task without manual tuning or pre-programmed behaviors.
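Because the gains come from a language model, it is prudent to validate them before they reach the controller. The following is a minimal sketch of that step, not the authors' implementation: it assumes the model has been asked to reply with a JSON object containing `stiffness` and `damping`, and the bounds, fallback values, and function names are all hypothetical.

```python
import json
import math

# Hypothetical safety bounds; the article does not state the real ranges.
K_BOUNDS = (50.0, 1000.0)   # translational stiffness, N/m
D_BOUNDS = (5.0, 100.0)     # translational damping, N*s/m
FALLBACK = {"stiffness": 200.0, "damping": 30.0}  # conservative default


def clamp(value, bounds):
    """Restrict value to the closed interval given by bounds."""
    low, high = bounds
    return max(low, min(high, value))


def parse_impedance_reply(reply_text):
    """Turn a model reply into bounded gains, or fall back on bad output."""
    try:
        data = json.loads(reply_text)
        stiffness = float(data["stiffness"])
        damping = float(data["damping"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing keys, or non-numeric values.
        return dict(FALLBACK)
    if not (math.isfinite(stiffness) and math.isfinite(damping)):
        # Reject NaN and infinity rather than clamping them.
        return dict(FALLBACK)
    return {
        "stiffness": clamp(stiffness, K_BOUNDS),
        "damping": clamp(damping, D_BOUNDS),
    }
```

Clamping and falling back keeps a malformed or overly aggressive reply from producing an unsafe controller setting, at the cost of occasionally ignoring a suggestion that was in fact reasonable.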

CompliantVLA-adaptor enhances the performance of contact-rich robotic tasks through dynamic modulation of robot compliance. By adjusting stiffness and damping parameters, the system mitigates potential damage from unexpected collisions and improves adaptability to varying environmental conditions. The reported overall task success rate rises from 9.86% with the baseline VLA controller to 17.29% with CompliantVLA-adaptor. That is a gain of 7.43 percentage points, or a relative increase of roughly 75% (7.43 / 9.86 ≈ 0.754), indicating a substantial improvement in both safety and reliability for applications involving physical interaction with the environment.

Across eight simulated tasks, the CompliantVLA-adaptor consistently improved task success rates, reaching up to 100% in some cases. This demonstrates its ability to stabilize performance relative to baseline VLA models, which exhibited highly variable and often failing ([latex]0\%[/latex]) results under a 30 N contact force constraint.

The Mechanics of Adaptation: Implementing Variable Impedance

Variable Impedance Control (VIC) maintains stability and adapts to external forces by modulating a robot’s response to contact. This is achieved through the continuous monitoring of force feedback – typically utilizing force/torque sensors – which informs adjustments to the robot’s stiffness, damping, and inertia parameters. Specifically, deviations from desired trajectories, coupled with sensed contact forces, are used as input to a control loop that modifies the robot’s impedance – its resistance to motion. Higher stiffness values provide greater resistance, suitable for precise positioning, while lower stiffness and increased damping allow for compliant interaction and shock absorption. Precise and low-latency force feedback is critical; delays or inaccuracies can lead to instability or failure to adapt to unexpected disturbances. The controller calculates required joint torques based on the desired impedance, sensed forces, and robot kinematics, effectively regulating the robot’s behavior during contact.
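The torque computation described above can be illustrated with a simplified task-space law. This is a sketch under stated assumptions rather than the controller used in the paper: stiffness and damping are taken as diagonal (one gain per axis), inertia shaping and gravity compensation are omitted, and the function names are hypothetical. In this reduced form the sensed contact force does not appear explicitly; it enters only when the desired inertia is also shaped.

```python
def impedance_wrench(k, d, x_des, x, v_des, v):
    """Per-axis virtual spring-damper: F = K (x_des - x) + D (v_des - v)."""
    return [
        k_i * (xd_i - x_i) + d_i * (vd_i - v_i)
        for k_i, d_i, xd_i, x_i, vd_i, v_i in zip(k, d, x_des, x, v_des, v)
    ]


def joint_torques(jacobian, wrench):
    """Map a task-space wrench to joint torques: tau = J^T F.

    jacobian is a list of rows, one row per task-space axis and one
    column per joint.
    """
    n_joints = len(jacobian[0])
    return [
        sum(jacobian[row][j] * wrench[row] for row in range(len(wrench)))
        for j in range(n_joints)
    ]


# Example: a 2-axis task with the end effector 2 cm short of its target
# along the first axis and still moving toward it at 0.1 m/s.
wrench = impedance_wrench(
    k=[400.0, 400.0], d=[40.0, 40.0],
    x_des=[0.50, 0.20], x=[0.48, 0.20],
    v_des=[0.0, 0.0], v=[0.1, 0.0],
)
```

Raising an entry of `k` makes the corresponding axis resist position error more strongly, while raising an entry of `d` resists velocity error, which is the stiffness and damping trade-off the paragraph describes.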

The CompliantVLA-adaptor utilizes impedance control by modulating parameters governing robot stiffness and damping in real-time. This dynamic adjustment is achieved through modification of the impedance controller’s gain matrix, specifically altering values associated with proportional and derivative gains. Increased proportional gain results in higher stiffness, enabling faster response to deviations from the desired trajectory, while increased derivative gain provides higher damping, reducing oscillations and improving stability during contact. By independently controlling these parameters, the CompliantVLA-adaptor can tailor the robot’s mechanical response to varying task demands and environmental uncertainties, allowing it to effectively handle unpredictable external forces and maintain stable contact.
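One simple way to combine gain modulation with force feedback is to soften the robot when the measured contact force exceeds a limit and to stiffen it gradually otherwise. The rule below is an illustrative assumption, not the adaptation law from the paper: the 30 N limit is the constraint quoted in the evaluation, while the scaling factors, stiffness bounds, effective mass, and function names are hypothetical.

```python
import math

FORCE_LIMIT_N = 30.0  # contact-force constraint quoted in the evaluation


def adapt_stiffness(k_current, force_measured, k_min=50.0, k_max=1000.0,
                    soften=0.5, recover=1.1):
    """Soften quickly above the force limit, stiffen slowly below it."""
    if abs(force_measured) > FORCE_LIMIT_N:
        k_new = k_current * soften
    else:
        k_new = k_current * recover
    # Keep the result inside the allowed stiffness range.
    return max(k_min, min(k_max, k_new))


def damping_for(k, mass=1.0, zeta=1.0):
    """Damping that gives damping ratio zeta for a mass-spring-damper.

    zeta = 1.0 corresponds to critical damping: D = 2 * sqrt(K * m).
    """
    return 2.0 * zeta * math.sqrt(k * mass)
```

Recomputing damping from the new stiffness keeps the damping ratio constant, so softening the robot does not by itself introduce oscillation. Setting the two gains independently, as the paragraph describes, is equally possible and simply drops `damping_for`.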

The OpenVLA model, implemented with an Operational Space Controller, establishes a performance benchmark for evaluating the effectiveness of Variable Impedance Control adaptations. Utilizing this model, a baseline task success rate of 9.86% has been consistently achieved under defined operational parameters. This figure represents the percentage of successfully completed tasks without the enhancements provided by variable impedance control, and serves as the comparative standard against which improvements in task completion, stability, and adaptability are measured. Data collection and performance evaluation are conducted using the same task parameters for both the baseline OpenVLA model and the Variable Impedance Control implementations to ensure a valid and quantifiable comparison.
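A like-for-like comparison of this kind reduces to scoring both controllers with the same rule. The sketch below assumes, hypothetically, that an episode counts as a success only if the task completes and the peak contact force stays within the limit; the article does not spell out the exact metric, and the episode fields and function names are invented for illustration.

```python
def success_rate(episodes, force_limit=30.0):
    """Percentage of episodes that finish the task within the force limit."""
    if not episodes:
        return 0.0
    successes = sum(
        1 for ep in episodes
        if ep["task_done"] and max(ep["forces"], default=0.0) <= force_limit
    )
    return 100.0 * successes / len(episodes)


def relative_gain(baseline_pct, improved_pct):
    """Relative improvement of improved_pct over baseline_pct, in percent."""
    return 100.0 * (improved_pct - baseline_pct) / baseline_pct


# Hypothetical episode logs: completion flag plus sampled contact forces (N).
episodes = [
    {"task_done": True, "forces": [5.0, 12.0]},   # success
    {"task_done": True, "forces": [5.0, 42.0]},   # force violation
    {"task_done": False, "forces": [3.0]},        # task not completed
    {"task_done": True, "forces": []},            # no contact, success
]
```

Applying the same function and the same `force_limit` to both the baseline and the adapted controller is what makes the two success rates comparable.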

Regulating stiffness enables precise force control during real-world pushing tasks.

Beyond Performance: Towards Truly Adaptive Systems

The CompliantVLA-adaptor represents a meaningful advance in robotic manipulation, improving performance across a range of contact-rich tasks. Reported results show overall task success rising from 9.86% to 17.29%, a relative increase of roughly 75% over the baseline. The improvement is not limited to previously encountered tasks; the system is also reported to generalize zero-shot, executing novel manipulations without specific prior training. This adaptability stems from controlling robotic compliance according to context, which enables more robust and versatile interactions with objects and environments and points toward robots that handle unforeseen circumstances more reliably.

The system’s enhanced robustness stems from its ability to incorporate contextual understanding during operation. Rather than rigidly executing pre-programmed actions, the CompliantVLA-adaptor dynamically adjusts its behavior based on real-time environmental feedback and object characteristics. This context-awareness allows it to compensate for variations in lighting, surface textures, or even minor differences in object dimensions – factors that often derail traditional robotic manipulation systems. Consequently, the system demonstrates significantly improved performance across a wider range of scenarios, exhibiting a remarkable ability to maintain task success even when faced with unpredictable or imperfect conditions. This adaptability is crucial for real-world deployment, where pristine laboratory settings are rarely encountered.

Continued development centers on refining impedance mapping, the translation of natural language instructions and task context into suitable stiffness and damping parameters. This refinement aims to create a more intuitive and nuanced control interface, allowing for greater expressiveness and adaptability in complex scenarios. Researchers are also investigating advanced learning algorithms, including reinforcement learning and meta-learning techniques, to enable the system to rapidly acquire new skills and generalize to previously unseen environments and object properties. These enhancements promise to move beyond pre-programmed behaviors toward a flexible robotic assistant that responds to a wider range of instructions and dynamic situations.

Simulated contact-rich tasks demonstrate the policy's ability to handle complex interactions and maintain stability.

The pursuit of robust contact-rich manipulation, as detailed in this work, inherently demands a reduction of unnecessary complexity. The CompliantVLA-adaptor method exemplifies this principle by integrating variable impedance control with Vision-Language-Action models. This isn’t simply adding functionality; it’s about refining the system to achieve safer, more effective interactions. As Donald Knuth observed, “Premature optimization is the root of all evil.” The authors demonstrate a mindful approach, prioritizing a clear and adaptable framework over immediate, potentially brittle, gains. The system’s ability to modulate impedance based on LLM guidance speaks to an understanding that true power lies not in exhaustive features, but in elegant simplicity.

What’s Next?

This work addresses a practical need. Safe manipulation requires more than simply seeing and planning. It demands nuanced interaction. Yet, reliance on large language models introduces a familiar fragility. Abstractions age, principles don’t. The true test lies not in demonstrating success within controlled settings, but in managing unforeseen contact dynamics.

Current methods treat force regulation as an addendum. Future work should integrate it as a foundational element. Consider the limitations of purely reactive impedance control. Proactive adaptation, anticipating contact before it occurs, remains a significant challenge. Every complexity needs an alibi. The pursuit of generalizable contact-rich manipulation demands parsimony, not proliferation of parameters.

The field now faces a choice. Will it prioritize increasingly elaborate architectures, or a deeper understanding of fundamental control principles? The answer isn’t in more data. It’s in clearer definitions of success. The goal is not to mimic human dexterity, but to surpass its limitations with robust, predictable systems.


Original article: https://arxiv.org/pdf/2601.15541.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-01-24 15:55