Robots Gain a ‘Feel’ for Articulated Objects

Author: Denis Avetisyan


A new framework combines vision and tactile sensing to enable robots to reliably track and manipulate complex, moving objects.

The proposed ArtReg framework integrates visuo-tactile pose tracking with interactive perception to detect articulation, ultimately enabling closed-loop, goal-driven manipulation—a system built on the understanding that even the most elegant robotic control will inevitably confront the unpredictable realities of physical interaction.

Researchers present ArtReg, an SE(3) Lie Group-based Kalman filter that fuses visual and tactile data for robust interactive perception and manipulation of unseen articulated objects.

Despite advances in robotic perception, reliably interacting with unknown articulated objects—those with movable parts, like doors or tools—remains a significant challenge. This paper introduces ArtReg, a novel framework for visuo-tactile pose tracking and manipulation of such objects, enabling robots to perceive and control them without prior knowledge of their geometry or kinematics. ArtReg integrates visual and tactile data within an SE(3) Lie Group-based Kalman filter, achieving robust tracking and manipulation even in challenging conditions. Could this approach pave the way for robots to seamlessly interact with the complex, dynamic environments of our everyday lives?


Beyond Brute Force: Recognizing What Moves

Traditional robotic manipulation struggles with objects possessing joints – articulated objects. Success isn't simply seeing an object, but understanding its internal mechanics. Relying solely on vision is insufficient; it cannot determine whether a joint is locked or free. A deeper understanding of an object's mechanics, combined with the ability to infer its state from limited sensory input, is essential. It's a constant chase, really – fixing the problems we'll inevitably create later.

An experimental setup utilizes a Franka Emika Panda robot equipped with an Azure Kinect DK and a Universal Robots UR5 with tactile sensors to perform interactive perception and detect articulation structures within unknown objects, tracking their six-degree-of-freedom (6-DoF) movements using the ArtReg algorithm for goal-driven manipulation.

Successfully manipulating articulated objects demands not just identifying their geometry, but also comprehending how applied forces influence their movement and internal dynamics. This necessitates robust methods for estimating the object’s configuration, tracking its degrees of freedom, and predicting its response to external interactions.
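To ground that requirement, consider what even a minimal configuration estimate has to carry for a single-joint object: a base pose plus a joint type, axis, and value. The sketch below is illustrative only; the class and field names are assumptions, not the paper's representation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ArticulatedState:
    """Hypothetical state for a single-joint articulated object.

    base_pose  : 4x4 homogeneous transform of the base part in SE(3).
    joint_type : 'revolute' or 'prismatic'.
    joint_axis : unit 3-vector (rotation or translation axis).
    joint_value: angle in radians (revolute) or metres (prismatic).
    """
    base_pose: np.ndarray
    joint_type: str
    joint_axis: np.ndarray
    joint_value: float

    def child_pose(self) -> np.ndarray:
        """Pose of the moving part, derived from the joint parameters."""
        T = np.eye(4)
        if self.joint_type == 'prismatic':
            T[:3, 3] = self.joint_axis * self.joint_value
        else:  # revolute: Rodrigues' rotation about joint_axis
            k = self.joint_axis / np.linalg.norm(self.joint_axis)
            K = np.array([[0, -k[2], k[1]],
                          [k[2], 0, -k[0]],
                          [-k[1], k[0], 0]])
            T[:3, :3] = (np.eye(3) + np.sin(self.joint_value) * K
                         + (1 - np.cos(self.joint_value)) * K @ K)
        return self.base_pose @ T
```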

Feeling Your Way Around: Interactive Perception

Interactive Perception addresses manipulation challenges by integrating physical interaction with sensing. This moves beyond passive observation, employing robot manipulation to probe an object’s structure and detect kinematic joints. The core principle involves applying controlled forces and monitoring resulting movements, inferring structure without complete reliance on vision. This is particularly effective at identifying revolute and prismatic connections – the joints defining an object’s range of motion.
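As a rough sketch of how such a probe could be interpreted: track corresponding points on the moved part before and after the interaction, fit a rigid transform, and decide whether the motion is better explained by rotation or by pure translation. The Kabsch fit and the angle threshold below are standard tools and illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def fit_rigid_transform(P, Q):
    """Kabsch fit: find R, t with Q ~ R @ P + t.
    P, Q: (N, 3) arrays of corresponding points before/after the probe."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

def classify_joint(P, Q, angle_thresh_rad=0.05):
    """Label the probed part's motion as revolute or prismatic.

    A rotation angle above the (illustrative) threshold suggests a
    revolute joint; near-pure translation suggests a prismatic one.
    Assumes point correspondences are already established."""
    R, _ = fit_rigid_transform(P, Q)
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    return 'revolute' if angle > angle_thresh_rad else 'prismatic'
```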

Interactive perception employs distinct actions—pushing for revolute joints and grasping/pulling for prismatic joints—to identify and characterize object articulation.

This ‘feeling’ for structure allows robots to distinguish between articulated objects and rigid bodies, crucial for developing adaptable manipulation strategies.

ArtReg: Accurate Tracking, Inevitable Complexity

The ArtReg algorithm tackles real-time object tracking by utilizing the mathematical framework of the Special Euclidean group, SE(3), to represent rigid body transformations. This geometrically consistent representation enables accurate estimation even under sensor noise. ArtReg integrates it with an Unscented Kalman Filter (UKF), propagating the pose through state space while properly handling non-linear 3D rotations. Experiments report an average ADI (average model-point distance) error of 7.23 cm and an average pose-tracking error below 1.5 cm.
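The exponential and logarithmic maps are the machinery that lets a Kalman filter live on SE(3) rather than in a flat vector space. Below is a minimal sketch of the two maps, using the standard closed-form expressions rather than anything from the ArtReg codebase.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def se3_exp(xi):
    """Exponential map: 6-vector twist xi = (rho, phi) -> 4x4 pose."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    W = hat(phi)
    if theta < 1e-9:
        R, V = np.eye(3) + W, np.eye(3)
    else:
        A = np.sin(theta) / theta
        B = (1 - np.cos(theta)) / theta**2
        C = (1 - A) / theta**2
        R = np.eye(3) + A * W + B * W @ W
        V = np.eye(3) + B * W + C * W @ W
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def se3_log(T):
    """Logarithmic map: 4x4 pose -> 6-vector twist (rho, phi).
    Note: this extraction degenerates near theta = pi; a production
    implementation needs a special case there."""
    R, t = T[:3, :3], T[:3, 3]
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-9:
        W, V_inv = np.zeros((3, 3)), np.eye(3)
    else:
        W = theta / (2 * np.sin(theta)) * (R - R.T)
        A = np.sin(theta) / theta
        B = (1 - np.cos(theta)) / theta**2
        V_inv = (np.eye(3) - 0.5 * W
                 + (1 / theta**2) * (1 - A / (2 * B)) * W @ W)
    phi = np.array([W[2, 1], W[0, 2], W[1, 0]])
    return np.concatenate([V_inv @ t, phi])
```

In a manifold UKF of this kind, sigma points are typically generated by perturbing the mean pose, `X_i = X_mean @ se3_exp(xi_i)`, and residuals are pulled back through `se3_log`, which keeps the covariance bookkeeping consistent on the curved state space.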

The experimental setup defines a state and measurement vector, while visualizations demonstrate manifold operations including the exponential and logarithmic maps, all integrated within the ArtReg algorithm—a manifold unscented Kalman filter—to operate on the state and measurement manifolds.

This represents a 60% accuracy improvement over state-of-the-art methods and extends to both articulated and rigid objects, providing a foundation for advanced manipulation.

Tactile Guidance: A Delicate Touch, Fragile Assumptions

Combining tactile perception with specific manipulation actions – ‘Hold-Pull’ for prismatic joints and ‘Push’ for initial contact – enables precise control of articulated objects. This moves beyond pre-programmed trajectories, allowing adaptive responses to unforeseen variations. Tactile sensing provides critical feedback regarding contact forces and slippage, allowing the robot to adjust actions in real-time, preventing drops and maintaining a secure grip. Algorithms estimate contact wrench and predict instabilities, proactively modifying grasp parameters.
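As a sketch of the kind of reactive loop this implies: compare the tangential contact load against a Coulomb friction cone and tighten the grasp before the boundary is reached. The friction coefficient, safety margin, and force limits below are illustrative assumptions; the paper's actual wrench estimation is more involved.

```python
import numpy as np

# Illustrative constants; real values depend on gripper and surface.
MU = 0.5          # assumed Coulomb friction coefficient
SAFETY = 0.8      # trigger before the true slip boundary
FORCE_STEP = 1.0  # grasp-force increment in newtons
F_MAX = 40.0      # gripper force limit

def slip_imminent(f_normal, f_tangential):
    """Coulomb-cone check: slip when |f_t| approaches mu * f_n."""
    return np.linalg.norm(f_tangential) > SAFETY * MU * max(f_normal, 1e-6)

def grasp_control_step(f_normal, f_tangential, grip_force):
    """One iteration of a hold-pull loop: increase grip force if the
    tangential load nears the friction cone, otherwise hold steady."""
    if slip_imminent(f_normal, f_tangential):
        grip_force = min(grip_force + FORCE_STEP, F_MAX)
    return grip_force
```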

Goal-driven closed-loop control successfully manipulates articulated objects, demonstrating effective control strategies.

This combination enables robust goal-driven manipulation, with evaluations demonstrating a goal-state error of less than 4 cm. Ultimately, it’s another layer of abstraction built on assumptions – just waiting for production to find a way to break it spectacularly.

Center of Mass: The Illusion of Control

The Center of Mass (CoM) plays a critical role in stability and manipulability, particularly in articulated systems. Accurate CoM estimation allows precise control of pose and trajectory, improving robustness. Deviations from predicted CoM locations can lead to instability, especially with complex objects. Predicting CoM shifts under manipulation is crucial for planning robust actions, requiring accurate perception of geometry and physical properties, as well as modeling complex interactions.
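For intuition, the CoM of an articulated body is just the mass-weighted average of its parts' CoMs, so it shifts whenever a joint moves. The sketch below assumes part masses and local CoMs are known, which is a convenience the real problem does not grant.

```python
import numpy as np

def composite_com(parts):
    """CoM of an articulated object as the mass-weighted average of its
    parts' CoMs, each transformed into the world frame.

    parts: list of (mass, T, com_local) with T a 4x4 part pose and
    com_local the part's CoM in its own frame."""
    total_mass = sum(m for m, _, _ in parts)
    com = np.zeros(3)
    for m, T, com_local in parts:
        com += m * (T[:3, :3] @ com_local + T[:3, 3])
    return com / total_mass
```

Recomputing this after every joint update matters for planning: for a uniform object on a flat support, a push whose line of action passes through the CoM's projection tends to translate the object rather than rotate it.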

Variations in the center of mass are explored during goal-driven pushing of articulated objects, highlighting the system’s adaptability to different object configurations.

Future work will focus on integrating CoM estimation into ArtReg, enabling even more precise and adaptive manipulation. This promises to extend the range of tasks robots can reliably perform, particularly in unstructured environments.

The pursuit of robust articulated object tracking, as demonstrated by ArtReg, feels less like innovation and more like delaying the inevitable. This framework, with its SE(3) Lie Group-based Kalman filter merging visual and tactile data, strives for a perfect model of interaction. Yet, production environments—the true arbiters of any system—will relentlessly expose its limitations. As John von Neumann observed, “There is no possibility of absolute certainty.” ArtReg, for all its elegance, simply reduces uncertainty—a temporary reprieve before the next edge case surfaces, demanding another iteration of refinement. The cycle continues, a testament to the fact that even the most sophisticated systems are built on layers of managed approximations.

What’s Next?

The presented framework, ArtReg, neatly addresses a constrained problem – tracking articulated objects. One suspects, however, that the moment this ventures beyond laboratory curios, the ‘unseen’ will rapidly become ‘unmanageable’. The reliance on SE(3) Lie Groups is, as always, elegant until the first outlier appears in production data. Then the beautiful math becomes just another layer of abstraction masking a failing filter. It is a common story.

Future work will undoubtedly explore scaling this to more degrees of freedom, and more objects. But the real challenge isn’t algorithmic – it’s data. The system performs well with objects it hasn’t ‘seen’ – a claim that should be filed under ‘temporarily true’. The world excels at presenting novel configurations, occlusions, and objects that defy neat categorization. Better one robust Kalman filter, meticulously tuned, than a hundred adaptive architectures chasing phantoms.

The promise of multi-modal perception remains compelling, of course. But let’s not mistake correlation for understanding. The system reacts to tactile input; it doesn’t reason about it. Until robotics confronts the messy reality of imperfect sensors and unpredictable environments, ‘generalizable’ will remain a marketing term. The logs will tell the true story, as they always do.


Original article: https://arxiv.org/pdf/2511.06378.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
