Seeing is Doing: Robots Gain Smarter Perception for Complex Tasks

A new framework combines visual understanding and active perception, enabling robots to more effectively locate and manipulate objects in real-world environments.

Researchers have developed a new system that allows robots to learn complex manipulation tasks by directly leveraging human movement data.
![The Technology Acceptance Model posits that an individual’s likelihood of adopting a new technology is determined by its perceived usefulness and perceived ease of use, which influence attitude and behavioral intent and ultimately shape actual system use [latex] TAM = (PU + PEU) \rightarrow Attitude \rightarrow Behavioral Intent \rightarrow Actual System Use [/latex].](https://arxiv.org/html/2603.11279v1/tam_model.png)
New research applies established psychological measurement techniques to evaluate the reasoning capabilities of advanced AI systems, revealing significant progress in their ability to model human thought.
![Dexterous manipulation, whole-body motion, and locomotion are integrated across eight diverse, long-horizon tasks to evaluate [latex]\Psi_{0}[/latex], with task instructions and sub-task markers overlaid for clarity and policy rollout videos available in supplementary materials.](https://arxiv.org/html/2603.12263v1/figures/PSI-Tasks-v3.png)
Researchers have unveiled a new model that bridges the gap between visual understanding and physical action in humanoid robots, enabling more natural and versatile loco-manipulation capabilities.

As autonomous vehicles tackle increasingly complex real-world scenarios, the need for robust reasoning, especially in situations requiring social awareness, is becoming paramount.

Researchers have developed a new system that combines data from wearable sensors and on-robot cameras to accurately interpret human gestures and identify the intended command source, even at a distance.

New research shows that incorporating thermodynamic descriptors derived from molecular dynamics simulations significantly improves the accuracy and reliability of machine learning models for predicting the boiling points of complex compounds.

Researchers have developed a new framework allowing multiple robots to collaborate on complex object manipulation tasks, regardless of the number of team members.
![The system’s architecture defines states as compositions of structure (expressed as hypotheses [latex]\mathcal{H}[/latex]), parameters [latex]\theta\in\mathcal{M}[/latex], energy [latex]E[/latex], and history [latex]\tau[/latex]. States evolve through observation-triggered coalgebraic steps that yield new states and observations; this dynamic is governed by competing structural actions and parametric updates and mediated by a local objective function that balances energetic cost against predictive success, reflecting the inherent trade-off between maintaining form and adapting to change within any decaying system.](https://arxiv.org/html/2603.11355v1/x1.png)
A new learning paradigm moves beyond fixed models, allowing AI systems to evolve their internal organization and resource allocation for more efficient and interpretable intelligence.

A new framework combines the reasoning power of large language models with traditional robotic planning to enable robots to tackle unfamiliar tasks and environments with greater flexibility.