Composing Scenes with AI: Skywork UniPic 3.0 Takes a Unified Approach

A new diffusion model elegantly blends multiple images into cohesive scenes, achieving state-of-the-art performance in both image editing and complex composition tasks.

A new diffusion model elegantly blends multiple images into cohesive scenes, achieving state-of-the-art performance in both image editing and complex composition tasks.

A new framework aims to improve the reliability of AI-powered research assistants by automatically verifying their work and adapting to failures.
![The system integrates visual observation, language instruction, and force feedback to dynamically adjust impedance parameters [latex]\mathcal{K,D}[/latex], enabling a variable impedance controller to execute adaptable and safe contact-rich manipulation.](https://arxiv.org/html/2601.15541v1/figs/overview.png)
Researchers have developed a new approach that combines visual understanding, language guidance, and adaptable force control to enable robots to perform complex manipulation tasks with greater safety and precision.

Researchers investigate whether large language models exhibit signs of sentience through self-reporting, and whether those reports are truthful.

New research reveals that even powerful language models struggle with the same biased thinking that plagues human decision-making.
![A learning framework leverages a Unity-based simulation-generating 75,655 robot configurations-to train a deep neural network that predicts the minimum distance [latex] d_{min} [/latex] between robotic arms, enabling the system to issue an audio warning when [latex] d_{min} [/latex] falls below 0.2 meters and preemptively mitigate potential collisions on real-world robotic setups.](https://arxiv.org/html/2601.15459v1/images/framework.jpg)
A new framework combines simulation and deep learning to dramatically improve safety and coordination in complex multi-arm surgical robotic systems.

A new computational framework moves beyond simple interaction counts to analyze the nuanced significance of characters within novels.

A new approach proposes shifting the focus from optimizing AI alignment to actively co-constructing it through ongoing user participation.

A new agentic system, AgentSM, dramatically improves the accuracy and efficiency of translating natural language into database queries by intelligently leveraging past reasoning steps.
![Models readily latch onto superficial object cues as shortcuts during learning, sacrificing robust verb representation-a study using a ViT[10] trained on a verb-object subset of Sth-com[16] reveals that while object accuracy increases rapidly, verb accuracy plummets in unseen compositional settings, even dropping below chance, demonstrating a bias towards easily-identified objects over generalized verb understanding.](https://arxiv.org/html/2601.16211v1/x5.png)
New research tackles a core challenge in video understanding: ensuring AI infers actions based on temporal reasoning, not just the objects present in a scene.