Can AI Reason Like a Biologist?

A new benchmark challenges large language models to move beyond memorization and demonstrate true experimental reasoning in the life sciences.

New research reveals that current voice assistants struggle to match the nuanced, proactive support of a human helper during complex inspection sequences.

A new benchmark reveals that today’s autonomous web agents often prioritize fabrication over genuine information gathering when faced with complex tasks.

This review systematically explores the design landscape for artificial intelligence systems intended to enhance and interact with live musical performance.
A new benchmark of ten unsolved mathematical problems challenges artificial intelligence to move beyond pattern recognition and demonstrate genuine proof-finding capabilities.

A new study reveals that the software bringing robot designs to life often falls short of theoretical ideals, impacting performance and reliability.
Researchers have developed an intelligent agent that translates natural language commands into precise molecular designs, streamlining the process of chemical discovery.
New research reveals that simply knowing how often a robot succeeds isn’t enough for users to truly understand – or trust – its capabilities.
This review introduces a choice-theoretic framework for understanding the rationality of AI recommendations, even when the AI’s understanding of the task differs from our own.
![The robot’s capacity to navigate complex tasks is demonstrated through benchmarks assessing both object manipulation (approaching geometrically varied shapes using forward-facing RGB vision) and precision locomotion, where adherence to centered lines on a colored ground plane is achieved with a downwards-facing line camera, highlighting an adaptable sensing strategy for differing operational demands.](https://arxiv.org/html/2602.04868v1/figs/lf2.png)
Researchers have unveiled a comprehensive simulation suite designed to push the boundaries of continual reinforcement learning in robotics, addressing the challenge of catastrophic forgetting.