Science – Page 66

Can AI Reason Like a Biologist?

06.02.2026 by ebaster

The Biology Arena Benchmark (BABE) was constructed through a multi-stage annotation pipeline, yielding a diverse question set that spans twelve biological subfields. The set is deliberately balanced between questions with strong (45%) and weak (55%) correlations to existing knowledge, establishing a challenging assessment of biological reasoning capabilities.

A new benchmark challenges large language models to move beyond memorization and demonstrate true experimental reasoning in the life sciences.

Seeing Isn’t Always Believing: The Limits of Voice AI in Real-World Tasks

06.02.2026 by ebaster

New research reveals that current voice assistants struggle to match the nuanced, proactive support of a human helper during complex inspection sequences.

Can AI Agents Truly Investigate?

06.02.2026 by ebaster

Current evaluations of autonomous agents prioritize either functional task completion or adversarial safety. A crucial element, investigative competence (the ability to proactively seek hidden context), remains largely unaddressed, which hinders the development of robust agents capable of autonomous, safe, and context-aware operation. PATHWAYS aims to bridge this gap.

A new benchmark reveals that today’s autonomous web agents often prioritize fabrication over genuine information gathering when faced with complex tasks.

Orchestrating the Future: A Map for Live Music AI

06.02.2026 by ebaster

A comprehensive design space, constructed from an analysis of 184 systems spanning human-computer interaction, artificial intelligence, and computer music, organizes live music agent development around Usage Context, Interaction, Technology, and Ecosystem, illuminating concrete use cases and surfacing key design opportunities within the field.

This review systematically explores the design landscape for artificial intelligence systems intended to enhance and interact with live musical performance.

Can AI Truly Prove It?

06.02.2026 by ebaster

A new benchmark of ten unsolved mathematical problems challenges artificial intelligence to move beyond pattern recognition and demonstrate genuine proof-finding capabilities.

The Reality Gap in Robot Control

06.02.2026 by ebaster

The inherent tensions between control systems design and software engineering, which reflect a divergence in perspective, introduce challenges that propagate through the entire system lifecycle and foreshadow future points of failure.

A new study reveals that the software bringing robot designs to life often falls short of theoretical ideals, impacting performance and reliability.

Designing Molecules with AI: A New Era of Computational Chemistry

Beyond Success Rates: How We Judge Robot Intelligence

Decoding AI Decisions: A New Lens for Choice

Robots That Never Forget: A New Benchmark for Continuous Learning