Can AI Agents Truly Investigate?

Current evaluations of autonomous agents prioritize either functional task completion or adversarial safety, yet a crucial element-investigative competence, the ability to proactively seek hidden context-remains largely unaddressed, hindering the development of truly robust agents capable of autonomous, safe, and context-aware operation, a gap that PATHWAYS aims to bridge.

A new benchmark reveals that today’s autonomous web agents often prioritize fabrication over genuine information gathering when faced with complex tasks.

Can AI Truly Prove It?

A new benchmark of ten unsolved mathematical problems challenges artificial intelligence to move beyond pattern recognition and demonstrate genuine proof-finding capabilities.