Can AI Rediscover Science?

FIRE-Bench tests whether an AI research agent, given only a high-level research question, can independently replicate a published study’s core empirical result. It also allows a detailed comparison of the agent’s research process with the original human methodology.

A new benchmark assesses how well autonomous agents can independently arrive at established scientific findings, revealing significant hurdles to automating the full research cycle.