Can AI Replicate Social Science?

ReplicatorBench dissects the scientific replication process into three distinct stages: extraction of relevant information and resources, computational generation and execution of replicating code, and interpretation of results to determine a claim’s replicability. It thereby frames replication not as a single act but as a layered process that can fail at any stage.
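The three-stage framing can be pictured as a small data model in which each stage reports its own outcome, so a failed replication attempt can be traced to the earliest stage that broke. The following is a minimal illustrative sketch, not ReplicatorBench's actual code or API; the names `Stage`, `StageResult`, and `first_failure` are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Hypothetical labels for the three stages named in the text."""
    EXTRACTION = "extraction"
    EXECUTION = "generation_and_execution"
    INTERPRETATION = "interpretation"


@dataclass
class StageResult:
    """Outcome of one stage for one replication attempt (illustrative only)."""
    stage: Stage
    succeeded: bool
    note: str = ""


def first_failure(results: list[StageResult]) -> Optional[Stage]:
    """Return the earliest stage that failed, or None if no stage failed."""
    for stage in Stage:  # Enum members iterate in definition order
        for result in results:
            if result.stage is stage and not result.succeeded:
                return stage
    return None


# Example attempt: extraction worked, but the generated code did not run.
attempt = [
    StageResult(Stage.EXTRACTION, True),
    StageResult(Stage.EXECUTION, False, note="script raised an error"),
    StageResult(Stage.INTERPRETATION, False, note="no results to interpret"),
]
failed_at = first_failure(attempt)
```

Recording results per stage, rather than a single pass/fail flag, is what lets an evaluation say where an agent went wrong and not just that it did.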

A new benchmark assesses whether large language model agents can reliably reproduce findings in the social and behavioral sciences, revealing critical hurdles beyond just code execution.