Can AI Truly Do Biology?

A comparison of accuracy on LAB-Bench and LABBench2 across high-level task families shows how performance differs between the two benchmarks.

A new benchmark reveals that while large language models are improving, significant challenges remain in automating complex biological research tasks.