Can AI Pass Physics?

Model performance on advanced physics exams is not fixed; it fluctuates considerably from year to year. Trajectories diverge and rankings across models are inconsistent, with certain exam iterations consistently challenging every model while others prove accessible to all. This suggests that evaluation results depend on the characteristics of the specific test rather than on inherent AI capability, a point underscored by the variability in scoring consistency between independent raters.

A new study rigorously tests the problem-solving abilities of artificial intelligence on challenging, algebra-based physics questions.