Beyond Logic: When AI Gets Reasoning Wrong

Despite achieving 81.7% accuracy on syntactic validity, which demonstrates proficiency in formal logical reasoning, the models falter on believability in natural language understanding, scoring only 56.2% and revealing a 25.5 percentage point gap between formal correctness and semantic plausibility.
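
The gap quoted above is an absolute difference in percentage points, not a relative change. A minimal sketch of the arithmetic follows. The variable names are illustrative, and the relative-drop figure is an extra view derived here from the two accuracies, not a number reported in the source.

```python
# Accuracy figures as quoted in the text; variable names are hypothetical.
validity_accuracy = 81.7        # % accuracy on syntactic validity
believability_accuracy = 56.2   # % accuracy on believability

# Absolute gap, measured in percentage points.
gap_points = round(validity_accuracy - believability_accuracy, 1)

# Relative drop as a share of the validity score.
# Assumption: derived here for illustration, not stated in the source.
relative_drop = round(100 * gap_points / validity_accuracy, 1)

print(f"gap: {gap_points} percentage points")
print(f"relative drop: {relative_drop}%")
```

Keeping the two measures separate matters because a gap expressed in percentage points and a drop expressed in percent are easy to confuse when comparing results.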

New research reveals that while artificial intelligence systems handle formal logic problems well, they struggle with reasoning that depends on real-world knowledge and are surprisingly susceptible to human-like biases.