Can AI Truly Navigate Like Humans?

Goal-oriented navigation in urban environments requires an agent to translate linguistic instructions into a sequence of actions based on continuous observation. Yet current large multimodal models still fall well short of human performance in spatial reasoning and action execution.

A new benchmark evaluates how well advanced AI models perform goal-oriented navigation in complex urban environments and reveals critical limitations in their spatial understanding.