Teaching AI to See Like Us: A New Approach to 3D Spatial Reasoning

Author: Denis Avetisyan


Researchers are leveraging insights from human visual perception to accelerate the training of reinforcement learning agents in complex 3D environments.

Trained models demonstrated performance, measured in accuracy and visual fixations, across a diverse set of environments encompassing six discrete conditions and one continuous condition, all cultivated through a curriculum informed by human experimentation, suggesting a pathway toward robust generalization through ecologically valid training.

Human-informed curriculum learning successfully trains agents on a 3D Same-Different task, revealing distinct but effective exploration strategies.

Despite advances in artificial intelligence, replicating human-level performance in complex 3D visuospatial reasoning remains a significant challenge. This work, ‘Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design’, investigates the application of reinforcement learning to a 3D Same-Different task, demonstrating that strategically designed curricula, informed by human psychophysical data, can effectively train agents to solve it. Interestingly, the resulting agent strategies, while successful, diverge from typical human visual exploration patterns. Could this approach unlock more robust and adaptable AI systems capable of tackling increasingly complex real-world problems?


The Illusion of Spatial Understanding

The development of truly intelligent artificial systems hinges on replicating the human capacity for visuospatial perception – the ability to understand and interact with the world through sight and spatial awareness. This extends far beyond simply ‘seeing’ an image; it encompasses interpreting depth, recognizing objects regardless of perspective, and mentally manipulating those objects to plan actions. Replicating this skill is paramount because it underpins numerous real-world applications, from autonomous navigation and robotic surgery to advanced image recognition and virtual reality. A system capable of robust visuospatial understanding doesn’t merely identify a chair; it comprehends its location, dimensions, and how it relates to other objects in the environment, enabling it to navigate around it, grasp it, or even predict its movement – mirroring the seamless way humans perform these tasks daily.

The ‘Same-Different Task’ represents a cornerstone in evaluating visuospatial intelligence, both in humans and increasingly, in artificial intelligence systems. This deceptively simple test requires subjects to quickly and accurately determine whether two presented objects are identical or distinct, demanding a complex interplay of cognitive processes. Beyond mere visual acuity, the task probes core abilities such as object recognition, feature extraction, and comparative analysis. Successful completion relies on the brain’s capacity to encode object features, maintain them in working memory, and then compare these representations – skills crucial for navigation, problem-solving, and ultimately, a comprehensive understanding of the visual world. Researchers utilize variations of this task to map the neural substrates underlying these abilities and to benchmark the performance of AI algorithms striving for human-level visual cognition.

The pursuit of truly intelligent artificial systems extends far beyond simply identifying patterns within existing datasets. While current machine learning excels at recognizing previously encountered stimuli, a critical hurdle lies in achieving robust generalization and adaptation to novel situations. Traditional algorithms often falter when presented with variations outside their training parameters, exhibiting a rigidity that contrasts sharply with human cognitive flexibility. This limitation stems from a reliance on statistical correlations rather than a deeper understanding of underlying principles. Consequently, AI must evolve from being proficient pattern recognizers to systems capable of abstracting knowledge and applying it creatively to unforeseen circumstances – a capacity that demands innovative approaches to learning and representation, moving beyond the confines of purely data-driven methodologies.

The Shepard rotation task, depicted here, assesses human cognitive ability by requiring mental manipulation of three-dimensional objects.

Mimicking Cognition Through Trial and Error

Reinforcement Learning (RL) is a computational approach to learning where an agent improves its behavior through iterative interaction with an environment. This framework directly parallels operant conditioning, a learning paradigm central to behavioral psychology, and specifically embodies Edward Thorndike’s Law of Effect. This law posits that behaviors followed by satisfying consequences are more likely to be repeated, while those followed by negative consequences become less frequent. In RL, the agent receives rewards or penalties – analogous to satisfying or negative consequences – for its actions, and adjusts its strategy to maximize cumulative reward. This process allows the agent to learn optimal policies without explicit programming, discovering effective actions through trial and error, and strengthening those associated with positive outcomes.
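As a minimal illustration of this reward-driven strengthening of behavior, consider the toy sketch below: a two-action example with made-up reward probabilities, not the agent used in this work.

```python
import random

# Toy illustration of the Law of Effect as trial-and-error learning:
# actions followed by reward become more likely to be repeated.
# Action names and reward probabilities are invented for this sketch.

action_values = {"press_lever_A": 0.0, "press_lever_B": 0.0}  # estimated value per action
learning_rate = 0.1
epsilon = 0.2  # probability of exploring a non-greedy action

def choose_action():
    # Mostly exploit the currently best-valued action, occasionally explore.
    if random.random() < epsilon:
        return random.choice(list(action_values))
    return max(action_values, key=action_values.get)

def environment(action):
    # Hypothetical environment: lever A pays off 80% of the time, lever B 20%.
    p_reward = 0.8 if action == "press_lever_A" else 0.2
    return 1.0 if random.random() < p_reward else 0.0

for _ in range(1000):
    action = choose_action()
    reward = environment(action)
    # Incremental value update: satisfying consequences strengthen the action.
    action_values[action] += learning_rate * (reward - action_values[action])

print(action_values)  # the frequently rewarded action ends up with the higher value
```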

The training environment is constructed using Unity, a 3D game engine, together with the ML-Agents Toolkit, which facilitates the development and training of reinforcement learning agents within Unity. This allows for the creation of visually and physically realistic scenarios where agents can interact with virtual objects termed ‘TEOS Objects’. These objects serve as the stimuli for the Same-Different Task, providing a controlled and repeatable environment for agent learning. The simulation allows precise control over environmental parameters, sensor inputs, and action spaces, crucial for isolating and evaluating agent performance. Furthermore, the use of a simulation platform enables scalable data collection and parallel training, accelerating the learning process and improving the robustness of the trained agents.
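As a rough sketch of how such a Unity build can be driven from the ML-Agents low-level Python API: the build name, episode count, and random-action loop below are assumptions for illustration, not the authors' training setup.

```python
from mlagents_envs.environment import UnityEnvironment

# Sketch of stepping a compiled Unity environment via the ML-Agents Python API.
# "SameDifferentEnv" is a hypothetical build name.
env = UnityEnvironment(file_name="SameDifferentEnv")
env.reset()

behavior_name = list(env.behavior_specs)[0]   # assume a single agent behavior
spec = env.behavior_specs[behavior_name]

for episode in range(3):
    env.reset()
    done = False
    while not done:
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        done = len(terminal_steps) > 0        # an agent finished its episode
        if len(decision_steps) > 0:
            # Random actions simply to exercise the loop; a PPO/GAIL trainer
            # would supply the actions here instead.
            action = spec.action_spec.random_action(len(decision_steps))
            env.set_actions(behavior_name, action)
        env.step()

env.close()
```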

The Same-Different Task, implemented within the 3D simulation environment, utilizes a sparse reward structure where agents receive positive reinforcement only upon correctly identifying whether two presented ‘TEOS Objects’ are identical or different. This presents a significant challenge for reinforcement learning agents as the infrequent reward signal necessitates extensive exploration of the state space to discover successful strategies. Unlike dense reward systems providing immediate feedback, the sparsity demands that agents effectively balance exploration and exploitation to learn through delayed reinforcement, potentially requiring techniques such as intrinsic motivation or curriculum learning to facilitate discovery of the optimal policy for accurate object comparison.
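A minimal sketch of such a sparse reward, assuming reinforcement is issued only at the moment the agent commits to a ‘same’ or ‘different’ judgment; the action names and reward value are illustrative, not the authors' exact reward shaping.

```python
# Illustrative sparse reward for the Same-Different decision.
# Navigation and viewpoint changes yield no reward signal; only a correct
# terminal judgment is reinforced. Any penalty for errors is omitted here,
# since the text only specifies positive reinforcement for correct answers.

def same_different_reward(action: str, objects_are_identical: bool) -> float:
    if action not in ("declare_same", "declare_different"):
        return 0.0  # exploration steps carry no feedback
    correct = (action == "declare_same") == objects_are_identical
    return 1.0 if correct else 0.0
```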

The reinforcement learning agent, represented by a magenta sphere, navigates and interacts with objects within a Unity-based environment, as visualized with inset displays of its perspective and recent actions.

Guiding the Algorithm Toward Competence

Curriculum Learning was implemented to address the difficulties arising from sparse reward signals during agent training. This involved utilizing data derived from human experimental performance to construct a progressive training schedule, beginning with simpler tasks and gradually increasing complexity. This approach facilitated more efficient learning and resulted in an agent accuracy of 93.8%, comparable to the human performance reported in prior psychophysical work (solbach2023psychophysics). The use of human data as a guide ensured the agent was exposed to increasingly challenging scenarios in a manner consistent with human learning progression, thereby mitigating the issues associated with delayed or infrequent rewards.
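The following sketch illustrates the general mechanism of such a curriculum, in the spirit of lesson-based environment parameters in ML-Agents: difficulty increases only once recent performance clears a threshold. The lesson names and thresholds are assumptions for illustration, not values derived from the human data.

```python
# Illustrative curriculum schedule. Lesson order and reward thresholds are
# invented for this sketch; the paper derives its schedule from human data.

LESSONS = [
    {"name": "few_viewpoints_easy_pairs",    "reward_to_advance": 0.7},
    {"name": "six_viewpoints_mixed_pairs",   "reward_to_advance": 0.8},
    {"name": "twelve_viewpoints_mixed_pairs","reward_to_advance": 0.85},
]

def current_lesson(lesson_idx: int, recent_mean_reward: float) -> int:
    """Advance to the next lesson once the current one is mastered."""
    if (lesson_idx < len(LESSONS) - 1
            and recent_mean_reward >= LESSONS[lesson_idx]["reward_to_advance"]):
        return lesson_idx + 1
    return lesson_idx
```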

The complexity of the simulated 3D environment necessitates the use of advanced machine learning techniques; therefore, we implemented Deep Reinforcement Learning algorithms, specifically Proximal Policy Optimization (PPO) and Generative Adversarial Imitation Learning (GAIL). PPO is utilized for its stability and sample efficiency in continuous action spaces, while GAIL enables learning from expert demonstrations, allowing the agent to mimic optimal behaviors observed in the environment. These algorithms facilitate effective policy optimization and exploration within the high-dimensional state and action spaces characteristic of the 3D simulation, ultimately enabling robust performance and adaptation.
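For reference, the clipped surrogate objective at the heart of PPO can be sketched as follows; the epsilon value and toy inputs are illustrative.

```python
import numpy as np

# Minimal sketch of PPO's clipped surrogate loss, the mechanism behind the
# stability noted above. Inputs are toy numbers, not data from this work.

def ppo_clip_loss(log_prob_new, log_prob_old, advantages, epsilon=0.2):
    ratio = np.exp(log_prob_new - log_prob_old)           # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    # Pessimistic bound: take the smaller of the clipped and unclipped terms,
    # which discourages excessively large policy updates.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

loss = ppo_clip_loss(
    log_prob_new=np.array([-0.9, -1.2, -0.3]),
    log_prob_old=np.array([-1.0, -1.0, -0.5]),
    advantages=np.array([1.5, -0.7, 0.4]),
)
print(loss)
```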

Agent performance was evaluated across multiple environments varying in the number of viewpoints. The agent achieved an accuracy of 95.88% in a 6-viewpoint environment, 97.9% in a 12-viewpoint environment, and 92.62% in a 24-viewpoint environment. Performance was further tested in a more complex 48-viewpoint environment, where an accuracy of 78.6% was attained. These results indicate that the approach continues to perform as environmental complexity, measured by the number of viewpoints, increases, although accuracy declines in the most demanding setting.

Training with curriculum learning on a discrete environment with six viewpoints reveals a positive correlation between cumulative reward and lesson progression over 18 million training steps.

The Mirage of General Intelligence

The development of robust visuospatial reasoning within simulated environments represents a crucial step towards more adaptable artificial intelligence. This capability – the ability to understand and interact with space based on visual information – underpins a wide range of cognitive skills, from navigation and object manipulation to planning and problem-solving. By successfully training agents to explore and learn within these virtual worlds, researchers are establishing a foundation for AI systems capable of generalizing knowledge to novel situations. Such systems aren’t simply memorizing solutions, but rather developing an internal representation of space that allows them to infer relationships, predict outcomes, and ultimately, operate with greater autonomy and efficiency in complex, real-world scenarios. This progress suggests a pathway beyond narrowly focused AI, towards systems possessing the broader cognitive flexibility characteristic of general intelligence.

The pursuit of Artificial General Intelligence (AGI) often encounters limitations in scalability, but recent work suggests a promising pathway through the synergistic combination of reinforcement learning and thoughtfully constructed curricula. This methodology doesn’t simply task an agent with a goal; instead, it carefully sequences learning challenges, starting with simpler scenarios and gradually increasing complexity. By strategically guiding the agent’s experience, researchers can accelerate learning and improve generalization capabilities. This contrasts with approaches that rely on massive datasets or brute-force exploration, offering a more efficient and potentially more robust route towards creating AI systems capable of adapting to a wide range of tasks and environments. The benefit lies in fostering a progressive acquisition of skills, enabling agents to build upon prior knowledge and ultimately achieve a level of cognitive flexibility characteristic of general intelligence.

The agents exhibited a noteworthy capacity for spatial reasoning, as evidenced by their exploration patterns within the simulated environments; on average, each agent investigated 5.65 viewpoints in the simpler 6-viewpoint setup and maintained a robust level of exploration (5.11 viewpoints) even when confronted with the increased complexity of the 12-viewpoint environment. This suggests the development of efficient strategies for gathering information and building a comprehensive understanding of their surroundings, rather than relying on random searches. Current research endeavors are now directed towards translating these computationally derived skills into tangible applications within real-world scenarios, addressing the crucial challenge of transferring knowledge gained in simulation to the complexities of physical reality and ultimately fostering more adaptable and intelligent systems.

Models trained with a naïve curriculum achieved measurable performance, in terms of accuracy and fixation counts, across six discrete environments and one continuous environment.

The pursuit of artificial general intelligence often fixates on replicating human strategies, yet this work subtly suggests that the path needn’t be mimetic. The agents, trained through human-informed curriculum learning on the 3D Same-Different task, achieve success, but with divergent exploratory behaviors. This echoes a core tenet: systems aren’t built, they grow, and their evolution, even when seeded with human insight, will inevitably chart its own course. As Blaise Pascal observed, “People rarely think of their own thoughts; they are usually the echoes of others.” Here, the system learns the task, but the ‘echo’ of human visual exploration is distinctly altered, a testament to the organic, unpredictable nature of complex adaptive systems.

What Lies Beyond?

The success of human-informed curricula in guiding agents through a 3D same-different task should not be mistaken for mastery. It is, rather, a temporary reprieve from chaos. The divergence in exploratory strategies, with agent solutions differing from human intuition, highlights a fundamental truth: systems do not learn what to do, they learn how to appear to do it. The observed behavior isn't replication, but adaptation to the reward landscape, a landscape subtly, and inevitably, misaligned with the original intent. Long stability in performance is the sign of a hidden disaster, a brittle solution poised to fracture upon encountering unforeseen variance.

The next phase isn’t about refining the curriculum, but about accepting its inherent limitations. The field should shift focus from imposing structure onto learning to cultivating systems capable of generating their own curricula, systems that actively perceive their own epistemic gaps and formulate challenges to close them. True intelligence isn’t found in efficient task completion, but in the capacity for self-directed discovery, a capacity born not of pre-defined paths, but of deliberate, controlled wandering.

The ultimate challenge isn’t Artificial General Intelligence, but Artificial Curious Intelligence. A system that doesn’t merely solve problems, but finds them, and then, with quiet, relentless efficiency, dismantles its own assumptions. The pursuit of elegant solutions is a distraction; the only constant is the evolution of unexpected shapes.


Original article: https://arxiv.org/pdf/2511.17595.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
