Building Intelligence, Block by Block

Author: Denis Avetisyan


A new review argues that the brain’s modular design holds the key to creating more flexible and powerful artificial intelligence systems.

Modularity is presented as a fundamental principle underlying both natural and artificial intelligence, offering a path to compositional generalization and emergent properties in AI.

Despite the remarkable advances in artificial intelligence, current systems often demand computational resources far exceeding those of the human brain, highlighting a critical need for novel architectural principles. This review, titled ‘Modularity is the Bedrock of Natural and Artificial Intelligence’, argues that modularity – the organization of systems into specialized, interacting components – is a fundamental principle underlying both biological and artificial intelligence. By examining research across neuroscience and AI, we demonstrate that modularity supports efficient learning, strong generalization, and compositional flexibility, offering a pathway towards more adaptable and resource-conscious AI. Can embracing this principle unlock a new era of artificial intelligence that truly mirrors the elegance and efficiency of the brain?


The Limits of Reasoning: A Crisis Forged in Scale

Despite achieving remarkable feats in natural language processing, Large Language Models (LLMs) frequently falter when confronted with tasks demanding genuine reasoning and the ability to generalize beyond their training data. While proficient at identifying patterns and generating human-like text, these models often struggle with scenarios requiring common sense, causal inference, or the application of knowledge to novel situations. This limitation manifests in practical applications, where LLMs may produce factually incorrect or illogical outputs, struggle with ambiguous prompts, or fail to adapt to unexpected inputs. The impressive performance observed in controlled benchmarks doesn’t always translate to reliable performance in the messiness of real-world contexts, revealing a crucial gap between statistical fluency and true cognitive ability.

Current approaches to artificial intelligence frequently rely on the “bitter lesson” – the observation that simply increasing the scale of data and model size often yields performance improvements. However, this strategy is proving increasingly unsustainable; achieving even modest gains in reasoning ability now demands training on tens of trillions of tokens. This represents a stark contrast to biological intelligence, where the human brain, despite its remarkable capabilities, develops through exposure to vastly less data – an experience roughly equivalent to a few terabytes. The escalating data requirements of large language models not only present logistical and financial hurdles but also suggest a fundamental inefficiency in their architecture, prompting researchers to explore alternative designs inspired by the efficiency and structural principles of the human brain.

The energy demands of contemporary large language models present a stark contrast to the efficiency of biological intelligence. Training a single model, such as GPT-3, requires approximately 1287 megawatt hours of energy – a figure that dwarfs the 3.15 megawatt hours needed to power a human brain at a consistent 20 watts for eighteen years. This is a roughly 400-fold difference, exposing a critical gap in computational efficiency. Such immense energy consumption raises significant sustainability concerns and underscores the limitations of simply scaling model size and data volume as a path towards artificial general intelligence; it suggests that fundamentally new approaches, inspired by the brain’s architecture, are necessary to achieve truly intelligent systems.
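The comparison above is straightforward to verify from the figures in the text – 20 watts sustained over eighteen years versus the roughly 1287 MWh reported for GPT-3 training:

```python
# Energy-budget comparison using the figures cited in the text.
HOURS_IN_18_YEARS = 18 * 365.25 * 24      # ~157,788 hours

brain_wh = 20 * HOURS_IN_18_YEARS         # 20 W sustained, in watt-hours
brain_mwh = brain_wh / 1e6                # ≈ 3.16 MWh over 18 years

gpt3_mwh = 1287                           # reported training energy for GPT-3
ratio = gpt3_mwh / brain_mwh              # ≈ 408: a roughly 400-fold gap

print(f"brain: {brain_mwh:.2f} MWh, ratio: {ratio:.0f}x")
```

The ratio lands near 408, i.e. between two and three orders of magnitude.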

Current limitations in large language models aren’t simply matters of computational power, but rather reflect a fundamental divergence from the efficiency of biological intelligence. The human brain, developed through millions of years of evolution, achieves remarkable reasoning capabilities with an astonishingly low energy budget and a comparatively small dataset. This suggests that inherent structural properties – such as sparse coding, hierarchical organization, and predictive processing – are crucial for robust and generalizable intelligence. Consequently, a paradigm shift is needed in artificial intelligence design, moving beyond simply scaling up existing architectures and instead focusing on incorporating principles borrowed from the brain’s efficient and structured approach to information processing. Mimicking these biological designs may prove essential to overcome the current scaling bottlenecks and unlock truly intelligent machines.

The Brain’s Blueprint: Modular Organization as a Path Forward

Functional modularity in the brain describes the organization of neural processing into discrete, specialized units – modules – rather than a single, undifferentiated mass. These modules are not anatomically isolated; instead, they are interconnected networks of neurons dedicated to specific cognitive functions, such as facial recognition, language processing, or spatial navigation. This distributed architecture allows for parallel processing, increasing computational efficiency and speed. Evidence for functional modularity comes from lesion studies, where damage to a specific brain region often impairs only the associated function, while leaving others intact, and from neuroimaging techniques like fMRI, which demonstrate localized activation patterns during specific tasks. The specialization within modules, combined with their interconnectedness, facilitates both rapid processing and flexible adaptation to changing demands.

Modularity, as a systems design principle, confers several key advantages to biological and artificial systems. Robustness is achieved through redundancy and distributed processing; failure of a single module does not necessarily compromise the entire system’s function. Adaptability stems from the ability to modify or replace individual modules without requiring a complete overhaul of the architecture, enabling incremental improvements and responses to changing demands. Enhanced performance results from specialization; modules optimized for specific tasks operate more efficiently than a generalized, monolithic system attempting to handle all functions. This principle of decomposition into interacting, specialized components is therefore a fundamental strategy for building complex, high-performing, and resilient systems.

The brain’s modular organization is substantiated by the prevalence of Canonical Microcircuits – recurring neural arrangements found throughout the cortex – and the complex interplay of Brain Networks. Canonical Microcircuits, typically consisting of six layers with consistent excitatory and inhibitory connections, represent fundamental processing units. These microcircuits aren’t isolated; instead, they participate in larger-scale Brain Networks, such as the default mode, executive control, and salience networks. The consistent presence of these microcircuits, coupled with the coordinated activity of interconnected networks, demonstrates that modularity isn’t limited to broad brain regions but extends to microscopic neural computations and macroscopic cognitive functions, indicating a hierarchical and multi-scale organizational principle.

Hierarchical Modularity describes the brain’s organization as nested modules, where simpler modules combine to form more complex ones, and these, in turn, integrate into even higher-level systems. This arrangement isn’t simply a linear progression; modules at each level can operate relatively independently while contributing to the function of the encompassing structure. For example, individual neurons form microcircuits, which contribute to larger cortical columns, and these columns then participate in broader brain networks responsible for complex cognitive functions. This multi-level organization allows for both specialization – with lower-level modules optimizing specific computations – and integration, enabling the brain to handle increasingly complex tasks without succumbing to combinatorial explosion of connections, and crucially, to maintain functionality even with localized damage.

Echoes of Biology: Implementing Modularity in Artificial Systems

Deep Neural Networks (DNNs) inherently possess a degree of modularity stemming from their layered architecture. Each layer performs a specific feature extraction or transformation, operating as a relatively independent unit within the larger network. This layered composition allows for hierarchical representation learning, where lower layers detect simple features and subsequent layers combine these into more complex representations. While not explicitly designed as discrete modules, this structure facilitates a degree of specialization and allows for transfer learning, where pre-trained layers can be re-used in different contexts. The implicit modularity of DNNs provides a foundational basis for exploring and implementing more explicit modular designs, leveraging existing network structures to improve performance and efficiency.
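The layered composition described above can be made concrete with a minimal sketch. The `Dense` class, the backbone/head split, and the random weights here are all illustrative stand-ins, not any particular framework's API; the point is only that each layer is an independent transform and a pretrained stack can be reused under a new task head:

```python
import random

class Dense:
    """One layer: an independent feature-transform module (ReLU(Wx))."""
    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

    def __call__(self, x):
        # Each output unit is a rectified weighted sum of the inputs.
        return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in self.w]

def forward(layers, x):
    """Compose layers: lower layers feed features to higher ones."""
    for layer in layers:
        x = layer(x)
    return x

# A "pretrained" feature backbone, reused by two different task heads
# (the essence of transfer learning across implicitly modular layers).
backbone = [Dense(4, 8, seed=1), Dense(8, 8, seed=2)]
head_a = Dense(8, 3, seed=3)   # original task: 3 outputs
head_b = Dense(8, 2, seed=4)   # new task reuses the same backbone: 2 outputs

feats = forward(backbone, [0.5, -0.2, 0.1, 0.9])
out_a, out_b = head_a(feats), head_b(feats)
```

Swapping `head_b` in without touching `backbone` is the modular move: the shared layers act as a reusable component.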

Architectural Modularity, achieved through the deliberate incorporation of modularity priors during AI system design, focuses on constructing networks from pre-defined, reusable components. This contrasts with solely relying on emergent modularity observed during training. Specifically, techniques include defining explicit interfaces between modules, employing sparse connectivity patterns, and utilizing techniques like Mixture-of-Experts where different modules specialize in different sub-tasks. These approaches aim to improve generalization performance by reducing the number of parameters needing adjustment for novel tasks, enhance training efficiency through parallelization and transfer learning, and increase robustness by isolating failures within specific modules rather than propagating them system-wide. Empirical results demonstrate that explicitly modular architectures often exhibit superior performance on tasks requiring compositional generalization compared to monolithic networks with equivalent parameter counts.
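A Mixture-of-Experts router is the clearest instance of such a prior. The sketch below is a deliberately tiny, framework-free illustration (the `Expert` maps and gate weights are made up): a gate scores each expert, and only the top-k experts actually run, so compute stays sparse and each expert can specialize:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

class Expert:
    """A small specialist module; here just a fixed scaling map."""
    def __init__(self, scale):
        self.scale = scale
    def __call__(self, x):
        return [self.scale * v for v in x]

class MoE:
    """Gate scores every expert; output mixes only the top-k of them."""
    def __init__(self, experts, gate_w, k=1):
        self.experts, self.gate_w, self.k = experts, gate_w, k

    def __call__(self, x):
        scores = [sum(w * v for w, v in zip(ws, x)) for ws in self.gate_w]
        top = sorted(range(len(scores)), key=lambda i: -scores[i])[: self.k]
        probs = softmax([scores[i] for i in top])
        out = [0.0] * len(x)
        for p, i in zip(probs, top):      # only the selected experts execute
            for j, v in enumerate(self.experts[i](x)):
                out[j] += p * v
        return out

# Two experts, an identity-like gate, top-1 routing:
moe = MoE([Expert(1.0), Expert(-1.0)], gate_w=[[1.0, 0.0], [0.0, 1.0]], k=1)
out = moe([2.0, 0.1])   # gate prefers expert 0, so out == [2.0, 0.1]
```

A failure inside one expert stays inside that expert – the isolation property the paragraph describes.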

Emergent modularity refers to the phenomenon where complex neural networks develop functionally distinct sub-networks, or modules, during the training process without explicit architectural constraints enforcing such modularity. Observation of internal activations reveals that individual neurons or groups of neurons specialize in processing specific features or aspects of the input data. This specialization isn’t pre-programmed but arises as a consequence of optimization towards the overall task objective. Research indicates that factors such as network size, training data diversity, and the optimization algorithm employed can influence the degree and nature of emergent modularity. Consequently, strategic design choices – including regularization techniques, curriculum learning, and network initialization schemes – offer potential avenues to encourage the development of beneficial modular structures and improve model generalization capabilities.
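Whether such structure has in fact emerged can be quantified. One crude but common-sense measure (a simplification of graph-modularity scores like Newman's Q, under an assumed grouping of units) compares within-group to between-group connection strength:

```python
def modularity_ratio(weights, groups):
    """Mean within-group |w| divided by mean between-group |w|.

    weights: square matrix of connection strengths between units.
    groups:  hypothesized module label for each unit.
    A ratio well above 1 suggests the units have specialized into modules.
    """
    within, between = [], []
    n = len(weights)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            (within if groups[i] == groups[j] else between).append(abs(weights[i][j]))
    return (sum(within) / len(within)) / (sum(between) / len(between))

# Four units whose connections are strong within pairs, weak across them:
W = [[0.0, 0.9, 0.1, 0.1],
     [0.9, 0.0, 0.1, 0.1],
     [0.1, 0.1, 0.0, 0.9],
     [0.1, 0.1, 0.9, 0.0]]
score = modularity_ratio(W, groups=[0, 0, 1, 1])   # ≈ 9: strongly modular
```

The group labels here are given by hand; in practice they would come from clustering the activations or weights themselves.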

Memory-Augmented Language Models (LLMs) and Retrieval-Augmented Generation (RAG) represent implementations of modularity by decoupling the core language model from external knowledge repositories. In these architectures, the LLM functions as a processing module, while the external memory or retrieval system acts as a separate knowledge module. RAG, specifically, enhances LLM performance by retrieving relevant documents from a knowledge base and incorporating them into the prompt, allowing the model to ground its responses in factual information. This modular approach improves reasoning capabilities, particularly in knowledge-intensive tasks, and demonstrably reduces the occurrence of hallucinations – generating factually incorrect or nonsensical outputs – by providing a verifiable source for generated content. The separation of knowledge storage and processing also facilitates easier updating and maintenance of the knowledge base without requiring retraining of the core LLM.
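The retrieve-then-prompt loop can be sketched in a few lines. Real systems use dense vector search; the word-overlap scorer and the prompt template below are hypothetical stand-ins that show only the modular split between the knowledge store and the model:

```python
def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query (a toy retriever;
    production systems use embedding similarity instead)."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, corpus):
    """Ground the model: retrieved passages are prepended as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, corpus))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context.")

corpus = [
    "The hippocampus is involved in memory consolidation.",
    "The cerebellum coordinates fine motor control.",
    "Transformers use attention to weigh input tokens.",
]
prompt = build_prompt("What does the hippocampus do?", corpus)
```

Updating `corpus` changes what the model can cite without touching the model itself – the maintenance advantage the paragraph notes.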

Beyond Scale: Towards a More Principled Artificial Intelligence

Current approaches to artificial intelligence often treat systems as monolithic entities, a stark contrast to the highly modular architecture of the brain. Emerging theories, such as the Thousand Brains Theory and Global Workspace Theory, are providing blueprints for building AI systems composed of numerous, specialized modules. The Thousand Brains Theory posits that the brain functions not as a single network, but as a collection of independent ‘brains’ each responsible for processing specific aspects of the environment, while the Global Workspace Theory suggests consciousness arises from a ‘workspace’ where these modular processes compete for attention and access to resources. By drawing inspiration from these neuroscientific models, researchers are beginning to explore AI architectures where complex tasks are broken down and distributed across multiple modules, fostering greater efficiency, adaptability, and potentially, a more human-like intelligence.

Inspired by the efficiency of biological neural networks, spike-based communication offers a fundamentally different approach to information transfer within artificial intelligence systems. Unlike traditional AI which relies on continuous signals, this method transmits information via discrete “spikes” – brief pulses of activity – much like neurons in the brain. This event-driven communication dramatically reduces energy consumption, as modules only communicate when necessary and resources aren’t wasted on constant signal transmission. Furthermore, the timing of these spikes can encode additional information, increasing the complexity and nuance of the communication without significantly increasing energy demands. By mirroring this biological signaling, AI architectures can achieve greater computational power with substantially lower energy footprints, potentially enabling deployment in resource-constrained environments and paving the way for more sustainable AI technologies.
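The standard building block of such systems is the leaky integrate-and-fire neuron, sketched minimally below (the threshold and leak values are arbitrary illustrative choices): input current accumulates on a decaying membrane potential, and the unit emits an event only when the threshold is crossed – silence between spikes is what makes the scheme cheap:

```python
def lif_spikes(inputs, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron over a sequence of input currents.

    The membrane potential decays by `leak` each step, integrates the
    input, and fires (then resets) only when it reaches `threshold`.
    Returns the binary spike train: communication happens only on events.
    """
    v, spikes = 0.0, []
    for current in inputs:
        v = leak * v + current        # integrate with leak
        if v >= threshold:
            spikes.append(1)          # event: emit a spike and reset
            v = 0.0
        else:
            spikes.append(0)          # no event: nothing sent downstream
    return spikes

train = lif_spikes([0.6, 0.6, 0.0, 0.6, 0.6])   # -> [0, 1, 0, 0, 1]
```

Note that identical sub-threshold inputs produce different outputs depending on recent history – spike *timing* carries information, as the paragraph describes.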

The architecture of an artificial neural network is profoundly shaped by the cost function used during its training. Traditionally, these functions prioritize overall task performance, often leading to monolithic networks where specialized knowledge is diffusely represented. However, innovative cost functions are now being designed to explicitly reward modularity – that is, networks composed of distinct, specialized modules. These functions achieve this by, for example, penalizing excessive cross-talk between modules or incentivizing the development of sparse connections. Consequently, the resulting networks not only achieve comparable or superior performance on target tasks, but also exhibit improved generalization capabilities, increased robustness to noise, and enhanced interpretability due to the clear functional roles of individual modules. This approach promises a significant departure from simply scaling up existing architectures, offering a pathway towards more efficient, adaptable, and biologically plausible intelligence.
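The simplest version of such a cost function adds a penalty on cross-module connection strength to the ordinary task loss. The grouping, weights, and penalty coefficient below are illustrative assumptions, not a published objective:

```python
def modular_loss(task_loss, weights, groups, lam=0.1):
    """Total cost = task loss + lam * (summed cross-module |weight|).

    Penalizing between-group connections pushes optimization toward
    sparse inter-module wiring, i.e. toward explicitly modular networks.
    """
    n = len(weights)
    cross = sum(
        abs(weights[i][j])
        for i in range(n)
        for j in range(n)
        if i != j and groups[i] != groups[j]
    )
    return task_loss + lam * cross

# Four units in two hypothesized modules; only the weak 0.1 links cross.
W = [[0.0, 0.9, 0.1, 0.1],
     [0.9, 0.0, 0.1, 0.1],
     [0.1, 0.1, 0.0, 0.9],
     [0.1, 0.1, 0.9, 0.0]]
loss = modular_loss(1.0, W, groups=[0, 0, 1, 1])   # 1.0 + 0.1 * 0.8 = 1.08
```

Strong within-module weights go unpenalized, so the gradient pressure falls entirely on the cross-talk the paragraph mentions.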

The current trajectory of artificial intelligence development heavily emphasizes scaling – increasing the number of parameters in existing model architectures. Recent advancements boast models with parameter counts exceeding those of their predecessors by a factor of ten, yet this approach appears to be reaching limitations in terms of efficiency and genuine intelligence. A shift towards modularity offers a compelling alternative, drawing inspiration from the brain’s organization into specialized, interconnected regions. This isn’t simply about building larger networks, but about crafting systems comprised of distinct modules, each responsible for specific functions and communicating effectively with others. By embracing this principle, researchers aim to move beyond incremental improvements in performance and towards AI systems that exhibit greater robustness, adaptability, and ultimately, a more biologically plausible form of intelligence.

The pursuit of artificial intelligence, as detailed within, mirrors the very architecture of nature’s intelligence. This work posits modularity not as a design choice, but as an inevitable consequence of complex systems striving for robustness and adaptability. It’s a principle echoing through the layers of the brain and increasingly visible in successful deep learning models. G. H. Hardy once stated, “A mathematician, like a painter or a poet, is a maker of patterns.” This sentiment rings true; the construction of intelligence, whether biological or artificial, isn’t about building monolithic structures, but about composing smaller, interconnected patterns – modules – that collectively give rise to emergent properties and compositional generalization. The system doesn’t become intelligent; intelligence emerges from the arrangement.

What Lies Ahead?

The pursuit of modularity, as this work suggests, isn’t about assembling intelligence – it’s about cultivating the conditions for its emergence. Every dependency is a promise made to the past, a constraint on future adaptation. The architectures proposed today, however elegant, will inevitably reveal their fault lines; the question isn’t if they will break, but where, and when. The real challenge lies not in maximizing performance on current benchmarks, but in minimizing the surface area for future failure.

Compositional generalization, a holy grail for those seeking truly adaptable systems, demands a reckoning with the illusion of control. Control is an illusion that demands SLAs. Systems aren’t built, they’re grown, and their behavior will always exceed the bounds of their initial specification. The focus must shift from designing intelligence to designing for surprise – for systems that can not only tolerate novelty, but actively seek it out, and self-correct when the inevitable cracks appear.

Everything built will one day start fixing itself. The cycles of refinement will continue, not as top-down engineering, but as a distributed, emergent process. The brain, after all, doesn’t ‘control’ its complexity – it embodies it. The path forward isn’t to replicate the brain, but to understand the principles that allow it to flourish, even amidst constant disruption.


Original article: https://arxiv.org/pdf/2602.18960.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
