Ray Kurzweil writes that the human brain is made up of over 300 million pattern recognizers arranged in layers. He thinks that AGI will be constructed upon an infrastructure of layered pattern recognizers. I don’t know whether he is right or wrong. However, I do think that too many advanced AI researchers either start with the current technology of Neural Networks and Deep Learning or the layered matrix math approach of Machine Learning and try to assemble AI, or they start with neuroscience or biology and try to deconstruct and emulate what they understand of what they find there.
My approach (right or wrong) is to observe the patterns of capability in real world models and autonomously intelligent systems, and abstract a catalog of capabilities that can be modeled with technology, language and layered-interface data structures. I do think that pattern matchers have a place as a fundamental building block, but they are not necessarily the most essential atomic or molecular element upon which to construct AGI.
To contrast this approach with what I see in the current field of research, others seem to search for the ultimate “fabric” out of which to construct all mental processes. The Deep Learning crowd assumes that a network of strategically placed layers of simulated neurons, trained with canned data sets and back-propagation or similar correction techniques, is the silver bullet for teaching a machine to perform simple or complex decision-making tasks. If followed down the AGI path, these researchers would expect to construct a complex web of subsystems that perform simple tasks, and somehow compose them into a web of interrelated or hierarchical capabilities. Unfortunately, this path breaks down on two critical fronts.
(1) Stringing together fully trained subsystems or clusters of neurons to feed input to the next cluster in a chain or hierarchy leads to exponential complexity and compounding error margins when training the second or third strata of composed subsystems. Imagine taking this approach to the ultimate level required, where complex or subtle consciousness capabilities necessitate a thousand or more strata of composed neural clusters, through which one must train all layers, cluster by cluster and stratum by stratum. Even if we could curate the data sets to properly train the base-strata clusters, and had the patience to do so, merely planning the training data set and strategy for the second stratum would be daunting. Because of this, the second approach (2) becomes the natural fallback.
(2) Instead of training small clusters of neurons to make simple and accurate decisions (like identifying gender from a series of images of faces), create vastly deep and wide networks that receive raw inputs on thousands or millions of channels and produce hundreds or thousands of outputs. Train these mega clusters to absorb massive amounts of input about each scenario or scene, and curate their capabilities to produce complex or nuanced decisions (like: is there any reason to believe that any person in a given audio/video clip is a threat to the security of the assets in the bank, or to the physical safety of the people in or near it?). The challenge with this approach is curating a sufficiently large and accurate training data set. Even if such a mega cluster were successfully trained, its decision-making capability remains highly specialized and difficult to integrate into a broader network of intelligent capabilities that could autonomously route contextless incoming information to the right specialized cluster. As such, one would have to feed incoming information to all specialized mega clusters in case one or more of them produced interesting findings, and would then need a layer of mega clusters specializing in prioritizing the outputs or conclusions of a million clusters to decide which, if any, are worth mentioning. This filtering or prioritization specialization can in fact be observed in the human brain, in the reticular activating system (RAS) at the brain stem. However, that system filters raw input before it enters the rest of the brain for processing.
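The fan-out-and-filter pattern just described can be sketched in a few lines. This is a hedged illustration, not an implementation: the specialist functions, their scores, and the threshold are all invented stand-ins for trained mega clusters.

```python
from typing import Callable, NamedTuple

class Finding(NamedTuple):
    source: str       # which specialized cluster produced this
    conclusion: str   # its decision or observation
    relevance: float  # its own confidence/relevance estimate

# Stand-ins for independently trained, highly specialized clusters.
def threat_detector(signal: dict) -> Finding:
    score = 0.9 if signal.get("raised_voice") else 0.1
    return Finding("threat_detector", "possible threat", score)

def gender_classifier(signal: dict) -> Finding:
    return Finding("gender_classifier", "face detected", 0.3)

SPECIALISTS: list[Callable[[dict], Finding]] = [threat_detector, gender_classifier]

def prioritize(signal: dict, threshold: float = 0.5) -> list[Finding]:
    """Fan the same raw input out to every specialist, then filter --
    the role the text assigns to a prioritization layer."""
    findings = [run(signal) for run in SPECIALISTS]
    return [f for f in findings if f.relevance >= threshold]

worth_mentioning = prioritize({"raised_voice": True})
```

Note that every input must pass through every specialist; the cost of that fan-out is exactly the scaling problem the text raises.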
So, while there might be something worth investigating in the approach of simultaneously firing many specialized neural clusters on the same raw input, and filtering for relevance or value, the difficulty in leaning too heavily on the basic technique of neural networks and deep learning remains a limiting factor.
So, what if we place this technique in our back pocket and use it for small, specialized firmware? Firmware would be those functions of the brain that perform a fairly simple and static function regardless of context. For example, we might focus this technique on training an RAS that is very good at evaluating the relevance or value of data coming from other subsystems. Once trained, that system would never need to be reprogrammed or retrained, nor to demonstrate plasticity or evolution. This sort of firmware seems to be a perfect use case for deep learning clusters.
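As a toy illustration of the firmware idea, here is a relevance filter whose weights are frozen after training, so it never needs retraining or plasticity. The weights, feature names and cutoff are invented for the sketch; a real RAS-like filter would be a trained network, not a single logistic unit.

```python
import math

class FrozenRelevanceFilter:
    """Scores outputs from other subsystems; weights are immutable,
    like firmware burned in at flash time."""

    def __init__(self, weights: tuple[float, ...], bias: float):
        self._weights = weights  # fixed after training, never updated
        self._bias = bias

    def score(self, features: tuple[float, ...]) -> float:
        # A single frozen logistic unit: sigmoid(w . x + b).
        z = sum(w * x for w, x in zip(self._weights, features)) + self._bias
        return 1.0 / (1.0 + math.exp(-z))

    def is_relevant(self, features: tuple[float, ...], cutoff: float = 0.5) -> bool:
        return self.score(features) >= cutoff

# Hypothetical features from an upstream subsystem:
# (novelty, intensity, goal_match)
ras = FrozenRelevanceFilter(weights=(2.0, 1.0, 3.0), bias=-3.0)
```

Because nothing in the object can change after construction, it behaves exactly as the text's "firmware": a static, context-independent function.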
I do not yet see any evidence suggesting that the neural network approach to AGI can help us when it comes to creating cognition, capability, skill, autonomous learning, or reflection. I loosely believe that we will need alternative data structures and information processing constructs with which to build both rigid structures and plasticity, out of which an emergent consciousness will reveal itself. The trouble so far has been discovering the fabric of plasticity. Rigid structures have thus far been the staples of computer science, and hence we keep repeating the mistake of trying to form rigid simulations of intelligence.
An Alternative Approach
Before I dive into my alternative framework for AGI research, I would like to digress into my observations on the evolution of database technologies, particularly since around the year 2006.
Having been around computer science since the late 1980s, I watched the early file-based storage techniques (e.g. ISAM and B-tree) get replaced with relational databases, then SQL standards, then OLAP systems like Microsoft SQL Server Analysis Services. Today, these are all considered old-school, and yet they still serve as foundational technology for many advanced systems in 2022. Next came the NoSQL wave, in which MongoDB, Redis and Cassandra were born to rebel against rigid relational table structures and introduced us to storing and retrieving JSON documents, allowing for some plasticity in the data design (semi-fluid schema). The hundreds of NoSQL data technologies that arose from 2006 to 2020 continued to experiment with breaking the old rules of SQL relational databases, and in doing so created breakthroughs in horizontal scalability (relaxing ACID rules in favor of “eventual consistency”) and techniques like sharding. As a software and data architect, I initially felt the urge to evaluate these alternative technologies to pick one that would replace my old favorite systems. I sought to move on from the old and pick one new technology basket into which I could place all my eggs. However, I realized that in this expanded universe of data techniques, there is no one-size-fits-all answer. Instead, I concluded that future architectures would need to apply a “polyglot data strategy” that leverages the benefits and specialties of two to five different families of data techniques.
I might use a doc store like AWS DynamoDB for high-scale, worldwide-distributed data collection at the transaction level, while streaming those transactions through a stream-processing bus like AWS Kinesis or Kafka. I might plug in sinks to store the raw stream, use other adapters to Storm or Flink to merge and query live streams, flow transformed data into snowflake-schema columnar data warehouse clusters like AWS Redshift or Google BigQuery, and then post-process those data marts into highly aggregated cubes using Microsoft SSAS or Apache Kylin, for hyper-speed (0-4 millisecond) query response in analytics use cases. Gone are the days when Microsoft SQL Server plus SSAS could handle enterprise-scale data.
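The essence of that polyglot pipeline is that one transaction is routed through several stores, each chosen for what it does best. Below is a hedged sketch of that routing idea; the adapter classes are in-memory stand-ins for the real systems named above (doc store, stream bus, warehouse), not their actual APIs.

```python
class DocStore:
    """Stand-in for a DynamoDB-style keyed document store."""
    def __init__(self): self.docs = {}
    def put(self, key, doc): self.docs[key] = doc

class StreamBus:
    """Stand-in for a Kinesis/Kafka-style append-only event log."""
    def __init__(self): self.log = []
    def publish(self, event): self.log.append(event)

class Warehouse:
    """Stand-in for a Redshift/BigQuery-style analytics sink."""
    def __init__(self): self.rows = []
    def load(self, row): self.rows.append(row)

def record_transaction(txn: dict, doc: DocStore, bus: StreamBus, wh: Warehouse):
    """Route one transaction to each specialized store."""
    doc.put(txn["id"], txn)                               # fast keyed lookup
    bus.publish(txn)                                      # live stream processing
    wh.load({"id": txn["id"], "amount": txn["amount"]})   # analytics projection

doc, bus, wh = DocStore(), StreamBus(), Warehouse()
record_transaction({"id": "t1", "amount": 42.0}, doc, bus, wh)
```

The point is not the stores themselves but the architecture: no single store is asked to serve every access pattern.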
Why this tangent about databases and polyglot architecture? Well, in similar fashion, the expert system, the inference engine, the neural network, the deep learning network, and now the Machine Learning approach have all tried to be the one silver bullet for AI. I strongly believe that we are only at the dawn of experimenting with foundational techniques that will someday be composed into an orchestra of evolved AI, as a Polyglot AI Architect constructs their AI designs. I think we are presently (in 2022) at the equivalent moment in the evolution of AI techniques to where data was in 1992, when B-tree/ISAM files were replaced by dBase and FoxPro xBase relational files. Yet we seem so enamored with ML (since it is making money solving whatever micro-problems will generate profit for business) that we will not get unstuck from this spot and evolve beyond it until we burn up most of the fuel (profit) with ML. Only those who are not profit-motivated (how few they are) shall look beyond this moment of ML mania to forge ahead on the path to developing new foundational techniques for constructing structures and processes of intelligence that may someday lead to the emergent intelligence of AGI.
Orthogonal Patterns Toward Plasticity
When considering the kinds of structures, observable in both the physical and digital spheres, that yield plasticity and dynamic potential, we can turn to both mathematics and software architecture.
When I stumbled on the concept of orthogonal design while studying software architecture, the underlying concept of orthogonality imprinted itself on my thinking about non-software topics. The idea of orthogonal design can most easily be grasped from The Pragmatic Programmer, which describes two or more things as orthogonal if changes in one do not affect any of the others.
Example software architecture patterns that produce orthogonal results include:
- Factory pattern
- Template Class
- Adapter
- Decorator
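Two of the patterns above can be combined in a few lines to show what orthogonality buys you: the legacy component, the adapter, and the decorator below can each change independently without affecting the others. The class names are illustrative, not taken from any particular framework.

```python
class LegacyLogger:
    """An existing component whose interface we don't control."""
    def __init__(self):
        self.lines: list[str] = []
    def write_line(self, text: str) -> None:
        self.lines.append(text)

class LoggerAdapter:
    """Adapter: exposes the `log(msg)` interface callers expect."""
    def __init__(self, legacy: LegacyLogger):
        self._legacy = legacy
    def log(self, msg: str) -> None:
        self._legacy.write_line(msg)

class TimestampDecorator:
    """Decorator: adds behavior without modifying the wrapped logger."""
    def __init__(self, inner, clock=lambda: "2022-01-01T00:00:00"):
        self._inner, self._clock = inner, clock
    def log(self, msg: str) -> None:
        self._inner.log(f"[{self._clock()}] {msg}")

legacy = LegacyLogger()
logger = TimestampDecorator(LoggerAdapter(legacy))
logger.log("hello")
```

Swapping the clock, the legacy backend, or the decoration each touches exactly one class; that independence is the orthogonality being claimed.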
To design rigid structures that support evolving capabilities and plasticity, this principle of orthogonality lends itself well to the architecture of an AGI mesh that enables open architecture, pluggable content and extensibility.
Trends in Worldwide Disruption Through Open Integrative Standards
I am seeing a pattern in how massive waves of digital advancement come about. Throughout the world, creative innovators contribute novel techniques and technologies that enter the commercial scene and survive by the measure of their commercial value. Proprietary technologies that flourish (like Microsoft SQL Server and Oracle) compete in the marketplace and influence it by proving the need for a specific type of technology (e.g. relational databases), but beyond that, their overall impact on the advancement of humanity remains slow and limited. When contrasted with open, standards-based technologies like the web standards (HTTP, HTML, JavaScript, CSS), commercial and proprietary tech makes far less worldwide impact. Open, standards-based technologies seek to unify and provide a conceptual mesh within which the commercial market can participate to achieve profitable success. The world advances along aligned directions, creating a rich ecosystem in which many may participate in innovation and contribution.
So, when I watch the standards-based technologies that have followed, like Bluetooth, 3G-5G cellular, RDF triple-store graphs, SVG vs. Flash, and HTML5, I see how the world advanced technology and innovation in what looks like a cooperative and aligned fashion. When energy (money/effort) is expended by total strangers pulling in the same direction, it creates force, momentum and accelerated advancement.
Enter my latest inspiration: the USD standard for the MetaVerse. This weekend, I read up on the technology that underlies the NVIDIA OmniVerse (their own MetaVerse). I was not surprised to discover that Pixar (the animated movie company) had been inventing proprietary technologies for years to make the production of 3D immersive worlds more efficient, in pursuit of running their own company more efficiently and more competitively. What did excite me was to see that their latest innovation, Universal Scene Description (USD), is now becoming the open standard to unify how the world contributes to building the MetaVerse. NVIDIA and Pixar are both investing in enhancing USD to become the next HTML/HTTP unlocking event, igniting the MetaVerse revolution much as those standards ignited the Web revolution.
I will excitedly watch how this pattern unfolds to see how commercial applications form around this movement to create force and momentum to advance the digital world.
Bringing this tangent back to the thread about AGI, I envision a need for a similar moment, in which a few primed agents (people or companies), sitting atop a few insights or investments in advanced technologies, recognize the opportunity to create a unifying mesh of open, standards-based technology for AI. Imagine a standard set of protocols, languages and systems that could compose strata of clustered AI capabilities (materials, skills, ontologies) to feed processing engines that “render” intelligent behavior, transfer learning, inheritance and specialization, adaptation and plasticity. Imagine a moment when a library of components performing specialized skills is available to be composed into compound skills and capabilities. Imagine that anyone in the world who adheres to the standards can contribute micro-innovations to the MindScape, enabling others to build upon them to integrate new models, frameworks and tools that expand the web of AI. Then, imagine a time when AI sub-systems can ponder or process new information and respond by composing new information, skills and capabilities from the mesh.
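The shape of such a standard can be sketched as a shared protocol plus a registry and a composer. Everything below is a speculative illustration of the idea, not an existing standard: the `Skill` protocol, the registry, and the two example skills are all hypothetical.

```python
from typing import Callable, Protocol

class Skill(Protocol):
    """The hypothetical shared contract every contributed skill follows."""
    name: str
    def apply(self, data: dict) -> dict: ...

REGISTRY: dict[str, Skill] = {}

def contribute(skill: Skill) -> None:
    """Anyone adhering to the protocol can add a skill to the mesh."""
    REGISTRY[skill.name] = skill

def compose(*names: str) -> Callable[[dict], dict]:
    """Build a compound capability as a pipeline of registered skills."""
    def compound(data: dict) -> dict:
        for n in names:
            data = REGISTRY[n].apply(data)
        return data
    return compound

# Two toy skills contributed by independent parties.
class Tokenize:
    name = "tokenize"
    def apply(self, data: dict) -> dict:
        return {**data, "tokens": data["text"].split()}

class Count:
    name = "count"
    def apply(self, data: dict) -> dict:
        return {**data, "count": len(data["tokens"])}

contribute(Tokenize())
contribute(Count())
describe = compose("tokenize", "count")
result = describe({"text": "composable ai becomes generative ai"})
```

The interesting property is that `compose` knows nothing about any individual skill; new capabilities emerge purely from which standard-conforming pieces are wired together.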
Composable AI becomes Generative AI.
This is the concept of my efforts and vision for the future of AGI. This is why I will invest time and creativity into envisioning an open standards framework for an AI mesh (MindScape) that could ignite aligned momentum in the advancement of worldwide AI technology.