How much information do you need to build a human being and, more specifically, a human brain?
After all, we are by far the most complex species on the planet. To take it up a notch, some of our brains think that our brains are the most complex structures in the universe!
Nevertheless, a tomato has more genes than a human being. 7000 more, to be precise.
Looking at our genes, we have a hard time figuring out where all our complexity is coded for.
There are only approximately 20000 genes to begin with, and around half of them are concerned with other things like building hands and feet and vital organs.
To put it mathematically (considering our genome can be viewed as a code carrying information being processed by something similar to a Turing machine, as I explain in more detail here), our genome only carries 25 million bytes of design information for the brain after lossless compression.
Gauge that against the 10^15 connections (one quadrillion!) that adults are estimated to have in the neocortex, the evolutionarily most recent part of our brain, which exists only in mammals and has grown extraordinarily large in Homo sapiens. You’ll see that, unless our understanding of genes is completely amiss, it would be irrational to assume that large portions of our knowledge and abilities are encoded directly in the genes.
The only alternative is that there needs to be a much simpler, more efficient way of defining the blueprint for our brain and for our neocortex.
And, with that, there might be a much simpler way of building a prototypical intelligent system.
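To get a feeling for the size of the mismatch, the back-of-the-envelope arithmetic can be written out. This is only a sketch using the two figures quoted above (25 million bytes and 10^15 connections). The assumption that hard-wiring a connection would cost at least one bit is mine, and it is deliberately generous.

```python
# Figures quoted in the text above
GENOME_BRAIN_BYTES = 25_000_000   # compressed design information for the brain
NEOCORTEX_CONNECTIONS = 10**15    # estimated connections in the adult neocortex

# Assumption (illustrative only): hard-wiring one connection costs at least one bit.
BITS_PER_CONNECTION = 1

genome_bits = GENOME_BRAIN_BYTES * 8
bits_needed = NEOCORTEX_CONNECTIONS * BITS_PER_CONNECTION
shortfall_factor = bits_needed // genome_bits

print(f"Bits available in the genome: {genome_bits:.1e}")
print(f"Bits needed to hard-wire:     {bits_needed:.1e}")
print(f"Shortfall factor:             {shortfall_factor:,}")
```

Even under this generous assumption, the genome falls short by a factor of five million, which is why a compact generative rule seems far more plausible than an explicit wiring diagram.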
A unified theory of brain function
In his book On Intelligence, Jeff Hawkins complains that the brain is predominantly pictured as a collection of highly specialized regions.
He compares this situation with 19th-century biologists examining in ever-increasing detail the large variety of species, without having an eye out for the unifying principles behind life. Until Darwin came along with his theory of evolution, no one understood how to describe the multiplicity of appearances of the natural world in an overarching narrative.
The brain likewise might look like it’s composed of many different, highly specialized brain regions, but their apparent specialization shouldn’t lead us to conclude that they might not all work based on the same anatomical and algorithmic principles.
In fact, we observe that there is a surprising homogeneity in the anatomy of the neocortex. Neuroplasticity indicates that most brain regions can easily take on tasks previously carried out by other brain regions, showing a certain universality behind their design principles.
In his bestseller The Brain that Changes Itself, Norman Doidge tells impressive stories of patients remapping entire sensory systems to new parts of the brain, like people learning to see with their tongue by mapping visual stimuli recorded with a camera to sensory stimuli straight into their mouth.
Research on stroke patients likewise shows that abilities lost to strokes are routinely relearned by new brain regions. People born deaf can remap their Broca’s area (responsible for language processing) to control the hand movements with which they communicate in sign language, instead of the mouth movements with which they would articulate speech.
The brain exhibits an incredible capacity and flexibility to learn new things. Most people can learn whichever language they grow up with, or choose to learn a new one later in life, can learn whichever instrument they pick up (admittedly with varying success) and so on.
The fact of plasticity and flexible learning can be interpreted as pointing, in accord with the sparsity of information in our genes, towards a universal structure underlying both the biological setup of the neocortex and the learning algorithms with which it operates.
The structure of thought
It can be difficult to conceptualize thinking itself (as I discuss in much more detail in my recent article on the geometry of thought), but there are certain structures and patterns that run deep through almost every aspect of our cognition.
As Ray Kurzweil explains in his book How to Create a Mind, we perceive the world in a hierarchical manner, composed of simple patterns increasing in complexity. According to him, pattern recognition forms the foundation of all thought, from the most primitive patterns up to highly abstract and complex concepts.
Take language and writing as an example. Small lines build up patterns that we can recognize as letters. Assortments of letters form words, then sentences. Sentences form paragraphs, whole articles. And in the end, out of an assortment of a small number of minimal patterns arranged in a highly specific way, narrative and meaning emerge.
The biology of pattern recognition
Modern neuroimaging data indicates that the neocortex is composed of a uniform assortment of structures called cortical columns. Each one is built up from around 100 neurons.
Kurzweil proposes that these columns form what he calls minimal pattern recognizers. A conceptual hierarchy is created by connecting layers upon layers of pattern recognizers with each other, each specialized in recognizing a single pattern from the input of one of many different possible sensory modalities (like the eyes, the ears, the nose).
Building upon basic feature extractions (like detecting edges in visual stimuli or recognizing a tone), these patterns stack up to form more and more intricate patterns.
A pattern recognizer is not bound to, say, processing visual or auditory stimuli. It can process all kinds of signals as inputs, generating outputs based on structures contained in the inputs. Learning means wiring up pattern recognizers and learning their weight structure (basically how strongly they respond to each other’s input and how densely they are interconnected), similar to what is done when training neural networks.
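The idea of stacked pattern recognizers can be made a little more tangible with a toy sketch. Everything here is hypothetical and chosen for illustration: the `PatternRecognizer` class, the stroke names, and the simple threshold rule are my own simplifications, not Kurzweil's actual model and certainly not a description of real cortical columns.

```python
from dataclasses import dataclass


@dataclass
class PatternRecognizer:
    """Toy recognizer: fires when the weighted evidence reaches a threshold."""
    name: str
    weights: dict[str, float]  # input pattern name -> connection strength
    threshold: float

    def fires(self, active_inputs: set[str]) -> bool:
        evidence = sum(w for inp, w in self.weights.items() if inp in active_inputs)
        return evidence >= self.threshold


def run_hierarchy(layers: list[list[PatternRecognizer]], sensory_input: set[str]) -> set[str]:
    """Pass activity upward: each layer only sees what the layer below recognized."""
    active = set(sensory_input)
    for layer in layers:
        active = {r.name for r in layer if r.fires(active)}
    return active


# Layer 1 turns strokes into letters, layer 2 turns letters into a word.
letters = [
    PatternRecognizer("T", {"horizontal_top": 1.0, "vertical_center": 1.0}, threshold=2.0),
    PatternRecognizer("O", {"closed_curve": 1.0}, threshold=1.0),
]
words = [PatternRecognizer("TO", {"T": 1.0, "O": 1.0}, threshold=2.0)]

full_input = {"horizontal_top", "vertical_center", "closed_curve"}
partial_input = {"horizontal_top", "vertical_center"}

print(run_hierarchy([letters, words], full_input))     # {'TO'}
print(run_hierarchy([letters, words], partial_input))  # set()
```

In this sketch, "learning" would amount to adjusting the numbers in `weights` and `threshold`. The recognizers themselves never need to know whether their inputs came from the eyes, the ears, or another recognizer.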
But how can the brain be so homogeneous while being so good at solving many different tasks? The answer might lie in the intersection of neuroscience and computer science.
The role of information
What do visual, auditory and other sensory information have in common? The obvious answer is that it’s all some sort of information.
While information is frankly a bit tricky to define and gets thrown around way too much in the information age, in the context of information processing in the brain it has a technical meaning. A step towards understanding how this architecture could work so well for us can lie in realizing that the brain can be thought of as an information processing device.
There is much uniformity in the input to neurons, the underlying currency of neural computation. Whichever signal the brain is processing, it is always composed of spatial and temporal firing patterns of neurons. Every kind of pattern we observe in the outside world is encoded in our sense organs into neural firing patterns, which then, according to Kurzweil, flow up and down the hierarchy of pattern recognizers until meaning is successfully extracted.
The neuroscientific evidence is supported by ideas from computer science. In his book The Master Algorithm, Pedro Domingos proposes that we might find a universal algorithm that would, given the right data, allow us to learn pretty much anything we could think of.
This universal learning algorithm may even be composed of a mix of already existing learning algorithms (like Bayesian networks, connectionist or symbolist approaches, evolutionary algorithms, support vector machines, etc.).
Something akin to this universal algorithm might also be used by the brain, although we are not yet quite sure how the brain learns from an algorithmic perspective. As the most basic example, there’s, of course, Hebbian learning, which has been shown to take place in the brain to some extent. For more sophisticated algorithms, researchers have been trying to find biologically plausible mechanisms for implementing backpropagation in the brain, among many other things.
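For readers who have not come across it, Hebbian learning is often summarized as "neurons that fire together, wire together". A minimal sketch of the plain textbook rule follows. The function name, the learning rate, and the activity values are illustrative assumptions, and the brain's actual mechanisms are considerably more involved.

```python
def hebbian_update(weights, pre, post, learning_rate=0.25):
    """One Hebbian step: each weight grows by learning_rate * pre_i * post."""
    return [w + learning_rate * x * post for w, x in zip(weights, pre)]


weights = [0.0, 0.0, 0.0]
pre_activity = [1.0, 1.0, 0.0]  # inputs 0 and 1 are active, input 2 is silent
post_activity = 1.0             # the receiving neuron is active at the same time

for _ in range(4):
    weights = hebbian_update(weights, pre_activity, post_activity)

print(weights)  # [1.0, 1.0, 0.0]
```

Only the connections from inputs that were active together with the receiving neuron are strengthened. Note that this plain rule lets weights grow without bound, which is one reason practical variants add some form of normalization or decay.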
But it is clear that the brain is very good at learning, and needs to do so in a way that we can in principle understand and very probably model on our computers.
Information loss in neural networks
The trick to recognizing a pattern is to decode it, to parse out the relevant information hidden inside the signal. Learning how the brain does this might be one of the key steps to understanding how intelligence works.
Jeff Hawkins, the author of On Intelligence, complains about our poverty of tools when it comes to studying the role of information in the brain, but there has been increasing progress in understanding information flows in computational architectures.
This summer, I had the great privilege of attending two talks by Israeli neuroscientist Naftali Tishby on his information bottleneck method. With gleaming eyes and an enthusiasm that elated the entire crowd, he explained how information is filtered when deep neural networks extract relevant features from input data (watch his talk at Stanford for an introduction).
The theory illuminates how information flows in deep neural networks (and gives a nice reason why deep networks tend to work so much better than shallow networks).
When you learn to recognize a face from pictures with 300×300 pixels, you have 90000 pixels containing information, but a face can, if you know what usually makes up a face, be characterized by much less information (e.g. the relevant features like distance of the eyes, the width of the mouth, the position of the nose, etc.).
This idea is used for instance in some deep generative models like Autoencoders (an introduction can be found here), where latent, lower-dimensional representations of the data are learned and then used to generate higher-dimensional, realistic-looking output.
Network training methods like stochastic gradient descent allow the network to filter out relevant patterns by effectively throwing out all irrelevant information from the input (like ignoring the background of a photo when classifying the object in the photo, as Ian Goodfellow points out in his book Deep Learning).
Tishby compares it to water flowing from the bottom of a bottle to its top: the bottleneck gets tighter and tighter, and less and less information can flow through. But if the bottleneck is set up well, the water that reaches the top ends up carrying all the necessary information.
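The bottle picture can be illustrated with a toy calculation. The information bottleneck method looks for a representation T of the input X that keeps I(X;T) small while keeping I(T;Y), the information about the target Y, large. The sketch below does not run that method. It only measures mutual information for two hand-picked "layers" on a made-up eight-value input, and all names in it are my own assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """I(A;B) in bits, treating the listed (a, b) pairs as equally likely samples."""
    n = len(pairs)
    joint = Counter(pairs)
    marginal_a = Counter(a for a, _ in pairs)
    marginal_b = Counter(b for _, b in pairs)
    return sum(
        (count / n) * log2((count / n) / ((marginal_a[a] / n) * (marginal_b[b] / n)))
        for (a, b), count in joint.items()
    )


inputs = list(range(8))                  # X: eight equally likely inputs (3 bits)
labels = [x >= 4 for x in inputs]        # Y: the one bit we actually care about
layer_1 = [x // 2 for x in inputs]       # T1: first, wider part of the bottle
layer_2 = [x // 4 for x in inputs]       # T2: the narrow neck

info_x_t1 = mutual_information(list(zip(inputs, layer_1)))
info_x_t2 = mutual_information(list(zip(inputs, layer_2)))
info_t1_y = mutual_information(list(zip(layer_1, labels)))
info_t2_y = mutual_information(list(zip(layer_2, labels)))

print(f"I(X;T1) = {info_x_t1:.1f} bits, I(T1;Y) = {info_t1_y:.1f} bits")
print(f"I(X;T2) = {info_x_t2:.1f} bits, I(T2;Y) = {info_t2_y:.1f} bits")
```

Each layer in this toy example remembers less about the input (3 bits, then 2, then 1), yet the single bit needed to predict the label makes it all the way through.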
I brought this up because I think this information-theoretic perspective can help us understand the idea of the neocortex as composed of pattern recognizers.
Pattern recognizers extract patterns from data. These patterns only form a small subset of the input, so in essence, the pattern recognizers of the brain are set up to extract information relevant to our survival from our sense data, and to sort this extracted data into hierarchies of knowledge (I discuss at much more length how this might be structured into conceptual spaces in my article on the Geometry of Thought). These we can then use to bring order into the messy appearance of the world, increasing our chances of survival.
This is the job of the brain. At its core, it’s an information filtering and ordering device constantly learning useful patterns from data.
Jürgen Schmidhuber likens the progress of science to finding ever more efficient compression algorithms: Newton and Einstein did not come up with large and incomprehensible formulas, but rather expressed an incredible range of phenomena in equations that can be written in one line. Schmidhuber thinks that this kind of ultra-compression might at some point also come true for general learners.
Compression and information filtering could well be at the core of what we think of as intelligence, so we might as well learn something from it (as we have been already) when building our own intelligent systems.
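The link between finding regularities and compressing data can be seen with an off-the-shelf compressor. This is only an analogy, sketched under my own assumptions: the "lawful" data below follows a made-up repeating rule, and zlib stands in for whatever compression a learner might perform.

```python
import random
import zlib

N = 10_000

# Data generated by a simple rule: a compressor can discover and exploit the regularity.
rule_based = bytes(i % 7 for i in range(N))

# Data with no regularity to find (seeded so the example is reproducible).
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(N))

rule_size = len(zlib.compress(rule_based, 9))
noise_size = len(zlib.compress(noise, 9))

print(f"Rule-based data: {N} bytes -> {rule_size} bytes")
print(f"Random noise:    {N} bytes -> {noise_size} bytes")
```

The rule-based sequence shrinks to a small fraction of its size, while the noise barely compresses at all. Understanding a phenomenon, in this view, is what lets you describe it briefly.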
Why intelligence might be simpler than we think
I’m no prophet when it comes to the future of AI, and I hope you were taught not to put too much faith in the opinions of strangers on the internet, so take this with a grain of salt.
I admit that there’s much more to information processing and intelligence than simple pattern classification (see for example my article on Ants and the Problems With Neural Networks).
There are many questions to address before we have “solved” intelligence. Inferring causality or general, common sense knowledge structures, as Yann LeCun points out here, is a big issue, and building predictive models of the world into the algorithms (as I go into at length in my article on The Bayesian Brain Hypothesis) is very probably a necessary next step out of many more necessary steps.
Other open questions, connected to the need for better objective functions, are encountered in reinforcement learning when training robots to carry out tasks intelligently. Being intelligent means solving problems, and one big aspect of this is figuring out the best ways to define objectives and then to achieve these objectives (in the brain, this role is believed to be played to an extent by the basal ganglia).
So just stacking up pattern recognizers won’t suddenly bring about robots running around reasoning like humans.
I still think that the sparsity of information contained in the genetic code, supported by the emerging evidence for the simplicity and universality behind the setup of the neocortex and its learning algorithms, should give us pause and make us take the chances of building highly intelligent machines in the near future more seriously (Kurzweil predicts machines passing the Turing test in 2029 and the technological singularity in 2045).
As P.W. Anderson said in his famous paper about the hierarchy of science, more is different, and more might come out of scaling up the use of simple things if we figured out the right way to scale them up. Some of this has already been apparent in the recent success of deep learning, which is closely tied to scaling up available data and computing power.
For me, understanding and building our own intelligence is an absolutely thrilling outlook.
But as many people emphasize (this TED Talk gives a summary), the rise of AI could have large implications for mankind as a whole and should be taken seriously as a problem. And even if we overestimate the problem (because we frankly love thinking about the end of the world a little too much), better safe than sorry.
Because, after all, nature came up with intelligence through the blind fancies of evolution. And it looks like we might soon come up with it as well.