| Next: | 2.4: The modality level |
| Up: | 2: Part II: Levels of organization in deliberative general intelligence |
| Prev: | 2.2: Levels of organization in deliberation |
The code level is composed of functions, classes, modules, packages; data types, data structures, data repositories; all the purely programmatic challenges of creating AI. Artificial Intelligence has traditionally been much more intertwined with computer programming than it should be, mostly because of attempts to overcompress the levels of organization and implement thought sequences directly as programmatic procedures, or implement concepts directly as LISP atoms or LISP frames. The code level lies directly beneath the modality level or brainware level; bleedover from modality-level challenges may show up as legitimate programmatic problems, but little else - not thoughts, cognitive content, or high-level problem-solving methods.
Any good programmer - a programmer with a feeling for aesthetics - knows the tedium of solving the same special case, over and over, in slightly different ways; and also the triumph of thinking through the metaproblem and creating a general solution that solves all the special cases simultaneously. As the hacker Jargon File observes, "Real hackers generalize uninteresting problems enough to make them interesting and solve them -- thus solving the original problem as a special case (and, it must be admitted, occasionally turning a molehill into a mountain, or a mountain into a tectonic plate)." [Raymond01a]. This idiom does not work for general AI! A real AI would be the ultimate general solution because it would encapsulate the cognitive processes that human programmers use to write any specific piece of code, but this ultimate solution cannot be obtained through the technique of successively generalizing uninteresting problems into interesting ones.
Programming is the art of translating a human's mental model of a problem-solution into a computer program; that is, the art of translating thoughts into code. Programming inherently violates the levels of organization; it leads directly into the pitfalls of classical AI. The underlying low-level processes that implement intelligence are of a fundamentally different character than high-level intelligence itself. When we translate our thoughts about a problem into code, we are establishing a correspondence between code and the high-level content of our minds, not a correspondence between code and the dynamic process of a human mind. In ordinary programming, the task is to get a computer to solve a specific problem; it may be an "interesting" problem, with a very large domain, but it will still be a specific problem. In ordinary programming the problem is solved by taking the human thought process that would be used to solve an instance of the problem, and translating that thought process into code that can also solve instances of the problem. Programmers are humans who have learned the art of inventing thought processes, called "algorithms", that rely only on capabilities an ordinary computer possesses.
The reflexes learned by a good, artistic programmer represent a fundamental danger when embarking on a general AI project. Programmers are trained to solve problems, and trying to create general AI means solving the programming problem of creating a mind that solves problems. There is the danger of a short-circuit, of misinterpreting the problem task as writing code that directly solves some specific challenge posed to the mind, instead of building a mind that can solve the challenge with general intelligence. Code, when abused, is an excellent tool for creating long-term problems in the guise of short-term solutions.
Having described what we are forbidden to do with code, what legitimate challenges lie on this level of organization?
Some programming challenges are universal. Any modern programmer should be familiar with the world of compilers, interpreters, debuggers, Integrated Development Environments, multithreaded programming, object orientation, code reuse, code maintenance, and the other tools and traditions of modern-day programming. It is difficult to imagine anyone successfully coding the brainware level of general intelligence in assembly language - at least if the code is being developed for the first time. In that sense object orientation and other features of modern-day languages are "required" for AI development; but they are necessary as productivity tools, not because of any deep similarity between the structure of the programming language and the structure of general intelligence. Good programming tools help with AI development but do not help with AI.
Some programming challenges, although universal, are likely to be unusually severe in AI development. AI development is exploratory, parallelized, and large. Writing a great deal of exploratory code means that IDEs with refactoring support and version control are important, and that modular code is even more important than usual - or at least, code that is as modular as possible given the highly interconnected nature of the cognitive supersystem.
Parallelism on the hardware level is currently supported by symmetric multiprocessing chip architectures [Hwang98], NOW (network-of-workstations) clustering [Anderson95] and Beowulf clustering [Becker95], and message-passing APIs such as PVM [Geist93] and MPI [Gropp94]. However, software-level parallelism is not handled well by present-day languages and is therefore likely to present one of the greatest challenges. Even if software parallelism were well supported, AI developers would still need to spend time thinking explicitly about how to parallelize cognitive processes - human cognition may be massively parallel on the lower levels, but the overall flow of cognition is still serial.
Finally, there are some programming challenges that are likely to be unique to AI.
We know it is possible to evolve a general intelligence that runs on a hundred trillion synapses with characteristic limiting speeds of approximately 200 spikes per second. An interesting property of human neurobiology is that, at a limiting speed of 150 meters per second for myelinated axons, each neuron is potentially within roughly a single "clock tick" of any other neuron in the brain¹. [Sandberg99] describes a quantity S that translates to the wait time, in clock cycles, between different parts of a cognitive system - the minimum time it could take for a signal to travel between the most distant parts of the system, measured in the system's clock ticks. For the human brain, S is on the rough order of 1 - in theory, at least. In practice, axons take up space and myelinated axons take up even more space, so the brain uses a highly modular architecture, but there are still long-distance pipes such as the corpus callosum. Currently, S is much greater than 1 for clustered computing systems. S is greater than 1 even within a single-processor computer system; Moore's Law for intrasystem communications bandwidth describes a substantially slower doubling time than processor speeds. Increasingly the limiting resource of modern computing systems is not processor speed but memory bandwidth [Wulf95] (and this problem has gotten worse, rather than better, since 1995).
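As a rough illustration of the quantity S, the arithmetic can be sketched as follows. The 150 m/s conduction speed and the 200 Hz "clock" come from the text, and the 2 GHz processor clock is the figure the section uses later; the 0.15 m signal path across the brain and the 50-microsecond cluster network latency are assumptions chosen only to show the orders of magnitude, and the function names are hypothetical.

```python
def wait_time_from_distance(distance_m, signal_speed_m_per_s, clock_hz):
    """S for a system whose delay is set by signal travel time.

    Returns the one-way delay between the most distant parts of the
    system, measured in that system's own clock ticks.
    """
    delay_s = distance_m / signal_speed_m_per_s
    return delay_s * clock_hz


def wait_time_from_latency(latency_s, clock_hz):
    """S for a system whose delay is given directly as a latency in seconds."""
    return latency_s * clock_hz


# Brain: 150 m/s and 200 Hz are from the text; the 0.15 m path is an assumption.
brain_s = wait_time_from_distance(0.15, 150.0, 200.0)

# Cluster: 2 GHz is the text's example clock; 50 microseconds of network
# latency between nodes is an assumed, illustrative figure.
cluster_s = wait_time_from_latency(50e-6, 2e9)

print(f"brain   S ~ {brain_s:.1f} ticks")
print(f"cluster S ~ {cluster_s:.0f} ticks")
```

Under these assumptions the brain comes out at a fraction of a tick, consistent with "on the rough order of 1", while the cluster waits on the order of a hundred thousand ticks for the same kind of round-the-system signal.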
One class of purely programmatic problems that is unique to AI arises from the need to "port" intelligence from massively parallel neurons to clustered computing systems (or other human-programmable substrate). It is conceivable, for example, that the human mind handles the cognitive process of memory association by comparing current working imagery to all stored memories, in parallel. We have no particular evidence that the human mind uses a brute force comparison, but it could be brute-forced. The human brain acknowledges no distinction between CPU and RAM. If there are enough neurons to store a memory, then the same neurons may presumably be called upon to compare that memory to current experience. (This holds true even if the correspondence between neural groups and stored memories is many-to-many instead of one-to-one.)
Memory association may or may not use a "compare" operation (brute force or otherwise) of current imagery against all stored memories, but it seems likely that the brain uses a massively parallel algorithm at one point or another of its operation; memory association is simply a plausible candidate. Suppose that memory association is a brute-force task, performed by asking all neurons engaged in memory storage to perform a "compare" against patterns broadcast from current working imagery. Faced with the design requirement of matching the brute force of 10^14 massively parallel synapses with a mere clustered system, a programmer may be tempted to despair. There is no a priori reason why such a task should be possible.
Faced with a problem of this class, there are two courses the programmer can take. The first is to implement an analogous "massive compare" as efficiently as possible on the available hardware - an algorithmic challenge worthy of Hercules, but past programmers have overcome massive computational barriers through heroic efforts and the relentless grinding of Moore's Law. The second road - much scarier, with even less of a guarantee that success is possible - is to redesign the cognitive process for different hardware.
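To make the first course concrete, here is a minimal serial sketch of a "massive compare": a probe pattern is checked against every stored memory in turn. The feature-tuple representation, the agreement-fraction similarity measure, the 0.75 threshold, and all names are assumptions made for illustration; nothing here is claimed to reflect how the brain, or any particular AI design, stores or compares memories.

```python
def similarity(a, b):
    """Fraction of positions at which two equal-length feature tuples agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def massive_compare(probe, memories, threshold=0.75):
    """Compare the probe against every stored memory, one after another.

    On neural hardware every memory could in principle perform its own
    compare simultaneously; on serial hardware the cost of one query grows
    linearly with the number of stored memories.
    """
    hits = []
    for name, pattern in memories.items():
        score = similarity(probe, pattern)
        if score >= threshold:
            hits.append((score, name))
    # Best matches first.
    return [name for score, name in sorted(hits, reverse=True)]


# Hypothetical stored memories, each an 8-feature binary pattern.
memories = {
    "beach": (1, 0, 1, 1, 0, 0, 1, 0),
    "forest": (0, 1, 0, 0, 1, 1, 0, 1),
    "harbor": (1, 0, 1, 1, 0, 1, 1, 0),
}
probe = (1, 0, 1, 1, 0, 0, 1, 1)

recalled = massive_compare(probe, memories)
print(recalled)
```

On a cluster, the obvious refinement would be to partition `memories` across nodes and merge the per-node hit lists, which reduces wall-clock time but not the total amount of work per query.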
The human brain's most fundamental limit is its speed. Anything that happens in less than a second perforce must use less than 200 sequential operations, however massively parallelized. If the human brain really does use a massively parallel brute-force compare against all stored memories to handle the problem of association, it's probably because there isn't time to do anything else! The human brain is massively parallel because massive parallelism is the only way to do anything in 200 clock ticks. If modern computers ran at 200 Hz instead of 2 GHz, PCs would also need 10^14 processors to do anything interesting in realtime.
A sufficiently bold general AI developer, instead of trying to reimplement the cognitive process of association as it developed in humans, might instead ask: What would this cognitive subsystem look like, if it had evolved on hardware instead of wetware? If we remove the old constraint of needing to complete in a handful of clock ticks, and add the new constraint of not being able to offhandedly "parallelize against all stored memories", what is the new best algorithm for memory association? For example, suppose that you find a method of "fuzzy hashing" a memory, such that mostly similar memories automatically collide within a container space, but where the fuzzy hash inherently requires an extended linear series of sequential operations that would have placed "fuzzy hashing" out of reach for realtime neural operations. "Fuzzy hashing" would then be a strong candidate for an alternative implementation of memory association.
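The "fuzzy hashing" idea is hypothetical in the text, but an existing family of techniques with a similar flavor is locality-sensitive hashing. The sketch below swaps in one simple member of that family, sign-of-projection hashing, purely as an illustration: each memory is an assumed numeric feature vector, each hash bit records which side of a fixed hyperplane the vector falls on, and memories with the same bit pattern land in the same bucket. The class name, the vectors, and the hand-picked hyperplanes are all assumptions. This particular hash is also cheap to compute, so it does not model the long serial chain of operations the text imagines.

```python
from collections import defaultdict


def fuzzy_hash(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = []
    for plane in hyperplanes:
        projection = sum(v * p for v, p in zip(vector, plane))
        bits.append(1 if projection >= 0 else 0)
    return tuple(bits)


class AssociativeStore:
    """Memories that hash alike share a bucket; recall inspects one bucket only."""

    def __init__(self, hyperplanes):
        self.hyperplanes = hyperplanes
        self.buckets = defaultdict(list)

    def store(self, name, vector):
        self.buckets[fuzzy_hash(vector, self.hyperplanes)].append(name)

    def recall(self, probe):
        key = fuzzy_hash(probe, self.hyperplanes)
        # .get avoids creating an empty bucket for an unseen key.
        return list(self.buckets.get(key, []))


# Hand-picked hyperplanes (assumed); in practice they are usually drawn at random.
planes = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, -1, 0)]

store = AssociativeStore(planes)
store.store("memory_a", (0.9, 0.2, -0.5))
store.store("memory_b", (1.0, 0.3, -0.4))   # close to memory_a
store.store("memory_c", (-0.8, 0.1, 0.6))   # far from both

associated = store.recall((0.95, 0.25, -0.45))
print(associated)
```

A recall touches one bucket rather than every stored memory, which is the trade the paragraph describes. The price is that two similar memories that happen to straddle a hyperplane fall into different buckets and are never compared at all.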
A computationally cheaper association subsystem that exploits serial speed instead of parallel speed, whether based around "fuzzy hashing" or something else entirely, might still be qualitatively less intelligent than the corresponding association system within the human brain. For example, memory recognition might be limited to clustered contexts rather than being fully general across all past experience, with the AI often missing "obvious" associations (where "obvious" has the anthropocentric meaning of "computationally easy for a human observer"). In this case, the question would be whether the overall general intelligence could function well enough to get by, perhaps compensating for lack of associational breadth by using longer linear chains of reasoning. The difference between serialism and parallelism, on a low level, would propagate upward to create cognitive differences that compensate for the loss of human advantages or exploit new advantages not shared by humans.
Another class of problem stems from "porting" across the extremely different programming styles of evolution versus human coding. Human-written programs typically involve a long series of chained dependencies that intersect at single points of failure - "crystalline" is a good term to describe most human code. Computation in neurons has a different character. Over time our pictures of biological neurons have evolved from simple integrators of synaptic inputs that fire when a threshold input level is reached, to sophisticated biological processors with mixed analog-digital logics, adaptive plasticity, dendritic computing, and functionally relevant dendritic and synaptic morphologies [Koch00]. What remains true is that, from an algorithmic perspective, neural computing uses roughly arithmetical operations² that proceed along multiple intertwining channels in which information is represented redundantly and processed stochastically. Hence, it is easier to "train" neural networks - even nonbiological connectionist networks - than to train a piece of human-written code. Flipping a random bit inside the state of a running program, or flipping a random bit in an assembly-language instruction, has a much greater effect than a similar perturbation of a neural network. For neural networks the fitness landscapes are smoother. Why is this? Biological neural networks need to tolerate greater environmental noise (data error) and processor noise (computational error), but this is only the beginning of the explanation.
Smooth fitness landscapes are a useful, necessary, and fundamental outcome of evolution. Every evolutionary success starts as a mutation - an error - or as a novel genetic combination. A modern organism, powerfully adaptive with a large reservoir of genetic complexity, necessarily possesses a very long evolutionary history; that is, the genotype has necessarily passed through a very large number of successful mutations and recombinations along the road to its current form. The "evolution of evolvability" is most commonly justified by reference to this historical constraint [Dawkins96], but there have also been attempts to demonstrate local selection pressures for the characteristics that give rise to evolvability [Wagner96], thus averting the need to invoke the controversial agency of species selection. Either way, smooth fitness landscapes are part of the design signature of evolution.
"Smooth fitness landscapes" imply, among other things, that a small perturbation in the program code (genetic noise), in the input (environmental noise), or in the state of the executing program (processor noise), is likely to produce at most a small degradation in output quality. In most human-written code, a small perturbation of any kind usually causes a crash. Genomes are built by a cumulative series of point mutations and random recombinations. Human-written programs start out as high-level goals which are translated, by an extended serial thought process, into code. A perturbation to human-written code perturbs the code's final form, rather than its first cause, and the code's final form has no history of successful mutation. The thoughts that gave rise to the code probably have a smooth fitness metric, in the sense that a slight perturbation to the programmer's state of mind will probably produce code that is at most a little worse, and possibly a little better. Human thoughts, which are the original source of human-written code, are resilient; the code itself is fragile.
The dream solution would be a programming language in which human-written, top-down code somehow had the smooth fitness landscapes that are characteristic of accreted evolved complexity, but this is probably far too much to ask of a programming language. The difference between evolution and design runs deeper than the difference between stochastic neural circuitry and fragile chip architectures. On the other hand, using fragile building blocks can't possibly help, so a language-level solution might solve at least some of the problem.
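The contrast between "crystalline" code and redundant representation can be shown with a toy example. This is only an illustration under stated assumptions, not a model of neurons: the "crystalline" side is a table lookup whose index suffers a single flipped bit, and the "smooth" side is a value represented redundantly as the average of 1000 units, one of which is corrupted. All names and numbers are hypothetical.

```python
def flip_bit(value, bit):
    """Return value with one bit inverted, simulating a single-bit error."""
    return value ^ (1 << bit)


def crystalline_lookup(table, index):
    """A single point of failure: the result depends entirely on one index."""
    return table[index]


def redundant_readout(units):
    """A value held redundantly: the output is the mean over many units."""
    return sum(units) / len(units)


# Crystalline case: flipping bit 4 of index 3 yields 19, outside an 8-entry table.
table = ["a", "b", "c", "d", "e", "f", "g", "h"]
damaged_index = flip_bit(3, 4)
try:
    crystalline_result = crystalline_lookup(table, damaged_index)
except IndexError:
    crystalline_result = "crash"

# Redundant case: 1000 units each hold 0.5; one unit is pushed to 1.0.
units = [0.5] * 1000
units[417] = 1.0
smooth_result = redundant_readout(units)

print(crystalline_result)        # the lookup fails outright
print(round(smooth_result, 4))   # the readout barely moves from 0.5
```

The redundant readout buys its tolerance with a thousandfold increase in storage and arithmetic, which is one way of seeing why error tolerance on conventional hardware tends to be an explicit design feature rather than something that comes for free.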
The importance of smooth fitness landscapes holds true for all levels of organization. Concepts and thoughts should not break as the result of small changes. The code level is being singled out because smoothness on the code level represents a different kind of problem than smoothness on the higher levels. On the higher levels, smoothness is a product of correctly designed cognitive processes; a learned concept will apply to messy new data because it was abstracted from a messy experiential base. Given that AI complexity lying within the concept level requires smooth fitness landscapes, the correct strategy is to duplicate the smoothness on that level - to accept as a high-level design requirement that the AI produce error-tolerant concepts abstracted from messy experiential bases.
On the code level, neural circuitry is smooth and stochastic by the nature of neurons and by the nature of evolutionary design. Human-written programs are sharp and fragile ("crystalline") by the nature of modern chip architectures and by the nature of human programming. The distinction is not likely to be erased by programmer effort or new programming languages. The long-term solution might be an AI with a sensory modality for code (see Part III), but that is not likely to be attainable in the early stages. The basic code-level "stuff" of the human brain has built-in support for smooth fitness landscapes, and the basic code-level "stuff" of human-written computer programs does not. Where human processes rely on neural circuitry being automatically error-tolerant and trainable, it will take additional programmatic work to "port" that cognitive process to a new substrate where the built-in support is absent. The final compromise solution may have error tolerance as one explicit design feature among many, rather than error-tolerance naturally emerging from the code level.
There are other important features that are also supported by biological neural networks - that are "natural" to neural substrate. These features probably include:
This concludes the account of exceptional issues that arise at the code level. An enumeration of all issues that arise at the code level - for example, serializing the current contents of a sensory modality for efficient transmission to a duplicate modality on a different node of a distributed network - would constitute at least a third of a complete constructive account of a general AI. But programming is not all the work of AI, perhaps not even most of the work of AI; much of the effort needed to construct an intelligence will go into prodding the AI into forming certain concepts, undergoing certain experiences, discovering certain beliefs, and learning various high-level skills. These tasks cannot be accomplished with an IDE. Coding the wrong thing successfully can mess up an AI project worse than any number of programming failures. I believe that the most important skill an AI developer can have is knowing what not to program.