General Intelligence and Seed AI is ©2001 by Singularity Institute for Artificial Intelligence, Inc.  All rights reserved.



2.3: Concepts

2.3.1: Modality-level, concept-level, thought-level

Modalities in the human brain are mostly preprogrammed, as opposed to learned.  (Human modalities require external stimuli to grow into their preprogrammed organization, but this is not the same as learning.)  Individual neural signals can have meanings that are visible and understandable to an eavesdropper.  Programmers may legitimately take the risk of creating modalities through deliberate programming, with low-level elements that correspond to data structures, and human-written procedures for feature extraction.

Within GISAI, the term concept is used to refer to the kind of mental stuff that exists as a pattern in the modality.  A learned sequence of instructions that reconstructs a generic, abstracted "light bulb" in the visual modality is a concept.  Symbols, categories, and some memories are concepts.  (In common usage, "concept" might not extend to non-declarative mental stuff such as a human cognitive reflex or a human motor skill.  However, in a seed AI, where everything is open to introspection, it makes sense to call the equivalents of human reflexes or skills "concepts".)  Concepts are patterns, learned or preprogrammed, that exist in long-term storage and can be retrieved.

A structure of concepts creates a thought.  The archetypal example, in humans, is words coming together to form sentences.  Thoughts are visualized; they operate within the RAM of the mind, the "workspace" represented by available content capacity in the sensory modalities, commonly called "short-term memory" or "working memory".  (The capacity of working memory in AIs is not determined by available RAM, but by available CPU capacity to perform feature extraction on the contents of memory.  If you have the data structures without the feature extraction, the AI won't notice the information.)  Thoughts manipulate the world-model.

In humans, at least, it's hard to draw clean boundaries between thoughts and concepts.  (1).  The experience of hearing the word for a single concept, such as "triangle", is not necessarily a mere concept; it may be more valid to view it as a thought composed of the single concept "triangle".  And, although some concepts are formed by categorizing directly from sense perception, more abstract concepts such as "three" probably occur first as deliberate thoughts.  We'll be discussing both types in this section.

2.3.2: Abstraction is information-loss; abstraction is not information-loss

In chemistry, abstract means remove; to "abstract" an atom from a molecule means to take it away.  Use of the term "abstract" to describe the process of forming concepts implies two assumptions:  First, to create a concept is to generalize; second, to generalize is to lose information.  It implies that, to form the concept of "red", it is necessary to ignore other high-level features such as shape and size, and focus only on color.

This is the classical-AI view of abstraction, and we should therefore be suspicious of it.  On the other hand, our mechanisms for abstraction can learn the concept for "red".  In a being with a visual modality, this concept would consist of a piece of mindstuff that had learned to distinguish between red objects and non-red objects.  Since redness is detected directly as a low-level feature, it shouldn't be very hard to train a piece of mindstuff to thus distinguish - whether the mindstuff is made of trainable neurons, evolving code, or whatever.  A neural net needs to learn to fire when the "red" feature is present, and not otherwise; a piece of code only needs to evolve to test for the presence of the redness feature.  At most, "red" might also require testing for solid-color or same-hue groupings.  Given a visual modality, the concept of "red" lies very close to the surface.

Of course, to have a real concept for "red", it's not enough to distinguish between red and non-red.  The concept has to be applicable; you have to be able to apply it to visualizations, as in "red dog".  You also need a default exemplar (2) for "red"; and an extreme exemplar for "red"; and memories of experiences that are stereotypically red, such as stoplights and blood.  (For all we know, leaving out any one of these would be enough to totally hose the flow of cognition.)  Again, these features lie close to the surface of a visual modality.  "Red" would be one of the easiest features to make reversible, with little additional computational cost involved; just set the hue of all colors to a red value.  (Although hopefully in such a way as to preserve all detected edges, contrasts, and so on.  Making everything exactly the same color would destroy non-color features.)  The default exemplar for red can be a red blob, or a red light; the extreme exemplar for red may be the same as the default exemplar, or it may be a more intensely red blob.  And the stereotypically red objects, such as stoplights and blood, are the objects in which the redness is important, and much remarked upon.
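As a rough illustration of how close "red" lies to the surface of a visual modality, here is a minimal Python sketch of a detector plus a reversible "apply" operation.  The names is_red and apply_red, the hue tolerance, and the saturation and lightness cutoffs are all my own assumptions for the example, and a real modality would be far richer than a grid of RGB tuples.

```python
import colorsys

RED_HUE = 0.0          # hue of pure red on colorsys's 0..1 hue circle
HUE_TOLERANCE = 0.05   # assumed width of the "red" band (illustrative value)

def is_red(rgb):
    """Detector: fire when a pixel's hue lies near red and it has real color."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    # Hue is circular, so measure distance the short way around the circle.
    hue_distance = min(abs(h - RED_HUE), 1.0 - abs(h - RED_HUE))
    return s > 0.2 and 0.1 < l < 0.9 and hue_distance <= HUE_TOLERANCE

def apply_red(image):
    """Impose redness on an image given as a list of rows of RGB tuples.

    Only the hue is overwritten; lightness and saturation are kept, so
    edges and contrasts that depend on them are preserved.
    """
    result = []
    for row in image:
        new_row = []
        for rgb in row:
            _, l, s = colorsys.rgb_to_hls(*rgb)
            new_row.append(colorsys.hls_to_rgb(RED_HUE, l, s))
        result.append(new_row)
    return result
```

Applying apply_red to a bright blue pixel next to a dark blue pixel yields a bright red pixel next to a dark red one: the contrast between them survives, which is the point of changing hue alone.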

(3).

For the moment, however, let's concentrate on the problem of forming categories.  The conventional wisdom states that categorization consists of generalization, and that generalization consists of focusing on particular features at the expense of others.

We'll use the microdomain of letter-strings as an example.  To generalize from the instances {"aaa", "bbb", "ccc"} to form the category "strings-of-three-equal-letters", the information about which letter must be abstracted, or lost, from the model.  Actually, this misstates the problem.  If you lose that information on a letter-by-letter basis, then "aaa" and "aab" both look like "***".  What's needed is for the letter-string modality to first extract the features of "group-of-equal-letters", "number=3", and "letter=b", after which the concept can lose the last feature or focus on the first two.  If the second feature, "number", is also lost, then the result is an even more general concept, "strings-of-equal-letters".  Of course, this concept is precisely identical to the modality's built-in feature-detector for "group-of-equal-letters", which again points up that only very simple conceptual categories, lying very close to the surface of the modality's preprogrammed assumptions about which features are important, can be implemented by direct information-loss.
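The difference between losing information letter-by-letter and losing it after feature extraction can be sketched in a few lines of Python.  The function names (naive_abstraction, extract_features, make_concept) and the feature keys are assumptions of this example, not part of any existing system.

```python
from itertools import groupby

def naive_abstraction(s):
    """Lose the letter identity one letter at a time."""
    return "*" * len(s)

def extract_features(s):
    """Modality-level feature extraction: one feature dict per run of equal letters."""
    return [
        {"number": len(list(run)), "letter": letter}
        for letter, run in groupby(s)
    ]

def make_concept(required):
    """Build a concept that matches strings made of a single group of equal
    letters whose features include everything in `required`.

    Dropping a key from `required` is the "information-loss" step.
    """
    def matches(s):
        groups = extract_features(s)
        if len(groups) != 1:
            return False
        return all(groups[0].get(key) == value for key, value in required.items())
    return matches

three_bs = make_concept({"number": 3, "letter": "b"})   # matches only "bbb"
three_equal_letters = make_concept({"number": 3})        # "letter" lost
equal_letters = make_concept({})                         # "number" lost too
```

Here naive_abstraction("aaa") and naive_abstraction("aab") both give "***", while three_equal_letters accepts "aaa" and rejects "aab".  Note that equal_letters does nothing beyond what the feature extractor already does, matching the observation above.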

To examine a more complex concept, we'll look at the example of "three".

2.3.3: The concept of "three"

To a twenty-first-century human, trained in arithmetic and mathematics, the concept of "three" has enormous richness.  It must therefore be emphasized that we are dealing solely with the concept of "three", and that a mind can understand "three" without understanding "two" or "four" or "number" or "addition" or "multiplication".  A mind may have the concept "three" and the concept "two" without noticing any similarity between them, much less having the aha! that these concepts should go together under the heading "number".  If a mind somehow manages to pick up the categories of groups-of-three-dogs and groups-of-three-cats, it doesn't follow that the mind will generalize to the category of "three".

To think about infant-level or child-level AIs, or for that matter to teach human children, it's necessary to slow down and forget about what seems "natural".  It's necessary to make a conscious separation between ideas - ideas that, to humans, seem so close together that it takes a deliberate effort to see the distance.

Just because the AI exists on a machine performing billions of arithmetical operations per second doesn't mean that the AI itself must understand arithmetic or "three".  (John Searle, take note!)  Even if the AI has a codic modality which grants it direct access to numerical operations, it doesn't necessarily understand "three".  If every modality were programmed with feature-extractors that counted up the number of objects in every grouping, and output the result as (say) the tag "number: three", the AI might still fail to really understand "three", since such an AI would be unable to count objects that weren't represented directly in some modality.  An AI that learns the concept of "three" is more likely to notice not just three apples but that ve (the AI) is currently thinking three thoughts.  A preprogrammed concept only notices what the programmer was thinking about when he or she wrote the program.

What is "three", then?  How would the concept of "three" be learned by an AI whose modalities made no direct reference to numbers - whose modalities, in fact, were designed by a programmer who wasn't thinking about numbers at the time?  How can such a simple concept be decomposed into something even simpler?

There's an AI called "Copycat", written by Melanie Mitchell and conceived by Douglas R. Hofstadter, that tries to solve analogy problems in the microdomain of letter-strings.  If you tell Copycat:  "'abc' goes to 'abd'; what does 'bcd' go to?", it will answer "'bce'".  It can handle much harder problems, too.  (See Copycat in the glossary.)  Copycat is a really fascinating AI, and you can read about it in Metamagical Themas, or read the source code (it's a good read, and available as plain text online - no decompression required).  If you do look at the source code, or even just browse the list of filenames, you'll see the names of some very fundamental cognitive entities.  There are "bonds", "groups", and "correspondences".  There are "descriptors" (and "distinguishing descriptors") and "mappings", and all sorts of interesting things.

Without going too far into the details of Copycat, I believe that some of the mental objects in Copycat are primitive enough to lie very close to the foundations of cognition.  Copycat measures numbers directly (although it can only count up to five), but that's not the feature we're interested in.  Copycat was designed to understand relations and invent analogies.  It can notice when two letters occupy "the same position" in a letter-string, and can also notice when two letters occupy "the same role" in a higher-order mental construct.  It can notice that "c" in "abc" and "d" in "abd" and "d" in "bcd" all occupy the same position.  It can understand the concept of "the same role", if faced by an analogy problem which forces it to do so.  For example:  If "abc" goes to "abd", what does "pqrs" go to?  Copycat sees that "c" and "s" occupy the same role, even though they no longer occupy the same numerical position in the string, and so replies "pqrt".
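To make the distinction between "same position" and "same role" concrete, here is a deliberately crude Python sketch.  It is not how Copycat works: Copycat discovers such descriptions on its own, whereas this example hard-codes exactly one role ("rightmost letter") and one change ("successor").  All function names are hypothetical.

```python
def changed_index(before, after):
    """Index of the single letter that differs between two equal-length strings."""
    assert len(before) == len(after)
    diffs = [i for i, (x, y) in enumerate(zip(before, after)) if x != y]
    assert len(diffs) == 1, "this sketch handles exactly one changed letter"
    return diffs[0]

def successor(letter):
    """Next letter of the alphabet (no wrap-around past 'z' in this sketch)."""
    return chr(ord(letter) + 1)

def answer_by_position(before, after, target):
    """Describe the changed letter by its numerical position from the left."""
    i = changed_index(before, after)
    assert after[i] == successor(before[i])
    return target[:i] + successor(target[i]) + target[i + 1:]

def answer_by_role(before, after, target):
    """Describe the changed letter by its role: 'the rightmost letter'."""
    i = changed_index(before, after)
    assert i == len(before) - 1, "the only role this sketch knows is 'rightmost'"
    assert after[i] == successor(before[i])
    return target[:-1] + successor(target[-1])
```

For "bcd" the two descriptions agree and both give "bce".  For "pqrs" they come apart: the positional description yields "pqss", while the role description yields "pqrt", the answer attributed to Copycat above.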

Correspondences and roles and mappings are probably autonomically-detected features on the modality-level (as well as being very advanced concepts in cognitive science).  Intuitive, directly perceived correspondences allow two images in the same modality to be compared, and that is a basic part of what makes a modality go.

These intuitions obey certain underlying cognitive pressures (also modeled by the Copycat project):  If two high-level structures are equal, then the low-level structures should be mapped to each other.  Symmetry:  Very loosely defined, this is the idea that each of these low-level mappings should be the same; if one is reflected, they should all be reflected, and so on.  Completeness:  You shouldn't map five elements to each other but leave the sixth elements dangling.

Copycat shows an example of how to implement this class of cognitive intuitions using conflict-detectors, equality-detectors, and a feature called a "computational temperature".  Roughly speaking, conflicts raise the temperature and good structures lower the temperature.  The higher the temperature, the more easily cognitive perceptions break - the more easily groups and bonds and mappings dissolve.  Lower temperatures indicate better answers, and thus answers are more persistent - perceived pieces of the answer in the cognitive workspace are harder to break.  Copycat's intuitions may not have the same flexibility or insight as a human consciously trying to solve a "symmetry problem" or a "completeness problem", but they do arguably match a human's unconscious intuitions about analogy problems.  Each low-level built-in cognitive ability has its analogue as a high-level thought-based skill, and it is dangerous to confuse the standards to which the two are held.
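The rough relationship described above (conflicts raise the temperature, good structures lower it, and higher temperatures make perceived structures easier to break) can be sketched as follows.  The formulas and constants are invented for this illustration and should not be taken as Copycat's actual ones; the "strength" field on each structure is likewise an assumption.

```python
import random

def temperature(conflicts, good_structures):
    """Toy temperature in [0, 100]: conflicts raise it, good structures lower it."""
    return max(0.0, min(100.0, 50.0 + 20.0 * conflicts - 10.0 * good_structures))

def break_probability(strength, temp):
    """Chance that a perceived structure dissolves on one step.

    `strength` is in [0, 1]; stronger structures are harder to break,
    and everything is harder to break at low temperature.
    """
    return (temp / 100.0) * (1.0 - strength)

def step(structures, temp, rng):
    """Return the structures that survive one step at the given temperature."""
    return [
        s for s in structures
        if rng.random() >= break_probability(s["strength"], temp)
    ]

workspace = [
    {"name": "bond a-b", "strength": 0.3},
    {"name": "group abc", "strength": 0.8},
]
survivors = step(workspace, temperature(conflicts=2, good_structures=0), random.Random(0))
```

At temperature zero nothing ever breaks, so a good answer persists; at high temperature weak bonds and groups dissolve quickly, freeing the workspace to try something else.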

We now return to the concept of "three".  We'll suppose for the moment that we're operating in a Newtonian billiard-ball modality, and that we want the AI to learn to recognize three billiard balls.

The first concept learned for "three" might look like this:

The mental image on the left is an "exemplar" (or "prototype"), attached to the three concept and stored in memory.  The mental image on the right is the target, containing the objects actually being counted.  The concept of "three" is satisfied when correspondences can be drawn between each object in the three-exemplar and each object in the target image.  If the target image contains two objects, a dangling object will be detected in the three-exemplar image, and the concept will not be satisfied.  If the target image contains four objects, then a dangling object will be detected in the target image.  (4).
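The correspondence test described above can be sketched in Python without the code ever counting anything.  This is only an illustration under simplifying assumptions: "mental images" are plain lists of already-segmented objects, any object may be paired with any other, and the names unique_correspondence, THREE_EXEMPLAR and is_three are mine.

```python
def unique_correspondence(exemplar, target):
    """Pair objects one-to-one until one image runs out.

    Returns (pairs, dangling_in_exemplar, dangling_in_target).
    No lengths or numbers are used; objects are simply paired off.
    """
    unmatched_exemplar = list(exemplar)
    unmatched_target = list(target)
    pairs = []
    while unmatched_exemplar and unmatched_target:
        pairs.append((unmatched_exemplar.pop(), unmatched_target.pop()))
    return pairs, unmatched_exemplar, unmatched_target

# The stored exemplar image attached to the concept; its contents are arbitrary.
THREE_EXEMPLAR = ["ball", "ball", "ball"]

def is_three(target):
    """The concept is satisfied when nothing dangles on either side."""
    _, dangling_exemplar, dangling_target = unique_correspondence(THREE_EXEMPLAR, target)
    return not dangling_exemplar and not dangling_target
```

A target of two objects leaves an exemplar object dangling, and a target of four leaves a target object dangling; in both cases is_three returns False.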

This isn't a full answer to the "problem of three", of course.  A full answer would also consider the question of how to computationally implement a "unique correspondence" in a non-fragile way; how to distinguish each object from the background; how to apply the three-concept to a mental image formerly containing two or four objects to yield a new mental image containing three objects; how to retrieve the exemplar from memory; how to extend the intuition of "unique correspondence" across modalities; what type of mindstuff is needed to implement these instructions in a non-fragile way; and how the exemplar and concept were created or learned in the first place.

In fact, the problem of three is so complicated that it would probably be first solved by conscious thought, and compiled into a concept afterwards.  This adds the problem of figuring out how the thoughts got started; what types of task would force a mind to notice "three" and evolve a definition like that above; and how the skill gets compiled into a pattern.  Also, an understanding of three that generalizes from the concept "three billiard balls" to the concept "three groups of three billiard balls" means asking what kind of problem would force the generalization.  It means asking how the generalization would take place inside the thought-based skill or mindstuff-based concept; how the need to generalize would translate into a cognitive pressure, and how that pressure would apply to a piece of the mindstuff-code, and how that piece would correctly shift under pressure.  And then there are questions about moving towards the adult-human understanding of "three", such as noticing that it doesn't matter which particular billiard ball A corresponds to which billiard ball B.

However, the diagram above does constitute a major leap forward in solving the problem.  It is a functional decomposition of three, one that invokes more basic forces such as unique correspondence and exemplar retrieval.  It is a concept that could be learned even by an AI whose programmers had never heard of numbers, or whose programmers weren't thinking about numbers at the time.  It is a concept that can mutate in useful ways.  By relaxing the requirement of no dangling objects in the exemplar, we get "less than or equal to three".  By relaxing the requirement of no dangling objects in the target image, we get "greater than or equal to three".  By requiring a dangling object in the target image, we get "more than three".  By comparing two images, instead of an exemplar and an image, we get "same number as" (5), and from there "less than" or "less than or equal to".
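Each of these mutations is a small change to which dangling objects are tolerated.  The following Python sketch shows them side by side, under the assumption that an image is just a sequence of objects; the helper dangling and the concept names are hypothetical.

```python
def dangling(left, right):
    """Pair objects off one-to-one; report whether anything is left over.

    Returns (left_has_dangling, right_has_dangling).
    """
    left, right = list(left), list(right)
    while left and right:
        left.pop()
        right.pop()
    return bool(left), bool(right)

EXEMPLAR = ("o", "o", "o")   # the stored three-exemplar

def exactly_three(target):
    in_exemplar, in_target = dangling(EXEMPLAR, target)
    return not in_exemplar and not in_target

def at_most_three(target):
    # Relaxed: dangling objects in the exemplar are now allowed.
    return not dangling(EXEMPLAR, target)[1]

def at_least_three(target):
    # Relaxed: dangling objects in the target are now allowed.
    return not dangling(EXEMPLAR, target)[0]

def more_than_three(target):
    # Required: a dangling object in the target.
    return dangling(EXEMPLAR, target)[1]

def same_number(a, b):
    # Two images compared directly, with no exemplar involved.
    in_a, in_b = dangling(a, b)
    return not in_a and not in_b

def fewer_than(a, b):
    return dangling(a, b)[1]
```

None of these functions mentions a number; the only numerical content lives in the exemplar itself.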

In fact, examining some of these mutations suggests a real-world path to threeness.  The general rule is that concepts don't get invented until they're useful.  Many physical tasks in our world require equal numbers of something; four pegs for four holes, and so on.  The task of perceiving a particular number of "holes" and selecting, in advance, the correct number of pegs, might force the AI to develop the concept of corresponding sets, or sets that contain the same number of objects.  The spatial fact that two pegs can't go in the same hole, and that one peg can't go in two holes, would be a force acting to create the perception of unique (one-to-one) correspondences.  "Corresponding-sets" would probably be the first concept formed.  After that, if it were useful to do so, would come a tendency to categorize sets into classes of corresponding sets; after that would come the selection of a three-exemplar and the concept of three.

The decomposition of three in the above graphic is not the most efficient concept for three.  It is simply the most easily evolved.  After the formation of the exemplar-and-comparison concept for three would come a more efficient procedure:  Counting.

Evolving the counting concept requires first developing the counting skill.  That development occurs on the thought-level, and those thoughts in turn require a more sophisticated concept-level depiction of three.  It requires that one and two have also been developed, and that one and two and three have been generalized into number.  Once this occurs, and the AI has been playing around with numbers for a while, it may notice that any group of three objects contains a group of two objects.  It may manage to form the concept of "one-more-than", an insight that would probably be triggered by watching the number of a group change as additional objects are added.  It might even notice that physical processes which add one object at a time always result in the same sequence of numerical descriptions:  "One, two, three, four..."

If multiple experiences of such physical processes can be generalized, and an exemplar experience of the process selected and applied, the result might be a counting procedure like that taught to human children: Tag an object as counted and say the word 'one'; tag another object as counted and say 'two'; tag another object as counted and say the word that, in the learned auditory chanting sequence, comes after 'two'; and so on.  Do not re-count any object that has already been tagged as "counted".  The last word said aloud is the number of the group.  This method is more efficient than checking unique correspondences, and the method also reflects a deeper understanding of numbers.
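The child's counting procedure translates almost line for line into code.  In this Python sketch the memorized chant is a fixed list, tags are kept by object identity, and the names CHANT and count are assumptions of the example.

```python
# The learned chanting sequence; this sketch only memorizes it up to ten.
CHANT = ["one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]

def count(objects):
    """Tag each untagged object and say the next word of the chant.

    Returns the last word said, or None if there was nothing to count.
    """
    counted = set()          # identities of objects already tagged as "counted"
    chant = iter(CHANT)
    last_word = None
    for obj in objects:
        if id(obj) in counted:
            continue         # do not re-count a tagged object
        counted.add(id(obj))
        last_word = next(chant)   # raises StopIteration past the end of the chant
    return last_word

balls = [object(), object(), object()]
number_of_group = count(balls)
```

Looking at the same ball twice does not change the answer, because the tag prevents re-counting.  The procedure touches each object once, instead of building and checking a full set of correspondences against an exemplar.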

Finally, once "three" has been used long enough, it's likely that a human brain evolves some type of neural substrate for seeing threeness directly.  That is, some piece of the human visual modality - probably the object-recognition system in the temporal lobe, but that's just a wild guess - learns to respond to groups of three objects.  (Larger numbers like "five" or "six" are harder to recognize directly - that is, without counting - unless the objects are arranged in stereotypical five-patterns and six-patterns, like those on the sides of dice.)  The analogue for an AI might be a piece of code (or assembly language, or a neural net - you know, mindstuff) that counts items directly.

However, even if the AI eventually creates a highly-optimized counting method, implemented directly, the previous definitions of the concept will still exist.  When new situations are encountered, new situations that force the extension of the concept, the mind can switch from the optimized method to the methods that reflect underlying causes and underlying substrate.  If necessary, the problem can rise all the way to the level of conscious perception, so that the deliberate, thought-level methods - the thoughts from which the concepts first arose - are used.  The experiences that underlie the original definition, the experience of noticing the definition, the experience of using the definition - all can be reviewed.  This is why a concept is so much richer, so much more powerful, if it's learned instead of preprogrammed.  It's why learned, rich concepts are so much more flexible, so much likelier to mutate and evolve and spin off interesting specializations and generalizations and variations.  It's why learned concepts are more useful when a mind encounters special cases and has to resort to high-level reasoning.  It's why high-level cognitive objects are vastly more powerful, more real, than the flat, naked "predicate calculus" of classical AI.

Thus the idea of "information-loss" or "focus" is cast in a different light.  Sure, calling something a three-group, or placing it into the three-category, can be said to "lose" a lot of information - in information-theoretical terms, you've moved from specifying the distinct and individual object to specifying a member of the class of things that can be described by "three".  In classical-AI terms, you've decided to focus on the feature called "number" and not any of the other features of the object.  But to label a rich, complex, multi-step act of perception "information loss" borders on perversion.  Seeing the "threeness" of a group doesn't destroy information, it adds information.  One perceives everything that was previously known about the object, and its threeness as well; nor could that threeness be "focused" on, until the methods for perceiving threeness were learned.

2.3.4: Concept combination and application

"When you hear the phrase "triangular light bulb", you visualize a triangular light bulb...  How do these two symbols combine?  You know that light bulbs are fragile; you have a built-in comprehension of real-world physics - sometimes called "naive" physics - that enables you to understand fragility.  You understand that the bulb and the filament are made of different materials; you can somehow attribute non-visual properties to pieces of the three-dimensional shape hanging in your visual cortex.  If you try to design a triangular light bulb, you'll design a fluorescent triangular loop, or a pyramid-shaped incandescent bulb; in either case, unlike the default visualization of "triangle", the result will not have sharp edges.  You know that sharp edges, on glass, will cut the hand that holds it."
        -- 1.2: Thinking About AI

How do the concepts of "triangular" and "light-bulb" combine?  My current hypothesis involves what might be called "reductionist energy minimization" or "holistic network relaxation", a conflict-resolution method that takes cues from both the "potential energy surface" of chemistry and the "computational temperature" of Copycat.

Neural networks, when perturbed, are known to seek out what might be called "minimal-energy states".  A network-relaxation model of concept combination could be computationally realistic - an operation that neurons can accomplish in the 200 operations-per-second timescale.  My current hypothesis for the basic neural operation in concept-combination is the resonance.  A neural resonance circuit - perhaps not a physical, synaptic circuit, but a virtual message-passing circuit, established by one of the higher-level neural communication methods (binding by neural synchrony, maybe) - can either resonate positively, reinforcing that part of the concept-combination, or resonate negatively, generating a conflict.  My guess at the network-relaxation method resembles the "potential energy surface" of chemistry in that multiple, superposed alternatives are tried out simultaneously, so that the minima-seeking resembles a flowing liquid rather than a rolling ball.

The high-level, salient facets of the concepts being combined are combined first.  These high-level features then visualize the mid-level features; if no conflict is detected, the mid-level features visualize the low-level features.  If a conflict is detected at any level, the conflict propagates back up to the conflicting high-level or mid-level features causing the problem.  Who wins the conflict?  The more salient, more important, or more useful feature - remember, we're talking about combining two concepts, each with its own set of features along various dimensions - is selected as dominant, and the network relaxation algorithm proceeds.  When one concept modifies another, the "more salient" feature is the one specified by the concept doing the modifying.  (Note also that, in casual reading, not all the facets of a concept may be important, just as you don't fully visualize every word in a sentence.  Only the facets that resonate with the subject of discussion, with the paragraph, will be visualized.)

In the case of "triangular light bulbs", "triangular" is an adjective.  The concept for "triangle" or "triangular" is modifying the concept of "light bulb", rather than vice versa.  The default exemplar for "light bulb" - that is, an image of the generic light bulb - is loaded into the mental workspace, including the visual facet of the exemplar being loaded into the visual cortex.  Next, the concept for "triangular" is applied to this mental image.

The concept of "triangular", as it refers to physical objects, has a single facet:  It alters the physical shape of the target image.  Note that I say "physical shape", not "visual shape".  The default exemplar for "light bulb" is a mental image - not a mental picture, but a mental image; in GISAI, an "image" means a representation in any modality or modalities, not just the visual cortex.  The "light bulb" exemplar is an image of a three-dimensional bulb-shaped object, made of glass, having a metal plug at the bottom, whose purpose is to emit light.  It is this multimodal mental image that "triangular" modifies, not just the visual component of the image.  In particular, the "shape" facet of the light-bulb concept, the facet being modified, is a high-level feature describing the shape of the three-dimensional physical object, not the shape of the visual image.  Thus, modifying the light-bulb shape will modify the mental image of the physical shape, rather than manipulating the 2-D visual shape in the visual cortex.

The "triangular" concept, when applied along the dimension of "shape", manipulates the mental image of the light bulb, changing the 3D model to be triangle-shaped.  However, since the image of a flat light bulb fails to resonate, "triangle" automatically slips to "pyramid".

(I'm not sure whether this conflict is detected at the mid-level feature of "flat light bulb", or whether a flat light bulb actually begins to visualize before the conflict is detected.  The slippage happens too fast for me to be sure.  I suspect that "triangular" has slipped to "pyramidal" before, when applied to three-dimensional mental images; for neural entities, anything that happens once is likely to happen again.  Neurons learn, and neural thinking wears channels in the neurons.  It could be that the non-flatness of light bulbs is salient because of their bulbous shape, and that this resonance with non-flatness causes "triangular" to slip to "pyramidal" before the concept is even applied.)

Pyramids are sharp.  I know, from introspection, that the "sharp pyramidal light-bulb" got all the way down to the visual level before the conflict was noticed.  (The conflict rose to the level of conscious perception, but was resolved more or less intuitively; I didn't have to "stop and think".  So this is probably still a valid example of concept-level processes.)  The particular conflict:  Sharp glass cuts the person who holds it.  We've all had visual experience of sharp glass, and the associated need for visual recognition and avoidance; thus, the mental image of sharp glass would trigger this recognition and create a conflict.  This conflict, once detected, was also visualized all the way down to the visual cortex; I briefly saw the mental image of a thumb sliding along the edge of the pyramid.

The problem of sharp edges is one that is caused by sharpness and can be solved by rounding, and I've had visual experience of glass with rounded edges, so the sharp edges on the mental image slipped to rounded edges.  The result was a complete mental image of a pyramidal light bulb, having four triangular sides, rounded edges and corners, and a square bottom with a plug in it.  (6)
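The sequence just described (apply the modifier, detect a conflict, slip to a nearby alternative, repeat until nothing conflicts) can be caricatured in Python.  This is a sequential, hard-coded sketch, not the parallel network relaxation hypothesized above: the facets, the two conflict detectors, and the slippage table are all assumptions of the example, and in a real mind the slippages would be learned rather than listed.

```python
# Default exemplar for "light bulb", as a multimodal image with named facets.
LIGHT_BULB = {"shape": "bulb", "dimensions": 3, "material": "glass", "edges": "rounded"}

def apply_triangular(image):
    """The modifying concept wins on the facets it specifies."""
    modified = dict(image)
    modified["shape"] = "triangle"
    modified["edges"] = "sharp"
    return modified

def flat_in_3d(image):
    """Conflict: a flat shape imposed on a three-dimensional object."""
    return image["shape"] == "triangle" and image["dimensions"] == 3

def sharp_glass(image):
    """Conflict: sharp glass cuts the hand that holds it."""
    return image["material"] == "glass" and image["edges"] == "sharp"

# (description, conflict detector, slippage that resolves it)
SLIPPAGES = [
    ("flat shape in a 3-D image", flat_in_3d, {"shape": "pyramid"}),
    ("sharp glass", sharp_glass, {"edges": "rounded"}),
]

def relax(image):
    """Resolve conflicts until none remain; return the image and a log of conflicts."""
    image = dict(image)
    log = []
    changed = True
    while changed:
        changed = False
        for description, detect, slippage in SLIPPAGES:
            if detect(image):
                image.update(slippage)
                log.append(description)
                changed = True
    return image, log

final_image, conflicts_seen = relax(apply_triangular(LIGHT_BULB))
```

The result is a three-dimensional glass pyramid with rounded edges, reached by way of the same two conflicts reported from introspection.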

Every sentence in the last five paragraphs, of course, is just begging the question:  "Why?  Why?  Why?"  A full answer is really beyond the scope of the section on "Mind"; I just want to remind my readers that often the real answer is "Because it happened that way at least once before in your lifetime."  A human mind is not necessarily capable of simultaneously inventing all the reflexes, salient pathways, and slippages necessary to visualize a triangular lightbulb.  Neurons learn, and thoughts wear channels in the network.  The first time I ever had to select which level triangle-imposition should apply to - visual, spatial, or physical - I may have made a comical mistake.  A seed AI may be able to avoid or shorten this period of infancy by using deliberate, thought-level reasoning about how concepts should combine; if so, this is functionality over and above that exhibited by humans.

You'll note that, throughout the entire discussion of concept combination, I've been talking about humans and even making appeals to specific properties of neurally based mindstuff, without talking about the problem of implementation in AIs.  Most of the time, the associational, similarity-based architecture of biological neural structures is a terrible inconvenience.  Human evolution always works with neural structures - no other type of computational substrate is available - but some computational tasks are so ill-suited to the architecture that one must jump through incredible hoops to encode them neurally.  (This is why I tend to be instinctively suspicious of someone who says, "Let's solve this problem with a neural net!"  When the human mind comes up with a solution, it tends to phrase it as code, not a neural network.  "If you really understood the problem," I think to myself, "you wouldn't be using neural nets.")

Concept combination is one of the few places where neurons really shine.  It's one of the very rare occasions when the associational, similarity-based, channel-wearing architecture of biological neural structures is so appropriate that a programmer might reinvent naked neurons, with no features added or removed, as the correct computational elements for solving the problem.  Neural structures are just very well-suited to "reductionist energy minimization" or "holistic network relaxation" or whatever you want to call it.

Even so, neural networks are very hard to understand, or debug, or sensibly modify.  I believe in the ideal of mindstuff that both human programmers and the AI can understand and manipulate.  To expect direct human readability may be a little too much; that goal, if taken literally, tends to promote fragile, crystalline, simplistic code, like that of a classical AI.  Still, even if concept-level mindstuff doesn't have the direct semantics of code, we can expect better than the naked incomprehensibility of assembly language.  We can expect the programmer to be able to see and manipulate what's going on, at least in general terms, perhaps with the aid of some type of "decompiler".  I currently tend to lean towards code for the final mindstuff, while acknowledging that this code may tend to organize itself in neural-like patterns which will require additional tools to decode.

2.3.5: Thoughts are created by concept structures

Thoughts are created by structures of concept-level patterns.  The archetypal example is a grammatical sentence: a linear sequence of words parsed by the brain's linguistic centers into a more-or-less hierarchical structure, in which the referents of targetable words and phrases (an adjective needs a target image, for example) have been found, either inside the sentence or in the most salient part of the current mental image.  The inverse of this process is when a fact is noticed, turned into a concept structure, translated into a sentence, and articulated out loud within the mind.  (A possible reason for the stream-of-consciousness phenomenon is discussed in 2.4.3: Thoughts about thoughts.)

The current section has discussed concepts as mindstuff-based patterns in sensory modalities - that is, the mindstuff is assumed to pay attention to, or issue instructions to, the sensory modalities and the features therein.  That concepts interact with other concepts, and are influenced by the higher-level context in which they are invoked, has been largely ignored.  This was deliberate.  The farther you go from the mindstuff level, and the more "abstract" you get, the closer you are to the levels that are easily accessible to human introspection.  These are the introspective perceptions that come out in words; the qualities that modern culture associates with above-average intelligence; the levels enormously overemphasized by classical AI.

Still, there are some thoughts that are so abstract as to appear distant from any sensory grounding.  In that last sentence, for example, only the term "distant" has an obvious grounding, and since the sentence wasn't interpreted in a spatial context, it's unlikely that even that term had any direct visualizational effect.  Metaphors do show up more often than you might think, even in abstract thought (see Lakoff and Johnson, Metaphors We Live By or Philosophy in the Flesh).  Still, there are concepts whose definition and grounding are primarily their effect on other concepts - "abstract concepts".  Why doesn't the classical-AI method work for abstract concepts?

Even abstract concepts, mental images composed entirely of concepts referring to other concepts, exist within a reductholistic system.  Abstract concepts may not have reductionist definitions that ground directly in sensory experience, but they have reductionist definitions that ground in other concepts.  What are apparently high-level object-to-object interactions between two abstract concepts can, if conflicts appear, be modeled as mid-level structure-to-structure interactions between two definitions.  Abstract concepts still have lower-level structure, mid-level interactions, and higher-level context.

Still, defining concepts in terms of other concepts is what classical AIs do.  I can't actually recall, offhand, any (failed!) classical AIs with explicit holistic structure - I can't recall any classical AIs that constructed explicitly multilevel models to ground reasoning using semantic networks - but it seems likely that someone would have tried it at some point.  (Eurisko and Copycat don't count for reasons that will be discussed in future sections.  Besides, they didn't fail.)  So, why doesn't the classical method work for abstract concepts?

Many classical AIs lack even basic quantitative interactions (such as fuzzy logic), rendering them incapable of using methods such as holistic network relaxation, and lending all interactions an even more crystalline feeling.  Still, there are classical AIs that use fuzzy logic.

What's missing is flexibility, mutability, and above all richness; what's missing is the complexity that comes from learning a concept.  Perhaps it would be theoretically possible to select a piece of abstract reasoning in an adult AI in which the complexity of sensory modalities played no part at all.  Perhaps it would even be possible to remove all the grounding concepts below a certain level, and most of the modality-level complexity, without destroying the causal process of the reasoning.  Even so - even if the mind were deprived of its ultimate grounding and left floating - the result wouldn't be a classical AI.  Abstract concepts are learned, are grown in a world that's almost as rich as a sensory modality - because the grounding definitions are composed of slightly less abstract concepts with rich interactions, and those less-abstract concepts are rich because they grew up in a rich world composed of interactions between even-less-abstract concepts, and so on, until you reach the level of sensory modalities.  Richness isn't automatic.  Once a concept is created, you have to play around with it for a while before it's rich enough to support another layer.  You can't start from the top and build down.
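The structural half of this argument, that each layer of concepts rests on a less abstract layer until the sensory modalities are reached, can be sketched as a small dependency check. This is only a toy under stated assumptions: the concept names, the definitions table, and the function grounding_depth are hypothetical, and a dependency graph captures none of the richness that comes from actually learning and playing with a concept. It shows only the difference between a concept that bottoms out in a modality and a set of symbols defined purely in terms of each other.

```python
def grounding_depth(concept, definitions, modality_features, _path=()):
    """Count the layers between a concept and the sensory level.

    Returns 0 for a modality-level feature, 1 + the deepest part otherwise,
    and None if any part of the definition is undefined or circular,
    meaning the concept is left floating with no sensory grounding.
    """
    if concept in modality_features:
        return 0
    if concept in _path or concept not in definitions:
        return None
    parts = definitions[concept]
    if not parts:
        return None
    depths = [
        grounding_depth(part, definitions, modality_features, _path + (concept,))
        for part in parts
    ]
    if any(depth is None for depth in depths):
        return None
    return 1 + max(depths)


# Hypothetical modality-level features and concept definitions.
modality_features = {"edge", "brightness", "warmth"}
definitions = {
    "light": ["brightness", "warmth"],
    "shape": ["edge"],
    "light bulb": ["light", "shape"],
    "illumination": ["light bulb", "light"],  # defined only via other concepts
    "symbol_a": ["symbol_b"],                 # symbols that only reference
    "symbol_b": ["symbol_a"],                 # each other, never a modality
}

for name in ["light", "light bulb", "illumination", "symbol_a"]:
    print(name, grounding_depth(name, definitions, modality_features))
```

Here "illumination" is abstract in the sense used above, since its definition mentions only other concepts, yet it still sits three layers above the sensory level. The pair symbol_a and symbol_b never reaches a modality at all, which is the floating situation described in the paragraph above.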

Another factor that's missing from classical AIs is the ability to attach experience to concepts, to gain experience in thinking, to wear a channel in the mind.  Even a concept-combination like "triangular light bulb" has a dynamic pattern, a flow of cause and effect on the concept level, that relies on the thinker having done most of the thinking in advance.  That complexity is also absent from classical AIs.  (And of course, most classical AIs just don't support all the other dimensions of cognition - attention, focus, causality, goals, subjunctivity, et cetera.)

I think this provides an adequate explanation of why classical AI failed.  This is why classical AIs can't support thought-level reasoning or a stream of consciousness; why sensory modalities are necessary to learn abstract thought; and why concepts must be learned in order to be rich enough to support coherent thought.


