DGI uses the term concept to refer to the mental stuff underlying the words that we combine into sentences; concepts are the combinatorial building blocks of thoughts and mental imagery. These building blocks are learned complexity, rather than innate complexity; they are abstracted from experience. Concept structure is absorbed from recurring regularities in perceived reality.
A concept is abstracted from experiences that exist as sensory patterns in one or more modalities. Once abstracted, a concept can be compared to a new sensory experience to determine whether the new experience satisfies the concept, or equivalently, whether the concept describes a facet of the experience. Concepts can describe both environmental sensory experience and internally generated mental imagery. Concepts can also be imposed on current working imagery. In the simplest case, an exemplar associated with the concept can be loaded into the working imagery, but constructing complex mental imagery requires that a concept target a piece of existing mental imagery, which the concept then transforms. Concepts are faceted; they have internal structure and associational structure which comes into play when imposition or description encounters a bump in the road. Faceting can also be invoked purposefully; for example, "tastes like chocolate" versus "looks like chocolate".
A "concept kernel" is the pseudo-sensory pattern produced by abstracting from sensory experience. During concept satisfaction, this kernel interacts with the layered feature detectors to determine whether the reported imagery matches the kernel; during concept imposition, the kernel interacts with the layered feature controllers to produce new imagery or alter existing imagery. A programmer seeking a good representation for concept kernels must find a representation that simultaneously fulfills three requirements: the kernel must be abstractable from experience, it must be satisfiable by imagery, and it must be imposable on imagery.
Concepts have other properties besides their complex kernels. Kernels relate concepts to sensory imagery and hence to the modality level. Concepts also have complexity that relates to the concept level; i.e., concepts have complexity that derives from their relation to other concepts. In Good Old-Fashioned AI this aspect of concepts has been emphasized at the expense of all others[1], but this is no excuse for ignoring concept-concept relations in a new theory. For example, concepts are supercategories and subcategories of each other; there are concepts that describe concepts; there are concepts that describe relations between concepts; there are mutually exclusive concepts which cannot simultaneously describe the same referent. (Further examples of concept relations are given later.)
In formal logic, the traditional idea of concepts is that concepts are categories defined by a set of individually necessary and together sufficient requisites; that a category's extensional referent is the set of events or objects that are members of the category; and that the combination of two categories is the sum of their requisites and hence the intersection of their sets of referents. This formulation is inadequate to the complex, messy, overlapping category structure of reality and is incompatible with a wide range of established cognitive effects [Lakoff87]. Properties such as usually necessary and usually sufficient requisites, and concept combinations that are sometimes the sum of their requisites or the intersection of their extensional classes, are emergent from the underlying representation of concepts - along with other important properties, such as prototype effects in which different category members are assigned different degrees of typicality [Rosch78].
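The classical formulation is simple enough to state in a few lines of code, which also makes its rigidity easy to see. The following is a minimal sketch, not a model of any real system: the `ClassicalCategory` class, the attribute strings, and the three-object `WORLD` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassicalCategory:
    """A category defined by individually necessary, jointly sufficient requisites."""
    requisites: frozenset

    def satisfied_by(self, attributes):
        # Membership is all-or-nothing: every requisite must be present.
        return self.requisites <= attributes

    def extension(self, world):
        # The extensional referent: names of all objects that are members.
        return {name for name, attrs in world.items() if self.satisfied_by(attrs)}

    def combine(self, other):
        # Combining categories sums (unions) their requisites.
        return ClassicalCategory(self.requisites | other.requisites)

# Hypothetical toy world: each object is a bag of attribute labels.
WORLD = {
    "fire_truck": frozenset({"red", "vehicle", "large"}),
    "cherry": frozenset({"red", "fruit", "small"}),
    "lemon": frozenset({"yellow", "fruit", "small"}),
}

red = ClassicalCategory(frozenset({"red"}))
fruit = ClassicalCategory(frozenset({"fruit"}))
red_fruit = red.combine(fruit)
```

In this sketch `red_fruit.extension(WORLD)` is exactly the intersection of the two parent extensions, and every member is equally a member. There is no room for typicality, for "usually necessary" requisites, or for combinations that are not intersections.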
Concepts relate to the thought level primarily in that they are the building blocks of thoughts, but there are other level-crossings as well. Introspective concepts can describe beliefs and thoughts and even deliberation; the concept "thought" is an example. Inductive generalizations are often "about" concepts in the sense that they apply to the referents of a concept; for example, "Triangular lightbulbs are red." Deliberation may focus on a concept in order to arrive at conclusions about the extensional category, and introspective deliberation may focus on a concept in its role as a cognitive object. Concept structure is ubiquitously invoked within perceptual and cognitive processes because category structure is ubiquitous in the low-entropy processes of our low-entropy universe.
One of the meanings of "abstraction" is "removal"; in chemistry, to abstract an atom means subtracting it from a molecular group. Using the term "abstraction" to describe the process of creating concepts could be taken as implying two views: First, that to create a concept is to generalize; second, that to generalize is to lose information. Abstraction as information loss is implicit in the classical view of concepts (that is, the view of concepts under GOFAI and formal logic). Forming the concept "red" is taken to consist of focusing only on color, at the expense of other features such as size and shape; all concept usage is held to consist of purposeful information-loss.
The problem with the classical view is that it allows only a limited repertoire of concepts. True, some concepts apparently work out to straightforward information-loss. The task of arriving at a concept kernel for the concept "red" - a kernel capable of interacting with visual imagery to distinguish between red objects and non-red objects - is relatively trivial. Even simultaneously satisfying the abstraction and satisfaction problems for "red" is relatively trivial. Well-known, fully general tools such as neural nets or evolutionary computation would suffice. To learn to solve the satisfaction problem, a neural net need only learn to fire when the modality-level feature detectors for "color" report a certain color - a point falling within a specific volume of color space - across a broad area, and not to fire otherwise. A piece of code need only evolve to test for the same characteristic. (The neural net would probably train faster for this task.)
A sufficiently sophisticated modality would simplify the task even further, doing most of the work of grouping visual imagery into objects and detecting solid-color or same-hue or mostly-the-same-hue surfaces. The human visual modality goes still farther and precategorizes colors, dividing them up into a complex color space [Boynton87], said color space having eleven culturally universal focal volumes [Berlin69], said focal volumes having comparatively sharp internal boundaries relative to physically continuous variations in wavelength (see [Shepard92], or just look at the bands in a rainbow). Distinguishing across innate color boundaries is easy; distinguishing within color boundaries is hard [Mervis75]. Thus, the human visual modality provides very strong suggestions as to where the boundaries lie in color space, although the final step of categorization is still required [Dedrick98].
Given a visual modality, the concept of red lies very close to the metaphorical "surface" of the modality. In humans red is probably at the surface, a direct output of the modality's feature-detectors. In AIs with less sophisticated visual modalities, "redness" as a category would need to be abstracted as a fuzzy volume within a smooth color space lacking the human boundaries. The red concept kernel (in humans and AIs) needs to be more complex than a simple binary test or fuzzy color clustering test, since "redness" as we understand it describes visual areas and not single pixels (although red can describe a "visual area" consisting of a small point). Even so, the complexity involved in the redness concept lies almost entirely within the sensory modality, rather than the concept kernel. We might call such concepts surface concepts.
Even for surface concepts, simultaneously solving abstraction, satisfaction, and imposition would probably be far more tractable with a special representation for concept kernels, rather than generically trained neural nets or evolutionary programs. Imposition requires a concept kernel which can be selectively applied to imagery within a visual modality, transforming that imagery such that the final result satisfies the concept. In the case of the concept "red", the concept kernel would interact with the feature controllers for color, and the targeted mental imagery would become red. This cannot be done by painting each individual pixel the same shade of red; such a transformation would obliterate edges, surfaces, textures, and many other high-level features that intuitively ought to be preserved. Visualizing a "red lemon" does not cause the mind to picture a bright red patch with the outline of a lemon. The concept kernel does not send separate color commands to the low-level feature controller of each individual visual element; rather the concept kernel imposes red in combination with other currently activated features, to depict a red lemon that retains the edge, shape, surface curvature, texture, and other visualized features of the starting lemon image. Probably this occurs because perceived coloration is a property of surfaces and visual objects rather than, or as well as, individual visual elements, and our redness concept kernel interacts with this high-level feature, which then ripples down in coherent combination with other features.
Abstracting an impose-able concept kernel for "red" is a problem of different scope than abstracting a satisfy-able kernel for "red". There is an immediately obvious way to train a neural net to detect satisfaction of "red", given a training set of known "red" and non-"red" experiences, but there is no equally obvious teaching procedure for the problem of imposing "red". The most straightforward success metric is the degree to which the transformed imagery satisfies a neural network already trained to detect "red", but a bright red lemon-shaped patch is likely to be more "red" than a visualized red lemon. How does the kernel arrive at a transformation which makes a coherent change in object coloration, rather than a transformation which paints all visual elements an indiscriminate shade of red, or a transformation which loads a random red object into memory? Any of these transformations would satisfy the "red" concept.
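The difficulty with using a satisfaction detector as the success metric can be shown with a toy calculation. This is only a sketch under strong assumptions: the nine-pixel "lemon", the `redness` score, and the use of brightness variation as a stand-in for high-level features are all hypothetical simplifications.

```python
import statistics

# Hypothetical toy imagery: nine pixels of a "lemon" whose brightness varies
# with surface shading. Yellow at brightness b is stored as (b, b, 0).
SHADING = [0.4, 0.7, 0.5, 0.6, 1.0, 0.8, 0.3, 0.6, 0.4]
lemon = [(b, b, 0.0) for b in SHADING]

def redness(image):
    """Naive satisfaction score: mean excess of red over the other channels."""
    return statistics.fmean([r - max(g, b) for r, g, b in image])

def shading_spread(image):
    """Crude stand-in for high-level features: variation in pixel brightness."""
    return statistics.pstdev([max(pixel) for pixel in image])

def paint_flat(image):
    """Indiscriminate imposition: every visual element gets the same red."""
    return [(1.0, 0.0, 0.0) for _ in image]

def recolor_surface(image):
    """Coherent imposition: change the hue but keep each pixel's brightness."""
    return [(max(pixel), 0.0, 0.0) for pixel in image]

flat_patch = paint_flat(lemon)
red_lemon = recolor_surface(lemon)
```

Under this metric `redness(flat_patch)` is 1.0 while `redness(red_lemon)` is lower, so a learner rewarded only by the detector would prefer the flat patch, even though `shading_spread(flat_patch)` is zero and the shading of the original lemon survives only in `red_lemon`.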
Conceivably fully general neural nets could be trained to impose minimal transformations, although I am not sure that "minimal transformation" is the rule which should govern concept imposition. Regardless of the real tractability of this problem, I strongly doubt that human cognitive systems create concepts by training generic neural nets on satisfaction and imposition. I suspect that concepts do not have independent procedures for satisfaction and imposition; I also suspect that neither satisfaction nor imposition is the product of reinforcement learning on a fully general procedure. Rather, I suspect that a concept kernel consists of a pattern in a representation related to (but not identical with) the representation of sensory imagery, that this pattern is produced by transforming the experiences from which the concept is abstracted, and that this pattern interacts with the modality to implement both concept satisfaction and concept imposition.
A very simple example of a non-procedural, pattern-based concept kernel would be "clustering on a single feature". Red might be abstracted from an experiential base by observing an unusual clustering of point values for the color feature. Suppose that the AI is challenged with a virtual game in which the goal is to find the "keys" to a "lock" by selecting objects from a large sample set. When the AI successfully passes five trials by selecting the correct object on the first try, the AI is assumed to have learned the rule. Let us suppose that the game rule is that "red" objects open the lock, and that the AI has already accumulated an experiential base from its past failures and successes on individual trials.
Assuming the use of a three-dimensional color space, the color values of the correct keys would represent a tight cluster relative to the distribution among all potential keys. Hence the abstracted concept kernel might take the form of a feature-cluster pair, where the feature is color and the cluster is a central point plus some measure of standard deviation. This creates a concept kernel with a prototype and quantitative satisfiability; the concept has a central point and fuzzy but real boundaries. The same concept kernel can also be imposed on a selected piece of mental imagery by loading the central color point into the color feature controller - that is, loading the clustered value into the feature controller corresponding to the feature detector clustered upon.
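A feature-cluster pair of this kind might be sketched as follows. This is an illustrative toy, not a proposed implementation: the `ClusterKernel` class, the dictionary representation of objects, the RGB-like color triples, and the Gaussian-shaped falloff used for graded satisfaction are all assumptions introduced here.

```python
import math
import statistics
from dataclasses import dataclass

@dataclass
class ClusterKernel:
    feature: str    # which feature the kernel clusters on, e.g. "color"
    center: tuple   # prototype: the central point of the cluster
    spread: float   # root-mean-square distance of the examples from the center

    @classmethod
    def abstract(cls, feature, examples):
        """Abstract a kernel from an experiential base of example objects."""
        dims = len(examples[0][feature])
        center = tuple(
            statistics.fmean([e[feature][i] for e in examples]) for i in range(dims)
        )
        mean_sq = statistics.fmean(
            [math.dist(e[feature], center) ** 2 for e in examples]
        )
        # Guard against a zero spread when all examples coincide.
        return cls(feature, center, max(math.sqrt(mean_sq), 1e-9))

    def satisfaction(self, obj):
        """Graded satisfaction: 1.0 at the prototype, falling off with distance."""
        d = math.dist(obj[self.feature], self.center)
        return math.exp(-0.5 * (d / self.spread) ** 2)

    def impose(self, obj):
        """Load the central value into the matching feature; leave the rest alone."""
        changed = dict(obj)
        changed[self.feature] = self.center
        return changed

# Hypothetical experiential base: the keys that opened the lock.
keys = [
    {"color": (0.90, 0.10, 0.10), "shape": "cube"},
    {"color": (0.80, 0.05, 0.10), "shape": "sphere"},
    {"color": (1.00, 0.15, 0.05), "shape": "cone"},
]
red = ClusterKernel.abstract("color", keys)
lemon = {"color": (0.9, 0.9, 0.1), "shape": "lemon"}
red_lemon = red.impose(lemon)
```

The same stored pattern serves both directions: `red.satisfaction(lemon)` is near zero, while `red.satisfaction(red_lemon)` is 1.0 and `red_lemon["shape"]` is still `"lemon"`. Note that this sketch operates on a single object-level color value; it says nothing about how the change would ripple down coherently through real visual imagery.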
Clustering of this type also has indirect implications for concept-concept relations: The red concept's "color volume" might overlap a nearby concept such as burgundy, or might turn out to enclose that concept; a modality-level fact which over time might naturally give rise to an association relationship, or a supercategory relationship, on the concept level. This would not humanly occur through direct comparison of the representations of the concept kernels, but through the observation of overlap or inclusion within the categories of extensional referents. A more strongly introspective AI might occasionally benefit from inspecting kernel representations, but this should be an adjunct to experiential detection of category relationships, not a substitute for it.
Clustering on a single feature is definitely not a complete conceptual system. Single-feature clustering cannot notice a correlation between two features where neither feature is clustered alone; single-feature clustering cannot cross-correlate two features in any way at all. Concepts which are limited to clustering on a single feature will always be limited to concepts at the immediate surface of a given sensory modality.
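The blind spot can be made concrete with a small constructed data set. In this sketch, which uses hypothetical features `a` and `b` and hand-built values, the keys are spread across the full range of each feature taken alone, yet the two features are perfectly correlated among the keys.

```python
import math
import statistics

def spread_ratio(key_values, all_values):
    """Spread of the keys relative to the spread of everything seen.
    A ratio well below 1 would suggest clustering on this feature."""
    return statistics.pstdev(key_values) / statistics.pstdev(all_values)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical objects described by two features (a, b).
# Keys satisfy b == a; non-keys satisfy b == 1 - a.
keys = [(i / 10, i / 10) for i in range(11)]
non_keys = [(i / 10, 1 - i / 10) for i in range(11)]
everything = keys + non_keys

key_a = [a for a, _ in keys]
key_b = [b for _, b in keys]
all_a = [a for a, _ in everything]
all_b = [b for _, b in everything]

ratio_a = spread_ratio(key_a, all_a)
ratio_b = spread_ratio(key_b, all_b)
correlation = pearson(key_a, key_b)
```

Both spread ratios come out at essentially 1, so a detector that inspects one feature at a time sees nothing unusual about the keys, while the cross-feature correlation is essentially 1. Detecting it requires a method that looks at two features jointly.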
At the same time, a concept system is not a general intelligence and need not be capable of representing every possible relation. Suppose a human were challenged with a game in which the "correct key" always had a color that lay on the exact surface of a sphere in color space; could the human concept-formation system directly abstract this property? I would guess not; I would guess that, at most, a human might notice that the key tended to belong to a certain group of colors; i.e., might slice up the surface of this color sphere into separate regions, and postulate that solution keys belong to one of several color regions. Thus, even though in this case the underlying "rule" is computationally very simple, it is unlikely that a human will create a concept that directly incorporates the rule; it may even be impossible for a human to abstract a kernel that performs this simple computation. A concept-formation system need not be generally intelligent in itself; need not represent all possible perceptual regularities; just enough for the overall mind to work.
I suspect that the system design used by humans, and a good design for AIs, will turn out to be a repertoire of different concept-formation methods. ("Clustering on a single feature" could be one such method, or could be a special case of a more general method.) Concept faceting could then result either from concepts with multiple kernels, so that a concept employs more than one categorization method against its perceptual referents, or from internal structure in a single kernel, or both. If some aspects of perceptual referents are more salient, then kernels which match those aspects are likely to have greater weight within the concept. Faceting within a concept, arising out of multiple unequal kernels or faceting within a single complex kernel, seems like the most probable source of prototype effects within a category.
Concept formation is a multi-stage process. For an AI to form a new concept, the AI must have the relevant experiences, perceptually group the experiences, notice possible underlying similarities within members of a group (this may be the same perceived similarity that led to the original experiential grouping), verify the generalization, initiate the new concept as distinguished cognitive content, create the concept kernel(s) by abstraction from the experiential base, and integrate the new concept into the system. (This checklist is intended as an interim approximation; actual mind designs may differ, but presumably a temporal sequence will still be involved.)
In the example given earlier, an AI abstracts redness starting with a bottom-up, experience-driven event: noticing the possible clustering of the color feature within the preexisting category keys. Conceivably the act of checking for color clustering could have been suggested top-down, for example by some heuristic belief, but in this example we will assume the seminal perception of similar coloration was an unexpected, bottom-up event; the product of continuous and automatic checks for clustering on a single feature across all high-level features in currently salient experiential categories. Rather than being part of an existing train of thought, the detection of clustering creates an "Aha!" event, a new cognitive event with high salience that becomes the focus of attention, temporarily shunting aside the previous train of thought. (See the discussion of the thought level.)
If the scan for clustering and other categorizable similarities is a continuous background task, it may imply a major expenditure of computational resources - perhaps a major percentage of the computing power used by the AI. This is probably the price of having a cognitive process that can be driven by bottom-up interrupts as well as top-down sequences, and the price of having a cognitive process that can occasionally notice the unexpected. Hence, the efficiency, optimization, and scalability of algorithms for such continuous background tasks may play a major role in determining the AI's performance. If imagery stays in place long enough, I would speculate that it may be possible to farm out the task of noticing a possible clustering to distant parts of a distributed network, while keeping the task of verifying the clustering, and all subsequent cognitive actions, within the local process. Most of the computing power is required to find the hint, not to verify the match, and a false hint does no damage (assuming the false hints are not malicious attacks from untrusted nodes).
Once the suspicion of similarity is triggered by a cue picked up by a continuous background process, and the actual degree of similarity is verified, the AI would be able to create the concept as cognitive content. Within the above example, the process that notices the possible clustering is essentially the same process that would verify the clustering and compute the degree of clustering, center of clustering, and variance within the cluster. Thus, clustering on a single feature may compress into a single stage the cueing, description, and abstraction of the underlying similarity. Given the expense of a continuous background process, however, I suspect it will usually be best to separate out a less expensive cueing mechanism as the background process, and use this cueing mechanism to suggest more detailed and expensive scans. (Note that this is a "parallel terraced scan"; see [Rehling97] and [Hofstadter95].)
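The separation of a cheap cue from an expensive verification might look roughly like this. It is a simplified two-stage filter, not an implementation of a parallel terraced scan; the function names, the scalar features, the sample size, and the thresholds are all assumptions chosen for illustration.

```python
import statistics

def cheap_cue(key_values, all_values, sample_size=3, ratio=0.5):
    """Inexpensive hint: do the first few key values span a narrow range
    compared with everything seen? May produce false hints."""
    sample = key_values[:sample_size]
    sample_range = max(sample) - min(sample)
    full_range = max(all_values) - min(all_values)
    return full_range > 0 and sample_range < ratio * full_range

def verify_cluster(key_values, all_values, ratio=0.25):
    """Expensive check over all the data: compute center and spread, and
    accept only if the keys are much tighter than the background."""
    center = statistics.fmean(key_values)
    spread = statistics.pstdev(key_values)
    background = statistics.pstdev(all_values)
    if background > 0 and spread < ratio * background:
        return {"center": center, "spread": spread}
    return None

def scan(features):
    """Run the cue on every feature; verify only where the cue fires."""
    found, verifications = {}, 0
    for name, (key_values, all_values) in features.items():
        if not cheap_cue(key_values, all_values):
            continue
        verifications += 1
        result = verify_cluster(key_values, all_values)
        if result is not None:
            found[name] = result
    return found, verifications

# Hypothetical scalar features: (values for keys, values for all objects).
features = {
    "hue": ([0.01, 0.02, 0.0, 0.03], [0.01, 0.02, 0.0, 0.03, 0.3, 0.5, 0.7, 0.9]),
    "size": ([1.0, 5.0, 9.0, 3.0], [1.0, 5.0, 9.0, 3.0, 2.0, 8.0, 4.0, 7.0]),
    "weight": ([2.0, 2.5, 2.2, 9.0], [2.0, 2.5, 2.2, 9.0, 1.0, 5.0, 7.0, 10.0]),
}
found, verifications = scan(features)
```

Here the cue fires for `hue` and, falsely, for `weight`; verification confirms only `hue`. The false hint costs one wasted verification and nothing more, which is the property that makes an inexpensive, error-prone cue acceptable as the background process.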
After the creation of the concept and the concept kernel(s), it would then be possible for the AI to notice concept-concept relations, such as supercategory and subcategory relations. I do not believe that concept-concept relations are computed by directly comparing kernel representations; I think that concept-concept relations are learned by generalizing across the concept's usage. It may be a good heuristic to look for concept-concept relations immediately after forming a new concept, but that would be a separate track within deliberation, not an automatic part of concept formation.
After a concept has been formed, the new concept must be integrated into the system. For us to concede that a concept has really been "integrated into the system" and is now contributing to intelligence, the concept must be used. Scanning across the stored base of concepts, in order to find which concepts are satisfied by current mental imagery, promises to be an even more computationally expensive process than continuous background checks for clustering. An individual satisfaction check is probably less computationally intensive than carrying out a concept imposition - but satisfaction checks seem likely to be a continuous background operation, at least in humans.
As discussed earlier, humans and AIs have different computational substrates: Humans are slow but hugely parallel; AIs are fast, but resource-poor. If humans turn out to routinely parallelize against all learned concepts, an AI may simply be unable to afford it. The AI optimum may involve comparing working imagery against a smaller subset of learned complexity - only a few concepts, beliefs, or memories would be scanned against working imagery at any given point. Alternatively, an AI may be able to use terraced scanning[2], fuzzy hashing[3], or branched sorting[4] to render the problem tractable. One hopeful sign is the phenomenon of cognitive priming on related concepts [Meyer71], which suggests that humans, despite their parallelism, are not using pure brute force. Regardless, I conjecture that matching imagery against large concept sets will be one of the most computationally intensive subprocesses in AI, perhaps the most expensive subprocess. Concept matching is hence another good candidate for distribution under "notice distantly, verify locally"; note also that the concept base could be sliced up among distributed processors, although this might prevent matching algorithms from exploiting regularities within the concept base and matching process.
Under the classical philosophy of category abstraction, abstraction consists solely of selective focus on information which is already known; focusing on the "color" or "redness" of an object as opposed to its shape, position, or velocity. In DGI's "concept kernels", the internal representation of a concept has complexity extending beyond information loss - even for the case of "redness" and other concepts which lie almost directly on the surface of a sensory modality. The only concept that is pure information-loss is a concept that lies entirely on the surface of a modality; a concept whose satisfaction exactly equals the satisfaction of some single feature detector.
The concept for "red", described earlier, is actually a fuzzy percept for degrees of redness. Given that the AI has a flat color space, rather than a human color space with innate focal volumes and color boundaries, the "redness" percept would contain at least as much additional complexity - over and above the modality-level complexity - as is used to describe the clustering. For example, "clustering on a single feature" might take the form of describing a Gaussian distribution around a central point. The specific use of a Gaussian distribution does not contribute to useful intelligence unless the environment also exhibits Gaussian clustering, but a Gaussian distribution is probably useful for allowing an AI to notice a wide class of clusterings around a central point, even clusterings that do not actually follow a Gaussian distribution.
Even in the absence of an immediate environmental regularity, a concept can contribute to effective intelligence by enabling the perception of more complex regularities. For example, an alternating sequence of "red" and "green" key objects may fail the modality-level tests for clustering because no Gaussian cluster contains (almost) all successes and excludes (almost) all failures. However, if the AI has already previously developed concepts for "red" and "green", the alternating repetition of the satisfaction of the "red" and "green" concepts is potentially detectable by higher-level repetition detectors. Slicing up the color space with surface-level concepts renders computationally tractable the detection of higher-order alternation. Even the formation of simple concepts - concepts lying on the surface of a modality - expands the perceptual capabilities of the AI and the range of problems the AI can solve.
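The red/green alternation case can be sketched in a few lines. The concept centers, the fixed radius, the color triples, and the `alternates` detector are hypothetical stand-ins; in particular, a hard radius is used here in place of a graded kernel purely to keep the example short.

```python
import math

# Hypothetical surface concepts already learned: (center in color space, radius).
CONCEPTS = {
    "red": ((0.9, 0.1, 0.1), 0.2),
    "green": ((0.1, 0.8, 0.1), 0.2),
}

def label(color):
    """Return the first concept whose volume contains the color, else None."""
    for name, (center, radius) in CONCEPTS.items():
        if math.dist(color, center) <= radius:
            return name
    return None

def alternates(labels):
    """Higher-level repetition detector: strict A, B, A, B, ... alternation."""
    if len(labels) < 4 or None in labels:
        return False
    first, second = labels[0], labels[1]
    if first == second:
        return False
    return all(
        lab == (first if i % 2 == 0 else second) for i, lab in enumerate(labels)
    )

# Colors of the successful keys on six consecutive trials.
winning_colors = [
    (0.95, 0.10, 0.05), (0.10, 0.85, 0.10),
    (0.85, 0.15, 0.10), (0.15, 0.75, 0.10),
    (0.90, 0.05, 0.15), (0.05, 0.80, 0.15),
]
labels = [label(color) for color in winning_colors]
```

The raw color values form two separated clumps, so no single cluster fits them. Once each value is replaced by the concept it satisfies, the sequence becomes `red, green, red, green, red, green`, and `alternates(labels)` can detect the regularity with a trivial check.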
Concepts can also embody regularities which are not directly represented in any sensory modality, and which are not any covariance or clustering of feature detectors already in a sensory modality.
Melanie Mitchell and Douglas Hofstadter's "Copycat" program works in the domain of letter-strings, such as "abc", "xyz", "onml", "ddd", "cwj", etc. The function of Copycat is to complete analogy problems such as "abc:abd::ace:?" [Hofstadter88]. Since Copycat is a model of perceptual analogy-making, rather than a model of category formation, Copycat has a limited store of preprogrammed concepts and does not learn further concepts through experience. (This should not be taken as criticism of the Copycat project; the researchers explicitly noted that concept formation was not being studied.)
Suppose that a general AI (not Copycat), working in the toy domain of letter strings, encounters a problem that can only be solved by discovering what makes the letter-strings "hcfrb", "yhumd", "exbvb", and "gxqrc" similar to each other but dissimilar to the strings "ndaxfw", "qiqa", "r", "rvm", and "zinw". Copycat has the built-in ability to count the letters in a string or group; in DGI's terms Copycat might be said to extract number as a modality-level feature. There is extensive evidence that humans also have brainware support for subitizing (directly perceiving) small numbers, and brainware support for perceiving the approximate quantities of large numbers (see [Dehaene97] for a review). Suppose, however, that a general AI does not possess a modality-level counting ability. How would the AI go about forming the category of "five", or even "groups-of-five-letters"?
This challenge points up the inherent deficit of the "information loss" viewpoint of abstraction. For an AI with no subitization support - or for a human challenged with a number like "nine", which is out-of-range for human subitization - the distinguishing feature, cardinality, is not represented by the modality (or in humans, represented only approximately). For both humans and AIs, the ability to form concepts for non-subitizable exact numbers requires more than the ability to selectively focus on the facet of "number" rather than the facet of "location" or "letter" (or "color", "shape", or "pitch"). The fundamental challenge is not focusing on the numerical facet but rather perceiving a "numerical facet" in the first place. For the purposes of this discussion, we are not speaking of the ability to understand numbers, arithmetic, or mathematics, only an AI's ability to form the category "five". Possession of the category "five" does not even imply the possession of the categories "four" or "six", much less the formulation of the abstract supercategory "number".
Similarly, the "discovery" of fiveness is not being alleged as mathematically significant. In mathematical terms almost any set of cognitive building blocks will suffice to discover numbers; numbers are fundamental and can be constructed through a wide variety of different surface procedures. The significant accomplishment is not "squeezing" numbers out of a system so sparse that it apparently lacks the usual precursors of number. Rather, the challenge is to give an account of the discovery of "fiveness" in a way that generalizes to the discovery of other complex concepts as well. The hypothesized building blocks of the concept should be general (useful in building other, non-numerical concepts), and the hypothesized relations between building blocks should be general. It is acceptable for the discovery of "fiveness" to be straightforward, but the discovery method must be general.
A working but primitive procedure for satisfying the "five" concept, after the discovery of fiveness, might look something like this: Focus on a target group (the group which may or may not satisfy "five"). Retrieve from memory an exemplar for "five" (that is, some specific past experience that has become an exemplar for the "five" concept). Picture the "five" exemplar in a separate mental workspace. Draw a correspondence from an object within the group that is the five exemplar to an object within the group that is the target. Repeat this procedure until there are no objects remaining in the exemplar imagery or there are no objects remaining in the target imagery. Do not draw a correspondence from one object to another if a correspondence already exists. If, when this procedure completes, there are no dangling objects in the exemplar or in the target group, label the target group as satisfying the "five" concept.
In this example, the "five" property translates to the property: "I can construct a complete mapping, with no dangling elements, using unique correspondences, between this target group of objects, and a certain group of objects whose mental image I retrieved from memory."
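The procedure can be sketched directly, without ever counting. In this illustration the exemplar contents and the function names are hypothetical; the only operations used are "is anything left?" and "draw one correspondence", so no number appears anywhere except in the names.

```python
def pair_off(exemplar, target):
    """Draw unique correspondences until one group runs out of unpaired objects.

    Returns the correspondences plus whatever is left dangling on each side.
    Each object is removed once paired, so no object is ever paired twice.
    """
    exemplar_left = list(exemplar)
    target_left = list(target)
    correspondences = []
    while exemplar_left and target_left:
        correspondences.append((exemplar_left.pop(), target_left.pop()))
    return correspondences, exemplar_left, target_left

# Hypothetical remembered experience serving as the exemplar for "five".
FIVE_EXEMPLAR = ["thumb", "index", "middle", "ring", "little"]

def satisfies_five(target):
    """The target satisfies "five" if the mapping is complete on both sides."""
    _, dangling_exemplar, dangling_target = pair_off(FIVE_EXEMPLAR, target)
    return not dangling_exemplar and not dangling_target
```

Applied to the letter-strings from the earlier puzzle, `satisfies_five("hcfrb")` holds while `satisfies_five("ndaxfw")` and `satisfies_five("qiqa")` do not. Which exemplar object gets paired with which target object is arbitrary here; only the completeness of the mapping matters.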
This is mathematically straightforward, but cognitively general. In support of the proposition that "correspondence", "unique correspondence", and "complete mapping with no dangling elements" are all general conceptual primitives, rather than constructs useful solely for discovering numbers, please note that Copycat incorporates correspondences, unique correspondences, and a perceptual drive toward complete mappings [Mitchell93]. Copycat has a direct procedural implementation of number sense and does not use these mapping constructs to build numerical concepts. The mapping constructs I have invoked for number are independently necessary for Copycat's theory of analogy-making as perception.
Once the procedure ends by labeling imagery with the "five" concept, that imagery becomes an experiential instance of the "five" concept. If the examples associated with a procedurally defined concept have any universal features or frequent features that are perceptually noticeable, the concept can acquire kernels after the fact, although the kernel may express itself as a hint or as an expectation, rather than being a necessary and sufficient condition for concept satisfaction. Concepts with procedural definitions are regular concepts and may possess kernels, exemplars, associated memories, and so on.
What is the benefit of decomposing "fiveness" into a complex procedure, rather than simply writing a codelet, or a modality-level feature detector, which directly counts (subitizes) the members of a group? The fundamental reason for preferring a non-modality solution in this example is to demonstrate that an AI must be capable of solving problems that were not anticipated during design. From this perspective "fiveness" is a bad example to use, since it would be very unlikely for an AI developer to not anticipate numericity during the design phase.
However, a decomposable concept for "five", and a modality-level feature detector which subitizes all numbers up to (2^32 - 1), can also be compared in terms of how well they support general intelligence. Despite its far greater computational overhead, I would argue that the decomposable concept is superior to a modality-level feature detector.
A billiards modality with a feature detector that subitizes all the billiard balls in a perceptual grouping and outputs a perceptually distinct label - a "numeron detector" - will suffice to solve many immediate problems that require a number sense. However, an AI that uses this feature detector to form a surface concept for "five" will not be able to subitize "five" groups of billiards within a supergroup, unless the programmer also had the foresight to extend the subitizing feature detector to count groups as well as specific objects[5]. Similarly, this universal subitizing ability will not extend across multiple modalities, unless the programmer had the foresight to extend the feature detector there as well[6]. Brainware is limited to what the programmer was thinking about at the time. Does an AI understand "fiveness" when it becomes able to count five apples? Or when the AI can also count five events in two different modalities? Or when the AI can count five of its own thoughts? It is programmatically trivial to extend the feature detector to handle any of these as a special case, but that is a path which ends in requiring an infinite amount of tinkering to implement routine thought processes (i.e., non-decomposability causes a "commonsense problem").
The most important reason for decomposability is that concepts with organized internal structures are more mutable. A human-programmed numeron detector, mutated on the code level, would probably simply break. A concept with internal structure or procedural structure, created by the AI's own thought processes in response to experience, is mutable by the AI's thought processes in response to further experience. For example, Douglas Lenat attests (see [Lenat83] and [Lenat84]) that the most difficult part of building EURISKO[7] was inventing a decomposable representation for heuristics, so that the class of transformations accessible to EURISKO would occasionally result in improvements rather than broken code fragments and LISP errors. To describe this as smooth fitness landscapes is probably stretching the metaphor too much, but "smoothing" in some form is definitely involved. Raw code has only a single level of organization, and changing a random instruction on this level usually simply breaks the overall function. A EURISKO heuristic was broken up into chunks, and could be manipulated (by EURISKO's heuristics) on the chunk level.
Local shifts in the chunks of the "five"-ness procedure yield many useful offspring. By selectively relaxing the requirement of "no dangling objects" in the exemplar image, we get the concept "less than or equal to five"-ness. By relaxing the requirement of "no dangling objects" in the target image, we get the concept "greater than or equal to five"-ness. By requiring one or more dangling objects in the target image, we get the concept "more than five"-ness. By comparing two target images, instead of an exemplar and an image, we get the concept "one-to-one correspondence between group members" (what we would call "same-number-as" under a different procedure), and from there "less than" or "less than or equal to", and so on.
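As a minimal sketch of how such local shifts might look, the following Python treats "five"-ness as pairing off objects between an exemplar image and a target image without ever counting them. Everything here is illustrative rather than part of DGI: the names `pair_off`, `satisfies`, and `same_number_as` are hypothetical, images are reduced to plain sequences, and "dangling" is assumed to mean an object left without a partner once pairing stops.

```python
from typing import List, Sequence, Tuple

# Hypothetical exemplar image: five distinct placeholder objects.
FIVE_EXEMPLAR = ("a", "b", "c", "d", "e")

def pair_off(exemplar: Sequence, target: Sequence) -> Tuple[List, List]:
    """Pair objects one-to-one without counting; return the leftovers.

    Returns (dangling_in_exemplar, dangling_in_target). At most one of
    the two lists can be non-empty.
    """
    left, right = list(exemplar), list(target)
    while left and right:
        left.pop()
        right.pop()
    return left, right

def satisfies(target: Sequence,
              exemplar: Sequence = FIVE_EXEMPLAR,
              allow_dangling_exemplar: bool = False,
              allow_dangling_target: bool = False,
              require_dangling_target: bool = False) -> bool:
    """Concept satisfaction with individually relaxable chunks."""
    dangling_exemplar, dangling_target = pair_off(exemplar, target)
    if dangling_exemplar and not allow_dangling_exemplar:
        return False
    if require_dangling_target:
        return bool(dangling_target)
    if dangling_target and not allow_dangling_target:
        return False
    return True

def same_number_as(image_a: Sequence, image_b: Sequence) -> bool:
    """Offspring concept: compare two target images, no exemplar needed."""
    dangling_a, dangling_b = pair_off(image_a, image_b)
    return not dangling_a and not dangling_b
```

Under these assumptions, `satisfies(balls)` is plain "five"-ness, `allow_dangling_exemplar=True` admits groups of five or fewer, `allow_dangling_target=True` admits groups of five or more, and `require_dangling_target=True` admits only groups of more than five. Each offspring differs from its parent by a single flag, which is the sense in which a chunked procedure mutates more gracefully than raw code.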
One of these concepts, the one-to-one correspondence between two mental images, is not just a useful offspring of the "fiveness" concept, but a simpler offspring. Thus it is probably not an "offspring" at all, but a prerequisite concept that suggests a real-world path to the apprehension of fiveness. Many physical tasks in our world require equal numbers (corresponding sets) for some group; four pegs for four holes, two shoes for two feet.
Consider the real-world task of placing four pegs in four holes. A peg cannot fill two holes; two pegs will not fit in one hole. Solid objects cannot occupy the same location, cannot appear in multiple locations simultaneously, and do not appear or disappear spontaneously. These rules of the physical environment are reflected in the default behaviors of our own visuospatial modality; even early infants represent objects as continuous and will look longer at scenes which imply continuity violations [Spelke90].
From real-world problems such as pegs and holes, or their microworld analogues, an AI can develop concepts such as unique correspondence: a peg cannot fill multiple holes, multiple pegs will not fit in one hole. The AI can learn rules for drawing a unique correspondence, and test the rules against experience, before encountering the need to form the more complex concept for "fiveness". The presence of an immediate, local test of utility means that observed failures and successes can contribute unambiguously to forming a concept that is "simple" relative to the already-trained base of concepts. If a new concept contains many new untested parts, and a mistake occurs, then it may be unclear to the AI which local error caused the global failure. If the AI tries to chunk "fiveness" all in a single step, and the current procedure for "fiveness" satisfaction fails - is positively satisfied by a non-five-group, or unsatisfied by a five-group - it may be unclear to the AI that the global failure resulted from the local error of a nonunique correspondence.
The full path to fiveness would probably involve:
If the programmer hardwires a subitizer that outputs numerons (unique number tags) as detected features, the AI may be able to chunk "five" very rapidly, but the resulting concept will suffer from opacity and isolation. The concept will not have the lower levels of organization that would enable the AI's native cognitive abilities to disassemble and reassemble the concept in useful new shapes; the inability of the AI to decompose the concept is opacity. The concept will not have a surrounding ecology of similar concepts and prerequisite concepts, such as would result from natural knowledge acquisition by the AI. Cognitive processes that require well-populated concept ecologies will be unable to operate; an AI that has "triangle" but not "pyramid" is less likely to successfully visualize "triangular lightbulb". This is isolation.
In the DGI model of AI development, concepts are abstracted from an experiential base; experiences are cognitive content within sensory modalities; and sensory modalities are targeted on a complex virtual microenvironment. Learning a concept requires (necessary, but not sufficient) having experiences from which to abstract the concept. How does an AI obtain these experiences? It would be possible to teach the AI about "fiveness" simply by presenting the AI with a series of sensory images (programmatically manipulating the AI's microenvironment) and prompting the AI's perceptual processes to generalize them, but this severs the task of concept formation from its ecological validity (metaphorically speaking). Knowledge goals (discussed in later sections) are not arbitrary; they derive from real-world goals or higher-level knowledge goals. Knowledge goals exist in a holonic goal ecology; the goal ecology shapes our knowledge goals and thereby often shapes the knowledge itself.
A first approximation to ecological validity is presenting the AI with a "challenge" in one of the virtual microenvironments previously advocated - for example, the billiards microenvironment. Henceforth, I will shorten "microenvironmental challenge" to "microtask". Microtasks can tutor concepts by presenting the AI with a challenge that must be solved using the concept the programmer wishes to tutor. For scrupulous ecological validity the key concept should be part of a larger problem, but even playing "one of these things is not like the others" would still be better than manipulating the AI's perceptual processes directly.
Tutoring a concept as the key to a microtask ensures that the concept's basic "shape", and associated experiences, are those required to solve problems, and that the AI has an experience of the concept being necessary, the experience of discovering the concept, and the experience of using the concept successfully. Effective intelligence is produced not by having concepts but by using concepts; one learns to use concepts by using them. The AI needs to possess the experiences of discovering and using the concept, just as the AI needs to possess the actual experiential referents that the concept generalizes; the AI needs experience of the contexts in which the concept is useful.
Forming a complex concept requires an incremental path to that complex concept - a series of building-block concepts and precursor concepts so that the final step is a leap of manageable size. Under the microtask developmental model, this would be implemented by a series of microtasks of ascending difficulty and complexity, in order to coax the AI into forming the precursor concepts leading up to the formation of complex concepts and abstract concepts. This is a major expense in programmer effort, but I would argue that it is a necessary expense for the creation of rich concepts with goal-oriented experiential bases.
The experiential path to "fiveness" would culminate with a microtask that could only be solved by abstracting and using the fiveness concept, and would lead up to that challenge through microtasks that could only be solved by abstracting and using concepts such as "object continuity", "unique correspondence", "mapping", "dangling group members", and the penultimate concept of "one-to-one mapping".
With respect to the specific microtask protocol for presenting a "challenge" to the AI, there are many possible strategies. Personally, I visualize a simple microtask protocol (on the level of "one of these things is not like the others") as consisting of a number of "gates", each of which must be "passed" by taking one of a set of possible actions, depending on what the AI believes to be the rule indicating the correct action. Passing ten successive gates on the first try is the indicator of success. (For a binary choice, the chance of this happening accidentally is 1 in 1024. If the AI thinks fast enough that this may happen randomly, which seems rather unlikely, the number of successive gates required can be raised to twenty or higher.) This way, the AI can succeed or fail on individual gates, gathering data about individual examples of the common rule, but will not be able to win through the entire microtask until the common rule is successfully formulated. This requires a microenvironment programmed to provide an infinite (or merely "relatively large") number of variations on the underlying challenge - enough variations to prevent the AI from solving the problem through simple memory.
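The arithmetic behind the gate protocol, and the "ten successive passes" stopping rule, can be sketched as follows. This is an illustration only: `chance_of_lucky_pass` and `run_microtask` are hypothetical names, a gate is reduced to a list of options plus the index of the correct one, and guessing is assumed to be uniform over the options.

```python
from fractions import Fraction
from typing import Callable, Optional, Sequence, Tuple

# A gate, for this sketch: (options shown, index of the correct option).
Gate = Tuple[Sequence[object], int]

def chance_of_lucky_pass(options_per_gate: int, gates_required: int) -> Fraction:
    """Probability of passing every required gate by uniform random guessing."""
    return Fraction(1, options_per_gate) ** gates_required

def run_microtask(choose: Callable[[Sequence[object]], int],
                  make_gate: Callable[[], Gate],
                  gates_required: int = 10,
                  max_gates: int = 10_000) -> Optional[int]:
    """Present fresh gates until `gates_required` are passed in a row.

    Returns the number of gates presented, or None if the streak is
    never reached within `max_gates`. A failed gate resets the streak,
    so isolated lucky guesses do not accumulate.
    """
    streak = 0
    for presented in range(1, max_gates + 1):
        options, correct = make_gate()
        if choose(options) == correct:
            streak += 1
            if streak == gates_required:
                return presented
        else:
            streak = 0
    return None
```

With two options per gate, `chance_of_lucky_pass(2, 10)` evaluates to `Fraction(1, 1024)`, and raising the requirement to twenty gates gives one chance in 1,048,576. Because `make_gate` is called afresh for every gate, the sketch also reflects the requirement that the microenvironment supply many variations rather than a fixed, memorizable sequence.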
The sensory appearance of a microtask would vary depending on the modality. For a Newtonian billiards modality, an individual "gate" (subtask) might consist of four "option systems", each option system grouped into an "option" and a "button". Spatial separations in the Newtonian modality would be used to signal grouping; the distance between option systems would be large relative to the distance within option systems, and the distance between an option and a button would be large relative to the distance between subelements of an option. Each option would have a different configuration; the AI would choose one of the four options based on its current hypothesis about the governing rule. For example, the AI might select an option that consists of four billiards, or an option with two large billiards and one small billiard, or an option with moving billiards. Having chosen an option, the AI would manipulate a motor effector billiard - the AI's embodiment in that environment - into contact with the button belonging to (grouped with) the selected option. The AI would then receive a signal - perhaps a movement on the part of some billiard acting as a "flag" - which symbolized success or failure. The environment would then shift to the next "gate", causing a corresponding shift in the sensory input to the AI's billiards modality.
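To make the two-level spatial grouping concrete, here is a minimal sketch that recovers "option systems", and then the option and button within one system, purely from distances between billiard positions. The function name `group_by_distance`, the coordinates, and the two thresholds are all assumptions chosen for illustration; a real billiards modality would presumably perform grouping with its own feature detectors rather than a fixed threshold.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def group_by_distance(points: Sequence[Point], threshold: float) -> List[List[Point]]:
    """Single-linkage grouping: points closer than `threshold` share a group."""
    groups: List[List[Point]] = []
    for p in points:
        near, far = [], []
        for g in groups:
            (near if any(dist(p, q) < threshold for q in g) else far).append(g)
        # Merge p with every group it touches; leave the others unchanged.
        groups = far + [[p] + [q for g in near for q in g]]
    return groups

# Hypothetical layout: two option systems, each an option plus one button.
system_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),   # option: three billiards
            (4.0, 0.0)]                            # button
system_b = [(20.0, 0.0), (21.0, 0.0),             # option: two billiards
            (24.0, 0.0)]                           # button

# A coarse threshold separates option systems from one another ...
option_systems = group_by_distance(system_a + system_b, threshold=10.0)
# ... and a finer threshold separates an option from its button.
parts_of_a = group_by_distance(system_a, threshold=2.0)
```

The same routine is applied at both levels with different thresholds, which mirrors the requirement that between-system distances be large relative to within-system distances, and option-to-button distances large relative to distances between an option's subelements.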
(Since the format of the microtask is complex and requires the AI to start out with an understanding of notions like "button" or "the button which belongs to the chosen option", there is an obvious chicken-and-egg problem with teaching the AI the format of the microtask before microtasks can be used to tutor other concepts. For the moment we will assume the bootstrapping of a small concept base, perhaps by "cheating" and using programmer-created cognitive content as temporary scaffolding.)
Given this challenge format, a simple microtask for "fiveness" seems straightforward: The option containing five billiards, regardless of their size or relative positions or movement patterns, is the key to the gate. In practice, setting up the fiveness microtask may prove more difficult because of the need to eliminate various false ways of arriving at a solution. In particular, if the AI has a sufficiently wide variety of quantitative feature detectors, then the AI will almost certainly possess an emergent Accumulator Model (see [Meck83]) of numeracy. If the AI takes a relatively fixed amount of time to mentally process each object, then single-feature clustering on the subjectively perceived time to mentally process a group could yield the microtask solution without a complex concept of fiveness. Rather than fiveness, the AI would have formed the concept "things-it-takes-about-20-milliseconds-to-understand". The real-world analogue of this situation has already occurred when an experiment formerly thought to show evidence for infant numeracy on small visual sets was demonstrated to show sensitivity to the contour length (perimeter) of the visual set, but not to the cardinality of the visual set [Clearfield99]. Even with all precursor concepts already present, a complex microtask might be necessary to make fiveness the simplest correct answer.
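One way a microtask designer might screen for such shortcuts is to test whether any single scalar feature separates correct options from incorrect ones across the generated gates. The sketch below is an assumption-laden simplification: options are lists of billiard radii, the shortcut feature is total contour length, and "single-feature clustering" is approximated by asking whether the correct options occupy a value range that no incorrect option falls into. The names `total_contour` and `is_confounded` are hypothetical.

```python
from math import pi
from typing import Callable, List, Sequence, Tuple

Option = List[float]                  # radii of the billiards in one option
Gate = Tuple[Sequence[Option], int]   # (options, index of the correct option)

def total_contour(option: Option) -> float:
    """Summed perimeter of all billiards in an option."""
    return sum(2 * pi * r for r in option)

def is_confounded(gates: Sequence[Gate], feature: Callable[[Option], float]) -> bool:
    """True if `feature` alone separates correct from incorrect options.

    The correct options span some range of feature values; if no
    incorrect option ever falls inside that range, a learner could pass
    every gate by clustering on this one feature.
    """
    correct = [feature(o) for opts, k in gates for i, o in enumerate(opts) if i == k]
    wrong = [feature(o) for opts, k in gates for i, o in enumerate(opts) if i != k]
    low, high = min(correct), max(correct)
    return not any(low <= w <= high for w in wrong)

# Uniform radii: contour length tracks cardinality exactly.
uniform_gates = [([[1.0] * 4, [1.0] * 5], 1),
                 ([[1.0] * 5, [1.0] * 6], 0)]

# Varied radii: a four-ball option falls between two five-ball options.
varied_gates = [([[1.0] * 5, [1.0] * 3], 0),
                ([[2.0] * 4, [2.0] * 5], 1)]
```

Here `is_confounded(uniform_gates, total_contour)` is true, so that gate set would teach contour length as readily as fiveness, while `is_confounded(varied_gates, total_contour)` is false. The same check could be run with other candidate features, such as a per-object processing-time estimate, before a gate generator is trusted.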
Also, the microtasks for the earlier concepts leading up to fiveness might inherently require greater complexity than the "option set" protocol described above. The concept of unique correspondence derives its behavior from physical properties. Choosing the right option set is a perceptual decision task rather than a physical manipulation task; in a decision microtask, the only manipulative subtask is maneuvering an effector billiard to touch a selected button. Concepts such as "dangling objects" or "one-to-one mapping" might require manipulation subtasks rather than decision subtasks, in order to incorporate feedback about physical (microenvironmental) outcomes into the concept.
For example, the microtask for teaching "one-to-one mapping" might incorporate the microworld equivalent of a peg-and-hole problem. The microtask might be to divide up 9 "pegs" among 9 "holes" - where the 9 "holes" are divided into three subgroups of 4, 3, and 2, and the AI must allocate the peg supply among these subgroups in advance. For example, in the first stage of the microtask, the AI might be permitted to move pegs between three "rooms", but not permitted to place pegs in holes. In the second stage of the microtask the AI would attempt to place pegs in holes, and would then succeed or fail depending on whether the initial allocation between rooms was correct. Because of the complexity of this microtask, it might require other microtasks simply to explain the problem format - to teach the AI about pegs and holes and rooms. ("Pegs and holes" are universal and translate easily to a billiards modality; "holes", for example, might be immobile billiards, and "pegs" movable billiards to be placed in contact with the "holes".)
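A minimal sketch of the two-stage peg microtask follows, with the physical constraint (one peg per hole, one hole per peg) enforced by pairing rather than by comparing counts. The room names, the dictionary representation, and the functions `place_pegs` and `stage_two_succeeds` are hypothetical; the only details taken from the text are the 9 pegs and the subgroups of 4, 3, and 2 holes.

```python
from typing import Dict, List, Tuple

def place_pegs(pegs: List[str], holes: List[str]) -> Tuple[List[str], List[str]]:
    """Put each peg in its own hole; return (unplaced pegs, unfilled holes)."""
    free_holes = list(holes)
    unplaced = []
    for peg in pegs:
        if free_holes:
            free_holes.pop()       # this hole is now occupied by exactly one peg
        else:
            unplaced.append(peg)   # no hole left: a dangling peg
    return unplaced, free_holes

def stage_two_succeeds(allocation: Dict[str, List[str]],
                       rooms: Dict[str, List[str]]) -> bool:
    """Stage two: success only if no room has a dangling peg or an empty hole."""
    return all(place_pegs(allocation[room], holes) == ([], [])
               for room, holes in rooms.items())

# Nine holes in three rooms, in subgroups of 4, 3, and 2.
rooms = {"room1": ["h1", "h2", "h3", "h4"],
         "room2": ["h5", "h6", "h7"],
         "room3": ["h8", "h9"]}
pegs = [f"p{i}" for i in range(1, 10)]

# Stage one: the AI commits to an allocation before any peg is placed.
good_allocation = {"room1": pegs[0:4], "room2": pegs[4:7], "room3": pegs[7:9]}
bad_allocation = {"room1": pegs[0:3], "room2": pegs[3:6], "room3": pegs[6:9]}
```

With these definitions `stage_two_succeeds(good_allocation, rooms)` holds and `stage_two_succeeds(bad_allocation, rooms)` does not, since the even 3/3/3 split leaves one hole empty in the first room and one peg dangling in the third. The feedback arrives only in stage two, which is what forces the allocation decision to rest on an anticipated one-to-one mapping.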
Placing virtual pegs in virtual holes is admittedly not an inherently impressive result. In this case the AI is being taught to solve a simple problem so that the learned complexity will carry over into solving complex problems. If the learned complexity does carry over, and the AI later goes on to solve more difficult challenges, then, in retrospect, getting the AI to think coherently enough to navigate a microtask will "have been" an impressive result.
Concept-concept interactions are more readily accessible to introspection and to experimental techniques, and are relatively well-known in AI and in cognitive psychology. To summarize some of the complexity bound up in concept-concept interactions: