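$$(\mu, g, T) = \frac{\nu(\mu)\,\gamma(g, \mu)\,\chi_{\mathrm{Con}_\pi}(\mu, g, T)}{\sum_{(\mu_\alpha, g_\beta, T_\omega)} \nu(\mu_\alpha)\,\gamma(g_\beta, \mu_\alpha)\,\chi_{\mathrm{Con}_\pi}(\mu_\alpha, g_\beta, T_\omega)}$$
is the probability distribution formed by normalizing the fuzzy set $\chi_{\mathrm{Con}_\pi}(\mu, g, T)$.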
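To make the normalization concrete, here is a minimal Python sketch, assuming a finite, explicitly enumerated set of contexts. The dictionary encoding and the environment and goal labels in the usage lines are hypothetical illustrations, not part of the formalism.

```python
from typing import Dict, Tuple

# A context is a triple (environment mu, goal g, time interval T).
Context = Tuple[str, str, int]


def context_distribution(
    nu: Dict[str, float],                 # nu(mu): weight of each environment
    gamma: Dict[Tuple[str, str], float],  # gamma(g, mu), keyed as (g, mu)
    chi: Dict[Context, float],            # chi_Con_pi(mu, g, T): fuzzy membership in [0, 1]
) -> Dict[Context, float]:
    """Normalize nu(mu) * gamma(g, mu) * chi(mu, g, T) into a probability distribution."""
    weights = {
        (mu, g, t): nu[mu] * gamma[(g, mu)] * membership
        for (mu, g, t), membership in chi.items()
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("the fuzzy set is empty, so the distribution is undefined")
    return {ctx: w / total for ctx, w in weights.items()}


# Hypothetical example: two environments, one goal each, one time interval.
dist = context_distribution(
    nu={"maze": 0.5, "chat": 0.5},
    gamma={("exit", "maze"): 1.0, ("answer", "chat"): 1.0},
    chi={("maze", "exit", 10): 0.9, ("chat", "answer", 10): 0.3},
)
```

With memberships 0.9 and 0.3 and uniform weights, the two contexts receive probabilities 0.75 and 0.25.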
A similar definition of the intellectual breadth of a context (µ, g, T), relative to the distribution
σ over agents, may be posited. A weakness of these definitions is that they do not attempt to
account for dependencies between agents or contexts; more refined formulations that account
explicitly for these dependencies may perhaps be developed.
Note that the intellectual breadth of an agent, as defined here, is largely independent of
that agent’s pragmatic general intelligence, efficient or not. One could have a fairly intelligent
system (in either the pragmatic or the efficient pragmatic sense) with little breadth: this would be
a system very good at solving a fair number of hard problems, yet wholly incompetent on a
larger number of hard problems. On the other hand, one could also have a system with very low
(efficient or plain) pragmatic general intelligence but great intellectual breadth, i.e., a system
roughly equally dumb in all contexts!
Thus, one can characterize an intelligent agent as “narrow” with respect to the distribution ν over
environments and the distribution γ over goals if it evaluates as having low intellectual
breadth. A “narrow AI” relative to ν and γ would then be an AI agent with relatively high
efficient pragmatic general intelligence but relatively low intellectual breadth.
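The contrast between a narrow specialist and a uniformly weak system can be illustrated numerically. The sketch below assumes that intellectual breadth is measured as the Shannon entropy of the normalized context distribution and treats ν and γ as uniform, so that only the fuzzy memberships matter. The membership values, the intelligence score passed to `is_narrow`, and its thresholds are all hypothetical.

```python
import math
from typing import List, Sequence


def normalize(memberships: Sequence[float]) -> List[float]:
    """Normalize fuzzy memberships (nu and gamma taken as uniform) into a distribution."""
    total = sum(memberships)
    return [m / total for m in memberships]


def breadth(distribution: Sequence[float]) -> float:
    """Assumed breadth measure: Shannon entropy, in nats, of the context distribution."""
    return -sum(p * math.log(p) for p in distribution if p > 0)


def is_narrow(intelligence: float, breadth_value: float, n_contexts: int,
              min_intelligence: float = 0.5, max_fraction: float = 0.8) -> bool:
    """Hypothetical test: fairly intelligent, yet breadth well below the maximum log(n)."""
    return (intelligence >= min_intelligence
            and breadth_value < max_fraction * math.log(n_contexts))


specialist = normalize([0.95, 0.95, 0.05, 0.05, 0.05, 0.05])  # strong on 2 of 6 contexts
dullard = normalize([0.1] * 6)                               # equally weak everywhere
```

The specialist’s breadth comes out at about 1.07 nats, while the uniformly weak system attains the maximum, log 6 ≈ 1.79, illustrating that breadth says nothing by itself about how well the agent performs.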
7.5 Conclusion
Our main goal in this chapter has been to push the formal understanding of intelligence in a more
pragmatic direction. Much more work remains to be done, e.g. in specifying the environment,
goal and efficiency distributions relevant to real-world systems, but we believe that the ideas
presented here constitute nontrivial progress.
If the line of research suggested in this chapter succeeds, then eventually, one will be able to
do AGI research as follows: Specify an AGI architecture formally, and then use the mathematics
of general intelligence to derive interesting results about the environments, goals and hardware
platforms relative to which the AGI architecture will display significant pragmatic or efficient
pragmatic general intelligence, and intellectual breadth. The remaining chapters in this section
present further ideas regarding how to work toward this goal. For the time being, such a mode
of AGI research remains mainly for the future, but we have still found the formalism given in
these chapters useful for formulating and clarifying various aspects of the CogPrime design as
will be presented in later chapters.
Chapter 8
Cognitive Synergy
8.1 Cognitive Synergy
As we have seen, the formal theory of general intelligence, in its current form, doesn’t really
tell us much that’s of use for creating real-world AGI systems. It tells us that creating extraordinarily
powerful general intelligence is almost trivial if one has unrealistically huge amounts
of computational resources; and that creating moderately powerful general intelligence using
feasible computational resources is all about creating AI algorithms and data structures that
(explicitly or implicitly) match the restrictions implied by a certain class of situations, to which
the general intelligence is biased.
We’ve also described, in various previous chapters, some non-rigorous, conceptual principles
that seem to explain key aspects of feasible general intelligence: the complementary reliance on
evolution and autopoiesis, the superposition of hierarchical and heterarchical structures, and so
forth. These principles can be considered as broad strategies for achieving general intelligence
in certain broad classes of situations. However, a lot of research remains to be done to find
good ways of describing, for instance, the class of situations in which evolution is an effective
learning strategy, the class of situations in which dual hierarchical/heterarchical structure is an
effective way to organize memory, and so on.
In this chapter we’ll dig deeper into one of the “general principles of feasible general intelligence”
briefly alluded to earlier: the cognitive synergy principle, which is both a conceptual
hypothesis about the structure of generally intelligent systems in certain classes of environments,
and a design principle used to guide the architecting of CogPrime.
We will focus here on cognitive synergy specifically in the case of “multi-memory systems,”
which we define as intelligent systems (like CogPrime) whose combination of environment,
embodiment and motivational systems makes it important for them to possess memories that
divide into partially but not wholly distinct components corresponding to the categories of:
• Declarative memory
• Procedural memory (memory about how to do certain things)
• Sensory and episodic memory
• Attentional memory (knowledge about what to pay attention to in what contexts)
• Intentional memory (knowledge about the system’s own goals and subgoals)
In Chapter 9 below we present a detailed argument as to how the requirement for a multi-memory
underpinning for general intelligence emerges from certain underlying assumptions
regarding the measurement of the simplicity of goals and environments; but the points made
here do not rely on that argument. What they do rely on is the assumption that, in the
intelligent system in question, the different components of memory are significantly but not wholly
distinct. That is, there are significant “family resemblances” between the memories of a single
type, yet there are also thoroughgoing connections between memories of different types.
The cognitive synergy principle, if correct, applies to any AI system demonstrating intelligence
in the context of embodied, social communication. However, one may also take the theory
as an explicit guide for constructing AGI systems; and of course, the bulk of this book describes
one AGI architecture, CogPrime, designed in such a way.
It is possible to cast these notions in mathematical form, and we make some efforts in this
direction in Appendix ??, using the languages of category theory and information geometry.
However, this formalization has not yet led to any rigorous proof of the generality of cognitive
synergy nor any other exciting theorems; with luck this will come as the mathematics is further
developed. In this chapter the presentation is kept on the heuristic level, which is all that is
critically needed for motivating the CogPrime design.
8.2 Cognitive Synergy
The essential idea of cognitive synergy, in the context of multi-memory systems, may be expressed
in terms of the following points:
1. Intelligence, relative to a certain set of environments, may be understood as the capability
to achieve complex goals in these environments.
2. With respect to certain classes of goals and environments (see Chapter 9 for a hypothesis
in this regard), an intelligent system requires a “multi-memory” architecture, meaning
the possession of a number of specialized yet interconnected knowledge types, including:
declarative, procedural, attentional, sensory, episodic and intentional (goal-related). These
knowledge types may be viewed as different sorts of patterns that a system recognizes in
itself and its environment. Knowledge of these different types must be interlinked,
and in some cases may represent differing views of the same content (see Figure ??).
3. Such a system must possess knowledge creation (i.e. pattern recognition / formation) mechanisms
corresponding to each of these memory types. These mechanisms are also called
“cognitive processes.”
4. Each of these cognitive processes, to be effective, must have the capability to recognize when
it lacks the information to perform effectively on its own; and in this case, to dynamically
and interactively draw information from knowledge creation mechanisms dealing with other
types of knowledge.
5. This cross-mechanism interaction must have the result of enabling the knowledge creation
mechanisms to perform much more effectively in combination than they would if operated
non-interactively. This is “cognitive synergy.”
While these points are implicit in the theory of mind given in [Goe06a], they are not articulated
in this specific form there.
Interactions as mentioned in Points 4 and 5 in the above list are the real conceptual meat
of the cognitive synergy idea. One way to express the key idea here is that most AI algorithms
suffer from combinatorial explosions: the number of possible elements to be combined in a
synthesis or analysis is just too great, and the algorithms are unable to filter through all the
possibilities, given the lack of intrinsic constraint that comes along with a “general intelligence”
context (as opposed to a narrow-AI problem like chess-playing, where the context is constrained
and hence restricts the scope of possible combinations that need to be considered). In an AGI
architecture based on cognitive synergy, the different learning mechanisms must be designed
specifically to interact in such a way as to palliate each other’s combinatorial explosions. That
is, each learning mechanism dealing with a certain sort of knowledge must synergize with the
learning mechanisms dealing with the other sorts of knowledge, in a way that decreases the
severity of combinatorial explosion.
Fig. 8.1: Illustrative example of the interactions between multiple types of knowledge in representing
a simple piece of knowledge. Generally speaking, one type of knowledge can be converted
to another, at the cost of some loss of information. The synergy between cognitive processes
associated with corresponding pieces of knowledge of different types is a critical aspect
of general intelligence.
One prerequisite for cognitive synergy to work is that each learning mechanism must recognize
when it is “stuck,” meaning it’s in a situation where it has inadequate information to
make a confident judgment about what steps to take next. Then, when it does recognize that
it’s stuck, it may request help from other, complementary cognitive mechanisms.
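The “recognize that you are stuck, then ask for help” pattern can be sketched as follows. Everything here is a hypothetical illustration: the `Estimate` record, the confidence threshold, and the helper callbacks standing in for complementary cognitive mechanisms are assumptions, not CogPrime interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A helper is any complementary mechanism that can score candidate steps.
Helper = Callable[[List[str]], Dict[str, float]]


@dataclass
class Estimate:
    step: str          # candidate next step
    strength: float    # how good the step looks, in [0, 1]
    confidence: float  # how much weight the mechanism places on that judgment, in [0, 1]


def is_stuck(estimates: List[Estimate], min_confidence: float = 0.6) -> bool:
    """Stuck means no candidate step has a high-confidence estimate."""
    return all(e.confidence < min_confidence for e in estimates)


def choose_step(estimates: List[Estimate], helpers: List[Helper],
                min_confidence: float = 0.6) -> Optional[str]:
    """Take the best confident step; when stuck, ask other mechanisms in turn."""
    if not is_stuck(estimates, min_confidence):
        confident = [e for e in estimates if e.confidence >= min_confidence]
        return max(confident, key=lambda e: e.strength).step
    candidates = [e.step for e in estimates]
    for helper in helpers:
        scores = helper(candidates)
        if scores:  # an empty answer means this helper could not help
            return max(scores, key=lambda step: scores[step])
    return None  # still stuck after consulting every helper
```

Note that the choice of which helper to consult first is itself a design decision; the sketch simply tries them in order.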
A theoretical notion closely related to cognitive synergy is the cognitive schematic, formalized
in Chapter 7 above, which states that the activity of the different cognitive processes involved
in an intelligent system may be modeled in terms of the schematic implication
$$\text{Context} \wedge \text{Procedure} \rightarrow \text{Goal}$$
where the Context involves sensory, episodic and/or declarative knowledge; and attentional
knowledge is used to regulate how much resource is given to each such schematic implication in
memory. Synergy among the learning processes dealing with the context, the procedure and the
goal is critical to the adequate execution of the cognitive schematic using feasible computational
resources.
Finally, drilling a little deeper into Point 3 above, one arrives at a number of possible knowledge
creation mechanisms (cognitive processes) corresponding to each of the key types of knowledge.
Figure ?? below gives a high-level overview of the main types of cognitive process considered
in the current version of Cognitive Synergy Theory, categorized according to the type
of knowledge with which each process deals.
8.3 Cognitive Synergy in CogPrime
Different cognitive systems will use different processes to fulfill the various roles identified in
Figure ?? above. Here we briefly preview the basic cognitive processes that the CogPrime AGI
design uses for these roles, and the synergies that exist between these.
8.3.1 Cognitive Processes in CogPrime
: a Cognitive Synergy Based Architecture..." from ICCI 2009
Table 8.1: default
Table will go here
Table 8.2: The OpenCogPrime data structures used to represent the key knowledge types involved
Table 8.3: default
Table will go here
Table 8.4: Key cognitive processes, and the algorithms that play their roles in CogPrime
Tables 8.1 and 8.3 present the key structures and processes involved in CogPrime, identifying
each one with a certain memory/process type as considered in cognitive synergy theory. That
is, each of these cognitive structures or processes deals with one or more types of memory:
declarative, procedural, sensory, episodic or attentional. Table 8.5 describes the key CogPrime
processes in terms of the “analysis vs. synthesis” distinction. Finally, Tables ?? and ?? exemplify
these structures and processes in the context of embodied virtual agent control.
Fig. 8.2: High-level overview of the key cognitive dynamics considered here in the context of
cognitive synergy. The cognitive synergy principle describes the behavior of a system as it
pursues a set of goals (which in most cases may be assumed to be supplied to the system
“a priori”, but then refined by inference and other processes). The assumed intelligent agent
model is roughly as follows: at each time the system chooses a set of procedures to execute,
based on its judgments regarding which procedures will best help it achieve its goals in the
current context. These procedures may involve external actions (e.g. involving conversation,
or controlling an agent in a simulated world) and/or internal cognitive actions. In order to
make these judgments it must effectively manage declarative, procedural, episodic, sensory
and attentional memory, each of which is associated with specific algorithms and structures
as depicted in the diagram. There are also global processes spanning all the forms of memory,
including the allocation of attention to different memory items and cognitive processes, and the
identification and reification of system-wide activity patterns (the latter referred to as “map
formation”).
Table 8.5: default
Table will go here
Table 8.6: Key OpenCogPrime cognitive processes categorized according to knowledge type and
process type
In the CogPrime context, a procedure in this cognitive schematic is a program tree stored
in the system’s procedural knowledge base; and a context is a (fuzzy, probabilistic) logical
predicate stored in the AtomSpace, that holds, to a certain extent, during each interval of time.
A goal is a fuzzy logical predicate that has a certain value at each interval of time, as well.
Attentional knowledge is handled in CogPrime by the ECAN artificial economics mechanism,
which continually updates the ShortTermImportance and LongTermImportance values associated
with each item in the CogPrime system’s memory. These values control, respectively, the amount
of attention other cognitive mechanisms pay to the item and how much motive the system has to
keep the item in memory. HebbianLinks are then created between knowledge items that often
possess high ShortTermImportance at the same time; this is CogPrime’s version of traditional
Hebbian learning.
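The co-occurrence idea behind HebbianLink formation can be sketched as below. This is a deliberate simplification, assuming that an item is “in focus” when its ShortTermImportance exceeds a fixed threshold and that a link’s weight is simply the fraction of cycles in which both endpoints were in focus. ECAN’s actual importance-spreading and link-update dynamics are more involved, and the function name, threshold and STI values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Mapping


def hebbian_weights(sti_snapshots: List[Mapping[str, float]],
                    focus_threshold: float = 0.5) -> Dict[FrozenSet[str], float]:
    """Weight of each item pair = fraction of cycles in which both were in focus."""
    counts: Dict[FrozenSet[str], int] = defaultdict(int)
    for sti in sti_snapshots:
        in_focus = sorted(item for item, value in sti.items() if value >= focus_threshold)
        for a, b in combinations(in_focus, 2):
            counts[frozenset((a, b))] += 1
    cycles = len(sti_snapshots)
    return {pair: c / cycles for pair, c in counts.items()}


# Hypothetical STI values of three memory items over three cycles.
snapshots = [
    {"ball": 0.9, "owner": 0.8, "cat": 0.1},
    {"ball": 0.7, "owner": 0.6, "cat": 0.9},
    {"ball": 0.2, "owner": 0.9, "cat": 0.8},
]
links = hebbian_weights(snapshots)
```

Here “ball” and “owner” share the focus in two of three cycles, so their link gets weight 2/3, while “ball” and “cat” co-occur only once.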
ECAN has deep interactions with other cognitive mechanisms as well, which are essential
to its efficient operation; for instance, PLN inference may be used to help ECAN extrapolate
conclusions about what is worth paying attention to, and MOSES may be used to recognize
subtle attentional patterns. ECAN also handles “assignment of credit”, the figuring-out of the
causes of an instance of successful goal-achievement, drawing on PLN and MOSES as needed
when the causal inference involved here becomes difficult.
The synergies between CogPrime’s cognitive processes are summarized in a 16x16 matrix below,
which captures a host of interprocess interactions generic to CST.
One key aspect of how CogPrime implements cognitive synergy is PLN’s sophisticated management
of the confidence of judgments. This ties in with the way OpenCogPrime’s PLN inference
framework represents truth values in terms of multiple components (as opposed to the
single probability values used in many probabilistic inference systems and formalisms): each
item in OpenCogPrime’s declarative memory has a confidence value associated with it, which
tells how much weight the system places on its knowledge about that memory item. This assists
with cognitive synergy as follows: A learning mechanism may consider itself “stuck”, generally
speaking, when it has no high-confidence estimates about the next step it should take.
Without reasonably accurate confidence assessment to guide it, inter-component interaction
could easily lead to increased rather than decreased combinatorial explosion. And of course
there is an added recursion here, in that confidence assessment is carried out partly via PLN
inference, which itself relies upon these same synergies for its effective operation.
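A two-component truth value of this kind can be sketched as a strength plus an evidence count, with confidence growing as evidence accumulates. The mapping $c = n/(n + k)$ below is one simple choice of this form, and $k = 10$ is an arbitrary assumption made for illustration; it should not be read as PLN’s exact parameterization.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TruthValue:
    strength: float  # estimated probability
    count: float     # amount of evidence behind the estimate

    def confidence(self, k: float = 10.0) -> float:
        """Map evidence count into [0, 1); k is a hypothetical evidence-scale parameter."""
        return self.count / (self.count + k)


def from_observations(outcomes: Sequence[bool]) -> TruthValue:
    """Strength is the observed success frequency; count is the number of observations."""
    n = len(outcomes)
    return TruthValue(strength=sum(outcomes) / n if n else 0.0, count=float(n))


lucky = from_observations([True, True])                  # strength 1.0 from 2 cases
seasoned = from_observations([True] * 30 + [False] * 10)  # strength 0.75 from 40 cases
```

A single-number probability would rank `lucky` above `seasoned`; the confidence component (about 0.17 versus 0.8 here) shows that a mechanism relying only on `lucky` should still regard itself as stuck.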
To illustrate this point further, consider one of the synergetic aspects described in ?? below:
the role cognitive synergy plays in deductive inference. Deductive inference is a hard problem
in general; but what is hard about it is not carrying out inference steps, but rather “inference
control” (i.e., choosing which inference steps to carry out). Specifically, what must happen for
deduction to succeed in CogPrime is:
1. the system must recognize when its deductive inference process is “stuck”, i.e. when the
PLN inference control mechanism carrying out deduction has no clear idea regarding which
inference step(s) to take next, even after considering all the domain knowledge at its disposal;
2. in this case, the system must defer to another learning mechanism to gather more information
about the different choices available; and the other learning mechanism chosen must,
a reasonable percentage of the time, actually provide useful information that helps PLN to
get “unstuck” and continue the deductive process.
For instance, deduction might defer to the “attentional knowledge” subsystem, and make
a judgment as to which of the many possible next deductive steps are most associated with
the goal of inference and the inference steps taken so far, according to the HebbianLinks constructed
by the attention allocation subsystem, based on observed associations. Or, if this fails,
deduction might ask MOSES (running in supervised categorization mode) to learn predicates
characterizing some of the terms involving the possible next inference steps. Once MOSES provides
these new predicates, deduction can then attempt to incorporate these into its inference
process, hopefully (though not necessarily) arriving at a higher-confidence next step.
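The first fallback, consulting HebbianLinks to decide which candidate next step is most associated with the inference so far, might look like the sketch below. The candidate encoding, the additive scoring rule and all names are hypothetical; they illustrate the idea rather than PLN’s actual inference control.

```python
from typing import Dict, FrozenSet, List, Tuple

# (step name, atoms that applying the step would bring into the inference)
Candidate = Tuple[str, List[str]]


def rank_by_association(candidates: List[Candidate], context_atoms: List[str],
                        hebbian: Dict[FrozenSet[str], float]) -> List[str]:
    """Order candidate deductive steps by summed HebbianLink weight to the current context."""
    def score(atoms: List[str]) -> float:
        return sum(hebbian.get(frozenset((a, c)), 0.0)
                   for a in atoms for c in context_atoms if a != c)

    ranked = sorted(candidates, key=lambda cand: score(cand[1]), reverse=True)
    return [name for name, _ in ranked]


# Hypothetical associations built up by attention allocation.
links = {frozenset({"Bob", "preschool"}): 0.8, frozenset({"Bob", "weather"}): 0.1}
order = rank_by_association(
    [("use_weather_facts", ["weather"]), ("use_preschool_facts", ["preschool"])],
    context_atoms=["Bob", "regular_visitor"],
    hebbian=links,
)
```

If the top-ranked steps still yield no confident conclusion, the next fallback described above (asking MOSES for new predicates) would take over.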
8.4 Some Critical Synergies
Referring back to Figure ??, and summarizing many of the ideas in the previous section, Table
?? enumerates a number of specific ways in which the cognitive processes mentioned in the
Figure may synergize with one another, potentially achieving dramatically greater efficiency
than would be possible on their own.
Of course, realizing these synergies on the practical algorithmic level requires significant
inventiveness and may be approached in many different ways. The specifics of how CogPrime
manifests these synergies are discussed in many following chapters.
Fig. 8.3: This table, and the following ones, show some of the synergies between the primary
cognitive processes explicitly used in CogPrime.
8.5 The Cognitive Schematic
Now we return to the “cognitive schematic” notion, according to which various cognitive processes
involved in intelligence may be understood to work together via the implication
$$\text{Context} \wedge \text{Procedure} \rightarrow \text{Goal} \; \langle p \rangle$$
(summarized $C \wedge P \rightarrow G$). Semi-formally, this implication may be interpreted to mean: “If the
context C appears to hold currently, then if I enact the procedure P, I can expect to achieve
the goal G with probability p.”
The cognitive schematic leads to a conceptualization of the internal action of an intelligent
system as involving two key categories of learning:
• Analysis: Estimating the probability p of a posited C ∧ P → G relationship
• Synthesis: Filling in one or two of the variables in the cognitive schematic, given assumptions
regarding the remaining variables, and directed by the goal of maximizing the
probability of the cognitive schematic
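The division of labor between analysis and synthesis can be sketched in a few lines: analysis is any function that estimates p for a fully specified $C \wedge P \rightarrow G$, and one form of synthesis searches over candidates for the missing term while holding the others fixed. The lookup table standing in for analysis and the dog-related labels are hypothetical; in CogPrime the estimate would come from PLN, simulation or episodic memory.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Schematic:
    context: str
    procedure: str
    goal: str


# Analysis: estimate p for a posited C & P -> G.
Analyzer = Callable[[Schematic], float]


def synthesize_procedure(context: str, goal: str, procedures: Iterable[str],
                         analyze: Analyzer) -> Optional[Tuple[str, float]]:
    """Synthesis with C and G fixed: pick the P whose schematic analysis rates highest."""
    best: Optional[Tuple[str, float]] = None
    for procedure in procedures:
        p = analyze(Schematic(context, procedure, goal))
        if best is None or p > best[1]:
            best = (procedure, p)
    return best


# Hypothetical analysis results, looked up from a table.
table = {
    ("owner_says_fetch", "fetch_ball", "please_owner"): 0.8,
    ("owner_says_fetch", "sit_still", "please_owner"): 0.3,
}
result = synthesize_procedure(
    "owner_says_fetch", "please_owner", ["sit_still", "fetch_ball", "bark"],
    analyze=lambda s: table.get((s.context, s.procedure, s.goal), 0.0),
)
```

Exhaustive search is only viable for a handful of candidates; for real procedure spaces this is exactly where a learner such as MOSES is needed.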
More specifically, where synthesis is concerned, some key examples are:
• The MOSES probabilistic evolutionary program learning algorithm is applied to find P,
given fixed C and G. Internal simulation is also used, for the purpose of creating a simulation
embodying C and seeing which P lead to the simulated achievement of G.
– Example: A virtual dog learns a procedure P to please its owner (the goal G) in the
context C where there is a ball or stick present and the owner is saying “fetch”.
• PLN inference, acting on declarative knowledge, is used for choosing C, given fixed P and
G (also incorporating sensory and episodic knowledge as appropriate). Simulation may also
be used for this purpose.
– Example: A virtual dog wants to achieve the goal G of getting food, and it knows that
the procedure P of begging has been successful at this before, so it seeks a context C
where begging can be expected to get it food. Probably this will be a context involving a
friendly person.
• PLN-based goal refinement is used to create new subgoals G to sit on the right-hand side
of instances of the cognitive schematic.
– Example: Given that a virtual dog has a goal of finding food, it may learn a subgoal of
following other dogs, due to observing that other dogs are often heading toward their
food.
• Concept formation heuristics are used for choosing G and for fueling goal refinement, but
especially for choosing C (via providing new candidates for C). They are also used for
choosing P, via a process called “predicate schematization” that turns logical predicates
(declarative knowledge) into procedures.
– Example: At first a virtual dog may have a hard time predicting which other dogs are
going to be mean to it. But it may eventually observe common features among a number
of mean dogs, and thus form its own concept of “pit bull,” without anyone ever teaching
it this concept explicitly.
Where analysis is concerned:
• PLN inference, acting on declarative knowledge, is used for estimating the probability of
the implication in the cognitive schematic, given fixed C, P and G. Episodic knowledge
is also used in this regard, enabling estimation of the probability via simple similarity
matching against past experience. Simulation is also used: multiple simulations may be
run, and statistics may be captured therefrom.
– Example: To estimate the degree to which asking Bob for food (the procedure P is “asking
for food”, the context C is “being with Bob”) will achieve the goal G of getting food, the
virtual dog may study its memory to see what happened on previous occasions where it
or other dogs asked Bob for food or other things, and then integrate the evidence from
these occasions.
• Procedural knowledge, mapped into declarative knowledge and then acted on by PLN inference,
can be useful for estimating the probability of the implication $C \wedge P \rightarrow G$, in cases
where the probability of $C \wedge P_1 \rightarrow G$ is known for some $P_1$ related to P.
– Example: knowledge of the internal similarity between the procedure of asking for food
and the procedure of asking for toys, allows the virtual dog to reason that if asking Bob
for toys has been successful, maybe asking Bob for food will be successful too.
• Inference, acting on declarative or sensory knowledge, can be useful for estimating the
probability of the implication $C \wedge P \rightarrow G$, in cases where the probability of $C_1 \wedge P \rightarrow G$ is
known for some $C_1$ related to C.
– Example: if Bob and Jim have a lot of features in common, and Bob often responds
positively when asked for food, then maybe Jim will too.
• Inference can be used similarly for estimating the probability of the implication $C \wedge P \rightarrow G$,
in cases where the probability of $C \wedge P \rightarrow G_1$ is known for some $G_1$ related to G. Concept
creation can be useful indirectly in calculating these probability estimates, via providing
new concepts that can be used to make useful inference trails more compact and hence
easier to construct.
– Example: The dog may reason that because Jack likes to play, and Jack and Jill are both
children, maybe Jill likes to play too. It can carry out this reasoning only if its concept
creation process has invented the concept of “child” via analysis of observed data.
In these examples we have focused on cases where two terms in the cognitive schematic are
fixed and the third must be filled in; but just as often, the situation is that only one of the
terms is fixed. For instance, if we fix G, sometimes the best approach will be to collectively
learn C and P. This requires either a procedure learning method that works interactively with a
declarative-knowledge-focused concept learning or reasoning method; or a declarative learning
method that works interactively with a procedure learning method. That is, it requires the sort
of cognitive synergy built into the CogPrime design.
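Two of the analysis routes listed above can be sketched together: pooling evidence from past episodes that match C and P (the Bob example), and borrowing an estimate from a related context $C_1$ (the Bob-and-Jim example). The sketch assumes exact matching of episodes and Jaccard overlap of hypothetical feature sets as the similarity measure; both are simplifications chosen for illustration, not PLN’s similarity or inheritance machinery, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Episode:
    agent: str           # who acted: this dog or another dog
    context: str
    procedure: str
    achieved_goal: bool


def estimate_from_episodes(episodes: List[Episode], context: str,
                           procedure: str) -> Tuple[float, int]:
    """Episodic route: success rate of matching episodes, plus how many matched."""
    matches = [e for e in episodes
               if e.context == context and e.procedure == procedure]
    if not matches:
        return 0.0, 0
    return sum(e.achieved_goal for e in matches) / len(matches), len(matches)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Feature overlap in [0, 1]; a simple stand-in for a learned similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_estimate(target: str, features: Dict[str, Set[str]],
                      known: Dict[str, float]) -> Tuple[float, float]:
    """Related-context route: similarity-weighted average of known p(C1 & P -> G).

    Also returns the total similarity, a crude measure of how much evidence transferred.
    """
    weights = {c1: jaccard(features[target], features[c1])
               for c1 in known if c1 != target}
    total = sum(weights.values())
    if total == 0:
        return 0.0, 0.0
    return sum(w * known[c1] for c1, w in weights.items()) / total, total


memory = [
    Episode("self", "with_Bob", "ask_for_food", True),
    Episode("other_dog", "with_Bob", "ask_for_food", True),
    Episode("self", "with_Bob", "ask_for_food", False),
    Episode("self", "with_Bob", "ask_for_toy", True),
]
p_bob, n_bob = estimate_from_episodes(memory, "with_Bob", "ask_for_food")

features = {
    "Bob": {"adult", "friend_of_owner", "carries_snacks"},
    "Jim": {"adult", "friend_of_owner", "carries_snacks", "tall"},
    "Ann": {"child"},
}
p_jim, support = transfer_estimate("Jim", features, known={"Bob": p_bob, "Ann": 0.2})
```

Because Jim shares no features with Ann, the estimate for Jim is borrowed entirely from Bob (2/3), with a total similarity of 0.75 behind it.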
8.6 Cognitive Synergy for Procedural and Declarative Learning
We now present a little more algorithmic detail regarding the operation and synergetic interaction
of CogPrime’s two most sophisticated components: the MOSES procedure learning
algorithm (see Chapter 33), and the PLN uncertain inference framework (see Chapter 34). The
treatment is necessarily quite compact, since we have not yet reviewed the details of either
MOSES or PLN; but as well as illustrating the notion of cognitive synergy more concretely,
perhaps the high-level discussion here will make clearer how MOSES and PLN fit into the big
picture of CogPrime.
8.6.1 Cognitive Synergy in MOSES
MOSES, CogPrime’s primary algorithm for learning procedural knowledge, has been tested on
a variety of application problems including standard GP test problems, virtual agent control,
biological data analysis and text classification [Loo06]. It represents procedures internally as
program trees. Each node in a MOSES program tree is supplied with a “knob,” comprising a
set of values that may potentially be chosen to replace the data item or operator at that node.
So for instance a node containing the number 7 may be supplied with a knob that can take
on any integer value. A node containing a while loop may be supplied with a knob that can
take on various possible control flow operators including conditionals or the identity. A node
containing a procedure representing a particular robot movement may be supplied with a knob
that can take on values corresponding to multiple possible movements. Following a metaphor
suggested by Douglas Hofstadter [Hof96], MOSES learning covers both “knob twiddling” (setting
the values of knobs) and “knob creation.”
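The knob idea can be illustrated with a toy program tree in Python. Only knob twiddling is shown; knob creation, MOSES’s distinction between kinds of knobs, and its normal-form reduction are omitted, and the node names (borrowed loosely from the robot examples in this chapter) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    value: str                          # operator or data item at this node
    knob: Tuple[str, ...] = ()          # alternative values; empty means no knob
    children: List["Node"] = field(default_factory=list)


def knobbed_nodes(tree: Node) -> List[Node]:
    """Collect the nodes that carry a knob, in depth-first order."""
    found = [tree] if tree.knob else []
    for child in tree.children:
        found.extend(knobbed_nodes(child))
    return found


def twiddle(tree: Node, settings: Sequence[int]) -> None:
    """Knob twiddling: set each knobbed node to the value its setting selects."""
    for node, index in zip(knobbed_nodes(tree), settings):
        node.value = node.knob[index]


def render(tree: Node) -> str:
    if not tree.children:
        return tree.value
    return f"{tree.value}({', '.join(render(c) for c in tree.children)})"


program = Node("if_else", children=[
    Node("being_chased"),
    Node("run_away", knob=("run_away", "hide")),
    Node("step_forward", knob=("step_forward", "turn_left", "stop")),
])
twiddle(program, [1, 1])
```

After twiddling, the program renders as `if_else(being_chased, hide, turn_left)`: a vector of knob settings fully determines a program, which is what lets MOSES treat procedure learning as optimization over settings.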
MOSES is invoked within CogPrime in a number of ways, but most commonly for finding a
procedure P satisfying a probabilistic implication $C \wedge P \rightarrow G$ as described above, where C is an
observed context and G is a system goal. In this case the probability value of the implication
provides the “scoring function” that MOSES uses to assess the quality of candidate procedures.
Fig. 8.4: High-Level Control Flow of MOSES Algorithm
For example, suppose a CogPrime-controlled robot is trying to learn to play the game
of “tag” (i.e., a multi-agent game in which one agent is specially labeled “it” and runs after
the other player agents, trying to touch them; once another agent is touched, it becomes the
new “it” and the previous “it” becomes just another player agent). Then its context C is that
others are trying to play a game they call “tag” with it; and we may assume its goals are to
please them and itself, and that it has figured out that in order to achieve this goal it should
learn some procedure to follow when interacting with others who have said they are playing
“tag.” In this case a potential tag-playing procedure might contain nodes for physical actions
like step_forward(speed s), as well as control flow nodes containing operators like if-else
(for instance, there would probably be a conditional telling the robot to do something different
depending on whether someone seems to be chasing it). Each of these program tree nodes would
have an appropriate knob assigned to it. And the scoring function would evaluate a procedure
P in terms of how successfully the robot played tag when controlling its behaviors according to
P (noting that it may also be using other control procedures concurrently with P). It’s worth
noting here that evaluating the scoring function in this case involves some inference already,
because in order to tell if it is playing tag successfully, in a real-world context, it must watch
and understand the behavior of the other players.
MOSES follows the high-level control flow depicted in Figure 8.4, which corresponds to the
following process for evolving a metapopulation of “demes” of programs (each deme being a set
of relatively similar programs, forming a sort of island in program space):
1. Construct an initial set of knobs based on some prior (e.g., based on an empty program;
or more interestingly, using prior knowledge supplied by PLN inference based on the
system’s memory) and use it to generate an initial random sampling of programs. Add this
deme to the metapopulation.
2. Select a deme from the metapopulation and update its sample, as follows:
a. Select some promising programs from the deme’s existing sample to use for modeling,
according to the scoring function.
b. Considering the promising programs as collections of knob settings, generate new collections
of knob settings by applying some (competent) optimization algorithm. For best
performance on difficult problems, it is important to use an optimization algorithm that
makes use of the system’s memory in its choices, consulting PLN inference to help
estimate which collections of knob settings will work best.
c. Convert the new collections of knob settings into their corresponding programs, reduce
the programs to normal form, evaluate their scores, and integrate them into the
deme’s sample, replacing less promising programs. In the case that scoring is expensive,
score evaluation may be preceded by score estimation, which may use PLN inference,
enaction of procedures in an internal simulation environment, and/or similarity
matching against episodic memory.
3. For each new program that meets the criterion for creating a new deme, if any:
a. Construct a new set of knobs (a process called “representation-building”) to define a
region centered around the program (the deme’s exemplar), and use it to generate a
new random sampling of programs, producing a new deme.
b. Integrate the new deme into the metapopulation, possibly displacing less promising
demes.
4. Repeat from step 2.
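The four steps can be traced in a toy Python sketch. It is not MOSES: programs are reduced to fixed-length bit vectors of knob settings, “representation-building” is replaced by sampling a small neighborhood around an exemplar, Step 2b uses plain mutation rather than a competent optimizer (and no PLN guidance), normal-form reduction and score estimation are omitted, and the “one-max” scoring function in the usage line is hypothetical.

```python
import random
from typing import Callable, List, Tuple

Program = Tuple[int, ...]             # toy program: a vector of binary knob settings
Deme = Tuple[Program, List[Program]]  # (exemplar, sample of nearby programs)


def sample_around(exemplar: Program, size: int, rng: random.Random,
                  flips: int = 2) -> List[Program]:
    """Stand-in for representation-building: perturb `flips` knobs of the exemplar."""
    sample = []
    for _ in range(size):
        bits = list(exemplar)
        for i in rng.sample(range(len(bits)), flips):
            bits[i] ^= 1
        sample.append(tuple(bits))
    return sample


def moses_sketch(score: Callable[[Program], float], length: int = 12,
                 generations: int = 60, deme_size: int = 20,
                 max_demes: int = 4, seed: int = 0) -> Program:
    rng = random.Random(seed)
    # Step 1: initial knobs from an "empty" program, plus a random sample around it.
    empty: Program = (0,) * length
    metapopulation: List[Deme] = [(empty, sample_around(empty, deme_size, rng))]
    best = max(metapopulation[0][1], key=score)
    for _ in range(generations):  # Step 4: repeat
        # Step 2: select a deme (here, the one with the best exemplar) and update it.
        exemplar, sample = max(metapopulation, key=lambda d: score(d[0]))
        promising = sorted(sample, key=score, reverse=True)[:deme_size // 4]  # 2a
        offspring = [sample_around(p, 1, rng, flips=1)[0] for p in promising]  # 2b
        sample[:] = sorted(sample + offspring, key=score, reverse=True)[:deme_size]  # 2c
        # Step 3: a program that beats its deme's exemplar seeds a new deme.
        top = sample[0]
        if score(top) > score(exemplar):
            metapopulation.append((top, sample_around(top, deme_size, rng)))
            metapopulation.sort(key=lambda d: score(d[0]), reverse=True)
            del metapopulation[max_demes:]  # displace less promising demes
        best = max(best, top, key=score)
    return best


result = moses_sketch(score=sum)  # toy "one-max" score: count knobs set to 1
```

In the real algorithm, the places marked 1, 2b and 2c are exactly where PLN, simulation and episodic memory would be consulted.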
MOSES is a complex algorithm and each part plays its role; if any one part is removed the
performance suffers significantly [Loo06]. However, the main point we want to highlight here is
the role played by synergetic interactions between MOSES and other cognitive components such
as PLN, simulation and episodic memory, as indicated in Steps 1, 2b and 2c of the above pseudocode.
MOSES is a powerful procedure learning algorithm, but used on its own it runs into scalability
problems like any other such algorithm; the reason we feel it has potential to play a major role
in a human-level AI system is its capacity for productive interoperation with other cognitive
components.
Continuing the “tag” example, the power of MOSES’s integration with other cognitive processes
would come into play if, before learning to play tag, the robot has already played simpler
games involving chasing. If the robot already has experience chasing and being chased by other
agents, then its episodic and declarative memory will contain knowledge about how to pursue
and avoid other agents in the context of running around an environment full of objects, and this
knowledge will be deployable within the appropriate parts of MOSES’s Steps 1 and 2. Cross-process
and cross-memory-type integration make it tractable for MOSES to act as a “transfer
learning” algorithm, not just a task-specific machine-learning algorithm.
8.6.2 Cognitive Synergy in PLN
While MOSES handles much of CogPrime’s procedural learning, and CogPrime’s internal
simulation engine handles most episodic knowledge, CogPrime’s primary tool for handling
declarative knowledge is an uncertain inference framework called Probabilistic Logic Networks
(PLN). The complexities of PLN are the topic of a lengthy technical monograph [GMIH08], and
here we will eschew most details and focus mainly on pointing out how PLN seeks to achieve
efficient inference control via integration with other cognitive processes.
As a logic, PLN is broadly integrative: it combines certain term logic rules with more standard
predicate logic rules, and utilizes both fuzzy truth values and a variant of imprecise probabilities
called indefinite probabilities. PLN mathematics tells how these uncertain truth values propagate
through its logic rules, so that uncertain premises give rise to conclusions with reasonably
accurately estimated uncertainty values. This careful management of uncertainty is critical for
the application of logical inference in the robotics context, where most knowledge is abstracted
from experience and is hence highly uncertain.
PLN can be used in either forward or backward chaining mode; and in the language introduced
above, it can be used for either analysis or synthesis. As an example, we will consider
backward chaining analysis, exemplified by the problem of a robot preschool-student trying to
determine whether a new playmate “Bob” is likely to be a regular visitor to its preschool or not
(evaluating the truth value of the implication Bob → regular_visitor). The basic backward
chaining process for PLN analysis looks like:
1. Given an implication L ≡ A → B whose truth value must be estimated (for instance
L ≡ C&P → G as discussed above), create a list (A_1, ..., A_n) of (inference rule, stored
knowledge) pairs that might be used to produce L.
2. Using analogical reasoning from prior inferences, assign each A_i a probability of success.
• If some of the A_i are estimated to have a reasonable probability of success at generating
reasonably confident estimates of L’s truth value, then invoke Step 1 with A_i in place
of L (at this point the inference process becomes recursive).
• If none of the A_i looks sufficiently likely to succeed, then inference has “gotten stuck”
and another cognitive process should be invoked, e.g.
– Concept creation may be used to infer new concepts related to A and B, and then
Step 1 may be revisited, in the hope of finding a new, more promising A_i involving
one of the new concepts.
– MOSES may be invoked with one of several special goals, e.g. the goal of finding
a procedure P such that P(X) predicts whether X → B. If MOSES finds such a
procedure P, then it can be converted to declarative knowledge understandable
by PLN, and Step 1 may be revisited.
– Simulations may be run in CogPrime’s internal simulation engine, so as to observe
the truth value of A → B in the simulations; then Step 1 may be revisited.
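The control flow above can be sketched in Python. The sketch is hypothetical throughout, under these assumptions:

- truth values are single strengths in [0, 1] rather than PLN's indefinite probabilities;
- each rule combines its premises with a plain product, which is not PLN's deduction formula;
- success probabilities are supplied directly instead of being estimated by analogy;
- `Candidate`, `backward_chain`, `moses_fallback` and the fallback signature are invented names.

Fallbacks stand in for concept creation, MOSES or simulation. Each may add knowledge or candidates and return `True`, after which Step 1 is revisited.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Candidate:
    """Hypothetical (inference rule, stored knowledge) pair that might produce a target."""
    name: str
    premises: List[str]                      # sub-targets the rule needs
    success_prob: float                      # stand-in for the analogy-based estimate
    combine: Callable[[List[float]], float]  # toy truth-value formula for the rule

KB = Dict[str, float]
Candidates = Dict[str, List[Candidate]]
Fallback = Callable[[str, KB, Candidates], bool]

def backward_chain(target: str, kb: KB, candidates: Candidates,
                   fallbacks: List[Fallback], threshold: float = 0.3,
                   depth: int = 3) -> Optional[float]:
    """Estimate a strength for `target`, deferring to other processes when stuck."""
    if target in kb:
        return kb[target]
    if depth == 0:
        return None
    # Step 2: keep candidates with a reasonable estimated chance, most promising first.
    promising = sorted((c for c in candidates.get(target, [])
                        if c.success_prob >= threshold),
                       key=lambda c: c.success_prob, reverse=True)
    for cand in promising:
        # Recursion: each premise becomes a new target for Step 1.
        values = [backward_chain(p, kb, candidates, fallbacks, threshold, depth - 1)
                  for p in cand.premises]
        if all(v is not None for v in values):
            return cand.combine(values)
    # Stuck: another process may enrich kb or candidates; if so, revisit Step 1.
    for i, fallback in enumerate(fallbacks):
        if fallback(target, kb, candidates):
            return backward_chain(target, kb, candidates, fallbacks[i + 1:],
                                  threshold, depth)
    return None

def toy_deduction(values: List[float]) -> float:
    """Multiply premise strengths (a toy rule, not PLN's deduction formula)."""
    result = 1.0
    for v in values:
        result *= v
    return result

def moses_fallback(target: str, kb: KB, candidates: Candidates) -> bool:
    """Hypothetical stand-in for MOSES learning what predicts regular visitors."""
    if target != "Bob -> regular_visitor":
        return False
    kb["older_male_in_tie -> regular_visitor"] = 0.2
    candidates.setdefault(target, []).append(Candidate(
        "deduction via learned rule",
        ["Bob -> older_male_in_tie", "older_male_in_tie -> regular_visitor"],
        0.8, toy_deduction))
    return True

kb = {"Bob -> older_male_in_tie": 0.9}
candidates = {"Bob -> regular_visitor": [Candidate(
    "deduction via friend_of_teacher",
    ["Bob -> friend_of_teacher", "friend_of_teacher -> regular_visitor"],
    0.1, toy_deduction)]}
strength = backward_chain("Bob -> regular_visitor", kb, candidates, [moses_fallback])
```

Here the only initial candidate falls below the 0.3 threshold, so inference is stuck. The MOSES stand-in then adds a learned rule, and revisiting Step 1 yields a strength of about 0.18.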
The combinatorial explosion of inference control is combatted by the capability to defer to
other cognitive processes when the inference control procedure is unable to make a sufficiently
confident choice of which inference steps to take next. Note that just as MOSES may rely
on PLN to model its evolving populations of procedures, PLN may rely on MOSES to create
complex knowledge about the terms in its logical implications. This is just one example of the
multiple ways in which the different cognitive processes in CogPrime interact synergetically; a
more thorough treatment of these interactions is given in Chapter 49.
In the “new playmate” example, the interesting case is where the robot initially seems not
to know enough about Bob to make a solid inferential judgment (so that none of the A_i seem
particularly promising). For instance, it might carry out a number of possible inferences and not
come to any reasonably confident conclusion, so that the reason none of the A_i seem promising
is that all the decent-looking ones have been tried already. So it might then resort to MOSES,
simulation or concept creation.
For instance, the PLN controller could make a list of everyone who has been a regular
visitor, and everyone who has not been, and pose MOSES the task of figuring out a procedure
for distinguishing these two categories. This procedure could then be used directly to make the
needed assessment, or else be translated into logical rules to be used within PLN inference. For
example, perhaps MOSES would discover that older males wearing ties tend not to become
regular visitors. If the new playmate is an older male wearing a tie, this is directly applicable.
But if the new playmate is wearing a tuxedo, then PLN may be helpful via reasoning that
even though a tuxedo is not a tie, it’s a similar form of fancy dress – so PLN may extend the
MOSES-learned rule to the present case and infer that the new playmate is not likely to be a
regular visitor.
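The tie-to-tuxedo step can be illustrated with a deliberately crude numerical sketch. It is not PLN's similarity or analogy machinery. The assumptions are:

- similarity is the Jaccard overlap of hand-picked features;
- the extension rule simply interpolates between the learned rule's strength and a prior;
- the feature sets, the 0.2 rule strength and the 0.5 prior are all hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Feature-overlap similarity in [0, 1]; a crude stand-in for PLN similarity."""
    return len(a & b) / len(a | b) if a | b else 1.0

def extend_rule(rule_strength: float, prior: float, similarity: float) -> float:
    """Shrink a learned rule's conclusion toward the prior as the new case grows
    less similar to the case it was learned on (toy formula, not PLN's)."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return prior + similarity * (rule_strength - prior)

# Hypothetical feature sets and numbers, for illustration only.
TIE = {"formal", "neckwear", "worn_by_adults"}
TUXEDO = {"formal", "jacket", "worn_by_adults"}
sim = jaccard(TIE, TUXEDO)   # 2 shared features out of 4 in total
p_regular = extend_rule(rule_strength=0.2, prior=0.5, similarity=sim)
```

With these numbers the estimated chance that the tuxedo-wearing playmate becomes a regular visitor is about 0.35. That is weaker evidence than the 0.2 for an actual tie-wearer, but still below the 0.5 prior.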
8.7 Is Cognitive Synergy Tricky?
1
In this section we use the notion of cognitive synergy to explore a question that arises
frequently in the AGI community: the well-known difficulty of measuring intermediate progress
toward human-level AGI. We explore some potential reasons underlying this by extending the
notion of cognitive synergy to a more refined notion of "tricky cognitive synergy." These ideas
are particularly relevant to the problem of creating a roadmap toward AGI, as we’ll explore in
Chapter 17 below.
8.7.1 The Puzzle: Why Is It So Hard to Measure Partial Progress Toward Human-Level AGI?
It’s not entirely straightforward to create tests to measure the final achievement of human-level
AGI, but there are some fairly obvious candidates here. There’s the Turing Test (fooling judges
into believing you’re human, in a text chat), the video Turing Test, the Robot College Student
test (graduating from university while being judged exactly as a human student would be), etc.
There’s certainly no agreement on which is the most meaningful such goal to strive for, but
there’s broad agreement that a number of goals of this nature basically make sense.
On the other hand, how does one measure whether one is, say, 50 percent of the way to
human-level AGI? Or, say, 75 or 25 percent?
It’s possible to pose many "practical tests" of incremental progress toward human-level AGI,
with the property that if a proto-AGI system passes the test using a certain sort of architecture
and/or dynamics, then this implies a certain amount of progress toward human-level AGI based
on particular theoretical assumptions about AGI. However, in each case of such a practical test,
it seems intuitively likely to a significant percentage of AGI researchers that there is some way
to "game" the test by designing a system specifically oriented toward passing it, in a way that
doesn’t constitute dramatic progress toward AGI.
Some examples of practical tests of this nature are:
1 This section co-authored with Jared Wigmore
• The Wozniak "coffee test": go into an average American house and figure out how to make
coffee, including identifying the coffee machine, figuring out what the buttons do, finding
the coffee in the cabinet, etc.
• Story understanding – reading a story, or watching it on video, and then answering questions
about what happened (including questions at various levels of abstraction)
• Graduating (virtual-world or robotic) preschool
• Passing the elementary school reading curriculum (which involves reading and answering
questions about some picture books as well as purely textual ones)
• Learning to play an arbitrary video game based on experience only, or based on experience
plus reading instructions
One interesting point about tests like this is that each of them seems to some AGI researchers
to encapsulate the crux of the AGI problem, and be unsolvable by any system not far along
the path to human-level AGI – yet seems to other AGI researchers, with different conceptual
perspectives, to be something probably game-able by narrow-AI methods. And of course, given
the current state of science, there’s no way to tell which of these practical tests really can be
solved via a narrow-AI approach, except by having a lot of people try really hard over a long
period of time.
A question raised by these observations is whether there is some fundamental reason why
it’s hard to make an objective, theory-independent measure of intermediate progress toward
advanced AGI. Is it just that we haven’t been smart enough to figure out the right test – or is
there some conceptual reason why the very notion of such a test is problematic?
We don’t claim to know for sure – but in the rest of this section we’ll outline one possible
reason why the latter might be the case.
8.7.2 A Possible Answer: Cognitive Synergy is Tricky!
Why might a solid, objective empirical test for intermediate progress toward AGI be an infeasible
notion? One possible reason, we suggest, is precisely cognitive synergy, as discussed
above.
The cognitive synergy hypothesis, in its simplest form, states that human-level AGI intrinsically
depends on the synergetic interaction of multiple components (for instance, as in
CogPrime, multiple memory systems each supplied with its own learning process). In this hypothesis,
for instance, it might be that there are 10 critical components required for a human-level
AGI system. Having all 10 of them in place results in human-level AGI, but having only
8 of them in place results in having a dramatically impaired system – and maybe having only
6 or 7 of them in place results in a system that can hardly do anything at all.
Of course, the reality is almost surely not as strict as the simplified example in the above
paragraph suggests. No AGI theorist has really posited a list of 10 crisply-defined subsystems
and claimed them necessary and sufficient for AGI. We suspect there are many different routes
to AGI, involving integration of different sorts of subsystems. However, if the cognitive synergy
hypothesis is correct, then human-level AGI behaves roughly like the simplistic example in the
prior paragraph suggests. Perhaps instead of using the 10 components, you could achieve human-level
AGI with 7 components, but having only 5 of these 7 would yield drastically impaired
functionality – etc. Or the point could be made without any decomposition into a finite set
of components, using continuous probability distributions. Formalizing the cognitive synergy
hypothesis mathematically would be complex, but here we’re only aiming for a qualitative
argument, so we’ll stick with the "10 components" example for communicative simplicity.
Next, let’s suppose that for any given task, there are ways to achieve this task using a system
that is much simpler than any subset of size 6 drawn from the set of 10 components needed
for human-level AGI, but works much better for the task than this subset of 6 components
(assuming the latter are used as a set of only 6 components, without the other 4 components).
Note that this supposition is a good bit stronger than mere cognitive synergy. For lack of
a better name, we’ll call it tricky cognitive synergy. The tricky cognitive synergy hypothesis
would be true if, for example, the following possibilities were true:
• creating components to serve as parts of a synergetic AGI is harder than creating components
intended to serve as parts of simpler AI systems without synergetic dynamics
• components capable of serving as parts of a synergetic AGI are necessarily more complicated
than components intended to serve as parts of simpler AGI systems.
These certainly seem reasonable possibilities, since to serve as a component of a synergetic AGI
system, a component must have the internal flexibility to usefully handle interactions with a lot
of other components as well as to solve the problems that come its way. In a CogPrime context,
these possibilities ring true, in the sense that tailoring an AI process for tight integration with
other AI processes within CogPrime tends to require more work than preparing a conceptually
similar AI process for use on its own or in a more task-specific narrow AI system.
It seems fairly obvious that, if tricky cognitive synergy really holds up as a property of
human-level general intelligence, the difficulty of formulating tests for intermediate progress
toward human-level AGI follows as a consequence: according to the tricky cognitive synergy
hypothesis, any test is going to be more easily solved by some simpler narrow AI process
than by a partially complete human-level AGI system.
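A toy numerical model may help make the argument concrete. Everything in it is an assumption chosen for illustration, not a measurement:

- the task performance of a partial AGI is modeled as a steep logistic function of how many of the 10 components are present;
- a narrow AI built for that task is assumed to score 0.6;
- an additive model without synergy is included for contrast.

```python
import math

N_COMPONENTS = 10    # the illustrative "10 critical components"
NARROW_SCORE = 0.6   # assumed score of a simple narrow AI built for one task

def synergetic_score(k: int, n: int = N_COMPONENTS, steepness: float = 2.0) -> float:
    """Toy synergy model: performance stays low until nearly all n parts exist."""
    return 1.0 / (1.0 + math.exp(-steepness * (k - (n - 1.5))))

def additive_score(k: int, n: int = N_COMPONENTS) -> float:
    """Contrast case without synergy: each component contributes equally."""
    return k / n

def beats_narrow(score) -> list:
    """Component counts k at which the partial system outscores the narrow one."""
    return [k for k in range(N_COMPONENTS + 1) if score(k) > NARROW_SCORE]

print(beats_narrow(additive_score))     # [7, 8, 9, 10]
print(beats_narrow(synergetic_score))   # [9, 10]
```

Under the additive model, a system with 7 of the 10 components already beats the narrow AI, so partial progress would be visible. Under the synergetic model only 9 or 10 components do. A test that the narrow system passes therefore says little about a system with 6, 7 or 8 components.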
8.7.3 Conclusion
We haven’t proved anything here, only made some qualitative arguments. However, these arguments
do seem to give a plausible explanation for the empirical observation that positing tests
for intermediate progress toward human-level AGI is a very difficult prospect. If the theoretical
notions sketched here are correct, then this difficulty is not due to incompetence or lack
of imagination on the part of the AGI community, nor due to the primitive state of the AGI
field, but is rather intrinsic to the subject matter. And if these notions are correct, then quite
likely the future rigorous science of AGI will contain formal theorems echoing and improving
the qualitative observations and conjectures we’ve made here.
If the ideas sketched here are true, then the practical consequence for AGI development
is, very simply, that one shouldn’t worry a lot about producing intermediary results that are
compelling to skeptical observers. Just as 2/3 of a human brain may not be of much use,
2/3 of an AGI system may not be of much use either. Lack of impressive intermediary results
may not imply one is on a wrong development path; and comparison with narrow AI systems on
specific tasks may be badly misleading as a gauge of incremental progress toward human-level
AGI.
Hopefully it’s clear that the motivation behind the line of thinking presented here is a desire
to understand the nature of general intelligence and its pursuit – not a desire to avoid testing our
AGI software! Really, as AGI engineers, we would love to have a sensible rigorous way to test our
intermediary progress toward AGI, so as to be able to pose convincing arguments to skeptics,
funding sources, potential collaborators and so forth. Our motivation here is not a desire to
avoid having the intermediate progress of our efforts measured, but rather a desire to explain
the frustrating (but by now rather well-established) difficulty of creating such intermediate
goals for human-level AGI in a meaningful way.
If we or someone else figures out a compelling way to measure partial progress toward AGI,
we will celebrate the occasion. But it seems worth seriously considering the possibility that the
difficulty in finding such a measure reflects fundamental properties of general intelligence.
From a practical CogPrime perspective, we are interested in a variety of evaluation and
testing methods, including the "virtual preschool" approach mentioned briefly above and more
extensively in later chapters. However, our focus will be on evaluation methods that give us
meaningful information about CogPrime’s progress, given our knowledge of how CogPrime
works and our understanding of the underlying theory. We are unlikely to focus on the achievement
of intermediate test results capable of convincing skeptics of the reality of our partial
progress, because we have not yet seen any credible tests of this nature, and because we suspect
the reasons for this lack may be rooted in deep properties of feasible general intelligence, such
as tricky cognitive synergy.
Chapter 9
General Intelligence in the Everyday Human World
9.1 Introduction
Intelligence is not just about what happens inside a system, but also about what happens outside
that system, and how the system interacts with its environment. Real-world general intelligence
is about intelligence relative to some particular class of environments, and human-like general
intelligence is about intelligence relative to the particular class of environments that humans
evolved in (which in recent millennia has included environments humans have created using
their intelligence). In Chapter 2, we reviewed some specific capabilities characterizing human-like
general intelligence; to connect these with the general theory of general intelligence from the
last few chapters, we need to explain what aspects of human-relevant environments correspond
to these human-like intelligent capabilities. We begin with aspects of the environment related
to communication, which turn out to tie in closely with cognitive synergy. Then we turn to
physical aspects of the environment, which we suspect also connect closely with various human
cognitive capabilities. Finally we turn to physical aspects of the human body and their relevance
to the human mind. In the following chapter we present a deeper, more abstract theoretical
framework encompassing these ideas.
These ideas are of theoretical importance, and they’re also of practical importance when one
turns to the critical area of AGI environment design. If one is going to do anything besides
release one’s young AGI into the “wilds” of everyday human life, then one has to put some
thought into what kind of environment it will be raised in. This may be a virtual world or it
may be a robot preschool or some other kind of physical environment, but in any case some
specific choices must be made about what to include. Specific choices must also be made about
what kind of body to give one’s AGI system – what sensors and actuators, and so forth. In
Chapter 16 we will present some specific suggestions regarding choices of embodiment and
environment that we find to be ideal for AGI development – virtual and robot preschools – but
the material in this chapter is of more general import, beyond any such particularities. If one
has an intuitive idea of what properties of body and world human intelligence is biased for,
then one can make practical choices about embodiment and environment in a principled rather
than purely ad hoc or opportunistic way.
9.2 Some Broad Properties of the Everyday World That Help Structure Intelligence
The properties of the everyday world that help structure intelligence are diverse and span
multiple levels of abstraction. Most of this chapter will focus on fairly concrete patterns of this
nature, such as are involved in inter-agent communication and naive physics; however, it’s also
worth noting the potential importance of more abstract patterns distinguishing the everyday
world from arbitrary mathematical environments.
The propensity to search for hierarchical patterns is one huge potential example of an abstract
everyday-world property. We strongly suspect the reason that searching for hierarchical
patterns works so well, in so many everyday-world contexts, lies in the particular structure of
the everyday world – it’s not something that would be true across all possible environments
(even if one weights the space of possible environments in some clever way, say using program-length
according to some standard computational model). However, this sort of assertion is of
course highly “philosophical,” and becomes complex to formulate and defend convincingly given
the current state of science and mathematics.
Going one step further, we recall from Chapter 3 a structure called the “dual network”, which
consists of superposed hierarchical and heterarchical networks: basically a hierarchy in which
the distance between two nodes in the hierarchy is correlated with the distance between the
nodes in some metric space. Another high level property of the everyday world may be that dual
network structures are prevalent. This would imply that minds biased to represent the world in
terms of dual network structure are likely to be intelligent with respect to the everyday world.
In a different direction, the extreme commonality of symmetry groups in the (everyday and
otherwise) physical world is another example: they occur so often that minds oriented toward
recognizing patterns involving symmetry groups are likely to be intelligent with respect to the
real world.
We suspect that the number of cognitively-relevant properties of the everyday world is huge
... and that the essence of everyday-world intelligence lies in the list of varyingly abstract and
concrete properties, which must be embedded implicitly or explicitly in the structure of a natural
or artificial intelligence for that system to have everyday-world intelligence.
Apart from these particular yet abstract properties of the everyday world, intelligence is just
about “finding patterns in which actions tend to achieve which goals in which situations” ... but,
the simple meta-algorithm needed to accomplish this universally is, we suggest, only a small
percentage of what it takes to make a mind.
You might say that a sufficiently generally intelligent system should be able to infer the
various cognitively-relevant properties of the environment from looking at data about the everyday
world. We agree in principle, and in fact Ben Kuipers and his colleagues have done
some interesting work in this direction, showing that learning algorithms can infer some basics
about the structure of space and time from experience [MK07]. But we suggest that doing this
really thoroughly would require a massively greater amount of processing power than an AGI
that embodies and hence automatically utilizes these principles. It may be that the problem of
inferring these properties is so hard as to require a wildly infeasible AIXItl / Gödel Machine
type system.
9.3 Embodied Communication
Next we turn to the potential cognitive implications of seeking to achieve goals in an environment
in which multimodal communication with other agents plays a prominent role.
Consider a community of embodied agents living in a shared world, and suppose that the
agents can communicate with each other via a set of mechanisms including:
• Linguistic communication, in a language whose semantics is largely (not necessarily
wholly) interpretable based on the mutually experienced world
• Indicative communication, in which e.g. one agent points to some part of the world or
delimits some interval of time, and another agent is able to interpret the meaning
• Demonstrative communication, in which an agent carries out a set of actions in the
world, and the other agent is able to imitate these actions, or instruct another agent as to
how to imitate these actions
• Depictive communication, in which an agent creates some sort of (visual, auditory, etc.)
construction to show another agent, with a goal of causing the other agent to experience
phenomena similar to what they would experience upon experiencing some particular entity
in the shared environment
• Intentional communication, in which an agent explicitly communicates to another agent
what its goal is in a certain situation 1
It is clear that ordinary everyday communication between humans possesses all these aspects.
We define the Embodied Communication Prior (ECP) as the probability distribution in
which the probability of an entity (e.g. a goal or environment) is inversely related to the difficulty of
describing that entity, for a typical member of the community in question, using a particular set
of communication mechanisms including the above five modes. We will sometimes refer to the
prior probability of an entity under this distribution as its “simplicity” under the distribution.
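One way to make this definition concrete is sketched below in Python. It goes beyond what the text specifies by assuming that:

- each entity has a description length in bits under each communication mode available for it;
- its difficulty is the shortest of these, ignoring mixed-mode descriptions;
- probability falls off as 2 to the power of minus the difficulty, a Solomonoff-style weighting chosen here for illustration.

This reads the prior as favoring entities that are easier to describe, consistent with calling an entity's prior probability its “simplicity”. The entities and bit counts are hypothetical.

```python
from typing import Dict

MODES = ("linguistic", "indicative", "demonstrative", "depictive", "intentional")

def ecp_prior(costs: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Toy ECP: normalized prior over entities, favoring those cheaper to communicate.

    costs[entity][mode] is an assumed description length in bits; modes missing
    for an entity are treated as unavailable for it."""
    weights = {}
    for entity, by_mode in costs.items():
        difficulty = min(bits for mode, bits in by_mode.items() if mode in MODES)
        weights[entity] = 2.0 ** (-difficulty)
    total = sum(weights.values())
    return {entity: w / total for entity, w in weights.items()}

costs = {  # hypothetical description lengths in bits, per available mode
    "fetch the red ball": {"linguistic": 12, "indicative": 14, "demonstrative": 20},
    "tie a shoelace": {"linguistic": 40, "demonstrative": 14},
    "prove a theorem": {"linguistic": 30},
}
prior = ecp_prior(costs)
```

Because "fetch the red ball" has difficulty 12 and "tie a shoelace" has difficulty 14, the first receives 2^2 = 4 times the prior mass of the second. Note that the shoelace's cheapest description is demonstrative, not linguistic.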
Next, to further specialize the Embodied Communication Prior, we will assume that for
each of these modes of communication, there are some aspects of the world that are much
more easily communicable using that mode than the other modes. For instance, in the human
everyday world:
• Abstract (declarative) statements spanning large classes of situations are generally much
easier to communicate linguistically
• Complex, multi-part procedures are much easier to communicate either demonstratively, or
using a combination of demonstration with other modes
• Sensory or episodic data is often much easier to communicate depictively
• The current value of attending to some portion of the shared environment is often much
easier to communicate indicatively
• Information about what goals to follow in a certain situation is often much easier to communicate
intentionally, i.e. via explicitly indicating what one’s own goal is
These simple observations have significant implications for the nature of the Embodied Communication
Prior. For one thing they let us define multiple forms of knowledge:
• Isolatedly declarative knowledge is that which is much more easily communicable linguistically
1 In Appendix ?? we recount some interesting recent results showing that mirror neurons fire in response to
some cases of intentional communication as thus defined.
• Isolatedly procedural knowledge is that which is much more easily communicable
demonstratively
• Isolatedly sensory knowledge is that which is much more easily communicable depictively
• Isolatedly attentive knowledge is that which is much more easily communicable indicatively
• Isolatedly intentional knowledge is that which is much more easily communicable intentionally
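The phrase "much more easily communicable" can be operationalized by a ratio test on per-mode communication costs. This is an assumption of the sketch, not a definition from the text. An item is isolatedly of the type associated with its cheapest mode if that mode is at least `ratio` times cheaper than every other available mode; otherwise it is treated as mixed. The cost figures and the default ratio of 4 are hypothetical. Distinguishing interactive from fully mixed knowledge would need costs for mixed-mode descriptions, which this sketch does not model.

```python
MODE_TO_KNOWLEDGE = {
    "linguistic": "declarative",
    "demonstrative": "procedural",
    "depictive": "sensory",
    "indicative": "attentive",
    "intentional": "intentional",
}

def isolated_category(costs: dict, ratio: float = 4.0) -> str:
    """Classify a knowledge item from its per-mode communication costs.

    Modes absent from `costs` are treated as unusable. The item is "isolatedly X"
    if its cheapest mode beats every other available mode by the factor `ratio`."""
    best = min(costs, key=costs.get)
    if all(costs[best] * ratio <= c for m, c in costs.items() if m != best):
        return "isolatedly " + MODE_TO_KNOWLEDGE[best]
    return "mixed"

print(isolated_category({"linguistic": 10, "demonstrative": 60, "depictive": 80}))
# isolatedly declarative
print(isolated_category({"linguistic": 20, "demonstrative": 25}))
# mixed
```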
This categorization of knowledge types resembles many ideas from the cognitive theory of
memory [TC05], although the distinctions drawn here are a little crisper than any classification
currently derivable from available neurological or psychological data.
Of course there may be much knowledge, of relevance to systems seeking intelligence according
to the ECP, that does not fall into any of these categories and constitutes “mixed knowledge.”
There are some very important specific subclasses of mixed knowledge. For instance, episodic
knowledge (knowledge about specific real or hypothetical sets of events) will most easily be
communicated via a combination of declarative, sensory and (in some cases) procedural communication.
Scientific and mathematical knowledge are generally mixed knowledge, as is most
everyday commonsense knowledge.
Some cases of mixed knowledge are reasonably well decomposable, in the sense that they
decompose into knowledge items that individually fall into some specific knowledge type. For
instance, an experimental chemistry procedure may be much more easily communicable procedurally,
whereas an allied piece of knowledge from theoretical chemistry may be much more
easily communicable declaratively; but in order to fully communicate either the experimental
procedure or the abstract piece of knowledge, one may ultimately need to communicate both
aspects.
Also, even when the best way to communicate something is mixed-mode, it may be possible
to identify one mode that carries the most important part of the communication. An example
would be a chemistry experiment that is best communicated via a practical demonstration
together with a running narrative. It may be that the demonstration without the narrative
would be vastly more valuable than the narrative without the demonstration. To cover such
cases we may make less restrictive definitions such as
• Interactively declarative knowledge is that which is much more easily communicable
in a manner dominated by linguistic communication
and so forth. We call these “interactive knowledge categories,” by contrast to the “isolated
knowledge categories” introduced earlier.
9.3.0.1 Naturalness of Knowledge Categories
Next we introduce an assumption we call NKC, for Naturalness of Knowledge Categories.
The NKC assumption states that the knowledge in each of the above isolated and interactive
communication-modality-focused categories forms a “natural category,” in the sense that
for each of these categories, there are many different properties shared by a large percentage of
the knowledge in the category, but not by a large percentage of the knowledge in the other categories.
This means that, for instance, procedural knowledge systematically (and statistically)
has different characteristics than the other kinds of knowledge.
The NKC assumption seems commonsensically to hold true for human everyday knowledge,
and it has fairly dramatic implications for general intelligence. Suppose we conceive general
intelligence as the ability to achieve goals in the environment shared by the communicating
agents underlying the Embodied Communication Prior. Then, NKC suggests that the best way
to achieve general intelligence according to the Embodied Communication Prior is going to
involve
• specialized methods for handling declarative, procedural, sensory and attentional knowledge
(due to the naturalness of the isolated knowledge categories)
• specialized methods for handling interactions between different types of knowledge, including
methods focused on the case where one type of knowledge is primary and the others are
supporting (the latter due to the naturalness of the interactive knowledge categories)
9.3.0.2 Cognitive Completeness
Suppose we conceive an AI system as consisting of a set of learning capabilities, each one
characterized by three features:
• One or more knowledge types that it is competent to deal with, in the sense of the two
key learning problems mentioned above
• At least one learning type: either analysis, or synthesis, or both
• At least one interaction type, for each (knowledge type, learning type) pair it handles:
“isolated” (meaning it deals mainly with that knowledge type in isolation), or “interactive”
(meaning it focuses on that knowledge type but in a way that explicitly incorporates other
knowledge types into its process), or “fully mixed” (meaning that when it deals with the
knowledge type in question, no particular knowledge type tends to dominate the learning
process).
Then, intuitively, it seems to follow from the ECP with NKC that systems with high efficient
general intelligence should have the following properties, which collectively we’ll call cognitive
completeness:
• For each (knowledge type, learning type, interaction type) triple, there should be a learning
capability corresponding to that triple.
• Furthermore the capabilities corresponding to different (knowledge type, interaction type)
pairs should have distinct characteristics (since according to the NKC the isolated knowledge
corresponding to a knowledge type is a natural category, as is the dominant knowledge
corresponding to a knowledge type)
• For each (knowledge type, learning type) pair (K,L), and each other knowledge type K1
distinct from K, there should be a distinctive capability with interaction type “interactive”
and dealing with knowledge that is interactively K but also includes aspects of K1
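The first and third requirements can be checked mechanically once capabilities are tagged; the second, about distinct characteristics, cannot be checked this way. The sketch below rests on these assumptions:

- the five knowledge types are those named in the isolated categories above, including intentional;
- the learning types are analysis and synthesis, and the interaction types are isolated, interactive and fully mixed;
- capabilities use a hypothetical tuple encoding.

Under these assumptions cognitive completeness requires 5 × 2 × 3 = 30 triples plus 5 × 2 × 4 = 40 interactive capabilities tied to a secondary knowledge type.

```python
from itertools import product

KNOWLEDGE_TYPES = ("declarative", "procedural", "sensory", "attentive", "intentional")
LEARNING_TYPES = ("analysis", "synthesis")
INTERACTION_TYPES = ("isolated", "interactive", "fully mixed")

def required_capabilities():
    """Requirements from the first and third points above, as tuples."""
    # First point: one capability per (knowledge, learning, interaction) triple.
    triples = set(product(KNOWLEDGE_TYPES, LEARNING_TYPES, INTERACTION_TYPES))
    # Third point: for each (K, L) and each other type K1, an interactive
    # capability for knowledge that is interactively K with aspects of K1.
    interactive_pairs = {(k, l, "interactive", k1)
                         for k, l in product(KNOWLEDGE_TYPES, LEARNING_TYPES)
                         for k1 in KNOWLEDGE_TYPES if k1 != k}
    return triples, interactive_pairs

def missing_capabilities(capabilities):
    """Requirements not covered by the given capability tuples.

    A capability is (knowledge, learning, interaction), or
    (knowledge, learning, "interactive", secondary_knowledge)."""
    triples, interactive_pairs = required_capabilities()
    have = set(capabilities)
    covered_triples = {c[:3] for c in have}
    return (triples - covered_triples) | (interactive_pairs - have)

# A hypothetical system with only isolated learners for every (K, L) pair:
isolated_only = {(k, l, "isolated") for k, l in product(KNOWLEDGE_TYPES, LEARNING_TYPES)}
print(len(missing_capabilities(isolated_only)))   # 60
```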
Furthermore, it seems intuitively sensible that according to the ECP with NKC, if the capabilities
mentioned in the above points are reasonably competent, then the system possessing the
capabilities will display general intelligence relative to the ECP. Thus we arrive at the hypothesis
that
Under the assumption of the Embodied Communication Prior (with the Natural
Knowledge Categories assumption), the property above called “cognitive completeness”
is necessary and sufficient for efficient general intelligence at the level of an
intelligent adult human (e.g. at the Piagetian formal level [Pia53]).
Of course, the above considerations are very far from a rigorous mathematical proof (or
even precise formulation) of this hypothesis. But we are presenting this here as a conceptual
hypothesis, in order to qualitatively guide our practical AGI R&D and also to motivate further,
more rigorous theoretical work.
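The first and third conditions of cognitive completeness listed above amount to a coverage requirement, which can be sketched as a simple set computation. This is a minimal illustration under stated assumptions, not part of the original argument: the knowledge-type and learning-type lists are hypothetical placeholders, and the second condition (distinct characteristics) is not something a membership check can capture.

```python
from itertools import product

# Hypothetical placeholder taxonomies; the real lists come from the
# book's own classification of knowledge and learning types.
KNOWLEDGE_TYPES = ["declarative", "procedural", "episodic"]
LEARNING_TYPES = ["experiential", "instructed"]
INTERACTION_TYPES = ["isolated", "interactive", "fully mixed"]


def required_capabilities(knowledge, learning, interaction):
    """Enumerate the capabilities that cognitive completeness calls for."""
    # Condition 1: one capability per (knowledge, learning, interaction) triple.
    triples = set(product(knowledge, learning, interaction))
    # Condition 3: for each (K, L) pair and each other type K1, an
    # "interactive" capability for K-knowledge that includes aspects of K1.
    cross = {(k, l, "interactive", k1)
             for k, l in product(knowledge, learning)
             for k1 in knowledge if k1 != k}
    return triples | cross


def missing_capabilities(system_caps):
    """Return the required capabilities absent from a system's capability set."""
    required = required_capabilities(KNOWLEDGE_TYPES, LEARNING_TYPES,
                                     INTERACTION_TYPES)
    return required - set(system_caps)


required = required_capabilities(KNOWLEDGE_TYPES, LEARNING_TYPES, INTERACTION_TYPES)
print(len(required))  # 3*2*3 triples + 3*2*2 cross terms = 30

# A system with only "isolated" capabilities is far from complete.
partial = {(k, l, "isolated") for k, l in product(KNOWLEDGE_TYPES, LEARNING_TYPES)}
print(len(missing_capabilities(partial)))  # 30 - 6 = 24
```

The check treats each capability as present or absent; the hypothesis in the text also asks that each be "reasonably effective", which would require a graded score per entry rather than a set.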
9.3.1 Generalizing the Embodied Communication Prior
One interesting direction for further research would be to broaden the scope of the inquiry, in
a manner suggested above: instead of just looking at the ECP, look at simplicity measures in
general, and attack the question of how a mind must be structured in order to display efficient
general intelligence relative to a specified simplicity measure. This problem seems unapproachable
in general, but some special cases may be more tractable.
For instance, suppose one has
• a simplicity measure that (like the ECP) is approximately decomposable into a set of fairly
distinct components, plus their interactions
• an assumption similar to NKC, which states that the entities displaying simplicity according
to each of the distinct components are roughly clustered together in entity-space
Then one should be able to say that, to achieve efficient general intelligence relative to
this decomposable simplicity measure, a system should have distinct capabilities corresponding
to each of the components of the simplicity measure, and interactions between these capabilities
corresponding to the interaction terms in the simplicity measure.
With copious additional work, these simple observations could potentially serve as the seed for
a novel sort of theory of general intelligence – a theory of how the structure of a system depends
on the structure of the simplicity measure with which it achieves efficient general intelligence.
Cognitive Synergy Theory would then emerge as a special case of this more abstract theory.
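One way to picture such a decomposable measure, assuming (beyond what the text commits to) that the decomposition is exactly additive, is S(x) = Σ_i S_i(x) + Σ_{i<j} S_ij(x). The sketch below builds such a measure from hypothetical component functions and reads off one capability per component term and one coupling per interaction term; the component names, feature keys and toy scores are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Score = Callable[[dict], float]


def decomposable_simplicity(components: Dict[str, Score],
                            interactions: Dict[Tuple[str, str], Score]) -> Score:
    """Return S(x) = sum of component terms plus sum of interaction terms."""
    def simplicity(x: dict) -> float:
        return (sum(f(x) for f in components.values())
                + sum(g(x) for g in interactions.values()))
    return simplicity


def implied_capabilities(components: Dict[str, Score],
                         interactions: Dict[Tuple[str, str], Score]) -> List[str]:
    """One distinct capability per component, one coupling per interaction term."""
    caps = [f"capability:{name}" for name in components]
    caps += [f"coupling:{a}+{b}" for a, b in interactions]
    return caps


# Hypothetical two-component measure over toy feature dictionaries.
components = {
    "linguistic": lambda x: x["words"],
    "visual": lambda x: x["shapes"],
}
interactions = {
    ("linguistic", "visual"): lambda x: 0.5 * min(x["words"], x["shapes"]),
}

S = decomposable_simplicity(components, interactions)
print(S({"words": 2.0, "shapes": 4.0}))  # 2.0 + 4.0 + 0.5 * 2.0 = 7.0
print(implied_capabilities(components, interactions))
```

The structural point is only that the list of required capabilities is read directly off the terms of the measure; a measure that is merely approximately decomposable would yield a correspondingly approximate list.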
9.4 Naive Physics
Multimodal communication is an important aspect of the environment for which human intelligence
evolved – but not the only one. It seems likely that our human intelligence is also
closely adapted to various aspects of our physical environment – a matter worth attending to carefully
as we design environments for our robotically or virtually embodied AGI systems to
operate in.
One interesting guide to the most cognitively relevant aspects of human environments is the
subfield of AI known as “naive physics” [Hay85] – a term that refers to the theories about the
physical world that human beings implicitly develop and utilize during their lives. For instance,
when you figure out that you need to press the knife slightly harder when spreading peanut
butter rather than jelly, you’re not making this judgment using Newtonian physics or the
Navier-Stokes equations of fluid dynamics; you’re using heuristic patterns that you figured out
through experience. Maybe you figured out these patterns through experience spreading peanut
butter and jelly in particular. Or maybe you figured these heuristic patterns out before you ever
tried to spread peanut butter or jelly specifically, via just touching peanut butter and jelly to
see what they feel like, and then carrying out inference based on your experience manipulating
similar tools in the context of similar substances.
Other examples of similar “naive physics” patterns are easy to come by, e.g.
1. What goes up must come down.
2. A dropped object falls straight down.
3. A vacuum sucks things towards it.
4. Centrifugal force throws rotating things outwards.
5. An object is either at rest or moving, in an absolute sense.
6. Two events are simultaneous or they are not.
7. When running downhill, one must lift one’s knees up high.
8. When looking at something that you just barely can’t discern accurately, squint.
Attempts to axiomatically formulate naive physics have historically come up short, and we
doubt this is a promising direction for AGI. However, we do think the naive physics literature
does a good job of identifying the various phenomena that the human mind’s naive physics deals
with. So, from the point of view of AGI environment design, naive physics is a useful source
of requirements. Ideally, we would like an AGI’s environment to support all the fundamental
phenomena that naive physics deals with.
We now describe some key aspects of naive physics in a more systematic manner. Naive
physics has many different formulations; in this section we draw heavily on [SC94], who divide
naive physics phenomena into 5 categories. Here we review these categories and identify a
number of important things that humanlike intelligent agents must be able to do relative to
each of them.
9.4.1 Objects, Natural Units and Natural Kinds
One key aspect of naive physics involves recognition of various aspects of objects, such as:
1. Recognition of objects amidst noisy perceptual data
2. Recognition of surfaces and interiors of objects
3. Recognition of objects as manipulable units
4. Recognition of objects as potential subjects of fragmentation (splitting, cutting) and of
unification (gluing, bonding)
5. Recognition of the agent’s body as an object, and of parts of the agent’s body as objects
6. Division of the universe of perceived objects into “natural kinds”, each containing typical and
atypical instances
9.4.2 Events, Processes and Causality
Specific aspects of naive physics related to temporality and causality are:
1. Distinguishing roughly-subjectively-instantaneous events from extended processes
2. Identifying beginnings, endings and crossings of processes
3. Identifying and distinguishing internal and external changes
4. Identifying and distinguishing internal and external changes relative to one’s own body
5. Interrelating body-changes with changes in external entities
Notably, these aspects of naive physics involve different processes occurring on a variety of
different time scales, intersecting in complex patterns, and involving processes inside the agent’s
body, outside the agent’s body, and crossing the boundary of the agent’s body.
9.4.3 Stuffs, States of Matter, Qualities
Regarding the various states of matter, some important aspects of naive physics are:
1. Perceiving gaps between objects: holes, media, illusions like rainbows, mirages and holograms
2. Distinguishing the manners in which different sorts of entities (e.g. smells, sounds, light) fill
space
3. Distinguishing properties such as smoothness, roughness, graininess, stickiness, runniness,
etc.
4. Distinguishing degrees of elasticity and fragility
5. Assessing separability of aggregates
9.4.4 Surfaces, Limits, Boundaries, Media
Gibson [Gib77, Gib79] has argued that naive physics is not mainly about objects but rather
mainly about surfaces. Surfaces have a variety of aspects and relationships that are important
for naive physics, such as:
1. Perceiving and reasoning about surfaces as two-sided or one-sided interfaces
2. Inference of the various ecological laws of surfaces
3. Perception of various media in the world as separated by surfaces
4. Recognition of the textures of surfaces
5. Recognition of medium/surface layout relationships such as: ground, open environment,
enclosure, detached object, attached object, hollow object, place, sheet, fissure, stick, fibre,
dihedral, etc.
Fig. 9.1: One of Sloman’s example test domains for real-world inference. Left: a number of pins
and a rubber band to be stretched around them. Right: use of the pins and rubber band to
make a letter T.
As a concrete, evocative “toy” example of naive everyday knowledge about surfaces and
boundaries, consider Sloman’s [Slo08a] example scenario, depicted in Figure 9.1 and drawn
largely from [SS74] (see also related discussion in [Slo08b]), in which “A child can be given one
or more rubber bands and a pile of pins, and asked to use the pins to hold the band in place to
form a particular shape... For example, things to be learnt could include”:
1. There is an area inside the band and an area outside the band.
2. The possible effects of moving a pin that is inside the band towards or further away
from other pins inside the band. (The effects can depend on whether the band is already
stretched.)
3. The possible effects of moving a pin that is outside the band towards or further away from
other pins inside the band.
4. The possible effects of adding a new pin, inside or outside the band, with or without pushing
the band sideways with the pin first.
5. The possible effects of removing a pin, from a position inside or outside the band.
6. Patterns of motion/change that can occur and how they affect local and global shape
(e.g. introducing a concavity or convexity, introducing or removing symmetry, increasing or
decreasing the area enclosed).
7. The possibility of causing the band to cross over itself. (NB: Is an odd number of crosses
possible?)
8. How adding a second, or third band can enrich the space of structures, processes and effects
of processes.
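Several of the items above (inside versus outside the band, the effect of moving a pin, changes in the enclosed area) can be made concrete in a small geometric sketch. It assumes, as a simplification not made in the quoted scenario, that every pin sits inside a single taut band, so the band takes the shape of the pins' convex hull (computed here with Andrew's monotone chain algorithm). Concave shapes such as the T in Figure 9.1, which need pins pushing the band inward from outside, are not modeled.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def band_shape(pins: List[Point]) -> List[Point]:
    """Shape of a taut band around all pins: the convex hull, counter-clockwise."""
    pts = sorted(set(pins))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def enclosed_area(poly: List[Point]) -> float:
    """Area inside the band, via the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def inside_band(poly: List[Point], q: Point) -> bool:
    """True if q lies strictly inside the counter-clockwise convex band."""
    n = len(poly)
    return all(cross(poly[i], poly[(i + 1) % n], q) > 0 for i in range(n))


pins = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
band = band_shape(pins)
print(band)  # [(0, 0), (4, 0), (4, 3), (0, 3)]: the pin at (2, 1) does not touch the band
print(enclosed_area(band), inside_band(band, (2, 1)))  # 12.0 True

# Moving the inner pin past the band's edge reshapes the band and enlarges the area.
moved = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5)]
print(enclosed_area(band_shape(moved)))  # 16.0
```

The example reproduces item 2's observation that the effect of moving an inner pin depends on the band's current state: moving it within the hull changes nothing, while moving it past an edge changes both shape and area.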
9.4.5 What Kind of Physics Is Needed to Foster Human-like Intelligence?
We stated above that we would like an AGI’s environment to support all the fundamental phenomena
that naive physics deals with; and we have now reviewed a number of these specific
phenomena. But it’s not entirely clear what the “fundamental” aspects underlying these phenomena
are. One important question in the environment-design context is how closely an AGI
environment needs to stick to the particulars of real-world naive physics. Is it important that a
young AGI can play with the specific differences between spreading peanut butter versus jelly?
Or is it enough that it can play with spreading and smearing various substances of different
consistencies? How close does the analogy between an AGI environment’s naive physics and
real-world naive physics need to be? This is a question to which we have no scientific answer at
present. Our own working hypothesis is that the analogy does not need to be extremely close,
and with this in mind in Chapter 16 we propose a virtual environment BlocksNBeadsWorld
that encompasses all the basic conceptual phenomena of real-world naive physics, but does not
attempt to emulate their details.
Framed in terms of human psychology rather than environment design, the question becomes:
At what level of detail must one model the physical world to understand the ways in
which human intelligence has adapted to the physical world? Our suspicion, which underlies
our BlocksNBeadsWorld design, is that it’s approximately enough to have
• Newtonian physics, or some close approximation
• Matter in multiple phases and forms vaguely similar to the ones we see in the real world:
solid, liquid, gas, paste, goo, etc.
• Ability to transform some instances of matter from one form to another
• Ability to flexibly manipulate matter in various forms with various solid tools
• Ability to combine instances of matter into new ones in a fairly rich way: e.g. glue or tie
solids together, mix liquids together, etc.
• Ability to position instances of matter with respect to each other in a rich way: e.g. put
liquid in a solid cavity, cover something with a lid or a piece of fabric, etc.
It seems to us that if the above are present in an environment, then an AGI seeking to
achieve appropriate goals in that environment will be likely to form an appropriate “humanlike
physical-world intuition.” We doubt that the specifics of the naive physics of different
forms of matter are critical to human-like intelligence. But, we suspect that a great amount
of unconscious human metaphorical thinking is conditioned on the fact that humans evolved
around matter that takes a variety of forms, can be changed from one form to another, and can
be fairly easily arranged and composited to form new instances from prior ones. Without many
diverse instances of matter transformation, arrangement and composition in its experience, an
AGI is unlikely to form an internal “metaphor-base” even vaguely similar to the human one –
so that, even if it’s highly intelligent, its thinking will be radically non-human-like in character.
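An environment designer might audit the transformation requirement listed above by tabulating which forms of matter the environment supports and which actions convert one into another, then checking what is reachable from each form. The forms, action names and transitions below are hypothetical and are not taken from BlocksNBeadsWorld; the table deliberately has no way back out of "goo", so the audit has something to flag.

```python
from collections import deque
from typing import Dict, Set, Tuple

# Hypothetical (form, action) -> resulting form table for a toy environment.
TRANSFORMS: Dict[Tuple[str, str], str] = {
    ("solid", "melt"): "liquid",
    ("liquid", "freeze"): "solid",
    ("liquid", "boil"): "gas",
    ("gas", "condense"): "liquid",
    ("solid", "grind_and_wet"): "paste",
    ("paste", "dry"): "solid",
    ("liquid", "thicken"): "goo",
}


def reachable_forms(start: str, transforms: Dict[Tuple[str, str], str]) -> Set[str]:
    """Breadth-first search over forms reachable from `start` by some action sequence."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        for (src, _action), dst in transforms.items():
            if src == form and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


all_forms = {f for (src, _), dst in TRANSFORMS.items() for f in (src, dst)}
for form in sorted(all_forms):
    # Unreachable forms point to transitions the environment may be missing.
    missing = all_forms - reachable_forms(form, TRANSFORMS)
    print(form, "cannot reach:", sorted(missing))
```

Whether every form should be reachable from every other is itself a design choice; real-world matter is full of irreversible transformations, so a sink like "goo" might be intended rather than a gap.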
Naturally this is all somewhat speculative and must be explored via experimentation. Maybe
an elaborate blocks-world with only solid objects will be sufficient to create human-level, roughly
human-like AGI with rich spatiotemporal and manipulative intuition. Or maybe human intelligence
is more closely adapted to the specifics of our physical world – with water and dirt and
plants and hair and so forth – than we currently realize. One thing that is very clear is that, as
we proceed with embodying, situating and educating our AGI systems, we need to pay careful
attention to the way their intelligence is conditioned by their environment.
9.5 Folk Psychology
Related to naive physics is the notion of “naive psychology” or “folk psychology” [Rav04], which
includes for instance the following aspects:
1. Mental simulation of other agents
2. Mental theory regarding other agents
3. Attribution of beliefs, desires and intentions (BDI) to other agents via theory or simulation
4. Recognition of emotions in other agents via their physical embodiment
5. Recognition of desires and intentions in other agents via their physical embodiment
6. Analogical and contextual inferences between self and other, regarding BDI and other aspects
7. Attribution of causes and meanings to other agents’ behaviors
8. Anthropomorphization of non-human entities, including inanimate objects
The main special requirement placed on an AGI’s embodiment by the above aspects pertains
to the ability of agents to express their emotions and intentions to each other. Humans do this
via facial expressions, gestures and language.
9.5.1 Motivation, Requiredness, Value
Related to folk psychology, Gestalt [Koh38] and ecological [Gib77, Gib79] psychology suggest
that humans perceive the world substantially in terms of the affordances it provides them for
goal-directed action. This suggests that, to support human-like intelligence, an AGI must be
capable of:
1. Perception of entities in the world as differentially associated with goal-relevant value
2. Perception of entities in the world in terms of the potential actions they afford the agent,
or other agents
The key point is that entities in the world need to provide a wide variety of ways for agents
to interact with them, enabling richly complex perception of affordances.
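A toy data structure makes the point concrete: if each entity records the actions it affords, each with a goal-relevant value attached, then perceiving a scene in terms of affordances can be approximated as ranking entities by what they let the agent do toward its current goal. The entities, actions and values below are invented for illustration, and the sketch says nothing about how such values would be perceived or learned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    name: str
    # action -> estimated value of that action for the agent's current goal
    affordances: Dict[str, float] = field(default_factory=dict)


def perceive(entities: List[Entity],
             min_value: float = 0.0) -> List[Tuple[str, str, float, int]]:
    """Rank entities by their most goal-relevant affordance.

    Returns (name, best action, its value, number of useful actions) tuples,
    highest value first; entities affording nothing useful are dropped.
    """
    scored = []
    for e in entities:
        useful = {a: v for a, v in e.affordances.items() if v > min_value}
        if useful:
            best = max(useful, key=useful.get)
            scored.append((e.name, best, useful[best], len(useful)))
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Hypothetical scene, valued relative to the goal "quench thirst".
scene = [
    Entity("cup", {"grasp": 0.6, "fill": 0.9, "throw": 0.0}),
    Entity("rock", {"throw": 0.1, "stack": 0.0}),
    Entity("stream", {"drink": 1.0, "wade": 0.2}),
]
for row in perceive(scene):
    print(row)
```

The last field counts how many useful ways of interacting each entity offers, which is one crude proxy for the richness of affordances the paragraph above asks an environment to provide.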
9.6 Body and Mind
The above discussion has focused on the world external to the body of the AGI agent embodied
and embedded in the world, but the issue of the AGI’s body also merits consideration. There
seems little doubt that a human’s intelligence is highly conditioned by the particularities of the
human body.
9.6.1 The Human Sensorium
Here the requirements seem fairly simple: while surely not strictly necessary, it would certainly
be preferable to provide an AGI with fairly rich analogues of the human senses of touch, sight,
sound, kinesthesia, taste and smell. Each of these senses provides different sorts of cognitive
stimulation to the human mind; and while similar cognitive stimulation could doubtless be
achieved without analogous senses, the provision of such seems the most straightforward approach.
It’s hard to know how much of human intelligence is specifically biased to the sorts of
outputs provided by human senses.
As vision is already accorded such a prominent role in the AI and cognitive science literature,
and is discussed in moderate depth in Chapter 26 of Part 2, we won’t take time elaborating
on the importance of vision processing for humanlike cognition. The key thing an AGI requires
to support humanlike “visual intelligence” is an environment containing a sufficiently diverse
collection of materials that object and event recognition and identification become interesting
problems.
Audition is cognitively valuable for many reasons, one of which is that it gives a very rich
and precise method of sensing the world that is different from vision. The fact that humans can
display normal intelligence while totally blind or totally deaf is an indication that, in a sense,
vision and audition are redundant for understanding the everyday world. However, it may be
important that the brain has evolved to account for both of these senses, because this forced it
to account for the presence of two very rich and precise methods of sensing the world – which
may have forced it to develop more abstract representation mechanisms than would have been
necessary with only one such method.
Touch is a sense that is, in our view, generally badly underappreciated within the AI community.
In particular the cognitive robotics community seems to worry too little about the terribly
impoverished sense of touch possessed by most current robots (though fortunately there are
recent technologies that may help improve robots in this regard; see e.g. [Nan08]). Touch is how
the human infant learns to distinguish self from other, and in this way it is the most essential
sense for the establishment of an internal self-model. Touching others’ bodies is a key method
for developing a sense of the emotional reality and responsiveness of others, and is hence key to
the development of theory of mind and social understanding in humans. For this reason, among
others, human children lacking sufficient tactile stimulation will generally wind up badly impaired
in multiple ways. A good-quality embodiment should supply an AI agent with a body
that possesses skin with varying levels of sensitivity on different parts of the body (so that
it can effectively distinguish between reality and its perception thereof in a tactile context),
and also varying types of touch sensors (e.g. temperature versus friction), so that it experiences
textures as multidimensional entities.
Related to touch, kinesthesia refers to direct sensation of phenomena happening inside the
body. Rarely mentioned in AI, this sense seems quite critical to cognition, as it underpins many
of the analogies between self and other that guide cognition. Again, it’s not important that an
AGI’s virtual body have the same internal body parts as a human body. But it seems valuable
to have the AGI’s virtual body display some vaguely human-body-like properties, such as feeling
internal strain of various sorts after getting exercise, feeling discomfort in certain places when
running out of energy, feeling internally different when satisfied versus unsatisfied, etc.
Next, taste is a cognitively interesting sense in that it involves the interplay between the
internal and external world; it involves the evaluation of which entities from the external world
are worthy of placing inside the body. And smell is cognitively interesting in large part because
of its relationship with taste. A smell is, among other things, a long-distance indicator of what
a certain entity might taste like. So, the combination of taste and smell provides means for
conceptualizing relationships between self, world and distance.
9.6.2 The Human Body’s Multiple Intelligences
The most distinctive aspects of human intelligence are rooted in what one might call the "cognitive
cortex": the portions of the brain dealing with self-reflection and abstract thought. But the
cognitive cortex does its work in close coordination with the body’s various more specialized
intelligent subsystems, including those associated with the gut, the heart, the liver, the immune
and endocrine systems, and the perceptual and motor cortices.
In the perspective underlying this book, the human cognitive cortex – or the core cognitive
network of any roughly human-like AGI system – should be viewed as a highly flexible, self-organizing
network. Such cognitive networks can be modeled e.g. as a recurrent neural net with
general topology, or as a weighted labeled hypergraph, and are centrally concerned with recognizing
patterns in their environment and in themselves, especially patterns regarding the achievement of the
system’s goals in various appropriate contexts. Here we augment this perspective, noting that
the human brain’s cognitive network is closely coupled with a variety of simpler and more
specialized intelligent "body-system networks" which provide it with structural and dynamical
inductive biasing. We then discuss the implications of this observation for practical AGI design.
One recalls Pascal’s famous quote "The heart has its reasons, of which reason knows not."
As we now know, the intuitive sense that Pascal and so many others have expressed, that the
heart and other body systems have their own reasons, is grounded in the fact that they actually
do carry out simple forms of reasoning (i.e. intelligent, adaptive dynamics), in close, sometimes
cognitively valuable, coordination with the central cognitive network.
9.6.2.1 Some of the Human Body’s Specialized Intelligent Subsystems
The human body contains multiple specialized intelligences apart from the cognitive cortex.
Here we review some of the most critical.
Hierarchies of Visual and Auditory Perception
The hierarchical structure of the visual and auditory cortices has been taken by some researchers
[Kur12], [HB06] as the generic structure of cognition. While we suspect this is overstated, we
agree it is important that these cortices nudge large portions of the cognitive cortex to assume
an approximately hierarchical structure.
Olfactory Attractors
The process of recognizing a familiar smell is grounded in a neural process similar to convergence
to an attractor in a nonlinear dynamical system [Fre95]. There is evidence that the
mammalian cognitive cortex evolved in close coordination with the olfactory cortex [Row11],
and much of abstract cognition reflects a similar dynamic of gradually coming to a conclusion
based on what initially "smells right."
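The idea of recognition as settling into an attractor can be illustrated with a classic Hopfield network, used here purely as a generic stand-in; it is not Freeman's olfactory model, whose dynamics are considerably richer. Two stored bipolar patterns play the role of hypothetical "familiar smells", and a corrupted input is updated one unit at a time until no unit changes, i.e. until it reaches a fixed-point attractor.

```python
from typing import List

Pattern = List[int]


def train(patterns: List[Pattern]) -> List[List[float]]:
    """Hebbian weights w_ij = (1/n) * sum over patterns of p_i * p_j, zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w: List[List[float]], state: Pattern, max_sweeps: int = 20) -> Pattern:
    """Asynchronous updates until no unit changes (a fixed-point attractor)."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field_i = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if field_i >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s


# Two hypothetical "familiar smells" encoded as orthogonal bipolar patterns.
rose = [1, 1, 1, 1, -1, -1, -1, -1]
smoke = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([rose, smoke])

noisy_rose = [1, 1, 1, -1, -1, -1, -1, -1]  # one unit flipped
print(recall(w, noisy_rose) == rose)  # True
```

Asynchronous updating is used because, with symmetric weights and a zero diagonal, it is guaranteed to settle rather than oscillate; the gradual convergence from a partial match is the analogy the text draws with abstract cognition.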
Physical and Cognitive Action
The cerebellum, a specially structured brain subsystem which controls motor movements,
has for some time been understood to also have involvement in attention, executive control,
language, working memory, learning, pain, emotion, and addiction [PSF09].
The Second Brain
The gastrointestinal neural net contains millions of neurons and is capable of operating independently
of the brain. It modulates stress response and other aspects of emotion and motivation
based on experience – resulting in so-called "gut feelings" [Ger99].
The Heart’s Neural Network
The heart has its own neural network, which modulates stress response, energy level and
relaxation/excitement (factors key to motivation and emotion) based on experience [Arm04].
Pattern Recognition and Memory in the Liver
The liver is a complex pattern recognition system, adapting via experience to better identify
toxins [CB06]. Like the heart, it seems to store some episodic memories as well, resulting in liver
transplant recipients sometimes acquiring the tastes in music or sports of the donor [EMC12].
Immune Intelligence
The immune network is a highly complex, adaptive self-organizing system, which continually
solves the learning problem of identifying antigens and distinguishing them from the body’s own
tissues [FP86]. As immune function is highly energetically costly, stress response involves subtle
modulation of the energy allocation to immune function, which involves communication between
neural and immune networks.
The Endocrine System: A Key Bridge Between Mind and Body
The endocrine (hormonal) system regulates (and is regulated by) emotion, thus guiding all
aspects of intelligence (due to the close connection of emotion and motivation) [PH12].
Breathing Guides Thinking
As oxygenation of the brain plays a key role in the spread of neural activity, the flow of breath
is a key driver of cognition. Forced alternate nostril breathing has been shown to significantly
affect cognition via balancing activity of the two brain hemispheres [SKBB91].
Much remains unknown, and the totality of feedback loops between the human cognitive
cortex and the various specialized intelligences operative throughout the human body has not
yet been thoroughly charted.
9.6.2.2 Implications for AGI
What lesson should the AGI developer draw from all this? The particularities of the human
mind/body should not be taken as general requirements for general intelligence. However, it
is worth remembering how difficult a computational problem it is to learn, based on
experiential feedback alone, the right way to achieve the complex goal of controlling a system
with general intelligence at the human level or beyond. To solve this problem without some sort
of strong inductive biasing may require massively more experience than young humans obtain.
Appropriate inductive bias may be embedded in an AGI system in many different ways.
Some AGI designers have sought to embed it very explicitly, e.g. with hand-coded declarative
knowledge as in Cyc, SOAR and other "GOFAI" type systems. On the other hand, the human
brain receives its inductive bias much more subtly and implicitly, both via the specifics of the
initial structure of the cognitive cortex, and via ongoing coupling of the cognitive cortex with
other systems possessing more focused types of intelligence and more specific structures and/or
dynamics.
In building an AGI system, one has four choices, very broadly speaking:
1. Create a flexible mind-network, as unbiased as feasible, and attempt to have it learn how
to achieve its goals via experience
2. Closely emulate key aspects of the human body along with the human mind
3. Imitate the human mind-body, conceptually if not in detail, and create a number of structurally
and dynamically simpler intelligent systems closely and appropriately coupled to
the abstract cognitive mind-network, to provide useful inductive bias.
4. Find some other, creative way to guide and probabilistically constrain one’s AGI system’s
mind-network, providing inductive bias appropriate to the tasks at hand, without emulating
even conceptually the way the human mind-brain receives its inductive bias via coupling
with simpler intelligent systems.
Our suspicion is that the first option will not be viable. On the other hand, to do the second
option would require more knowledge of the human body than biology currently possesses. This
leaves the third and fourth options, both of which seem viable to us.
CogPrime incorporates a combination of the third and fourth options. CogPrime’s generic
dynamic knowledge store, the Atomspace, is coupled with specialized hierarchical networks
(DeSTIN) for vision and audition, somewhat mirroring the human cortex. An artificial endocrine
system for OpenCog is also under development, speculatively, as part of a project using
OpenCog to control humanoid robots. On the other hand, OpenCog has no gastrointestinal nor
cardiological nervous system, and the stress-response-based guidance provided to the human
brain by a combination of the heart, gut, immune system and other body systems, is achieved
in CogPrime in a more explicit way using the OpenPsi model of motivated cognition, and its
integration with the system’s attention allocation dynamics.
Likely there is no single correct way to incorporate the lessons of intelligent human body-system
networks into AGI designs. But these are aspects of human cognition that all AGI
researchers should be aware of.
9.7 The Extended Mind and Body
Finally, Hutchins [Hut95], Logan [Log07] and others have promoted a view of human intelligence
that sees the human mind as extended beyond the individual body, incorporating social
interactions and also interactions with tools and other inanimate objects, and with plants and animals.
This leads to a number of requirements for a humanlike AGI’s environment:
1. The ability to create a variety of different tools for interacting with various aspects of the
world in various different ways, including tools for making tools and ultimately machinery
2. The existence of other mobile, virtual life-forms in the world, including simpler and less
intelligent ones, and ones that interact with each other and with the AGI
3. The existence of organic growing structures in the world, with which the AGI can interact
in various ways, including halting their growth or modifying their growth pattern
How necessary these requirements are is hard to say – but it is clear that these things have
played a major role in the evolution of human intelligence.
9.8 Conclusion
Happily, this diverse chapter supports a simple, albeit tentative conclusion. Our suggestion is
that, if an AGI is
• placed in an environment capable of roughly supporting multimodal communication and
vaguely (but not necessarily precisely) real-world-ish naive physics
• surrounded with other intelligent agents of varying levels of complexity, and other complex,
dynamic structures to interface with
• given a body that can perceive this environment through some forms of sight, sound and
touch; and perceive itself via some form of kinesthesia
• given a motivational system that encourages it to make rich use of these aspects of its
environment
then the AGI is likely to have an experience-base reinforcing the key inductive biases provided
by the everyday world for the guidance of humanlike intelligence.
Chapter 10
A Mind-World Correspondence Principle
10.1 Introduction
Real-world minds are always adapted to certain classes of environments and goals. The ideas
of the previous chapter, regarding the connection between a human-like intelligence’s internals
and its environment, result from exploring the implications of this adaptation in the context
of the cognitive synergy concept. In this chapter we explore the mind-world connection in a
broader and more abstract way – making a more ambitious attempt to move toward a "general
theory of general intelligence."
One basic premise here, as in the preceding chapters, is: Even a system of vast general
intelligence, subject to real-world space and time constraints, will necessarily be more efficient
at some kinds of learning than others. Thus, one approach to formulating a general theory of
general intelligence is to look at the relationship between minds and worlds – where a "world"
is conceived as an environment and a set of goals defined in terms of that environment.
In this spirit, we here formulate a broad principle binding together worlds and the minds that
are intelligent in these worlds. The ideas of the previous chapter constitute specific, concrete
instantiations of this general principle. A careful statement of the principle requires introduction
of a number of technical concepts, and will be given later on in the chapter. A crude, informal
version of the principle would be:
MIND-WORLD CORRESPONDENCE PRINCIPLE
For a mind to work intelligently toward certain goals in a certain world, there should be a
nice mapping from goal-directed sequences of world-states into sequences of mind-states, where
"nice" means that a world-state-sequence W composed of two parts W1 and W2 gets mapped
into a mind-state-sequence M composed of two corresponding parts M1 and M2.
What’s nice about this principle is that it relates the decomposition of the world into parts,
to the decomposition of the mind into parts.
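Read strictly, the informal principle asks that the mapping F from world-state sequences to mind-state sequences respect concatenation: F(W1 W2) = F(W1) F(W2). That strict reading is our simplifying assumption for illustration (the chapter's careful statement comes later and may differ). Under it, the sketch below checks a candidate mapping against every two-part split of a hypothetical world-state sequence.

```python
from typing import Callable, Tuple

Seq = Tuple[str, ...]
Mapping = Callable[[Seq], Seq]


def respects_split(F: Mapping, w1: Seq, w2: Seq) -> bool:
    """True if F(W1 W2) decomposes as F(W1) followed by F(W2)."""
    return F(w1 + w2) == F(w1) + F(w2)


def is_nice(F: Mapping, world: Seq) -> bool:
    """Check every split of `world` into two consecutive parts."""
    return all(respects_split(F, world[:k], world[k:])
               for k in range(len(world) + 1))


def pointwise(seq: Seq) -> Seq:
    # Each world-state gets its own mind-state, so parts map to parts.
    return tuple(f"perceive({s})" for s in seq)


def summary(seq: Seq) -> Seq:
    # Collapses the whole sequence into one mind-state, so the parts are lost.
    return (f"saw {len(seq)} states",)


world = ("cup_on_table", "hand_near_cup", "cup_in_hand")  # hypothetical states
print(is_nice(pointwise, world))  # True
print(is_nice(summary, world))    # False
```

The summarizing mapping fails because the mind-state for the whole episode cannot be split into mind-states for its parts, which is exactly the kind of mapping the principle rules out.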
10.2 What Might a General Theory of General Intelligence Look Like?
It’s not clear, at this point, what a real "general theory of general intelligence" would look like
– but one tantalizing possibility is that it might confront the two questions:
• How does one design a world to foster the development of a certain sort of mind?
• How does one design a mind to match the particular challenges posed by a certain sort of
world?
One way to achieve this would be to create a theory that, given a description of an environment
and some associated goals, would output a description of the structure and dynamics that a
system should possess to be intelligent in that environment relative to those goals, using limited
computational resources.
Such a theory would serve a different purpose from the mathematical theory of "universal
intelligence" developed by Marcus Hutter [Hut05] and others. For all its beauty and theoretical
power, that approach currently yields useful conclusions only about general intelligences
with infinite or infeasibly massive computational resources. The approach suggested here, by
contrast, aims at a theory of real-world general intelligences that use realistic amounts of
computational power but still possess general intelligence comparable to that of human beings
or greater.
This reflects a vision of intelligence as largely concerned with adaptation to particular classes
of environments and goals. This may seem to contradict the notion of "general" intelligence,
but I think it actually embodies a realistic understanding of general intelligence. Maximally
general intelligence is not pragmatically feasible; it could only be achieved using infinite computational
resources [Hut05]. Real-world systems are inevitably limited in the intelligence they
can display in any real situation, because real situations involve finite resources, including finite
amounts of time. One may say that, in principle, a certain system could solve any problem
given enough resources and time; but even when this is true, it is not necessarily the most interesting
way to look at the system’s intelligence. It may be more important to look at what a
system can do given the resources actually at its disposal. This perspective leads one to
ask questions like the ones posed above: which bounded-resource systems are well-disposed to
display intelligence in which classes of situations?
As noted in Chapter 7 above, one can assess the generality of a system’s intelligence by
looking at the entropy of the class of situations across which it displays a high level of intelligence
(where “high” is measured relative to its total level of intelligence across all situations). A system
with a high generality of intelligence will tend to be roughly equally intelligent across a wide
variety of situations, whereas a system with lower generality of intelligence will tend to be much
more intelligent in a small subclass of situations than in any other. The definitions given in that
chapter embody this notion in a formal and quantitative way.
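The entropy-based measure of generality can be sketched in a few lines of Python. This is a simplified illustration rather than the exact Chapter 7 definition: each situation is assigned a hypothetical degree in [0, 1] to which the system performs well there, optionally multiplied by a weight standing in for the probability of that situation; the products are normalized into a distribution (so "high" is relative to the total), and the Shannon entropy of that distribution serves as the generality score. The function name `generality` and all example numbers are assumptions.

```python
import math
from typing import Optional, Sequence


def generality(scores: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Shannon entropy (in nats) of the normalized, weighted score distribution.

    scores: hypothetical degree to which the system performs well in each situation
    weights: optional probability weight of each situation (uniform if omitted)
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    mass = [w * s for w, s in zip(weights, scores)]
    total = sum(mass)
    if total <= 0:
        raise ValueError("weighted scores must have a positive total")
    probs = [m / total for m in mass]  # normalize relative to total performance
    return -sum(p * math.log(p) for p in probs if p > 0)


broad = [0.3, 0.3, 0.3, 0.3]       # roughly equally good in every situation
narrow = [0.9, 0.05, 0.05, 0.05]   # much better in one situation than elsewhere
print(round(generality(broad), 4))  # 1.3863 (ln 4, the maximum for 4 situations)
print(generality(narrow) < generality(broad))  # True
```

Because of the normalization, a system that is uniformly poor everywhere (say, scores of 0.01 in all four situations) gets the same maximal score as one that is uniformly good: the measure captures how evenly intelligence is spread across situations, not how much of it there is.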
If one wishes to create a general theory of general intelligence according to this sort of
perspective, the main question then becomes how to represent goals/environments and systems
in such a way as to render transparent the natural correspondence between the specifics of the
former and the latter, in the context of resource-bounded intelligence. This is the business of
the next section.
10.3 Steps Toward A (Formal) General Theory of General Intelligence
Now begins the formalism. At this stage of development of the theory proposed in this chapter,
mathematics is used mainly as a device to ensure clarity of expression. However, once the theory
is further developed, it may become useful for purposes of calculation as well.
Suppose one has any system S (which could be an AI system, or a human, or an environment
that a human or AI is interacting with, or the combination of an environment and a human or
AI’s body, etc.). One may then construct an uncertain transition graph associated with that
system S, in the following way:
• The nodes of the graph represent fuzzy sets of states of system S (I’ll call these state-sets
from here on, leaving the fuzziness implicit)