• Dynamics that add new Nodes or Links to a hypergraph, or remove existing ones.

13.3 Atoms: Their Types and Weights

This section reviews a variety of CogPrime Atom types and gives simple examples of each of them. The Atom types considered are drawn from those currently in use in the OpenCog system. This is not a complete list of the Atom types referred to in the text of this book, nor a complete list of those currently used in OpenCog (though it does cover a substantial majority of the latter, omitting only some with specialized importance or intended only for temporary use).

The partial nature of the list given here reflects a more general point: the specific collection of Atom types in an OpenCog system is bound to change as the system is developed and experimented with. CogPrime specifies a certain collection of representational approaches and cognitive algorithms for acting on them; any of these approaches and algorithms may be implemented with a variety of sets of Atom types. The specific set of Atom types currently in the OpenCog system does not necessarily have profound and lasting significance; the list might look somewhat different five years from the time of writing, based on various detailed changes.

The treatment here is informal and intended to convey the general idea of what each Atom type does. A longer and more formal treatment of the Atom types is given in Part II, beginning in Chapter 20.

13.3.1 Some Basic Atom Types

We begin with ConceptNode. Note that a ConceptNode does not necessarily refer to a whole concept, but may refer to part of a concept: it is essentially a "basic semantic node" whose meaning comes from its links to other Atoms. It would be more accurately, but less tersely, named "concept or concept fragment or element node." A simple example would be a ConceptNode grouping nodes that are somehow related, e.g.
    ConceptNode: C
    InheritanceLink (ObjectNode: BW) C
    InheritanceLink (ObjectNode: BP) C
    InheritanceLink (ObjectNode: BN) C
    ReferenceLink BW (PhraseNode "Ben’s watch")
    ReferenceLink BP (PhraseNode "Ben’s passport")
    ReferenceLink BN (PhraseNode "Ben’s necklace")

248 13 Local, Global and Glocal Knowledge Representation

This set of Atoms indicates a simple and uninteresting ConceptNode grouping three objects owned by Ben (note that the Atoms above do not express the ownership relationship; they just link the three objects to textual descriptions). In this example, the ConceptNode links transparently to physical objects and English descriptions, but in general this will not be the case: most ConceptNodes will look to the human eye like groupings of links of various types, which link to other nodes consisting of groupings of links of various types, and so on.

There are Atoms referring to basic, useful mathematical objects, e.g. NumberNodes like

    NumberNode #4
    NumberNode #3.44

The numerical value of a NumberNode is explicitly referenced within the Atom.

A core distinction is made between ordered links and unordered links; these are handled differently in the Atomspace software. A basic unordered link is the SetLink, which groups its arguments into a set. For instance, the ConceptNode C defined by

    ConceptNode C
    MemberLink A C
    MemberLink B C

is equivalent to

    SetLink A B

ListLinks, on the other hand, are like SetLinks but ordered, and they play a fundamental role due to their relationship to predicates. Most predicates are assumed to take ordered arguments, so we may say e.g.

    EvaluationLink
        PredicateNode eat
        ListLink
            ConceptNode cat
            ConceptNode mouse

to indicate that cats eat mice.

Note that an expression like

    ConceptNode cat

is shorthand for

    ConceptNode C
    ReferenceLink W C
    WordNode W #cat

since it is WordNodes rather than ConceptNodes that refer to words. (And note that the strength of the ReferenceLink would not be 1 in this case, because the word "cat" has multiple senses.)
However, there is neither harm nor formal incorrectness in the "ConceptNode cat" usage, since "cat" is just as valid a name for a ConceptNode as, say, "C."

We have already introduced the MemberLink above: a link joining a member to the set that contains it. Notably, the truth value of a MemberLink is fuzzy rather than probabilistic, and PLN is able to inter-operate fuzzy and probabilistic values. SubsetLinks also exist, with the obvious meaning, e.g.

    ConceptNode cat
    ConceptNode animal
    SubsetLink cat animal

Note that SubsetLink refers to a purely extensional subset relationship, and that InheritanceLink should be used for the generic "intensional + extensional" analogue of this; more on this below. For consistency with other link types, SubsetLink could be named ExtensionalInheritanceLink, but SubsetLink is used because it is shorter and more intuitive.

There are links representing the Boolean operations AND, OR and NOT. For instance, we may say

    ImplicationLink
        ANDLink
            ConceptNode young
            ConceptNode beautiful
        ConceptNode attractive

or, using links and VariableNodes instead of ConceptNodes,

    AverageLink $X
        ImplicationLink
            ANDLink
                EvaluationLink young $X
                EvaluationLink beautiful $X
            EvaluationLink attractive $X

NOTLink is a unary link, so e.g. we might say

    AverageLink $X
        ImplicationLink
            ANDLink
                EvaluationLink young $X
                EvaluationLink beautiful $X
                NOTLink
                    EvaluationLink poor $X
            EvaluationLink attractive $X

ContextLink allows explicit contextualization of knowledge, which is used in PLN, e.g.

    ContextLink
        ConceptNode golf
        InheritanceLink
            ObjectNode BenGoertzel
            ConceptNode incompetent

says that Ben Goertzel is incompetent in the context of golf.

13.3.2 Variable Atoms

We have already introduced VariableNodes above; it is also possible to specify the type of a VariableNode by linking it to a VariableTypeNode via a TypedVariableLink, e.g.
    TypedVariableLink
        VariableNode $X
        VariableTypeNode ConceptNode

which specifies that the variable $X should be filled with a ConceptNode.

Variables are handled via quantifiers, the default quantifier being AverageLink, so that the default interpretation of

    ImplicationLink
        InheritanceLink $X animal
        EvaluationLink
            PredicateNode: eat
            ListLink
                $X
                ConceptNode: food

is

    AverageLink $X
        ImplicationLink
            InheritanceLink $X animal
            EvaluationLink
                PredicateNode: eat
                ListLink
                    $X
                    ConceptNode: food

The AverageLink invokes an estimation of the average TruthValue of the embedded expression (in this case an ImplicationLink) over all possible values of the variable $X. If there are type restrictions on $X, these are taken into account in conducting the averaging. ForAllLink and ExistsLink may be used in the same places as AverageLink, with uncertain truth value semantics defined in PLN theory using third-order probabilities. There is also a SkolemLink, used to indicate variable dependencies for existentially quantified variables in cases of multiply nested existential quantifiers.

EvaluationLink and MemberLink have overlapping semantics, allowing the same conceptual/logical relationships to be expressed in terms of either predicates or sets, i.e.

    EvaluationLink
        PredicateNode: eat
        ListLink
            $X
            ConceptNode: food

has the same semantics as

    MemberLink
        ListLink
            $X
            ConceptNode: food
        ConceptNode: EatingEvents

The relation between the predicate "eat" and the concept "EatingEvents" is formally given by

    ExtensionalEquivalenceLink
        ConceptNode: EatingEvents
        SatisfyingSetLink
            PredicateNode: eat

In other words, "EatingEvents" is the SatisfyingSet of the predicate "eat": the set of entities that satisfy the predicate "eat". Note that the truth values of MemberLink and EvaluationLink are fuzzy rather than probabilistic.
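To make the averaging semantics and the EvaluationLink/MemberLink correspondence concrete, here is a minimal Python sketch under simplifying assumptions of our own: a truth value is reduced to a single fuzzy strength in [0, 1], bindings range over a small finite domain of argument lists, a type restriction is a plain predicate on the binding, and the average is unweighted (PLN's actual AverageLink also tracks weight of evidence). The names `average_link` and `satisfying_set` and the toy data are hypothetical, not OpenCog API.

```python
from typing import Callable, Dict, Hashable, Iterable

# Hypothetical sketch, not OpenCog API: a truth value is reduced to a single
# fuzzy strength in [0, 1], and bindings range over a small finite domain.


def average_link(domain: Iterable[Hashable],
                 body: Callable[[Hashable], float],
                 admissible: Callable[[Hashable], bool] = lambda x: True) -> float:
    """Unweighted average of body(binding) over every binding allowed by the type restriction."""
    values = [body(x) for x in domain if admissible(x)]
    if not values:
        raise ValueError("no admissible bindings for $X")
    return sum(values) / len(values)


def satisfying_set(domain: Iterable[Hashable],
                   predicate: Callable[[Hashable], float]) -> Dict[Hashable, float]:
    """Fuzzy SatisfyingSet: each argument list's membership degree is its evaluation strength."""
    return {x: predicate(x) for x in domain if predicate(x) > 0.0}


# Invented toy data: strength of "EvaluationLink eat (ListLink $X food)" per argument list;
# $X is bound to the first element of each tuple.
eats = {("cat", "food"): 0.9, ("dog", "food"): 0.8, ("rock", "food"): 0.0}
animals = {"cat", "dog"}

# Restricting $X to animals keeps ("rock", "food") out of the average.
avg = average_link(eats, lambda args: eats[args], lambda args: args[0] in animals)

# MemberLink (ListLink $X food) EatingEvents gets the same strength as the EvaluationLink.
eating_events = satisfying_set(eats, lambda args: eats[args])
print(avg, eating_events)
```

Dropping the type restriction pulls the average down, since the zero-strength binding for "rock" is then included.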
13.3.3 Logical Links

There is a host of link types embodying logical relationships as defined in the PLN logic system, e.g.

• InheritanceLink
• SubsetLink (aka ExtensionalInheritanceLink)
• IntensionalInheritanceLink

which embody different sorts of inheritance, e.g.

    SubsetLink salmon fish
    IntensionalInheritanceLink whale fish
    InheritanceLink fish animal

and then

• SimilarityLink
• ExtensionalSimilarityLink
• IntensionalSimilarityLink

which are the symmetric versions, e.g.

    SimilarityLink shark barracuda
    IntensionalSimilarityLink shark dolphin
    ExtensionalSimilarityLink American obese_person

There are also higher-order versions of these links, both asymmetric

• ImplicationLink
• ExtensionalImplicationLink
• IntensionalImplicationLink

and symmetric

• EquivalenceLink
• ExtensionalEquivalenceLink
• IntensionalEquivalenceLink

These are used between predicates and links, e.g.

    ImplicationLink
        EvaluationLink
            eat
            ListLink $X dirt
        EvaluationLink
            feel
            ListLink $X sick

or

    ImplicationLink
        EvaluationLink
            eat
            ListLink $X dirt
        InheritanceLink $X sick

or

    ForAllLink $X, $Y, $Z
        ExtensionalEquivalenceLink
            EquivalenceLink
                $Z
                EvaluationLink
                    +
                    ListLink $X $Y
            EquivalenceLink
                $Z
                EvaluationLink
                    +
                    ListLink $Y $X

Note that the last example is given as an extensional equivalence because it expresses a pure mathematical equivalence. This is not the only case of pure extensional equivalence, but it is an important one.

13.3.4 Temporal Links

There are also temporal versions of these links, such as

• PredictiveImplicationLink
• PredictiveAttractionLink
• SequentialANDLink
• SimultaneousANDLink

which combine a logical relation between their arguments with a temporal relation between them.
For instance, we might say

    PredictiveImplicationLink
        PredicateNode: JumpOffCliff
        PredicateNode: Dead

or, including arguments,

    PredictiveImplicationLink
        EvaluationLink JumpOffCliff $X
        EvaluationLink Dead $X

The former version, without variable arguments, shows the possibility of using higher-order logical links to join predicates without any explicit variables. By using this format exclusively, one could avoid VariableAtoms entirely, using only higher-order functions in the manner of pure functional programming formalisms like combinatory logic. However, this purely functional style has not proved convenient, so in practice the Atomspace combines functional-style representation with variable-based representation.

Temporal links often come with specific temporal quantification, e.g.

    PredictiveImplicationLink <5 seconds>
        EvaluationLink JumpOffCliff $X
        EvaluationLink Dead $X

indicating that the conclusion will generally follow the premise within 5 seconds. There is a system for managing fuzzy time intervals and their interrelationships, based on a fuzzy version of Allen's Interval Algebra.

SequentialANDLink is similar to PredictiveImplicationLink, but its truth value is calculated differently. The truth value of

    SequentialANDLink <5 seconds>
        EvaluationLink JumpOffCliff $X
        EvaluationLink Dead $X

indicates the likelihood of the sequence of events occurring in that order, with the gap between them lying within the specified time interval. The truth value of the PredictiveImplicationLink version, by contrast, indicates the likelihood of the second event conditional on the occurrence of the first (within the given time interval restriction).

There are also links representing basic temporal relationships, such as BeforeLink and AfterLink. These are used to refer to specific events, e.g.
if X refers to the event of Ben waking up on July 15 2012, and Y refers to the event of Ben getting out of bed on July 15 2012, then one might have

    AfterLink X Y

And there are TimeNodes (representing time-stamps such as temporal moments or intervals) and AtTimeLinks, so we may e.g. say

    AtTimeLink
        X
        TimeNode: 8:24AM Eastern Standard Time, July 15 2012 AD

13.3.5 Associative Links

There are links representing associative, attentional relationships:

• HebbianLink
• AsymmetricHebbianLink
• InverseHebbianLink
• SymmetricInverseHebbianLink

These connote associations between their arguments, i.e. they connote that the entities represented by the two arguments occurred in the same situation or context, for instance

    HebbianLink happy smiling
    AsymmetricHebbianLink dead rotten
    InverseHebbianLink dead breathing

The asymmetric HebbianLink indicates that when the first argument is present in a situation, the second is also often present. The symmetric (default) version indicates that this relationship holds in both directions. The inverse versions indicate the negative relationship: e.g. when one argument is present in a situation, the other argument is often not present.

13.3.6 Procedure Nodes

There are nodes representing various sorts of procedures; these are kinds of ProcedureNode, e.g.

• SchemaNode, indicating any procedure
• GroundedSchemaNode, indicating any procedure associated in the system with a Combo program or C++ function allowing the procedure to be executed
• PredicateNode, indicating any predicate that associates a list of arguments with an output truth value
• GroundedPredicateNode, indicating a predicate associated in the system with a Combo program or C++ function allowing the predicate’s truth value to be evaluated on a given specific list of arguments

ExecutionLinks and EvaluationLinks record the activity of SchemaNodes and PredicateNodes. We have seen many examples of EvaluationLinks above.
Example ExecutionLinks would be:

    ExecutionLink step_forward
    ExecutionLink step_forward 5
    ExecutionLink
        +
        ListLink
            NumberNode: 2
            NumberNode: 3

The first example indicates that the schema "step forward" has been executed. The second indicates that it has been executed with an argument of "5" (meaning, perhaps, that 5 steps forward have been attempted). The last indicates that the "+" schema has been executed on the argument list (2, 3), presumably resulting in an output of 5.

The output of a schema execution may be indicated using an ExecutionOutputLink; for example,

    ExecutionOutputLink
        +
        ListLink
            NumberNode: 2
            NumberNode: 3

refers to the value "5" (as a NumberNode).

13.3.7 Links for Special External Data Types

Finally, there are also Atom types referring to specific types of data important to using OpenCog in specific contexts. For instance, there are Atom types referring to general natural language data types, such as

• WordNode
• SentenceNode
• WordInstanceNode
• DocumentNode

plus more specific ones referring to relationships that are part of link-grammar parses of sentences

• FeatureNode
• FeatureLink
• LinkGrammarRelationshipNode
• LinkGrammarDisjunctNode

or to RelEx semantic interpretations of sentences

• DefinedLinguisticConceptNode
• DefinedLinguisticRelationshipNode
• PrepositionalRelationshipNode

There are also Atom types corresponding to entities important for embodying OpenCog in a virtual world, e.g.

• ObjectNode
• AvatarNode
• HumanoidNode
• UnknownObjectNode
• AccessoryNode

13.3.8 Truth Values and Attention Values

CogPrime Atoms (Nodes and Links) are quantified with truth values that, in their simplest form, have two components, one representing probability (strength) and the other representing weight of evidence. They are also quantified with attention values that have two components, short-term and long-term importance, representing the estimated value of the Atom on immediate and long-term time-scales.
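The two-component truth and attention values can be sketched as small Python records. The sketch assumes, following one common PLN convention rather than anything stated in this section, that weight of evidence is stored as an evidence count n and mapped to a confidence c = n / (n + k) for a tunable constant k; the `revise` function, which pools two estimates by count-weighted averaging on the assumption that their evidence does not overlap, is likewise illustrative. All class and function names are hypothetical, not OpenCog's API.

```python
from dataclasses import dataclass

# Hypothetical sketch; names and the constant k are illustrative, not OpenCog's API.


@dataclass(frozen=True)
class TruthValue:
    strength: float  # probability-like component, in [0, 1]
    count: float     # weight of evidence, expressed as an amount of evidence n >= 0

    def confidence(self, k: float = 10.0) -> float:
        """Map the evidence count n to a confidence in [0, 1) via n / (n + k)."""
        return self.count / (self.count + k)


@dataclass
class AttentionValue:
    sti: float  # short-term importance: value of the Atom on immediate time-scales
    lti: float  # long-term importance: value of the Atom on long time-scales


def revise(a: TruthValue, b: TruthValue) -> TruthValue:
    """Pool two estimates of the same Atom, assuming their evidence does not overlap."""
    n = a.count + b.count
    if n == 0:
        # No evidence on either side: fall back to the plain mean of the strengths.
        return TruthValue((a.strength + b.strength) / 2, 0.0)
    return TruthValue((a.strength * a.count + b.strength * b.count) / n, n)


weak = TruthValue(strength=1.0, count=10.0)
strong = TruthValue(strength=0.0, count=30.0)
pooled = revise(weak, strong)
av = AttentionValue(sti=0.7, lti=0.1)
print(pooled, pooled.confidence(), av)
```

The pooling step shows why the second component matters: ten observations supporting a strength of 1.0 are outweighed by thirty supporting 0.0, and the pooled estimate carries more confidence than either input.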
In practice many Atoms are labeled with CompositeTruthValues rather than elementary ones. A composite truth value contains many component truth values, representing truth values of the Atom in different contexts and according to different estimators.

It is important to note that the CogPrime declarative knowledge representation is neither a neural net nor a semantic net, though it does have some commonalities with each of these traditional representations. It is not a neural net because it has no activation values and involves no attempts at low-level brain modeling; however, attention values are very loosely analogous to time-averages of neural net activations. It is not a semantic net because of the broad scope of the Atoms in the network: for example, Atoms may represent percepts, procedures, or parts of concepts. Most CogPrime Atoms have no corresponding English label. However, most CogPrime Atoms do have probabilistic truth values, allowing logical semantics.

13.4 Knowledge Representation via Attractor Neural Networks

Now we turn to global, implicit knowledge representation, beginning with formal neural net models, briefly discussing the brain, and then turning back to CogPrime. First, this section reviews some relevant material from the literature regarding the representation of knowledge using attractor neural nets. It mixes well-established fact with more speculative material.

13.4.1 The Hopfield neural net model

Hopfield networks [Hop82] are attractor neural networks often used as associative memories. A Hopfield network with N neurons can be trained to store a set of P bipolar patterns, where each pattern p has N bipolar (±1) values. A Hopfield net typically has symmetric weights with no self-connections. The weight of the connection between neurons i and j is denoted by $w_{ij}$.
To apply a Hopfield network to a given input pattern p, its activation state is set to the input pattern, and neurons are updated asynchronously, in random order, until the network converges to the closest fixed point. An often-used activation function for neuron i is

$$ y_i = \operatorname{sign}\Big( \sum_{j \neq i} w_{ij}\, y_j \Big) $$

Training a Hopfield network therefore involves finding a set of weights $w_{ij}$ that stores the training patterns as attractors of its network dynamics, allowing future recall of these patterns from possibly noisy inputs. Originally, Hopfield used a Hebbian rule to determine the weights:

$$ w_{ij} = \sum_{p=1}^{P} p_i\, p_j \qquad (i \neq j), \qquad w_{ii} = 0 $$

Typically, Hopfield networks are fully connected. Experimental evidence, however, suggests that the majority of the connections can be removed without significantly impacting the network’s capacity or dynamics. Our experimental work uses sparse Hopfield networks.

13.4.1.1 Palimpsest Hopfield nets with a modified learning rule

In [SV99] a new learning rule is presented which both increases the Hopfield network capacity and turns it into a “palimpsest”, i.e. a network that can continuously learn new patterns while forgetting old ones in an orderly fashion. Using this training rule, weights are initially set to zero and updated for each new pattern p to be learned according to

$$ h_{ij} = \sum_{k=1,\; k \neq i,j}^{N} w_{ik}\, p_k $$

$$ \Delta w_{ij} = \frac{1}{N} \left( p_i\, p_j - h_{ij}\, p_j - h_{ji}\, p_i \right) $$

13.4.2 Knowledge Representation via Cell Assemblies

Hopfield nets and their ilk play a dual role: as computational algorithms, and as conceptual models of brain function. In CogPrime they are used as inspiration for slightly different, artificial-economics-based computational algorithms; but their hypothesized relevance to brain function is nevertheless of interest in a CogPrime context, as it gives some hints about the potential connection between low-level neural net mechanics and higher-level cognitive dynamics.
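The recall dynamics and both learning rules of Section 13.4.1 can be sketched in a few lines of NumPy. This is an illustrative dense implementation, not the sparse networks of the authors' experimental work; the bipolar encoding and the update and learning formulas follow the text, while the tie-breaking convention sign(0) = +1, the sweep limit and all function names are our own assumptions.

```python
import numpy as np

# Sketch of the Section 13.4.1 rules on bipolar (+1/-1) patterns. Dense weights,
# sign(0) := +1 and the function names are our own assumptions.


def hebbian_weights(patterns: np.ndarray) -> np.ndarray:
    """Hopfield's rule: w_ij = sum over patterns of p_i * p_j, with no self-connections."""
    w = (patterns.T @ patterns).astype(float)  # patterns has shape (P, N)
    np.fill_diagonal(w, 0.0)
    return w


def palimpsest_update(w: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Learn one pattern p with the palimpsest rule of [SV99]; returns new weights."""
    n = len(p)
    field = w @ p                        # sum_k w_ik p_k (k = i contributes 0 since w_ii = 0)
    h = field[:, None] - w * p[None, :]  # h_ij: the same sum with k = j also excluded
    dw = (np.outer(p, p) - h * p[None, :] - h.T * p[:, None]) / n
    w_new = w + dw
    np.fill_diagonal(w_new, 0.0)
    return w_new


def recall(w: np.ndarray, probe: np.ndarray, rng: np.random.Generator,
           max_sweeps: int = 100) -> np.ndarray:
    """Asynchronous updates in random order until no neuron changes (a fixed point)."""
    y = probe.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(y)):
            s = 1 if w[i] @ y >= 0 else -1
            if s != y[i]:
                y[i] = s
                changed = True
        if not changed:
            break
    return y


# Two orthogonal 16-unit patterns and a copy of the first with two bits flipped.
p1 = np.array([1] * 8 + [-1] * 8)
p2 = np.array([1, -1] * 8)
noisy = p1.copy()
noisy[[0, 9]] *= -1

rng = np.random.default_rng(0)
w_hebb = hebbian_weights(np.stack([p1, p2]))
w_pal = np.zeros((16, 16))
for p in (p1, p2):
    w_pal = palimpsest_update(w_pal, p)

print(recall(w_hebb, noisy, rng), recall(w_pal, noisy, rng))
```

With only two orthogonal patterns, the palimpsest rule happens to produce weights proportional to the Hebbian ones, so both networks restore the corrupted probe; the two rules diverge as more patterns are stored, which is where the palimpsest rule's higher capacity and orderly forgetting come into play.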
Hopfield nets lead naturally to a hypothesis about neural knowledge representation, which holds that a distinct mental concept is represented in the brain as either:

1. a set of “cell assemblies”, where each assembly is a network of neurons that are interlinked in such a way as to fire in a (perhaps nonlinearly) synchronized manner
2. a distinct temporal activation pattern, which may occur in any one (or more) of a particular set of cell assemblies

For instance, this hypothesis is perfectly coherent if one interprets a “mental concept” as a SMEPH (defined in Chapter 14) ConceptNode, i.e. a fuzzy set of perceptual stimuli to which the organism systematically reacts in different ways. Also, although we will focus mainly on declarative knowledge here, we note that the same basic representational ideas can be applied to procedural and episodic knowledge: these may be hypothesized to correspond to temporal activation patterns as characterized above.

In the biology literature, perhaps the best-articulated modern theories championing the cell assembly view are those of Gunther Palm [Pal82, HAG07] and Susan Greenfield [SF05, CSG07]. Palm focuses on the dynamics of the formation and interaction of assemblies of cortical columns. Greenfield argues that each concept has a core cell assembly, and that when the concept rises to the focus of attention, it recruits a number of other neurons beyond its core characteristic assembly into a “transient ensemble.”¹

It is worth noting that there may be multiple redundant assemblies representing the same concept, potentially recruiting similar transient assemblies when highly activated. The importance of repeated, slightly varied copies of the same subnetwork has been emphasized by Edelman [Ede93], among other neural theorists.
¹ The larger an ensemble is, she suggests, the more vivid it is as a conscious experience: a hypothesis that accords well with the hypothesis made in [Goe06b] that a more informationally intense pattern corresponds to a more intensely conscious quale. But we need not digress extensively into matters of consciousness for the present purposes.

13.5 Neural Foundations of Learning

Now we move from knowledge representation to learning, which is after all nothing but the adaptation of represented knowledge based on stimulus, reinforcement and spontaneous activity. While our focus in this chapter is on representation, we cannot make our points about glocal knowledge representation in neural-net-type systems without discussing some aspects of learning in these systems.

13.5.1 Hebbian Learning

The most common and plausible assumption about learning in the brain is that synaptic connections between neurons are adapted via some variant of Hebbian learning. The original Hebbian learning rule, proposed by Donald Hebb in his 1949 book [Heb49], was roughly:

1. The weight of the synapse x → y increases if x and y fire at roughly the same time.
2. The weight of the synapse x → y decreases if x fires at a certain time but y does not.

Over the years since Hebb’s original proposal, many neurobiologists have sought evidence that the brain actually uses such a method. Among other things, they have so far found a great deal of evidence for the following learning rule [DC02, LS05]:

1. The weight of the synapse x → y increases if x fires shortly before y does.
2. The weight of the synapse x → y decreases if x fires shortly after y does.

The new element here, not foreseen by Hebb, is the “postsynaptic depression” involved in rule component 2. Of course, this simple rule does not sum up all the recent research on Hebbian-type learning mechanisms in the brain.
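As a rough illustration, the two-part timing rule above (now usually called spike-timing-dependent plasticity) can be written as a function of the interval between the two spikes. The exponentially decaying timing window is a common modeling choice, not something stated in the text, and the amplitudes `a_plus`, `a_minus` and time constant `tau` (in milliseconds) are hypothetical values chosen only for illustration.

```python
import math
from typing import Iterable

# Hypothetical parameters: the amplitudes and the 20 ms time constant are illustrative.


def stdp_delta(t_pre: float, t_post: float,
               a_plus: float = 0.010, a_minus: float = 0.012,
               tau: float = 20.0) -> float:
    """Weight change for synapse x -> y, given spike times (ms) of x (pre) and y (post)."""
    dt = t_post - t_pre
    if dt > 0:   # x fired shortly before y: strengthen (rule component 1)
        return a_plus * math.exp(-dt / tau)
    if dt < 0:   # x fired shortly after y: weaken (rule component 2)
        return -a_minus * math.exp(dt / tau)
    return 0.0   # exactly simultaneous spikes: no change in this sketch


def total_change(pre_spikes: Iterable[float], post_spikes: Iterable[float]) -> float:
    """Sum the pairwise contributions of all pre/post spike pairs."""
    post = list(post_spikes)
    return sum(stdp_delta(tp, tq) for tp in pre_spikes for tq in post)


# x reliably fires 5 ms before y, so the synapse x -> y is strengthened overall.
print(total_change([0.0, 100.0, 200.0], [5.0, 105.0, 205.0]))
```

Reversing the two spike trains makes the net change negative, which is the postsynaptic depression of rule component 2.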
The real biological story underlying these approximate rules is quite complex, involving many particulars to do with various neurotransmitters. Ill-understood details aside, however, there is an increasing body of evidence that not only does this sort of learning occur in the brain, but it leads to distributed experience-based neural modification: that is, one instance of synaptic modification causes another instance of synaptic modification, which causes another, and so forth² [Bi01].

² This has been observed in “model systems” consisting of neurons extracted from a brain, hooked together in a laboratory setting and monitored; measurement of such dynamics in vivo is obviously more difficult.

13.5.2 Virtual Synapses and Hebbian Learning Between Assemblies

Hebbian learning is conventionally formulated in terms of individual neurons, but it can be extended naturally to assemblies by defining “virtual synapses” between assemblies. Since assemblies are sets of neurons, one can view a synapse as linking two assemblies if it links two neurons, one in each assembly. One can then view two assemblies as being linked by a bundle of synapses. We can define the weight of the synaptic bundle from assembly A1 to assembly A2 as the number w such that (the change in the mean activation of A2 that occurs at time t + ε) is on average closest to w × (the amount of energy flowing through the bundle from A1 to A2 at time t). So when A1 sends an amount x of energy along the synaptic bundle pointing from A1 to A2, A2’s mean activation is on average incremented or decremented by an amount w × x.

In a similar way, one can define the weight of a bundle of synapses between a certain static or temporal activation-pattern P1 in assembly A1 and another static or temporal activation-pattern P2 in assembly A2.
Namely, this may be defined as the number w such that (the amount of energy flowing through the bundle from A1 to A2 at time t) × w best approximates (the probability that P2 is present in A2 at time t + ε), when averaged over all times t during which P1 is present in A1.

It is not hard to see that Hebbian learning on real synapses between neurons implies Hebbian learning on these virtual synapses between cell assemblies and activation-patterns. These ideas may be developed further to build a connection between neural knowledge representation and probabilistic logical knowledge representation such as is used in CogPrime’s Probabilistic Logic Networks formalism; this connection will be pursued at the end of Chapter 34, once more relevant background has been presented.

13.5.3 Neural Darwinism

A notion quite similar to Hebbian learning between assemblies has been pursued by Nobelist Gerald Edelman in his theory of neuronal group selection, or “Neural Darwinism.” Edelman won a Nobel Prize for his work in immunology, which, like most modern immunology, was based on F. Macfarlane Burnet’s theory of “clonal selection” [Bur62], which states that antibody types in the mammalian immune system evolve by a form of natural selection. From Edelman’s point of view, it was only natural to transfer the evolutionary idea from one mammalian body system (the immune system) to another (the brain).

The starting point of Neural Darwinism is the observation that neuronal dynamics may be analyzed in terms of the behavior of neuronal groups. The strongest evidence in favor of this conjecture is physiological: many of the neurons of the neocortex are organized in clusters, each containing, say, 10,000 to 50,000 neurons.
Once one has committed oneself to looking at such groups, the next step is to ask how these groups are organized, which leads to Edelman’s concept of “maps.” A “map,” in Edelman’s terminology, is a connected set of groups with the property that when one of the inter-group connections in the map is active, others will often tend to be active as well. Maps are not fixed over the life of an organism. They may be formed and destroyed in a very simple way: the connection between two neuronal groups may be “strengthened” by increasing the weights of the connections linking neurons in one group with neurons in the other, and “weakened” by decreasing those weights. If we replace “map” with “cell assembly” we arrive at a concept very similar to the one described in the previous subsection.

Edelman then makes the following hypothesis: the large-scale dynamics of the brain are dominated by the natural selection of maps. Those maps which are active when good results are obtained are strengthened; those which are active when bad results are obtained are weakened. And maps are continually mutated by the natural chaos of neural dynamics, thus providing new fodder for the selection process. By use of computer simulations, Edelman and his colleagues have shown that formal neural networks obeying this rule can carry out fairly complicated acts of perception.

In general-evolution language, what is posited here is that organisms like humans contain chemical signals that signify organism-level success of various types, and that these signals serve as a “fitness function” correlating with the evolutionary fitness of neuronal maps.

In Neural Darwinism and his other related books and papers, Edelman goes far beyond this crude sketch, presenting neuronal group selection as a collection of precise biological hypotheses and presenting evidence in favor of a number of these hypotheses.
However, we consider that the basic concept of neuronal group selection is largely independent of the biological particularities in terms of which Edelman has phrased it. We suspect that the mutation and selection of “transformations” or “maps” is a necessary component of the dynamics of any intelligent system. As we will see later on (e.g. in Chapter 42 of Part 2), this business of maps is extremely important to CogPrime. CogPrime does not have simulated biological neurons and synapses, but it does have Nodes and Links that in some contexts play loosely similar roles. We sometimes think of CogPrime Nodes and Links as being very roughly analogous to Edelman’s neuronal clusters and emergent inter-cluster links. And we have maps among CogPrime Nodes and Links, just as Edelman has maps among his neuronal clusters. Maps are not the sole bearers of meaning in CogPrime, but they are significant ones.

There is a very natural connection between Edelman-style brain evolution and the ideas about cognitive evolution presented in Chapter 3. Edelman proposes a fairly clear mechanism via which patterns that survive a while in the brain are differentially likely to survive a long time: this is basic Hebbian learning, which in Edelman’s picture operates between neuronal groups. And, less directly, Edelman’s perspective also provides a mechanism by which intense patterns will be differentially selected in the brain, because on the level of neural maps, pattern intensity corresponds to the combination of compactness and functionality. Among a number of roughly equally useful maps serving the same function, the more compact one will be more likely to survive over time, because it is less likely to be disrupted by other brain processes (such as other neural maps seeking to absorb its component neuronal groups into themselves).
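Edelman's selection hypothesis (maps active during good outcomes are strengthened, those active during bad outcomes are weakened, and neural noise continually perturbs the population) can be caricatured as a toy loop. This is our own illustrative reduction, not Edelman's model or CogPrime's map machinery: each map is reduced to a single strength, a trial is a set of active maps plus an outcome of +1 (good) or -1 (bad), and the "mutation" supplied by neural chaos is stood in for by small Gaussian perturbations of the strengths. All names and parameter values are hypothetical.

```python
import random
from typing import Dict, Iterable, Optional, Set, Tuple

# Toy caricature of map selection; names and parameter values are hypothetical.


def select_maps(strengths: Dict[str, float],
                trials: Iterable[Tuple[Set[str], int]],
                rate: float = 0.1,
                noise: float = 0.01,
                rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Strengthen maps active during good outcomes, weaken those active during bad ones."""
    rng = rng or random.Random(0)
    s = dict(strengths)
    for active, outcome in trials:
        for m in active:
            s[m] = max(0.0, s[m] + rate * outcome)          # selection step
        for m in s:
            s[m] = max(0.0, s[m] + rng.gauss(0.0, noise))  # random variation
    return s


# "reach" and "grasp" are co-active on successful trials; "flail" on failed ones.
trials = [({"reach", "grasp"}, +1), ({"flail"}, -1)] * 20
final = select_maps({"reach": 1.0, "grasp": 1.0, "flail": 1.0}, trials)
print(sorted(final.items(), key=lambda kv: -kv[1]))
```

Nothing here models compactness or the structural mutation of maps themselves; the sketch only shows how an outcome signal acting as a fitness function separates maps over repeated trials.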
Edelman’s neuroscience remains speculative, since so much remains unknown about human neural structure and dynamics; but it does provide a tentative and plausible connection between evolutionary neurodynamics and the more abstract sort of evolution that patternist philosophy posits to occur in the realm of mind-patterns.

13.6 Glocal Memory

A glocal memory is one that transcends the global/local dichotomy and incorporates both aspects in a tightly interconnected way. Here we make the glocal memory concept more precise, and describe its incarnation in the context of attractor neural nets (which is similar to its incarnation in CogPrime, to be elaborated in later chapters). Though our main interest here is in glocality in CogPrime, we also suggest that glocality may be a critical property to consider when analyzing human, animal and AI memory more broadly.

The notion of glocal memory has implicitly occurred in a number of prior brain theories (without use of the neologism “glocal”), e.g. [Cal96] and [Goe01], but it has not previously been explicitly developed. However, the concept has risen to the fore in our recent AI work, and so we have chosen to flesh it out more fully in [HG08], [GPI+10] and the present section.

Glocal memory overcomes the dichotomy between localized memory (in which each memory item is stored in a single location within an overall memory structure) and distributed memory (in which a memory item is stored as an aspect of a multi-component memory system, in such a way that the same set of multiple components stores a large number of memories). In a glocal memory system, most memory items are stored both locally and globally, with the property that eliciting either one of the two records of an item tends to also elicit the other one.
Glocal memory applies to multiple forms of memory; however, we will focus largely on perceptual and declarative memory in our detailed analyses here, so as to conserve space and keep the discussion simple. The central idea of glocal memory is that (perceptual, declarative, episodic, procedural, etc.) items may be stored in memory in the form of paired structures that we call (key, map) pairs. Of course the idea of a “pair” is abstract, and such pairs may manifest themselves quite differently in different sorts of memory systems (e.g. brains versus non-neuromorphic AI systems). The key is a localized version of the item, and records some significant aspects of the item in a simple and crisp way. The map is a dispersed, distributed version of the item, which represents the item as a (to some extent, dynamically shifting) combination of fragments of other items. The map includes the key as a subset; activation of the key generally (but not necessarily always) causes activation of the map; and changes in the memory item will generally involve complexly coordinated changes at both the key and the map level.

Memory is one area where animal brain architecture differs radically from the von Neumann architecture underlying nearly all contemporary general-purpose computers. Von Neumann computers separate memory from processing, whereas in the human brain there is no such distinction. In fact, it’s arguable that in most cases the brain contains no memory apart from processing: human memories are generally constructed in the course of remembering [Ros88], which gives human memory a strong capability for “filling in gaps” of remembered experience and knowledge, and also causes problems with inaccurate remembering in many contexts [BF71, RM95]. We believe the constructive aspect of memory is largely associated with its glocality.
The remainder of this section presents a fuller formalization of the glocal memory concept, which is then taken up further in three later chapters:

• Chapter ?? discusses the potential implementation of glocal memory in the human brain
• Chapter ?? discusses the implementation of glocal memory in attractor neural net systems
• Chapter 23 presents Glocal Economic Attention Networks (ECANs), rough analogues of glocal Hopfield nets that play a central role in CogPrime.

Our hypothesis of the potential general importance of glocality as a property of memory systems (beyond just the CogPrime architecture) remains somewhat speculative. The presence of glocality in human and animal memory is strongly suggested but not firmly demonstrated by available neuroscience data; and the general value of glocality in the context of artificial brains and minds is also not yet demonstrated, as the whole field of artificial brain and mind building remains in its infancy. However, the utility of glocal memory for CogPrime is not tied to this more general, speculative theme: glocality may be useful in CogPrime even if we’re wrong that it plays a significant role in the brain and in intelligent systems more broadly.

13.6.1 A Semi-Formal Model of Glocal Memory

To explain the notion of glocal memory more precisely, we will introduce a simple semi-formal model of a system S that uses a memory to record information relevant to the actions it carries out. The overall concept of glocal memory should not be considered as restricted to this particular model. This model is not intended for maximal generality, but is intended to encompass a variety of current AI system designs and formal neurological models. In this model, we will consider S’s memory subsystem as a set of objects we’ll call “tokens,” embedded in some metric space.
The metric in the space, which we will call the “basic distance” of the memory, generally will not be defined in terms of the semantics of the items stored in the memory, though it may come to shape the memory’s dynamics through the specific architecture and evolution of the memory. Note that these tokens are not, in general, intended to map one-to-one onto meaningful items stored in the memory. The “tokens” are the raw materials that the memory arranges in various patterns in order to store items. We assume that each token, at each point in time, may meaningfully be assigned a certain quantitative “activation level.” Also, tokens may have other numerical or discrete quantities associated with them, depending on the particular memory architecture. Finally, tokens may relate to other tokens, so that optionally a token may come equipped with an (ordered or unordered) list of other tokens. To understand the meaning of the activation levels, one should think of S’s memory subsystem as being coupled with an action-selection subsystem that dynamically chooses the actions to be taken by the overall system in which the two subsystems are embedded. Each combination of actions, in each particular type of context, will generally be associated with the activation of certain tokens in memory. Then, as analysts of the system S, we may associate each token T with an “activation vector” v(T, t), whose value for each discrete time t consists of the activation of the token T at time t. So, the 50th entry of the vector corresponds to the activation of the token at the 50th time step. “Items stored in memory” over a certain period of time may then be defined as clusters in the set of activation vectors associated with memory during that period of time.
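To make the activation-vector construction concrete, here is a minimal Python sketch. It assumes that activations are logged as one dictionary per discrete time step (token to activation, with absent tokens treated as 0), and it uses single-linkage clustering with a Euclidean distance threshold as one arbitrary choice of clustering method; the text does not commit to any particular clustering algorithm, and all function names here are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance between equal-length sequences (Python 3.8+)


def activation_vector(activation_log, token):
    """v(T, t) for t = 0 .. len(log) - 1; a token absent at a step has activation 0."""
    return [step.get(token, 0.0) for step in activation_log]


def items_as_clusters(activation_log, tokens, threshold):
    """Treat clusters of similar activation vectors as the items stored in memory.

    Single-linkage clustering: two tokens share a cluster if a chain of tokens
    connects them with every link within `threshold` (Euclidean distance).
    """
    vectors = {t: activation_vector(activation_log, t) for t in tokens}
    parent = {t: t for t in tokens}  # union-find forest

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(tokens, 2):
        if dist(vectors[a], vectors[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in tokens:
        clusters.setdefault(find(t), set()).add(t)
    # A lone token shows no co-activation pattern, so it is not reported as an item.
    return [c for c in clusters.values() if len(c) > 1]


# Hypothetical log: tokens a, b are co-active early, c, d later; e fires once, weakly.
log = [
    {"a": 1.0, "b": 0.9},
    {"a": 0.8, "b": 1.0, "e": 0.3},
    {"c": 1.0, "d": 0.9},
    {"c": 0.9, "d": 1.0},
]
items = items_as_clusters(log, ["a", "b", "c", "d", "e"], threshold=0.5)
```

On this toy log the sketch groups {a, b} and {c, d} as two items and leaves the sporadically active token e unassigned. A different clustering method or threshold would give a different, equally legitimate, analyst's view of the items.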
Note that the system S itself may explicitly recognize and remember patterns regarding what items are stored in its memory; but, from an external analyst’s perspective, the set of items in S’s memory is not restricted to the ones that S has explicitly recognized as memory items. The “localization” of a memory item may be defined as the degree to which the various tokens involved in the item are close to each other according to the metric in the memory metric-space. This degree may be formalized in various ways, but choosing a particular quantitative measure is not important here. A highly localized item may be called “local” and a not-very-localized item may be called “global.” We may define the “activation distance” of two tokens as the distance between their activation vectors. We may then say that a memory is “well aligned” to the extent that there is a correlation between the activation distance of tokens and the basic distance of the memory metric-space. Given the above set-up, the basic notion of glocal memory can be stated fairly simply. A glocal memory is one:

• that is reasonably well-aligned (i.e. the correlation between activation and basic distance is significantly greater than random)
• in which most memory items come in pairs, consisting of one local item and one global item, so that activation of the local item (the “key”) frequently leads in the near future to activation of the global item (the “map”)

Obviously, in the scope of all possible memory structures constructible within the above formalism, glocal memories are going to be very rare and special. But we suggest that they are important, because they are generally going to be the most effective way for intelligent systems to structure their memories. Note also that many memories without glocal structure may be “well-aligned” in the above sense.
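The quantities just introduced, and the key-to-map criterion in the second bullet, can be sketched as follows. This is an illustration under stated assumptions, not a definitive formalization: token positions in the memory's metric space are given as coordinate tuples, well-alignment is measured as a Pearson correlation over all token pairs, localization is taken as mean pairwise basic distance (smaller means more local), and “frequently leads in the near future” is read as the fraction of key activations followed, within a fixed window, by activation of at least half of the map's non-key tokens. All names, the window and the fraction are hypothetical choices.

```python
from itertools import combinations
from math import dist, sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def alignment(positions, vectors):
    """Correlation, over all token pairs, between basic distance (positions in the
    memory's metric space) and activation distance (between activation vectors)."""
    pairs = list(combinations(sorted(positions), 2))
    basic = [dist(positions[a], positions[b]) for a, b in pairs]
    activation = [dist(vectors[a], vectors[b]) for a, b in pairs]
    return pearson(basic, activation)


def localization(item, positions):
    """Mean pairwise basic distance among an item's tokens (at least two): small = local."""
    pairs = list(combinations(sorted(item), 2))
    return sum(dist(positions[a], positions[b]) for a, b in pairs) / len(pairs)


def key_to_map_rate(key, map_tokens, active_sets, window=2, frac=0.5):
    """Fraction of steps at which the whole key is active that are followed, within
    `window` steps, by a step where at least `frac` of `map_tokens` is active.
    `map_tokens` here is the map's non-key part."""
    hits = total = 0
    for t, active in enumerate(active_sets):
        if key <= active:  # every key token is active at step t
            total += 1
            later = active_sets[t + 1 : t + 1 + window]
            if any(len(map_tokens & s) >= frac * len(map_tokens) for s in later):
                hits += 1
    return hits / total if total else 0.0
```

For example, with two key tokens placed close together in the metric space and three map tokens spread out, `localization` gives the key a smaller value than the map, which is the local/global contrast the definition calls for.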
An example of a predominantly local memory structure, in which nearly all significant memory items are local according to the above definition, is the Cyc logical reasoning engine [LG90]. To cast the Cyc knowledge base in the present formal model, the tokens are logical predicates. Cyc does not have a built-in notion of activation, but one may conceive the activation of a logical formula in Cyc as the degree to which the formula is used in reasoning or query processing during a certain interval in time. And one may define a basic metric for Cyc by associating each predicate with its extension (the set of satisfying inputs), and defining the distance between two predicates as the size of the symmetric difference of their extensions. Cyc is reasonably well-aligned, but given the dynamics of its querying and reasoning engines, it is basically a local memory structure without significant global memory structure.

On the other hand, an example of a predominantly global memory structure, in which nearly all significant memory items are global according to the above definition, is the Hopfield associative memory network [Ami89]. Here memories are stored in the pattern of weights associated with synapses within a network of formal neurons, and each memory in general involves a large number of the neurons in the network. To cast the Hopfield net in the present formal model, the tokens are neurons and synapses; the activations are neural net activations; the basic distance between two neurons A and B may be derived from the percentage of the time that stimulating one of the neurons leads to the other one firing (e.g. as one minus that fraction, so that neurons that reliably trigger each other are close); and to calculate a basic distance involving a synapse, one may associate the synapse with its source and target neurons. With these definitions, a Hopfield network is a well-aligned memory, and (by intentional construction) a markedly global one. Local memory items will be very rare in a Hopfield net.
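The basic metric proposed for Cyc can be illustrated on a finite domain. In this sketch, plain Python predicates stand in for Cyc predicates (an assumption made purely for illustration; no Cyc interface is used), and the distance is the size of the symmetric difference of the two extensions. Strictly speaking this is a pseudometric on predicates, since two different predicates with the same extension are at distance 0.

```python
def extension(predicate, domain):
    """The set of inputs in a finite domain that satisfy the predicate."""
    return frozenset(x for x in domain if predicate(x))


def basic_distance(p, q, domain):
    """Size of the symmetric difference of the two predicates' extensions.

    This equals the Hamming distance between the extensions' indicator vectors,
    so it satisfies the triangle inequality.
    """
    return len(extension(p, domain) ^ extension(q, domain))


def is_even(n):
    return n % 2 == 0


def divisible_by_4(n):
    return n % 4 == 0


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# Hypothetical domain: "even" and "divisible by 4" overlap heavily, so they are
# close; "even" and "prime" share only the number 2, so they are far apart.
domain = range(1, 21)
```

On the integers 1 to 20, “even” and “divisible by 4” come out at distance 5, while “even” and “prime” are at distance 16.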
While predominantly local and predominantly global memories may have great value for particular applications, our suggestion is that they also have inherent limitations. If so, this means that the most useful memories for general intelligence are going to be those that involve both local and global memory items in central roles. However, this is a more general and less risky claim than the assertion that glocal memory structure as defined above is important. This is because “glocal” as defined above doesn’t just mean “neither predominantly global nor predominantly local.” Rather, it refers to a specific pattern of coordination between local and global memory items: what we have called the “keys and maps” pattern.

13.6.2 Glocal Memory in the Brain

Science’s understanding of human brain dynamics is still very primitive, one manifestation of which is the fact that we really don’t understand how the brain represents knowledge, except in some very simple respects. So anything anyone says about knowledge representation in the brain, at this stage, has to be considered highly speculative. Existing neuroscience knowledge does imply constraints on how knowledge representation in the brain may work, but these are relatively loose constraints. These constraints do imply that, for instance, the brain is neither a relational database (in which information is stored in a wholly localized manner), nor a collection of “grandmother neurons” that respond individually to high-level percepts or concepts, nor a simple Hopfield-type neural net (in which all memories are attractors globally distributed across the whole network). But they don’t tell us nearly enough to, for instance, create a formal neural net model that can confidently be said to represent knowledge in the manner of the human brain.
As a first example of the current state of knowledge, we’ll discuss here a series of papers regarding the neural representation of visual stimuli [QaGKKF05, QKKF08], which deal with the fascinating discovery of a subset of neurons in the medial temporal lobe (MTL) that are selectively activated by strikingly different pictures of given individuals, landmarks or objects, and in some cases even by letter strings. For instance, their 2005 paper, titled “Invariant visual representation by single neurons in the human brain,” notes that in one case a unit responded only to three completely different images of the ex-president Bill Clinton. Another unit (from a different patient) responded only to images of The Beatles, another one to cartoons from The Simpsons television series, and another one to pictures of the basketball player Michael Jordan. Their 2008 follow-up paper backed away from the more extreme interpretation, in its title as well as its conclusion, with the title “Sparse but not ‘Grandmother-cell’ coding in the medial temporal lobe.” As the authors emphasize there:

Given the very sparse and abstract representation of visual information by these neurons, they could in principle be considered as ‘grandmother cells’. However, we give several arguments that make such an extreme interpretation unlikely. . . . MTL neurons are situated at the juncture of transformation of percepts into constructs that can be consciously recollected. These cells respond to percepts rather than to the detailed information falling on the retina. Thus, their activity reflects the full transformation that visual information undergoes through the ventral pathway. A crucial aspect of this transformation is the complementary development of both selectivity and invariance.
The evidence presented here, obtained from recordings of single-neuron activity in humans, suggests that a subset of MTL neurons possesses a striking invariant representation for consciously perceived objects, responding to abstract concepts rather than more basic metric details. This representation is sparse, in the sense that responsive neurons fire only to very few stimuli (and are mostly silent except for their preferred stimuli), but it is far from a Grandmother-cell representation. The fact that the MTL represents conscious abstract information in such a sparse and invariant way is consistent with its prominent role in the consolidation of long-term semantic memories.

It’s interesting to note how inadequate the [QKKF08] data really are for exploring the notion of glocal memory in the brain. Suppose it’s the case that individual visual memories correspond to keys consisting of small neuronal subnetworks, and maps consisting of larger neuronal subnetworks. Then it would not be at all surprising if neurons in the “key” network corresponding to a visual concept like “Bill Clinton’s face” were found to respond differentially to the presentation of appropriate images. Yet it would also be wrong to overinterpret such data as implying that the key network somehow comprises the “representation” of Bill Clinton’s face in the individual’s brain. In fact this key network would comprise only one aspect of said representation. In the glocal memory hypothesis, a visual memory like “Bill Clinton’s face” would be hypothesized to correspond to an attractor spanning a significant subnetwork of the individual’s brain, although this subnetwork might still occupy only a small fraction of the neurons in the brain (say, 1/100 or less), since there are very many neurons available. This attractor would constitute the map. But then, there would be a much smaller number of neurons serving as key to unlock this map: i.e.
if a few of these key neurons were stimulated, then the overall attractor pattern in the map as a whole would unfold and come to play a significant role in the overall brain activity landscape. In prior publications [Goe97] the primary author explored this hypothesis in more detail in terms of the known architecture of the cortex and the mathematics of complex dynamical attractors. So, one possible interpretation of the [QKKF08] data is that the MTL neurons they’re measuring are part of key networks that correspond to broader map networks recording percepts. The map networks might then extend more broadly throughout the brain, beyond the MTL and into other perceptual and cognitive areas of cortex. Furthermore, in this case, if some MTL key neurons were removed, the maps might well regenerate the missing keys (as would happen e.g. in the glocal Hopfield model to be discussed in the following section).

Related and interesting evidence for glocal memory in the brain comes from a recent study of semantic memory, illustrated in Figure ?? [PNR07]. Its authors probed the architecture of semantic memory by comparing patients suffering from semantic dementia (SD) with patients suffering from three other neuropathologies, and found reasonably convincing evidence for what they call a “distributed-plus-hub” view of memory. The SD patients they studied displayed highly distinctive symptomatology; for instance, their vocabularies and knowledge of the properties of everyday objects were strongly impaired, whereas their memories of recent events and other cognitive capacities remained perfectly intact. These patients also showed highly distinctive patterns of brain damage: focal brain lesions in their anterior temporal lobes (ATL), unlike the other patients, who had either less severe or more widely distributed damage in their ATLs.
This led [PNR07] to conclude that the ATL (being adjacent to the amygdala and limbic systems that process reward and emotion, and to the anterior parts of the medial temporal lobe memory system, which processes episodic memory) is a “hub” for amodal semantic memory, drawing general semantic information from episodic memories based on emotional salience. So, in this view, the memory of something like a “banana” would contain a distributed aspect, spanning multiple brain systems, and also a localized aspect, centralized in the ATL. The distributed aspect would likely contain information on various particular aspects of bananas, including their sights, smells, and touches, the emotions they evoke, and the goals and motivations they relate to. The distributed and localized aspects would influence one another dynamically; but the data gathered in [PNR07] do not address dynamics, and the authors don’t venture hypotheses in this direction.

There is a relationship between the “distributed-plus-hub” view and Damasio’s better-known notion of a “convergence zone” [Dam00], defined roughly as a location where the brain binds features together. A convergence zone, in Damasio’s perspective, is not a “store” of information but an agent capable of decoding a signal (and of reconstructing information). He also uses the metaphor that convergence zones behave like indexes drawing information from other areas of the brain; but they are dynamic rather than static indices, containing the instructions needed to recognize and combine the features constituting the memory of something. The mechanism involved in the distributed-plus-hub model is similar to a convergence zone, but with the important difference that hubs are less local: the semantic hub of [PNR07] may be thought of as a kind of “cluster of convergence zones,” consisting of a network of convergence zones for various semantic memories.

Fig. 13.1: A Simplified Look at Feedback-Control in Uncertain Inference

What is missing from the perspectives of [PNR07] and [Dam00] is a vision of distributed memories as attractors. The idea of localized memories serving as indices into distributed knowledge stores is important, but is only half the picture of glocal memory: the creative, constructive, dynamical-attractor aspect of the distributed representation is the other half. The closest thing to a clear depiction of this aspect of glocal memory that seems to exist in the neuroscience literature is a portion of William Calvin’s theory of the “cerebral code” [Cal96]. Calvin proposes a set of quite specific mechanisms by which knowledge may be represented in the brain using complexly-structured strange attractors, and by which these strange attractors may be propagated throughout the brain. Figure 13.2 shows one aspect of his theory: how a distributed attractor may propagate from one part of the brain to another in pieces, with one portion of the attractor getting propagated first, and then seeding the formation in the destination brain region of a close approximation of the whole attractor. Calvin’s theory may be considered a genuinely glocal theory of memory. However, it also makes a large number of other specific commitments that are not part of the notion of glocality, such as his proposal of hexagonal meta-columns in the cortex, and his commitment to evolutionary learning as the primary driver of neural knowledge creation. We find these other hypotheses interesting and highly promising, yet feel it is also important to consider the notion of glocal memory on its own. Regarding specifics, our suggestion is that Calvin’s approach may overemphasize the distributed aspect of memory, not giving sufficient due to the relatively localized aspect as accounted for in the [QKKF08] results discussed above.

Fig. 13.2: Calvin’s Model of Distributed Attractors in the Brain
In Calvin’s glocal approach, global memories are attractors and local memories are parts of attractors. We suggest a possible alternative, in which global memories are attractors and local memories are particular neuronal subnetworks such as the specialized ones identified by [QKKF08]. However, this alternative does not seem to contradict Calvin’s overall conceptual approach, even though it differs from the particular proposals made in [Cal96]. The above paragraphs are far from a complete survey of the relevant neuroscience literature; there are literally dozens of studies one could survey pointing toward the glocality of various sorts of human memory. Yet experimental neuroscience tools are still relatively primitive, and every one of these studies could be interpreted in various other ways. In the next couple of decades, as neuroscience tools improve in accuracy, our understanding of the role of glocality in human memory will doubtless improve tremendously.

13.6.3 Glocal Hopfield Networks

The ideas in the previous section suggest that, if one wishes to construct an AGI, it is worth seriously considering using a memory with some sort of glocal structure. One research direction that follows naturally from this notion is “glocal neural networks.” In order to explore the nature of glocal neural networks in a relatively simple and tractable setting, we have formalized and implemented simple examples of “glocal Hopfield networks”: palimpsest Hopfield nets with the addition of neurons representing localized memories. While these specific networks are not used in CogPrime, they are quite similar to the ECAN networks that are used in CogPrime and described in Chapter 23 of Part 2.
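As a baseline for the glocal extension, here is a minimal pure-Python sketch of a palimpsest Hopfield net trained with Storkey's learning rule, as we understand the rule from [SV99]: for each new ±1 pattern x, with h_ij = Σ_{k≠i,j} w_ik x_k computed from the current weights, w_ij is updated by (x_i x_j − x_i h_ji − h_ij x_j)/n. The sketch uses dense connectivity and asynchronous sign updates for recall; the function names and parameters are hypothetical, and this is not the code used in [GPI+10].

```python
import random


def storkey_train(patterns, n):
    """Store +/-1 patterns one at a time with Storkey's rule:
    w_ij += (x_i x_j - x_i h_ji - h_ij x_j) / n, where
    h_ij = sum over k != i, j of w_ik x_k, taken from the pre-update weights."""
    w = [[0.0] * n for _ in range(n)]
    for x in patterns:
        # Full local fields sum_k w_ik x_k; the diagonal is always 0.
        field = [sum(w[i][k] * x[k] for k in range(n)) for i in range(n)]
        new_w = [row[:] for row in w]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue  # keep w_ii = 0
                h_ij = field[i] - w[i][j] * x[j]  # drop the k = j term
                h_ji = field[j] - w[j][i] * x[i]  # drop the k = i term
                new_w[i][j] += (x[i] * x[j] - x[i] * h_ji - h_ij * x[j]) / n
        w = new_w
    return w


def recall(w, state, max_sweeps=50):
    """Asynchronous updates s_i = sign(sum_j w_ij s_j) until no neuron changes."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1 if sum(wij * sj for wij, sj in zip(w[i], s)) >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s


# Usage: store three random 100-neuron patterns, then recall one from a cue
# with its first five bits flipped.
rng = random.Random(0)
n = 100
patterns = [[1 if rng.random() < 0.5 else -1 for _ in range(n)] for _ in range(3)]
w = storkey_train(patterns, n)
cue = [-v if i < 5 else v for i, v in enumerate(patterns[0])]
restored = recall(w, cue)
```

Because the Storkey weights are symmetric with a zero diagonal, asynchronous updates settle into a fixed point, which plays the role of the attractor A discussed next. At this light load (3 patterns in 100 neurons) the noisy cue is expected to settle back onto the stored pattern.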
Essentially, we augment the standard Hopfield net architecture by adding a set of “key neurons.” These are a small percentage of the neurons in the network, and are intended to be roughly equal in number to the memories the network is supposed to store. When the Hopfield net converges to an attractor A, new links are created between the neurons that are active in A and one of the key neurons. Which key neuron is chosen? The one that, when it is stimulated, gives rise to an attractor pattern maximally similar to A. The ultimate result of this is that, in addition to the distributed memory of attractors in the Hopfield net, one has a set of key neurons that in effect index the attractors. Each attractor corresponds to a single key neuron. In the glocal memory model, the key neurons are the keys and the Hopfield net attractors are the maps.

This algorithm has been tested in sparse Hopfield nets, using both standard Hopfield net learning rules and Storkey’s modified palimpsest learning rule [SV99], which provides greater memory capacity in a continuous learning context. The use of key neurons turns out to slightly increase Hopfield net memory capacity, but this isn’t the main point. The main point is that one now has a local representation of each global memory, so that if one wants to create a link between the memory and something else, it’s extremely easy to do so: one just needs to link to the corresponding key neuron. Or, rather, one of the corresponding key neurons: depending on how many key neurons are allocated, one might end up with a number of key neurons corresponding to each memory, not just one.

In order to transform a palimpsest Hopfield net into a glocal Hopfield net, the following steps are taken:

1. Add a fixed number of “key neurons” to the network (removing other random neurons to keep the total number of neurons constant)
2. When the network reaches an attractor, create links from the elements in the attractor to one of the key neurons
3. The key neuron chosen for the previous step is the one that most closely matches the current attractor (which may be determined in several ways, to be discussed below)
4. To avoid increasing the number of links in the network, when new links are created in Step 2, other key-neuron links are then deleted (several approaches may be taken here, but the simplest is to remove the key-neuron links with the lowest-absolute-value weights)

In the simple implementation of the above steps described in [GPI+10], Step 3 is carried out simply by comparing the weights of a key neuron’s links to the nodes in an attractor. A more sophisticated approach would be to select the key neuron with the highest activation during the transient interval immediately prior to convergence to the attractor.

The result of these modifications to the ordinary Hopfield net is a Hopfield net that continually maintains a set of key neurons, each of which individually represents a certain attractor of the net. Note that these key neurons, in spite of being “symbolic” in nature, are learned rather than preprogrammed, and are every bit as adaptive as the attractors they correspond to. Furthermore, if a key neuron is removed, the glocal Hopfield net algorithm will eventually learn it back, so the robustness properties of Hopfield nets are retained. The results of experimenting with glocal Hopfield nets of this nature are summarized in [GPI+10]. We studied Hopfield nets with connectivity around 0.1, and in this context we found that glocality
• slightly increased memory capacity
• massively increased the rate of convergence to the attractor, i.e.
the speed of recall.

However, probably the most important consequence of glocality is a more qualitative one: it makes it far easier to link the Hopfield net into a larger system, as would occur if the Hopfield net were embedded in an integrative AGI architecture. This is because a neuron external to the Hopfield net may now link to a memory in the Hopfield net by linking to the corresponding key neuron.

13.6.4 Neural-Symbolic Glocality in CogPrime

In CogPrime, we have explicitly sought to span the symbolic/emergentist pseudo-dichotomy, by creating an integrative knowledge representation that combines logic-based aspects with neural-net-like aspects. As reviewed in Chapter 6 above, these aspects function not in the manner of a multimodular system, but rather via (probabilistic) truth values and (attractor-neural-net-like) attention values used as weights on the nodes and links of the same (hyper)graph. The nodes and links in this hypergraph are typed, as in a standard semantic network approach to knowledge representation, so they’re able to handle all sorts of knowledge, from the most concrete perception- and actuation-related knowledge to the most abstract relationships. But they’re also weighted with values similar to neural net weights, and pass around quantities (importance values, discussed in Chapter 23 of Part 2) similar to neural net activations, allowing emergent attractor/assembly-based knowledge representation similar to attractor neural nets. The concept of glocality lies at the heart of this combination, in a way that spans the pseudo-dichotomy:

• Local knowledge is represented in abstract logical relationships stored in explicit logical form, and also in Hebbian-type associations between nodes and links.
• Global knowledge is represented in large-scale patterns of node and link weights, which lead to large-scale patterns of network activity, which often take the form of attractors qualitatively similar to Hopfield net attractors. These attractors are called maps.
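Returning to the four-step glocal Hopfield procedure of Section 13.6.3, the sketch below adds a layer of key neurons to a small Hopfield net. It is a sketch under stated assumptions, not the implementation from [GPI+10]: memories are stored with plain Hebbian weights rather than Storkey's rule (to keep the block short); keys form a separate layer instead of replacing existing neurons, so Step 1's neuron removal is not modelled; a key link's sign records the linked neuron's ±1 state in the attractor; Step 3 compares a key's link weights with the attractor by cosine similarity and recruits an unused key when no existing key matches above a hypothetical threshold; and Step 4 prunes the lowest-absolute-weight key links once a fixed link budget is exceeded. In the CogPrime terms just introduced, each key neuron plays the role of a key and each attractor that of a map.

```python
import random
from math import sqrt


def hebbian_weights(patterns, n):
    """Plain Hebbian storage of +/-1 patterns, with a zero diagonal."""
    w = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += x[i] * x[j] / n
    return w


def recall(w, state, max_sweeps=50):
    """Asynchronous sign updates until no neuron changes, i.e. an attractor."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1 if sum(wij * sj for wij, sj in zip(w[i], s)) >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s


def cosine(u, v):
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


class KeyNeurons:
    """Hypothetical key-neuron layer: links[k][i] is the weight of the link
    between key k and memory neuron i (0.0 means no link)."""

    def __init__(self, num_keys, n, max_links, match_threshold=0.5):
        self.links = [[0.0] * n for _ in range(num_keys)]
        self.max_links = max_links
        self.match_threshold = match_threshold

    def choose_key(self, attractor):
        # Step 3: the key whose link weights best match the attractor...
        scores = [cosine(lk, attractor) for lk in self.links]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < self.match_threshold:
            # ...unless no key matches well, in which case recruit an unused key.
            unused = [k for k, lk in enumerate(self.links) if not any(lk)]
            if unused:
                best = unused[0]
        return best

    def register(self, attractor):
        # Step 2: link the attractor's neurons to the chosen key (the link's
        # sign records each neuron's +/-1 state in the attractor).
        k = self.choose_key(attractor)
        self.links[k] = [wk + a for wk, a in zip(self.links[k], attractor)]
        self.prune()
        return k

    def prune(self):
        # Step 4: delete the lowest-|weight| key links beyond the link budget.
        live = sorted((abs(wk), k, i) for k, lk in enumerate(self.links)
                      for i, wk in enumerate(lk) if wk != 0.0)
        for _, k, i in live[:max(len(live) - self.max_links, 0)]:
            self.links[k][i] = 0.0

    def stimulate(self, k):
        # Cue each memory neuron with the sign of its link to key k
        # (unlinked neurons default to +1); the net then settles from the cue.
        return [1 if wk >= 0 else -1 for wk in self.links[k]]


# Usage: store three random memories, settle from noisy cues, index each
# attractor with a key neuron, then recall each memory through its key alone.
rng = random.Random(1)
n = 100
patterns = [[1 if rng.random() < 0.5 else -1 for _ in range(n)] for _ in range(3)]
w = hebbian_weights(patterns, n)
keys = KeyNeurons(num_keys=4, n=n, max_links=3 * n)
assigned = []
for p in patterns:
    cue = [-v if i < 5 else v for i, v in enumerate(p)]  # flip the first 5 bits
    assigned.append(keys.register(recall(w, cue)))
recalled = [recall(w, keys.stimulate(k)) for k in assigned]
```

The point of the design is visible in the last line: an external component that knows only a key index can retrieve a whole distributed memory, which is exactly the kind of linkage into a larger system described above. Re-registering an already-indexed attractor reuses its key rather than recruiting a new one.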
The result of all this is that a concept like “cat” might be represented as a combination of:

• A small number of logical relationships and strong associations that constitute the “key” subnetwork for the “cat” concept.
• A large network of weak associations, binding together various nodes and links of various types and various levels of abstraction, representing the “cat map”.

The activation of the key will generally cause the activation of the map, and the activation of a significant percentage of the map will cause the activation of the rest of the map, including the key. Furthermore, if the key were for some reason forgotten, then after a significant amount of effort the system would likely be able to reconstitute it (perhaps with various small changes) from the information in the map. We conjecture that this particular kind of glocal memory will turn out to be very powerful for AGI, due to its ability to combine the strengths of formal logical inference with those of self-organizing attractor neural networks.

As a simple example, consider the representation of a “tower”, in the context of an artificial agent that has built towers of blocks, seen pictures of many other kinds of towers, and seen some tall buildings that it knows are somewhat like towers but perhaps not exactly towers. If this agent is reasonably conceptually advanced (say, at the Piagetian concrete operational level) then its mind will contain some declarative relationships partially characterizing the concept of “tower,” as well as its sensory and episodic examples, and its procedural knowledge about how to build towers.
The key of the “tower” concept in the agent’s mind may consist of internal images and episodes regarding the towers it knows best, the essential operations it knows are useful for building towers (piling blocks atop blocks atop blocks...), and the core declarative relations summarizing “towerness”; the whole “tower” map then consists of a much larger number of images, episodes, procedures and declarative relationships connected to “tower” and other related entities. If any portion of the map is removed, even the key, then the rest of the map can be approximately reconstituted, after some work. Some cognitive operations, e.g. logical reasoning, are best done on the localized representation. Other operations, such as attention allocation and guidance of inference control, are best done using the globalized map representation.

Chapter 14 Representing Implicit Knowledge via Hypergraphs

14.1 Introduction

Explicit knowledge is easy to write about and talk about; implicit knowledge is equally important, but tends to get less attention in discussions of AI and psychology, simply because we don’t have as good a vocabulary for describing it, nor as good a collection of methods for measuring it. One way to deal with this problem is to describe implicit knowledge using language and methods typically reserved for explicit knowledge. This might seem intrinsically unworkable, but we argue that it actually makes a lot of sense. The same sort of networks that a system like CogPrime uses to represent knowledge explicitly can also be used to represent the emergent knowledge that implicitly exists in an intelligent system’s complex structures and dynamics. We’ve noted that CogPrime uses an explicit representation of knowledge in terms of weighted labeled hypergraphs, and also uses other, more neural-net-like mechanisms (e.g. the economic attention allocation network subsystem) to represent knowledge globally and implicitly.
CogPrime combines these two sorts of representation according to the principle we have called glocality. In this chapter we pursue glocality a bit further, describing a means by which even implicitly represented knowledge can be modeled using weighted labeled hypergraphs similar to the ones used explicitly in CogPrime. This is conceptually important, in terms of making clear the fundamental similarities and differences between implicit and explicit knowledge representation; and it is also pragmatically meaningful due to its relevance to the CogPrime methods described in Chapter 42 of Part 2 that transform implicit into explicit knowledge. To avoid confusion with CogPrime’s explicit knowledge representation, we will refer to the hypergraphs in this chapter as composed of Vertices and Edges rather than Nodes and Links. In prior publications we have referred to “derived” or “emergent” hypergraphs of the sort described here using the acronym SMEPH, which stands for Self-Modifying, Evolving Probabilistic Hypergraphs.

14.2 Key Vertex and Edge Types

We begin by introducing a particular collection of Vertex and Edge types, to be used in modeling the internal structures of intelligent systems. The key SMEPH Vertex types are:

• ConceptVertex, representing a set, for instance, an idea or a set of percepts
• SchemaVertex, representing a procedure for doing something (perhaps something in the physical world, or perhaps an abstract mental action).
The key SMEPH Edge types, using language drawn from Probabilistic Logic Networks (PLN) and elaborated in Chapter 34 below, are as follows:

• ExtensionalInheritanceEdge (ExtInhEdge for short): an edge which, linking one Vertex or Edge to another, indicates that the former is a special case of the latter
• ExtensionalSimilarityEdge (ExtSimEdge for short): an edge which indicates that one Vertex or Edge is similar to another
• ExecutionEdge: a ternary edge, which joins S, B, C when S is a SchemaVertex and the result of applying S to B is C.

So, in a SMEPH system, one is often looking at hypergraphs whose Vertices represent ideas or procedures, and whose Edges represent relationships of specialization, similarity or transformation among ideas and/or procedures. The semantics of the SMEPH edge types is given by PLN, but is simple and commonsensical. ExtInh and ExtSim Edges come with probabilistic weights indicating the extent of the relationship they denote (e.g. the ExtSimEdge joining the cat ConceptVertex to the dog ConceptVertex gets a higher probability weight than the one joining the cat ConceptVertex to the washing-machine ConceptVertex). The mathematics of transformations involving these probabilistic weights becomes quite involved, particularly when one introduces SchemaVertices corresponding to abstract mathematical operations. That step enables SMEPH hypergraphs to have the complete mathematical power of standard logical formalisms like predicate calculus, but with the added advantage of a natural representation of uncertainty in terms of probabilities, as well as a natural representation of networks and webs of complex knowledge.

14.3 Derived Hypergraphs

We now describe how SMEPH hypergraphs may be used to model and describe intelligent systems. One can (in principle) draw a SMEPH hypergraph corresponding to any individual intelligent system, with Vertices and Edges for the concepts and processes in that system’s mind. This is called the derived hypergraph of that system.
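As an illustration only, the Vertex and Edge types of Section 14.2 can be sketched as a tiny Python data model. The class names, the `kind` strings and the example weights below are our own hypothetical choices, not an OpenCog or SMEPH API. The sketch enforces just one rule taken from the text: an Execution edge is ternary and begins with a SchemaVertex.

```python
from dataclasses import dataclass, field


# Hypothetical data model for illustration; not an OpenCog API.
@dataclass(frozen=True)
class Vertex:
    """A SMEPH vertex; kind is "concept" or "schema"."""
    name: str
    kind: str


@dataclass(frozen=True)
class Edge:
    """A typed, weighted hyperedge over an ordered tuple of targets."""
    edge_type: str         # e.g. "ExtInh", "ExtSim", "Execution"
    targets: tuple         # Vertices (or other Edges)
    strength: float = 1.0  # probabilistic weight


@dataclass
class DerivedHypergraph:
    edges: list = field(default_factory=list)

    def add(self, edge_type, *targets, strength=1.0):
        # Execution edges join (schema S, input B, output C).
        if edge_type == "Execution" and (
            len(targets) != 3 or getattr(targets[0], "kind", None) != "schema"
        ):
            raise ValueError("Execution needs (SchemaVertex, input, output)")
        edge = Edge(edge_type, tuple(targets), strength)
        self.edges.append(edge)
        return edge

    def strength(self, edge_type, *targets):
        """Return the weight of the first matching edge, or None."""
        for e in self.edges:
            if e.edge_type == edge_type and e.targets == tuple(targets):
                return e.strength
        return None


# Illustrative weights only: cat~dog should outweigh cat~washing machine.
cat, dog, washer = (Vertex(n, "concept") for n in ("cat", "dog", "washing_machine"))
father_of = Vertex("father_of", "schema")
ben, ted = Vertex("Ben_Goertzel", "concept"), Vertex("Ted_Goertzel", "concept")

g = DerivedHypergraph()
g.add("ExtSim", cat, dog, strength=0.7)
g.add("ExtSim", cat, washer, strength=0.05)
g.add("Execution", father_of, ben, ted)
```

Since similarity is symmetric, a fuller version would make ExtSim lookups order-insensitive; the sketch keeps all edges ordered for simplicity.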
14.3.1 SMEPH Vertices

A ConceptVertex in the derived hypergraph of a system corresponds to a structural pattern that persists over time in that system, whereas a SchemaVertex corresponds to a multi-timepoint dynamical pattern that recurs in that system’s dynamics. If one accepts the patternist definition of a mind as the set of patterns in an intelligent system, then it follows that the derived hypergraph of an intelligent system captures a significant fraction of the mind of that system. To phrase it a little differently, we may say that a ConceptVertex, in SMEPH, refers to the habitual pattern of activity observed in a system when some condition is met (this condition corresponding to the presence of a certain pattern). The condition may refer to something in the world external to the system, or to something internal. For instance, the condition may be observing a cat. In this case, the corresponding ConceptVertex in the mind of Ben Goertzel is the pattern of activity observed in Ben Goertzel’s brain when his eyes are open and he’s looking in the direction of a cat. The notion of pattern of activity can be made rigorous using mathematical pattern theory, as is described in The Hidden Pattern [Goe06a]. Note that logical predicates, on the SMEPH level, appear as particular kinds of Concepts, where the condition involves a predicate and an argument. For instance, suppose one wants to know what happens inside Ben’s mind when he eats cheese. Then there is a Concept corresponding to the condition of cheese-eating activity. But there may also be a Concept corresponding to eating activity in general. If the Concept denoting the activity of eating X is generally easily computable from the Concepts for X and eating individually, then the eating Concept is effectively acting as a predicate. A SMEPH SchemaVertex, on the other hand, is like a Concept that’s defined in a time-dependent way.
One type of Schema refers to a habitual dynamical pattern of activity occurring before and/or while some condition is met. For instance, the condition might be saying the word Hello. In that case the corresponding SchemaVertex in the mind of Ben Goertzel is the pattern of activity that generally occurs before he says Hello. Another type of Schema refers to a habitual dynamical pattern of activity occurring after some condition X is met. For instance, in the case of the Schema for adding two numbers, the precondition X consists of the two numbers and the concept of addition. The Schema is then what happens when the mind thinks of adding and thinks of two numbers. Finally, there are Schema that refer to habitual dynamical activity patterns occurring after some condition X is met and before some condition Y is met. In this case the Schema is viewed as transforming X into Y. For instance, if X is the condition of meeting someone who is not a friend, and Y is the condition of being friends with that person, then the habitually intervening activities constitute the Schema for making friends.

14.3.2 SMEPH Edges

SMEPH edge types fall into two categories: functional and logical. Functional edges connect Schema vertices to their inputs and outputs; logical edges refer mainly to conditional probabilities, and in general are to be interpreted according to the semantics of Probabilistic Logic Networks. Let us begin with logical edges. The simplest case is the Subset edge, which denotes a straightforward, extensional conditional probability. For instance, it may happen that whenever the Concept for cat is present in a system, the Concept for animal is as well. Then we would say

Subset cat animal

(Here we assume a notation where “R A B” denotes an Edge of type R between Vertices A and B.)
On the other hand, it may be that 50% of the time that cat is present in the system, cute is present as well: then we would say

Subset cat cute <.5>

where the <.5> denotes the probability, which is a component of the Truth Value associated with the edge. Next, the most basic functional edge is the Execution edge, which is ternary and denotes a relation between a Schema, its input and its output, e.g.

Execution father_of Ben_Goertzel Ted_Goertzel

for a schema father_of that outputs the father of its argument. The ExecutionOutput (ExOut) edge denotes the output of a Schema in an implicit way, e.g.

ExOut say_hello

refers to a particular act of saying hello, whereas

ExOut add_numbers {3, 4}

refers to the Concept corresponding to 7. Note that this latter example involves a set of two entities: sets are also part of the basic SMEPH knowledge representation. A set may be thought of as a hypergraph edge that points to all its members. In this manner we may define a set of edges and vertices modeling the habitual activity patterns of a system when in different situations. This is called the derived hypergraph of the system. Note that this hypergraph can in principle be constructed no matter what happens inside the system: whether it’s a human brain, a formal neural network, Cyc, OpenCogPrime, a quantum computer, etc. Of course, constructing the hypergraph in practice is quite a different story: for instance, we currently have no accurate way of measuring the habitual activity patterns inside the human brain. fMRI, PET and other neuroimaging technologies give only a crude view, though they are continually improving. Pattern theory enters more deeply here when one thoroughly fleshes out the Inheritance concept.
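If one could record, at many moments, which Concepts are present in a system (precisely the kind of measurement noted above as currently out of reach for brains), the strength of a Subset edge could be estimated as a simple conditional frequency. The Python sketch below assumes such hypothetical snapshots, each a set of Concept names. The data are invented so as to reproduce the two Subset examples above.

```python
def subset_strength(snapshots, a, b):
    """Estimate the strength of "Subset a b", i.e. P(b present | a present).

    snapshots: iterable of sets, each listing the Concepts present at one time.
    Returns None when a never occurs, since there is then no evidence.
    """
    with_a = [s for s in snapshots if a in s]
    if not with_a:
        return None
    return sum(1 for s in with_a if b in s) / len(with_a)


# Hypothetical observations of which Concepts were co-present.
snapshots = [
    {"cat", "animal", "cute"},
    {"cat", "animal"},
    {"cat", "animal", "cute"},
    {"cat", "animal"},
    {"dog", "animal", "cute"},
]

print("Subset cat animal", subset_strength(snapshots, "cat", "animal"))  # 1.0
print("Subset cat cute", subset_strength(snapshots, "cat", "cute"))      # 0.5
```

The estimate is directional. On the same data, Subset animal cat comes out at 0.8 rather than 1.0, which is why Subset edges are ordered.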
Philosophers of logic have extensively debated the relationship between extensional inheritance (inheritance between sets based on their members) and intensional inheritance (inheritance between entity-types based on their properties). A variety of formal mechanisms have been proposed to capture this conceptual distinction; see (Wang, 2006, 1995) for a review along with a novel approach utilizing uncertain term logic. Pattern theory provides a novel approach to defining intension: one may associate with each ConceptVertex in a system’s derived hypergraph the set of patterns associated with the structural pattern underlying that ConceptVertex. Then, one can define the strength of the IntensionalInheritanceEdge between two ConceptVertices A and B as the percentage of A’s pattern-set that is also contained in B’s pattern-set. According to this approach, for instance, one could have

IntInhEdge whale fish <0.6>
ExtInhEdge whale fish <0.0>

since the fish and whale sets have common properties but no common members.

14.4 Implications of Patternist Philosophy for Derived Hypergraphs of Intelligent Systems

Patternist philosophy rears its head here and makes some definite hypotheses about the structure of derived hypergraphs. It suggests that derived hypergraphs should have a dual network structure, and that in highly intelligent systems they should have subgraphs that constitute models of the whole hypergraph (these are self systems). SMEPH does not add anything to the patternist view on a philosophical level, but it gives a concrete instantiation to some of the general ideas of patternism. In this section we’ll articulate some "SMEPH principles", constituting important ideas from patternist philosophy as they manifest themselves in the SMEPH context. The logical edges in a SMEPH hypergraph are weighted with probabilities, as in the simple example given above.
The functional edges may be probabilistically weighted as well, since some Schema may give certain results only some of the time. These probabilities are critical in terms of SMEPH’s model of system dynamics; they underlie one of our SMEPH principles,

Principle of Implicit Probabilistic Inference: In an intelligent system, the temporal evolution of the probabilities on the edges in the system’s derived hypergraph should approximately obey the rules of probability theory.

The basic idea is that, even if a system’s underlying dynamics have no explicit connection to probability theory, the system must still behave roughly as if they did, if it is going to be intelligent. The "roughly" part is important here; it’s well known that humans are not terribly accurate in explicitly carrying out formal probabilistic inferences. And yet, in practical contexts where they have experience, humans can make quite accurate judgments, which is all that’s required by the above principle, since it’s the contexts where experience has occurred that will make up a system’s derived hypergraph. Our next SMEPH principle is evolutionary, and states

Principle of Implicit Evolution: In an intelligent system, new Schema and Concepts will continually be created, and the Schema and Concepts that are more useful for achieving system goals (as demonstrated via probabilistic implication of goal achievement) will tend to survive longer.

Note that this principle can be fulfilled in many different ways. The important thing is that system goals are allowed to serve as a selective force. Another SMEPH dynamical principle pertains to a shorter time-scale than evolution, and states

Principle of Attention Allocation: In an intelligent system, Schema and Concepts that are more useful for attaining short-term goals will tend to consume more of the system’s energy. (The balance of attention oriented toward goals pertaining to different time scales will vary from system to system.)
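One simple way to probe the Principle of Implicit Probabilistic Inference is to check a pair of opposite Subset edges against Bayes’ rule. Exact probability theory requires that (Subset A B) times P(A) equal (Subset B A) times P(B), since both products equal P(A and B). The Python sketch below measures how far a pair of derived-hypergraph edges departs from that identity. The tolerance and all probabilities are illustrative assumptions, chosen to reflect the principle’s requirement of only approximate obedience.

```python
def bayes_gap(p_a, p_b, s_ab, s_ba):
    """Absolute violation of Bayes' rule by the edges Subset A B and Subset B A.

    p_a, p_b: marginal probabilities of A and B being present.
    s_ab: strength of Subset A B, i.e. P(B | A); s_ba: strength of Subset B A.
    Both s_ab * p_a and s_ba * p_b estimate P(A and B), so they should agree.
    """
    return abs(s_ab * p_a - s_ba * p_b)


def roughly_coherent(p_a, p_b, s_ab, s_ba, tol=0.05):
    """True if the pair of edges obeys Bayes' rule to within tol (arbitrary)."""
    return bayes_gap(p_a, p_b, s_ab, s_ba) <= tol


# Illustrative numbers only: P(cat) = 0.1, P(animal) = 0.4.
print(roughly_coherent(0.1, 0.4, s_ab=0.95, s_ba=0.25))  # True: 0.095 vs 0.1
print(roughly_coherent(0.1, 0.4, s_ab=0.95, s_ba=0.90))  # False: 0.095 vs 0.36
```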
Next, there is the

Principle of Autopoiesis: In an intelligent system, if one removes some part of the system and then allows the system’s natural dynamics to keep going, a decent approximation to that removed part will often be spontaneously reconstituted.

And there is the

Cognitive Equation Principle: In an intelligent system, many abstract patterns that are present in the system at a certain time as patterns among other Schema and Concepts will, at a near-future time, be present in the system as patterns among elementary system components.

The Cognitive Equation Principle, briefly discussed in Chapter 3, basically means that Concepts and Schema emergent in the system are recognized by the system and then embodied as elementary items in the system, so that patterns among them in their emergent form become, with the passage of time, patterns among them in their directly-system-embodied form. This is a natural consequence of the way intelligent systems continually recognize patterns in themselves. Note that derived hypergraphs may be constructed corresponding to any complex system which demonstrates a variety of internal dynamical patterns depending on its situation. However, if a system is not intelligent, then according to the patternist philosophy the evolution of its derived hypergraph can’t necessarily be expected to follow the above principles.

14.4.1 SMEPH Principles in CogPrime

We now more explicitly elaborate the application of these ideas in the CogPrime context. As noted above, in addition to explicit knowledge representation in terms of Nodes and Links, CogPrime also incorporates implicit knowledge representation in the form of what are called Maps: collections of Nodes and Links that tend to be utilized together within cognitive processes. These Maps constitute a CogPrime system’s derived hypergraph, which will not be identical to the hypergraph it uses for explicit knowledge representation.
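Since Maps are characterized here by co-utilization, a crude way to look for them is to record which Atoms are used together, link Atoms whose co-utilization frequency passes a threshold, and read off the connected groups. The Python sketch below does exactly that. It is a hypothetical illustration of the definition, not the map formation process of Chapter 42 of Part 2, and the snapshot data and the `min_support` threshold are invented.

```python
from collections import Counter
from itertools import combinations


def candidate_maps(snapshots, min_support=0.5):
    """Group Atoms that are frequently utilized together into candidate maps.

    Hypothetical sketch, not CogPrime's map formation algorithm.
    snapshots: list of sets, each holding the Atoms used in one cognitive episode.
    min_support: minimum fraction of snapshots in which a pair must co-occur.
    """
    n = len(snapshots)
    pair_counts = Counter()
    for s in snapshots:
        for a, b in combinations(sorted(s), 2):
            pair_counts[(a, b)] += 1

    # Link Atoms whose co-utilization frequency reaches min_support.
    adjacency = {}
    for (a, b), count in pair_counts.items():
        if count / n >= min_support:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    # Each connected component of that graph is one candidate map.
    maps, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(adjacency[atom] - component)
        seen |= component
        maps.append(component)
    return maps


# Hypothetical episodes: "cat" joins the tower episode once, too rarely to merge maps.
snapshots = [
    {"cat", "fur", "meow"},
    {"cat", "fur", "meow", "dog"},
    {"tower", "block", "pile"},
    {"tower", "block", "pile", "cat"},
]
for m in candidate_maps(snapshots):
    print(sorted(m))
```

Connected components can chain distinct maps together through a single strong pair. Real maps may also be diffuse and overlapping, so this should be read only as a picture of the definition, not as a workable map detector.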
However, an interesting feedback loop arises here, in that the intelligence’s self-study will generally lead it to recognize large portions of its derived hypergraph as patterns in itself, and then embody these patterns within its concretely implemented knowledge hypergraph. This relates to the Cognitive Equation Principle defined above (see also Chapter 3), in which an intelligent system continually recognizes patterns in itself and embodies these patterns in its own basic structure (so that new patterns may more easily emerge from them). Often it happens that a particular CogPrime node will serve as the center of a map, so that e.g. the map denoting the concept cat will consist of a number of nodes and links roughly centered around a ConceptNode that is linked to the WordNode cat. But this is not guaranteed, and some CogPrime maps are more diffuse than this, with no particular center. Somewhat similarly, the key SMEPH dynamics are represented explicitly in CogPrime: probabilistic reasoning is carried out via explicit application of PLN on the CogPrime hypergraph, evolutionary learning is carried out via application of the MOSES optimization algorithm, and attention allocation is carried out via a combination of inference and evolutionary pattern mining. But the SMEPH dynamics also occur implicitly in CogPrime: emergent maps are reasoned on probabilistically as an indirect consequence of node-and-link level PLN activity; maps evolve as a consequence of the coordinated whole of CogPrime dynamics; and attention shifts between maps according to complex emergent dynamics. To see the need for maps, consider that even a Node that has a particular meaning attached to it, such as the Iraq Node, doesn’t contain much of the meaning of Iraq in it.
The meaning of Iraq lies in the Links attached to this Node, and the Links attached to their Nodes, and in the other Nodes and Links not explicitly represented in the system, which will be created by CogPrime’s cognitive algorithms based on the explicitly existent Nodes and Links related to the Iraq Node. This halo of Atoms related to the Iraq Node is called the Iraq map. In general, some maps will center around a particular Atom, like this Iraq map, while others may not have any particular identifiable center. CogPrime’s cognitive processes act directly on the level of Nodes and Links, but they must be analyzed in terms of their impact on maps as well. In SMEPH terms, CogPrime maps may be said to correspond to SMEPH ConceptVertices, and, for instance, bundles of Links between the Nodes belonging to two maps may correspond to a SMEPH Edge between two ConceptVertices.

Chapter 15 Emergent Networks of Intelligence

15.1 Introduction

When one is involved with engineering an AGI system, one thinks a lot about the aspects of the system one is explicitly building – what the parts are, how they fit together, how to test that they’re working properly, and so forth. And yet, these explicitly engineered aspects are only a fraction of what’s important in an AGI system. At least as critical are the emergent aspects – the patterns that emerge once the system is up and running, interacting with the world and other agents, growing and developing and learning and self-modifying. SMEPH is one toolkit for describing some of these emergent patterns, but it’s only a start. In line with these general observations, most of this book will focus on the structures and processes that we have built, or intend to build, into the CogPrime system. But in a sense, these structures and processes are not the crux of CogPrime’s intended intelligence.
The purpose of these pre-programmed structures and processes is to give rise to emergent structures and processes, in the course of CogPrime’s interaction with the world and the other minds within it. We will return to this theme of emergence at several points in later chapters, e.g. in the discussion of map formation in Chapter 42 of Part 2. Given the importance of emergent structures – and specifically emergent network structures – for intelligence, it’s fortunate that the scientific community has already generated a lot of knowledge about complex networks: both networks of physical or software elements, and networks of organization emergent from complex systems. As most of this knowledge has originated in fields other than AGI, or in pure mathematics, it tends to require some reinterpretation or tweaking to achieve maximal applicability in the AGI context; but we believe this effort will become increasingly worthwhile as the AGI field progresses, because network theory is likely to be very useful for describing the contents and interactions of AGI systems as they develop increasing intelligence. In this brief chapter we specifically focus on the emergence of certain large-scale network structures in a CogPrime knowledge store, presenting heuristic arguments as to why these structures can be expected to arise. We also comment on the way in which these emergent structures are expected to guide cognitive processes, and give rise to emergent cognitive processes. The following chapter expands on this theme in a particular direction, exploring the possible emergence of structures characterizing inter-cognitive reflection.

15.2 Small World Networks

One simple but potentially useful observation about CogPrime Atomspaces is that they are generally going to be small world networks [Buc03], rather than random graphs.
A small world network, in the sense used here, is a graph in which the connectivities of the various nodes display a power law behavior, so that, loosely speaking, there are a few nodes with very many links, then more nodes with a modest number of links ... and finally, a huge number of nodes with very few links. (Strictly speaking, a power-law degree distribution is the defining property of a scale-free network; in its narrower technical sense, "small-world" refers to short average path lengths combined with high local clustering.) This kind of network occurs in many natural and human systems, including citations among papers, financial arrangements among banks, links between Web pages and the spread of diseases among people or animals. In a weighted network like an Atomspace, "small-world-ness" must be defined in a manner taking the weights into account, and there are several obvious ways to do this. Figure 15.1 depicts a small but prototypical small-world network, with a few "hub" nodes possessing far more neighbors than the others, and then some secondary hubs, etc. An excellent reference on network theory in general, including but not limited to small world networks, is Peter Csermely’s Weak Links [Cse06]. Many of the ideas in that work have apparent OpenCog applications, which are not elaborated here.

Fig. 15.1: A typical, though small-sized, small-world network.

One process via which small world networks commonly form is "preferential attachment" [Bar02]. This occurs in essence when "the rich get richer", i.e. when nodes in the network grow new links in a manner that causes them to preferentially grow links to nodes that already have more links. It is not hard to see that CogPrime’s ECAN dynamics will naturally lead to preferential attachment, because Atoms with more links will tend to get more STI, and thus will tend to get selected by more cognitive processes, which will cause them to grow more links. For this reason, in most circumstances, a CogPrime system in which most link-building cognitive processes rely heavily on ECAN to guide their activities will tend to contain a small-world-network Atomspace.
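The rich-get-richer mechanism can be simulated in a few lines. The Python sketch below grows a graph in Barabási-Albert style, attaching each new node to existing nodes with probability proportional to their current degree. Using degree as a stand-in for STI-driven selection is an illustrative assumption, and the parameters are arbitrary. Strictly, this process yields a power-law (scale-free) degree distribution, which is the hub-dominated property discussed above.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, links_per_node=2, seed=0):
    """Grow an undirected graph by degree-proportional ("rich get richer") attachment.

    Degree stands in for STI-driven selection here (an illustrative assumption).
    """
    rng = random.Random(seed)
    core = links_per_node + 1
    # Start from a small fully connected core so every node has nonzero degree.
    edges = [(i, j) for i in range(core) for j in range(i + 1, core)]
    # Each node appears here once per incident edge, so a uniform draw
    # from this list picks a node with probability proportional to its degree.
    endpoint_pool = [v for edge in edges for v in edge]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoint_pool))
        for t in targets:
            edges.append((new, t))
            endpoint_pool.extend((new, t))
    return edges


edges = preferential_attachment(2000)
degree = Counter(v for edge in edges for v in edge)
degrees = sorted(degree.values(), reverse=True)
print("top 5 hub degrees:", degrees[:5])
print("median degree:", degrees[len(degrees) // 2])
```

Comparing the top degrees with the median typically shows a handful of hubs far above a low median, the same qualitative shape as Figure 15.1.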
This is not rigorously guaranteed to be the case for any possible combination of environment and goals, but it is commonsensically likely to nearly always be the case. One consequence of the small-world structure of the Atomspace is that, in exploring other properties of the Atom network, it is particularly important to look at the hub nodes. For instance, if one is studying whether hierarchical and heterarchical subnetworks of the Atomspace exist, and whether they are well-aligned with each other, it is important to look at hierarchical and heterarchical connections between hub nodes in particular (and secondary hubs, etc.). A pattern of hierarchical or dual network connection that only held up among the more sparsely connected nodes in a small-world network would be a strange thing, and perhaps not that cognitively useful.

15.3 Dual Network Structure

One of the key theoretical notions in patternist philosophy is that complex cognitive systems evolve internal dual network structures, comprising superposed, harmonized hierarchical and heterarchical networks. Now we explore some of the specific CogPrime structures and dynamics militating in favor of the emergence of dual networks.

15.3.1 Hierarchical Networks

The hierarchical nature of human linguistic concepts is well known, and is illustrated in Figure 15.2 for the commonsense knowledge domain (using a graph drawn from WordNet, a huge concept hierarchy covering 50K+ English-language concepts), and in Figure 15.3 for a specialized knowledge subdomain, genetics. Because of this, a certain amount of hierarchy can be expected to emerge in the Atomspace of any linguistically savvy CogPrime, simply due to its modeling of the linguistic concepts that it hears and reads. Hierarchy also exists in the natural world apart from language, which is the reason that many sensorimotor-knowledge-focused AGI systems (e.g. DeSTIN and HTM, mentioned in Chapter 4 above) feature hierarchical structures.
In these cases the hierarchies are normally spatiotemporal in nature, with lower layers containing elements responding to more localized aspects of the perceptual field, and smaller, more localized groups of actuators. This kind of hierarchy certainly could emerge in an AGI system, but in CogPrime we have opted for a different route. If a CogPrime system is hybridized with a hierarchical sensorimotor network like one of those mentioned above, then the Atoms linked to the nodes in the hierarchical sensorimotor network will naturally possess hierarchical conceptual relationships, and will thus naturally grow hierarchical links between them (e.g. InheritanceLinks and IntensionalInheritanceLinks via PLN, AsymmetricHebbianLinks via ECAN).

Fig. 15.2: A typical, though small, subnetwork of WordNet’s hierarchical network.

Once elements of hierarchical structure exist via the hierarchical structure of language and physical reality, then a richer and broader hierarchy can be expected to accumulate on top of it, because importance spreading and inference control will implicitly and automatically be guided by the existing hierarchy. That is, in the language of Chaotic Logic [Goe94] and patternist theory, hierarchical structure is an "autopoietic attractor": once it’s there it will tend to enrich itself and maintain itself. AsymmetricHebbianLinks arranged in a hierarchy will tend to cause importance to spread up or down the hierarchy, which will lead other cognitive processes to look for patterns between Atoms and their hierarchical parents or children, thus potentially building more hierarchical links. Chains of InheritanceLinks pointing up and down the hierarchy will lead PLN to search for more hierarchical links; e.g., most simply, A → B → C, where C is above B and B is above A in the hierarchy, will naturally lead inference to check the viability of A → C by deduction.
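The deduction step just mentioned can be sketched in Python using the independence-based PLN deduction strength rule, s_AC = s_AB s_BC + (1 − s_AB)(s_C − s_B s_BC)/(1 − s_B), where s_B and s_C are the term probabilities of B and C. The helper that proposes (A, B, C) chains, the clamping and all numbers are illustrative assumptions. The sketch also ignores PLN confidence values and the consistency conditions a full implementation would check.

```python
def pln_deduction(s_ab, s_bc, s_b, s_c):
    """Independence-based PLN deduction: strength of A->C from A->B and B->C."""
    if s_b >= 1.0:
        # Degenerate case: B is always present, so P(C | not B) is undefined.
        return s_bc
    s_ac = s_ab * s_bc + (1 - s_ab) * (s_c - s_b * s_bc) / (1 - s_b)
    # Clamp (an assumption of this sketch) to absorb rounding or inconsistent inputs.
    return min(1.0, max(0.0, s_ac))


def deduction_candidates(inheritance):
    """Yield (A, B, C) where A->B and B->C exist but A->C does not yet."""
    for (a, b) in inheritance:
        for (b2, c) in inheritance:
            if b2 == b and c != a and (a, c) not in inheritance:
                yield a, b, c


# Hypothetical hierarchy fragment: poodle -> dog -> mammal.
inheritance = {("poodle", "dog"): 1.0, ("dog", "mammal"): 0.95}
term_prob = {"poodle": 0.01, "dog": 0.05, "mammal": 0.2}

# Materialize candidates first, since the loop adds new links to the dict.
for a, b, c in list(deduction_candidates(inheritance)):
    s = pln_deduction(inheritance[(a, b)], inheritance[(b, c)], term_prob[b], term_prob[c])
    inheritance[(a, c)] = s
    print(f"{a} -> {c}: {s:.3f}")
```

Each newly deduced link can itself seed further chains, which is one concrete way the hierarchy "enriches itself" as described above.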
There is also the possibility of introducing a special DefaultInheritanceLink, as discussed in Chapter 34 of Part 2, but this isn’t actually necessary to obtain the inferential maintenance of a robust hierarchical network.

Fig. 15.3: A typical, though small, subnetwork of the Gene Ontology’s hierarchical network.

15.3.2 Associative, Heterarchical Networks

Heterarchy is in essence a simpler structure than hierarchy: it simply refers to a network in which nodes are linked to other nodes with which they share important relationships. That is, there should be a tendency that if two nodes are often important in the same contexts or for the same purposes, they should be linked together. Portrayals of typical heterarchical linkage patterns among natural language concepts are given in Figures 15.5 and 15.6. Just for fun, Figure 15.7 shows one person’s attempt to draw a heterarchical graph of the main concepts in one of Douglas Hofstadter’s books. Naturally, real concept heterarchies are far larger, more complex and more tangled than even this one. In CogPrime, ECAN enforces heterarchy via building SymmetricHebbianLinks, and PLN by building SimilarityLinks, IntensionalSimilarityLinks and ExtensionalSimilarityLinks. Furthermore, these various link types reinforce each other. PLN control is guided by importance spreading, which follows Hebbian links, so that a heterarchical Hebbian network tends to cause PLN to explore the formation of links following the same paths as the heterarchical HebbianLinks. And importance can spread along logical links as well as explicit Hebbian links, so that the existence of a heterarchical logical network will tend to cause the formation of additional heterarchical Hebbian links. Heterarchy reinforces itself in "autopoietic attractor" style even more simply and directly than hierarchy.

Fig. 15.4: Small-scale portrayal of a portion of the spatiotemporal hierarchy in Jeff Hawkins’ Hierarchical Temporal Memory architecture.

15.3.3 Dual Networks

Finally, if both hierarchical and heterarchical structures exist in an Atomspace, then both ECAN and PLN will naturally blend them together, because hierarchical and heterarchical links will feed into their link-creation processes and naturally be combined together to form new links. This will tend to produce a structure called a dual network, in which a hierarchy exists, along with a rich network of heterarchical links joining nodes in the hierarchy, with a particular density of links between nodes on the same hierarchical level. The dual network structure will emerge without any explicit engineering oriented toward it, simply via the existence of hierarchical and heterarchical networks, and the propensity of ECAN and PLN to be guided by both the hierarchical and heterarchical networks. The existence of a natural dual network structure in both linguistic and sensorimotor data will help the formation process along, and then creative cognition will enrich the dual network yet further than is directly necessitated by the external world.

Fig. 15.5: Portions of a conceptual heterarchy centered on specific concepts.

Fig. 15.6: A portion of a conceptual heterarchy, showing the "dangling links" leading this portion to the rest of the heterarchy.

A rigorous mathematical analysis of the formation of hierarchical, heterarchical and dual networks in CogPrime systems has not yet been undertaken, and would certainly be an interesting enterprise. As with the theory of small world networks, there is ample ground here for both theorem-proving and heuristic experimentation. However, the qualitative points made here are sufficiently well-grounded in intuition and experience to be of some use in guiding our ongoing work. One of the nice things about emergent network structures is that they are relatively straightforward to observe in an evolving, learning AGI system, via visualization and inspection of structures such as the Atomspace.

Fig. 15.7: A fanciful evocation of part of a reader’s conceptual heterarchy related to Douglas Hofstadter’s writings.

Section V A Path to Human-Level AGI

Chapter 16 AGI Preschool

Co-authored with Stephan Vladimir Bugaj

16.1 Introduction

In conversations with government funding sources or narrow AI researchers about AGI work, one of the topics that comes up most often is that of “evaluation and metrics”, i.e., AGI intelligence testing. We actually prefer to separate this into two topics: environments and methods for careful qualitative evaluation of AGI systems, versus metrics for precise measurement of AGI systems. The difficulty of formulating bulletproof metrics for partial progress toward advanced AGI has become evident throughout the field, and in Chapter 8 we have elaborated one plausible explanation for this phenomenon, the "trickiness" of cognitive synergy. [LWML09], summarizing a workshop on “Evaluation and Metrics for Human-Level AI” held in 2008, discusses some of the general difficulties involved in this type of assessment, and some requirements that any viable approach must fulfill. On the other hand, the lack of appropriate methods for careful qualitative evaluation of AGI systems has been much less discussed, but we consider it a more important issue, as well as an easier (though not easy) one to solve. We haven’t actually found the lack of quantitative intelligence metrics to be a major obstacle in our practical AGI work so far.
Our OpenCogPrime implementation lags far behind the CogPrime design as articulated in Part 2 of this book, and according to the theory underlying CogPrime, the more interesting behaviors and dynamics of the system will occur only when all the parts of the system have been engineered to a reasonable level of completion and integrated together. So, the lack of a great set of metrics for evaluating the intelligence of our partially-built system hasn’t impaired us too much. Testing the intelligence of the current OpenCogPrime system is a bit like testing the flight capability of a partly-built airplane that only has stubs for wings, lacks tail-fins, has a much less efficient engine than the one that’s been designed for use in the first "real" version of the airplane, etc. There may be something to be learned from such preliminary tests, but making them highly rigorous isn’t a great use of effort, compared to finishing the implementation of the design according to the underlying theory. On the other hand, the problem of what environments and methods to use to qualitatively evaluate and study AGI progress has been considerably more vexing to us in practice, as we’ve proceeded in our work on implementing and testing OpenCogPrime and developing the CogPrime theory. When developing a complex system, it’s nearly always valuable to see what this system does in some fairly rich, complex situations, in order to gain a better intuitive understanding of the parts and how they work together. In the context of human-level AGI, the theoretically best way to do this would be to embody one’s AGI system in a humanlike body and set it loose in the everyday human world; but of course, this isn’t feasible given the current state of development of robotics technology. So one must seek approximations.
Toward this end we have embodied OpenCogPrime in non-player characters in video-game-style virtual worlds, and carried out preliminary experiments embodying OpenCogPrime in humanoid robots. These are reasonably good options, but they have limitations and raise subtle choices: what kind of game characters and game worlds, what kind of robot environments, and so on? One conclusion we have come to, based largely on the considerations in Chapter 11 on development and Chapter 9 on the importance of environment, is that it may make sense to embed early-stage proto-AGI and AGI systems in environments reminiscent of those used for teaching young human children. In this chapter we will explore this approach in some detail: emulation, in either physical reality or a multiuser online virtual world, of an environment similar to the preschools used in early childhood education. Complete specification of an “AGI Preschool” would require much more than a brief chapter; our goal here is to sketch the idea in broad outline, and to give a few examples of the opportunities such an environment would afford for instruction, spontaneous learning, and formal and informal evaluation of certain sorts of early-stage AGI systems. The material in this chapter will recur fairly often later in the book. The AGI Preschool context will serve, throughout the following chapters, as a source of concrete examples of the various algorithms and structures. But it’s not proposed merely as an expository tool; we are making the very serious proposal that sending AGI systems to a virtual or robotic preschool is an excellent way, perhaps the best way, to foster the development of human-level, human-like AGI.

16.1.1 Contrast to Standard AI Evaluation Methodologies

The reader steeped in the current AI literature may wonder why it’s necessary to introduce a new methodology and environment for evaluating AGI systems. There are already very many different ways of evaluating AI systems out there ...
do we really need another? Certainly, the AI field has inspired many competitions, each of which tests some particular type or aspect of intelligent behavior. Examples include robot competitions, tournaments of computer chess, poker, backgammon and so forth at computer olympiads, trading-agent competition,