FINANCE THEORY APPLICATIONS OF LEARNING, EVOLUTION AND SELF-ORGANIZATION

by

BRYAN R. ROUTLEDGE
B.Comm (Hon), Queen's University at Kingston, 1987

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE FACULTY OF GRADUATE STUDIES, FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October 1995

© Bryan Ralph Routledge, 1995

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Finance Theory Applications of Learning, Evolution and Self-Organization

Abstract

This thesis investigates stochastic adaptive learning and contrasts models of adaptive individuals with models that assume complete and unbounded rationality. In this thesis, individuals follow rules of thumb which are developed by adopting or copying successful rules, abandoning unsuccessful ones and occasionally creating novel rules. In fixed environments, these learning algorithms differ only slightly from complete rationality Bayesian approaches. However, in situations where an individual's utility depends on the behaviour of other individuals, a co-evolutionary environment, the predictions of adaptive learning models can differ markedly from traditional complete rationality models.
In the first section of the thesis, interaction between individuals' decisions is limited and direct. People face their neighbours in a repeated prisoner's dilemma. A genetic algorithm, used as an example of a stochastic adaptive learning process, is developed in Chapter 2. The rate of learning in the algorithm is controlled by altering the number of individuals obtaining new strategies in a generation. In the infinitely repeated game the learning rate affects the equilibrium level of payoffs (i.e. affects which equilibria are selected). In the finitely repeated game the learning rate determines whether or not the system converges to the unique Nash equilibrium. Chapter 3 considers a similar model analytically, yielding analogous results.

The second portion of the thesis investigates stochastic adaptive learning in a non-strategic yet co-evolutionary environment. This section develops an asymmetric information, one-period, single risky asset portfolio choice model based on Grossman and Stiglitz (1980). The main finding of these three chapters is that the appropriateness of the rational expectations (or complete rationality) equilibrium depends upon the level of noise in the economy (in the form of noise traders) relative to the level of experimentation in the individuals' learning processes. The discussion of this relationship begins in Chapter 4 by constructing a learning process which converges to the rational expectations equilibrium and concludes with a discussion of the stability of the Grossman-Stiglitz equilibrium to an adaptive learning process where experimentation does not vanish. Chapter 5 develops a deterministic representation of a stochastic adaptive learning process to formally develop the link between noise trading, experimentation and the Grossman-Stiglitz equilibrium.
Finally, Chapter 6 demonstrates the stability result in a more general environment using a genetic algorithm as an example of a stochastic adaptive process.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

Chapter 1 - Introduction
   I    Background on the Genetic Algorithm
   II   Adaptation in Spatial Environment
   III  Stochastic Adaptive Learning in a Financial Market
   IV   Conclusion

Chapter 2 - Co-evolution and Spatial Interaction
   I    Background
   II   The Repeated Prisoner's Dilemma Model and Evolution
        (A) Agent Interaction
        (B) Evolution
        (C) Parametric Assumptions
   III  Results
        (A) Infinitely Repeated Prisoner's Dilemma
        (B) Finitely Repeated Prisoner's Dilemma Game
            (i) The one-shot game
            (ii) The twice repeated game
   IV   Discussion
   V    Conclusion

Chapter 3 - Simple Spatial Evolution
   I    Replicator Dynamics
   II   Spatial Markov Model
   III  Conclusions

Chapter 4 - Stochastic Adaptive Learning in a Rational Expectations Model
   I    Background
   II   Model
        (A) Structure
        (B) Learning
        (C) "Rational" or Complete-Knowledge Learning States
   III  Learning, Equilibrium and Stability
        (A) Comparison of Informed and Uninformed Utility
        (B) Stochastic Adaptive Learning Process
        (C) Learning and Local Stability
        (D) Stability with Always-Experimenting Stochastic Adaptive Learning
   IV   Conclusions

Chapter 5 - The Stability of SAL: Deterministic Dynamics
   I    Learning by Pure Adaptation
   II   Driftless Stable Dynamic Equilibria
   III  Learning with Adaptation and Drift
   IV   The Stability of the GS Equilibrium
   V    Conclusion

Chapter 6 - Genetic Algorithm Learning in a Rational Expectations Model
   I    Simulation Design
   II   Simulation Results
        (A) Stability Examples - Simulations One and Two
        (B) Convergence Examples - Simulations Three and Four
   III  Extensions
   IV   Conclusions

Chapter 7 - Conclusions

Bibliography
Appendix A - Proofs for Chapter 3
Appendix B - Proofs for Chapter 4
Appendix C - Proofs for Chapter 5

List of Tables

Table 2.1 - Parameters for Simulation
Table 2.2 - Regression Results. Data: 20 different initial populations and R=5,10,15,...,95,100 used to generate 400 data points for regression.
Table 6.1 - Common Simulation Parameters
Table 6.2 - Simulation Specific Parameters and Results Summary
Table 6.3 - Common Simulation Parameters - Genetic Algorithm
Table B.1 - Numbering of Cases. There are 11 cases to consider. Note the final cell represents 3 cases depending on the relationship between K and K.

List of Figures

Figure 2.1 - The Repeated Prisoner's Dilemma Game. (a) Payoffs in the stage (or one-shot) game. (b) Possible and equilibrium per-iteration payoffs in the infinitely repeated game.
Figure 2.2 - Schematic for Evolution Simulation.
Figure 2.3 - Population Average Fitness in Final Generation. Final Generation is calculated as either the generation where the population converged or generation 2000.
Figure 2.4 - Population Average Fitness in Final Generation (L=10). Results for 20 different initial populations. R=5,10,15,...,95,100. Final Generation determined as in Figure 2.3.
Figure 2.5 - Population Average Fitness per Generation. Time series plot for three different values for the number of agents replaced per generation. R=8,28,67. Same initial population.
Figure 2.6 - Location and Strategy Evolution of the Agents. Infinitely repeated game with eight agents replaced. Each agent's location (on the torus) is represented by a circle. Radius is proportional to fitness. Shading represents the percentage of C's played by the agent in the tournament (10 move game against four neighbours, or 40 moves).
Figure 2.7 - Difference in Average Fitness Between Converged and Generation 2000. There are 31 cases where the convergence in Figure 2.3 was not stable.
Figure 2.8 - Average Fitness per Generation, R=67. Note the population converges and then diverges several times. Insert provides a magnification of generations 860 to 890.
Figure 2.9 - Location and Strategy Evolution of the Agents. Infinitely repeated game with 67 agents replaced. Each agent's location (on the torus) is represented by a circle. Radius is proportional to fitness. Shading represents the percentage of C's played by the agent in the tournament (10 move game against four neighbours, or 40 moves).
Figure 2.10 - Location and Strategy Evolution of the Agents. One-shot game with 1 agent replaced. Each agent's location (on the torus) is represented by a circle. Radius is proportional to fitness. In this one-shot game, shading represents agents that play C as dark and D as white.
Figure 2.11 - Population Average in Generation 5000. Population converged for R=11,44,47,50,51,...,100.
Figure 2.12 - Population Average Fitness Statistics for Generations 5000 to 6000 (L=2). Frozen, Critical, Stationary and Converged regions are discussed in the text.
Figure 2.13 - All Possible Strategies for the Twice Repeated Game. A string of 5 bits is used to represent these strategies. Strategy index numbers used in histograms which follow.
Figure 2.14 - Location of Agents in a Frozen Evolution. The dark agents are the 8 least fit agents and receive new strategies each generation. The other agents never change strategy.
Figure 2.15 - Population Average Fitness per Generation (R=8). Generations 5000 to 6000. An example of an evolution which is Frozen.
Figure 2.16 - Strategies in Population in Generation 5000. White bars indicate initial population. See Figure 2.13 for Strategy Index.
Figure 2.17 - Population Average Fitness per Generation (R=34). Generations 5000 to 6000. An example of an evolution which is Stationary.
Figure 2.18 - Strategies in Population in Generation 5000. White bars indicate initial population. See Figure 2.13 for Strategy Index.
Figure 2.19 - Population Average Fitness per Generation (R=14). Generations 5000 to 6000. An example of an evolution which is Critical.
Figure 2.20 - Strategies in Population in Generation 5000. White bars indicate initial population. See Figure 2.13 for Strategy Index.
Figure 2.21 - Location and Strategy Evolution of the Agents. Twice repeated game with 14 agents replaced. Example of behaviour in the Critical Region. Average population fitness moves from 1.49 in (b) to 1.25 in (f) and ends at 1.32 in (g).
Figure 2.22 - Location and Strategy Evolution of the Agents. Twice repeated game with 14 agents replaced. Example of behaviour in the Critical Region. Average population fitness moves from a low of 1.3 in (b) to a high of 1.5 in (f).
Figure 2.23 - Strategy Replacement. Shade of agent indicates the number of generations since the agent received a new strategy. This simulation is for 14 agents replaced in the twice repeated game. Generations correspond to those in Figures 2.21 and 2.22.
Figure 2.24 - Two Initial Populations. Populations differ only in the first bit of agents 4 and 6.
Figure 2.25 - Strategies in Population After Generation 2000 and R=16. From two similar populations, R=16 in the Critical Region produces very different populations.
Figure 2.26 - Population Average Fitness with Initial Population 1 and R=16. 2000 generations are reported. R=16 is in the Critical Region.
Figure 2.27 - Population Average Fitness with Initial Population 2 and R=16. 4000 generations are reported. Simulation was run for 15,000 generations producing a similar plot as presented here.
Figure 3.1 - Agent Interaction. Four agents, arranged in two groups, interact playing the prisoner's dilemma. The symmetric payoffs for the row player are shown at the right.
Figure 4.1 - Expected Utility of Informed and Uninformed Agents. The figure plots f^I = 1 (a normalization) and f^U (for various e and L) as functions of X.
Figure 5.1 - Driftless Learning Process. The arrows are for a typical driftless dynamic. The equilibria are at the GS values (X*, 0) or at X = 1.
Figure 5.2 - Phase Plots in Driftless Learning Process. Each line represents a different initial condition. Note all lines converge to the GS equilibrium (X* = 0.5) or the X = 1 boundary.
Figure 5.3 - Learning Process with Drift. The unique dynamic equilibrium is near the GS values (X*, 0) (denoted by the dot).
Figure 5.4 - Phase Plots in Learning Process with Drift. Each line represents a different initial condition. Note all lines converge to a point which is close to the GS equilibrium.
Figure 5.5 - Learning Process with Drift. The unique dynamic equilibrium (denoted with the dot) is far from the GS values.
Figure 5.6 - Phase Plots in Learning Process with Drift. Each line represents a different initial condition. Note all lines converge to a point close to X = 1.
Figure 5.7 - Learning Process with Drift. There are three dynamic equilibria. Two are stable.
Figure 5.8 - Phase Plots in Learning Process with Drift. Each line represents a different initial condition. Lines converge to close to the GS equilibrium or close to X = 1.
Figure 6.1 - Schematic for Genetic Algorithm Stochastic Adaptive Learning.
Figures 6.2a-6.2i - X, Proportion of Informed Agents; Difference in Average Fitness: f^I - f^U; a0, Average Price; a1, Slope of Price to y; a2, Slope of Price to e; β0^I; β1^I; β0^U; β1^U.
Figures 6.3a-6.3i - X, Proportion of Informed Agents; Difference in Average Fitness: f^I - f^U; a0, Average Price; a1, Slope of Price to y; a2, Slope of Price to e; β0^I; β1^I; β0^U; β1^U.
Figures 6.4a-6.4i - X, Proportion of Informed Agents; Average Fitness f^I and f^U (for the first 200 generations); a0, Average Price; a1, Slope of Price to y; a2, Slope of Price to e; β0^I; β1^I; β0^U; β1^U.
Figures 6.5a-6.5i - X, Proportion of Informed Agents; Average Fitness f^I and f^U (for the first 1000 generations); a0, Average Price; a1, Slope of Price to y; a2, Slope of Price to e; β0^I; β1^I; β0^U; β1^U.
Figure B.1 - Functions which support p as a stable dynamic equilibrium. The functions are from (A.9) and (A.10).
The dashed lines indicate smoothing of the piece-wise linear functions.

Acknowledgements

For my Mom and Dad,
A small return for years of love and support

There are many people to thank for the advice and support that I have received during my time at UBC. Many thanks are due to Alan Kraus, my research supervisor, for guidance, patience, support and unsurpassed intellectual encouragement. I also thank the other members of my thesis committee, Jonathan Berk, Ken MacCrimmon, Vasant Naik and Michele Piccione, for many stimulating and helpful conversations. There are also various members of the academic community who contributed generously of their time. These include Murray Carlson, Peter Danielson and Jasmina Arifovic. For financial support, I thank the University of British Columbia and the Social Science and Humanities Research Council of Canada. Many thanks are also due to the faithful Tuesday/Friday basketball colleagues for helping me maintain my sanity and to the Commerce Cubs, 1995 Champions, for helping maintain the illusion of my youth. Thanks are due to my family for lots of encouragement. Finally, I thank my Marie who makes everything fun.

Chapter 1
Introduction

Although the assumption of complete and unbounded rationality has been fruitful in economics, it is not without controversy. In the 1950s, Simon (1955, 1959) argued that the global rationality assumption should be replaced by a more realistic view which recognizes the limits on information and cognition. Furthermore, the empirical investigations into the foundations of decision theory consistently find violations of its underlying assumptions.[1] Despite the controversy, the unbounded rationality assumption remains dominant in modern financial and economic theory. Two reasons for this persistence are the difficulty in finding tractable alternatives and the view that, even if agents are not rational, they are reasonable and therefore will eventually adapt their behaviour to that predicted by the rational model.
[1] See Schoemaker (1982) and Camerer and Weber (1992) for surveys of this research.

Lucas (1986) captures the first sentiment when he writes of "... a tendency ... to disdain existing theories in favour of yet-to-be-constructed theories" (page 218). In the same article he also captures the essence of the second point when he states, "Technically, I think of economics as studying decision rules that are steady states of some adaptive process, decision rules that are found to work over a range of situations and hence are no longer revised appreciably as more experience accumulates." (page 218). This thesis will develop and consider tractable models of adaptive behaviour to demonstrate that adaptation will often lead to behaviour which is dramatically different from unbounded rationality.

The argument that deviations from rationality should be small and unimportant is compelling. Even if economic agents are not fully rational, they are reasonable. They have incentive to abandon behaviour which leads to poor results, imitate behaviour which was more successful, or try novel ideas. Over time, after many decisions, agents should have a well-developed set of behavioural rules-of-thumb which are indistinguishable from those of a fully rational agent. This argument is especially appealing in finance, which focuses on items such as risk premia and stock price reactions. These items are determined by the aggregation of the anonymous trading behaviour of a large number of agents. Any individual anomalies should be unsystematic and thus have little impact on aggregate behaviour. However, for many economic models, such as those based on rational expectations, this argument is flawed. The rationality assumption is more than an assumption on individual behaviour. For many models, it is an assumption about what agents know about other agents. With rationality,
With rationality,-2-Learning, Evolution and S4fLOrganizadon Introductionthe behaviour of the other agents offers no source of uncertainty. If I know what you know,then I know what you will do.2 However, when behaviour is stochastic and adaptive, thisis not the case. Individual anomalies do not vanish.3 This thesis demonstrates this fact andexplores its causes.This thesis investigates the conditions and reasons that behaviour in models with stochasticadaptive learning individuals differs from that predicted by models based on unboundedrationality. The reason that adaptive models are complex and need not lead to theunboundedly rational prediction is that agent behaviour does not adapt independently of otheragents. The success (or fitness) of an agent depends on the adapting behaviour of the otheragents. In these situations, behaviour co-adapts, or borrowing from biology, co-evolves.As discussed by Kauffman and Johnsen (1991) in the biological context, adaptive changesby one agent not only affect the fitness of other agents, but change their fitness landscapes(ie. the map which translates behaviour into fitness). For example, when frogs developsticky tongues, ffies develop slippery feet. The development by the frog changes thedesirability (fitness) of slippery feet. What was once a valley on the fly’s fitness landscapeis now a peak. By altering the fitness landscape of other agents, adaptive moves by oneagent alter the “target” of the adaptive process.2 At least up to a point of randomizing between indifferent alternatives.This is analogous to the recent work in physics on Self-Organized Criticality (see Bak and Chea, 199,for an overview). Thiswork considers systems where small shocks can have large effects. A defining characteristic of a critical state is one whereevents (such as avalanches or earthquakes) propagate according to a Power Law. 
While small events occur more often, all sizedevents are possible.-3-Learning, Evoludon and S4t’Organizudon IntrodudionIn economics, an agent’s utility (or fitness) almost always depends on the behaviour of otheragents either directly, as in strategic situations, or indirectly through prices.4 Thus theagent’s success or fitness is coupled (using Kauffman and Johnsen’s term) to the behaviourof other agents. Both of these situation are considered in the thesis’s two main sections.First, in Chapters 2 and 3, individual fitness is directly coupled to the behaviour of othersvia a repeated prisoner’s dilemma game. Since play in this game is limited to one’sneighbours, fitness is directly coupled with only a few others and the complexity of coadaptation is addressed directly. In contrast, in the second main section of the thesis,consisting of Chapters 4 through 6, agent fitness is coupled indirectly with all others’behaviour. In an asymmetric information asset pricing environment based on the model ofGrossman and Stiglitz (1980), individual traders need to learn an endogenous relationshipbetween asset price and an exogenous signal of dividend value. Since all agents’ actionsaffect the price-signal relation, the success (or fitness) of one trader is coupled with all otheragents.In the two environments studied in this thesis, individuals are considered to have less thancomplete knowledge. In the repeated prisoner’s dilemma model, agents do not know thestrategy of the other agents. In the Grossman-Stiglitz based model, agents are assumed notto know parameters concerning the exogenous signal-dividend relation or endogenous price-signal relation. In these models, therefore, the opponents’ strategies or the signalFor example, Blume and Easley (1992) consider the fact that portfolio strategies of all agents affect the success (or fitness) ofeach individual strategy.-4-Learning, Evolution and Sef-Organization Introductionrelationship must be learned. 
Throughout the thesis, stochastic adaptive learning will beconsidered. In each model, agents form reasonable rules-of-thumb and experiment by trialand error.5 Good rules-of-thumb are imitated and used with more frequency, badrules-of-thumb are abandoned and randomly selected novel rules are periodically introduced.Stochastic adaptive learning has the appealing property that “average” or “typical” behaviourcan be predicted. Since learning is adaptive, agents, on average, choose the best action.However, individual behaviour can be predicted with less certainty as individual learning isstochastic and behaviour includes stochastic experimentation.In order to understand adaptive learning, it is useful to contrast adaptive learning withtraditional Bayesian learning. In Bayesian learning beliefs are updated based on priorbeliefs, observed information and Bayes Rule, and an optimal action, based on beliefs, ischosen. In an environment where success or fitness does not depend on other agents,6 thecomparison between the two models of learning can primarily be drawn from existingliterature.7 The primary difference between the two approaches is whether or not completelearning occurs (in the limit) and the true relationship is discovered. Bayesian learningmodels can lead to incomplete learning. With certain prior beliefs, the agent may notexperiment sufficiently to learn the true parameters. This incomplete learning is & aiue“Reasonable” depends on the model context. In particular, the full-infonnation, rational behaviour is a possible rule-of-thumb.6 A typical example is a monopolist learning about a demand function with an unknown parameter. It is important to note thatthe demand function in this example is treated as exogenous. 
If consumers were also adaptively learning, then the situationwould be co-adaptive and far more complicated.See for example Easley and Kiefer (1988), Aghion et al (1991) and Bala and Goyal (1993).-5-Learning, Evolution and S4fOrganization Introductionrational because the pessimistic prior beliefs about the unknown parameter do not warrantexperimentation. Stochastic adaptive learning, in contrast, will usually lead to completelearning. That is, the agent will learn the true parametric relationship and choose the expostoptimal action. Since the agent tries new strategies (ie. stochastic experimentation), she willgenerate sufficient information to uncover the underlying process. Since behaviour isadaptive, the behaviour will tend toward the optimal action, given the true relationship.While it is likely that Bayesian and stochastic adaptive learning will converge to the sameaction, it is possible that an agent acting adaptively can be more successful (realize a higherutility or fitness) than the Bayesian rational agent. It is important to note that rationalbehaviour need not be favoured by evolution.8 Adaptive learning selects behaviour whichdoes well ex post while rational behaviour focuses on ex ante.As is obvious from the discussion thus far, the thesis draws on many diverse fields for input.Besides the research on Bayesian learning models of optimal experimentation discussed inthe previous paragraph, the research on learning in both game theory and rationalexpectations models is relevant.9 Since there is a natural relationship between adaptivelearning and evolution, research in evolutionary game theory1° and evolutionary8 This point is made in Blume and Easley (1992).9 . .Both of these branches of research are extensive. However, examples of relevant papers in rational expectations learning areBray and Savin (1986) and Vives (1993). 
Examples of learning in game theory can be found in Jordan (1991), Milgrom andRoberts (1991) and Kalai and Lehrer (1993).10 Again, there are many examples which are relevant. However, the papers of Nachbar (1992), Binmore and Samuelson (1992),Kaudori, Mailath and Rob (1993) and Young (1993) are particularly relevant.-6-Learning, Evolution and SeOrganization Imroductioneconomics11 are relevant. The economic notion that agents making systematic errors havean incentive to modify their behaviour parallels natural selection which eliminates poorgenes. Thus the work in co-evolutionary biological systems is important (see Kauffman andJohnsen, 1991). The fact that adaptation takes place far quicker than biological evolutionis of little importance. A more thorough discussion of this literature appears at the beginningof each of the two major sections (Chapter 2 and 4).I - Background on the Genetic AlgorithmA chief advantage of considering learning as an stochastic adaptive process is that modelscan be easily simulated using a genetic algorithm which is a commonly used optimizationtechnique. In this thesis, the genetic algorithm is used as an example of a stochastic adaptivelearning process rather than for optimization. Genetic algorithms, developed primarily byHolland (1975), are biologically inspired stochastic optimization techniques. Geneticalgorithms simulate the adaptive learning process where successful strategies proliferate andunsuccessful strategies die out. They also generate novel strategies by combining andmutating existing strategies in an efficient search of the strategy space. Besides the intuitiveappeal of imitating good strategies, the genetic algorithm has the advantage of being easilyimplemented in the environments considered in this thesis. 
In addition, genetic algorithmshave proven successful in many optimization applications and are gaining popularity as11 See Dosi (1988) for a survey.-7-Learning, Evolution and Se/’Organization Introductionmodels of adaptation in economics.’2 Finally, there is the potential that if geneticalgorithms are well understood in the simple environments presented in this thesis, they canbe applied to more complicated, less tractable problems to guide intuition and research.Since genetic algorithms are an important example of a stochastic adaptive learning processand are used in Chapters 2 and 6, a brief background on genetic algorithms is included.’3There are three parts to the to the genetic algorithm, mapping strings to strategies, mappingthese strategies to fitness (utilities) and manipulating strings based on their fitness. The firsttwo parts are problem specific. For example, a string may be decoded into a butterfly’swing colour strategy. The wing colour strategy may be mapped into a probability of beingcaptured by a predator. In this thesis, strings are decoded into a strategy for playing theprisoner’s dilemma or a trader’s inference about a risky dividend and portfolio choice.A string is a collection of genes which are selected from some alphabet (usually {O, 1}). Apopulation of strings is manipulated by the genetic algorithm to form a new population,through selection, crossover and mutation. The selection, crossover and mutation aredesigned to search for good strings by focusing attention on the most promising parts of astrategy space. Selection chooses strings with probability proportional to fitness (ie. 
a12 For example, see Axelrod (1987), Miller (1987), Marimon, MeGrattan and Sargent (1990) and Slade and Eaton (1990) andArifovic (1994) all of which use simulations based on a genetic algorithm.13 All of the simulations using the genetic algorithm were programmed in Pascal and based on the sample code in Goldberg (1989).The simulations were ron on both PC-80x86 using Borland’s Turbo-Pascal and on an IBM RS6000 using IBM’s XLP compiler.For more information, please contact the author.-8-Learning, Evolution and S4f-Organization Introductionweighted roulette wheel). Crossover involves mating two strategies. Two strings that havebeen selected are cut at a randomly chosen point. Two new strategies are formed byrecombining the front portion of string one with the back portion of string two and the frontportion of string two with the back portion of string one. Crossover in the genetic algorithmdirects the search. A strategy with one good feature (say a good parameter for making aninference or a good move at some situation in a game) can be combined with a strategy withanother good feature. Since, on average, strings with good features propagate and stringswith bad features die out, the genetic algorithm constructs a population of optimal strings.Finally, mutation involves randomly altering a few genes in a population to ensure enoughgenetic diversity in the search.While artificial selection may be intuitively appealing, it is not obvious that the geneticalgorithm will iterate to an optimal solution. Genetic algorithms efficiently search byexploiting the fact that a string belongs to many regions (or schema) simultaneously. Forexample, the four-gene string 1101 and its associated fitness level provide information onstrings of the form {*l**, l1’, 1K0K,.. .}, where the * represents an arbitrary 0 or 1.Similar to the solution to a k-armed bandit problem,’4the genetic algorithm trades off theexploration for knowledge and the exploitation of known good strings. 
The choice betweenstrings of the form **l** and **0** is a two-armed bandit problem. The choice between14 The two-armed bandit problem, stated in terms of the slot machine analogy, is how to allocate trials over two slot machines tomaximise expected profits. Since the average payoffs on the machines are unknown, optimal play involves a tradeoff betweentesting both machines and playing only the machine thought to have the highest mean. The k-armed bandit problem simplyinvolves a larger choice set. See Goldberg (1989), page 36.-9-Learning, Evolulion and S4fOrganization Introduction**()()*,‘lO, **O1* and **11* is a four-armed bandit problem. Primarily through thecrossover step, the genetic algorithm is solving many k-armed bandit problemssimultaneously. This is the intuition for Holland’s (1975) Schema Theorem which provesthe effectiveness of genetic algorithms.15”6The proof of the Schema Theorem is for situations where the fitness of an individual stringdoes not depend on other strings in the population. However, in the situations consideredin this thesis this is not the case. The fitness of a string can depend directly on a few otheragents, as it does when individuals play a game (ie. the prisoner’s dilemma in Chapter 2).Alternatively, string fitness may depend indirectly on all agents’ behaviour as it does inChapter 6 where traders are trying to infer information from an endogenously determinedasset price. In these settings the Schema Theorem does not apply. It does not apply in coadaptive environments primarily because the concept of optimality is not well defined. Forexample, in a simple one-shot prisoner’s dilemma,17 two rational agents defect. Yet twocooperative agents achieve higher utility (fitness). This, however, is an issue that appliesto adaptive learning processes in general and is discussed more fully in the body of thethesis. 
The principal advantage of genetic algorithms is that they capture the intuition of proliferating successful strategies as well as stochastically introducing novel ones, and do correspond with (ex post) optimality in simple settings. The specific details of how the algorithm was implemented in the spatial prisoner's dilemma model and the Grossman-Stiglitz model are in Chapters 2 and 6 respectively.

15 Besides Holland (1975), Goldberg (1989), Chapter 2, and Vose (1991) offer explanations of this theorem. Vose (1991) has a generalization of schema and genetic algorithms.

16 An important assumption in the Schema Theorem is that the probability that good schema (or features) are disrupted by crossover and mutation is small. While the restriction on the mutation rate is easily controlled, disruption caused by crossover depends on the nature of the problem and how strategies are coded to strings (see Goldberg, 1989, Chapter 4). In each of the two Chapters that use genetic algorithms, alternative string coding schemes were tried, producing similar results.

17 For a specific prisoner's dilemma see Figure 2.1 in Chapter 2.

II - Adaptation in a Spatial Environment

As was noted earlier, the thesis has two main components. The first section attempts to highlight the coupling of fitness landscapes by considering a spatial model. This section investigates directly the impact of co-adaptation or co-evolution. The interdependence which produces co-evolution is modeled using the well-known repeated prisoner's dilemma game. Both the finitely and infinitely repeated versions are considered. It is important to state that the purpose of the section is not to provide another characterisation of when co-operation will or will not take place.
Instead, the familiarity of the struggle between the need for joint co-operation and the temptation of exploitation is used to assist in understanding the complex evolutionary dynamics.

Chapter 2 uses a genetic algorithm simulation to consider stochastic adaptive learning for individuals located on a 10x10 grid (without boundaries) playing repeated prisoner's dilemmas with each of their four immediate neighbours. The number of agents that receive a new strategy each generation can be considered to determine the rate at which the population learns or adapts. This rate determines the importance of location in the model. By varying the number of agents which can alter their strategy in any generation, the effect of co-evolution is modified. If a large proportion of agents receive new strategies, then any one agent will likely face neighbours with new strategies in each generation. Thus her optimal strategy must co-evolve with the entire population. If the proportion of agents receiving new strategies is small, then most changes in the population will not affect her neighbours. However, if one neighbour does change strategy, the impact will be significant. In this case co-evolution will be a local phenomenon. The proportion of the agents replaced in any one generation, or the learning rate, is this model's key parameter in the exploration of co-evolution.

The results in Chapter 2 demonstrate that in the presence of co-evolution, the way evolution is modeled matters. In the infinitely repeated prisoner's dilemma the population tends to evolve to one of the many Nash equilibria. Increasing the number of agents replaced per generation (the learning rate) reduces the mean and increases the variance of equilibrium population average fitness. In the finitely repeated game, the population converges to the unique Nash equilibrium only when a large number of agents are replaced.
These results are related to the Kauffman and Johnsen (1991) result which relates the ruggedness (ie. number of local optima) of a fitness landscape to the length of time an evolution requires to converge to a Nash equilibrium. Finally, in the finitely repeated game, the proportion of agents replaced that maximises population average fitness corresponds to critical behaviour, which suggests the possibility of Self-Organized Criticality (Bak and Chen, 1991). One could argue that the maximal population fitness makes the critical state a dynamical attractor. Again, this result is consistent with a Kauffman and Johnsen (1991) result. However, in the model developed in Chapter 2, the intuition of the prisoner's dilemma game can be used to understand how and why optimal population fitness is related to critical behaviour.

Chapter 3 analyses simpler models of stochastic adaptive learning in a spatial environment. In addition to a simpler spatial relationship, the interactions are the more rudimentary one-shot prisoner's dilemma games. The benefit of these simplifications is that tractable algebraic analysis is possible. The chapter begins by modelling the spatial interaction's impact on the correlation in which strategies get matched to play. The chapter argues that a replicator dynamics approach, typical of many evolutionary game theory models, will not be useful for this spatial setting since there is no natural way to parameterize the correlation in play induced by the spatial setting. The chapter then presents a Markov model of a simple four-person, two-interaction system.
Despite the simplicity of this model, it captures the complex relationship between the number of agents replaced and average fitness which appears in Chapter 2.

III - Stochastic Adaptive Learning in a Financial Market

The second main section of the thesis abstracts from the strategic considerations which are central to the previous section. Instead, this section models a situation where all strategies must co-adapt with the entire population. In this section, many individual traders attempt to learn the information revealed by the competitive price of a financial asset. Chapters 4, 5 and 6 use a model similar to the single risky asset, one-period model of Grossman and Stiglitz (1980) to explore learning dynamics relative to the typical rational expectations assumptions.

In the model, agents may choose to be informed by purchasing a common signal on the risky dividend or be uninformed and imperfectly infer the signal from the asset's price. Traders repeat the one-period model many times and must learn two of the model's exogenous parameters and the parameters of an endogenous relationship. Learning in this model underscores the difference between learning a parametric relationship (ie. nature) and learning an equilibrium relationship. Parametric relationships are unaffected by other agents. However, equilibrium relationships, which are of concern only to uninformed agents, depend on the learning of all the other agents and thus involve co-adaptation or co-evolution.

Chapter 4 presents the model and introduces stochastic adaptive learning in this environment. The chapter demonstrates that stochastic adaptive learning need not be inconsistent with the rational expectations equilibrium. The chapter constructs a family of stochastic adaptive learning processes which converge to the rational expectations equilibrium. However, these processes all have the feature that experimentation (ie. the trial and error portion of learning) becomes arbitrarily small. The interesting results are for the case when experimentation, however small, does not disappear. In this case stochastic adaptive learning may not produce behaviour similar to the Grossman-Stiglitz rational expectations equilibrium. In general, whether or not stochastic adaptive learning yields the rational expectations equilibrium (or some stochastic approximation of it) depends on a parameter in the economy and a parameter of the learning process. In particular, if the level of noise trading in the economy is low relative to individual experimentation, then the learning process will not produce behaviour similar to the rational expectations equilibrium.

In the Grossman-Stiglitz model, the uninformed agent's inference is imperfect due to the stochastic supply of the risky asset. The noisy supply, often referred to as noise traders, is important to Grossman and Stiglitz's original model. A rational expectations equilibrium proportion of informed agents exists as long as some (perhaps very small) noise trading is present. However, if the rational expectations equilibrium is used to model a world with agents who are learning using some stochastic adaptive process, it is necessary that the level of noise trading not be too low. The intuition for this result is presented in Chapter 4. Since stochastic adaptive systems are complex, Chapter 5 investigates this issue using deterministic dynamics as an approximation for stochastic systems. Finally, Chapter 6 presents a genetic algorithm simulation of the model using the genetic algorithm as an example of a stochastic adaptive learning process.

IV - Conclusion

Understanding the decisions of individuals is the cornerstone of modern finance. This thesis expands that understanding by relaxing the demanding informational and computational assumptions.
Introducing learning, particularly stochastic adaptive learning, can dramatically impact a model's predictions. Besides simply failing to converge to a traditional equilibrium, models with stochastic learning can involve complex behaviour.

Chapter 2
Co-evolution and Spatial Interaction

Although the assumption of complete and unbounded rationality in economics has been fruitful, it is not without controversy. Critics, such as Simon, have long argued that the assumption is inconsistent with what is empirically known about human thought and choice processes.1 In response to this criticism, modelling agents as continuously and unboundedly optimising is often justified by an appeal to some evolutionary process. Over time, after many decisions, agents have a well developed set of behavioural rules-of-thumb. Good rules are imitated, bad rules die out and new rules are periodically introduced. The end product of this evolution is a set of rules which are indistinguishable from the rules of an unboundedly optimising agent.2 This argument is strengthened by appeals to aggregation. Individual anomalies are not systematic and thus have no impact on average behaviour. However, these evolutionary arguments are flawed. Rules-of-thumb can rarely evolve independently of other agents. The success of an agent usually depends on the behaviour of other agents. In these situations behaviour must co-evolve simultaneously. When this occurs, the evolution mechanism can fundamentally affect individual and aggregate behaviour. Evolution need not lead to behaviour which is isomorphic to that of unbounded optimisation.

1 See Simon (1987), for example. This criticism is also made in many of the papers in Hogarth and Reder (1986) and Langlois (1986).
These complexities of co-evolution are a topic of recent interest in evolutionary biology and are the theme of this chapter.3

The interdependencies which necessitate co-evolution can take many forms. Other agents' portfolio investment strategies affect the asset returns faced by all agents (Blume and Easley, 1992). Agents' behaviour affects the endogenous relationship between economic variables and therefore influences the inferences agents should make (see Chapter 4). The most direct form of interdependence occurs in strategic situations. In game theory, success depends directly on the behaviour of opposing players. In fact, this chapter builds on work in evolutionary game theory.

2 Lucas (1986) is one such example when he states: "Technically, I think of economics as studying decision rules that are steady states of some adaptive process, decision rules that are found to work over a range of situations and hence are no longer revised appreciably as more experience accumulates." (Page 218).

3 Rosenzweig, Brown and Vincent (1987) and Kauffman and Johnsen (1991) are two examples of co-evolutionary theory in biology.

In this chapter, interdependence is modeled using the repeated prisoner's dilemma game. Both the finitely and infinitely repeated versions are considered. It is important to state that the purpose of the chapter is not to provide another characterisation of when co-operation will or will not take place. Instead, the familiarity of the struggle between the need for joint co-operation and the temptation of exploitation is used to assist in understanding the complex evolutionary dynamics. Specifically, agents are located on a grid (without boundaries) and play a repeated prisoner's dilemma with each of their four immediate neighbours. The number of agents that are replaced per generation determines the importance of location in the model.
By varying the number of agents which can alter their strategy in any generation, the effect of co-evolution is modified. If a large proportion of agents are replaced, then any one agent will likely have new neighbours in each generation.4 Thus her optimal strategy must co-evolve with the entire population. If the proportion of agents being replaced is small, then most changes in the population will not affect her neighbours. However, if one neighbour does change strategy, the impact will be significant. In this case co-evolution will be a local phenomenon. The proportion of the agents replaced in any one generation is this model's key parameter in the exploration of co-evolution.

4 Equivalently, the agent faces old neighbours with new strategies. It does not matter whether we think of the agents getting new strategies or being replaced by new agents. Strategies are modified only through evolution; there is no explicit learning where past interactions (beyond the current game) matter.

The purpose of the chapter is to explore the hypothesis that the way evolution is modeled matters. Evolving an optimal strategy as other agents are similarly evolving is fundamentally different from choosing a strategy which is optimal on a fixed landscape. Evolutionary changes by other agents alter the landscape. An evolutionary mechanism which converges to optimal behaviour in a fixed (no co-evolution) environment need not converge in the presence of co-evolution. This chapter will demonstrate that how evolution is modeled not only affects the behaviour of the system in the short run, but dramatically impacts the long run characteristics. The equilibrium behaviour, if an equilibrium is reached, varies depending on how co-evolution is modeled. The familiarity of the prisoner's dilemma will be used to understand why the characteristics of co-evolution matter.
Specifically, in the infinitely repeated prisoner's dilemma (a game with many Nash equilibria), increasing the proportion of agents replaced per generation tends to reduce the expected population average payoff (or fitness). The increased replacements also increase the variance of the average payoffs. More equilibrium payoffs are possible when a larger number of agents is replaced. In contrast to the equilibrium achieved in the infinitely repeated game, evolution in the finite game may not find the unique Nash equilibrium. With the finite game there is an interesting relationship between the average long-run payoff and the model's stability. The same proportion of agents replaced that maximises long-run average payoffs leaves the model most sensitive to initial conditions and random fluctuations (ie. butterfly effects). If there existed some mechanism to select the optimal number of agents to be replaced per generation in the finitely repeated game, then the system would exhibit chaotic behaviour.5 Finally, these results are compared to those of recent co-evolutionary biology. The similarity between our model and that of Kauffman and Johnsen (1991) is used to interpret both their results and the results of this chapter.

5 Kauffman and Johnsen (1991), discussed below, use the term meta-dynamic to describe the selection of evolution parameters. An evolution dynamic determines individual agent characteristics based on individual fitness. A meta-dynamic, in contrast, affects evolution parameters based on the fitness of the entire population.

The methodology used to investigate co-evolution in this chapter is simulation. In the next Chapter, algebraic analysis in a simpler environment supports and augments many of the results of the simulation analysis.
The technique of using simulations to expand the empirical data base upon which theory can be constructed is typical of studies in the field of Artificial Life and Emergent Processes (see Langton, 1992). The simulation of evolution is based on a genetic algorithm. The genetic algorithm manipulates the agents' strategies, which are represented as bit-string mappings. The genetic algorithm operators of selection, crossover and mutation efficiently imitate successful strategies and introduce new strategies to produce successive generations of the population. Besides the intuitive appeal, the genetic algorithm is easily implemented and has proved a successful optimisation tool in many settings. In addition, genetic algorithms have recently been applied in models where co-evolution occurs.6

6 See, for example, Miller (1989), Marks (1989a, 1989b), Marimon, McGrattan and Sargent (1990), Slade and Eaton (1990), and Danielson (1992).

The chapter is organized as follows. Section 2.I discusses some of the literature related to co-evolution in biology, evolutionary game theory and the repeated prisoner's dilemma. Section 2.II introduces the spatial version of the game and the techniques used to simulate evolution. The results for the infinitely and finitely repeated games are presented in Section 2.III. Section 2.IV discusses these results and their relationship with evolutionary biology. Section 2.V concludes.

I - Background

The relationship between theories of evolution in economics and biology dates back to the influences of Malthus on Darwin.7 The roots of this study lie in both the recent biological theory of co-evolution and the economic study of evolutionary game theory. This section discusses some of the vast literature in these areas upon which this chapter draws.

In biology, Kauffman and Johnsen (1991) provide both theoretical and simulation methodology to study the dynamics of co-evolution.
In their model the agents are adapting to a fitness landscape that is both rugged (multi-peaked) and coupled with other agents. Adaptive moves by one co-evolutionary partner not only affect the fitness of the agent but also the fitness landscapes of other agents. For example, when frogs develop sticky tongues, flies develop slippery feet. The development by the frog changes the desirability (fitness) of slippery feet. What was once a valley on the fly's fitness landscape is now a peak. The Kauffman and Johnsen framework focuses on the interaction among the size of the gene, the ruggedness of the landscape and the degree to which landscapes are coupled. Gene size is a measure of the complexity of the evolutionary task. Ruggedness is a measure of both the continuity and the number of local optima of the landscape. Finally, landscapes which are coupled to a higher degree buckle more severely when another agent changes strategy. These parameters control evolution characteristics like the length of time to achieve a Nash equilibrium, the stability of the system to perturbations in the initial conditions and the average fitness of the agents. Although the repeated prisoner's dilemma model presented below differs from the Kauffman and Johnsen methodology, our model has analogous components and similar results. These similarities will be exploited in Section 2.IV in the interpretation of the results.

7 Rosser (1992) provides an interesting analysis of this historical relationship.

In economics, evolution and the repeated prisoner's dilemma is closely associated with the work of Axelrod (1984). His tournaments used game theorists, decision scientists and others to submit strategies to play a repeated prisoner's dilemma game.
This chapter, like the others noted below, replaces people with computer simulation to generate strategies. Axelrod (1987) was also the first to consider a genetic algorithm simulation in the prisoner's dilemma. Axelrod used the genetic algorithm to breed an optimal strategy to play against a fixed environment (the winners from his previous tournament) and in an environment consisting of the evolving population. Miller (1989) and Marks (1989a, 1989b), building on Axelrod, use genetic algorithms to evolve a population of agents playing a repeated prisoner's dilemma. Since these papers do not vary the degree of co-evolution which is present in their models, these papers do not specifically address co-evolution. This chapter differs from these studies by considering the effects of co-evolution directly.

Evolutionary game theory is a popular topic in modern economics.8 It has developed to address the questions begged by the Nash equilibrium concept. Strategies form a Nash equilibrium when they are mutual best responses. However, the equilibrium concept does not address the issue of how agents jointly choose to play the equilibrium strategies. This question is even more important in games with multiple Nash equilibria. Even if agents are assumed to play Nash equilibria, they must somehow choose which equilibrium to play. Although many refinements of Nash equilibria have been proposed to select a unique equilibrium, none has gained universal acceptance.9 Whether evolutionary game theory can answer the questions of how equilibrium is selected and reached remains to be seen.10 This chapter argues that how evolution is modeled affects the equilibrium selection and determines whether or not an equilibrium is reached.
In many of the papers in evolutionary game theory, however, some of the assumptions about the evolutionary mechanism are implicit.

8 See Mailath's (1992) survey of the papers contained in the symposium on evolutionary game theory in the Journal of Economic Theory.

9 Kohlberg and Mertens (1986) provide a comprehensive discussion of refinements of Nash equilibrium.

10 Skyrms (1990) also discusses the process of arriving at a Nash equilibrium. Unlike evolutionary game theory where agents actually play the game, using the results to modify their strategies, Skyrms considers a deliberation dynamic which occurs entirely before play begins. Similarly, Binmore (1988) considers game-playing machines which use an internal dynamic similar to Skyrms to predict their opponent's play.

A central notion of evolutionary game theory is the evolutionary stable strategy (ESS) equilibrium concept. ESS is motivated by considering a population of strategies and the success of any possible mutant or invading strategy. The definition of ESS implies that it is a symmetric Nash equilibrium which is relatively unaffected by invasion by a few agents with alternative strategies (ie. invading strategies score lower fitness and do not survive).11 ESS helps determine whether a given population of strategies is likely to persist. The ESS concept plays an important role in interpreting the results presented here. However, evolutionary game theory which focuses on the dynamic properties of evolution will be more useful in developing theoretical analysis of the model presented below.12

II - The Repeated Prisoner's Dilemma Model and Evolution

There are three elements that define this study. The first is the interaction between the agents which determines their fitness. The second is the evolutionary mechanism which controls how strategies are represented, created and modified.
Finally, there are parametric assumptions for the simulations which can influence both quantitative and qualitative results of the study. Throughout this section it is important to be aware of the concern raised by Jefferson et al. (1992). In their artificial life study, the authors express concern that some of the many assumptions of studies using simulation may beg deeper questions. Specific assumptions which appear innocuous may have a large influence on the behaviour of the system.

11 For a formal definition of ESS, see van Damme (1987). For a discussion of the biological motivation for ESS see Maynard Smith (1982). ESS is closely related to other refinements of Nash equilibrium. For example, an ESS equilibrium is both a perfect and a proper equilibrium (van Damme, 1987). However, ESS need not exist and if it does exist, it need not be unique. For a discussion of ESS in the context of repeated prisoner's dilemma see Binmore and Samuelson (1992). They also provide their own modification of ESS called Modified Evolutionary Stable Strategy or MESS.

12 These papers include Friedman (1991) and Nachbar (1992) in models without mutations and Kandori, Mailath and Rob (1993) and Young (1993) in models that include mutations. Friedman (1991) discusses the relationship between ESS and dynamic stability. Other than in two-person symmetric games (as in our model), ESS need not be necessary or sufficient for dynamic stability.

(A) Agent Interaction

The interactions between agents are modeled by the familiar repeated prisoner's dilemma. It is the familiarity of the game that makes it a useful model of interactions, incorporating both competition and co-operation. It is important to understand the complex evolutionary behaviour in this simple environment before considering more sophisticated games. The payoffs in the stage (or one-shot) game are presented in Figure 2.1(a).
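The stage-game payoffs can be written down directly; the following sketch (my own encoding, not the thesis code) also verifies the dominance property that drives the one-shot game:

```python
# Stage-game payoffs from Figure 2.1(a): each entry maps a pair of
# moves (you, me) to the pair of payoffs (to you, to me).
PAYOFFS = {
    ('C', 'C'): (2, 2),
    ('C', 'D'): (0, 3),
    ('D', 'C'): (3, 0),
    ('D', 'D'): (1, 1),
}

def best_reply(opponent_move):
    # Pick my move to maximise my own payoff against a fixed opponent
    # move; defection strictly dominates co-operation.
    return max('CD', key=lambda m: PAYOFFS[(opponent_move, m)][1])
```

Whatever the opponent does, D earns one more unit than C, which is why two rational agents defect in the one-shot game even though joint co-operation pays each of them more.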
Note that the interaction between agents is pairwise. While pairwise interaction may not be that common in economics,13 it is a useful simplification. In particular, pairwise interaction keeps the strategy space manageable.

The repeated game is an L-times repetition of the stage game with perfect information.14 Strategies in this game are more complex than in the one-shot game since they can depend on the history of moves. History includes both your opponent's previous moves and your previous moves. Thus a strategy must specify play in all possible histories, including those that are impossible given your strategy. While this is the standard definition of a strategy in game theory, it is by no means innocuous. The interaction of this definition and the evolution mechanism is discussed in Section 2.III. Traditional game theory analysis yields a unique Nash equilibrium when L is finite.15 Regardless of how large L is, the equilibrium is always observationally equivalent to all D. Per iteration (or average) payoffs in this game are 1.0 for all agents. In contrast, when L is infinite, all average payoffs in the shaded region of Figure 2.1(b) are possible Nash equilibria payoffs.16 However, not all of these

13 Although not central to their study, this criticism of Axelrod's environment is made in Slade and Eaton (1990).

14 Miller (1989) considers the effect of introducing noise so that agents do not know for certain the past actions of their opponent.

Figure 2.1 - The Repeated Prisoner's Dilemma Game. (a) Payoffs to (You, Me) in the stage (or one-shot) game: (C,C) = (2,2); (C,D) = (0,3); (D,C) = (3,0); (D,D) = (1,1). (b) Possible and equilibrium per-iteration payoffs in the infinitely repeated game.

15 Traditional assumptions needed to support this conclusion include common knowledge of the payoffs, the length of the game and of the rationality of the players. Nachbar (1992) focuses on evolution in the finite game.

16 This is the "Folk Theorem".
For any point (payoffs) in the shaded region of Figure 2.1(b), strategies can be constructed that form a Nash equilibrium with these payoffs (provided agents' discount factors are sufficiently close to 1 - ie. future payoffs matter). A version of this appears in Fudenberg and Tirole (1991), page 152.

[Figure 2.1 graphic: panel (a) shows the stage-game payoffs to (You, Me); panel (b) shows the feasible payoff region. Adapted from Binmore and Samuelson (1992), page 279.]

equilibria are possible in an evolutionary model. Payoffs (fitness) must be symmetric for evolution to converge.17 Thus if Nash equilibria are reached in the infinite game, they will be along the thick diagonal line of Figure 2.1(b).

Most evolutionary models that focus on pairwise interaction consider an environment where all agents play all other agents (Miller 1987 and Marks 1989a, 1989b) or agents are paired randomly from a population (Nachbar, 1992). Strategy fitness depends on the entire population of agents. In our model, agents are located on a two-dimensional torus (a grid with no boundaries). All agents play a repeated prisoner's dilemma with their four immediate neighbours. This environment was initially proposed in Axelrod (1984). Since the purpose of this chapter is to investigate co-evolution, a model with local interaction underscores the inter-dependence of strategies by restricting the dependence of an agent's fitness to a subset of the total population. There need not be a globally optimal strategy for the game. An agent's optimal strategy depends on his neighbourhood. If we think of location as geographic, then it might make perfect sense to trust your neighbour in West Vancouver but not in East Vancouver. However, location need not be thought of as geographic. Axelrod points out that location could be interpreted as characteristics in a product space.
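The wrap-around neighbourhood structure can be sketched as follows (illustrative only; the default grid size of 10 matches the simulations described in the Introduction, and the function name is mine):

```python
def neighbours(row, col, size=10):
    # On a torus the grid has no boundaries: indices wrap modulo the
    # grid size, so every agent has exactly four immediate neighbours.
    return [((row - 1) % size, col),   # north
            ((row + 1) % size, col),   # south
            (row, (col - 1) % size),   # west
            (row, (col + 1) % size)]   # east
```

An agent's fitness depends only on play against these four sites, which is what makes co-evolution a local phenomenon when few agents change strategy per generation.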
For example, competition in the luxury car market differs from that in the economy car market.

17 Symmetric payoffs are necessary for a population of strategies to be ESS.

Although Axelrod had a similar form of interaction, his analysis was quite different from that presented here. The primary difference is the model of evolution. In Axelrod's model, agents can adopt only the strategy of their neighbours. Thus evolution focused solely on local imitation. The evolution mechanism in our model is much richer, and is introduced below.

(B) Evolution

How strategies are represented and modified defines the evolutionary mechanism. The representation of strategies, the replacement and creation of strategies and the genetic algorithm are all discussed in this section. Figure 2.2 presents a schematic overview of this process. Note that the procedure is iterative. A tournament consisting of all agents playing an L-times repeated game with their neighbours is conducted, then the strategies in the population are updated, then a new tournament is played. Strategies do not change during a tournament. There is the potential for confusion since the game itself is dynamic (ie. L-times repeated). However, strategies do not change during the tournament. It is best to think of agents submitting complete history-dependent strategies which are executed automatically. In addition, in evolutionary models, such as this one, history refers only to the current game. Strategies cannot condition on what occurred in past games or tournaments.18

18 Mailath (1992) discusses these assumptions of evolutionary models and their interpretation. If strategies could condition on past games (with other neighbours or in past generations) then the definition of the game would be changed.
No longer would our intuition of the repeated prisoner's dilemma be helpful.

Figure 2.2 - Schematic for Evolution Simulation. Randomly select an initial population; then, each generation: play the repeated prisoner's dilemma (using bit-string strategies, all agents play their four immediate neighbours in an L-times repeated prisoner's dilemma); calculate fitnesses (fitnesses are the average per-round payoffs); select R agents for replacement (the R least fit agents are chosen to receive a new strategy); generate R strategies (using the genetic algorithm operators of fitness-proportional selection, crossover and mutation, R strategies are created to replace the above agents; this process is stochastic).

A genetic algorithm is used to create new strategies based on the results of the previous tournament. Genetic algorithms were developed by Holland (1975) and have proved effective in many diverse environments. They incorporate the intuitive appeal of imitation of successful strategies and, through crossover and mutation, can create novel strategies. Since the genetic algorithm is the primary evolutionary tool, interpreting the results requires a basic understanding of genetic algorithms.

A genetic algorithm is a directed random search. Agents' strategies are represented as strings of bits (or alleles) chosen from some alphabet (usually {0,1}). Strategy representation is important and is discussed below. New strings are formed through selection, crossover and mutation. Selection chooses strings randomly proportional to their fitness (success in the tournament with their neighbours). Strings with higher fitness are selected with higher probability: a weighted roulette wheel. Crossover splices together two strings at a randomly chosen location.
By combining successful strategies in this way, the genetic algorithm efficiently experiments with strings that are likely to produce superior strategies. Finally, mutation involves randomly altering a few bits to introduce genetic diversity. When genetic algorithms are implemented on fixed landscapes (i.e. no co-evolution) they usually converge to the global optimum. What occurs in environments where co-evolution is present is the object of this study.19

Strategies for the repeated prisoner's dilemma game are represented as bit-strings. Strategies are restricted to pure strategies.20 Each possible history (the empty history, the 4 possible histories after the initial move, the 16 possible histories after two moves, ...) corresponds to a bit in the bit-string. The bit determines the agent's play given that history. As long as L, the length of the game, is finite, so are the string lengths.21 To implement simulations for

19 The Scientific American article by Holland (1992) provides a short overview of genetic algorithms and Goldberg (1989) provides a thorough introduction. The proof that the genetic algorithm will converge to the global optimum is Holland's (1975) Schema Theorem. Goldberg (1989), Chapter 2, and Vose (1991) offer proofs of this theorem. Goldberg also provides a discussion of the premises of the theorem. The Pascal code for the simulations was adapted from Goldberg (1989). The implementation of fitness-based selection is via the stochastic remainder method as described in Goldberg (1989), page 123, which is slightly different from the weighted roulette wheel method noted in the text.

20 It is not difficult to modify the representation of strategies to include mixed strategies. However, this would substantially increase the required string length.

21 Since there are four possible outcomes from play in the stage game, {(C,C),(C,D),(D,C),(D,D)}, the length of the bit string required to describe strategies in the L-times repeated game is given by the geometric series, (4^L − 1)/3.
The string can be ordered many different ways. Whether, for example, the initial move is given by the first bit, the last bit or bit 17 could influence the likelihood that some strategies are created. Bits which are located close together in the bit-string have a lower probability of being separated by the crossover operator of the genetic algorithm. This point is discussed in Goldberg (1989), Chapter 2, in relation to the Schema Theorem. To alleviate this concern, simulations were run using different bit-string mappings. This has little effect on the results.

the infinitely repeated game, strategies are bounded to have a memory of three rounds. In games of length greater than three, agents can no longer implement the strategy "D in the last round", since their limited memory cannot count the moves (see Nachbar, 1992). By eliminating strategies which can condition on the length of the game, L > 3 will represent an infinitely repeated game. Note that this restriction still leaves many possible strategies. The bit-string length of 85 bits means there are 2^85 or 3.9×10^25 possible strategies. In particular, even though memory is restricted, trigger strategies are possible.22

The bit-string representation was first used by Axelrod (1987) and appears in Marks (1989a, 1989b). An alternative representation is to represent strategies as finite automata or state machines as used in Miller (1989).23 For genetic algorithm implementation of the automata, the string is partitioned into a number of states. Each state consists of three parts. The first determines the move in the state. The second and third parts determine the next state given a C move or D move from the opponent. While the two representations are equivalent in theory (Kalai and Stanford, 1988) there are some practical differences.
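The string-length arithmetic can be checked directly: one bit per possible history, with 4^k histories after k rounds, gives the geometric series of footnote 21, and capping memory at three rounds gives the 85-bit figure (85 = 1 + 4 + 16 + 64, as implied by the quoted numbers). The helper below is a minimal check; its name and interface are illustrative, not from the thesis.

```python
def string_length(L, memory=None):
    """Bits needed for a pure strategy in the L-times repeated game.

    One bit per possible history: 4**k histories after k rounds.  With
    unbounded memory the total is 1 + 4 + ... + 4**(L-1) = (4**L - 1) // 3;
    a memory of m rounds truncates the series after the 4**m term.
    """
    terms = L if memory is None else min(L, memory + 1)
    return sum(4**k for k in range(terms))

# The three string lengths reported in Table 2.1: games of length 1 and 2
# with full memory, and the length-10 game with memory capped at 3 rounds.
lengths = [string_length(1), string_length(2), string_length(10, memory=3)]
```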
For a given string length, more complex (longer memory) strategies are possible using finite automata than the bit-string representation. However, for finite games (such as the twice repeated game considered here) the bit-string representation is easier to analyze.24

22 Typical trigger strategies usually include items like "once my opponent plays D, I will play D forever". This strategy can be implemented in a finite bit-string since the memory-limited history includes the agent's own past moves.

23 Binmore and Samuelson (1992) consider an evolutionary model using the state-machine representation and include diagrams for many strategies in the infinitely repeated game.

As was mentioned in the introduction, the most important parameter of this model is the number of strategies replaced in each generation. It is assumed that of the N agents in the population only the R agents with lowest fitness are selected to receive a new strategy.25 Recall that new strategies are created by the genetic algorithm based on the entire current population. One can think of the ratio R/N as the degree of satisficing behaviour (i.e. agents are contented with a certain rank) or the cost of acquiring a new strategy.26 By varying this ratio the importance of location in the model is changed. This changes the nature of co-evolution. The probability that your neighbour may adopt a new strategy determines whether the success of a strategy depends on whether it is well adapted locally or whether success requires the strategy to be well adapted to the entire population (or, more precisely, well adapted to strategies which are likely to be generated from the current population using the genetic algorithm). Although their models differ from this one, for comparison, Miller

24 There is one additional possible representation for strategies called genetic programming (Koza, 1992).
This approach represents the strategies as programs of functions (such as LISP). In more complex environments, genetic programming lets evolution literally choose and combine rules-of-thumb like "trust your neighbour." In particular, the string length in these models need not be fixed. However, the results from genetic programming are very difficult to interpret and involve difficult methodological questions. Danielson (1992) uses this approach in a modified version of the one-shot prisoner's dilemma game.

25 Agents are ranked according to their fitness with ties being broken randomly. In short games, where ties are more frequent, the tie-breaking assumption can influence the results.

26 An alternative to selecting agents based on their rank would be to select agents whose fitness fell below some threshold. Determining replacement based on a threshold would be more consistent with satisficing behaviour (Simon, 1987) or the existence of an opportunity cost for the agents to remain in the game. However, rank-based replacement provides for more precise control of the co-evolutionary environment and therefore best suits the objective of this chapter.

(1989) replaces 33% of the population and Danielson (1992) replaces 50% of the population each generation.

The R agents whose strategies are replaced in a generation are selected based on fitness. By determining replacement based on fitness, the probability that your neighbour receives a new strategy depends upon his fitness. This is crucial for the R/N ratio to have a significant effect. If a group of strategies in one region work well together (i.e. all have high fitness), then with a low R/N ratio, these strategies will not change.
If, for example, replacement were unrelated to fitness, then regardless of the R/N ratio some strategies in this group would be replaced.27

(C) Parametric Assumptions

The parametric assumptions that define these simulations are presented in Table 2.1 and the payoffs used for the iterated prisoner's dilemma game are in Figure 2.1(a). This section focuses on the definition of fitness and the length of the repeated prisoner's dilemma game.28

27 Stochastic replacement of a single agent is used in Chapter 3, where the probability of non-fitness-based replacement is the analog of the R/N ratio.

28 Other parametric assumptions are also important. However, results from varying these parameters suggest they do not dramatically affect the quality of the results presented here.

Population size, N: 100
Maximum generations: 100 to 17,500
Crossover probability: 0.7
Mutation probability: 0.0001
Number of neighbours each agent plays in tournament: 4
Number of agents to replace each generation: 1 to 100
Number of iterations in the prisoner's dilemma game: 1, 2, 10
Corresponding length of bit-string representing agent's strategy: 1, 5, 85

Table 2.1 - Parameters for Simulation

Fitness in this model plays two roles. The first is to determine (deterministically) the R agents that will receive new strategies. Second, fitness is used to determine the probability an existing strategy will be selected by the genetic algorithm. Ignoring the crossover and mutation operators, fitness determines the probability an existing strategy is imitated by the R agents receiving a new strategy. Fitness for the N agents is defined as the average per-iteration payoff. Where π_i is the total payoff from playing the L-times repeated prisoner's dilemma against four neighbours, fitness, f_i, i = 1, ..., N, is defined as:

f_i = π_i / (4L)    (2.1)

Given the payoff matrix in Figure 2.1(a), all fitnesses are positive.
Thus, fitness-proportional selection means that the probability an existing strategy i is selected by the genetic algorithm is given by

Prob(select i) = f_i / Σ_j f_j    (2.2)

This definition implies that the cardinal value of fitnesses matters. If, for example, a positive constant is added to the payoffs, the equilibrium game theory results are unaffected. In addition, the fitness rankings, which determine the agents that will receive new strategies, are also unaffected. However, the probability any one strategy is selected by the genetic algorithm changes. From equation (2.2), adding a constant moves the probability that a specific agent's strategy is selected closer to a uniform (1/N) probability.29 Thus scaling payoffs differently can change the long-run properties of the system. For example, by de-emphasising strategies that are initially successful (say an all-D strategy), co-operative niches may form. In evolutionary models, where not all possible strategies need be represented in the population, the cardinal properties of the payoffs can affect the results qualitatively.30

Similar to the choice of payoff matrix, the length of the iterated prisoner's dilemma that the agents play influences the results. The length of the game, L, determines the importance of an individual move. In a long game, the advantage (increase in fitness) gained from playing

29 There are procedures available for the genetic algorithm which scale fitnesses to seemingly avoid this problem. See Goldberg (1989) or Miller (1989). However, the scaling procedures require the modeller to specify the significance of relative importance. In optimization applications of the genetic algorithm, there are recommended levels for this parameter. However, in models with co-evolution, the choice of this parameter would be arbitrary.

30 This is in contrast to models of evolution which employ replicator dynamics.
In these models, adding a constant to fitnesses usually only affects the speed at which convergence occurs. See Mailath (1992) for a discussion of replicator dynamics.

D instead of C is much smaller than in a short game. As Marks (1989a) notes, there is a relationship between the length of the game and the theoretical implicit discount rate in a truly infinitely repeated game (i.e. the importance of future payoffs). When L is increased, the importance of joint co-operation (alternatively, the insignificance of any single move) swamps any effect of varying the number of agents replaced. In tournaments where L is large, there is little distinction between local and global co-evolution. To simulate the infinitely repeated game, L = 10 is used, where the memory of the strategies is restricted to three moves. This value is much lower than those used in Miller (1989) and Marks (1989a, 1989b).31 A small value of L is chosen to emphasise the R/N ratio. The results for these tournaments are discussed in the next section.

III - Results

The results are presented in two parts. First, the results for the infinitely repeated game, represented by a game of 10 moves with memory-limited strategies, are discussed. Second, the results for a one-shot and twice repeated game with no restrictions on the strategies are presented. In general, simulations produce a great deal of data. To understand the results of the simulations it is necessary to effectively summarize the data. The primary summary

31 More important than the length of the game is the number of moves each agent must make in the tournament (since each agent has only one strategy for all games). In this chapter each agent plays 4 games of length 10 for a total of 40 moves. In Miller (1989), each agent plays 30 agents at a game of length 150, for a total of 4,500 moves. In Marks (1989b) a tournament consists of 100 games of length 22 for a total of 2,200 moves.
Calculating the implicit discount rate for the infinitely repeated game requires considering the total number of interactions and not just the length of the game, as is shown in Marks (1989a, page 6). In addition, although their payoff matrices are similar to the one used here, both these papers scale the fitnesses to alter the probability of selection (see footnote 29).

statistic considered is the average fitness of the population. This statistic is presented both by generation for a fixed R (i.e. time series) and across different levels of R for a given generation (i.e. cross-section). In addition, to interpret the effect of local interaction and the evolutionary mechanism, animation is used to present information on individual agent strategies, fitness and location on the torus.

(The figures for this chapter appear at the end of the chapter, beginning on page 63.)

(A) Infinitely Repeated Prisoner's Dilemma

Figure 3 presents a cross-section of population average fitnesses for different numbers of agents replaced (R = 1, 2, ..., 99, 100). All of these simulations begin with the same initial population and report the population average fitness in the final generation. The final generation is defined as the first generation when the variance across individual fitnesses is zero (i.e. all agents achieve the same payoffs). When this occurs the evolution mechanism is said to converge. Convergence is a necessary (but not sufficient, as will be noted below) condition for the population of strategies to be ESS. Of the 100 simulations reported in Figure 3, 7 did not converge by generation 2000. For these cases, the average fitness in generation 2000 is reported.

Recall that fitness is defined as payoff per iteration. Individual fitnesses can be in the range between 0 (perfectly exploited) and 3 (perfectly exploiting).
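Since every payoff is earned in a pairwise interaction, the bound on the population average follows from the per-pair payoff averages of the stage game, which can be checked directly. The payoff values are those of Figure 2.1(a); the snippet itself is only an illustrative check, not thesis code.

```python
# Each interaction produces a pair of payoffs, so population average
# fitness is a weighted average of the per-pair means computed below.
# Payoff pairs (row, column) are taken from Figure 2.1(a).
PAYOFFS = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
           ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}

pair_means = {moves: (p1 + p2) / 2 for moves, (p1, p2) in PAYOFFS.items()}
# Pair means: 2 for mutual co-operation, 1 for mutual defection, and
# 1.5 for the exploiter/exploited pairs - hence averages lie in [1, 2],
# and a uniform random population averages 1.5.
```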
However, since for every agent scoring a 3 in one iteration there is an agent scoring 0, population average fitnesses must fall in the closed interval [1,2]. For the same reason, the expected population average fitness of a randomly selected initial population is 1.5. Finally, note that the average fitnesses in Figure 3 move in increments of 0.1 (for those cases which converged). A necessary condition of ESS is that the behaviour of all agents is identical. In an ESS equilibrium, there will be no (C,D) or (D,C) outcomes.32 Since the game length is 10 moves, 0.1 represents the fitness increase from achieving one additional (C,C) instead of a (D,D) outcome.

From Figure 3 the effect of varying the number of agents replaced per generation can be seen. Increasing R tends to reduce the average fitness (more low-fitness cases occurred) and increase the variability of the population average fitness. However, these results are for only one randomly chosen initial population, and the creation of new strategies by the genetic algorithm is stochastic. To confirm these initial results, Figure 4 presents cross-section average fitnesses for 20 different initial populations and 20 values for the number of agents replaced per generation (R = 5, 10, ..., 95, 100).

Using the 400 data points from this simulation, the results for the regression of the number of agents replaced per generation on population average fitness are reported in Table 2.2.

32 A population of strategies that call for such asymmetric behaviour will not survive the crossover operator of the genetic algorithm.

Dependent Variable: Population average fitness in final generation, F
Independent Variable: Number of agents replaced per generation, R
Number of Observations: 400
R-Squared: 0.2291
Standard Error: 0.2853
Fitted Regression: F = 1.9366 − 0.00537 × R
(t-ratios): (64.36) (−10.87)

Table 2.2 - Regression Results
Data: 20 different initial populations and R = 5, 10, 15, . . .
, 95, 100, used to generate 400 data points for the regression.

The regression indicates that R explains 23% of the variation in the average fitnesses and the regression coefficient on R of −0.005 is statistically different from zero.33 Thus, average population fitness declines as the number of agents replaced increases. In addition, the dispersion about the regression line is increasing in R (see Figure 4). In particular, note that it is only when R > 50 that the population average fitness reaches its lower bound of 1.0.34 This heteroscedasticity implies that when many agents are replaced each generation, the evolution mechanism is much more sensitive to random fluctuations. This is consistent with the intuition that when agents co-evolve locally (i.e. small R) a mutation or small change in the initial population will have a local effect. However, when R is large, small changes can affect the whole population.

33 The regression results are similar if only the 337 cases where the genetic algorithm converged are used. The fact that the 63 cases of non-convergence tend to lie below the regression line (i.e. below average) is curious. This experiment was repeated increasing the number of simulations to 2600 with almost identical results.

34 This fact suggests that a single regression may not be appropriate for the entire range of R. However, I cannot reject the hypothesis that the regression coefficients when R ≤ 50 equal the coefficients when R > 50 (via an F-test of linear restrictions). Thus a single regression is suitable.

Figure 5 presents average fitness per generation (time series) for three different values of R. The three simulations all begin with the same initial population and converge by generation 250. These are typical of the cases which constitute Figures 3 and 4. Of the cases that converge, convergence usually occurred before generation 250.
Typically average fitness falls in the initial generations, reflecting the proliferation of strategies like all-D which have an advantage in randomly created populations. Whether or not average fitness continues to decline depends on the number of agents being replaced.

Replacing 8 agents per generation is typical of a case where convergence to a high population average fitness (1.9) occurs. Figure 6 presents a series of frames from an animation program used to interpret the simulations. The animation represents locations, strategies and fitnesses of agents as disks on a grid. Recall that the agents are located on a torus and thus the grid has no boundaries. Individual agent fitness determines the radius of the disk representing the agent (population average fitness is in Figure 5).35 The shading represents the number of C's executed by the agent relative to the 40 possible moves each agent makes (4 games with 10 moves each) in a generation, according to the scale in Figure 6(a).36 Figure 6(b) is the initial (randomly selected) population. The population changes little in the first 100 generations (Figure 6c), with the exception that there are fewer extreme fitness values.

35 Note that this representation emphasises fitness differences since the area of each disk will be proportional to the square of fitness.

36 The simulation software shows these animations using colour. Unfortunately, only black and white images can be presented here.

Agents being exploited in generation 0 have received new strategies. By generation 130 a few niches of agents behaving co-operatively have developed (middle-right of Figure 6d). The agents in these niches have high fitness. Not only are these agents not replaced, but those agents which are replaced are replaced with strategies created (most likely) from members of the co-operative niches. By generation 135 the strategies of the co-operative agents spread.
By generation 150, although agents' bit-strings are not identical, all agents' strategies yield the same behaviour: D on the initial move and C on the remaining 9 moves. From investigating the bit-strings for this population, it does not appear that the C's are played unconditionally. There are many D's in the string which will prevent all-D strategies from exploiting the agents. In addition, simple strategies like all-C are identified by their initial move and are quickly exploited.37 Animations for the other simulations presented in Figure 5 are similar and are not presented.

Since the strings are large, it is difficult to determine if the strategies generated in the simulations form a Nash equilibrium. It is certainly possible, since any payoff in the closed interval [1,2] can be the outcome of a Nash equilibrium. The evidence from the three simulations in Figure 5 is suggestive, since the populations are robust to replacing one agent with strategies like all-D or all-C. However, to conclude that the strategies form a Nash

37 Some populations did achieve C's played in all 10 moves. However, in those cases where both (C,C) and (D,D) were observed in a population which converged, the (D,D) outcomes appeared in the first moves of the game and the (C,C) moves occurred in the last moves of the 10-move game. Strategies of this form are typically equilibria in games played by finite automata with strategic complexity considerations (see Binmore and Samuelson, 1992). In these games, there is a (lexicographic) preference for simpler strategies.
All states of the automata must be used in equilibrium to ensure agents cannot benefit by playing simpler strategies (like all-C).

equilibrium requires showing that, for each agent, there does not exist one strategy (of the 3.9 × 10^25) that increases the agent's fitness holding constant the strategies of the rest of the population.38 Even if the strategies form a Nash equilibrium, they need not be ESS. Figure 3 presents the population average fitness for the first generation where the population converged. Figure 7 presents the change between the population average fitness at the time convergence occurred and the average fitness in generation 2000. Of the 93 cases of convergence in Figure 3, the population average fitness changes in 31 of the cases when the simulation is run longer.39 Thus at least these 31 cases were not ESS. The average change in fitness is 0.003. Note, however, that R affects the stability. When the number of agents replaced in a generation is high, there are more cases of instability. This is consistent with the heteroscedasticity of average fitness increasing in R that is discussed above.

Figure 8 presents one example where the population converged but did not remain stable. In this run, 67 agents are replaced each generation. At generation 105 the population converged to an average fitness of 1.3 (reported in Figure 3 for R = 67). Between generations 415 and 426 the population diverges and then re-converges to an average fitness of 1.4. Figure 8 shows that this occurred several times. The population converges (the horizontal lines) then diverges before settling at an average payoff of 1.0 after generation

38 Finding such a strategy in a fixed environment is similar to the Axelrod (1987) environment.
The genetic algorithm as an optimization technique is one method of determining if such a strategy exists.

39 Of the three cases presented in Figure 5, all are stable and average fitness does not change when the simulations are run to generation 2000.

1500 (which may or may not be stable). The change between generations 860 and 890 is typical (of this and other runs) and demonstrates the effect of local interaction and the evolutionary mechanism. The average fitnesses for these generations are reported in the insert in Figure 8. Figure 9 presents 12 frames from the strategy animation during this time. At generation 864 (Figure 9b) the population exhibits all-D behaviour. This behaviour had persisted since generation 779. At generation 865 the evolution mechanism created a co-operative niche of strategies that does slightly better than the rest of the population (bottom centre of Figure 9c). The agent at the centre of this group is able to induce some co-operative moves from all four of his neighbours. In generation 867, this pattern is repeated in several locations. Note that the few co-operative niches have above-average fitness. They will not be replaced (only the bottom 67 are replaced) and have a better chance of being selected by the genetic algorithm to replace other strategies. In generation 870, these niches have begun to overlap. As the groups get larger, more co-operative behaviour is induced and more co-operative strategies are created (generations 871 and 872). This process continues until generation 886 (Figure 9n) when agents are co-operating on 5 of the 10 moves in each game. At this time, the population has again converged.40

40 This instability of the converged population at generation 864 (fitness of 1.0) is similar to that discussed in Binmore and Samuelson (1992).
They discuss why an all-D equilibrium in the infinitely repeated game is susceptible to invasion by a small group of strategies. It is important in their discussion that the invading strategies both appear in a group and can recognize one another. In this model the invading niche in generation 865 interacts only locally. Local interaction plays the role of recognizing fellow members of the invading group.

(B) Finitely Repeated Prisoner's Dilemma Game

The finitely repeated game is distinct from the infinitely repeated game discussed above in two respects. First, the Nash equilibrium where all agents play D is unique. Second, the memory of the agents' strategies is not bounded. In the one-shot game there are two possible strategies, {C, D}, while in the twice repeated game there are 32 possible strategies (listed in Figure 13).41 Not only are all these strategies possible, they are all (equally) represented in the initial population. The evolution mechanism converges on the Nash equilibrium in the one-shot game. However, in the twice repeated game the mechanism does not always find the Nash equilibrium but displays interesting dynamic properties.

(i) The one-shot game

In the one-shot game the strategy strings are single bits. In this case, the genetic algorithm reduces to an evolutionary mechanism of imitation. Regardless of the number of agents replaced per generation, the system converges on the equilibrium with all agents playing D. However, a co-operative niche can develop and temporarily survive. Figure 10 presents the animation for a typical one-shot game simulation where one agent is replaced each generation (R = 1). The initial population (Figure 10b) shows the random distribution of agents playing C or D.
The success of the D strategy causes this strategy to spread (Figure 10c). However, a niche of agents playing C as in Figure 10(d) can survive for several generations.

41 Note that some of the strategies in Figure 13 are observationally equivalent. There are only 8 distinct pure strategies in the twice repeated game (see Nachbar, 1992). However, requiring strategies to specify moves in "impossible" histories is important in the presence of crossover and mutation. For example, strategies 0, 2, 8 and 10 all play C on both the first and second moves. Although these strategies produce identical behaviour, they are not equally affected by the genetic algorithm. In the event that the initial move (bit) was changed (mutated) to a D, these four strategies are no longer equivalent.

The agents on the outside boundary of the niche have fitness of 1.5 (= [1+1+1+3] ÷ 4). The agents in the niche have fitness 1.0 (= [2+2+0+0] ÷ 4). All other agents play D and have a fitness of 1.0 (= [1+1+1+1] ÷ 4). Since the agents in the niche have the same fitness as most other agents, it can take several generations before an agent in the niche is selected to receive a new strategy.42 Eventually one of the niche agents receives a new strategy and plays D. This occurs in generation 52 (Figure 10e). The niche can no longer survive since two members now have fitness of 0.5 (= [2+0+0+0] ÷ 4), which is the lowest in the population. The remaining niche members receive new strategies and play D. By generation 55, the entire population reaches the Nash equilibrium and plays D.

The example in Figure 10 highlights the sensitivity of the simulations to parametric assumptions. For example, consider the effect of modifying the payoff matrix in Figure 2.1(a) slightly. Suppose the payoff to the (C,C) outcome is changed to 2 + 2ε (where ε is a small positive number).
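With the original (unmodified) payoffs, the niche arithmetic above can be reproduced on a small torus. The grid size, the niche placement and the helper names are illustrative choices of mine; the payoffs and the four-neighbour fitness rule follow the text.

```python
# One-shot fitness on the torus: average payoff against the four
# immediate (von Neumann) neighbours.  Payoffs are the row player's
# values from Figure 2.1(a); the 6x6 grid and 2x2 niche are illustrative.
PAYOFF = {('C', 'C'): 2, ('C', 'D'): 0, ('D', 'C'): 3, ('D', 'D'): 1}

def fitness(grid, x, y):
    n = len(grid)
    me = grid[x][y]
    neighbours = [grid[(x - 1) % n][y], grid[(x + 1) % n][y],
                  grid[x][(y - 1) % n], grid[x][(y + 1) % n]]
    return sum(PAYOFF[me, other] for other in neighbours) / 4

n = 6
grid = [['D'] * n for _ in range(n)]
for x, y in [(2, 2), (2, 3), (3, 2), (3, 3)]:   # 2x2 co-operative niche
    grid[x][y] = 'C'

# Niche members earn (2+2+0+0)/4 = 1.0, the D agents bordering the niche
# earn (1+1+1+3)/4 = 1.5, and D agents away from the niche earn 1.0.
```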
The co-operative niche in Figure 10(d) will now survive indefinitely, since the fitness of the members is 1 + ε, strictly higher than that of the many agents playing D. This sensitivity is the result of choosing the R agents that receive new strategies deterministically. The sensitivity is of less concern in longer games, where a greater variety of fitnesses is possible and ties occur less frequently. In the twice repeated game results that follow, the stability of populations to strategy perturbations is explicitly considered to address this concern.

42 Recall that the R least fit agents are replaced, with fitness ties broken randomly.

(ii) The twice repeated game

In the twice repeated game, whether or not the evolution mechanism converges on the Nash equilibrium depends on the number of agents replaced per generation. When more than 50 agents are replaced, the system does converge to the all-D Nash equilibrium. On average, convergence occurs before generation 30. However, when fewer than 50 agents are replaced, the system usually fails to converge. Figure 11 presents a cross-section of population average fitness in generation 5000 for simulations with the same initial population but different values of R. To understand the behaviour of those simulations which fail to converge, Figure 12 presents the minimum and maximum population average fitness that occur in generations 5000 to 6000 as well as the average for that period.43 Identified on the graph are four regions of evolutionary behaviour: Frozen, Critical, Stationary and Converged.44 The defining characteristics of the four regions are associated with the time-series pattern of population average fitnesses and the related animations that occur after generation 5000 (i.e. long-run behaviour).
In particular, the sensitivity to initial conditions and the stability of the simulations to perturbations are important.45

43 The minimum and maximum fitness values reported are for population average fitnesses, not individual agent fitnesses.

44 The boundaries between the regions are somewhat arbitrary. Simulations which fall near the border often display properties of both regions. However, the behaviour of simulations on the interior of the regions is distinctive.

45 It is important to consider the stability of the populations to small perturbations (i.e. mutating bits of strategies) since the mutation operator of the genetic algorithm functions only when creating strategies for agents selected for replacement. The perturbations considered will identify situations where the properties of the population hinge on the deterministic replacement without subjecting the whole population to a mutation operator. This is the case in the one-shot game with the payoffs modified by an ε.

In the Converged Region (R ≥ 50) population average fitnesses are 1.0 for all generations past 5000 (usually for all generations past 50) and reflect the all-D behaviour of the Nash equilibrium that is reached. The population average fitness of 1.0 is not sensitive to the initial population chosen for the simulations. The populations of agents in generation 5000 are very robust to perturbations. Even when a clique of co-operating agents is introduced (manually), the populations quickly return to the all-D behaviour as the co-operative niche members are replaced. The populations usually consist of agents using strategies 26, 27, 30 and 31 listed in Figure 13. What is interesting to note about these populations is that if an agent mutates to play C on the initial move, it will not elicit a C from the opponent on the second move. The mutant strategy will not survive. A population of agents playing strategy 28 or 29 results in all-D behaviour.
However, if a single agent mutates to play C on the initial move, a C will be played by the opponent on the second move. The mutant strategy will achieve above-average fitness and survive. In this case, the convergence would not be stable.

The Frozen Region occurs when the number of agents replaced is low (R ≤ 9). After the first few generations the system freezes at a point where the same agents are replaced each generation. For example, the eight dark agents in Figure 14, where R = 8, are surrounded by agents which play D. In particular, two of the neighbours of each of these agents have the all D strategy (strategy 31 in Figure 13). No matter which strategy the dark agents choose, the best they can do is to play D and achieve a fitness of 1.0. However, even with this strategy, these agents rank at the bottom of the population (other agents achieve some co-operation) and are selected for replacement again. Figure 15 shows the population average fitness for generations 5000 to 6000 for this simulation. Not surprisingly, there is very little variation in the average fitness. Figure 16 shows the distribution of strategies in the population. Although the distribution has changed from the initial distribution, the strategies in generation 5000 are more dispersed than those for simulations in the other regions (see Figures 18 and 20). The results in the Frozen Region are not sensitive to initial populations or to perturbations of the population at generation 5000. Perturbations and changes to the initial population change which agents are replaced in perpetuity. However, despite initial changes and perturbations, the system returns to a frozen state.

In the Stationary Region (21 ≤ R ≤ 49), the population average fitness is distributed about a stationary mean as in Figure 17 (where R = 34). In this region, all agents receive a new strategy every two or three generations.
There are no neighbourhoods where agents retain their strategy for long periods of time. In this system innovative strategies travel quickly. Both exploitive strategies (like all D) and groups of strategies that attain some co-operation spread quickly and prevent the system from substantially increasing or decreasing the population average fitness. While the population does not converge, it also does not display the non-stationary pattern of average fitnesses that occurs in the Critical Region (see Figure 19). This is despite the similarity of the populations of strategies that exist in generation 5000: Figure 18 shows the distribution of strategies in the Stationary Region and Figure 20 shows the distribution for the Critical Region. Finally, populations in the Stationary Region are robust to changes in the initial population and to perturbations of the population in generation 5000.

The last region of behaviour is the Critical Region (10 ≤ R ≤ 20).[46] This region is defined by a non-stationary pattern of population average fitness and a high degree of sensitivity to both initial population conditions and perturbations. These characteristics are usually associated with chaotic systems. However, the spread of new strategies through the population is slower and, in some sense, more orderly in the Critical Region. Some neighbourhoods of the population remain frozen for many generations before new strategies are adopted. These properties are discussed and demonstrated below. It is interesting to note that population average fitness is highest in the Critical Region. This result is consistent with the Kauffman and Johnsen (1991) result and is discussed further in Section 2.IV.

Figure 19 displays a typical pattern of population average fitnesses in the Critical Region. In this simulation there are 14 agents replaced each generation.
Although the distribution of strategies in the population at generation 5000 (Figure 20) is very similar to that which occurs in the Stationary Region (Figure 18), the evolutionary behaviour is quite different.

[46] A technical description of critical systems is given in Bak, Tang and Wiesenfeld (1988). Their work is concerned with systems which organize themselves to a critical state. For an overview of Self-Organized Criticality (SOC) see Bak and Chen (1991). In addition, Bak, Chen and Creutz (1989) discuss SOC in the "Game of Life", Bak, Chen, Scheinkman and Woodford (1992) discuss SOC in an economic model with production, and Kauffman and Johnsen (1991) discuss SOC in co-evolution models. The defining characteristic of critical behaviour is Power Law distributions. A Power Law distribution is one in which the natural logarithm of the size of some event (say an earthquake, or a change in population average fitness) is proportional to the logarithm of the frequency of such events. All sizes of events are possible: small events are observed frequently and large events are observed rarely.

The pattern of population average fitnesses is non-stationary, with several large changes in the average fitness. Figures 21 and 22 present animations of this simulation for generations between 6877 and 7777. This period covers two of the large changes in fitness. In this period, average fitness changes from 1.5 to 1.25 and then back to 1.5. The animations illuminate why behaviour in this region is critical.

Figure 21(b) shows the population in generation 6877. Most of the agents are playing a strategy which plays C on the first move and D on the second move (the C-then-D strategies are 13 and 15 in Figure 13), scoring a fitness of 1.5. In the lower right hand corner of Figure 21(b) there is an agent playing an all D strategy (strategy 31). This agent is more successful than other agents. However, the success is at the expense of his neighbours.
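The Power Law diagnostic described in footnote 46 amounts to fitting a straight line through log event frequency against log event size. A minimal sketch of that check; the event sizes and counts below are synthetic, constructed to follow an exact power law, and are not results from the thesis:

```python
import math

def loglog_slope(sizes, counts):
    """Least-squares slope of log(count) against log(size).

    For a Power Law distribution the points fall on a straight line,
    so the fitted slope summarizes the size-frequency relation.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Synthetic power law: frequency proportional to size^(-2),
# i.e. doubling the event size quarters its count.
sizes = [1, 2, 4, 8, 16]
counts = [256, 64, 16, 4, 1]
slope = loglog_slope(sizes, counts)
```

For empirical data (such as the sizes of changes in population average fitness) the points only approximately follow a line, and the fitted slope estimates the power-law exponent.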
Of the 14 agents chosen to receive a new strategy for the next generation, 4 are neighbours of the all D strategy. When creating new strategies, the all D strategy will be selected more often than any individual agent's strategy (i.e. all D has higher fitness). However, all D will not be selected more often than some version of the C-then-D strategy, since there is only one agent playing all D and 99 playing C-then-D. Thus the all D strategy will spread, but not that quickly, and will tend to spread locally. By generation 6899 (Figure 21c), there are a few pockets of the all D strategy. Since the concentration of all D in any one region is not high, they tend to do well. Since they are now more numerous and have above-average fitness, they will spread more quickly. By generation 6907 the tendency for all D to spread locally has produced a pocket of all D agents in the centre of Figure 21(d). This pocket grows since the C-then-D agents on the border of the region perform poorly and are replaced. With the increasing number of all D agents, agents that are replaced increasingly receive the all D strategy. By generation 6948 in Figure 21(f) the population average fitness has decreased to 1.25, with the majority of the agents playing all D. However, the spread of the all D strategy is limited, since agents in the interior of the all D pocket do poorly (i.e. a fitness of 1.0). The high local concentration of all D strategies results in low fitness. In Figure 21(g), the all D pocket has been invaded by the more co-operative C-then-D strategy.

Figure 22 continues this simulation beginning with generation 7607. Between generation 6960 (Figure 21g) and 7607 (Figure 22b) relatively little has occurred. The evolution mechanism tries strategies in the interior of an all D region. Usually these strategies do poorly (i.e. are exploited).
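The selection pressure just described, where the lone all D agent is copied more often than any single C-then-D agent yet far less often than the C-then-D strategy in total, is what fitness-proportional (roulette-wheel) selection produces. A sketch with illustrative fitness values (not figures from the simulations):

```python
import random

def select_parent(fitness, rng=random):
    """Pick one agent index with probability proportional to fitness."""
    total = sum(fitness)
    r = rng.uniform(0, total)
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f
        if acc >= r:
            return i
    return len(fitness) - 1  # guard against floating-point round-off

# One all D agent (illustrative fitness 2.0) among 99 C-then-D agents (1.5):
fitness = [2.0] + [1.5] * 99
total = sum(fitness)
p_all_d = fitness[0] / total  # per-agent probability of copying all D
p_one_c = 1.5 / total         # per-agent probability of copying one C-then-D agent
```

Here p_all_d exceeds p_one_c (ratio 4/3), so all D spreads, but the 99 C-then-D agents together are copied far more often, which is why the spread is slow and local.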
Eventually, a C-then-D niche of strategies is introduced into an all D region and survives, as occurs in generation 7695 (Figure 22c). Although the niche has removed some of the all D strategies, those that remain are in low concentration and now score a high fitness. In generation 7735 (Figure 22d), the success of the all D strategies, at the expense of their neighbours, has again created pockets of all D. However, this time the all D region does not expand. Instead, in the next generation, the pocket of all D in the centre of Figure 22(d) is replaced with a C-then-D niche of agents in Figure 22(e). Note that the all D agents can only be eliminated if they are replaced in groups, since a single all D strategy will exploit its neighbours and have a high fitness. This occurs in generation 7763 (Figure 22f), where all agents in the population play a C-then-D strategy and score a fitness of 1.5. However, in generation 7777 (Figure 22g), one agent experiments with an all D strategy and the process described in Figures 21 and 22 begins again.

In the Frozen Region the evolution leads to a situation where large portions of the population never change strategies. The Stationary Region is characterized by populations where all agents change strategies frequently. The Critical Region is poised between these two states. Parts of the population can be frozen for many generations before being replaced, while in other regions agents receive new strategies frequently. However, no region remains frozen indefinitely, and every region eventually experiences a frozen period. Figure 23 illustrates this point using the same example as above, where R = 14. In this figure the shade of an agent corresponds to the number of generations since that agent last received a new strategy. The generations presented correspond to those in Figures 21 and 22. In generation 6877 (Figure 23b), there is no large region which is frozen.
Generation 6969 (Figure 23c) shows that the majority of strategy replacement has occurred in the centre of the population. This region corresponds to the interior of the all D region in Figure 21(f). The C-then-D agents on the left and right sides of Figure 23(c) are frozen. By generation 7607 (Figure 23d) some of the frozen regions are now actively receiving new strategies and new frozen areas have developed. Again, strategy replacement occurs in the interior of the all D region. However, from Figure 22b, note that the all D region has moved. By generation 7777 (Figure 23e), where the all D regions have been replaced by the C-then-D region, there are no frozen regions. This pattern of freezing and un-freezing is typical of the Critical Region and helps to explain the non-stationary pattern in the population average fitnesses.

Unlike the three other regions, the Critical Region is very sensitive to initial populations and perturbations. Although the characteristic behaviour of the Critical Region is usually unaffected, small changes in the population can dramatically alter the specific path of the population average fitnesses. A small change in the initial population, or a perturbation to an existing population, has a large effect on average fitness 1,000 generations later. Figures 24 to 27 present a demonstration of the sensitivity to initial population. In these simulations, there are 16 agents replaced per generation. The two initial populations are shown in Figure 24. The two populations differ in only two bits of the 500 bits that define the strategies in the population.[47] After 2000 generations the distributions of strategies in the two populations differ substantially (Figure 25). The patterns of average fitnesses also show striking differences. For one population (Figure 26), the system is frozen.
The other population (Figure 27) displays the typical non-stationary pattern of population average fitnesses. Note that, unlike simulations in the Frozen Region, the first simulation (like the simulation discussed below) is not stable to small perturbations.

[47] In addition, the random number generator uses the same seed in both simulations.

In the Critical Region there appear to be two anomalies in Figure 12. When 12 agents are replaced, the system becomes frozen with the same 12 agents being replaced. However, this population is not stable to minor perturbations. With a small perturbation (for example, changing one bit of one string of the population), the system exhibits the non-stationary behaviour typical of the Critical Region. In particular, it does not become re-frozen as occurs for simulations in the Frozen Region.

The population that results when 11 agents are replaced warrants more attention. When R = 11 the population converged to all agents playing a C on the initial move and a D on the second move, for a fitness of 1.5. In this case all agents were using the C-then-D strategy CDDCD (strategy 13 of Figure 13). In this population, the existence of the never-used C in the fourth bit plays an important role in maintaining convergence. If the genetic algorithm creates a new strategy by mutating the initial move (first bit) of this strategy, the new strategy will not improve fitness. Playing D on the first move will exploit the opponent; however, on the second move the new strategy will play C, be exploited, and maintain a fitness of 1.5. Since the mutant strategy gains no advantage, it is selected with likelihood equal to any other agent's strategy. However, since there are many more agents playing the C-then-D strategy, the new strategy will likely die out.
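This mutation argument can be checked directly. Encoding strategies as the 5-bit numbers of Figure 13 (C = 0, D = 1; the first bit is the opening move, the remaining four bits answer the first-stage outcomes (D,D), (D,C), (C,D), (C,C), read as (opponent's move, own move)), CDDCD is strategy 13 and flipping its opening move yields strategy 29. The payoff matrix below is an assumption chosen to match the fitness values quoted in this chapter (mutual defection 1, mutual cooperation 2, defect-against-cooperate 3 and 0); the thesis defines its exact payoffs earlier:

```python
# Moves: C = 0, D = 1. A strategy is 5 bits: bit 4 is the opening move,
# bits 3..0 reply to the first-stage outcome (You, Me) = (D,D), (D,C), (C,D), (C,C).
PAYOFF = {(1, 1): 1, (0, 0): 2, (1, 0): 3, (0, 1): 0}  # (my move, opponent's) -> my payoff; assumed

def first_move(s):
    return (s >> 4) & 1

def reply(s, opp, me):
    """Second move of strategy s after first-stage outcome (opp, me)."""
    bit = {(1, 1): 3, (1, 0): 2, (0, 1): 1, (0, 0): 0}[(opp, me)]
    return (s >> bit) & 1

def play_twice(s_a, s_b):
    """Average per-move fitness of each strategy in the twice repeated game."""
    a1, b1 = first_move(s_a), first_move(s_b)
    a2, b2 = reply(s_a, b1, a1), reply(s_b, a1, b1)
    fit_a = (PAYOFF[(a1, b1)] + PAYOFF[(a2, b2)]) / 2
    fit_b = (PAYOFF[(b1, a1)] + PAYOFF[(b2, a2)]) / 2
    return fit_a, fit_b

CDDCD = 0b01101            # strategy 13: C, then D D C D
mutant = CDDCD ^ 0b10000   # flip the opening move: strategy 29
f_mut, f_inc = play_twice(mutant, CDDCD)
```

Both f_mut and f_inc come out at 1.5: the mutant exploits on the first move, is exploited on the second, and gains no advantage, exactly as argued above. As a further check, a pair of strategy-31 (all D) agents earns the fitness of 1.0 quoted for the Converged Region under the same assumed payoffs.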
Defining strategies, as in traditional game theory, to prescribe play for all histories can limit the impact of a mutation. Although this population is stable to small perturbations of a single bit, it is not stable to the introduction of a new strategy into the population, like strategy 31 (all D). With this perturbation, the system returns to the behaviour typical of the Critical Region.[48]

[48] A similar comment applies to R = 14, since this population converges at generation 17,500 (not shown on Figure 19) and is not stable to the introduction of an all D agent.

IV - Discussion

Recall that Kauffman and Johnsen (1991) consider a model of co-evolution that investigates the interaction among the size of the gene, the ruggedness of the landscape and the degree of landscape coupling. They find that when the landscape is more rugged, relative to the coupling, the system evolves to an equilibrium faster. They also find that, holding gene size and coupling parameter constant, there is an optimal ruggedness that maximises population average fitness. However, this optimal ruggedness is also the value that leaves the system poised in a critical state. The authors conclude that if there is a meta-dynamic which selects these parameters, then the system will organize itself into this critical state. This section uses the similarities between their model and the repeated prisoner's dilemma model discussed in this chapter to interpret the results of the previous section. As a result, our model helps to explain the results of Kauffman and Johnsen.

In the repeated prisoner's dilemma model the size of the gene relates to the complexity of the strategies allowed.
In the infinitely repeated game there are more than 10^25 possible strategies, in the one-shot game there are two possible strategies, and in the twice repeated game there are 32 possible strategies among which the evolutionary mechanism must select. The ruggedness of the landscape measures the number of local optima and the steepness of the peaks. In the infinite game there are many possible Nash equilibria. These translate into many local optima on an agent's fitness landscape (holding constant the strategies of his neighbours). In addition, the landscape can be steep, since a small change in strategy can change fitness significantly because of the effect one move can have on the remaining moves in a game.[49] Finally, the degree of landscape coupling is controlled by the number of agents replaced in each generation (i.e. R). When only a few agents are replaced, the landscape is not coupled with distant agents, since your neighbours rarely adopt new strategies. When many agents are replaced each generation, there is a high probability that your neighbours will adopt new strategies, and thus a higher probability that strategies of distant agents may be adopted by your neighbours.

The first result in Kauffman and Johnsen is that the more rugged the fitness landscape, relative to the degree of coupling, the sooner the evolution mechanism will reach a Nash equilibrium. The result in this chapter is similar: Nash equilibria were reached in the infinitely repeated game more often than in the finitely repeated game. One explanation for the difference in the evolution of the two games lies in the fact that there are many Nash equilibria in the infinite game and only one in the finite game. The multiplicity of equilibria in the infinite game makes it likely that an adaptive move (i.e. receiving a new strategy) by one agent will improve the fitness of his neighbours.
After making such a move, a niche is created where the agent and his neighbours will not be replaced. In this way an equilibrium can spread through the population. This is what occurs in Figures 9(b) to 9(e).

[49] For example, changing the initial move from a C to a D could cause a neighbour to play D for the remaining game instead of C. Since in a long game this can be many moves, this will have a large effect on the fitness of the agent (relative to other agents).

In contrast, in the twice repeated game, an adaptive move by one agent is usually at the expense of his neighbours. Evolutionary changes are localised and have difficulty spreading through the population, and the Nash equilibrium cannot be reached. This is demonstrated in Figure 21. To relate this explanation to the Kauffman and Johnsen result, consider an agent's multi-peaked landscape that is buckled by an adaptive move by a neighbour. The multiplicity of peaks ensures that the agent is not far from a local optimum. In many cases, the agent could be made better off by the buckling. In contrast, if the landscape is single-peaked, buckling the landscape will likely leave the agent far from the peak and most likely worse off. Whether or not the buckling will translate into the agent receiving a new strategy depends not only on the landscape ruggedness but on the number of agents replaced per generation (i.e. the coupling parameter).

In the twice repeated prisoner's dilemma game, the same region of R which maximised population average fitness also demonstrated critical behaviour, with non-stationarity in the pattern of average fitnesses and sensitivity to perturbations and initial conditions. Kauffman and Johnsen find a similar result: for a given coupling parameter, there is an optimal ruggedness for the fitness landscape which both optimizes population fitness and produces critical behaviour.
In the twice repeated game, there is an optimal coupling parameter (i.e. R) for a fixed landscape ruggedness (i.e. the twice repeated game) that demonstrates the critical behaviour. In both their paper and this chapter, if there existed some meta-dynamic or evolution of the evolution mechanism (i.e. making R endogenous) that sought to maximise population average fitness, then the critical behaviour of the Critical Region discussed above would result. The phenomenon of critical behaviour as an endogenous property of a system is called Self-Organized Criticality (SOC). SOC was first discovered by Bak, Tang and Wiesenfeld (1988) in relation to avalanches on sandpiles.[50]

The advantage of the prisoner's dilemma evolutionary model is that the familiarity of the game can be exploited to develop an understanding of how the R which maximises average fitness also produces critical behaviour. When R is too low (the Frozen Region), the system gets stuck replacing the same agents over and over again. No critical behaviour is observed since most agents never change their strategy. The population average fitness is low since most of the agents' strategies are not highly evolved (see Figure 16).[51] When R is too high (the Stationary Region), the exploitive strategies, which are very effective when they exist only in small numbers, spread too quickly. With a high R all agents change strategies often, and there is no possibility that co-operative regions can become temporarily frozen. When R is very high (the Converged Region), the all D strategies spread so quickly that eventually all agents play the all D Nash equilibrium and the system attains the lowest possible population average fitness.
Only when R lies in the Critical Region is it high enough to prevent freezing but low enough to ensure that all D strategies tend to spread into neighbourhoods containing other all D strategies.

[50] Footnote 46 contains more references to SOC.

[51] One has to be careful here, since a completely random population will generally have a fitness of 1.5, which is high relative to the fitnesses in Figure 12. The simulations in the Frozen Region had enough evolution before freezing that some exploitive strategies spread.

By tending to limit the spread of all D strategies locally, niches of co-operative behaviour can exist for many generations. However, when too few all D strategies exist, all D strategies do very well and can invade and destroy co-operative niches. This dynamic not only prevents the system from ever attaining the Nash equilibrium, but drives the non-stationary pattern of population average fitnesses observed in the Critical Region.

V - Conclusion

Simulations are useful tools for developing an understanding of complex phenomena, and they can generate intuition to guide theoretical work. Unfortunately, simulation results are always inductive. The results of the simulations presented in this chapter are a representative sample of all the simulations I have considered. However, it is always possible that the next simulation, with some modification of one of the many parameters of the model, will demonstrate dramatically different behaviour. It is therefore important to develop theory to support the conclusions drawn in this study.

There are two theoretical papers in evolutionary game theory that are closely related to the model presented here. Nachbar (1992) studies evolution in a finitely repeated game. In addition to pairing agents randomly, Nachbar considers an evolutionary mechanism of replicator dynamics which differs from the one used here.
Replicator dynamics, which model the evolving distribution of strategies in a population, do not allow strategies to disappear and reappear in the population, as occurs with the genetic algorithm model in this chapter. Nachbar demonstrates that the evolution must converge on the all D Nash equilibrium. Chapter 3 indicates that introducing local interaction changes this conclusion, and the simulations certainly suggest that with both these changes evolution need not converge to the all D equilibrium. Binmore and Samuelson (1992) consider evolution in the infinitely repeated game using finite automata, where agents prefer less complex automata. Their description of an invasion by a group of strategies that can recognize each other is very similar to the invasion that takes place in Figure 9 (see footnote 40). The next chapter, using a stylized version of the environment considered in this chapter, picks up these themes in an analytic, rather than simulation, framework.

In summary, this chapter has presented a model that directly investigates the implications of co-evolution. The proportion of agent strategies replaced in a generation modifies the nature of co-evolution by determining whether strategies need to be well adapted to their neighbourhood or to the entire population. In the infinitely repeated prisoner's dilemma the population tends to evolve to one of the many Nash equilibria. Increasing the number of agents replaced per generation reduces the mean and increases the variance of equilibrium population average fitness. In the finitely repeated game, the population converges to the unique Nash equilibrium only when a large number of agents are replaced.
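The contrast with replicator dynamics can be made concrete: the standard discrete replicator update rescales each strategy's population share by its fitness relative to the population average, so a share of zero stays zero forever, whereas the genetic algorithm's mutation operator can reintroduce an extinct strategy. A minimal sketch, with illustrative shares and fitnesses:

```python
def replicator_step(shares, fitness):
    """One discrete replicator update: x_i' = x_i * f_i / f_bar,
    where f_bar is the population-average fitness."""
    f_bar = sum(x * f for x, f in zip(shares, fitness))
    return [x * f / f_bar for x, f in zip(shares, fitness)]

# Three strategies; the third is extinct and, under replicator
# dynamics alone, can never reappear no matter how fit it would be.
shares = [0.6, 0.4, 0.0]
fitness = [1.0, 2.0, 3.0]
new = replicator_step(shares, fitness)
```

The fitter surviving strategy gains share, the shares still sum to one, and the extinct strategy's share remains exactly zero, which is the property that drives Nachbar's convergence result and that the genetic algorithm relaxes.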
These results are related to the Kauffman and Johnsen (1991) result which relates the ruggedness of a landscape to the length of time an evolution requires to converge to a Nash equilibrium. Finally, in the finitely repeated game, the proportion of agents replaced that maximises population average fitness corresponds to critical behaviour. Again, this result is consistent with a Kauffman and Johnsen (1991) result. However, in our model the intuition of the prisoner's dilemma game can be used to understand how and why optimal population fitness is related to critical behaviour. The next chapter builds on the simulation results presented here by considering a simpler spatial model analytically.

[Figure 3 - Population Average Fitness in Final Generation. Scatter plot of population average fitness against the number of agents replaced per generation (length of game is 10), distinguishing converged from non-converged simulations. The final generation is calculated as either the generation where the population converged or generation 2000.]
[Figure 4 - Population Average Fitness in Final Generation (L = 10). Results for 20 different initial populations, R = 5, 10, 15, ..., 95, 100; final generation determined as in Figure 3. Top panel: average fitness in the generation where the population first converged (337 cases); bottom panel: average fitness where the population did not converge (63 cases). Fitted values from a linear regression using all 400 cases: average fitness = 1.94 - 0.005 x R.]

[Figure 5 - Population Average Fitness per Generation. Time-series plot for three different values of the number of agents replaced per generation (R = 8, 28, 67), same initial population, length of game 10.]
[Figure 6 - Location and Strategy Evolution of the Agents (infinitely repeated game, 8 agents replaced, L = 10); panels 6(a) to 6(g). Each agent's location (on the torus) is represented by a circle; the radius is proportional to fitness; shading represents the percentage of C's played by the agent in the tournament (a 10-move game against four neighbours, or 40 moves).]

[Figure 7 - Difference in Average Fitness Between Converged and Generation 2000, plotted against the number of agents replaced per generation. There are 31 cases where the convergence in Figure 3 was not stable.]

[Figure 8 - Average Fitness per Generation, R = 67. Note that the population converges and then diverges several times.]
[Figure 8 inset: a magnification of generations 860 to 890.]

[Figure 9 - Location and Strategy Evolution of the Agents (infinitely repeated game, 67 agents replaced, L = 10); panels 9(a) to 9(g).]
[Figure 9 (continued) - infinitely repeated game, 67 agents replaced, L = 10; panels 9(h) to 9(n), generations 873 to 886. Each agent's location (on the torus) is represented by a circle; the radius is proportional to fitness; shading represents the percentage of C's played by the agent in the tournament (a 10-move game against four neighbours, or 40 moves).]

[Figure 11 - Population Average Fitness in Generation 5000 (length of game = 2), plotted against the number of agents replaced per generation. The population converged for R = 11, 44, 47, 50, 51, ..., 100.]
Population Average Fitness Statistics for Generations 5000 to 6000 (L=2)
Frozen, Critical, Stationary and Converged regions are discussed in the text.

    Strategy   First   Second Move Conditional on Outcome
    Number     Move    of First Stage (You, Me)
                       (D,D)  (D,C)  (C,D)  (C,C)
       0         C       C      C      C      C
       1         C       C      C      C      D
       2         C       C      C      D      C
       3         C       C      C      D      D
       4         C       C      D      C      C
       5         C       C      D      C      D
       6         C       C      D      D      C
       7         C       C      D      D      D
       8         C       D      C      C      C
       9         C       D      C      C      D
      10         C       D      C      D      C
      11         C       D      C      D      D
      12         C       D      D      C      C
      13         C       D      D      C      D
      14         C       D      D      D      C
      15         C       D      D      D      D
      16         D       C      C      C      C
      17         D       C      C      C      D
      18         D       C      C      D      C
      19         D       C      C      D      D
      20         D       C      D      C      C
      21         D       C      D      C      D
      22         D       C      D      D      C
      23         D       C      D      D      D
      24         D       D      C      C      C
      25         D       D      C      C      D
      26         D       D      C      D      C
      27         D       D      C      D      D
      28         D       D      D      C      C
      29         D       D      D      C      D
      30         D       D      D      D      C
      31         D       D      D      D      D

Figure 13 - All Possible Strategies for the Twice Repeated Game
A string of 5 bits is used to represent these strategies. Strategy index numbers used in histograms which follow.

[Figure: grid of agent locations.]

Figure 14 - Location of Agents in a Frozen Evolution
The dark agents are the 8 least fit agents and receive new strategies each generation. The other agents never change strategy.

[Chart: tournament length 2, population average fitness by generation, 5000 to 6000, 8 agents replaced per generation.]

Figure 15 - Population Average Fitness per Generation (R=8)
Generation 5000 to 6000. An example of an evolution which is Frozen.

[Chart: Twice Repeated Prisoner's Dilemma, distribution of strategies after 5000 generations, by strategy index.]

Figure 16 - Strategies in Population in Generation 5000
White bars indicate initial population.
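The 5-bit encoding behind Figure 13 is mechanical, so it can be reproduced in a short sketch (Python is assumed here; the thesis itself contains no code). The convention below, C as 0 and D as 1 with the most significant bit as the first move, matches the enumeration in the table.

```python
# Decode a strategy index (0-31) from Figure 13 into its five moves.
# Assumed bit convention: C = 0, D = 1, most significant bit = first move.
def decode_strategy(index):
    bits = [(index >> shift) & 1 for shift in range(4, -1, -1)]
    moves = ['C' if b == 0 else 'D' for b in bits]
    first_move = moves[0]
    # Second move conditional on the first-stage outcome (You, Me).
    outcomes = [('D', 'D'), ('D', 'C'), ('C', 'D'), ('C', 'C')]
    return first_move, dict(zip(outcomes, moves[1:]))
```

For example, strategy 0 cooperates unconditionally, strategy 31 defects unconditionally, and strategy 8 opens with C but defects after a (D,D) first stage, as in the table.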
See Figure 13 for Strategy Index.

[Chart: tournament length 2, population average fitness by generation, 5000 to 6000, 34 agents replaced per generation.]

Figure 17 - Population Average Fitness per Generation (R=34)
Generation 5000 to 6000. An example of an evolution which is Stationary.

[Chart: Twice Repeated Prisoner's Dilemma, distribution of strategies after 5000 generations, by strategy index.]

Figure 18 - Strategies in Population in Generation 5000
White bars indicate initial population. See Figure 13 for Strategy Index.

[Chart: tournament length 2, population average fitness by generation, 5000 to 6000, 14 agents replaced per generation.]

Figure 19 - Population Average Fitness per Generation (R=14)
Generation 5000 to 6000. An example of an evolution which is Critical.

[Chart: Twice Repeated Prisoner's Dilemma, distribution of strategies after 5000 generations, by strategy index.]

Figure 20 - Strategies in Population in Generation 5000
White bars indicate initial population. See Figure 13 for Strategy Index.

[Figure 21, panels (a) to (g): agent locations on the torus in generations 6877, 6899, 6907, 6925, 6948 and 6960 (R=14, L=2). Legend: C played by agents as a percentage of total, in bands from 0 to 100.]

Figure 21 - Location and Strategy Evolution of the Agents (Twice Repeated Game - R=14)
Example of behaviour in Critical Region. Average population fitness moves from 1.49 in (b) to 1.25 in (f) and ends at 1.32 in (g).

[Figure 22, panels (a) to (i): agent locations on the torus in generations 7607, 7695, 7735, 7736, 7763 and 7777 (R=14, L=2). Legend: C played by agents as a percentage of total, in bands from 0 to 100.]

Figure 22 - Location and Strategy Evolution of the Agents (Twice Repeated Game - R=14)
Example of behaviour in Critical Region. Average population average fitness moves from a low of 1.3 to a high of 1.5 across the panels.

[Figure 23, panels (a) to (e): agent locations shaded by the number of generations since the agent received a new strategy (bands of 10 generations, up to more than 80), for generations corresponding to Figures 21 and 22 (R=14, L=2).]

Figure 23 - Strategy Replacement
Shade of agent indicates the number of generations since the agent received a new strategy. This simulation is for 14 agents replaced in the twice repeated game. Generations correspond to those in Figures 21 and 22.

[Chart: Twice Repeated Prisoner's Dilemma, initial populations of strategies for populations 1 and 2, by strategy index.]

Figure 24 - Two Initial Populations
Populations differ only in the first bit of agents 4 and 6.

[Chart: Twice Repeated Prisoner's Dilemma, distribution of strategies after 2000 generations, 16 agents replaced per generation, for populations 1 and 2.]

Figure 25 - Strategies in Population After Generation 2000 and R=16
From two similar populations, R=16 in the Critical Region produces very different populations.

[Chart: population average fitness by generation, initial population 1, 16 agents replaced per generation.]

Figure 26 - Population Average Fitness with Initial Population 1 and R=16
2000 generations are reported. R=16 is in the Critical Region.

[Chart: population average fitness by generation, initial population 2, 16 agents replaced per generation.]

Figure 27 - Population Average Fitness with Initial Population 2 and R=16
4000 generations are reported.
Simulation was run for 15,000 generations, producing a plot similar to the one presented here.

[Charts: Comparison of Two Initial Populations, population average fitness by generation, 16 agents replaced per generation, for initial populations 1 and 2.]

Chapter 3

Simple Spatial Evolution

The previous chapter models the evolution of play for a repeated prisoner's dilemma game where agents' interaction is restricted to their immediate neighbours on a 10x10 torus. In that model, simply describing the spatial environment is not sufficient to characterize the dynamic behaviour of the model. The number of agents replaced per generation determines the importance of the spatial environment. In particular, the number of agents replaced per generation affects whether a particular agent is likely to face opponents with new strategies. When agents are very likely to play new agents (or agents with new strategies - you may interpret however you like), a "good" strategy is one that is well adapted to the entire population. In contrast, when only a few agents are replaced in a generation, then good strategies are ones that are well adapted to the particular spatial environment.1,2,3 The 10x10 model in Chapter 2 demonstrates an important characteristic of the spatial prisoner's dilemma environment: an agent's strategy influences the probability that his neighbours will seek new strategies next generation and thus affects the dynamic path of evolution. In other words, in spatial models an agent's strategy not only determines how the agent plays the game but helps to determine with whom the game is played. This chapter investigates the effects of this spatially induced correlation.

The complexity of the 10x10 model prevents algebraic analysis.
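The neighbourhood structure just described, each agent facing its immediate neighbours on a torus, is easy to make concrete. The sketch below (Python; an assumption, since the thesis gives no code) computes the four nearest neighbours with wrap-around, the structure used in the 10x10 simulations:

```python
# Four nearest neighbours of the agent at (row, col) on an n x n torus.
# The modulus implements the wrap-around that makes the grid a torus.
def torus_neighbours(row, col, n=10):
    return [((row - 1) % n, col), ((row + 1) % n, col),
            (row, (col - 1) % n), (row, (col + 1) % n)]
```

Every agent therefore has exactly four opponents; for example, the corner agent (0, 0) plays (9, 0), (1, 0), (0, 9) and (0, 1).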
This chapter focuses on two much simpler models which capture the effects of the spatially induced correlation. The chapter begins by describing the population, or replicator dynamics, approach to modelling evolution. In this environment the introduction of correlated play is exogenous. Since there is no natural parameterization of the correlation that captures the spatial model, replicator dynamics is not a particularly useful approach. The second example is a Markovian model with four agents arranged in two pairs playing a one-shot prisoner's dilemma game. Besides offering insight into the correlation induced by spatial interaction, the chapter indicates the complexity and difficulty of considering the 10x10 torus environment analytically.

1 Note that learning in the previous chapter and here is "global" in that agents select or create strategies based on the strategies and fitnesses of the entire population. For more discussion of local evolution, see Kirchkamp (1994). Note Axelrod (1984) also introduces a model with local strategy evolution.
2 Note that when a genetic algorithm is used to simulate evolution, then a "good" strategy is one that is appropriate for the likely outcome of the evolutionary mechanism (such as a genetic algorithm) given the current population.
3 The use of the term "good" has the potential to be misleading. A maintained assumption of evolutionary models is that strategy selection is made by myopic agents. Agents have no intertemporal objective function. Even in games which are repeated, agents are myopic in that they have no concern for what occurs after the end of the game. See Figure 2.2 in Chapter 2 for a schematic representation.
This offers additional justification for the simulation methodology explored in Chapter 2.

I - Replicator Dynamics

Replicator dynamics, introduced by Taylor and Jonker (1978), are used to analyze a noiseless evolution of random pairwise interactions in large populations. There have since been many examples using this approach.4 In particular, Nachbar (1990, 1992) uses replicator dynamics to analyze finitely repeated prisoner's dilemmas. This section introduces the main assumptions of replicator dynamics and considers an example of a one-shot prisoner's dilemma. Most replicator dynamics analysis assumes that agents are paired randomly. This assumption does not suit a spatial environment. Skyrms (1994) introduced non-random pairing or correlated play. Unfortunately, as is discussed below, correlation models place little restriction on behaviour and are thus not useful.

4 See van Damme (1987), Chapter 9 for an overview and additional references.

The basic environment for replicator dynamics considers a sequence (indexed by t >= 0) of populations of agents playing a symmetric two-person game. Let A be the payoff (or fitness) matrix representing the symmetric two-person game with feasible strategies denoted as a ∈ {a_1, ..., a_M}. The entry A_ij represents the resulting utility of playing a_i against an opponent using a_j, and the entries are assumed to be strictly positive. Let p(t) be an M-dimensional vector whose elements, p_i(t), denote the proportion of the population using strategy a_i at time t. Finally, let π(a_i,t) be an M-vector whose entries, π_j(a_i,t), represent the probabilities that an agent playing a_i at t will meet an agent playing strategy a_j.

The expected utility (or fitness) of an agent playing strategy a_i is calculated as follows:

    u(a_i,t) = e_i' A π(a_i,t)    (3.1)

where e_i is an M-vector with a one in the i-th position and zeros elsewhere. Note that for simplicity, mixed strategies are not considered.
Since populations are considered to be large, any need to consider mixed strategies is obtained with a non-homogeneous population. The expected population average utility, U(t), of the population is the proportion-weighted sum of utilities:

    U(t) = Σ_i p_i(t) u(a_i,t)    (3.2)

The goal of using replicator dynamics is to describe the evolution of the proportion of each strategy in the population. The main assumption of replicator dynamics is that the growth rate of a strategy in the population is proportional to that strategy's current utility (fitness) relative to the current population average. Since the population is assumed to be large, the realized average utility of the (many) agents playing a_i is close to the expected utility (see Boylan, 1992). These assumptions combined imply that the growth or evolution of a strategy in the population occurs according to the following deterministic equation:

    p_i(t+1) = p_i(t) u(a_i,t) / U(t)    (3.3)

Note that since the growth of a strategy in the population is assumed to be proportional to the proportion currently playing that strategy, attention is restricted to initial populations that put some positive weight on all possible strategies. This implies that all p(t) will remain in the interior of the (M-1) simplex. Rearranging slightly and letting the time between successive generations become small yields the following system of differential equations:

    dp_i(t)/dt = p_i(t) (u(a_i,t) − U(t)) / U(t)    (3.4)

Since U(t) > 0 (by normalizing A), the trajectories of this equation are the same as those of the simpler equation

    dp_i(t)/dt = p_i(t) (u(a_i,t) − U(t))    (3.5)

The properties of this equation that are of interest are the dynamic equilibria or fixed points (ie. dp_i/dt = 0 for all i) and the stability of these fixed points. Dynamic equilibria, p*, are
Dynamic equilibria, p, are- 88 -Learning, EvoiuUon and SeOrganization Simple Spanal Evolulion(asymptotically) stable if there exists a neighbourhood about p such that any path of (3.5)that begins in the neighbourhood converges to p.5 Note that the evolution dynamic isdeterministic. Noise or mutations are less important since every possible strategy isrepresented in the initial and subsequent populations.6A key simplifying assumption in replicator dynamics (and most all evolutionary game theory)is that the pairwise interactions occur by drawing agents randomly from the population(random pairing). This implies that T(a1,t)=p(t) for all a1. In particular, note that theprobability of an agent interacting with any strategy is independent of the agent’s currentstrategy. The assumption simplifies the dynamic in (3.5) to a system of cubic differentialequations which aredp1(t)= p.(t) (e1’A p(t) — p(t)’ A p(t)) (3.6)To highlight the effect of the random pairing assumption, consider the one-shot prisoner’sdilemma game with a1 = C and a2=D and payoff matrix as followsThere are weaker definitions of stability where points which begin near p remain near to p*• However, these distinctions arenot important for the analysis in this chapter.6 This is in contrast to the genetic algorithm where typically the population is small relative to the possible number of strategiesand noise (such as crossover and mutation) serve to ensure that all strategies are evenwally tried in the population.- 89 -Learning, Evolution and S4f-Organization Simple Spatial EvolutionA=CO (3.7)edwith e > c> d> 0. Since there are only two possible strategies, the population can bedescribed by the proportion of agents playing C in the population, denoted p, (timearguments are suppressed). 
In this prisoner's dilemma environment with random interactions, the replicator dynamic in (3.6) is given by

    dp_c/dt = p_c (1 − p_c) (u(C) − u(D))    (3.8)
            = p_c (1 − p_c) (p_c (c + d − e) − d) < 0

Note that the proportion of C-agents is monotonically decreasing over time (for p_c < 1). Therefore, the only stable dynamic equilibrium is p_c = 0. This is not surprising since D is a dominant strategy in the one-shot prisoner's dilemma game. Nachbar (1992) investigates finitely repeated prisoner's dilemma games using the replicator dynamic approach. While he finds that the replicator dynamic can show much more interesting behaviour (eg. non-monotonic) than the simple version presented here, the only stable dynamic equilibrium is one with all-D behaviour.7

7 The strategy space of the finitely repeated prisoner's dilemma game considered by Nachbar (1992) differs from the one used in the 10x10 simulation of the previous chapter. In his environment, only one of the feasible strategies will yield all-D

In contrast to random pairing, Skyrms (1994) considers correlated evolution where there is some exogenously specified correlation between the play of strategies. In other words, π(C,t) ≠ π(D,t). It is easy to see that the monotonic decrease in the proportion of C-agents in (3.8) need no longer be the case if π(C,t) is sufficiently larger than π(D,t). If these (exogenous) probabilities are held constant over time, the evolution can move monotonically towards the all-C population.8 This occurs despite the dominance of the D strategy.
Here the choice of strategy influences the interaction and not just the outcome from the interaction.9

In a more sophisticated model the π(a_i,t) may also evolve as some functions of the p(t). Such functions for the probability of pairing with an agent playing C can be written as π(p_c;a_i) (suppressing the time arguments). In this case, much more complex behaviour can be produced by the replicator dynamic.10 More importantly, with the "right" π's any p_c can be a stable dynamic equilibrium even if the π's are restricted to be non-trivial functions of p_c(t). This is stated in the following proposition.

behaviour. This is not an important distinction, however.
8 For p_c = 1 to be the stable dynamic equilibrium it is sufficient that π_c(C,t) exceed π_c(D,t) by a sufficient margin for all t.
9 Skyrms (1994) relates the difference between random pairing, where strategy choice does not affect the opponent, and correlated pairing to the difference between Savage (1956) decision theory and that proposed by Jeffrey (1965). In Jeffrey decision theory, in stark contrast to Savage, the actions of the decision maker are assumed to affect the probability measure over states and not just the utility of the state-contingent outcomes.
10 For example, each agent might receive a partner from the following procedure: select n agents from the population at random and then from this list choose an agent to play according to some strategy-dependent probabilities. Such a system has a stable dynamic equilibrium at p_c = 0 and never at p_c = 1. However, with the right parameters other (strictly mixed) stable dynamic equilibria exist.

Proposition 1 Given any p_c* ∈ (0,1), there exist smooth non-decreasing functions π(p_c;a_i) of p_c(t) with the property that π(0;a_i) = 0 and π(1;a_i) = 1, for a_i
∈ {C,D}, such that p_c* is a stable dynamic equilibrium.

Proof See Appendix A.

In order to make a model using replicator dynamics meaningful in an environment with correlated play, some additional structure must be placed on the probabilities of interaction (the π's). Unfortunately there is no obvious connection between the correlation that can arise from spatial interaction and the exogenous π's. The next section, therefore, addresses the induced correlation directly by treating spatial evolution in a Markov setting.

II - Spatial Markov Model

As was noted above, the correlated play that is induced in a spatial model is not easily captured by replicator dynamics. In this section, a Markovian spatial model is developed that tracks the evolution of the strategy played at each location. This allows direct consideration of the (endogenously developed) correlation in play. Unfortunately, the state space in these models can become very large. In the 10x10 torus considered previously, the state space for the one-shot game is on the order11 of 2^100 (about 10^30); in the twice repeated game the state space is of the order 2^500 (about 10^150); and in the infinitely repeated game (represented by strategies with a memory restricted to three moves) the state space is 2^8500 (about 10^2558). In order to understand what is occurring in these more complex models, a simple four-agent system is considered. The model is tractable (only six states) yet can demonstrate much of the behaviour of the more complex system.

The model is described by two elements: agent interaction and strategy evolution. Figure 3.1 describes the environment for agent interaction where four agents, arranged in two pairs, play one-shot prisoner's dilemma games.

[Figure 3.1: four agents arranged in two pairs, each choosing from {C,D}, with the symmetric payoff matrix for the row player (C row: c, 0; D row: e, d) shown at the right.]

Figure 3.1 - Agent Interaction
Four agents, arranged in two groups, interact playing the prisoner's dilemma.
The symmetric payoffs for the row player are shown at the right (e > c > d > 0).

11 This overstates the size of the state space since there are some states which are equivalent. For example, the state where all agents play D except for the agent in the first location is equivalent to one where the sole C-agent is in position 57. Even after considering such reductions, the state space remains large. Note that the size of the neighbourhood for play (nearest 4 agents) influences the state space simplifications which can be made.

Evolution is modelled in two stages. The first stage determines which agent searches for a new strategy. Recall that in the 10x10 model, this was determined as the bottom (lowest fitness) R agents. In this model only one of the four agents will be replaced in a generation. In particular, evolution will select the agent with the lowest fitness with probability (1−η). With probability η, evolution selects one of the four agents at random (ie. with uniform probability of 1/4). By varying the parameter η, the nature of the spatial landscape is altered (analogous to varying the number of replaced agents). Note that when η is very small, the chance that an agent's opponent will seek a new strategy is small as long as the opponent's fitness is not low. In contrast, if η is high, the probability that an agent's opponent seeks a new strategy is (almost) independent of either agent's current strategy.

The second stage in the evolution determines which strategy an agent chooses, given she is looking for a new strategy (ie. selected for replacement). With probability 1−μ, the agent selects a new strategy from the strategies existing in the population with probabilities according to each strategy's proportional fitness. With probability μ the agent experiments and selects one of the possible strategies randomly (ie. without reference to the current population).
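The two-stage evolution just described can be sketched directly (Python; the random-number plumbing is mine, but the stages follow the text: with probability η a uniformly random agent is replaced, otherwise the lowest-fitness agent; with probability μ the replacement is a uniform experiment, otherwise a fitness-proportional draw from the current population):

```python
import random

# One generation of the four-agent model: stage 1 picks who is replaced,
# stage 2 picks the replacement strategy.
def evolution_step(strategies, fitnesses, eta, mu, rng=random):
    # Stage 1: uniform choice with probability eta, else the least-fit agent.
    if rng.random() < eta:
        who = rng.randrange(len(strategies))
    else:
        who = min(range(len(strategies)), key=lambda i: fitnesses[i])
    # Stage 2: uniform experiment with probability mu, else a
    # fitness-proportional draw from the existing population.
    if rng.random() < mu:
        new_strategy = rng.choice(['C', 'D'])
    else:
        new_strategy = rng.choices(strategies, weights=fitnesses)[0]
    updated = list(strategies)
    updated[who] = new_strategy
    return updated
```

With η = μ = 0 the step is nearly deterministic: the least-fit agent is replaced by a fitness-proportional draw, which is the noiseless skeleton that the η and μ perturbations act on.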
In this environment, with probability 1/2 the agent selects C. Fitness-proportional selection is similar to the replicator dynamics. However, in this noisy environment μ serves to introduce strategies into the population that are currently not used. The μ is similar to the noise in genetic algorithm evolution from both crossover and mutation. The noise is needed in this model (and in the genetic algorithm) since, unlike in replicator dynamics, the population size is small relative to the feasible strategies.

Given this model of evolution, the system is Markovian. There are 16 (=2^4) possible states of this evolutionary system. However, since some of the permutations are indistinguishable (ie. the location of the agent playing D when others are playing C), the relevant state space considered here has only six states. The states indicate the number of C's being played in each of the two agent-pairs without regard to their location in the pair or identity of the pair. The state space considered is {1=2C2C, 2=2C1C, 3=2C0C, 4=1C1C, 5=1C0C, 6=0C0C}. The Markov transition matrix is denoted M with entries M_ij as the probability of transiting to state j given the current state is state i. (The thirty-six M_ij entries are included in Appendix A as they may be a bit difficult to read here.)

    M = [ M_ij ],  i,j = 1, ..., 6    (3.9)

Let p(t) be the 6-vector whose elements p_ω(t) give the probability that the population is in state ω at time t.
The Markov sequence {p(t)} is defined as follows (where p(0) is a specified initial condition):

    p(t+1)' = p(t)' M    (3.10)

In order to understand the behaviour of the evolutionary system, the limiting distribution is calculated. Note that so long as η and μ are both strictly positive, all states communicate. Therefore M is ergodic (or all six states form an ergodic set) and there is a unique invariant distribution p* such that p*' = p*'M with positive weight on each state (ie. p_ω* > 0 for all ω). Furthermore, the system converges to this invariant distribution from any starting point (ie. for all p(0)).12

The unique invariant distribution is a function of the levels of the noise, η and μ. Since the interpretation of these parameters implies that they are small, the unique invariant distribution will be considered as η and μ approach zero. This follows the methodology introduced by Kandori, Mailath and Rob (1993) and Young (1993) (KMRY) in their papers using evolution for equilibrium selection. This section also relies on the Bergin and Lipman (1994) analysis that demonstrates that the KMRY approaches are sensitive to how the noise is specified.

12 These are well known properties of Markov systems. For more details, see Stokey and Lucas (1989, Chapter 11).

From inspection of (3.9), note that if η = μ = 0 there are three ergodic sets which are absorbing. They are {1}, the all-C state; {6}, the all-D state; or {2,3}, which varies between 2C1C and 2C0C. Note that states {4,5} are transient. By considering various alternative specifications for the noise (η and μ non-zero but vanishingly small), each of these sets can be selected as the limiting unique invariant distribution. This is an application of Bergin and Lipman (1994, Theorem 1). The discussion which follows will characterize the noise properties which produce the various limiting behaviours.
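The invariant distribution can also be found numerically by iterating (3.10) until p(t) stops changing. The sketch below (Python with NumPy) uses an illustrative ergodic 3-state chain as a stand-in, since the thesis's actual 6-state matrix, with its η- and μ-dependent entries, is left to its Appendix A:

```python
import numpy as np

def invariant_distribution(M, tol=1e-12):
    # Iterate p(t+1)' = p(t)' M from an interior starting point, as in (3.10).
    p = np.full(M.shape[0], 1.0 / M.shape[0])
    while True:
        p_next = p @ M
        if np.abs(p_next - p).max() < tol:
            return p_next
        p = p_next

# An illustrative ergodic 3-state chain (each row sums to one).
M = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])

p_star = invariant_distribution(M)
# p_star is (0.25, 0.5, 0.25); it satisfies p_star' = p_star' M, and, as
# ergodicity guarantees, the same limit is reached from any starting point.
```

The convergence from any interior starting point mirrors the property quoted above from Stokey and Lucas (1989).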
Note that the Markov matrix in (3.9), for convenience, presentsthe case where noise levels are not state dependent. The limiting distribution when all noiseis the same is a special case of the analysis which follows. It turns out that a convenientpartition of the state space is to divide the states into those that have at least one co-operativepair, ë={1,2,3} and those that do not, 15={4,5,6}. The noise in these states will bedenoted m, D’ and I-D• Since we wish to consider the case where the noise levelsbecome vanishingly small (albeit at different rates), it is convenient to write the noise levelsas follows (where and constants Kq and K are strictly positive).D=exp()(3.11)12D = exp(-) = exp(—K fl- 98 -Learning, Evolution and SefOrgaaizo4on Simple Spatial EvolutionUsing this we can state the main proposition of the analysis.Proposition 2 Given the Markov matrix with non-zero-vanishingly-small noise, the uniqueinvariant distribution p* is a function of iç and K as follows:(o, 0, 0, 0, 0, 0, 1 ) K<3 and K,<3(3.12)urn p*(;KK)/ = (i, 0, 0, 0, 0, 0, o) K>3andK>K(o, x, 1-x, 0, 0, 0, o) K>3 and K,>Kwhere= c(2c+e) (3.13)c(2c+e) + e(c+d)Proof The details of the proof and the boundary cases (ie. K and/or K is exactly3) are left to an appendix. The strategy for the proof is as follows. First the invariantdistribution of M in (3.9), with the noise levels as specified in (3.11), is calculated. This-99-Learning, Evolution and SeOrganizo4on Simple Spo4oJ Evolutionis done by solving p’ =p’M for p(K,K). Taking the limit as gets large (which makesthe noise small) for the various cases yields the result. See Appendix A.Although the model is relatively simple, it does offer some insight into evolutionary systemslike the one considered in the previous chapter. The results indicate that the relativemagnitudes of the noise matter. When the noise is the same across regions, the systemevolves to the state where primarily all-D behaviour is observed. 
However, if noise in theO-states is lower (by an exponential factor), then the system can evolve to the i-states. Inthis case two types of behaviour are possible. The system can settle on the all-C state sincemutations to push the system away from 2C2C are vanishing most quickly. The secondpossibility is reminiscent of the behaviour which typified the critical region in the lOx 10simulation model.When is the noise parameter which is approaching zero most quickly (and enough fasterthan ND), the system evolves to oscillating between states 2COC and 2C1C. Note that in state2COC, the two agents playing D do strictly worse than the two agents playing C. Providedis small (more importantly vanishing much more quickly than ED), then one of the Dagents is typically the agent hunting for a new strategy. In the 2C1C, the agent playing Cin the community with the D-agent is the agent most likely replaced (and most likely with-100-Learning, Evolution and SeOrganization Simple Spatial Evolutiona D).13 It is this sort of behaviour that was typical in the lOx 10 simulation when thenumber of agents replaced per generation was moderate.One interpretation of the four-agent model is that it is an idealized form of the many agentpairs which exist on the lox 10 torus. In this case the “noise” comes from the fact that inthe larger system agents are interacting with more agents and fitness calculated from oneinteraction is only an approximation of fitness in the larger (four interactions) system.Secondly, the search for strategies is not limited to four agents but to the entire grid of 100agents (plus crossover and mutation). Thus one could think of i capturing the first of theseimperfections and capturing the second. Recall that the twice repeated game did notnecessarily evolve to the Nash equilibrium and the behaviour was classified as Frozen,Critical, Stationary and Converged according to the number of agents replaced pergeneration. 
If the model here does indeed capture the essence of the more complex simulation, then there is a relation between the number of agents replaced per generation and the noise parameters of this simple model. For example, the large number of replacements in the Converged region is akin to a large value for both η and μ (more precisely, similar values for the parameters) where all-D results. Both a high level of replacements and a high noise level (particularly η) reduce the level of correlation induced in play by decreasing the importance of fitness (and indirectly strategy choice) in determining which agents seek new strategies.13 In both cases strategies are matched more randomly and evolution leads to the all-D Nash equilibrium. Similarly, the Critical Region is akin to where η_C̄ < η_D̄. In terms of the larger 10x10 system, this translates into a higher likelihood that regions with agents playing D have several agents replaced with C strategies (ie. several realizations of non-fitness-based replacement occurring at the η_D̄ probability) rather than vice-versa. Indeed, in the 10x10 model, this is the case since agents in regions of C agents have higher fitness than those in the D regions.14 Thus the simple four-agent Markovian model presented here does capture the behaviour in the much more complicated 10x10 spatial model of Chapter 2.

13 Note the relative probabilities of the two states in this case depend on the actual payoffs in the prisoner's dilemma. They are given by x in equation (3.13). Note one can also calculate the mean first passage times in this situation from state 2 (2C1C) to 3 (2C0C) and vice versa. They are proportional to (2c+e)/e and (c+d)/c respectively.

III - Conclusions

The purpose of this chapter has been to offer some analytical approaches to the spatial models of evolution. In particular, I have surveyed the possible approach using replicator
The difficulty with this approach is that the correlation between agents' strategies (possibly) induced by a spatial model is not easily captured. The second approach is to model spatial evolution as a Markov chain. This analysis seems to capture the essence of the complex simulation. This is encouraging, since the huge state space makes a completely analytical examination of the 10×10 spatial evolution model practically impossible.

[14] For simplicity, I have called the regions C and D in the 10×10 system. In fact, since it is a twice-repeated game which is being described, the C region is best described by agents using a C-then-D strategy.

The next three chapters move away from strategic situations to consider an environment where co-adaptation occurs through prices in a large financial market.

Chapter 4

Stochastic Adaptive Learning in a Rational Expectations Model

The informational efficiency of financial markets has been extensively studied and often serves as an axiom for other research such as event studies.[1] Despite this large volume of research, the procedure by which large financial markets process information is not well understood. Without such an understanding, it is difficult to address questions about security market behaviour, regulation or design. Rational expectations models partially address the information processing mechanism. Models such as Grossman (1976) describe how agents can infer information from equilibrium prices. However, these models assume

[1] See Fama (1991) for a survey of market efficiency.
Event study methodology is discussed in Fama, Fisher, Jensen and Roll (1969) and Thompson (1985), for example.

agents are unboundedly rational and possess not only a complete knowledge of the economy's structure but also a detailed knowledge of other agents' beliefs and objectives. Little explanation is offered for how agents acquire sufficient information to make appropriate inferences from endogenous parameters. In complex financial markets, like the NYSE, these assumptions seem far from realistic.

Strong rationality and informational assumptions are often justified by appeals to repeated play and adaptation. Agents learning via trial-and-error experimentation and imitation do not require stringent assumptions about agent rationality or information. Agents are described by rules of thumb which are adopted, modified and abandoned based on past success. One might expect that in large financial markets, with a large number of well-motivated traders, deviations from rationality should be small, non-systematic and transitory. This chapter and the two which follow investigate the reasonableness of this argument by considering the conditions under which the outcome of adaptation is likely to coincide with a rational expectations equilibrium. This helps to determine when rational expectations are likely to be good approximations to the behaviour of financial markets like the NYSE. In addition, a better understanding of trial-and-error learning processes offers insight into how financial markets reach equilibrium and process information.

Specifically, the one-period risky asset model of Grossman and Stiglitz (1980) is augmented to facilitate modelling learning. The model developed in this chapter is used in the two subsequent chapters.
To explore the implications of adaptive learning, the single-period model is repeated. In the model, agents must learn the relevant exogenous relationship between the model's parameters as well as the parameters of the endogenous relationship which determines the equilibrium price. The model underscores the difference between learning an exogenous parametric relationship and learning an endogenous (equilibrium) relationship. Parametric relationships are unaffected by other agents. Equilibrium relationships depend on the behaviour of other agents, and therefore depend on the learning of the other agents.

The chapter characterizes a stochastic adaptive learning process that captures the intuition of experimentation and imitation. It is shown that a stochastic adaptive learning process which converges to the Grossman-Stiglitz (GS) rational expectations equilibrium always exists. In addition, the stability of the GS equilibrium for a fixed stochastic learning process is considered. It is shown that the key to stability is the size of agents' experiments relative to the sensitivity of the informativeness of the equilibrium price to changes in the proportion of the population which is informed.

The remainder of the chapter is organized as follows. The next section discusses the background research upon which this chapter draws. Section 4.II introduces the repeated one-period asset pricing model. Section 4.III discusses the effect of stochastic adaptive learning on the stability of rational expectations equilibria.
The intuition developed in this section is developed further in Chapter 5, which models learning using deterministic dynamics, and Chapter 6, which uses a genetic algorithm as an example of a stochastic adaptive learning process.

I - Background

Research on learning in rational expectations models is often classified into two categories. The first looks at learning schemes which are the natural extension of "rationality". Each agent specifies a likelihood function over the unknown parameters and updates her beliefs according to Bayes' Rule. The likelihood functions form a rational expectations equilibrium in that each agent's likelihood is correct given the likelihood functions of the others. While other forms of learning might appear ad hoc, this specification is at least as informationally and cognitively demanding as traditional rational expectations models. The second category of models focuses on least squares learning as a model of a boundedly rational agent. In these models, agents treat endogenous relationships as exogenous and minimize a forecast error.[2]

Many of the models of rational or least squares learning treat learning as a deterministic algorithm. Given the same observations (history), two agents will have identical beliefs and

[2] For example, Marcet and Sargent (1988, 1989) have introduced models where least squares learning does converge to the rational expectations solution. More generally, agents follow a myopic strategy which tries to estimate endogenous relationships as if they were exogenous. Evans and Honkapohja (1994) present general convergence results for this type of model. For background material on rational learning models see Blume, Bray and Easley (1982) or Blume and Easley (1993).
For learning in a model similar to the GS model used in this paper, Bray and Kreps (1987) investigate Bayesian learning and Bray (1982) considers least squares learning.

actions. In this thesis, learning is modelled as a stochastic adaptive process. Unlike models of learning where agents continually and unboundedly use Bayesian updating or minimize forecast error, this thesis treats learning as a process of trial and error. Good strategies are imitated. Bad strategies are abandoned. Periodically agents experiment with a novel strategy. An efficient stochastic learning algorithm is one which effectively trades off the exploitation of current knowledge against experimentation to acquire new knowledge.[3]

There is a natural relationship between stochastic adaptive learning and evolution. The economic notion that agents making systematic mistakes have an incentive to modify their behaviour parallels the natural selection of good genes from a diverse gene pool. Stochastic models of learning are closely related to recent models in evolutionary game theory which incorporate random perturbations or mutations (Kandori, Mailath and Rob (1993) and Young (1993)). In the rational expectations model introduced in this chapter, an important implication of stochastic learning is that agents with common observations need not behave identically. Thus, in addition to learning about the environment, agents must learn about other agents. The complexities introduced from learning about learning are related to co-evolution in biology. The preceding two chapters both focus on situations where an adaptive change by one agent alters the environment of the other agents. In a rational expectations

[3] For example, a monopolist facing an unknown demand curve must both experiment with different prices to "plot" the demand curve and choose a price that maximises profits.
Easley and Kiefer (1988) consider optimal (dynamic programming) models of this trade-off, while Arthur (1993) focuses on models which can be fitted to human (perhaps sub-optimal) learning processes.

model, an agent's optimal response to an endogenous parameter (asset price, for example) changes as other agents alter their behaviour.

Finally, this chapter is related to several papers that consider evolution and asset prices. Blume and Easley (1992) model the evolution of portfolio strategies. Strategies which are associated with higher wealth are selected more frequently. In this model rational (utility-maximising) strategies need not survive. This result is similar to that of De Long et al. (1990), where "irrational" noise traders actually maximise their evolutionary survivability by generating high expected returns. The evolution or imitation mechanisms (by assumption) select high-wealth strategies and not necessarily those that yield high utility. Cabrales and Hoshi (1992) study the evolutionary selection of differing types of "irrational" trading strategies (chartists, for example). In contrast to Blume and Easley or De Long et al., the evolutionary choice mechanism in Cabrales and Hoshi favours strategies that give higher expected utility. These examples highlight the fact that optimal strategies depend on what others are doing. However, unlike these articles, this chapter will focus on a single-period asset pricing model where dependence on other agents occurs because agents make inferences using endogenous relationships.

II - Model

In this section, the asset pricing model for the economy is introduced. Learning and its effect on asset demands are also discussed.
This model is also used to consider learning as a deterministic dynamic in Chapter 5 and to simulate learning with a genetic algorithm in Chapter 6.

(A) Structure

The model is developed from Grossman and Stiglitz (1980). The GS model is a one-period endowment economy where agents can purchase a costly common signal about the period-end dividend of a risky asset. Agents choosing not to buy the signal make inferences about the signal from the competitive risky asset equilibrium price. Since all informed agents receive the same signal, the equilibrium price offers no additional information to the informed agents about the dividend.

Since this chapter studies adaptive learning, the one-period model of GS will be repeated by successive generations of agents. While agents observe and can learn from previous generations, the economy they face is essentially a single-period one.[4] In this stationary environment, we will investigate whether stochastic adaptive learning (from previous generations) is likely to lead to the GS rational expectations equilibrium. A discussion of the complexity of considering learning in a dynamic model is deferred until the model and learning have been introduced. The following assumptions characterize the model.

[4] An alternative interpretation of the model, with N infinitely lived agents, is presented in Bray and Kreps (1987). To preserve the one-period nature of the model, agents trade at t and dividends are realized and completely consumed at t+½. Since wealth is non-storable across periods, agents are faced with a series of single-period problems.

Assumptions

1. At each date t = 0, 1, 2, ..., a generation of N one-period-lived agents receives an endowment and trades. Agents have identical constant absolute risk aversion (CARA) preferences, with a risk aversion parameter of a, over end-of-period wealth. No wealth is transferred between generations.

2.
Two securities, a risk-free and a risky asset, trade in a perfectly competitive and frictionless market.[5]

3. The payoff on the risky security for generation t is denoted d_t and is given by

   d_t = β0 + β1 y_t + ε_t        (4.1)

   where β0 and β1 are constants and y_t and ε_t are normally distributed as described below. The ε_t is not observable at t.

4. Agents can choose to observe the signal, y_t, at cost c. These agents are said to be informed. All informed agents receive the same signal. Each period, the set of N agents can be partitioned into I, the informed agents, and U, the uninformed agents. Define λ to be the proportion of the population that is informed.

[5] If N is small, this assumption is unrealistic. However, the model can be interpreted as having N finite classes of agents where each class has an infinite number of atomistic agents. Jackson (1991) considers a model similar to the Grossman-Stiglitz model except that agents do not act atomistically. Strategic trading can eliminate the need for noise traders in establishing equilibrium existence. The perfectly competitive case is obtained as a limit of his model.

5. The per capita supply of the risky asset at t is e_t shares, which is normally distributed as described below and is unobserved by the agents at t.[6]

6. The random variables y_t, ε_t and e_t are independently distributed across periods as follows:

   (y_t, ε_t, e_t)′ ~ Normal( (0, 0, ē)′, diag(σ_y², σ_ε², σ_e²) )        (4.2)

7. The risk-free asset is in zero net supply. Both the payoff and the price of the risk-free asset are normalized to one.[7] The equilibrium price of the risky asset for generation t is denoted P_t and is stated relative to the risk-free asset.

8. The model and parameters are common knowledge except for β0 and β1. The agents' knowledge of β0 and β1 is discussed below.
In particular, the agents may not know the value of these parameters.

[6] The assumption that the supply of the risky asset is stochastic is necessary for the existence of a rational expectations equilibrium with a costly signal. A noisy asset supply (often referred to as noise traders) prevents the rational expectations equilibrium price from being fully revealing.

[7] Recall that each generation consumes only once and no wealth is transferred between generations. Therefore, as in the standard one-period model, this assumption is without loss of generality.

Since an agent lives for only one period, the remainder of this section suppresses time subscripts. W₀ⁿ is the endowed wealth at the beginning of the period[8] and W₁ⁿ is the end-of-period wealth. Agent n's utility is a function of risky asset demand (shares of the risky asset), xⁿ, as follows:

   V(xⁿ) = E[ −exp(−a W₁ⁿ) | Ωⁿ ]        (4.3)

where W₁ⁿ = W₀ⁿ − θⁿ c + xⁿ (d − P) incorporates the budget constraint. Ωⁿ is the information (both private and public) available to agent n. θⁿ is 1 if the agent is informed (i.e., y ∈ Ωⁿ) and 0 if the agent is uninformed (i.e., y ∉ Ωⁿ). The next section considers the formation of asset demands, conditional on information choice. Section 4.III discusses the information choice itself.

Before introducing learning into the model, it is helpful to consider the features of the model that make it an interesting environment for considering learning. The first is that the rational expectations solution can be calculated in closed form. The Grossman-Stiglitz (GS) rational expectations equilibrium serves as a benchmark for considering the effect of learning.

[8] Note that CARA preferences produce demands which are unaffected by initial wealth. Assumptions about the initial endowments are, therefore, for the most part immaterial.
However, if the initial endowment is in shares of the risky asset, it is necessary that the endowment not provide information about the total endowment. If this is not assumed, then endowments would affect demands through the information they provide about d. In addition, initial wealth affects adaptation in the genetic algorithm implementation in Chapter 6 by changing relative utilities. This is discussed further in that chapter.

Second, because the model is not an intertemporal one, it does not have multiple equilibria.[9] In contrast, Bullard and Duffy (1994) and Arifovic (1994b) use (genetic algorithm) learning as an equilibrium selection device in overlapping generations models. Third, the learning environment is richer than the limited "rule-of-thumb" asset demands found in De Long et al. (1990) or Cabrales and Hoshi (1992). Here, agents will formulate reasonable beliefs and optimal portfolios based on those beliefs. Fourth, the model can incorporate alternative specifications of learning, such as stochastic adaptive learning (the genetic algorithm in Chapter 6, for example) or deterministic learning like least squares learning and the learning process studied in Chapter 5. Finally, the model can be used to highlight the difference between the informed agents' simple learning of exogenous relationships and the uninformed agents' complex learning of endogenous relationships.

(B) Learning

In a rational expectations model, such as that of Grossman-Stiglitz, agents know all the relevant parameters with certainty. Furthermore, such information is common knowledge. Under this assumption, the optimal risky asset demands of the agents are linear. This follows from the properties of CARA utility and normality.[10] In the Grossman-Stiglitz model demands are

[9] As presented, the rational expectations equilibrium price relationship and the GS equilibrium proportion of informed agents (λ*) are unique.
However, since the GS equilibrium does not specify which particular agents are informed, strictly speaking there are multiple equilibria.

[10] See Grossman (1976) for details.

   xⁿ = ( E[d|Ωⁿ] − P ) / ( a V[d|Ωⁿ] )        (4.4)

where the conditional expectation is linear in the signal, y (if informed), or the risky asset price, P (if uninformed), and the conditional variance is constant. In this model, it is assumed that β0 and β1 are not common knowledge. It is these parameters, and the endogenous relationship between signal and price (discussed below), that traders need to learn. Alternatively, traders need to learn how to choose their asset demands based on their information.

There are several alternatives for modelling the learning of the asset demands. Grossman and Stiglitz assume rational expectations. Their analysis starts after learning is "complete."[11] One approach to learning is to assume that traders have prior beliefs about parameters (exogenous and/or endogenous) and learn by updating their beliefs according to Bayes' Theorem. However, since in general posterior beliefs are non-normal, demands will not be linear and the model is very difficult to analyze.[12] Alternatively, one can specify a functional form for asset demands and consider how agents learn the parameters of the imposed structure. This is the approach used in least squares learning and the approach taken in this chapter (and the two which follow).

[11] Grossman and Stiglitz (1980) state (page 396), "If over time uninformed traders observe many realizations of [y,P], then they learn the joint distribution ... After all learning of the joint distribution ceases, ..."

[12] See Bray and Kreps (1987) for an example of a Bayesian learning model.
Assumption 9. Asset demands are of the form in (4.4), where the conditional expectation and variance, depending on information choice, are as follows:[13]

   Eⁿ[d|y] = β0ⁿⁱ + β1ⁿⁱ y        (4.5)
   Vⁿ[d|y] = γⁿⁱ                  (4.6)
   Eⁿ[d|P] = β0ⁿᵘ + β1ⁿᵘ P        (4.7)
   Vⁿ[d|P] = γⁿᵘ                  (4.8)

The learning state of trader n is defined as the set Lⁿ = {θⁿ, ℓⁿ}. The θⁿ indicates whether or not the agent is informed. The ℓⁿ is the inference portion of the learning state and determines how the agent uses information to calculate demands. That is,

   ℓⁿ = (β0ⁿⁱ, β1ⁿⁱ, γⁿⁱ, β0ⁿᵘ, β1ⁿᵘ, γⁿᵘ)        (4.9)

These parameters determine how the agent calculates asset demands in the situations where she is informed and where she is uninformed. Note that the linear demands of the informed and the uninformed are each characterized by three parameters. This over-specification of the linear functions is innocuous and serves only to simplify discussion. Note that the learning state is agent specific. Since learning in this thesis is a stochastic adaptive process, the parameters used for calculating asset demands will vary across agents. The information sets, on the other hand, are not agent specific. All informed agents observe the same information. The relevant information sets are Ωⁱ = {y, P} and Ωᵘ = {P}.[14]

Assumption 9 is important since it implies that the equilibrium price will be linear in the signal y (shown below). This maintains the tractability of the standard exponential-normal environment. The assumption can be interpreted literally as successive generations of agents acting as if the parameters in their learning state were the true parameters. Clearly this is a bound on agents' rationality.

[13] i denotes an informed individual and u denotes an uninformed individual. Parameters which refer to the aggregate behaviour of the informed or uninformed will be denoted with superscripts I and U respectively.
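Assumption 9 and equation (4.4) together pin down how a learning state maps information into a demand. A minimal sketch (the field and function names here are mine, not the thesis's, and the numerical values are purely illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LearningState:
    """Inference portion of a learning state, as in (4.9)."""
    b0_i: float   # intercept of E[d|y] used when informed
    b1_i: float   # slope of E[d|y]
    g_i: float    # perceived V[d|y]
    b0_u: float   # intercept of E[d|P] used when uninformed
    b1_u: float   # slope of E[d|P]
    g_u: float    # perceived V[d|P]

def demand(ell: LearningState, informed: bool, a: float,
           P: float, y: Optional[float] = None) -> float:
    """CARA-normal demand x = (E[d|.] - P) / (a V[d|.]), equation (4.4)."""
    if informed:
        mean, var = ell.b0_i + ell.b1_i * y, ell.g_i
    else:
        mean, var = ell.b0_u + ell.b1_u * P, ell.g_u
    return (mean - P) / (a * var)

ell = LearningState(b0_i=0.0, b1_i=1.0, g_i=0.25,
                    b0_u=0.0, b1_u=0.6, g_u=0.75)
print(demand(ell, True, a=2.0, P=1.0, y=1.5))   # informed: uses the signal
print(demand(ell, False, a=2.0, P=1.0))         # uninformed: infers from P only
```

Because each agent carries her own parameter vector, two agents seeing the same P (or the same y) can submit different demands, which is exactly the heterogeneity the stochastic learning process produces.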
Note that asset demands in (4.4), with the correct parameters, will maximize expected realized fitness. Since traders are assumed to adapt based on past success, it is realized utility (or fitness), rather than subjective expected utility, which drives adaptation. This point is discussed in Blume and Easley (1992). Finally, note that the structure of the asset demands is consistent with the Grossman-Stiglitz rational expectations equilibrium.[15]

[14] Note that, similar to models where agents have differing prior beliefs, agents get no contemporaneous information from other agents' learning states (which might be partially inferred from the equilibrium price).

[15] The alternative to making Assumption 9 is to abandon the simplicity which comes with a linear pricing relationship. It appears from numerical calculations that such non-linear equilibria exist (see Bernardo and Judd (1995) for one such example). In addition, the non-linear equilibria have the necessary feature that price is monotonically (and smoothly) increasing in the signal. The difficulty in applying such a model in a learning environment is how to describe an individual trader. In the linear model presented in this chapter (and those which follow), an agent can be described by a small vector of parameters. In non-linear models this is no longer the case. Ongoing research is investigating whether agents could be simply described by artificial neural networks (ANN).
This has potential since ANNs can approximate non-linear functions with a small number of parameters. In general, rational expectations models assume that agents know both the functional form and the appropriate parameterization. In the model used in this paper, agents know the functional form but must learn the parameters of the endogenous relationship. A more general model using ANNs would have the agents learning the functional form as well.

The learning state for agent n, Lⁿ = {θⁿ, ℓⁿ}, determines the information choice of agent n and her parameters for calculating asset demands. The population learning state, L = {Lⁿ}, n = 1, ..., N, is the collection of the learning states of the agents. The population learning state determines the equilibrium asset price relationship, as stated in the following lemma. Besides describing how agents make inferences, L determines the sets of informed, I, and uninformed, U, agents. Given L, the equilibrium price for the risky asset can be described.

Lemma 1. Given L = {Lⁿ}, the equilibrium price is given by the following linear equation:

   P = α0 + α1 y + α2 (e − ē)        (4.10)

Proof. See Appendix B.

(C) "Rational" or Complete-Knowledge Learning States

In the following sections, an adaptive process for the learning states is discussed. In order to determine where adaptation should lead, this section identifies the rational or complete-knowledge learning states.
The optimal learning state would be the choice of an agent who, unlike the agents considered in the model, had complete knowledge of the model parameters (β0 and β1) and of the other agents' learning states (contained in L).

Lemma 2. Given L, the rational inference portion of the learning state, ℓ* = (β0ⁱ*, β1ⁱ*, γⁱ*, β0ᵘ*, β1ᵘ*, γᵘ*), can be described by

   β0ⁱ* = β0,   β1ⁱ* = β1,   γⁱ* = σ_ε²        (4.11)

   β1ᵘ* = β1 α1 σ_y² / (α1² σ_y² + α2² σ_e²),   β0ᵘ* = β0 − β1ᵘ* α0,
   γᵘ* = σ_ε² + β1² σ_y² α2² σ_e² / (α1² σ_y² + α2² σ_e²)        (4.12)

Proof. See Appendix B.

The rational inference parameters of the agent, if informed, follow from the exogenous dividend-signal relationship. The rational inference parameters of the agent, if uninformed, reflect the more difficult task. Not only must a rational learning state reflect the true exogenous relationship between dividend and signal (equation (4.1)), it must also correctly untangle the endogenous relationship between equilibrium price and signal (equation (4.10)). It is for this reason that the rational learning state depends on the learning states of the other agents. Note that the α's in (4.10) depend on the population learning state. Given this interdependence, the rational learning state can be thought of as containing best reply functions.

An immediate product of Lemma 2 is the risky asset price in the rational expectations equilibrium. The rational expectations equilibrium results when all agents are characterized by a rational learning state (i.e., ℓⁿ = ℓ* for all n). Since all agents are identical except for their information choice θⁿ, it does not matter which specific agents are informed.
Thus, for a given proportion of informed agents, λ, the rational expectations equilibrium price relationship is given by the following lemma.

Lemma 3. The unique rational expectations equilibrium price for a given λ is determined by

   P = α0* + α1* y + α2* (e − ē)        (4.13)

Proof. See Appendix B.

III - Learning, Equilibrium and Stability

Thus far, the choice of whether or not to become informed has not been discussed. This section considers this choice by comparing the expected utilities of the informed and uninformed agents. This comparison is used to consider the stability of the GS equilibrium when agent learning is stochastic and adaptive.

(A) Comparison of Informed and Uninformed Utility

A rational choice of whether or not to become informed (the choice of θ) requires a comparison of the cost of being informed, c, with the increased uncertainty one faces when uninformed. We formalize this comparison after discussing the sources of uncertainty faced by the uninformed agent.

Uninformed agents face four potential sources of uncertainty:

i. The ε-noise in the dividend.
ii. Noise introduced into the inference of the signal by the aggregate supply noise.
iii. Uncertainty about the relationship between the signal and the dividend; equation (4.1).
iv. Uncertainty about the relationship between the price and the signal; equation (4.10).

The first two sources of uncertainty are identical to those of the GS model and are present in the rational expectations solution. The nature of the third source of uncertainty is identical to the uncertainty faced by the informed agents. The final source of uncertainty is what makes the task of the uninformed difficult. The α parameters in (4.10) are functions of the informed and uninformed agents' learning states.
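The dependence of the α's on the population's behaviour can be made concrete with a small fixed-point computation: given a proportion λ of informed agents, clear the market for a linear price, re-project d on P to obtain rational uninformed parameters of the kind in Lemma 2, and iterate. The parameter values below are illustrative only, and this iteration is a sketch of the logic rather than the thesis's Appendix B derivation.

```python
import numpy as np

def ree_coeffs(lam, a=2.0, b0=0.0, b1=1.0, ebar=1.0,
               s_y2=1.0, s_e2=1.0, s_eps2=0.25, n_iter=500):
    """Self-consistent linear price P = a0 + a1*y + a2*(e - ebar) for a
    fixed informed share lam (illustrative parameters, not the thesis's)."""
    g_i = s_eps2                                        # informed V[d|y]
    b0_u, b1_u, g_u = b0, 0.0, s_eps2 + b1**2 * s_y2    # start: price ignored
    for _ in range(n_iter):
        # market clearing lam*x_I + (1-lam)*x_U = e, demands as in (4.4)
        D = lam / (a * g_i) + (1 - lam) * (1 - b1_u) / (a * g_u)
        a1 = lam * b1 / (a * g_i * D)
        a2 = -1.0 / D
        a0 = (lam * b0 / (a * g_i) + (1 - lam) * b0_u / (a * g_u) - ebar) / D
        # rational uninformed parameters: project d = b0 + b1*y + eps on P
        var_P = a1**2 * s_y2 + a2**2 * s_e2
        cov_dP = b1 * a1 * s_y2
        b1_u = cov_dP / var_P
        b0_u = b0 - b1_u * a0
        g_u = s_eps2 + b1**2 * s_y2 - cov_dP**2 / var_P
    sig2 = s_y2 * a2**2 * s_e2 / var_P                  # V[y|P], informativeness
    return a0, a1, a2, sig2

for lam in (0.1, 0.5, 0.9):
    a0, a1, a2, sig2 = ree_coeffs(lam)
    print(f"lam={lam}: a1={a1:.3f} a2={a2:.3f} V[y|P]={sig2:.3f}")
```

As λ rises, the price loads more heavily on the signal relative to the supply noise, and the residual uncertainty V[y|P] falls: more informed trading makes the price more revealing.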
Every time any agent changes the inference portion of her learning state, or the proportion of informed agents changes, all the uninformed agents must learn a new pricing relationship and adjust their learning states accordingly. The adaptive learning of an uninformed agent must "co-adapt" (or co-evolve) with the learning states of both the informed agents and the other uninformed agents. Recall that all informed agents receive the same signal and therefore do not rely on the equilibrium price for information. The informed agents, therefore, do not face this problem.

This discussion is formalized in the following calculation of the relative utility levels of the agents conditional on being informed or uninformed. Since the learning process is adaptive, the relevant comparison is between the expected realized utilities of the informed and uninformed agents. Thus the expectations are not taken with respect to individual agents' beliefs (incorporated in their learning states), but under the true dividend process described by (4.1) and (4.2). After introducing a few simplifying assumptions, the next two propositions calculate the expected realized utilities.

Let fⁱ(ℓⁿ, L | P) = E[V(W(xⁿ)) | P] and fⁱ(ℓⁿ, L) = E[V(W(xⁿ))]. These are the expected utility of the informed agent, conditional on P and unconditional, respectively, where W(xⁿ) is the end-of-period wealth as defined in (4.3), given the optimal asset demand, xⁿ, from (4.4) and the parameters contained in ℓⁿ, when the population learning state is L. Similarly, for the uninformed agents, let fᵘ(ℓⁿ, L | P) = E[V(W(xⁿ)) | P] and fᵘ(ℓⁿ, L) = E[V(W(xⁿ))].

In order to simplify the algebra, let β0* and β1* in (4.1) be known by the agents and, for concreteness, let β1* = 1. This focuses the discussion on the more interesting learning of the endogenous price-signal relationship rather than the straightforward learning of the exogenous signal-dividend relationship.
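The contrast between the two learning problems can be seen in a simulation: regressing d on y recovers the same (β0, β1) whatever other agents do, while the d-on-P projection shifts when the price's loading on the signal changes. The price rule below is a stylized stand-in whose coefficients are my assumptions, not the equilibrium values.

```python
import numpy as np

rng = np.random.default_rng(0)
b0, b1, s_eps = 0.0, 1.0, 0.5          # illustrative exogenous parameters

def ols(x, z):
    """Intercept and slope of the linear projection of z on x."""
    slope = ((x - x.mean()) * (z - z.mean())).mean() / x.var()
    return z.mean() - slope * x.mean(), slope

def cross_section(alpha1, n=100_000):
    """One generation of data; alpha1 is the price's loading on the signal."""
    y = rng.normal(0.0, 1.0, n)
    e = rng.normal(0.0, 1.0, n)                    # supply noise
    d = b0 + b1 * y + rng.normal(0.0, s_eps, n)    # exogenous relation, as in (4.1)
    P = alpha1 * y - 0.8 * e                       # stylized endogenous relation
    return y, d, P

for alpha1 in (0.3, 0.9):      # e.g. few vs. many informed agents
    y, d, P = cross_section(alpha1)
    print("d on y:", ols(y, d), " d on P:", ols(P, d))
```

The d-on-y estimates sit near (0, 1) in both runs, while the d-on-P slope rises with alpha1 (roughly 0.41 versus 0.62 for these values): the uninformed agents' target is a moving one, moved by everyone else's behaviour.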
Since the β*'s are not assumed to be common knowledge, the learning of the uninformed remains non-trivial. In the genetic algorithm simulations of this model in Chapter 6, this assumption is relaxed. Recall that the linear asset demands of the agents (linear in y for the informed and in P for the uninformed) are characterized by three parameters. Since a linear function has only two free parameters, for both the informed and uninformed agents the conditional variance parameters will be held at their rational levels (γⁱ* and γᵘ*).

Proposition 1. Given β0ⁿ = β0* and β1ⁿ = β1* = 1, and given a population learning state L, the expected realized utilities, conditional on P, are as follows:

   fⁱ(ℓⁿ, L | P) = exp(a c) [ σ_ε² / (σ_ε² + σ²(L)) ]^(1/2) fᵘ(ℓ*, L | P)        (4.14)

   fᵘ(ℓⁿ, L | P) = exp( ξⁿ(P)² / (2 (σ_ε² + σ²(L))) ) fᵘ(ℓ*, L | P)        (4.15)

where σ²(L) = V[y | P] and ξⁿ(P) = (β0ᵘ* − β0ⁿᵘ) + (β1ᵘ* − β1ⁿᵘ) P.

Proof. See Appendix B.

Note that both utilities are stated relative to the utility of an uninformed agent using the rational learning state parameters, ℓ*(L). The three elements which contribute to the relative performance of the informed and uninformed agents are the cost of information, c, the informativeness of the price about the signal, σ²(L), and the inference errors made by the uninformed, ξⁿ(P). Recall that if ℓⁿ = ℓ*(L) then no inference errors are made and ξⁿ(P) ≡ 0. Note that an uninformed agent's expected utility (conditional on price) decreases as inference errors become larger.[16]

In order to compare the desirability of being informed or uninformed, the unconditional expected realized utility must be calculated. In particular, the relative values of the expected realized utilities are of interest. In the following proposition the utilities are normalized so
In the following proposition the utilities are normalized so that $f^I(\ell^n, L) = 1$ and $f^u(\ell^n, L) \ge 0$.

Proposition 2  Given $\beta_0^n = \beta_0^*$ and $\beta_1^n = \beta_1^* = 1$ and given a population learning state $L$, the relative realized expected utilities are $1$ and

$$f^u(\ell^n, L) = \exp(ac)\left(\frac{\gamma^I}{\gamma^I + \sigma^2(L)}\right)^{1/2}\left(\frac{1}{1 + \hat C\sigma_{\hat X}^2(1-B^2)}\right)^{1/2}\exp\left(-\frac{\hat C A^2}{2(1 + \hat C\sigma_{\hat X}^2)(1 + \hat C\sigma_{\hat X}^2(1-B^2))}\right) \qquad (4.16)$$

if $1 + \hat C\sigma_{\hat X}^2(1-B^2) > 0$, and $f^u(\ell^n, L) = 0$ otherwise, and where

$$\hat X = \beta_0^* + \hat\epsilon(P) - P \sim \mathrm{Normal}(\mu_{\hat X}, \sigma_{\hat X}^2) \qquad (4.17)$$

$$A = \ell_0^n - \ell_0^* \qquad (4.18)$$

$$B = \frac{\ell_1^n - \ell_1^*}{\ell_1^* - 1} \qquad (4.19)$$

$$\hat C = \frac{1}{\gamma^I + \sigma^2(L)} \qquad (4.20)$$

Proof. See Appendix B.

16. Note that the negative exponential utility implies that $f^u(\ell^n, L \mid P) < 0$.

While the expression for the unconditional relative utilities is more complicated, the unconditional relative utilities have intuition similar to that of Proposition 1. Note that $A$ captures inference errors in the intercept and $B$ captures inference errors in the slope. When $A = B = 0$, no inference errors are made and the expression for $f^u$ collapses to facilitate calculation of the GS rational expectations proportion of informed agents, $\lambda^*$ (see below). Any inference errors (ie. $A$ or $B$ moving away from 0) will reduce the utility of the uninformed agent. The condition $1 + \hat C\sigma_{\hat X}^2(1-B^2) > 0$, which implies that $f^u$ is non-zero, essentially states that slope inference errors ($\ell_1^n - \ell_1^*$) cannot be too large. As the slope-based inference errors become large, the absolute level of the uninformed realized expected utility approaches negative infinity and the ratio of informed to uninformed utility approaches zero.^17

The GS rational expectations proportion of informed agents, $\lambda^*$, follows directly from Proposition 2. In the GS rational expectations equilibrium, agents do not make inference errors, and therefore $\hat\epsilon(P) \equiv 0$, or $A = B = 0$.
The equilibrium follows from noting that for $\lambda^* \in (0,1)$, agents must be indifferent between being informed and uninformed.^18

Corollary 1  If the population learning state in a rational expectations equilibrium, $L^*$, implies a proportion of informed agents $\lambda^* \in (0,1)$, it is given by the indifference condition of Proposition 2 with $A = B = 0$; that is, $\lambda^*$ solves

$$\sigma^2(\lambda^*) = \gamma^I\left(e^{2ac} - 1\right) \qquad (4.21)$$

Proof. See Appendix B.

17. If $z \sim N(\mu, \sigma^2)$ then $E[\exp(-kz^2)]$ is not finite if $k \ge 1/(2\sigma^2)$.
18. It is also possible that $\lambda^* = 1$ or $\lambda^* = 0$. These cases occur if there is no $\lambda$ for which the informed and uninformed agents are indifferent.

(B) Stochastic Adaptive Learning Process

Thus far, we assumed the population learning state was given and we focused on a single period. This section describes the learning process which generates a sequence of learning states through time. The properties of the learning process and their effect on the GS equilibrium are analyzed in the following section.

A stochastic adaptive learning process for agent $n$, $\mathcal{L}^n$, is defined as

$$\mathcal{L}^n = \{\tilde L_t^n\}_{t=1}^{\infty} \qquad (4.22)$$

The $\tilde L_t^n = \{\pi_t^n, \ell_t^n\}$ is a modified learning state where $\pi_t^n \in (0,1)$ is the probability that agent $n$, at period $t$, is informed (ie. $\mathrm{prob}(\theta_t^n = 1)$). Note that the only difference between a modified learning state and a learning state is that in the modified learning state the agent's information choice, $\theta_t^n$, has not been realized. Instead, the information choice is represented by a probability of becoming informed. The learning process generates a sequence of learning states by determining how learning states are updated. Since learning is adaptive, $\tilde L_{t+1}^n$ can depend on the population learning states, $L_\tau$, from time $\tau \le t$.^19 In addition, agents
will imitate strategies which are more successful more often. Thus learning states associated with relatively high utility (fitness) at $t$ are copied by agents for use at $t+1$. Since the learning is stochastic, the transition from the learning states of one generation to the next, $\tilde L_t^n$ to $\tilde L_{t+1}^n$, is probabilistic, and $\pi_t^n \in (0,1)$. The probabilistic transition and the strictly mixed information choice capture the experimentation of learning. In summary, agents experiment with different learning states (stochastically) and mimic those learning states which are more successful (adaptively). For a given choice of a particular learning state, however, asset demands are determined rationally.

19. The assumption that modified learning states are updated based on past learning states, as opposed to past modified learning states, is not important but is consistent with the view that adaptation is based on observed past behaviour of other agents.

The model is constructed to facilitate an adaptive (or evolutionary) interpretation of learning. However, as previously discussed, it is not a dynamic asset pricing model. The model is that of a repeated one-shot game. Similar to the underlying interpretation in evolutionary game theory, the model is that of a sequence of one-period-lived agents who observe the past and are myopic. In the one-period model, agent fitness depends on the ability to predict the end-of-period dividend (mean and variance). Since this may require an inference from the endogenous price-signal relationship, an agent's fitness depends on the current population learning state. In a dynamic model, agent fitness (and objectives) are more complex. Agents would need to predict the next-period return, which consists (at least in part) of next period's price. Since future prices depend on future learning states, the degree of co-adaptation (or co-evolution) increases dramatically.
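The imitate-successful-rules-and-experiment loop described above can be sketched minimally. The sketch is illustrative only: the scalar "rule" and the toy fitness function are assumptions standing in for the learning states and realized utilities of the model, and here fitness is fixed rather than population-dependent; experimentation decays over generations, as in a stochastic (but not always-experimenting) adaptive learning process.

```python
import random

def adapt(pop, fitness, t, eps0=0.2, rng=random):
    """One generation: agents imitate rules drawn with probability
    proportional to (shifted) fitness, then experiment with a
    probability that decays in t."""
    fits = [fitness(x) for x in pop]
    low = min(fits)
    weights = [f - low + 1e-9 for f in fits]          # make weights positive
    new_pop = rng.choices(pop, weights=weights, k=len(pop))   # imitation
    eps = eps0 / (1 + t)                               # vanishing experimentation
    return [x + rng.gauss(0.0, 0.1) if rng.random() < eps else x
            for x in new_pop]

random.seed(0)
population = [random.uniform(-2.0, 2.0) for _ in range(50)]
toy_fitness = lambda x: -(x - 1.0) ** 2                # toy: best rule is x = 1
for t in range(200):
    population = adapt(population, toy_fitness, t)
mean_rule = sum(population) / len(population)
```

In a fixed environment like this one the process behaves much like any hill-climber; the point of the chapter is that when fitness itself depends on the population, as $f^u$ depends on $L$, the same update rule can behave very differently.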
This sort of complexity, without learning, is studied in Kraus and Smith (1994). While it would be interesting to apply stochastic adaptive learning and genetic algorithms to a dynamic GS-like model such as Wang (1993), it is beyond the scope of this chapter.^20

20. Note that utility functions which produce myopic portfolio strategies reduce the complexity of a dynamic model only slightly. In a dynamic model, returns depend on future prices even with myopic portfolio policies. In addition, the myopia implied by single-period-lived agents implies that when agents adapt by imitating others they do not consider the long-run effect (ie. that their strategy might be imitated) or that others are also adapting. Interpretations such as Bray and Kreps (1987) (see footnote 4), in which agents are long-lived but lack the ability to transfer wealth between periods, become complicated in a model with adaptive learning. In such a world agents might have incentives to consider the dynamic implications of learning. The reasonableness of the myopia assumption in a complex environment, such as a financial market, is an open question. Kandori, Mailath and Rob (1991) discuss myopia of adapting agents in the context of evolutionary game theory.

(C) Learning and Local Stability

This section defines a local learning equilibrium. The definition is intended to capture the behaviour of stochastic adaptive learning close to the GS rational expectations equilibrium.

Definition  $L_\infty = (L_\infty^1, \ldots, L_\infty^N)$ is a local learning equilibrium if (a) for all $n = 1, \ldots, N$

$$L_\infty^n = \lim_{t\to\infty}\,(\pi_t^n, \ell_t^n) \qquad (4.23)$$

and (b) for any $\epsilon > 0$, there exists a $T$ such that for all $t \ge T$ and for all $n = 1, \ldots, N$, $L_t^n$ is within $\epsilon$ of a best response to $(L_t^1, \ldots, L_t^N)$.

There are several important features of the definition.
First, (a) implies that a stochastic adaptive learning process is a local learning equilibrium only if it converges.^21 Thus, the definition is used to identify properties of learning close to the GS equilibrium. Second, (b) implies that the converged learning states must form mutual best responses (ie. are rational learning states). Thus the definition is a refinement of a Nash equilibrium and the GS equilibrium. The requirement of (b) captures the "adaptive" nature of learning in this model. Agents have an incentive to continue adapting as long as utility (or fitness) can be increased. Third, condition (b) is stronger than requiring that the converged learning states form a Nash equilibrium. In fact, since learning processes are stochastic, the local learning equilibrium definition is equivalent to a normal-form version of a (trembling-hand) perfect equilibrium.^22 It is necessary, in a learning equilibrium, that learning states be robust to trembles since experimentation by other agents vanishes only in the limit. When agents are learning by trial and error, optimal strategies must be robust to other agents' experimentation.^23,24

21. Note that $L_t^n \in \mathbb{R}^7$. Convergence is defined as point-wise in each of the seven elements. Convergence and "within $\epsilon$" are both with respect to Euclidean distance.
22. See Kreps (1990) for a definition and discussion of perfection. Trembling-hand perfection was developed in Selten (1975).
23. The definition of local learning equilibrium is also closely related to that of Evolutionary Stable Strategy (ESS) equilibria. ESS are appropriate for models of adaptation (or evolution) since they select a collection of strategies (or learning states in our model) which are robust to mutant strategies (ie. experimentation). For a formal definition and discussion of ESS and perfect equilibria see van Damme (1987).
Using an argument identical to that used by van Damme to establish that every ESS is a perfect equilibrium, it is straightforward to show that every ESS is also a local learning equilibrium.

24. Note that this section relies on the assumption that $N$ is finite.

Proposition 3  The GS rational expectations equilibrium $\lambda^*$ (where $\lambda$ denotes the proportion of informed agents) is a local learning equilibrium when $\sigma_\epsilon > 0$ and $N$ is large.

Proof. Consider the following stochastic adaptive learning process. For $\lambda^* N$ agents let $\pi_t^n = 1 - t^{-1}$ and for the remaining $(1 - \lambda^*)N$ agents let $\pi_t^n = h_t \cdot t^{-1}$ (where $0 < h_t < t$ and $\lim_{t\to\infty} h_t = h < \infty$). For all $n$ agents, let $\ell_t^n = \ell^*(L_{t-1}) + \epsilon_t^n t^{-2}$, where $\epsilon_t^n \in \mathbb{R}^6$ is uniformly distributed in the hypercube $[-\bar\epsilon, \bar\epsilon]^6$. The $\ell^*$ is defined by (4.11) and (4.12), where $L_\infty = \lim_{t\to\infty} L_t$ is the limit of the population learning states. Note that proportion $\lambda^*$ of the population will have $\pi_\infty^n = 1$ (ie. informed) and proportion $(1 - \lambda^*)$ will have $\pi_\infty^n = 0$ (ie. uninformed). Therefore $L_\infty^n$ exists for all agents (see (4.23)). Thus $L_\infty = L^*$ and by construction $\ell_\infty^n = \ell^*(L^*)$. Next, note that since $\ell^*(L_t)$ is continuous in the elements of the population learning state, $L_t$, as $t$ gets large, $\ell_t^n$ is arbitrarily close to $\ell^*(L_\infty)$ for all agents. It remains to be shown that the agents' information choices (the $\pi_t^n$'s) are close to best responses. From inspection of (4.16), note that $f^I$ and $f^u(\ell^*, L)$ are equal at $\lambda^*$ when $\ell^n = \ell^*(L)$. Secondly, note that $\partial f^u(\ell^*, L)/\partial\lambda > 0$, since $\sigma^2$ is strictly decreasing in $\lambda$ for $\sigma_\epsilon > 0$. Therefore, locally about $\lambda^*$, (4.16) is also increasing. Thus

$$f^u(\ell^*(L^*), L^{*+}) > f^u(\ell^*(L^*), L^*) = f^I = 1 > f^u(\ell^*(L^*), L^{*-}) \qquad (4.24)$$

where $L^{*+}$ ($L^{*-}$) is the GS equilibrium population learning state, $L^*$, with a small increase (decrease) in $\lambda$. Recall that the GS rational expectations equilibrium is constructed so that agents are indifferent between being informed and uninformed; that is,
$f^u(\ell^*(L^*), L^*) = f^I = 1$. Inference errors caused by changes in other agents' learning parameters (the $\epsilon_t^n$) become arbitrarily small at a rate proportional to $t^{-2}$, and $\pi_t^n$ approaches its limit at a rate proportional to $t^{-1}$. Thus, by appropriate choice of $h_t$, for large $t$ and $N$, $E[f^u(\ell_t^n, L_t)] = 1$ (taking expectations over possible population learning states), leaving agents indifferent between being informed or uninformed and therefore indifferent to $\pi_t^n$.

The proof highlights the two different sources of the $\hat\epsilon(P)$ inference errors of the uninformed. The rational inference parameters of the uninformed, (4.12), depend on the learning state of the population. Agents can be mistaken about the information choices of the other agents in the population learning state (the other agents' $\theta^n$, or more simply the value of $\lambda$) or about the inference portion of the population learning state (the $\ell^n$ of other agents). Consider a population learning state at the GS rational expectations equilibrium learning state, $L^*$, where by definition no inference errors are made. If any agent (informed or uninformed) happens to alter an inference parameter (an experiment), all uninformed agents are worse off, since their inference parameters, $\ell^n = \ell^*(L^*)$, are no longer correct. In contrast, consider the effect if all agents continue to use $\ell^*(L^*)$ but there is a small change in $\lambda$ away from $\lambda^*$. If $\lambda$ declines, then all uninformed agents are worse off, since the price is less informative (ie. $\sigma^2$ increases) and they make small inference errors. If $\lambda$ increases, however, the price is more informative. This acts in the direction of making uninformed agents better off. Moreover, since the change in $\lambda$ is small, inference errors resulting from the small change in $\lambda$ can be ignored (a result that follows from the Envelope Theorem) and all uninformed agents are better off.
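The rates in the proof above can be made concrete. This sketch is illustrative: the hypercube bound `eps_bar`, the value of `h`, and the population size are arbitrary choices satisfying the proof's conditions. It builds the period-t modified learning states in which the information-choice probabilities converge at rate 1/t while inference-parameter experimentation vanishes at the faster rate 1/t².

```python
import random

def proof_process_state(t, n_agents=10, lam_star=0.4, h=0.5,
                        eps_bar=1.0, rng=random):
    """Period-t modified learning states from the proof of Proposition 3:
    lam_star*N agents have pi_t = 1 - 1/t, the rest pi_t = h/t, and each
    agent's six inference parameters are perturbed by noise scaled 1/t^2."""
    n_informed = round(lam_star * n_agents)
    pi = [1.0 - 1.0 / t] * n_informed + [h / t] * (n_agents - n_informed)
    noise = [[rng.uniform(-eps_bar, eps_bar) / t ** 2 for _ in range(6)]
             for _ in range(n_agents)]
    return pi, noise

pi_late, noise_late = proof_process_state(t=10_000)
```

By period t = 10,000 the intended informed agents are informed with probability 0.9999, the rest with probability 5e-5, and every inference perturbation is below 1e-8 in magnitude: the trembles in the information choice are much larger than the trembles in the inference parameters, which is exactly the asymmetry the proof exploits.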
The existence of a local learning equilibrium near the GS equilibrium relies on small trembles (experiments) in $\lambda$ and much smaller trembles in the inference parameters. The result is that, on average (and for large $t$), uninformed agents remain indifferent about their information choice.^25

The above discussion leads to a corollary. When the supply of the risky asset is constant ($\sigma_\epsilon = 0$) and the information is costless ($c = 0$), then all $\lambda \in (0,1]$ are GS equilibria. However, equilibria with $\lambda^* < 1$ are not robust to trembles. Small positive changes in $\lambda$ cannot make the price more informative, since it already perfectly reveals $y$. Inference errors always leave the uninformed agents worse off. Thus they choose to become informed.

Corollary  If $\sigma_\epsilon = 0$ and $c = 0$, only $\lambda^* = 1$ is a local learning equilibrium.

Proof. Using the same definition of learning processes as in Proposition 3, note that if $\ell^n = \ell^*$, since $\sigma^2 = 0$ (ie. the signal is perfectly revealed) and $c = 0$, then $f^I = f^u$. Therefore, given any perturbation from $L^*$, the uninformed make inference errors and are strictly worse off than the informed. Thus $\pi^n = 0$ (choosing to be uninformed) is not a best response. ∎

25. Note that $f^u(\ell^*, L)$ is concave in $\lambda$. Thus $f^u(\ell^*, L)$ is locally concave at $(\ell^*(L^*), L^*)$. The utility of the uninformed is concave near the GS rational expectations equilibrium. Since this is the case, small increases in $\lambda$ must be slightly more common than small decreases in order to maintain the indifference of the uninformed (ie. $h_t > 1$). This implies that the time-averaged $\lambda$ induced by a local learning equilibrium must converge to $\lambda^*$ from above.

(D) Stability with Always-Experimenting Stochastic Adaptive Learning

The previous section demonstrated that learning processes can be described that lead to the GS equilibrium. While this is a necessary condition for the GS equilibrium to be "useful," it is not sufficient.
This section describes conditions for stability for a given learning process. In particular, this section describes the robustness of the GS equilibrium to fixed experimentation which does not necessarily vanish in the limit. This is important in understanding complex financial markets, where the arrival of new agents and exogenous changes in the environment lead to continual experimentation. In addition, the inability to distinguish good rules from lucky outcomes also contributes to continued experimentation. The genetic algorithm simulations presented in Chapter 6 have experimentation rates which do not vanish and fitnesses (utilities) which are observed with error.

This section develops a definition of stability that is a refinement of a local learning equilibrium. The analysis is tailored to the learning process captured by the genetic algorithm simulations. First, a definition of a stochastic adaptive learning process with non-vanishing experimentation is needed. Define an always-experimenting stochastic adaptive learning process (AESAL) as a stochastic adaptive learning process, $\mathcal{L}^n$, where for all agents $n = 1, \ldots, N$

$$\lim_{t\to\infty} \pi_t^n = \bar\pi^n \in (0,1), \qquad \lim_{t\to\infty} \ell_t^n = \bar\ell^n + \tilde\epsilon^n \qquad (4.25)$$

where $\tilde\epsilon^n$ is a (mean-zero) random variable in $\mathbb{R}^6$ with a non-degenerate distribution.^26 In such a learning process, the lower bound on the size of experimentation is given by the certainty with which an agent is informed or uninformed (ie. $\min[\bar\pi^n, 1 - \bar\pi^n]$) and the properties of the distribution governing $\tilde\epsilon^n$. A norm for the size of the experimentation rates could be constructed.^27 However, the details of this construction are not necessary for the discussion which follows, and AESALs will simply be assumed to be indexed by $K$, where $K$ is a measure of the limiting size of the experiments.

26. It is not crucial that these limits exist in order to consider stability. This is simply a convenient way of specifying the lower bound on the size of experimentation.
27. For example, such a norm would combine $\bar\pi^n$ with the properties of $\tilde\epsilon^n$, such as its bounds.

Since experimentation is non-vanishing, the learning process will never converge to any fixed population learning state, let alone the GS rational expectations equilibrium.
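A toy one-dimensional illustration of why an AESAL never settles down: adaptation pulls $\lambda$ toward an attracting value while experimentation of fixed size $\kappa$ persists forever. The pull strength and the uniform noise form are assumptions for illustration, not the model's dynamics.

```python
import random

def aesal_lambda_path(lam_star=0.4, kappa=0.05, T=20_000, seed=1):
    """Toy AESAL path for the proportion of informed agents: a mean-
    reverting adaptive pull toward lam_star, plus experimentation whose
    size kappa never vanishes, so the path fluctuates forever."""
    rng = random.Random(seed)
    lam, path = 0.8, []
    for _ in range(T):
        lam += 0.1 * (lam_star - lam)            # adaptive pull (assumed form)
        lam += kappa * rng.uniform(-1.0, 1.0)    # non-vanishing experimentation
        lam = min(max(lam, 0.0), 1.0)            # keep lam a valid proportion
        path.append(lam)
    return path

path = aesal_lambda_path()
avg_deviation = sum(abs(x - 0.4) for x in path) / len(path)
```

The path never converges, but its time-averaged deviation from the attracting value stays small; this time-average view is the sense of stability formalized in the definition that follows.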
However, if the population learning state is usually close to the GS equilibrium, then it is reasonable to view the GS equilibrium as stable. This is similar to the motivation for the definition of Evolutionary Stable Strategies (ESS). There are many possible ways to formalize this concept. This definition uses the time average of $\lambda_t$ relative to the GS equilibrium value, $\lambda^*$.

Definition  For $\Delta\lambda > 0$, the equilibrium $L^*$ is $\Delta\lambda$-stable to AESAL($K$) if there exists some AESAL process with experimentation of at least $K$ such that

$$E\left[\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T}\left|\lambda_t - \lambda^*\right|\right] < \Delta\lambda \qquad (4.26)$$

A stronger definition of stability might also relate to the behaviour of $\ell$ in the population. However, since the learning process is adaptive and an agent's utility (fitness) depends on both the information choice, $\theta$, and the inference parameters, $\ell$, the $\lambda_t$ will reflect the relative inference abilities of the informed and uninformed. The $\Delta\lambda$ parameter in the definition allows for the possibility that $\lambda_t$ remains slightly above $\lambda^*$ in order to compensate the uninformed agents who are adversely affected by the experimentation of other agents.

Figure 4.1 presents the utility of the uninformed agents (relative to the informed agents) from the definition of $f^u$ in (4.16). The figure will be used to help demonstrate that the stability of the GS equilibrium depends primarily on the slope of $f^u$. The graph plots the fitness of an uninformed agent against $\lambda$. As noted previously, the informed agent's utility, $f^I$, equals 1 by a normalization. There are four different curves presented for the uninformed agent's utility.
They differ according to the assumptions about the agent's inference parameters, $\ell^n$, and the learning parameters of the other agents (contained in $L$) which affect the rational inference parameters, $\ell^*$. Curve A is the rational expectations level of utility. This is the maximum utility an uninformed agent can attain for a given $\lambda$, since $\ell^n = \ell^*(L)$. Note that for a given $\lambda$, since the informativeness of the price about the signal ($\sigma^2$) is not affected by altering $L$, neither is curve A.^28 The point where curve A intersects $f^I = 1$ yields the GS equilibrium $\lambda^*$. Curve B lies just below curve A (except at $\lambda = \lambda^*$, where the curves are tangent). Here $\ell^n = \ell^*(L)$ only when $L$ implies $\lambda = \lambda^*$. On curve B, the uninformed are making inference errors stemming from mistakes about the number of informed agents, but not because of mistakes about the inference portion of the population learning state. Curves C and D represent situations where agents are making both types of inference errors ($\ell^n \ne \ell^*$). The only difference between curves C and D is that the errors are larger in curve D.

Figure 4.1: Expected Utility of Informed and Uninformed Agents. The figure plots $f^I = 1$ (a normalization) and $f^u$ (for various $\ell^n$ and $L$) as functions of $\lambda$. [Axis ticks and plot area omitted.]

28. This uses the assumption that the informed agents know the $\beta^*$'s in (4.1). A variation in a $\beta^n$ of an agent does affect the informativeness of the price and would affect curve A.

To consider the stability of the GS equilibrium, $L^*$, we will consider various perturbations from $L^*$. The perturbations stem from the experimentation contained in the always-experimenting stochastic adaptive learning process. The key is to determine whether or not adaptation will force the learning state back to $L^*$. First note that perturbations in $\lambda$ alone are relatively minor.
If $\lambda$ increases (decreases), the uninformed, from curve B, are better (worse) off, and adaptation will lead fewer (more) agents to be informed, moving $\lambda$ towards $\lambda^*$.

The effect of experimentation in other agents' learning states is more complicated. As noted above, beginning from $L^*$, all uninformed agents are made worse off by a change in $L$. Uninformed agents not only have an incentive to re-learn the rational inference parameters, they have an incentive to become informed. In curve C, adaptation will push the proportion of informed towards $\lambda_1$. Near $\lambda_1$, an uninformed agent can improve by adjusting her inference parameters. The possible improvement (via experimentation) is bounded by curve A (where agents use the correct inference parameters $\ell^*(L)$). Since any increase in the uninformed utility at $\lambda_1$ will make being uninformed more attractive, adaptation will cause $\lambda$ to fall. As long as $\lambda$ remains above $\lambda^*$, there is some possibility that experimentation will increase the utility of the uninformed. Thus the initial perturbation tends to cause an increase in $\lambda$ followed by a slow decline back to $\lambda^*$.

Larger experimentation in another agent's learning state might lead to curve D. In this case, the incentive of the uninformed to become informed is much larger. In fact, $\lambda$ will tend to 1, since even at $\lambda = 1$ an uninformed agent does worse than an informed agent. If perturbations of this kind are likely, then the GS equilibrium is unlikely to be stable. In order for adaptation to move $\lambda$ back towards $\lambda^*$, an agent must simultaneously experiment with being uninformed and experiment with inference parameters, $\ell^n$, that push the agent's utility above the informed (the region indicated by G). In the situation with curve C, many uninformed agents (ie. $1 - \lambda$) are attempting to improve the uninformed portion of their learning parameters. When $\lambda$ is close to 1, there are fewer agents attempting to learn how
to be uninformed.

In order for $L^*$ to be stable, it must be the case that perturbations in which all uninformed agents are in a situation characterized by curve D are unlikely. The size of perturbations in the learning process (described by $K$) relative to the slope of $f^u$ with respect to $\lambda$ determines this likelihood. The key parameter in determining the slope of $f^u$ is $\sigma_\epsilon$. When the noise in the risky asset supply is low, the conditional variance of $y$ given $P$, and therefore $f^u$, is less sensitive to changes in the proportion of agents who are informed. As $f^u$ becomes flatter, a fixed-size perturbation leads to a larger increase in $\lambda$. In addition, the slope also determines the size of the region labelled G in Figure 4.1. When the informativeness of the price is not very sensitive to $\lambda$, this region becomes small. When the size of G decreases and $\lambda$ is close to 1, the likelihood of an agent, through experimentation, discovering inference parameters which push her utility into this region (ie. above an informed agent) also decreases. This decreases the chance that $\lambda$ will return to the GS equilibrium level of $\lambda^*$.
The region G will play an important role in the next chapter, which considers stability more formally.

This discussion suggests that, for a given level of precision ($\Delta\lambda$ in (4.26)) and for a given always-experimenting stochastic adaptive learning process with some level of experimentation (indexed by $K$), there is a critical value for the level of noise trading, $\bar\sigma_\epsilon$, such that the GS rational expectations equilibrium is stable for all $\sigma_\epsilon > \bar\sigma_\epsilon$ and not stable for $\sigma_\epsilon < \bar\sigma_\epsilon$. This conjecture is explored in the next two chapters by analyzing appropriately chosen deterministic dynamics (Chapter 5) and a genetic algorithm as a simulated always-experimenting stochastic adaptive learning process (Chapter 6).

IV - Conclusions

This chapter has introduced the one-period single-asset model that is used to consider stochastic adaptive learning in a non-strategic setting. The model underscores the difference between learning an exogenous parametric relationship and learning an endogenous (equilibrium) relationship. While parametric relationships are unaffected by other agents, equilibrium relationships depend on the behaviour of other agents, and therefore learning must co-adapt or co-evolve. The chapter characterizes a stochastic adaptive learning process in a repeated version of the single-period model.

The chapter shows the existence of a stochastic adaptive learning process which converges to the GS equilibrium. The construction of the process requires that experimentation with inference parameters be less common than experimentation over the choice of being informed. The chapter also considers the stability of the GS equilibrium for a given learning process. The stability of the GS equilibrium depends on the size of experimentation in the agents' learning processes relative to the uncertainty in the risky asset supply.
The noise in the asset supply is important since it determines the sensitivity of the informativeness of the endogenous price to the proportion of informed agents. The next chapter develops this relationship more formally by considering the deterministic analogue to the stochastic learning process. Chapter 6 also develops this theme by developing a genetic algorithm simulation as an example of an always-experimenting stochastic adaptive learning process. The simulation results contain examples where the GS equilibria are stable under genetic algorithm learning, and others where they are unstable. In addition, the genetic algorithm is used to investigate whether a learning process will converge to the GS equilibrium from an arbitrary initial population learning state.

Chapter 5
The Stability of SAL: Deterministic Dynamics

The previous chapter concluded with the remark that for a given always-experimenting stochastic adaptive learning process, the stability of the rational expectations (or GS) equilibrium depends on the level of noise trading (and associated cost of information) relative to the level of experimentation in the learning process. This chapter formally develops this relation. In order to proceed with the analysis, however, a more specific description of learning processes is required. This chapter specifies learning processes using techniques developed by Binmore and Samuelson which approximate stochastic dynamics with deterministic differential equations. These differential equations encompass both adaptation and, as a proxy for noise, drift.^1 This chapter constructs an appropriate deterministic learning process which represents stochastic adaptive learning. After presenting a few properties of the learning process, the stability of the GS rational expectations equilibrium is considered.
In this context, stability is interpreted as meaning that the GS equilibrium lies near a stable dynamic equilibrium of the deterministic learning process.

A stochastic adaptive learning process for each agent, which defines the stochastic learning process for the economy, was described in the previous chapter as a sequence of learning states for each of the agents. As described, the economy learning process is a Markov system. Unfortunately, the state space of this process is extremely large and makes analysis difficult. Recall that in Chapter 3, analyzing the ergodic distribution for a four-agent, two-interactions system was almost intractable. Fortunately, in order to consider the stability of the GS equilibrium, it is not necessary to consider the entire Markov distribution of the stochastic learning process. Building on Binmore and Samuelson (1995), it is sufficient to focus on average system behaviour, which can be well represented by a deterministic system. The authors, in a game-theoretic model, analyze a stochastic evolutionary dynamic (ie. a stochastic evolutionary learning process) using deterministic differential equations. In their analysis, stochastic "noise" (or mutations) is replaced by deterministic "drift."

1. See Binmore and Samuelson (1995) and Gale, Binmore and Samuelson (1995).

The important result Binmore and Samuelson establish is that the equilibria of the deterministic system are arbitrarily close to the expected state of the stochastic process (for some finite but long time period). Their result does not necessarily apply to the "ultra-long run" (ie. the limiting behaviour as time is made arbitrarily large) if the deterministic system has multiple stable equilibria. In this case stochastic shocks, however improbable, can move the system between the regions of the deterministic equilibria.
Without specifying the noise more precisely, one cannot say in which region the stochastic system will, on average, reside. Similar to Chapter 3, the relative likelihood of stochastic shocks (possibly multiple mutations) that can move the system from the region of one deterministic equilibrium to another would need to be specified. However, when the deterministic system has a unique stable dynamic equilibrium, the exact specification of the (small levels of) noise is not important, since on average the stochastic system will remain close to the deterministic dynamic equilibrium.

The stochastic adaptive learning process for the GS model will be approximated by the system of deterministic equations presented below. The state variables of the dynamic system are the proportion of agents who are informed, $\lambda_t$, and the vector of average inference parameters used by the agents, denoted $\ell_t$. This section will maintain the assumption made in Section 4.III(D) of Chapter 4 that the informed agents know (and use) the correct inference parameters (ie. the informed know the correct $\beta$'s). Again, of the three parameters which describe an uninformed agent's linear demands, the $\gamma^u$ parameter is set according to (4.12). Therefore, the state variable $\ell \in \mathbb{R}^2$ need only represent the two inference parameters of the uninformed agents ($\ell_0^u$ and $\ell_1^u$). The deterministic dynamic equations governing the evolution of $(\lambda_t, \ell_t)$ are

$$\dot\lambda_t = g_\lambda(\lambda_t, \ell_t) + \eta_\lambda m_\lambda(\lambda_t, \ell_t) \qquad (5.1)$$

$$\dot\ell_t = g_\ell(\lambda_t, \ell_t) + \eta_\ell m_\ell(\lambda_t, \ell_t) \qquad (5.2)$$

where $g_\lambda$, $m_\lambda$, $g_\ell$ and $m_\ell$ are continuously differentiable and bounded functions of $\lambda$ and $\ell$. That is, $g_\lambda, m_\lambda : [0,1]\times\mathbb{R}^2 \to \mathbb{R}$ (appropriately restricted to ensure that $\lambda_t \in [0,1]$) and $g_\ell, m_\ell : [0,1]\times\mathbb{R}^2 \to \mathbb{R}^2$. The $g$ functions represent adaptation; the $m$ functions are drift and capture the effect of the noise in the learning process. The $\eta_\lambda$ and $\eta_\ell$ are positive constants which determine the level of drift in the system.
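The system (5.1)-(5.2) can be explored with a simple Euler integration. The functional forms below are toy assumptions chosen only to respect the sign conventions of the model (fewer agents stay informed when the uninformed do better, inference errors decay, drift pushes the proportion of informed agents toward 1/2); they are not the thesis's $g$ and $m$, and the inference state is a scalar here rather than a vector in $\mathbb{R}^2$.

```python
def integrate(g_lam, g_ell, m_lam, m_ell, eta_lam, eta_ell,
              lam, ell, dt=0.01, steps=5_000):
    """Euler integration of the deterministic learning dynamic (5.1)-(5.2):
    adaptation terms g plus drift terms m scaled by the constants eta."""
    for _ in range(steps):
        lam += dt * (g_lam(lam, ell) + eta_lam * m_lam(lam, ell))
        ell += dt * (g_ell(lam, ell) + eta_ell * m_ell(lam, ell))
        lam = min(max(lam, 0.0), 1.0)            # keep lam in [0, 1]
    return lam, ell

# Toy functional forms (assumed for illustration):
fu = lambda lam, ell: 1.0 + (lam - 0.4) - ell ** 2   # uninformed fitness
g_lam = lambda lam, ell: lam * (1.0 - lam) * (1.0 - fu(lam, ell))
g_ell = lambda lam, ell: -ell                         # inference errors decay
drift = lambda lam, ell: 0.5 - lam                    # drift toward 1/2
no_drift = lambda lam, ell: 0.0

lam_pure, ell_pure = integrate(g_lam, g_ell, drift, no_drift,
                               0.0, 0.0, lam=0.9, ell=0.5)
lam_drift, _ = integrate(g_lam, g_ell, drift, no_drift,
                         0.2, 0.0, lam=0.9, ell=0.5)
```

With the drift constants set to zero the system settles where the uninformed fitness equals 1 (here at 0.4); turning on drift in the first state variable shifts the resting point away from that value, which is how the deterministic representation proxies for non-vanishing experimentation.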
Each of these functions is considered in more detail below.

Equation (4.12) in Chapter 4 calculates the optimal inference parameters given a population learning state. Since the optimal inference parameters depend only on the average inference parameters and the proportion of informed agents, the vector of optimal inference parameters is a function of the state variables of the dynamic system.^4 The function giving the optimal inference parameters for uninformed agents is denoted $\ell^*(\lambda, \ell)$, where $\ell^* : [0,1]\times\mathbb{R}^2 \to \mathbb{R}^2$. The rational expectations parameters, which are the solution to the fixed-point problem $\ell = \ell^*(\lambda, \ell)$, are $\ell^{**}(\lambda)$. Note that in this notation, the inference parameters in the GS equilibrium $L^*$, which has proportion $\lambda^*$ informed agents, are simply $\ell^{**}(\lambda^*)$. Similarly, the fitness or utility of the uninformed agent (relative to the informed) depends on $\lambda$, $\ell^*(\lambda, \ell)$ and the inference errors made by the specific agent. Therefore, the utility of the average uninformed agent depends on $\lambda$, $\ell$ and $\ell^*(\lambda, \ell)$. Equation (4.16) for the average uninformed agent in the economy can be stated as a function of the state variables of the dynamic system, $f^u(\lambda, \ell)$.^5

2. Time subscripts are suppressed unless necessary.
3. Continuous differentiability implies Lipschitz continuity and ensures that there is a unique continuously differentiable solution to the equations, $(\lambda, \ell)(\lambda_0, \ell_0, t)$, which gives the state of the system after time $t$ with initial condition $(\lambda_0, \ell_0)$. See Binmore and Samuelson (1995) or Waltman (1986).
4. Recall that all informed agents are assumed to have identical and correct inference parameters. In addition, note that optimal inference parameters and fitness are both based on expected realized utility. That is, expectations are calculated using the true parameters and are not based on agent beliefs.
5. While optimal inference parameters depend only on $\lambda$ and $\ell$, if agents have different inference parameters, then the average $\ell$ will depend on which specific agents are informed (even for the same $\lambda$). The noise that is introduced by altering which agents are informed is in principle no different from the noise introduced by the stochastic experimentation. In the deterministic learning dynamic considered in this section, all sources of noise are captured by the drift term.

In this chapter, the parameters of the economy (as opposed to those of the learning process) are denoted $E$. From the discussion at the close of the previous chapter, the economic parameter of interest is the level of noise in the risky asset supply (noise trading), $\sigma_\epsilon$.
In order to facilitate discussion regarding the stability of the GS equilibrium for various levels of noise trading, the GS rational expectations equilibrium proportion of informed agents, λ*, is held constant. This is done by simultaneously adjusting the cost of information, c, using equation (4.21).⁶ All other parameters (for example, dividend mean and variance) in the economy are held fixed. Therefore, the economy can be summarized by E = (σ_ε, λ*), where the value of c is implied by σ_ε and λ*.

⁴ Recall that all informed agents are assumed to have identical and correct inference parameters. In addition, note that optimal inference parameters and fitness are both based on expected realized utility. That is, expectations are calculated using the true parameters and are not based on agent beliefs.
⁵ While optimal inference parameters depend only on λ and ℓ, if agents have different inference parameters, then the average ℓ will depend on which specific agents are informed (even for the same λ). The noise that is introduced by altering which agents are informed is in principle no different from the noise introduced by the stochastic experimentation. In the deterministic learning dynamic considered in this section, all sources of noise are captured by the drift term.

I - Learning by Pure Adaptation

The deterministic learning process in (5.1) and (5.2) contains four functions. Each of the four functions will be described in more detail. Initial focus is on the functions g_λ and g_ℓ, which represent the adaptive component of the learning process.
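As a concrete illustration (a sketch only, not the thesis's exact specification, although a footnote later in the section notes that the plotted examples use replicator dynamics for λ), a replicator-style adaptation term satisfies the assumptions that follow: it vanishes at the boundaries, and its sign is governed by the uninformed agents' relative fitness.

```python
def g_lambda(lam, f_u):
    """Replicator-style adaptation for the proportion of informed agents.
    f_u is the uninformed agents' fitness relative to the informed
    (whose fitness is normalized to 1).  The lam*(1-lam) factor makes
    the dynamic vanish at lam = 0 and lam = 1, and the proportion of
    informed agents falls exactly when the uninformed do better (f_u > 1).
    """
    return lam * (1.0 - lam) * (1.0 - f_u)
```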
After presenting some assumptions on these functions, the stable dynamic equilibria of the adaptation-only learning process (η_λ = η_ℓ = 0) are presented.

First consider the adaptive component of the dynamic for the proportion of informed agents. In order to interpret g_λ as adaptation, the following two assumptions are made.

    lim_{λ→0} g_λ = lim_{λ→1} g_λ = 0    (5.3)

    for λ ∈ (0,1):  g_λ ≥ 0 ⟺ f^u(λ,ℓ) ≤ 1  and  g_λ < 0 ⟺ f^u(λ,ℓ) > 1    (5.4)

⁶ It is straightforward to rewrite (4.21) such that c is a function of σ_ε and λ*.

The first assumption ensures that (in the case of η_λ = 0) λ_t remains in [0,1]. It is also consistent with proportional strategy growth, which is a typical assumption in evolutionary models.⁷ The second, and more important, assumption states that the proportion of informed agents decreases if and only if uninformed agents receive higher utility than the informed agents (recall that f^I = 1 by a normalization). This is consistent with the interpretation that g_λ captures adaptive or imitative learning.

The region where f^u ≥ 1 plays an important role in determining the behaviour of the learning dynamic. The following analysis identifies some properties of this region. For the economy E, define

    G(E) = {(λ,ℓ) | f^u(λ,ℓ) ≥ 1}    (5.5)

The following lemma establishes the useful properties of G(E). For this chapter, all of the proofs are in Appendix C.

Lemma 1  For a given economy, E, where σ_ε > 0, the set G(E) has the following properties:
(a) G(E) is closed,
(b) G(E) is not empty,
(c) For λ > λ*, there is a neighbourhood⁸ of (λ, ℓ**(λ)) that is in G(E), and

⁷ See, for example, the replicator dynamics discussed in Chapter 3.
⁸ In this chapter, neighbourhoods are defined with respect to the Euclidean norm.
Note that since λ ∈ [0,1], the ε-neighbourhood of (λ′, ℓ′) should be interpreted as all points (λ, ℓ) within distance ε of (λ′, ℓ′) and such that 0 ≤ λ ≤ 1.

(d) The only (λ,ℓ) ∈ G(E) with λ ≤ λ* is (λ*, ℓ**(λ*)).

An important property of the set G(E) is that its size depends on the level of noise trading in the economy, σ_ε. The set G(E) is the region where uninformed agents are experiencing higher utility than informed agents. This occurs when λ > λ*. Since the price of the risky asset with λ informed agents is more informative about the signal than at λ*, it is more economical to infer the signal imperfectly than to bear the cost, c, and be informed. To the extent that inference errors, ℓ − ℓ*(λ,ℓ), are made, the utility of the uninformed is reduced. However, with small inference errors, uninformed agents can still obtain higher utility than informed agents at λ > λ* (Lemma 1(d)). The following function measures the maximum allowable inference error that can be made and still leave the uninformed agent at least as well off as the informed agent. For λ ≥ λ*, define Δ_λ(G(E)) as follows,

    Δ_λ(G(E)) = sup_{(ℓ,ℓ′)} { ‖ℓ − ℓ′‖ : (λ,ℓ), (λ,ℓ′) ∈ G(E) }    (5.6)

The argsup will lie on the boundary of G(E). From Lemma 1(d), note that Δ_λ*(G) = 0 since G is a singleton at λ*. Similarly, Δ_λ is not defined for λ < λ* since G is empty for these values of λ.

As the level of noise trading decreases, holding constant the GS equilibrium proportion of informed agents, λ*, the maximum allowable inference error decreases. In this sense G(E) gets smaller. In particular, as the level of noise trading approaches zero (holding λ* constant), so does the maximum allowable inference error. That is, at σ_ε = 0 (which for λ* > 0 implies c = 0) the maximum utility of an uninformed trader is equal to that of the informed. This level is achieved only if no inference errors are made. Thus as noise trading vanishes, G(E) collapses.
This is stated in the following Lemma.

Lemma 2  For a given GS equilibrium proportion of informed agents, λ* ∈ (0,1), for all λ > λ*,

    lim_{σ_ε→0} Δ_λ(G(E(σ_ε, λ*))) = 0    (5.7)

Next, consider the adaptive learning process of the (average) inference parameters, which is represented by g_ℓ(λ,ℓ). The following assumptions on g_ℓ are made. First, for a given λ ∈ (0,1), the solution to the differential equation dℓ/dt = g_ℓ (ie. (5.2) with η_ℓ = 0) is such that

    lim_{t→∞} ℓ_t = ℓ**(λ)    (5.8)

In addition,

    ∂‖g_ℓ‖/∂λ < 0    (5.9)

    lim_{λ→1} ‖g_ℓ‖ = 0    (5.10)

The first assumption states that adaptation (holding the proportion of informed agents constant) will converge to the rational expectations inference parameters. This rules out, for example, unstable cobweb-cycle adjustment dynamics. This assumption is required for the dynamic learning process to have a stable dynamic equilibrium near the GS equilibrium. The assumptions in (5.9) and (5.10), which are important to the results which follow, state that the speed of learning decreases (eventually to zero) as fewer agents are uninformed.⁹ Recall that adaptation is based on copying successful agents. If there are only a few uninformed agents exploring for the inference parameters, the learning rate slows. The implication of these assumptions is that adaptive forces are always pushing ℓ towards ℓ*(λ,ℓ) at a rate which depends on the proportion of uninformed agents.

⁹ The speed is represented by the Euclidean norm on R², which is denoted ‖·‖.

Similar to the set G, the set H will be useful to describe the results which follow.
Define

    H = {(λ,ℓ) | g_ℓ(λ,ℓ) = 0}    (5.11)

Note that for λ ∈ (0,1), the assumption that g_ℓ will converge to the correct inference parameters is equivalent to: (λ,ℓ) ∈ H if and only if ℓ = ℓ**(λ).

II - Driftless Stable Dynamic Equilibria

Before investigating the equilibria in the learning process with drift, the purely adaptive or driftless equilibria are considered by setting η_λ = η_ℓ = 0 in (5.1) and (5.2). In this case the set of stable dynamic equilibria reflects two types of behaviour for the learning system. The first type is the GS equilibrium. The second type is where λ = 1 (ie. all agents are informed) and inference errors are non-zero. In this case, the inference errors are moot since all agents are informed. After defining dynamic equilibria and stable dynamic equilibria, these equilibria are characterized.

Definitions  (λ, ℓ) is a dynamic equilibrium if it is a fixed point of the deterministic learning process in (5.1) and (5.2); that is, dλ/dt = dℓ/dt = 0.

The dynamic equilibrium (λ, ℓ) is stable if for any neighbourhood R of (λ, ℓ) there exists a neighbourhood Q of (λ, ℓ) such that any trajectory of (5.1) and (5.2) originating in Q remains in R.

The dynamic equilibrium (λ, ℓ) is asymptotically stable if it is stable and there exists a neighbourhood Q of (λ, ℓ) such that all trajectories of (5.1) and (5.2) that originate in Q converge to (λ, ℓ).¹⁰

¹⁰ Note that the existence of a neighbourhood Q such that trajectories beginning in Q converge to the dynamic equilibrium is not sufficient to imply stability. Consider the example dx/dt = −x for x_t irrational or zero, and dx/dt = 1/x_t for x_t rational and non-zero. All points converge to 0. However, the system is not stable.
The definition of asymptotically stable rules out such cases.

Proposition 1  Given an economy, E, with σ_ε > 0, the set of stable dynamic equilibria for (5.1) and (5.2) when η_λ = η_ℓ = 0 is {(λ*, ℓ**(λ*))} ∪ {(1,ℓ) | (1,ℓ) ∉ G(E)}.

Figure 5.1 describes this result. (Figures begin on page 165.) The λ is represented on the horizontal axis and the inference error (ℓ − ℓ**(λ)) is on the vertical axis. Note that in order to keep the diagram simple, only one dimension of the (two dimensional) inference error is presented. The shaded area is the set G and represents where dλ/dt = g_λ ≤ 0. The line where inference errors are zero is the set H where dℓ/dt = g_ℓ = 0. The arrows represent the direction of movement in the system. The stable dynamic equilibria occur either at the GS rational expectations equilibrium, (λ*, 0), or at the boundary where all agents are informed. Figure 5.2 represents phase trajectories for specific g_λ and g_ℓ.¹¹ Each line on the diagram represents the evolution of the system for a different initial λ and ℓ. The two types of stable dynamic equilibria are where the phase lines converge (the GS equilibrium) or reach the boundary (λ = 1) outside the set G.

III - Learning with Adaptation and Drift

While the driftless dynamic is illustrative, it does not capture the nature of stochastic adaptive learning. Noisy experimentation will prevent the system from converging to λ = 1 since agents continue to experiment with being uninformed. For this reason, inference

¹¹ These functions are the replicator dynamics for the evolution of λ and a "best response" dynamic for the updating of inference parameters. These functions are chosen only to illustrate the more general points of this section. The inference parameter space (to avoid higher dimensional diagrams) is for the intercept parameter, α^u.
In these simulations, inference errors on the slope parameter, β^u, are held at zero.

parameter learning will not completely cease. However, experimentation (in information choice or inference parameters) also makes the inference parameters "harder" to learn since experiments change the optimal inference parameters. By setting η_λ > 0 and η_ℓ > 0 in (5.1) and (5.2) and choosing appropriate drift functions m_λ and m_ℓ, the deterministic learning process captures both of these features of the stochastic adaptive learning process.

The drift in the proportion of informed agents, m_λ(λ,ℓ), is assumed to be such that

    lim_{λ→1} m_λ(λ,ℓ) < 0    (5.12)

    lim_{λ→0} m_λ(λ,ℓ) > 0    (5.13)

for all ℓ. Unlike adaptation, drift can lead to fewer (more) informed agents even if informed agents have higher (lower) relative fitness. This is because drift captures the experimentation which, unlike adaptation, is not completely fitness driven. The assumptions on m_λ capture the continual experimentation in a stochastic adaptive learning process by acting to force λ away from zero and one.

The primary learning forces are from adaptation. Noisy experimentation and its proxy, drift, should be thought of as small. It is useful to characterize the effect of drift and clarify what "small" means. For λ < 1, the set G contains all points such that g_λ(λ,ℓ) ≤ 0.
When drift is non-zero, it is useful to define the set G′:

    G′ = {(λ,ℓ) | g_λ(λ,ℓ) + η_λ m_λ(λ,ℓ) ≤ 0}    (5.14)

The following lemmas show that for small levels of drift, G is arbitrarily close to G′ except near λ = 1.

Lemma 3  For η_λ > 0 and for all ℓ, there exists ε̄ such that ε < ε̄ implies (1−ε, ℓ) ∈ G′.

Lemma 4  Given ν > 0 and λ″ < 1, there exists η̄ > 0 such that for all η_λ < η̄:
(a) for all (λ′, ℓ′) ∈ G′ with λ′ < λ″, there exists (λ, ℓ) ∈ G such that ‖(λ,ℓ) − (λ′,ℓ′)‖ < ν, and
(b) for all (λ, ℓ) ∈ G with λ < λ″, there exists (λ′, ℓ′) ∈ G′ such that ‖(λ,ℓ) − (λ′,ℓ′)‖ < ν.

The aspect of experimentation that drives inference parameters towards the rational levels has already been captured in the specification of adaptive learning, g_ℓ. Thus drift will capture the aspect of experimentation that makes learning more difficult by altering the rational inference parameters, ℓ*(λ,ℓ).¹² In particular, this portion of experimentation is captured by the assumption that drift pulls the inference parameters towards some arbitrary (incorrect) value, ℓ_m. That is, the solution to dℓ/dt = m_ℓ(λ,ℓ) has, for all λ, a unique asymptotically stable dynamic equilibrium at ℓ = ℓ_m. The constant ℓ_m is chosen such that ℓ_m ≠ ℓ**(λ) for all λ.

¹² Recall from the previous chapter that when one agent modified the inference parameters, all uninformed agents were made worse off.

Recall that H = {(λ,ℓ) | g_ℓ = 0}. Define H′ as follows,

    H′ = {(λ,ℓ) | g_ℓ(λ,ℓ) + η_ℓ m_ℓ(λ,ℓ) = 0}    (5.15)

Since ℓ ∈ R², the set H′, in general, may be empty for some values of λ (ie. the solutions to a system of non-linear equations may not exist). Therefore, the assumption is made that for all fixed λ ∈ (0,1], g_ℓ and m_ℓ are such that the set of dynamic equilibria of the inference parameters is a singleton and that this point is also asymptotically stable. (The appendix includes discussion which considers this assumption further.) The function h(λ) will denote the asymptotically stable inference parameters for the given λ.
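For intuition about how the stationary inference parameters h(λ) balance adaptation against drift, consider a hypothetical linear specification (not the thesis's general one): adaptation pulls toward the rational parameters at a speed that vanishes as λ → 1, while drift pulls toward the arbitrary value ℓ_m. Setting the combined dynamic to zero gives h(λ) in closed form as a weighted average, mirroring the lemmas that follow.

```python
import numpy as np

def h(lam, ell_re, ell_m, eta=0.01):
    """Stationary inference parameters under hypothetical linear dynamics:
    g_ell = (1 - lam) * (ell_re(lam) - ell)   # adaptation, speed -> 0 at lam = 1
    m_ell = ell_m - ell                       # drift toward the wrong value
    Solving g_ell + eta * m_ell = 0 yields a weighted average that equals
    ell_m at lam = 1 and approaches ell_re(lam) as eta -> 0."""
    s = 1.0 - lam                  # adaptation speed, vanishing at lam = 1
    return (s * ell_re(lam) + eta * ell_m) / (s + eta)
```

In this linear case h(1) = ℓ_m, h(λ) differs from the rational parameters whenever η > 0, and shrinking η drives h(λ) toward them, which is the pattern the general lemmas establish.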
Previously, it was assumed that g_ℓ(λ,ℓ) = 0 only at ℓ = ℓ**(λ) and m_ℓ(λ,ℓ) = 0 only at ℓ = ℓ_m. The assumption that h(λ) exists and is unique is analogous to these assumptions. It would not be surprising (or interesting) to find that the GS rational expectations equilibrium was not stable given a learning process (with or without drift) that is doomed never to converge. Thus the issue of the stability of the GS equilibrium is restricted to the case where the convergence of the learning process is assured.

The following lemma shows that h(λ) is a continuous function of λ.

Lemma 5  The function h : (0,1] → R² is continuous.

Corresponding to the learning dynamic controlling λ, drift (as a proxy for continued experimentation) has only a small effect on the inference parameters. That is, drift can be made small enough to ensure that for any fixed λ < 1, the vector of asymptotically stable inference parameters in the system with drift, h(λ), is close to that of the driftless system, ℓ**(λ). The following lemmas show that the dynamic equilibria (and therefore the asymptotically stable equilibria) of the inference process with drift lie close to, but not at, the rational expectations level except at λ = 1.

Lemma 6  For η_ℓ > 0, h(1) = ℓ_m.

Lemma 7  For λ ∈ (0,1], h(λ) ≠ ℓ**(λ).

Lemma 8  If drift, m_ℓ(λ,ℓ), is a bounded function, then given Δ > 0 and λ″ < 1, there exists η̄ > 0 such that for all η_ℓ < η̄, ‖h(λ) − ℓ**(λ)‖ < Δ for all λ < λ″.

Learning processes with the assumptions and properties described above will be summarized by three parameters. The set of learning processes L(λ″, Δ, ℓ_m), with λ″ > λ*, Δ > 0 and ℓ_m ≠ ℓ**(λ), denotes learning processes described by (5.1) and (5.2) with positive drift (η_λ > 0 and η_ℓ > 0) and drift parameter ℓ_m.
These learning processes have the properties that for all λ < λ″, G is within Δ of G′ (see Lemma 4) and h(λ) is within Δ of ℓ**(λ) (see Lemma 8).

Before considering the stability of the GS rational expectations equilibrium, it is helpful to characterize the stable dynamic equilibria for the learning process with non-zero drift. The equilibria occur at the intersection of the boundary of the G′ set and the stationary points of the inference parameter dynamic, h(λ). The following proposition establishes that at least one stable dynamic equilibrium exists for the learning process with non-zero drift.¹³

Proposition 2  For a given economy E with σ_ε > 0, there exists at least one stable dynamic equilibrium to the learning process, L(λ″, Δ, ℓ_m).

IV - The Stability of the GS Equilibrium

The GS rational expectations equilibrium is not a dynamic equilibrium of a learning process with drift. This follows directly from Lemma 7. At the GS equilibrium values, there are no adaptive forces acting on the agents, but drift (however small) will pull the system away from the GS state. However, stable dynamic equilibria of the learning process with drift may lie (arbitrarily) close to the GS equilibrium. A characterization of how well the GS

¹³ In general, h(λ) will intersect ∂G′ q ≥ 1 times, where q is odd and (q+1)/2 of these are stable dynamic equilibria. The proof is very similar to the proof of Proposition 4. For the same reasons that the boundary of G′ and h(λ) must cross once, they must cross an odd number of times. For each (λ,ℓ) where a crossing occurs, only those points where (λ−ε,ℓ) ∈ G′ are stable. This condition occurs on the initial intersection and then on alternating intersections.
Since this is not crucial to the analysis, this point is not developed formally.

rational expectations solution describes an economy with learning agents (or vice versa) is the distance between the GS equilibrium and stable dynamic equilibria of the learning process. Since the deterministic learning process is a good approximation of a stochastic adaptive learning process (for small amounts of noise or "mutations"), this provides insight into the stability of the GS equilibrium under always-experimenting stochastic adaptive learning processes.

The following theorems state that for any economy there is a learning process (with drift) such that all stable dynamic equilibria are close to the GS equilibrium. In addition, for any learning process with a property discussed below, there is an economy for which the GS equilibrium is not close to the set of stable dynamic equilibria. These theorems capture the intuition of the previous chapter (and the genetic algorithm simulations in the following chapter) that both the level of noise trading in the economy and the level of experimentation (captured in drift) are important in determining when stochastic adaptive learning and rational expectations will yield similar predictions.

In the previous chapter, equilibria for a stochastic adaptive learning process with vanishing levels of experimentation were considered. In particular, in order to ensure that in the limit as experimentation vanished the process converged to the GS equilibrium, it was important that experimentation in the learning parameters became small faster than experimentation in the information choice. (See the discussion of Proposition 3 in Chapter 4 on page 131.) A similar condition is important for the stability of the GS equilibrium in the deterministic representation of the learning process used in this chapter.
A necessary condition for an arbitrary learning process not to have its stable dynamic equilibria close to the GS equilibrium (for some economies) is that drift on the inference parameters be roughly the same size as drift in the information choice. Formally stated, this condition is as follows. For a learning process, L(λ″, Δ, ℓ_m), for all λ < λ″:

    inf_{λ<λ″} { ‖h(λ) − ℓ**(λ)‖ } > sup { ‖(λ,ℓ) − (λ,ℓ′)‖ : (λ,ℓ) ∈ ∂G, (λ,ℓ′) ∈ ∂G′ }    (5.16)

where ∂ indicates the boundary of a closed set.

Theorem 1  Given any ε > 0 and economy E = (σ_ε, λ*) with σ_ε > 0 and λ* ∈ (0,1), there exist λ″, Δ″ and ℓ″_m such that, for all λ′ > λ″, Δ′ < Δ″ and ‖ℓ_m − ℓ**(1)‖ < ‖ℓ″_m − ℓ**(1)‖, all the dynamic equilibria of the learning process L(λ′, Δ′, ℓ_m) are within distance ε of the GS equilibrium, (λ*, ℓ**(λ*)).

Theorem 2  For a given learning process L(λ″, Δ, ℓ_m) with 0 < λ* < λ″ < 1 and the property in (5.16), there exists σ̄ such that, for all economies E = (σ_ε, λ*) with σ_ε < σ̄, all the dynamic equilibria, (λ′, ℓ′), have the property that λ′ > λ″.

Since the driftless learning process has an asymptotically stable fixed point at the GS rational expectations equilibrium, Theorem 1 is not particularly remarkable. A learning process with arbitrarily small drift has all its equilibria arbitrarily close to the GS rational expectations equilibrium. What is notable is Theorem 2. For many learning processes, even ones with arbitrarily small drift (like those constructed by Theorem 1), there are economies for which all the stationary points (stable or not) are well away from the GS rational expectations equilibrium. In fact, choosing λ″ close to 1 implies that economies exist for which the dynamic equilibrium will have almost all agents informed.
The economies for which this is true are those with a small level of noise trading, σ_ε.

Note that in both Theorem 1 and 2 all the dynamic equilibria reside in a small neighbourhood (either close to the GS value or with λ ≈ 1) and are not necessarily unique. From Proposition 2, at least one of these fixed points is a stable dynamic equilibrium. The introductory comments of this chapter mentioned that a deterministic dynamic may not capture the limiting behaviour of a stochastic system when there are multiple equilibria to the dynamic system. Multiple deterministic dynamic equilibria may imply that the exact specification of the noise matters, since noise can (informally stated) bounce the system between basins of attraction of the deterministic equilibria. However, in this case, all stable fixed points of the learning dynamic are close together. Thus in the case where all the equilibria lie close to the GS equilibrium, the exact specification of noise (not drift) will not matter. Noise may move the system between some (not necessarily all) of the regions of attraction of the deterministic equilibria. However, since all equilibria are close to the GS equilibrium, there will be little perceptible difference between the deterministic representation and the underlying stochastic adaptive process. In this respect the GS equilibrium is stable. Similarly, in the case when all fixed points of the deterministic learning process lie away from the GS value, the exact specification of noise will not matter. Since none of the fixed points are close to the GS value, the limiting behaviour of the underlying stochastic process will not concentrate on the GS equilibrium. In this case, the GS equilibrium is unstable.

Figure 5.3 and Figure 5.4 (see page 166) provide an example of when the GS equilibrium is stable.
Note that all the trajectories in Figure 5.4 (including those which begin near the λ = 1 border) converge to a point just below the GS equilibrium (which in these examples is λ* = 0.5 and ℓ − ℓ**(λ*) = 0). In contrast, Figure 5.5 and Figure 5.6 describe an example where the unique dynamic equilibrium is near λ = 1. In this case all trajectories (including those initiating near the GS equilibrium) converge to this point.

Finally, the situation depicted in Figure 5.7 and Figure 5.8 is an example where ∂G′ and h(λ) intersect three times. The first intersection, near the GS equilibrium, is stable, as is the third, which is near λ = 1. In this case the limiting behaviour depends on the initial condition. Furthermore, the exact specification of noise in the underlying stochastic process can influence which of these equilibria best characterizes the limiting behaviour of the stochastic system.

V - Conclusion

The previous chapter developed the GS rational expectations model and described stochastic adaptive learning. That chapter described the relation between economic parameters and the learning process that affect the stability of the GS rational expectations equilibrium. In particular, stability depended on the relation between noise in the supply of the risky asset (noise traders) and the level of experimentation.

This chapter has developed the relationship among noise trading, experimentation and the stability of the GS equilibrium by approximating the stochastic adaptive learning process with a deterministic one. In lieu of randomness, drift is introduced to capture the effects of random experimentation. The important result established is that for a given level of drift (a proxy for experimentation), the GS equilibrium may be stable or unstable. In this chapter the stability of the GS equilibrium is measured by the distance of the GS equilibrium from the stable fixed points of the learning process.
The stability is determined by the level of noise trading. For large (small) levels of noise trading, the GS equilibrium is stable (unstable). The stability is driven by the size of the set G, where uninformed agents' utility (perhaps with inference errors) exceeds that of the informed agents. When noise trading is high and the proportion of informed agents increases above the GS equilibrium value, uninformed agents have more room for mistakes in their inference parameters. The size of mistakes depends on the level of drift in the learning process. When the uninformed have room for error, the proportion of uninformed agents remains close to the GS equilibrium value, λ*. More importantly, the proportion of uninformed agents does not become very small. Given this, the learning of the inference parameters of the uninformed agents is driven primarily by adaptation, and adaptation ensures that the inference parameters remain close to their rational expectations levels.

The next (and penultimate) chapter continues with the GS model and attacks the question of stability using simulations. In Chapter 6, a genetic algorithm is used as a specific example of a stochastic adaptive learning process. While this chapter approximates a general stochastic adaptive process with deterministic dynamics, the following chapter investigates a specific, but stochastic, learning process.

Figure 5.1 - Driftless Learning Process
The arrows are for a typical driftless dynamic. The equilibria are at the GS values (λ*, 0) or at λ = 1.

Figure 5.2 - Phase Plots in Driftless Learning Process
Each line represents a different initial condition.
Note all lines converge to the GS equilibrium (λ* = 0.5) or the λ = 1 boundary.

Figure 5.3 - Learning Process with Drift
The unique dynamic equilibrium is near the GS values (λ*, 0) (denoted by the dot).

Figure 5.4 - Phase Plots in Learning Process with Drift
Each line represents a different initial condition. Note all lines converge to a point which is close to the GS equilibrium.

Figure 5.5 - Learning Process with Drift
The unique dynamic equilibrium (denoted with the dot) is far from the GS values.

Figure 5.6 - Phase Plots in Learning Process with Drift
Each line represents a different initial condition. Note all lines converge to a point close to λ = 1.

Figure 5.7 - Learning Process with Drift
There are three dynamic equilibria. Two are stable.

Figure 5.8 - Phase Plots in Learning Process with Drift
Each line represents a different initial condition. Lines converge close to the GS equilibrium or close to λ = 1.

Chapter 6
Genetic Algorithm Learning in a Rational Expectations Model

The previous chapter, building on the model in Chapter 4, analytically described the relationship between economic and learning parameters that determine the stability of the Grossman-Stiglitz (GS) rational expectations equilibrium. In order to perform the analysis, the stochastic adaptive learning process was approximated by an appropriately chosen deterministic system.
In addition, the problem was simplified by the assumption that informed agents know the exogenous signal-dividend relationship. This chapter relaxes both of these assumptions and analyses stochastic adaptive learning using a genetic algorithm simulation. The simulations support the analysis of the previous two chapters, demonstrating the relationship between the level of noise trading (or noisy supply of the risky asset) and the stability of the rational expectations equilibrium.

Learning, Evolution and Self-Organization   Genetic Algorithm Learning in a Rational Expectations Model

As discussed in Chapter 1, the genetic algorithm is an example of a stochastic adaptive learning process. Genetic algorithms, introduced by Holland (1975), are biologically inspired stochastic optimization techniques. However, genetic algorithms can be interpreted as simulating the adaptive learning process where successful strategies proliferate and unsuccessful strategies die out. They also generate novel strategies by combining and mutating existing strategies in an efficient search of the strategy space. Genetic algorithms have intuitive appeal as a learning process since they employ both imitation and experimentation, and they have been successfully used in many optimization applications.¹ Their success in demonstrating and augmenting the theoretical results suggests such algorithms are potentially useful techniques for analyzing more realistic and complex models. This chapter describes the genetic algorithm application of the GS model and presents the simulation results.

Stochastic adaptive learning models have been used successfully to model learning of complex tasks and are becoming more common in economics. Many of these models use genetic algorithms.
Axelrod (1987), for example, used genetic algorithms to choose an optimal strategy for the repeated prisoner's dilemma game in a fixed population of strategies.² Slade and Eaton (1990) consider the pricing behaviour of oligopolists, and Arifovic and Eaton (1994) use a genetic algorithm in a complex coordination problem. Marimon, McGrattan and Sargent (1990) consider adaptive agents in a monetary economy. Arifovic (1989) simulates learning in a rational expectations model (based on Bray, 1982) using a genetic algorithm to perform least squares learning. Arifovic (1994a) looks at genetic algorithms and cobweb cycles, and Arifovic (1994b) and Duffy and Bullard (1994) implement genetic algorithms in overlapping-generations models.³

I - Simulation Design

Using the model and notation introduced in Chapter 4, the genetic algorithm, as a stochastic adaptive learning process, generates a population learning state, L_{t+1}, based on the current state L_t. The genetic algorithm updates agents' learning states based on three operators: selection, crossover and mutation. From the point of view of agent n, Lⁿ_{t+1} is determined as follows. First, with probability proportional to fitness, two learning states from L_t are selected. These states are combined using crossover, which takes different parts of each of the two learning states. Finally, mutation perturbs some of the elements of the combined

¹ See Goldberg (1989), Chapter 4, for a list of applications.
² The strategies that formed the environment for this study were the eight best strategies from the Axelrod (1984) tournament. Axelrod also considered the evolution of a population of agents playing the repeated prisoner's dilemma model.
learning state, yielding ℓ^n_{t+1}. The selection operator captures the imitation or adaptation component of a learning process, while the crossover and mutation drive the experimentation. The crossover step helps to guide experimentation to the most promising area in the space of possible learning states, and mutation ensures that the learning process is always experimenting. Note that when the population consists of agents with similar inference parameters, crossover has little effect. It is the mutation rate which determines the limiting size of experimentation.

Simulations begin with the creation of an initial learning state for each agent in the population. In Simulations One and Two, the initial learning state of each agent contains the GS rational expectations equilibrium values (i.e. L_0 = L*). In Simulations Three and Four, initial learning states for each agent are selected randomly. Note that agents are completely characterized by their learning states. In the genetic algorithm, each agent's learning state is represented by a binary string (chromosome). One bit (gene) determines the agent's information choice. The remaining bits are the binary representation of the real-valued parameters of ℓ^n in (4.9).

[3] Other examples using the prisoner's dilemma and genetic algorithms include Miller (1987), Marks (1989a, 1989b) and Chapter 2 in this thesis. Some of these papers are discussed in Sargent (1993).
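The thesis's simulations are written in Pascal; purely as an illustrative sketch (in Python, with function names and toy sizes of my own choosing, not the thesis's), one generation over bit-string learning states, combining the selection, crossover and mutation operators described above, might look like this. The crossover probability 0.7 and per-bit mutation probability 0.0001 are the values reported in Table 6.3.

```python
import random

def decode(bits, lo, hi):
    """Map a binary substring to a real parameter value in [lo, hi]
    (the thesis uses 20 bits per real-valued parameter, Table 6.3)."""
    return lo + (hi - lo) * int(bits, 2) / (2 ** len(bits) - 1)

def select(population, fitnesses):
    """Roulette-wheel selection: pick one learning state with
    probability proportional to fitness."""
    r = random.uniform(0.0, sum(fitnesses))
    acc = 0.0
    for state, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return state
    return population[-1]

def crossover(a, b, p_cross=0.7):
    """Single-point crossover with probability 0.7 (Table 6.3);
    otherwise the agent simply imitates the first selected state."""
    if random.random() < p_cross:
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:]
    return a

def mutate(state, p_mut=0.0001):
    """Flip each bit independently with probability 0.0001 (Table 6.3)."""
    return "".join(
        bit if random.random() >= p_mut else ("1" if bit == "0" else "0")
        for bit in state)

def next_generation(population, fitnesses):
    """One generation: each new learning state is built by selecting two
    states proportionally to fitness, crossing them, then mutating."""
    return [mutate(crossover(select(population, fitnesses),
                             select(population, fitnesses)))
            for _ in population]
```

For instance, a 20-bit parameter constrained to a symmetric range of width 0.2 around a GS value of 1.0 would be recovered with `decode(bits, 0.9, 1.1)`.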
The minimum and maximum values for these parameters are specified to lie symmetrically about the GS rational expectations values.[4] Successive generations of population learning states are created by the genetic algorithm based on the realized fitness of the learning states. This process is described in the flowchart, Figure 6.1. The parameters for the simulations are contained in three tables. Table 6.1 contains the parameters of the economy which are common to all the simulations. These parameters are consistent with a specific GS equilibrium, L*, where λ* = 0.5. Table 6.2 presents the parameters specific to each of the four simulations and summarizes the results. Finally, the parameters of the genetic algorithm learning process are in Table 6.3. Tables 6.1 to 6.3 and Figures 6.2 to 6.5 are at the end of the chapter, beginning on page 187.

[4] Consistent with the analysis of the previous section, the range for the γ's is zero in these simulations and these parameters are fixed at the values specified by the GS equilibrium.

Fitness is calculated based on repeated play of the one-period model. Each generation, the one-period model is simulated R times (R = 1000 in the simulations presented) using the current population learning state. The repetitions are independent of one another. A repetition consists of selecting (y, ε, e) according to (4.2), calculating asset demands according to (4.4) using the agents' learning states, and solving for the equilibrium price using (4.10) and (B.5). The calculation of fitness, f^n, is a proxy for expected realized utility and is based on the realized mean and variance of end-of-period wealth of the agents in the R repetitions. For all n = 1, …, N agents,

[5] The simulations are written in Pascal. They are based on the coding provided in Appendix C of Goldberg (1989). The Pascal programs for these simulations are available from the author on request.
The simulations presented here were run on an IBM RS6000. A typical run of 5000 generations (or 5 million repetitions of the one-period model) takes several hours. Smaller simulations can be run on PC-based machines.

Figure 6.1 - Schematic for Genetic Algorithm Stochastic Adaptive Learning Simulations. The flowchart shows: Start; Select initial population (create strings randomly or at GS rational expectations parameters); Decode strings to strategies (determines the learning state: whether informed or not, and the parameters, the β's and γ's); Simulate one-period model (simulate a signal, noise and asset supply; calculate equilibrium price and demands; repeat 1000 times); Record end-of-period wealth (based on realized dividend, demand and price); Calculate fitness (based on realized average and variance of wealth); Generate new population using the genetic algorithm (selection, crossover and mutation); repeat up to 5000 times; End.

f^n = W_0^n - \theta^n c + \frac{1}{R} \sum_{r=1}^{R} (d_r - P_r) X_r^n - \frac{a}{2} \left[ \frac{1}{R} \sum_{r=1}^{R} \left( (d_r - P_r) X_r^n \right)^2 - \left( \frac{1}{R} \sum_{r=1}^{R} (d_r - P_r) X_r^n \right)^2 \right]   (6.1)

Recall that θ^n is 1 if agent n is informed and 0 otherwise. The equation for fitness is the sample equivalent of the mean-variance statement of expected utility (which follows from CARA preferences and normality).[6]

The choice of a fitness measure is not innocuous. Alternative definitions of fitness are available. For example, both De Long et al. (1990) and Blume and Easley (1992) assume fitness is based on average return. This measure of fitness can be implemented in the genetic algorithm model by choosing only one repetition of the one-period model between successive generations (i.e. R = 1). An argument for this measure might be based on observability: agents looking to imitate other agents can observe wealth more easily than utility. However, in this model utility is based on the mean and variance of wealth, whose observability is equally plausible.
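Equation (6.1) is straightforward to transcribe. The following is an illustrative Python version (the thesis code is in Pascal; the argument names and defaults are mine, with the defaults echoing Tables 6.1 and 6.2), including the truncation of negative fitness to zero described in footnote 6:

```python
def fitness(profits, w0=0.1, cost=0.0146027, informed=False, a=2.0):
    """Sample mean-variance fitness, as in equation (6.1).

    profits[r] is the trading profit (d_r - P_r) * X_r^n of agent n in
    repetition r; w0 is initial wealth per repetition and a the coefficient
    of absolute risk aversion.  The information cost c is paid only if the
    agent is informed.  Negative fitness is set to zero (footnote 6)."""
    R = len(profits)
    mean = sum(profits) / R
    var = sum(p * p for p in profits) / R - mean * mean
    f = w0 - (cost if informed else 0.0) + mean - (a / 2.0) * var
    return max(f, 0.0)
```

With constant profits the variance term vanishes and fitness is just baseline wealth plus average profit, net of any information cost.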
Most importantly, the selection of fitness is consistent with the GS equilibrium.

[6] The mean-variance representation was used so that fitnesses are non-negative, since agents are selected by the genetic algorithm for imitation with probability proportional to their fitness. By appropriate choice of W_0^n, the probability that the fitness is negative is low. In the rare instances where fitness did happen to be negative, it was set equal to zero.

The choice of fitness definition is important for a second reason. Recall that the genetic algorithm selects learning states (strings) for imitation according to fitness. Since selection weights each fitness relative to the total fitness, the cardinal values of fitness matter. Thus the parameter for initial wealth is not innocuous. An increase in initial wealth, while not affecting any of the equilibrium relationships, moves the probability of any one agent's learning state being selected (imitated) closer to uniform.[7] In other words, the cardinal values of utility determine the importance of the ordinal rankings in the adaptive learning.

II - Simulation Results

Four simulations are presented. They are representative of a large number of such simulations that have been conducted. The first simulation is an example of a stable GS equilibrium. The second simulation is an example which is not stable to the genetic algorithm learning.[8] These simulations differ only in the parameter σ, which is the noise in the supply of the risky asset. Both of these simulations begin at the GS rational expectations parameters, L*. The next two simulations use the same parameters as Simulation One, except that the simulations begin from a randomly selected initial population. One of the simulations converges to the GS result while the other does not.

[7] Note that fitness-proportional selection implies the probability of selecting string n is f^n / \sum_{m=1}^{N} f^m.
Once strategies are selected, the crossover and mutation operators are applied. There are alternative selection schemes which are used in genetic algorithm optimization. For example, tournament selection uses only the ordinal information contained in the fitness. See Beasley et al. (1993) for more details.

[8] Since the description of the simulations is informal, a specific value for ε-stability is not required.

The description of the simulations will focus on the proportion of informed agents (λ_t), the asset pricing parameters (the α's) and the aggregate (or average) inference parameters (the β's). The asset pricing parameters in (4.10) are calculated from (B.5) according to the evolved learning state L_t. For comparison, the GS rational expectations pricing parameters (the α* from (B.11)) for the current λ_t are shown. The aggregate β's are calculated according to (B.3) from the learning state L_t. The β_0^I, for example, is the (weighted) average of the β_0^I contained in the ℓ^n of those agents who are informed at t. The other aggregate β's are calculated similarly. These plots also contain the GS rational expectations values given the λ_t. For the plots of the inference parameters of the uninformed, the rational parameters are shown. These are the ℓ*(L_t) and represent the best choice for the parameter, given the current evolved learning state. For the informed inference parameters, the rational parameter is the rational expectations value.

(A) Stability Examples - Simulations One and Two

Simulations One and Two both begin from the GS rational expectations equilibrium, L*. They differ only in the noisiness of the risky asset supply, σ (and the corresponding cost of information, c, which is chosen such that λ* = 0.5). This parameter determines the sensitivity of f to λ (i.e. the slope in Figure 4.1).
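The fitness-proportional rule of footnote 7, and the tournament alternative it mentions, can be contrasted in a few lines. This is an illustrative Python sketch with made-up numbers, not output from the simulations; it shows why the initial-wealth level matters for the first scheme but not the second.

```python
import random

def selection_probs(fitnesses):
    """Fitness-proportional selection: p_n = f^n / sum over m of f^m."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def tournament_select(population, fitnesses, k=2):
    """Tournament selection: draw k candidates at random, keep the fittest.
    Only ordinal comparisons are used, so adding a constant to every
    fitness (e.g. raising initial wealth) cannot change the outcome."""
    idx = random.sample(range(len(population)), k)
    return population[max(idx, key=lambda i: fitnesses[i])]

# Two agents whose fitnesses differ by 0.1, at a low and a high wealth baseline:
low_wealth = selection_probs([0.1, 0.2])    # fitter agent chosen 2/3 of the time
high_wealth = selection_probs([1.1, 1.2])   # fitter agent chosen only ~52% of the time
```

Under fitness-proportional selection the same 0.1 fitness gap is nearly invisible at the higher wealth level, which is the flattening-toward-uniform effect described above.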
In Simulation One, where supply noise is higher, the proportion of informed agents remains close to its GS equilibrium level of 0.5 (see Figure 6.2a) and the GS equilibrium is stable. In contrast, in Simulation Two, where the supply noise is smaller, λ_t drifts steadily towards one (see Figure 6.3a). In this simulation, by generation 2000, with the exception of agent experimentation, all agents are informed.

The pricing parameters reflect the differing behaviour of the two simulations. The behaviour of the α_0 pricing parameter is similar in the two simulations (compare Figure 6.2c with Figure 6.3c); in both, it drifts slightly away from its rational expectations value (shown in the figures as a solid line). Figures 6.2d and 6.3d report the behaviour of the α_1 parameter. This parameter is the sensitivity of the asset price to the signal y. In Simulation One, which is stable, α_1 remains close to the rational expectations level. In Simulation Two, where λ_t drifts to one, the α_1 parameter also rises. At λ = 1, α_1 = 1. The α_1 in Simulation Two drifts above this level primarily because the β_1^I (as discussed below) converges to slightly above β_1^*. Finally, Figures 6.2e and 6.3e present the α_2 parameter, which is the sensitivity of the asset price to the realized asset supply, e. From (B.5), note that α_2 is primarily determined by the λ_t and the γ's of the agents. Since the γ's are held constant in these simulations, the difference in the two figures is simply a reflection of the different behaviour of the λ's in the two simulations.

As expected, the inference parameters of the informed agents differ only slightly in the two simulations. The β_0^I are in Figure 6.2f for Simulation One and Figure 6.3f for Simulation Two. The β_1^I are in Figures 6.2g and 6.3g. Recall that the informed agents need only learn the exogenous signal-dividend relationship. This relationship is not affected by the
This relationship is not affected by the- 178 -Learning, Evolution and SeOrganization Genetic Algorithm Learning in a Rational Expectations Mod4noisiness of the risky asset supply. Since the learning is stochastic, the paths for theseparameters are slightly different.9 Note that small variations from the rational parameters(j3*)are not surprising since the fitness of the informed (as well as uninformed) is observedwith error.The behaviour of the uninformed agents’ inference parameters is quite different in the twosimulations. In Simulation One, where X, remains close to its GS equilibrium level, the flUsalso remain close to their rational expectations level. Figure 6.2h contains fl0U and Figure6.2i contains i3 for Simulation One. Note that evolving flU and the rational 3U (given bye*(L)) lie on opposite sides of the rational expectations gT (ie. £*(LD). This is notsurprising, if other uninformed agents (about half the agents in the population) are placingtoo much weight on the price (reflected in the value of fl1U) then, knowing this, the rationalparameter puts less weight on the price. The distance between the evolving parameters andrational ones is small in terms of fitness (or utility). An agent who experimented with therational parameters would, on average, have higher fitness. However, this increase is slightand, given the noisy observation of fitness, may not be realized.In Simulation Two, where most agents eventually become informed, the evolving inferenceparameters of the uninformed diverge from the rational levels. Figure 6.3h contains theresults for fl0U• However, Figure 6.3i of demonstrates the divergence most clearly. AfterMI simulations begin with the same random seed. 
However, since the simulations have different outcomes in some respects (primarily in the λ_t), the learning of the informed also differs.

the first 250 generations, L_t differs from the initial state (L_0 = L*) for which the initial value of β_1^U is rational. The difference is large enough that, due to inference errors, the fitness of the uninformed is below that of the informed, and adaptation leads to more informed agents.[10] The fitness difference persists even at λ = 1. By generation 1600 almost all agents are informed. Recall that the parameter β_1^U is an average of the β_1^U of the uninformed agents. With fewer uninformed, the average is more volatile, which partially explains the divergence. In addition, with so few uninformed agents, uninformed agents' learning becomes almost pure experimentation. The fitness of most agents does not reflect the quality of the uninformed portion of their inference parameters, since most agents are informed. Pure experimentation, or just blind guessing, generates the disperse β^U's, and from Figure 6.3b, which is discussed next, it is unlikely that the experimentation will lead to an uninformed agent faring better than an informed agent.

The analysis of stability in Section 4.III pointed to the importance of the supply noise parameter for the stability of the GS equilibrium. These two simulations demonstrate this result. In Simulation One, inference parameter experiments tend to result in small changes in the proportion of informed. This is similar to the situation of curve C in Figure 4.1. However, in Simulation Two, with smaller supply noise, experiments shortly lead to the situation where, for all λ, uninformed agents have lower fitness than the informed, as in the

[10] Recall selection or adaptation is proportional to fitness.
The size of the difference in fitness of the informed and the uninformed is relatively small compared to average fitness (given the choice of the initial wealth parameter, W_0^n). This determines the speed at which λ_t moves to one. In other simulations (not presented here), where initial wealth is smaller (with the other parameters of Simulation Two), λ_t moves to one much more quickly.

case of curve D in Figure 4.1. Recall that the supply noise also determines the size of the region where uninformed fitness exceeds informed fitness with λ = 1 (denoted as G in Figure 4.1). Figure 6.3b for Simulation Two presents the difference in the average informed and uninformed fitnesses. This difference demonstrates that the analog to region G is small in Simulation Two. When λ_t is near one (after generation 2000), occasionally an experimenting uninformed agent does better than the average informed. Most of the time, experimenting agents do worse and λ_t does not move far from one.

(B) Convergence Examples - Simulations Three and Four

The previous two simulations began at the GS rational expectations equilibrium. Simulations Three and Four begin from (the same) randomly selected population learning state. In these simulations significant learning is required, since agents are not endowed with ideal initial learning states. The parameters for these simulations are identical to Simulation One. Under these parameters, the GS equilibrium is stable. Simulation Three is an example of a simulation which tends to converge to the GS equilibrium; Simulation Four does not. The difference in the two simulations is a constraint on the movement of λ_t in the initial generations of the simulations. In Simulation Three, the proportion of informed agents is constrained to the GS rational expectations equilibrium level for the first 100 generations.
That is, in Simulation Three, λ_t = λ* = 0.5 for t < 100. In Simulation Four, λ_t = 0.5 for t < 50. The role of this constraint is discussed further below.[11]

Simulation Three converged to the GS equilibrium (see Figure 6.4a). After generation 100, however, the proportion of informed agents increased to 0.7 (around generation 500). It slowly declined back towards the GS equilibrium level and finally reached λ = 0.5 around generation 2500. In contrast, Simulation Four did not converge to the GS equilibrium (see Figure 6.5a). After generation 50, λ_t increased quickly to 0.8 (by generation 200) and then continued to drift towards one, reaching λ = 1.0 by generation 1000.

To understand the different outcomes in these simulations, the average fitnesses of the informed and uninformed are shown in Figure 6.4b for Simulation Three and Figure 6.5b for Simulation Four (for the initial generations). Given the randomly selected learning states, average informed and uninformed fitness is initially quite low. Selection and crossover increase average fitness dramatically in the opening generations. The rapid learning in these generations leads to large changes in the population learning state. Since the rational inference parameters for the uninformed (i.e. ℓ*(L_t)) depend on the (rapidly changing) learning state, initially the learning process of the uninformed is slower than for the informed. Due to the faster learning by the informed, regardless of the informativeness

[11] In general it appears that for the parameter values used in Simulations Three and Four, λ_t = 0.5 must be imposed for (roughly) 75 generations in order that the learning converges to the GS equilibrium.
For other parameters (tighter allowable ranges for inference parameters, for example), convergence was obtained without this constraint.

of the price, informed agents' fitness is above the uninformed. Even at λ = 1, adaptive pressures will drive all agents to be informed. This is the situation with curve D in Figure 4.1. By the constraint that λ_t = 0.5, uninformed agents improve their inference parameters without reducing the number of uninformed agents. Thus, in Simulation Three, when the constraint on λ_t is removed (at generation 100), the situation resembles curve C in Figure 4.1. The λ_t increases, but only to 0.7. As uninformed agents continue to learn and move closer to the rational parameters (see β_0^U in Figure 6.4h or β_1^U in Figure 6.4i), their fitness increases above the informed and adaptation leads fewer agents to be informed. In Simulation Four, however, when the constraint on λ_t is removed earlier, the learning of the uninformed agents is not sufficiently advanced. Even as the proportion of informed agents approaches one, the uninformed are still doing worse than the informed. As discussed in connection with Simulation Two, at λ = 1, the learning of the uninformed is based purely on random experimentation.

The α's and β's for Simulation Three are similar to those in Simulation One. They are presented in Figures 6.4c to 6.4i. The α's and β's for Simulation Four are akin to those in Simulation Two and are presented in Figures 6.5c to 6.5i. Given the parallels with the previous two simulations, further discussion is not presented.

III - Extensions

There are many possible extensions to the genetic algorithm model presented.
For example, varying the learning rate is important in the spatial evolution model presented in Chapter 2. In that model, the least fit agents alter their strategy each generation. The behaviour of the evolution, including whether or not the evolution finds a Nash equilibrium, depends on the number of agents who update their strategy. A related modification to genetic algorithm learning, called "election", is made in Arifovic (1994a, 1994b). In these papers, before a strategy is adopted it is tested against recent experience. If the proposed strategy would have done better than the current strategy, it is adopted; otherwise the old strategy is maintained. Both modifications are intended to control the level of co-evolution occurring in the population. In the context of the current model, if fewer agents are modifying learning states in a generation (by algorithm design or as a result of election), then the learning of the uninformed should be easier. The "target" for learning, ℓ*(L_t), is easier to hit since L_t is less volatile. However, results suggest that because the fitness of an uninformed agent is highly correlated with that of other uninformed agents, and the fitness of an informed agent is highly correlated with that of other informed agents, both election and reducing the number of learning agents affect one group of agents disproportionately. This increases the volatility of λ_t. Further investigation could prove interesting.

One aspect of the simulation which appears particularly unrealistic is the poor learning about the uninformed inference parameters when most agents are informed. The informed agents possess the necessary information to study the potentially useful price-signal relationship. However, the agents do not learn "how to be uninformed." This occurs because the fitness of an informed agent does not reflect her uninformed inference parameters.
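The election modification of Arifovic (1994a, 1994b) mentioned above can be sketched abstractly. In this Python sketch, `realized_fitness` is a hypothetical, model-specific scoring helper (how "recent experience" is measured varies across applications and is not specified here):

```python
def elect(current, proposed, recent_draws, realized_fitness):
    """Election operator: adopt the proposed strategy only if it would
    have out-performed the current one over recent experience.

    realized_fitness(strategy, draws) scores a strategy against a list of
    past realizations; it is a hypothetical stand-in for the model's own
    fitness calculation."""
    if realized_fitness(proposed, recent_draws) > realized_fitness(current, recent_draws):
        return proposed
    return current
```

The effect is a one-sided filter on experimentation: crossover and mutation still propose novel strategies, but proposals that test worse than the status quo are discarded before they enter the population.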
If an agent happened upon the rational uninformed parameters (via crossover or mutation in the genetic algorithm) but the agent was informed, their value is not reflected in the probability that other agents imitate the learning state. There are several modifications to the genetic algorithm that might address this concern. The first is to modify the definition of fitness to incorporate how the informed agent would have done if she had been uninformed. Alternatively, the agents could be given the opportunity to experiment with being uninformed for some of the R repetitions which constitute a generation.[12] However, if the size of the region where uninformed fitness is larger than informed fitness with λ_t near one is small (see area G in Figure 4.1), then it is anticipated that the proposed modifications would have little effect on the simulations.

[12] This is computationally more intensive than the current algorithm. Since the learning state could potentially change with each of the 1000 repetitions that occur within a generation, the aggregate β's and pricing parameter α's would need to be calculated for each repetition instead of just once per generation.

IV - Conclusions

The genetic algorithm implementation of the GS model demonstrates the stability results discussed in Chapters 4 and 5. The simulation results present examples where a stochastic adaptive learning process does and does not converge to the GS equilibrium. These examples include cases where the initial condition of the learning process is the GS rational expectations equilibrium. The effectiveness of the genetic algorithm in demonstrating the theoretical results is encouraging.
Genetic algorithm learning may be a useful technique to investigate more realistic models where rational expectations solutions cannot be calculated. More general conclusions for the set of three chapters using the GS environment are included in the final chapter.

Table 6.1 - Common Simulation Parameters

Coefficient of absolute risk aversion (a): 2.0
Initial wealth per repetition (W_0^n): 0.1
Average aggregate supply of risky asset: 100.0
Average dividend (β_0): 0.1
Sensitivity of dividend to signal (β_1): 1.0
Variance of signal: 0.0004
Variance of dividend: 0.0008
GS equilibrium proportion of informed agents (λ*): 0.5

Table 6.2 - Simulation Specific Parameters and Results Summary

Simulation Number:              1            2             3              4
Initial population:             L*           L*            Random         Random
Supply noise (σ):               10.0         1.0           10.0           10.0
Cost of information (c,         0.0146027    0.0015848     0.0146027      0.0146027
  chosen so that λ* = 0.5):
Generations that λ_t = 0.5      1            1             100            50
  is imposed:
Summary of results:             L* stable    L* not        Converges      Does not
                                             stable        to L*          converge to L*
Location of results:            Figure 6.2   Figure 6.3    Figure 6.4     Figure 6.5

Table 6.3 - Common Simulation Parameters - Genetic Algorithm

Population size (N): 1,000
Repetitions of one-period model per generation (R): 1,000
Length of simulation (number of generations): 5,000
Length of string that represents learning state: 83 bits
Crossover probability: 0.7 (if crossover does not occur, the agent imitates one of the selected learning states)
Mutation probability: 0.0001 (probability that any bit is switched from 0 to 1 or vice versa)
Expected number of mutations per generation: 8.3
All four simulations begin with the same seed for random number generation.

Mapping learning states to strings (length of binary representation; length of range around the GS parameters):
θ: 1 bit; n.a.
β_0^I, β_1^I, β_0^U, β_1^U: 20 bits each; 0.2
γ: 1 bit each; 0.0

Figure 6.2 - Simulation 1 - Stable
Figure 6.2a - λ_t, proportion of informed agents
Figure 6.2b - difference in average fitness, f^I − f^U
Figure 6.2c - α_0, average price
Figure 6.2d - α_1, slope of price with respect to y
Figure 6.2e - α_2, slope of price with respect to e
Figure 6.2f - β_0^I
Figure 6.2g - β_1^I
Figure 6.2h - β_0^U
Figure 6.2i - β_1^U
Figure 6.3 - Simulation 2 - Not Stable
Figure 6.3a - λ_t, proportion of informed agents
Figure 6.3b - difference in average fitness, f^I − f^U
Figure 6.3c - α_0, average price
Figure 6.3d - α_1, slope of price with respect to y
Figure 6.3e - α_2, slope of price with respect to e
Figure 6.3f - β_0^I
Figure 6.3g - β_1^I
Figure 6.3h - β_0^U
Figure 6.3i - β_1^U
Figure 6.4 - Simulation 3 - Converges to GS Equilibrium
Figure 6.4a - λ_t, proportion of informed agents
Figure 6.4b - average fitness, f^I and f^U, for the first 200 generations
Figure 6.4c - α_0, average price
Figure 6.4d - α_1, slope of price with respect to y
Figure 6.4e - α_2, slope of price with respect to e
1030.102 -0.101 -:::0.097 -0.096 -0.095 I I I I0 4150 8150 120011600120001240012800132001360014000144001480012O0 6O0 1000 1400 1800 2200 2600 3000 3400 3800 4200 4600 5000Generation0 RE EvolutionFigure 6.4f-fl0’-213-Learning, Evolution and S4fOrganizationFigure 6.4g- 13/Genetic Algorithm Learning in a Rational Erpectadons Model1 .0041 .0020.9980. 9960. 9940. 992o.ggm0. 9880. 9860. 9840.9820.980. 9780.9760 400 8ll02O0 6b0 1000 1400 1800 2200 2600 3000 3400 3800 4200 4600 5000Generation[1 RE —Evolution-214-Learning, Evolution and S4fOiganizationFigure 6.4h -Genetic Algorithm Learning in a Rational Expectations Modelo .0150.0140. 0130 0120.0110.010. 0090.0080.0070.006am 0.0050. 0040.0030.0020.0010-0.001-0.002-0,003-0,004GenerationD RE — Rational 0 Evolution- 215-Learning, Evolution and S4/OrganizationFigure 6.4i -Genetic Algonthm Learning in a Rational &pectations Model1 .031 021 01I0.990.98m0.970.960.950.940.93D RE Rational 0 Evolution- 216 -Learning, Evohalon and S4fOrgo.mzathn Genetic Algorithm Learning in a Rationol Expectations ModelFigure 6.5 - Simulation 4 - Does Not Converges to GSEquffibriumI I I I I I____________I1gure 6.5a - X, Proportion of Informed Agents1.1I0.90.8-JLi0.7aL00.6C0.5a0.4a0a0.30.20.100 500 1000 1500 2000 2500 3000 3500 4000 4500 5000Cenerat on-217-Learning, Evolution and S4fOrganization Genetic Algorithm Learning in a Rational Expecw.Uons ModellYgure 6. Sb - Average Fitness f andfFor the first 1000 generation1 020.980 .960.940.92D 0.90.886L0.860.8400.8260.80.780.760.740.720.0 500 1000Generationlnforrrd + Uninformed-218-Learning, Evolution and S4/Organization Genetic Algorithm Learning in a Rational &pectations ModelFigure 6.5c- a0, Average Price0. 1040. 
[Figure 6.5d - α_1, slope of price to y]
[Figure 6.5e - α_2, slope of price to e]
[Figures 6.5f-6.5i - β coefficients, evolution versus RE benchmark]

Chapter 7
Conclusions

Modern financial theory almost always begins by analyzing the decisions of individuals. In this way aggregate behaviour such as that observed in stock prices or interest rates can be tied to economic primitives like preferences and information.
However, the complete rationality assumption which is typically used to analyze individual behaviour is a weak link in the analytic chain. The evidence from cognitive psychology demonstrates that assumptions of complete and unbounded rationality are simply not correct models of human decision making.¹ However, while individuals may not satisfy many of the assumptions underlying complete and unbounded rationality, individual choice is not random and is usually reasonable. Therefore complete rationality may be a good enough approximation of individual decisions. This may be especially true in large markets where individual anomalies have little effect on aggregate behaviour. This thesis examines reasonable, but not fully rational, individuals who learn by an adaptive and stochastic process.

In this thesis, the behaviour of models with stochastic adaptive individuals is contrasted with complete rationality in two environments. In the first section of the thesis, interaction between individuals' decisions is limited and direct. People face their neighbours in a repeated prisoner's dilemma. In the genetic algorithm simulation model of Chapter 2, the number of agents replaced in a generation, a parameter interpreted as an indication of the rate of learning, is important. In the infinitely repeated game the learning rate affects the equilibrium level of payoffs (i.e. affects which equilibrium is selected). In the finitely repeated game the learning rate determines whether or not the system converges to the unique Nash equilibrium. Interestingly, high average fitness in the finitely repeated game is associated with a high degree of volatility in the average fitness and the strategies used by the individuals. In Chapter 3 this phenomenon is explored analytically. The analog of the learning rate in this simpler model is the probability that an individual abandons a successful strategy (i.e. non-fitness-based replacement). In this model the relative probabilities (even if arbitrarily small) of these stochastic shocks determine whether or not the learning process will converge to the unique Nash equilibrium.

The implications of Chapters 2 and 3 depend on how one views Nash equilibrium or rational behaviour. If a stochastic adaptive learning process is to be used as a defence of rational economic behaviour and the Nash equilibria that this implies, then the complexities of co-evolution need to be addressed directly. Not every model of adaptive learning leads to behaviour which is consistent with complete and unbounded rationality. A similar caveat is warranted if an adaptive learning algorithm, such as a genetic algorithm, is to be used to find equilibria of complex, less tractable games. On the other hand, if one does not see Nash equilibrium as a necessary outcome of an adaptive process, then the prisoner's dilemma model offers a benchmark for considering models where evolution is used to generate "reasonable," but not necessarily optimal, behaviour in more complex environments. For example, Arthur (1993) suggests a stochastic adaptive learning algorithm to model human choice. In his model, the agent is learning about a fixed environment.

¹ For example, see Shoemaker (1982) and Camerer and Weber (1992) for discussion of the expected utility hypothesis for decision making under uncertainty. Shafir, Simonson and Tversky (1993) is an example of the well-known phenomenon that how choices are framed influences decisions. This evidence is inconsistent with the fundamental premise that agent preferences can be usefully represented with a binary relation. Finally, see Vallone and Tversky (1985) for evidence of the inability of people to understand random sequences.
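Learning about a fixed environment of this kind can be illustrated with a simple payoff-reinforcement rule. The sketch below is a hypothetical minimal version, not Arthur's calibrated algorithm; the payoff values, the noise scale and the reinforcement scheme are illustrative assumptions:

```python
import random

def adaptive_choice(payoffs, periods=5000, seed=0):
    """Reinforcement learning in a fixed environment: choice
    probabilities are proportional to accumulated payoff 'strengths';
    action k pays payoffs[k] on average."""
    rng = random.Random(seed)
    strengths = [1.0] * len(payoffs)          # uniform initial strengths
    for _ in range(periods):
        total = sum(strengths)
        r, k = rng.random() * total, 0
        while k < len(strengths) - 1 and r > strengths[k]:
            r -= strengths[k]                 # roulette-wheel draw of an action
            k += 1
        reward = payoffs[k] + rng.uniform(-0.1, 0.1)
        strengths[k] += max(reward, 0.0)      # reinforce the action just used
    total = sum(strengths)
    return [s / total for s in strengths]     # limiting choice frequencies

probs = adaptive_choice([0.2, 1.0])
```

Because the environment is fixed, accumulated strengths come to concentrate choice on the higher-paying action; in a co-evolutionary setting the payoffs would instead shift with the population's behaviour, which is the complication discussed in the surrounding text.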
The complexities of co-evolution presented in the prisoner's dilemma model must be addressed when extending Arthur's model to agents learning about an environment influenced by the behaviour of other learning agents. The difficulties of co-evolutionary models suggest that the development of a "flight simulator" that accurately reflects an entire economic system, in such a way as to allow policy analysis, will be difficult.²

The second portion of the thesis investigates stochastic adaptive learning in a non-strategic yet co-evolutionary environment. This section develops an asymmetric information, one-period, single risky asset portfolio choice model based on Grossman and Stiglitz (1980). The main finding of these three chapters is that the appropriateness of the rational expectations (or complete rationality) equilibrium depends upon the level of noise in the economy (in the form of noise traders) relative to the level of experimentation in the individuals' learning processes. The discussion of this relationship begins in Chapter 4 by constructing a learning process which converges to the rational expectations equilibrium and concludes with a discussion of the stability of the GS equilibrium to an adaptive learning process where experimentation does not vanish. Chapter 5 develops a deterministic representation of a stochastic adaptive learning process to formally develop the link between noise trading, experimentation and the GS equilibrium. Finally, Chapter 6 demonstrates the stability result in a more general environment using a genetic algorithm as an example of a stochastic adaptive process. The chapter presents examples where the learning process does and does not converge to the GS equilibrium. The effectiveness of the genetic algorithm in demonstrating the theoretical results is encouraging.

² The flight simulator of economic activity is discussed in Freeman (1992).
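The genetic-algorithm machinery used throughout the thesis can be sketched compactly. The toy below is illustrative only: the bit-string rules, the one-max fitness function and all parameter values are assumptions made for the sketch, with n_replace playing the role of the number of agents obtaining new strategies in a generation:

```python
import random

def ga_step(pop, fitness, n_replace, mut=0.01, rng=random):
    """One generation: the n_replace least-fit agents are replaced by
    offspring of fitness-proportionally selected parents (one-point
    crossover plus point mutation); the rest keep their current rule."""
    scored = sorted(pop, key=fitness)             # ascending fitness
    survivors = scored[n_replace:]                # unsuccessful rules abandoned
    weights = [fitness(s) for s in scored]
    children = []
    for _ in range(n_replace):
        a, b = rng.choices(scored, weights=weights, k=2)
        cut = rng.randrange(len(a))
        children.append([(x if rng.random() > mut else 1 - x)   # mutation
                         for x in a[:cut] + b[cut:]])           # crossover
    return survivors + children

# toy fitness: number of ones in the bit-string rule
rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(200):
    pop = ga_step(pop, sum, n_replace=10, rng=rng)
```

In a fixed environment like this one-max problem the population climbs steadily; in the thesis's co-evolutionary settings the fitness function itself depends on the current population, which is what makes convergence a substantive question.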
Genetic algorithm learning may be a useful technique to investigate more realistic models where rational expectations solutions cannot be calculated. However, as is the case in the first section of the thesis, the fact that an adaptive learning process need not converge to a rational expectations equilibrium reduces the ability of the genetic algorithm to be used as a tool for solving for equilibria.

The key ingredients for the stability of the GS rational expectations equilibrium are the level of noise in the asset supply and the size of agents' experiments. It is not immediately obvious how these variables could be measured empirically. There are two views on the underlying cause of the uncertainty about asset supply (or justifications for the assumption). One view is that some agents trade randomly for un-modelled reasons (often called liquidity traders). This type of noise is difficult to measure. An alternative view is that noise is introduced by a small number of irrational traders. De Long et al. (1990) develop the discussion of Black (1986) that agents trade based on inferences from irrelevant data. This second type of noise would seem to be most plausible in equity markets, where valuation is complex and differences of opinion are common. Although the GS model presented here is a single-period model, it is interesting to note that higher levels of noise in equity markets increase the likelihood that the GS equilibrium is stable and therefore a good approximation for a complex economy.

Estimating the size of experimentation of agents in the economy would also be challenging and potentially insightful. In a more general model than the one presented here, market volume might give some indication of the size of experiments. When an agent alters her rule-of-thumb, affecting the desired portfolio, trade will occur.
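Inside a simulation, unlike in field data, each trade can be labelled by its cause, so the experimentation-driven component of volume is directly observable. A toy decomposition, with all functional forms and parameter values hypothetical:

```python
import random

def volume_decomposition(n_agents=100, periods=500, p_experiment=0.05, seed=2):
    """Toy decomposition of trading volume: each agent's desired
    holding responds to a common news shock; with probability
    p_experiment the agent also perturbs its rule of thumb.  The two
    sources of trade are tallied separately."""
    rng = random.Random(seed)
    sensitivity = [rng.uniform(0.5, 1.5) for _ in range(n_agents)]
    news_vol = exp_vol = 0.0
    for _ in range(periods):
        shock = rng.gauss(0, 1)                       # common news innovation
        for i in range(n_agents):
            news_vol += abs(sensitivity[i] * shock)   # news-driven rebalancing
            if rng.random() < p_experiment:
                tweak = rng.gauss(0, 0.2)             # experiment with the rule
                exp_vol += abs(tweak)                 # experimentation-driven trade
                sensitivity[i] += tweak
    return news_vol, exp_vol

news_vol, exp_vol = volume_decomposition()
```

The point of the sketch is only that the two components are well defined inside the model; separating them in observed market data is the hard problem noted in the text.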
Exactly how trading volume caused by new information can be separated from trade arising from experimentation and adaptation is an open issue, particularly because there is no consensus theory of market volume (see Karpoff, 1987). Alternatively, more direct evidence on the learning process could be gathered in a laboratory setting. The GS model with stochastic adaptive learning is well suited to laboratory investigation. By controlling the information subjects observe about past learning states and fitness, the learning process could be characterized. Such laboratory experiments have been successful in understanding rational expectations equilibria in similar models.³

Throughout the thesis the structure of the economy and the learning process have been treated as exogenous. It is important to note that the learning process and the structure of the economy are not truly independent. For example, in the asset pricing environment, presumably the market mechanism that determines prices will not develop independently of characteristics like the level of noise and the learning process. In the GS model, the process by which the equilibrium asset price is determined is not specified. The typical market clearing assumption is used to focus on learning and the process by which the equilibrium price conveys information. The fact that stochastic adaptive learning can lead to a rational expectations equilibrium is important because it helps reduce the concern about the realism of the rational expectations assumptions.
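The claim that stochastic adaptive learning can lead to a rational expectations equilibrium can be illustrated in miniature with least-squares learning in the spirit of Bray (1982). The model below is a generic linear illustration, not the Grossman-Stiglitz economy; alpha, beta and the noise scale are arbitrary choices:

```python
import random

def bray_learning(alpha=2.0, beta=0.3, periods=4000, seed=3):
    """Running-mean (a special case of least-squares) learning in the
    self-referential model p_t = alpha + beta * forecast + noise.
    With beta < 1 the forecast converges to the rational expectations
    fixed point p* = alpha / (1 - beta)."""
    rng = random.Random(seed)
    p_e = 0.0                        # agents' current price forecast
    for t in range(1, periods + 1):
        price = alpha + beta * p_e + rng.gauss(0, 0.5)
        p_e += (price - p_e) / t     # recursive mean update (gain 1/t)
    return p_e, alpha / (1 - beta)

learned, ree = bray_learning()
```

The decreasing gain 1/t makes experimentation-like noise die out over time; the thesis's stability question concerns what happens when, instead, experimentation does not vanish.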
However, if learning truly is stochastic and adaptive then it will affect the optimal design and evolution of the market mechanism. Similarly, in the prisoner's dilemma model the spatial structure will be the outcome of individual decisions and therefore cannot evolve independently of the learning process.⁴ A model which can incorporate the connection between learning and economic structure would be complex and potentially unmanageable. However, understanding the self-organizing connections between learning and economic structure would be a significant contribution to understanding modern economies and the boundedly rational humans whose decisions create them.

³ See Plott and Sunder (1982) and Forsythe et al. (1982) for examples of laboratory asset markets with asymmetric information. Arifovic (1994b) compares genetic algorithm simulations with an experimental study of an exchange rate model with multiple equilibria.
⁴ Tesfatsion (1995) and Stanley, Ashlock and Tesfatsion (1993) make this point strongly.

Bibliography

Aghion, P., P. Bolton, C. Harris and B. Jullien, 1991, Optimal Learning by Experimentation, Review of Economic Studies, 58, 621-654.
Arifovic, J., 1994a, Genetic Algorithm Learning and the Cobweb Model, Journal of Economic Dynamics and Control, 18, 3-28.
Arifovic, J., 1994b, The Behavior of the Exchange Rate in Genetic Algorithm and Experimental Economies, manuscript, Simon Fraser University.
Arifovic, J., 1989, Learning by Genetic Algorithms in Economic Environments, Santa Fe Institute Working Paper Number 90-001.
Arifovic, J. and B.C.
Eaton, 1994, Coordination and Genetic Learning, manuscript, Simon Fraser University.
Arthur, W.B., 1993, On Designing Economic Agents that Behave Like Human Beings, Journal of Evolutionary Economics, 3, 1-22.
Axelrod, R.A., 1987, The Evolution of Strategies in the Iterated Prisoner's Dilemma, in L. Davis (ed.), Genetic Algorithms and Simulated Annealing, Pitman, London.
Axelrod, R.A., 1984, The Evolution of Co-operation, Basic Books, New York, NY.
Bak, P. and K. Chen, 1991, Self-Organized Criticality, Scientific American, January, 46-53.
Bak, P., C. Tang and K. Wiesenfeld, 1988, Self-Organized Criticality, Physical Review A, 38, 364-374.
Bak, P., K. Chen and M. Creutz, 1989, Self-Organized Criticality in the 'Game of Life', Nature, 342, 780-782.
Bala, V. and S. Goyal, 1993, A Theory of Learning with Heterogeneous Agents, manuscript, McGill University.
Beasley, D., D.R. Bull and R.R. Martin, 1993, An Overview of Genetic Algorithms: Part 2, Research Topics, University Computing, 15, 170-181.
Bergin, J. and B.L. Lipman, 1994, Evolution with State-Dependent Mutations, manuscript, Queen's University.
Bernardo, A.E. and K. Judd, 1995, The Relationship Between Trading Volume and Price Changes, manuscript, University of California at Los Angeles.
Binmore, K.G. and L. Samuelson, 1995, Evolutionary Drift and Equilibrium Selection, manuscript, University of Wisconsin.
Binmore, K.G. and L. Samuelson, 1992, Evolutionary Stability in Repeated Games Played by Finite Automata, Journal of Economic Theory, 57, 278-305.
Binmore, K.G., 1988, Modelling Rational Players II, Economics and Philosophy, 4, 9-55.
Black, F., 1986, Noise, Journal of Finance, 41, 529-543.
Blume, L. and D. Easley, 1992, Rational Expectations and Rational Learning, manuscript, Cornell University.
Blume, L. and D. Easley, 1992, Evolution and Market Behavior, Journal of Economic Theory, 58, 9-40.
Blume, L.E., M. Bray and D. Easley, 1982, Introduction to the Stability of Rational Expectations Equilibrium, Journal of Economic Theory, 26, 313-317.
Boylan, R.T., 1992, Laws of Large Numbers for Dynamical Systems with Randomly Matched Individuals, Journal of Economic Theory, 57, 473-504.
Bray, M., 1982, Learning, Estimation and the Stability of Rational Expectations, Journal of Economic Theory, 26, 318-339.
Bray, M.M. and N.E. Savin, 1986, Rational Expectations Equilibria, Learning and Model Specification, Econometrica, 54, 1129-1160.
Bray, M. and D. Kreps, 1987, Rational Learning and Rational Expectations, in G. Feiwel (ed.), Arrow and the Ascent of Modern Economic Theory, Macmillan Press, Houndmills.
Cabrales, A. and T. Hoshi, 1992, Evolutionary Consequences of Noise Traders, Chartists and Other Non-Rational Investors, manuscript, University of California, San Diego.
Camerer, C. and M. Weber, 1992, Recent Developments in Modelling Preferences: Uncertainty and Ambiguity, Journal of Risk and Uncertainty, 5, 325-386.
Danielson, P.A., 1992, How to Evolve Amoral Players: Artificial Morality and Genetic Programming, manuscript, University of British Columbia.
De Long, J.B., A. Shleifer, L.H. Summers and R.J. Waldmann, 1990, Noise Trader Risk in Financial Markets, Journal of Political Economy, 98, 703-737.
Duffy, J. and J.B. Bullard, 1994, A Model of Learning and Emulation with Artificial Adaptive Agents, manuscript, University of Pittsburgh.
Easley, D. and N.M. Kiefer, 1988, Controlling a Stochastic Process with Unknown Parameters, Econometrica, 56, 1045-1064.
Evans, G.W. and S. Honkapohja, 1994, Economic Dynamics with Learning: New Stability Results, manuscript, University of Edinburgh.
Fama, E.F., L. Fisher, M.C. Jensen and R. Roll, 1969, The Adjustment of Stock Prices to New Information, International Economic Review, 10, 1-21.
Fama, E.F., 1991, Efficient Capital Markets: II, Journal of Finance, 46, 1575-1617.
Forsythe, R., T.R. Palfrey and C. Plott, 1982, Asset Valuation in an Experimental Market, Econometrica, 50, 538-567.
Freeman, D.H., 1992, Is Management Still a Science?, Harvard Business Review, 70, 6, 26-38.
Friedman, D., 1991, Evolutionary Games in Economics, Econometrica, 59, 637-666.
Fudenberg, D. and J. Tirole, 1991, Game Theory, MIT Press, Cambridge, MA.
Gale, J., K.G. Binmore and L. Samuelson, 1995, Learning to be Imperfect: The Ultimatum Game, Games and Economic Behavior, 8, 56-90.
Goldberg, D., 1989, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA.
Grossman, S.J. and J.E. Stiglitz, 1980, On the Impossibility of Informationally Efficient Markets, American Economic Review, 70, 393-408.
Grossman, S.J., 1976, On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information, Journal of Finance, 31, 573-585.
Guesnerie, R., 1992, An Exploration of the Eductive Justification of the Rational-Expectations Hypothesis, American Economic Review, 82, 1254-1278.
Hogarth, R.M. and M.W. Reder (eds.), Rational Choice: The Contrast between Economics and Psychology, University of Chicago Press, Chicago, IL.
Holland, J.H., 1992, Genetic Algorithms, Scientific American, July, 66-72.
Holland, J.H., 1975, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI (1992 edition, MIT Press, Cambridge, MA).
Jackson, M.O., 1991, Equilibrium Price Formation and the Value of Private Information, Review of Financial Studies, 4, 1-16.
Jefferson, D., R. Collins, C. Cooper, M. Dyer, M. Flowers, R. Korf, C. Taylor and A. Wang, 1991, Evolution as a Theme in Artificial Life: The Genesis/Tracker System, in C.G. Langton, C. Taylor, J.D. Farmer and S. Rasmussen (eds.), Artificial Life II: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Addison-Wesley, Redwood, CA.
Jeffrey, R., 1965, The Logic of Decision, McGraw-Hill, New York.
Jordan, J.S., 1991, Bayesian Learning in Normal Form Games, Games and Economic Behavior, 3, 60-81.
Kalai, E. and E. Lehrer, 1993, Rational Learning Leads to Nash Equilibrium, Econometrica, forthcoming.
Kalai, E. and W. Stanford, 1988, Finite Rationality and Interpersonal Complexity in Repeated Games, Econometrica, 56, 397-410.
Kandori, M., G.J. Mailath and R. Rob, 1993, Learning, Mutation, and Long Run Equilibria in Games, Econometrica, 61, 29-56.
Karpoff, J.M., 1987, The Relation Between Price Changes and Trading Volume: A Survey, Journal of Financial and Quantitative Analysis, 22, 109-126.
Kauffman, S.A. and S. Johnsen, 1991, Co-evolution to the Edge of Chaos: Coupled Landscapes, Poised States and Co-evolutionary Avalanches, Journal of Theoretical Biology, 149, 467-505.
Kirchkamp, O., 1994, Spatial Evolution of Automata in the Prisoners' Dilemma, manuscript, University of Bonn.
Kohlberg, E. and J.F. Mertens, 1986, On the Strategic Stability of Equilibria, Econometrica, 54, 1003-1037.
Koza, J., 1992, Genetic Evolution and Co-evolution of Computer Programs, in C.G. Langton, C. Taylor, J.D. Farmer and S. Rasmussen (eds.), Artificial Life II: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Addison-Wesley, Redwood, CA.
Kraus, A. and M. Smith, 1994, Beliefs about Beliefs, manuscript, University of British Columbia.
Kreps, D., 1990, A Course in Microeconomic Theory, Princeton University Press, Princeton, NJ.
Langlois, R.N. (ed.), 1986, Economics as a Process: Essays in the New Institutional Economics, Cambridge University Press, Cambridge, England.
Langton, C.G., 1992, Introduction, in C.G. Langton, C. Taylor, J.D. Farmer and S. Rasmussen (eds.), Artificial Life II: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Addison-Wesley, Redwood, CA.
Linn, S.C. and B.E. Stanhouse, 1992, Learning and the Dynamic Behavior of Risky Asset Prices, manuscript, University of Oklahoma.
Lucas, R.E., 1986, Adaptive Behavior and Economic Theory, in R.M. Hogarth and M.W. Reder (eds.), Rational Choice: The Contrast between Economics and Psychology, University of Chicago Press, Chicago, IL.
Mailath, G.J., 1992, Introduction: Symposium on Evolutionary Game Theory, Journal of Economic Theory, 57, 259-277.
Marcet, A. and T. Sargent, 1988, The Fate of Systems with "Adaptive" Expectations, American Economic Review, 78, 168-171.
Marimon, R., E. McGrattan and T.J. Sargent, 1990, Money as a Medium of Exchange in an Economy with Artificially Intelligent Agents, Journal of Economic Dynamics and Control, 14, 329-373.
Marks, R.E., 1989a, Breeding Hybrid Strategies: Optimal Behaviour for Oligopolists, Australian Graduate School of Management, Working Paper 89-006.
Marks, R.E., 1989b, Niche Strategies: The Prisoner's Dilemma Computer Tournaments Revisited, Australian Graduate School of Management, Working Paper 89-009.
Maynard Smith, J., 1982, Evolution and the Theory of Games, Cambridge University Press, Cambridge.
Milgrom, P. and J. Roberts, 1991, Adaptive and Sophisticated Learning in Normal Form Games, Games and Economic Behavior, 3, 82-100.
Miller, J.H., 1989, The Co-evolution of Automata in the Repeated Prisoner's Dilemma, Santa Fe Institute Working Paper 89-003.
Morrison, D.F., 1990, Multivariate Statistical Methods, Third Edition, McGraw-Hill, New York, NY.
Nachbar, J.H., 1992, Evolution in the Finitely Repeated Prisoner's Dilemma, Journal of Economic Behavior and Organization, 19, 307-326.
Nachbar, J.H., 1990, Evolutionary Selection Dynamics in Games: Convergence and Limit Properties, International Journal of Game Theory, 19, 59-89.
Plott, C.R. and S. Sunder, 1982, Efficiency of Experimental Security Markets with Insider Information: An Application of Rational-Expectations Models, Journal of Political Economy, 90, 663-698.
Rosenzweig, M.L., J.S. Brown and T.L. Vincent, 1987, Red Queens and ESS: The Co-Evolution of Evolutionary Rates, Evolutionary Ecology, 1, 59-94.
Rosser, J.B. Jr., 1992, The Dialogue Between the Economic and Ecological Theories of Evolution, Journal of Economic Behavior and Organization, 17, 195-215.
Sargent, T.J., 1993, Bounded Rationality in Macroeconomics, Oxford University Press, Oxford.
Savage, L.J., 1954, The Foundations of Statistics, Wiley, New York.
Selten, R., 1975, Re-examination of the Perfectness Concept for Equilibrium Points in Extensive Games, International Journal of Game Theory, 4, 25-55.
Shafir, E., I. Simonson and A. Tversky, 1993, Reason-Based Choice, Cognition, 49, 11-36.
Shoemaker, P.J.H., 1982, The Expected Utility Model: Its Variants, Purposes, Evidence and Limitations, Journal of Economic Literature, 20, 529-563.
Simon, H.A., 1959, Theories of Decision-Making in Economics and Behavioral Science, American Economic Review, 49, 253-283.
Simon, H.A., 1955, A Behavioral Model of Rational Choice, Quarterly Journal of Economics, 69, 99-118.
Simon, H.A., 1987, "Behavioral Economics", "Bounded Rationality", and "Satisficing", in J. Eatwell, M. Milgate and P. Newman (eds.), The New Palgrave, Macmillan Press, London.
Skyrms, B., 1994, Darwin Meets The Logic of Decision: Correlation in Evolutionary Game Theory, Philosophy of Science, 61, 503-523.
Skyrms, B., 1990, The Dynamics of Rational Deliberation, Harvard University Press, Cambridge, MA.
Slade, M.E. and B.C. Eaton, 1991, Evolutionary Equilibrium in Market Supergames, manuscript, University of British Columbia.
Stanley, E.A., D. Ashlock and L. Tesfatsion, 1993, Iterated Prisoner's Dilemma with Choice and Refusal of Partners, in C. Langton (ed.), Artificial Life III, Addison-Wesley (forthcoming).
Stokey, N.L. and R.E. Lucas, 1989, Recursive Methods in Economic Dynamics, Harvard University Press, Cambridge, MA.
Taylor, P.D. and L.B. Jonker, 1978, Evolutionarily Stable Strategies and Game Dynamics, Mathematical Biosciences, 40, 145-156.
Tesfatsion, L., 1995, A Trade Network Game with Partner Selection, working paper, Iowa State University.
Thompson, R., 1985, Conditioning the Return Generating Process on Firm-Specific Events: A Discussion of Event Study Methods, Journal of Financial and Quantitative Analysis, 20, 151-168.
Vallone, R. and A. Tversky, 1985, The Hot Hand in Basketball: On the Misperception of Random Sequences, Cognitive Psychology, 17, 295-314.
van Damme, E., 1987, Stability and Perfection of Nash Equilibria, Springer-Verlag.
Vives, X., 1993, How Fast do Rational Agents Learn?
Review of Economic Studies, 60, 329-347.
Vose, M.D., 1991, Generalizing the Notion of Schema in Genetic Algorithms, Artificial Intelligence, 50, 385-396.
Waltman, P., 1986, A Second Course in Differential Equations, Academic Press, Orlando, FL.
Wang, J., 1993, A Model of Intertemporal Asset Prices Under Asymmetric Information, Review of Economic Studies, 60, 249-282.
Young, H.P., 1993, The Evolution of Conventions, Econometrica, 61, 57-84.

Appendix A - Proofs for Chapter 3

Proof of Proposition 1

The replicator dynamic in (3.5) for the prisoner's dilemma (with p_C denoting the proportion of agents playing C) is

    dp_C/dt = p_C (1 - p_C) (u(C) - u(D)) = p_C (1 - p_C) Δ(p_C)    (A.1)

where Δ(p_C) is the difference in the expected utilities of playing C versus D. The probability that an agent playing a_i meets an agent playing C is denoted π_C(p_C; a_i). The Δ(p_C) is calculated as follows:

    Δ(p_C) = c π_C(p_C; C) - e π_C(p_C; D) - d (1 - π_C(p_C; D))
           = c π_C(p_C; C) - (e - d) π_C(p_C; D) - d    (A.2)

In order for p* to be a stable dynamic equilibrium, the following two conditions are required:

    (dp_C/dt) |_{p_C = p*} = 0    (A.3)

    d/dp_C (dp_C/dt) |_{p_C = p*} < 0    (A.4)

These conditions translate into conditions on Δ(p_C). First, since p_C (1 - p_C) > 0, (A.3) implies

    Δ(p*) = 0    (A.5)

From (A.2), this condition can be written as

    π_C(p*; C) = (d + (e - d) π_C(p*; D)) / c    (A.6)

Note that in order for π_C(p*; C) < 1 it is sufficient for

    π_C(p*; D) < (c - d) / (e - d)    (A.7)

Second, calculating the derivative in (A.4) and using the facts that p_C (1 - p_C) > 0 and Δ(p*) = 0, this condition can be written as

    dΔ(p)/dp |_{p = p*} = c π'_C(p*; C) - (e - d) π'_C(p*; D) < 0    (A.8)

where the prime denotes the derivative with respect to p. Note that a sufficient condition for (A.8) to be satisfied is π'_C(p*; C) = 0 and π'_C(p*; D) > 0.

Equations (A.9) and (A.10) construct piece-wise-linear functions π_C(p; C) and π_C(p; D) that satisfy (A.6) and (A.8) for p* ∈ (0, 1); they are shown in Figure B.1. Each function is linear away from p*, with π_C(p; C) flat and π_C(p; D) strictly increasing on (p* - ε, p* + ε), where g_0, g_1, h_0, h_1 and h_2 are (easily calculated) constants chosen to give continuity, π(0; a_i) = 0 and π(1; a_i) = 1.
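The stability argument in (A.1)-(A.8) is easy to check numerically. The sketch below integrates the replicator dynamic with illustrative matching functions, π_C(p; C) held flat at 0.375 and π_C(p; D) = p, with payoffs c = 4, e = 2, d = 1; these choices satisfy (A.5) and (A.8) at p* = 0.5 and stand in for the piece-wise-linear functions of (A.9) and (A.10):

```python
def replicator(p0, c=4.0, e=2.0, d=1.0, dt=0.01, steps=5000):
    """Euler integration of the replicator dynamic (A.1) with the
    illustrative matching functions pi_C(p;C) = 0.375 (flat, so its
    derivative is zero as (A.8) requires) and pi_C(p;D) = p (strictly
    increasing).  Then Delta(p) = c*0.375 - (e-d)*p - d = 0.5 - p,
    which vanishes at p* = 0.5 and is decreasing there."""
    p = p0
    for _ in range(steps):
        delta = c * 0.375 - (e - d) * p - d   # Delta(p) from (A.2)
        p += dt * p * (1 - p) * delta         # dp/dt = p(1-p)Delta(p)
    return p
```

Starting from either side of p* = 0.5 the path converges to it, illustrating conditions (A.3) and (A.4).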
The ε > 0 is chosen small enough to ensure π_C(p; a_i) ∈ [0, 1]. These functions satisfy all but the smoothness criterion of the proposition. They are differentiable everywhere but at the kinks. Since these kinks are at least ε distance away from p*, they can be smoothed (for example using the logistic function) and still meet the requirements of (A.5) and (A.8).

Figure B.1 - Functions which support p* as a stable dynamic equilibrium
The functions are from (A.9) and (A.10). The dashed lines indicate smoothing of the piece-wise linear functions. [Plot of π_C(p; C) and π_C(p; D) against p_C on [0, 1], with the flat segment of π_C(p; C) at height (d + (e - d) π_C(p*; D)) / c marked; plot data not reproduced.]

Markov Transition Probabilities

The Markov transition matrix M has entries M_ij giving the probability of transiting to state j given that the current state is state i. The Markov matrix elements, from (3.9), are as follows:

    M_11 = 1 - ½μ,  M_12 = ½μ,  M_13 = 0,  M_14 = 0,  M_15 = 0,  M_16 = 0    (A.11)
17) approach zero.The limit of the expression, therefore, will depend only on the terms with the largestexponent (ie. the terms with the smallest fi). Using 0(z) to indicate terms with order strictlyless than-z, the limiting distnbution, p is:exD[—(3÷K )C1 + 0(3+K)32(c+d)(2c+e) 1 ‘1 J—cext,[—(3+K )C1 + Q(3+K)32(c÷d) I I j—eexp[—(3+K ) + Q(3+K)* 1 32(2c+e) I. I (A.18)pi(C;Kq,Kp) = —B—(3c+2d)eexnl —(3 +K +K )C1 + 0(3 +K +K)64(c+d)(2c+e) L fl I’ j—eexp[—(1÷K +K K] + 0(1+K +K)82c+e) “ “-e. exp[-(K +K )Cj + 0(K +K)4(2c+e) “ p 11 jwhere B is given by1 Mathematica was used to solve this system of equations and simplit the results to find the highest order terms.- 247 -Learning, Evolution and S4fOrganization Appendix A—C2B=32(c+d)(2c+e) xp[ -(3+KqXj—c(2c+e) —e(c +d)32(c+d)(2c+e) exp[ —(3+K)C](A.19)CaseNumberingKM<3K=3K>3—eexp{_(Kq+KpX] + O(l+Kq+Kp)4(2c+e)K,<3147K,1=3258Iç>3369,10,11—eexp[ —(Ic +K)] + O(K +K)) = lim 4(2c+e)Inn p6(C;K,)Içc exp—uç +içç + oIç +iç++Note that it is only in the denominator where determining which term (or terms) dominatesdepends on K and K. In particular, the dominant terms will depend on the magnitude ofK and K relative to 3 and each other. There are 11 possible cases for the limit ofp(KK) as -oo They are listed in Table B. 1.Table B.1- Nwnbering of CasesThere are 11 cases to consider. Note thefinal cell represents 3 cases depending on the relationship between K,and K,.Note that the final cell of the table includes three distinct cases depending on the relationshipbetween K and K. The proof proceeds by simply taking the limits in the various cases.Case 1 - K,<3 and KConsider state six.=1 (A.20)Since probability of state six goes to one, the probability of the other states goes to zero.Case7. 8and9-K,>3andKKConsider state one.- 248 -Learning, Evolution and SeOrganizalionAppendix A-c2 exp[—(3÷19CJ + 0(3+K)32(c+d)(2c+e)= 1 (A.21)urn pj(s;Kq,K,1) =— exp[ —(3 ÷K)] + 0(3+1932c+d2c+eAgain, since this probability is one, the others go to zero.Case 3. 
Cases 3, 6 and 10 - K_μ > 3 and K_μ > K_η

In this case, consider states two and three; both have exponent 3+K_η, which is now the smallest.

lim_{ζ→∞} p2*(ζ;K_η,K_μ) = lim_{ζ→∞} [ -c/(32(c+d)) · exp(-(3+K_η)ζ) + O(3+K_η) ] / [ (-c(2c+e)-e(c+d))/(32(c+d)(2c+e)) · exp(-(3+K_η)ζ) + O(3+K_η) ]
= c(2c+e) / (c(2c+e)+e(c+d))    (A.22)

lim_{ζ→∞} p3*(ζ;K_η,K_μ) = lim_{ζ→∞} [ -e/(32(2c+e)) · exp(-(3+K_η)ζ) + O(3+K_η) ] / [ (-c(2c+e)-e(c+d))/(32(c+d)(2c+e)) · exp(-(3+K_η)ζ) + O(3+K_η) ]
= e(c+d) / (c(2c+e)+e(c+d))    (A.23)

The sum of these probabilities is one. Thus the probabilities of the other states go to zero.

This completes the cases which were listed in the Proposition. For completeness, the remaining four cases are included.

Case 4 - K_μ < 3 and K_η = 3

In this case, consider states one and six; both have exponent 3+K_μ.

lim p1* = lim [ -c²/(32(c+d)(2c+e)) · exp(-(3+K_μ)ζ) + O(3+K_μ) ] / [ (-c²/(32(c+d)(2c+e)) - e/(4(2c+e))) · exp(-(3+K_μ)ζ) + O(3+K_μ) ]
= c² / (c²+8e(c+d))    (A.24)

lim p6* = lim [ -e/(4(2c+e)) · exp(-(3+K_μ)ζ) + O(3+K_μ) ] / [ (-c²/(32(c+d)(2c+e)) - e/(4(2c+e))) · exp(-(3+K_μ)ζ) + O(3+K_μ) ]
= 8e(c+d) / (c²+8e(c+d))    (A.25)

Again, note that these probabilities sum to one.

Case 2 - K_μ = 3 and K_η < 3

In this case, the limit of the invariant distribution puts all the weight on states two, three and six, each with exponent 3+K_η. These probabilities are calculated below.

lim p2* = lim [ -c/(32(c+d)) · exp(-(3+K_η)ζ) + O(3+K_η) ] / [ ((-c(2c+e)-e(c+d))/(32(c+d)(2c+e)) - e/(4(2c+e))) · exp(-(3+K_η)ζ) + O(3+K_η) ]
= c(2c+e) / (c(2c+e)+e(c+d)+8e(c+d))    (A.26)

The corresponding limits for states three and six are

lim p3* = e(c+d) / (c(2c+e)+e(c+d)+8e(c+d))    (A.27)
lim p6* = 8e(c+d) / (c(2c+e)+e(c+d)+8e(c+d))    (A.28)

These three probabilities sum to one.

Case 5 - K_μ = 3 and K_η = 3

In this case, states one, two, three and six must be considered; all have exponent 6. The calculations follow.

lim p1* = lim [ -c²/(32(c+d)(2c+e)) · exp(-6ζ) + O(6) ] / [ (-c²/(32(c+d)(2c+e)) + (-c(2c+e)-e(c+d))/(32(c+d)(2c+e)) - e/(4(2c+e))) · exp(-6ζ) + O(6) ]
= c² / (c²+c(2c+e)+e(c+d)+8e(c+d))    (A.29)

The corresponding limit for state two is
lim p2* = c(2c+e) / (c²+c(2c+e)+e(c+d)+8e(c+d))    (A.30)

and, for states three and six,

lim p3* = e(c+d) / (c²+c(2c+e)+e(c+d)+8e(c+d))    (A.31)
lim p6* = 8e(c+d) / (c²+c(2c+e)+e(c+d)+8e(c+d))    (A.32)

These four probabilities sum to one.

Case 11 - K_η = K_μ = K > 3

In this, the last of the 11 cases, states one, two and three are relevant; each has exponent 3+K, while state six has exponent 2K > 3+K. Let K_η = K_μ = K > 3.

lim p1* = lim [ -c²/(32(c+d)(2c+e)) · exp(-(3+K)ζ) + O(3+K) ] / [ (-c²/(32(c+d)(2c+e)) + (-c(2c+e)-e(c+d))/(32(c+d)(2c+e))) · exp(-(3+K)ζ) + O(3+K) ]
= c² / (c²+c(2c+e)+e(c+d))    (A.33)

lim p2* = c(2c+e) / (c²+c(2c+e)+e(c+d))    (A.34)
lim p3* = e(c+d) / (c²+c(2c+e)+e(c+d))    (A.35)

These three probabilities sum to one. With these 11 cases, the proof is complete. ∎

Appendix B - Proofs for Chapter 4

Proof of Lemma 1

The asset demands in (4.4), characterized by the learning states, determine the aggregate demand. Equating demand with aggregate supply determines the equilibrium price.

Σ_{i∈I} (β0ⁱ + β1ⁱ y - P) / (a(σ_ε² + γⁱ)) + Σ_{u∈U} (β0ᵘ + β1ᵘ P - P) / (a(σ_ε² + γᵘ)) = N·e    (B.1)

Note that there are λ·N informed agents and (1-λ)·N uninformed agents. The following definitions are useful.

T^I = (1/(λN)) Σ_{i∈I} 1/(a(σ_ε² + γⁱ)),   T^U = (1/((1-λ)N)) Σ_{u∈U} 1/(a(σ_ε² + γᵘ))    (B.2)

T^I and T^U are the average (or aggregate) effective risk tolerances of the informed and the uninformed agents. Average β's can be defined by weighting the β's by the effective risk tolerance. For j = 0, 1:

β_j^I = (1/(T^I λN)) Σ_{i∈I} β_jⁱ/(a(σ_ε² + γⁱ)),   β_j^U = (1/(T^U (1-λ)N)) Σ_{u∈U} β_jᵘ/(a(σ_ε² + γᵘ))    (B.3)

Using these definitions, (B.
1) can be stated as

λT^I (β0^I + β1^I y - P) + (1-λ) T^U (β0^U + β1^U P - P) = e    (B.4)

Solving for the equilibrium price, where

a0 = [λβ0^I T^I + (1-λ)β0^U T^U] / [λT^I + (1-λ)T^U(1-β1^U)]
a1 = λβ1^I T^I / [λT^I + (1-λ)T^U(1-β1^U)]    (B.5)
a2 = -1 / [λT^I + (1-λ)T^U(1-β1^U)]

yields (4.10). ∎

Proof of Lemma 2

For the informed agents, (4.11) follows directly from (4.1) and (4.2). For the uninformed, (4.12) follows from the linearity of the equilibrium price in (4.10) and the joint normality of the random variables. Note that

E[P] = a0,   σ_P² = a1²σ_y² + a2²σ_e²,   σ_yP = a1σ_y²    (B.6)

These relationships determine E[y|P] and V[y|P] (see Morrison, 1990).

E[y|P] = E[y] + (σ_yP/σ_P²)(P - E[P]) = [a1σ_y² / (a1²σ_y² + a2²σ_e²)] (P - a0)    (B.7)

V[y|P] = σ_y² - σ_yP²/σ_P² = a2²σ_e²σ_y² / (a1²σ_y² + a2²σ_e²)    (B.8)

The E[y|P] and V[y|P] are combined with the dividend and signal relationships in (1) and (2) to yield the rational learning state. ∎

Proof of Lemma 3

Consistent with the analysis in Section 4.III, Chapter 5, and the genetic algorithm simulations in Chapter 6, for concreteness let β1* = 1 (the analysis for an arbitrary β1* proceeds similarly). In the rational expectations equilibrium all agents use rational learning states, ℓⁿ = ℓ*. Using the rational learning states described in Lemma 4 and the definitions in (B.2) and (B.3), in the rational expectations equilibrium, β_j^I = β_j* and β_j^U = β_j^u* for j = 0, 1, and

T^U = 1 / [a(σ_ε² + a2*²σ_e²σ_y² / (a1*²σ_y² + a2*²σ_e²))]    (B.9)

The equilibrium price (for any population learning state) is given by (4.10) and (B.5). Inserting the above values for β0^I, β1^I, T^I, β0^U, β1^U and T^U yields the following set of linear equations.

a0* = [λβ0*T^I + (1-λ)β0^U T^U] / [λT^I + (1-λ)T^U(1-β1^U)]
a1* = λT^I / [λT^I + (1-λ)T^U(1-β1^U)]    (B.10)
a2* = -1 / [λT^I + (1-λ)T^U(1-β1^U)]

The unique solution to these equations is

a0* = β0* + ___ / (λT^I + (1-λ)T^U)
a1* = [λT^I(σ_y² + k²σ_e²) + (1-λ)T^U σ_y²] / [(λT^I + (1-λ)T^U)(σ_y² + k²σ_e²)]    (B.11)
a2* = k·a1*

where k = -1/(λT^I). These define the rational expectations equilibrium price.
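The conditional-moment formulas (B.7) and (B.8) are the standard projection formulas for jointly normal variables. A quick Monte Carlo check, with illustrative coefficients rather than the model's equilibrium values, confirms them:

```python
import numpy as np

# Monte Carlo check of the projection formulas (B.7)-(B.8): for jointly
# normal (y, P) with P = a0 + a1*y + a2*e, the regression of y on P has
# slope cov(y,P)/var(P) and residual variance var(y) - cov(y,P)^2/var(P).
# The coefficients a0, a1, a2 and the variances are made-up illustrative
# values, not the model's equilibrium quantities.
rng = np.random.default_rng(0)
a0, a1, a2 = 0.5, 1.2, -0.8
var_y, var_e = 1.0, 0.5

y = rng.normal(0.0, np.sqrt(var_y), 200_000)
e = rng.normal(0.0, np.sqrt(var_e), 200_000)
P = a0 + a1 * y + a2 * e                 # linear price, as in (4.10)

var_P = a1**2 * var_y + a2**2 * var_e    # (B.6)
cov_yP = a1 * var_y
cond_var = var_y - cov_yP**2 / var_P     # (B.8)

slope = np.cov(y, P)[0, 1] / np.var(P)   # sample analogue of the (B.7) slope
resid = y - slope * (P - P.mean())
assert abs(slope - cov_yP / var_P) < 0.01
assert abs(np.var(resid) - cond_var) < 0.01
```

The residual variance is strictly smaller than var(y) whenever a1 is non-zero, which is why observing the price is informative for the uninformed agents.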
∎

Proof of Proposition 1

From the linear pricing equation (4.10) and the joint normal distribution, the conditional distribution of y given P is normally distributed with mean μ(P) given in (B.7) and variance given in (B.8). The proof of the proposition proceeds by calculating three quantities: f^u(ℓⁿ,L|P), f^u(ℓ*,L|P), and f^i(ℓⁿ,L|P).

(i) f^u(ℓⁿ,L|P)

The risky asset demands of the uninformed follow from (4.4) and the parameters in ℓⁿ:

x^u = (Eⁿ[d|P] - P)/(a Vⁿ[d|P]) = (β0ⁿ + β1ⁿP - P)/(a(σ_ε² + γⁿ)) = (β0* + μ(P) - P + ξ(P)) / (a(σ_ε² + σ²))    (B.12)

where ξ(P) = (β0ⁿ - β0*) + (β1ⁿ - β1*)P collects the inference errors. Note the use of the assumption that γᵘ = γⁱ = σ².

End-of-period wealth for the uninformed agent follows from (4.3) and is

W1ⁿ = W0ⁿ + [(β0* + μ(P) - P + ξ(P)) / (a(σ_ε² + σ²))] (β0* + y + ε - P)    (B.13)

End-of-period wealth is normally distributed with conditional mean and variance of

E[W1ⁿ|P] = W0ⁿ + [(β0* + μ(P) - P + ξ(P)) / (a(σ_ε² + σ²))] (β0* + μ(P) - P)    (B.14)

V[W1ⁿ|P] = (β0* + μ(P) - P + ξ(P))² / (a²(σ_ε² + σ²))    (B.15)

These conditional moments determine the utility (using the CARA preferences and normality):

f^u(ℓⁿ,L|P) = E[-exp(-a W1ⁿ) | P] = -exp(-a [E[W1ⁿ|P] - (a/2)V[W1ⁿ|P]])
= -exp(-a W0ⁿ) exp(-[(β0* + μ(P) - P)² - ξ(P)²] / (2(σ_ε² + σ²)))    (B.16)

(ii) f^u(ℓ*,L|P)

Note that ξ(P) = 0 (for all P) when ℓⁿ = ℓ*(L). Therefore

f^u(ℓ*,L|P) = -exp(-a W0ⁿ) exp(-(β0* + μ(P) - P)² / (2(σ_ε² + σ²)))    (B.17)

and

f^u(ℓⁿ,L|P) = exp(ξ(P)² / (2(σ_ε² + σ²))) f^u(ℓ*,L|P)    (B.18)

(iii) f^i(ℓⁿ,L|P)

Note that the assumption that the informed agents know the β*'s in (4.1) implies that f^i(ℓⁿ,L|P) = f^i(ℓ*,L|P). Following steps similar to those leading to (B.16), the informed agent's expected utility conditional on both y and P is

E[U(W1ⁿ)|y,P] = -exp(-a [E[W1ⁿ|y,P] - (a/2)V[W1ⁿ|y,P]])
= -exp(-a W0ⁿ) exp(a c) exp(-(β0* + y - P)² / (2σ_ε²))    (B.19)

Taking expectations over y given P in (B.19) will yield f^i(ℓⁿ,L|P). As noted above, the conditional distribution of y given P is normal.
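Two normal-exponential identities do the work here: the CARA-normal formula behind (B.16), E[-exp(-aW)] = -exp(-a(E[W] - (a/2)V[W])), and the completing-the-square formula for E[exp(-(t/2)X²)] used in the next step. A Monte Carlo sketch with illustrative parameters, none of them taken from the model, checks both:

```python
import math
import random

random.seed(2)
n = 400_000

# (1) CARA-normal identity: for W ~ N(m, v),
#     E[-exp(-a W)] = -exp(-a (m - (a/2) v)).
a, m, v = 1.5, 0.3, 0.5
mc_cara = -sum(math.exp(-a * random.gauss(m, math.sqrt(v)))
               for _ in range(n)) / n
exact_cara = -math.exp(-a * (m - 0.5 * a * v))
assert abs(mc_cara - exact_cara) < 0.02

# (2) Completing the square: for X ~ N(mu, s^2),
#     E[exp(-(t/2) X^2)] = exp(-t mu^2 / (2 (1 + t s^2))) / sqrt(1 + t s^2).
mu, s, t = 0.4, 1.3, 0.7
mc_sq = sum(math.exp(-0.5 * t * random.gauss(mu, s) ** 2)
            for _ in range(n)) / n
exact_sq = math.exp(-t * mu**2 / (2 * (1 + t * s**2))) / math.sqrt(1 + t * s**2)
assert abs(mc_sq - exact_sq) < 0.02
```

Identity (2), applied with the conditional moments of y given P, is the scalar form of the step that produces the variance-ratio factor in (B.20) below.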
The calculation uses the normal distribution and "completes the square" (see also Grossman and Stiglitz (1980), page 406).

f^i(ℓⁿ,L|P) = -exp(-a W0ⁿ) exp(a c) [σ_ε²/(σ_ε² + σ²(L))]^(1/2) exp(-(β0* + μ(P) - P)² / (2(σ_ε² + σ²(L))))
= exp(a c) [σ_ε²/(σ_ε² + σ²(L))]^(1/2) f^u(ℓ*,L|P)    (B.20)

This completes the proof. ∎

Proof of Proposition 2

The proof uses the conditional utilities calculated in Proposition 1 to calculate the unconditional utilities. First, by inspection of (B.16) and (B.20), note that initial wealth W0 will have no impact on the relative utilities. Therefore, without loss of generality, set W0 = 0. Define

D = exp(a c) [σ_ε²/(σ_ε² + σ²(L))]^(1/2)    (B.21)

Note that this value does not depend on P. Using the variables defined in equations (4.17) to (4.20) and (B.21), write (B.16) as

f^u(ℓⁿ,L|P) = -exp(-(1/2)(X² - Ξ²))    (B.22)

and (B.20) as

f^i(ℓⁿ,L|P) = -D exp(-(1/2)X²)    (B.23)

Note that X is a linear function of P (the conditioning information in Proposition 1). In fact, by construction, Ξ(P) = A + BX. Since P and X are perfectly correlated, f^u(ℓⁿ,L|P) = f^u(ℓⁿ,L|X) and f^i(ℓⁿ,L|P) = f^i(ℓⁿ,L|X). Since P is normally distributed (a linear function of normally distributed variables; see (4.10)), X is also normal. Using the technique of completing the square commonly used when calculating expectations of exponentials of normals, the expectation over X is as follows.

E[-exp(-(1/2)(X² - (A+BX)²))] = -(1 + C(1-B²))^(-1/2) exp( [A²(1+C) + 2tAB - (1-B²)t²] / (2(1 + C(1-B²))) )    (B.24)

Note that this is the equation for E[f^u(ℓⁿ,L|X)]. Multiplying (B.24) by D and setting A = B = 0 yields E[f^i(ℓⁿ,L|X)]. Normalizing f^i(ℓⁿ,L) = 1 and calculating

f^u(ℓⁿ,L) = E[f^u(ℓⁿ,L|X)] / E[f^i(ℓⁿ,L|X)]    (B.25)

using (B.24) completes the proof.¹ ∎

Proof of Corollary 1

Since ℓⁿ = ℓ* for all n agents, ξ(P) = 0. Therefore, in (4.16), A = B = 0. The rational expectations equilibrium proportion of informed agents is where agents are indifferent between being informed and uninformed, that is, f^u = f^i = 1. The λ* follows from (4.16) with A = B = 0 and the definition of σ²(L) in (B.8) and the rational expectations a2* in (B.
11).² ∎

¹ Note that the ratio takes account of the fact that the absolute levels of the utilities are negative.
² Integer constraints are ignored here. In the genetic algorithm simulations of Chapter 6, λ* = 0.5 and N is even, so integer constraints do not play a role.

Appendix C - Proofs for Chapter 5

Proof of Lemma 1

(a) From inspection of equation (4.16) in Chapter 4, for f^u ≥ 1, f^u is continuous in λ and ℓ.¹ Thus, since the set {f^u ≥ 1} is closed, so is G(E) (i.e., the inverse image of a closed set under a continuous function is closed).

(b) By the definition of the GS equilibrium, (λ*, ℓ**(λ*)) ∈ G(E). Therefore G(E) is not empty.

(c) Note that at ℓ**(λ) uninformed agents are making no inference errors (by definition) and, since λ > λ*, f^u(λ, ℓ**(λ)) > 1. Since f^u is jointly continuous in λ and in the inference parameters, there exists ε̄ such that for all (ε_λ, ε_ℓ) ∈ R³, f^u(λ+ε_λ, ℓ**(λ)+ε_ℓ) > 1 for λ > λ* and ‖(ε_λ, ε_ℓ)‖ < ε̄.

(d) By construction of the GS equilibrium, f^u(λ*, ℓ**(λ*)) = 1. Since inference errors reduce utility, for all ε_ℓ (with ‖ε_ℓ‖ > 0), f^u(λ*, ℓ**(λ*)+ε_ℓ) < 1. Again, by construction of the GS equilibrium, for λ < λ*, f^u(λ, ℓ) ≤ f^u(λ, ℓ**(λ)) < 1. ∎

Proof of Lemma 2

For λ ∈ [λ*, 1], define the correspondence G_λ(E) as

G_λ(E) = {ℓ | (λ,ℓ) ∈ G(E)} = {ℓ | f^u(λ,ℓ;E) ≥ 1}    (C.1)

Note that by Lemma 1(b), G_λ(E) is non-empty (and therefore is a correspondence). Since f^u is continuous in ℓ and the set {f^u ≥ 1} is closed, the graph of the correspondence G_λ(E) is closed. Therefore G_λ(E) is upper hemi-continuous.²

In the economy where σ_e = 0, that is, E = (0,λ), f^u(λ,ℓ;E) ≤ 1, with equality only if ℓ = ℓ**(λ). Therefore G_λ(0,λ) is the singleton {ℓ**(λ)}.

Consider sequences E_t, with σ_e,t > 0 and constant λ, such that E_t → (0,λ). Take any two sequences ℓ_t → ℓ and ℓ_t' → ℓ' such that ℓ_t, ℓ_t' ∈ G_λ(E_t). Since G_λ is upper hemi-continuous, ℓ, ℓ' ∈ G_λ(0,λ). But since G_λ(0,λ) is a singleton, ℓ = ℓ' = ℓ**(λ) and ‖ℓ_t - ℓ**(λ)‖ + ‖ℓ_t' - ℓ**(λ)‖ → 0. By the triangle inequality, ‖ℓ_t - ℓ_t'‖ ≤ ‖ℓ_t - ℓ**(λ)‖ + ‖ℓ_t' - ℓ**(λ)‖, therefore ‖ℓ_t - ℓ_t'‖ → 0.
Since this is true for any two sequences, the diameter of G_λ(E_t) goes to zero. ∎

¹ From the remarks which follow equation (4.16), note that f^u is only discontinuous when the inference error in the slope is large. However, this occurs only when f^u approaches zero.
² See Debreu (1954), Theorem 1.8(1), page 18. A correspondence h: Q → R associates each element x ∈ Q with a non-empty subset of elements in R. h is upper hemi-continuous (or upper semi-continuous) if x_t → x0, y_t ∈ h(x_t) and y_t → y0 implies y0 ∈ h(x0).

Proof of Proposition 1

(a) First consider possible dynamic equilibria such that λ ∈ (0,1). By assumption, g_λ ≥ 0 ⇔ (λ,ℓ) ∈ G. Given (5.4), the continuity of g_λ and the properties of the closed set G, for λ ∈ (0,1), g_λ = 0 if and only if (λ,ℓ) is on the border of G. (Throughout this appendix, ∂G indicates the border of the set G.) Similarly, for λ ∈ (0,1), g_ℓ = 0 ⇔ (λ,ℓ) ∈ H (i.e., ℓ = ℓ**(λ)). From Lemma 1(c) and (d), the only intersection of the border of G with H is where λ = λ*. Therefore, for λ ∈ (0,1), the unique dynamic equilibrium is (λ*, ℓ**(λ*)). To verify that this equilibrium is stable, note that g_λ > 0 for λ < λ* (Lemma 1(d)) and g_λ < 0 in a neighbourhood of ℓ**(λ) for λ > λ* (Lemma 1(c)). This, coupled with the continuity of the learning process and the convergence assumption on g_ℓ, establishes stability.³

(b) Next consider λ = 1. By assumption, all points (1,ℓ) are fixed points of the learning dynamic. However, for (1,ℓ) ∈ G, g_λ(1-ε,ℓ) < 0 (for ε > 0)⁴ and these fixed points are not stable.⁵ For (1,ℓ) ∉ G, g_λ(1-ε,ℓ) > 0. Thus, for small enough ε and λ0 = 1-ε, lim λ_t = 1. By assumption, g_ℓ is small for λ close to 1. Thus, for ℓ0 = ℓ, lim_t ℓ_t can be made arbitrarily close to ℓ. Thus (1,ℓ) ∉ G is stable.⁶

(c) Finally, consider λ = 0. Since g_λ(ε,ℓ) > 0 (for small positive ε), any equilibria with λ = 0 are not stable. ∎

Proof of Lemma 3

For any ℓ, (1,ℓ) ∉ G' follows immediately from the assumptions that g_λ(1,ℓ) = 0 and m_λ(1,ℓ) < 0. The existence of ε̄ (which may depend on ℓ)
such that ε < ε̄ implies g_λ(1-ε,ℓ) + m_λ(1-ε,ℓ) < 0 is an immediate consequence of the continuity of g_λ and m_λ. ∎

Proof of Lemma 4

(a) By definition, (λ',ℓ') ∈ G' implies that g_λ(λ',ℓ') + η m_λ(λ',ℓ') ≥ 0. If g_λ(λ',ℓ') ≥ 0 then (λ',ℓ') ∈ G. On the other hand, if g_λ(λ',ℓ') < 0 then 0 < -g_λ(λ',ℓ') ≤ η m_λ(λ',ℓ'). By choosing η small enough, g_λ(λ',ℓ') is within ε' of g_λ(λ,ℓ) = 0. By the continuity of g_λ, (λ',ℓ') is within distance ε of (λ,ℓ).

³ For additional details, see the proof of equilibrium existence for learning processes including drift in Proposition 2.
⁴ More formally, this can be stated as: there exists an ε̄ > 0 such that g_λ(1-ε,ℓ) < 0 for all 0 < ε < ε̄. Since G need not be a convex set, the sign of g_λ may be positive for larger ε.
⁵ For the case where (1,ℓ) is on the border of G, (1-ε,ℓ) may or may not be in G. If (1-ε,ℓ) ∈ G, then (1,ℓ) is clearly not stable. However, if (1-ε,ℓ) ∉ G then g_λ(1-ε,ℓ) > 0. However, since ℓ ≠ ℓ**(1) (Lemma 1(c)), g_ℓ is non-zero and the stability depends on the specific properties of g_ℓ (i.e., if g_ℓ implies monotonic convergence to ℓ**, then boundary points are clearly not stable).
⁶ The points (1,ℓ) ∉ G are not asymptotically stable since the equilibria (1,ℓ) are connected.

(b) (λ,ℓ) ∈ G implies that g_λ(λ,ℓ) ≥ 0. If g_λ(λ,ℓ) + η m_λ(λ,ℓ) ≥ 0, then (λ,ℓ) ∈ G'. If not, then 0 ≤ g_λ(λ,ℓ) < -η m_λ(λ,ℓ). Thus, for small enough η, by the continuity⁷ of g_λ and m_λ, there exists (λ',ℓ') within distance ε of (λ,ℓ) such that g_λ(λ',ℓ') + η m_λ(λ',ℓ') ≥ 0. ∎

Discussion: Sufficient Conditions for Non-empty h(λ)

There are at least two ways to rationalise the assumption that h(λ) is a function (i.e., that the solution to g_ℓ + η m_ℓ = 0 for fixed λ exists, is unique and is asymptotically stable). The first is to utilize the smallness of η. If we write z_λ(ℓ,η) = g_ℓ(λ,ℓ) + η m_ℓ(λ,ℓ), then, by the assumption that g_ℓ leads to the rational expectations inference parameters holding λ constant, z_λ(ℓ**(λ),0) = 0. Since ℓ**(λ) is asymptotically stable, the determinant of the Jacobian of z at this point is non-zero.
Therefore, by the implicit function theorem, a function ℓ(η) exists such that z_λ(ℓ(η),η) = 0 for small η. Therefore, if attention is restricted to λ which are not close to 1 and to small η, then h(λ) exists and is unique. However, when λ approaches 1, g_ℓ → 0 and the implicit function theorem can no longer be applied (i.e., the Jacobian becomes zero). However, since m_ℓ(1,ℓ_m) = 0, a similar argument (for fixed η > 0) can be constructed at λ = 1. The implicit function argument will not work for all λ, since there is an "in-between" λ where g_ℓ and η m_ℓ are of the same order of magnitude.

The second approach is to make additional assumptions directly on g_ℓ and m_ℓ to ensure that h(λ) is non-empty. For example, assume that both g_ℓ and m_ℓ are monotonic (in each element of ℓ) as follows:

g_ℓ(λ,ℓ) = Ψ_g(λ, ℓ**(λ) - ℓ)    (C.2)
m_ℓ(λ,ℓ) = Ψ_m(λ, ℓ_m - ℓ)    (C.3)

where the functions Ψ_g, Ψ_m: [0,1] × R² → R² are jointly continuous, increasing (term by term) in their second argument, and equal to zero if and only if the second argument is zero. Without loss of generality, assume that ℓ_m << ℓ**(λ) (where << means the inequality is true for both elements of ℓ: ℓ_m0 < ℓ0** and ℓ_m1 < ℓ1**).

Given these assumptions, for a given λ ∈ (0,1], the set h⁰(λ) = {ℓ | g_ℓ + η m_ℓ = 0} is not empty. This follows from the monotonicity and continuity assumed in (C.2) and (C.3). They imply that the system (holding λ constant) is always pushed into the box defined by ℓ** and ℓ_m. That is:

ℓ0 < ℓ_m0 implies g_ℓ0 > 0, m_ℓ0 > 0
ℓ0 > ℓ0** implies g_ℓ0 < 0, m_ℓ0 < 0    (C.4)

⁷ Note that g_ℓ + η m_ℓ is just a linear combination of two continuous functions, and is therefore also continuous.

ℓ1 < ℓ_m1 implies g_ℓ1 > 0, m_ℓ1 > 0
ℓ1 > ℓ1** implies g_ℓ1 < 0, m_ℓ1 < 0    (C.5)

Since the functions are continuous, it follows that h0(ℓ1) = {ℓ0 | g_ℓ0(ℓ0,ℓ1) + η m_ℓ0(ℓ0,ℓ1) = 0} is non-empty and (by Debreu (1954), Theorem 1.8(1)) is an upper hemi-continuous correspondence of ℓ1. Similarly, h1(ℓ0) = {ℓ1 | g_ℓ1(ℓ0,ℓ1) + η m_ℓ1(ℓ0,ℓ1) = 0} is a non-empty upper hemi-continuous correspondence of ℓ0. These two correspondences must cross at least once.
Using the monotonicity assumption, at least one of these crossings will be asymptotically stable.

Proof of Lemma 5

The graph of the set of dynamic equilibria,

h⁰(λ) = {ℓ | g_ℓ(λ,ℓ) + η m_ℓ(λ,ℓ) = 0}    (C.6)

is non-empty by assumption. This graph is closed, since g_ℓ + η m_ℓ is continuous and the inverse image of the closed set {0} under a continuous function is also closed. This implies that h⁰ is an upper hemi-continuous correspondence (see Debreu (1954), Theorem 1.8(1)). Since asymptotic stability is a refinement of a dynamic equilibrium, h(λ) ⊂ h⁰(λ). However, since both sets (by assumption) are singletons, h(λ) = h⁰(λ) and h is also an upper hemi-continuous correspondence. Since h(λ) is also a function, upper hemi-continuity implies it is continuous. ∎

Proof of Lemma 6

h(1) = ℓ_m follows from the assumptions that g_ℓ(1,ℓ) = 0 for all ℓ and m_ℓ(1,ℓ) = 0 only at ℓ = ℓ_m. Therefore, h⁰(1) = {ℓ_m}, where h⁰(λ) is defined in (C.6). Since h(λ) = h⁰(λ) (by assumption), h(1) = {ℓ_m}. ∎

Proof of Lemma 7

For λ < 1, g_ℓ(λ,ℓ) = 0 if and only if ℓ = ℓ**(λ), and m_ℓ(λ,ℓ) = 0 if and only if ℓ = ℓ_m. However, by assumption, ℓ_m ≠ ℓ**(λ). Thus h(λ) ≠ ℓ**(λ). ∎

Proof of Lemma 8

Define K(λ) = sup_ℓ ‖m_ℓ(λ,ℓ)‖. Since m_ℓ is bounded, K(λ) < ∞. For any λ ∈ (0,λ″), g_ℓ(λ,ℓ**(λ)) = 0 and g_ℓ(λ,ℓ) + η m_ℓ(λ,ℓ) = 0 where ℓ = h(λ). Therefore, ‖g_ℓ(λ,ℓ) - g_ℓ(λ,ℓ**(λ))‖ = η‖m_ℓ(λ,ℓ)‖ ≤ ηK(λ). By choice of η, ηK(λ) can be made arbitrarily small. By the continuity of g_ℓ, ‖ℓ - ℓ**(λ)‖ < ε. ∎

Prelude to Proof of Proposition 2

The following definition and lemma are useful for showing the existence of a stable dynamic equilibrium for a learning process. These definitions are with respect to the general autonomous (i.e.
time-independent) system of equations for x ∈ Rⁿ, dx/dt = g(x), where L(x0,t) is the unique continuously differentiable solution.⁸ Note that the learning process considered in the chapter in (5.1) and (5.2) is of this form.

Definition: The set V is an invariant set if all trajectories which begin in V remain in V.

Lemma E1: If x* is a stable dynamic equilibrium, then for every neighbourhood Q of x*, there exists an invariant set V ⊂ Q.

Proof of Lemma E1

Since x* is stable, there exists a neighbourhood R such that x0 ∈ R implies all x_t ∈ Q. Define the set V = {L(x0,t) | x0 ∈ R, t ≥ 0}. The set V is all the points of all the paths which originate in R. By construction, V ⊂ Q. By construction, x ∈ V implies there exist some x0 ∈ R and τ ≥ 0 such that x = L(x0,τ) and L(x0,t) ∈ V for all t ≥ τ. Time independence implies that L(x0,t) = L(x,t-τ). Therefore L(x,t) ∈ V for all t ≥ 0 and V is invariant. ∎

Remark: V need not be a neighbourhood of x* even though it is contained in the neighbourhood Q.

Proof of Proposition 2

The proof shows that the intersection of h(λ) (where dℓ/dt = 0) and the boundary of G', denoted ∂G' (where dλ/dt = 0), is non-empty, and that at least one of the intersections is stable.

For small ε, (ε,h(ε)) ∈ G' (Lemma 1(d) implies g_λ(ε,ℓ) > 0 and (5.12) implies m_λ(ε,ℓ) > 0), and (1,h(1)) ∉ G' (by Lemma 3). Therefore the infimum

λ' = inf{λ | (λ,h(λ)) ∉ G'},   ℓ' = h(λ')    (C.7)

satisfies 0 < λ' < 1. Since h(λ) is continuous, (λ',h(λ')) ∈ ∂G' (otherwise (λ'-ε, h(λ'-ε)) ∉ G' for small ε, which contradicts that λ' is the infimum). Therefore (λ',ℓ') is a dynamic equilibrium. It remains to be shown that this point is stable.

To show that (λ',ℓ') is stable, consider any arbitrary neighbourhood Q' ⊂ R³ of (λ',ℓ'). The following will construct the neighbourhood R' ⊂ Q' with the property that all trajectories originating in R' must remain in Q'. To begin, without loss of generality, let Q' = (λ'-Δ, λ'+Δ) × Q, where Δ > 0 and Q ⊂ R² is a neighbourhood of ℓ'. The construction
The constructionof R’ is such that R’=AxR where A=(X’-½e,X’+½E) with 0<e< and R is aneighbourhood of £‘ with R C Q.Let the solution to dt/dt=g(X,f)+j2m(X,1?) for a fixed and constant X be denotede=L(X,e0,t). Note that L is continuously differentiable in each of its parameters. RecallFor the existence and uniqueness of G is sufficient that g is continuously differentiable (bence Lipshitz Continuous).- 263 -Learning, Evolution and S4fOrganization Appendir Cthat the unique asymptotically stable fixed point of this system is given by the continuousfunction h(X). Since h(X) is continuous, there exists an E >0 such that h(X) ER for all XE A.Furthermore, since each h(X) is also a stable dynamic equilibrium (when X is held constant),there exist sets U(X), neighbourhoods of h(X), such that tE U(X) implies that L(X,f0,t)EQfor all t 0. Since each U(X) is open, h(X) is continuous and € >0 can be chosen arbitrarilysmall, there exists an open set U (a neighbourhood of V =h(X’)) such that for all XE A,h(X)E U and £0E U implies that L(X,10,t)ER for all t0. For example, U could be theintersection of the appropriately chosen U(X).For each XEA, an invariant set V(X)CR can be constructed as in Lemma El,V(X)={L(X,10,t)It0 U, t0}. Consider the intersection between any two of these setsV(X) fl V(X+e). The following will show that this intersection is an invariant set under bothL(X,•,•)andL(X+€,•,•).Consider 4? C V(X) fl V(X+c). It follows:(i) By definition 1? C V(X) implies there exists £0E U and T such that X=L(X,4?0,r).(ii) Similarly, 4? C V(X+€) implies there exists £E U and TE such that £ =L(X+€,4?,r).(iii) Since V(X) is an invariant set, L(X,f0,t)E V(X) for all t0.(iv) Similarly, since V(X+€) is an invariant set, L(X+€,t,t)E V(X+€) for all t0.(v) Since L is (continuously) differentiable, the implicit function theorem can be applied9and there exists (t) such thatL(X,e,t)=L(X+€,t+r,t). By choice of small €,can be made arbitrarily small.’° Since U is open, £+(t)E U. 
Since this is true for any t, L(λ,ℓ0,t) ∈ V(λ+ε) for all t ≥ 0.

(vi) Similarly, there exists a small δ'(t) such that L(λ+ε,ℓ0',t) = L(λ,ℓ0'+δ'(t),t) and ℓ0'+δ'(t) ∈ U. Therefore L(λ+ε,ℓ0',t) ∈ V(λ) for all t ≥ 0.

Taken together, these points imply that L(λ,ℓ0,t) ∈ V(λ) ∩ V(λ+ε) and L(λ+ε,ℓ0',t) ∈ V(λ) ∩ V(λ+ε). The intersection of the invariant sets is also invariant. Let V be the intersection of all the invariant sets. That is,

V = ∩_{λ∈A} V(λ)    (C.8)

Since (i) to (vi) are true for the intersection of two arbitrary sets which differ in the parameter by ε, it follows that for all λ ∈ A, ℓ ∈ V implies that L(λ,ℓ,t) ∈ V for all t ≥ 0.

⁹ The implicit function theorem applies so long as the determinant of the Jacobian of L, ∇L(λ,h(λ),t), is non-zero. This follows from the fact that h(λ) is the unique fixed point and that it is asymptotically stable.
¹⁰ Note that since the h(λ) are asymptotically stable, the set U can be chosen to ensure that trajectories converge. Therefore lim_t δ(t) = h(λ) - h(λ+ε), which is small by the continuity of h(λ).

We have constructed an invariant set V (such that V ⊂ R ⊂ Q) for the inference parameters. As long as λ_t ∈ A, ℓ_t ∈ V. It remains to be shown that R and A (i.e., the choice of ε) can be chosen to ensure that λ_t ∈ A. By construction, the dynamic equilibrium (λ',ℓ') is on the boundary of the closed set G' such that (λ'-ε,ℓ') is an interior point of G' and (λ'+ε,ℓ') ∉ G'. Therefore, for a small enough neighbourhood R of ℓ', ℓ ∈ R and λ'-ε < λ < λ' implies (λ,ℓ) ∈ G' and dλ/dt ≥ 0; similarly, ℓ ∈ R and λ' < λ < λ'+ε implies (λ,ℓ) ∉ G' and dλ/dt < 0.

Therefore, for any neighbourhood Q' of (λ',ℓ'), there exists the neighbourhood A × U such that (λ0,ℓ0) ∈ A × U implies (λ_t,ℓ_t) ∈ A × V and, by construction, A × V ⊂ (λ'-Δ, λ'+Δ) × Q = Q'. ∎

Proof of Theorem 1

Consider the learning process L(λ″,Δ″,t) with λ″ = 1-Δ″ > λ* and ℓ_m = ℓ**(1) - (Δ″,Δ″).¹¹ Using Lemma 3 and the continuity of h (Lemma 5), for small Δ″ and for λ ≥ λ″, (λ,h(λ)) is an interior point of the complement of G'.
Therefore there are no dynamic equilibria of L with λ ≥ λ″ (i.e., no intersections of h(λ) and ∂G'). Therefore, applying Proposition 2 (existence of a dynamic equilibrium), there exists at least one dynamic equilibrium with λ < λ″. Recall that the intersection of ℓ**(λ) and ∂G is unique and is the GS equilibrium (Proposition 1). Since, for λ < λ″, ‖h(λ) - ℓ**(λ)‖ < Δ″ (see Lemma 8), Δ″ can be chosen small enough to ensure that sup{‖(λ,ℓ) - (λ*,ℓ**(λ*))‖ : (λ,ℓ) ∈ ∂G ∩ h(λ)} < ε. Lemma 4 implies that, for λ < λ″ and (λ',ℓ') ∈ ∂G', inf{‖(λ',ℓ') - (λ,ℓ)‖ : (λ,ℓ) ∈ ∂G} < Δ″ (i.e., the boundary of G is close to the boundary of G'). Therefore, by the triangle inequality, if (λ,ℓ) is a dynamic equilibrium (i.e., in h(λ) ∩ ∂G'), then ‖(λ,ℓ) - (λ*,ℓ**(λ*))‖ < Δ″ + ε < Δ for small enough Δ″. Therefore, all the dynamic equilibria of L(λ″,Δ″,t) are within Δ of the GS equilibrium. The same is also true of learning processes with less drift, namely λ' ≥ λ″, Δ' ≤ Δ″ and ‖ℓ_m' - ℓ**(1)‖ ≤ ‖ℓ_m - ℓ**(1)‖. ∎

¹¹ This assumes that, for all λ, ℓ**(λ) ≠ ℓ**(1) - (Δ″,Δ″). This is without loss of generality, since the choice of ℓ_m can be perturbed slightly if required.

Proof of Theorem 2

It is sufficient to show that, for small σ_e, G' and h(λ) do not intersect for λ < λ″. From Lemma 2, G(E) approaches the set {(λ,ℓ**(λ)) | λ ≥ λ*} as σ_e goes to zero (holding λ constant). From Lemma 7, h(λ) ≠ ℓ**(λ). Therefore, there exists σ̄ such that for all σ_e < σ̄, h(λ) and G do not intersect. The fact that h(λ) and G' do not intersect for λ < λ″ in economies with σ_e < σ̄ follows from the property that λ-drift does not exceed inference drift. That is, the property that

inf{‖h(λ) - ℓ**(λ)‖} > sup{‖(λ,ℓ) - (λ,ℓ')‖ : (λ,ℓ) ∈ ∂G, (λ,ℓ') ∈ ∂G'}    (C.9)

Therefore there are no equilibria with λ < λ″. Any equilibria (at least one exists) must have the property that λ ≥ λ″. ∎

... The End ...
Item Metadata
- Title: Finance theory applications of learning, evolution and self-organization
- Creator: Routledge, Bryan R.
- Date Issued: 1995
- Extent: 7598786 bytes
- Genre: Thesis/Dissertation
- Type: Text
- File Format: application/pdf
- Language: eng
- Date Available: 2009-04-24
- Provider: Vancouver : University of British Columbia Library
- Rights: For non-commercial purposes only, such as research, private study and education. Additional conditions apply; see Terms of Use at https://open.library.ubc.ca/terms_of_use.
- DOI: 10.14288/1.0088380
- URI: http://hdl.handle.net/2429/7553
- Degree: Doctor of Philosophy - PhD
- Program: Business Administration
- Affiliation: Business, Sauder School of
- Degree Grantor: University of British Columbia
- Graduation Date: 1995-11
- Campus: UBCV
- Scholarly Level: Graduate
- Aggregated Source Repository: DSpace