FINANCE THEORY APPLICATIONS OF LEARNING, EvOLUTIoN AND SELF-ORGANIzATIoN by BRYAN R. R0UTLEDGE B.Comm (Hon), Queen’s University at Kingston, 1987 A THESIS SUBMITTED iN PARTIAL FULFILMENT OF THE REQUllEMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE FACULTY OF GRADUATE STUDiES FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA October 1995 © Bryan Ralph Routledge, 1995 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. (Signature) Department of ‘-) LT ( f( The University of British Columbia Vancouver, Canada Date DE-6 (2/88) I? CT CEc2 L99 1 Finance Theory Applications of Learning, Evolution and SeOrganization Abstract This thesis investigates stochastic adaptive learning and contrasts models of adaptive individuals with models that assume complete and unbounded rationality. In this thesis, individuals follow rules of thumb which are developed by adopting or copying successful rules, abandoning unsuccessful ones and occasionally creating novel rules. In fixed environments, these learning algorithms differ only slightly from complete rationality Bayesian approaches. However, in situations where an individual’s utility depends on the behaviour of other individuals, a co-evolutionary environment, the prediction of adaptive learning models can differ markedly from traditional complete rationality models. In the first section of the thesis interaction between individuals’ decisions is limited and direct. People face their neighbours in a repeated prisoner’s dilemma. A genetic algorithm, used as an example of a stochastic adaptive learning process, is developed in Chapter 2. The rate of learning in the algorithm is controlled by altering the number of individuals obtaining new strategies in a generation. In the infinitely repeated game the learning rate affects the equilibrium level of payoffs (ie. affects which equilibria are selected). In the finitely repeated game the learning rate determines whether or not the system converges to the unique Nash equilibrium. Chapter 3 considers a similar model analytically yielding analogous results. The second portion of the thesis investigates stochastic adaptive learning — II — in a non-strategic yet co-evolutionary environment. This section develops an asymmetric information, one-period, single risky asset portfolio choice model based on Grossman and Stiglitz (1980). The main finding of these three chapters is that the appropriateness of the rational expectations (or complete rationality) equilibrium depends upon the level of noise in the economy (in the form of noise traders) relative to the level of experimentation in the individual’s learning processes. The discussion of this relationship begins in Chapter 4 by constructing a learning process which converges to the rational expectations equilibrium and concludes with a discussion of the stability of the Grossman-Stiglitz equilibrium to an adaptive learning process where experimentation does not vanish. Chapter 5 develops a deterministic representation of a stochastic adaptive learning process to formally develop the link between noise trading, experimentation and the Grossman-Stiglitz equilibrium. Finally, Chapter 6 demonstrates the stability result in a more general environment using a genetic algorithm as an example of a stochastic adaptive process. — 111 — Table of Contents Abstract ii Table of Contents iv List of Tables ix List of Figures x xvii Acknowledgements Chapter 1 Introduction I 1 Background on the Genetic Algorithm 7 II Adaptation in Spatial Environment 11 III Stochastic Adaptive Learning in a Financial Market 13 IV Conclusion 16 Chapter 2 Co-evolution and Spatial Interaction I 17 Background 22 II The Repeated Prisoner’s Dilemma Model and Evolution 25 (A) Agent Interaction 26 (B) Evolution 29 (C) Parametric Assumptions 34 - iv - Table of Co,Uenlz Learning, Evolution and S4fOrganization III Results 37 . (A) Infinitely Repeated Prisoner’s Dilemma 38 (B) Finitely Repeated Prisoner’s Dilemma Game 45 (i) The one-shot game (ii) The twice repeated game IV Discussion • • . • • .. . • V Conclusion . . . 45 47 56 60 Chapter 3 Simple Spatial Evolution I 84 86 Replicator Dynamics II Spatial Markov Model 92 ifi Conclusions 102 Chapter 4 Stochastic Adaptive Learning in a Rational Etpectations Model I Background 104 107 II Model 110 (A) Structure 110 (B) Learning 114 (C) “Rational” or Complete-Knowledge Learning States 118 -V. Learning, Evolution and Se!-Organization Table of oment.c III Learning, Equilibrium and Stability . 120 (A) Comparison of Informed and Uninformed Utility 120 (B) Stochastic Adaptive Learning Process 127 (C) Learning and Local Stability 129 (D) Stability with Always-Experimenting Stochastic Learning Adaptive 134 IV Conclusions 140 Chapter 5 The Stability of SAL: Deterministic Dynamics I Learning by Pure Adaptation . . . 142 147 II Driftiess Stable Dynamic Equilibria 152 III Learning with Adaptation and Drift 153 IV The Stability of the GS Equilibrium 158 V Conclusion 162 Chapter 6 Genetic Algorithm Learning in a Rational Expectations Model I 169 Simulation Design 171 II Simulation Results 176 (A) Stability Examples - Simulations One and Two - vi - 177 Table of Co,#enL Learning, Evolution and SefOrganization (B) Convergence Examples - Simulations Three and Four 181 ifi Extensions 183 IV Conclusions 185 Chapter 7 Conclusions 226 Bibliography Appendix A Appendix B Appendix C 233 - - - Proofs for Chapter 3 243 Proofs for Chapter 4 253 Proofs for Chapter 5 259 - vu - List of Tables Table 2.1 Table 2.2 - - Parameters for Simulation 35 Regression Results Data: 20 different initial populations and R=5,10,15,. .,95,100 used to generate . 400 data points for regression Table 6.1 Table 6.2 Table 6.3 Table B. 1 - - - - 40 Common Simulation Parameters 187 Simulation Specific Parameters and Results Summary 188 Common Simulation Parameters 189 - Genetic Algorithm Numbering of Cases There are 11 cases to consider. Note the final cell represents 3 cases depending on the relationship between K and K 248 - vial - List of Figures Figure 2.1 - The Repeated Prisoner’s Dilemma Game (a) Payoffs in the stage game (or one-shot) game. (b) Possible and equilibria per iteration payoffs in infinitely repeated game 27 Schematic for Evolution Simulation 30 Figure 2.2 Figure 2.3 - - Population Average Fitness in Final Generation Final Generation is calculated as either the generation where the population converged or generation 2000 Figure 2.4 - 63 Population Average Fitness in Final Generation (L= 10) Results for 20 different initial populations. R =5,10,15,... ,95, 100. Final Generation determined as in Figure 3 Figure 2.5 - 64 Population Average Fitness per Generation Time series plot for three different values for the number of agents replaced per generation. R=8,28,67. Same initial population Figure 2.6 - 65 Location and Strategy Evolution of the Agents Infinitely repeated Game with eight agents replaced. Each agent’s location (on the torus) is represented by a circle. Radius is proportional to fitness. Shading represents the percentage of C’s played the agent in tournament (10 move game against four neighbours or 40 moves) Figure 2.7 - 66 Difference in Average Fitness Between Converged and Generation 2000 - ix - Learning, Evolution and SetOrganizo4on List ofFigures There are 31 cases where the convergence in Figure 3 was not stable Figure 2.8 - 67 Average Fitness per Generation R=67 Note the population converges and then diverges several times. Insert provides a magnification of generations 860 to 890 68 Figure 2.9- Location and Strategy Evolution of the Agents Infmitely repeated game with 67 agents Replaced. Each agent’s location (on the torus) is represented by a circle. Radius is proportional to fitness. Shading represents the percentage of C’s played the agent in tournament (10 move game against four neighbours or 40 moves) 69 Figure 2.10- Location and Strategy Evolution of the Agents One-Shot Game with 1 agent replaced. Each agent’s location (on the torus) is represented by a circle. Radius is proportional to fitness. In this one-shot game, shading represents agents that play C as dark and D as white Figure 2.11 - 71 Population Average in Generation 5000 Population converged for R=11,44,47,50,51,...,100 Figure 2.12 - Population Average Fitness Statistics for Generations 5000 to 6000 (L=2) Frozen, Critical, Stationary and Converged regions are discussed in the text. Figure 2.13 - 72 . 73 All Possible Strategies for the Twice Repeated Game A string of 5 bits is used to represent these strategies. Strategy index numbers used in histograms which follow Figure 2.14- Location of Agents in a Frozen Evolution 74 List ofFigures Learning, Evolution and S4fOrganization The dark agents are the 8 least fit agents and receive new strategies each generation. 75 The other agents never change strategy Figure 2.15 - Population Average Fitness per Generation (R=8) Generation 5000 to 6000. An example of an evolution which is Frozen Figure 2.16 - 76 Strategies in Population in Generation 5000 White bars indicate initial population. See Figure 13 for Strategy Index Figure 2.17 - Population Average Fitness per Generation (R=34) Generation 5000 to 6000. An example of an evolution which is Stationary. Figure 2.18 - 76 . . . Strategies in Population in Generation 5000 White bars indicate initial population. See Figure 13 for Strategy Index Figure 2.19 - - 77 Population Average Fitness per Generation (R= 14) Generation 5000 to 6000. An example of an evolution which is Critical Figure 2.20 77 78 Strategies in Population in Generation 5000 White bars indicate initial population. See Figure 13 for Strategy Index 78 Figure 2.21- Location and Strategy Evolution of the Agents Twice Repeated Game with 14 agents replaced. Example of behaviour in Critical Region. Average population fitness moves from 1.49 in (b) to 1.25 in (f) and ends at 1.32 in (g) Figure 2.22 - 79 Location and Strategy Evolution of the Agents -xi- Learning, Evolution and S4IOrganizaiion List ofFigures Twice Repeated Game with 14 agents replaced. Example of behaviour in Critical Region. Average population average fitness moves from a low of 1.3 in (b) to a high of 1.5 in (f) Figure 2.23 - 80 Strategy Replacement Shade of agent indicates the number of generations since the agent received a new strategy. This simulation is for 14 agents replaced in the twice repeated game. Generations correspond to those in Figure 21 and 22 Figure 2.24 - Two Initial Populations Populations differ only in the first bit of agents 4 and 6 Figure 2.25 - 81 82 Strategies in Population After Generation 2000 and R= 16 From two similar populations, R= 16 in the Critical Region produces very different 82 populations Figure 2.26 - Population Average Fitness with Initial Population 1 and R= 16 2000 generations are reported. R= 16 is in the Critical Region Figure 2.27 - Population Average Fitness with Initial Population 2 and R= 16 4000 generations are reported. Simulation was run for 15,000 generations producing similar plot as presented here Figure 3.1 - 83 83 Agent Interaction Four agents, arranged in two groups, interact playing the prisoner’s dilemma. The symmetric payoffs for the row player are shown at the right Figure 4.1 - Expected Utility of Informed and Uninformed Agents - XII- 93 Learning, Evolution and Sw-Organization List of Figures The figure plots fi= 1 (a normalization) and f’ (for various £ and L) as functions ofX Figure 5.1 137 - Driftiess Learning Process The arrows are for a typical driftless dynamic. The equilibria are at the GS values (X’,O) or at X = 1 Figure 5.2 - 165 Phase Plots in Driftiess Learning Process Each line represents a different initial condition. Note all lines converge to GS equilibrium (X*=O.5) or the X= 1 boundary Figure 5.3 - 165 Learning Process with Drift The unique dynamic equilibrium is near the GS values (X*,O) (denoted by the dot) Figure 5.4 166 - Phase Plots in Learning Process with Drift Each line represents a different initial condition. Note all lines converge to a point which is close to the GS equilibrium Figure 5.5 - 166 Learning Process with Drift The unique dynamic equilibrium (denoted with the dot) is far from the GS values Figure 5.6 167 - Phase Plots in Learning Process with Drift Each line represents a different initial condition. Note all lines converge to a point closetoX=1 Figure 5.7 - 167 Learning Process with Drift - XIII - List ofFigures Learning, Evolution and SeOrganization There are three dynamic equilibria. Two are stable Figure 5.8 168 Phase Plots in Learning Process with Drift - Each line represents a different initial condition. Lines converge to close to the GS equilibrium or close to X =1 Figure 6.1 - 168 Schematic for Genetic Algorithm Stochastic Adaptive Learning Simulations Figure 6.2a Figure 6.2b Figure 6.2c Figure 6.2d Figure 6.2e Figure 6.2f Figure 6.2g Figure 6.2h Figure 6.2i Figure 6.3a Figure 6.3b Figure 6.3c Figure 6.3d Figure 6.3e Figure 6.3f - - - - - - 174 X, Proportion of Informed Agents 190 Difference in Average Fitness: f f” 1 191 , Average Price 0 a 192 , slope of price to y 1 a 193 , Slope of Price to e 2 a 194 j3 195 196 - - - - - - - - 3U 197 3U 198 X, Proportion of Informed Agents 199 Difference in Average Fitness: f’—f” 200 , Average Price 0 a 201 , slope of price to y 1 a 202 , Slope of Price to e 2 a 203 204 - - xiv - Learning, Evolution and S4fOrganization Figure 6.3g Figure 6.3h Figure 6.3i Figure 6.4a Figure 6.4b List ofFigures - - . U 130 1J 131 Figure 6.4d Figure 6.4e . • - - X, Proportion of Informed Agents Average Fitness - - - 205 206 fi and • . 207 208 fU • 209 For the first 200 generation Figure 6.4c . • - ac,, Average Price • 210 , slope of price to y 1 a 211 , Slope of Price to e 2 a • 212 Figure 6.4f j3’ 213 Figure 6.4g 214 - Figure 6.4h Figure 6.4i Figure 6.5a Figure 6.5b - 215 - 216 - - - X, Proportion of Informed Agents Average Fitness f’ and fU For the first 1000 generation Figure 6.5c Figure 6.5d Figure 6.5e - - - 217 218 , Average Price 0 a 219 , slope of price to y 1 a 220 , Slope of Price to e 2 a 221 Figure 6.5f i3’ 222 Figure 6.5g 223 - - -xv- Learning, Evolution and S4Organization Figure 6.5h Figure 6.5i Figure B. 1 - - - List of Figures i3ohl 224 225 ,9iU Functions which support p as a stable dynamic equilibrium The functions are from (A.9) and (A. 10). The dashed lines indicate smoothing of 245 the piece-wise linear functions -xvi- Acknowledgements For my Mom and Dad, A small return for years of love and support There are many people to thank for the advice and support that I have received during my time at UBC. Many thanks are due to Alan Kraus, my research supervisor, for guidance, patience, support and unsurpassed intellectual encouragement. I also thank the other members of my thesis committee, Jonathan Berk, Ken MacCrimmon, Vasant Naik and Michele Piccione for many stimulating and helpful conversations. There are also various members of the academic community who contributed generously of their time. These include, Murray Carlson, Peter Danielson and Jasmina Arifovic. For financial support, I thank the University of British Columbia and the Social Science and Humanities Research Council of Canada. Many thanks are also due to the faithful Tuesday/Friday basketball colleagues for helping me maintain my sanity and to the Commerce Cubs, 1995 Champions, for helping maintain the illusion of my youth. Thanks are due to my family for lots of encouragement. Finally, I thank my Marie who makes everything fun. -xvii- Chapter 1 Introduction Although the assumption of complete and unbounded rationality has been fruitful in economics, it is not without controversy. In the 1950’s, Simon (1955, 1959) argued that the global rationality assumption should be replaced by a more realistic view which recognizes the limits on information and cognition. Furthermore, the empirical investigations into the foundations of decision theory consistently find violations of its underlying assumptions. 1 Despite the controversy, the unbounded rationality assumption remains dominant in modern financial and economic theory. Two reasons for this persistence are the difficulty in finding tractable alternatives and the view that, even if agents are not rational, they are reasonable and therefore will eventually adapt their behaviour to that predicted by the rational model. Lucas (1986) captures the first sentiment when he 1 Sec Shoemaker (1982) and Camerer and Weber (1992) for surveys of this research. Learning, Evolution and SdfLOrganizazion writes of “... a tendency ... Introduction to disdain existing theories in favour of yet-to-be-constructed theories” (page 218). In the same article he also captures the essence of the second point when he states, “Technically, I think of economics as studying decision rules that are steady states of some adaptive process, decision rules that are found to work over a range of situations and hence are no longer revised appreciably as more experience accumulates.” (Page 218). This thesis will develop and consider tractable models of adaptive behaviour to demonstrate that adaptation will often lead to behaviour which is dramatically different from unbounded rationality. The argument that deviations from rationality should be small and unimportant is compelling. Even if economic agents are not fully rational they are reasonable. They have incentive to abandon behaviour which leads to poor results and imitate behaviour which was more successful or try novel ideas. Over time, after many decisions, agents should have a well developed set of behaviourial rules-of-thumb which are indistinguishable from a fully rational agent. This argument is especially appealing in fmance which focuses on items such as risk premia and stock price reactions. These items are determined by the aggregation of the anonymous trading behaviour of a large number of agents. Any individual anomalies should be unsystematic and thus have little impact on aggregate behaviour. However, for many economic models such those based on rational expectations, this argument is flawed. The rationality assumption is more than an assumption on individual behaviour. For many models, it is an assumption about what agents know about other agents. With rationality, -2- Learning, Evolution and S4fLOrganizadon Introduction the behaviour of the other agents offers no source of uncertainty. If I know what you know, then I know what you will do. 2 However, when behaviour is stochastic and adaptive, this is not the case. Individual anomalies do not vanish. 3 This thesis demonstrates this fact and explores its causes. This thesis investigates the conditions and reasons that behaviour in models with stochastic adaptive learning individuals differs from that predicted by models based on unbounded rationality. The reason that adaptive models are complex and need not lead to the unboundedly rational prediction is that agent behaviour does not adapt independently of other agents. The success (or fitness) of an agent depends on the adapting behaviour of the other agents. In these situations, behaviour co-adapts, or borrowing from biology, co-evolves. As discussed by Kauffman and Johnsen (1991) in the biological context, adaptive changes by one agent not only affect the fitness of other agents, but change their fitness landscapes (ie. the map which translates behaviour into fitness). For example, when frogs develop sticky tongues, ffies develop slippery feet. The development by the frog changes the desirability (fitness) of slippery feet. What was once a valley on the fly’s fitness landscape is now a peak. By altering the fitness landscape of other agents, adaptive moves by one agent alter the “target” of the adaptive process. 2 At least up to a point of randomizing between indifferent alternatives. This is analogous to the recent work in physics on Self-Organized Criticality (see Bak and Chea, 199,for an overview). This work considers systems where small shocks can have large effects. A defining characteristic of a critical state is one where events (such as avalanches or earthquakes) propagate according to a Power Law. While small events occur more often, all sized events are possible. -3- Learning, Evoludon and S4t’Organizudon Introdudion In economics, an agent’s utility (or fitness) almost always depends on the behaviour of other agents either directly, as in strategic situations, or indirectly through prices. 4 Thus the agent’s success or fitness is coupled (using Kauffman and Johnsen’s term) to the behaviour of other agents. Both of these situation are considered in the thesis’s two main sections. First, in Chapters 2 and 3, individual fitness is directly coupled to the behaviour of others via a repeated prisoner’s dilemma game. Since play in this game is limited to one’s neighbours, fitness is directly coupled with only a few others and the complexity of co adaptation is addressed directly. In contrast, in the second main section of the thesis, consisting of Chapters 4 through 6, agent fitness is coupled indirectly with all others’ behaviour. In an asymmetric information asset pricing environment based on the model of Grossman and Stiglitz (1980), individual traders need to learn an endogenous relationship between asset price and an exogenous signal of dividend value. Since all agents’ actions affect the price-signal relation, the success (or fitness) of one trader is coupled with all other agents. In the two environments studied in this thesis, individuals are considered to have less than complete knowledge. In the repeated prisoner’s dilemma model, agents do not know the strategy of the other agents. In the Grossman-Stiglitz based model, agents are assumed not to know parameters concerning the exogenous signal-dividend relation or endogenous pricesignal relation. In these models, therefore, the opponents’ strategies or the signal For example, Blume and Easley (1992) consider the fact that portfolio strategies of all agents affect the success (or fitness) of each individual strategy. -4- Learning, Evolution and Sef-Organization Introduction relationship must be learned. Throughout the thesis, stochastic adaptive learning will be considered. In each model, agents form reasonable rules-of-thumb and experiment by trial and error. 5 Good rules-of-thumb are imitated and used with more frequency, bad rules-of-thumb are abandoned and randomly selected novel rules are periodically introduced. Stochastic adaptive learning has the appealing property that “average” or “typical” behaviour can be predicted. Since learning is adaptive, agents, on average, choose the best action. However, individual behaviour can be predicted with less certainty as individual learning is stochastic and behaviour includes stochastic experimentation. In order to understand adaptive learning, it is useful to contrast adaptive learning with traditional Bayesian learning. In Bayesian learning beliefs are updated based on prior beliefs, observed information and Bayes Rule, and an optimal action, based on beliefs, is chosen. In an environment where success or fitness does not depend on other agents, 6 the comparison between the two models of learning can primarily be drawn from existing 7 The primary difference between the two approaches is whether or not complete literature. learning occurs (in the limit) and the true relationship is discovered. Bayesian learning models can lead to incomplete learning. With certain prior beliefs, the agent may not experiment sufficiently to learn the true parameters. This incomplete learning is & aiue “Reasonable” depends on the model context. In particular, the full-infonnation, rational behaviour is a possible rule-of-thumb. 6 A typical example is a monopolist learning about a demand function with an unknown parameter. It is important to note that the demand function in this example is treated as exogenous. If consumers were also adaptively learning, then the situation would be co-adaptive and far more complicated. See for example Easley and Kiefer (1988), Aghion et al (1991) and Bala and Goyal (1993). -5- Learning, Evolution and S4fOrganization Introduction rational because the pessimistic prior beliefs about the unknown parameter do not warrant experimentation. Stochastic adaptive learning, in contrast, will usually lead to complete learning. That is, the agent will learn the true parametric relationship and choose the expost optimal action. Since the agent tries new strategies (ie. stochastic experimentation), she will generate sufficient information to uncover the underlying process. Since behaviour is adaptive, the behaviour will tend toward the optimal action, given the true relationship. While it is likely that Bayesian and stochastic adaptive learning will converge to the same action, it is possible that an agent acting adaptively can be more successful (realize a higher utility or fitness) than the Bayesian rational agent. It is important to note that rational behaviour need not be favoured by evolution. 8 Adaptive learning selects behaviour which does well ex post while rational behaviour focuses on ex ante. As is obvious from the discussion thus far, the thesis draws on many diverse fields for input. Besides the research on Bayesian learning models of optimal experimentation discussed in the previous paragraph, the research on learning in both game theory and rational expectations models is relevant. 9 Since there is a natural relationship between adaptive learning and evolution, research in evolutionary game theory ° and evolutionary 1 8 9 10 This point is made in Blume and Easley (1992). . . Both of these branches of research are extensive. However, examples of relevant papers in rational expectations learning are Bray and Savin (1986) and Vives (1993). Examples of learning in game theory can be found in Jordan (1991), Milgrom and Roberts (1991) and Kalai and Lehrer (1993). Again, there are many examples which are relevant. However, the papers of Nachbar (1992), Binmore and Samuelson (1992), Kaudori, Mailath and Rob (1993) and Young (1993) are particularly relevant. -6- Imroduction Learning, Evolution and SeOrganization 11 are relevant. The economic notion that agents making systematic errors have economics an incentive to modify their behaviour parallels natural selection which eliminates poor genes. Thus the work in co-evolutionary biological systems is important (see Kauffman and Johnsen, 1991). The fact that adaptation takes place far quicker than biological evolution is of little importance. A more thorough discussion of this literature appears at the beginning of each of the two major sections (Chapter 2 and 4). I - Background on the Genetic Algorithm A chief advantage of considering learning as an stochastic adaptive process is that models can be easily simulated using a genetic algorithm which is a commonly used optimization technique. In this thesis, the genetic algorithm is used as an example of a stochastic adaptive learning process rather than for optimization. Genetic algorithms, developed primarily by Holland (1975), are biologically inspired stochastic optimization techniques. Genetic algorithms simulate the adaptive learning process where successful strategies proliferate and unsuccessful strategies die out. They also generate novel strategies by combining and mutating existing strategies in an efficient search of the strategy space. Besides the intuitive appeal of imitating good strategies, the genetic algorithm has the advantage of being easily implemented in the environments considered in this thesis. In addition, genetic algorithms have proven successful in many optimization applications and are gaining popularity as 11 See Dosi (1988) for a survey. -7- Introduction Learning, Evolution and Se/’Organization 2 models of adaptation in economics.’ Finally, there is the potential that if genetic algorithms are well understood in the simple environments presented in this thesis, they can be applied to more complicated, less tractable problems to guide intuition and research. Since genetic algorithms are an important example of a stochastic adaptive learning process 3 and are used in Chapters 2 and 6, a brief background on genetic algorithms is included.’ There are three parts to the to the genetic algorithm, mapping strings to strategies, mapping these strategies to fitness (utilities) and manipulating strings based on their fitness. The first two parts are problem specific. For example, a string may be decoded into a butterfly’s wing colour strategy. The wing colour strategy may be mapped into a probability of being captured by a predator. In this thesis, strings are decoded into a strategy for playing the prisoner’s dilemma or a trader’s inference about a risky dividend and portfolio choice. A string is a collection of genes which are selected from some alphabet (usually {O, 1}). A population of strings is manipulated by the genetic algorithm to form a new population, through selection, crossover and mutation. The selection, crossover and mutation are designed to search for good strings by focusing attention on the most promising parts of a strategy space. 12 13 Selection chooses strings with probability proportional to fitness (ie. a For example, see Axelrod (1987), Miller (1987), Marimon, MeGrattan and Sargent (1990) and Slade and Eaton (1990) and Arifovic (1994) all of which use simulations based on a genetic algorithm. All of the simulations using the genetic algorithm were programmed in Pascal and based on the sample code in Goldberg (1989). The simulations were ron on both PC-80x86 using Borland’s Turbo-Pascal and on an IBM RS6000 using IBM’s XLP compiler. For more information, please contact the author. -8- Introduction Learning, Evolution and S4f-Organization weighted roulette wheel). Crossover involves mating two strategies. Two strings that have been selected are cut at a randomly chosen point. Two new strategies are formed by recombining the front portion of string one with the back portion of string two and the front portion of string two with the back portion of string one. Crossover in the genetic algorithm directs the search. A strategy with one good feature (say a good parameter for making an inference or a good move at some situation in a game) can be combined with a strategy with another good feature. Since, on average, strings with good features propagate and strings with bad features die out, the genetic algorithm constructs a population of optimal strings. Finally, mutation involves randomly altering a few genes in a population to ensure enough genetic diversity in the search. While artificial selection may be intuitively appealing, it is not obvious that the genetic algorithm will iterate to an optimal solution. Genetic algorithms efficiently search by exploiting the fact that a string belongs to many regions (or schema) simultaneously. For example, the four-gene string 1101 and its associated fitness level provide information on strings of the form {*l**, l1’, 1K0K, .. .}, where the * represents an arbitrary 0 or 1. Similar to the solution to a k-armed bandit problem,’ 4 the genetic algorithm trades off the exploration for knowledge and the exploitation of known good strings. The choice between strings of the form **l** and **0** is a two-armed bandit problem. The choice between 14 The two-armed bandit problem, stated in terms of the slot machine analogy, is how to allocate trials over two slot machines to maximise expected profits. Since the average payoffs on the machines are unknown, optimal play involves a tradeoff between testing both machines and playing only the machine thought to have the highest mean. The k-armed bandit problem simply involves a larger choice set. See Goldberg (1989), page 36. -9- Learning, Evolulion and S4fOrganization Introduction **()()*, ‘lO, **O1* and **11* is a four-armed bandit problem. Primarily through the crossover step, the genetic algorithm is solving many k-armed bandit problems simultaneously. This is the intuition for Holland’s (1975) Schema Theorem which proves the effectiveness of genetic algorithms. 6 ” 15 The proof of the Schema Theorem is for situations where the fitness of an individual string does not depend on other strings in the population. However, in the situations considered in this thesis this is not the case. The fitness of a string can depend directly on a few other agents, as it does when individuals play a game (ie. the prisoner’s dilemma in Chapter 2). Alternatively, string fitness may depend indirectly on all agents’ behaviour as it does in Chapter 6 where traders are trying to infer information from an endogenously determined asset price. In these settings the Schema Theorem does not apply. It does not apply in co adaptive environments primarily because the concept of optimality is not well defined. For example, in a simple one-shot prisoner’s dilemma, 17 two rational agents defect. Yet two cooperative agents achieve higher utility (fitness). This, however, is an issue that applies to adaptive learning processes in general and is discussed more fully in the body of the thesis. The principal advantage of the genetic algorithm is that they capture the intuition of 15 Besides Holland (1975), Goldberg (1989) Chapter 2 and offer explanation of this theorem. Vose (1991) has a generalization of schema and genetic algorithms. 16 An important assumption in the Schema Theorem is that the probability that good schema (or features) are disrupted by crossover and mutation is small. While the restriction on the mutation rate is easily controlled, disruption caused by crossover depends on the nature of the problem and how strategies are coded to strings (see Goldberg, 1989, Chapter 4). In each of the two Chapters that use genetic algorithms alternative string coding schemes were tried producing similar results. 17 For a specific prisoner’s dilemma see Figure 2.1 in Chapter 2. -10- Learning, Evohaion and Set/Organization Introduction proliferating successful strategies as well as stochasticly introducing novel ones and do correspond with (ex post) optimality in simple settings. The specific details of how the algorithm was implemented is the spacial prisoner’s dilemma model and Grossman-Stiglitz model are in Chapters 2 and 6 respectively. II - Adaptation in Spatial Environment As was noted earlier, the thesis has two main components. The first section attempts to highlight the coupling of fitness landscapes by considering a spatial model. This section investigates directly the impact of co-adaptation or co-evolution. The interdependence which produces co-evolution is modeled using the well-known repeated prisoner’s dilemma game. Both the finitely and infinitely repeated versions are considered. It is important to state that the purpose of the section is not to provide another characterisation of when co-operation will or will not take place. Instead, the familiarity of the struggle between the need for joint co-operation and the temptation of exploitation is used to assist in understanding the complex evolutionary dynamics. Chapter 2, uses a genetic algorithm simulation to consider stochastic adaptive learning for individuals located on a lOx 10 grid (without boundaries) playing repeated prisoner’s dilemmas with each of their four immediate neighbours. The number of agents that receive a new strategy each generation can be considered to determine the rate at which the — 11 — Learning, Evolution and SeOrganization Introduction population learns or adapts. This rate determines the importance of location in the model. By varying the number of agents which can alter their strategy in any generation, the effect of co-evolution is modified. If a large proportion of agents receive new strategies, then any one agent will likely face neighbours with new strategies in each generation. optimal strategy must co-evolve with the entire population. Thus her If the proportion of agents receiving new strategies is small then most changes in the population will not affect her neighbours. However, if one neighbour does change strategy, the impact will be significant. In this case co-evolution will be a local phenomenon. The proportion of the agents replaced in any one generation, or the learning rate, is this model’s key parameter in the exploration of co-evolution. The results in Chapter 2 demonstrate that in the presence of co-evolution, the way evolution is modeled matters. In the infinitely repeated prisoner’s dilemma the population tends to evolve to one of the many Nash equilibria. Increasing the number of agents replaced per generation (the learning rate) reduces the mean and increases the variance of equilibrium population average fitness. In the finitely repeated game, the population converges to the unique Nash equilibrium only when a large number of agents are replaced. These results are related to the Kauffman and Johnsen (1991) result which relates the ruggedness (ie. number of local optima) of a fitness landscape to the length of time an evolution requires to converge to a Nash equilibrium. Finally, in the fmitely repeated game, the proportion of agents replaced that maximises population average fitness corresponds to critical behaviour -12- Learning, Evolution and S4fOrganization Introduction which suggests the possibility of Self-Organized Criticality (Bak and Chen, 1991). One could argue that the maximal population fitness makes the critical state a dynamical attractor. Again, this result is consistent with a Kauffman and Johnsen (1991) result. However, in the model developed in Chapter 2, the intuition of the prisoner’s dilemma game can be used to understand how and why optimal population fitness is related to critical behaviour. Chapter 3 analyses simpler models of stochastic adaptive learning in a spatial environment. In addition to a simpler spatial relationship, the interactions are the more rudimentary oneshot prisoner’s dilemma games. algebraic analysis is possible. The benefit of these simplifications is that tractable The chapter begins by modelling the spatial interactions impact on the correlation in which strategies get matched to play. The chapter argues that a replicator dynamics approach, typical of many evolutionary game theory models, will not be useful for this spatial setting since there is no natural way to parameterize the correlation in play induced by the spatial setting. The chapter then presents a Markov model of a simple four-person, two-interactions system. Despite the simplicity of this model, it captures the complex relationship between number of agents replaces and average fitness which appears in Chapter 2. ifi - Stochastic Adaptive Learning in a Financial Market - 13 - Introduction Learning, Evolution and SelfOrganization The second main section of the thesis abstracts from the strategic considerations which are central to the previous section. Instead, this section models a situation where all strategies must co-adapt with the entire population. In this section, many individual traders attempt to learn the information revealed by the competitive price of a financial asset. Chapters 4, 5 and 6 use a model similar to the single risky asset, one-period model of Grossman and Stiglitz (1980) to explore learning dynamics relative to the typical rational expectations assumptions. In the model, agents may choose to be informed by purchasing a common signal on the risky dividend or be uninformed and imperfectly infer the signal from the asset’s price. Traders repeat the one-period model many times and must learn two of the model’s exogenous parameters and the parameters of an endogenous relationship. Learning in this model underscores the difference between learning a parametric relationship (ie. nature) and learning an equilibrium relationship. Parametric relationships are unaffected by other agents. However, equilibrium relationships, which are of concern only to uninformed agents, depend on the learning of all the other agents and thus involve co-adaption or co-evolution. Chapter 4 presents the model and introduces stochastic adaptive learning in this environment. The chapter demonstrates that stochastic adaptive learning need not be inconsistent with the rational expectations equilibrium. The chapter constructs a family of stochastic adaptive learning process which converge to the rational expectations equilibrium. However, these -14- Introduction Learning, Evoluilon and S4f Organization processes all have the feature that experimentation (ie. trial and error portion of learning) becomes arbitrarily small. The interesting results are for the case when experimentation, however small, does not disappear. In this case stochastic adaptive learning may not produce behaviour similar to the Grossman-Stiglitz rational expectations equilibrium. In general, whether or not stochastic adaptive learning yields the rational expectations equilibrium (or some stochastic approximation of it) depends on a parameter in the economy and a parameter of the learning process. In particular if the level of noise trading in the economy is low relative to the individual experimentations, then the learning process will not produce behaviour similar to the rational expectations equilibrium. In the Grossman-Stiglitz model, the reason the uninformed agent’s inference is imperfect is due to the stochastic supply of the risky asset. The noisy supply, often referred to as noise traders, is important to the Grossman and Stiglitz’s original model. A rational expectations equilibrium proportion of informed agents exists as long as some (perhaps very small) noise trading is present. However, if the rational expectations equilibrium is used to model a world with agents who are learning using some stochastic adaptive process, it is necessary that the level of noise trading not be too low. The intuition for this result is presented in Chapter 4. Since stochastic adaptive systems are complex, Chapter 5 investigates this issue using deterministic dynamics as an approximation for stochastic systems. Finally, Chapter 6 presents a genetic algorithm simulation of the model using the genetic algorithm as an example of a stochastic adaptive learning process. - 15 - Learning, EvoluSion and SeOrganization IV - Introduciion Conclusion Understanding the decisions of individuals is the cornerstone of modern finance. This thesis expands that understanding by relaxing the demanding informational and computation assumptions. Introducing learning, particularly stochastic adaptive learning, can dramatically impact a model’s predictions. Besides simply failing to converge to a traditional equilibrium, models with stochastic learning can involve complex behaviour. - 16 - Chapter 2 Co-evolution and Spatial Interaction Although the assumption of complete and unbounded rationality in economics has been fruitful, it is not without controversy. Critics, such as Simon, have long argued that the assumption is inconsistent with what is empirically known about human thought and choice processes. 1 In response to this criticism, modelling agents as continuously and unboundedly optimising is often justified by an appeal to some evolutionary process. Over time, after many decisions, agents have a well developed set of behavioral rules-of-thumb. Good rules are imitated, bad rules die out and new rules are periodically introduced. The end product of this evolution is a set of rules which are indistinguishable from the rules of 1 See Simon (1987), for example. This criticism is also made in many of the papers in Hogarth and Reder (1986) and Langlois (1986). - 17 - Learning, Evolution and S4fOrganization Co-evolution and Spatial Interaction an unboundedly optimising agent. 2 This argument is strengthened by appeals to aggregation. Individual anomalies are not systematic and thus have no impact on average behaviour. However, these evolutionary arguments are flawed. Rules-of-thumb can rarely evolve independently of other agents. The success of an agent usually depends on the behaviour of other agents. In these situations behaviour must co-evolve simultaneously. When this occurs, the evolution mechanism can fundamentally affect individual and aggregate behaviour. Evolution need not lead to behaviour which is isomorphic to that of unbounded optimisation. These complexities of co-evolution are a topic of recent interest in evolutionary biology and are the theme of this chapter. 3 The interdependencies which necessitate co-evolution can take many forms. Other agents’ portfolio investment strategies affect the asset returns faced by all agents (Blume and Easley, 1992). Agents’ behaviour affects the endogenous relationship between economic variables and therefore influences the inferences agents should make (see Chapter 4). The most direct form of interdependence occurs in strategic situations. In game theory, success depends directly on the behaviour of opposing players. In fact, this chapter builds on work in evolutionary game theory. 2 Lucas (1986) is one such example when he states: “Technically, I think of economics as studying decision rules that are steady states of some adaptive process, decision rules that are found to work over a range of situations and hence are no longer revised appreciably as more experience accumulates.” (Page 218). Rosenzweig, Brown and Vincent (1987) and Kauffman and Johnsen (1991) are two examples of co-evolutionary theory in biology. - 18- Learning, Evolution and S4j’Organization Co-evolution and Spatial Interaction In this chapter, interdependence is modeled using the repeated prisoner’s dilemma game. Both the finitely and infinitely repeated versions are considered. It is important to state that the purpose of the chapter is not to provide another characterisation of when co-operation will or will not take place. Instead, the familiarity of the struggle between the need for joint co-operation and the temptation of exploitation is used to assist in understanding the complex evolutionary dynamics. Specifically, agents are located on a grid (without boundaries) and play a repeated prisoner’s dilemma with each of their four immediate neighbours. The number of agents that are replaced per generation determines the importance of location in the model. By varying the number of agents which can alter their strategy in any generation, the effect of co-evolution is modified. If a large proportion of agents are replaced, then any one agent will likely have new neighbours in each generation. 4 Thus her optimal strategy must co-evolve with the entire population. If the proportion of agents being replaced is small then most changes in the population will not affect her neighbours. However, if one neighbour does change strategy, the impact will be significant. In this case co-evolution will be a local phenomenon. The proportion of the agents replaced in any one generation is this model’s key parameter in the exploration of co-evolution. The purpose of the chapter is to explore the hypothesis that the way evolution is modeled matters. Evolving an optimal strategy as other agents are similarly evolving is fundamentally Equivalently the agent faces old neighbours with new strategies. It does not matter whether we think of the agents getting new strategies or being replaced by new agents. Strategies are modified only through evolution, there is no explicit learning where past interactions (beyond the current game) matter. - 19 - Learning, Evolution and S4/.Organization Co-evolution and Spatial Interaction different from choosing a strategy which is optimal on a fixed landscape. Evolutionary changes by other agents alter the landscape. An evolutionary mechanism which converges to optimal behaviour in a fixed (no co-evolution) environment need not converge in the presence of co-evolution. This chapter will demonstrate that how evolution is modeled not only affects the behaviour of the system in the short run, but dramatically impacts the long run characteristics. The equilibrium behaviour, if an equilibrium is reached, varies depending on how co-evolution is modeled. The familiarity of the prisoner’s dilemma will be used to understand why the characteristics of co-evolution matter. Specifically, in the infinitely repeated prisoner’s dilemma (a game with many Nash equilibria), increasing the proportion of agents replaced per generation tends to reduce the expected population average payoff (or fitness). The increased replacements also increase the variance of the average payoffs. More equilibrium payoffs are possible when a larger number of agents is replaced. In contrast to the equilibrium achieved in the infinitely repeated game, evolution in the finite game may not find the unique Nash equilibrium. With the finite game there is an interesting relationship between the average long-run payoff and the model’s stability. The same proportion of agents replaced that maximises long-run average payoffs leaves the model most sensitive to initial conditions and random fluctuations (ie. butterfly effects). If there existed some mechanism to select the optimal number of agents to be replaced per generation in the finitely repeated game, then the system will exhibit chaotic behaviour. 5 Finally, these Kauffman and Johnsen (1991), discussed below, use the term mete-dynamic to describe the selection of evolution parameters. An evolution dynamic determines individual agent characteristics based on individual fitness. A mete-dynamic, in contrast, effects evolution parameters based on the fitness of the entire population. - 20- Learning, Evolution and SeOrganization Co-evolution and Spatial Interaction results are compared to those of recent co-evolutionary biology. The similarity between our model and that of Kauffman and Johnsen (1991) is used to interpret both their results and the results of this chapter. The methodology used to investigate co-evolution in this chapter is simulation. In the next Chapter, algebraic analysis in a simpler environment supports and augments many of the results of the simulation analysis. The technique of using simulations to expand the empirical data base upon which theory can be constructed is typical of studies in the field of Artificial Life and Emergent Processes (see Langton, 1992). The simulation of evolution is based on a genetic algorithm. The genetic algorithm manipulates the agents’ strategies which are represented as bit string mappings. The genetic algorithm operators of selection, crossover and mutation efficiently imitate successful strategies and introduce new strategies to produce successive generations of the population. Besides the intuitive appeal, the genetic algorithm is easily implemented and has proved a successful optimisation tool in many settings. In addition, genetic algorithms have recently been applied in models where co evolution occurs. 6 The chapter is organized as follows. Section 2.1 discusses some of the literature related to co-evolution in biology, evolutionary game theory and the repeated prisoner’s dilemma. Section 2.11 introduces the spatial version of the game and the techniques used to simulate 6 See for example, Miller (1989), Marks (1989a, 1989b), Marimon, McGrattan and Sargent (1990), Slade and Eaton (1990), and Danielson (1992). - 21 - Co—evolution and Spatial Interaction Learning, Evolution and S4lOrganization evolution. The results for the infinitely and finitely repeated games are presented in Section 2.111. Section 2.IV discusses these results and their relationship with evolutionary biology. Section 2.V concludes. I - Background The relationship between theories of evolution in economics and biology dates back to the influences of Malthus on Darwin. 7 The roots of this study lie in both the recent biological theory of co-evolution and the economic study of evolutionary game theory. This section discusses some of the vast literature in these areas upon which this chapter draws. In biology, Kauffman and Johnsen (1991) provide both theoretical and simulation methodology to study the dynamics of co-evolution. In their model the agents are adapting to a fitness landscape that is both rugged (multi-peaked) and coupled with other agents. Adaptive moves by one co-evolutionary partner not only affect the fitness of the agent but also the fitness landscapes of other agents. For example, when frogs develop sticky tongues, flies develop slippery feet. The development by the frog changes the desirability (fitness) of slippery feet. What was once a valley on the fly’s fitness landscape is now a peak. The Kauffman and Johnsen framework focuses on the interaction among the size of gene, the ruggedness of the landscape and the degree to which landscapes are coupled. Gene size is Rosser (1992) provides an interesting analysis of this historical relationship. - 22- Co-evolution and Spatial Imeraction Learning, Evolution and S4fOrganization a measure of the complexity of the evolutionary task. Ruggedness is a measure of both the continuity and the number of local optima of the landscape. Finally, landscapes which are coupled to a higher degree buckle more severely when another agent changes strategy. These parameters control evolution characteristics like the length of time to achieve a Nash equilibrium, the stability of the system to perturbations in the initial conditions and the average fitness of the agents. Although the repeated prisoner’s dilemma model presented below differs from the Kauffman and Johnsen methodology, our model has analogous components and similar results. These similarities will be exploited in Section 2.IV in the interpretation of the results. In economics, evolution and the repeated prisoner’s dilemma is closely associated with the work of Axeirod (1984). His tournaments used game theorists, decision scientists and others to submit strategies to play a repeated prisoner’s dilemma game. This chapter, like the others noted below, replaces people with computer simulation to generate strategies. Axefrod (1987) was also the first to consider a genetic algorithm simulation in the prisoner’s dilemma. Axeirod used the genetic algorithm to breed an optimal strategy to play against a fixed environment (the winners from his previous tournament) and in an environment consisting of the evolving population. Miller (1989) and Marks (1989a, 1989b), building on Axefrod, use genetic algorithms to evolve a population of agents playing a repeated prisoner’s dilemma. Since these papers do not vary the degree of co-evolution which is - 23 - Co-evolution and Spatial Interaction Learning, Evolution and S4tOrganizadon present in their models, these papers do not specifically address co-evolution. This chapter differs from these studies by considering the effects co-evolution directly. Evolutionary game theory is a popular topic in modem economics. 8 It has developed to address the questions begged by the Nash equilibrium concept. Strategies form a Nash equilibrium when they are mutual best responses. However, the equilibrium concept does not address the issue of how agents jointly choose to play the equilibrium strategies. This question is even more important in games with multiple Nash equilibria. Even if agents are assumed to play Nash equilibria, they must somehow choose which equilibrium to play. Although many refinements of Nash equilibria have been proposed to select a unique equilibrium, none has gained universal acceptance. 9 Whether evolutionary game theory can answer the questions of how equilibrium is selected and reached remains to be seen.’° This chapter argues that how evolution is modeled effects the equilibrium selection and determines whether or not an equilibrium is reached. In many of the papers in evolutionary game theory, however, some of the assumptions about the evolutionary mechanism are implicit. 8 See Mailath (1992) survey of the papers contained in the symposium on evolutionary game theory in the Journal of Economic Theory. Kohlberg and Mertens (1986) provide a comprehensive discussion of refinements of Nash equilibrium. 10 Skyrms (1990) also discusses the process of arriving at a Nash equilibrium. Unlike evolutionary game theory where agents actually play the game, using the results to modi1’ their strategies, Skyrms considers a deliberation dynamic which occurs entirely before play begins. Similarly, Binmore (1988) considers game-playing machines which use an internal dynamic similar to Skyrms to predict their opponent’s play. - 24- Learning, Evolution and S4fOrganization Co-evolution and Spatial IiUeraction A central notion of evolutionary game theory is the evolutionary stable strategy (ESS) equilibrium concept. ESS is motivated by considering a population of strategies and the success of any possible mutant or invading strategy. The definition of ESS implies that it is a symmetric Nash equilibrium which is relatively unaffected by invasion by a few agents with alternative strategies (ie. invading strategies score lower fitness and do not survive). ’ 1 ESS helps determine whether a given population of strategies is likely to persist. The ESS concept plays an important role in interpreting the results presented here. However, evolutionary game theory which focuses on the dynamic properties of evolution will be more useful in developing theoretical analysis of the model presented below.’ 2 II - The Repeated Prisoner’s Dilemma Model and Evolution There are three elements that define this study. The first is the interaction between the agents which determines their fitness. The second is the evolutionary mechanism which controls how strategies are represented, created and modified. Finally, there are parametric assumptions for the simulations which can influence both quantitative and qualitative results of the study. Throughout this section it is important to be aware of the concern raised by For a fonnal definition of ESS, see van Damme (1987). For a discussion of the biological motivation for ESS see Maynard Smith (1982). ESS is closely related to other refinements of Nash equilibrium. For example, an ESS equilibrium is both a perfect and a proper equilibrium (van Damme, 1987). However, ESS need not exist and if it does exist, it need not be unique. For a discussion of ESS in the context of repeated prisoner’s dilemma see Binmore and Samuelson (1992). They also provide their own modification of ESS called Modified Evolutionary Stable Strategy or MESS. 12 These papers include Friedman (1991) and Nachbar (1992) in models without mutations and Kandori, Mailath and Rob (1993) and Young (1993) in models that include mutations. Friedman (1991) discusses the relationship between ESS and dynamic stability. Other than in two-person symmetric games (as in our model) ESS need not be necessary or sufficient for dynamic stability. - 25 - Learning, Evolution and SeOrganizadon Co-evolution and Spatial Imeracion Jefferson et. al. (1992). In their artificial life study, the authors express concern that some of the many assumptions of studies using simulation may beg deeper questions. Specific assumptions which appear innocuous may have a large influence on the behaviour of the system. (A) Agent Interaction The interactions between agents are modeled by the familiar repeated prisoner’s dilemma. It is the familiarity of the game that makes it a useful model of interactions, incorporating both competition and co-operation. It is important to understand the complex evolutionary behaviour in this simple environment before considering more sophisticated games. The payoffs in the stage (or one-shot) game are presented in Figure 2.1(a). Note that the interaction between agents is pairwise. While pairwise interaction may not be that common in economics, 13 it is a useful simplification. In particular, pairwise interaction keeps the strategy space manageable. The repeated game is an L-times repetition of the stage game with perfect information.’ 4 Strategies in this game are more complex than in the one-shot game since they can depend on the history of moves. History includes both your opponent’s previous moves and your previous moves. Thus a strategy must specify play in all possible histories, including those 13 14 Although not central to their study, this criticism of Axelrod’s environment is made in Slade and Eaton (1990). Miller (1989) considers the effect of introducing a noise so that agents do not know for certain the past actions of their opponent. - 26 - Co-evolution and Spatial Interaction Learning, Evolution and S4tOrganizaiion Me C D (0,3) C (2,2) (0,3) D (3,0) (1,1) Me You You (3,0) Payoffs to (You,Me) (b) (a) Addapted froni Bhmore and Samuelscai (1992). page 279. Figure 2.1 - The Repeated Prisoner’s Dilemma Game (a) Payoffs in the stage game (or one-shot) game. (b) Possible and equilibria per iteration payoffs in infinitely repeated game. that are impossible given your strategy. While this is the standard definition of a strategy in game theory it is by no means innocuous. The interaction of this definition and the evolution mechanism are discussed in Section 2.ffl. Traditional game theory analysis yields a unique Nash equilibrium when L is finite. 15 Regardless of how large L is, the equilibrium is always observationally equivalent to all D. Per iteration (or average) payoffs in this game are 1.0 for all agents. In contrast, when L is infinite, all average payoffs in the shaded region of Figure 2.1(b) are possible Nash equilibria payoffs. 16 However, not all of these 15 16 Traditional assumptions needed to suppolt this conclusion include common knowledge of the payoffs, the length of the game and of the rationality of the players. Nachbar (1992) focuses on evolution in the finite game. This is the “Folk Theorem”. For any point (payoffs) in the shaded region of Figure 2.1(b), strategies can be constructed that form a Nash equilibrium with these payoffs (provided agents’ discount factors are sufficiently close to 1 ie. future payoffs matter). A version of this appears in Fudenberg and Tirole (1991), page 152. - - 27 - Learning, Evolution and SeOrganizatlon Co-evolution and Spatial Interaction equilibria are possible in an evolutionary model. Payoffs (fitness) must be symmetric for evolution to converge. 17 Thus if Nash equilibria are reached in the infinite game, they will be along the thick diagonal line of Figure 2.1(b). Most evolutionary models that focus on pairwise interaction consider an environment where all agents play all other agents (Miller 1987 and Marks 1989a, 1989b) or agents are paired randomly from a population (Nachbar, 1992). Strategy fitness depends on the entire population of agents. In our model, agents are located on a two dimensional torus (a grid with no boundaries). All agents play a repeated prisoner’s dilemma with their four immediate neighbours. This environment was initially proposed in Axelrod (1984). Since the purpose of this chapter is to investigate co-evolution, a model with local interaction underscores the inter-dependence of strategies by restricting the dependence of an agent’s fitness to a subset of the total population. There need not be a globally optimal strategy for the game. An agent’s optimal strategy depends on his neighbourhood. If we think of location as geographic, then it might make perfect sense to trust your neighbour in West Vancouver but not in East Vancouver. However, location need not be thought of as geographic. Axelrod points out that location could be interpreted as characteristics in a product space. For example, completion in the luxury car market differs from that in the economy car market. 17 Symmetric payoffs are necessary for a population of strategies to be ESS. - 28 - Learning, Evolution and S4fOrganization Co-evolution and Spatial Interaction Although Axefrod had a similar form of interaction, his analysis was quite different than that presented here. The primary difference is the model of evolution. In Axelrod’s model, agents can adopt only the strategy of their neighbours. Thus evolution focused solely on local imitation. The evolution mechanism in our model is much richer, and is introduced below. (B) Evolution How strategies are represented and modified defines the evolutionary mechanism. The representation of strategies, the replacement and creation of strategies and the genetic algorithm are all discussed in this section. Figure 2.2 presents a schematic overview of this process. Note that the procedure is iterative. A tournament consisting of all agents playing a L-times repeated game with their neighbours is conducted, then the strategies in the population are updated, then a new tournament is played. Strategies do not change during a tournament. There is the potential for confusion since the game itself is dynamic (ie. L times repeated). However, strategies do not change during the tournament. It is best to think of agents submitting complete history dependent strategies which are executed automatically. In addition, in evolutionary models, such as this one, history refers only to the current game. Strategies cannot condition on what occurred in past games or tournaments. 18 18 Mailath (1992) discusses these assumptions of evolutionary models and their interpretation. If strategies could condition on past games (with other neighbours or in past generations) then the definition of the game would be changed. No longer would our intuition of the repeated prisoner’s dilemma be helpful. - 29 - Learning, Evolution and SeifOrganization Co-evolution and Spatial Interaction Randomly Select an Initial Population Play Repeated Prisoner’s Dilemma Using bit-string strategies, all agents play their four immediate neighbours a L repeated prissoner’s dilemma Calculate Fitnesses Fitnessea are the average per round payoffs Select R Agents for Replacement The R least fit agents are chosen to receive a new strategy Generate R Strategies Jsing the genetic algorithm operators of fitness roportional selection, croover and mutation, R strategies are created to replace the above agents. This process is stochastic. Figure 2.2 - Schematic for Evolution Simulation A genetic algorithm is used to create new strategies based on the results of the previous tournament. Genetic algorithms were developed by Holland (1975) and have proved effective in many diverse environments. They incorporate the intuitive appeal of imitation of successful strategies and, through crossover and mutation, can create novel strategies. Since the genetic algorithm is the primarily evolutionary tool, interpreting the results requires a basic understanding of genetic algorithms. A genetic algorithm is a directed random search. Agents’ strategies are represented as strings of bits (or alleles) chosen from some alphabet (usually {O, 1}). Strategy representation is important and is discussed below. New strings are formed through selection, crossover and mutation. Selection chooses strings randomly proportional to their fitness (success in - 30- Learning, Evolution and S4fOrganization Co-evolution and Spatial Interaction the tournament with their neighbours). Strings with higher fitness are selected with higher probability: a weighted roulette wheel. Crossover splices together two strings at a randomly chosen location. By combining successful strategies in this way, the genetic algorithm efficiently experiments with strings that are likely to produce superior strategies. Finally, mutation involves randomly altering a few bits to introduce genetic diversity. When genetic algorithms are implemented on fixed landscapes (ie. no co-evolution) they usually converge to the global optimum. What occurs in environments where co-evolution is present is the object of this study. 19 Strategies for the repeated prisoner’s dilemma game are represented as bit-strings. Strategies are restricted to pure strategies. ° Each possible history (the empty history, the 4 possible 2 histories after the initial move, the 16 possible histories after two moves ...) corresponds to a bit in the bit-string. The bit determines the agents play given that history. As long as L, the length of the game, is finite so are the string lengths. 21 To implement simulations for 19 20 21 The Scientific American article by Holland (1992) provides a short overview of genetic algorithms and Goldberg (1989) provides a thorough introduction. The proof that the genetic algorithm will converge to the global optimum is Holland’s (1975) Schema Theorem. Goldberg (1989), Chapter 2 and Vose (1991) offer proofs of this theorem. Goldberg also provides a discussion of the premises of the theorem. The Pascal code for the simulations was adapted from Goldberg (1989). The implementation of fitness-based selection is via the stochastic remainder method as described in Goldberg (1989), page 123 which is slightly different than a weighted roulette wheel method noted in the text. It is not difficult to modify the representation of strategies to include mixed strategies. However, this would substantially increase the required string length. Since there are four possible Outcomes from play in the stage game, {(C,C),(C,D),(D,C),(D,D)), the length of the bit string L.l). The string 4 %.( required to describe strategies in the L-times repeated game is given by the geometric series, can be ordered many different ways. Whether, for example, the initial move is given by the first bit, the last bit or bit 17 could influence the likelihood that some strategies are created. Bits which are located close together in the bit-string have a lower probability of being separated by the crossover operator of the genetic algorithm. This point is discussed in Goldberg (1989), Chapter 2 in relation to the Schema Theorem. To alleviate this concern, simulations were sun using different bit-string mappings. This has little effect on the results. -31 - Co-evolution and Spatial Interaction Learning, Evolution and S4/Organizazion the infmitely repeated game, strategies are bounded to have a memory of three rounds. In games of length greater than three, agents can no longer implement the strategy “D in the last round”, since their limited memory cannot count the moves (see Nachbar, 1992). By eliminating strategies which can condition on the length of the game, L> 3 will represent an infinitely repeated game. Note that this restriction still leaves many possible strategies. The bit-string length of 85 bits means there are 285 or 3.9x 1025 possible strategies. In particular, even though memory is resthcted, trigger strategies are possible. 22 The bit-string representation was first used by Axeirod (1987) and appears in Marks (1989a, 1989b). An alternative representation is to represent strategies as finite automata or state machines as used in Miller (1989).23 For genetic algorithm implementation of the automata, the string is partitioned into a number of states. Each state consists of three parts. The first determines the move in the state. The second and third parts determine the next state given a C move or D move from the opponent. While the two representations are equivalent in theory (Kalai and Stanford, 1988) there are some practical differences. For a given string length, more complex (longer memory) strategies are possible using finite 22 Typical trigger strategies usually include items like “once my opponent plays D, Twill play D forever”. This strategy can be implemented in a finite bit-string since the memory-limited history includes the agent’s own past moves. Binmore and Samuelson (1992) consider an evolutionary model using the state-machine representation and include diagrams for many strategies in the infinitely repeated game. - 32- Learning, Evolution and S4/Organization automata than bit-string representation. Co-evolution and Spatial Interaction However, for finite games (such as the twice repeated game considered here) the bit-string representation is easier to analyze? 1 As was mentioned in the introduction, the most important parameter of this model is the number of strategies replaced in each generation. It is assumed that of the N agents in the population only the R agents with lowest fitness are selected to receive a new strategy? 5 Recall that new strategies are created by the genetic algorithm based on the entire current population. One can think of the ratio RIN as the degree of satisficing behaviour (ie. agents are contented with a certain rank) or the cost of acquiring a new strategy. 26 By varying this ratio the importance of location in the model is changed. This changes the nature of co evolution. The probability that your neighbour may adopt a new strategy determines whether the success of a strategy depends on whether it is well adapted locally or whether success requires the strategy to be well adapted to the entire population (or more precisely well adapted to strategies which are likely to be generated from the current population using the genetic algorithm). 24 Although their models differ from this one, for comparison, Miller There is one additional possible representation for strategies called genetic programming (Kosa, 1992). This approach represents the strategies as programs of functions (such as LISP). In more complex environments, genetic programming lets evolution literally choose and combine rules-of-thumb like “trust your neighbour.” In particular, the string length in these models need not be fixed. However, the results from genetic programming are very difficult to interpret and involve difficult methodological questions. Danielson (1992) uses this approach in a modified version of the one-shot prisoner’s dilemma game. Agents are ranked according to their fitness with ties being broken randomly. In short games, where ties are more frequent the tie breaking assumption can influence the results. 26 An alternative to selecting agents based on their rank would be to select agents whose fitness fell below some threshold. Determining replacement based on a threshold would be more consistent with satisficing behaviour (Simon, 1987) or the existence of an opportunity cost for the agents to remain in the game. However, rank-based replacement provides for more precise control of the co-evolutionary environment and therefore best suits the objective of this chapter. - 33 - Co-evolution and Spatial Interaction Learning, Evolution and SeOrganization (1989) replaces 33% of the population and Danielson (1992) replaces 50% of the population each generation. The R agents whose strategies are replaced in a generation are selected based on fitness. By determining replacement based on fitness the probability that your neighbour receives a new strategy depends upon his fitness. This is crucial for the RIN ratio to have a significant effect. If a group of strategies in one region work well together (ie. all have high fitness), then with a low R/Nratio, these strategies will not change. If, for example, replacement was unrelated to fitness, then regardless of the RIN ratio some strategies in this group will be replaced. (C) Parametric Assumptions The parametric assumptions that define these simulations are presented in Table 2.1 and the payoffs used for the iterated prisoner’s dilemma game are in Figure 2.1(a). This section focuses on the definition of fitness and the length of the repeated prisoner’s dilemma 28 game. 27 28 Stochastic replacement of a single agent is used in Chapter 3 where the probability of non-fitness based replacement as the analog to the RIN ratio. Other parametric assumptions are also important. However, results from varying these parameters suggest they do not dramatically effect the quality of the results presented here. - 34- Learning, Evolution and Sel-Organization Co-evolution and Spatial Interaction Population size, N Maximum generations Crossover probability Mutation probability 100 100 to 17,500 0.7 0.0001 Number of neighbours each agent pkiys in tournament Number of agents to replace each generation 4 1 to 100 1, 2, 10 Number of iterations in the prisoner dilemma game Corresponding length of bit-string representing agent’s strategy 1, 5, 85 Table 2.1 - Parameters for Simulation Fitness in this model plays two roles. The first is to determine (deterministically) the R agents that will receive new strategies. Second, fitness is used to determine the probability an existing strategy will be selected by the genetic algorithm. Ignoring the cross-over and mutation operators, fitness determines the probability an existing strategy is imitated by the R agents receiving a new strategy. Fitness for the N agents is defmed as the average per iteration payoffs. Where 7 is the total payoff from playing the L times repeated prisoner’s dilemma against four neighbours, fitness, J, i=1,.. ,N, is defined as: . = .IL 4L Given the payoff matrix in Figure 2.1(a), all fitnesses are positive. (2.1) Thus, fitness proportional selection means that the probability an existing strategy is selected by the genetic algorithm is given by - 35 - Learning. Evolution and SdJOrganizarion Co-evolution and Spatial Interaction Prob(select i) = (2.2) This definition implies that the cardinal value of fitnesses matters. If for example, a positive constant is added to the payoffs, the equilibrium game theory results are unaffected. In addition, the fitness rankings, which determine the agents that will receive new strategies, are also unaffected. However, the probability any one strategy is selected by the genetic algorithm changes. From equation (2.2), adding a constant moves the probability that a specific agent’s strategy is selected closer to a uniform (1/N) probability? 9 Thus sealing payoffs differently can change the long run properties of the system. For example, by de emphasising strategies that are initially successful (say an all D strategy), co-operative niches may form. In evolutionary models where not all possible strategies need be represented in the population, the cardinal properties of the payoffs can affect the results qualitatively. 30 Similar to the choice of payoff matrix, the length of the iterated prisoner’s dilemma that the agents play influences the results. The length of the game, L, determines the importance of an individual move. In a long game, the advantage (increase in fitness) gained from playing 29 There are procedures available for the genetic algorithm which scale fitnesses to seemingly avoid this problem. See Goldberg (1989) or Miller (1989). However, the scaling procedures require the modeller to speci1y the significance of relative importance. In optimization applications of the genetic algorithm, there are recommended levels for this parameter. However, in models with co-evolution, the choice of this parameter would be arbitrary. This is in contrast to models of evolution which employ replicator dynamics. In these models, adding a constant to fitnesses usually only affects the speed at which convergence occurs. See Mailath (1992) for a discussion of replicator dynamics. - 36 - Learning, Evolution and SeI’Organization Co-evolution and Spatial Interaction D instead of C is much smaller than in a short game. As Marks (1989a) notes, there is a relationship between the length of the game and the theoretical implicit discount rates in a truly infmitely repeated game (ie. the importance of future payoffs). When L is increased the importance of joint co-operation (alternatively, the insignificance of any single move) swamps any effect of varying the number of agents replaced. In tournaments where L is large, there is little distinction between local and global co-evolution. To simulate the infinitely repeated game, L= 10 is used where memory of the strategies is restricted to three moves. This value is much lower than those used in Miller (1989) and Marks (1989a, 31 A small value of L is chosen to emphasise the R/N ratio. The results for these 1989b). tournaments are discussed in the next section. ifi - Results The results are presented in two parts. First, the results for the infinitely repeated game represented by a game of 10 moves with memory limited strategies are discussed. Second, the results for a one-shot and twice repeated game with no restrictions on the strategies are presented. In general, simulations produce a great deal of data. To understand the results of the simulations it is necessary to effectively summarize the data. The primary summary 31 More important than the length of the game is the number of moves each agent must make in the tournament (since each agent has have only one strategy for all games). In this chapter each agent plays 4 games of length 10 for a total of 40 moves. In Miller (1989), each agent plays 30 agents at a game of length 150, for a total of 4,500 moves. In Marks (1989b) a tournament consists of 100 games of length 22 for a total of 2,200 moves. Calculating the implicit discount rate for the infinitely repeated game requires considering the total number of interactions and not just the length of the game as is shown in Marks (1 989a, page 6). In addition, although their payoff matrices are similar to the one used here both these papers scale the fitnesses to alter the probability of selection (see footnote 29). - 37 - Learning, Evolution and SefOrganization Co-evolution and Spatial Interaction end of the chapter begfrznfng on page 63. statistic considered is the average fitness of the population. This statistic is presented both by generation for a fixed R (ie. time series) and across different levels of R for a given generation (ie. cross-section). In addition, to interpret the effect of local interaction and the evolutionary mechanism, animation is used to present information on individual agent strategies, fitness and location on the torus. (A) Infinitely Repeated Prisoner’s Dilemma Figure 3 presents a cross-section of population average fitnesses for different numbers of agents replaced (R= 1,2,... ,99, 100). All of these simulations begin with the same initial population and report the population average fitness in the final generation. The final generation is defined as the first generation when the variance across individual fitnesses is zero (ie. all agents achieve the same payoffs). When this occurs the evolution mechanism is said to converge. Convergence is a necessary (but not sufficient, as will be noted below) for the population of strategies to be ESS. Of the 100 simulations reported in Figure 3, 7 did not converge by generation 2000. For these cases, the average fitness in generation 2000 is reported. - 38 - Learning, Evolution and SerfOrganization Co-evolution and Spatial Interaction Recall that fitness is defined as payoff per iteration. Individual fitnesses can be in the range between 0 (perfectly exploited) and 3 (perfectly exploiting). However, since for every agent scoring a 3 in one iteration there is an agent scoring 0, population average fitnesses must fall in the closed interval [1,2]. For the same reason, the expected population average fitness of a randomly selected initial population is 1.5. Finally, note that the average fitnesses on Figure 3 moves in increments of 0.1 (for those cases which converged). A necessary condition of ESS is that the behaviour of all agents is identical. In ESS equilibrium, there will be no (C,D) or (D,C) outcomes. 32 Since the game length is 10 moves, 0.1 represents the fitness increase from achieving one additional (C,C) instead of a (D,D) outcome. From Figure 3 the effect of varying the number of agents replaced per generation can be seen. R tends to reduce the average fitness (more low fitness cases occurred) and increase the variability of the population average fitness. However, these results are for only one randomly chosen initial population and the creation of new strategies by the genetic algorithm is stochastic. To confirm these initial results, Figure 4 presents cross-section average fitnesses for 20 different initial populations and 20 values for the number of agents replaced per generation (R=5,10,...,95,100). Using the 400 data points from this simulation, the results for the regression of the number of agents replaced per generation on population average fitness are reported in Table 2.2 (top 32 A population of strategies that call for such asymmetric behaviour will not survive the crossover operator of the genetic algorithm. -39- Learning, Evolution and S4tOrganization Population average fitness in final generation, F Number of agents replaced per generation, R Dependent Variable: Independent Variable: Number of Observations R-Squared: Standard Error: - 400 0.2291 0.2853 Fitted Regression: (t-ratio) Table 2.2 Co-evolution and Spatial Interaction F = 1.9366 (64.36) -0.00537 X R (-10.87) Regression Results Data: 20 different initial populations and R=5,10,15,. ,95,1CK) used to generate 400 data points for regression. . . of next page). 33 The regression indicates that R explains 23% of the variation in the average fitnesses and the regression coefficient on R of -0.005 is statistically different from zero. Thus, average population fitness declines as the number of agents replaced increases. In addition, the dispersion about the regression line is increasing in R (see Figure 4). In particular, note that it is only when R>50 does the population average fitness reach its lower bound of 1. 0. This heteroscedasticity implies that when many agents are replaced each generation, the evolution mechanism is much more sensitive to random fluctuations. This is consistent with the intuition that when agents co-evolve locally (ie. small R) a mutation or small change in initial population will have a local effect. However, when R is large small changes can affect the whole population. The regression results are similar if only the 337 cases where the genetic algorithm converged. The fact that the 63 cases of non-convergencetend to lie below the regression line (ie. below average) is curious. This experiment was repeated increasing the number of simulations to 2600 with almost identical results. This fact suggests that a single regression may not be appropriate for the entire range of R. However, I cannot reject the hypothesis that the regression coefficients when R 50 equal the coefficients when R> 50 (via an F-test of linear restrictions). Thus a single regression is suitable. - 40- Learning, Evolution and Set/Organization Co-evolution and Spatial Interaction Figure 5 presents average fitness per generation (time series) for three different values of R. The three simulations all begin with the same initial population and converge by generation 250. These are typical of the cases which constitute Figures 3 and 4. Of the cases that converge, convergence usually occurred before generation 250. Typically average fitness falls in the initial generations reflecting the proliferation of strategies like all D which have an advantage in randomly created populations. Whether or not average fitness continues to decline depends on the number of agents being replaced. Replacing 8 agents per generation is typical of a case where convergence to high population average fitness (1.9) occurs. Figure 6 presents a series of frames from an animation program used to interpret the simulations. The animation represents locations, strategies and fitnesses of agents as disks on a grid. Recall, that the agents are located on a torus and thus the grid has no boundaries. Individual agent fitness determines the radius of the disk representing the agent (population average fitness is in Figure 5)•35 The shading represents the number of C’s executed by the agent relative to 40 possible moves each agent makes (4 games with 10 moves each) in a generation according to the scale in Figure 6(a). 36 Figure 6(b) is the initial (randomly selected) population. The population changes little in the first 100 generations (Figure 6c) with the exception that there are fewer extreme fitness values. Note that this representation emphasises fitness differences since the area of each disk will be proportional to the square of fitness. 36 The simulation software shows these animations using colour. Unfortunately, only black and white images can be presented here. - 41 - Learning, Evolution and Seq-Organization Co-evolution and Spatial Interaction Agents being exploited in generation 0 have received new strategies. By generation 130 a few niches of agents behaving co-operatively has developed (middle-right of Figure 6d). The agents in these niches have high fitness. Not only are these agents not replaced, but those agents which are replaced are replaced with strategies created (most likely) from members of the co-operative niches. spreads. By generation 135 the strategies of the co-operative agents By generation 150, although agents’ bit-strings are not identical, all agents’ strategies yield the same behaviour; D on the initial move and C on the remaining 9 moves. From investigating the bit-strings for this population, it does not appear that the C’s are played unconditionally. There are many D’s in the string which will prevent all D strategies from exploiting the agents. In addition, simple strategies like all C are identified by their initial move and are quickly exploited. 37 Animations for the other simulations presented in Figure 5 are similar and are not presented. Since the strings are large, it is difficult to determine if the strategies generated in the simulations form a Nash equilibrium. It is certainly possible since any payoff in the closed interval [1,2] can be the outcome of a Nash equilibrium. The evidence from the three simulations in Figure 5 is suggestive since the populations are robust to replacing one agent with strategies like all D or all C. However, to conclude that the strategies form a Nash Some populations did achieve C’s played in all 10 moves. However, in those cases where both (C,C) and (D,D) were observed in a population which converged, the (D,D) outcomes appeared in the first moves of the game and the (C,C) moves occurred in the last moves of the 10 move game. Strategies of this form are typically equilibria in games played by finite automata with strategic complexity considerations (see Binmore and Samuelson, 1992). In these games, there is a (lexicographic) preference for simpler strategies. All states of the automata must be used in equilibrium to ensure agents cannot benefit by playing simpler strategies (like all C). - 42- Learning, Evolution and SdfOrganization Co-evolution and Spatial Interaction equilibrium requires showing that for each agent, there does not exist one strategy (of the 3.9 x 1025) that increases the agent’s fitness holding constant the strategies of the rest of the 38 population. Even if the strategies form a Nash equilibrium, they need not be ESS. Figure 3 presents the population average fitness for the first generation where the population converged. Figure 7 presents the change in population average fitness from the time convergence occurred and average fitness in generation 2000. Of the 93 cases of convergence in Figure 3, the population average fitness changes in 31 of the cases when the simulation is run longer. 39 Thus at least these 31 cases were not ESS. The average change in fitness is 0.003. Note however, R affects the stability. When the number of agents replaced in a generation is high, there are more cases of instability. This is consistent with the heteroscedasticity of average fitness increasing in R that is discussed above. Figure 8 presents one example where the population converged but did not remain stable. In this run, 67 agents are replaced each generation. At generation 105 the population converged to an average fitness of 1.3 (reported in Figure 3 for R = 67). Between generations 415 and 426 the population diverges and then re-converges to an average fitness of 1.4. Figure 8 shows that this occurred several times. The population converges (the horizontal lines) then diverges before settling at an average payoff of 1.0 after generation 38 Finding such a strategy in a fixed environment is similar to the Axeirod (1987) environment. The genetic algorithm as an optimization technique is one method of determining if such a strategy exists. Of the three cases presented in Figure 5, all are stable and average fitness does not change when the simulations are sun to generation 2000. - 43 - Learning, Evolution and S4f-Organization Co-evolution and Spatial Interaction 1500 (which may or may not be stable). The change between generation 860 and 890 is typical (of this and other runs) and demonstrates the effect of local interaction and the evolutionary mechanism. The average fitness for these generations are reported in the insert in Figure 8. Figure 9 presents 12 frames from the strategy animation during this time. At generation 864 (Figure 9b) the population exhibits all D behaviour. This behaviour had persisted since generation 779. At generation 865 the evolution mechanism created a co operative niche of strategies that does slightly better than the rest of the population (bottom centre of Figure 9c). The agent at the centre of this group is able to induce some co operative moves from all four of his neighbours. In generation 867, this pattern is repeated in several locations. Note that the few co-operative niches have above average fitness. They will not be replaced (only bottom 67 are replaced) and have a better change of being selected by the genetic algorithm to replace other strategies. In generation 870, these niches have begun to overlap. As the groups get larger there is more co-operative behaviour induced and more co-operative strategies are created (generations 871 and 872). This process continues until generation 886 (Figure 9n) when agents are co-operating on 5 of the 10 moves in each game. At this time, the population has again converged. 40 40 This instability of the converged population at generation 864 (fitness of 1.0) is similar to that discussed in Binmore and Samuelson (1992). They discuss why an all D equilibrium in the infinitely repeated game is susceptible to invasion from a small group of strategies. It is important in their discussion that the invading strategies both appear in a group and can recognize one another. In this model the invading niche in generation 865 interacts only locally. Local interaction plays the role of recognizing fellow members of the invading group. -44- Learnzng, Evolution and SdfOrganization Co-evolution and Spatial Interaction (B) Finitely Repeated Prisoner’s Dilemma Game The finitely repeated game is distinct from the infmitely repeated game discussed above in two respects. First, the Nash equilibrium where all agents play D is unique. Second, the memory of the agents strategies is not bounded. In the one-shot game there are two possible strategies, {C,D}, while in the twice repeated game there are 32 possible strategies (listed in Figure l3).’ Not only are all these strategies possible, they are all (equally) represented in the initial population. The evolution mechanism converges on the Nash equilibrium in the one-shot game. However, in the twice repeated game the mechanism does not always find the Nash equilibrium but displays interesting dynamic properties. (i) The one-shot game In the one-shot game the strategy strings are single bits. In this case, the genetic algorithm reduces to an evolutionary mechanism of imitation. Regardless of the number of agents replaced per generation, the system converges on the equilibrium with all agents playing D. However, a co-operative niche can develop and temporarily survive. Figure 10 presents the animation for a typical one-shot game simulation where one agent is replaced each generation (R= 1). The initial population (Figure lOb) shows the random distribution of agents playing C or D. The success of the D strategy causes this strategy to spread (Figure lOc). However, a niche of agents playing C as in Figure 10(d) can survive for several generation. 41 Note that some of the strategies in Figure 13 are observationally equivalent. There are only 8 distinct pure strategies in the twice repeated game (see Nachbar, 1992). However, requiring strategies to speci1r moves in “impossible” histories is important in the presence of crossover and mutation. For example, strategies 0,2,8,10 all play C on both the first and second moves. Although these strategies produce identical behaviour, they are not equally effected by the genetic algorithm. In the event that the initial move (bit) was changed (mutated) to a D, these four strategies are no longer equivalent. - 45 - Learning, Evolution and S4/Organization Co-evolution and Spatial Interaction The agents on the outside boundary of the niche have fitness of 1.5 The agents in the niche have fitness 1.0 have a fitness of 1.0 (= [2+2+0+0] ÷ 4). (= [1+1+1+1] ÷ 4). (= [1+1+1+3] ÷ 4). All other agents play D and Since the agents in the niche have the same fitness as most other agents, it can take several generations before an agent in the niche is selected to receive a new strategy. 42 Eventually one of the niche agents receives a new strategy and plays D. This occurs in generation 52 (Figure lOe). The niche can no longer survive since two members now have fitness of 0.5 (=[2+0+0+0]÷4) which is the lowest in the population. The remaining niche members receive a new strategy and play D. By generation 55, the entire population reaches the Nash equilibrium and plays D. The example in Figure 10 highlights the sensitivity of the simulations to parametric assumptions. For example, consider the effect of modifying the payoff matrix in Figure 2.1(a) slightly. Suppose the payoff to the (C,C) outcome is changed to 2 + ½ (where € is a small positive number). The co-operative niche in Figure 10(d) will now survive indefinitely since the fitness of the members is 1+ € strictly higher than the many agents playing D. This sensitivity is the result of choosing the R agents that receive new strategies deterministically. The sensitivity is of less concern in longer games where a greater variety of fitnesses are possible and ties occur less frequently. In the twice repeated game results that follow, the stability of populations to strategy perturbations is explicitly considered to address this concern. 42 Recall that the R least fit agents are replaced and where fitness ties are broken randomly. -46- Learning, Evolution and SeOrganization Co-evolution and Spatial Interaction (ii) The twice repeated game In the twice repeated game whether or not the evolution mechanism converges on the Nash equilibrium depends on the number of agents replaced per generation. When more than 50 agents are replaced, the system does converge to the all D Nash equilibrium. On average, convergence occurs before generation 30. However, when less than 50 agents are replaced, the system usually fails to converge. Figure 11 presents a cross-section of population average fitness in generation 5000 for simulations with the same initial population but different values of R. To understand the behaviour of those simulations which fail to converge, Figure 12 presents the minimum and maximum population average fitness that occur in generation 5000 to 6000 as well as the average for that period. 43 Identified on the graph are four regions of evolutionary behaviour; Frozen, Critical, Stationary and Converged.’” The defining characteristics of the four regions are associated with timeseries pattern of population average fitnesses and the related animations that occur after generation 5000 (ie. long run behaviour). In particular the sensitivity to initial conditions and the stability of the simulations to perturbations are important. 45 The minimum and maximum fitness values reported are for population average fitnesses, not individual agent fitnesses. The boundaries between the regions are somewhat arbitrary. Simulations which fall near the border often display properties of both regions. However, the behaviour of simulations on the interior of the regions is distinctive. It is important to consider the stability of the populations to small perturbations (ie. mutating bits of strategies) since the mutation operator of the genetic algorithm functions only when creating strategies for agents selected for replacement. The perturbations considered will identify situations where the properties of the population hinge on the deterministic replacement without subjecting the whole population to a mutation operator. This is the case in the one-shot game with the payoffs modified by an - 47- Learning, Evolution and SeOrganization Co-evolution and Spatial Interaction In the Converged Region (R 50) population average fitnesses are 1.0 for all generations past 5000 (usually for all generations past 50) and reflect the all D behaviour of Nash equilibrium that is reached. The population average fitness of 1.0 is not sensitive to the initial population chosen for the simulations. The populations of agents in generation 5000 are very robust to perturbations. Even when a clique of co-operating agents is introduced (manually), the populations quickly return to the all D behaviour as the co-operative niche members are replaced. The populations usually consist of agents using strategies 26, 27 30 and 31 listed in Figure 13. What is interesting to note about these populations is that if an agent mutates to play C on the initial move, it will not elicit a C from the opponent on the second move. The mutant strategy will not survive. A population of agents playing strategy 28 or 29 results in all D behaviour. However if a single agent mutates to play C on the initial move, a C will be played by the opponent on the second move. The mutant strategy will achieve above average fitness and survive. In this case, the convergence would not be stable. The Frozen Region occurs when the number of agents replaced is low (R 9). After the few initial generations the system freezes at a point where the same agents are replaced each generation. For example, the eight dark agents in Figure 14, where R=8, are surrounded by agents which play D. In particular two of the neighbours of each of these agents have the all D strategy (strategy 31 in Figure 13). No matter which strategy the dark agents choose, the best they can do is to play D and achieve a fitness of 1.0. However, even with - 48 - Learning, Evolution and S4f-Orgarnzation Co-evolution and Spatial Interaction this strategy, these agents rank at the bottom of the population (other agents achieve some co-operation) and are selected for replacement again. Figure 15 shows the population average fitness for generations 5000 to 6000 for this simulation. Not surprisingly, there is very little variation in the average fitness. Figure 16 shows the distribution of strategies in the population. Although the distribution has changed from the initial distribution, the strategies in generation 5000 are more disperse than those for simulations in the other regions (see Figures 18 and 20). The results in the Frozen Region are not sensitive to initial populations or perturbations to the population at generation 5000. Perturbations and changes to the initial population change which agents are replaced in perpetuity. However, despite initial changes and perturbations, the system returns to the frozen state. In the Stationary Region (21 R 49), the population average fitness is distributed about a stationary mean as in Figure 17 (where R=34). In this region, all agents receive a new strategy every two or three generations. There are no neighbourhoods where agents retain their strategy for long periods of time. In this system innovative strategies travel quickly. Both exploitive strategies (like all D) and groups of strategies that attain some co-operation spread quickly and prevent the system from substantially increasing or decreasing the population average fitness. While the population does not converge, it also does not display the non-stationary pattern of average fitnesses that occur in the Critical Region (See Figure 19). This is despite the similarity of the population of strategies that exist in generation 5000. Figure 18 shows the distribution of strategies in the Stationary Region and Figure 20 - 49 - Learning, Evolution and S4tOrganizazion Co-evolution and Spatial Interaction shows the distribution for the Critical Region. Finally, populations in the Stationary Region are robust to changes in the initial population and perturbations of the population in generation 5000. The last region of behaviour is the Critical Region (10R20). This region is defmed by a non-stationary pattern of population average fitness and a high degree of sensitivity to both initial population conditions and perturbations. associated with chaotic systems. These characteristics are usually However, the spread of new strategies through the population is slower and, in some sense, more orderly in the Critical Region. Some neighbourhoods of the population remain frozen for many generations before new strategies are adopted. These properties are discussed and demonstrated below. It is interesting to note that population average fitness is highest in the Critical Region. This result is consistent with the Kauffman and Johnsen (1991) result and is discussed further in Section 2.IV. Figure 19 displays a typical pattern of population average fitnesses in the Critical Region. In this simulation there are 14 agents replaced each generation. Although the distribution of strategies in the population at generation 5000 (Figure 20) is very similar to that which occurs in the Stationary Region (Figure 18), the evolutionary behaviour is quite different. 46 A technical description of critical systems is discussed in Bak, Tang and Wiesenfeld (1988). Their work is concerned with systems which organize to a critical state. For an overview of Self-Organized Criticality (SOC) see Bak and Chen (1991). In addition Bak, Chen an Creutz (1989) discuss SOC in the “Game of Life”, Bak, Chen, Scheinkman and Woodford (1992) discuss Soc in an economic model with production and Kauffman and Johnsen (1991) discuss SOC in co-evolution models. The defining characteristic of critical behaviour is Power Law distributions. A Power Law distribution is one in which the natural logarithm of the size of some event (say an earthquake or change in population average fitness) is proportional to the logarithm of the frequency of such events. All size events are possible. Small events are observed frequently and large events are observed rarely. - 50- Learning, Evolution and S4/Organization Co-evolution and Spatial Itueraction The pattern of population average fitnesses is non-stationary with several large changes in the average fitness. Figures 21 and 22 present animation for this simulation for generations between 6877 and 7777. This period covers two of the large changes in fitness. In this period, average fitness changes from 1.5 to 1.25 and then back to 1.5. The animations illuminate why behaviour in this region is critical. Figure 21(b) shows the population in generation 6877. Most of the agents are playing a strategy which plays C on the first move and D on the second move (C-then-D strategies are 13 and 15 in Figure 13) scoring a fitness of 1.5. In the lower right hand corner of Figure 21(b) there is an agent playing an all D strategy (strategy 31). This agent is more successful than other agents. However, the success is at the expense of his neighbours. Of the 14 agents chosen to receive a new strategy for the next generation, 4 are the neighbours of the all D strategy. When creating new strategies the all D strategy will be selected more often than any individual agent’s strategy (ie. all D has higher fitness). However, all D will not be selected more often than a version of the C-then-D strategy since there is only one agent playing all D and 99 playing C-then-D. Thus the all D strategy will spread but not that quickly and will tend to spread locally. By generation 6899 (Figure 21c), there are a few pockets of the All D strategy. Since the concentration in any one region of all D is not high, they tend to do well. Since they are now more numerous in numbers and have above average fitness, they will spread more quickly. By generation 6907 the tendency for all D to spread locally has produced a pocket of all D agents in the centre of Figure 2 1(d). This -51 - Co-evolution and Spatial Ineraction Learning, Evolution and Set/Organization pocket grows since C-then-D agents on the border of the region perform poorly and are replaced. With the increasing number of all D agents, agents that are replaced increasingly receive the all D strategy. By generation 6948 in Figure 21(f) the population average fitness has decreased to 1.25 with the majority of the agents playing all D. However, the spread of the all D strategy is limited since agents on the interior of the all D pocket do poorly (ie. a fitness of 1.0). The high local concentration of all D strategies results in low fitness. In Figure 2 1(g), the all D pocket has been invaded, by the more co-operative C-then-D strategy. Figure 22 continues this simulation beginning with generation 7607. Between generation 6960 (Figure 21g) and 7607 (Figure 22b) relatively little has occurred. The evolution mechanism tries strategies in the interior of an all D region. Usually these strategies do poorly (ie. are exploited). Eventually, a C-then-D niche of strategies is introduced into an all D region and survives as occurs in generation 7695 (Figure 22c). Although the niche has removed some of the all D strategies, those that remain are in low concentration and now score a high fitness. In generation 7735 (Figure 22d), the success of the all D strategies, at the expense of its neighbours, has again created pockets of all D. However, this time the all D region does not expand. Instead, in the next generation, the pocket of all D in the centre of Figure 22(d) is replaced with a C-then-D niche of agents in Figure 22(e). Note that the all D agents can only be eliminated if they are replaced in groups, since a single all D strategy will exploit its neighbours and have a high fitness. This occurs in generation - 52- Co-evolution and Spatial Ineraction Learning, Evolution and SerfOrganization 7763 (Figure 22f) where all agents in the population play a C-then-D strategy and score a fitness of 1.5. However, in generation 7777 (Figure 22g), one agent experiments with an all D strategy and the process described in Figures 21 and 22 begins again. In the Frozen Region the evolution leads to a situation where large portions of the population never change strategies. The Stationary Region is characterized by populations where all agents change strategies frequently. The Critical Region is poised between these two states. Parts of the population can be frozen for many generations before being replaced while in other regions agents receive new strategies frequently. However, no region remains frozen indefinitely and all regions eventually become frozen. Figure 23 illustrates this point using the same example as above where R= 14. In this figure the shade of an agent corresponds to the number of generations since that agent last received a new strategy. The generations presented correspond to those in Figures 21 and 22. In generation 6877 (Figure 23b), there is no large region which is frozen. In generation 6969 (Figure 23c) shows that the majority of strategy replacement has occurred in the centre of the population. This region corresponds to the interior of the all D region in Figure 21(f). The C-then-D agents on the left and right side of Figure 23(c) are frozen. By generation 7607 (Figure 23d) some of the frozen regions are now actively receiving new strategies and new frozen areas have developed. Again, strategy replacement occurs in the interior of the all D region. However, from Figure 22b, note that the all D region has moved. By generation 7777 (Figure 23e), where the all D regions have been replaced by the C-then-D region, there are no frozen - 53 - Co-evolution and Spatial Interaction Learning, Evolution and S4f Organization regions. This pattern of freezing and un-freezing is typical of the Critical Region and helps to explain the non-stationary pattern in the population average fitnesses. Unlike the three other regions, the Critical Region is very sensitive to initial populations and perturbations. Although the characteristic behaviour in the Critical Region is usually unaffected, small changes in the population can dramatically alter the specific path of the population average fitnesses. A small change in the initial population or a perturbation to an existing population has a large effect on average fitness 1,000 generations later. Figures 24 to 27 present a demonstration of the sensitivity to initial population. In these simulations, there are 16 agents replaced per generation. The two initial populations are shown in Figure 24. The two populations differ only in two bits of the 500 bits that define the strategies in the population. 47 After 2000 generations the distribution of strategies in the two populations differ substantially (Figure 25). The pattern of average fitnesses also shows a striking differences. For one population (Figure 26), the system is frozen. The other population (Figure 27) displays the typical non-stationary pattern of population average fitnesses. Note that unlike simulation in the Frozen Region, the first simulation (like the simulation discussed below) is not stable to small perturbations. In the Critical Region there appear to be two anomalies in Figure 12. When 12 agents are replaced, the system becomes frozen with the same 12 agents being replaced. However, this 47 .. . . In addition, the random number generator uses the same seed in both simulations. - 54- Co-evolution and Spatial Interaction Learning, Evolution and S4fOrganization population is not stable to minor perturbations in the population. With a small perturbation (for example changing one bit of one string of the population), the systems exhibit the behaviour typical non-stationary behaviour of the Critical Region. In particular, it does not become re-frozen as occurs for simulations in the Frozen Region. The population that results when ii agents are replaced warrants more attention. When R = lithe population converged to all agents playing a C on the initial move and a D on the second move for a fitness of 1.5. In this case all agents were using the C-then-D strategy CDDCD (strategy 13 of Figure 13). In this population, the existence of the never used C in the fourth bit plays an important role in maintaining convergence in the population. If the genetic algorithm creates a new strategy by mutating the initial move (first bit) of this strategy, the new strategy will not improve fitness. Playing D on the first move will exploit the opponent. However, on the second move the new strategy will play C, be exploited, and maintain a fitness of 1.5. Since the mutant strategy gains no advantage, it is selected with likelihood equal to any other agent’s strategy. However, since there are many more agents playing the C-then-D strategy, the new strategy will likely die out. Defining strategies, as in traditional game theory, to prescribing play for all histories can limit the impact of a mutation. Although this population is stable to small perturbations of a single bit, it is not stable to introducing a new strategy into the population like strategy 31 (all D). With this perturbation, the system returns to the behaviour typical of the Critical Region. 48 48 A similar comment applies to R 14 since this population converges at generation 17,500 (not shown on Figure 19) and is not stable to the introduction of an all D agent. - 55 - Co-evolution and Spatial Interaction Learning, Evolution and SeOrganization IV - Discussion Recall that Kauffman and Johnsen (1991) consider a model of co-evolution that investigates the interaction among the size of the gene, the ruggedness of the landscape and the degree of landscape coupling. They fmd that when the landscape is more rugged, relative to the coupling, the system evolves to an equilibrium faster. They also find that, holding gene size and coupling parameter constant, there is an optimal ruggedness that maximises population average fitness. However, this optimal ruggedness is also the value that leaves the system poised in a critical state. The authors conclude that if there is a meta-dynamic which selects these parameters, then the system will organize itself into this critical state. This section uses the similarities between their model and the repeated prisoner’s dilemma model discussed in this chapter to interpret the results of the previous. As a result, our model helps to explain the results of Kauffman and Johnsen. In the repeated prisoner’s dilemma model the size of the gene relates to the complexity of the strategies allowed. In the infinitely repeated game there are more than 1025 possible strategies, in the one-shot game there are two possible strategies and in the twice repeated game there are 32 possible strategies among which the evolutionary mechanism must select. The ruggedness of the landscapes measures the number of local optima and the steepness of the peaks. In the infinite game there are many possible Nash equilibria. These translate into many local optima on an agent’s fitness landscape (holding constant the strategies of his - 56 - Co-evolution and Spatial Interaction Learning, Evolution and S4fOrganization neighbours). In addition, the landscape can be steep since a small change in strategy can change fitness significantly because of the effect one move can have on the remaining moves in a game. 49 Finally the degree of landscape coupling is controlled by the number of agents replaced in each generation (ie. R). When only a few agents are replaced, the landscape is not coupled with distant agents since your neighbours rarely adopt new strategies. When many agents are replaced each generation, there is a high probability that your neighbours will adopt new strategies. Thus there is a higher probability that strategies of distant agents may be adopted by your neighbours. The first result in Kauffman and Johnsen is that the more rugged the fitness landscape, relative to the degree of coupling, the sooner the evolution mechanism will reach a Nash equilibrium. The result in this chapter is similar. Nash equilibrium were reached in the infmitely repeated game more often than in the finitely repeated game. One explanation for the difference in the evolution for the two games lies in the fact the there are many Nash equilibria in the infinite game and only one in the finite game. The multiplicity of equilibria in the infinite game makes it likely that an adaptive move (ie. receiving a new strategy) of one agent will improve the fitness of his neighbours. After making such a move a niche is created where the agent and his neighbours will not be replaced. In this way an equilibrium can spread through the population. This is what occurs in Figure 9(b) to 9(e). In contrast, For example, changing the initial move from a C to a D could cause a neighbour to play D for the remaining game instead of C. Since in a long game this can be many moves, this will have a large effect on the fitness of the agent (relative to other agents). - 57- Learning, Evolution and SeOrganizadon Co-evolution and Spatial Interaction in the twice repeated game, an adaptive move by one agent is usually at the expense of his neighbours. Evolutionary changes are localised and have difficulty spreading through the population and the Nash equilibrium cannot be reached. This is demonstrated in Figure 21. To relate this explanation to the Kauffman and Johnsen result, consider an agent’s multipeaked landscape that is buckled by an adaptive move by a neighbour. The multiplicity of peaks ensures that the agent is not far from a local optima. In many cases, the agent could be made better off by the buckling. In contrast, if the landscape is single-peaked, buckling the landscape will likely leave the agent far from the peak and most likely worse off. Whether or not the buckling will translate into the agent receiving a new strategy depends not only on the landscape ruggedness but on the number of agents replaced per generation (ie. the coupling parameter). In the twice repeated prisoner’s dilemma game, the same region of R which maximised population average fitness also demonstrated critical behaviour with non-stationarily in the pattern of average fitnesses and the sensitivity to perturbations and initial conditions. Kauffman and Johnsen find a similar result. They fmd that for a given coupling parameter, there is an optimal ruggedness for the fitness landscape which both optimizes population fitness and provides critical behaviour. In the twice repeated game, there is an optimal coupling parameter (ie. R) for a fixed landscape ruggedness (ie. the twice repeated game) that demonstrates the critical behaviour. In both their paper and this chapter, if there existed some meta-dynamic or evolution of the evolution mechanism (ie. make R endogenous) that - 58 - Co-evolution and Spatial Isueraction Learning, Evolution and S4tOrganization sought to maximise population average fitness, then the critical behaviour of the Critical The phenomena of critical behaviour as an Region discussed in above would result. endogenous property of a system is called Self-Organized Criticality (SOC). SOC was first ° 5 discovered by Bak, Tang and Wiesenfeld (1988) in relation to avalanches on sandpiles. The advantage of the prisoner’s dilemma evolutionary model is that the familiarity of the game can be exploited to develop an understanding as to how the R which maximises average fitness also produces critical behaviour. When R is too low (the Frozen Region), the system gets stuck replacing the same agents over and over again. No critical behaviour is observed since most agents never change their strategy. The population average fitness is low since most of the agents’ strategies are not highly evolved (see Figure 16).51 When R is too high (the Stationary Region), the exploitive strategies, which are very effective when they exist only in small numbers, spread too quickly. With a high R all agents change strategies often, there is no possibility that co-operative regions can become temporarily frozen. When R is very high (the Converged Region), the all D strategies spread so quickly that eventually all agents play the all D Nash equilibrium and the system attains the lowest possible population average fitness. Only when the R lies in the critical region is it high enough to prevent freezing but low enough to ensure that all D strategies tend to spread in neighbourhoods 50 51 Footnote 46 containS more references to Soc. One has to be careful here since a completely random population will generally have fitness of 1.5 which is high relative to the fitnesses in Figure 12. The simulations in the Frozen Region had enough evolution before freezing that some exploitive strategies spread. - 59 - Co-evolution and Spatial Interaction Learning, Evolution and S4fOrganization containing other all D strategies. By tending to limit the spread of all D strategies locally, niches of co-operative behaviour can exist for many generations. However, when too few all D strategies exist, all D strategies do very well and can invade and destroy co-operative niches. This dynamic not only prevents the system from ever attaining the Nash equilibrium, but drives the non-stationary pattern of population average fitnesses observed in the Critical Region. V - Conclusion Simulations are useful tools for developing understanding of complex phenomena. They can be an excellent guide to generate intuition which can guide theoretical work. Unfortunately, simulation results are always inductive. The results of the simulations presented in this chapter are a representative sample of the all the simulations I have considered. However, it is always possible that the next simulation, with some modification of one of the many parameters of the model, will demonstrate dramatically different behaviour. It is therefore important to develop theory to support the conclusions drawn in this study. There are two theoretical papers in evolutionary game theory that are closely related to the model presented here. Nachbar (1992) studies evolution in a finitely repeated game. In addition to pairing agents randomly, Nachbar considers an evolutionary mechanism of replicator dynamics which differs from the one used here. - 60- Replicator dynamics, which Learning, Evolution and S4fOrganization Co-evolution and Spatial Interaction model the evolving distribution of strategies in a population, do not allow strategies to disappear and reappear in the population as occurs with the genetic algorithm model in this chapter. Nachbar demonstrates that the evolution must converge on the all D Nash equilibrium. Chapter 3 indicates that introducing local interaction change this conclusion. The simulations certainly suggest that with both these changes, evolution need not converge to the all D equilibrium. Binmore and Samuelson (1992) consider evolution in the infinitely repeated game using finite automata where agents prefer less complex automata. The description of an invasion by a group of strategies that can recognize each other is very similar to the invasion that takes place in Figure 9 (see footnote 40). The next chapter, using a stylized version of the environment considered in this chapter, picks up these themes in an analytic, rather than simulation, framework. In summary, this chapter has presented a model that directly investigates the implications of co-evolution. The proportion of agent strategies replaced in a generation modifies the nature of co-evolution by determining whether strategies need to be well adapted to their neighbourhood or to the entire population. In the infinitely repeated prisoner’s dilemma the population tends to evolve to one of the many Nash equilibria. Increasing the number of agents replaced per generation reduces the mean and increases the variance of equilibrium population average fitness. In the finitely repeated game, the population converges to the unique Nash equilibrium only when a large number of agents are replaced. These results are related to the Kauffman and Johnsen (1991) result which relates the ruggedness of a - 61 - Co-evolution and Spatial Interaction Learning, Evolution and SeOrganization landscape to the length of time an evolution requires to converge to a Nash equilibrium. Finally, in the finitely repeated game, the proportion of agents replaced that maximises population average fitness corresponds to critical behaviour. Again, this result is consistent with a Kauffman and Johnsen (1991) result. However, in our model the intuition of the prisoner’s dilemma game can be used to understand how and why optimal population fitness is related to critical behaviour. The next chapter builds on the simulation results presented here by considering a simpler spatial model analytically. - 62- Learning, Evolution and S4fOrganization Co-evolution and Spo4al Interaction Average Fitness versus Number Length of Game Replaced is 10 2.1 2 GD - EIJDmD D D 0 DUD UD GD 0 0 1.9 1.8 - DDUD DUDO - c D 0 UD D 13 CD E]l]UDD D C + + GD C E GD UD UD D C D L o 1.7- CD 1.6- 1.5- D + c 8 D +D 0 + > GD + CD D UD GD 4-, a 1.4- GD D £ + 1.3- C U 1.2- D > 1.1- GD 0 U 0 I D.9 I 110120 IS - I I 3D 25 35 I I GD I DUD I 401501601701901901100 65 Number of Agents Replaced per Gen. D Figure 3 D C - 45 Converged 55 75 8 95 Not Converge + Population Average Fitness in Final Generation Final Generation is calculated as either the generation where the population converged or generation 2000. - 63 - Learning, Evolution and SefOrgarnzation Co-evolution and Spatial Interaction 1 2 + . + + ..1.1. t + + + + 1.2 1• + + 0.9 (a) 0.8 ib 15 2b 25 3b 35 40 45 sb 6b 5 7b 75 Number of Agents Replaced Per Generation (R) sb 8 0 0 9b 95 ióo 0 0 2 1.3 0 0 0 1.2 • 0 0 0 1.1. 0.9 (b) 0.8 lb + 1 2b 25 3b 3 4b 45 sb 55 O 65 70 7 Number of Agents Replaced Per Generation (R) Average Fitness in generation where population first converged 337 cases (top) - ab 8 90 9 ióo 0 Average Fitness In generation 1000 where population did not converge 63 cases (bottom) - Fitted values from linear regression using all 400 cases Ave fittness - 1.94 0.005 x R - Figure 4- Population Average Fitness in Final Generation (L=1O) Results for 20 thfferent initial populations. R=5,10,15 -64- 95,1(X). Final Generation determined as in Figure 3. Co-evolution and Spatial Interaction Learning, Evolution and SeOrganization Average Fitness per Generation Len9th of game = 10 2.1 2 1.9 1.8 0 a 1.7 - - - 1.61.5 a a a C 1.4 - - U 1.3- a L 1.2 1.1 - - 10.9 i6o 2’O 10 30 50 8 agents replaced Figure 5 - 70 + ioi4oio 1020 20 240 90 1 0 1 0 1 0 170 1 0 210 230 250 Generation 28 agents replaced 0 87 agents replacad Population Average Fitness per Generation lime series plot for three different values for the number of agents replaced per generation. R=8,28,67. Same initial population. - 65 - Learning, Evolution and S4tOrganization .•.•...... .... n,o,• ..,....... .. e,..4fl.• ••n••ne• •.•. 0.e.•e •.•ee.•.n ........,. •.•.•e.••. Co-evolution and Spatial Interaction .......... .......... .......... •..e....,e .•.n••... .......... ••....•n• .......... Generat ion Hirber 0 of Total 0 to Ii 11 to 22 22 to 33 33 to 44 44 to 56 56 toG? 67 to 78 78 totS 09 to 100 Rigure 6(a) Generation Iêater 100 gure 6(b) gure 6(c) .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... •. .••••••• .......... .......... .......... ....•. ..•. ••een.••. •en•.•e•e Generation Nta.ber 130 Generation HisSer 135 R=8 L=1O gure 6(e) gure 6(d) •••e•n•ee .......... .....e.... .......... ••ne. ••n ..... .• ....... ......•. . .......... •en..e••. •nennn .......... .......... aaaaaaaaaa 4+4+44±44 4+4+ ++ + ++. -# + + +4+4-4+• Generation tkuber 140 gure 6(1) C Played by Agent as a Percentage Generation Hprber 130 gure 6(g) Figure 6- Location and Strategy Evolution of the Agents (oo repeated Game -8 agents replaced) Each agent’s location (on the torus) is represented by a circle. Radius is proportional to fitness. Shading represents the percentage of C’s played the agent in tournament co move game against four neighbours or 40 moves). - 66 - Learning, Evolution and S4tOrganiza4on Co-evolution and Spatial Inkraclion in Average Fitness Change Gen 2000 versus Converged fitness I 0.90.8 0.7 a - a - 0.6- a a 0.50.4 4-, U - 0.3- a 0.2- 0.1S c 0 a a 04 a çJujna a a iii 1 n a rmntrijm a acimj J cwn a a a a ci cflJEmE -0.1- a a - -0.2- a a 04 a -0.3o a a aa -0.4- a -0.5-0.6 - a -0.7-0.8 a D a a a a a - I -0.9 0 5 10 15 I I I I I I I I I 20 25 30 35 40 45 50 55 60 I I I 65 70 75 80 85 90 95 Number of Agents Rep Iced per Gen. Figure 7- Difference in Average Fitness Between Converged and Generation 2000 There are 31 cases where the convergence in Figure 3 was not stable. - 67- Learning, Evolution and SeOrganization Co-evolution and Spatial Interaction Average Fitness per Generation — 2.1 1.7 2 1.5 dbLJ 1b 4 0.9 500 sCo 700 1100 1300 1500 1700 1900 Gene.on — 67geØ.oed Figure 8- Average Fitness per Generation R=67 Note the population converges and then diverges several times. Insert provides a magnification ofgenerations 860 to 890. - 68 - - 69 - .so sinoqq8pau .snofssuw8v awvff aaow or) suawvuinos u suaZv aqs patrjd ç Jo a8vsunsad ays ;tsO3 (saww s!suasndal 8upvyg wsausj( in jvuop.soclo.ad s sn;pvj apsp ii tCq pasuasaidat sy (nuos aqs iso) uopino waXy zpvg (pavlddg 81ud80z9-durv8 pdjvadal co) 81ud8v dysfo uojsnioaar iC8dWJJg puv U0p0307 -6 (1)6 (6 asn4 eta Jemwi 0)69 0) 0) 0) 01 9j 1.9 9 frfr 0) TT 0) 9! 0) asnS TIS Jfl4 00; )esauaQ uojeeuag 0000000006 e© $6000000 • s00o00000 •onoe•oe• •eo000o• n.e., ..,6 •,,6 600,66 66O 6 006$ 666 0606 006 6660 0•6 6000 flo 0000 0•6 0006 600 $666000 6066660 0060600 060 $66 006 no Gee 66$ 066 006 $60 066 (P)6 atØ 017 L9=1 A.qIei uoTwJaInv 1.90 JdtIIWl 0060606060 0606060606 $660006660 6060666066 $600060600 6660006066 0066606006 6600066000 6060060600 0000006606 (‘)6 690 (‘9 “k clOT 68 fl 1.9 96 fr 55 9! TT 0) 0 1t101 Jo t SE pme fiq pefleId 3 UOflJVJflUJ jt’iwt% “& UOJ)WJdLId9 6000600000 0606060006 6000600600 0000006060 0000000600 0000000000 0000000000 0000000000 0000000000 0000000000 (q.) “& flU flbW uo;)eaaug 0000600000 0006060000 0000600000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 ‘“X awtqj OUT WJd4Jd9 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 uOgwZguVS4o-Jag put’ put’ uoyiqoaa-irj UOflfl7043 ‘Rupuvr Co-evolution and Spatial Interaction Learning, Evolution and S4fOrganization © © ® ...•.... ••oOee•e •n••n••• •n••nn• ••••on® “0 •e••••on •••n••n. •••nn•®• ,.,..,..,. flfl00000 ...0® © €4,.. ••nnon• •••••••ne Generation Niaiber 873 C Played by Agent a. a Percentage of Total 0 to Ii Ii to 22 22 to 33 33 to 44 44 to 56 56 to 67 67 to 78 78 to 89 89 to 100 igure 9(h) 7 Generation Nte,ber 874 Figure 9(I) igure 9(i) 7 .....,.,.. •••••nn• €00000000 e••o®oene .......,.. .....,.... ...,,..... •••®n•••• ...,,..... .......... ...,...... ••4fl000000 .., fl••••• ...,.,.... •••••n••• Generation Niatber 875 Figure 9(k) R=67 L=1O Generation têaiber 876 Figure 9(1) .......... .......... ..,....... •000ee•••• ..,....... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... Generation Nunber 884 Figure 9(m) Generation Hunber 886 Figure 9(n) oo repeated game 67 agents Replaced Each agenñ’ location (on the torus) is represented by a circle. Radius is proportional to fitness. Shading represents the percentage of Cc played the agent in tournament move game against four neighbours or 40 moves). Figure 9 Continued - - ao - 70- t U 4 I 0 0 0 0 0 0 0 0 0 0 0000 00 0000 00 0•Oo 00 0000 00 0000 00 0000 00 0000 00 0000 00 0000 00 0000 00 0000000 00.0000 00.0000 0000000 0000000 0000000 0000000 0000000 0000000 0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ‘-S fr-I 0 0 0 0 0 0 S 0000 0000 I 0000 0000 0000 0000 0 0 0 0 0 0 0..0 0 oQ.Q 0 I 0000 0 I I 0 0 0 0 0 0 0 0 0 0 00 00 00 00 00 00 00 00 00 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 00 00 00 00 00 00 00 00 0 0 0 0 0 0 0 0 0 liii 0 0 0 0 0 0 0 0 0 0000 0 0 00 0 0 S 0000 0000 0000 0000 0000 0000 I 0000 0 0..0 0 0..0 0 0000 0 0 S 0 0 0 0 0 0 .4 0000 — ft IS ft 00000 ft 0 .c-) •0 0 Cu 00 00 00 0 00 00 00 0 00 00 00 0 00 00 00 0 00 00 00 0 Ct 0 0 0 I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a I I a I a IR I a tI Leanung, Evolution and SdlOrganization Co-evolution and Spatial Interaction Average Fitness versus Length of game Number = Replaced 2 1.8 1.7 0 - 1.6- o 0 In 1 \/A1Jvf\\\J I 9 3 49I55I63I71I79I87I95I 7 I1317I21I25I29I3337I41I45L 11 1 19 23 27 31 3 39 43 4 51 59 Number of Agents Pep laced per Gen U Figure 11 - Converged Not Converged Population Average in Generation 5000 Population converged for R= 11,44,47,50,51 1(1). -72- 67 75 83 91 99 Learning, Evolution and /Oiganization Co-evolution and Spatial luteraction Minimum, Average and Maximum Population Average Fitness For Generations 5000 - 6000 1.8 1.7 H—Minim Average 1.6 — R—Mximum 15 1.4 1.3 12 1.1 1 Frozen 0.9 Figure 12 - 0 5 Critical I Converged Stationary I I I I I 10 15 20 25 30 35 40 45 Number of Agents Replaced per Generation I 50 —t I 55 Population Average Fitness Statistics for Generations 5000 to 60(X) (L =2) Frozen, Critical, Stationaty and Converged regions are discussed in the text. -73 - Learning, Evolution and S4/Organizadon Co-evolution and Spatial InteractIon Strategy Number First Move 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 C C C C C C C C C C C C C C C C D D D D D D D D D D D D D D D D Figure 13 - Second Move Conditional on (You,Me) Outcome of First Stage (D,D) (D,C) (C,D) (C,C) C C C C C C C C D D D D D D D D C C C C C C C C D D D D D D D D C C C C D D D D C C C C D D D D C C C C D D D D C C C C D D D D C C D D C C D D C C D D C C D D C C D D C C D D C C D D C C D D C D C D C D C D C D C D C D C D C D C D C D C D C D C D C D C D All Possible Strategies for the Twice Repeated Game A string of5 bits is used to represent these strategies. Strategy index numbers used in histograms which follow. - 74- Learning, Evolution and Selforganization Co-evolution and Spatial Interaction 0000.000cc 00000.0000 0000000000 0000000000 0000000000 000.000000 00.00000cc 0000000000 0000.000.0 00000. 000• Figure 14- LoCation ofAgents in a Frozen Evolution The dark agents are the 8 least fit agents and receive new strategies each generation. The other agents never change strategy. - 75 - Learning, Evolution and S4fOrganization Co—evolution and Spatial Imeraction Tournament Length 2 Population Ave Pitneee by Generation 1.5 EE 5000 5100 5200 5300 5400 5500 5600 5700 5800 5900 (Nunter of Agente Repieced per gen) —8 Figure 15 - Population Average Fitness per Generation (R=8) Generation 5000 to 6000. An example of an evolution which is Frozen Twice Repeated Prisoner’s Dilema Distributic of SirategieB after 5000 Graerations 50 Ag Rq,laced p& Gmuad L1° lflflflnnnUflflJ - Figure 16- hdx N Strategies in Population in Generation 5000 White bars indicate initial population. See Figure 13 for Strategy Index. - 76 - 6000 Co-evolution and Spatial Interaction Learning, Evolution and S4f-Organization Tournament Length 2 Population ave fitnees by generation 1.5 - 1.4 a (a 0 C 1.3 0 0 L 0 6 C 0 1.2 6 a 1.1 5000 5400 5200 5600 5000 (Nuriber agente replaced by generation) —34 Figure 17- Population Average Fitness per Generation (R=34) Generation 5(X)O to 6000. An example of an evolution which is Stationaty. Twice Repeated Prisoner’s Dilema Diribution of Slmtegiee after 5000 Gererafio 50 — Rqilaoed pG 1 12 0 hdox N Figure 18 - Strategies in Population in Generation 5000 White bars indicate initial population. See Figure 13 for Strategy Index. - 77- 6000 Co-evolution and Spatial Interaction Learning, Evolution and SeOrganization Tournament Length 2 POpccIatTOfl eve fitnese by øereratlon 1.5 1.4 cc cc cc C Ll 1.3 cc cc L cc C 0 1.2 C, cc a 0 a 1.1 14 Figure 19 - Population Average Fitness per Generation (R=14) Generation 5000 to 6000. An example of an evolution which is Critical. Twice Repeated Prisoner’s Dilema DiIbu&ii of Sirategies aftee 5000 Genesatloss 50 45 Agccius Rccçbced p Gccii Do •14 40 j35 20 is 10 S 0 Figure 20 - Ii nflnnnHnn ii ,r,nFIIl[1F1F1r E1F1F1F1 1 Strategies in Population in Generation 5000 White bars indicate initial population. See Figure 13 for Strategy Index. - 78 - Learning, Evolution and SOrganization .......... .......... .......... .......... .......... .......... .......... .......... .......... •.....eeoe Generation Co-evolution and Spatial Interaction •••••e• .0. .......... .......... .. eQ...... ... .Oee••• ...0.,0... •eeooe• .......... •e•eeeeeee •ee•••eooe kmber 6877 ‘igure 21(b) ‘igure 21(a) Generation Ntaiber 6899 Tgure 21(c) ••••••.oce ....... •0• ..,....... •000.e0000 •nooo.000 ••.oo 00.00 .... e000e. o0eQoooc@ 0....00000 ......e... .......... •••••ooo.• •••e0 ooe• •eee.o.oe• Generation Hiwiber 6907 ygure 21(d) R=14 L=2 •oooe.... ••.Oeee••• •eo. .oc... •eooo • oDe. •.e .o @0 00. •eo 0 @oe Generation Iftater 6925 igure 21(e) 7 ooeooooooo oee000cy.o0 •oo oo.Q.e •000000000 •eo. 00•Oe0 00000 0 @Qe• ••flooc’.•• ec.. oQ• @00 oeeoe @0.00 0.e0@ 0 • 000 0ee @@0.0. 000 caeee@o •eoo • Cc’ oe 000 0• eoe• ...@ oee•• •••o • 00.0.. ••e on... ••eoo cocee .•.o .0ee0 Generation Vêinber 6948 igure 21(f) 7 Figure 21 C Played by Agent aa a Percentage of Total 0 to Ii 11 to 23 22 to 33 33 to 44 44 to 56 56 to 67 67 to 78 78 to 89 89 to 100 Generation Hiamber 6960 Figure 21(g) - Location and Strategy Evolution of the Agents (Twice Repeated Game R=14) (7) and ends at - Example of behaviour in Critical Region. Average population fitness moves from 1.49 in (b) to 1.25 in 1.32 in (g) - 79 - Co-evolution and Spatial Interaction Learning, Evolution and S4fOrganization ••en 00 ... •e•neo.•e ...0. ®.e 00 .......,.. .. no OeO o®eoo .0 ooe 0000 ceo ze••Qee oQe @OeOeOeO.• 00 0e...•.® •o oQe c’oQ•Q 00 oDe Cone 0• 00 0on• ..... .n•. 00000 eeeQ©Qe •0•fl.e.•. 000....... Generation Itater 7607 Generation Wither 7695 ‘igure 22(b) •o.Q.•.•.o 00• .nn•.••• C Played by Agent as a Percentage of Total 0 to 11 11 to 22 22 to 33 33 to 44 44 to 56 56 to 67 67 to 78 78 to 89 89 to 100 Ygure 22(a) ‘igure 22(c) ••e•on••e •...•..... •••® 0®.... •..0 0®•••• • •0 oo•.© .••• 00.0. .••• ••eQo non 000eoneoo .n••en.o R=14 L=2 •eeecoee• ooe..eQ..e •.•.n.•ee •e.Q.onn 000e000eOo .on••••.o .......... Generat ion Nunber 7735 igure 22(d) 7 Generation N.rber 7736 ‘igure 22(e) .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .....o.... .......... .......... .......... .......... .......... Generation I*aber 7777 Generation Ntaber 7763 igure 22(i) 7 ‘igure 22(g) Figure 22 Location and Strategy Evolution of the Agents (Twice Repeated Game R=14) - - Example of behaviour in Critical Region. Average population averagefitness moves from a low of 1.3 in (7’) to a high of 1.5 in (t) - go - Learning, Evolution and S4f Organization Co-evolution and Spatial Interaction QQQeQQQoQ •.er/jo 000®o oocaooeoco oe©csooooeo Oe.o. c occee•ooo eeoeoo•••o •ononooo o•e••oone oeoo•eoeo •ee®o® oeo. e ..0000000. •e•o o•c•• ••e cOo oee• ...,0000.. .... 000•e• 0000000o©o ••cooQo®• .•., o®0000 Generat ian &eéer 6077 Generation Weiber 6960 QQQ®©QoQ ‘igure 23(b) (10 (20 (30 (40 (50 (60 (70 (80 >00 I ‘igun 23(a) ‘igure 23(c) ••••e••e e..n Age Age Age Age Age Age Age Aee Age cvooe oQ• coo eoo •eee0000•o ooeoooocc• OOOOOQOO® 0000•oeo•o •o •..... •ooooooooo OOOOOO.OO oooooooo• oooOoOeOO ooooQ@oo• oooQQoQeo OOOOO©©O© o©•ooeoo OOOOOOOOO o•ooo•oo Generation itetber 7607 Ygure 23(d) R=14 L=2 Generation Nirber 7777 ‘igure 23(e) Figure 23- Strategy Replacement Shade of agent indicates the number of generations since the agent received a new strategy. This sinudation is for 14 agents replaced in the twice repeated game. Generations correspond to those in Figure 21 and 22. - 81 - __________ Co-evolution and Spatial Interaction Learning, Evolution and S4/Organization Twice Repeated Prisoner’s Dilema Initial populatics of Strategies Apes Rep’aced p 30• 45 L1Pop1.Geno • j35 dex Figure 24 - Two Initial Populations Populations differ only in the first bit of agents 4 and 6 Twice Repeated Prisoner’s Dilema Dilbnics ift 2000 Geseratices Apis, Rqaced p Gesisdon 50 45 EIPOP1-Gen2000 • Pop2-Gen 2000 1 I: ; 11’i U Jill S h Nis Figure 25 - Strategies in Population After Generation 2000 and R=16 From two similar populations, R=16 in the Critical Region produces vely different populations. - 82- Co-evolution and Spatial Interaction Learning, Evolution and S4f-Organizadon Comparison of Two Initial Populations Population Ave Fitness by Generation 1.7 1.5 U U C 4, ii. 1.5 a p •1 .rrni,vwrrn 1.4 eiL1i .iai o 1.3 C 0 1.2 1.1 1 501 1001 1501 2001 Nrber of agents repiaced is 16 per pen initisi Poulation 1 Figure 26- Population Average Fitness with Initial Population 1 and R=16 2000 generations are reported. R=16 is in the Critical Region Comparison of Two 1.8 Initial Populations Population Ave Fitness by Generation 1.7 1.6 0 a C 1.5 41 1.4 0 1.3 C 0 1.2 1.1 Nunter of agents replaced is 16 per pen — Figure 27- initial Poulation 2 Population Average Fitness with Initial Population 2 and R=16 40(X) generations are reported. Simulation was run for 15,000 generations producing similar plot as presented here. - 83 - Chapter 3 Simple Spatial Evolution The previous chapter models the evolution of play for a repeated prisoner’s dilemma game where agents’ interaction is restricted to their immediate neighbours on a lOx 10 torus. In that model, simply describing the spatial environment is not sufficient to characterize the dynamic behaviour of the model. The number of agents replaced per generation determines the importance of the spacial environment. In particular, the number of agents replaced per generation impacts whether a particular agent is likely to face opponents with new strategies. When agents are very likely to play new agents (or agents with new strategies - you may interpret however you like), a “good” strategy is one that is well adapted to the entire population. In contrast, when only a few agents are replaced in a generation, then good strategies are ones that are well adapted to the particular spatial - 84 - Simple Spatial Evolution Leanung, Evolution and SelfOrganization 3 The lOx 10 model in Chapter 2 demonstrates an important characteristic 2 ’ 1 environment. of the spatial prisoner’s dilemma environment: An agent’s strategy influences the probability that his neighbours will seek new strategies next generation and thus affects the dynamic path of evolution. In other words, in spatial models an agent’s strategy not only determines how the agent plays the game but helps to determine with whom the game is played. This chapter investigates the effects of this spatially induced correlation. The complexity of the 10 x 10 model prevents algebraic analysis. This chapter focuses on two much simpler models which capture the effects of the spatially induced correlation. The chapter begins by describing the population or replicator dynamics approach to modelling evolution. In this environment the introduction of correlated play is exogenous. Since there is no natural parameterization of the correlation that captures the spatial model, replicator dynamics is not a particularly useful approach. The second example is a Markovian model with four agents arranged in two pairs playing a one-shot prisoner’s dilemma game. Besides offering insight into the correlation induced by spatial interaction, the chapter indicates the 1 2 Note that learning in the previous chapter and here is “global” in that agents select or create strategies based on the strategies and fitnesses of the entire population. For more discussion of local evolution, see Kirchamp (1994). Note Axelrod (1984) also introduces a model with local strategy evolution. Note that when a genetic algorithm is used to simulate evolution, then a “good” strategy is one that is appropriate for the likely outcome of the evolutionary mechanism (such as a genetic algorithm) given the current population. The use of the term “good’ has the potential to be misleading. A maintained assumption of evolutionary models is that strategy selection is made by myopic agents. Agents have no intertemporal objective function. Even in games which are repeated, agents are myopic in that they have no concern for what occurs after the end of the game. See Figure 2.2 in Chapter 2 for a schematic representation. - 85 - Learning, Evolution and S4tOrganization Simple Spatial Evolution complexity and difficulty of considering the lOx 10 torus environment analytically. This offers additional justification for the simulation methodology explored in Chapter 2. I - Replicator Dynamics Replicator dynamics, introduced by Taylor and Jonker (1978) are used to analyze a noiseless evolution of random pairwise interactions in large populations. There have since been many examples using this approach. 4 In particular, Nachbar (1990, 1992) uses replicator dynamics to analyze finitely repeated prisoner’s dilemmas. This section introduces the main assumptions of replicator dynamics and considers an example of a one-shot prisoner’s dilemma. Most replicator dynamics analysis assumes that agents are paired randomly. This assumption does not suit a spatial environment. Skyrms (1994) introduced non-random pairing or correlated play. Unfortunately, as is discussed below, correlation models place little restriction on behaviour and are thus not useful. The basic environment for replicator dynamics considers a sequence (indexed by t 0) of populations of agents playing a symmetric two-person game. Let A be the payoff (or fitness) matrix representing the symmetric two-person game with feasible strategies denoted as aE {a ,. 1 using 4 a . . ,aM}. The A represents the resulting utility of playing a, against an opponent and are assumed to be strictly positive. Let p(t) be an M-dimensional vector whose See van Damme (1987), Chapter 9 for an overview and additional references. - 86 - Simple Spatial Evolution Learning, Evolution and SeOrganization elements, p (t), denote the proportion of population using strategy a, at time t. Finally let 1 (t) r , a represent the probabilities that an agent playing ,t) be an M-vector whose entries, 1 1 7r(a . 1 a, at t will meet an agent playing strategy a The expected utility (or fitness) of an agent playing strategy a, is calculated as follows. ,t) 1 u(a ’ A r(a 1 e ,t) 1 = where e. is an M-vector with a one in the th 1 (3.1) position and zeros elsewhere. Note that for simplicity, mixed strategies are not considered. Since populations are considered to be large, any need to consider mixed strategies is obtained with a non-homogeneous population. The expected population average utility, U(t) of the population is the proportion weighted sum of utilities. U(t) = (t) u(a 1 p ,t) 1 (3.2) The goal of using replicator dynamics is to describe the evolution of the proportion of each strategy in the population. The main assumption of replicator dynamics is that the growth rate of a strategy in the population is proportional to that strategy’s current utility (fitness) relative to the current population average. Since the population is assumed to be large the realized average utility of the (many) agents playing a 1 is close to the expected utility (see - 87 - Simple Spatial Evolution Learning, Evolution and S4fOrganization Boylan, 1992). These assumptions combined imply that the growth or evolution of a strategy in the population occurs according to the following deterministic equation: u(a4) U(t) Note that since the growth of a strategy in the population is assumed to be proportional to the proportion currently playing that strategy, attention is restricted to initial populations that put some positive weight on all possible strategies. This implies that all p(t) will remain in the interior of the (M-1) simplex. Rearranging slightly and letting the time between successive generations become small yields the following system of differential equations. dp ( 1 t) dt ,t) U(t)) 1 (u(a — = p.Q) (3.4) U(t) Since U(t) >0 (by normalizing A), the trajectories of this equation are the same as the simpler equation (t) 1 dp = t) —U(t)) p.(t)(u(a , 1 (3.5) The properties of this equation that are of interest are the dynamic equilibria or fixed points ldt=0 for all 1 (ie. dp i) and the stability of these fixed points. Dynamic equilibria, p, are - 88 - Simple Spanal Evolulion Learning, EvoiuUon and SeOrganization (asymptotically) stable if there exists a neighbourhood about p such that any path of (3.5) that begins in the neighbourhood converges to p. 5 Note that the evolution dynamic is deterministic. Noise or mutations are less important since every possible strategy is 6 represented in the initial and subsequent populations. A key simplifying assumption in replicator dynamics (and most all evolutionary game theory) is that the pairwise interactions occur by drawing agents randomly from the population (random pairing). This implies that T(a ,t)=p(t) for all a 1 . In particular, note that the 1 probability of an agent interacting with any strategy is independent of the agent’s current strategy. The assumption simplifies the dynamic in (3.5) to a system of cubic differential equations which are (t) 1 dp = ’A p(t) 1 p.(t) (e — (3.6) p(t)’ A p(t)) To highlight the effect of the random pairing assumption, consider the one-shot prisoner’s dilemma game with a 1 = C and a 2 =D and payoff matrix as follows There are weaker definitions of stability where points which begin near p remain near to not important for the analysis in this chapter. 6 p*• However, these distinctions are This is in contrast to the genetic algorithm where typically the population is small relative to the possible number of strategies and noise (such as crossover and mutation) serve to ensure that all strategies are evenwally tried in the population. - 89 - Simple Spatial Evolution Learning, Evolution and S4f-Organization (3.7) A=CO ed with e > c> d> 0. Since there are only two possible strategies, the population can be described by the proportion of agents playing C in the population, denoted p, (time In this prisoner’s dilemma environment with random arguments are suppressed). interactions, the replicator dynamic in (3.6) is given by (1 = = PC (1 — — PC) (u(C) PC) (p (c — + u(D)) d — (3.8) e) —d) < 0 Note that the proportion of C-agents is monotonically decreasing over time (for p, < 1). Therefore, the only stable dynamic equilibrium is p=O. This is not surprising since D is a dominant strategy in the one-shot prisoner’s dilemma game. Nachbar (1992) investigates fmitely repeated prisoner’s dilemma games using the replicator dynamic approach. While he fmds that the replicator dynamic can show much more interesting behaviour (eg. non monotonic) than the simple version presented here, the only stable dynamic equilibrium is one with all-D behaviour. 7 The strategy space of the finitely repeated prisoner’s dilemma game considered by Nachbar (1992) differs from the one used in the lOx 10 simulation of the previous chapter. In his environment, only one of the feasible strategies will yield all-D behaviour. In contrast, in the twice-repeated game in the lOx 10 simulation there are several strategies that produce all-D - 90- Learning, Evolution and S4/Organization Simple Spatial Evolution In contrast to random pairing, Skyrms (1994) considers correlated evolution where there is some exogenously specified correlation between the play of strategies. ir(C,t) In other words, i-(D,t). It is easy to see that the monotonic decrease in the proportion of C-agents in (3.8) need no longer be the case if r(C,t) is sufficiently larger than r(D,t). If these (exogenous) probabilities are held constant over time, the evolution can move monotontically towards the all-C population. 8 This occurs despite the dominance of the D strategy. Here the choice of strategy influences the interaction and not just the outcome from the 9 interaction. In a more sophisticated model the T(a ,t) may also evolve as some functions of the p(t). 1 Such functions for the probability of pairing with an agent playing C can be written as ) (suppressing the time arguments). In this case, much more complex behaviour can 1 r(p;a ° More importantly with the “right” r’ s any p can 1 be produced by the replicator dynamic. be a stable dynamic equilibrium even if the ir’s are restricted to be non-trivial functions of PC(t). This is stated in the following proposition. behaviour. This is not an important distinction, however. 8 Forp,= 1 to be the stable dynamic equilibrium it is sufficient that it,(C,t) > a,(D,t) + for all t. Skyrms (1994) relates the difference between random pairing where strategy choice does not affect opponent and correlated pairing to the difference between Savage (1956) decision theory and that proposed by Jeffrey (1965). In Jeffrey decision theory, in stark contrast to Savage, the actions of the decision maker are assumed to affect the probability measure over states and not just the utility of the state-contingent outcomes. 10 For example, each agent might receive a partner from the following procedure: Select n agents from the population at random and then from this list choose an agent to play according to some strategy dependent probabilities. Such a system has a stable dynamic equilibrium at p,=O and never at p=l. However, with the right parameters other (strictly mixed) stable dynamic equilibria exist. -91- Learning, Evolution and S4fOrganization Proposition 1 Simple Spatial Evolution Given any p E (0,1), there exist smooth non-decreasing functions T(p;a) ofpjt) with the property that 1 ) =1, for a. E (C,D), 1 r(0,a =0 and ‘,rjl;a ) such that p is a stable dynamic equilibriwn. Proof See Appendix A. In order to make a model using replicator dynamics meaningful in an environment with correlated play, some additional structure must be placed on the probabilities of interaction (the lr’s). Unfortunately there is no obvious connection between the correlation that can arise from spatial interaction and the exogenous ir’s. The next section, therefore, addresses the induced correlation directly by treating spatial evolution in a Markov setting. II - Spatial Markov Model As was noted above, the correlated play that is induced in a spatial model is not easily captured by replicator dynamics. In this section, a Markovian spatial model is developed that tracks the evolution of the strategy played at each location. This allows direct consideration of the (endogenously developed) correlation in play. Unfortunately, the state space in these models can become very large. In the lOx 10 torus considered previously, - 92- Simple Spatial Evolution Learning, Evolution and Set/Organization ’ of 1 the state space for the one-shot game is on the order 2100 (about 1O in the twice repeated game the state space is of the order 2°° (about 1O’°) and in the infinitely repeated game (represented by strategies with a memory restricted to three moves) the state space is 28500 (about 102558). In order to understand what is occurring in these more complex models, a simple four agent system is considered. The model is tractable (only six states) yet can demonstrate much of the behaviour of the more complex system. The model is described by two elements; agent interaction and strategy evolution. Figure 3.1 describes the environment for agent interaction where four agents, arranged in two pairs, play one-shot prisoner’s dilemma games. CD 1 2 3 4 w {C,D} Figure 3.1 {C,D} - {C,D} {C,D} Cc 0 De d e>c>d>0 Agent Interaction Four agents, arranged in two groups, interact playing the prisoner dilemma. The symmetric payoffc for the row player are shown at the nght. This overstates the size of the state space since there are some states which are equivalent. For example, the state where aU agents play D except for the agent in the first location is equivalent to one where the sole C-agent is in position 57. Even after considering such reductions, the state space remains large. Note that the size of the neighbourhood for play (nearest 4 agents) influences the state space simplifications which can be made. - 93 - Simple Spatial Evolution Learning, Evolution and SeOrganization Evolution is modelled in two stages. The first stage determines which agent searches for a new strategy. Recall in the 10 x 10 model, this was determined as the bottom (lowest fitness) R agents. In this model only one of the four agents will be replaced in a generation. In particular, evolution will select the agent with the lowest fitness with probability (l-). With probability i, evolution selects one of the four agents at random (ie. with uniform probability of ‘A). By varying the parameter , the nature of the spatial landscape is altered (analogous to varying the number of replaced agents). Note that when , is very small, the chance that an agent’s opponent will seek a new strategy is small as long as the opponent’s fitness is not low. In contrast, if is high, the probability that an agent’s opponent seeks a new strategy is (almost) independent of either agents’ current strategy. The second stage in the evolution determines which strategy an agent chooses, given she is looking for a new strategy (ie. selected for replacement). With probability 1-u, the agent selects a new strategy from the strategies existing in the population with probabilities according to each strategy’s proportional fitness. With probability the agent experiments and selects one of the possible strategies randomly (ie. without reference to the current population). In this environment, with probability ½ the agent selects C. proportional selection is similar to the replicator dynamics. environment The Fitness However, in this noisy serves to introduce strategies into the population that are currently not used. is similar to the noise in genetic algorithm evolution from both crossover and -94- Simple Spatial Evolution Learning, Evolution and SeOrganization mutation. The noise is needed in this model (and in the genetic algorithm) since, unlike in replicator dynamics, the population size is small relative to the feasible strategies. Given this model of evolution, the system is Markovian. There are 16 (=2) possible states of this evolutionary system. However, since some of the permutations are indistinguishable (ie. the location of the agent playing D when others are playing C) the relevant state space considered here has only six states. The states indicate the number of C’s being played in each of the two agent-pairs without regard to their location in the pair or identity of the pair. The state space considered is { 1 =2C2C, 2 =2C 1C, 3 =2COC, 4=11 C, 5=1 COC, 6 =OCOC}. The Markov transition matrix is denoted M with entries M as the probability of transiting to state j given the current state is state i. (The thirty-six M entries are included in Appendix A as they may be a bit difficult to read here). - 95 - Learning, Evolution and SeFOrganization Simple Spatial Evolution i-Vip Vip 0 0 0 0 ‘/411((1*X...)+14IL) ([_1/4flX(1_(...)+%P)+ (1_%nX(i_().½) %((l_)(...L..).%) 0 0 o (i_½uiX(i_)()+½) ½ui((t_)+½)÷ 0 0 (i_½ui)((I_p)(_L)÷%p) c+d N— 0 Viq(Vip) 0 Viq(1—½p)÷ (l-Viq)Ø4p) (i-ViulXi—½p) 0 0 0 /4qØ4p) 1 ½i(’4) (i-%iiXVip)+ hji) 1 (i-%qXi- 0 0 0 0 i-Vip (3.9) - 96 - Simple Spatial Evolution Learning, Evolution and S4tOrganizanon Let p(t) be the 6-vector whose elements p,(t) give the probability that the population is in state w at time t. The Markov sequence {p(t)} is defined as follows (where p(O) is a specified initial condition) p(t+1)’ = (3.10) p(t)’ M In order to understand the behaviour of the evolutionary system, the limiting distribution is calculated. Note that so long as and are both strictly positive all states communicate. Therefore M is ergodic (or all six states form an ergodic set) and there is a unique invariant distribution p* such that p’ ‘M with positive weight on each state (ie. p>0 for all o). Furthermore, the system converges to this invariant distribution from any starting point (ie. 2 for all p(O)).’ The unique invariant distribution is a function of the levels of the noise, i and JL. Since the interpretation of these parameters implies that they are small, the unique invariant distribution will be considered as q and approach zero. This follows the methodology introduced by Kandori, Mailath and Rob (1993) and Young (1993) (KMRY) in their papers using evolution for equilibrium selection. This section also relies on the Bergin and Lipman (1994) analysis that demonstrates that the KMRY approaches are sensitive to how the noise is specified. 12 These are well known properties of Markov systems. For more details, see Stokey and Lucas (1989 Chapter 11). - 97 - Learning, Evolution and SeOrganizadon Simple Spatial Evolution From inspection of (3.9), note that if j ==O there are three ergodic sets which are absorbing. They are {1}, the all-C state, {6}, the all-D state or {2,3} which varies between 2C1C and 2COC. Note that states {4,5} are transient. By considering various alternative specifications for the noise (j and ‘q non-zero but vanishingly small), each of these sets can be selected as the limiting unique invariant distribution. This is an application of Bergin and Lipman (1994, Theorem 1). The discussion which follows will characterize the noise properties which produce the various limiting behaviours. The key requirement of the Bergin-Lipman theorem is that noise be allowed to be state dependent (eg. Note that the Markov matrix in (3.9), for convenience, presents the case where noise levels are not state dependent. The limiting distribution when all noise is the same is a special case of the analysis which follows. It turns out that a convenient partition of the state space is to divide the states into those that have at least one co-operative pair, ë={1,2,3} and those that do not, 15={4,5,6}. denoted m, D’ and I-D• The noise in these states will be Since we wish to consider the case where the noise levels become vanishingly small (albeit at different rates), it is convenient to write the noise levels as follows (where and constants Kq and K are strictly positive). D=exp() (3.11) 12D = exp(-) = - 98 - exp(—K fl Learning, Evolution and SefOrgaaizo4on Simple Spatial Evolution Using this we can state the main proposition of the analysis. Proposition 2 Given the Markov matrix with non-zero-vanishingly-small noise, the unique invariant distribution (o, urn p*(;KK)/ = p* 0, 0, 0, 0, 0, 1 ) K<3 and K,<3 0, 0, 0, o) (3.12) K>3andK>K x, 1-x, 0, 0, 0, o) K>3 and K,>K (i, 0, (o, is a function of iç and K as follows: 0, where = c(2c+e) c(2c+e) Proof + (3.13) e(c+d) The details of the proof and the boundary cases (ie. K and/or K is exactly 3) are left to an appendix. The strategy for the proof is as follows. First the invariant distribution of M in (3.9), with the noise levels as specified in (3.11), is calculated. This -99- Learning, Evolution and SeOrganizo4on Simple Spo4oJ Evolution is done by solving p’ =p’M for p(K,K). Taking the limit as gets large (which makes the noise small) for the various cases yields the result. See Appendix A. Although the model is relatively simple, it does offer some insight into evolutionary systems like the one considered in the previous chapter. The results indicate that the relative magnitudes of the noise matter. When the noise is the same across regions, the system evolves to the state where primarily all-D behaviour is observed. However, if noise in the O-states is lower (by an exponential factor), then the system can evolve to the i-states. In this case two types of behaviour are possible. The system can settle on the all-C state since mutations to push the system away from 2C2C are vanishing most quickly. The second possibility is reminiscent of the behaviour which typified the critical region in the lOx 10 simulation model. When is the noise parameter which is approaching zero most quickly (and enough faster than ND), the system evolves to oscillating between states 2COC and 2C1C. Note that in state 2COC, the two agents playing D do strictly worse than the two agents playing C. Provided is small (more importantly vanishing much more quickly than ED), then one of the D agents is typically the agent hunting for a new strategy. In the 2C1C, the agent playing C in the community with the D-agent is the agent most likely replaced (and most likely with -100- Learning, Evolution and SeOrganization Simple Spatial Evolution 13 It is this sort of behaviour that was typical in the lOx 10 simulation when the a D). number of agents replaced per generation was moderate. One interpretation of the four-agent model is that it is an idealized form of the many agent pairs which exist on the lox 10 torus. In this case the “noise” comes from the fact that in the larger system agents are interacting with more agents and fitness calculated from one interaction is only an approximation of fitness in the larger (four interactions) system. Secondly, the search for strategies is not limited to four agents but to the entire grid of 100 agents (plus crossover and mutation). Thus one could think of i capturing the first of these imperfections and capturing the second. Recall that the twice repeated game did not necessarily evolve to the Nash equilibrium and the behaviour was classified as Frozen, Critical, Stationary and Converged according to the number of agents replaced per generation. If the model here does indeed capture the essence of the more complex simulation, then there is a relation between the number of agents replaced per generation and the noise parameters of this simple model. For example, the large number of replacements in the Converged region is akin to a large value for both i and D (more precisely, similar values for the parameters) where all-D results. Both a high level of replacements and a high ,, (particularly i) reduce the level of correlation induced in play by decreasing the importance of fitness (and indirectly strategy choice) in determining which agents seek new 13 Note the relative probabilities of the two states in this case depend on the actual payoffs in the prisoner’s dilemma. They are given byx in equation (3.13). Note one can also calculate the mean first passage times, TV, in this situation from state 2 (2C1C) to 3 (2COC) and vice versa. They are r(2e+e)Ie and r,(c+d)Ic. - 101 - Simple Spatial Evolution Learning, Evolution and S4fOrganization strategies. In both cases strategies are matched more randomly and evolution leads to the all-D Nash equilibrium. Similarly, the Critical Region is akin to where < In terms of the larger lOx 10 system, this translates into a higher likelihood that regions with agents playing D have several agents replaced with C strategies (ie. several realizations of nonfitness based replacement occurring at the iD probability) rather than vice-versa. Indeed, in the 10 X 10 model, this is the case since agents in regions of C agents have higher fitness than those in the D regions. 14 Thus the simple four-agent Markovian model presented here does capture the behaviour in the much more complicated 10 X 10 spatial model of Chapter 2. ifi - Conclusions The purpose of this chapter has been to offer some analytical approaches to the spatial models of evolution. In particular, I have surveyed the possible approach using replicator dynamics. The difficulty with this approach is that the correlations between agents’ strategies (possible) induced by a spatial model is not easily captured. The second approach is to model spatial evolution as a Markov chain. This analysis seems to capture the essence of the complex simulation. This is encouraging since the huge state space makes a completely analytical examination of the 10 x 10 spatial evolution model practically impossible. 14 . . . . For simplicity, I have called the regions C and D in the lOx 10 system. In fact, since this is a twice repeated game which is being described, the C region is best described by agents using a C-then-D strategy. . . . -102- . . Learning, Evolution and Sey’Organization Simple Spatial Evolution The next three chapters move away from strategic situations to consider an environment where co-adaptation occurs through prices in a large financial market. -103- Chapter 4 Stochastic Adaptive Learning in a Rational Expectations Model The informational efficiency of financial markets has been extensively studied and often serves as an axiom for other research such as event studies.’ Despite this large volume of research, the procedure by which large financial markets process information is not well understood. Without such an understanding, it is difficult to address questions about security market behaviour, regulation or design. Rational expectations models partially address the information processing mechanism. Models, such as Grossman (1976), describe how agents can infer information from equilibrium prices. However, these models assume 1 See Fama (1991) for a survey of market efficiency. Event study methodology is discussed in Fama, Fisher, Jensen and Roll (1969) and Thompson (1985), for example. -104- Stochastic Adaptive Learning in a Rational Erpectations Model Learning, Evolution and SeOrganization agents are unboundedly rational and possess not only a complete knowledge of the economy’s structure but also a detailed knowledge of other agents’ beliefs and objectives. Little explanation is offered for how agents acquire sufficient information to make appropriate inferences from endogenous parameters. In complex financial markets, like the NYSE, these assumptions seem far from realistic. Strong rationality and informational assumptions are often justified by appeals to repeated play and adaptation. Agents learning via trial and error experimentation and imitation do not require sthngent assumptions about agent rationality or information. Agents are described by rules-of-thumb which are adopted, modified and abandoned based on past success. One might expect that in large financial markets, with a large number of well motivated traders, deviations from rationality should be small, non-systematic and transitory. This chapter and the two which follow investigate the reasonableness of this argument by considering the conditions for when the outcome of adaptation is likely to coincide with a rational expectations equilibrium. This helps to determine when rational expectations are likely to be good approximations to the behaviour of financial markets like the NYSE. In addition, a better understanding of trial and error learning processes offers insight into how financial markets reach equilibrium and process information. Specifically, the one-period risky asset model of Grossman and Stiglitz (1980) is augmented to facilitate modelling learning. The model developed in this chapter is used in the two - 105 - Learning, Evolution and S4/Organization Stochastic Adaptive Learning in a Rational Expectations Model subsequent chapters. To explore the implications of adaptive learning, the single period model is repeated. In the model, agents must learn the relevant exogenous relationship between the model’s parameters as well as the parameters of the endogenous relationship which determines equilibrium price. The model underscores the difference between learning an exogenous parametric relationship and learning an endogenous (equilibrium) relationship. Parametric relationships are unaffected by other agents. Equilibrium relationships depend on the behaviour of other agents, and therefore depend on the learning of the other agents. The chapter characterizes a stochastic adaptive learning process that captures the intuition of experimentation and imitation. It is shown that a stochastic adaptive learning process which converges to the Grossman-Stiglitz (GS) rational expectations equilibrium always exists. In addition, the stability of the GS equilibrium for a fixed stochastic learning process is considered. It is shown that the key to stability is the size of agents’ experiments relative to the sensitivity of the informativeness of the equilibrium price to changes in the proportion of the population which is informed. The remainder of the chapter is organized as follows. The next section discusses the background research upon which this chapter draws. Section 4.11 introduces the repeated one-period asset pricing model. Section 4.ffl discusses the effect of stochastic adaptive learning on the stability of rational expectations equilibria. The intuition developed in this section is developed further in Chapter 5 which models learning using deterministic dynamics -106- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and S4fOrgarnzation and Chapter 6 which uses a genetic algorithm as an example of a stochastic adaptive learning process. I - Background Research on learning in rational expectations models is often classified into two categories. The first looks at learning schemes which are the natural extension of “rationality”. Each agent specifies a likelihood function over the unknown parameters and updates her beliefs according to Bayes’ Rule. The likelihood functions form a rational expectations equilibrium in that each agent’s likelihood is correct given the likelihood functions of the others. While other forms of learning might appear ad hoc, this specification is at least as informationally and cognitively demanding as traditional rational expectations models. The second category of models focus on least squares learning as a model of a boundedly rational agent. In these 2 models, agents treat endogenous relationships as exogenous and minimize a forecast error. Many of the models of rational or least squares learning treat learning as a deterministic algorithm. Given the same observations (history), two agents will have identical beliefs and 2 For example, Marcet and Sargent (1988, 1989) have introduced models where least squares learning does converge to the rational expectations solution. More generally, agents follow a myopic strategy which tries to estimate endogenous relationship as if they were exogenous. Evans and Honkapohja (1994) present general convergence results for this type of model. For background material on rational learning models see Blume, Bray and Easley (1982) or Blume and Easley (1993). For learning in a model similar to the GS model used in this paper, see Bray and ICreps (1987) investigate Bayesian learning and Bray (1982) considers least squares learning. -107- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeifOrganization actions. In this thesis, learning is modelled as a stochastic adaptive process. Unlike models of learning where agents continually and unboundedly use Bayesian updating or minimize forecast error, this thesis treats learning as a process of trial and error. Good strategies are imitated. Bad strategies are abandoned. Periodically agents experiment with a novel strategy. An efficient stochastic learning algorithm is one which effectively trades off the 3 exploitation of current knowledge against experimentation to acquire new knowledge. There is a natural relationship between stochastic adaptive learning and evolution. The economic notion that agents making systematic mistakes have an incentive to modify their behaviour parallels the natural selection of good genes from a diverse gene pool. Stochastic models of learning are closely related to recent models in evolutionary game theory which incorporate random perturbations or mutations (Kandori, Mailath and Rob (1993) and Young (1993)). In the rational expectations model introduced in this chapter, an important implication of stochastic learning is that agents with common observations need not behave identically. Thus, in addition to learning about the environment, agents must learn about other agents. The complexities introduced from learning about learning are related to co evolution in biology. The preceding two chapters both focus on situations where an adaptive change by one agent alters the environment of the other agents. In a rational expectations For example a monopolist facing an unknown demand curve must both experiment with different prices to “plot” the demand curve and choose a price that maximises profits. Easley and Kiefer (1988) consider optimal (dynamic programming) models of this trade-off while Arthur (1993) focuses on models which can be fitted to human (perhaps sub-optimal) learning processes. - 108 - Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeOrganiza1ion model, an agent’s optimal response to an endogenous parameter (asset price, for example) changes as other agents alter their behaviour. Finally, this chapter is related to several papers that consider evolution and asset prices. Blume and Easley (1992) model the evolution of portfolio strategies. Strategies which are associated with higher wealth are selected more frequently. In this model rational (utility maximising) strategies need not survive. This result is similar to that of De Long et.al. (1990) where “irrational” noise traders actually maximise their evolutionary survivability by generating high expected returns. The evolution or imitation mechanisms (by assumption) select high wealth strategies and not necessarily those that yield high utility. Cabrales and Hoshi (1992) study the evolutionary selection of differing types of “irrational” trading strategies (chartists, for example). In contrast to Blume and Easley or De Long et.al., the evolutionary choice mechanism in Cabrales and Hoshi favours strategies that give higher expected utility. These examples highlight the fact that optimal strategies depend on what others are doing. However, unlike these articles, this chapter will focus on a single period asset pricing model where dependence on other agents occurs because agents make inferences using endogenous relationships. - 109- Stochastic Adaptive Learning in a Rational Ezpecto,tions Model Learnrng, Evolution and Sal-Organization II - Model In this section, the asset pricing model for the economy is introduced. Learning and its effect on asset demands are also discussed. This model is also used to consider learning as a deterministic dynamic in Chapter 5 and to simulate learning in a genetic algorithm in Chapter 6. (A) Structure The model is developed from Grossman and Stiglitz (1980). The GS model is a one-period endowment economy where agents can purchase a costly common signal about the period-end dividend of a risky asset. Agents choosing not to buy the signal make inferences about the signal from the competitive risky asset equilibrium price. Since all informed agents receive the same signal, the equilibrium price offers no additional information to the informed agents about the dividend. Since this chapter studies adaptive learning, the one-period model of GS will be repeated by successive generations of agents. While agents observe and can learn from previous generations, the economy they face is essentially a single-period one. 4 In this stationary environment, we will investigate if stochastic adaptive learning (from previous generations) An alternative interpretation of the model with N infinitely lived agents is presented in Bray and Kreps (1987). To preserve the one-period nature of the model, agents trade at t and dividends are realized and completely consumed at t+ ½. Since wealth is non-storable across periods, agents are faced with a series of single-period problems. -110- Learning, Evolution and SeifOrganization Stochastic Adaptive Learning in a Rational Expectations Model is likely to lead to GS rational expectations equilibrium. A discussion of the complexity of considering learning in a dynamic model is deferred until the model and learning have been introduced. The following assumptions characterize the model. Assumptions 1. At each date t=O, 1,2,..., a generation of N one-period lived agents receives an endowment and trades. Agents have identical constant absolute risk aversion (CARA) preferences, with a risk aversion parameter of a, over end-of-period wealth. No wealth is transferred between generations. 2. Two securities, a risk-free and a risky asset, trade in a perfectly competitive and frictionless market. 5 3. The payoff on the risky security for generation t is denoted d, and is given by (4.1) =I3 1 d where the /3 and i3 are constants and )‘ and e are normally distributed as described below. The € is not observable at t. 4. Agents can choose to observe the signal, y , at cost, c. 1 These agents are said to be informed. All informed agents receive the same signal. Each period, the set of N agents If N is small, this assumption is unrealistic. However, the model can be interpreted as having N finite classes of agents where each class has an infinite number of atomistic agents. Jackson (1991) considers a model similar to the Grossman-Stiglitz model except where agents do not act atomistically. Strategic trading can eliminate the need for noise traders in establishing equilibrium existence. The perfectly competitive case is determined as a limit of the his model. — 111 — Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeOrganization can be partitioned into I, the informed agents and U, the uninformed agents. Define X to be the portion of the population that is informed. 5. The per capita supply of the risky asset at t is e shares which is normally distributed as described below and is unobserved by the agents at t. 6 6. The random variables y , 1 and e are independently distributed across periods as follows. 0 Yt — Normal e 0 e 00 , 0 cj 0 (4.2) O0cy 7. The risk-free asset is in zero net supply. Both the payoff and the price of the risk-free 7 The equilibrium price of the risky asset for generation t, asset are normalized to one. is denoted P and is stated relative to the risk-free asset. 8. The model and parameters are common knowledge except for i3o and i3. The agents’ knowledge of f3 and i3 are discussed below. In particular, the agents may not know the value of these parameters. 6 The assumption that the supply of the risky asset is stochastic is necessary for the existence of a rational expectations equilibrium with a costly signal. A noisy asset supply (often referred to as noise traders) prevents the rational expectations equilibrium price from being fully revealing. Recall that each generation consumes only once and no wealth is transferred between generations. Therefore, as in the standard one-period model, this assumption is without loss of generality. -112- Stochastic Adaptive Learning in a Rational Ezpectations Model Learning, Evolution and S4fOrganization Since an agent lives for only one period, the remainder of this section suppresses time 8 and W 1 is the end-ofsubscripts. W ’ is the endowed wealth at the beginning of the period 0 period wealth. Agent n’s utility is a function of risky asset demand (shares of the risky asset), x, as follows V(x) = E[_exp( —a W ) I fv] 5 1 1 = W-) O’c+x”(d—P) incorporates the budget constraint. where W — (4.3) is the information (both private and public) available to agent n. O’ is 1 if the agent is informed (ie. yE O) and 0 if the agent is uninformed (ie. y O’). The next section considers the formation of asset demands, conditional on information choice. Section 4.ffl discusses the information choice itself. Before introducing learning in the model it is helpful to consider the features of the model that make it an interesting environment for considering learning. The first is that the rational expectations solution can be calculated in closed form. The Grossman-Stiglitz (GS) rational expectation equilibrium serves as a bench-mark for considering the effect of learning. 8 Note the CARA preferences produce demands which are unaffected by initial wealth. Assumptions, therefore, about the initial endowments are for the most part immaterial. However if the initial endowment is in shares of the risky asset, it is necessary that the endowment not provide infonnation about the total endowment. If this is not assumed then endowments would affect demands through the information they provide about d. In addition, initial wealth affects adaptation in the genetic algorithm implementation in Chapter 6 by changing relative utilities. This is discussed further in that chapter. - 113 - Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeifOrganization 9 Second, because the model is not an intertemporal one, it does not have multiple equilibria. In contrast, Bullard and Duffy (1994) and Arifovic (1994b) use (genetic algorithm) learning as an equilibrium selection in overlapping generations models. Third, the learning environment is richer than the limited “rule-of-thumb” asset demands found in De Long et al. (1990) or Cabrales and Hoshi (1992). Here, agents will formulate reasonable beliefs and optimal portfolios based on those beliefs. Fourth, the model can incorporate alternative specifications of learning such as stochastic adaptive learning (the genetic algorithm in Chapter 6 for example) or deterministic learning like least squares learning and the learning process studied in Chapter 5. Finally, the model can be used to highlight the difference between the informed agents’ simple learning of exogenous relationships and the uninformed agents’ complex learning of endogenous relationships. B Learning In a rational expectations model, such as that of Grossman-Stiglitz, agents know all the relevant parameters with certainty. Furthermore, such information is common knowledge. Under this assumption, the optimal risky asset demands of the agents are linear. This ° In the Grossman-Stiglitz 1 follows from the properties of CARA utility and normality. model demands are As presented, the rational expectations equilibrium price relationship and GS equilibrium proportion of informed agents (X’) are unique. However, since the GS equilibrium does not speci1’ which particular agents are informed, strictly speaking there are multiple equilibria. 10 See Grossman (1976) for details. -114- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and S4fOrganization = E[dIfP] P a V[dItr] (44) — where conditional expectations are linear in the signal, y (if informed) or risky asset price, P (if uninformed) and conditional variance is constant. In this model, it is assumed that the 0 and the j3 are not common knowledge. It is these parameters and the endogenous fl relationship between signal and price (discussed below) that traders need to learn. Alternatively, traders need to learn how to choose their asset demands based on their information. There are several alternatives for modelling the learning of the asset demands. Grossman and Stiglitz assume rational expectations. Their analysis starts after learning is 11 One approach to learning is to assume that traders have prior beliefs about “complete.” parameters (exogenous and/or endogenous) and learn by updating their beliefs according to Bayes Theorem. However, since in general posterior beliefs are non-normal, demands will 2 Alternatively, one can specify not be linear and the model is very difficult to analyze.’ a functional form for asset demands and consider how agents learn parameters of the imposed structure. This is the approach used in least squares learning and the approach taken in this chapter (and the two which follow). Grossman and Stiglitz (1980) state (page 396), If over time uninformed traders observe many realizations of [y,P], then they learn the joint distribution After all learning of the joint distribution ceases, ... 12 ...“ See Bray and Kreps (1987) for an example of a Bayesian learning model. - 115 - Stochastic Adaptive Learning in a Rational Expectations Model Learning. Evolution and S4/Organization Assumption 9 Asset Demands are of the form in (4.4) where conditional expectation and variance, depending on information choice, are as follows’ 3 E’[dly] V’[dly] E’[dIP] = (4.6) + p + = 9V”[dIP] (4.5) + = + ..yU = (4.7) (4.8) The learning state of trader n is defmed as the set V = {O, £z}. The O indicates whether or not the agent is informed. The £z is the inference portion of the learning state and determines how the agents use information to calculate demands. That is = .y, ((3), , ,u) (4.9) These parameters determine how the agent calculates asset demands in the situations where she is informed and where she is uninformed. Note that the linear demands of the informed and uninformed are characterized by three parameters. This over specification of the linear 13 denotes an informed individual and u denotes uninfonned individual. Parameters which refer to aggregate behaviour of informed or uninformed will be denoted with superscripts land U respectively. - 116- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and S4fOrganization functions is innocuous and serves only to simplify discussion. Note that the learning state is agent specific. Since learning in this thesis is a stochastic adaptive process, the parameters used for calculating asset demands will vary across agents. The information sets, on the other hand, are not agent specific. All informed agents observe the same information. The relevant information sets are (F = {y,P} and O = {P} 14 Assumption 9 is important since it implies that the equilibrium price will be linear in the signal y (shown below). This maintains the tractability of the standard exponential-normal environment. The assumption can be interpreted literally as successive generations of agents acting as if the parameters in their learning state were the true parameters. Clearly this is a bound on agents’ rationality. Note that asset demands in (4.4), with the correct parameters, will maximize expected realized fitness. Since traders are assumed to adapt based on the past success, it is realized utility (or fitness), rather than subjected expected utility which drives adaptation. This point is discussed in Blume and Easley (1992). Finally note that the structure of the asset demands are consistent with the Grossman-Stiglitz rational expectations equilibrium. 15 14 15 Note that similar to models where agents have differing prior beliefs, agents get no contemporaneous information from other agents’ learning states (which may be partially inferred from the equilibrium price). The alternative to making Assumption 9 is to abandon the simplicity which comes with a linear pricing relationship. It appears from numerical calculations that such non-linear equilibria exist (see Bernardo and Judd (1995), for one such example). In addition, the non-linear equilibria have the necessary feature that price is monotonically (and smoothly) increasing in signal. The difficulty in applying such a model in a learning environment is how to describe an individual trader. In the linear model presented in this chapter (and those which follow), an agent can be described by a small vector of parameters. In non-linear models this is no longer the case. Ongoing research is investigating whether agents could be simply described by artificial neural networks (ANN). This has potential since ANN’s can approximate non-linear functions with a small number of parameters. In general, rational expectations models assume that agents know both the functional form and the appropriate parameterization. In the model used in this paper, agents know the functional form but must learn the parameters of the endogenous relationship. A more general model using ANN’s would have the agents learning the functional form as well. -117- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and S4fOrganization The learning state for agent ii, = {O”, fn}, determines the information choice of agent n and , is 1 her parameters for calculating asset demands. The population learning state, L= {V}.. the collection of learning states of the agents. The population learning state determines the equilibrium asset price relationship as stated in the following lemma. Besides describing how agents make inferences, L determines the sets of informed, I, and uninformed, U, agents. Given L, the equilibrium price for the risky asset can be described. , the equilibrium price is give by the following linear equation 1 Lemma 1 Given L= {L) a +o! e—e) 2 + 0 Pa y 1 ( Proof. (4.10) See Appendix B. (C) “Rational” or Complete-Knowledge Learning States In the following sections, an adaptive process for the learning states is discussed. In order to determine where adaptation should lead, this section identifies the rational or completeknowledge learning states. The optimal learning state would be the choice of an agent who, unlike the agents considered in the model, had complete knowledge of the model parameters (j3 and i3) and the behaviour of other agents’ learning states (contained in L). Lemma 2 Given L, the rational inference portion of the learning state can be described by lu*.),*) 13 £*=(Oi*f3li*/*flOu* where -118- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeOrganizadon 13* 13* = Proof. 13; 1314 a 13 13* = 13 (4.11) ,,j* = = = = 13*2 2 22 2 aloy+a2oe 22 2 2 CEjO•y + a2oe See Appendix B. The rational inference parameters of the agent if informed follows from the exogenous dividend-signal relationship. The rational inference parameters of the agent if uninformed reflects the more difficult task. Not only must a rational learning state reflect the true exogenous relationship between dividend and signal (equation (4.1)) it must also correctly untangle the endogenous relationship between equilibrium price and signal (equation (4.10)). It is for this reason that the rational learning state depends on the learning states of the other agents. Note that the a’s from (4.10) depend on the population learning state. Given this interdependence, the rational learning state can be thought of as containing best reply functions. An immediate product of Lemma 2 is the risky asset price in the rational expectations equilibrium. The rational expectations equilibrium results when all agents are characterized by a rational learning state (ie. VI=f* for all ii). -119- Since all agents are identical, except for Stochastic Adaptive Learning in a Rational Ezpectations Model Learning, Evolution and Seif Organization their information choice O, it does not matter which specific agents are informed. Thus, for a given proportion of informed agents, X, the rational expectations equilibrium price relationship is given by the following lemma. Lemma 3 The unique rational expectations equilibrium price for a given X is determined by P Proof ifi - = a + a y + a (e—) (4.13) See Appendix B. Learning, Equffibrium and Stabifity Thus far the choice of whether or not to become informed has not been discussed. This section considers this choice by comparing the expected utilities of the informed and uninformed agents. This comparison is used to consider the stability of the GS equilibrium when agent learning is stochastic and adaptive. (A) Comparison of Informed and Uninformed Utility A rational choice of whether or not to become informed (choice of 6) requires a comparison of the cost of being informed, c, with the increased uncertainty one faces when uninformed. -120- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolunon and SetOrganization We formalize this comparison after discussing the sources of uncertainty faced by the uninformed agent. Uninformed agents face four potential sources of uncertainty. i The €-noise in the dividend. ii Noise introduced in the inference of the signal by the aggregate supply noise. iii Uncertainty about the relationship between the signal and the dividend; equation (4.1). iv Uncertainty about the relationship between the price and the signal; equation (4.10). The first two sources of uncertainty are identical to those of the GS model and are present in the rational expectations solution. The nature of the third source of uncertainty is identical to the uncertainty faced by the informed agents. The final source of uncertainty is what makes the task of the uninformed difficult. The a parameters in (4.10) are functions of the informed and uninformed agents’ learning states. Every time any agent changes the inference portion of her learning state or the proportion of informed agents changes, all the uninformed agents must learn a new pricing relationship and adjust their learning states accordingly. The adaptive learning of an uninformed agent must “co-adapt” (or co-evolve) with the learning states of both informed agents and other uninformed agents. Recall that all informed agents receive the same signal and therefore do not rely on the equilibrium price for information. The informed agents, therefore, do not face this problem. - 121 - Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SerfOrganization This discussion is formalized in the following calculation of the relative utility levels of the agents conditional on being informed or uninformed. Since the learning process is adaptive, the relevant comparison is between expected realized utilities of the informed and uninformed agents. Thus the expectations are not taken with respect to individual agents’ beliefs (incorporated in their learning state), but under the true dividend process described by (4.1) and (4.2). After introducing a few simplifying assumptions, the next two propositions calculate the expected realized utilities. Letf(r,LIP)=E[v(W(x’))IP] and f(V,L)=E[V(W(f))]. These are the expected utility of the informed agent conditional on P and unconditional, respectively, where W’(x) is the end- of-period wealth as defined in (3), given optimal asset demands, f, from (4.4) and the parameters contained in V, when the population learning state is L. Similarly, for the uninformed agents, let fu(,L I F) =E[V(W(x’)) I F] and fu(r,L) =E[V(W())]. In order to simplify the algebra let 130* and 131* in (4.1) be known by the agents and for concreteness let 13 =1. This focuses the discussion on the more interesting learning of the endogenous price-signal relationship rather than the straightforward learning of the exogenous signal-dividend relationship. Since the I3*s are not assumed to be common knowledge, the learning of the uninformed remains non-trivial. In the genetic algorithm simulations of this model in Chapter 6, this assumption is relaxed. Recall that the linear asset demands of the agents (linear in y for the informed and in P for the uninformed) are -122- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeOrganization characterized by three parameters. Since a linear function has only two free parameters, for both the informed and uninformed agents, the conditional variances parameters will be held at their rational levels Proposition 1 (-( and ‘y”). Given [3f=(3 and (3/=(3=1 and given a population learning state L, the expected realized utilities, conditional on F, are as follows: 2 f(V,LI F) = exp(a c) fU(C * + fu(tn,LIp) In( = ½ 02(L) ,L IP) 2 fU(f*,LIP) exp 2( + (4.14) (4.15) 02(L)) ”)F. 1 where o (L) = V[y I P]=- and (F) = (U fl 2 ) + (fl —j3 0 Proof. See Appendix B. Note that both utilities are stated relative to the utility of an uninformed agent using the rational learning state parameters, £ *(L) The three elements which contribute to the relative performance of the informed and uninformed agents are the cost of information, c, the (L), and the inference errors made by the 2 informativeness of the price about the signal, o -123- __________ Stochastic Adaptive Learning in a Rational &pectations Modal Learning, Evolution and S4f-Organizazion uninformed, r(P). Recall that if r=f(L) then no inference errors are made and “(P)=O. Note that an uninformed agent’s expected utility (conditional on price) decreases as inference 16 errors become larger. In order to compare the desirability of being informed or uninformed, the unconditional expected realized utility must be calculated. In particular, the relative values of the expected realized utilities are of interest. In the following proposition the utilities are normalized so thatf’(r,L)=l andfu(r,L) O. Proposition 2 Given (3,,’=13 and (=f3=1 and given a population learning state L, the relative realized expected utilities are fu(f,n,L) = 1 and 2 05 exp(ac) + (L) 2 o (4.16) 1÷C(1B2) ½ • exp 1+C 2(l+Co,J(1+Co,(1_B2)) >O andf(r,L)=O otherwise and where f1+Cc4(1-B ) 2 16 Note that the negative exponential utility implies thatf(t’,LI F) <0. -124- _____ Stochastic Adapsve Learning zn a Rational Erpectoiions Model Learnthg, Evolution and S4f-Organization X = f3 +(P)-P = j3 +(flr -1)P —Nonnal(i,o) (4.17) * A = i3—i3; B (4.18) - (4.19) = I3 -1 * 1 = + Proof. (4.20) (L) 2 o See Appendix B. While the expression for the unconditional relative utilities is more complicated, the unconditional relative utilities have similar intuition as in Proposition 1. Note that A captures inference errors in the intercept and B captures inference errors in the slope. When A=B=0, no inference errors are made and the expression forfu will collapse to facilitate calculation of the GS rational expectations proportion of informed agents, X (see below). - 125 - Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and S4lOrganizazion Any inference errors (ie. A or B moving away from 0) will reduce the utility of the uninformed agent. The condition 1+ Cc(1-B ) >0, which implies that 2 essentially states that slope inference errors ((f314*) f’ is non-zero cannot be too large. As the slope-based inference errors become large the absolute level of the uninformed realized expected utility approaches negative infinity and the ratio of informed to uninformed utility approaches zero. 17 The GS rational expectations proportion of informed agents, X*, follows directly from propositions 2. In the GS rational expectations equilibrium, agents do not make inference errors, and therefore, (P) E0 or A=B=0. The equilibrium follows from noting that for 18 X E (0,1), agents must be indifferent between being informed and uninformed. Corollary 1 if the population learning state in a rational expectations equilibrium, V. implies a proportion of informed agents XE (0,1), it is given by -(e-1) = (4.21) e—1 17 If z—N(jic#) then E[exp(-kz )j is not finite if kr’. 2 It is also possible that X’= 1 or indifferent. These cases occur if there is no X for which the informed and uninformed agents are - 126 - Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeI’Organization Proof. See Appendix B. (B) Stochastic Adaptive Learning Process Thus far, we assumed the population learning state was given and we focused on a single period. This section describes the learning process which generates a sequence of learning states through time. The properties of the learning process and their effect on the GS equilibrium is analyzed in the following section. A stochastic adaptive learning process for agent n, = {Ln , is defmed as } (4.22) t=1 The L={r’,V} is a modified learning state where rE (0,1) is the probability that agent n, at period t, is informed (ie. prob(O’ = 1)). Note that the only difference between a modified learning state and a learning state is that in the modified learning state the agent’s information choice, O, has not been realized. Instead, the information choice is represented by a probability of becoming informed. The learning process generates a sequence of learning states by determining how learning states are updated. Since learning is adaptive, can depend on the population learning states, L, from time 19 . . T 19 In addition, agents t. The assumption that modified learning states are updated based on past learning states, as opposed to past modified learning states, is not important but is consistent with the view that adaptation is based on observed past behaviour of other agents. -127- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeifOrganization will imitate strategies which are more successful more often. Thus learning states associated with relatively high utility (fitness) at t are copied by agents for use at t+ 1. Since the learning is stochastic, the transition from the learning states of one generation to the next, 4n to .L, is probabilistic and E (0,1). The probabilistic transition and the strictly mixed information choice capture the experimentation of learning. In summary, agents experiment with different learning states (stochastically) and mimic those learning states which are more successful (adaptively). For a given choice of a particular learning state, however, asset demands are determined rationally. The model is constructed to facilitate an adaptive (or evolutionary) interpretation of learning. However, as previously discussed, it is not a dynamic asset pricing model. The model is that of a repeated one-shot game. Similar to the underlying interpretation in evolutionary game theory, the model is that of a sequence of one-period-life agents who observe the past and are myopic. In the one-period model, agent fitness depends on the ability to predict the end-of-period dividend (mean and variance). Since this may require an inference from the endogenous price-signal relationship, an agent’s fitness depends on the current population learning state. In a dynamic model, agent fitness (and objectives) are more complex. Agents would need to predict the next period return which consists (at least in part) of next period’s price. Since future prices depend on future learning states the degree of co adaptation (or co-evolution) increases dramatically. This sort of complexity, without learning, is studied in Kraus and Smith (1994). While it would be interesting to apply -128- Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and S4jOrganization stochastic adaptive learning and genetic algorithms to a dynamic GS-like model such as ° 2 Wang (1993), it is beyond the scope of this chapter. (C’ Learning and Local Stability This section defines a local learning equilibrium. The definition is intended to capture the behaviour of stochastic adaptive learning close to the GS rational expectations equilibrium. Definition L, = (L] is a local learning equilibrium f (a) for all n=1,...,N = = and (b) for any a> 0, there exists a T such that for all t n = 1,... ,N, L is within 20 lirn (T:,f) (4.23) T and for all of a best response to (Li,... Note that utility functions which produce myopic portfolio strategies reduce the complexity of a dynamic model only slightly. In a dynamic model, returns depend on future prices even with myopic portfolio policies. In addition, the myopia implied by single-period lived agents implies that when agents adapt by imitating others they do not consider the long-run effect (‘ic. that their strategy might be imitated) or that others are also adapting. Interpretations such as Bray and Kreps (1987) (see footnote 4) that agents are long-lived but lack the ability to transfer wealth between periods becomes complicated in a model with adaptive learning. In such a world agents might have incentives to consider the dynamic implications of learning. The reasonableness of the myopia assumption in a complex environment, such as a financial market, is an open question. Kandori, Mailath and Rob (1991) discuss myopia of adapting agents in the context of evolutionary game theory. - 129 - Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and SeOrganization There are several important features of the definition. First, (a) implies that a stochastic 21 Thus, the adaptive learning process is a local learning equilibrium only if it converges. definition is used to identify properties of learning close to the GS equilibrium. Second, (b) implies that the converged learning states must form mutual best responses (ie. are rational learning states). Thus the definition is a refinement of a Nash equilibrium and the GS equilibrium. The requirement of (b) captures the “adaptive” nature of learning in this model. Agents have an incentive to continue adapting as long as utility (or fitness) can be increased. Third, condition (b) is stronger than requiring that the converged learning states form a Nash equilibrium. In fact since learning processes are stochastic, the local learning equilibrium definition is equivalent to a normal form version of a (trembling-hand) perfect 22 It is necessary, in a learning equilibrium, that learning states be robust to equilibrium. trembles since experimentation by other agents vanishes only in the limit. When agents are learning by trial and error, optimal strategies must be robust to other agents’ 4 experimentation. 2 ’ 23 21 Note that £ € W. Convergence is defined as point-wise in each of the seven elements. Convergence and within with respect to Euclidean distance. are both See Kreps (1990) for a definition and discussion of perfection. Trembling-hand perfection was developed in Selten (1975). The definition of local learning equilibrium is also closely related to that of Evolutionary Stable Strategy (ESS) equilibria. ESS are appropriate for models of adaptation (or evolution) since they select a collection of strategies (or learning states in our model) which are robust to mutant strategies (ie. experimentation). For a fornial definition and discussion of ESS and perfect equilibria see van Damme (1987). Using an identical argument to that used by van Damme to establish that every ESS is a perfect equilibria, it is straight forward to show that every ESS is also a local learning equilibrium. 24 Note that this section relies on the assumption that N is finite. -130- Learning, Evolution and S4fOrganization Proposition 3 Stochastic Adaptive Learning in a Rational &pectations Model The GS Rational &pectations Equilibrium X (where X denotes the proportion of informed agents), is a local learning equilibriwn when o> 0 and N is large. Proof. Consider the following stochastic adaptive learning process. For X N agents let =1 —r’ and for the remaining (1— X)N agents let =h < 1 lim,h co). i= h• t’ (where 0 <h, <t and For all n agents, let £‘=f(L ,)+ where ER 0 6 is uniformly distributed in the hypercube [— . The £ is defmed by (4.11) and (4.12) where L(=lim,L, is 6 ] , 2 the limit of the population learning states. Note that proportion X of the population will have r, = 1 (ie. informed) and proportion (1 —XD will have =0 (ie. uninformed). Therefore L =L for all agents (see (4.23)). Thus, L , =L and by construction L,, =L. 0 Next, note that since t(L ) is continuous in the elements of the population learning state, L 1 , 1 as t gets large, 1? is arbitrarily close to t(L for all agents. It remains to be shown that the agents’ information choice (T’s) are close to best responses. From inspection of (4.16), note that and 4fu(t*,LD are equal at X when V=t(L). Secondly, note that 2 is strictly decreasing in X for oe >0. Therefore, locally about X, -r4-f”(f,L) >0 since o (4.16) is also increasing. Thus fu(L7S(L*),L*+) > fu(.(L.)L.)p=1 > fu(fs(Ls),L._) where L (4.24) (L) is the GS equilibrium population learning state, L, with a small decrease (increase) in X. Recall that the GS rational expectations equilibrium is constructed so that - 131 - Learrnng, Evolution and S4fOrganization Stochastic Adaptive Learning in a Rational Expectations Model agents are indifferent between being informed and uninformed; that is fU(1? *(L*) ,L*) fi =1. Inference errors caused by changes in other agent’s learning parameters (the ) are becoming arbitrarily small at a rate proportional to j2, and ir is approaching its limit at rate proportional to t. Thus by appropriate choice of h )] =1 (taking 1 , for large t and N, E[f”(V,L 1 expectations over possible population learning states), leaving agents indifferent between being informed or uninformed and therefore indifferent to ir. The proof highlights the two different sources for the (P) inference errors of the uninformed. The rational inference parameters of the uninformed, (4.12), depend on the learning state of’the population. Agents can be mistaken about the information choice of the other agents in the population learning state (the other agents’ 0 or more simply the value of X) or the inference portion of the population learning state (the V of other agents). Consider a population learning state at the GS rational expectations equilibrium learning state, L*, where by definition, no inference errors are made. If any agent (informed or uninformed) happens to alter an inference parameter (an experiment), all uninformed agents are worse off since their inference parameters, €z=€*(L, are no longer correct. In contrast, consider the effect if all agents continue to use £(LD but there is a small change in A away from X. If A declines then all uninformed agents are worse off since the price is less informative (ie. o 2 increases) and they make small inference errors. If A increases, however, the price is more informative. This acts in the direction of making uninformed agents better off. Moreover, since the change in A is small, inference errors resulting from the small -132- Learning, Evolution and S4fOrganization Stochastic Adaptive Learning in a Rational Expectations Model change in X can be ignored (a result that follows from the Envelope Theorem) and all uninformed agents are better off. The existence of a local learning equilibria near the GS equilibrium relies on small trembles (experiments) in the X and much smaller trembles in the inference parameters. The result is that, on average (and for large t), uninformed agents remain indifferent about their information choice? The above discussion leads to a corollary. When the supply of the risky asset is constant (o=0) and the information is costless (c=0), then XE (0,1] are all GS equilibria. However, equilibria with X* <1 are not robust to trembles. Small positive changes in the X cannot make the price more informative since it already perfectly reveals y. Inference errors always leave the uninformed agents worse off. Thus they choose to become informed. Corollary If o = 0 and c = 0, only X = 1 is a local learning equilibriwn Proof. Using the same defmition of learning processes in proposition 3, note that if V=t* since a2=0 (ie. the signal is perfectly revealing) and c=0, thenfi=fu. Therefore, given any perturbation from L, uninformed make inference errors and are strictly worse off than the informed. Thus =0 (choosing to be uninformed) is not a best response. U Note that f(t,L is concave in A. Thus f(f,L) is locally concave at (t(L),LD. The utility of the uninformed is concave near the GS rational expectations equilibria. Since this is the case, small increases in A must be slightly more common than small decreases in order to maintain the indifference of the uninformed (ie. h,> 1). This implies that the time averaged A induced by a local learning equilibrium must converge to X from above. - 133 - Learning, Evolusion and Sef-0rganizadon Stochastic Adaptive Learning in a Rational Expectations Model (D) Stability with Always-Experimenting Stochastic AdaDtive Learning The previous section demonstrated that learning processes can be described that lead to the GS equilibrium. While this is a necessary condition for the GS equilibria to be “useful,” it is not sufficient. This section describes conditions for stability for a given learning process. In particular, this section describes the robustness of the GS equilibria to fixed experimentations which do not necessarily vanish in the limit. This is important in understanding complex financial markets where the arrival of new agents and exogenous changes in the environment lead to continual experimentation. In addition, the inability to distinguish good rules from lucky outcomes also contributes to continued experimentation. The genetic algorithm simulations presented in Chapter 6 have experimentation rates which do not vanish and fitnesses (utilities) which are observed with error. This section develops a defmition of stability that is a refinement of a local learning equilibrium. The analysis is tailored to the learning process captured by the genetic algorithm simulations. First, a definition of a stochastic adaptive learning process with nonvanishing experimentation is needed. Define an always experimenting stochastic adaptive learning process (AESAL) as a stochastic adaptive learning process, , where for all agents n=1,...,N lim’r E (0,1) = (4.25) 1int = + -134- Learning, Evolution and S4!Organization where Stochastic Adaptive Learning in a Rational Expectations Model is a random variable (mean zero) in R 6 with a non-degenerate distribution. 26 In such a learning process, the lower bound on the size of experimentation is given by the certainty with which an agent is informed or uninformed (ie. min[r, 1— r]) and the properties of the distribution governing . A norm for the size of the experimentation rates could be constructed? 7 However, the details of this construction are not necessary for the discussion which follows and AESAL’s will simply be assumed to be indexed by K where K is a measure of the limiting size of the experiments. Since experimentation is non-vanishing, the learning process will never converge to any fixed population learning state let alone the GS rational expectations equilibrium. However, if the population learning state is usually close to the GS equilibrium, then it reasonable to view the GS equilibrium as stable. This is similar to the motivation for the definition of Evolutionary Stable Strategies (ESS). There are many possible ways to formalize this concept. This definition uses the time average X relative to the GS equilibrium value of X* Definition For > 0, the equilibriwn L is tX-stable to AESAL(K) f there exists some AESAL process with experimentation of at least K such that E [lim 1 X LT t=o T - —-- 26 27 * 1j < tX (4.26) It is not cn ial that these limits exist in order to consider stability. This is simply a convenient way of speciIiing the lower bound on the size of experimentation. For example such a norm would combine i with the properties of such as its bounds. - 135 - Learning, Evolution and SeOrganizasion Stochastic Adaptive Learning in a Rational Expectations Model A stronger definition of stability might also relate to the behaviour of e in the population. However, since the learning process is adaptive and an agent’s utility (fitness) depends on both information choice, 0, and inference parameters, 4?, the X will reflect the relative inference abilities of the informed and uninformed. The parameter in the definition allows for the possibility that X remains slightly above X in order to compensate the uninformed agents who are adversely affected by the experimentations of other agents. Figure 4.1 presents the utility of the uninformed agents (relative to the informed agents) from the definition of f” in (4.16). The figure will be used to help demonstrate that the stability of the GS equilibrium depends primarily on the slope of f. The graph plots the fitness of an uninformed agent against X. As noted previously, the informed agent’s utility, f’, equals 1 by a normalization. agent’s utility. There are four different curves presented for the uninformed They differ according to the assumptions about the agent’s inference parameters, 4?n, and the learning parameters of the other agents (contained in L) which affect the rational inference parameters, 4?* Curve A is the rational expectations level of utility. This is the maximum utility an uninformed agent can attain for a given X since r=4?*(L). Note that for a given X, since the informativeness of the price about the signal (o is not affected by altering L, neither is curve A. 28 The point where curve A intersectsf= 1 yields the GS equilibrium of X*. Curve B lies just below curve A (except at X=X* where the curves are tangent). 28 Here tz=4?*(L) only when L implies a X=X. This uses the assumption the informed agents know the informativeness of the price and would affect curve A. - 136 137 - in (4.1). On curve B, the A variation in a 3’ of an agent does affect the Stochastic Adaptive Learning in a Rational Expectations Model Learning, Evolution and S4lOrganization 1. 1.01 o.0e 0i6 0.04 0.2 Figure 4.1 - 0.3 0.4 0.0 0.0 &pected Utility of Informed and Uninformed Agents The figure plots f =l (a normalization) andf’ (for various t’ and L) as functions of X t uninformed are making inference errors stemming from mistakes about the number of informed agents but not because of mistakes about the inference portion of the population learning state. Curves C and D represent situations where agents are making both types of inference errors (1 1?). The only difference between curves C and D is that the errors are larger in curve D. To consider the stability of the GS equilibrium, L, we will consider various perturbations from L. The perturbations stem from the experimentation contained in the always- -137- Learning, Evolution and S4Organization Stochastic Adaptive Learning in a Rational Expectations Model experimenting stochastic adaptive learning process. The key is to determine whether or not adaptation will force the learning state back to L. First note that perturbations in X alone are relatively minor. If X increases (decreases) the uninformed, from curve B, are better (worse) off and adaptation will lead fewer (more) agents to be informed to move X towards x*. The effect of experimentation in other agents’ learning states is more complicated. As noted above, beginning from L*, all uninformed agents are made worse off by a change in L. Uninformed agents not only have an incentive to re-learn the rational inference parameters, they have an incentive to become informed. In curve C, adaptation will push the proportion of informed towards X . 1 Near X , an uninformed agent can improve by adjusting her 1 inference parameters. The possible improvement (via experimentation) is bounded by curve A (where agents use the correct inference parameters f *(L)) Since any increase in the uninformed utility at X 1 will make being uninformed more attractive, adaptation will cause X to fall. As long as X remains above X, there is some possibility that experimentation will increase the utility of the uninformed. Thus the initial perturbation tends to cause an increase in X followed by a slow decline back to X. Larger experimentation in an other agent’s learning state might lead to curve D. In this case, the incentive of the uninformed to become informed is much larger. In fact, X will tend to 1, since even at X =1 an uninformed agent does worse than an informed agent. If - 138 - Learning, Evolution and S4lOrganization Stochastic Adaptive Learning in a Rational Expectations Model perturbations of this kind are likely then the GS equilibrium is unlikely to be stable. In order for adaptation to move X back towards X, an agent must simultaneously experiment with being uninformed and experiment with inference parameters, r, that push the agent’s utility above the informed (the region indicated by G). In the situation with curve C, many uninformed agents (ie. 1 —X ) are attempting to improve the uninformed portion of their 1 learning parameters. When X is close to 1, there are fewer agents attempting to learn how to be uninformed. In order for L* to be stable, it must be the case that perturbations where all uninformed agents are in a situation characterized by curve D must be unlikely. The size of perturbations in the learning process (described by K) relative to the slope off with respect to X determines this likelihood. The key parameter in determining the slope off is o. When the noise in the risky asset supply is low the conditional variance of y given P, and thereforefu, is less sensitive to changes in the proportion of agents who are informed. As 1f4 becomes flatter, a fixed size perturbation leads to a larger increase in X. In addition, the slope also determines the size of the region labelled G in Figure 4.1. When informativeness of the price is not very sensitive to X, this region becomes small. When the size of the G decreases and X is close to 1, the likelihood of an agent, through experimentation, discovering a inference parameters which pushes her utility to this region (ie. above an informed) also decreases. This decreases the chance that X will return to the GS equilibrium - 139 - Learning, Evolution and S4tOrganizalion Stochastic Adaptive Learning in a Rational Expectations Model level of X. The region G will play an important role in the next chapter which considers stability more formally. This discussion suggests that for a given level of precision (z in (4.26)) and for a given always experimenting stochastic adaptive learning process with some level of experimentation (indexed by K), that there are some critical values for the level of noise trading, such that the GS rational expectations equilibrium is stable for all > < • e and and not stable for This conjecture is explored in the next two chapters by analyzing appropriately chosen deterministic dynamics (Chapter 5) and a genetic algorithm as a simulated always experimenting stochastic adaptive learning process (Chapter 6). IV - Conclusions This chapter has introduced the one-period single asset model that is used to consider stochastic adaptive learning in a non-strategic setting. The model underscores the difference between learning an exogenous parametric relationship and learning an endogenous (equilibrium) relationship. While parametric relationships are unaffected by other agents, equilibrium relationships depend on the behaviour of other agents, and therefore learning must co-adapt or co-evolve. The chapter characterizes a stochastic adaptive learning process in a repeated version of the single period model. -140- Learning, Evolution and Seif Organization Stochastic Adaptive Learning in a Rational Expectations Model The chapter shows the existence of a stochastic adaptive learning process which converges to the GS equilibrium. The construction of the process requires that experimentation with inference parameters be less common than experimentation over the choice of being informed. The chapter also considers the stability of the GS equilibrium to a given learning process. The stability of the GS equilibrium depends on the size of experimentation in the agents’ learning processes relative to the uncertainty in the risky asset supply. The noise in the asset supply is important since it determines the sensitivity of the informativeness of the endogenous price to the proportion of informed agents. The next chapter develops this relationship more formally by considering the deterministic analogue to the stochastic learning process. Chapter 6 also develops this theme by developing a genetic algorithm simulation as an example of an always-experimenting stochastic adaptive learning process. The simulation results contain examples where the GS equilibria are stable and others where the GS equilibria are unstable given genetic algorithm learning. In addition, the genetic algorithm is used to investigate if a learning process will converge to the GS equilibrium from an arbitrary initial population learning state. - 141 - Chapter 5 The Stability of SAL. Deterministic Dynamics The previous chapter concluded with the remark that for a given always experimenting stochastic adaptive learning process, the stability of the rational expectations (or GS) equilibrium depends on the level of noise trading (and associated cost of information) relative to the level of experimentation in the learning process. This chapter formally develops this relation. In order to proceed with the analysis, however, a more specific description of learning processes is required. This chapter specifies learning processes using techniques developed by Binmore and Samuelson which approximate stochastic dynamics These differential equations encompass both with deterministic differential equations. - 142- Learning, Evolution and S4f-Organization Stability of SAL: Detenninistic Dynamics adaptation and, as a proxy for noise, drift.’ This chapter constructs an appropriate deterministic learning process which represents stochastic adaptive learning. After presenting a few properties of the learning process, the stability of the GS rational expectations equilibrium is considered. In this context, the stability is interpreted as meaning that the GS equilibrium lies near a stable dynamic equilibrium of the deterministic learning process. A stochastic adaptive learning process for each agent, which defines the stochastic learning process for the economy, was described in the previous chapter as a sequence of learning states for each of the agents. As described, the economy learning process is a Markov system. Unfortunately, the state space of this process is extremely large and makes analysis difficult. Recall that in Chapter 3 analyzing the ergodic distribution for a four-agent-twointeractions system was almost intractable. Fortunately, in order to consider the stability of the GS equilibrium, it is not necessary to consider the entire Markov distribution of the stochastic learning process. Building on Binmore and Samuelson (1995), it is sufficient to focus on average system behaviour which can be well represented by a deterministic system. The authors, in a game theoretic model, analyze a stochastic evolutionary dynamic (ie. stochastic evolutionary learning process) using deterministic differential equations. In their analysis stochastic “noise” (or mutations) is replaced by deterministic “drift.” 1 See Binmore and Samuelson (1995) and Gale, Binmore and Samuelson (1995). - 143 - Stability of SAL: Deterministic Dynamics Learning, Evolution and S4Organization The important result Binmore and Samuelson establish is that the equilibria of the deterministic system are arbitrarily close to the expected state of the stochastic process (for some finite but long time period). Their result does not necessarily apply to the “ultra-long run” (ie. the limiting behaviour as time is made arbitrarily large) if the deterministic system has multiple stable equilibria. In this case stochastic shocks, however improbable, can move the system between the regions of the deterministic equilibria. Without specifying the noise more precisely, one cannot say in which region the stochastic system will, on average, reside. Similar to Chapter 3, the relative likelihood of stochastic shocks (possible multiple mutations) that can move the system from the region of one deterministic equilibrium to another would need to be specified. However, when the deterministic system has a unique stable dynamic equilibrium, then the exact specification for the (small levels of) noise is not important, since on average, the stochastic system will remain close to the deterministic dynamic equilibrium. The stochastic adaptive learning process for the GS model will be approximated by the system of deterministic equations presented below. The state variables of the dynamic system are the proportion of agents who are informed, X, and the vector of average inference parameters used by the agents, denoted 1?. This section will maintain the assumption made in Section 4.111(D) in chapter 4 that the informed agents know (and use) the correct inference parameters (ie. the informed know the correct 13’s). Again, of the three parameters which describe an uninformed agent’s linear demands, the ‘y” parameter is set according to (4.12). -144- Stability of SAL: Deterministic Dynamics Learning, Evolution and SerfOrganization Therefore, the state variable, £ E R , need only represent the two inference parameters of the 2 uninformed agents (( and f3u) The deterministic dynamic equations governing the evolution of the (X,,) are 2 = 1 ,e g(X ) + ,1? g(X ) = 1 + (5.1) m(X i , ) 1 t (5.2) where g, m, g and m 1 are continuously differentiable and bounded functions of X and £? -R (appropriately restricted to ensure that X 2 That is g,m:[0,l]xR E [0,1]) and 1 -sR The g functions represent adaptation, the m are drift and capture the R . g,,m,: [0,1] x 2 effect of the noise in the learning process. The i’,, and are positive constants which determine the level of drift in the system. Each of these functions is considered in more detail below. Equation (4.12) in Chapter 4 calculates the optimal inference parameters given a population learning state. Since the optimal inference parameters depend only on the average inference 2 Time subscripts are suppressed unless necessary. Continuously differentiability implies Lipshitz continuity and ensures that there is a unique continuously differentiable solution to the equations, LQ,I ,t) which gives the state of the system after time t with initial condition (>,,t). See Binmore and 0 Samuelson (1995) or Waltman (1986). - 145 - Stability of SAL: Detenninistic Dynamics Learning, Evolution and SeyOrganization parameters and the proportion of informed agents, the vector of optimal inference parameters 4 is a function of the state variable of the dynamic system. The function for optimal -÷R The XR . inference parameters for uninformed agents is denoted t *(X, e) where 1? : [0,1] 2 rational expectations parameters, which are the solution to the fixed point problem £ =C(X,1), are £‘(X). Note that in this notation, the inference parameters in the GS equilibrium L*, which has proportion X* informed agents, are simply £**(XD. Similarly, the fitness or utility of the uninformed agent (relative to the informed) depends on X, the inference errors made by the specific agent. £*(X,e) and Therefore, the utility of the average uninformed agent depends on X, £ and £(X,1). Equation (4.16) for the average uninformed agent in the economy can be stated as a function of the state variables of the dynamic system, fu(x,).s In this chapter, the parameters of the economy (as opposed to those of the learning process) are denoted E. From the discussion at the close of the previous chapter, the economic parameter of interest is the level of noise in the risky asset supply (noise trading), . In order to facilitate discussion regarding the stability of the GS equilibrium to various levels of noise trading, the GS rational expectations equilibrium proportion of informed agents, X, Recall that all informed agents are assumed to have identical and correct inference parameters. In addition, note that optimal inference parameters and fitness are both based on expected realized utility. That is, expectations are calculated using the true parameters and are not based on agent beliefs. While optimal inference parameters depend only on X and I, if agents have different inference parameters, then the average I will depend on which specific agents are infonned (even for the same X). The noise that is introduced by altering which agents are informed is in principle no different from the noise introduced by the stochastic experimentation. In the deterministic learning dynamic considered in this section, all sources of noise are captured by the drift term. - 146 - Learning, Evolution and S4tOrganization Stability of SAL: Deterministic Dynamics is held constant. This is be done by simultaneously adjusting the cost of information, c, using equation (4.21).6 All other parameters (for example, dividend mean and variance) in the economy are held fixed. Therefore, the economy can be summarized by E=(u,X*) where the value of c is implied by oe 21fld X. I - Learning by Pure Adaptation The deterministic learning process in (5.1) and (5.2) contains four functions. Each of the four functions will be described in more detail. Initial focus is on the functions g and g which represent the adaptive component of the learning process. After presenting some assumptions on these functions the stable dynamic equilibria to the adaptation-only learning process (flx’7e ) is presented. 0 First consider the adaptive component of the dynamic for the proportion of informed agents. In order to interpret gx as adaptation, the following two assumptions are made. urn g = urn g = 0 (5.3) x—1 for XE(O,1) 6 g0 f(X,1?)1 and g<0 It is straight forward to rewrite (4.21) such that c is a fiznction of the a. and X. -147- (X,1)>1 5 f (5.4) Srabiliiy of SAL. Detenninistic Dynamics Learning, Evolution and SelJOrganization The first assumption ensures that (in the case of ‘Jx=O) Xt remains in [0,1]. It is also consistent with proportional strategy growth, which is a typical assumption in evolutionary The second, and more important, assumption states that the proportion of s. 7 model informed agents decreases if and only if uninformed agents receive higher utility than the informed agents (recall that f= 1 by a normalization). This is consistent with the interpretation that gx captures adaptive or imitative learning. The region wheref 1 plays an important role in determining the behaviour of the learning y dynamic. The following analysis identifies some properties of this region. For the econom E defme G(E) = (5.5) {(x,t) Ifu(X,O1} the The following lemma establishes the useful properties of G(E). For this chapter, all of proofs are in Appendix C. Lemma 1 For a given economy, E, where Ore> 0, the set G(E) has the following properties: (a) G(E) is closed, (b) G(E) is not empty, (c) ourhood of 8 For X> X, there is a neighb £(X)) is in G(E), and See, for example, the replicator dynamics discussed in Chapter 3. 8 the c-neighbourhood In this chapter, neighbourhoods are defined with respect to the Euclidean norm. Note that since XE [0,1], that OX1. and such of (X’,f’) should be interpreted as all points (X,f) within distance e of (X’,t’) -148- Stability of SAL: Determini1ic Djmamice Learning, Evolution and S4fOrganization (d) The only (X,t)EG(E) with XX’ is (X’,flX)). An important property of the set G(E) is that its size depends on the level of noise trading in the economy, o. The set G(E) is the region where uninformed agents are experiencing higher utility than informed agents. This occurs when X> X. Since the price of the risky asset with X informed agents is more informative about the signal than at X, it is more economical to infer the signal imperfectly than to bear the cost, c, and be informed. To the extent that inference errors, £-(X,t), are made, the utility of the uninformed is reduced. However, with small inference errors, uninformed agents can still yield higher utility than informed agents at X> X* (Lemma 1(d)). The following function measures the maximum allowable inference errors that can be made and still leave the uninformed agent at least as well off as the informed agent. For X = X, defme 4(G(E)) as follows, sup{I £—‘ II (X,), (X,1’)EG(E)} (5.6) (1,1’) The argsup will lie on the boundary of G(E). From Lemma 1(d), note that 4.(G)=O since G is a singleton at X. Similarly, 4 is not defined for X < X since G is empty for these values of X. As the level of noise trading decreases, holding constant the GS equilibrium proportion of informed agents, X, the maximum allowable inference error decreases. In this sense G(E) gets smaller. In particular, as the level of noise trading approaches zero (holding constant - 149 - Stability of SAL: Deterministic Dynamics Learning, Evolution and S4fOrganization so does the maximum allowable inference error. That is at oe=0 (which for X >0 implies c=0) the maximum utility of an uninformed trader is equal to the informed. This level is achieved only if no inference errors are made. Thus as noise trading vanishes, G(E) collapses. This is stated in the following Lemma. Lemma 2 For a given GS equilibrium proportion of informed agents, X E (0,1), for all x>x (e’r)) = 0 (5.7) Next, consider the adaptive learning process of the (average) inference parameters which is represented by g (X,1). The following assumptions on g are made. 1 First, for a given XE (0,1), the solution to the differential equation dt/dt =g, (ie. (5.2) with i, =0) 1iinQ = Q**(,) is such that (5.8) In addition, .IItII -150- < 0 (5.9) Learning, Evolution and SerfOrganization Stability of SAL: Deterministic Dynamics limIIgII = 0 (5.10) The first assumption states that adaptation (holding the proportion of informed agents constant) will converge to the rational expectations inference parameters. This rules out, for example, unstable cobweb-cycle adjustment dynamics. This assumption is required for the dynamic learning process to have a stable dynamic equilibrium near the GS equilibrium. The assumptions in (5.9) and (5.10), which are important to the results which follow, state that the speed of learning decreases (eventually to zero) as fewer agents are uninformed. 9 Recall that adaptation is based on copying successful agents. If there are only a few uninformed agents exploring for the inference parameters, the learning rate slows. The implication of these assumption is that adaptive forces are always pushing € towards t*(X,f) at a rate which depends on the proportion of uninformed agents. Similar to the set G, the set H will be useful to describe the results which follow. Define H = {Q,t)Ig(X,)=0} (5.11) Note that for XE (0,1), the assumption that g, will converge to the correct inference parameters is equivalent to (X,t)EH if and only if £ =t**(X). The speed is represented by the Euclidian norm on R 2 which is denoted - 151 - Stability of SAL: Detenninistic Dynamics Learning, Evolution and Seif Organization II - Driftiess Stable Dynamic Equffibria Before investigating the equilibria in the learning process with drift, the purely adaptive or driftless equilibria are considered by setting =0 in (5.1) and (5.2). In this case the set of stable dynamic equilibria reflect two types of behaviour for the learning system. The first type is the GS equilibrium. The second type is where X= 1 (ie. all agents are informed) and inference errors are non-zero. In this case, the inference errors are moot since all agents are informed. After defining dynamic equilibria and stable dynamic equilibria, these equilibria are characterized. Definitions 4?) is a dynamic equilibrium i.f it is a fixed point of the deterministic learning process in (5.1) and (5.2); that is dX/dt=d1?/dt=O. The dynamic equilibrium (X, 4?) is stable ffor any neighbourhood R of (X, 4?) there exists a neighbourhood (5.2) originating in Q Q of (X, 4?) such that any trajecto?y of (5.1) and remains in R. The dynamic equilibrium (X, 4?) is asymptotically stable exists a neighbourhood that originate in 10 f it is stable and there Q of (X, 4?) such that all trajectories of (5.1) and (5.2) Q converge to (X, 4?). Note that the existence of a neighbourhood Q such that trajectories beginning in Q converge to the dynamic equilibrium is not sufficient to imply stability. Consider the example dxIdt={-x for x,EWrationalU0, lr!x, for x,ERational’0}. All points converge to 0. However, the system is not stable. The definition of asymptotically stable rules out such cases. -152- Learning, Evolution and SetOrganization Proposition 1 Stability of SAL: Deterministic Dynamics Given an economy, E, with o> 0, the set of stable dynamic equilibria for (5.1) and (5.2) when 2=2h=Ois {(X*1**(X*))}u{(lf)EG(E)} Figure 5.1 describes this result. (Figures begin on page 165). The X is represented on the horizontal axis and the inference error (t*(X,)) is on the vertical axis. Note that in order to keep the diagram simple, only one dimension of the (two dimensional) inference error is presented. The shaded area is the set G and represents where dX/dt=g 0. The line where inference errors are zero is the set H where d€/dt=g, =0. The arrows represent the direction of movement in the system. The stable dynamic equilibria occur either at the GS rational expectation equilibrium (X*,0), or at the boundary where all agents are informed. Figure 5.2 g • H Each line on the diagram represents represents phase trajectories for specific gx and 1 the evolution of the system for a different initial X and £. The two types of stable dynamic equilibria are where the phase lines converge (the GS equilibrium) or reach the boundary (X= 1) outside the set G. ifi - Learning with Adaptation and Drift While the driftless dynamic is illustrative, it does not capture the nature of stochastic adaptive learning. Noisy experimentation will prevent the system from converging to X= 1 since agents continue to experiment with being uninformed. For this reason, inference These functions are the replicator dynamics for the evolution of X and a “best response” dynamic for the updating of inference parameters. These functions are chosen only to illustrate the more general points of this section. The inference parameter space (to avoid higher dimensional diagrams) is for the intercept parameter, f. In these simulations, inference errors on the slope parameter, , are held at zero. - 153 - Stability of SAL: Deterministic Dynamics Learning, Evolution and S4f Organization parameter learning will not completely cease. However, experimentation (in information choice or inference parameters) also makes the inference parameters “harder” to learn since experiments change the optimal inference parameters. By setting >O and i >0 in (5.1) and (5.2) choosing appropriate drift functions m and m the deterministic learning process captures both of these features of the stochastic adaptive learning process. The drift in the proportion of informed agents, m(X,1?), is assumed to be such that lim m(X,f) <0 (5.12) lim m(X,e) > 0 x-.o (5.13) x-.1 for all 1?. Unlike, adaptation, drift can lead to fewer (more) informed agents, even if informed agents have higher (lower) relative fitness. This is because drift is capturing the experimentation which, unlike adaptation, is not completely fitness driven. The assumptions on mx capture the continual experimentation in a stochastic adaptive learning process by acting to force X away from zero and one. The primary learning forces are from adaptation. Noisy experimentation and its proxy, drift, should be thought of as small. It is useful to characterize effect of drift and clarify what -154- Leanung, Evolution and SeOrganizadon Stability of SAL: Detenninistic Dynamics “small” means. For X< 1, the set G contains all points such that g(X,)O. When drift is non-zero, it is useful to define the set G’ G’ = {(X,e) Ig(X,t)+iim(X,) O} (5.14) The following lemmas show that for small levels of drift, G is arbitrarily close to G’ except near X=l. Lemma 3 For i,> 0 and for all £ there exists an such that < implies (l-E, £) E G’. Lemma 4 Given ii>O and X”<l, there exists i>O such that for all (a) for all (X’, £ ‘) E G’ with X ‘< X “, there exists (), £) E G such that (X, f,)-(X ‘, C’) <, and (b) for all (X, £) E G with X < X”, there exists ().‘, £ ‘) E G’ such that I O)-(X’’) I <i The aspect of experimentation that drives inference parameters towards the rational levels has already been captured in the specification of adaptive learning, g,. Thus drift will capture the aspect of experimentation that makes learning more difficult by altering the 12 In particular, this portion of experimentation is rational inference parameters, £Q,t). captured by the assumption that drift pulls the inference parameters towards some arbitrary 12 Recall from the previous chapter, when one agent modified the inference parameters all uninformed agents were made worse off. - 155 - Learning, Evolution and S4lOrganizadon (incorrect) value, Stabithy of SAL: Detenninistic Dynamics That is, the solution to d1?/dt=m(X,t) has, for all X, a unique asymptotically stable dynamic equilibrium at £ = £m. The constant 4!m4Q) £m is chosen such that for all X Recall that H={(X,t)g=O}. Define H’ as follows, H’ (5.15) = Since £ E R , the set H’, in general, may be empty for some values of X (ie. the solutions 2 to a system of non-linear equations may not exist). Therefore, the assumption is made that for all fixed XE (0,1], g 1 and me are such that the set of dynamic equilibria of the inference parameters is a singleton and that this point is also asymptotically stable. (I’he appendix includes discussion which considers this assumption further). The function h(X) will denote the asymptotically stable inference parameters for the given X. Previously, it was assumed that g(X,f)=0 only at £=t**(X) and m (X,)=O only at 1 exists and is unique is analogous to these assumptions. =m• The assumption that h(X) It would not be surprising (or interesting) to find that the GS rational expectations equilibrium was not stable given a learning process (with or without drift) that is doomed never to converge. Thus the issue of the stability of the GS equilibrium is restricted to the case where the convergence of the learning process is assured. The following lemma shows that h(X) is a continuous function of X. - 156 - Stability Learning, Evolution and SeOrganizadon of SAL: Dete,ministic Dynamics 2 is continuous. Lemma 5 The function h: (0, 1]—’R Corresponding to the learning dynamic controlling X, drift (as a proxy for continued experimentation) has only a small effect on the inference parameters. That is, drift can be made small enough to ensure that for any fixed X <1, the vector of asymptotically stable inference parameters in the system with drift, h(X), is close to that of the driftiess system, £(X). The following lemmas show that the dynamic equilibria (and therefore the asymptotically stable equilibria) of the inference process with drift lie close to, but not at, the rational expectations level except at X =1. Lemma 6 For , 0 lit> h(1)=4?m Lemma 7 For XE(O,1], h(X) 1?(X) Lemma 8 If drift, m (), t), is a bounded function then given A >0 and X” <1, there exists , >0 such that for all lie < IIht2)-eO.) I <A. Learning processes with the assumptions and properties described above will be summarized by three parameters. The set of learning processes L(X” ,A,1?,,.), with X” > X, A >0 and m h(X), denotes learning processes described by (5.1) and (5.2) with positive drift (,> 0 -157- Stability of SAL: Detenninistic Dynamics Learning, Evolution and S4jOrganizo4on and 9j >0) and drift parameter X<X”, G is with in £m. These learning processes have the properties that for all of G’ (see Lemma 4) and h(X) is with in of £(X) (see Lemma 8). Before considering the stability of the GS rational expectations equilibrium it is helpful to characterize the stable dynamic equilibria for the learning process with non-zero drift. The equilibria occur at the intersection of the boundary of the G’ set and the stationary points of the inference parameter dynamic, h(X). The following proposition establishes that at least 13 one stable dynamic equilibrium exists for the learning process with non-zero drift. Proposition 2 For a given economy E with > 0, there exists at least one stable dynamic equilibriwn to the learning process, L(X “,z, £.). IV - The Stabifity of the GS Equilibrium The GS rational expectations equilibrium is not a dynamic equilibrium of a learning process with drift. This follows directly from Lemma 7. At the GS equilibrium values, there are no adaptive forces acting on the agents but drift (however small) will pull the system away from the GS state. However, stable dynamic equilibria of the learning process with drift may lie (arbitrarily) close to the GS equilibrium. A characterization of how well the GS 13 In general, h(X) will intersect G’ q 1 times where q is odd and (q+ 1)÷2 are the stable dynamic equilibria. The proof is very similar to the proof of proposition 4. For the same reasons that the boundary of G’ and hQt) must cross once, they must cross an odd number of times. For each (X,I) where a crossing occurs, only those points where (X-e,f) G’ are stable. This condition occurs on the initial intersection and then on altering intersections. Since this is not crucial to the analysis, this point is not developed formally. - 158 - Stability of SAL: Deterininzstic Dynamics Learning, Evolution and SdfOrgamzalion rational expectations solution describes an economy with learning agents (or vice versa) is the distance between the GS equilibrium and stable dynamic equilibria of the learning process. Since the deterministic learning process is a good approximation of a stochastic adaptive learning process (for small amounts of noise or “mutations”), this provides insight into the stability of the GS equilibrium under always experimenting stochastic adaptive learning processes. The following theorems state that for any economy there is a learning process (with drift) such that all stable dynamic equilibria are close to the GS equilibrium. In addition, for any learning process with a property discussed below, there is an economy for which the GS equilibrium is not close to the set of stable dynamic equilibria. These theorems capture the intuition of the previous chapter (and genetic algorithm simulations in the following chapter) that both the level of noise trading in the economy and level of experimentation (captured in drift) are important to determining when stochastic adaptive learning and rational expectations will yield similar predictions. In the previous chapter, equilibria for a stochastic adaptive learning process with vanishing levels of experimentation were considered. In particular, in order to ensure that in the limit as experimentation vanished the process converged to the GS equilibrium, it was important that experimentation in the learning parameters became small faster than experimentation in the information choice. (See the discussion of Proposition 3 in Chapter 4 on page 131). A 159 - Stability of SAL. Detenninistic Dynamics Learning. Evolution and S4fOrganization similar condition is important for the stability of the GS equilibrium in the deterministic representation of learning process used in this chapter. A necessary condition for an arbitrary learning process not to have its stable dynamic equilibria close to the GS equilibrium (for some economies) is that drift on the inference parameters be roughly the same size as drift in the information choice. Formally stated, this condition is as follows. t For a learning process, L(X”4”,L) for all X<X inf{IIh(X)_t* *(x)II} > sup{II(x,)—(x,’)II (x,e)eoG,(x,t’)eaG’} (5.16) x<xll where ô indicates the boundary of a closed set. Theorem 1 Given any > 0 and economy E= (, X) with > 0 and X E (0,1), there exist X “, “ and £,Z tha such fl £,,— £**(1) <II £,— €(‘l) , for X ‘> X “, ‘< “ and all the dynamic equilibria of the learning process L(X ‘4,, £,,) are within distance Theorem 2 all of the GS equilibrium, (X, fl))). For a given learning process L(A “4, £.) with 0< X <X”< 1 and the property in (5.16) there exists all the dynainic ‘ such that for equilibria, ()‘, £‘) all economies E= fr,X*) with have the property that X ‘> X cii< “. Since the driftiess learning process has an asymptotically stable fixed point at the GS rational expectations equilibrium, Theorem 1 is not particularly remarkable. A learning process with -160- Learning, Evolution and SeOrganization Stability of SAL: Deterministic Dynamics arbitrarily small drift has all its equilibria arbitrarily close to the GS rational expectations equilibrium. What is notable is in Theorem 2. For many learning processes, even ones with arbitrarily small drift (like those constructed by Theorem 1), there are economies for which all the stationary points (stable or not) are well away from the GS rational expectations equilibrium. In fact by choosing X” close to 1 implies that economies exist for which the dynamic equilibrium will have almost all agents informed. These economies for which this is true are those with a small level of noise trading, Oe. Note that in both Theorem 1 and 2 all the dynamic equilibria reside in a small neighbourhood (either close to the GS value or with X 1) and are not necessarily unique. From Proposition 2, at least one of these fixed points is a stable dynamic equilibrium. The introductory comments of this chapter mentioned that a deterministic dynamic may not capture the limiting behaviour of a stochastic system when there are multiple equilibria to the dynamic system. Multiple deterministic dynamic equilibria may imply that the exact specification of the noise matters since noise can (informally stated) bounce the system between basins of attraction for the deterministic equilibria. However, in this case, all stable fixed points of the learning dynamic are close together. Thus in the case where all the equilibria lie close to the GS equilibrium, the exact specification for noise (not drift) will not matter. Noise may move the system between some (not necessarily all) of the regions of attraction of the deterministic equilibria. However, since all equilibria are close to the GS equilibria, there will be little perceptible difference between the determinist representation - 161 - Stability of SAL: Detenninistic Dynamics Learning, Evolution and S4fOrganizadon and the underlying stochastic adaptive process. In this respect the GS equilibrium is stable. Similarly, in the case when all fixed points of the deterministic learning process lie away from the GS value, the exact specification of noise will not matter. Since none of the fixed points are close to the GS, the limiting behaviour of the underlying stochastic process will not concentrate on the GS equilibrium. In this case, the GS equilibrium is unstable. Figure 5.3 and Figure 5.4 (see page 166) provide an example of when the GS equilibrium is stable. Note that all the trajectories in Figure 5.4 (including those which begin near the X =1 border) converge to a point just below the GS equilibrium (which in these examples is X=O.5 and £f**(X*)=O). In contrast, Figure 5.5 and Figure 5.6 describe an example where the unique dynamic equilibrium is near X =1. In this case all trajectories (including those initiating near the GS equilibrium) converge to this point. Finally, the situation depicted in Figure 5.7 and Figure 5.8 is an example where ÔG’ and h(X) intersect three times. The first intersection near the GS equilibrium is stable as is the third which is near X =1. In this case the limiting behaviour depends on the initial condition. Furthermore, the exact specification of noise in the underlying stochastic process can influence which of these equilibria best characterizes the limiting behaviour of the stochastic system. V - Conclusion -162- Stability of SAL: Deterministic Dynamics Learning, Evolution and SeOrganization The previous chapter developed the GS rational expectations model and describes stochastic adaptive learning. The chapter described the relation between economic parameters and the learning process that impact on the stability of the GS rational expectations equilibrium. In particular, stability depended on the relation between noise in the supply of the risky asset (noise traders) and the level of experimentation. This chapter has developed the relationship among noise trading, experimentation and the stability of the GS equilibrium by approximating the stochastic adaptive learning process with a deterministic one. In lieu of randomness, drift, is introduced to capture the effects of random experimentation. The important result established is that for a given level of drift (a proxy for experimentation), the GS equilibrium may be stable or unstable. In this chapter the stability of the GS equilibrium is measured by the distance the GS equilibrium is from the stable fixed points of the learning process. The stability is determined by the level of noise trading. For large (small) levels of noise trading, the GS equilibrium is stable (unstable). The stability is driven by the size of the set G where uninformed agents’ utility (perhaps with inference errors) exceeds that of the informed agents. When noise trading is high, and the proportion of informed agents increases above the GS equilibrium value, uninformed agents have more room for mistakes in their inference parameters. The size of mistakes depend on the level of drift in the learning process. When the uninformed have room for error, the proportion of uninformed agents remains close to the GS equilibrium value, X. More importantly, the proportion of uninformed agents does not become very - 163 - Stability of SAL: Deterministic Dynamics Learning, Evolution and Sef.Organization small. Given this, the learning of the inference parameters of the uninformed agents is driven primarily by adaptation and adaptation ensures that the inference parameters remain close to their rational expectations levels. The next (and penultimate) chapter continues the with the GS model and attacks the question of stability using simulations. In Chapter 6, a genetic algorithm is used as a specific example of a stochastic adaptive learning process. While this chapter approximates general stochastic adaptive process with deterministic dynamics, the following chapter investigates a specific, but stochastic, learning process. - 164- Learning, Evolution and Seif-Orgarnzation Stability of SAL: Deterministic Dynamics 0 w 0 Figure 5.1 - A 1.0 Dnftless Learning Process The arrows are for a typical driftiess dynamic. The equilibria are at the GS values (A,O) or at X=1. o .oioo 0.0080 0.0060 I. 0.0040 0.0020 —0.0020 —0.0040 -0.0060 —0.0080 —0.0100 0. Figure 5.2 - Phase Plots in Dnftless Learning Process Each line represents a different initial condition. Note all lines converge to GS equilibriwn (X=O.5) or the X=1 boundary. - 165 - Stability of SAL: Detenninistic Dynamics Learning, Evolution and S4fOrganization 0 1.0 0 Figure 5.3 - I’ Learning Process with Drift The unique dynamic equilibrium is near the GS values (A,O) (denoted by the dot). o .oioo 0.0080 0 • 0060 I. 0.0040 -0 • 0020 -0.0040 -0.0060 -0.0080 —0. OiOO La.,bda Figure 5.4 - Phase Plots in Learning Process with Drift Each line represents a different initial condition. Note all lines converge to a point which is close to the GS equilibrium. - 166 - Learning, Evoludon and S4lorganization &ability of SAL: Detenninisdc Dynamics 0 A Figure 5.5 Learning Process with Drift - The unique dynamic equilibriwn (denoted with the dot) is far from the GS values. o oj.oo • 0.0080 0 • 0060 I. 0 I. 0.0040 0.0020 .0080 -0.0020 -o .0040 -0.0060 -0.0080 —0.0100 Libda Figure 5.6 - Phase Plots in Learning Process with Drift Each line represents a different initial condition. Note all lines converge to a point close to X=1. -167- Learning, Evohaion and S4tOrganizadon Stability of SAL: Detenninielic Dynamics 0 A Figure 5.7- Learning Process with Dnft There are three dynamic equilibria. Two are stable. 0.0100 0 • 0080 0 • 0060 I. EL 0.0040 0.0020 Th00000 —0 • 0020 -0.0040 -0.0060 -0.0080 —0.0100 Lanbda Figure 5.8 - Phase Plots in Learning Process with DrIft Each line represents a dzfferent initial condition. Lines converge to close to the GS equilibrium or close to X=1. - 168 - Chapter 6 Genetic Algorithm Learning in a Rational Expectations Model The previous chapter, building on the model in Chapter 4 analytically described the relationship between economic and learning parameters that determine the stability of the Grossman-Stiglitz (GS) rational expectations equilibrium. In order to preform the analysis, the stochastic adaptive learning process was approximated by an appropriately chosen deterministic system. In addition, the problem was simplified by the assumption that informed agents know the exogenous signal-dividend relationship. This chapter relaxes both of these assumptions and analyses stochastic adaptive learning using a genetic algorithm simulation. The simulations support the analysis of the previous two chapters demonstrating - 169 - Leanung, Evolution and S4fOrganization Genetic A1goithm Learning in a Rational &pecrations Model the relationship between the level of noise trading (or noisy supply of the risky asset) and the stability of the rational expectations equilibrium. As discussed in Chapter 1, the genetic algorithm is an example of a stochastic adaptive learning process. Genetic algorithms, introduced by Holland (1975), are biologically inspired stochastic optimization techniques. However, genetic algorithms can be interpreted as simulating the adaptive learning process where successful strategies proliferate and unsuccessful strategies die out. They also generate novel strategies by combining and mutating existing strategies in an efficient search of the strategy space. Genetic algorithms have intuitive appeal as a learning process since they employ both imitation and experimentation and have been successfully used in many optimization applications.’ Their success in demonstrating and augmenting the theoretical results suggests such algorithms are potentially useful techniques for analyzing more realistic and complex models. This chapter describes the genetic algorithm application of the GS model and presents the simulation results. Stochastic adaptive learning models have been used successfully to model learning of complex tasks and are becoming more common in economics. Many of these models use genetic algorithms. 1 Axelrod (1987), for example, used genetic algorithms to choose an See Goldberg (1989), Chapter 4 for a list applications. -170- Genetic Algorithm Learning in a Rational Expectations Model Learning, Evolution and SerfOrganization optimal strategy for the repeated prisoner’s dilemma game in a fixed population of 2 strategies. Slade and Eaton (1990) consider the pricing behaviour of oligopolists and Arifovic and Eaton (1994) use a genetic algorithm in a complex coordination problem. Marimon, McGrattan and Sargent (1990) consider adaptive agents in a monetary economy. Arifovic (1989) simulates learning in a rational expectations model (based on Bray, 1982) using a genetic algorithm to perform least squares learning. Arifovic (1994a) looks at genetic algorithms and cob-web cycles and Arifovic (1994b) and Duffy and Bullard (1994) implement genetic algorithms in overlapping-generations models. 3 I - Simulation Design Using the model and notation introduced in Chapter 4, the genetic algorithm, as a stochastic adaptive learning process, generates a population learning state, L+ 1 based on the current state L . The genetic algorithm updates agents’ learning states based on three operators: 1 selection, crossover and mutation. From the point of view of agent n, L 1 is determined as follows. First, with probability proportional to fitness, two learning states from L are selected. These states are combined using crossover which takes different parts of each of the two learning states. Finally, mutation perturbs some of the elements of the combined 2 The strategies that formed the environment for this study were the eight best strategies from the Axelrod (1984) tournament. Axeirod also considered the evolution of a population of agents playing the repeated prisoner’s dilemma model. Other examples using the prisoner’s dilemma and genetic algorithms include Miller (1987), Marks (1989a, 1989b) and Chapter 2 in this thesis. Some of these papers are discussed in Sargent (1993). - 171 - Genetic Algorithm Learning in a Rational Expectations Model Learning, Evolution and SeOrganizadon learning state yielding The selection operator captures the imitation or adaptation component of a learning process while the crossover and mutation drive the experimentation. The crossover step helps to guide experimentation to the most promising area in the space of possible learning states and mutation ensures that the learning process is always experimenting. Note that when the population consists of agents with similar inference parameters crossover has little effect. It is the mutation rate which determines the limit (Vi), size of experimentation. Simulations begin with the creation of an initial learning state for each agent in the population. In Simulations One and Two, the initial learning state of each agent contains the =L). In Simulations Three and Four, 0 GS rational expectations equilibrium values (ie. L initial learning states for each agent are selected randomly. Note that agents are completely characterized by their learning states. In the genetic algorithm, each agent’s learning state is represented by a binary string (chromosome). One bit (gene) determines the agent’s information choice. The remaining bits are the binary representation of the real valued parameters t’ in (4.9). The minimum and maximum values for these parameters are 4 specified to lie symmetrically about the GS rational expectations values. Successive generations of population learning states are created by the genetic algorithm based on the Consistent with the analysis of the previous section, the range for the A and are fixed at the values specified by the GS equilibrium. - 172 - is zero in these simulations and these parameters Learning, Evolution and S4tOrganizadon Genetic Algorithm Learning in a Rational Expectations Model This process is described in the flowchart, realized fitness of the learning states. Figure 6.1. The parameters for the simulations are contained in three tables. Table 6.1 contains the parameters of the economy which are common to all the simulations. These , where X=0.5. Table 6.2 5 parameters are consistent with a specific GS equilibrium, L presents the parameters specific to each of the four simulations and summarizes the results. Finally, the parameters of the genetic algorithm learning process are in Table 6.3. Tables 1 to III ithd Figures 6.2 t 6.5 are the efld of the chapter beginning on page 187. Fitness is calculated based on repeated play of the one-period model. Each generation, the one-period model is simulated R times (R= 1000 in the simulations presented) using the current population learning state. The repetitions are independent of one another. A repetition consists of selecting (y,e,e) according to (4.2), calculating asset demands according to (4.4), using the agents’ learning states, and solving for the equilibrium price using (4.10) and (13.5). The calculation of fitness, f’ is a proxy for expected realized utility, fEZ, and is based on the realized mean and variance end-of-period wealth of the agents in the R repetitions. For all n =1,. ,N agents, . . The simulations are written in Pascal. They are based on the coding provided in Appendix C of Goldberg (1989). The Pascal programs for these simulations are available from the author on request. The simulations presented here were ron on an IBM RS6000. A typical run of 5000 generations (or 5 million repetitions of the one-period model) takes several hours. Smaller simulations can be run on PC-based machines. - 173 - Genetic Algorithm Learning in a Rational &pectalions Model Learning, Evolution and SefOrganization Start Select Initial Population Create strings randomly or at GS rational expecations parameters Decode Strings to Strategies Determines learning state Determines if informed or not and the parameters (‘s and y’s) Simulate One-Period Model up to 5 times Simulate a signal, noise and asset supply Calculate equilibrium price and demands I Record End-of-Period Wealth Repeat 1000 times Based on realized dividend, denurnrlq and price Calculate Fitness Based on realized average and variance of wealth Generate New Population Using Genetic Algorithm Selection, Crossover and Mutation End Figure 6.1 Schematic for Genetic Algorithm Stochastic Adaptive Learning Simulations - -174- Leanung, Evolution and S4/Organization ?= Genetic Algorithm Learning in a Rational Erpectations Model W:-oC+!(dr-Pr)X: Rr..i r I - Recall that 9fl L R 1 2 _Y.(dr-Prx:) - [ R 2 1 (6.1) Y.(dr_Pr)Xij is 1 if agent n is informed and 0 otherwise. The equation for fitness is the sample equivalent of the mean-variance statement of expected utility (which follows from CARA preferences and normality) •6 The choice of a fitness measure is not innocuous. Alternative definitions of fitness are available. For example, both De Long et.al. (1990) and Blume and Easley (1992) assume fitness is based on average return. This measure of fitness can be implemented by the genetic algorithm model by choosing only one repetition of the one-period model between successive generations (ie. R= 1). An argument for this measure might be based on observability. Agents looking to imitate other agents can observe wealth easier than utility. However, in this model utility is based on the mean and variance of wealth whose observability is equally plausible. Most importantly, the selection of fitness is consistent with the GS equilibrium. 6 The mean-variance representation was used so that fitnesses are non-negative since agents are selected by the genetic algorithm for imitation with probability proportional to their fitness. By appropriate choice of W , the probability that the fitness is 5 negative is low. In the rare instances where fitness did happen to be negative, it was set equal to zero. - 175 - Learning, Evolution and SeOrganizaUon Genetic Algorithm Learning in a Rational &pectations Model The choice of fitness defmition is important for a second reason. Recall that the genetic algorithm selects learning states (strings) for imitation according to fitness. Since fitness is considered proportional to the total fitness, the cardinal values for fitness matter. Thus the parameter for initial wealth is not innocuous. An increase in initial wealth, while not affecting any of the equilibrium relationships, moves the probability of any one agent’s learning state being selected (imitated) closer to uniform. 7 In other words, the cardinal values of utility determine the importance of the ordinal rankings in the adaptive learning. II - Simulation Results Four simulations are presented. They are representative of a large number of such simulations that have been conducted. The first simulation is an example of a stable GS equilibrium. The second simulation is an example which is not stable to the genetic algorithm learning. 8 These simulations differ only in the parameter a which is the noise in the supply of the risky asset. Both of these simulations begin at the GS rational expectations parameters, L*. The next two simulations use the same parameters from Simulation One except the simulations begin from a randomly selected initial population. One of the simulation converges to the GS result while the other does not. Note that fitness proportional selection implies the probability of selecting string n isf+_. Once strategies are selected, the crossover and mutation operators are applied. There are alternative selection schemes which are used in genetic algorithm optimization. For example, tournament selection uses only the ordinal information contained in the fitness. See Beasley et al (1993) for more details. 8 Since the description of the simulations is informal, a specific - 176 - value for i-stability is not required. Genetic AlgoHth,n Learning in a Rational &pectations Model Learning, Evolution and SeifOrganization The description of the simulations will focus on the proportion of informed agents (Xi), the asset pricing parameters (the a’s) and the aggregate (or average) inference parameters (the j3’s). The asset pricing parameters in (4.10) are calculated from (13.5) according to the evolved learning state L. For comparison, the GS rational expectations pricing parameters (the a; from (13.11)) for the current X are shown. according to (13.3) from the learning state L. The aggregate 13’s are calculated The 13, for example, is the (weighted) average of the [31 contained in the £ of those agents who are informed at t. The other aggregate fl’s are calculated similarly. These plots also contain the GS rational expectations values given the X. For the plots of the inference parameters of the uninformed, the rational parameters are shown. These are the £(L) and represent the best choice for the parameter, given the current evolved learning state. For the informed inference parameters, the rational parameter is the rational expectations value. (A) Stability Examples - Simulations One and Two Simulations One and Two both begin from the GS rational expectations equilibrium, L*. They differ only in the noisiness of the risky asset supply, o (and corresponding cost of information, c, which is chosen such that X*=0.5). This parameter determines the sensitivity off to X (ie. the slope in Figure 4.1). In Simulation One, where supply noise is higher, the proportion of informed agents remains close to its GS equilibrium level of 0.5 (see Figure 6.2a) and the GS equilibrium is stable. In contrast, in Simulation Two where the -177- Learning, Evolution and S4Organization Genetic AlgoHthm Learning in a Rational F.tpectations Model supply noise is smaller, X 1 drifts steadily towards one (see Figure 6.3a). In this simulation, by generation 2000, with the exception of agent experimentation, all agents are informed. The pricing parameters reflect the differing behaviour of the two simulations. The behaviour of the a pricing parameters is similar in the two simulations (compare Figure 6.2c with Figure 6.3c). Both drift slightly away from their rational expectations values (which are shown in the figures as solid lines). Figure 6.2d and 6.3d report the behaviour of the a 1 parameter. This parameter is the sensitivity of the asset price to the signal y. In Simulation One, which is stable, a 1 remains close to the rational expectations level. In Simulation Two, where X drifts to one, the a 1 parameter also rises. At X= 1, a= 1. The a 1 in Simulation Two drifts above this level primarily because the i3 (as discussed below) converges to slightly above f3. Finally, Figures 6.2e and 6.3e present the a 2 parameter which is the sensitivity of the asset price to the realized asset supply, e. From (13.5) note that the a 2 is primarily determined by the X 1 and the ‘s of the agents. Since the ‘s are held constant in these simulations, the difference in the two figures is simply a reflection of the different behaviour of the X’s in the two simulations. As expected, the interference parameters of the informed agents differ only slightly in the two simulations. The f3 are in Figure 6.2f for Simulation One and Figure 6.3f for Simulation Two. The i3’ are in Figures 6.2g and 6.3g. Recall that the informed agents need only learn the exogenous signal-divided relationship. This relationship is not affected by the - 178 - Genetic Algorithm Learning in a Rational Expectations Mod4 Learning, Evolution and SeOrganization noisiness of the risky asset supply. Since the learning is stochastic, the paths for these parameters are slightly different. 9 Note that small variations from the rational parameters (j3*) are not surprising since the fitness of the informed (as well as uninformed) is observed with error. The behaviour of the uninformed agents’ inference parameters is quite different in the two simulations. In Simulation One, where X, remains close to its GS equilibrium level, the flUs fl and Figure U also remain close to their rational expectations level. Figure 6.2h contains 0 6.2i contains i3 for Simulation One. Note that evolving flU and the rational e*(L)) lie on opposite sides of the rational expectations surprising, T g (ie. £*(LD). 3U (given by This is not if other uninformed agents (about half the agents in the population) are placing U) 1 then, knowing this, the rational too much weight on the price (reflected in the value of fl parameter puts less weight on the price. The distance between the evolving parameters and rational ones is small in terms of fitness (or utility). An agent who experimented with the rational parameters would, on average, have higher fitness. However, this increase is slight and, given the noisy observation of fitness, may not be realized. In Simulation Two, where most agents eventually become informed, the evolving inference parameters of the uninformed diverge from the rational levels. Figure 6.3h contains the U• However, Figure 6.3i of 0 results for fl demonstrates the divergence most clearly. After MI simulations begin with the same random seed. However, since the simulations have different outcomes in some respects (primarily in the X), the learning of the infonned also differs. - 179 - Genetic Algorithm Learning in a Rational Expectations Model Learning, Evolution and S4fOrganization the first 250 generations, L 1 differs from the initial state (Lo=L*) for which the initial value w is rational. The difference is large enough that, due to inference errors, the fitness of 131 of the uninformed is below that of the informed and adaptation leads to more informed ° The fitness difference persists even at X= 1. 1 agents. agents are informed. By generation 1600 almost all fl is an average of the U Recall that the parameter 1 13 of the uninformed agents. With fewer uninformed, the average is more volatile which partially explains the divergence. In addition, with so few uninformed agents, uninformed agents’ learning becomes almost pure experimentation. The fitness of most agents does not reflect the quality of the uninformed portion of their inference parameters since most agents are 5 TJ informed. Pure experimentation or just blind guessing generates the disperse 13 and from Figure 6.3b which is discussed next, it is unlikely that the experimentation will lead to an uninformed agent faring better than an informed agent. The analysis of stability in Section 4.ffl pointed to the importance of the supply noise parameter for the stability of the GS equilibrium. These two simulations demonstrate this result. In Simulation One, inference parameter experiments tend to result in small changes in the proportion of informed. This is similar to the situation in curve C in Figure 4.1. However, in Simulation Two, with smaller supply noise, experiments shortly lead to the situation where for all A, uninformed agents have lower fitness than the informed as in the 10 Recall selection or adaptation is proportional to fitness. The size of difference in fitness of the infonned and the uninformed is relatively small as compared to average fitness (given the choice of the initial wealth parameter, Wj). This determines the speed at which >.. moves to one. In other simulations (not presented here), where initial wealth is smaller (with the other parameters of Simulation Two), X moves to one much more quickly. -180- Genetic Algorithm Learning in a Rational Expectations Model Learning, Evolution and SeOrganizaiion case of curve D in Figure 4.1. Recall that the supply noise also determines the size of region where, uninformed fitness exceeds the informed fitness with X =1 (denoted as G in the Figure 4.1). Figure 6.3b for Simulation Two presents the difference in the average informed and uninformed fitnesses. This difference demonstrates that the analog to region G is small in Simulation Two. When X is near one (after generation 2000), occasionally an experimenting uninformed agent does better than the average informed. Most of the time, experimenting agents do worse and X does not move far from one. (B) Convergence Examples - Simulations Three and Four The previous two simulations began at the GS rational expectations equilibrium. Simulations Three and Four begin from (the same) randomly selected population learning state. In these simulations significant learning is required since agents are not endowed with ideal initial learning states. The parameters for this simulation are identical to Simulation One. Under these parameters, the GS equilibrium is stable. Simulation Three is an example of a simulation which tends to converge to the GS equilibrium. Simulation Four does not. The difference in the two simulations is a constraint on the movement of X 1 in the initial generations of the simulations. In Simulation Three, the proportion of informed agents is constrained to the GS rational expectation equilibrium level for the first 100 generations. - 181 - Learning, Evolution and Set/Organization Genetic Algorithm Learning in a Rational Expectations Model That is, in Simulation Three X=X*=0.5 for t< 100. In Simulation Four, X=0.5 for t<50. The role of this constraint is discussed further below.” Simulation Three converged to the GS equilibrium (see Figure 6.4a). After generation 100, however, the proportion of informed agents increased to 0.7 (around generation 500). It slowly declined back towards the GS equilibrium level and fmally reached X=0.5 around generation 2500. In contrast, Simulation Four did not converge to the GS equilibrium (see Figure 6.5a). After generation 50, X, increased quickly to 0.8 (by generation 200) and then continued to drift towards one reaching X= 1.0 by generation 1000. To understand the different outcomes in these simulations the average fitnesses of the informed and uninformed are shown in Figure 6.4b for Simulation Three and Figure 6.5b for Simulation Four (for the initial generations). Given the randomly selected learning states, average informed and uninformed fitness is initially quite low. The selection and crossover increase average fitness dramatically in the opening generations. The rapid learning in the these generations leads to large changes in the population learning state. Since rational inference parameters for the uninformed (ie. £*(L)) depend on the (rapidly changing) learning state, initially, the learning process of the uninformed is slower than for the informed. Due to the faster learning by the informed, regardless of the informativeness In general it appears that for the parameter values used in simulations Three and Four, >.,=O.5 must be imposed for (roughly) 75 generations in order that the learning converges to the OS equilibrium. For other parameters (tighter allowable ranges for inference parameters, for example), convergence was obtained without this constraint. 182- Genetic Algorithm Learning in a Rational Expectations Model Learning, Evolution and S4fOrganization of the price, informed agents’ fitness is above the uninformed. Even at X= 1, adaptive pressures will drive all agents to be informed. Figure 4.1. This is the situation with curve D in By the constraint that X =0.5, uninformed agents improve their inference 1 parameters without reducing the number of uninformed agents. Thus, in Simulation Three when the constraint on X 1 is removed (at generation 100), the situation resembles curve C in Figure 4.1. The X increases, but only to 0.7. As uninformed agents continue to learn and move closer to rational parameters (see fl -’ in Figure 6.4h or 0 U 131 in Figure 6.4i), their fitness increases above the informed and adaptation leads fewer agents to be informed. In Simulation Four, however, when the constraint on X, is removed earlier, the learning of the uninformed agents is not sufficiently advanced. Even as the proportion of informed agents approaches one, the uninformed are still doing worse than the informed. As discussed in connection with Simulation Two, at X= 1, the learning of the uninformed is based purely on random experimentation. The a’s and j3’s for Simulation Three are similar to those in Simulation One. They are presented in Figures 6.4c to 6.4i. The a’s and j3’s for Simulation Four are akin to those in Simulation Two and are presented in Figures 6.5c to 6.5i. previous two simulations, further discussion is not presented. ifi - Extensions - 183 - Given the parallels with the Learning, Evolution and S4jOrganizadon Genetic Algorithm Learning in a Rational Erpecrations Model There are many possible extensions to the genetic algorithm model presented. For example, varying the learning rate is important in the spatial evolution model presented in Chapter 2. In that model, the least fit agents alter their strategy each generation. The behaviour of the evolution, including whether or not the evolution found a Nash equilibrium depends on the number of agents who update their strategy. A related modification to genetic algorithm learning called “election” is made in Arifovic (1994a, 1994b). In these papers, before a strategy is adopted it is tested against recent experience. If the proposed strategy would have done better than the current strategy it is adopted, otherwise the old strategy is maintained. Both modifications are intended to control the level of co-evolution occurring in the population. In the context of the current model, if fewer agents are modifying learning states in a generation (by algorithm design or as a result of election), then the learning of the uninformed should be easier. The “target” for learning, £*(Lt) is easier to hit since L is less volatile. However, results suggest that because the fitness of an uninformed agent is highly correlated with other uninformed agents and the fitness an informed agent is highly correlated with other informed agents, both election and reducing the number of learning agents affects one group of agents disproportionately. This increases the volatility of X. Further investigation could prove interesting. One aspect of the simulation which appears particularly unrealistic is the poor learning about the uninformed inference parameters when most agents are informed. The informed agents possess the necessary information to study the potentially useful price-signal relationship. -184- Learning, Evolution and SeOrganization Genelic Algorithm Learning in a Rational Erpectadons Model However, the agents do not learn “how to be uninformed.” This occurs because fitness of an informed agent does not reflect her uninformed inference parameters. If an agent happened upon the rational uninformed parameters (via crossover or mutation in the genetic algorithm) but the agent was informed their value is not reflected in the probability that other agents imitate the learning state. There are several modifications to the genetic algorithm that might address this concern. The first is to modify the definition of fitness to incorporate how the informed agent would have done jf she had been uninformed. Alternatively, the agents could be given the opportunity to experiment with being uninformed for some of the R repetitions which constitute a generation.’ 2 However, if the size of the region where uninformed fitness is larger informed fitness with X, near one is small (see area G in Figure 4.1), then it is anticipated that the proposed modifications will have little effect on the simulations. IV - Conclusions The genetic algorithm implementation of the GS model demonstrates the stability results discussed in Chapters 4 and 5. The simulation results presents examples where a stochastic adaptive learning process does and does not converge to the GS equilibrium. These examples include cases where the initial condition of the learning process is the GS rational 12 . . . . This is computataonally more intensive than the current algorithm. Since the learning state could potentially change with each of the 1000 repetitions that occur within a generation, the aggregate fl’s and pricing parameter a’s would need to be calculated for each repetition instead of just once per generation. - 185 - Genetic Algorithm Learning in a Rational Expectations Model Leatrnng, Evolution and S4fOrganization expectations equilibrium. The effectiveness of the genetic algorithm in demonstrating the theoretical results is encouraging. Genetic algorithm learning may be a useful technique to investigate more realistic models where rational expectations solutions cannot be calculated. More general conclusions for the set of three chapters using the GS environment are included in the final chapter. -186- Learning, Evolution and Se0rganizadon Genetic Algorithm Learning in a Rational Erpectations Model Coefficient ofAbsolute Risk Aversion a 2.0 won 0.1 N 100.0 Average dividend [3 0.1 Sensitivity of dividend to signal 13 1.0 Initial wealth per repetition Average aggregate supply of risky asset • Variance of signal 0.0004 Variance of dividend + GS equilibrium proportion of informed agents Table 6.1 - Common Simulation Parameters -187- 0.0008 0.5 Learning, Evolution and SeOrganization Simulation Number Genetic Algorithm Learning in a Rational Expectations Model 1 2 3 4 Initial Population L L’ Random Random Supply Noise, a 10.0 1.0 10.0 10.0 Cost of information, c (chosen so that X*05) 0.0146027 0.0015848 0.0146027 0.0146027 Number of Generation that X=O.5 is imposed 1 1 100 50 L* stable L* not stable Converges to L Not Converge toL Figure 6.2 Figure 6.3 Figure 6.4 Figure 6.5 Summary of Simulation Results Location of Results Table 6.2 - Simulation Specj/ic Parameters and Results Summaiy - 188- Genetic Algorithm Learning in a Rational Expectations Model Learning, Evolution and S4f-Organizo.tion Population size N 1,000 Repetitions of one-period model per generation R 1,000 Length of simulation (number of generations) 5,000 Length of string that represents learning state 83 Crossover probability (f crossover does not occur, 0.7 the agent imitates one of the selected learning states) 0.0001 Mutation Probability (Probability that any bit is switched from 0 to 1 or vice versa) Expected number of mutations per generation 8.3 All four simulation begin with the same seed for random number generation Mapping Learning States to Strings Parameter Length of Binaty Representation Length ofRange Around the GS Parameters, 1 bit n.a. 20 bits each 0.2 1 bit each 0.0 6 I3i, 13), ‘, Table 6.3 - ‘y i3”, f3,M Common Simulation Parameters - 189 - - r(v) Genetic Algorithm Learning, Evolution and SeifOrganization Genetic Algorithm Learning in a Rational Expectations Model Figure 6.2 - Simulation 1 - Stable 1.1 0.9 -J 8 0.8 0.7 L 0 0.6 0 500 1000 1500 2000 2500 Gerierat I on Figure 6.2a - X, Proportion of Informed Agents -190- 3000 3500 4000 4500 5000 Genetic Algorithm Learning In a Rational Expectations Model Learning, Evolution and Sef-OrganlzaUon 0.018 a 0.016 0.014 0.012 0.01 ‘I 0.008 0.006 0.004 L C 0.002 a -0.002 = C -0.004 -0.006 -0.008 -0.01 di C a D -0.012 0 ai 0 BCDo CD aD -0.014 -0.016 Ô I aào1000 112b0116b012000124b0128b0132b0136b0140b0144b0148b01 1400 1800 4ó0 2O0 6O0 2200 2600 3000 3400 3800 4200 4600 5000 Generation C Figure 6.2b - Ave UI Ave UU - D/ference in Average Fitness: f t -f - 191 - Genetic Algoi*hm Learning in a Rational Expectations Model Learning, Evolution and SeifOrganization 0.1014 0.1013 0.1012 0.1011 JIh.II. 0.101 I. I i Li ,i ii 0.1009 I-’ I 0.1008 hi” 0. 0.1007 I I “I’ o.ioo I (I1II’ h I. ‘liII I(iI 1, ‘p 0.1005 0.1004 0.1003 0.1002 0.1001 0.1 0.0999 0.0998 6 I 4O0 860 I 1600 1200 a Figure 6.2c - 110, 80 2400 3200 4000 4800 2000 2800 3600 4400 Generation —Evolution Average Price -192- Genetic Algorithm Learning in a Rational Erpectaiions Model Learning, Evolution and SeOrganizaUon 0,98 0.979 0. 978 0.977 r 0.976 >‘ 0 0.975 > 0.974 > 0.973 0.972 C 0.971 LI 0.97 0.969 < 0.968 0.967 0. 966 0.965 0.964 800 0 400 1600 1200 0 RE 4000 4800 2400 3200 3600 4400 2000 2800 Generat I on —Evolution Figure 6.2d a , slope ofprice to y 1 - - 193 - Learning, Evolution and SefOrganlzation Genetic Algorithm Learning in a Rational F.rpectation Model -0.00148 - 0.00149 -0.0015 -0.00151 -0.00152 ,.- 0; •‘ -0.00153 -0.00154 -0.00155 >‘ ‘-‘ -0.00156 -0.00157 —0.00158 -0.00159 LI c’J a -0.0016 —0.00161 -0.00162 < -0.00163 -0.00164 —0.00165 —0.00166 -0.00167 -0.00168 6 I 460 800 1600 1200 0 Figure 6.2e - RE 2400 4000 4800 3200 2000 2800 3600 4400 Generation Evolution , Slope of Pnce to e 2 a -194- Genetic Algorithm Learning in a Rational Erpectations Model Learning, Evolution and S4fOrgwiization - 195 - Learning, Evolution and S4lOrganizasion Genetic Algorithm Learning in a Rational &pectations Model I .007 1 .006 .Ii 1 .005 I’ I I III 1 004 . . Li 1 003 . m 1 .002 1.001 I I’ IIll’ I I’ 1 o. ggg 0. 998 o 1400 I 200 800 112b0116b012000124b0128b0132b0136b0140b0144b0148b01 6O0 1000 1400 0 Figure 6.2g 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation RE —Evolution - - 196 - Genetic Algorithm Learning in a Rational &pectaslons Model Learning, Evolution and S4tOrganizadon 0.0041 0. 004 0. 0039 0.0038 0.0037 0.0036 0.0035 0.0034 0.0033 0.0032 m 0.0031 0.003 0.0029 0.0028 0,0027 0.0026 0.0025 0.0024 .,I 0.0023 ‘J [I’ lip 0.0022 0.0021 6 I I 460 860 12b0116b0120’00124b0128b0132b0136b0140b0144b0148b01 2O0 6b0 1000 1400 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation C Figure 6.2h - flE Rational I3oU -197- 0 Evolution Genetic Algorithm Learning in a Rational &pectations Model Learning, Evolution and SeOrganizaUon 0.972 0.971 0.97 0.969 0.968 0.967 0.966 0. 965 jr 0. 964 0.963 0.962 II If 0.961 0.96 o 2O0i 400SOc8001000 I12b0I16b0I200I24b0I28b0I32b0I36b0I40b0I44b0I48b0I 1400 1800 3400 4200 4600 2200 2600 3000 Generat I on D Figure 6.2i RE Rational - - 198 - 0 Evolution 3800 5000 Genetic Algorithm Learning in a Rational Expectations Model Learning, Evolution and SeOrganization Figure 6.3 - Simulation 2 Not Stable - 1.1 0.9 -J 1 0.8 :: I I I I I I I I I 0 500 1000 1500 2000 2500 3000 3500 4000 Generation Figure 6.3a - X, Proportion of Informed Agents - 199 - I 4500 5000 Learning, Evolution and SeOrganization Genetic Algorithm Learning in a Rational Expectations Model 0.1 a 0.09 0 a a 0 0.08 L 0 C 0.07 C D 0.06 0 a a a aaa 0 a a 0 a aa a a a a aa a a a a a E ‘I U’-, rI 0.05 U C 0.04 >‘ — 0.03 D 0.02 ‘4- 0.01 ir 0 -0.01 5 14O0 1860 I12b0l16b0I200I24b0I28b0I32b0I36b0I40b0I44b0I48b0I 600 1000 1400 a - j- a 200 Figure 6.3b a 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation Ave UI - Ave UU Difference in Average Fitness: f f” -200- Genetic Algorithm Learning in a Rational Eipectations Model Learning, Evolution and S4/Organizatlon 0.1008 0.1007 0.1006 Ii 0.1005 II 0.1004 a w 0.1003 I 0.1002 I r 0 a 0.1001 I “ 0.1 0.0999 0.0998 0.0997 0.0996 i I 4O0 8à0 I I 1600 1200 D Figure 6.3c - RE I 2400 2800 2000 Generat I on ———Evolution , Average Price 0 a - 201 - 4000 3600 4800 4400 Genetic Algorithm Learning in a Rational &pectations Model Learning, Evolution and S4tOrganization 1 012 I 01 1 . 1.01 1 009 1 .008 >‘ 0 1.007 1,006 1.005 > 1.004 1.003 1) 1.002 1.001 0.999 0.998 0.997 0.996 0.995 O I 4O0 860 1600 1200 U Figure 6.3d - 2400 3200 2000 2800 Generation RE Evolution , slope of price to y 1 a - 202 - 4000 3600 4800 4400 -0,0008 -0.0009 -0 001 -0.0011 >‘ -0.0012 —0.0013 t r -0.0014 4 —0.0015 -0.0016 -0. 0017 Generat I on Evolution RE Figure 6.3e - , Slope of Price to e 2 a - 203 - Learning, Evolution and S4jOrgamzation Genetic Algorithm Learning in a Rational Erpectations Model 0.1006 0.1005 0.1004 I 0.1003 ilii II i. i L ‘I rip m “ III 0.1002 0.1001 0.1 0,0999 o.ogge 0 I I I I I 4001800 112001160012000124001280013200136001400014400148001 2O0 6O0 1000 1400 1800 2200 2600 3000 3400 3800 4200 4600 5000 Genrat ion a RE —Evolution Figure 6.3f fl ’ 0 - - 204 - Learning, Evolution and S4fOrga,uzadon Genetic AlgoHthm Learning in a Rational Expectations Model 1.1 i.o9 1.08 1.07 1.06 1.05 - - - - - 1.04 1,03 1.02 - - 1.01 m 0.99 0.98 0.97 0.96 0.95 0.94 0.93 - - - - - - - 0.92 0.91 - 0.9 0 I I D 1Ygure 6.3g - I I I 400 I 800 1200 1600 2000 2400 2800 3200 3600 4000 4400 4800 2O0 60 1000 1400 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation RE —Evolution Ii’ - 205 - Learning, Evolution and SeOrganizadon 0.03 Genetic Algorithm Learning in a Rational Expectations Model - 0 0 6’o ø. 00 0.02 0 - 00 0 0 0000* 0 000 0 0.01 0 - 0 0 0 0 0 w 0,0 0 -0.01 0 0 0 0 0 0 0 -0.02 0 0 0 0 0 -0.03 -ào 2O0 0 00 aáo 600 1000 1400 1800 2200 2600 3000 3-400 3800 4200 4600 5000 Generation D Figure 6.3h - 0 RE Rational I3U - 206 - 0 Evolution Learning, Evolution and Seif-Organization o Genetic 4lgo,thm Learning In a Rational Expectations Model 972 0.971 0 97 0. 969 0.969 o.967 m 0,966 0.965 0. 964 I 0.963 ii r!hl III 0. 962 0.961 0.96 I 200 jo Il2boIl6boI2o’ooI24boI28boI32boI3EboI4oboI44boI4BboI 6O0 1000 1400 D RE 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation Rational Figure 6.3i-13 ’ 1 - 207 - 0 Evolution Learning, Evolution and S4torganization Figure 6.4 - Genetic Algorahm Learning in a Rational Erpectadons Model Simulation 3 Converges to GS Equilibrium - 1.1 0.9 -J Li 0.8 :.: I I I I I I I I I I I 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Generat on Figure 6.4a - X, Proportion of Informed Agents - 208 - Learning, Evolution and SeOrganization Genetic Algonthm Learning in a Rational Expectations Model 1.1 I 0.9 >‘ 0.8 D a, 0, a 0.7 - L a, > £ 0 a 0.6 0.5 - - 0 0 0.4 0.3 - - + 0.2 0 Generation Informed Figure 6.4b - Uninformed + Average Fitness f 1 and f” For the first 200 generation - 209 - Learning, Evolution and S4f Organization Genetic Algorithm Learning in a Rational Expectations Model 0.1008 0.1007 0.1006 Jill 0.1005 0.1004 w • 0.1003 . II Al ’’ 1 II 0.1002 0 I II, 0.1001 I ‘ 0.1 0.0999 0.0998 0.0997 0.0996 O I 4O0 800 1600 1200 C Figure 6.4c - RE 2400 3200 4000 4800 2000 2800 4400 3600 Generation Evolution a,, Average Price -210- Learning, Evolution and SeifOrganization Genetic Algorithm Learning in a Rational Expectations Model 0.99 0.98 >‘ 0 4-, 4-, 0.g > 4-, — 9 0.96 a) U) Li 0.95 LU - 0.94 0.93 800 4c!I0 1600 1200 I 24b0 2000 I 2800 Genet-at ion 0 Figure 6.4d - RE —Evolution , slope of price to y 1 a -211 - 40b0 3600 I 4400 48b0 Learmng, Evolution and SeOrganizadon Genetic Algonthm Learning in a Rational Erpectations Model -0.0011 —0,0012 °‘ 0 -0.0013 >‘ -0.0014 8 0.0015 .2- -0.0016 -0.0017 -0.0018 O I 460 860 16b0 1200 El Figure 6.4e - RE 24b0 32b0 2000 2800 3600 Generation —Evolution , Slope of Price to e 2 cr -212- 40b0 48b0 4400 Learning, Evolution and S4f-Organizadon Genetic Algothhm Learning in a Rational &pecro4ons Model 0. 103 0.102 - 0.101 - ::: 0.097 0.096 - - 0.095 0 I I I I 4150 8150 12001160012000124001280013200136001400014400148001 2O0 6O0 1000 1400 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation 0 RE Evolution Figure 6.4f fl ’ 0 - -213- Learning, Evolution and S4f Organization Genetic Algorithm Learning in a Rational Erpectadons Model 1 .004 1 .002 0.998 0. 996 0. 994 0. 992 m o.gg 0. 988 0. 986 0. 984 0.982 0.98 0. 978 0.976 0 8ll0 400 6b0 1000 1400 2O0 [1 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation RE —Evolution Figure 6.4g 13/ - -214- Learning, Evolution and S4fOiganization Genetic Algorithm Learning in a Rational Expectations Model o .015 0.014 0. 013 0 012 0.011 0.01 0. 009 0.008 0.007 a m 0.006 0.005 0. 004 0.003 0.002 0.001 0 -0.001 -0.002 -0,003 -0,004 Generation D Figure 6.4h RE — Rational - - 215- 0 Evolution Learning, Evolution and S4/Organization Genetic Algonthm Learning in a Rational &pectations Model 1 .03 1 02 1 01 I 0.99 m 0.98 0.97 0.96 0.95 0.94 0.93 D Figure 6.4i RE Rational - - 216 - 0 Evolution ____________ Learning, Evohalon and S4fOrgo.mzathn Figure 6.5 Genetic Algorithm Learning in a Rationol Expectations Model - Simulation 4 - Does Not Converges to GS Equffibrium 1.1 I 0.9 -J Li a 0.8 0.7 L 0 0.6 C 0.5 a a 0 a 0.4 0.3 0.2 0.1 0 I I I I I I 0 500 1000 1500 2000 2500 Cenerat on I1gure 6.5a - X, Proportion of Informed Agents -217- 3000 3500 4000 4500 5000 Learning, Evolution and S4fOrganization Genetic Algorithm Learning in a Rational Expecw.Uons Model 1 02 0.98 0 .96 0.94 0.92 D 6 L 0.9 0.88 0.86 0.84 0 0.82 6 0.8 0.78 0.76 0.74 0.72 0. 0 500 Generation lnforrrd lYgure 6. Sb - + Average Fitness f andf For the first 1000 generation -218- Uninformed 1000 Learning, Evolution and S4/Organization Genetic Algorithm Learning in a Rational &pectations Model 0. 104 0. 103 0.102 0.101 0.1 0.099 I-’ ‘ 0.098 0 0.097 0.096 0 0.095 r 0.094 0.093 0.092 0.091 0.09 0.089 0.088 6 I 4ö0 860 1600 1200 £1 Figure 6.5c - 2400 3200 4000 4800 2000 2800 3600 4400 Generation RE Evolution , Average Price 0 a - 219 - Learning, Evolution and S4/Organization IYgure 6.5d - Genetic AlgoHthm Learning in a Rational Expectations Model , slope of price to y 1 a - 220 - Learning, Evolution and SeOrganizaUon Genetic Algoi*hm Learning in a Rational &pectadons Model -0.0008 - ,- 0.0009 -0.001 0 > -0.0011 > -0.0012 C U) Li c’j -0.0013 < -0.0014 -0.0015 —0.0016 GenGrat I on RE Figure 6.5e - Evolution , Slope of Price to e 2 a - 221 - Learning, Evolution and Sdf-Organization Genetic Algodthsn Learning in a Rational &pectatione Model 0.103 0. 102 0.101 0.1 0.099 0 098 0. 097 0.096 0.095 Generat I on 0 RE Evolution Figure 6.5f (3’ - - 222 - Learning, Evolution and S4fOrganization Genetic AlgoHthm Learning in a Rational Expectations Model 1.1 1.09 - 1.08 1.07 1.06 1.05 1.04 1.03 1.02 1.01 - - - - - - - 0.97 0.96 0.95 0.94 0.93 0.92 0.91 0.9 - - - - - I I I 4à0 8à0 12001160012000124001280013200136001400014400148001 200 600 1000 1400 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation a Figure 6.5g RE Evolution - - 223 - Genetic Algorithm Learning in a Rational Expectations Modal Learning, Evolution and S4fOrganization 0.11 0.1 0.09 0.08 0.07 0.06 0.05 0 04 . 0.03 0.02 0.01 o m 0 -0.01 -0.02 -0.03 -0.04 -0.05 -0.06 -0.07 -0.08 -0.09 -0.1 -0.11 0 Figure 6.5h RE — Rational - - 224 - Evolution Learning, Evolution and S4f Organization Genetic Algorithm Learning in a Rational Expectations Model 1.1 1 .08 1.06 • 1 .04 o 8 I 0 8 0oo 1.02 q 0 0 0 0 0 000 I 0.98 0,96 0 94 . 0.92 0.9 0.88 0.86 0.84 o i 4O0J8O0112b0I16b0I2d00I24b0I2Bb0I32b0I36b0I40b0I44b0I48J0I 200 6 0 1000 1400 D Figure 6.5i RE 1800 2200 2600 3000 3400 3800 4200 4600 5000 Generation Rational - - 225 - 0 Evolution Chapter 7 Conclusions odern financial theory almost always begins by analyzing the decisions of 4 N individuals. In this way aggregate behaviour such as that observed in stock prices or interest rates can be tied to economic primitives like preferences and information. However, the complete rationality assumption which is typically used to analyze individual behaviour is a weak link in the analytic chain. The evidence from cognitive psychology demonstrates that assumptions of complete and unbounded rationality are simply not correct models for human decision making. 1 However, while individuals may not satisfy many of the assumptions underlying complete and unbounded rationality, individual choice is not random and is 1 For example see Shoemaker (1982) and Camerer and Weber (1992) for discussion of the expected utility hypothesis for decision making under uncertainty. Shafir, Simonson and Tversky (1993) is an example of the well known phenomenon that how choices are framed influences decision. This evidence is inconsistent with the fundamental premise that agent preferences can be usefully represented with a binary relation. Finally see Vallone and Tversky (1985) for evidence of the inability of people to understand random sequences. - 226 - Learning, Evolution and S4/Organization Conclusions usually reasonable. Therefore complete rationality may be a good enough approximation of individual decisions. This may be especially true in large markets where individual anomalies have little effect on aggregate behaviour. This thesis examines reasonable, but not fully rational, individuals who learn by an adaptive and stochastic process. In this thesis, the behaviour of models with stochastic adaptive individuals is contrasted with complete rationality in two environments. In the first section of the thesis interaction between individuals’ decisions is limited and direct. People face their neighbours in a repeated prisoner’s dilemma. In the genetic algorithm simulation model of Chapter 2, the number of agents replaced in a generation, a parameter interpreted as an indication of the rate of learning, is important. In the infinitely repeated game the learning rate affects the equilibrium level of payoffs (ie. affects which equilibrium is selected). In the fmitely repeated game the learning rate determines whether or not the system converges to the unique Nash equilibrium. Interestingly, high average fitness in the fmitely repeated game is associated with a high degree of volatility in the average fitness and strategies used by the individuals. In Chapter 3 this phenomenon is explored analytically. The analog of the learning rate in this simpler model is the probability that an individual abandoned a successful strategy (ie. non-fitness based replacement). In this model the relative probabilities (even if arbitrarily small) of this stochastic shocks determine whether or not the learning process will converge to the unique Nash equilibrium. - 227 - Conclusions Learning, Evolution and SeOrganization The implications of Chapters 2 and 3 depend on how one views Nash equilibrium or rational behaviour. If a stochastic adaptive learning process is to be used as a defence of rational economic behaviour and the Nash equilibria that this implies, then the complexities of co evolution need to be addressed directly. Not every model of adaptive learning leads to behaviour which is consistent with complete and unbounded rationality. A similar caveat is warranted if an adaptive learning algorithm, such as a genetic algorithm, is to be used to find equilibria of complex less tractable games. On the other hand, if one does not see Nash equilibrium as a necessary outcome of an adaptive process, then the prisoner’s dilemma model offers a bench-mark for considering models where evolution is used to generate “reasonable,” but not necessarily optimal, behaviour in more complex environments. For example, Arthur (1993) suggests a stochastic adaptive learning algorithm to model human choice. In his model, the agent is learning about a fixed environment. The complexities of co-evolution presented in the prisoner’s dilemma model must be addressed when extending Arthur’s model to agents learning about an environment influenced by the behaviour of other learning agents. The difficulties of co-evolutionary models suggests that the development of a “flight simulator” that accurately reflects an entire economic system and in a such a way as to allow policy analysis will be difficult? The second portion of the thesis investigates stochastic adaptive learning in a non-strategic yet co-evolutionary environment. This section develops an asymmetric information, one 2 The flight simulator of economic activity is discussed in Freeman (1992). - 228 - Learning, Evolution and S4/Organization Conclusions period, single risky asset portfolio choice model based on Grossman and Stiglitz (1980). The main finding of these three chapters is that the appropriateness of the rational expectations (or complete rationality) equilibrium depends upon the level of noise in the economy (in the form of noise traders) relative to the level of experimentation in the individual’s learning processes. The discussion of this relationship begins in Chapter 4 by constructing a learning process which converges to the rational expectations equilibrium and concludes with a discussion of the stability of the GS equilibrium to an adaptive learning process where experimentation does not vanish. Chapter 5 develops a deterministic representation of a stochastic adaptive learning process to formally develop the link between noise trading, experimentation and the GS equilibrium. Finally, Chapter 6 demonstrates the stability result in a more general environment using a genetic algorithm as an example of a stochastic adaptive process. The chapter presents examples where the learning process does and does not converge to the GS equilibrium. The effectiveness of the genetic algorithm in demonstrating the theoretical results is encouraging. Genetic algorithm learning may be a useful technique to investigate more realistic models where rational expectations solutions cannot be calculated. However, as is the case in the first section of the thesis, the fact that adaptive learning process need not converge to a rational expectations equilibrium reduces the ability of the genetic algorithm to be used as a tool for solving for equilibria. The key ingredients for stability of the GS rational expectations equilibrium are the level of noise in the asset supply and size of agents’ experiments. It is not immediately obvious how - 229- Conclusions Learning, Evolution and Se’Organization these variables could be measured empirically. There are two views on the underlying cause of the uncertainty about asset supply (or justifications for the assumption). One view is that some agents trade randomly for un-modeled reasons (often called liquidity traders). This type of noise is difficult to measure. An alternative view is that noise is introduced by a small number of irrational traders. De Long et al (1990) develop the discussion of Black (1986) that agents trade based on inferences from irrelevant data. This second type of noise would seem to be most plausible in equity markets, where valuation is complex and differences of opinion are common. Although the GS model presented here is a singleperiod model, it is interesting to note that higher levels of noise in equity markets increase the likelihood that the GS equilibrium is stable and therefore a good approximation for a complex economy. Estimating the size of experimentation of agents in the economy would also be challenging and potentially insightful. In a more general model than the one presented here, market volume might give some indication of the size of experiments. When an agent alters her rule-of-thumb, affecting the desired portfolio, trade will occur. Exactly how trading volume caused by new information can be separated from trade arising from experimentation and adaptation is an open issue. The issue is open particularly because there is no consensus theory of market volume (see Karpoff, 1987). Alternatively, more direct evidence on the learning process could be gathered from a laboratory setting. The GS model with stochastic adaptive learning is well suited to laboratory investigation. By controffing the information - 230- Learning, Evolution and S41 Organization subjects observe about past learning states and fitness, the learning process could be characterized. Such laboratory experiments have been successful in similar models in understanding rational expectations equilibria. 3 Throughout the thesis the structure of the economy and the learning process have been treated as exogenous. It is important to note that the learning process and the structure of the economy are not truly independent. For example, in the asset pricing environment, presumably the market mechanism that determines prices will not develop independently of characteristics like level of noise and learning process. In the GS model, the process by which the equilibrium asset price is determined is not specified. The typical market clearing assumption is used to focus on learning and the process by which the equilibrium price conveys information. The fact that stochastic adaptive learning can lead to a rational expectation equilibrium is important because it helps reduce the concern about the realism of the rational expectations assumptions. However, if learning truly is stochastic and adaptive then it will affect the optimal design and evolution of the market mechanism. Similarly in the prisoner’s dilemma model the spatial structure will be the outcome of individual decisions and therefore cannot evolve independently of the learning process. 4 A model which can incorporate the connection between learning and economic structure would See Plott and Sunder (1982) and Forsythe et al (1982) for examples of laboratory asset markets with asymmetric information. Arifovic (1994b) compares genetic algorithm simulations with an experimental study of an exchange rate model with multiple equilibria. Tesfatsion (1995) and Stanley, Ashlock and Tesfatsion (1993) make this point strongly. -231- Conclusions Learning, Evolution and SeifOrganization be complex and potentially unmanageable. However, understanding the self-organizing connections between learning and economic structure would be a significant contribution to understanding modern economies and the boundedly rational humans whose decisions create them. - 232 - Bibliography Aghion, P., P. Bolton, C. Harris and B. Juffien, 1991, Optimal Learning by Experimentation, Review of Economic Studies, 58, 621-654. Arifovic, J., 1994a, Genetic Algorithm Learning and the Cobweb Model, Journal of Economic Dynamics and Control, 18, 3-28 Arifovic, J., 1994b, The Behavior of the Exchange Rate in Genetic Algorithm and Experimental Economies, manuscript, Simon Fraser University. Arifovic, J., 1989, Learning by Generic Algorithms in Economic Environments, Sante Fe Institute Working Paper Number 90-001. Arifovic, J. and B.C. Eaton, 1994, Coordination and Genetic Learning, manuscript, Simon Fraser University. Arthur, W.B., 1993, On Designing Economic Agents that Behave Like Human Beings, Journal of Evolutionary Economics, 3, 1-22. Axeirod, R.A., 1987, The Evolution of Strategies in the Iterated Prisoner’s Dilemma, in L. Davis (ed.), Genetic Algorithms and Simulated Annealing, Pitman, London. Axeirod, R.A., 1984, The Evolution of Co-operation, Basic Books, New York, NY. Bak, P. and K. Chen., 1991, Self-Organized Criticality, ScientificAmerican, January, 46-53. Bak, P., C. Tang and K. Wiesenfeld, 1988, Self-Organized Criticality, Physica Review A, 364-374. Bak, P., K. Chen and M. Creutz, 1989, Self-Organized Criticality in the ‘Game of Life’, Nature, 342, 780-782. - 233 - Learning, Evolution and SeOrganization Bibliography Bala, V. and S. Goyal, 1993, A Theory of Learning with Heterogeneous Agents, manuscript, McGill University. Beasley, D., D.R. Bull and R.R. Martin, 1993, An Overview of Genetic Algorithms: Part 2, Research Topics, University Computing, 15, 170-181. Bergin, J. and B.L. Lipman, 1994, Evolution with State-Dependent Mutations, Manuscript, Queen’s University. Bernardo, A.E., and K. Judd, 1995, The Relationship Between Trading Volume and Price Changes, manuscript, University of California at Los Angeles. Binmore K.G., and L. Samuelson, 1995, Evolutionary Drift and Equilibrium Selection, manuscript, University of Wisconsin. Binmore K.G. and L. Samuelson, 1992, Evolutionary Stability in Repeated Games Played by Finite Automata, Journal of Economic Theory, 57, 278-305. Binmore, K. G., 1988, Modelling Rational Players II, Economics and Philosophy, 4, 9-55. Black, F., 1986, Noise, Journal of Finance, 529-543 Blume, L. and D. Easley, 1992, Rational Expectations and Rational Learning, manuscript Cornell University. Blume, L. and D. Easley, 1992, Evolution and Market Behavior, Journal of Economic Theory, 58, 9-40. Blume, L.E., M. Bray and D. Easley, 1982, Introduction to the Stability of Rational Expectations Equilibrium, Journal of Economic Theory, 26, 313-3 17. Boylan R.T., 1992, Laws of Large Numbers for Dynamical Systems with Randomly Matched Individuals, Journal of Economic Theory, 57, 473-504. - 234 - Learning, Evolution and Sef0rganization Bibliography Bray, M., 1982, Learning, Estimation and the Stability of Rational Expectations, Journal of Economic Theory, 26, 318-339. Bray, M.M. and N.E. Savin, 1986, Rational Expectations Equilibria, Learning and Model Specification, Econometrica, 54, 1129-1160. Bray, M. and D. Kreps, 1987, Rational Learning and Rational Expectations, in G. Feiwel (ed.), Arrow and the Accent of Modern Economic Theory, MacMillan Press, Houndmills. Cabrales, A., and T. Hoshi, 1992, Evolutionary Consequences of Noise Traders, Chartists and Other Non-Rational Investors, manuscript, University of California, San Diego. Camerer, C. and M. Weber, 1992, Recent Developments in Modelling Preferences: Uncertainty and Ambiguity, Journal of Risk and Uncertainty, 5, 325-386. Danielson, P.A., 1992, How to Evolve AMoral Players: Artificial Morality and Genetic Programming, manuscript, University of British Columbia. De Long, J.B., A. Shleifer, L.H. Summers and R.J. Waldmann, 1990, Noise Trader Risk in Financial Markets, Journal of Political Economy, 98, 703-737. Duffy, J. and J.B. Bullard, 1994, A Model of Learning and Emulation with Artificial Adaptive Agents, manuscript, University of Pittsburgh. Easley, D. and N.M. Kiefer, 1988, Controlling a Stochastic Process with Unknown Parameters, Econometrica, 56, 1045-1064. Easley, D. and N.M. Kiefer, 1988, Controlling a Stochastic Process with Unknown Parameters, Econometrica, 56, 1045-1064. Evans, G.W. and S. Honkapohja, 1994, Economic Dynamics with Learning: New Stability Results, manuscript, University of Edinburgh. - 235 - Learning, Evolution and SeOrganization Bibliography Fama, E.F., L. Fisher, M.C. Jensen and R. Roll, 1969, The Adjustment of Stock Prices to New Information, International Economic Review, 10, 1-21. Fama, E.F., 1991, Efficient Capital Markets:ll, Journal of Finance, 46, 1575-1617. Forsythe, R., T.R. Paifrey and C. Plott, 1982, Asset Valuation in an Experimental Market, Econometrica, 50, 538-567 Freeman, D.H., 1992, Is Management Still a Science? Harvard Business Review, 70, 6, 2638. Friedman, D., 1991, Evolutionary Games in Economics, Econometrica, 59, 637-666. Fudenberg, D and J. Tirole, 1991, Game Theory, MIT Press, Cambridge, MA. Gale, J., K.G. Binmore and L. Samuelson, 1995, Learning to be Imperfect: The Ultimatum Game, Games and Economic Behaviour, 8, 56-90. Goldberg, D., 1989, Genetic Algorithms in Search. Optimization and Machine Learning, Addison-Wesley, Reading, MA. Grossman, S.J. and J.E. Stiglitz, 1980, On the Impossibility of Informationally Efficient Markets, American Economic Review, 70, 393-408. Grossman, S.J., 1976, On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information, Journal of Finance, 31, 573-585. Guesnerie, R., 1992, An exploration of the Eductive Justification of the RationalExpectations Hypothesis, American Economic Review, 82, 1254-1278. Hogarth R.M. and M.W. Reder (eds.), Rational Choice: The Contrast between Economics and Psychology, University of Chicago Press, Chicago, IL. - 236 - Learning, Evolution and S4f Organization Bibliography Holland, J.H., 1992, Genetic Algorithms, Scientific American, July, 66-72. Holland, J.H., 1975, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbour, MI. (1992 edition, MiT Press, Cambridge, MA). Jackson, M. 0., 1991, Equilibrium Price Formation and the Value of Private Information, Review of Financial Studies, 4, 1-16. Jefferson, D., R. Collins, C. Cooper, M. Dyer, M. Flowers, R. Korf, C. Taylor and A. Wang, 1991, Evolution as a Theme in Artificial Life: The Genesis/Tracker System, in C. G. Langton, C.Taylor, J.D. Farmer and S. Rasmussen (eds.), Artificial Life II. the Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Addison-Wesley, Redwood, CA. Jeffrey, R., 1965, The Logic of Decision, McGraw Hill, New York. Jordan, J. S., 1991, Bayesian Learning in Normal Form Games, Games and Economic Behaviour, 3, 60-8 1. Kalai, E. and E. Lehrer, 1993, Rational Learning Leads to Nash Equilibrium, Econometrica, forthcoming. Kalai, E. and W. Stanford, 1988, Finite Rationality and Interpersonal Complexity in Repeated Games, Econometrica, 56, 397-4 10. Kandori, M., G.J. Mailath and R. Rob, 1993, Learning, Mutation, and Long Run Equilibria in Games, Econometrica, 61, 29-56. Karpoff, J.M., 1987, The Relation Between Price Changes and Trading Volume: A Survey, Journal of Financial and Quantitative Analysis, 22, 109-126. - 237 - Bibliography Learning, Evolution and SeOrganization Kauffman, S.A. and S. Johnsen, 1991, Co-evolution to the Edge of Chaos: Coupled Landscapes, Poised States and Co-evolutionary Avalanches, Journal of Theoretical Biology, 149, 467-505. Kirchamp, 0., 1194, Spatial Evolution of Automata in the Prisoners’ Dilemma, Manuscript, University of Bonn. Kohlberg, E. and J.F. Mertens, 1986, On the Strategic Stability of Equilibria, Econometrica, 50, 863-894. Kosa, J., 1992, Genetic Programming and Co-evolution of Computer Programs, in C.G. Langton, C.Taylor, J.D. Farmer and S. Rasmussen (eds.), Artificial Life II. the Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Addison-Wesley, Redwood, CA. Kraus, A. and M. Smith, 1994, Beliefs about Beliefs, manuscript, University of British Columbia. Kreps, D., 1990, A Course in Microeconomic Theory, Princeton University Press, Princeton, NJ. Langlois, R.N. (ed), 1986, Economics as a Process: Essays in the New Institutional Economics, Cambridge University Press, Cambridge, England. Langton, C.G., 1992, Introduction, in C.G. Langton, C.Taylor, J.D. Farmer and S. Rasmussen (eds.), Artificial Life II. the Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Addison-Wesley, Redwood, CA. Linn, S.C. and B.E. Stanhouse, 1992, Learning and the Dynamic Behavior of Risky Asset Prices, manuscript, University of Oklahoma. - 238 - Bibliography Learning, Evolution and S4fOrganization Lucas, R.E., 1986, Adaptive Behavior and Economic Theory, in Hogarth R.M. and M.W. Reder (eds.), Rational Choice: The Contrast between Economics and Psychology, University of Chicago Press, Chicago, IL. Mailath, G.J., 1992, Introduction: Symposium on Evolutionary Game Theory, Journal of Economic Theory, 57, 259-277. Marcet, A. and T. Sargent, 1988, The Fate of Systems with “Adaptive” Expectations, American Economic Review, 78, 168-171. Marimon, R., E. McGrattan and T.J. Sargent, 1990, Money as a Medium of Exchange with Artificially Intelligent Agents, Journal of Economic Dynamics and Control, 14, 329-373. Marks, R.E., 1989a, Breeding Hybrid Strategies: Optimal Behaviour for Oligopolists, Australian Graduate School of Business, Working Paper 89-006. Marks, R.E., 1989b, Niche Strategies: The Prisoner’s Dilemma Computer Tournaments Revisited, Australian Graduate School of Business, Working Paper 89-009. Maynard Smith, 1982, Evolution and the Theory of Games, Cambridge University Press, Cambridge. Milgrom, P., and J. Roberts, 1991, Adaptive and Sophisticated Learning in Normal Form Games, Games and Economic Behaviour, 3, 82-100. Miller, J.H., 1989, The Co-evolution of Automata in the Repeated Prisoner’s Dilemma, Sante Fe Institute Working Paper 89-003. Miller, J.H., 1989, The Co-evolution of Automata in the Repeated Prisoner’s Dilemma, Sante Fe Institute Working Paper 89-003. Morrison, D.F, 1990, Multivariate Statistical Methods. Third Addition, Magraw-Hills, New York, NY. - 239 - Bibliography Learning, Evolution and S4fOrganization Nachbar, J.H., 1992, Evolution in the Finitely Repeated Prisoner’s Dilemma, Journal of Economic Behaviour and Organization, 19, 307-326. Nachbar, J.H., 1990, Evolutionary Selection Dynamics in Games: Convergence and Limit Properties, International Journal of Game Theory, 19, 59-89. Plott, C.R. and S. Sunder, 1982, Efficiency of Experimental Security Markets with Insider Information: An Application of Rational-Expectations Models, Journal ofPolitical Economy, 90, 663-698 Rosenzweig, M.L., J.S. Brown and T.L. Vincent, 1987, Red Queens and ESS: The Co Evolution of Evolutionary Rates, Evolutionary Ecology, 1, 59-94. Rosser, J.B. Jr., 1992, The Dialogue Between the Economic and Ecological Theories of Evolution, Journal of Economic Behavior and Organization, 17, 195-2 15. Sargent, T.J., 1993, Bounded Rationality in Macroeconomics, Oxford University Press, Cambridge, MA. Savage, L.J., 1954, The Foundations of Statistics, Wiley, New York. Selten, R., 1975, Re-Examination of the Perfectness Concept for Equilibrium in Extensive Games, International Journal of Game Theory, 4, 25-55. Shafir, E., I. Simonson and A. Tversky, 1993, Reason-Based Choice, Cognition, 49, 11-36. Shoemaker, P.J.H., 1982, The Expected Utility Model: Its Variants, Purposes, Evidence and Limitations, Journal of Economic Literature, 20, 529-563. Simon, H.A., 1959, Theories of Decision-Making in Economics and Behavioral Science, American Economic Review, 49, 253-283. - 240 - Learning, Evolution and S4f Organization Bibliography Simon, H.A., 1955, A Behavioral Model of Rational Choice, Quarterly Journal of Economics, 69, 99-118. Simon, H.A., 1987, “Behavioral Economics”, “Bounded Rationality”, and “Satisficing” in J. Eatwell, M. Milgate and P. Neuman, (eds.), The New Palgrave, Macmiffian Press, London. Skyrms, B., 1994, Darwin Meets The Logic ofDecision: Correlation in Evolutionary Game Theory, Philosophy of Science, 61, 503-523. Skyrms, B., 1990, The Dynamics of Rational Dynamics, Harvard University Press, Cambridge, MA. Slade, M.E. and B.C. Eaton, 1991, Evolutionary Equilibrium in Market Supergames, manuscript, University of British Columbia. Stanley, E.A., D. Ashlock and L. Tesfatsion, 1993, Iterated Prisoner’s Dilemma with Choice and Refusal of Partners, in C. Langton (ed.) Artificial Life ifi, Addison-Wesley (forthcoming). Stokey, N.L. and R.E. Lucas, 1989, Recursive Methods in Economic Dynamics, Harvard University Press, Cambridge, MA. Taylor, P.D., and L.B. Jonker, 1978, Evolutionarily Stable Strategies and Game Dynamics, Mathematical Biosciences, 40, 145-156. Tesfatsion, L., 1995, A Trade Network Game with Partner Selection, working paper, Iowa State University. Thompson, R., 1985, Conditioning the Return Generating Process on Firm-Specific Events: A Discussion of Event Study Methods, Journal of Financial and Quantitative Analysis, 20, 15 1-168. - 241 - Learmng, Evolution and SeOrganization Bibliography Vallone, R. and A. Tversky, 1985, The Hot Hand in Basketball: On the Misperception of Random Sequences, Cognitive Psychology, 17 295-314. van Damme, E., 1987, Stability and Perfection of Nash Equilibria, Springer-Verlag. Vives, X, 1993, How Fast do Rational Agents Learn? Review ofEconomic Studies, 60, 329347. Vose, M.D., 1991, Generalizing the Notion of Schema in Genetic Algorithms, Artificial Intelligence, 50, 385-396. Waitman, P., 1986, A Second Course in Differential Equations, Academic Press, Orlando, FL. Wang, J., 1993, A Model of Intertemporal Asset Prices Under Asymmetric Information, Review of Economic Studies, 60, 249-282. Young, H.P., 1993, The Evolution of Conventions, Econometrica, 61, 57-84. 242 - Appendix A Proofs for Chapter 3 - Proof of Proposition 1 The replicator dynamic in (3.5) for the prisoner’s dilemma (with p denoting the proportion of agents playing C) is = i (1 = PC(l PC) — (u(C) u(D)) — - (A. 1) —p)A(p) where (p) is the difference in the expected utilities of playing C versus D. The probability an agent playing a, meets an agent playing C is denoted lrC(pC;a,). The z(p) is calculated as follows: = = c c(p ; C) c T(p ; C) — - e c(p ;D) — d (1 c(P D)) — (e d) ‘cQ’ ;D) - - (A.2) d In order forp* to be a stable dynamic equilibrium, the following two conditions are required (A.3) =0 P -P _4_ < dp dt) (A.4) 0 PC=P These conditions translate into conditions on (p). First, since PC(i-PC) >0, (A.3) implies (A.5) = 0 From (A.2), this condition can be written as cQ ;C) c(P = f)) (A.6) + Note that in order for ‘w(p;C) <1 it is sufficient for I(e—d C pD) < (A.7) c Second, calculating the derivative in (A.4) and using the facts thatp(1-p) >0 and (pD=0, this condition can be written as dA(p) dp = c p *;C) - 243 - - (e-d) p *;D) < 0 (A.8) Learning, Evolution and S4Organizalion Appendix A where the prime denotes the derivative with respect to p. Note that a sufficient condition for (A.8) to be satisfied is ir(p ;C) =0 and ‘7rC(p ;D) >0. The following equations construct piece-wise-linear functions that satisfy (A.6) and (A.8) for E (0,1). They are shown in Figure B. 1. = (c\f e—d) k d\ c,i PP * (A.9) ÷g 0 g p 1 p 0 h = + (!::)p * p * — € * +e pp (A. 10) +h 1 h p 2 where g , h 0 1 and h 2 are (easily calculated) constants chosen to give continuity and , g 0 , h 1 ) =1. The e >0 is chosen small enough to ensure ir(p;a) E [0,1]. 1 ) =0 and ir(1 ,a 1 T(0,a These functions satisfy all but the smoothness criterion of the proposition. They are differentiable everywhere but at the kinks. Since these kinks are at least distance away from p they can be smoothed (for example using the logistic function) and still meet the requirements of (A.5) and (A.8). - 244 - Learning, Evolution and S4forganizaiion Appendis A 1.0 d C + (ed)p* rjj;C) C - 0 1.0 pC Figure B.1 - Functions which support p* as a stable dynamic equilibriwn The functions arefrom (A. 9) and (A. 10). The dashed lines indicate smoothing ofthe piece-wise linearfunctions. Markov Transition Probabifities The Markov transition matrix M with entries M as the probability of transiting to state j given the current state is state i. The Markov matrix elements, from (3.9) are as follows. M 1 1 12 M 13 M 14 M 15 M 16 M = = = = = = - ’/21L 1 ½ 0 0 0 0 245 - (A.11) ____ Learning, Evolution and SeOrganizadon 21 M = 22 M = Appendix A 2c /2M) 1 )+ 4 2 1 ’ c+e I 2c +½)+4,7((1)1 e Aq)((1— (1— 1 +½jA) 2c+e (2c+ej e j r M, (A.12) = M )[ej 4 ((l_, 27 l/ = +½) M=O 26 = 0 M 31 M = 0 32 M = (1_1/zll)((1_I.L)(_S_j M33 = d ½((1-) 1 +½)+(l_½n)((l_) . 1 .......J +½) [;j M = 0 M35 = M = +½) c+ d I (A.13) 1 1 d +½) ½fl((1—L) 0 M 4 2 43 M M 45 M M 5 1 52 M 53 M 54 M 55 M 56 M = 0 = ½(½) = 0 = = = = = = = = ½(1—½)+(1—½)(½) (1_½,7)(1_1/2) (A.14) 0 0 ‘Aq(½) ½i(½) (1—3)(½)+4,(1—’/2) (1—34)(1—½) - 246 - (A.15) Appendix A Learning, Evolution and SeOrganizadon M 6 1 = 0 62 = 0 M 63 = 0 M MM=O 65 = M 56 = 1-/ M (A.16) Proof of Proposition 2 First the invariant distribution of M in (3.9), with the noise levels as specified in (3.11), is =p*IM yielding the function p*(;K,K). Each of calculated. This is done by solving the six states of the solution has the following form (1=1,.. ,6 p*? p.*(•;KK) ) 3 ID crexp(-/ D ahexp(-hfl J = (A.17) where the cx’s are functions of the game payoffs (c,e,d) and the j3’s are linear iç and K and are strictly positive. Note that each of the six terms in p has the same denominator. As gets large each of the terms in the numerator and denominator of (A. 17) approach zero. The limit of the expression, therefore, will depend only on the terms with the largest exponent (ie. the terms with the smallest fi). Using 0(z) to indicate terms with order strictly less than -z, the limiting distnbution, p is: * pi(C;Kq,Kp) = 1 B 32(c+d)(2c+e) exD[—(3÷K‘1 )C1 + 0(3+K) —c 32(c÷d) ext,[—(3+KI)C1j + Q(3+K) —e 32(2c+e) exp[—(3+KI) + Q(3+K) 1 J I I. (A.18) — —(3c+2d)e 64(c+d)(2c+e) exnlL —(3 +Kfl +KI’ )C1 + 0(3 +K +K) K] + 0(1+K +K) j —e 82c+e) exp[—(1÷K +K -e 4(2c+e) . exp[-(K +Kp )Cj “ “ + “ 0(K11 +K) j where B is given by 1 Mathematica was used to solve this system of equations and simplit the results to find the highest order terms. - 247 - Learning, Evolution and S4f Organization B= Appendix A 2 —C xp[ -(3+KqXj + —c(2c+e) —e(c +d) 32(c+d)(2c+e) exp[ —(3+K)C] + —e 4(2c+e) exp{_(Kq+KpX] + 32(c+d)(2c+e) (A.19) O(l+Kq+Kp) Note that it is only in the denominator where determining which term (or terms) dominates depends on K and K. In particular, the dominant terms will depend on the magnitude of K and K relative to 3 and each other. There are 11 possible cases for the limit of p(KK) as -oo They are listed in Table B. 1. Case Numbering K,<3 3 K, = 1 Iç>3 KM<3 1 2 3 K=3 4 5 6 K>3 7 8 9,10,11 Table B. 1- Nwnbering of Cases There are 11 cases to consider. Note thefinal cell represents 3 cases depending on the relationship between K, and K,. Note that the final cell of the table includes three distinct cases depending on the relationship between K and K. The proof proceeds by simply taking the limits in the various cases. Case 1 K,<3 and K Consider state six. - —e Inn p ( C;K,)Iç ) 6 = lim 4(2c+e) exp[ —(Ic +K)] c + O(K +K) =1 exp —uç +içç + oIç (A.20) +iç Since probability of state six goes to one, the probability of the other states goes to zero. Case7. 8and9-K,>3andKK Consider state one. - 248 - Appendix A Learning, Evolution and SeOrganizalion -c 2 exp[ —(3÷19CJ 32(c+d)(2c+e) q,K, 1 pj(s;K urn ) + 0(3+K) = = — 32c+d2c+e exp[ —(3 ÷K)] + 1 (A.21) 0(3+19 Again, since this probability is one, the others go to zero. Case 3. 6and lO-K>3andKr>Kp In this case, consider states two and three. lim —c exp[_(3+K)C] + 0(3+K) 32(c+d) -c(2c+e)--e(c+d) exp[ —(3 +K)C] ÷ 0(3 +K) 32(c’-d)(2c+e) = c(2c+e) c(2c+e) +e(c+d) = lini p;(;K,Iç) = (A.22) —e exp[ —(3 +K)?] + 0(3 +K) 32(2c+e) —c(2c+e)—e(c+d) exp[ —(3 +K)(] + 0(3 +K) 32(c÷dX2c+e) I (A.23) e(c+d) c(2c+e) +e(c+d) = states goes to zero. The sum of these probabilities is one. Thus the probabilities of the other For completeness, the This completes the cases which were listed in the Proposition. remaining four cases are included. Case 4- I<3 and I.=3 In this case, consider states one and six. 2 —C Urn C lim p(;K,,K) = ( 32(c+d)(2c-’-e) cxp[—(3+K)iJ + 0(3÷K,,) —e exp —(3 +K)C] 32(ci-d)(2c÷e) 4(2c+e)) 2 —C + . 2 C 8e(c+d) c + 2 - 249 - + 0(3 +K) (A.24) Appendix A Learning, Evolution and S4fOrganization lim ( = —e xp[—(3+K)c] 4(2c+e) + -C + -e .exp[-(3+K,)] 32(c+d)(2c+e) 4(2c+e)) + 0(3+19 (A.25) 8e(c+d) +8e(c+d) 2 c Again, note that these probabilities sum to one. =3andK, 7 Case2-K, In this case, the limit of the invariant distribution puts all the weight on states two, three and six. These probabilities are calculated below. 32(c+d) lim p(C;K, ,Iç) 1 = c-. —e + c(2e+e) c(2c+e) +e(c÷d)+8(e(c + = urn p(C;Kq,K) ( —c(2c+e)—e(c+d) 32(c+d)(2c+e) ‘exp[ _(3+K)(]+0(3+K) —e 32(2c+e) = ( —c(2c+e)—e(c+d) 32(c+d)(2c+e) + lirn p(;K,K) ?-... ( Se(c+d) +efr+d)+8(e(c+d) c(2c+e) + 0(3 +K) + 0(3÷K) + 0(3+K) (A.26) (3 +J( )(]+Q(3 i-K) —e e(c+d) c(2c+e) +e(C+d)+8(e(c+d) —e 4(2ci-e) —c(2c+e)—e(c+d) + 32(ci-d)(2c+e) ) exp[ —(3 +K)] ) .exp[...(3+K)] (A.27) —(3 +K)t]+Q(3+K) —e ) .exp[—(3+KK] (A.28) These three probabilities sum to one. Case 5- K,=3 and K In this case, states one, two, three and six must be considered. The calculations follow. - 250 - ______ Appendix A Learning, Evolution and SeOrganization 2 —C lim 32(c-’-d)(2e+e) = ( —c(+e)—e(c+t1) 32(c+d)(2c+e) + — exp[-6(J 32(c+d)(2c+e) + + 0(6) —e exp[ —6C] 4(2c+e)j (A 29) + 2 C +c(2c+e)+e(c+d)+8e(c+d) 2 c lim AV’( -c .exp[-6C] 32(c+d) ; K, K) = C-— L. —c 2 32(c+d)(2c +e) —c(2c+e) —e(c+cf) 32(c+d)(2c+e) + + + 0(6) —e exp[ —6C] 4(2c+e)) (A 30) + c(2c+e) +c(2c+e)+e(c+d)+8e(c+d) 2 c -c p*( C;K,Iç) 32(c+d) = ( C-— _2 .exp[-6C] —c(2c+e)—e(c+cO + + 32{c +d)(2c+e) 32(c+d)(2c +e) + 0(6) —e exp[ -6] 4(2c+e)J + A 31) 06 + 32 oy e(c+d) +c(2c+e) +e(c+d)+8e(c+d) 2 c -e lim p( (;K, K) (4(2c+e) = C-— 2 —c 32(c+d)(2c +e) exp[6] —c(2c+e) —e(c+d) 32(c +d)(2c+e) + + + 0(6) —e exp[ -6] 4(2c+e)J 8e(c+d) +c(2c+e) +e(c+d)+8e(c+d) 2 c These four probabilities sum to one. Case 11 - =K>3 1 KK, In this, the last of the 11 cases, states one, two and three are relevant. Let K=I =K> 3. _c2 Jim p(t;Iç,K) C-.— 32(c+d)(2c+e) = ( 2 —c + Q(3+K) —c(+e)—e(c*-cOi exp[ —(3K)Cj + 32(c+d)(2c-i-e) 32(c’-dX2c+e) 2 C +c(2c+e) +e(c+d) 2 c -251 exp{-(3+K)J - (A.33) + 0(3+K) Appendix A Learning, Evolution and S4fOrganization -c exp[-(3+IQJ 32(c-i-d) lim = —c 2 (32(c+(2c+e) + O(3+K) —c(2c+e)—e(c+d)1 exp[—(3+K)J 32(c+(2c+e) 3 + + (A.34) O(3+K) + (A.35) Q(3+K) c(2c+e) +c(2c+e)+e(c+d) 2 c lim p( ;Kq,Iç) -e [-(3÷IQ(] 32(2c+e) = 2 —c + Q(3+JQ —c(2e+e) —e(c+d)’ .exp[—(3+IQC] 32(c+ti(2c+e) + J (32(c+d)(2c+e) e(c+d) +c(2c+e)+e(c+d) 2 c These three probabilities sum to one. With these 11 cases, the proof is complete. - 252 - _____________ _________________ ____________ Appendix B - _______ Proofs for Chapter 4 Proof of Lemma 1 The asset demands in (4.4), characterized by the learning states, determine the aggregate demand. Equating demand with aggregate supply determines the equilibrium price. 13yP 0 j3 f3+f3P-P + = jEl a(+’y) N• e (B.l) uEU Note that there are X • N informed agents and (1—k) • N uninformed agents. The following definitions are useful. 1 kNa(g+yi) 1 1 (l_k)N,fr,a(cr+.yu) ‘ 2 T’ and TU are the average (or aggregate) effective risk tolerances of the informed and the uninformed agents. Average 13’s can be defmed by weighting the fl’s by the effective risk tolerance. For j=O, 1 (31 = ‘ 1 T’XN iEI (3 a(o+yz) flU = ‘ ‘ 1 T’.(l—X)N (13.3) uEU Using these definitions, (B. 1) can be stated as, X T’(I3 +fl1 - P) (1-k) + Th1(fl’ i3P —F) =e (B.4) Solving for the equilibrium price, where a + = XT’ a1 a + (1-k)fl’T’ - (1_k)Th1(1_f3’) (B.5) = XT’ + (1—k)T’(1--j3’) + —1 (1-X)T’(1-fl’) = XT’ yields (4.10). Proof of Lemma 2 For the informed agents, (4.11) follows directly from (4.1) and (4.2). For the uninformed, (4.12) follows from the linearity of the equilibrium price in (4.10) and the joint normality of the random variables. Note that - 253 - ___ Appendb B Learning, Evolution and S4fOrganization E[P] =a 0 2 0 P = 22 22 aly+a2e (T 1 =a These relationships determine Eb’ IPI and V[y IF] (see Morrison, 1990). E[ylP] = E[y] + !.(p -E[P]) cr,, (B.7) ) 0 •(P—a 2 22 2 a y + 1 °e = 2 a V[yIP] = 2 (cT2 — “P (B.8) 22 2 a 2 2 = 2 2 + a o 2 2 0:,, The E[j Ill and V[y I F] are combined with the dividend and signal relationship in (1) and (2) to yield the rational learning state. • Proof of Lemma 3 Consistent with the analysis in Section 4.ffl, Chapter 5, and the genetic algorithm simulations in Chapter 6, for concreteness let 3= 1 (the analysis for an arbitrary j3 proceeds similarly). In rational expectations equilibrium all agents use rational learning states, Vz=f. Using the rational learning states described in lemma 4 and the definitions in (B.2) and (B.3), in the rational expectations equilibrium, fl’=ir and =j3 forj=0, 1 and , 1 TU= 2 2 a + E 2 2 a.2 qc1y 2 a.2 o,2 1 (B.9) .2 2 + 2 a e The equilibrium price (for any population learning state) is given by (4.10) and (B.5). 30 /3i11 and T U yields the following set of linear I Inserting the above values for 13o’, i3i’, T’, , equations. - 254 - ______________________ ______________ Learning, Evolution and S4lOrganizaiion Appendix B XI3T’ XT’ (1_X)TU + + - (1_X)TU(1_fl) Xf3T’ (B.1O) a; XT’ + (1_X)TU(1_j3’) + —1 (1_X)TU(1_(3J) * 2 a = XT’ The unique solution to these is equations • 0 a * = I3 + XT’ ro 22 = where k= — e * = + (1—X)T’ + 2 2 o.2 Xk ÷ >2 y 2 + X k 222 y °e 0 + u (B. 11) 6 k 222 eT 0 ka 1/(XT’). These defme the rational expectations equilibrium price. U Proof of Proposition 1 From the linear pricing equation (4.10) and the joint normal distribution, the conditional distribution of y given P is normally distributed with mean JL(P) given in (B.7) and variance 2 from (B.8). The proof of the proposition proceeds by calculating three quantities: o fu(n,p), f”Q?,LIP), andf’(t’,LIP). (ffl LIP) 4 f The risky asset demands of the uninformed follow from (4.4) and the parameters in r. E’[dIP]-P = aV”[djP] + = f3P (B.12) 2 i3c + (P) P a(o + (P) + where (P)= —D+(I3 1 1 —f3 DP. 1 Note the use of the assumption that yu=y1=c. End-of-period wealth for the uninformed agent follows from (4.3) and is - 255 - Learning, Evolution and Seq’ Organization Appendix B p(P) —P + + (P) a(+o2) I (3* (B.13) +y+e—P) End-of-period wealth is normally distributed with conditional mean and variance of 1 E[W i] = (j w: + (P) —P + + ,L(P) (13.14) — + a(c+o2) 1i v[w = ( + i(P) —p 1 r(P)) + (B.15) a2(o+o2) These conditional moments determine the utility (using the CARA presences and normality) fu(4u,L IP) = E[_exp(_a W j I P1 1 = _exp_a = -exp(-a w) exp 1 IE[Wln IP]-.v[w P1] 1 (B.16) exp 1 2 n(p) [+O (ii) fu(e*,LIp) Note that (P) =0 (for all P) when fu (f ,L I P) * = t1=1*(L). j Therefore (B.17) -exp(-a w) exp and fu(1?n,L IP) = 2 (p) exp 2(+o2) fU(f*,L IF’) (13.18) (iii) fi(en,LIP) Note that the assumption that the informed agents know the I3*s in (4.1) implies that f(1?’,LIP) =f*,LIP). Following similar steps which lead to (13.16), the informed agent’s expected utility conditional on both y and P is - 256 - Learning, Evolution and S4/Organization ) 1 E[U(W Appendix B iy,pj n 1 [E[W _expI_a = n i]j j 1 y,p]—.v[w — , —exp-a = W,j expa cj exp 0 2 (B.19) (fj+y—P) 2 2 Taking expectations over y given P in (B. 19) will yield f(t,L I P). As noted above, the conditional distribution of y given P is normal. The calculation uses the normal distribution and “completes the square” (see also Grossman and Stiglitz (1980) page 406). J½ IP = = -exp(-a w: exp(a c) exp(a c) exp 2 L 2 +o [ f”(f ,L P) 2 + (L) 2 a This completes the proof. Proof of Proposition 2 The proof uses the conditional utilities calculated in Proposition 1 to calculate the unconditional utilities. First by inspection of (B. 16) and (B.20) note that initial wealth, W 0 will have no impact on the relative utilities. Therefore, without loss of generality, set =O. Define 0 W ½ 2 D = E exp(a c) + 03.21) (L) 2 o Note that this value does not depend on P. Using the variables defined in equations (4.17) to (4.20) and 03.21), write (B. 16) as fu(en,L I) = —exp _2 (x2_2)j 03.22) and 03.20) as f(t,LIP) = —D exp1_X2] 03.23) Note that X is a linear function of P (the conditioning information in Proposition 1). In fact, by construction, (P)=A+BX. Since P and X are perfectly correlated, fu(1n,LIp)fu(tn,L IX) and f(t,LIP)=f’(t”,LIX). Since P normally distributed (a linear - 257 - Learning, Evolution and SerfOrganization Appendix B function of normally distributed variables, see (4.10)), Xis also normal. Using the technique of completing the square commonly used when calculating expectation of exponential normals the expectation, over X, is as follows. Er..exp [_2(X2_(A+BX)2)1 2 1 = ) exp 2 -(1+C(1-B _c 2 _A2+_Bi4_CA2_2tAB 1+C(1-B)(B.24) jj Note that this is the equation for E[f(r,L IX)]. Multiplying (B.24) by D and setting A =B =0 yields E[f’(f’,L I X)]. Normalizing f(4,L) =1 and calculating L (4n,L) 4 fi IX)] E1p(r,LX)] E[f’(€,L = using (B.24), completes the proof.’ (B.25) • Proof of Corollary 1 Since VZ = for all n agents, (P) =0. Therefore in (4.16), A =B =0. The rational expectations equilibrium proportion of informed agents is where agents are indifferent between being informed or uninformed, that is fu=f= 1. The X* follows from (4.16) with A =B =0 and the defmition of o (L) in (B. 8) and the rational expectation a; in (B. 11) 2 • 2 * Note that the ratio considers the fact that the absolute level of utilities are negative. 2 Integer constraints are ignored here. In the genetic algorithm simulations of Chapter 6, X’O.5 and N is even, so integer constraints do not play a role. - 258 - Appendix C - Proofs for Chapter 5 Proof of Lemma 1 (a). From inspection of equation (4.16) in Chapter 4, forJ” 1e,fU is continuous in X and 4?.’ Thus since the set {f 1} is closed so is G(E) (ie. inverse image of a closed set for continuous function is closed). (b). By the definition of the GS equilibrium, (X, 4? **(X*)) E G(E). Therefore G(E) is not empty. (c). Note that at £(X) uninformed agents are making no inference errors (by defmition) and since X> X, fU(X,f**(X))> 1. Since f 14 is jointly continuous in X and in the inference parameters, there exists such that for all (E,Et) ER , fU(X + e>, 4? (X) + es)> 1 for X> X and 3 I (e),e) <E.. i (d) By construction of the GS equilibrium, fU(X*,4?**(XD) = 1. Since inference errors reduce 14 1. Again, by construction of the GS f utility for all e (with lie >0), (X*,1?**(X*)+e)< ( 4 fu(x,4?)<f x,4?**(x))< equilibrium, for X< X 1. I Proof of Lemma 2 For XE [X*, 1], defme the correspondence, G(E) as G(E) = {ti(x,t)eE)} = {e[f(x,4?;E)1} (C.1) Note that by Lemma 1(b), G(E) is non-empty (and therefore is a correspondence). Since is continuous in and c and the set {f‘ 1) is closed, the graph of the correspondence 2 G(E) is closed. Therefore G(E) is upper hemi-continuous. fU 1 and = 1 only if £ =4?**(X). In the economy where Orç=0, that is E=(0,XD, fU(X,4?;E) Therefore, G(0,X) is the singleton {t (X)}. Consider a sequences E , with oe>0 and constant X, such that E,_s(0,X*). Take any two 1 £,-s.4? sequences and £-‘4?’ such that £,,4?,’E G(E,). Since G is upper hemi-continuous, 4? ,t E G,(O,XD. But since G(0,XD is a singleton, 4? = 4?’ = 4? (X) and I ..f**(X) + **Q) -‘0. By the triangle inequality, X,-f + X’-4? therefore £: 4? £,-f; —‘0. Since this is true for any two sequences, 4(E,)-’0. fl 1 2 fl I i rIl, fl From the remarks which follow equation (4.16), note thatf is only discontinuous when the inference error in the slope is large. However, this occurs only whenf approaches zero. See Debreu (1954) Theorem 1.8(1), page 18. A correspondence, h:Q-.R associates each element ofxEQ with a non-empty subset of elements in R. h(4) is upper hemi-continuous (or upper semi-continuous) if: x,-x , y,eh(x) and Y,Yo implies 0 0 th(x y ) . - 259 - Learning, Evoiunon and SeOrga&zadon Appendix C Proof of Proposition 1 (a) First consider possible dynamic equilibria such that XE(0,1). By assumption, g 0(X,1?) E G. Given (5.4), the continuity of g and properties of the closed sets G, for XE(0, 1), g=0 if and only if (X,t) is on the border of G. (Throughout this appendix, FiG indicates the border of the set 0). Similarly, for XE (0,1), g=0(X,) EH (ie. (1? =1 (X)). From Lemma 1 (c) and (d), the only intersection of the border of G with H is where X=X. Therefore for XE(0,1), the unique dynamic equilibrium is (X,(X)). To verify that this equilibrium is stable note that g> 0 for X < X (Lemma 1(d)) and g <0 in a neighbourhood of r(X) for X> X (Proposition 1(c)). This coupled with the continuity of the learning process and the convergence assumption on g 3 1 establishes stability. (b) Next consider X =1. By assumption, all points (1, £) are fixed points of the learning dynamic. However for (1,1?) E G, g(1-e,X) <0 (for e > 0) and these fixed points are not 5 For (1,1?) G, g(1-e,X) >0. Thus for small enough c and X stable. 0 = 1-€, limX =1. By assumption, is small for X close to 1. Thus, for, £=?, lim,t, can be made arbitrarily close to t. Thus (1,X) G is stable. 6 (c) Finally consider X=0. Since g(e,X)>0 (for small positive e), any equilibria with X=0 are not stable. Proof of Lemma 3 For any £, (l,f)E G’ follows immediately from the assumptions that g(1,X)=0 and (which may depend on 1?) such that € < implies m(1 ,X) <0. The existence of g(1—e,X)+m(1—e,X)<0 is an immediate consequence of the continuity of g, and m. • Proof of Lemma 4 (a) By definition, (X’,4?’)E G’ implies that g(X’,f’)+’qm(X’,t’)0. If g(X’,t!’)0 then (X’,f’)E G. On the other hand if g(X’,f’)>0 then 0<g(X’,f’)< —‘qm(X’,f’). By choosing small enough q, g(X’,t’) is within ‘ of g(X,1)=0. By the continuity of g, (X’,4’) is with distance of (X,4). For additional details, see the proof of equilibrium existence for learning processes including drift in Proposition 2. More formally, this can be stated as there exists an 1>0 such that this g(l-€,t)<O for all 0<e<1. Since G need not be a convex set, the sign of g, may be positive for larger e. For the case where (1,t) is on the border of G, then (X-e,1) may or may not be in G. If (X-e,t) EG, then (l,t) is clearly not stable. However, if (1-e,t) G then g(l-e,X) >0. However, since t t’(l ,t) (Lemma 1(c)), g is non-zero and the stability depends on the specific properties of g (ie. if g 4 implies monotonic convergence to t’, then boundary points are clearly not stable). 6 The points (1 ,X) E G are not asymptotically stable since the equilibria (1,1) are connected. - 260 - Learning, Evolution and S4fOrganization Appendix C (b) (X,e)E Gimplies that g(X,e)0. Ifg(X,t)+im(X,C)0, then (X,)E G’. If not then 1?) <jm(X, f). Thus for small enough t, by the continuity 0< g(X, £) + 7 of g and mx, there exists (X’,e’) within distance of (X,e) such that g(X’,t’)+im(X’,f’) 0. • Discussion: Sufficient Conditions for non-empty h(X) There are at least two ways to rationalise the assumption that h(X) is a function (ie. that solution to g, + m =0 for fixed X exists, is unique and asymptotically stable). The first is 1 to utilize the smallness of ii,. If we write z(f ,i)=g(X,t)+,m(X,i), then by the assumption that g, leads to the rational expectations inference parameters holding X constant, zx(f**(X),0) =0. Since (X) is asymptotically stable, the determinant of the Jacobean of z at this point is non-zero. Therefore, by the implicit function theorem, the function f(j) exists such that zx(),) =0 for small Therefore if attention is restricted to X which are not close to 1 and small ‘h, then h(X) exists and is unique. However, when X approaches 1, g,-O and the implicit function theorem no longer can be applied (ie. Jacobean becomes zero). However, since m(l,,j=0 a similar argument (for fixed >0) and X=1 can be constructed. The implicit function argument will not work for all X since there is an “inbetween” X where g and 1 m, are of the same order of magnitude. . The second approach is to make additional assumptions directly on g, and m to ensure that h(X) is non-empty. For example, assume that both g, and m, are monotonic (in each element of £) as follows: (X,t) 1 g = ig(X,r*(X)_1) (C.2) = (C.3) where the function 1 -.R are jointly continuous and increasing (term by term) xR g’ m: [0,1] 2 in their second argument and equal zero if and only if the second element is zero. Without loss of generality assume that 1m< <r(X) (where << means the inequality is true for both elements of 1 or £,,< £ and m1 < t**) Given these assumptions, for a given XE (0,1], the set h°(X) = {1? I g 2 +iem, =0} is not empty. This follows from the monotonicity and continuity assumed in (C.2) and (C.3). They imply that the system (holding X constant) is always pushed into the box defined by and £m. That is: r >O, m 0 g >O 0 O< £mO <0, m 0 g, <0 0 Noe d&=g+,n is just a linear combination of two continuous functions, therefore is also continuous. - 261 - (C.4) Learning, Evolution and S4jOrganization Appendix C g > 1 O, 1 m > 0 ei<emz (C.5) g, < 1 0, m <O 1 1 Since the functions are continuous, it follows that h(t ) = {f 0 I g, Q, , 1) 0 Lo, t ) 0} is non-empty and (by Debreu (1954) Theorem 1.8(1)) is a upper hemi-continuous correspondence of f. Similarly for 1 ) h(f 1 g (X, ) ( , 0 ) +iemt =0} ={f X,f t 1 e is a nonempty upper hemi-continuous correspondence of £. These two correspondences must cross at least once. Using the monotonicity assumption, at least one of these crossings will be asymptotically stable. Proof of Lemma 5 The graph of the set of dynamic equilibria, h°(X) = {fIg(X,e) + m,(X,f)=0} (C.6) is non-empty by assumption. This graph is closed since 1 +17 g m , is continuous and the inverse image of a continuous function of the closed set {0} is also closed. This implies that h° is an upper hemi-continuous correspondence (see Debreu (1954) Theorem 1.8(1)). Since asymptotic stability is a refinement of a dynamic equilibrium, h(X) c h°(X). However, since both sets (by assumption) are singletons, h(X) =h°(X) and h is also upper hemi-continuous correspondence. Since h(X) is also a function, upper hemi-continuity implies it is continuous. . Proof of Lemma 6 h(1)=€m follows from the assumptions that g,(1,€)=0 for all £ and m (X,e)=O only at £ =m 1 = Therefore, h°(1) {1?} where h°(X) is defined in (C.6). Since h(X) =h°(X) (by assumption), h(1)={e**}. Proof of Lemma 7 For X < 1, g,(X,e)=0 if and only if £ = £ (X) and m (X,e)=0 if and only if £ = em. However, 1 by assumption, £m £ (X). Thus, h(X) £ (X). Proof of Lemma 8 Define KQ)=sup(,){ fl m(X,i) }. Since m 1 is bounded, K(X) < . For any XE (0,X”), g(X,e(X))=0 and g (X,f)+m(X,t)=0 where £=h(X). 1 Therefore, g,(X,f-g(X,e) m(X,t) <K(X). By i, choice of K 1 (X) can made be arbitrarily I = small. By the continuity of g,, e-r <. . fl Prelude to Proof of Proposition 2 The following definition and Lemma are useful for showing the existence of a stable dynamic equilibrium for a learning process. These definitions are with respect to the general autonomous (ie. time independent) system of equations for xE R, dx/dt=g(x) where L(,t) - 262 - Learning, Evolution and SeOrganizadon Appendix C is unique continuously differentiable solution. 8 Note that the learning process considered in chapter in (5.1) and (5.2) is of this form. the f all trajectories which begin in V remain in Definition The set V is an invariant set Lemma El If x is a stable dynamic equilibriwn, then for every neighbourhood there exists and invariant set VC Q. V. Q ofx, Proof of Lemma El Since x is stable, there exists a neighbourhood R such that x ER implies all xE Q. Defme 0 the set 0 ,t)Ix V={L(x E R, t0}. The set V is all the points of all the paths which originate in R. By construction, VC Q. By construction, xE V implies there exists some x ER and 0 r0 such that x=L(x ,r) and L(x 0 ,t)E V for all tr. Time independence implies that 0 ,t) =L(x,t-r). Therefore L(x,t) E V for all t 0 and V is invariant. 0 L(x Remark V need not be a neighbourhood of x even though it is contained in the neighbourhood of Q. Proof of Proposition 2 The proof shows the intersection of h(X) (where d1/dt=0) and the boundary of G’, denoted G’ (where dXldt=0) is non-empty and that at least one of the intersections is stable. For small e, (e,h(e))G’ (Lemma 1(d) implies g(c,f)>0 and (5.12) implies m(E,)>0)) and (1,h(1))E G’ (by Lemma 4). Therefore, the infemum of (x’,’) = inf{(X,h(X))EG’} (C.7) has 0< X’ <1. Since h(X) is continuous, (X’ ,h(X’)) E ÔG’ (otherwise (X’-€,h(X’-€)) E G’ for small which contradicts that X’ is the infemum). Therefore (X’,e’) is a dynamic equilibrium. It remains to be shown that this point is stable. To show that (X’,f’) is stable, consider any arbitrary neighbourhood Q’ ER 3 of (X’,t’). The following will construct the neighbourhood R’ C Q’ with the property that all trajectories originating in R’ must remain in Q’. To begin, without loss of generality, let Q’=(X’—z,X’+i)xQ where t>0 and QER 2 is a neighbourhood of t’. The construction of R’ is such that R’=AxR where A=(X’-½e,X’+½E) with 0<e< and R is a neighbourhood of £‘ with R C Q. Let the solution to dt/dt=g(X,f)+j m(X,1?) for a fixed and constant X be denoted 2 ,t). Note that L is continuously differentiable in each of its parameters. Recall 0 e=L(X,e For the existence and uniqueness of G is sufficient that g is continuously differentiable (bence Lipshitz Continuous). - 263 - Appendir C Learning, Evolution and S4f Organization that the unique asymptotically stable fixed point of this system is given by the continuous function h(X). Since h(X) is continuous, there exists an E >0 such that h(X) ER for all XE A. Furthermore, since each h(X) is also a stable dynamic equilibrium (when X is held constant), ,t) EQ 0 there exist sets U(X), neighbourhoods of h(X), such that tE U(X) implies that L(X,f for all t 0. Since each U(X) is open, h(X) is continuous and € >0 can be chosen arbitrarily small, there exists an open set U (a neighbourhood of V =h(X’)) such that for all XE A, E U implies that L(X,1 0 h(X)E U and £ ,t)ER for all t0. For example, U could be the 0 intersection of the appropriately chosen U(X). For each XEA, an invariant set V(X)CR can be constructed as in Lemma El, 0 U, t0}. Consider the intersection between any two of these sets ,t)I t 0 V(X)={L(X,1 V(X) fl V(X+e). The following will show that this intersection is an invariant set under both L(X,•,•)andL(X+€,•,•). Consider 4? C V(X) fl V(X+c). It follows: E U and T such that X=L(X,4? 0 ,r). 0 (i) By definition 1? C V(X) implies there exists £ (ii) Similarly, 4? C V(X+€) implies there exists £E U and TE such that £ =L(X+€,4?,r). (iii) Since V(X) is an invariant set, L(X,f ,t)E V(X) for all t0. 0 (iv) Similarly, since V(X+€) is an invariant set, L(X+€,t,t)E V(X+€) for all t0. (v) Since L is (continuously) differentiable, the implicit function theorem can be applied 9 and there exists (t) such that 0 ,t)=L(X+€,t L(X,e + r,t). By choice of small €, can be made arbitrarily small.’° Since U is open, £+(t)E U. Since this is true for ,t)E V(X+€) for t0. 0 any t, L(X,4? (vi) Similarly, there exists a small (t) such that L(X + €, £,t) =L(X, £4) + ,t) and £ + C U. Therefore L(X+€,f,t)E V(X) for all t0. Taken together, these points imply that L(X,t ,t) C V(X) fl V(X+e) and 0 L(X+€,4?,t)E V(X) fl V(X+€). The intersection of the invariant sets is also invariant. Let V be the intersection of all the invariant sets. That is, V fl = V(X) (C.8) XEA Since (i) to (vi) are true for the intersection of two arbitrary sets which differ in the parameter by €, it follows that for all XE A, 4? C V implies that L(X, 4? ,t) C V for all t 0. The implicit function theorem applies so long as the determinant Jacobean of G, VG(X,h(X),t) is non-zero. This follows from the fact that h(X) is the unique fixed point and that it is asymptotically stable. 10 Note that since the h(X) are asymptotically stable, the set U can be chosen to ensure that trajectories converge. Therefore lim. h(X)-h(X+e) which is small by the continuity of h(X). - 264 - Learning, Evolution and S4tOrganizo4on Appendir C We have constructed an invariant set V (such that VCRC Q) for the inference parameters. As longs as X,EA, £,E V. It remains to be shown that R and A (ie. choice of E) can be chosen to ensure that XE A. By construction, the dynamic equilibrium (X’,2’) is on the boundary of the closed set G’ such that, (X’-e, £ ‘) G’ and (X’ + E, £‘) is an interior point of G’. Therefore for a small enough neighbourhood R of 1?’, £ ER and X’-E < X < X’ implies (X, 1) G’ and dklcft>O. Similarly, for I ER and X’ X < X’ + € implies (X, I) E G’ and dX/dtO. Therefore, for any neighbourhood Q’ of (A’, I’), there exists the neighbourhood AX U such , (X ) EAXU that 0 t implies )EAxV and by construction 1 (X,,€ AxVC(X’—A,X’+A)xQ=Q’. Proof of Theorem 1 **(1)(A A) li Consider the learning process L(X”4”,t) with X”=l-A” >X and I Using Lemma 3 and the continuity of h (Lemma 5), for small A”, for XX”, (A ,h(X)) is an interior point of G’. Therefore there are no dynamic equilibrium of L with A A” (ie. no intersections of h(X) and FiG’). Therefore, applying Proposition 2 (existence of a dynamic equilibrium), there exists at least one dynamic equilibrium with A < A”. Recall that the intersection of £**(X) and FiG is unique and is the GS equilibrium (Proposition 1). Since for A < A”, h(X) I**(X) fl <A” (see Lemma 8), A” can be chosen small enough to ensure that sup{II(X,f)_(A*,e**(A*)II I(XMEFiGflh(A)}<€. Lemma 4 implies that for X<A” and (A’, I’) E FiG’, inf{ (A’, £ ‘) (A, I) I (A, I) E FiG) <A” (ie. the boundary of G is close to the boundary of G’). Therefore, by the triangle inequality, if (X,I) is a dynamic equilibrium (ie. in h(A) fl FiG’) then f (A, £)(X*, I (A)) <A” + € < A for small enough A”. Therefore all the dynamic equilibria of L(X”,A”,I,) are with in A of the GS equilibrium. The same is also true of learning processes with less drift, namely A’ A’, A’ A” and £,,A(1)II — — I IIf—1(l)II. Proof of Theorem 2 It is sufficient to show that for small G’ and h(A) do not intersect for A < A”. From Lemma 2, G(E) approaches the set {(X,r(A)) IA> X*} as 7e goes to zero (holding X constant). From Lemma 7, h(X) 1(X). Therefore there exists j such that for all o< h(X) and G do not intersect. The fact that h(A) and G’ do not intersect for A < A” in economies with < o’ follows from the property that A drift does not exceed inference drift. That is, the property that inf{ h(A)—e * (A) } > sup{ (A,I)—(A,t’) (X,I)E FiG,(A,e’)e aG’} (C.9) This assumes that, for all X, t(X)t(1)-(”,A). This is without loss of generality since choice of t, can be perturbed slightly if required. - 265 - Learning, Evolution and S4tOrganization Appendix C Therefore there are no equilibria with X < X”. Any equilibria (at least one exists), must have the property that X > X”. - 266 - Learning, Evolution and SeifOrganization Appendix C ...TheEnd... - 267 -
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Finance theory applications of learning, evolution...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Finance theory applications of learning, evolution and self-organization Routledge, Bryan R. 1995
pdf
Page Metadata
Item Metadata
Title | Finance theory applications of learning, evolution and self-organization |
Creator |
Routledge, Bryan R. |
Date Issued | 1995 |
Description | This thesis investigates stochastic adaptive learning and contrasts models of adaptive individuals with models that assume complete and unbounded rationality. In this thesis, individuals follow rules of thumb which are developed by adopting or copying successful rules, abandoning unsuccessful ones and occasionally creating novel rules. In fixed environments, these learning algorithms differ only slightly from complete rationality Bayesian approaches. However, in situations where an individual’s utility depends on the behaviour of other individuals, a co-evolutionary environment, the prediction of adaptive learning models can differ markedly from traditional complete rationality models. In the first section of the thesis interaction between individuals’ decisions is limited and direct. People face their neighbours in a repeated prisoner’s dilemma. A genetic algorithm, used as an example of a stochastic adaptive learning process, is developed in Chapter 2. The rate of learning in the algorithm is controlled by altering the number of individuals obtaining new strategies in a generation. In the infinitely repeated game the learning rate affects the equilibrium level of payoffs (ie. affects which equilibria are selected). In the finitely repeated game the learning rate determines whether or not the system converges to the unique Nash equilibrium. Chapter 3 considers a similar model analytically yielding analogous results. The second portion of the thesis investigates stochastic adaptive learning in a non-strategic yet co-evolutionary environment. This section develops an asymmetric information, one-period, single risky asset portfolio choice model based on Grossman and Stiglitz (1980). The main finding of these three chapters is that the appropriateness of the rational expectations (or complete rationality) equilibrium depends upon the level of noise in the economy (in the form of noise traders) relative to the level of experimentation in the individual’s learning processes. The discussion of this relationship begins in Chapter 4 by constructing a learning process which converges to the rational expectations equilibrium and concludes with a discussion of the stability of the Grossman-Stiglitz equilibrium to an adaptive learning process where experimentation does not vanish. Chapter 5 develops a deterministic representation of a stochastic adaptive learning process to formally develop the link between noise trading, experimentation and the Grossman-Stiglitz equilibrium. Finally, Chapter 6 demonstrates the stability result in a more general environment using a genetic algorithm as an example of a stochastic adaptive process. |
Extent | 7598786 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-04-24 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0088380 |
URI | http://hdl.handle.net/2429/7553 |
Degree |
Doctor of Philosophy - PhD |
Program |
Business Administration |
Affiliation |
Business, Sauder School of |
Degree Grantor | University of British Columbia |
Graduation Date | 1995-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_1995-060551.pdf [ 7.25MB ]
- Metadata
- JSON: 831-1.0088380.json
- JSON-LD: 831-1.0088380-ld.json
- RDF/XML (Pretty): 831-1.0088380-rdf.xml
- RDF/JSON: 831-1.0088380-rdf.json
- Turtle: 831-1.0088380-turtle.txt
- N-Triples: 831-1.0088380-rdf-ntriples.txt
- Original Record: 831-1.0088380-source.json
- Full Text
- 831-1.0088380-fulltext.txt
- Citation
- 831-1.0088380.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0088380/manifest