ON THE NATURE OF THE STOCK MARKET: SIMULATIONS AND EXPERIMENTS

by Hendrik J. Blok
B.Sc., University of British Columbia, 1993
M.Sc., University of British Columbia, 1995

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Department of Physics and Astronomy)

We accept this dissertation as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
November 2000
© Hendrik J. Blok, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Physics and Astronomy
The University of British Columbia
Vancouver, Canada

Abstract

Over the last few years there has been a surge of activity within the physics community in the emerging field of Econophysics—the study of economic systems from a physicist's perspective. Physicists tend to take a different view than economists and other social scientists, being interested in such topics as phase transitions and fluctuations.

In this dissertation two simple models of stock exchange are developed and simulated numerically. The first is characterized by centralized trading with a market maker. Fluctuations are driven by a stochastic component in the agents' forecasts. As the scale of the fluctuations is varied a critical phase transition is discovered. Unfortunately, this model is unable to generate realistic market dynamics.

The second model discards the requirement of centralized trading. In this case the stochastic driving force is Gaussian-distributed "news events" which are public knowledge. Under variation of the control parameter the model exhibits two phase transitions: both a first- and a second-order (critical).

The decentralized model is able to capture many of the interesting properties observed in empirical markets, such as fat tails in the distribution of returns, a brief memory in the return series, and long-range correlations in volatility. Significantly, these properties only emerge when the parameters are tuned such that the model spans the critical point. This suggests that real markets may operate at or near a critical point, but it is unable to explain why this should be. This remains an interesting open question worth further investigation.

One of the main points of the thesis is that these empirical phenomena are not present in the stochastic driving force, but emerge endogenously from interactions between agents. Further, they emerge despite the simplicity of the modeled agents, suggesting that complex market dynamics do not arise from the complexity of individual investors but simply from interactions between (even simple) investors.

Although the emphasis of this thesis is on the extent to which multi-agent models can produce complex dynamics, some attempt is also made to relate this work to empirical data.
Firstly, the trading strategy applied by the agents in the second model is demonstrated to be adequate, if not optimal, and to have some surprising consequences. Secondly, the claim put forth by Sornette et al. [1] that large financial crashes may be heralded by accelerating precursory oscillations is also tested. It is shown that there is weak evidence for the existence of log-periodic precursors but the signal is probably too indistinct to allow for reliable predictions.

Contents

Abstract ii List of Tables ix List of Figures xi Acknowledgements xix Chapter 1 Introduction 1 1.1 Financial markets 1 1.2 Motivation for research 1 1.2.1 Motivation for the physicist 2 1.3 Anticipated challenges 5 1.4 Modeling 6 1.4.1 Computer simulations 7 1.4.2 An appeal for simplicity 8 1.5 Organization of the thesis 8 Chapter 2 Centralized Stock Exchange Model 10 2.1 Inspiration 10 2.2 Theory 10 2.2.1 Assumptions 11 2.2.2 Utility theory 12 2.2.3 Optimal holdings 14 2.2.4 Risk aversion 15 2.2.5 Optimal investment fraction 16 2.2.6 Forecasting 17 2.2.7 Fluctuations 19 2.2.8 Initialization 20 2.2.9 Market clearing 21 2.2.10 Review 23 2.3 Implementation 25 2.3.1 Pseudo-random numbers 27 2.4 Parameter space exploration 28 2.4.1 Number of agents N 28 2.4.2 Total cash C and total shares S 28 2.4.3 Investment fraction limit δ 31 2.4.4 Risk aversion a and forecast uncertainty σε 31 2.5 Parameter tuning 34 2.5.1 Forecast error 37 2.5.2 Finalized parameter ranges 40 2.6 Discussion 40 2.6.1 Fundamentalists versus noise traders 40 2.6.2 Forecasting 40 2.6.3 Portfolios 41 2.6.4 Difficulties 41 Chapter 3 Decentralized Stock Exchange Model 44 3.1 Inspiration 44 3.2 Basic theory 45 3.2.1 Assumptions 45 3.2.2 Utility theory 47 3.2.3 Optimal investment fraction 47 3.2.4 Fixed investment strategy 48 3.2.5 Friction 51 3.2.6 Call orders 51 3.2.7 Poisson processes 53 3.2.8 Call interval 53 3.2.9 Reply orders 56 3.2.10 Time scale 57 3.2.11 Initialization 58 3.3 Fluctuation theory 58 3.3.1 Bayes' theorem 58 3.3.2 News 59 3.3.3 Price response 61 3.3.4 Review 63 3.4 Implementation 65 3.5 Parameter space exploration 65 3.5.1 Number of agents N 65 3.5.2 Total cash C and total shares S 65 3.5.3 Further scaling 68 3.6 Parameter tuning 69 3.6.1 News response 69 3.6.2 Friction 71 3.6.3 News interval 72 3.6.4 Finalized parameter ranges 72 Chapter 4 Analysis and Results: Phase space 74 4.1 CSEM phase space 74 4.1.1 Review 74 4.1.2 Data collection 74 4.1.3 Phases 75 4.1.4 Investment limit 77 4.1.5 Critical regime 79 4.1.6 Alternative thermodynamic variables 80 4.1.7 Finite size effects 82 4.1.8 Transient 87 4.1.9 Summary 88 4.2 DSEM phase space 90 4.2.1 Review 90 4.2.2 Data collection 92 4.2.3 Phases 92 4.2.4 Phase transition to H = 1 at rp = r1 96 4.2.5 Phase transition to H = 0 at rp = r2 97 4.2.6 Summary 101 4.3 Number of investors 101 4.3.1 Centralized Stock Exchange Model 103 4.3.2 Decentralized Stock Exchange Model 106 4.3.3 Summary 106 Chapter 5 Analysis and Results: Empirical results 108 5.1 Price fluctuations 108 5.1.1 Background 108 5.1.2 Alternative: Decaying power law 110 5.1.3 Methodology 111 5.1.4 Centralized stock exchange model 114 5.1.5 Decentralized stock exchange model 116 5.1.6 Summary 122 5.2 Price autocorrelation 125 5.2.1 Background: The efficient market hypothesis 125 5.2.2 News 125 5.2.3 Short timescales 126 5.2.4 Long timescales 127 5.3 Volatility clustering 132 5.3.1 Shuffling 133 5.4 Scaling and Clustered volatility 134 5.5 Wealth distribution 135 5.5.1 Challenges 135 5.5.2 Log-normal
distribution 135 5.5.3 Two-point price response 136 5.6 Summary 138 Chapter 6 Experiments with a hypothetical portfolio 140 6.1 Motivation 140 6.2 Choice of companies 141 6.3 Friction 142 6.3.1 Minimum friction 142 6.4 FIS Experimental results 144 6.4.1 Events 144 6.4.2 Performance 145 6.5 Log-periodic precursors 150 6.5.1 Scale invariance 150 6.5.2 Discrete scale invariance and complex exponents 150 6.5.3 Log-periodic precursors 151 6.5.4 Critical points 151 6.5.5 Application to financial time series 152 6.5.6 Experimental design 153 6.5.7 Results 155 6.5.8 Universality of scaling ratio 158 6.6 Summary 158 Chapter 7 Concluding remarks 160 7.1 Review 160 7.1.1 Anomalous market properties 160 7.1.2 Centralized stock exchange model 161 7.1.3 Decentralized stock exchange model 161 7.1.4 Fixed investment strategy 162 vii 7.1.5 Log-periodic precursors 163 7.2 Conclusions to be drawn from this research 163 7.3 Relation to other work in the field 164 7.4 Avenues for further work 165 Bibliography 167 Appendix A Discounted least-squares curve fitting 177 A . l Least-squares curve fitting 177 A.2 Discounting 178 A.3 Storage and updating 180 A.4 Memory 182 A.5 Unknown measurement errors 183 A.6 Forecasting 183 A.6.1 Unknown measurement errors 184 A . 7 Summary : 185 Appendix B Sampling discrete processes 186 B . l Simple random walk 186 B.2 Poisson Brownian motion 190 B . 3 Sampling 191 Appendix C Long-range memory: The Hurst exponent 193 C l Synthesis 197 C. 2 Analysis 199 C.2.1 Dispersional analysis 200 C.2.2 Scaled Window Variance analysis 201 C.2.3 Levy Flight 202 C.3 Conclusions 205 viii List o f Tables 2.1 All parameters and variables used in the Centralized Stock Exchange Model (CSEM) 26 2.2 Parameter values for CSEM Runs 1, 2 and 3 29 2.3 Parameter values for CSEM Runs 4 and 5 32 2.4 Parameter values for CSEM Run 6 35 2.5 Regression analysis of log w versus agent parameters for different sam-ples of Run 6 (Table 2.4). The symbols indicate the sign of the regression-line slope, or zero if it is insignificant (relative to its stan-dard error). The results indicate that a is positively correlated with wealth but ae, M and d are largely irrelevant 36 2.6 Parameter values for CSEM Run 7 37 2.7 As Table 2.1 except with updated parameter ranges. These ranges will be used in subsequent simulations 39 3.1 All parameters and variables used in the Decentralized Stock Ex-change Model (DSEM) 64 3.2 Parameter values for DSEM Runs 1, 2 and 3 66 3.3 As Table 3.1 except with updated parameter ranges. These ranges will be used in subsequent simulations. All parameters except N and rp are firm 73 4.1 Parameter values for CSEM Dataset 1. Some of the parameters were established in Chapter 2 and are common to all the runs. Dataset 1 explores two dimensions of phase space: N and ae 75 4.2 Parameter values for CSEM Dataset 2. These runs are a variation of Dataset 1 (all unspecified parameters are duplicated from Table 4.1, N — 100) exploring a range of investment limits 5 77 4.3 The threshold values of a£ separating the two phases of CSEM shown in Fig. 4.3 do not appear to depend on the investment limit 5 78 ix 4.4 Parameter values for D S E M Dataset 1. Some of the parameters were established in Chapter 3 and are common to all the runs. Dataset 1 explores two dimensions of phase space: N and rp 91 4.5 Parameter values for D S E M Dataset 2. 
These runs are a variation of Dataset 1 (all unspecified parameters are duplicated from Table 4.4) exploring a few other intermediate system sizes 98 4.6 Parameter values for C S E M Dataset 3. These runs are a variation of Dataset 1 (all unspecified parameters are duplicated from Table 4.1) with many agents N = 10,000 103 5.1 Parameter values for D S E M Dataset 3. These runs are a variation of Dataset 1 (all unspecified parameters are duplicated from Table 4.4) run out for longer times (roughly 80 years). Also notice that for rp — 0.99 the news response was reduced by an order of magnitude to keep the price within reason 116 5.2 Parameter values for D S E M Dataset 4. These runs are characterized by a two-point distribution of the price response. Each agent chooses tp — rio or Thi with equal probability. (All unspecified parameters are duplicated from Table 4.4.) 119 5.3 Linear correlation analysis between said variable and the logarithm of the characteristic return from D S E M Dataset 4. The correlation is strongest with the upper limit of the price response 121 6.1 Initial holdings of a hypothetical portfolio on January 4, 1999. . . . 141 6.2 Events relating to the hypothetical portfolio which occurred during the course of the experiment 144 6.3 Final holdings of a hypothetical portfolio on May 12, 2000 147 6.4 First four moments of the distribution of log-returns for each stock, the two trading strategies under review and the Nasdaq Composite Index. The skewness characterizes the asymmetry of the distribu-tion and the kurtosis indicates the presence of outliers. The average skewness is not found to be significant but the kurtosis is 148 6.5 Average values and standard deviations of the daily portfolio returns (wt/wt-i — 1) for all data and separately for days a crash was fore-casted and not forecasted 157 6.6 Same as Table 6.5 except only including data up until the observed decline on Apr i l 14 157 x List of Figures 1.1 Sample phase transitions. A first-order transition (left) is character-ized by a discontinuity in the order parameter, while a second-order (critical, right) is discontinuous in the first derivative 4 2.1 The exponential utility function defined in Eq. 2.4 is often applied in finance. The goal wealth parameter wgoai implicitly sets the risk aversion 14 2.2 Demonstration of forecasting via polynomial curve extrapolation. Shown are forecasts produced by a simple moving average and a linear trend. The linear trend is able to anticipate reversals in returns 18 2.3 The expected initial trading price depends only on the risk aversion multiplied by the uncertainty of returns, aae. As the aversion or uncertainty increases the initial value of the stock drops 24 2.4 Comparison of time evolutions of (a) price and (b) volume for Runs 1, 2 and 3 as defined in Table 2.2. The price scales as the ratio of cash to shares and the volume scales as the number of shares. (In both plots Run 2 is offset to improve readability.) 30 2.5 Comparison of time evolutions of price for Runs 4 and 5 as defined in Table 2.3. The price is not perfectly invariant under rescalings which preserve the constant aae 33 2.6 Price history generated by C S E M with parameters listed in Table 2.4 (Run 6). The price almost reaches its theoretical maximum of $999 (see Eq. 2.44) before collapsing. The agent state variables were sampled at the times indicated 35 2.7 Plot of agent wealth versus (a) risk aversion and (b) forecast error. 
The best fit lines have slopes 5.2± 1.4 (positive correlation) and 4.7 ± 8.8 (no correlation), respectively 38 xi 2.8 Price history generated by C S E M with parameters listed in Table 2.6 (Run 7). The series has the undesirable property that the price spends much of its history at or nearing its ceiling ($999) 39 2.9 Plot of agent wealth versus (a) cash and (b) shares held showing that most agents hold extreme portfolios of maximum cash and minimum shares, or vice versa. It appears that the method of calculating the investment fraction in C S E M (Eq. 2.19) is too sensitive to fluctua-tions. (It should be acknowledged the plots are truncated since the lowest wealth actually extends down to 1 0 - 2 5 , an unrealistic quantity since real money is really discretized with a minimum resolution of one penny.) 42 3.1 The fixed investment strategy specifies how many shares to trade at a price p given an investment fraction i and an ideal price p* (from Eq. 3.14). As the current price drops toward zero the fractional change in shares diverges 50 3.2 Call orders p„ are placed at either the current price p or the limit prices pg or p$, whichever is better. The spread between the limits increases with friction / 52 3.3 "Buy" and "Sell" call orders are modeled as independent Poisson processes with price-dependent rates. As the last trading price in-creases, the probability of a "Sell" order being called becomes much more likely than a "Buy." 55 3.4 Comparison of time evolutions of (a) price and (b) volume for Runs 1, 2 and 3 as defined in Table 3.2. The price scales as the ratio of cash to shares and the volume scales as the number of shares. (Run 2 is offset to improve readability. The gaps in (b) denote periods of zero volume.) 67 3.5 The price series generated by Run 1 is compared with the expected price generated by Eq. 3.70, showing rough agreement (though with systematic deviations) 70 4.1 The price series plots for C S E M with N — 100 agents and a€ — 0.10 (a) and at — 0.05 (b) indicate a change of character of the dynamics. 76 4.2 The highest price in any given simulation increases as the forecast error decreases until it reaches its theoretical limit, creating two sep-arate phases for the dynamics 78 xn 4.3 The maximum price in CSEM has a limit which depends on the in-vestment limit 6. However, the threshold value of cr£ for which the limit is first reached does not appear to depend on 6 79 4.4 The best fits of power laws to CSEM Dataset 2 (and N = 100 from Dataset 1) yield the critical points (a) and scaling exponents (b) shown. The lines represent the weighted averages of the best fit values. 81 4.5 The average price (a) and variance of fluctuations (b) also exhibit scaling near the critical point ac = 0.12 for the data from CSEM Dataset 2. The deviation from scaling observed near the critical point in (b) is due to the finite size of the system (N = 100) as will be seen in Fig. 4.7 83 4.6 The best fits of power laws to CSEM Dataset 1 yield the critical points (a) and scaling exponents (b) shown. A finite-size scaling analysis (neglecting N — 50) reveals information on how the critical point changes with increasing investor numbers (a). For reasons discussed in the text, the exponent for N = 1000 is dropped from the estimate of the scaling exponent (b) 84 4.7 The variance of the log-price largely collapses to a single curve when multiplied by the system size N for CSEM Dataset 1. 
This curve diverges as the critical point is approached with an exponent 7 = 1.29±0.02 calculated from the largest system N = 1000. (The critical points were taken from Fig. 4.6(a).) 86 4.8 Duration of the transient period in CSEM (Dataset 1) before the price series settles down to some steady-state value. The transient grows near the critical point ac ~ 0.08 89 4.9 The maximum transient in CSEM appears to scale with the system size with an exponent 1.5 ± 0.1 90 4.10 Sample price series for DSEM with N = 100. Negative values of rp (a) produce an anticorrelated series while positive values (b) result in positive autocorrelations 93 4.11 The Hurst exponent increases with rp in DSEM as expected but with two surprising phase transitions emerging at larger system sizes: one near rp ^ —0.4 and the other near rp « 1 95 4.12 The best fits of power laws to DSEM Dataset 1 yield the scaling exponents shown. The average exponent is 0.185 ± 0.016 96 4.13 Sample fits of first- and second-order phase transitions to N — 500 (a) and N — 1000 (b) near r2 in DSEM show that the power-law fits better for small N but the first-order prevails for larger systems. . . 99 xiii 4.14 As the system size JV increases the critical exponent b tends to zero. The line represents a power-law fit 6 oc Nm giving an exponent m = -0.25 ±0.04 100 4.15 The price series (a) and daily volume (b) of DSEM with JV = 200 and rp = —0.45 is a good example of intermittency. The dynamics fluctuate between two phases 102 4.16 The price series of CSEM for ot — 0.01 (a) appears unaffected by changing the number of agents JV. In particular, occasional semi-periodic fluctuations (b) are observed for all system sizes 104 4.17 The price series of CSEM for a£ = 0.15 exhibits smaller fluctuations and a lower mean as the system size increases. (The lower mean may simply be because the system has not reached a steady state yet.) . 105 4.18 In DSEM the price series does not get more regular as the system size is increased—in fact the fluctuation grow. This is especially true for rp = —0.75 (a) but it is also indicated to a lesser degree at rp — 0.90 (b) 107 5.1 Ten minute returns (86,000 data points) of the Swiss franc-U.S. dollar exchange rate [2] (negative tail) compared to power law with crossover to a ~ 3 (a) and power law with exponential drop-off presented in this section (b) 112 5.2 Both tails of the cumulative distribution of daily (normalized) returns for the Nasdaq Composite index between October 1984 and Jun 2000 (4,000 data points) fit well to a decaying power law. The power law is truncated by two standard deviations in the positive tail but extends almost to four in the negative tail 113 5.3 Scaling in the distribution of returns is only observed well below the critical point ae <C ac in CSEM as indicated by large values of the characteristic return rc. For small ae scaling occurs in both tails for daily returns but only for negative returns in monthly returns. . . . 115 5.4 For ae = 0.03 in CSEM (JV = 1000) the distribution of positive (monthly) returns (upper) almost converges to a Gaussian but still has a slightly heavy tail. 
The negative returns (lower), however, ex-hibit scaling for r < rc « 5.4 with an exponent a w 1.1 117 5.5 DSEM only begins to exhibit scaling, as measured by a characteristic return exceeding three standard deviations, for price responses well below the first-order transition r2 = —0.33 and as the price response approaches the critical point r\ — 1 118 xiv 5.6 The characteristic returns in D S E M with a two-point distribution of price responses (r[0 and r/jj) exceeds the required threshold of rc = 3 when rfii is large (a). Neglecting the dependence on r/ 0 (b) it be-comes clear that the characteristic return grows exponentially with the upper limit r^i, crossing the threshold 1 120 5.7 The characteristic return r c (a) and scaling exponent a (b) for D S E M with ri0 — 0.00 and r « = 1.25. The characteristic return grows as the sampling interval is shortened, but the scaling exponent a is fairly constant (1.55 ±0 .11) 123 5.8 Fitting the decaying power law to D S E M with a two-point price re-sponse using returns on individual trades (rather than per unit time, as in Fig. 5.6) shows scaling still occurs in the same region of param-eter space 124 5.9 Sample price series for D S E M Dataset 4 (r/0 = 0.5, = 1.5) showing the price roughly tracks the exponential of the cumulative news e77. The proportionality constant is estimated from the data 126 5.10 The autocorrelation between daily returns for the Dow Jones Indus-trial Average [3] decays rapidly to zero with an estimated character-istic timescale T c = 0.4 ± 0.2 days. (Being less than the sampling interval, this estimate is not precise.) 127 5.11 The autocorrelation between tickwise returns for D S E M (with a two-point price response distribution) decays rapidly to zero for all runs sampled 128 5.12 Sample dispersion plot (see Section C.2.1) demonstrating the phe-nomenon of crossover in the Hurst exponent to H 1/2 on long timescales for D S E M with a two-point price response distribution. . 129 5.13 A reproduction of Fig. 5.12 except with regularly sampled returns at an "hourly" interval (instead of tickwise). Short timescale anticor-relations crossing over to uncorrelated returns at long timescales are still observed so the effect is not an artifact of sampling tickwise. . . 130 5.14 Both the crossover point, or memory, (a) and Hurst exponent for short timescales (b) indicate that memory effects are minimized when flo > 0-25 in D S E M with a two-point price response distribution. (The high values of the Hurst exponent for r\0 > 0.5 (b) do not cause problems because the memory is very short in this region (a).) . . . 131 xv 5.15 The Hurst exponent of the absolute returns, which measures the de-gree of clustered volatility, is strictly greater than one half for all parameter combinations in DSEM. It is particularly high when the upper limit of the two-point distribution r^i is large or when the lower limit r/ 0 is small 133 5.16 D S E M Dataset 4 (N — 100 agents) is able to capture three important properties observed empirically when r j 0 > 0.35 and > 1.25. The curves are contours from previous plots: (1) characteristic return rc — 3 from Fig. 5.8 (solid line); (2) memory in return series = 100 from Fig. 5.14(a) (dashed line); and (3) Hurst exponent for the absolute returns H = 0.6 from Fig. 5.15 (dotted line) 134 5.17 Sample distribution of agents' wealth from D S E M Dataset 3 (N = 100, rp = —0.50). 
There is insufficient data to distinguish between a normal and a log-normal distribution 136 5.18 Sample distribution of agents' wealth from D S E M Dataset 4 (N = 100, ri0 = 0, r/jj = 1). The log-normal curves are calculated from each sub-population, revealing a strongly bimodal nature 137 5.19 In D S E M with a two-point price response the wealth of each of the sub-populations w(rp) depends strongly on the magnitude of the price response \rp\. The population with the smallest absolute price re-sponse (r/jj to the left of zero and r\0 to the right) consistently has more wealth as indicated by the ratio of wealth between the two sub-populations 137 6.1 Historical wealth using FIS versus (a) the Buy-and-Hold strategy and (b) the Nasdaq Composite Index over the same interval (rescaled to be equal at the start of the experiment) 146 6.2 Histograms of log-returns of capital rt+i = \og(wt+i/wt) for both strategies. Notice BHS exhibits more large fluctuations (fatter tails) than FIS 148 6.3 Sample fit of Eq. 6.29 to portfolio wealth on May 12, 2000. The best fit parameters indicate a crash is anticipated on or around tc =July 4, 2000 155 6.4 Daily wealth returns (wt/wt-i — 1 ) are shown along with the dates forecasted to crash in (a). The qualities of the curve fits corresponding to the forecasted crashes, which suggest the reliability of the predic-tions, are shown in (b) 156 A . l Comparison of weightings using standard and discounted windows. . 179 xvi A . 2 Discounted least-squares fitting has a computational storage advan-tage over moving windows of JV data points when JV > M2 + M + 2 where M is the number of parameters to be fitted 182 B . l When a random walk is generated at some regular interval and sam-pled at another, A , the number of jumps between samples wil l vary. 187 B.2 The kurtosis is only zero at integer values of the sampling interval A and diverges as the sampling interval approaches zero 188 B.3 The distribution of increments for the random walk appears to have fatter tails than a normal distribution with the same variance when sampled at intervals of A = 1.05. However, the tails still drop off as e~x2 189 B.4 Discrete Brownian motion with Poisson-distributed jump intervals has tails which fall off exponentially (with a decay constant of 0.72), instead of as e x *, when sampled at regular intervals (A = 1) 191 C l Sample fractional Brownian motion time series with different Hurst exponents: antipersistent H = 0.1 (top) has negative long-range cor-relations, uncorrelated H — 0.5 (center) is standard Brownian mo-tion, and persistent H = 0.9 (bottom) has positive long-range corre-lations 195 C.2 Power spectral densities for the fractional Brownian motion time se-ries shown in Fig. C l . The points are from finite samples of 1000 points each and the line represents the theoretical spectrum. For low frequencies the power spectrum is well approximated by a power law l/f2H+1 196 C.3 Scaled window variance analyses for the fractional Brownian motion time series shown in Fig. C l (exact H—0.1, 0.5, and 0.9, respec-tively). The estimated values of H shown represent the best fit slopes of the lines. The analysis used M m j n = 4 (see the text) 203 C.4 Comparison of Hurst estimators using synthetic datasets of 1000 points each. The scaled-window variance method (SWV, *) per-forms significantly better than rescaled range analysis (R/S, +) and marginally better than dispersional analysis (Disp., x). (The points are offset slightly to improve readability.) 
204 C.5 Comparison of Hurst estimators on uncorrelated Levy flight with characteristic exponent a using synthetic datasets of 1000 points each. Rescaled range (R/S, +) and dispersional analysis (Disp., x) perform well but scaled window variance analysis (SWV, *) performs poorly, especially for small a, tending towards the 1/a curve. (The points are offset slightly to improve readability.) 206 C.6 Schematic representation of relation between fractional Brownian motion and Levy flight. Traditional Brownian motion sits at the intersection (H = 1/2, a = 2). The natural extension into the two-space is fractional Levy motion which has correlated, non-Gaussian increments. 206

Acknowledgements

I would first like to thank my supervisor, Dr. Birger Bergersen, for his help and financial support over the last many years. His remarkable ability to always keep both the "forest" and the "trees" in focus has been an inspiration to me. I shall always try to do the same. I am also particularly appreciative of the many helpful discussions with Dr. Alan Kraus of the Faculty of Commerce at UBC, whose input has been invaluable. Thanks also to Casey Clements whose curiosity provided the inspiration for one of the models.

I gratefully acknowledge the financial support, through graduate fellowships, of the Crisis Points Group, Peter Wall Institute of Advanced Studies. This unique interdisciplinary group has also afforded me the opportunity to learn about complex systems in many diverse fields. I would especially like to thank the group's coordinator, Dr. Cindy Greenwood, for her helpful ideas and comments.

I would also like to thank a number of software developers whose free code segments were incorporated into my simulations: Grahame Grieve, Lucian Wischik, Fedor Koshevnikov, Igor Pavluk, Serge Korolev, and Troels Skovmand Eriksen...thanks!

Finally, I am greatly indebted to my wife and friend, Kathy, for always supporting me in my studies (even when it wasn't clear where they would lead) and for taking an interest in my work. You always let me follow my dreams; now let's find out where your dreams lead.

HENDRIK J. BLOK
The University of British Columbia
November 2000

Chapter 1 Introduction

1.1 Financial markets

Financial markets include stock markets, such as the New York Stock Exchange (NYSE), which deal in ownership shares of publicly-owned companies. Companies owned privately can raise equity capital through an initial public offering (IPO) which releases part-ownership to the public. When the public stockholders wish to sell some of their shares they do so on a financial market. The market typically charges the company a fee to list it and may require the company to meet certain standards in order to protect investors.

On the NYSE, trades are handled by a restricted number of brokers who are governed by the market's rules and regulations. Brokers receive trading orders—consisting of a quantity of shares to trade and (optionally) a price—from the public and bring them to specialists who deal only with specific stocks. The specialist's role is to compare the highest bid (buy order) price with the lowest offer (sell order) price and if they meet, execute the trade. At the beginning of trading each day the specialist also finds a fair market price for a stock by balancing the outstanding supply (total offers) with demand (total bids). Although actually more complicated, for our purpose this is a sufficient explanation of the specialist's role.
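To make the matching rule concrete, the following minimal sketch (in Python, not part of the thesis) executes a trade whenever the highest outstanding bid meets or exceeds the lowest outstanding offer. The sample orders and the choice of executing at the midpoint of the two prices are purely illustrative assumptions.

```python
# Hedged sketch of the specialist's matching rule described above.
def match_orders(bids, offers):
    """bids/offers: lists of (price, shares); returns a list of (price, shares) trades."""
    bids = sorted(bids, key=lambda o: -o[0])      # highest bid first
    offers = sorted(offers, key=lambda o: o[0])   # lowest offer first
    trades = []
    while bids and offers and bids[0][0] >= offers[0][0]:
        (pb, qb), (po, qo) = bids[0], offers[0]
        q = min(qb, qo)
        trades.append(((pb + po) / 2, q))         # execute at the midpoint (an assumption)
        bids[0], offers[0] = (pb, qb - q), (po, qo - q)
        if bids[0][1] == 0:
            bids.pop(0)
        if offers[0][1] == 0:
            offers.pop(0)
    return trades

print(match_orders([(10.5, 100), (10.0, 50)], [(10.25, 80), (10.75, 40)]))
# one trade of 80 shares near 10.375 in this toy example
```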
1.2 Motivation for research

Neglecting dividends a company may pay to its shareholders, investors make money on the market by buying stocks at low prices and selling them at higher prices. But given identical (publicly available) information one would expect (similar) investors to have the same prediction for how the price would move, and they should all place similar orders. For example, if Betty hears that company XYZ has discovered oil, she may well expect the company's stock price to climb and so she places a bid order. The trade will not be exercised, however, until another investor, say Sam, offers to sell his shares in XYZ. But the question is then raised in Betty's mind, "Why does Sam want to sell?" Is he being irrational? Did he miss the good news? Does he know something Betty doesn't?

Conventional wisdom assumes Betty and Sam have slightly different expectations about the future price of XYZ, perhaps due to imperfect information. Thus the market dynamics are driven by random fluctuations. But this assumption leads to predictions that price fluctuations should be normally distributed (or log-normally) and that trading volume should be low and steady. What is actually observed on all markets is bursts of activity with very high volume and/or extremely large fluctuations in price which occur much more frequently than the conventional wisdom can account for. The reason for these bursts has not been established and is an interesting topic of research.

1.2.1 Motivation for the physicist

Statistical physicists and condensed matter theorists have developed a significant arsenal of tools for analyzing many-particle systems with strong, localized interactions. Methods such as mean-field theory, the renormalization group, and finite-size scaling analysis allow physicists to explore complex, irreducible systems such as spin glasses (highly disordered magnetic systems) where the important details are in the interactions between the particles, rather than the individual particles themselves.

Recently, physicists have realized that the methods developed above may also be useful for non-physical systems such as ecological and social systems. The leap of faith required is the assumption that it is not necessary to fully understand the individuals in the system themselves (their motivations, for instance), but only to the point that one can construct reasonable rules for the interactions between individuals. Whether this leap of faith is justified remains an open question but interest is mounting within the physics community in complex, socio-economic systems like the stock market. In 1995 the Los Alamos National Laboratory (LANL) condensed-matter preprint archive (http://arXiv.org) accepted three papers containing the word "market" in their abstracts (a check was done confirming they were all finance-related), representing 0.16% of the submissions that year. Since then it has doubled every year through 1999, when fifty of the 5,490 (0.91%) submissions were market-related. (The foray of physicists into economics has come to be known as Econophysics.)

Phase transitions

Together with the analytic tools physicists bring to the subject, they also bring a fresh perspective and new questions. For instance, it has been empirically observed that price fluctuations exhibit scaling [4-7], meaning the fluctuations appear invariant under a change of scale, over orders of magnitude from a few minutes to a few days.
Scaling is characterized by power-law distributions which are very familiar to physicists because they occur near second-order or critical phase transitions. Phase transitions, in the context of thermodynamics, are well understood phenomena. The terminology of order parameter, a dependent variable which undergoes a "sharp" change, and control parameter, the variable which is smoothly adjusted to produce the change, is used to quantify the transition.

In the case of first-order transitions (such as melting) the order parameter undergoes a discontinuity—it jumps to a new value. The jump is accompanied by an absorption or liberation of energy (latent heat). Usually, fluctuations within the substance can be ignored for first-order transitions.

To demystify the above definitions, consider a pot of water boiling at 1 atmosphere of pressure and 100°C. If we choose temperature as the control parameter then density could play the role of the order parameter. Below 100°C water is a liquid with a relatively high density. Above this point, all the water is in the form of steam which has a significantly lower density. At the transition we observe fluctuations in the form of small, uniform steam bubbles. This is a first-order transition.

In contrast, second-order, or critical, transitions are characterized by a discontinuity in the derivative of the order parameter (see Fig. 1.1). In fact, the derivative diverges at the critical point. Further, near the transition, properties are dominated by internal fluctuations on all scales. For example, let us revisit our pot of boiling water. We raise the pressure to 218 atm and the temperature to 374°C (the critical point of water). Again, we observe steam bubbles but in this case the bubbles exist on all scales—from microscopic to the size of the pot itself [8]. Also, the density (but not its derivative) is continuous across the transition.

Figure 1.1: Sample phase transitions. A first-order transition (left) is characterized by a discontinuity in the order parameter, while a second-order (critical, right) is discontinuous in the first derivative.

Near a critical point, many thermodynamic properties obey diverging power laws. Early studies of critical phenomena revealed that the characteristic exponents for the power laws clustered around distinct values for a variety of systems. This suggested that some of the features of separate systems were irrelevant—they belonged to the same universality class. Some of the irrelevant variables in a universality class are usually the type of local interactions, the number of nearest neighbours, et cetera. On the other hand, dimensionality and symmetry, for example, appear to be relevant variables—that is, change the number of dimensions or the symmetry laws and the system changes its class (or may even cease being critical). Critical systems with the same relevant variables but different irrelevant variables have the same critical exponents and are said to belong to the same universality class.

Returning to our discussion of markets, the estimated power law exponent in the financial data seems to exhibit universality. That is, the exponents seem to be similar for a number of different markets and stocks and they also seem not to change over time [6,7,9,10]. This evidence suggests that markets operate at or near a dynamical critical point as studied by physicists.
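As an illustration of how such a tail exponent might be estimated in practice, the following sketch (not taken from the thesis) applies the standard Hill estimator to a synthetic Pareto-distributed sample. With real data the synthetic sample would be replaced by normalized returns, and the choice of tail size k is an assumption.

```python
# Hedged sketch: Hill estimate of a power-law tail exponent on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
alpha_true = 3.0                                  # assumed tail exponent for the demo
u = rng.random(100_000)
sample = u ** (-1.0 / alpha_true)                 # exact Pareto sample, P(X > x) = x**-3

def hill_estimator(x, k):
    """Tail-exponent estimate from the k largest observations."""
    tail = np.sort(x)[-k:]                        # k largest values, ascending
    return k / np.log(tail / tail[0]).sum()       # threshold = k-th largest value

print(round(hill_estimator(sample, 1000), 2))     # prints a value close to 3
```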
Self-organized criticality

To a physicist, the question of whether the market operates at a critical point is especially interesting. The traditional theory of critical phenomena states that a system will approach a critical point via deliberate tuning of the control parameter. In the above example, by adjusting both the temperature and pressure, water was brought to its critical point. This description does not seem to apply to markets, however. The rules governing market dynamics were not chosen in order to put the market in a critical state. In fact, there does not appear to be any analog for temperature which could be used to explain why the market might be at a critical point. If it is critical, it appears to have arrived there spontaneously, without any tuning of a control parameter. This phenomenon has come to be known as self-organized criticality (SOC) and was originally proposed as a possible explanation for scaling in many natural phenomena [11,12].

The canonical example of SOC is a pile of sand to which grains are added very slowly. As each grain is dropped it may cause the local slope of the pile to exceed a threshold and collapse, dispersing grains within a local neighbourhood. These grains may cause further instabilities, producing a cascade reaction. Measuring the total effect of dropping each grain yields a power-law distribution of avalanche sizes, indicating the presence of a critical point. The criticality is said to be self-organizing because it emerges spontaneously from the simple process of dropping grains periodically.

In some cases, SOC can be mapped back onto traditional criticality by a separation of timescales: systems which respond quickly to very slow driving forces are candidates for SOC. In particular, the sand pile model described above qualifies for this mapping because the driving force (dropping of grains) is much slower than the duration of the avalanches [13]. More generally, the appearance of SOC can be an artifact of how the system is constructed. Some natural choices of parameter values (such as an infinitesimal driving rate, as discussed above) automatically lead to dynamics which can be critical or very nearly so. Traditional criticality is only revealed when the parameter is manually varied [14] (for example, by increasing the rate at which sand is added to the pile). Whether the markets operate at a critical point and, if so, how they develop towards and maintain this state is of interest to physicists.

1.3 Anticipated challenges

Although synthetic constructs, the markets are difficult to study scientifically for many of the same reasons as natural phenomena. Firstly, stocks are strongly coupled to each other and to other systems, both natural and man-made. For example, an earthquake in Taiwan on September 20, 1999, which cut off electrical power at Taiwan Semiconductor Manufacturing (TSM) and significantly disrupted production, had only a minimal impact on the company's stock price. However, their South Korean competitors' stock prices soared in anticipation of increased demand.

This highlights the second challenge in studying the market: investors' responses (and hence stock price fluctuations) to incoming news can be strongly nonlinear. A company's quarterly forecast of a loss of ten cents per share could conceivably have much more than double the impact of five cents. The precise response function (if one exists) is unknown.

Thirdly, the impact of an exogenous event may be practically, or even theoretically, unquantifiable.
Investors may receive imperfect information and/or the necessary calculations to assess the impact of the news on a stock's price may be too complex, beyond the rational abilities of the investor.

Lastly, only some of the information which drives investors' actions is broadcast to all. The rest (a rumour, for instance) is transmitted through a complex network of friends, families, and co-workers. It is not clear if this information can be neglected and, if not, how the network is to be represented, structurally.

1.4 Modeling

The natural sciences are well acquainted with these types of challenges and their reaction is to study the system in two ways: first empirically, then with an idealized representation. Empirical analysis is the first and best way to understand the world around us. By collecting data and studying statistical properties thereof we can learn about the underlying distributions governing many phenomena. Then, once sufficient empirical data have been collected, idealized models may be constructed to try and account for the data.

A vast store of financial market data is available. For instance, precious metal price data are on record all the way back to the 1200s [15]. A large number of these data sets have been analyzed and the results indicate that large market fluctuations (outliers) occur much more frequently than would be expected (the frequency distributions exhibit fat tails) and, unexpectedly, fluctuations occur in clustered bursts of volatility rather than uniformly. This thesis will not focus on empirical analysis of financial data, rather relying (mainly) on these published results.

Instead, this thesis will focus on idealized representation or modeling of financial markets. Social scientists have developed simple analytic models of the stock market. For tractability they assume a small number of investors who have perfect rationality (unlimited computational power) and complete information [16-18]. These models are interesting to economists because they can explain equilibrium stock prices [19]. However, they are uninteresting to the physicists for precisely the same reason—since they are equilibrium models they fail to exhibit fat-tailed fluctuation distributions or clustered volatility.

1.4.1 Computer simulations

My hypothesis is that the complex dynamical behaviour of the stock market is an emergent property arising from the interactions of many agents and is largely independent of the complexity of the agents themselves. In order to test my hypothesis I will construct some simple models meant to capture the essence of the stock market and study them experimentally via computer simulation.

Computer simulation is necessary because many-agent models are impractical or even impossible to analyze by hand—the number of interactions which need to be accounted for typically grows as the square of the number of agents. (Even the simplest models tend to be too complex for an analytic treatment.) So we turn to computers, which are capable of performing millions of calculations rapidly, with no (significant) errors.
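The two empirical signatures referred to above, fat tails and clustered volatility, are commonly quantified by the excess kurtosis of the returns and by the autocorrelation of their absolute values. The following sketch (an illustration only, not code from the thesis) computes both for a synthetic Gaussian series, where the two measures should be near zero; empirical return series typically give clearly positive values.

```python
# Hedged sketch: simple diagnostics for fat tails and volatility clustering.
import numpy as np

def fat_tail_and_clustering(r):
    """Excess kurtosis (fat tails) and lag-1 autocorrelation of |r| (clustering)."""
    r = np.asarray(r, dtype=float)
    z = (r - r.mean()) / r.std()
    kurt = (z ** 4).mean() - 3.0                  # zero for a Gaussian
    a = np.abs(r)
    a = (a - a.mean()) / a.std()
    acf1 = (a[:-1] * a[1:]).mean()                # near zero if volatility is unclustered
    return kurt, acf1

rng = np.random.default_rng(1)
print(fat_tail_and_clustering(rng.normal(size=50_000)))   # both close to zero
```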
There are many objections to working with computer simulations but some of these apparent shortcomings are actually advantages. For instance, it is impossible to construct a many-agent simulation with perfect rationality and complete information: each agent's expectations are formed on the basis of every other agent's expectations, which are formed on the basis of every other agent's expectations, ad infinitum. (In some cases this infinite regress can be collapsed and solved.) Besides being impossible to incorporate into a simulation, I hope the reader will agree this is an unrealistic account of investor behaviour.

Simulation may also seem inappropriate because, to develop a stochastic model, random events must be incorporated but computers are incapable of generating truly random numbers. As an alternative, a number of algorithms have been constructed to produce pseudo-random numbers which pass all known statistical tests for randomness [20-22]. However, these generators still require a seed from the user—a random, initial number to begin the sequence. This flaw can often be a blessing because it offers replicability in one's experiments—by seeding the simulation with the exact same number as a previous iteration, the entire time series can be reproduced. (To generate independent time series different seeds are used.)

Finally, market micro-simulation may be objected to because the events (for example, news releases) which drive the dynamics must be explicitly coded into the simulation. As discussed above, these events are often not even quantifiable and, hence, can not be accurately coded. However, turning this argument around, this is yet another advantage of the simulation methodology. Any number of alternate hypotheses of the structure of the driving events can be encoded and their impact on the dynamics tested experimentally. One of the interesting questions this thesis will address is "How complex does the input (news) need to be to produce realistic output (price fluctuations)?"

1.4.2 An appeal for simplicity

A common temptation when constructing computer simulations is to try to capture as much detail as possible in order to make the simulation realistic. But there are a number of reasons the model should be kept simple.

Firstly, model complexity must be balanced against the constraints of current computational speed. Simple models require less computational power and produce larger datasets. Since large quantities of data are required to test the frequency distributions of rare events (such as price crashes), simpler is better for our purposes.

Secondly, as a model's complexity grows its capacity for being understood diminishes. Some global climate models (GCMs), for example, have reached sufficient complexity that the modelers specialize in only a particular subroutine of the model, such as cloud formation. Very few (if any) of the researchers have a full grasp of every detail of these simulations. A problem with this approach is that the model becomes as difficult to understand as the system it was meant to idealize—a problem known as Bonini's Paradox [23]. (Of course, GCMs are extremely useful for predictive purposes, but perhaps not for furthering scientific understanding.)

Thirdly, by starting with a trivial model and gradually adding layers of complexity, it is possible to determine the minimum requirements for a model which captures the essence of the system under investigation. In the case of the market model this could mean building on a simple model until fat-tailed distributions (for example) are observed in the price fluctuations. Then we can say with some confidence, "These ingredients are the minimum requirements to explain market fluctuations."

Finally, there is the issue of Occam's razor.
In the 1300s the Franciscan monk, William of Occam stated, "Causes are not to be multiplied beyond necessity" [24,25] or, to paraphrase, "The simplest explanation is best," guiding the course of science for centuries. Notice this claim is aesthetic, not epistemological—it does not claim that the simplest explanation is true, but simply to be preferred, at least until evidence comes to light which requires us to reject it. In Bayesian probability theory, Occam's razor has an even more precise role: given two theories which explain a phenomenon equally well, the one with fewer adjustable parameters is assigned a greater numerical likelihood [25, Ch. 24]. Similarly, we should construct models which contain as few parameters, or assumptions, as possible.

1.5 Organization of the thesis

In this thesis I will develop and implement via simulation two hypothetical models of stock exchange. An early model, which introduces the idea of a centralized market, will be described in Chapter 2, and a later model, which discards the centralized trading restriction, in Chapter 3. In Chapter 4 the phase space of these models will be explored, revealing some interesting phase transitions, including a critical point in either model. Then, in Chapter 5 experiments will be performed and the results of the two models will be compared with each other and empirical data. It will be discovered that the centralized model is incapable of generating the desired dynamics but the decentralized model can exhibit both fat tails and clustered volatility. Some interesting results of an experiment in investing, using a hypothetical portfolio, will be discussed in Chapter 6. The thesis will close with a discussion of some conclusions which can be drawn from the research and some ideas for future research.

Chapter 2 Centralized Stock Exchange Model

2.1 Inspiration

In this chapter we will explore the Centralized Stock Exchange Model (CSEM), a microscopic model which is built upon the premise of centralization; each agent on the market is restricted to trading with a single, monopolistic market maker who has complete control over the execution price. No direct trades between agents are allowed. This situation approximates some actual, thinly traded stocks on the New York Stock Exchange (NYSE) and other markets [26].

There are two reasons this approach was chosen: firstly, there exists a significant collection of literature following this methodology [27-36]. I hoped to familiarize myself with this literature by constructing a model along the same vein. Secondly, it allows for the construction of very simple agents. By having the trading price set exogenously the agents need only react rather than formulate their own trading schedules. In particular, the standard game theoretic approach is applicable only to reactive agents, as will be seen. Hence, the development of CSEM was a natural starting point for my research.

2.2 Theory

In this section the model's structure will be explained.

2.2.1 Assumptions

I will begin, for the sake of clarity, by laying out some common assumptions used in CSEM.

Heterogeneous agents

The market consists of many agents interested in trading. If all the agents had identical beliefs then we might expect their actions to be identical. Hence, we would effectively have a market of just one meta-agent unable to execute any orders. Similarly, if any subset of the population is homogeneous then that subset can be equally well represented by a single agent.
Therefore, it is natural to require that all the agents be unique. Notice that heterogeneity can arise from imperfect rationality or incomplete knowledge, qualities which seem reasonable for the simple agents which will be constructed. In most cases the agents will differ in fundamental parameters describing their preferences but transient differences alone (such as cash or shares held) may be allowed too, provided these factors influence the agents' actions.

Single risky asset and single riskless asset

For simplicity a market consisting of just one risky asset (public company stock) and one riskless asset (cash, for instance) will be used. The total number of shares available on the market will be conserved. Since the company pays no dividend the stock has no fundamental value and the stock price is maintained solely by expectations of satisfactory returns on the sale of shares. (The stock must at least have a chance of paying a dividend eventually or the stock price will be identically zero for all time, but the payout date is assumed to be far in the future.)

For simplicity, the stock price will be assumed to be a continuous variable. (In contrast, real stock prices are discretized, but on a sliding scale—dollar stocks usually have increments of one sixteenth of a dollar but penny stocks may be incremented by one tenth of a penny.)

The riskless asset (which we will call cash, though it could as easily represent some other stable equity such as gold) is defined to have a fixed intrinsic value in terms of which the value of the stock is measured. (By measuring all value in terms of cash some of the difficulties of utility theory in comparing utilities of disparate objects [37] are sidestepped.) The total cash in the market will also be conserved. To achieve this, cash will pay no interest and no commissions will be charged on trades.
If the share price is p then the agent's total capital at time t is wt — ct + ptst- W i t h interest and fluctuations in the stock price the agent's capital after one day (defined as one time unit) becomes and the trading behaviour reduces to an optimization problem with respect to the holdings St-If the agent could know what tomorrow's price of the stock pt+i will be in advance, finding the optimum strategy would be trivial: if pt+\ > pt then move all one's capital into the stock, otherwise move it all into cash. But of course the future price is unknown. Nevertheless each agent assumes it is a stochastic variable and wt+i = ct+pt+1st = wt + [pt+i - Pt] st (2.1) (2.2) 12 has some expectations of the underlying probability distribution, based on historical prices. A naive goal, then, might be to maximize one's expected future wealth (wt+i) with respect to one's current holdings St. Unfortunately, Eq. 2.2 simply tells us to invest all our capital into stock if (pt+i) > Pt and otherwise into cash, almost exactly as before. The problem with this approach is that it doesn't factor in risk. What if there was a non-zero probability that the stock price would crash Pr(pj+i = 0) > 0? Then, under repeatedly application of this strategy the agent would eventually lose all its wealth with certainty. Even if the price can't drop to zero (which it can't if there is any expectation of a non-zero price in the future) this strategy can perform poorly, particularly if the price is a multiplicative stochastic process [38] because it assigns disproportionate weights to extremely unlikely events which would have exorbitant payoffs. This strategy is said to be risk neutral. We define our agents as simple expected utility maximizers where the utility function is monotonically increasing with wealth but has a negative second derivative (concave) ^ > 0 , ^ < 0 . (2.3) aw dwz These requirements for a utility function are well established within financial eco-nomics [37,39] and basically mean that an agent is unwilling to make a "double-or-nothing" wager of any amount if the odds are even. (Notice that the risk neutral agent U = w would be ambivalent towards this wager and a risk preferring agent > 0 would willingly take the wager.) Exponential utility function An often-chosen form [19] is the exponential utility U(x) = —e~ax or equivalently (because utilities are defined only up to a linear transformation [37]) U(w)=wgoal(l-e-w'ws°*>) (2.4) where wgoai is called the goal wealth and sets a natural scale for the utility. As shown in Fig. 2.1, the utility crosses over from a linear dependence on w at small wealth U(w <S wgoai) « w to an asymptote at large wealth U(w » wgoa{) —> wgoai. The interpretation of wgoai as a "goal wealth" is justified because below wgoai the agent is willing to take risks for the chance of high payoffs but above wgoai it sees little reward in amassing greater wealth, being more concerned with maintaining its current level. 13 0 0 Wgoal wealth, w Figure 2.1: The exponential utility function defined in Eq. 2.4 is often applied in finance. The goal wealth parameter wgoai implicitly sets the risk aversion. 2.2.3 Optimal holdings The exponential utility function is useful because it provides an analytic solution to the maximization problem [16] if we assume tomorrow's wealth wt+i is Gaussian distributed (a reasonable assumption by the Central Limit Theorem, if it is a cumu-lation of many additive stochastic components). 
Then the expectation of the future utility is

    ⟨U(w_{t+1})⟩ = w_goal ( 1 - ⟨e^{-w_{t+1}/w_goal}⟩ )                                       (2.5)
                 = w_goal ( 1 - exp[ -( ⟨w_{t+1}⟩ - Var[w_{t+1}]/(2 w_goal) ) / w_goal ] )     (2.6)

which is maximized by simply minimizing the argument of the exponential. The future wealth depends on the price movement through Eq. 2.2 so the mean and variance become

    ⟨w_{t+1}⟩    = w_t + s_t ( ⟨p_{t+1}⟩ - p_t )                                               (2.7)
    Var[w_{t+1}] = s_t^2 Var[p_{t+1}].                                                          (2.8)

Eq. 2.6 can be maximized with respect to the free variable s_t, yielding the optimum quantity of shares to hold,

    s_t* = w_goal ( ⟨p_{t+1}⟩ - p_t ) / Var[p_{t+1}]                                            (2.9)

with the additional constraints s_t* ≥ 0 (no short selling) and w_t ≥ p_t s_t* (no borrowing cash). The agent's strategy is to sell shares if s_t > s_t* or buy if s_t < s_t*. The above equation is intuitively appealing: only hold shares if the expected return on your investment is positive, and decrease your investment when the uncertainty (variance) is large (indicating an aversion to risk).

2.2.4 Risk aversion

The goal wealth w_goal in the utility function sets an undesirable, arbitrary scale for the agents' behaviour: they will become increasingly risk neutral as their wealth falls far below this scale and, conversely, increasingly risk averse far above it. The arbitrary scale can be removed by setting the goal wealth proportional to the current wealth,

    w_goal = w_t / a                                                                            (2.10)

where a is a dimensionless constant which describes risk aversion (which increases monotonically with a).

Notice that introducing the dependence on the current wealth does not interfere with the optimization problem because w_t is a constant at any time t, independent of any changes in the portfolio s_t (assuming no trading costs). Therefore the optimal portfolio simply becomes

    s_t*(p_t) = w_t ( ⟨p_{t+1}⟩ - p_t ) / ( a Var[p_{t+1}] ).                                   (2.11)

From Fig. 2.1 it is clear that the extremes of intense risk aversion and risk neutrality can be avoided by choosing a on the order of unity. A rough estimate provides an even more precise scale: empirically, the market appears to prefer to divide wealth equally between cash and stock when the annual expected return is 8% better than cash with an uncertainty on the order of 25%:

    ⟨p_{t+1}⟩    ≈ (1 + 8%) p_t                                                                 (2.12)
    Var[p_{t+1}] ≈ (25% p_t)^2                                                                  (2.13)
    s_t*         ≈ w_t / (2 p_t)                                                                (2.14)

where t is scaled by years instead of days (but this does not interfere with the argument). The a-value which satisfies these conditions is a ≈ 2.5. Thus, the first agent-specific parameter introduced in CSEM is the risk aversion a, which is constrained to lie in a ∈ [1,3].

2.2.5 Optimal investment fraction

For ease of comparison with the Decentralized model (to be presented in Chapter 3) the above discussion will be presented in terms of the fraction of one's wealth invested in stock. The investment fraction i_t at time t is given by

    i_t = p_t s_t / w_t                                                                         (2.15)

and the optimal investment fraction is denoted by i_t*. Let us also define the return on investment from time t to t+1,

    r_{t+1} = (p_{t+1} - p_t) / p_t                                                             (2.16)

which has a mean and variance (given a known current price p_t)

    ⟨r_{t+1}⟩    = ( ⟨p_{t+1}⟩ - p_t ) / p_t                                                    (2.17)
    Var[r_{t+1}] = Var[p_{t+1}] / p_t^2.                                                        (2.18)

Then, substituting Eq. 2.11 into Eq. 2.15, we find that the optimal investment fraction is

    i_t* = ⟨r_{t+1}⟩ / ( a Var[r_{t+1}] )                                                       (2.19)

with the constraints 0 ≤ i_t* ≤ 1. This relation has some intuitively attractive properties:

1. All else being equal, given two agents with different risk aversions, the one with the higher aversion will invest less.

2. Only invest if the expected return is strictly positive, and invest in proportion to it.

3. As your certainty of a good return increases (variance decreases), increase your investment.
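The rule of Eq. 2.19 is simple enough to state in a few lines. The following Python sketch is illustrative only (the function name is an assumption); the example values are the rough calibration of Section 2.2.4.

```python
def optimal_investment_fraction(expected_return, return_variance, risk_aversion):
    """Eq. 2.19: i* = <r> / (a Var[r]), constrained to the interval [0, 1]."""
    if return_variance <= 0.0:
        return 1.0                              # degenerate case: no perceived risk
    i_star = expected_return / (risk_aversion * return_variance)
    return min(max(i_star, 0.0), 1.0)

# 8% expected annual return with 25% uncertainty and a = 2.5 (Section 2.2.4):
print(optimal_investment_fraction(0.08, 0.25 ** 2, 2.5))   # ~0.51, i.e. roughly half in stock
```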
However, it also has one glaring fault: when the expected return exceeds some limit, (rt+i) > a V a r [ r m ] (2.20) the recommendation is to invest all capital in the stock, despite risk. This arises because the agents assume the returns are Gaussian-distributed, with no higher moments than the variance, but as we will see, higher moments do exist, increasing the risk. 16 Investment l im i t To avoid complications of this kind a limit of 6 is imposed: the investment fraction is constrained to lie within i G [5,1 — S]. Hence agents never take an absolute stance of investing all their money or withdrawing it all from the market. The motivation for this restriction is purely mathematical: it prevents the occurrence of all the agents simultaneously selling all their stock and driving the price down to zero (or conversely, selling all their cash and driving the stock price up to infinity). As the population size increases, the probabilities of these events diminish simply as a result of fluctuations so the parameter S becomes less important. To minimize its impact on the dynamics it should be assigned a small, positive value. Mathematically, however, it is allowed to be as large as 1/2 in which case the investment fraction would be a constant 1/2, never responding to Eq. 2.19. Thus, the second agent-specific parameter is 6 which is constrained to lie within S G (0,0.5). 2.2.6 Forecasting With Eq. 2.19 the optimization problem becomes one of forecasting one's future return rt+\. In order to solve the optimization problem estimates of the expectation and variance of one's future return are required. The only information available to the agents is the history of returns so a reasonable choice is to try and extrapolate the series forward in time. Although more complicated forecasting algorithms involving nonlinearity and chaos exist [40-45], I chose to extrapolate a simple curve-fitting algorithm to produce forecasts. The goal of this model is not to test complicated forecasting models but to understand the effect of interactions between many simple investors, so the forecasting algorithm need only be adequate, not optimal. Linear least-squares curve fitting is well understood so we don't have to worry about it generating unexpected side-effects in the dynamics. The time series could be represented by a few parameters, one being the raw prices. However, a natural choice is the returns (as defined by Eq. 2.16) because a Gaussian-distributed future wealth wt+i was assumed. This assumption can be validated by assuming a Gaussian distribution for returns as well, because Eq. 2.2 can be written as w t + i = (1 - h)wt + kwtn+i (2-21) where the stochastic variable is the return rt+\. Since least-squares fitting assumes Gaussian errors, the returns are a convenient choice. Note that assuming a Gaussian 17 a 3 time, t Figure 2.2: Demonstration of forecasting via polynomial curve extrapolation. Shown are forecasts produced by a simple moving average and a linear trend. The linear trend is able to anticipate reversals in returns. distribution of returns is equivalent to a log-Brownian price series as is observed empirically on long timescales [46] (with interesting deviations on short timescales). For simplicity, only low-degree polynomials will be used as fitting functions. The degree zero polynomial, a simple moving-window average, is already robust enough to project exponential growth in the stock's price. 
Increasing to degree one (linear) also gives the agents the ability to forecast trend reversals (such as an imminent crash, as shown in Fig. 2.2) assuming the return history has meaningful trends. By choosing higher degree polynomials we can effectively make the agents smarter (better able to detect trends in the return series) but, in practice, it is unreasonable to go beyond a degree two, quadratic fit. If too high a degree is chosen agents begin to "see" trends where none exist by fitting curves to noise. Thus, the third agent-specific parameter, degree of fit d, is constrained to the integer values d = 0,..., 2. 18 Memory Obviously, as time progresses and the latest returns are acquired, the older data in the time series become irrelevant. The standard methodology for handling this is to set a finite moving-window which only keeps the M most recent data points, discarding the rest. Then the curve fitting is performed only with respect to the remaining data. However, this technique has a drawback: it suffers from shocks as outliers (strongly atypical data) get dropped from memory. To minimize this effect I constructed a method which uses an exponentially decaying window rather than the square window described above. The contribution of each point to the curve fit is weighted exponentially by how old it is. The technique is described in detail in Appendix A but a few points will be mentioned here: The exponential weighting is characterized by a single parameter, the mem-ory M (denoted by N* in Appendix A) indicating the effective number of data points stored, which is approximately the decay constant of the exponential. Using the exponential window allows compression of the data into just a few numbers regardless of the memory M and, as such, is computationally efficient in terms of storage and speed. An agent's memory also says something about its expectations. A short mem-ory produces fast responses to changes in returns and hence, more active trading. Conversely, a long memory results in slow variations in expectations and, therefore, slow changes in investment strategy. Hence, the memory implicitly also sets the (future) timescale, or horizon, over which the agent expects to collect. As with standard curve-fitting the parameter M is required to be greater than the number of parameters to be fit (= d + 1 where d < 2 so M > 10 (two trading weeks) is satisfactory) but there is no maximum value. But to draw parallels with real markets it is reasonable to choose scales on the order of real market investors. Many online stock-tracking sites allow one to compare a stock's current value to its moving average over windows up to 200 trading days (almost one year). Thus, the fourth agent parameter in C S E M is the memory M which is allowed to take on values in the range M G [10,200] (between two weeks and roughly one year). 2.2.7 Fluctuations To this point we have not explicitly identified the source of stochasticity. (Thus, since the simulation begins with no memory of any fluctuations no trading will oc-cur whatsoever.) To mimic the noisy speculation which drives movements in real markets, stochastic fluctuations are introduced into CSEM. The fluctuations are 19 meant to represent the agents' imperfect information which can produce errors in their expectations of tomorrow's price. 
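The exponentially decaying window described above (and in full in Appendix A) can be maintained with a handful of running sums, so no raw history needs to be stored. The sketch below shows only the degree-zero (moving-average) case; the class and variable names are illustrative, not taken from the thesis code.

```python
class ExponentialForecast:
    """Degree-0 forecast with an exponentially decaying window of effective length M."""

    def __init__(self, memory):
        self.decay = 1.0 - 1.0 / memory    # older points are weighted by decay**age
        self.mean = 0.0                    # running weighted mean of returns
        self.var = 0.0                     # running weighted variance
        self.weight = 0.0                  # total weight accumulated so far

    def update(self, latest_return):
        self.weight = 1.0 + self.decay * self.weight
        frac = 1.0 / self.weight
        delta = latest_return - self.mean
        self.mean += frac * delta
        self.var = (1.0 - frac) * (self.var + frac * delta * delta)

    def forecast(self):
        return self.mean, self.var         # expected return and its uncertainty

f = ExponentialForecast(memory=40)
for r in [0.01, 0.02, -0.01, 0.015, 0.0]:
    f.update(r)
print(f.forecast())
```

A short memory (small M) makes the running mean respond quickly to the latest returns, while a long memory smooths over them, matching the interpretation given above.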
Given that the return-on-investment time series is already assumed to have Gaussian distributed errors, a natural extension is to introduce normally-distributed fluctuations into the agents' forecasts,

    ⟨r_{t+1}⟩_ε = ⟨r_{t+1}⟩ + ε_t                       (2.22)

where ε_t is a Gaussian-distributed stochastic variable with mean zero and variance σ_ε^2. It is assumed that agents are aware that their forecasts contain uncertainties so the variance of their forecasts is increased by

    Var[r_{t+1}]_ε = Var[r_{t+1}] + σ_ε^2               (2.23)

since the forecasted return r_{t+1} is also assumed to be Gaussian-distributed (and the variance of the sum of two normally-distributed numbers is the sum of their variances).

Fluctuations are handled by determining a random deviate for each agent at each time step and adding it to the expected return, as discussed above. Once the deviate is chosen, it is a constant (but unknown by the agent) for that time interval, so the expected return is also constant. This is necessary for technical reasons (it keeps the agents' demand curves consistent for the auctioning process which will be discussed in Section 2.2.9) but it also seems intuitively reasonable—one would not expect an investor to forecast a different return every time she was asked (in the absence of new information).

The dynamics are driven solely by the presence of noise (as will be discussed below) so we require strictly non-zero standard deviations. On the other hand, the standard deviation also sets the typical scale of errors in the forecasted return. From personal experience, on a daily basis one would expect this error to be on the order of two percent. However, to fully explore the effect of the noise parameter CSEM will allow errors as large as 1/2 (which represents daily price movements up to ±50%). Thus, the fifth agent parameter introduced into CSEM is the scale of the uncertainty σ_ε, which is chosen to lie within σ_ε ∈ (0, 0.5).

2.2.8 Initialization

The discussion so far has focused on how the agents evolve from day to day. But we must also consider in what state they will be started. It is important to choose starting conditions which have a minimal impact on the dynamics or a long initial transient will be required before the long-run behaviour emerges.

The simulations will be initialized with N agents; each agent will have a fraction of some total cash C and total shares S available. The effect of different initial distributions of cash and shares will be explored, but—unless otherwise specified—the cash and shares will usually be distributed uniformly amongst the agents. This allows the simulations to test the performance of other parameters; that is, to see if there is a correlation between parameter values and income.

As mentioned above, agents will also be initialized to have zero expectation ⟨r_1⟩ = 0 and zero variance Var[r_1] = 0 of tomorrow's return-on-investment. However, this is subject to Eqs. 2.22-2.23 so the actual initial expectation is a Gaussian deviate with mean zero and variance σ_ε^2. The first trading day is unique in that there exists no prior price from which to calculate a return-on-investment (for future forecasts). So the first day is not included in the agents' histories. Thus, the dynamics for the first two days of trading are due solely to fluctuations.

In this section three market parameters were introduced: the number of agents N on the market, as well as the total cash C and total shares S which are initially divided equally among the agents (unless otherwise stated).
2.2.9 Market clearing

Having discussed how the agents respond to prices and choose orders we now turn our attention to how the trading price is set. As mentioned before, this model is centralized in the sense that the agents are not allowed to trade directly with each other but all transactions must be processed through a specialist or market maker [27,28,30-36]. In real markets, the role of the market maker is more complex than in this simulation: here the market maker simply negotiates a price such that the market clears; that is, all buyers find sellers and no orders are left open. (All mechanisms by which the market maker may make a profit have been removed from the simulation for the sake of simplicity.)

A simple way for the market maker to establish a trading price is via an auction process: repeatedly call out prices and receive orders until buy and sell orders are balanced. If buy orders dominate, raise the price in order to encourage sellers, and vice versa. However, CSEM provides a simpler (and faster) method for arriving at the trading price. Assuming the market maker knows each agent is using a fixed investment strategy as given by Eq. 2.19, it can be deduced that the optimal holdings for agent j (with cash c_j and shares s_j) at price p are

    s_j* = i_j ( c_j + p s_j ) / p.                                       (2.24)

Effectively, by reporting their ideal investment fractions i_j (and current portfolios (c_j, s_j)), the agents submit an entire demand curve (demand versus price) for all prices instead of just replying to a single price called out by the auctioneer. The market maker's goal of balancing supply with demand can be achieved by choosing a price which preserves the total number of shares held by the investors:

    0 = Σ_j ( s_j - s_j* )                                                (2.25)
      = Σ_j (1 - i_j) s_j - (1/p) Σ_j i_j c_j                             (2.26)

which has the solution

    p = Σ_j i_j c_j / Σ_j (1 - i_j) s_j                                   (2.27)

where the values i_j, c_j, and s_j are all from before any trading occurs on the current day. So, instead of requiring an auction, the trading price is arrived at with a single analytic calculation. Note that this method is possible because the optimal investment fraction i_j does not depend on the current day's price but only on the history of prior returns. (Once the trading price is established, the latest price is included in the history and contributes to the determination of tomorrow's optimal investment fraction.)

Initial trading price

In general, the calculation of the trading price is complicated and depends intricately on the history of the run, but there is a special case where it is possible to determine the expected trading price explicitly—the first day. Let us assume that the initial distribution of cash and shares is such that each agent has equal numbers of both, so that Eq. 2.27 reduces to

    p_0 = (C/S) Σ_j i_j* / Σ_j (1 - i_j*)                                 (2.28)
        ≈ (C/S) ⟨i*⟩ / (1 - ⟨i*⟩).                                        (2.29)

To calculate the expected investment fraction recall that initially the return history is empty, so the expected returns are simply Gaussian-distributed with mean zero and variance σ_ε^2 and, from Eq. 2.19,

    i* = ε / ( a σ_ε^2 ) = x / k                                          (2.30)

defining x = ε/σ_ε and k = a σ_ε, and assuming the risk aversion a and forecast uncertainty σ_ε are identical for all agents. Neglecting the limits i ∈ [δ, 1-δ] on the investment fraction (δ → 0) simplifies the calculation of the expected investment fraction:

    ⟨i*⟩ = ∫_{i(x)=0}^{i(x)=1} i(x) Pr(x) dx + ∫_{i(x)=1}^{∞} Pr(x) dx     (2.31)
         = (1/k) ∫_0^k x Pr(x) dx + ∫_k^∞ Pr(x) dx                         (2.32)
         = (1/(k √(2π))) (1 - e^{-k^2/2}) + (1/2) (1 - erf(k/√2))          (2.33)

where erf(·) is the error function. Substituting this equation into Eq. 2.29 gives the trading price as a function of the single parameter k = a σ_ε, as shown in Fig. 2.3.
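Both the market-clearing price of Eq. 2.27 and the expected first-day price of Eqs. 2.29 and 2.33 reduce to one-line calculations. The Python sketch below is illustrative only; the function names and sample values are assumptions, not part of the thesis code.

```python
import math

def clearing_price(fractions, cash, shares):
    """Eq. 2.27: the price at which the agents' desired holdings sum to the total shares."""
    return (sum(i * c for i, c in zip(fractions, cash)) /
            sum((1.0 - i) * s for i, s in zip(fractions, shares)))

def expected_initial_price(a, sigma_eps, cash_per_share=1.0):
    """Eqs. 2.29 and 2.33: expected first-day price as a function of k = a * sigma_eps,
    assuming equal endowments of cash and shares."""
    k = a * sigma_eps
    mean_i = ((1.0 - math.exp(-k * k / 2.0)) / (k * math.sqrt(2.0 * math.pi))
              + 0.5 * (1.0 - math.erf(k / math.sqrt(2.0))))
    return cash_per_share * mean_i / (1.0 - mean_i)

print(clearing_price([0.8, 0.2], [100.0, 100.0], [100.0, 100.0]))   # 1.0 by symmetry
for scale in [0.5, 1.0, 2.0]:
    print(scale, round(expected_initial_price(a=2.0, sigma_eps=0.25 * scale), 3))
# The expected first-day price falls as the product a * sigma_eps grows (cf. Fig. 2.3).
```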
Notice the value of the stock drops with increased risk aversion or uncertainty of return, properly capturing the essence of risk aversion. It is interesting to note that the price drops to zero as po oc k"1 for large k. To see how this occurs, notice that as the parameter k approaches infinity the second term in Eq. 2.33 drops out (falling off faster than 1/k), as does the exponential in the first term, leaving only <i*) (k y oo) ~ - j L - , (2.34) which diminishes to zero rapidly. The power law tail in the price emerges from simply substituting this relation into Eq. 2.29. Now we briefly review the structure of the model. 2 . 2 . 1 0 Review The Centralized Stock Exchange Model (CSEM) consists of a number N of agents which trade once daily (simultaneously) with a single market maker, whose goal is to set the stock price such that the market clears (no orders are pending). In this section, the structure of the model will be reviewed. 23 Figure 2.3: The expected initial trading price depends only on the risk aversion mul-tiplied by the uncertainty of returns, aae. As the aversion or uncertainty increases the initial value of the stock drops. 24 The agents are simple utility maximizers which extrapolate a fitted polyno-mial to the return history to predict future returns and, therefrom, optimal trans-actions. Each agent has a portfolio of cash c and shares s and is characterized by the parameters listed in Table 2.1. Algorithm Events are separated into days. After the model has been initialized the agents place orders and have them filled once each day. The basic algorithm follows: 1. Initialization. Cash and shares distributed amongst agents. Agents clear his-tories. 2. Start of new day. Agents forecast return-on-investment from history (and noise). 3. Agents calculate optimal investment fraction and submit trading schedules (optimal holdings as a function of stock price). 4. Market maker finds market clearing price (supply balances demand). 5. Trades are executed. 6. Agents calculate stock's daily return-on-investment and append to history. 7. End of day. Return to step 2. Parameters For convenience all the variables used in CSEM are listed in Table 2.1. The param-eters are inputs for the simulation and the state variables characterize the state of the simulation at any time completely. For each run, the agent-specific parameters are set randomly; they are uniformly distributed within some range (a subset of the ranges shown in the table). Each dataset analyzed herein wil l be characterized by listing the market parameters and the ranges of agent parameters used. 2.3 Implementation The above theory completely characterizes CSEM. The model is too complex for complete analysis so it is simulated via computer. The model was encoded us-ing Borland C++Builder 1.0 on an Intel Pentium II computer running Microsoft Windows 98. The source code and a pre-compiled executable are available from http://r ikblok.cjb.net/phd/csem/. 
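The daily cycle of Section 2.2.10 can be condensed into a short skeleton. The sketch below is a simplified Python re-expression of that algorithm, not the original C++Builder code: the Agent class, its square-window degree-zero forecast, and all default values are illustrative assumptions, and details such as the no-short-selling constraints and the exponential memory window are omitted.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    a: float = 2.0            # risk aversion
    sigma_eps: float = 0.1    # forecast error scale
    delta: float = 0.001      # investment fraction limit
    cash: float = 1.0e4
    shares: float = 1.0e4

    def forecast(self, returns):
        """Degree-0 forecast: mean and variance of the return history."""
        if len(returns) < 2:
            return 0.0, 0.0
        mean = sum(returns) / len(returns)
        return mean, sum((r - mean) ** 2 for r in returns) / len(returns)

def run_day(agents, prices, returns):
    """One CSEM trading day: forecast (steps 2-3), clear (step 4), trade (step 5), record (step 6)."""
    fractions = []
    for ag in agents:
        exp_r, var_r = ag.forecast(returns)
        exp_r += random.gauss(0.0, ag.sigma_eps)          # forecast noise, Eq. 2.22
        var_r += ag.sigma_eps ** 2                        # widened uncertainty, Eq. 2.23
        i_star = exp_r / (ag.a * var_r)                   # Eq. 2.19
        fractions.append(min(max(i_star, ag.delta), 1.0 - ag.delta))
    price = (sum(i * ag.cash for i, ag in zip(fractions, agents)) /
             sum((1.0 - i) * ag.shares for i, ag in zip(fractions, agents)))   # Eq. 2.27
    for i_star, ag in zip(fractions, agents):
        wealth = ag.cash + price * ag.shares
        ag.shares = i_star * wealth / price
        ag.cash = (1.0 - i_star) * wealth
    if prices:
        returns.append(price / prices[-1] - 1.0)
    prices.append(price)

agents = [Agent() for _ in range(100)]
prices, returns = [], []
for _ in range(250):
    run_day(agents, prices, returns)
print(f"final price after 250 days: {prices[-1]:.3f}")
```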
Symbol      Interpretation                                      Range
--- Market parameters ---
N           number of agents                                    2+
C           total cash available
S           total shares available
--- Market state variables ---
p_t         stock price at time t
V_t         trade volume (number of shares traded) at time t
--- Agent parameters ---
a_j         risk aversion of agent j                            [1,3]
δ_j         investment fraction limit of agent j                (0,0.5)
d_j         degree of agent j's fitting polynomial              0,1,2
M_j         memory of agent j's fit                             [10,200]
σ_ε,j       scale of uncertainty of agent j's forecast          (0,0.5)
--- Agent state variables ---
c_j         cash held by agent j
s_j         actual shares held by agent j
s_j*        optimum shares held by agent j
w_j(p)      wealth of agent j at stock price p
i_j         actual investment fraction of agent j
i_j*        optimum investment fraction of agent j

Table 2.1: All parameters and variables used in the Centralized Stock Exchange Model (CSEM).

2.3.1 Pseudo-random numbers

Coding the model as it has been described is fairly straightforward. The only complication is that modern computers are unable to produce truly random numbers (required for the fluctuations in the forecasts) because computers are inherently deterministic. Many algorithms for generating numbers which appear random have emerged. A good pseudo-random number generator must have three qualities: it must be fast, it must pass statistical tests for randomness and it must have a long period. The period exists because there are only a finite number of states (typically 2^32) a random number may take on. Hence, it must eventually return to its original seed and once it does, since the series is deterministic, it is doomed to cycle endlessly. If the period is less than the number of times the generator is called within a single run, the periodicity will contaminate the dataset.

One of the earliest and simplest pseudo-random number generators is the linear congruential generator [20, Section 7.1], which is defined recursively for an integer I_j by

    I_{j+1} = a I_j + c  (mod m).                       (2.35)

While this algorithm is fast it is not a good choice because it exhibits correlations between successive values. More complicated generators have been developed which pass all known statistical tests for randomness [20,21]. One of these, the Mersenne Twister [22], is also fairly fast and has a remarkable period of 2^19937 ≈ 10^6000. Unless otherwise specified, the Mersenne Twister will be the generator of choice for CSEM.

Seed

All pseudo-random number generators require an initial seed: a first number (I_0 in the linear congruential generator, for example) chosen by the user which uniquely specifies the entire set of pseudo-random numbers which will be generated. This seed should be chosen with care: using the same seed as a previous run will generate the exact same time series (all other parameters being equal). CSEM is coded to optionally accept user-specified seeds or it defaults to using the current time (measured in seconds since midnight, January 1, 1970, GMT). Since no two simulations will be run simultaneously, this provides unique seeds for every run. Unless explicitly specified, the default (time) seed will be used in the simulations.

2.4 Parameter space exploration

With CSEM coded into the computer, time series data can be generated for numerical analysis. As presented, CSEM requires at least eight parameters to fully describe it. To fully explore the space of all parameters, then, means exploring an eight-dimensional manifold... a daunting task. Before starting any experiments, then, it would be a good idea to check if any of these dimensions can be eliminated.
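The linear congruential recursion of Eq. 2.35 and the time-based seeding convention of Section 2.3.1 are easy to state concretely. The Python sketch below is illustrative: the multiplier, increment and modulus are standard textbook choices, not necessarily those referenced in the thesis, and the production runs use the Mersenne Twister instead.

```python
import time

class LinearCongruential:
    """Eq. 2.35: I_{j+1} = (a * I_j + c) mod m, with illustrative 32-bit constants."""

    def __init__(self, seed=None):
        # Default seed: seconds since the epoch, as described in Section 2.3.1.
        self.state = int(time.time()) if seed is None else seed % 2 ** 32
        self.a, self.c, self.m = 1664525, 1013904223, 2 ** 32

    def next_uniform(self):
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m          # uniform deviate in [0, 1)

rng = LinearCongruential(seed=12345)        # a fixed seed reproduces the same series
print([round(rng.next_uniform(), 3) for _ in range(5)])
```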
2.4.1 Number of agents N The effect of changing the number of traders wil l be explored in detail in Chapter 4 and is left until then. 2.4.2 Total cash C and total shares S In this section the effect of rescaling the total cash C and total shares S will be explored. Let us denote rescaled properties with a prime. Then rescaling cash by a factor A and shares by B is written C = AC (2.36) S' = BS. (2.37) Cash and shares are rescaled equally for each agent so the distribution remains constant. To see how these rescalings affect the dynamics let us begin by assuming that each agent's ideal investment fraction i\ is unchanged (this will be justified below). Then from Eq. 2.27 the price is rescaled by P't = | p t (2-38) and each agent's total wealth is rescaled by w't = Awt. (2.39) (The rescaling of price can be interpreted as the "Law of supply and demand" because when either cash or stock exists in overabundance, it is devalued relative to the rarer commodity.) Thus, the optimal holdings become a? = 4«7 = Bat (2-40) Pt 28 Parameter Run 1 Run 2 Run 3 N 100 100 100 C $1,000,000 $10,000,000 $1,000,000 S 1,000,000 10,000,000 10,000,000 o-£ 0.5 0.5 0.5 M 40 ± 2 0 40 ± 2 0 40 ± 2 0 a 2 ± 1 2 ± 1 2 ± 1 S 0.001 0.001 0.001 d l i l 1 ± 1 1 ± 1 seed -2 -2 -2 Table 2.2: Parameter values for C SEM Runs 1, 2 and 3. and the volume an agent trades becomes As' t = \s*t'-s't\ = BAst. (2.41) To justify that the optimal investment fraction remains unchanged, recall that it depends only on the return series through Eq. 2.19. The return series, under rescaling, becomes rt = -, = rt (2.42) Pt-i assuming the price series is rescaled by A/B. Thus, if the investment fraction remains unsealed then the price series is scaled by A/B, so the investment fraction remains unsealed... This would be a circular argument except for the fact that the investment fraction is initialized by a Gaussian fluctuation, which depends only on the param-eters a and a€. Thus the investment fraction begins unchanged (under rescaling of C and S) and there exists no mechanism for changing it, so it remains unchanged throughout time. So, when cash is rescaled by some factor A and shares by B, the only effects are: 1. Trading price is rescaled by A/B. 2. Trading volume is rescaled by B. To clarify this point in the mind of the reader, three identical runs were performed, with the parameter values shown in Table 2.2. Notice that Run 2 is Run 1 repeated with the scaling factors A = B = 10, and Run 3 is Run 1 with 29 ,-2 I I I I I I I I I I I 0 10 20 30 40 50 60 70 80 90 100 Time t 3 107 ( V 1 ' 1 1 i i i i Run 1 — Run 2 — Run 3 - - --10 6 i i i i 0 10 20 30 40 50 60 70 80 90 100 Time t Figure 2.4: Comparison of time evolutions of (a) price and (b) volume for Runs 1, 2 and 3 as defined in Table 2.2. The price scales as the ratio of cash to shares and the volume scales as the number of shares. (In both plots Run 2 is offset to improve readability.) 30 A — 1,B = 10. The resulting time series, shown in Fig. 2.4, confirm the claim that price scales as A/B and volume scales as B. Neither the absolute value of the price nor the volume are items of interest in this dissertation. Instead we are interested in fluctuations, in the form of price returns and relative change of volume. Neither of these properties are affected by rescaling the total cash or total shares so we are free to choose a convenient scale. I have arbitrarily chosen a market with C — $1,000,000 total cash and 5 = 1,000,000 total shares, thereby reducing the degrees of freedom by two. 
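The rescaling argument can also be checked directly from Eq. 2.27: holding the investment fractions fixed, multiplying all cash by A and all shares by B rescales the clearing price by exactly A/B. A minimal sketch with illustrative values:

```python
def clearing_price(fractions, cash, shares):
    """Eq. 2.27 for a list of agents."""
    return (sum(i * c for i, c in zip(fractions, cash)) /
            sum((1.0 - i) * s for i, s in zip(fractions, shares)))

i = [0.7, 0.4, 0.55]                 # investment fractions (unchanged by rescaling)
c = [3.0e5, 4.0e5, 3.0e5]            # cash per agent
s = [2.0e5, 5.0e5, 3.0e5]            # shares per agent
A, B = 3.0, 10.0
p0 = clearing_price(i, c, s)
p1 = clearing_price(i, [A * x for x in c], [B * x for x in s])
print(p1 / p0, A / B)                # both 0.3: the price rescales by exactly A/B
```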
2.4.3 Investment fraction limit 5 The investment fraction limit parameter S sets a bound on the minimum and max-imum allowed investment fractions 5 < i < 1 — S. This is purely a mathematical kludge to prevent singularities which could otherwise occur in Eq. 2.27. Effectively, S sets an upper and lower bound on the price itself: assume the total cash and shares are equal (C — 5). Then, the minimum price is realized when all agents want to discard their stocks, ij = 8 for all j, giving Pmin = Y~I~S' (2-43) Conversely, given maximal demand, ij = 1 — 5, the price will climb to a maximum of 1 - 6 , Pmax = — ^ — • (2.44) So the choice of 5 sets the price range for the stock. Obviously, to allow reasonable freedom of price movements the limit should be significantly less than one half, iJ <C 1/2. To mimic the observed variability in some recent technology-sector stocks, a limit of 6 = 0.001 will generally be used, allowing up to a thousand-fold increase in stock value—except in Chapter 4 where we explore the effect of varying this parameter. 2.4.4 Risk aversion a and forecast uncertainty ac One's intuition may lead one to suspect that the risk aversion factor a and the fore-cast uncertainty a€ are over-specified, and should be replaced by a single parameter k = aae as was done to calculate the initial trading price in Section 2.2.9. However, a closer inspection of Eq. 2.19 demonstrates this is not quite true. The optimal investment fraction is o ( V a r [ r t + i ] + a f ) v ' 31 Parameter Run 4 Run 5 N 100 100 C $1,000,000 $1,000,000 S 1,000,000 1,000,000 0.25 0.5 M 40 ± 2 0 40 ± 2 0 a 2 1 S 0.001 0.001 d 1 ± 1 1 ± 1 seed -2 -2 Table 2.3: Parameter values for C S E M Runs 4 and 5. Since ae is the only parameter to set a scale for the returns, in Eqs. 2.22-2.23, it is reasonable to expect the returns to scale linearly with a€ so renormalizing gives 1 + et/ae Var[r t + i ] /o-2 + i J (2.46) where the second factor is invariant under rescaling of at. Then, since o and ae occur nowhere else in the model, one may expect that the simultaneous rescaling a' = Ca (2.47) a[ = ae/C (2.48) would preserve the dynamics. However, as the price series of Runs 4 and 5 (see Table 2.3) show in Fig. 2.5, there are small deviations which grow with time until eventually the time series are markedly different. To see why this occurs, let us consider a simple thought experiment: Consider a run with equal amounts of cash and shares (C = S) where the last trading pr ice— for the sake of convenience—is pt-\ — 1. Now assume that on the next day all the agents have negative fluctuations in their forecasts which drive their optimal investment fractions to their lower limits i% — S. Then, from Eq. 2.27, the day's stock price will be given by Eq. 2.43 and the return will be pt - 1 1-2S n = — = - y z y (2-49) which does not scale with ae as was hypothesized in the derivation of Eq. 2.46. 32 Figure 2.5: Comparison of time evolutions of price for Runs 4 and 5 as defined in Table 2.3. The price is not perfectly invariant under rescalings which preserve the constant aae. 33 Occasional events like the one described in the above thought experiment are responsible for the deviations seen in Fig. 2.5. However, apart from these rare devi-ations (which, neglecting trends in the return history, should occur with decreasing frequency 1/2N as the number of investors increases) the risk aversion parameter a and uncertainty at appear to be over-specified. 
Therefore, the risk aversions will always be chosen from a uniform deviate in the range a G [1,3] and only the forecast error ae wi l l be manipulated—excepting the following section in which the relative performance of different values of a and ae wil l be evaluated. 2.5 Parameter tuning Thus far we have isolated three parameters (C, S, and 8) which can be fixed at particular values without loss of generality. We now want to choose reasonable ranges for the remaining parameters (<re, M , a, and d). Reasonable, in this context, refers to agents with parameter combinations that tend to perform well (accumulate wealth) against dissimilar agents. These parameter combinations are of interest because one would expect that, in real markets, poorly performing investors who consistently lose money will not remain in the market for long. Note that, as discussed in the Introduction, parameter tuning generally di-minishes an explanatory model's validity. This, however, does not quite apply in this case because we are not tuning the parameters in order to produce a model which better fits the empirical data (i.e.. exhibits known market phenomena, such as fat tails and clustered volatility)—rather, we are simply trying to select "better" investors. However, it must be acknowledged that this may concurrently tune the simulation towards realism. Further, the point of this exercise is not to completely specify the model but merely to avoid wasteful parameter combinations which should be driven out of the system by selective (financial) pressures. In the model, "dumb" agents (with parameter combinations which tend to underperform) will lose capital and may eventually hold a negligible portion of C and S. Hence, these agents won't contribute to the market dynamics and will simply be "dead weight", consuming computer time and resources. Hopefully, at this point the reader agrees that it would be helpful to cull "dumb" agents by finding the more successful parameter ranges. It may be discovered, in the course of this investigation, that some parameters are irrelevant; they may be take on a wide variety of values with little or no impact on the dynamics. In this case, these parameters may be assigned arbitrary ranges without loss of generality. To determine successful values, a large parameter space should be explored. 34 Parameter Run 6 N 400 C $1,000,000 S 1,000,000 0.25 ±0.25 M 105 ± 95 a 1.5 ± 1.5 5 0.001 d 1 ± 1 seed random Table 2.4: Parameter values for C S E M Run 6. 1000 F 0 10000 20000 30000 40000 50000 60000 time t Figure 2.6: Price history generated by C SEM with parameters listed in Table 2.4 (Run 6). The price almost reaches its theoretical maximum of $999 (see Eq. 2.44) before collapsing. The agent state variables were sampled at the times indicated. 35 Correlation with log w Parameter Sample 1 Sample 2 Sample 3 Sample 4 c e + 0 0 + M 0 0 0 + a + + + + d 0 0 0 0 Table 2.5: Regression analysis of log w versus agent parameters for different samples of Run 6 (Table 2.4). The symbols indicate the sign of the regression-line slope, or zero if it is insignificant (relative to its standard error). The results indicate that a is positively correlated with wealth but o€, M and d are largely irrelevant. To this end a long data set was collected with more agents and with broader pa-rameter ranges, as indicated in Table 2.4. The price history for the run is shown in Fig. 2.6. 
The results were analyzed by looking for correlations between an agent's wealth and the following parameters: forecast error ae, memory M, risk aversion a, and degree of curve-fit d. Note that the point of this work is not to determine an optimal investment strategy (set of optimal parameter values), but simply to establish reasonable ranges for these parameters such that the agents perform reasonably well. Thus, a complete correlation analysis is unnecessary. Instead, a simple graphical description of the results should be sufficient, with a simple regression analysis for emphasis. Table 2.5 shows the results of linear regression analyses of log w versus agent parameters for different samples of Run 6. The logarithm of wealth is fitted to a straight line with respect to the parameter of interest and the sign of the slope is recorded. If the slope m has a standard error larger than 100% then the parameter is interpreted as being uncorrelated with performance. This method was constructed only because it lent itself to the computational tools available to the author. How-ever, it is reasonable: recall that the linear correlation coefficient (which is typically used to test for correlations) is related to the slope r a m . Also, the standard error estimates the significance of the slope; a value greater than 100% suggests that the sign of the slope is uncertain. The results of the analysis indicate that the risk aversion parameter a is positively correlated with performance (wealth). However, the forecasting param-eters ae (forecast error), M (memory) and d (degree of polynomial fit) appear to be uncorrelated with performance. Hence, these parameters can be set arbitrarily. The memory from Run 6 ( M = 105 ± 95) will be used in all further simulations to 36 Parameter Run 7 N 400 C $1,000,000 s 1,000,000 0.025 ± 0.025 M 105 ± 95 a 1.5-1.5 6 0.001 d 1 ± 1 seed random Table 2.6: Parameter values for C S E M Run 7. maintain diversity. However, the degree of the fitting polynomial will be constrained to d = 0 (a moving average) because it boosts simulation speed. The forecast error <7e requires further inspection. Representative graphs of \ogw versus the parameters a and ae (using Run 6: Sample 3) are shown in Fig. 2.7. The slopes are used to estimate correlations, as discussed above. The evidence suggests that risk aversion is positively correlated with performance. Hence, small values of a (high-risk behaviours) tend to under-perform. Thus, the range of a is restricted to a £ [1,3] instead of a G [0,3] as set in Run 6. 2.5.1 Forecast error The only free parameter left is the forecast uncertainty ae. Although Fig. 2.7 indi-cates no correlation between wealth and forecast error, a closer inspection reveals a small peak for the smallest errors cre < 0.05. To test this range, a new dataset was collected with all the parameters as in Run 6 except the forecast error scaled down by a factor of ten, as indicated in Table 2.6. The time series, shown in Fig. 2.8, exhibits wildly chaotic fluctuations which regularly test the price limits (Eq. 2.44) imposed by 5. Since S was an arbitrarily chosen parameter, we do not want it to significantly affect the dynamics as it does in Run 7. Thus, the choice of ae — 0.025 ± 0.025 causes problems. This issue will be revisited in Chapter 4. 
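The significance test described above amounts to fitting a straight line to log w and comparing the slope with its standard error. The Python sketch below reproduces the idea on synthetic data; it is not the author's analysis tool, and the data-generating step is purely illustrative.

```python
import numpy as np

def slope_significance(param_values, wealth):
    """Sketch of the Table 2.5 test: fit log(wealth) against one agent parameter and
    report the sign of the slope, or 0 if its standard error exceeds 100% of its size."""
    x = np.asarray(param_values, dtype=float)
    y = np.log(np.asarray(wealth, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    se = np.sqrt(residuals.var(ddof=2) / ((x - x.mean()) ** 2).sum())
    if se > abs(slope):
        return 0                       # slope sign not significant
    return 1 if slope > 0 else -1

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 3.0, 400)                        # risk aversions of 400 agents
w = np.exp(5.0 * a + rng.normal(0.0, 3.0, 400))       # wealth positively correlated with a
print(slope_significance(a, w))                       # -> +1
```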
[Figure 2.7: Plot of agent wealth versus (a) risk aversion and (b) forecast error. The best fit lines have slopes 5.2 ± 1.4 (positive correlation) and 4.7 ± 8.8 (no correlation), respectively.]

[Figure 2.8: Price history generated by CSEM with parameters listed in Table 2.6 (Run 7). The series has the undesirable property that the price spends much of its history at or nearing its ceiling ($999).]

Symbol      Interpretation                                      Range
--- Market parameters ---
N           number of agents                                    2+
C           total cash available                                $1,000,000
S           total shares available                              1,000,000
--- Agent parameters ---
a_j         risk aversion of agent j                            2 ± 1
δ_j         investment fraction limit of agent j                0.001
d_j         degree of agent j's fitting polynomial              0
M_j         memory of agent j's fit                             105 ± 95
σ_ε,j       scale of uncertainty of agent j's forecast          [0, 0.5]

Table 2.7: As Table 2.1 except with updated parameter ranges. These ranges will be used in subsequent simulations.

2.5.2 Finalized parameter ranges

The finalized ranges of the model parameters are shown in Table 2.7. The risk aversion and memory will always be assigned the shown ranges but the effect of varying N and σ_ε will be explored further in Chapter 4.

2.6 Discussion

In this section some observed properties of the model (both theoretical and empirical) will be discussed.

2.6.1 Fundamentalists versus noise traders

This model borrows heavily from other work in the area [27-36]. However, it differs from most of these papers in that it does not divide the traders into types. Many other models assign the agents one of two roles: either fundamentalists or noise traders [17,32,36,47,48]. Fundamentalists believe the stock has a real value (for instance, if it pays a dividend) and trade when they believe the stock is over- or under-valued. Noise traders (or chartists), on the other hand, have no interest in the stock's fundamental value, but simply try to anticipate price fluctuations from the historical data, and trade accordingly. CSEM deliberately eliminates the fundamental value of the stock so the agents are necessarily what would be called noise traders. Hence, the dynamics which emerge from the simulations are of a completely different nature than those mentioned above.

2.6.2 Forecasting

Table 2.5 indicates that the forecasting parameters are largely irrelevant to performance. This suggests that there are no serial correlations in the stock price and, therefore, no reward for increased effort to forecast (by increasing M and/or d). Whether the time series actually does have auto-correlations will be explored in Chapter 5. But the ineffectiveness of forecasting raises the question of whether a model based on forecasting is even relevant. Perhaps the agents would do better to ignore the return history and just rely on fluctuations to make their estimates of future returns. This may indeed be a valid argument but forecasting has another purpose—it adds a degree of heterogeneity to the agents through systematic differences in their investment fractions.
On the other hand, forecasting may provide a mechanism for herding. As the history develops, the returns may be correlated for short periods. If so, then 40 the agents may converge in opinion regarding future returns and act in unison, with significant consequences in the price history. For this reason, the forecasting algorithm will be retained. 2.6.3 Portfolios Given an investment fraction i, wealth to, and stock price p, an agent's distribution of cash c and shares s is given by the relations iw = sp (2.50) {l-i)w = c. (2.51) Given the investment limits S < i < 1 — 5, a linear relation between wealth and the maximum or minimum holdings of both cash and shares can be found. (Recall, agents are not allowed to sell all their shares or cash.) Fig. 2.9 shows the distribution of cash versus shares for the agents of Run 6: Sample 4. Notice that the agents almost exclusively hold extremal portfolios dominated by either cash or stock. Very few actually hold mixed portfolios. This indicates that Eq. 2.19, which gives an agent's optimal investment fraction, may be too sensitive. But the only freedom one has in reducing the sensitivity is through the parameters a and a e , which have other consequences, as has been discussed. 2.6.4 Difficulties Although this model showed promise, I had some technical and ideological problems which encouraged me to abandon it in favour of a different approach. On the tech-nical side, as the reader can see, the number of parameters is somewhat unwieldy. Although some of the parameters could be determined, those remaining were dif-ficult to manage. The investment fraction limit 5, for instance, is a necessary but unappealing result of the derivation, which imposes arbitrary limits on the stock price's range. Another difficult parameter is the forecast error cre: if too large a value is chosen then the dynamics are dull and dominated by noise (see Fig. 2.5), but too small a value produces wildly chaotic behaviour completely unlike empirical market data (Fig. 2.8). This seems to be the critical parameter for determining the character of the dynamics, and the effect of varying it will be explored in more detail in Chapter 4. One of the ideological problems was the use of parallel updating (all agents trading at a single moment each day). Evidence is mounting that employing a parallel updating scheme (without strong justification) introduces chaotic artifacts 41 3 xi 1 3 13 JO IO 1 0 10 8 10 6 104 10 2 10° IO" 2 I O - 4 1 0 - 6 IO 1 0 IO8 IO6 104 10 2 10° io-2 10~ 4 Run 6: Sample 4 + Minimum investment Maximum investment i I i I i _ 10 - 4 10 -2 10° Cash c 102 104 106 10 - 6 Run 6: Sample 4 + Minimum investment Maximum investment J i I i I 10" 10" 10° 10 2 Shares s 104 106 Figure 2,9: Plot of agent wealth versus (a) cash and (b) shares held showing that most agents hold extreme portfolios of maximum cash and minimum shares, or vice versa. It appears that the method of calculating the investment fraction in C SEM (Eq. 2.19) is too sensitive to fluctuations. (It should be acknowledged the plots are truncated since the lowest wealth actually extends down to 10~ 2 5, an unrealistic quantity since real money is really discretized with a minimum resolution of one penny.) 42 into the dynamics which are generally not observed in the actual, continuous-time system being modeled [14,49-52]. Further, in this model the price of a share is artificially fixed by the market maker. 
In most markets the price is an emergent phenomena: auction-type orders are placed at hypothetical prices (eg. limit prices) and the price is realized when a trade occurs. Forcing the price to balance supply and demand destroys its emergent character. For these reasons, this line of research was replaced with the model presented in the next chapter. Nevertheless, the Centralized Stock Exchange Model is included here because it follows a prevalent line of reasoning in stock market simulations and falls into many of the same pitfalls encountered by others [17,27,28,30,32,34, 36, 48,53]. Wherever possible, C S E M data will be analyzed alongside the output of the next model, the Decentralized Stock Exchange Model. 43 Chapter 3 Decentralized Stock Exchange Model 3.1 Inspiration I was growing disillusioned with C S E M when a collaboration with—then undergrad-uate student—Casey Clements inspired me to consider a radically different approach. Casey also expressed dissatisfaction with the price being set by a centralized control (market maker) and wondered how real markets worked. I explained to him that the stock (ticker) price was simply the last price at which a trade had occurred. Casey expressed an interest in modeling this approach but I explained to him the hurdles: namely that the agents would have to be a great deal more complicated than in current models because they would have to make complicated decisions involving two or more parameters. Previously, agents were simple utility maximizers, applying Eq. 2.11 in order to calculate the quantity of shares they wanted to trade in response to a given price (set by the centralized control) but these new agents would have to decide on both the volume to trade and the price they wanted to trade at. Nevertheless, Casey was enthusiastic so I obliged him and we constructed a simple event-driven model of stock exchange (neglecting the difficulties with the agents' decision processes) with the following properties: 1. agents trade by calling out and replying to orders, 2. trades can be called at any time (continuous time), and 3. agents can choose both the volume of the trade and the price. 44 In this model prices would be decentralized, arising directly from the agents decisions rather than being governed by a market maker. Thus was born the De-centralized Stock Exchange Model (DSEM). 3.2 Basic theory This model is of a more original nature than C S E M was and, therefore, requires more explanation. For this reason the theory is divided into two sections, one which explains the basic structure of the model and another which describes how fluctuations are incorporated. First, the structure of the model will be developed. 3.2.1 Assumptions The decentralized model discussed here contains many of the same assumptions as C S E M (heterogenous agents; the market is composed of a single risky asset and a single riskless asset; cash and shares are both conserved; et cetera). For the sake of brevity, only the differences in assumptions between the two models wil l be discussed. Decentralization The primary difference between the two models is, obviously, the move to a decen-tralized market. This means that the agents are allowed to trade directly with one another without any interference from a market maker or specialist. The market maker may be interpreted as having been relegated to the mechanical role of match-ing buyers with sellers, with no influence on the price. This interpretation resembles intra-day trades of a fairly active stock [26]. 
CSEM, in contrast, was meant to mimic low-frequency trading, on timescales no shorter than a single day. Continuous time with discrete events By moving to intra-day trading, the natural periodicity of the market opening and closing daily, which gave rise to discretized time in CSEM, is eliminated. In fact, in D SEM it is assumed that the market never closes; it is open around the clock, 24 hours per day. Trades may be executed at any instant and time is a continuous variable. (Another interpretation is that the market does close but when it re-opens it continues from where it left off without any effect from the close.) To implement continuous-time (at least to some fine resolution) on fundamen-tally discrete devices (digital computers) a shift of paradigms is required. Tradition-ally, time evolution is simulated by simply incrementing "time" by a fixed amount 45 (as in CSEM). This approach is cumbersome and inefficient when the model consists of discrete events (eg. trades) occurring at non-uniform time intervals. Instead, an event-driven approach is preferred [54] in which waiting times (delays) for all pos-sible events are calculated and "time" is immediately advanced to the earliest one. (As each event time is calculated the event is placed in an ordered queue so the earliest event is simply the first event in the queue.) The basic algorithm follows: 1. Calculate waiting times for all events. 2. Find event with shortest waiting time. 3. Advance time to this event and process it. 4. Recalculate waiting times as necessary. 5. Return to Step 2. Separation of time scales The model allows two types of events: call orders and reply orders. By distinguishing between the two a simplifying assumption can be made: no more than one call order is active at any time. In many real markets orders are good (active) until filled or until they expire. When new orders are placed they are first treated as reply orders by checking if they can satisfy any outstanding orders, and then, if they haven't been filled, they are placed on a auction book, and become call orders until they are removed. In DSEM, however, it is assumed that reply orders occur on a much faster timescale than calls. As soon as a call order is placed, all reply orders are submitted and executed (almost) instantaneously and the call (if not filled) can immediately be expired (because no more replies are expected). Hence, the probability that two (call) orders are active simultaneously becomes negligible and it is assumed that at most one is ever active at any given time. This assumption improves performance at the cost of realism. Allowing multiple orders to be simultaneously active would require more complicated book-keeping and would degrade simulation performance. (It was observed that DSEM exhibited rich enough behaviour that the assumption did not need to be discarded.) 46 3.2.2 Utility theory As with CSEM, we again begin with a game theoretic approach. However, instead of an exponential utility function, a power-law is used, U(w) = ( i - ^ 1 ' " In w a — 1. Notice these forms are effectively identical as o —> 1 because l ima^i U' — w~x for both, and utility is only defined up to a (positive) linear transformation. The advantage of the power-law utility (sometimes known as the Kelly utility [55]) is that it has a constant relative risk aversion „ , , wU" , x R(w) = - — - a, (3.2) unlike the exponential utility which, from Eq. 2.4, has constant absolute risk aversion A(w) = —U"/U' = I/wg0ai. 
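The power-law (Kelly) utility and its constant relative risk aversion can be checked numerically. The sketch below uses the standard CRRA form U(w) = w^(1-a)/(1-a), with U(w) = ln w at a = 1, and estimates R(w) = -w U''(w)/U'(w) by finite differences; the function names and sample values are illustrative assumptions.

```python
import math

def kelly_utility(w, a):
    """Power-law (CRRA) utility: U(w) = w**(1 - a) / (1 - a), or ln(w) when a == 1."""
    return math.log(w) if a == 1.0 else w ** (1.0 - a) / (1.0 - a)

def relative_risk_aversion(w, a):
    """Numerical check of Eq. 3.2: R(w) = -w U''(w) / U'(w) equals a at every wealth."""
    h = 1e-4 * w                            # step scaled to the wealth for stable differences
    u = lambda x: kelly_utility(x, a)
    d1 = (u(w + h) - u(w - h)) / (2.0 * h)
    d2 = (u(w + h) - 2.0 * u(w) + u(w - h)) / h ** 2
    return -w * d2 / d1

for wealth in [10.0, 1000.0, 100000.0]:
    print(wealth, round(relative_risk_aversion(wealth, a=1.5), 3))   # ~1.5, independent of wealth
```

The output illustrates the contrast with the exponential utility: the agent's caution, measured relative to its wealth, does not change as its wealth grows or shrinks.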
The constant relative risk aversion eliminates the scaling problems which had to be worked around in Section 2.2.4 and keeps an agent equally cautious regardless of its absolute wealth. (See Ref. [56] for a more detailed description of absolute and relative risk aversion). 3.2.3 Optimal investment fraction Let us assume an agent is interested in holding a fixed fraction i of its capital in the risky asset. If the price moves from p(0) to p(t) the return-on-investment is and the final wealth can be written as w(t) = i o ( 0 ) ( l - t ) + f ( 0 ) * ( r + l ) (3.4) = tw(0)(l+tr). (3.5) The goal is to find the value of i which maximizes the expected utility at some future time t. First, the expected utility must be calculated: (U) = Uf^-{(l+tr)^). (3.6) Unfortunately, finding a closed-form solution for (U) is difficult, even with very simple probability distributions. l - a 47 However, if we assume the timescale is relatively short then we expect the returns to be small and the above equation can be expanded around r = 0, as (U) « w(0)1-a / — ! — + xr - ]-ai2r2 + •••). (3.7) \ 1 — a 2 / Thus, the optimization condition becomes (3.8) o i * ( r 2 ) ] (3.9) az*(Var[r] + (r) 2)] , (3.10) giving an optimal investment fraction a(Var[r] + (r) 2) V ' with the constraints 0 < i* < 1. (This derivation implicitly assumed that the higher moments in the expansion are negligible—an assumption which may not be valid even for short timescales, as will be seen in Chapter 5). Note how closely this corresponds with Eq. 2.19 in CSEM, differing only by an extra term in the denominator. However, the use of a power-law utility allowed us to drop many of the assumptions originally required, such as explicitly hypothesizing that the returns were Gaussian-distributed. 0 = d(U) di = M O ) 1 " * <r> 3.2.4 Fixed investment strategy Eq. 3.11 states that, given some expectation and variance of the future return, one should hold a constant fraction of one's capital in the risky asset. The same result was derived by Merton [56, Ch. 4] assuming a constant consumption rate (which can be taken to be zero, as in DSEM). Further, Maslov and Zhang [57] demonstrated that keeping a fixed fraction of one's wealth in the risky asset maximizes the "typical" long-term growth rate (defined as the median growth rate). These references suggest that the optimal strategy is to keep a fixed fraction of one's wealth in stock—a Fixed Investment Strategy (FIS). The FIS is empirically tested with a hypothetical portfolio in Chapter 6 and developed, here, for use with DSEM. The strategy discards the attempt to forecast returns (which proved prob-lematic in CSEM) in favour of the basic principle of maintaining some fixed fraction invested in stock. 48 Ideal price The total value of a portfolio consisting of c cash and s shares at price p is w(p)=c + ps. (3.12) The goal of FIS is to maintain a balance ps — iw(jp) (3.13) by adjusting one's holdings s. (The ideal investment fraction is simply denoted by i henceforth, without the cumbersome asterisk.) If an agent currently holds c cash and s shares then the agent would achieve the optimal investment fraction i if the stock price was which will be called the agent's ideal price. If the price is higher than the ideal price then the agent holds too much capital in the form of stock and will want to sell and, conversely, at a lower price the agent will buy in order to buoy up its portfolio. 
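The ideal price follows directly from the balance condition p s = i w(p) stated above, giving p* = i c / ((1 - i) s). A minimal Python sketch (illustrative names and values):

```python
def ideal_price(cash, shares, i):
    """Eq. 3.14: the price at which a portfolio (cash, shares) is already balanced
    at the target investment fraction i, i.e. p* * shares = i * (cash + p* * shares)."""
    return i * cash / ((1.0 - i) * shares)

cash, shares, i = 5000.0, 100.0, 0.5
p_star = ideal_price(cash, shares, i)
print(p_star)                                          # 50.0
print(shares * p_star / (cash + shares * p_star))      # 0.5: exactly the target fraction
```

Above this price the agent is overweight in stock and wants to sell; below it, the agent wants to buy, as described in the text.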
Optimal holdings The fixed investment strategy specifies not only what type of order to place at a given price, but also precisely how many shares to trade. At a price p an agent would ideally prefer to hold stock Given the ideal price above, the agent's wealth can be written 1 — i w(p) — —:—P*s (3.16) i Hence, an agent's optimal trade at price p is s — s = s (3-17) P v* - (l-i)s—+is-s (3.18) P = (3.19) The fractional change of stock (s* — s)/s is plotted as a function of price in Fig. 3.1. So, given a portfolio (c,s) and an optimal investment fraction i, FIS prescribes when to buy and sell and precisely how much. 49 Current price / Ideal price p/p* Figure 3.1: The fixed investment strategy specifies how many shares to trade at a price p given an investment fraction i and an ideal price p* (from Eq. 3.14). As the current price drops toward zero the fractional change in shares diverges. 50 3.2.5 Friction The derivations of the fixed investment strategy here and elsewhere [56,57] assume no transaction costs, which makes it impractical in real markets. Every minuscule fluctuation of a stock's price would require a trade in order to rebalance the portfolio but if the fluctuation was too small the trade would cost more than the value of the shares traded, resulting in a net loss. To circumvent this problem in a simulated portfolio with a commission on every trade (see Chapter 6) I imposed limits on the buying and selling prices: where / is defined as the trading friction. The same approach can be used here: don't buy until the price drops below the limit ps and don't sell until it rises above the limit ps- This allows the simulation to mimic the effect of transaction costs while conserving the total cash. It is also required to make the simulation a discrete-event model. Without it, agents would trade on a continuous-time basis and the model would be difficult to simulate (and less realistic). Obviously, the larger the friction, the larger a price fluctuation will be re-quired before an agent decides to trade. So increasing friction decreases the trade frequency and hinders market activity—which explains why the term "friction" is used. Each agent's friction must be strictly positive because a zero value would allow an agent to place an order to trade zero shares at its ideal p r i c e — a null order. The simulation does not forbid this but, as will be seen, such an order would never be accepted. There is no fixed upper limit on the friction but it seems reasonable to impose / < 1. This would mean an agent places buy and sell limits at one half and double the ideal price, respectively. (It would be peculiar for an investor not to sell when a stock's price doubles!) Thus, the friction / is the first agent-specific parameter in D S E M and it is chosen such that / € (0,1). 3.2.6 Call orders As mentioned above, D S E M is a discrete-event simulation. The events are orders which are called out. A n order consists of a price p0, type ("Buy" or "Sell") and PB = P*/(l + f) Ps = P*(l + /) (3.20) (3.21) 51 Figure 3.2: Call orders p0 are placed at either the current price p or the limit prices PB or ps, whichever is better. The spread between the limits increases with friction /• quantity of shares to trade. The agents use the limit prices discussed in the last section to set their trad-ing prices, thereby solving the problem of how to design agents which can choose both a trade volume and price. (The volume is set by Eq. 3.19.) 
Of course, if the current market (last traded) price p is "better" than the limit price (p < p_B for a "Buy" or p > p_S for a "Sell") then the agents rationally choose to trade at that price:

p_o = min(p, p_B) for "Buy" orders,
p_o = max(p, p_S) for "Sell" orders.

These order prices, shown in Fig. 3.2, are substituted into Eq. 3.19 to compute the volume of shares to trade. Notice that there are two options (buy or sell) for every price. Before discussing how the agent decides which action to take, we must understand the process which governs most discrete-event simulations.

3.2.7 Poisson processes

A Poisson process is a stochastic counting process defined by the probability of an event occurring in an infinitesimal interval dt,

Pr(dt) = dt / τ   (3.23)

where τ is the average event interval [58]. All events are assumed to be independent. The cumulative probability of no events having fired within a finite interval t is then given by

Pr(0, t) = e^{-t/τ}   (3.24)

which accurately describes many natural processes, such as radioactive decay of nuclei.

The advantage of the Poisson process from a simulation perspective is that it may be interrupted. Consider a time interval t which is divided into two intervals t_1 and t_2 (t = t_1 + t_2). The probability of no events within t can be written

Pr(0, t) = e^{-(t_1 + t_2)/τ}   (3.25)
         = Pr(0, t_1) Pr(0, t_2)   (3.26)

which simply means that if the event did not occur within the interval t then it must not have occurred within either of the sub-intervals t_1 or t_2. This equivalence relation may be interpreted to mean that a clock which measures the stochastic time to a Poisson event may be reset at any time before the event fires, without changing the probability distribution. This property is very useful for discrete-event simulations because it allows one to proceed to the first event time, update the system, and recalculate all the event times from this point without disturbing the process. (Event times would need to be recalculated, for instance, if their average rate τ was affected by the event which transpired.) We now have the background necessary to discuss how an agent chooses which action to take.

3.2.8 Call interval

Agents always have two options when deciding on a call order: place a "Buy" or a "Sell" order. Since the simulation is event-driven, the easiest way to handle this is to allow both possibilities. Each is an event which will occur at some instant in time. The calling of orders is modeled as a Poisson process, as discussed above. More precisely, "Buy" and "Sell" orders are modeled as independent Poisson processes, each with its own characteristic rate.

Intuitively, if the current stock price p is very high then we expect agents will try to sell at or near the current price, rather than trying to buy at a much lower price, and vice versa if the price is low. This intuition can be captured by making the Poisson rates for the "Buy" and "Sell" calls price dependent [26], as

τ_B(p) = (p / p_B) τ   (3.27)
τ_S(p) = (p_S / p) τ   (3.28)

where τ is an unspecified (constant) timescale and τ_B and τ_S are the average times between each type of call order. The above linear price dependence is justified only on the basis of its simplicity, but it also has some reasonable consequences. An agent makes the fewest calls (of either type) when the total call rate 1/τ_{B+S} = 1/τ_B + 1/τ_S is minimized,

0 = d/dp [ 1/τ_{B+S} ]   (3.29)
  = d/dp [ 1/τ_B + 1/τ_S ]   (3.30)
  = (1/τ) [ -p_B/p^2 + 1/p_S ],   (3.31)

which occurs at a price p = sqrt(p_B p_S) = p*, the agent's ideal price. Hence, when an agent is satisfied with its current portfolio, it will place the fewest (speculative) orders.
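In the simulation each agent therefore carries two exponential "clocks". A minimal C++ sketch of how the waiting times of Eqs. 3.27-3.28 might be sampled is shown below (the structure and names are assumptions, not the actual DSEM code):

#include <random>

// Sample the waiting times until an agent's next "Buy" and "Sell" call orders,
// using the price-dependent mean intervals of Eqs. 3.27-3.28.
struct CallTimes { double buy, sell; };

CallTimes sampleCallTimes(double p, double pB, double pS, double tau,
                          std::mt19937& rng)
{
    double tauB = (p / pB) * tau;   // mean interval until the next Buy call
    double tauS = (pS / p) * tau;   // mean interval until the next Sell call
    std::exponential_distribution<double> buy(1.0 / tauB), sell(1.0 / tauS);
    return { buy(rng), sell(rng) };
}

Whichever of the two times is shorter decides which order the agent calls first; because Poisson clocks are memoryless (Eqs. 3.25-3.26) they can simply be resampled whenever a trade or news release changes the rates.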
As the price moves away from the agent's ideal price the call rate increases, reflecting an increased urgency. With many agents independently placing call orders, the agent with the greatest urgency (shortest waiting time) will tend to place the first order, so urgency drives market fluctuations [59]. This is in contrast with CSEM and many other simulations in which fluctuations are driven by supply vs. demand [27-36].

Incidentally, the minimum call rate is

1/τ_{B+S}(p*) = 1/τ_B(p*) + 1/τ_S(p*)   (3.32)
             = 2 / ( τ (1 + f) )   (3.33)

which decreases as the friction f is increased, as discussed previously.

Eqs. 3.27-3.28 were introduced to increase the probability of buying if the last trading price was low and selling if the last trading price was high. The probability of the next call being a "Buy" can be explicitly calculated: consider the probability of a Buy order being placed between times t and t + dt with neither a Buy nor a Sell having occurred yet. Then, marginalizing over all t gives the probability of a Buy occurring first,

Pr(Buy) = ∫_0^∞ (dt/τ_B) e^{-t/τ_B - t/τ_S}   (3.34)
        = (1/τ_B) / (1/τ_B + 1/τ_S)   (3.35)
        = (p*/p) / (p*/p + p/p*).   (3.36)

Of course, the probability of a Sell order being placed first is the complement, Pr(Sell) = 1 - Pr(Buy). Fig. 3.3 shows that Buy orders are much more likely at low prices and Sell orders more likely at high.

Figure 3.3: "Buy" and "Sell" call orders are modeled as independent Poisson processes with price-dependent rates. As the last trading price increases, the probability of a "Sell" order being called becomes much more likely than a "Buy."

3.2.9 Reply orders

Orders are divided into two types: call orders and reply orders. The call orders have been discussed above. Reply orders are handled in much the same manner with two important changes:

1. When replying to a call order, the replier is not free to set a price but must accept the called price.
2. Reply orders happen on a much faster timescale than call orders.

The first item requires that reply orders be handled slightly differently than call orders. The agent still calculates price limits according to Eqs. 3.20-3.21 but now these are used as the criterion for whether to place an order or not. If the order price meets the Sell limit p_o > p_S then a "Sell" reply is placed and if the price meets the Buy limit p_o < p_B then a "Buy" reply is placed. Otherwise the agent does not reply to the called order.

It is assumed that a called order is completely transparent; all potential repliers have full information about the order including the price, type of order (Buy or Sell) and quantity of shares to be traded. This provides another criterion for whether to reply or not: the quantity of shares in the called order must be sufficient to completely fill the replier's demand. Although this requirement may seem too strict, it is useful because it prevents callers from swaying the price series with negligible orders (as could occur with zero friction or if the caller has negligible wealth compared to the replier).

Like call orders, replies receive stochastic execution times with average intervals from Eq. 3.27 or Eq. 3.28 depending on whether the reply is a Buy or Sell order. Replies compatible with the call order are then processed in a "first come, first served" queue until the call order is filled or all repliers are satisfied (partial orders are processed).
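The reply decision itself reduces to two checks: the called price must meet the replier's own limit, and the called quantity must be able to fill the replier's demand at that price. A schematic C++ version follows (a hypothetical helper, not the DSEM source; partial fills on the caller's side are handled by the queue, not here):

// Shares a potential replier would take from a call order at price pOrder for
// qOrder shares, or 0 if it declines. callIsSell is true when the caller is
// selling (so the replier would be buying).
double replyVolume(bool callIsSell, double pOrder, double qOrder,
                   double c, double s, double i, double f)
{
    double pStar = i * c / ((1.0 - i) * s);                 // ideal price, Eq. 3.14
    double pB = pStar / (1.0 + f), pS = pStar * (1.0 + f);  // limits, Eqs. 3.20-3.21
    double want = (1.0 - i) * s * (pStar / pOrder - 1.0);   // demand, Eq. 3.19
    if (callIsSell) {   // replier buys: price must be at or below its buy limit
        if (pOrder > pB || want <= 0.0 || want > qOrder) return 0.0;
        return want;
    } else {            // replier sells: price must be at or above its sell limit
        if (pOrder < pS || want >= 0.0 || -want > qOrder) return 0.0;
        return -want;
    }
}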
As mentioned above, replies occur on a much faster timescale than call orders, such that no more than one call order is ever active at a time. The time for all replies to be processed is assumed to be infinitesimal compared to the calling time interval. 3.2.10 Timescale If each agent places Buy and Sell call orders at rates 1/TB and 1/TS, respectively, then the net rate p of call orders being placed (for all agents) is Let us assume that all agents are satisfied with their current portfolios; an assump-tion which, by Eq. 3.33, minimizes the net event rate, Now we can identify the minimum event rate with a real rate in order to specify the timescale r. Note that fixing r is only necessary in order to set a scale for comparing simulation data with empirical market data. An arbitrary but con-venient choice is to assume each agent trades once each day (on average). Then, if the market contains N agents, we expect pmin = N/2 (because each call order is a trade between at least two agents) so the timescale r should be Hence, one time unit is meant to represent one day. This is a particularly useful choice because it allows us to draw some parallels between DSEM and the original model, CSEM, in which agents were constrained to exactly one trade per day. Notice the parallel is not perfect for two reasons: firstly, there is no guarantee that a call order will have any repliers and secondly, if it is executed there may be multiple repliers. Nevertheless, the scaling should be accurate within an order of magnitude. (3.37) Pmin (3.38) (3.39) 57 3.2.11 Initialization DSEM is initialized with N agents amongst whom some total cash C and total shares S are distributed. (N sets the "size" of the market. Heavily traded stocks would be represented by large N.) As in CSEM, the cash and shares wil l usually be distributed uniformly between the agents. Each agent begins with an ideal investment fraction z(0) = 1/2 which gives an ideal price p*(0) = - . (3.40) s For simplicity, each agent begins satisfied with its current portfolio, believing that the current market price is actually its ideal price p(0) = p*(0). Notice that Eq. 3.11 is not used to set the investment fraction. Its purpose was only to establish that holding a fixed fraction i of one's wealth in stock is rational. How i is updated is the subject of the next section. If the cash and shares are initially distributed equally amongst the agents (as will be assumed for all runs, unless otherwise stated) then the simulation begins in stasis: any Sell order submitted must necessarily be set above all repliers ideal price (C/S) so it will not be filled, and vice versa for Buy orders. Thus, no trading will occur and the price will never move away from C/S. What is needed is some stochastic driving force to initiate the dynamics. (Even starting with a non-uniform distribution produces only transient fluctuations before the price stabilizes.) In this section three market parameters were introduced: the number of agents JV, the total cash C, and the total shares S. 3.3 Fluctuation theory The above theory completely specifies the basic model. What remains is to in-corporate fluctuations, as discussed in this section. Recall that forecasting was problematic in CSEM (see Fig. 2.9, for example) so in DSEM a different tack is taken. 3.3.1 Bayes' theorem Before we can understand how fluctuations are incorporated it is necessary that we briefly review some results from Bayesian probability theory [25, Ch. 
4], which provides an inductive method for updating one's estimated probabilities of given hypotheses as new data arrive: Let H represent some hypothesis which one wishes to ascertain the truth value of. If X is our prior information then we begin with a probability of H given 58 X, denoted by P(H\X). Now notice that as new information D (data) arrives the joint probability of both the hypothesis and the data becomes P(HD\X) = P{H\X)P{D\HX) = P{D\X)P(H\DX) (3.41) (3.42) from the product rule of probabilities. But these equations can be rewritten to give the probability of the hypothesis in light of the new information, which is known as Bayes' theorem. Let us define evidence as the logarithm of the odds ratio e = log [P/(l — P)], which is just a mapping from probability space to the set of all real numbers (0,1) —> (—oo, oo). The advantage of the evidence notation over probabilities is that incorporat-ing new information is an additive procedure where H is the negation of H. With the understanding that assimilating new information is an additive process for evidence, we may proceed with extending D S E M to include fluctuations. 3.3.2 N e w s In real markets a stock's price is derived from expectations of its future earnings. These expectations are formed from information about the company, which is re-leased as news. In other words, news drives market fluctuations. In DSEM fluctuations are also driven by news. How to represent news in this model, though, is problematic. Inspiration comes from Cover and Thomas [24, Ch. 6] in which the optimal (defined as maximizing the expect growth rate of wealth) wagering strategy in a horse race (given fair odds) is to wager a fraction of one's capital on each horse equal to the probability of that horse winning—a fixed investment strategy, where the investment fraction is identified with a probability. In DSEM, this would have to be translated as the probability of the stock (or cash) "winning," the interpretation of which is unclear. (A loose interpretation might be that the stock wins if its value at some future horizon is greater than cash, otherwise cash wins.) P{H\DX) = P{H\X) P(D\HX) P{D\X) ' (3.43) e(H\DX) = e(H\X) + log P{D\HXy P{D\HX) \ ' (3.44) 59 Regardless of what i is a measure of, if we interpret i as some probability measure then Bayesian probability theory offers an avenue for further development. The evidence, as discussed in the last section, corresponding to the probability i is e = l o g ^ - . (3.45) As new information is acquired the evidence is updated via e' = e + 77 (3.46) where 77 represents the complicated second term of Eq. 3.44. This suggests modeling news as a stochastic process 77 which affects an agent's confidence in the stock and, thereby, investment fraction. (Modeling the news as 77 directly, instead of through the information D as presented in Eq. 3.44, dramatically simplifies the calculations.) A positive 77 increases the evidence (and investment fraction), a negative value decreases it, while 77 = 0 is neutral. Assuming news releases have a finite variance and are cumulative, the Central Limit Theorem indicates the appropriate choice is to model 77 as Gaussian noise. It should have a mean of zero (77) = 0 (unbiased) so that there is no long-term expected trend in investment (or price). The scale of the fluctuations an, though, is difficult to decide; but this just opens up an opportunity to increase diversity amongst the agents—let each set its own scale. 
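The mapping between an investment fraction and its evidence, together with the additive update of Eq. 3.46, is easily made explicit. The C++ sketch below is illustrative only (the scaling of the news term η is the subject of the next section):

#include <cmath>

// Evidence (log-odds) of an investment fraction i (Eq. 3.45) and its inverse.
double evidence(double i) { return std::log(i / (1.0 - i)); }
double fraction(double e) { return 1.0 / (1.0 + std::exp(-e)); }

// Incorporate one piece of news, eta, additively in evidence space (Eq. 3.46)
// and return the updated investment fraction.
double updateFraction(double i, double eta)
{
    return fraction(evidence(i) + eta);
}

Starting from i = 0.5 (zero evidence), a shock η = +0.1 raises the fraction to about 0.525 while η = -0.1 lowers it to about 0.475; successive shocks simply accumulate in evidence space.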
We begin by setting an arbitrary scale a„ and then allowing agents to rescale it according to their own preferences. Since each agent wil l apply its own scaling factor the universal scale a„ is arbitrary so it will be set to a convenient value later. News response First, we begin by defining individual scale factors. Let us define a new agent-specific parameter r„, which represents the agent's responsiveness to news. Then evidence would be updated as e' = e + rnn. (3.47) A responsiveness of zero would indicate the agent ignores news releases and maintains a constant investment ratio. As the responsiveness increases the agent becomes increasingly sensitive to news and adjusts its ideal investment fraction more wildly, via ''=, T^l »• <3-48> 1 -exp(rnrj)) 60 Notice there exists a symmetry between positive and negative news if the responsiveness also changes sign, (—r„)(—77) = rnr], so we can impose the restriction (rn) > 0 (averaged over all agents) without loss of generality. News releases Since D SEM is a discrete-event simulation the news must also be inserted as dis-crete events. For reasons discussed in Appendix B news events are modeled as a discrete Brownian process with some characteristic interval T„ (unbiased Gaussian-distributed jumps occurring regularly at intervals of r n ) . Thus, on average, we expect 1 / r n news events each day. If every news release has a variance of cr2, then the variance of the cumulative news after one day is cr 2 /r n . To minimize the impact of the news interval parameter the variance of the news over some fixed interval should be constant, independent of how often news is released. Otherwise rescaling the news interval will rescale the evidences and hence impact upon the scale of the price series. Let us take the variance to be one unit over one day, which is satisfied when °\ = rn. (3.49) To draw parallels with real markets the news release interval is chosen be-tween 1/6.5 < rn < 5 where the market is assumed to be open six and one half hours per day [7]. Thus news releases occur at least once per week and at most once per hour. A smaller interval is inappropriate because news is irrelevant until an investor is made aware of it—most investors (except professional traders) probably do not check for news more than once per hour. (In fact, most people still get their news from the daily newspaper, suggesting r n = 1.) On the other hand, a timescale longer than a week (5 days) is useless because we are particularly interested in fluctuations on the scale of hours to days, so the driving force should be on the same scale. Parameters In this section two new parameters related to news were presented: the agent-specific news response parameter rn which is allowed to be negative but is constrained such that (rn) > 0; and the global news interval parameter which is strictly positive and constrained to r n & (1/6.5,5). 3.3.3 Price response DSEM with news-driven fluctuations is a complete model ready for experimentation. However, it is lacking in that it neglects a significant source which real investors often construct their expectations from: the price history itself. It is probable 61 that feedback from the price series is integral to the clustered volatility and other complex phenomena found in empirical data. In C S E M this feedback was modeled by tracking the history of returns and forming future expectations therefrom, to set the investment fraction. Since D SEM sets the investment fraction completely differently, this method is unavailable. 
However, it is possible to construct a method which allows agents to extract information from the price series. Consider how a single agent's ideal price is affected by news. Prom Eq. 3.14, after a news release 77 the ideal price becomes i'c (1 — t')S = exp(r„77)p* (3.51) which suggests that news is related to the logarithm of the price through j y o c l o g - . (3.52) P Therefore price movements imply news and if the price does change, an agent may infer that it missed some news which others are privy to. Thus the price feed-back may be inserted by extending the evidence dependence such that on a price move from p to p' where rp, the response to price, is a new agent-specific parameter. Setting rp = 0 eliminates the price feedback and reverts the model to being driven solely by news. But with non-zero rp agents have a chartist nature: they presume the price series contains information (trends) and wager accordingly. Some market models separate agents into two groups: fundamentalists and chartists [32,36,47]. Fundamentalists (or "rational" traders) value the stock using fundamental properties such as dividends and reports of assets. In the absence of dividends, this would correspond to responding strongly to news releases in DSEM. Chartists (or "noise" traders) simply use the price history itself as an indicator of the stock's value. This would correspond to a strong price response in DSEM. However, in DSEM the two are not exclusive. Instead of drawing a distinction between the two types of traders a continuum exists where rn and rp can take on a wide range of values so that agents may value the stock based on fundamentals and on its performance history. Note that the price only changes when a trade occurs. Only agents not in-volved in the trade (neither the caller nor a satisfied replier) should update their 62 evidence since the traders can't be interpreted as having "missed" some informa-tion (because their trade was the information). Further, allowing the caller and replier(s) to also update there evidence would cause complications because an agent could never reach equilibrium—a trade would bring them to their ideal investment fractions but then their evidences (and investment fractions) would be immediately changed. The price response parameter is similar to an autocorrelation in returns. Therefore we expect the dynamics to destabilize if we allow it to exceed unit mag-nitude. As we will see in Chapter 4, imposing — 1 < (rp) < 1 keeps the price from rapidly diverging. In this section a new parameter, the price response rp which is constrained by \(rp)\ < 1, was derived. 3.3.4 Review The Decentralized Stock Exchange Model (DSEM) consists of a number N of agents which trade with each other directly, without the intervention of a market maker. In this section the structure of the model wil l be reviewed. Game theory indicates the optimal strategy is to maintain a fixed fraction of one's capital in stock, the Fixed Investment Strategy (FIS). Therefore the agents trade in order to rebalance their portfolios consisting of c cash and s shares. News releases and price fluctuations cause agents to re-evaluate their investment fraction i and up- or down-grade it as they see fit. A l g o r i t h m Everything that happens in the model occurs as a discrete event in continuous time. The basic algorithm follows: 1. Initialization. Cash and shares distributed amongst agents. Agents sample Poisson distribution to get waiting times until first call orders. 
Waiting time for first news event is set to zero (first event). 2. Next event is found (shortest waiting time). Time is advanced to that of the event which is executed next. If it is a news release proceed to Step 3. Otherwise it is a call order being placed, proceed to Step 4. 3. News event. Sample Gaussian distribution to generate deviate for news. Ad-just all agents' ideal investment fractions. Set news waiting time to rn and recalculate all agents' waiting times. Return to Step 2. 63 Symbol Interpretation Range Market parameters N number of agents 2+ C total cash available S total shares available average interval between news releases (1/6.5,5) Market state variables Pit) stock price at time t v(t) trade volume (number of shares traded) at time t Agent parameters u friction of agent j (0,1) rn,j news response of agent j (r„> > 0 rP,3 price response of agent j I M < 1 Agent state variables Cj cash held by agent j S3 shares held by agent j Wj{p) wealth of agent j at stock price p ij optimum investment fraction of agent j Table 3.1: A l l parameters and variables used in the Decentralized Stock Exchange Model (DSEM). 4. Cal l order placed. (a) Calculate all reply orders and corresponding waiting times. Place com-patible replies in queue, ordered by waiting times. (b) If queue is empty, proceed to Step 4d. If queue is not empty, remove first reply order from queue and execute it. (c) Reduce outstanding call order by appropriate volume. If not completely filled, return to Step 4b. (d) Recalculate call-order waiting times for agents which traded. If price did not change, return to Step 2. (e) If price did change, recalculate ideal fractions for all other agents. Recal-culate call-order waiting times. Return to Step 2. Parameters For convenience all the variables used in DSEM are listed in Table 3.1. The param-eters are inputs for the simulation and the state variables characterize the state of 64 the simulation at any time completely. For each run, the agent-specific parameters are set randomly; they are uniformly distributed within some range (a subset of the ranges shown in the table). Each dataset analyzed herein will be characterized by listing the market parameters and the ranges of agent parameters used. 3.4 Implementation The Decentralized Stock Exchange Model (DSEM) is completely characterized by the above theory. The model is beyond the scope of rigorous analysis in all but the most trivial of scenarios so it is simulated via computer. The model was programmed in C++ using Borland C++Builder 1.0 on an Intel Pentium II computer running Microsoft Windows 98. The source code and a pre-compiled executable are available for download from http://rikblok.cjb.net/phd/dsem/. D SEM encounters some of the same issues as CSEM. In particular, random numbers are handled using the same code as was discussed in Section 2.3.1. Simi-larly, the random number seed will be specified with the other model parameters if the default (the current time) is not used. 3.5 Parameter space exploration Having converted DSEM to computer code time series can be generated for numer-ical analysis. Currently D SEM requires seven parameters (one fewer than C S E M started with) to fully describe it. Again, it would be useful to try and reduce the parameter space before collecting any serious data. 3.5.1 Number of agents N The effect of changing the number of traders wil l be explored in detail in Chapter 4 and is left until then. 
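Before examining the role of the remaining free parameters it may help to see the event-driven algorithm of Section 3.3.4 in code form. The following is a heavily simplified C++ skeleton (an assumed structure only, with the reply queue and the evidence updates omitted); it shows how the news clock and the agents' Poisson call clocks drive the simulation:

#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Agent { double cash, shares, fraction, friction, nextCall; };

// Advance the market until time `horizon`, always processing the earliest
// pending event: either a scheduled news release or some agent's call order.
void runSketch(std::vector<Agent>& agents, double newsInterval, double horizon,
               std::mt19937& rng)
{
    std::normal_distribution<double> news(0.0, std::sqrt(newsInterval));
    double t = 0.0, nextNews = 0.0;          // the first event is a news release
    while (t < horizon) {
        double tNext = nextNews;             // find the earliest event
        int caller = -1;
        for (std::size_t j = 0; j < agents.size(); ++j)
            if (agents[j].nextCall < tNext) { tNext = agents[j].nextCall; caller = (int)j; }
        t = tNext;
        if (caller < 0) {                    // news event (Step 3)
            double eta = news(rng);          // ...adjust all ideal fractions with eta here
            nextNews = t + newsInterval;
        } else {
            // call order placed (Step 4): build the reply queue, execute trades here
        }
        // resample affected agents' call-order waiting times here (Eqs. 3.27-3.28)
    }
}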
3.5.2 Total cash C and total shares S In this section the effect of rescaling the total cash C and total shares S will be explored. Let us denote rescaled properties with a prime. Then rescaling cash by a factor A and shares by B is written C = AC S' = BS. (3.54) (3.55) 65 Parameter Run 1 Run 2 Run 3 N 100 100 100 C $1,000,000 $10,000,000 $1,000,000 S 1,000,000 10,000,000 10,000,000 Tn 1 1 1 r n 0.025 ± 0.025 0.025 ± 0.025 0.025 ± 0.025 rp 0.75 ± 0.75 0.75 ± 0.75 0.75 ± 0.75 f 0.05 ± 0.05 0.05 ± 0.05 0.05 ± 0.05 seed -2 -2 -2 Table 3.2: Parameter values for D S E M Runs 1, 2 and 3. Cash and shares are rescaled equally for each agent so the distribution remains constant. We begin by noticing that the initial evidence is unchanged (being fixed at e(0) = 0). The evidence depends on news releases (which are unaffected by rescaling) and price movements through Eq. 3.53. Assuming price scales linearly (p' oc p) the logarithm of the price ratios will be unchanged. Thus, the evidence and the investment fraction will also be unchanged under rescaling. Assuming i remains unsealed, an agent's ideal price scales as and the buy and sell limits scale identically. Since all agents ideal prices scale as A/B it is reasonable to expect that the entire price series scales as p' = f P, (3-57) as is required for the investment fraction to remain unchanged. Thus, these two hypotheses are compatible and, through the initial conditions, are realized. This argument took the same form as in Section 2.4.2 for CSEM. An identical argument for trade volume also applies giving the result that volume scales with shares as v' = Bv. (3.58) To test these hypotheses three runs were performed, with the parameter values shown in Table 3.2. Contrasting Runs 2 and 3 with Run 1 give A = 10,B = 10 and A = 1,B = 10, respectively. The results, shown in Fig. 3.4, confirm our hypothesis. 66 Figure 3.4: Comparison of time evolutions of (a) price and (b) volume for Runs 1, 2 and 3 as defined in Table 3.2. The price scales as the ratio of cash to shares and the volume scales as the number of shares. (Run 2 is offset to improve readability. The gaps in (b) denote periods of zero volume.) 67 becomes Thus, we again have a model where the roles of C and S are only to set the price and volume scales, which are irrelevant anyway. Without loss of generality we can arbitrarily fix C = $1,000,000 and S = 1,000,000 thereby reducing the degrees of freedom by two. 3.5.3 Further scaling Having fixed the total cash and shares it is possible to show that no further scaling arguments are possible—there is no way to transform the model parameters such that the dynamics are invariant. We begin by noticing that, subject to C and S being fixed, each agent's cash c and shares s must also be fixed—constant under any transformation. Under transformation a trade occurring between times t- and £+ c(t+) = c ( * _ ) - p A s (3.59) s(t+) = c{t-)+As (3.60) c{t+)' = c(t-)-p'As' (3.61) s(t+Y = c(f_) + As '. (3.62) But requiring d = c and s' = s for all times immediately imposes the restrictions A s ' = As and p'=p. (3.63) So price must also be a constant under the transformation. In particular, an agent's ideal price, given by Eq. 3.14, and limit prices, given by Eqs. 3.20-3.21, must remain unsealed, which immediately implies i' = i and / ' = /, (3-64) so there is no way to rescale / without impacting the dynamics. The constraint that i be invariant necessarily means that the evidence e must also be. 
After a number of news releases rjj and price movements the evidence is e W = r n £ ^ - + r p l o g ^ | . (3.65) Therefore e is only invariant if the number of news releases is the same (requiring Tn = Tn) and r'n = rn (3.66) r'p = rp. (3.67) 68 The conclusion which may be drawn is that there is no transformation of any model parameters which leaves the dynamics invariant. 3.6 Parameter tuning In the absence of scaling arguments to reduce the parameter space further we must use tuning methods to choose appropriate parameter ranges. (It is acknowledged this weakens the model's results somewhat, but it is necessary in order to establish a sufficiently small parameter space for experimentation.) 3.6.1 N e w s response In this section we will explore the effect of varying the news response parameter rn. Let primes denote values after the arrival of a news event and unprimed quantities the same values before its arrival. If we could neglect price response (rp = 0) then the price fluctuations would behave as P*' log — = e' - e = rnn (3.68) p where 77 is the cumulative news in the interval. So we would expect the price to have a log-Brownian motion with a standard deviation of rn\/i (because 77 has a variance t). However, the agents' response to price movements clouds the picture some-what. Accounting for both news and price response, the evidence changes as P e' — e — rnv + r„log —. (3.69) P If we assume, for simplicity, that the agent begins and ends at its ideal price (p and p' — p*') then the relationship becomes l o g ^ = (3.70) P 1 - rp Although the assumption is too restricting for the above equation to accurately describe price movements it at least sets a scale for the dependence of the price movements with respect to the model parameters. To test Eq. 3.70 the price series of Run 1 is reproduced in Fig. 3.5 along with the expected price from the equation (initialized at p(0) = 1). The graph indicates a rough agreement between the series but with a systematic error for prices far from $1, indicating the equation does not completely capture the dynamics. 69 Figure 3.5: The price series generated by Run 1 is compared with the expected price generated by Eq. 3.70, showing rough agreement (though with systematic deviations). 70 Nevertheless, it is sufficient for the following purpose: it at least allows us to choose the scale of the news response rn such that the fluctuations are on roughly the same scale as observed in real markets. If the log-return r (not to be confused with responsiveness) over one day obeys r(t) EE log = -^-(t,{t) - V(t - 1)) (3.71) p(t - 1) 1 - rp where r)(t) is the cumulative news up to time t, then the standard deviation or of the returns is ar = - ^ y V a r f a ] (3.72) where Var [77] is the news variance over one day, which was set in Section 3.3.2 to 1. Rearranging the relationship, we can set a scale for the news response rn = (l- rp)ar. (3.73) Daily returns for the New York Stock Exchange over 26 years covering the period 1962-1988 [60] give ar = 0.00959 while the nine individual stocks studied in Chapter 6 are somewhat more variable with ar = 0.036 ± 0.014 (see Table 6.4). A rough guideline, then, is rn = (1 — rp) • 0.02. Arbitrarily taking (rP) « 1/2 suggests (r n ) = 0.01. For all future simulations a range of rn — 0.01 ± 0.01 wil l be used, unless otherwise stated. 3.6.2 Friction In this section the effect of changing the friction parameter will be explored. 
One of the effects of changing the friction has already been presented in Section 3.2.10 where it was found that, in order to preserve the number of trades per day, it was necessary to rescale time with Eq. 3.39. This has an appealing interpretation: as friction (or cost per trade) increases the trade rate drops. However, this effect is rather trivial and, by rescaling time via Eq . 3.39, it can be ignored. Nevertheless, / still influences the dynamics in subtle ways. To choose the friction we again appeal to real market structure. Most markets prefer trade quantities to be in round lots or multiples of one hundred shares. So the minimum trade in D S E M should consist of one hundred shares. Of course, the trade quantity depends on how many shares an agent holds and how the dynamics have unfolded. But, at the very least, we can impose this condition initially (at t = 0) because then we know each agent's portfolio and the market price precisely. Assuming the total cash C and shares S(= C) are distributed equally, each agent begins with an ideal investment fraction i = 1/2, ideal price p* = 1, and limit 71 prices PBS = (! + / ) ± 1 - A trade wil l be initiated at one of the limit prices so the quantity of shares to be traded, from Eq. 3.19, is From this argument it appears that the friction should increase with the number of agents N in the model. However, this is just an artifact of having fixed the total number of shares S. As N is increased each agent receives fewer shares since S is constant so it must wait for a larger price move before trading, so that it can trade a full lot. But fixing S in this way is somewhat unnatural. It is more natural to expect that if more agents are involved in a certain company then the company is probably larger and has more shares allocated. So S should probably have scaled with N. But the effect of scaling S can be mimicked by scaling the round lot as A s m i „ oc 1/N. Then the friction is a constant, regardless of N. The most common system size to be used in this research wil l be N = 100 so, given S = 1,000,000, a good scale for the friction is / = 0.02. However, to incorporate heterogeneity future simulations will use / = 0.02 ± 0.01. 3.6.3 N e w s i n t e r v a l Previously it was argued that the average news release interval rn should be on the order of a day. After more reflection the value of one day seems even more appro-priate given the strong daily periodicity in real markets; daily market openings and closings, daily news sources (such as newspapers), and human behaviour patterns are just a few examples of daily cycles which influence market dynamics. Another advantage of using an interval of one day in DSEM is that it strengthens the connection with CSEM, in which all events occur simultaneously once per day. For these reasons the news release interval wil l be fixed at rn = 1 (day). 3.6.4 F i n a l i z e d p a r a m e t e r ranges Via rescaling and tuning we have greatly narrowed the allowed ranges of the pa-rameters. 
The final ranges which will be used in all further simulations are shown Imposing the round lot restriction A s m , „ = 100 sets the friction at (3.74) (3.75) (3.76) 72 Symbol Interpretation Range Market parameters N number of agents 2+ C total cash available $1,000,000 S total shares available 1,000,000 average interval between news releases 1 Agent parameters u friction of agent j 0.02 ± 0.01 rn,j news response of agent j 0.01 ± 0.01 rPJ price response of agent j (r P) € (0,1) Table 3.3: As Table 3.1 except with updated parameter ranges. These ranges will be used in subsequent simulations. A l l parameters except JV and rv are firm. in Table 3.3. A l l the parameters except the number of agents N and the price response rp have well-defined values (or ranges, in the case of agent-specific param-eters). These two parameters wil l be the subject of further examination in the next chapter. 73 Chapter 4 Analysis and Results: Phase space In the previous two chapters the Centralized and Decentralized Stock Exchange Models (CSEM and DSEM, respectively) were presented and in each case all but two parameters were fixed. In this chapter the remaining parameter space will be investigated and it will be demonstrated that both models exhibit phase transitions for interesting values of these parameters. We begin with CSEM. 4.1 C S E M phase space 4.1.1 Review The Centralized Stock Exchange Model (CSEM), presented in Chapter 2, consists of a number N of agents which trade once daily with a centralized market maker. The market maker chooses a trading price such that all orders are satisfied and the market clears (supply exactly balances demand). The agents choose their orders based on a forecast of the daily return-on-investment which has a stochastic component modeled as a Gaussian deviate with standard deviation at (defined as the forecast error). In Chapter 2 the model parameter space was reduced leaving only N and at as free parameters. In this section the remaining two-dimensional parameter space will be explored. 4.1.2 Data collection To explore the phase space thoroughly simulations were performed on systems of sizes JV=50, 100, 200, 500, and 1000 with forecast errors in the range ae G [0.01,0.50] for each JV, with increments of 0.01 up to at = 0.25 and increments of 0.02 thereafter, 74 Parameters C S E M Dataset 1 Particular values Number of agents N 50 100 200 500 1000 Investment limit 6 10~ 3 I O - 3 10~ 3 IO" 3 I O - 3 Run length (time steps) 10,000 10,000 20,000 20,000 30,000 Number of runs 38 38 38 38 38 Common values Forecast error cre Total cash C Total shares S Memory M Risk aversion o Degree of fit d seed 0.01 to 0.25 by 0.01 0.26 to 0.50 by 0.02 $1,000,000 1,000,000 105 ± 95 (uniformly distributed) 2 ± 1 (uniformly distributed) 0 (moving average) random Table 4.1: Parameter values for C S E M Dataset 1. Some of the parameters were established in Chapter 2 and are common to all the runs. Dataset 1 explores two dimensions of phase space: N and CT£. for a grand total of 190 experiments performed. The complete list of parameter values used can be found in Table 4.1. The choices of parameter values used (other than N and ae) are justified in Chapter 2. To introduce heterogeneity amongst the agents some of the parameters, namely the memory M and risk aversion a, were chosen randomly for each agent from the ranges indicated in the table (with the deviates uniformly distributed within the ranges). 
Each run consisted of at least 10,000 time steps (days) and larger systems had longer runs to compensate for slower convergence to the steady state. (With these run lengths the initial transient never accounted for more than one third of the total run). 4.1.3 Phases In most of the runs an initial transient period was observed before the price con-verged to a steady state value around which it fluctuated. The only discrepancy was for small forecast errors where the price climbed quickly until it reached a maximum value which it often returned to. Representative plots of these behaviours are shown in Fig. 4.1(a) and (b), respectively. 75 2000 4000 6000 8000 10000 12000 Time t N = 100, ae = 0.05 0 2000 4000 6000 8000 10000 12000 Time t Figure 4.1: The price series plots for C S E M with N = 100 agents and ae = 0.10 (a) and <7e = 0.05 (b) indicate a change of character of the dynamics. 76 Parameters C S E M Dataset 2 Number of agents N 100 100 100 Investment limit S io-2 10~ 4 10~ 5 Number of runs 38 38 38 Run length (time steps) 10,000 10,000 10,000 Table 4.2: Parameter values for C S E M Dataset 2. These runs are a variation of Dataset 1 (all unspecified parameters are duplicated from Table 4.1, N = 100) exploring a range of investment limits 8. The transition between these two behaviours was observed for all system sizes near crt « 0.08 and is most dramatic when looking at the maximum price observed in a run. M a x i m u m price As is demonstrated in Fig. 4.1(a) the price in each run converged to some steady-state value after some time and then appeared to randomly fluctuate around that value, never exceeding some maximum. As mentioned above, the only exception was when the price reached a limit which interfered with its natural fluctuations. The limit price is a consequence of the investment limit parameter 5 intro-duced in Section 2.4.3 where it was noted that the price may not exceed the limit Pmax = (1 - S)/S (Eq. 2.44). Fig. 4.2 clearly captures the distinct character of the dynamics on both sides of ae 0.08. For larger ae the price fluctuates freely while for smaller values the limit has a strong influence on the dynamics. In the limit 8 —> 0 (which is disallowed because it can occasionally generate singularities in the price series) it appears that the maximum price would diverge at ae — 0 producing a phase transition. Although this cannot be tested by directly setting 8 = 0 the limit can be explored by studying smaller values of 8. 4.1.4 I n v e s t m e n t l i m i t A subset of Dataset 1 with A r=100 agents was simulated again, but this time with 8 = 1 0 - 2 , 1 0 - 4 and 10~ 5 as shown in Table 4.2. It was suspected that reducing 8 would increase the limit price and thereby allow the price to fluctuate freely for smaller values of a £ , reducing the domain of the second phase. However, as Fig. 4.3 demonstrates the threshold values of ae (see Table 4.3) 77 Figure 4.2: The highest price in any given simulation increases as the forecast error decreases until it reaches its theoretical limit, creating two separate phases for the dynamics. 8 Threshold at i ( H 0.08 0.06 1 Q - 4 0.06 1 0 - 5 0.08 Table 4.3: The threshold values of cre separating the two phases of C S E M shown in Fig. 4.3 do not appear to depend on the investment limit 5. 
78 1 0 0 0 0 0 & •e—a—a n > E E I I • 5 = l(T2 + " 6 = IO" 3 X -6 = IO" 4 * " S = 1 0 - 5 • " T 1 1 1 1—I I _| 10000*-1000 * - X • X X x x^x 100-fc- + 10 1 0.1 1 ' 1 0.01 0.1 Forecast error ae 1 Figure 4.3: The maximum price in C S E M has a limit which depends on the invest-ment limit 6. However, the threshold value of ae for which the limit is first reached does not appear to depend on S. for which the price first reaches its limit remains constant, even though the price limit increases. This suggests that there exists a critical forecast error ac > 0 at which the maximum price diverges. Even though the critical point is only strictly defined in the limit of 8 —> 0 the term will also be used here to refer to systems with nonzero values of 6. 4.1.5 Critical regime Critical points are heralded by power law relationships of the form f(x) on (x — xc)z where xc is the critical point and z is known as the critical exponent. (For a thorough explanation of critical phenomena see Ref. [61].) Many different quantities can play the role of the critical variable /. In a thermodynamic system it could be an order parameter such as the magnetization of a ferromagnet. Alternatively, / can be a response function such as the susceptibility or the specific heat or it could be a correlation time or time for thermalization. The control parameter x could be the temperature or an external field. In the case of C S E M we will continue to use the maximum price as the order parameter and cre as the control parameter. To specify the transition in more detail it would be helpful to estimate the critical point oc and the exponent from the data. We begin by reconsidering the 79 data from Fig. 4.3 and fitting it to a power law Pmax = C ( c r e - <7C) -6 (4.1) to estimate ac and b (C is unimportant). The fitting algorithm used is a Levenberg-Marquardt nonlinear routine [20, Section 15.5] and the fit is performed over the range ae > 0.14. (Choosing a range which is too near the actual critical point tends to reduce the quality of the fit because critical points tend to be "blurred" on finite systems.) The fitting algorithm attempts to minimize the sum-of-squares error between the curve and the data but it can get stuck in suboptimal solutions which depend on the initial parameter choices when performing the fit. For these fits the parameters were initially set to C = exp(—3), ac — 0.08, and 6 = 2 because these values were observed to fit the data reasonably well. (Though setting ac = 0.01 initially, the fit still converged to the same solution.) The resultant fits for each value of 6 in Dataset 2 (all with TV = 100) and for N = 100 in Dataset 1 give the critical points and exponents shown in Fig. 4.4. The weighted average of the exponents is b = 1.57 ± 0.10 and the mean critical forecast error is ac = 0.120 ± 0.005. The fact that these values are similar for all values of 8 tested strengthens the conclusion that ac is a critical point. Notice that the calculated value of ac is significantly higher than the 0.08 originally hypothesized. This is a common feature of experiments involving critical phenomena and is due to the finite size of the system under investigation. A true critical or second-order phase transition is characterized by a discontinuity in the derivative of the order parameter. In finite systems the discontinuity is smeared out and becomes more refined with larger systems. In this case the smearing resulted in an inaccurate first guess of the critical point. 
After exploring some alternative choices for the order parameter we will consider finite size effects in more detail. 4.1.6 A l t e r n a t i v e t h e r m o d y n a m i c va r i ab l e s In the last section the maximum price over any run was chosen as the thermodynamic property whereby the phase transition was detected. In this section we demonstrate that a number of alternative variables would be equally suitable. In particular, we consider two alternatives: the (logarithmic) average of the price series p = exp (logp) and the (logarithmic) variance around the average (4.2) a\ = Var [logp]. (4-3) 80 (-1 O n CD QJ u CO CP c CD fl o cfl CD 0.135 0.13 0.125 0.12 0.115 0.11 0.105 0.1 0.095 1.9 1.8 1.7 1.6 1.5 1.4 1.3 10" 10" (a) ' i . i • • r <7C = 0.120 ±0.005 • 1 • 1 . i i • . i 1 Q - 4 1 Q - 3 Investment limit 6 10 - 2 ( b ) ' I 1 1 1 I 1 1 1 . 1 1 b = 1.57 ±0.10-1 -1 1 I I - i i i 1 . IO" 4 IO" 3 Investment limit 8 10 - 2 Figure 4.4: The best fits of power laws to C SEM Dataset 2 (and N = 100 from Dataset 1) yield the critical points (a) and scaling exponents (b) shown. The lines represent the weighted averages of the best fit values. 81 The variance a2 measures the scale of the fluctuations and is analogous to magnetic susceptibility in non-equilibrium systems. Notice a phase transition in the average price p is equivalent to one in the (time-averaged) wealth per agent w because they are related by Nw = C + pS (4.4) where C and S are the total amount of cash and shares, respectively. Fig. 4.5 demonstrates that both these properties exhibit scaling, diverging at the critical point ac = 0.12 (from Fig. 4.4(a)). The critical exponent for the average price power law is 1.11 ± 0.04 and the exponent for the fluctuations is 0.94 ± 0.02. Clearly these properties would be equally suitable to determine the phase transition, but the maximum price has the advantage that the transition becomes very clear because it is a constant to the left of the critical point, being bounded by 6. As mentioned before, the deviation from scaling observed in the variance plot is due to finite size effects which we explore next. 4.1.7 Finite size effects Now we return to the maximum price data in Fig. 4.2 and fit it to a power law using the technique described before. The resultant estimates of the critical point are shown in Fig. 4.6(a) which demonstrates that as the system size N increases the critical point decreases systematically (neglecting the smallest system which is plagued by noise). To derive the relationship between the system size and the associated critical point we need to understand the role of correlations. Corre la t ions Near a critical point the dynamics are dominated by correlations between elements of the system (agents, in our case). The degree to which the elements are correlated is measured by the correlation length £ which, in CSEM, counts the typical number of agents affected by any single agent's decision. Far away from the critical point we don't expect one agent's decisions to affect (many) other agents so the correlation length is short. But near the critical point the correlation length diverges as [61] e f o -ac joc fo -ac ) - " . (4.5) When dealing with a finite system the correlation length is attenuated by the size of the system N. It should reach a maximum at the critical point for that particular system size, denoted by ac(N), and we expect the maximum to grow linearly with N, t(<rc(N) - oc) cc N. (4.6) 82 10 CD u bC a SH > < 0.1 0.01 0.1 C N sa. : b CO P. 
.2 + J cj 03 •c 0.01 (a) 1 8 = io~2 i—i—r + 8 = 10~ 3 X 8 = 10~ 4 8 = 10~ 5 • X u — 1 U (5  IO" 5 • Exponent = 1.11 ±0.04 _ l I I I l_L ± _1 I I I I L. 0.1 Offset from critical point ot — ac - i - | i i -8 = IO" 2 + 8 = I O - 3 X 8 = io-4 * 8 = io-5 • 7 = 0.94 ± 0.02 -\ (b) 0.01 Offset from critical point at — ac Figure 4.5: The average price (a) and variance of fluctuations (b) also exhibit scaling near the critical point ac = 0.12 for the data from C S E M Dataset 2. The deviation from scaling observed near the critical point in (b) is due to the finite size of the system (JV = 100) as will be seen in Fig. 4.7. 83 SH o SH SH CD 3 CJ OJ tH ,Q cS SH CJ c« CO PQ fi fi O 0) PQ 0.16 0.15 0.14 0.13 0.12 0.11 0.1 S 0.09 0.08 (a) 2.1 2 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 CTc(OO) = 0.082 ± 0.002 1/V = 0.55 ± 0 . 0 3 -I 1 ! — - h 50 100 200 500 Number of agents N 1000 - < b ) b = 1.73 ±0.03 -i 50 100 200 500 Number of agents N 1000 Figure 4.6: The best fits of power laws to CSEM Dataset 1 yield the critical points (a) and scaling exponents (b) shown. A finite-size scaling analysis (neglecting TV = 50) reveals information on how the critical point changes with increasing investor numbers (a). For reasons discussed in the text, the exponent for N = 1000 is dropped from the estimate of the scaling exponent (b). 84 As a consequence of these two equations we expect the finite-size critical point to converge to the thermodynamic critical point as ac(N) <JC oc N'1^. (4.7) Applying this relationship to the data in Fig. 4.6(a) gives a finite-size scaling exponent \jv = 0.55 ± 0.03 which means that the correlation length grows as £ ~ {ae — oc)~v with v = 1.82±0.10. It also allows a more precise estimate of the (limit) critical point, giving ac = 0.082 ± 0.002. Scal ing exponent The best estimates of the critical exponent b from Eq. 4.1 are shown in Fig. 4.6(b), giving an average value of b = 1.73 ± 0.03. Notice the largest system size gives a markedly different result and is not used to compute the average b. This is a consequence of the range of forecast errors over which scaling applies: Notice that the tails of the highest prices (pmax < 1) i n Fig- 4.2 appear slightly "flatter" than the rest of the data. In fact the tails fit quite well to the scaling relation pmax oc o~l which is probably directly related to the inverse relationship derived in Section 2.2.9. This inverse power law can obscure the critical scaling so the range over which the scaling was tested for was constricted to pmax > 1. Unfortunately, since the run for N = 1000 had the lowest observed prices at any given value of <r£ this restriction severely limited the available data and compromised the fit. It appears that the tail is artificially drawing the estimate of b down for this system size, so it was not included in the computation of b « 1.73. Fluc tua t ions In this section we briefly revisit the scaling seen in the fluctuation data (Fig. 4.5(b)) to demonstrate the round-off seen near the transition is a finite-size effect. Since the fluctuations in the log-price series are due to stochasticity in each of the N agent's trading decisions it is reasonable to expect the variance of the log-price to scale with system size as o2(N) cx l/N so that No2 should be independent of system size. For the most part Fig. 4.7 confirms this hypothesis with some deviations near the critical point. 
Notice these deviations diminish with larger system sizes so these deviations are just finite-size effect—a "blurring" of the phase transition for small system sizes. The best estimate of the scaling exponent, taken from the largest system is 7 = 1.29 ±0.02. 85 100 I I I I \ I I 1 I 1-1—i—n-N = 50 + N = 100 X N = 200 * N = 500 • N = 1000 • 7 = 1.29 ±0.02 • • 10 X X + _L 0.01 0.1 Offset from critical point ere — ac Figure 4.7: The variance of the log-price largely collapses to a single curve when multiplied by the system size N for C SEM Dataset 1. This curve diverges as the critical point is approached with an exponent 7 = 1.29 ± 0.02 calculated from the largest system N = 1000. (The critical points were taken from Fig. 4.6(a).) 86 Universality class The main advantage of characterizing the critical point precisely is that the critical exponents b, 7, and v may tell us to which universality class the critical point belongs. At a critical point many of the particular details of a system become irrelevant and the scaling properties depend only on a few basic quantities, such as the dimensionality and symmetry of the system [61]. As such, many disparate systems are observed to behave in the same way at a critical point and may be classified by their common exponents. Discovering which universality class a system belongs to leads to further understanding of the important features of the system. For instance, the author has recently been involved in research into the "game of Life" (GL), a toy model of interactions between spatially-distributed individuals. GL lies close to a critical point in the same universality class as directed percolation, a model of the spreading of a cluster through its nearest neighbours [14]. Making this connection teaches us that the important factor determining the dynamics at the critical point near GL is simply the probability of a disturbance spreading to its neighbours, not the particular details of GL. I do not know which universality class C S E M belongs to, but by computing the exponents it is my hope that a reader wil l recognize them and classify the model. The exponent b probably is unrelated to physical systems but the variance of the fluctuations which gave 7 ~ 1.29 is analogous to magnetic susceptibility. The exponent for the correlation length v w 1.8 may also be relevant even though the model is mean field (each agent interacts with all other agents through the market maker). 4.1.8 Transient In this section we explore the critical phase transition discovered above from a different perspective; namely, that of the transient period. A close inspection of Fig. 4.1 wil l reveal that the price series had an initial transitory phase before it settled down to its steady-state dynamics. In this phase the memory of the initial conditions is slowly erased. The transient can be systematically quantified by recognizing that the price series eventually converged to some steady-state value. Then the transient is for-mally defined as the period until the price first crosses its (logarithmic) average for the entire series. The above definition is satisfactory provided that the steady state price is far from the initial price (on the order of $1, see Fig. 2.3) but becomes irrelevant 87 for large at when the steady-state price is on the same scale as the initial price (see Fig. 4.2). A plot of the transient duration of each run in Dataset 1, shown in Fig. 4.8, exhibits some interesting properties. 
For large forecast errors ot ^> ac the transient measure is quite unstable—large for some runs and small for others—with some interesting system size dependence but these properties will not be analyzed further because the transient is a poor measure in this region. For small ae another interesting pattern is observed: the transient appears to grow near the critical point declining away from it, on both sides. This behaviour can be explained if the system does exhibit a second-order phase transition at ac. Criticality arises from correlations between agents which extend further and further as the system approaches the critical point. At the critical point the correlations span the entire system such that a perturbation in any element can have a cascade effect which may impact on any or all other elements. However, the correlations are an emergent phenomena in critical systems and require time to set up—the initial transient period. Away from the critical point the correlations do not span the entire system so they require less time to set up and, correspondingly, the transient is shorter. Notice the maximum transient grows with the system size reflecting the longer time required for the correlations to span the system. Critical theory predicts the maximum transient should scale with the system size and Fig. 4.9 confirms it. The associated scaling exponent is estimated to be 1.5 ± 0.1. Since this exponent is greater than one the duration of the transient grows faster than N as the system size is increased—a typical property of criticality, known as critical slowing down. 4.1.9 S u m m a r y In this section the phase space of the Centralized Stock Exchange Model (CSEM) was explored in detail. The main discovery was of a critical value of the forecast error ac = 0.082 ± 0.002 above which the dynamics are relatively stable and below which the price fluctuates over many orders of magnitude, up to the maximum imposed by the investment limit <5. The transition exists for all values of 6 explored ( IO - 5 < 8 < 10 - 2 ) and appears to be universal. Naturally, the transition becomes more pronounced for larger systems. The economic interpretation of this transition is unclear. It is reasonable to expect the price to rise as uncertainty decreases, reflecting increasing confidence in the stock, but it is not obvious why the price would diverge for a non-zero uncer-tainty. No other interesting phenomena were observed as the parameters were ad-88 100000 10000 2 IOOO L i a fl 100 10 "i 1 1 1 r N = 50 N = 100 - - X -JV = 200 - - * -iV = 500 -N = 1000 - • -0 J L J L J I L 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 Forecast error ae Figure 4.8: Duration of the transient period in C SEM (Dataset 1) before the price series settles down to some steady-state value. The transient grows near the critical point 0.08. 89 100000 10000 1000 100 10 1 1 1 r Data + Power law exponent = 1.5 ±0 . 1 _L 50 100 200 500 Number of agents N 1000 Figure 4.9: The maximum transient in CSEM appears to scale with the system size with an exponent 1.5 ± 0.1. justed. In the next section, a similar analysis is performed on DSEM. 4.2 D S E M phase space 4.2.1 R e v i e w The Decentralized Stock Exchange Model (DSEM), presented in Chapter 3, was constructed as an alternative to CSEM, discarding the notion of a centralized con-trol which sets the stock price. In DSEM the price is an emergent property of agents placing and accepting orders with each other directly. 
The agents use a fixed investment strategy and place orders when their portfolios become unbalanced. The fraction of one's wealth each of the N agents keeps invested in the stock is affected by (exogenous) news and price movements. The degree of influence each of these factors has is parameterized by the news and price responsiveness, rn and rp, respectively. In Chapter 3 arguments were presented which reduced the parameter space, leaving only N and rp as free parameters. In this section the role each of these parameters plays will be explored.

Parameters                DSEM Dataset 1
Particular values
  Number of agents N      50    100    200    500    1000
  Number of runs          39     39     39     39      39
Common values
  Price response rp       −0.75 to 0.25 by 0.05; 0.50 to 0.95 by 0.05;
                          −0.34 to −0.31 by 0.01; 0.91 to 0.94 by 0.01
  Total cash C            $1,000,000
  Total shares S          1,000,000
  News interval           1
  News response rn        0.01 ± 0.01 (uniformly dist.)
  Friction f              0.02 ± 0.01 (uniformly dist.)
  Seed                    random
  Run length ("days")     1,000

Table 4.4: Parameter values for DSEM Dataset 1. Some of the parameters were established in Chapter 3 and are common to all the runs. Dataset 1 explores two dimensions of phase space: N and rp.

4.2.2 Data collection

The phase space was explored by varying the price response parameter rp and number of agents N. The choices of system sizes were N = 50, 100, 200, 500, and 1000 agents, while the price response was initially explored at a coarse resolution with increments of 0.25 between −0.75 and +0.75, then again with a finer resolution of 0.05 in the ranges −0.75 to +0.25 and 0.50 to 0.95. Finally, the regions rp ∈ [−0.34, −0.31] and rp ∈ [0.91, 0.94] were explored at a higher resolution of 0.01 because they exhibited interesting properties (to be discussed).

Although rp is an agent-specific parameter it was set to a single value for all the agents, reducing diversity somewhat. However, sufficient heterogeneity was maintained through the news response and friction parameters, which were each spread over a range of values; each agent was randomly assigned a (uniformly distributed) deviate from within that range. The complete list of parameters is given in Table 4.4.

Each run lasted for 1,000 "days" as defined in Section 3.2.10. Effectively, this means longer runs (more trades) as the number N of agents increases because of how time is scaled.

4.2.3 Phases

Fig. 4.10 shows sample price series for a strongly negative value of rp and a strongly positive value, indicating a change of character as rp is varied. For negative rp the dynamics are dominated by high-frequency fluctuations overlaying a relatively small low-frequency component, while the reverse seems to be true for positive rp.

This is not entirely surprising because the parameter rp acts as a kind of autocorrelation between successive price movements. When rp is negative, an increase in the price will lower the agents' ideal investment fractions, decreasing demand, which usually results in a price drop. Conversely, price increases tend to be followed by further increases when rp is positive because of increased demand.

Hurst exponent

To quantify the dynamics, then, an order parameter which characterizes the autocorrelations is called for. The tickwise autocorrelation (between successive trades) was considered but rejected because it was found to be "noisier" than the alternative—the Hurst exponent.
The Hurst parameter 0 < H < 1, discussed in Appendix C, quantifies the proportion of high-frequency to low-frequency fluctuations and measures the long-range memory of a process. It is an alternative representation of temporal correlations in a time series, with H < 1/2 for anticorrelated series and H > 1/2 for positively correlated data (H = 1/2 indicates no correlations).

Figure 4.10: Sample price series for DSEM with N = 100. Negative values of rp (a) produce an anticorrelated series while positive values (b) result in positive autocorrelations.

The Hurst exponent was calculated by the application of dispersional analysis, a simple and accurate method described in Section C.2.1, to the log-return series of the price sampled at discrete intervals of four "minutes" (interpreting a trading "day" to consist of 6.5 hours). Sampling the data at regular intervals did not significantly affect the estimates of H but it fixed the size of the dataset to be analyzed; for large systems many trades could be executed within a few minutes, producing exorbitantly large datasets which were cumbersome to analyze. (Before discretization the largest dataset contained some 400,000 points.) With a fixed sampling rate all system sizes generated the same volume of data, around 100,000 points.

As the price response parameter rp increases from −0.75 up to 1.00 some interesting properties emerge: for large systems (N > 200) the Hurst exponent is effectively zero below rp ≈ −0.4, indicating very strong anticorrelations in the price series (independent of rp). Suddenly, near −0.4, the anticorrelations break down and the Hurst exponent climbs quickly (with increasing rp) to roughly H ≈ 0.4 (suggesting weakly anticorrelated data). It remains relatively constant until rp ≈ 1 where it climbs again, to H ≈ 0.8.

Interestingly, the Hurst exponent is less than one half at rp = 0, meaning that price movements in one direction tend to be followed by opposite movements even with no explicit price response coded into the agents' behaviour. This arises from corrections to initial over-reaction to news—agents with extremist news responses react to news by placing orders with atypical prices which are corrected for when moderate agents have an opportunity to trade.

Price response greater than unity

Apparently, DSEM exhibits three distinct phases: the first two are demonstrated in Fig. 4.10 but the third (rp > 1) is not shown, so it is briefly discussed here. It is characterized by very strong positive correlations in the price series, such that the price explodes or crashes exponentially, depending on the direction of the first price movement. Very rapidly (within a few "days") the price hits a boundary imposed by a mechanism identical to that in CSEM: the investment fraction in DSEM is actually constrained by δ such that δ ≤ i ≤ 1 − δ. The price is therefore bounded according to Eqs. 2.43–2.44. This constraint was not mentioned in the development of the model because, in DSEM, the limit is δ = 10⁻¹², a value chosen only to avoid numerical round-off errors. In practice δ was not observed to affect the dynamics whatsoever, except in the region rp > 1.
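For reference, dispersional analysis itself takes only a few lines. The sketch below is a common variant of the method and is illustrative only (the function name, binning by powers of two, and the stopping rule are my own choices and may differ in detail from the procedure of Section C.2.1): the log-returns are averaged within bins of growing size m, and the standard deviation of those bin averages scales as m^(H−1) for self-affine noise.

```python
import numpy as np

def hurst_dispersional(returns, min_bins=8):
    """Estimate the Hurst exponent of a (log-return) series by dispersional analysis.

    The returns are grouped into bins of size m and averaged; the standard
    deviation of the bin averages scales as m**(H - 1), so H is recovered from
    the slope of a log-log fit.
    """
    returns = np.asarray(returns, dtype=float)
    sizes, dispersions = [], []
    m = 1
    while len(returns) // m >= min_bins:
        n_bins = len(returns) // m
        bin_means = returns[:n_bins * m].reshape(n_bins, m).mean(axis=1)
        sizes.append(m)
        dispersions.append(bin_means.std(ddof=1))
        m *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(dispersions), 1)
    return slope + 1.0   # H = 1 + slope
```

Applied to the four-minute log-return samples described above, the slope of the log-log dispersion plot directly yields the Hurst exponent used as the order parameter in this section.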
Figure 4.11: The Hurst exponent increases with rp in DSEM as expected but with two surprising phase transitions emerging at larger system sizes: one near rp ≈ −0.4 and the other near rp ≈ 1.

Figure 4.12: The best fits of power laws to DSEM Dataset 1 yield the scaling exponents shown. The average exponent is 0.185 ± 0.016.

The exponential growth of the price in this region indicates that the Hurst exponent (which cannot be accurately measured because the price reaches the boundary too quickly) is identically one, H = 1, the same value obtained from a straight line (which this is, since we are analyzing the logarithm of the price).

4.2.4 Phase transition to H = 1 at rp = r1

In this section the phase transition for positive values of rp will be explored. This phase transition is easier to characterize than the one to be discussed in Section 4.2.5 because we have reason to expect the transition to occur at rp = r1 = 1 and may therefore eliminate one adjustable parameter from the fitting function. (The fit was also performed with r1 as an adjustable parameter (not shown) and the results corroborate the hypothesis that the transition is at r1 = 1.)

From Fig. 4.11 it is clear that the phase transition at r1 is not first-order (discontinuous) but appears to be second-order (critical). In the last section it was argued that the transition is to H = 1 for rp > r1, so the power law to be fit takes the form

1 − H(rp) = C (r1 − rp)^b    (4.8)

with fitting parameters C and b. The fits were performed over the range 0.75 ≤ rp ≤ 0.95 for each value of N, giving the exponents b shown in Fig. 4.12 with an average value of b = 0.185 ± 0.016. It is not known to what universality class (defined in Section 4.1.7), if any, this transition belongs.

Next we explore the phase transition observed for negative values of rp.

4.2.5 Phase transition to H = 0 at rp = r2

Now we turn our attention to the other phase transition in the system, near rp ≈ −0.4. For the smallest systems N ≤ 100 the transition is not discernable in Fig. 4.11 but it comes into focus as the system size is increased. For intermediate agent numbers, N = 200 and 500, the transition looks very much second-order (continuous). However, in the largest system N = 1000 the transition is quite abrupt, so particular care must be taken to establish whether it is first- or second-order.

Comments on phase transitions

The distinction between first- and second-order transitions is not merely academic; it can greatly enhance our understanding of the underlying dynamics. First-order phase transitions, characterized by a discontinuity in the order parameter, occur via nucleation: small pockets of the new phase emerge within the old phase and grow until the entire system is in the new phase. On the other hand, second-order transitions, characterized by a continuous order parameter with a diverging derivative, exhibit system-spanning correlations such that the entire system undergoes the transition as a whole.

In the context of the models presented here, correlations would indicate correlated behaviour amongst investors and nucleation would refer to a small subgroup of investors acting differently from the larger population.
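Fits such as Eq. 4.8 above (and Eq. 4.9 in the next subsection) become straight lines on logarithmic axes, so they reduce to ordinary linear regression. A minimal sketch, with hypothetical data and function names not taken from the thesis:

```python
import numpy as np

def fit_power_law(rp, H, r1=1.0):
    """Fit 1 - H(rp) = C * (r1 - rp)**b with the transition point r1 held fixed.

    Taking logarithms gives log(1 - H) = log(C) + b*log(r1 - rp),
    so the exponent b is the slope of a straight-line fit.
    Returns (C, b).
    """
    rp = np.asarray(rp, dtype=float)
    H = np.asarray(H, dtype=float)
    b, logC = np.polyfit(np.log(r1 - rp), np.log(1.0 - H), 1)
    return np.exp(logC), b

# Hypothetical example over the fitted range 0.75 <= rp <= 0.95:
# C, b = fit_power_law([0.75, 0.80, 0.85, 0.90, 0.95],
#                      [0.60, 0.63, 0.66, 0.70, 0.76])
```

The same routine, with (rp − r2) in place of (r1 − rp) and H in place of 1 − H, applies to the candidate critical fit of Eq. 4.9 below.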
Classification of r2

To explore the phase transition in detail some more data were collected at intermediate system sizes, as shown in Table 4.5, for a total of seven different values of N. The datasets were analyzed by attempting to fit both first-order and second-order transitions.

Assuming a first-order transition the fit becomes trivial: we can just assume a linear dependence on rp in the neighbourhood of r2. (In this case linearity was observed over rp ∈ (−0.3, 0.1) for all runs.) The transition point is simply read off the graph from the largest system N = 1000 (the transition is resolved with greater accuracy as N increases), giving r2 = −0.33 ± 0.01. The magnitude of the discontinuity in the Hurst parameter H at r2 is then ΔH(r2) = 0.281 ± 0.008.

Parameters                DSEM Dataset 2
Particular values
  Number of agents N      300    700
  Number of runs          16     16
Common values
  Price response rp       −0.35 to −0.30 by 0.01; −0.25 to 0.25 by 0.05
  Run length ("days")     500

Table 4.5: Parameter values for DSEM Dataset 2. These runs are a variation of Dataset 1 (all unspecified parameters are duplicated from Table 4.4) exploring a few other intermediate system sizes.

We now consider the possibility that the transition is second-order. If so, then a power-law dependence

H(rp) = C (rp − r2)^b    (4.9)

should characterize the behaviour near the transition r2, with adjustable parameters C and b. Sample fits for N = 500 and 1000 are shown in Fig. 4.13 with a simple linear fit representing a first-order transition for comparison. Notice that the N = 500 system is better described by a critical transition but as N increases to 1000 the transition becomes sharper, more like a first-order transition, suggesting that the scaling behaviour is only a finite-size effect.

Finite-size scaling

Assuming criticality, each system in Datasets 1 and 2 was fit to Eq. 4.9 and the best-fit exponents b are plotted in Fig. 4.14. Clearly, the exponents exhibit a trend as the system size N increases. On a log-log graph the trend appears linear, suggesting that the exponents scale with the system size as yet another power law, b ∝ N^m. (Be warned, this is not a traditional—nor rigorous—finite-size scaling argument.) The scaling exponent is found to be m = −0.25 ± 0.04, meaning that the exponent b will be halved every time the system size is scaled up by a factor of 16, and in the thermodynamic limit (N → ∞) b drops to zero.

Figure 4.13: Sample fits of first- and second-order phase transitions to N = 500 (a) and N = 1000 (b) near r2 in DSEM show that the power law fits better for small N but the first-order form prevails for larger systems.

Figure 4.14: As the system size N increases the critical exponent b tends to zero. The line represents a power-law fit b ∝ N^m giving an exponent m = −0.25 ± 0.04.

To understand what is going on here, consider the general scaling function

y = c (x − xc)^b    (4.10)

which has a slope

y′ = b c (x − xc)^(b−1).    (4.11)

If we demand that the scaling function mimic a first-order transition, requiring that both y > 0 and y′ approach constants as x → xc, then the scaling exponent must vary such that

b = (y′/y)(x − xc).    (4.12)

So a second-order transition "mimics" a first-order in the limit b → 0.
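This limiting argument is easy to check numerically. The toy snippet below is purely illustrative (the constants are arbitrary and are not simulation output): it evaluates Eq. 4.10 just above a transition point for shrinking exponents b, showing the curve flattening toward an effectively constant value—an apparent discontinuity—even though the functional form remains a power law.

```python
import numpy as np

# Evaluate y = c*(x - xc)**b just above the transition for shrinking exponents b.
# As b -> 0 the curve's range collapses toward the constant c, so it becomes
# indistinguishable from a first-order (discontinuous) jump of height c.
xc, c = -0.33, 0.3
x = np.linspace(xc + 1e-6, xc + 0.5, 6)
for b in (0.4, 0.2, 0.1, 0.05):
    y = c * (x - xc) ** b
    print(f"b = {b:>4}: y spans [{y.min():.3f}, {y.max():.3f}]")
```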
Returning to the transition at rp = r2 in DSEM, we see that the apparent criticality is an artifact of finite simulation size and in the limit N → ∞ the transition is first-order.

Intermittency

As discussed above, there are important consequences of knowing a transition is first-order. The foremost is that fluctuations are local; they do not spread throughout the entire system (as is found for critical points). Another consequence is that the transition is a change of quality, not simply of quantity. That is, since the order parameter exhibits a discontinuity, the nature of the system changes qualitatively, not just quantitatively. A third important feature of first-order transitions is nucleation. Near the transition stochastic fluctuations can often give rise to small pockets which exhibit one phase while the greater system is within the other phase. A good analogy to keep in mind is a pot of boiling water: steam bubbles form on the bottom and sides of the pot where small variations of the surface exist.

The last interesting property of first-order transitions that will be mentioned here is intermittency. As discussed above, near the phase transition bubbles of one phase form at nucleation points within the other phase. In a spatially-extended system this causes intermittent periods of either phase at any particular point in the system. Since DSEM is nonspatial the intermittent behaviour is captured in the price series, which is observed to consist of periods of low activity separated by periods of high activity, as demonstrated in Fig. 4.15.

4.2.6 Summary

In this section we explored the phase space of the price response parameter rp and the number of agents N in DSEM. Two phase transitions were observed: at rp = r1 = 1 DSEM undergoes a critical transition to perfectly correlated price movements (the Hurst parameter H goes to unity), and at rp = r2 ≈ −0.33 a first-order transition is observed. Below the transition strong anticorrelations in the price series are observed but above it only weak anticorrelations exist. Near the transition point the system spends time in both regimes, giving rise to clusters of high volatility.

4.3 Number of investors

Before comparing the models with empirical data we should complete our exploration of the phase space. In both CSEM and DSEM we explored a variety of system sizes (number of agents N) in order to enhance the resolution of the phase transitions. In this section we will re-evaluate these data in light of the discovery that in many market simulations the dynamics reduce to being semi-regular in the limit of many investors [62,63].

Most past simulations were performed with investors numbering between 25 and 1,000 [28-30,32,33,47] with the largest ranging between 5,000 and 40,000 [17,34,35]. Even the largest of these are minuscule when compared with natural systems exhibiting phase transitions, which have on the order of 10²³ particles. It is not clear that these market models (including CSEM and DSEM) are at all interesting in the limit of many investors. In fact it has been discovered that the dynamics of many of these models become almost periodic as the number of investors grows [62,63]. Whether this behaviour detracts from the models is uncertain.
The models can only be tested by comparison with real markets, but even the largest markets cater to an infinitesimal number of individuals when compared with natural systems. After all, the natural world consists of only a few billion "agents" of which only a minute fraction are actively involved in stock trading. Therefore, one may argue that these models do not need to exhibit rich behaviour in the limit of many investors in order to be realistic. They need only exhibit realistic dynamics on the same scale as real markets.

Figure 4.15: The price series (a) and daily volume (b) of DSEM with N = 200 and rp = −0.45 is a good example of intermittency. The dynamics fluctuate between two phases.

Nevertheless, it is interesting and useful to understand how the models predict the dynamics to evolve with increasing investor numbers. One advantage is that testable predictions may be made with regard to how a market will scale as more investors come aboard. In the last few years the number of investors in the markets has grown substantially, mainly due to the rise of the internet which allows traders to monitor their portfolios in (almost) real time and execute trades promptly. The only research the author is aware of to explore the consequences of growing markets is a model which suggests that fluctuations increase with system size [18]. Thus, it is useful to explore the effect of increasing the number of agents in both the Centralized and Decentralized models.

4.3.1 Centralized Stock Exchange Model

To test the effect of changing the number of agents N thoroughly, a new dataset (see Table 4.6) was collected with markets containing N = 10,000 agents in three regimes: far below the critical point σε = 0.01, near the critical point σε = 0.08, and above the critical point σε = 0.15.

Parameters                  CSEM Dataset 3
  Number of agents N        10,000    10,000    10,000
  Forecast error σε         0.01      0.08      0.15
  Investment limit δ        10⁻²      10⁻²      10⁻²
  Number of runs            1         1         1
  Run length (time steps)   10,000    10,000    10,000

Table 4.6: Parameter values for CSEM Dataset 3. These runs are a variation of Dataset 1 (all unspecified parameters are duplicated from Table 4.1) with many agents N = 10,000.

The price series for the first and last of these are shown in Fig. 4.16 and 4.17, respectively. Far below the critical point the dynamics appear to be largely invariant under change of the number of investors. Interestingly, the dynamics do display semi-periodic intervals, as demonstrated in Fig. 4.16(b), but this occurs for all system sizes and is not a stable phenomenon even for the largest system, so it does not appear that the dynamics converge to being almost periodic in the limit N → ∞ as found in other models [62,63].

Figure 4.16: The price series of CSEM for σε = 0.01 (a) appears unaffected by changing the number of agents N. In particular, occasional semi-periodic fluctuations (b) are observed for all system sizes.

Figure 4.17: The price series of CSEM for σε = 0.15 exhibits smaller fluctuations and a lower mean as the system size increases. (The lower mean may simply be because the system has not reached a steady state yet.)

Above the critical point the most obvious feature is that the fluctuations decline with the number of investors (see Fig. 4.17). But this is to be expected since the trading price results from the interactions between N "noisy" investors. Therefore we expect the fluctuations (measured as the standard deviation of the log-price) to decrease as 1/√N, a hypothesis which was found to hold fairly well for σε = 0.15 between N = 50 and N = 10,000 but which held better further from the critical point (as can be seen in Fig. 4.7, which demonstrates the variances multiplied by system sizes nearly collapse to a single curve).

The main conclusion to be drawn from this analysis is that the dynamics of CSEM do not appear to become trivial or converge to a semi-periodic pattern as the number of investors becomes infinite. Above the critical point fluctuations do diminish, but as one approaches the critical point they reappear, and below the critical point the dynamics are largely independent of the number of agents.

4.3.2 Decentralized Stock Exchange Model

Collecting data for N = 10,000 in DSEM proved prohibitive since each run would require a full week of computer run-time to collect sufficient data for analysis. This occurs because DSEM requires on the order of N² operations per simulation "day" (N calls per day with N potential replies each) whereas CSEM grows linearly with N. Therefore the data collected in Table 4.4 will be used here.

Again, we observe the price series for a variety of system sizes—this time N = 50, 200, and 1000 agents—and regions of phase space spanning the phase transitions—rp = −0.75, 0.00, and 0.90—to estimate how the dynamics would change as the system size grew without limit. In each region it was found that the fluctuations actually grew with system size, especially below the first-order transition rp < r2 ≈ −0.33. The two plots in Fig. 4.18 show the price series only for N = 50 and N = 1000 (N = 200 exhibited predictably intermediate fluctuations and was not plotted to reduce clutter). The plots show that the fluctuations are somewhat larger for the larger system when rp = 0.90 and significantly so for rp = −0.75. Thus the dynamics do not reduce to semi-regular in the limit of many investors.

4.3.3 Summary

In this section we tested the effect of increasing the number of agents in both CSEM and DSEM in light of recent research that indicates that some market models become quasi-periodic in the limit of many agents [62,63]. It was found that fluctuations did decline in CSEM when the forecast error σε was significantly above its critical value but the dynamics were largely invariant under variation of the number of agents below the critical point.

In DSEM the fluctuations actually grew as more investors were introduced, especially below the first-order transition at a price response of rp ≈ −0.33. This makes an interesting and testable prediction regarding empirical markets: it indicates that fluctuations in stock prices should be greater in larger markets. (Some caveats are required: the size of the market is measured in terms of the number of independent investors (a fund group would be interpreted as a single investor) and not the total value of the outstanding stock.
Recall that in these simulations the total cash and shares were held fixed: with more investors each held a smaller portion of the total resources.) Since the number of investors does not strongly affect the dynamics we are free to choose values which correspond well with observed market fluctuations. Comparing the fluctuations with empirical data will be the subject of the next chapter.

Figure 4.18: In DSEM the price series does not get more regular as the system size is increased—in fact the fluctuations grow. This is especially true for rp = −0.75 (a) but it is also indicated to a lesser degree at rp = 0.90 (b).

Chapter 5

Analysis and Results: Empirical results

In the last chapter we explored the phase space of the Centralized and Decentralized Stock Exchange Models (CSEM and DSEM, respectively). This chapter is concerned with contrasting the data from these models with empirically known qualities of real markets. Some of the properties we hope to uncover are leptokurtosis in the price returns and correlated volatilities. As we will see, the emergence of these properties is closely related to the phase transitions discovered in the last chapter.

5.1 Price fluctuations

We begin by exploring the distribution of price fluctuations.

5.1.1 Background

It has long been known that stocks exhibit stochastic fluctuations in their price histories. Originally it was hypothesized that the markets exhibited (discrete) Brownian motion and therefore had Gaussian-distributed price increments [64]. Later this was adapted to account for the strictly positive nature of stock prices via geometric Brownian motion [46] with the logarithm of the price following a random walk. Much theoretical work on derivative pricing assumes geometrical Brownian motion, including the famous Black-Scholes equation [65].

It was a startling discovery, then, when Mandelbrot pointed out that, empirically, the logarithm of price-returns did not have a Gaussian distribution [5,9]. In fact, on short timescales, large (exceeding a few standard deviations) fluctuations occurred much too frequently to be explained by the Gaussian hypothesis. These large fluctuations contribute to the tails of the distribution resulting in "fat tails".

Mandelbrot proposed the correct probability distribution function (on the logarithmic scale) was not the Gaussian but its generalization—the stable Levy distribution. Levy distributions (see Section C.2.3) drop off as power laws

P(x) ~ 1/|x|^(1+α),  |x| → ∞    (5.1)

for 0 < α < 2, resulting in fatter tails than the Gaussian (α = 2). An attractive feature of this hypothesis is that it scales: that is, the distribution (and the exponent α) remains the same whether measured hourly, daily, or even monthly. Mandelbrot measured the daily and monthly distribution of returns from cotton prices and found both fitted well to a Levy distribution with exponent α ≈ 1.7 [9]. Studies of other markets have had similar results concluding 1.4 < α < 1.7 [4,10, and references therein]. (Appendix B demonstrates it is possible to simulate fat tails by regular sampling of a discrete Brownian process but Palagyi and Mantegna [66] demonstrate this is not responsible for the fat tails observed in return distributions.)

Since then the adequacy of the Levy distribution to describe price fluctuations has been called into question because it implies that the fluctuations have an infinite variance whereas experimental evidence indicates it is probably finite [7,67].
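To make the notion of "fat tails" concrete, the snippet below compares how often Gaussian and Levy-distributed (α = 1.7) increments exceed large multiples of their typical fluctuation size. It is an illustration only, not code from the thesis; it uses the standard Chambers–Mallows–Stuck construction for symmetric Levy-stable deviates, and the scale is taken as the median absolute value because the Levy variance is formally infinite.

```python
import numpy as np

def symmetric_levy(alpha, size, rng):
    """Symmetric Levy-stable deviates via the Chambers-Mallows-Stuck method (alpha != 1)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit exponential deviate
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(1)
n = 100_000
gauss = rng.normal(size=n)
levy = symmetric_levy(1.7, n, rng)

# Frequency of "large" events relative to the typical (median absolute) fluctuation.
for name, x in (("Gaussian", gauss), ("Levy a=1.7", levy)):
    scale = np.median(np.abs(x))
    print(name, [(np.abs(x) > k * scale).mean() for k in (5, 10, 20)])
```

The Levy sample retains a measurable fraction of events far beyond the typical scale, while the Gaussian tail frequencies vanish—precisely the excess of outliers described above.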
(Recent studies indicate the tails of the return distributions fall off fast enough that the variance is finite.) Related to this is the observation that scaling is violated on long timescales (of more than a week [4]) where the distribution converges to a Gaussian (because the Central Limit Theorem applies if the variance is finite).

This discrepancy was initially resolved by arbitrarily truncating the power law with an exponential weighting function for large events [4,68]. According to this theory, gradually truncated Levy flight (GTLF), the tails of the cumulative distribution of (normalized) returns r follow

C(r) ~ r^(−α),                         |r| ≤ l_c
C(r) ~ r^(−α) exp[−k (|r| − l_c)^β],   |r| > l_c    (5.2)

with an exponential decay beyond some cut-off l_c. While this did improve the quality of the fit to observed returns it did so at the expense of three new fitting parameters—the cut-off l_c, the decay rate k, and the power of the exponential β—bringing the total to five adjustable parameters.

An appealing alternative is the idea that the tails of the return distribution do have a power law but with an exponent α ≈ 3. (An interesting, and testable, consequence is that moments higher than three—such as the kurtosis—are divergent.) Since the exponent is greater than two the distribution is not stable and converges to a Gaussian on long timescales. It also implies that the variance is finite. A very comprehensive analysis was performed across 1,000 companies yielding a (huge!) dataset of 40 million returns (Mandelbrot had only 2,000 points available) and the results strongly support the inverse cubic (IC) hypothesis [6,69].

To be precise, the theory is that a Levy law (α ≈ 1.4) applies for small to intermediate returns (less than a few standard deviations) but then the distribution crosses over to the inverse cubic for larger returns. Thus we have two fitting parameters in either scaling regime and a crossover point, for a total of five adjustable parameters, the same as GTLF. Although this research [6,7,69] is excellent, the practical application of the theory is somewhat cumbersome and a simpler alternative exists.

5.1.2 Alternative: Decaying power law

In this section an alternative which appears to explain the data almost as well but with only three parameters is presented. Koponen [70]—expanding on work done by Mantegna and Stanley [71]—demonstrated that a power law with a smooth exponential cutoff has Levy increments on short timescales which converge to Gaussian after a long time. This process is equivalent to GTLF with l_c = 0 and β = 1, so there exists no cut-off point; the exponential truncation applies for all returns. The hypothesis is that the tails of the cumulative return distribution obey

C(|r̃| > r) ~ r^(−α) e^(−r/r_c)    (5.3)

where α is the scaling exponent and r_c is the decay constant, which sets a characteristic scale over which the power law dominates—in the limit r_c → ∞ Mandelbrot's pure Levy flight hypothesis re-emerges. (This functional form has been observed to accurately describe the distribution of fluctuations in the "game of Life" [72].)

The use of this truncation hypothesis must be justified in light of the overwhelming empirical evidence supporting the inverse cubic hypothesis [6,7,69]. To do so it is necessary to explicitly formulate the goals of this section: (1) to determine if the return distributions generated by the models scale with an exponent near α ≈ 1.4 over some range of returns, and, if so, (2) to estimate the range of returns over which the scaling holds.
In doing so we can determine if the models reproduces truncated Levy flight observed empirically. However, the amount of data collected will be insufficient to adequately determine how the truncation occurs, whether it is an inverse cubic or exponential decay. Further, this detail is not of central importance to this work so it is left for future research. 110 Therefore we are free to choose the most convenient form for the truncation factor, that given by Eq . 5.3. This form has a few technical advantages over the previously discussed methods. The first is that it requires a fit of only three pa-rameters and the fit is linear in each of them when performed on the logarithmic scale. Hence, only one optimal solution exists and a number of algorithms exist for arriving there [20, Ch . 15]. Secondly, it is a single continuous function so it requires no manual searching for a crossover point between two regimes (which is an art in itself). In fact, it automatically determines the crossover from power law behaviour to exponential decay with the fitted parameter rc. The larger rc is, the greater range the power law is valid over. Fig. 5.1 contrasts the fit of the inverse cubic hypothesis (a) with the decaying power law (b). Both fit the high frequency exchange data quite well, but recall the inverse cubic requires two additional parameters to do so. The decaying power law hypothesis indicates a power law with exponent a « 1.42 applies for returns less than r c « 8.11 (standard deviations), beyond which the power law is attenuated by an exponential decay. Also shown is the distribution of daily returns for the Nasdaq Composite in-dex over almost 16 years [73, ticker symbol= AIXIC] in Fig. 5.2. This figure demon-strates an important point: when the crossover to the exponential occurs at a small value of rc (less than a few standard deviations, as in the positive tail) the estimate of the Levy exponent is unreliable. The larger r c gets, the more meaningful the value of a becomes. While the claim that real market fluctuations actually obey Eq. 5.3 is largely unsubstantiated, a weaker claim that this functional form is an effective method to test for scaling in market data is also being made based on two observations: (1) it is systematic and does not require any intervention (tuning of parameters) on the part of the researcher, and (2) it characterizes both the range and exponent of the Levy region well with the parameters rc and a, respectively. For these reasons this method wil l be used in the following sections to test for scaling in C S E M and D S E M . 5.1.3 Methodology To determine the distribution of returns over some time interval At the price series will be regularly sampled producing a series of returns \P((i + i)Aty r, = In p{iAt) (5.4) 111 0.1 1 10 100 Normalized returns 0.1 1 10 100 Normalized returns Figure 5.1: Ten minute returns (86,000 data points) of the Swiss franc-U.S. dollar exchange rate [2] (negative tail) compared to power law with crossover to a ~ 3 (a) and power law with exponential drop-off presented in this section (b). 
112 PI .2 r2 CO 'O _> s o .2 CO -5 CD > a o i i i 1 1 1 1 Nasdaq daily returns a « 1.10, rc « 1.56 Gaussian • I I I I I _i i i i i 11 0.1 10° 10" 1 i -i o - 2 1 -i o ~ 3 I O - 4 10~ 5 10" 1 10 Normalized returns I 1 I I I I | 100 Nasdaq daily returns a « 1.54, r c « 3.6 Gaussian Negative tail -i _i i I _i i i i I I I I I 0.1 1 10 100 Normalized returns Figure 5.2: Both tails of the cumulative distribution of daily (normalized) returns for the Nasdaq Composite index between October 1984 and Jun 2000 (4,000 data points) fit well to a decaying power law. The power law is truncated by two standard deviations in the positive tail but extends almost to four in the negative tail. 113 The mean r and standard deviation oy will be computed and the returns normalized The cumulative distribution of the normalized returns will be calculated and compared with the cumulative Gaussian (the error function). Of particular interest are the tails of the distribution which are hypothesized to obey the scaling functions C(fi > r) ~ f~"Q+ exp(—r/rCi+), r —>• + 0 0 (5.6) C(fi < f) ~ \r\~a~ exp(— |f| /r C ;_), r —> —00 (5.7) with exponents a+ and a_ for the positive and negative tails and crossover val-ues rC;+ and r C i_. (The cumulative distribution is preferred because cumulating effectively "smooths" the data, making it more amenable to analysis.) The adjustable parameters a± and rCt± will be acquired via a Levenburg-Marquardt nonlinear fit to logC for returns exceeding \f\ > 1 since we only want to fit the tails. (A linear fit is also possible with a suitable choice of parameters.) A small value of rc will be interpreted to mean that no scaling exists and the parameter a is irrelevant. 5 . 1 . 4 Centralized stock exchange model For this experiment we return to Dataset 1 (Table 4.1) and apply the analysis to the largest system N = 1000. Fortunately, each run consists of over 30,000 days worth of data so the scaling can be tested on a wide range of timescales. (In the analysis, the initial transient will be discarded.) Since CSEM contains the free parameter oe (the forecast error) we must sam-ple a suitable spectrum of values in our search for scaling. Obviously, the dynamics around the critical point oc « 0.08 (from Fig. 4.6(a)) is of particular interest so samples are chosen which span the critical point. The scaling regime is indicated by the characteristic return rc: a small value indicates that the exponential drop-off occurs for small returns (before scaling be-comes evident) and a large value indicates the power law applies over a broad range of returns. In this experiment a threshold value of rc = 3 was observed to adequately distinguish between distributions which scaled and those which didn't. This limit was also used in Ref. [7] to estimate the scaling exponent a. Plotting the characteristic return rc for a variety of forecast errors ot (Fig. 5.3) shows that scaling is only observed well below the critical point oc. The average scaling exponent for all distributions with r c > 3 is a = 0.8 ± 0.4, which compares poorly with the empirical value 1.4 < a < 1.7. 114 Figure 5.3: Scaling in the distribution of returns is only observed well below the critical point ae <C ac in CSEM as indicated by large values of the characteristic return rc. For small ot scaling occurs in both tails for daily returns but only for negative returns in monthly returns. 
115 Parameters D SEM Dataset 3 Particular values News response r„ 0.01 0.01 0.001 Price response rp -0.75 to 0.75 by 0.25 0.95 0.99 Number of runs 7 1 1 Common values Number of agents N 100 Run length ("days") 20,000 Table 5.1: Parameter values for DSEM Dataset 3. These runs are a variation of Dataset 1 (all unspecified parameters are duplicated from Table 4.4) run out for longer times (roughly 80 years). Also notice that for rp = 0.99 the news response was reduced by an order of magnitude to keep the price within reason. For positive returns scaling is found to disappear as the sampling interval is increased from daily to monthly (20 days), as expected. Interestingly, the same is not true for negative returns: instead the returns scale for even more values of a( on a monthly timescale than they do daily. The run at = 0.03 sampled monthly is an interesting case because it exhibits a strong asymmetry between up and down moves, having a characteristic return above the threshold for negative returns and below for positive returns, so its return distribution is plotted in Fig. 5.4. This effect is due to an asymmetry between up- and down- movements which arises from the artificial price cap imposed by the parameter S. See, for example, Fig. 4.1(b) which shows that occasional large crashes occur when the price approaches its upper limit while upwards movements are more normally distributed. Unfortunately, the above only serves to further call in to question the validity of the C S E M model because scaling behaviour is only observed below the critical point, in the regime where we have already seen (Fig. 4.1, for instance) the dynamics are completely unrealistic. So we turn to DSEM in the hope that it is a more realistic model of market dynamics. 5 .1 .5 Decentralized stock exchange model In this section the distribution of returns in DSEM is analyzed. To get the limit distribution a large quantity of data is required so DSEM was run with the parameter values from Table 5.1, the most notable feature being that the run length was extended from 1,000 days ( « 4 years) to 20,000 days ( « 80 116 .2 CD > S o .2 => rS GO -3 _> a o 10 - 6 1 1—I—I—I I I l-l ae = 0.03 rc w 0.84 Gaussian Positive tail • • i • • • • • _i i i 'i i i 111 _i i i • • ' •' 0.1 10° io-11-io-2 I O - 3 10~ 4 10~ 5 10 - 6 -I—I I I I I 1 10 Normalized returns — i 1—i—i—i i 111 100 1 — i — ' cr£ = 0.03 rc S3 5.4, a K 1.1 Gaussian Negative tail LL I i 0.1 1 10 Normalized returns 100 Figure 5.4: For at = 0.03 in C SEM (TV = 1000) the distribution of positive (monthly) returns (upper) almost converges to a Gaussian but still has a slightly heavy tail. The negative returns (lower), however, exhibit scaling for r < rc S3 5.4 with an exponent a ~ 1.1. 117 Figure 5.5: D SEM only begins to exhibit scaling, as measured by a characteristic return exceeding three standard deviations, for price responses well below the first-order transition r2 = —0.33 and as the price response approaches the critical point n = 1. years). The distribution of price returns (sampled "daily") was then cumulated and fitted with Eq. 5.3. (Issues raised in Appendix B regarding sampling are addressed on page 122, below.) The characteristic size of the returns for the different values of rp is shown in Fig. 5.5. Notice that they are almost exclusively below the threshold required to establish scaling indicating that the distributions do not exhibit scaling properties observed empirically. 
The worst region appears to be intermediate values of TP with better performance near the endpoints. Recall that DSEM exhibits three behaviours as rp is varied: (1) when rp > r\ = 1 the price is perfectly autocorrelated—every movement is followed by another (typically larger) movement in the same direction; (2) in the intermediate region r\ > Tp > r2 the price series looks most realistic and has (at most) weak autocorrelations on long timescales; and (3) when rp < r2 ~ —0.33 the price fluctuations have a strong negative autocorrelation extending over all timescales. Thus, the price fluctuations appear only to obey (realistic) scaling distributions in the domains precisely where the dynamics were observed to be unrealistic! To reconcile this dichotomy we need to expand our experimental parameter space. 118 Parameters D S E M Dataset 4 Number of agents N Lower price response rj G Higher price response Number of runs 100 -0.75 to 0.75 by 0.25 0.50 to 1.50 by 0.25 35 20,000 Run length ("days") Table 5.2: Parameter values for D S E M Dataset 4. These runs are characterized by a two-point distribution of the price response. Each agent chooses rp = ri„ or r-/,j with equal probability. (Al l unspecified parameters are duplicated from Table 4.4.) Two-point price response Thus far the price response had been fixed at a single value for all the agents. But since realistic dynamics (characterized by both the lack of strong memory effects and scaling in the distribution of returns) were not to be obtained by any single value of rp I was forced to allow multiple price responses. Originally I explored allowing rv to span a broad range which covered all three phases but the range required to get scaling was so large that most of the agents were either in Phase 1 (r p > r\) or in Phase 3 (rp < r2) with only a few in Phase 2. Therefore it seemed easier to just require that rp take on one of only two allowed values, r[0 and r^. D a t a analysis Thirty five runs were executed spanning a two-dimensional region of parameter space with each agent choosing a price response of either rj G or (with equal probability). The lower price response was varied between —0.75 < rj„ < 0.75, spanning the first order phase transition at r2 ~ —0.33, and the upper value was varied between 0.50 < r^i < 1.50, spanning the critical point at r\ = 1, as indicated in Table 5.2. For each run the cumulative distributions of returns (both positive and nega-tive) were calculated and the tails (returns exceeding one standard deviation) fitted to a decaying power law (Eq. 5.3). As before, the tail was determined to scale if the decay constant rc exceeded three standard deviations, otherwise the region over which a power law is suitable is insubstantial. For returns smaller than the characteristic return rc the exponential in Eq. 5.3 is almost flat so the power law dominates. Larger values of rc indicate that scaling spans a greater range of returns. As shown in Fig. 5.6(a), scaling is observed for some parameter combinations. To test which parameter produces scaling a linear 119 81 27 0.33 . ( b ) 1 1 1 1 1 + -+ + + -+ * t + + t + - + + ± -1 - -HH-+ l DSEM Dataset 4 Threshold rc = 3 I + l 0.5 0.75 1 1.25 1.5 Upper price response Figure 5.6: The characteristic returns in D SEM with a two-point distribution of price responses (r;0 and r/,,-) exceeds the required threshold of rc = 3 when r^i is large (a). 
Neglecting the dependence on r; 0 (b) it becomes clear that the characteristic return grows exponentially with the upper limit r^, crossing the threshold near r^i ~ 1. 120 Variable Correlation with log rc rio 78% 76% -41% Table 5.3: Linear correlation analysis between said variable and the logarithm of the characteristic return from D S E M Dataset 4. The correlation is strongest with the upper limit of the price response r^,-. correlation analysis was performed, as shown in Table 5.3, between the logarithm of the characteristic return and a few obvious possibilities: the upper price response r/ij, the lower limit r j 0 , and the spread r^ — ri„. The best predictor for scaling over a large range of returns was found to be rnj with a correlation of 78%. (The spread also correlated well but, as wil l be seen later in this chapter, is unable to account for other empirical qualities of the market.) Fig. 5.6(b) shows the dependence of the characteristic return on the upper price response. Notice that 85% of the data points lie in the upper-right and lower-left quadrants if axes are drawn at rc = 3 (horizontal) and 1 < < 1.25 (vertical). Thus, the strongest condition for scaling appears to be that the upper price response r/jj > 1, above the critical point T\ = 1. Of all the runs which exhibit scaling the average scaling exponent was cal-culated to be a = 1.64 ± 0.25, in line with the empirical value a pa 1.40 ± 0.05 [10]. Recanting continuous heterogeneity Some other market models characterize agents by types: either fundamentalists or chartists. In the derivation of D S E M I claimed (Section 3.3.3) that allowing a continuous range of the parameter rp would be superior, reflecting a greater diversity of opinion as would be expected in the real world. However, as we saw above, D S E M is only able to capture the essence of real market fluctuations (scaling) when rp is set to two discrete values, rather than a continuum. (As mentioned before, a continuous range of rp can also produce scaling but only if the spread is set to a much greater value than required by the two point distribution.) In other words, scaling appears to depend on the separation of agents into "types". 121 Timescales An interesting empirical property of scaling in real markets is that the exponent appears to be invariant when measured on different timescales (except for timescales exceeding a few days when the distribution converges to a Gaussian). To test if this also occurred in DSEM the run r j 0 = 0.00, r/,, = 1.25 was chosen for further analysis. This run was chosen because it was observed to exhibit scaling on timescales of one day, and because a value of ria = 0 seems "natural"—it divides the population into two types: one of which are pure fundamentalists, not responding to price fluctuations at all. This run was sampled at ten different intervals ranging from 0.02 days (roughly 8 minutes, assuming a 6.5 hour trading day), to 20 days (one month, ne-glecting weekends). As can be seen in Fig. 5.7(a) the characteristic return increases with smaller sampling intervals, and drops below the threshold for detecting scaling when the interval exceeds 5 days (one week). On longer timescales the distribution indeed converges to a Gaussian (not shown). As expected, (when scaling is detectable) the power law exponent does not appear to depend on the sampling interval, fluctuating around a = 1.55 ± 0.11. 
Tickwise returns Appendix B demonstrates that it is possible to generate the illusion of fat tails in a discrete Brownian process simply by sampling it at regular intervals, as was done for DSEM. Therefore, it is important to establish that the fat tails discussed above are not an artifact of sampling, but are inherent to the fluctuations themselves. This is easily tested by simply sampling the process in trading time rather than real time. That is, a sample is taken directly after every trade (or tick). Clark [74] raised the issue of whether regular sampling may be producing the fat tails observed empirically but Palagyi and Mantegna [66] demonstrated fat tails are still observed when sampled in trading time. To test if this was also the case for DSEM Fig. 5.6(a) was reproduced, using trading time instead of daily samples, in Fig. 5.8. Clearly, scaling is still evident (in the same region of parameter space) when sampling in trading time so it is not an artifact of the sampling interval. 5.1.6 Summary The distribution of price returns, measured as the logarithm of the ratio of successive prices, was the subject of investigation in this section. Two theoretical curves meant to describe the tails of the distribution were presented. An alternate form was also presented, whose main advantages are that it 122 27 0.33 i 11111 i—i—i—IIII X + + x + + X -I—I—I I I 1111 1—I—I I I 111 Positive returns + Negative returns X Threshold rc = 3 X + X + X + X (a) ' ' i i i • • • i i i • i i i • i i JLUJ_ _ l I 1 * I I I I 0.01 0.1 1 10 Sampling interval (days) 100 1.9 1.8 e "S 1-7 <v a % 1.6 g 1.5 h £ 1.4 1.3 1.2 1 r i i i I I 11 1 r —1 1 1 1 111 1 1—1 1 1 1 111 1 1—1 1 1 1 11 Positive returns + Negative returns X 1.55 ± 0 . 1 1 - X X + X X X X . . +. _ _ * _ _ + + . " — + + + X (b) 1 1 1 1 111 1 1 + ' 1 1 ' 1 1 1 ' ' 1 ' ' ' l - L J - L L 0.01 0.1 1 10 Sampling interval (days) 100 Figure 5.7: The characteristic return rc (a) and scaling exponent a (b) for D S E M with r;0 = 0.00 and r^i = 1-25. The characteristic return grows as the sampling interval is shortened, but the scaling exponent a is fairly constant (1.55 ± 0.11). 123 D S E M Dataset 4 Threshold rc = 3 rc Figure 5.8: Fitting the decaying power law to DSEM with a two-point price response using returns on individual trades (rather than per unit time, as in Fig. 5.6) shows scaling still occurs in the same region of parameter space. is linear in its parameters (of which there are two fewer than the competing models). This curve appears to describe empirical data quite well, but even if it is found to be inaccurate, it is still useful because it provides a simple, mechanical method for estimating over what range the power law dependence applies (|r| < rc) and the scaling exponent itself. Both C SEM and DSEM were tested for "fat tails" with this functional form with the requirement that the scaling extend for at least three standard deviations (r c > 3) to be deemed significant. CSEM was found only to exhibit scaling for ot <C <TC, in a region of parameter space where the dynamics are known to be unrealistic. DSEM provided some surprises: if all the agents maintained an identical price response parameter rp then scaling did not occur except as rp moved into regions known to produce unrealistic dynamics. However, if two values of rp were allowed, with each agent randomly picking one or the other, scaling was observed when the responses spanned the critical point rp = r\ = 1. 
A test for a variety of values of rp established a scaling exponent a = 1.64 ± 0.25, with returns sampled daily, comparing favourably with the empirical quantity a = 1.40 ± 0.05 [10]. The scaling exponent was shown to be robust, independent of the sampling interval. In short, DSEM was able to produce realistic return distributions while 124 C S E M was not. Furthermore, D S E M implied the mechanism which produces scaling is somehow related to having different "types" of agents interacting with each other: fundamentalists versus chartists, for instance. The crucial determinant for scaling appeared to be that the range of parameter values spanned the critical point. In the next section we explore a related phenomenon: autocorrelations in the price series. 5.2 Price autocorrelation 5.2.1 Background: The efficient market hypothesis In this section we explore the possibility of serial correlations in the price series. As discussed in the last section it has long been thought that the market behaves as a random walk [46,64]. Related to this is the efficient market hypothesis (EMH) which, in its weakest form, states that new information received by investors is reflected in the stock's price almost instantly [75]. Since new information cannot be predicted, neither can the future price of the stock so price movements should be independent of their histories. If this were not the case then there would be a riskless way to exploit one's foreknowledge for profit (an arbitrage opportunity). The presence of transaction costs allows an even weaker form of the EMH: there may exist arbitrage opportunities (autocorrelations) but they are so small (brief) that any potential profit would be absorbed by commissions. This form of the E M H is supported by evidence: a number of studies have concluded that autocorrelations in the price series decay exponentially over a scale of only a few minutes [4,33,36,76-81] (or a few trades [66]). In this section we will look for correlations in the price series generated by DSEM. (Since it was established in the last section that C SEM is not a realistic market model an autocorrelation analysis of its price series will be dispensed with.) Of interest are both short- and long-range correlations. 5.2.2 News Before analyzing the correlations in the price series it should be reiterated that the price series is driven by news releases such that the logarithm of the price p(t) roughly follows the cumulative news n (t) as logp(t) oc V{t). (5.8) In Section 3.6.1 it was argued that the proportionality constant should be r„/(l — rp) but with a two-point price response distribution this is not applicable, 125 rlo = 0.5, rhi = 1.5 . Fitted news oc eT? 0.01 I ' " " ' ' " " 1 1 1 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 20000 Time t Figure 5.9: Sample price series for DSEM Dataset 4 (r/„ = 0.5, = 1.5) showing the price roughly tracks the exponential of the cumulative news en. The propor-tionality constant is estimated from the data. especially given that the upper limit can be > 1. Nevertheless the relation still holds but the proportionality constant is best estimated from the data as is done in Fig. 5.9. Since the news is a discrete Brownian motion we may naively expect the price series to be a simple geometric Brownian motion but as we have already seen the price series exhibits an abundance of outliers not observed in the news. As we will see in the next section, the price series also contains a memory which the news does not. 
5.2.3 Short timescales The analysis for short timescales is fairly straightforward. We need only compute the autocorrelation between returns for different lags. For this analysis, trading time (or ticks), defined as the number of transactions executed, will be used as the time index since the quantity of interest is the correlation between successive trades. For comparison, the autocorrelations between daily returns for the Dow Jones Industrial Average for the last hundred years [3] is shown in Fig. 5.10, indicating no significant correlations in support of the efficient market hypothesis. (Of course, this does not preclude correlations existing on timescales of less than one day but data were not available at this resolution.) 126 I — | 1 1 h - ^ = = + DJIA daily returns -0.2 0 5 10 Lag r (days) 15 20 Figure 5.10: The autocorrelation between daily returns for the Dow Jones Industrial Average [3] decays rapidly to zero with an estimated characteristic timescale r c = 0.4 ± 0.2 days. (Being less than the sampling interval, this estimate is not precise.) As demonstrated in Fig. 5.11 correlations also decay quickly for D S E M re-gardless of the value(s) of the price response, with correlations only evident over a few successive trades. This would seem to imply that the price series has no memory. However, recall that in Section 4.2.3 we observed two phase transitions in D SEM by directly measuring the memory of the price series, which challenges the results presented here. There are two possible reasons for the discrepancy: (1) the autocorrelations are measured tickwise whereas the Hurst exponent was originally measured from a daily sample, or (2) a plot of the autocorrelation does not fully describe temporal dependencies in the data. To determine which is the case the long-range depen-dencies are again estimated from the Hurst exponent, this time calculated from the tickwise data. 5.2.4 L o n g t imesca les To test for long-range temporal dependencies we again compute the Hurst exponents for the price returns in DSEM. But first it should be mentioned that data from real markets have been found to have no memory, with Hurst exponents near H w 0.5. Indeed, for the daily Dow Jones Industrial Average returns presented above, the Hurst exponent is estimated at H = 0.484 ± 0.013 indicating no long-term memory 127 0.8 I 0.6 0.4 0.2 -0.2 -0.4 -0.6 5.1 M I l i fe ii : \ -0.75, rhi = 0.50 r / 0 = -0.75, r M = 1.50 - -X rl0 = 0.00, r w = 1.00 n„ = 0.75, rhi = 0.75 - -B-= 0.75, rhi = 1.50 -• rlo — w . I u , # ftj r i o = 0.75, r/jj i i -x X 10 Lag T (ticks) 15 20 Figure 5.11: The autocorrelation between tickwise returns for DSEM (with a point price response distribution) decays rapidly to zero for all runs sampled. 128 100 r 10 — i T ~ 1 ^—r~ n 0 = 0, rhi = 1 + = 0.571 ± 0.034 H = 0.324 ± 0.009 A _L Memory _i I i L _L 16 64 256 1024 Timescale (ticks) 4096 16384 Figure 5.12: Sample dispersion plot (see Section C.2.1) demonstrating the phe-nomenon of crossover in the Hurst exponent to H « 1/2 on long timescales for D S E M with a two-point price response distribution. effects. Crossover On first glance the computed Hurst exponents appeared similar to the original re-sults shown in Fig. 4.11 but on closer inspection some interesting qualities were revealed. Namely, in almost all the runs there appeared to be two different scaling behaviours: for small timescales one Hurst exponent dominated but as the timescale grew there appeared a crossover to a different exponent. 
The latter of these was invariably near H = 1/2, indicating a lack of memory. A sample graph demonstrating crossover is presented in Fig. 5.12.

The reader may be concerned that the timescale used for calculating the memory is not linear but trading time—the cumulative number of trades executed since the start of the experiment—and the crossover phenomenon may be an artifact of this sampling. Evidence indicates that trading time is the more natural timescale [66,74], reducing biases introduced by regular sampling (see Appendix B). However, for completeness the data were also tested using regular sampling, with largely the same results. A sample plot is shown in Fig. 5.13 demonstrating that crossover also occurs when returns are sampled at regular intervals. The remaining discussion refers to tickwise sampling.

Figure 5.13: A reproduction of Fig. 5.12 except with regularly sampled returns at an "hourly" interval (instead of tickwise). Short-timescale anticorrelations crossing over to uncorrelated returns at long timescales are still observed, so the effect is not an artifact of sampling tickwise.

The crossover to H ≈ 1/2 is not altogether surprising because on long enough timescales we expect the news process to be an important determinant of the price movements, and the news is a simple, discretely-sampled Brownian motion with no memory (H = 1/2).

Yet another phase transition

The crossover point gives another estimate of the duration of correlations, or memory, in the price series. Fig. 5.14 shows that the memory depends strongly on the lower limit of the price response $r_{lo}$. In fact, as this value crosses roughly $r_{lo} \approx 0.5$ a phase transition is apparent. (While interesting, this transition will not be characterized further in this thesis, but may be analyzed in future work.) For larger values of $r_{lo}$ the memory effects disappear very quickly, conforming to the efficient market hypothesis and empirical data. Further, in this range H is already quite close to one half, which also suggests the lack of a memory for any significant period. (Personal experience suggests that the Hurst exponent has a typical error margin of ±0.1 so any value in 0.4 < H < 0.6 should be interpreted as potentially having no memory.)

Thus, the price returns in DSEM are observed to exhibit a realistic lack of (significant) memory when $r_{lo}$ exceeds roughly 0.25.

Figure 5.14: Both the crossover point, or memory, (a) and the Hurst exponent for short timescales (b) indicate that memory effects are minimized when $r_{lo} > 0.25$ in DSEM with a two-point price response distribution. (The high values of the Hurst exponent for $r_{lo} > 0.5$ (b) do not cause problems because the memory is very short in this region (a).)

If DSEM is interpreted as representative of a real market, the question is naturally raised, "Why do the agents choose this region of parameter space?" The simplest (but unjustified and hardly satisfactory) explanation is that any other choice would provide arbitrage opportunities which could be taken advantage of by watching for trends in the price series. So a rational agent would choose a nonzero price response in the expectation of these arbitrage opportunities, but ironically, in doing so, the memory (and opportunities) disappear! In other words, expectations of information in the price series erase that very information.
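The crossover in Figs. 5.12 and 5.13 is read off a dispersion plot. As a rough illustration of how such an estimate can be made, the sketch below implements a simple dispersional analysis (standard deviation of block-averaged returns versus block size), whose log-log slope gives H − 1. The exact convention of Appendix C.2.1 is not reproduced here, and the function and variable names are illustrative assumptions only.

```python
import numpy as np

def hurst_dispersion(returns, scales=(16, 32, 64, 128, 256, 512, 1024)):
    """Estimate the Hurst exponent of a return series by dispersional
    analysis: the std. dev. of block-averaged returns scales as m**(H-1)."""
    returns = np.asarray(returns, dtype=float)
    log_m, log_sd = [], []
    for m in scales:
        n_blocks = len(returns) // m
        if n_blocks < 4:
            continue
        blocks = returns[:n_blocks * m].reshape(n_blocks, m)
        sd = blocks.mean(axis=1).std(ddof=1)   # dispersion of block means
        log_m.append(np.log(m))
        log_sd.append(np.log(sd))
    slope, _ = np.polyfit(log_m, log_sd, 1)
    return 1.0 + slope                          # H = 1 + slope

# Uncorrelated Gaussian returns should give H close to 0.5
rng = np.random.default_rng(1)
print(hurst_dispersion(rng.standard_normal(2**16)))
```

A crossover such as that in Fig. 5.12 corresponds to fitting separate slopes over the short- and long-timescale portions of the same plot, with the break point giving the memory duration.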
5.3 Volatility clustering

In the last section we observed that the price series in DSEM always crossed over to a domain with no memory effects for long enough times. This would seem to imply a lack of history-dependence in the time series. However, it has been empirically observed that volatility (to be defined) has a very long memory. This leads to the phenomenon of clustered volatility: high activity in the market is observed to cluster together, separated by spans of low activity.

The simplest definition of volatility is the absolute value of the price return over some interval. (This definition appears to be more prevalent than the square of the returns [36,77,78,80].) Clustered volatility, by this definition, means there exist periods in which the price changes rapidly and dramatically, separated by other periods where few/small changes in the price occur. Hence, there exist long-range temporal correlations in the absolute value of price returns. As before, the Hurst exponent is a promising quantity to measure these correlations.

For the daily returns of the Dow Jones Industrial Average [3] shown in Fig. 5.10, the Hurst exponent of the volatility is measured to be H = 0.852 ± 0.009, demonstrating very strong positive correlations—high volatility tends to be followed by further high volatility and low by low. Other studies measuring the Hurst exponent of the volatility from empirical data have concluded H ≈ 0.9 [77], 0.63 < H < 0.95 [78], and H ≈ 0.85 [36]. Two other works I am aware of calculated the exponent of the autocorrelation function, which decays as a power law with exponent 2H − 2 [82], giving H ≈ 0.9 [80] and H ≈ 0.8 [4]. (The latter defined volatility as the squared return rather than its absolute value, but came to the same conclusions.) These studies were performed on a variety of systems so, clearly, clustered volatility is universal.

Again, we seek to know whether DSEM also exhibits this property. To test it, we continue with our analysis on a tickwise (number of trades executed) timescale and calculate the Hurst exponent for the absolute value of the returns.

Figure 5.15: The Hurst exponent of the absolute returns, which measures the degree of clustered volatility, is strictly greater than one half for all parameter combinations in DSEM. It is particularly high when the upper limit $r_{hi}$ of the two-point distribution is large or when the lower limit $r_{lo}$ is small.

The results are summed up in Fig. 5.15, which shows that the Hurst exponent measuring volatility clustering is always above one half over the whole parameter space, but significantly so when the upper limit $r_{hi}$ of the two-point price response distribution is large or the lower limit $r_{lo}$ is small. Overall, though, the Hurst exponents are somewhat smaller (the greatest value was H = 0.77) than the empirical results (H ≈ 0.9), suggesting that our search space should be expanded to larger values of $r_{hi}$.

5.3.1 Shuffling

As a check of the analysis the absolute return data for a particular run ($r_{lo} = 0.75$, $r_{hi} = 1.5$, with H = 0.72 ± 0.02) were shuffled and the Hurst analysis of the shuffled data recalculated. Shuffling destroys temporal correlations so the expected value of the exponent is one half. In fact, the Dow Jones Industrial Average data yield H = 0.502 ± 0.010 for the absolute returns when shuffled. For the sample DSEM data the resultant exponent is H = 0.520 ± 0.006.
Both are very close to one half, confirming that the high value for the unshuffled absolute-return data is due to temporal correlations (clustered volatility).

Figure 5.16: DSEM Dataset 4 (N = 100 agents) is able to capture three important properties observed empirically when $r_{lo} > 0.35$ and $r_{hi} > 1.25$. The curves are contours from previous plots: (1) characteristic return $r_c = 3$ from Fig. 5.8 (solid line); (2) memory in return series = 100 from Fig. 5.14(a) (dashed line); and (3) Hurst exponent for the absolute returns H = 0.6 from Fig. 5.15 (dotted line).

5.4 Scaling and clustered volatility

In real markets all three properties of (1) scaling, (2) uncorrelated returns, and (3) clustered volatility are observed. As we have seen, DSEM can replicate each of these features when the agents have a two-point price response ($r_{lo}$ and $r_{hi}$) and the values of these parameters are chosen appropriately. Of particular interest is whether there is a region of parameter space in which all three of these phenomena are observed simultaneously.

In some of the previous plots contour lines were drawn to indicate the separation into regions which did and did not exhibit the phenomenon of interest: Fig. 5.8 plotted the contour line $r_c = 3$ to distinguish between parameter combinations which did ($r_c > 3$) and did not exhibit scaling in the return distributions. Fig. 5.14(a) separates parameter combinations that do not have a long memory (< 100) in the return series from those that do. Finally, Fig. 5.15 measures, with the Hurst exponent, the clustered volatility, where H > 0.6 indicates the presence of clustered volatility while H < 0.6 indicates its absence.

These three contour curves are plotted together in Fig. 5.16, showing that all three empirical properties are only observed in the upper right corner of the graph, when $r_{lo} > 0.35$ and $r_{hi} > 1.25$. So DSEM is most realistic for these parameter combinations. One point to note regarding this region is that it spans the critical point at $r_p = r_1 = 1$, with the low end of the distribution below and the high end above. It is also interesting that the value $r_{lo} = 0$ (characterizing agents that do not use the return series as an indicator of performance) is not in the "realistic" region.

5.5 Wealth distribution

Thus far we have explored only the temporal dynamics of the stock price. But the distribution of wealth among agents may be of interest as well. It is well known that incomes in many populations are distributed log-normally [83] (with a particular exception we will come to later). So it is natural to ask how wealth is distributed in DSEM. (Again, we neglect the analysis of CSEM.)

5.5.1 Challenges

Determining the wealth distribution in DSEM is problematic because long runs of many agents are required. The durations must be long because the wealth distribution is initialized to a delta function (initially, all agents have the same amount of cash and stock), so a long transient is to be expected before any steady state emerges. But this constraint must be balanced against the need for many agents. Most of the simulations presented in this thesis consisted of N = 100 interacting agents—a rather small number for any statistical description. But computational limitations prevent serious investigation of larger systems because the number of operations grows as N². So we must be content with the data we have collected so far.

5.5.2 Log-normal distribution

A rigorous analysis of the distribution of wealth will not be attempted.
Instead, it is merely reported that a log-normal distribution was suitable in most cases for Datasets 1-3. However, Fig. 5.17—showing a typical distribution—highlights the difficulties in establishing the proper distribution: the small agent numbers combined with the narrow range of wealths observed allow one only to say that the distribution is unimodal, but nothing more.

Even so, simply knowing that the distribution is unimodal is satisfactory. Any other result would be surprising because the agents in Datasets 1-3 are all of a similar character, varying (continuously) only in their news responses $r_n$ and frictions f, as shown in Table 4.4.

Figure 5.17: Sample distribution of agents' wealth from DSEM Dataset 3 (N = 100, $r_p = -0.50$). There is insufficient data to distinguish between a normal and a log-normal distribution.

5.5.3 Two-point price response

Worth further investigation are the data from Dataset 4 (Table 5.2) where the agents varied discontinuously in their price responses. In this case the market consists of two distinct populations with fundamentally different behaviours, so one would not expect the wealth to be distributed unimodally. A reasonable alternative is that each population has a log-normal wealth distribution, producing a bimodal distribution over the whole market.

A representative distribution is shown in Fig. 5.18, confirming the bimodal hypothesis. Interestingly, the agents with $r_p = r_{lo} = 0$ outperform (have more wealth than) their $r_p = r_{hi} = 1$ counterparts.

Figure 5.18: Sample distribution of agents' wealth from DSEM Dataset 4 (N = 100, $r_{lo} = 0$, $r_{hi} = 1$). The log-normal curves are calculated from each sub-population, revealing a strongly bimodal nature.

To determine the generality of this result the average wealth for each sub-population was computed for all the simulations in Dataset 4. The hypothesis that the sub-population with the smallest absolute price response $|r_p|$ would outperform the more reactive agents was tested by comparing the difference between the absolute values of the responses, $|r_{hi}| - |r_{lo}|$, with the ratio of wealth held by each sub-population ($w(r_{hi})$ vs. $w(r_{lo})$). If valid, then we should find the ratio of the wealths obeys $w(r_{hi})/w(r_{lo}) > 1$ when $|r_{hi}| < |r_{lo}|$ and vice versa.

Figure 5.19: In DSEM with a two-point price response the wealth of each of the sub-populations $w(r_p)$ depends strongly on the magnitude of the price response $|r_p|$. The population with the smallest absolute price response ($r_{hi}$ to the left of zero and $r_{lo}$ to the right) consistently has more wealth, as indicated by the ratio of wealth between the two sub-populations.

Fig. 5.19 demonstrates this hypothesis holds very well—with a linear correlation of −76% for the plotted data—suggesting that the "best" strategy is to ignore the price fluctuations, $r_p = 0$.

This raises an interesting question: if a zero price response is best, why would agents choose nonzero values? We have seen in previous sections that realistic market phenomena such as scaling and clustered volatility only emerge when the price responses are set far from zero.
If DSEM is meant to represent real investor behaviour, why do investors base their decisions on price fluctuations when the model indicates this is detrimental? This issue is currently being investigated by allowing the agents to "learn" from their past mistakes, as will be discussed in Section 7.4. The purpose of this research is to determine if nonzero values of the price response parameter emerge spontaneously in the dynamics via the learning process.

Pareto's law of income distribution

In 1897 V. Pareto noticed that incomes tend to be distributed log-normally over the majority of the sample data except for the tail of the distribution (the highest one percent of the incomes), which decays as a power law, an observation which holds in many countries to this day [83]. CSEM and DSEM were deliberately constructed to keep a record of agents' wealths so that this claim could be tested. However, testing for this property requires even larger system sizes than we have explored so far, since the highest percentile of a population of even N = 1000 consists of only ten agents—inadequate for statistical analysis. Larger systems are currently under investigation but insufficient data were available as of completion of this dissertation.

5.6 Summary

In this chapter a number of unusual qualities of empirical markets were explored in the context of the Centralized and Decentralized Stock Exchange Models (CSEM and DSEM, respectively). CSEM was unable to reproduce even the first of these: scaling in the tail of the price return distribution. DSEM was also unable to produce this effect until the restriction that all agents maintain the same price response parameter $r_p$ was relaxed and instead two values were allowed—$r_{lo}$ and $r_{hi}$—thereby splitting the population into two distinct "types." Then, for particular values of the parameters, scaling was observed over a range of returns in excess of three standard deviations.

Two other empirical phenomena were also explored: the lack of correlations in the return series but the presence of long-range correlations in the volatility (defined as the absolute value of the return). Having failed the first test CSEM was not tested further, but DSEM was able to capture both these properties, again in a suitable region of parameter space.

All three of these properties were observed simultaneously in DSEM when $0.5 < r_{lo} < 1$ and $r_{hi} > 1.25$, spanning the critical point at $r_p = r_1$ discovered in Section 4.2.4. The significance of this result is unclear but some thoughts on the matter are discussed in Chapter 7. But first the results of some interesting experiments with some real stocks are discussed.

Chapter 6

Experiments with a hypothetical portfolio

This chapter is somewhat of a departure from the rest of the thesis. It describes some experimental results obtained purely to satiate my own curiosity. As such, the experiments are less rigorous than they could be (in particular, the dataset is quite small) and the results should not be taken too seriously. However, these experiments may yield valuable insight for the reader because they describe a real-world application of some of the theory discussed in prior chapters.

6.1 Motivation

The agents in Chapter 3 trade using a fixed investment strategy (FIS) which states that they should keep a fixed fraction of their capital in stock and the remainder as cash, to minimize risk and maximize returns.
As was discussed, the theory underlying it has two important assumptions: (1) that there are no costs associated with trading, and (2) that moments of the return distribution higher than two (in particular, the kurtosis) are negligible on short timescales. I was curious how well the FIS would work in a real market environment where these assumptions may not hold, so I constructed a hypothetical portfolio to track real stocks.

Sandbox Entertainment (http://www.sandbox.net/business/) provides an online simulated stock market called "PortfolioTRAC" which gives users an imaginary bankroll of $100,000 and allows them to invest it in stocks listed on the major American markets. The simulation uses real trading prices and allows daily trades. Although it requires a few other idealizations, it is quite thorough and supports such complexities as short positions, limit and stop orders, broker fees, and daily interest on cash. Note that trades are only processed once per day (after closing) so limit and stop orders execute based on closing prices.

I began with $100,000 on January 4, 1999 and decided to divide my assets uniformly among 9 stocks and one cash account. Table 6.1 lists my initial portfolio after commissions have been accounted for. The FIS goal was to keep one tenth of my total capital in each stock.

Symbol  Company            Price     Shares  Value
AAPL    Apple Comp Inc     $41.25    244     $10,065.00
AMD     Adv Micro Device   $28.00    345     $9,660.00
AU      Anglogold Ltd      $19.56    514     $10,055.13
CHV     Chevron Corp       $82.06    120     $9,847.50
EK      Eastman Kodak Co   $71.19    141     $10,037.44
IMNX    Immunex Corp       $116.50   82      $9,553.00
MSFT    Microsoft Corp     $141.00   69      $9,729.00
NSCP    Netscape Comm      $63.25    157     $9,930.25
RG      Rogers Comm        $8.56     1139    $9,752.69
        Cash                                 $11,004.89
        Total                                $99,634.89

Table 6.1: Initial holdings of a hypothetical portfolio on January 4, 1999.

6.2 Choice of companies

My choice of companies was not completely random: I chose Apple Computers (AAPL), Advanced Micro Devices (AMD) and Netscape Communications (NSCP) because they were all "underdogs" in their respective industries and would have to be innovative and aggressive to survive. Similarly, I chose Eastman Kodak (EK) because, although currently a large, stable company, I expected the emerging digital camera technology to threaten its dominance and I wanted to see how it fared. I chose the cable company Rogers Communications (RG) because I was interested in the newly available cable modem technology which they were investing in. Microsoft Corporation (MSFT) seemed a low-risk choice to balance my high-risk portfolio.

My focus to this point had been in the high technology sector so I determined to diversify: in the petroleum sector I chose Chevron Corporation (CHV) for its apparent low risk and because it was the most recent gas station I had visited, and I chose Anglogold (AU) as a gold stock simply because of its catchy symbol. I couldn't think of a last company I was interested in so I let my wife choose Immunex Corporation (IMNX) from the biomedical sector. Although these choices were biased by my own interests it was hoped that they would prove sufficiently representative to test the performance of the fixed investment strategy. (As will be seen, this portfolio correlated strongly with the Nasdaq composite index.)

6.3 Friction

The derivation of the FIS in Chapter 3 and other sources [56,57] neglected commissions; they considered a completely fluid portfolio, capable of adjusting instantly to infinitesimal price changes.
This market simulation was more realistic, with commissions which were handled as follows: in each trade a $39.95 charge was levied for the first 1000 shares traded (bought or sold) and $0.04 per share over 1000. Obviously, it would be unprofitable to trade on every minuscule price fluctuation so a friction f was introduced. Orders are not placed until a stock's price p exceeds a threshold as given by Eqs. 3.20-3.21.

6.3.1 Minimum friction

Under some particular conditions it is possible to estimate how large the friction needs to be for profitable trading in a commission-enabled market. To calculate the necessary scale of the friction consider an imaginary scenario: we begin with a total capital of w divided uniformly between a cash account and N − 1 stocks, so the ideal investment fraction is i = 1/N. The scenario consists of

1. a single stock's price moving to a trade limit (buy or sell),
2. the stock being rebalanced (traded),
3. returning to its original price, and
4. being rebalanced again.

In this scenario we assume all the other stocks are unchanged, each maintaining a value iw. We are interested in what the minimum friction $f_{min}$ can be such that we don't lose any money, given an absolute transaction cost T.

We begin with the fluctuating stock at its ideal price

$$p^* = \frac{iw}{s}, \qquad (6.1)$$

where s is the number of shares held of the stock. So the limit prices are

$$p_\pm = p^*(1+f)^{\pm 1}, \qquad (6.2)$$

where $p_+$ is the sell limit and $p_-$ is the buy limit. If the stock moves to one of the limits while all others remain constant, then our wealth (before trading) will become

$$w_\pm = (1-i)w + s p_\pm \qquad (6.3)$$

and the quantity to be traded, from Eq. 3.15, will be

$$\Delta s_\pm = s^*(p_\pm) - s \qquad (6.4)$$
$$= (1-i)\,s\left[(1+f)^{\mp 1} - 1\right], \qquad (6.5)$$

maintaining the same notation (the upper symbol of ± and ∓ indicates an initial rise in price, and the lower indicates an initial drop). The trade also changes our cash holdings by

$$\Delta c_\pm = -\Delta s_\pm p_\pm - T \qquad (6.6)$$
$$= -\Delta s_\pm p^*(1+f)^{\pm 1} - T, \qquad (6.7)$$

where T is the transaction cost (in dollars).

Now we assume the stock's price returns to its original value $p^*$ and we trade to recover our original portfolio, $\Delta s'_\pm = -\Delta s_\pm$ (for simplicity), yielding another change in cash

$$\Delta c'_\pm = \Delta s_\pm p^* - T \qquad (6.8)$$

so the net change is

$$\Delta c = \Delta s_\pm p^*\left[1 - (1+f)^{\pm 1}\right] - 2T. \qquad (6.9)$$

Inserting the computation for $\Delta s_\pm$ gives a net change

$$\Delta c = (1-i)\,i\,w\left[(1+f)^{\mp 1} - 1\right]\left[1 - (1+f)^{\pm 1}\right] - 2T \qquad (6.10)$$
$$= (1-i)\,i\,w\,\frac{f^2}{1+f} - 2T, \qquad (6.11)$$

regardless of whether the stock price rose then fell or fell and then recovered. After some algebra we find the condition requiring a profit, Δc > 0, holds when $f > f_{min}$ where

$$f_{min} = \frac{TN^2}{w(N-1)}\left[1 + \sqrt{1 + \frac{2w(N-1)}{TN^2}}\right], \qquad (6.12)$$

which simplifies to

$$f_{min} \approx \sqrt{\frac{2TN^2}{w(N-1)}} \qquad (6.13)$$

in the limit w ≫ T. This informal derivation is only meant to set a scale for the minimum friction; it is not meant to be rigorous. A more detailed calculation may be possible by assuming each stock's price moves as geometric Brownian motion but the derivation would be cumbersome and the benefit dubious.

Date               Event
January 4, 1999    Experiment started
March 23, 1999     Takeover: 1 NSCP -> 0.9 AOL
March 26, 1999     Stock split: 2-for-1 IMNX
March 29, 1999     Stock split: 2-for-1 MSFT
August 27, 1999    Stock split: 2-for-1 IMNX
November 11, 1999  Stock split: 2-for-1 AOL
December 10, 1999  Tolerance changed from 10% to 2f_min
March 21, 2000     Stock split: 3-for-1 IMNX
April 14, 2000     NASDAQ correction
May 12, 2000       Experiment ended

Table 6.2: Events relating to the hypothetical portfolio which occurred during the course of the experiment.
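As a quick numerical check of Eqs. 6.12 and 6.13, the sketch below evaluates the exact and approximate minimum friction for values of the order used in this experiment (w ≈ $100,000, T = $39.95 per trade, and N = 10 holdings including cash). This is an illustrative calculation only, not code from the thesis.

```python
import math

def f_min_exact(w, T, N):
    """Break-even friction from Eq. 6.12."""
    a = T * N**2 / (w * (N - 1))
    return a * (1.0 + math.sqrt(1.0 + 2.0 * w * (N - 1) / (T * N**2)))

def f_min_approx(w, T, N):
    """Large-w approximation, Eq. 6.13."""
    return math.sqrt(2.0 * T * N**2 / (w * (N - 1)))

w, T, N = 100_000.0, 39.95, 10
print(f"exact:  {f_min_exact(w, T, N):.3%}")    # a little under 10%
print(f"approx: {f_min_approx(w, T, N):.3%}")   # about 9.4%
```

For these values the exact expression gives a little under 10%, consistent with the break-even friction quoted for this portfolio in the next section.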
When I began trading with my hypothetical portfolio in the beginning of 1999 I arbitrarily chose f = 10%, a fortuitous choice, as it turns out, because anything less than $f_{min} = 9.90\%$ might have been a losing strategy. Since Eq. 6.12 sets the break-even friction it is best to set the actual friction somewhat higher. Once the minimum friction for my portfolio had been estimated (December 1999) I chose a dynamic value of $f = 2f_{min}$.

6.4 FIS experimental results

In this section, the results of using the fixed investment strategy (with friction) on a hypothetical portfolio will be discussed.

6.4.1 Events

The experiment began on January 4, 1999 and ran until May 12, 2000 for a total of 343 trading days. The portfolio was rebalanced faithfully, as needed almost every day (excepting a few rare and brief vacations). Note that the simulation only executed trades after closing so intra-day trading was not supported and the trading price was always the stock's closing price. Also note that the simulation did not constrain orders to be in round lots and most orders, in fact, were odd sizes.

A list of important events occurring over the course of the experiment is shown in Table 6.2. The majority of events consist of stock splits, a division of the shares owned by each shareholder of a company such that the stake held by each is unchanged. For example, a 2-for-1 stock split means each share is split into two, each worth half its original value. Although theoretically a stock split should not affect an investor's capital in a company, stock splits are considered good news and often drive the stock's price up both before and after the split. The main reason is that a split lowers the price of a stock and thereby makes it accessible to more potential investors—increasing demand.

Another interesting event which occurred during the experiment run was the takeover of Netscape Communications by America Online in March, 1999. AOL purchased Netscape for roughly $4 billion and each share of Netscape stock was converted to 0.9 shares of AOL. Takeovers tend to engender a great deal of speculation which can precipitate large fluctuations in the stock's value.

On the book-keeping side, the only change in methodology was the move from a constant friction of f = 10% to a floating value of $f = 2f_{min}$, as given by Eq. 6.12, on December 10, 1999 (giving f = 15% at the time). The main consequence was a somewhat decreased trading frequency.

The most exciting event was the correction in the high-technology sector (which dominates my portfolio) in the week of April 14, 2000, evinced by an over-35% drop in the Nasdaq Composite index from its all-time high only weeks earlier [84]. This is a particularly fortunate occurrence because it tests the ability of the FIS to handle drawdowns. Market-wide fluctuations of this magnitude are rare but an important consideration when devising a trading strategy. This aspect of the experiment will be discussed below.

6.4.2 Performance

The final state of the portfolio at the end of the experiment is shown in Table 6.3. In this section the performance of the fixed investment strategy will be evaluated. As a control, a simple "Buy-and-Hold" Strategy (BHS), with the initial portfolio shown in Table 6.1 held fixed, will be contrasted with the fixed investment strategy (FIS).

Fig. 6.1, which shows the evolution of total capital for both strategies, on the surface seems to indicate BHS outperforms FIS.
BHS reaches a high of $237,000, a full 14% higher than the maximum achieved with FIS. Also, BHS maintained a higher capital on 292 of the 343 days (85%) the market was open. Evidently, FIS does not perform well in real-world applications. However, a closer inspection suggests FIS should not be discarded too rashly.

Figure 6.1: Historical wealth using FIS versus (a) the Buy-and-Hold strategy and (b) the Nasdaq Composite Index over the same interval (rescaled to be equal at the start of the experiment).

Symbol  Company              Price     Shares  Value
AAPL    Apple Comp Inc       $107.63   159     $17,112.38
AMD     Adv Micro Device     $85.69    216     $18,508.50
AOL     America Online Inc   $55.38    325     $17,996.88
AU      Anglogold Ltd        $20.06    965     $19,360.31
CHV     Chevron Corp         $94.25    190     $17,907.50
EK      Eastman Kodak Co     $56.13    310     $17,398.75
IMNX    Immunex Corp         $35.00    456     $15,960.00
MSFT    Microsoft Corp       $68.81    269     $18,510.56
RG      Rogers Comm          $25.56    732     $18,711.75
        Cash                                   $18,792.39
        Total                                  $180,259.02

Table 6.3: Final holdings of a hypothetical portfolio on May 12, 2000.

For example, consider the market correction on and around the week of April 14, 2000. The Nasdaq Composite peaked at 5,049 points on March 10 and fell to a low of 3,321 on April 14, a drop of 34%. The Buy-and-hold strategy fared somewhat better, dropping to $180,000 for a drawdown of 25%. But the fixed investment strategy suffered the smallest decrease—down only 15% to $176,000, finishing with almost the same value as BHS. (It should be noted that Maslov and Zhang [85] demonstrated that the FIS is the most aggressive possible strategy that keeps the risk—measured as the expected drawdown from the maximum—bounded.)

This suggests FIS is less susceptible to large fluctuations. By rebalancing the portfolio, one moves capital out of stocks which may be overvalued and into safer companies which may be more resilient to perturbations. In this sense, FIS reduces risk. This can be seen in Fig. 6.2, which demonstrates that BHS is more prone to large fluctuations (both positive and negative).

Figure 6.2: Histograms of log-returns of capital $r_{t+1} = \log(w_{t+1}/w_t)$ for both strategies. Notice BHS exhibits more large fluctuations (fatter tails) than FIS.

To test whether these histograms are compatible with the Gaussian hypothesis [46] the first four moments of the distributions are calculated in Table 6.4. The moments of the daily returns of the Nasdaq Composite index and each of the stocks in the portfolio over the same interval are also shown. The means for all the distributions are all small, negligible in comparison to the standard deviations (which had an average of 3% daily). The skewness, given by

$$\mathrm{Skew}(\{r_i\}) = \frac{1}{N}\sum_i \left(\frac{r_i - \langle r \rangle}{\sigma_r}\right)^3, \qquad (6.14)$$
where $\sigma_r$ is the standard deviation of the returns, indicates the degree of asymmetry in the distribution. The sign of the skewness indicates which tail of the distribution contains more outliers. The skewness uncertainty for a (symmetric) Gaussian-distributed sampling of N points is $\sqrt{15/N}$ [20, Ch. 14], thereby setting a scale for deciding if a particular skewness is significant or not. Interestingly, Table 6.4 indicates that individual stocks tend to be skewed positively but the portfolios and the Nasdaq index are negatively skewed. This suggests that negative movements tend to be correlated between stocks but positive movements are not.

The (excess) kurtosis is a measure of the spread of the distribution. A positive kurtosis indicates the presence of many outliers or "fat tails". The kurtosis is defined as

$$\mathrm{Kurt}(\{r_i\}) = \frac{1}{N}\sum_i \left(\frac{r_i - \langle r \rangle}{\sigma_r}\right)^4 - 3, \qquad (6.15)$$

where 3 is subtracted in order to fix the kurtosis at zero for a normal distribution. The standard deviation of the kurtosis from a dataset of size N sampled from a Gaussian is $\sqrt{96/N}$, so the Gaussian hypothesis is rejected if the kurtosis is found to be greater.

Symbol        Mean      Std. Dev.  Skewness  Kurtosis
FIS           0.00178   0.016      -0.18     0.12
BHS           0.00183   0.021      -0.23     0.82
Nasdaq        0.00119   0.022      -0.58     1.65
AAPL          0.00274   0.040       0.04     0.54
AMD           0.00342   0.046       0.23     2.76
AOL           0.00134   0.044       0.33     1.32
AU            0.00042   0.029       0.60     3.93
CHV           0.00050   0.020       0.39     0.95
EK           -0.00062   0.019       0.25     5.40
IMNX          0.00375   0.064       0.29     1.80
MSFT         -0.00010   0.029      -0.90     5.18
RG            0.00361   0.034       0.66     0.80
Averages      0.00166   0.032       0.08     2.11
Significance                        0.21     0.53

Table 6.4: First four moments of the distribution of log-returns for each stock, the two trading strategies under review and the Nasdaq Composite Index. The skewness characterizes the asymmetry of the distribution and the kurtosis indicates the presence of outliers. The average skewness is not found to be significant but the kurtosis is.

The table shows that almost every calculated kurtosis is significant, indicating fat tails for these daily return distributions. The only exception is the fixed investment strategy, which has a kurtosis of only 0.12 (versus a significance level of 0.53). This further confirms the hypothesis that FIS reduces risk: by rebalancing the portfolio the frequency of large fluctuations is reduced. (Another favourable consequence is that the return distribution appears to converge to a Gaussian, as was assumed in the derivation of the FIS.)

Nevertheless, it is undeniable that BHS outperformed FIS in the experiment. But given the atypical trend seen in the portfolios, this conclusion may not be generalizable. Both portfolios realized almost a 100% growth over the first year, a gain which can hardly be expected to be repeated often (except, perhaps, during other speculative bubbles). A more typical realization may have proven FIS superior. By more typical is meant a smaller trend, relative to the scale of the fluctuations. As can be seen in Fig. 6.1, FIS performs best in the presence of fluctuations and slowly loses ground against BHS in the presence of an upwards trend.

Even if this experiment doesn't demonstrate that FIS is optimal it still shows that it is a reasonable investment strategy, and a suitable choice for the agents in DSEM (Chapter 3). The striking feature of FIS is that it reduces the kurtosis (risk of large events) and thereby misses large downturns (and upturns) in the market.
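The moments in Table 6.4 follow directly from the definitions in Eqs. 6.14 and 6.15. A minimal sketch of the calculation, including the Gaussian significance scales sqrt(15/N) and sqrt(96/N), is given below; the function name is illustrative, and the biased (1/N) estimators are used to match the definitions in the text.

```python
import numpy as np

def return_moments(wealth):
    """Mean, std. dev., skewness (Eq. 6.14) and excess kurtosis (Eq. 6.15)
    of daily log-returns r_t = log(w_t / w_{t-1})."""
    r = np.diff(np.log(np.asarray(wealth, dtype=float)))
    n = len(r)
    mean = r.mean()
    sigma = r.std()                      # biased (1/N) estimator
    z = (r - mean) / sigma
    skew = np.mean(z**3)
    kurt = np.mean(z**4) - 3.0           # excess kurtosis
    # Significance scales for a Gaussian sample of the same size
    skew_sig = np.sqrt(15.0 / n)
    kurt_sig = np.sqrt(96.0 / n)
    return mean, sigma, skew, kurt, skew_sig, kurt_sig

# For a Gaussian random walk both skewness and kurtosis should fall
# within their significance scales.
rng = np.random.default_rng(2)
w = 100_000 * np.exp(np.cumsum(0.002 + 0.02 * rng.standard_normal(343)))
print(return_moments(w))
```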
At issue is whether there exists a reliable method to forecast imminent crashes in the market. First, some background theory is necessary. 6.5.1 Scale invariance Scale invariance is a property of some systems such that a change of scale in a parameter x' = \x only has the effect of changing the scale of some observable F' = F/fi, such that F{x) = pF{\x). (6.16) The above scaling relation has a power law solution F(x) = Cxz where log/i Z = —: (6.17) log A and C is an arbitrary constant. The important point to notice is that the scalings along both axes are related by fi = \~z. No matter how much the control parameter x is scaled by (even infinitesimally, A —> 1), it is always possible to rescale the observable so that it is invariant. This is known as continuous scale invariance. 6.5.2 Discrete scale invariance and complex exponents In contrast, discrete scale invariance only allows fixed-size rescalings of the parame-ter. To see how this comes about we begin by substituting the solution F(x) = Cxz into the renormalization equation (Eq. 6.16), Cxz = fiC\zxz (6.18) => 1 = [i\z. (6.19) Now notice that 1 = e27rm for any integer n. Applying this and taking the logarithm of both sides gives 2-Kin = log n + z log A (6.20) which has the solution - = l o S ^ | j 2 n n (6 21) log A log A For the scaling relation to hold z must be a constant, which can only hold when n = 0 (which allows A to take on any value, recovering continuous scale 150 invariance) or when A is some fixed constant, the preferred scaling ratio. Hence, the only invariant transformations are the discrete rescalings x' = Xx with corresponding scalings in the observable F' = F/u- (for some fixed u). 6.5.3 Log-periodic precursors So far this might all look like mathematical trickery to the reader but the theory does have testable consequences. If we use the notation zn — a + iu>n with log/i a = log A 2nn log A (6.22) (6.23) then Fn(x) = Cnxaxlu'n is a solution to the scaling relation for each n and the general solution is the linear combination over all integers n, F(X) = XaY,CnXi^ n = xa ] P Cn exp(iojn log x) = x C 0 + e^o s * Y, Cn exp{iu{n - 1) logx) (6.24) (6.25) (6.26) where we have defined u> = 2ir/ log A, for convenience. The final form of F(x) indicates that the function has a periodic component with angular frequency C J . Expanding the periodic component as a Fourier series gives, to first order, F{x) w xa [C 0 + C[ cos(o>logx + c6)] (6.27) where (j) is an unknown phase constant. This argument, a variation of those presented in Refs. [13,86,87], concludes that discrete scale invariance leads naturally to log-periodic (in x) corrections to the scaling function F. 6.5.4 Critical points Near a critical point many properties of a system exhibit power law scaling relations as described above. Therefore they are prime test-cases for the existence of complex exponents characterized by log-periodic precursors. Seismicity, studied in the context of critical phenomena, have been success-fully modeled as self-organizing (with the build up of stress) to a critical point in 151 time tc characterized by an earthquake, a sudden release of energy [13,88,89]. In the neighbourhood of the critical time (small \tc — t\) the stress exhibits classic power laws seen in critical phenomena. One important goal in seismology is forecasting the time of occurrence tc of large earthquakes. 
It has been argued that log-periodic fluctuations are present both before (foreshocks) and after (aftershocks) large events and that the precursors improve earthquake forecasts considerably [86,90,91]. The premise is that the rate of change of the regional strain e exhibits critical scaling near the critical point, e = F(\tc-t\) (6.28) so that the strain (a measurable quantity) obeys e = A + \tc- t\a+1 [B + C cos(w In\tc -1\ + </>)] (6.29) in the vicinity of tc. The curve is fit to known data by tuning the seven model pa-rameters (A, B, C, tc, a, CJ, and <f>) and the forecast of tc is read off from the best fit to the data. This method has significantly improved precision over curve fits neglecting log-periodicity (C = 0), validating the adoption of the three extra parameters. 6.5.5 Application to financial time series The same group of researchers who developed the concept of log-periodic precursors in seismology have recently turned their attention to the stock market, arguing that market crashes should be predictable by the same methodology [1,84,92]. Johansen et al. [92] construct a theory for price fluctuations with the risk of crash such that price series obeys precisely the relationship given in Eq. 6.29. The basic argument is that stock prices enjoy exponential growth but with some risk of crash. As time progresses the risk accumulates and the exponential growth rate increases to compensate for the risk (to remunerate rational investors for their risk). At some point in time the risk diverges and a critical point emerges. The fundamental component of the theory is that the instantaneous risk of crash (which they call the "hazard rate") is assumed to obey a scaling relation like Eq. 6.16 with a control parameter tc — t. Since it is related to the rate of change of the price, Eq. 6.29 arises. In theory then, financial crashes should be predictable by curve fitting to the price series. In practice, though, this is an extremely difficult task: while searching through the seven-dimensional parameter space for the optimum fit one often gets stuck in local optima, missing the global one. This complaint has been raised against the theory [93] and is acknowledged by Johansen et al. [84]. 152 Another problem with the research is that the experiments are all performed on known crashes after they have occurred}. This introduces two problems: F i r s t l y— with no disrespect intended—it may bias the results. If one knows there was a crash at such-and-such a time it would be very difficult to be satisfied with a curve-fit which made no such prediction. One would probably suspect the parameters were stuck in a local minimum and tweak them. This is perfectly natural but without foreknowledge one might have accepted the results without prejudice. Secondly, all the curve fits were performed around well-established crashes. It would be as useful to test the theory during other periods when no crashes occur in order to test for "false positives." If the theory predicts too many crashes when none actually occur it is of no use. In order to avoid these pitfalls I conducted a "blind" experiment to test the ability of Eq. 6.29 to forecast crashes. As I was already running my FIS experiment it was convenient to use it as my input data for the curve fit. Instead of forecasting a crash in a single stock, then, I was attempting to forecast a crash in a portfolio of nine stocks (and one cash account). 
But this is not seen as problematic since the other studies used composite market indices instead of individual stocks, as well [1,84,92,93]. 6.5.6 Experimental design The experiment consisted of collecting portfolio wealth data wt and fitting the curve given by Eq. 6.29 with e = w. The experiment began on February 14, 2000 and ran through May 12, 2000 but the dataset used was the entire historical set from the FIS experiment (which began on January 4, 1999). The dataset consisted of sets of date-wealth pairs which were only collected on days when a trade was executed. At the beginning of the experiment this consisted of 113 points which grew to 134 points by the close of the experiment. The fitting over the seven model parameters was performed using Microsoft Excel's Solver Add-in which uses the Generalized Reduced Gradient (GRG2) non-linear optimization technique [94]. The GRG2 method is suitable for problems involving up to 200 variables and 100 constraints. The optimization condition was the minimization of the sum of the squared deviations (%2 nonlinear least-squares fitting). The fit was performed on a logarithmic price scale on the basis that it is the relative (fractional) fluctuations in capital which are fundamental, not the absolute variations. Fitting on a linear scale, then, would significantly bias the curve to fit better at greater wealths at the expense of the fit at lesser wealths. So the fit 153 consisted of minimizing X2 = ^ [ l n ^ ) - l n ^ ] 2 (6.30) where wt is the actual wealth at time t and w(t) is the fitting function as given by Eq.:6.29. The Solver routine did not provide a measure of the quality of the fit or an estimate of the fitted parameters' uncertainties but an estimate of the quality is provided by the x 2 measure itself. If the fit is of high quality then the data should be randomly distributed around the curve with a total % 2 variance proportional to N — 7 [20, Ch. 15]. Hence, the ratio {N — 7)/x2 should be independent of the number of data points N acquired. A small value indicates a large x2 variance and a poor fit, while a large value indicates a good fit. Hence, the quality of the fit Q, defined is a dimensionless (strictly positive) quality which increases as the fit gets better. Using Q allows us to compare fits at different times t with different amounts of data N. Note that the parameter Q is only useful so far as ordering the fits: if Qi > Qj for fits i and j (possibly at different times) then i is a better fit—more likely to explain the data and with more meaningful parameter values (in particular, the forecasted crash date tc). Every day of the experiment a new data point was recorded (if a trade had been executed) and then the curve was refit to the dataset generating a new forecast for the next market crash tc. The forecasted date of the crash, date the forecast had been generated and the quality of fit Q were then recorded. A new forecast was made everyday, even in the absence of new data, because the critical time tc was constrained to occur in the future. Nonlinear curve fitting is basically a parameter space exploration which de-pends crucially on the initial choice of parameters. The initial parameter set, at the initiation of the experiment, was chosen by first trying to establish a good power law fit (with C = 0) and then refitting over all seven parameters. 
Subsequent fits all began with parameter values that were produced by the last fit with one im-portant exception: the critical point tc was always initialized to be the current day. The motivation was to avoid getting stuck in sub-optimal solutions at later times and miss an impending crash. It was preferable to impose a bias towards imminent events. It is still possible to converge to sub-optimal solutions but it was decided that false positives were preferable to false negatives. It is important to stress that all the forecasts from this experiment were true predictions, tabulated as the experiment progressed for analysis later. The as N-7 Q = (6.31) 154 $250 k $200 k J» $150 k $100 k h 111111111 11 111111 111111 111111 11111 i 111111111111 1111 11 Wealth + Fitted curve + + I • • • • • I I i i i i • I i • i i i I I i i i i • I I 01/99 03/99 05/99 07/99 09/99 11/99 01/00 03/00 05/00 07/00 Date Figure 6.3: Sample fit of Eq. 6.29 to portfolio wealth on May 12, 2000. The best fit parameters indicate a crash is anticipated on or around tc =July 4, 2000. calculations were not performed after-the-fact so the results are not biased by fore-knowledge. 6.5.7 Results Each day, a fit of Eq. 6.29 to the portfolio wealth data was performed giving a fitted curve similar to the one shown in Fig. 6.3 and the value of tc the fitting procedure converged upon was interpreted as a forecast of the time of the next crash. The experiment ran for 63 (week-)days and predicted a remarkable 30 crashes in that period. Obviously, the theory predicts too many false positives. However, it may still have some merit if the false positives have some correlation with returns. The dates of forecasted crashes and their actual returns (fractional change of wealth) are plotted in Fig. 6.4. Notice the increased numbers of forecasts for a crash in early April, in agreement with the observed decline on the 14th. However, besides that, it is difficult to discern any pattern from the graph so some further statistical analysis is in order. We want to determine if there is a statistically significant signal in the returns on the days the market was forecasted to crash versus other days so the dataset is split into two: "Forecasted" and "Not Forecasted." The mean returns and standard deviations were computed for both data sets (and the entire dataset) as shown in 155 0.05 0.04 0.03 1—1 1 0.02 1 0.01 S3 0 CG P. -0.01 P -0.02 -0.03 -0.04 -0.05 h (a) U + -+ + — ' 1 ' 1 1 1 1 1 1 1 •" A l l returns + Forecasted to crash X + + + + + * + * + + + + -1 + + _L JL + _L + J , L Feb 12 Feb 26 Mar 11 Mar 25 Apr 08 Apr 22 May 06 May 20 Date 1000 900 800 700 600 500 400 300 200 100 L (b) T T 0 Feb 12 Feb 26 Mar 11 Mar 25 Apr 08 Apr 22 May 06 May 20 Date Figure 6.4: Daily wealth returns (wt/wt-i — 1) are shown along with the dates forecasted to crash in (a). The qualities of the curve fits corresponding to the forecasted crashes, which suggest the reliability of the predictions, are shown in (b). 156 Data set Mean return Std. Dev. A l l data 0.0% 1.8% Forecasted -0 .2% 1.5% Not Forecasted 0.2% 1.9% Table 6.5: Average values and standard deviations of the daily portfolio returns (wt/wt-i — 1) for all data and separately for days a crash was forecasted and not forecasted. Data set Mean return Std. Dev. Forecasted -0.4% 1.4% Not Forecasted 0.5% 1.9% Table 6.6: Same as Table 6.5 except only including data up until the observed decline on Apr i l 14. Table 6.5. 
There does appear to be a small deviation between returns on forecasted days versus not-forecasted days but the deviation is insignificant when compared to each dataset's standard deviation. The likelihood that the two datasets come from the same underlying distri-bution can be calculated using the Kolmogorov-Smirnov (K-S) test which compares the cumulative probability distribution of the two samples [20, Ch. 14]. The calcu-lation estimates a 24% chance that the underlying distributions are the same, which is still statistically significant so the log-periodic precursor prediction method is not conclusive. Recall that there was a (fortuitous) correction in the markets on and around Apr i l 14 as indicated by Fig. 6.1(b). It would be interesting to know whether this forecasting method "saw it coming." Interestingly, Johansen and Sornette submitted a paper to the L A N L preprint archive on Apr i l 16 claiming to have predicted, as early as March 10, a major event between March 31 and May 2 [84]. To test whether the crash around Apr i l 14 was predictable the data from Fig. 6.4 are reused neglecting everything after Apr i l 14. (Notice the quality of the fit declined markedly after Apr i l 14, suggesting the reliability of the later predictions is dubious.) The average returns and their standard deviations for both "Forecasted" and "Not forecasted" dates is again shown in Table 6.6, with a somewhat more significant difference betweens the means (the standard deviations are almost un-changed). Applying the K-S test now yields a much less significant 11.7% likelihood that the datasets are samples from the same underlying distribution. In conclusion, it appears that there may be some value in interpreting mar-ket crashes as critical phenomena with log-periodic precursors but the predictive 157 advantage of doing so is limited. The main difficulty lies in fitting seven nonlinear model parameters to a given dataset—often the fitting algorithm converges to a suboptimal solution, thereby forecasting an erroneous crash date tc. 6.5.8 Universality of scaling ratio In this section an open problem in the theory of log-periodic precursors wil l be presented. It has been observed that the scaling ratio A in Eq. 6.29 seems to be universal, almost always converging to a value near A « 2.5 — 3.0 [1,84,90-92]. (Note this corresponds to a universal log-periodic frequency UJ = 27r/ In A « 5.5 — 7.0.) The emergence of a universal scaling ratio has come as a surprise to researchers [84,92] since it describes some natural hierarchy within the specific system of interest and is not expected to be general. Another peculiarity is that log-periodic fluctuations occur in some systems which do not have an obvious discrete scale invariance, such as the stock markets. In the derivation of log-periodicity, discrete scale invariance was a fundamental ingredient, without which it did not emerge. Why then might markets, which are not suspected to have any discrete scale invariant structures, exhibit log-periodicity? It is my belief that these two idiosyncrasies are tied together: with the lack of a preferred scaling ratio a natural ratio is chosen, Euler's constant, A = e « 2.72. The log-periodic frequency is then UJ = 2TV « 6.28, in agreement with observation. A mechanism that might produce this preferred scaling ratio is unknown and this issue is only discussed here to generate interest in the problem. 
The discovery of a mechanism whereby UJ is fixed would be a great boon to forecasting because this parameter is one of the most problematic for the optimization routine. (Incidentally, fixing UJ — 2 T T , the next crash (in the hypothetical portfolio) is forecasted to occur in the third week of October, 2000.) 6.6 Summary In this chapter two experiments were performed with a hypothetical portfolio of nine stocks. In the first experiment it was observed that the fixed investment strategy (FIS) performs sufficiently well to justify its application in the Decentralized Stock Exchange Model (DSEM). Although it underperformed when compared to a trivial "Buy-and-hold" strategy, this is attributed to the strong upward trend in the portfo-lio over the course of the experiment. In each case when the climb was interrupted the FIS managed to "catch up to" and surpass the Buy-and-hold strategy, only to lag behind again when the trend re-emerged. The FIS also had the favourable 158 property that it significantly reduced the kurtosis of the distribution of returns, es-sentially taming the largest fluctuations. This may be relevant to derivative pricing theory [65] which assumes Gaussian-distributed increments with no excess kurtosis. The second experiment tested a method for forecasting financial crashes. The method relies on log-periodic oscillations in the price series which accelerate as the time of the crash approaches. The data suggest that log-periodic precur-sors probably do exist but they offer little, if any, prediction advantage because the method requires solving an optimization problem involving seven nonlinear param-eters. Thus, the optimization procedure tends to get stuck in local, sub-optimal regions of the parameter landscape, frequently producing false-positive forecasts. It would be interesting to discover whether stochastic optimization techniques, such as simulated annealing [61], could provide better forecasts. 159 Chapter 7 C o n c l u d i n g r ema rk s 7.1 Review Traditional economic theory interprets stock markets as equilibrium systems driven by exogenous events. But this theory is incapable of explaining some peculiarities— such as the prevalence of large fluctuations—which are observed to be universal across all markets. Instead, these phenomena are traditionally attributed to the exogenous driving factors. The goal of this thesis was to discover whether these anomalies may arise directly from simple interactions between a large number of investors, and not depend on extraordinary external influences. 7.1.1 Anomalous market properties Some of the peculiarities observed in the markets and not explainable by traditional economic theory follow: Scal ing Firstly, the distribution of returns (be they price returns for a particular stock or index returns) contain too many outliers to be adequately described by the de-fault Gaussian distribution. In Chapter 5 some alternatives were presented which properly capture the extra "weight" contained in the distribution tails. Empiri-cal analysis suggests the distribution is best described by a Levy distribution with exponent a « 1.40 [10] which is truncated for large returns by either a decaying exponential or a power law with an exponent near three. 160 Clus t e r ed vo la t i l i t y Secondly, although the price series has no (significant) memory—supporting the hypothesis that markets are efficient, containing no arbitrage opportunities—the same cannot be said for market volatility. 
Volatility, which describes the degree of excitation or uncertainty in the market and is quantified most simply by the absolute value of the price returns, exhibits extremely long temporal correlations. High volatility tends to follow high and low follows low, resulting in clusters of activity. This conflicts with traditional economic theory which states that fluctuations should be regular and uncorrelated. To test the hypothesis that these properties may emerge spontaneously from the interactions of many simple investors, two market simulations, the Centralized and Decentralized Stock Exchange Models ( C S E M and DSEM) were constructed in Chapters 2 and 3, respectively. 7.1.2 Centralized stock exchange model C S E M was a traditional simulation, building on similar models developed over the last few years. Its main features include centralized trading (all traders deal with a single market maker), synchronous updating and forecasting of returns. Each forecast was deliberately nudged by a normally-distributed amount with standard deviation ae, the forecast error. It was discovered that as the forecast error was reduced the system underwent a second-order (critical) phase transition near ac « 0.08, below which the price diverged (or would have if it wasn't artificially bounded). When the distribution of the price returns was computed it was discovered that C S E M was only able to produce an overabundance of outliers (compared with the Gaussian) below the critical point, precisely in the regime where the price series is known to be unrealistic. Above the critical point the distribution fit very well to a Gaussian. Thus, C S E M is unable to capture the anomalous "fat tails" phenomenon observed empirically. Since it failed this first test, it was not tested for any of the other properties mentioned above. Instead, focus was shifted to the decentralized model. (In retrospect, C S E M may have been abandoned too rashly. By allowing multiple values of the control parameter, as in D S E M , more realistic dynamics may be realizable. This hypothesis wi l l be tested.) 7.1.3 Decentralized stock exchange model D S E M arose from dissatisfaction with the structure of C S E M : synchronous, central-ized trading was replaced with asynchronous, decentralized, trading directly between market participants and the need for forecasting was eliminated with a simple fixed 161 investment strategy in which agents trade in order to maintain a balance between stock and cash. To drive the dynamics the ideal investment fraction was allowed to be affected by exogenous news events (modeled as a discrete Brownian process) and endogenously by price movements. The dynamics were observed to have three phases of existence, depending on the strength of the agents' response to price movements: when the price response was in the region r i > rp > r2 autocorrelations in the price series were relatively weak but as the price response passed the critical point r\ = 1 very strong positive correlations emerged and the price diverged rapidly. The third phase was found when the price response dropped below r2 ~ —0.33, revealing a first-order phase transition. Below this point the price series was strongly anticorrelated. When all agents were forced to share the same price response scaling in the price return distribution could not be induced except in the phases which exhibited unrealistic memory effects. 
However, if the price response was sampled from a two-point distribution, scaling (with a realistic truncation for large returns) was found for a number of simulations, the best predictor for scaling being that the upper price response exceeded one, > 1- For those runs which did exhibit scaling the exponent was found to be a = 1.64 ±0.25, which compares favourably with the best known empirical quantity 1.40 ± 0.05 [10]. Having found that D SEM could capture this anomalous property of empir-ical markets it was also tested for memory effects, again using the two-point price response distribution. It was found that the price series did not have a significant memory provided that the lower bound of the price response was in the region 05 < ri0 < 1. Similarly, volatility clustering was observed when the upper limit exceeded r^i > 1.25 or when the lower limit was below ria < —0.5. A l l three requirements were met when 0.5 < r/0 < 1 and r^i > 1-25. What this means for real markets will be discussed below. But first we review the remain-der of the thesis. 7.1.4 Fixed investment strategy DSEM was constructed on the principle of the fixed investment strategy (FIS) which states that one should adjust one's portfolio in order to maintain a balance between the capital invested in a risky stock and the capital held in a safe(r) asset. In Chapter 6 the results of an experiment intended to test the credibility of the FIS in a "real-world" situation (with trading costs, etc.) were reported. It was discovered that the FIS actually underperformed when compared with a simple "Buy-and-hold" strategy, at least over this particular realization. This is probably due to the strong trend observed in the portfolio over the course of 162 the experiment, in which the capital nearly doubled. The FIS is designed to take advantage of fluctuations in the price series and is sub-optimal in the presence of a long-term trend. However, the experiment did reveal an interesting (and possibly advanta-geous) feature of the FIS: it minimized the risk in the sense that it reduced the frequency of large events (both up and down) as measured by the excess kurtosis. By applying the FIS large fluctuations were scaled down bringing them in line with the Gaussian distribution which is typically assumed. Of course, it should be re-membered that these conclusions are less than rigorous, being the result of a single brief experiment with a particular portfolio. 7.1.5 Log-periodic precursors While the FIS experiment was running the hypothesis that market crashes are her-alded by log-periodic precursors was also tested. The theory derives from discrete scale invariance and suggests that, in some cases, systems approaching a critical event may exhibit accelerating oscillations in the power law describing the critical point. It has been suggested that detecting these oscillations may improve predic-tions of the critical event time and recent work in seismology is promising. But the financial data from the FIS experiment indicate that, even if log-periodic precursors do exist, technical optimization difficulties prevent any accurate forecasts of large fluctuations therefrom. 7.2 Conclusions to be drawn from this research The main point the reader should draw from this thesis is that it is possible to replicate realistic market dynamics with a many-agent model with simple driving forces. 
D S E M was driven by a simple (discrete) Brownian motion without fat tails and having no memory, but through the interactions of the agents both fat tails and long memories (in the volatility) emerged. Similarly, these properties may arise endogenously in real economic systems, and appeals to anomalous external events to explain them may be unwarranted. Interestingly, the most realistic simulations were observed when the price response (control parameter) was centered around a critical point at rp = r\ — 1. If D S E M is assumed to properly capture the essence of real markets the question is naturally raised: "Why are the markets tuned to this region of parameter space?" The fact that this region encompasses a critical point is suggestive of a concept called self-organized criticality (SOC) which claims that many dynamical systems 163 spontaneously evolve towards a critical point [11,12, 88]. The problem with this description is that it adds nothing to our knowledge: it does not tells us how or why the market self-organizes. In a simple economic model involving producers and consumers it was dis-covered that the system self-organizes to the critical state in order to maximize efficiency [95, Ch. 11]. On one side of the critical point the supply outweighs de-mand and on the other the reverse is true. In this example it is easy to see why the market would self-organize. To test whether a similar process could drive DSEM to the critical state DSEM has been extended to allow the agents to adjust their pref-erences (news response and price response parameters) when their current choices are performing poorly. This is discussed further below. Another interesting consequence of the observation that the price response is centered around rp = 1 is that—if D SEM is at all meaningful—real investors do watch (and base decisions on) trends in stock prices. In DSEM, to get realistic behaviour, even the least responsive agents had to have ria > 0.5 which can roughly be interpreted as the perceived autocorrelation between successive returns. DSEM suggests that there do not exist any (pure) fundamentalist traders (who respond only to fundamental information about the company and are unconcerned with the stock's price movements) in real markets. Unfortunately, while an interesting hypothesis, it is not clear how this assertion could be tested empirically. 7.3 Relation to other work in the field Quite a few market models have been developed over the last few years. In this section some of these models are contrasted with C SEM and DSEM. We begin by comparing how the price is chosen in the models. Recall that in C S E M the price is set by an auctioneer in order to balance supply and demand. In DSEM, however, the price is simply the most recent trading price. In most of the models reviewed the price was set by an external market maker as in C S E M [17,27,28,30-36,96-99] the only exceptions being reaction-diffusion models [47,100] in which buyers and sellers diffuse in price space and a trade is executed when they meet. DSEM provides a new mechanism for allowing the price to emerge directly from the agents' decisions. Another major difference between the CSEM and DSEM is in how they are updated: in CSEM trades are executed synchronously, once per day while DSEM allows trading in real time, with agents choosing their own activation times. On this front it appears that asynchronous updating is becoming more prevalent [17,32,34, 98] with more of the older sources choosing parallel updating [27-30,33,35,47]. 
This 164 is fortunate because a mounting volume of evidence suggests that parallel updating may introduce spurious artifacts into simulation dynamics [14,49-52]. The preferred litmus test for each of these models is whether they can repro-duce fat tails in the price return distribution and many of them can [30,31,33,34, 96-99,101]. The Cont-Bouchaud percolation model [31] has received a great deal of atten-tion lately [34,96-99]. It is characterized by a network of information which produces herding effects. The advantage of the model is that analytic results exist [31,101] which predict that the price return distribution should have a (truncated) power law distribution (with a scaling exponent a = 3/2). It has also been demonstrated to exhibit clustered volatility [97, 99]. D SEM provides an alternative explanation which does not require herding. However, it would be interesting to know what the consequences of herding would be, which brings us to directions for future research. 7.4 Avenues for further work I conclude this thesis with some thoughts on how DSEM may be extended to produce new insights and on further statistical properties which could be tested: As discussed above, one of the most pressing issues is whether scaling and clustered volatility can emerge spontaneously without requiring tuning of the price response parameters. This can be tested by allowing the agents to choose their preferences (response parameters) as they see fit. To do so, a meta-strategy is re-quired which controls when an agent adjusts its preferences and by how much. An arbitrary but reasonable choice is to allow preference adjustments when the agent's performance is demonstrably poor: for instance, if the agent sells shares at a price below the average price it bought them for. When this occurs the agent randomly shifts its preferences by some amount. This has been recently coded into DSEM and research is ongoing. Another interesting direction to explore is the extension of D SEM to support multiple stocks. This idea was inspired by Bak et al. [47] in which they described adding a new stock as adding a new dimension in price space. It is well known that the dimensionality is one of the few factors which can impact the character of a critical point [61] so it would be interesting to see how the critical point in DSEM would be affected. On the surface CSEM and DSEM are quite different. However, it should be possible to modify DSEM such that all trades are handled by a centralized control or market maker. The agents could respond to orders called out by an auctioneer in similar manner to CSEM. Discovering whether scaling and clustered volatility are 165 robust to these changes would be very informative. On the experimentation side, there are a number of statistical properties which could be tested for. One of these is an asymmetry between up- and down-movements in the price series. Roehner and Sornette [102] found that peaks tend to be sharp but troughs (lows) tend to be flat. Since D SEM is symmetric in its response to up- and down-moves it would be surprising if this asymmetry could be replicated. Another interesting property which is currently being tested (but did not make it into this thesis) is Pareto's law for the distribution of incomes which states that the richest segment of the population have incomes in excess of that predicted by the log-normal distribution (which fits the majority of the population). 
This is thought to be an amplification effect whereby the richest individuals are able to leverage their wealth to increase their income faster than others [83]. Data are being collected to test for this effect in DSEM. Beyond that, the price series contains more information than just the dis-tribution of returns. For instance, the intra-trade interval and bid-offer spread are also interesting with testable distributions [48]. Finally, evidence is mounting that the distribution of empirical returns is truncated by an inverse cubic power law [6,7,69] rather than the exponential assumed in Section 5.1. It would be useful to determine which hypothesis D S E M obeys. To do so, much larger datasets are required in order to determine the distribution of very large returns (since it is difficult to distinguish the two hypotheses on scales studied in this dissertation). Alternatively, the moments of the distribution could be explored: if the exponential truncation holds then all moments should be finite but the inverse cubic implies that the k-th. moment should diverge as the index k increases to three. Either way, it would be valuable to determine if the distribution of returns in DSEM is truncated by an inverse cubic as appears to be the case for empirical data. In short, many exciting possibilities remain for future research into DSEM. 166 Bibliography [1] D. Sornette, A. Johansen, and J.-P. Bouchaud. Stock market crashes, precur-sors and replicas. J. Phys. I France, 6:167-75, 1996. [2] Swiss Franc-U.S. Dollar tickwise exchange rate data, 1985-1991. Avail-able from http://www.stern.nyu.edu/~aweigend/Time-Series/Data/ SFR-USD. Tickwise .gz, provided by Andreas Weigend. [3] Dow Jones Industral Average: Daily close, 1896-1999. Available from http: //www.economagic.com/em-cgi/data.exe/djind/day-djiac, provided by Economagic.com. [4] Rama Cont, Marc Potters, and Jean-Philippe Bouchaud. Scaling in stock market data: Stable laws and beyond. arXivxond-mat/9705087, 1997. [5] Benoit B. Mandelbrot. Fractals and Scaling in Finance: Discontinuity, Con-centration, Risk. Springer-Verlag, New York, 1997. [6] P. Gopikrishnan, M. Meyer, L. A. N. Amaral, and H. E. Stanley. Inverse cubic law for the distribution of stock price variations. Eur. Phys. J. B, 3:139-40, 1998. [7] Parameswaran Gopikrishnan, Vasiliki Plerou, Luis A. Nunes Amaral, Martin Meyer, and H. Eugene Stanley. Scaling of the distribution of fluctuations of financial market indices. Phys. Rev. E, 60:5305-16, 1999. arXiv:cond-mat/9905305. [8] 0. G. Mouritsen. Computer Studies of Phase Transitions and Critical Phe-nomena. Springer-Verlag, Berlin, Heidelberg, 1984. [9] B. B. Mandelbrot. The variation of certain speculative prices. J. Business, 36:394-419, 1963. [10] Rosario N. Mantegna and H. Eugene Stanley. Scaling behaviour in the dy-namics of an economic index. Nature, 376:46-9, 1995. 167 [11] P. Bak, C. Tang, and K. Wiesenfeld. Self-organized criticality: An explanation of 1// noise. Phys. Rev. Lett, 59:381-4, 1987. [12] P. Bak, C. Tang, and K. Wiesenfeld. Self-organized criticality. Phys. Rev. A, 38:364-74, 1988. [13] D. Sornette, A. Johansen, and I. Dornic. Mapping self-organized criticality onto criticality. J. Phys. I France, 5:325-35, 1995. h t t p : / / a l f . n b i . d k / " j ohansen/Papers/nosoc.ps.gz. [14] Hendrik J . Blok and Birger Bergersen. Synchronous versus asynchronous updating in the "game of life". Phys. Rev. E, 59:3876-9, 1999. h t t p : // r i kb lok .c jb .ne t/ l i b/b lok99 .h tml . [15] Gold-silver price ratios, 1257-1999. 
Available from h t t p : / / w w w . globalfindata.com/freecom.htm, provided by Global Financial Data. [16] S. Grossman. On the efficiency of competitive stock markets where trades have diverse information. J. Finance, 31:573-85, 1976. [17] M. Youssefmir, B. A. Huberman, and T. Hogg. Bubbles and market crashes, ftp://parcftp.xerox.com/pub/dynamics/bubbles.ps, 1994. [18] T. Hogg, B. A. Huberman, and M. Youssefmir. The instability of markets, ftp://parcftp.xerox.com/pub/dynamics/bubbles.ps, 1995. [19] David P. Brown and Zhi Ming Zhang. Market orders and market efficiency. J. Finance, 52:277-308, 1997. [20] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, second edition, 1992. http : / /www .nr .com. [21] B. M. Gammel. Hurst's rescaled range statistical analysis for pseudorandom number generators used in physical simulations. Phys. Rev. E, 58:2586-2597, 1998. [22] Makoto Matsumoto and Takuji Nishimura. Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulations, 8:3-30, 1998. http: / /www.math.keio.ac.jp/"nisimura/random/doc/mt.ps. [23] John M. Dutton and Will iam H. Starbuck, editors. Computer simulation of human behavior, New York, 1971. Wiley. 168 [24] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley and Sons, New York, 1991. [25] Edwin T. Jay nes. Probability theory: The logic of science, http://bayes. wust l .edu/etj/prob.html, 1996. [26] Mark B. Garman. Market microstructure. J. Fin. Econ., 3:257-275, 1976. [27] R. G. Palmer, W. B. Arthur, J . H. Holland, B. LeBaron, and P. Itylor. Artificial economic life: A simple model of a stockmarket. Physica D, 75:264-74, 1994. [28] M. Levy, H. Levy, and S. Solomon. Microscopic simulation of the stock market: The effect of microscopic diversity. J. Phys. I France, 5:1087-107, 1995. [29] W. Brian Arthur, John H. Holland, Blake LeBaron, Richard Palmer, and Paul Tayler. Asset pricing under endogenous expectations in an artificial stock mar-ket. In W. Brian Arthur, Steven N. Durlauf, and David A. Lane, editors, The Economy as an Evolving Complex System II. Addison-Wesley, 1997. h t t p : //www.santafe.edu/sfi/publications/Working-Papers/96-12-093.ps. [30] G. Caldarelli, M. Marsili, and Y.-C. Zhang. A prototype model of stock exchange. Europhys. Lett, 40:479-84, 1997. arXiv:cond-mat/9709118. [31] R. Cont and J.-P. Bouchaud. Herd behavior and aggregate fluctuations in financial markets. arXiv:cond-mat/9712318, December 1997. [32] S. H. Chen, T. Lux, and M. Marchesi. Testing for non-linear structure in an artificial financial market, http://www.ge. infm. it/econophysics/papers/ l ux/non l i n . z i p , 1998. [33] Christian Busshaus and Heiko Rieger. A prognosis oriented microscopic stock market model. Physica A, 267:443-52, 1999. arXiv:cond-mat/9903079. [34] Debashish Chowdhury and Dietrich Stauffer. A generalized spin model of financial markets. Eur. Phys. J. B, 8:477-82, 1999. arXiv:cond-mat/9810162. [35] Giulia Iori. A microsimulation of traders activity in the stock market: The role of heterogeneity, agents' interactions and trade frictions. arXiv:adap-org/9905005, 1999. [36] Thomas Lux and Michele Marchesi. Scaling and criticality in a stochastic multi-agent model of a financial market. Nature, 397:498-500, 1999. h t t p : //www.ge.infm.it/econophysics/papers/lux/lux-marchesi.ps.gz. 
169 [37] John Von Neumann and Oskar Morgenstern. Theory of games and economic behavior. Princeton University Press, Princeton, 1944. [38] M. Marsili, S. Maslov, and Y.-C. Zhang. Dynamical optimization theory of a diversified portfolio. Math, and Theor. Methods in Physics, 7:403, 1998. arXiv:cond-mat/9801239. [39] John W. Pratt. Risk aversion in the small and the large. Econometrica, 32:122-136, 1964. [40] Haiping Fang and Liangyue Cao. Predicting and characterizing data sequences from structure-variable systems. Phys. Rev. E, 51:6254-7, 1995. [41] J . Doyne Farmer and J. J . Sidorowich. Predicting chaotic time series. Phys. Rev. Lett, 59:845-8, 1987. [42] G. Sugihara and R. M. May. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature, 344:734-41, 1990. [43] M. Casdagli. Chaos and deterministic versus stochastic non-linear modelling. J. R. Statist Soc. B, 54:303-28, 1991. [44] M. Palus. Detecting nonlinearity in multivariate time series. arXivxomp-gas/9507004, 1995. [45] M. Palus, L. Pecen, and D. Pivka. Estimating predictability: Redundancy and surrogate data method. arXiv:comp-gas/9507003, 1995. [46] M. F. M. Osborne. Brownian motion in the stock market. Operations Research, 7:145-73, March-April 1959. [47] P. Bak, M. Paczuski, and M. Shubik. Price variations in a stock market with many agents. Physica A, 246:430-53, 1997. arXiv:cond-mat/9609144. [48] David Eliezer and Ian I. Kogan. Scaling laws for the market microstructure of the interdealer broker markets. arXivxond-mat/9808240, 1998. [49] B. A. Huberman and N. S. Glance. Evolutionary games and computer simu-lations. Proc. Natl. Acad. Sci. USA, 90:7716-8, 1993. [50] H. Bersini and V. Detours. Asynchrony induces stability in cellular automata based models. In R. A. Brooks and P. Maes, editors, Artificial Life IV, Proceed-ings of the Fourth International Workshop on the Synthesis, and Simulation of Living Systems, pages 382-7, Cambridge, 1994. MIT Press. 170 [51] N. Rajewsky and M. Schreckenberg. Exact results for one-dimensional stochas-tic cellular automata with different types of updates. Physica A, 245:139-144, 1997. arXiv:cond-mat/9611154. [52] J . Rolf, T. Bohr, and M. H . Jensen. Directed percolation universality in asyn-chronous evolution of spatiotemporal intermittency. Phys. Rev. E, 57:R2503-6, 1998. [53] Sorin Solomon. Towards behaviorly realistic simulations of the stock market. arXiv:adap-org/9901003, 1999. [54] Barry G. Lawson and Steve Park. Asynchronous time evolution in an artificial society model. J. Artificial Soc. and Social Sim., 3(1), 2000. http://www. soc.surrey.ac.uk/JASSS/3/1/2.html. [55] J . L. Kelly. A new interpretation of information rate. Bell Syst. Tech. J., 35:917-26, 1956. http://www.bjmath.com/bjmath/kelly/kelly.pdf. [56] Robert C. Merton. Continuous-Time Finance. Blackwell, Cambridge, 1992. [57] S. Maslov and Y.-C. Zhang. Optimal investment strategy for risky assets. Int. J. Theor. and Applied Finance, 1(3):377, 1998. arXiv:cond-mat/9801240. [58] N. G. Van Kampen. Stochastic Processes in Physics and Chemistry. North-Holland, 1981. [59] J . Doyne Farmer. Market force, ecology, and evolution. arXiv:adap-org/9812005, 1998. [60] New York Stock Exchange daily returns and volume, 1962-1988. Available from http://www.stern.nyu.edu/~aweigend/Time-Series/Data/NYSE. Date.Day.Return.Volume.Vola, provided by Blake LeBaron. [61] M. Plischke and Birger Bergersen. Equilibirum Statistical Physics. World Scientific, second edition, 1994. [62] R. Kohl. 
The influence of the number of different stocks on the Levy-Levy-Solomon model. Int. J. Mod. Phys. C, 8:1309-1316, 1997. [63] E. Egenter, T. Lux, and D. Stauffer. Finite-size effects in Monte Carlo simu-lations of two stock market models. Physica A, 268:250-6, 1999. [64] L. Bachelier. Theorie de la speculation. PhD thesis, Ann. Sci. de l'Ecole Normale Superiure, 1900. 171 [65] Fischer Black and Myron Scholes. The pricing of options and corporate lia-bilities. J. Political Economy, 81:637-654, 1973. [66] Zoltan Palagyi and Rosario N. Mantegna. Empirical investigation of stock price dynamics in an emerging market. Physica A, 269:132-9, 1999. [67] J.-Ph. Bouchaud. Elements for a theory of financial risks. Physica A, 263:415-26, 1999. [68] Hari M. Gupta and Jose R. Campanha. The gradually truncated Levy flight for systems with power law distributions. Physica A, 268:231-239, 1999. [69] Vasiliki Plerou, Parameswaran Gopikrishnan, Luis A. Nunes Amaral, Martin Meyer, and H. Eugene Stanley. Scaling of the distribution of price fluctuations of individual companies. Phys. Rev. E, 60:6519-29, 1999. [70] Ismo Koponen. Analytic approach to the problem of convergence of truncated Levy flights towards the Gaussian stochastic process. Phys. Rev. E, 52:1197-9, 1995. [71] Rosario N. Mantegna and H. Eugene Stanley. Stochastic process with ultraslow convergence to a Gaussian: The truncated Levy flight. Phys. Rev. Lett., 73:2946-9, 1994. [72] Hendrik J. Blok and Birger Bergersen. Effect of boundary conditions on scaling in the "game of Life". Phys. Rev. E, 55:6249-52, 1997. h t t p : / / r i k b l o k . c j b . net/ l ib/blok97.html. [73] Yahoo! Finance historical quotes, 1999. http://chart.yahoo.eom/t. [74] Peter K. Clark. A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41:135-55, 1973. [75] Yi-Cheng Zhang. Toward a theory of marginally efficient markets. arXivxond-mat/9901243, 1999. [76] Andrew W. Lo. Long-term memory in stock market prices. Econometrica, 59:1279-1313, 1991. [77] Pierre Cizeau, Yanhui Liu, Martin Meyer, C.-K. Peng, and H. Eugene Stanley. Volatility distribution in the S&P500 stock index. Physica A, 245:441-5, 1997. [78] Yanhui Liu, Pierre Cizeau, Martin Meyer, C.-K. Peng, and H. Eugene Stanley. Correlations in economic time series. Physica A, 245:437-40, 1997. 172 [79] J . Rotyis and G. Vattay. Statistical analysis of the stock index of the budapest stock exchange. arXivxond-mat/9711008, 1997. [80] A. Arneodo, J.-F. Muzy, and D. Sornette. "Direct" causal cascade in the stock market. Eur. Phys. J. B, 2:277-282, 1998. [81] Rosario N. Mantegna, Zoltan Palagyi, and H. Eugene Stanley. Applications of statistical mechanics to finance. Physica A, 274:216-221, 1999. [82] Hans E. Schepers, Johannes H. G. M. Van Beek, and James B. Bassingth-waighte. Four methods to estimate the fractal dimension from self-affine sig-nals. IEEE Engineering in Medicine and Biology, 11:57-64,71, 1992. [83] E. W. Montroll and M. F. Shlesinger. Maximum entropy formalism, fractals, scaling phenomena, and 1// noise: A tale of tails. J. Stat. Phys., 32(2):209-30, 1983. [84] Anders Johansen and Didier Sornette. The Nasdaq crash of Apr i l 2000: Yet another example of log-periodicity in a speculative bubble ending in a crash. arXiv:cond-mat/0004263, 2000. [85] Sergei Maslov and Yi-Cheng Zhang. Probability distribution of drawdowns in risky investments. Physica A, 262:232-41, 1999. arXiv:cond-mat/9808295. [86] H. Saleur, C. G. Sammis, and D. Sornette. 
Discrete scale invariance, complex fractal dimensions, and log-periodic fluctuations in seismicity. J. Geophys. Research, 101:17661-77, 1996. [87] Didier Sornette. Discrete scale invariance and complex dimensions. Physics Reports, 297:239-270, 1998. arXiv:cond-mat/9707012. [88] P. Bak and M. Paczuski. Why nature is complex. Physics World, pages 39-43, December 1993. [89] Daniel Groleau. Study of the Scaling and Temporal Properties of a Simplified Earthquake Model. PhD thesis, University of British Columbia, 1997. [90] D. Sornette and C. G. Sammis. Complex critical exponents from renormal-ization group theory of earthquakes: Implications for earthquake predictions. J. Phys. I France, 5:607-19, 1995. [91] Anders Johansen, Didier Sornette, Hiroshi Wakita, Urumu Tsunogai, Wil l iam L. Newman, and Hubert Saleur. Discrete scaling in earthquake pre-cursory phenomena: Evidence in the kobe earthquake, japan. J. Phys. I France, 6:1391-402, 1996. 173 [92] Anders Johansen, Didier Sornette, and Olivier Ledoit. Predicting financtial crashes using discrete scale invariance. J. Risk, l(4):5-32, 1999. arXiv:cond-mat/9903321. [93] Laurent Laloux, Marc Potters, Rama Cont, Jean-Pierre Aguilar, and Jean-Philippe Bouchaud. Are financial crashes predictable? arXivxond-mat/9804111, 1998. [94] L. S. Lasdon, A. D. Waren, A. Jain, and M. Ratner. Design and testing of a generalized reduced gradient code for nonlinear programming. ACM Trans, on Math. Software, 4:34-50, 1978. [95] Per Bak. How Nature works: The science of self-organized criticality. Springer-Verlag, New York, 1996. [96] D. Stauffer and T.J.P. Penna. Crossover in the Cont-Bouchaud percolation model for market fluctuations. Physica A, 256:284-90, 1998. [97] Iksoo Chang and Dietrich Stauffer. Fundamental judgement in Cont-Bouchaud herding model of market fluctuations. Physica A, 264:294-8, 1999. [98] Victor M. Eguiluz and Martin G. Zimmermann. Dispersion of rumors and herd behavior. arXiv:cond-mat/9908069, 1999. [99] Dietrich Stauffer and Didier Sornette. Self-organized percolation model for stock market fluctuations. Physica A, 271:496-506, 1999. [100] Lei-Han Tang and Guang-Shan Tian. Reaction-diffusion-branching models of stock price fluctuations. Physica A, 264:543-50, 1999. [101] R. D'Hulst and G. J . Rodgers. Exact solution of a model for crowding and information transmission in financial markets. arXiv:cond-mat/9908481, 1999. [102] B. M. Roehner and D. Sornette. The sharp peak-flat trough pattern and critical speculation. Eur. Phys. J. B, 4:387-399, 1998. [103] Benoit B. Mandelbrot and John W. Van Ness. Fractional Brownian motions, fractional noises and applications. SIAM Review, 10:422-37, 1968. [104] Benoit B. Mandelbrot. A fast fractional Gaussian noise generator. Water Resources Research, 7:543-53, 1971. [105] Murad S. Taqqu, Vadim Teverovsky, and Walter Willinger. Estimators for long-range dependence: An empirical study. Fractals, 3:785-788, 1995. 174 [106] Sandro Rambaldi and Ombretta Pinazza. An accurate fractional Brownian motion generator. Physica A, 208:21-30, 1994. [107] Z.-M. Y in. New methods for simulation of fractional Brownian motion. Journal of Computational Physics, 127:66-72, 1996. [108] James B. Bassingthwaighte and Gary M. Raymond. Evaluating rescaled range analysis for time series. Annals of Biomedical Engineering, 22:432-44, 1994. [109] Gary M. Raymond and James B. Bassingthwaighte. Deriving dispersional and scaled windowed variance analyses using the correlation function of discrete fractional Gaussian noise. 
Physica A, 265:85-96, 1999. [110] Berndt Pilgram and Daniel T. Kaplan. A comparison of estimators for 1// noise. Physica D, 114:108-122, 1998. [ I l l] B. Mandelbrot. The Fractal Geometry of Nature. Freeman, 1983. [112] Oscar J . Mesa and German Poveda. The Hurst effect: The scale of fluctuation approach. Water Resources Research, 29:3995-4002, 1993. [113] Philippe Carmona and Laure Coutin. Fractional Brownian motion and the Markov property. arXiv:math.PR/9809123, 1998. [114] R. F. Voss. Random fractal forgeries. In R. A. Earnshaw, editor, Fundamental Algorithms in Computer Graphics, pages 805-835. Springer-Verlag, Berlin, 1985. [115] Vern Paxson. Fast, approximate synthesis of fractional Gaussian noise for generating self-similar network traffic. Computer Communications Review, 27(5):5-18, 1997. [116] Y. G. Sinai. Self-similar probability distributions. Theory of Probability and its Applications, 21:64-80, 1976. [117] James B. Bassingthwaighte and Gary M. Raymond. Evaluation of the disper-sional analysis method for fractal time series. Annals of Biomedical Engineer-ing, 23:491-505, 1995. [118] David C. Caccia, Donald Percival, Michael J. Cannon, Gary Raymond, and James B. Bassingthwaighte. Analyzing exact fractal time series: Evaluating dispersional analysis and rescaled range analysis. Physica A, 246:609-632, 1997. 175 [119] H. E. Hurst. Long-term storage capacity of reservoirs. Trans. Am. Soc. Civ. Eng., 116:770-808, 1951. [120] Jens Feder. Fractals. Plenum, New York, 1988. [121] Yanqing Chen, Mingzhou Ding, and J. A. Scott Kelso. Long memory processes (1// Q type) in human coordination. Phys. Rev. Lett, 79:4501-4504, 1997. [122] Larry S. Liebovitch and Weiming Yang. Transition from persistent to antiper-sistent correlation in biological systems. Phys. Rev. E, 56:4557-4566, 1997. [123] R. Oliver and J . L. Ballester. Is there memory in solar activity? Phys. Rev. E, 58:5650-5654, 1998. [124] Murad S. Taqqu and Vadim Teverovsky. Robustness of Whittle-type estima-tors for time series with long-range dependence. Stochastic Models, 13:723-757, 1997. http://math.bu.edu/INDIVIDUAL/murad/pub/robustness-posted. ps. [125] Michael J . Cannon, Donald B. Percival, David C. Caccia, Gary M. Raymond, and James B. Bassingthwaighte. Evaluating scaled window variance methods for estimating the Hurst coefficient of time series. Physica A, 241:606-26, 1997. [126] C.-K. Peng, S. V. Buldyrev, S. Havlin, M. Simons, H. E. Stanley, and A. L. Goldberger. Mosaic organization of DNA nucleotides. Phys. Rev. E, 49:1685-1689, 1994. [127] Elliott W. Montroll and Bruce J. West. On an enriched collection of stochas-tic processes. In E. W. Montroll and J. L. Lebowitz, editors, Fluctuation Phenomena, pages 61-206. North-Holland, Amsterdam, 1987. [128] Rafal Weron. On the Chambers-Mallows-Stuck method for simulating skewed stable random variables (and correction). Statistical Probabil-ity Letters, 28:165-171, 1996. http://www.im.pwr.wroc.pl/~hugo/publ/ RWeron-SPL-95.ps, correction in Research Report HSC/96/1, Wroclaw Univ. of Technology rweron-spl95-corr.ps. [129] Stephen M. Kogon and Dimitris G. Manolakis. Signal modeling with self-similar a-stable processes: The fractional Levy stable motion model. IEEE Trans, on Signal Processing, 44:1006-1010, 1996. 176 Appendix A Discounted least-squares curve fitting In this appendix the standard method of least-squares curve fitting is modified in order to make it more amenable to time series. 
In particular, the goal is to use time series data for forecasting by extrapolating from historical data. As will be shown, this method can require fewer computations and less storage. Also, by discounting historical data, extrapolated forecasts become more robust to outliers. The reader should keep in mind that, despite the similarity of notation with standard least-squares curve fitting, the following is specifically meant to be applied to time series, where the relevance of past data is discounted as newer data arrive. This appendix borrows heavily from Press et al.'s excellent discussion of generalized least-squares curve fitting [20, Sect. 15.4], which is highly recommended.

A.1 Least-squares curve fitting

We use the index i to label our data points, where i = 0 indicates the most recently acquired datum and i = 1, 2, 3, ... indicate successively older data. Each point consists of a triplet (x, y, σ) where x is the independent variable (e.g. time), y is the dependent variable, and σ is the associated measurement error in y. We wish to fit the data to a model which is a linear combination of any M specified functions of x. The general form of this kind of model is

    y(x) = \sum_{j=1}^{M} a_j X_j(x)    (A.1)

where X_1(x), ..., X_M(x) are arbitrary fixed functions of x, called the basis functions. For example, a polynomial of degree M − 1 could be represented by X_j(x) = x^{j-1}. (Note that the functions X_j(x) can be wildly nonlinear functions of x. In this discussion "linear" refers only to the model's dependence on its parameters a_j.) A merit function is defined

    \chi^2 = \sum_i \left[ \frac{y_i - \sum_j a_j X_j(x_i)}{\sigma_i} \right]^2    (A.2)

which sums the (scaled) squared deviations from the curve of all N points. The goal is to minimize χ². The derivative of χ² with respect to all M parameters a_j will be zero at the minimum,

    0 = \sum_i \frac{1}{\sigma_i^2} \left[ y_i - \sum_j a_j X_j(x_i) \right] X_k(x_i), \quad k = 1, \ldots, M    (A.3)

giving the best parameters a_j. If we define the components of an M × M matrix [α] by

    \alpha_{kj} = \sum_i \frac{X_j(x_i) X_k(x_i)}{\sigma_i^2}    (A.4)

and a vector [β] of length M by

    \beta_k = \sum_i \frac{y_i X_k(x_i)}{\sigma_i^2}    (A.5)

then Eq. A.3 can be written as the single matrix equation

    [\alpha] \cdot \mathbf{a} = [\beta]    (A.6)

where **a** is the vector form of the parameters a_j. Eqs. A.3 and A.6 are known as the normal equations of the least-squares problem and can be solved for the vector of parameters **a** by singular value decomposition (SVD) which, although slower than other methods, is more robust and is not susceptible to round-off errors [20, Ch. 2].
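To make the normal equations concrete, the following minimal sketch (Python with NumPy; the function names and the polynomial basis are my own illustrative choices, not code from this thesis) builds [α] and [β] for the example basis X_j(x) = x^{j-1} and solves Eq. A.6 with an SVD-based pseudo-inverse, in the spirit of the recommendation above.

```python
import numpy as np

def design(x, M):
    """Polynomial basis X_j(x) = x**(j-1), j = 1..M (the example after Eq. A.1)."""
    return np.vander(x, M, increasing=True)            # shape (N, M)

def normal_equations(x, y, sigma, M):
    """Build the matrix [alpha] (Eq. A.4) and vector [beta] (Eq. A.5)."""
    X = design(x, M) / sigma[:, None]                   # each row scaled by 1/sigma_i
    b = y / sigma
    alpha = X.T @ X                                     # alpha_kj = sum_i X_j X_k / sigma_i^2
    beta = X.T @ b                                      # beta_k  = sum_i y_i X_k / sigma_i^2
    return alpha, beta

def solve_normal_equations(alpha, beta, rcond=1e-12):
    """Solve [alpha].a = [beta] (Eq. A.6) via an SVD-based pseudo-inverse."""
    return np.linalg.pinv(alpha, rcond=rcond) @ beta

# toy usage: fit a quadratic to noisy data
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, x.size)
sigma = np.full_like(x, 0.1)
a = solve_normal_equations(*normal_equations(x, y, sigma, M=3))
print(a)   # roughly [1, 2, -3]
```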
A.2 Discounting

The discussion above applies to all linear least-squares curve fitting. The variation proposed here is to discount the relevance of historical data as new data arrive. This was motivated by time series in which the fitting parameters may vary slowly.

Fitting time series is typically handled with a moving window over the last N data points. Each of the last N points is weighted equally and all prior data are discarded, as shown in Fig. A.1. The discontinuous weighting function can introduce discontinuities in the fitting parameters a_j as the data are updated, particularly when an outlier (a strongly atypical y-value) is suddenly discarded. These discontinuities can be avoided by steadily discounting old data as new data arrive. As will be shown, this method also has computational and resource advantages.

As before, we use the index i to label our data points, with larger i indicating older data. As a new datum arrives (x_0, y_0, σ_0) we shift the indices of prior data and scale up the errors by some factor 0 < γ < 1,

    (x_{i+1}, y_{i+1}, \sigma_{i+1}) \leftarrow (x_i, y_i, \sigma_i / \gamma).    (A.7)

If we define σ_i^0 as the original value of σ_i then after applying i of the above operations

    \sigma_i = \sigma_i^0 / \gamma^i    (A.8)

so, since γ < 1, the historical deviations grow exponentially as new information is acquired. Increasing the error effectively decreases the weight of a datum in the fitting procedure.

Calculation of the covariance matrix and the uncertainties of the parameters proceeds as with standard least-squares fitting (see [20, Ch. 15], for instance) so I will just mention the main result, namely that the inverse of [α],

    C = [\alpha]^{-1},    (A.9)

gives the covariances of the fitting parameters,

    \mathrm{Cov}[a_j, a_k] = C_{jk},    (A.10)

and the variance of a single parameter is, of course,

    \mathrm{Var}[a_j] = C_{jj}.    (A.11)

A.3 Storage and updating

So far we have made no mention of N, the number of data points to be fit. From Fig. A.1 it appears we need to store the entire history to apply this technique. But notice that as we acquire a new datum (x_0, y_0, σ_0), from Eqs. A.4 and A.5, the matrix [α] and vector [β] update as

    \alpha_{kj} \leftarrow \frac{X_j(x_0) X_k(x_0)}{\sigma_0^2} + \gamma^2 \alpha_{kj}    (A.12)

and

    \beta_j \leftarrow \frac{X_j(x_0) y_0}{\sigma_0^2} + \gamma^2 \beta_j    (A.13)

so it appears we need not store any data points, but should just store [α] and [β] and update them as new data are accumulated.

A useful measure we have neglected to calculate so far is χ², the chi-square statistic itself. In (partial) matrix notation Eq. A.2 can be written

    \chi^2 = \sum_i \frac{y_i^2}{\sigma_i^2} - 2 [\beta]^T \cdot \mathbf{a} + \mathbf{a}^T \cdot [\alpha] \cdot \mathbf{a}    (A.14)
           = \sum_i \frac{y_i^2}{\sigma_i^2} + \mathbf{a}^T \cdot ([\alpha] \cdot \mathbf{a} - [\beta]) - [\beta]^T \cdot \mathbf{a}    (A.15)
           = \sum_i \frac{y_i^2}{\sigma_i^2} - [\beta]^T \cdot \mathbf{a}    (A.16)
           = \delta - [\beta]^T \cdot \mathbf{a}    (A.17)

where the third line follows from the normal equations (Eq. A.6), and in the last the remaining dependence on the data history has been collected into a new variable δ,

    \delta = \sum_i \frac{y_i^2}{\sigma_i^2}.    (A.18)

Then, similarly to Eqs. A.12 and A.13, δ can be updated as more information is accumulated,

    \delta \leftarrow \frac{y_0^2}{\sigma_0^2} + \gamma^2 \delta    (A.19)

without requiring the entire data history.

Finally, it may be useful to record the number of points accumulated. But because each point loses relevance as it gets "older" we should likewise discount this measure, giving an effective memory

    N^* \leftarrow 1 + \gamma^2 N^*    (A.20)

(not to be confused with the number of parameters M). So, to store all relevant historical information we need only remember [α], [β], δ, and N*, for a total of M² + M + 2 numbers, regardless of how many data points have been acquired. Fig. A.2 shows that for many practical problems discounted least-squares fitting requires less storage than the standard moving window. Although it has not been tested, I expect a similar condition to hold for processing time. As the reader can justify, all of these values should be initialized (prior to any data) with null values: [α] = 0, [β] = 0, δ = 0, and N* = 0.

Figure A.2: Discounted least-squares fitting has a computational storage advantage over moving windows of N data points when N > M² + M + 2, where M is the number of parameters to be fitted.

A.4 Memory

For traditional least-squares fitting it is well known that if the measurement errors of y_i are distributed normally then the method is a maximum likelihood estimation and the expectation value of Eq. A.2 evaluates to

    \langle \chi^2 \rangle = N - M    (A.21)

because each term (y_i − y(x_i))/σ_i should be distributed normally with mean zero and variance one and there are N − M degrees of freedom to sum the variances over. Similarly with discounting, assuming (y_i − y(x_i))/σ_i^0 has variance one (notice this is the unscaled error),

    \langle \chi^2 \rangle = \sum_i \gamma^{2i} \left\langle \left[ \frac{y_i - y(x_i)}{\sigma_i^0} \right]^2 \right\rangle - M    (A.22)
                           = \sum_i \gamma^{2i} - M    (A.23)
                           = N^* - M    (A.24)

from Eq. A.20.
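The storage argument above translates almost directly into code. Below is a hedged sketch (Python/NumPy; the class name DiscountedLSQ and its methods are hypothetical, not from the thesis) that keeps only [α], [β], δ and N* and applies the updates of Eqs. A.12, A.13, A.19 and A.20 as each new datum arrives.

```python
import numpy as np

class DiscountedLSQ:
    """Discounted least-squares state: [alpha], [beta], delta and N*.

    Only M*M + M + 2 numbers are stored, no matter how many data points
    have been seen (cf. Fig. A.2).
    """

    def __init__(self, basis, M, gamma):
        self.basis = basis                  # callable x -> length-M array of X_j(x)
        self.gamma2 = gamma ** 2
        self.alpha = np.zeros((M, M))       # [alpha] = 0
        self.beta = np.zeros(M)             # [beta]  = 0
        self.delta = 0.0                    # delta   = 0
        self.nstar = 0.0                    # N*      = 0

    def update(self, x0, y0, sigma0):
        """Fold in the newest datum (x0, y0, sigma0), discounting the past."""
        X0 = np.asarray(self.basis(x0))
        self.alpha = np.outer(X0, X0) / sigma0**2 + self.gamma2 * self.alpha  # Eq. A.12
        self.beta = X0 * y0 / sigma0**2 + self.gamma2 * self.beta             # Eq. A.13
        self.delta = (y0 / sigma0)**2 + self.gamma2 * self.delta              # Eq. A.19
        self.nstar = 1.0 + self.gamma2 * self.nstar                           # Eq. A.20

    def parameters(self):
        """Best-fit parameters a and covariance C = [alpha]^-1 (Eq. A.9), via SVD."""
        C = np.linalg.pinv(self.alpha)
        return C @ self.beta, C

    def chi2(self):
        """Chi-square from the stored summaries (Eqs. A.16-A.17)."""
        a, _ = self.parameters()
        return self.delta - self.beta @ a

# usage: quadratic basis with discounting factor gamma = 0.98
fit = DiscountedLSQ(lambda x: np.array([1.0, x, x * x]), M=3, gamma=0.98)
for t, y in enumerate([1.0, 1.2, 0.9, 1.4, 1.1]):
    fit.update(float(t), y, sigma0=0.1)
print(fit.parameters()[0], fit.chi2(), fit.nstar)
```

Because only M² + M + 2 numbers are retained, the cost of each update is independent of how much history has been seen, which is the point of Fig. A.2.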
Conversely, it is more natural to set 7 such that it produces the desired memory via ^ a * = Jim N* = T—„7  (A"25) 7(A^ax) = W l - T 7 r - - (A-26) V •'"max A . 5 U n k n o w n measurement errors On occasion measurement uncertainties are unknown and least-squaxes fitting can be used to recover an estimate of these uncertainties. Be forewarned that this technique assumes normally distributed (around the curve) y data with identical variances. If this is not the case, the results become meaningless. It also precludes the use of a "goodness-of-fit" estimator (such as the incomplete gamma function, see [20, Sect. 6.2] because it assumes a good fit. We begin by assuming o\ = 1 for all data points and proceeding with our calculations of a and x 2 - If all (unknown) variances are equal a* = o\ then Eq. A.24 actually becomes (x2} = (N* - M)o*2 (A.27) so the actual data variance is best estimated by ..2 cr *2 N* -M (A.28) We can update our parameter error estimates by recognizing that, from Eqs. A.4 and A.9, the covariance matrix is proportional to the variance in the data, so Cjk^a*2C]k. (A.29) A . 6 Forecast ing Forecasting via curve fitting is a dangerous proposition because it requires extrap-olating into a region beyond the scope of the data, where different rules may apply and, hence, different parameter values. Nevertheless, it is often used simply for its convenience. We assume the latest parameter estimations apply at the forecasted point x and simply use Eq. A . l to predict yf = y(x) = ^ajXj(x). (A.30) 3 183 The uncertainty in the prediction can be estimated from the covariance ma-trix. Recall, the definition of variance is Vsx[z] = ((z-(z))2) (A.31) and the covariance between two variables is defined as Cov [zi,z 2] = ((zi - (zi)) (z2 - (z2))) (A.32) so Eq. A . l has variance Var [y{x)] = Var Y^o.jXj[x) (A.33) = ^ £ ( « i - ( a ; > ) ) (A.34) = Y,Xj(x)((aJ-(aj))(ak-M))Xk(x) (A.35) jk = Y,Xj(x)CjMx) (A-36) jk where C is the covariance matrix with possible updating, in the absence of mea-surement errors, according to Eq. A.29. The above gives the uncertainty in y(x) but in the derivation it was assumed that the observed y-values were distributed normally around the curve where y(x) represents the mean of the distribution. Similarly for the prediction, y(x) is the prediction of the mean with its own uncertainty—on top of which there is the mea-surement uncertainty of data around the mean ameas. These two uncertainties are mutually independent so the variances of the two simply add to give the cumulative variance of the prediction Var[ y / ] = Va r [ y ( x ) ] +a m e a s (A.37) = Y.Xoix)CJkXk{x)+ul^. (A.38) jk A.6.1 Unknown measurement errors If the measurement errors are not known in advance, but are calculated from Eq. A.28 then the above formula should be rewritten Var [yf] = a * 2 [^X^C^X^x) + lj (A.39) where Cjk in this equation, are the covariances without rescaling. 184 A.7 Summary Discounted least-squares curve fitting differs from the traditional linear least-squares method in that the uncertainties of older data are artificially amplified as new data are acquired, effectively discounting the relevance of older data. Discounting pro-vides a very efficient method of storing the entire data series in only M2 + M + 2 values, where M is the number of parameters to be fit, regardless of the length of the series. Discounting also smooths the fit, reducing the effects of outliers. It has been demonstrated how discounted least-squares can be used for fore-casting. 
Whether it is valid depends very much on the time series in question, and its consistency. If the fitting parameters vary on time scales of the same order or smaller than the memory N* of the fit then the forecasts wil l not be reliable. (Of course, a suitable model of the time series is necessary as well.) I have found no evidence of discounting being applied to curve fitting before; the only similar procedure I have found is "exponential smoothing", a technique which uses damping coefficients to smooth forecasts. However, being such a simple premise I am confident this technique has already been discovered, I just don't know where to look. 185 Appendix B Sampling discrete processes Frequently computer simulations generate synthetic Brownian motion via a simple random walk at discrete intervals. Sampling of such a process to get the distribution of increments p(x) can be problematic because of introduced artifacts which bias the statistics. One often-used statistic is the (excess) kurtosis, defined as Kurt[z] = ^ - 3 (B.l) where fj,k is the fc'th (centered) moment of the distribution ^ = ([*-<*>]*}. (B.2) The kurtosis is useful because it quantifies the "weight" of the distribution tail (far from the mean). For the Gaussian the excess kurtosis is zero (because / X 4 = 3/z2.) compared to which a negative kurtosis indicates less weight in the tails and a positive indicates more. Difficulties arise, however, if the Brownian motion is generated by a discrete process as will be demonstrated in two examples below. Unless great care is taken, the kurtosis may be artificially inflated by regular sampling. B.l Simple random walk Consider a discrete Brownian process with normally-distributed (zero mean, unit variance) jumps at regular intervals of r = 1 (without loss of generality). If this process is sampled at regular intervals of A 7^ r , as demonstrated in Fig. B . l , some intervals will have more "jumps" than others so the distribution of increments will not be Gaussian. 186 Is 6 o d Time Figure B. l : When a random walk is generated at some regular interval and sampled at another, A, the number of jumps between samples will vary. To be precise, let A = n + r where n = [AJ is the largest integer not greater than A (the floor of A) and 0 < r < 1 is the remainder. Then each interval will span at least n jumps, spanning n + 1 with the probability r. Since each jump x is normally distributed JV(:r;0,1) with zero mean and unit variance, j jumps are also normally distributed with zero mean and variance j, denoted by N(x;0,j). The distribution of increments of the random walk, sampled at intervals of A = n + r is then given by RW(x; 0, A) = (1 - r)N(x; 0, n) + rN(x; 0, n + 1). (B.3) Calculating the first four moments of the increment distribution is very straight-forward since l*k[RW{x; 0, A)] = (1 - r)»k[N(x; 0, n)] + r^k[N(x; 0, n + 1)] (B.4) and the normal distribution has moments n\ — 0, /i2[N(x;0,j)] = j , n$ = 0, and /i4 = 3^2- Therefore, the moments of RW are Hi = 0 (B.5) M 2 = (1 - r)n + r(n + l ) = n + r = A (B.6) M3 = 0 (B.7) Hi = 3(1 - r ) n 2 + 3r(n + l ) 2 = 3 (A 2 + r ( l - r)). (B.8) 187 1 Q - 4 I I I I I I I I I I 1 0 1 2 3 4 5 6 7 8 9 10 Sampling interval A Figure B.2: The kurtosis is only zero at integer values of the sampling interval A and diverges as the sampling interval approaches zero. Notice that the variance of the distribution is simply A, exactly the same as for continuous Brownian motion sampled at intervals of A. 
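The mixture result of Eqs. B.3-B.8 is easy to verify by simulation. The sketch below (Python/NumPy; my own illustration, with hypothetical function names) generates unit-variance jumps at integer times, samples the walk at a non-integer interval Δ, and compares the sample excess kurtosis (Eq. B.1) with the value implied by the moments in Eqs. B.6 and B.8.

```python
import numpy as np

def excess_kurtosis(x):
    """Eq. B.1: mu_4 / mu_2^2 - 3, using centered sample moments."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0

def sampled_rw_increments(n_jumps, delta, rng):
    """Random walk with N(0,1) jumps at t = 1, 2, 3, ..., sampled at
    t = delta, 2*delta, ... (cf. Fig. B.1); returns the increments."""
    walk = np.concatenate(([0.0], np.cumsum(rng.normal(size=n_jumps))))
    t = np.arange(1, int(n_jumps / delta)) * delta
    samples = walk[np.floor(t).astype(int)]        # position = sum of jumps at times <= t
    return np.diff(samples)

rng = np.random.default_rng(1)
delta = 1.05                                       # Delta = n + r with n = 1, r = 0.05
r = delta - int(delta)
x = sampled_rw_increments(10**6, delta, rng)
print(excess_kurtosis(x))                              # sample estimate
print(3 * (delta**2 + r * (1 - r)) / delta**2 - 3.0)   # from Eqs. B.6 and B.8
```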
In fact, all three of the lowest moments are identical to Brownian motion, lulling us into a false sense of security. However, the fourth moment differs and the excess kurtosis, which is zero for Brownian motion, is now Kmt[RW{x; 0, A)] = 3 * ^ ^ (B.9) which, on the surface, would seem to indicate the distribution has fat tails. The kurtosis is only zero at integer values of A (r = 0) and is a maximum for any n when r = n/(l + 2n) as shown in Fig. B.2. In particular, the kurtosis diverges as the sampling rate accelerates Kmt[RW] -> ^ as A -> 0, (B.10) a result of the Dirac delta function N(x;0,0) dominating the distribution, scaling the variance down faster than the fourth moment. Even though all the evidence presented suggests that the distribution of increments in the random walk truly does have fat tails when sampled at non-integer intervals A, it is actually just an artifact of sampling. Since we are getting an overlap of two Gaussian distributions, with variances n and n + 1, the center of the distribution is dominated by the smaller variance 188 -8 -6 -4 -2 0 2 4 6 8 Increment x Figure B.3: The distribution of increments for the random walk appears to have fatter tails than a normal distribution with the same variance when sampled at 2 intervals of A = 1.05. However, the tails still drop off as e _ I . contribution and the tails are dominated by the larger variance. Hence, the second moment of the random walk is scaled down by the smaller variance but the fourth moment is scaled up by the larger. The net effect is the illusion of fat tails in the distribution. 2 However, the tails of the distribution still fall off as e~x , as demonstrated in Fig. B.3, so the term "fat tails" is misleading, usually being reserved for simple exponential or power law tails. Notice that the center of the distribution behaves as a normal with variance A and the tail also behaves as a normal, with variance n + 1, but weighted by r. The crossover between the two regimes, after some algebra, is found to be [ln(n + 1) - 21^ 7 - ^ ) 1 A ( n + 1) L — - j ' • (B.ll) This indicates the scale of increments, x sa ± x C ; f ° r which the distribution will appear most strongly non-Gaussian. Next we consider a process generated at Poisson intervals rather than regular. 189 B . 2 Poisson B r o w n i a n mot ion In this section we again consider a discrete Brownian motion but, in this case, the intervals between the jumps are Poisson-distributed instead of regular. The Poisson distribution gives the probability of j events within a time interval t given an average event rate r = 1 (without loss of generality), P{j,t) = e-tjv (B.12) Given normally-distributed jump sizes the distribution of j jumps is N(x;0,j) so the distribution of increments of the Poisson Gaussian process at intervals of A is oo PG(x;0, A) = £ P ( j , A)N(x;0,j). (B.13) j=0 The analytic solution for the distribution of increments is challenging but the moments of the distribution are relatively easy to compute, r °° tik[PG(x;0,A)] = dxY,P(3,&)N(x;0J)xk (B.14) J j=o oo -= £ P ( j , A ) dxN(x;0,j)xk (B.15) j=o J oo = £ P ( j , A ) M i V ( a ; ; 0 , j ) ] , (B-16) j=o depending directly on the moments of the normal distribution (which were presented in the last section). From the identity ex = ^jX^/jl, the first four moments of the Poisson Gaussian are Mi = 0 (B.17) H2 = A (B.18) M3 = 0 (B.19) Hi = 3A(A + 1). (B.20) Again, the first three moments are unchanged from the normal distribution but the kurtosis becomes Kurt[PG(x;0,A)] = |- (B.21) for all A. 
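The Poisson case can be checked the same way: drawing the number of jumps per sampling interval from a Poisson distribution of mean Δ and summing that many Gaussian jumps reproduces the excess kurtosis 3/Δ of Eq. B.21. A minimal sketch (Python/NumPy, illustrative only):

```python
import numpy as np

def poisson_brownian_increments(n_samples, delta, rng):
    """Increments of a Poisson-jump Brownian motion sampled every delta:
    the number of N(0,1) jumps per interval is Poisson(delta), and given
    j jumps the increment is N(0, j) (Eq. B.13)."""
    jumps = rng.poisson(delta, size=n_samples)
    return np.sqrt(jumps) * rng.normal(size=n_samples)

rng = np.random.default_rng(2)
for delta in (0.25, 1.0, 4.0):
    x = poisson_brownian_increments(10**6, delta, rng)
    x = x - x.mean()
    kurt = np.mean(x**4) / np.mean(x**2)**2 - 3.0
    print(delta, kurt, 3.0 / delta)      # sample estimate vs. Eq. B.21
```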
(This form was also observed for the random walk in the limit A —> 0.) So, again the kurtosis diverges as the sampling interval drops towards zero. 190 OH Increment x Figure B.4: Discrete Brownian motion with Poisson-distributed jump intervals has tails which fall off exponentially (with a decay constant of 0.72), instead of as e~x , when sampled at regular intervals (A = 1). In the case of the random walk we found the excess kurtosis arose from the 2 overlap of two Gaussians but the tails still fell off as e~x . However, for Poisson Brownian motion the distribution tails are much heavier. A synthetic dataset gen-erated from a Poisson Brownian motion sampled at intervals of A — 1 (Fig. B.4) shows that the tails fall off only exponentially, B.3 Sampling Evidently, by generating a synthetic Brownian motion at non-uniform intervals, the illusion of fat tails can be achieved by simply sampling the process regularly. However, the underlying process is still generated by Gaussian-distributed jumps and, over long timescales, still looks like Brownian motion. The easiest way to avoid these artifacts is to not sample the process in "real time" but in "event time." That is, take a single sample after each event. Then, the underlying jump process will be revealed without any complications from zero or multiple events per sample. Unfortunately, in some cases the available data do not allow for the de-termination of individual events. In this case, a very high frequency sampling is recommended and all intervals with zero increment should be discarded. High fre-191 quency sampling minimizes the likelihood that multiple jumps could occur in any one interval but increases the likelihood of zero increments. By discarding these null events only the intervals with a single increment remain. (This also discards actual jump events of size zero but this should have a minimal bias on the statistics since a jump size of identically zero has a negligible probability measure.) 192 Appendix C Long-range memory: The Hurst exponent Brownian motion (in one dimension) is a random walk on the line where the step length is given by a mean zero Gaussian (normal) probability distribution. Since each of the steps are independent the cumulative position X is known to obey (X(t)-X(0)) = 0 (C.l) (fX{t) - AT(0)] 2} 1 / 2 oc \t\l/2 (C.2) so the standard deviation from the origin grows as t 1 / 2 . Mandelbrot and Van Ness [103] introduced fractional Brownian motion (fBm) as a generalization to processes which grow at different rates tH ([XH(t)-XH(0)]2)1/2 <x\t\H (C.3) where 0 < H < 1 is called the Hurst exponent. Successive increments £// of a fractional Brownian motion are called frac-tional Gaussian noise (fGn) ZH(t)=XH(t + 6)-XH(t) (CA) where 5 can always be rescaled to one (to be discussed). The autocorrelation function (which measures the covariance of a data series with itself at some lag r) is formally defined as C[j) = ~ [ M * ~ T ) ~ ~ T ) ) ] ) (C.5) 193 For an fGn process the definition is [103,104] C ( T ) = \(\r + 1| 2 H - 2 \r\2H + \r - 1|2H) (C.6) which is obviously zero for H = 1/2 (except for r = 0 where the autocorrelation is always one) while for general H ^ 1/2 and large r lim C(T) OC r T - > 0 0 2H 2H OC T OC T R2H (1 + T - 1 ) 2 " - 2 + ( 1 - T - 1 ) 2 " (1 + 2 t f r - 1 + H(2r7 - l ) r - 2 ) - 2 + (• • •)] OC T 2H-2 (C.7) so correlations decay slowly and the resulting fractional Brownian motion exhibits long memory effects. 
Correlations are positive for H > 1/2 (persistence) and nega-tive for H < 1/2 (antipersistence) as shown in Fig. C l . (Note that fBm is not the only framework for generating long range memory effects: for instance, fractional ARIMA(0,d,0) processes also exhibit scaling with an exponent H = d + 1/2 [105].) As for standard Brownian motion, all fBm series are self-affine [105,106] XH(at)laHXH(t) (C.8) meaning that the series appears statistically identical under rescaling the time axis by some factor a and the displacement XH by aH'. Hence, fBm lacks any character-istic time scale and when generating or sampling an fBm series, an arbitrary step length of one unit may be used without loss of generality [107]. Self-affine signals can be described by a fractal dimension D which is related to the Hurst exponent through D = 2 — H for fBm [108,109]. (The fractal dimension D can be loosely interpreted as the "number of dimensions" the signal fills up. For example, notice that in Fig. C l the H = 0.1 signal "fills in " significantly more space than H = 0.9 and, consequently, has a higher fractal dimension.) The power spectrum (defined as the amplitude-squared contributions from the frequencies ±/, S(f) = \FH(f)\2+ \FH(-f)\2 where FH is the Fourier transform of XH [20]) of fBm also demonstrates scaling behaviour. The exact spectrum is difficult to compute but for low frequencies it can be approximated by a power law S(f) ~ l/f2H+1 [105] (see Fig. C.2) which corresponds to long-term spatial correlations. Flicker or l/fa noise with a « 1 is ubiquitous in nature (see Ref. [12] and references therein) and some of it may be attributable to long-memory fBm processes [110]. Note that from the definition of the Fourier transform the derivative of fBm, fractional Gaussian noise, also has a low frequency power law spectrum but with exponent reduced by 2, i.e.. 1/'f2H~l. 194 Figure C l : Sample fractional Brownian motion time series with different Hurst exponents: antipersistent H = 0.1 (top) has negative long-range correlations, un-correlated H = 0.5 (center) is standard Brownian motion, and persistent H = 0.9 (bottom) has positive long-range correlations. 195 l-l QJ I QJ o P . S-l QJ o 105 io 4 1-103 k 101 10° 0.001 -i—i—i—i—i 1111 106 105 i o 4 ^ 103 t-102 IO1 10° 0.001 107 F 106 r 105 r 104 r 103 E-102 fc-101 10° io- 1 # = 0.1 0.01 0.1 frequency l 1—i—i—III # = 0.5 _l I I L 0.01 0.1 frequency i + i i i 111 i 1—i—i I I I I -H = 0.9 0.001 0.01 0.1 frequency Figure C.2: Power spectral densities for the fractional Brownian motion time series shown in Fig. C l . The points are from finite samples of 1000 points each and the line represents the theoretical spectrum. For low frequencies the power spectrum is well approximated by a power law 1/f2H+1. 196 Fractional Brownian motion has been criticized because it lacks a physical interpretation and because the process has an unrealistic infinite memory [111,112]. However, it suits our purposes here because it is a mathematically elegant extension of standard Brownian motion which introduces long-range memory effects and can be characterized by a single parameter H. Hence, it is an ideal experimental control for testing procedures of measuring the Hurst coefficient in real data sets. C l Synthesis Before we can test various methods of estimating the Hurst exponent, we need some control data sets with known exponents. This data must be synthesized from the first principles of fractional Brownian motion as defined above. 
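Before turning to synthesis, note that the exact fGn autocorrelation (Eq. C.6) and its slow power-law decay (Eq. C.7) are simple to evaluate numerically. The following sketch (Python/NumPy, my own illustration) also checks the large-lag behaviour against the leading term H(2H−1)τ^{2H−2} implied by the expansion in Eq. C.7.

```python
import numpy as np

def fgn_autocorrelation(tau, H):
    """Theoretical autocorrelation of fractional Gaussian noise (Eq. C.6):
    C(tau) = 0.5 * (|tau+1|^(2H) - 2|tau|^(2H) + |tau-1|^(2H))."""
    tau = np.abs(np.asarray(tau, dtype=float))
    return 0.5 * (np.abs(tau + 1)**(2*H) - 2*tau**(2*H) + np.abs(tau - 1)**(2*H))

# positive correlations for H > 1/2, negative for H < 1/2, zero for H = 1/2
for H in (0.1, 0.5, 0.9):
    print(H, fgn_autocorrelation([1, 10, 100], H))

# large-lag behaviour ~ H(2H-1) * tau^(2H-2) (cf. Eq. C.7)
H, tau = 0.9, 1000.0
print(fgn_autocorrelation(tau, H), H * (2*H - 1) * tau**(2*H - 2))
```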
The computational difficulty is that for i f 7^  1/2 fBm has an infinite dependence on its history so approximations are required. A number of generators have been proposed [104,106, 107,113] but most are slow and/or inaccurate. One of the most common techniques, Successive Random Addition (SRA) [114] is very fast but its correlation function does not match that of fBm. Another technique, the Spectral Synthesis Method (SSM) [82], uses the scal-ing behaviour l/f2H+1 of the power spectrum to generate synthetic data in fre-quency space and then inverse Fourier transform the data to recover the desired time series. Although simple and fast—the Fast Fourier Transform (FFT) algo-rithm only requires on the order of N log JV operations—it fails because the power law in the frequency domain only applies for low frequencies, as mentioned above. I prefer the generator by Vern Paxson [115] because it is quick and accurate. It also uses a Fourier transform but it uses an accurate approximation to the fBm correlation function to generate a proper power spectrum. The basic algorithm for generating a data set of N points with Hurst expo-nent H follows: 1. Find the smallest integer N% which is a power of 2 and is not smaller than 8AT. 2. Generate a discrete power spectrum for /, = i/Ng, i = 1,... ,N&/2 using Paxson's equations [115, Eqs. (4-6)] given here for convenience (see Ref. [116] for derivation): S(f)=A(f,H) |2TT/| -2H-1 (C.9) where A(f,H) = 2sin(7r#)r(2.r7 + 1) (1 - cos(2?r/)) (C.10) 197 and B'i(f,H) = [1.0002- 0.000842/] B'3(f,H) B'3(f,H) = B3(f,H)-2-™5H-JA (C. l l ) (C.12) are improved approximations of B3(f,H) naf + bi + ai + bi + ai + bi ai+bi+4 + bj (C.13) 87Ttf where d d' 27T(fc + /) 2n(k-f). -2H-1 -2H (CU) 3. Choose a zero-amplitude null component of the power spectrum 5(0) = 0 to detrend the fGn increments in real space (zero mean). 4. Multiply each component of the power spectrum by a Poisson distributed uniform deviate nj with mean (n) = 1 This simulates the noise associated with a real data series, for which uncer-tainties in the power spectrum are multiplicative [20, p. 552]. 5. Construct the complex Fourier space representation of the series fi, i = —iVg/2,..., +N%/2 from the power spectrum using random phases 0 < 9/ < 2n Randomizing the phases does not disturb the power spectrum and ensures the finite-sample correlation function converges to the proper theoretical form in the limit N ->• oo [107]. 6. Compute the inverse Fourier transform c]f/(<j),? = 1... A^ 8 and discard the imaginary components to get a fractional Gaussian noise series 7. Pick a random subset of length N of the series and discard the remainder. This minimizes wrap-around effects from the Fast Fourier Transform [20,110, 117,118] and gives the illusion of a non-stationary series (to simulate real data, for which the stationarity may be difficult to decide). Note that Paxson does not consider subsampling in his original algorithm. S(f) <- T}fS(f). (C.15) (C.16) (C.17) 198 8. Finally, to convert to a fractional Brownian motion, simply integrate XH(t) = XH(t-l)+^H(t). (C.18) Paxson's method is accurate [115], computationally simple, and fast (most of the computation is in the Fourier transform so it still only requires on the order of iVlog JV operations). One's first instinct to check for long-range correlations in a data set may be to simply test how quickly the autocorrelation function (Eq. C.5) decays with large lags. 
C.2 Analysis

One's first instinct to check for long-range correlations in a data set may be simply to test how quickly the autocorrelation function (Eq. C.5) decays at large lags. This proves to be a poor choice, however, because antipersistent data are difficult to distinguish from uncorrelated data: the correlations can be mistaken for noise fluctuations around zero. A method to reliably estimate the Hurst coefficient from a time series would therefore be a useful way of testing for and quantifying long-range correlations.

The oldest and still most common method is due to Hurst [119], who noticed that the range R of the depth (or cumulated influx) of water behind a dam over a span of time τ was related to the standard deviation S of the influx over the same period through

    R/S ∝ τ^H   (C.19)

where H should be 1/2 for random, uncorrelated processes [120]. Hurst's method, Rescaled Range or R/S analysis, was to sample non-overlapping subsets of length τ from a time series and calculate the average R/S statistic. Repeating over a wide range of τ-values and recording the data on a double-logarithmic graph, the Hurst exponent should emerge as the slope of a straight line, log(R/S) = H log τ + C. Unfortunately, despite its extensive usage [21,76,111,121-123], Hurst's rescaled range analysis has been shown to be a poor estimator of H [108,112,118,120], with a consistent bias towards H = 0.7 and requiring a large data set for convergence.

Another common technique for testing for correlations is shuffling the order of the data and comparing the statistics of the original data with the shuffled. Shuffling destroys the correlations in the data, but care must be taken to detrend the data as well. Persistent data series are characterized by large low-frequency components which make the data series appear non-stationary (notice, for example, the trend in the H = 0.9 series in Fig. C.1). Shuffling without first removing this trend would not destroy these low-frequency correlations, which extend throughout the entire dataset. Shuffling is a valuable way of testing for correlations but, in itself, it doesn't specify any statistical techniques for distinguishing the original from the shuffled series, and we have already seen that the correlation function and rescaled range analysis are inadequate.

A number of alternatives have been proposed, including autocorrelation analysis [82], Fourier analysis [82,110], and maximum likelihood estimators. The advantage of the latter is that they are not graphical techniques but numerical—they simply return the best estimate of the Hurst exponent directly. Unfortunately, they require (at least) an assumption about the form of the long-range dependence (such as fBm or fractional ARIMA) and perform poorly if the assumption is incorrect [124].

Each of the above methods suffers from biases and slow convergence (a large dataset is required to reduce the bias). However, two methods have been consistently better, requiring smaller datasets and exhibiting less bias [109,117,125]: dispersional analysis and scaled-window variance analysis. Both of these methods are graphical, producing a power-law relationship from which the exponent can be read off as the slope of the line when using double-logarithmic axes.

C.2.1 Dispersional analysis

Dispersional analysis, also known as the Aggregated Variance method [105], averages the differenced fGn series over bins of width τ and calculates the variance of the averaged dataset. Given a fGn series ξ_H(i), i = 1, ..., N, a particularly simple but effective version of the algorithm follows (a code sketch is given below):

1. Set the bin size to τ = 1.

2. Calculate the standard deviation σ_τ of the N data points and record the point (τ, τ·σ_τ).

3. Average neighbouring data points and store in the original dataset,

    ξ_H(i) ← ½ [ξ_H(2i−1) + ξ_H(2i)],   (C.20)

and rescale N and τ appropriately,

    N ← N/2,  τ ← 2τ.   (C.21)

4. As long as more than four data points remain (N > 4) return to Step 2. (The reader may prefer to require more than four bins to reduce noise.)

5. Perform a linear regression on the log-log graph,

    log(τ·σ_τ) = H log τ + C;   (C.22)

the calculated slope is the best estimate of H.

Recording τ·σ_τ in Step 2 instead of just the standard deviation σ_τ is not standard but it simplifies the regression because the Hurst exponent can simply be read off the graph instead of H − 1. Fig. C.4 shows that dispersional analysis performs significantly better than rescaled range analysis.
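A compact Python version of this procedure might look as follows (the function name is my own; it assumes NumPy and takes a fractional Gaussian noise array, such as the increments of a synthesized fBm series):

```python
import numpy as np

def dispersional_hurst(fgn, min_points=4):
    """Dispersional (Aggregated Variance) estimate of H from an fGn series.

    Follows the binning scheme above: record (tau, tau * std), halve the
    series by averaging neighbours, and fit a line on log-log axes.
    """
    x = np.asarray(fgn, dtype=float)
    tau, taus, scaled_sd = 1, [], []
    while len(x) > min_points:
        taus.append(tau)
        scaled_sd.append(tau * x.std())
        m = len(x) // 2
        x = 0.5 * (x[0:2*m:2] + x[1:2*m:2])   # average neighbouring points
        tau *= 2
    slope, _ = np.polyfit(np.log(taus), np.log(scaled_sd), 1)
    return slope   # best estimate of H

# Example: uncorrelated Gaussian noise should give H close to 0.5.
rng = np.random.default_rng(1)
print(dispersional_hurst(rng.standard_normal(4096)))
```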
C.2.2 Scaled Window Variance analysis

The other well-received method, Scaled Window Variance analysis (SWV), also known as Detrended Fluctuation Analysis [126] or Residuals of Regression [105], applies to the cumulated fBm series instead. Given a fBm series X_H(i), i = 1, ..., N, my own variation of the algorithm follows (a code sketch is given below):

1. Split the series into M = ⌊N/τ⌋ (where ⌊x⌋ is the floor operator—returning the greatest integer not greater than x) evenly-spaced bins of size τ = 16 (SWV is inaccurate for smaller τ [109]),

    X_H^{(k)}(j) = X_H((k−1)κ + j),  j = 1, ..., τ,   (C.23)

where

    κ = ⌊(N − τ)/(M − 1)⌋.   (C.24)

This allows the option of setting a minimum and maximum on the number of bins, M_min ≤ M ≤ M_max. Setting M_min larger than N/τ will necessarily result in overlapping bins, but this effect has been tested [118] and the benefit of the larger sample size outweighs the influence of the cross-correlations introduced.

2. For each bin k, k = 1, ..., M, detrend the local series by subtracting off the trend line

    X̃_H^{(k)}(j) = m j + b.   (C.25)

Three options for calculating the trendline have been tested [125]:

(a) No detrending: m = 0, b = 0. This is only recommended for N < 2^9 data points.

(b) Bridge detrending. Form a line between the first and last point in the bin:

    m = [X_H^{(k)}(τ) − X_H^{(k)}(1)] / (τ − 1),  b = X_H^{(k)}(1) − m.   (C.26)

Recommended for N > 2^{12} data points.

(c) Linear detrending. Perform a linear least-squares regression over the entire bin to calculate m and b. Recommended for intermediate N.

3. Calculate the residuals after subtracting off the trend line,

    X̂_H^{(k)}(j) = X_H^{(k)}(j) − X̃_H^{(k)}(j).   (C.27)

4. Calculate the standard deviation of the residuals in each bin, σ_τ^{(k)}, and compute the average and standard deviation of these samples,

    σ_τ = ⟨σ_τ^{(k)}⟩_k,  Δσ_τ = std(σ_τ^{(k)}).   (C.28)

5. If the average standard deviation σ_τ is non-zero, convert it to a log scale, σ_logτ = log σ_τ, and plot σ_logτ ± Δ_logτ versus log τ. The uncertainty on the log scale can be approximated by

    Δ_logτ ≈ (Δσ_τ / σ_τ) log e.   (C.29)

6. Double the bin size, τ ← 2τ, and repeat while N > 2τ.

7. Perform a linear regression on the log-log graph, log σ_τ = H log τ + C; the calculated slope is the best estimate of H.

Sample fits using the SWV method (with at least M_min = 4 bins) are shown in Fig. C.3 using the same data as before. Notice that these data sets are rather small (1000 data points each) but, even so, the accuracy is remarkably good. When compared with Hurst's rescaled range analysis (see Fig. C.4) it becomes clear that the SWV method is superior (also edging out the dispersional method).
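A simplified Python sketch of this estimator is given below. It is my own implementation, not the thesis code: it uses only non-overlapping bins, so the evenly-spaced binning of Eq. (C.24) and the M_min/M_max option are omitted, and bins are indexed from zero rather than one. Run on the cumulated output of a generator with known H, the fitted slope should recover the input exponent with accuracy comparable to Fig. C.3.

```python
import numpy as np

def swv_hurst(fbm, min_tau=16, min_bins=4, detrend="linear"):
    """Scaled Window Variance estimate of H from a cumulated fBm series.

    Simplified sketch of the algorithm above, with linear or bridge
    detrending inside each non-overlapping bin.
    """
    x = np.asarray(fbm, dtype=float)
    n = len(x)
    taus, sigmas = [], []
    tau = min_tau
    while n // tau >= min_bins:
        sds = []
        for k in range(n // tau):
            w = x[k * tau:(k + 1) * tau]
            j = np.arange(tau)
            if detrend == "bridge":              # line through first and last points
                m = (w[-1] - w[0]) / (tau - 1)
                b = w[0]
            else:                                # least-squares line over the bin
                m, b = np.polyfit(j, w, 1)
            sds.append(np.std(w - (m * j + b)))  # std of residuals in this bin
        taus.append(tau)
        sigmas.append(np.mean(sds))              # average over bins
        tau *= 2
    slope, _ = np.polyfit(np.log(taus), np.log(sigmas), 1)
    return slope   # best estimate of H
```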
Figure C.3: Scaled window variance analyses for the fractional Brownian motion time series shown in Fig. C.1 (exact H = 0.1, 0.5, and 0.9, respectively). The estimated values of H shown represent the best-fit slopes of the lines. The analysis used M_min = 4 (see the text).

Figure C.4: Comparison of Hurst estimators using synthetic datasets of 1000 points each. The scaled-window variance method (SWV, *) performs significantly better than rescaled range analysis (R/S, +) and marginally better than dispersional analysis (Disp., ×). (The points are offset slightly to improve readability.)

Another good feature of the SWV method is that, like all graphical techniques, it clearly reveals multifractal behaviour. At some critical scale, the fractal dimension may cross over to a new value. This is characterized in graphical techniques by a discontinuity in the slope of the log-log graph. In particular, transitions to H = 0.5 are often observed for large τ, indicating a transition from correlated behaviour over short time scales to uncorrelated behaviour on long time scales. The memory duration can simply be read off the graph as the transition point τ.

C.2.3 Levy Flight

Despite its advantages, SWV fails in one respect: it is unable to distinguish between long-range correlations and uncorrelated Levy flight. Levy flight is similar to (traditional) Brownian motion in that it is a cumulated series of independent, identically-distributed (iid) increments, but in this case the increments are Levy distributed instead of normally distributed.

Normal or Gaussian distributions are well known to obey the following stability property: if x_1 and x_2 are both Gaussian-distributed random variables then their sum

    x = x_1 + x_2   (C.30)

is also Gaussian-distributed. Paul Levy discovered a general class of distributions which have the stability property [83,111,127]. Levy distributions generally have no closed analytical form but can be defined in terms of their characteristic function f(k) (the Fourier-space representation of the probability distribution) [127,128],

    ln f(k; α, β) = −|k|^α (1 − iβ tan(πα/2) sign(k)),   α ≠ 1
    ln f(k; α, β) = −|k| (1 + iβ (2/π) ln|k| sign(k)),   α = 1   (C.31)

where 0 < α ≤ 2 is a characteristic exponent and −1 ≤ β ≤ 1 is the skewness. Special cases of the Levy distribution include the Gaussian (α = 2, β = 0) and Cauchy (α = 1, β = 0) distributions.

The stability property says that the sum of a large number of iid Levy random variables will also be a Levy random variable with the same α and β, in apparent violation of the Central Limit Theorem. The paradox is resolved by recognizing that Levy distributions with α < 2 have power-law tails far from the origin,

    P(x) ~ 1/|x|^{1+α},  |x| → ∞,   (C.32)

so the variance ⟨x²⟩ is infinite for α < 2, whereas the Central Limit Theorem assumes a finite variance.
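Symmetric (β = 0) Levy-stable deviates are straightforward to generate. The sketch below uses the standard Chambers-Mallows-Stuck construction; this is my choice of method and may or may not coincide with the algorithm of Ref. [128] used for the tests that follow.

```python
import numpy as np

def symmetric_levy(alpha, size, rng=None):
    """Draw symmetric (beta = 0) Levy-stable deviates via the
    Chambers-Mallows-Stuck construction (valid for 0 < alpha <= 2)."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit-mean exponential
    if np.isclose(alpha, 1.0):
        return np.tan(v)                           # Cauchy special case
    return (np.sin(alpha * v) / np.cos(v) ** (1 / alpha)
            * (np.cos(v - alpha * v) / w) ** ((1 - alpha) / alpha))

# An uncorrelated Levy flight (alpha = 1.5), analogous to the fBm test series:
flight = np.cumsum(symmetric_levy(1.5, 1000, np.random.default_rng(2)))
```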
Another interesting property of Levy distributions is that the cumulative Levy flight X_α is self-affine, scaling as

    X_α(at) = a^{1/α} X_α(t),   (C.33)

which parallels the scaling relation Eq. C.8 for fractional Brownian motion. A consequence of this is that Hurst coefficient estimators which depend on this scaling property may erroneously predict positive long-term correlations with

    H = 1/α   (C.34)

when applied to uncorrelated Levy flights with 1 < α < 2.

To test for this effect, data sets of 1000 symmetric (β = 0) Levy-distributed random variables with various characteristic exponents α were synthetically generated (using a simple and elegant algorithm explained in Ref. [128]) and cumulated to produce a one-dimensional Levy flight. The synthetic data was then analyzed using the rescaled range, dispersional, and scaled-window variance techniques. The results shown in Fig. C.5 indicate that SWV is sensitive to Levy noise whereas R/S and Dispersion are not. (Note also that the Fourier spectrum of Levy flight still approximates a power law 1/f² with exponent 2, indicating no correlations.)

Figure C.5: Comparison of Hurst estimators on uncorrelated Levy flight with characteristic exponent α using synthetic datasets of 1000 points each. Rescaled range (R/S, +) and dispersional analysis (Disp., ×) perform well but scaled window variance analysis (SWV, *) performs poorly, especially for small α, tending towards the 1/α curve. (The points are offset slightly to improve readability.)

C.3 Conclusions

In summary, to test for long-range correlations in a data set, dispersional analysis is recommended. If more precision is required (especially for H near 1) and the increments are Gaussian-distributed, a scaled window variance analysis should be performed.

In this discussion we have explored tests for long-range correlations in fractional Brownian motion (correlated with Gaussian increments) and Levy flight (uncorrelated with non-Gaussian increments). These two extensions to Brownian motion are not exclusive. Fig. C.6 shows that they are complementary notions and the two ideas can be combined to produce fractional Levy motion (fLm) with correlated, non-Gaussian increments. There is very little literature on the subject but it may be a useful model for some natural phenomena [129]. I am unaware of any efficient algorithm to synthesize fLm but it would begin with the correlation function in Eq. C.5, which still applies.

Figure C.6: Schematic representation of the relation between fractional Brownian motion and Levy flight. Traditional Brownian motion sits at the intersection (H = 1/2, α = 2). The natural extension into the two-space is fractional Levy motion, which has correlated, non-Gaussian increments.
