UBC Theses and Dissertations

Non-linear exchange rate forecasting : the role of market microstructure variables Gradojevic, Nikola 2002

NON-LINEAR EXCHANGE RATE FORECASTING: THE ROLE OF MARKET MICROSTRUCTURE VARIABLES

by

NIKOLA GRADOJEVIC

B.Sc., University of Novi Sad, 1996
M.A., Central European University, 1998

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES

Department of Economics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

September 2002
© Nikola Gradojevic, 2002

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Economics
The University of British Columbia
Vancouver, Canada

Abstract

In this dissertation, we conduct a study of exchange rate models for the Canada/U.S. exchange rate. More specifically, we focus on their intra-day (high-frequency) and, subsequently, weekly forecast performances. All attempts to explain equilibrium exchange rates suffer from various problems: structural (macroeconomic) models used for out-of-sample forecasting produce poor forecasts. Given that different market participants trade based on private as well as public information sets, it is natural to assume that equilibrium exchange rate expectations are formed from a combination of macroeconomic fundamentals and market microstructure variables.
Chapter 1 motivates research in the area of non-linear microstructure exchange rate modeling, reviews the recent literature and introduces the general ideas behind this thesis. Chapter 2 outlines Artificial Neural Networks (ANNs) and other non-linear modeling approaches used in this research. Chapter 3 introduces a non-linear Canada/U.S. exchange rate microstructure model and provides strong evidence for the microstructure effects. Our horse race for forecast performance results in a non-linear ANN model as the winner. ANN models outperform random walk and linear models in a number of recursive out-of-sample forecasts. The daily forecasts produced by ANN models are statistically significant according to Diebold and Mariano (1995) statistics. Apart from the nearest neighbours model, other linear and non-linear models are unable to generate significant predictions. The inclusion of a microstructure variable, order flow, substantially improves the predictive power of both the linear and non-linear models. Our findings also indicate the necessity of embodying (in a non-linear sense) information not only from interbank order flows, but also from commercial client and foreign institution transactions. No matter which non-linear model is used, there is always a slight forecast gain when a dealer's private order flows are included in the set of explanatory variables. Chapter 4 describes fuzzy logic technology in the form of approximate reasoning as a method that can be used in economics when dealing with continuous and imprecise economic variables, when data for analysis are insufficient, and when a mathematical model of the process is unknown. Chapter 5 develops an original and novel approach to generating trading strategies in the foreign exchange (FX) market based on forecasts from the ANN. Neuro-fuzzy (NF) decision-making technology is designed and implemented to obtain the optimal daily currency trading rule.
We find that a non-linear ANN exchange rate microstructure model combined with a fuzzy logic controller (FLC) generates a set of trading strategies that, on average, earn a higher rate of return compared to the simple buy-and-hold strategy. We also find that after including transaction costs, the gains from the NF technology do not decline and even increase in some periods. Finally, we successfully apply the NF model to the problem of determining the FX market's sentiment as reflected by the chartists' trading signals during periods of strong depreciation.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
2 Artificial Neural Networks (ANNs) and Other Non-linear Methodologies
2.1 ANNs
2.1.1 Definition and structure
2.1.2 Learning and adapting
2.1.3 Backpropagation ANN
2.1.4 Recurrent ANN
2.2 NN method
2.3 TSK method
2.3.1 Introduction
2.3.2 TSK fuzzy model building
3 Exchange Rate Forecasting: A Market Microstructure Approach
3.1 Models of Exchange Rate Determination
3.2 Model Specification
3.2.1 Data description
3.2.2 Assessment of forecast performance
3.3 Empirical Results
3.3.1 Models based on individual order flows when controlled for the day-of-the-week effects
3.3.2 Parameter estimates and relative significance of inputs
3.3.3 Models based on individual and aggregate order flows when not controlled for the day-of-the-week effects
3.4 Conclusions
4 An Introduction to Fuzzy Logic
4.1 Introduction
4.2 Fuzzy System
5 Neuro-Fuzzy Decision-Making in Foreign Exchange Trading and Other Applications
5.1 Neuro-Fuzzy Design
5.2 Market Environment
5.3 Estimation Results
5.4 Excess Volatility and the NF Model
5.5 "Continuous" Trading Decisions and a Risk-Averse Investor
5.6 Conclusions and Further Research
Bibliography

List of Tables

3.1 Augmented Dickey-Fuller (ADF) unit root t-tests.
3.2 Out-of-sample forecast performance.
3.3 Two-step-ahead and three-step-ahead forecast performance of the ANNB2 model.
3.4 The ANN models with the greatest forecast improvements.
3.5 Estimated ANN's connection weights and node biases (Model 1).
3.6 Estimated ANN's connection weights and node biases (Model 2).
3.7 Estimated ANN's connection weights and node biases (Model 3).
3.8 Estimated ANN's connection weights and node biases (Model 4).
3.9 Relative contribution (pseudo-weights) of inputs for the ANN models.
3.10 Estimation Results For Linear Models.
3.11 RMSE (exp 10 ) for ANN, linear, and random walk models (j=1).
3.12 RMSE (exp 10 ) for ANN, linear, and random walk models (j=7).
3.13 The average percentages of correctly predicted signs for linear models 1 and 2 (LM 1 and LM 2), ANN models 1 and 2 (ANN 1 and ANN 2), and the random walk model.
3.14 PERC and RMSE (exp 10 ) statistics for the recursive estimation over the whole testing set (k=225).
3.15 PERC(POS) and PERC(NEG) for ANN models 1 and 2 (ANN 1 and ANN 2).
5.1 The intervals for discrete decisions on traded volume (based on a defuzzified output z).
5.2 The number of moving windows when the NF technology outperforms a simple buy-and-hold strategy.
5.3 The intervals for discrete trading signals the traders receive from the NF system (based on a defuzzified output z).
5.4 Observed ANN (model 2) direction-of-change forecasting statistics (frequencies) for July/August 1998 inputs (34 observations).
5.5 Observed ANN (model 2) direction-of-change forecasting statistics (frequencies) for November/December 1994 inputs (27 observations).
5.6 The number of moving windows when the NF technology outperforms risk-averse investor's strategies.

List of Figures

2.1 Computational element or node with N inputs and one output (weighted sum of inputs).
2.2 A three-layered ANN architecture.
2.3 A three-layered backpropagation ANN.
2.4 A three-layered Elman recurrent ANN.
3.1 Three-stage exchange rate determination from Lyons and Evans (2002).
3.2 Above: Aggregate (cumulative) order flow and log Canada/U.S. real exchange rate. Below: IB (cumulative) order flow and log Canada/U.S. real exchange rate.
3.3 Non-linear pattern of response (ANNBP2 model).
3.4 Non-linear pattern of response for the 1994 depreciation episode (ANNBP2 model).
3.5 ANN errors: training, validation, and testing.
3.6 An illustration of linear and ANN model exchange rate forecasts. Actual values are denoted by circles.
3.7 RMSE for ANN model 1, linear model 1, and random walk (j=1).
3.8 RMSE for ANN model 2, linear model 2, and random walk (j=1).
3.9 RMSE for ANN model 1, linear model 1, and random walk (j=7).
3.10 RMSE for ANN model 2, linear model 2, and random walk (j=7).
3.11 Example: recursive estimation PERC for ANN model 2, linear model 2, and random walk model (1-day forecast).
4.1 Fuzzy control system.
4.2 The example for the relationship between variables, fuzzy sets and triangular membership functions for the variable "trading action".
5.1.a Gaussian membership functions for the variable "exchange rate change forecast".
5.1.b Triangular membership functions for the variable "FX trader's action".
5.2 NF system for the FX trading. Two nonlinear models are used for forecasting.
5.3 An example of Mamdani fuzzy inference.
5.4.a NF and the buy-and-hold rates of return (ROR) for the 10-day moving-window (model 2 without the transaction costs) and NF excess returns.
5.4.b NF and the buy-and-hold rates of return (ROR) for the 10-day moving-window (model 2 with the transaction costs) and NF excess returns.
5.4.c NF and the buy-and-hold rates of return (ROR) for the 10-day moving-window (model 1 without the transaction costs) and NF excess returns.
5.4.d NF and the buy-and-hold rates of return (ROR) for the 10-day moving-window (model 1 with the transaction costs) and NF excess returns.
5.5.a NF and the buy-and-hold rates of return (ROR) for the 20-day moving-window (model 2 without the transaction costs) and NF excess returns.
5.5.b NF and the buy-and-hold rates of return (ROR) for the 20-day moving-window (model 2 with the transaction costs) and NF excess returns.
5.5.c NF and the buy-and-hold rates of return (ROR) for the 20-day moving-window (model 1 without the transaction costs) and NF excess returns.
5.5.d NF and the buy-and-hold rates of return (ROR) for the 20-day moving-window (model 1 with the transaction costs) and NF excess returns.
5.6 Gaussian membership functions for the variable "exchange rate change forecast".
5.7 Triangular membership functions for the variable "FX trader's action."
5.8 Tan-sigmoid function used to estimate the risk-averse investor's strategies.

Acknowledgments

First and foremost, I would like to thank my thesis supervisor, Dr. John Cragg, who played a key role in the conception and development of the entire project. It has been an honour to have the opportunity to work with John, first as a research assistant, and later as a Ph.D. candidate. In working with John, I learned something new and exciting not only about economics, but also about the world each time we would meet. His expertise, support, patience, and friendship have set an example I hope to match some day.
Furthermore, I would like to express my gratitude to Glen Donaldson for his constructive and invaluable ideas that have made their mark throughout the work. Michael Devereux also provided some critical and helpful comments. Very special thanks go to Angela Redish, without whose support and advice I would not have been able to complete this thesis. Thanks to all of my friends at the Bank of Canada including Peter Thurlow, Toni Gravelle, Jing Yang, and many more, who initiated and greatly contributed to this research. I also owe thanks to Dragan Kukolj (University of Novi Sad) and my former colleagues at the Faculty of Engineering who introduced me to the world of artificial intelligence and its applications in economics, and who helped me with the empirical part of my work. Marko Vujicic, Martin Berka, Christoph Schleicher and my other classmates have made the Ph.D. program fun and interesting. Marko also provided support and friendship during some difficult times. Many thanks to the economics department secretaries for their positive energy and assistance. Finally, this thesis would never have been possible without the love and support of my parents, Jovan and Ljubica Gradojevic, and my sister, Tina, who have never doubted my success on this journey. I would like to dedicate this thesis to my "princesses": my wife, Zana and my daughter, Marina. Thank you for all that I have become from the moment you enlightened my life.

Chapter 1 Introduction

Understanding exchange rate movements has long been an extremely challenging and important task for academic and business researchers. Not only is the exchange rate a significant determinant of aggregate demand in a small open economy such as Canada's, but it also responds immediately and materially to monetary policy (e.g., Clinton 2001). However, the response is not always predictable. This sometimes makes it difficult to achieve the desired results of monetary policy implementation.
Efforts to deepen our understanding of exchange rate movements have taken a number of approaches. Initially, efforts centred on the development of low-frequency macroeconomic (fundamental) empirical models. More recently, efforts have been aimed at the development of high-frequency models of the foreign exchange (FX) market, based on microeconomic (microstructure) variables. Throughout, however, forecasting models have been developed for obvious utilitarian purposes, but they have also served as a gauge of our understanding of exchange rate movements. Moreover, they can sometimes help to pinpoint where the gaps in our knowledge may lie, and therefore suggest new avenues of research. The exchange rate forecasting model developed in this thesis serves all of these purposes.

Given the failure of traditional FX rate models to explain and predict exchange rate fluctuations correctly (Meese and Rogoff 1983), we turn to the market microstructure of FX markets in chapter 3. In recent years considerable evidence has accumulated that the behavior of dealers and other market participants can influence equilibrium exchange rates (Lyons and Evans 2002, Yao 1997, Covrig and Melvin 1998). Inventory adjustments and bid-ask spread reactions to informative incoming order flows are two examples in which dealer behavior affects exchange rate determination. Indeed, given that different market participants trade based on private as well as public information sets, it is natural to assume that equilibrium exchange rate expectations are formed based on a combination of macroeconomic fundamentals and market microstructure variables (Goldberg and Tenorio 1997). The highest frequency used in this thesis is a daily frequency as we try to balance the tension between macroeconomic and microstructure effects. [1] Another motivation for using high-frequency data is the capability to develop and estimate a model based on one decade of data.
It is unrealistic to achieve a consistent and reliable FX model based on decades or even a century of data: markets are rapidly changing and one could in fact be analyzing a number of fundamentally different markets. Also, by using a great amount of data we can explore more complicated (non-linear) models and reduce the risk of missing important features of the short-term data generating process (DGP).

[1] The development of electronic trading and the Internet now allows us to follow price formation in real time. Ideally, in order to truly understand foreign exchange markets one can attempt to follow our approach on a tick-by-tick basis. In financial markets, the DGP is a complex network of layers where each layer corresponds to a particular frequency. As the highest available frequency for the microstructure data is daily, we leave this complete characterization of the true DGP to further research and aggregate to daily (and later weekly) information, assuming the influence of random noise does not have a significant impact on our findings. Thus, the messages from this thesis regarding the possible data generating mechanism of FX rates correspond to a particular frequency and require further investigation once the data become available.

This thesis is oriented toward generating new and useful theoretical and empirical insights about trader behavior, the informational role of trades and the way that information is processed in the FX market. First, in chapter 3, we extend the portfolio shifts FX model of sequential trading (Lyons and Evans 2002) to find that the fusion of fundamental exchange rate determinants used by the Bank of Canada and different order flow types matters in a highly non-linear fashion. In chapter 2 we introduce the artificial neural network (ANN) model. The application of such a model brings examination of high-frequency exchange rate determination to a new level.
It is quite plausible that the informational content of trades and public information are processed in a sophisticated, biological neural network-like model. In contrast to Lyons and Evans (2002), first, we apply ANNs (a non-linear model), then, we use a larger amount of data (more than ten years of daily data) and, finally, we obtain statistically significant forecasts. The inclusion of additional traditional and microstructure variables (relative to Lyons and Evans 2002) and the use of ANNs result in out-of-sample forecasts superior to those from random walk and linear exchange rate models. This concept is pioneering in that we are able to establish a successful (and robust) non-linear ANN-based relationship between diverse traditional and microstructure variables and the Canada/U.S. exchange rate. Prior to this work, ANN exchange-rate models were built mostly on macroeconomic, technical trading or autoregressive foundations (Gencay 1999, Kuan and Liu 1995, Plasmans et al. 1998) and the evidence from them was mixed.

In chapter 3 we also compare ANN results to a number of other popular non-linear approaches and draw important conclusions from our empirical exercise. More specifically, we focus on the nearest neighbours method (Diebold and Nason 1989), the backpropagation ANN, the recurrent ANN and the Takagi-Sugeno-Kang (TSK) fuzzy clustering model (Takagi and Sugeno 1985, Setnes et al. 1998) to estimate non-linear conditional-mean functions. Our exercise calls attention to non-linearities (ANNs) and order flows as important ingredients for exchange rate determination and very short-term forecast improvements. Thus, not only are strategic behavior and information asymmetries important, but, according to our findings, so is the very ability to understand and process that information. In market microstructure theory, order flows are principal price determinants.
Moreover, they are proxies for some of the short-run exchange rate volatility sources (technical trading, bandwagon effect, over-reaction to news, speculation, etc.). Having stated this, we do not preclude macroeconomic fundamentals from being the underlying determinant.

Our findings strongly support the microstructure approach to exchange rates. Moreover, we point to important implications beyond those given by Lyons and Evans (2002) and Lyons (2001). What makes their dynamic, temporary equilibrium structure so easy to characterize is its linear structure. However, we find that market participants (i.e., dealers) are likely to pursue more complex or even mixed strategies. This brings us to the notion of a non-linear equilibrium where market makers create a non-linear relation between public information, microstructure effects and exchange rates. One of the major strengths of this thesis is its strong empirical evidence, which calls for extending the traditional linear microstructure characterization of equilibria as well as fundamental models of FX markets. We attempt to describe this new model only to a limited extent and more extensive research is required. In light of these findings, we challenge theoretical researchers to derive and explain this non-linear microstructure model more formally.

In chapter 5, we develop an original and novel approach to generating trading strategies in the foreign exchange (FX) market based on forecasts from the ANN (Gradojevic and Yang 2000). In recent literature that tackles the problem of generating profitable trading strategies, either the efficacy of technical analysis is tested (Lo and Wang 2000, LeBaron, Brock and Lakonishok 1992) or the search for improved technical trading models is conducted (Allen and Karjalainen 1999). We depart from recent research in this area in that our strategies contain macroeconomic and microstructure effects processed by the ANN.
In addition to this, we employ a fuzzy logic approach to obtain a smoother decision surface. Previous studies were able to generate only buy, hold or sell trading signals, while we model trading strategies as less discrete phenomena, in a form that recommends the fraction of our current endowment to be traded. The fundamentals of fuzzy logic are presented in chapter 4. Fuzzy logic is based on natural language and represents reasoning related to realistic agent behaviour in the FX market. Our approach is motivated by existing evidence that implies human reasoning can be modeled as a fuzzy logic model (Smithson, 1987; Smithson and Oden, 1999). More precisely, we attempt to mimic the behavior of currency traders by assigning them a set of continuous rules that explain the environment. This is more difficult to capture with conventional techniques. The idea is to extend classical Boolean logic by means of semantics that accounts for imprecision - the lack of sharply defined class membership criteria as opposed to the presence of random factors. This departs from probability theory, where the imprecision originates in the random nature of the process rather than in any vagueness of human decision-making. The advantage of this approach is that we eliminate the subjective views (i.e., precision errors) of the trading strategy given the exchange rate forecast.

Generally, we do not follow the neoclassical economic approach where economic agents are capable of logically inferring their price expectations and consequent trading strategy formation. In our view, real-world FX markets involve excessive market-created uncertainty (Arthur 1994). [2] This is a consequence of asymmetric expectations which force the agent to make subjective price predictions and corresponding strategies, given the subjective expectations of other agents. Fuzzy logic could provide an alternative form of reasoning to assist in agents' decision-making.
We combine fuzzy reasoning with highly accurate ANN-generated forecasts into a dynamic neuro-fuzzy (NF) model. This enables us to investigate the evolution of NF-defined strategies over various time periods. The objectives of this chapter are, first, to model a fuzzy reasoning process as an extension to the ANN model, second, to investigate the NF model's implications for aggregate market behaviour (i.e., market efficiency) and, finally, to apply the NF model to the problem of determining the FX market's sentiment, something any central bank needs to be able to assess.

[2] See Tay and Linn (2001), who make similar assumptions.

The central part of our NF prediction and strategy-generating model is a rule-based deduction scheme in which agents (or more precisely traders) rely entirely on a fuzzy set of rules due to limitations in their ability to process public and private information. Our objectives inevitably force us to consider the notion of market efficiency. In its strong form the efficient market hypothesis (EMH) states that current prices reflect all known information (public and private) and no investor can consistently beat the market, i.e., earn extra profit (Fama 1965). A weak form of the EMH leads to the random walk hypothesis (RWH), which implies that on average today's price is the best predictor of tomorrow's price. In this thesis, we do not formally test the RWH or EMH by statistical means. Rather, we compare forecasts and corresponding strategies from our non-linear models to those generated by the random walk model (buy-and-hold strategies). In other words, we assume that the data have been generated by a non-linear DGP, possibly perturbed by additional noise.

Our contribution to the research in this area is threefold. First, we develop an innovative fuzzy decision-making system based on both public and private information.
This gives us the ability to receive very precise strategy recommendations not only on how to trade but also on how much to trade. Therefore, our approach can also be seen as a natural extension to the classic portfolio selection model (Markowitz 1952) and applied to other decision-making problems. Having received the optimal portfolio weights from Markowitz's problem, the NF model can give more information about both asset-specific and macroeconomic risks for particular assets. Second, our non-linear forecasts are significantly and consistently superior to the random walk forecasts. However, our daily trading strategies do not consistently make excess returns over simple buy-and-hold strategies. Further, when transaction costs are accounted for, the number of "inefficient" periods is roughly the same compared to when we neglect them. This is not an obvious indication supporting the EMH, as there is a strong upward trend in the Canada/U.S. exchange rate testing series which clearly gives an advantage to buy-and-hold trading strategies. In addition, the NF-based approach may incur transaction costs on a daily basis, while under a buy-and-hold strategy they are incurred only once (at the beginning of the buy-and-hold period). The findings support our approach to move from discrete (buy/hold/sell) market decisions to less discrete and more precise "fuzzy logic-based" recommendations. Finally, we successfully apply the NF system to assess the FX market's sentiment, as reflected by the chartists' trading signals, during periods of strong depreciations. We also find that the use of a continuous non-linear function that mimics the fuzzy logic decision-making can benefit a risk-averse trader.
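
The contrast drawn in this chapter between discrete buy/hold/sell signals and a recommendation of how much to trade can be illustrated with a small Python sketch. It is not the NF system of chapter 5: the function names, the `band` and `steepness` parameters, and the sign convention (positive means buy) are assumptions made only for this illustration, and the hyperbolic tangent is used simply as one example of a continuous non-linear function.

```python
import math


def discrete_signal(forecast_change, band=0.001):
    """Three-way rule: +1 (buy), -1 (sell) or 0 (hold).

    `band` is a hypothetical no-trade threshold on the forecast change.
    """
    if forecast_change > band:
        return 1
    if forecast_change < -band:
        return -1
    return 0


def continuous_fraction(forecast_change, steepness=500.0):
    """Fraction of the current endowment to trade, in (-1, 1).

    A smooth, saturating function of the forecast change; `steepness`
    is a hypothetical scaling constant, not a value from the thesis.
    """
    return math.tanh(steepness * forecast_change)


if __name__ == "__main__":
    for change in (-0.004, -0.0005, 0.0, 0.0005, 0.004):
        print(change, discrete_signal(change), round(continuous_fraction(change), 3))
```

The discrete rule throws away the size of the forecast once the threshold is crossed, whereas the continuous rule scales the traded fraction with the strength of the forecast and saturates for large forecast changes.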

Chapter 2 Artificial Neural Networks (ANNs) and Other Non-linear Methodologies

2.1 ANNs

2.1.1 Definition and structure

ANNs represent a general class of non-linear models that have been successfully applied to a variety of problems such as medical diagnostics, product selection, system control, pattern recognition, functional synthesis, and forecasting (e.g., econometrics). The most important financial applications of ANNs include option pricing, bankruptcy prediction, stock market prediction, and exchange rate forecasting. [3]

[3] A more extensive review of empirical studies involving ANNs can be found in Qi (1996).

Figure 2.1 Computational element or node with N inputs and one output (weighted sum of inputs), $Y = f\left(\sum_i W_i X_i + \theta\right)$. Three representative examples of non-linearities are shown: hard limiter, threshold logic, and sigmoid.

ANNs are composed of simple computational elements or nodes (Lippmann 1987). Figure 2.1 shows the simplest node, which sums N weighted inputs and conveys the outcome either to other nodes or out of the ANN. The node is characterized by an internal threshold or offset $\theta$ (or bias) and by its type of specified non-linearity. Figure 2.1 illustrates three common types of non-linearities used in ANNs: hard limiters, threshold logic elements, and sigmoids. More complex nodes might even include integration or other mathematical operations. Neural network models differ in topology, node characteristics, and training or learning rules. These rules fix the initial set of weights and indicate how weights should be altered and adjusted during use to improve performance.

There are several ways to structure neural networks. Typically, the elements are arranged in groups or layers. Fewer layers limit these networks when modeling a functional representation of data, a typical econometrics problem.
However, the development of learning algorithms has made multilayered networks feasible. They are ideal for functional form determination and they are normally structured as three-layered networks. Figure 2.2 depicts such a network with four inputs, three nodes in the hidden layer, and one output.

Figure 2.2 A three-layered ANN architecture.

The three layers are as follows:

• Input layer: The neural network receives its data in the input layer. The number of nodes (i.e., neurons) in this layer depends on the number of inputs to a model and each input requires one neuron. For example, in functional synthesis (this thesis' scope of study), inputs are exogenous variables, that is, observations of interest.

• Hidden layer: The hidden layer lies between the input and output layers; there can be many hidden layers. They are analogous to the brain's interneurons, a place where the hidden correlations of the input and output data are captured. This allows the network to learn, adjust, and generalize from the previously learned facts (e.g., data sets) to the new input. As each input-output set is presented to the network, the internal mapping is recorded in the hidden layer. Unlike any other classical statistical methodology, this gives the system intuitive predictability and intelligence. The number of hidden layers is determined by a trade-off between network intuitive ability and efficiency. A priori, the optimal number of hidden layers is not clear. With too many hidden layers, an overcorrecting problem arises: a network is "overtrained" or "overfitted," which prevents it from learning a general solution. On the other hand, too few layers will inhibit the learning of the input-output pattern. Typically, the number of hidden layers and nodes inside the network is determined through experimentation, and this thesis follows that technique.

• Output layer: Having been trained, the network responds to new input by producing an output that represents a forecast. During training, the network collects the in-sample output values in the output layer.

ANNs are among the most recent techniques used in non-linear modeling. This is partly due to some modeling problems encountered in the early stage of development within the field of ANNs. In the earlier literature, the statistical properties of ANN estimators and their approximation capabilities were questionable. For example, there was no guidance on how to choose the number of neurons and their configurations in a given layer or how to decide on the number of hidden layers in a given ANN. Recent developments in the ANN literature, however, have provided the theoretical foundations for the universality of feedforward ANNs as function approximators. The results in Cybenko (1989), Funahashi (1989), Hornik et al. (1989, 1990), and Hornik (1991) indicate that feedforward ANNs with sufficiently many hidden units and properly adjusted parameters can approximate an arbitrary function arbitrarily well. Hornik et al. (1990) and Hornik (1991) further show that the feedforward ANN can also approximate the derivatives of an arbitrary function.

Various attempts at exchange rate forecasting with ANNs are reported in Verkooijen (1996) and Plasmans, Verkooijen, and Daniels (1998), who estimated structural macroeconomic exchange rate models. In contrast, Hu et al. (1999), Zhang and Hu (1998), Kaashoek and van Dijk (1999), Kuan and Liu (1995), and Jamal and Sundar (1997) modeled the exchange rate solely as a function of its past lags. Evans (1997) forecasted the UK/DM exchange rate based on the exchange rates of several other currencies. Refenes and Azema-Barac (1994) investigated ANN applications in financial asset management. Moody and Wu (1996) studied optimization of trading systems and portfolios with ANNs.
Donaldson and Kamstra (1997) constructed a semiparametric non-linear GARCH model based on feedforward ANNs to forecast the volatility of stock returns. Franses and Griensven (1998) indicated that utilizing past buy-sell signals of foreign exchange rate series in a feedforward ANN improves out-of-sample generalizations. Fernandez-Rodriguez et al. (2000) investigated the profitability of simple technical trading rules based on ANNs. Their results indicated that technical trading rules are superior to a buy-and-hold strategy in the absence of trading costs. Franses and Draisma (1997) proposed a method based on ANNs to investigate how and when seasonal patterns in macroeconomic time series change over time. Yang et al. (1997) used probabilistic ANNs in bankruptcy prediction. Blake and Kapetanios (2000) proposed a test for ARCH that uses ANNs. Heinemann (2000) investigated how adaptive learning of rational expectations may be modeled with the help of ANNs. Finally, in pricing of options, Hutchinson et al. (1994), Garcia and Gencay (2000) and Gencay and Qi (2001) demonstrated that feedforward ANNs can be successfully used to estimate a pricing formula for options, with good out-of-sample pricing and delta-hedging performance.

2.1.2  Learning and adapting

Neural networks develop an inner structure to solve problems. Through the training process, connection weights rearrange their values and reveal a data pattern. Thus, the fundamental feature of neural networks is that they are trained, not programmed. A neural network learns with each new datum (an input-output combination) introduced into it during training. Every processing element responds to its input, adjusting its behavior. The network calculates the output in accordance with the elements' transfer function and learns by adjusting the input weights and biases. The equation that explains this change is called the learning law.
There are two different learning modes: supervised and unsupervised. The supervised learning mode presents input-output data combinations to the network. Consequently, the connection weights and node biases, initially randomly distributed, adjust their values to produce output that is as close as possible to the actual output, i.e., the learning method (recursive algorithm) tries to minimize the current errors of all processing elements. With each subsequent cycle the overall network error between the desired and the actual output will be lower. Eventually, the result is a minimized error between the network and actual output (or the desired network accuracy), as well as an internal network structure that represents the general input-output dependence. In a one-layered network, it is easy to control each individual neuron and observe the input-output pattern. In multi-layered neural networks, supervised learning becomes difficult because it is harder to monitor and correct neurons in hidden layers. Supervised learning is frequently used for network decision, memorization, and generalization problems.

The unsupervised learning mode adjusts weights and biases without any external influence. There are no concrete ANN output data to correct its pattern identification. In this mode, the ANN is able to recognize statistical regularities in its input space, automatically categorize them and develop behavior for different classes of inputs. Unsupervised learning can be used for classification, marketing and anomaly detection.

2.1.3  Backpropagation ANN

The backpropagation ANN, one of two network types applied in this thesis, is probably the most commonly used ANN.
It is characterized by hidden layers and the generalized Delta rule for learning: if there is a difference between the actual and the desired output pattern during training, then the connection weights and biases, denoted by the vector $\lambda$, have to be recursively adjusted to minimize the mean-squared error (Qi 1996, Kuan and White 1994, Van Eyden 1996). Mathematically, $\lambda$ is chosen to minimize the following loss function:

$$\min_{\lambda} \; \frac{1}{M} \sum_{t=1}^{M} (y_t - \hat{y}_t)^2$$

where $M$ is the size of the training sample, $y_t$ is the sample output and $\hat{y}_t$ is the estimated output value. We will show how to estimate $\hat{y}_t$ in a three-layered backpropagation ANN (Figure 2.3).

Suppose that the ANN is composed of $p$ input and $q$ hidden nodes, where the signals at each layer are denoted by $x_{it}$ $(i=1,\ldots,p)$ and $h_{jt}$ $(j=1,\ldots,q)$, respectively. The layers are characterized by two arbitrary types of non-linearities: $\psi$ and $\sigma$. The backpropagation learning algorithm requires continuously differentiable non-linearities, and the most commonly used type is the sigmoid logistic (or logsig) function [4]:

$$f(w) = \frac{1}{1 + e^{-w}}$$

As specified before, nodes in the hidden layer produce the signals $h_{jt}$ from the inputs, while a single output node transforms them into $\hat{y}_t$. Specifically,

$$h_{jt} = \psi\Big(\alpha_{j0} + \sum_{i=1}^{p} \alpha_{ij} x_{it}\Big), \quad j=1,\ldots,q$$

$$\hat{y}_t = \sigma\Big(\beta_0 + \sum_{j=1}^{q} \beta_j h_{jt}\Big) = \sigma\Big(\beta_0 + \sum_{j=1}^{q} \beta_j \, \psi\Big(\alpha_{j0} + \sum_{i=1}^{p} \alpha_{ij} x_{it}\Big)\Big)$$

where $\alpha_{ij}$ and $\beta_j$ denote the appropriate connection weights between the adjacent layers. Subscripts 0 for $\alpha$ and $\beta$ stand for ANN biases. Studies by Cybenko (1989), Hornik et al. (1989) and Funahashi (1989) show that the above non-linear representation with sigmoid non-linearities can approximate a large number of mappings between inputs and outputs reasonably well. This makes the ANN a very useful non-parametric technique, and there is no need for the unjustified restrictions often present in econometric modeling.

[4] Other types of transfer functions used in this research are tan-sigmoid ($f(w) = \frac{e^{w} - e^{-w}}{e^{w} + e^{-w}}$) and purelin ($f(w) = w$).
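The forward pass of the three-layered network described above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function names (`logsig`, `tansig`, `purelin`, `forward`) and the array layout (one row of `alpha` per hidden node) are assumptions made here for clarity.

```python
import numpy as np

def logsig(w):
    """Sigmoid logistic transfer function f(w) = 1 / (1 + exp(-w))."""
    return 1.0 / (1.0 + np.exp(-w))

def tansig(w):
    """Tan-sigmoid transfer function, with outputs between -1 and 1."""
    return np.tanh(w)

def purelin(w):
    """Linear transfer function f(w) = w."""
    return w

def forward(x, alpha0, alpha, beta0, beta, psi=logsig, sigma=logsig):
    """Forward pass of a network with p inputs, q hidden nodes, one output.

    Assumed (hypothetical) layout:
      x      : shape (p,)   input signals x_it
      alpha0 : shape (q,)   hidden-layer biases
      alpha  : shape (q, p) input-to-hidden weights, one row per hidden node
      beta0  : scalar       output bias
      beta   : shape (q,)   hidden-to-output weights
    """
    h = psi(alpha0 + alpha @ x)       # hidden signals h_jt
    y_hat = sigma(beta0 + beta @ h)   # network output
    return h, y_hat

# With all weights and biases at zero, every logsig node outputs logsig(0) = 0.5.
x = np.array([0.2, -0.1, 0.4])
h, y_hat = forward(x, np.zeros(2), np.zeros((2, 3)), 0.0, np.zeros(2))
```

Passing `sigma=purelin` gives a linear output node, which is the configuration the text later describes for the Elman network's output layer.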
The Delta rule recursive algorithm readjusts connection weights and biases starting at the output nodes and working back to the first hidden layer (Lippmann 1987). Connection weights are adjusted by

$$\gamma_{ij}(t+1) = \gamma_{ij}(t) + \eta \, \delta_j \, x'_i$$

where:

$\gamma_{ij}(t)$ = the weight from hidden node $i$ or from an input to node $j$ at time $t$.
$x'_i$ = either the output of node $i$ or an input.
$\eta$ = the positive constant that measures the speed of the convergence of the weight vector (gain term or learning rate).
$\delta_j$ = an error term for node $j$ ($k$ runs over all nodes in the layer that node $j$ feeds into):

$$\delta_j = \begin{cases} y_j (1 - y_j)(d_j - y_j), & \text{when } j \text{ is an output node} \\ x'_j (1 - x'_j) \sum_k \delta_k \gamma_{jk}, & \text{when } j \text{ is a hidden node} \end{cases}$$

where:

$d_j$ = the desired output of node $j$.
$y_j$ = the actual output of node $j$.

The node biases are adapted through a similar iterative algorithm by treating them as connection weights on links from constant-valued inputs.

At least three layers are required: input, hidden, and output. The hidden layer is very important, since it enables the ANN to extract patterns and to generalize. Even though a hidden layer should be large, one must be careful not to deprive the network of its generalizing ability, which happens when the network starts memorizing rather than deducing. On the other hand, a hidden layer that is too small could reduce the accuracy of recall. Connections feed forward only between adjacent layers. For the transfer function, the backpropagation ANN employs the sigmoid function (mentioned above). The learning algorithm that controls backpropagation follows a number of steps:

1)  Initialization: Initialize connection weights and neurons to small random values.

2)  Data introduction: Introduce the continuous data set of inputs and actual outputs to the backpropagation network.

3)  Calculation of outputs: Calculate the outputs, and adjust the connection weights several times applying the current network error.
4)  Adjustments: Adjust the activation thresholds and weights of the neurons in the hidden layer according to the Delta rule.

5)  Repeat steps (2) to (4) until a desired network accuracy (error) is reached.

Backpropagation estimation techniques are important for large samples and real-time applications since they allow for adaptive estimation. However, they may not fully utilize the information in the data. White (1989) showed that the recursive estimator is not as efficient as the non-linear least squares estimator. An important aspect of the backpropagation methods is the choice of the learning rate $\eta$. The inefficiency of backpropagation originates in keeping the learning rate constant in an environment where the influence of random movements in the inputs is not accounted for by the target. This would lead the parameter vector to fluctuate indefinitely, i.e., there would be no convergence. A minimum requirement is to gradually drive the learning rate to zero to achieve convergence. In fact, White (1989) demonstrated that $\eta$ has to be chosen not as a vanishing scalar, but as a gradually vanishing matrix of a very specific form. These arguments on learning rates are only valid if the environment is stationary, which is the case in this work (as is the constant learning rate). However, instead of tuning the network's learning rate, we attempt to balance the bias and variance and avoid non-convergence with early stopping. This may slightly deteriorate the model's performance, and we acknowledge that following White (1989) would probably yield an optimal estimator.

2.1.4  Recurrent ANN

In feedforward neural nets such as backpropagation ANNs there is no feedback between any two layers. In line with dynamic modeling, it would be useful to both detect and generate time-varying patterns. Backpropagation ANNs can account for lagged variables by simply including them in the set of inputs.
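Including lags in the input set amounts to building a lagged design matrix before training. The sketch below is illustrative only: it assumes a univariate series, and the helper name `lagged_inputs` and its argument `n_lags` are hypothetical, not taken from the thesis.

```python
import numpy as np

def lagged_inputs(series, n_lags):
    """Build (X, y) so that row t of X holds the n_lags values preceding y[t].

    Column 0 is the first lag, column 1 the second lag, and so on.
    The number of lags must be chosen by the user in advance.
    """
    series = np.asarray(series, dtype=float)
    T = len(series)
    X = np.column_stack(
        [series[n_lags - k : T - k] for k in range(1, n_lags + 1)]
    )
    y = series[n_lags:]
    return X, y

# Two lags of a short series: the target 3 is paired with lags (2, 1), etc.
X, y = lagged_inputs([1, 2, 3, 4, 5], n_lags=2)
```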
However, this requires knowledge of the number of lags to be included. A sophisticated way to circumvent this inconvenience is to allow for feedback connections within the ANN. Elman recurrent networks (Elman 1990) are backpropagation ANNs with the addition of a feedback connection from the output of the hidden layer to its input (Figure 2.4). Note that the Elman ANN has tan-sigmoid non-linearities (very similar to the sigmoid logistic function, but able to generate outputs between -1 and 1) in the hidden neurons and linear output layer neurons.

The output of a recurrent ANN is calculated in a slightly different manner:

$$h_{jt} = \psi\Big(\alpha_{j0} + \sum_{i=1}^{p} \alpha_{ij} x_{it} + \sum_{k=1}^{q} \rho_{kj} h_{k(t-1)}\Big), \quad j=1,\ldots,q$$

$$\hat{y}_t = \sigma\Big(\beta_0 + \sum_{j=1}^{q} \beta_j h_{jt}\Big) = \sigma\Big(\beta_0 + \sum_{j=1}^{q} \beta_j \, \psi\Big(\alpha_{j0} + \sum_{i=1}^{p} \alpha_{ij} x_{it} + \sum_{k=1}^{q} \rho_{kj} h_{k(t-1)}\Big)\Big)$$

where, in addition to the notation of the previous ANN type, $\rho_{kj}$ denotes the feedback connection weights $(k=1,\ldots,q)$ that are fed back into the $j$-th hidden neuron, and $h_{k(t-1)}$ are the lagged outputs from each of the $k$ hidden layer neurons.

By recursively substituting for $h_{k(t-1)}$ in the last expression for $h_{jt}$, $h_{jt}$ can be expressed as depending on the entire history of $x_{it}$. Further, it follows that $\hat{y}_t$ is a function of $x_{it}$ and its past lags. This "moving average" component is a crucial difference between feedforward and recurrent ANNs, which can thus capture more of the dynamics of $y_t$.

As in the case of the backpropagation ANNs, learning and adapting of the Elman ANN is performed with the generalized Delta rule. Naturally, due to the presence of the feedback connection, the iterative estimation becomes more complicated and the learning time for recurrent ANNs is typically much longer than that for backpropagation ANNs.

2.2  NN method

The NN method is based on the assumption that geometric patterns in the past of the time series, similar by some measure to the currently observed variables, can be used for forecasting the dependent variable (see, e.g., Yakowitz 1987, Cleveland 1979, Cleveland and Devlin 1988).
Specifically, we use locally-weighted regression (LWR) to estimate the one-step-ahead exchange rate change from a weighting scheme in which the weights are functions of the Euclidean distances. This approach is a philosophical departure from the Box-Jenkins methodology, where the forecasts are extracted from lagged observations and error terms (or external variables, as in the ARMAX model) rather than from the set of "related" observations of the independent variables, the nearest neighbours.

Suppose we are interested in forecasting $y_t$ one period ahead and $x_t$ is an $(m \times 1)$ vector of explanatory variables. Then, the non-linear model is

$$y_t = \phi(x_t) + \varepsilon_t, \qquad E[\varepsilon_t \mid x_t] = 0, \qquad t = 1,\ldots,T$$

where $\phi(\cdot)$ is an arbitrary, but fixed, non-linear function. LWR estimates $\phi(x^*)$, i.e., the value of $\phi$ at the specific point $x = x^*$. This is performed in several steps as follows:

1. Let $y^* = \phi(x^*)$ denote our forecast of interest and $x^*$ the point where we estimate $\phi(\cdot)$. We then organize the $x_t$ $(t = 1,\ldots,T)$ series into h-histories defined by:

$$x_t^h = (x_{t-h+1}, \ldots, x_{t-1})$$

The parameter $h$ is also called the embedding dimension.

2. The next step is to choose the $k$ h-histories "closest" to $x^*$, called the nearest neighbours ($x_k$). Let $\zeta$ be a smoothing constant such that $0 < \zeta < 1$, and let $k = \mathrm{int}(\zeta T)$, where $\mathrm{int}(\cdot)$ extracts the integer part of its argument. The LWR uses the $k$ observations nearest to $x^*$, where the proximity is measured with the most commonly used distance measure, the Euclidean distance:

$$d(x^*, x_k) = \Big[\sum_{j=1}^{m} (x^*_j - x_{kj})^2\Big]^{1/2}$$

According to the Euclidean distance, the $x_t$ are ranked and assigned specific weights. Observations not considered to be nearest neighbours of $x^*$ are assigned a weight of zero.

3. In this step, we follow Diebold and Nason (1990) to construct a tricube weighting function, where $x_k$ here denotes the $k$-th nearest neighbour of $x^*$:

$$w_t = \Big[1 - \Big(\frac{d(x_t, x^*)}{d(x^*, x_k)}\Big)^3\Big]^3$$

4. Our forecast is simply computed as:

$$\hat{y}^* = x^{*\prime} \hat{\beta}$$

where

$$\hat{\beta} = \arg\min_{\beta} \sum_{t=1}^{T} w_t (y_t - x_t' \beta)^2 .$$
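The four steps above can be sketched as follows. This is a hedged illustration rather than the code used in the thesis: the function name `lwr_forecast` is hypothetical, the rows of `X` are assumed to be already arranged as the explanatory vectors (h-histories), and a constant term is added to the local regression as an assumption of this sketch.

```python
import numpy as np

def lwr_forecast(X, y, x_star, zeta):
    """Locally-weighted regression forecast at x_star.

    X      : (T, m) array of explanatory vectors
    y      : (T,)   array of outcomes
    x_star : (m,)   point at which the forecast is wanted
    zeta   : smoothing constant in (0, 1]; k = int(zeta * T) neighbours
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    T = X.shape[0]
    k = max(int(zeta * T), 1)

    # Step 2: Euclidean distances to x_star and the k-th smallest distance.
    d = np.sqrt(((X - x_star) ** 2).sum(axis=1))
    d_k = np.sort(d)[k - 1]

    # Step 3: tricube weights, zero outside the k nearest neighbours.
    if d_k == 0.0:
        w = (d == 0.0).astype(float)
    else:
        u = d / d_k
        w = np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)

    # Step 4: weighted least squares (constant term is an assumption here).
    Xc = np.column_stack([np.ones(T), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xc * sw[:, None], y * sw, rcond=None)
    return float(np.concatenate(([1.0], x_star)) @ beta)
```

Scaling rows by the square root of the weights turns the weighted problem into an ordinary least-squares problem, so `np.linalg.lstsq` can be used directly. When the data are exactly linear, the forecast reproduces the linear function regardless of the weights, which is a convenient sanity check.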
In this thesis we select the Euclidean distance measure as the one most commonly used in the literature. Note that other distance measures could be employed rather than the Euclidean norm, such as Mean Absolute Deviation or even Range, but we leave them to further research. Further, the tricube weighting scheme is smooth in applying more confidence to the first nearest neighbour than to the second nearest, more to the second than to the third, and so forth. Thus, the weights decline as the Euclidean distance grows. Various weighting schemes other than Euclidean are presented in Robinson (1987). Stone (1977) proved the consistency (in the statistical sense) of NN estimators for different weighting schemes. Consistency of NN estimators (and therefore LWR) requires that the number of employed nearest neighbours goes to infinity with the sample size, but at a slower rate, that is, as $T \to \infty$, $k \to \infty$, but $(k/T) \to 0$. Clearly, consistency depends on our choice of $\zeta$. As $\zeta$ increases, the bias in $\hat{\phi}(x^*)$ increases and the sampling variability decreases. Thus, one needs to choose $\zeta$ to balance the trade-off between bias and variance. We find the optimal $\zeta$ by in-sample cross-validation. Once determined, $\zeta$ is fixed for out-of-sample estimation.

2.3  TSK method

2.3.1  Introduction

The fuzzy model is based on a learning-by-examples approach, establishing relations between the relevant variables in the form of if-then rules. One of the aspects that distinguish fuzzy modeling from black-box techniques like ANNs is that fuzzy models are to a certain degree transparent to interpretation and analysis. We present a data-driven Takagi-Sugeno-Kang (TSK) fuzzy rule-based model (Takagi and Sugeno 1985, Setnes et al. 1998). It requires only a small number of rules to describe highly complex non-linear models. The basic idea of the TSK model is that an arbitrarily complex model is a combination of mutually interlinked sub-models.
If $K$ regions, corresponding to individual sub-models, are determined in the state-space under consideration, then the behaviour of the system in these regions can be described with simpler functional forms. If the sub-model is assumed to be linear and one rule is assigned to each sub-model, the TSK fuzzy model can be represented with $K$ rules of the following form:

$$R_i: \text{ If } x_1 \text{ is } A_{i1} \text{ and } x_2 \text{ is } A_{i2} \text{ and } \ldots \text{ and } x_n \text{ is } A_{in} \text{ then } y_i = a_i^T x + b_i, \quad i = 1,2,3,\ldots,K \qquad (2.3.1)$$

where $R_i$ is the $i$-th rule, $x_1, x_2, \ldots, x_n$ are the input variables, $A_{i1}, A_{i2}, \ldots, A_{in}$ are the fuzzy sets assigned to the corresponding input variables, the variable $y_i$ represents the value of the $i$-th rule output, and $a_i$ and $b_i$ are parameters of the consequent function. The final output of the TSK fuzzy model for an arbitrary input $x_k$ can be calculated as a weighted average of the rule consequents:

$$\hat{y}_k = \frac{\sum_{i=1}^{K} \beta_i(x_k) \, y_i}{\sum_{i=1}^{K} \beta_i(x_k)}, \quad k = 1,2,3,\ldots,N \qquad (2.3.2)$$

where $\beta_i$ is the weight assigned to the $i$-th rule. Equation (2.3.2) can be written in a form that shows more explicitly the concept of the system description using local models. This expression then takes the form

$$\hat{y}_k = \sum_{i=1}^{K} \big[ w_{ik} \, (a_i^T x_k + b_i) \big], \quad k = 1,2,\ldots,N \qquad (2.3.3)$$

where $w_{ik}$ represents the normalized weighting factor (or firing strength) of the $i$-th rule for the $k$-th sample. Equation (2.3.4) shows that (2.3.2) and (2.3.3) are identical for the final output:

$$w_{ik} = \frac{\beta_i(x_k)}{\sum_{j=1}^{K} \beta_j(x_k)}, \quad k = 1,2,3,\ldots,N \qquad (2.3.4)$$

The TSK fuzzy rule-based model, as a set of local models, enables application of the linear least-squares (LS) method (Pedrycz et al. 1997). The LS algorithm requires a model that is linear in parameters. However, that does not imply that the model itself must be linear. We propose an extremely fast and accurate algorithm for generating the fuzzy model that is applicable to a variety of problems. The TSK algorithm combines clustering of the numerical input data and the LS method.
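The weighted-average output in (2.3.2) to (2.3.4) can be illustrated with a short sketch. The function name `tsk_output` and the array layout are assumptions made here; the firing strengths are taken as given inputs, since how they are obtained is a separate step.

```python
import numpy as np

def tsk_output(X, beta, A, b):
    """TSK model output as a firing-strength-weighted average of local models.

    X    : (N, n) input samples x_k
    beta : (N, K) unnormalized firing strengths beta_i(x_k), assumed given
    A    : (K, n) slope vectors a_i, one row per rule
    b    : (K,)   intercepts b_i
    """
    local = X @ A.T + b                          # (N, K): a_i^T x_k + b_i
    w = beta / beta.sum(axis=1, keepdims=True)   # normalized strengths (2.3.4)
    return (w * local).sum(axis=1)               # weighted sum (2.3.3)

# Two rules, one input: y_1 = x and y_2 = 3x + 1.
X = np.array([[1.0], [2.0]])
A = np.array([[1.0], [3.0]])
b = np.array([0.0, 1.0])
beta = np.array([[1.0, 1.0], [1.0, 3.0]])
y_hat = tsk_output(X, beta, A, b)
```

For the first sample the two rules fire equally, so the output is the plain average of 1 and 4; for the second the weights are 0.25 and 0.75 applied to 2 and 7.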
The first step is to identify the rule base structure. This step involves grouping of the input space using fuzzy clustering. Procedures for selection of the input variables and the optimal number of rules are not discussed (these issues are covered in considerable depth in, for example, Sugeno and Yasukawa 1993). The next step consists of identification of the parameters of the consequent part of the rule base. Keeping the conditional part of the rule fixed, the parameters $a_i$ and $b_i$ are determined using the LS method on the global level. Globally formulated LS yields the minimal parameter error estimate of the global non-linear model.

2.3.2  TSK fuzzy model building

Step I - Structure identification

Partitioning of the input space is performed in the first step, using a selected clustering method that utilizes the training (learning) data: $N$ observations of the exogenous variables, i.e., inputs $x_k$, $k=1,\ldots,N$. Let the input vector be of dimension $n$. Each obtained cluster represents a certain operating region of the system, where input data values are highly concentrated. The learning data, divided into these information clusters, are then interpreted as rules. Methods of fuzzy clustering, such as fuzzy c-means (FCM) and Gustafson-Kessel (GK), are convenient tools for partitioning of the input or input-output space (Forsythe et al. 1977, Jang et al. 1997). A central part of the fuzzy clustering algorithm consists of calculating the distance between the $k$-th sample, currently under consideration, and the centre of a given $i$-th cluster. In the case of the FCM algorithm, this distance is calculated using the Euclidean distance, which in the $l$-th iteration takes the following form:

$$\big(d_{ik}^{(l)}\big)^2 = \big(x_k - v_i^{(l)}\big)^T \big(x_k - v_i^{(l)}\big), \quad k = 1,2,\ldots,N \qquad (2.3.5)$$

The GK algorithm evaluates this distance using the transformed Euclidean distance.
Hence, in the $l$-th iteration of the GK algorithm, the distance is given by:

$$\big(d_{ik}^{(l)}\big)^2 = \big(x_k - v_i^{(l)}\big)^T \big[\det(F_i)^{1/n} F_i^{-1}\big] \big(x_k - v_i^{(l)}\big), \quad k = 1,2,\ldots,N \qquad (2.3.6)$$

where $F_i$ is the covariance matrix for the $i$-th cluster and $v_i$ is the centre of the $i$-th cluster. The GK algorithm is of a more general nature, since it forms hyper-ellipsoid clusters in the $n$-dimensional input space (those created by the FCM method are hyper-spherical). However, the FCM method is less numerically complex and therefore faster. For these reasons, in this thesis we use the FCM method.

The result of the clustering process is, apart from the matrix of cluster centres, a fuzzy partition matrix $U_{K \times N}$. This matrix contains values $\mu_{ik} \in [0,1]$, for $1 \le i \le K$ and $1 \le k \le N$, that represent the degree of membership of the $k$-th sample in the $i$-th cluster. Elements of the matrix $U$ are calculated using the following formula:

$$\mu_{ik} = \frac{1}{\sum_{j=1}^{K} \big(d_{ik} / d_{jk}\big)^{2/(m-1)}} \ \text{ when all } d_{jk} > 0, \quad \text{or } \mu_{ik} = 0 \ \text{ when } d_{jk} = 0 \text{ for some } j \ne i. \qquad (2.3.7)$$

The columns of the matrix $U$ must satisfy the following condition:

$$\sum_{i=1}^{K} \mu_{ik} = 1, \quad k = 1,2,3,\ldots,N. \qquad (2.3.8)$$

This condition corresponds to the condition of orthogonality of the input vector fuzzy set.

Step II - Identification of parameters of the consequent part of the rule

The parameter identification step encompasses the procedure for calculating the parameters of the TSK fuzzy model: the parameters of the linear regressions in the consequent part of the rule. The crucial problem is to determine the normalized firing strength of the rules, or, more precisely, to estimate the contribution of each rule to the final model output. First, it is necessary to determine the one-dimensional fuzzy sets $A_{ij}$ from the conditional part of the rule.
These fuzzy sets, denoted by $A_{ij}$, are obtained from the multidimensional fuzzy clusters (given by $U$) by projection onto the space of the input variable $x_j$:

$$\mu_{A_{ij}}(x_{kj}) = \mathrm{proj}_j(\mu_{ik}) \qquad (2.3.9)$$

where $\mu_{ik}$ is the level of belonging of the $k$-th observation (the vector $x_k$) to the $i$-th cluster, while $\mu_{A_{ij}}(x_{kj})$ is the level of belonging of the $j$-th input variable of the $k$-th observation (the $j$-th coordinate of the vector $x_k$) to the fuzzy set $A_{ij}$. Since all the functions $\mu_{A_{ij}}(x_{kj})$ under consideration represent membership functions, the condition $\mu_{A_{ij}}(x_{kj}) : \mathbb{R} \to [0,1]$ must hold true. The value of the firing strength of the $i$-th rule for each $k$-th input sample is calculated as an and-conjunction by means of the product operator:

$$\beta_{ik} = \beta_i(x_k) = \prod_{j=1}^{n} \mu_{A_{ij}}(x_{kj}), \quad k = 1,2,3,\ldots,N. \qquad (2.3.10)$$

Finally, the normalized firing strength $w_{ik}$ of a rule $R_i$ for the $k$-th sample is obtained using equation (2.3.4).

The $k$-th element on the main diagonal of the $N \times N$ diagonal matrix $W_i$ $(i=1,\ldots,K)$ is formed using the values of the normalized firing strengths from (2.3.4). Thus, the $N \times K(n+1)$ composite matrix $X'$ can be formed:

$$X' = [(W_1 X_e), (W_2 X_e), \ldots, (W_K X_e)] \qquad (2.3.11)$$

where the matrix $X_e = [X, 1]$ contains the rows $[x_k^T, 1]$. Finally, the LS parameters $a_i^T$ and $b_i$ of (2.3.1) and (2.3.3), which are in the consequent part of all TSK fuzzy rules, can be grouped into a $K(n+1) \times 1$ vector $\theta'$. These parameters are organized in the vector $\theta'$ as follows:

$$\theta' = [\theta_1'^T \;\; \theta_2'^T \;\; \ldots \;\; \theta_K'^T]^T \qquad (2.3.12)$$

where $\theta_i'^T = [a_i^T; b_i]$, for $i=1,\ldots,K$. The problem described in (2.3.2) and (2.3.3) can now be rewritten as an LS regression of the form $Y = X'\theta' + \varepsilon$, where $\varepsilon$ is an error term. $\theta'$ can be determined using the LS method:

$$\theta' = \big[(X')^T X'\big]^{-1} (X')^T Y$$

As in the case of the NN method, we find the optimal number of clusters ($K$) by in-sample cross-validation. Then we estimate $\theta'$ and hold it fixed for out-of-sample estimation.
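The membership formula (2.3.7) and the global LS step in (2.3.11) and (2.3.12) can be sketched together. This is a simplified illustration under stated assumptions, not the thesis algorithm: the cluster centres are taken as given, the cluster memberships are used directly as normalized firing strengths (the projection step in (2.3.9) and (2.3.10) is skipped), and all function names are hypothetical. Note that the membership array here is stored as N x K, the transpose of $U$.

```python
import numpy as np

def fcm_memberships(X, centres, m=2.0):
    """Memberships mu[k, i] of sample k in cluster i, following (2.3.7).

    m is the FCM fuzziness exponent (m > 1); m = 2 is an assumed default.
    Zero distances are clipped to a tiny value to avoid division by zero.
    """
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)  # (N, K)
    d = np.maximum(d, 1e-12)
    # ratio[k, i, j] = (d_ik / d_jk) ** (2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def tsk_design(X, W):
    """Composite matrix X' = [W_1 X_e, ..., W_K X_e] of size N x K(n+1)."""
    Xe = np.hstack([X, np.ones((X.shape[0], 1))])   # rows [x_k^T, 1]
    return np.hstack([W[:, [i]] * Xe for i in range(W.shape[1])])

def tsk_fit(X, y, W):
    """Global least-squares estimate of the stacked consequent parameters."""
    theta, *_ = np.linalg.lstsq(tsk_design(X, W), y, rcond=None)
    return theta

def tsk_predict(X, W, theta):
    """Model output X' theta for given normalized firing strengths W."""
    return tsk_design(X, W) @ theta

# Illustration: two clusters on a one-dimensional input.
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
W = fcm_memberships(X, centres=np.array([[0.0], [1.0]]))
theta = tsk_fit(X, y, W)
y_fit = tsk_predict(X, W, theta)
```

`np.linalg.lstsq` is used instead of forming the inverse of $(X')^T X'$ explicitly, which is numerically safer when the composite matrix is close to rank-deficient. Because each row of `W` sums to one, data generated by a single linear model are fitted exactly, which serves as a simple check of the construction.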
Chapter 3

Exchange Rate Forecasting: A Market Microstructure Approach

3.1  Models of Exchange Rate Determination

Various models aiming at explaining exchange rate fluctuations have been proposed. Meese and Rogoff (1983) found that a simple random walk model performed no worse than any of the competing representative time series and structural exchange rate models. Out-of-sample forecasting power in those models was surprisingly low for various forecasting horizons (from 1 to 12 months). Subsequent attempts to determine exchange rates shed very little light on the problem. Baillie and McMahon (1989) pointed out that exchange rates are in general not linearly predictable. They are described as highly volatile with an elusive data-generating process (DGP). Similarly, Hsieh (1988), Boothe and Glassman (1988) and Diebold and Nerlove (1989) observed that exchange rate changes are leptokurtic and may be non-linearly dependent. Further, the observed exchange rates seem to exhibit volatility clustering, i.e., high (low) volatility periods tend to be followed by high (low) volatility periods. This conditional heteroskedasticity evidence was reported in Diebold (1988), Engle (1982), Hsieh (1989) and Engle et al. (1990). Nevertheless, excess kurtosis and conditional heteroskedasticity in the residuals may not improve point forecasts because these effects operate through even-ordered moments.

To model the observed effects, parametric non-linear models such as ARCH (Hsieh 1989) and GARCH (Bollerslev 1990) were applied to exchange rate modeling, but with very little success. Gencay (1999) examined the predictability of spot foreign exchange rate returns using moving average technical trading rules and GARCH, nearest neighbours (NN) and artificial neural network (ANN) models. GARCH models generated insignificant sign forecast improvement (and less than 1 per cent mean-squared error improvement) over a simple random walk.
As noted in Diebold and Nason (1989), the pre-specification of the GARCH model form may neglect other possible non-linearities resulting from a true DGP. Meese and Rose (1991) examined macroeconomic exchange rate models and found that the poor explanatory power of the models cannot be attributed to non-linearities. They considered five non-linear structural exchange rate models in order to capture possible non-linearities. The application of several parametric and non-parametric techniques to these fundamentally-driven models did not show any improvement in our ability to understand exchange rate fluctuations.

Meese and Rose (1990), Diebold and Nason (1989) and Mizrach (1992) used an NN non-parametric estimator called locally-weighted regression (LWR) in order to handle non-linearities, but they could not significantly improve upon a simple random walk in out-of-sample exchange rate predictions. In contrast, using an NN method, Gencay (1999) and Lisi and Medio (1997) were able to generate predictions superior to those generated by the random walk model. This mixed evidence could suggest the existence of non-linear patterns in exchange rates which, if revealed, could be exploited to improve both point and sign predictions.

Kuan and Liu (1995) used backpropagation and recurrent artificial neural networks (ANN), a very powerful tool for detecting non-linear patterns, to investigate the ANN's out-of-sample forecasting ability on five exchange rates (British pound, Canadian dollar, Deutsche mark, Japanese yen, and Swiss franc) against the US dollar. The data were daily opening bid prices of the NY Foreign Exchange Market and the model of interest was a non-linear AR, whose performance was measured against random walk and ARMA processes. Their results showed the presence of non-linearities in exchange rate time series.
For the Japanese yen and British pound, ANNs exhibited significant sign predictions and/or significantly lower out-of-sample MSPE (relative to the random walk model); for the remaining three currencies ANNs had inferior forecasting performance. Some other studies involving ANNs were less encouraging. Plasmans, Verkooijen and Daniels (1998) and Verkooijen (1996) used macroeconomic models, but they could not produce any satisfactory monthly forecasts. However, Zhang and Hu (1998) modeled the exchange rate as depending non-linearly on its past values, and their model outperformed simple linear models, but they never compared it to a random walk. Hu et al. (1999) showed (using daily and weekly data) that ANNs are a more robust forecasting method than a random walk model. Hence, the application of ANNs to short-term currency behavior was successful in numerous cases and the results suggest that ANN models may have some advantages when frequent short-term forecasts are required (Evans 1997, Jamal and Sundar 1997, Gencay 1999). [5]

All the above-mentioned approaches try to find the exchange rate determinants among macroeconomic variables such as interest rates, money supplies, inflation rates, and trade balances. Flood and Rose (1995) concluded that exchange rate modeling based only on macroeconomic fundamentals might be insufficient to explain exchange rate volatility. Recently, Cheung and Wong (2000) conducted a survey of practitioners in the interbank foreign exchange markets in Hong Kong, Tokyo, and Singapore. A majority of participants view short-term exchange rate variability as closely related to non-economic forces including bandwagon effects, over-reaction to news, speculation, and technical trading. Only 1 per cent of the traders look at economic fundamentals to determine daily exchange rate movements.

Given the partial empirical success of the macroeconomic models, there is an increasing interest in exchange rate microstructure.
The microstructure approach investigates how specific trading mechanisms affect exchange rate formation. Lyons and Evans (2002) incorporated a variable reflecting the microeconomics of asset pricing into a model of the exchange rate. They introduced the most important microstructure variable, "order flow", as the proximate determinant of the exchange rate (using daily data over a four-month period) and were able to significantly improve on existing macroeconomic models. More precisely, they managed to capture about 60% of the daily exchange rate changes using a linear model.

[5] In this context, ANNs focus on a daily (weekly) or less-than-monthly forecasting frequency, while typical macroeconomic models are at a monthly or quarterly frequency.

In general, there are two broad theories of exchange rate modeling: traditional macroeconomic models and the more recently developed market microstructure models. Macroeconomic models aim at modeling and estimating exchange rates at monthly or lower frequency. These models are in general of the following form:

$$\Delta rpfx_t = \phi(M_t) + \varepsilon_t, \quad t=1,\ldots,N$$

where $\Delta rpfx_t$ is the change in the logarithm of the real exchange rate over the month or some lower frequency of observations, and $M_t$ is a vector of typical macroeconomic variables such as the difference between home and foreign nominal interest rates, the long-run expected inflation differential, and relative real growth rates. [6]

[6] See Meese and Rogoff (1983), Dornbusch (1976).

To control for the key Canadian macroeconomic variables, this thesis uses a variation of the model developed by Amano and van Norden (1995):

$$\Delta rpfx_t = \psi(rpfx_t, com_t, ene_t, intdiff_t) + \delta_t, \quad t=1,\ldots,N$$

where $rpfx_t$ is the real Canada/U.S. exchange rate deflated by GDP deflators, $com_t$ is the logarithm of the non-energy commodity price index (deflated by the U.S. GDP deflator), $ene_t$ is the logarithm of the energy commodity price index (deflated by the U.S. GDP
deflator) and $intdiff_t$ represents the nominal 90-day commercial paper interest rate differential (Canada-U.S.).

Macroeconomic models provide no role for any "market microstructure" effects to enter directly into the estimated equation; these effects are thus incorporated through the error term $\delta_t$. [7] These models assume that markets are efficient in the sense that information is widely available to all market participants and all relevant and ascertainable information is already reflected in exchange rates. In other words, from this point of view, exchange rate changes are not informed by microstructure variables. However, typical macroeconomic models perform poorly. Moreover, empirical evidence from Lyons and Evans (2002), Yao (1997), Covrig and Melvin (1998), and this thesis suggests that a microstructure variable, "order flow", contains information relevant to exchange rate determination.

For the spot foreign exchange (FX) trader, what really matters is not the data on any of the macroeconomic fundamentals, but information about demand for currencies extracted from purchase and sale orders, or order flow (Cheung and Wong 2000). Any short-term exchange rate determinant such as portfolio shifts, over-reaction to news, or speculative trading would be recorded in order flow. It is presumed that certain FX traders observe trades that are not observable to all the other traders and, in turn, the market efficiency assumption is violated at least in the very short term. [8]

[7] Microstructure literature examines the elements of the security trading process: the arrival and dissemination of information; the generation and arrival of orders; and the market architecture which determines how orders are transformed into trades. Prices are discovered in the marketplace by the interaction of market design and participant behaviour.

Microstructure models directly rely on information regarding the order flow.
Lyons and Evans (2002) approach the order flow/exchange rate relation through a very realistic framework, the portfolio shifts model. Their model is a three-stage game between the dealers and the public (see Figure 3.1). In the first stage, non-dealer market participants (corporations, mutual and pension funds, etc.) analyze all the publicly available information, including macroeconomic fundamentals, and then decide on orders (dealer $i$'s quote is $P_{i1}$, $i = 1, \ldots, M$). Having observed their order flows (which thus reflect information about macroeconomic fundamentals), dealers re-set their price to $P_{i2}$. The second stage is the interdealer trading (sharing the inventory risk) where each dealer simultaneously trades on other dealers' quotes. Aggregate interdealer trades ($\sum D_{i2}$) are publicly observable while customer-dealer trades are not (the same is assumed for other order flows such as foreign institution transactions). The interdealer flows inform dealers about the total size of the currency stock (inventory imbalances) the public needs to absorb to achieve equilibrium. In the third stage, dealers simultaneously and independently quote a new price ($P_{i3}$) so that the public's inventory is in equilibrium and dealers end the day with no net position.

[8] It may be that markets, absent these market microstructure frictions, would be efficient, but trading frictions impede the instantaneous embodiment of all information into prices.

Figure 3.1 Three-stage exchange rate determination from Lyons and Evans (2002). Panels: Stage 1, Stage 2, Stage 3.

Lyons and Evans use the perfect Bayesian-Nash Equilibrium (BNE) concept where dealers choose quotes $P_{i1}$, $P_{i2}$, and $P_{i3}$, and the dealer's interdealer trade $D_{i2}$.[9] They explicitly derive an equilibrium price change (between period $t-1$ and $t$) and equilibrium trading strategies.
Intuitively, the equilibrium price is determined from the stage 1 common information set (macroeconomic fundamentals, denoted $r_t$) and aggregate interdealer order flow ($IB_t$):

$$\Delta P_t = r_t + \lambda \, IB_t,$$

where $\lambda$ is a positive constant. This model was estimated over a four-month span of daily observations controlling for a key macroeconomic variable, the interest rate differential. The results were in favour of the microstructure approach, with an $R^2$ statistic over 50 per cent.

[9] The sum of each dealer's interdealer order is an interdealer order flow, or interbank transactions (IB), as denoted later in the text.

However, there are several unanswered questions which require further research. First, can we restrict our model to only one macroeconomic determinant? In this research, we try to control (among other candidates available on a daily basis) for another fundamental variable: the crude oil price. Secondly, is there a non-linear conditional mean function that characterizes the true DGP? By employing several non-linear methods and relaxing the three-stage game restrictions, we attempt to find another equilibrium, non-linear by its structure.[10] Indeed, in this new setting there is a possibility that other types of order flow might play a role in setting the price. Finally, can this model be successfully used for out-of-sample forecasting?[11] In this thesis, we address all of these issues and provide answers based solely on empirical grounds.

In general, the market microstructure approach assumes the following relationship between the exchange rate and the driving variables:

$$\Delta rpfx_t = \psi(\Delta x_t, \Delta I_t, N_t) + \chi_t, \quad t = 1, \ldots, N,$$

where $\Delta x_t$ represents order flow, $\Delta I_t$ is a change in net dealer positions, while $N_t$ is any other microeconomic variable. Order flow can be positive (net dollar purchases) or negative (net dollar sales). Macroeconomic effects are incorporated into the error term $\chi_t$.[12]

[10] For example,
Bhattacharya and Spiegel (1991) and Rochet and Vila (1994) show that microstructure models can have multiple or non-linear equilibria.

[11] Lyons and Evans (1999) are unable to generate statistically significant forecasts due to small sample size (and possibly because of misspecification and/or the linearity assumption).

[12] Order flow is explained in the next section (3.2).

A positive relationship between the exchange rate and order flow is expected since informational asymmetries gradually affect the price until it reaches equilibrium. Figure 3.2 illustrates the explanatory power of aggregate order flow and its IB component (the data cover the period from January 1990 to June 2000 at a daily frequency).

Figure 3.2. Above: Aggregate (cumulative) order flow and log Canada/U.S. real exchange rate. Below: IB (cumulative) order flow and log Canada/U.S. real exchange rate. Note: All values are normalized to [-1,1]. Horizontal axis: observation number.

As the solid line indicates, the Canadian dollar has depreciated throughout most of the sample. The relationship between the exchange rate and order flow is quite clear as a positive correlation between cumulative purchases of U.S. dollars and the depreciation. This relationship is more obvious from the three-month sample (December 1999 - February 2000) of IB order flow and log Canada/U.S. real exchange rate. However, it would be inappropriate to assume that order flow contains all the information that is relevant for exchange rates.[13]

This thesis combines macroeconomic and microstructure approaches into a single high-frequency data model.[14] More specifically, it embodies modified models from Amano and van Norden (1995, 1998) and Lyons and Evans (2002):

$$\Delta rpfx_t = \psi(\Delta intdiff_{t-1}, \Delta oil_{t-1}, \Delta x_{t-1}) + \eta_t, \quad t = 1, \ldots, N,$$

where $\Delta intdiff_t$ is the change in the differential between the Canada and U.S.
nominal 90-day commercial paper rates, $\Delta oil_t$ is the daily change in the logarithm of the crude oil price, and order flow is denoted by $\Delta x_t$. Later in this chapter, $\Delta x_t$ is substituted by either a vector of three different order flow types, aggregate order flow or interbank transactions. ANNs, NN and TSK methods are employed to estimate a non-linear relationship between exchange rate movements and these variables.

[13] Models of this type can be found in Kyle (1985), Back (1992), etc.

[14] Goldberg and Tenorio (1997) and Osler (1998) also follow this approach.

3.2  Model Specification

3.2.1  Data description

The order flow data were obtained from the Bank of Canada's unique Daily Foreign Exchange Volume Report, which is coordinated by the Bank and organized through the Canadian Foreign Exchange Committee (CFEC). Details about the trading flows (in Canadian dollars) for six major Canadian commercial banks are categorized by the type of trade (spot, forward, and futures) and transaction type (i.e., with regard to trading partner).[15]

[15] The six banks used in this research account for approximately 83 per cent of all Canada/U.S. dollar transactions. The remaining transactions occur within Canada (4 per cent) and in the US and the rest of the world (13 per cent). Source: Bank of Canada.

Because this thesis focuses on a short-term exchange rate forecast, spot transactions are of interest. In a spot transaction, a currency is traded for immediate delivery and payment is made within two business days of the contract entry date. Spot transactions vary, as follows:
• Commercial client transactions (CC) include all transactions with resident and non-resident non-financial customers.
• Canadian-domiciled investment transactions (CD) include all transactions with non-dealer financial institutions located in Canada.
• Foreign institution transactions (FD) include all transactions with foreign financial institutions, such as FX dealers.
• Interbank transactions (IB) include transactions with other chartered banks, credit unions, investment dealers, and trust companies in the interbank market.

Because they were unavailable prior to 1994, CD transactions are excluded as an explanatory variable in this work. However, according to a linear model by D'Souza (2002), this variable is not statistically significant in exchange rate forecasting and it is not considered to be a major part of aggregate order flow. Moreover, according to the Reuters Dealing 2000-1 electronic dealing system, IB transactions account for about 75 per cent of total trading in major spot markets (Lyons and Evans 2002). Thus, the contribution of CD transactions to explaining daily exchange rates is relatively small, and our results support this conjecture.

Individual order flows (CC, FD, IB) are measured as the difference between the number of currency purchases (buyer-initiated trades) and sales (seller-initiated trades). Aggregate order flow (aggof) is the sum of individual order flows. As noted earlier, the other variables of interest are the crude oil closing price (in U.S. dollars) deflated by the U.S. consumer price index (CPI) ($\Delta oil$) and the change in the difference between nominal 90-day commercial paper rates in Canada and the United States ($\Delta intdiff$). The dependent variable data set comprises the daily changes in the logarithm of the real Canada/U.S. exchange rate ($\Delta rpfx$) between January 1990 and June 2000. To control for day-of-the-week effects, we focus on Wednesdays only: a total of 523 observations (after taking first differences). The first 470 observations represent the in-sample data and roughly the last 10 per cent (53 observations) are out-of-sample data. The real exchange rate was calculated from the nominal exchange rate and the CPI for the United States and Canada. All variables are considered in first-difference terms, because the daily change (positive or negative) prediction is of interest.
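The data construction described above can be illustrated with a minimal sketch. It assumes order flow is the purchase count minus the sale count, that aggregate order flow is the plain sum of the three individual flows, and that the in-sample/out-of-sample split is purely chronological. All function names and the numbers used are hypothetical and serve only as an illustration; they are not the thesis data.

```python
def order_flow(purchases, sales):
    """Net order flow: buyer-initiated minus seller-initiated trades."""
    return purchases - sales


def aggregate_order_flow(cc, fd, ib):
    """Aggregate order flow as the sum of the individual order flows."""
    return cc + fd + ib


def first_difference(series):
    """Return x[t] - x[t-1]; the result is one observation shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def chronological_split(series, n_out):
    """Keep time order: the last n_out observations are out-of-sample."""
    return series[:-n_out], series[-n_out:]


# Hypothetical trade counts for a single day (illustration only).
cc = order_flow(120, 95)    # commercial client flow
fd = order_flow(80, 110)    # foreign institution flow
ib = order_flow(300, 280)   # interbank flow
aggof = aggregate_order_flow(cc, fd, ib)

# 523 observations split into 470 in-sample and 53 out-of-sample.
in_sample, out_of_sample = chronological_split(list(range(523)), 53)
```

A chronological split, rather than a random one, keeps the out-of-sample period strictly after the estimation period, which is what a forecasting exercise requires.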
Also, by using the first differences we avoid theoretical problems of estimation of non-stationary non-parametric functions (see Diebold and Nerlove 1989b). To support this claim, standard Dickey-Fuller unit root tests are performed on all variables, based on the entire data set (Table 3.1). The logarithm of the real exchange rate, the interest rate differential and the logarithm of the crude oil price are found to be integrated of order one. In contrast, the null hypothesis of non-stationarity for daily order flows is rejected at the 1 per cent significance level. These findings are not surprising and are in line with D'Souza (2002), who used a subset of our data set (from January 1996 through September 1999).

Table 3.1 Augmented Dickey-Fuller (ADF) unit root t-tests

| Variable | t-test | Lags |
|---|---|---|
| Real exchange rate | -0.84 | 1 |
| Interest rate differential | -1.48 | 8 |
| Crude oil price | -2.42 | 10 |
| Change in real exchange rate | -34.31** | 1 |
| Change in interest rate differential | -19.90** | 5 |
| Change in crude oil price | -31.35** | 2 |
| CC | -25.31** | 2 |
| FD | -23.43** | 2 |
| IB | -31.12** | 1 |

Notes: The sample is from January 1990 to July 2000. Critical values are from Hamilton (1994): -3.43 (1 per cent), **-2.86 (5 per cent).

Given the systematic bias in currency forward rates (for more on this "conditional bias," see Lyons 2001), and their unavailability on a daily basis (the minimum contract length is 30 days), we do not test whether the inclusion of a forward rate in our set of explanatory variables improves the daily predictability of the Canada/U.S. exchange rate. However, we test for bias in forward rates based on the following equation as suggested by Lyons (2001):

$$p_{t+30} - p_t = \alpha + \beta (f_{t,30} - p_t) + \epsilon_{t+30},$$

where $p_{t+30}$ denotes the spot rate realized at time $t+30$ (Canada/U.S.), $f_{t,30}$ denotes the 30-day forward rate of the Canadian dollar, settled at time $t+30$, and $\epsilon_{t+30}$ is a random error term.
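A regression of this kind can be sketched with ordinary least squares on a single regressor. The helper names below (`ols_simple`, `forward_bias_regression`) are hypothetical, the toy series are made up for illustration, and the sketch assumes that the forward horizon is measured in the same units as the index of the series. It is not the estimation code used in the thesis.

```python
def ols_simple(x, y):
    """OLS of y on a constant and x; returns (alpha, beta, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_res = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return alpha, beta, 1.0 - ss_res / ss_tot


def forward_bias_regression(spot, forward, horizon=30):
    """Regress the realized spot change on the forward premium.

    spot[t] and forward[t] are log rates observed at t; forward[t]
    settles `horizon` periods later (an assumption of this sketch).
    """
    n = len(spot) - horizon
    premium = [forward[t] - spot[t] for t in range(n)]
    realized = [spot[t + horizon] - spot[t] for t in range(n)]
    return ols_simple(premium, realized)


# Toy data (illustration only): the realized change is exactly half
# of the forward premium, so beta = 0.5 and the fit is perfect.
toy_spot = [0.0, 1.0, 3.0, 6.0, 10.0]
toy_forward = [2.0, 5.0, 9.0, 14.0, 0.0]
alpha, beta, r2 = forward_bias_regression(toy_spot, toy_forward, horizon=1)
```

Under unbiasedness one would expect a slope close to one; a slope that is significantly negative is the pattern usually described as forward bias.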
The data for this exercise were obtained from the Statistics Canada CANSIM database (data range: January 1994 - May 2002). The regression results can be summarized as follows:

$$p_{t+30} - p_t = 0.000525 - 0.432875 \, (f_{t,30} - p_t), \quad R^2 = 0.010.$$

The t-values are 2.937104 for the intercept and -3.449215 for the slope. The estimate of $\beta$ is statistically significant and negative. In other words, when the forward rate predicts the spot rate will rise, it actually falls. Additionally, the model has a very poor fit with $R^2$ below one per cent. This confirms the bias in the 30-day forward rate of the Canadian dollar.

Initially, two non-linear models are considered:

Model 1: $\Delta rpfx_t = f(\Delta intdiff_{t-1}, \Delta oil_{t-1}, IB_{t-1}) + \epsilon_t$; $t = 1, \ldots, N$.

Model 2: $\Delta rpfx_t = g(\Delta intdiff_{t-1}, \Delta oil_{t-1}, CC_{t-1}, IB_{t-1}, FD_{t-1}) + \nu_t$; $t = 1, \ldots, N$.

Only for the purpose of ANN modeling, all data were normalized to the [-1,1] interval. Each of the above non-linear models was developed based on chapter 2 techniques and data sets of four variables (model 1) and six variables (model 2). The model was used to forecast the daily change of the Canada/U.S. real exchange rate one day into the future. We specify model parameters a priori, i.e., ANN structure, the number of nearest neighbours, TSK clusters, etc. This pre-specification is based on the cross-validation procedure which minimizes MSPE and maximizes PERC. We do not exactly follow Gencay (1999), but our approach imposes a fair restriction on all of the considered models. Essentially, once determined for the in-sample data (training set), the optimal model structure does not adapt with each out-of-sample observation (testing set). As pointed out by Gencay (1999), recalculation with each testing sample is computationally expensive, and in this thesis we do not attempt to construct a flawless on-line exchange rate predictor.
Rather, the intention is to explore the usefulness of an order flow variable and the possible presence of conditional-mean non-linearities.

Initially, our implementation of the cross-validation follows Gencay (1999). First, we select the 70 most recent training set observations (approximately 1/7 of the training set size). At this point, all the remaining training set observations are treated as "testing data." Then, we repeatedly calculate MSPE and PERC statistics from these "testing data" until we determine the optimal model structure which yields the lowest MSPE and highest PERC. This structure is recorded and we proceed with the cross-validation by adding 10 more observations from the past to our initial training data. Again, we record the optimal model parameters for a new smaller "testing set" and this procedure is applied until the size of a "testing set" is zero. Finally, we choose our models' parameters from the set of recorded model architectures. Specifically, we choose models with the lowest MSPEs and highest PERCs.

Non-linear model parameters are held fixed for the whole testing set. We consider the absence of model re-adjustment as a strong, but fair restriction. If the applied non-linear model is truly capable of capturing non-linearities in the exchange rate, the pattern should be quite clear from the basic training set.

In order to investigate the robustness of our ANN-generated forecasts, in the next step we focus on daily and weekly forecasting with backpropagation ANNs, where we develop models of four variables (ANN model 1) and six variables (ANN model 2) using the whole data set of 2,230 observations. Two models are considered:

ANN Model 1: $\Delta rpfx_t = f(\Delta intdiff_{t-j}, \Delta oil_{t-j}, aggof_{t-j}) + \epsilon_t$;

ANN Model 2: $\Delta rpfx_t = g(\Delta intdiff_{t-j}, \Delta oil_{t-j}, CC_{t-j}, IB_{t-j}, FD_{t-j}) + \nu_t$;

$j \in \{1, 7\}$; $t = 1, \ldots, N$.
The networks trained and tested were the three-layer and four-layer backpropagation ANNs with the non-linear sigmoid and tan-sigmoid neuron activation functions in hidden layers. The number of input neurons was three for model 1 and five for model 2, while the number of hidden neurons varied between three and five for both of the models. The last layer had one linear output neuron.

It is important to note that in this research we do not utilize a more complex scenario involving other ANN types and methods for determining optimal ANN structure. Our intention was to emphasize the role and usefulness of the microstructure variables (information) in a relatively simple setting, without refining the estimation procedures. It is possible that a more complicated approach such as Gencay and Qi (2001) would be more successful.

The ANN and NN analyses were performed using the software package Matlab, v. 5.2.0, The MathWorks, Inc. (1998). The TSK estimation was done with the C++ program created by Dr. Dragan Kukolj, University of Novi Sad, Yugoslavia.

3.2.2  Assessment of forecast performance

This thesis considers whether the ANN and other non-linear models can outperform linear models (with order flows included in the set of explanatory variables), Autoregressive Integrated Moving Average with external process models (ARIMAX) and random walk (with a drift) models in terms of mean squared prediction error (MSPE) and the percentage of correctly predicted directions of exchange rate changes (PERC). MSPE is defined as follows:

$$MSPE = \frac{1}{M} \sum_{t=1}^{M} \left( \widehat{\Delta rpfx}_t - \Delta rpfx_t \right)^2,$$

where $M$ denotes the size of the out-of-sample testing set, $\widehat{\Delta rpfx}_t$ is the model forecast at time $t$ ($t = 1, \ldots, M$) and $\Delta rpfx_t$ is the actual (sample) exchange rate change.

In the second part of this chapter, in line with the Meese and Rogoff (1983) evaluation criterion, recursive estimation (or rolling regressions) is used to evaluate the models' predictive performance.
The initial estimation starts with the first 90 per cent (chronologically) of the sample $N$, or, for instance, $m$ observations. That comprises training and validation sets for the ANN. The remaining 10 per cent is a testing (forecasting) set of initial size $k$; having estimated the model, $k$ forecasts are generated. Subsequent ($k-1$) steps involve increasing the estimation sample (so that $m$ increases) and shrinking the testing set (so that $k$ decreases) by one period. In each subsequent step, ($k-s$) forecasts are estimated ($s = 1, \ldots, k-1$). Finally, $k$ sets of network responses (of size $1, \ldots, k$) can be compared to actual observations and other models.

This part of the chapter considers whether the ANN model can outperform linear and random walk models in terms of root mean squared error (RMSE) and the percentage of correctly predicted directions of exchange rate changes. For most of this chapter, RMSE is defined as follows:

$$RMSE_{N_k} = \sqrt{ \frac{1}{N_k} \sum_{t=0}^{N_k - 1} \left( \widehat{\Delta rpfx}_{k+m-t} - \Delta rpfx_{k+m-t} \right)^2 },$$

where $N_k$ denotes the size of the out-of-sample testing set ($N_k = 1, \ldots, k$), $\widehat{\Delta rpfx}_{k+m-t}$ is the model forecast at time ($k+m-t$), and $\Delta rpfx_{k+m-t}$ is the actual exchange rate change. The ratio of data allocated to training, validation, and testing was maintained at 6:3:1 throughout the recursive experiment.

To reduce overfitting, i.e., inconsistency in calibration, which results from the network complexity (too many parameters to be estimated) and (possibly noisy) data length, one approach would be to use ANNs with hints (homogeneity, consistency, etc.). Recently, Garcia and Gencay (2000) showed that an ANN with a homogeneity hint can produce a smaller out-of-sample error and a more robust estimator. Gencay and Qi (2001) found that early stopping can be as effective as the homogeneity hint in pricing options. By using early stopping in this thesis we assume that response functions are estimated consistently.
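The recursive scheme can be sketched as an expanding-window loop. The sketch below is only an illustration under stated assumptions: `fit` is a hypothetical callable that estimates a model on the first observations and returns a prediction function, and the stand-in model `fit_drift` (a constant equal to the in-sample mean) is used in place of an ANN purely to keep the example self-contained.

```python
import math


def rmse(forecasts, actuals):
    """Root mean squared error over one testing set."""
    n = len(actuals)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / n)


def fit_drift(x_train, y_train):
    """Stand-in model (not an ANN): always predict the in-sample mean."""
    drift = sum(y_train) / len(y_train)
    return lambda x_new: drift


def recursive_evaluation(x, y, fit, m):
    """Expanding-window evaluation.

    Step s re-estimates on the first m + s observations and forecasts
    every remaining one, so the testing sets have sizes k, k-1, ..., 1.
    Returns a list of (testing set size, RMSE) pairs.
    """
    results = []
    for s in range(len(y) - m):
        cut = m + s
        predict = fit(x[:cut], y[:cut])
        forecasts = [predict(xi) for xi in x[cut:]]
        results.append((len(forecasts), rmse(forecasts, y[cut:])))
    return results


# Toy series (illustration only): a constant target is forecast exactly.
toy_x = list(range(10))
toy_y = [2.0] * 10
results = recursive_evaluation(toy_x, toy_y, fit_drift, m=7)
```

Each pass through the loop uses only data available before the forecast period, so no out-of-sample observation enters the estimation that is used to predict it.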
In addition to MSPE and RMSE, the percentage of correctly predicted signs (PERC) of the forecasted variable $\Delta rpfx_t$ is considered; this is the total number of correctly forecasted positive and negative movements, defined as:

$$PERC = \frac{1}{M} \sum_{t=1}^{M} p_t, \quad \text{where } p_t = \begin{cases} 1 & \text{if } (\widehat{\Delta rpfx}_t \cdot \Delta rpfx_t) > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Sometimes, the significance of the difference in the performance of alternative models has to be tested. We use the Diebold-Mariano test (Diebold and Mariano 1995) to test the null hypothesis that there is no difference in the MSPE and PERC of two alternative models (in our case, of the random walk and the other models).

Let $y_t$ ($t = 1, \ldots, M$) denote the time series to be forecasted and let $\hat{y}_t^1$ and $\hat{y}_t^2$ denote two different forecasts of $y_t$. For instance, $\hat{y}_t^1$ and $\hat{y}_t^2$ could originate from a non-linear ANN model and a random walk process, respectively. We define forecast errors as follows:

$$\hat{\epsilon}_t^i = y_t - \hat{y}_t^i, \quad i = 1, 2; \ t = 1, \ldots, M.$$

In general, a time-$t$ loss associated with a forecast is an arbitrary function of $y_t$ and $\hat{y}_t^1$ or $\hat{y}_t^2$. We denote those functions $l_1(y_t, \hat{y}_t^1)$ and $l_2(y_t, \hat{y}_t^2)$. In many cases such as MSPE or root mean squared error (RMSE), $l_1(\cdot)$ and $l_2(\cdot)$ will be functions of $\hat{\epsilon}_t^1$ and $\hat{\epsilon}_t^2$. Our loss functions are of the following form:

$$l_i = (\hat{\epsilon}_t^i)^2, \quad i = 1, 2; \ t = 1, \ldots, M,$$

$$l_i = \begin{cases} 0 & \text{if } (y_t \cdot \hat{y}_t^i) > 0, \\ -1 & \text{otherwise,} \end{cases} \quad i = 1, 2; \ t = 1, \ldots, M.$$

The first one is used to measure MSPE performance, and the second one for directional accuracy, or PERC. The null hypothesis of interest is:

$H_0$: Equal forecast accuracy for $\hat{y}_t^1$ and $\hat{y}_t^2$ ($t = 1, \ldots, M$), or $E(l_1(y_t, \hat{y}_t^1)) = E(l_2(y_t, \hat{y}_t^2))$, or $E(d_t) = 0$ (where $d_t = l_1(y_t, \hat{y}_t^1) - l_2(y_t, \hat{y}_t^2)$),

and the alternative hypothesis is:

$H_a$: Different forecast accuracy, or $E(d_t) \neq 0$.

The Diebold-Mariano test statistic for the equivalence of forecast errors will be

$$S_1 = \frac{\bar{d}}{\sqrt{2 \pi \hat{f}(0) / M}},$$

where $\bar{d}$ is the sample mean of $d_t$, $M$ is the testing set size and $f(0)$ is the spectral density of $d_t$ at frequency zero. Diebold and Mariano show that $S_1$ is asymptotically distributed as $N(0,1)$.
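The statistics defined above can be sketched as follows. The sketch assumes one-step-ahead forecasts, so that the long-run variance $2\pi f(0)$ is estimated by the lag-0 sample autocovariance of $d_t$ (with divisor $M$); this simplification and all function names are assumptions of the example, and the toy numbers are made up.

```python
import math


def squared_loss(actual, forecast):
    """Quadratic loss used for the MSPE comparison."""
    return (actual - forecast) ** 2


def direction_loss(actual, forecast):
    """Directional loss: 0 if the sign is predicted correctly, -1 otherwise."""
    return 0.0 if actual * forecast > 0 else -1.0


def mspe(actual, forecast):
    """Mean squared prediction error."""
    return sum(squared_loss(a, f) for a, f in zip(actual, forecast)) / len(actual)


def perc(actual, forecast):
    """Percentage of correctly predicted signs."""
    hits = sum(1 for a, f in zip(actual, forecast) if a * f > 0)
    return 100.0 * hits / len(actual)


def diebold_mariano(actual, forecast1, forecast2, loss):
    """DM statistic with the long-run variance set to the variance of d_t."""
    d = [loss(a, f1) - loss(a, f2)
         for a, f1, f2 in zip(actual, forecast1, forecast2)]
    m = len(d)
    d_bar = sum(d) / m
    var_d = sum((di - d_bar) ** 2 for di in d) / m
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    return d_bar / math.sqrt(var_d / m)


# Toy example (illustration only): forecast 1 is perfect, forecast 2 is zero.
toy_actual = [1.0, -1.0, 2.0, -2.0]
toy_f1 = [1.0, -1.0, 2.0, -2.0]
toy_f2 = [0.0, 0.0, 0.0, 0.0]
s_mspe = diebold_mariano(toy_actual, toy_f1, toy_f2, squared_loss)
```

In this toy case the statistic is negative and below -1.96, which under the quadratic loss points to a lower MSPE for the first forecast.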
If we use the above quadratic loss functions, we would have the following:

• When $S_1 < -1.96$, model "1" has an MSPE significantly lower than model "2" at the 95% confidence level ($H_0$ rejected at the 5% significance level).
• When $S_1 > 1.96$, model "2" has an MSPE significantly lower than model "1" at the 95% confidence level ($H_0$ rejected at the 5% significance level).
• When $-1.96 < S_1 < 1.96$, $H_0$ is not rejected at the 5% significance level.

For the directional accuracy loss functions the following would be true:

• When $S_1 < -1.96$, model "1" has a PERC significantly lower than model "2" at the 95% confidence level ($H_0$ rejected at the 5% significance level).
• When $S_1 > 1.96$, model "2" has a PERC significantly lower than model "1" at the 95% confidence level ($H_0$ rejected at the 5% significance level).
• When $-1.96 < S_1 < 1.96$, $H_0$ is not rejected at the 5% significance level.

As forecasts are done only one step ahead, we do not have to include any of the sample autocovariances to calculate the long-run variance of $d_t$, $2\pi f(0)$. In this case, a consistent estimate of $2\pi f(0)$ will be the sample variance of $d_t$ (see Campbell et al. 1997 or Diebold and Mariano 1995). The estimated autocovariances of the series $d_t$ (MSPE performance) for the ANN model 2 (backpropagation) for lags 0 to 7 are $10^{-12} \times$ (0.2847, 0.0577, 0.0233, 0.0134, -0.0396, -0.0192, -0.0003, -0.0300). Similarly, when directional accuracy is measured for the ANN model 2, the following autocovariances are estimated: 0.1381, -0.0133, -0.0334, -0.0144, 0.0463, -0.0178, -0.0185, -0.0143. Clearly, except for lag 0, the estimated autocovariances are roughly zero and do not have to be included in the calculation of $2\pi f(0)$.

3.3  Empirical Results

3.3.1  Models based on individual order flows when controlled for the day-of-the-week effects

This section assesses the out-of-sample forecasting performance of a range of Canada/U.S. exchange rate models. The out-of-sample size is fixed at the last 53 observations.
Generally, the random walk model performs better than any traditional linear macroeconomic model that excludes microstructure variables; therefore, it can be viewed as the benchmark model. The following models are considered first:

Random walk model (RW):

$$rpfx_t = \alpha_0 + rpfx_{t-1} + \chi_t, \quad t = 1, \ldots, N.$$

Linear model (LM1):

$$\Delta rpfx_t = \gamma_0 + \gamma_1 \Delta intdiff_{t-1} + \gamma_2 \Delta oil_{t-1} + \gamma_3 IB_{t-1} + \epsilon_t, \quad t = 1, \ldots, N.$$

Linear model (LM2):

$$\Delta rpfx_t = \beta_0 + \beta_1 \Delta intdiff_{t-1} + \beta_2 \Delta oil_{t-1} + \beta_3 CC_{t-1} + \beta_4 IB_{t-1} + \beta_5 FD_{t-1} + \nu_t, \quad t = 1, \ldots, N.$$

To capture more of the model dynamics we use the Box-Jenkins approach. ARMA models can typically provide an accurate description of stationary non-seasonal time series. Such a series may be forecasted by fitting an appropriate ARMA-type model. We employ the following two versions of this popular model:

ARIMAX1(p,d,q):

$$A(L, \rho)(1 - L) \, rpfx_t = \sum_{i=1}^{3} B_i(L, \alpha) X_{it} + C(L, \beta) \epsilon_t, \quad \epsilon_t \sim NID(0, \omega^2),$$

where

$A(L, \rho) = 1 + \rho_1 L + \ldots + \rho_p L^p$,
$B_i(L, \alpha) = 1 + \alpha_{i1} L + \ldots + \alpha_{ik} L^k$,
$C(L, \beta) = 1 + \beta_1 L + \ldots + \beta_q L^q$,
$X_t = (\Delta intdiff_{t-1}, \Delta oil_{t-1}, IB_{t-1})$,
$t = 1, \ldots, N$.

ARIMAX2(p,d,q):

$$A(L, \rho)(1 - L) \, rpfx_t = \sum_{i=1}^{5} B_i(L, \alpha) X_{it} + C(L, \beta) \epsilon_t, \quad \epsilon_t \sim NID(0, \omega^2),$$

where

$A(L, \rho) = 1 + \rho_1 L + \ldots + \rho_p L^p$,
$B_i(L, \alpha) = 1 + \alpha_{i1} L + \ldots + \alpha_{ik} L^k$,
$C(L, \beta) = 1 + \beta_1 L + \ldots + \beta_q L^q$,
$X_t = (\Delta intdiff_{t-1}, \Delta oil_{t-1}, IB_{t-1}, CC_{t-1}, FD_{t-1})$,
$t = 1, \ldots, N$.

Naturally, we attempt to take advantage of the microstructure data set, which justifies the use of microstructure external variables. All of the non-linear models are of the form specified in Section 3.2:

$$\Delta rpfx_t = \phi(\Delta intdiff_{t-1}, \Delta oil_{t-1}, IB_{t-1}) + \epsilon_t; \quad t = 1, \ldots, N. \quad (3.3.1)$$

$$\Delta rpfx_t = \psi(\Delta intdiff_{t-1}, \Delta oil_{t-1}, CC_{t-1}, IB_{t-1}, FD_{t-1}) + \nu_t; \quad t = 1, \ldots, N. \quad (3.3.2)$$

We denote our set of non-linear models as follows:

• ANNB1 (backpropagation ANN model for 3.3.1),
• ANNB2 (backpropagation ANN model for 3.3.2),
• ANNR1 (recurrent ANN model for 3.3.1),
• ANNR2 (recurrent ANN model for 3.3.2),
• NN1 (NN model for 3.3.1),
• NN2 (NN model for 3.3.2),
• TSK1 (TSK model for 3.3.1),
• TSK2 (TSK model for 3.3.2).

The optimal networks trained and tested were the three-layer and four-layer backpropagation and recurrent ANNs with the non-linear sigmoid and tan-sigmoid neuron activation functions in hidden layers. The number of input neurons was three for model 1 and five for model 2, while the number of hidden neurons varied between three and five for both of the models. The last layer had one linear output neuron.

The results depend strongly on the ANN architecture. More specifically, the number of hidden layers, the number of neurons in the hidden layer, the type of activation function, and the training algorithm are principal determinants of a good prediction model. To avoid overtraining, the ANN was trained with an early stopping technique, where the available data set was divided into three subsets (maintaining the ratio 6:3:1): a smaller training set (used for gradient calculation and weights and biases updating); a validation set (when the validation error starts increasing, the training is stopped); and a testing set (used to compare real and model output, or different models).

The optimal number of nearest neighbours was for C ≈ 0.7, i.e., approximately 270 nearest neighbours. The choice of seven clusters produced the most accurate TSK model.

The results of our out-of-sample forecasting exercise are summarized in Table 3.2. The first column represents the relative (to the benchmark) MSPE associated with each of the models; the benchmark model is an RW model and its MSPE is normalized to the value of one.
Thus, the relative MSPE shows the exact relative gain (or loss) of our competing models compared to the RW model. The PERC statistics are reported in the second column. In the MSPE and PERC out-of-sample estimations, we use the optimal non-parametric model structure found from the cross-validation procedure on the training set (470 observations). The Diebold-Mariano (DM) test statistics are reported in parentheses below MSPEs and PERCs. If there are significant gains in MSPE (for instance, at the 5% significance level) relative to the benchmark, the DM value should be less than -1.96. Similarly, if there are significant gains in PERC (at the 5% significance level), the DM value should be greater than 1.96.[16]

[16] The critical values are +/- 1.64 and +/- 1.96 for a confidence level of 90% and 95%, respectively. (*) and (**) indicate that the DM statistic is significant at the 5% and 10% significance level, respectively.

The PERC statistic of approximately 58 per cent for the RW model is very high. This suggests a significant amount of positive exchange rate movements, which is not surprising for the period 1990-2000. Relative to the benchmark model, linear models produce minimal gains. ARIMAX(2,1,1) models exert a 6 per cent improvement in PERC, but almost no improvement in MSPE. Adding more lags to the AR and MA components of the ARIMAX model only worsens these results. There is certainly a degree of overfitting involved with the conditioning on past exchange rate changes and errors.

Table 3.2 Out-of-sample forecast performance.
| Models | Relative MSPE (DM) | PERC [%] (DM) |
|---|---|---|
| RW | 1.000 | 58.490 |
| LM1 | 0.993 (-0.259) | 60.377 (0.255) |
| LM2 | 0.994 (-0.206) | 62.264 (0.467) |
| ARIMAX1(2,1,1) | 0.991 (-0.269) | 64.150 (0.553) |
| ARIMAX2(2,1,1) | 0.993 (-0.197) | 62.264 (0.362) |
| ANNB1 | 0.949 (-0.888) | 67.924 (1.695)** |
| ANNB2 | 0.920 (-2.285)* | 69.811 (2.196)* |
| ANNR1 | 0.971 (-0.827) | 67.924 (1.938)* |
| ANNR2 | 0.952 (-2.357)* | 69.811 (2.192)* |
| NN1 | 0.964 (-0.758) | 60.377 (0.206) |
| NN2 | 0.917 (-1.590) | 66.037 (0.813) |
| TSK1 | 1.024 (0.106) | 58.491 (0.012) |
| TSK2 | 0.970 (-0.296) | 62.264 (0.389) |

One very clear pattern emerges: both types of ANNs are markedly better than any of the linear and non-linear competitors. They produce the best forecasts when all three order flows are included (models ANNB2 and ANNR2). The ANNB2 model provides statistically significant gains in MSPE (about 8 per cent) and PERC (about 12 per cent), while for the ANNB1 model there are only significant forecasting improvements in PERC. Further, the backpropagation ANN appears to be more successful than the recurrent one. Thus, as in the linear case, lagged variables do not matter a lot for high-frequency forecasting. The NN2 model also provides borderline significant (at 10 per cent) reductions in MSPE relative to the RW model. TSK models do not offer any significant forecast gains, but, again, more accurate forecasts are related to a non-linear five-variable model.

In Table 3.3, we also present the out-of-sample performance (relative MSPE and PERC) for longer forecast horizons: two-step-ahead and three-step-ahead ANNB2 forecasts. These forecasts are notably inferior to the ones from the one-step-ahead experiment. The PERC statistic incurs losses of roughly 2 and 5 per cent, respectively, while the relative MSPE increases. Thus, the findings show that the optimal forecast horizon for daily frequency is 1 day.

Table 3.3 Two-step-ahead and three-step-ahead forecast performance of the ANNB2 model.
| Horizon (k) | Relative MSPE | PERC [%] |
|---|---|---|
| k=2 | 0.958 | 67.307 |
| k=3 | 0.966 | 64.705 |

3.3.2  Parameter estimates and relative significance of inputs

ANNs are often regarded as non-parametric "black-box" models because they involve highly complex non-linear functional forms and parameters which are difficult to interpret. However, once the ANN is trained (estimated), the relationship between inputs and outputs is uniquely defined by non-linear (transfer) functions, connection weights and node biases. This allows us to make ANNs somewhat transparent to interpretation and analysis. We extend the method of pseudo-weights proposed by Qi and Maddala (1995) to find the economic importance of the inputs.[17] More precisely, we use the weighted average of the input weights to find the marginal contribution of each input variable to the output. Using the notation from chapter 2, we slightly modify their formula for pseudo-weights and define the four-layered ANN's pseudo-weight for the $i$-th input ($PW_i$) as

$$PW_i = \sum_{j_1=1}^{q_1} \sum_{j_2=1}^{q_2} \alpha'_{i j_1} \, \alpha''_{j_1 j_2} \, \beta_{j_2}, \quad (3.3.3)$$

where $q_1$ is the number of nodes in the first hidden layer, $q_2$ is the number of nodes in the second hidden layer, $\alpha'_{i j_1}$ denotes connection weights between the input layer and the first hidden layer (i.e., between the $i$-th input node and the $j_1$-th first hidden layer node), $\alpha''_{j_1 j_2}$ denotes connection weights between hidden layers (i.e., between the $j_1$-th first hidden layer node and the $j_2$-th second hidden layer node), and $\beta_{j_2}$ represent the connection weights between the second hidden layer ($j_2$-th second hidden layer node) and the output layer (single node).

[17] Qi and Maddala (1995) use ANNs to price call options and find that the economic implications of pseudo-weights are consistent with the call option properties.
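As a rough illustration of the pseudo-weight idea, the sketch below assumes that the pseudo-weight of an input is the sum, over all paths from that input to the output node, of the products of the connection weights along the path. This functional form, the function name and the toy weight matrices are assumptions made for the example; the biases and activation functions are deliberately ignored.

```python
def pseudo_weights(alpha1, alpha2, beta):
    """Pseudo-weights for a network with two hidden layers.

    alpha1[i][j1]  : weight from input i to first hidden node j1
    alpha2[j1][j2] : weight from first hidden node j1 to second hidden node j2
    beta[j2]       : weight from second hidden node j2 to the output node
    """
    q1 = len(alpha2)
    q2 = len(beta)
    return [
        sum(alpha1[i][j1] * alpha2[j1][j2] * beta[j2]
            for j1 in range(q1)
            for j2 in range(q2))
        for i in range(len(alpha1))
    ]


# Toy weights (illustration only): two inputs, two nodes per hidden layer.
toy_alpha1 = [[1.0, 2.0], [0.0, 1.0]]
toy_alpha2 = [[1.0, 0.0], [0.0, 1.0]]
toy_beta = [3.0, 4.0]
toy_pw = pseudo_weights(toy_alpha1, toy_alpha2, toy_beta)
```

The sign and relative size of each pseudo-weight then give a crude indication of the direction and importance of the corresponding input, in the same spirit as a coefficient in a linear model.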
Since we are dealing with a four-layered ANN here, we modify the above three-layered ANN notation by introducing $q_1$ and $q_2$ hidden nodes (as opposed to $q$ nodes in a single hidden layer ANN), and two sets (matrices) of connection weights ($\alpha'_{i j_1}$ and $\alpha''_{j_1 j_2}$) associated with the connections inputted into hidden layers (as opposed to $\alpha_{ij}$ for a single hidden layer). Also, for a four-layered ANN we denote non-linearities in hidden layers with $\psi_1$ and $\psi_2$, and their biases with $\alpha'_{j_1 0}$ (for the $j_1$-th first hidden layer node) and $\alpha''_{j_2 0}$ (for the $j_2$-th second hidden layer node). The recurrent ANN follows the same notation.

As mentioned before, the ANN models with two macroeconomic variables and three order flow types generated the greatest and statistically significant one-day-ahead forecast improvements. Table 3.4 uses the notation from 3.3.3 and presents the model's forecast performance for different ANN types.

Table 3.4 The ANN models with the greatest forecast improvements.

ANN type [p q1 (ψ1) q2 (ψ2) 1 (φ)]                                          Relative MSPE (DM)    PERC [%] (DM)
1. Backpropagation (ANNBP1) [5 5 (logsig) 5 (tansig) 1 (purelin)]           0.92 (-2.285)*        -
2. Backpropagation (ANNBP2) [5 3 (tansig) 5 (logsig) 1 (purelin)]           -                     69.811 (2.196)*
3. Recurrent (Elman1) [5 5 (logsig) 5 (tansig) 1 (purelin)]                 0.952 (-2.357)*       -
4. Recurrent (Elman2) [5 5 (logsig) 5 (tansig) 1 (purelin)]                 -                     69.811 (2.576)*

Table 3.4 also gives more details on the ANN's structure (the number of layers, neurons and the activation function's type for each layer).
For example, "[5 3 (logsig) 5 (tansig) 1 (purelin)]" means there are four layers: five neurons in the input layer, three neurons in the first hidden layer (with log-sigmoid activation functions), five neurons in the second hidden layer (with tan-sigmoid activation functions) and one neuron in the output layer (with a linear activation function).

Before we proceed with the calculation of the pseudo-weights for each of these four models, we will present the estimates for the connection weights and biases. It is worth mentioning that an estimated model with significant forecasting improvements in terms of MSPE does not necessarily involve significant improvements in PERC, and vice versa. Consequently, the sets of weights (and pseudo-weights) and biases for different estimated models vary. In Tables 3.5-3.8 we report the connection weights and biases for our models (using notation from chapter 2). These weights are used for the calculation of pseudo-weights for inputs.

Table 3.5 Estimated ANN's connection weights and node biases (Model 1).

Layer: node biases and connection weights
Hidden 1: 9.8549, -2.9391, 2.0196, 0.0827, 3.6983, -13.2714, -3.5306, 0.7513, -1.8734, -1.4176, -8.6814, 15.5297, 2.8011, -2.2898, -1.8303, 0.5049, 12.5395, 1.2344, -1.9592, -0.0203, -3.3835, -9.8746, -2.5924, 0.4338, 0.3805, -2.0998, 3.7493, -1.1381, -11.5194, -0.8059
Hidden 2: -0.8239, -1.1924, -1.3401, -1.7941, -2.7949, 6.4532, -1.6048, -2.6335, 0.8536, 1.2977, 1.5261, 0.9693, 0.6845, 0.2403, 2.8913, 2.3210, -0.6332, -2.6727, -0.1492, -1.6125, -0.4865, 1.9818, 2.8745, -1.2680, 0.1582, 0.2500, -0.8945, -2.1805, 3.0275, 1.9737
Output: 0.3806, -0.0727, 0.3939, 0.4582, 0.6165, -0.2192

Table 3.6 Estimated ANN's connection weights and node biases (Model 2).
Layer: node biases and connection weights
Hidden 1: -5.9156, -0.6221, 0.1574, -0.1634, 1.3747, -2.0194, 8.2915, -0.0841, 0.1372, -0.8574, 0.2101, 5.1096, -2.0364, -0.2606, -1.8518, -0.6910, 1.1828, -2.7705
Hidden 2: -2.6056, 3.9978, 1.7636, 4.6114, 3.4562, 2.8768, 1.4485, -2.2000, -2.8342, 3.1793, 1.8361, 0.3405, 3.0117, 2.4207, -2.5241, 2.1962, -2.4863, -2.5259, 3.0074, -4.5949
Output: -1.0566, -0.7508, 0.2607, 0.3536, -0.1869, -0.9336

Table 3.7 Estimated ANN's connection weights and node biases (Model 3).

Layer: node biases and connection weights
Hidden 1: 11.8174, 0.7001, 2.2284, -0.4306, -3.9251, 4.1052, 7.8997, 3.5333, -1.7588, -0.4000, 1.3543, 7.3423, -5.3092, -0.6804, -0.6134, 2.3290, 0.3758, -2.8190, -4.6142, 0.1774, 1.6841, -0.3809, -0.0554, -4.5682, -10.4674, -0.4825, 2.8461, -0.6674, -0.1229, -9.5688
Hidden 2: 1.8518, 0.8631, -0.4264, -1.1182, 1.6506, -1.6529, 1.9171, 1.9706, 0.4564, 1.5809, -0.7884, -2.6040, 0.3869, 2.0378, 1.0049, -0.7275, 0.8327, 0.0153, 0.2741, 1.4992, 1.3473, -0.9813, 1.8014, -1.0848, 1.4491, 0.0813, -1.9059, -0.9378, -0.9777, 1.3190
Output: -0.3047, 0.1453, -0.7193, -0.1458, 0.5370, -0.1516

Table 3.8 Estimated ANN's connection weights and node biases (Model 4).
Layer: node biases and connection weights
Hidden 1: -0.0332, -1.0923, -5.3950, -1.0351, -0.5692, 0.3137, -2.8995, 1.8290, -0.1662, -0.8907, -0.4239, -6.0596, -2.0041, -1.9276, -0.8401, -0.3912, 1.5673, 1.1367, 5.0291, -1.7058, -2.4187, 0.9187, -1.2042, 7.3801, -0.1283, -1.8204, 0.5878, -7.3858, -0.5503, -9.0746
Hidden 2: -1.8943, -0.8410, 0.2635, 0.5640, -1.6569, 2.3129, 0.9927, -0.6511, 0.3522, 0.5967, 1.3380, -1.0230, 1.4946, -1.6408, 0.1485, 0.5120, -0.1589, -1.2897, 0.3754, 1.0767, -0.0273, -0.2060, -1.7259, -0.2092, 0.2087, -1.1260, 1.3149, -1.0367, -0.0312, 0.2953
Output: -0.4576, 1.5937, -0.3803, 0.0371, -0.2552, -0.3593

Table 3.9 lists four sets of pseudo-weights, one for each considered model. Regardless of the statistic used for comparison, the sum of the absolute values of the pseudo-weights for the order flows is always greater than the sum of the absolute values of the pseudo-weights for the interest rate differential and crude oil price. This is very strong evidence in favour of our conjecture that order flows are highly important for non-linear exchange rate modeling and forecasting. To support our conclusion about pseudo-weights we also report (from Tables 3.5-3.8) that there is consistently more node bias on the nodes topologically more proximate to the order flow inputs. Clearly, more node bias on these nodes indicates more impact of order flows on the exchange rate forecasts.

Table 3.9 Relative contribution (pseudo-weights) of inputs for the ANN models.

Input                       1. ANNBP1    2. ANNBP2    3. Elman1    4. Elman2
CC                          -9.9739      -0.6871      -13.0805     -3.2867
FB                          -2.1131      1.1735       -5.3067      -1.7364
IB                          0.3797       10.6137      -0.0916      -2.9237
intdiff                     3.6768       4.7441       0.3761       0.2750
oil                         -0.1070      -6.1089      0.8910       -1.2431
|order flows contribution|  12.4667      12.4743      18.4788      7.9477
|macro contribution|        3.7838       10.8530      1.2671       1.5181
In models 1 and 3 we see that when we aim for minimum MSPE, CC and FB order flows are more important than IB order flow. To the contrary, when we use the PERC statistic as a measure of forecast performance, IB order flow becomes more significant in a "non-linear" fashion. It must be stressed that IB order flow accounts for most of the currency trading volume and that directional movements are more important for trading purposes. It appears that agents' expectations about future exchange rate sign movements are related to currency trading and originate in IB order flow, i.e., agents behave more like traders. Other types of order flows play a role in a more traditional forecasting framework based on the MSPE. Also, it is possible, since our models are estimated based on more than ten years of data, that the true DGP is changing over time and there is a degree of model misspecification involved (a few turning points are missed). Lastly, from models 3 and 4 we see that historical IB order flow values (as well as past interest rates) are less important for both the MSPE and PERC statistic.

The sign effect of order flows on the exchange rate can be inferred from the following regression:

$$\Delta p_t = \beta_1 CC_t + \beta_2 FB_t + \beta_3 IB_t + \eta_t$$

where $\Delta p_t$ is the change in the logarithm of the Canada/U.S. spot rate, and $CC_t$, $FB_t$ and $IB_t$ are order flows, as defined previously. We estimate this model, first, for the full data set, and then for the episodes when the Canadian dollar was highly volatile (subsample: November/December 1994 and July/August 1998).
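A regression of this form, with three regressors and no intercept, can be estimated by ordinary least squares. The following is a minimal sketch on synthetic data, assuming conventional homoskedastic standard errors for the t-values. The function name, the simulated order flows and the "true" coefficients are hypothetical and are unrelated to the data used in this thesis.

```python
import numpy as np


def ols_no_intercept(X, y):
    """Least squares without a constant; returns coefficients and t-values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Residual variance with a degrees-of-freedom correction.
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_values = beta / np.sqrt(np.diag(cov))
    return beta, t_values


# Synthetic stand-ins for CC, FB and IB order flows (hypothetical data).
rng = np.random.default_rng(0)
flows = rng.normal(size=(500, 3))
true_beta = np.array([-0.6, 1.4, -1.3])
dp = flows @ true_beta + 0.1 * rng.normal(size=500)

beta_hat, t_hat = ols_no_intercept(flows, dp)
print(beta_hat)
print(t_hat)
```

The sign of each estimated coefficient, together with its t-value, is what the discussion of appreciation and depreciation effects relies on.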
The least squares estimated equations (with t-values listed after each equation, in the order of the regressors) are

(full sample: 2559 observations)

$$\Delta p_t = -5.95 \times 10^{-7}\, CC_t + 1.40 \times 10^{[\text{illegible}]}\, FB_t - 1.26 \times 10^{[\text{illegible}]}\, IB_t, \qquad R^2 = 0.268$$

t-values: (-6.13), (21.58), (-9.89)

(subsample: 80 observations)

$$\Delta p_t = 1.17 \times 10^{-7}\, CC_t + 1.55 \times 10^{-6}\, FB_t - 6.12 \times 10^{-7}\, IB_t, \qquad R^2 = 0.179$$

t-values: (0.13), (3.00), (-0.87)

For the full sample, the coefficient on the interbank order flow is highly significant and the sign of the coefficient is as would be expected: net purchases of Canadian dollars by commercial banks should result in a decrease in the Canada/U.S. exchange rate, i.e., lead to an appreciation of the Canadian dollar. Commercial client transactions produce the same effect on the exchange rate as interbank flows, and the coefficient on the variable is statistically significant. By contrast, the estimate of the coefficient on the foreign domiciled financial institution flow has a positive sign (and is statistically significant), suggesting that commercial bank purchases of Canadian dollars from this type of counterparty relate to a depreciation of the Canadian dollar.

The above conclusions are slightly different for the reduced sample. Neither the coefficient on commercial client transactions nor the one on interbank transactions is statistically significant. The only significant estimate is the coefficient on the foreign domiciled financial institution transactions. Moreover, commercial client transactions are found to have the opposite effect on the exchange rate: net purchases of Canadian dollars led to a currency depreciation. Our conjecture is that a two-stage version of "hot potato" trading took place in this instance. More precisely, in the second stage, dealers tried to induce FX customers, who appeared to have provided liquidity by buying Canadian dollars, to absorb their inventory imbalances.
A simple look at the first-stage data reveals that the number of purchases of Canadian dollars from the foreign domiciled financial institutions was almost three times greater than the average for the full sample. This increased inventory had to be channeled through other dealers and, eventually, customers. The data support this hypothesis: averages of CC and IB order flows are negative and well below averages for the full sample. Thus, foreign domiciled financial institutions provided instability by selling Canadian dollars, while commercial banks provided both stability (buying from foreign domiciled financial institutions) and instability (excessive selling to other commercial banks). Commercial clients provided the currency stability and liquidity during the periods of strong depreciation. It would be useful to gain more insight into the trade volume figures, but, unfortunately, data were not available.

To further investigate the pattern of response of the ANNBP2 network (the one with the best performance in terms of PERC) to the order flows for the whole sample, we plot response functions for IB, CC and FD in Figure 3.3. While the values of the other inputs are held constant at the following out-of-sample values (CC: -0.7314; FB: -0.1020; IB: 0.5284; oil: 0.3898; interest rate: 0.3871), the order flows are individually changed, and the response of the exchange rate to each order flow is recorded.¹⁸ The order flows take values from 0.1 to 3.60 (360 values; increment value is 0.01) and all of the other variables are normalized to zero mean and unit variance.

Figure 3.3 Non-linear pattern of response (ANNBP2 model). [Plot; horizontal axis: FD order flow.]

¹⁸ In some cases, setting the inputs to other out-of-sample values can change the pattern of responses, but typical responses are as shown for the above values.
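The response-function exercise can be sketched as follows: all inputs are fixed at baseline values, one input is swept over a grid, and the model output is recorded at each grid point. In this sketch the trained network is replaced by a hypothetical toy function (`toy_model`, with made-up weights), the input ordering is an assumption, and the grid is assumed to start at 0.01 so that it contains 360 points with an increment of 0.01.

```python
import numpy as np


def response_curve(model, baseline, index, grid):
    """Vary one input over `grid`, holding the others at `baseline`."""
    baseline = np.asarray(baseline, dtype=float)
    inputs = np.tile(baseline, (len(grid), 1))   # one copy of baseline per grid point
    inputs[:, index] = grid                      # overwrite the swept input only
    return np.array([model(row) for row in inputs])


def toy_model(x):
    """Hypothetical stand-in for a trained network (weights are made up)."""
    w = np.array([-0.4, 0.3, -0.8, 0.1, -0.2])
    return float(np.tanh(x @ w))


# Baseline values quoted in the text; assumed order: CC, FB, IB, oil, interest rate.
baseline = [-0.7314, -0.1020, 0.5284, 0.3898, 0.3871]
grid = np.linspace(0.01, 3.60, 360)              # assumed grid: 360 points, step 0.01

ib_response = response_curve(toy_model, baseline, index=2, grid=grid)
print(ib_response[:3])
```

Plotting `ib_response` against `grid` would give a curve analogous to one panel of a response-function figure, here for the toy model only.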
The responses are non-linear and, as in the "linear case," net purchases of Canadian dollars by commercial banks from Canadian commercial banks and commercial clients result in a decrease in the Canada/U.S. exchange rate, i.e., lead to an appreciation of the Canadian dollar. Commercial bank purchases of Canadian dollars from foreign domiciled financial institutions relate to a depreciation of the Canadian dollar. Also, the effects of IB and FD order flows on the exchange rate appear to be stronger than is the case with transactions of commercial clients. This is consistent with the values of pseudo-weights for the ANNBP2 model.

Taking the data for the 1994 depreciation episode, the typical responses are different, but their directions (signs) are somewhat related to the intuition of the above linear exercise. Here, all of the inputs are set to -0.2 and again the responses can be substantially different for some other values of the input variables.

Figure 3.4 Non-linear pattern of response for the 1994 depreciation episode (ANNBP2 model). [Plot; horizontal axis: FD order flow.]

Figure 3.4 shows that net purchases of Canadian dollars from foreign domiciled financial institutions and commercial clients are strongly correlated with both the appreciation (for smaller order flow values) and depreciation (for larger order flow values) of the Canadian dollar in a non-linear fashion. To the contrary, interbank flows are weakly (non-linearly) related to the appreciation of the Canadian currency during this depreciation episode.

We find it worthwhile to note that in the previous analyses we assumed that order flows are exogenous, i.e., that there is no feedback effect from the price to the order flows (sometimes referred to as "distressed selling"). Killeen, Lyons and Moore (2001) found that for the Germany/France exchange rate Granger causality ran from order flow to price, and not vice versa.
However, in the case study on the collapse of the U.S./Japan exchange rate in 1998, Lyons (2001) found some evidence of falling prices inducing additional selling. If the effects we find in two depreciation episodes of the Canada/U.S. exchange rate are indeed due to "distressed selling" of the Canadian currency, it would be important to model and identify them. This is a difficult task, which we consider to be beyond the scope of this thesis, and thus leave it to further research.

3.3.3 Models based on individual and aggregate order flows when not controlled for the day-of-the-week effects¹⁹

We investigate the robustness of the forecasting performance of ANN models in this section. The following models are considered:

Random walk model (RW):
$$rpfx_t = \alpha_0 + rpfx_{t-j} + \varepsilon_t, \quad t = 1, \ldots, N.$$

Linear model (LM1):
$$\Delta rpfx_t = \beta_0 + \beta_1 \Delta intdiff_{t-j} + \beta_2 \Delta oil_{t-j} + \beta_3\, aggof_{t-j} + \varepsilon_t, \quad t = 1, \ldots, N.$$

Linear model (LM2):
$$\Delta rpfx_t = \beta_0 + \beta_1 \Delta intdiff_{t-j} + \beta_2 \Delta oil_{t-j} + \beta_3\, CC_{t-j} + \beta_4\, IB_{t-j} + \beta_5\, FD_{t-j} + \nu_t, \quad t = 1, \ldots, N.$$

ANN Model 1:
$$\Delta rpfx_t = f(\Delta intdiff_{t-j}, \Delta oil_{t-j}, aggof_{t-j}) + \varepsilon_t, \quad t = 1, \ldots, N.$$

ANN Model 2:
$$\Delta rpfx_t = g(\Delta intdiff_{t-j}, \Delta oil_{t-j}, CC_{t-j}, IB_{t-j}, FD_{t-j}) + \nu_t, \quad t = 1, \ldots, N.$$

$$j = \{1, 7\}; \quad t = 1, \ldots, N.$$

¹⁹ Quoted from Gradojevic, N. and J. Yang. 2000. "The Application of Artificial Neural Networks to Exchange Rate Forecasting: The Role of Market Microstructure Variables." Bank of Canada Working Paper No. 2000-23 (December 2000).

Table 3.10 presents linear regression estimation results for these models based on the first 2,005 observations (initial estimation set). The impact of interest rate change is more significant for lower frequency models (j=7), while the estimator of oil price change is more significant for one-day-ahead forecasting. Also, order flows are more important for higher frequency forecasting. Even though it is very small, as expected, the R² increased when individual order flows were taken into account.²⁰

²⁰ This thesis uses daily data over a ten-year period (as opposed to the four-month span used by Lyons and Evans 2002); therefore, the linear "microstructure" model's R² is significantly lower.

Table 3.10 Estimation Results For Linear Models. Estimates (standard error).

Model                       Linear Model 1 (j=1)   Linear Model 1 (j=7)   Linear Model 2 (j=1)   Linear Model 2 (j=7)
β0 (exp 10^-5)              8.09 (2.93e-05)        36.53 (7.08e-05)       8.72 (2.93e-05)        36.35 (7.15e-05)
Δintdiff(t-j) (exp 10^-4)   -1.13 (0.00025)        -6.76 (0.00026)        -1.89 (0.00025)        -6.63 (0.00026)
Δoil(t-j)                   -0.0090 (0.0065)       -0.003 (0.0073)        -0.0087 (0.0065)       -0.003 (0.0073)
aggof(t-j) (exp 10^-7)      -1.35 (8.96e-08)       -3.74 (2.16e-07)
CC(t-j) (exp 10^-7)                                                       1.33 (1.28e-07)        -4.55 (3.11e-07)
IB(t-j) (exp 10^-7)                                                       -1.015 (1.69e-07)      -2.66 (4.07e-07)
FD(t-j) (exp 10^-7)                                                       -3.86 (8.98e-08)       -3.78 (2.16e-07)
R²                          0.0021                 0.005                  0.0049                 0.0055

Two above-specified non-linear models (ANN 1 and ANN 2) were estimated by feedforward backpropagation ANNs. Figure 3.5 shows the errors related to the training set, the validation set, and the testing set. As expected, all three errors decline during the learning process. Overtraining was prevented by stopping the training process when the validation set error started to increase.

Figure 3.5 ANN errors: training, validation, and testing. [Plot; series: Training, Validation, Testing; horizontal axis: Epoch (cycles) trained.]

Full sample estimation of 2,230 observations was used to compare the ANN and linear model's performance. Figure 3.6 shows that the linear model forecasts in a linear fashion (for an arbitrary 90-day period), whereas the ANN forecasts more in keeping with the pattern of actual exchange rate changes.

Figure 3.6 An illustration of linear and ANN model exchange rate forecasts. Actual values are denoted by circles.
[Plot; series: Linear model, ANN model; vertical axis: Output; horizontal axis: Sample size.]

After the initial estimation of the models in the first 2,005 observations, a set of out-of-sample forecasts was used to generate RMSEs. Each recursive re-estimation added 10 observations, so that 18 RMSEs were calculated on out-of-sample data sets ranging in size from 225 to 55 observations. This led to the selection of an ANN model 1 and ANN model 2 for one-day-ahead (j=1) and for one-week-ahead (j=7) forecasts of exchange rate changes, which were compared to linear models 1 and 2 and the random walk model. Figures 3.7, 3.8, 3.9, and 3.10 show that the ANN can produce promising short-run forecasts, since the RMSE for the ANN model for a given forecasting horizon is equal to or below both of the competing models.

Figure 3.7 RMSE for ANN model 1, linear model 1, and random walk (j=1). [Plot; series: RW, Linear model, ANN model; horizontal axis: Testing set size (step 10); 1 = size 225, 18 = size 55.]

Figure 3.8 RMSE for ANN model 2, linear model 2, and random walk (j=1). [Plot; series: RW, Linear model, ANN model; horizontal axis: Testing set size (step 10); 1 = size 225, 18 = size 55.]

Figure 3.9 RMSE for ANN model 1, linear model 1, and random walk (j=7). [Plot; series: RW, Linear model, ANN model; horizontal axis: Testing set size (step 10); 1 = size 225, 18 = size 55.]

Figure 3.10 RMSE for ANN model 2, linear model 2, and random walk (j=7).
[Plot; series: RW, Linear model, ANN model; horizontal axis: Testing set size (step 10); 1 = size 225, 18 = size 55.]

Tables 3.11 (for j=1) and 3.12 (for j=7) list the RMSE statistics illustrated in these figures.

Table 3.11 RMSE (exp 10^-3) for ANN, linear, and random walk models (j=1).

Sample size   Random Walk   Linear Model 1   ANN Model 1   Linear Model 2   ANN Model 2
225           1.4321        1.4364           1.4270        1.4331           1.4216
215           1.4168        1.4201           1.4121        1.4155           1.4060
205           1.4232        1.4272           1.4163        1.4218           1.4119
195           1.4216        1.4240           1.4143        1.4188           1.4114
185           1.3962        1.4020           1.3871        1.3969           1.3892
175           1.3702        1.3782           1.3580        1.3717           1.3631
165           1.3482        1.3564           1.3353        1.3475           1.3428
155           1.3368        1.3442           1.3237        1.3362           1.3322
145           1.3381        1.3464           1.3238        1.3388           1.3337
135           1.3639        1.3708           1.3496        1.3622           1.3622
125           1.3696        1.3773           1.3540        1.3691           1.3671
115           1.3926        1.4026           1.3739        1.3933           1.3903
105           1.3057        1.3109           1.2963        1.3008           1.2907
95            1.3335        1.3380           1.3230        1.3256           1.3133
85            1.3652        1.3673           1.3515        1.3578           1.3508
75            1.3308        1.3353           1.3163        1.3243           1.3198
65            1.3851        1.3882           1.3738        1.3761           1.3743
55            1.4168        1.4202           1.4070        1.4156           1.4155

Table 3.12 RMSE (exp 10^-2) for ANN, linear, and random walk models (j=7).

Sample size   Random Walk   Linear Model 1   ANN Model 1   Linear Model 2   ANN Model 2
225           0.3457        0.3461           0.3458        0.346            0.3448
215           0.3467        0.3474           0.3461        0.3473           0.3462
205           0.3504        0.3516           0.35          0.3515           0.3502
195           0.3431        0.3435           0.3427        0.3437           0.3428
185           0.3358        0.3367           0.335         0.3368           0.3358
175           0.3371        0.3381           0.3364        0.3382           0.3365
165           0.3333        0.3336           0.3328        0.3338           0.3331
155           0.3377        0.3381           0.337         0.3382           0.3373
145           0.3286        0.3308           0.3253        0.3307           0.3278
135           0.3374        0.3396           0.3338        0.3396           0.3366
125           0.3441        0.3461           0.3408        0.3462           0.3438
115           0.3554        0.3576           0.3518        0.3577           0.3553
105           0.3328        0.3351           0.33          0.3349           0.3326
95            0.3384        0.3404           0.3355        0.3404           0.3384
85            0.3523        0.3544           0.3491        0.3544           0.3521
75            0.3657        0.3682           0.3619        0.3683           0.3656
65            0.3768        0.3787           0.3743        0.3790           0.3764
55            0.3882        0.3894           0.3878        0.3895           0.3868

The experiments show that the ANN model forecasts one-day and seven-day-ahead exchange rate changes better than the linear and random walk models. Nevertheless, the primary indicator of good forecasting power is not necessarily RMSE, but the percentage of correctly forecasted directions of real exchange rate fluctuations. In this case, the estimation involves very small values (exp 10^-3) that might result in small RMSEs. In turn, the presence of small RMSEs is not a guarantee that the prediction is accurate, and caution is required when interpreting the estimation results.

As noted above, the percentage of correctly forecasted exchange rate direction changes or good hits (PERC) is also considered. Recursive regression for horizons between 5 and 225 observations (step 5) reveals the superiority of the ANN model.
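The two comparison criteria can be written down compactly. This is a minimal sketch, assuming that PERC counts a forecast as a good hit when the sign of the forecasted change equals the sign of the actual change; the function names and the toy numbers are hypothetical.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean squared forecast error."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def perc(actual, forecast):
    """Percentage of forecasts with the same sign as the actual change."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    hits = np.sign(actual) == np.sign(forecast)
    return float(100.0 * np.mean(hits))


# Toy exchange rate changes and forecasts (hypothetical numbers).
actual_changes = [0.1, -0.2, 0.3, -0.4]
forecast_changes = [0.2, -0.1, -0.3, -0.5]

print(rmse(actual_changes, forecast_changes))
print(perc(actual_changes, forecast_changes))
```

The toy example shows why the two criteria can disagree: a single forecast with the wrong sign lowers PERC by a fixed amount regardless of its size, whereas its effect on RMSE depends on the magnitude of the error.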
ANN model 1 (2) correctly predicted, on average, 60.14 per cent (61.81 per cent) of the direction of daily exchange rate movements, while linear model 1 (2) correctly predicted 57.18 per cent (58.75 per cent) of such changes, and the random walk model predicted 54.88 per cent. For one-week-ahead forecasts (j=7), ANN model 1 and linear model 1 perform worse than the random walk, but ANN model 2 has the best results. Also, the predictive power of both non-random walk models is lower. Table 3.13 compares all the models used in terms of the second comparison criterion.

Table 3.13 The average percentages of correctly predicted signs for linear models 1 and 2 (LM 1 and LM 2), ANN models 1 and 2 (ANN 1 and ANN 2), and the random walk model. One-day (j=1) and one-week (j=7) forecasts are considered.

AVERAGE PERC (%)   Random Walk   LM 1    ANN 1   LM 2    ANN 2
j=1                54.88         57.18   60.14   58.75   61.81
j=7                56.26         54.9    56.15   55.28   58.04

21 Step 5 is used instead of step 10 in the recursive PERC comparison to impose a more demanding setting for ANN models.

Figure 3.11 illustrates the one-day results summarized in Table 3.13. The results show that the ANN models dominate in predicting the direction of exchange rate changes one day ahead.

Figure 3.11 Example: recursive estimation PERC comparison for ANN model 2, linear model 2, and random walk model (1-day forecast). [Plot; series: Linear model, ANN model, RW; horizontal axis: Testing set size (step 5); 1 = size 225, 45 = size 5.]

Next, out-of-sample, one-step-ahead forecasts are considered. More precisely, ANN model 1 (2) is initially estimated for the first 2,006 observations. The forecast errors for the remaining 225 observations (a testing set) are calculated by extending the estimation set by one and recalculating the forecast errors until the whole testing set is exhausted.
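The expanding-window scheme described here can be sketched generically: before each forecast the model is re-estimated on all observations up to t-1, and the forecast for t is then produced. In the sketch below a linear least squares model stands in for the ANN purely to keep the example self-contained; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Stand-in estimator (assumption): ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_linear(coef, x):
    """One-step forecast from the stand-in model."""
    return float(x @ coef)


def recursive_one_step(X, y, n_initial, fit, predict):
    """Re-estimate on observations 0..t-1, then forecast observation t."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    forecasts = []
    for t in range(n_initial, len(y)):
        params = fit(X[:t], y[:t])               # estimation set grows by one each step
        forecasts.append(predict(params, X[t]))  # X[t] must hold only lagged information
    return np.array(forecasts)


# Synthetic, noise-free data (hypothetical), so forecasts should be exact.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = X @ np.array([0.5, -1.0])

forecasts = recursive_one_step(X, y, n_initial=50, fit=fit_linear, predict=predict_linear)
print(len(forecasts))
```

Passing `fit` and `predict` as arguments keeps the loop independent of the model class, so any estimator with these two steps could be substituted.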
This differs from the preceding forecast experiment in that the earlier experiment did not re-estimate the model up to t-1 to forecast the exchange rate at t.

RMSEs and PERCs for the one-day-ahead forecasts are listed in Table 3.14. The striking result here is that ANN 2 correctly predicts almost 72 per cent of the directions of future exchange rate changes, while the random walk model stays at about 55 per cent accuracy.

Table 3.14 PERC and RMSE (exp 10^-3) statistics for the recursive estimation over the whole testing set (k=225). ANN models 1 and 2 (ANN 1 and ANN 2) and the random walk (RW) model for one-day-ahead (j=1) forecasts are considered.

Model        RW       ANN 1    ANN 2
PERC (%):    54.88    67.56    71.56
RMSE:        1.4321   1.4155   1.3988

To determine the percentage of correctly predicted changes (or good hits) that relates to positive changes, the following statistic was constructed for the initial testing sample size (k=225):

PERC(POS) = (number of positive correct responses / number of sample positive movements)

Similarly, for negative good hits another statistic was calculated:

PERC(NEG) = (number of negative correct responses / number of sample negative movements)

The term "positive changes" refers to values above the mean of estimation sample changes, while "negative changes" are values below the mean value. This corrects for the fact that there is a significantly greater number of positive changes in this sample. Taking zero as a mean value would affect the reliability of the criterion, since there were mostly positive changes in the sample. According to Table 3.15, the ANN models forecast positive and negative changes roughly equally well. In comparison, failing to correct for the positive mean change would lead to the erroneous conclusion that the model predicts positive changes much better than negative changes.

Table 3.15 PERC(POS) and PERC(NEG) for ANN models 1 and 2 (ANN 1 and ANN 2).
One-day (j=1) and one-week (j=7) forecasts are considered (k=225). Percentages without normalization are in parentheses.

Model                   ANN 1            ANN 2
PERC(POS) (%): j=1      42.98 (89.43)    38.02 (80.49)
PERC(POS) (%): j=7      57.02 (98.39)    53.04 (98.39)
PERC(NEG) (%): j=1      48.08 (2.05)     67.31 (36.27)
PERC(NEG) (%): j=7      41.44 (0.99)     56.36 (2.97)

3.4  Conclusions

With this exercise, we conduct a study of exchange rate models for the Canada/U.S. exchange rate. More specifically, we focus on their intra-day (high-frequency) and, subsequently, weekly forecast performances using a set of non-linear microstructure models. We find strong evidence for the microstructure effects. Our horse race for forecast performance results in a non-linear ANN model as the winner. ANN models are able to significantly improve upon a simple random walk model. The daily forecasts produced by ANN models are statistically significant according to Diebold and Mariano (1995) statistics. Apart from the NN model, other linear and non-linear models are unable to generate significant predictions. The results also indicate the necessity of embodying (in a non-linear sense) information not only from interbank order flows, but also from CC and FD transactions. No matter which non-linear model is used, there is always a slight forecast gain when the dealer's private order flows are included in the set of explanatory variables.

In our second experiment, which includes the whole data set, the ANN is consistently better in terms of RMSE than the random walk and linear models for the various out-of-sample experiments. Moreover, the ANN performs on average at least 3 per cent better than other models in its percentage of correctly predicted signs. This is true for both of the forecasting horizons. As expected, more accurate forecasts are generated for the shorter forecasting window, but the longer-horizon forecasts are still superior to those of the random walk model.
Recursive one-step-ahead forecasts lead to a considerable improvement in PERC compared to the random walk and linear models.

These findings point to important implications for the models for exchange rate determination. Particularly, the Lyons and Evans (2002) partial equilibrium model could be extended to encompass these non-linear and microstructure effects. Balancing the tension between macroeconomic and microstructure variables is crucial. The question of which macroeconomic variables to use remains open for further research, as only prices make sense for high-frequency models. Similarly, the panel of microstructure variables can be extended to a set of non-fundamentals which would account for the bandwagon effect, over-reaction to news, speculation, etc. However, these factors are not easy to quantify and we can only rely on different proxies for them. To conclude, the "proper" selection of both macroeconomic and microstructure factors can improve these results. Lastly, the power of this approach can be tested on other currencies.

Chapter 4 An Introduction to Fuzzy Logic

4.1  Introduction

Advanced modeling techniques such as artificial neural networks (ANN), fuzzy logic controllers (FLC), and genetic algorithms (GA) can be applied to a vast variety of applications in finance and economics. Through hybridization of these techniques, more complex problems can be addressed. This thesis combines two of the most popular concepts: ANN and FLC.

Fuzzy logic represents a technology for designing sophisticated control systems (Cox 1992). It provides a convenient solution to the most complex modeling requirements. Fuzzy technology in the form of approximate reasoning is a method of easily representing analog processes on a digital computer. These processes involve continuous phenomena defined by imprecise terms such as "rapidly changing," "significant risk," "very cost effective," and "large capital commitment."
The difficulty of modeling these concepts, mathematically or by sets of rules, requires a tool that is nonlinear and tolerant of imprecise data. As an example, mutual fund managers' investment decisions (Peray 1999) are subject to the above-mentioned problems. The rules to be used for equity fund investment decision-making could take into account the following variables: economic conditions, price-to-earnings ratio (P/E ratio), and the gap from the moving average (MA gap).²² Continuous variables, such as these, might be characterized by a number of different states. For instance, the variable "economic conditions" can be explained by a range of states: "very bad," "good," "extremely good," "solid," etc. The transition from one state to another is not precisely defined and it is very hard to say whether an increase in GNP of two percent would cause a "fairly good" economy to change to a "good" economy. Consequently, the idea of what is "fairly good" and what is "good" is subject to different interpretations by different economists observing the same value of a variable. The subjectivity emerging here is resolved by introducing implications, i.e., the fuzzy inference method to continuous systems modeling. Fuzzy inference interprets the values (imprecise linguistic terms such as "good" or "bad") as the states of the variable. Then, these linguistic terms could constitute a fuzzy trading rule for the mutual fund manager such as:

IF economic conditions are bad AND the P/E ratio is high AND the MA gap is high, THEN the trading action is strong sell.

²² The gap from the moving average is the percent difference between the current market index level and the weighted moving average. Since the weighted moving average is the trend line for the market index, the gap is thus an indicator of the market moving ahead or behind the trend.

Fuzzy logic rules follow the natural language and are close to human reasoning.
They also "cover" a very broad fuzzy variable range, which is why one fuzzy rule can substitute for many conventional rules. In addition, since fuzzy logic creates a control surface by combining rules and fuzzy variable states, system control can be achieved even though the mathematical description of the system's behavior is incomplete. To sum up, fuzzy logic can be used when dealing with continuous and imprecise variables, when a mathematical model of the process is unknown, when the input-output relation is nonlinear, and, finally, when the set of rules that explains the input-output dependence is available.

There has been a growing literature on FLCs and their use in financial economics. Many of these examples, such as ones for portfolio management and stock market trading, can be found in Deboeck (1994). Further, Bojadziev (1997) uses FLC for evaluating a client's risk tolerance level based on his/her annual income and total net worth. Peray (1999) determines an opportunity for equity fund investments using well-established technical indicators (e.g., the gap from the moving average) and market fundamentals (GDP, inflation rate, interest rate, etc.). Also, fuzzy logic can be used to explain non-linearities in interest rates (Ju et al. 1997). In Japan, fuzzy logic is used in a foreign exchange trading system to forecast the Japan/U.S. exchange rate. This system uses fuzzy logic rules to make inferences based on economic news that may affect the currency market (Yuize 1991). Recently, Tseng et al. (2001) integrated fuzzy and ARIMA models to forecast the Taiwan/U.S. exchange rate.

4.2  Fuzzy System

The conventional control system and the fuzzy system are quite alike (Cox 1993). The only difference is that a fuzzy system contains a "fuzzifier," which converts inputs into "fuzzy variables," and a "defuzzifier," which converts the output of a fuzzy control process into a numerical output value. The fuzzification and defuzzification processes are explained below.
In a fuzzy system, the process of generating the output (control) begins with taking the inputs, fuzzifying them, and then executing all the rules from the rule base that are active. The active rules' outputs are aggregated into a single output and, after defuzzification, a new output is generated (Figure 4.1; Source: Cox 1993).

Figure 4.1. Fuzzy control system. [Block diagram; recoverable labels: normalization and fuzzification of input variables; execution of rules (inference engine).]

Fuzzy control system design is composed of four major steps:

1. Define the model's functional and operational characteristics: This step comprises the definition of the system architecture, data transformation requests, inputs, outputs, and the position of the fuzzy system within the overall system.

2. Define the control surfaces (fuzzification): Each control and state variable is decomposed into a set of fuzzy regions (or states) called the "fuzzy sets". These fuzzy sets are assigned certain names from the set N (suppose N={"strong sell", "sell", "hold", "buy", "buy strong"}) that span the variables' domains. They do not have crisp, clearly defined boundaries. In the end, each fuzzy set is represented by its "membership function" (see the example in Figure 4.2). A membership function is a curve that defines how the points in the input space (elements of N) are mapped to a membership value (or degree of membership), a real number between 0 and 1. Mathematically, a fuzzy set $\mathcal{N}$ is defined by a set of ordered pairs (Bojadziev 1997):

$$\mathcal{N} = \{(x, \mu_{\mathcal{N}}(x)) \mid x \in N,\ \mu_{\mathcal{N}}(x) \in [0,1]\}$$

where $\mu_{\mathcal{N}}(x)$ denotes a membership function. Membership functions can be of various types: triangular, trapezoidal, Gaussian, sigmoidal, polynomial, etc. This paper uses the most common shapes, triangular and Gaussian, to build a fuzzy inference system. It should be noted that larger values of a membership function indicate higher degrees of membership.
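The two membership function shapes used in this paper can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the five evenly spaced triangular sets on [-1, 1], and all parameter values are hypothetical and are not taken from the text.

```python
import math


def triangular(x, left, peak, right):
    """Triangular membership: 0 at `left` and `right`, 1 at `peak`.

    Assumes left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def gaussian(x, mean, sigma):
    """Gaussian membership: 1 at `mean`, decaying with distance from it."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


# Hypothetical, evenly spaced triangular sets for "trading action" on [-1, 1].
# The feet and peaks below are illustrative assumptions, not values from the text.
TRADING_ACTION_SETS = {
    "strong sell": (-1.5, -1.0, -0.5),
    "sell": (-1.0, -0.5, 0.0),
    "hold": (-0.5, 0.0, 0.5),
    "buy": (0.0, 0.5, 1.0),
    "buy strong": (0.5, 1.0, 1.5),
}


def fuzzify(x, fuzzy_sets):
    """Return the degree of membership of x in every named fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in fuzzy_sets.items()}


if __name__ == "__main__":
    # A point between "hold" and "buy" belongs partly to both sets.
    for name, degree in fuzzify(0.25, TRADING_ACTION_SETS).items():
        print(f"{name:12s} {degree:.2f}")
    print(f"gaussian(0.25, mean=0.5, sigma=0.2) = {gaussian(0.25, 0.5, 0.2):.3f}")
```

Because neighbouring sets overlap, a single crisp value generally has a non-zero degree of membership in more than one set, which is what allows several rules to be active at once.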
Basic operations on fuzzy sets (A and B) are the following:

• equality (A=B if and only if, for every x, $\mu_A(x) = \mu_B(x)$),
• inclusion (A is included in B if and only if, for every x, $\mu_A(x) \le \mu_B(x)$),
• proper subset (A is a proper subset of B if and only if $\mu_A(x) \le \mu_B(x)$ for every x, and $\mu_A(x) < \mu_B(x)$ for at least one x),
• complementation (A and B are complementary if $\mu_A(x) = 1 - \mu_B(x)$),
• intersection ($\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x))$),
• union ($\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x))$).

Figure 4.2. The example for the relationship between variables, fuzzy sets and triangular membership functions for the variable "trading action". [Diagram: the variable "trading action"; its labels (fuzzy sets) SELL STRONG, SELL, HOLD, BUY, BUY STRONG; and the fuzzy set representation (triangular membership functions).]

3. Define behavior of control surfaces (rules): To define a link between the input and output variables, a rule base is created. Linguistic rules are of the form:

IF <x is A> THEN <y is B>

where x and y are scalar variables and A and B are linguistic values defined by fuzzy sets. The phrase "x is A" is called the antecedent or premise, while "y is B" is called the consequent or conclusion. Fuzzy rules form a fuzzy rule base. The number of rules varies with the number of control variables. The idea is to try to identify all the possible combinations of inputs. Thus, if there are three input variables, each described by five fuzzy sets, the required number of rules would be $5^3 = 125$. Fuzzified inputs cause some rules to be activated and to contribute to an overall output, which is calculated using the so-called "Mamdani inference".[23] Mamdani inference applies the min and max operators for the fuzzy AND (intersection) and OR (union) operators. It also requires that the output membership function is a fuzzy set (unlike "Sugeno inference", where the output is either constant or linear).
To illustrate this, suppose there is one numeric fuzzy input $u_0$ ($u_0 \in U$) with the corresponding fuzzy membership function $\mu_{u_i}(u)$. The output variable has a membership function denoted by $\mu_{v_i}(v)$, $v \in V$. The $i$-th active rule ($i = 1, \ldots, I$) would produce the following output:

$$\alpha_i = \mu_{u_i}(u_0) \wedge \mu_{v_i}(v) \qquad (4.1)$$

The overall output Z is a union of all membership functions ($\alpha_i$s):

$$Z = \bigcup_i \alpha_i \qquad (4.2)$$

[23] See Mamdani (1975).

4. Select the defuzzification method: The last step in fuzzy system design is to choose between several defuzzification methods and extract a discrete value z from Z. Perhaps the most popular and frequently used methods are the composite maximum and the composite moment, or centroid of area (or centre of gravity). Centroid of area returns the centre of gravity of the area under the curve of the aggregate fuzzy set Z (Jang and Sun 1995):

$$z_{COA} = \frac{\int_Z \mu_Z(z)\, z\, dz}{\int_Z \mu_Z(z)\, dz} \qquad (4.3)$$

where $\mu_Z(z)$ is the aggregated output membership function.

The whole process of a fuzzy decision-making system design for FX trading will be described in greater detail in Chapter 5.

Chapter 5 Neuro-Fuzzy Decision-Making in Foreign Exchange Trading and Other Applications

5.1  Neuro-Fuzzy Design

ANNs have been very successful in recognizing nonlinear patterns from noisy, high-frequency data and have been very useful forecasting tools. However, there has been criticism of their inability to transparently explain how a decision (forecast) is reached. Also, unlike with fuzzy logic, it is impossible to incorporate a priori information about the problem into the ANN system.

Fuzzy logic can quantify vague information and produce transparent decision-making logic. The drawbacks of fuzzy logic are the lack of a learning capability and the need for expert knowledge about the system. The aim of neuro-fuzzy (NF) systems is to combine the benefits of both approaches.
Such a combination can be achieved in several ways, which include: neural fuzzy inference systems (Jang 1993), fuzzy neural networks (Gupta and Rao 1994), and neuro-fuzzy combination. Neural fuzzy inference systems introduce a parallel architecture and learning capability to a fuzzy inference system. Each fuzzy rule is created using the ANN, and it is a data-driven process. Fuzzy neural networks embed fuzzy logic into the ANN by fuzzifying the learning algorithms. This paper uses a neuro-fuzzy combination where the ANN is used for identification (forecasting) and the FLC extracts the decision from the ANN's output.

Finding a trading rule with an FLC should not be confused with training an ANN. An ANN uses a learning algorithm to map input variables (e.g., lagged interest rate, lagged order flow) into output variables (e.g., exchange rate change). In other words, an ANN is a nonlinear and dynamic system that learns from known input-output combinations. ANNs have been shown to have very good forecasting ability, but lack explanatory capability. By contrast, FLCs have no training capability, and the mapping between inputs and output is generated from expert knowledge in the form of "if-then" rules. That is why it would be ideal to combine ANNs and FLCs to create a so-called neuro-fuzzy (NF) technology (Jang and Sun 1995, Nauck and Kruse 1997).

In this study, the ANN estimates one-day-ahead forecasts (Gradojevic and Yang 2000) of the magnitude of the exchange rate change (positive or negative).[24] Afterwards, that value is fuzzified on the interval $[-10^{-3}, 10^{-3}]$, which contains all of the FX rate change forecasts. We choose Gaussian-bell over triangular-shaped membership functions since they produce a relatively smooth input/output mapping. Each is defined by its mean and standard deviation. These parameters are set arbitrarily to slice the variable domain into overlapping Gaussian functions that have the same shape and the highest degree of membership at the mean value. The following functions are assigned to each exchange rate change state (fuzzy set): "VERY NEGATIVE (V-NEG)," "NEGATIVE (NEG)," "MEDIUM-NEGATIVE (M-NEG)," "WEAKLY NEGATIVE (W-NEG)," "STABLE," "WEAKLY POSITIVE (W-POS)," "MEDIUM-POSITIVE (M-POS)," "POSITIVE (POS)," and "VERY POSITIVE (V-POS)" (Figure 5.1.a).

[24] Model 1: $\Delta rpfx_t = \varphi(\Delta intdiff_{t-1}, \Delta oil_{t-1}, aggof_{t-1}) + \varepsilon_t$; Model 2: $\Delta rpfx_t = \psi(\Delta intdiff_{t-1}, \Delta oil_{t-1}, CC_{t-1}, IB_{t-1}, FD_{t-1}) + \nu_t$; where, as before, $\Delta rpfx$ is the daily change of the logarithm of the real Canada/U.S. exchange rate, $\Delta intdiff$ is the change in the differential between the Canada and U.S. nominal 90-day commercial paper interest rates, $\Delta oil$ is the daily change in the logarithm of the crude oil price, aggof is aggregate order flow, and CC, IB, FD are individual order flows. The data set covers the period between January 1990 and June 2000.

Figure 5.1.a. Gaussian membership functions for the variable "exchange rate change forecast".

Figure 5.1.b. Triangular membership functions for the variable "FX trader's action." [Plot; horizontal axis: trading strategy.]

The FX trader's action, or trading strategy, is also a fuzzy variable, with triangular fuzzy membership functions that represent nine fuzzy sets between -1 and 1: "VERY STRONG SELL (VS-SELL)," "STRONG SELL (S-SELL)," "MEDIUM SELL (M-SELL)," "WEAK SELL (W-SELL)," "HOLD," "WEAK BUY (W-BUY)," "MEDIUM BUY (M-BUY)," "STRONG BUY (S-BUY)," "VERY STRONG BUY (VS-BUY)".
These linear-shaped functions (Figure 5.1.b) can be interpreted as the "fuzzy utility functions" of a risk-neutral investor (the degree of satisfaction to the decision-maker with respect to the particular objective, e.g., expected target rate of return).[25] The fuzzy rule base contains nine simple and intuitive IF-THEN type rules that link a fuzzy input (the fuzzified exchange rate change forecast) and the FX trader's action:

• IF "exchange rate change forecast" is "V-NEG" THEN "FX trader's action" is "VS-SELL".
• IF "exchange rate change forecast" is "NEG" THEN "FX trader's action" is "S-SELL".
• IF "exchange rate change forecast" is "M-NEG" THEN "FX trader's action" is "M-SELL".
• IF "exchange rate change forecast" is "W-NEG" THEN "FX trader's action" is "W-SELL".
• IF "exchange rate change forecast" is "STABLE" THEN "FX trader's action" is "HOLD".
• IF "exchange rate change forecast" is "W-POS" THEN "FX trader's action" is "W-BUY".
• IF "exchange rate change forecast" is "M-POS" THEN "FX trader's action" is "M-BUY".
• IF "exchange rate change forecast" is "POS" THEN "FX trader's action" is "S-BUY".
• IF "exchange rate change forecast" is "V-POS" THEN "FX trader's action" is "VS-BUY".

[25] See Ramaswamy (1998) for further discussion on "fuzzy utility functions."

Figure 5.2 shows how the FX trading strategy is determined.

Figure 5.2. NF system for the FX trading. Two nonlinear models are used for forecasting. [Diagram: the inputs $\Delta intdiff$, $\Delta oil$ and aggof (Model 1) or $\Delta intdiff$, $\Delta oil$, CC, IB and FD (Model 2) feed the ANN, which outputs $\Delta rpfx$; the FLC converts this forecast into the trading strategy.]

The NF estimation process begins when a new input is introduced to the trained ANN, which produces the forecast of the change in the logarithm of the exchange rate on its output. This is illustrated in Figure 5.3 where, using the notation from Chapter 4, the ANN output is assumed to be $u_0 = 3.9 \cdot 10^{-4}$. In the next stage, this output becomes an FLC input and, when it is read, each rule that has any truth in its premise will be active. In this example, rules 7 and 9 are active. The rules use Mamdani inference as it is intuitive and well-suited to human input. Moreover, our computational requirements are very modest and there is no need for the Sugeno method, which is more computationally efficient. When a rule is active, its output, obtained by applying the min operator (equation 4.1), contributes to the overall FLC output. The two resulting membership functions ($\alpha_1 = \mu_{u_1}(u_0) \wedge \mu_{v_1}(v)$ and $\alpha_2 = \mu_{u_2}(u_0) \wedge \mu_{v_2}(v)$), one for each rule, are combined with the max (or logical "or") operator (equation 4.2). Finally, the combined membership function Z, i.e., the upper bound of the shaded area, is defuzzified with the "centroid method" (equation 4.3) into a discrete value z=0.417 from the [-1,1] interval, the domain of the output membership function. This is a very reasonable estimate given the shape of Z, and we assume that other defuzzification methods would not bring significant advantages.

Figure 5.3. An example of Mamdani fuzzy inference. [Figure: for each of Rules 1 to 9, the input membership function over the log exchange rate forecast ($\times 10^{-4}$) and the output membership function over the trading strategy (buy/hold/sell); the aggregated output is defuzzified to z=0.417, i.e., M-BUY: invest 40% CAD.]

By using the NF technology, we determine not only the FX trading strategy but also the traded volume. We assess the traded volume using Table 5.1, based on the shape of the membership functions for the output variable.

Table 5.1. The intervals for discrete decisions on traded volume (based on a defuzzified output z).

Trading strategy                                                   Interval for z
VERY STRONG (VS) SELL - invest 80% of the U.S. currency            -1 < z < -0.875
STRONG (S) SELL - invest 60% of the U.S. currency                  -0.875 < z < -0.625
MEDIUM (M) SELL - invest 40% of the U.S. currency                  -0.625 < z < -0.375
WEAK (W) SELL - invest 20% of the U.S. currency                    -0.375 < z < -0.125
HOLD - do not invest                                               -0.125 < z < 0.125
WEAK (W) BUY - invest 20% of the Canadian currency                 0.125 < z < 0.375
MEDIUM (M) BUY - invest 40% of the Canadian currency               0.375 < z < 0.625
STRONG (S) BUY - invest 60% of the Canadian currency               0.625 < z < 0.875
VERY STRONG (VS) BUY - invest 80% of the Canadian currency         0.875 < z < 1

Since the output value z is always a number from the [-1,1] interval, Table 5.1 uniquely defines the fraction of the current period endowment that should be traded under either strategy. In other words, a positive value indicates buying U.S. currency using the fraction of the domestic cash endowment given in Table 5.1, while a negative value indicates the fraction of foreign currency to be sold. In this example, the trading rule ("MEDIUM BUY") implies buying the foreign currency using a fraction 0.40 of the current cash endowment. This paper aims to compare the NF daily trading strategy recommendation with the simple buy-and-hold strategy over certain periods. Lastly, it is important to note that a more complex FLC setting could probably achieve the optimal trading strategy estimator. However, we apply a very basic setting and leave this to further research.

5.2  Market Environment

Much of the previous trading rules literature has sought to test whether particular kinds of technical trading rules have useful forecasting ability.[26] There are a few differences between technical trading rules studies and the NF trading rules. First, the searching space is different. For technical trading rules the searching space consists only of past prices, while in this paper the searching space of the NF technology contains fundamental variables, such as the interest rate differential, and market order flow variables.
Second, the question of which of those trading rules are important is subtly different. Instead of checking whether specific rules work, this research tries to find out whether an optimized ANN combined with an FLC can be used to generate a single trading strategy. Third, fuzzy logic is employed in this paper to link the forecasting value to a decision space. One advantage of fuzzy logic is that it can make a decision space more continuous. Instead of generating discrete "buy" or "sell" signals, fuzzy logic provides much finer trading decisions, for example, "strong buy" (invest 60 per cent of the Canadian currency endowment), "strong sell" (invest 60 per cent of the U.S. currency endowment), "hold" (preserve the current position), etc.

In this paper two trading strategies are compared, namely the NF and buy-and-hold strategies. At the beginning of each period, currency dealers using both strategies are endowed with the same portfolio, which consists of M U.S. dollars, denoted $M_{USD}$, and M Canadian dollars, denoted $M_{CAD}$. The NF method generates a daily trading signal: to buy a certain amount of foreign exchange using Canadian currency or to sell a certain amount of the foreign currency in the portfolio. Otherwise, the dealer's current position is preserved. The trading strategy from fuzzy logic specifies the position to be taken. The trading signal generated by the FLC is located between -1 and 1, which suggests an asset position in the dealer's portfolio.[27]

[26] Lo and Wang (2000), LeBaron, Brock and Lakonishok (1992).

[27] This trading strategy implies the assumption of risk neutrality.

We assume that the scenario where the NF strategy is compared to buy-and-hold benchmark strategies involves a lump sum initial endowment, and, if the currency is held, there is no reinvestment, i.e., it does not earn the overnight interest rate (domestic or foreign). The NF strategy competes with a conventional buy-and-hold strategy, which suggests holding all of the lump sum initial endowments in U.S. dollars until the end of the period called the moving window. The rate of return at time t, $r_t^{b}$, for the buy-and-hold strategy is calculated from the amount held in Canadian dollars and according to the market price after each transaction as follows:

$$r_t^{b} = \frac{M_{USD} \cdot S_t + \left(\frac{1}{S_{t-W}} \cdot S_t \cdot M_{CAD}\right)}{M_{USD} \cdot S_{t-W} + M_{CAD}} - 1$$

where W is the size of the moving window and $S_t$ is the real Canada/U.S. exchange rate at time t.

Similarly, for the sequence of NF strategies over the window of size W, the rate of return ($r_t^{NF}$) is calculated by

$$r_t^{NF} = \frac{NF_{USD} \cdot S_t + NF_{CAD}}{M_{USD} \cdot S_{t-W} + M_{CAD}} - 1$$

where $NF_{CAD}$ and $NF_{USD}$ are the amounts of Canadian and U.S. dollars in the dealer's portfolio position after W days, respectively.

The rates of return defined above for the buy-and-hold strategies are compared to the day-to-day NF-guided trading strategy rate of return. Comparisons are conducted both with and without the transaction cost. The transaction cost is imposed as a linear, monotonically increasing function of the order size. For example, for an order of the conventional size of 5 million, the transaction cost is 5 basis points.[28] For the buy-and-hold strategy, the transaction cost is applied first when buying U.S. currency and then when converting back to Canadian currency. The NF strategies are subject to transaction costs every time a trade occurs, i.e., whenever a signal other than "hold" is received.

For the robustness of the results, the same comparison is conducted for moving windows of different lengths. The results for 10-day, 20-day, 30-day, 40-day and 80-day moving windows are reported in the next section.

[28] Points are the smallest moves an exchange rate can make, i.e., digits added to or subtracted from the fourth decimal place.
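The two rate-of-return calculations described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the reported results: the exchange rate is assumed to be quoted in Canadian dollars per U.S. dollar, transaction costs are ignored, and the function names and numerical values are hypothetical.

```python
def initial_value_cad(m_usd, m_cad, s_start):
    """Value of the initial endowment in Canadian dollars at the window start."""
    return m_usd * s_start + m_cad


def buy_and_hold_return(m_usd, m_cad, s_start, s_end):
    """Convert the CAD endowment to USD at s_start, hold, and value in CAD at s_end."""
    usd_held = m_usd + m_cad / s_start
    return usd_held * s_end / initial_value_cad(m_usd, m_cad, s_start) - 1.0


def nf_return(nf_usd, nf_cad, m_usd, m_cad, s_start, s_end):
    """Value the NF dealer's end-of-window position in CAD against the same endowment."""
    final_value = nf_usd * s_end + nf_cad
    return final_value / initial_value_cad(m_usd, m_cad, s_start) - 1.0


if __name__ == "__main__":
    # Hypothetical endowment and exchange rates (CAD per USD), for illustration only.
    m_usd, m_cad = 100.0, 150.0
    s_start, s_end = 1.50, 1.53

    r_b = buy_and_hold_return(m_usd, m_cad, s_start, s_end)
    # An NF dealer who received only "hold" signals still owns the initial portfolio.
    r_nf = nf_return(100.0, 150.0, m_usd, m_cad, s_start, s_end)

    print(f"buy-and-hold return: {r_b:.4f}")
    print(f"NF return:           {r_nf:.4f}")
    print(f"NF excess return:    {r_nf - r_b:.4f}")
```

Both returns share the same denominator, the Canadian dollar value of the initial endowment, so the excess return depends only on how the end-of-window positions differ.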
5.3  Estimation Results

The major purpose of this study is to investigate how well the sequence of NF daily trading strategy recommendations performs against the buy-and-hold strategy over certain periods. The empirical results are discussed in this section. The focus is on the one-day-ahead out-of-sample analysis. Therefore, the NF-based strategy recommendation is re-estimated every time the observed window is shifted towards the end of the sample. Observation windows are called moving windows because, in order to test the robustness of the NF technology, the trading period (window) was continuously moved (100 times) forward across the data set. For each moving window the winning strategy (in terms of the rate of return) was recorded. Thus, we measure the excess return ($r_t^{NF} - r_t^{b}$) of the NF strategy on each of the 100 trading periods or windows. Table 5.2 contains the estimated number of windows (out of 100) when the NF strategy was superior to the simple buy-and-hold strategy.

Table 5.2. The number of moving windows when the NF technology outperforms a simple buy-and-hold strategy. Two models are considered: model 1 (with aggregate order flow) and model 2 (with individual order flows), with and without transaction costs.

Number of winning windows, by window size (days):

                             10    20    30    40    80
Model 1                      44    51    60    69    84
Model 2                      46    53    66    72    75
Model 1 (with tr. costs)     46    48    55    67    80
Model 2 (with tr. costs)     50    57    66    72    74

In the absence of the transaction costs, it is quite evident that when the size of the moving window increases from 10 to 80, both of the NF models improve and perform better than the buy-and-hold strategy. They produce higher rates of return on more than 50 per cent of the windows when the window size is 20 or greater. This can be explained by the fact that when the window size is bigger, the NF technology has more opportunities to compensate for its errors.
Further, the inclusion of the transaction costs does not significantly reduce the number of winning windows. In some cases, such as model 2 with the window size of 20, the percentage of winning windows increases (from 53 to 57). The explanation for this is that we define the NF technology (or, more precisely, the fuzzy membership functions) to produce a "hold" signal for an arbitrarily small exchange rate change forecast. For instance, there can be sequences that contain mostly "holds" and "weak buys" or "weak sells" with no or very little transaction costs incurred. On the other hand, buy-and-hold strategies involve trading the whole initial Canadian dollar endowment, so applying the transaction cost penalizes their rate of return to a greater extent.

Figures 5.4 (a, b, c, d) present rates of return for a 10-day moving window. For the first moving window, the model is initially estimated on 2010 observations, and a one-day-ahead forecast and the resulting strategy are generated. This continues for the whole window (increasing the estimation set by one each day), so the last action to be generated is based on 2019 observations. Subsequently, the moving window is shifted forward 100 times, these windows are indexed from 1 to 100, and the corresponding 100 rates of return are calculated and compared with the buy-and-hold rates of return. We also calculate excess returns for the NF strategies. Similarly, for 20-day moving windows, the rates of return are shown in Figures 5.5 (a, b, c, d).

Figure 5.4.a. NF and the buy-and-hold rates of return (ROR) for the 10-day moving-window (model 2 without the transaction costs) and NF excess returns. The mean excess return is -0.00014.

Figure 5.4.b. NF and the buy-and-hold rates of return (ROR) for the 10-day moving-window (model 2 with the transaction costs) and NF excess returns. The mean excess return is 0.00041.

Figure 5.4.c. NF and the buy-and-hold rates of return (ROR) for the 10-day moving-window (model 1 without the transaction costs) and NF excess returns. The mean excess return is -0.0003.

Figure 5.4.d. NF and the buy-and-hold rates of return (ROR) for the 10-day moving-window (model 1 with the transaction costs) and NF excess returns. The mean excess return is -0.00008.

[Figures 5.4.a to 5.4.d: plotted series are the buy-and-hold ROR and the NF ROR; horizontal axis: moving window index, 1=2010-2019, 2=2011-2020, ..., 100=2110-2119.]

Figure 5.5.a. NF and the buy-and-hold rates of return (ROR) for the 20-day moving-window (model 2 without the transaction costs) and NF excess returns. The mean excess return is 0.00047.

Figure 5.5.b. NF and the buy-and-hold rates of return (ROR) for the 20-day moving-window (model 2 with the transaction costs) and NF excess returns. The mean excess return is 0.00074.

Figure 5.5.c. NF and the buy-and-hold rates of return (ROR) for the 20-day moving-window (model 1 without the transaction costs) and NF excess returns. The mean excess return is 0.00016.

Figure 5.5.d. NF and the buy-and-hold rates of return (ROR) for the 20-day moving-window (model 1 with the transaction costs) and NF excess returns. The mean excess return is -0.00027.

[Figures 5.5.a to 5.5.d: plotted series are the buy-and-hold ROR and the NF ROR; horizontal axis: moving window index, 1=2010-2029, 2=2011-2030, ..., 100=2110-2129.]

The mean excess returns for the NF strategy increase with the window size in the absence of transaction costs (model 1: -0.0003 for the 10-day window, 0.00016 for the 20-day window, 0.0014 for the 30-day window, etc.; model 2: -0.00014 for the 10-day window, 0.00047 for the 20-day window, 0.0018 for the 30-day window, etc.), and when transaction costs are included this still broadly holds (model 1: -0.00008 for the 10-day window, -0.00027 for the 20-day window, 0.00024 for the 30-day window, etc.; model 2: 0.00041 for the 10-day window, 0.00074 for the 20-day window, 0.0018 for the 30-day window, etc.). As expected, model 1 is dominated by model 2 for most of the windows and window sizes. The longer, 20-day window implies higher mean excess returns without transaction costs for model 1, but surprisingly, as observed before, model 2 seems to be unaffected by the transaction costs. This relationship persists for the various window sizes. To conclude, NF technology can be successful only on certain ranges, and any FX trader attempting to use NF-advised dynamic strategies should be very cautious.

5.4  Excess Volatility and the NF Model

The issue of exchange rate overshooting is of deep concern to the Bank of Canada in that it may be the result of, or be manifested as, excess volatility in the FX market. Overshooting is viewed as being due to the prevalence of speculative, noise, or chartist traders. In periods of excess volatility, central banks, like the Bank of Canada, worry that FX volatility will spill over into domestic fixed income markets and in turn impact the real economy.
Moreover, there is a concern that even a sudden depreciation in the currency that was initially predicated on fundamentals may become self-perpetuating and in turn lead to excess volatility (and excess depreciations). As noted in Murray et al. (2000), in periods where it is believed that overshooting is occurring, it may be necessary for the Bank of Canada to raise official rates in order to offset the self-perpetuating sentiment that exists in the market. In raising official rates (and in turn hopefully calming FX markets), the Bank hopes to avoid a more dramatic tightening of monetary conditions that may occur if market uncertainty (and volatility) continued to permeate.

The Bank of Canada has over the 1970s, 1980s, and 1990s had to engage in such behaviour on repeated occasions, the latest example being in August of 1998, when the Bank of Canada raised rates by 1 per cent. During this most recent period, the Canadian dollar depreciation, which began rather slowly in mid-summer, started accelerating soon after the Russian ruble crisis.

It is rather difficult to know when a currency is overshooting its "fundamental" (equilibrium) value or when its fluctuations are excessive, since on a daily basis it is hard to ascertain what the fair value of the currency is.[29] Moreover, in order for the Bank of Canada to appropriately apply contractionary (but FX stabilizing) medicine, it needs to know when self-perpetuating or destabilizing market sentiments are taking hold. Thus, models that are able to signal the advent of destabilizing trading activity are useful to the central bank.
One such model developed at the Bank is the fundamentalists versus chartists model of Murray, van Norden, and Vigfusson (1996), a Markov-switching econometric model in which Canadian dollar FX movements can be attributed either to fundamentals-based traders, who try to keep the exchange rate close to its true equilibrium value, or to chartist traders, who often cause the exchange rate to deviate from its fair market value. The fundamentalists are assumed to determine the Canada/U.S. real exchange rate (rfx) from the following equation:

$$\Delta \ln(rfx)_t = \alpha\left(\ln(rfx)_{t-1} - \beta_0 - \beta_c\, comtot_{t-1} - \beta_e\, enetot_{t-1}\right) + \gamma\, intdif_{t-1} + \varepsilon_t \qquad (5.1)$$

[29] In line with Murray, van Norden, and Vigfusson (1996), we conjecture that the "fundamental" forces that drive the exchange rate are determined by equation (5.1). It is important to stress that these forces are often very difficult to find.

where comtot denotes non-energy commodity terms of trade, enetot denotes energy terms of trade, and intdif is the Canada/U.S. interest rate differential. By contrast, chartists have very little regard for fundamental variables and are assumed to base their exchange rate forecasts and trading strategies on technical indicators such as momentum. Further, the expected exchange rate change can be modeled as a weighted average of both groups' expectations as follows:

$$E\Delta s_{t+1} = \omega_t\, E\Delta s^{f}_{t+1} + (1 - \omega_t)\, E\Delta s^{c}_{t+1} \qquad (5.2)$$

The weights, $\omega_t$ and $1 - \omega_t$, assigned to each group are determined by a portfolio manager who favours the group that was most successful in the latest period. $\Delta s^{f}_{t+1}$ and $\Delta s^{c}_{t+1}$ are determined by two forecasting equations:

$$\Delta s^{f}_t = \alpha^{f} + \varphi(\tilde{s}_{t-1} - s_{t-1}) + \gamma\, intdif_{t-1} + \varepsilon^{f}_t, \quad \varepsilon^{f}_t \sim N(0, \sigma_f) \qquad (5.3)$$

$$\Delta s^{c}_t = \alpha^{c} + \psi_{14}\, ma_{14} + \psi_{200}\, ma_{200} + \Gamma\, intdif_{t-1} + \varepsilon^{c}_t, \quad \varepsilon^{c}_t \sim N(0, \sigma_c) \qquad (5.4)$$

where s is the logarithm of the nominal Canada/U.S. exchange rate, $\tilde{s}$ is the fundamentalists' forecast of s based on (5.1), $ma_{14}$ and $ma_{200}$ are moving averages used as chartists' technical indicators, $\alpha^{f}$ and $\alpha^{c}$ are constants, and superscripts f and c stand for fundamentalists and chartists, respectively.

The probabilities that expected values of the exchange rate change will persist in the fundamentalists' or chartists' regime (state) R at time t, given that it was in that state at time t-1, are the following:

$$P(R_t = f \mid R_{t-1} = f) = \Phi(\alpha_f)$$

$$P(R_t = c \mid R_{t-1} = c) = \Phi(\alpha_c)$$

where $\Phi$ is the normal cumulative distribution function. The portfolio manager's objective is to maximize the following log-likelihood function:

$$LLF = \sum_{t=1}^{T} \ln \sum_{R_t} P(R_t)\, d(s_t \mid R_t)$$

where $d(s_t \mid R_t)$ is the normal density function of the regime's residual.

In this model, it is shown that during tranquil periods chartists are the dominant traders, while during volatile periods it is the fundamentalist traders who drive FX price movements. This is somewhat counterintuitive, but is explained as the result of fundamentalist traders only participating in the market when they view the price of the currency as being sufficiently far from their view of what it should be, while at other times they choose not to participate (in a dominant manner). However, since chartists can in essence be viewed as being inactive during periods where fundamental traders are dominant, this framework does not examine what signals the chartists are receiving during periods of sharp depreciations (attributed to fundamental traders) or in what direction these traders would trade.

The NF model presented above can be viewed as also shedding light on the activities of FX market participants. Specifically, given that the NF model is essentially a very sophisticated technical trading model, it allows one to understand what type of signals the chartist traders are getting during periods of Canadian dollar overshooting (excess depreciation).
This in turn sheds light on how chartists were trading during periods in which the Canadian dollar is thought to have overshot. When one is uncertain whether the FX rate actually overshoots its fundamental value, it is important to note that the NF model can provide some indication that expectations are strong in one direction. To put it another way, the NF model allows one to get a partial assessment of the FX market's sentiment, as reflected by the chartists' trading signals, during periods of strong depreciation. As such, this model complements the previous work examining chartists and fundamentalists by providing information on the likely activity carried out by chartists during periods where fundamentalists are seen to dominate market activity.

In order to examine the sentiment of chartists during periods of strong depreciation, we review past episodes in which the Bank of Canada viewed the market as having destabilizing or self-perpetuating expectations. The most obvious one is the August 1998 period, in which the Bank of Canada intervened heavily in the FX market.

We choose to assign the following five fuzzy sets to the exchange rate change states: "VERY NEGATIVE," "NEGATIVE," "STABLE," "POSITIVE," and "VERY POSITIVE" (Figure 5.6). Similarly, the trading strategy variable has five states as follows: "SELL STRONG," "SELL," "HOLD," "BUY," and "BUY STRONG" (Figure 5.7). These states uniquely define the trading signal received by a currency trader.

Figure 5.6. Gaussian membership functions for the variable "exchange rate change forecast." [Horizontal axis: exchange rate change forecast.]

Figure 5.7. Triangular membership functions for the variable "FX trader's action."
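As a rough illustration of how a crisp ANN forecast might be fuzzified over the five sets named above, the following Python sketch evaluates Gaussian membership functions. The centres and the spread are hypothetical: the thesis specifies the membership functions only graphically (Figure 5.6), so the numbers and the names `FUZZY_SET_CENTRES`, `SPREAD`, `gaussian_membership`, and `fuzzify` are assumptions made for illustration, not the author's calibration.

```python
import math

# Hypothetical centres and spread: the thesis defines these only graphically
# (Figure 5.6), so the values below are illustrative assumptions.
FUZZY_SET_CENTRES = {
    "VERY NEGATIVE": -10.0,
    "NEGATIVE": -5.0,
    "STABLE": 0.0,
    "POSITIVE": 5.0,
    "VERY POSITIVE": 10.0,
}
SPREAD = 2.0


def gaussian_membership(x, centre, spread=SPREAD):
    """Degree of membership of x in a Gaussian fuzzy set, between 0 and 1."""
    return math.exp(-0.5 * ((x - centre) / spread) ** 2)


def fuzzify(forecast):
    """Map a crisp forecast to its membership degree in each of the five sets."""
    return {
        label: gaussian_membership(forecast, centre)
        for label, centre in FUZZY_SET_CENTRES.items()
    }


if __name__ == "__main__":
    # Print the membership degrees of an arbitrary example forecast.
    for label, degree in fuzzify(-4.0).items():
        print(f"{label:>13}: {degree:.3f}")
```

Under these assumed parameters, a forecast generally belongs to several neighbouring sets at once with different degrees, which is what allows the fuzzy controller to blend adjacent trading actions rather than switch abruptly between them.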
In accord with Section 5.1, there are five simple rules that link a fuzzy input and an output:

• IF "exchange rate change forecast" is "VERY NEGATIVE" THEN "FX trader's action" is "SELL STRONG".
• IF "exchange rate change forecast" is "NEGATIVE" THEN "FX trader's action" is "SELL".
• IF "exchange rate change forecast" is "STABLE" THEN "FX trader's action" is "HOLD".
• IF "exchange rate change forecast" is "POSITIVE" THEN "FX trader's action" is "BUY".
• IF "exchange rate change forecast" is "VERY POSITIVE" THEN "FX trader's action" is "BUY STRONG".

The other specifics of the NF design and its implementation in this section are the same as in Section 5.1. It is also worthwhile to note that we do not define any portfolio decisions in this section. Rather, we try to differentiate the types of signals the traders receive. For that purpose we use Table 5.3.

Table 5.3. The intervals for discrete trading signals the traders receive from the NF system (based on a defuzzified output z).

Trading strategy    Interval for z
SELL STRONG         -1 < z < -0.75
SELL                -0.75 < z < -0.25
HOLD                -0.25 < z < 0.25
BUY                 0.25 < z < 0.75
BUY STRONG          0.75 < z < 1

The ANN (model 2) is trained on the first 1823 observations, i.e., until the end of June 1998. The remaining 34 observations, which cover July and August, are used for forecasting (testing) and generating the NF system's outputs. The ANN part of the NF system generates 34 forecasts (Table 5.4). The forecasting accuracy of the ANN is very high: roughly 82% of the overall changes are successfully forecasted, i.e., only six negative changes are missed.

Table 5.4. Observed ANN (model 2) direction-of-change forecasting statistics (frequencies) for July/August 1998 inputs (34 observations).

                     Frequency
Positive forecast    31
Actual positive      25
Negative forecast    3
Actual negative      9

Further, the ANN forecasts are input into the FLC to generate a sequence of 34 trading strategies.
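The mapping in Table 5.3 and the accuracy figure quoted for Table 5.4 can be sketched in a few lines of Python. This is a minimal illustration rather than the thesis's implementation: the function names `discrete_signal` and `direction_accuracy` are hypothetical, and because Table 5.3 states its intervals with strict inequalities, the treatment of the boundary values is an assumption labeled in the code.

```python
def discrete_signal(z):
    """Return the Table 5.3 trading signal for a defuzzified output z in [-1, 1]."""
    if not -1.0 <= z <= 1.0:
        raise ValueError("z must lie in [-1, 1]")
    # Assumption: each interior boundary value (-0.75, -0.25, 0.25, 0.75) is
    # assigned to the interval above it; Table 5.3 leaves the boundaries open.
    if z < -0.75:
        return "SELL STRONG"
    if z < -0.25:
        return "SELL"
    if z < 0.25:
        return "HOLD"
    if z < 0.75:
        return "BUY"
    return "BUY STRONG"


def direction_accuracy(n_observations, n_missed):
    """Share of direction-of-change forecasts that were correct."""
    return (n_observations - n_missed) / n_observations


if __name__ == "__main__":
    # Hypothetical defuzzified outputs, chosen only to exercise each interval.
    for z in (-0.9, -0.5, 0.0, 0.4, 0.8):
        print(f"z = {z:+.2f} -> {discrete_signal(z)}")
    # July/August 1998 test set: 34 observations, six missed changes.
    print(f"accuracy = {direction_accuracy(34, 6):.1%}")
```

With 34 observations and six missed changes the accuracy is 28/34, which is the "roughly 82%" reported in the text.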
For this testing set the results are strikingly different from an average 30-day sequence of trading recommendations (for a set of arbitrary 30-day windows without sharp increases or declines of the exchange rate), which consists of 11 buy (approximately 33%), 14 hold, and 5 sell signals. More precisely, the NF system generated 20 "buys" (approximately 60%), 14 "holds", and not a single sell signal.

Another sharp exchange rate depreciation (5 cents) concerns the November/December 1994 forecasting period, which consists of 27 observations: 18 positive and 9 negative exchange rate changes. ANN statistics are reported in Table 5.5. High forecasting accuracy is maintained for this testing set, at 81.5%. One positive and four negative changes are not successfully forecasted.

Table 5.5. Observed ANN (model 2) direction-of-change forecasting statistics (frequencies) for November/December 1994 inputs (27 observations).

                     Frequency
Positive forecast    21
Actual positive      18
Negative forecast    6
Actual negative      9

Again, the NF technology generated a similar "alert" set of signals: 14 "buys" (51%), 12 "holds" and one "sell". The only sell signal is based on a wrongly forecasted positive exchange rate movement. One could argue that in this case there are 10% fewer buy signals generated by the NF system and that there exists a wrong sell signal (whereas in July/August 1998 there are no sell signals). But one must not forget that in the first case there was a sharper, 10-cent depreciation and that the ANN estimation set in 1994 is about half as long. The ANN's forecasting power strongly depends on the amount of data available for training, and the lack of approximately 700 observations lowers that power.

This section indicates that the NF system, which generates discrete trading signals, can play an important role in detecting strong and potentially dangerous (from the point of view of the central bank) sentiments in the FX market.
These signals are assumed to characterize the possible activity carried out by chartists during periods of excessive depreciation where fundamentalists are seen to dominate market activity. Our analysis also suggests that the accuracy of the NF system is largely linked to the amount of data used for the ANN training.

5.5 "Continuous" Trading Decisions and a Risk-Averse Investor

In this section, we compare NF-generated strategies with decisions (strategies) made by a risk-averse investor, based exclusively on ANN forecasts. The FLC provides a relatively smooth input/output surface. We assume that a risk-averse investor attaches a smooth non-linear tan-sigmoid function to ANN forecasts (Figure 5.8).

Figure 5.8. Tan-sigmoid function used to estimate the risk-averse investor's strategies. [Horizontal axis: ANN forecast; vertical axis: investment, ranging from -1 to 1.]

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

This function establishes a familiar framework for determining the fraction of the investor's endowment to be traded. Using the setting from Section 5.4, an endowment fraction to be invested ($z \in [-1, 1]$) is received for every forecast, but unlike in Section 5.1, this fraction is applied directly to the current position. For example, z = 0.417 would be interpreted as a signal to buy the foreign currency with 41.7 per cent of the current domestic currency endowment. This can be thought of as an attempt to investigate whether we can make finer and less discrete trading decisions than those defined in Section 5.1. Here, we proceed by recording and comparing the number of moving windows in which one of the approaches generates higher returns. Of central interest is whether one of the strategies can consistently earn higher returns over different window sizes.

Table 5.6. The number of moving windows in which the NF technology outperforms the risk-averse investor's strategies. ANN model 2 is considered (with individual order flows and without transaction costs) on 100 moving windows.
Window size (days)                    5     10    20    30    40
Number of winning windows, Model 2    54    46    32    25    19

The results (Table 5.6) show very clearly that it can, even though for small window sizes (5 and 10 days) both approaches produce similar returns. The risk-averse investor's strategies have a more substantial advantage for 20-day, 30-day, and 40-day trading periods. The results suggest that fuzzy logic and its "relatively smooth" decision surface cannot outperform a smooth non-linearity such as the tan-sigmoid in this simple one-input, one-output application. This indicates that a slightly different and simpler approach can produce better results and that there are no grounds for "continuous" trading decisions. In a scenario involving an FLC with multiple inputs, given the difficulty of selecting a suitable multidimensional non-linearity, we anticipate the fuzzy logic approach to be more useful. ANN forecasts and some technical trading indicators as inputs can serve this purpose, but we leave this to further research.

5.6 Conclusions and Further Research

Exchange rate forecasting using artificial neural networks (ANN), fuzzy logic controllers (FLC) and genetic algorithms (Goldberg 1989) has recently received much attention. This chapter proposes a neuro-fuzzy (NF) technology to learn a single trading rule that specifies both the action (buy/hold/sell) and the volume of currency to be traded. The ANN produces one-day-ahead forecasts of the Canada/U.S. dollar exchange rate change based on lagged aggregate order flow (model 1) or individual order flows (model 2), the interest rate, and the crude oil price. [30] Buy-and-hold periods chosen for this research are 10, 20, 30, 40, and 80 days in length. Each of these windows was shifted 100 times into the future and the recursive NF recommendations were estimated for all of the moving windows. The NF-based sequence of daily trading strategies earns superior returns over a simple buy-and-hold strategy in most of the periods.
After the transaction costs are included, this still holds. Also, it is shown that a risk-averse investor with a smooth non-linearity can earn similar or higher returns than a "fuzzy logic investor" making "continuous" trading decisions.

[30] Gradojevic and Yang (2000).

The purpose of this chapter is not to produce a flawless trading strategy that would always win over a simple buy-and-hold strategy. That is close to impossible when the exchange rate for the last moving window day is much higher than the exchange rate for the initial buy-and-hold day. Rather, the purpose is to examine the possibility of creating a successful decision-making model that accounts for both public and private information. In other words, the aim is to apply and test a new approach to modeling the agent's decision-making (or behavior) in the FX market. By doing this we can identify certain trading patterns other than those based on well-known technical indicators. Specifically, this approach is also beneficial for signaling purposes, to detect destabilizing FX market sentiments. Potentially, NF technology can be used for government portfolio management, the conduct of monetary policy, financial stability, etc.

There are a number of directions in which one could pursue further research on this problem. Several remarks regarding ANN architecture were made in Gradojevic and Yang (2000). Further, the problem of selecting an adequate ANN input lag might be resolved not only by using ANNs, but also by some other method for determining the degree of "non-linear correlation", such as "average mutual information" and "false nearest neighbors". [31] The FLC consists of one input, one output, and a very simple rule base. A set of technical trading rules could be added to the rule base. [32] Consequently, new inputs, which may include moving averages and lagged exchange rates, would bring more decision-making power to the overall NF system.
An adaptive FLC (Cox 1993), which allows for membership function changes (e.g., widening the "HOLD" fuzzy region) and weighting or changing the rules in the rule base, may improve the NF system's performance. The percentage of winning strategies can be improved through FLC system modifications with respect to membership function types, fuzzy sets, and defuzzification method. One final guideline involves a different aspect of NF hybridization, or an NF-GA combination. [33]

[31] Kennel, Brown and Abarbanel (1992), Fraser and Swinney (1986).
[32] Murphy (1999).
[33] Goldberg (1989), Bornholdt (1992).

Bibliography

Allen, F. and R. Karjalainen. 1999. "Using Genetic Algorithms to Find Technical Trading Rules." Journal of Financial Economics 51: 245-271.

Amano, R.A. and S. van Norden. 1995. "Terms of Trade and Real Exchange Rates: The Canadian Evidence." Journal of International Money and Finance 14 (1): 83-104.

Amano, R.A. and S. van Norden. 1998. "Exchange Rates and Oil Prices." Review of International Economics 6 (4): 683-94.

Arthur, W.B. 1994. "Inductive Reasoning and Bounded Rationality." American Economic Review 84: 406-411.

Baillie, R. and P. McMahon. 1989. The Foreign Exchange Market: Theory and Econometric Evidence. New York: Cambridge University Press.

Bhattacharya, U. and M. Spiegel. 1991. "Insiders, Outsiders, and Market Breakdown." Review of Financial Studies 4: 255-282.

Blake, A.P. and G. Kapetanios. 2000. "A Radial Basis Function Artificial Neural Network Test for ARCH." Economics Letters 69: 15-23.

Bojadziev, George and M. Bojadziev. 1997. Fuzzy Logic for Business, Finance and Management. World Scientific.

Bollerslev, T. 1990. "Modelling the Coherence in Short-run Nominal Exchange Rates: A Multivariate Generalized ARCH Model." Review of Economics and Statistics 72: 498-505.

Boothe, P. and D. Glassman. 1987. "The Statistical Distribution of Exchange Rates." Journal of International Economics 22: 297-319.

Bornholdt, S. 1992. "General Asymmetric Neural Networks and Structure Design by Genetic Algorithms." Neural Networks 5: 327-334.

Brock, W., J. Lakonishok and B. LeBaron. 1992. "Simple Technical Trading Rules and the Stochastic Properties of Stock Returns." The Journal of Finance 47 (5): 1731-1763.

Campbell, J.Y., A.W. Lo and A.C. MacKinlay. 1997. The Econometrics of Financial Markets. Princeton University Press.

Cheung, Y.-W. and C.Y.-P. Wong. 2000. "A Survey of Market Practitioners' Views on Exchange Rate Dynamics." Journal of International Economics 51: 401-423.

Cleveland, W.S. and S.J. Devlin. 1988. "Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting." Journal of the American Statistical Association 83: 596-610.

Cleveland, W.S. 1979. "Robust Locally Weighted Regression and Smoothing Scatterplots." Journal of the American Statistical Association 74: 829-836.

Clinton, K. 2001. "On Commodity-Sensitive Currencies and Inflation Targeting." Bank of Canada Working Paper No. 2001-3, 1-52.

Covrig, V. and M. Melvin. 1998. "Asymmetric Information and Price Discovery in the FX Market: Does Tokyo Know More About the Yen?" Arizona State University. Typescript.

Cox, E. 1992. "Fuzzy Fundamentals." IEEE Spectrum 29 (10): 58-61.

Cox, E. 1993. "Adaptive Fuzzy Systems." IEEE Spectrum 30: 27-31.

Cybenko, G. 1989. "Approximation by Superposition of a Sigmoidal Function." Mathematics of Control, Signals and Systems 2: 303-314.

D'Souza, C. 2002. "A Market Microstructure Analysis of Foreign Exchange Intervention in Canada." Bank of Canada Working Paper No. 2002-16, 1-74.

Deboeck, J. Guido. 1994. Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets. John Wiley & Sons, Inc.

Diebold, F.X. and J. Nason. 1990. "Nonparametric Exchange Rate Prediction." Journal of International Economics 28 (3-4): 315-332.

Diebold, F.X. 1988. Empirical Modelling of Exchange Rate Dynamics. Springer-Verlag, New York.

Diebold, F.X. and M. Nerlove. 1989. "The Dynamics of Exchange Rate Volatility: A Multivariate Latent Factor ARCH Model." Journal of Applied Econometrics 4: 1-21.

Diebold, F.X. and M. Nerlove. 1990. Unit Roots in Economic Time Series: A Selective Survey, in G. Rhodes Jr. and T. Fomby, eds., Advances in Econometrics: A Research Annual, vol. 8 (Cointegration, Spurious Regressions, and Unit Roots), JAI Press, Greenwich, Connecticut.

Diebold, F.X. and R.S. Mariano. 1995. "Comparing Predictive Accuracy." Journal of Business & Economic Statistics 13: 253-263.

Donaldson, R.G. and M. Kamstra. 1997. "An Artificial Neural Network - GARCH Model for International Stock Return Volatility." Journal of Empirical Finance 4 (1): 17-46.

Dornbusch, R. 1976. "Expectations and Exchange Rate Dynamics." Journal of Political Economy 84 (6): 1161-1176.

Elman, J.L. 1990. "Finding Structure in Time." Cognitive Science 14: 179-211.

Engle, R.F. 1982. "Autoregressive Conditional Heteroskedasticity With Estimates of the Variance of United-Kingdom Inflation." Econometrica 50 (4): 987-1007.

Engle, R.F., T. Ito and W.L. Lin. 1990. "Meteor Showers or Heat Waves? Heteroscedastic Intra-Daily Volatility in the Foreign Exchange Market." Econometrica 58: 525-542.

Evans, O.V.D. 1997. "Short-Term Currency Forecasting Using Neural Networks." ICL Systems Journal 11 (2): 1-17.

Fama, E.F. 1965. "Behavior of Stock Market Prices." Journal of Business 38: 34-105.

Fernandez-Rodriguez, F., C. Gonzalez-Martel and S. Sosvilla-Rivero. 2000. "On the Profitability of Technical Trading Rules Based on Artificial Neural Networks." Economics Letters 69: 89-94.

Flood, R. and A. Rose. 1995. "Fixing Exchange Rates: A Virtual Quest for Fundamentals." Journal of Monetary Economics 36: 3-37.

Forsythe, G.E., M.A. Malcolm and C.B. Moler. 1977. Computer Methods for Mathematical Computations. Prentice-Hall, Englewood Cliffs, NJ.

Franses, P.H. and G. Draisma. 1997. "Recognizing Changing Seasonal Patterns Using Neural Networks." Journal of Econometrics 81: 273-280.

Franses, P.H. and K. van Griensven. 1998. "Forecasting Exchange Rates Using Neural Networks for Technical Trading Rules." Studies in Nonlinear Dynamics and Econometrics 2: 109-116.

Fraser, A.M. and H.L. Swinney. 1986. "Independent Coordinates for Strange Attractors from Mutual Information." Physical Review 33 (2): 1134-1140.

Funahashi, K.-I. 1989. "On the Approximate Realization of Continuous Mappings by Neural Networks." Neural Networks 2: 183-192.

Garcia, R. and R. Gencay. 2000. "Pricing and Hedging Derivative Securities with Neural Networks and a Homogeneity Hint." Journal of Econometrics 94: 93-115.

Gencay, R. 1999. "Linear, Non-Linear and Essential Foreign Exchange Rate Prediction With Simple Technical Trading Rules." Journal of International Economics 47: 91-107.

Gencay, R. and M. Qi. 2001. "Pricing and Hedging Derivative Securities with Neural Networks: Bayesian Regularization, Early Stopping and Bagging." IEEE Transactions on Neural Networks 12: 726-734.

Goldberg, D.E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley Longman.

Goldberg, L. and R. Tenorio. 1997. "Strategic Trading in a Two-Sided Foreign Exchange Auction." Journal of International Economics 42: 299-326.

Gradojevic, N. and J. Yang. 2000. "The Application of Artificial Neural Networks to Exchange Rate Forecasting: The Role of Market Microstructure Variables." Bank of Canada Working Paper No. 2000-23, 1-36.

Gupta, M.M. and D.H. Rao. 1994. "On the Principles of Fuzzy Neural Networks." Fuzzy Sets and Systems 61: 1-18.

Hamilton, J.D. 1994. Time Series Analysis. Princeton: Princeton University Press.

Heinemann, M. 2000. "Adaptive Learning of Rational Expectations using Neural Networks." Journal of Economic Dynamics and Control 24 (5-7): 1007-1026.

Hornik, K. 1991. "Approximation Capabilities of Multilayer Feedforward Networks." Neural Networks 4: 251-257.

Hornik, K., M. Stinchcombe and H. White. 1989. "Multilayer Feedforward Networks are Universal Approximators." Neural Networks 2: 359-366.

Hornik, K., M. Stinchcombe and H. White. 1990. "Universal Approximation of an Unknown Mapping and its Derivatives Using Multilayer Feedforward Networks." Neural Networks 3: 551-560.

Hsieh, D.A. 1988. "The Statistical Properties of Daily Foreign Exchange Rates." Journal of International Economics 24: 129-145.

Hsieh, D.A. 1989. "Testing for Nonlinear Dependence in Daily Foreign Exchange Rate Changes." Journal of Business 62: 329-68.

Hu, M.Y., G. Zhang, C.X. Jiang and B.E. Patuwo. 1999. "A Cross-Validation Analysis of Neural Network Out-of-Sample Performance in Exchange Rate Forecasting." Decision Sciences 30 (1): 197-216.

Hutchinson, J., A. Lo and T. Poggio. 1994. "A Nonparametric Approach to Pricing and Hedging Derivative Securities via Learning Networks." Journal of Finance 49: 851-889.

Jamal, A.M.M. and C. Sundar. 1997. "Modeling Exchange Rate Changes with Neural Networks." Journal of Applied Business Research 14 (1): 1-5.

Jang, J.R., C. Sun and E. Mizutani. 1997. Neuro-Fuzzy and Soft Computing. Prentice-Hall, NJ.

Jang, J.-S.R. 1993. "ANFIS: Adaptive-Network-based Fuzzy Inference System." IEEE Trans. Syst., Man, Cybern. 4 (1): 156-159.

Jang, J.-S.R. and C.-T. Sun. 1995. "Neurofuzzy Modeling and Control." Proceedings of the IEEE.

Ju, Y.J., C.E. Kim and J.C. Shim. 1997. "Genetic-Based Fuzzy Models: Interest Rate Forecasting Problem." Computers Ind. Engng. 33: 561-564.

Kaashoek, J.F. and H.K. van Dijk. 1999. "Neural Networks Analysis of Varying Trends in Real Exchange Rates." Erasmus University Rotterdam, Econometric Institute Report 143, 1-19.

Kennel, M.B., R. Brown and H.D.I. Abarbanel. 1992. "Determining Embedding Dimension for Phase-space Reconstruction Using a Geometrical Construction." Physical Review 45 (6): 3403-3411.

Killeen, W., R. Lyons and M. Moore. 2001. "Fixed versus flexible: Lessons from EMS order flow." NBER Working Paper 8491.

Kooths, S. 1999. "Modelling Rule- and Experience-based Expectations Using Neurofuzzy-Systems." Computing in Economics and Finance 1032: 1-26.

Kuan, C.-M. and H. White. 1994. "Artificial Neural Networks: An Econometric Perspective." Econometric Reviews 13: 1-91.

Kuan, C.-M. and T. Liu. 1995. "Forecasting Exchange Rates Using Feedforward and Recurrent Neural Networks." Journal of Applied Econometrics 10 (4): 347-364.

Lippmann, R.P. 1987. "An Introduction to Computing with Neural Nets." IEEE ASSP Magazine, 4-22.

Lisi, F. and A. Medio. 1997. "Is a Random Walk the Best Exchange Rate Predictor?" International Journal of Forecasting 13: 255-267.

Lo, A.W., H. Mamaysky and J. Wang. 2000. "Foundations of technical analysis: computational algorithms, statistical inference, and empirical implementation." The Journal of Finance 55 (4): 1705-1765.

Lyons, R.K. 2001. The Microstructure Approach to Exchange Rates. MIT Press.

Lyons, R.K. and M.D.D. Evans. 2002. "Order Flow and Exchange Rate Dynamics." Journal of Political Economy 110 (1): 170-180.

Mamdani, E.H. and S. Assilian. 1975. "An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller." Int. J. Man-Machine Studies 7 (1): 1-13.

Markowitz, H.M. 1952. "Portfolio Selection." The Journal of Finance 7 (1): 77-91.

Meditch, J.S. 1969. Stochastic Optimal Linear Estimation and Control. McGraw-Hill.

Meese, R.A. and A.K. Rose. 1990. "Nonlinear, Nonparametric, Nonessential Exchange Rate Estimation." The American Economic Review 80 (2): 678-691.

Meese, R.A. and A.K. Rose. 1991. "An Empirical Assessment of Non-Linearities in Models of Exchange Rate Determination." Review of Economic Studies 58 (3): 603-619.

Meese, R.A. and K. Rogoff. 1983. "Empirical Exchange Rate Model of the Seventies: Do They Fit Out of Sample?" Journal of International Economics 14: 3-24.

Mizrach, B. 1992. "Multivariate Nearest-Neighbour Forecasts of EMS Exchange Rates." Journal of Applied Econometrics 7: 151-63.

Moody, J. and L. Wu. 1996. "Optimization of Trading Systems and Portfolios." In Proceedings of Neural Networks in Capital Markets, Pasadena, CA, November 1996.

Murphy, J. John. 1999. Technical Analysis of the Financial Markets. New York Institute of Finance.

Murray, J., M. Zelmer and Z. Antia. 2000. "International Financial Crises and Flexible Exchange Rates: Some Policy Lessons from Canada." Technical Report No. 88. Ottawa: Bank of Canada.

Murray, J., S. van Norden and R. Vigfusson. 1996. "Excess Volatility and Speculative Bubbles in the Canadian Dollar: Real or Imagined?" Technical Report No. 76. Ottawa: Bank of Canada.

Nauck, D. and R. Kruse. 1997. "What are Neuro-fuzzy Classifiers?" Proceedings of the Seventh International Fuzzy Systems Association World Congress IFSA'97 4: 228-233, Academia Prague.

Osler, C. 1998. "Short-Term Speculators and the Puzzling Behavior of Exchange Rates." Journal of International Economics 45: 37-57.

Pedrycz, W. and M. Reformat. 1997. "Rule-Based Modelling of Nonlinear Relationship." IEEE Fuzzy Syst. 5 (2): 256-269.

Peray, Kurt. 1999. Investing in Mutual Funds Using Fuzzy Logic. St. Lucie Press.

Plasmans, J., W. Verkooijen and H. Daniels. 1998. "Estimating Structural Exchange Rate Models by Artificial Neural Networks." Applied Financial Economics 8: 541-551.

Qi, M. 1996. "Financial Application of Artificial Neural Networks." Handbook of Statistics 14: 529-552.

Ramaswamy, S. 1998. "Portfolio Selection Using Fuzzy Decision Theory." Bank for International Settlements Working Paper 1020-0959-59, Basle, Switzerland.

Refenes, A. and M. Azema-Barac. 1994. "Neural Network Applications in Financial Asset Management." Neural Computation Applications 2: 13-39.

Robinson, P. 1987. "Asymptotically Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form." Econometrica 55: 875-892.

Rochet, J.-C. and Vila, J.-L. 1994. "Insider Trading without Normality." Review of Economic Studies 61: 131-152.

Setnes, M., R. Babuska and H.B. Verburger. 1998. "Rule-Based Modeling: Precision and Transparency." IEEE Trans. Syst., Man, Cybern.-Part C 28 (1): 165-169.

Smithson, M. and G.C. Oden. 1999. Fuzzy Set Theory and Applications in Psychology. In H.-J. Zimmermann (Ed.), Practical Applications of Fuzzy Technologies (Handbooks of Fuzzy Sets Series), Dordrecht, The Netherlands: Kluwer Academic Publishers.

Smithson, M.J. 1987. Fuzzy Set Analysis for the Behavioral and Social Sciences. New York: Springer-Verlag.

Stone, C.J. 1977. "Consistent Non-Parametric Regression." Annals of Statistics 5: 595-645.

Sugeno, M. and T. Yasukawa. 1993. "A Fuzzy-Logic-Based Approach to Qualitative Modeling." IEEE Trans. Fuzzy Syst. 1: 7-31.

Takagi, T. and M. Sugeno. 1985. "Fuzzy Identification of Systems and Its Application to Modelling and Control." IEEE Trans. Syst., Man, Cybern. SMC-15: 116-132.

Tay, N.S.P. and S.C. Linn. 2001. "Fuzzy inductive reasoning, expectation formation and the behavior of security prices." Journal of Economic Dynamics and Control 25: 321-361.

Tseng, F.-M., G.-H. Tzeng, H.-C. Yu and B.J.C. Yuan. 2001. "Fuzzy ARIMA Model for Forecasting the Foreign Exchange Market." Fuzzy Sets and Systems 118: 9-19.

Van Eyden, R.J. 1996. The Application of Neural Networks in the Forecasting of Share Prices. Haymarket, VA: Finance & Technology Publishing.

Verkooijen, W. 1996. "A Neural Network Approach to Long-Run Exchange Rate Prediction." Computational Economics 9: 51-65.

White, H. 1989. "Some Asymptotic Results for Learning in Single Hidden-Layer Feedforward Network Models." Journal of the American Statistical Association 94: 1003-1013.

Yakowitz, S.J. 1987. "Nearest Neighbour Methods for Time Series Analysis." Journal of Time Series Analysis 8 (2): 235-247.

Yang, Z.R., M.B. Platt and H.D. Platt. 1997. "Probabilistic Neural Networks in Bankruptcy Prediction." Journal of Business Research 44: 67-74.

Yao, J. 1997. "Spread Components and Dealer Profits in the Interbank Foreign Exchange Market." New York University, Salomon Center, Working Paper: S/98/04, 1-54.

Yuize, H. 1991. "Decision Support System for Foreign Exchange Trading." International Fuzzy Engineering Symposium: 971-982.

Zhang, G. and M.Y. Hu. 1998. "Neural Network Forecasting of the British Pound/US Dollar Exchange Rate." International Journal of Management Science 26 (4): 495-506.

