Modeling the Return Rates of Stock via E M Algorith by Steve Che-Ming Chou B . A . S c , University of British Columbia, 2001 A THESIS S U B M I T T E D IN PARTIAL F U L F I L L M E N T O F THE REQUIREMENTS FOR T H E DEGREE OF Master of Science in T H E F A C U L T Y O F G R A D U A T E STUDIES (Mathematics) The University of British Columbia August 2005 © Steve Che-Ming Chou, 2005 Abstract We consider a multi-stock market, where daily return process of each stock together with its mean rate of daily return are assumed to follow a continuous diffusion process, which is similar to a state-space system with linear Gaussian dynamics. Our major objective is to estimate the model parameters based on historical data. Our estimation method is an iterative method bases on the expectation maximization (or E M ) algorithm. Contents Abstract ii Contents iii List of Tables v List of Figures vi Notations and Definitions vii 1 Introduction 1 2 Basic Theory 5 3 2.1 State-space Model with Linear Gaussian Dynamics 5 2.2 Random Behaviour of Ji 7 2.3 Interpolation of conditionally Gaussian processes t Estimations of Parameters 12 16 3.1 Estimation of a 16 3.2 Estimation of (5 and mo or /IQ 19 3.3 Estimation of a 21 3.4 Estimation of (3 25 3.5 Initial Value of a 26 3.6 Iterative Procedure for Parameter Estimation 27 iii 4 5 Estimation Results 30 4.1 Description of "Experiment" or Sampling Procedure 30 4.2 Estimated Values 33 4.3 Convergence of Iteration 40 Future Works 48 5.1 . Estimation Methods 48 5.2 Model Validation and Performance 50 5.3 Statistical Analysis of Parameters 51 Bibliography 53 Appendix A Solving I V P (2.23) for 7, 55 Appendix B Historical Data 58 Appendix C Computational Details 59 C.l Differential Equations (2.22) and (2.23) for fh and 7t in Theorem 2.2 t 59 C.2 Backward Equations (2.37) and (2.38) for m(s,t) and 7(s,t) and differential Equation (2.39) for in Theorem 2.3 C.3 Definition and Computation of Rt 60 61 C.4 Calculation of Yh in the Estimation of a 2 iv • • • • 63 List of Tables 4.1 Estimates of Parameters with Rank 34 4.2 Summaries of Estimates for Parameters 35 4.3 Stocks in Ascending Order of Estimates 37 4.4 95% Confidence Interval of a 41 4.5 95% Confidence Interval of exp(—a) in 10-thousandth 42 4.6 95% Confidence Interval of S 43 4.7 95% Confidence Interval of (3 44 4.8 95% Confidence Interval of a 45 v List of Figures Pairwise Scatter plots: Estimates for Parameters from 20 Stocks 95% Confidence Intervals for Parameters vi . . 39 46 Notation and Definitions {Ct '• to < t < ti} denotes a continuous time process £ in from time to to time t\. So, & is value of the continuous time process at time t in years. {£o>£i) • • • >£ni • • • j £ / v - i } denotes a set of TV discrete observations from the continuous time process : 0 < t < T } , taken at uniform frequency A t in years. Thus, while T denotes the length of duration in which the continuous time process is observed, N represents the number of discrete observations taken,starting from time 0. Thus, T = (N — l ) A t . Here, we somewhat abuse the notations in the sense that the similar notations (i.e., subscripts) are used with slightly different meaning: while £t denotes the value of the continuous time process at time t (in years), £,i refers to the n t h discrete observation. However, the meaning of the notation should be clear from the context and from the different subscripts ("n" and "t"). Thus, ambiguity usually should not occur. In case unusual ambiguity does arise, we still use subscript n for the discrete observation but use the time in parentheses for the value of the process: £ as £ ( n A t ) . n (9, £) = (9t, with 0 < t < T, represents a continuous time random process with unobservable first component 9 and observable second component £, cf. p . l of Liptster and Shiryayev [11] {St : 0 < t < T} denotes the observable price process of a stock on the finite time interval [0, T]. vii • Rt denotes the observable cumulative return of a stock starting from time 0 up to time t, relative to the price St itself; i.e., dRt = 4 ^ , cf. Equation (2.12) on p.81 of Lakner [10] or p.4 of Sass and Haussmann [14]. For briefness, it is simply referred as the "return" of a stock and the cumulative, relative or proportional sense are assumed by default. Note that, by definition, i?y = 0. • Following the definitions above, {Ro, Ri, • • • , Rn, • • • i R-N-i} denotes a set of N discrete observations of the continuous time process {Rt : 0 < t < T} at uniform frequency At. That is, R 1 n = R(nAt). B y definition, RQ — 0. • fit == E (^§ ^j, the unobservable mean rate of change of a stock's return over L time. For briefness, it is simply referred as the "mean rate of return". Here, the mean or expectation is taken over a hypothetical population of all the possible replicates or "sample paths" of a stock which could have been taken over time. A — A • fit = ^ — S and Rt = Rt — St are two transformed processes as defined in (2.3) and (2.4) on page 6, respectively. Note that Ro = Ro — 0 x t = 0. • (fit, Rt) takes the role of (8t, £t) as the working process. • J-f = a{^ s : s < t}, the sigma field generated by {£, : s < t}\ here, we mostly work with a{y s :s = o~{R : s < t}. Occasionally, also use 34 to denote s <t}. • fht = E(6t\J-f), the posterior mean of the unobservable process 6t at time t given the observable process & up to time t, cf. p . l of [11] • jt E ^(9t — fn,f) |J-^|, the posterior variance of the unobservable process 0t 2 at time t given the observable process £ up to time t, cf. p . l of [11] t 'Since a year approximately consists of 252 trading days and the process (here, a stock) is observed in every trading day, we take A t = 1/252 years. viii So, ffiQ — E(6o\£,o) = and 70 = E [(9Q — mo) |£o]- In fact, since we take 2 £0 = -Ro = 0, we have mo = E{6Q\£,Q) = E{6Q). Moreover, because we assume #0 or jlo to be deterministic, 70 = 0. See beginning of Chapter 3 on page 16. For s <t, m(s,t) = E(9 \J f) denotes the posterior mean of the unobservable r s process 9 at time s, given the observable process & up to time t, where s <t. S cf. p.25 of [11] For s < t, j(s,t) = E{[9 -fh(s,t)] \Tf} denotes the posterior variance 2 s or dispersion of the unobservable process 9 at time s, given the observable S process & up to time t, where t is fixed but s varies with s < t, cf. p.35 of [11] t>s, r ( t , s) denotes the counterpart of 7(s,i) defined above but with s fixed and t varying with t > s, cf. p.22 and p.36 of [11]. 9 denotes the maximum likelihood estimator or estimate for the unknown parameter 9. . ' C (T>) denotes a set of functions that are twice continuously differentia ble i n 2 V. CT represents a space of continuous functions x — {x , s < T}, cf. p.2 of [11] s [z]+ = z, if z > 0; e, otherwise. Here, e denotes a small positive random number. A ' denotes transpose of matrix A . A® or Oj j denotes the element on the i ix t h row and j t h column of matrix A . Chapter 1 Introduction We consider the unobservable mean rate of return {fi • 0 < t < T} of a stock, when t the return process {R, : 0 < t < T} is observable. t 1 This leads to a random pro- cess (nt,R ) with unobservable first component and observable second component. 2 t Particularly, we consider (fi , Rt) as a continuous time process of the diffusion type t specified by a pair of Ito stochastic differential equations: dut = a(5- n^dt + PdW^, dRt = ntdt + adwf\ (l.l) (1.2) where a, <5, f3 and a are assumed to be some unknown "constants", wf ^ and 1 w["^ denote two mutually independent Wiener processes, and \i§ is assumed to be deterministic. Moreover, RQ = 0 by definition, so we take it as given. 3 See next Some clarifications about the terminologies used here are required: 1) the "rate of return" refers to the rate of change of return over time (per year). 2) The mean or expectation could be taken over a hypothetical population of all the stocks which could have possibly been considered or one of all the possible replicates or "sample paths" of a single stock that could have been taken over time. 3) The "return" refers to the cumulative change of price from time 0, relative to the price itself. So, if St denotes the price at time t. N, can be defined as dR = dS /S in equation (2.12) on p.81 of Lakner [10] or p.4 of Sass and Ha.ussmann [14]. In filtering terminology, fi is the signal, whereas Rt is the observation. Herc, the roles of fig and RQ differ from those usually assumed in literature, where they are usually assumed to be two random variables independent of and W^ \ Sec the beginning of Chapter 3 on page 16. x t t t t 2 t 3 2 1 / paragraph that clarifies the constancy of the parameters. In fact, this model had been used by Lakner for the return process R and drift process n in the context of multiple stocks, c.f. equations (2.13) and (4.1) of [10]. However, Lakner basically worked with this model directly. Instead, we simplify our work by first re-parameterizing the model into a simpler one; see Chapter 2 on page 6 below. The interpretations of (3 and a are relatively obvious. Because they are the multiples or scalars of the two random increments of the Wiener processes, ,8 and a denote the volatilities of fi and R , respectively. For the interpretations of 6 and t t a, let us first take expectation on both sides of (1.2) to get E{dR )= (Ent)dt, t o r E ( § i L ) = (1-3) As deprived later on page 19 in Section 3.2, (1.1) implies Ein = 8 + {Ena-S)e- . at Because we expect a > 0, (3.6) assures that E/i t ' (3.6) remains bounded and Efj, . —> S t as t —> oo. That is, S is the long-term or asymptotic value of Em or E (^§ ^JL Moreover, as observed in Section 3.5 on page 26, a somehow controls the rate of such convergence: (Em — 5) —+ 0 monotonically as a geometric sequence with a retarding ratio e~ a per year. For instance, if a = 1, then Em approaches to its limit <Hn such a rate that the "excess'' 4 (E/it — S) reduces to e _ 1 or about 37% of its value in the year before. It may be reasonable to assume all the stocks have the similar volatility structurejndependent of time. However, it is quite possible that the stocks approach to different asymptotic mean rate of return, Therefore, we assume a, (5 and a are constants both over time and among the stocks. B u t , we assume 8 is constant in time only and may differ from stock to stock. Consequently, 4 If the mean rate of return starts off with E/j, > 5; otherwise, it is a negative excess. a we interpret the mean rate of return is taken over a hypothetical population of all replicates or "sample paths" of a single stock that could have been taken over time. Our major objective of this work is to understand the mechanism that drives the return of stocks. One application of such understanding is to help forecasting the future return and hence the prices of the stocks. This leads to the optimal investment strategy in the stock market: determine the proportion of funds to invest on various stocks that constitute a portfolio and the stopping time for investing on the portfolio. To this end, we would use historical data to estimate the unknown parameters in (1.1) and (1.2). To simplify the current model specified by (1.1) and (1.2), we first re-parameterize the random process, as we will discuss in Section 2.1 on page 5, into one that follows a state-space model with linear Gaussian dynamics for the state fit and observation Rt: dfi = dRt = fitdt + adWP. t -afitdt + fidW^, (1.4) (1.5) More specifically, based on the state-space.model specified by Equations (1.4) and (1.5), we estimate • er using result of Sass and Haussmann in [14]; • a using filter-based maximum likelihood estimators v i a the expectation maximization algorithm derived by Elliott and Krishnamurthy in [5]; • 5 and E/M) using the method of least-squares; • f3 using results on interpolation of conditionally Gaussian processes, well doc2 umented by Liptster arid'Shiryayev in [11]. Except for a, the other four parameters are estimated iteratively. Chapter 2 provides the foundations for the parameter estimations, based on the process (fit,Rt)- Section- 2.1 first starts with a model specified by (1.1) and 3 (1.2). It then introduces a re-parametrization to simplify the model into a statespace model. Section 2.2 provides Theorem 2.1, Corollary 2.1 and Theorem 2.2 to examine the conditional behaviour of fit given the sigma field of {R s (denoted as J-^). : 0 < s < t] Although these results generally impose certain assumptions on the data, in our simple setting, those assumptions all hold automatically, as justified in Section 2.2. Section 2.3 discusses some results on interpolation of conditional Gaussian processes, as the foundation of estimation of (3. Particularly, Lemma 2.1 and Theorem 2.3 present the forward and backward equations, respectively, for the posterior variance of p, given Tf-. s Chapter 3 continues with estimation of param- eters. They include estimation of a in Section 3:1, estimation of 5 in Section 3.2, estimation of a in Section 3.3, the estimation of P in Section 3.4. Section 3.5 justifies the choice of initial estimates for the iterative procedure that estimates the unknown parameters. Chapter 3 ends in Section 3.6 with a summary of the iterative procedure for the parameter estimation. The estimation algorithm introduced in Chapter 3 is then employed to analyze the historical data; see Appendix B for the description of such a data set. Chapter 4 documents the estimation results of the historical data, as well as some aspects on the convergence of iteration. Finally, Chapter 5 gives some directions of future works. These include some possible improvements, as well as some future applications or developments, from the current work. A l l computations are done using Matlab 6.5 Release 13. The two graphical summaries are produced using R version 2.0.1, a free-ware version of S-Plus, a statistical language and computing environment. 4 Chapter 2 Basic Theory 2.1 State-space Model with Linear Gaussian Dynamics We consider the unobservable mean rate of return {/it : 0 < t < T} of a stock, when the return process {Rt : 0 < t < T} is observable. This leads to a random process -(fit, Rt) with unobservable first component and observable second component. 1 Particularly, we consider (fj,t,Rt), or (fi,R) in short unless clarity is needed, as a continuous time process of the diffusion type with a pair of Ito stochastic equations dut. = 'a(S - m) dt + P dW^ , (2.1) dR (2.2) ] t = ntdt + odW^, where a, j3 and a are assumed to be constant both in time and stocks, 5 is timeindependent but may differ among the stocks, and denote two mutually independent Wiener processes, and /io and RQ usually are assumed to be two random variables that are independent of 2 and W^ \ 2 We can apply Kalman filter to (2.2) (2.3), cf. [11], but to use the result of [5] we want to have 5 = 0. In fact, Lakner had used this model in the context of multiple stocks, cf. equations (2.13) and (4.1) in [10]. However, instead of working with the model directly as Lakner or (Ot,£t) in the notation of Liptster and Shiryayev in [11]. Sce the beginning of Chapter 3: here, fio is assumed as deterministic and R is observed. 1 2 u 5 did, we first simplify the model by re-parameterizing the process into (fi , R ) with t fl t = fi - . R t = Rt-St. t 6 and t (2.3) (2.4) Since 5, being the limit of Efi, as t —> oo (as discussed in Chapter 1)', is a constant t in time, the diffusion model specified by Equation (2.1) and Equation (2.2) then reduces to dp dR = d/j, = -apdt = + i3dW \ (1 dR-6dt = (fidt + adW'Wy-Sdt = (n = fidt + o-dWW- 5) dt + - dW {2) G that is, dfit = ~afl dt dRt = fu dt + adWJ; . t + /3dW^\ (2.5) (2.6) 2) t This model now has Gaussian dynamics as used in [5], having the state p, and observation process R. We will see later, cf. Section 3.2 on page 19, how to estimate S as the long run mean of fit, i.e. of Putting this model in Liptster and Shiryayev's notations as on p.2 of [11], with 9 and £ replaced by p and R, respectively, we have dpi = [a {t,R)+ai(t,R)p }dt dR = [A (t, R) + Ai(t, R)p ] dt + D(t, R) t 0 + t Q bi(t,R)d,W^(t)+b (t,R)d,W^(t), 2 t dW^{t), with ao{t, R) = 0, oj {t, R) = -a, b {t, R) = [5, b {t, 0 = 0; A (t, R) = 0, A (t, R\) = x 2 1 and B(t, R) = a. 6 0 x The next few sections examine the conditional behaviour of fit given o~{R : s 0 < s < i). The results are summarized as Theorems-2.1, 2.2 & 2.3 and Corol- lary 2.1. They provide the theoretical bases for the parameter estimations in Chapter 3. 2.2 Random Behaviour of p, t This section examines the random behaviour of (fit, Ri) or more precisely, fit given the observation on {R : 0 < s < t}, i.e., given the sigma field T^'. s In the later discussions, the following conditions will be assumed (first expressed in terms of Liptster and Shiryayev's notations as in [11]). See equations (11.4)-(11.11) on p.2, 3 of [11]. Suppose CT denotes a space of continuous functions x = {x , s < T). s (1) Vx G C , T rT I ' / 0 Y, i\^(t,x)\ + \Ai(t,x)\}+ \fcO,l Y tf(t,x) + B (t,x) ] ,dt < oo; 2 j=l,2 J rT / [A (t,x)+A (t,x)}dt< oo; • Jo inf D (t,x) > C > 0, 0 < t < T; xeC 2 -• 2 0 2 (2)Vx,y£C , T \B(t,x)-B(t,y)\ B (t,x) 2 <Li t f\l Jo \x -y \ dK{s)+L \x -i |2 Vt\ , ' f Jo <L 2 2 a + x )dK(s) 2 + L (l a 2 t + x ), 2 2 where K(s) is a nondecreasing right continuous function, 0 < K(s) < 1; (3) f • Jo E\A^(t,Z)O \dt<oo, T t E\0 \ < oo,0 < t < T, t pl^jj A (t^)m dt< 2 2 oo| =1, where mi = E(9t\J-f) Translating into our setting, these assumptions become: (1) Vx c C . r £ (\a\ + 1 + P + a ) dt < oo, 2 2 . (2.7) / ldt<oo, or T < o o , ' Jo ini a (t,x) > C > 0, 0 < t < T, or a > 0; 2 (2.8) (2.9) 2 c (2) Vx,y€C , T 0 = \a - <J\ < Li f \x - y \ dK{s) + L \x - Vt\ , •lo 2 2 8 2 a = a {t, x) < Li I (1 + x )dK{s) Jo 2 2 2 (2.10) 2 s t + L {1 + x ). 2 2 ' .(2.11) where K(s) is a nondecreasing right continuous function, 0 < K(s) < 1; (3)[ E\p, \dt Jo < oo, t E\p, \ < oo,0 < t < T, (2.13) P ^ m (2.14) t where fht = (2.12) 2 d t < o o j = 1, E^il^J-^) Justification for assumptions (2.7)—(2.14) Since a, (3 and a are constant, as soon as T < oo, assumption (2.7) will automatically hold. Assumptions (2.8)-(2.11) obviously are automatic as well. For assumptions (2.12) and (2.13), note that by using integrating factor e at to rewrite the model equation (2:5) for /l, we have: dflt + afitdt = P d,Wp } =>• d \e = Pe e fit / x + [' Jo at at ai 0 dWf \ because a is a constant l Pe dWPds as =• fit = e~ at P f fi + e~ at 0 E \TH I = E e~ at Since a. > 0 and t > 0 so that e a t e fi + e~ as at 0 dW™ ds fi f Jo e as dW™ ds < 1 and since /l*O — (MO - <5) is deterministic, we have Thus, with, a > 0 and for 0 < t < T < 00 E\tn\ < 00. (2.15) B y the virtue of inequality (2.15), assumptions (2.12) and (2.13) follow. Finally, assumption (2.13) holds because Therefore, in conclusion, all the assumptions (2.7)-(2.14) are valid in our setting. W i t h the assumptions specified in Equations (2.7)-(2.14), the following theorem, as stated on p.3 of [11], assures that the random process (p, , R ) is conditionally t t Gaussian. Theorem 2.1 (Joint Conditional Gaussian of /z's) Suppose conditions (2.14) are fulfilled and let (with probability 1) the initial conditional Fi ( ) E a 0 ~ P{Po < \Ro) a (2.7)- distribution be Gaussian, iV(mo,7o), with 0 < 70 < 00. Then the future 9 random process (p,t,Rt), 0 < t < T, satisfying (2.5) and (2.6) is conditional Gaussian in the sense that for any t, and 0 < to < t\ < • • • < t n < t, the conditionally « distributions Ri (XQ, x ) = P(pt F 0 are (P-a.s.) n < x , • • •, fH < XnlF?) 0 0 n Gaussian. Corollary 2.1 (Conditional Gaussian of jit) the conditional distribution F ^(a) c for some parameters m t) In accordance with Theorem 2.1, — P(fit < a\j [ ') will also be Gaussian. } (fii\!Ft') ~ N(m 7t), 3 t That is 1 and "ft- As a consequence of Corollary 2.1, we conclude that the posterior mean, m<, will be an optimal estimate of jit from {R : s < t], where optimality is in L norm s 2 or i n the mean-square sense. Thus, a procedure to obtain m/ is called for. The next theorem, as stated on p.21 of [11], provides a pair of differential equations for the posterior mean and variance of the conditional distribution of jX given {fi : s < t} t (i.e., nit and jt)- s However, this theorem requires the following assumptions, cf. equations (12.12)-(12.15) on p.17 of [11]. sup E(9f) < oo; 0<1<T rT / £[a (s,£)+ai(s,O<y Jo ds < oo; 2 0 J E{26 [a (s,O+ai(s,O8 }+{bl(s,Z)+4(s,O}} ds<cx>; 2 s f E{A (s,t:) Jo r 0 0 s +A (s,00 } ds < oo; 2 1 s In our case, these assumptions become: sup £(/J. ) < oo;' 0<t<T (2.16) [ E{ajl ) •Jo (2.17) 4 T 2 s 3 ds < oo; Seep,16 of [11]. 10 E ( - 2 a / ; + / 3 ) ds < oo; 2 f Jo fJo 2 (2.18) 2 E (p ^ ds < oo. (2.19) Jo Justification for assumptions (2.16)—(2.19) B y virtue of inequality (2.15), we2 have both E \p \ and E \pt\ bounded. Thus, assumptions (2.16) and (2.19) directly 2 4 t follow. Also, because a and (3 are constants, assumptions (2.17) and (2.18) obviously hold. Theorem 2.2 (Representations for fht and 7^) Let (fi,R) be a random process defined by differential equations (2.5) and (2.6). Suppose (2.7-2.14) and (2.162.19) are satisfied and suppose the conditional distribution of pa given RQ IS Gaussian, N(niQ, 7o). Then rli is the unique T'^-measurable solution of the linear system t of stochastic differential equation (2.22) and ft is the unique solution of the deterministic Riccati equation (2.23):^ dmt = —am dt H—^IdR — m-tdt); ^ dt = -2a (2.22) t lt + f 3 2 - t o~ (2.23) subject to the initial conditions mo = Eipo\Ro) and 70 = E ^(po — mo) |i?oj . 2 To solve for fat given in Equation (2.22), let's put {a. 4- as Kt so that Equation (2.22) becomes: dfht = — (a + -^j fhtdt + = dRt —Kifhtdt + -^r dRt. <7 ''Expressed in the notations of [11], the equations are: dm t ' = r• ( r>\ i 1 [a, (t,R) + ai{t,R)m ] dt + 0 b (t,R)B(t,R)+~f A {t.R) ' ' ' ' ' B*(t,R) dR - (A (t, R) + Ai{t, R)m ) dt] t jt = v 2 ; u t l t 2aj(t,R) 0 + b (t,R) + b (t,R) 2 Jt t 2 x 2 11 h(t,R)B(t,R)+'ytA {t,R)'" B(t, R) l (2.20) (2.21) Note that K = (a + ^ is a function of t. Using the integrating factor e-h ' , K zdz t we have d[m eSZ '*]=eft '*(j£)dRt K K t J* ( m e / o . ^ = fh + a- 2 t 0 m 0 7 s e /o (2.24) dR, 7s& Jo cLR, where It K= t (2.25) la+-r Also, as shown in Appendix A , solving Equation (2.23) gives j . in closed t form: l + £t It — c p. 1 - £t (2.26) (TQ where p = Vo 2.3 ^(aa) +p , (2.27) = — + era, and (2.28) - ( °-P\ 2 2 f-W V (2.29) Interpolation of conditionally Gaussian processes This section develops the basic theory for the estimation of (3 introduced in Section 3.4. Suppose (p,R), specified in the stochastic differential equations (2.5) and (2.6), satisfies conditions given'by (2.7)-(2.14). Then, by the virtue of Theorem 2.1, the conditional distribution P(p \pf-), s s < t, is (P-a.s.) Gaussian. Let's denote the parameters as m{s,t) = E{p \F?), s ^s,t)^E{{p, -m{s,t)] \^}. 2 s Theorem 2.3 below gives m(s,t) and "y(s,t), for s < t;in close forms. 12 As stated below, representation of 7(-,-) depends on (i) whether the first argument is larger or less than the second argument and (ii) which of the two arguments is fixed and which is varying. In [11], t is always used to denote the larger argument, whereas s is always used to denote smaller one. Also, the second argument of 7(-, •) is always fixed and the first one is varying (or running) . When the first 5 argument is larger with the second argument fixed, [11] refers the representation as forward equation (of interpolation) for 7(4, s). In the other case when the second argument is larger with the first augment fixed, [11] refers the representation as the backward equation for 7(s,i). When the notations for the forward equations were first introduced on p.22 of [11], the subscript p, was used in m and 7 to distinguish s the difference as: m .(t,s) = ii 7A„(*,s) = E{\fi a E[jit\J?" ], fi - (t,s)} IJf "*} . 2 However, these subscripts are suppressed later in [11]. For clarity, we use -/{s,t) in the case s is varying in the backward equation and r ( i , s ) when t is varying in the forward equation; in both cases, t > s. Note that in the discussion of interpolation and extrapolation of components, Liptster and Shiryayev extend the l t d stochastic differential equation for the observable process from dR = [A (t, R) + Ai (t, R)p ] dt + B(t, R) dW® (t) t 0 t of equation (11.3) on p.2 of [11] to 2 dR\ = [Mt, R) + M(t, R)ih} dt + Y, i(t, R) B dW®(t) of equation (12.60) on p.30 of [11]. Thus, in the later discussion, we extend from B(t, R) = a to B-i (t, R) = 0 and B (t, R) = a. 2 5 cf. p.35 of [11] 13 Theorem 12.9 on p.37 of [11] provides representations of m(s,t) and 7(.s,£), for s < t: m(s,t) = m + f\(s,u)(tf(R))'A[(u,R){B Js x\dRu - {A {u,R) = Befall) + A^fyfhujdu}, 0 7(M) o s (2.30) fy^m'A'^R^BoBr'iu^R) +% x A {u,R)^{R)du}~ 7.,, X Y (2.31) where / is the identity matrix, B o B denotes B\B[ -f B B' 2 6 2 and for t > s, the triangular matrix <p (R) is a solution to the differential equation l 7 s ^ dt d = [ ( i , i j ) - T(t,s) c(t,RM(R), <p° = I a (2.32) a with boB = hB\ + b B' , 2 2 a(t,x) = a\(t,x) - (boB)(t,x) c(t, x) = A[(t, x) (B o B)- ^, 1 (BoB)"'(t,i) x) A^t, A^fx), x), as defined on p.36 and p.32 of [11]. Lemma 2.1 (Forward Equation for T(i, s)) stochastic differential equations 8 Suppose (fi,R), (2.5) and (2.6), satisfies specified in the (2.7)-(2.14). Then, for t > s, r(t, s) is the unique solution to the differential equation: ^ at = (2.33) a z with F(s, s) = 0. Note that the differential equation (2.33) for T(t,s) is the same as (2.23), just subjected to different initial or boundary condition. Modifying the solution for li 7 8 p.32 of [11]. Lemma 12.2 on p.36 of [11]. See p.36 and Note 3 on p.22 of [11]. 14 7t given in (2.26)-(2.29) for boundary condition F(s, s) = 0 gives F(t,s) = a p- 1+ 1 £t,s aa (2.34) £t,s — where (2.35) P aa — p £t, . exp , aa + p s -2p (t - s) N (2.36) because, with F(s, s) = 0, (2.28) gives Vb = ^ ' ^ + aa = aa. r 6 Note that even though T(t, s) = 7 „(t, s) = £ {[£., - % , ( t , s)] | j f 2 A s , R | de- pends on p,,* by definition, (2.34)-(2.35) show that r(r.,.s) does not depend on p . s Theorem 2.3 (Backward Equations for m(s,t) and 7(s,t)) Suppose (p,R), specified in the stochastic differential equations (2.5) and (2.6), satisfies (2.7)-(2.14). Then, for s < t, the conditional mean and variance of p given observations from s process {R : s < t} are: s m{s,t) = 7(M) = = m +a s 4 rt 2 f Js j(s,u)ip'^[dR m du], n + u (2.37)- du 1 7 7 ' + W W du (2.38) Js where ip™, with u > s, is the solution to the IVP d^ du = -{a + a- r(u,s)}<p , 2 u s <p> = l. s (2.39) Solving the IVP gives V?" = e x p | - a(u-s)+a~ 2 because ip = l. s s 15 j T(z,s)dz j . (2.40) Chapter 3 Estimations of Parameters Because R is the cumulative return starting from time 0, Ro = 0 by definition. Moreover, because Rt = Rt — St, this leads to Ro = Ro = 0. Consequently, E(.\Ro) or E(.\J-Q) is simply E(.) — it is so, because RQ assumes its value by definiton and provides no information on the process. Thus, mo = E(no\^ o') ~ -^(MO)- Further: more, we also assume the initial process is fixed; that is, fio = rn,Q is deterministic. Because fit is defined as Ht — S for some constant 5, p,Q = fio — S is also deterministic. Therefore, 70, the conditional variance of flo, is zero. 3.1 Estimation of a Observe from model equation (1.2) on page 1 dRi = ntdt + adWt , 2) (1.2) the parameter a is the volatility of the return process Rt- Thus, one can estimate <T based on the quadratic variation of Rt- Here we use a method proposed by Sass 2 and Haussmann, where a hidden Markov model was used as a multi-stock model. See, particularly, pp.568-569 in [14]. 16 Sass and Haussmann observed in equation (5.7) of [14] that jE [(R - R) ] 2 t+h t =o- + -E 2 ^ X dtj (3.1) Moreover; according to Proposition 5.1 of [14], the last term in (3.1), j^E (j ''' m dt 0 can be expressed as a power series in h with a zero constant coefficient. Combining this with (3.1) and putting ^E [{Rt+h — Rt) } as Yj,, we conclude that 2 -. Y h oo = -E [(R k t+ ~ Rt) ] =v 2 2 + Y, ^ > fc=i (3-2) a for some constants 0 1 , 0 , 2 , . . . . Note that a is the constant term or (y-)intercept of 2 the power series (3.2). Thus, if we can observe or estimate ^E [(Rt+h — Rt) ) 2 or Yh for a certain set of h values, we can use the method of least squares to fit the observed on some polynomial of h (say of degree p). Then we can estimate a by 2 the estimated value of the intercept. According to Sass and Haussmann, the degree of the polynomial somehow depends on the unknown parameters. However, based on their analyses of the simulated data and the historical data 1 in one dimension, they observed that p = 2 (i.e., a quadratic fit) "performed very well". See p.568 of [14]. Below, we use the method due to Sass and Haussmann to estimate Y/, or \ E [(Rt+h — Rt) ] 2 x {So, Si,..., using discrete observations {RQ, RI, . . . , /?./v-i}, or equivalently SJV_I}, observed N times from the continuous time process {St : 0 < t < T } a t uniform frequency of At over (N — l)At = T years. Particularly, we will obtain m pairs of values (h,Yh) corresponding to h = At, 2At,..., mAt, for some small positive integer m > p, the degree of polynomial of (3.2) use in the least-squares fit. For j = 1, 2 , . . . ,TOand h = jAt, 1 estimate Yh by the sample mean based on We will use the same data set in Chapter 4 for parameter estimation. 17 = L ^ J values : 2 n-l Y t = Y jA = h ( '<Rk A n f-i h k=0 1 n-l 1 2 r ' E l (Rkm+j ~ Rkm) , where, for instance, Rkm+j denotes (km + j) th process; that is, Rkm+j = R((km + j)At) (3.3) discrete observation of the return = R(kmAt + h), because h = jAt. See the list of notations and definitions. Recall that R is calculated from St as t R t + h -R St+h\ , 1 J2 = \ o g \ ^ j + -a h, 2 t from Section C.3 on page 63, where Rt+h and R t (CIO) l are two returns separated by time h. Notice that a, the parameter being estimated and thus its value is not yet known, appears on the R H S of ( C I O ) above, that defines R h - Rt- Thus, some i+ adjustment is called for. Recall that h = At, 2 At,..., mAt for some fixed small m. Here At = 1/252. So, i?, t + / ( -i? = l o g ( ^ ) + 0 ( A t ) « l o g ( ^ ) , (3.4) t where, for instance, St+h denote S(t + h), the price at time t + h. (Rkm+j — Rkm) o n the R H S of (3.3) by the approximation in (3.4) gives 1 Yj or Y « ± £ n h to order of O(At). Replacing _ 1 " to 1 i log(%^)f (3.5) h Here, for instance, Skm+j denotes the (km+j) th tion taken from the price process; that is, Skm+j — S((km, + j)At). discrete observaSee Section C.4 on page 63 for some computational insight for Yh.. In summary, for h = At, 2At,..., mAt, compute Yh as in (3.5) for some small value of m (say m = 4, cf. pp.568-570 of [14]) This leads to m pairs of (h,Yi,). v R u n a quadratic fit of Y/,. on h and take the (y)-intercept to estimate a. 2 For simplicity, below we assume TV, the number sample points, is a multiple of m; truncate the data if necessary. Consequently, n = L ^ p J = 2 18 3.2 Estimation of S and mo or /IQ To avoid two-dimensional Kalman filtering, we use a rude estimator of 6. First, let's recall model equation (2.1) from page 5: d/M = a(5 - m) dt + (3 dW^>, where (2.1) is a Wiener process. As we first integrate both sides of (2.1) with respect to t, then take expectation, interchange order of expectation and integration, and finally differentiate with respect to t, we have [' a{5 - fi ) ds + f p dWW Jo Jo fit = li + 0 s => Em = Efi + j => ^ {Em) = aS - 0 a(5 - Em) ds + (3 J o E [dW^] aEm, because a, S, and f3 are constants and E ^dW^~] = 0. Using the integrating factor e Qt gives => (Ent)e at = Em) + 5(e -l) at Em = 6 + (Efio-5)e~ \ at (3.6) since a and 5 are constant. Observe that (3.6) can be interpreted as a linear relation between Em and e~ , at with unknown intercept (3o = S and slope (3\ = (Efio — 5). These two parameters of (3.6) can be estimated by regressing the observed value of fit on the corresponding value of e~ , al which can be calculate for each time t, once a is known. Notice that as argued in Chapter 1, the expectation on (3.6) is taken with respect to a hypothetical population consisting of all the replicates or "sample paths" of a single stock that could have been sampled over time. Recall from (1.3) on page 2 that Em = E (^f -)- So, Em 4 o n by the sample mean of observed rates ^ 19 the left side of (3.6) can be estimated taken over the M "sample paths" of a *2, • • •, set single stock. That is, for t — to, M 771=1 as the observed value of Efif + Pixt, t at L Then rewrite (3.6) as Y =0o where xt denotes e~ , " N (3.8) f% denotes 5, the intercept and Pi denotes (E/IQ — 6)., the slope of (3.6). Thus, once paired, data (xt,Y ) are available, we can estimate S and t Efj,Q by estimating PQ and Pi using method of least-squares. Obviously, we did not observe and hence needed to be estimated. Indeed, it is a well-known problem to get a good estimate of ^ . One can approximate ^ using the » forward difference = ( +^)-^ ) fl t t ) • backward difference = R ( t ) . central difference = ~^/" A t ) , or ^+M)-R(t-^y where At denotes the uniform frequency in which discrete observations are taken. It is well-known that the errors of approximation using the forward and backward difference are both 0 ( A t ) , whereas that of central differences is 0 ( ( A t ) ) . For in2 stance, see [2]. Thus, in regular numerical analysis, the central difference is preferred over its counterparts. However, in stochastics, we usually use the forward difference, in order to preserve certain indepence properties. From Section C.3 on page 61, we have St\ R t = {f ) l0g + Q , 1 „ 2 2 + a X { C - where, R = R(t) and St = S{t) — the values of the processes at time t, not the t observation. Approximating dRi dt _ ~ by the forward difference gives Rt+i - R At t 20 2 ) t lh _1_ At 1 log At l o a (t + At) 2 So 7 2 1 , (3.9) \-sr g In summary, for t = 0, A t , 2At, . . . , (AT - 2) At, use (3.9) to approximate in (3.7) for the m t h of the M "sample paths" of each stock. This gives M 1 M E m—1 1 f 1 ^ l o g ( S / S ) + ^cx m 2 t ) M Then regress Y , as defined in (3.10), on x = e~ ot t t 1 (3.10) and use the method of least- squares to estimate [3Q and 0i, the intercept and slope of (3.8). Then estimate 5 by /3Q and jG/io by (/3Q + /3i), because E/IQ - 5 = l3\. 3.3 Estimation of a The classical linear Gaussian state-space model has been heavily used for the signal model and observation processes. For instance, Elliott and Krishnamurthy in [5] described the signal {x ,t t > 0} and the observation {yt,t > 0} as dxt = A x dt dy = C x dt + D dv , t t t t XQ G + Dtdu , t t t R, m y = 0e t 0 R. n Matching with our one-dimensional model (2.5) and (2.6) on page 6 dut -ap, dt dR p, dt + t we have A = -a, t t t + f3dW , W t adW , (2) t B = [3, Q = 1, and D = u. t t • Elliott and Krishnamurthy in [5] derived finite-dimensional filter-based maximum likelihood estimators ( M L E ) for the parameters A and C in the state-space t 21 t system. These M L E ' s are based on the expectation maximization ( E M ) algorithm. Here, we apply these results to estimate a. Equation (5.4) on p.1920 of [5] gives - a = A = L' Hr\ (3.11) t where, as given Equation (3.33) and Equation (3.31) on p.1917 of [5], If E{Vl\y ) = = t m m r f + fti s( + Y Y l- (P' ^ ( P . <?) + m' v%m , or u t t t p = l 9=1 Lt = n +stmt + ut(j +m ), H? = / ' ( / ^ ' 3v) rn = and 2 t f. + 't t + Y Y a m b (3.12) rn C ^ (P. 7)7t(p, o) + m' cfm , t t or p=l 5=1 H t = at + 6tm + ct(7t + m ) , (3.13) 2 t where 6[',c| ,sf and ttf are given explicitly in Equations (3.15), (3.16), (3.29) and 7 (3.30) on p.1915 and p.1917 of [5]. Moreover, af and rf are solutions to differential equations given as Equation (3.5) on p.1913 and Equation (3.25) on p. 1916 of [5], respectively. Notice that the updating formula (3.11) for a requires a minor adjustment due to the following difference: while Elliott and Krishnamurthy allowed At as a function of t, a in our setting is a time-constant. As a result, all the values —Lt/Ht. are M L E ' s for the constant a . 3 Thus, we can first calculate N values of a as in (3.11) based on N observations made at time points t = nAt with n = 0 , 1 , . . ' . , N — 1. Then take the mean to update a. That is, take J V n=0 t H Bccause they are estimates, not the actual value, of a and hence involve estimation errors, they need not take on the same value. 3 22 where t — nAt. Moreover, since we assume a > 0, this restriction should be implemented in the iteration as follows. Update a by N-l r 1 -L t 71=0 (3.15) where J+ ( z, if z > 0; e, otherwise. Here, e denotes a small positive random number. We take e as a random number instead of a fixed number, say zero, to avoid cycling in iteration. For an appropriate choice of e, see the end of Section 3.6 on page 29. Note that even if —Lt/Ht occasionally takes on negative value, it does not necessarily contradict the assumption that a > 0. See discussion on page 26. In order to write down those equations that define bj, cf, sf, uf, of and rf, let's first make two definitions. Let a 6 R m i th denotes the unit vector with 1 in the position. Also, suppose Gt satisfies the differential equation given .in Equation (3.14) on p.1915 of [5]: ^ = -(A' + ^ B B' )G , = (a- r\9 )G , with Go = l t t t t 2 7 t I mxm with G = 1. 0 This implies G t = Go exp Jo I*"-'* Is rvt e exp ds (3.16) because Go — 1 and a is constant in time. Explicit Equations for bt,c ,St,u , at, and r t Gt{ t Gj^e'i t + eie^G'^dslG't, 23 or G fGT 'ds, 2 Jo t 2G { j t 2(3 G, b G-^DsB'^ihsds), 2 t u; (3.17) = Jo (^s , or '-ds, (3.18) + (A's + ^BsB'^e'A'iG^dsJG,., ft u t = = G and (2u f Jo A G . , s 2 t (3.19) ds \ , 1 PG t ds, jT [ G ; ( 2 u f - e.e'^B^fhs] t s -2 -a+—]G 7. U Jo G or or ds. (3.20) 7 s Also, at and r are the solutions to the differential equations: t dt dat lit drf dt dr t ~dt TricfBtB^+bjBtB^fht, with og = 0 or P [ct + bt — 7t , with ar, = 0 and Tr(v??B B' ) 1 1 t t P [u + + s } BtB'^mt 2 t 7t - T r ^ B ^ ) , = 0 or 1 ) , with To = 0. Integrating the differential equations of ^ a Jo t rt with P- and ^ c + O z — 1 dz Iz sm . u H I K , Iz z z f Jo gives a* and r : t (3.21) z z because ao = 0 = ?*oOnce data of {Ro, R\, • • •, i?/v_i} are known, one can evaluate these integrals numerically and thereby estimate cr. given m, 7 and a, we can compute these functions, then L and H, hence get a new value for a; that is, we estimate a iteratively. 24 3.4 Estimation of (3 • Equation(5.7) on p.1921 of [5] expresses t as: 7 It- E{x x' \y ) 0 0 +E | t x dx' + J* dx x' \y } + BB't. s s s a t In our setting, it becomes: 7 l = E{ui\T^)+2E^fi dTi \^+[3 t 2 s = s E(f4\F )+2L +pX R t t because, as given in Equation (5.4) on p.1920 of [5], the second expectation is simp])' Li. Solving for f3 , we have 2 (i = j [y - E (fillF?) 2 t For the expectation term Efa^F^) - 2 L ] .. (3.23) t in Equation (3.23), note that E [(fio - m ) | j f ]' = E - 2m E (fio\^) 2 0 0 + m. 2 So, we have 7(0, t) = E E ) - 2 m m(0, t)'+ m§ 0 (jft\f ) = 7(0, *) + 2 m m(0, t) - mg. R t (3.24) 0 Putting Equation (3.24) into Equation (3.23) to replace E ( M O I - ^ ) leads to P = - [7t-7(0,t)-.2mom(0,t)+m§-2L ] 2 t t ,. ' . (3.25) where if z > 0; e, otherwise. Here, e denotes a small positive random number. Note that (3 is a constant and independent of t. So, we can first calculate AT values .of /3 as in (3.25) based on' 2 25 N - l observations made at time points t = nAt with n = l , 2 , . . . , J V — 1. Then take the mean to update P : 2 P' = J^-J 2 J! {1 h " 7 ( 0 , ^ ~ 2 f h o ™ ' *) + ™o - 21*] J (3.26) ( 0 + Justification for the adjustment [.]+ When /? is estimated, the estimated value 2 would differ from the actual value to be estimated. Thus, even though fi 2 > 0, estimation error may cause its estimated value occasionally takes on a value less than zero before convergence is reached during iteration. This would be particularly so if P is close to zero. Therefore, this adjustment is necessary to avoid the estimated 2 value of P from drifting off its domain and going below zero during iteration. 2 3.5 Initial Value of a Recall (3.6) from page 19 that Em = 6 + (Epo - S) e~ , (3.6) at so that E/j, —> 8 as t —> oo. To examine the rate in Efi t —> 5, let us rewrite (3.6) t gives (Ef -5) = (Ef -6)e- (EiM-8) = {Eto- -5)e- , M or at M) a 1 where t is measured in years. Thus, Efit -*• <5 in such a way that the "excess" 4 (Efii — 5) from the limit value decreases as a geometric sequence with a retarding 5 ratio of e~ a per year. While a retarding ratio of 40% or 0.4 per year may be a reasonable guess, we take a = — log 0.4 w 0.92 or 1.0 as an initial value of a for the iterative estimation procedure. ''If the mean rate of return starts off with EfiQ > 5; otherwise, it is a negative excess. because a > 0. 5 26, 3.6 Iterative Procedure for Parameter Estimation Let's summarize from the results above and outline the iterative procedure to estimate the parameters a, 6, mo, and /3 . 2 Input: • discretized values from the return process {Ro, Ri, • • •, -Rjv-i} calculated from discrete observations on stock price {So, Si, •.., S J V - I } as i ? = log n \ a nAt, 2 + derived on page 61 in (C.2), • the uniform frequency A t in which the stock prices are observed, • estimate of a 1 as computed by procedure due to Sass and Haussmann (see Section 3.1 on page 16). • initial guesses for a and /3 — note that the iteration does not require initial value for 5 nor mo to proceed, • tolerance as a criteria for convergence (see the next footnote below), and • the maximum number of iterations to stop even if convergence is not yet reached. Output: • estimates for parameters a, 5, mo, and 0 , 1 • "flag" indicating if convergence is actually achieved within the tolerance specified, • the "extent of convergence" (as measured by the maximum absolute change in the estimated value between two successive iterations), • number of iterations when it stops. 27 F i x 70 = 0 (because no is deterministic). Do until convergence or the maximum of number of iteration is reached 6 p = ^\craf + 0 from (2.27) 2 Vo = ^ + era from (2.28) t ~ St {iz E m = i fi°&( t+i/S )} } Y + \cr s t from (3.10), where the summation 2 m is done over the M "sample paths" of each stock and At denotes the uniform frequency in which the price process is observed, the reciprocal of the number of trading days per year. Regress. Yt on e~ and use the method of least squares to estimate the at intercept 0o and slope 0i. S = 0o • mo = 0o+0i For t• — 1,... ,T, compute and update R = et = (fe) exp It = o [pT=ft 2 > ((a,/? )) 2 = (a+fy _ a from (2.29) \ a f r o ( ' ) m 2 2 6 from (2.25) fht = mo + a " /„' (i 2 eft * ) dR K s 2 K 6t - 2 / 3 G /„' ds from (3.18) t at - 0 Jo (c + b ^f) dz from (3.22) u = G\ J* [ ( - a + g) G" ] 2 z t s = /? Gt / * [ ( ^ ) 2 ftfrom (2.24) from (3.17) 2 z d z ds] from (3.16) t ct = Gl&G- ds e~ So ' dz s G = e"*exp [- /„' (Q t ((a,0 )- Rt~6t t K t 7 from (3.19) 2 s fh ] ds from (3.20) s Hcrc, convergence is concluded if all the estimates stabilize — each estimate differs from its corresponding value in last iteration by a small amount, specified as tolerance in the input. Most integrals in the updating formulas involve uniform subintervals and hence can be conveniently estimated numerically using Simpson's rule. Two exceptions are the integrals for rn and m(0, t), which involve dR . These two exceptional integrals arc evaluated using the trapezoid rule instead. 6 7 0 s 28 r = fi ft (u + ^ - 2 t z l ) dz from (3.22) H = a + b mt + ctilt + fh ) from (3.13) 2 t t t t L = T + sm t t a = ^ E l - t + u ( 7 +TO )from (3.12) 2 t 1 0 t [ ^ ] r t + f r o m = ( S ) (3.15) ( -36) from 2 T(t, s) = a [p i±fj* - aa] from (2.34) = exp [ - [a (t-s)+ 7(s, *) = { 7 7 1 a" T(z, s)dz]} from (2.40) 2 + <^" £ (<Ps) du}"' from (2.38) 2 2 m(0, t) =TOO+ a - 2 fg 7 ( 0 , u)</J '[eLR„ - m„du] by putting s = 0 into (2.3.7) m(s,i) = m + a s 0 - 2 (*7(s, u)ipg[dR\ — m du) t u mo — mo — S P 2 = TV-T El'i { 1 [7nAt - 7(0, t) - 2 T O TO(0,t) + TO , 2 0 - 2L ] t + j from (3.26), where t = n A t . E n d % of For 7oop E n d % of Do loop A n appropriate choice of the random positive number, e: Here, random number is required to replace an obviously erroneous estimate that results from estimation errors during iteration — a negative estimate is obtained even though the actual value of the parameter being estimated should be positive, either by definition (e.g., volatility) or by assumption (e.g., a ) . When a negative estimate results during iteration, the parameter value most likely is close to the boundary of its domain, that is, zero. Thus, a small but positive value should be chosen to replace the obviously erroneous estimate. However, recall that if each estimate differs from its value in last iteration by a small amount, say tolerance of 1E-5, convergence is claimed. So, to avoid an artificial convergence, the chosen random number, other than positive, should differ from the current estimate by more than the tolerance specified. 29 Chapter 4 Estimation Results 4.1 Description of "Experiment" or Sampling Procedure In this chapter, we use the procedure developed in Chapter 3 to estimate the parameters of the stock model specified by the pair of Ito stochastic differential equations (1.1) and (1.2) on page 1. These parameters are a, S, f) and a. To this end, we use a set of historical data of 20 stocks from the Dow Jones Industrial Index. These data cover a period of 30 years, starting from January 21, 1972 to December 31, 2001. See Appendix B for a brief description of such a data set. In fact, Sass and Haussmann also used this data set while they considered a hidden Markov model as a multi-stock model in [14]. For the purposes of parameter estimation, we consider 5-year overlapping periods with starting years 1972, 1973, . . . , 2001. This leads to 26 such 5-yea,r periods, where, for instance, the first set consists of data roughly from the beginning of 1972 to the end of 1976 . Since each year roughly consists of 1 'Because the data actually do not start until January 21 in 1972, the first set of data actually goes beyond 1976, covering roughly three weeks of 1977. However, for the purpose of parameter estimation, the calendar boundary may not be important, as soon as each set consists of the same number of each holiday (e.g., five new year clays), event (e.g., five clays of Halloween!) or peculiar days (e.g. five months of October, the month in which stock market has a higher risk of crashing). Indeed, because a calendar year does not have actually the same number of trading days, it is better to have the data starts off in a day 30 252 trading days, we take 252 x 5 or 1260 trading days of data as one 5-year set. The closing prices of a single stock over each of these periods constitutes a "sample path" of that stock. So, 26 "sample paths" are observed for each stock, with each path covering T = 5 years or N = 252 x 5 = 1260 trading days. Altogether with 20 stocks, we come up with 26 x 20 or 520 "sample paths". This leads to an array of 26 (periods) by 20 (stocks) estimates for each parameter. It is worth noticing the pros and cons of taking overlapping versus nonoverlapping periods. In order to examine the repeatability of estimate for each parameter, a few replicates or "sample paths" would be required for each stock. However, each set of data should consist of a reasonable number of data points in order to obtain good estimate of each parameter. Recall that our data set covers 30 years. If we consider non-overlapping 5-year periods, we only have 30/5 or 6 "sample paths" for each stocks. W i t h a fixed amount of information (i.e, 30 years of data), we are bounded to face a tradeoff between the number of "sample paths" and- the number of trading days in each one. One may suggest taking stocks with a longer history. Currently, our data, set spans 30 years. This approach would face two problems. Firstly, adjusted prices with a longer history are rarely available. Secondly, or maybe more importantly, over a very long period, the structure of the stock market may change so much that the parameter values are no longer constant over the period covered. In that case, one needs to model the temporal variation of the parameters as well. In any case, taking overlapping periods is a way to get around the dilemma. Here, each period overlaps with next one by 4 years, resulting in 26 "sample paths" consisting of 1260 trading days. However, there is a price to pay for. Because the "sample paths" overlap and thus share certain portion of data with other "sample, paths", they are not true replicates — or so-called pseudo-replicates. Consequently, the estimates obtained are not independent. In fact, except for those that is not close to any of those special.days mentioned above. Now, our data set starts off three weeks after a new year day and thus has a good chance to balance the number of any special day in each set of 1260 consecutive trading days. 31 obtained in the very early or very late years, each estimate would somehow relates with eight other estimates — four from the sets earlier and four from the sets ahead in time — with different degree of dependence according to different amount of overlapping. For instance, while the estimate obtained from a path starting in year 1976 would relate with those obtained from paths starting in years 1972 to 1980, it relates more heavily with those two starting in years 1975 and 1977 but much less so with those two starting in years 1972 and 1980. Indeed, this dependence structure among the estimates may be too complicated to be incorporated into the later analysis. However, we should be aware of the effect of such a dependence structure. See further discussion below. The estimate of each parameter, say, a, from each "sample path" of a stock, say, Walt Disney, can be considered as a realization or an observation taken from the sampling distribution of the estimate for a. Thus, the 26 observed values of 2 the estimate for a, one from each 5-year period or "sample path", should be treated as a sample of 26 observations for a. Thus, we can take the sample mean of these 26 observed estimates as a better estimate of the true value of a. A few words of caution worth regarding the observed sampling variation of the estimate among stocks. Because the estimates are obtained from common data due to overlapping periods, they appear more similar among' "sample paths" than they should be. Consequently, the sampling variation tends to be under-estimated. Finally, based on the Student's t-distribution, 95% confidence interval for each parameter is obtained for each stock. Notice that, because the sampling variation of the estimates are under-estimated, the resulting confidence intervals are Sincc wc cannot control for all the factors that would affect the price and return processes, wc simply incorporate the totality of all those effects as random fluctuation, modelled as a Wiener process. Thus, any observation is the total of two unobservable components: the actual value and a random error taken from certain (unknown) error population. Tims, the resulting estimate is not the actual value to be estimate but varies from sample to sample. Such randomness among samples driven by certain uncontrollable mechanism is referred as the sampling distribution of the estimate. 2 32 shorter than they should be. Consequently, the actual coverage of the resulting confidence interval is lower than the nominal value or 95%. It would be nice to quantify such a discrepancy, but this would require knowing the complicated dependence structure among the estimates. Here, we simply interpret the confidence intervals with caution that they are shorter than they should be. For all the stocks and "sample paths", we use initial guesses of 1.0 for a and 0.2 for f3 , or equivalently, 0.4472 for (3. Recall from Section 3.6 on page 27 that no 2 initial value is required for 5 or mo in order to start the iteration. Also, all iterations use a tolerance of 5E-4 and choose random number between 0 and 0.01 to replace' any obviously erroneous estimate during iteration. 4.2 Estimated Values Table 4.1 on page 34 provides the estimates of a, 5, (3 and a for each stocks. Note that each estimate is the mean of the 26 estimates obtained from the 26 "sample paths". Due to its definition, variation should be added and subtracted in Li norm. Thus, each estimate for (3 (the volatility of the mean rate of return) actually is the square root of mean of square (abbreviated as R M S ) of the (3 estimates over the 26 "sample paths". The same principle applies for er, another volatility term. The bottom row gives the mean of estimates of each parameter, taken over the 20 stocks. Also provided are the rank of each stock, ranked according to estimate of each parameter. For instance, let's consider the second column, the ranks of stocks according to the estimates for a. For example, since Boeing has a rank of 1 and Exxon Mobil has a rank of 20, we can easily see that Boeing has the least the estimate for a while Exxon Mobil has the largest one. One noticeable pattern is the ranks of estimates for a and (3. Particularly, their ranks among the stocks appear quite similarly, indicating that the estimates increase/decrease together; that is, they appear positively correlated. We will examine this potential correlation later. But, let us first examine how do the estimates for each parameter vary among 33 Table 4.1: Estimates of Parameters with Rank Parameters a Stock Alcoa Boeing Caterpillar E.I.Dupont de Nemours Walt Disney Eastman Kodak General Electric General Motors Honeywell IBM International Paper Johnson & Johnson Coca-Cola McDonald's 3M Philip Morris Merck Procter & Gamble United Technologies Exxon Mobil Mean S Est Rank 10.40 11 1 4.31 7.86 5 10.82 13 6.65 2 9.88 9 11.21 16 7.60 3 9.17 • 7 7.62. 4 9.31 8 11.00 14 11.80 17 12 10.55 12.01 18 11.05 .. 15 10.10 10 12.72 19 8.84 6 13.78 20 N/A 9.83 34 Est 0.1703 0.1977 0.1084 0.1497 0.1683 0.0957 0.1623 0.1222 0.1686 0.0974 0.1264 0.1686 0.1943 0.1620 0.1302 0.1859 0.1936 0.1409 0.1932 0.1375 0.1537 Rank 15 20 3 9 12 1 11 4 13 2 5 14 19 10 6 16 18 8 17 7 N/A 0 Est Rank 0.3536 8 0.2982 1 0.3524 6 0.3577 12 0.3281 3 0:3556 9 0.3812 20 0.3462 5 0.3561 10 2 0.3238 0.3527 7 0.3701 • 18 0.3644 15 17 0.3685 0.3605 13 0.3671 16 0.3364 4 0.3807 19 11 0.3566 14 0.3640 0.3542 N/A a Est Rank, 0.2936 17 0.2697 14 0.2608. 10 0.2573 6 0.2836 16 0.2989 19 0.2416 3 0.2606 9 0.3234 20 4 0.2466 0.2728 15 2 0.2389 0.2597 7 12 0.2638 0.2332 1 0.2983 18 0.2526 5 0.2682 13 0.2603 8 11 0.2613 0.2682 N/A Table 4,2: Summaries of Estimates for Parameters Parameters Summary Statistics a . 5 a 0 Mean t 9.83 0.1537 0.3542 0.2682 Median 10.25 0.1622 0.3563 0.2611 Maximum * 13.78 0.1977 0.3812 0.3234 Minimum § 4.31 0.0957 0.2982 0.2332 Maximum/Minimum .3.20 2.0659 1.2782 1.3872 Range" = M a x - M i n 9.47 0.1020 0.2374 0.2242 Range/Mean 0.96 0.6637 0.6703 0.8360 Standard Deviation 2.24 0.0326 0.1165 0.1118 S D / M e a n or C . V . 0.23 0.2121 0.3288 0.4168 For j3 and a, the mean actually is the square root of the mean square (ie., R M S ) . * The stocks that correspond to the maximum values are Exxon Mobil, Boeing, General Electric and Honeywell. § The stocks that correspond to the minimum values are Boeing, Eastman Kodak, Boeing and 3 M . 1 " For (3 and a, it actually is calculated as \ / M a x — M i n . 2 2 the 20 stocks. To this end, let's consider some summary statistics. Table 4.2 on page 35 provides some summary statistics for the estimate for each parameter, taken over the 20 stocks. These summaries include mean, median, maximum, minimum, maximum-to-minimum ratio, range, range-to-mean ratio, standard deviation (or SD) and SD-to-mean ratio (or the coefficient of variation, C V ) . Again, for [3 and <r, addition and subtraction is done is L norm. Consequently, 2 the mean is the R M S of the estimates over the 20 stocks and the range is calculated using a formula similar to the Pythagorean formula: x / M a x — M i n . 2 2 From the summary statistics given for the estimate of each parameters in Table 4.2 on page 35, first noticeable result is the similarities of the estimates for (3 and a among the 20 stocks. More specifically, while the maximum-to-minimum ratios for [3 and a are 1.28 and 1.39, respectively, the range-to-mean ratios are 0.67 and 0.84, and the SD-to-mean ratios aie 0.33 and 0.42. Here, we take ratios as measures of similarity (instead of solely considering the range and SD) because they 35 are dimensional and hence tend to be scale free (that is, independent of the scale : used). However, with no knowledge on the sampling variation of these ratios, it is difficult to assess how big is big or how small is small, in order to justify if an observed ratio is small enough to be explained by chance variation alone or is large enough to falsify the constancy of a parameter value among the stocks. To this end, some simulations would be required. In any case, while a ratio of 1.28 or 1.39 may be difficult to judge, a ratio of 0.84 or even as smaller as 0.33 is most likely small enough to establish constancy — or, more correctly, not large enough to reject constancy. The situation of 5 is somewhat surprising. Recall that the model allows the stocks to have different 6 value. The figures in Table 4.2 for 6 show that, while a maximum-to-minirnum ratio of 2.07 is somewhat large, a range-to-mean of ratio of 0.66 and a. SD-to-mean ratio of 0.21 are certainly small. These appear to suggest that the estimates of 5 are reasonably similar among the 20 stocks. This, in turn, leads one to suspect that the stocks may have the same value for the parameter 5. For the estimation of a, the maximum-to-minimum gives a high ratio of 3.20, but the SD-to-mean results a low ratio of 0.23! (It appears that we just mistakenly permute the three digits, they actually are two values at different ends of a scale of ratio.) Thus, it is difficult to conclude if it is reasonable to assume a is actually constant among the stocks. Note that the 20 estimates for a average to 0.2682, which differs from 0.26 as reported by Sass and Haussmann on p.573 of [14]. This difference actually is not due to rounding. Indeed, if one simply takes the mean of the 20 estimates in L\ norm, one actually would get 0.2612, which rounds to 0.26 in two decimal places. However, as argue above, one should work in L for measures of variation. Taking the R M S of 2 estimates for a would result 0.2682. A n accurate estimate of a is important partly because its role in computing the return: Rt = log(5{/5o) + \ o t from (C.2) on 2 page 61. This is particularly so for the later data with a larger value of t. In order to better see the potential inter-relationship among the four param- 36 Table 4,3: Stocks in Ascending Order of Estimates a S a Boeing Kodak Boeing 3M Walt Disney IBM IBM Johnson Caterpillar GM Walt Disney GE IBM GM Merck IBM Caterpillar Inter. Paper GM Merck United Tech. 3M Caterpillar Dupont Honeywell Exxon Mobil Inter. Paper Coca-Cola Inter.. Paper Proc & Gamble Alcoa United Tech. Kodak Dupont Kodak GM Merck McDonald's Honeywell Caterpillar Alcoa GE United Tech. Exxon Mobil McDonald's Walt Disney Dupont McDonald's Dupont Honeywell 3M Proc & Gamble Johnson Johnson Boeing Exxon Mobil Philip Morris Alcoa Coca-Cola Inter. Paper GE Philip Morris Philip Morris Walt Disney Coca-Cola United Tech. McDonald's Alcoa 3M Merck Johnson Philip Morris Proc'& Gamble Coca-Cola Proc & Gamble Kodak Exxon Mobil Boeing GE Honeywell eters, additional to examining the ranks of the estimates reported in Table 4.1 on page 34, let us put the stock names in ascending order of each parameter estimate. This leads to four lists, one for each parameter. We then form a table having these four lists put side by side as four columns. This leads Table 4.3 on page 37. Several noticeable patterns stand out in Table 4.1 on page 34 and Table 4.3 on page 37. Firstly, while Boeing has the smallest estimate for a, its estimate for S is largest among the 20 stocks. Secondly, I B M tends to have small estimates for all four parameters. Its estimates for S and [3 are second smallest, and its estimate for a and a are fourth smallest. Thirdly, the estimates for a and (3 appear to be positively correlated: they tend to increase/decrease together. To examine the third point graphically, let us construct a pairwise scatter plots, where estimate of each parameter is plotted versus estimates of all other parameters. For four parameters, 37 this leads to six pairs of scatter plots. A smooth curve is superimposed on each scatter plot to help visualizing any potential pattern between estimates of a pair of parameters. This pattern can be linear or non-linear. Here, a procedure due to Cleveland is used to produce such smooth curve. See [4] and [3] for details. This pairwise scatter plots is provided in Figure 4.1 on page 39. While a graphic summary lets each data point "speak", it should also be accompanied with some numerical summary to capture or summarize the strength and direction of the linear correlation. Thus, we compute the correlation coefficients for each pair of the four estimates: a a 8 1.0000 8 (3 a -0.0176 0.8394 -0.1726 1.0000 -0.0556 0.0990 1.0000 -0.0773 (3 a 1.0000 These numerical summaries are also plotted on the pairwise scatter plots, placed with the size of the figure proportional to the size of the correlation coefficient itself. Thus, a figure of large size signifies a strong linear correlation, regardless of a positive or negative one. It is clear both from the correlation coefficients and scatter plots that while estimates for a and (3 appear correlated positively, the other five pairs of estimates do not appear to be correlated, at least not linearly. Recall that a somehow controls the rate in which E(dR/dt) converges to its asymptotic value; the larger a is, the faster is the convergence. Thus, while estimates for a and (3 appear positively correlated, the estimates suggest that for a stock whose expected rate of change of return converges to its asymptotic value rapidly, it tends to have a volatility. Note that we only consider bivariate linear relation here with no attempt to capture any other relations, such as non-linear and multivariate. . Confidence interval of 95% confidence level is constructed for each parameter first for each stock and then for all the 20 stocks, treating them as a sample taken 38 Figure 4.1: Pairwise Scatter plots: Estimates for Parameters from 20 Stocks 39 from a hypothetical population consisting of all the stocks that could have been sampled. The results are tabulated in Table 4.4 on page 41 for a, Table 4.'6 on page 43 for 8, Table 4.7 on page 44 for /?' and Table 4:8 on page 45 for a. Since e~ is the retarding ratio that measures the rate in which E(dR/dt) a converges to its asymptotic value, we also compute 95% confidence interval for the retarding ratio of each stock; the results are tabulated in Table 4.5 on page 42. To ease the comparison of estimates among the stocks, in each tables of confidence interval, the.stocks are put in ascending order of each estimate. Recall from discussion on page 33 that, because of the overlapping periods used, the CIs constructed are shorter they should be. These 95% CIs for the parameters are also summarized graphically in Figure 4.2 on page 46. 4.3 Convergence of Iteration Finally, let us look at some summaries regarding the convergence of the iterative procedure, particularly the number of iterations required and the "accuracy" achieved. Similar to the convergence criteria introduced in one of the footnotes on page 28, the "accuracy" is measured by the maximum absolute difference between estimates in two last successive iterations when iteration terminates; the maximum is taken among the four parameters to be estimated iteratively. Averaging over all 20 x 26 or 560 "sample paths", they require 354 iterations to reach an average accuracy of 4.84E-4. Among all the iterations, Coca-Cola's "sample path" starting from year 1973 (or the third one) converges the faster, only requires 20 iteration to achieve an accuracy of 4.40E-4. This "sample path" can be characterized by the estimates: a = 2.23, 8 = 0.0042, (3 - 0.4351 and a = 0.2347. Let us compare these estimates with the mean values ta,ken over the 20 stocks provided in the "mean row" of Table 4.1 on page 34: a = 9.83, 8 = 0.1537,'/? = 0.3542 and a = 0.2682. Because the value 40 Table 4.4: 95% Confidence Interval of a Lower Stock Limit Estimate Boeing 2.75 4.31 Walt Disney 4.82 6.65 General Motors 5.77 7.60 IBM 7.62 5.55 Caterpillar 6.52 7.86 United Technologies 8.84 6.93 Honeywell 7.79 9.17 International Paper 7.26 9.31 Eastman Kodak 8.14 9.88 Merck 8.06 10.10 Alcoa 8.83 10.40 McDonald's 8.83 10.55 10.82 E.I. Dupont de-Nemours 9.03 Johnson & Johnson 9.47 11.00 Philip Morris 9.77 11.05 General Electric 9.60 11.21 Coca-Cola 10.05 11.80 3M 10.07 12.01 Procter & Gamble 12.72 11.78 Exxon Mobil 11.93 13.78 Mean 8.79 9.83 41 Upper Limit 5.88 8.48 9.42 9.70 9.20 10.75 10.56 11.35 11.62 12.14 11.96 12.26 12.61 12.52 12.32. 12.82 13.56 13.95 13.66 15.63 10.88 Table 4.5: 95% Confidence Interval of exp(—a) in 10-thousandth Lower Upper Stock Limit Estimate • Limit Boeing 641.90 133.90 . 27.93 Walt Disney 80.97 12.96 2.07 General Motors 31.08 5.01 0.81 IBM 38.97 4.88 0.61 Caterpillar 14.80 3.87 1.01 United Technologies 9.82 0.21 1.45 Honeywell 1.04 4.14 0.26 International Paper 0.12 7.03 0.91 Eastman Kodak 2.90 0.51 0.09 Merck 0.41 3.15 0.05 Alcoa 1.46 0.31 0.06 McDonald's 1.46 0.26 0.05 E.I. Dupont de Nemours 1.19 0.20 0.03 Johnson & Johnson 0.17 0.04 0.77 Philip Morris 0.04 0.57 0.16 General Electric 0.14 0.68 0.03 Coca-Cola 0.43 0.07 0.01 3M 0.42 0.06 0.01 Procter & Gamble 0.08 0.03 0.01 Exxon Mobil 0.07 0.01 0.00 Mean 0.54 1.53 0.19 For instance, the figure for Boeing of 133.90 indicates that the value of exp(—a) is estimated as 133.90/1E4 or 1.3390%. Thus, it is estimated that (Em - S) decreases with only 1.3390% of last year value remains or equivalently by 98.661% per year. Similarly, for Coca-Cola, its (Em — <5) is estimated to decrease by 100-0.0007 or 99.9993%. Thus, even though Em -> S very fast for both stocks, Coca-Cola converges at a much higher rate. 42 Table 4.6: 95% Confidence Interval of 8 Lower Stock Limit Estimate Eastman Kodak 0.0531 0.0957 IBM 0.0390 0.0974 Caterpillar 0.0714 0.1084 General Motors 0.1001 0.1222 International Paper 0.0975 0.1264 3M 0.1001 0.1302 Exxon Mobil 0.1087 0.1375 Procter & Gamble 0.0992 0.1409 0.1179 0.1497 E.I. Dupont de Nemours 0.1255 0.1620 McDonald's General Electric 0.1201 0.1623 Walt Disney 0.1113 0.1683 Honeywell 0.1375 0.1686 Johnson & Johnson 0.1262 0.1686 Alcoa 0.1445 0.1703 Philip Morris 0.1525 0.1859 United Technologies 0.1623 0.1932 Merck 0.1481 0.1936 Coca-Cola 0.1404 0.1943 Boeing 0.1449 0.1977 Mean 0.1384 0.1537 43 Upper Limit 0.1383 0.1558 0.1455 0.1443 0.1554 0.1603 0.1664 0.1826 0.1814 0.1986 0.2045 0.2254 0.1996 0.2111 0.1961 0.2194 0.2241 0.2392 0.2483 0.2505 0.1689 Table 4.7: 95% Confidence Interval of [3 Lower Upper Stock Limit Estimate Limit Boeing 0.2462 0.2982 0.3424 IBM 0.2845 0.3238 0.3588 Walt Disney 0.2878 0.3281 0.3640 Merck 0.3024 0:3364 0.3672 General Motors 0.3121 0.3462 0.3772 Caterpillar 0.3239 0.3524 0.3788 International Paper 0.3233. 0.3527 0.3799 Alcoa 0.3318 0.3536 0.3742 Eastman Kodak 0.3321 0.3556 0.3777 Honeywell 0.3373 0.3561 0.3739 United Technologies 0.3325 0.3566 0.3791 E.I. Dupont de Nemours 0.3372 0.3577 0.3771 3M 0.3394 0.3605 0.3804 Exxon Mobil 0.3427 0.3640 0.3841 Coca-Cola 0.3442 0.3644 0.3836 Philip Morris 0.3514 0.3671 0.3822 McDonald's 0.3494 0.3685 0.3866 Johnson & Johnson 0.3539 0.3701 0.3857 Procter & Gamble 0.3760 0.3807 0.3853 General Electric 0.3739 0.3812 0.3884 Mean 0.3451 0.3542 0.3631 44 Table 4.8: 95% Confidence Interval of a Lower Upper Stock Limit Estimate Limit 3M 0.2111 0.2332 0.2533 Johnson & Johnson 0.2252 0.2389 0.2518 General Electric 0.2179 0.2416 0.2633 IBM 0.2364 0.2466 0.2564 Merck 0.2424 0.2526 0.2625 E J . Dupont de Nemours 0.2435 0.2573 0.2703 Coca-Cola 0.2346 0.2597 0.2826 United Technologies 0.2478 0.2603 0.2722 General Motors 0.2393 0.2606 0.2803 Caterpillar 0.2313 0.2608 0.2873 Exxon Mobil 0.2154 0.2613 0.3004 McDonald's 0.2472 0.2638 0.2794 Procter & Gamble 0.2109 0.2682 0.3152 Boeing 0.2473 0.2697 0.2905 International Paper 0.2466 0.2728 0.2967 Walt Disney 0.2616 0.2836 0.3040 Alcoa 0.2705 0.2936 0.3150 Philip Morris 0.2718 0.2983 0.3225 Eastman Kodak 0.2712 0.2989 0.3242 0.2814 0.3234 0.3606 Honeywell Mean 0.2570 0.2682 0.2789 45 Figure 4.2: 95% Confidence Intervals for Parameters alpha delta o i 4 r—i .6 1—T 8 10 ' 12 1 { 14 16 o _J 3 i——r ' 0.05 0.10 Ci's Sigma beta 0.15 0.20 o r " 0.25 0:26 i i i—i—i—r .0.30 0.34 0.38 Cl's Cl's Lengend for Stocks Lengend for Stocks (cont'd) 10 =.IBM 9 = Honeywell 8 = General Motors 7 = General Electric 6 = Eastman Kodak 5 = Walt Disney 4 = E.I. Dupont de Nemours 3 = Caterpillar 2 = Boeing 1 = Alcoa Cl's 46 OJ O CM _ CO _^ <° _ -SI- _ CN T— _ 21 20 19 18 17 16 15 14 13 12 11 = Mean = Exxon Mobil = United Technologies Procter & Gamble - Merck = Philip Morris = 3M = McDonald's = Coca-Cola = Johnson & Johnson — Internation Paper of e~ at is larger than the value corresponding to the mean a, Ep t or E(dR/dt) of this "sample path" approaches to its limit slower than an average path. While its volatility of return is slightly smaller than the average (in a ratio of 0.2347/0.2682 or 87.5%), its volatility of mean rate of return is slightly higher than the average (in ratio of 0.4351/0.3542 or 1.23 times). Finally, its rate of return is only 0.0042/0.1537 or 2.7% of the average value. One can compare the estimates of this path with the average values of the estimates of the same stock (i.e., Coca-Cola) — a. — 11.80, 8 = 0.1943, (3 - 0.3644 and a = 0.2597 from Table 4.1 on page 34 — to see how does this path compared with the other 25 paths of the same stock. The "sample path" of General Motors starting in year 1978 requires the most iterations, namely, 1005 steps, to achieve an accuracy of 4.996E-4. Its estimates are a = 0.87, 8 = 0.1478, [3 = 0.0460 and a = 0.2242, as compared with the means of the stocks: a = 9.83, S = 0.1537, (3 = 0.3542 and a = 0.2682. Caterpillar's "sample path" starting in year 1977 achieves the best accuracy of 3.21E-4, as compared with the average accuracy of 4.84E-4. Indeed, it only takes 75 iterations to reach this accuracy, even faster than the average of 354 iterations. Its estimates are a = 0.0080, S = 0.2727, (3 = 0.0236 and a = 0.3177. Finally, the "sample path" of 3 M starting in year 1976 has the worst accuracy of 4.9999E-4, just marginally below the tolerance 5E-5 specified. It takes 610 steps, almost double the average of 354 iteration to achieve this accuracy. Its estimates are a = 3.11, 8 = 0.0543, (3 = 0.1778 and a = 0.2400. 47 Chapter 5 Future Works 5.1 Estimation Methods Better Estimation for 5: Currently, two simple estimators for S are used as introduced in Section 3.2, with both reply on the relation (3.6): E ^ f ) =K E = 8 + (E j -8)e- , at l <i (3.6) and use the forward difference to estimate dR/dt. While the first estimator uses least squares fit to regress estimate of E the asymptotic behaviour of E estimates of E (^^j onto e~ , at the second estimator relies on as t —> oo and thus takes the sample mean of from certain latter portion of the data. Future works would use two-dimensional Kalman filtering to improve such an estimate. Better Initial Values for Iteration: To start the iterative estimation procedure, initial values are required for the parameters to be estimated. Currently, the initial values for mo and d are chosen according to their meanings. However, that of a, though is an educational guess, is somewhat subjective. Good initial values may either improve the speed of convergence or even completely avoid divergence. (Since each iteration involves quite heavy computations based on more than 1000 data points, anything that helps reducing the number of iterations would be desirable.) 48 One may employ a less sophisticated method to estimate a parameter value and use it as the initial value of the parameter. This idea, is commonly used in statistical estimation of parameter: as suggested in [13], when the maximum likelihood estimation leads to iteration and hence requires initial estimates, one often takes the estimates from method of moments as the initial estimates. Because this choice of initial value is data-driven, it is less subjective. Often, such a data-driven initial value is closer to the actual value to be estimated and thus leads to better convergence. However, in order to have this idea works, the estimate from the less sophisticated method should be "stand-alone", in sense that its calculation should not involve any other unknown parameter values. Otherwise, such a method of getting an initial value would be itself iterative and hence requires an initial value as well! Improvement on Evaluation of Integrals: In each iteration of the estimation procedure, quite a number of integrals are evaluated based on more than 1000 discrete data points. It would be desirable to use method with high accuracy to evaluate those integrals. Currently, Simpson's rule is used for those integral evaluations. Trapezoid rule, which performs even worse than Simpson's rule, is used in two integrals. Of interest is to employ some computationally simple methods but with, accuracy better than 0 ( ? i ) , the accuracy of Simpson's rule, where n denotes - 4 the number of intervals used in the numerical integration. For instance, one may first derive an approximating function for the integrand to be integrated based on the discrete data. Then use the corresponding integral of the approximating function to approximate the desired integral wanted. Difficulty of such approach lies in choice of the class of approximating functions. One may propose splines, whose construction is computational simple if the subintervals are uniform width, which in the case for most of the integrals in our data. However, the splines, though are constructed piecewise, are fairly smooth by construction with the degree of smoothness depends on the conditions imposed. Because our data involve daily return of stock, smoothness may not be a desirable feature! However, there is no knowledge 49 to guide us to characterize such a "rough" feature. 5.2 Model Validation and Performance Once the unknown parameter values are estimated, a model for the stock return and hence closing price would be available. Examinations of the validation and performance of such a model are then called for. One possible method for such examination is described below. The estimating model is first used to forecast the future closing prices of each stock for, say, one year. 1 These forecasted closing prices are then compared with the actual observed prices to see if the discrepancy is within expectation or if the model is "statistical significance". Many measures of discrepancy are available; mean squared error (MSE) and mean absolute discrepancy ( M A D ) possibly are the two commonly used measures in literature. 2 If closing prices are not available for year 2002, the year right after the last year covered by the current data used for estimation, this comparison would not be possible. In the case, the cross-validation as described below would be necessary. • Split the current into two parts. • The first part of data covering the early years would be used to estimate the model parameters and to forecast the "future" closing prices for the year(s) covered in the second part. • The second part covering the later year(s), or maybe just the last year, would be used for validation. A few words are important about the difference between "statistically significance" and practical importance. When we check if the discrepancy is within expectation, we are considering if chance variation alone can account for the observed For forecasting, see temporal variation in Section 5.3 below. Basically, they are measures of discrepancy in and Li, respectively. 1 2 50 discrepancy or if the proposed model statistically account for the observed pattern in the data. In that case, we say the result is statistically significant. However, when some results are statistically significant, they may not be useful or practically important! It is so if the discrepancy, though can be accounted for by chance variations alone, leads to a big loss (of money in our case). In that case, possibly no one would use such result to forecast the future closing prices and then wait for a big loss. A excellent discussion about this difference is provided in [6]. 5.3 Statistical Analysis of Parameters Temporal Variation: One possible application of this work is to forecast the closing prices of stocks. To this end, more understanding on the parameters of the mechanism that derives the closing price would be called. Particular importance would be to examine the potential temporal variations of the parameter values over the years covered by the data. If such temporal variation presents, it should be incorporated into the forecasting. This analysis is allowed in our current data. Recall that, this work examines 20 stocks over 30 years. More specifically, parameters are estimated using five-year data of each stock. This leads to 26 estimates of each parameters over time for each stock. Time series analysis would help to examine such temporal variations. See [1] for an excellent introduction to the theory. Note that the 26 estimates are derived based on "overlapping years": years 1972-1976 for the first estimate, years 1973-1977 for the second estimates, etc. This may complicate such examination. Clustering Analysis: The parameter values may bear some systematic structure with the stocks. Clustering analysis on the parameter values may help to discover group structure amongst the stocks. 3 There are many reasons for searching such One may examine the 20 stocks considered here or. possibly more sensible, extend to a larger set of stocks in the trade market. 3 51 potential groups or clusters. For instance, close examination of the stocks in resulting clusters by experts may indicate that the group structure reflects risk associated with the stocks.. In that case, it would be reasonable to pick stocks from different clusters to form a basket or portfolio to invest. To this end, a set of selected parameter values of each stock would be treated as a set of multi-variate measurements, like the measurements of sepal length, sepal width, petal length and petal width on 150 specimens of iris plant in the famous Fisher-Anderson iris data. If the temporal variation of the parameters is significant, one should somewhat incorporate such variation into the clustering analysis. This would certainly make the analysis more interesting. Two general references for cluster analysis are [7] and [8]. 52 Bibliography [1] Brockwell, P . J . , and Davis, R . A . Introduction to Time Series Analysis and Forecasting, Second Edition. Springer, 2002. [2] Burden, R., and Faires, J . D . Numerical Analysis. Sixth Edition. Brooks/Cole, Pacific Groove, 1997, p. 169 and p.640. [3] Chambers, J . M . , Cleveland, W . S . , Kleiner, B . , and Tukey, P.A. Graphical Methods for Data Analysis. Wadsworth, Belmont, California, 1983. [4] Cleveland, W . S . Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829-836 (1979). [5] Elliott, R . J . , and Krishnamurthy, V . Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian system. S I A M Journal on Control and Optimization 35, 1908-23 (1997). [6] Freedman, D., Pisani, R., Purves, R., and Adhikari, A . Statistics. Second E d i tion. Norton, New York, 1991, p.500-1. [7] Gordon, A . D . Classification; Methods for the Exploratory Analysis of Multi- variate Data. Chapman and Hall, London, 1981. [8] Hartigan J . A . Clustering Algorithms. John Wiley and Sons, New York, 1975. 53 [9] Kleinbaum, D . G . , Kupper, L . L . , Muller, K . E . , and Nizam, A . Applied Regression Analysis and Other Multivariable Methods, T h i r d Editon. Duxbury Press, Pacific Grove, California, 1998. [10] Lakner, P. Optimal Trading Strategy for an Inverstor: the Case of Partial Information. Stochastic Process and their Applications 76, 77-97 (1998). [11] Liptser, R . N . , and Shiryayev, A . N . Statistics of Random Process II. Translated by A . B . Aries. Springer-Verlag, New York, 1978, Chapters 11 and 12. [12] 0ksendal, B . Stochastic Differential Equations: An Introduction -with Applications, Fifth Edition. Springer-Verlag, New York, 2000. [13] Rice, J . A. Mathematical Statistics and Data Analysis, Second Edition. Duxbury Press, Belmont, California, 1995, p.257. [14] Sass, J . , and Haussmann, U . G . Optimizing the terminal wealth under partial information: The drift process as a, continuous time Markov chain. Finance Stochast. 8, 553-577 (2004). [15] Seydel, R. Tools for Computational Finance. Springer-Verlag, New York, 2000. [16] Weisberg, S. Applied Linear Regression, Second Edition. John Wiley and Sons, New York, 1985. 54 Appendix A Solving IVP (2.23) for -y t It 2 dt It' + [aaf + 2aj + (era) 2 t It = \aa) + era -[V + 2 2 + 0' (A.l) p][V -p], t + 0 t where V = ^ + era and p = (era) + 0 . Note that 2 2 2 t dVt dt dt 1 dru a dt djt dt ' dV t dt Combining ( A . l ) and (A.2), we have dV dt -[Vt-p][V t + p]. t dVt [Vt-pWt _ dt + p] 1 / 1 1 dt 2p\V -p V +p a t t 55 (A.2) L (^ ~vtTp) -- ii dVt dt 2 P log V -p Vt + p. t V 0 (Vt-p)/(Vt + p) log (V - p)/(V + p) (Vt-P) (Vo-P) exp (Vt + p) (V + p) 0 IP. 0 -2pt = 0 V - p = e (V + p), where e = t t t t V (l-£ )=p(et t t 7t or Vo + p exp + l), t V Vh era = p aa I , 7t = °~ where (A.3) P 70 . h era, Vb (A.4) ^0 et .VQ + -2pt exp P, z, if z> 0; e, otherwise. Here, e denotes a small positive random number. . and (A.5) See Section 3.4 regarding the justification for the adjustment [.] on page 26. + Note that this solution is the same as provided in equations (4.11)-(4.14) on p.85 of [10]. Particularly, Lakner put the solution j(t) in equation (4.11) as 7 = jg Ci exp{2(VC/a)t} a <J + C 2 C7 exp{2(v C/cr)i}-C2 / 1 where C = aa C\ = fo + ^ + VCa C = 70 + o- -\/C 2 2 2 + 0 2 2 56 a Hence, VC = J(aa) + (3 = p, 2 by(A.3) 2 C2 70 + a - VC cr CI 70 + Cr + V Ccr (A.6) 2 2 / [(lo/cr) + g] - s/C [(70/cr) + a] + VC Vo-P by (A.4) and (A.6) Vo + p 2pt by (A.5). it exp (A.7) Thus. Ciexp[2( C/<7)t] + C = •• C exp[2(VC/a)t]~C / i(t) VOa m v aer 2 1 2 2 exp[2(VC/a)i] + ( C / C i ) 2 — aa exp[2(>/C/a)t] - ( C / C i ) exp(2pr./<j) + e exp(2pt/<r)era y by (A.6) and (A.5) exp(2pi/cr) - e exp(2pt/cr) 2 t t p. 1 - £t era as derived above. This is not a coincidence. Recall that while Lakner worked with model for (fit, Rt) in (1.1) and (1.2) directly, we consider a transformed process (fit, Rt) as fit = fit—8 in (2-3) and Rt = R —8t in (2.4). Since 5 is constant with no randomness, t the posterior variances of fi and fl should be the same. As Lakner worked with t t the former and we work with the latter, we should get the same posterior variance as Lakner obtained. 57 Appendix B Historical Data The data available consists of daily closing prices for 30 years, from January 21, 1972 to December 31, 2001 of 20 stocks listed on the New York Stock Exchange. Altogether, this consists of 7561 trading days for each stock. These prices have been somehow adjusted for inflation, stock splits, dividend paid and other factors. We choose these stocks partly because their adjusted prices are available for such a long history of 30 years. These stocks come from various areas and industries. They include mining, science and technology, machinery and equipment such as Alcoa, Boeing, Caterpillar, E.I. DuPont de Nemours, General Electric, General motors, Honeywell International, I B M , 3 M and United Technologies; services, food and beverage, consumer products and entertainment such as Coca-Cola, McDonald's, Philip Morris, Procter & Gamble and Walt Disney; health care and pharmaceutical such as Johnson & Johnson and Merck; energy such as Exxon Mobil; imaging products such as Eastman Kodak; and paper and forest products such as International Paper. A few words possibly worth mentioning about the way in which the data were originally organized when received. Originally they were in reverse chronological order, with December 2001 on the top running back to January 1972 on the bottom of each list. For the convenience we reverse order. In any case, one should be aware of this fact and access the data accordingly. 58 Appendix C Computational Details C.l Differential Equations (2.22) and (2.23) for m and t 7t in Theorem 2.2 Recall from Section 2.1 that a {t,R) = 0, ai{t,R) = -a, b\(t,R) = (3, b (t,£) 0 A (t,R) 0 = 0, Ai(t,R ) t = 1 and B(t,R) 2 = a. So, the differential equations (2.20) and (2.21) for m and 7t from Theorem 2.2 become: t dfn t = , R) + a t, R ] M l{ dt )mt + ^ + ) fi x [dfij - (4 (t, i?) + A i ( t , £ ) m ) dt] 0 = t —am dt -\—= (dR — mtdt) t and ^ = 2 (t,R) ai = -2a + bl(t,R) + b (t,R) 2 lt 2 + /3 -4, 2 7 t = 0; as equations (2.22) and (2.23) on page 11. 59 b (t,R)B(t,R)+j A (t,R) 2 t 1 B(t, R) |2 C.2 Backward Equations (2.37) and (2.38) for m{s,t) and 7(5, t) and differential Equation (2.39) for Lp in v s Theorem 2.3 Recall from Section 2.1 that a (t,R) = 0, ai(t,R) 0 -4n(t-, R) = 0, A-[(t,R ) = -a, b\(t,R) = f3, b {t.X) = 0; 2 = 1 and modification from Section 2.3 that B\ = 0 and t i?2 = cr. Thus, Z?o73 = B B[ + B B boB = biB[+ b B' = fix 0 + 0 x o-= 0, a(*,z) = ai(t,a;) - ( 6 o # ) ( t , x ) {B o B)~ (t, x) Ai(t,x) c(t,re) * ^ ( t , re) ( B o i ? ) - ^ ! ) ^ ^ ! ) = l x a ' 1 2 2 = cr , 2 2 2 l 1 = - a - 0= -a, x l = a 2 - 2 . Thus, (2.30), (2.31) and (2.32) become: m(M) = m +j\(s,u)(^(R))'A[(u,R)(BoB)- (u,R) l s x [d/lu - ( A ( u , R) + Ai (u, R) fhy) du\ 0 J 7(s, w)v?" 1 cr~ [dRu - (0 + 1 m ) du] 2 u m . + cr~ J 2 7(M) 7 ( s , u)(Ps[dRu - m du], u = {i + i, [ A [ ( u , f y ( B o B ) - \ u , R ^ = {l.+7,£^lcr- l^ri } ' l 2 w i(t,R)-T(t,s) dt l s c(t,R)]<p,(R) i-a-r(t,s)cr- }<p 2 s or - [ a + cr- r( ,s)](p«, 2 = u as equations (2.37), (2.38) and (2.39) on page 15: 60 " 7.s C.3 Definition and Computation of R t B y definition, we have dR = ^ , ' t (C.l) where Rt = R(t) and St = S(t). Below we apply the Ito formula to get a computational formula for Rf. R = \ o g ^ + ~ a h . (C.2) t Changing variable in ( C . l ) from t to r and integrating both sides from r = 0 to r = t gives (C.3) Since S T is stochastic, we need to apply the Ito formula to evaluate the stochastic integral on the R H S of (C.3). Let us first state the one-dimensional Ito formula, cf. Theorem 4.1.2 on p.44 of [12] Theorem C . l (One-dimensional Ito formula) Let X be an ltd process given t by dX = udt + vdB t Letg(t,x) t 6 C ([0,co)x7e) (i.e., g is twice continuously differentiable in [0, oo) xTZ). 2 Then Yt = g(t,X ) t is again an ltd process, and dY = ^(t,X )dt t where (dX ) 2 t t + ^(t,X )dX t t + ^(t,X ) t • [dX )\ t (C.4) = (dXt) • (dXt) is computed according to the rules dt-dt = dt- dB = dB -dt = 0, t t 61 dB • dB = dt. t t (C.5) Let us apply the l t d formula to evaluate the stochastic integral on the R H S of (C.3) by taking Y = g(t, St) = log (St) in order to obtain the integrand 4jk. Then t !<«.*>=°. f ^ St)= s 2ds 2{h ~~ l > h aild sf Thus, the Ito formula (C.4) implies diog(s ) t = %{t, s ) dt + ^(t, s ) ds + l ~ ( t , s ) • (ds' ) 6V ' ' 6V ' ' ' 2dx 2 t 1 1 t L t 1 t 2 t r I. • v«^t H (dS ) (C.6) 2 St 25 t 2 Now, for the last term on the R H S of (C.6), i.e., (dS ) , recall that dR = dS /S 2 t t t t or dSt = St dR and from (1.2) on page 1 in Chapter 1 that t dR = fj, dt + adWJ; . (1.2) 2) t t Combining gives dS = fitStdt + aS dW^ . ] t t Now, using the rules given by (C.5), we have (dS ) 2 t = (dSt)-(dS ) t = (jitSt dt + o-St dW^ ) = n S dt-dt + utaS? dt • dWp 2 t 2 2 2 -dW {2) t {2) t dW ) {2) t ] + a S dW = • [fi St dt + aS 2) t + an Sf dWJ; • dt 2) t . a S dt, 2 2 according to the rules given by (C.5). That is, (dSt) = o- S\dt 2 2 62 (C.7) Putting (C.7) into the R H S of (C.6) to eliminate (dS ) 2 t dlog(5 ) - t S 2Sf dS 1 , ~~S~t 2 yields t t 2 CT d + (C.8) L Transposing terms in (C.8), changing variables from t to r and then integrating both sides from r = 0 to r = t yields f ^ J =0 = O T f rt f dlo (S )+ g T J o T J T= = rt 2 T = log(5 )-log(5 ) + t 0 \a dr 0 2 ^ t (C.9) 2 Combining (C.3) and (C.9) gives /•*//<? -ir Rt-Ro= i = log(St) - iog(So) + -y t. Because RQ = 0, we have .Rt = log ,S J (C2) • 2 0 as stated in (C.2) on page 61 above. Using (C.2) for the difference of two R 's, apart by h in time, gives: t Rt+h — Rt C.4 St+h\ 1 l o g ( ^ ) + > ( t + /i) So J 2 S W / i \ , 1 2. log 2 log (CIO) Calculation of Yh in the Estimation of a 2 Recall from Section 3.1 page 18, writing Yj for YjAt, A 1 n _ 1 1 (3.3) n Yj — ~ Yl Th (Rkm+j ~ Rkm) , where j = 1,2,..., m, h = jAt and, for instance, Rkm.+j denotes (km + j) th observation of the return process; that is, Rkm+j = R{(km + j)At) 63 discrete = R(kmAt + h), because h = jAt. Note from (3.3) that Yj is the mean of squares of n differences divided by h. Expanding the summation in (3.3) gives y. 3 — _JL ~ nh \fn. D . \ / D D (Rj - Ro)' +i (Rrn+j - Rr y^2 + (R 2 n H 1- ^ i ? ( _ ! ) n m+J 2m+j R( -l)r, -- - R ) 2m 21 (C.ll) n Enumerating ( C . l l ) for j = 1, 2 , . . . , m gives F = l " ^ [(^1 " ^ O ) H y ^ 2 + ( i W l - (C.12) 2 2 - R) + (R 2 r o + 2 m 2m+2 - i?. ) 2 2m (C.13) h (i?( -l)m+2 - ^(n-l)m) n Rm - R(l) + (R2m ~ Rm) + {Rim ~ R2m) 2 H h (^Rnm ~ the discrete observations {Ro, R\,..., It (C.14) 1 To this end, first put -ftjv-i} into an m-by-n matrix columnwise as shown below, where n = I ^—-1 = l>. 2 R(n-l)m) It is easier to think of this computation in matrix form. A 2 m+ [(i? - #o) + ( / ? nh / ^2m) h ( - R ( „ _ i ) i - R(n-l)m) H Y™ + {Rm+l ~ Rmf 2 2 Ri Rm+l I? •R m+1 • ' • I) R2 Rm+2 R2m+2 • • • R(n- l)m+2 Rs Rm+3 R2m+3 • • • R(n- l)m+3 Rm R2m Rim Rmn = RN-I RN 2 ' ' R{n- ])m+l \ I 'my. n As a check, note that the element a ^ m n has index raxi! = J V - l , because n — as it should be. 1 2 In fact, matrix computation is preferable to using for loops in M A T L A B . Again, for simplify, we assume AT — 1 is a multiple of m. 64 Next, take the last row of matrix A, except the element in the last column (i.e., in Matlab notation, A(m, 1 : (n — 1))). This gives a (n — l)-vector. Insert i?n in front of this (n — l)-vector to form a n-vector. Let us denote this n-vector by For Yj in (C.12)-(C.14), subtract A from the j th m t m row of A element by element; then divide the mean of square of those differences by h — jAt would yield rn pairs of (h, Y/ ) for h = A t , 2 A i , . . . , A. to get Yj. These mAt. Finally, as (3.5) on page 18 (3.5) indicates, we should work with the ratio of prices, not the difference of returns. We can simply construct a matrix similar to one above using prices In lieu of returns. Then construct the n-vector A m from the last row and the other rows as before from the newly constructed matrix. Instead of taking difference, we then take ratio element by element, etc. 65
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Modeling the return rates of stock via EM algorithm
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Modeling the return rates of stock via EM algorithm Chou, Steve Che-Ming 2005
pdf
Page Metadata
Item Metadata
Title | Modeling the return rates of stock via EM algorithm |
Creator |
Chou, Steve Che-Ming |
Date Issued | 2005 |
Description | We consider a multi-stock market, where daily return process of each stock together with its mean rate of daily return are assumed to follow a continuous diffusion process, which is similar to a state-space system with linear Gaussian dynamics. Our major objective is to estimate the model parameters based on historical data. Our estimation method is an iterative method bases on the expectation maximization (or EM) algorithm. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2009-12-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0079632 |
URI | http://hdl.handle.net/2429/16495 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2005-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_2005-0402.pdf [ 2.44MB ]
- Metadata
- JSON: 831-1.0079632.json
- JSON-LD: 831-1.0079632-ld.json
- RDF/XML (Pretty): 831-1.0079632-rdf.xml
- RDF/JSON: 831-1.0079632-rdf.json
- Turtle: 831-1.0079632-turtle.txt
- N-Triples: 831-1.0079632-rdf-ntriples.txt
- Original Record: 831-1.0079632-source.json
- Full Text
- 831-1.0079632-fulltext.txt
- Citation
- 831-1.0079632.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0079632/manifest