{"http:\/\/dx.doi.org\/10.14288\/1.0097525":{"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool":[{"value":"Science, Faculty of","type":"literal","lang":"en"},{"value":"Statistics, Department of","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider":[{"value":"DSpace","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeCampus":[{"value":"UBCV","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/creator":[{"value":"Li, Bing","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/issued":[{"value":"2010-08-21T20:20:54Z","type":"literal","lang":"en"},{"value":"1989","type":"literal","lang":"en"}],"http:\/\/vivoweb.org\/ontology\/core#relatedDegree":[{"value":"Master of Science - MSc","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeGrantor":[{"value":"University of British Columbia","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/description":[{"value":"Three methods are used to analyse the relationships between asthma and pollution levels in Prince George. The first is the spectral analysis approach, which concentrates on the spectra of the pollution level series and the asthma counts series, their cross spectra, their coherency, and a measure of the linear relationship between them. The other two methods are what we term the \"generalized harmonic process\" and the \"generalized autoregressive approach\", which generalize the traditional harmonic process and autoregressive model and allow us to analyse time series under less restrictive distributional assumptions, such as discrete time series. 
The analyses suggest a weak positive association between asthma and the pollution levels of total reduced sulphates.","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO":[{"value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/27582?expand=metadata","type":"literal","lang":"en"}],"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note":[{"value":"THE RELATIONSHIP BETWEEN ASTHMA AND POLLUTION LEVELS IN PRINCE GEORGE, BRITISH COLUMBIA By Bing Li B.Sc, Beijing Institute of Technology, 1982 M.Sc, Beijing Institute of Technology, 1986 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE DEPARTMENT OF STATISTICS We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA September 1989 \u00a9Bing Li, 1989 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department The University of British Columbia Vancouver, Canada Date DE-6 (2\/88) Abstract Three methods are used to analyse the relationships between asthma and pollution levels in Prince George. The first is the spectral analysis approach, which concentrates on the spectra of the pollution level series and the asthma counts series, their cross spectra, their coherency, and a measure of the linear relationship between them. 
The other two methods are what we term the \"generalized harmonic process\" and the \"generalized autoregressive approach\", which generalize the traditional harmonic process and autoregressive model and allow us to analyse time series under less restrictive distributional assumptions, such as discrete time series. The analyses suggest a weak positive association between asthma and the pollution levels of total reduced sulphates. Contents Abstract Table of contents List of tables List of figures Acknowledgement 1 Introduction 1.1 Description of the Data Set 1.2 Overview 2 Methodology 2.1 The Spectral Analysis Approach 2.1.1 Prewhitening of the X and Y Series 2.1.2 Cross Spectrum and Coherency 2.1.3 A Measure of Linear Relationship between the Series X and Y 2.2 The Generalized Harmonic Process (GHP) Approach 2.2.1 The GHP Model 2.2.2 The Spectra of GHP and Their Null Distributions 2.2.3 Spectra Based on Likelihood Ratio Tests Against Periodicities 2.2.4 Application of the GHP Model 2.3 The Generalized Autoregressive Model (GAR) Approach 2.3.1 The GAR Model 2.3.2 Application of the GAR Model 3 Relationships of the Health Series and the TRS Series 3.1 The Spectral Analysis Approach 3.2 The Generalized Harmonic Process Approach 3.2.1 The Spectra of ER and AD 3.2.2 ER versus TRS 3.2.3 AD versus TRS 3.3 The Generalized Autoregressive Model Approach 3.3.1 ER versus TRS 3.3.2 AD versus TRS 4 Relationships of the Health Series and the TSP Series 4.1 The Spectral Analysis Approach 4.2 The Generalized Harmonic Process Approach 4.2.1 ER versus TSP 4.2.2 AD versus TSP 4.3 The Generalized Autoregressive Model Approach 4.3.1 ER versus TSP 4.3.2 AD versus TSP 5 Final Remarks Bibliography List of Tables Table 1. Sample Means and Variances Table 2. Week Pattern of ER and AD Table 3. P-values for the Tests for Cycles Table 4. 
Tests of Significance Table 5. Tests of Significance Table 6. $\\rho^2$ for ER and AD versus TSP Table 7. Test of Significance for X terms Table 8. Test of Significance for X terms Table 9. Test of Significance for X terms Table 10. Test of Significance for X terms Table 11. Test of Significance for X terms Table 12. Test of Significance for X terms Table 13. Test of Significance for X terms Table 14. Test of Significance for X terms Table 15. Test of Significance for X terms Table 16. Test of Significance for X terms Table 17. Test of Significance for X terms Table 18. Test of Significance for X terms Table 19. Test of Significance for X terms Table 20. Test of Significance for X terms Table 21. Test of Significance for X terms Table 22. Test of Significance for X terms Table 23. Test of Significance for X terms Table 24. Test of Significance for X terms Table 25. Test of Significance for X terms Table 26. Test of Significance for X terms Table 27. Test of Significance for X terms Table 28. Test of Significance for X terms Table 29. Test of Significance for X terms Table 30. Test of Significance for X terms List of Figures Figure 1. Histograms of ER and AD Figure 2. Comparison of Whitened vs Unwhitened Spectra Figure 3. Coherencies between Whitened Series Figure 4. Proposed Spectra of ER Figure 5. Proposed Spectra of AD Figure 6. Comparison between the Whitened and Unwhitened Spectra of ER Figure 7. Comparison between the Whitened and Unwhitened Spectra of AD Figure 8. Comparison between the Whitened and Unwhitened Spectra of TSP Figure 9. Coherencies between Whitened ER and Whitened TSP Figure 10. Coherencies between Whitened AD and Whitened TSP Acknowledgement I would like to thank Dr. J. 
Petkau, my supervisor, for his very helpful comments, suggestions and advice on this thesis as well as on many other aspects of my study at UBC, and for patiently reviewing the draft. I would also like to thank Dr. P. De Jong for reading the thesis and making valuable suggestions. I am very grateful to the Department of Statistics for providing me with financial support. 1 Introduction 1.1 Description of the Data Set One part of the data set to be analyzed consists of the counts of admissions for asthma to the single hospital in Prince George, which will be referred to as AD, and the counts of visits to the emergency room of that hospital, which will be referred to as ER. These counts were extracted from the hospital records for the period April 1, 1984 to March 31, 1986. The other part of the data consists of the levels of total suspended particulates (TSP) and the levels of total reduced sulphates (TRS) in Prince George for the same period. The TRS levels were measured at six monitoring stations on a daily basis, whereas the TSP levels were measured at five of these stations and are available only every sixth day. We estimated the missing values in TSP and TRS via the EM algorithm of Dempster, Laird and Rubin [2]. To reduce each pollutant to a single series describing the levels experienced in Prince George, we then took the average of TRS across the six stations and of TSP across the five stations. There are no missing data in the ER and AD series. There has long been public concern in Prince George that air pollution in the city may be influencing the health of residents, since the pollution levels often exceed the provincial air quality standards. The objective of the work reported in this thesis was to study the relationships between the two health series, AD and ER, and the two pollution series, TRS and TSP, to examine whether any associations are apparent between human health and the ambient levels of air pollution in Prince George. 
The hospital admission counts for asthma on a specific day can be viewed as the outcome of a large number of Bernoulli experiments (one for each of the residents of Prince George), each of which has a very small probability of success (being admitted to the hospital for asthma). So, as a starting point, it seems reasonable to treat the admission counts as the observed values of Poisson random variables. A similar argument can be made for the emergency room visit counts. We can see from Table 1 that the sample means and variances of ER and AD are quite close, which agrees with the Poisson assumption. Table 1: Sample Means and Variances. ER: sample mean 0.75, sample variance 0.86. AD: sample mean 0.55, sample variance 0.51. The histograms of ER and AD are plotted in Figure 1; these also suggest that a Poisson model may be reasonable. 1.2 Overview We shall use three different methods to study these relationships. Section 2 discusses the theoretical aspects of these methods. The results of their application to studying the above relationships are presented in Sections 3 and 4. Since TRS and TSP are collected differently, the relationships between the health series and the TRS series are examined in Section 3, while those between the health series and the TSP series are examined in Section 4. In Section 5 we summarize the results of these analyses and briefly discuss other possible methods of analysis. The methods used here differ from those employed in [3] and [4], and the results obtained complement those reported in these earlier studies. 2 Methodology The first method to be applied is the traditional spectral analysis approach, carried out as a preliminary and primarily descriptive analysis. It concentrates on the spectra of each pair of X and Y series, their cross spectrum, their coherency, and finally a measure of the linear relationship between them. 
The other two methods are combinations of ideas and techniques from the methodology of generalized linear models (GLIM; see [5]) and time series analysis. One method will be referred to as the generalized harmonic process (GHP), and combines GLIM with spectral analysis; the other method will be referred to as the generalized autoregressive model (GAR), and combines GLIM with autoregressive models. 2.1 The Spectral Analysis Approach In this section, we describe the details of a method of measuring the strength of the linear relationship between two time series $\\{X_t, t = 1, 2, \\ldots\\}$ and $\\{Y_t, t = 1, 2, \\ldots\\}$, which will be referred to as the X and Y series. 2.1.1 Prewhitening of the X and Y Series The relationship of interest between the X and Y series is that which excludes any association due to the deterministic patterns each series may have. Therefore it seems reasonable to \"whiten\", that is, to remove the dependence within each of the X and Y series before studying the relationship between the two series. This might be done, for example, by first fitting the X and Y series with AR models, and then considering the relationship between their residuals. A possible concern is that such a whitening process might remove more than just the association due to the deterministic patterns of each series, or even that it might remove all of the relationship between the X and Y series. The following theorem indicates that this is not the case. If a time series $\\{X_t\\}$ is stationary, then it can be represented as (see [6], page 246): $X_t = \\int_{-\\pi}^{\\pi} e^{it\\omega}\\,dZ_X(\\omega)$, where the stochastic process $Z_X(\\omega)$ will be referred to as the spectral representation of the series $\\{X_t\\}$. If we have two series $\\{X_t\\}$ and $\\{Y_t\\}$, then at each frequency $\\omega$, the correlation coefficient between the two random variables $dZ_X(\\omega)$ and $dZ_Y(\\omega)$, that is, $\\mathrm{Cor}(dZ_X(\\omega), dZ_Y(\\omega))$, is referred to as the \"complex coherency\" between series X and Y at frequency $\\omega$ and is denoted by $w_{XY}(\\omega)$. 
The absolute value of $w_{XY}(\\omega)$ is called the \"coherency\" between series X and Y at frequency $\\omega$. Theorem (see [6], page 661): Let $\\{X_t\\}$, $\\{Y_t\\}$, $\\{X'_t\\}$, $\\{Y'_t\\}$ be time series, stationary up to the second order, with $X'_t = \\sum_{u=-\\infty}^{+\\infty} a_u X_{t-u}$ and $Y'_t = \\sum_{u=-\\infty}^{+\\infty} b_u Y_{t-u}$. Let $Z_X(\\omega)$, $Z_Y(\\omega)$, $Z_{X'}(\\omega)$, $Z_{Y'}(\\omega)$ be their spectral representations. Then $|w_{XY}(\\omega)| = |w_{X'Y'}(\\omega)|$. The theorem can be interpreted as \"coherency is invariant under linear transformations\". Therefore, in the sense of coherency, prewhitening the two time series as described at the beginning of this section would not remove the relationship between them, provided that both of them are stationary and the prewhitening procedure is linear. Based on this consideration, we will use the following whitening process. We first estimate the spectra of the X and Y series (see [6], pages 398-399). The estimated covariance function, $\\hat{R}(s)$, is evaluated as $\\hat{R}(s) = \\frac{1}{N}\\sum_{t=1}^{N-|s|}(X_t - \\bar{X})(X_{t+|s|} - \\bar{X})$; the estimated autocorrelation function, $\\hat{r}(s)$, is then given by $\\hat{r}(s) = \\hat{R}(s)\/\\hat{R}(0)$. The spectra are then estimated by: $\\hat{h}(\\omega) = 2\\sum_{s=-(N-1)}^{N-1} M(s)\\hat{r}(s)\\cos(\\omega s)$, (1) where M(s) is the \"weight\" assigned to $\\hat{r}(s)$ for each s, which is often referred to as the \"lag window\" (see [6], page 434). In our analyses, we will use the Parzen lag window (see [6], page 443): $M(s) = \\begin{cases} 1 - 6(s\/M)^2 + 6(|s|\/M)^3, & |s| \\le M\/2, \\\\ 2(1 - |s|\/M)^3, & M\/2 \\le |s| \\le M, \\\\ 0, & |s| > M, \\end{cases}$ with some integer M. We fit each of the X and Y series with an AR model, using maximum likelihood to estimate the parameters and the Akaike information criterion to determine the order. The residuals from the fit form the \"whitened\" series. This whitening \"removes\" the autocorrelation within each series but, according to the above theorem, does not disturb the intercorrelation between the X and Y series if they are stationary. Comparing the estimated spectra for each pair of whitened and unwhitened series clearly illustrates the effects of the prewhitening. 
2.1.2 Cross Spectrum and Coherency Next, we wish to study the relationship between the whitened X and Y series. In the following discussion, unless specifically mentioned, the whitened X and Y series will be simply referred to as the X and Y series. First, we estimate the cross spectrum between the X series and the Y series (see [6], page 693): $\\hat{h}_{XY}(\\omega) = \\sum_{s=-(N-1)}^{N-1} M(s)\\hat{r}_{XY}(s)e^{-i\\omega s}$; here $\\hat{r}_{XY}(s)$ is the estimated cross-correlation function, $\\hat{r}_{XY}(s) = \\hat{R}_{XY}(s)\/\\sqrt{\\hat{R}_{XX}(0)\\hat{R}_{YY}(0)}$, where $\\hat{R}_{XY}(s)$ is the estimated cross covariance function, $\\hat{R}_{XY}(s) = \\frac{1}{N}\\sum_t (X_t - \\bar{X})(Y_{t+s} - \\bar{Y})$, with the summation extending from t = 1 to N - s for $s \\ge 0$ and from t = 1 - s to N for s < 0, and M(s) is the Parzen window centered at the maximum value of $\\hat{r}_{XY}$. The coherency between time series X and Y is then estimated by: $|\\hat{w}_{XY}(\\omega)| = |\\hat{h}_{XY}(\\omega)|\/\\sqrt{\\hat{h}_{XX}(\\omega)\\hat{h}_{YY}(\\omega)}$. (2) The introduction of the coherency between two time series allows us to have a measure of the strength of linear relationship between the two series, as described in the next subsection. 2.1.3 A Measure of Linear Relationship between the Series X and Y Although coherency gives the correlation between $dZ_X(\\omega)$ and $dZ_Y(\\omega)$, it corresponds to specific frequencies, and is not a direct measure of the linear relationship between the series X and Y. However, based on coherency and spectra, such a measure of linear relationship between two stationary time series is provided by the following theorem. Theorem (see [6], page 675): Suppose $X_t$ and $Y_t$ are stationary up to the second order, and $Y_t$ can be represented as a linear transformation of $X_t$ of the following form: $Y_t = \\sum_{u=-\\infty}^{+\\infty} a_u X_{t-u} + \\epsilon_t$, where the series $\\{\\epsilon_t\\}$ and $\\{X_t\\}$ are uncorrelated. Then we have: $\\sigma_Y^2 = \\int_{-\\pi}^{\\pi} h_{YY}(\\omega)|w_{XY}(\\omega)|^2\\,d\\omega + \\sigma_\\epsilon^2$, where $\\sigma_Y^2 = \\mathrm{var}(Y_t)$ and $\\sigma_\\epsilon^2 = \\mathrm{var}(\\epsilon_t)$. This result provides a decomposition of the total variance of $Y_t$ into the variance explained by the regression of $Y_t$ on $X_t$ and the variance of the residuals. So the ratio $\\rho^2 = 1 - \\sigma_\\epsilon^2\/\\sigma_Y^2$ represents the extent to which $Y_t$ and $X_t$ are linearly related in the above sense. 
More specifically, it represents the proportion of variation in the series Y explained by its \"best possible\" linear relationship (with infinitely many parameters) with the series X. It can be estimated by $\\hat{\\rho}^2 = 1 - \\hat{\\sigma}_\\epsilon^2\/\\hat{\\sigma}_Y^2$, where the estimates are obtained from the estimated spectra and coherency (see [6], page 675). In our analysis, $\\hat{\\rho}^2$ provides us with a rough idea of the strength of the linear relationship between two time series. 2.2 The Generalized Harmonic Process (GHP) Approach In this section we introduce an approach that can be applied to detect periodicities in data modelled with distributions from the exponential family. These cycles can then be incorporated into regression models relating two series. Because of its importance for application to the Prince George data, emphasis in the development which follows is given to the Poisson case. 2.2.1 The GHP Model Assume that we have independent observations $Y_1, \\ldots, Y_N$, and that $Y_t$ has probability density function proportional to $\\exp[Y_t g(\\theta_t) + a(\\theta_t) + b(Y_t)]$. (3) Then $\\{Y_t\\}$ is said to be generated by a GHP if $g(\\theta_t) = \\lambda + \\sum_{i=1}^{k}(A_i\\sin\\omega_i t + B_i\\cos\\omega_i t)$ (4) for some $\\lambda$, $\\{A_i\\}$, $\\{B_i\\}$, and a prespecified set of $\\{\\omega_i\\}$. Example 1: If $Y_t \\sim \\mathrm{Poisson}(\\lambda_t)$, then $g(\\lambda_t) = \\log\\lambda_t$, and $\\log\\lambda_t = \\lambda + \\sum_{i=1}^{k}(A_i\\sin\\omega_i t + B_i\\cos\\omega_i t)$. We call it the Poisson harmonic process. Example 2: If $Y_t \\sim \\mathrm{Binomial}(n, p_t)$ with common n, then $g(p_t) = \\log\\frac{p_t}{1-p_t}$, and $\\log\\frac{p_t}{1-p_t} = \\lambda + \\sum_{i=1}^{k}(A_i\\sin\\omega_i t + B_i\\cos\\omega_i t)$. We call it the binomial harmonic process. Example 3: If $Y_t \\sim N(\\mu_t, 1)$, then $g(\\mu_t) = \\mu_t$. 2.2.2 The Spectra of GHP and Their Null Distributions The fundamental frequencies are $\\{\\omega_i = 2\\pi i\/N,\\ i = 1, \\ldots, [N\/2]\\}$, where N is the number of observations. The set of frequencies thus defined has the orthogonality properties (see [6], page 392). Theorem: Let $\\{\\omega_1, \\ldots, \\omega_k\\}$ be a set of fundamental frequencies, and let $\\hat{\\lambda}$, the $\\hat{A}_i$'s and the $\\hat{B}_i$'s be the MLEs for $\\lambda$, the $A_i$'s and the $B_i$'s under the model (3). 
Then, under the null hypothesis $H_0: A_i = 0, B_i = 0, i = 1, \\ldots, k$, the $\\sqrt{N\\bar{Y}\/2}\\,\\hat{A}_i$'s and $\\sqrt{N\\bar{Y}\/2}\\,\\hat{B}_i$'s are asymptotically independently and identically distributed (iid) as N(0,1), and therefore the $(N\\bar{Y}\/2)(\\hat{A}_i^2 + \\hat{B}_i^2)$'s are asymptotically iid $\\chi^2_2$. The intuition behind this spectrum is similar to that of the traditional one, that is, the \"power\" assigned to each frequency. Proof: It is easy to verify that under model (3), the Fisher information matrix for the MLEs is $I(\\theta) = W'MW$, where W is the $N \\times (2k+1)$ matrix whose t-th row is $(1, \\sin\\omega_1 t, \\cos\\omega_1 t, \\ldots, \\sin\\omega_k t, \\cos\\omega_k t)$, $t = 1, \\ldots, N$, and $M = \\mathrm{diag}(\\lambda_1, \\ldots, \\lambda_N)$. Under $H_0$, $M = \\mathrm{diag}(\\lambda, \\ldots, \\lambda)$, and because of the orthogonality properties of the fundamental frequencies, the columns of W are orthogonal, with $\\sum_{t=1}^{N} W_{t1}^2 = N$ and $\\sum_{t=1}^{N} W_{tj}^2 = N\/2$, j > 1, so that $I(\\theta_0) = W'MW = \\frac{N\\lambda}{2}\\begin{pmatrix} 2 & 0 \\\\ 0 & I_{2k \\times 2k} \\end{pmatrix}$. Setting $\\theta = (A_1, B_1, \\ldots, A_k, B_k)'$, under $H_0$ we have $\\sqrt{N\\lambda\/2}\\,\\hat{\\theta} \\to N(0, I_{2k \\times 2k})$ in distribution. The theorem follows because under $H_0$, $\\bar{Y}$ is a consistent estimator of $\\lambda$. 2.2.3 Spectra Based on Likelihood Ratio Tests Against Periodicities In this subsection we introduce two other methods for detecting periodicities, each based on a series of hypothesis tests associated with a prespecified set of frequencies. The null hypotheses are that the series has a mean not varying with time; the alternative hypotheses are that the series has a cycle of a specified frequency. A likelihood ratio test can then be constructed with respect to each of the frequencies, and these likelihood ratio tests lead directly to spectra. Proposal 1: Consider the following series of hypothesis tests: $H_0: \\lambda_1 = \\cdots = \\lambda_n = \\lambda$ versus $H_1^{(p)}: \\lambda_i = \\lambda_{i+p},\\ i = 1, \\ldots, n-p$. For any value of p, the likelihood ratio is: Rp = 2logL(H
, $R(\\omega) = 2\\log L(H_1^{(\\omega)}) - 2\\log L(H_0^{(\\omega)}) \\sim \\chi^2_2$, where $\\log L(H_0^{(\\omega)})$ is the same as before, and $\\log L(H_1^{(\\omega)}) = \\sum_{t=1}^{n} Y_t(\\lambda + A\\sin\\omega t + B\\cos\\omega t) - \\sum_{t=1}^{n}\\exp(\\lambda + A\\sin\\omega t + B\\cos\\omega t)$. Then we can plot $R(\\omega)$ against $\\omega$. 2.2.4 Application of the GHP Model The spectrum methods introduced above provide an idea of the \"importance\" of each frequency. The \"important\" frequencies can then be included in regression models for Y against X. The applications which follow in Sections 3 and 4 involve the Poisson case, and the following procedure will be employed. First, based on the GHP spectrum introduced in Section 2.2.2, we select the set of frequencies corresponding to small P-values from the associated $\\chi^2_2$ statistics, and combine the terms corresponding to these frequencies with the X terms in the following model: $\\log\\lambda_t = \\lambda + \\sum_{i}(A_i\\sin\\omega_i t + B_i\\cos\\omega_i t) + \\sum_{i=0}^{m}\\beta_i X_{t-i}$; (5) here the first sum runs over the selected frequencies and m is a relatively large integer believed sufficient to cover the past with which the present could be associated. Model reduction is then based on retaining in the above model only those terms with coefficients which differ significantly from zero (P-value < $\\alpha$). This process is repeated in stages, leading to a final model of the same form as (5), in which $\\{A_i\\}$, $\\{B_i\\}$, and $\\{\\beta_i\\}$ are the coefficients surviving the model reduction. At each stage of model reduction, the changes in deviance are also examined to make sure the reduction is legitimate. Tests for the \"importance\" of the X terms, and specification of the relationship between $\\{X_t\\}$ and $\\{Y_t\\}$, if any, are then based on this final model. 2.3 The Generalized Autoregressive Model (GAR) Approach Traditional AR models are most suitable for time series with continuous marginal distributions. The GAR model is developed to describe time series whose marginal distributions belong to the exponential family. This allows us to model time series with discrete marginal distributions. 2.3.1 The GAR Model A sequence of N observations $Y_1$, ... 
, $Y_N$ are said to follow a GAR model of order p if they have the following conditional distributions: $f(Y_t \\mid Y_t^p) = \\exp\\{Y_t c[\\theta(Y_t^p)] + d[\\theta(Y_t^p)] + S(Y_t)\\}$, where $c[\\theta(Y_t^p)] = \\beta_0 + \\beta' Y_t^p$, $\\beta = (\\beta_1, \\beta_2, \\ldots, \\beta_p)'$. Example 1: When the conditional distributions of the Y's are Poisson, we have $\\theta(Y_t^p) = E(Y_t \\mid Y_t^p)$, and $c[\\theta(Y_t^p)] = \\log[\\theta(Y_t^p)] = \\beta_0 + \\beta' Y_t^p$. We call it a log-AR model. Example 2: When the conditional distributions of the Y's are binomial with common known total count n, we have $n\\theta(Y_t^p) = E(Y_t \\mid Y_t^p)$, and c\\6