THE RELATIONSHIP BETWEEN ASTHMA AND POLLUTION LEVELS IN PRINCE GEORGE, BRITISH COLUMBIA

By
Bing Li
B.Sc., Beijing Institute of Technology, 1982
M.Sc., Beijing Institute of Technology, 1986

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
THE DEPARTMENT OF STATISTICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1989
© Bing Li, 1989

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
Vancouver, Canada

Abstract

Three methods are used to analyse the relationships between asthma and pollution levels in Prince George. The first is the spectral analysis approach, which concentrates on the spectra of the pollution level series and the asthma count series, their cross spectra, their coherency, and a measure of the linear relationship between them. The other two methods are what we term the "generalized harmonic process" and the "generalized autoregressive approach". These generalize the traditional harmonic process and autoregressive model and allow us to analyse time series under less restrictive distributional assumptions, such as discrete-valued time series. The analyses suggest a weak positive association between asthma and the pollution levels of total reduced sulphates.
Contents

Abstract
Table of contents
List of tables
List of figures
Acknowledgement
1 Introduction
    1.1 Description of the Data Set
    1.2 Overview
2 Methodology
    2.1 The Spectral Analysis Approach
        2.1.1 Prewhitening of the X and Y Series
        2.1.2 Cross Spectrum and Coherency
        2.1.3 A Measure of Linear Relationship between the Series X and Y
    2.2 The Generalized Harmonic Process (GHP) Approach
        2.2.1 The GHP Model
        2.2.2 The Spectra of GHP and Their Null Distributions
        2.2.3 Spectra Based on Likelihood Ratio Tests Against Periodicities
        2.2.4 Application of the GHP Model
    2.3 The Generalized Autoregressive Model (GAR) Approach
        2.3.1 The GAR Model
        2.3.2 Application of the GAR Model
3 Relationships of the Health Series and the TRS Series
    3.1 The Spectral Analysis Approach
    3.2 The Generalized Harmonic Process Approach
        3.2.1 The Spectra of ER and AD
        3.2.2 ER versus TRS
        3.2.3 AD versus TRS
    3.3 The Generalized Autoregressive Model Approach
        3.3.1 ER versus TRS
        3.3.2 AD versus TRS
4 Relationships of the Health Series and the TSP Series
    4.1 The Spectral Analysis Approach
    4.2 The Generalized Harmonic Process Approach
        4.2.1 ER versus TSP
        4.2.2 AD versus TSP
    4.3 The Generalized Autoregressive Model Approach
        4.3.1 ER versus TSP
        4.3.2 AD versus TSP
5 Final Remarks
Bibliography

List of Tables

Table 1. Sample Means and Variances
Table 2. Week Pattern of ER and AD
Table 3. P-values for the Tests for Cycles
Table 4. Tests of Significance
Table 5. Tests of Significance
Table 6. $\hat\rho^2$ for ER and AD versus TSP
Table 7. Test of Significance for X terms
Table 8. Test of Significance for X terms
Table 9. Test of Significance for X terms
Table 10. Test of Significance for X terms
Table 11. Test of Significance for X terms
Table 12. Test of Significance for X terms
Table 13. Test of Significance for X terms
Table 14. Test of Significance for X terms
Table 15. Test of Significance for X terms
Table 16. Test of Significance for X terms
Table 17. Test of Significance for X terms
Table 18. Test of Significance for X terms
Table 19. Test of Significance for X terms
Table 20. Test of Significance for X terms
Table 21. Test of Significance for X terms
Table 22. Test of Significance for X terms
Table 23. Test of Significance for X terms
Table 24. Test of Significance for X terms
Table 25. Test of Significance for X terms
Table 26. Test of Significance for X terms
Table 27. Test of Significance for X terms
Table 28. Test of Significance for X terms
Table 29. Test of Significance for X terms
Table 30. Test of Significance for X terms

List of Figures

Figure 1. Histograms of ER and AD
Figure 2. Comparison of Whitened vs Unwhitened Spectra
Figure 3. Coherencies between Whitened Series
Figure 4. Proposed Spectra of ER
Figure 5. Proposed Spectra of AD
Figure 6. Comparison between the Whitened and Unwhitened Spectra of ER
Figure 7. Comparison between the Whitened and Unwhitened Spectra of AD
Figure 8. Comparison between the Whitened and Unwhitened Spectra of TSP
Figure 9. Coherencies between Whitened ER and Whitened TSP
Figure 10. Coherencies between Whitened AD and Whitened TSP

Acknowledgement

I would like to thank Dr. J. Petkau, my supervisor, for his very helpful comments, suggestions and advice on this thesis as well as on many other aspects of my study at UBC, and for patiently reviewing the draft. I would also like to thank Dr. P. De Jong for reading the thesis and making valuable suggestions. I am very grateful to the Department of Statistics for providing me with financial support.
1 Introduction

1.1 Description of the Data Set

One part of the data set to be analyzed consists of the counts of admissions for asthma to the single hospital in Prince George, which will be referred to as AD, and the counts of visits to the emergency room of that hospital, which will be referred to as ER. These counts were extracted from the hospital records for the period April 1, 1984 to March 31, 1986. The other part of the data consists of the levels of total suspended particulates (TSP) and the levels of total reduced sulphates (TRS) in Prince George for the same period. The TRS levels were measured at six monitoring stations on a daily basis, whereas the TSP levels were measured at five of these stations and are available only every sixth day. We estimated the missing values in TSP and TRS via the EM algorithm of Dempster, Laird and Rubin [2]. To reduce each pollutant to a single series describing the levels experienced in Prince George, we then took the average of TRS across the six stations and of TSP across the five stations. There are no missing data in the ER and AD series.

There has long been public concern in Prince George that the air pollution in the city may be influencing the health of the residents, since the pollution levels often exceed the provincial air quality standards. The objective of the work reported in this thesis was to study the relationships between the two health series, AD and ER, and the two pollution series, TRS and TSP, in order to examine whether any associations are apparent between human health and the ambient levels of air pollution in Prince George.

The hospital admission count for asthma on a specific day can be viewed as the outcome of a large number of Bernoulli experiments (one for each of the residents of Prince George), each of which has a very small probability of success (being admitted to the hospital for asthma).
So, as a starting point, it seems reasonable to treat the admission counts as the observed values of Poisson random variables. A similar argument can be made for the emergency room visit counts. We can see from Table 1 that the sample mean and sample variance are quite close for each of ER and AD, which agrees with the Poisson assumption.

Table 1: Sample Means and Variances

        Sample Mean    Sample Variance
ER      0.75           0.86
AD      0.55           0.51

The histograms of ER and AD are plotted in Figure 1; these also suggest that a Poisson model may be reasonable.

1.2 Overview

We shall use three different methods to study these relationships. Section 2 discusses the theoretical aspects of these methods. The results of their application to studying the above relationships are presented in Sections 3 and 4. Since TRS and TSP are collected differently, the relationships between the health series and the TRS series are examined in Section 3, while those between the health series and the TSP series are examined in Section 4. In Section 5 we summarize the results of these analyses and briefly discuss other possible methods of analysis. The methods used here differ from those employed in [3] and [4], and the results obtained complement those reported in these earlier studies.

2 Methodology

The first method to be applied is the traditional spectral analysis approach, carried out as a preliminary and primarily descriptive analysis. It concentrates on the spectra of each pair of X and Y series, their cross spectrum, their coherency, and finally a measure of the linear relationship between them. The other two methods are combinations of ideas and techniques from the methodology of generalized linear models (GLIM; see [5]) and time series analysis.
One method, referred to as the generalized harmonic process (GHP), combines GLIM with spectral analysis; the other, referred to as the generalized autoregressive model (GAR), combines GLIM with autoregressive models.

2.1 The Spectral Analysis Approach

In this section, we describe the details of a method of measuring the strength of the linear relationship between two time series $\{X_t, t = 1, 2, \ldots\}$ and $\{Y_t, t = 1, 2, \ldots\}$, which will be referred to as the X and Y series.

2.1.1 Prewhitening of the X and Y Series

The relationship of interest between the X and Y series is that which excludes any association due to the deterministic patterns each series may have. Therefore it seems reasonable to "whiten", that is, to remove the dependence within each of the X and Y series before studying the relationship between the two series. This might be done, for example, by first fitting AR models to the X and Y series, and then considering the relationship between their residuals. A possible concern is that such a whitening process might remove more than just the association due to the deterministic patterns of each series, or even that it might remove all of the relationship between the X and Y series. The following theorem indicates that this is not the case.

If a time series $\{X_t\}$ is stationary, then it can be represented as (see [6], page 246):

$$X_t = \int_{-\pi}^{\pi} e^{it\omega}\, dZ_X(\omega),$$

where the stochastic process $Z_X(\omega)$ will be referred to as the spectral representation of the series $\{X_t\}$. If we have two series $\{X_t\}$ and $\{Y_t\}$, then at each frequency $\omega$, the correlation coefficient between the two random variables $dZ_X(\omega)$ and $dZ_Y(\omega)$, that is, $\mathrm{Cor}(dZ_X(\omega), dZ_Y(\omega))$, is referred to as the "complex coherency" between series X and Y at frequency $\omega$ and is denoted by $w_{XY}(\omega)$. The absolute value of $w_{XY}(\omega)$ is called the "coherency" between series X and Y at frequency $\omega$.
Theorem (see [6], page 661): Let $\{X_t\}$, $\{Y_t\}$, $\{X'_t\}$, $\{Y'_t\}$ be time series, stationary up to the second order, with

$$X'_t = \sum_{u=-\infty}^{+\infty} a_u X_{t-u} \quad\text{and}\quad Y'_t = \sum_{u=-\infty}^{+\infty} b_u Y_{t-u}.$$

Let $Z_X(\omega)$, $Z_Y(\omega)$, $Z_{X'}(\omega)$, $Z_{Y'}(\omega)$ be their spectral representations. Then

$$|w_{XY}(\omega)| = |w_{X'Y'}(\omega)|.$$

The theorem can be interpreted as "coherency is invariant under linear transformations". Therefore, in the sense of coherency, prewhitening the two time series as described at the beginning of this section would not remove the relationship between them, provided that both of them are stationary and the prewhitening procedure is linear.

Based on this consideration, we will use the following whitening process. We first estimate the spectra of the X and Y series (see [6], pages 398-399). The estimated covariance function, $\hat R(s)$, is evaluated as

$$\hat R(s) = \frac{1}{N}\sum_{t=1}^{N-|s|}(X_t - \bar X)(X_{t+|s|} - \bar X);$$

the estimated autocorrelation function, $\hat r(s)$, is then given by $\hat r(s) = \hat R(s)/\hat R(0)$. The spectra are then estimated by:

$$\hat h(\omega) = 2\sum_{s=-(N-1)}^{N-1} M(s)\,\hat r(s)\cos(\omega s), \qquad (1)$$

where $M(s)$ is the "weight" assigned to $\hat r(s)$ for each $s$, which is often referred to as the "lag window" (see [6], page 434). In our analyses, we will use the Parzen lag window (see [6], page 443):

$$M(s) = \begin{cases} 1 - 6(s/M)^2 + 6(|s|/M)^3, & |s| \le M/2, \\ 2(1 - |s|/M)^3, & M/2 \le |s| \le M, \\ 0, & |s| > M, \end{cases}$$

with some integer $M$.

We fit each of the X and Y series with an AR model, using maximum likelihood to estimate the parameters and Akaike's information criterion to determine the order. The residual series from the fit is the "whitened" series. This whitening "removes" the autocorrelation within each series but, according to the above theorem, does not disturb the intercorrelation between the X and Y series if they are stationary. Comparing the estimated spectra for each pair of whitened and unwhitened series clearly illustrates the effects of the prewhitening.

2.1.2 Cross Spectrum and Coherency

Next, we wish to study the relationship between the whitened X and Y series.
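As an illustration of the lag-window estimate (1) of Section 2.1.1, the following is a minimal Python sketch, assuming NumPy is available. The function names `parzen_window`, `autocorrelation` and `lag_window_spectrum` are hypothetical and introduced only for this example; the scaling constant follows formula (1) as printed, and the code is not the software used for the analyses reported here.

```python
import numpy as np

def parzen_window(s, M):
    """Parzen lag window M(s) with truncation point M (an integer)."""
    u = np.abs(np.atleast_1d(np.asarray(s, dtype=float))) / M
    w = np.zeros_like(u)
    inner = u <= 0.5
    outer = (u > 0.5) & (u <= 1.0)
    w[inner] = 1.0 - 6.0 * u[inner] ** 2 + 6.0 * u[inner] ** 3
    w[outer] = 2.0 * (1.0 - u[outer]) ** 3
    return w  # zero wherever |s| > M

def autocorrelation(x, max_lag):
    """r(s) = R(s)/R(0), R(s) = (1/N) sum_t (x_t - xbar)(x_{t+s} - xbar)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    R = np.array([np.dot(d[: n - s], d[s:]) / n for s in range(max_lag + 1)])
    return R / R[0]

def lag_window_spectrum(x, omegas, M):
    """Formula (1): h(w) = 2 * sum over |s| < N of M(s) r(s) cos(w s)."""
    max_lag = min(M, len(x) - 1)        # the window vanishes beyond lag M
    r = autocorrelation(x, max_lag)
    s = np.arange(max_lag + 1)
    w = parzen_window(s, M)
    omegas = np.atleast_1d(np.asarray(omegas, dtype=float))
    # r(-s) = r(s), so the two-sided sum is the lag-0 term plus twice
    # the sum over positive lags.
    pos = (w[1:] * r[1:] * np.cos(np.outer(omegas, s[1:]))).sum(axis=1)
    return 2.0 * (w[0] * r[0] + 2.0 * pos)
```

A call such as `lag_window_spectrum(series, np.linspace(0, np.pi, 200), M=60)` evaluates the estimate on a grid of frequencies. A larger truncation point `M` gives a less smooth estimate with more resolution.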
In the following discussion, unless specifically mentioned, the whitened X and Y series will be referred to simply as the X and Y series. First, we estimate the cross spectrum between the X series and the Y series (see [6], page 693):

$$\hat h_{XY}(\omega) = \sum_{s=-(N-1)}^{N-1} M(s)\,\hat r_{XY}(s)\,e^{-i\omega s};$$

here $\hat r_{XY}(s)$ is the estimated cross-correlation function,

$$\hat r_{XY}(s) = \frac{\hat R_{XY}(s)}{\sqrt{\hat R_{XX}(0)\,\hat R_{YY}(0)}},$$

where $\hat R_{XY}(s)$ is the estimated cross covariance function,

$$\hat R_{XY}(s) = \frac{1}{N}\sum_t (X_t - \bar X)(Y_{t+s} - \bar Y),$$

with the summation extending from $t = 1$ to $N - s$ for $s \ge 0$ and from $t = 1 - s$ to $N$ for $s < 0$, and $M(s)$ is the Parzen window centered at the lag at which $\hat r_{XY}$ attains its maximum. The coherency between time series X and Y is then estimated by:

$$|\hat w_{XY}(\omega)| = \frac{|\hat h_{XY}(\omega)|}{\sqrt{\hat h_{XX}(\omega)\,\hat h_{YY}(\omega)}}. \qquad (2)$$

The introduction of the coherency between two time series allows us to have a measure of the strength of the linear relationship between the two series, as described in the next subsection.

2.1.3 A Measure of Linear Relationship between the Series X and Y

Although coherency gives the correlation between $dZ_X(\omega)$ and $dZ_Y(\omega)$, it corresponds to specific frequencies, and is not a direct measure of the linear relationship between the series X and Y. However, based on coherency and spectra, such a measure of linear relationship between two stationary time series is provided by the following theorem.

Theorem (see [6], page 675): Suppose $X_t$ and $Y_t$ are stationary up to the second order, and $Y_t$ can be represented as a linear transformation of $X_t$ of the following form:

$$Y_t = \sum_{u=-\infty}^{+\infty} a_u X_{t-u} + \epsilon_t,$$

where $\epsilon_t$ and $X_t$ are uncorrelated. Then we have:

$$\sigma_Y^2 = \int_{-\pi}^{\pi} h_{YY}(\omega)\,|w_{XY}(\omega)|^2\,d\omega + \sigma_\epsilon^2,$$

where $\sigma_Y^2 = \mathrm{var}(Y_t)$ and $\sigma_\epsilon^2 = \mathrm{var}(\epsilon_t)$.

This result provides a decomposition of the total variance of $Y_t$ into the variance explained by the regression of $Y_t$ on $X_t$ and the variance of the residuals. So the ratio $\rho^2 = 1 - \sigma_\epsilon^2/\sigma_Y^2$ represents the extent to which $Y_t$ and $X_t$ are linearly related in the above sense.
More specifically, it represents the proportion of variation in the series Y explained by its "best possible" linear relationship (with infinitely many parameters) with the series X. It can be estimated by $\hat\rho^2 = 1 - \hat\sigma_\epsilon^2/\hat\sigma_Y^2$, where (see [6], page 675):

$$\hat\sigma_\epsilon^2 = \int_{-\pi}^{\pi} \hat h_{YY}(\omega)\,\{1 - |\hat w_{XY}(\omega)|^2\}\,d\omega.$$

In our analysis, $\hat\rho^2$ provides us with a rough idea of the strength of the linear relationship between two time series.

2.2 The Generalized Harmonic Process (GHP) Approach

In this section we introduce an approach that can be applied to detect periodicities in data modelled with the exponential family class of distributions. These cycles can then be incorporated into regression models relating two series. Due to its importance for application to the Prince George data, emphasis in the development which follows is given to the Poisson case.

2.2.1 The GHP Model

Assume that we have independent observations $Y_1, \ldots, Y_N$, and that $Y_t$ has probability density function proportional to

$$\exp[\,Y_t\, g(\theta_t) + a(\theta_t) + b(Y_t)\,]. \qquad (3)$$

Then $\{Y_t\}$ is said to be generated by a GHP if

$$g(\theta_t) = \lambda + \sum_{i=1}^{k}(A_i \sin\omega_i t + B_i \cos\omega_i t)$$

for some $\lambda$, $\{A_i\}$, $\{B_i\}$, and a prespecified set of $\{\omega_i\}$.

Example 1: If $Y_t \sim \mathrm{Poisson}(\lambda_t)$, then $g(\lambda_t) = \log\lambda_t$, and

$$\log\lambda_t = \lambda + \sum_{i=1}^{k}(A_i \sin\omega_i t + B_i \cos\omega_i t).$$

We call it the Poisson harmonic process.

Example 2: If $Y_t \sim \mathrm{Binomial}(n, p_t)$ with common $n$, then $g(p_t) = \log\left(\frac{p_t}{1 - p_t}\right)$, and

$$\log\frac{p_t}{1 - p_t} = \lambda + \sum_{i=1}^{k}(A_i \sin\omega_i t + B_i \cos\omega_i t).$$

We call it the binomial harmonic process.

Example 3: If $Y_t \sim N(\mu_t, 1)$, then $g(\mu_t) = \mu_t$, and

$$\mu_t = \lambda + \sum_{i=1}^{k}(A_i \sin\omega_i t + B_i \cos\omega_i t). \qquad (4)$$

This is the traditional harmonic process.

2.2.2 The Spectra of GHP and Their Null Distributions

Once we have the model (3) and relation (4), we can use GLIM to get the MLE's $\hat\lambda$, $\hat A_i$'s and $\hat B_i$'s for $\lambda$, the $A_i$'s and the $B_i$'s. Following the traditional approach, we can plot the scaled $\hat A_i^2 + \hat B_i^2$ as the "power" associated with the frequency $\omega_i$ (see [6], page 395). The following theorem gives their limiting null distribution in the Poisson case.
The result is very similar to that for traditional spectra, except that the current results are asymptotic. Entirely similar arguments can be applied to other distributions in the exponential family (3), such as the binomial, to obtain limiting null distributions for their spectra.

A set of "fundamental frequencies" is given by $\{\omega_i : \omega_i = 2\pi i/N,\ i = 1, \ldots, [N/2]\}$, where $N$ is the number of observations. The set of frequencies thus defined has the orthogonality properties (see [6], page 392).

Theorem: Let $\{\omega_1, \ldots, \omega_k\}$ be a set of fundamental frequencies, and let $\hat\lambda$, the $\hat A_i$'s and the $\hat B_i$'s be the MLE's for $\lambda$, the $A_i$'s and the $B_i$'s under the model (3). Then, under the null hypothesis

$$H_0: A_i = 0,\ B_i = 0,\quad i = 1, \ldots, k,$$

the $\sqrt{N\bar Y/2}\,\hat A_i$'s and $\sqrt{N\bar Y/2}\,\hat B_i$'s are asymptotically independently and identically distributed (iid) as $N(0, 1)$, and therefore the $\frac{N\bar Y}{2}(\hat A_i^2 + \hat B_i^2)$'s are asymptotically iid $\chi^2_2$.

The intuition behind this spectrum is similar to that of the traditional one, that is, the "power" assigned to each frequency.

Proof: It is easy to verify that under model (3), the Fisher information matrix for the MLE's is $I(\theta) = W'MW$, where

$$W = \begin{pmatrix} 1 & \sin\omega_1 1 & \cos\omega_1 1 & \cdots & \sin\omega_k 1 & \cos\omega_k 1 \\ 1 & \sin\omega_1 2 & \cos\omega_1 2 & \cdots & \sin\omega_k 2 & \cos\omega_k 2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & \sin\omega_1 N & \cos\omega_1 N & \cdots & \sin\omega_k N & \cos\omega_k N \end{pmatrix}$$

and $M = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$. Under $H_0$, $M = \mathrm{diag}(\lambda, \ldots, \lambda)$, with $\lambda$ denoting the common value of the $\lambda_t$, and because of the orthogonality properties of the fundamental frequencies, $W$ has mutually orthogonal columns with $\sum_{t=1}^{N} W_{t1}^2 = N$ and $\sum_{t=1}^{N} W_{tj}^2 = N/2$ for $j > 1$. We therefore have

$$I(\theta_0) = W'MW = \begin{pmatrix} N\lambda & 0 \\ 0 & \frac{N\lambda}{2}\, I_{2k \times 2k} \end{pmatrix}.$$

Setting $\theta = (A_1, B_1, \ldots, A_k, B_k)'$, under $H_0$ we have

$$\sqrt{N\lambda/2}\;\hat\theta \xrightarrow{d} N(0, I_{2k \times 2k}).$$

The theorem follows because under $H_0$, $\bar Y$ is a consistent estimator of $\lambda$.

2.2.3 Spectra Based on Likelihood Ratio Tests Against Periodicities

In this subsection we introduce two other methods for detecting periodicities, each based on a series of hypothesis tests associated with a prespecified set of frequencies.
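The GHP spectrum of Section 2.2.2 for the Poisson case can be sketched in Python as follows, assuming NumPy. This is only an illustration: the harmonic model is fitted by iteratively reweighted least squares, a standard way of obtaining Poisson MLE's that stands in for the GLIM package, and the function names `fit_poisson_harmonic` and `ghp_power` are hypothetical.

```python
import numpy as np

def fit_poisson_harmonic(y, omegas, n_iter=50):
    """MLE of (lambda, A_1, B_1, ..., A_k, B_k) in the Poisson harmonic
    model log(lambda_t) = lambda + sum_i (A_i sin(w_i t) + B_i cos(w_i t)),
    computed by iteratively reweighted least squares."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)
    cols = [np.ones_like(t)]
    for w in omegas:
        cols.append(np.sin(w * t))
        cols.append(np.cos(w * t))
    W = np.column_stack(cols)
    beta = np.zeros(W.shape[1])
    beta[0] = np.log(y.mean())           # start from the constant-mean fit
    for _ in range(n_iter):
        eta = W @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        WtM = W.T * mu                   # W' diag(mu)
        beta = np.linalg.solve(WtM @ W, WtM @ z)
    return beta

def ghp_power(y, omegas):
    """Scaled power (N * Ybar / 2) * (A_i^2 + B_i^2) at each frequency."""
    y = np.asarray(y, dtype=float)
    beta = fit_poisson_harmonic(y, omegas)
    A, B = beta[1::2], beta[2::2]        # sine and cosine coefficients
    return 0.5 * len(y) * y.mean() * (A ** 2 + B ** 2)
```

Under the null hypothesis of the theorem, each value returned by `ghp_power` can be compared with the $\chi^2_2$ distribution; a value far in the upper tail points to a cycle at the corresponding frequency.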
In each test, the null hypothesis is that the series has a mean not varying with time; the alternative hypothesis is that the series has a cycle of a specified frequency. A likelihood ratio test can then be constructed with respect to each of the frequencies, and these likelihood ratio tests lead directly to spectra.

Proposal 1: Consider the following series of hypothesis tests:

$$H_0^{(p)}: \lambda_1 = \ldots = \lambda_n = \lambda$$
$$H_1^{(p)}: \lambda_i = \lambda_{i+p},\quad i = 1, \ldots, n - p.$$

For any value of $p$, the likelihood ratio statistic is:

$$R_p = 2\log L(H_1^{(p)}) - 2\log L(H_0^{(p)}),$$

where

$$\log L(H_1^{(p)}) = \sum_{j=1}^{p}\sum_{i=1}^{n/p} Y_{(i-1)p+j}\log\hat\lambda_j - \frac{n}{p}\sum_{j=1}^{p}\hat\lambda_j \quad\text{and}\quad \hat\lambda_j = \frac{p}{n}\sum_{i=1}^{n/p} Y_{(i-1)p+j},$$

and $\log L(H_0^{(p)})$ is the corresponding log-likelihood with every $\lambda_t$ estimated by $\bar Y$. Since $R_p$ is asymptotically $\chi^2_{p-1}$ under $H_0^{(p)}$, we can compute the P-value associated with each period $p$. If the P-value at some period $p$ is below a critical level $\alpha$, this suggests a period $p$ in the process. Alternatively, we can plot 1 − P-value against period, and the large (close to 1) values in this "quantile spectrum" suggest a periodicity.

Proposal 2: Similarly, consider the following series of hypothesis tests:

$$H_0^{(\omega)}: \lambda_1 = \ldots = \lambda_n = \lambda$$
$$H_1^{(\omega)}: \log\lambda_t = \lambda + A\sin\omega t + B\cos\omega t.$$

Associated with each $\omega$,

$$R(\omega) = 2\log L(H_1^{(\omega)}) - 2\log L(H_0^{(\omega)}),$$

where $\log L(H_0^{(\omega)})$ is the same as before, and

$$\log L(H_1^{(\omega)}) = \sum_{t=1}^{n} Y_t(\hat\lambda + \hat A\sin\omega t + \hat B\cos\omega t) - \sum_{t=1}^{n}\exp(\hat\lambda + \hat A\sin\omega t + \hat B\cos\omega t).$$

Then we can plot $R(\omega)$ against $\omega$.

2.2.4 Application of the GHP Model

The spectrum methods introduced above provide an idea of the "importance" of each frequency. The "important" frequencies can then be included in regression models for Y against X. The applications which follow in Sections 3 and 4 involve the Poisson case, and the following procedure will be employed.
First, based on the GHP spectrum introduced in Section 2.2.2, we select the set of frequencies corresponding to small P-values from the associated $\chi^2_2$ statistics, and combine the terms corresponding to these frequencies with the X terms in the following model:

$$\log\lambda_t = \lambda + \sum_{i}(A_i\sin\omega_i t + B_i\cos\omega_i t) + \sum_{i=0}^{m}\beta_i X_{t-i}; \qquad (5)$$

here the first sum runs over the selected frequencies and $m$ is a relatively large integer believed sufficient to cover the past with which the present could be associated. Model reduction is then based on retaining in the above model only those terms with coefficients which differ significantly from zero (P-value < $\alpha'$). This process is repeated in stages, leading to a final model of the same form as (5), in which $\{A_i\}$, $\{B_i\}$, and $\{\beta_i\}$ are the coefficients surviving the model reduction. At each stage of model reduction, the changes in deviance ($\Delta$deviances) are also investigated to make sure the reduction is legitimate. Tests for the "importance" of the X terms, and specification of the relationship between $\{X_t\}$ and $\{Y_t\}$, if any, are then based on this final model.

2.3 The Generalized Autoregressive Model (GAR) Approach

Traditional AR models are most suitable for time series with continuous marginal distributions. The GAR model is developed to describe time series with the exponential family class of marginal distributions. This allows us to model time series with discrete marginal distributions.

2.3.1 The GAR Model

A sequence of $N$ observations $Y_1, \ldots, Y_N$ is said to follow a GAR model of order $p$ if the observations have the following conditional distributions:

$$f(Y_t \mid Y_t^p) = \exp\{\,Y_t\, c[\theta(Y_t^p)] + d[\theta(Y_t^p)] + S(Y_t)\,\},$$

where $Y_t^p = (Y_{t-1}, \ldots, Y_{t-p})'$ and

$$c[\theta(Y_t^p)] = \beta_0 + \beta' Y_t^p, \qquad \beta = (\beta_1, \beta_2, \ldots, \beta_p)'.$$

Example 1: When the conditional distributions of the $Y_t$'s are Poisson, we have $\theta(Y_t^p) = E(Y_t \mid Y_t^p)$, and

$$c[\theta(Y_t^p)] = \log[\theta(Y_t^p)] = \beta_0 + \beta' Y_t^p.$$

We call it a log-AR model.

Example 2: When the conditional distributions of the $Y_t$'s are binomial with common known total count $n$, we have $n\,\theta(Y_t^p) = E(Y_t \mid Y_t^p)$, and

$$c[\theta(Y_t^p)] = \log\frac{\theta(Y_t^p)}{1 - \theta(Y_t^p)} = \beta_0 + \beta' Y_t^p.$$

We call it a logit-AR model.
Example 3: When the conditional distributions of the $Y_t$'s are normal with common known $\sigma = 1$, we have $\theta(Y_t^p) = E(Y_t \mid Y_t^p)$, and

$$c[\theta(Y_t^p)] = \theta(Y_t^p) = \beta_0 + \beta' Y_t^p.$$

This is the traditional AR model.

2.3.2 Application of the GAR Model

Similar to Section 2.2.4, we can combine the GAR model with regression against X. In the Poisson case, we start with the following model:

$$\log\lambda_t = \alpha_0 + \sum_{i=1}^{p}\alpha_i Y_{t-i} + \sum_{i=0}^{q}\beta_i X_{t-i}; \qquad (6)$$

here $p$ and $q$ are relatively large integers believed sufficient to cover the past values of X and Y with which the present Y could be associated. A model reduction procedure similar to that of Section 2.2.4 can then be applied.

3 Relationships of the Health Series and the TRS Series

3.1 The Spectral Analysis Approach

The prewhitening is done by fitting each series with an AR model. Provided the fit is adequate, the residuals will be close to "white noise". The AR fitting is based on maximum likelihood, and the model reduction procedure is based on Akaike's information criterion. The residual autocorrelation function, the residual sum of squares, and the spectrum of the residual series are investigated to check the goodness-of-fit. We present only the resulting model and the spectrum of residuals.

In the following models, $Y_t$ denotes an observation centered at the maximum likelihood estimate (MLE) of the mean of the original series; that is, $Y_t$ stands for the original observation minus $\hat\mu$, where $\hat\mu$ is the MLE of the mean of the series.

For ER, $\hat\mu = 0.75$, and the fitted model is:

$$Y_t = 0.079Y_{t-1} - 0.071Y_{t-11} + 0.11Y_{t-15}.$$

For AD, $\hat\mu = 0.54$, and the fitted model is:

$$Y_t = 0.12Y_{t-2} + 0.083Y_{t-8} + 0.068Y_{t-9} - 0.062Y_{t-11} - 0.054Y_{t-13} + 0.071Y_{t-19}.$$

For TRS, $\hat\mu = 3.37$, and the fitted model is:

$$Y_t = 0.37Y_{t-1} + 0.12Y_{t-2} + 0.059Y_{t-5} - 0.097Y_{t-12} - 0.053Y_{t-16}.$$

The above models show how the present data are linearly associated with the past data, thereby describing the dynamics within each series. We do not see a similar pattern among the three series.
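To make the whitening step concrete, the following Python sketch (assuming NumPy) computes the residual series of a fitted sparse AR model from a daily series. The coefficients are those of the fitted ER model above, with lags 1, 11 and 15 as read from that model; the names `ER_COEFS`, `ER_MEAN` and `whiten` are hypothetical and serve only this illustration.

```python
import numpy as np

# Fitted ER model of Section 3.1, written as lag -> coefficient.
ER_COEFS = {1: 0.079, 11: -0.071, 15: 0.11}
ER_MEAN = 0.75   # MLE of the mean of the ER series

def whiten(series, coefs, mean):
    """Residuals e_t = Y_t - sum_l phi_l * Y_{t-l} of a fitted AR model,
    where Y_t is the series centered at `mean`. The first max(lag)
    observations are dropped because their history is incomplete."""
    y = np.asarray(series, dtype=float) - mean
    p = max(coefs)
    resid = y[p:].copy()
    for lag, phi in coefs.items():
        # y[p - lag : len(y) - lag] lines up Y_{t-lag} with Y_t for t >= p
        resid -= phi * y[p - lag : len(y) - lag]
    return resid
```

Applying `whiten(er_counts, ER_COEFS, ER_MEAN)` to the daily ER counts would give the whitened ER series; the AD and TRS series can be treated in the same way with their own coefficients and means.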
The whitened and unwhitened spectra for ER, AD and TRS are estimated using formula (1) with $M = 60$ and are plotted in Figure 2, with solid curves representing the whitened spectra and dotted curves representing the unwhitened spectra. In all three panels, the solid curves are much flatter than the dotted curves. This illustrates the effect of prewhitening the series.

The coherencies between ER and TRS and between AD and TRS are estimated using formula (2) and are plotted in Figure 3; the $\hat\rho^2$ measure described in Section 2.1.3 equals 0.04 and 0.06 for ER vs TRS and AD vs TRS respectively. Both values are quite small, which means that the linear relationships between ER and TRS and between AD and TRS are both quite weak.

3.2 The Generalized Harmonic Process Approach

In this section we first study the dynamics within each health series and then incorporate the apparent dynamics into the regression of each of the health series against the TRS pollution series.

3.2.1 The Spectra of ER and AD

We calculate the three spectra introduced in Sections 2.2.2 and 2.2.3 for both ER and AD. The fundamental frequencies we selected for the GHP spectrum and the Proposal 2 spectrum are $\{\omega_k = 2\pi k/N : k = 13, 26, 39, \ldots, 364\}$. The quantile spectrum is plotted against the period $T = 2, \ldots, 16$ days.

Figure 4 presents the GHP, Proposal 2 and Proposal 1 spectra for ER. From Figures 4a and 4b we can see a high peak at the frequency $\omega = 0.89$; the associated period is $T = 2\pi/0.89 \approx 7.02 \approx 1$ week. From Figure 4c we can see that this spectrum has high peaks at both $T = 7$ and $T = 14$, which also indicates a period of one week. This fact is confirmed by Table 2, where we can see that there are more emergency room visits on weekends than on weekdays.

Figure 5 presents the GHP, Proposal 2 and Proposal 1 spectra for AD. Figures 5a and 5b both show a relatively high value at frequency $\omega = 0.56$; the associated period is $T = 11$, which seems to suggest a period of 11 days.
However, this is not supported by Figure 5c, where we do not see any indication of such a period. Although it seems hard to give an intuitive interpretation to this period, the corresponding trigonometric function does, as will be seen later, capture quite a bit of the variation in the data and provides us with a reasonable predictor in modelling the "dynamics" inside AD.

Table 2: Week Patterns of ER and AD

                        Sun.   Mon.   Tue.   Wed.   Thu.   Fri.   Sat.
Average of ER counts    0.98   0.81   0.63   0.49   0.54   0.76   1.07
Average of AD counts    0.55   0.64   0.53   0.57   0.48   0.50   0.56

We can also construct tests for the significance of the cycles using the GHP spectrum. According to the Theorem in Section 2.2.2, we know that the statistics in the GHP spectrum are asymptotically iid $\chi^2_2$; that is, exponential with $\theta = 2$. Therefore the largest peak in the spectrum should be distributed asymptotically as the largest order statistic for a random sample of $N$ exponentials, and similarly for the second and third largest peaks. Comparison to these null distributions leads to the results summarized in Table 3:

Table 3: P-Values for the Tests for Cycles

               ER       AD
First Peak     0.0002   0.024
Second Peak    0.33     0.25

The third and following order statistics also have large P-values, indicating that they do not correspond to significant cycles. ER and AD seem to have only one dominating cycle each.

3.2.2 ER versus TRS

Based on the GHP spectrum calculated above, we can then do model reduction to obtain the important frequencies and incorporate them into the regression versus TRS as described in Section 2.2.4. Starting with model (5) with $m = 30$, and using $\alpha = \alpha' = 0.10$ in the model reduction procedure, we are led to the following model:

$$\begin{aligned}\log\lambda_t = {} & -0.33 + 0.13\sin(0.11t) + 0.12\sin(0.89t) - 0.15\sin(2.13t) - 0.11\cos(0.11t) \\ & + 0.13\cos(0.34t) + 0.21\cos(1.12t) + 0.13\cos(1.68t) + 0.016X_{t-5} \\ & - 0.022X_{t-11} + 0.017X_{t-14} - 0.025X_{t-16} + 0.015X_{t-20}.\end{aligned}$$

The z-scores for the coefficients of the X terms are: 1.91, -1.98, 1.83, -2.16, 1.76.
The deviance is 803.5, with 697 degrees of freedom (P-value = 0.0031). The model therefore shows a lack of fit, which could probably be removed by introducing an overdispersion parameter.

Notice that TRS enters the fitted relationship via some quite large lags; for example, a lag of 20. This peculiar phenomenon may be due to the fact that, since $\alpha' = 0.1$, roughly 10% of the predictors would survive the model reduction even if there were no significant predictors. However, without further empirical knowledge of the relationship between ER and TRS, we have no way to determine which of the surviving lags are of essential importance. Because of this and the lack of fit, this model provides us with only a rough idea of how ER is related to TRS.

Also observe the alternating sequence of signs on the TRS terms. It does not seem unreasonable to assume that changes in air pollution levels have an effect on ER. Therefore we refit a simpler model in which $X_{t-5}$, $X_{t-11}$, $X_{t-14}$, and $X_{t-16}$ enter only through $\Delta_1 = X_{t-5} - X_{t-11}$ and $\Delta_2 = X_{t-14} - X_{t-16}$. This results in the following fit:

$$\begin{aligned}\log\lambda_t = {} & -0.37 + 0.13\sin(0.11t) + 0.13\sin(0.89t) - 0.15\sin(2.13t) - 0.11\cos(0.11t) \\ & + 0.13\cos(0.34t) + 0.21\cos(1.12t) + 0.13\cos(1.68t) + 0.018\Delta_1 \\ & + 0.019\Delta_2 + 0.014X_{t-20}.\end{aligned} \qquad (7)$$

This refitting results in an increase in the deviance of 0.8, with 2 degrees of freedom; this suggests that the reduction is appropriate. The results of tests for the importance of the X and $\Delta$ terms based on the above model are summarized in Table 4.

Table 4: Tests of Significance

Test              Δdev   df   P-value
for $\Delta_1$    7.4    1    0.007
for $\Delta_2$    6.5    1    0.01
for $X_{t-20}$    3.1    1    0.08

We can see that the P-values for the $\Delta$ terms are quite small, which suggests that the changes in TRS levels could be an important factor.

3.2.3 AD versus TRS

In exactly the same fashion we obtain the GHP model for AD vs TRS:

$$\log\lambda_t = -0.70 - 0.12\sin(0.56t) - 0.12\sin(1.34t) - 0.16\sin(2.24t) + 0.20\cos(0.56t) + 0.020X_{t-4}.$$
The z-score for the coefficient of $X_{t-4}$ is 2.25 and the deviance is 697 with 720 degrees of freedom (P-value = 0.724). Withholding $X_{t-4}$ increases the deviance by 4.5 (P-value = 0.034), which suggests an association of AD with TRS at a lag of 4 days.

The above analyses seem to suggest positive associations of both ER and AD with TRS. Emergency room visits seem to be more affected by changes in pollution levels than by the actual values, although the peculiar structure of the relationship suggested by model (7), the lack of empirical knowledge, and the lack of fit do not allow us to say so with high confidence. On the other hand, hospital admissions seem to be affected by a high level of TRS at a lag of 4 days.

3.3 The Generalized Autoregressive Model Approach

In this section we apply the GAR model to study the relationships between the health series and the TRS series.

3.3.1 ER versus TRS

We start with model (6) with the order of autoregression $p = 20$ and the order of regression against TRS $q = 30$. The model reduction rule uses $\alpha = 0.10$. Restriction to the order $q = 30$, corresponding to approximately a month, assumes that pollution levels more than a month ago would not affect the present counts of emergency room visits. Restriction to the order $p = 20$ assumes that the counts more than 20 days ago would not affect the present counts. The following model results:

$$\begin{aligned}\log\lambda_t = {} & -0.49 + 0.091Y_{t-1} + 0.14Y_{t-15} + 0.016X_{t-5} - 0.022X_{t-11} \\ & + 0.020X_{t-14} - 0.027X_{t-16} + 0.018X_{t-20}.\end{aligned}$$

The deviance is 827.78, with 702 degrees of freedom (P-value = 0.0004). The z-scores for the coefficients of the X terms are -2.32 and 2.05. There is a lack of fit, which could probably be removed by introducing an overdispersion parameter.

Notice that the fitted relationship between ER and TRS is quite similar to that obtained via the GHP approach, except that the dynamics within ER are accounted for by regression on the past values of ER rather than on trigonometric functions.
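The P-values attached to changes in deviance in this section (for example, the increase of 4.5 on withholding $X_{t-4}$ in Section 3.2.3) can be checked with a few lines of Python, assuming the usual chi-square reference distribution for a change in deviance. The sketch uses only the standard library and two closed forms: for 1 degree of freedom the upper tail probability is $\mathrm{erfc}(\sqrt{x/2})$, and for 2 degrees of freedom it is $e^{-x/2}$. The function names are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """P(chi-square with 1 df > x); equals P(|Z| > sqrt(x)) for Z ~ N(0, 1)."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_2df(x):
    """P(chi-square with 2 df > x); a chi-square with 2 df is exponential
    with mean 2."""
    return math.exp(-x / 2.0)

# Withholding X_{t-4} from the GHP model for AD raises the deviance by 4.5:
p_ad = chi2_sf_1df(4.5)     # about 0.034

# Replacing four lag terms by two differences raises the deviance by 0.8
# on 2 degrees of freedom:
p_refit = chi2_sf_2df(0.8)  # about 0.67
```

The second value is large, which is the sense in which an increase of 0.8 in the deviance on 2 degrees of freedom supports the reduced model.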
Again, given the peculiar structure of the model with some large lags and the fact that there is a lack of fit, we are not very confident about the conclusions based on this model. Proceeding as in the GHP approach, we consider reduction to the effects of changes in the TRS levels (the differences $\Delta_1$ and $\Delta_2$ used in model (7)) and refit. The following model results:

$$\log\lambda_t = -.53 + .091Y_{t-1} + .14Y_{t-15} + .018\Delta_1 + .022\Delta_2 + .016X_{t-20}. \qquad (8)$$

Eliminating two parameters in this fashion increases the deviance by only 0.8, which again suggests the reduction is appropriate. The results of tests for the significance of the $\Delta$ and X terms are summarized in Table 5.

Table 5: Tests of significance

Test | Δdev | df | P-value |
---|---|---|---|
for $\Delta_1$ | 7.3 | 1 | 0.007 |
for $\Delta_2$ | 8.1 | 1 | 0.004 |
for $X_{t-20}$ | 4.0 | 1 | 0.05 |

The small P-values for the $\Delta$ terms seem to suggest that changes in TRS levels could be important factors.

3.3.2 AD versus TRS

In exactly the same way as above, we obtain the following model for AD vs TRS:

$$\log\lambda_t = -.81 + .19Y_{t-2} + .13Y_{t-8} + .020X_{t-4}.$$

The deviance is 699.5, with 718 degrees of freedom (P-value=0.682). The z-score for the coefficient of $X_{t-4}$ is 2.13; withholding $X_{t-4}$ increases the deviance by 4.1 (P-value=0.043). The small P-value suggests an association of AD with TRS at a lag of 4 days. We can see that the two methods, GHP and GAR, yield very similar results.

4 Relationships of the Health Series and the TSP Series

Because the TSP levels were measured only every sixth day, straightforward application of the three methods is not possible. We shall study the relationship between the TSP levels and the health series corresponding to the same day, 1 day after, ..., 5 days after the days when TSP levels were measured. We refer to the relation between $\{Y_{6t+k}\}$ (referring to either ER or AD) and $\{X_{6t}\}$ as the lag k relation, where k = 0, 1, ..., 5. Therefore we shall be dealing with a total of 13 different time series: 6 for ER, 6 for AD, and 1 for TSP.
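The construction of the six lagged health series can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `lag_k_subseries`, the zero-based day index and the position of the first TSP measurement day are all assumptions made for the example, and the counts are made up.

```python
def lag_k_subseries(daily_counts, k, first_measurement_day=0, period=6):
    """Counts observed k days after each pollution measurement day.

    Assumptions (not from the source): days are indexed from 0, the first
    TSP measurement falls on day `first_measurement_day`, and measurements
    recur every `period` days.
    """
    if not 0 <= k < period:
        raise ValueError("k must satisfy 0 <= k < period")
    return daily_counts[first_measurement_day + k::period]


# Toy daily series of 18 days standing in for ER or AD counts.
daily = [0, 1, 0, 2, 0, 0, 1, 0, 0, 0, 3, 1, 0, 0, 1, 0, 0, 2]
subseries = {k: lag_k_subseries(daily, k) for k in range(6)}
print(subseries[0])  # counts on measurement days 0, 6, 12
print(subseries[5])  # counts five days after each measurement
```

Applying the same slicing to ER and to AD gives the 12 health series, each on a time scale where one unit is six days, to be paired with the single TSP series.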
4.1 The Spectral Analysis Approach

The AR models for prewhitening of the above 13 time series are obtained the same way as before. In the following results, the unit of time is 6 days; for example, $Y_{t-6}$ denotes the value at 6 x 6 = 36 days prior to $Y_t$. The full model allows 12 lags, and the reduction procedure is as described in Section 2.1.1. Applying the prewhitening procedure to the 6 ER series leads to the following results:

- with lag 0: $\hat{\mu} = 0.74$, and the AR model is: $Y_t = -0.18Y_{t-6} + 0.18Y_{t-9} + \epsilon_t$;
- with lag 1: $\hat{\mu} = 0.87$, and the AR model is: $Y_t = -0.15Y_{t-3} - 0.20Y_{t-5} - 0.18Y_{t-10} + \epsilon_t$;
- with lag 2: $\hat{\mu} = 0.73$, and the AR model is: $Y_t = +0.18Y_{t-1} + \epsilon_t$;
- with lag 3: $\hat{\mu} = 0.76$, and the AR model is: $Y_t = 0.19Y_{t-7} - 0.14Y_{t-8} - 0.15Y_{t-12} + \epsilon_t$;
- with lag 4: $\hat{\mu} = 0.69$, and the AR model is: $Y_t = 0.16Y_{t-1} - 0.1\text{[?]}\,Y_{t-2} - 0.22Y_{t-11} + \epsilon_t$ (the second decimal of the $Y_{t-2}$ coefficient is illegible in the source);
- with lag 5: $\hat{\mu} = 0.76$, and the AR model is: $Y_t = -0.12Y_{t-12} + \epsilon_t$.

The whitened and unwhitened spectra for ER with lag 0, ..., lag 5 are plotted in Figure 6, with solid curves representing the whitened spectra and dotted curves the unwhitened. As in Section 3.1, the solid curves are flatter than the dotted ones, illustrating the effect of prewhitening the series.

Applying the same procedure to the 6 AD series leads to the following results:

- with lag 0: $\hat{\mu} = 0.62$, and the AR model is: $Y_t = -0.17Y_{t-6} + 0.23Y_{t-9} + 0.19Y_{t-12} + \epsilon_t$;
- with lag 1: $\hat{\mu} = 0.50$, and the AR model is: $Y_t = -0.11Y_{t-3} + 0.15Y_{t-12} + \epsilon_t$;
- with lag 2: $\hat{\mu} = 0.61$, and the AR model is: $Y_t = -0.17Y_{t-1} - 0.23Y_{t-4} + \epsilon_t$;
- with lag 3: $\hat{\mu} = 0.53$, and the AR model is: $Y_t = +0.15Y_{t-9} + 0.20Y_{t-10} - 0.28Y_{t-12} + \epsilon_t$;
- with lag 4: $\hat{\mu} = 0.49$, and the AR model is: $Y_t = -0.17Y_{t-7} + 0.13Y_{t-11} - 0.18Y_{t-12} + \epsilon_t$;
- with lag 5: $\hat{\mu} = 0.55$, and the AR model is: $Y_t = -0.12Y_{t-2} - 0.21Y_{t-5} - 0.15Y_{t-6} - 0.22Y_{t-9} + \epsilon_t$.

The whitened and unwhitened spectra for AD with lag 0, ..., lag 5 are plotted in Figure 7. Again, the flatter solid curves show the effect of prewhitening.
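Once an AR model has been selected, the whitened series is simply the residual series of that model. The sketch below illustrates this filtering step for the ER lag 0 model reported above. It assumes that the AR model is fitted to the mean-centred series, which the text does not state explicitly; the function name `prewhiten` and the toy counts are hypothetical.

```python
def prewhiten(series, mean, ar_coefs):
    """Residuals of a fitted AR model, used as the whitened series.

    Assumption (not confirmed by the source): the AR model is fitted to the
    mean-centred series, so that
        e_t = (Y_t - mean) - sum_j a_j * (Y_{t-j} - mean).
    `ar_coefs` maps each retained lag j to its coefficient a_j. The first
    max(lag) values are dropped because their predictors are unavailable.
    """
    centred = [y - mean for y in series]
    max_lag = max(ar_coefs)
    return [
        centred[t] - sum(a * centred[t - lag] for lag, a in ar_coefs.items())
        for t in range(max_lag, len(centred))
    ]


# Coefficients reported above for ER with lag 0 (time unit: 6 days).
er_lag0_model = {6: -0.18, 9: 0.18}
toy_series = [1, 0, 2, 0, 1, 0, 0, 3, 1, 0, 0, 1]  # made-up counts
residuals = prewhiten(toy_series, mean=0.74, ar_coefs=er_lag0_model)
print(len(residuals))  # 12 observations minus the 9 lost to the longest lag
```

Storing only the retained lags in a dictionary mirrors the reduced models listed above, where most of the 12 candidate lags have been dropped.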
Finally, for TSP, $\hat{\mu} = 49.55$, and the AR model is: $Y_t = 0.1\text{[?]}\,Y_{t-9} + \epsilon_t$ (the second decimal of the coefficient is illegible in the source). The whitened and unwhitened spectra for TSP are plotted in Figure 8. The coherencies between ER lag 0 and TSP, ..., ER lag 5 and TSP are plotted in Figure 9; those between AD lag 0 and TSP, ..., AD lag 5 and TSP are plotted in Figure 10. The corresponding values of $\rho^2$ are provided in Table 6.

Table 6: $\rho^2$ for ER and AD versus TSP

Lag | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
ER | 0.099 | 0.152 | 0.120 | 0.135 | 0.157 | 0.152 |
AD | 0.217 | 0.081 | 0.158 | 0.134 | 0.167 | 0.188 |

Again, we can see that the linear relationships between ER and TSP and between AD and TSP are quite weak.

4.2 The Generalized Harmonic Process Approach

Since TSP is available only every sixth day, we cannot use model (5) directly. Instead, important frequencies in ER or AD are suggested by their GHP spectra and are then incorporated with the regression against TSP. The frequencies are extracted from the spectrum of the full ER and AD data. To take into account the fact that one time unit here is equivalent to 6 time units in the full data, we have the following model:

$$Y_t = A + \sum_{\omega}\{A_\omega \sin[\omega(6t-5)] + B_\omega \cos[\omega(6t-5)]\} + \sum_{i=0}^{m} \beta_i X_{t-i} + \epsilon_t \qquad (9)$$

where the first summation is over the important frequencies selected from the GHP spectra by the selection rule: P-value < $\alpha$. Then, based on (9), we carry out model reduction according to the rule: P-value < $\alpha'$. Often, none of the X terms survive the reduction, suggesting they are not important. For the sake of completeness, we force the X terms into the model and test each of the coefficients via the change in deviance. The model reduction is implemented using $\alpha = \alpha' = 0.10$; but we now take m = 5 because this corresponds to the use of m = 30 before.

4.2.1 ER versus TSP

Applying the procedure to the 6 ER series leads to the following results:

- with lag 0: None of the X terms survive the reduction (Δdev = 4.85, degrees of freedom=6).
The model with the X terms forced is:

$$Y_t = -0.77 + 0.29\sin[0.34(6t-5)] + 0.35\sin[2.13(6t-5)] - 0.31\cos[1.68(6t-5)] + 0.0048X_t - 0.0019X_{t-1} + 0.0047X_{t-2} - 0.0024X_{t-3} - 0.0020X_{t-4} + 0.0052X_{t-5}.$$

The deviance is 128.66, with 106 degrees of freedom (P-value=0.066). The z-scores for the coefficients of the X terms are: 1.15, -0.44, 1.17, -0.54, -0.45, 1.29. Tests of significance for the X terms are summarized in Table 7.

Table 7: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 1.28 | 0.20 | 1.33 | 0.30 | 0.21 | 1.59 |
P-value | 0.26 | 0.65 | 0.25 | 0.58 | 0.65 | 0.21 |

- with lag 1: None of the X terms survive the reduction (Δdev = 7.47, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.85 + 0.29\sin[0.34(6t-5)] + 0.47\cos[0.90(6t-5)] - 0.0039X_t + 0.0040X_{t-1} + 0.0021X_{t-2} - 0.0061X_{t-3} + 0.0017X_{t-4} + 0.0061X_{t-5}.$$

The deviance is 117.45, with 107 degrees of freedom (P-value=0.402). The z-scores for the coefficients of the X terms are: -0.99, 1.14, 0.57, -1.34, 0.41, 1.67. Tests of significance for the X terms are summarized in Table 8.

Table 8: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 1.00 | 1.26 | 0.31 | 1.86 | 0.16 | 2.65 |
P-value | 0.32 | 0.26 | 0.58 | 0.17 | 0.69 | 0.10 |

- with lag 2: The model with the X terms forced is:

$$Y_t = -1.23 + 0.39\sin[0.34(6t-5)] + 0.38\sin[0.90(6t-5)] - 0.30\sin[2.13(6t-5)] - 0.29\cos[0.11(6t-5)] + 0.40\cos[1.12(6t-5)] - 0.0078X_t + 0.0024X_{t-1} + 0.0033X_{t-2} + 0.0062X_{t-3} + 0.0082X_{t-4} + 0.0024X_{t-5}.$$

The deviance is 123.97, with 104 degrees of freedom (P-value=0.088). The z-scores for the coefficients of the X terms are: -1.72, 0.59, 0.88, 1.51, 1.87, 0.52. Tests of significance for the X terms are summarized in Table 9.
Table 9: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 3.11 | 0.35 | 0.77 | 2.22 | 3.36 | 0.27 |
P-value | 0.08 | 0.55 | 0.38 | 0.14 | 0.07 | 0.60 |

- with lag 3: None of the X terms survive the reduction (Δdev = 4.21, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.30 + 0.33\sin[0.34(6t-5)] - 0.35\cos[0.34(6t-5)] - 0.31\cos[2.13(6t-5)] - 0.0041X_t + 0.0058X_{t-1} + 0.0018X_{t-2} - 0.0005X_{t-3} - 0.0045X_{t-4} + 0.0000X_{t-5}.$$

The deviance is 123.93, with 106 degrees of freedom (P-value=0.113). The z-scores of the coefficients of the X terms are: -0.99, 1.49, 0.44, -0.12, -0.99, 0.01. Tests of significance for the X terms are summarized in Table 10.

Table 10: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 1.02 | 2.14 | 1.93 | 0.15 | 1.00 | 0.00 |
P-value | 0.31 | 0.14 | 0.66 | 0.90 | 0.32 | 0.99 |

- with lag 4: The model with the X terms forced is:

$$Y_t = 0.50 - 0.33\cos[0.34(6t-5)] + 0.28\cos[2.13(6t-5)] - 0.0016X_t - 0.0022X_{t-1} + 0.0008X_{t-2} - 0.0005X_{t-3} - 0.0111X_{t-4} - 0.0052X_{t-5}.$$

The deviance is 124.84, with 107 degrees of freedom (P-value=0.115). The z-scores of the coefficients of the X terms are: -0.37, -0.49, 0.20, -0.12, -2.17, -1.09. Tests of significance for the X terms are summarized in Table 11.

Table 11: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.13 | 0.24 | 0.03 | 0.01 | 5.17 | 1.23 |
P-value | 0.72 | 0.62 | 0.86 | 0.92 | 0.02 | 0.27 |

- with lag 5: None of the X terms survive the reduction (Δdev = 6.27, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -1.23 + 0.0016X_t - 0.0033X_{t-1} + 0.0064X_{t-2} + 0.0028X_{t-3} + 0.0039X_{t-4} + 0.0064X_{t-5}.$$

The deviance is 118.68, with 109 degrees of freedom (P-value=0.248). The z-scores of the coefficients of the X terms are: 0.38, -0.78, 1.62, 0.67, 0.94, 1.59. Tests of significance for the X terms are summarized in Table 12.
Table 12: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.14 | 0.62 | 2.54 | 0.44 | 0.85 | 2.42 |
P-value | 0.71 | 0.43 | 0.11 | 0.51 | 0.36 | 0.12 |

None of the above analyses suggests any clear positive association between ER and TSP. In fact, the only P-value smaller than 0.05 is in Table 11 and corresponds to a negative coefficient. The other two small P-values, 0.08 and 0.07, appear in Table 9 and correspond to a negative and a positive coefficient respectively. But those are only 3 out of 36, and may very well be due to random variation: roughly speaking, even if there were no relationships, around 4 of these 36 P-values (36 x 0.10 = 3.6) would be expected to fall below 0.10.

4.2.2 AD versus TSP

Similarly, examination of AD versus TSP leads to the following results:

- with lag 0: None of the X terms survive the reduction (Δdev = 5.94, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.51 + 0.34\sin[2.46(6t-5)] - 0.0055X_t - 0.0049X_{t-1} + 0.0011X_{t-2} - 0.0018X_{t-3} + 0.0051X_{t-4} + 0.0060X_{t-5}.$$

The deviance is 108.74, with 108 degrees of freedom (P-value=0.462). The z-scores of the coefficients of the X terms are: -1.15, -1.07, 0.25, -0.38, 1.18, 1.36. Tests of significance for the X terms are summarized in Table 13.

Table 13: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 1.38 | 1.21 | 0.06 | 0.15 | 1.34 | 1.77 |
P-value | 0.24 | 0.27 | 0.80 | 0.70 | 0.25 | 0.18 |

- with lag 1: None of the X terms survive the reduction (Δdev = 2.39, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -1.63 + 0.41\sin[2.24(6t-5)] + 0.34\sin[2.80(6t-5)] + 0.42\cos[1.34(6t-5)] - 0.35\cos[2.46(6t-5)] + 0.0037X_t + 0.0036X_{t-1} + 0.0036X_{t-2} + 0.0017X_{t-3} - 0.0008X_{t-4} + 0.0042X_{t-5}.$$

The deviance is 96.40, with 105 degrees of freedom (P-value=0.714). The z-scores of the coefficients of the X terms are: 0.75, 0.73, 0.68, 0.33, -0.15, 0.82. Tests of significance for the X terms are summarized in Table 14.
Table 14: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.55 | 0.52 | 0.46 | 0.10 | 0.02 | 0.66 |
P-value | 0.46 | 0.47 | 0.50 | 0.75 | 0.88 | 0.42 |

- with lag 2: None of the X terms survive the reduction (Δdev = 2.43, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.50 + 0.41\sin[0.56(6t-5)] + 0.35\sin[1.23(6t-5)] - 0.41\sin[2.46(6t-5)] - 0.37\cos[0.56(6t-5)] + 0.0028X_t - 0.0013X_{t-1} - 0.0053X_{t-2} + 0.0027X_{t-3} + 0.0004X_{t-4} - 0.0023X_{t-5}.$$

The deviance is 95.03, with 105 degrees of freedom (P-value=0.747). The z-scores of the coefficients of the X terms are: 0.61, -0.25, -1.07, 0.54, 0.07, -0.47. Tests of significance for the X terms are summarized in Table 15.

Table 15: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.37 | 0.06 | 1.19 | 0.29 | 0.00 | 0.22 |
P-value | 0.54 | 0.80 | 0.28 | 0.59 | 0.95 | 0.64 |

- with lag 3: None of the X terms survive the reduction (Δdev = 3.18, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -1.09 - 0.35\cos[1.34(6t-5)] + 0.0028X_t + 0.0009X_{t-1} + 0.0012X_{t-2} - 0.0003X_{t-3} + 0.0074X_{t-4} - 0.0015X_{t-5}.$$

The deviance is 114.70, with 108 degrees of freedom (P-value=0.311). The z-scores of the coefficients of the X terms are: 0.61, 0.20, -0.24, -0.05, 1.59, -0.30. Tests of significance for the X terms are summarized in Table 16.

Table 16: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.36 | 0.04 | 0.06 | 0.00 | 2.41 | 0.09 |
P-value | 0.55 | 0.84 | 0.81 | 0.96 | 0.12 | 0.76 |

- with lag 4: None of the X terms survive the reduction (Δdev = 3.85, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.63 - 0.40\cos[0.56(6t-5)] + 0.0047X_t - 0.0017X_{t-1} - 0.0047X_{t-2} + 0.0033X_{t-3} + 0.0012X_{t-4} - 0.0063X_{t-5}.$$

The deviance is 94.27, with 108 degrees of freedom (P-value=0.824). The z-scores of the coefficients of the X terms are: 0.97, -0.34, -0.85, 0.65, 0.23, -1.09. Tests of significance for the X terms are summarized in Table 17.
Table 17: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.91 | 0.06 | 1.02 | 0.42 | 0.05 | 1.26 |
P-value | 0.34 | 0.80 | 0.31 | 0.52 | 0.82 | 0.26 |

- with lag 5: None of the X terms survive the reduction (Δdev = 4.16, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.47 + 0.0028X_t - 0.0004X_{t-1} - 0.0054X_{t-2} + 0.0022X_{t-3} + 0.0028X_{t-4} - 0.0081X_{t-5}.$$

The deviance is 97.86, with 109 degrees of freedom (P-value=0.769). The z-scores of the coefficients of the X terms are: 0.59, -0.08, -0.97, 0.43, 0.57, -1.38. Tests of significance for the X terms are summarized in Table 18.

Table 18: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.34 | 0.01 | 0.99 | 0.18 | 0.31 | 2.04 |
P-value | 0.56 | 0.94 | 0.32 | 0.67 | 0.57 | 0.15 |

None of the above analyses (Tables 13-18) suggests any positive association between AD and TSP; in fact, all the P-values are larger than 0.1. In neither the analyses of ER vs TSP nor those of AD vs TSP do we see any clear indication of a positive association between the health and the TSP series.

4.3 The Generalized Autoregressive Model Approach

We shall use model (6) in Section 2.3.1 with p = 3, q = 5, since in the present case this is approximately equivalent to the case p = 20, q = 30 used when dealing with TRS. Again, in many of the following fits, none of the X terms survive the model reduction. We shall force the X terms into the model and test each via the change in deviance.

4.3.1 ER versus TSP

Applying the procedure to the 6 ER series leads to the following results:

- with lag 0: None of the X terms survive the reduction (Δdev = 4.16, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.22 - 0.30Y_{t-1} + 0.0022X_t - 0.0018X_{t-1} + 0.0023X_{t-2} - 0.0020X_{t-3} - 0.0014X_{t-4} + 0.0028X_{t-5}.$$

The deviance is 135.58, with 108 degrees of freedom (P-value=0.374). The z-scores of the coefficients of the X terms are: 0.56, -0.43, 0.60, -0.47, -0.34, 0.73. Tests of significance for the X terms are summarized in Table 19.
Table 19: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.29 | 0.19 | 0.36 | 0.22 | 0.12 | 0.52 |
P-value | 0.59 | 0.67 | 0.55 | 0.64 | 0.73 | 0.47 |

- with lag 1: The model with the X terms forced is:

$$Y_t = -0.22 - 0.26Y_{t-3} - 0.0020X_t + 0.0055X_{t-1} + 0.0006X_{t-2} - 0.0073X_{t-3} + 0.0022X_{t-4} + 0.0061X_{t-5}.$$

The deviance is 119.21, with 108 degrees of freedom (P-value=0.217). The z-scores of the coefficients of the X terms are: -0.51, 1.60, 0.17, -1.70, 0.54, 1.65. Tests of significance for the X terms are summarized in Table 20.

Table 20: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.27 | 2.46 | 0.03 | 3.09 | 0.29 | 2.62 |
P-value | 0.61 | 0.12 | 0.86 | 0.08 | 0.59 | 0.11 |

- with lag 2: None of the X terms survive the reduction (Δdev = 7.57, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -1.08 + 0.28Y_{t-1} - 0.0049X_t + 0.0051X_{t-1} + 0.0053X_{t-2} + 0.0053X_{t-3} + 0.0032X_{t-4} - 0.0036X_{t-5}.$$

The deviance is 139.50, with 108 degrees of freedom (P-value=0.022). The z-scores of the coefficients of the X terms are: -1.11, 1.23, 1.35, 1.29, 0.76, -0.79. Tests of significance for the X terms are summarized in Table 21.

Table 21: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 1.28 | 1.48 | 1.76 | 1.62 | 0.56 | 0.64 |
P-value | 0.26 | 0.22 | 0.18 | 0.20 | 0.45 | 0.42 |

- with lag 3: None of the X terms survive the reduction (Δdev = 3.62, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.046 - 0.0034X_t + 0.0055X_{t-1} - 0.0007X_{t-2} - 0.0008X_{t-3} - 0.0031X_{t-4} - 0.0018X_{t-5}.$$

The deviance is 137.65, with 109 degrees of freedom (P-value=0.033). The z-scores of the coefficients of the X terms are: 0.81, 1.46, -0.19, -0.22, -0.74, -0.44. Tests of significance for the X terms are summarized in Table 22.
Table 22: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.67 | 2.06 | 0.04 | 0.05 | 0.56 | 0.19 |
P-value | 0.41 | 0.15 | 0.84 | 0.83 | 0.45 | 0.66 |

- with lag 4: The model with the X terms forced is:

$$Y_t = +0.12 + 0.22Y_{t-1} - 0.0017X_t - 0.0003X_{t-1} + 0.0007X_{t-2} - 0.0006X_{t-3} - 0.0093X_{t-4} - 0.0038X_{t-5}.$$

The deviance is 129.03, with 108 degrees of freedom (P-value=0.082). The z-scores of the coefficients of the X terms are: -0.38, -0.06, 0.17, -0.13, -1.84, -0.80. Tests of significance for the X terms are summarized in Table 23.

Table 23: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.15 | 0.00 | 0.03 | 0.02 | 3.69 | 0.67 |
P-value | 0.70 | 0.95 | 0.87 | 0.90 | 0.05 | 0.41 |

- with lag 5: None of the X terms survive the reduction (Δdev = 6.27, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -1.23 + 0.0064X_t - 0.0033X_{t-1} + 0.0063X_{t-2} + 0.0028X_{t-3} + 0.0039X_{t-4} + 0.0063X_{t-5}.$$

The deviance is 118.68, with 109 degrees of freedom (P-value=0.258). The z-scores of the coefficients of the X terms are: 0.38, -0.78, 1.62, 0.67, 0.94, 1.59. Tests of significance for the X terms are summarized in Table 24.

Table 24: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.15 | 0.63 | 2.55 | 0.45 | 0.85 | 2.43 |
P-value | 0.70 | 0.43 | 0.11 | 0.50 | 0.36 | 0.12 |

None of the above analyses (Tables 19-24) suggests any clear positive association between ER and TSP. The only small P-value is the 0.05 in Table 23, which corresponds to a negative coefficient. However, as argued before, this may well be due to random variation.

4.3.2 AD versus TSP

Similarly, examination of AD versus TSP leads to the following results:

- with lag 0: None of the X terms survive the reduction (Δdev = 4.37, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.47 - 0.0041X_t - 0.0045X_{t-1} - 0.0005X_{t-2} - 0.0004X_{t-3} + 0.0045X_{t-4} + 0.0048X_{t-5}.$$

The deviance is 112.47, with 109 degrees of freedom (P-value=0.391).
The z-scores of the coefficients of the X terms are: -0.89, -0.99, -0.13, -0.09, 1.04, 1.11. Tests of significance for the X terms are summarized in Table 25.

Table 25: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.82 | 1.01 | 0.02 | 0.01 | 1.05 | 1.19 |
P-value | 0.36 | 0.31 | 0.89 | 0.93 | 0.31 | 0.28 |

- with lag 1: None of the X terms survive the reduction (Δdev = 1.51, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -1.27 + 0.0032X_t + 0.0040X_{t-1} + 0.0016X_{t-2} + 0.0001X_{t-3} - 0.0003X_{t-4} + 0.0030X_{t-5}.$$

The deviance is 112.17, with 109 degrees of freedom (P-value=0.398). The z-scores of the coefficients of the X terms are: 0.67, 0.84, 0.33, 0.01, -0.06, 0.59. Tests of significance for the X terms are summarized in Table 26.

Table 26: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.44 | 0.69 | 0.10 | 0.00 | 0.00 | 0.35 |
P-value | 0.51 | 0.41 | 0.75 | 0.99 | 0.95 | 0.55 |

- with lag 2: None of the X terms survive the reduction (Δdev = 0.93, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.22 + 0.0010X_t - 0.0002X_{t-1} - 0.0034X_{t-2} - 0.0001X_{t-3} - 0.0004X_{t-4} - 0.0026X_{t-5}.$$

The deviance is 114.19, with 109 degrees of freedom (P-value=0.348). The z-scores of the coefficients of the X terms are: 0.23, -0.03, -0.73, -0.02, -0.09, -0.54. Tests of significance for the X terms are summarized in Table 27.

Table 27: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.05 | 0.00 | 0.55 | 0.00 | 0.01 | 0.30 |
P-value | 0.82 | 0.97 | 0.46 | 0.99 | 0.93 | 0.59 |

- with lag 3: The model with the X terms forced is:

$$Y_t = -1.09 + 0.0029X_t + 0.0016X_{t-1} - 0.0015X_{t-2} - 0.0008X_{t-3} + 0.0008X_{t-4} - 0.0014X_{t-5}.$$

The deviance is 118.40, with 109 degrees of freedom (P-value=0.253). The z-scores of the coefficients of the X terms are: 0.66, 0.35, -0.31, -0.16, 1.78, -0.28. Tests of significance for the X terms are summarized in Table 28.
Table 28: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.42 | 0.12 | 0.10 | 0.03 | 3.02 | 0.08 |
P-value | 0.52 | 0.72 | 0.76 | 0.87 | 0.08 | 0.78 |

- with lag 4: None of the X terms survive the reduction (Δdev = 4.16, degrees of freedom=6). The model with the X terms forced is:

$$Y_t = -0.47 + 0.0028X_t - 0.0004X_{t-1} - 0.0054X_{t-2} + 0.0022X_{t-3} + 0.0028X_{t-4} - 0.0081X_{t-5}.$$

The deviance is 97.86, with 109 degrees of freedom (P-value=0.769). The z-scores of the coefficients of the X terms are: 0.59, -0.08, -0.97, 0.43, 0.57, -1.38. Tests of significance for the X terms are summarized in Table 29.

Table 29: Test of Significance for X Terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 0.34 | 0.01 | 0.99 | 0.18 | 0.31 | 2.04 |
P-value | 0.56 | 0.94 | 0.32 | 0.67 | 0.57 | 0.15 |

- with lag 5: The model with the X terms forced is:

$$Y_t = 0.32 - 0.36Y_{t-2} - 0.0071X_t - 0.0091X_{t-1} + 0.0033X_{t-2} - 0.0059X_{t-3} + 0.0017X_{t-4} - 0.0010X_{t-5}.$$

The deviance is 107.44, with 108 degrees of freedom (P-value=0.497). The z-scores of the coefficients of the X terms are: -1.72, -1.35, -1.75, 0.73, -1.12, 0.36. Tests of significance for the X terms are summarized in Table 30.

Table 30: Test of Significance for X terms

 | $X_t$ | $X_{t-1}$ | $X_{t-2}$ | $X_{t-3}$ | $X_{t-4}$ | $X_{t-5}$ |
---|---|---|---|---|---|---|
$\chi^2_1$ | 1.94 | 3.35 | 0.52 | 1.32 | 0.13 | 0.05 |
P-value | 0.16 | 0.07 | 0.47 | 0.25 | 0.72 | 0.83 |

None of the above analyses suggests any clear positive association between AD and TSP. There are two moderately small P-values: 0.08 in Table 28 and 0.07 in Table 30, corresponding to a positive and a negative parameter respectively. Again, as argued before, these may well be due to random variation.

Both GHP and GAR seem to reach the same conclusion: the available data suggest no clear positive association between the health series and the TSP series. Of course, this might be due primarily to the fact that TSP is collected only every sixth day. This makes the effective length of the series one-sixth the length of that for TRS.
Relationships between the health series and TSP would therefore have to be much stronger to be identified with the same level of confidence.

5 Final Remarks

Analysis via both GHP and GAR seems to suggest a weak positive association between AD and TRS and between ER and TRS. The hospital admission counts seem to be affected by TRS levels 4 days earlier. The emergency room visit counts seem to be affected by changes in the levels of TRS, as well as by the level of TRS. However, without subject matter knowledge to support the peculiar structure of models (7) and (8), the second statement should perhaps be considered a conjecture rather than a conclusion. On the other hand, the results of Section 4 provide no clear indication of a positive association between either ER and TSP or AD and TSP. This suggests either that TSP is less influential on asthma or that the way TSP is collected might disguise the association, as was discussed in Section 4.

All three methods used in this study focus on two points: to take account of the dynamics within each series, and to study the relationship between two series of data. The spectral analysis approach provides a preliminary exploration of the frequency pattern of each series as well as an approximate measure of the strength of the linear relationship between two time series. However, since the health data are discrete, this approach may not be entirely appropriate. The generalized harmonic process approach and the generalized autoregressive model approach both allow us to model discrete time series and their relationships with air pollution covariates. Both approaches allow identification of the form of the relationships and provide statistical inferences such as tests of the importance of the covariates. The GHP approach focuses on capturing seasonality in the data, whereas the GAR approach focuses on describing the "memory", or "inertia", of the data.

Other approaches could be applied to this data analysis.
For example, one could treat the daily counts as a binary time series consisting only of 0 (a zero count) and 1 (a positive count). Since the actual counts are mostly 0 and 1, very little information would be lost and the methodology of binary time series could then be applied (see [1]). One could also apply the "parameter-driven model" discussed in [7]. Here, the counts are conditioned on a "latent process", and the dynamics within the count series are accounted for in the correlation structure of the latent process. In dealing with TSP, another approach would be to sum or average the asthma counts within each six-day period, and then study the relation between the TSP levels and these summed or averaged counts.

In future studies, we suggest that the levels of TSP be recorded every day, instead of only every sixth day, for the reason described in Section 4. In addition, a larger set of ER and AD data should be collected, possibly by including more hospitals and a longer time period.

Bibliography

[1] Kedem, Benjamin. (1980). "Binary Time Series". Marcel Dekker, Inc.
[2] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (B), 39, 1-38.
[3] Knight, K., Leroux, B., Millar, J., Petkau, J. (1988). Air pollution and human health: A study based on hospital admissions data from Prince George, British Columbia. Report prepared under Contract No. 1754 with the Health Protection Branch, Department of National Health and Welfare, Ottawa, Ontario.
[4] Knight, K., Leroux, B., Millar, J., Petkau, J. (1988). Air pollution and human health: A study based on emergency room visits data from Prince George, British Columbia. Report prepared under Contract No. 1977 with the Health Protection Branch, Department of National Health and Welfare, Ottawa, Ontario.
[5] McCullagh, P. and Nelder, J.A. (1983). "Generalized Linear Models". Chapman and Hall, London.
[6] Priestley, M.B. (1981). "Spectral Analysis and Time Series". Academic Press, London.
[7] Zeger, S. L. (1988). A Regression Model for Time Series of Counts. Biometrika, 75, 4, pp. 621-9.

Figure 1. Histograms of ER and AD. Panels: (a) Histogram of ER; (b) Histogram of AD.
Figure 2. Comparison of Whitened vs Unwhitened Spectra. Panels: (a) For ER; (c) For TRS. Horizontal axis: frequency.
Figure 3. Coherencies between Whitened Series.
Figure 4. Proposed Spectra of ER. Panels: (a) Spectrum of ER via GHP (axis: frequency); (b) Proposal 2 Spectrum of ER (axis: frequency); (c) Proposal 1 Spectrum of ER (axis: days).
Figure 5. Proposed Spectra of AD. Panels: (a) Spectrum of AD via GHP (axis: frequency); (b) Proposal 2 Spectrum of AD (axis: frequency); (c) Proposal 1 Spectrum of AD (axis: days).
Figure 6. Comparison between the Whitened and Unwhitened Spectra of ER. Panels for Lag 0 to Lag 5. Horizontal axis: frequency.
Figure 7. Comparison between the Whitened and Unwhitened Spectra of AD. Panels: (a) Lag 0; (b) Lag 1; (c) Lag 2; (e) Lag 4. Horizontal axis: frequency.
Figure 8. Comparison between the Whitened and Unwhitened Spectra of TSP. Horizontal axis: frequency.
Figure 9. Coherencies between Whitened ER and Whitened TSP. Panels: (a) Lag 0; (e) Lag 4. Horizontal axis: frequency.
Figure 10. Coherencies between Whitened AD and Whitened TSP. Panels: (a) AD with Lag 0; (e) AD with Lag 4. Horizontal axis: frequency.
The relationship between asthma and pollution levels in Prince George, British Columbia Li, Bing 1989-12-31
Item Metadata
Title | The relationship between asthma and pollution levels in Prince George, British Columbia |
Creator |
Li, Bing |
Publisher | University of British Columbia |
Date | 1989 |
Date Issued | 2010-08-21T20:20:54Z |
Description | Three methods are used to analyse the relationships between asthma and pollution levels in Prince George. The first is the spectral analysis approach, which concentrates on the spectra of the pollution level series and the asthma counts series, their cross spectra, their coherency, and a measure of the linear relationship between them. The other two methods are what we term the "generalized harmonic process" and the "generalized autoregressive approach", which generalize the traditional harmonic process and autoregressive model and allow us to analyse time series under less restrictive distributional assumptions, such as discrete time series. The analyses suggest a weak positive association between asthma and the pollution levels of total reduced sulphates. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project |
Date Available | 2010-08-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0097525 |
URI | http://hdl.handle.net/2429/27582 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |