STATISTICAL ANALYSIS OF T H E TEMPORAL -SPAT IAL S T R U C T U R E OF pH L E V E L S F R O M T H E MAP3S / PCN MONITORING N E T W O R K by NHU DINH LE B.Sc, The University of British Columbia ,1984 A THESIS SUBMITTED IN PARTIAL F U L F I L L M E N T OF T H E REQUIREMENTS FOR THE D E G R E E OF M A S T E R OF SCIENCE in T H E FACULTY OF G R A D U A T E STUDIES (The Department of Statistics) We accept this thesis as conforming to the required stj>adprd T H E UNIVERSITY OF BRITISH COLUMB IA August, 1986 © Nhu Dinh Le, 1986 In p r e s e n t i n g t h i s t h e s i s i n p a r t i a l f u l f i l m e n t of the requirements f o r an advanced degree a t the U n i v e r s i t y of B r i t i s h Columbia, I agree t h a t the L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e f o r r e f e r e n c e and study. I f u r t h e r agree t h a t p e r m i s s i o n f o r e x t e n s i v e copying of t h i s t h e s i s f o r s c h o l a r l y purposes may be granted by the head of my department or by h i s or her r e p r e s e n t a t i v e s . I t i s understood t h a t copying or p u b l i c a t i o n o f t h i s t h e s i s f o r f i n a n c i a l g a i n s h a l l not be allowed without my w r i t t e n p e r m i s s i o n . Department of s t a t i s t i c s The U n i v e r s i t y of B r i t i s h Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 Date August 20, 1986 E - 6 (3/81) A B S T R A C T The approach developed by Eynon-Switzer (1983) to analyze the spatial-temporal structure of a data set obtained from the EPRI moni-toring network is applied to a data set obtained from the M A P 3 S / P C N monitoring network. In this approach, a spatio-temporal stochastic model, including deterministic components for seasonal varia-tion and rainfall washout, is fitted to the data. The results indicate that the model fails to capture some of the features of the underlying structure. In an effort to identify an appropriate model for the data, we examine the raw data in detail. An A N O V A model is fitted to the data. Different criteria such as Akaike, Schwarz, Mallows, etc, are used to identify the 'best' submodel (i.e. eliminate some terms in the full A N O V A model). The results indicate that it is possible to capture the deterministic component of the model with a much smaller model (i.e. fewer parameters). The normality of the residuals is also examined. The results indicate that the data from all stations except one can reasonably be approximated as coming from normal distributions. ii T A B L E O F C O N T E N T S A B S T R A C T tt T A B L E O F C O N T E N T S tit LIST O F T A B L E S v LIST O F FIGURES vt A C K N O W L E D G E M E N T vtt 1) I N T R O D U C T I O N 1 1.1) Inference for Spatial-Temporal Data 1 1.2) Data Set Description 3 1.2.1) pH levels 4 1.2.2) Rainfall volumes 5 1) E Y N O N - S W I T Z E R A P P R O A C H 7 2.1) Introduction 7 2.2) The Eynon-Switzer Model 8 2.3) Preliminary Fitting of The Model 10 2.4) Spatial and Temporal Components 17 2.4.1) Fitting the temporal variogram 18 2.4.2) Fitting the spatial variogram 21 2.5) Fitting Without Station 020b 24 2.6) Kriging With a Different Spatial Variogram 26 2.7) Conclusion ' 29 3) D A T A E X P L O R A T I O N 31 3.1) Factors to Be Examined 31 3.2) Preliminary Examination of The Structure of The Data . 33 3.2.1) Relationship between pH and Month, iii pH and Station 34 3.2.2) Relationship between Volume and Station, Volume and Month 39 3.2.3) Relationship between pH and Volume 44 3.3) An A N O V A Model 46 3.3.1) Fitting the A N O V A model 47 3.3.2) Deterministic component 52 a) Multiple correlation coefficient 54 b) Adjusted R2 55 c) Mallows' Cp statistics 55 3.3.3) Stochastic component 60 3.4) Conclusion 63 4) C O N C L U S I O N 65 R E F E R E N C E S 67 iv LIST O F T A B L E S T A B L E 1.1 Site location: The M A P 3 S / P C N monitoring network 69 1.2 Summary of data 70 2.1 The fitted constant coefficients obtained by fitting the Eynon-Switzer model 71 2.2 A N O V A and median decompositions on a, b, c, and d 72 2.3 The ratio of residual SS's of the corrected and uncorrected series of pH levels 76 2.4 Sums of squares errors for different models 77 2.5 The results of fitting the temporal variogram Vp(w) to the data, where Vp(w) = gi/(l + [w2/g2]~1) 78 2.6 Temporal variogram estimated with data of each year and each station using Vp(w) — ave[^(pH(xi, tj) — pH(xi, tj>))2] 79 2.7 The residuals and effects obtained by decomposing the estimated coefficients g! using A V E R A G E and M E D I A N 80 2.8 The estimated value of or 81 3.1 Station effects, using A N O V A and Median decompositions (pH levels) 82 3.2 Monthly effects, using A N O V A and Median decompositions (pH levels) 82 3.3 Station effects on rainfall volumes, using A N O V A and Median decompositions 83 3.4 Monthly effects on rainfall volumes, using A N O V A and Median decompositions 83 v LIST O F FIGURES y F I G U R E 1.1 Field pH 84 1.2 Rainfall volumes 85 2.1 The pH monitoring networks 86 2.2 Phase and amplitude of seasonal pH variation 87 2.3 The estimates of the unsmoothed temporal variogram obtained for data of station 072a, in 1982 90 2.4 The estimates of the unsmoothed spatial variogram for different station pairs 91 2.5 Same as 2.4, data from station 020b removed 91 2.6 Contour maps for pH 93 3.1 Station effects (pH levels) 94 3.2 Monthly effects (pH levels) 95 3.3 Station effects (rainfall volumes) 96 3.4 Monthly effects (rainfall volumes) 97 3.5 Scatter plot of adjusted volumes and adjusted pH levels, all data combined 98 3.6 Scatter plot of adjusted volumes and adjusted pH levels for each station separately 99 3.7 Normal plots for each station separately 103 3.8 Normal plots for each year separately at station 020b 108 vi A C K N O W L E D G E M E N T S I would like to thank Professor A. J. Petkau for his guidance and assistance in producing this thesis. I would like to express my gratitude to Professor J. V. Zidek for his careful reading and helpful comments on this work. I am indebted to Drs. A. Olsen and C. Watson of Batelle Pacific Northwest Laboratory for making the MAP3S/PCN data set available to me. I also would like to express my appreciation to the U B C Acid Rain group, particularly Professors P. Guttorp and P. Sampson, for extensive input during the course of this work. This work was partially supported by the Environmental Protection Agency through a co-operative research agreement with SIAM's Institude for Mathematics and Society (SIMS) and by the Natural Sciences and Engineering Research Council of Canada. vii 1) I N T R O D U C T I O N 1.1) Inferences for Spatial-Temporal D a t a Given a data set (pH levels in this case) obtained from a set of monitoring sites, it is often necessary in environmetrics to make inferences about events at nonmonitored sites. The solution of this problem usually involves some interpolation techniques (i.e. finding the 'best' estimates for the data at nonmonitored sites). Kriging (Delhomme 1978) is a simple method for interpolation to obtain contour maps over the whole region which contains the monitoring sites. However, the spatial covariance function for the whole region is required in order to do kriging. The spatial covariance function can possibly be modelled by using the estimated values of spatial covariances between pairs of monitoring sites, which in turn depend on the spatial-temporal stochastic model used to explain the raw data. Successful use of this scheme for inferences concerning spatial-temporal data requires an appropriate spatial-temporal stochastic model for the data. l In chapter 2 of this thesis, the approach followed by Eynon and Switzer (1983) is applied to the data set obtained by the MAP3S/PCN monitoring network. The results indicate that the spatial-temporal stochastic model developed for the variability of pH levels in their paper fails to capture some of the features of this independent data set. However, for the sake of illustration, we carry the approach through to its logical conclusion and obtain by kriging, contour maps for this data set. In an effort to identify a more appropriate spatial-temporal stochastic model for the data set, the underlying structure of the deterministic components of the data are carefully examined in chapter 3. The results show that there is some seasonal structure in the pH levels. The pH levels are not influenced by the rainfall volumes which show no seasonal structure of their own. A saturated ANOVA model is fitted to the pH levels and the importance of each component of the model is examined. The results suggest that the deterministic components of the pH levels can be captured by a simpler model than the saturated one. Many criteria for comparing different models are also described. Many of the models and methods of analysis employed in environ-metrics are based on assumptions of normality; therefore, the structure 2 -of the residual stochastic component is examined by simple normal probability plots. The results indicate that the data from all sites but one have residual variation which can be reasonably approximated as being normally distributed. 1.2) Data Set Description The MAP3S/PCN network has nine monitoring stations which are located in the northeastern part of the United States. Data were collected for each day on which rainfall occurred from 1976 to 1982; one data record was collected per storm (i.e. event data). However, the stations started collecting data at different times; only data obtained at stations which were active for the entire year are considered (i.e. data obtained in the first year of operation, but with start-up time much later than January 1st of that year are ignored). About five percent of the data are missing (i.e. there was a storm but no data were reported for the pH level or volume); missing data are ignored in this work. As in the Eynon-Switzer analysis, only (field) pH level and Volume measurements are used although the data set provides much more information (e.g. Conductivity, Sulfite, Sulfate, H+,...). The variables considered are Time (t, t=0 on Jan 1**, 1976 and the unit is a day), Volume (rainfall volume was obtained from a raingauge and 3 measured in millimeters) and p H level. T h e station locations, the first active dates, and the A D S (Acid Deposition System) site identity are given in Table 1.1. Table 1.2 provides the number of rainfall events and the averages and standard deviations of the corresponding p H levels and rainfall volumes for each station in each year. 1.2.1) p H levels T h e standard deviations of the p H levels are not entirely consistent from year to year and from station to station. T h e standard deviation of the data at station 020b (Illinois, Illinois) is consistently larger than that at other stations; this excess variability is easily seen in the plots of the raw data (Figures l . l . a - l . l . d ) . Station 065a (Penn State, Pennsylvania) exhibits a large standard deviation (.61) in 1980 due to an outlying data-value (a pH level of 7.31 in December 1980; see Figure l . l . d ) ; the standard deviation calculated without this single p H level is .30, a value which is perfectly consistent with the standard deviations of other years at this station (Table 1.2). T h e data corresponding to four stations (020b, 044a, 048a, 065a) are plotted in Figure 1.1 ( p H levels against Time) . These four plots do not demonstrate any obvious long-term relationship between p H levels 4 and Time; of course, seasonal patterns may be obscured by the severely compacted Time axis. However, the plots do show differences in the data corresponding to different stations. Station 020b (Figure 1.1.a) has a large variability in its pH levels; station 048a (Figure l.l.c) has distinctly smaller variability. Stations 044a and 065a (Figures 1.1.b and l.l.d) display the same overall pattern of relatively small variability. One difference is that there are some possible outliers in the data obtained at station 065a. The plots of the data of other stations (not included in this report) are similar to one of the above figures. 1.2.2) Rainfall volumes The average rainfall volumes vary considerably from year to year and from station to station (Table 1.2). The average rainfall volumes in 1979 are greater than those of other years at all stations but stations 044a and 048a. The standard deviations of the rainfall volumes are large (Table 1.2), and vary greatly from year to year and from station to station. At most stations the standard deviations in 1979 are exceptionally large relative to those of other years; for example, at station 013a, the standard deviation is 29.49 in 1979 while those in other years are less than 15.28. Rainfall volumes are plotted against time for each station separately to examine whether there is any 5 obvious pattern in the data. The resulting plots do not suggest any obvious long-term relationship between the rainfall volumes and Time. Overall, all plots are similar; the plots of four stations (020b, 044a, 048a, 065a) are presented in Figure 1.2. The great variabilites in the rainfall volumes are clearly demonstrated in all plots. There is one possible outlier at station 020b in August of 1979; the rainfall volume is 157.14 mm (the rest are less than 90.00 mm) and the corresponding pH level is 4.22. 6 2) E Y N O N - S W I T Z E R A P P R O A C H 2.1) Introduction One general objective of the study of rainfall acidity is the deter-mination of 'best' estimates of a spatial process describing geographic fluctuations in the acid levels of rainfall at some unobserved locations using the observed data obtained at a limited number of stations moni-toring rainfall acidity. These estimates would allow consideration of the optimal design of networks to monitor the long term and potentially damaging effects of the components of acid rain. An approach for finding these estimates is provided by Eynon and Switzer (1983). In their paper, a parametric model is developed using data obtained from the Electric Power Research Institute (EPRI) network. A spatio-temporal stochastic model, including deterministic components for seasonal variation and rainfall washout and stochastic components for spatial, temporal, and measurement variation, is fitted 7 to the data. Using the method known as Kriging, the best linear unbiased estimates (BLUE) of seasonal and rainfall adjusted yearly average pH over the monitoring region are obtained. It should be noted that Eynon and Switzer use the data set obtained by the EPRI network which also has nine stations, each with two years of data. The geographical locations of stations in both networks are presented in Figure 2 . 1 . Although the geographical locations of the stations in the two networks differ, both networks cover essentially the same area. One difference is that rainfall volume is measured in inches in the EPRI network. In this chapter, we present the results of an attempt to validate the Eynon-Switzer model using the data obtained by the M A P 3 S / P C N monitoring network. The attempted validation follows the Eynon and Switzer paper in a step-by-step fashion. We also present some of the corresponding results from their paper using data obtained from the EPRI network. 2.2) The Evnon-Switser M o d e l Eynon and Switzer propose the following model under which a 8 single year's data are analyzed : -logl0H+{x,t)=pH{x,t) i \ nr.\ t \ • 27rt 2irt = <*(x) + P(t) + a{x) sin — + b (x) cos — dob o65 , c-I(x.t) . ,, + ' ° s . o 1 _ e x p [ _ c . ; ( I > i ) l + ^ . o , where x denotes the two-dimensional location under consideration, t denotes the number of days starting from January 1**, 1976j I(x,t) denotes the observed rainfall in millimeters at location x on day t (Eynon-Switzer used inches in their analysis), pE(x,t) denotes the observed rainfall pH at location x on day t, given I(x, t) > 0, H+(x,t) denotes the hydrogen-ion concentration at location x on day t in gram-atoms per litre, a(x) is a stationary auto correlated spatial process describing the geographic fluctuations in annual pH level, P(t) is a zero-mean stationary autocorrelated time process describing temporal fluctuations not attributed to seasonal variation, independent of a(x), e(x}t) is a zero-mean stationary white-noise measurement error process, independent of a(x) and /?(£), a(x), 6(x)are location-specific model constants which describe seasonal fluctuations, and c is a washout rate constant. The correction for rainfall volume (i.e. the logarithmic term in the pH model) represents the simplest scavenging process in which the number of hydrogen ions picked up per unit time per unit volume of rain is proportional to the remaining atmospheric concentration. This process produces lower acidity (higher pH) for larger rainfall volume. The limit of the logarithmic term is zero as the rainfall volume tends to zero. Since /?(£) and e(x,t) are zero-mean residual processes, a(x) may be interpreted as the seasonally adjusted maximum acidity potential at location x. 2.3) Preliminary Fitting of The Model In this section all parameters in the model are estimated by using the B M D P statistical computer package; the program used is Derivative Free Non-Linear Regression, P :AR. For each of the study years, the separate fitting of the data to the model proceeds in several stages. First, the constant coefficients a(x),b(x),c(x) are fitted by non-linear least squares separately at each station location i=l,...,9. The model used for the data (corresponding to events at times tj in the study 10 year under consideration) being fitted in this step is: pH(x{, tj) = a(x,) sin 2lrfy + b(xi) cos 2nt3-+ log c(xi) • I(Xi,tj) 365 365 10 1 - exp[-c(x t ) • I[xi,tj)] +d{xi) + e(xi,tj). The values of the fitted parameters a(x,), 6(x,), c(x,), d(x,) for each year, and each station are presented in Table 2.1. Note that there are some negative values of e's at stations 020b and 057a. These values should always be positive in principle since c is defined as a washout rate. However, the magnitudes of these values are always very small. These negative values could possibly be the result of the round-off errors in the nonlinear fit of the model. Figure 2.2 shows the phases and amplitudes of the nine fit-ted seasonal curves for each year separately. Amplitude is calcu-lated as \fcfl + S2, and phase is the value of t which minimizes d sin | | | + 6 cos |||. Although most of the minimum points of the seasonal curves occur in the summer, the amplitudes and the times at which the minimum occurs for any station do not show great con-sistency from year to year. This fact is supported by the results presented in Tables 2.2.1 and 2.2.2 obtained by applying the A N O V A and the M E D I A N P O L I S H I N G decomposition techniques to the fitted 11 coefficients a ( i , ) , Results in the Eynon-Switzer paper suggests a similar problem. Two figures (similar to Figure 2.2) in their paper show the same inconsistency described above. T h e results obtained by applying the techniques described above to the fitted coefficients c(a;,), d(x,) are also presented in Tables 2.2.3 and 2.2.4. Each table contains two parts. In Part (a) the average is used to do the decomposition. T h e coefficients are considered as the entries in a two-way table (yearxstation). Each entry could be represented as the sum of overall average, station effect, year effect and residual. T h e decomposition, then, is done by using an A N O V A - t y p e approach; for example, the year (row) effect is taken to be the difference between the row average and the overall average. In Part (b) the Median Polishing approach is used to do the decomposition. Each entry in the yearxstation table is replaced by the difference between itself and the row median. T h e median of these row medians is taken to be the overall effect. T h e year effect is taken to be the difference between the row median and the overall effect. The station effect is taken to be the column median of the new table. T h e residual is then the difference between the entry of the new table and the station effect; the sum of these effects and the residual results in the original entry of the two-way table. 12 In Tables 2.2.1 and 2.2.2 of estimated values of the seasonal co-efficients a(x) and b(x), there are many large residuals. These values appear at different stations and different years without any apparent pattern. Some year effects are much larger, in magnitude, than others. This suggests a considerable interyear variation in the year effect The station effect behaves the same way. Therefore, we may infer that the periodic coefficients a(x) and b(x) are inconsistent from year to year, and from station to station. Note that if the proposed model fitted the data well enough, then one would expect the periodic coefficients to be consistent at least from year to year. This suggests that these data do not have the periodic behaviour postulated in the model. We also apply the above decomposition techniques to the yearxstation tables of phases and amplitudes. The results which are not included in this thesis support our earlier conclusion that the phases and amplitudes are somewhat inconsistent from year to year. It is not surprising that the estimated values of d(x) (decomposed in Table 2.2.4) show little consistency from year to year, and from station to station. The overall effect, d(x)t should be different from station to station and possibly also from year to year. Table 2.2.3 shows some consistency in the estimated values of c. 13 There is only one large residual (at station 065a, in 1980) which is greater (but not very much) than the other residuals. The magnitudes of the year effect are approximately the same. The station effect, however, shows some variation. There are two relatively large values of the station effect which correspond to stations 020b and 048a. This points to possible differences in station effects. Besides, some of the residuals corresponding to stations 048a and 065a vary considerably from year to year (and also from station to station). Therefore, estimates of a common washout rate c obtained by using the data from all of the stations in each year separately, may not be consistent from year to year in conflict with a suggestion in the Eynon-Switzer paper; this is now examined in more detail. Following Eynon and Switzer, we fit a common value of the washout rate parameter c across all stations for each year separately. Using the previous estimates of a(x,), 6(xt), d(x,) (different for each station), we fit a common washout rate c for each year separately. The fitted values of the parameter c are: Year 1977 1978 1979 1980 1981 1982 A c .0405 .0528 .0189 .0555 .0304 .0489 se(c) .0050 .0060 .0037 .0067 .0095 .0049. 14 Here se(c) denotes the nominal asymptotic-theory standard error of c, based on independence assumptions on the residuals. As a rough check for significant year to year differences in these c-values, we apply the Newman-Keuls procedure. The c-values are assumed to be independent, with a common standard error of approximately .006 (this value is the root-mean square of the standard errors displayed above). While the assumptions giving rise to this procedure are not satisfied exactly, nevertheless, the results give us some rough indication of the behaviour of these c-values. As expected, there are some statistically significant differences (at the five percent level). Specifically, the c-value obtained for 1979 is significantly different from the c-values obtained for 1977, 1978, 1980, 1982; and the c-value of 1980 is also significantly different from the c-value of 1981. Note: The fitted values of the common washout rate parameter c given by Eynon-Switzer are: c = 0.92 for year 1 (Sept. 78 - Aug. 79), = 1.10 for year 2 (Sept. 79 - Aug. 80). These values correspond to rainfall volume measured in inches. To make these values comparable with our values of c (corresponding to volume measured in millimeters), we divide c by 25.4 (1 inch = 25.4 15 millimeters). T h e converted values of Eynon-Switzer's c are respectively c = 0.92/25.4 = .0362, and c = 1.10/25.4 = .0433. These values of c agree with ours. Specifically, they are about in the middle of our range of values of c. T h e combined corrections for rainfall volume (using a common washout rate c for each year) and seasonality reduce the overall variation of an annual p H series; the residual standard deviation of a series is reduced by 0 - 30 percent from that of the corresponding uncorrected series. T h e ratios of sums of squares of residuals for the corrected series to that of the uncorrected series are presented in Table 2.3. In some cases, essentially no reduction is obtained by using the combined corrections. T h e reductions corresponding to station 020b in different years are always very small. The results presented in Table 2.3 do not show any obvious relationship between the percentage of the reduction and the stations or the years. We note that according to Eynon and Switzer, these corrections produce a 0 - 20 percent reduction of the residual standard deviation with their data. In addition, they note that each correction separately contributes about one-half of the reduction. T o see whether this is also true for 16 our data, we fit the full model and the reduced model in which the volume correction is omitted (i.e. with only seasonal terms) to eight different data sets. Each data set corresponds to one year at a specific station. T h e results are presented in Table 2.4. The reduction in the residual sum of squares due to the seasonal correction is much higher in some cases and lower in others than the reduction due to the volume correction. It is not obvious here whether each correction separately contributes about one-half of the reduction. 2.4) Spatial and Temporal Components W i t h the estimated parameters a(x,), 6(x,) for each station, each year and the estimated common washout rate c for each year, the residual for the station at location x,- for collection day tj is given by: — • 2 T T £ J » . . 2ntj pH (xi, tj) = pH(xi, tj) - a(xi) sin - 6(x.) cos ~ l o g i o 365 v " 365 C(X.) • I{Xi,tj) 1 - exp[-c(x.) • I(xi,tj)]' These pH(x{,tj) values are treated as estimates of the combined random components a(x,) + fi(t3) + e(zi, tj). Define: Va(u) = E[a(x) — a(x + u)] 2/2 : the variogram of the spatial process, and 17 Vp(tv) — E\(3{t) — B(t + tu)]2/2 : the variogram of the temporal process. These quantities describe the geographic and temporal variation of daily 'corrected' pH readings. Eynon-Switzer indicate that for convenience, the following two simple parametric models for the variograms were fitted: V a { u ) = l + [ u ' M - i ' where <7i,/ii,<?2 a r e non-negative constants and h2 is a symmetric positive definite 2 x 2 matrix. The contours of the spatial variogram Va(u) are concentric ellipses centered at the origin. The models satisfy the constraints of positive defmiteness required of all variograms. The above variogram models are also used to fit our data as described in what follows. 2.4.1) Fitting the temporal variogram; The corrected pH values, pH(xi,t3), are used to estimate an un-smoothed variogram; l/2[pH(xi,tj) - pH(xi,t3i)]2 estimates Vp(w) + of , 18 for w = tj — tji, where o f - var(e (x, - , tj)). Note that Eynon and Switzer estimate o f by using duplicate pH readings. In fact, the replication variability was estimated separately at each station for each year. There were modest differences among these estimates but not so large as to warrant modelling o f as a function of location. In any event, their estimate, o f , turns out to be small relative to temporal and spatial variability, and a common value of o f = .019 was obtained with their data. In our data, since no replicate observations are available, o f is absorbed into the temporal variogram. So l/2[pH(xi,tj) — pH(xi,tj>)]2 estimates Vp(w). All rainfall days separated by a lag w (at a fixed station, separately in each year) are used to estimate Vp(w). The scatter plot of the estimates of the unsmoothed temporal variogram at station 072a in 1982 is presented in Figure 2.3. All possible estimates are plotted; there are 1653 data points. Many of these estimates are near zero, but the variability of the temporal variogram estimates is very large. This pattern is, however, a typical one for all other plots of the temporal variogram estimates at different stations. Eynon and Switzer observe that the fitted value of the constant g2 19 could not be distinguished from zero with their data. To see whether this is true with our data, we fit the variogram model i +(_.2/ g 2)-i to four different data sets corresponding to different combinations of year and station; the results are presented in Table 2.5. Our results support the conclusion of Eynon and Switzer (that the nominal asymptotic standard error of the estimate is always bigger than the estimated value). We would propose to let <j_ = 0. It may be inferred that the temporal variogram has a constant value gfi. Using the data from all years and all stations, the estimated value of gi is gi = .1287. In Table 2.6, different constants gi which are estimated using the data from each year and each station separately are presented. Applying the previous decomposition techniques to these values, we obtain the results presented in Table 2.7. There are some unreasonably large values of residuals corresponding to station 020b. The effect of station 020b is 0.31. This is very large relative to the other values whose magnitudes do not exceed 0.08. This suggests an unusual pattern in the data obtained at station 020b; recall the excessive variability of the data at station 020b noted earlier. Specifically, the values of gi = 0.805 in 1979 at station 020b is very large relative to all others. Except for these consistently large values of station 020b, the results do not show any obvious pattern between the remaining 20 values. Note: O u r value of gx of .1287 is bigger than the value of gi + of = .079 + .019 = .098 obtained by Eynon-Switzer. One possible reason for this is that we appear to have outliers in our data. T h e excessive variability of the data at station 020b might be another possible reason. 2.4.2) Fitting the spatial variogram; T h e corrected p H values, pH(x{,tj) are used as follows to obtain the unsmoothed variogram: l/2[pH(xi,tj) - p~H(xi>,tj)]2 estimates Va(u) +Vp(0+), where u is the vector joining station pairs (a;,-, £ ,< ) . Vp(0+) comes into the expression because the time unit was previously chosen as one day, and two readings of p H levels at two stations on a same day (i.e. same tj) might not have the identical collecting times so that the time lag is always 0 + . Hence l/2[pH(xi,tj) — pH(xi>,tj)]2 — gi estimates V a ( u ) . O f course, Va(u) can be estimated directly only for arguments u corresponding to two stations with events recorded on the same day. 21 With our data set, there are 2848 estimated values for the unsmoothed spatial variogram corresponding to two-dimensional vectors u joining the 36 station pairs. An attempt to fit Va(u) to this data set failed because the algorithm failed to converge. We note that Eynon and Switzer do not report any difficulties in fitting the spatial variogram. To find the estimated values of h\ and h2 for our data set, we try another approach. For each station pair, the simple average of all the available estimates of the unsmoothed spatial variogram is considered as a new data point. The number of unsmoothed estimates entering this average is treated as a case weight. This new data set has 36 data points, with a case weight attached to each point. However, the attempt to fit Va(u) to this new data set also fails; each of three different algorithms, Gauss-Newton and Marquardt algorithms (SAS statistical computer package), and Derivative-Free Non Linear Regression (BMDP package), fails to converge to a solution. The plot of this new data set is in Figure 2.4. The plot, however, shows that the proposed spatial variogram model is totally inappropriate for our data set. There are many negative estimates possibly because 22 of the large estimated value of Vp(0+) (i.e. gi = .1287). And there are also some unreasonably large values of the unsmoothed spatial variogram estimates. These estimates are always from those station pairs where one of the two stations was station 020b (Illinois, Illinois). The difficulties encountered in this fitting may be due to these values. This is another indication of the atypical behaviour of the data obtained at station 020b. The plots of the unsmoothed spatial variogram estimates for each year separately (not included in this report) have approximately the same pattern as the above. The final step in the Eynon-Switzer approach is the interpolation step, called kriging. To carry out this step, an estimated spatial variogram model is required. Since there have been indications that station 020b could be the source of many problems that we have encountered in applying the approach to this data set, we will repeat the previous steps of the approach to the data set with station 020b removed. This may allow us to find estimates of the parameters hi and h2 in the spatial variogram model used by Eynon and Switzer. Besides, we would like to find out how well this approach work if we remove station 020b. This will be done in the next section. 23 2.5) Pitting Without Station 020b In this section we examine whether we can obtain estimates of the parameters hi and h2 from the data with station 020b removed. We repeat the previous steps of the Eynon-Switzer approach on the reduced data set and present the results in what follows. The estimates of parameters a,b,c,d in the Eynon-Switzer model are the same as before. Now we use these estimates to fit a common c value for each year separately. The fitted values of the parameter c are: Year 1977 1978 1979 1980 1981 1982 c .0405 .0622 .0297 .0654 .0492 .0530 old c .0405 .0528 .0189 .0555 .0304 .0489 se(c) .0050 .0054 .0033 .0060 .0044 .0049 where the notation is the same as in section 2.3. The estimated values of c for the years in which the data from station 020 are removed turnout to be larger than those obtained earlier; the largest increase, 0.0188, is in 1981. As before, we apply the Newman-Keuls procedure to check for significant year to year differences in these c-values. An approximate common standard error of c is about 0.005. The result obtained in this step is generally similar to that in section 2.3 although there are differences. Specifically, the c-value for 1980 24 is no longer significantly different from the c-value of 1981, and the c-value obtained in 1979 is significantly different from all the rest. We use estimated values of a, b and a common c to estimate the temporal variogram Vp as in section 2.4.1. As before, the fitted value of g2 is negigible (i.e. the temporal variogram would have a constant value gi). Using all the data together (excluding station 020b), the estimated value of gi is 0.095; this value is comparable to the Eynon-Switzer estimated value of ffi+of = 0.098. This supports our claim that our earlier estimated value of g\ is larger than Eynon and Switzer's because of the excessive variability of the data obtained at station 020b. Using this gi, we obtain the unsmoothed estimates of the spatial variogram as in section 2.4.2. Our attempt to fit the model for Va to these estimates fails (i.e. the nonlinear regression fit fails to converge to a solution for h\ and h2) using all the different approaches described in section 2.4.2. To examine the unsmoothed estimates of the spatial variogram, we plot them in Figure 2.5. The points with unusually large values in Figure 2.4 do not appear in this plot; however, the proposed model 25 for the spat ial variogram st i l l does not seem to be appropriate. Note that there are st i l l many estimated values of the spatial variogram which are negative. Th is suggests that we may st i l l be overestimating the temporal var iogram gi, although our estimated value of gi is now comparable to Eynon and Switzer's. So the removal of station 020b f rom the data set has not resolved the difficulty in obtaining estimates of the parameters hi and h2 in the Eynon-Switzer spatial variogram model. Consequently, the interpolat ion procedure called kriging, undertaken by Eynon-Switzer could not be carried out with this spatial variogram. However, to demonstrate the complete approach of Eynon-Switzer, we consider an alternate spatial variogram model in the next section. 2.6) Kriging With a Different Spatial "Variogram We have learned from our earlier analysis that the Eynon-Switzer model does not appear to be appropriate for this data set. Consequently, the interpolation step based on the residuals from this model seems inappropriate. However, to demonstrate the considerable value of the Eynon-Switzer approach in analyzing spatial-temporal data (when the model is appropriate), we carry out the interpolation step with the 26 spatial variogram ) Va(u) = a-/(tt?_o) + 6* N l i where u is the vector joining station pairs; ||«|| is the length of u and J(u^o) 1S * n e indicator function. The parameters a and b are fitted with constraints a > 0, b > 0, using the above unsmoothed estimates of the spatial variogram. The estimated values of a and b are 0.000 and 0.005 respectively. For each year, the estimated values of the realized spatial process ct(x) at the station locations x = x,- can be obtained by <*(*») = — 2 — , Tli where n,- is the number of rainfall days at station i in the given year. The estimates a(x,) for each year are presented in Table 2.8. Because of the rainfall volume correction, these estimated values are smaller than the simple averages of the uncorrected pH levels (station 020b in 1979 is an exception; this may be due to the excessive variability of pH levels at this station 020b in 1979). At other stations, the differences range from 0.01 to 0.20 and they vary from year to year and from station to station. To put the corrected pH values of Table 2.8 on the original scale we would need to add a number of pH units corresponding to the washout associated with the average daily rainfall at that station in that year. Although the washout rate c is assumed to be common across stations in each year, the large 27 differences in average daily rainfall across stations in each year mean that these adjustments will differ somewhat from station to station in each year. For example, in 1979 when the estimate of the washout rate is c = .0189, these adjustments range from 0.05 to 0.10. Similarly, in 1982 when c = .0489, the adjustments range from 0.08 to 0.22. It is important to add these adjustments to the corrected p H levels before doing the interpolation because estimates at unobserved locations (i.e. contour maps) will be substantially influenced by these values. Since the spatial-temporal model for the p H levels does not seem to be appropriate, resulting contour maps may be misleading. We would like to demonstrate how the kriging method works, but do not intend to take the interpolating results seriously. For this reason adjusting the corrected p H levels seems unneccessary. The interpolation step described below is carried out without first adjusting the corrected p H values. T o obtain estimates of the p H levels for unobserved locations (sub-sequently, contour maps), we use the interpolation technique called kriging (Delhomme 1978). The 'best' linear unbiased estimate of oc(x0) at an unobserved location x0 is given by 9 28 where the weights Ai,...,A9 are chosen to minimize v{x0) = E[a*{x0) - ot{x0)}2 subject to __]^»'( xo) = 1- With stationarity assumptions about the process underlying the data (cf. Journel and Huijbregts (1978)), the quantity v(x0) can be expressed as a quadratic form in the A,'s whose coefficients depend on the spatial variogram V a , the temporal variance parameter gi, and We minimized this quadratic form for each x0 on a fine grid over the eastern United States using the above estimated parameters for the spatial variogram. The contour maps of these resulting interpolated values are presented in Figure 2.6 (produced using the computer language S) for each of the four years from 1979 to 1982. The contour maps could possibly be useful in investigating the pattern of changes from year to year. However, as mentioned earlier, these results could be misleading, so attempting to draw conclusions from these results seems unwise. 2.7) Conclusion In this attempt to validate the Eynon-Switzer model using the M A P 3 S / P C N data we obtain some results which are consistent with those based on the EPRI data set. However, some of the results differ. Specifically, there are significant differences between the estimated values 29 of the yearly washout rate c. O u r estimated constant value of the temporal variogram Vp is larger than that reported by Eynon and Switzer, possibly because of some potential outliers in our data set (including station 020b). This is supported by the results obtained in section 2.5 using the data set with station 020b removed, where the estimated value of Vp is about the same as that obtained by Eynon and Switzer (.095 vs .098). The spatial variogram model proposed by Eynon-Switzer does not appear to be appropriate for the M A P 3 S / P C N network, possibly because of the differences in geographical location between the two networks. Overall, the indication obtained from our analysis is that the Eynon and Switzer model does not completely capture the structure of this data set. However, to investigate their complete approach throughly, we use an alternate spatial variogram model to obtain the contour maps for 1979 to 1982. But as e we have already concluded that the proposed spatial-temporal model for the p H levels appears to be inappropriate, drawing conclusions from these contour maps seems ill-advised. Since this approach is unsuccessful for this data set, we propose to examine the raw data in more detail with a view towards identifying structure which may be present. This will be done in the next chapter. 30 3) D A T A E X P L O R A T I O N We now start a careful examination of the data in an attempt to uncover any underlying structure which may be present. We examine a number of factors which may be expected to influence the pH readings and investigate the possibility of consistent relationships between pH levels and these factors. There may also be some relationships between these factors. 3.1) Factors to Be Examined As described earlier, there are about 3000 data points in the data set. Each data record contains the time of the event (the storm), the location of the station, the rainfall volume of the storm, and other information. Note that in this examination one pH reading of 7.31 at station 065a in 1980 is removed. The new mean and standard deviation for this year at this station are 4.04 and 0.33 respectively, while the old ones are 4.14 and 0.61 (Table 1.2). The removal of this obvious outlier makes the new standard deviation of this year consistent with other years at this station (about 0.3). 31 T h e location of the station should be one important factor. Most stations are located in heavily industrialized areas; therefore, the pollu-tants in the sky at each station are possibly influenced by the chemicals released from the factories in that area. There are also possibly some long-range transportation effects, that is tall remote smokestacks emis-sions are carried by strong winds at high altitudes over very long distances to the location. These contaminants may affect the acidity of the storm at the station under consideration. For these reasons, we expect the effects on the p H levels of the geographical locations of stations may be different from station to station. In our study, the station factor is a categorical variable with nine categories; each specifies a single station. Another important factor is the time of the storm. In some factories, the volume of production is seasonal (i.e. for some specific periods, production is very high, and in some other periods, production is low). During periods of high volume production, more contaminants are released into the sky. These substances may affect the p H reading of the storm. Further, during the summer season, forest fires may pollute the sky; these obviously change the chemistry of the storm, which will in turn affect the p H readings of that storm. For these reasons, we expect the time (i.e. seasonal) factor is important for 32 this study. T o account for seasonality, time is divided into years and months. Note that Egbert and Lettenmaier (1986) divided time into years and seasons (3 or 4 months each), but did not discover any obvious patterns in p H levels. For this reason, we use a shorter time period (i.e. one month) in this analysis. Another important factor is rainfall volume. It is generally believed that the p H reading varies monotonically with the rainfall volume. However, in the previous chapter, our observations of the effect of volume as represented by the scavenging term used by Eynon-Switzer (1983) do not support this belief. Therefore, we will examine more general relationships between rainfall volumes and p H levels. 3.2) Preliminary Examination of The Structure of The Data We first examine the relationships among Time, Volume, Station, and also between these factors and the p H levels. In this section we use monthly averages of p H levels and Volumes instead of individual values. T h e monthly average p H level (or Volume) is just the simple average over the events in each month. Since these monthly averages smooth the original data (i.e. reduce the variability), relationships may be more clearly exhibited in the averaged data than in the original (individual) data. If there are no obvious relationships in the averaged 33 data, relationships in the original data set seem unlikely. One potential problem with this approach is that taking averages could cancel out some effects which we may be able to detect in the individual data. However, this is unlikely to occur since with only a few observations available in each month (about 1-8 points), effects are very hard to detect even if they are present in the individual data. Note that relationships suggested by the monthly-average data can always be examined subsequently in the individual data. Working with the average data also has the practical advantages of reducing the size of the data set to about 500 data points from about 3000 data points. In the following sections, we examine the relationships among the factors in detail using the averaged data set. 3.2.1) Relationships between p H and M o n t h , p H and Station We would like to find out whether or not the Station and T i m e factors affect the readings of p H levels. The T i m e factor has two components: Year and Month . In each year, there are 12 monthly averages at each station. T o examine the relationships among these factors, we will look at the data of each year (all stations) separately. If there are obvious relationships in each year, we hope to be able to recognize how they change over the years. This would subsequently 34 help us to understand the relationships among the factors for all years of data. The monthly average data in each year are fitted with the following additive model: where pHij denotes the pH average of Month j at Station i, fi denotes the overall effect, or,- denotes the Station effects, i = 1 to 9, Pj denotes the Month effects, j = 1 to 12, €ij denotes the residuals. The residuals may contain not only the pure errors, but possibly also some other effects such as a rainfall volume. We proceed to obtain rough estimates of the parameters by using ANOVA-like and median polish decompositions which are described in detail in section 2.4. The monthly averages are arranged in a 2-way table; the factors are Month and Station. We fit the model by this rough method for years 1977 to 1982 separately. The estimated values of the parameters a and P for the years from 1977 to 1982 are presented in Tables 3.1 and 3.2 respectively. 35 T h e estimates of \i are: Year 1977 1978 1979 1980 1981 1982 A N O V A 4.14 4.10 4.27 4.18 4.14 4.21 Median 4.14 4.07 4.22 4.14 4.16 4.25 Note that similar estimates of /x are obtained by both methods. T h e monthly averages range only from about 3.5 to 5.5, and most of them are about 4. Therefore, the values of the overall effect obtained by taking the overall average and the median should be very similar. T h e differences of the estimates from year to year are not large. If we had many more years of data, then we could investigate whether there is a true yearly effect or a trend (pattern) over the years by examining the estimated values of /./. Unfortunately, these issues could not be examined thoroughly because we have only six years of data. Although the six estimated values of fi (i.e. for the six years) are similar in magnitude, it is not reasonable to ignore the yearly effect on the p H levels. Therefore, we still assume that there is some yearly effect on the p H levels. T h e station effects (&) for the different years are plotted against Station in Figures 3.1.1 ( A N O V A ) and 3.1.2 (Median Polish) to see if there is any consistent pattern from year to year. Similarly, the 36 monthly effects (f3) for the different years are plotted against Month in Figures 3.2.1 and 3.2.2. Note that the plots for the two methods of decomposition are qualitatively similar for both sets of estimates. Figure 3.1 shows that the station effects are quite different from station to station. In Figure 3.1.2, there are large positive effects from stations 013a (0.05-0.27) and 020b (0.12-0.34). There are also some large negative effects from station 057a; for example, the effect is -0.30 in 1979. The effects for the remaining stations are very small or centered around zero. The changes in the station effects from year to year are considerable for most stations; stations 048a and 072a are exceptions. In general, Figure 3.1.1 has similar features, although there are some differences. The effects do not change from year to year at station 013a. The effects vary more from year to year in Figure 3.1.1 than Figure 3.1.2. Overall, the effects are quite different from station to station in both plots. This observation supports our belief that the geographical location of a station is an important factor in explaining the pH levels obtained at that station. Note that at station 020b, both estimates of the effect of Station for 1979 are unusually large (Figure 3.1). The reason is that in November 1979, there was only one storm with an unusually high pH level of 7.12; 37 therefore, the monthly average of this month is greater than those of other months (7.12 compared to about 4.0-5.5). Moreover, the estimate for 1979 (Figure 3.1.1) based on the A N O V A decomposition is greater than that from the Median decomposition (Figure 3.1.2). This may be explained by the robustness of the Median Polish decomposition. The monthly average of November in 1979 is obviously an outlier. The decomposition using medians reduces the effect of the outlier on the estimated values of the parameters more than the decomposition using averages (ANOVA). Figure 3.2 shows some interesting results. The monthly effects vary considerably from year to year. However, the effects change in a somewhat consistent manner from month to month. Generally, the effects decrease slowly from January to August, increase sharply from September to November, and then decrease a little in December. Note that this general pattern only becomes apparent when the effects for all years are plotted together. In each year, the pattern of the effects is somewhat different from the general pattern. For example, the effect increases from January to February (sharply) in 1981 and from March to April in 1982; the general pattern suggests a decrease in both cases. These plots also indicate that the minimum of the monthly effects always occurs in the summer. This result supports the expectation 38 that levels of acidity increase in the summer. Overall, although there are differences in the patterns of different years, the general picture described above is the only apparent structure in the data. 3.2.2) Relationships between Volume and Station, Volume and Month T h e monthly average of the p H levels adjusted for these additive station, monthly, and yearly effects, denoted by pH^-, is pHi3 = pH^ -/*-<*,-- fa. T h e results which follow are based on the adjusted p H levels obtained by using the estimated values of /z, a, P provided by the median polish decomposition. We use the median polish estimates here because there are possible outliers in the data set." T h e effect of outliers on the estimated values of parameters is much less pronounced with the median decomposition than the A N O V A method; therefore, the median decomposition is more appropriate. Since we wish to determine the relationship between p H levels and Volumes, we proceed to find the station, monthly, and yearly effects on the rainfall volumes so that we can adjust the rainfall volume for these effects as we did for the pH levels in the previous section. This is done in what follows. 39 The relationship between Volume and Station and Month can be examined by an approach similar to that in section 3.2.1. In each year we fit the following model: Vi] = /* + 7 i + + e,y where Vij denotes the volume average in Month j at Station i, H denotes the overall effect, 7,- denotes the Station effects, i = 1 to 9, Uj denotes the Month effects, j = 1 to 12, €ij denotes the residuals. We obtain estimates of the parameters [i, 7 , and u by using the median polish and ANOVA-like decompositions. We fit this model to the data of each year separately from 1977 to 1982. The estimated values of /z are: Year 1977 1978 1979 1980 1981 1982 ANOVA 14.42 12.82 17.07 10.53 10.77 12.00 Median 15.32 14.03 20.31 11.00 12.39 13.43 The estimates of n are quite different from year to year. Therefore, we infer that there is a substantial yearly effect on the average volumes. The distribution of the monthly average rainfall volumes has a long left 40 tail in each year (all stations); consequently, the median is greater than the average. This is reflected in the above table where the estimates of fi based on the median decomposition are in every case larger than those based on the A N O V A decomposition. The estimated values of the parameters 7 and w are presented in Tables 3.3 and 3.4, respectively and are plotted in Figures 3.3 and 3.4 respectively. Figure 3.3.1 suggests some differences between stations. The effects vary greatly from year to year at some stations. Specifically, the effect for 1979 at station 013a is much greater than that for other years. Also, the effect for 1979 at station 048a is less pronounced than that for other years. The difference between 1981 and 1982 at station 171b is large (about 8). Figure 3.3.2 suggests a qualitatively similar general pattern although there are some differences. The effect of 1979 at station 020b is very small in Figure 3.3.2 but it is large relative to other years in Figure 3.3.1. This is due possibly to the fact that the average volumes are greatly different from month to month in 1979 at this station. As a result, the average and the median of the monthly rainfall volume averages of this year (station 020b) are greatly different; the average is about 20.71 mm and the median is about 7.82 mm. Consequently, the results of the two decomposition methods are quite different for year 1979 and station 020b. Note that 41 there was no rainfall event in February 1979 at this station. Overall, Figure 3.3 suggests that the monthly volumes are substantially different from station to station. Figures 3.4.1 and 3.4.2 show that the monthly effects do not exhibit any obvious pattern. Although the estimates are quite different from year to year, they are scattered around zero. This suggests that seasonality is not important for monthly average rainfall volumes. This result seems surprising; average rainfall volumes might be expected to differ from season to season. Since seasonality was important for pH levels and might be expected to be important for rainfall volumes, this issue is now examined further. We apply the SUPER S M O O T H E R developed by Friedman and Stuetzle (1982), to all individual (event) data. The Super Smoother uses local linear fits with varying window width determined by local cross-validation. In general, this smoother takes bivariate data and produces a smooth fitted relationship between the two variables. In this exercise we use Volume and Time as a bivariate data set. Both Volume and Time are measured as in chapter 2 (i.e. Volume is measured in millimeters and Time is the number of days from January l*t,1976 to the day of the event). The relationship produced by the 42 smoother is Volume as essentially a constant over Time. The constant is 13.6 mm which is very close to the average of all individual rainfall volumes. We also apply the same procedure with Volume adjusted for the Year and Station effects. The result indicates that there is no obvious relationship between the adjusted Volume and Time (i.e. Month). The smoothed curve provided by the smoother does not resemble any standard functional form (Eynon-Switzer suggested a logarithmic relationship between Volume and Time). These results suggest that the rainfall volume of a storm is not influenced by the time of the storm (i.e. there is no seasonality). It appears that the effect of the factor, Month, on the average rainfall volumes can be ignored. The model for the monthly average rainfall volume can now be written as: Vi3- = / i + 7t + where everything is as previously defined. Using the estimated values of the parameters fi and 7, we obtain the adjusted average rainfall volume as follows: Vii = Vij - p, - 7,-. V represents the monthly average rainfall volume adjusted by the yearly average and the station effect. Note that we use the median estimates to estimate /z and 7 . 3.2.3) Relationship between pH and Volume We now search for a relationship between monthly average pH levels and rainfall volumes using the adjusted data (V, pH) instead of the original data (V, pH). Note that if there is indeed a relationship between pH and Volume then it would be more easily detected using the adjusted data than the original data. The effects of each factor (Year, Station, Month) on the volume and the pH level are different; therefore, under the influence of these different effects, the original data may suggest some relationship which is not true. As a first step, pH is plotted against V in Figure 3.5. Only data from 1979 to 1982 is used because only three stations were active in 1977 and only five stations were active in 1978. Station 171b is not included since it has only two years of data. There are about 400 data points. The figure shows that most data points lie near the origin (i.e. V = 0, pH = 0). There are some unusually large values of pH and V. There are four large values of pH corresponding to the values 44 of V ranging from -10 to 5. There are also four large values of V corresponding to pH ~ 0. With these points deleted, the figure does not show any irregularities. To examine these unusual points further, we plot the data from each station separately in Figures 3.6.1 to 3.6.8. These figures reveal that all of the four large values of pH and the largest value of V are from station 020b. They also show that the other three large values of V are from some possible outliers in the data. Indeed, at station 013a, there is one monthly average rainfall volume of 75.20 mm (average of two storms), while the remaining values range from 3.15 to 34.59 mm. At station 072a, there are two average rainfall volumes of 69.36 and 70.76 mm, while the rest range from 2.14 to 31.01 mm. The plot for station 020b (Figure 3.6.2) shows an unusual pattern. The strange behaviour of station 020b is not surprising since as mentioned previously, the monthly averages of rainfall volumes vary greatly from year to year and from month to month at this station. The plots of other stations do not show any obvious pattern for pH and V; there are a few outliers. Overall, the plots do not show any obvious relationship between the adjusted rainfall volumes and pH levels. Since we did not see any relationship in the adjusted data, we are convinced that there is no relationship between the rainfall volumes and the pH levels. It appears that the 45 rainfall volume is not an important factor in explaining the pH levels. In the light of the above preliminary examination, we conclude that Year, Station, and Month are three important factors in explaining the pH levels. The rainfall volumes do not appear to influence the pH levels. In the next section, using the individual (event) data, we examine how these factors influence the pH levels in detail. To investigate this issue, we first fit an ANOVA model to the event data to see if we can fit this data with a simpler model (i.e. fewer parameters). 3.3) An ANOVA Model An ANOVA model would represent the pH values corresponding to the individual events as follows: (1) pHijki = fi + cti + Pj + 7fc + (aP)ij + («7)tfc + {Pl)jk + {apl)ijk + etjkh where pHijki denotes the pH level of the Ith storm occuring at the ith Year, j t h Station and kth Month, H denotes the overall effect, or,- denotes the Year effect, i = 1,..,6, Pj denotes the Station effect, j = 1 , . . ,9 , 7^ denotes the Month effect, k = 1,..,12, 46 [ctj3)ij,(at'i)ik,[l3'i)jk denote the second order interactions, (a/?7),-yfc denotes the third order interaction, Cijki denotes the residual. This is a saturated A N O V A model in which the pH level is represented as a combination of two parts. The stochastic part is represented by e and the remaining components of the model represent the deterministic part (where all factors are considered fixed). In the following sections, we estimate these two parts and then examine each estimate carefully. This examination is intended to reveal the stochastic structure of the data; for example, whether the residual variation can be reasonably approximated by the normal distribution. We would also like to capture the deterministic component of the pH level with a simpler model (i.e. one with fewer parameters). 3.3.1) Fitting the ANOVA model As mentioned in chapter 1, the standard deviations of the pH levels are quite different from station to station; they also vary somewhat from year to year (Table 1.2). Therefore, it seems reasonable to allow the pH levels to have different variances for different yearxstation combinations; we assume that all pH levels measured in the same year and same station have the same variance. Using the data in each 47 year and each station (about 50 to 100 data points), we can get an estimate of the variance (as presented in Table 1.2). However, since we know that monthly effects are important, we can get a better estimate by utilizing this knowledge about the structure of the data. Given a fixed year and a specific station, we can write the pH level of the Ith storm observed in the kth month as: pHki — + CM, where pit denotes the monthly effect, k = 1,..,12, eki denotes the residual. Since we assume that the residuals in each yearxstation combination, have the same variance, a2, we get an unbiased estimator S2 for cr2, given by (2) ?2 HkE,(pHki~pHk.)2 E f c K - 1 ) where p~Hk denotes the average of all pH levels in the kth month, i%k denotes the number of observations in the kth month. These estimates are a bit smaller than those adjusted only for the yearly average instead of the monthly average. We present both estimates of 48 the variances in Table 2.2 where the estimates adjusted for monthly average are denoted by sd(adj). With these estimates of the variances for the pH levels, we use a weighted least squares (WLS) method to fit the model (1). There are two ways of fitting this model. The first is to consider the estimates of the variances obtained above as the true values; the second is to carry out re-weighted least squares in which the variances are re-estimated after each iteration until they do not change substantially. Here, to remain consistent with our assumptions about the variances, we use the former and think that the above estimates are already very close to the true values. Since the primary objective of this fitting is to obtain the estimated values for the deterministic components, not to estimate the variances, the use of the re-weighted least squares method seems unnecessary. In what follows, the saturated model and various submodels will be fitted. To convince ourselves that the estimated values of the parameters would not be much different with re-weighting, we carried out the re-weighted least squares with one simple submodel consisting of Year, Month, and Station effects only: pHijki = (M + ai + fa + 7fc + eijki. 49 Our previous estimates of the variances were used as initial values and the maximum likelihood method was used to re-estimate the parame-ters and the variances, assuming normal data. The fitting procedure converged after three iterations (the absolute change in the estimated values of each parameter is less than .01) and the maximum of the differences between our estimated values of parameters (i.e. iteration 1) and the resulting values from the re-weighting method is less than .02 for each parameter. This supports our belief that re-weighted least squares is unnecessary. Using the procedure G L M in SAS with weights tu,-y defined as 1 No-where Sfj is the estimated variance for year i at station j , calculated using (2), we get the following results (Y=Year, M=Month, S=Station): Source df SS MS Y 5 81.6 16.32 M 11 366.7 33.34 S 8 222.4 27.80 Y M 55 311.6 5.77 YS 29 198.3 6.84 MS 88 172.0 1.96 50 Y M S 319 521.8 1.64 Error 2400 2449.1 1.02 Total 2915 4323.5 Note that by using the WLS method to fit the parameters, we assume that TpHijki/Sij has a variance of 1. Our estimated value of the variance (MSE) is 1.02, which is close to 1. This indicates that our estimates for the variances used in the fitting are appropriate. In this table, df denotes the degrees of freedom of the effect and SS is the reduction in the weighted residual sum of squares due to the source (factor), obtained by adding these sources sequentially to the model in the order listed. More precisely, let RSS(f) be the weighted sum of squares of residuals when fitting the model denoted by / : RSS(f) = Y^wi3ipHi3kl-f) 2, ijkl where / denotes the WLS fitted model. Then each SS can be written in terms of these RSS's. For example, the sum of squares due to the factor, Month, SSM, is given by SSM = RSS(fi + a{ + pj) - RSS(fi + or,- + p3 + -)k). The corresponding MS is just the ratio of SS and df. The sums of squares due to the sources in the above table are not substantially 51 influenced by the order in which the components are sequentially added to the model. Specifically, provided that all l * f order effects are entered, essentially the same SS's for higher order effects are obtained using different orders. The results in the table correspond to what seems the most natural order. Using the above results, we examine the deterministic and stochastic components in the next sections. 3.3.2) Deterministic component In the saturated model (1), we use 516 parameters to capture the deterministic component. In the hope of finding a simpler model (one with fewer parameters) which adequately represents the deterministic components, we examine whether we can drop some effects from the full model. To examine this question, we compare different submodels containing combinations of the effects and interactions. It is obvious that the main effects are very important (the mean square for each of the main effects is at least 16.6, while the largest mean square for the second order interactions is at most 6.8). Moreover, it is reasonable to include a higher order interaction term in the model only when the lower term is also included. For example, if the yearx month interaction is in the model then the year and month 52 effects must be in the model. Guided by these criteria, we consider only eight specific submodels and the full model. We use the log-linear convention to identify the submodels; for example, [Y,SM] represents the model containing Y, S, M , and SM. The results comparing the models to be considered are provided in the following table (p = number of parameters): Submodel P RSS R2 R2 C P - p Ax [Y,M,S] 25 3652.8 .16 .15 761.8 3852.3 3702.8 [YM,S] 80 3341.2 .23 .21 505.2 3979.4 3501.2 [YS,M] 54 3454.5 .20 .19 592.5 3885.3 3562.5 [Y,MS] 113 3480.8 .20 .16 677.8 4382.3 3706.8 [YM,YS] 109 3142.9 .27 .25 335.9 4012.5 3360.9 [YM,MS] 168 3169.2 .27 .22 421.2 4509.5 3505.2 [YS,MS] 142 3282.5 .25 .20 508.5 4415.4 3566.5 [YM,YS,MS] 197 2970.9 .31 .26 251.9 4542.6 3364.9 [YMS] 516 2449.1 .43 .31 49.1 6565.7 3481.1 In this table, the values of various statistics often used as criteria for choosing the 'best' submodel are also included. These criteria, which are similar to those suggested in Weisberg (1980) for independent and identically distributed (iid) residuals, are described in what follows. 5 3 a) M u l t i p l e correlation coefficient The square of the multiple correlation coefficient R2 is often used as a criterion for comparing models. In this case, since we have only independent but not identically distributed residual data, we use an analogue of R2, which is calculated by using the weighted sums of squares, for comparing submodels. A computing formula for R2 in a p-parameter model (denoted by R2) is: P 2 — i _ R^Sp p ~ SYY' where RSSP is the weighted residual sum of squares using the submodel with p parameters, S Y Y is the total sum of squares, given by SYY = £ 0 • « WijtpHvu - fH...)2, where pH denotes the overall average, Wij is as previously defined. Note that the large values of R2 indicate that more variability is explained by the model. A disadvantage of this method is that it is only useful for comparing different models with a fixed number of parameters. 54 b) Adjusted R2 To adjust the method in (a) for the stated disadvantage, corre-sponding to suggestions in Weisberg (1980), one may use an adjusted version of R2, denoted by R2 and defined by where n is the total number of observations. Note that as before large values of R2 indicate that more variability is explained by the model. The adjusted value can be negative. c) Mal lows ' Cp statistic Let pHijkl be the predicted value of the corresponding observation obtained by the submodel. Let the mean square error (mse) be defined as mse(fHijkl) = var(fHijM) + [E(fHijkl) - E{pHijkl)]2. The submodel which makes the weighted mean square errors, given by Wij x mse(pHi:jki), as small as possible is preferable. Note that Weisberg (1980) suggests this criterion only for the iid case. However, since we use the weighted mean square error criterion, we in fact have an iid case where pHijki/Sij is the raw data with variance equal to 55 1. The Mallows' Cp statistic for a p-parameter model is defined by C p = RSSp + 2p - n, where RSSP is as previously defined. Mallows (1973) suggests that good models will have negative, or small, values of Cp — p. We calculate R2, R2, Cp with SYY=4323.5 and n=2916, and present the results in the above table. All three methods suggest that the saturated model is the best one among those submodels considered. This suggests that all terms in the full model are needed to account for the pH levels. Note that the full model corresponds to using the monthly average as the fitted value for the pH level of any particular event. Our objective in investigating the deterministic component is to reduce the full model to a model with fewer parameters which still captures the structure in the deterministic component of the data. To find a simpler model, we would first have to examine whether any of the different sources (i.e. Y, S, M , YS,...) could be dropped from the full model without serious effect on its ability to capture the deterministic component. This would hopefully lead us to a submodel. Subsequently, we would investigate the possiblity of replacing the remaining parameters with fewer parameters by using simple (possibly nonlinear) functional 56 forms to represent some of the effects which were identified as important (a periodic form to represent any seasonality evident in the estimated Y , M , and Y M effects, for example). Using the M S due to the sources, we can see if any source can be el iminated. As noted earlier, the M S due to the main effects are very important; they range from 16.32 to 33.34. W i t h MSerror ~ 1.02, rough F tests would yield very smal l p-values for these main effects. The fate of interaction terms are not so clear. The M S due to the yearx month (MSYM) and yearxstat ion (MSys) interactions are considerably smaller than the M S due to the main effects (the values are 5.77 and 6.84), but rough F tests would yield smal l p-values. The M S due to the monthxs ta t ion (MSus) and yearxs ta t ionxmonth (MSYSM) interactions are much smaller than MSYM and MSYs (MSMS = 1.96, MSySM =• 1.64); intuit ion suggests that these interactions are not really important. A l though the rough F tests would st i l l yield statist ical significance for these interactions, the significance in this case may be inevitable and due to the very large number of observations available (2916). The results using different criteria (described above) also suggest that every term in the model is important in explaining the p H levels (i.e. the fu l l model is the 'best ' one). Note that none of these methods 'adjust ' for the large number of observations available even though the latter 57 plays an important role in these results. To pursue this issue further, we can use the criteria proposed by Schwarz (1978) and Akaike (1973) for comparing the fit of different models. Let p be the number of estimable parameters in a model. Schwarz proposed that the model which maximizes B = log(maximum likelihood) - | log(n) be chosen. Since the number of observations (n) is fixed, with the assumption of independent and normal data having known variances (i.e. ofy = 5?), the criterion is equivalent to minimizing Bi = RSSp -fpxlog(n). Akaike proposed that the model which maximizes A = log(maximum likelihood) - p be chosen. With the same assumptions, this is equivalent to minimizing Ai = RSSP + 2 x p. Note that log(n) = 7.978, so Bi — Ai + 5.978 x p in this case. The values of Ai and Bi for the submodels are presented in the above table. According to Akaike's criterion, the submodel [YS,YM] 58 is the best; the last two terms in the full model (i.e. SM, YSM) can be dropped. This supports our earlier suggestion that MSSM and MSYSM are not really important. Schwarz's criterion, however, prefers the submodel [Y,S,M] to the rest. Nishii (1984) proves that the criteria proposed by Mallows and Akaike are asymptotically equivalent under general conditions; asymptotically, they have a positive probability of selection only for models that properly include the true model. However, the Schwarz criterion behaves somewhat differently; the model chosen by the Schwarz criterion is a 'consistent' estimator of the true model. To a certain extent, this behaviour is evident in our analysis. The three 'best' submodels chosen by the Mallows and Akaike criteria are the same (although their rankings are slightly different), but the Schwarz criterion indicates that smaller submodels are preferable. Since the term log(n) adjusts for the large number of observations available, the Schwarz criterion may be more appropriate in this present context than those of Akaike and Mallows. These results clearly suggest that the deterministic component of the pH levels can be captured by some simpler model than the saturated one. The model [YS,YM] chosen by Akaike's criterion is already much simpler than the full model (109 versus 516 parameters). However, according to the criterion proposed by Schwarz, an even simpler model 59 can possibly be used. Nishii's results suggest that although the sub-model chosen by the Schwarz criterion is an asymptotically 'consistent' estimator of the true model, the submodel chosen by the Akaike cri-terion has a greater likelihood of properly containing the true model. Therefore, to find the smallest possible model for the pH levels, we would need to examine the estimated values of the parameters in the model [YS,YM] to see whether they can be approximated by a smoothed function. This smoothed function (if found) would be a proposed model and its properties and fit to the data could be further examined. However, due to limitations of time, this interesting project must be deferred. 3.3.3) Stochastic component Many methods of analysis employed in environmetrics are based on assumptions of normality. Therefore, it is important to check the normality of the pH levels. The simplest way of examining this question is to use the normal probability plot. The residuals from the full model of the previous section are standardized and plotted for each station (all years together) separately. Using SAS, we calculate the normal 60 probability by Pr(ei) = *-l(ri-l)/(n+1-), where e,- denotes the residual of the Ith observation, 7'i denotes the rank of e,-, n denotes the total number of residuals, $ denotes the cumulative normal distribution. The resulting plots for the different stations are presented in Figures 3.7.1 to 3.7.9. The plots of stations 013a, 044a, 171b are presented in 3.7.1, 3.7.4, 3.7.9 respectively. These plots do not suggest any violation of the normality assumption. Figure 3.7.3 represents the normal plot of data from station 043a. The plot is very well-behaved, except for one possible outlier: pH=5.95 in November 1980 (the rest of the pH readings in that month range from 4.02 to 4.35). The normal plot for station 048a (Figure 3.7.5) has a similar property. It also has one possible outlier: pH=6.00 in December 1981 (the rest of the pH readings in that month range from 3.70 to 4.71). The plot of station 072a (Figure 3.7.8) does not suggest any serious violation of the normality assumption; however, there is one apparent outlier: pH=6.57 in June 1981 (the rest of the pH readings 61 in that month range from 3.86 to 4.07). Figures 3.7.6 and 3.7.7 represent data from stations 057a and 065a respectively. The plots do not show any serious violation of the nor-mality assumption. However, these plots have some disturbing features. Although most stations have collinear data, there are some points on both tails which are slightly off those straight lines. A more careful examination of the data from these stations is described below. Figure 3.7.2 represents the normal plot of data from station 020b. The normal plot shows a serious violation of the normal assumption. Instead of having a roughly linear, a curve is obtained; the data needs to be examined more carefully. We can investigate these 'questionable' stations further by examining the normal plots for each year in each of these stations separately. The plots (not included in this work) for stations 057a and 065a for each year separately show that except for some outliers in the data, there are no obvious violations of the normality assumption. However, the plots of station 020b suggest a serious departure from normality. The plots for 1978 to 1982 are presented in Figures 3.8.1 to 3.8.5 respectively. Except for Figure 3.8.1, the plots clearly exhibit the 62 extreme skewness of the data from this station. Figure 3.8.1 shows that the data for 1978 support the normality assumption except for some apparent outliers. These results indicate that the stochastic components of the p H levels obtained at station 020b cannot be considered as arising from a normal distribution. However, except for a few possible outliers, the data from the remaining stations can be reasonably approximated as normal. 3.4) Conclusion In this chapter, we have examined the data set carefully in an attempt to uncover underlying patterns which might be used to find an appropriate model for the p H levels. T h e results indicate that there is a seasonal structure in the p H levels. Specifically, the time effects decrease slowly from January to August, increase sharply from September to November, and then decrease a little bit in December. T h e rainfall volumes, however, do not show any obvious seasonal structures. There is no obvious effect of the rainfall volumes on the p H levels. T h e station effects are quite important in explaining the p H levels. T h e deterministic components of the p H levels were examined by 63 fitting a saturated A N O V A model; the importance of the individual components in the model were also investigated. The results suggest that the deterministic components can be captured by a smaller model than the full model. The stochastic component was examined for the appropriateness normality assumptions by using normal probability plots. The results indicate that serious violations of the normality assumption are found in the data obtained at station 020b. However, except for a few possible outliers, the data from other stations can be reasonably approximated as normal. 64 4) C O N C L U S I O N In this study, we have learned that the model developed by Eynon and Switzer (1983) for analyzing p H levels, does not seem to be appropriate for the data obtained from the M A P 3 S / P C N montoring network. However, the general approach seems to be useful in analyzing the spatial-temporal data. For this reason, the approach including the interpolation step called kriging was completely demonstrated using the Eynon-Switzer model. In an effort to identify a more appropriate model for the data, we examined the raw data in detail. A full A N O V A model, including the three factors, Year, Month , and Station, was fitted to the individual p H levels. T h e results suggested that the residuals have different variances at different stations. T h e normality of the residuals was examined. T h e conclusion is that the data from all stations except one can reasonably be approximated as coming from normal distributions. This is a very useful result because many methods of analysis employed in 65 environmetrics are based on assumptions of normality. Different criteria such as Akaike, Schwarz, Mallows, etc, are used to identify the 'best' submodel (i.e. fewer parameters than the full model) for capturing the deterministic component of the data. The results suggested that it is possible to capture the deterministic component by a much smaller model than the full model. It would be very interesting to examine whether the 'best' submodel can be represented by a smoothed function. The abilities to fit the data and the properties of this smoothed function (if found) could be further examined. Moreover, since the variances of the residuals of the data are not homogeneous from station to station, we must have the estimates of the variances of the residuals at any locations in order to do interpolation (kriging in this case). The question of whether we would be able to estimate these variances is also interesting. However, due to limitations of time, these interesting projects must be deferred. 66 R E F E R E N C E S Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. g"d International Symposium on In-formation Theory. (B. N. Petrov and F. Czaki, eds.). Akademiai Kiado, Budapest. Delhomme, J. P. (1978). Kriging in the hydrosciences. Adv. Water Resources, 1, 251-266. Egbert, G . D. and Lettenmaier, D. P. (1986). Stochastic Modeling of the Space-Time Structure of Atmospheric Chemical Deposition. Water Resources Research, 22, 165-179. Eynon, B. P. and Switzer, P. (1983). The variability of rainfall acidity. The Canadian Journal of Statistics, 11, 11-24. Friedman, J . H . and Stuetzle, W. (1982). Smoothing of scat-terplots. Department of Statistics, Stanford University, Technical Report ORION006. 67 Journel, A . G . and Huijbregts, Ch. J . (1978). Mining Geostatistics. London: Academic Press. Nishii, R. (1984). Asymptotic properties of criteria for selection of variable in multiple regression. Annals of Statistics, 12, 758-765. Schwarz, G . (1978). Estimating the dimension of model. Annals of Statistics, 6, 461-464. Weisberg, S. (1980). Applied Linear Regression. New York: Wiley. 68 Table 1.1: Site location : The MAP3S/PCN monitoring network. AOS site Elevation F i r s t active Years of identity Location Latitude Longitude (meters) date data 013a Lewes, Delaware 38 46 00 75 00 00 0 01--Har-78 4 020b Illinois, Illinois 40 03 12 88 22 19 212 20--Nov-7 5 043a Uhiteface, New York 44 23 26 73 51 34 610 11--Oct-76 6 044a Ithaca, New York 42 24 03 76 39 12 509 26--Oct-76 6 048a Brookhaven, New York 40 52 00 72 53 00 25 09--Feb-78 5 057a Oxford, Ohio 39 31 51 84 43 25 284 01--Oct-78 4 065a Penn State, 40 47 18 77 56 47 393 22--Sept-76 6 Pennsylvania 072a Virginia, Virginia 38 02 23 78 32 31 172 12--Dec-76 5 171b Oakridge, Tennessee 35 57 41 84 17 14 341 07--Jan-81 2 * A. Olsen and C. Watson, "Acid Deposition System (ADS) for S t a t i s t i c a l Reporting", EPA-600/8-84-023, September 1984, pp. C.7-C.13. 69 Table 1.2; Summary of data. pH Volumes(mm) ADS Identity Year No. of ~ ~~~7 r a i n f a l l s Mean sd sd(ad_) Mean sd 013a 79 80 81 82 52 71 64 73 4.37 4.25 4.22 4.26 .40 .35 .40 .55 .28 .30 .36 .48 24.99 12.88 13.16 13.31 29.49 13.92 12.97 15.28 020b 78 79 80 81 82 50 40 94 87 80 4.25 4.73 4.28 4.34 4.37 .55 .89 .68 .65 .52 .58 .77 .69 .57 .48 10.44 19.80 9.21 10.68 11.12 12.92 29.48 11.65 14.26 14.01 043a 77 78 79 80 81 82 83 54 50 87 99 98 4.24 4.16 4.08 4.03 4.06 4.21 .33 .31 .18 .29 .23 .39 .32 .24 .17 .26 .23 .37 12.97 14.16 17.08 9.52 10.92 8.22 13.18 12.39 14.39 7.78 16.10 10.06 044a 77 78 79 80 81 82 52 60 62 75 72 80 16 05 08 19 10 14 .28 .27 .29 .32 .23 .33 .20 .20 .27 .27 .20 .31 19.06 13.83 17.80 12.41 16.49 11.54 14.20 11.09 19.63 12.20 17.06 12.01 048a 78 79 80 81 82 48 71 64 80 69 4.07 4.10 4.14 4.18 4.23 .35 .49 .51 .45 .45 .30 .42 .44 .39 .40 14.86 12.93 12.61 11.93 15.49 15.21 18.15 12.77 13.26 18.55 057a 79 80 81 82 47 75 75 83 4.22 4.08 3.80 3.99 .23 .29 .24 .37 .20 .24 .19 .30 15.08 11.72 9.67 11.20 16.97 13.34 8.42 13.20 065a 77 78 79 80 81 82 80 70 61 40 69 86 4.06 4.04 4.20 4.14 4.19 4.23 .27 .30 .27 .61 .30 .36 .22 .25 .24 .20 .25 .33 14.93 13.00 18.57 12.46 11.64 9.46 17.84 11.03 18.57 12.61 15.24 10.85 072a 171b 78 79 80 81 82 81 82 57 51 65 56 58 65 64 4.06 4.17 ,09 .15 ,15 .16 .27 .34 .30 .30 .46 .30 .23 .23 .28 .29 .31 .48 .29 .23 .22 18.00 25.30 10.85 14.98 16.01 14.63 23.14 17.47 32.42 11.50 21.79 14.74 13.75 20.17 70 T A B L E 2 . 1 ; T h e f i t t e d c o n s t a n t c o e f f i c i e n t s o b t a i n e d b y f i t t i n g t h e E y n o n - S w i t z e r m o d e l Y e a r P a r a -m e t e r 0 1 3 a 0 2 0 b 0 4 3 a S t a t i o n 0 4 4 a 0 4 8 a 0 5 7 a 0 6 5 a 0 7 2 a 1 7 1 b 77 78 79 8 0 8 1 8 2 a b c d a b c d a b c d a b c d a b c d a b c d - 0 . 1 4 7 8 0 . 2 6 0 6 0 . 0 1 8 7 4 . 2 7 7 8 - 0 . 0 6 1 6 0 . 1 4 6 7 0 . 0 9 0 6 4 . 0 4 1 4 0 . 1 2 4 2 0 . 2 2 5 8 0 . 0 4 5 0 4 . 1 0 8 5 0 . 1 0 8 2 0 . 2 3 3 1 0 . 0 8 5 5 4 . 6 6 0 6 0 . 0 1 8 5 - 0 . 0 0 1 5 0 . 1 3 7 5 0 . 1 8 5 5 0 . 0 4 6 6 0 . 0 4 2 2 4 . 1 3 1 1 3 . 9 9 1 8 - 0 . 1 1 0 0 0 . 2 0 6 3 - 0 . 0 0 6 6 0 . 1 8 9 7 - 0 . 0 0 6 4 0 . 0 2 2 7 4 . 2 6 7 8 4 . 1 1 4 5 0 . 0 2 1 8 - 0 . 0 4 7 7 0 . 1 6 3 1 - 0 . 0 5 0 7 0 . 0 7 6 3 0 . 0 6 6 2 3 . 8 5 8 8 3 . 8 7 4 0 - 0 . 2 4 1 5 0 . 3 5 8 3 - 0 . 0 2 8 9 5 . 0 5 3 8 - 0 . 1 8 1 0 0 . 3 0 3 9 - 0 . 0 0 1 5 4 . 3 8 4 6 - 0 . 0 5 3 0 0 . 3 8 2 1 - 0 . 0 0 0 4 4 . 3 9 9 0 - 0 . 0 1 9 3 0 . 0 5 7 5 0 . 0 1 6 1 4 . 3 3 7 4 - 0 . 0 2 4 1 - 0 . 1 0 1 7 0 . 0 7 2 9 0 . 1 1 1 7 0 . 0 3 4 8 0 . 0 5 6 5 3 . 9 5 5 0 3 . 8 9 0 5 0 . 0 7 4 2 - 0 . 0 4 6 6 0 . 3 1 9 5 - 0 . 0 0 8 1 0 . 0 3 3 7 - 0 . 0 0 5 6 4 . 0 5 6 3 4 . 2 4 0 8 - 0 . 0 4 4 6 0 . 1 5 6 1 0 . 0 2 8 5 3 . 9 9 2 9 0 . 0 2 5 2 0 . 0 3 6 5 0 . 0 5 5 3 3 . 9 5 2 7 - 0 . 1 8 4 0 0 . 0 4 9 4 0 . 0 6 9 0 4 . 0 8 5 9 0 . 0 3 6 6 0 . 1 8 5 5 0 . 0 4 4 9 4 . 0 7 3 1 0 . 0 4 1 2 0 . 1 8 3 0 0 . 0 3 1 1 4 . 0 1 3 7 - 0 . 0 3 8 3 0 . 2 0 6 2 0 . 1 0 2 6 3 . 9 2 9 8 - 0 . 1 4 9 9 0 . 1 0 7 5 0 . 1 3 7 1 3 . 8 8 0 2 0 . 1 3 6 8 0 . 1 2 3 2 0 . 1 4 3 1 3 . 9 0 8 0 - 0 . 1 0 4 8 0 . 2 1 7 6 0 . 1 3 7 5 3 . 9 1 9 4 0 . 2 0 5 8 0 . 0 7 2 9 - 0 . 0 0 2 0 4 . 0 7 8 4 0 . 1 5 3 1 0 . 0 3 6 7 0 . 0 3 3 1 3 . 7 2 9 3 - 0 . 0 3 9 2 0 . 1 3 2 2 0 . 0 2 6 6 3 . 9 3 1 3 0 . 0 5 9 8 0 . 2 3 6 4 0 . 0 3 4 8 3 . 9 6 1 8 0 . 0 4 7 4 0 . 1 6 5 2 0 . 1 0 6 9 3 . 8 0 8 3 - 0 . 0 0 4 5 0 . 0 8 7 1 0 . 0 4 4 S 4 . 0 6 0 7 0 . 0 5 8 6 0 . 3 8 1 3 0 . 1 8 0 5 3 . 9 2 3 6 0 . 0 9 4 5 0 . 1 4 9 6 0 . 0 4 4 7 4 . 1 1 4 1 1 3 7 5 1 6 9 4 0 5 1 6 1 2 0 5 0 . 0 9 4 1 0 . 0 2 5 7 0 . 0 4 7 7 3 . 9 0 5 6 , 0 8 7 1 , 0 8 0 7 0 2 5 4 4 . 0 3 8 1 - 0 . 0 1 7 6 0 . 1 5 5 9 0 . 0 6 6 5 3 . 9 6 2 3 0 . 0 7 7 1 - 0 . 0 3 9 9 0 . 0 2 0 1 4 . 0 7 2 2 0 . 0 3 9 9 0 . 1 4 7 1 0 . 0 2 5 9 4 . 0 6 0 2 0 . 0 7 7 1 0 . 0 5 1 0 0 . 0 4 9 5 4 . 0 1 7 6 0 . 0 5 4 6 0 . 0 8 8 2 0 . 0 0 8 7 4 . 2 3 5 9 71 Table 2.2t ANOVA and median decompositions on a,b,c, and d. (1) The residuals and effects obtained by decomposing the estimated coefficients a(x) using AVERAGE (part a) and MEDIAN (part b). (a) This uses AVERAGE for decomposition. Ads Id. 013a 020b 043a 044a 048a 057a 065a 072a Year 171b Eff e c t Resld. 77 -0 01 -0 02 0 00 0.02 78 0.04 0 16 -0 02 -0 08 -0 04 0 02 0.05 79 -0.08 -0.07 0 05 -0 02 0 17 -0 04 0 04 -0 03 -0.07 80 -0.05 -0.06 -0 02 0 06 -0 11 0 16 0 05 -0 02 -0.02 81 0.07 -0.00 -0 02 -0 00 0 11 0 04 -0 18 0 01 -0 05 0.05 82 0. 1 1 0.09 -0 18 -0 02 -0 08 -0 10 0 11 0 03 0 01 -0.01 Col. e f f . Overall average 0.00 -0. 10 -0. 00 -0 01 -0 02 0 07 0 03 0 02 0 05 0.00 (b) This uses MEDIAN for decomposition. Ads Id. 013a 020b 043a 044a 048a 057a 065a 072a 171b Year Ef f e c t Res. 77 0.01 -0 00 -0 01 0.02 78 0 06 0. 18 0 00 0 -0 04 0 03 0.03 79 -0 10 -0 07 0.05 -0 02 0 22 -0 04 0 01 -0 05 -0.07 80 -0 05 -0 05 -0.01 0 08 -0 04 0 17 0 04 -0 01 -0.03 81 0 05 0 -0.02 0 01 0 17 0 04 -0 20 0 -0 04 0.05 82 0 1 1 0 10 -0. 16 -0 00 -0 00 -0 08 0 10 0 03 0 04 -0.02 Overall median Col . e f f . 0 02 -0 10 -0.01 -0 02 -0 08 0 06 0 05 0 03 0 04 -0.00 72 (2) The residuals and effects obtained by decomposing the estimated coefficients b(x) using AVERAGE (part a) and MEDIAN (part b). (a) This uses AVERAGE for decomposition. Ads Id. 013a 020b 043a 044a 048a 057a 065a 072a 171b E f f e c t Year Resld. 77 -0 01 -0 03 -0 00 0. 04 78 -0. 16 0 15 0 06 -0 13 0 03 0 02 -0. 07 79 0 03 0. 12 -0 05 -0 08 0 16 -0 08 -0 13 -0 00 0. 02 80 -0 1 1 0. 04 0 01 -0 03 -O 08 -0 03 0 14 0 04 0. 04 81 o 03 o. 18 -0 05 0 03 -0 OO -0 00 -0 03 -0 10 0 OO -O. 02 82 0 02 -0. 16 -0 06 0 04 0 08 0 08 -O 03 0 07 0 02 -0. OO Overal1 a\ Col. e f f . 0 07 0. 07 -0 04 0 03 -0 00 -0 09 0 05 -0 07 -0 08 0. 15 (b) This uses MEDIAN for decomposition. Ads Id. 013a 020b 043a 044a 048a 057a 065a 072a 171b Ef f e c t Year 77 -0.01 -0 04 0 01 0 05 78 -0 25 0. 13 0 02 -0 15 0 03 -0 06 -0 04 79 0 07 0 1 1 0.01 -0 03 0 22 -0 02 -0 05 0 -0 03 80 -0 10 0 0.04 -0 01 -0 05 0 00 0 19 0 01 0 02 81 0 01 0 1 1 -0.05 0 02 0 -0 00 -0 01 -0 15 -0 01 -0 01 8.2 -0 01 -0 24 -0.06 0 01 0 07 0 07 -0 02 0 01 0 01 0 01 Overal1 median Col. e f f . 0.09 0.15-0.04 0.04 O -0.08 0.04 -0.01 -0.07 | 0.14 73 (3) The residuals and effects obtained by decomposing the estimated coefficients c(x) using AVERAGE (part a) and MEDIAN (part b). ( a ) T h i s u s e s AVERAGE f o r d e c o m p o s i t i o n . Ads Id. Year 013a 020b 043a 044a 048a 057a 065a 072a 171b E f f e c t R e s l d . 77 0 01 -0 01 -0.03 -0.01 78 -0.01 -0 02 0 01 -0 04 0.03 0 01 0.00 79 -0 01 0.00 0 02 0 02 -0 04 0 01 -0.01 0 01 -0.03 80 0 01 -0.02 -0 03 -0 03 0 01 -0 03 0.08 0 01 0.02 81 -0 01 0.01 0 01 -0 03 0 04 0 02 -0.03 -0 01 0. 02 -0.00 82 0 02 0.01 0 02 0 03 0 02 0 00 -0.03 -0 02 -0. 03 0.01 O v e r a l l a verage C o l . e f f . 0 01 -0.05 -0 01 0 01 0 05 -0 04 0.03 -0 01 -0. 02 0.05 (b) T h i s u s e s MEDIAN f o r d e c o m p o s i t i o n . Ads Id. 013a 020b 043a 044 a 048a 057a 065a 072a 171b E f f e c t Year Res. 77 -0.00 -0.01 -0.01 -0.01 78 -0.01 -0.04 0.01 -0.07 0.04 0 0.01 79 -0.03 -0.00 0.00 0.02 -0.08 -0.01 0.01 0.01 -0.02 80 0.02 0 -0.03 -0.02 0 -0.03 0. 12 0.02 0.01 81 -0.02 0.01 0.01 -0.02 0.02 0.02 -0.01 -0.02 0.02 -0.00 82 0.02 0.02 0.01 0.04 0.00 0.01 -0.01 -0.02 -0.02 0.00 O v e r a l l median C o l . e f f . 0.02 -0.06 0.00 0.01 0.08 -0.03 0.01 -0.01 -0.02 0.05 74 (4) The residuals and effects obtained by decomposing the estimated coefficients d(x) using AVERAGE (part a) and MEDIAN (part b). (a) This uses AVERAGE for decomposition. Ads Id. Year 013a 020b 043a 044a 048a 057a 065a 072a 171b Effect Resid. 77 0 13 0 07 0 00 -0.04 78 -0 13 0 17 -0 01 0.04 -0 10 -0 01 -0.09 79 0 02 0 43 -0 21 -0 20 -0.00 0. 1 1 -0 07 -0 10 0. 13 80 -0 06 -0 08 -0 02 0 14 -0.02 0. 1 1 -0 05 -0 02 -0.02 81 0 02 -0 06 -0 06 0 08 0.01 -0.24 0 15 0 10 -0 08 -0.03 82 -O 07 -0 16 0 04 -0 04 -0.02 -0.07 0 1 1 0 04 0 10 0.01 Overall average Col. e f f . 0 06 0 42 -0 03 -0 11 -0. 14 -0.07 -0 07 -0 06 0 06 4.07 (b) This uses MEDIAN for decomposition. Ads 1d. 013a 020b 043a 044a 048a 057a 065a 072a 171b Ef f e c t Year 77 0 14 0.02 -0 02 -0 03 78 0 0 22 -0.01 0 09 -0 07 0 02 -0 13 79 0 16 0 62 -0 10 -0.15 0 11 0 22 0 02 -0 02 0 04 80 -0 03 -0 01 -0 02 0.07 -0 03 0 10 -0 08 -0 05 -O OO 81 0 03 0 00 -0 07 0.01 0 -0 25 0 1 1 0 05 -0 09 0 00 82 -0 06 -0 10 0 02 -0. 1 1 -0 03 -0 10 0 07 0 0 09 0 04 Overal1 median Col. e f f . 0.06 0.38 0.00 -0.02 -0.11 -0.03 -0.01 -0.00 0.09 | 4.02 75 Table 2.3: The ratio of residual SS's of the corrected and uncorrected series of pH levels. Station Year 013a 020b 043a 044a 048a 057a 065a 072a 171 1977 .86 .63 .50 1978 .99 .55 .52 .56 .52 .79 1979 .72 .86 .73 .48 .68 .98 .65 .74 1980 .70 .90 .82 .73 .75 .72 .57 .68 1981 .75 .85 .73 .61 .98 .73 .74 .90 .07 19S2 .70 .98 .77 .55 .53 .89 .76 .81 .86 Average .73 .92 .74 .59 .70 .71 .62 .80 .77 76 Table 2.4: Sums of squares errors for different models. Station Year SS_!(_ia>j) RSS_ R s s ; ; „ 043a 1977 8.97 8.23 09.0 7.71 06.7 1978 5.11 2.95 73.2 2.80 05.4 * 1979 1.G7 1.59 05.0 1.22 30.3 1980 7.27 G.12 18.8 5.95 02.9 072a 1982 5.28 4.59 15.0 4.28 07.2 OGSa 1978 4.25 3.84 10.7 3.30 16.4 044a 1981 3.82 2.86 33.6 2.34 22.2 013a 1980 8.54 7.80 08.5 5.71 3G.G SS_;(_) denotes the sum of squares error corresponding to fitting a reduced model which involves only term d (i.e. pH = d). Similarly, SSE(_ i 0 IJ,) corresponds to the model . 2nt , 2rrt ^ = O S m 3 W + 6 c ° 3 3 6 5 + ' / -Note : the full model is 2?r£ 2nt e • I P H = fl8in 365 + 6 C ° S 3C5 + / 0 g l 0 l - e x p ( - c / ) + * * denotes (SS_;(j) - S S £ ( _ i 0 > l j ) / SS£( r f ( _ i t ) (in percent). '** denotes (SS^JJ,.,!) - SSK(/««)) / ^E{/uii) («n percent). 77 Table 2.5: The results of fitting the temporal variogram Vp(w) to the data, where V p [ w ) = i + 1 ^ / 1 . 1 - ' ' Station Year Number of data pts. 9i se(ffi) 92 se(c7 013a 1070 1326 .116 .004 2.93 3.54 013a 1980 2085 .085 .002 2.04 2.37 072a 1982 1G53 .079 .003 2.85 3.72 171b 1982 2016 .0G0 .002 1.54 2.C0 78 Table 2 .6: Temporal variogram estimated with data of each year and each station using where tj and tj> belong to the same year, and i denotes station i . Station Year ' 013a 020b 043a 044a 048a 057a 005a 072a 171b 1977 .094 .050 .038 1978 .320 .056 .040 .100 .053 .094 1979 .115 .805 .026 .051 .108 .057 .051 .008 1980 .085 .430 .071 .075 .208 .080 .242 .003 1981 .122 .364 .042 .033 .153 .043 .066 .204 .030 1982 .234 .275 .117 .009 .125 .122 .098 .078 .000 79 Table 2.7 The residuals and effects obtained by decomposing the estimated coefficients gi using AVERAGE (part a) and MEDIAN (part b). (a) This uses AVERAGE for decomposition. Ads i d . 013a 020b 043a 044a 048a 057a 065a 072a 171b Eff e c t Year Resid. 77 0. 10 0.07 0.02 -0.07 78 -0.09 0.01 0.01 -0.03 -0.02 0.01 -0.02 79 -0.06 0.33 -0.08 -0.04 -0.02 -0.05 -0.08 -0-07 0.04 80 -0.08 -0.03 -0.02 -0.00 0.03 -0.02 0. 13 -0.06 0.02 81 -0.00 -0.06 -0.01 -0.01 0.02 -0.02 -0.01 0. 12 0.00 -0.01 82 0. 10 -0. 16 0.05 0.02 -0.02 0.05 0.01 -0.02 0.01 -0.00 Overall average C o l . e f f . 0.01 0.31 -0.06 -0.08 0.02 -0.06 -0.04 -0.03 -0.08 0. 13 (b) This uses MEDIAN for decomposition. Ads i d . 013a 020b 043a 044a 048a 057a 065a 072a 171b E f f e c t Year Res. 77 0 06 0.02 -o oo -0.02 78 -0 05 -0 00 -0.01 -0 06 -0 01 0 01 0.00 79 -0 00 0 45 -0 02 0.01 0 02 -0 00 0 00 0 -0.01 80 -0 05 0 05 0 00 0.01 0 04 0 00 0 17 -0 03 0.01 81 0 00 0 -0 01 -0.01 0 -0 02 0 01 0 13 0 01 -0.00 82 O 06 -0 14 0 02 -0.03 -O 08 0 01 -0 01 -o 05 -O Ol 0.05 Overal1 median Col . e f f . 0 05 0 30 -0 02 -0.02 0 09 -0 00 -0 01 0 01 -0 04 0.07 80 Table 2.8: The estimated value of Ct in year i i s <*(*.) = „ ' m S t a t i o n 1979 Year 1980 1981 1982 01 3a 4. 28 4. 1 1 4. 14 4. 1 4 020b 4. 84 4. 28 4. 33 4. 27 043a 4. 01 3 94 4. 00 4. 12 044a 4. 00 4 .05 4. 02 4. 03 048a 4 .09 4 .03 4. 13 4. 10 057a 4 .16 3 .95 3. 72 3. 89 065a 4 .14 4 .13 4. 14 4. 13 072a 4 .06 3 .98 4. 05 3. 99 171b 4. 07 4. 07 81 Table 3.1; Station e f f e c t s , using ANOVA and Median decompositions (pH l e v e l s ) . S t a t i o n Year 1977 1978 1979 1980 1981 1982 Anova Median Anova Median Anova Median Anova Median Anova Median *P°va Median 013a .09 .22 .07 .06 .08 .06 .08 . IS 020b . 11 . 15 .64 .31 .24 .12 .25 . 13 . 16 . 16 043a .09 .10 . 11 . 14 -.19 -.17 -.12 -.12 -.07 -.11 -.01 .00 044a -.01 .00 -.06 -.04 -.19 -.17 .00 .07 -.03 .07 -.06 -.06 04Ba -.04 -.02 -.13 .03 .02 .06 .03 .00 .02 .03 057a -.04 .02 -.09 -.07 -.30 -.33 -.22 -.25 065a -.08 -.07 - .05 -.08 -.05 -.02 -.04 -.12 .03 . 13 .01 -.02 072a -.06 .02 -.09 -.09 -.08 -.06 -.01 -.05 -.06 -.10 171b .02 .04 .06 .04 Table 3.2; Monthly e f f e c t s , using ANOVA and Median decompositions (pH l e v e l s ) . Year 1977 1978 1979 1980 19B1 1982 Anova Median Anova Median Anova Median Anova Median Anova Median Anova Median January .02 .09 .38 .40 .02 18 .24 20 -.09 _ 11 .05 .06 February -. 13 -.22 -.10 -.19 -.07 02 -.05 - 03 . 18 .18 .01 .01 March .26 .20 . 11 . 12 -.09 - 06 . 12 14 . 14 10 -.07 -.05 Aprl 1 .07 .09 .01 -.01 -.02 - 02 .03 01 -.03 - 01 . 15 .09 May -.22 -.25 . 10 . 15 -.04 - 06 -.18 - 04 -.05 - 07 -.16 -.18 June -. 11 -.12 -.11 -. 17 -. 18 - 11 -. 14 - 11 -.08 - 13 -.01 -.02 J u l y -.31 -.34 -.07 -. 11 -.29 - 22 -.06 - 12 -.09 - 13 -. 18 -.25 August -. 13 -.10 -.08 -.02 -.22 - 12 -.32 - 24 -. 17 - 17 -.21 -.26 September -.03 -.06 -.12 -. 10 . 18 26 -.06 - 08 -.05 - 09 -.04 - . 14 October . 13 .11 -.01 .05 .11 03 .28 18 -.07 07 .08 .01 November .20 .23 -.08 . 10 .41 19 . 12 19 . 15 04 . 17 .09 December .28 .29 . 11 .09 . 18 08 .00 01 .02 05 .22 . 17 82 Table 3.3: Station e f f e c t s on r a i n f a l l volumes, using ANOVA and Median decompositions. Year 1977 1978 1979 1980 1981 1982 Anova Median Anova Median Anova Median Anova Median Anova Median Anova Median - 2 . 4 6 3 . 0 0 - 2 . 8 9 6 . 6 1 -0.54 0 . 0 0 1 1 . 3 9 1 1 . 6 8 2 . 2 9 5 . 0 9 0 31 1.78 0 57 3 09 - 3 . 8 8 - 4 . 5 1 0 . 4 0 - 9 . 2 5 - 2 . 8 0 - 3 .07 - 2 61 - 2 . 2 6 - 2 02 - 0 11 0 . 5 5 0 . 4 8 - 2 . 4 8 0 . 3 8 - 0 . 5 4 - 0 . 3 6 -1 24 -1 . 9 6 - 5 t o -4 31 - 0 . 5 1 0 . 3 7 - 2 . 2 0 - 0 . 0 5 1 .29 1 12 4 81 6 . 3 1 -1 64 0 0 0 1 .05 - 0 . 3 7 - 8 . 0 1 - 5 . 6 9 1 .09 2 . 9 5 -1 09 0 . 0 0 1 99 3 93 - 3 ; 4 4 - 2 . 6 6 - 0 . 2 7 - 0 .74 - 2 94 - 1 . 2 9 - 2 28 - 2 31 - 1 . 2 8 - 2 . 0 3 - 2 . 0 9 0 . 0 5 - 0 . 5 2 - 0 . 4 3 -1 86 -1 .97 - 4 0 5 - 2 40 3 . 8 4 5 . 6 3 6 . 4 5 4 . 18 - 0 . 5 8 0 37 2 31 1 . 42 2 17 1 89 2 33 3 . 7 3 10 53 11 20 Table 3.4: Monthly e f f e c t s on r a i n f a l l volumes, using ANOVA and Median decompositions. Year January February March Apr 11 May •June J u l y August September October November December 1977 Anova Median 1978 1979 1980 1981 1982 Anova Median Anova Median Anova Median Anova Median Anova Median - 9 . 2 1 - 1 0 . 6 2 - 9 . 0 8 - 7 . 2 0 4 . 4 1 2 . 7 5 2 . 7 4 1 .72 - 1 0 . 0 9 - 7 . 2 4 2 . 4 6 - 1 .20 - 5 . 3 0 - 2 . 6 3 - 0 . 5 0 - 3 . 1 1 7 . 2 5 6 . 6 9 9 . 9 3 1 0 . 5 0 4 . 0 4 3 . 2 5 3 . 3 4 4 . 5 8 1 0 . 1 6 9 . 6 7 - 2 . 6 3 - 2 . 1 0 - 3 . 4 7 - 0 . 6 7 - 3 . 5 7 - 1 . 8 6 8 . 9 8 1 2 . S O - 4 . 6 0 - 5 . 5 7 - 0 . 3 4 - 0 . 9 8 0 . 5 7 1 .43 - 1 . 7 1 - 0 . 2 0 - 2 . 3 0 1-69 - 4 . 0 4 - 5 . 2 7 6 . 3 5 5 . 1 4 7 . 4 7 9 . 3 8 -1 . 9 3 - 0 . 7 8 0 . 2 5 - 1 . 2 1 - 5 . 1 1 3 . 2 4 3 . 6 4 - 5 . 6 0 1 . 98 - 1 0 . 1 5 1 0 . 6 6 1 .25 1.94 1.54 - 0 . 2 4 0 . 4 6 - 3 . 6 8 - 3 . 0 3 1 .64 - 0 . 4 5 1 . 14 - 9 . 9 5 - 1 . 9 3 - 3 . 7 0 3 . 5 7 3 . 3 4 3 . 9 7 0 . 3 1 - 1 . 9 6 2 . 3 1 - 2 . 0 9 2 . 7 2 - 1 . 0 6 - 5 . 9 5 - 3 . 6 9 - 4 . 3 2 3 . 3 4 2 . 3 8 4 . 6 7 0 . 0 6 - 2 . 3 1 2 . 6 4 - 1 . 8 1 1.77 - 0 . 6 4 - 6 . 5 4 - 8 . 9 5 2 . 3 0 - 4 . 8 1 - 1 . 0 8 - 0 . 8 4 - 0 . 4 1 4 . 7 7 2 . 6 5 4 . 4 8 5 . 9 4 - 2 . 0 9 - 1 . 9 8 -7 .99 3 . 6 1 - 4 . 2 6 - 1 . 0 3 - 0 . 5 4 - 1 . 6 6 10 . 10 6 . 2 6 4 . 6 1 5 . 3 0 - 1 . 18 0 . 5 1 3 . 14 2 . 4 4 - 0 . 6 2 - 1 . 6 6 - 1 . 9 4 1.41 1.38 0 . 4 5 - 1 . 6 3 - 3 . 0 1 2 . 13 - 1 . 9 3 1 .80 1.11 - 1 . 7 0 - 2 . 0 4 - 2 . 7 0 1.64 1.86 1 . 6 0 - 0 . 9 8 - 3 . 4 1 0 . 7 5 - 3 . 6 7 83 (a) Station 020b ( I l l i n o i s . I l l i n o i s ) (b) Station 044a (Ithaca,NY) (c) Station 048a (Brookhaven,NY) (d) Station 065a (Penn State,PA) Figure 1.1: F i e l d pH. 84 II x > « « » x * x* 11 0 109.) 14 0 IB? 9 719.0 ours I X I O 1 i * K K II X « XX X X X * V « II II « II II 36 5 19.0 l « 0 197.9 , H I D 741 '. Oflrs ixio' i (a) Station 020b ( I l l i n o i s , I l l i n o i s ) (b) Station 044a (Ithaca,NY) X »x x xV x X XX * 1 1 » XX "if/ "A H* -X I , *«b**x/x * " XX "V XXX xV X X X X „ «x* 99.9 110 14 0 117.3 719.0 Oflrs I X I O 1 I 2* X X x* X x X X* X . X X x " * „» V * » X »x X* * ' 'X I XX X X » x * xx\ » \ «" xx , / x „ "x *« « 0 197.9 719.1 7)3 9 DATS IXIO1 I (c) Station 048a (Brookhaven,NY) (d) Station 065a (Penn State,PA) Figure 1.2; R a i n f a l l volumes. 85 Figure 2.1 : The pH monitoring networks. 86 (b) 1978 Figure 2.2; Phase and amplitude of seasonal pH variation: (a) 1977; (b) 1973; (c) 1979; (d) 1980; (e) 1981; (f) 1982. The direction of the ray for each station indicates the time of the year of lowest pH. The length of the ray indicates the amplitude of the seasonal variation. 87 (c) 1979 (d) 1980 88 89 0_ UJ a o ^ 1 -no j o s l I^ GTO 1B2.S m~0^ 255.5 292~0 J 2 B . 5 3 6 5 . 0 TIME LAG (day) Figure 2 . 3 : The estimates of the unsmoothed temporal variogram obtained for data of station 072a, in 1982. 90 (0 0 . 1 BO. O S . 04 ft) - 0 . 0 2 - 0 . 0 5 0 . 24 - 0 . 0 3 0 . 44 - 0 . 01 C . 19 o - 0 . 0 5 0 . 0 2 0 . 24 0 . 21 ftj 1 - 0 . 0 6 0 0 . 0 2 - 0 . Of l - 0 9 - 0 . 0 8 . - 0 . 18 - 0 . 0 5 - 0 . 0 2 - 0 . 0 8 - 0 . 0 8 t 1 - 0 . 0 6 - 0 . 0 8 - 0 . 0 9 - O . - Q B 0 8 0 . 2 8 10 1 - 0 . 0 5 - 0 . 01 - 0 . 0 8 <D 1 - 0 . 0 8 O | i l 1 i l ! 1 c 2 4 6 8 10 12 14 16 long, d i f f Figure 2.4 : The estimates of the unsmoothed s p a t i a l variogram for d i f f e r e n t station p a i r s . LONGITUDE and LATITUDE (in degree) denote the difference i n longitude and i n la t i t u d e , respectively, between a station p a i r . A l l years of data combined. o. ea. op. oi o. 01 -o.oi o.oi o. 01 -o.oi - 0 . 0 2 0 . 0 3 0 . 01 0 . 04 o . 0 6 I CD I GQ I o I 0 . O P - 0 6 -o. 0 5 - 0 . 02 0 . 01 --0. 0 5 - • . 0 5 - 0 . 0 3 - 0 . 0 5 - 0 . 0G - 0 . 0 E 0 4 - 0 . 0 2 0 . 03 -o. 04 - 0 . 0 4 J I I 1 I I L 0 2 4 6 8 10 12 14 16 long. d lFf Figure 2.5 : Same as above, data from Station 020b removed. 91 -95 -90 -85 -80 -75 -70 -65 longltude (a) Year 1979. longitude (b) Year 1980. 92 -95 -90 -85 -80 longitude) -75 (c) Year 1981. -70 o in oo ID 9 U CD m to «» m -95 -90 -85 (d) Year 1982. -80 longltudo -75 -70 Figure 2.6 ; Contour maps for pH. '*': denotes the location of station. -65 93 013a 9 2 (1)ANOVA decomposition 2 8 2 1 0 9 2 1 8 9 9 0 2 1 020b i i 1 1 043a 044a 048a 057a STATION 065a 072a 171b (2) Median decomposition 8 7 9 9 9 0 2 1 9 3 0 8 013a 043a 044a 048a STATION 057a 065a 072a 171b Figure 3.1; Station e f f e c t s (pH l e v e l s ) . Legend; 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982. 94 8 0 2 § 2 7 8 7 9 2 9 8 7 1 — i — Jan. — I 1 1— Feb. Mar. Apr. — I — May June July Aug. MONTH (1) ANOVA decomposition Legend: 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2 U J in 2 7 2 f 8 Jan. Feb. Mar. Apr. May June July MONTH Aug. (2) Median decomposition Figure 3.2: Monthly e f f e c t s (pH l e v e l s ) . 95 9 2 1 8 0 7 2 2 ? 9 8 0 8 8 9 (1) ANOVA decomposition , i — i 1 1 i " 1 1 013a 020b 043a 044a 048a 057a 065a 072a 171b STATION Legend: 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982 <->o 0 2 9 2 0 . ° ? 2 I 1 h ? 8 0 0 7 3 (2) Median decomposition 1 1 1 1 1 1 1 i i 013a 020b 043a 044a 048a 057a 065a 072a 171b STATION Figure 3.3: Station effects (rainfall volumes) 96 8 9 8 7 8 0 7 0 2 9 8 1 9 1 2 7 2 1 9 2 8 0 8 7 0 1 T T T" ^ ^ BepT Oct. Nov. Dec. Jan. Feb. Mar. Apr. May M 0 N T H (1) ANOVA decomposition Legend: 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982 9 8 8 0 1 7 0. 1 1 9 I Q 9 3 2 2 3 g 8 Q g o z l R ? 8 9 n S • 7 J • i ? • 9 9 2 2 0 1 1 1 1 1 1 1 1 1 1 1 i Jan. Feb. Mar. Apr. May June July Aug. Sep. Oct. Nov. Dec. M 0 N T H (2) Median decomposition Figure 3.4: Monthly e f f e c t s ( r a i n f a l l volumes). 97 r-i" 0-Figure 3 . 5 : Scatter' p l o t of adjusted volumes and adjusted pH l e v e l s , a l l data combined. P - I u * *** xl* xx*" x X ' X X X ". 1 *." * x . x x ,» x 1 x a CO CO o . -30 -20 -10 0 10 i — 20 I — 30 40 — r * 50 — r 60 -1 70 VOLUME ADJ (mm) (1) Station 013a a ° (X o ' > • > > -30 -20 -10 10 20 30 VOLUME ADJ (mn) I a. (2) Station 020b • * 11 >. , 1 1—• 1 " f -30 -20 -10 0 10 20 30 VOLUME ADJ (mm) 40 50 60 70 1 1 1 1 40 50 60 70 Figure 3.6: Scatter plots of adjusted volumes and adjusted pH levels for each station separately. 99 03 O " (3) Station 043a o * o o CE •-0_ x x x x x x x x x X* x x , S X X * * * " * * * " »x „x x x X ' - X x O . O . I -30 -20 -10 10 20 VOLUME ADJ (mm) (4) Station 044a tn, cc # X Xy K ' * « *« x * * X x y X o - l CO - I — 20 -30 -20 -10 T " 0 10 VOLUME ADJ (mm) 100 (5) Station 048a Q o o_ P- I oo o . XX X X X „ X " X X! < „ w I J x X X X " X K x x x — 1 1 10 20 VOLUME ADJ (mm) I 30 -30 -20 -10 (6) Station 057a cr o ' Q - l CO o . x X X X * X X x xx -30 -20 -10 "T 0 -1— 10 20 30 VOLUME ADJ (mm) 101 (7) S t a t i o n 065a 2 -cr o " o - l CO o . X M * X „. ' X X if x X It * „ X « w x «x * * » • "* " x M X » * X X « —r 30 -30 1 — -20 - 1 — -10 10 20 VOLUME ADJ (mm) (8) S t a t i o n 072a o " Q CX X x i * x x xx. X X x X X o . CO o . 30 -30 -20 -10 10 I 20 40 VOLUME ADJ (mm) 102 P L O T OF R E S I O R N K ' R E S I D L E G E N D : * • 1 O B S . B - 7 O B S . E T C . A AB BAA A C S F B F GA B J GG I E E L GOE CH* BL FK MB GCE CHB HE AGC CF DAE B F AE E (5) Station 048a - 0 . 8 0 . 3 R E S I O I . S R A N K 1 . 0 0 R 0 . S R 0 . 0 I A B L - 0 . 3 E R E - 1 . 0 S I 0 - 1 . 5 A A » AA A CAA DCB AA E B F A C G A GO f C CHC A C F 8 CGF - I C B F F B F F 8 K B FOA B E ADA 0 . 5 0 R E S I O (6) Station 057a 105 PLOT OF RESIDRNK'RESIO LEGEND: * • 1 OBS. B • 2 OBS, ETC. A C AAA AA B ACB CBA OA CCC BFA DE BDF a G AK DI AGS GO f BDB DA CAAA BA 3.0 -2.5 -2.0 -I.S -1.0 -O.S 0.0 0.5 1.0 RES1D (1) Station 013a I.S 2.0 I.S 3.0 3.5 4.0 A.S N K 1.0 F 0 R O.S B 0.0 L -0.5 E -1.0 s I 0 -I.S BCC ABBD GO AFE CK JC HO GGB HO GI IK FF EH KC BJC H EH DCD 00 CF ADB CAA BC -2.0 -2.S -2.0 -I.S -1.0 -0.5 0.0 0 5 1.5 2.0 2.5 3.0 3.5 A O 4.5 (2) Station 020b Figure 3.7; Normal plots for each station ( a l l years) separately. 103 PLOT OF RESIORNK'RESID LEGENO: * • 1 OBS, B • 2 OBS. ETC. I F • 0 R V 0 • 1 L E R E - 2 • S I 0 -3 • FHL AILE MJ DOE AFGF EFGO GPO HOI IOI HSJ — -CRO JPK OPC OOK HO BEE F B AAAAA AB B A (3) S t a t i o n 043a 0 . 6 1 8 RESIO A AA B A B E -a • s I BBC BF FFCAO OKOO KOA DOOC CI I ID DOKF EIF1B AN I CEDFN BFBIE AEBDF CCFAC BOAA ABC I - J . 5 - 2 . 0 - 1 . 5 - 1 . 0 - 0 . 5 0 . 0 RESID (4) S t a t i o n 044a 104 P L O T OF R E S I O R N K ' R E S I O L E G E N D : A • 1 O B S . B • 2 O B S . E T C . I 4 • A A C C B B C i I B -1 • L E R E -2 • S I I D CDOC ECHO FOE I I L F F K K B HFHG HRG B F C I F E J H O OP F H I A E G F B D J BD AAAAB -3 • I -4.0 " o ! 9 " " " i . o " " - » . 5 " " - 2 . o " - I.S -< 0 -0.5 0.0 0.5 1.0 1 5 2.0 2.5 3.0 3.5 R E S I O (7) S t a t i o n 065a a A A C S B C AF B E FC A E D DFA A F D DI F D C KD E L GE D E C GCB BCF AGB C E A E B BBA B B B S AC ABA B AA AA A -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 I . S 2.0 2.5 3.0 3.5 4.0 4.5 (8) S t a t i o n 072a 106 PLOT OF RESIDRNX'RESID LEGENO: * • 1 OBS, B • 2 OBS. ETC. R N 1.0 c c AC D CC 0.0 V » R I A 8 -0.5 L E R -1.0 E S I 0 -1.5 AF 8 CA BC CB AC BAB A I . . • • • • * " 0 * 0 - a ! s - 2 * 0 - V . l " -1.0 .-0.5 0.0 0.5 1.0 I.S 2.0 2.5 3.0 RESIO (9) Station 171b 107 PLOT OF RESIDRNK'RESIO LEGEND: 4 • 1 OBS, B • 2 OBS. ETC. * 0.0 •-L -0.5 E R E S -1.0 • I 0 -i.a - I . J (1) 1978 0.6 RESIO F 0.5 0 R A 0.0 • R I L -0.5 • E R E S -1.0 • I 0 AA A 0.4 RESID (2) 1979 Figure 3.8: Normal plots for each year separately at station 020b. 108 PLOT OF RESIDRNK'RESID LEGEND: A • 1 OBS. B • 2 OBS. ETC. I 2.S • R A N 1.0 AA A A BA F 0 0.9 R B -0.5 L E R -t.O E S I 0 -I.S B CA F BC CA AC ABA BA BAA BA A -3.0 • I (3) 1980 N 1 .0 K F 0 0.5 R V A 0.0 B -0.5 L E R -1.0 E S 1 0 -1.5 AB BA C BA BC DA CA -I 0 RESID (4) 1981 109 PLOT OF RESIDRNK'RESID LEGEND: A • 1 OBS, B • 2 OBS. ETC. F 0 0.9 R A A AB AE C A BB BA R -1.0 E S I D -1.5 -3.0 • I (5) 1982 1.5 RESID 110
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Statistical analysis of the temporal-spatial structure...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Statistical analysis of the temporal-spatial structure of pH levels from the MAP3S/PCN monitoring network Le, Nhu Dinh 1986
pdf
Page Metadata
Item Metadata
Title | Statistical analysis of the temporal-spatial structure of pH levels from the MAP3S/PCN monitoring network |
Creator |
Le, Nhu Dinh |
Publisher | University of British Columbia |
Date Issued | 1986 |
Description | The approach developed by Eynon-Switzer (1983) to analyze the spatial-temporal structure of a data set obtained from the EPRI monitoring network is applied to a data set obtained from the MAP3S/PCN monitoring network. In this approach, a spatio-temporal stochastic model, including deterministic components for seasonal variation and rainfall washout, is fitted to the data. The results indicate that the model fails to capture some of the features of the underlying structure. In an effort to identify an appropriate model for the data, we examine the raw data in detail. An ANOVA model is fitted to the data. Different criteria such as Akaike, Schwarz, Mallows, etc, are used to identify the 'best' submodel (i.e. eliminate some terms in the full ANOVA model). The results indicate that it is possible to capture the deterministic component of the model with a much smaller model (i.e. fewer parameters). The normality of the residuals is also examined. The results indicate that the data from all stations except one can reasonably be approximated as coming from normal distributions. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-06-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0096703 |
URI | http://hdl.handle.net/2429/25884 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1986_A6_7 L39.pdf [ 5MB ]
- Metadata
- JSON: 831-1.0096703.json
- JSON-LD: 831-1.0096703-ld.json
- RDF/XML (Pretty): 831-1.0096703-rdf.xml
- RDF/JSON: 831-1.0096703-rdf.json
- Turtle: 831-1.0096703-turtle.txt
- N-Triples: 831-1.0096703-rdf-ntriples.txt
- Original Record: 831-1.0096703-source.json
- Full Text
- 831-1.0096703-fulltext.txt
- Citation
- 831-1.0096703.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0096703/manifest