STATISTICAL ANALYSIS OF T H E TEMPORAL-SPATIAL S T R U C T U R E OF pH L E V E L S F R O M T H E M A P 3 S / P C N MONITORING NETWORK by N H U DINH L E B.Sc, The University of British Columbia ,1984 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S FOR T H E D E G R E E OF M A S T E R OF SCIENCE in T H E F A C U L T Y OF G R A D U A T E STUDIES (The Department of Statistics) We accept this thesis as conforming to the required stj>adprd T H E UNIVERSITY OF BRITISH C O L U M B I A August, 1986 © Nhu Dinh Le, 1986 In p r e s e n t i n g this thesis r e q u i r e m e n t s f o r an of British it freely available agree that in partial advanced degree a t Columbia, I agree that for reference permission by understood that for h i s or be her copying or f i n a n c i a l gain Department o f statistics The of University Vancouver, V6T 1Y3 Date E-6 (3/81) British Canada August 20, 1986 University Library s h a l l make and study. I the publication be of further this this my It is thesis a l l o w e d w i t h o u t my Columbia thesis head o f representatives. s h a l l not the the g r a n t e d by permission. 1956 Main M a l l the f o r extensive copying of f o r s c h o l a r l y p u r p o s e s may department or f u l f i l m e n t of written ABSTRACT The approach developed by Eynon-Switzer (1983) to analyze the spatial-temporal structure of a data set obtained from the E P R I monitoring network is applied to a data set obtained from the M A P 3 S / P C N monitoring network. model, In this approach, a spatio-temporal stochastic including deterministic components for seasonal varia-tion and rainfall washout, is fitted to the data. The results indicate that the model fails to capture some of the features of the underlying structure. In an effort to identify an appropriate examine the raw data in detail. data. model for the An ANOVA data, we model is fitted to the Different criteria such as Akaike, Schwarz, Mallows, etc, are used to identify the 'best' submodel (i.e. eliminate some terms in the full A N O V A model). The results indicate that it is possible to capture the deterministic component of the model with a much smaller model (i.e. fewer parameters). The The normality of the residuals is also examined. results indicate that the data from all stations except one can reasonably be approximated as coming from normal distributions. ii TABLE OF CONTENTS ABSTRACT tt TABLE O F CONTENTS tit LIST O F T A B L E S v LIST O F F I G U R E S vt ACKNOWLEDGEMENT vtt 1) I N T R O D U C T I O N 1 1.1) Inference for Spatial-Temporal Data 1 1.2) Data Set Description 3 1.2.1) p H levels 4 1.2.2) Rainfall volumes 5 1) E Y N O N - S W I T Z E R A P P R O A C H 7 2.1) Introduction 7 2.2) The Eynon-Switzer Model 8 2.3) Preliminary Fitting of The Model 10 2.4) Spatial and Temporal Components 17 2.4.1) Fitting the temporal variogram 18 2.4.2) Fitting the spatial variogram 21 2.5) Fitting Without Station 020b 24 2.6) Kriging With a Different Spatial Variogram 26 2.7) Conclusion 29 ' 3) D A T A E X P L O R A T I O N 31 3.1) Factors to Be Examined 31 3.2) Preliminary Examination of The Structure of The Data 3.2.1) Relationship between p H and Month, iii . 33 pH and Station 34 3.2.2) Relationship between Volume and Station, Volume and Month 3.2.3) Relationship between p H and Volume 3.3) A n A N O V A Model 39 44 46 3.3.1) Fitting the A N O V A model 47 3.3.2) Deterministic component 52 a) Multiple correlation coefficient 54 b) Adjusted R 55 c) Mallows' C 2 p statistics 3.3.3) Stochastic component 3.4) Conclusion 55 60 63 4) C O N C L U S I O N 65 REFERENCES 67 iv LIST O F T A B L E S TABLE 1.1 Site location: The M A P 3 S / P C N monitoring network 69 1.2 Summary of data 70 2.1 The fitted constant coefficients obtained by fitting the Eynon-Switzer model 71 2.2 A N O V A and median decompositions on a, b, c, and d 72 2.3 The ratio of residual SS's of the corrected and uncorrected series of p H levels 76 2.4 Sums of squares errors for different models 77 2.5 The results of fitting the temporal variogram Vp(w) to the data, where Vp(w) = gi/(l 2.6 + [w /g ]~ ) 2 tj) — pH(xi, tj>)) ] 2 2.8 The estimated value of or 3.1 Station effects, using A N O V A and Median and M E D I A N 82 Monthly effects, using A N O V A and Median decompositions (pH levels) 82 Station effects on rainfall volumes, using A N O V A and Median decompositions 3.4 80 81 decompositions (pH levels) 3.3 79 The residuals and effects obtained by decomposing the estimated coefficients g! using A V E R A G E 3.2 78 1 Temporal variogram estimated with data of each year and each station using Vp(w) — ave[^(pH(xi, 2.7 2 83 Monthly effects on rainfall volumes, using A N O V A and Median decompositions v 83 LIST O F F I G U R E S y FIGURE 1.1 Field p H 84 1.2 Rainfall volumes 85 2.1 The p H monitoring networks 86 2.2 Phase and amplitude of seasonal p H variation 87 2.3 The estimates of the unsmoothed temporal variogram obtained for data of station 072a, in 1982 2.4 90 The estimates of the unsmoothed spatial variogram for different station pairs 91 2.5 Same as 2.4, data from station 020b removed 91 2.6 Contour maps for p H 93 3.1 Station effects (pH levels) 94 3.2 Monthly effects (pH levels) 95 3.3 Station effects (rainfall volumes) 96 3.4 Monthly effects (rainfall volumes) 97 3.5 Scatter plot of adjusted volumes and adjusted p H levels, all data combined 3.6 98 Scatter plot of adjusted volumes and adjusted p H levels for each station separately 99 3.7 Normal plots for each station separately 103 3.8 Normal plots for each year separately at station 020b 108 vi ACKNOWLEDGEMENTS I would like to thank Professor A. J. Petkau for his guidance and assistance in producing this thesis. I would like to express my gratitude to Professor J. V. Zidek for his careful reading and helpful comments on this work. I am indebted to Drs. A. Olsen and C. Watson of Batelle Pacific Northwest Laboratory for making the M A P 3 S / P C N data set available to me. I also would like to express my appreciation to the U B C Acid Rain group, particularly Professors P. Guttorp and P. Sampson, for extensive input during the course of this work. This work was partially supported by the Environmental Protection Agency through a co-operative research agreement with SIAM's Institude for Mathematics and Society (SIMS) and by the Natural Sciences and Engineering Research Council of Canada. vii 1) I N T R O D U C T I O N 1.1) Given Inferences for S p a t i a l - T e m p o r a l D a t a a data set of monitoring sites, inferences about (pH levels in this case) obtained from a it is often necessary in environmetrics events at nonmonitored sites. to set make The solution of this problem usually involves some interpolation techniques (i.e. finding the 'best' estimates for the data at nonmonitored sites). Kriging (Delhomme 1978) is a simple method for interpolation to obtain contour maps over the whole region which contains the monitoring sites. However, the spatial covariance function for the whole region is required in order to do kriging. The spatial covariance function can possibly be modelled by using the estimated values of spatial covariances between pairs of monitoring sites, which in turn depend on the spatial-temporal stochastic model used to explain the raw data. Successful use of this scheme for inferences concerning spatial-temporal data requires an appropriate spatial-temporal stochastic model for the data. l In chapter 2 of this thesis, the approach followed by Eynon and Switzer (1983) is applied to the data set obtained by the MAP3S/PCN monitoring network. The results indicate that the spatial-temporal stochastic model developed for the variability of pH levels in their paper fails to capture some of the features of this independent data set. However, for the sake of illustration, we carry the approach through to its logical conclusion and obtain by kriging, contour maps for this data set. In an effort to identify a more appropriate spatial-temporal stochastic model for the data set, the underlying structure of the deterministic components of the data are carefully examined in chapter 3. The results show that there is some seasonal structure in the pH levels. The pH levels are not influenced by the rainfall volumes which show no seasonal structure of their own. A saturated ANOVA model is fitted to the pH levels and the importance of each component of the model is examined. The results suggest that the deterministic components of the pH levels can be captured by a simpler model than the saturated one. Many criteria for comparing different models are also described. Many of the models and methods of analysis employed in environmetrics are based on assumptions of normality; therefore, the structure 2 - of the residual stochastic component is examined by simple normal probability plots. The results indicate that the data from all sites but one have residual variation which can be reasonably approximated as being normally distributed. 1.2) Data Set Description The MAP3S/PCN network has nine monitoring stations which are located in the northeastern part of the United States. Data were collected for each day on which rainfall occurred from 1976 to 1982; one data record was collected per storm (i.e. event data). However, the stations started collecting data at different times; only data obtained at stations which were active for the entire year are considered (i.e. data obtained in the first year of operation, but with start-up time much later than January 1 st of that year are ignored). About five percent of the data are missing (i.e. there was a storm but no data were reported for the pH level or volume); missing data are ignored in this work. As in the Eynon-Switzer and analysis, only (field) pH level Volume measurements are used although the data set provides much more information (e.g. Conductivity, Sulfite, Sulfate, H,...). The + variables considered are Time (t, t=0 on Jan 1**, 1976 and the unit is a day), Volume (rainfall volume was obtained from a raingauge and 3 measured in millimeters) and p H level. T h e station locations, the first active dates, and the given in Table 1.1. the averages and A D S (Acid Deposition System) site identity are Table 1.2 provides the number of rainfall events and standard deviations and rainfall volumes for each station of the in each corresponding p H levels year. 1.2.1) p H levels T h e standard deviations of the p H levels are not entirely from year to year and from station to station. consistent T h e standard deviation of the data at station 020b (Illinois, Illinois) is consistently larger than that at other stations; this excess variability is easily seen in the plots of the raw data (Figures exhibits a large data-value the (a standard a value standard data Station 065a (Penn State, Pennsylvania) deviation (.61) level of 7.31 this station without consistent (Table corresponding to in 1980 in December deviation calculated which is perfectly other years at The pH l.l.a-l.l.d). 1980; due to see an outlying Figure l.l.d); this single p H level is with the standard deviations .30, of 1.2). four stations (020b, 044a, 048a, 065a) are plotted in Figure 1.1 ( p H levels against T i m e ) . These four plots do not demonstrate any obvious long-term relationship between p H levels 4 and Time; of course, seasonal patterns may be obscured by the severely compacted Time axis. However, the plots do show differences in the data corresponding to different stations. Station 020b (Figure 1.1.a) has a large variability in its pH levels; station 048a (Figure l.l.c) has distinctly smaller variability. Stations 044a and 065a (Figures 1.1.b and l.l.d) display the same overall pattern of relatively small variability. One difference is that there are some possible outliers in the data obtained at station 065a. The plots of the data of other stations (not included in this report) are similar to one of the above figures. 1.2.2) Rainfall volumes The average rainfall volumes vary considerably from year to year and from station to station (Table 1.2). The average rainfall volumes in 1979 are greater than those of other years at all stations but stations 044a and 048a. The standard deviations of the rainfall volumes are large (Table 1.2), and vary greatly from year to year and from station to station. At most stations the standard deviations in 1979 are exceptionally large relative to those of other years; for example, at station 013a, the standard deviation is 29.49 in 1979 while those in other years are less than 15.28. Rainfall volumes are plotted against time for each station separately to examine whether there is any 5 obvious pattern in the data. The resulting plots do not suggest any obvious long-term relationship between the rainfall volumes and Time. Overall, all plots are similar; the plots of four stations (020b, 044a, 048a, 065a) are presented in Figure 1.2. rainfall volumes are clearly demonstrated The great variabilites in the in all plots. possible outlier at station 020b in August of 1979; There is one the rainfall volume is 157.14 mm (the rest are less than 90.00 mm) and the corresponding pH level is 4.22. 6 2) E Y N O N - S W I T Z E R 2.1) APPROACH Introduction One general objective of the study of rainfall acidity is the determination of 'best' estimates of a spatial process describing geographic fluctuations in the acid levels of rainfall at some unobserved locations using the observed data obtained at a limited number of stations monitoring rainfall acidity. These estimates would allow consideration of the optimal design of networks to monitor the long term and potentially damaging effects of the components of acid rain. An approach for and Switzer (1983). finding these estimates is provided by Eynon In their paper, a parametric model is developed using data obtained from the Electric Power Research Institute (EPRI) network. A spatio-temporal stochastic model, including deterministic components for seasonal variation and rainfall washout components for spatial, temporal, and measurement 7 and stochastic variation, is fitted to the data. unbiased Using the method known as Kriging, estimates (BLUE) of seasonal and rainfall the best linear adjusted yearly average p H over the monitoring region are obtained. It should be noted that Eynon and Switzer use the data set obtained by the E P R I network which also has nine stations, each with two years of data. The geographical locations of stations in both networks are presented in Figure 2 . 1 . Although the geographical locations of the stations in the two networks differ, both networks cover essentially the same area. One difference is that rainfall volume is measured in inches in the E P R I network. In this chapter, we present the results of an attempt to validate the Eynon-Switzer model using the data obtained by the M A P 3 S / P C N monitoring network. The attempted validation follows the Eynon and Switzer paper in a step-by-step fashion. We also present some of the corresponding results from their paper using data obtained from the E P R I network. 2.2) The Evnon-Switser M o d e l Eynon and Switzer propose the 8 following model under which a single year's data are analyzed : -log H+{x,t)=pH{x,t) l0 i \ nr.\ t \ • 27rt = <*(x) + P(t) + a{x) , sin — dob 2irt + b (x) cos c-I(x.t) — o65 . ,, +'°s.o1_exp[_c.;(I>i)l+^.o, where x denotes the two-dimensional location under consideration, t denotes the number of days starting from January 1**, 1976j I(x,t) denotes the observed rainfall in millimeters at location x on day t (Eynon-Switzer used inches in their analysis), pE(x,t) denotes the observed rainfall pH at location x on day t, given I(x, t) > 0, H (x,t) + denotes the hydrogen-ion concentration at location x on day t in gram-atoms per litre, a(x) is a stationary auto correlated spatial process describing the geographic fluctuations in annual pH level, P(t) is a zero-mean stationary autocorrelated time process describing temporal fluctuations not attributed to seasonal variation, independent of a(x), e(x t) } is a zero-mean stationary white-noise measurement error process, independent of a(x) and /?(£), a(x), 6(x)are location-specific model constants which describe seasonal fluctuations, and c is a washout rate constant. The correction for rainfall volume (i.e. the logarithmic term in the pH model) represents the simplest scavenging process in which the number of hydrogen ions picked up per unit time per unit volume of rain is proportional to the remaining atmospheric concentration. This process produces lower acidity (higher pH) for larger rainfall volume. The limit of the logarithmic term is zero as the rainfall volume tends to zero. Since /?(£) and e(x,t) are zero-mean residual processes, a(x) may be interpreted as the seasonally adjusted maximum acidity potential at location x. 2.3) Preliminary Fitting of The Model In this section all parameters in the model are estimated by using the B M D P statistical computer package; the program used is Derivative Free Non-Linear Regression, P : A R . For each of the study years, the separate fitting of the data to the model proceeds in several stages. First, the constant coefficients a(x),b(x),c(x) least squares separately at each station location are fitted by non-linear i=l,...,9. The model used for the data (corresponding to events at times tj in the study 10 year under consideration) being fitted in this step is: pH(x{, tj) = a(x,) sin 2lrfy 365 + b(xi) cos 2nt 3 365 +d{xi) + + log 10 c(xi) • I(Xi,tj) 1 - e x p [ - c ( x ) • I[xi,tj)] t e(xi,tj). T h e values of the fitted parameters a(x,), 6(x,), c(x,), d(x,) for each year, and each station are presented in Table 2.1. negative values of e's at stations Note that there are some 020b and 057a. These values should always be positive in principle since c is defined as a washout However, the magnitudes of these values are always very small. rate. These negative values could possibly be the result of the round-off errors in the nonlinear fit of the model. Figure ted lated 2.2 seasonal as shows curves \fcfl + S, Although occur and amplitudes year and phase 2 curves phases for each d sin | | | + 6 cos |||. seasonal the is most separately. the value of the in the summer, of the nine Amplitude of t which minimum fit- is calcuminimizes points of the the amplitudes and the times at which the m i n i m u m occurs for any station do not show great consistency f r o m year to year. presented in Tables 2.2.1 and the M E D I A N This and 2.2.2 fact is supported by the obtained by applying the results ANOVA P O L I S H I N G decomposition techniques to the 11 fitted coefficients a ( i , ) , similar problem. show the same Results in the Two figures Eynon-Switzer paper suggests (similar to Figure 2.2) in their a paper inconsistency described above. T h e results obtained by applying the techniques described above to the fitted coefficients c(a;,), d(x,) are also presented in Tables 2.2.3 2.2.4. E a c h table contains two parts. In Part (a) the average is used to do the decomposition. T h e coefficients are considered as the in a two-way table (yearxstation). the s u m of overall average, decomposition, then, example, the the average row year and as station effect, year effect and residual. T h e effect the entries E a c h entry could be represented is done by using an (row) and is taken overall to average. ANOVA-type be the In Part approach; for difference (b) between the Median Polishing approach is used to do the decomposition. E a c h entry in the yearxstation row median. table is replaced by the difference between itself and T h e median of these row overall effect. T h e year effect is taken the row median and the overall effect. be the column median of the difference between the entry new medians to be is taken the The of the new table be the difference between T h e station effect table. to the is taken residual is then and the station to the effect; the s u m of these effects and the residual results in the original entry of the two-way table. 12 In Tables 2.2.1 and 2.2.2 of estimated values of the seasonal coefficients a(x) and b(x), there are many large residuals. These values appear at different stations and different years without any apparent pattern. Some year effects are much larger, in magnitude, than others. This suggests a considerable interyear variation in the year effect The station effect behaves the same way. Therefore, we may infer that the periodic coefficients a(x) and b(x) are inconsistent from year to year, and from station to station. Note that if the proposed model fitted the data well enough, then one would expect the periodic coefficients to be consistent at least from year to year. This suggests that these data do not have the periodic behaviour postulated in the model. We also apply the above decomposition techniques to the yearxstation tables of phases and amplitudes. The results which are not included in this thesis support our earlier conclusion that the phases and amplitudes are somewhat inconsistent from year to year. It is not surprising that the estimated values of d(x) (decomposed in Table 2.2.4) show little consistency from year to year, and from station to station. The overall effect, d(x) t should be different from station to station and possibly also from year to year. Table 2.2.3 shows some consistency in the estimated values of c. 13 There is only one large residual (at station 065a, in 1980) which is greater (but not very much) than the other residuals. The magnitudes of the year effect are approximately the same. The station effect, however, shows some variation. There are two relatively large values of the station effect which correspond to stations 020b and 048a. This points to possible differences in station effects. Besides, some of the residuals corresponding to stations 048a and 065a vary considerably from year to year (and also from station to station). Therefore, estimates of a common washout rate c obtained by using the data from all of the stations in each year separately, may not be consistent from year to year in conflict with a suggestion in the Eynon-Switzer paper; this is now examined in more detail. Following Eynon and Switzer, we fit a common value of the washout rate parameter c across all stations for each year separately. Using the previous estimates of a(x,), 6(x ), d(x,) (different for each station), t we fit a common washout rate c for each year separately. The fitted values of the parameter c are: Year 1977 1978 1979 1980 1981 1982 c A .0405 .0528 .0189 .0555 .0304 .0489 se(c) .0050 .0060 .0037 .0067 .0095 .0049. 14 Here se(c) denotes the nominal asymptotic-theory standard error of c, based on independence assumptions on the residuals. As a rough check for significant year to year differences in these c-values, we apply the Newman-Keuls procedure. The c-values are assumed to be independent, with a common standard error of approximately .006 (this value is the root-mean square of the standard errors displayed above). While the assumptions giving rise to this procedure are not satisfied exactly, nevertheless, the results give us some rough indication of the behaviour of these c-values. As expected, there are some statistically significant differences (at the five percent level). Specifically, the c-value obtained for 1979 is significantly different from the c-values obtained for 1977, 1978, 1980, 1982; and the c-value of 1980 is also significantly different from the c-value of 1981. Note: The fitted values of the common washout rate parameter c given by Eynon-Switzer are: c = 0.92 for year 1 (Sept. 78 - Aug. 79), = 1.10 for year 2 (Sept. 79 - Aug. 80). These values correspond to rainfall volume measured in inches. To make these values comparable with our values of c (corresponding to volume measured in millimeters), we divide c by 25.4 (1 inch = 25.4 15 millimeters). c = T h e converted values of Eynon-Switzer's c are respectively 0.92/25.4 = .0362, and c = 1.10/25.4 = These values of c agree with ours. .0433. Specifically, they are about in the middle of our range of values of c. The combined corrections for rainfall volume (using a common washout rate c for each year) and seasonality reduce the overall variation of an annual p H series; the residual standard deviation of a series is reduced by 0 - 30 percent series. The ratios of sums from that of the corresponding uncorrected of squares of residuals for the corrected series to that of the uncorrected series are presented in Table 2.3. In some cases, essentially no reduction is obtained by using the combined corrections. T h e reductions corresponding to station 020b in different years are always very small. T h e results presented in Table 2.3 do not show any obvious relationship between the percentage and the stations or the of the reduction years. We note that according to E y n o n and Switzer, these corrections produce a 0 - 20 percent reduction of the residual standard deviation with their data. about In addition, they note that each correction separately contributes one-half of the reduction. T o see whether this is also true for 16 our data, we fit the full model and the reduced model in which volume correction is omitted (i.e. with only seasonal terms) the to eight different data sets. E a c h data set corresponds to one year at a specific station. the The results are presented residual sum of squares due to in Table the seasonal higher in some cases and lower in others the volume correction. separately year and correction is much than the reduction due It is not obvious here whether each to correction contributes about one-half of the reduction. 2.4) With T h e reduction in 2.4. the the Spatial and Temporal Components estimated estimated parameters a(x,), 6(x,) common washout for each rate station, c for each year, each the residual for the station at location x,- for collection day tj is given by: — • pH (xi, tj) = pH(xi, tj) - a(xi) sin ~ These pH(x{,tj) l o gio 2TT£J 365 . » . - 6(x.) cos v " 2ntj 365 C(X.) • I{Xi,tj) 1 - e x p [ - c ( x . ) • I(xi,tj)]' values are treated as estimates of the combined random components a(x,) + fi(t ) + e(zi, tj). 3 Define: V (u) a = E[a(x) — a(x + u)] /2 : the variogram of the spatial process, and 2 17 Vp(tv) — E\(3{t) — B(t + tu)] /2 : the variogram of the temporal process. 2 These quantities describe the geographic and temporal variation of daily 'corrected' p H readings. Eynon-Switzer indicate that for convenience, the following two simple parametric models for the variograms were fitted: V where <7i,/ii,<?2 a r e a { u ) = l + non-negative positive definite 2 x 2 [ u ' M - i ' constants and a at the origin. 2 is a symmetric matrix. The contours of the spatial variogram V (u) centered h are concentric ellipses The models satisfy the constraints defmiteness required of all variograms. of positive The above variogram models are also used to fit our data as described in what follows. 2.4.1) Fitting the temporal variogram; The corrected p H values, pH(xi,t ), 3 are used to estimate an un- smoothed variogram; l/2[pH(xi,tj) - pH(xi,t i)] 2 3 estimates Vp(w) + of , 18 for w = tj — tji, where o f - var(e(x,-, tj)). Note that Eynon and Switzer estimate o f by using duplicate pH readings. In fact, the replication variability was estimated separately at each station for each year. There were modest differences among these estimates but not so large as to warrant modelling o f as a function of location. In any event, their estimate, o f , turns out to be small relative to temporal and spatial variability, and a common value of o f = .019 was obtained with their data. In our data, since no replicate observations are available, o f is absorbed into the temporal variogram. So l/2[pH(xi,tj) — pH(xi,tj>)] 2 estimates Vp(w). All rainfall days separated by a lag w (at a fixed station, separately in each year) are used to estimate Vp(w). The scatter plot of the estimates of the unsmoothed temporal variogram at station 072a in 1982 is presented in Figure 2.3. All possible estimates are plotted; there are 1653 data points. Many of these estimates are near zero, but the variability of the temporal variogram estimates is very large. This pattern is, however, a typical one for all other plots of the temporal variogram estimates at different stations. Eynon and Switzer observe that the fitted value of the constant g 2 19 could not be distinguished from zero with their data. To see whether this is true with our data, we fit the variogram model i (_.2/ )-i + g2 to four different data sets corresponding to different combinations of year and station; the results are presented in Table 2.5. Our results support the nominal asymptotic conclusion of Eynon standard error and Switzer (that of the estimate the is always bigger than the estimated value). We would propose to let <j_ = 0. It may be inferred that the temporal variogram has a constant value gfi. Using the data from all years and all stations, the estimated value of gi is gi = .1287. In Table 2.6, different constants gi which are estimated using the data from each year and each station separately are presented. Applying the previous decomposition techniques to these values, we obtain the results presented in Table 2.7. There are some unreasonably large values of residuals corresponding to station 020b. The effect of station 020b is 0.31. This is very large relative to the other values whose magnitudes do not exceed 0.08. This suggests an unusual pattern in the data obtained at station 020b; recall the excessive variability of the data at station 020b noted earlier. values of gi = 0.805 in 1979 all others. Specifically, the at station 020b is very large relative to Except for these consistently large values of station 020b, the results do not show any obvious pattern between the remaining 20 values. Note: O u r value of g x of .1287 is bigger than the value of gi + o f = .079 + .019 = .098 obtained by Eynon-Switzer. this is that we appear to have outliers O n e possible reason for in our data. T h e excessive variability of the data at station 020b might be another possible reason. 2.4.2) Fitting the spatial variogram; T h e corrected p H values, pH(x{,tj) are used as follows to obtain the unsmoothed variogram: l/2[pH(xi,tj) - p~H(xi>,tj)] estimates V (u) 2 +Vp(0 ), + a where u is the vector joining station pairs (a;,-, £ , < ) . Vp(0 ) + comes into the expression because the time unit was previously chosen as one day, and two readings of p H levels at two stations on a same day (i.e. same tj) might not have the identical collecting times so that the time lag is always 0 . + l/2[pH(xi,tj) Of course, V (u) a Hence — pH(xi>,tj)] can be 2 — gi estimates V ( u ) . a estimated corresponding to two stations directly only with events recorded 21 for arguments u on the same day. With our data set, there are 2848 estimated values for the unsmoothed spatial variogram corresponding to two-dimensional vectors u joining the 36 station pairs. An attempt to fit V (u) to this data set failed a because the algorithm failed to converge. We note that Eynon and Switzer do not report any difficulties in fitting the spatial variogram. To find the estimated values of h\ and h 2 for our data set, we try another approach. For each station pair, the simple average of all the available estimates of the unsmoothed spatial variogram is considered as a new data point. The number of unsmoothed estimates entering this average is treated as a case weight. This new data set has 36 data points, with a case weight attached to each point. However, the attempt to fit V (u) a to this new data set also fails; each of three different algorithms, Gauss-Newton and Marquardt algorithms (SAS statistical computer package), and Derivative-Free Non Linear Regression (BMDP package), fails to converge to a solution. The plot of this new data set is in Figure 2.4. The plot, however, shows that the proposed spatial variogram model is totally inappropriate for our data set. There are many negative estimates possibly because 22 of the large estimated value of Vp(0 ) (i.e. gi = .1287). And there are + also some unreasonably large values of the unsmoothed spatial variogram estimates. These estimates are always from those station pairs where one of the two stations was station 020b (Illinois, Illinois). The difficulties encountered in this fitting may be due to these values. This is another indication of the atypical behaviour of the data obtained at 020b. station The plots of the unsmoothed spatial variogram estimates for each year separately (not included in this report) have approximately the same pattern as the above. The final step in the Eynon-Switzer approach is the interpolation step, called kriging. To carry variogram model is required. station 020b could be the out this step, an estimated spatial Since there have been indications that source of many problems that we have encountered in applying the approach to this data set, we will repeat the previous steps of the approach to the data set with station 020b removed. and h 2 This may allow us to find estimates of the parameters hi in the spatial variogram model used by Eynon and Switzer. Besides, we would like to find out how well this approach work if we remove station 020b. This will be done in the next section. 23 2.5) Pitting Without Station 020b In this section we examine whether we can obtain estimates of the parameters hi and h 2 from the data with station 020b removed. We repeat the previous steps of the Eynon-Switzer approach on the reduced data set and present the results in what follows. The estimates of parameters a,b,c,d in the Eynon-Switzer model are the same as before. Now we use these estimates to fit a common c value for each year separately. The fitted values of the parameter c are: Year 1977 1978 1979 1980 1981 1982 c .0405 .0622 .0297 .0654 .0492 .0530 old c .0405 .0528 .0189 .0555 .0304 .0489 se(c) .0050 .0054 .0033 .0060 .0044 .0049 where the notation is the same as in section 2.3. The estimated values of c for the years in which the data from station 020 are removed turnout to be larger than those obtained earlier; the largest increase, 0.0188, is in 1981. As before, we apply the Newman-Keuls procedure to check for significant year to year differences in these cvalues. An approximate common standard error of c is about 0.005. The result obtained in this step is generally similar to that in section 2.3 although there are differences. 24 Specifically, the c-value for 1980 is no longer significantly different from the c-value of 1981, and the c-value obtained in 1979 is significantly different from all the rest. We use estimated values of a, b and a common c to estimate the temporal variogram Vp as in section 2.4.1. As before, the fitted value of g 2 is negigible (i.e. the temporal variogram would have a constant value gi). Using all the data together (excluding station 020b), the estimated value of gi is 0.095; this value is comparable to the EynonSwitzer estimated value of ffi+of = 0.098. This supports our claim that our earlier estimated value of g\ is larger than Eynon and Switzer's because of the excessive variability of the data obtained at station 020b. Using this gi, we obtain the unsmoothed estimates of the spatial variogram as in section 2.4.2. Our attempt to fit the model for V a to these estimates fails (i.e. the nonlinear regression fit fails to converge to a solution for h\ and h ) using all the different approaches described 2 in section 2.4.2. To examine the unsmoothed estimates of the spatial variogram, we plot them in Figure 2.5. The points with unusually large values in Figure 2.4 do not appear in this plot; however, the proposed model 25 for the spatial variogram s t i l l does not seem to be appropriate. that there are still many estimated values of the spatial Note variogram which are negative. T h i s suggests that we may still be overestimating the t e m p o r a l variogram gi, although our estimated value of gi is now comparable to E y n o n and Switzer's. So the removal of station 020b f r o m the d a t a set has not resolved the difficulty in obtaining estimates of the parameters hi and h 2 in the E y n o n - S w i t z e r spatial variogram model. Consequently, the interpolation procedure called kriging, undertaken by carried out w i t h this spatial variogram. complete Eynon-Switzer could not However, to demonstrate approach of Eynon-Switzer, we consider an alternate be the spatial variogram model in the next section. 2.6) Kriging W i t h a Different Spatial "Variogram We have learned f r o m our earlier analysis that the model does not appear to be appropriate for this d a t a set. the interpolation Eynon-Switzer Consequently, step based on the residuals from this m o d e l seems inappropriate. However, to demonstrate the considerable value of the Eynon-Switzer approach i n analyzing spatial-temporal d a t a (when model is appropriate), we carry out 26 the interpolation step w i t h the the spatial variogram ) V (u) = a-/( _o) + 6* N l i a tt? where u is the vector joining station pairs; ||«|| is the length of u and J(u^o) * 1S n e indicator function. The parameters a and b are fitted with constraints a > 0, b > 0, using the above unsmoothed estimates of the spatial variogram. The estimated values of a and b are 0.000 and 0.005 respectively. For each year, the estimated values of the realized spatial process ct(x) at the station locations x = x,- can be obtained by <*(*») = — 2 — , Tli where n,- is the number of rainfall days at station i in the given year. The estimates a(x,) for each year are presented in Table 2.8. of the rainfall volume correction, than the simple averages in 1979 Because these estimated values are smaller of the uncorrected p H levels (station 020b is an exception; this may be due to the excessive variability of p H levels at this station 020b in 1979). differences range from 0.01 to 0.20 At other stations, the and they vary from year to year and from station to station. To put the corrected p H values of Table 2.8 on the original scale we would need to add a number of p H units corresponding to the washout associated with the average daily rainfall at is assumed that to station in that year. Although the washout rate c be common across stations 27 in each year, the large differences in average daily rainfall across stations in each that these adjustments each year. will differ somewhat F o r example, range from 0.05 to 0.10. in 1982 when c = .0489, the adjustments to add these adjustments mean to station in i n 1979 when the estimate of the rate is c = .0189, these adjustments important from station year washout Similarly, range from 0.08 to 0.22. to the corrected p H levels It is before doing the interpolation because estimates at unobserved locations (i.e. contour maps) will be substantially influenced by these values. Since the spatial-temporal model for the p H levels does not seem to be appropriate, resulting contour like to demonstrate maps may be misleading. We would how the kriging method works, but do not intend to take the interpolating results seriously. corrected p H levels seems unneccessary. below is carried out without first F o r this reason adjusting the T h e interpolation step described adjusting the corrected p H values. T o obtain estimates of the p H levels for unobserved locations sequently, contour maps), kriging (Delhomme 1978). at we use the interpolation technique (subcalled T h e 'best' linear unbiased estimate of oc(x ) an unobserved location x 0 0 is given by 9 28 where the weights Ai,...,A are chosen to minimize 9 v{x ) 0 subject to __]^»'( o) = 1x = E[a*{x ) 0 With process underlying the data (cf. - ot{x )} 0 stationarity 2 assumptions about the Journel and Huijbregts (1978)), the quantity v(x ) can be expressed as a quadratic form in the A,'s whose 0 coefficients depend on the spatial variogram V , the temporal variance a parameter x 0 gi, and We minimized this quadratic form for each on a fine grid over the eastern United States using the above estimated parameters for the spatial variogram. The contour maps of these resulting interpolated values are presented in Figure 2.6 (produced using the computer language S) for each of the four years from 1979 to 1982. The contour maps could possibly be useful in investigating the pattern of changes from year to year. However, as mentioned earlier, these results could be misleading, so attempting to draw conclusions from these results seems unwise. 2.7) Conclusion In this attempt MAP3S/PCN to validate the Eynon-Switzer model using the data we obtain some results which are consistent with those based on the E P R I data set. However, some of the results differ. Specifically, there are significant differences between the estimated values 29 of the yearly temporal washout variogram Switzer, Vp is possibly because (including station section rate 2.5 estimated 020b). using the c. O u r estimated larger of some that potential reported outliers by data set vs .098). possibly because between the analysis is capture the complete model have the to the structure approach obtain with station of 020b removed, the same as that already p H levels the E y n o n and contour to be we the and data set where the obtained by E y n o n set. use maps the for MAP3S/PCN in geographical location indication obtained from our Switzer of this data concluded that appears differences Overall, throughly, the the T h e spatial variogram model proposed by two networks. that of Eynon in our Eynon-Switzer does not appear to be appropriate for the network, value T h i s is supported by the results obtained in value of Vp is about and Switzer (.095 than constant model does However, an to alternate 1979 to not investigate spatial 1982. their variogram But proposed spatial-temporal inappropriate, completely as e we model for drawing conclusions from these contour maps seems ill-advised. Since this approach is unsuccessful for this data set, examine structure the raw data we propose to in more detail with a view towards identifying which may be present. T h i s will be done in the next 30 chapter. 3) D A T A EXPLORATION We now start a careful examination of the data in an attempt to uncover any underlying structure which may be present. We examine a number of factors which may be expected to influence the pH readings and investigate the possibility of consistent relationships between pH levels and these factors. There may also be some relationships between these factors. 3.1) Factors to Be Examined As described earlier, there are about 3000 data points in the data set. Each data record contains the time of the event (the storm), the location of the station, the rainfall volume of the storm, and other information. Note that in this examination one pH reading of 7.31 at station 065a in 1980 is removed. The new mean and standard deviation for this year at this station are 4.04 and 0.33 respectively, while the old ones are 4.14 and 0.61 (Table 1.2). The removal of this obvious outlier makes the new standard deviation of this year consistent with other years at this station (about 0.3). 31 T h e location of the station should be one important factor. Most stations are located in heavily industrialized areas; therefore, the pollutants in the sky at each station are possibly influenced by the chemicals released from the factories long-range transportation sions are carried distances of the to the storm we expect strong the may be the factor There are also possibly some that is tall remote winds at high on the may under consideration. p H levels of the different from station is a categorical smokestacks emis- altitudes These contaminants station effects of stations station effects, location. at the by in that area. to over affect For very the these geographical station. In long acidity reasons, locations our study, variable with nine categories; each specifies a single station. Another factories, important the factor is the time of volume of production is seasonal the (i.e. storm. In some for some specific periods, production is very high, and in some other periods, production is low). D u r i n g periods of high volume production, more are released into the sky. These substances of the storm. pollute which the will reasons, we sky; Further, during the in turn expect affect the the time the p H readings (i.e. may affect the p H reading summer these obviously change season, forest chemistry of that seasonal) 32 contaminants factor of the storm. is fires For may storm, these important for this study. months. years T o account Note that and seasons for seasonality, Egbert (3 and Lettenmaier or 4 months obvious patterns i n p H levels. period (i.e. one month) time is divided into years and each), (1986) divided time into but d i d not discover any F o r this reason, we use a shorter time in this analysis. Another important factor is rainfall volume. It is generally believed that the p H reading However, varies in the previous volume as represented (1983) monotonically chapter, with the rainfall volume. our observations of the effect of by the scavenging term used by Eynon-Switzer do not support this belief. Therefore, we will examine more general relationships between rainfall volumes a n d p H levels. 3.2) Preliminary Examination of T h e S t r u c t u r e of T h e Data We first and examine also between the relationships among T i m e , these factors a n d the p H levels. Volume, Station, In this section we use monthly averages of p H levels a n d Volumes instead of individual values. T h e monthly average p H level (or Volume) is just average over the events i n each month. the simple Since these monthly averages smooth the original data (i.e. reduce the variability), relationships may be more clearly (individual) data. exhibited in the averaged data than in the original If there are no obvious relationships in the averaged 33 data, relationships in the original data set seem unlikely. problem with this approach is that taking averages some effects which we may be able to detect O n e potential could cancel out in the individual data. However, this is unlikely to occur since with only a few observations available in each to that detect even month if they (about are relationships suggested 1-8 present points), data are very i n the individual data. by the monthly-average data be examined subsequently in the individual average effects data. hard Note can always Working with the also has the practical advantages of reducing the size of the d a t a set to about 500 d a t a points from about 3000 data points. In the following sections, we examine the relationships among the factors in detail using the averaged data set. 3.2.1) Relationships between p H a n d M o n t h , p H a n d S t a t i o n We would factors affect components: averages at like to find out whether or not the Station a n d T i m e the readings of p H levels. Year each and M o n t h . station. In each T o examine The Time year, there factor are 12 monthly the relationships among factors, we will look at the data of each year (all stations) If there are obvious relationships in each year, to recognize how they change over the years. 34 has two these separately. we hope to be able T h i s would subsequently help us to understand the relationships among the factors for all years of data. The monthly average data in each year are fitted with the following additive model: where pHij denotes the pH average of Month j at Station i, fi denotes the overall effect, or,- denotes the Station effects, i = 1 to 9, Pj denotes the Month effects, j = 1 to 12, €ij denotes the residuals. The residuals may contain not only the pure errors, but possibly also some other effects such as a rainfall volume. We proceed to obtain rough estimates of the parameters by using ANOVA-like and median polish decompositions which are described in detail in section 2.4. The monthly averages are arranged in a 2-way table; the factors are Month and Station. We fit the model by this rough method for years 1977 to 1982 separately. The estimated values of the parameters a and P for the years from 1977 to 1982 are presented in Tables 3.1 and 3.2 respectively. 35 T h e estimates of \i are: 1977 Year 1978 1979 1980 1981 1982 ANOVA 4.14 4.10 4.27 4.18 4.14 4.21 Median 4.14 4.07 4.22 4.14 4.16 4.25 Note that similar estimates of /x are obtained by both methods. The monthly averages range only from about 3.5 to 5.5, and most of them are about 4. Therefore, the values of the overall effect obtained by taking the overall average and the median should be very similar. The differences of the estimates from year to year are not large. If we had many more years of data, then we could investigate whether there is a true yearly effect or a trend (pattern) over the years by examining the estimated values of /./. Unfortunately, these issues could examined thoroughly because we have only six years of data. the six estimated values of fi (i.e. for the six years) are not be Although similar in magnitude, it is not reasonable to ignore the yearly effect on the p H levels. Therefore, we still assume that there is some yearly effect on the p H levels. The Station if there station effects (&) for the different years in Figures 3.1.1 is any consistent (ANOVA) pattern and 3.1.2 from 36 year are (Median to year. plotted against Polish) to see Similarly, the monthly effects (f3) for the different years are plotted against in Figures 3.2.1 and 3.2.2. Month Note that the plots for the two methods of decomposition are qualitatively similar for both sets of estimates. Figure 3.1 station to shows that the station effects are quite different from station. from stations 013a In Figure 3.1.2, there are large positive effects (0.05-0.27) and 020b (0.12-0.34). There are also some large negative effects from station 057a; for example, the effect is -0.30 in 1979. The effects for the remaining stations are very small or centered around zero. The changes in the station effects from year to year are considerable for most stations; stations 048a and 072a are exceptions. In general, Figure 3.1.1 has similar features, although there are some differences. The effects do not change from year to year at station 013a. The effects vary more from year to year in Figure 3.1.1 than Figure 3.1.2. Overall, the effects are quite different from station to station in both plots. This observation supports our belief that the geographical location of a station is an important factor in explaining the p H levels obtained at that station. Note that at station 020b, both estimates of the effect of Station for 1979 are unusually large (Figure 3.1). The reason is that in November 1979, there was only one storm with an unusually high p H level of 7.12; 37 therefore, the monthly average of this month is greater than those of other months (7.12 compared to about 4.0-5.5). Moreover, the estimate for 1979 (Figure 3.1.1) based on the A N O V A decomposition is greater than that from the Median decomposition (Figure 3.1.2). This may be explained by the robustness of the Median Polish decomposition. The monthly average of November in 1979 is obviously an outlier. The decomposition using medians reduces the effect of the outlier on the estimated values of the parameters more than the decomposition using averages ( A N O V A ) . Figure 3.2 shows some interesting results. vary considerably from year to year. The monthly effects However, the effects change in a somewhat consistent manner from month to month. Generally, the effects decrease slowly from January to August, increase sharply from September to November, and then decrease a little in December. Note that this general pattern only becomes apparent when the effects for all years are plotted together. In each year, the pattern of the effects is somewhat different from the general pattern. For example, the effect increases from January to February (sharply) in 1981 and from March to April in 1982; the general pattern suggests a decrease in both cases. These plots also indicate that the minimum of the monthly effects always occurs This result supports the expectation in the summer. 38 that levels of acidity increase are differences in the summer. Overall, in the patterns of different years, described above is the only apparent structure although there the general picture i n the data. 3.2.2) Relationships between Volume and Station, Volume and Month The monthly average of the p H levels adjusted station, monthly, and yearly effects, pH i3 for these additive denoted by pH^-, is = pH^ -/*-<*,-- fa. T h e results which follow are based on the adjusted p H levels obtained by using the estimated values of /z, a, P provided by the median polish decomposition. We use the median polish estimates here there are possible outliers the estimated in the data set." T h e effect because of outliers on values of parameters is much less pronounced with the median decomposition than the A N O V A decomposition is more method; therefore, the median appropriate. Since we wish to determine the relationship between p H levels and Volumes, we proceed to find the station, monthly, and yearly on the rainfall volumes so that we can adjust these effects the rainfall volume for as we d i d for the p H levels in the previous section. is done i n what follows. 39 effects This The relationship between Volume and Station and Month can be examined by an approach similar to that in section 3.2.1. In each year we fit the following model: Vi] = /* + 7i + + e,y where Vij denotes the volume average in Month j at Station i, H denotes the overall 7,- denotes the Station effects, i = 1 to 9, Uj denotes the Month effects, j = 1 to 12, €ij denotes the residuals. We effect, obtain estimates of the parameters [i, 7 , and u by using the median polish and ANOVA-like decompositions. We fit this model to the data of each year separately from 1977 to 1982. The estimated values of /z are: Year 1977 1978 1979 1980 1981 1982 ANOVA 14.42 12.82 17.07 10.53 10.77 12.00 Median 15.32 14.03 20.31 11.00 12.39 13.43 The estimates of n are quite different from year to year. Therefore, we infer that there is a substantial yearly effect on the average volumes. The distribution of the monthly average rainfall volumes has a long left 40 tail in each year (all stations); consequently, the median is greater than the average. This is reflected in the above table where the estimates of fi based on the median decomposition are in every case larger than those based on the A N O V A decomposition. The estimated values of the parameters Tables 3.3 and 3.4, 7 and w are presented in respectively and are plotted in Figures 3.3 and 3.4 respectively. Figure 3.3.1 suggests some differences between stations. The effects vary greatly from year to year at some stations. Specifically, the effect for 1979 at station 013a is much greater than that for other years. Also, the effect for 1979 than that for other years. at station 048a is less pronounced The difference between 1981 station 171b is large (about 8). Figure 3.3.2 suggests and 1982 at a qualitatively similar general pattern although there are some differences. The effect of 1979 at station 020b is very small in Figure 3.3.2 but it is large relative to other years in Figure 3.3.1. This is due possibly to the fact that the average volumes are greatly different from month to month in 1979 at this station. As a result, the average and the median of the monthly rainfall volume averages of this year (station greatly different; the average is about 20.71 about 7.82 mm. 020b) are mm and the median is Consequently, the results of the two decomposition methods are quite different for year 1979 and station 020b. Note that 41 there was no rainfall event in February 1979 at this station. Overall, Figure 3.3 suggests that the monthly volumes are substantially different from station to station. Figures 3.4.1 and 3.4.2 any obvious pattern. year to year, they show that the monthly effects do not exhibit Although the estimates are quite different from are scattered around zero. This suggests that seasonality is not important for monthly average rainfall volumes. This result seems surprising; average rainfall volumes might be expected to differ from season to season. Since seasonality was important for p H levels and might be expected to be important for rainfall volumes, this issue is now examined We apply the Stuetzle further. SUPER SMOOTHER developed by Friedman and (1982), to all individual (event) data. The Super Smoother uses local linear fits with varying window width determined by local cross-validation. produces In general, this smoother takes bivariate a smooth fitted relationship between data and the two variables. this exercise we use Volume and Time as a bivariate data set. Volume and Time are measured as in chapter 2 (i.e. In Both Volume is measured in millimeters and Time is the number of days from January l* ,1976 to the day of the event). t 42 The relationship produced by the smoother is Volume as essentially a constant over Time. The constant is 13.6 mm which is very close to the average of all individual rainfall volumes. We also apply the same procedure with Volume adjusted for the Year and Station effects. The result indicates that there is no obvious relationship between the adjusted Volume and Time (i.e. Month). The smoothed curve provided by the smoother does not resemble any standard functional form (Eynon-Switzer suggested a logarithmic relationship between Volume and Time). These results suggest that the rainfall volume of a storm is not influenced by the time of the storm (i.e. there is no seasonality). It appears that the effect of the factor, Month, on the average rainfall volumes can be ignored. The model for the monthly average rainfall volume can now be written as: V - = /i + i3 7t + where everything is as previously defined. Using the estimated values of the parameters fi and 7, we obtain the adjusted average rainfall volume as follows: Vii = Vij - p, - 7,-. V represents the monthly average rainfall volume adjusted by the yearly average and the station effect. Note that we use the median estimates to estimate /z and 7 . 3.2.3) Relationship between p H and Volume We now search for a relationship between monthly average p H levels and rainfall volumes using the adjusted data (V, original data (V, pH). pH) instead of the Note that if there is indeed a relationship between p H and Volume then it would be more easily detected using the adjusted data than the original data. The effects of each factor (Year, Station, Month) on the volume and the p H level are different; therefore, under the influence of these different effects, the original data may suggest some relationship which is not true. As a first step, pH is plotted against V in Figure 3.5. from 1979 to 1982 1977 Only data is used because only three stations were active in and only five stations were active in 1978. included since it has only two years of data. Station 171b is not There are about 400 data points. The figure shows that most data points lie near the origin (i.e. and V = 0, V. pH = 0). There are some unusually large values of pH There are four large values of pH corresponding to the values 44 of V ranging from -10 to 5. There are also four large values of V corresponding to pH ~ 0. With these points deleted, the figure does not show any irregularities. To examine these unusual points further, we plot the data from each station separately in Figures 3.6.1 to 3.6.8. These figures reveal that all of the four large values of pH and the largest value of V are from station 020b. They also show that the other three large values of V are from some possible outliers in the data. Indeed, at station 013a, there is one monthly average rainfall volume of 75.20 mm (average of two storms), while the remaining values range from 3.15 to 34.59 mm. At station 072a, there are two average rainfall volumes of 69.36 and 70.76 mm, from 2.14 to 31.01 mm. while the rest range The plot for station 020b (Figure 3.6.2) shows an unusual pattern. The strange behaviour of station 020b is not surprising since as mentioned previously, the monthly averages of rainfall volumes vary greatly from year to year and from month to month at this station. The plots of other stations do not show any obvious pattern for pH and V; there are a few outliers. Overall, the plots do not show any obvious relationship between the adjusted rainfall volumes and pH levels. Since we did not see any relationship in the adjusted data, we are convinced that there is no relationship between the rainfall volumes and the pH levels. It appears that the 45 rainfall volume is not an important factor in explaining the p H levels. In the light of the above preliminary examination, we conclude that Year, Station, and Month are three important factors in explaining the pH levels. The rainfall volumes do not appear to influence the p H levels. In the next section, using the individual (event) data, we examine how these factors influence the p H levels in detail. To investigate this issue, we first fit an ANOVA model to the event data to see if we can fit this data with a simpler model (i.e. fewer parameters). 3.3) A n A N O V A Model An ANOVA model would represent the p H values corresponding to the individual events as follows: (1) pHijki = fi + cti + Pj + 7fc + ( P)ij + («7)tfc a + {Pl)jk + {apl)ijk + etjkh where pHijki denotes the p H level of the I th at the i th Year, j t h storm occuring Station and k th H denotes the overall effect, or,- denotes the Year effect, i = 1,..,6, Pj denotes the Station effect, j = 7^ denotes the Month effect, k = 1,..,12, 46 1,..,9, Month, [ctj3)ij,(at'i)ik,[l3'i)jk (a/?7),-yfc Cijki denote the second order interactions, denotes the third order interaction, denotes the residual. This is a saturated A N O V A model in which the p H level is represented as a combination of two parts. The stochastic part is represented by e and the remaining components of the model represent the deterministic part (where all factors are considered fixed). In the following sections, we estimate these two parts and then examine each estimate carefully. This examination is intended to reveal the stochastic structure of the data; for example, whether the residual variation can be reasonably approximated by the normal distribution. We would also like to capture the deterministic component of the p H level with a simpler model (i.e. one with fewer parameters). 3.3.1) Fitting the A N O V A model As mentioned in chapter 1, the standard deviations of the p H levels are quite different from station to station; they also vary somewhat from year to year (Table 1.2). Therefore, it seems reasonable to allow the p H levels to have different variances for different yearxstation combinations; we assume that all p H levels measured in the same year and same station have the same variance. 47 Using the data in each year and each station (about 50 to 100 data points), we can get an estimate of the variance (as presented in Table 1.2). However, since we know that monthly effects are important, we can get a better estimate by utilizing this knowledge about the structure of the data. Given a fixed year and a specific station, we can write the p H level of the storm observed in the k I th month as: th pHki — + CM, where pit denotes the monthly effect, k = eki denotes the residual. 1,..,12, Since we assume that the residuals in each yearxstation combination, have the same variance, a , we get an unbiased estimator S 2 2 for cr , 2 given by ?2 (2) H E,(pHki~pH .) k k 2 E K -1) f c where p~H denotes the average of all p H levels in the k i%k denotes the number of observations in the k k th th month, month. These estimates are a bit smaller than those adjusted only for the yearly average instead of the monthly average. 48 We present both estimates of the variances in Table 2.2 where the estimates adjusted for monthly average are denoted by sd(adj). With these estimates of the variances for the pH levels, we use a weighted least squares (WLS) method to fit the model (1). There are two ways of fitting this model. The first is to consider the estimates of the variances obtained above as the true values; the second is to carry out re-weighted least squares in which the variances are re-estimated after each iteration until they do not change substantially. Here, to remain consistent with our assumptions about the variances, we use the former and think that the above estimates are already very close to the true values. Since the primary objective of this fitting is to obtain the estimated values for the deterministic components, not to estimate the variances, the use of the re-weighted least squares method seems unnecessary. In what follows, the saturated be fitted. To convince ourselves model and various submodels will that the estimated values of the parameters would not be much different with re-weighting, we carried out the re-weighted least squares with one simple submodel consisting of Year, Month, and Station effects only: pHijki = (M + ai +fa+ 7fc + eijki. 49 Our previous estimates of the variances were used as initial values and the maximum likelihood method was used to re-estimate ters and the variances, assuming normal data. the parame- The fitting procedure converged after three iterations (the absolute change in the estimated values of each parameter is less than .01) and the maximum of the differences between our estimated values of parameters (i.e. iteration 1) and the resulting values from the re-weighting method is less than .02 for each parameter. This supports our belief that re-weighted least squares is unnecessary. Using the procedure G L M in SAS with weights tu,-y defined as 1 Nowhere Sfj is the estimated variance for year i at station j , calculated using (2), we get the following results (Y=Year, M=Month, Source df SS MS Y 5 81.6 16.32 M 11 366.7 33.34 S 8 222.4 27.80 YM 55 311.6 5.77 YS 29 198.3 6.84 MS 88 172.0 1.96 50 S=Station): YMS 319 521.8 1.64 Error 2400 2449.1 1.02 Total 2915 4323.5 Note that by using the W L S method to fit the parameters, we assume that TpHijki/Sij (MSE) has a variance of 1. Our estimated value of the variance is 1.02, which is close to 1. This indicates that our estimates for the variances used in the fitting are appropriate. In this table, df denotes the degrees of freedom of the effect and SS is the reduction in the weighted residual sum of squares due to the source (factor), obtained by adding these sources sequentially to the model in the order listed. More precisely, let RSS(f) be the weighted sum of squares of residuals when fitting the model denoted by / : RSS(f) Y^w ipH -f) , = i3 2 i3kl ijkl where / denotes the W L S fitted model. Then each SS can be written in terms of these RSS's. For example, the sum of squares due to the factor, Month, SSM, is given by SS M = RSS(fi + a + pj) - RSS(fi { + or,- + p + 3 -) ). k The corresponding MS is just the ratio of SS and df. The sums of squares due to the sources in the above table are not substantially 51 influenced by the order in which the components are sequentially added to the model. Specifically, provided that all l * order effects are entered, f essentially the same SS's for higher order effects are obtained using different orders. the most The results in the table correspond to what seems natural order. Using the above results, we examine the deterministic and stochastic components in the next sections. 3.3.2) D e t e r m i n i s t i c component In the saturated model (1), we use 516 parameters to capture the deterministic component. In the hope of finding a simpler model (one with fewer parameters) which adequately represents the deterministic components, we examine whether we can drop some effects from the full model. To examine this question, we compare different submodels containing combinations of the effects and interactions. It is obvious that the main effects are very important (the mean square for each of the main effects is at least 16.6, while the largest mean square for the second order interactions is at most 6.8). Moreover, it is reasonable to include a higher order interaction term in the model only when the yearx month lower term is also interaction is in the included. For example, model then the year 52 if the and month effects must be in the model. Guided by these criteria, we consider only eight specific submodels and the full model. We use the log-linear convention to identify the submodels; for example, [Y,SM] represents the model containing Y , S, M , and S M . The results comparing the models to be considered are provided in the following table (p = number of parameters): Submodel P RSS R R 2 2 Ax C -p P [Y,M,S] 25 3652.8 .16 .15 761.8 3852.3 3702.8 [YM,S] 80 3341.2 .23 .21 505.2 3979.4 3501.2 [YS,M] 54 3454.5 .20 .19 592.5 3885.3 3562.5 [Y,MS] 113 3480.8 .20 .16 677.8 4382.3 3706.8 [YM,YS] 109 3142.9 .27 .25 335.9 4012.5 3360.9 [YM,MS] 168 3169.2 .27 .22 421.2 4509.5 3505.2 [YS,MS] 142 3282.5 .25 .20 508.5 4415.4 3566.5 [YM,YS,MS] 197 2970.9 .31 .26 251.9 4542.6 3364.9 [YMS] 516 2449.1 .43 .31 49.1 6565.7 3481.1 In this table, the values of various statistics often used as criteria for choosing the 'best' submodel are also included. These criteria, which are similar to those suggested in Weisberg (1980) for independent and identically distributed (iid) residuals, are described in what follows. 53 a) M u l t i p l e correlation coefficient The square of the multiple correlation coefficient R 2 is often used as a criterion for comparing models. In this case, since we have only independent analogue but not of R , identically distributed residual data, which is calculated 2 squares, for comparing submodels. 2 2 2 p — by using the weighted sums of A computing formula for R p-parameter model (denoted by R ) P we use an in a is: R^Sp SYY' i _ ~ where RSS P is the weighted residual sum of squares using the submodel with p parameters, SYY is the total sum of squares, given by SYY - = £ 0 • « WijtpHvu fH...) , 2 where pH denotes the overall average, Wij is as previously defined. Note that the large values explained by the model. of R 2 indicate A disadvantage that more variability is of this method is that it is only useful for comparing different models with a fixed number of parameters. 54 b) Adjusted R 2 To adjust the method in (a) for the stated disadvantage, corre- sponding to suggestions in Weisberg (1980), one may use an adjusted version of R , 2 denoted by R and defined by 2 where n is the total number of observations. Note that as before large values of R 2 indicate that more variability is explained by the model. The adjusted value can be negative. c) M a l l o w s ' C statistic p Let pH ijkl be the predicted value of the corresponding observation obtained by the submodel. Let the mean square error (mse) be defined as mse(fH ) ijkl = var(fH ) ijM + [E(fH ) ijkl - E{pH )] . ijkl 2 The submodel which makes the weighted mean square errors, given by Wij x mse(pH j i), as small as possible is preferable. i: k Note that Weisberg (1980) suggests this criterion only for the iid case. However, since we use the weighted mean square error criterion, we in fact have an iid case where pHi i/Sij jk is the raw data with variance equal to 55 1. The Mallows' C statistic for a p-parameter model is defined by p C where RSS P p = RSS p + 2p - n, is as previously defined. Mallows (1973) suggests that good models will have negative, or small, values of C — p. p We calculate R, 2 R, 2 C p with SYY=4323.5 and n=2916, and present the results in the above table. A l l three methods suggest that the saturated model is the best one among those submodels considered. This suggests that all terms in the full model are needed to account for the p H levels. Note that the full model corresponds to using the monthly average as the fitted value for the p H level of any particular event. Our objective in investigating the deterministic component is to reduce the full model to a model with fewer parameters which still captures the structure in the deterministic component of the data. To find a simpler model, we would first have to examine whether any of the different sources (i.e. Y , S, M , YS,...) could be dropped from the full model without serious effect on its ability to capture the deterministic component. This would hopefully lead us to a submodel. Subsequently, we would investigate the possiblity of replacing the remaining parameters with fewer parameters by using simple (possibly nonlinear) functional 56 forms to represent some of the effects which were identified as important (a periodic f o r m to represent any seasonality evident i n the estimated Y , M , and Y M effects, for example). Using the M S due t o the sources, we c a n see if any source can be eliminated. A s noted earlier, the M S due to the m a i n effects are very important; they range f r o m 16.32 to 33.34. W i t h MS ~ 1.02, rough error F tests would yield very s m a l l p-values for these m a i n effects. T h e fate of interaction terms are not so clear. T h e M S due to the y e a r x m o n t h (MSYM) and y e a r x s t a t i o n (MSys) interactions are considerably smaller t h a n the M S due t o the m a i n effects (the values are 5.77 and 6.84), but rough F tests would yield small p-values. m o n t h x s t a t i o n (MSus) are much smaller t h a n and y e a r x s t a t i o n x m o n t h MSYM and MS s Y (MSMS T h e M S due to the interactions (MSYSM) = 1.96, MSySM =• 1.64); intuition suggests that these interactions are not really i m p o r t a n t . A l t h o u g h the rough F tests would still yield statistical significance for these interactions, the significance i n this case may be inevitable and due to the very large number of observations available (2916). The results using different criteria (described above) also suggest that every t e r m i n the model is important i n explaining the p H levels (i.e. the f u l l m o d e l is the 'best' one). Note that none of these methods 'adjust' for the large number of observations available even though the latter 57 plays an important role in these results. To pursue this issue further, we can use the criteria proposed by Schwarz (1978) and Akaike (1973) for comparing the fit of different models. Let p be the number of estimable parameters in a model. Schwarz proposed that the model which maximizes B = log(maximum likelihood) - | log(n) be chosen. assumption Since the number of observations (n) is fixed, with the of independent and normal data having known variances (i.e. ofy = 5?), the criterion is equivalent to minimizing Bi = RSSp -fpxlog(n). Akaike proposed that the model which maximizes A = log(maximum likelihood) - p be chosen. With the same assumptions, this is equivalent to minimizing Ai = RSS P + 2 x p. Note that log(n) = 7.978, so Bi — Ai + 5.978 x p in this case. The values of Ai and Bi for the submodels are presented in the above table. According to Akaike's criterion, the submodel [YS,YM] 58 is the best; the last two terms in the full model (i.e. can be dropped. This supports our earlier suggestion that MSYSM SM, YSM) MSSM and are not really important. Schwarz's criterion, however, prefers the submodel [Y,S,M] to the rest. Nishii (1984) proves that the criteria proposed by Mallows and Akaike are asymptotically equivalent under general conditions; asymptotically, they have a positive probability of selection only for models that properly include the true model. However, the Schwarz criterion behaves somewhat differently; the model chosen by the Schwarz criterion is a 'consistent' estimator of the true model. To a certain extent, this behaviour is evident in our analysis. The three 'best' submodels chosen by the Mallows and Akaike criteria are the same (although their rankings are slightly different), but the Schwarz criterion indicates that smaller submodels are preferable. Since the term log(n) adjusts for the large number of observations available, the Schwarz criterion may be more appropriate in this present context than those of Akaike and Mallows. These results clearly suggest that the deterministic component of the p H levels can be captured by some simpler model than the saturated one. The model [YS,YM] chosen by Akaike's criterion is already much simpler than the full model (109 versus 516 parameters). However, according to the criterion proposed by Schwarz, an even simpler model 59 can possibly be used. Nishii's results suggest that although the submodel chosen by the Schwarz criterion is an asymptotically 'consistent' estimator of the true model, the submodel chosen by the Akaike criterion has a greater likelihood of properly containing the true model. Therefore, to find the smallest possible model for the pH levels, we would need to examine the estimated values of the parameters in the model [YS,YM] to see whether they can be approximated by a smoothed function. This smoothed function (if found) would be a proposed model and its properties and fit to the data could be further examined. However, due to limitations of time, this interesting project must be deferred. 3.3.3) Stochastic component Many methods of analysis employed in environmetrics are based on assumptions of normality. Therefore, it is important to check the normality of the pH levels. The simplest way of examining this question is to use the normal probability plot. The residuals from the full model of the previous section are standardized and plotted for each station (all years together) separately. 60 Using SAS, we calculate the normal probability by Pr(e ) i *- (r -l)/(n+ -), = l 1 i where e,- denotes the residual of the I 7'i denotes the rank of e,-, n denotes the total number of residuals, $ denotes the cumulative normal distribution. th observation, The resulting plots for the different stations are presented in Figures 3.7.1 to 3.7.9. The plots of stations 013a, 3.7.1, 3.7.4, 3.7.9 044a, 171b are presented in respectively. These plots do not suggest any violation of the normality assumption. Figure 3.7.3 represents the normal plot of data from station 043a. The plot is very well-behaved, except for one possible outlier: pH=5.95 in November 1980 from 4.02 to 4.35). (the rest of the p H readings in that month range The normal plot for station 048a (Figure 3.7.5) has a similar property. December 1981 3.70 It also has one possible outlier: pH=6.00 in (the rest of the pH readings in that month range from to 4.71). The plot of station 072a (Figure 3.7.8) does not suggest any serious violation of the normality assumption; however, there is one apparent outlier: pH=6.57 in June 1981 61 (the rest of the pH readings in that month range from 3.86 to 4.07). Figures 3.7.6 and 3.7.7 represent data from stations 057a and 065a respectively. The plots do not show any serious violation of the normality assumption. However, these plots have some disturbing features. Although most stations have collinear data, there are some points on both tails which are slightly off those straight lines. A more careful examination of the data from these stations is described below. Figure 3.7.2 represents the normal plot of data from station 020b. The normal plot shows a serious violation of the normal assumption. Instead of having a roughly linear, a curve is obtained; the data needs to be examined more carefully. We can investigate these 'questionable' stations further by examining the normal plots for each year in each of these stations separately. The plots (not included in this work) for stations 057a and 065a for each year separately show that except for some outliers in the data, there are no obvious violations of the normality assumption. However, the plots of station 020b suggest a serious departure from normality. The plots for 1978 to 1982 are presented in Figures 3.8.1 respectively. to 3.8.5 Except for Figure 3.8.1, the plots clearly exhibit the 62 extreme skewness that the some of the data 1978 support data for apparent These results indicate distribution. d a t a f r o m the the station. normality Figure 3.8.1 assumption shows except for of the pH outliers. levels obtained at station normal from this that the 020b However, stochastic cannot components be considered as arising from a except for remaining stations can a few possible outliers, be reasonably approximated the as normal. 3.4) Conclusion In this chapter, we have examined the data set carefully in an attempt to uncover underlying patterns which might be used to find an appropriate model for the p H levels. T h e results indicate that there is a seasonal structure in the p H levels. Specifically, the time effects decrease slowly from November, January and then to August, decrease a increase little bit sharply from in December. September T h e rainfall volumes, however, do not show any obvious seasonal structures. is no obvious effect station The of the rainfall volumes effects are quite important deterministic components on the to There p H levels. The in explaining the p H levels. of the 63 p H levels were examined by fitting a saturated ANOVA model; the importance of the individual components in the model were also investigated. The results suggest that the deterministic components can be captured by a smaller model than the full model. The stochastic component was examined for the appropriateness normality assumptions by using normal probability plots. The results indicate that serious violations of the normality assumption are found in the data obtained at station 020b. However, except for a few possible outliers, the data from other stations can be reasonably approximated as normal. 64 4) C O N C L U S I O N In this study, we have learned that the model developed by E y n o n and Switzer appropriate network. (1983) for the for analyzing data obtained pH from levels, the does not seem MAP3S/PCN to be montoring However, the general approach seems to be useful in analyzing the spatial-temporal data. For this reason, the approach including the interpolation step called kriging was completely demonstrated using the Eynon-Switzer model. In an effort to identify a more appropriate model for the data, examined the raw data in detail. three factors, levels. at The model, including the Year, M o n t h , and Station, was fitted to the individual p H T h e results suggested different A full A N O V A we stations. conclusion is The that the that the residuals have different normality data of from the all residuals stations was except variances examined. one can reasonably be approximated as coming from normal distributions. T h i s is a very useful result because many methods of analysis employed in 65 environmetrics are based on assumptions of normality. Different criteria such as Akaike, Schwarz, Mallows, etc, are used to identify the 'best' submodel (i.e. fewer parameters than the full model) for capturing the deterministic component of the data. The results suggested that it is possible to capture the deterministic component by a much smaller model than the full model. It would be very interesting to examine whether the 'best' submodel can be represented by a smoothed function. The abilities to fit the data and the properties of this smoothed function (if found) could be further examined. Moreover, since the variances of the residuals of the data are not homogeneous from station to station, we must have the estimates of the variances of the residuals at any locations in order to do interpolation (kriging in this case). The question of whether we would be able to estimate these variances is also interesting. However, due to limitations of time, these interesting projects must be deferred. 66 REFERENCES Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. g" d formation Kiado, Theory. International Symposium (B. N . Petrov and F . Czaki, eds.). on In- Akademiai Budapest. Delhomme, J . P. (1978). Kriging in the hydrosciences. Resources, Adv. Water 1, 251-266. Egbert, G . D . and Lettenmaier, D . P. (1986). Stochastic Modeling of the Space-Time Water Resources Structure of Atmospheric Research, Eynon, B. P. and Switzer, acidity. The Canadian Friedman, terplots. Report 22, 165-179. P. (1983). Journal The variability of rainfall of Statistics, J . H . and Stuetzle, Department Chemical Deposition. W . (1982). of Statistics, ORION006. 67 Stanford 11, 11-24. Smoothing University, of scatTechnical Journel, A . G . and Huijbregts, C h . J . (1978). Mining Geostatistics. London: Academic Press. Nishii, R. (1984). Asymptotic properties of criteria for selection of variable in multiple regression. Annals of Statistics, 12, 758-765. Schwarz, G . (1978). Estimating the dimension of model. Annals of Statistics, 6, 461-464. Weisberg, S. (1980). Applied Linear Regression. New York: Wiley. 68 Table 1.1: Site location : The MAP3S/PCN monitoring network. AOS s i t e identity Location Latitude Longitude Elevation F i r s t active Years of (meters) date data 013a Lewes, Delaware 38 46 00 75 00 00 0 020b Illinois, Illinois 40 03 12 88 22 19 043a Uhiteface, New York 44 23 26 044a Ithaca, New York 048a 01--Har-78 4 212 20--Nov-7 5 73 51 34 610 11--Oct-76 6 42 24 03 76 39 12 509 26--Oct-76 6 Brookhaven, New York 40 52 00 72 53 00 25 09--Feb-78 5 057a Oxford, Ohio 39 31 51 84 43 25 284 01--Oct-78 4 065a Penn State, 40 47 18 77 56 47 393 22--Sept-76 6 Pennsylvania 072a Virginia, Virginia 38 02 23 78 32 31 172 12--Dec-76 5 171b Oakridge, Tennessee 35 57 41 84 17 14 341 07--Jan-81 2 * A. Olsen and C. Watson, "Acid Deposition System (ADS) f o r S t a t i s t i c a l Reporting", EPA-600/8-84-023, September 1984, pp. C.7-C.13. 69 Table 1.2; Summary of data. pH ADS Identity Year No. of rainfalls ~ Mean sd Volumes(mm) ~~~7 sd(ad_) Mean sd 013a 79 80 81 82 52 71 64 73 4.37 4.25 4.22 4.26 .40 .35 .40 .55 .28 .30 .36 .48 24.99 12.88 13.16 13.31 29.49 13.92 12.97 15.28 020b 78 79 80 81 82 50 40 94 87 80 4.25 4.73 4.28 4.34 4.37 .55 .89 .68 .65 .52 .58 .77 .69 .57 .48 10.44 19.80 9.21 10.68 11.12 12.92 29.48 11.65 14.26 14.01 043a 77 78 79 80 81 82 83 54 50 87 99 98 4.24 4.16 4.08 4.03 4.06 4.21 .33 .31 .18 .29 .23 .39 .32 .24 .17 .26 .23 .37 12.97 14.16 17.08 9.52 10.92 8.22 13.18 12.39 14.39 7.78 16.10 10.06 044a 77 78 79 80 81 82 52 60 62 75 72 80 16 05 08 19 10 14 .28 .27 .29 .32 .23 .33 .20 .20 .27 .27 .20 .31 19.06 13.83 17.80 12.41 16.49 11.54 14.20 11.09 19.63 12.20 17.06 12.01 048a 78 79 80 81 82 48 71 64 80 69 4.07 4.10 4.14 4.18 4.23 .35 .49 .51 .45 .45 .30 .42 .44 .39 .40 14.86 12.93 12.61 11.93 15.49 15.21 18.15 12.77 13.26 18.55 057a 79 80 81 82 47 75 75 83 4.22 4.08 3.80 3.99 .23 .29 .24 .37 .20 .24 .19 .30 15.08 11.72 9.67 11.20 16.97 13.34 8.42 13.20 065a 77 78 79 80 81 82 80 70 61 40 69 86 4.06 4.04 4.20 4.14 4.19 4.23 .27 .30 .27 .61 .30 .36 .22 .25 .24 .20 .25 .33 14.93 13.00 18.57 12.46 11.64 9.46 17.84 11.03 18.57 12.61 15.24 10.85 072a 78 79 80 81 82 57 51 65 56 58 4.06 4.17 ,09 .15 ,15 .34 .30 .30 .46 .30 .28 .29 .31 .48 .29 18.00 25.30 10.85 14.98 16.01 17.47 32.42 11.50 21.79 14.74 171b 81 82 65 64 .16 .27 .23 .23 .23 .22 14.63 23.14 13.75 20.17 70 TABLE The f i t t e d constant f i t t i n g Year 013a 020b a 82 by 043a 044a 0.0185 -0.0015 048a 057a 065a 072a 171b 0.0598 0.1375 0.1855 0.2364 c 0.0466 0.0422 0.0348 d 4.1311 3.9918 3.9618 a -0.1100 0.2063 0.0218 -0.0477 0.0474 0.0941 b -0.0066 0.1897 0.1631 -0.0507 0.1652 0.0257 c -0.0064 0.0227 0.0763 0.0662 0.1069 0.0477 d 4.2678 4.1145 3.8588 3.8740 3.8083 3.9056 a -0.1478 -0.2415 -0.0241 -0.1017 0.0742 -0.0466 -0.0045 ,0871 b 0.2606 0.3583 0.0729 0.1117 0.3195 -0.0081 0.0871 ,0807 c 0.0187 -0.0289 0.0348 0.0565 0.0337 -0.0056 0.044S 0254 d 4.2778 5.0538 3.9550 3.8905 4.0563 4.2408 4.0607 4.0381 79 81 obtained model b 77 78 coefficients Eynon-Switzer Station Parameter 80 the 2.1; a -0.0616 -0.1810 -0.0446 0.0366 -0.1499 0.2058 0.0586 -0.0176 b 0.1467 0.3039 0.1561 0.1855 0.1075 0.0729 0.3813 0.1559 c 0.0906 -0.0015 0.0285 0.0449 0.1371 -0.0020 0.1805 0.0665 d 4.0414 4.3846 3.9929 4.0731 3.8802 4.0784 3.9236 3.9623 a 0.1242 -0.0530 0.0252 0.0412 0.1368 0.1531 0.0945 0.0771 0.0771 b 0.2258 0.3821 0.0365 0.1830 0.1232 0.0367 0.1496 -0.0399 0.0510 c 0.0450 -0.0004 0.0553 0.0311 0.1431 0.0331 0.0447 0.0201 0.0495 d 4.1085 4.3990 3.9527 4.0137 3.9080 3.7293 4.1141 4.0722 4.0176 0.0546 a 0.1082 -0.0193 -0.1840 -0.0383 -0.1048 -0.0392 1375 0.0399 b 0.2331 0.0575 0.0494 0.2062 0.2176 0.1322 1694 0.1471 0.0882 c 0.0855 0.0161 0.0690 0.1026 0.1375 0.0266 0516 0.0259 0.0087 d 4.6606 4.3374 4.0859 3.9298 3.9194 3.9313 1205 4.0602 4.2359 71 ANOVA and median decompositions on a,b,c, Table 2.2t (1) (a) Ads The residuals and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s a(x) using AVERAGE (part a) and MEDIAN (part b ) . T h i s uses AVERAGE f o r d e c o m p o s i t i o n . Id. Year R e s l d . 77 78 79 80 81 82 Col. e f f . (b) Ads Res. and d. 013a 020b 043a 044a -0 01 -0 0.04 0 16 -0 -0.08 -0.07 0 05 -0 -0.05 -0.06 -0 02 0 0.07 -0.00 -0 02 -0 0. 1 1 0.09 -0 18 -0 048a 057a 02 02 -0 08 02 0 17 -0 06 -0 11 0 00 0 11 0 02 -0 08 -0 0.00 -0. 10 -0. 00 -0 01 -0 02 04 16 04 10 0 07 065a 0 -0 0 0 -0 0 072a 171b 00 04 0 02 04 -0 03 05 -0 02 18 0 01 -0 05 11 0 03 0 01 0 03 0 02 Effect 0.02 0.05 -0.07 -0.02 0.05 -0.01 O v e r a l l average 0 05 0.00 T h i s uses MEDIAN f o r d e c o m p o s i t i o n . Id. Year 77 78 79 80 81 82 Col . e f f . 013a 020b -0 -0 0 0 0 -0 -0 0 0 10 05 05 1 1 043a 044a 048a 057a 0.01 -0 00 06 0. 18 0 00 0 07 0.05 -0 02 0 22 -0 05 -0.01 0 08 -0 04 0 -0.02 0 01 0 17 0 10 -0. 16 -0 00 -0 00 -0 0 02 -0 10 -0.01 -0 02 -0 08 72 04 17 04 08 0 06 065a -0 -0 0 0 -0 0 072a 171b 01 04 0 03 01 -0 05 04 -0 01 20 0 -0 04 10 0 03 0 04 0 05 0 03 Effect 0.02 0.03 -0.07 -0.03 0.05 -0.02 O v e r a l l median 0 04 -0.00 (2) (a) The residuals and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s b(x) using AVERAGE (part a) and MEDIAN (part b ) . T h i s uses AVERAGE f o r decomposition. Ads Id. Year 013a 020b R e s l d . 77 78 79 80 81 82 0 03 -0 1 1 o 03 0 02 -0 00 -0 01 -0 03 0 03 0 02 -0. 16 0 15 0 06 -0 13 0. 12 -0 05 -0 08 0 16 -0 08 -0 13 -0 00 0.04 0 01 -0 03 -O 08 -0 03 0 14 0 04 o. 18 -0 05 0 03 -0 OO -0 00 -0 03 -0 10 -0. 16 -0 06 0 04 0 08 0 08 -O 03 0 07 Col. e f f . 0 07 (b) 0.07 -0 04 044a 048a 057a 0 03 -0 00 -0 09 065a 072a 171b 0 OO 0 02 Effect 0.04 -0. 07 0. 02 0. 04 -O. 02 -0. OO Overal1 a\ 0 05 -0 07 -0 08 0. 15 T h i s uses MEDIAN f o r d e c o m p o s i t i o n . Ads Id. Year 77 78 79 80 81 8.2 Col. 043a eff. 013a 0 -0 0 -0 07 10 01 01 0.09 020b -0 0 0 0 -0 043a 044a 048a 057a 065a 072a 171b -0.01 -0 04 0 01 0 03 -0 06 0. 13 0 02 -0 15 0.01 -0 03 0 22 -0 02 -0 05 0 0.04 -0 01 -0 05 0 00 0 19 0 01 1 1 -0.05 0 02 0 -0 00 -0 01 -0 15 -0 01 24 -0.06 0 01 0 07 0 07 -0 02 0 01 0 01 25 1 1 0.15-0.04 0.04 O 73 -0.08 Effect 0 -0 -0 0 -0 0 05 04 03 02 01 01 Overal1 median 0.04 -0.01 -0.07 | 0.14 The r e s i d u a l s and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s c(x) using AVERAGE (part a) and MEDIAN (part b ) . (3) (a) This Ads I d . Year Resld. 77 78 79 80 81 82 Col. e f f . (b) This Ads Id. Year Res. 77 78 79 80 81 82 Col . e f f . uses AVERAGE 013a -0 0 -0 0 020b f o r decomposition. 043a 044a 048a 0 01 -0 01 -0.01 -0 02 0 01 01 0 02 0 02 0.00 01 -0.02 -0 03 -0 0 3 01 0.01 0 01 -0 0 3 02 0 03 0.01 0 02 0 01 uses -0.05 MEDIAN -0 01 -0 -0 0 0 0 065a 072a 171b -0.03 04 0.03 0 01 04 0 01 0 01 -0.01 0.08 01 -0 0 3 0 01 0. 02 04 0 02 -0.03 -0 01 02 0 0 0 -0.03 -0 02 - 0 . 03 0 05 0 01 057a -0 04 0.03 -0 01 Effect -0.01 0.00 -0.03 0.02 -0.00 0.01 Overall average 0.05 -0. 02 f o r decomposition. 048a 013a 020b 043a -0.03 0.02 -0.02 0.02 -0.01 -0.00 0 0.01 0.02 -0.00 -0.01 -0.04 0.01 0.02 0.00 -0.03 -0.02 0.01 -0.02 0.04 0.01 0.02 -0.06 0.00 044 a 057a -0.07 -0.08 -0.01 -0.03 0 0.02 0.02 0.01 0.00 0.08 0.01 74 -0.03 065a 072a -0.01 0.04 0 0.01 0.01 0.02 0. 12 -0.01 -0.02 -0.01 -0.02 0.01 -0.01 171b 0.02 -0.02 Effect -0.01 0.01 -0.02 0.01 -0.00 0.00 O v e r a l l median 0.05 -0.02 (4) The r e s i d u a l s and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s d(x) using AVERAGE (part a) and MEDIAN (part b ) . (a) T h i s uses AVERAGE f o r d e c o m p o s i t i o n . Ads Id. Year R e s i d . 77 78 79 80 81 82 Col. e f f . (b) 013a 0 -0 0 -O 02 06 02 07 0 06 020b -0 0 -0 -0 -0 043a 044a 0 13 0 13 0 17 -0 43 -0 21 -0 08 -0 02 0 06 -0 06 0 16 0 04 -0 048a 057a 065a 07 0 -0 01 0.04 0. 1 1 -0 20 -0.00 14 -0.02 0. 1 1 -0 08 0.01 -0.24 0 04 -0.02 -0.07 0 072a 00 10 -0 07 -0 05 -0 15 0 1 1 0 171b 01 10 02 10 -0 08 04 0 10 0 42 -0 03 -0 11 -0. 14 -0.07 -0 07 -0 06 Effect -0.04 -0.09 0. 13 -0.02 -0.03 0.01 O v e r a l l average 4.07 0 06 T h i s uses MEDIAN f o r d e c o m p o s i t i o n . Ads 1d. Year 77 78 79 80 81 82 Col. e f f . 013a 0 -0 0 -0 16 03 03 06 0.06 020b 043a 0 0 -0 0 -0 0 0 -0 -0 -0 0 62 01 00 10 0.38 044a 048a 14 0.02 22 -0.01 0 10 -0.15 0 02 0.07 -0 07 0.01 0 02 -0. 1 1 -0 09 11 03 057a 0 0 -0 03 -0 22 10 25 10 065a -0 -0 0 -0 0 0 02 07 02 08 11 07 072a 0 -0 -0 0 0 02 02 05 05 -0 09 0 09 0.00 -0.02 -0.11 -0.03 -0.01 -0.00 75 171b Effect -0 -0 0 -O 0 0 03 13 04 OO 00 04 Overal1 median 0.09 | 4.02 Table 2.3: The ratio of residual SS's of the corrected and uncorrected series of p H levels. Station Year 013a 020b 043a 044a .86 .63 .99 .55 .52 .56 1977 1978 048a 057a 065a 072a 171 .50 .52 .79 1979 .72 .86 .73 .48 .68 .98 .65 .74 1980 .70 .90 .82 .73 .75 .72 .57 .68 1981 .75 .85 .73 .61 .98 .73 .74 .90 .07 19S2 .70 .98 .77 .55 .53 .89 .76 .81 .86 Average .73 .92 .74 .59 .70 .71 .62 .80 .77 76 Table 2.4: Sums of squares errors for different models. SS_!(_ j) RSS_ Rss;;„ Station Year 043a 1977 8.97 8.23 09.0 7.71 06.7 1978 5.11 2.95 73.2 2.80 05.4 1979 1.G7 1.59 05.0 1.22 30.3 1980 7.27 G.12 18.8 5.95 02.9 072a 1982 5.28 4.59 15.0 4.28 07.2 OGSa 1978 4.25 3.84 10.7 3.30 16.4 044a 1981 3.82 2.86 33.6 2.34 22.2 013a 1980 8.54 7.80 08.5 5.71 3G.G * ia> SS_;(_) denotes the sum of squares error corresponding to fitting a reduced model which involves only term d (i.e. pH = d). Similarly, SSE(_ J,) corresponds to the model i0I . ^ = O S m 2nt 3W , + 6 c 2rrt ° 365 3 + ' / Note : the full model is 2?r£ P * '** H = fl8in denotes (SS_;(j) - S S £ ( _ i0>l 365 2nt + 6 C ° S 3C5 j ) / SS£( _ ) rf( it e•I + / 0 g l 0 (in percent). denotes (SS^JJ,.,!) - SSK(/««)) / ^E{/uii) 77 l-exp(-c/) («n percent). + * Table 2.5: The results of fitting the temporal variogram Vp(w) to the data, where V p [ w ) = i+1^/1.1-'' Station Year Number of data pts. 9i se(ffi) 92 se(c7 013a 1070 1326 .116 .004 2.93 3.54 013a 1980 2085 .085 .002 2.04 2.37 072a 1982 1G53 .079 .003 2.85 3.72 171b 1982 2016 .0G0 .002 1.54 2.C0 78 Table 2 . 6 : Temporal variogram estimated with data of each year and each station using where tj and tj> belong to the same year, and i denotes station i . Station Year ' 013a 020b 043a 044a .094 .050 .320 .056 .040 .100 1977 1978 048a 057a 005a 072a 171b .038 .053 .094 1979 .115 .805 .026 .051 .108 .057 .051 .008 1980 .085 .430 .071 .075 .208 .080 .242 .003 1981 .122 .364 .042 .033 .153 .043 .066 .204 .030 1982 .234 .275 .117 .009 .125 .122 .098 .078 .000 79 Table 2.7 The r e s i d u a l s and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s gi using AVERAGE (part a) and MEDIAN (part b ) . (a) T h i s uses AVERAGE f o r d e c o m p o s i t i o n . Ads i d . Year 013a 020b R e s i d . 77 78 79 80 81 82 -0.06 -0.08 -0.00 0. 10 0. 10 0.07 0.02 -0.09 0.01 0.01 -0.03 -0.02 0.01 0.33 -0.08 -0.04 -0.02 -0.05 -0.08 -0-07 -0.03 -0.02 -0.00 0.03 -0.02 0. 13 -0.06 -0.06 -0.01 -0.01 0.02 -0.02 -0.01 0. 12 -0. 16 0.05 0.02 -0.02 0.05 0.01 -0.02 Col. e f f . 0.01 (b) 044a 0.31 -0.06 -0.08 048a 057a 065a 0.02 -0.06 -0.04 072a 171b 0.00 0.01 Effect -0.07 -0.02 0.04 0.02 -0.01 -0.00 O v e r a l l average -0.03 -0.08 0. 13 T h i s uses MEDIAN f o r d e c o m p o s i t i o n . Ads i d . Year Res. 043a 77 78 79 80 81 82 Col . e f f . 013a 020b -0 -0 0 O -0 0 0 0 -0 00 05 00 06 0 05 043a 044a 048a 0 06 0.02 05 -0 00 -0.01 -0 45 -0 02 0.01 0 05 0 00 0.01 0 -0 01 -0.01 0 14 0 02 -0.03 -O 0 30 -0 02 -0.02 057a 065a -o oo 171b 06 -0 01 0 01 02 -0 00 0 00 0 04 0 00 0 17 -0 03 -0 02 0 01 0 13 0 01 08 0 01 -0 01 -o 05 -O Ol 0 09 -0 00 -0 01 80 072a Effect -0.02 0.00 -0.01 0.01 -0.00 0.05 Overal1 median 0.07 0 01 -0 04 Table 2.8: The estimated value of Ct i n year i i s <*(*.) = ' „ m Year Station 1979 1980 1981 1982 01 3a 020b 043a 044a 048a 057a 065a 072a 171b 4. 28 4. 84 4. 01 4. 00 4 .09 4 .16 4 .14 4 .06 4. 1 1 4. 28 3 94 4 .05 4 .03 3 .95 4 .13 3 .98 4. 14 4. 33 4. 00 4. 02 4. 13 3. 72 4. 14 4. 05 4. 07 4. 1 4 4. 27 4. 12 4. 03 4. 10 3. 89 4. 13 3. 99 4. 07 81 T a b l e 3.1; S t a t i o n e f f e c t s , u s i n g ANOVA and Median decompositions (pH l e v e l s ) . Year Station Anova 013a 020b 043a 044a 04Ba 057a 065a 072a 171b Median .09 -.01 .10 .00 -.08 -.07 Anova Median . 11 . 11 -.06 -.04 . 15 . 14 -.04 -.02 - .05 -.06 -.08 .02 Anova Median .22 .31 -.17 -.17 .03 .02 -.02 -.09 .09 .64 -.19 -.19 -.13 -.04 -.05 -.09 Anova .07 .24 -.12 .00 .02 -.09 -.04 -.08 1982 1981 1980 1979 1978 1977 Median Anova Median *P°va Median .06 .12 -.12 .07 .06 -.07 -.12 -.06 .08 .25 -.07 -.03 .03 -.30 .03 -.01 .02 .06 . 13 -.11 .07 .00 -.33 . 13 -.05 .04 .08 . 16 -.01 -.06 .02 -.22 .01 -.06 .06 . IS . 16 .00 -.06 .03 -.25 -.02 -.10 .04 T a b l e 3.2; Monthly e f f e c t s , u s i n g ANOVA and Median decompositions (pH l e v e l s ) . Year 1977 Anova January February March Aprl 1 May June July August September October November December .02 -. 13 .26 .07 -.22 -. 11 -.31 -. 13 -.03 . 13 .20 .28 Median .09 -.22 .20 .09 -.25 -.12 -.34 -.10 -.06 .11 .23 .29 1979 1978 Anova .38 -.10 . 11 .01 . 10 -.11 -.07 -.08 -.12 -.01 -.08 . 11 Median .40 -.19 . 12 -.01 . 15 -. 17 -. 11 -.02 -. 10 .05 . 10 .09 82 Anova .02 -.07 -.09 -.02 -.04 -. 18 -.29 -.22 . 18 .11 .41 . 18 Median - 18 02 06 02 06 11 22 12 26 03 19 08 Anova .24 -.05 . 12 .03 -.18 -. 14 -.06 -.32 -.06 .28 . 12 .00 Median - - 1982 19B1 1980 20 03 14 01 04 11 12 24 08 18 19 01 Anova Median Anova Median -.09 . 18 . 14 -.03 -.05 -.08 -.09 -. 17 -.05 -.07 . 15 .02 _ .05 .01 -.07 . 15 -.16 -.01 -. 18 -.21 -.04 .08 . 17 .22 .06 .01 -.05 .09 -.18 -.02 -.25 -.26 - . 14 .01 .09 . 17 - 11 .18 10 01 07 13 13 17 09 07 04 05 Table 3.3: S t a t i o n e f f e c t s on r a i n f a l l volumes, u s i n g ANOVA and Median decompositions. Year Anova Median -2.89 6.61 -2.46 3.00 -0.54 0.00 Anova Median -3.88 0.55 -0.51 1.05 -4.51 0.48 0.37 -0.37 -1.28 3.84 -2.03 5.63 1982 1981 1980 1979 1978 1977 Anova Median Anova Median Anova 11.39 0.40 -2.48 -2.20 -8.01 -3;44 -2.09 6.45 11.68 -9.25 0.38 -0.05 -5.69 -2.66 0.05 4 . 18 2.29 -2.80 -0.54 1.29 1.09 -0.27 -0.52 -0.58 5 .09 - 3 .07 - 0 .36 1 12 2 .95 - 0 .74 - 0 .43 0 37 0 -2 -1 4 -1 -2 -1 2 2 31 61 24 81 09 94 86 31 33 Median Anova 1.78 -2.26 -1 . 9 6 6.31 0.00 -1.29 -1 . 9 7 1 .42 3.73 0 -2 -5 -1 1 -2 -4 2 10 Median 57 02 to 64 99 28 05 17 53 3 09 - 0 11 31 0 00 3 93 - 2 31 - 2 40 1 89 11 2 0 -4 T a b l e 3.4: Monthly e f f e c t s on r a i n f a l l volumes, u s i n g ANOVA and Median decompositions. Year 1977 Anova January February March Apr 11 May •June July August September October November December -9.21 -9.08 4.41 2.74 -10.09 2.46 -5.30 -0.50 7.25 9.93 4.04 3.34 Median -10.62 -7.20 2.75 1.72 -7.24 - 1.20 -2.63 -3.11 6.69 10.50 3.25 4.58 1979 1978 Anova Median 10.16 9.67 -2.63 -2.10 -3.47 -0.67 -3.57 -1.86 8.98 1 2 . S O -4.60 -5.57 -0.34 -0.98 0.57 1.43 -1.71 - 0 . 2 0 -2.30 1-69 -4.04 -5.27 6.35 5.14 83 Anova 7.47 9.38 -1 .93 -0.78 0.25 -1.21 -5.11 3.24 3.64 -5.60 1 .98 -10.15 Median 10.66 1.25 1.94 1.54 -0.24 0.46 -3.68 -3.03 1 .64 -0.45 1 . 14 -9.95 1980 Anova -1.93 -3.70 3.57 3.34 3.97 0.31 -1.96 2.31 -2.09 2.72 -1.06 -5.95 Median -3.69 -4.32 3.34 2.38 4.67 0.06 -2.31 2.64 -1.81 1.77 -0.64 -6.54 1981 1982 Anova Median Anova Median -8.95 2.30 -4.81 -1.08 -0.84 -0.41 4.77 2.65 4.48 5.94 -2.09 -1.98 -7.99 3.61 -4.26 -1.03 -0.54 -1.66 1 0 . 10 6.26 4.61 5.30 - 1 . 18 0.51 3 . 14 2.44 -0.62 -1.66 -1.94 1.41 1.38 0.45 -1.63 -3.01 2 . 13 -1.93 1.80 1.11 -1.70 -2.04 -2.70 1.64 1.86 1 .60 -0.98 -3.41 0.75 -3.67 (a) S t a t i o n 020b ( I l l i n o i s . I l l i n o i s ) (c) S t a t i o n 048a (Brookhaven,NY) Figure 1.1: F i e l d pH. 84 (b) S t a t i o n 044a (Ithaca,NY) (d) S t a t i o n 065a (Penn State,PA) « II x > * K K II « « « » x II XX X X X * 14 0 109.) ours (a) S t a t i o n 020b IB? 9 IXIO i 1 36 5 719.0 « II II V * x* 11 0 II X l« 0 19.0 Oflrs 197.9 , HID 741 '. ixio' i (b) S t a t i o n 044a (Ithaca,NY) (Illinois,Illinois) 2* X »x x xV XX * x » 1 1 " XX "if/ "A * H* X X XX X X "V X X X X „ XX - X X* X XXX X x x* «x* xV „» X I , *«b**x/x » x * xx\ » \ «" xx , / x „ "x X .XX X* * 'X ' I x " * V * » X »x *« 99.9 110 14 0 Oflrs 117.3 IXIO 719.0 1 « 0 I DATS 197.9 IXIO 719.1 1 I 7)3 9 (d) S t a t i o n 065a (Penn State,PA) (c) S t a t i o n 048a (Brookhaven,NY) Figure 1.2; R a i n f a l l volumes. 85 F i g u r e 2.1 : The pH m o n i t o r i n g networks. 86 (b) 1978 Figure 2.2; Phase and amplitude of seasonal pH v a r i a t i o n : (a) 1977; (b) 1973; (c) 1979; (d) 1980; (e) 1981; (f) 1982. The d i r e c t i o n of the ray f o r each station indicates the time of the year of lowest pH. The length of the ray indicates the amplitude of the seasonal v a r i a t i o n . 87 (c) 1979 (d) 88 1980 89 0_ UJ a o ^1 -no josl I^GTO 1B2.S TIME LAG Figure m~0^ 255.5 292~0 J 2 B . 5 3 6 5 . 0 (day) 2.3: The estimates of the unsmoothed temporal variogram obtained f o r data of station 072a, i n 1982. 90 (0 0. 1 BO. O S . 0 4 ft) - 0 . 02 -0. 05 0. 02 - 0 . 05 0. 24 0 - 0 . 06 - 0 . Ofl- 0 9 ftj 1 C . 19 - 0 . 01 0. 24 o 0. 44 - 0 . 03 -0.08 - 0 . 05 0 . 21 0. 02 . -0. 02 -0. 08 - 0 . 08 0 . 18 - 0 . 06 t 1 - 0 . 08 - 0 . 09 0. 28 -O.-QB 08 - 0 . 05 10 1 - 0 . 01 -0. 08 <D 1 - 0 . 08 O 1 c | i l 2 4 6 1 8 i l ! 10 12 14 16 long, d i f f Figure 2.4 : The estimates o f the unsmoothed s p a t i a l variogram for different s t a t i o n p a i r s . LONGITUDE and LATITUDE ( i n degree) denote the d i f f e r e n c e i n longitude and i n l a t i t u d e , r e s p e c t i v e l y , between a s t a t i o n p a i r . A l l years of data combined. o. ea. op. oi o. 01 0 . 01 o. 01 -o. 0 5 0. 0 . O P - 0 -o.oi 6 - 0 . 0- 20 . 0 2 0. 0 3 o.oi -o.oi 04 0 . 01 o. 06 --0. 0 5 - • . 0 5 -0. 03 I - 0 . 05 - 0 . 0G -0. 02 CD I -0.0E04 -o. 0. 03 04 GQ I - 0 . o I 0 J I I 1 2 4 6 8 0 4 I I L 10 12 14 long. d l F f F i g u r e 2.5 : Same as above, data from S t a t i o n 91 020b removed. 16 -95 -90 -85 -80 longltude (a) Year 1979. longitude (b) Year 1980. 92 -75 -70 -65 -95 -90 -85 -80 -75 -70 longitude) (c) Year 1981. o in oo ID 9 U CD m to «» m -95 -90 (d) Year 1982. -85 -80 -75 longltudo Figure 2.6 ; Contour maps for pH. '*': denotes the location of station. 93 -70 -65 (1)ANOVA decomposition 2 8 2 1 0 8 9 9 2 1 9 0 2 1 013a 020b i i 043a 044a 1 048a STATION 1 057a 065a 072a 171b (2) Median decomposition 9 2 8 7 9 0 9 9 8 3 0 9 2 1 013a 043a 044a 048a 057a 065a 072a 171b STATION F i g u r e 3.1; Station effects (pH l e v e l s ) . Legend; 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982. 94 8 0 2 7 2 § 8 9 7 — i — Jan. —I Feb. 1 Mar. 1— Apr. — I — May 2 8 7 1 9 June July Aug. MONTH (1) ANOVA d e c o m p o s i t i o n Legend: 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2 UJ in 2 2 7 Jan. Feb. Mar. Apr. May f 8 June MONTH (2) Median d e c o m p o s i t i o n F i g u r e 3.2: Monthly e f f e c t s (pH l e v e l s ) . 95 July Aug. 9 2 1 8 0 ? 9 8 2 2 8 7 0 8 9 (1) ANOVA decomposition , 013a i 020b — i 043a 1 044a 1 048a i 057a " 065a 1 1 072a 171b STATION Legend: 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982 0 2 0 <->o 2 . 2 I 0 7 9 ° 1 ? h 8 ? 0 3 (2) Median decomposition 1 013a 1 020b 1 043a 1 044a 1 048a STATION Figure 3.3: Station e f f e c t s 96 1 057a 1 065a ( r a i n f a l l volumes) i 072a i 171b 8 7 9 8 8 7 0 8 0 Jan. Feb. 1 9 7 2 2 9 1 2 2 9 7 8 0 1 0 8 1 T Mar. T Apr. T" May ^ ^ BepT Oct. Dec. Nov. (1) ANOVA d e c o m p o s i t i o n Legend: 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982 M 0 N T H 9 8 1 I 9 0. 1 0 1 Jan. 1 Feb. 2 3 R S • 1 Mar. 9 1 Q 2 3 g 8 7 1 Apr. 8 9 8 7 • 9 J • i 9 1 1 1 1 May June July Aug. z n ? 2 1 Sep. 1 Oct. l 2 0 1 i Nov. Dec. (2) Median d e c o m p o s i t i o n ( r a i n f a l l volumes). 97 Q o ? M 0 N T H F i g u r e 3.4: Monthly e f f e c t s g r-i" Figure 3.5: Scatter' p l o t of a d j u s t e d volumes and a d j u s t e d pH l e v e l s , a l l data combined. 0- CO u * *** xl* xx*" x X X ' X X ".* x 1x ,»x*." x. x 1 P-I a CO o . i— -30 -20 -10 0 10 I— 30 20 VOLUME ADJ (mm) 40 —r* 50 — r 60 -1 70 (1) Station 013a > • a ° (X o' > -30 -20 > 10 -10 20 40 30 50 60 70 VOLUME ADJ (mn) (2) Station 020b I a. • * >. 11 1 , -30 -20 1 -10 1—• 0 1 " 10 20 VOLUME ADJ (mm) f 40 1 50 1 60 30 Figure 3.6: Scatter plots of adjusted volumes and adjusted pH l e v e l s f o r each s t a t i o n separately. 99 1 70 03 (3) Station 043a O " o* o CE x x x o •- , S X 0_ xx X* x »x x x x x XX***"** „x ' - X x x * " x x O . O . I -30 -20 10 -10 20 VOLUME ADJ (mm) (4) Station 044a tn, cc # ' X Xy * « x K x *« ** y X X o-l CO - I — T" -30 -10 -20 0 10 20 VOLUME ADJ (mm) 100 (5) Station 048a XX X X X Q X! X o o_ X X" <„ w X x X x x IJ „ " x X K P-I oo o . 1 —1 -30 -20 -10 10 20 I 30 VOLUME ADJ (mm) (6) Station 057a x X X cr o ' X * X X xx x Q-l CO o . -30 -20 -10 "T 0 -1— 10 20 VOLUME ADJ (mm) 101 30 (7) S t a t i o n 065a X M * X 2- if X X X cr o " » • » « x wx "* M ' „. * X X " x «x It * * * X X „ « o-l CO o. -30 1— -20 -1— 10 -10 20 —r 30 VOLUME ADJ (mm) (8) S t a t i o n 072a o" Q CX X xx. x xx X X x X X ix * o. CO o. I -30 -20 10 -10 20 VOLUME ADJ (mm) 102 30 40 PLOT OF RESIORNK'RESID LEGEND: * • 1 OBS. B - 7 OBS. ETC. A AB BAA ACS F BF GA BJ GG IE EL GOE CH* BL FK MB GCE CHB HE AGC CF DAE BF AE E -0.8 0.3 RESIO (5) S t a t i o n 048a AA CAA I.S R A N K 1.0 0 R 0.S R I A B L E AA BFA A fC CG GO CHC ACF8 CGF -ICB FF BFF 0.0 8KB FOA -0.3 R E S I 0 -1.0 BE ADA -1.5 0.50 RESIO (6) S t a t i o n 057a 105 DCB E A A A » PLOT OF RESIDRNK'RESIO LEGEND: * • 1 OBS. B • 2 OBS, ETC. A C AAA AA B ACB CBA OA CCC BFA DE a BDF G AK DI AGS GO f BDB DA CAAA BA 3.0 -2.5 -2.0 -I.S -1.0 -O.S 0.0 0.5 1.0 I.S 2.0 I.S 3.0 3.5 4.0 A.S RES1D (1) Station 013a BCC ABBD GO AFE CK JC HO GGB HO GI N K 1.0 F 0 R O.S B 0.0 L -0.5 E -1.0 s IK FF EH KC BJC H EH DCD 00 CF ADB CAA BC I 0 -I.S -2.0 -2.S -2.0 -I.S -1.0 -0.5 0.0 0 5 1.5 2.0 2.5 3.0 3.5 AO (2) Station 020b Figure 3.7; Normal plots for each station ( a l l years) 103 separately. 4.5 PLOT OF RESIORNK'RESID LEGENO: * • 1 OBS, B • 2 OBS. ETC. I F • 0 R V 0 • 1 FHL AILE DOE AFGF EFGO GPO HOI IOI HSJ — -CRO JPK OPC MJ L E R E S I BEE F B OOK HO AAAAA AB B A -2 • 0 -3 • 0.6 (3) Station 1 8 RESIO 043a B A B OKOO KOA DOOC CI I ID DOKF EIF1B AN I CEDFN BFBIE AEBDF CCFAC BOAA ABC E -a • s I - J . 5 I - 2 . 0 -1.5 - 1 . 0 -0.5 0.0 RESID (4) Station 044a 104 BBC BF FFCAO A AA PLOT OF R E S I O R N K ' R E S I O LEGEND: A • 1 OBS. B • 2 OBS.E T C . I 4 • AAC CBBC CDOC ECHO FOE I ILF FKKB HFHG HRG BFCIFE JHO i B L E R E S I D OP FHIA EGFB DJ I -1 • BD AAAAB -2 • I -3 • I -4.0 " o ! 9 " " " i . o " " - » . 5 " " - 2 . o " - I . S -< 0 -0.5 0.0 0.5 1.0 15 2.0 2.5 3.0 3.5 I.S 2.0 2.5 3.0 3.5 4.0 4.5 RESIO (7) S t a t i o n 065a a A ACS BC AF BE FC AED DFA AFD DI FDC KD EL GE DEC GCB BBA BBS BCF AGB CE AEB B AC ABA B AA AA A -3.0 -2.5 (8) -2.0 Station -1.5 -1.0 -0.5 0.0 072a 106 0.5 1.0 PLOT OF RESIDRNX'RESID LEGENO: * • 1 OBS, B • 2 OBS. ETC. R N 1.0 c c AC D CC V » R I A 8 L E R E S I 0.0 AF 8 CA -0.5 BC CB AC BAB -1.0 0 -1.5 A I . . 0 * 0 - a ! s - 2 * 0 - V . l " • -1.0 • .-0.5 0.0 RESIO (9) S t a t i o n 171b 107 0.5 1.0 • I.S • * 2.0 " 2.5 3.0 PLOT OF RESIDRNK'RESIO * L E R E S I 0 LEGEND: 4 • 1 OBS, B • 2 OBS. ETC. 0.0 •- -0.5 -1.0 • -i.a 0.6 -I.J RESIO (1) 1978 F 0 R 0.5 A R I 0.0 • L E -0.5 • R E S I 0 -1.0 • AA A 0.4 RESID (2) 1979 Figure 3.8: Normal p l o t s f o r each year separately at s t a t i o n 020b. 108 PLOT OF RESIDRNK'RESID I LEGEND: A • 1 OBS. B • 2 OBS. ETC. 2.S • R A N F 0 R B L E AA A A BA 1.0 0.9 B CA -0.5 ABA BA BAA R -t.O E S I 0 -I.S F BC CA AC BA A -3.0 • I (3) 1980 N K 1 .0 F 0 R 0.5 AB BA C BA BC DA CA V A 0.0 B L E -0.5 R E S 1 0 -1.0 -1.5 -I 0 RESID (4) 1981 109 PLOT OF RESIDRNK'RESID F 0 R LEGEND: A • 1 OBS, B • 2 OBS. ETC. A AB A 0.9 AE C A BB BA R E S I D -1.0 -1.5 -3.0 • I 1.5 RESID (5) 1982 110
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Statistical analysis of the temporal-spatial structure...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Statistical analysis of the temporal-spatial structure of pH levels from the MAP3S/PCN monitoring network Le, Nhu Dinh 1986-12-31
pdf
Page Metadata
Item Metadata
Title | Statistical analysis of the temporal-spatial structure of pH levels from the MAP3S/PCN monitoring network |
Creator |
Le, Nhu Dinh |
Publisher | University of British Columbia |
Date | 1986 |
Date Issued | 2010-06-20T16:54:32Z |
Description | The approach developed by Eynon-Switzer (1983) to analyze the spatial-temporal structure of a data set obtained from the EPRI monitoring network is applied to a data set obtained from the MAP3S/PCN monitoring network. In this approach, a spatio-temporal stochastic model, including deterministic components for seasonal variation and rainfall washout, is fitted to the data. The results indicate that the model fails to capture some of the features of the underlying structure. In an effort to identify an appropriate model for the data, we examine the raw data in detail. An ANOVA model is fitted to the data. Different criteria such as Akaike, Schwarz, Mallows, etc, are used to identify the 'best' submodel (i.e. eliminate some terms in the full ANOVA model). The results indicate that it is possible to capture the deterministic component of the model with a much smaller model (i.e. fewer parameters). The normality of the residuals is also examined. The results indicate that the data from all stations except one can reasonably be approximated as coming from normal distributions. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project |
Date Available | 2010-06-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0096703 |
URI | http://hdl.handle.net/2429/25884 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- [if-you-see-this-DO-NOT-CLICK]
- UBC_1986_A6_7 L39.pdf [ 5MB ]
- [if-you-see-this-DO-NOT-CLICK]
- Metadata
- JSON: 1.0096703.json
- JSON-LD: 1.0096703+ld.json
- RDF/XML (Pretty): 1.0096703.xml
- RDF/JSON: 1.0096703+rdf.json
- Turtle: 1.0096703+rdf-turtle.txt
- N-Triples: 1.0096703+rdf-ntriples.txt
- Original Record: 1.0096703 +original-record.json
- Full Text
- 1.0096703.txt
- Citation
- 1.0096703.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
China | 26 | 17 |
United States | 11 | 3 |
Japan | 9 | 0 |
Germany | 3 | 0 |
France | 1 | 0 |
United Kingdom | 1 | 0 |
City | Views | Downloads |
---|---|---|
Beijing | 19 | 0 |
Ashburn | 9 | 0 |
Tokyo | 8 | 0 |
Shenzhen | 7 | 17 |
Unknown | 3 | 4 |
Nagoya-shi | 1 | 0 |
Nairn | 1 | 0 |
Mountain View | 1 | 0 |
Pasadena | 1 | 0 |
Paris | 1 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0096703/manifest