Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Statistical analysis of the temporal-spatial structure of pH levels from the MAP3S/PCN monitoring network Le, Nhu Dinh 1986

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata

Download

Media
UBC_1986_A6_7 L39.pdf [ 5MB ]
Metadata
JSON: 1.0096703.json
JSON-LD: 1.0096703+ld.json
RDF/XML (Pretty): 1.0096703.xml
RDF/JSON: 1.0096703+rdf.json
Turtle: 1.0096703+rdf-turtle.txt
N-Triples: 1.0096703+rdf-ntriples.txt
Original Record: 1.0096703 +original-record.json
Full Text
1.0096703.txt
Citation
1.0096703.ris

Full Text

STATISTICAL ANALYSIS OF T H E TEMPORAL-SPATIAL S T R U C T U R E OF pH L E V E L S F R O M T H E M A P 3 S / P C N MONITORING NETWORK by N H U DINH L E B.Sc, The University of British Columbia ,1984  A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S FOR T H E D E G R E E OF M A S T E R OF SCIENCE in T H E F A C U L T Y OF G R A D U A T E STUDIES (The Department of Statistics)  We accept this thesis as conforming to the required stj>adprd  T H E UNIVERSITY OF BRITISH C O L U M B I A August, 1986 © Nhu Dinh Le, 1986  In p r e s e n t i n g  this thesis  r e q u i r e m e n t s f o r an of  British  it  freely available  agree that  in partial  advanced degree a t  Columbia, I agree that for reference  permission by  understood that for  h i s or  be  her  copying or  f i n a n c i a l gain  Department o f  statistics  The  of  University  Vancouver,  V6T 1Y3 Date  E-6  (3/81)  British  Canada  August 20,  1986  University  Library  s h a l l make  and  study.  I  the  publication be  of  further this  this  my  It is thesis  a l l o w e d w i t h o u t my  Columbia  thesis  head o f  representatives.  s h a l l not  the  the  g r a n t e d by  permission.  1956 Main M a l l  the  f o r extensive copying of  f o r s c h o l a r l y p u r p o s e s may department or  f u l f i l m e n t of  written  ABSTRACT  The  approach developed by Eynon-Switzer (1983) to analyze the  spatial-temporal structure of a data set obtained from the E P R I monitoring network is applied to a data set obtained from the M A P 3 S / P C N monitoring network. model,  In this  approach,  a spatio-temporal  stochastic  including deterministic components for seasonal varia-tion and  rainfall washout, is fitted to the data.  The results indicate that the  model fails to capture some of the features of the underlying structure.  In  an effort  to identify an appropriate  examine the raw data in detail. data.  model for the  An ANOVA  data,  we  model is fitted to the  Different criteria such as Akaike, Schwarz, Mallows, etc, are used  to identify the 'best' submodel (i.e.  eliminate some terms in the full  A N O V A model). The results indicate that it is possible to capture the deterministic component of the model with a much smaller model (i.e. fewer parameters). The  The normality of the residuals is also examined.  results indicate that the data from all stations  except one can  reasonably be approximated as coming from normal distributions.  ii  TABLE OF CONTENTS  ABSTRACT  tt  TABLE O F CONTENTS  tit  LIST O F T A B L E S  v  LIST O F F I G U R E S  vt  ACKNOWLEDGEMENT  vtt  1) I N T R O D U C T I O N  1  1.1) Inference for Spatial-Temporal Data  1  1.2) Data Set Description  3  1.2.1) p H levels  4  1.2.2) Rainfall volumes  5  1) E Y N O N - S W I T Z E R A P P R O A C H  7  2.1) Introduction  7  2.2) The Eynon-Switzer Model  8  2.3) Preliminary Fitting of The Model  10  2.4) Spatial and Temporal Components  17  2.4.1) Fitting the temporal variogram  18  2.4.2) Fitting the spatial variogram  21  2.5) Fitting Without Station 020b  24  2.6) Kriging With a Different Spatial Variogram  26  2.7) Conclusion  29  '  3) D A T A E X P L O R A T I O N  31  3.1) Factors to Be Examined  31  3.2) Preliminary Examination of The Structure of The Data 3.2.1) Relationship between p H and Month, iii  .  33  pH and Station  34  3.2.2) Relationship between Volume and Station, Volume and Month 3.2.3) Relationship between p H and Volume 3.3) A n A N O V A Model  39 44 46  3.3.1) Fitting the A N O V A model  47  3.3.2) Deterministic component  52  a) Multiple correlation coefficient  54  b) Adjusted R  55  c) Mallows' C  2  p  statistics  3.3.3) Stochastic component 3.4) Conclusion  55 60 63  4) C O N C L U S I O N  65  REFERENCES  67  iv  LIST O F T A B L E S  TABLE 1.1  Site location: The M A P 3 S / P C N monitoring network  69  1.2  Summary of data  70  2.1  The fitted constant coefficients obtained by fitting the Eynon-Switzer model  71  2.2  A N O V A and median decompositions on a, b, c, and d  72  2.3  The ratio of residual SS's of the corrected and uncorrected series of p H levels  76  2.4  Sums of squares errors for different models  77  2.5  The results of fitting the temporal variogram Vp(w) to the data, where Vp(w) = gi/(l  2.6  + [w /g ]~ ) 2  tj) — pH(xi, tj>)) ] 2  2.8  The estimated value of or  3.1  Station effects, using A N O V A and Median  and M E D I A N  82  Monthly effects, using A N O V A and Median decompositions (pH levels)  82  Station effects on rainfall volumes, using A N O V A and Median decompositions  3.4  80 81  decompositions (pH levels)  3.3  79  The residuals and effects obtained by decomposing the estimated coefficients g! using A V E R A G E  3.2  78  1  Temporal variogram estimated with data of each year and each station using Vp(w) — ave[^(pH(xi,  2.7  2  83  Monthly effects on rainfall volumes, using A N O V A and Median decompositions  v  83  LIST O F F I G U R E S  y  FIGURE 1.1  Field p H  84  1.2  Rainfall volumes  85  2.1  The p H monitoring networks  86  2.2  Phase and amplitude of seasonal p H variation  87  2.3  The estimates of the unsmoothed temporal variogram obtained for data of station 072a, in 1982  2.4  90  The estimates of the unsmoothed spatial variogram for different station pairs  91  2.5  Same as 2.4, data from station 020b removed  91  2.6  Contour maps for p H  93  3.1  Station effects (pH levels)  94  3.2  Monthly effects (pH levels)  95  3.3  Station effects (rainfall volumes)  96  3.4  Monthly effects (rainfall volumes)  97  3.5  Scatter plot of adjusted volumes and adjusted p H levels, all data combined  3.6  98  Scatter plot of adjusted volumes and adjusted p H levels for each station separately  99  3.7  Normal plots for each station separately  103  3.8  Normal plots for each year separately at station 020b  108  vi  ACKNOWLEDGEMENTS  I would like to thank Professor A. J. Petkau for his guidance and assistance in producing this thesis. I would like to express my gratitude to Professor J. V. Zidek for his careful reading and helpful comments on this work. I am  indebted to Drs.  A. Olsen and C. Watson of  Batelle Pacific Northwest Laboratory for making the M A P 3 S / P C N data set available to me.  I also would like to express my  appreciation to  the U B C Acid Rain group, particularly Professors P. Guttorp and P. Sampson, for extensive input during the course of this work.  This work was partially supported by the Environmental Protection Agency through a co-operative research agreement with SIAM's Institude for Mathematics and Society (SIMS) and by the Natural Sciences and Engineering Research Council of Canada.  vii  1) I N T R O D U C T I O N  1.1)  Given  Inferences for S p a t i a l - T e m p o r a l D a t a  a data set  of monitoring sites, inferences  about  (pH levels in this case) obtained from a  it is often necessary in environmetrics  events at  nonmonitored sites.  to  set  make  The solution of this  problem usually involves some interpolation techniques (i.e. finding the 'best' estimates for the data at nonmonitored sites). Kriging (Delhomme 1978)  is a simple method for interpolation to obtain contour maps over  the whole region which contains  the monitoring sites.  However,  the  spatial covariance function for the whole region is required in order to do kriging.  The spatial covariance function can possibly be modelled  by using the estimated values of spatial covariances  between pairs of  monitoring sites, which in turn depend on the spatial-temporal stochastic model used to explain the raw data.  Successful use of this  scheme  for inferences concerning spatial-temporal data requires an appropriate spatial-temporal stochastic model for the data. l  In chapter 2 of this thesis, the approach followed by Eynon and Switzer (1983) is applied to the data set obtained by the MAP3S/PCN monitoring  network.  The results indicate that the spatial-temporal  stochastic model developed for the variability of pH levels in their paper fails to capture some of the features of this independent data set.  However, for the sake of illustration, we carry the approach  through to its logical conclusion and obtain by kriging, contour maps for this data set.  In an effort to identify a more appropriate spatial-temporal stochastic model for the data set, the underlying structure of the deterministic components of the data are carefully examined in chapter 3. The results show that there is some seasonal structure in the pH levels. The pH levels are not influenced by the rainfall volumes which show no seasonal structure of their own. A saturated ANOVA model is fitted to the pH levels and the importance of each component of the model is examined. The results suggest that the deterministic components of the pH levels can be captured by a simpler model than the saturated one. Many criteria for comparing different models are also described.  Many of the models and methods of analysis employed in environmetrics are based on assumptions of normality; therefore, the structure 2 -  of the residual stochastic component is examined by simple normal probability plots. The results indicate that the data from all sites but one have residual variation which can be reasonably approximated  as  being normally distributed.  1.2)  Data Set  Description  The MAP3S/PCN network has nine monitoring stations which are located in the northeastern part of the United States.  Data were  collected for each day on which rainfall occurred from 1976 to 1982; one data record was collected per storm (i.e. event data). However, the stations started collecting data at different times; only data obtained at stations which were active for the entire year are considered (i.e. data obtained in the first year of operation, but with start-up time much later than January  1  st  of that year are ignored). About five  percent of the data are missing (i.e. there was a storm but no data were reported for the pH  level or volume); missing data are ignored  in this work. As in the Eynon-Switzer and  analysis, only (field) pH level  Volume measurements are used although the data set provides  much more information (e.g. Conductivity, Sulfite, Sulfate, H,...). The +  variables considered are Time (t, t=0 on Jan 1**, 1976 and the unit is a day), Volume (rainfall volume was obtained from a raingauge and 3  measured in millimeters) and p H level. T h e station locations, the first active  dates,  and the  given in Table 1.1. the  averages  and  A D S (Acid  Deposition System)  site identity  are  Table 1.2 provides the number of rainfall events and standard  deviations  and rainfall volumes for each station  of the in each  corresponding p H levels year.  1.2.1) p H levels  T h e standard deviations of the p H levels are not entirely from year to year and from station to station.  consistent  T h e standard deviation of  the data at station 020b (Illinois, Illinois) is consistently larger than that at other stations; this excess variability is easily seen in the plots of the raw data (Figures exhibits  a large  data-value the  (a  standard  a value  standard  data  Station 065a (Penn State, Pennsylvania)  deviation (.61)  level of 7.31  this station  without  consistent (Table  corresponding to  in 1980  in December  deviation calculated  which is perfectly  other years at  The  pH  l.l.a-l.l.d).  1980;  due to see  an outlying  Figure  l.l.d);  this single p H level is  with the  standard  deviations  .30, of  1.2).  four stations  (020b, 044a,  048a,  065a)  are plotted in Figure 1.1 ( p H levels against T i m e ) . These four plots do not  demonstrate  any obvious long-term relationship between p H levels  4  and Time; of course, seasonal patterns may be obscured by the severely compacted Time axis. However, the plots do show differences in the data corresponding to different stations.  Station 020b (Figure 1.1.a)  has a large variability in its pH levels; station 048a (Figure l.l.c) has distinctly smaller variability. Stations 044a and 065a (Figures 1.1.b and l.l.d) display the same overall pattern of relatively small variability. One difference is that there are some possible outliers in the data obtained at station 065a. The plots of the data of other stations (not included in this report) are similar to one of the above figures.  1.2.2) Rainfall volumes  The average rainfall volumes vary considerably from year to year and from station to station (Table 1.2). The average rainfall volumes in 1979 are greater than those of other years at all stations but stations 044a and 048a. The standard deviations of the rainfall volumes are large (Table 1.2), and vary greatly from year to year and from station to station.  At most stations the standard deviations in 1979 are  exceptionally large relative to those of other years; for example, at station 013a, the standard deviation is 29.49 in 1979 while those in other years are less than 15.28. Rainfall volumes are plotted against time for each station separately to examine whether there is any  5  obvious pattern  in the data.  The resulting plots do not suggest any  obvious long-term relationship between the rainfall volumes and Time. Overall, all plots are similar; the plots of four stations (020b, 044a, 048a, 065a) are presented in Figure 1.2. rainfall volumes are clearly demonstrated  The great variabilites in the in all plots.  possible outlier at station 020b in August of 1979;  There is one  the rainfall volume  is 157.14 mm (the rest are less than 90.00 mm) and the corresponding pH level is  4.22.  6  2) E Y N O N - S W I T Z E R  2.1)  APPROACH  Introduction  One general objective of the study of rainfall acidity is the determination of 'best' estimates of a spatial process describing geographic fluctuations in the acid levels of rainfall at some unobserved locations using the observed data obtained at a limited number of stations monitoring rainfall acidity. These estimates would allow consideration of the optimal design of networks to monitor the long term and potentially damaging effects of the components of acid rain.  An  approach  for  and Switzer (1983).  finding  these estimates is provided by Eynon  In their paper, a parametric  model is developed  using data obtained from the Electric Power Research Institute (EPRI) network.  A spatio-temporal  stochastic model, including deterministic  components for seasonal variation and rainfall washout components for spatial, temporal, and measurement 7  and stochastic  variation, is fitted  to  the data.  unbiased  Using the method known as Kriging,  estimates  (BLUE)  of seasonal  and rainfall  the best linear adjusted yearly  average p H over the monitoring region are obtained.  It should be noted that Eynon and Switzer use the data set obtained by the E P R I network which also has nine stations, each with two years of data.  The geographical locations of stations in both networks are  presented  in Figure 2 . 1 . Although the geographical locations of the  stations in the two networks differ, both networks cover essentially the same area.  One difference is that rainfall volume is measured in inches  in the E P R I network.  In  this chapter,  we present  the results of an attempt to validate  the Eynon-Switzer model using the data obtained by the M A P 3 S / P C N monitoring network.  The attempted validation follows the Eynon and  Switzer paper in a step-by-step fashion. We also present some of the corresponding results from their paper using data obtained from the E P R I network.  2.2) The Evnon-Switser M o d e l  Eynon  and Switzer propose  the 8  following model under which a  single year's data are analyzed : -log H+{x,t)=pH{x,t) l0  i \  nr.\  t \ • 27rt  = <*(x) + P(t) + a{x) ,  sin —  dob  2irt  + b (x) cos  c-I(x.t)  —  o65  .  ,,  +'°s.o1_exp[_c.;(I>i)l+^.o, where x  denotes the two-dimensional location under consideration,  t  denotes the number of days starting from January 1**, 1976j  I(x,t)  denotes the observed rainfall in millimeters at location x on day t (Eynon-Switzer used inches in their analysis),  pE(x,t)  denotes the observed rainfall pH at location x on day t, given I(x, t) > 0,  H (x,t) +  denotes the hydrogen-ion concentration at location x on day t in gram-atoms per litre,  a(x)  is a stationary auto correlated spatial process describing the geographic fluctuations in annual pH level,  P(t)  is a zero-mean stationary autocorrelated time process describing temporal fluctuations not attributed to seasonal variation, independent of a(x),  e(x t) }  is a zero-mean stationary white-noise measurement error process, independent of a(x) and /?(£),  a(x), 6(x)are location-specific model constants which describe  seasonal fluctuations, and c  is a washout rate constant.  The  correction  for rainfall volume (i.e.  the logarithmic term in  the pH model) represents the simplest scavenging process in which the number of hydrogen ions picked up per unit time per unit volume of rain is proportional to the remaining atmospheric concentration.  This  process produces lower acidity (higher pH) for larger rainfall volume. The limit of the logarithmic term is zero as the rainfall volume tends to zero. Since /?(£) and e(x,t) are zero-mean residual processes, a(x)  may  be interpreted as the seasonally adjusted maximum acidity potential at location x.  2.3)  Preliminary Fitting of The Model  In this section all parameters in the model are estimated by using the B M D P statistical computer package; the program used is Derivative Free Non-Linear Regression, P : A R . For each of the study years,  the  separate fitting of the data to the model proceeds in several stages. First,  the constant coefficients a(x),b(x),c(x)  least squares separately at each station location  are fitted by non-linear i=l,...,9. The model  used for the data (corresponding to events at times tj in the study 10  year under consideration) being fitted in this step is:  pH(x{, tj) = a(x,) sin  2lrfy 365  + b(xi) cos  2nt 3  365  +d{xi) +  + log 10  c(xi) • I(Xi,tj)  1 - e x p [ - c ( x ) • I[xi,tj)] t  e(xi,tj).  T h e values of the fitted parameters a(x,), 6(x,), c(x,), d(x,) for each year, and each station are presented in Table 2.1. negative values of e's at stations  Note that there are some  020b and 057a.  These values should  always be positive in principle since c is defined as a washout However, the magnitudes of these values are always very small.  rate. These  negative values could possibly be the result of the round-off errors in the nonlinear fit of the model.  Figure ted lated  2.2  seasonal as  shows curves  \fcfl + S,  Although  occur  and amplitudes  year  and phase  2  curves  phases  for each  d sin | | | + 6 cos |||. seasonal  the  is most  separately. the  value  of the  in the summer,  of  the  nine  Amplitude of t  which  minimum  fit-  is calcuminimizes  points  of the  the amplitudes and the  times  at which the m i n i m u m occurs for any station do not show great consistency  f r o m year  to  year.  presented in Tables 2.2.1 and the M E D I A N  This  and 2.2.2  fact  is supported  by the  obtained by applying the  results ANOVA  P O L I S H I N G decomposition techniques to the 11  fitted  coefficients a ( i , ) , similar  problem.  show the same  Results in the Two  figures  Eynon-Switzer paper suggests  (similar to  Figure  2.2)  in their  a  paper  inconsistency described above.  T h e results obtained by applying the techniques described above to the fitted coefficients c(a;,), d(x,)  are also presented in Tables 2.2.3  2.2.4. E a c h table contains two parts.  In Part  (a)  the average is used  to do the decomposition. T h e coefficients are considered as the in a two-way table (yearxstation). the s u m of overall average, decomposition,  then,  example,  the  the  average  row  year  and  as  station effect, year effect and residual. T h e  effect  the  entries  E a c h entry could be represented  is done by using an  (row)  and  is taken  overall  to  average.  ANOVA-type be the In  Part  approach; for  difference (b)  between  the  Median  Polishing approach is used to do the decomposition. E a c h entry in the yearxstation row  median.  table is replaced by the difference between itself and T h e median of these row  overall effect.  T h e year  effect  is taken  the row median and the overall effect. be  the  column median of the  difference between  the entry  new  medians to  be  is taken  the  The  of the new table  be  the  difference between  T h e station effect  table.  to  the  is taken  residual is then  and the  station  to the  effect;  the s u m of these effects and the residual results in the original entry of the two-way  table.  12  In Tables 2.2.1 and 2.2.2 of estimated values of the seasonal coefficients a(x) and b(x), there are many large residuals. These values appear at different stations and different years without any apparent pattern. Some year effects are much larger, in magnitude, than others. This suggests a considerable interyear variation in the year effect The station effect behaves the same way. Therefore, we may infer that the periodic coefficients a(x) and b(x) are inconsistent from year to year, and from station to station. Note that if the proposed model fitted the data well enough, then one would expect the periodic coefficients to be consistent at least from year to year. This suggests that these data do not have the periodic behaviour postulated in the model. We also apply the above decomposition  techniques to the yearxstation tables  of phases and amplitudes. The results which are not included in this thesis support our earlier conclusion that the phases and amplitudes are somewhat inconsistent from year to year.  It is not surprising that the estimated values of d(x) (decomposed in Table 2.2.4) show little consistency from year to year, and from station to station. The overall effect, d(x)  t  should be different from  station to station and possibly also from year to year.  Table 2.2.3 shows some consistency in the estimated values of c. 13  There is only one large residual (at station 065a, in 1980) which is greater (but not very much) than the other residuals. The magnitudes of the year effect are approximately the same.  The  station effect,  however, shows some variation. There are two relatively large values of the station effect which correspond to stations 020b and 048a. This points to possible differences in station effects. Besides, some of the residuals corresponding to stations 048a and 065a vary considerably from year to year (and also from station to station). Therefore, estimates of a common washout rate c obtained by using the data from all of the stations in each year separately, may  not be consistent from year  to year in conflict with a suggestion in the Eynon-Switzer paper; this is now  examined in more detail.  Following Eynon and Switzer, we fit a common value of the washout rate parameter c across all stations for each year separately. Using the previous estimates of a(x,), 6(x ), d(x,) (different for each station), t  we fit a common washout rate c for each year separately. The fitted values of the parameter c are: Year  1977  1978  1979  1980  1981  1982  c  A  .0405  .0528  .0189  .0555  .0304  .0489  se(c)  .0050  .0060  .0037  .0067  .0095  .0049.  14  Here se(c) denotes the nominal asymptotic-theory standard error of c, based on independence assumptions on the residuals. As a rough check for significant year to year differences in these c-values, we apply the Newman-Keuls procedure. The c-values are assumed to be independent, with a common standard error of approximately  .006 (this value is  the root-mean square of the standard errors displayed above).  While  the assumptions giving rise to this procedure are not satisfied exactly, nevertheless, the results give us some rough indication of the behaviour of these c-values. As expected, there are some statistically significant differences (at the five percent level). Specifically, the c-value obtained for 1979 is significantly different from the c-values obtained for 1977, 1978, 1980, 1982; and the c-value of 1980 is also significantly different from the c-value of 1981.  Note: The fitted values of the common washout rate parameter c given by Eynon-Switzer are: c  = 0.92 for year 1 (Sept. 78 - Aug. 79), =  1.10 for year 2 (Sept. 79 - Aug. 80).  These values correspond to rainfall volume measured in inches. To make these values comparable with our values of c (corresponding to volume measured in millimeters), we divide c by 25.4 (1 inch = 25.4 15  millimeters). c =  T h e converted values of Eynon-Switzer's c are respectively  0.92/25.4 =  .0362, and c =  1.10/25.4 =  These values of c agree with ours.  .0433.  Specifically, they are about in the  middle of our range of values of c.  The  combined  corrections  for  rainfall  volume  (using  a  common  washout rate c for each year) and seasonality reduce the overall variation of an annual p H series;  the residual standard deviation of a series is  reduced by 0 - 30 percent series.  The  ratios  of sums  from that of the corresponding uncorrected of squares  of residuals for the  corrected  series to that of the uncorrected series are presented in Table 2.3.  In  some cases, essentially no reduction is obtained by using the combined corrections.  T h e reductions corresponding to station 020b in different  years are always very small.  T h e results presented in Table 2.3 do not  show any obvious relationship between the percentage and the stations or the  of the reduction  years.  We note that according to E y n o n and Switzer, these corrections produce a 0 - 20 percent reduction of the residual standard deviation with their data. about  In addition, they note that each correction separately contributes one-half of the reduction.  T o see whether this is also true for  16  our  data,  we fit the full model and the reduced model in which  volume correction is omitted (i.e.  with only seasonal terms)  the  to eight  different data sets. E a c h data set corresponds to one year at a specific station. the  The  results  are  presented  residual sum of squares  due to  in Table the  seasonal  higher in some cases and lower in others the volume correction. separately  year  and  correction  is much  than the reduction due  It is not obvious here whether each  to  correction  contributes about one-half of the reduction.  2.4)  With  T h e reduction in  2.4.  the the  Spatial and Temporal Components  estimated estimated  parameters  a(x,), 6(x,)  common washout  for each  rate  station,  c for each  year,  each the  residual for the station at location x,- for collection day tj is given by:  —  •  pH (xi, tj) = pH(xi, tj) - a(xi) sin  ~  These pH(x{,tj)  l o  gio  2TT£J  365  .  » .  - 6(x.) cos v  "  2ntj 365  C(X.) • I{Xi,tj) 1 - e x p [ - c ( x . ) • I(xi,tj)]'  values are treated as estimates of the combined random  components a(x,) + fi(t ) + e(zi, tj). 3  Define: V (u) a  = E[a(x)  — a(x + u)] /2 : the variogram of the spatial process, and 2  17  Vp(tv) — E\(3{t) — B(t + tu)] /2 : the variogram of the temporal process. 2  These quantities describe the geographic and temporal variation of daily 'corrected'  p H readings.  Eynon-Switzer indicate that for convenience,  the following two simple parametric  models for the variograms  were  fitted:  V  where  <7i,/ii,<?2  a r e  a  {  u  )  =  l +  non-negative  positive definite 2 x 2  [ u ' M - i '  constants  and  a  at the origin.  2  is a  symmetric  matrix.  The contours of the spatial variogram V (u) centered  h  are concentric ellipses  The models satisfy the constraints  defmiteness required of all variograms.  of positive  The above variogram models  are also used to fit our data as described in what follows.  2.4.1) Fitting the temporal variogram;  The  corrected  p H values, pH(xi,t ), 3  are used to estimate an un-  smoothed variogram; l/2[pH(xi,tj)  - pH(xi,t i)]  2  3  estimates Vp(w) + of , 18  for w = tj — tji,  where o f - var(e(x,-, tj)).  Note that Eynon and Switzer estimate o f by using duplicate pH readings. In fact, the replication variability was estimated separately at each station for each year. There were modest differences among these estimates but not so large as to warrant modelling o f as a function of location. In any event, their estimate, o f , turns out to be small relative to temporal and spatial variability, and a common value of o f = .019 was obtained with their data. In our data, since no replicate observations are available, o f is absorbed into the temporal variogram. So l/2[pH(xi,tj)  — pH(xi,tj>)]  2  estimates Vp(w).  All rainfall days separated by a lag w (at a fixed station, separately in each year) are used to estimate Vp(w).  The scatter plot of the  estimates of the unsmoothed temporal variogram at station 072a in 1982  is presented in Figure 2.3. All possible estimates are plotted;  there are 1653 data points. Many of these estimates are near zero, but the variability of the temporal variogram estimates is very large. This pattern is, however, a typical one for all other plots of the temporal variogram estimates at different stations.  Eynon and Switzer observe that the fitted value of the constant g  2  19  could not be distinguished from zero with their data. To see whether this is true with our data, we fit the variogram model i (_.2/ )-i +  g2  to  four different data sets corresponding to different combinations of year and station; the results are presented in Table 2.5.  Our results support  the  nominal asymptotic  conclusion of Eynon  standard error  and Switzer (that  of the estimate  the  is always bigger than the  estimated  value). We would propose to let <j_ = 0.  It  may be inferred that the temporal variogram has a  constant  value gfi. Using the data from all years and all stations, the estimated value of gi is gi = .1287. In Table 2.6, different constants gi which are estimated using the data from each year and each station  separately  are presented. Applying the previous decomposition techniques to these values, we obtain the results presented in Table 2.7.  There are some  unreasonably large values of residuals corresponding to station 020b. The effect of station 020b is 0.31.  This is very large relative to the  other values whose magnitudes do not exceed 0.08.  This suggests an  unusual pattern in the data obtained at station 020b; recall the excessive variability of the data at station 020b noted earlier. values of gi = 0.805 in 1979 all others.  Specifically, the  at station 020b is very large relative to  Except for these consistently large values of station 020b,  the results do not show any obvious pattern between the remaining  20  values.  Note:  O u r value of g  x  of .1287 is bigger than the value of gi + o f =  .079 + .019 = .098 obtained by Eynon-Switzer. this  is that  we appear  to have  outliers  O n e possible reason for  in our data.  T h e excessive  variability of the data at station 020b might be another possible reason.  2.4.2) Fitting the spatial variogram;  T h e corrected  p H values, pH(x{,tj)  are used as follows to obtain  the unsmoothed variogram: l/2[pH(xi,tj)  - p~H(xi>,tj)]  estimates V (u)  2  +Vp(0 ), +  a  where u is the vector joining station pairs (a;,-, £ , < ) . Vp(0 ) +  comes into  the expression because the time unit was previously chosen as one day, and  two readings  of p H levels  at  two stations  on a same day (i.e.  same tj) might not have the identical collecting times so that the time lag is always 0 . +  l/2[pH(xi,tj)  Of  course,  V (u) a  Hence — pH(xi>,tj)]  can  be  2  — gi estimates V ( u ) . a  estimated  corresponding to two stations  directly  only  with events recorded  21  for arguments  u  on the same day.  With our data set, there are 2848 estimated values for the unsmoothed spatial variogram corresponding to two-dimensional  vectors u joining  the 36 station pairs. An attempt to fit V (u) to this data set failed a  because the algorithm failed to converge.  We note that Eynon and Switzer do not report any difficulties in fitting the spatial variogram.  To find the estimated values of h\ and h  2  for our data set, we try  another approach. For each station pair, the simple average of all the available estimates of the unsmoothed spatial variogram is considered as a new data point. The number of unsmoothed estimates entering this average is treated as a case weight. This new data set has 36 data points, with a case weight attached to each point. However, the attempt to fit V (u) a  to this new data set also fails; each of three different  algorithms, Gauss-Newton and Marquardt algorithms (SAS statistical computer package), and Derivative-Free Non Linear Regression (BMDP package), fails to converge to a solution.  The plot of this new data set is in Figure 2.4. The plot, however, shows that the proposed spatial variogram model is totally inappropriate for our data set. There are many negative estimates possibly because  22  of the large estimated value of Vp(0 ) (i.e. gi = .1287). And there are +  also some unreasonably large values of the unsmoothed spatial variogram estimates.  These estimates are always from those station pairs where one  of the two stations was station 020b (Illinois, Illinois). The difficulties encountered in this fitting may be due to these values. This is another indication of the atypical behaviour of the data obtained at 020b.  station  The plots of the unsmoothed spatial variogram estimates for  each year separately  (not included in this report) have approximately  the same pattern as the above.  The final step in the Eynon-Switzer approach is the interpolation step,  called kriging.  To carry  variogram model is required. station  020b could be the  out  this step,  an estimated  spatial  Since there have been indications that  source  of many problems that  we have  encountered in applying the approach to this data set, we will repeat the previous steps of the approach to the data set with station 020b removed. and h  2  This may allow us to find estimates of the parameters hi in the spatial variogram model used by Eynon and Switzer.  Besides, we would like to find out how well this approach work if we remove station 020b. This will be done in the next section.  23  2.5) Pitting Without Station 020b  In this section we examine whether we can obtain estimates of the parameters hi and h  2  from the data with station 020b removed.  We repeat the previous steps of the Eynon-Switzer approach on the reduced data set and present the results in what follows. The estimates of parameters a,b,c,d in the Eynon-Switzer model are the same as before. Now we use these estimates to fit a common c value for each year separately.  The fitted values of the parameter c  are: Year  1977  1978  1979  1980  1981  1982  c  .0405  .0622  .0297  .0654  .0492  .0530  old c  .0405  .0528  .0189  .0555  .0304  .0489  se(c)  .0050  .0054  .0033  .0060  .0044  .0049  where the notation is the same as in section 2.3. The estimated values of c for the years in which the data from station 020 are removed turnout to be larger than those obtained earlier; the largest increase, 0.0188, is in 1981. As before, we apply the Newman-Keuls procedure to check for significant year to year differences in these cvalues.  An approximate common standard error of c is about 0.005.  The result obtained in this step is generally similar to that in section 2.3 although there are differences. 24  Specifically, the c-value for 1980  is no longer significantly different from the c-value of 1981, and the c-value obtained in 1979 is significantly different from all the rest.  We use estimated values of a, b and a common c to estimate the temporal variogram Vp as in section 2.4.1. As before, the fitted value of g  2  is negigible (i.e. the temporal variogram would have a constant  value gi). Using all the data together (excluding station 020b), the estimated value of gi is 0.095; this value is comparable to the EynonSwitzer estimated value of ffi+of = 0.098. This supports our claim that our earlier estimated value of g\ is larger than Eynon and Switzer's because  of the excessive variability of the data  obtained at  station  020b.  Using this gi, we obtain the unsmoothed estimates of the spatial variogram as in section 2.4.2. Our attempt to fit the model for V  a  to  these estimates fails (i.e. the nonlinear regression fit fails to converge to a solution for h\ and h ) using all the different approaches described 2  in section 2.4.2.  To examine the unsmoothed estimates of the spatial variogram, we plot them in Figure 2.5.  The points with unusually large values in  Figure 2.4 do not appear in this plot; however, the proposed model  25  for the spatial variogram s t i l l does not seem to be appropriate. that  there  are  still many  estimated  values of the spatial  Note  variogram  which are negative. T h i s suggests that we may still be overestimating the t e m p o r a l variogram gi,  although our estimated value of gi is now  comparable to E y n o n and Switzer's.  So the removal of station 020b f r o m the d a t a set has not resolved the difficulty in obtaining estimates of the parameters hi and h  2  in the  E y n o n - S w i t z e r spatial variogram model. Consequently, the interpolation procedure  called kriging, undertaken by  carried out w i t h this spatial variogram. complete  Eynon-Switzer  could not  However, to demonstrate  approach of Eynon-Switzer, we consider an alternate  be the  spatial  variogram model in the next section.  2.6) Kriging W i t h a Different Spatial "Variogram  We have learned f r o m our earlier analysis that the model does not appear to be appropriate for this d a t a set. the interpolation  Eynon-Switzer Consequently,  step based on the residuals from this m o d e l seems  inappropriate.  However, to demonstrate the considerable value of the  Eynon-Switzer  approach i n analyzing spatial-temporal d a t a (when  model is appropriate), we carry  out 26  the interpolation  step w i t h  the the  spatial variogram )  V (u) = a-/( _o) + 6* N l i a  tt?  where u is the vector joining station pairs; ||«|| is the length of u and J(u^o)  *  1S  n e  indicator function.  The parameters  a and b are fitted  with constraints a > 0, b > 0, using the above unsmoothed estimates of the spatial variogram. The estimated values of a and b are 0.000 and 0.005 respectively. For each year, the estimated values of the realized spatial process ct(x) at the station locations x = x,- can be obtained by  <*(*») = —  2  —  ,  Tli where n,- is the number of rainfall days at station i in the given year. The estimates a(x,)  for each year are presented in Table 2.8.  of the rainfall volume correction, than the simple averages in 1979  Because  these estimated values are smaller  of the uncorrected p H levels (station 020b  is an exception; this may be due to the excessive variability  of p H levels at  this station 020b in 1979).  differences range from 0.01  to 0.20  At other stations,  the  and they vary from year to year  and from station to station. To put the corrected p H values of Table 2.8  on the  original scale  we would need to  add a number of p H  units corresponding to the washout associated with the average daily rainfall at is assumed  that to  station in that year.  Although the washout rate c  be common across stations 27  in each  year,  the  large  differences  in average daily rainfall across stations in each  that these adjustments each  year.  will differ somewhat  F o r example,  range from 0.05 to 0.10.  in 1982 when c = .0489, the adjustments to add these adjustments  mean  to station in  i n 1979 when the estimate of the  rate is c = .0189, these adjustments  important  from station  year  washout Similarly,  range from 0.08 to 0.22.  to the corrected p H levels  It is before  doing the interpolation because estimates at unobserved locations (i.e. contour  maps) will be substantially influenced by these values.  Since the spatial-temporal model for the p H levels does not seem to be appropriate,  resulting contour  like to demonstrate  maps may be misleading.  We would  how the kriging method works, but do not intend  to take the interpolating results seriously. corrected p H levels seems unneccessary. below is carried out without  first  F o r this reason adjusting the  T h e interpolation step described  adjusting the corrected p H values.  T o obtain estimates of the p H levels for unobserved locations sequently,  contour  maps),  kriging (Delhomme 1978). at  we use the interpolation  technique  (subcalled  T h e 'best' linear unbiased estimate of oc(x )  an unobserved location x  0  0  is given by  9  28  where the weights Ai,...,A are chosen to minimize 9  v{x ) 0  subject  to __]^»'( o) = 1x  = E[a*{x ) 0  With  process underlying the data (cf.  - ot{x )} 0  stationarity  2  assumptions about the  Journel and Huijbregts (1978)), the  quantity v(x ) can be expressed as a quadratic form in the A,'s whose 0  coefficients depend on the spatial variogram V , the temporal variance a  parameter x  0  gi, and  We minimized this quadratic form for each  on a fine grid over the eastern  United  States using the above  estimated parameters for the spatial variogram. The contour maps of these resulting interpolated values are presented in Figure 2.6 (produced using the computer language S) for each of the four years from 1979 to 1982.  The contour maps could possibly be useful in investigating the  pattern of changes from year to year.  However, as mentioned earlier,  these results could be misleading, so attempting to draw conclusions from these results seems unwise.  2.7) Conclusion  In  this  attempt  MAP3S/PCN  to validate the Eynon-Switzer model using the  data we obtain some results which are consistent with  those based on the E P R I data set. However, some of the results differ. Specifically, there are significant differences between the estimated values 29  of the  yearly  temporal  washout  variogram  Switzer,  Vp  is  possibly because  (including station section  rate  2.5  estimated  020b).  using the  c.  O u r estimated  larger of some  that  potential  reported outliers  by  data  set  vs .098).  possibly because  between  the  analysis  is  capture  the  complete model have the  to  the  structure  approach obtain  with station  of  020b removed,  the same as that  already p H levels  the  E y n o n and  contour  to  be  we  the  and  data  set  where  the  obtained by E y n o n  set. use  maps the  for  MAP3S/PCN  in geographical  location  indication obtained from our  Switzer  of this data  concluded that appears  differences  Overall,  throughly,  the  the  T h e spatial variogram model proposed by  two networks. that  of  Eynon  in our  Eynon-Switzer does not appear to be appropriate for the network,  value  T h i s is supported by the results obtained in  value of Vp is about  and Switzer (.095  than  constant  model does However,  an  to  alternate  1979  to  not  investigate spatial  1982.  their  variogram  But  proposed spatial-temporal  inappropriate,  completely  as  e  we  model for  drawing conclusions from  these contour maps seems ill-advised.  Since this approach is unsuccessful for this data set, examine structure  the raw data  we propose  to  in more detail with a view towards identifying  which may be present.  T h i s will be done in the next  30  chapter.  3) D A T A  EXPLORATION  We now start a careful examination of the data in an attempt to uncover any underlying structure which may be present. We examine a number of factors which may be expected to influence the pH readings and investigate the possibility of consistent relationships between pH levels and these factors. There may also be some relationships between these factors. 3.1) Factors to Be Examined  As described earlier, there are about 3000 data points in the data set. Each data record contains the time of the event (the storm), the location of the station, the rainfall volume of the storm, and other information. Note that in this examination one pH reading of 7.31 at station 065a in 1980 is removed. The new mean and standard deviation for this year at this station are 4.04 and 0.33 respectively, while the old ones are 4.14 and 0.61 (Table 1.2). The removal of this obvious outlier makes the new standard deviation of this year consistent with other years at this station (about 0.3). 31  T h e location of the station  should be one important factor.  Most  stations are located in heavily industrialized areas; therefore, the pollutants in the sky at each station are possibly influenced by the chemicals released from the factories long-range transportation sions  are  carried  distances of the  to  the  storm  we expect  strong  the  may  be  the  factor  There are also possibly some  that is tall remote  winds at  high  on the  may  under consideration. p H levels of the  different from station  is a categorical  smokestacks emis-  altitudes  These contaminants  station  effects  of stations station  effects,  location.  at  the  by  in that area.  to  over  affect For  very the  these  geographical station.  In  long  acidity reasons,  locations  our  study,  variable with nine categories;  each  specifies a single station.  Another factories,  important  the  factor  is  the  time  of  volume of production is seasonal  the (i.e.  storm.  In  some  for some specific  periods, production is very high, and in some other periods, production is low).  D u r i n g periods of high volume production, more  are released into the sky. These substances of  the  storm.  pollute which  the will  reasons,  we  sky;  Further,  during  the  in turn expect  affect the  the time  the  p H readings (i.e.  may affect the p H reading  summer  these obviously change  season,  forest  chemistry  of that  seasonal)  32  contaminants  factor  of the  storm. is  fires  For  may  storm, these  important  for  this study. months. years  T o account  Note that  and seasons  for seasonality,  Egbert (3  and Lettenmaier  or 4 months  obvious patterns i n p H levels. period (i.e. one month)  time is divided into years and  each),  (1986)  divided  time  into  but d i d not discover any  F o r this reason, we use a shorter  time  in this analysis.  Another important factor is rainfall volume. It is generally believed that  the p H reading  However,  varies  in the previous  volume as represented (1983)  monotonically  chapter,  with  the rainfall volume.  our observations  of the effect  of  by the scavenging term used by Eynon-Switzer  do not support  this  belief.  Therefore,  we will  examine  more  general relationships between rainfall volumes a n d p H levels.  3.2) Preliminary Examination of T h e S t r u c t u r e of T h e Data  We first and  examine  also between  the relationships among T i m e ,  these factors a n d the p H levels.  Volume,  Station,  In this section we  use monthly averages of p H levels a n d Volumes instead of individual values.  T h e monthly average p H level (or Volume) is just  average over the events  i n each month.  the simple  Since these monthly averages  smooth the original data (i.e. reduce the variability), relationships may be  more  clearly  (individual) data.  exhibited  in the averaged  data  than  in the original  If there are no obvious relationships in the averaged  33  data, relationships in the original data set seem unlikely. problem  with  this approach is that  taking averages  some effects which we may be able to detect  O n e potential  could  cancel out  in the individual  data.  However,  this is unlikely to occur since with only a few observations  available  in each  to that  detect  even  month  if they  (about are  relationships suggested  1-8  present  points),  data  are very  i n the individual  data.  by the monthly-average data  be examined subsequently in the individual average  effects  data.  hard Note  can always  Working  with the  also has the practical advantages of reducing the size of  the d a t a set to about 500 d a t a points from about 3000 data points. In the following sections, we examine the relationships among the factors in detail using the averaged data set.  3.2.1) Relationships between p H a n d M o n t h , p H a n d S t a t i o n  We would factors  affect  components: averages  at  like to find out whether or not the Station a n d T i m e the readings of p H levels. Year  each  and M o n t h . station.  In each  T o examine  The Time year,  there  factor  are 12 monthly  the relationships among  factors, we will look at the data of each year (all stations) If there  are obvious relationships in each year,  to recognize how they change over the years.  34  has two  these  separately.  we hope to be able  T h i s would subsequently  help us to understand the relationships among the factors for all years of data.  The monthly average data in each year are fitted with the following additive model:  where pHij  denotes the pH average of Month j at Station i,  fi  denotes the overall effect,  or,-  denotes the Station effects, i = 1 to 9,  Pj  denotes the Month effects, j = 1 to 12,  €ij  denotes the residuals.  The residuals may contain not only the pure errors, but possibly also some other effects such as a rainfall volume. We proceed to obtain rough estimates of the parameters by using ANOVA-like and median polish decompositions which are described in detail in section 2.4. The monthly averages are arranged in a 2-way table; the factors are Month and Station. We fit the model by this rough method for years 1977 to 1982 separately. The estimated values of the parameters a and P for the years from 1977 to 1982 are presented in Tables 3.1 and 3.2 respectively. 35  T h e estimates  of \i are:  1977  Year  1978  1979  1980  1981  1982  ANOVA  4.14  4.10  4.27  4.18  4.14  4.21  Median  4.14  4.07  4.22  4.14  4.16  4.25  Note that  similar estimates  of /x are obtained by both methods.  The  monthly averages range only from about 3.5 to 5.5, and most of them are  about  4.  Therefore, the  values of the  overall effect  obtained by  taking the overall average and the median should be very similar.  The  differences of the estimates from year to year are not large. If we had many  more years of data,  then we could investigate whether there is  a true yearly effect or a trend (pattern) over the years by examining the  estimated  values of /./.  Unfortunately, these  issues could  examined thoroughly because we have only six years of data. the  six estimated  values of fi (i.e.  for the  six years)  are  not  be  Although similar in  magnitude, it is not reasonable to ignore the yearly effect on the p H levels.  Therefore, we still assume that  there  is some yearly effect on  the p H levels.  The Station if there  station effects  (&) for the different years  in Figures 3.1.1 is any  consistent  (ANOVA) pattern  and 3.1.2 from  36  year  are  (Median to  year.  plotted  against  Polish) to  see  Similarly,  the  monthly effects (f3) for the different years are plotted against in Figures 3.2.1  and 3.2.2.  Month  Note that the plots for the two methods  of decomposition are qualitatively similar for both sets of estimates.  Figure 3.1 station  to  shows that the station effects are quite different from  station.  from stations  013a  In Figure 3.1.2,  there are  large positive effects  (0.05-0.27) and 020b (0.12-0.34).  There are also  some large negative effects from station 057a; for example, the effect is -0.30  in 1979.  The effects for the remaining stations are very small  or centered around zero. The changes in the station effects from year to year are considerable for most stations; stations 048a and 072a are exceptions. In general, Figure 3.1.1  has similar features, although there  are some differences. The effects do not change from year to year at station 013a.  The effects vary more from year to year in Figure  3.1.1  than Figure 3.1.2. Overall, the effects are quite different from station to station in both plots. This observation supports our belief that the geographical location of a station is an important factor in explaining the p H levels obtained at that station.  Note that at station 020b, both estimates of the effect of Station for 1979 are unusually large (Figure 3.1).  The reason is that in November  1979, there was only one storm with an unusually high p H level of 7.12; 37  therefore, the monthly average of this month is greater than those of other months (7.12 compared to about 4.0-5.5). Moreover, the estimate for  1979 (Figure 3.1.1) based on the A N O V A decomposition is greater  than that from the Median decomposition (Figure 3.1.2). This may be explained by the robustness of the Median Polish decomposition. The monthly average of November in 1979 is obviously an outlier. The decomposition using medians reduces the effect of the outlier on the estimated values of the parameters more than the decomposition using averages ( A N O V A ) .  Figure  3.2 shows some  interesting results.  vary considerably from year to year.  The monthly effects  However, the effects change in  a somewhat consistent manner from month to month.  Generally, the  effects decrease slowly from January to August, increase sharply from September to November, and then decrease a little in December. Note that this general pattern only becomes apparent when the effects for all years are plotted together.  In each year, the pattern of the effects  is somewhat different from the general pattern. For example, the effect increases from January to February (sharply) in 1981 and from March to April in 1982; the general pattern suggests a decrease in both cases. These plots also indicate that  the minimum of the monthly effects  always occurs  This result supports the expectation  in the summer.  38  that levels of acidity increase are  differences  in the summer.  Overall,  in the patterns of different years,  described above is the only apparent  structure  although there  the general  picture  i n the data.  3.2.2) Relationships between Volume and Station, Volume and Month  The  monthly average of the p H levels adjusted  station, monthly, and yearly effects, pH  i3  for these additive  denoted by pH^-, is  = pH^ -/*-<*,-- fa.  T h e results which follow are based on the adjusted p H levels obtained by  using the estimated  values  of /z, a, P provided by the median  polish decomposition. We use the median polish estimates here there are possible outliers the estimated  in the data  set." T h e effect  because  of outliers on  values of parameters is much less pronounced with the  median decomposition than the A N O V A decomposition is more  method; therefore,  the median  appropriate.  Since we wish to determine the relationship between p H levels and Volumes,  we proceed  to  find  the station,  monthly, and yearly  on the rainfall volumes so that we can adjust these effects  the rainfall volume for  as we d i d for the p H levels in the previous section.  is done i n what follows.  39  effects  This  The  relationship between Volume and Station and Month can be  examined by an approach similar to that in section 3.2.1. In each year we fit the following model: Vi] =  /* +  7i  +  + e,y  where  Vij  denotes the volume average in Month j at Station i,  H  denotes the overall  7,-  denotes the Station effects, i = 1 to 9,  Uj  denotes the Month effects, j = 1 to 12,  €ij  denotes the residuals.  We  effect,  obtain estimates of the parameters [i, 7 , and u by using the  median polish and ANOVA-like decompositions. We fit this model to the data of each year separately from 1977 to 1982.  The estimated  values of /z are: Year  1977  1978  1979  1980  1981  1982  ANOVA  14.42  12.82  17.07  10.53  10.77  12.00  Median  15.32  14.03  20.31  11.00  12.39  13.43  The estimates of n are quite different from year to year. Therefore, we infer that there is a substantial yearly effect on the average volumes. The distribution of the monthly average rainfall volumes has a long left 40  tail in each year (all stations); consequently, the median is greater than the average.  This is reflected in the above table where the  estimates  of fi based on the median decomposition are in every case larger than those based on the A N O V A decomposition.  The estimated values of the parameters Tables 3.3  and 3.4,  7 and w are presented in  respectively and are plotted in Figures 3.3  and  3.4 respectively. Figure 3.3.1 suggests some differences between stations. The effects vary greatly from year to year at some stations. Specifically, the effect for 1979 at station 013a is much greater than that for other years.  Also,  the effect for 1979  than that for other years.  at  station 048a is less pronounced  The difference between 1981  station 171b is large (about 8).  Figure 3.3.2  suggests  and 1982  at  a qualitatively  similar general pattern although there are some differences. The effect of 1979  at station 020b is very small in Figure 3.3.2  but it is large  relative to other years in Figure 3.3.1. This is due possibly to the fact that the average volumes are greatly different from month to month in 1979  at this station.  As a result, the average and the median of  the monthly rainfall volume averages of this year (station greatly different; the average is about 20.71 about  7.82  mm.  020b) are  mm and the median is  Consequently, the results of the two decomposition  methods are quite different for year 1979 and station 020b. Note that 41  there was no rainfall event in February 1979  at this station. Overall,  Figure 3.3 suggests that the monthly volumes are substantially different from station to station.  Figures 3.4.1  and 3.4.2  any obvious pattern. year to  year,  they  show that the monthly effects do not exhibit  Although the estimates are quite different from are  scattered  around  zero.  This suggests that  seasonality is not important for monthly average rainfall volumes. This result seems surprising; average rainfall volumes might be expected to differ from season to season.  Since seasonality  was important  for p H  levels and might be expected to be important for rainfall volumes, this issue is now examined  We apply the Stuetzle  further.  SUPER SMOOTHER  developed by Friedman and  (1982), to all individual (event) data.  The Super  Smoother  uses local linear fits with varying window width determined by local cross-validation. produces  In general,  this smoother  takes bivariate  a smooth fitted relationship between  data and  the two variables.  this exercise we use Volume and Time as a bivariate data set. Volume  and Time are  measured  as  in chapter  2  (i.e.  In Both  Volume is  measured in millimeters and Time is the number of days from January l* ,1976 to the day of the event). t  42  The relationship produced by the  smoother is Volume as essentially a constant over Time. The constant is 13.6 mm which is very close to the average of all individual rainfall volumes. We also apply the same procedure with Volume adjusted for the Year and Station effects. The result indicates that there is no obvious relationship between the adjusted Volume and Time (i.e. Month). The smoothed curve provided by the smoother does not resemble any standard functional form (Eynon-Switzer suggested a logarithmic relationship between Volume and Time). These results suggest that the rainfall volume of a storm is not influenced by the time of the storm (i.e. there is no seasonality). It appears that the effect of the factor, Month, on the average rainfall volumes can be ignored. The model for the monthly average rainfall volume can now be written as: V - = /i + i3  7t +  where everything is as previously defined. Using the estimated values of the parameters fi and 7, we obtain the adjusted average rainfall volume as follows: Vii  = Vij - p, -  7,-.  V represents the monthly average rainfall volume adjusted by the yearly average and the station effect. Note that we use the median estimates to estimate /z and 7 .  3.2.3) Relationship between p H and Volume  We now search for a relationship between monthly average p H levels and  rainfall volumes using the adjusted data (V,  original data (V,  pH).  pH)  instead of the  Note that if there is indeed a relationship  between p H and Volume then it would be more easily detected using the adjusted data than the original data.  The effects of each factor  (Year, Station, Month) on the volume and the p H level are different; therefore, under the influence of these different effects, the original data may  suggest some relationship which is not true.  As a first step, pH is plotted against V in Figure 3.5. from 1979 to 1982 1977  Only data  is used because only three stations were active in  and only five stations were active in 1978.  included since it has only two years of data.  Station 171b is not There are about  400  data points. The figure shows that most data points lie near the origin (i.e. and  V = 0, V.  pH = 0).  There are some unusually large values of pH  There are four large values of pH corresponding to the values 44  of V ranging from -10 to 5. There are also four large values of V corresponding to pH ~ 0. With these points deleted, the figure does not show any irregularities. To examine these unusual points further, we plot the data from each station separately in Figures 3.6.1 to 3.6.8. These figures reveal that all of the four large values of pH and the largest value of V are from station 020b. They also show that the other three large values of V are from some possible outliers in the data. Indeed, at station 013a, there is one monthly average rainfall volume of 75.20 mm  (average of two storms), while the remaining  values range from 3.15 to 34.59 mm.  At station 072a, there are two  average rainfall volumes of 69.36 and 70.76 mm, from 2.14 to 31.01 mm.  while the rest range  The plot for station 020b (Figure 3.6.2)  shows an unusual pattern. The strange behaviour of station 020b is not surprising since as mentioned previously, the monthly averages of rainfall volumes vary greatly from year to year and from month to month at this station. The plots of other stations do not show any obvious pattern for pH and V;  there are a few outliers.  Overall,  the plots do not show any obvious relationship between the adjusted rainfall volumes and pH levels. Since we did not see any relationship in the adjusted data, we are convinced that there is no relationship between the rainfall volumes and the pH levels. It appears that the  45  rainfall volume is not an important factor in explaining the p H levels.  In the light of the above preliminary examination, we conclude that Year, Station, and Month are three important factors in explaining the pH levels. The rainfall volumes do not appear to influence the p H levels. In the next section, using the individual (event) data, we examine how these factors influence the p H levels in detail. To investigate this issue, we first fit an ANOVA model to the event data to see if we can fit this data with a simpler model (i.e. fewer parameters).  3.3) A n A N O V A Model  An ANOVA model would represent the p H values corresponding to the individual events as follows:  (1) pHijki = fi + cti + Pj + 7fc + ( P)ij + («7)tfc a  +  {Pl)jk + {apl)ijk + etjkh  where pHijki  denotes the p H level of the I  th  at the i  th  Year, j  t h  storm occuring  Station and k  th  H  denotes the overall effect,  or,-  denotes the Year effect, i = 1,..,6,  Pj  denotes the Station effect, j =  7^  denotes the Month effect, k = 1,..,12, 46  1,..,9,  Month,  [ctj3)ij,(at'i)ik,[l3'i)jk (a/?7),-yfc Cijki  denote the second order interactions,  denotes the third order interaction, denotes the residual.  This is a saturated A N O V A model in which the p H level is represented as a combination of two parts. The stochastic part is represented by e and  the remaining components of the model represent the deterministic  part (where all factors are considered fixed). In the following sections, we estimate these two parts and then examine each estimate carefully. This examination is intended to reveal the stochastic structure of the data;  for example,  whether the residual variation can be reasonably  approximated by the normal distribution. We would also like to capture the deterministic component of the p H level with a simpler model (i.e. one with fewer parameters).  3.3.1) Fitting the A N O V A  model  As mentioned in chapter 1, the standard deviations of the p H levels are quite different from station to station; they also vary somewhat from year to year (Table 1.2). Therefore, it seems reasonable to allow the  p H levels to have  different variances  for different  yearxstation  combinations; we assume that all p H levels measured in the same year and  same  station have the same variance. 47  Using the data  in each  year and each station (about 50 to 100 data points), we can get an estimate of the variance (as presented in Table 1.2).  However, since we  know that monthly effects are important, we can get a better estimate by utilizing this knowledge about the structure of the data.  Given a  fixed year and a specific station, we can write the p H level of the storm observed in the k  I  th  month as:  th  pHki —  + CM,  where pit  denotes the monthly effect, k =  eki  denotes the residual.  1,..,12,  Since we assume that the residuals in each yearxstation combination, have the same variance, a ,  we get an unbiased estimator S  2  2  for cr , 2  given by ?2  (2)  H E,(pHki~pH .) k  k  2  E K -1) f c  where p~H  denotes the average of all p H levels in the k  i%k  denotes the number of observations in the k  k  th  th  month, month.  These estimates are a bit smaller than those adjusted only for the yearly average instead of the monthly average. 48  We present both estimates of  the variances in Table 2.2 where the estimates adjusted for monthly average are denoted by sd(adj).  With these estimates of the variances for the pH levels, we use a weighted least squares (WLS) method to fit the model (1). There are two ways of fitting this model. The first is to consider the estimates of the variances obtained above as the true values; the second is to carry out re-weighted least squares in which the variances are  re-estimated  after each iteration until they do not change substantially.  Here, to  remain consistent with our assumptions about the variances, we use the former and think that the above estimates are already very close to the true values. Since the primary objective of this fitting is to obtain the estimated values for the deterministic components, not to estimate the variances, the use of the re-weighted least squares method seems unnecessary.  In what follows, the saturated be  fitted.  To convince  ourselves  model and various submodels will that  the  estimated  values of the  parameters would not be much different with re-weighting, we carried out the re-weighted least squares with one simple submodel consisting of Year, Month, and Station effects only:  pHijki = (M + ai +fa+ 7fc + eijki. 49  Our previous estimates of the variances were used as initial values and the maximum likelihood method was used to re-estimate ters and the variances, assuming normal data.  the parame-  The fitting procedure  converged after three iterations (the absolute change in the estimated values of each parameter is less than .01) and the maximum of the differences between our estimated values of parameters  (i.e. iteration  1) and the resulting values from the re-weighting method is less than .02 for each parameter. This supports our belief that re-weighted least squares is unnecessary.  Using the procedure G L M in SAS with weights tu,-y defined as 1  Nowhere Sfj is the estimated variance for year i at station j , calculated using (2), we get the following results (Y=Year, M=Month, Source  df  SS  MS  Y  5  81.6  16.32  M  11  366.7  33.34  S  8  222.4  27.80  YM  55  311.6  5.77  YS  29  198.3  6.84  MS  88  172.0  1.96  50  S=Station):  YMS  319  521.8  1.64  Error  2400  2449.1  1.02  Total  2915  4323.5  Note that by using the W L S method to fit the parameters, we assume that TpHijki/Sij  (MSE)  has a variance of 1. Our estimated value of the variance  is 1.02, which is close to 1. This indicates that our estimates  for the variances used in the fitting are appropriate.  In this table, df denotes the degrees of freedom of the effect and SS is the reduction in the weighted residual sum of squares due to the source (factor), obtained by adding these sources sequentially to the model in the order listed. More precisely, let RSS(f) be the weighted sum of squares of residuals when fitting the model denoted by / :  RSS(f)  Y^w ipH -f) ,  =  i3  2  i3kl  ijkl where /  denotes the W L S fitted model.  Then each SS can be written in terms of these RSS's.  For example,  the sum of squares due to the factor, Month, SSM, is given by SS  M  = RSS(fi  + a + pj) - RSS(fi {  + or,- + p + 3  -) ). k  The corresponding MS is just the ratio of SS and df. The sums of squares due to the sources in the above table are not substantially 51  influenced by the order in which the components are sequentially added to the model. Specifically, provided that all l * order effects are entered, f  essentially the same SS's for higher order effects are obtained using different orders. the  most  The results in the table correspond to what seems  natural order.  Using  the  above results,  we examine  the  deterministic and stochastic components in the next sections.  3.3.2) D e t e r m i n i s t i c component  In the saturated model (1), we use 516 parameters to capture the deterministic component. In the hope of finding a simpler model (one with fewer parameters) which adequately represents  the deterministic  components, we examine whether we can drop some effects from the full model. To examine this question, we compare different submodels containing combinations of the effects and interactions.  It is obvious that the main effects are very important (the mean square for each of the main effects is at least 16.6,  while the largest  mean square for the second order interactions is at most 6.8). Moreover, it is reasonable to include a higher order interaction term in the model only  when the  yearx month  lower term  is also  interaction is in the  included.  For  example,  model then the year 52  if the  and month  effects must be in the model.  Guided by these criteria, we consider  only eight specific submodels and the full model. We use the log-linear convention to identify the submodels; for example, [Y,SM] represents the model containing Y , S, M , and S M . The results comparing the models to be considered are provided in the following table (p = number of parameters): Submodel  P  RSS  R  R  2  2  Ax  C -p P  [Y,M,S]  25  3652.8  .16  .15  761.8  3852.3  3702.8  [YM,S]  80  3341.2  .23 .21  505.2  3979.4  3501.2  [YS,M]  54  3454.5  .20  .19  592.5  3885.3  3562.5  [Y,MS]  113  3480.8  .20  .16  677.8  4382.3  3706.8  [YM,YS]  109  3142.9  .27  .25  335.9  4012.5  3360.9  [YM,MS]  168  3169.2  .27  .22  421.2  4509.5  3505.2  [YS,MS]  142  3282.5  .25  .20  508.5  4415.4  3566.5  [YM,YS,MS]  197  2970.9  .31  .26  251.9  4542.6  3364.9  [YMS]  516  2449.1  .43 .31  49.1  6565.7  3481.1  In this table, the values of various statistics often used as criteria for choosing the 'best' submodel are also included. These criteria, which are similar to those suggested in Weisberg (1980) for independent and identically distributed (iid) residuals, are described in what follows. 53  a) M u l t i p l e correlation  coefficient  The square of the multiple correlation coefficient R  2  is often used  as a criterion for comparing models. In this case, since we have only independent analogue  but not  of R ,  identically distributed residual data,  which is calculated  2  squares, for comparing submodels.  2  2  2  p  —  by using the weighted sums of  A computing formula for R  p-parameter model (denoted by R ) P  we use an  in a  is: R^Sp SYY'  i _  ~  where RSS  P  is the weighted residual sum of squares using the submodel with p parameters,  SYY  is the total sum of squares, given by SYY  -  = £ 0 • « WijtpHvu  fH...) , 2  where pH  denotes the overall average,  Wij is as previously defined.  Note  that  the  large values  explained by the  model.  of R  2  indicate  A disadvantage  that  more  variability is  of this method is that it  is only useful for comparing different models with a fixed number of parameters. 54  b) Adjusted R  2  To adjust  the method in (a) for the stated disadvantage,  corre-  sponding to suggestions in Weisberg (1980), one may use an adjusted version of R , 2  denoted by R  and defined by  2  where n is the total number of observations.  Note that as before large values of R  2  indicate that more variability  is explained by the model. The adjusted value can be negative.  c) M a l l o w s ' C statistic p  Let pH  ijkl  be the predicted value of the corresponding observation  obtained by the submodel. Let the mean square error (mse) be defined as mse(fH ) ijkl  = var(fH ) ijM  + [E(fH ) ijkl  -  E{pH )] . ijkl  2  The submodel which makes the weighted mean square errors, given by Wij x mse(pH j i),  as small as possible is preferable.  i: k  Note that  Weisberg (1980) suggests this criterion only for the iid case. However, since we use the weighted mean square error criterion, we in fact have an iid case where pHi i/Sij jk  is the raw data with variance equal to  55  1. The Mallows' C  statistic for a p-parameter model is defined by  p  C  where RSS  P  p  = RSS  p  + 2p - n,  is as previously defined.  Mallows (1973) suggests  that  good models will have negative, or small, values of C — p. p  We  calculate  R, 2  R, 2  C  p  with  SYY=4323.5  and n=2916,  and  present the results in the above table. A l l three methods suggest that the saturated model is the best one among those submodels considered. This suggests that all terms in the full model are needed to account for the p H levels. Note that the full model corresponds to using the monthly average as the fitted value for the p H level of any particular event.  Our  objective in investigating the  deterministic component is to  reduce the full model to a model with fewer parameters  which still  captures the structure in the deterministic component of the data. To find a simpler model, we would first have to examine whether any of the different sources (i.e. Y , S, M , YS,...) could be dropped from the full model without serious effect on its ability to capture the deterministic component. This would hopefully lead us to a submodel. Subsequently, we would investigate the possiblity of replacing the remaining parameters with fewer parameters  by using simple (possibly nonlinear) functional 56  forms to represent some of the effects which were identified as important (a periodic f o r m to represent any seasonality evident i n the estimated Y , M , and Y M effects, for example).  Using the M S due t o the sources, we c a n see if any source can be eliminated. A s noted earlier, the M S due to the m a i n effects are very important; they range f r o m 16.32 to 33.34. W i t h MS  ~ 1.02, rough  error  F tests would yield very s m a l l p-values for these m a i n effects. T h e fate of interaction terms are not so clear. T h e M S due to the y e a r x m o n t h (MSYM)  and y e a r x s t a t i o n (MSys)  interactions are considerably smaller  t h a n the M S due t o the m a i n effects (the values are 5.77 and 6.84), but  rough  F  tests would yield small p-values.  m o n t h x s t a t i o n (MSus) are much smaller t h a n  and y e a r x s t a t i o n x m o n t h MSYM  and MS s Y  (MSMS  T h e M S due to the interactions  (MSYSM)  =  1.96, MSySM  =•  1.64); intuition suggests that these interactions are not really i m p o r t a n t . A l t h o u g h the rough F tests would still yield statistical significance for these interactions, the significance i n this case may be inevitable and due to the very  large number of observations  available  (2916).  The  results using different criteria (described above) also suggest that every t e r m i n the model is important  i n explaining the p H levels (i.e. the  f u l l m o d e l is the 'best' one). Note that none of these methods 'adjust' for the large number of observations available even though the latter 57  plays an important role in these results.  To pursue this issue further, we can use the criteria proposed by Schwarz (1978) and Akaike (1973) for comparing the fit of different models. Let p be the number of estimable parameters in a model. Schwarz proposed that the model which maximizes B = log(maximum likelihood) - | log(n)  be chosen. assumption  Since the number of observations (n) is fixed, with the of independent  and normal data having known variances  (i.e. ofy = 5?), the criterion is equivalent to minimizing Bi  = RSSp -fpxlog(n).  Akaike proposed that the model which maximizes A = log(maximum likelihood) - p  be chosen. With the same assumptions, this is equivalent to minimizing Ai  = RSS  P  + 2 x p.  Note that log(n) = 7.978, so Bi — Ai + 5.978 x p in this case.  The values of Ai and Bi for the submodels are presented in the above table.  According to Akaike's criterion, the submodel [YS,YM]  58  is the best;  the last  two terms in the full model (i.e.  can be dropped. This supports our earlier suggestion that MSYSM  SM, YSM) MSSM  and  are not really important. Schwarz's criterion, however, prefers  the submodel [Y,S,M] to the rest. Nishii (1984) proves that the criteria proposed by Mallows and Akaike are asymptotically equivalent under general conditions; asymptotically, they have a positive probability of selection only for models that properly include the true model. However, the Schwarz criterion behaves somewhat differently; the model chosen by the Schwarz criterion is a 'consistent' estimator of the true model. To a certain extent, this behaviour is evident in our analysis. The three 'best' submodels chosen by the Mallows and Akaike criteria are  the  same (although their rankings are slightly different), but the Schwarz criterion  indicates  that  smaller submodels are  preferable.  Since  the  term log(n) adjusts for the large number of observations available, the Schwarz criterion may be more appropriate in this present context than those of Akaike and Mallows.  These results clearly suggest  that the  deterministic component of the p H levels can be captured by some simpler model than the saturated one.  The  model [YS,YM] chosen by Akaike's criterion is already much  simpler than the full model (109  versus 516  parameters).  However,  according to the criterion proposed by Schwarz, an even simpler model 59  can possibly be used. Nishii's results suggest that although the submodel chosen by the Schwarz criterion is an asymptotically 'consistent' estimator of the true model, the submodel chosen by the Akaike criterion has a greater likelihood of properly containing the true model. Therefore, to find the smallest possible model for the pH levels, we would need to examine the estimated values of the parameters in the model [YS,YM] to see whether they can be approximated by a smoothed function.  This smoothed function (if found) would be a  proposed model and its properties and fit to the data could be further examined. However, due to limitations of time, this interesting project must be deferred.  3.3.3) Stochastic  component  Many methods of analysis employed in environmetrics are based on assumptions of normality. Therefore, it is important to check the normality of the pH levels. The simplest way of examining this question is to use the normal probability plot. The residuals from the full model of the previous section are standardized and plotted for each station (all years together) separately.  60  Using SAS, we calculate the normal  probability by  Pr(e ) i  *- (r -l)/(n+ -),  =  l  1  i  where e,-  denotes the residual of the I  7'i  denotes the rank of e,-,  n  denotes the total number of residuals,  $  denotes the cumulative normal distribution.  th  observation,  The resulting plots for the different stations are presented in Figures 3.7.1  to 3.7.9. The plots of stations 013a,  3.7.1, 3.7.4, 3.7.9  044a, 171b are presented in  respectively. These plots do not suggest any violation  of the normality assumption.  Figure 3.7.3  represents the normal plot of data from station 043a.  The plot is very well-behaved, except for one possible outlier: pH=5.95 in November 1980 from 4.02  to 4.35).  (the rest of the p H readings in that month range The normal plot for station 048a (Figure 3.7.5)  has a similar property. December 1981 3.70  It  also has one possible outlier:  pH=6.00 in  (the rest of the pH readings in that month range from  to 4.71). The plot of station 072a (Figure 3.7.8) does not suggest  any serious violation of the normality assumption; however, there is one apparent outlier: pH=6.57 in June 1981 61  (the rest of the pH readings  in that month range from 3.86 to 4.07).  Figures 3.7.6 and 3.7.7 represent data from stations 057a and 065a respectively. The plots do not show any serious violation of the normality assumption. However, these plots have some disturbing features. Although most stations have collinear data, there are some points on both tails which are slightly off those straight lines. A more careful examination of the data from these stations is described below.  Figure 3.7.2 represents the normal plot of data from station 020b. The normal plot shows a serious violation of the normal  assumption.  Instead of having a roughly linear, a curve is obtained; the data needs to be examined more carefully.  We can investigate these 'questionable' stations further by examining the normal plots for each year in each of these stations separately. The plots (not included in this work) for stations 057a and 065a for each year separately show that except for some outliers in the data, there are no obvious violations of the normality assumption. However, the plots of station 020b suggest a serious departure from normality. The plots for 1978 to 1982 are presented in Figures 3.8.1 respectively.  to 3.8.5  Except for Figure 3.8.1, the plots clearly exhibit the  62  extreme skewness that  the  some  of the  data  1978  support  data for  apparent  These  results  indicate  distribution.  d a t a f r o m the  the  station.  normality  Figure  3.8.1  assumption  shows  except  for  of the  pH  outliers.  levels obtained at station normal  from this  that  the  020b  However,  stochastic  cannot  components  be considered as arising from a  except for  remaining stations can  a few  possible outliers,  be reasonably  approximated  the as  normal.  3.4) Conclusion  In  this  chapter,  we  have  examined  the  data  set  carefully  in  an  attempt to uncover underlying patterns which might be used to find an appropriate model for the p H levels. T h e results indicate that there is a seasonal structure in the p H levels. Specifically, the time effects decrease slowly  from  November,  January and  then  to  August,  decrease  a  increase little bit  sharply  from  in December.  September  T h e rainfall  volumes, however, do not show any obvious seasonal structures. is  no  obvious effect  station  The  of the  rainfall volumes  effects are quite important  deterministic  components  on  the  to  There  p H levels.  The  in explaining the p H levels.  of the 63  p H levels were examined  by  fitting a saturated  ANOVA  model; the importance of the individual  components in the model were also investigated.  The results suggest  that the deterministic components can be captured by a smaller model than the full model.  The  stochastic  component  was examined for the  appropriateness  normality assumptions by using normal probability plots.  The results  indicate that serious violations of the normality assumption are found in the data obtained at station 020b. However, except for a few possible outliers, the data from other stations can be reasonably approximated as normal.  64  4)  C O N C L U S I O N  In this study, we have learned that the model developed by E y n o n and  Switzer  appropriate network.  (1983) for  the  for  analyzing  data  obtained  pH from  levels, the  does  not  seem  MAP3S/PCN  to  be  montoring  However, the general approach seems to be useful in analyzing  the spatial-temporal data.  For this reason,  the approach including  the  interpolation step called kriging was completely demonstrated using the Eynon-Switzer model.  In an effort to identify a more appropriate model for the data, examined the raw data in detail. three factors, levels. at The  model, including  the  Year, M o n t h , and Station, was fitted to the individual p H  T h e results suggested  different  A full A N O V A  we  stations.  conclusion  is  The  that  the  that the residuals have different normality data  of  from  the all  residuals stations  was except  variances examined. one  can  reasonably be approximated as coming from normal distributions. T h i s is a very useful result  because  many methods of analysis employed in  65  environmetrics are based on assumptions of normality.  Different criteria such as Akaike, Schwarz, Mallows, etc, are used to identify the 'best' submodel (i.e. fewer parameters than the full model) for  capturing the deterministic component  of the data.  The results  suggested that it is possible to capture the deterministic component by a much smaller model than the full model.  It would be very interesting to examine whether the 'best' submodel can be represented  by a smoothed function.  The abilities to fit the  data and the properties of this smoothed function (if found) could be further examined. Moreover, since the variances of the residuals of the data are not homogeneous from station to station, we must have the estimates of the variances of the residuals at any locations in order to do interpolation (kriging in this case). The question of whether we would be able to estimate these variances is also interesting. However, due to limitations of time, these interesting projects must be deferred.  66  REFERENCES  Akaike,  H. (1973).  Information  theory and an extension of the  maximum likelihood principle. g"  d  formation Kiado,  Theory.  International  Symposium  (B. N . Petrov and F . Czaki, eds.).  on In-  Akademiai  Budapest.  Delhomme, J . P. (1978). Kriging in the hydrosciences. Resources,  Adv. Water  1, 251-266.  Egbert, G . D . and Lettenmaier, D . P. (1986). Stochastic Modeling of the Space-Time Water Resources  Structure of Atmospheric  Research,  Eynon,  B. P. and Switzer,  acidity.  The Canadian  Friedman, terplots. Report  22, 165-179.  P. (1983).  Journal  The variability of rainfall  of Statistics,  J . H . and Stuetzle, Department  Chemical Deposition.  W . (1982).  of Statistics,  ORION006.  67  Stanford  11, 11-24.  Smoothing University,  of scatTechnical  Journel, A . G . and Huijbregts, C h . J . (1978). Mining  Geostatistics.  London: Academic Press.  Nishii, R. (1984). Asymptotic properties of criteria for selection of variable in multiple regression.  Annals  of Statistics,  12, 758-765.  Schwarz, G . (1978). Estimating the dimension of model. Annals of Statistics,  6, 461-464.  Weisberg, S. (1980). Applied Linear Regression. New York: Wiley.  68  Table 1.1: Site location : The MAP3S/PCN monitoring network. AOS s i t e identity  Location  Latitude  Longitude  Elevation F i r s t active Years of (meters) date data  013a  Lewes, Delaware  38 46 00  75 00 00  0  020b  Illinois, Illinois  40 03 12  88 22 19  043a  Uhiteface, New York  44 23 26  044a  Ithaca, New York  048a  01--Har-78  4  212  20--Nov-7  5  73 51 34  610  11--Oct-76  6  42 24 03  76 39 12  509  26--Oct-76  6  Brookhaven, New York  40 52 00  72 53 00  25  09--Feb-78  5  057a  Oxford, Ohio  39 31 51  84 43 25  284  01--Oct-78  4  065a  Penn State,  40 47 18  77 56 47  393  22--Sept-76  6  Pennsylvania 072a  Virginia, Virginia  38 02 23  78 32 31  172  12--Dec-76  5  171b  Oakridge, Tennessee  35 57 41  84 17 14  341  07--Jan-81  2  * A. Olsen and C. Watson, "Acid Deposition System (ADS) f o r S t a t i s t i c a l Reporting", EPA-600/8-84-023, September 1984, pp. C.7-C.13.  69  Table 1.2; Summary of data. pH ADS Identity  Year  No. of rainfalls  ~ Mean  sd  Volumes(mm) ~~~7 sd(ad_)  Mean  sd  013a  79 80 81 82  52 71 64 73  4.37 4.25 4.22 4.26  .40 .35 .40 .55  .28 .30 .36 .48  24.99 12.88 13.16 13.31  29.49 13.92 12.97 15.28  020b  78 79 80 81 82  50 40 94 87 80  4.25 4.73 4.28 4.34 4.37  .55 .89 .68 .65 .52  .58 .77 .69 .57 .48  10.44 19.80 9.21 10.68 11.12  12.92 29.48 11.65 14.26 14.01  043a  77 78 79 80 81 82  83 54 50 87 99 98  4.24 4.16 4.08 4.03 4.06 4.21  .33 .31 .18 .29 .23 .39  .32 .24 .17 .26 .23 .37  12.97 14.16 17.08 9.52 10.92 8.22  13.18 12.39 14.39 7.78 16.10 10.06  044a  77 78 79 80 81 82  52 60 62 75 72 80  16 05 08 19 10 14  .28 .27 .29 .32 .23 .33  .20 .20 .27 .27 .20 .31  19.06 13.83 17.80 12.41 16.49 11.54  14.20 11.09 19.63 12.20 17.06 12.01  048a  78 79 80 81 82  48 71 64 80 69  4.07 4.10 4.14 4.18 4.23  .35 .49 .51 .45 .45  .30 .42 .44 .39 .40  14.86 12.93 12.61 11.93 15.49  15.21 18.15 12.77 13.26 18.55  057a  79 80 81 82  47 75 75 83  4.22 4.08 3.80 3.99  .23 .29 .24 .37  .20 .24 .19 .30  15.08 11.72 9.67 11.20  16.97 13.34 8.42 13.20  065a  77 78 79 80 81 82  80 70 61 40 69 86  4.06 4.04 4.20 4.14 4.19 4.23  .27 .30 .27 .61 .30 .36  .22 .25 .24 .20 .25 .33  14.93 13.00 18.57 12.46 11.64 9.46  17.84 11.03 18.57 12.61 15.24 10.85  072a  78 79 80 81 82  57 51 65 56 58  4.06 4.17 ,09 .15 ,15  .34 .30 .30 .46 .30  .28 .29 .31 .48 .29  18.00 25.30 10.85 14.98 16.01  17.47 32.42 11.50 21.79 14.74  171b  81 82  65 64  .16 .27  .23 .23  .23 .22  14.63 23.14  13.75 20.17  70  TABLE The  f i t t e d  constant  f i t t i n g  Year  013a  020b  a  82  by  043a  044a  0.0185  -0.0015  048a  057a  065a  072a  171b  0.0598  0.1375  0.1855  0.2364  c  0.0466  0.0422  0.0348  d  4.1311  3.9918  3.9618  a  -0.1100  0.2063  0.0218  -0.0477  0.0474  0.0941  b  -0.0066  0.1897  0.1631  -0.0507  0.1652  0.0257  c  -0.0064  0.0227  0.0763  0.0662  0.1069  0.0477  d  4.2678  4.1145  3.8588  3.8740  3.8083  3.9056  a  -0.1478  -0.2415  -0.0241  -0.1017  0.0742  -0.0466  -0.0045  ,0871  b  0.2606  0.3583  0.0729  0.1117  0.3195  -0.0081  0.0871  ,0807  c  0.0187  -0.0289  0.0348  0.0565  0.0337  -0.0056  0.044S  0254  d  4.2778  5.0538  3.9550  3.8905  4.0563  4.2408  4.0607  4.0381  79  81  obtained model  b  77  78  coefficients  Eynon-Switzer  Station  Parameter  80  the  2.1;  a  -0.0616  -0.1810  -0.0446  0.0366  -0.1499  0.2058  0.0586  -0.0176  b  0.1467  0.3039  0.1561  0.1855  0.1075  0.0729  0.3813  0.1559  c  0.0906  -0.0015  0.0285  0.0449  0.1371  -0.0020  0.1805  0.0665  d  4.0414  4.3846  3.9929  4.0731  3.8802  4.0784  3.9236  3.9623  a  0.1242  -0.0530  0.0252  0.0412  0.1368  0.1531  0.0945  0.0771  0.0771  b  0.2258  0.3821  0.0365  0.1830  0.1232  0.0367  0.1496  -0.0399  0.0510  c  0.0450  -0.0004  0.0553  0.0311  0.1431  0.0331  0.0447  0.0201  0.0495  d  4.1085  4.3990  3.9527  4.0137  3.9080  3.7293  4.1141  4.0722  4.0176 0.0546  a  0.1082  -0.0193  -0.1840  -0.0383  -0.1048  -0.0392  1375  0.0399  b  0.2331  0.0575  0.0494  0.2062  0.2176  0.1322  1694  0.1471  0.0882  c  0.0855  0.0161  0.0690  0.1026  0.1375  0.0266  0516  0.0259  0.0087  d  4.6606  4.3374  4.0859  3.9298  3.9194  3.9313  1205  4.0602  4.2359  71  ANOVA and median decompositions on a,b,c,  Table 2.2t (1)  (a)  Ads  The residuals and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s a(x) using AVERAGE (part a) and MEDIAN (part b ) .  T h i s uses AVERAGE f o r d e c o m p o s i t i o n .  Id. Year  R e s l d . 77 78 79 80 81 82 Col. e f f .  (b) Ads Res.  and d.  013a  020b  043a  044a  -0 01 -0 0.04 0 16 -0 -0.08 -0.07 0 05 -0 -0.05 -0.06 -0 02 0 0.07 -0.00 -0 02 -0 0. 1 1 0.09 -0 18 -0  048a  057a  02 02 -0 08 02 0 17 -0 06 -0 11 0 00 0 11 0 02 -0 08 -0  0.00 -0. 10 -0. 00 -0 01 -0 02  04 16 04 10  0 07  065a 0 -0 0 0 -0 0  072a  171b  00 04 0 02 04 -0 03 05 -0 02 18 0 01 -0 05 11 0 03 0 01  0 03  0 02  Effect 0.02 0.05 -0.07 -0.02 0.05 -0.01  O v e r a l l average 0 05 0.00  T h i s uses MEDIAN f o r d e c o m p o s i t i o n . Id. Year 77 78 79 80 81 82  Col . e f f .  013a  020b  -0 -0 0 0  0 -0 -0 0 0  10 05 05 1 1  043a  044a  048a  057a  0.01 -0 00 06 0. 18 0 00 0 07 0.05 -0 02 0 22 -0 05 -0.01 0 08 -0 04 0 -0.02 0 01 0 17 0 10 -0. 16 -0 00 -0 00 -0  0 02 -0 10 -0.01  -0 02 -0 08  72  04 17 04 08  0 06  065a -0 -0 0 0 -0 0  072a  171b  01 04 0 03 01 -0 05 04 -0 01 20 0 -0 04 10 0 03 0 04  0 05  0 03  Effect 0.02 0.03 -0.07 -0.03 0.05 -0.02  O v e r a l l median 0 04 -0.00  (2)  (a)  The residuals and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s b(x) using AVERAGE (part a) and MEDIAN (part b ) .  T h i s uses AVERAGE f o r decomposition.  Ads Id. Year  013a  020b  R e s l d . 77 78 79 80 81 82  0 03 -0 1 1 o 03 0 02  -0 00 -0 01 -0 03 0 03 0 02 -0. 16 0 15 0 06 -0 13 0. 12 -0 05 -0 08 0 16 -0 08 -0 13 -0 00 0.04 0 01 -0 03 -O 08 -0 03 0 14 0 04 o. 18 -0 05 0 03 -0 OO -0 00 -0 03 -0 10 -0. 16 -0 06 0 04 0 08 0 08 -O 03 0 07  Col. e f f .  0 07  (b)  0.07 -0 04  044a  048a  057a  0 03 -0 00 -0 09  065a  072a  171b  0 OO 0 02  Effect 0.04 -0. 07 0. 02 0. 04 -O. 02 -0. OO  Overal1 a\ 0 05 -0 07 -0 08 0. 15  T h i s uses MEDIAN f o r d e c o m p o s i t i o n .  Ads Id. Year 77 78 79 80 81 8.2 Col.  043a  eff.  013a  0 -0 0 -0  07 10 01 01  0.09  020b  -0 0 0 0 -0  043a  044a  048a  057a  065a  072a  171b  -0.01 -0 04 0 01 0 03 -0 06 0. 13 0 02 -0 15 0.01 -0 03 0 22 -0 02 -0 05 0 0.04 -0 01 -0 05 0 00 0 19 0 01 1 1 -0.05 0 02 0 -0 00 -0 01 -0 15 -0 01 24 -0.06 0 01 0 07 0 07 -0 02 0 01 0 01 25 1 1  0.15-0.04  0.04  O  73  -0.08  Effect 0 -0 -0 0 -0 0  05 04 03 02 01 01  Overal1 median 0.04 -0.01 -0.07 | 0.14  The r e s i d u a l s and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s c(x) using AVERAGE (part a) and MEDIAN (part b ) .  (3)  (a)  This  Ads I d . Year Resld.  77 78 79 80 81 82  Col. e f f .  (b)  This  Ads  Id. Year  Res.  77 78 79 80 81 82  Col . e f f .  uses  AVERAGE  013a  -0 0 -0 0  020b  f o r decomposition.  043a  044a  048a  0 01 -0 01 -0.01 -0 02 0 01 01 0 02 0 02 0.00 01 -0.02 -0 03 -0 0 3 01 0.01 0 01 -0 0 3 02 0 03 0.01 0 02  0 01  uses  -0.05  MEDIAN  -0 01  -0 -0 0 0 0  065a  072a  171b  -0.03 04 0.03 0 01 04 0 01 0 01 -0.01 0.08 01 -0 0 3 0 01 0. 02 04 0 02 -0.03 -0 01 02 0 0 0 -0.03 -0 02 - 0 . 03  0 05  0 01  057a  -0 04  0.03 -0 01  Effect  -0.01 0.00 -0.03 0.02 -0.00 0.01  Overall average 0.05 -0. 02  f o r decomposition. 048a  013a  020b  043a  -0.03 0.02 -0.02 0.02  -0.01 -0.00 0 0.01 0.02  -0.00 -0.01 -0.04 0.01 0.02 0.00 -0.03 -0.02 0.01 -0.02 0.04 0.01  0.02  -0.06  0.00  044 a  057a  -0.07 -0.08 -0.01 -0.03 0 0.02 0.02 0.01 0.00  0.08  0.01  74  -0.03  065a  072a  -0.01 0.04 0 0.01 0.01 0.02 0. 12 -0.01 -0.02 -0.01 -0.02  0.01  -0.01  171b  0.02 -0.02  Effect  -0.01 0.01 -0.02 0.01 -0.00 0.00  O v e r a l l median 0.05 -0.02  (4)  The r e s i d u a l s and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s d(x) using AVERAGE (part a) and MEDIAN (part b ) .  (a)  T h i s uses AVERAGE f o r d e c o m p o s i t i o n .  Ads Id. Year R e s i d . 77 78 79 80 81 82 Col. e f f .  (b)  013a  0 -0 0 -O  02 06 02 07  0 06  020b  -0 0 -0 -0 -0  043a  044a  0 13 0 13 0 17 -0 43 -0 21 -0 08 -0 02 0 06 -0 06 0 16 0 04 -0  048a  057a  065a  07 0 -0 01 0.04 0. 1 1 -0 20 -0.00 14 -0.02 0. 1 1 -0 08 0.01 -0.24 0 04 -0.02 -0.07 0  072a  00 10 -0 07 -0 05 -0 15 0 1 1 0  171b  01 10 02 10 -0 08 04 0 10  0 42 -0 03 -0 11 -0. 14 -0.07 -0 07 -0 06  Effect -0.04 -0.09 0. 13 -0.02 -0.03 0.01  O v e r a l l average 4.07 0 06  T h i s uses MEDIAN f o r d e c o m p o s i t i o n .  Ads 1d. Year 77 78 79 80 81 82 Col. e f f .  013a  0 -0 0 -0  16 03 03 06  0.06  020b  043a  0 0 -0 0 -0  0 0 -0 -0 -0 0  62 01 00 10  0.38  044a  048a  14 0.02 22 -0.01 0 10 -0.15 0 02 0.07 -0 07 0.01 0 02 -0. 1 1 -0  09 11 03  057a  0 0 -0 03 -0  22 10 25 10  065a -0 -0 0 -0 0 0  02 07 02 08 11 07  072a  0 -0 -0 0 0  02 02 05 05 -0 09 0 09  0.00 -0.02 -0.11 -0.03 -0.01 -0.00  75  171b  Effect -0 -0 0 -O 0 0  03 13 04 OO 00 04  Overal1 median 0.09 | 4.02  Table 2.3: The ratio of residual SS's of the corrected and uncorrected series of p H levels.  Station Year  013a  020b  043a  044a  .86  .63  .99  .55  .52  .56  1977 1978  048a  057a  065a  072a  171  .50 .52  .79  1979  .72  .86  .73  .48  .68  .98  .65  .74  1980  .70  .90  .82  .73  .75  .72  .57  .68  1981  .75  .85  .73  .61  .98  .73  .74  .90  .07  19S2  .70  .98  .77  .55  .53  .89  .76  .81  .86  Average  .73  .92  .74  .59  .70  .71  .62  .80  .77  76  Table 2.4: Sums of squares errors for different models.  SS_!(_ j) RSS_  Rss;;„  Station  Year  043a  1977  8.97  8.23  09.0  7.71  06.7  1978  5.11  2.95  73.2  2.80  05.4  1979  1.G7  1.59  05.0  1.22  30.3  1980  7.27  G.12  18.8  5.95  02.9  072a  1982  5.28  4.59  15.0  4.28  07.2  OGSa  1978  4.25  3.84  10.7  3.30  16.4  044a  1981  3.82  2.86  33.6  2.34  22.2  013a  1980  8.54  7.80  08.5  5.71  3G.G  *  ia>  SS_;(_) denotes the sum of squares error corresponding to fitting a reduced model which involves only term d (i.e. pH = d).  Similarly, SSE(_ J,) corresponds to the model i0I  . ^  =  O S m  2nt 3W  ,  + 6 c  2rrt  ° 365 3  +  ' /  Note : the full model is 2?r£ P  *  '**  H  =  fl8in  denotes (SS_;(j) - S S £ ( _  i0>l  365  2nt +  6 C  °  S  3C5  j ) / SS£( _ ) rf(  it  e•I  +  / 0 g l 0  (in percent).  denotes (SS^JJ,.,!) - SSK(/««)) / ^E{/uii)  77  l-exp(-c/)  («n percent).  +  *  Table 2.5: The results of fitting the temporal variogram Vp(w) to the data, where  V p [ w )  =  i+1^/1.1-''  Station  Year  Number of data pts.  9i  se(ffi)  92  se(c7  013a  1070  1326  .116  .004  2.93  3.54  013a  1980  2085  .085  .002  2.04  2.37  072a  1982  1G53  .079  .003  2.85  3.72  171b  1982  2016  .0G0  .002  1.54  2.C0  78  Table 2 . 6 :  Temporal variogram estimated with data of each year and each station using  where tj and tj> belong to the same year, and i denotes station i .  Station Year  '  013a  020b  043a  044a  .094  .050  .320  .056  .040  .100  1977 1978  048a  057a  005a  072a  171b  .038 .053  .094  1979  .115  .805  .026  .051  .108  .057  .051  .008  1980  .085  .430  .071  .075  .208  .080  .242  .003  1981  .122  .364  .042  .033  .153  .043  .066  .204  .030  1982  .234  .275  .117  .009  .125  .122  .098  .078  .000  79  Table 2.7 The r e s i d u a l s and e f f e c t s obtained by decomposing the estimated c o e f f i c i e n t s gi using AVERAGE (part a) and MEDIAN (part b ) .  (a)  T h i s uses AVERAGE f o r d e c o m p o s i t i o n .  Ads i d . Year  013a  020b  R e s i d . 77 78 79 80 81 82  -0.06 -0.08 -0.00 0. 10  0. 10 0.07 0.02 -0.09 0.01 0.01 -0.03 -0.02 0.01 0.33 -0.08 -0.04 -0.02 -0.05 -0.08 -0-07 -0.03 -0.02 -0.00 0.03 -0.02 0. 13 -0.06 -0.06 -0.01 -0.01 0.02 -0.02 -0.01 0. 12 -0. 16 0.05 0.02 -0.02 0.05 0.01 -0.02  Col. e f f .  0.01  (b)  044a  0.31 -0.06 -0.08  048a  057a  065a  0.02 -0.06 -0.04  072a  171b  0.00 0.01  Effect -0.07 -0.02 0.04 0.02 -0.01 -0.00  O v e r a l l average -0.03 -0.08 0. 13  T h i s uses MEDIAN f o r d e c o m p o s i t i o n .  Ads i d . Year Res.  043a  77 78 79 80 81 82  Col . e f f .  013a  020b  -0 -0 0 O  -0 0 0 0 -0  00 05 00 06  0 05  043a  044a  048a  0 06 0.02 05 -0 00 -0.01 -0 45 -0 02 0.01 0 05 0 00 0.01 0 -0 01 -0.01 0 14 0 02 -0.03 -O  0 30 -0 02 -0.02  057a  065a  -o oo  171b  06 -0 01 0 01 02 -0 00 0 00 0 04 0 00 0 17 -0 03 -0 02 0 01 0 13 0 01 08 0 01 -0 01 -o 05 -O Ol  0 09 -0 00 -0 01  80  072a  Effect -0.02 0.00 -0.01 0.01 -0.00 0.05  Overal1 median 0.07 0 01 -0 04  Table 2.8: The estimated value  of Ct i n year i i s  <*(*.) =  '  „  m  Year Station  1979  1980  1981  1982  01 3a 020b 043a 044a 048a 057a 065a 072a 171b  4. 28 4. 84 4. 01 4. 00 4 .09 4 .16 4 .14 4 .06  4. 1 1 4. 28 3 94 4 .05 4 .03 3 .95 4 .13 3 .98  4. 14 4. 33 4. 00 4. 02 4. 13 3. 72 4. 14 4. 05 4. 07  4. 1 4 4. 27 4. 12 4. 03 4. 10 3. 89 4. 13 3. 99 4. 07  81  T a b l e 3.1; S t a t i o n e f f e c t s , u s i n g ANOVA and Median decompositions  (pH l e v e l s ) .  Year Station  Anova 013a 020b 043a 044a 04Ba 057a 065a 072a 171b  Median  .09 -.01  .10 .00  -.08  -.07  Anova  Median  . 11 . 11 -.06 -.04  . 15 . 14 -.04 -.02  - .05 -.06  -.08 .02  Anova  Median .22 .31 -.17 -.17 .03 .02 -.02 -.09  .09 .64 -.19 -.19 -.13 -.04 -.05 -.09  Anova .07 .24 -.12 .00 .02 -.09 -.04 -.08  1982  1981  1980  1979  1978  1977  Median  Anova  Median  *P°va  Median  .06 .12 -.12 .07 .06 -.07 -.12 -.06  .08 .25 -.07 -.03 .03 -.30 .03 -.01 .02  .06 . 13 -.11 .07 .00 -.33 . 13 -.05 .04  .08 . 16 -.01 -.06 .02 -.22 .01 -.06 .06  . IS . 16 .00 -.06 .03 -.25 -.02 -.10 .04  T a b l e 3.2; Monthly e f f e c t s , u s i n g ANOVA and Median decompositions  (pH l e v e l s ) .  Year 1977 Anova January February March Aprl 1 May June July August September October November December  .02 -. 13 .26 .07 -.22 -. 11 -.31 -. 13 -.03 . 13 .20 .28  Median .09 -.22 .20 .09 -.25 -.12 -.34 -.10 -.06 .11 .23 .29  1979  1978 Anova .38 -.10 . 11 .01 . 10 -.11 -.07 -.08 -.12 -.01 -.08 . 11  Median .40 -.19 . 12 -.01 . 15 -. 17 -. 11 -.02 -. 10 .05 . 10 .09  82  Anova .02 -.07 -.09 -.02 -.04 -. 18 -.29 -.22 . 18 .11 .41 . 18  Median  -  18 02 06 02 06 11 22 12 26 03 19 08  Anova .24 -.05 . 12 .03 -.18 -. 14 -.06 -.32 -.06 .28 . 12 .00  Median  -  -  1982  19B1  1980  20 03 14 01 04 11 12 24 08 18 19 01  Anova  Median  Anova  Median  -.09 . 18 . 14 -.03 -.05 -.08 -.09 -. 17 -.05 -.07 . 15 .02  _  .05 .01 -.07 . 15 -.16 -.01 -. 18 -.21 -.04 .08 . 17 .22  .06 .01 -.05 .09 -.18 -.02 -.25 -.26 - . 14 .01 .09 . 17  -  11 .18 10 01 07 13 13 17 09 07 04 05  Table 3.3: S t a t i o n e f f e c t s on r a i n f a l l volumes, u s i n g ANOVA and Median decompositions. Year  Anova  Median  -2.89 6.61  -2.46 3.00  -0.54  0.00  Anova  Median  -3.88 0.55 -0.51 1.05  -4.51 0.48 0.37 -0.37  -1.28 3.84  -2.03 5.63  1982  1981  1980  1979  1978  1977  Anova  Median  Anova  Median  Anova  11.39 0.40 -2.48 -2.20 -8.01 -3;44 -2.09 6.45  11.68 -9.25 0.38 -0.05 -5.69 -2.66 0.05 4 . 18  2.29 -2.80 -0.54 1.29 1.09 -0.27 -0.52 -0.58  5 .09 - 3 .07 - 0 .36 1 12 2 .95 - 0 .74 - 0 .43 0 37  0 -2 -1 4 -1 -2 -1 2 2  31 61 24 81 09 94 86 31 33  Median  Anova  1.78 -2.26 -1 . 9 6 6.31 0.00 -1.29 -1 . 9 7 1 .42 3.73  0 -2 -5 -1 1 -2 -4 2 10  Median  57 02 to 64 99 28 05 17 53  3 09 - 0 11 31 0 00 3 93 - 2 31 - 2 40 1 89 11 2 0  -4  T a b l e 3.4: Monthly e f f e c t s on r a i n f a l l volumes, u s i n g ANOVA and Median decompositions.  Year 1977 Anova January February March Apr 11 May •June July August September October November December  -9.21 -9.08 4.41 2.74 -10.09 2.46 -5.30 -0.50 7.25 9.93 4.04 3.34  Median -10.62 -7.20 2.75 1.72 -7.24 - 1.20 -2.63 -3.11 6.69 10.50 3.25 4.58  1979  1978 Anova  Median  10.16 9.67 -2.63 -2.10 -3.47 -0.67 -3.57 -1.86 8.98 1 2 . S O -4.60 -5.57 -0.34 -0.98 0.57 1.43 -1.71 - 0 . 2 0 -2.30 1-69 -4.04 -5.27 6.35 5.14  83  Anova 7.47 9.38 -1 .93 -0.78 0.25 -1.21 -5.11 3.24 3.64 -5.60 1 .98 -10.15  Median 10.66 1.25 1.94 1.54 -0.24 0.46 -3.68 -3.03 1 .64 -0.45 1 . 14 -9.95  1980 Anova -1.93 -3.70 3.57 3.34 3.97 0.31 -1.96 2.31 -2.09 2.72 -1.06 -5.95  Median -3.69 -4.32 3.34 2.38 4.67 0.06 -2.31 2.64 -1.81 1.77 -0.64 -6.54  1981  1982  Anova  Median  Anova  Median  -8.95 2.30 -4.81 -1.08 -0.84 -0.41 4.77 2.65 4.48 5.94 -2.09 -1.98  -7.99 3.61 -4.26 -1.03 -0.54 -1.66 1 0 . 10 6.26 4.61 5.30 - 1 . 18 0.51  3 . 14 2.44 -0.62 -1.66 -1.94 1.41 1.38 0.45 -1.63 -3.01 2 . 13 -1.93  1.80 1.11 -1.70 -2.04 -2.70 1.64 1.86 1 .60 -0.98 -3.41 0.75 -3.67  (a) S t a t i o n 020b ( I l l i n o i s . I l l i n o i s )  (c) S t a t i o n 048a (Brookhaven,NY)  Figure 1.1: F i e l d  pH.  84  (b) S t a t i o n 044a  (Ithaca,NY)  (d) S t a t i o n 065a (Penn State,PA)  «  II  x >  *  K K II  «  «  «  » x  II  XX X X X *  14 0  109.)  ours  (a) S t a t i o n 020b  IB? 9  IXIO  i  1  36 5  719.0  « II II  V  * x*  11 0  II  X  l« 0  19.0  Oflrs  197.9  ,  HID  741 '.  ixio' i  (b) S t a t i o n 044a (Ithaca,NY)  (Illinois,Illinois)  2*  X  »x x xV XX  *  x  »  1 1  " XX  "if/ "A  *  H*  X X  XX X  X  "V  X X X X „  XX  -  X X*  X  XXX  X x  x*  «x*  xV  „»  X I , *«b**x/x  » x * xx\ » \ «" xx , / x „ "x  X  .XX  X*  *  'X  '  I  x " * V * » X »x  *« 99.9  110  14 0  Oflrs  117.3  IXIO  719.0 1  « 0  I  DATS  197.9  IXIO  719.1  1  I  7)3 9  (d) S t a t i o n 065a (Penn State,PA)  (c) S t a t i o n 048a (Brookhaven,NY) Figure 1.2; R a i n f a l l volumes. 85  F i g u r e 2.1 : The pH m o n i t o r i n g networks.  86  (b) 1978  Figure  2.2; Phase and amplitude of seasonal pH v a r i a t i o n : (a) 1977; (b) 1973; (c) 1979; (d) 1980; (e) 1981; (f) 1982. The d i r e c t i o n of the ray f o r each station indicates the time of the year of lowest pH. The length of the ray indicates the amplitude of the seasonal v a r i a t i o n .  87  (c) 1979  (d)  88  1980  89  0_ UJ  a  o  ^1  -no  josl  I^GTO  1B2.S  TIME LAG  Figure  m~0^  255.5  292~0  J 2 B . 5 3 6 5 . 0  (day)  2.3: The estimates of the unsmoothed temporal variogram obtained f o r data of station 072a, i n 1982.  90  (0 0. 1 BO. O S . 0 4  ft)  - 0 . 02 -0. 05 0. 02  - 0 . 05  0. 24 0  - 0 . 06 - 0 . Ofl- 0 9  ftj  1  C . 19  - 0 . 01  0. 24  o  0. 44  - 0 . 03  -0.08  - 0 . 05  0 . 21  0. 02  .  -0. 02  -0.  08 - 0 . 08  0 . 18  - 0 . 06  t  1  - 0 . 08  - 0 . 09  0. 28  -O.-QB 08  - 0 . 05  10 1  - 0 . 01  -0. 08  <D  1  - 0 . 08  O  1  c  |  i  l  2  4  6  1 8  i  l  !  10  12  14  16  long, d i f f  Figure 2.4 : The estimates o f the unsmoothed s p a t i a l variogram for different s t a t i o n p a i r s . LONGITUDE and LATITUDE ( i n degree) denote the d i f f e r e n c e i n longitude and i n l a t i t u d e , r e s p e c t i v e l y , between a s t a t i o n p a i r . A l l years of data combined.  o. ea. op. oi o.  01  0 . 01  o. 01 -o. 0 5  0.  0 . O P - 0 -o.oi 6 - 0 . 0- 20 . 0 2  0. 0 3  o.oi  -o.oi  04  0 . 01  o. 06 --0. 0 5 - • . 0 5  -0. 03  I  - 0 . 05 - 0 . 0G -0. 02  CD I  -0.0E04  -o.  0. 03  04  GQ I  - 0 .  o I  0  J  I  I  1  2  4  6  8  0 4  I  I  L  10  12  14  long. d l F f  F i g u r e 2.5 : Same as above, data from S t a t i o n  91  020b removed.  16  -95  -90  -85  -80  longltude (a) Year 1979.  longitude (b) Year 1980.  92  -75  -70  -65  -95  -90  -85  -80  -75  -70  longitude)  (c) Year 1981.  o in  oo  ID  9  U  CD  m to  «»  m  -95  -90  (d) Year 1982.  -85  -80  -75  longltudo  Figure 2.6 ; Contour maps for pH. '*': denotes the location of station.  93  -70  -65  (1)ANOVA decomposition  2 8 2 1 0  8 9  9  2 1  9 0 2 1  013a  020b  i  i  043a  044a  1  048a STATION  1  057a  065a  072a  171b  (2) Median decomposition  9 2  8 7 9 0 9  9  8  3 0  9 2 1  013a  043a  044a  048a  057a  065a  072a  171b  STATION  F i g u r e 3.1;  Station effects  (pH l e v e l s ) .  Legend; 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982.  94  8 0 2 7 2 §  8  9  7  — i —  Jan.  —I Feb.  1  Mar.  1—  Apr.  — I — May  2  8 7 1  9 June  July  Aug.  MONTH  (1) ANOVA d e c o m p o s i t i o n Legend:  7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2  UJ  in  2 2 7  Jan.  Feb.  Mar.  Apr.  May  f 8  June MONTH  (2) Median d e c o m p o s i t i o n F i g u r e 3.2: Monthly e f f e c t s  (pH l e v e l s ) .  95  July  Aug.  9  2  1 8 0 ?  9  8  2  2 8  7  0 8  9  (1) ANOVA decomposition , 013a  i 020b  —  i 043a  1 044a  1 048a  i 057a  " 065a  1  1  072a  171b  STATION  Legend: 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982  0 2 0  <->o  2  . 2  I  0  7  9  ° 1  ?  h  8  ?  0  3  (2) Median decomposition 1 013a  1 020b  1 043a  1 044a  1 048a STATION  Figure 3.3: Station e f f e c t s  96  1 057a  1 065a  ( r a i n f a l l volumes)  i 072a  i 171b  8  7  9  8 8 7 0  8 0  Jan.  Feb.  1 9  7 2  2  9 1 2  2 9  7  8  0 1  0  8 1  T Mar.  T Apr.  T" May  ^  ^  BepT  Oct.  Dec.  Nov.  (1) ANOVA d e c o m p o s i t i o n Legend: 7=1977, 8=1978, 9=1979, 0=1980, 1=1981, 2=1982 M 0 N T H  9 8 1  I 9  0.  1  0  1  Jan.  1  Feb.  2  3  R  S •  1  Mar.  9  1  Q  2  3  g  8  7  1  Apr.  8  9  8  7  • 9  J •  i  9  1  1  1  1  May  June  July  Aug.  z n  ? 2  1  Sep.  1  Oct.  l  2 0  1  i  Nov.  Dec.  (2) Median d e c o m p o s i t i o n  ( r a i n f a l l volumes).  97  Q  o  ?  M 0 N T H  F i g u r e 3.4: Monthly e f f e c t s  g  r-i"  Figure  3.5:  Scatter' p l o t of a d j u s t e d volumes and a d j u s t e d pH l e v e l s , a l l data combined.  0-  CO  u  * ***  xl*  xx*"  x  X  X '  X  X  ".* x 1x ,»x*." x. x  1  P-I  a  CO  o .  i—  -30  -20  -10  0  10  I—  30  20  VOLUME ADJ  (mm)  40  —r* 50  — r 60  -1 70  (1) Station 013a  >  •  a °  (X  o'  >  -30  -20  >  10  -10  20  40  30  50  60  70  VOLUME ADJ (mn)  (2) Station 020b  I a.  •  *  >.  11  1 , -30  -20  1 -10  1—• 0  1  "  10 20 VOLUME ADJ (mm)  f  40  1 50  1 60  30  Figure 3.6: Scatter plots of adjusted volumes and adjusted pH l e v e l s f o r each s t a t i o n separately. 99  1 70  03  (3) Station 043a  O "  o*  o CE  x x x  o  •-  ,  S X  0_  xx X* x »x  x  x  x  x  XX***"**  „x  '  - X  x  x  * " x  x  O .  O . I  -30  -20  10  -10  20  VOLUME ADJ (mm)  (4) Station 044a  tn, cc  #  '  X Xy  * « x  K x *« **  y X  X  o-l  CO - I —  T"  -30  -10  -20  0  10  20  VOLUME ADJ (mm)  100  (5) Station 048a  XX  X X X  Q  X!  X  o  o_  X  X"  <„ w X  x  X  x x  IJ  „ "  x  X  K  P-I  oo o .  1  —1  -30  -20  -10  10  20  I  30  VOLUME ADJ (mm)  (6) Station 057a  x X X  cr  o '  X * X X  xx  x  Q-l  CO o .  -30  -20  -10  "T 0  -1—  10  20  VOLUME ADJ (mm)  101  30  (7) S t a t i o n 065a  X M  * X  2-  if  X X  X  cr o "  » • »  «  x  wx  "* M  '  „.  * X  X  "  x «x  It  *  *  *  X X  „  «  o-l  CO  o.  -30  1— -20  -1—  10  -10  20  —r 30  VOLUME ADJ (mm)  (8) S t a t i o n 072a  o"  Q CX X  xx.  x xx  X X x  X X  ix *  o.  CO  o.  I  -30  -20  10  -10  20  VOLUME ADJ (mm)  102  30  40  PLOT  OF  RESIORNK'RESID  LEGEND:  *  •  1 OBS.  B  -  7  OBS.  ETC.  A  AB BAA ACS F BF GA  BJ GG IE EL GOE  CH* BL FK MB GCE CHB HE AGC CF DAE BF AE E  -0.8  0.3 RESIO  (5) S t a t i o n 048a  AA CAA  I.S R A N K  1.0  0 R  0.S  R I A B L E  AA BFA A fC  CG GO  CHC ACF8 CGF -ICB FF BFF  0.0  8KB FOA  -0.3  R E S I 0  -1.0  BE ADA  -1.5  0.50 RESIO  (6) S t a t i o n 057a  105  DCB E  A A  A  »  PLOT OF RESIDRNK'RESIO  LEGEND: * • 1 OBS. B • 2 OBS, ETC.  A C AAA AA B ACB CBA OA CCC BFA DE  a  BDF G  AK DI AGS GO  f  BDB DA CAAA BA  3.0  -2.5  -2.0  -I.S  -1.0  -O.S  0.0  0.5  1.0  I.S  2.0  I.S  3.0  3.5  4.0  A.S  RES1D  (1) Station 013a  BCC ABBD GO AFE CK JC HO GGB HO GI  N  K  1.0  F 0  R  O.S  B  0.0  L  -0.5  E  -1.0  s  IK FF EH KC BJC H EH DCD 00 CF ADB CAA BC  I 0  -I.S  -2.0  -2.S  -2.0  -I.S  -1.0  -0.5  0.0  0 5  1.5  2.0  2.5  3.0  3.5  AO  (2) Station 020b Figure 3.7; Normal plots for each station ( a l l years)  103  separately.  4.5  PLOT OF RESIORNK'RESID  LEGENO: * • 1 OBS, B • 2 OBS. ETC.  I  F • 0  R V  0  •  1  FHL AILE  DOE AFGF EFGO GPO HOI IOI HSJ — -CRO JPK OPC  MJ  L E R E S I  BEE F B  OOK HO  AAAAA AB B A  -2 •  0  -3 •  0.6  (3)  Station  1 8  RESIO  043a  B A B  OKOO KOA DOOC CI I ID  DOKF  EIF1B  AN I CEDFN BFBIE AEBDF CCFAC BOAA ABC  E -a • s I  - J . 5  I  - 2 . 0  -1.5  - 1 . 0  -0.5  0.0  RESID  (4)  Station  044a  104  BBC BF FFCAO  A  AA  PLOT  OF R E S I O R N K ' R E S I O  LEGEND:  A  •  1 OBS. B • 2  OBS.E T C .  I  4 •  AAC CBBC CDOC ECHO FOE I ILF FKKB HFHG HRG BFCIFE JHO  i  B L E R E S I D  OP FHIA EGFB DJ  I  -1 • BD AAAAB  -2 • I  -3 •  I  -4.0 " o ! 9 " " " i . o " " - » . 5 " " - 2 . o "  - I . S -< 0  -0.5  0.0  0.5  1.0  15  2.0  2.5  3.0  3.5  I.S  2.0  2.5  3.0  3.5  4.0  4.5  RESIO  (7)  S t a t i o n 065a  a  A  ACS BC AF BE FC AED DFA AFD DI FDC KD  EL GE DEC GCB  BBA BBS  BCF AGB CE AEB B  AC ABA B AA AA A  -3.0  -2.5  (8)  -2.0  Station  -1.5  -1.0  -0.5  0.0  072a  106  0.5  1.0  PLOT OF RESIDRNX'RESID  LEGENO: * • 1 OBS, B • 2 OBS. ETC.  R N  1.0  c c AC D CC  V » R I A 8 L E R E S I  0.0 AF 8 CA  -0.5  BC CB AC BAB  -1.0  0 -1.5  A  I  .  .  0 * 0 - a ! s - 2 * 0 - V . l "  • -1.0  • .-0.5  0.0 RESIO  (9) S t a t i o n 171b  107  0.5  1.0  • I.S  •  * 2.0  " 2.5  3.0  PLOT OF RESIDRNK'RESIO  *  L E R E S I 0  LEGEND: 4 • 1 OBS, B • 2 OBS. ETC.  0.0 •-  -0.5  -1.0 •  -i.a  0.6  -I.J  RESIO  (1) 1978  F 0 R  0.5  A R I  0.0 •  L E  -0.5 •  R E S I 0  -1.0 •  AA A  0.4 RESID  (2) 1979 Figure 3.8: Normal p l o t s f o r each year separately at s t a t i o n 020b.  108  PLOT OF RESIDRNK'RESID  I  LEGEND: A • 1 OBS. B • 2 OBS. ETC.  2.S •  R A N F 0 R  B L E  AA A A BA  1.0  0.9 B CA  -0.5  ABA BA BAA  R -t.O E S I 0 -I.S  F BC CA AC  BA A  -3.0 •  I  (3) 1980  N K  1 .0  F 0 R  0.5  AB BA C BA BC DA CA  V  A  0.0  B L E  -0.5  R E S 1 0  -1.0  -1.5  -I  0 RESID  (4) 1981  109  PLOT OF RESIDRNK'RESID  F 0 R  LEGEND: A • 1 OBS, B • 2 OBS. ETC.  A AB  A  0.9 AE C A  BB BA R E S I D  -1.0  -1.5  -3.0 •  I 1.5 RESID  (5) 1982  110  

Cite

Citation Scheme:

    

Usage Statistics

Country Views Downloads
China 19 0
Japan 8 0
United States 4 1
United Kingdom 1 0
City Views Downloads
Beijing 19 0
Tokyo 8 0
Ashburn 3 0
Nairn 1 0
Mountain View 1 0

{[{ mDataHeader[type] }]} {[{ month[type] }]} {[{ tData[type] }]}
Download Stats

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0096703/manifest

Comment

Related Items