STATISTICAL MODELLING OF SEDIMENT CONCENTRATION by MAI PHUONG THOMPSON B.Sc, The University of British Columbia, 1984 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF T H E REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in T H E FACULTY OF GRADUATE STUDIES (The Department of Statistics) We accept this thesis as conforming to the required standard T H E UNIVERSITY OF BRITISH COLUMBIA August, 1987 ©Mai Phuong Thompson, 1987 In presenting degree this at the thesis in University of freely available for reference copying of department publication this or of partial fulfilment of British I agree and study. by this his or her The University of British Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 DE-6(3/81) that the representatives. may be It thesis for financial gain shall not Sfelh'shcS requirements I further agree thesis for scholarly purposes permission. Department of Columbia, the is an advanced Library shall make it that permission for extensive granted head by the understood be for that allowed without of my copying or my written ABSTRACT One technique that is commonly used to replace the costly daily sampling of sediment concentration in assessing sediment discharge is the "rating curve" technique. This technique relies on the form of the relationship between sediment concentration and water discharge to estimate long-term sediment loads. In this study, a regression/time-series approach to modelling the relationship between sediment concentration and water discharge is developed. The model comprises of a linear regression of the natural logarithm of sediment concentration on the natural logarithm of water discharge and an autoregressive time-series of order one or two for the errors of the regression equation. The main inferences from the model are the estimation of annual sediment loads and the calculation of their standard errors. Bias correction factors for the bias resulted from the inverse transformation of the natural logarithm of sediment concentration are studied. The accuracy of the load estimates is checked by comparing them to figures published by Water Survey of Canada. 11 TABLE OF CONTENTS ABSTRACT ii TABLE OF CONTENTS iii LIST O F T A B L E S v LIST O F F I G U R E S vi ACKNOWLEDGEMENT vii CHAPTER 1 INTRODUCTION 1 1.1 O u t l i n e of S t u d y 3 1.2 Description of D a t a 3 a. Sediment sampling stations 3 b. Suspended sediment concentration and load 4 c. Discharge 5 1.3 V a r i a b l e Names and Descriptions CHAPTER 2 REVIEW OF SOME PREVIOUS WORK ON RATING CURVES 5 7 2.1 Kellerhals et al.'s (1974) C o n s u l t i n g Report 8 2.2 W a l l i n g and Webb (1981) 8 2.3 Smillie and K o c h (1984) and Ferguson (1986) 11 2.4 C h u r c h , Kellerhals and Ward's (1985) Consulting Report 12 2.5 Discussion 15 C H A P T E R 3 STATISTICAL ANALYSIS A N D MODELLING 18 3.0 Introduction 18 3.1 E x p l o r a t o r y D a t a Analysis for Mission D a t a 19 a. A basic m o d e l 20 b. T h e seasonal effect 20 in c. Other factors 21 d. T h e "dog leg" effect 23 e. Outliers 25 f. M o d e l for the errors 25 3.2 M o d e l for M i s s i o n D a t a 28 a. T h e proposed model 28 b. M o d e l adequacy and assumptions 28 3.3 E s t i m a t i o n of T h e A n n u a l Sediment L o a d 29 3.4 Other D a t a Sets 31 C H A P T E R 4 COMPARISON OF ESTIMATED LOADS WITH PUBLISHED DATA 34 REFERENCES 36 APPENDIX 39 A . I Bias Correction Factor for C w i t h N o r m a l Errors 39 A . 2 A Comparison of T w o Estimated Bias Correction Terms 40 A . 3 B i a s Correction Factor for C w i t h Generalized Gaussian Errors . . . 41 A . 4 Integration Process for Sediment L o a d E s t i m a t i o n 42 A . 5 E x p e c t a t i o n and Variance of the Sediment L o a d 43 A . 6 M a x i m u m Likelihood Estimators 45 IV LIST OF TABLES TABLE 2.1 The prediction factors investigated by Kellerhals et al 48 3.1 C and adjusted R for rising stage submodels of Mission 49 3.2 C and adjusted R for falling stage submodels of 50 2 p 2 p 3.3 Sample autocorrelations and sample partial autocorrelations of the errors in the Mission model 51 3.4 AIC(k) - nine- for the errors in the Mission model 51 3.5 Least squares fitted models 52 2 3.6 Rising stage model's coefficients estimated using least squares, minimum absolute residual, Huber and Andrews regressions . . . . 53 3.7 Falling stage model's coefficients estimated using least squares, minimum absolute residual, Huber and Andrews regressions . . . . 53 3.8 The proposed models for Mission, Hansard, Brocket and Walsingham 54 3.9 Numerical comparison between two estimates of exp(<r /2) 55 4.1 Estimated loads vs. published data for Fraser at Mission 56 4.2 Estimated loads vs. published data for Fraser at Hansard 56 2 4.3 Estimated loads vs. published data for Oldman near Brocket . . . . 57 4.4 Estimated loads vs. published data for Big Creek near Walsingham . 57 4.5 Cross-validation estimates vs. estimates based on all years 58 4.6 Sensitivity of the bias correction factor to the change in the generalized Gaussian parameter p for Mission 59 4.7 Sensitivity of the bias correction factor to the change in the generalized Gaussian parameter p for Brocket v 59 LIST O F FIGURES FIGURE 2.1 A n example where the shifting rating curve can be used 60 3.1 Q and C vs. day for M i s s i o n 1973 61 3.2 I n Q and I n C vs. day for Mission 1973 62 3.3 Scatter plot of I n Q vs. I n C for Mission 1973, by m o n t h 63 3.4a Scatter plot of I n Q vs. I n C for the rising stage of Mission 64 3.4b Scatter plot of 64 I n Q _ i 0 vs. I n C for the rising stage of M i s s i o n . . . . 3.5a Predicted I n C vs. Studentized residual for the rising stage of M i s s i o n 65 3.5b I n Q vs. Studentized residual for the rising stage of M i s s i o n . . . . 65 3.6a N o r m a l probability plot for the residuals of the rising stage, M i s s i o n 66 3.6b N o r m a l probability plot for the residuals of the falling stage, Mission 66 3.7a Scatter plot of e(t — 1) vs. e(t) for Mission 67 3.7b Scatter plot of e(t — 2) vs. e(t) for M i s s i o n 68 3.8 L a g vs. sample autocorrelation of residuals, Mission 69 3.9 L a g vs. sample p a r t i a l autocorrelation of residuals, M i s s i o n 69 3.10a Predicted I n C vs. Studentized residual for the rising stage of M i s s i o n 70 3.10b I n Q vs. Studentized residual for the rising stage of M i s s i o n 3.10c InQ_io . . . . 71 vs. Studentized residual for the rising stage of M i s s i o n . . . 72 3.11a Predicted I n C vs. Studentized residual for the falling stage of M i s s i o n 73 3.11b I n Q vs. Studentized residual for the failing stage of Mission . . . . VI 74 ACKNOWLEDGEMENT I would like to thank Doctor H. Joe for his guidance and assistance in producing this thesis. I am indebted to Doctor K. Knight for his careful reading and helpful criticisms of this thesis. I would like to express my gratitude to Doctor M. Church, who kindly offered advice on the hydrological aspects and carefully read this thesis. This thesis is funded by Canada Department of Supply and Services for the Sediment Survey Section, Water Survey of Canada, Water Resources Branch, Inland Waters Directorate, Environment Canada, by a contract to Doctor M. Church, Geography Department, U.B.C. Vll CHAPTER 1 INTRODUCTION This thesis consists of a statistical study and modelling of sediment transport observations. The basic issue is the form of the relationship between streamflow and the concentration of suspended sediment. Important inferences from the model includes the estimation of long term suspended sediment loads and the calculation of their standard errors. Information on suspended sediment load has many uses, including: 1. design and operation of reservoirs, 2. design of canals, diversion works and drainage ditches, 3. prediction, evaluation and control of deposition of natural channels, 4. evaluation of erosion control measures, 5. design of treatment facilities for water supply, 6. improvement of fish habitat. More details on the purposes of estimating suspended sediment load can be found in the consulting report by Church et al. (1985). 1 Sediment data come from sediment and hydrometric stations. In 1985, there were approximately 185 (M. Cashman, Sediment Survey of Canada, personal communication, 1987) active suspended sediment stations in Canada for which daily, monthly and yearly suspended sediment concentrations and corresponding suspended loads were being measured. All these stations were operated in conjunction with hydrometric stations which generally predate the corresponding sediment stations by a number of years. The sediment records cover periods from 1 to a maximum of 23 years while corresponding discharge records cover over 72 years. There are presently over 3000 (Kellerhals et al. (1974)) operating hydrometric stations in Canada, thus exceeding the number of sediment of sediment stations by an order of magnitude. There are many similarities between the operation of a sediment station and that of a hydrometric station. However, the hydrometric station costs much less to operate and is a prerequisite for the suspended sediment station. The cost-differential is due to the many more frequent visits, and the laboratory analysis of samples that are required in the operation of the suspended sediment stations. Because of rapidly increasing costs, it is becoming more difficult to justify the continued operation of many of the daily sediment sampling stations on streams and rivers. In these circumstances, use of a relationship between suspended sediment transport and water discharge can reduce the need to sample daily sediment concentrations and hence reduce costs. Hence, the purpose of the thesis is to explore model structure and performance for inferring sediment load from discharge data. Section 1 of this chapter gives a brief outline of the thesis. Section 2 provides a description of the data used in this study and Section 3 presents the definitions of relevant hydrological terms. 2 1.1 O U T L I N E OF S T U D Y T h i s thesis is organized as follows. A review of some of the studies relevant to the present project is presented i n Chapter 2. Included are notices of the statistical shortcomings of previous research papers. In C h a p t e r 3, the relationship between the mean daily sediment concentration, C, and the mean daily discharge, Q, is investigated. ( M e a n daily values are studied because they correspond w i t h the resolution of normally available flow data.) Modification of the model lnC = /? + AlnQt+e, t 0 where {e } are r a n d o m errors, provides an adequate fit to the data. Various discharget related variables are examined as possible additional predictors. T h e final model is fitted using least-squares regression. A n analysis of the residuals shows that the model errors are serially correlated, so {e } is modelled as a first-order autoregressive process. t T h e model is used for estimating the annual sediment loads of a river together w i t h the standard errors. Detailed calculations are given at the end of Chapter 3. F i n a l l y , i n C h a p t e r 4, summaries of estimates are given and the estimated annual loads are compared w i t h those calculated by the Water Survey of C a n a d a by direct integration from frequent observations. 1.2 DESCRIPTION OF D A T A a) Sediment sampling stations 3 Four sediment stations are considered for this study. T w o of the stations are located on the Fraser river, one at M i s s i o n (08MH024) and the other at Hansard ( 0 8 K A 0 0 4 ) . O f the remaining two, one is located on the O l d m a n River, near Brocket (05AA024), i n A l b e r t a and the other is located on the B i g Creek, near W a l s i n g h a m (02GC007), i n Ontario. T h e codes i n parentheses are station reference codes used by the W a t e r Survey of C a n a d a . Different geographical sites are picked for the purpose of checking to see if a general modelling approach remains valid i n a variety of geographical locations w i t h different characteristics. b) Suspended sediment concentration and load T h e equipment, methods and procedures used by the W a t e r Survey of C a n a d a i n obtaining suspended sediment records have been described by Stichling (1969, 1973). T h e published data consist of average daily values for suspended sediment load, L, i n tonnes per day, and suspended sediment concentration, C, i n mg/l. It should be noted that instantaneous concentration, C7, is the parameter actually determined i n the field by a sampling program. T h e rivers are sampled at irregular intervals, ranging from fractions of a day up to one month, depending on the transport rate and its variability. A continuous concentration graph is obtained by interpolating between the observed CI values. T h i s interpolation relies mostly on the stage records of the hydrometric station that is always associated w i t h the sediment stations, but many other factors are also considered. A n average concentration, C, is determined for each 24- 4 hour interval and combined with the mean daily discharge, Q, in cubic metres per second, to give the daily transport rate according to the following equation L (tonnes per day) = 0.0864 Q ( m / s ) C (mg/l) 3 Ideally CI should be used because C is based on the stage record and therefore also the discharge. However, for the present sites, CI and the corresponding discharge QI were not readily available. In order to minimize the spurious effects of discharge, all those C values for days on which no samples had been collected, were omitted from the analysis. c) Discharge The basic data used are the mean daily flows as published by the Water Survey of Canada. 1.3 V A R I A B L E N A M E S A N D DESCRIPTIONS Sediment is non-aqueous material transported by, suspended in or deposited by a flowing stream. Suspended sediment or suspended load is solid material that moves in suspension in water, either as a colloid or through the influence of the upward component of turbulent currents. Suspended sediment concentration is the ratio of the weight of dry solids in a water-sediment mixture to the volume of the mixture. Instantaneous concentration is the above ratio of a sample taken instantaneously at a fixed point in a river cross section. 5 Some relevant variable names and their descriptions are given below. These notations are used i n Chapters 2 and 3. C M e a n daily concentration of sediment. Q M e a n daily discharge. CI Instantaneous concentration of sediment. QI Instantaneous discharge. C M e a n concentration of sediment for m o n t h m. m Q m M e a n discharge for m o n t h m. Q r M e a n discharge for the period of record. Q pi M e a n discharge for interval between samples. InC N a t u r a l logarithm of the mean daily concentration of sediment. In Q N a t u r a l l o g a r i t h m of the mean daily discharge. I n Qi N a t u r a l l o g a r i t h m of the mean daily discharge for day i. ln<5_j N a t u r a l l o g a r i t h m of the lag i mean daily discharge. L T o t a l load for the period i n context. K Correction factor to take account of period of record. n N u m b e r of samples. 6 CHAPTER 2 REVIEW OF SOME PREVIOUS W O R K ON RATING CURVES R a t i n g curves for sediment concentration based upon discharge have many uses, but their most extensive use is i n estimating the sediment yield. Such rating curves are produced by fitting a function to the scatter plot of the sediment concentration transformation, w h i c h may be a daily, monthly or yearly average, against the corresponding discharge transformation. Commonly, log-transforms are used. Since the introduction of the rating curve by C a m p b e l l and Bauder (1940), much progress has been done. A brief chronological review of some of the more recent work on rating curves and sediment load estimation is presented i n Sections 2.1 to 2.4. Included are studies by Kellerhals, A b r a h a m s and V o n G a z a (1974), W a l l i n g and Webb (1981), Smillie and K o c h (1984) and C h u r c h , Kellerhals and W a r d (1985). Kellerhals et al (1974) and C h u r c h et al. (1985) are m a i n l y concerned w i t h the development of a model for the sediment concentration, which may be a daily, monthly or yearly average, based on the corresponding discharge and some discharge-related variables. W a l l i n g and Webb (1981) assessed the performances of six frequently used methods of estimating long t e r m sediment load and the reliability of load estimates obtained using rating curve technique. Smillie and K o c h (1984) derived a bias correction factor, based on a n o r m a l i t y assumption, for estimating the sediment concentration i n the usual logarithmic model employed i n a l l of the other studies. 7 2.1 K E L L E R H A L S E T AL.'S (1974) C O N S U L T I N G R E P O R T Using only the data of the high-flow period of five Western Canadian stations, Kellerhals et al. tested the effectiveness of the six factors in Table 2.1 and their logarithmic, square and square root transformations, when appropriate, in predicting C in a linear regression model of the form I n C = A, + p\ I n * ! + /3 X + ... + +p X + e. 2 2 p p Through stepwise regression, they discovered that except for the station with the smallest drainage basin, is the dominant and sufficient factor. This agrees with the findings of prior workers such as Abraham (1969) and Church (1972). To test the rating equations, Kellerhals et al. calculated the estimated load for the high-flow period of each year and found that these estimates were generally too low. Furthermore, for virtually every year on record, the highest concentration and the mean of the ten highest concentrations observed generally exceed the corresponding concentrations computed from the rating curves. The study was not carried any further due to the difficulties faced by the authors in dealing with serially correlated residuals. The general underestimation found by Kellerhals et al. could be attributed to the model being too simplistic and most likely to the bias resulted from inverting from I n C to C , as will be discussed later. As for the underestimation of the highest concentration and the mean of the ten highest concentrations, it should be noted that rating curves predict means, not extremes. 2.2 W A L L I N G A N D W E B B (1981) 8 Walling and Webb used a continuous record of sediment concentration from the River Creedy in Devon, U.K. to assess the accuracy and precision of some commonly used methods of estimating long-term sediment load. The authors divided these methods into two groups: one involving the use of the rating curves and the other, involving more direct approaches of estimation, namely 1. Total load = tf(£?=i 2. Total load = K £ " = 1 CIi/n)(£?=i QU/n) (C/ )(Q/,)/n 3. Total load = # Q7£? t C I =1 i/ n 4. Total load = KQ (E?=I(C/«)(Q^)/ £?=I(QA)) r 5. Total load = K J27=i(CIi)Q pi 6. Total load = K £ ^ = 1 CQ m m The variables are defined in Section 1.3. To examine the accuracy and precision of the methods in the latter group, Walling and Webb applied the methods to replicate sets of data representing systematic sampling at one, two, four, seven, ten and fourteen day intervals. The authors argued that although methods 2 and 4 were by far the more accurate ones, other methods, particularly 1 and 3, yielded smaller variances and therefore were preferrable. However, it should be noted that the greater precision yielded by methods 1, 3, 5 and 6 is an artifact of their having averaged out some of the variability. Furthermore, these methods are justified only in the case where C or Q is approximately constant within the periods over which the averages were taken. Positive correlation between C and Q would result in underestimation of the total loads. Since 9 for a positive correlation between C and Q, E(QC) > E(C)E(Q); hence on the average, the product between the average sediment concentration and the average flow is less than the average of the products of the two. The great increase in variability of methods 2 and 4 as the sampling interval increases is due to the fact that in systematic sampling, longer interval means fewer points and poorer precision. It should also be noted that systematic sampling may give poor precision when periodicity is present (Cochran (1977)). In evaluating the load estimates obtained using rating curves, four data sets from a seven year record, each containing fifty replicates representing four sampling strategies were assembled. The resultant rating curve relationships were applied to several frequently used load calculation procedures. The sampling strategies employed to generate the replicate data sets included regular sampling at weekly intervals and coverage of storm events with additional random sampling when flows exceeded certain thresholds. Fifty replicate relationships of the form C = aQ b where Q is the instantaneous discharge at the time of sampling, were calculated using least squares regression. In addition, two of the data sets were subdivided into four stages: winter and rising stage, winter and falling stage, summer and rising stage, summer and falling stage; four separate relationships were developed for each stage. The results showed that accuracy and precision increased with the coverage of storm events. Most noteworthy is the discovery that the use of rating relationships subdivided according to season and stage tendency produced an increase in the accuracy of the estimate provided by a particular sampling strategy. 10 In general, all four strategies underestimated the total load for the seven year period. This could be the result of poor sampling and/or poor modelling in addition to the lack of a bias correction factor. 2.3 S M I L L I E A N D K O C H (1984) A N D F E R G U S O N (1986) For years, hydrologists have overlooked the bias in their estimation of the sediment loads or concentrations that resulted from simply transforming the fitted model lnC = a + 6 InQ to obtain C = exp(a)Q* Smillie and Koch (1984) and Ferguson (1986) showed that if certain assumptions were satisfied, this bias could be easily corrected. Smillie and Koch illustrated this biased behaviour for data on the Yampa River, near Maybell and the Little Snake River, at Lilly Park, in Northwestern Colorado, U.S.A., but found that for both rivers, the bias correction factor derived overcorrected the estimates of the annual sediment loads. To further illustrate the tendency towards bias and to test the ability of the correction factor to improve a prediction, Smillie and Koch created a data set for which the following properties were satisfied: 1. A log-transformed linear relationship exists between the two variables. 2. Errors are normally distributed. 11 3. There is temporal stationarity in the system. As in the case with the real data, the uncorrected model underpredicted the average of the dependent variable. On the other hand, the use of the bias correction factor improved the estimates considerably and no sign of overcorrection was detected. The consistent overestimation noted earlier for the Yampa River and the Little Snake River could be the result of the following possibilities: • The data collected were not representative. • The model used was too simplistic. • The normality assumption was not satisfied. Smillie and Koch suggested that the serial correlation between the errors might have caused the overcorrection by the bias correction factor, but as shown in Appendix A.4, this should affect only the variance of the estimate and not the estimate itself. Ferguson applied the bias correction factor to sixteen simulated and real data sets. For the nine simulation experiments, he obtained corrected estimates that lie within three percent of the true load. For the seven real data sets, the corrected estimates range from 91 to 104 percent of the true load. So, when the normality assumption is reasonable, the bias correction factor derived appears to be an adequate one. 2.4 CHURCH, KELLERHALS AND WARD'S (1985) CONSULTING REPORT 12 As in Walling and Webb (1981), Church, Kellerhals and Ward divided long-term sediment load estimation into two groups: one involving the use of rating curves and the other involving more direct approaches. In the latter group, the authors divided the methods into three main approaches 1. Total load = K £ " = 1 C.-Qj/n 2. Total load = # ( £ " i C,-/n)(E£=i Qj/m) = 3. Total load = K £ ^ 2 = 1 CQ m m They dismissed the second and third approaches for reasons similar to those mentioned in Section 2.2 and recommended the first approach, it being the only "unbiased" one. It should be noted that the first approach is not necessarily unbiased if sampling is not random. The second approach is valid only if Q is of no use in predicting C. The third approach is useful where C and Q are approximately "independent" within the time periods over which they were averaged. Turning to the methods involving rating curves, Church, Kellerhals and Ward used data from six Western Canadian stations to run the following regressions: 1. Daily concentration vs. daily flow for all sampled days in each year, 2. As in 1 but including only every second, every fifth, or every tenth sample, 3. Daily concentration vs. daily flow for all samples in one, two, three and five years combined, 4. Annual sediment load vs. annual water volume for complete years of record. 13 In the analysis 1, the data of a year were used to derive a rating equation and this equation was i n t u r n used to estimate the year's total sediment load. T h e results were very close to those figures provided by the Water Survey of C a n a d a for the stations w i t h big drainage areas and were much less accurate for those w i t h small drainage areas. T h i s is i n agreement w i t h the findings of other workers (Kellerhals et al. (1974)). T h e estimates, however, were generally smaller than the ones published by the W a t e r Survey of Canada. T h i s general underestimation was corrected when the bias correction factor suggested b y Smillie and K o c h (1984) was used. In the analysis 2, the results were comparable to those obtained i n the analysis 1. T h e authors concluded that if long-term sediment yield was the m a i n object of observation, twenty to t h i r t y samples a year were easily sufficient. There was no systematic change i n the quality of the annual load estimates obtained using the rating curves derived from two, three or five years i n the analysis 3 i n comparison w i t h those obtained i n the analysis 1, where annual rating curves were used. T h i s suggested that the annual relationship between water discharge and sediment concentration does not vary drastically from year to year so that using the rating curve developed from data of past years to estimate future annual loads from flow data is not out of line. T h i s would, however, be subject to vagaries of the long-term behaviour of the contributing drainage basin. T h e analysis 4 showed good results for stations w i t h large drainage areas. T h e procedure, however, was not very useful for stations w i t h small drainage areas. C h u r c h , Kellerhals and W a r d went further by examining flow during restricted portions of the year for one of the stations w i t h larger drainage areas. T h e combinations were: M a y - J u n e , M a y - J u l y , A p r i l - J u l y , A p r i l - A u g u s t , May-September, and A p r i l 14 October. The authors discovered that all combinations yielded better R than the an2 nual flow. This suggested that the linear relationship assumed between the logarithm of the sediment load and that of the flow is more of a high-flow period relationship than an annual one. These R values should not be used as a means of comparisons 2 between the portions of the year as time periods over which the modelling should be done since the dependent variable in each case is different. In the end, Church, Kellerhals and Ward suggested a new method, to be used in case of sparse data, called the shifting rating curve. This method suggested that all available data be used to compute the best fitting curve of the form C = aQ . Then b assuming that 6 is fixed, a is recalculated for each sampled concentration of sediment as follows a. = CiQi~ b Linear interpolation was used to estimate the coefficient between samples. This scheme was tested on for two stations, one using every tenth sample and the other using every fifth sample. These same samples were employed to derive the yearly rating curves, and comparisons amongst the annual loads computed by this new method, by the fixed yearly rating curve and by the Water Survey of Canada were carried out. The results showed substantial improvement in the annual loads estimated by the new method over those derived from the single year, fixed rating curves method. However, this method does not have a solid foundation. Furthermore, an implicit assumption is that the data be of the form shown in Figure 2.1. 2.5 DISCUSSION With the exception of Smillie and Koch (1984), all investigators had problems with underestimating the load, even with the bias correction, when assuming that 15 the daily relationship between l n C and InQ over the year or over the high-flow period can be described by the model l n C = /? + / ? i l n Q + e 0 (2.1) Kellerhals et al. (1974) tried including several other independent variables but the results showed that only for rivers with small drainage areas did the addition of mostly one variable increase the percent of explained variation in l n C . Even then, there was no obvious second best factor. Assumptions required for the use of the adopted statistical model were mentioned by Kellerhals et al, but none of the papers reviewed in this chapter reported on whether the data satisfied the assumptions. Also, checks for the inadequacies of the models used were not mentioned in any of the papers. Some authors noted that the residuals were highly serially correlated but the problem was not pursued. With a record of continuous data, Walling and Webb (1981) were able to create replicate sets of data and obtain the sample standard deviation of the load. With the correct procedures and a sampling interval of one day, they obtained at best a standard deviation of about ten percent of the load. None of the investigators attempted to calculate the standard error of the load estimate based on a model. In this study (described in Chapter 3), the relationship between the mean daily sediment concentration and the mean daily discharge is also found to be well represented by the model (2.1). However, as Walling and Webb (1981) have discovered, the fit is much improved when the seasonal effect is taken into account. Here, the data are split into two parts: before and after the day of the peak flow, and a different relationship is fitted to each set. 16 Following the work of Kellerhals et al. (1974), several discharge-related variables are included in the the relationship mentioned in the preceding paragraph. Of these, only the inclusion of a certain lag of the mean daily discharge is found to increase the portion of explained variance in In C by a significant amount, and even then, only for one of the relationships. As noted by Kellerhals et al. (1974), Smillie and Koch (1984), and Church et al. (1985), the residuals are positively serially correlated. This is not unexpected as the observations are ordered in time, hence adjacent cases influence each other. Analysis of the residuals suggests that the errors be modelled using a first-order autoregressive model e = pe -i + Vt, t where t are independent and identically normally (i.i.d.) distributed random vari- ables with mean zero and constant variance. The standard assumption in regression analysis that the errors, e, are uncorrelated no longer applies. This violation of the uncorrelated errors assumption does not affect the bias correction factor for 6, as suggested by Smillie and Koch (1984). The bias correction factor remains exp(o-/2) in the case of normally distributed error. However, the calcu2 lation of the load variance is very sensitive to the assumption of uncorrelated errors, as shown in Appendix A.4. Up to now, only the bias correction factor for the case of normally distributed errors has been mentioned in the rating curves literature. In practice, the normality assumption may not always hold. In Section 3.3 and Appendix A.2, the sensitivity of the bias correction factor to the normality assumption is studied. Also a nonparametric estimate of the bias correction factor is introduced. 17 CHAPTER 3 STATISTICAL ANALYSIS AND MODELLING 3.0 I N T R O D U C T I O N The models used in this chapter, which are based on the exploratory data analysis described in Section 3.1, have the form l n C * = F(lnQ u •••) + £«, £* = pe«_i + i/ t> (3.1) where {r) } are independent and identically distributed error terms. That is, l n C is t ( some function of In Q and some other variables, plus an error term, which is serially t correlated in time. The model is detailed in Section 3.2. One important inference from the model is the estimation of the sediment load for the high-flow period (Section 3.3). This involves a bias correction factor from the inverse transformation of the first term in (3.1). Some theory behind the bias correction factor is presented in Appendices A.I, A.2 and A.3. Sections 3.1 and 3.2 consist of a detailed analysis and modelling of the Mission data. The Mission data are the most complete and "densely" collected set of data among the four stations under study, and perhaps in Canada. For this reason, a model developed for Mission can be used as a guide for modelling other stations, in particular Hansard, Brocket and Walsingham. Section 3.4 summarizes the analysis of these other three data sets. Most statistical analyses were carried out using S and SAS. S is especially useful 18 for exploratory data analysis; the main reference for it is Becker and Chambers (1984). 3.1 EXPLORATORY DATA ANALYSIS FOR MISSION DATA To study the relationship between C and Q in relation to time, the yearly plots of C and Q versus time are examined. As shown in Figure 3.1, the variation of C and Q by several orders of magnitude makes it very difficult to identify the relationship between the two. To facilitate this task, the plots of the natural logarithm of C , I n C , vs. the natural logarithm of Q , I n Q , are inspected. These plots, as presented in Figure 3.2, show that there exists a relationship between I n C and I n Q during the high-flow period, usually between March and October; but no clear pattern can be detected during the low-flow period between November and Febuary (due to few observations). In addition, these plots indicate that the relationship between I n C and I n Q varies from the rising stage of the high-flow period to the falling stage of the same period. This observation is made obvious by the yearly plots of I n C vs. I n Q , with each point identified by the month in which it was sampled. As shown in Figure 3.3, these plots reveal a systematic annual hysteresis. That is, the trend of I n C vs. I n Q is reasonably consistent between June and October and in March, but between April and May, departs toward substantially higher I n C for a given I n Q . For the remaining months, there appears to be no trend between I n C and I n Q . Due to having few observations and a lack of a clear pattern between I n C and I n Q in the low-flow months, only data from the high-flow period, between March and October, are included in further analyses. This omission should not affect the estimation of the annual sediment load as the amount of sediment transport during these low-flow months is observed in plots such as Figure 3.1 to contribute very little 19 to the year's total load. A sample average of the mean of daily sediment concentration multiplied by the total flow will suffice as an estimate of the total sediment transport for these low-flow months. Also omitted from the data are all samples from 1980. As a result of the very unusual weather in that year. The data do not follow the general pattern exhibited by other years. For example, the plot of l n C vs. I n Q for this year is very different pattern. a. A basic model There is no theory which determines the form of the sediment rating curve. For suspended sediment, nearly all investigators (those whose papers are reviewed in Chapter 2 and references therein) have used a power law C = aQ t h t exp(c ) (3.2) f where {e } are random errors. This relationship is no doubt inspired by the linear t trend discovered when plotting l n C against I n Q . That is, by applying the natural logarithm to both sides of equation (3.2), a linear model of the form In C t = In a + bIn Q + e t t is obtained. The method of linear regression can then be applied. b. The seasonal effect Many investigators (e.g., Miller (1951), Sharma and Dickinson (1980), Walling and Webb (1981)) believe that time is an important factor in studying the relationship between l n C and I n Q . As observed earlier, the trend of l n C vs. I n Q appeared to vary 20 from the rising limb of the hydrograph to the falling limb of the hydrograph. To substantiate this conjecture the following two models A)r + P\r InQt + ct, day t is before the peak flow A)/ + Pis^ Qt + t, day t is after the peak flow n € and lnC« = A> + /?ihiQi + C l are fitted to the data and the results compared. The gain in the portion of explained variance, the adjusted R , of the two-part model is approximately 16 percent. This 2 suggests that two separate sub-models are required: one for the pre-peak flow days, the rising stage; and the other for the post-peak flow days, the falling stage. c. Other factors The lags of the daily discharge are considered to be influential factors. It is generally observed that the sediment transport is dependent on not only the momentary discharge but also its recent history (Bogen (1980)). To investigate the above observation, the natural logarithm of the lag 1 to lag 10 of the daily flow were examined. The scatter plots of the lags vs. I n C all exhibit a linear trend, although increasing variability is observed with the larger lags. An algorithm for selecting the "best" subsets of predictor variables in regression in S), is applied to the lags along with I n Q . (LEAPS The results show that I n Q is the most dominant factor, explaining 59 percent of the variability in I n C . The addition of the lag 10 discharge, l n Q _ i 0 , is accompanied by an increase of about 21 percent of the portion of explained variance in I n C . (It should be noted that the addition of lag 7, lag 8 or lag 9 yielded comparable increase in the portion of explained variance as that of lag 10.) After these two variables have been introduced, further gain in the adjusted R? is minor. Thus, I n Q and l n Q _ i 21 0 are used as predictors for I n C . According to Abraham (1969), Walling and Teed (1971) and Church (1972), other factors believed to influence the sediment transport include the daily rate of change of discharge (rate), the number of days since the last peak flow of the year (dslp) and the previous maximum discharge of the year (prMadis). These variables are thought to be related to the amounts of sediments available for transport. These variables along with l n Q _ i and InQ will now be examined for the rising stage and 0 the falling stage separately. For the rising stage, all scatter plots (not given in the thesis) of the predictor variables vs. l n C , except the dslp vs. l n C and rate vs. l n C plots, exhibit some linear trend. The plot of rate vs. l n C shows that the magnitude of rate, abs(rate), is a better variable to use. Next, the best subsets algorithm is used to pick out the important factors. The results are summarized in Table 3.1 with p as the number of parameters (the number of predictors plus the intercept). With the adjusted R serving as the criterion, the model with I n Q and l n Q _ i o is 2 the best model. The model with InQ alone explains only 66 percent of the variability in l n C . By including l n Q _ i , the portion of explained variance in l n C is increased to 0 84 percent, which is the highest for any model with just two predictors. With InQ and l n Q _ i o already in the model, further gain in the adjusted R is insignificant. 2 The Mallow's C criterion (Draper and Smith (1981)) is also considered. The p Mallow's C statistic for a p-parameter model is defined by p C = RSS + 2p-n, p P where RSS = E ( l n C - l n C ) and n is total number of observations. Mallows (1973) , P 2 t t suggests that good models will have negative or small C - p. Although the results do p not appear to favour the above model, the ranking is the same as that provided by 22 the adjusted R criterion. 2 For the falling stage, the plots of the predictor variables vs. InC are examined. Except for the plot of prMadis (which is now a constant, ie., the maximum discharge for the year) vs. InC, all scatter plots follow a definite linear trend. Again, the magnitude of rate is used instead of rate itself. The results obtained from running the best subsets algorithm are presented in the Table 3.2. According to the adjusted R criterion, InQ is the only variable needed. The 2 model with just InQ yields an adjusted R of 81 percent. Only a maximum of about 2 1 percent can be gained by adding more variables in the model. Once more, the ranking based on the Mallow's C criterion is very close to the p ranking provided by the adjusted R criterion. 2 In choosing the "best" subset model, it should be kept in mind that although criteria are provided to aid the investigator in making the selection, a model with a simpler form that does not give up much of the criteria-based performance is often preferred over more complex models. In keeping with this rule, the models mentioned above are preferred to their competitors. d. T h e "dog leg" effect Next, a "dog leg" effect is considered; that is, the upper segment of the plot of InC vs. InQ has a tendency to flatten off or even approach horizontal. This can be explained by the fact that above a certain limit, sediment concentrations do not continue to increase with discharge at a uniform rate but may even remain constant 23 when the supply potential of the catchment has been fully achieved (Gregory and Walling (1973)). This effect can be seen in Figures 3.4a and b, where the plots of I n Q vs. I n C and I n C vs. l n Q _ i 0 for the rising stage both follow a curved line. The curved lines running through the data are the nonparametric smooths proposed by Cleveland (1975). In S, these can be obtained through the function LOWESS; for the figures, the smoothing factor used is 0.2. When the model lnC = t Po + Pi I n Q t + l n Q _ i o t e + t is fitted to the rising stage data, the plots of the Studentized residuals vs. the predicted values and the Studentized residuals vs. I n Q both betray signs of nonlinearity as shown in Figures 3.5a and b. The plot of the Studentized residuals vs. l n Q _ i 0 , on the other hand shows no indication of nonlinearity. This suggests that the model failure is more a function of I n Q than it is of l n Q _ 1 0 . Thus some transformation of I n Q is required. Careful examination of the I n C vs. I n Q plot reveals that only a slight transformation of I n Q is necessary. Figure 3.4a shows that the relationship between I n C and I n Q is approximately piecewise linear with a corner at I n Q = 8 . Thus the data for the rising stage can be fitted using two different lines joined at I n Q = 8 , namely i n r + e*, l/% + /?ilnQt + &ln<3-io« + et _ / A) + subject to continuity at l n Q t / ? i l n Q = 8 . t /?2lnQ-io< + for for I n Q * > l n Q < 8 This is equivalent to fitting the following linear model ]nC = Po + PiXl + p X2 t ( 8 2 t 2 4 t + p\ l n Q. 10t + e t where XI = (InQ - 8) x (l-d) and X2 = (InQ x d) + ((l - d) x 8), with d = 1 if InQ > 8 and d = 0 otherwise. e. Outliers The Studentized residuals (Weisberg (1980)) vs. the predicted values plots for the rising stage and the falling stage both display a few candidates for outliers. Because the least squares estimators can be very sensitive to outliers, alternative estimation methods, generally called robust regression methods, are used for comparison. These methods basically give lower weights to observations which seem inconsistent with the rest of the data, assuming that the fitted model is true. So, instead of minimizing £(lnC,- - lnC,) , 2 the function: £ p ( [ l n C , - - lnC,]/s), is minimized, p is a function that penalizes outliers (less severely than the least squares procedure) and s is a fixed scale factor. Two such well-known methods are by Huber (1972) and Andrews (1974). The loss functions for these two are as follow: 1*1 < otherwise. 1*1 < K; otherwise. Displayed in Tables 3.6 and 3.7 are the results obtained using the Huber and Andrews criteria (with the constants A and K being 1.339 and 1.345 respectively) and the sum of absolute deviations criterion. These were obtained using the functions RREG and L 1 F I T in S. The estimates are found to be very close to those obtained using least squares. Thus, it is concluded that the possible outliers observed are not influential in the estimation of the parameters. f. Model for the errors 25 So far, all the regression methods have been used in an exploratory analysis sense, because the assumption of independent and identically normally distributed errors is not necessarily valid. This assumption is now checked by studying the residuals plots. The normality assumption is checked by plotting the normal probability plot of the Studentized residuals. The plot for the rising stage is presented in Figure 3.6a and the plot for the falling stage is presented in Figure 3.6b. Both these plots indicate slightly longer tails than those of the normal distribution. Otherwise, neither suggests any sign of serious violation of the normality assumption. Because the data in this study form a time series, adjacent errors would be expected to be serially correlated. Residuals of both the rising stage and the falling stage were joined together and daily runs of residuals were plotted against time. The plots show that the successive residuals tend to be more alike than otherwise, indicating the presence of positive serial correlation. On a set of residuals for which lag 1 up to lag 6 exist, the plots of the residuals vs. their lags were examined. The plots exhibit a "lower left to upper right" tendency that is representative of positive serial correlation. The amount of scatter in the plots increases with the lag. The plots of the residuals vs. lag 1 and lag 2 are provided in Figures 3.7a and 3.7b. An estimate of the lag i correlation is P " ) = --E.J where n,- is the number e for which e_,- also exists (ie., a measurement was taken) and t t n is the length of the time series. The estimates of the lag 1 to lag 10 correlations are 0 summarized in Table 3.3. These estimates are then plotted against their respective lags. The plot, as shown in Figure 3.8, indicates a decay of the correlation as the lag increases. 26 The partial autocorrelations (Box and Jenkins (1976)), pac(i), are also estimated and the estimates are provided in Table 3.3. The partial autocorrelations form a set of statistical measurements, similar to autocorrelations, that reveal how time series values are related to each other at specified lags. The plot of the partial autocorrelations vs. the lags is shown in Figure 3.9. This figure indicates that a first or second order autoregressive model might be appropriate because the pac(i) for i > 3 are approximately zero, whereas pac(l) and pac(2) are considerably larger. The Akaike's Information Criterion (Priestley (1981)) , AIC, is a method that can be used to select an appropriate order of an autoregressive model. This criterion picks the order k for which * AIC(k) = n ln a + 2 «'=i 2 ln(l - pac(i) ) + 2k, where u is the residual mean square from the regression model for InC, is minimized. 2 However, it is also desirable to have as simple a model as possible. With this additional objective, the selected k should have a small AIC(k), and none of the should be much smaller than The AIC(i) — nlnd- , for 2 AIC(k) i (that is, AIC(i) stabilizes for AIC(i), for i > k, i > k). from one to ten, are listed in Table 3.4. Following the above guidelines, a first order autoregressive model is preferred over the second order autoregressive model. This choice is also supported by the adjusted B? criterion. The adjusted R s obtained by fitting the models 2, and e = pie -i + P2tt-2 + m t t to the set of residuals for which lag 1 and lag 2 exist, are almost equal. 27 3.2 M O D E L F O R MISSION D A T A a. The proposed model S u m m a r i z i n g the analysis i n Section 3.1, the models proposed are i_ c _ J /?o + /?ilnQt + /?2lnQ- t + e , * 10 4 l/3o + #lnQ + /? lnQ_i + e , t 2 OJ < for InQ, > 8 for InQ, < 8 subject to continuity at InQ, = 8, for the rising stage and lnC, = /? + /?ilnQ + e, 0 ( for the falling stage, where 1. InQ, is considered fixed and lnC, is random. 2. {e,} is a sequence of n o r m a l random variables w i t h mean zero and variance a 2 such that e, = pe,_! +77!, where {77,} are independent and identically normally distributed w i t h mean zero and constant variance. T h e parameters fio, fid, fii, fi[, f h , cr and p can be estimated using the m a x i m u m likelihood estimators derived i n A p p e n d i x A.6. However, this method is not available i n statistical packages and it may not be robust to the normality assumption. A n alternative method that involves two stages is to estimate fi , 0 fid, fix, fi[, and a using least squares regression and to estimate p using least squares regression on the residuals of the first regression. T h e estimates obtained b y this latter method are presented i n Table 3.5. b. Model adequacy and assumptions Some m o d e l inadequacies are more serious t h a n others. A small deviation from 28 normality for the errors is among the less serious ones. As checked earlier, the normal probability plots of the Studentized residuals for the rising stage and the falling stage both look satisfactory. A potentially more serious failure is the heterogeneity of error variance. Heterogeneity of variance can be detected by plotting the Studentized residuals against the predicted values and the independent variables. For the rising stage, the plots of the Studentized residuals vs. the predicted values and the independent variables, presented in Figures 3.10a - c, all display a slight degree of heterogeneity. However, since a main goal is to estimate the annual load of sediment transport and not the daily concentration of sediment, the slight nonconstancy of variance, which may or may not be a result of outliers, should hopefully not be a factor. The plots of the Studentized residuals vs. the predicted values and the independent variable for the falling stage are presented in Figures 3.11a - b. Neither of the plots shows any indication of heterogeneity of variance. A much more serious failure is the nonlinearity of the model. To check for nonlinearity, the plots of the Studentized residuals vs. the predicted values are studied. As shown in Figures 3.10a and 3.11a, such plots for the rising stage and the falling stage are well behaved. The specified models are considered are adequate. 3.3 ESTIMATION OF THE ANNUAL SEDIMENT LOAD In Section 3.2, a model for the natural logarithm of the mean daily sediment concentration was constructed. To obtain an estimate of the mean daily sediment concentration itself, the transformed scale prediction InC is retransformed by exp(.). 29 However, because the transformation is nonlinear and because E{C) = exp(p )Q E(exp(e)) jt exp(/?o)Q" Pl 1 0 exp(lnC) is a biased estimator (see Duan (1983) and Miller (1984) for example ). Still, if the errors are normally distributed, a major portion of the bias can be eliminated by applying the adjustment factor exp(o- /2) of Smillie and Koch (1984) (see Appendix 2 A.I). In practice, departures from the normality assumption are not uncommon. Some frequent departures from normality is that the underlying distribution of errors is approximately be symmetric but nonnormal, possibly "more peaked" than the normal with "lighter tails", or "less peaked" than the normal with "heavier tails". A family of distributions that includes some of the distributions described above is the generalized Gaussian distribution f(x) = const exp(—\x/b\ ). p A study of the sensitivity of the bias correction factor to the change in p, for 1 < p < 2 is presented in Appendix A.3. The biasing factor i?(exp(e)) can also be estimated nonparametrically using n ^exp(e,)/n !=1 . This estimator is approximately unbiased and does not assume any distribution for the errors. A comparison between the estimates of this bias correction factor and the "normal" bias correction factor is presented in Table 4.5. The two are very close, suggesting that any deviation from normality assumption is slight. A theoretical relative efficiency comparison between the two bias correction factors, under the normality assumption, is provided in Appendix A.2. The efficiency exp(<r /2) to the 2 nonparametric estimate is greater than 1. 30 After obtaining an estimator for the mean daily sediment concentration, the problem of estimating the sediment transport of the high-flow period can be addressed as follow. First, linear interpolation between C 's and between Qt's is performed to t complete the sediment hydrograph and the discharge hydrograph. Then an integration process detailed in Appendix A.4 is used to determine the daily sediment loads. These daily loads are then summed to produce the total sediment transport estimate of the high-flow period. For the low-flow period, the total sediment transport can be estimated as described in the third paragraph of Section 3.1. At Mission, it is observed that the mean daily sediment concentration remains approximately constant between December and February, but is higher in November. Therefore the sediment transport is determined separately for these two periods. The expectation and variance of the sediment load of the high-flow period are calculated in Appendix A.5. These are used to obtain standard error of the high-flow period load. Discussion of estimates of total annual loads and their standard errors is given in Chapter 4. 3.4 OTHER DATA SETS In Sections 3.1 and 3.2, the Mission data have been carefully examined and a model has been developed for the natural logarithm of the sediment concentration. The mean daily discharge, the lag 10 of the mean daily discharge and the season are the factors found to be important in defining the model. The daily rate of change of discharge, the number of days since the last peak and the previous maximum discharge are not as useful; hence, they are not included in the analyses of the other 31 stations. Basically, the same analysis was carried out for each of the remaining stations. The only difference being the additional use of a nonparametric smoothing function developed by Friedman and Stuetzle (1982), called SUPERSMOOTHER, in identifying the "peak" that separates the rising from the falling stage. For Mission, the day of the peak flow is generally the dividing point between the two stages. For the other data sets, there are some years in which there is a period after the peak flow during which it is not clear whether the general trend of the hydrograph is up or down. The SUPERSMOOTHER is used in these cases to pick out the local or global peak, which is identified with the peak of the smooth line. It should also be noted that for the Walsingham data, the year is reset to start on August 1 and ends on July 31. This is so that the full high flow period occurs sometimes in the middle of the year, as for the west coast stations, and the same analysis can be carried out. The models developed for Hansard, Brocket and Walsingham are listed in Table 3.8. The fitted models are listed in Table 3.5. It is found that a lag of the mean daily discharge of about 8 to 10 days is needed in the rising stage model for all the west coast stations. For Walsingham, the lag 1 of discharge is included in the falling stage model. The seasonal effect, as indicated in the preceding statements, is important at all four stations. Robust regressions were performed to test the sensitivity of the models to outliers. The results are listed in Tables 3.6 and 3.7. The estimates are close for all stations, indicating that the possible outliers are not influencial in the estimation of the parameters. 32 The error model used for all four stations is a first-order autoregressive model. The AIC suggested a second-order autoregressive model for Walsingham. However, the standard errors of the load estimates are not very different one way or the other, thus, the simpler model is used. The residual plots show that all four stations have a few possible outliers. However, no serious violations of the normality assumptions are noted in the normal probability plots. A slight degree of heterogeneity of variance is observed in some residual plots, but probably not serious enough to affect the estimation of the annual sediment load. Finally, it should be noted that the models developed for Brocket and especially Walsingham are not as effective, in terms of portion of explained variance in lnC, as the ones developed for Mission and Hansard. The same basic factors that explain between sixty and eighty percent of the variability in lnC at Mission, Hansard and Brocket only explain about forty five percent of the variability in In C at Walsingham. This finding is in agreement with the findings of other workers (Kellerhals et al. (1974) and Church et al. (1985)), in that models that work well for stations on big rivers tend to do poorly for those on small rivers. The reason could be that in small rivers, the sediment transport is more strongly regulated by short-term events such as storms, whereas in large rivers, these events would not be individually significant factors. 33 CHAPTER 4 COMPARISON OF ESTIMATED ANNUAL LOAD WITH PUBLISHED DATA One purpose of the models developed in Chapter 3 is to reduce the costly sample collection of the sediment concentration. A test of the usefulness of these models can therefore be made by computing the annual sediment transport of past years and comparing results with figures published by the Water Survey of Canada. In this present study, the annual sediment transport is calculated for several years per station and tabulated in tables 4.1 - 4.4 along with the published data. A simple computer program was written to calculate these values and the standard errors. It is found that although the estimates obtained are generally close to the values provided by the Water Survey of Canada for all stations, the accuracy is more consistent at Mission and Hansard than the others. As a further test, cross-validation estimates for the annual sediment transport are calculated for three years per station. That is, for each year, the year's sediment load is calculated using the model based on other years. The cross-validation test is especially important in this study as it will indicate whether or not future annual sediment loads can be well estimated using the model based on past years. As shown in Table 4.5, these estimates are as good as those obtained using the models based on data from all years. Hence, it is reasonable to assume that future annual loads can be as well estimated by the model based on past and present years as they can be by the models based on all years, including those for which the loads are estimated. Finally, to assess the precision of the estimates produced by the developed models, the standard errors of the annual sediment loads are computed (see formulas in 34 Appendix A.4). The results listed in Tables 4.1 - 4.4 show that the estimates obtained for Mission and Hansard are much more precise than those obtained for Brocket and Walsingham. The standard errors calculated for Mission and Hansard are about 10 to 15 percent of the load estimates while the standard errors calculated for Walsingham are about 20 percent of the estimated loads and those for Brocket vary between 30 and 65 percent. From the above calculations, it can be concluded that the proposed models for Mission and Hansard are very capable of replacing the sampling scheme presently employed by the Water Survey of Canada. The models for Brocket and Walsingham, on the other hand, are not as successful. Although the annual sediment transport can be reasonably well approximated using the proposed models, the precision of the estimates, in terms of standard error, is lower than ideal for Walsingham and unacceptably low for Brocket. The above findings are not surprising. Other workers (Kellerhals et al. (1974), Church et al. (1985)) have found that sediment transport in small rivers is more unpredictable than in large rivers. The reason could be that in small rivers, the sediment transport is more strongly regulated by the local events such storms, whereas in large rivers, the effect of these events is somewhat "averaged" out. The sediment transport in large rivers is more strongly regulated by larger events such as snowmelt and seasonal yield of sediment from the land surface. This reasoning is supported by the improvements attained by some workers when additional information regarding rainfalls are included in the model (Church (1972)). 35 References [1] Abraham, C. E. (1969). Suspended sediment discharges in streams, A . G. U. Golden Anniversary Meeting in Washington, D. C. [2] Andrews, D. F. (1974). A robust method for multiple linear regression, Technometrics, 16, 523-531. [3] Becker, R. A. and Chambers, J. M . (1984). S an Interactive Data Analysis and Graphics, Environment for Belmont: Wadsworth. [4] Bickel, P. J. and Doksum, K. A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics, San Francisco: Holden-Day. [5] Bogen, J. (1980). The hysteresis effect of sediment transport systems, Norsk geog. Tidsskr., 34, 45-54. [6] Box, G. E . P. and Jenkins, G. M . (1976). Time Series Analysis: Forecasting and Control (revised ed.), San Francisco: Holden-Day. [7] Campbell, F. B. and Bauder, H. A. (1940). A rating-curve method for determining silt-discharge of streams, American Geophysical Union Trans., 21, 603-607. [8] Church, M . (1972). Baffin Island sandurs: a study of Arctic fluvial processes, Geographical Survey of Canada, Department of Energy Mines and Resources. [9] Church, M . , Kellerhals, R. and Ward, P. R. B. (1985). Sediment in the Pacific and the Yukon Region: review and assessment, Report, Canada Department of Environment and Inland Waters Directorate. [10] Cleveland, W. S. (1985). The Elements Wadsworth. [11] Cochran, W. G. (1977). Sampling of Graphing [13] Duan, N. (1983). Smearing estimate: Statist. Monterey: Techniques (3rd ed.), New York: Wiley. [12] Draper, N. and Smith, H. (1981). Applied Regression Analysis York: Wiley. method, J. Amer. Data, a nonparametric (2nd ed.), Newretransformation Assoc., 78, 605-610. [14] Ferguson, R. I. (1986). River loads underestimated by rating curves, Water Resources Research, 22, 74-76. 36 [15] Friedman, J. H. and Stuetzle, W. (1982). Smoothing of scatterplots, Department of Statistics, Stanford University, Tech. Report ORION006. [16] Gregory, K. J. and Walling, D. E . (1973). Drainage Basin Form and Process, London: Edward Arnold. [17] HofT, C. J . (1983). A practical Guide to Box-Jenkins Forecasting, Belmont: Lifetime Learning Publications. [18] Huber, P. J. (1972). Robust statistics: a review, Ann. Math. Statist., 43, 10411067. [19] Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte 'Carlo, Ann. Statist., 1, 799-821. [20] Kellerhals, R., Abrahams, A. D. and von Gaza, H. (1974). Possibilities for using sediment rating curves in the Canadian Sediment Survey Program, Report, Canada Department of Environment, Alberta Department of Highways and Transport, Research Council of Canada and University of Alberta, Department of Civil Engineering, REH/74/1. [21] Millard, S. P. and Guttorp, P. (1985). Hypothesis tests for regression models with autocorrelated errors. SIMS Tech. Report 106, Department of Statistics, University of British Columbia. [22] Miller, C. R. (1952). Analysis of flow-duration, sediment rating curve method of computing sediment yield, Report, U. S. Department of Interior Bureau of Reclamation. [23] Miller, D. M . (1984). Reducing transformation bias in curve fitting, Statist, 38, 124-126. Amer. [24] Sharma, T. C. and Dickinson, W. T. (1980). System model of daily sediment yield, Water Resources Research, 16, 501-506. [25] Smillie, G. M . and Koch, R. W. (1984). Bias in hydrologic prediction using log-log regression model, A . G. U. Fall Meeting. [26] Stichling, W. (1969). Instrumentation and techniques in sediment surveying, In: Hydrology, (Proc. Victoria, B. C. Symp., May 1969). [27] Stichling, W. (1973). Sediment loads in Canadian rivers, In: Fluvial and sedimentation, (Proc. University of Alberta, May 1973). 37 processes [28] Walling, D. E. and Teed, A. (1971). A simple pumping sampler for research into suspended sediment transport in small catchments, / . Hydrol, 13, 325-337. [29] Walling, D. E . and Webb, B. W. (1981). The reliability of suspended sediment load data, In: Erosion and Sediment Transport Measurement, (Proc. Florence Symp., June 1981), I A H S 133, 177-194. [30] Water Survey of Canada, Environment Canada, Sediment Data for Canadian Rivers, Environment Canada, (Formely: Inland Waters Branch, Department of Energy, Mines and Resources, Ottawa, Canada), Yearly publications, 19661983. [31] Weisberg, S. (1980). Applied Linear Regression, New York: Wiley. 38 APPENDIX A.I Bias correction factor for C with normal errors. T h e estimator for the sediment concentration obtained by taking the inverse transformation of In C of a model like ( A l .1) below, is a biased one. O n the assumption that the errors are normally distributed, a major portion of the bias can be removed by applying the correction factor derived here. Suppose for simplicity that the true model for InC is l n C = fa + p\\n.Q + e, {Al.l) where e is n o r m a l l y distributed w i t h mean 0 and constant variance c r , and the fitted 2 model is l n C = A> + A InQ. Inversely transforming the true and the fitted model yields C = exp(/? )Q exp(e) and /Jl 0 C = exp(/3 )Q^. 0 C is a biased estimator of E{C) = exp(/? )Q -£'(exp(e)). T h e bias of exp(/? ) for exp(/? ) /3l 0 and Q& for 1 0 0 w i l l be small since b o t h p\ andfixare based on a large number of observations. Hence, an approximately unbiased estimator of E(C) can be obtained by m u l t i p l y i n g C b y an estimate of £(exp(e)) ( D u a n (1983) and M i l l e r (1984)). If e ~ N(0, cr ), then E(exp(e)) = exp(cr /2), by evaluating the moment generating 2 function of e at 1. 2 So, assuming a n o r m a l error, an estimated correction t e r m is 39 exp(<7 /2), where a is the residual mean square obtained from the regression of l n C 2 2 on InQ. Similarly, this extends to a more general regression model. In this study, the correction is not done until after the load for the high-flow period has been estimated. Hence, the estimated correction term used is exp(J /2), 2 p where <? is the pooled residual mean square from the rising stage and the falling p stage: [(ra - l)ay + (n/ - l)cfj ]/(n + nj - 2). Although it is preferrable to correct the 2 2 r r individual lnC,-, a 2 A.2 and cf are close enough so that it makes no difference. 2 A c o m p a r i s o n o f t w o e s t i m a t e d bias c o r r e c t i o n t e r m s . As discussed in Appendix A.I, exp(lnC) is a biased estimator of E(C). The biasing factor is £'(exp(e)). If the errors are normally distributed, this is equal to exp(cr /2) and 2 an estimate of the biasing factor is exp(o- /2), where cr is the residual mean square 2 2 obtained from the least squares regression. For the data in this study, the normality assumption may be in doubt. A nonparametric estimate of the biasing factor is n _ 1 £ " = i exp(ej), where {e,} are the residuals of the regression model. In this appendix, the asymptotic relative efficiency (Bickel and Doksum (1977)) of the nonparametric estimate, when the errors are normal, is studied. Assume the errors are correlated as in (A6.1). By standard asymptotic theory, the Central Limit Theorem, and formulas for the moments and moment generating function of a bivariate normal distribution, x/77(exp(«7 /2) - exp(<7 /2)) 2 2 N (0, v, ), 2 where v^ = exp(o- )<r (l + p )/2(l - p ), and 2 2 4 2 2 V ^ X > x ( e , - ) ) - exp(<r /2)) 2 P 40 N (O,^ ), 2 where v 2 2 = exp(<7 ) (exp(<r ) - lj 2 2 + (exp(<rV) - x lj exp(<r )(Er= (^)- ^ [(l + 2 1 2fc 1 P )/(1-P )]). k k It can be seen that v is less than v\. Thus, the asymptotic relative efficiency of 2 exp(e>/2) with respect to n~ YA=\ P( «) is greater than one. This is expected, as a l 2 ex e parametric estimate is generally more efficient than a nonparametric estimate when the parametric model is true. A.3 B i a s c o r r e c t i o n f a c t o r f o r C w i t h g e n e r a l i z e d G a u s s i a n e r r o r s . In Appendix A.I, a bias correction factor was derived for C under the assumption that the errors are normally distributed. In practice, the underlying distribution of the errors will not be exactly normal; it may be "more peaked" with "lighter tails", or "less peaked" with "heavier tails". For the data at hand, the distribution of the residuals appears to be approximately symmetric but with "heavier tails" than the normal. In this Appendix, the biasing factor £(exp(e)) will be derived for the case of the errors being distributed according to the generalized Gaussian distribution f(x) = P exp(-\x/b\ ), p b>0, l<p<2, (A3.1) where (3 is a normalizing constant. This is a family of symmetric distributions with heavier tails than the standard normal distribution. Let e be distributed according to (A3.1). By using gamma integrals, it can be shown that /? = p(2bT(l/p))~ . 1 Because f(x) is symmetric and has exponential tails, the odd moments of e are zero. The even moments are E{e ) = 2 k Jo 41 So, assuming that the errors are generalized Gaussian, with p known, the bias correction term is If p > 1, £'(exp(e)) is finite for all values of 6 (this follows from the fact that if p > 1, E(exp(tX)) exists for all t). A study of the sensitivity of the bias correction term to the change in p can be carried out as follow. Fix a — a , where <r is a residual mean square from one of the 2 2 2 data sets. For each value of p between 1 and 2, calculate 6 that gives this <r, then 2 substitute this b in equation (A3.3) to obtain E(exp(e)). Tabulate p and E(exp(e)) as a function of p for several values of p between 1 and 2. Such tables for Mission and Oldman are provided in Tables 4.6 and 4.7. A.4 I n t e g r a t i o n p r o c e s s for s e d i m e n t l o a d d e t e r m i n a t i o n . In this appendix, details of the integration process that is used to calculate the sediment load for the high-flow period are given. As discussed in Section 3.3, linear interpolation between Cj's and between <3,'s is performed. Let C(t) - a + bt be the linear interpolation between C(h) and C(t ), where tx < t , and 2 2 Q(t) = c + dt be the linear interpolation between Q(tx) and Q(t ). To estimate the sediment load at 2 42 time t (see Section 1.2b), where t < t < t , the following is used x 2 UJ) = clJ)Q(t) = (a +~bt)(c+ dt) Thus the total sediment load between *i and t is 2 rti / ft? C(t)Q(t) dt = Jti (a + bt)(c + dt) dt Jti Without loss of generality, let ti = 0. Then a = C~ 1} c = Qi, b = (C - Cl)/t 2 2 and d = (Q - Qi)/<2 2 and rti Jo C(t)Q(t) 1 = C2Q2 (C - CiXQ -_Qi) 2 6 2 2 + C Qj/ x 2 Recall from Section 1.2b that Q is measured in m /s and C is measured in mg/l. Hence, 3 to estimate the sediment load for the first day, for example, t is replaced by 86400, 2 which is the number of seconds in a day. The total sediment load estimate for the high-flow period is Load = «'=2 since Qi are obtained daily. A.5 E x p e c t a t i o n a n d v a r i a n c e o f t h e s e d i m e n t l o a d . In this appendix, the expectation and the variance of sediment load for the highflow period are derived assuming the model like (A5.1) below. These terms are needed for obtaining the standard error of the sediment load estimate. 43 A s shown i n A p p e n d i x A . 4 , the total sediment load for the high-flow period can be approximated by CjQj _ (Cj - CViXQj - Qi_i) 2 6 Load = 86400 ] T C.-tQ,-! 2 T h i s can be w r i t t e n as £ " = i ' ' » ' where a = [(2Q /3)+(Q - /6)+(Q /6)] a C { i i 1 i+1 - f o r i = 2.. .n-1, ai = (Q2/6) + (Qi/3) and a„ = (Q„/3) + (Q„_i/6). T o calculate the expectation and the variance of the above load, E(Ci), Var(E: =1 OiCt) = E? Var(C,) and Cov(C,-, C ; ) are needed (since + ( a.^Var^) + 2 £ E . ^ a ^ C o v ^ , C,-)). = 1 Suppose for s i m p l i c i t y that lnC,- = /?o + 0ilnQ,- + e,-, (J45.1) where e,- =~ JV(0, <r ), e- = pej-j + rn, rn are independent and identically distributed 2 t r a n d o m variables and 6 + p a 2 2 = a . Then 2 2 InQ ~ N(/3 + /?i InQ,-, a ). (A5.2) 2 Q Hence, E(Ci) = exp(/? + Pi InQ,- + <r /2), by evaluating the moment generating function 2 0 (m.g.f.) of lnC,- at 1. T h e variance of the high-flow period load is a function involving Var(Q) = E{C ) — 2 E{Ci) and Cov(Q, C ) - E(CiC )-E(Ci)E(C ). E(C ) can be obtained by evaluating 2 i+t i+t 2 i+t the m.g.f. of InCj at 2., so that Var(C,-) = exp^2[/? + /?i InQ; + <r /2]^ - exp^2[/? + /?i InQ,] + a ^. 2 0 E(CiC ) i+t 2 0 c a n be calculated by noting that 44 and that E(CiCi ) = £ ( e x p ( l n C j ) exp(lnCj )) +t +t is the joint moment generating func- tion of the bivariate normal distribution with both arguments equal to 1. Hence, = e x p ( + la+t + (2cr (l + p*))/2) and E(CiC ) 2 w i+t Cov(Ci, C ) = exp^i + in i+t where m = £(lnC,), and m +t = E(\nC ) i+t + a (l + p*)J 2 +t E(Ci)E(C ), i+t are given in (A4.2). The variance of the load can be as indicated in the first paragraph of this appendix. The standard error of the load estimate can be obtained by replacing f?(lnC;), a , and p, with their estimates in i / V a r ( L o a d ) . 2 A.6 M a x i m u m likelihood estimators. As mentioned in Section 3.2a, the unknown parameters in the models (3.5) and (3.6) can be estimated using the maximum likelihood method. However, as this method is computationally more difficult, an alternative method was used. Following is a discussion of the maximum likelihood estimates (m.l.e.) and how they can be obtained. These estimates are adaptations of those in Millard and Guttorp (1985). Suppose for simplicity that there n years of data with T consecutive sampled days in each year and that the true model for \nC it lnC =/3 +0i\nQit + €it, e = pei,t-i + na, 0 it t is t = 2,...T, i = l,...n, (A6.1) where (46.2) {r]t} are iid N(0, 6 ) random variables and 8 +p a 2 2 45 2 2 — c , and e^are independent vectors. 2 The likelihood function is L(a\ p, ft, A ) = ^ r ^ l V l ^ ^ e x p ^ - I e V - ^ , 1 where -A) ft>-- and /£ 0 0 E •• ... \0 0 ..' E J A -inQ ) iT 0\ o with E being the covariance matrix in (A6.2). It should be noted that there are two cases to consider. In one case, only the data during the high-flow period are included in the analysis. In the other case, data over the whole year are analyzed. In the former case, n and T are defined as above. For the latter case, all the years on record can be joined into a sequence of observations, T is the total number of days on record and n is taken to be 1. The maximum likelihood estimates of /? , p and a , can be obtained by solving 2 0 the derivative equations of the logarithm of the likelihood function with respect to these parameters. These equations are simplified to - l 0= X'A~ X a = [(1 - f l (A6A) j )nT\- e'A- e, t 2 {A&.Z) X'A~ Y, l 1 1 and (A6.5) «=1 j=2 where InCn X = V Y = \1 lnQ n T and A — — 1 J n C ,nT / 46 ' a Combining (A6.4) and and simplifying yield a m (A6.5) 2 0 -i (1 + -p 0 0 V o o 0 0 0 o -P (1 + P2) P o r tn ese n -P i - € l = (1 - p ) and calculations, the identities det / ^ (nT)~ Y^H Xr 2 p) 2 0 \ 0 0 -p, (1+P ) 2 1 / -p are used, where ft = E/cr . No explicit forms exist for the m.l.e.'s of these parameters, 2 but they can be obtained using the following iterative algorithm. 1. Estimate /? and /3 using ordinary least squares. 0 X 2. Estimate p using the residuals obtained in the previous step with n n T T EE P=^2^2 iJ iJ-i/^2Y^ hj=2 j=2 e e e »=1 3. Estimate /? and 0 in fa (A6.3) t=l by substituting p with p in A — A(p). 4. Estimate p as in step 2 using the residuals obtained in step 3 . 5. Repeat steps 3 and 4 until estimates of /3 , /?j and p converge. 0 6. Estimate a with a = (nT)" Er=i Ej= <r 2 2 1 2 When the normal model for the errors is true, the method of estimation described above is preferred over the one suggested in Chapter 3. When the error model is an approximation, the method used in Chapter 3 is preferred since it is probably more robust to the normality assumption. The relationship between sediment concentration and water discharge is not dependent on the error model. 47 Xi The observed discharge on day i, Qi. X The first-order estimate of the relative change of discharge on day i. 2 X 3 The time in days to the first hydrograph peak encountered by scanning backwards through the discharge record. X 4 The difference in discharge between the largest flow of the year up to and including day i - l and the flow on day i. X$ The number of days since the major spring rise, defined as the minimum j for which Q(j) > 5mirn< ,Q(n). n< XQ The number of days counted backward from day i for which Q(i - m) < Q(i), limited to a maximum of 731 days. Table 2.1 The prediction factors investigated by Kellerhals et al.. 48 submodel P InQ 2 2010.2 .66 In Q - i o 2 4988.8 .34 abs[rate) 2 5892.8 .24 dslp 2 4561.8 .38 prMadis 2 4830.9 .36 I n Q , In Q - i o 3 344.4 .84 InQ, abs(rate) 3 1774.5 .69 InQ, dslp 3 1841.6 .68 InQ, prMadis 3 505.7 .82 InQ, l n Q _ i o , abs(rate) 4 328.1 .84 InQ, l n Q _ , dslp 4 317.8 .84 InQ, l n Q _ i o , prMadis 4 5.9 .88 5 303.0 .85 I n Q , lnQ_xo, abs(rate), prMadis 5 4.0 .88 InQ, l n Q _ i o , <is/p, prMadis 5 7.5 .88 I n Q , l n Q _ , a6.s(raie), ds/p, prMadis 6 5.6 .88 1 0 I n Q , l n Q _ i o , abs(rate), dslp 1 0 adjusted R 2 Table 3.1 C and adjusted R statistics for some of the better rising stage P 2 models of Mission. 49 submodel P InQ 2 151.6 .81 In Q-io 2 1379.8 .66 abs(rate) 2 5154.4 .20 dslp 2 1777.1 .61 prMadis 2 6676.4 .02 InQ, ln<3_io 3 56.2 .82 InQ, abs(rate) 3 118.1 .81 InQ, dslp 3 151.8 .81 InQ, prMadis 3 133.6 .81 InQ, lnQ_io, abs(rate) 4 29.2 .82 InQ, lnQ_io, dslp 4 39.1 .82 InQ, lnQ_io, prMadis 4 47.6 .82 5 9.3 .83 LnQ, l n Q _ , abs(rate), prMadis 5 17.9 .83 InQ, l n Q _ , dslp, prMadis 5 39.2 .83 InQ, l n Q _ , abs(rate), dslp, prMadis 6 8.4 .83 . InQ, lnQ_io, abs(rate), dslp 10 10 10 c P adjusted R 2 Table 3.2 C and adjusted R statistics for some of the better falling stage sub2 p models of Mission. 50 lag 1 2 3 4 5 6 7 8 9 10 P(0 .6819 .5840 .5017 .4592 .3808 .3473 .3276 .2878 .2785 .2711 pac(i) .6819 .2224 .0794 .0843 -.0258 .0360 .0527 -.0090 .0476 .0390 Table 3.3 Sample autocorrelations and sample partial autocorrelations of the errors in the Mission model. order n E L i ln(l - pac\i)) 1 -1559 2 -1684 3 -1698 4 -1713 5 -1713 6 -1713 7 -1719 8 -1717 9 -1721 10 -1723 + 2k Table 3.4 AIC(k) - nlncr used to select the order of the autoregressive model for the errors in the Mission model. 51 station rising stage model falling stage model Mission InC,- = -0.40 4- 3.40Xi,- + 2.05X - - 1.40lnQ_ - InCi = -9.09+ 1.59 InQ,- ii = 0.83e,_i Hansard InC; = -1.29 4- 1.59lnQ - 0.66lnQ_ ; InC,- = -1.63 + 0.98 InQ,- e,- = 0.76e,_i 2| 10l t 10 Brocket laCi = 0.33 + 1.82 ln Qi - 1.00 lnQ. Walsingham InC,- = 1.90 + 1.11 In Qi tu/iereyi = ( l n Q - 4 ) x ( l - £ ; ) Y2 = (lnQ x E) x ((1 - D) x 4) /0,«yInQ<4 \ l , i / lnQ>4 Table 3.5 Least squares fitted models. 6i InC,- = -6.52 + 0.32Fit + 2.09r i U = 0.83ei_i 2 InC,- = 3.45 + 2.03lnQ,- - 1.51 l n Q . and XI = (InQ - 8) x (1 - D) X2 = (InQ x D) x ((1 - D) x 8) n error model _ / 0 . •'/ l n Q < 8 \ 1 , « 7 lnQ>8 u ii = 0.70e<_i station variable least squares min. abs. res. Huber Andrews Mission intercept XI X2 InQ-io -0.40 3.40 2.05 -1.40 -0.68 3.42 2.09 -1.41 -0.51 3.41 2.05 -1.38 -0.52 3.43 2.05 -1.38 Hansard intercept InQ lnQ_io -1.29 1.59 -0.66 -1.00 1.55 -0.66 -1.30 1.58 -0.64 -1.31 1.5S -0.64 Brocket intercept InQ lnQ_ 0.33 1.82 -1.00 0.51 1.97 -1.23 0.34 1.89 -1.08 0.36 1.91 -1.11 intercept InQ 1.90 1.11 1.91 1.09 1.91 1.10 1.92 1.09 8 Walsingham Table 3.6 Rising stage models' coefficients estimated using least squares, minimum absolute residual, Huber and Andrews regressions. station variable least squares min. abs. res. Huber Andrews Mission intercept InQ -9.09 1.59 -8.97 1.58 -8.93 1.57 -8.93 1.57 Hansard intercept InQ -1.63 0.98 -1.07 0.89 -1.42 0.95 -1.35 0.94 Brocket intercept YI Y2 -6.52 0.32 2.09 -6.73 0.26 2.13 -6.62 0.28 2.11 -6.65 0.29 2.12 Walsingham intercept InQ lnQ_i 3.45 2.03 -1.51 3.46 1.83 -1.32 3.45 1.92 -1.41 3.45 1.91 -1.40 Table 3.7 Falling stage models' coefficients estimated using least squares, minimum absolute residual, Huber and Andrews regressions. 53 station , Mission c ' _ { Po + Pi In Qi + In Q-xoi + \ p' + p[ hi Qi + p In Q_ - + a, 0 2 Hansard for In Q,- > 8 10l for In Qi < 8 \nCi-po subject to continuity In Ci = Po + Pi In Q,- + t at In Qi — 8 fa 0 f /? + AlnQ.+fi, + fi[^Qi-r€i, ' ~ \P'o In d = ft + A In Q,- + Pi In Q_ - + c,- €i = /?£,•_ i + rn In Ci = /? -r Pi lnQi + e,- In Q_ ioi + fi forlnQ,->4 forlnQi<4 8l subject to continuity Walsingham e = pei_i + m + Pi InQ, + fi 0 Brocket error model falling stage model rising stage model lnC,- = ft-r/MnQi + fi In d = Po + Px In Qi atlnQi ti = pe,_i + rji = 4 + & In Q_ + f ,• lf Table 3.8 The proposed models for Mission, Hansard, Brocket and Walsingham. ei = pci_i + m station (£exp(e -))/n t exp(cx /2) Mission 1.0661 1.0626 Hansard 1.1123 1.1067 Brocket 1.4618 1.3893 Walsingham 1.2712 1.2458 2 Table 3.9 Numerical comparison between two estimates of exp(<r/2). 2 55 year estimated annual load published annual load standard error c.v. 1966 21.5252 x 10 22.7491 x 10 2.3201 X 10 10.77 1969 15.8766 x 10 17.3299 x 10 1.7522 x IO 11.03 1972 31.5177 x 10 34.1606 x 10 3.6349 x 10 11.53 1974 25.4903 x 10 27.5183 x 10 2.8044 x 10 s 11.0 1976 26.8747 x 10 27.4562 x 10 2.6212 x 10 6 9.75 1979 12.3572 x 10 15.0096 x 10 1.4485 x 10 11.72 1982 21.2988 x 10 25.5634 x 10 2.3096 x 10 10.84 s 6 s 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 Table 4.1 Estimated loads vs. published data for Fraser at Mission, (c.v. is the coefficient of variation). year estimated annual load published annual load standard error c.v. 1972 4.64268 x 10 s 4.787251 x 10 0.69344 x 10 14.90 1974 3.84450 x 10 6 2.90580 x 10 0.53203 x 10 13.83 1976 3.64673 x 10 3.36528 x 10 0.40598 x 10 11.13 1978 1.57306 x 10 1.75841 x 10 0.17586 X 10 11.17 1980 1.98128 x 10 s 1.72463 x 10 0.21311 x 10 10.75 1981 2.21504 x 10 6 1.58250 x 10 0.31622 x 10 14.27 1982* 4.10280 x 10 4.26429 x 10 0.51429 x IO 6 6 s 6 6 6 6 6 6 6 6 6 6 6 6 6 6 Table 4.2 Estimated loads vs. published data for Fraser at Hansard, (c.v. is the coefficient of variation). * figures given for period between January and September only. 56 12.53 year estimated annual load published annual load standard error cv. 1967 825659 996061 392730 47.56 1968 82606 85679 34615 41.90 1969 248710 274020 110001 44.22 1972 596555 753428 270590 45.35 1974 291576 306314 108947 37.36 1975 785850 1222067 516770 65.75 1978 143026 221912 47477 33.19 Table 4.3 Estimated loads vs. published data for Oldman near Brocket, (cv. is the coefficient of variation). year estimated annual load published annual load standard error cv. 1968 26585 31258 6400 24.07 1971 13747 14813 2433 17.69 1973 28807 24747 5306 18.41 1977 31628 37281 5111 16.15 1979 36017 30422 6318 17.54 1981 22241 20476 3570 16.05 1983 26942 23323 3076 11.41 Table 4.4 Estimated loads vs. published data for Big Creek near Walsingham. (cv. is the coefficient of variation). 57 station Mission Hansard Brocket Walsingham* year cross-validation estimate estimate based on all years 1966 21.2879 x 10 21.0585 x 10 6 1972 31.1083 x 10 31.1629 x 10 6 1979 11.8456 x 10 12.0256 x 10 6 1974 3.9557 x 10 3.8235 x 10 1978 1.5295 x 10 1982 4.1303 x 10 4.1641 x 10 1967 822611 824675 1972 581562 595551 1978 139010 142139 1973 29078 28807 1979 36782 36017 1983 27306 26942 6 6 6 6 6 6 1.5451 x 10 6 Table 4.5 Cross-validation estimates vs. estimates based on all years. * figures given for the whole year. 58 6 6 p 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 6 0.28590 0.32132 0.35291 0.38095 0.40577 0.42771 0.44713 0.46433 0.47958 0.49313 £(exp(€)) 1.06422 1.06384 1.06357 1.06335 1.06318 1.06304 1.06293 1.06283 1.06275 1.06268 Table 4.6 Sensitivity of the estimated bias correction factor to the change in the generalized Gaussian parameter p for Mission. P 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 b 0.66489 0.74726 0.82073 0.88593 0.94365 0.99469 1.03984 1.07983 1.11530 1.14682 £(expTe)) 1.45798 1.43859 1.42546 1.41597 1.40879 1.40318 1.39867 1.39498 1.39190 1.38930 Table 4.7 Sensitivity of the estimated bias correction factor to the change in generalized Gaussian parameter p for Brocket. 61 5 5 5 5 5 5 6 e 5 5 4 4 \ .44 4 c 5 10 5 s 5 5 5 5" 6 ^ 5 3 A# 7 7T_ 10 10 3 ^ 8 »10 8 12 W 1 1 2 J 11 2| 3 1 8 _8J 7 7 7 9 12112 23 6.5 7.0 7.5 8.0 8.5 9.0 InQ Figure 3.3 Scatter plot of In Q vs. In C for Mission 1973, by month. 63 9.5 CO CO o 'TV* in c • • • • • •/VV u • *? -J- X. _ • • co Iowess f=.2 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 InQ Figure 3.4a Scatter plot of InQ vs. InC for the rising stage of Mission. oo CO h CO r O ...•V . 3 .*• * * • Iowess f=.2 6.5 7.0 7.5 8.0 8.5 9.0 9.5 lnQ_10 Figure 3.4b Scatter plot of l n Q _ 64 10 vs. InC for the rising stage of Mission. to 8 pred Figure 3.5a Predicted l n C vs. Studentized residual for the rising stage of Mission. OJ r- o h CM L CO 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 InQ Figure 3.5b In Q vs. Studentized residual for the rising stage of Mission. 65 -4 -3 -1 rank Figure 3.6a Normal probability plot for the residuals of the rising stage, Mission. CD CO o CM h L -3 -2 rank Figure 3.6b Normal probability plot for the residuals of the falling stage, Mission. 66 in e(t-1) Figure 3.7a Scatter plot of e(t - 1) vs. e(i) for Mission. 67 e(t-2) Figure 3.7b Scatter plot of e(i — 2) vs. e(i) for Mission. 68 sample partial autocorrelation -0.2 0.0 0.2 0.4 0.6 sample autocorrelation 0.8 0.2 0.3 0.4 0.5 0.6 0.' CO CD 1 ' 2 3 4 5 6 7 pred Figure 3.10a Predicted l n C vs. Studentized residual for the rising stage of Mission. 70 CO •.* - •••• .•• V .V CO I 6 - 5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 InQ Figure 3.10b InQ vs. Studentized residual for the rising stage of Mission. 71 CO .1 CO CO t CO 6.5 7.0 7.5 8.0 8.5 9.0 9.5 InQjO Figure 3.10c l n Q _ i vs. Studentized residual for the rising stage of Mission. 0 72 to CD CO Cxi L_ pred Figure 3.11a Predicted InC vs. Studentized residual for the falling stage of Mission. 73 CO CM CO CO CM 7.0 7.5 8.0 8.5 9.0 9.5 10.0 LnQ Figure 3.11b InQ vs. Studentized residual for the falling stage of Mission. 74
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Statistical modelling of sediment concentration
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Statistical modelling of sediment concentration Thompson, Mai Phuong 1987
pdf
Page Metadata
Item Metadata
Title | Statistical modelling of sediment concentration |
Creator |
Thompson, Mai Phuong |
Publisher | University of British Columbia |
Date Issued | 1987 |
Description | One technique that is commonly used to replace the costly daily sampling of sedimentconcentration in assessing sediment discharge is the "rating curve" technique. This technique relies on the form of the relationship between sediment concentration and water discharge to estimate long-term sediment loads. In this study, a regression/time-series approach to modelling the relationship between sediment concentration and water discharge is developed. The model comprises of a linear regression of the natural logarithm of sediment concentration on the natural logarithm of water discharge and an autoregressive time-series of order one or two for the errors of the regression equation. The main inferences from the model are the estimation of annual sediment loads and the calculation of their standard errors. Bias correction factors for the bias resulted from the inverse transformation of the natural logarithm of sediment concentration are studied. The accuracy of the load estimates is checked by comparing them to figures published by Water Survey of Canada. |
Subject |
Sedimentation and deposition -- Mathematical models |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-07-19 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0097224 |
URI | http://hdl.handle.net/2429/26648 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-UBC_1987_A6_7 T46_4.pdf [ 3.72MB ]
- Metadata
- JSON: 831-1.0097224.json
- JSON-LD: 831-1.0097224-ld.json
- RDF/XML (Pretty): 831-1.0097224-rdf.xml
- RDF/JSON: 831-1.0097224-rdf.json
- Turtle: 831-1.0097224-turtle.txt
- N-Triples: 831-1.0097224-rdf-ntriples.txt
- Original Record: 831-1.0097224-source.json
- Full Text
- 831-1.0097224-fulltext.txt
- Citation
- 831-1.0097224.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0097224/manifest