STATISTICAL MODELLING OF SEDIMENT CONCENTRATION by MAI PHUONG THOMPSON B.Sc, The University of British Columbia, 1984 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (The Department of Statistics) We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA August, 1987 ©Mai Phuong Thompson, 1987 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Sfelh'shcS The University of British Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 DE-6(3/81) ABSTRACT One technique that is commonly used to replace the costly daily sampling of sed-iment concentration in assessing sediment discharge is the "rating curve" technique. This technique relies on the form of the relationship between sediment concentration and water discharge to estimate long-term sediment loads. In this study, a regression/time-series approach to modelling the relationship between sediment concentration and water discharge is developed. The model com-prises of a linear regression of the natural logarithm of sediment concentration on the natural logarithm of water discharge and an autoregressive time-series of order one or two for the errors of the regression equation. The main inferences from the model are the estimation of annual sediment loads and the calculation of their standard errors. Bias correction factors for the bias resulted from the inverse transformation of the natural logarithm of sediment concentration are studied. The accuracy of the load estimates is checked by comparing them to figures published by Water Survey of Canada. 11 T A B L E O F C O N T E N T S A B S T R A C T i i T A B L E O F C O N T E N T S i i i L I S T O F T A B L E S v L I S T O F F I G U R E S v i A C K N O W L E D G E M E N T v i i C H A P T E R 1 I N T R O D U C T I O N 1 1.1 Outl ine of Study 3 1.2 Description of Da ta 3 a. Sediment sampling stations 3 b. Suspended sediment concentration and load 4 c. Discharge 5 1.3 Variable Names and Descriptions 5 C H A P T E R 2 R E V I E W O F S O M E P R E V I O U S W O R K O N R A T I N G C U R V E S 7 2.1 Kellerhals et al.'s (1974) Consult ing Report 8 2.2 Wal l ing and Webb (1981) 8 2.3 Smill ie and K o c h (1984) and Ferguson (1986) 11 2.4 Church, Kellerhals and Ward's (1985) Consulting Report 12 2.5 Discussion 15 C H A P T E R 3 S T A T I S T I C A L A N A L Y S I S A N D M O D E L L I N G 18 3.0 Introduction 18 3.1 Exploratory Data Analysis for Miss ion Data 19 a. A basic model 20 b. The seasonal effect 20 in c. Other factors 21 d. The "dog leg" effect 23 e. Outliers 25 f. M o d e l for the errors 25 3.2 M o d e l for Miss ion Data 28 a. The proposed model 28 b. Mode l adequacy and assumptions 28 3.3 Est imat ion of The Annua l Sediment Load 29 3.4 Other Da ta Sets 31 C H A P T E R 4 C O M P A R I S O N O F E S T I M A T E D L O A D S W I T H P U B L I S H E D D A T A 34 R E F E R E N C E S 36 A P P E N D I X 39 A . I Bias Correction Factor for C wi th Normal Errors 39 A . 2 A Comparison of T w o Estimated Bias Correction Terms 40 A.3 Bias Correction Factor for C wi th Generalized Gaussian Errors . . . 41 A . 4 Integration Process for Sediment Load Est imat ion 42 A . 5 Expectat ion and Variance of the Sediment Load 43 A.6 M a x i m u m Likel ihood Estimators 45 I V LIST OF TABLES TABLE 2.1 The prediction factors investigated by Kellerhals et al 48 3.1 Cp and adjusted R2 for rising stage submodels of Mission 49 3.2 Cp and adjusted R2 for falling stage submodels of 50 3.3 Sample autocorrelations and sample partial autocorrelations of the errors in the Mission model 51 3.4 AIC(k) - nine-2 for the errors in the Mission model 51 3.5 Least squares fitted models 52 3.6 Rising stage model's coefficients estimated using least squares, minimum absolute residual, Huber and Andrews regressions . . . . 53 3.7 Falling stage model's coefficients estimated using least squares, minimum absolute residual, Huber and Andrews regressions . . . . 53 3.8 The proposed models for Mission, Hansard, Brocket and Walsingham 54 3.9 Numerical comparison between two estimates of exp(<r2/2) 55 4.1 Estimated loads vs. published data for Fraser at Mission 56 4.2 Estimated loads vs. published data for Fraser at Hansard 56 4.3 Estimated loads vs. published data for Oldman near Brocket . . . . 57 4.4 Estimated loads vs. published data for Big Creek near Walsingham . 57 4.5 Cross-validation estimates vs. estimates based on all years 58 4.6 Sensitivity of the bias correction factor to the change in the generalized Gaussian parameter p for Mission 59 4.7 Sensitivity of the bias correction factor to the change in the generalized Gaussian parameter p for Brocket 59 v L I S T O F F I G U R E S F I G U R E 2.1 A n example where the shifting rating curve can be used 60 3.1 Q and C vs. day for Miss ion 1973 61 3.2 I n Q and I n C vs. day for Mission 1973 62 3.3 Scatter plot of I n Q vs. I n C for Mission 1973, by month 63 3.4a Scatter plot of I n Q vs. I n C for the rising stage of Mission 64 3.4b Scatter plot of I n Q _ i 0 vs. I n C for the rising stage of Miss ion . . . . 64 3.5a Predicted I n C vs. Studentized residual for the rising stage of Miss ion 65 3.5b I n Q vs. Studentized residual for the rising stage of Miss ion . . . . 65 3.6a Normal probabili ty plot for the residuals of the rising stage, Miss ion 66 3.6b Normal probabil i ty plot for the residuals of the falling stage, Mission 66 3.7a Scatter plot of e(t — 1) vs. e(t) for Mission 67 3.7b Scatter plot of e(t — 2) vs. e(t) for Miss ion 68 3.8 Lag vs. sample autocorrelation of residuals, Mission 69 3.9 L a g vs. sample part ial autocorrelation of residuals, Miss ion 69 3.10a Predicted I n C vs. Studentized residual for the rising stage of Miss ion 70 3.10b I n Q vs. Studentized residual for the rising stage of Miss ion . . . . 71 3.10c I n Q _ i o vs. Studentized residual for the rising stage of Miss ion . . . 72 3.11a Predicted I n C vs. Studentized residual for the falling stage of Miss ion 73 3.11b I n Q vs. Studentized residual for the failing stage of Mission . . . . 74 V I ACKNOWLEDGEMENT I would like to thank Doctor H. Joe for his guidance and assistance in producing this thesis. I am indebted to Doctor K. Knight for his careful reading and helpful criticisms of this thesis. I would like to express my gratitude to Doctor M. Church, who kindly offered advice on the hydrological aspects and carefully read this thesis. This thesis is funded by Canada Department of Supply and Services for the Sediment Survey Section, Water Survey of Canada, Water Resources Branch, In-land Waters Directorate, Environment Canada, by a contract to Doctor M. Church, Geography Department, U.B.C. V l l CHAPTER 1 INTRODUCTION This thesis consists of a statistical study and modelling of sediment transport observations. The basic issue is the form of the relationship between streamflow and the concentration of suspended sediment. Important inferences from the model includes the estimation of long term suspended sediment loads and the calculation of their standard errors. Information on suspended sediment load has many uses, including: 1. design and operation of reservoirs, 2. design of canals, diversion works and drainage ditches, 3. prediction, evaluation and control of deposition of natural channels, 4. evaluation of erosion control measures, 5. design of treatment facilities for water supply, 6. improvement of fish habitat. More details on the purposes of estimating suspended sediment load can be found in the consulting report by Church et al. (1985). 1 Sediment data come from sediment and hydrometric stations. In 1985, there were approximately 185 (M. Cashman, Sediment Survey of Canada, personal communica-tion, 1987) active suspended sediment stations in Canada for which daily, monthly and yearly suspended sediment concentrations and corresponding suspended loads were being measured. All these stations were operated in conjunction with hydro-metric stations which generally predate the corresponding sediment stations by a number of years. The sediment records cover periods from 1 to a maximum of 23 years while corresponding discharge records cover over 72 years. There are presently over 3000 (Kellerhals et al. (1974)) operating hydrometric stations in Canada, thus exceeding the number of sediment of sediment stations by an order of magnitude. There are many similarities between the operation of a sediment station and that of a hydrometric station. However, the hydrometric station costs much less to operate and is a prerequisite for the suspended sediment station. The cost-differential is due to the many more frequent visits, and the laboratory analysis of samples that are required in the operation of the suspended sediment stations. Because of rapidly increasing costs, it is becoming more difficult to justify the continued operation of many of the daily sediment sampling stations on streams and rivers. In these circumstances, use of a relationship between suspended sediment transport and water discharge can reduce the need to sample daily sediment concen-trations and hence reduce costs. Hence, the purpose of the thesis is to explore model structure and performance for inferring sediment load from discharge data. Section 1 of this chapter gives a brief outline of the thesis. Section 2 provides a description of the data used in this study and Section 3 presents the definitions of relevant hydrological terms. 2 1.1 OUTLINE OF STUDY This thesis is organized as follows. A review of some of the studies relevant to the present project is presented in Chapter 2. Included are notices of the statistical shortcomings of previous research papers. In Chapter 3, the relationship between the mean daily sediment concentration, C, and the mean daily discharge, Q, is investigated. (Mean daily values are stud-ied because they correspond wi th the resolution of normally available flow data.) Modificat ion of the model lnCt = /?0 + AlnQt+e, where {et} are random errors, provides an adequate fit to the data. Various discharge-related variables are examined as possible additional predictors. The final model is fitted using least-squares regression. A n analysis of the residuals shows that the model errors are serially correlated, so {et} is modelled as a first-order autoregressive process. The model is used for estimating the annual sediment loads of a river together wi th the standard errors. Detailed calculations are given at the end of Chapter 3. Final ly , i n Chapter 4, summaries of estimates are given and the estimated annual loads are compared wi th those calculated by the Water Survey of Canada by direct integration from frequent observations. 1.2 DESCRIPTION OF DATA a) Sediment sampling stations 3 Four sediment stations are considered for this study. T w o of the stations are located on the Fraser river, one at Miss ion (08MH024) and the other at Hansard (08KA004) . Of the remaining two, one is located on the Oldman River , near Brocket (05AA024), in Alber ta and the other is located on the B i g Creek, near Walsingham (02GC007), in Ontario. The codes in parentheses are station reference codes used by the Water Survey of Canada. Different geographical sites are picked for the purpose of checking to see if a general modelling approach remains valid in a variety of geographical locations wi th different characteristics. b) Suspended sediment concentration and load The equipment, methods and procedures used by the Water Survey of Canada in obtaining suspended sediment records have been described by Stichling (1969, 1973). The published data consist of average daily values for suspended sediment load, L, in tonnes per day, and suspended sediment concentration, C, in mg/l. It should be noted that instantaneous concentration, C7, is the parameter ac-tually determined in the field by a sampling program. The rivers are sampled at irregular intervals, ranging from fractions of a day up to one month, depending on the transport rate and its variability. A continuous concentration graph is obtained by interpolating between the ob-served CI values. This interpolation relies mostly on the stage records of the hydro-metric station that is always associated with the sediment stations, but many other factors are also considered. A n average concentration, C, is determined for each 24-4 hour interval and combined with the mean daily discharge, Q, in cubic metres per second, to give the daily transport rate according to the following equation L (tonnes per day) = 0.0864 Q (m 3 /s) C (mg/l) Ideally CI should be used because C is based on the stage record and therefore also the discharge. However, for the present sites, CI and the corresponding discharge QI were not readily available. In order to minimize the spurious effects of discharge, all those C values for days on which no samples had been collected, were omitted from the analysis. c) Discharge The basic data used are the mean daily flows as published by the Water Survey of Canada. 1.3 V A R I A B L E NAMES AND DESCRIPTIONS Sediment is non-aqueous material transported by, suspended in or deposited by a flowing stream. Suspended sediment or suspended load is solid material that moves in suspension in water, either as a colloid or through the influence of the upward component of turbulent currents. Suspended sediment concentration is the ratio of the weight of dry solids in a water-sediment mixture to the volume of the mixture. Instantaneous concentration is the above ratio of a sample taken instantaneously at a fixed point in a river cross section. 5 Some relevant variable names and their descriptions are given below. These notations are used in Chapters 2 and 3. C Mean daily concentration of sediment. Q Mean daily discharge. CI Instantaneous concentration of sediment. QI Instantaneous discharge. Cm Mean concentration of sediment for month m. Qm M e a n discharge for month m. Qr M e a n discharge for the period of record. Qpi M e a n discharge for interval between samples. I n C Na tura l logarithm of the mean daily concentration of sediment. I n Q Na tura l logari thm of the mean daily discharge. I n Qi Natura l logarithm of the mean daily discharge for day i. ln<5_j Na tura l logarithm of the lag i mean daily discharge. L Tota l load for the period i n context. K Correction factor to take account of period of record. n Number of samples. 6 C H A P T E R 2 R E V I E W O F S O M E P R E V I O U S W O R K O N R A T I N G C U R V E S Rat ing curves for sediment concentration based upon discharge have many uses, but their most extensive use is in estimating the sediment yield. Such rating curves are produced by fitting a function to the scatter plot of the sediment concentration transformation, which may be a daily, monthly or yearly average, against the corre-sponding discharge transformation. Commonly, log-transforms are used. Since the introduction of the rating curve by Campbel l and Bauder (1940), much progress has been done. A brief chronological review of some of the more recent work on rating curves and sediment load estimation is presented in Sections 2.1 to 2.4. Included are studies by Kellerhals, Abrahams and Von Gaza (1974), Wal l ing and Webb (1981), Smillie and K o c h (1984) and Church, Kellerhals and Ward (1985). Kellerhals et al (1974) and Church et al. (1985) are mainly concerned wi th the development of a model for the sediment concentration, which may be a daily, monthly or yearly average, based on the corresponding discharge and some discharge-related variables. Wal l ing and Webb (1981) assessed the performances of six frequently used methods of estimating long term sediment load and the reliability of load estimates obtained using rating curve technique. Smill ie and K o c h (1984) derived a bias correction factor, based on a normality assumption, for estimating the sediment concentration in the usual logarithmic model employed in a l l of the other studies. 7 2.1 K E L L E R H A L S ET AL.'S (1974) CONSULTING REPORT Using only the data of the high-flow period of five Western Canadian stations, Kellerhals et al. tested the effectiveness of the six factors in Table 2.1 and their logarithmic, square and square root transformations, when appropriate, in predicting C in a linear regression model of the form I n C = A, + p\ I n * ! + /3 2X 2 + .. . + +ppXp + e. Through stepwise regression, they discovered that except for the station with the smallest drainage basin, is the dominant and sufficient factor. This agrees with the findings of prior workers such as Abraham (1969) and Church (1972). To test the rating equations, Kellerhals et al. calculated the estimated load for the high-flow period of each year and found that these estimates were generally too low. Furthermore, for virtually every year on record, the highest concentration and the mean of the ten highest concentrations observed generally exceed the corresponding concentrations computed from the rating curves. The study was not carried any further due to the difficulties faced by the authors in dealing with serially correlated residuals. The general underestimation found by Kellerhals et al. could be attributed to the model being too simplistic and most likely to the bias resulted from inverting from I n C to C , as will be discussed later. As for the underestimation of the highest concentration and the mean of the ten highest concentrations, it should be noted that rating curves predict means, not extremes. 2.2 WALLING AND WEBB (1981) 8 Walling and Webb used a continuous record of sediment concentration from the River Creedy in Devon, U.K. to assess the accuracy and precision of some commonly used methods of estimating long-term sediment load. The authors divided these methods into two groups: one involving the use of the rating curves and the other, involving more direct approaches of estimation, namely 1. Total load = tf(£?=i CIi/n)(£?=i QU/n) 2. Total load = K £ " = 1 (C/ t)(Q/,)/n 3. Total load = # Q7£?= 1 C I i / n 4. Total load = KQr (E?=I(C/«)(Q^)/ £?=I(QA)) 5. Total load = K J27=i(CIi)Qpi 6. Total load = K £ ^ = 1 CmQm The variables are defined in Section 1.3. To examine the accuracy and precision of the methods in the latter group, Walling and Webb applied the methods to replicate sets of data representing sys-tematic sampling at one, two, four, seven, ten and fourteen day intervals. The au-thors argued that although methods 2 and 4 were by far the more accurate ones, other methods, particularly 1 and 3, yielded smaller variances and therefore were preferrable. However, it should be noted that the greater precision yielded by meth-ods 1, 3, 5 and 6 is an artifact of their having averaged out some of the variability. Furthermore, these methods are justified only in the case where C or Q is approx-imately constant within the periods over which the averages were taken. Positive correlation between C and Q would result in underestimation of the total loads. Since 9 for a positive correlation between C and Q, E(QC) > E(C)E(Q); hence on the average, the product between the average sediment concentration and the average flow is less than the average of the products of the two. The great increase in variability of methods 2 and 4 as the sampling interval increases is due to the fact that in systematic sampling, longer interval means fewer points and poorer precision. It should also be noted that systematic sampling may give poor precision when periodicity is present (Cochran (1977)). In evaluating the load estimates obtained using rating curves, four data sets from a seven year record, each containing fifty replicates representing four sampling strategies were assembled. The resultant rating curve relationships were applied to several frequently used load calculation procedures. The sampling strategies employed to generate the replicate data sets included regular sampling at weekly intervals and coverage of storm events with additional random sampling when flows exceeded certain thresholds. Fifty replicate relationships of the form C = aQb where Q is the instantaneous discharge at the time of sampling, were calculated using least squares regression. In addition, two of the data sets were subdivided into four stages: winter and rising stage, winter and falling stage, summer and rising stage, summer and falling stage; four separate relationships were developed for each stage. The results showed that accuracy and precision increased with the coverage of storm events. Most noteworthy is the discovery that the use of rating relationships subdi-vided according to season and stage tendency produced an increase in the accuracy of the estimate provided by a particular sampling strategy. 10 In general, all four strategies underestimated the total load for the seven year period. This could be the result of poor sampling and/or poor modelling in addition to the lack of a bias correction factor. 2.3 S M I L L I E A N D K O C H (1984) A N D F E R G U S O N (1986) For years, hydrologists have overlooked the bias in their estimation of the sedi-ment loads or concentrations that resulted from simply transforming the fitted model lnC = a + 6 InQ to obtain C = exp(a)Q* Smillie and Koch (1984) and Ferguson (1986) showed that if certain assumptions were satisfied, this bias could be easily corrected. Smillie and Koch illustrated this biased behaviour for data on the Yampa River, near Maybell and the Little Snake River, at Lilly Park, in Northwestern Colorado, U.S.A., but found that for both rivers, the bias correction factor derived overcorrected the estimates of the annual sediment loads. To further illustrate the tendency towards bias and to test the ability of the correction factor to improve a prediction, Smillie and Koch created a data set for which the following properties were satisfied: 1. A log-transformed linear relationship exists between the two variables. 2. Errors are normally distributed. 11 3. There is temporal stationarity in the system. As in the case with the real data, the uncorrected model underpredicted the average of the dependent variable. On the other hand, the use of the bias correction factor improved the estimates considerably and no sign of overcorrection was detected. The consistent overestimation noted earlier for the Yampa River and the Little Snake River could be the result of the following possibilities: • The data collected were not representative. • The model used was too simplistic. • The normality assumption was not satisfied. Smillie and Koch suggested that the serial correlation between the errors might have caused the overcorrection by the bias correction factor, but as shown in Appendix A.4, this should affect only the variance of the estimate and not the estimate itself. Ferguson applied the bias correction factor to sixteen simulated and real data sets. For the nine simulation experiments, he obtained corrected estimates that lie within three percent of the true load. For the seven real data sets, the corrected estimates range from 91 to 104 percent of the true load. So, when the normality assumption is reasonable, the bias correction factor derived appears to be an adequate one. 2.4 CHURCH, KELLERHALS AND WARD'S (1985) CONSULTING REPORT 12 As in Walling and Webb (1981), Church, Kellerhals and Ward divided long-term sediment load estimation into two groups: one involving the use of rating curves and the other involving more direct approaches. In the latter group, the authors divided the methods into three main approaches 1. Total load = K £ " = 1 C.-Qj/n 2. Total load = # ( £ " = i C,-/n)(E£=i Qj/m) 3. Total load = K £ ^ 2 = 1 CmQm They dismissed the second and third approaches for reasons similar to those men-tioned in Section 2.2 and recommended the first approach, it being the only "unbi-ased" one. It should be noted that the first approach is not necessarily unbiased if sampling is not random. The second approach is valid only if Q is of no use in predict-ing C. The third approach is useful where C and Q are approximately "independent" within the time periods over which they were averaged. Turning to the methods involving rating curves, Church, Kellerhals and Ward used data from six Western Canadian stations to run the following regressions: 1. Daily concentration vs. daily flow for all sampled days in each year, 2. As in 1 but including only every second, every fifth, or every tenth sample, 3. Daily concentration vs. daily flow for all samples in one, two, three and five years combined, 4. Annual sediment load vs. annual water volume for complete years of record. 13 In the analysis 1, the data of a year were used to derive a rating equation and this equation was in turn used to estimate the year's total sediment load. The results were very close to those figures provided by the Water Survey of Canada for the stations wi th big drainage areas and were much less accurate for those wi th small drainage areas. This is in agreement wi th the findings of other workers (Kellerhals et al. (1974)). The estimates, however, were generally smaller than the ones published by the Water Survey of Canada. This general underestimation was corrected when the bias correction factor suggested by Smillie and K o c h (1984) was used. In the analysis 2, the results were comparable to those obtained in the analysis 1. The authors concluded that if long-term sediment yield was the main object of observation, twenty to thir ty samples a year were easily sufficient. There was no systematic change in the quality of the annual load estimates obtained using the rating curves derived from two, three or five years in the analysis 3 in comparison wi th those obtained in the analysis 1, where annual rating curves were used. This suggested that the annual relationship between water discharge and sediment concentration does not vary drastically from year to year so that using the rating curve developed from data of past years to estimate future annual loads from flow data is not out of line. This would, however, be subject to vagaries of the long-term behaviour of the contributing drainage basin. The analysis 4 showed good results for stations wi th large drainage areas. The procedure, however, was not very useful for stations with small drainage areas. Church, Kellerhals and Ward went further by examining flow during restricted por-tions of the year for one of the stations wi th larger drainage areas. The combinations were: M a y - J u n e , M a y - J u l y , A p r i l - J u l y , Apr i l -Augus t , May-September, and A p r i l -14 October. The authors discovered that all combinations yielded better R2 than the an-nual flow. This suggested that the linear relationship assumed between the logarithm of the sediment load and that of the flow is more of a high-flow period relationship than an annual one. These R2 values should not be used as a means of comparisons between the portions of the year as time periods over which the modelling should be done since the dependent variable in each case is different. In the end, Church, Kellerhals and Ward suggested a new method, to be used in case of sparse data, called the shifting rating curve. This method suggested that all available data be used to compute the best fitting curve of the form C = aQb. Then assuming that 6 is fixed, a is recalculated for each sampled concentration of sediment as follows a. = CiQi~b Linear interpolation was used to estimate the coefficient between samples. This scheme was tested on for two stations, one using every tenth sample and the other us-ing every fifth sample. These same samples were employed to derive the yearly rating curves, and comparisons amongst the annual loads computed by this new method, by the fixed yearly rating curve and by the Water Survey of Canada were carried out. The results showed substantial improvement in the annual loads estimated by the new method over those derived from the single year, fixed rating curves method. However, this method does not have a solid foundation. Furthermore, an implicit assumption is that the data be of the form shown in Figure 2.1. 2.5 DISCUSSION With the exception of Smillie and Koch (1984), all investigators had problems with underestimating the load, even with the bias correction, when assuming that 15 the daily relationship between l n C and InQ over the year or over the high-flow period can be described by the model l n C = /? 0 + /? i lnQ + e (2.1) Kellerhals et al. (1974) tried including several other independent variables but the results showed that only for rivers with small drainage areas did the addition of mostly one variable increase the percent of explained variation in l n C . Even then, there was no obvious second best factor. Assumptions required for the use of the adopted statistical model were mentioned by Kellerhals et al, but none of the papers reviewed in this chapter reported on whether the data satisfied the assumptions. Also, checks for the inadequacies of the models used were not mentioned in any of the papers. Some authors noted that the residuals were highly serially correlated but the problem was not pursued. With a record of continuous data, Walling and Webb (1981) were able to create replicate sets of data and obtain the sample standard deviation of the load. With the correct procedures and a sampling interval of one day, they obtained at best a standard deviation of about ten percent of the load. None of the investigators attempted to calculate the standard error of the load estimate based on a model. In this study (described in Chapter 3), the relationship between the mean daily sediment concentration and the mean daily discharge is also found to be well repre-sented by the model (2.1). However, as Walling and Webb (1981) have discovered, the fit is much improved when the seasonal effect is taken into account. Here, the data are split into two parts: before and after the day of the peak flow, and a different relationship is fitted to each set. 16 Following the work of Kellerhals et al. (1974), several discharge-related variables are included in the the relationship mentioned in the preceding paragraph. Of these, only the inclusion of a certain lag of the mean daily discharge is found to increase the portion of explained variance in In C by a significant amount, and even then, only for one of the relationships. As noted by Kellerhals et al. (1974), Smillie and Koch (1984), and Church et al. (1985), the residuals are positively serially correlated. This is not unexpected as the observations are ordered in time, hence adjacent cases influence each other. Analysis of the residuals suggests that the errors be modelled using a first-order autoregressive model et = pet-i + Vt, where are independent and identically normally (i.i.d.) distributed random vari-ables with mean zero and constant variance. The standard assumption in regression analysis that the errors, e, are uncorrelated no longer applies. This violation of the uncorrelated errors assumption does not affect the bias cor-rection factor for 6, as suggested by Smillie and Koch (1984). The bias correction factor remains exp(o-2/2) in the case of normally distributed error. However, the calcu-lation of the load variance is very sensitive to the assumption of uncorrelated errors, as shown in Appendix A.4. Up to now, only the bias correction factor for the case of normally distributed errors has been mentioned in the rating curves literature. In practice, the normality assumption may not always hold. In Section 3.3 and Appendix A.2, the sensitivity of the bias correction factor to the normality assumption is studied. Also a nonpara-metric estimate of the bias correction factor is introduced. 17 CHAPTER 3 STATISTICAL ANALYSIS AND MODELLING 3.0 I N T R O D U C T I O N The models used in this chapter, which are based on the exploratory data analysis described in Section 3.1, have the form lnC* = F(lnQu •••) + £«, £* = pe«_i + i/ t > (3.1) where {r)t} are independent and identically distributed error terms. That is, l n C ( is some function of In Qt and some other variables, plus an error term, which is serially correlated in time. The model is detailed in Section 3.2. One important inference from the model is the estimation of the sediment load for the high-flow period (Section 3.3). This involves a bias correction factor from the inverse transformation of the first term in (3.1). Some theory behind the bias correction factor is presented in Appendices A.I, A.2 and A.3. Sections 3.1 and 3.2 consist of a detailed analysis and modelling of the Mission data. The Mission data are the most complete and "densely" collected set of data among the four stations under study, and perhaps in Canada. For this reason, a model developed for Mission can be used as a guide for modelling other stations, in particular Hansard, Brocket and Walsingham. Section 3.4 summarizes the analysis of these other three data sets. Most statistical analyses were carried out using S and SAS. S is especially useful 18 for exploratory data analysis; the main reference for it is Becker and Chambers (1984). 3.1 EXPLORATORY DATA ANALYSIS FOR MISSION DATA To study the relationship between C and Q in relation to time, the yearly plots of C and Q versus time are examined. As shown in Figure 3.1, the variation of C and Q by several orders of magnitude makes it very difficult to identify the relationship between the two. To facilitate this task, the plots of the natural logarithm of C , I n C , vs. the natural logarithm of Q , I n Q , are inspected. These plots, as presented in Figure 3.2, show that there exists a relationship between I n C and I n Q during the high-flow period, usually between March and October; but no clear pattern can be detected during the low-flow period between November and Febuary (due to few observations). In addition, these plots indicate that the relationship between I n C and I n Q varies from the rising stage of the high-flow period to the falling stage of the same period. This observation is made obvious by the yearly plots of I n C vs. I n Q , with each point identified by the month in which it was sampled. As shown in Figure 3.3, these plots reveal a systematic annual hysteresis. That is, the trend of I n C vs. I n Q is reasonably consistent between June and October and in March, but between April and May, departs toward substantially higher I n C for a given I n Q . For the remaining months, there appears to be no trend between I n C and I n Q . Due to having few observations and a lack of a clear pattern between I n C and I n Q in the low-flow months, only data from the high-flow period, between March and October, are included in further analyses. This omission should not affect the estimation of the annual sediment load as the amount of sediment transport during these low-flow months is observed in plots such as Figure 3.1 to contribute very little 19 to the year's total load. A sample average of the mean of daily sediment concentration multiplied by the total flow will suffice as an estimate of the total sediment transport for these low-flow months. Also omitted from the data are all samples from 1980. As a result of the very unusual weather in that year. The data do not follow the general pattern exhibited by other years. For example, the plot of l n C vs. I n Q for this year is very different pattern. a. A basic model There is no theory which determines the form of the sediment rating curve. For suspended sediment, nearly all investigators (those whose papers are reviewed in Chapter 2 and references therein) have used a power law Ct = aQth exp(cf) (3.2) where {et} are random errors. This relationship is no doubt inspired by the linear trend discovered when plotting l n C against I n Q . That is, by applying the natural logarithm to both sides of equation (3.2), a linear model of the form I n Ct = I n a + b I n Qt + et is obtained. The method of linear regression can then be applied. b. The seasonal effect Many investigators (e.g., Miller (1951), Sharma and Dickinson (1980), Walling and Webb (1981)) believe that time is an important factor in studying the relationship between l n C and I n Q . As observed earlier, the trend of l n C vs. I n Q appeared to vary 20 from the rising limb of the hydrograph to the falling limb of the hydrograph. To substantiate this conjecture the following two models are fitted to the data and the results compared. The gain in the portion of explained variance, the adjusted R2, of the two-part model is approximately 16 percent. This suggests that two separate sub-models are required: one for the pre-peak flow days, the rising stage; and the other for the post-peak flow days, the falling stage. c. Other factors The lags of the daily discharge are considered to be influential factors. It is gen-erally observed that the sediment transport is dependent on not only the momentary discharge but also its recent history (Bogen (1980)). To investigate the above observation, the natural logarithm of the lag 1 to lag 10 of the daily flow were examined. The scatter plots of the lags vs. I n C all exhibit a linear trend, although increasing variability is observed with the larger lags. An algorithm for selecting the "best" subsets of predictor variables in regression (LEAPS in S), is applied to the lags along with I n Q . The results show that I n Q is the most dominant factor, explaining 59 percent of the variability in I n C . The addition of the lag 10 discharge, l n Q _ i 0 , is accompanied by an increase of about 21 percent of the portion of explained variance in I n C . (It should be noted that the addition of lag 7, lag 8 or lag 9 yielded comparable increase in the portion of explained variance as that of lag 10.) After these two variables have been introduced, further gain in the adjusted R? is minor. Thus, I n Q and l n Q _ i 0 are used as predictors for I n C . A)r + P\r InQt + ct, day t is before the peak flow A)/ + Pis^nQt + €t, day t is after the peak flow and lnC« = A> + /?ihiQi + C l 2 1 According to Abraham (1969), Walling and Teed (1971) and Church (1972), other factors believed to influence the sediment transport include the daily rate of change of discharge (rate), the number of days since the last peak flow of the year (dslp) and the previous maximum discharge of the year (prMadis). These variables are thought to be related to the amounts of sediments available for transport. These variables along with l n Q _ i 0 and InQ will now be examined for the rising stage and the falling stage separately. For the rising stage, all scatter plots (not given in the thesis) of the predictor variables vs. l n C , except the dslp vs. l n C and rate vs. l n C plots, exhibit some linear trend. The plot of rate vs. l n C shows that the magnitude of rate, abs(rate), is a better variable to use. Next, the best subsets algorithm is used to pick out the important factors. The results are summarized in Table 3.1 with p as the number of parameters (the number of predictors plus the intercept). With the adjusted R2 serving as the criterion, the model with InQ and lnQ_ io is the best model. The model with InQ alone explains only 66 percent of the variability in l n C . By including l n Q _ i 0 , the portion of explained variance in l n C is increased to 84 percent, which is the highest for any model with just two predictors. With InQ and lnQ_ io already in the model, further gain in the adjusted R2 is insignificant. The Mallow's Cp criterion (Draper and Smith (1981)) is also considered. The Mallow's Cp statistic for a p-parameter model is defined by Cp = RSSP + 2p-n, where RSSP = E ( l n C , t - l n C t ) 2 and n is total number of observations. Mallows (1973) suggests that good models will have negative or small Cp - p. Although the results do not appear to favour the above model, the ranking is the same as that provided by 22 the adjusted R2 criterion. For the falling stage, the plots of the predictor variables vs. InC are examined. Except for the plot of prMadis (which is now a constant, ie., the maximum discharge for the year) vs. InC, all scatter plots follow a definite linear trend. Again, the magnitude of rate is used instead of rate itself. The results obtained from running the best subsets algorithm are presented in the Table 3.2. According to the adjusted R2 criterion, InQ is the only variable needed. The model with just InQ yields an adjusted R2 of 81 percent. Only a maximum of about 1 percent can be gained by adding more variables in the model. Once more, the ranking based on the Mallow's Cp criterion is very close to the ranking provided by the adjusted R2 criterion. In choosing the "best" subset model, it should be kept in mind that although criteria are provided to aid the investigator in making the selection, a model with a simpler form that does not give up much of the criteria-based performance is often preferred over more complex models. In keeping with this rule, the models mentioned above are preferred to their competitors. d. The "dog leg" effect Next, a "dog leg" effect is considered; that is, the upper segment of the plot of InC vs. InQ has a tendency to flatten off or even approach horizontal. This can be explained by the fact that above a certain limit, sediment concentrations do not continue to increase with discharge at a uniform rate but may even remain constant 23 when the supply potential of the catchment has been fully achieved (Gregory and Walling (1973)). This effect can be seen in Figures 3.4a and b, where the plots of I n Q vs. I n C and I n C vs. l n Q _ i 0 for the rising stage both follow a curved line. The curved lines running through the data are the nonparametric smooths proposed by Cleveland (1975). In S, these can be obtained through the function LOWESS; for the figures, the smoothing factor used is 0.2. When the model lnCt = Po + Pi I n Q t + l n Q _ i o t + et is fitted to the rising stage data, the plots of the Studentized residuals vs. the pre-dicted values and the Studentized residuals vs. I n Q both betray signs of nonlinearity as shown in Figures 3.5a and b. The plot of the Studentized residuals vs. l n Q _ i 0 , on the other hand shows no indication of nonlinearity. This suggests that the model failure is more a function of I n Q than it is of l n Q _ 1 0 . Thus some transformation of I n Q is required. Careful examination of the I n C vs. I n Q plot reveals that only a slight transfor-mation of I n Q is necessary. Figure 3.4a shows that the relationship between I n C and I n Q is approximately piecewise linear with a corner at I n Q = 8 . Thus the data for the rising stage can be fitted using two different lines joined at I n Q = 8 , namely i n r _ / A) + / ? i l n Q t + /?2lnQ - i o < + e*, for I n Q * > 8 l/% + /?ilnQt + &ln<3-io« + et for l n Q ( < 8 subject to continuity at l n Q t = 8 . This is equivalent to fitting the following linear model ]nCt = Po + PiXlt + p2X2t + p\ l n Q.10t + et 2 4 where XI = (InQ - 8) x (l-d) and X2 = (InQ x d) + ((l - d) x 8), with d = 1 if InQ > 8 and d = 0 otherwise. e. Outliers The Studentized residuals (Weisberg (1980)) vs. the predicted values plots for the rising stage and the falling stage both display a few candidates for outliers. Because the least squares estimators can be very sensitive to outliers, alternative estimation methods, generally called robust regression methods, are used for comparison. These methods basically give lower weights to observations which seem inconsistent with the rest of the data, assuming that the fitted model is true. So, instead of minimizing £ ( l n C , - - lnC,) 2 , the function: £p ( [ lnC, - - lnC,]/s), is minimized, p is a function that penalizes outliers (less severely than the least squares procedure) and s is a fixed scale factor. Two such well-known methods are by Huber (1972) and Andrews (1974). The loss functions for these two are as follow: Displayed in Tables 3.6 and 3.7 are the results obtained using the Huber and Andrews criteria (with the constants A and K being 1.339 and 1.345 respectively) and the sum of absolute deviations criterion. These were obtained using the functions RREG and L 1 F I T in S. The estimates are found to be very close to those obtained using least squares. Thus, it is concluded that the possible outliers observed are not influential in the estimation of the parameters. f. Model for the errors 1*1 < otherwise. 1*1 < K; otherwise. 25 So far, all the regression methods have been used in an exploratory analysis sense, because the assumption of independent and identically normally distributed errors is not necessarily valid. This assumption is now checked by studying the residuals plots. The normality assumption is checked by plotting the normal probability plot of the Studentized residuals. The plot for the rising stage is presented in Figure 3.6a and the plot for the falling stage is presented in Figure 3.6b. Both these plots indicate slightly longer tails than those of the normal distribution. Otherwise, neither suggests any sign of serious violation of the normality assumption. Because the data in this study form a time series, adjacent errors would be expected to be serially correlated. Residuals of both the rising stage and the falling stage were joined together and daily runs of residuals were plotted against time. The plots show that the successive residuals tend to be more alike than otherwise, indicating the presence of positive serial correlation. On a set of residuals for which lag 1 up to lag 6 exist, the plots of the residuals vs. their lags were examined. The plots exhibit a "lower left to upper right" tendency that is representative of positive serial correlation. The amount of scatter in the plots increases with the lag. The plots of the residuals vs. lag 1 and lag 2 are provided in Figures 3.7a and 3.7b. An estimate of the lag i correlation is P " ) = - - E . J where n,- is the number et for which et_,- also exists (ie., a measurement was taken) and n0 is the length of the time series. The estimates of the lag 1 to lag 10 correlations are summarized in Table 3.3. These estimates are then plotted against their respective lags. The plot, as shown in Figure 3.8, indicates a decay of the correlation as the lag increases. 26 The partial autocorrelations (Box and Jenkins (1976)), pac(i), are also estimated and the estimates are provided in Table 3.3. The partial autocorrelations form a set of statistical measurements, similar to autocorrelations, that reveal how time series values are related to each other at specified lags. The plot of the partial autocorrelations vs. the lags is shown in Figure 3.9. This figure indicates that a first or second order autoregressive model might be appropriate because the pac(i) for i > 3 are approximately zero, whereas pac ( l ) and pac(2) are considerably larger. The Akaike's Information Criterion (Priestley (1981)) , AIC, is a method that can be used to select an appropriate order of an autoregressive model. This criterion picks the order k for which * 2 AIC(k) = n ln a2 + ln(l - pac(i) ) + 2k, «'=i where u2 is the residual mean square from the regression model for InC, is minimized. However, it is also desirable to have as simple a model as possible. With this additional objective, the selected k should have a small AIC(k), and none of the AIC(i), for i > k, should be much smaller than AIC(k) (that is, AIC(i) stabilizes for i > k). The AIC(i) — nlnd-2, for i from one to ten, are listed in Table 3.4. Following the above guidelines, a first order autoregressive model is preferred over the second order autoregressive model. This choice is also supported by the adjusted B? criterion. The adjusted R2,s obtained by fitting the models and et = piet-i + P2tt-2 + m to the set of residuals for which lag 1 and lag 2 exist, are almost equal. 27 3.2 M O D E L FOR MISSION DATA a. The proposed model Summarizing the analysis in Section 3.1, the models proposed are i_ c _ J /?o + /?ilnQt + /?2lnQ- 1 0t + e4, for InQ, > 8 * l/3o + #lnQ t + /?2lnQ_iOJ + e<, for InQ, < 8 subject to continuity at InQ, = 8, for the rising stage and lnC, = /?0 + /?ilnQ( + e, for the falling stage, where 1. InQ, is considered fixed and lnC, is random. 2. {e,} is a sequence of normal random variables wi th mean zero and variance a2 such that e, = pe,_! +77 ! , where {77,} are independent and identically normally distributed wi th mean zero and constant variance. The parameters fio, fid, fii, fi[, f h , cr and p can be estimated using the maximum likelihood estimators derived in Appendix A.6. However, this method is not available i n statistical packages and it may not be robust to the normality assumption. A n alternative method that involves two stages is to estimate fi0, fid, fix, fi[, and a using least squares regression and to estimate p using least squares regression on the residuals of the first regression. The estimates obtained by this latter method are presented in Table 3.5. b. Model adequacy and assumptions Some model inadequacies are more serious than others. A small deviation from 28 normality for the errors is among the less serious ones. As checked earlier, the normal probability plots of the Studentized residuals for the rising stage and the falling stage both look satisfactory. A potentially more serious failure is the heterogeneity of error variance. Hetero-geneity of variance can be detected by plotting the Studentized residuals against the predicted values and the independent variables. For the rising stage, the plots of the Studentized residuals vs. the predicted values and the independent variables, pre-sented in Figures 3.10a - c, all display a slight degree of heterogeneity. However, since a main goal is to estimate the annual load of sediment transport and not the daily concentration of sediment, the slight nonconstancy of variance, which may or may not be a result of outliers, should hopefully not be a factor. The plots of the Studen-tized residuals vs. the predicted values and the independent variable for the falling stage are presented in Figures 3.11a - b. Neither of the plots shows any indication of heterogeneity of variance. A much more serious failure is the nonlinearity of the model. To check for non-linearity, the plots of the Studentized residuals vs. the predicted values are studied. As shown in Figures 3.10a and 3.11a, such plots for the rising stage and the falling stage are well behaved. The specified models are considered are adequate. 3.3 ESTIMATION OF THE ANNUAL SEDIMENT LOAD In Section 3.2, a model for the natural logarithm of the mean daily sediment concentration was constructed. To obtain an estimate of the mean daily sediment concentration itself, the transformed scale prediction InC is retransformed by exp(.). 29 However, because the transformation is nonlinear and because E{C) = exp(p0)QPlE(exp(e)) jt exp(/?o)Q"1 exp(lnC) is a biased estimator (see Duan (1983) and Miller (1984) for example ). Still, if the errors are normally distributed, a major portion of the bias can be eliminated by applying the adjustment factor exp(o-2/2) of Smillie and Koch (1984) (see Appendix A.I). In practice, departures from the normality assumption are not uncommon. Some frequent departures from normality is that the underlying distribution of errors is approximately be symmetric but nonnormal, possibly "more peaked" than the normal with "lighter tails", or "less peaked" than the normal with "heavier tails". A family of distributions that includes some of the distributions described above is the generalized Gaussian distribution f(x) = const exp(—\x/b\p). A study of the sensitivity of the bias correction factor to the change in p, for 1 < p < 2 is presented in Appendix A.3. The biasing factor i?(exp(e)) can also be estimated nonparametrically using n ^exp(e,)/n !=1 . This estimator is approximately unbiased and does not assume any distribution for the errors. A comparison between the estimates of this bias correction factor and the "normal" bias correction factor is presented in Table 4.5. The two are very close, suggesting that any deviation from normality assumption is slight. A theoreti-cal relative efficiency comparison between the two bias correction factors, under the normality assumption, is provided in Appendix A.2. The efficiency exp(<r2/2) to the nonparametric estimate is greater than 1. 30 After obtaining an estimator for the mean daily sediment concentration, the problem of estimating the sediment transport of the high-flow period can be addressed as follow. First, linear interpolation between C t's and between Qt's is performed to complete the sediment hydrograph and the discharge hydrograph. Then an integra-tion process detailed in Appendix A.4 is used to determine the daily sediment loads. These daily loads are then summed to produce the total sediment transport estimate of the high-flow period. For the low-flow period, the total sediment transport can be estimated as de-scribed in the third paragraph of Section 3.1. At Mission, it is observed that the mean daily sediment concentration remains approximately constant between December and February, but is higher in November. Therefore the sediment transport is determined separately for these two periods. The expectation and variance of the sediment load of the high-flow period are calculated in Appendix A.5. These are used to obtain standard error of the high-flow period load. Discussion of estimates of total annual loads and their standard errors is given in Chapter 4. 3.4 OTHER DATA SETS In Sections 3.1 and 3.2, the Mission data have been carefully examined and a model has been developed for the natural logarithm of the sediment concentration. The mean daily discharge, the lag 10 of the mean daily discharge and the season are the factors found to be important in defining the model. The daily rate of change of discharge, the number of days since the last peak and the previous maximum discharge are not as useful; hence, they are not included in the analyses of the other 31 stations. Basically, the same analysis was carried out for each of the remaining stations. The only difference being the additional use of a nonparametric smoothing function developed by Friedman and Stuetzle (1982), called SUPERSMOOTHER, in identifying the "peak" that separates the rising from the falling stage. For Mission, the day of the peak flow is generally the dividing point between the two stages. For the other data sets, there are some years in which there is a period after the peak flow during which it is not clear whether the general trend of the hydrograph is up or down. The SUPERSMOOTHER is used in these cases to pick out the local or global peak, which is identified with the peak of the smooth line. It should also be noted that for the Walsingham data, the year is reset to start on August 1 and ends on July 31. This is so that the full high flow period occurs sometimes in the middle of the year, as for the west coast stations, and the same analysis can be carried out. The models developed for Hansard, Brocket and Walsingham are listed in Table 3.8. The fitted models are listed in Table 3.5. It is found that a lag of the mean daily discharge of about 8 to 10 days is needed in the rising stage model for all the west coast stations. For Walsingham, the lag 1 of discharge is included in the falling stage model. The seasonal effect, as indicated in the preceding statements, is important at all four stations. Robust regressions were performed to test the sensitivity of the models to out-liers. The results are listed in Tables 3.6 and 3.7. The estimates are close for all stations, indicating that the possible outliers are not influencial in the estimation of the parameters. 32 The error model used for all four stations is a first-order autoregressive model. The AIC suggested a second-order autoregressive model for Walsingham. However, the standard errors of the load estimates are not very different one way or the other, thus, the simpler model is used. The residual plots show that all four stations have a few possible outliers. How-ever, no serious violations of the normality assumptions are noted in the normal probability plots. A slight degree of heterogeneity of variance is observed in some residual plots, but probably not serious enough to affect the estimation of the annual sediment load. Finally, it should be noted that the models developed for Brocket and especially Walsingham are not as effective, in terms of portion of explained variance in lnC, as the ones developed for Mission and Hansard. The same basic factors that explain between sixty and eighty percent of the variability in lnC at Mission, Hansard and Brocket only explain about forty five percent of the variability in In C at Walsingham. This finding is in agreement with the findings of other workers (Kellerhals et al. (1974) and Church et al. (1985)), in that models that work well for stations on big rivers tend to do poorly for those on small rivers. The reason could be that in small rivers, the sediment transport is more strongly regulated by short-term events such as storms, whereas in large rivers, these events would not be individually significant factors. 33 CHAPTER 4 COMPARISON OF ESTIMATED ANNUAL LOAD WITH PUBLISHED DATA One purpose of the models developed in Chapter 3 is to reduce the costly sample collection of the sediment concentration. A test of the usefulness of these models can therefore be made by computing the annual sediment transport of past years and comparing results with figures published by the Water Survey of Canada. In this present study, the annual sediment transport is calculated for several years per station and tabulated in tables 4.1 - 4.4 along with the published data. A simple computer program was written to calculate these values and the standard errors. It is found that although the estimates obtained are generally close to the values provided by the Water Survey of Canada for all stations, the accuracy is more consistent at Mission and Hansard than the others. As a further test, cross-validation estimates for the annual sediment transport are calculated for three years per station. That is, for each year, the year's sediment load is calculated using the model based on other years. The cross-validation test is especially important in this study as it will indicate whether or not future annual sediment loads can be well estimated using the model based on past years. As shown in Table 4.5, these estimates are as good as those obtained using the models based on data from all years. Hence, it is reasonable to assume that future annual loads can be as well estimated by the model based on past and present years as they can be by the models based on all years, including those for which the loads are estimated. Finally, to assess the precision of the estimates produced by the developed mod-els, the standard errors of the annual sediment loads are computed (see formulas in 34 Appendix A.4). The results listed in Tables 4.1 - 4.4 show that the estimates obtained for Mission and Hansard are much more precise than those obtained for Brocket and Walsingham. The standard errors calculated for Mission and Hansard are about 10 to 15 percent of the load estimates while the standard errors calculated for Walsingham are about 20 percent of the estimated loads and those for Brocket vary between 30 and 65 percent. From the above calculations, it can be concluded that the proposed models for Mission and Hansard are very capable of replacing the sampling scheme presently employed by the Water Survey of Canada. The models for Brocket and Walsingham, on the other hand, are not as successful. Although the annual sediment transport can be reasonably well approximated using the proposed models, the precision of the estimates, in terms of standard error, is lower than ideal for Walsingham and unacceptably low for Brocket. The above findings are not surprising. Other workers (Kellerhals et al. (1974), Church et al. (1985)) have found that sediment transport in small rivers is more unpredictable than in large rivers. The reason could be that in small rivers, the sediment transport is more strongly regulated by the local events such storms, whereas in large rivers, the effect of these events is somewhat "averaged" out. The sediment transport in large rivers is more strongly regulated by larger events such as snowmelt and seasonal yield of sediment from the land surface. This reasoning is supported by the improvements attained by some workers when additional information regarding rainfalls are included in the model (Church (1972)). 35 References [1] Abraham, C. E. (1969). Suspended sediment discharges in streams, A . G. U. Golden Anniversary Meeting in Washington, D. C. [2] Andrews, D. F. (1974). A robust method for multiple linear regression, Tech-nometrics, 16, 523-531. [3] Becker, R. A. and Chambers, J. M. (1984). S an Interactive Environment for Data Analysis and Graphics, Belmont: Wadsworth. [4] Bickel, P. J. and Doksum, K. A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics, San Francisco: Holden-Day. [5] Bogen, J. (1980). The hysteresis effect of sediment transport systems, Norsk geog. Tidsskr., 34, 45-54. [6] Box, G. E. P. and Jenkins, G. M. (1976). Time Series Analysis: Forecasting and Control (revised ed.), San Francisco: Holden-Day. [7] Campbell, F. B. and Bauder, H. A. (1940). A rating-curve method for de-termining silt-discharge of streams, American Geophysical Union Trans., 21, 603-607. [8] Church, M. (1972). Baffin Island sandurs: a study of Arctic fluvial processes, Geographical Survey of Canada, Department of Energy Mines and Resources. [9] Church, M. , Kellerhals, R. and Ward, P. R. B. (1985). Sediment in the Pacific and the Yukon Region: review and assessment, Report, Canada Department of Environment and Inland Waters Directorate. [10] Cleveland, W. S. (1985). The Elements of Graphing Data, Monterey: Wadsworth. [11] Cochran, W. G. (1977). Sampling Techniques (3rd ed.), New York: Wiley. [12] Draper, N. and Smith, H. (1981). Applied Regression Analysis (2nd ed.), New-York: Wiley. [13] Duan, N. (1983). Smearing estimate: a nonparametric retransformation method, J. Amer. Statist. Assoc., 78, 605-610. [14] Ferguson, R. I. (1986). River loads underestimated by rating curves, Water Resources Research, 22, 74-76. 36 [15] Friedman, J. H. and Stuetzle, W. (1982). Smoothing of scatterplots, Depart-ment of Statistics, Stanford University, Tech. Report ORION006. [16] Gregory, K. J. and Walling, D. E. (1973). Drainage Basin Form and Process, London: Edward Arnold. [17] HofT, C. J. (1983). A practical Guide to Box-Jenkins Forecasting, Belmont: Lifetime Learning Publications. [18] Huber, P. J. (1972). Robust statistics: a review, Ann. Math. Statist., 43, 1041-1067. [19] Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte 'Carlo, Ann. Statist., 1, 799-821. [20] Kellerhals, R., Abrahams, A. D. and von Gaza, H. (1974). Possibilities for using sediment rating curves in the Canadian Sediment Survey Program, Report, Canada Department of Environment, Alberta Department of Highways and Transport, Research Council of Canada and University of Alberta, Department of Civil Engineering, REH/74/1. [21] Millard, S. P. and Guttorp, P. (1985). Hypothesis tests for regression models with autocorrelated errors. SIMS Tech. Report 106, Department of Statistics, University of British Columbia. [22] Miller, C. R. (1952). Analysis of flow-duration, sediment rating curve method of computing sediment yield, Report, U. S. Department of Interior Bureau of Reclamation. [23] Miller, D. M. (1984). Reducing transformation bias in curve fitting, Amer. Statist, 38, 124-126. [24] Sharma, T. C. and Dickinson, W. T. (1980). System model of daily sediment yield, Water Resources Research, 16, 501-506. [25] Smillie, G. M. and Koch, R. W. (1984). Bias in hydrologic prediction using log-log regression model, A . G. U. Fall Meeting. [26] Stichling, W. (1969). Instrumentation and techniques in sediment surveying, In: Hydrology, (Proc. Victoria, B. C. Symp., May 1969). [27] Stichling, W. (1973). Sediment loads in Canadian rivers, In: Fluvial processes and sedimentation, (Proc. University of Alberta, May 1973). 37 [28] Walling, D. E. and Teed, A. (1971). A simple pumping sampler for research into suspended sediment transport in small catchments, / . Hydrol, 13, 325-337. [29] Walling, D. E. and Webb, B. W. (1981). The reliability of suspended sediment load data, In: Erosion and Sediment Transport Measurement, (Proc. Florence Symp., June 1981), IAHS 133, 177-194. [30] Water Survey of Canada, Environment Canada, Sediment Data for Canadian Rivers, Environment Canada, (Formely: Inland Waters Branch, Department of Energy, Mines and Resources, Ottawa, Canada), Yearly publications, 1966-1983. [31] Weisberg, S. (1980). Applied Linear Regression, New York: Wiley. 38 A P P E N D I X A.I Bias correction factor for C with normal errors. The estimator for the sediment concentration obtained by taking the inverse transformation of In C of a model like ( A l .1) below, is a biased one. O n the assumption that the errors are normally distributed, a major portion of the bias can be removed by applying the correction factor derived here. Suppose for s implici ty that the true model for InC is l n C = fa + p\\n.Q + e, {Al.l) where e is normally distributed wi th mean 0 and constant variance c r 2 , and the fitted model is lnC = A> + A InQ. Inversely transforming the true and the fitted model yields C = exp(/?0)Q /Jl exp(e) and C = exp(/30)Q^. C is a biased estimator of E{C) = exp(/?0)Q/3l-£'(exp(e)). The bias of exp(/?0) for exp(/?0) and Q&1 for w i l l be small since both p\ and fix are based on a large number of observations. Hence, an approximately unbiased estimator of E(C) can be obtained by mul t ip lying C by an estimate of £(exp(e)) (Duan (1983) and Mi l l e r (1984)). If e ~ N(0, cr2), then E(exp(e)) = exp(cr2/2), by evaluating the moment generating function of e at 1. So, assuming a normal error, an estimated correction term is 39 exp(<72/2), where a2 is the residual mean square obtained from the regression of lnC on InQ. Similarly, this extends to a more general regression model. In this study, the correction is not done until after the load for the high-flow period has been estimated. Hence, the estimated correction term used is exp(Jp2/2), where <?p is the pooled residual mean square from the rising stage and the falling stage: [(rar - l)ay2 + (n/ - l)cfj2]/(nr + nj - 2). Although it is preferrable to correct the individual lnC,-, a2 and cf2 are close enough so that it makes no difference. A.2 A compar i son of two est imated bias correct ion terms. As discussed in Appendix A.I, exp(lnC) is a biased estimator of E(C). The biasing factor is £'(exp(e)). If the errors are normally distributed, this is equal to exp(cr2/2) and an estimate of the biasing factor is exp(o-2/2), where cr2 is the residual mean square obtained from the least squares regression. For the data in this study, the normality assumption may be in doubt. A nonparametric estimate of the biasing factor is n _ 1 £ " = i exp(ej), where {e,} are the residuals of the regression model. In this appendix, the asymptotic relative efficiency (Bickel and Doksum (1977)) of the nonparametric estimate, when the errors are normal, is studied. Assume the errors are correlated as in (A6.1). By standard asymptotic theory, the Central Limit Theorem, and formulas for the moments and moment generating function of a bivari-ate normal distribution, x/77(exp(«72/2) - exp(<72/2)) N (0, v,2), where v^2 = exp(o-2)<r4(l + p2)/2(l - p2), and V ^ X > x P ( e , - ) ) - exp(<r2/2)) N (O,^ 2), 40 where v22 = exp(<72) (exp(<r2) - lj + x (exp(<rV) - lj Pk)/(1-Pk)]). exp(<r2)(Er=1(^)-1^2fc[(l + It can be seen that v2 is less than v\. Thus, the asymptotic relative efficiency of exp(e>2/2) with respect to n~l YA=\ exP(e«) is greater than one. This is expected, as a parametric estimate is generally more efficient than a nonparametric estimate when the parametric model is true. A.3 B ias correct ion factor for C w i th generalized Gauss ian errors. In Appendix A.I, a bias correction factor was derived for C under the assumption that the errors are normally distributed. In practice, the underlying distribution of the errors will not be exactly normal; it may be "more peaked" with "lighter tails", or "less peaked" with "heavier tails". For the data at hand, the distribution of the residuals appears to be approximately symmetric but with "heavier tails" than the normal. In this Appendix, the biasing factor £(exp(e)) will be derived for the case of the errors being distributed according to the generalized Gaussian distribution f(x) = P exp(-\x/b\p), b>0, l<p<2, (A3.1) where (3 is a normalizing constant. This is a family of symmetric distributions with heavier tails than the standard normal distribution. Let e be distributed according to (A3.1). By using gamma integrals, it can be shown that /? = p(2bT(l/p))~1. Because f(x) is symmetric and has exponential tails, the odd moments of e are zero. The even moments are E{ek) = 2 Jo 41 So, assuming that the errors are generalized Gaussian, with p known, the bias correction term is If p > 1, £'(exp(e)) is finite for all values of 6 (this follows from the fact that if p > 1, E(exp(tX)) exists for all t). A study of the sensitivity of the bias correction term to the change in p can be carried out as follow. Fix a2 — a2, where <r2 is a residual mean square from one of the data sets. For each value of p between 1 and 2, calculate 6 that gives this <r2, then substitute this b in equation (A3.3) to obtain E(exp(e)). Tabulate p and E(exp(e)) as a function of p for several values of p between 1 and 2. Such tables for Mission and Oldman are provided in Tables 4.6 and 4.7. A.4 Integrat ion process for sediment load determinat ion. In this appendix, details of the integration process that is used to calculate the sediment load for the high-flow period are given. As discussed in Section 3.3, linear interpolation between Cj's and between <3,'s is performed. Let C(t) - a + bt be the linear interpolation between C(h) and C(t2), where tx < t2, and Q(t) = c + dt be the linear interpolation between Q(tx) and Q(t2). To estimate the sediment load at 42 time t (see Section 1.2b), where tx < t < t2, the following is used UJ) = clJ)Q(t) = (a + ~bt)(c + dt) Thus the total sediment load between *i and t2 is rti ft? / C(t)Q(t) dt = (a + bt)(c + dt) dt Jti Jti Without loss of generality, let ti = 0. Then a = C~1} c = Qi, b = (C2- Cl)/t2 and d = (Q2 - Qi)/<2 and rti Jo 1 C(t)Q(t) = C2Q2 (C2 - CiXQ2 -_Qi) + CxQj/ 2 6 2 Recall from Section 1.2b that Q is measured in m3/s and C is measured in mg/l. Hence, to estimate the sediment load for the first day, for example, t2 is replaced by 86400, which is the number of seconds in a day. The total sediment load estimate for the high-flow period is Load = «'=2 since Qi are obtained daily. A.5 E x p e c t a t i o n a n d variance of the sediment load. In this appendix, the expectation and the variance of sediment load for the high-flow period are derived assuming the model like (A5.1) below. These terms are needed for obtaining the standard error of the sediment load estimate. 43 A s shown in Appendix A . 4 , the total sediment load for the high-flow period can be approximated by CjQj _ (Cj - CViXQj - Qi_i) C. - tQ,- ! -2 6 2 This can be writ ten as £"=i a ' C '» ' where a{ = [(2Qi/3)+(Qi-1/6)+(Qi+1/6)] f o r i = 2.. .n-1, ai = (Q2/6) + (Qi/3) and a„ = (Q„/3) + (Q„_i/6). To calculate the expectation and the variance of the above load, E(Ci), Var(C,) and Cov(C,-, C; + ( ) are needed (since Var(E: = 1 OiCt) = E ? = 1 a.^Var^) + 2 £ E . ^ a ^ C o v ^ , C,-)). Load = 86400 ] T Suppose for simplici ty that lnC,- = /?o + 0ilnQ,- + e,-, (J45.1) where e,- =~ JV(0, <r2), et- = pej-j + rn, rn are independent and identically distributed random variables and 62 + p2a2 = a2. Then InQ ~ N(/3Q + /?i InQ,-, a2). (A5.2) Hence, E(Ci) = exp(/?0 + Pi InQ,- + <r2/2), by evaluating the moment generating function (m.g.f.) of lnC,- at 1. The variance of the high-flow period load is a function involving Var(Q) = E{C2) — E{Ci)2 and Cov(Q, Ci+t) - E(CiCi+t)-E(Ci)E(Ci+t). E(C2) can be obtained by evaluating the m.g.f. of InCj at 2., so that Var(C,-) = exp^2[/?0 + /?i InQ; + <r2/2]^ - exp^2[/?0 + /?i InQ,] + a2^. E(CiCi+t) can be calculated by noting that 44 and that E(CiCi+t) = £ ( e x p ( l n C j ) exp( lnCj + t ) ) is the joint moment generating func-tion of the bivariate normal distribution with both arguments equal to 1. Hence, E(CiCi+t) = e x p ( w + la+t + (2cr2(l + p*))/2) and Cov(Ci , Ci+t) = exp^i + in+t + a2(l + p*)J - E(Ci)E(Ci+t), where m = £ ( l n C , ) , and m+t = E(\nCi+t) are given in (A4.2). The variance of the load can be as indicated in the first paragraph of this ap-pendix. The standard error of the load estimate can be obtained by replacing f?(lnC;), a2, and p, with their estimates in i /Var(Load). A.6 M a x i m u m l ike l ihood estimators. As mentioned in Section 3.2a, the unknown parameters in the models (3.5) and (3.6) can be estimated using the maximum likelihood method. However, as this method is computationally more difficult, an alternative method was used. Following is a discussion of the maximum likelihood estimates (m.l.e.) and how they can be obtained. These estimates are adaptations of those in Millard and Guttorp (1985). Suppose for simplicity that there n years of data with T consecutive sampled days in each year and that the true model for l n C t is \nCit =/30 +0i\nQit + €it, eit = pei,t-i + na, t = 2,...T, i = l,...n, (A6.1) {r]t} are iid N(0, 62) random variables and 82+p2a2 — c2, and e^ are independent vectors. 45 where (46.2) The likelihood function is L(a\ p, ft, A)= ^ r ^ l V l ^ ^ e x p ^ - I e V - 1 ^ , where and -A) ft>-- A -inQiT) /£ 0 •• 0 \ 0 E ... o \ 0 0 ..' E J with E being the covariance matrix in (A6.2). It should be noted that there are two cases to consider. In one case, only the data during the high-flow period are included in the analysis. In the other case, data over the whole year are analyzed. In the former case, n and T are defined as above. For the latter case, all the years on record can be joined into a sequence of observations, T is the total number of days on record and n is taken to be 1. The maximum likelihood estimates of /?0, p and a2, can be obtained by solving the derivative equations of the logarithm of the likelihood function with respect to these parameters. These equations are simplified to 0 = X'A~lX - l X'A~lY, {A&.Z) a2 = [(1 - fjt)nT\-1e'A-1e, (A6A) and where « = 1 j=2 X = \ 1 l n Q n T / InCn Y = J n C , V and A — — a1 nT ' (A6.5) 46 Combining ( A 6 . 4 ) and ( A 6 . 5 ) and simplifying yield a2 m (nT)~lY^H€Xr ^ o r tnese calculations, the identities det = (1 - p2)n and - i / i -P o - P (1 + P2) -P 0 -p (1 + p2) 0 0 V o o 0 0 \ 0 0 0 0 (1+P 2) -p, -p 1 / are used, where ft = E/cr2. No explicit forms exist for the m.l.e.'s of these parameters, but they can be obtained using the following iterative algorithm. 1. Estimate /?0 and /3X using ordinary least squares. 2. Estimate p using the residuals obtained in the previous step with n T E E » = 1 j=2 t = l j=2 n T P=^2^2eiJeiJ-i/^2Y^eh-3. Estimate /?0 and fa in ( A 6 . 3 ) by substituting p with p in A — A(p). 4. Estimate p as in step 2 using the residuals obtained in step 3 . 5. Repeat steps 3 and 4 until estimates of /30, /?j and p converge. 6. Estimate a2 with a2 = (nT)"1 Er=i Ej= 2 <r When the normal model for the errors is true, the method of estimation described above is preferred over the one suggested in Chapter 3. When the error model is an approximation, the method used in Chapter 3 is preferred since it is probably more robust to the normality assumption. The relationship between sediment concentration and water discharge is not dependent on the error model. 47 Xi The observed discharge on day i, Qi. X2 The first-order estimate of the relative change of discharge on day i. X3 The time in days to the first hydrograph peak encountered by scanning backwards through the discharge record. X4 The difference in discharge between the largest flow of the year up to and including day i - l and the flow on day i. X$ The number of days since the major spring rise, defined as the minimum j for which Q(j) > 5mirn<n<,Q(n). XQ The number of days counted backward from day i for which Q(i - m) < Q(i), limited to a maximum of 731 days. Table 2.1 The prediction factors investigated by Kellerhals et al.. 48 submodel P adjusted R2 InQ 2 2010.2 .66 In Q - i o 2 4988.8 .34 abs[rate) 2 5892.8 .24 dslp 2 4561.8 .38 prMadis 2 4830.9 .36 InQ, In Q - i o 3 344.4 .84 InQ, abs(rate) 3 1774.5 .69 InQ, dslp 3 1841.6 .68 InQ, prMadis 3 505.7 .82 InQ, lnQ_ io , abs(rate) 4 328.1 .84 InQ, l n Q _ 1 0 , dslp 4 317.8 .84 InQ, l nQ_ i o , prMadis 4 5.9 .88 InQ, lnQ_ io , abs(rate), dslp 5 303.0 .85 InQ, lnQ_xo, abs(rate), prMadis 5 4.0 .88 InQ, lnQ_ i o , <is/p, prMadis 5 7.5 .88 InQ, l n Q _ 1 0 , a6.s(raie), ds/p, prMadis 6 5.6 .88 Table 3.1 CP and adjusted R2 statistics for some of the better rising stage models of Mission. 49 submodel P cP adjusted R2 . InQ 2 151.6 .81 In Q-io 2 1379.8 .66 abs(rate) 2 5154.4 .20 dslp 2 1777.1 .61 prMadis 2 6676.4 .02 InQ, ln<3_io 3 56.2 .82 InQ, abs(rate) 3 118.1 .81 InQ, dslp 3 151.8 .81 InQ, prMadis 3 133.6 .81 InQ, lnQ_io, abs(rate) 4 29.2 .82 InQ, lnQ_io, dslp 4 39.1 .82 InQ, lnQ_io, prMadis 4 47.6 .82 InQ, lnQ_io, abs(rate), dslp 5 9.3 .83 LnQ, lnQ_ 1 0 , abs(rate), prMadis 5 17.9 .83 InQ, lnQ_ 1 0 , dslp, prMadis 5 39.2 .83 InQ, lnQ_ 1 0 , abs(rate), dslp, prMadis 6 8.4 .83 Table 3.2 Cp and adjusted R2 statistics for some of the better falling stage sub-models of Mission. 50 lag 1 2 3 4 5 6 7 8 9 10 P(0 .6819 .5840 .5017 .4592 .3808 .3473 .3276 .2878 .2785 .2711 pac(i) .6819 .2224 .0794 .0843 -.0258 .0360 .0527 -.0090 .0476 .0390 Table 3.3 Sample autocorrelations and sample partial autocorrelations of the errors in the Mission model. order n E L i ln(l - pac\i)) + 2k 1 -1559 2 -1684 3 -1698 4 -1713 5 -1713 6 -1713 7 -1719 8 -1717 9 -1721 10 -1723 Table 3.4 AIC(k) - nlncr used to select the order of the autoregressive model for the errors in the Mission model. 51 station rising stage model falling stage model error model Mission InC,- = -0.40 4- 3.40Xi,- + 2.05X2|- - 1.40lnQ_10l- InCi = -9.09+ 1.59 InQ,- ii = 0.83e,_i Hansard InC; = -1.29 4- 1.59lnQt - 0.66lnQ_10; InC,- = -1.63 + 0.98 InQ,- e,- = 0.76e,_i Brocket laCi = 0.33 + 1.82 ln Qi - 1.00 lnQ. 6 i InC,- = -6.52 + 0.32Fit + 2.09r2i U = 0.83ei_i Walsingham InC,- = 1.90 + 1.11 In Qi InC,- = 3.45 + 2.03lnQ,- - 1.51 lnQ. u ii = 0.70e<_i tu/iereyi = ( l n Q - 4 ) x ( l - £ ; ) and XI = (InQ - 8) x (1 - D) Y2 = (lnQ x E) x ((1 - D) x 4) X2 = (InQ x D) x ((1 - D) x 8) / 0 , « y I n Q < 4 n _ / 0 . •'/ l n Q < 8 \ l , i / lnQ>4 \ 1 , « 7 lnQ>8 Table 3.5 Least squares fitted models. station variable least squares min. abs. res. Huber Andrews Mission intercept -0.40 -0.68 -0.51 -0.52 XI 3.40 3.42 3.41 3.43 X2 2.05 2.09 2.05 2.05 InQ-io -1.40 -1.41 -1.38 -1.38 Hansard intercept -1.29 -1.00 -1.30 -1.31 InQ 1.59 1.55 1.58 1.5S lnQ_io -0.66 -0.66 -0.64 -0.64 Brocket intercept 0.33 0.51 0.34 0.36 InQ 1.82 1.97 1.89 1.91 lnQ_ 8 -1.00 -1.23 -1.08 -1.11 Walsingham intercept 1.90 1.91 1.91 1.92 InQ 1.11 1.09 1.10 1.09 Table 3.6 Rising stage models' coefficients estimated using least squares, minimum absolute residual, Huber and Andrews regressions. station variable least squares min. abs. res. Huber Andrews Mission intercept -9.09 -8.97 -8.93 -8.93 InQ 1.59 1.58 1.57 1.57 Hansard intercept -1.63 -1.07 -1.42 -1.35 InQ 0.98 0.89 0.95 0.94 Brocket intercept -6.52 -6.73 -6.62 -6.65 YI 0.32 0.26 0.28 0.29 Y2 2.09 2.13 2.11 2.12 Walsingham intercept 3.45 3.46 3.45 3.45 InQ 2.03 1.83 1.92 1.91 lnQ_i -1.51 -1.32 -1.41 -1.40 Table 3.7 Falling stage models' coefficients estimated using least squares, minimum absolute residual, Huber and Andrews regressions. 53 station rising stage model falling stage model error model Mission , c _ { Po + Pi In Qi + In Q-xoi + for In Q,- > 8 ' \ p'0 + p[ hi Qi + p2 In Q_10l- + a, for In Qi < 8 subject to continuity at In Qi — 8 \nCi-po + Pi InQ, + fi et = pei_i + m Hansard In Ci = Po + Pi In Q,- + fa In Q_ ioi + fi In Ci = /?0-r Pi lnQi + e,- €i = /?£,•_ i + rn Brocket In d = ft + A In Q,- + Pi In Q_8l- + c,-f /?0 + AlnQ.+fi, forlnQ,->4 ' ~ \P'o + fi[^Qi-r€i, forlnQi<4 subject to continuity atlnQi = 4 ti = pe,_i + rji Walsingham lnC,- = ft-r/MnQi + fi In d = Po + Px In Qi + & In Q_ l f + f ,• ei = pci_i + m Table 3.8 The proposed models for Mission, Hansard, Brocket and Walsingham. station ( £ e x p ( e t-))/n exp(cx2/2) Mission 1.0661 1.0626 Hansard 1.1123 1.1067 Brocket 1.4618 1.3893 Walsingham 1.2712 1.2458 Table 3.9 Numerical comparison between two estimates of exp(<r2/2). 55 year estimated annual load published annual load standard error c.v. 1966 21.5252 x 10s 22.7491 x 106 2.3201 X 106 10.77 1969 15.8766 x 10s 17.3299 x 106 1.7522 x IO6 11.03 1972 31.5177 x 106 34.1606 x 106 3.6349 x 106 11.53 1974 25.4903 x 106 27.5183 x 106 2.8044 x 10s 11.0 1976 26.8747 x 106 27.4562 x 106 2.6212 x 106 9.75 1979 12.3572 x 106 15.0096 x 106 1.4485 x 106 11.72 1982 21.2988 x 106 25.5634 x 106 2.3096 x 106 10.84 Table 4.1 Estimated loads vs. published data for Fraser at Mission, (c.v. is the coefficient of variation). year estimated annual load published annual load standard error c.v. 1972 4.64268 x 10s 4.787251 x 106 0.69344 x 106 14.90 1974 3.84450 x 106 2.90580 x 106 0.53203 x 106 13.83 1976 3.64673 x 106 3.36528 x 106 0.40598 x 106 11.13 1978 1.57306 x 106 1.75841 x 106 0.17586 X 106 11.17 1980 1.98128 x 10s 1.72463 x 106 0.21311 x 106 10.75 1981 2.21504 x 106 1.58250 x 106 0.31622 x 106 14.27 1982* 4.10280 x 10s 4.26429 x 106 0.51429 x IO6 12.53 Table 4.2 Estimated loads vs. published data for Fraser at Hansard, (c.v. is the coefficient of variation). * figures given for period between January and September only. 56 year estimated annual load published annual load standard error cv. 1967 825659 996061 392730 47.56 1968 82606 85679 34615 41.90 1969 248710 274020 110001 44.22 1972 596555 753428 270590 45.35 1974 291576 306314 108947 37.36 1975 785850 1222067 516770 65.75 1978 143026 221912 47477 33.19 Table 4.3 Estimated loads vs. published data for Oldman near Brocket, (cv. is the coefficient of variation). year estimated annual load published annual load standard error cv. 1968 26585 31258 6400 24.07 1971 13747 14813 2433 17.69 1973 28807 24747 5306 18.41 1977 31628 37281 5111 16.15 1979 36017 30422 6318 17.54 1981 22241 20476 3570 16.05 1983 26942 23323 3076 11.41 Table 4.4 Estimated loads vs. published data for Big Creek near Walsingham. (cv. is the coefficient of variation). 57 station year cross-validation estimate estimate based on all years 1966 21.2879 x 106 21.0585 x 106 Mission 1972 31.1083 x 106 31.1629 x 106 1979 11.8456 x 106 12.0256 x 106 1974 3.9557 x 106 3.8235 x 106 Hansard 1978 1.5295 x 106 1.5451 x 106 1982 4.1303 x 106 4.1641 x 106 1967 822611 824675 Brocket 1972 581562 595551 1978 139010 142139 1973 29078 28807 Walsingham* 1979 36782 36017 1983 27306 26942 Table 4.5 Cross-validation estimates vs. estimates based on all years. * figures given for the whole year. 58 p 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 6 0.28590 0.32132 0.35291 0.38095 0.40577 0.42771 0.44713 0.46433 0.47958 0.49313 £(exp(€)) 1.06422 1.06384 1.06357 1.06335 1.06318 1.06304 1.06293 1.06283 1.06275 1.06268 Table 4.6 Sensitivity of the estimated bias correction factor to the change in the generalized Gaussian parameter p for Mission. P 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 b 0.66489 0.74726 0.82073 0.88593 0.94365 0.99469 1.03984 1.07983 1.11530 1.14682 £(expTe)) 1.45798 1.43859 1.42546 1.41597 1.40879 1.40318 1.39867 1.39498 1.39190 1.38930 Table 4.7 Sensitivity of the estimated bias correction factor to the change in generalized Gaussian parameter p for Brocket. 61 5 5 5 5 5 5 s 10 5 5 5 5 5 5 6 5 e 4 .44 c 5" 6 ^ 4 \ 4 5 3 7 A# 10 »10 7T_ 7 10 8 J 7 7 _8J ^ 8 2 | 3 1 2 3 11 12 8 W 1 1 9 12112 23 6.5 7.0 7.5 8.0 8.5 9.0 9.5 InQ Figure 3.3 Scatter plot of In Q vs. In C for Mission 1973, by month. 63 o c CO CO in co 6.5 'TV* • • • • • • / V V u • • *? -J- X. _ • Iowess f=.2 7.0 7.5 8.0 8.5 9.0 InQ 9.5 10.0 Figure 3.4a Scatter plot of InQ vs. InC for the rising stage of Mission. oo O CO h CO r 6.5 . . .•V 3 . .*• **• Iowess f=.2 7.0 7.5 8.0 8.5 9.0 9.5 lnQ_10 Figure 3.4b Scatter plot of l n Q _ 1 0 vs. InC for the rising stage of Mission. 64 to 8 pred Figure 3.5a Predicted lnC vs. Studentized residual for the rising stage of Mission. OJ r-o h CM L CO 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 InQ Figure 3.5b In Q vs. Studentized residual for the rising stage of Mission. 65 -4 -3 -1 rank Figure 3.6a Normal probability plot for the residuals of the Mission. rising stage, C D CO o h CM L -3 -2 rank Figure 3.6b Normal probability plot for the residuals of the falling stage, Mission. 66 in e(t-1) Figure 3.7a Scatter plot of e(t - 1) vs. e(i) for Mission. 67 e(t-2) Figure 3.7b Scatter plot of e(i — 2) vs. e(i) for Mission. 68 sample partial autocorrelation sample autocorrelation -0.2 0.0 0.2 0.4 0.6 0.8 0.2 0.3 0.4 0.5 0.6 0.' CO CD 1 ' 2 3 4 5 6 7 pred Figure 3.10a Predicted lnC vs. Studentized residual for the rising stage of Mission. 70 CO •.* •••• .•• - V .V CO I 6 - 5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 InQ Figure 3.10b InQ vs. Studentized residual for the rising stage of Mission. 71 CO CO CO CO 6.5 .1 t 7.0 7.5 9.0 9.5 8.0 8.5 InQjO Figure 3.10c l n Q_i 0 vs. Studentized residual for the rising stage of Mission. 72 to CD CO Cxi L_ pred Figure 3.11a Predicted InC vs. Studentized residual for the falling stage of Mission. 73 CO CM CO CO CM 7.0 7.5 8.0 8.5 LnQ 9.0 9.5 10.0 Figure 3.11b InQ vs. Studentized residual for the falling stage of Mission. 74
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Statistical modelling of sediment concentration
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Statistical modelling of sediment concentration Thompson, Mai Phuong 1987
pdf
Page Metadata
Item Metadata
Title | Statistical modelling of sediment concentration |
Creator |
Thompson, Mai Phuong |
Publisher | University of British Columbia |
Date Issued | 1987 |
Description | One technique that is commonly used to replace the costly daily sampling of sedimentconcentration in assessing sediment discharge is the "rating curve" technique. This technique relies on the form of the relationship between sediment concentration and water discharge to estimate long-term sediment loads. In this study, a regression/time-series approach to modelling the relationship between sediment concentration and water discharge is developed. The model comprises of a linear regression of the natural logarithm of sediment concentration on the natural logarithm of water discharge and an autoregressive time-series of order one or two for the errors of the regression equation. The main inferences from the model are the estimation of annual sediment loads and the calculation of their standard errors. Bias correction factors for the bias resulted from the inverse transformation of the natural logarithm of sediment concentration are studied. The accuracy of the load estimates is checked by comparing them to figures published by Water Survey of Canada. |
Subject |
Sedimentation and deposition -- Mathematical models |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-07-19 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0097224 |
URI | http://hdl.handle.net/2429/26648 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1987_A6_7 T46_4.pdf [ 3.72MB ]
- Metadata
- JSON: 831-1.0097224.json
- JSON-LD: 831-1.0097224-ld.json
- RDF/XML (Pretty): 831-1.0097224-rdf.xml
- RDF/JSON: 831-1.0097224-rdf.json
- Turtle: 831-1.0097224-turtle.txt
- N-Triples: 831-1.0097224-rdf-ntriples.txt
- Original Record: 831-1.0097224-source.json
- Full Text
- 831-1.0097224-fulltext.txt
- Citation
- 831-1.0097224.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0097224/manifest