UBC Theses and Dissertations

Quantifying drainage basin comparisons within a knowledge-based system framework. Cheong, Anthony Leonard, 1992.

QUANTIFYING DRAINAGE BASIN COMPARISONS WITHIN A KNOWLEDGE-BASED SYSTEM FRAMEWORK

by

ANTHONY LEONARD CHEONG

B.Sc., The University of Victoria, 1987

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(Department of Geography)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

October, 1992

© Anthony Leonard Cheong

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Geography
The University of British Columbia
Vancouver, Canada

Abstract

Previous methods of drainage basin comparison are subjective, so there is a need to quantify such procedures. Quantification allows basins to be compared objectively and results to be transferred among different studies. It is best effected within the framework of a knowledge-based system.

Three types of drainage basin characteristics may be identified: morphometric, biogeophysical and historical. These data exist in nominal, ordinal, interval and ratio form. Quantification of morphometric information appears to be the most straightforward. It should probably play a dominant role in any similarity comparison, because many processes are related to these characteristics.

These variables can be analysed using a procedure which incorporates all levels of information. Nominal and ordinal information are used as filters, while interval and ratio information are used to calculate dissimilarity, a form of Euclidean distance measure.
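The filter-then-distance procedure summarised above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's COMPARE knowledge-base. Basins are held as hypothetical dictionaries. Two basins are compared only if they agree on every nominal/ordinal filter attribute (geology, say). Each interval/ratio variable is divided by an assumed regional scale (for example its standard deviation) so that large-unit variables such as area do not dominate. The per-variable weights are likewise an assumption.

```python
import math
from typing import Dict, Mapping, Optional, Sequence, Tuple

Basin = Mapping[str, object]  # hypothetical record: attribute name -> value


def passes_filters(a: Basin, b: Basin, filter_keys: Sequence[str]) -> bool:
    """True if two basins agree on every nominal/ordinal filter attribute."""
    return all(a[k] == b[k] for k in filter_keys)


def dissimilarity(a: Basin, b: Basin, numeric_keys: Sequence[str],
                  scale: Mapping[str, float],
                  weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted Euclidean distance over scaled interval/ratio variables."""
    weights = weights or {}
    total = 0.0
    for k in numeric_keys:
        d = (a[k] - b[k]) / scale[k]          # assumed standardisation
        total += weights.get(k, 1.0) * d * d  # weight defaults to 1
    return math.sqrt(total)


def compare(basins: Mapping[str, Basin], filter_keys: Sequence[str],
            numeric_keys: Sequence[str], scale: Mapping[str, float],
            weights: Optional[Mapping[str, float]] = None
            ) -> Dict[Tuple[str, str], float]:
    """Dissimilarity for every pair of basins that passes all filters."""
    names = sorted(basins)
    result = {}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if passes_filters(basins[p], basins[q], filter_keys):
                result[(p, q)] = dissimilarity(basins[p], basins[q],
                                               numeric_keys, scale, weights)
    return result
```

With geology as the filter, a basin on a different rock type never receives a dissimilarity value at all. This mirrors the role the abstract gives to nominal information.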
Much of this information is available in digital form (morphometry from digital terrain models, for example). The similarity procedure is therefore best developed within a knowledge-based system framework. With a pseudo-relational information base and search procedures formalised through artificial intelligence theory, knowledge can be stored within the system and used to effect the analysis.

Thirty-one characteristics are measured for each of 65 drainage basins in the Queen Charlotte Islands, British Columbia. Dissimilarities between the basins are calculated for both untransformed and transformed information. Because many variables are not normally distributed, the tests are carried out on both sets of information to assess the robustness of the various statistical procedures. Two tests are performed: one on all information, and one on selected variables.

For all information, the transformed data yield much smaller dissimilarity values with respect to both range and magnitude. Area and magnitude are dominant variables with respect to dissimilarity and partially control the resulting clusters. The dissimilarity (not including filters) does not appear to be related to geographical proximity; however, proximity is important when geology is used as a filter. Transforming the information eliminates many dissimilarity distances so large as to be outliers in the overall distribution.

Principal component analysis and correlation studies are used to decrease the number of variables used in a second analysis.
Magnitude, area, main channel length, relative relief, channel gradient and the number of second-order channels are determined to be 'important' in analysing dissimilarity based on hydrology. Most findings are similar to those of the all-variable test. A useful result of this investigation is that analysis time decreases by about 50 percent.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement
Chapter 1 Introduction
Chapter 2 Drainage Basin Description
  2.1 Introduction
  2.2 Geomorphometry, Sampling and Representation
  2.3 The Drainage Network and the Problem of Lakes
  2.4 Biogeophysical Parameters
  2.5 Historical Parameters
  2.6 Conclusion
Chapter 3 Drainage Basin Comparison
  3.1 Introduction
  3.2 Similarity
  3.3 A Comprehensive Dissimilarity Testing Procedure
  3.4 Classification
  3.5 Conclusion
Chapter 4 A Knowledge-Based System Framework
  4.1 Introduction
  4.2 Databases as a Source of Basin Information
    4.2.1 Digital Terrain Models: Geographic Databases
    4.2.2 Raster versus Vector
    4.2.3 TRIM: a B.C. DTM
    4.2.4 Databases
  4.3 Artificial Intelligence
  4.4 Knowledge Representation and Knowledge-Based Systems
  4.5 Conclusion
Chapter 5 COMPARE: A Knowledge-Based System
  5.1 Introduction
  5.2 Information Gathering
    5.2.1 Data Sources
    5.2.2 MORPHCALC: a Knowledge-Base
  5.3 Information Analysis
    5.3.1 The Information Base
    5.3.2 COMPARE: a Sectional Knowledge-Base
    5.3.3 Statistics
  5.4 Using the System
  5.5 Conclusion
Chapter 6 Drainage Basin Comparison in the Queen Charlotte Islands
  6.1 Introduction
  6.2 Distribution Analysis
  6.3 Basin Dissimilarity Testing: Test 1, All Variables
    6.3.1 Dissimilarity Distributions
    6.3.2 Cluster Analysis and Proximity Relations
  6.4 Basin Dissimilarity Testing: Test 2, Using Specific Variables
    6.4.1 Principal Component Analysis
    6.4.2 Morphometry and Discharge
    6.4.3 Dissimilarity Distributions
    6.4.4 Cluster Analysis and Proximity Relations
  6.5 Discussion of Critical Dissimilarity
  6.6 Conclusion
Chapter 7 Conclusions
Bibliography
Appendix A Geomorphometry and Statistical Distributions
Appendix B Analysis of the Terrain Resource Information Management System (TRIM) Digital Terrain Model and Drainage Basin Morphometry
Appendix C COMPARE: Sections of the Knowledge-Base

List of Tables

Table 1 Optimum Sample Size Determined from Random Sampling of Topography in the Queen Charlotte Islands
Table 2 Drainage Basin Characteristics
Table 3 Comparison Between Raster and Vector Data
Table 4 Distribution Characteristics of Certain Morphometric Parameters
Table 5 Principal Component Analysis of 27 Variables
Table 6 Variables and Weights for Test 2
Table 7 Principal Components of Drainage Basin Variability

List of Figures

Figure 1 Mercer Lake Tributary Drainage Basin, Queen Charlotte Islands
Figure 2 The Effects of Increasing Sample Size on the Mean Elevation of the Mercer Lake Tributary Drainage Basin
Figure 3 Proportional Distancing and Network Order
Figure 4 Lake Drainage Length
Figure 5 Geology of the Queen Charlotte Islands
Figure 6 Fault and Fracture Measurements of the Queen Charlotte Islands
Figure 7 Precipitation and Soils on the Queen Charlotte Islands (after Valentine, 1978 and Hogan, 1985)
Figure 8 Digital Terrain Models
Figure 9 Database Types
Figure 10 Search Strategies
Figure 11 Components of the Knowledge-based System
Figure 12 Outline of MORPHCALC
Figure 13 Using the Knowledge-based System
Figure 14 Distributions of Two Morphometric Variables and their Transformations
Figure 15 Distributions of Dissimilarities: All Variables
Figure 16 Cluster Tree of the Dissimilarity of Drainage Basins Using All Variables
Figure 17 Cluster Tree of the Dissimilarity of Drainage Basins Using All Variables and Transformed Information
Figure 18 Proximity and Dissimilarity Relations for Two Basins, Test 1
Figure 19 Proximity and Dissimilarity Relations for Two Basins, Test 1 Using Geology as a Filter
Figure 20 Distributions of Dissimilarities of Test 2
Figure 21 Cluster Tree of Dissimilarities of Test 2 (Nontransformed Information)
Figure 22 Cluster Tree of Dissimilarities of Test 2 (Transformed Information)
Figure 23 Proximity and Dissimilarity Relations of Two Drainage Basins, Test 2
Figure 24 Distributions and Transformations of Morphometric Variables
Figure 25 Distributions and Transformations of Morphometric Variables
Figure 26 Distributions and Transformations of Morphometric Variables
Figure 27 Distributions and Transformations of Morphometric Variables
Figure 28 Distributions and Transformations of Morphometric Variables
Figure 29 Distributions and Transformations of Morphometric Variables
Figure 30 Distributions and Transformations of Morphometric Variables
Figure 31 Distributions and Transformations of Morphometric Variables
Figure 32 Distributions and Transformations of Morphometric Variables
Figure 33 TRIM Specifications
Figure 34 Compare.fnc
Figure 35 Sumdist.fnc
Figure 36 Geomdist.fnc
Figure 37 Topdist.fnc
Figure 38 Bgpdist.fnc
Figure 39 Histdist.fnc
Figure 40 Interval.fnc
Figure 41 Ratio.fnc
Figure 42 Filter.fnc

Acknowledgement

First and foremost, I would like to thank my supervisor, Dr. Michael Church, for all of his support, advice and encouragement throughout my stay at U.B.C. This project was funded by Steve Chatwin, Research Branch, British Columbia Ministry of Forests, through the Canada/British Columbia Fish-Forestry Interaction Program. It was also funded by Department of Geography teaching assistantships and NSERC summer stipends. TRIM data were provided by Dr. Rostam Yazdani of the British Columbia Ministry of Crown Lands. I would also like to thank Dr. Brian Klinkenberg, Dr. Graham Thomas, Dr. Phil Austin, Dr. Gordon McBean, Jamie Voogt, Dr. John Wolcott, Sandy Lapsky, Yanni Xiao, Dr. Sue Grimmond, and Dr. Catherine Souch for their unending help. Elaine Cho, Catherine Griffiths, and Shannon Sterling, my personal cartographers and ice cream buddies, produced many of the diagrams. Very special thanks to my family and good friends (Roy, Kelly, Lilis, Anne, Melissa, Henry and Rick) for all their love and tolerance through my two degrees. Last but not least, my endless gratitude and admiration go to Dan Hogan, British Columbia Ministry of Forests, who never failed to amaze me with his knowledge, endless support, and bottomless heart.

Chapter 1: Introduction

The phrase 'experimental method' has been somewhat controversial in geomorphology.
Some researchers argue that strict scientific method is difficult to achieve in the earth sciences, because it is practically impossible to account for, or control, all of the relevant variables (Ahnert, 1980; Church, 1984; Slaymaker, 1991). Geomorphologists have therefore had to adopt a rather loose definition of an 'experiment'. It encompasses precisely controlled field studies, 'before and after' studies and stratified/paired comparisons (Slaymaker, 1991).

Treatment-response studies, which may belong to any of the above categories, comprise a large portion of the 'experiments' conducted in geomorphology and hydrology. Two different approaches exist for such experiments. One method, Slaymaker's (1991) quasi-experiments, requires long-term observation of the 'natural system', the application of some treatment, and then inspection of any changes that occur. The other method, Slaymaker's (1991) hybrid experiments, requires two identical systems: one to be used as a control and the other for treatment purposes.

However, finding two 'identical' systems appears to be an impossible task. An extremely large number of environmental factors act on a landscape, so no two landscape units appear to be identical in every respect. It is therefore necessary to accept some degree of variation. Problems arise when trying to determine an acceptable degree of similarity for purposes of comparison.

Historically, this has entailed a subjective decision based upon the processes or features in question, within terrain which is technically and economically feasible to analyse. For drainage basins, the Wagon Wheel Gap study was probably the first attempt to determine 'similar' units from biogeophysical characteristics together with simple basin morphometry (Rodda, 1976). The study was begun in 1910 by C. Bates and A. Henry. The biogeophysical characteristics included geologic structure, precipitation and vegetation; the morphometric ones included elevation and steepness.
Since this historic study, numerous field 'experiments' have been carried out to examine changes caused by active treatment, such as the effect of land use on hydrology (e.g., urbanization: Leopold, 1972; forest harvesting: Riekirk, 1989). Others have examined passive treatment, such as the effect of geology on water quality (Dillon and Kirchner, 1975).

Many of these studies have encountered problems identifying similar units for paired comparisons. Hogan (1985) found that classical selection criteria, such as biogeophysical characteristics and basic morphometry, are inadequate for hydrologic purposes. Other researchers have concluded that greater emphasis must be placed on a more detailed study of landscape morphometry when determining terrain similarity (Melton, 1957; Zavoianu, 1985). Hewlett et al. (1969) summarized many of the resulting problems with experimental basins. They emphasised the basins' nonrepresentativeness and the consequent difficulty of transferring results from one landscape to another. They acknowledged, however, that the use of experimental drainage basins "has contributed valuable knowledge to [the] management of land and to the science of hydrology" (Hewlett et al., 1969, p. 313).

More recent research has analysed the relations among various morphometric variables (de Villiers, 1986; Tarboton et al., 1989). It has also classified similar landscape units on the basis of a detailed analysis of morphometry (Ebisemiju, 1986). These studies have quantified landscape comparison procedures more rigorously. Even so, greater emphasis needs to be placed on the range of variability in landscape characteristics, and on a measure of what constitutes acceptable 'similarity.'

An even more fundamental issue is the selection of an appropriate landscape unit for study, because it is necessary to know what feature is being characterised before it can be described accurately.
In fluvial landscapes (that is, in virtually all terrestrial landscapes), two basic geomorphic units can be identified: the hillslope and the stream channel reach. Many geomorphic processes, mass wasting and sedimentation for example, occur within these units. The drainage basin is a combination of hillslopes and channels. Horton established the erosional drainage basin as the fundamental geomorphic unit in many terrains because it appears to be (Chorley, Schumm, and Sugden, 1984):

1. a limited, convenient, usually clearly defined and unambiguous topographic unit; and
2. a physical process-response system open to material and energy transfer systems.

Drainage basins also provide a 'biological unit.' For example, the drainage networks encompassed within a basin may form aquatic ecosystems. More simply, drainage basins appear to exemplify all the usual landscape-forming processes.

If land units could be adequately characterized, knowledge and experience could be used with greater confidence for land management. It would also become possible to transfer results, and a basis would become available upon which to judge the similarity of basins.

This discussion identifies two major issues which will be the focus of this study. The first is to develop an operationally objective procedure for selecting comparable terrain units. The second is to create a procedure for recording landscape comparisons, so that experience with land management or terrain history and condition can be projected into another area with some confidence.

These problems will be analysed by:

1. identifying critical criteria for drainage basin comparisons and an objective method to effect the comparison;
2. determining efficient means to collect and manipulate the required data;
3. conducting regional studies within the Queen Charlotte Islands, British Columbia, to demonstrate the system and to determine the range of regional variability of critical criteria.

The last procedure will establish a basis for determining what constitutes similarity between landscape units.

Because of the large amounts of data which need to be manipulated, this project has been undertaken in a computerized environment. A number of concepts in computer science prove useful for this study, including databases, artificial intelligence and information systems. The term 'database' refers to the storage of data in digital form in a specifically chosen pattern. Artificial intelligence (AI) may be defined as "the development of computer programs that mimic in some way human activity" (Fisher et al., 1988). In geography, AI can be applied to spatial decision making and pattern recognition (Smith, 1984), two very important concepts in this study. 'Information system' will refer to a computerised method of manipulating data that combines basic computational programming and AI.

Of fundamental concern in this project is the use of terrain morphometric information extracted from basic land elevation data. A system is needed that may be employed throughout British Columbia. The Terrain Resource Information Management System (TRIM), 1:20 000 mapping of British Columbia, is therefore used as the basic source of elevation and drainage data.

The Queen Charlotte Islands have been chosen as the location of the study because of on-going research in the area in the Canada/British Columbia Fish-Forestry Interaction Program (FFIP). The results of this study are expected to benefit the other studies undertaken by FFIP. They provide a basis for projecting research results and management experience from selected areas to other areas in the region.

Chapter 2 presents background information concerning drainage basin description. The description of basins using biogeophysical, morphometric, and historical parameters is examined in the context of information available for the Queen Charlotte Islands.

Chapter 3 discusses the relevant procedure for drainage basin comparisons. A comprehensive procedure for analysing similarity is developed, incorporating the use of nominal and ordinal information and a Euclidean distance measure. These similarities are also examined using cluster analysis.

Knowledge-based systems are defined and discussed in chapter 4. Different types of knowledge, artificial intelligence, databases and digital terrain models are examined in order to provide a framework for computerised basin comparison.

Chapter 5 presents the knowledge-based system to be used in the Queen Charlotte Islands. The components of the system, predominantly knowledge and information bases, are described in detail. An account of the operating procedure of the system is also presented.

Chapter 6 applies the knowledge-based system to drainage basins on the Queen Charlotte Islands. Sixty-five drainage basins are analysed with respect to biogeophysical, morphometric and historical characteristics. Two different tests are described: the first uses all available characteristics; the second uses only those characteristics deemed 'important.'

Chapter 7 presents the conclusions and recommendations for further research.

Chapter 2: Drainage Basin Description

2.1 Introduction

The purpose of this chapter is to discuss the characteristics used for drainage basin description. Much of this discussion focuses on sampling issues, because they are important to many aspects of this study.

A major basic unit of landform analysis, the erosional drainage basin, has been recognized as a viable landscape process-response element since the beginning of the 19th century.
Playfair (1802) spoke of the nice adjustment of the system of valleys communicating with the main trunk of a stream. Gilbert (1877, cited in Rodda, 1976) referred to the interdependence throughout the system, which leads to a 'dynamic equilibrium' affecting all drainage lines and their flanking slopes. W. M. Davis (1899) defined a river as extending throughout the basin up to its divides. This idea was given its more modern meaning by R. E. Horton, who in 1945 described the morphometry of drainage basins. Horton showed how the morphometric features are interrelated and attempted to rationalise them on the basis of hydrological process. His analysis opened the way for quantitative comparisons.

Descriptions of drainage basins have generally been limited to biogeophysical parameters, many of which are qualitative. They include such features as geology, soils, vegetation, and climate, which are dominant controls of many of the processes within the basin. Drainage basin and channel morphometry may need to be related to the geologic, climatic, and hydrologic character of the basin. If so, features must be described quantitatively in order to investigate these relations (Chorley, Schumm, and Sugden, 1986).

2.2 Geomorphometry, Sampling and Representation

Erosional terrain commonly presents a complex geometric surface. Such surfaces are usually described by sampling certain supposedly diagnostic geometric variables (e.g., elevation, gradient, area). These variables are then used to generalise those features of landscape geometry which can be rationalised in terms of formative process or history, or which are useful in some practical sense.

The science of landscape morphometry is concerned with the quantitative measurement and generalisation of land surface geometry. For a drainage basin and its related stream network, two types of parameters can be identified:

1. geometric: dealing with linear and areal measurements, e.g., area, perimeter length, circularity;
2. topologic: dealing with positional relations among objects, mainly relating to the stream network, e.g., bifurcation and length ratios.

Many parameters are used in geomorphometry. Some are direct, single measurements, such as area and relief. Others, such as mean aspect and mean gradient, require sampling and are therefore prone to sampling variability and bias.

Four important parameters in geomorphometry that require sampling are elevation, aspect, gradient, and the elevation-relief ratio. The elevation-relief ratio incorporates mean elevation and expresses the relative proportion of upland to lowland within a sample region:

$$ER = \frac{X_{\mathrm{mean}} - X_{\min}}{X_{\max} - X_{\min}}$$

Here $X_{\mathrm{mean}}$, $X_{\min}$ and $X_{\max}$ are the mean, minimum and maximum sampled elevations. ER is equivalent to the hypsometric integral (Pike and Wilson, 1971).

Elevation is an important feature of basin topography. Within the basin, it represents potential energy. On the regional scale, it has a dominant influence on surface temperature and a major one on rainfall and snowfall. The gradient of a slope influences flow rates of water and sediment in two ways. It controls the rate of energy expenditure available to drive flow, and it sets the downslope component of the forces that move earth materials. Aspect defines the slope direction. On the local scale, it becomes a dominant control of precipitation and surface temperature variation, and it controls the direction of flow.

Optimum sampling to determine these parameters involves two considerations: sample size and sample pattern (Ayeni, 1982). Sample size within a fixed area, such as a drainage basin, includes the concepts of sampling density and scale of resolution. Patterns, lack of patterns, or peculiar data characteristics may or may not be portrayed in the sample, depending on the sampling strategy. Many authors (Pike and Wilson, 1971; Evans, 1979; Ayeni, 1982) believe that systematic (grid) sampling is one of the most effective sampling methods.
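The elevation-relief ratio defined above is simple to compute from a list of sampled elevations. The sketch below is a minimal Python illustration; the function name and sample values are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def elevation_relief_ratio(elevations: Sequence[float]) -> float:
    """ER = (mean - min) / (max - min) for a set of sampled elevations.

    Values near 1 indicate a region dominated by upland,
    values near 0 a region dominated by lowland.
    """
    lo, hi = min(elevations), max(elevations)
    if hi == lo:
        raise ValueError("elevation range is zero; ER is undefined")
    return (fmean(elevations) - lo) / (hi - lo)


# Hypothetical sample of grid elevations (m).
sample = [120.0, 180.0, 240.0, 300.0, 660.0]
print(round(elevation_relief_ratio(sample), 3))  # 0.333
```

The mean, rather than the median, enters the ratio. The skewness discussed below therefore affects ER directly.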
Yates (1949, cited in Evans, 1979) showed that a grid is the most efficient sampling scheme for an autocorrelated series, such as elevation. To represent a surface, Ayeni (1982) showed that systematic sampling is the optimum sampling pattern for many different surface types.

Choosing the appropriate sample size, $n$, is a more difficult task, and a variety of methods may be used. One method requires choosing three quantities: a desired precision, $E$; an assumed population standard deviation, $s$; and the value of Student's $t$ for the chosen confidence level with $n-1$ degrees of freedom, $t_{(n-1)}$. Then:

$$n = \left(\frac{t_{(n-1)}\, s}{E}\right)^{2}$$

This calculation assumes, however, that the sampled distribution is not too skewed. Many morphometric variables are strongly skewed over a landscape (see appendix A). They may become more nearly normal for regions smaller than the size dictated by the regional 'grain'.

Non-normality of the distribution of some of the morphometric variables has been neglected in many previous studies. Gradient (Gerrard and Robinson, 1971; O'Neill and Mark, 1987) and elevation (Wood and Snell, 1957) have been found to be non-normally distributed. Mean values are generally reported in the literature, but because of skewness these may not be adequate representations of central tendency. Wood and Snell (1957) note that "the strong skewness of geomorphic data indicated by the spacing between means and medians calls for a kind of analysis different from that used for normal distributions" (p. ii). In this study, arithmetic means are used to represent samples of elevation and gradient. Geometric means are probably a more representative measure of central tendency, but arithmetic means allow comparisons with previous studies.

Ayeni (1982) introduced two more complex methods to determine the optimum sample size for different surface types. The first is to calculate the mean harmonic vector magnitude, a terrain roughness parameter which incorporates the size of the entire area.
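Because $t_{(n-1)}$ itself depends on $n$, the sample-size formula above is implicit and is normally solved by iteration. The sketch below finds the smallest $n$ that satisfies $n \ge (t_{(n-1)} s/E)^2$. To stay within the Python standard library, it approximates the $t$ quantile with the Cornish-Fisher expansion about the normal quantile. That approximation and the function names are assumptions of this example, not part of the method described above. An exact $t$ quantile from a statistics package could be substituted.

```python
import math
from statistics import NormalDist


def t_quantile(p: float, df: int) -> float:
    """Approximate quantile of Student's t with `df` degrees of freedom.

    Cornish-Fisher expansion about the normal quantile z, keeping the
    1/df and 1/df**2 terms. Good for moderate or large df; crude for df < 5.
    """
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


def required_sample_size(s: float, precision: float, confidence: float = 0.95) -> int:
    """Smallest n satisfying n >= (t_(n-1) * s / E)**2 for a two-sided interval."""
    p = 1 - (1 - confidence) / 2                      # 0.975 for 95 % confidence
    z = NormalDist().inv_cdf(p)
    n = max(2, math.ceil((z * s / precision) ** 2))   # normal-theory lower bound
    # t_(n-1) exceeds z, so the normal value can only be too small: step upwards.
    while n < (t_quantile(p, n - 1) * s / precision) ** 2:
        n += 1
    return n


# Hypothetical inputs: s = 100 m, desired precision E = 10 m, 95 % confidence.
print(required_sample_size(100.0, 10.0))  # 387 (the normal approximation alone gives 385)
```

The loop only ever increases $n$. This is safe because the $t$ quantile shrinks as degrees of freedom grow, so the first $n$ that satisfies the inequality is the answer.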
The second method uses multiple linear regression on various parameters of terrain roughness, such as gradient and curvature. Both methods achieve quite similar results: more complex surfaces require a larger number of points to represent them adequately.

An Empirical Approach to Sampling

The following approach was adopted in this study to determine empirically an appropriate sampling strategy for the Queen Charlotte Islands. The strategy had to yield accurate measurements of mean elevation, mean aspect, mean gradient, and the elevation-relief ratio. An empirical strategy was used: determining the sample size at which sampling statistics stabilise. It accommodates the extreme complexity of surface topography in the Queen Charlotte Islands. Many variables needed for theoretical calculations, e.g., population standard deviation, surface type, and an estimate of error, are not accurately known.

To study the complexity of the basin surface, the 'grain' of topography was measured using the concentric circle method outlined in Wood and Snell (1960). Mark (1975) defines the 'grain' of topography as "the shortest significant wavelength of topography". It may be thought of as the upper size limit of landscape units suitable for manipulation. Beyond this limit, other factors that vary among landscape units, such as geology, soils and vegetation, are apt to preclude well-controlled study. Areas larger than the grain (approximately 4 km using Wood and Snell's technique) are assumed to be topographically more complex than areas smaller than it.

Two small areas were chosen arbitrarily, by visual inspection, from within a 29 km² drainage basin (Mercer Lake tributary): a 2.05 km² subbasin and a 1.05 km² uniformly sloping area. They were used to analyse the variability of the relevant parameters on less complex topographic surfaces and to determine an appropriate sample size. Random sampling was used in both areas.

Elevation was measured at 1 mm grid intervals (50 m on the ground) from the Mercer Lake tributary drainage basin, Queen Charlotte Islands, B.C. (figure 1). The source was map sheet NTS 103F/10W/11E (1:50 000), and the grid yielded 11 645 data points. Aspect and gradient were calculated at each point which had eight surrounding data points, creating 10 748 data values (the method is outlined in Sharpnack and Akin, 1969; Evans, 1979). Means and variances of gradient, elevation, and aspect were calculated, as well as correlations among the three parameters. Aspect statistics used vector addition and dispersion calculations. Random samples of various sizes were then drawn by computer, without replacement, from the elevations and their associated aspect and gradient values. The means, variances, and correlations of the parameters were calculated for each sample. The elevation-relief ratio was calculated from the mean elevation of each sample.

Figure 2 shows the effect of sample size on the reported mean elevation of the main basin. The optimum sample size may be thought of as the size beyond which any further increase in the sample would not significantly affect the value of the sample statistic (here, the mean). The results of the sampling are given in table 1. The optimum sample size is clearly strongly influenced by the complexity of the surface, as represented by the grain of topography. This is true for the mean values as well as the variance. For areas of less than 30 km² (that is, with characteristic dimensions of less than 5.5 km), a sample size of 200 appears sufficient to produce an adequate measure of the means.

Grain is just one measure of complexity at a certain scale.
It gives a measure of a 'dominant' wavelength; there may be other levels of complexity at other scales.

[Figure 1: Mercer Lake Tributary Drainage Basin, Queen Charlotte Islands (solid black line represents basin boundary; contours are in feet; scale 1:61,500)]

[Figure 2: The Effects of Increasing Sample Size on the Mean Elevation of the Mercer Lake Tributary Drainage Basin (mean elevation plotted against sample size)]

2.3 The Drainage Network and the Problem of Lakes

Drainage network parameters are used in most studies for which drainage basin characteristics are required. Some parameters, such as drainage density and main channel length, are used only for description. Others, such as network order and magnitude, are also used to categorise basins into groups. The description of channel networks has been well summarised in previous studies (Scheidegger, 1967; Jarvis, 1977).

Lakes are prominent features of most glaciated landscapes, which cover some 30% of the earth's present land area. However, these important features are usually neglected when geomorphometry is used to describe basin properties (cf. the work of Mather and Doornkamp, 1970; Gardiner, 1973; Mark, 1975; de Villiers, 1986). This creates a major problem when trying to create a 'standard method' for the morphometric description of drainage basins. Main channel order and drainage density are two parameters which need to be reevaluated with respect to lakes.

Table 1 Optimum Sample Size Determined from Random Sampling of Topography in the Queen Charlotte Islands

Parameter             Main basin    Sub-basin     Uniform area
                      (29.1 km²)    (2.05 km²)    (1.05 km²)
Mean elevation        200           75            90
Elevation variance    100           75            40
Mean gradient         200           90            Not calculated
Gradient variance     200           100           Not calculated
Mean aspect           100           25            40
Aspect variance       200           50            80
E-R ratio             150           75            100
Maximum               200           100           100

Drainage density refers to the length of stream channel per unit area. This measure excludes 'drainage lengths' that may occur as the network passes through a lake. Lakes cannot readily be incorporated into drainage density, because the dimensions of the two features, length and area, differ. Also, when water arrives in a lake it raises the water level, producing an immediate response at the outlet. This is not conceptually comparable to the behaviour of a drainage link. Furthermore, rivers tend to increase erosion and lakes tend to decrease it, so the two have very different effects within the basin.

A new measure, $\gamma$, is therefore suggested: the area of lake cover divided by the total area of the basin. Using this measure, the drainage density is adjusted to take only the 'dry land' area into account:

$$D_d = \frac{\sum_i L_i}{(1-\gamma)\, A_d}$$

Here $L_i$ is the length of drainage link $i$ (so $\sum_i L_i$ is the total channel length), and $A_d$ is the area of the basin.

Another variable which needs to be considered in the context of lakes is mean gradient. In this study, mean gradient takes into account only those samples which originate from the 'dry land' surface. The basin gradient, $S_{\mathrm{basin}}$, can be adjusted to incorporate the lake areas, which have zero gradient, as follows:

$$S_{\mathrm{basin}} = (1-\gamma)\, S_{\mathrm{land}} + \gamma \cdot 0 = (1-\gamma)\, S_{\mathrm{land}}$$

A number of researchers (cf. Gardiner, 1973) have used network order as a network-defining parameter. Network order depends upon the number of links and the topology of the network.
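The two lake corrections above reduce to a few lines of arithmetic. The sketch assumes link lengths in km and areas in km², so that drainage density comes out in km⁻¹. The function names and values are hypothetical.

```python
from typing import Sequence


def lake_fraction(lake_area: float, basin_area: float) -> float:
    """gamma: proportion of the basin area covered by lakes."""
    return lake_area / basin_area


def drainage_density(link_lengths: Sequence[float], basin_area: float,
                     gamma: float = 0.0) -> float:
    """Dd = sum(L_i) / ((1 - gamma) * A_d): channel length per unit dry-land area.

    Link lengths should exclude any flow path through lakes."""
    return sum(link_lengths) / ((1.0 - gamma) * basin_area)


def basin_gradient(land_gradient: float, gamma: float) -> float:
    """S_basin = (1 - gamma) * S_land, treating lake surfaces as having zero gradient."""
    return (1.0 - gamma) * land_gradient


# Hypothetical basin: 30 km2 with 3 km2 of lakes and 54 km of channel links.
g = lake_fraction(3.0, 30.0)
print(drainage_density([20.0, 18.0, 16.0], 30.0, g), basin_gradient(20.0, g))
```

With $\gamma = 0$ both functions reduce to the conventional measures, so lake-free basins are unaffected by the correction.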
An assumption underlying the determination of network order is that there will never be more than two upstream links joined at any junction. This assumption is violated when there are more than two inlets into a lake. It is therefore difficult to calculate the order for a network that includes lakes. Topologic models for drainage networks, with and without lakes, can produce binary strings which may be used to describe the network (Mark and Goodchild, 1982). These strings, however, are much more specific than network order, since the topology can be determined directly from them. Comparing these strings proves to be something of a problem, and there seems to be no straightforward way of combining them into something similar to a stream order. An arbitrary but justifiable technique is needed which circumvents the problem of three or more inlets.

This can be achieved by using 'proportional distances' along the banks of a lake to arbitrate the order of channel junctions. First, the main inlet and the outlet are identified; the main inlet is found using either Horton's method of longest channel or Strahler's method of greatest order. Then the distances along the lakeshore banks between these two links are calculated. Proportionally equal distances are travelled simultaneously along each bank, and the channels are incorporated into the network in the order in which they are encountered. In the case of a tie, the highest order is added first (figure 3).

An even more challenging problem is determining the drainage path through a lake using the perimeter as a guide. Manually, this may be done easily by using a central path through the lake between the inlet and the outlet. Automating this procedure accurately requires a massive number of calculations and tests. A somewhat 'adequate and rough' pathway can be identified using the proportional distance method.
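The inlet-ordering rule described above can be sketched as follows. This is a minimal Python illustration under assumed inputs, not the thesis's own implementation. Each subsidiary inlet is a hypothetical `Inlet` record holding its bank, its distance along that bank from the main inlet, and its stream order. Each bank's length is assumed to be measured from the main inlet to the outlet. An inlet's proportional distance is its along-bank distance divided by its bank's length. Travelling proportionally equal distances on both banks is therefore equivalent to sorting by this proportion, with higher order breaking ties.

```python
from dataclasses import dataclass


@dataclass
class Inlet:
    """Hypothetical record for a subsidiary inlet on a lake."""
    name: str
    bank: int        # 1 or 2
    distance: float  # along-bank distance from the main inlet
    order: int       # stream order of the inlet channel


def join_order(inlets, bank_lengths):
    """Return inlet names in the order in which they join the network.

    bank_lengths maps bank number -> length of that bank between the
    main inlet and the outlet.
    """
    def key(inlet):
        proportion = inlet.distance / bank_lengths[inlet.bank]
        # Increasing proportional distance; on a tie the higher order
        # comes first, hence the negated order.
        return (proportion, -inlet.order)

    return [inlet.name for inlet in sorted(inlets, key=key)]


inlets = [
    Inlet("A", bank=1, distance=3.0, order=1),
    Inlet("B", bank=2, distance=2.0, order=1),
    Inlet("C", bank=1, distance=1.0, order=2),
    Inlet("D", bank=2, distance=4.0, order=3),
]
# Bank 2 is twice as long as bank 1, so B and C are both one quarter of the way along.
print(join_order(inlets, {1: 4.0, 2: 8.0}))  # ['C', 'B', 'D', 'A']
```

B and C tie at a proportional distance of 0.25. C joins first because its order is higher.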
At proportionally equal intervals along each bank, a line is determined which connects the two points. At a relative distance along this line, a point is determined which is assumed to represent the drainage path, provided it lies within the boundaries of the lake (figure 4).

[Figure 3 Proportional Distancing and Network Order. The figure shows a lake with a main inlet, an outlet, and subsidiary inlets A to D on two banks. d_1i is the distance between nodes i and i+1 on bank 1, and d_2j is the distance between nodes j and j+1 on bank 2. The proportional distances to the subsidiary inlets are sorted, and the inlets are joined to the network in order of increasing proportional distance from the main inlet.]

The topology of a river network which includes lakes also has a special significance. Lakes tend to have a greater effect on the drainage system if they are larger or nearer the mouth of the basin. Therefore, the 'lake index' has been devised:

$$LI = \frac{1}{M_b A_b}\sum_{j=1}^{n} M_j A_j$$

where M represents magnitude, A represents area, b refers to the basin, j refers to the jth lake and n is the total number of lakes in the basin. If there are no lakes, LI is 0. If lakes occur only as headwaters, LI approaches 0, and the lakes have little or no effect. Magnitude is used as an indicator of where in the drainage system the lake occurs.

2.4 Biogeophysical Parameters

Another category of basin descriptors encompasses the biogeophysical characteristics. Many of these parameters are qualitative, and comparisons tend to be made on a subjective level. Classification schemes are affected by the number of classes and the relations among the classes.
The decision on the number of classes is important because it directly affects the precision of discrimination achieved by the scheme.

[Figure 4 Lake Drainage Length. B1 = Σ d_1i is the length of bank 1, and B2 = Σ d_2j is the length of bank 2. U1 and U2 are the unit lengths of banks 1 and 2; they depend upon the number of points needed to represent the drainage path. The drainage path intersects line L at a distance m from bank 2, where m = (0.5 B2/B1) L and L is the length of the line.]

Geology

The geology of the Queen Charlotte Islands, as described in Sutherland-Brown (1968), is too complex for individual formations to be used as geologic descriptors. A classification scheme must therefore be devised. The geochronological scheme used by Banner et al. (1983) is not adequate for basin comparison purposes because it is based on time of rock formation. Age may not relate directly to the physiographically significant characteristics of the rock or to the characteristics of the drainage basin. More important are the type of rock and its structural characteristics, which directly affect erosional patterns. Softer rocks, such as limestone, will erode more quickly than harder rocks, such as granite. However, the structure of the rock must also be taken into account. "Faults are one of the most prominent geologic features of the islands" (Sutherland-Brown, 1968, p. 147). Alley and Thompson (1978) note that bedrock structures such as joints and faults have exerted considerable control over the course of erosion. As a consequence, drainage patterns may be rectangular or trellised. Undoubtedly, the orientation of larger lakes and fjords resulted from glacial scouring of major structural features (Alley and Thompson, 1978, p. 6).
The effects of the joints and faults can be seen in the volcanic rocks of the Yakoun Formation in the Alley and Thompson (1978) study area. The area is heavily jointed and strongly fissured. This has facilitated deep penetration of groundwater, promoting weathering both to considerable depths and generally throughout the rock mass. Because of its weathering characteristics, this rock unit is highly unstable in their study area. In other, less heavily fractured parts of Graham Island, however, the unit may be relatively stable.

Therefore, a new classification scheme appropriate to geomorphic concerns (figure 5) is proposed, based upon rock type (adapted from Banner et al., 1983):

1. Volcanics
   a. 'soft' volcanics: Masset, Yakoun formations
   b. 'hard' volcanics: Karmutsen Formation
2. Sedimentary
   a. conglomerate/sandstone: Honna, Haida formations
   b. siltstone/shale: Skidegate, Longarm, Maude formations
   c. pure and impure limestones: Kunga Formation
3. Intrusives
4. Quaternary deposits

[Figure 5 Geology of the Queen Charlotte Islands (adapted from Sutherland Brown, 1968; Banner et al., 1983). Generalized geology is mapped in the following classes: soft and hard volcanics; sandstone/conglomerate, silt/shale, and pure and impure limestones (sedimentary); intrusives; and Quaternary deposits. Scale in kilometres.]

Geological structure data, although somewhat generalised, are obtained from Sutherland-Brown (1968). Orientations of fractures and faults have been measured and summarised (figure 6). These results will be used as a source of rock strength information.

Soils

Three types of soils predominate on the Queen Charlotte Islands (figure 7). Dominantly organic soils, such as Fibrisols, Mesisols and Humisols, are found on the north and northeast coast of Graham Island. These acidic soils occur where the decay of organic residues is inhibited by a lack of oxygen caused by submersion or saturation (Valentine et al., 1978).
Podzolic soils are found along the southwest coast of Graham Island and over most of Moresby Island. These ferro-humic podzols are moist to wet for most of the year and rarely freeze to any significant depth. The main soil development process is the accumulation of complexes of amorphous organic matter, iron and aluminum, producing soils with exceptionally strong B horizons. These soils tend to be medium to coarse textured and generally lack horizons in which clay has accumulated. Leaching is intense (Valentine et al., 1978). Little information was available for this study on the spatial variation of the soils at the basin scale.

[Figure 6 Fault and Fracture Measurements of the Queen Charlotte Islands: orientation of faults and linears by 15' quadrangles. Scale in miles.]

[Figure 7 Precipitation and Soils on the Queen Charlotte Islands (after Valentine, 1978 and Hogan, 1985). Soil classes: dominantly organic soils; dominantly podzolic soils (poorly drained); dominantly podzolic soils. Mean annual total precipitation: zone 1, 1260-1370 mm/yr; zone 2, 1665-1765 mm/yr; zone 3, 2035-2225 mm/yr; zone 4, >3665 mm/yr. Scale in kilometres.]

Vegetation

The Queen Charlotte Islands are predominantly forested, and four biogeoclimatic zones dominate the islands. Tsuga heterophylla, Thuja plicata and Picea sitchensis dominate the leeward, low-elevation forests of the Coastal Western Hemlock zone. Chamaecyparis nootkatensis and Pinus contorta join Tsuga, Thuja and Picea in the windward, hyperoceanic Coastal Cedars-Pine-Hemlock zone. Blanket bogs and bog woodlands are extensive in the low-elevation forests of this zone. The subalpine Mountain Hemlock zone includes forests of Tsuga mertensiana, C. nootkatensis and T. heterophylla, as well as dwarf evergreen shrub, herb meadows, scrub and rocky steeplands.
The Alpine zone is not extensive but includes three broad types of ecosystems: herb meadows, scrub and rocky steeplands (Banner et al., 1983).

On the Queen Charlotte Islands, most effects of vegetation on the landscape are due not to differences in tree species but to differences in the type and age/size of vegetation (Margaret North, pers. comm.). Examples are forest versus meadow, or 2-year-old alder versus 20-year-old alder. Forest age, canopy cover and vegetation type (forest, meadow, swamp/bog, etc.) must therefore be examined, as they provide major controls in the basin. These data may be obtained for certain areas of the Queen Charlotte Islands from British Columbia Ministry of Forests Forest Cover maps.

Climate

Climate is a dominant control upon drainage basin processes. Precipitation affects runoff and erosion, temperature affects weathering, and together they influence vegetation.

Few climate data are available for the Queen Charlotte Islands, and most of the available information has been modelled. Hogan (1985) has analysed the precipitation patterns, and his precipitation diagram (figure 7) is used as a source of information in this study. Temperature is not used because the data are not available; there are only two major weather stations on the Queen Charlotte Islands. However, the hypermaritime environment produces a cool climate year-round, with a notable absence of extreme temperatures throughout the islands. Monthly means at Tlell varied only between -2.2°C and 16.5°C during the period 1957 to 1987.

2.5 Historical Parameters

Certain historical parameters are major influences on the landscape. Mass wasting and fires may change basin morphometry and vegetation characteristics if large enough events occur.

Mass Wasting

Mass wasting is a major erosional process in many areas of the Queen Charlotte Islands because of the large amounts of precipitation and the erodible geologic formations. Several inventories of these events have been conducted on the Queen Charlotte Islands.
Some inventories focus on particular regions, such as Rood's (1984) data analysis of Rennell Sound and northern Moresby Island. Others are more comprehensive, such as the inventory recorded by Gimbarzevsky (1988). The latter is used as the source of data for this study because it is the most areally extensive available. Events are measured over 1 km² UTM grid cells.

Fire History

Historically, natural fires have played a minor role in the ecology of the Queen Charlotte Islands. Only four lightning-caused fires were recorded from 1940 to 1982, and none was larger than 0.1 ha (Pearson, 1963; Parminter, 1983; Banner et al., 1984). There are several large areas of human-caused fires. Some were probably related to land clearing for settlement. Others date back to the 1940s and 1950s, when there were several post-logging fires (Banner et al., 1984). Today, both natural and human-caused forest fires are rare, and their extent is severely limited by modern fire suppression techniques.

2.6 Conclusion

A summary of the drainage basin descriptors used in this study is shown in Table 2. Many types of characteristics are used in geomorphology to describe drainage basins. The three major categories are geomorphometric, biogeophysical and historical. For the Queen Charlotte Islands, geomorphometric information seems to be the most available and areally comprehensive, while biogeophysical and historical information is limited. It is becoming increasingly common for researchers to collect detailed quantitative information on drainage basins. Such information supports a statistically more rigorous basin similarity testing procedure.

Table 2 Drainage Basin Characteristics

Parameter                      Units           Data source                    Information type   Test type
LANDSCAPE
Geology
 1. rock type                  -               geological maps                nominal            filter
 2. rock structure             radians         cf. Sutherland-Brown           ratio              parameter
Vegetation
 3. age                        yr              forest cover maps              ratio              parameter
 4. cover extent               -               as in (3)                      ratio              parameter
Climate
 5. total precipitation        -               AES; Hogan (1985)              ordinal            filter
GEOMETRY
 6. area                       km²             TRIM files                     ratio              both
 7. perimeter                  km              as in (6)                      ratio              parameter
 8. mainstem length            km              as in (6)                      ratio              parameter
 9. geometric shape factor     -               calculation from basic data    ratio              parameter
10. relief                     m               as in (6)                      ratio              parameter
11. drainage density           km km⁻²         as in (9)                      ratio              parameter
12. long profile concavity     -               as in (9)                      ratio              parameter
13. valley flat area           km²             as in (9)                      ratio              parameter
14. valley flat length         km              as in (9)                      ratio              parameter
15. valley flat avg. width     km              as in (9)                      ratio              parameter
16. steepland area             km²             as in (9)                      ratio              parameter
17. ruggedness number          -               as in (9)                      ratio              parameter
18. elevation relief ratio     -               as in (9)                      ratio              parameter
19. channel slope              degrees         as in (9)                      ratio              parameter
20. aspect                     degrees         as in (9)                      ratio              parameter
21. lake ratio                 -               as in (9)                      ratio              parameter
22. circularity                -               as in (9)                      ratio              parameter
23. basin length               km              as in (9)                      ratio              parameter
24. ruggedness number          -               as in (9)                      ratio              parameter
TOPOLOGY
25. bifurcation ratios         -               as in (9)                      ratio              parameter
26. length ratios              -               as in (9)                      ratio              parameter
27. main channel order         -               as in (9)                      interval           both
28. network magnitude          -               as in (9)                      interval           parameter
29. channel numbers, by order  -               as in (9)                      interval           parameter
30. lake index                 -               as in (9)                      ratio              parameter
HISTORY
Mass Wasting
31. mass wasting               no. of events   Gimbarzevsky                   ordinal            filter

Chapter 3: Drainage Basin Comparisons

3.1 Introduction

"The theory of the paired catchment experiment is basically simple, but has been widely questioned because no thorough treatise of the method has ever been published" (Hewlett, 1971, p. 377). This statement may refer to the fact that no method of comparing two drainage basins has been formalised. Various procedures have been used to determine basin similarity, but no single method has become dominant, and these procedures are rarely explained in published documents.
It is important to note that this discussion is based upon 'characteristically' similar rather than 'process' comparisons; the latter are rarely available.

Description of basins usually involves three types of parameters: morphometric, biogeophysical and historical. Historical parameters, such as mass wasting events and fire history, are not widely used but may be important, depending upon the nature of the study.

Comparison techniques have generally been quite subjective. Currently, there is no standard method for determining whether two or more basins are similar, or for establishing their degree of similarity. Procedures have been based upon simple dissimilarity statistics, the researcher's judgement, and time and cost considerations. These techniques may identify the 'most similar' basins. However, the degree of similarity may not always be adequate for the particular study, and it cannot always be made objective enough to permit assured judgements.

Few studies using basin comparisons have incorporated a critical analysis of similarity. Some researchers assume that adjacent or closely situated basins are similar. Swindel and Douglass (1984, p. 305) described paired watershed experiments as using "two forested watersheds in close proximity to one another and with similar size, topography, vegetation, and soils." Hewlett (1971, p. 377) suggested that for studies analysing the effects of vegetation on water yield, "the basins should be similar in size, shape, geology, exposure and elevation, and at the start they should have been under the same land use or vegetal cover for a number of years." Riekerk (1989) analysed the effects of silvicultural practices on hydrology in three basins, one of which was a control. Climate and vegetation, which are dominant controls of hydrology, were "consistent" over the three basins. Soil characteristics and structure, which are important factors in hydrologic response, were not.
Below sandy Paleudults and Haplaquods is a clay layer of low permeability. This layer is 2.5 m thick in some areas but "thin or absent" in others. Nor was watershed morphometry mentioned, although it influences hydrologic response (the amount and timing of runoff). Cronan et al. (1990) compared the aluminum biogeochemistry of two watersheds in the eastern United States. The two watersheds were of similar size, 94 and 74 ha, but their climate, soil type and composition, and vegetation cover were fairly different. One of their aims was to "determine the key factors controlling interregional variations in the transport of Al in soil solutions and surface waters" (Cronan et al., 1990, p. 1413). However, the two basins varied widely in several potentially controlling variables. The results of this study are therefore not conclusive, though they have been explained by reference to previous studies and theory. The same applies to the study of Riekerk (1989), where the variation in soil structure and morphometry leaves some uncertainty in the results.

Hogan (1985) provided one of the few detailed descriptions available of drainage basin characteristics and comparison procedures for paired experiments. He used biogeophysical parameters, including soils, geology, vegetation, precipitation and temperature. He also incorporated a detailed analysis of twelve morphometric parameters to determine comparable basins. Hogan proposed that the most similar basins will be sub-basins within a larger main basin. He suggested that determining the morphometric similarity of two sub-basins requires considering the variability of the parameters among several sub-basins within one major basin. For purposes of selecting comparable sub-basins, a pair is assumed to be similar if the morphometric ratios do not differ by more than one standard deviation.
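Hogan's selection rule is stated only in words, so the following Python sketch encodes one possible reading of it. It should not be taken as Hogan's own procedure. Under this reading, a pair is treated as similar on a parameter when the two values differ by no more than that parameter's standard deviation across all sub-basins of the main basin. The function name and dictionary layout are hypothetical. The sketch also returns the per-parameter ratios and their mean, which can be compared against a target value of 1.0.

```python
from statistics import mean, stdev


def sd_rule_similar(basin_a, basin_b, sub_basins):
    """One reading of the one-standard-deviation rule (hypothetical helper).

    basin_a, basin_b: dicts of morphometric parameter -> value.
    sub_basins: list of such dicts for all sub-basins of the main basin,
    used to estimate each parameter's standard deviation.
    Returns (ratios, per-parameter flags, all flags true, mean ratio).
    """
    ratios, flags = {}, {}
    for name in basin_a:
        sd = stdev([b[name] for b in sub_basins])  # needs two or more sub-basins
        ratios[name] = basin_a[name] / basin_b[name]
        flags[name] = abs(basin_a[name] - basin_b[name]) <= sd
    return ratios, flags, all(flags.values()), mean(list(ratios.values()))


subs = [
    {"area": 2.0, "relief": 300.0},
    {"area": 2.5, "relief": 350.0},
    {"area": 3.0, "relief": 420.0},
    {"area": 1.5, "relief": 280.0},
]
ratios, flags, similar, mean_ratio = sd_rule_similar(subs[0], subs[1], subs)
print(similar)  # True: both differences fall within one standard deviation
```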
Alternatively, pair similarity can be determined by calculating the mean of all the ratios for each pair and comparing it to a target value of 1.0. This approach treats all basin parameters as equally important. If they are not, the specific basin parameters must be weighted before the mean is calculated. The ratios themselves give 'characteristic-by-characteristic' comparisons. A single summary measure requires the mean or the euclidean distance statistic.

Further research by Hogan (1988) incorporates a dissimilarity index,

$$D = \left[\sum_{i=1}^{n}\left(\frac{x_{i1}}{x_{i2}} - 1\right)^{2}\right]^{1/2}, \qquad D = 0 \text{ for identity}$$

where D is the dissimilarity, x_i1 is the ith characteristic of basin 1 and x_i2 is the ith characteristic of basin 2. D represents the n-dimensional scaled euclidean distance between a study basin and a control basin.

Little discussion is available in Hogan, or in other studies, of how qualitative characteristics (e.g., geology, soils) were judged to be similar. Similarity is presumably based upon the coincidence of simple classifications.

Clearly, one of the significant problems in the paired basin experiment is determining 'similarity'. Since no two watersheds are identical in every respect, some degree of dissimilarity must be accepted. Previous procedures have ranged from very subjective decisions based upon a few biogeophysical parameters to statistical analyses of morphometry. Similar 'regions' have generally been determined more rigorously using cluster analysis (Rayner, 1966; Mather and Doornkamp, 1970). A standardised, efficient and objective method for basin comparisons is needed.

3.2 Similarity

Various methods have been employed to analyse the similarity (or dissimilarity) between two objects.
Most procedures incorporate some form of euclidean distance measure in order to calculate the 'proximity' between two objects (e.g., Davis, 1973, cited in Gordon, 1981).

Davis (1973, cited in Gordon, 1981) used an m-space euclidean distance,

$$d_{ij} = \left[\frac{1}{m}\sum_{k=1}^{m}(x_{ik}-x_{jk})^{2}\right]^{1/2}$$

which is equivalent to an unweighted root mean squared (R.M.S.) statistic. Gordon (1981) gives two other dissimilarity indices:

$$d_{ij} = \sum_{k=1}^{p} w_k\,\lvert x_{ik}-x_{jk}\rvert$$

and

$$d_{ij} = \left[\sum_{k=1}^{p} w_k\,(x_{ik}-x_{jk})^{2}\right]^{1/2}$$

where w_k (k = 1, ..., p) is a set of weights, i represents the first object, j represents the second object, and k is the kth characteristic. The second index is equivalent to a weighted R.M.S. By extension, a general dissimilarity index may be obtained:

$$d_{ij}^{(\lambda)} = \left[\sum_{k=1}^{p} w_k\,\lvert x_{ik}-x_{jk}\rvert^{\lambda}\right]^{1/\lambda}$$

where λ > 0, and higher values of λ give relatively more emphasis to the larger differences |x_ik - x_jk| (Gordon, 1981). This is conceptually similar to Hogan's (1988) scaled index.

Gower (1971) and Gordon (1981) also give a general similarity coefficient,

$$S_{ij} = \frac{\sum_{k=1}^{p} w_{ijk}\,s_{ijk}}{\sum_{k=1}^{p} w_{ijk}}, \qquad s_{ijk} = 1 - \frac{\lvert x_{ik}-x_{jk}\rvert}{R_k}$$

where s_ijk is the similarity between the ith and jth objects as measured by the kth variable, and R_k is the range of the kth variable. The weight w_ijk is typically 1 or 0, depending on whether the comparison is valid for the kth variable (Gordon, 1981). This coefficient differs from the dissimilarity indices in that it incorporates a term, R_k, that standardises the scales of measurement.

Similarity and dissimilarity indices present some difficulties. Incompatible units may pose a problem, e.g., lengths may be measured in metres or in kilometres. Variables have therefore commonly been standardised, typically by dividing each variable by its standard deviation or range. Commensurable numeric variables may also need standardising when some of them vary over a greater range than others. Such variables would otherwise contribute more to an unstandardised measure of dissimilarity than the investigation warrants.
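The indices above are straightforward to compute. The Python sketch below implements four of them:

- Davis's R.M.S. distance;
- Gordon's general weighted index with exponent λ (`lam`);
- Gower's similarity coefficient;
- Hogan's scaled index in the ratio form D = [Σ(x_i1/x_i2 - 1)²]^½.

The ratio form of Hogan's index is a reading of his formula and should be treated as an assumption. Function names are illustrative. Inputs are plain equal-length sequences of numbers, and ranges are assumed to be non-zero.

```python
import math


def rms_distance(x, y):
    """Davis's m-space distance: unweighted root mean square difference."""
    m = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / m)


def minkowski_dissimilarity(x, y, weights=None, lam=2.0):
    """Gordon's general index: (sum_k w_k * |x_k - y_k| ** lam) ** (1 / lam), lam > 0."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(w * abs(a - b) ** lam for w, a, b in zip(weights, x, y))
    return total ** (1.0 / lam)


def gower_similarity(x, y, ranges, valid=None):
    """Gower's coefficient: weighted mean of s_k = 1 - |x_k - y_k| / R_k.

    valid holds the 0/1 weights w_ijk (1 if the comparison is valid for variable k).
    """
    if valid is None:
        valid = [1] * len(x)
    num = sum(w * (1.0 - abs(a - b) / r) for w, a, b, r in zip(valid, x, y, ranges))
    return num / sum(valid)


def hogan_dissimilarity(x1, x2):
    """Assumed form of Hogan's scaled index: sqrt(sum_i (x_i1 / x_i2 - 1) ** 2)."""
    return math.sqrt(sum((a / b - 1.0) ** 2 for a, b in zip(x1, x2)))


print(minkowski_dissimilarity([3.0, 4.0], [0.0, 0.0]))           # 5.0
print(minkowski_dissimilarity([3.0, 4.0], [0.0, 0.0], lam=1.0))  # 7.0
print(gower_similarity([3.0, 4.0], [0.0, 0.0], [6.0, 8.0]))      # 0.5
```

Consider two objects whose differences are 1 and 3. Then λ = 1 gives 4.0, λ = 2 gives √10 ≈ 3.16, and λ = 4 gives 82^¼ ≈ 3.01. As λ grows, the index approaches the largest single difference; this is the sense in which a higher λ emphasises larger differences.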
A disadvantage of standardising each variable separately is that it ignores, and can distort, the relations between the different variables describing the objects.

A major problem arises when variables are at different levels of measurement, e.g., ordinal, interval and ratio. A measure of distance has no meaning for ordinal data and limited meaning for interval data. Gordon (1981) suggested three possible approaches:

1. If the majority of variables belong to one type, the simplest approach may be to convert all the variables to this type. However, this generally results in a 'lower' level of data, e.g., interval from ratio.
2. A general similarity coefficient may be employed, which can incorporate information from different types of variables. This coefficient evaluates s_ijk, the similarity of one variable between two objects, for ratio data. For ordinal and interval data, s_ijk is 0 or 1, depending on whether the variable is different or identical.
3. Separate analyses may be carried out on the same set of objects, each analysis involving variables of a single type only.

3.3 A Comprehensive Dissimilarity Testing Procedure

The scales of basin characteristics range from ratio to ordinal. Morphometric information tends to be at higher levels (ratio or interval) than biogeophysical and historical information (see table 2). The test for basin similarity therefore needs to incorporate all levels of information. Three different tests are proposed here, one for each level of data measurement. The first type tests ordinal data. It searches reference files for each ordinal-level parameter and performs binary tests for similarity based upon knowledge of each characteristic; this is known as the filter test. If the system's knowledge shows these data to be 'similar', the rest of the procedure is carried out. The second type of test is for interval-level data. This test,
— xz.k) 2 ws	))dijk 7-- 0.25 * Rsimilar to the dissimilarity equation in section 3.2, calculates a dissimilarity`distance' between basins j and k using A=2 (so that greater differences arereflected in the measure) and standardises using the range of the informationin order to eliminate any effects of the unit or range of measurement. In thistest, one quarter of the range is used as a surrogate for the standard deviation.The third type of test, for ratio information, is similar to the test for intervaldata with one exception. The 'distance' in this case is standardised by using thestandard deviation. The total difference between the two basins is represented bythe square root of the sums of the two tests. A weight measure is included whichcan affect the influence of each variable. Therefore, this procedure incorporatesall levels of data.3.4 ClassificationClassification analyses involve finding groups of similar objects based oncertain criteria. There are several ways in which classification can be performed.They typically involve defining some notion of 'distance' between the individualcase and each group centroid with the case being classified into the 'closest' group.In this sense, classification represents a means of sorting multiple objects bydegrees of similarity. Clustering is an important type of classification. Clusteringalgorithms are valuable tools in exploratory data analysis and pattern recognition36studies since they help one ascertain characteristics of structure by organising datainto subgroups or clusters. Two of the basic methods are single link and sumsof squares.In order to carry out a single link analysis of a data set it is assumed that allrelevant information on the relations between the n objects can be summarised bya half matrix of n(n-1)/2 pairwise dissimilarities {d id (1	 < i n)}, wheredenotes the dissimilarity between the i th and j th objects. 
Two objects, i and j, are defined to belong to the same single link cluster at level h if there exists a chain of (m-1) intermediate objects, i_1, ..., i_(m-1), linking them such that

$$d_{i_k i_{k+1}} \le h \quad \text{for } k = 0, 1, \ldots, m-1 \quad (1 \le m \le n-1)$$

where i_0 is equivalent to i and i_m is equivalent to j (Gordon, 1981). The value of h controls the scale of the investigation. Two objects that belong to the same group at level h_1 must still belong to the same group at any level h_2 > h_1, so the groups at different levels are hierarchically nested. This formulation also makes the single link clusters invariant under any monotone transformation of the dissimilarities. This property is important, given the uncertainty associated with defining an appropriate measure of dissimilarity.

The sums of squares method can be used to classify objects which can be represented as points in a euclidean space of some number of dimensions. Let x_ik (i = 1, ..., n; k = 1, ..., p) denote the kth coordinate of the ith point, P_i. The aim is to partition the set of n points into g groups so as to minimise the total within-group sum of squares about the g centroids. Suppose the mth group contains the n_m points {P_(m_i) (i = 1, ..., n_m)}. Its centroid has coordinates

$$\bar{z}_{mk} = \frac{1}{n_m}\sum_{i=1}^{n_m} x_{m_i k}, \qquad k = 1, \ldots, p$$

and its within-group sum of squares is

$$S_m = \sum_{i=1}^{n_m}\sum_{k=1}^{p}\left(x_{m_i k} - \bar{z}_{mk}\right)^{2}$$

The aim is then to find a partition of the n points which minimises

$$S = \sum_{m=1}^{g} S_m$$

(Gordon, 1981). The sum of squares criterion appears to be a more reasonable method of obtaining compact clusters, since long chains of points are avoided.

The user of a cluster algorithm is often unsure about the data and has little experience with a particular type of data or clustering method. Indeed, lack of information about the data is often the reason for clustering the data in the first place.
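Both criteria can be sketched in a few lines of Python. This is an illustration, not Gordon's code, and the function names and data layout are hypothetical. `single_link_clusters` finds the single link groups at level h as connected components of the graph whose edges are the pairs with d_ij ≤ h; it computes them with a union-find structure. `within_group_ss` evaluates the criterion S for a given partition. It does not search for the partition that minimises S; in practice that search is done heuristically, e.g. with iterative relocation methods such as k-means.

```python
def single_link_clusters(n, half_matrix, h):
    """Single link groups of objects 0..n-1 at level h.

    half_matrix maps (i, j), i < j, to d_ij; pairs absent from the map
    are never linked directly. Objects joined by a chain of steps with
    d <= h end up in the same group (connected components via union-find).
    """
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), d in half_matrix.items():
        if d <= h:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def within_group_ss(points, groups):
    """Total within-group sum of squares S = sum of S_m about each group centroid."""
    total = 0.0
    for members in groups:
        p = len(points[members[0]])
        centroid = [sum(points[i][k] for i in members) / len(members) for k in range(p)]
        total += sum((points[i][k] - centroid[k]) ** 2 for i in members for k in range(p))
    return total


half = {(0, 1): 1.0, (1, 2): 1.5, (0, 2): 2.5, (2, 3): 4.0, (0, 3): 5.0, (1, 3): 4.5}
print(single_link_clusters(4, half, 1.5))  # [[0, 1, 2], [3]]: 0 and 2 joined via 1
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(within_group_ss(points, [[0, 1, 2], [3]]))  # 2.0
```

At h = 1.5, objects 0 and 2 fall in the same group even though d_02 = 2.5. This is the chaining behaviour described above. Squaring every d_ij and using h² instead of h leaves the groups unchanged, which illustrates the invariance under monotone transformations.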
In this case, the researcher searches for objective meaning and needs quantitative measures of significance for evaluating clustering structures. The task of cluster validation is to separate artifacts imposed by cluster algorithms from genuine structure in the data. The applied statistics literature embeds cluster validity questions in a hypothesis-testing framework. Unfortunately, the randomness assumptions required are seldom appropriate in actual applications. The fact that the data are not random does not mean that a clustering structure is appropriate, but it is certainly foolish to impose a clustering structure on data known to be random. The identification of structure in data is a primary goal of cluster analysis. However, the clustering method used may itself limit the structure that is found.

Assumptions about the 'shape' and size of the groups can be implicit in the definition of a clustering criterion. A clustering criterion can therefore be expected to detect particular kinds of structure in the data. For example, the sum of squares criterion tends to produce equal-size hyperspherical clusters. The single link method can find clusters of any size and shape, but only if they are isolated and not linked together by chains of intermediate objects. Gordon (1981) concluded that the single link method performs poorly. The sum of squares method performed well when there was an equal number of objects from each population, but less well for samples of unequal size. Hypothesis testing may be used to determine underlying structure. Most classification studies lack an explicit model, which makes tests of hypotheses more difficult to formulate; however, this has not been regarded as a disadvantage.
The rationale has been that the data may well possess many different properties. Tests of hypotheses necessarily concentrate on only a limited number of these properties, to the exclusion of others which might be of interest. One could plot the value of the clustering criterion against the number of groups and assess the plot by eye, looking for discontinuities in slope. This procedure can be unreliable, however, because some clustering criteria show large changes even when analysing unstructured data. The assumption that the dissimilarities are ranked in random order ignores the metric structure which is generally present in the data. Ling (1973) argued that this random model should be regarded as a limiting case: if no significant clusters were detected under the random model, then clusters would be unlikely to be regarded as significant under any model.

Drainage basins in one cluster may bear some similarity to basins in another cluster, so a basin may reasonably belong to more than one group. A more appropriate method of classification may therefore be 'soft' or 'fuzzy' clustering. These methods link fuzzy set theory and cluster analysis. Both types produce a 'degree of belongingness' index of an individual to particular groups. Fuzzy clustering provides the values for all c clusters, while soft clustering provides the indices for L clusters, where 1 < L < c (Ismail, 1988). Fuzzy clustering produces a set of probabilities which represent belongingness to different clusters.

Wang et al. (1990) construct a land suitability analysis procedure using a geographic information system and fuzzy set theory. A euclidean distance measure is used to partition the land area into fuzzy sets. "An area can be associated with partial membership and belong to different classes to different extents" (Wang et al., 1990, p. 270). Membership in a set is defined using the euclidean distance measure and a normalising factor. Wang et al.
(1990, p. 277) emphasise that "[f]uzzy representation and processing is shown to have fewer limitations in managing geographical information than conventional land assessment techniques. The latter are hampered by the inherent constraint of classical set theory that does not allow for partial set membership conditions and imprecise information."

Although soft and fuzzy clustering may be more appropriate for drainage basin grouping, standard clustering is used in this study because the purpose of the clustering technique is to provide a general guideline by which efficiency measures (i.e., measures that make the procedure faster) may be incorporated, and to find critical dissimilarity values.

3.5 Conclusion

More emphasis is needed on quantitative techniques when comparing drainage basins. Comparison of qualitative data is based on individual knowledge, and a more formal method should be developed to overcome the difficulty of making consistent judgements with many dimensions of information. The existence of many different levels of information by which to describe drainage basins leads to the incorporation of three different tests in order to obtain the distance measure. The analysis of similarity based on euclidean distance measures is a desirable part of a more formal basin comparison procedure. These procedures will be demonstrated in the balance of this thesis.

Chapter 4: Knowledge-Based System Framework

4.1 Introduction

Drainage basin comparisons require a great deal of information from various sources, especially when large digital data files are used. In a computer-oriented environment, formal concepts must be adopted for data storage, reorganisation and analysis. This chapter provides a brief discussion of the theory underlying databases, artificial intelligence, and knowledge-based systems.

4.2 Data Bases as a Source of Basin Information

Traditionally, information for basin comparisons has been retrieved from maps.
Measurements of terrain parameters from conventional maps are extremely time consuming, tedious, and difficult. Errors can be introduced in two ways: human error, from manual measurement or subjectivity in measuring; and data error, from incorrect data on a map sheet. It therefore seems appropriate to use digital data to make data acquisition less time consuming, less subjective, and more precise. The following discussion focuses mainly on morphometry, as biogeophysical and historical data are rarely yet available to any large extent in digital form.

4.2.1 Digital Terrain Models: Geographic Data Bases

Digital terrain models (DTMs) are "the numerical and mathematical representation of a terrain by making use of adequate elevation and planimetric measurements, compatible in number and distribution with the terrain, so that the elevation of any other point of known planimetric coordinates can be automatically interpolated with specific accuracy for any given application" (Ayeni, 1976, p. 3). This definition emphasises the importance of evaluating an adequate number of data points, as well as an appropriate sampling distribution of those points, to guarantee a good match for a given terrain.

[Figure 8: Digital Terrain Models. Recoverable labels: DTM; mathematical methods, local (regular patches, irregular patches) and global (Fourier series, multiquadratic polynomials); point methods, regular (uniform density, variable density) and irregular (triangulation, proximal networks); line methods (horizontal, vertical, critical); image.]

There are a number of different methods by which DTMs can be represented (figure 8). The two dominant types are regular uniform density (grid or raster) and irregular triangulation (triangulated irregular networks, or vector).
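As an illustration of the interpolation requirement in Ayeni's definition, the following minimal Python sketch estimates the elevation of an arbitrary point from a regular (grid) DTM by bilinear interpolation. Bilinear interpolation is only one of many possible methods and is an assumption here, not the method used by TRIM or in this thesis; the function name and the 2 by 2 example grid are hypothetical.

```python
def bilinear_elevation(dem, x0, y0, cell, x, y):
    """Estimate the elevation at (x, y) from a regular-grid DTM.

    dem[r][c] is the elevation of the grid node at
    (x0 + c * cell, y0 + r * cell); row index increases with y.
    """
    rows, cols = len(dem), len(dem[0])
    col_f = (x - x0) / cell                 # position in grid units
    row_f = (y - y0) / cell
    if not (0 <= col_f <= cols - 1 and 0 <= row_f <= rows - 1):
        raise ValueError("point lies outside the grid")
    c = min(int(col_f), cols - 2)           # lower-left node of the enclosing cell
    r = min(int(row_f), rows - 2)
    tx, ty = col_f - c, row_f - r           # fractional offsets within the cell
    z00, z10 = dem[r][c], dem[r][c + 1]
    z01, z11 = dem[r + 1][c], dem[r + 1][c + 1]
    return (z00 * (1 - tx) * (1 - ty) + z10 * tx * (1 - ty)
            + z01 * (1 - tx) * ty + z11 * tx * ty)


# Hypothetical 2 by 2 grid of elevations (m) with 50 m node spacing.
dem = [[100.0, 110.0],
       [120.0, 130.0]]
print(bilinear_elevation(dem, 0.0, 0.0, 50.0, 25.0, 25.0))  # 115.0 (cell centre)
print(bilinear_elevation(dem, 0.0, 0.0, 50.0, 50.0, 0.0))   # 110.0 (on a node)
```

How well such an estimate matches the real terrain depends, as the definition stresses, on the number and distribution of the grid points, not on the interpolation formula alone.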
The specific representation used depends upon a number of factors, including storage constraints, the objectives of any analyses to be performed, and the accuracy needed (Burrough, 1986).

4.2.2 Raster versus Vector

DTM data acquisition, storage and analysis may be carried out in two different formats: raster, or grid; and vector, a series of points and nodes. Table 3 provides a summary comparison of the two methods.

For the purpose of this study, a raster method appears to be most appropriate. Many of the calculations are local procedures, and grid sampling is an appropriate form of sampling for terrain models. Also, many of the procedures used in the analysis, such as gradient calculations and basin delineation, have been developed and analysed for raster data.

4.2.3 TRIM: a B.C. DTM

In the province of British Columbia, the Terrain Resource Information Management System (TRIM) provides the common digital base necessary for geographical information system (GIS) supported resource and environmental planning in both the public and the private sectors. TRIM is a cooperative program between the Digital Mapping Group (a consortium of nine private companies) and the British Columbia Ministry of Environment and Parks. Begun in 1986, it is scheduled to deliver digital terrain models covering all of British Columbia. To produce a three-dimensional, topologically structured database covering the province, explicit information is recorded, including closed polygons, directionality for certain features, and nodes for all planimetric features.

Table 3 Comparison Between Raster and Vector Data

Characteristic   | Raster                                                                 | Vector
Data structure   | Simple                                                                 | Complex
Storage          | Large volumes of data; sometimes redundant                             | As little as 0.35% that of raster; generally 6.3%
Accuracy         | Linear features not accurately represented; depends on grid cell size  | Accuracy of linear features good; very prone to systematic errors
Processing time  | Shorter; easier to locate data in data structure                       | Longer
Analysis         | Good for local analysis                                                | Good for global analysis
Overlay          | Better for 4 or more layers                                            | Better for 3 or fewer layers

(Adapted from Boehm, 1967; Burrough, 1986; NCGIA, 1989)

The Specifications and Guidelines Manual (Ministry of Environment and Parks, 1988) describes in detail the methods used to store data. The various files include:
1. the digital elevation model (DEM);
2. the map positional file;
3. the map representational file;
4. the contours and contour number file.

The digital map is edited to the extent that all stereomodel edge ties have been performed, ties to adjacent map sheets have been performed, linear features have been explicitly closed, and redundant data have been eliminated. The DEMs contain all DEM points collected directly by stereocompilation, breaklines (sharp and rounded), and supplementary data (from planimetric capture). In gridded DEM capture, areas where the average slope is less than 25° have a grid spacing of 75 m; where the average slope is greater than 25°, the grid spacing is 50 m. Using random data capture, there are approximately 120 points/km² in areas where the average slope is less than 25°, and approximately 200 points/km² where the average slope is greater than 25°. Ninety percent of all discrete spot elevations and DEM points are supposed to be accurate to within 5 m of their true elevation, and ninety percent of all well-defined planimetric features are coordinated to within 10 m of their true position.

The planimetric and elevation data are stored in either binary or ASCII Ministry of Environment (MOEP) format, providing a standard base which most, if not all, GISs will be able to use. These files, used to create a digital terrain model, provide adequate data from which basin morphometry may be analysed. The derived morphometric indices and other basin information must be stored in a manner appropriate to their origin and intended use.
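The TRIM capture and accuracy figures quoted above can be expressed as two small checks. This Python sketch is illustrative only: the function names are hypothetical, the handling of a slope of exactly 25° is an assumption (the specification as quoted covers only 'less than' and 'greater than' 25°), and the sample errors are made up.

```python
def trim_grid_spacing(avg_slope_deg):
    """Gridded-DEM spacing in metres, following the TRIM rule quoted above.

    Slopes below 25 degrees get 75 m; steeper terrain gets 50 m.
    ASSUMPTION: exactly 25 degrees is treated as steep terrain.
    """
    return 75.0 if avg_slope_deg < 25.0 else 50.0


def meets_ninety_percent_standard(errors_m, tolerance_m):
    """True if at least 90% of the absolute errors are within tolerance_m."""
    if not errors_m:
        raise ValueError("no check points supplied")
    within = sum(1 for e in errors_m if abs(e) <= tolerance_m)
    return within / len(errors_m) >= 0.9


# Hypothetical elevation errors (m) at ten check points.
elevation_errors = [1.2, -3.4, 4.9, 0.5, -2.2, 6.1, 3.3, -0.8, 2.0, 1.1]
print(trim_grid_spacing(12.0))                               # 75.0
print(trim_grid_spacing(31.0))                               # 50.0
print(meets_ninety_percent_standard(elevation_errors, 5.0))  # True: 9 of 10 within 5 m
```

The same check with a 10 m tolerance would apply to planimetric positions.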
To store these data appropriately, attention must be given to database structure.

4.2.4 Data Bases

Any computer system processes data of some sort, whether those data are composed of numbers or symbols. Data flow into the system, are stored, are combined, are given structure (thus becoming information), and finally flow out in some modified form. Database theory is concerned with how data are stored both physically and logically, and with how those data may be efficiently, meaningfully and simply accessed; that is, with the logic of information structures and its related algebra. It is also concerned with the extent to which programs can be regarded as data and vice versa. A database, then, is the computer representation of data stored and logically structured in such a way as to define permissible procedures for the addition, amendment, and deletion of groups or items of data.

A database consists of data in many files. In order to access data from one or more files easily, some kind of structure or organisation is necessary. Three main kinds of database structure are commonly recognised: hierarchical, network, and relational (Burrough, 1986; Elmasri and Navathe, 1989).

A hierarchical database assumes that each part of the hierarchy can be reached using a key (a set of discriminating criteria) that fully describes the data structure (figure 9). Data access via the keys is easy for key attributes but, unfortunately, very difficult for associated attributes. Consequently, hierarchical systems are good for data retrieval only if the structure of all possible queries can be known beforehand. Further disadvantages of hierarchical structures are that large index files have to be maintained, and certain attribute values may have to be repeated many times, leading to data redundancy which increases storage and access costs.

In a network structure, travel within the database is restricted to the routes up and down the taxonomic pathways (figure 9).
Network systems are very useful when the relations or linkages can be specified beforehand. The disadvantage is that the database is enlarged by the overhead of pointers, which in complex systems can become a quite substantial part of the database.

[Figure 9: Database Types. Panels: map M with polygons I and II and their lines and nodes; hierarchical data structure; network linkages; relational data structure.]

In a relational database, data are stored in simple records, known as tuples, each containing an ordered set of attribute values; the tuples are grouped together in two-dimensional tables known as relations (figure 9). Each table or relation is usually a separate file. The pointer structures of network models and the keys of hierarchical structures are replaced by data redundancy in the form of identification codes that are used as unique keys to identify the records in each file. Relational databases have the great advantage that their structure is very flexible and can meet the demands of all queries that can be formulated using the rules of boolean logic and of mathematical operations. They allow different kinds of data to be searched, combined and compared. Addition or removal of data is easy too, because this involves just adding or removing a tuple. The disadvantage of relational databases is that many of the operations involve sequential searches through files to find the data that satisfy the specified relation, which can take a considerable amount of time in large databases.

Although searching relational databases is considerably less efficient than searching other types of databases, a relational structure was chosen for this study. A relational structure allows different types of characteristics (biogeophysical, morphometric, historical) to be stored in different files, which are more easily understood and accessed.
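A minimal sketch of the relational idea, using Python's built-in sqlite3 module rather than the pseudo-relational files actually used in this study: each kind of characteristic is held in its own relation, and a shared basin identification code acts as the key that lets tuples from different relations be combined. The table names, column names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One relation per kind of characteristic; basin_id is the shared key.
cur.execute("CREATE TABLE morphometry "
            "(basin_id INTEGER PRIMARY KEY, area_km2 REAL, relief_m REAL)")
cur.execute("CREATE TABLE history "
            "(basin_id INTEGER PRIMARY KEY, logged_pct REAL)")
cur.executemany("INSERT INTO morphometry VALUES (?, ?, ?)",
                [(1, 9.8, 610.0), (2, 3.1, 420.0), (3, 17.5, 880.0)])
cur.executemany("INSERT INTO history VALUES (?, ?)",
                [(1, 35.0), (2, 0.0), (3, 60.0)])

# Adding data is just adding a tuple (basin 4 has no morphometry yet).
cur.execute("INSERT INTO history VALUES (?, ?)", (4, 10.0))

# Combine relations through the key and select with a boolean condition.
rows = cur.execute("""
    SELECT m.basin_id, m.area_km2, h.logged_pct
    FROM morphometry AS m
    JOIN history AS h ON m.basin_id = h.basin_id
    WHERE m.area_km2 < 10
    ORDER BY m.basin_id
""").fetchall()
print(rows)  # [(1, 9.8, 35.0), (2, 3.1, 0.0)]
conn.close()
```

Basin 4 drops out of the join because it has no morphometric tuple, which is the kind of gap the user must fill before a comparison can be run.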
The use of separate files also allows information to be transferred to other applications more easily.

In all kinds of database structures, data are written in the form of records. The simplest kind of record is a one-dimensional array of fixed length, divided into a number of equal partitions. Fixed-length records are inconvenient when the attributes are of variable length, and when the set of attributes measured is not common to all items. In variable-length records, each record has a 'header', an extra attribute that contains information about the type of information in the sub-record and the amount of space it takes up.

To search databases efficiently and analyse the data accurately, certain rules and procedures must be followed. In the present context it is appropriate to consider them within the concept of 'artificial intelligence'.

4.3 Artificial Intelligence

Artificial intelligence (AI) is the computer-based solution of complex problems through the application of processes that are analogous to human reasoning. The main long-term aim of artificial intelligence researchers is to simulate in computers the knowledge acquisition and problem-solving capabilities which are innate to human beings.

Much of the conscious thinking that people do is based upon incomplete reasoning, hunches or intuitions. Similarly, most problems that have been considered by AI researchers are of the sort for which no one knows any practical, completely correct solution procedure; a certain proficiency in using hunches and partially verified search procedures is therefore necessary to design programs to solve them. Heuristic programming refers to computer programs that employ procedures or obtain results not necessarily proved to be correct, but which seem plausible. Heuristics are procedures that have not been shown to be theoretically sound (Jackson, 1985).
The realisation of this, and its incorporation into the design of computer programs, was an important step in the development of artificial intelligence, signifying a recognition by AI researchers that intelligence is often exhibited in situations wherein one's understanding and knowledge are incomplete; that is, intelligence represents the ability to make effective and efficient judgements under uncertainty. Although a heuristic procedure may choose a sub-optimal solution, depending upon the problem in question, it may satisfy the problem so long as the solution is near optimal.

Decision-making problems may be represented by state spaces and operators. A state space is the set of all possible states, together with the result for each state of applying each of the possible operators to it (Jackson, 1985; Rolston, 1988). A state is a description or representation of a situation or physical condition, and can be represented as a finitely describable mathematical object, such as a number, matrix, list, graph, sentence, set, vector, or tree. An operator is a finitely describable means of transforming one state into another; from the standpoint of a computer, an operator is a computational procedure. The field of AI research concerned with ways in which computers can solve large state-space problems (ones where there are many decisions to be made among vast amounts of data) is known as heuristic search theory. The generation process consists simply of producing finite descriptions (data structures) for the nodes of the state space, that is, the states at which some decision must be reached, and for their connections to each other. With a large, difficult state-space problem, it is not possible for the computer to generate descriptions for every node and every connection between nodes. Rather, the computer may generate only a relatively small portion of the state space, and can check only that portion to see whether it includes a path between nodes which is a solution to the problem.
Thus, the computer must be somewhat selective in the way it generates the portion of the state space that it produces when trying to solve a state-space problem, if it is to be effective and efficient. Any procedure that a computer uses to generate a portion of the state space for a problem, and to check that portion for a solution, is called a 'search procedure'. A search procedure might, of course, find a solution simply by randomly generating descriptions of nodes and their interconnections, but unless a large percentage of the paths through the state space happen to be solution paths, such a procedure will generally not be successful. A search procedure that is 'systematically oriented' toward the problem at hand, in such a way that it can find a solution without generating the entire state space, is a 'heuristic search procedure', as it embodies heuristic information.

[Figure 10: Search Strategies. Key: state; nodes of depth-first search; nodes of breadth-first search; operator.]

Two basic search strategies exist: breadth-first and depth-first (Rolston, 1988). In a breadth-first search, increasingly broad segments of the state space are generated, and each generated level is checked for a goal state (figure 10). The depth-first strategy selects a path and follows it through increasingly deep levels until a solution is discovered or the end of the path is encountered (figure 10). Depth-first search has several potential advantages over breadth-first search. It is less demanding of memory resources because it considers a more limited search space to reach any given level, and for problems that have deep solutions it will find a solution faster than breadth-first search. Unfortunately, depth-first search also has several disadvantages. If the search of a path is stopped before reaching the end of the path (which may be required because of the possibility of infinite paths), then a solution may not be found even though one exists.
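The difference between the two strategies can be seen in a short Python sketch on a hypothetical state space (the states and operators below are invented for illustration). Breadth-first search returns the shortest path, while depth-first search returns the first path it reaches, which need not be the shortest; a depth limit, used to guard against infinite paths, can cause a solution to be missed altogether.

```python
from collections import deque

# Hypothetical state space: each state lists the states one operator step away.
SPACE = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["G"],
    "C": ["E"],
    "D": [],
    "E": ["G"],
    "G": [],
}


def breadth_first(space, start, goal):
    """Generate the state space level by level; returns a shortest path."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in space[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


def depth_first(space, start, goal, depth_limit=10):
    """Follow one path as deep as possible before backtracking."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            return path
        if len(path) > depth_limit:
            continue                      # path cut off: anything beyond is never seen
        for nxt in reversed(space[path[-1]]):  # leftmost successor explored first
            if nxt not in path:           # avoid cycles along the current path
                stack.append(path + [nxt])
    return None


print(breadth_first(SPACE, "S", "G"))               # ['S', 'B', 'G']
print(depth_first(SPACE, "S", "G"))                 # ['S', 'A', 'C', 'E', 'G']
print(depth_first(SPACE, "S", "G", depth_limit=1))  # None, although a solution exists
```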
In addition, if depth-first search does locate a solution path, there is no guarantee that it is the shortest available.

These search strategies may be implemented using a 'filter' criterion (see section 3.3) which limits the extent to which any similarity comparison is carried out. If a filter is used, a type of depth-first search occurs; a breadth-first search is achieved when no filter is used. The filter criterion depends on the information and knowledge available.

Statistical analyses may be thought of as a form of artificial intelligence. Some, such as principal component analysis (PCA), are a form of rule base which analyse certain characteristics: for PCA, variability and the influence of different variables. Knowledge from these forms of analysis can lead to useful insight into the problem at hand.

In order for a variety of people to make use of the many forms of knowledge applied in basin comparisons, it is necessary to formalise this knowledge and store it in a readily accessible and easily understood format.

4.4 Knowledge Representation and Knowledge-Based Systems

It is difficult to obtain an exact definition of a knowledge-based system. Some say it is a system which reduces uncertainty. Others define it as a system in which knowledge is applied in a computerised environment (Graham and Jones, 1988). These are not mutually exclusive definitions. The first is a conceptual definition (uncertainty is reduced by the application of prior knowledge); the second is an operational definition (achieved by formalising the knowledge as a set of consistent rules or procedures). Both definitions are necessary to understand properly what is implied by a 'knowledge-based system'. It is difficult to conceive of a computer system not based on knowledge of some sort; the crux is the way the knowledge is represented.
A knowledge-based system could be thought of as incorporating knowledge, artificial intelligence, and information.

Knowledge is both the input and the output of a knowledge-based system. Knowledge is not definable at a scientifically admissible level with respect to certainty and truth; it is therefore necessary to define knowledge as 'scientifically accepted statements', that is, statements that are consistent with what is known. Scientific reasoning can be characterised as the means by which one attempts to arrive at consistent or, at least, 'rationally justified' beliefs. The lack of a reference certainty (degree of belief) or 'truth' creates a very difficult problem. It is suggested that knowledge is a belief for which the believer has sufficient evidence, or some such formula. In light of the foregoing, this could be interpreted to mean that the belief is consistent with all other knowledge within the frame of reference or, at least, is not obviously inconsistent with it. This proposal relates to Giere's (1979) correspondence theory of truth: a statement is true if it corresponds with the way things appear. Knowledge in the context of knowledge-based systems is considered formally to be a "justified true belief" (Addis, 1985, p. 15). A difficulty then arises in providing criteria that establish consistency (i.e., what could be 'true').

Three components of knowledge may be identified:
1. data (facts): statements that relate some element which is declared to be certain regarding the subject domain; e.g., 9.8;
2. theories (procedural rules): well-defined invariant rules that describe fundamental sequences of events and relations relative to the domain; e.g., the drainage density is equal to the total drainage length divided by the drainage area, Dd = ΣL/A;
3. hypotheses (heuristic rules): general rules in the form of hunches or rules of thumb that suggest procedures to be followed when invariant procedural rules are not available.
   These are approximate, and have generally been gathered by trial or by an expert through years of experience; e.g., hard volcanic rocks and igneous rocks have similar effects on morphology, while limestone produces different effects.

While information (e.g., the basin area is 9.8 km²) always has a specific context, and data (e.g., 9.8) may be context free, knowledge (e.g., the drainage basin is small) is usually seen at a higher level of abstraction. The realisation that much knowledge is expressed in the form of theoretical or heuristic descriptions, or rules of thumb, is what gives rise to the conception of knowledge as more abstract than information (Addis, 1985; Graham and Jones, 1988). One of the complicating factors in representing knowledge is the issue of confidence in the knowledge itself, and in the deductions which can be made from it. In any worthwhile field of expertise, judgements must be made under uncertainty, for not all data will be available to the expert, and some of the data may be ambiguous or vague (Graham and Jones, 1988).

Knowledge-based system theory has focussed on representing knowledge efficiently and effectively. Maximum efficiency may be achieved using optimum search techniques. Effectiveness is increased by maximising the knowledge content to decrease uncertainty. As effectiveness and efficiency are increased, the uncertainty in the internal decision-making process is decreased.

'Knowledge engineering' is the process of acquiring domain-specific knowledge and building it into the knowledge base. Although knowledge can be secured from a variety of sources, including documentation and existing computer information systems, most of it must be elicited from human experts. In computer-based knowledge engineering projects, the sources and kinds of uncertainty may be:
1. lack of certainty in evidence: e.g., uncertainty as to whether particular geological formations have different effects on morphometry;
2. lack of evidence: e.g., not enough information to determine whether geology has an effect;
3. uncertainty in judgement: e.g., a non-expert determining the effects of geology on morphometry;
4. experimental error: e.g., consistently mistaking metamorphic rocks for igneous rocks;
5. random error.

While all of these types of error exist within drainage basin comparisons to some extent, types 3, 4, and 5 are the most relevant when discussing knowledge-based systems, which are designed to help decrease these errors.

4.5 Conclusion

A database may be defined as a collection of interrelated data stored so as to serve multiple applications with minimum redundancy. The description of the data is intended to be entirely independent of the instructions that process it in application programs. The data system should be data independent: adding new records, or modifying or deleting existing records, should not require changing the application programs that refer to the data.

A raster format is the preferable data acquisition format for this study, as many studies of geomorphometry include some form of raster analysis. A relational database appears to be the most appropriate format in which to store the data because of its flexibility, even though searching it is less efficient. Artificial intelligence techniques allow the adoption of heuristics in searching and analysing a relational database, limiting the number of tests that need to be completed. A knowledge-based system format minimises the effects of uncertainty by formalising knowledge and heuristics. An objective drainage basin comparison procedure can be suitably developed within this format.

Chapter 5: COMPARE: A Knowledge-Based System

5.1 Introduction

COMPARE, a knowledge-based system for drainage basin description and comparison, has been created.
It consists of two major stages: information gathering, using databases and a procedural knowledge base; and information analysis, using an information base and a 'sectional' procedural knowledge base. The term 'information base' is used instead of 'database' in order to distinguish between the levels of abstraction, and the importance of the items within them, given the particular context. The entire system is a form of AI, as it attempts to mimic a researcher's decision-making process. Each of the major sections contains factual and procedural knowledge (figure 11).

5.2 Information Gathering

The information gathering stage is achieved through both manual and automated techniques. The initial database consists of the primary sources of data, including a DTM and more traditional map sources. These data are then manipulated in order to create the information base.

5.2.1 Data Sources

The major source of data is the TRIM DTM, because of the importance of basin morphometry to the study. The TRIM DTM provides the initial data from which all morphometric parameters are calculated. Other sources of information include forest cover and geologic maps, British Columbia Forest Service and Atmospheric Environment Service (Canada) records, and previous studies. Some of the parameters used in basin comparisons (e.g., soil types) are not used in this study because the data are unavailable in an appropriate form. For a complete list of the variables incorporated, see table 2. Any other data may be incorporated into the system with relatively straightforward editing of the data files.

[Figure 11: Components of the Knowledge-based System. Labels: data sources; regional database; MORPHCALC: procedural knowledge base; regional information base; COMPARE: sectional procedural knowledge base; formal basin description and comparison.]

5.2.2 MORPHCALC: a Knowledge-Base

MORPHCALC is a procedural knowledge-base written in the FORTRAN language.
Most of the knowledge incorporated within this series of procedures is concerned with converting the TRIM DTM data into morphometric information about the Queen Charlotte Islands. Given a set of UTM coordinates for a point near the outlet of the basin, the drainage net is identified from the TRIM planimetric file (see figure 12). The network is then analysed and topologic parameters calculated. Using the network, the surrounding area is identified, the local DEM data are rasterised, and the basin is delineated using flowpaths (Marks, Dozier, and Frew, 1984). Once this is accomplished, the morphometric parameters are calculated. An information base of drainage basin characteristics is the output from the information gathering stage, and is processed and analysed by the procedural knowledge contained in the rest of the system.

[Figure 12: Outline of MORPHCALC. Steps: TRIM DTM; locate outlet; network delineation; topologic calculations; find points in area (from DTM); grid; basin delineation (Marks, Dozier, Frew); morphometric calculations.]

5.3 Information Analysis

The information analysis stage of the basin description and comparison incorporates much of the discussion of Chapters 3 and 4. The framework is discussed in sections 3.3 and 3.4.

5.3.1 The Information Base

This is a pseudo-relational file, currently consisting of eight formatted ASCII units. The morphometric and topologic parameters are stored in two and three files respectively. A single type of information (e.g., morphometry) is split across more than one file because this makes printouts and screen displays easier to understand. Historical and biogeographic information are each stored in separate files. There is also a file of coordinates which identify the outlet of each basin.
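The flowpath delineation and morphometric calculation steps of MORPHCALC can be sketched as follows. This is a generic D8 (steepest-descent, eight-neighbour) scheme written in Python, assumed here for illustration; it is not necessarily the exact algorithm of Marks, Dozier and Frew (1984) or the FORTRAN code of MORPHCALC. Each grid cell drains to the neighbour with the steepest downhill slope, and the basin is the set of cells whose flowpaths pass through the outlet cell. The last function applies the drainage density rule from section 4.4, Dd = ΣL/A. The function names and the tiny elevation grid are hypothetical.

```python
import math

# Offsets to the eight neighbours of a grid cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem, cell):
    """Map each cell to its neighbour of steepest descent (None for pits/outlets)."""
    rows, cols = len(dem), len(dem[0])
    receivers = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    run = cell * math.hypot(dr, dc)   # diagonal runs are longer
                    slope = (dem[r][c] - dem[rr][cc]) / run
                    if slope > best_slope:
                        best, best_slope = (rr, cc), slope
            receivers[(r, c)] = best
    return receivers


def delineate_basin(dem, cell, outlet):
    """Return the set of cells whose D8 flowpaths pass through the outlet."""
    upstream = {}
    for src, dst in d8_receivers(dem, cell).items():
        if dst is not None:
            upstream.setdefault(dst, []).append(src)
    basin, stack = {outlet}, [outlet]
    while stack:                                  # walk flowpaths uphill
        for src in upstream.get(stack.pop(), []):
            if src not in basin:
                basin.add(src)
                stack.append(src)
    return basin


def drainage_density(total_channel_length_km, area_km2):
    """Dd = total channel length / basin area (km per km^2)."""
    return total_channel_length_km / area_km2


# Hypothetical 3 by 4 elevation grid (m), 50 m cells, ridge between columns 1 and 2.
dem = [[5, 6, 6, 5],
       [4, 5, 5, 4],
       [3, 4, 4, 3]]
basin = delineate_basin(dem, 50.0, outlet=(2, 0))
area_km2 = len(basin) * 50.0 * 50.0 / 1e6
print(sorted(basin))  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(area_km2)       # 0.015
print(drainage_density(0.3, area_km2))  # hypothetical 0.3 km of channel
```

On this grid the ridge divides the area into two basins, and the lower-left outlet collects six of the twelve cells; real DEMs also need handling of flat areas and pits, which this sketch ignores.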
All information could be stored in one file, and the viewing and printing problems eliminated, if a 'real' database system were used; however, because of programming and database system availability problems, the multifile format was used.

5.3.2 COMPARE: A Sectional Knowledge-Base

COMPARE is a modular knowledge-base which conducts the comparison of the drainage basins (see appendix C). A modular format, written in Splus, has been used so that any necessary alterations may be accomplished more easily and straightforwardly.

Factual Knowledge:

Test Type
The information contained in this file relates to the type of test being performed. A '0' represents an ordinal-level test, '1' an interval test, and '2' a ratio test. This file may be manipulated easily in case information is to be converted, or information of a higher level becomes available. Default values are set to those in table 1.

Weights
Particular characteristics of a drainage basin may be of different importance in different analyses; it is therefore necessary to include in the test a set of weights to account for this variation. The initial weights are all set to 1.0. To remove a variable from the analysis, its weight is set to 0, whereas increasing the weight above 1 increases the importance of that variable.

Filter files
There are two different types of file in this section. The first defines which filter variables to use. The second contains the information used to determine similarity amongst the values of the variables, i.e., which geologic formations are similar.

Procedural Knowledge:

Tests
These are described in section 3.3. Each test type (filter, interval, ratio) is contained in a separate function (see Appendix C).

5.3.3 Statistics

Two forms of statistics remain external to COMPARE and are used in the information analysis stage.
Distribution analysis is important because many of the variables used in the similarity analysis are not normally distributed; their distributions must therefore be transformed in order to maintain the robustness of the procedures. Principal component analysis is undertaken to decrease the amount of information needed, making the similarity analysis procedure more efficient by eliminating redundancy in the information and reducing the number of calculations to be completed.

5.4 Using the System

The knowledge-based system is a series of FORTRAN and Splus routines which prompt the user to set the type of test, the weights and the variables used (figure 13). Default values are the result of formalising 'expert' knowledge, and may be changed if the user has reason to do so.

There are three types of tests which may be conducted:
1. compare two drainage basins: the system prompts for the coordinates of one basin. The other may be chosen by the user or by the system (which chooses the geographically closest basin);
2. find the most similar: the user is prompted for the coordinates of one basin, and the system finds the 3 most similar basins;
3. analyse all: cluster analysis is carried out. This is to analyse groupings and give the user a better understanding of which basins may be suitable for a more detailed analysis. Proximity and similarity relations are also examined.

After the type of test is defined, the user is prompted to determine the boundaries and the parameters of the test. First, filter parameters are chosen; the choices are no filters, the default filters, or user-defined filters. Next, the user may limit the types of variables: biogeoclimatic, morphometric and historical parameters may each be excluded from the analysis. The definition of the weights for each parameter is the next decision to be made. If the user chooses to alter the parameters of the analysis, the appropriate weights may be easily altered.
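The interplay of filters, test types and weights in the 'find the most similar' test can be sketched in Python. This is one plausible reading of the procedure, not the COMPARE code itself: the exact dissimilarity formula is given in section 3.3, and the function names, geology classes and standardised values below are hypothetical. Nominal information (here, geology) acts only as a pass/fail filter drawn from a 'which formations are similar' table, while standardised interval and ratio variables enter a weighted euclidean distance in which a weight of 0 removes a variable and a weight above 1 emphasises it.

```python
import math

# Hypothetical filter file: pairs of geologic classes regarded as similar.
SIMILAR_GEOLOGY = {
    ("volcanic", "volcanic"), ("volcanic", "igneous"),
    ("igneous", "volcanic"), ("igneous", "igneous"),
    ("limestone", "limestone"),
}


def passes_filters(a, b):
    """Nominal information is used only as a pass/fail filter."""
    return (a["geology"], b["geology"]) in SIMILAR_GEOLOGY


def dissimilarity(a, b, weights):
    """Weighted euclidean distance over standardised interval/ratio variables.

    A weight of 0 removes a variable; a weight above 1 increases its importance.
    """
    return math.sqrt(sum(w * (a[v] - b[v]) ** 2 for v, w in weights.items()))


def most_similar(target, basins, weights, k=3):
    """Ids of the k basins that pass the filters and are least dissimilar."""
    candidates = [b for b in basins
                  if b["id"] != target["id"] and passes_filters(target, b)]
    candidates.sort(key=lambda b: dissimilarity(target, b, weights))
    return [b["id"] for b in candidates[:k]]


# Hypothetical basins; numeric values are already standardised (z-scores).
basins = [
    {"id": 1, "geology": "volcanic",  "area": 0.0, "relief": 0.0, "density": 0.0},
    {"id": 2, "geology": "igneous",   "area": 0.5, "relief": 0.0, "density": 0.0},
    {"id": 3, "geology": "volcanic",  "area": 1.0, "relief": 1.0, "density": 0.0},
    {"id": 4, "geology": "limestone", "area": 0.1, "relief": 0.0, "density": 0.0},
    {"id": 5, "geology": "volcanic",  "area": 0.0, "relief": 0.2, "density": 0.3},
    {"id": 6, "geology": "igneous",   "area": 2.0, "relief": 2.0, "density": 2.0},
]
weights = {"area": 1.0, "relief": 1.0, "density": 1.0}  # default weights of 1.0
print(most_similar(basins[0], basins, weights))  # [5, 2, 3]; basin 4 is filtered out
```

Basin 4 is numerically close to basin 1 but never reaches the distance calculation, which is how filtering limits the search.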
Once the weights are set, the user is prompted to input the coordinates of the basin(s), if necessary. The system then returns the coordinate(s) and basin number(s) for tests 1 and 2, or a cluster plot in the case of test 3.

[Figure 13: Using the Knowledge-based System. Labels: determination of test type; determination of relevant variables; identification of basin(s); check basin information; information gathering; factor analysis; COMPARE: similarity analysis; factual knowledge base (information base; weights; critical distance; ordinal data comparisons); procedural knowledge base (ordinal, interval and ratio tests; variable transformations; cluster analysis); dissimilarity.]

Two sections of the analysis remain external to the knowledge-based system. First, the system will analyse only information contained within the knowledge base, so the user must ensure that information regarding any basins required for the analysis exists within the files. If the information is not available internally, MORPHCALC and other information gathering procedures must be used. MORPHCALC is interactive (because of problems associated with using basin delineation routines with the TRIM DTM, and with the extremely large files; see appendix B) and cannot easily be incorporated into COMPARE in order to calculate missing information. Also, some of the information gathering routines include manual measurement and input.

Second, correlation analysis and principal component analysis remain external to the system. The user may input any numerical information into a statistical routine in order to explore ways of limiting the amount of information to be analysed. In this case, correlation analysis is carried out by an Splus routine, and a FORTRAN routine is used for principal components analysis. These external analyses aid the user's determination of relevant variables and weights.

One of the tricks of intelligent decision making is to generate enough options that you have most of the 'good ones', but not so many that you cannot possibly make up your mind.
Cluster analysis and filters limit any further search. The most similar basin pair may not always be identified by this system; however, two that are 'adequately' similar will be.

5.5 Conclusion

One major drawback of this system is its extremely crude knowledge representation scheme: all knowledge is incorporated into the procedure. Knowledge representation can be evaluated using four criteria:
- transparency, the extent to which stored knowledge can be easily identified;
- efficiency, the relative ease with which specific knowledge can be accessed during execution;
- adequacy, the extent to which a given structure can be used to represent all knowledge required for a given system;
- modularity, the extent to which knowledge fragments can be stored independently from one another.

The relative lack of modularity and transparency makes this system somewhat crude and difficult to update in particular sections. However, the main purpose of this study has been achieved: to provide a more objective method of carrying out basin comparisons than is currently available. In addition, a framework has been proposed for a system that will allow non-experts in this field to use this particular method.

Chapter 6: Drainage Basin Comparison in the Queen Charlotte Islands

6.1 Introduction

The purpose of this chapter is to demonstrate and assess the basin similarity procedure by applying it in the Queen Charlotte Islands. A robust, effective procedure requires a number of different statistical analyses: distribution analysis, principal component analysis, and cluster analysis. Distribution analysis is undertaken to examine the normality of the information in order to assess the robustness of the statistical procedures.
Principal component analysis and cluster analysis both help to limit search procedures: the former helps limit the number of variables, while the latter helps limit the basins analysed.

6.2 Distribution Analysis

Distribution characteristics must be determined for two reasons. First, many geomorphometric variates have non-normal distributions (see appendix A). The robustness of the various statistical tests used, which assume data normality, must therefore be reviewed. Second, standard deviations and ranges must be determined in order to standardise the ratio level data. This ensures that differences in the ranges and units of measurement of the parameters do not affect the outcome of the dissimilarity tests. The distributions were analysed by visual assessment of histograms. Gardiner (1973, p. 147) notes the limitations of such assessments: histograms "... give a useful visual indication of the form of the distribution but they are best regarded as only a preliminary procedure because of their reliance upon visual assessments." However, as this is a first investigation, visual assessments were thought to be adequate.

Distributions of the variables from this study exhibit patterns similar to those found in other studies (table 4). Morphometric variates tend to be log-normal, as most parameters, such as area and perimeter, do not take values of 0. Also, very large values of these variables do not tend to occur frequently. Hence, a distribution bounded by 0 with relatively few large values results. The exceptions are the exponential distributions of the valley flat and lake characteristics. The large number of basins with no valley flats is attributed to the sizes of the basins analysed (from 0.8 km² to 18 km²) and also to the gridding procedure. The cells are 50 m by 50 m and, for small basins, the cells near the stream in question generally include an area other than the valley flat.
This increases the average gradient associated with the cell and excludes it from the cells determined to be shallower than the critical gradient defined for the valley flat. Most basins in the study areas did not include lakes. In those that did, the lakes were generally very small in relation to the overall basin size.

Transformations of some of the parameters have varying results (figure 14 and appendix A). The distributions of basin area and main channel length both become more normal following transformation. Gardiner (1973) indicates that "the choice of transformation and the extent of non-normality cannot always be gauged from samples which are small in the context of morphometric studies." However, increasing the sample size does not necessarily mean that more of the distributions will tend to approach non-normality. Gardiner (1973) compared his results to those of Miller (1953, cited in Gardiner, 1973); both studies include over 1300 basins. Gardiner found that most of the distributions of morphometric characteristics are logarithmic, while Miller found total stream length and drainage density to be normally distributed. Given the varying distribution types, it was thought necessary to examine similarity using measures and procedures which are transformation invariant.

[Figure 14 Distributions of Two Morphometric Variables and their Transformations]

Table 4 Distribution Characteristics of Certain Morphometric Parameters (After Gardiner, 1973)
Variable | Doornkamp and King 1971 | Lewis 1969 | Miller 1953 | Schumm 1956 | Maxwell 1960 | Wong 1963 and 1971 | Krumbein and Graybill 1965 | Abrahams 1972 and 1972b | Gardiner 1973 | This Study
Sample Size and Order | 130 3rds | 60 4ths | 1045 1sts, 298 2nds | 214 1sts, 45 2nds | Not given | 90 various orders | 282 3rds | 1549 1sts, 349 2nds, 85 3rds | 65 various orders (2nd, 3rd, 4th)
Method of Distribution Testing | Probability plots | None | Histograms | Visual assessment of histograms | Goodness of fit tests | None | None | None | Based on moment measures | Visual assessment of histograms
Area | log | log | - | - | log | log | - | log | log | log
Perimeter | log | - | - | log | - | log | log | log
Geometric Shape | - | - | - | - | - | - | - | - | - | near normal
Relief | log | log | - | - | log | - | normal | log | log | near normal
Relative Relief | - | - | - | - | log (inconclusive) | - | - | - | log
Drainage Density | normal or log | log | normal | - | log | log | normal | log | log | near normal
Concavity | - | - | - | - | - | - | - | - | - | inverse log
VFA | - | - | - | - | - | - | - | - | - | exponential
VFL | - | - | - | - | - | - | - | - | - | non-normal
VFW | - | - | - | - | - | - | - | - | - | exponential
Main Stem Length | - | - | - | - | - | - | - | - | - | log
E-R | - | - | - | - | - | - | - | - | - | near normal
Mean Elevation | - | - | - | - | - | - | - | - | - | near normal
Mean Gradient | - | - | - | - | - | - | - | - | - | near normal
Mean Aspect | - | - | - | - | - | - | - | - | - | non-normal
Channel Gradient | - | log | - | - | - | log | - | - | log | non-normal
Lake Ratio | - | - | - | - | - | - | - | - | - | exponential
Circularity | - | - | - | - | - | - | - | - | - | inverse log
Ruggedness Number | - | - | - | - | log (inconclusive) | - | - | - | non-normal | log
Basin Length | - | - | - | - | - | non-normal | log
Bifurcation Ratio (1) | - | - | - | - | - | - | - | - | - | log
Length Ratio (1) | - | - | - | - | - | - | - | - | - | log
Magnitude | - | - | - | - | not log; not normal | - | - | - | log | log
Order | - | - | - | - | - | - | - | - | - | log
Number 2nd Order Channels | - | - | - | - | - | - | - | - | - | exponential
Lake Index | - | - | - | - | - | - | - | - | - | exponential

6.3 Basin Dissimilarity Testing: Test 1 - All Variables

Similarity can be assessed at many different levels. Individual tests can be conducted between two basins or among groups of basins, as in the case of cluster analysis.
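A test between two individual basins reduces to one dissimilarity value. The sketch below makes several assumptions, because the exact formula used in COMPARE is not given in this section:
- each interval or ratio variable is optionally log-transformed;
- each variable is then divided by its range across all basins (section 6.2 also mentions standard deviations as an alternative scale);
- the scaled differences are combined in a weighted Euclidean distance.

The function names and the three example basins are hypothetical.

```python
import math


def standardise(basins, variables, log_vars=()):
    """Return copies of the basin records with each variable divided by its range.

    Variables named in log_vars are replaced by log10(x) first (x must be > 0).
    Range scaling is an assumption; a standard deviation could be used instead.
    """
    out = [dict(b) for b in basins]
    for v in variables:
        if v in log_vars:
            for b in out:
                b[v] = math.log10(b[v])
        values = [b[v] for b in out]
        spread = max(values) - min(values)
        for b in out:
            b[v] = b[v] / spread if spread else 0.0
    return out


def dissimilarity(a, b, weights):
    """Weighted Euclidean distance between two standardised basin records."""
    return math.sqrt(sum(w * (a[v] - b[v]) ** 2 for v, w in weights.items()))


# Hypothetical basins: area in km2, relief in m.
basins = [
    {"area": 1.0, "relief": 300.0},
    {"area": 10.0, "relief": 500.0},
    {"area": 100.0, "relief": 700.0},
]
weights = {"area": 1, "relief": 1}

raw = standardise(basins, ["area", "relief"])
logged = standardise(basins, ["area", "relief"], log_vars={"area"})
d01_raw = dissimilarity(raw[0], raw[1], weights)
d12_raw = dissimilarity(raw[1], raw[2], weights)
d01_log = dissimilarity(logged[0], logged[1], weights)
d12_log = dissimilarity(logged[1], logged[2], weights)
```

Without the transformation, the step from 10 to 100 km² dominates the index. After a log transformation, the steps from 1 to 10 km² and from 10 to 100 km² contribute equally. This illustrates why transforming skewed variables such as area changes the dissimilarity values.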
The purpose of this project is to develop an effective means of comparison between individual basins. However, similarity between groups of basins provides some useful information when refining this technique: the groups provide a range within which to gauge the 'similarity' of individual pairs of basins. For test 1, option 3 (cluster analysis) was chosen and all available variables were used in the analysis (except for filters).

6.3.1 Dissimilarity Distribution

The dissimilarity histograms, for both transformed and untransformed information, clearly show logarithmic distributions (figure 15). The transformation appears mainly to have decreased the index, without greatly affecting the relations between the basin characteristics. The untransformed distances have a geometric mean of 42.56, while the transformed distances have a geometric mean of 21.23.

[Figure 15 Distributions of Dissimilarities: All Variables]

6.3.2 Cluster Analysis and Proximity Relations

Cluster analysis was performed on sixty-five basins based on all interval and ratio level information. There are a few points to note with respect to the untransformed information (figure 16). First, when the tree is broken at a dissimilarity equal to the geometric mean, some single basins remain outside any cluster: basins 24 and 32, and possibly also 40 and 46. Basins 24, 40 and 46 are most likely excluded because they are relatively large basins; table 7 shows area to have a relatively large effect on the variability between the basins. Basin 32 is probably excluded because of its uncharacteristic mean aspect. While almost all basins have a mean aspect of between 1.0 and 4.0 radians, basin 32 has a mean aspect of 5.39.
The dissimilarity calculation does not account for the circular distribution of aspect; therefore, basin 32 is seen to be very dissimilar with respect to aspect. It can also be noted that the basins from South Moresby (those emphasised with ** **) all occur in one cluster, interspersed with basins from Rennell Sound. Cluster pairs tend to develop at about dissimilarity=9, and by dissimilarity=23 major clusters have formed.

If the cluster tree for the transformed information (figure 17) is broken at the geometric mean, three clusters result with no singlets. The last singlet is added at approximately dissimilarity=14. This basin has a relatively dissimilar mean elevation, which may explain its relative uniqueness. Again, there seems to be a slight division between the basins from South Moresby and those from Rennell Sound. Cluster pairs develop at about dissimilarity=4, and major clusters have developed by dissimilarity=14.

[Figure 16 Cluster Tree of the Dissimilarity of Drainage Basins Using All Variables]

[Figure 17 Cluster Tree of the Dissimilarity of Drainage Basins Using All Variables and Transformed Information]

There does not appear to be a relation between similarity and geographic proximity (figure 18).
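The aspect problem noted for basin 32 can be made concrete. The sketch below contrasts two measures. The first is a linear difference, which is effectively what the text says the dissimilarity calculation uses. The second is a circular difference that wraps at 2π radians. The circular version is not part of the thesis's procedure; it is shown only as one possible way of treating aspect. The comparison aspects of 1.0 and 4.0 radians are hypothetical, chosen to match the range reported for most basins.

```python
import math


def linear_aspect_difference(a, b):
    """Plain absolute difference, ignoring that aspect wraps around at 2*pi."""
    return abs(a - b)


def circular_aspect_difference(a, b):
    """Smallest angle between two aspects in radians, always in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


# Basin 32's mean aspect (5.39 rad) against two hypothetical basins.
for other in (1.0, 4.0):
    print(other,
          round(linear_aspect_difference(5.39, other), 2),
          round(circular_aspect_difference(5.39, other), 2))
```

With the circular difference, basin 32 is under 2 radians from a basin with an aspect of 1.0 radian, rather than 4.39 radians.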
Neither the basin in South Moresby (basin 11) nor the watershed in Rennell Sound (basin 53) shows the similarity-proximity relation: equally similar basins can be found in both areas. When geology is used as a filter, appropriate basins tend to be closer, because geology varies with distance from the basin in question (figure 19).

[Figure 18 Proximity and Dissimilarity Relations for Two Basins - Test 1 (untransformed and transformed dissimilarity)]

[Figure 19 Proximity and Dissimilarity Relations for Two Basins - Test 1 Using Geology as a Filter (untransformed and transformed dissimilarity)]

6.4 Basin Dissimilarity Testing: Test 2 - Using Specific Variables

The purpose of test 2 is to analyse basin similarity for the specific purpose of comparing hydrology, using a more efficient method that decreases the number of variables to be analysed. Principal component analysis is used to limit the number of variables, while correlation analysis from previous studies helps define the variables important to hydrology. This information is used in conjunction with option 3 (cluster analysis) in COMPARE, with no filters.

6.4.1 Principal Component Analysis

Principal component analysis (PCA) was undertaken to limit the number of calculations to be performed, by eliminating redundancy and so decreasing the number of variables used. Table 5 shows the results of the PCA and the values for those eigenvectors which contribute more than one unit variance (cf. Horel, 1981). The dominant variables and the percentage variation for the first seven eigenvectors are displayed in table 7 (see table 2 for a list of the full names of the variables). In both cases, 50% of the total variance may be attributed to eight variables. Magnitude and channel numbers are important to variation in both the untransformed and transformed cases.

6.4.2 Morphometry and Discharge

Another way to limit the number of variables is to weight the variables according to their relation to specific hydrologic or geomorphologic variables (in this case, hydrologic). This also makes the similarity analysis more applicable to a specific situation. Area, main channel length and first order frequency are all highly correlated with mean annual discharge and peak flow (Morisawa, 1959a and 1959b). The variables and their associated weights used in test 2 are shown in table 6.

6.4.3 Dissimilarity Distribution

The distributions of the dissimilarities are shown in figure 20. As in the analysis with all variables, the transformed information has a lower geometric mean (16.72 as opposed to 22.95) and a smaller range. The values are lower than in the original analysis and form a much smoother distribution.

6.4.4 Cluster Analysis and Proximity Relations

In the case of the untransformed data, no single basins remain unclustered at dissimilarity=32 (figure 21). If the cluster tree is broken at the geometric mean, 5 clusters are produced, including one singlet, basin 24. Again, there seems to be something of a division between the basins of Rennell Sound and South Moresby.

For transformed information (figure 22), 5 clusters again result if the tree is broken at the geometric mean. Clustering starts at a dissimilarity of about 2, and most major clusters are produced by a dissimilarity of approximately 15. Basin 24 seems to be an oddity: it remains a singlet until the end of the analysis and does not join the cluster tree until dissimilarity=42.5.

Again, there appears to be no correlation between geographic proximity and similarity (figure 23).

Table 5 Principal Component Analysis of 27 Variables

Initial Information:
Variable | EV-1 | EV-2 | EV-3 | EV-4 | EV-5 | EV-6 | EV-7
Area | -0.2897 | -0.1767 | -0.0658 | -0.1449 | -0.1019 | -0.0976 | -0.1324
Perimeter | -0.2759 | -0.2365 | -0.0821 | -0.0381 | -0.0003 | 0.0111 | -0.0159
Geometric Shape | -0.0928 | -0.1369 | -0.0786 | 0.2764 | -0.2426 | 0.4520 | 0.3302
Relief | -0.2339 | 0.2277 | 0.0167 | 0.1181 | -0.0331 | 0.1269 | 0.0951
Relative Relief | 0.0467 | 0.3660 | 0.0651 | 0.0568 | -0.0418 | 0.1826 | 0.0336
Drain. Density | -0.2082 | 0.2587 | 0.0729 | -0.0347 | 0.0140 | 0.0901 | 0.1426
Concavity | -0.0930 | -0.0653 | 0.3020 | -0.0679 | 0.1208 | 0.3020 | -0.4903
V. F. Area | -0.1041 | -0.0200 | 0.3669 | 0.4019 | -0.1572 | -0.3050 | -0.0529
V. F. Length | -0.0688 | 0.0004 | 0.2815 | 0.2777 | -0.0784 | -0.0381 | 0.1998
V. F. Width | -0.0738 | -0.0429 | 0.3502 | 0.3743 | -0.1141 | -0.3174 | -0.1775
Mainstem Length | -0.2700 | -0.2066 | -0.0845 | 0.0316 | -0.2056 | 0.1182 | 0.0104
E-R Ratio | 0.0413 | 0.2473 | -0.2775 | -0.0502 | -0.0993 | -0.3377 | 0.1139
Mean Elevation | -0.1703 | 0.3048 | -0.0647 | 0.0063 | -0.0087 | 0.0115 | -0.0275
Mean Gradient | -0.1264 | 0.3377 | -0.0209 | 0.0197 | -0.0344 | 0.1313 | -0.1322
Mean Aspect | -0.0513 | 0.0085 | -0.0751 | -0.0945 | 0.1241 | -0.3905 | 0.4365
Channel Grad. | 0.0488 | 0.3527 | 0.0455 | -0.0013 | 0.0718 | 0.1558 | -0.1046
Lake Ratio | -0.0708 | 0.0465 | -0.4321 | 0.4187 | 0.1497 | -0.0648 | -0.2256
Circularity | 0.0836 | 0.2192 | 0.0743 | -0.2074 | -0.2196 | -0.2696 | -0.2697
Ruggedness Num. | -0.2477 | 0.2692 | 0.0463 | 0.0401 | 0.0166 | 0.0811 | 0.1416
Basin Length | -0.2666 | -0.2451 | -0.0866 | -0.0157 | 0.0228 | 0.0448 | 0.0068
Bifur. Ratio (1) | -0.0631 | 0.0880 | -0.0576 | -0.0049 | -0.6199 | 0.0128 | 0.0870
Length Ratio (1) | -0.1152 | 0.0188 | 0.1923 | 0.1026 | 0.4817 | -0.0045 | 0.2735
Magnitude | -0.3374 | 0.0158 | -0.0254 | -0.1651 | -0.0750 | -0.0924 | -0.0540
Order | -0.2751 | 0.0693 | 0.1455 | -0.0582 | 0.2358 | -0.0504 | 0.0431
Num. (1) Chan. | -0.3385 | 0.0186 | -0.0391 | -0.1531 | -0.0686 | -0.0940 | -0.0602
Num. (2) Chan. | -0.3257 | 0.0321 | -0.0174 | -0.1620 | 0.1414 | -0.0995 | -0.0909
Lake Index | -0.0705 | 0.0483 | -0.4277 | 0.4189 | 0.1474 | -0.0669 | -0.2292

Transformed Information:
Variable | EV-1 | EV-2 | EV-3 | EV-4 | EV-5 | EV-6 | EV-7
Area | -0.2396 | -0.2439 | -0.0643 | 0.1330 | -0.0962 | -0.1257 | -0.1323
Perimeter | -0.2217 | -0.2952 | -0.0888 | 0.0621 | 0.0002 | -0.0094 | -0.0170
Geometric Shape | -0.0698 | -0.1519 | -0.1088 | -0.2760 | -0.1698 | 0.5236 | 0.2690
Relief | -0.2760 | 0.1538 | -0.0041 | -0.1037 | 0.0170 | 0.1349 | 0.0816
Relative Relief | -0.0210 | 0.3662 | 0.0715 | -0.0992 | 0.0045 | 0.1694 | 0.0628
Drain. Density | -0.2533 | 0.2059 | 0.0694 | 0.0466 | 0.0192 | 0.0980 | 0.1149
Concavity | -0.0751 | -0.1111 | 0.3280 | 0.0221 | 0.1975 | 0.1745 | -0.4975
V. F. Area | -0.1103 | -0.0565 | 0.3400 | -0.4378 | -0.1462 | -0.2922 | 0.0234
V. F. Length | -0.0817 | -0.0295 | 0.2542 | -0.2883 | -0.0528 | -0.0203 | 0.2626
V. F. Width | -0.0755 | -0.0697 | 0.3264 | -0.4122 | -0.1179 | -0.3170 | -0.1048
Mainstem Length | -0.2239 | -0.2600 | -0.0986 | -0.0320 | -0.1744 | 0.1562 | -0.0099
E-R Ratio | -0.0016 | 0.2484 | -0.2611 | 0.0507 | -0.1074 | -0.3195 | 0.1159
Mean Elevation | -0.2177 | 0.2577 | -0.0668 | -0.0085 | 0.0174 | 0.0041 | -0.0215
Mean Gradient | -0.1815 | 0.2973 | -0.0182 | -0.0397 | 0.0203 | 0.1145 | -0.1392
Mean Aspect | -0.0617 | -0.0192 | -0.0835 | 0.1581 | 0.0678 | -0.3922 | 0.5170
Channel Grad. | -0.0095 | 0.3560 | 0.0590 | -0.0378 | 0.1098 | 0.1119 | -0.0675
Lake Ratio | -0.0804 | 0.0222 | -0.4507 | -0.3808 | 0.1942 | -0.1240 | -0.1950
Circularity | 0.0382 | 0.2298 | 0.0920 | 0.1749 | -0.2335 | -0.2498 | -0.2892
Ruggedness Num. | -0.2954 | 0.2018 | 0.0373 | -0.0274 | 0.0212 | 0.1221 | 0.1193
Basin Length | -0.2123 | -0.3041 | -0.0933 | 0.0391 | 0.0263 | 0.0168 | 0.0196
Bifur. Ratio (1) | -0.1113 | 0.0902 | -0.0697 | -0.0190 | -0.5973 | 0.0370 | -0.0119
Length Ratio (1) | -0.1194 | -0.0143 | 0.1853 | -0.0311 | 0.4970 | -0.0426 | 0.2441
Magnitude | -0.3388 | 0.0204 | -0.0029 | 0.1248 | -0.1215 | -0.0634 | -0.0399
Order | -0.2925 | -0.0038 | 0.1375 | 0.1179 | 0.2052 | -0.0601 | 0.0097
Num. (1) Chan. | -0.3393 | 0.0219 | -0.0126 | 0.1181 | -0.1190 | -0.0644 | -0.0437
Num. (2) Chan. | -0.3073 | -0.0373 | -0.0093 | 0.1864 | 0.1402 | -0.0826 | -0.1147
Lake Index | -0.0804 | 0.0241 | -0.4456 | -0.3812 | 0.1924 | -0.1267 | -0.2003

Table 6 Variables and Weights for Test 2
Type: 0 = nominal/ordinal, 1 = interval, 2 = ratio
Variate | Type | Weight
rock type | 0 | 0
rock structure | 2 | 0
soil type | 0 | 0
precipitation | 0 | 0
basin area | 2 | 2
perimeter length | 2 | 0
geometric shape | 2 | 0
relief | 2 | 0
relative relief | 2 | 1
drain density | 2 | 0
concavity | 2 | 0
valley flat area | 2 | 0
valley flat length | 2 | 0
valley flat width | 2 | 0
mainstem length | 2 | 2
e-r ratio | 2 | 0
mean elevation | 2 | 1
mean gradient | 2 | 2
mean aspect | 2 | 0
channel gradient | 2 | 1
lake ratio | 2 | 0
circularity | 2 | 0
ruggedness number | 2 | 0
basin length | 2 | 0
bifurcation (1) | 2 | 0
length ratio (1) | 2 | 0
magnitude | 1 | 2
order | 1 | 0
number (1) channel | 1 | 2
number (2) channel | 1 | 1
lake index | 2 | 0
mass wasting | 0 | 0

[Figure 20 Distributions of Dissimilarities of Test 2]

Table 7 Principal Components of Drainage Basin Variability
Eigenvector | Eigenvalue (original) | Cumulative % (original) | Four Dominant Variables (original) | Eigenvalue (transformed) | Cumulative % (transformed) | Four Dominant Variables (transformed)
1 | 7.61 | 28.2 | Area, Magnitude, # 1st Chan., # 2nd Chan. | 7.69 | 28.5 | Rugg. Num., Magnitude, # 1st Chan., # 2nd Chan.
2 | 6.08 | 50.7 | Rel. Relief, Mn. Elev., Mn. Grad., Chan. Grad. | 6.27 | 51.7 | Rel. Relief, Mn. Grad., Chan. Grad., Basin Leng.
3 | 2.51 | 60.0 | Conc., VFA, VFW, Lake Ind. | 2.52 | 61.0 | Conc., VFA, Lake Rat., Lake Ind.
4 | 1.99 | 67.4 | VFL, VFW, Lake Rat., Lake Ind. | 1.90 | 68.1 | VFA, VFW, Lake Rat., Lake Ind.
5 | 1.71 | 73.7 | Geom. Shp., Bifur. (1), Length (1), Order | 1.75 | 74.5 | Circ., Bifur. (1), Length (1), Order
6 | 1.57 | 79.5 | Geom. Shp., VFW, E-R, Mn. Asp. | 1.51 | 80.1 | Geom. Shp., Conc., Mn. Asp., Circ.
7 | 1.05 | 83.4 | Geom. Shp., Conc., Mn. Asp., Length (1) | - | - | -

6.5 Discussion of Critical Dissimilarity

There are varying degrees of similarity between landscape units (cf. figures 15 and 20). Two drainage basins may be compared and a distance index calculated, but how does one decide when this number represents acceptable similarity?

One method is to determine a 'critical dissimilarity value' based upon the dissimilarities of the drainage basins in the region. This is similar to determining a value at which to dissect a cluster tree into its various clusters. To separate clusters, Mather and Doornkamp (1970, p. 176) chose a value which was established arbitrarily because "... fewer groups would not have provided a sufficient basis for interregional comparisons, while the selection of more groups would have been a return towards a state of increasing complexity." A more justifiable technique is to use some critical characteristic of the distance distribution. In this project, the geometric mean was chosen because it is a measure of central tendency suited to the non-normal distance distribution. For distances using untransformed information for all variables, the geometric mean is 42.56, while the distances from transformed data have a geometric mean of 21.23. These values may be used to break the cluster tree.
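Breaking a cluster tree at the geometric mean of the dissimilarities can be sketched as follows. The linkage method is not stated in this section, so single linkage is assumed here. Under single linkage, cutting the tree at a threshold is equivalent to joining every pair of basins whose dissimilarity is at or below it. Other linkages, such as average linkage, can give different clusters. The function names and dissimilarity values are hypothetical.

```python
import math
from itertools import combinations


def geometric_mean(values):
    """exp of the mean log; all values must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def cut_single_linkage(ids, dist, threshold):
    """Clusters from cutting a single-linkage tree at threshold (assumed linkage).

    dist maps (a, b) pairs, in the order produced by combinations(ids, 2),
    to dissimilarities. Union-find merges every pair at or below the threshold.
    """
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if dist[(a, b)] <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical dissimilarities among four basins.
ids = [1, 2, 3, 4]
dist = {(1, 2): 2.0, (1, 3): 40.0, (1, 4): 45.0,
        (2, 3): 38.0, (2, 4): 50.0, (3, 4): 4.0}
gm = geometric_mean(dist.values())
clusters = cut_single_linkage(ids, dist, gm)
```

Here the geometric mean (about 17.4) falls between the two tight pairs and the large between-pair distances, so the cut yields two clusters. The geometric mean is used rather than the arithmetic mean because the dissimilarity distribution is skewed.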
The average dissimilarity of the resulting clusters (approximately 18 for transformed information, for example) would be a good index for similarity. It seems to correspond to the modes of the distributions. If two basins being analysed had a dissimilarity index greater than this value, it would be appropriate to continue searching for a more similar basin, as many more similar basins exist. However, if the analysis produces a dissimilarity of less than 18, the user may wish to terminate the search.

[Figure 21 Cluster Tree of Dissimilarities of Test 2 (Nontransformed Information)]

[Figure 22 Cluster Tree of Dissimilarities of Test 2 (Transformed Information)]

[Figure 23 Proximity and Dissimilarity Relations of Two Drainage Basins - Test 2]

A stricter critical dissimilarity may be determined using the tenth percentile value of the dissimilarity distribution. The values are:
- 17.7 for test 1 information with no transformation;
- 9.9 for test 1 information with transformations applied;
- 9.0 for test 2 information;
- 7.5 for test 2 information with transformations applied.
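The stricter tenth percentile threshold, and the rule for deciding whether to keep searching, can be sketched with Python's standard `statistics` module. The interpolation method (`method="inclusive"`) is an assumption, since the thesis does not say how the percentile was interpolated. The function names and input values are hypothetical.

```python
import statistics


def critical_dissimilarity(dissimilarities, percentile=10):
    """Percentile of the pairwise dissimilarity distribution.

    Linear interpolation via statistics.quantiles(method="inclusive") is an
    assumption; other percentile definitions give slightly different values.
    """
    cuts = statistics.quantiles(dissimilarities, n=100, method="inclusive")
    return cuts[percentile - 1]  # cuts[0] is the 1st percentile


def keep_searching(pair_dissimilarity, critical):
    """True if the pair is less similar than the critical value allows."""
    return pair_dissimilarity > critical


# Hypothetical pairwise dissimilarities for a small region.
critical = critical_dissimilarity(list(range(1, 12)))
```

A pair whose dissimilarity exceeds the critical value prompts a further search, mirroring the stopping rule described above for the value of 18.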
If the cluster trees (figures 16, 17, 21 and 22) are broken at these levels, the tenth percentile value is roughly equivalent to the boundary between small clusters (from 1 to 12 basins) and large clusters (more than 12 basins).

6.6 Conclusion

A few main points should be noted from the above analysis. First, transformation of certain non-normal distributions has not really affected the analysis. That is, the statistical procedures used are somewhat robust, as they seem to produce similar results for normally and non-normally distributed data.

Next, geographic proximity is a selection criterion upon which many researchers have based their identification of similar basins. It seems to be of more importance to biogeophysical and historical characteristics than to morphometry. Morphometry (the distributions of the individual parameters and the relations among them) is acknowledged to vary with region. Even so, it is possible to find morphometrically similar basins in what may be considered different geographical regions (cf. cluster analysis in figures 16, 17, 21 and 22). Biophysical characteristics, such as vegetation patterns, are probably more distinguishable between regions (depending upon how they are classified).

The lack of a distinguishable geographic proximity-similarity relation challenges Hogan's (1985) notion that the most similar basins will be sub-basins within a main basin. Basin 53 and the two basins which are geographically closest to it (cf. figure 19) are all sub-basins within a main basin. Yet for both untransformed and transformed information, some basins are more characteristically similar to basin 53 than the sub-basin which is second closest in geographic proximity.

Finally, there is the question of which parameters to use for the analysis. "Depending largely on the type of analysis, the data available and the purpose of the study it could be possible to reduce the number of variables ..."
(de Villiers, 1986, p. 31). Principal component analysis shows that the first three principal components account for at least 60% of the variation, in both the transformed and untransformed information. However, before focussing the test on these limited variables, the type of test needs to be taken into account. Morisawa (1959a) showed that area, total channel length, first order frequency, and longest channel length are all highly correlated with mean annual discharge and peak flow. If these hydrological characteristics are an important reason for the study, then these four morphometric parameters should be included in the similarity analysis procedure, along with those which account for most of the variability.

It should be stressed that no two units will be identical and there will be a range of similarities among basins. These similarities and dissimilarities must be analysed in order to determine when two basins are no longer similar.

Chapter 7: Conclusion

Many of the classical criteria for assessing basin similarity are important. However, they lack a means by which they can be compared quantitatively (e.g., geology and vegetation types). Current similarity analysis in drainage basin studies consists of qualitative comparison of biogeophysical parameters and relatively simple analysis of those morphometric parameters deemed important. There seem to be two solutions to this problem. First, quantitative characteristics of these qualitative parameters may be used. Examples are the fracture orientation used for geology in this study, or the silt-clay ratios used by de Villiers (1986) to describe geology and soils. Second, 'experts' may update the factual knowledge-base as more information becomes available.

A more critical analysis includes two parts. The first is a 'standardised' comparison of qualitative characteristics (filter parameters). The second is a quantitative dissimilarity measure, based on euclidean distances, which incorporates both interval and ratio level information.
This more critical analysis is important because it allows researchers to account for a greater amount of variation in the landscape. It thereby brings geomorphological and hydrological quasi-experiments closer to a 'true experiment.' This thesis represents a first attempt at this objective.

Two major conclusions arise from this thesis. First, there is a need for a more quantifiable and objective method for basin comparison. Second, such a procedure requires large amounts of data and a broad range of knowledge. For these reasons, this method can best be developed within the framework of a knowledge-based system.

Within this system, heuristic knowledge (general rules) may be incorporated through artificial intelligence. It can be formalised so that it may be revised and updated as more information becomes available. This creates a standard methodology based upon the most current knowledge and information.

A knowledge-based system was created using the FORTRAN and Splus languages. This system consists of morphometric, biogeophysical, and historical information bases and several procedural and factual knowledge bases. Morphometric information is obtained using a procedural knowledge-base and the Terrain Resource Information Management (TRIM) system digital terrain model for the province of British Columbia. A test was conducted using information available for the Queen Charlotte Islands. Other information was collected manually from previous studies and various maps, as it is rarely available in digital form. The similarity test incorporates all three levels of information (ordinal, interval, and ratio). It uses ordinal information as a filter (binary test) and quantifies the interval and ratio tests. Standardisation of parameters had to be incorporated within the test, because the parameters had a wide range of variability and scales of measurement which could affect the outcome.
Weights for the analysis default to 1 for each ofthe available parameters but may be reset by the user.Much of the observed variation in the landscape can be accounted for byrelatively few variables, as factor analysis has shown. However, an analysiswith all available variables was conducted in order to study total variability.Parameters may also be 'thinned' by using only those known to be related tothe study in question. Upon analysis of the dissimilarity distribution, it can beseen that it is very skewed. Transformations were completed on non-normalinformation and there was little change in the resulting correlation structure, or94the distribution of dissimilarities. The geometric mean of dissimilarites may beused to break the cluster trees. The resulting average dissimilarity is used as a`critical dissimilarity' the point at which two basins are no longer similar.Future research will aid in developing the analysis procedure. There is a needto develop an accurate method with which to assess qualitative variables—e.g.,when are two geologies similar. Also, more accurate methods with which tomeasure drainage basin morphometry should be developed. There were manyproblems associated with using the TRIM data in a raster environment.Generally, untransformed and transformed information produce similar resultsfrom the similarity analysis. A critical assessment of the similarity of hydrologywith respect to the results of this study may aid in determining the effects of thetransformation.Finally, a more statistically and geographically accurate method of analysingdissimilarity measures must be developed. This thesis formalises a procedure withwhich to calulate dissimilarity and provides an initial assessment of the conceptof 'critical dissimilarly.' This concept must be more accurately assessed. Thismay be more adequately effected using fuzzy clustering methods which are moregeograhically appropriate than classical clustering methods.95ReferencesAddis, T. R. 
(1985) Designing Knowledge -Based Systems. Kogan Page, London.322 pp.Ahnert, F. (1980) A note on measurements and experiments in geomorphology.Zeitschrift fur Geomorphologie, N. F., Supplementband 35, p. 1-10.Alley, N. F. and Thompson, B. (1978) Aspects of environmental geology. Parts ofGraham Island, Queen Charlotte Islands. Bulletin of the Ministry of ForestsResource Analysis Branch, Victoria. 64 pp.Ayeni, O. O. (1976) Considerations for automated digital terrain models withapplications in differential photo mapping. Unpublished Phd. Dissertation.Ohio State University, Columbus, Ohio. 188 pp.Ayeni, O. O. (1978) Automated digital terrain models. In Proceedings of theDigital Terrain Model Symposium, American Society of Photogrammetry,St. Louis, p. 276-306.Ayeni, O. O. (1982) Optimum sampling for digital terrain models: A trend towardsautomation. Photogrammetic Engineering and Remote Sensing, 48:11, p.1687– 1694.Banner, A., Pojar, J., and Trowbridge, R. (1983) Ecosystem classification of theCoastal Western Hemlock zone, Queen Charlotte Island Subzone (CWHg),Prince Rupert Forest Region, British Columbia. Unpublished Report, BritishColumbia Ministry of Forests, Smithers, B. C. 235 pp.Boehm, B. W. (1967) Tabular representations of multivariate functions—withapplications to topographic modeling. Proceedings of the Association ofComputing Machinery 22nd National Conference, p. 403-415.Burrough, P. A. (1986) Principles of Geographical Information Systems for LandResources Assessment, Monographs on Soil and Resources Survey Number12. Clarendon Press, Oxford. 194 pp.Burt, T. P. and Walling, D. E. (1984) Catchment Experiments in FluvialGeomorphology. Geobooks, Norwich. 593 pp.96Chorley, R. J., Schumm, S. A. and Sugden, D. E. (1984) Geomorphology.Methuen, London. 605 pp.Church, M. (1984) On experimental method in geomorphology. In CatchmentExperiments in Fluvial Geomorphology, T. P. Burt and D. E. Walling, Eds.,Geobooks, Norwich. p. 563-580.Clark, D. 
(1973) Normality, transformation and the principal components solution: an empirical note. Area, 5, p. 110-113.

Cronan, C., Driscoll, C., Newton, R., Kelly, J., Schofield, C., Bartlett, R., and April, R. (1990) A comparative analysis of aluminum biogeochemistry in a northeastern and a southeastern forested watershed. Water Resources Research, 26:7, p. 1413-1430.

Davis, W. M. (1899) The geographical cycle. Geographical Journal, 14, p. 481-504.

Dillon, P. J. and Kirchner, W. B. (1975) The effects of geology and land use on the export of phosphorus from watersheds. Water Resources Research, 9:1, p.

Villiers, A. B. (1986) A multivariate evaluation of a group of drainage basin variables: a South African case study. International Geomorphology Part II, John Wiley and Sons, p. 21-32.

Donaldson, T. S. (1968) Robustness of the F-test to errors of both kinds and the correlation between the numerator and the denominator of the F-ratio. Journal of the American Statistical Association, 63, p. 660-676.

Ebisemiju, F. S. (1986) Environmental constraints of the interdependence of drainage basin morphometric properties. International Geomorphology Part II, John Wiley and Sons, p. 3-20.

Elmasri, R. and Navathe, S. (1989) Fundamentals of Database Systems. Benjamin/Cummings, California. 802 pp.

Evans, I. (1979) An integrated system of terrain analysis and slope mapping. Report 6, Department of Geography, University of Durham. 192 pp.

Finney, D. J. (1941) On the distribution of a variate whose logarithm is normally distributed. Journal of the Royal Statistical Society, Series B, 7, p. 155-161.

Fisher, P. F. (1990) A primer of geographic search using artificial intelligence. Computers and Geosciences, 16:6, p. 753-776.

Fisher, P. F., Mackaness, W. A., Peacegood, G., and Wilkinson, C. G. (1988) Artificial intelligence and expert systems in geodata processing. Progress in Physical Geography, 12:3, p. 371-388.

Fisher, L. and Van Ness, J. W. (1971) Admissible clustering procedures. Biometrika, 58, p. 91-105.

Gardiner, V.
(1973) Univariate distributional characteristics of some morphometric variables. Geografiska Annaler, 54A, p. 147-153.

Gerrard, A. J. and Robinson, D. A. (1971) Variability in slope measurements. Transactions, Institute of British Geographers, 54, p. 45-54.

Giere (1979) Understanding Scientific Reasoning. Holt, Rinehart and Winston, New York. 371 pp.

Gimbarzevsky, P. (1988) Mass wasting on the Queen Charlotte Islands. Land Management Report 29, British Columbia Ministry of Forests, Victoria. 96 pp.

Gordon, A. D. (1981) Classification: methods for the exploratory analysis of multivariate data. Chapman and Hall, London. 193 pp.

Gower, J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics, 27, p. 857-874.

Graham, I. and Jones, P. (1988) Expert Systems: Knowledge, Uncertainty, and Decision. Chapman and Hall, London. 363 pp.

Hewlett, J. D. (1971) Review of representative and experimental basins. Bulletin of the American Meteorological Society, 52, p. 892-893.

Hewlett, J. D., Lull, H. W. and Reinhart, K. G. (1969) In defense of experimental watersheds. Water Resources Research, 5:1, p. 306-316.

Hogan, D. L. (1985) Stream channel morphology: Comparison of logged and unlogged watersheds in the Queen Charlotte Islands. Unpublished Master's Thesis, The University of British Columbia. 220 pp.

Horel, J. D. (1985) A rotated principal component analysis of the interannual variability of the northern hemisphere 500 mb height field. Monthly Weather Review, 109, p. 2080-2092.

Horton, R. E. (1945) Erosional development of streams and their drainage basins: hydrophysical approach to quantitative morphology. Bulletin of the Geological Society of America, 56, p. 275-370.

Ismail, M. A. (1988) Soft clustering: Algorithms and validity of solutions. In Fuzzy Computing, M. Gupta and T. Yamakawa, Eds., Elsevier Science, New York. 499 pp.

Jackson, Jr., P. C. (1985) Introduction to Artificial Intelligence. General Publishing Company, Toronto. 453 pp.

Jarvis, R. S.
(1977) Drainage network analysis. Progress in Physical Geography, 1, p. 271-295.

Jenson, S. K. (1985) Automated derivation of hydrologic basin characteristics from digital elevation data. Proceedings of Auto-Carto 7, Digital Representations of Spatial Knowledge, Washington, D. C., March 11-14, American Society of Photogrammetry, p. 301-310.

Leopold, L. B. (1972) Hydrologic research on instrumented watersheds. Symposium of Wellington: Results of Research on Representative and Experimental Basins, International Association of Scientific Hydrology, no. 97, p. 135-150.

Lewis, L. A. (1969) Analysis of surficial landform properties: the regionalisation of Indiana into units of morphometric similarity. Proceedings of the Indiana Academy of Science, 78, p. 317-328.

Ling, R. F. (1973) A probability theory of cluster analysis. Journal of the American Statistical Association, 68, p. 159-164.

Mark, D. M. (1975) Geomorphic Parameters: A review and evaluation. Geografiska Annaler, 57A, p. 165-177.

Mark, D. M. and Goodchild, M. F. (1982) Topologic model for drainage networks with lakes. Water Resources Research, 18:2, p. 275-280.

Marks, D., Dozier, J. and Frew, J. (1984) Automated basin delineation from digital elevation data. Geo-processing, 2, p. 299-311.

Mather, P. M. and Doornkamp, J. C. (1970) Multivariate analysis in geography. Transactions of the Institute of British Geographers, 51, p. 163-187.

Melton, M. A. (1957) An analysis of relations amongst elements of climate, surface properties, and geomorphology. Technical Report 11, United States Office of Naval Research. 32 pp.

Ministry of Environment and Parks (1988) Specifications and guidelines 1:20 000 digital mapping. Release 3.0. Surveys and Resource Mapping Branch, British Columbia Ministry of Environment and Parks, Victoria, British Columbia. 314 pp.

Morisawa, M. E. (1959a) Relation of morphometric properties to runoff in the Little Mill Creek, Ohio, drainage basin. Technical Report No. 17, United States Office of Naval Research. 9 pp.

Morisawa, M. E.
(1959b) Relation of quantitative geomorphology to stream flow in representative watersheds of the Appalachian Plateau Province. Technical Report No. 20, United States Office of Naval Research. 94 pp.

National Center for Geographic Information and Analysis (1989) Technical Issues in GIS: Core Curriculum. University of California, Santa Barbara.

Norcliffe, G. B. (1982) Inferential Statistics for Geographers. Hutchinson and Co., London. 263 pp.

O'Callaghan, J. F. and Mark, D. M. (1984) The extraction of drainage networks from digital elevation data. Computer Vision, Graphics, and Image Processing, 28, p. 323-344.

O'Neill, M. P. and Mark, D. M. (1987) On the frequency distribution of land slope. Earth Surface Processes and Landforms, 12, p. 127-136.

Parminter, J. (1983) Fire history and ecology in the Prince Rupert Forest Region. In Prescribed Forest-Fire Soils Symposium Proceedings, R. L. Trowbridge and A. Macadam, Eds., Land Management Report 16, British Columbia Ministry of Forests, p. 1-35.

Pearce, A. J., Stewart, M. K., and Sklash, M. G. (1986) Storm runoff generation in humid headwater catchments. 1. Where does it come from? Water Resources Research, 22:8, p. 1263-1272.

Pearson, W. J. (1963) A review and analysis of the fire history of the Queen Charlotte Islands. Unpublished report submitted to the Association of Professional Foresters of British Columbia. 50 pp.

Pike, R. J. and Wilson, S. E. (1971) Elevation-relief ratio, hypsometric integral, and geomorphic area-altitude analysis. Geological Society of America Bulletin, 82, p. 1079-1084.

Playfair (1802) Illustrations of the Huttonian theory of the earth. William Creech, Edinburgh. 528 pp.

Rayner, J. H. (1966) Classification of soils by numerical methods. Journal of Soil Science, 17:1, p. 79-92.

Riekirk, H. (1989) Influence of silvicultural practices on the hydrology of pine flatwoods in Florida. Water Resources Research, 25:4, p. 713-719.

Rodda, J. C. (1976) Chapter 10: Basin studies. In Facets of Hydrology, J.
C. Rodda, Ed., John Wiley and Sons, London. 368 pp.

Rolston, D. W. (1988) Principles of Artificial Intelligence and Expert Systems Development. McGraw Hill, New York. 257 pp.

Rood, K. M. (1984) An aerial photograph inventory of the frequency and yield of mass wasting on the Queen Charlotte Islands, British Columbia. Land Management Report No. 34, British Columbia Ministry of Forests. 54 pp.

Scheidegger, A. E. (1967) On the topology of river nets. Water Resources Research, 3:1, p. 103-106.

Sharpnack, D. A. and Akin, G. (1969) An algorithm for computing slope and aspect from elevation. Photogrammetric Engineering, 35, p. 247-248.

Slaymaker, O. (1991) Field Experiments and Measurement Programs in Geomorphology. University of British Columbia Press, Vancouver. 224 pp.

Smith, T. R. (1984) Artificial intelligence and its applicability to geographical problem solving. The Professional Geographer, 36:2, p. 147-158.

Speight, J. G. (1971) Log-normality of slope distributions. Zeitschrift für Geomorphologie, 15, p. 290-311.

Sutherland Brown, A. (1968) Geology of the Queen Charlotte Islands. Bulletin 54, British Columbia Department of Mines and Petroleum Resources. 226 pp.

Swindel, B. and Douglass, J. (1984) Describing and testing nonlinear treatment effects in paired watershed experiments. Forest Science, 30, p. 305-313.

Tarboton, D. G., Bras, R. L. and Rodriguez-Iturbe, I. (1989) The analysis of river basins and channel networks using digital terrain data. Report Number 326, Ralph M. Parsons Laboratory, Department of Civil Engineering, Massachusetts Institute of Technology. 251 pp.

Trowbridge, R. L. and Macadam, A., Eds. (1983) Prescribed fire-forest soils symposium proceedings. Land Management Report 16, British Columbia Ministry of Forests.

Valentine, K., Sprout, P., Baker, T. and Lavkulich, L., Eds. (1978) The Soil Landscapes of British Columbia. Resource Analysis Branch, Ministry of Environment, Victoria, B. C. 197 pp.

Wang, F., Hall, G.
B., and Subuaryono (1990) Fuzzy information representation and processing in conventional GIS software: database design and application. International Journal of Geographical Information Systems, 4:3, p. 261-283.

Wood, W. F. and Snell, J. B. (1957) The dispersion of geomorphic data around measures of central tendency and its application. Research Study Report EA-8, Environmental Analysis Branch, Quartermaster Research and Development Center, United States Army, Natick, Massachusetts. 10 pp.

Wood, W. F. and Snell, J. B. (1960) A quantitative system for classifying landforms. Technical Report EP-128, Environmental Protection Division, Headquarters, Quartermaster Research and Engineering Command, United States Army, Natick, Massachusetts. 20 pp.

Zavoianu, I. (1985) Morphometry of drainage basins. Developments in Water Science, 20, Elsevier, Amsterdam. 238 pp.

Zevenbergen, L. W. and Thorne, C. R. (1987) Quantitative analysis of land surface topography. Earth Surface Processes and Landforms, 12, p. 47-56.

Appendix A: Geomorphometry and Statistical Distributions

It is often assumed that indices of drainage form are normally distributed, and mean values of morphometric parameters are frequently quoted as being representative of the landscape of the area. However, such averages are not representative of central tendency if they are derived from a skewed or bimodal distribution. All parametric procedures impose constraints upon their input data: normality; data without measurement error; linearity; homoscedasticity; and serial independence (Clark, 1973). Normality is the constraint to which most importance has been attached (Clark, 1973; Gardiner, 1973).
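As a minimal illustration of why a mean misrepresents central tendency in a skewed distribution, the Python sketch below draws a synthetic log-normal sample and compares its mean, median and moment coefficient of skewness. The sample is invented for the illustration (log values drawn from N(0, 1), with a fixed seed). It is not data from this study, and the `skewness` helper is a hypothetical name.

```python
import random
import statistics


def skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic "basin areas" whose logarithms are N(0, 1); illustrative values only,
# not data from this study.
rng = random.Random(42)
areas = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

mean_area = statistics.fmean(areas)
median_area = statistics.median(areas)
g1 = skewness(areas)
print(f"mean = {mean_area:.2f}, median = {median_area:.2f}, skewness = {g1:.2f}")
```

For a right-skewed sample like this one, the mean lies well above the median. Quoting the mean as the "typical" basin would therefore overstate it.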
Three courses of action are available if the normality assumption is violated:
a) a small minority emphasize their confidence in the robustness of the mean and standard deviation as measures of central tendency and dispersion by proceeding to analyse the raw data;
b) normality can be imposed by transforming the data, although it is unlikely that any one transformation will be appropriate for the entire data set;
c) non-parametric techniques of analysis can be used.

The purpose of this appendix is to analyse the distributional characteristics of various geomorphometric parameters and the methods used to analyse and represent them.

Most morphometric parameters have been found to be non-normally distributed (Gardiner, 1973). Table 4 is a summary of Gardiner's (1973) analysis. In his own analysis, the one with the largest sample population, he finds that all variables for all basin orders are significantly non-normally distributed (at a 99.9% confidence limit), with the exception of third order drainage density and drainage density. Of the other eight studies, six found that most if not all of their distributions were non-normal, and most were log-normal; the exceptions were Krumbein and Graybill (1965) and Miller (1953) (cited in Gardiner, 1973). At a larger scale, Gerrard and Robinson (1971), O'Neill and Mark (1987), and Speight (1971) find that slope angles are also non-normally distributed. Log-normal distributions appear to be the most prevalent of the non-normal distributions. One of two methods may be employed to assess non-normal distributions in a statistically valid manner: transformations or non-parametric statistics.

The precise effects of non-normality, and of transforming the data to normalise the distribution, remain unclear. Normality can be imposed by transforming the data, although it is unlikely that any one transformation will be appropriate for the complete data set; each variable needs to be treated independently.
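One simple way to treat each variable independently is sketched below in Python. It tries a small set of candidate transformations on each variable separately and keeps the one that leaves the smallest absolute skewness. This is an assumption for illustration, not the procedure used in this study. The candidate list, the function names and the two synthetic variables are all hypothetical.

```python
import math
import random


def skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Candidate transformations; every one assumes strictly positive values.
CANDIDATES = {
    "none": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda x: 1.0 / x,
}


def best_transformation(values):
    """Return (name, |skewness|) for the candidate that leaves the least skew."""
    scores = {name: abs(skewness([f(v) for v in values]))
              for name, f in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Two synthetic variables with different shapes (illustrative only).
rng = random.Random(1)
variables = {
    "area (log-normal)": [rng.lognormvariate(1.0, 0.8) for _ in range(1000)],
    "relief (near-normal)": [rng.gauss(10.0, 1.0) for _ in range(1000)],
}
for name, values in variables.items():
    print(name, best_transformation(values))
```

Minimising skewness is only one possible criterion. A formal normality test, or a family such as Box-Cox, could be substituted without changing the per-variable structure.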
While transformations are applied to fit the data to some model assumption, they may also drastically alter the relations among the variables. In this context, 'blanket cover' transformations are conceptually most attractive, since equality of treatment ensures preservation of the original patterns in the data. However, at all stages researchers must realise that they are manipulating transformed rather than real-world relations. Prior testing for normality is essential to ensure that the solution will reflect intrinsic rather than spurious variability and will lead to realistic interpretation of genuine empirical patterns.

Gardiner (1973), O'Neill and Mark (1987), and Speight (1971) all assessed the use of transformations. Gardiner (1973) found that the logarithmic transformation does not offer a completely satisfactory solution to the problem of morphometric non-normality. The statistical nature of morphometric variables was found to differ from variable to variable. The simple measures of area and length are readily transformed to a form approximating normality by the logarithmic transformation, whereas more complex variables formed from combinations and ratios of measurements usually demand more complex transformations (Gardiner, 1973). The logarithmic transformation is the most suitable transformation for most of the variables commonly employed in the analysis of drainage basin morphometry, and it ameliorates, if not completely removes, the worst manifestations of non-normality in the other variables. Speight (1971) found that when slopes are expressed as the logarithm of the gradient, the distributions are nearly normal and their standard deviations are independent of average gradient. However, O'Neill and Mark (1987) analysed several transformations and found that no single transformation is capable of normalising all slope distributions. The square root of sine gave better results than sine, log-tangent, or several other transformations.

An alternative is the use of non-parametric statistics. These do not impose the constraint of normality on the data as parametric statistics do, but they are thought to be 'less powerful' than their parametric counterparts. The 'power' of a test is the probability of rejecting the null hypothesis when it is false, i.e., when the alternative hypothesis is true (Finney, 1941; Norcliffe, 1982). However, Donaldson (1968) found empirically that the F-test (an important parametric test) is quite robust, i.e., relatively insensitive to violation of its assumptions, when applied to non-normal data and to data with unequal variances. In this study, transformations provided mixed results (figures 24 to 32). Some non-normal distributions were not transformed because no adequate transformation was found.

One interesting point is that there seems to be little effect of scale on the distributions. Gardiner's information was gathered at a scale of 1:10 560 and the data for this study were measured at 1:20 000 (Mather and Doornkamp do not mention the scale of their information). Even so, all achieved similar results for magnitude, area and perimeter. Magnitude is the variate that would seem most affected by scale, as the 'blue line' network on (digital) maps varies with scale.

[Figures 24 to 27: Distributions and Transformations of Morphometric Variables (for transformation types, see table 4)]
[Figures 28 to 32: Distributions and Transformations of Morphometric Variables]

Appendix B: Analysis of the Terrain Resource Information Management System (TRIM) Digital Elevation Model and Drainage Basin Morphometry

The problems associated with using the Terrain Resource Information Management (TRIM) digital terrain model for morphometric purposes fall into two categories: 1) problems with the files in relation to the specifications, and with the specifications themselves; and 2) problems with the DEM related to the similarity analysis procedure.

1) The TRIM DEMs contain all DEM points collected by stereo compilation, breaklines (sharp and rounded), and supplementary data from planimetric capture. It is difficult to analyse the representativeness of a DTM unless the purpose of the model, and the analyses to be performed with it, are known. In this investigation, the TRIM specifications were taken into account in all calculation procedures. For example, the resolution of the TRIM data limits the smallest representable drainage basin to 0.5 km².
However, there are some important aspects of the DEM files, with respect to adequacy of representation, that should be considered for this study.

It is widely accepted that the number of points needed to represent a surface in a DEM is related to the complexity of that surface (Ayeni, 1982; Burrough, 1986). TRIM specifications attempt to account for this variation. In gridded capture, four times as many points are measured in steeper areas as in shallower areas, and in random capture twice as many. However, the gradient referred to is the average gradient of the entire map sheet. Therefore, at a more local scale, either the more complex terrain (such as mountain peaks) is underrepresented or the less complex terrain (such as valley flats) is overrepresented. By computing gradients and point densities, the independence of the two variables within the TRIM DEM can be shown. Point densities were calculated using two different methods: first, by counting the number of DEM points falling within a circle of a given radius; and second, by calculating the size of the circle needed to encompass a critical number of DEM points. Gradients were calculated between the center and the fourth, sixth, and eighth furthest points from the center. The measurement technique has some effect on the relation achieved. Two patterns are evident from this analysis: a lack of any structure at all, and two distinct groupings independent of gradient. The grouping pattern is evident in many of the DEMs when measured over these larger areas. However, the groups are not seen when measured over smaller areas, such as to the tenth point.
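The two point-density measures and the gradient calculation described above can be sketched in Python. The sketch rests on several assumptions that do not come from the text. DEM points are taken to be (x, y, z) tuples in metres. The k-th point is counted outward from the centre (the k-th nearest neighbour), and a point lying exactly at the centre is skipped. Gradient is taken as the slope angle of the straight line from the centre to that point. All function names are hypothetical, and the cone-shaped test surface is invented.

```python
import math


def density_in_radius(points, centre, radius):
    """Method 1: number of points within `radius` of the centre, per unit area."""
    cx, cy = centre
    count = sum(1 for x, y, _ in points if math.hypot(x - cx, y - cy) <= radius)
    return count / (math.pi * radius ** 2)


def kth_neighbour(points, centre, k):
    """Distance to, and coordinates of, the k-th nearest point (k >= 1).

    A point lying exactly at the centre is skipped.
    """
    cx, cy = centre
    others = [p for p in points if math.hypot(p[0] - cx, p[1] - cy) > 0.0]
    others.sort(key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    p = others[k - 1]
    return math.hypot(p[0] - cx, p[1] - cy), p


def density_for_k_points(points, centre, k):
    """Method 2: k points divided by the area of the smallest circle holding them."""
    r, _ = kth_neighbour(points, centre, k)
    return k / (math.pi * r ** 2)


def gradient_to_kth(points, centre, k):
    """Slope angle (degrees) of the line from the centre to its k-th nearest point."""
    cx, cy, cz = centre
    r, (_, _, z) = kth_neighbour(points, (cx, cy), k)
    return math.degrees(math.atan2(abs(z - cz), r))


# Illustrative points on a 10 m grid over a cone, z = distance from (0, 0).
grid = [(float(x), float(y), math.hypot(x, y))
        for x in range(-100, 101, 10) for y in range(-100, 101, 10)]
print(density_in_radius(grid, (0.0, 0.0), 55.0))
print(density_for_k_points(grid, (0.0, 0.0), 8))
print(gradient_to_kth(grid, (0.0, 0.0, 0.0), 4))
```

Repeating these calls at many centres of a real DEM, and plotting density against gradient, gives the kind of comparison described above. On this synthetic cone, every gradient is 45°.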
In all cases, point density appears to be independent of gradient. In particular, there are no differences between gradients steeper and shallower than 25°, the threshold mentioned in the TRIM Specifications Manual (Ministry of Environment, 1988) (cf. figure 31). This is because the average gradients used to calculate point densities were arrived at using the entire map sheet, not necessarily a specific area.

[Figure 33: TRIM Specifications. DEM density (points/km²) for GRID and RANDOM capture versus average slope (degrees).]

In both the steep and shallow gradient situations, the TRIM specifications provide a 2:1 ratio of grid points to random points. Suppose one accepts that a 14:1 ratio of grid to TIN (triangular irregular network, i.e., surface-specific random) points is required for equal topographic representation (Tom Poiker, pers. comm.). The random data capture method used in TRIM is then not very storage-efficient, because the randomly captured data points are not necessarily surface-specific points, such as sharp breaklines or hydrographic breaklines.

The plots of the DEM points for the various map sheets available were compared (103B 015, 024, 052, Queen Charlotte Islands, and 103P 044, Terrace). This comparison shows that the surface is rarely represented by a semi-regular grid DEM. There are two reasons for this. First, part of the surface was captured using a random method. Second, when the grid pattern was used, it was based upon an equidistant surface pattern, which in two dimensions does not necessarily represent a grid (TRIM Open House, Victoria, B.C., 1990).

The Digital Mapping Group, which produced the TRIM data, is a consortium of nine different companies.
Some of these companies used different procedures (Dan Reimer, Digital Mapping Group, pers. comm.); see maps 103P 044 and 103B 015, for example.

2) These problems relate to the difficulties of calculating drainage basin morphometry from the TRIM digital terrain model.

One problem is 'blank' areas: large areas where no elevation points occur. One major reason for this is that these are areas where polygonal features occur, such as lakes and glaciers. These can be adequately accounted for within the current study. However, 'blank' areas also occur for no apparent reason. For example, there appears to be a large region (about 500 m in diameter) at UTM 312500 5829000 where only 4 elevation points occur, counting all four possible types (definite DEM, indefinite DEM, and sharp and hydrographic breaklines). There appears to be no lake, glacier, or other water body in this area.

Another problem is posed by trinary stream junctions, i.e., points where more than two stream links join. Most studies and network parameters are based upon binary junctions. At trinary junctions, order is calculated in a similar manner as at binary junctions.

The main problem with the MORPHCALC procedure with respect to the TRIM files is the delineation of the drainage basin. The difficulties arise when gridding the DTM. It was decided that a grid format would be more appropriate for sampling (see sections 2.2 and 4.2.2). Another reason is that gridded basin delineation procedures are widely used (Evans, 1979; Marks, Dozier, and Frew, 1984; Jenson, 1985). However, gridding the point data (to 50 m intervals, one of the TRIM densities) removed some of the relief needed to delineate the smaller basins. Even if TIN procedures were used, however, many small basins would still be inaccurately delineated, because some channels do not have elevation points separating them in the DTM.
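The Python sketch below illustrates the kind of gridded delineation referred to above, together with one possible way of assigning order at a trinary junction. It uses the common D8 rule: each cell drains to whichever of its eight neighbours gives the steepest descent. The grid has 50 m cells, matching the gridding interval mentioned above. This is not the MORPHCALC procedure. The D8 rule, the function names, the toy elevations, and the Strahler-style ordering rule for junctions of three or more links are all assumptions made for the illustration.

```python
import math

# Row/column offsets of the eight neighbours used by the D8 rule.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_directions(dem, cell_size=50.0):
    """Map each cell to the neighbour of steepest descent, or None (pit/outlet)."""
    rows, cols = len(dem), len(dem[0])
    receiver = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    distance = cell_size * math.hypot(dr, dc)
                    slope = (dem[r][c] - dem[rr][cc]) / distance
                    if slope > best_slope:  # only strictly downhill moves
                        best, best_slope = (rr, cc), slope
            receiver[(r, c)] = best
    return receiver


def upstream_basin(receiver, outlet):
    """All cells whose D8 flow path passes through `outlet` (including it)."""
    donors = {}
    for cell, target in receiver.items():
        if target is not None:
            donors.setdefault(target, []).append(cell)
    basin, stack = {outlet}, [outlet]
    while stack:
        for donor in donors.get(stack.pop(), []):
            if donor not in basin:
                basin.add(donor)
                stack.append(donor)
    return basin


def strahler_order(upstream_orders):
    """Order of a link from the orders of the links meeting at its head.

    Assumed rule for junctions of two or more links: the highest incoming
    order, plus one if at least two incoming links share that order.
    """
    if not upstream_orders:
        return 1  # a source link
    top = max(upstream_orders)
    return top + 1 if upstream_orders.count(top) >= 2 else top


# A 3 x 3 valley (elevations in metres) draining to the bottom-centre cell.
dem = [[10, 9, 10],
       [9, 6, 9],
       [8, 3, 8]]
receiver = d8_directions(dem)
print(sorted(upstream_basin(receiver, (2, 1))))
print(strahler_order([1, 1, 1]), strahler_order([2, 2, 1]))
```

Under this rule a trinary junction of three first-order links yields a second-order link, exactly as a binary junction of two would.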
This problemwas somewhat solved by providing a surrounding boundary to which the basindelineation procedure may calculate (which the user defines based upon coastlinesand headwaters of surrounding basins).120Another problem is the large sizes of the files. Each DEM and planimetric filecan be up to 4 megabytes. In extreme cases it is necessary to combine and analyseup to 8 of these files which takes an incredible amount of time (up to 5 hours).The main problem with the TRIM DTM is that the point densities do notrelate to topographic complexity, none-the-less, there seems to be sufficient data tocalculate morphometric characteristics. While TIN analysis may help to eliminatesome of these problems, there will still be some difficulties.121Appendix C: COMPARE: Sections of the knowledge-base systemThe following diagrams represent important sections of the COMPAREknowledge-based system for untransformed information. For transformed in-formation, a similar set of procedures was used although transformations wereperformed in topdist.fnc and geomdist.fnc.122Figure 34 Compare.fncfunction()#BASIN SIMLARITY ANALYSISfor(i in 1:80) {cat(" ", fill = TRUE)}cat("BASIN SIMILARITY ANALYSIS", fill = TRUE)cat(" 	 .)cat(" ", fill = TRUE)cat(" ", fill = TRUE)cat("Please choose a method:")cat(" ", fill = TRUE)cat(" ", fill = TRUE)cat(" 1) Choose 1 or 2 basins using coordinates",fill = TRUE)cat("	 2) Choose 1 basin and analysis will find the3 most similar", fill = TRUE)cat("	3) Analyse all (cluster analysis)", fill = TRUE)cat(" ", fill = TRUE)cat(" ", fill = TRUE)cat("Method ( 1, 2, or 3) ?", fill = TRUE)ans <- readline()cat(" ", fill = TRUE)cat(" ", fill = TRUE)filt <- FALSEcat("Which filter variables do you want to use?", fill = TRUE)cat("	1) None", fill = TRUE)cat("	2) The default filters: gtype, stype", fill = TRUE)cat("	3) Edit filter file", fill = TRUE)ans3 <- readline()if(ans3 == 2) {unix("mv filtgd.dat filter.dat")filt <- TRUEif(ans3 == 3) {unix("mv filtgd.dat filter.dat")unix("vi 
filter", output = FALSE)filt <- TRUE1cat(" ", fill = TRUE)cat(" ", fill = TRUE)cat("Do you wish to change the weights?", fill = TRUE)cat("	1) Yes", fill = TRUE)cat("	2) No, use the default weights", fill = TRUE)ans3 <- readline()unix("mv wgtgd.t.dat wght.t.dat")unix("mv wgtgd.g.dat wght.g.dat")unix("mv wgtgd.b.dat wght.b.dat")unix("mv wgtgd.h.dat wght.h.dat")if(ans3 == 1) (unix("vi wght.t.dat", output=FALSE)unix("vi wght.g.dat", output=FALSE)unix("vi wght.b.dat", output=FALSE)unix("vi wght.h.dat", output=FALSE)cat(" ", fill = TRUE)cat(" ", fill = TRUE)bgp <- TRUEmorph <- TRUEhist <- TRUEcat("Please choose parameter types you wish to use:", fill = TRUE)cat("	 Biogeoclimatic (y or n) ? ")typs <- readline()if(typs == "n" II typs == "N") {bgp <- FALSEcat("	 Morphometric (y or n) ? ")typs <- readline()if(typs == "n" II typs == "N") {morph <- FALSE123cat("	 Historical (y or n) ? ")typs <- readline()if(typs == "n" II typs == "N") {hist <- FALSE}print(hist)print(bgp)if(ans == 1) (cat("Input x coordinate of first basin (UTM)", fill = TRUE)xcoorl <- as.numeric(readline())cat("Input y coordinate of first basin (UTM)", fill = TRUE)ycoorl <- as.numeric(readline())secd <- 0basinl <- findbas.fnc(xcoorl, ycoorl, secd)cat("Would you like to:", fill = TRUE)cat(" ", fill = TRUE)cat(" ", fill = TRUE)cat(" 1) enter the coordinates of the second basin", fill =TRUE)cat("	2) have the system find the closest basin", fill =TRUE)cat("Method ( 1 or 2 ) ?", fill = TRUE)ans2 <- readline()secd <- basinlif(ans2 == 1) {cat("Input x coordinate of first basin (UTM)", fill =TRUE)xcoorl <- as.numeric(readline())print(xcoorl)cat("Input y coordinate of first basin (UTM)", fill =TRUE)ycoorl <- as.numeric(readline())basin2 <- findbas.fnc(xcoorl, ycoorl, secd)simlr <- sumdist.fnc(morph, bgp, hist, basinl, basin2, filt)cat("Distance between ", basinl, " and ", basin2, " is ", simlr}if(ans == 2) {cat("Input x coordinate of first basin (UTM)", fill = TRUE)xcoorl <- as.numeric(readline())cat("Input y 
coordinate of first basin (UTM)", fill = TRUE)
    ycoor1 <- as.numeric(readline())
    secd <- 0
    basin1 <- findbas.fnc(xcoor1, ycoor1, secd)
    coors <- matrix(scan("coords"), ncol = 3, byrow = TRUE)
    numcoor <- length(coors[, 1])
    # track the three basins with the smallest dissimilarity to basin1
    closest <- closest2 <- closest3 <- Inf
    bascl1 <- bascl2 <- bascl3 <- NA
    for(i in 1:numcoor) {
      if(i != basin1) {
        simlr <- sumdist.fnc(morph, bgp, hist, basin1, i, filt)
        if(simlr < closest) {
          # new best match: shift the previous first and second places down
          closest3 <- closest2
          bascl3 <- bascl2
          closest2 <- closest
          bascl2 <- bascl1
          closest <- simlr
          bascl1 <- i
        } else if(simlr < closest2) {
          closest3 <- closest2
          bascl3 <- bascl2
          closest2 <- simlr
          bascl2 <- i
        } else if(simlr < closest3) {
          closest3 <- simlr
          bascl3 <- i
        }
      }
    }
    cat("first ", bascl1, " second ", bascl2, " third ", bascl3)
  }
  if(ans == 3) {
    coors <- matrix(scan("coords"), ncol = 3, byrow = TRUE)
    numcoors <- length(coors[, 1])
    # dissimilarity for every pair of basins j < k (upper triangle of distmat)
    distarr <- numeric(0)
    distmat <- matrix(0, numcoors, numcoors)
    for(j in 1:(numcoors - 1)) {
      for(k in (j + 1):numcoors) {
        simlr <- sumdist.fnc(morph, bgp, hist, j, k, filt)
        distarr <- c(distarr, simlr)
        distmat[j, k] <- simlr
      }
    }
    distarr2 <<- distarr
    if(length(distarr) != (numcoors * (numcoors - 1))/2)
      return("error in number of basins and comparison")
    # plots clusters and proximity/similarity relations
    plclust.fnc(distmat)
  }
  ans
}

Figure 35 Sumdist.fnc

sumdist.fnc <- function(morph, bgp, hist, bas1, bas2, filt)
{
  # calculates the difference between the information
  difsum <- 0
  if(morph == TRUE) {
    dif <- geomdist.fnc(bas1, bas2, filt)
    difsum <- dif
    dif <- topdist.fnc(bas1, bas2, filt)
    difsum <- difsum + dif
  }
  if(bgp == TRUE) {
    dif <- bgpdist.fnc(bas1, bas2, filt)
    difsum <- difsum + dif
  }
  if(hist == TRUE) {
    dif <- histdist.fnc(bas1, bas2, filt)
    difsum <- difsum + dif
  }
  difsum <- (difsum)^0.5
  difsum
}

Figure 36 Geomdist.fnc

geomdist.fnc <- function(b1, b2, filt)
{
  # get the geometric data
  g1 <- matrix(scan("geom1"), ncol = 11, byrow = TRUE)
  g2 <- matrix(scan("geom2"), ncol = 10, byrow = TRUE)
  datgeom <- cbind(g1, g2)
  # get the data types
  ty <- scan("type.g.dat", list("", 0))
  typg <- cbind(ty[[1]], ty[[2]])
  # get the weights
  we <- scan("wght.g.dat", list("", 0))
  wgtg <- we[[2]]
  # datgeom[, 1] <- log(datgeom[, 1])
  difdst <- array(data = NA, length(datgeom[1, ]))
  numb <- length(datgeom[1, ])
  for(i in 1:numb) {
    if(wgtg[i] != 0) {
      if(!is.na(datgeom[b1, i]) && !is.na(datgeom[b2, i])) {
        # type 0: nominal/ordinal variable, used as a filter
        if(typg[i, 2] == 0 && filt == TRUE) {
          filtfil <- paste(i, ".geom.dat", sep = "")
          ff <- scan("filter.dat", list("", 0))
          filtstf <- cbind(ff[[1]], ff[[2]])
          fillen <- length(filtstf[, 1])
          for(k in 1:fillen) {
            if(filtstf[k, 1] == typg[i, 1]) {
              filtnum <- as.numeric(filtstf[k, 2])
            }
          }
          filter <- filter.fnc(datgeom[b1, i], datgeom[b2, i], filtfil, filtnum)
        }
        # type 1: interval variable, scaled by its range
        if(typg[i, 2] == 1) {
          R <- diff(range(datgeom[, i], na.rm = TRUE))
          difdst[i] <- interval.fnc(datgeom[b1, i], datgeom[b2, i], R, wgtg[i])
        }
        # type 2: ratio variable, scaled by its standard deviation
        if(typg[i, 2] == 2) {
          sig <- (var(datgeom[, i], na.rm = TRUE))^0.5
          difdst[i] <- ratio.fnc(datgeom[b1, i], datgeom[b2, i], sig, wgtg[i])
        }
      }
    }
  }
  distgeom <- sum(difdst, na.rm = TRUE)
  distgeom
}

Figure 37 Topdist.fnc

topdist.fnc <- function(b1, b2, filt)
{
  # get the geometric data
  t1 <- matrix(scan("top1"), ncol = 9, byrow = TRUE)
  t2 <- matrix(scan("top2"), ncol = 11, byrow = TRUE)
  t3 <- matrix(scan("top3"), ncol = 11, byrow = TRUE)
  dattop <- cbind(t1, t2, t3)
  # values below -8 are treated as missing-data codes
  msk <- dattop < -8
  dattop[msk] <- NA
  # get the data types
  ty <- scan("type.t.dat", list("", 0))
  typt <- cbind(ty[[1]], ty[[2]])
  # get the weights
  we <- scan("wght.t.dat", list("", 0))
  wgtt <- we[[2]]
  difdst <- array(data = NA, length(dattop[1, ]))
  numb <- length(dattop[1, ])
  for(i in 1:numb) {
    if(wgtt[i] != 0) {
      if(!is.na(dattop[b1, i]) && !is.na(dattop[b2, i])) {
        if(typt[i, 2] == 0 && filt == TRUE) {
          filtfil <- paste(i, ".top.dat", sep = "")
          ff <- scan("filter.dat", list("", 0))
          filtstf <- cbind(ff[[1]], ff[[2]])
          fillen <- length(filtstf[, 1])
          for(k in 1:fillen) {
            if(filtstf[k, 1] == typt[i, 1]) {
              filtnum <- as.numeric(filtstf[k, 2])
            }
          }
          filter <- filter.fnc(dattop[b1, i], dattop[b2, i], filtfil, filtnum)
        }
        if(typt[i, 2] == 1) {
          rngdat <- dattop[, i]
          rngdat <- rngdat[!is.na(rngdat)]
          rng <- max(rngdat) - min(rngdat)
          cat("rng ", rng, fill = TRUE)
          difdst[i] <- interval.fnc(dattop[b1, i], dattop[b2, i], rng, wgtt[i])
        }
        if(typt[i, 2] == 2) {
          vardat <- dattop[, i]
          vardat <- vardat[!is.na(vardat)]
          sig <- (var(vardat))^0.5
          difdst[i] <- ratio.fnc(dattop[b1, i], dattop[b2, i], sig, wgtt[i])
        }
      }
    }
  }
  disttop <- sum(difdst, na.rm = TRUE)
  disttop
}

Figure 38 Bgpdist.fnc

bgpdist.fnc <- function(b1, b2, filt)
{
  # get the biogeophysical data
  dat <- scan("biogeop", list("", 0, "", 0))
  datbgp <- cbind(dat[[1]], dat[[2]], dat[[3]], dat[[4]])
  # get the data types
  ty <- scan("type.b.dat", list("", 0))
  typ <- cbind(ty[[1]], ty[[2]])
  # get the weights
  we <- scan("wght.b.dat", list("", 0))
  wgt <- we[[2]]
  difdst <- array(data = NA, length(datbgp[1, ]))
  numb <- length(datbgp[1, ])
  for(i in 1:numb) {
    if(wgt[i] != 0) {
      if(!is.na(datbgp[b1, i]) && !is.na(datbgp[b2, i])) {
        cat(i, typ[i, 2])
        if(typ[i, 2] == 0 && filt == TRUE) {
          filtfil <- paste(i, ".bgp.dat", sep = "")
          ff <- scan("filter.dat", list("", 0))
          filtstf <- cbind(ff[[1]], ff[[2]])
          fillen <- length(filtstf[, 1])
          for(k in 1:fillen) {
            if(filtstf[k, 1] == typ[i, 1]) {
              filtnum <- as.numeric(filtstf[k, 2])
            }
          }
          filter <- filter.fnc(datbgp[b1, i], datbgp[b2, i], filtfil, filtnum)
        }
        # here type 1 is passed to ratio.fnc with the range and type 2 to
        # interval.fnc with the standard deviation (reverse of geomdist.fnc)
        if(typ[i, 2] == 1) {
          R <- max(as.numeric(datbgp[, i]), na.rm = TRUE) -
            min(as.numeric(datbgp[, i]), na.rm = TRUE)
          difdst[i] <- ratio.fnc(as.numeric(datbgp[b1, i]),
            as.numeric(datbgp[b2, i]), R, wgt[i])
        }
        if(typ[i, 2] == 2) {
          sig <- (var(as.numeric(datbgp[, i]), na.rm = TRUE))^0.5
          difdst[i] <- interval.fnc(as.numeric(datbgp[b1, i]),
            as.numeric(datbgp[b2, i]), sig, wgt[i])
        }
      }
    }
  }
  distbgp <- sum(difdst, na.rm = TRUE)
  distbgp
}

Figure 39 Histdist.fnc

histdist.fnc <- function(b1, b2, filt)
{
  # get the historical data
  dat <- scan("histor", list(""))
  dathist <- cbind(dat[[1]])
  # get the data types
  ty <- scan("type.h.dat", list("", 0))
  typ <- cbind(ty[[1]], ty[[2]])
  # get the weights
  we <- scan("wght.h.dat", list("", 0))
  wgt <- we[[2]]
  difdst <- array(data = NA, length(dathist[1, ]))
  numb <- length(dathist[1, ])
  for(i in 1:numb) {
    if(!is.na(dathist[b1, i]) && !is.na(dathist[b2, i])) {
      cat(i, typ[i, 2])
      if(typ[i, 2] == 0 && filt == TRUE) {
        filtfil <- paste(i, ".hist.dat", sep = "")
        ff <- scan("filter.dat", list("", 0))
        filtstf <- cbind(ff[[1]], ff[[2]])
        fillen <- length(filtstf[, 1])
        for(k in 1:fillen) {
          if(filtstf[k, 1] == typ[i, 1]) {
            filtnum <- as.numeric(filtstf[k, 2])
          }
        }
        filter <- filter.fnc(dathist[b1, i], dathist[b2, i], filtfil, filtnum)
      }
      # as in bgpdist.fnc, type 1 uses ratio.fnc and type 2 uses interval.fnc
      if(typ[i, 2] == 1) {
        R <- max(as.numeric(dathist[, i]), na.rm = TRUE) -
          min(as.numeric(dathist[, i]), na.rm = TRUE)
        difdst[i] <- ratio.fnc(as.numeric(dathist[b1, i]),
          as.numeric(dathist[b2, i]), R, wgt[i])
      }
      if(typ[i, 2] == 2) {
        sig <- (var(as.numeric(dathist[, i]), na.rm = TRUE))^0.5
        print(sig)
        difdst[i] <- interval.fnc(as.numeric(dathist[b1, i]),
          as.numeric(dathist[b2, i]), sig, wgt[i])
      }
    }
  }
  disthist <- sum(difdst, na.rm = TRUE)
  disthist
}

Figure 40 Interval.fnc

interval.fnc <- function(x1, x2, rng, w1)
{
  # function to calculate difference between interval data
  rng <- rng/4
  dif <- w1 * (((x1 - x2)^2)/rng)
  dif
}

Figure 41 Ratio.fnc

ratio.fnc <- function(x1, x2, sigma, w1)
{
  # function to calculate difference between ratio data
  dif <- w1 * (((x1 - x2)^2)/sigma)
  dif
}

Figure 42 Filter.fnc

filter.fnc <- function(x1, x2, datfil, numcol)
{
  # function to analyse the filter variables
  dat <- matrix(scan(datfil), ncol = numcol, byrow = TRUE)
  numb <- length(dat[, 1])
  issame <- FALSE
  for(j in 1:numb) {
    if(x1 == dat[j, 1]) {
      for(k in 2:numcol) {
        if(x2 == dat[j, k]) {
          issame <- TRUE
        }
      }
    }
  }
  issame
}
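As a worked illustration of how the pieces combine, the short example below restates interval.fnc and ratio.fnc (Figures 40 and 41), applies them to one interval and one ratio variable for a pair of basins, and then takes the square root of the summed terms as sumdist.fnc does. The variable names, the range of 2000, the standard deviation of 10 and the basin values are all made up for illustration, and both weights are set to 1.

```r
# Copies of interval.fnc and ratio.fnc for a self-contained example.
interval.fnc <- function(x1, x2, rng, w1) {
  # weighted squared difference divided by a quarter of the range
  rng <- rng/4
  w1 * (((x1 - x2)^2)/rng)
}
ratio.fnc <- function(x1, x2, sigma, w1) {
  # weighted squared difference divided by the standard deviation
  w1 * (((x1 - x2)^2)/sigma)
}

# Hypothetical values for basins 1 and 2 (illustration only).
relief <- c(1200, 800)  # interval variable, assumed range 2000
area <- c(35, 20)       # ratio variable, assumed standard deviation 10

d_int <- interval.fnc(relief[1], relief[2], rng = 2000, w1 = 1)  # 160000 / 500
d_ratio <- ratio.fnc(area[1], area[2], sigma = 10, w1 = 1)       # 225 / 10
dissim <- sqrt(d_int + d_ratio)  # square root of the summed terms, as in sumdist.fnc
print(c(d_int, d_ratio, dissim))
```

Here the interval term is 320 and the ratio term is 22.5, so the combined dissimilarity is the square root of 342.5.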

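The nested loops in filter.fnc (Figure 42) return TRUE when some row of the filter file starts with x1 and lists x2 in one of its remaining columns. A vectorised sketch of the same test is shown below. It takes the filter table as an in-memory matrix rather than reading it from a file; the name filter.mem.fnc and the example table are hypothetical, and the row layout (a class code followed by the codes treated as compatible with it) is inferred from how filter.fnc indexes the file rather than documented in the listing.

```r
# Hypothetical in-memory equivalent of filter.fnc.
# Column 1 of 'dat' holds a class code; the other columns hold the codes
# treated as matching it (layout inferred from the listing).
filter.mem.fnc <- function(x1, x2, dat) {
  # rows whose first column equals x1, without that first column
  rows <- dat[dat[, 1] == x1, -1, drop = FALSE]
  # TRUE if x2 appears anywhere in those rows
  any(rows == x2, na.rm = TRUE)
}

dat <- matrix(c(1, 1, 2,
                2, 2, 3,
                3, 3, 0), ncol = 3, byrow = TRUE)
filter.mem.fnc(1, 2, dat)  # TRUE: row 1 lists 2
filter.mem.fnc(1, 3, dat)  # FALSE: row 1 does not list 3
filter.mem.fnc(5, 1, dat)  # FALSE: no row starts with 5
```

Using drop = FALSE keeps a matrix even when only one row matches, so the comparison behaves the same for one, several or no matching rows.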

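plclust.fnc, which plots the clusters from the dissimilarity matrix, is not listed in this section. As a stand-in only, the sketch below shows one way to cluster an upper-triangular matrix of the kind built in the ans == 3 branch, using R's stats functions as.dist(), hclust() and cutree(). The average-linkage method and the four-basin matrix are assumptions for illustration, not necessarily what plclust.fnc does.

```r
# Hypothetical 4-basin dissimilarity matrix; only entries [j, k] with j < k
# are filled, as for distmat in the listing.
distmat <- matrix(0, 4, 4)
distmat[1, 2] <- 1.0
distmat[1, 3] <- 4.0
distmat[1, 4] <- 5.0
distmat[2, 3] <- 4.5
distmat[2, 4] <- 5.5
distmat[3, 4] <- 1.5

# as.dist() reads the lower triangle, so transpose the upper-triangular matrix
d <- as.dist(t(distmat))
tree <- hclust(d, method = "average")  # linkage choice is an assumption
groups <- cutree(tree, k = 2)          # cut the tree into two groups
print(groups)
# plot(tree) would draw the dendrogram
```

With these values basins 1 and 2 merge first, then basins 3 and 4, so cutting at two groups separates {1, 2} from {3, 4}.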