Towards a landform geodatabase: the automatic identification of landforms. Maguire, Bradley David, 2005

TOWARDS A LANDFORM GEODATABASE: THE AUTOMATIC IDENTIFICATION OF LANDFORMS

by

BRADLEY DAVID MAGUIRE

B.Sc., The University of Victoria, 1989
Post-Graduate Diploma, Nova Scotia College of Geographic Sciences, 1990

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES (GEOGRAPHY)

THE UNIVERSITY OF BRITISH COLUMBIA

April, 2005

© Bradley David Maguire, 2005

Abstract

If a geomorphologist is able to identify landforms from an aerial photograph or a Digital Terrain Model, then it should be possible for a computer to mimic the same process. The Landform Classification System (LCS) was created to allow for the automated identification of landforms from a Digital Terrain Model. The system uses a combination of a Network-Integrated Triangulated Irregular Network (NetTIN), a Fuzzy ARTMap Artificial Neural Network (ANN), and custom programming to produce a classification based on 22 morphometric variables, which describe the shape of the land surface. The ANN allows the system to "see" patterns in the morphometric variables. Once it has been trained with examples of different landform types, the ANN can perform a classification based on what it has learned.

The LCS requires sufficient examples to produce high classification accuracies. Within the LCS, Kappa Analysis is used as the primary measure of classification accuracy, because it takes into account the fact that even a random distribution of classified triangles may result in a few correct matches. The K statistic produced by the Kappa Analysis decreases as we move from drumlins (8911 triangles) to eskers (193 triangles) and kames (11 triangles). The results for drumlins were best, with an Overall Accuracy value of 74.78% and a K accuracy value of 26.36%. For eskers, the values were 95.85% and 3.99% respectively.
Despite the low K value for eskers, the system identified six potential eskers that were previously unidentified. For kames, the Overall Accuracy value was 98.47% and the K value was 0.00%, although this latter value reflects the fact that no kames are known to exist on the map sheets that were classified.

The Landform Classification System is reasonably fast at performing classifications. The ANN is currently an external program; with some additional work, it can be incorporated directly into the LCS. Once this is done, the LCS should be fast enough to allow large areas to be classified. If the accuracy of the classifications can be improved somewhat, the Landform Classification System can then be used to produce a "Landform Geodatabase," which is a Geographical Information System (GIS) layer containing the type and extent of all landforms over a broad area.

A short paper summarizing some of the results of this project to date was recently presented at the Geotec 2005 conference in Vancouver. Entitled "Development of the Landform Classification System," this paper summarizes some of the successes and problems that have surfaced in this project.

Brad Maguire, UBC Geography, April 21, 2005

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 The Problem
  1.2 Objective
  1.3 Landforms, Context, and Empiricism
  1.4 General Project Outline
  1.5 Nomenclature
  1.6 Constraints and Assumptions
  1.7 Summary
  1.8 Thesis Structure
2 Potential Applications
  2.1 Topographic Mapping
  2.2 Construction
  2.3 Agriculture and Forestry
  2.4 Military Operations
  2.5 Mineral Prospecting
  2.6 Sand and Gravel Prospecting
  2.7 Summary
3 Literature Review
  3.1 Choice of Landforms
  3.2 Geological History
  3.3 Characteristics of Selected Landforms
  3.4 Data Requirements
  3.5 Data Structures for Storage and Analysis
  3.6 Morphometric Variables
  3.7 Minimizing Variable Redundancy
  3.8 Classification of Training Data
  3.9 Summary
4 Morphometric Variables and Principal Component Analysis
  4.1 Implications of the TIN Data Structure
  4.2 Morphometric Analysis
  4.3 Principal Component Analysis
  4.4 Geography of 093G.066
  4.5 Geomorphological Significance of PCA Results
  4.6 Summary
5 Field Work
  5.2 2003 Field Season
  5.3 2004 Field Season
  5.4 Remaining Work
  5.5 Summary
6 Software Design
  6.1 Introduction
  6.2 Overview of Research
  6.3 Software Capabilities
  6.4 Summary
7 Results
  7.1 Introduction
  7.2 The "Ratcheting" Process
  7.3 Final Classification Accuracies
  7.4 Summary
8 Conclusions
  8.1 Successes
  8.2 Software Improvements Required
  8.3 Procedural Improvements Required
  8.4 Prospects for Commercialization
  8.5 Further Papers and Projects
References
Appendix A: Glossary
Appendix B: Calculation of Statistics
Index

List of Tables

Table 2-1. Projected lifespans for commercial gravel pits at 1980 consumption rates (Hora, 1988)
Table 3-1. The suitability of different continental glacial landforms for material extraction (Way, 1973)
Table 3-2. Dimensions of Eskers, Kames and Drumlins according to various authors
Table 3-3. Minimum XY and Z Resolutions required for eskers, kames and drumlins
Table 3-4. DEM Resolutions used by Dikau (1989)
Table 4-1. How neighbourhoods are applied to triangle values. This is a very simplified example; in reality, each triangle would have a minimum of six first order neighbours.
Table 4-2. Description of neighbours in Figure 4-2a
Table 4-3. Principal Component Analysis Component Matrix for Entire TIN (Minimum Eigenvalue 0.8). Most significant regression values and variables are shown in large red text, overlapping variables are shown in black italics and remaining variables are in green Arial font (SPSS, 2002)
Table 4-4. Principal Component Analysis Component Matrix for Eskers (Minimum Eigenvalue 0.8). Most significant regression values and variables are shown in large red text, overlapping variables are shown in black italics and remaining variables are in green Arial font (SPSS, 2002)
Table 4-5. Principal Component Analysis Component Matrix for Kames (Minimum Eigenvalue 0.8). Most significant regression values and variables are shown in large red text, overlapping variables are shown in black italics and remaining variables are in green Arial font (SPSS, 2002)
Table 4-6. Principal Component Analysis Component Matrix for Drumlins (Minimum Eigenvalue 0.8). Most significant regression values and variables are shown in large red text, overlapping variables are shown in black italics and remaining variables are in green Arial font (SPSS, 2002)
Table 4-7. Summary of Morphometric Variables with the Highest Correlations to Eigenvectors, as Identified by PCA. Red check marks indicate that these variables have the highest correlation for a component, green dots indicate remaining variables that have no overlap, and black crosses indicate overlapping variables that were eliminated
Table 5-1. Questions to be answered during Summer 2003 fieldwork
Table 5-2. Answers to 2003 field work questions
Table 7-1. Resolution Changes for Sheet 093G.066. The accuracy values for the highlighted entries are tested in Table 7-2
Table 7-2. Overall and K accuracies for low, medium and high-resolution data sets
Table 7-3. Overall and K accuracies for different training sets, with all other variables remaining equal
Table 7-4. A previous run of the analysis presented in Table 7-3 with a different variable set showed a clear peak in K accuracy at 1250 training rows
Table 7-5. Overall and K accuracies for different sets of morphometric variables, with all other parameters remaining equal
Table 7-6. Effect on Accuracy of Alterations to the Base Vigilance Level
Table 7-7. Effects of learning rate changes on classification accuracy
Table 7-8. Results for classification of map sheets 093G.066 and 093G.067 using a single run, combined results from 3 votes, and combined results from 5 votes
Table 7-9. Changes in classification accuracies with majority filters
Table 7-10. Classification accuracies for eskers. Note: these values are based on the classification of map sheets 093G.056, 093G.057 and 093G.066 with the topographic filter (0.5 threshold)
Table 7-11. Classification accuracies for kames (topographic filter used, threshold = 0.99). In this case, the low K accuracy is the result of making a comparison where no kames are expected
Table 7-12. Classification accuracies for drumlins (topographic filter, threshold = 0.4). Notice the high K value relative to eskers and kames
Table B-1. Sample Error Matrix for three classes (X, Y, and Z) and the formula for the calculation of Overall Accuracy (based on Congalton, 1991)

List of Figures

Figure 1-1. General project outline, showing the eight steps involved in this project
Figure 1-2. A wire frame view of a NetTIN showing triangle boundaries (white) and the stream network (blue)
Figure 3-1. Direction of ice flow during the Fraser Glaciation (Tipper, 1971)
Figure 3-2. An esker train located south of Highway 16. In the background, parallel eskers are visible (arrows). The esker in the foreground has been mined for gravel, and shows the fine material characteristic of these landforms (photo: B. Maguire)
Figure 3-3. Karstic and drumlin-like landforms. An example of drumlin-shaped karst formations (a). Note that the long axes of the features in the contour map at bottom are not aligned (Williams, 1972, p. 149). Long profiles of glacially sculpted hills show a continuum of form from roches moutonnées through to uncored drumlins (b). A and B - roches moutonnées, C and D - crag and tail, E - rock drumlin, F - uncored drumlin (Davies, 1969, p. 172)
Figure 3-4. Unnamed flyggberg on horizon 12 km south of Mt. Baldy Hughes (photo: B. Maguire)
Figure 3-5. Absolute and Relative Accuracy in Digital Terrain Models
Figure 3-6. A 3x3 moving window (tan) moves across a DEM (white grid) in an orderly pattern. A value calculated from the nine cells in the DEM is assigned to the centre cell (dark green) of a new DEM. The new value could be based on any operation, including calculation of the mean, to effect smoothing, or a directional filter, to emphasize certain features
Figure 3-7. Two photographs taken from the same point on the crest of an east-west running esker. The north slope (a) features luxuriant growth and high understory, while the south slope (b) has low shrubs and open meadows (photos: B. Maguire)
Figure 3-8. Local convexity can be defined based on the slope and aspect values downslope, left, upslope and right from every triangle in a NetTIN (Maguire, 2003a)
Figure 3-9. Small drumlins are visible at 1 m resolution (a) but are invisible at 25 m resolution (b)
Figure 3-10. TRIM data does not always identify all the eskers in an esker train. The two eskers at left are from TRIM data (light blue) and the five additional eskers at right are from a GPS survey conducted in 2004 (black)
Figure 4-1. Point, Voronoi and Triangular Data Structures. In (a), each point has an X, Y and Z coordinate, but the slope needs to be dynamically calculated for each point. In (b), the X, Y, and Z coordinates of the points are preserved, and the mean slope can be calculated for the area nearest to each point. In (c) the X, Y, and Z coordinates are still preserved, and the mean slope is calculated for the triangular faces that are created between the points
Figure 4-2. Analogous neighbours of a triangle in a TIN (a) and of a pixel in a regularly gridded DEM (b). Note: Although the TIN neighbourhood in Figure 2 is shown as a square area for clarity, in practice the TIN should be created from points of significant elevation and breaks of slope, not regularly spaced points. See Table 4-2 for an explanation of the colours and numbers used
Figure 4-3. Triangle 162 is constructed using a quad-edge data structure. Edges 38, 54 and 86 form the edges of the triangle. Edge 54 has a from-node (F - 82), a to-node (T - 154), as well as a left (L - 32) and right (R - 162) polygon (measured when facing the to-node). Dashed lines show the edges of adjoining triangles (based on Guibas and Stolfi, 1985)
Figure 4-4. Horizontal convexity is measured along the slope, with "left" and "right" triangles being compared, and vertical convexity is measured along the line of steepest descent, with "up" and "down" triangles being compared. Here (a) and (d) are concave and (b) and (c) are convex
Figure 4-5. Horizontal Convexity. The triangles shown are viewed from above. A convex surface with a horizontal convexity of 30° (a) and a concave surface with a horizontal convexity of -30° (b). Illumination is from the right
Figure 4-6. How eigenvectors characterize the distribution of data. The first eigenvector (a) shows the trend for a majority of the data. The second eigenvector (b) shows the trend for much of what remains; and the third eigenvector (c) shows the trend for the data points that were not well explained by the first or second. Eigenvectors are commonly created in multi-dimensional space, with one dimension for each variable in the PCA (after Shaw and Wheeler, 1985)
Figure 4-7. Contours and water features for map sheet 093G.066. Index contours are thick dark grey lines, and streams are dark blue. Lakes and swamps are light blue filled areas
Figure 4-8. Hillshaded views of 093G.066 showing landforms identified during training: drumlins (a), eskers (b), the kame (c), and an isometric view of map sheet from the southeast (d). Illumination is from the southwest; vertical exaggeration is 7x
Figure 5-1. The location of the Study Area in British Columbia, Canada
Figure 5-2. The 2003 team on our first day of fieldwork, June 23, 2003. Steven Janssens is at the left and Brad Maguire is at the right (photo: B. Maguire)
Figure 5-3. The 2004 field crew: Steven Janssens (left), Richard Bader (middle), and Brad Maguire (right) (photo: B. Maguire)
Figure 5-4. Richard Bader shows off his GPS mapping and bug swatting skills on top of a drumlin (a). Tracks recorded using the GPS when measuring the crest and boundaries of this drumlin (b). Note: the Z values on the track come from the GPS; this is not the result of data being draped onto the NetTIN. The GPS values were elevated 20 m above the surface for clarity in this illustration (photo: B. Maguire)
Figure 5-5. Returning to our vehicle, which could not proceed further because of deep mud in a gulley (photo: R. Bader)
Figure 5-6. GPS Tracks for Esker at A5 showing differential height of east slope versus west slope. East slope - mean elevation difference between the two highlighted tracks (thick red line) is 0.54 m (a). West slope - mean elevation difference between the highlighted tracks (thick red line) is 3.39 m (b). Location on map sheet 093G.056 (red section, marked with arrow) (c)
Figure 5-7. Esker examined using GPS receiver on map sheet 093G.056. Sections drawn with a thick red line were used to obtain mean elevation (a), and (b) indicates the location of the esker on map sheet 093G.056 (black arrow)
Figure 5-8. We measured the esker on map sheet 093G.066 at two locations. The section shown with the thick red line in (a) had a mean height of 3.63 m, and the section shown with the thick red line in (b) had a mean height of 3.75 m. The location of the esker is shown by the arrow in (c)
Figure 6-1. Overview of software modules and the flow of data through them. The Principal Component Analysis module and the Artificial Neural Network will use external software products
Figure 6-2. The Morphometrics Module Display. Note the progress bar, which allows the user to monitor the progress of morphometric variable generation, which can be time consuming
Figure 6-3. The Morphometric Sets Module User Interface (left), and the editing window (right), with which morphometric variables can be selected
Figure 6-4. Training of landforms on a hillshaded view of a NetTIN. Here drumlins (dark blue areas) are being trained by selecting triangles on a NetTIN. Background information draped onto the NetTIN includes contours (dark grey lines), roads (red lines), streams (dark blue lines) and swamps (light green lines)
Figure 6-5. The Lesson Database Module User Interface (left) and the editing window (right), from which training sets are transferred to and from the Landform Trainer
Figure 6-6. Window showing Error Matrix in Landform Classification System. Overall, K, User's and Producer's Accuracy values are also displayed
Figure 6-7. Three-dimensional map view of classified results. Sheet 093G.066 (left half) was used for training and sheet 093G.067 (right half) was used to visually examine which areas were classified. The operator can rapidly rotate the image in any plane to obtain a better understanding of where on the landforms classified triangles were placed
Figure 7-1. Map sheets 093G.066 (left) and 093G.067 (right) showing differences in classification accuracy. The ANN classified sheet 093G.066 perfectly because the results were memorized, but sheet 093G.067 is only sparsely classified, missing entire drumlins on the west side of sheet 093G.067
Figure 7-2. Graphs of accuracy and training time changes for various learning rates
Figure 7-3. Effect of filters on a section of map sheet 093G.056. The raw data (a) has many disaggregated triangles; the majority_filter_thin (b) produces the best accuracy improvements by removing a majority of the lone triangles and doing some aggregation. The majority_filter_thick (c) removes fewer triangles and aggregates more of them to produce a better looking, but less accurate result than majority_filter_thin. The topographic filter (d) takes a majority within regions surrounding ridgelines, and produces the most accurate and best looking result
Figure 7-4. Final classification results for eskers. Unfiltered ANN output (a), filtered output (b) and expected eskers (c) for map sheets 093G.056, 093G.057 and 093G.066
Figure 7-5. Close-up hillshaded views of previously unidentified potential eskers. Isometric view at UTM coordinates 510,758E, 5,928,872N (a). Isometric view at UTM coordinates 510,855E, 5,938,199N (b). Plan view at 511,195E, 5,932,660N (c). Isometric view at 508,299E, 5,931,290N (d). Isometric view at 512,024E, 5,929,368N (e). Isometric view at 510,357E, 5,935,201N (f). All coordinates are UTM Zone 10, NAD 83
Figure 7-6. Final classification results for kames. Unfiltered (a), filtered (b) and expected kames (c) for map sheets 093G.056, 093G.057 and 093G.067
Figure 7-7. Final classification results for drumlins: initial, unfiltered results from the ANN with unclassified zone circled (a), filtered results (b), expected results for map sheets 093G.056, 093G.057, and 093G.067 (c)
Figure 8-1. The sum of the triangle orientations in a drumlin (dashed line, right) is roughly equivalent to the direction of ice flow. The resultant of the vector addition (red dashed line, left) shows the direction of ice flow

Acknowledgements

This project could not have been completed without contributions of time and money from the following people and organizations. Special thanks are extended to:

Brenda Maguire and Jocelyn Maguire
My wife and daughter, who deserve a better life

Viola Maguire and Watty Maguire
My parents, who encouraged excellence but didn't get to see the completion of this project

Steven Janssens and Richard Bader
My field crew, who put up with unreasonable demands in difficult conditions

Scott Martin, Dave Hawkins, Gerry Furseth, and Byron Berglund
The staff at Facet Decision Systems, who provided advice on programming

Dr. Brian Klinkenberg, Dr. Marwan Hassan, and Dr. Michael McAllister
My committee members, who provided valuable advice and helped me to make some important breakthroughs

The Natural Sciences and Engineering Research Council of Canada
Funding for this project was provided by NSERC, through an Industrial Postgraduate Scholarship.

1 Introduction

In the land of the blind, the one-eyed man is king.
- Desiderius Erasmus, 1466-1536

1.1 The Problem

Identifying landforms over large areas is a difficult and time-consuming task.
Traditional methods of identifying landforms, including airphoto interpretation and field studies, are limited by the speed of the person performing the interpretation. In practice, this has meant that few Geographical Information Systems contain a landform inventory, which would be useful to people creating resource inventories and performing fundamental research.

1.2 Objective

In this thesis, I will design a system that can perform a general landform classification and will identify landforms by their shape. Once a number of landforms of a particular type have been identified, these can be used to teach the system how to identify other landforms of that type. With this technique, the user needs to know little about the landform being trained, other than its shape. The ultimate goal is to accurately classify large areas in order to produce a landform inventory.

Recently, work within the Geographical Information Systems (GIS) community has shown the potential to replace time-consuming traditional techniques. Researchers have shown that it is possible to automatically determine landscape units from Digital Elevation Models (DEMs). Unfortunately, they have not been able to perform classifications over large areas, and the landscape units identified are quite large and do not correspond with the typology used by geomorphologists.

A type of DTM called a Network-Integrated Triangulated Irregular Network (NetTIN) was used to facilitate the classification of eskers, kames and drumlins within three 1:20,000 BC Terrain Resource Information Management (TRIM) map sheets. This will form the foundation of a commercially available service, which can make use of existing, wide-coverage DTM data such as Shuttle Radar Topography Mission (SRTM) and TRIM, rather than custom-collected high-resolution DEMs.

Fortunately, this project is not venturing entirely into "terra incognita." To date, other authors have performed an enormous amount of work on subjects related to this thesis.
Morphometrics, the study of the shape of surfaces, now has a bibliography with over 6000 entries (Pike, 2002). Add to this the other disciplines that are touched by this thesis, such as Artificial Intelligence, Statistics, and Geographical Information Systems, and there is a huge amount of research from which important lessons can be learned. This project is also a sister project to Facet Watersheds Data, a project to automatically delineate watersheds and calculate related hydrological and landscape information from a DTM (Facet, 2002).

The Landform Classification System (LCS) will be tested on eskers, kames and drumlins in a study area 25 km southwest of Prince George, British Columbia. In contrast with previous work, I will increase both the size of the study area, to about 420 km², and the resolution of the classification in this thesis. I have chosen eskers and kames, which are both glaciofluvial landforms, because of their value to the sand and gravel industry as sources of sand, gravel and aggregate. I have included drumlins as a contrasting landform to test the classification capabilities of the system developed. Eskers are long, sinuous ridges, kames are small conical hills, and drumlins are large, teardrop-shaped hills. The differences in shape between these three landforms will be used to test the capabilities of the Landform Classification System.

1.3 Landforms, Context, and Empiricism

What is a landform? This seems like such a simple question that it hardly needs answering. Very few books on landforms actually include a definition of what a landform is. However, when you start to study the issue, the answer is not so clear-cut. H.F. Garner describes a landform as "a discrete shape developed over an area of lithosphere" and a landscape as "an assemblage of landforms" (Garner, 1974).
This leads me to question whether every single point on the surface of the Earth belongs to some type of landform. If there is no discrete shape to recognize, is there no landform?

A more positivist definition starts to clear up some of the confusion. Audrey Clark provides this definition: "The shape, form, nature of a specific physical feature of the earth's surface (e.g. a hill, a plateau) produced by the natural processes of denudation and deposition (including weathering, glaciation etc.) and by tectonic processes" (Clark, 1985). Therefore, a landform is not merely a discrete shape; it is a discrete shape with a history. Given that every point on the Earth's surface has a distinct geological history, it is probably safe to say that every point, even on a featureless plain, belongs to some landform.

What is missing from these definitions is any mention of spatial context. Although drumlins resemble the hills in cockpit karst in both shape and size, knowledge of their location provides the key to differentiating them, since cockpit karst occurs in the tropics and drumlins occur only in regions that were glaciated. Furthermore, when we examine the spatial context of the landforms, we realize that the drumlins are oriented uniformly but the hills in cockpit karst are not (see Figure 3-3a).

Although geomorphologists are taught to recognize landforms by their idealized shape, in reality, landform shape is the product of the complex geological history of an area, and may not fit the textbook definition. Furthermore, erosional processes since the creation of the landform may have altered its appearance significantly. These considerations are important when we attempt to create a system to automatically identify landforms. The system that I am building is limited to empirical analysis; it has no theoretical background on which to make a decision.
It cannot examine the geological history of an area, but must rely solely on the diversity of landform shapes and the context in which these exist. In this thesis, I will examine whether shape and context are sufficient to provide an accurate classification of landforms. The system should be able to distinguish all landforms in an area from each other, if the landforms are sufficiently large to be resolved on the NetTIN.

1.4 General Project Outline

This project consists of eight steps (Figure 1-1). The study area is composed of two unequal parts: a smaller training area and a larger testing area. In the system, a number of Network-Integrated Triangulated Irregular Networks (NetTINs) will represent the topography of the study area. From these, variables describing the land surface will be calculated. I will make use of statistical techniques to select those variables that are significant for the identification of the selected eskers, kames and drumlins. Using these variables, I will teach an Artificial Neural Network (ANN) how to classify landforms in the training area, and will evaluate the classification accuracy for the testing area.

[Figure 1-1. General project outline, showing the eight steps involved in this project. Flowchart boxes: Determine Study Area; Acquire Data; Build Digital Terrain Model; Calculate Landscape Variables; Statistical Analysis of Variables; Artificial Neural Network; Testing of Classification; Training Data.]

1.5 Nomenclature

In this thesis, I have attempted to refrain from lapsing into jargon. Nevertheless, this being a multidisciplinary technical study, I have used a great number of acronyms, and some of the discussions of particular techniques may be quite complex. In this section, I will attempt to provide an overview of some of the technologies and terms that I use in this thesis. Definitions that are more specific can be found in the Glossary (Appendix A).
To represent the land surface in the study area, I created four NetTINs. A NetTIN is a type of Digital Terrain Model (DTM) that combines a stream network (the network, or "net") with a representation of the land surface (a TIN) (Evans et al., 1996) (Figure 1-2). TINs represent the land surface using a series of irregular triangles, each of which is composed of three measured points having elevation. TINs allow for variable data resolution, permitting the accurate representation of complex terrain, while not wasting data storage in places where terrain is less complex. The addition of the stream network allows the NetTIN to represent both the land surface and the hydrology of the area that is represented. In the LCS, I use the stream network in the calculation of those morphometric variables that examine streams, such as source density and drainage density.

[Figure 1-2. A wire frame view of a NetTIN showing triangle boundaries (white) and the stream network (blue).]

Although Evans et al. (1996) designed this data structure for hydrological applications, it allows for better representations of landforms than do Digital Elevation Models (DEMs) constructed from the same source data. DEMs are the most commonly used type of Digital Terrain Model, and represent the land surface as a regular grid of elevation points. In the NetTIN, certain morphometric variables, such as slope, aspect and elevation, are implicitly stored in the data structure. In addition, it is also easy to extract ridges, valleys, peaks and depressions from the NetTIN because they are also implicitly stored. Every line in the NetTIN separates two triangles. If both triangles slope away from the line, then the line forms a ridge. If both triangles slope towards the line, it forms a valley.
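The ridge and valley rule described above can be sketched in a few lines. This is a minimal illustration only, not the LCS or NetTIN implementation: the function names (`triangle_normal`, `slope_aspect`, `classify_edge`) are hypothetical, points are assumed to be `(x, y, z)` tuples with x pointing east and y pointing north, and triangles are assumed to be non-degenerate.

```python
import math

def triangle_normal(p0, p1, p2):
    """Upward-pointing unit normal of a triangle given three (x, y, z) points."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    if nz < 0:  # flip so the normal always points up
        nx, ny, nz = -nx, -ny, -nz
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return nx / length, ny / length, nz / length

def slope_aspect(normal):
    """Slope (degrees from horizontal) and aspect (compass bearing of the
    downslope direction, degrees clockwise from north). The aspect of a
    perfectly flat triangle is undefined and is reported as 0 here."""
    nx, ny, nz = normal
    slope = math.degrees(math.acos(min(1.0, nz)))
    aspect = math.degrees(math.atan2(nx, ny)) % 360.0
    return slope, aspect

def _descent_away_from_edge(edge_start, edge_end, opposite):
    """Positive if the triangle (edge_start, edge_end, opposite) descends as
    one moves away from the edge, negative if it descends towards the edge."""
    nx, ny, _ = triangle_normal(edge_start, edge_end, opposite)
    ex, ey = edge_end[0] - edge_start[0], edge_end[1] - edge_start[1]
    px, py = -ey, ex  # horizontal perpendicular to the edge
    ox, oy = opposite[0] - edge_start[0], opposite[1] - edge_start[1]
    if px * ox + py * oy < 0:  # make it point into the triangle
        px, py = -px, -py
    # (nx, ny) is the horizontal downslope direction of the triangle
    return nx * px + ny * py

def classify_edge(edge_start, edge_end, opposite_a, opposite_b, eps=1e-9):
    """Classify the edge shared by two triangles as 'ridge', 'valley' or 'neither'."""
    a = _descent_away_from_edge(edge_start, edge_end, opposite_a)
    b = _descent_away_from_edge(edge_start, edge_end, opposite_b)
    if a > eps and b > eps:
        return "ridge"    # both triangles slope away from the line
    if a < -eps and b < -eps:
        return "valley"   # both triangles slope towards the line
    return "neither"
```

For example, a horizontal edge at 10 m elevation whose two neighbouring triangles fall to 5 m on either side is classified as a ridge, and the western triangle has a slope of 45° and an aspect of 270°.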
Similarly, by examining the elevations of all vertices that connect to a particular vertex of interest, we can determine whether that vertex has the highest elevation (a peak) or the lowest elevation (a depression) in the group. When using DEMs, only the elevation can be extracted directly from the data structure; all other features and variables must be calculated.

I created my NetTINs with British Columbia Terrain Resource Information Management (TRIM) data (Ministry of Sustainable Resource Management, 2003) that was provided by Geographic Data BC. TRIM data distributions contain three data sets: vector line data, a set of mass points and a set of breaklines. The system uses the mass points and breaklines from the TRIM data for the construction of the NetTIN. Mass points have X, Y, and Z values and were collected at significant points on the map surface. Breaklines have X, Y and Z values for each of their points. These represent significant changes ("breaks") in slope, such as ridges, valley bottoms and lines of inflection.

The system uses the NetTINs for the calculation of morphometric variables, which describe the shape of the land surface. Measures such as elevation, slope, aspect, plan and profile convexity are among many morphometric variables that researchers have proposed. Technically, these should be called geomorphometric variables, since they describe landforms, as opposed to variables that describe the surface of a telescope mirror, the shape of sedimentary particles or the body form of an animal (Evans, 1972). In this thesis, however, I assume that all morphometric variables described pertain to the surface of the Earth.

Investigators have proposed several dozen morphometric variables over the past 40 years. However, not all of these are useful for the identification of eskers, kames and drumlins.
Fortunately, Principal Component Analysis (PCA) is available to help determine which morphometric variables are most relevant to our analysis. I can use this statistical technique to analyze the amount of variation present in my set of morphometric variables and to determine the smallest set of variables that represent my landforms. Based on the training data for eskers, kames and drumlins, the chosen morphometric variables are classified. Many different statistical and non-statistical classifiers are available to perform this kind of work. Statistical classifiers, such as the Maximum Likelihood (ML) classifier, assume that the data is distributed normally, which is clearly undesirable when we are attempting to rapidly classify many data sets from different areas. Two non-statistical classifiers, which avoid this problem, are Artificial Neural Networks (ANNs) and Fuzzy Logic. ANNs are simple models of the human brain that duplicate some of its learning and generalization abilities. Researchers have developed many different types of ANNs since the first one was created in 1958 (Openshaw & Openshaw, 1997). Fuzzy Logic uses a series of rules to produce a classification. Unlike Boolean Logic, with which most people are familiar, Fuzzy Logic allows partial membership in a number of sets. For example, using Boolean Logic, a Sport Utility Vehicle must be either a truck or a car. With Fuzzy Logic, we can say that the SUV can belong 60 percent to the class "truck" and 40 percent to the class "car." By combining a series of membership functions, a Fuzzy Logic system can classify landforms (MacMillan et al., 2000). In this thesis, the advantages of ANNs and Fuzzy Logic are combined using a newer type of ANN known as Fuzzy ARTMap (Carpenter and Grossberg, 1992). Fuzzy ARTMap overcomes many of the limitations of older ANNs, and seems to be an excellent choice for the classification of landforms. 
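The difference between Boolean and fuzzy set membership can be illustrated with a short Python sketch. It is a minimal example and not the Fuzzy ARTMap algorithm itself: the helper names (`ramp`, `fuzzy_and`, `fuzzy_or`, `fuzzy_not`, `boolean_label`) are hypothetical, the minimum and maximum operators are the standard fuzzy AND and OR, and the only values taken from the text are the 60 percent and 40 percent memberships of the SUV example.

```python
def ramp(x, low, high):
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (x - low) / (high - low)))


def fuzzy_and(*memberships):
    """Fuzzy intersection: the weakest membership limits the result."""
    return min(memberships)


def fuzzy_or(*memberships):
    """Fuzzy union: the strongest membership determines the result."""
    return max(memberships)


def fuzzy_not(membership):
    """Fuzzy complement."""
    return 1.0 - membership


def boolean_label(memberships):
    """Collapse partial memberships to one crisp class, as Boolean Logic must."""
    return max(memberships, key=memberships.get)


# The SUV example: partial membership in two classes at once.
suv = {"truck": 0.6, "car": 0.4}
print(boolean_label(suv))                   # crisp answer loses the 40 percent
print(fuzzy_and(suv["truck"], suv["car"]))  # degree to which it is both
```

A rule-based landform classifier would combine several such membership functions, one per morphometric variable; the breakpoints passed to `ramp` would have to come from training data or expert judgement and are not specified here.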
One key characteristic of Fuzzy ARTMap is that the ANN can learn new landforms incrementally, allowing the system to construct a Landform Geodatabase step by step. After training, the classifications produced by the Fuzzy ARTMap ANN used in this project need to be tested against the landforms that were manually identified on the NetTIN. To do this, I have adopted the Error Matrix from Remote Sensing studies (Story & Congalton, 1986). An Error Matrix is a tabular comparison of expected and observed pixel values, from which the system can calculate classification accuracies. In the Landform Classification System, triangles take the place of pixels.

1.6 Constraints and Assumptions

To limit the scope of this thesis to something that is manageable, I will only classify eskers, kames and drumlins. Because I used PCA to eliminate morphometric variables that are not significant to the classification of these three landforms, I would have to run it again if other landforms were later included in the classification. Fortunately, since Fuzzy ARTMap allows for incremental learning, it will not be necessary to retrain the ANN from scratch every time a landform is added.

One objective of this project is to see whether the system can identify landforms on medium-resolution NetTINs. This allows the wide application of this process, using nationwide or province-wide data sets. With one exception, no other data sets will be employed, because they tend to have limited spatial coverage, and will restrict the areas where the LCS can be used. The only other data that I will add to the TRIM data will come from supplementary field studies conducted during the summers of 2003 and 2004.
The purpose of these studies was to get a general feel for the landforms in the study area, to clarify any ambiguous landforms that are seen in the NetTINs, and to ensure that the inventory of landforms identified from the NetTINs is complete. The field studies will help to ensure that the NetTIN identifies all landforms that we can see from the ground. While performing fieldwork, we used Global Positioning System (GPS) receivers to help identify and locate the landforms in question.

Limiting the LCS to analyzing medium-resolution data may unfortunately result in the LCS being unable to identify smaller landforms. However, the advantage of this approach is that the system will have the potential to classify large areas of the Earth's surface. To be able to take advantage of this, the LCS must be able to process data rapidly. Since I designed the LCS with eventual commercialization in mind, I need a system that operates quickly and is scalable.

One assumption in this thesis is that I can visually identify the landforms on the DTM, since it is necessary to see the landforms on the DTM in order to create training data for the ANN. For most drumlins, this is quite easy because of their relatively large size. Smaller eskers, however, which exist at the limits of what can be resolved using TRIM data, may not appear on the DTM. Differentiating between kames and smaller, more rounded drumlins, and between the smaller, more elongated drumlins and the largest eskers may be difficult.

Even if we have a good set of landforms with which to train the ANN, there will always be some landforms that the ANN will not be able to identify. Each landform has a unique history and is the product of a unique set of conditions. For example, if a drumlin is located on top of a strike-slip fault, there may now be a lateral offset between its two sides because of fault movement. Such a drumlin would have a different shape from its close neighbours that are not located on top of the fault.
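The Error Matrix comparison adopted for accuracy assessment, with triangles in place of pixels, can be sketched as follows. This is a minimal illustration under assumptions, not the LCS code: the function names, the class labels and the eight sample triangles are hypothetical. Overall accuracy is the proportion of triangles on the matrix diagonal, and the Kappa statistic corrects that proportion for the agreement expected by chance from the row and column totals.

```python
from collections import Counter


def error_matrix(reference, predicted, classes):
    """Rows are reference (manually identified) classes, columns are predictions."""
    counts = Counter(zip(reference, predicted))
    return [[counts[(r, p)] for p in classes] for r in classes]


def overall_accuracy(matrix):
    """Proportion of triangles whose predicted class matches the reference."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's Kappa: agreement beyond what chance alone would produce."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: eight triangles, two classes.
classes = ["drumlin", "other"]
reference = ["drumlin"] * 3 + ["other"] * 5
predicted = ["drumlin", "drumlin", "other", "drumlin"] + ["other"] * 4
matrix = error_matrix(reference, predicted, classes)
print(matrix, overall_accuracy(matrix), kappa(matrix))
```

In this small example six of eight triangles agree, but more than half of that agreement would be expected by chance, so the Kappa value is well below the overall accuracy.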
1.7 Summary

The Landform Classification System can help geomorphologists to rapidly assess the landscape over large areas. A complete inventory of the landforms in an area helps to identify the dominant landforms, and serves to identify both those landforms that have yet to be identified and those that have already been mapped. This chapter was a general overview of my project to automatically identify landforms. I discussed the eight major steps in the project (Figure 1-1) and provided a brief overview of the GIS, statistical and Artificial Intelligence techniques used in this project. Finally, to round out this discussion of the scope of this thesis, I discussed some of the inherent assumptions of my approach.

Despite all of the previous work that has gone into landscape analysis, and the comforting presence of Facet Watersheds Data, the research presented in this thesis is inherently risky and carries a high chance of failure. In order to reduce the risk, I have conducted an extensive literature review. Much research has been published on topics related to the automated identification of landforms; by critically examining this research, I hope to avoid some of the pitfalls encountered by other authors.

1.8 Thesis Structure

Chapter 2 is a discussion of how a system to automatically identify landforms might be applied. In Chapter 3, I present the results of the literature review, placing particular emphasis on understanding those issues that have the most potential to cause the failure of this project. Chapter 4 is a discussion of the morphometric variables used in this project, followed by the results of the Principal Component Analysis. Chapter 5 is a brief discussion of the fieldwork that was conducted during the summers of 2003 and 2004. I used the literature review to help me produce the software design for this thesis, which is presented in Chapter 6.
The software design helps to ensure that a project with many components remains a consistent whole. In Chapter 7, I present the project results, and in Chapter 8, I discuss the results and their implications for future research.

2 Potential Applications

We are pioneers setting out to explore a new country. We have the thrill of ever-changing view, now and again we reach a ridge or summit which opens up new and unexpected vistas - of necessity our point of view must continually change.
- Sir James Jeans, 1877-1946

Knowledge of landforms is critical in many areas, including topographic mapping, construction, agriculture, military operations and certain types of mineral prospecting. Instead of beginning a study with extensive airphoto interpretation or an expensive and difficult field survey, geomorphologists could make use of a completed "Landform Geodatabase" to immediately determine the location and extent of all landforms of a particular type.

Dikau (1989) recognized the capabilities of GIS as an obvious component of any system to automatically classify landforms, even when it was unclear how the rest of the analysis would be carried out: "It is premature to predict to what extent a completely automated relief form analysis can be achieved applying the methods discussed. It can be said, however, that the methods provided by GIS are well suited for processing these problems" (Dikau, 1989, p. 56).

A simple GIS query would allow the user to shortlist areas of interest within seconds. Such a shortlist would be much less limited than the results of airphoto interpretation, whose difficulty and cost limit the number of areas that can be examined, and preliminary field surveys, which are even more limited by labour, road availability and helicopter time.
Information on the land surface has already been collected by the agencies that create medium-scale topographic maps; however, the landforms are implicit, and need to be extracted from the map data. With appropriate analysis techniques, landform types and extents should not have to be collected again. In this sense, this project performs "data mining": hidden information in a database is revealed with the appropriate analysis techniques. Furthermore, because these data have been collected by governmental agencies, whose goal is to create a uniform representation over their jurisdiction, they are not biased in the same way as airphoto interpretation studies or field surveys. Thus, it is likely that a Landform Geodatabase will help to locate unexpected landforms.

2.1 Topographic Mapping

Perhaps the most obvious application of a system for automated identification of landforms is the creation of new layers of topographic data for GIS. Currently a cartographer, geomorphologist or geologist is required to identify landforms before mapping can begin. However, when confronted by hundreds of features needing identification, a person may require weeks or years to do the job. Increasing the data content of digital topographic maps to create a Landform Geodatabase is the key goal of this project, so that it might be possible in the future to answer questions such as "How many cirques are there in the Rocky Mountains?" with some degree of accuracy.

During the summers of 2003 and 2004, field surveys showed that many of the eskers shown in the TRIM data were in fact members of esker trains, and had unmapped companions. Even with recently collected 1:20,000 map data, it is clear that much topographic information is missing; kames and drumlins are not explicitly identified on the TRIM sheets.
The potential of the LCS to intelligently interpret the surface of the Earth and compensate for map classification errors and omissions is obvious. Having a more complete understanding of the topography and landforms in an area is valuable to geoscientists. Field studies of particular landforms are much easier to plan without having to do extensive airphoto interpretation studies in order to identify all landforms of a particular type. This capability may even have application on the Moon or Mars (NASA, 2003). Being able to identify landforms of interest near landing sites may be a particularly useful contribution to the planning of future manned and robotic space missions. A completed Landform Geodatabase will ensure that an optimally small study area containing all the features of interest can be identified, which has obvious value where resources are limited and scientific return must be maximized, whether on the Earth or in outer space. The Landform Geodatabase would include landforms that present hazards to people and property, enabling monitoring programs to be more thorough. Such landforms include unidentified dormant volcanoes, geological faults and areas prone to landslides. Areas known by geologists to contain geological hazards or desired landforms can also be compared with other areas to look for similarities (Pike, 1988b). 2.2 Construction Civil engineers can also make use of data from a Landform Geodatabase. As a first stage in routing highways, railroads and pipelines, geologists and engineers can use classified landform information to help plan preliminary routes. This can be particularly important in mountainous areas, since a classified DTM can be used to first identify all mountain passes in particular areas and then to rank them according to elevation and accessibility. The routing can then be followed up by a more detailed stage of analysis, in which landforms that obstruct construction are identified. 
In addition, the Landform Geodatabase can be used as a catalogue of the locations of small sand and gravel deposits (borrow pits) along proposed transportation corridors. These are critical for the economical construction of new transportation routes (see Section 2.6, page 10).

2.3 Agriculture and Forestry

Since drainage characteristics and soil development are related to landforms, the Landform Geodatabase will contain important data for soil scientists. Because landforms are important predictors of soil types, they can be used to augment soil survey maps (MacMillan & Pettapiece, 2000). Knowledge of the location and extent of landforms will help foresters and agrologists to assess the effectiveness of particular areas for forestry or agriculture.

2.4 Military Operations

Hart (1986) identifies five areas in which geomorphological knowledge can assist in military operations. These include:
1. Locations of choke points, and areas of vantage and refuge.
2. Suitability of terrain for cross-country movement.
3. Suitability of terrain for establishing temporary or permanent military facilities.
4. Identification of natural hazards (such as areas susceptible to flooding).
5. Availability of raw materials (sand and gravel).

The planning of troop movements can be assisted by identifying high points of land to serve as vantage points and defensible positions, and places that are protected from enemy fire as places of refuge. Identifying "choke points," where troops are vulnerable to attack, is another variation on this theme. Routing of vehicle movements, including trucks, aircraft and cruise missiles, can be assisted using the same principles (Mitchell, 1991). For off-road travel, we can estimate the "trafficability" (the ability of the land to support cross-country travel) of an area from the landforms present.
Many factors affect the ability of soldiers or off-road vehicles to pass over terrain, including roughness and soil firmness (Mitchell, 1991). For military vehicle and troop movements, movement of construction vehicles prior to road construction, and even for planning robotic and human missions on other planets, trafficability analysis is an important aspect of landform identification (Pike, 1988b).

2.5 Mineral Prospecting

Although the Landform Classification System is concerned with the surface of the Earth, it can also play a role in helping to prospect for minerals, since the Earth's surface sometimes provides clues about what is underneath it. As Hart (1986) states, "... mineral resources are often related directly to relief" (p. 177). He goes on to describe how geomorphological knowledge can be used to localize three types of mineral resources. Moving water transports and concentrates gold, diamonds, tin (Hart, 1986), platinum and copper (Mitchell, 1991) into placer deposits. Enriched copper, limonite (iron ore), manganese, bauxite (aluminium ore), cobalt, kaolin (Hart, 1986), uranium, and phosphorus (Mitchell, 1991) are examples of weathering products, which are the result of leaching and/or precipitation in water. Basin deposits, which form in natural depressions, include minerals such as coal and iron ore. In addition to Hart's list, we can include minerals that are concentrated in areas of internal drainage, such as gypsum, salt, potash, and peat.

Mitchell (1991) gives clues about how a Landform Geodatabase could assist oil prospecting. He describes the accumulation of petroleum, minerals and water in traps (formed by anticlines in impervious sedimentary rocks) and aulacogens (failed rift valleys). On the surface, anticlines form cuestas (parallel hills with one side steeper than the other) when the layers have been eroded.
In addition, Mitchell describes how metals can be associated with particular landforms, including volcanic dykes and meteor craters, in which many of the original minerals remain from the meteor.

2.6 Sand and Gravel Prospecting

Sand and gravel, although technically mineral resources, have some unique characteristics that make them different from other mineral resources. Unlike many other resources, industry consumes tonnes of sand and gravel, even for relatively small projects. Accordingly, the cost per tonne for extraction and transportation of gravel is critically important. Leahy noted, "transporting aggregate 20 miles by truck can as much as double its cost" (Leahy, 1998, p. 1). Thus, construction projects require local sources of sand and gravel. Eliminating much of the time required for airphoto interpretation and geological evaluation can help to find new gravel sources quickly and reduce the cost to develop them. Prospecting for sand and gravel is possibly the most important potential application of the Landform Classification System.

Hart (1986) describes the importance of sand and gravel: "in terms of tonnages, sand and gravel are the two most important materials extracted from the earth" (p. 177). In 1998, Leahy reinforced this point in a press release stressing the importance of sand and gravel to the economy of the United States. The construction industry uses enormous amounts of sand and gravel every year in the construction of roads and the manufacture of concrete. In the press release, Leahy states, "Much of the nation's infrastructure built during the 1950's and 1960's has deteriorated, and in many areas of rapid population growth the infrastructure is inadequate and new roads, streets, and sewage systems must be built... 'as urban areas expand, local sources of these resources are becoming less accessible'" (Leahy, 1998, p. 1).
Bennett & Glasser (1996) state that the UK consumes 280 million tonnes of aggregate, sand and gravel per year. Of this, only 39% comes from natural sources; crushing stone produces the remaining 61% of the supply. The shortage of natural sand and gravel even extends into Canada. Culley (1973) notes:

"In many areas of Saskatchewan, known gravel deposits have been depleted or are diminishing rapidly; in other areas, particularly in the North, demands for all-weather highways and access roads are far exceeding the supply of gravel sources with which to surface them. These two problems have made conventional methods of air-photo and ground surveys insufficient for gravel location" (p. 70).

Hora (1988) examined the gravel reserves at commercial gravel pits in different parts of British Columbia. The projected lifespans for commercial pits (Table 2-1) are based on 1980 consumption rates and reserve capacities, which were supplied by the pit operators. Clearly, Prince George is a good place to find landforms, but a terrible place for a proof-of-concept test. The Trail-Nelson region is the obvious choice for a full test of the LCS to locate gravel sources, since it seems to be suffering from a local shortage.

Region              Mean Gravel Reserves (years)
Kamloops            7
Vernon-Armstrong    10
Kelowna             5
Trail-Nelson        <5
Prince George       >50
Peace River         5
Kitimat-Terrace     6

Table 2-1. Projected lifespans for commercial gravel pits at 1980 consumption rates (Hora, 1988).

Hart (1986) describes the geomorphological aspects of the sand and gravel industry in detail. It is a short step from having a complete Landform Geodatabase to identifying sand and gravel deposits, since "... suitable sand and gravel deposits nearly always occur in well documented geomorphological situations" (Hart, 1986, p. 178). These situations include river channels, flood plains, river terraces, marine environments and glaciofluvial environments.
Bennett & Glasser (1996) quote Crimes et al. (1972) and state that a targeted approach can cover six km² per day. Without knowledge of sand and gravel bearing landforms, the traditional method of performing drillings and digging test pits can only cover a single square kilometre per day. Given that each TRIM map sheet covers 140 km², it is possible that the LCS can effect an improvement of as much as 20 times over the targeted search technique at a lower cost.

The development of a technology that is beneficial to many people in many different ways appears to be an obvious undertaking for a government agency. It is no wonder that the United States Geological Survey, "whose long term goals include an automated capability for evaluating and classifying vast amounts of topography from DEMs," is interested in technologies such as the Landform Classification System (Pike, 1988a, p. 111-20).

2.7 Summary

Having an overview of the landforms that can be found in an area is valuable in many different fields. Although people have survived without the "global view" that the Landform Classification System provides, having a readily available inventory of landforms over large areas makes many of the tasks described in this chapter much more efficient, both in terms of time and cost.

3 Literature Review

An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.
- John W. Tukey

Geographers, geologists and geomorphologists have named hundreds of different landforms. Since the goal of this thesis is to identify landforms on a Digital Terrain Model, it would likely be counterproductive to attempt to classify a large number of landforms at present.
For this reason, I have chosen a small set of landforms; successful identification of these using a NetTIN will permit me to later expand the project to identify other landforms. This chapter focuses on answering specific questions related to the direction and design of this project. Eight major questions affect the design of this project:
1. What landforms should be chosen for analysis, and why?
2. Are there any specific characteristics that can help to identify particular landforms?
3. What data are required to represent the chosen landforms adequately?
4. How are the data best stored?
5. What variables can the system calculate from the stored data to best identify the chosen landforms?
6. How can I eliminate redundant information to make the classification as efficient as possible?
7. How can I classify the variables to identify the landforms?
8. What is the best way to delineate examples of landforms for training and comparison?

The answers to these questions will help to shape the final software design, which is described in Chapter 6.

3.1 Choice of Landforms

Five practical considerations help to influence my choice of landforms. First, the study area should be within a day's drive of Vancouver to minimize travel costs, and it should be compact, to simplify field studies and minimize the amount of data processing required. Second, the chosen landforms should be relatively large so that they can be resolved using NetTINs created from TRIM data. Third, eskers, kames and drumlins, which are the result of continental glaciation processes, are very common in northerly nations such as Canada. Fourth, eskers, kames and drumlins have distinct morphologies, and so are ideal for testing a system that identifies landforms based on their shape. Fifth, although not specifically required for this project, it makes sense to identify landforms that are of economic importance; eskers and kames are valuable sources of sand and gravel for the Construction Industry.
Hart (1986) points to the role of the geomorphologist as the preferred prospector for sand and gravel: "The intimate theoretical knowledge of the depositional environments of sand and gravel enables the geomorphologist to play the role of prospector in the aggregate industry" (p. 178). Way (1973) classifies continental glacial landforms by their suitability for different types of sand and gravel extraction (Table 3-1). Eskers, kames and outwash plains are all excellent sources of construction material. Drumlins, however, are not, although they occur in some of the same areas as eskers and kames, and so serve as physically contrasting landforms. Outwash plains typically occur at glacial margins, which are far from eskers and kames, and so are not included, although they, too, are excellent sources of construction material.

Landform   Sand Extraction   Gravel Extraction   Aggregate Extraction   Road Surfacing   Borrow Pits (for Highway Construction)
Eskers     Good-Poor         Excellent           Excellent              Unsuitable       Excellent
Kames      Excellent         Excellent           Good                   Unsuitable       Excellent-Fair
Drumlins   Unsuitable        Unsuitable          Unsuitable             Good-Fair        Good-Fair
Till       Unsuitable        Unsuitable          Unsuitable             Excellent-Fair   Good-Poor
Moraines   Good-Unsuitable   Good-Unsuitable     Good-Unsuitable        Excellent-Fair   Excellent-Poor
Outwash    Excellent         Excellent           Good                   Unsuitable       Excellent

Table 3-1. The suitability of different continental glacial landforms for material extraction (Way, 1973).

3.2 Geological History

Three physical processes have dominated the history of my Study Area: Vulcanism, Hydrology and Glaciation. Several episodes of vulcanism have occurred in geological history, and there have also been as many as four episodes of glaciation. In the eastern half of my study area, the Fraser River and its ancestral incarnations have influenced the geology.
Current geological knowledge only provides us with a sketch of the geological history of the area; geological studies in the area have been conducted at small scales, both spatially and temporally. Furthermore, later geological processes, particularly the Fraser Glaciation, have removed or covered previous geological features (Tipper, 1971). Because of this, important local or short-lived geological phases are doubtlessly missing from our knowledge of the Study Area.

One thing to consider is that landforms may be superimposed upon one another (Bennett and Glasser, 1977). This clouds the issue of each landform having a distinctive shape. When landforms are superimposed, they no longer resemble their "textbook definition", and it becomes difficult to identify them.

The relatively recent creation of glacial landforms by the Fraser Glaciation is somewhat fortuitous. On the till plain that makes up much of the study area, the thick blanket of glacial till "insulates" the eskers, kames, and drumlins from the form of the bedrock below. One notable exception to this is in the valley of the Fraser River, where a glacial lake was formed at the end of the Fraser Glaciation. In this area, landforms have been partially buried by lacustrine sediments. Another environmental factor that may affect the shape of landforms is erosion. Because my study area was deglaciated relatively recently (less than 10,000 years ago), there has been relatively little time for weathering to affect the shape of the landforms.

3.2.1 Vulcanism

The bedrock within my Study Area points to a highly volcanic past. Over 95% of the bedrock in the area was formed during the Triassic period, and consists of sedimentary and metamorphic rocks, but also includes volcanic rocks such as andesitic basalt, basaltic tuff, tuffaceous siltite, tuffaceous argillite, and augite porphyry (Struik et al., 1990). To the east, the Naver Pluton was created during the Early Cretaceous.
The Naver Pluton is exposed today as a group of high hills to the east of my Study Area. Some suspected satellites of this pluton can also be found in the eastern half of my Study Area; these rocks contain Biotite Granite, Quartz Monzonite, Monzonite and Granodiorite (Struik et al., 1990). During the Miocene or the Pliocene, a third phase of vulcanism created the Chilcotin Group. Eruptions of basalt and rhyolite as well as the deposition of flood basalts were accompanied by the creation of further granitic intrusions. Mount Baldy Hughes, as well as the unnamed flyggberg 12 km south of it, were created as granitic intrusions during this period (Struik et al., 1990). South of my Study Area, it appears that volcanic rocks dammed the ancestral Fraser River at some time during this period (Lay, 1940).

3.2.2 Hydrology

The Fraser River, and its ancestral versions, have been a feature of the landscape in this area for at least the past 24 million years. Below the Miocene and Pliocene lavas is a layer of poorly consolidated sediments that were likely laid down by the ancestral Fraser River (Struik et al., 1990). Until the Miocene, the ancestral Fraser River flowed northward from the Chilcotin River, which joins the Fraser near Alkali Lake, BC. At that time, the ancestral Chilcotin River captured the headwaters of the ancestral Fraser, and the flow was gradually diverted southward. Evidence for this episode of stream piracy includes the northward flow of all tributaries, and the wide (old) valley of the Fraser River north of the Chilcotin River (Lay, 1940; Tipper, 1971). At the end of each interglacial period, the glaciers returned, obliterating the Fraser River, and eroding and depositing materials into the channel, causing changes to the course of the river when the glaciers retreated again.
During the retreat of the glaciers after the Fraser Glaciation, glacial dams created a proglacial lake in the area of Prince George. This lake had a surface just below 762 metres, as evidenced by beach deposits south of Prince George. This glacial lake occupied much of the current valley of the Fraser River, and laid down lacustrine deposits, which partially buried some pre-existing drumlins in the Study Area (Tipper, 1971). 3.2.3 Glaciation Three, and possibly four till sheets have been found along Fraser River North of Quesnel (Tipper, 1971). These indicate that there have been many glacial advances and readvances in the area. It is possible that these advances correspond with the Nebraskan, Kansan, Illinoian, and Wisconsin glaciation, however, Tipper (1971) points out that the latest glaciation in the area, the Fraser Glaciation, has not been correlated with the Wisconsin glaciation that occurred in the rest of northern North America. It appears that the Fraser Glaciation was relatively mild compared with previous glaciations. Glacial grooves not associated with the Fraser Glaciation have been recorded at up to 1829 metres elevation south of the Peace River. This indicates that previous glaciations may have formed ice domes over top of the Rocky Mountains (Tipper, 1971). The Fraser Glaciation originated in the ice sheets of the Coast Mountains, and also flowed outwards from the Cariboo Mountains. Near Williams Lake, the two ice sheets coalesced, and the combined ice sheet flowed northwards towards the Parsnip River Valley, crossing my Study Area in the direction of 10 degrees east of north in the process (Figure 3-1). Brad Maguire, UBC Geography April 21,2005 Towards a Landform Geodatabase 16 It appears that the glaciers did not cover higher mountain peaks, although the entire Interior Plateau was covered by it (Tipper, 1971). Direction of glacial flow . . Margins and limits of late glacial advances \Glacial lakes -i* . .... ' ^ \ \ f .1. 
Figure 3-1. Direction of ice flow during the Fraser Glaciation (Tipper, 1971).

Based on the evidence in glacial striations, there may have been two advances of the glaciers during the Fraser Glaciation, the first carrying a heavy load of material, and the second carrying a light load (Armstrong & Tipper, 1948). This might explain some of the evidence that the drumlins were initially streamlined at an angle of 45 degrees west of north, before a second streamlining at 10 degrees east of north.

At its maximum, the Fraser Glaciation ice was up to 1829 m thick over river valleys such as the Fraser (Armstrong & Tipper, 1948). The ice laid down deposits ranging from 1.5 m to 6 m thick (Struik et al., 1990), although Armstrong & Tipper (1948) claimed that till up to 122 m thick could be found in low-lying areas. In my study area, there are no rock drumlins, and this implies that the till blanket is relatively thick in the area (Tipper, 1971).

Tipper (1971) points out, "As yet, no end moraines have been recognized associated with the advance and retreat of the Fraser Ice Sheet" (p. 33). Closer to the Coast and Cariboo Mountains, end moraines have been found. The absence of end moraines in the area led Lay (1940) to conclude that glacial growth stalled and the glaciers stagnated in place. Tipper (1971) believes that a steady retreat of the glacier might have created very small, and as yet unidentified, moraines. However, in the absence of any evidence, the stagnation theory seems more plausible.

Stagnation of the ice sheet helps to explain how some of the eskers are oriented perpendicular to the direction of ice flow. Tipper (1971) hypothesizes that eskers that are oriented parallel to the direction of ice flow were created during periods of ice movement, and those that are not oriented were formed after the ice stopped moving.
This is consistent with the theory that eskers were created as the glaciers were eroding.

Meltwater channels and eskers formed some modern streams in the area. South of my Study Area, the Blackwater River, with its parallel eskers, started its life as a meltwater channel (Tipper, 1971). Shaw (1993) describes "hairpin erosional marks" that are wrapped around the stoss sides of the drumlins in the Prince George area. He suggests that these deep grooves result from the action of glacial meltwater flowing around the drumlins, and that the drumlins themselves may be the product of flows of meltwater.

3.3 Characteristics of Selected Landforms

Eskers are long, sinuous ridges of moderately well stratified sand and gravel that are the remnants of intraglacial or subglacial streams. They are typically found within the zone of erosion near the margins of continental glaciers. Typically, glacial streams create their own beds, and when the glaciers melt, these beds are deposited onto the land surface, leaving long ridges behind. Occasionally, eskers may be beaded, which is the result of erosion after their deposition. Completely straight eskers may also occur; these are the result of streams running through crevasses in glaciers. In some cases, braided streams may create eskers, resulting in an "esker train," which is an interwoven series of eskers (Way, 1973). We found a good example of an esker train near my study area just south of Highway 16 (53°52' N, 123°17' W) near Berman Lake (Figure 3-2).

Kames are conical, sharp-ridged hills that form where sand or gravel accumulates in depressions on the surface of a continental glacier. When the glacier melts, the material settles to form hills (Way, 1973). Presumably, kames form at the bottom of meltwater lakes in the surface of the glacier, which explains why the sands and gravels in kames are well stratified, and contain finer material than do eskers.

Drumlins are elongated hills that are composed of glacial till (Way, 1973).
Normally, drumlins exist in fields, and are rarely found individually. The degree of clustering in drumlin fields varies; drumlins may overlap each other, or they may be widely spaced on an outwash plain (Davies, 1969). The degree of clustering may be affected by the location of rock outcrops, in the case of rock drumlins, or by the effect of dilatancy, the tendency for viscosity to increase with shear forces, as glacial till is moved by the glacier (Smalley & Unwin, 1968).

In order for the LCS to identify eskers, kames and drumlins, the NetTIN used must have sufficient resolution. Thus, it is important to know the dimensions of the chosen landforms. Table 3-2 shows the dimensions of eskers, kames and drumlins according to various authors. The table concludes with the minimum reported dimensions for each landform, which is important in defining the DTM resolution.

Figure 3-2. An esker train located south of Highway 16. In the background, parallel eskers are visible (arrows). The esker in the foreground has been mined for gravel, and shows the fine material characteristic of these landforms (photo: B. Maguire).

Table 3-2. Dimensions of Eskers, Kames and Drumlins according to various authors (all dimensions in metres).

Reference (location): reported dimensions
Charlesworth, 1939, quoted in Vernon, 1966 (Ireland): drumlins 384 mean length, 224 mean width
Vernon, 1966 (Ireland): drumlins 308 mean length, 155 mean width
Davies, 1969: <= 50 high
Tipper, 1971 (Central British Columbia): eskers 182-48280 long, 1.5-30 high; drumlins 805-2414 long, < 402 wide, 15-23 high
Way, 1973: eskers 805-1610 long, < 61 wide, < 31 high; kames < 122 long, < 122 wide, < 15 high; drumlins 805-1610 long, 152-457 wide, 18-61 high
Rice, 1977: eskers 100-1 Oblong, < 2-100 high; kames < 50 high; drumlins 1000-2000 long, 500 wide, < 50 high
Summerfield, 1991: drumlins 1000-2000 long, ~500 wide, 5-50 high
Maguire, 2004, Field Work, see Section 5.3.3, p. 78 (Central British Columbia): eskers 3.4-6.2 high (n=3); drumlins 924 long, 262 wide, 46 high (n=1)
Minimum Reported Dimensions (all authors): eskers 100 long, < 61 wide, 1.5 high; kames < 122 long, < 122 wide, < 15 high; drumlins 308 long, 152 wide, 5 high

3.3.1 Distinguishing Between Similar Landforms

Williams (1972) points out "one of the basic tenets of climatic geomorphology is that essentially similar landforms will result from comparable morphogenetic conditions." The converse is not necessarily true; different morphogenetic conditions sometimes yield similarly shaped landforms. It is important, however, to remember that similarly shaped landforms may have completely different internal structures. For example, flyggbergs (giant roches moutonnées), drumlins and the hills of cockpit karst (Williams, 1972) are all similar in size and shape (Figure 3-3a). The first two of these are glacial in origin, whereas the third is a karstic landform.

Davies (1969) showed that there tends to be a continuum in the morphology of drumlins and roches moutonnées (Figure 3-3b). These landforms differ slightly in the angle of their stoss and lee sides, but their main difference is composition, with roches moutonnées being composed entirely of bedrock, and uncored drumlins being composed mostly of sand and silt.

Figure 3-3. Karstic and drumlin-like landforms. An example of drumlin-shaped karst formations (a). Note that the long axes of the features in the contour map at bottom are not aligned (Williams, 1972, p. 149). Long profiles of glacially sculpted hills show a continuum of form from roches moutonnées through to uncored drumlins (b). A and B - roches moutonnées, C and D - crag and tail, E - rock drumlin, F - uncored drumlin (Davies, 1969, p. 172).
Morphometric analysis of TINs has the potential to reveal minor differences of slope and slope position, details that would likely be missed by a geomorphologist in the field. To some extent, this "global view" of landforms compensates for the fact that, unlike the geomorphologist, the system cannot examine the materials that constitute the landform. Whether such precision enables the system to differentiate uncored drumlins from flyggbergs will be determined in this thesis. Among the many drumlins in the study area are two flyggbergs (Summerfield, 1991), Mt. Baldy Hughes (53°37' N, 122°57' W) and an unnamed hill 12 km south of it (Figure 3-4).

Figure 3-4. Unnamed flyggberg on horizon 12 km south of Mt. Baldy Hughes (Photo: B. Maguire).

Hobson (1972) describes variation in surface macro roughness as a function of the underlying geology. The ability to deduce information about underlying geology from surface roughness might be a valuable capability for the Landform Classification System.

Way (1973) describes how the composition of drumlins can be deduced by examining their side slopes. Coarser constituent material is correlated with steeper side slopes. Smooth, broad slopes indicate that drumlins are mostly composed of fines, whereas steeper slopes indicate that more gravel and boulders are present. Davies (1969) notes a similar effect for kames: the more conical the kame, the coarser the constituent material.

In addition to morphometrics derived from single landforms, the system might obtain information by examining groups of landforms. This is an important reason why neighbourhood analysis has a role to play in landform classification. Although drumlins and karst peaks both occur in groups, drumlins are strongly oriented in the same direction, and karst landforms have no orientation (Williams, 1972) (see Figure 3-3). The mean orientation of eskers can also be of value.
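The contrast between strongly oriented drumlins and unoriented karst hills can be expressed numerically. The following is a minimal sketch, not the method used in the LCS: it assumes that a long-axis bearing (in degrees) is already available for each landform in a group, and it uses the standard angle-doubling device for axial data to compute a consistency score between 0 (no preferred orientation) and 1 (perfectly aligned). The function names and the sample bearings are hypothetical.

```python
import math


def axial_resultant_length(bearings_deg):
    """Consistency of long-axis bearings, from 0 (none) to 1 (aligned).

    A bearing of 10 degrees and one of 190 degrees describe the same axis,
    so each angle is doubled before the unit vectors are averaged.
    """
    if not bearings_deg:
        raise ValueError("at least one bearing is required")
    c = sum(math.cos(math.radians(2 * b)) for b in bearings_deg)
    s = sum(math.sin(math.radians(2 * b)) for b in bearings_deg)
    return math.hypot(c, s) / len(bearings_deg)


def mean_axis_deg(bearings_deg):
    """Mean axis in degrees, reported in the range [0, 180)."""
    c = sum(math.cos(math.radians(2 * b)) for b in bearings_deg)
    s = sum(math.sin(math.radians(2 * b)) for b in bearings_deg)
    return (math.degrees(math.atan2(s, c)) / 2) % 180


# Hypothetical samples: a drumlin-like group aligned near 10 degrees east of
# north, and a karst-like group with axes spread evenly around the compass.
drumlin_like = [8, 12, 10, 190, 9, 11]
karst_like = [0, 45, 90, 135]

print(round(axial_resultant_length(drumlin_like), 3), round(mean_axis_deg(drumlin_like), 1))
print(round(axial_resultant_length(karst_like), 3))
```

A score near 1 for a neighbourhood of candidate landforms would be consistent with a drumlin field, while a score near 0 would be more consistent with karst hills; any threshold between the two would have to be chosen empirically.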
Davies (1969) states that eskers lie roughly parallel to the direction of ice flow, although Tipper (1971) suggests that if the glacier stagnates and ice flow stops, eskers can be oriented in any direction. Since most eskers in the study area are oriented from north to south (one esker train in the Study Area has an east-west orientation, unlike all the others), this provides a possible way to differentiate eskers from synthetic features similar to eskers, such as abandoned roads or rail beds.

3.4 Data Requirements

Brown, Lusch and Duda (1998) identify three conditions that must be met before automated identification of landscape types can be performed:

1. The landforms must have distinct shapes.
2. The detail in the DEMs used must be sufficient to be able to measure the distinct shapes.
3. The morphometric variables must be adequate for describing the distinct shapes of the landforms.

One of the most basic maxims in Computer Science is "Garbage In, Garbage Out." As shown by Brown et al. (1998), there must be an unbroken chain of quality data and analysis from data collection to the final landform identification, if the automated identification of landforms is to be successful.

One of the first links in the chain is having sufficient data resolution. Numerous studies to date have attempted to classify DEMs into landscape units and landforms; those that have accomplished this classification at scales near the landform level have relied on high-resolution DEMs (Dikau, 1989, Guzzetti & Reichenbach, 1994, MacMillan et al., 2000a). Unfortunately, high-resolution DEMs, which generally require high-resolution custom data collection (Guzzetti & Reichenbach, 1994, MacMillan et al., 2000b), are not available over wide areas, due to the cost of data collection and the limitations of computer storage.
Another link required for quality data analysis involves using a data structure that represents the land surface accurately. In the following section, I discuss the differences between DEMs and TINs, which are one component of the NetTIN data structure used in the Landform Classification System. There are numerous reasons why TINs appear to be far superior to DEMs in terms of how they represent the shape of surfaces, and hence, landforms.

Although researchers have described many different types of classification systems, none have tested the Fuzzy ARTMap Artificial Neural Network for classifying landforms. There are a number of advantages of this system over other classification systems, the most important being high classification accuracies (Seto & Liu, 2003).

3.4.1 Resolution

No known studies have traced classification accuracies and error propagation from raw data through TINs, morphometric variables, and landform classification. A number of studies have established the superiority of TINs over DEMs for surface representation (Laurini & Thompson, 1992, Tachikawa et al., 1996, Kidner & Jones, 2000, and Endreny & Wood, 2001), and some studies have examined the effects of data resolution on morphometric classification (Dikau, 1989). All of these studies of data resolution have used DEMs as the technique of surface representation and none have examined the effectiveness of TINs. It is probably a safe assumption, given that TINs represent topographic surfaces better than DEMs, that classification results will be superior given the same quality of input data. Therefore, where particular resolution values are quoted in studies using DEMs, these values can be accepted for this thesis, although it is possible that the assumptions are too conservative, and that better results will be obtained using TINs.

3.4.1.1 Resolving Landforms

Dikau (1989) indicates that landforms must be wider than two pixels in order to be identified.
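Dikau's two-pixel rule can be sketched as a small calculation. The example below is illustrative only: it takes the minimum reported width and height of each landform from Table 3-2, and it assumes that the same factor of two applies to the vertical dimension as to the horizontal one. The names `MIN_DIMENSIONS_M`, `required_resolution` and `required_for_all` are hypothetical.

```python
# Minimum reported dimensions in metres as (width, height), from Table 3-2.
MIN_DIMENSIONS_M = {
    "esker": (61.0, 1.5),
    "kame": (122.0, 15.0),
    "drumlin": (152.0, 5.0),
}

# Dikau (1989): a landform must span more than two pixels to be identified.
PIXELS_PER_FEATURE = 2


def required_resolution(width_m, height_m, pixels=PIXELS_PER_FEATURE):
    """Upper bounds on XY and Z resolution (m) needed to resolve one landform."""
    return width_m / pixels, height_m / pixels


def required_for_all(dimensions):
    """Per-landform bounds, plus the bounds that satisfy every landform."""
    per_landform = {
        name: required_resolution(width, height)
        for name, (width, height) in dimensions.items()
    }
    xy_bound = min(bound[0] for bound in per_landform.values())
    z_bound = min(bound[1] for bound in per_landform.values())
    return per_landform, (xy_bound, z_bound)


per_landform, overall = required_for_all(MIN_DIMENSIONS_M)
for name, (xy, z) in per_landform.items():
    print(f"{name}: XY < {xy} m, Z < {z} m")
print(f"all landforms: XY < {overall[0]} m, Z < {overall[1]} m")
```

Because the smallest landform controls the result, a study that aims to identify all three landforms inherits the esker requirement in both the horizontal and the vertical dimension.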
Assuming that this rule of thumb applies to all data structures, the following data resolutions are required to identify landforms that have the dimensions shown in Table 3-2. Table 3-3 shows the minimum resolution required for the three chosen landforms, as well as for a study that aims to identify all three.

Landform                                        XY Resolution Required (m)   Z Resolution Required (m)
Eskers                                          <30.5                        <0.75
Kames                                           <61                          <7.5
Drumlins                                        <76                          <2.5
Minimum Resolution Required for All Landforms   <30.5                        <0.75

Table 3-3. Minimum XY and Z Resolutions required for eskers, kames and drumlins.

Dikau (1989) examined the usefulness of DEMs with horizontal resolutions ranging from 12.5 m to 50.0 m (Table 3-4). He found that the two coarsest DEMs were unsuitable for morphometric studies of microforms (small landforms): "The investigations have shown that 40 and 50m DEMs produce no adequate results in the modelling for microforms. This means that microform modelling cannot be reproduced with this degree of generalization of the DEM" (p. 58) (Table 3-3). This finding is consistent with my independent conclusions; Dikau found that only the DEMs with vertical resolutions of 0.1-0.5 m and 0.5-5.0 m were useful. Dikau's results show that we must be cautious when switching between data sources having different resolutions, as resolution changes may have a significant effect on the results.

DEM   Horizontal Resolution (m)   Vertical Resolution (m)
1     12.5                        0.1-0.5
2     20.0                        0.5-5.0
3     40.0                        2.0-9.0
4     50.0                        1.5-5.0

Table 3-4. DEM Resolutions used by Dikau (1989).

Several studies over the last decade have provided important clues about the relation between vertical resolution and the effectiveness of morphometric studies. Brown et al. (1998) examined landscapes resulting from continental glaciation in Michigan.
In the paper, they describe working with a DEM that had a horizontal spatial resolution of 94.56 m, which was derived from maps having a 50 ft. (15.24 m) contour interval, with supplementary 25 ft. (7.62 m) contours. The authors noted the limitations of such coarse resolution data, and then speculated about the resolution required to show smaller landforms on their DEM:

"Beach ridges are numerous and well formed on the lake plain in this part of Michigan, but they require a 5 ft. (ca. 152 cm) or 10 ft. (ca. 304 cm) contour interval to be resolved. The numerous smaller kames and eskers of the region are also below the vertical resolution of the topographic map (and therefore, the DEM)" (Brown et al., 1998, p. 240).

This is an interesting observation, since identifying drumlins, kames and eskers is one of the primary goals of this thesis. If the calculations in Table 3-3 are correct, the horizontal resolution needs to be less than 61 m for kames and less than 30.5 m for eskers. For vertical resolution, values below 7.5 m for kames and 0.75 m for eskers should be appropriate; this is in the same range as the values given by Brown et al. (1998).

3.4.1.2 Data Sources

I examined three medium-resolution DTM data sources that cover widespread areas. The first is the British Columbia Terrain Resource Inventory Management (TRIM) program, in which the entire province of British Columbia has been sampled with photogrammetric mass points and breaklines that were collected to create DEMs having horizontal resolutions of 25 m and vertical resolutions of 10 m (Geographic Data BC, 1992, p. 35). Although the DEMs created from this data are quite coarse, the mass points and breaklines from which they are created have a variable resolution. The BC government designed the mass point and breakline collection procedures for the creation of TINs, which they use to check the quality of the TRIM data.
The variable resolution means that the mass points and breaklines are sparse in areas of uniform terrain, but very dense in areas of complex terrain. The TRIM Specifications state:

a) In areas where the average slope of the terrain is less than 25°, the average spacing between points will be approximately 100 metres, and approximately 120 points per square kilometre.
b) In areas where the average slope of the terrain is more than 25°, the average spacing between points will be approximately 75 metres, and approximately 200 points per square kilometre.
(Ministry of Environment, Lands and Parks, 1992, p. 33)

The absolute accuracy of TRIM mass points is 10 m horizontally and 5 m vertically, and the data is rounded to the nearest metre horizontally and vertically (Quackenbush, 2005). The mass points and breaklines have had data corrections applied, and are an obvious choice of data, despite being limited in extent to British Columbia.

The second DEM data source, which covers a large portion of the Earth's surface, is the 2000 Shuttle Radar Topography Mission (SRTM). In 2000, the Space Shuttle flew into orbit with a large radar antenna and, over the course of 11 days, mapped approximately 60% of the Earth's surface with a horizontal resolution of 25 m (USGS, 2002a) and an absolute vertical resolution of 16 m (Miliaresis & Argialas, 1999, p. 727). The United States Geological Survey (USGS) is releasing SRTM data as 30 m horizontal resolution DEMs within the US. Outside the US, they are currently releasing 90 m resolution DEMs, and 30 m DEMs are available for research purposes by special request (Farr, 2002). Researchers have used SRTM raw data to create DEMs with horizontal resolutions of 25 m (USGS, 2002) and vertical resolutions of 16 m (Miliaresis & Argialas, 1998, p. 727). Ideally, it would be best to obtain the uncorrected raw data, which is now available for the United States (USGS, 2004).
The third data source is USGS 7.5-minute DEMs, which have a 30 m resolution, although certain areas have also been mapped at a 10 m resolution (USGS, 2005). These files are available for the United States only. In follow-up studies, however, I may have to replace the TRIM data with 7.5-minute DEM or SRTM data if I perform this work outside British Columbia. It remains to be seen whether SRTM data are sufficiently accurate to permit the creation of detailed NetTINs. The 7.5-minute DEMs should have sufficiently high resolution, although these will have to be examined in detail to assess their vertical resolution and data quality. I will have to compare SRTM and 7.5-minute DEM data with TRIM-derived NetTINs to fully assess the suitability of these data sources for landform identification.

3.4.2 Accuracy

Both TRIM and SRTM data have horizontal accuracies that are below the threshold shown in Table 3-3, but the absolute vertical accuracy of the SRTM data may be too coarse. Fortunately, within localized areas (such as the area surrounding an esker), the relative accuracy is more important (Hawkins, 2002). The absolute accuracy value describes how far off the elevation value is from its posted value above a vertical datum, such as mean sea level. For morphometric studies, however, it is much more important to know the relative accuracy between one point and the mean of a group of adjacent points. Relative accuracy is always better than absolute accuracy; that is, its error bound is smaller. For example, two adjacent elevation points might have an absolute accuracy value of ±10 m (Figure 3-5). However, the relative accuracy of the instrument that made the measurements might be only ±3 m. In other words, if Point A is determined to have an elevation of 520.1 m, and Point B is determined to have an elevation of 503.2 m, we know that Point B lies 16.9 m ± 3 m below Point A; taking Point A as the reference, Point B lies between 500.2 m and 506.2 m.
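The arithmetic behind this example can be made explicit. The sketch below takes the two elevations from the example and asks whether the height difference between them can be resolved under each accuracy figure. It rests on two simplifying assumptions that are mine rather than the source's: the ±3 m relative accuracy is treated as a bound on the elevation difference itself, and the ±10 m absolute accuracy is treated as an independent worst-case error on each point, giving ±20 m on the difference. All names are hypothetical.

```python
POINT_A_M = 520.1  # elevation of Point A (top of the esker)
POINT_B_M = 503.2  # elevation of Point B (base of the esker)


def difference_interval(z_a, z_b, tolerance_m):
    """Bounds on (z_a - z_b) given a +/- tolerance on the difference."""
    diff = z_a - z_b
    return diff - tolerance_m, diff + tolerance_m


def is_resolvable(z_a, z_b, tolerance_m):
    """True if the sign of the height difference is certain."""
    low, high = difference_interval(z_a, z_b, tolerance_m)
    return low > 0 or high < 0


# Relative accuracy: +/- 3 m on the difference (assumption, see above).
relative = difference_interval(POINT_A_M, POINT_B_M, 3.0)

# Absolute accuracy: +/- 10 m on each point, so +/- 20 m worst case on the
# difference (assumption: errors are independent and add in the worst case).
absolute = difference_interval(POINT_A_M, POINT_B_M, 2 * 10.0)

print("relative:", [round(v, 1) for v in relative], is_resolvable(POINT_A_M, POINT_B_M, 3.0))
print("absolute:", [round(v, 1) for v in absolute], is_resolvable(POINT_A_M, POINT_B_M, 20.0))
```

Under these assumptions the relative figure leaves no doubt that Point A is higher than Point B, whereas the absolute figure alone would not rule out the two points being level.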
Because the relative accuracy is quite good, it enables us to determine that Point B lies at the base of an esker, whereas Point A lies on top.

Figure 3-5. Absolute and Relative Accuracy in Digital Terrain Models. Point A: 520.1 m; Point B: 503.2 m. Absolute accuracy (±10 m): how accurate are all the measurements relative to the vertical datum (mean sea level)? Relative accuracy: how accurate is a local measurement relative to the mean of its nearest neighbours?

3.5 Data Structures for Storage and Analysis

Mapping agencies provide most elevation data as regular grids, which allow the easy creation of Digital Elevation Models (DEMs). The prevalence of this type of data is an offshoot of the current popularity of DEMs as a way to represent topographic data. As Kidner and Jones (2000) point out, "...irregularly sampled elevations [used in TIN construction] are less well understood and often dismissed by GIS users who prefer the quick-fix solution of the DEM" (p. 380). DEMs are an inappropriate data structure for this thesis because I must represent the land surface extremely accurately. I use NetTINs instead, which combine a stream network with a Triangulated Irregular Network (TIN) to represent the land surface accurately.

Within each cell of a DEM are many possible elevation values, but the algorithm chooses only one and assigns it to the entire cell. Because of this, a DEM approximates the land surface, and is therefore less accurate than the points from which it was created. Of course, the smaller the pixel size in a DEM, the more accurately that DEM will reflect the actual ground surface. Unfortunately, this fact often leads DEM users to conclude that all problems of DEM analysis are simply problems of insufficient resolution, rather than problems with a data structure that does not represent the land surface very well.
Kidner and Jones (2000) sum this point up succinctly: "Critics of the DEM argue that the grid cell is an aberration which over-simplifies terrain modelling (Lee, 1991b)" (p. 380).

Laurini and Thompson (1992) describe some of the limitations of DEMs:

1. There is an a priori fixed resolution.
2. None of the three methods of determining the attribute for each cell is perfect.
3. The DEM creation algorithm replaces the exact location of measurements (a point) with the area covered by the grid cell.
4. Rasters do not record point and line features precisely.

Laurini and Thompson expand on the second limitation to show why there are problems with each of the methods used for determining the attributes for each cell. The rule of dominance states that the polygon value that occupies most of the area of each grid cell is chosen. The rule of importance takes the most important value for the grid cell attribute. Finally, the algorithm can simply choose the value closest to the centre of the cell, irrespective of its importance or dominance. In general, DEMs are seen by Laurini and Thompson to be a poor approximation of the data: "Gridding procedures providing regular tessellations do not usually recognize data points, do not provide explicit topological information, and are not adjusted to known conditions like breaklines" (Laurini and Thompson, 1992, p. 252).

Fortunately, TINs are a much better alternative to DEMs for the accurate representation of the Earth's surface, since they are able to reflect the complexity of the land surface using a series of triangles. Laurini and Thompson (1992) recognized that triangles are a good way to represent terrain when they stated, "It is generally thought that, at least visually, it is preferable to break up a surface into triangular facets rather than squares or other polygons" (p. 246).
A number of authors have supported and expanded upon this position, particularly when it comes to the usefulness of TINs for accurately representing the land surface:

"The most relevant improvements in computer software for landform analysis are in image processing, pattern recognition, the analysis of shape (morphometry), and GIS - particularly the capabilities for spatial taxonomy promised by such advanced techniques as the Triangulated Irregular Network (TIN)" (Pike, 1988a, p. 16).

Kidner and Jones (2000) pointed out how TINs can be useful for terrain studies: "... the triangle cell of the TIN should best represent the surface behaviour between elevation samples. This is essential for modelling many spatial processes of the physical environment that are dependent upon terrain, such as landslides, hydrology, or erosion (Weibel, 1997)" (p. 380).

Unlike DEMs, whose square data structure requires the surface to be mathematically recreated before morphometric analysis can take place, the TIN data structure creates and stores a piecewise linear interpolation of the surface that can be used immediately for morphometric analysis. The resolution of the TIN is variable; where the land surface is complex, the algorithm creates more triangles, and where the land surface is simple, it creates fewer ones.

The variable resolution improves the accuracy of some morphometric measures. Mark (1975) quotes Trewartha and Smith (1941): "The size of the rectangle for which relief readings are made appears to need adjustment for the degree of coarseness or fineness of the relief pattern" (p. 168). Although Mark notes that Trewartha and Smith did not indicate how to do this, it is one of the inherent properties of a TIN.
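To illustrate why a piecewise linear surface can be used immediately for morphometric analysis, the sketch below derives slope and aspect directly from the three vertices of a single TIN facet. It is a generic geometric calculation, not the procedure used in the LCS, and it assumes vertices given as (x east, y north, z up) in metres; the function name `facet_slope_aspect` is hypothetical.

```python
import math


def facet_slope_aspect(p1, p2, p3):
    """Slope and aspect of a planar triangle with (x, y, z) vertices.

    Returns (slope_deg, aspect_deg). Aspect is the downslope direction in
    degrees clockwise from north, or None for a horizontal facet.
    """
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))

    # The cross product of two edge vectors is normal to the facet.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nx == 0 and ny == 0 and nz == 0:
        raise ValueError("degenerate facet: vertices are collinear")

    # Make the normal point upwards so that its horizontal part points downslope.
    if nz < 0:
        nx, ny, nz = -nx, -ny, -nz

    slope_deg = math.degrees(math.atan2(math.hypot(nx, ny), nz))
    if nx == 0 and ny == 0:
        return slope_deg, None
    aspect_deg = math.degrees(math.atan2(nx, ny)) % 360
    return slope_deg, aspect_deg


# A facet that rises 10 m over 10 m towards the east, so it faces west.
slope, aspect = facet_slope_aspect((0, 0, 0), (10, 0, 10), (0, 10, 0))
print(round(slope, 1), round(aspect, 1))
```

No resampling or surface fitting is needed: each triangle carries its own plane, so per-facet measures such as these follow from the vertex coordinates alone.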
Because they have variable resolution, TINs make better use of the data that is available: "In essence, the TIN is finely tuned to the variability of the terrain and its surface features, whereas its equivalent regular grid DEM would possibly need to be at a metre resolution or less in order to capture the surface with similar precision" (Kidner and Jones, 2000, p. 395). Thus, the TIN, with its variable resolution, plays an important part in minimizing the propagation of error in this project.

Despite all of this support for TINs, there are some significant drawbacks in using this data structure. Laurini and Thompson (1992) point out that "... their [TINs] creation is computationally demanding; there are many possible triangulations for any set of points; and they can miss important aspects of surface morphology unless the edges are constrained to fit major breaks of slope" (p. 251). The sheer number of triangles, which increases with the addition of breaklines, puts a strain on computer resources. Whereas a computer can analyze a DEM in seconds, producing and analyzing TINs may take one or more hours of computer time. Fortunately, the system can perform these operations in a batch process, which allows it to analyze large numbers of map sheets over several nights. In addition, the constant increases in computer processing speed are making the demands involved in processing TINs increasingly irrelevant.

Including TRIM breaklines ensures that TINs reflect breaks of slope. One other side effect of the breaklines is that they may produce many narrow triangles if the density of the points in the breaklines is high. Narrow triangles can cause the land surface to be very rough, and can adversely affect a number of morphometric variables, such as slope (Hawkins, 2003).
On the other hand, some authors, such as Kidner and Jones (2000), contend that sliver triangles can be useful when representing surfaces that are strongly curved in one direction, but not the other. This situation might occur along the banks of a meandering river, where the surface is essentially flat on the plane that intersects the river, but is highly curved on the perpendicular plane.

Creating a TIN out of the thousands of input points in the mass points and breaklines is not an easy task, but the Delaunay triangulation process is a well-recognized algorithm that rapidly produces an optimal triangulation. Cause&Effect employs this algorithm during the creation of NetTINs.

Given the support lent to TINs, it appears that there are few reasons to choose DEMs, particularly when ungridded input data, adequate computer resources and the necessary computer software are available to perform the analysis.

3.6 Morphometric Variables

In theory, the process of identifying landforms from their shape should be easy. After all, geomorphologists visually identify landforms by their shape on a daily basis. Since a DTM shows none of the vegetation, and is viewed from overhead as if the viewer were in an aircraft, the task should be even easier than for a geomorphologist on the ground. Pike (1988b) makes the connection between visual interpretation and computer analysis: "The visual perception of topographic form should be possible to simulate using numerical methods and digital elevations" (p. 492). The question is how to make that connection.

Morphometry, or more precisely, geomorphometry (Evans, 1972), is the measurement of the shape of the Earth's surface. Researchers have proposed dozens of different ways of measuring landform surfaces. There has been much debate on the meaning and appropriateness of different morphometric variables.
Certain morphometric variables are mathematically identical to others, others have little "real world" meaning, and still others are being created today (Yokoyama et al., 2002).

There are four goals for morphometric analysis (paraphrased from Pike, 1988b). They are:

1. Numerically describe continuous topography without being restricted to customary geographic units.
2. Represent continuous topography as a surface composed of discrete lines and planes.
3. Assess the information content of topography by identifying geometric constituents of landform and determining their relative importance and degree of independence.
4. Automate such topographic description by computerized algorithms that require little or no manual interaction.

It is interesting to note that Pike's goals correspond closely to the aim of this thesis, although I use morphometric analysis to recreate "customary geographic units." In particular, the second item is support for the use of NetTINs, which divide the land surface into a series of triangular planes.

Although morphometrics are the dominant technique used for describing the land surface, they have some limitations, and are not the only technique in use (Chorowicz et al., 1989). Mather (1972) describes some of the limitations of describing areal data using numbers:

1. The arbitrariness involved in defining a geographical individual.
2. The effects of variation in size and shape of the individual areal units.
3. The nature and measurement of location.

While I have already discussed the third item in Section 3.4.1, the first and second items demand special attention. In an effort to overcome the arbitrariness of landform names, authors such as MacMillan et al. (2000b) have made use of continuous classification systems that avoid many of the problems encountered when trying to identify particular landforms.
Two problems with this approach are that these all-encompassing classification systems require special training, for both identification and interpretation, and that such systems are unknown to the vast majority of people who work with landforms on a daily basis. The system used by MacMillan et al. (2000b) (modified from Pennock et al., 1987) is also the only one that is applicable on the scale of individual landforms. Many researchers, such as Hammond (1954), Wallace (1955), Linton (1970) and Crozier & Owen (1983), have designed landform classification systems for analysis at small map scales. Hammond, for example, used 6x6 mile squares in devising his classification system (Brabyn, 1997).

The alternative to such special-purpose landscape classification systems is to use the common, sometimes contradictory names for landforms, which have developed on an ad hoc basis over the course of centuries. These names vary both by country and by individual. The naming of landforms has been a haphazard process; a geomorphologist may give a name to a distinct configuration of the land surface, but may ignore the "uninteresting" land that surrounds it. An obvious example from glacial geomorphology is the fact that "kames" (ice-contact deposits, forming conical hills) have few physical or genetic relationships with "delta kames" (relict deltas formed in drained glacial lakes) and "kame terraces" (terraces formed at the edges of glaciers), other than the fact that they were all created by glaciers (Strahler & Strahler, 1992). Another example is the continuum in drumlin-like landforms.

Mitchell (1991) discusses how the defining characteristic of a landform is its amount of internal variation. In an ideal situation, a classification would identify what Woolridge (1932) called a "morphological electron": a landform with uniform lithology, soil and vegetation structure.
Examples include small outcrops, gullies on hillsides, and swampy patches in fields (Mitchell, 1991). Unfortunately, even the finest custom-collected DTMs lack the resolution to differentiate these small features.

The geological history of most places is enormously complex. Although we can reconstruct some of that history based on the evidence contained in the landforms themselves, many of the nuances will forever remain hidden to us. These nuances make every landform unique. In reality, few landforms match their "textbook" definition. The Landform Classification System must determine whether a particular landform is close enough to match a particular textbook definition.

3.6.1 Morphometrics on NetTINs

Currently, the standard approach in performing morphometric analysis on TINs is to convert the TIN into a DEM and then to calculate the morphometric variables from the DEM. The main reason for doing this is that a number of morphometric variables are calculated using "moving windows," which are groups of grid cells that are used for calculating mean values. Moving windows are typically square, and are 3x3 or 5x5 pixels in size, although different shapes and larger groups may be used. For each pixel in the input grid, the moving window retrieves the values for the surrounding pixels as the basis for some calculation (e.g., the mean) (Figure 3-6).

Figure 3-6. A 3x3 moving window (tan) moves across a DEM (white grid) in an orderly pattern. A value calculated from the nine cells in the DEM is assigned to the centre cell (dark green) of a new DEM. The new value could be based on any operation, including calculation of the mean, to effect smoothing, or a directional filter, to emphasize certain features.

Moving windows and other forms of raster analysis are computationally simple and thus run very quickly even on older desktop computers. In part, this simplicity explains their popularity.
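The moving-window operation described above can be illustrated with a minimal sketch. The function name `moving_window_mean`, the list-of-rows DEM layout and the treatment of edge cells (using only the part of the window that falls inside the grid) are assumptions made for illustration; they are not taken from any particular GIS package.

```python
from statistics import mean

def moving_window_mean(dem, size=3):
    """Smooth a DEM (a list of equal-length rows) with a square moving window.

    Assumption: at the grid edges, only the cells of the window that fall
    inside the DEM are averaged.
    """
    half = size // 2
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood centred on (r, c), clipped to the grid.
            window = [
                dem[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = mean(window)  # value assigned to the centre cell
    return out

# Hypothetical 3x3 DEM with a single high cell in the centre.
dem = [
    [10, 10, 10],
    [10, 19, 10],
    [10, 10, 10],
]
smoothed = moving_window_mean(dem)
```

Replacing `mean` with another function (for example a weighted directional filter) changes the operation without changing the traversal, which is part of why the approach is so simple to implement.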
Another reason that DEMs were used for morphometric analysis is simply that, until now, they offered the only widely available computer technique for this type of analysis.

For this project, I have developed a new type of neighbourhood analysis, which uses only NetTINs. It allows the direct calculation of morphometric variables on the NetTIN, and preserves much of the accuracy of the original data. Because I do not need to approximate the surface by converting the NetTIN to a DEM, much of the morphometric information that was present in the original measurements is preserved.

3.6.2 Morphometric Variables of Interest

Of the many different morphometric variables that have been described in the literature, a large number are either redundant or have little to offer in improving the characterization of landforms. Variables that have a fundamental role in the description of landforms, together with those that show special promise for distinguishing between eskers, kames and drumlins, will be short-listed in order to obtain the best classification possible.

In 1972, Evans published a massive 73-page paper summarizing all of the research on morphometry to date. In that paper, Evans was able to classify and organize all of the known morphometric variables into a rigorous framework, and he was able to evaluate the effectiveness of each for geomorphological work. Slightly later, David Mark (1975) evaluated 14 additional morphometric variables.
We can group these variables into fundamental, elevation-based, hypsometry, landscape complexity, drainage and landform-based categories:

Fundamental Variables
• Elevation (Altitude)
• Slope (Gradient)
• Aspect
• Vertical convexity
• Horizontal convexity
• X-Coordinate
• Y-Coordinate

Elevation-Based Variables
• Local relief
• Standard deviation of elevation
• Mean elevation
• Skewness of elevation
• Available relief
• Mean available relief
• Drainage relief

Hypsometric (Area/Relief) Variables
• Hypsometric integral
• Hypsometric curve

Landscape Complexity Variables
• Texture ratio
• Ruggedness number
• Roughness factor
• Triangle area
• Peak density
• Ridginess
• Reticulation
• Positive openness
• Negative openness

Drainage Variables
• Drainage density
• Source density

Landform-Based Variables
• Mean orientation
• Standard deviation of orientation
• Topographic component

In summarizing his discussion, Mark states: "While there may be some redundancy among the parameters noted, it is believed by the writer that with the possible exception of local convexity, all important terrain information is contained within the above measures" (p. 175). I need to determine exactly which of these I should keep and which of these I should eliminate in order to reduce the amount of programming required. An examination of the literature available helps to clarify the variables of importance.

3.6.2.1 Fundamental Variables

According to Evans (1972), the most fundamental classification of variables is related to elevation and its various derivatives: "Building on some recent work of Tobler (1969), I therefore suggest that altitude at a point, and its first two derivatives, provide the unifying concept around which general geomorphometry may be built" (Evans, 1972, p. 22).
Later, he explains how the following key morphometric variables are all vertical or horizontal derivatives of elevation (Evans, 1972):

• Elevation (Altitude) (Z)
• Slope (Gradient) (Z'v)
• Aspect (Z'h)
• Vertical convexity (Z"v)
• Horizontal convexity (Z"h)

Some support for Evans' assertion comes from Brown et al. (1998), who used five morphometric variables to identify landscape features in their study: elevation, slope, local relative relief, local roughness, and upslope area. Using an Error Matrix to determine the effect of each morphometric variable, they determined that elevation alone contributed 18.1% towards the overall 57.1% classification accuracy. No other variable individually contributed more than 1.2% to the total (p. 245).

Although it is easy to measure elevation from a DEM, Evans (1972) warns that "The rougher surfaces formed by higher derivatives require increasingly fine-meshed altitude matrices [DEMs] for adequate definition" (p. 50). In terms of NetTINs, which are piecewise linear representations of slopes, elevation is represented by the points, slope and aspect are represented by the triangles, and convexity and concavity are represented by the triangles and their immediate neighbours.

Slope is the change in elevation with distance. Strahler (1956) described slope as "perhaps the most important aspect of surface form, since surfaces are composed completely of slopes and slope angles control the gravitational force available for geomorphic work" (quoted in Evans, 1972, p. 36). Evans also points out that the separation of slope and aspect is valuable to geomorphologists: the effects of aspect are very small when compared with the effects of slope, because slope acts to modulate the force of gravity. Guzzetti and Reichenbach (1994) point out one of the drawbacks of slope analysis with DEMs: "Slope is possibly the single most descriptive measure of mesoscale topography . . . 
theoretically a point value, slope measures the rate of change of altitude over a finite length and is therefore highly sensitive to pixel size" (p. 58). In addition to DEMs, any field studies that measure elevation at regular intervals also suffer from this problem (Brabyn, 1997). The intuitive answer, for DTMs as well as field studies, is to measure slope between breaks of slope, by dividing hills into areas of uniform slope (Doornkamp and King, 1971). This is exactly how the Delaunay triangulation process segments the landscape to create a NetTIN.

Aspect, the horizontal component of the first derivative of elevation, is the direction that a particular slope faces. Whereas slope modulates the effect of gravity, aspect modulates the amount of solar radiation that hits a particular slope. Slopes with a southern aspect tend to be warmer, drier and less snowy than slopes with a northerly aspect. We can see this phenomenon on eskers that run east-west in our study area (Figure 3-7). Guzzetti and Reichenbach (1994) suggest that aspect is of value as a morphometric variable: "Such topographic properties as azimuth (aspect) and the topologic arrangement of ridges and valleys, which describe landscape pattern in plan, would vastly improve the numerical taxonomy" (p. 72).

Figure 3-7. Two photographs taken from the same point on the crest of an east-west running esker. The north slope (a) features luxuriant growth and high understory, while the south slope (b) has low shrubs and open meadows. (Photos: B. Maguire)

To summarize, both slope and aspect are important morphometric variables, but they are best treated separately.

The algorithm for vertical convexity uses the relationship between slopes in adjacent triangles, and horizontal convexity uses the relationship between aspects in adjacent triangles. In this thesis, I will refer to convexity and concavity together with the term "convexity."
Since both convexity and concavity express the curvature of slope, I have adopted the convention that concavity is the same as negative convexity. Thus, a convex surface (where adjoining edges face opposite directions) extends from 180° to greater than 0°, a flat surface has a convexity of 0° and a concave surface has a convexity of less than 0° to -180°.

Vertical convexity, the vertical component of the second derivative of elevation, allows for the identification of the transition zone between my selected glacial landforms and the till plain on which they reside. It is also important in watershed studies, because it affects soil and water collection properties, and affects whether an area is primarily erosional or depositional in nature (Mark, 1975).

Horizontal convexity is the horizontal component of the second derivative of elevation. In lay terms, it is the amount of change in a hillslope's aspect. The variations in convexity within a particular landform provide information on the sinuosity of the landform, which is particularly important for distinguishing eskers from other landforms. Since eskers meander along their course, they express a high amount of variation in their horizontal convexity.

Blaszczynski (1997) states that determining the vertical convexity by calculating the second derivative of elevation, as proposed by Zevenbergen and Thorne (1987), did not serve his purposes, and proposes an alternative method, which works from the centre of each cell in a DEM to the centres of the nine adjacent cells, to calculate a convexity value for each cell. I use a similar approach to calculate convexity in a NetTIN by selecting the neighbours of each triangle in the upslope, downslope, left and right directions (Figure 3-8).

Figure 3-8. Local convexity can be defined based on the slope and aspect values downslope, left, upslope and right from every triangle in a NetTIN (Maguire, 2003a).
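The relationship between a triangle's geometry and its slope and aspect, and the way neighbouring triangles can yield a convexity value, can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm used in the Landform Classification System: the function names are hypothetical, coordinates are assumed to be (x = east, y = north, z = elevation), and the two convexity functions use a simple difference of neighbouring slopes or aspects, with positive values taken to mean convex.

```python
import math

def upward_normal(p1, p2, p3):
    """Upward-pointing normal of the plane through three (x, y, z) vertices."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    if nz < 0:  # flip so the normal always points skyward
        nx, ny, nz = -nx, -ny, -nz
    return nx, ny, nz

def slope_aspect(p1, p2, p3):
    """Return (slope, aspect) in degrees. Aspect is an azimuth measured
    clockwise from north (+y), or None for a horizontal triangle."""
    nx, ny, nz = upward_normal(p1, p2, p3)
    horizontal = math.hypot(nx, ny)
    slope = math.degrees(math.atan2(horizontal, nz))
    if horizontal == 0.0:
        return slope, None
    # The horizontal part of the upward normal points in the downslope direction.
    aspect = math.degrees(math.atan2(nx, ny)) % 360.0
    return slope, aspect

def vertical_convexity(slope_upslope, slope_downslope):
    """Assumed convention: positive where the surface steepens downslope."""
    return slope_downslope - slope_upslope

def horizontal_convexity(aspect_left, aspect_right):
    """Signed aspect change from the left to the right neighbour (as seen
    looking downslope), wrapped to the interval [-180, 180)."""
    return (aspect_right - aspect_left + 180.0) % 360.0 - 180.0

# Hypothetical triangle lying on a plane that falls 1 m per metre towards the east.
slope, aspect = slope_aspect((0, 0, 1), (1, 0, 0), (0, 1, 1))
```

For the sample triangle the slope is 45° and the aspect is 90° (east-facing). A spur whose left neighbour faces south-east (135°) and whose right neighbour faces south-west (225°) gives a horizontal convexity of +90° under this convention.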
I have included the X and Y coordinates of the centroid of each triangle in our set of fundamental morphometric variables, in an effort to allow the ANN to recognize the relative position between triangles. If two triangles are close together, their X and Y coordinates will be similar. If the ANN can pick up the similarity between the coordinates, it may be able to "consider" spatial autocorrelation when defining the clusters that define the different landforms. This might allow the ANN to coalesce groups of classified triangles into landforms.

3.6.2.2 Elevation-Based Variables

Since elevation is a point measurement, researchers have created numerous elevation-based calculations that are more suitable for areal features, such as landforms. Local relief is the difference between the highest and lowest elevations occurring within an area. Mark suggests, "It would appear that for both computational and geomorphic reasons, localized relief for standardized sample areas represents the best single measure of the vertical dimension" (Mark, 1975, p. 169). Unfortunately, because local relief uses the extremes of elevation, it is highly sensitive to outliers, so Evans (1972) urges the use of standard deviation of elevation instead. As he emphatically puts it, "The case for use of standard deviation of altitude as the measure of relief is overwhelming when combined with the use of mean and skewness of altitude ..." (p. 33).

Mark suggests three additional measures of relief: available relief, mean available relief and drainage relief. The first is available relief, which is "the vertical distance from the former position of an upland surface down to the position of adjacent graded streams" (Johnson, 1933, p. 295 quoted in Mark, 1975, p. 168).
This morphometric variable does not apply particularly well to my study area, since the "former position of an upland surface" is unknown, as glaciers have scoured the area and a blanket of glacial till now covers much of the study area.

A second, more useful variable is Dury's mean available relief, which is "the average height of the land above the streamline surface, computed as the difference in volumes under the actual and streamline surfaces, divided by the area" (Dury, 1951, pp. 342-3 quoted in Mark, 1975). This is quite useful, since the topography in the study area is quite variable, depending on whether eskers, kames, or drumlins are the predominant landform in the area.

The third morphometric variable of interest is drainage relief. This is the vertical distance between adjacent divides and streams. This can be a useful diagnostic variable for distinguishing drumlins from the other landforms, since the drainage in the drumlin fields is deranged and not highly incised. In areas near eskers, the drainage is much better pronounced than near the drumlins, and therefore there has been more erosion, which increases the available relief in these areas.

3.6.2.3 Hypsometric Variables

Hypsometry describes the "distribution of mass under the topographic surface" (Mark, 1975, p. 166). The hypsometric integral is a measure of the distribution of elevation values in an area. Mark relates Strahler's 1952 observation that the hypsometric integral reflects the "age" of landscape evolution; Mark contends this is the only morphometric variable that has a proven relationship to geomorphic processes. Evans (1972) points out that the hypsometric integral is highly sensitive to outliers, however. As a replacement, he suggests calculating the skewness of the elevation distribution, which is less sensitive.
As he says, "The skewness of this [elevation] distribution contains all of the information sought from the hypsometric integral, but without resorting to extreme values" (Evans, 1972, p. 46). In addition, Evans points out that this also replaces all measures of "dissection" or "aeration."

A related variable is the hypsometric curve, which is the plot of the area above a height versus relative height. Since this is simply another way of quantifying the elevation distribution curve, the addition of this variable does not seem particularly valuable.

3.6.2.4 Landscape Complexity Variables

Mark (1975) discusses six variables that he considers useful for describing landscape complexity. The first of these is the texture ratio, the number of crenulations on the most convoluted contour within a drainage basin divided by the length of the perimeter of the drainage basin. Unfortunately, this requires contours, which are not included in the NetTIN. Including contours would violate one of the conditions of this study, which is to use a NetTIN as the only source of data in this thesis.

Mark (1975) next describes Strahler's (1958) ruggedness number, which is defined as local relief multiplied by the drainage density. Mark shows that this value is equal to half of the mean slope, and thus is of no additional value.

The roughness factor is a localized measure of the amount of variability in a surface. Mark (1975) describes an implementation of the concept by Hobson (1972), which makes use of the normal vectors of regularly spaced triangular facets (apparently a precursor to TINs). He comments that using irregularly spaced triangles would be more appropriate, since the algorithm could make use of the area of the triangles (Mark, 1975). I will use this modified approach to calculate the roughness factor on a NetTIN.
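One way to picture an area-weighted roughness measure built from triangle normals is sketched below. It is an illustration only and should not be read as Hobson's formula or as the implementation used in this thesis: the function names are hypothetical, and the choice to report one minus the area-weighted vector strength (so that a perfectly planar neighbourhood scores 0) is an assumption.

```python
import math

def area_vector(p1, p2, p3):
    """Upward-pointing normal of a triangle, scaled to the triangle's 3-D area."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    if cz < 0:  # flip so the normal points skyward
        cx, cy, cz = -cx, -cy, -cz
    return cx / 2.0, cy / 2.0, cz / 2.0  # half the cross product has length = area

def roughness(triangles):
    """Assumed measure: 1 - |sum of area vectors| / (sum of areas).

    Returns 0 when all facets are parallel and grows as the normals scatter.
    """
    sx = sy = sz = total_area = 0.0
    for tri in triangles:
        ax, ay, az = area_vector(*tri)
        sx, sy, sz = sx + ax, sy + ay, sz + az
        total_area += math.sqrt(ax * ax + ay * ay + az * az)
    return 1.0 - math.sqrt(sx * sx + sy * sy + sz * sz) / total_area

# Hypothetical neighbourhoods: two coplanar facets, and two facets forming a ridge.
flat = [((0, 0, 0), (1, 0, 0), (0, 1, 0)), ((1, 0, 0), (1, 1, 0), (0, 1, 0))]
ridge = [((0, 0, 0), (1, 0, 1), (0, 1, 0)), ((2, 0, 0), (1, 0, 1), (2, 1, 0))]
flat_roughness = roughness(flat)
ridge_roughness = roughness(ridge)
```

Weighting by area means that a few large, gently sloping triangles are not outvoted by many small ones, which is the advantage Mark attributes to irregularly spaced triangles.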
Because of the Delaunay triangulation process, the area of each triangle is inversely proportional to the complexity of the land surface represented. We can thus use triangle area as an accurate measure of surface complexity. Triangle area was added to my chosen morphometric variables after the Principal Component Analysis was performed, because it seemed to be a much more intuitive measure of slope complexity than the roughness factor.

Peak density is "the number of closed hilltop contours per unit area" (Mark, 1975, p. 167). This might be a useful morphometric variable to help distinguish between drumlins and kames, each of which define a peak, and eskers, which are more ridge-like. Although one of the constraints of this project is to avoid the use of contours, we can replace closed hilltop contours with peaks, which are implicitly defined in the NetTIN. A peak is defined as a triangle vertex that has a height greater than all directly adjoining vertices.

Conversely, ridginess, the total length of ridges per unit area (Speight, 1968, quoted in Mark, 1975), will emphasize the ridge lines of eskers, and possibly the elongated crests of drumlins, but not the conical peaks of kames.

The last variable mentioned is reticulation, which Speight (1968, p. 248 reported in Mark, 1975) defines as the size of "the largest connected network of crests that projected into a sample area." In other words, the crests of landforms often form a network of ridges. Within any particular neighbourhood, there may be one or more networks. The total length of the largest network of ridges that enters a particular neighbourhood is its reticulation. This morphometric variable might be very useful in dissected landscapes, but in my study area, where the landforms are relatively smooth, it is unclear what this variable would show that ridginess does not.

Yokoyama et al. (2002) have defined two new morphometric variables, which they call "positive openness" and "negative openness."
Openness is the degree to which an imaginary observer can see the sky above or the land surface below. As an example, the top of a mountain has an unrestricted view of the sky, so it has a high positive openness value, whereas the bottom of a depression has a view mostly of the ground, and thus would have a high negative openness value. In studies of formerly glaciated terrain, openness should help to distinguish small landforms. According to the authors, "The resulting maps of openness superficially resemble digital images of shaded relief or slope angle, but emphasize the dominant surface concavities and convexities" (Yokoyama et al., 2002, p. 257). In an area of irregular topography, the authors note: "Landforms in the vicinity of Mt. Fuji in south-central Honshu [Japan] (Figure 7) that are easily recognized from openness textures include aligned low hummocky hills formed in flank eruptions prior to 1707" (p. 260).

3.6.2.5 Drainage Variables

Since water flow plays a large part in the development of landforms, it makes sense that morphometric variables related to drainage have a part to play in the identification of landforms.

Drainage density measures the total length of stream channels per unit area (Horton, 1945, p. 283 quoted in Mark, 1975). Drainage density includes both streams that originate in an area and those that merely pass through it. Given that my study area is reasonably small and has low relief, with few high mountains in the area to induce orographic rainfall, it is probably safe to assume that rainfall is uniform across the whole area. Because of this, differences in drainage density are a product of differences in landforms, soil composition and history. Sudden drops in drainage density may indicate the presence of well-drained sands and gravels.
Unfortunately, this is not a clear-cut case, since there is at least one sinkhole in the area, indicating the likelihood of underlying limestone caves, so it is also possible that such drops are the result of streams entering cave systems.

Source density is another morphometric variable that has the potential to reveal information about subsurface conditions. Source density is the number of stream sources per unit area (Mather, 1972, p. 311 quoted in Mark, 1975). For my purposes, however, the distinction between source density and drainage density is negligible. Areas that have a high source density, which might indicate a point of aquifer discharge, will likely have a high drainage density unless the water re-enters the aquifer, so there is little value in including this variable in my study.

3.6.2.6 Landform Variables

In addition to the basic variables described by Evans (1972) and Mark (1975), a number of "special purpose" morphometric variables appear to be of value in helping to identify eskers, kames, and drumlins, and to allow them to be differentiated from one another.

As mentioned in Section 3.3.1, orientation is one key to help differentiate drumlins from other landforms. Mean orientation is helpful in differentiating those features that were oriented in the direction of former ice flow, such as drumlins, and to a lesser extent, eskers, from unoriented landforms, such as kames, and landforms that are oriented in other directions.

The standard deviation of orientation should be helpful for distinguishing sinuous features, such as eskers, from strongly oriented features, such as drumlins. For this reason, this morphometric variable has been included with our standard set of variables.

The topographic component is the area surrounding a ridge, which extends downwards to the nearest stream or breakline. Simple landforms, such as kames, which are conical in shape, will be equivalent to the topographic component that extends from their crest.
More complex landforms, such as the flyggbergs in the study area, have breaklines and crests that split and coalesce, and are broken into multiple topographic components as a result.

3.7 Minimizing Variable Redundancy

Each morphometric variable is the expression of one or more pieces of underlying information. Elevation is probably the "purest" of the morphometric variables that I have examined, because it is a measurement of distance above a datum, and contains no other information. Other variables, such as drainage relief, are a combination of many types of information, including elevation, triangle size and stream elevation, which is dependent on climate and paleoclimate, the shape of the drainage basin to which it belongs and the underlying geology of the area. Since drainage relief is partially the product of elevation, it is not a "pure" variable, and some of the information that it expresses "overlaps" with elevation.

Although the morphometric variables chosen for this thesis have already been evaluated in the literature, it is likely that some of the variables, in particular the more complex ones, will overlap with each other. Principal Component Analysis (PCA) identifies those variables that overlap to the greatest degree, allowing the removal of those that overlap from the analysis. This reduces processing time and improves the quality of the analysis. Mather (1987) states that "The purpose of Principal Component Analysis is to define the number of dimensions that are present in a dataset and to fix the coefficients which specify the positions of that set of axes which point in the directions of greatest variability in the data . . . . These axes are always uncorrelated" (p. 207).

Ian S. Evans, one of the pioneers of morphometric analysis, warns of the indiscriminate use of PCA.
He writes: "it seems unlikely that principal component analysis or factor analysis will clarify the concepts involved in geomorphometry" (Evans, 1972, p. 20). One of his concerns with PCA is that it is possible to include "accidental contributions from irrelevant variables." He points out, quite correctly, that PCA is no substitute for the careful choice of morphometric variables (Evans, 1972). While this is important to consider when designing a morphometric study, PCA still has a role to play in identifying overlap between carefully chosen variables.

Although I will use PCA in this thesis, I will not apply it indiscriminately to winnow through every morphometric variable that has ever been proposed. I will use Principal Component Analysis to evaluate only those promising morphometric variables that I listed in Section 3.6.2. Since I am only attempting to identify eskers, kames and drumlins in this thesis, some of the morphometric variables identified in the literature will not be helpful in differentiating between these. Removing these unused variables will simplify the analysis, but is done at the risk of eliminating some that will help in identifying other types of landforms. While this is not critical in this thesis, classification of further landforms may require me to redo the PCA.

3.8 Classification of Training Data

There are numerous techniques for creating "clusters" of data. I broadly divide these into statistical and non-statistical approaches. The most common statistical method is the Maximum Likelihood (ML) classifier (Brown et al., 1998). The k-means clustering algorithm is another similar technique (Jones, 2003). One of the assumptions for the ML classifier is that data are normally distributed. Unfortunately, researchers often use data that violate this assumption (Foody et al., 1995; Seto and Liu, 2003).
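To make the normality assumption concrete, the following minimal sketch fits a Gaussian to each class and assigns a sample to the class with the highest likelihood. It is a simplified illustration, not the classifier used by any of the cited authors: it assumes equal class priors and treats the variables as independent (a diagonal covariance matrix), and the function names and the training values (slope in degrees, local relief in metres) are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit(samples_by_class):
    """Estimate a (mean, variance) pair per variable for every class."""
    model = {}
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))  # one tuple of values per variable
        model[label] = [(mean(col), pvariance(col)) for col in columns]
    return model

def log_likelihood(x, params):
    """Log of the Gaussian density, summed over independent variables."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (value - mu) ** 2 / (2.0 * var)
        for value, (mu, var) in zip(x, params)
    )

def classify(x, model):
    """Pick the class whose fitted Gaussian makes x most likely."""
    return max(model, key=lambda label: log_likelihood(x, model[label]))

# Hypothetical training data: (slope in degrees, local relief in metres).
training = {
    "drumlin": [(5.0, 20.0), (7.0, 25.0), (6.0, 22.0)],
    "esker": [(20.0, 10.0), (25.0, 12.0), (22.0, 9.0)],
}
model = fit(training)
```

Because each class is summarized only by a mean and a variance, a class whose values are strongly skewed or multimodal is poorly described, which is the weakness noted above.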
Nonstatistical classifiers do not assume normally distributed data, so they are a better choice when I cannot check the distribution of the incoming data. Nonstatistical classifiers can be broadly divided into rule-driven (top-down) or data-driven (bottom-up) (Meech, 2004). Rule-driven classifiers, such as Expert Systems and Fuzzy Logic, are best when the problem at hand is well known, and there are many rules available for assisting with the classification. In this thesis, however, the data-driven nonstatistical classifiers, such as Artificial Neural Networks (ANNs), show the most promise. These classifiers work particularly well when it is difficult to elaborate the rules for identifying features, but it is easy to visually identify the features in question. For landforms such as eskers, kames and drumlins, this approach works well, since I am able to identify landforms on the NetTIN (Maguire, 2003a).

A number of requirements for an ANN have become apparent during my literature search and in previous class projects. One of the main problems with many types of ANNs is the difficulty of training them. Given a particular set of inputs and desired outputs, there are hundreds of combinations of ANN type, number of layers, nodes and connections. It is not surprising to find out that the configuration and training of these ANNs is an art, with no more than a few rules of thumb to help with what can be a very difficult task (Hammerstrom, 1993b; Openshaw & Openshaw, 1997; Brown et al., 1998).

Perhaps the most important consideration is the difficulty of training ANNs to handle complex problems. While investigating the ability of a Multi-Layer Perceptron (MLP) ANN to locate crests of sand ripples in images taken in a flume, it became obvious that this type of ANN does not work well in finding the best solution to complex problems (Maguire, 2003b).
The function to describe a problem forms a "surface" in multidimensional space, and the goal of an ANN is to find the (global) minimum on that surface. Unfortunately, the MLP uses a "gradient descent" algorithm, and is only able to locate the local minimum as a result. With a complex surface that may have thousands of local minima, it is extremely unlikely that an MLP will ever find the global minimum. This means that it is necessary to train and retrain an MLP ANN until it produces an acceptable answer. For this thesis, I need an ANN that is able to find the global minimum efficiently.

Many varieties of ANN can only be trained at the beginning of a project. If additional classifications are required later, then the user must add new training data to the original, and must then retrain the entire ANN from scratch. In this thesis, I am attempting to classify eskers, kames and drumlins. In future, however, if this work is continued, a tool will be required that can be expanded to identify additional landforms. The difficulty of having to train an ANN separately for each landform is compounded by the fact that the user must train the MLP ANN multiple times to produce an acceptable result on a complex surface. These two limitations effectively preclude this thesis from making use of an MLP ANN.

Fortunately, some newer varieties of ANNs, such as Fuzzy ARTMap, dynamically define the configuration of the network. The ANN begins with the simplest possible case and increases network complexity as necessary to classify the training data. The Fuzzy ARTMap ANN (Carpenter and Grossberg, 1992) ignores local minima and learns constantly. Fuzzy ARTMap automatically clusters data according to a vigilance parameter, ρ (0.0-1.0); the higher the value for ρ, the more numerous and smaller the clusters become.
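The effect of the vigilance parameter can be seen in a minimal sketch of Fuzzy ART, the unsupervised clustering module on which Fuzzy ARTMap is built. This is a simplified illustration, not the full supervised ARTMap and not the implementation used in the Landform Classification System: it assumes inputs scaled to the range 0 to 1, complement coding, fast learning (learning rate of 1) and a small choice parameter `alpha`. The function names and the one-dimensional sample values are hypothetical, and the vigilance parameter is written `rho` in the code.

```python
def complement_code(x):
    """Append 1 - v for every input value (inputs are assumed to lie in [0, 1])."""
    return list(x) + [1.0 - v for v in x]

def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND operator."""
    return [min(p, q) for p, q in zip(a, b)]

def fuzzy_art(samples, rho, alpha=0.001):
    """Cluster samples; return (cluster index per sample, category weights)."""
    weights, labels = [], []
    for x in samples:
        i = complement_code(x)
        norm_i = sum(i)
        # Rank existing categories by the choice function |I ^ w| / (alpha + |w|).
        order = sorted(
            range(len(weights)),
            key=lambda j: sum(fuzzy_and(i, weights[j])) / (alpha + sum(weights[j])),
            reverse=True,
        )
        for j in order:
            # Vigilance test: accept the category only if the match is at least rho.
            if sum(fuzzy_and(i, weights[j])) / norm_i >= rho:
                weights[j] = fuzzy_and(i, weights[j])  # fast learning
                labels.append(j)
                break
        else:
            weights.append(i)  # no category passed the test: create a new one
            labels.append(len(weights) - 1)
    return labels, weights

# Hypothetical one-dimensional inputs forming two pairs and one outlier.
samples = [[0.10], [0.12], [0.50], [0.52], [0.90]]
cluster_counts = {rho: len(fuzzy_art(samples, rho)[1]) for rho in (0.0, 0.9, 0.99)}
```

With these sample values, a vigilance of 0.0 places everything in one cluster, 0.9 yields three clusters (the two pairs and the outlier), and 0.99 gives every sample its own cluster.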
At extremely high levels, with the vigilance parameter ρ approaching 1.0, Fuzzy ARTMap will memorize the examples provided, and will be unable to classify anything that is not an exact match to a training example. At low levels, when ρ approaches 0.0, all examples (even outliers) will be grouped into a single cluster. In between these extremes, Fuzzy ARTMap will group data into a number of clusters based on like values, and outliers will be assigned their own clusters.

Compared with the MLP ANN, Fuzzy ARTMap is more suitable for larger scale problems because of its rapid training and classification. Carpenter and Grossberg (1992) state: "Fuzzy ARTMap learns in five training epochs a benchmark that requires twenty thousand epochs for back propagation to learn (Lang and Witbrock, 1989)" (p. 760). Seto and Lui (2003) also claim that Fuzzy ARTMap produces more accurate results than does MLP.

One additional important advantage of Fuzzy ARTMap should not be overlooked: Carpenter and Grossberg (1992) claim to have solved the "black box" aspect of ANNs: "At any stage of learning, a user can translate the state of a Fuzzy ARTMap system into an algorithmic set of rules" (p. 40). The quantification of the Fuzzy ARTMap learning may lead to unexpected discoveries about the dynamics of landform development.

3.8.1 Training Method

I have designed the Landform Classification System to assist professional geomorphologists in rapidly classifying landforms over a large area. The system imitates the geomorphologist's skill at visually identifying landforms. To do this, the system displays the NetTIN, so that the geomorphologist can train the system by highlighting the areas covered by landforms. Visually identifying the landforms on a NetTIN is easier for some landforms than for others.
Although it is relatively easy to identify the larger drumlins, it is much more difficult to identify kames, eskers and the smaller drumlins in our study area. Fortunately, there is some supplemental information available that helps to identify the landforms of interest.[2]

There appears to be a continuum in the size of the drumlins in our study area. At one extreme, there are flyggbergs, which fall outside the normal definition of a drumlin because of their size. At the other extreme, there are small streamlined hills and sections of hills that do not show up on low resolution TINs (Figure 3-9). During a ground reconnaissance in the summer of 2004, we flagged every drumlin that we could see using GPS receivers. The detail in Figure 3-9 suggests that vegetation obscured many of the smaller drumlins that we missed during the GPS reconnaissance.

TRIM data contains supplemental information for eskers, which can assist geomorphologists in outlining the extent of the eskers on the NetTIN. A photogrammetrist explicitly recorded some of the larger eskers during map data collection (presumably because they were useful as breaklines) but ignored many of the smaller eskers (Figure 3-10). This was confirmed during field studies conducted during the summers of 2003 and 2004, when I found that many eskers, particularly those in esker trains, were not shown on TRIM topographic maps.

[2] This is technically a violation of the project constraints that I laid out in Section 1.6, but is necessary to ensure that the training data are as accurate as possible. Later, when the system is more refined, it should not be necessary to have perfect training data.

Figure 3-10. TRIM data does not always identify all the eskers in an esker train. The two eskers at left are from TRIM data (light blue) and the five additional eskers at right are from a GPS survey conducted in 2004 (black).
3.9 Summary

To summarize the main points of this chapter, I will use eskers, kames and drumlins to test an automated landform identification system in my study area southwest of Prince George, British Columbia. The system will use four NetTINs constructed from TRIM data. I use one of these NetTINs for training, and use the other three to test the classification accuracy. Of the 30 morphometric variables described in this chapter, I removed 8 based on the critical analyses that I have summarized. I will use the following variables in the remainder of my research:

• Elevation (Altitude)
• Slope (Gradient)
• Aspect
• Standard deviation of elevation
• Mean elevation
• Skewness of elevation
• Vertical convexity
• Horizontal convexity
• X-Coordinate
• Y-Coordinate
• Topographic component
• Mean available relief
• Drainage relief
• Roughness factor
• Triangle area
• Peak density
• Ridginess
• Positive openness
• Negative openness
• Drainage density
• Mean orientation
• Standard deviation of orientation

The Fuzzy ARTMap ANN creates clusters of morphometric variables to represent the three chosen landforms. Because of the characteristics of Fuzzy ARTMap, training should be relatively easy, and, in future, we can add additional landforms to the system, if required. The system will compare the triangles that it has automatically classified with manually classified triangles to determine the accuracy of the classification. The system will produce an Error Matrix, calculate Producer's and User's accuracies, and perform Kappa Analysis to allow me to examine the accuracy of the classification.

This literature search has helped to answer many questions about the directions that my thesis research should take. In the next chapter, I will examine the morphometric variables in much greater detail. I will discuss how other researchers have calculated these variables using DEMs, and I will show how NetTINs can calculate these variables.
I will subject these variables to Principal Component Analysis in order to remove those that overlap, and to identify which variables are the best for identifying each landform.

4 Morphometric Variables and Principal Component Analysis

The often implicit belief, that factor or principal components analysis of as many variables as can easily be measured, will solve problems of variable redundancy, runs counter to the first law of computer science: "Garbage In, Garbage Out." Many of the operational definitions used in geomorphometry are extremely poor as measures of the intended concepts (Evans, 1972, p. 17).

Morphometric variables isolate and describe particular characteristics of the Earth's surface. With a detailed Digital Terrain Model, such as a NetTIN, a computer can extract many different types of information about the shape of the land surface. Although different authors have proposed dozens of morphometric variables over the past several decades, simply programming them all into the LCS would not be productive. Some of the variables represent essentially the same data; for example, maps of "elevation" and "mean elevation" appear very similar. These variables overlap partially; complete overlap would occur only if I analyzed two exact copies of the same morphometric variable.

The approach taken in this thesis is to carefully choose those morphometric variables most recommended in the literature (see Chapter 3). The topographic component and triangle area were added to the list of variables after the Principal Component Analysis (PCA) had been run, so they were not included in it. I use PCA to identify and remove any overlapping variables from the list at the end of Chapter 3 prior to further analysis.
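To show how PCA exposes overlapping variables, the following is a minimal sketch using NumPy rather than SPSS, which was used for the actual analysis. The data are synthetic and the variable values are invented for illustration: "mean_elevation" is built as a near copy of "elevation", while "slope" is unrelated. A principal component that explains almost none of the variance signals that one of the variables is redundant.

```python
import numpy as np

# Synthetic, hypothetical data: these are not values from the thesis.
rng = np.random.default_rng(0)
n = 500
elevation = rng.normal(900.0, 50.0, n)
mean_elevation = elevation + rng.normal(0.0, 2.0, n)  # nearly a copy of elevation
slope = rng.normal(5.0, 2.0, n)                        # unrelated variable

names = ["elevation", "mean_elevation", "slope"]
data = np.column_stack([elevation, mean_elevation, slope])

# PCA on the correlation matrix, so that each variable is standardized.
corr = np.corrcoef(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# eigh returns ascending eigenvalues; reorder from largest to smallest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
explained = eigenvalues / eigenvalues.sum()

for k, share in enumerate(explained, start=1):
    loadings = ", ".join(f"{nm}={v:+.2f}" for nm, v in zip(names, eigenvectors[:, k - 1]))
    print(f"PC{k}: {share:.1%} of variance ({loadings})")
```

In this sketch the last component carries a negligible share of the variance and loads with opposite signs on the two elevation variables, which is the pattern that would justify dropping one of them.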
To isolate those variables that are important for the identification of particular landforms, I used the LCS to identify and isolate the triangles belonging to eskers, kames and drumlins. The system made use of PCA to separately examine each set of triangles to identify the variables of greatest importance for each landform. In addition, the system examined the morphometric variables in the entire TIN model to determine whether they helped to identify particular landforms.

In this chapter, I first discuss some of the peculiarities of the NetTIN data structure, and how they affect morphometric analysis. Next, I describe morphometric variables in general terms, and present the PCA results from SPSS (SPSS, 2002). Finally, I describe the shortlisted morphometric variables that I will use for landform identification.

4.1 Implications of the TIN Data Structure

One of the first major decisions in this project was to make use of a NetTIN data structure. The GIS industry often uses TINs as an "alternative" method of creating three-dimensional models of the land surface, and frequently uses them to create slope and aspect maps. For more involved morphometric analysis, however, GIS practitioners most frequently use DEMs, which are easier to analyze. In this study, I make use of NetTINs, which contain a TIN data structure together with a stream network.

Converting a NetTIN into a DEM unfortunately involves throwing away all of the advantages of NetTINs, including the better representation of the land surface and the more compact data storage that is associated with NetTINs. It makes more sense to create ways to adapt morphometric calculations to the NetTIN data structure, in order to ensure that the resulting morphometric variables maintain the accuracy levels found in the NetTIN.
Even a decade ago, performing morphometric analysis directly on NetTINs would have been very difficult, and analysis on DEMs was really the only practical option. Today, this is no longer a valid concern, given the speed of today's computers.

4.1.1 Methods for Analyzing and Processing TIN Data

There are a number of ways to implement the morphometric analysis of NetTINs. Each method has implications for the speed of computation, the amount of storage space required and the types of calculations that are geometrically possible. For a NetTIN data model, there are three ways of looking at space: first, as a series of points; second, as a series of regions surrounding the points; and third, as a series of triangles connecting the points (Figure 4-1).

Figure 4-1. Point, Voronoi and Triangular Data Structures. In (a), each point has an X, Y and Z coordinate, but the slope needs to be dynamically calculated for each point. In (b), the X, Y, and Z coordinates of the points are preserved, and the mean slope can be calculated for the area nearest to each point. In (c) the X, Y, and Z coordinates are still preserved, and the mean slope is calculated for the triangular faces that are created between the points.

The first way of dividing up space is to focus only on the points that make up the triangles in the NetTIN (TRIM mass and breakline points), and ignore the space between them. The advantage of this is that point measurements, such as elevation, are stored directly in the NetTIN, with their original precision and accuracy. This method makes no assumptions about what exists in the unmeasured space between points. However, areas, not points, are the basis of most morphometric variables, so to compute these, this method must interpolate the data for the areas between the points. This creates a large computational burden during the calculation of areal morphometric variables.
The solution to this problem is to interpolate the data for the areas between the points, and permanently store these data for future use. Such an interpolation process creates a tessellation, the complete division of all of the space covered by the NetTIN. A tessellation simplifies the conversion of morphometric calculations from those applied to regularly gridded DEMs, since, in both cases, I am working with areas. The second and third techniques for dividing up the map are the two types of tessellations that come naturally out of the NetTIN data structure: Voronoi polygons and triangles.

Voronoi polygons, the second technique, divide the map into regions centred on each point (Figure 4-1b). The Voronoi polygons that result are the topological duals of the triangles used in the NetTIN. Unfortunately, since triangles are the basis for the NetTIN data structure, this process must assemble the values from many triangles in order to obtain the values for the area within the Voronoi polygon. This results in an averaging of many values and a dilution of the accuracy that is inherent in the NetTIN. In addition, the process of converting data from a triangular tessellation to a Voronoi tessellation creates large computing overheads.

The third possible technique simply makes use of the existing NetTIN triangles. This has the advantage of providing complete coverage of the area within the NetTIN, and uses predefined procedures for calculating simple morphometric variables such as slope and aspect. One small disadvantage is that this technique must use the centroids of the triangles for elevation, which is a point-based measurement. This forces the use of interpolation to obtain the Z values, which introduces rounding error. One of the results of the Delaunay triangulation process used to create NetTINs is that the triangles represent areas of uniform slope.
There is no uniform slope when we divide the surface using points or Voronoi polygons. The advantage of having triangles with uniform slope is that it allows the normal vector to the triangle to be calculated, which is an important step in the calculation of aspect, slope and downhill direction (Martin, 2004).

In the LCS, I have chosen to use the NetTIN triangles because they are the easiest structure to derive, have the least computing overhead, preserve much of the original data quality, and are the most appropriate way to calculate areal morphometric variables. NetTIN data structures are robust, can represent any type of terrain including overhangs and are a well-established technology in the GIS industry.

4.1.2 Neighbourhood Analysis

Regularly gridded DEMs are simpler to manipulate than NetTIN data structures. In particular, it is very easy to pick a 3x3 or 5x5 neighbourhood window around a central pixel in a DEM. Identifying neighbourhood pixels involves calculating a simple offset in X and Y from the central pixel. The program can repeat this calculation for each pixel in a DEM, allowing the extension of neighbourhood calculations to the entire DEM.

The DEM creation algorithm assigns a single value to the entire pixel, resulting in the exaggeration of that value's importance. In addition, fixed 3x3 or 5x5 neighbourhoods assume that we know nothing about the complexity of the data in that neighbourhood. Since the size of each triangle in a NetTIN is inversely proportional to the complexity of the underlying terrain, the Triangle Neighbours algorithm creates neighbourhoods that encompass areas of similar complexity. In other words, a first order neighbourhood around one triangle may be twice the area of a first order neighbourhood around another triangle, but the total amount of variation within each neighbourhood will be approximately the same. This results in the preservation of spatial properties, particularly spatial autocorrelation.
Although the calculations are more involved in creating neighbours for a NetTIN than for a DEM, precalculating the first order neighbours for each triangle in the NetTIN minimizes the total amount of processing. Although the precalculation is time-consuming (about 1.5 hours for 17,000 triangles on a Sun server), once this step is complete, it takes only a few seconds to apply the resulting index to a morphometric variable to obtain all of the neighbouring values for each triangle in a NetTIN.

The structure of the first order neighbourhood indicates the identity of the neighbouring triangles for each triangle in the NetTIN (see Table 4-1). Once it has identified the first order neighbours, the algorithm uses this information to assemble the values from the neighbourhood. The system can then run statistical functions on the assembled values to calculate morphometric variables. For example, in Table 4-1c, running the mean function on the assembled slope values for triangle 1 (2.1, 0.5 and 1.0) produces a mean slope of 1.2.

a) Original table of slopes, one for each triangle in the NetTIN.

Index   Slope (%)
1       1.2
2       2.1
3       1.0
4       0.5
5       1.3

b) The first order neighbours show which triangles are neighbours to which others. In this example, triangles 2, 4 and 3 are neighbours to triangle 1.

Index   Neighbours
1       2, 4, 3
2       1, 4, 3, 5
3       5, 2, 1, 4
4       2, 1, 3, 5
5       2, 4, 1

c) The indices in the first order neighbours are used to pick slope values for the neighbouring triangles.

Index   Neighbour Slopes (%)
1       2.1, 0.5, 1.0
2       1.2, 0.5, 1.0, 1.3
3       1.3, 2.1, 1.2, 0.5
4       2.1, 1.2, 1.0, 1.3
5       2.1, 0.5, 1.2

Table 4-1. How neighbourhoods are applied to triangle values. This is a very simplified example; in reality, each triangle would have a minimum of six first order neighbours.

The NetTIN neighbours that join at an edge can easily be separated from those that join at a vertex (Figure 4-2).
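The lookup illustrated in Table 4-1 can be sketched in a few lines. This is a minimal illustration, not the Cause&Effect implementation: the dictionaries hold the example values from Table 4-1, and the helper names gather and neighbourhood_stat are hypothetical.

```python
from statistics import mean

# Example values from Table 4-1: slope (%) per triangle and the
# precalculated first order neighbours of each triangle.
slope = {1: 1.2, 2: 2.1, 3: 1.0, 4: 0.5, 5: 1.3}
neighbours = {
    1: [2, 4, 3],
    2: [1, 4, 3, 5],
    3: [5, 2, 1, 4],
    4: [2, 1, 3, 5],
    5: [2, 4, 1],
}

def gather(values, index):
    """Assemble, for every triangle, the values of its neighbouring triangles."""
    return {tri: [values[n] for n in ns] for tri, ns in index.items()}

def neighbourhood_stat(values, index, func):
    """Apply a statistical function to each assembled neighbourhood."""
    return {tri: func(vals) for tri, vals in gather(values, index).items()}

mean_slope = neighbourhood_stat(slope, neighbours, mean)
print(gather(slope, neighbours)[1])   # neighbour slopes of triangle 1
print(round(mean_slope[1], 1))        # mean neighbour slope of triangle 1
```

Because the neighbour index is built once and reused, the same two helpers could be applied to any other per-triangle variable, or with a different statistical function, without recomputing the topology.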
Those triangles that join along an edge share a common border, which means that their centres are closer than those that join at a vertex. Spatial autocorrelation thus suggests that these triangles will have more in common than those triangles that adjoin the triangle only at the vertices. In the NetTIN (Figure 4-2a), the algorithm can identify those triangles that join along an edge (area 2, green) as well as those that join only at a vertex (area 3, yellow). While it is relatively easy to make the same differentiation on a DEM, square neighbourhoods are nearly always used. In Figure 4-2b, note that only the 3x3 and 5x5 neighbours of the centre pixel (area 1, blue) are generally calculated. Table 4-2 describes the relationships in Figure 4-2a.

I have formalized the neighbourhood order shown in Table 4-2 into the concept of full and half neighbours. Half neighbours adjoin the centre triangle at its edges and full neighbours adjoin at the edges and vertices. The half and full neighbour naming convention can then be extended to allow additional layers of triangles to be added to the outside of the neighbourhood.

Figure 4-2. Analogous neighbours of a triangle in a TIN (a) and of a pixel in a regularly gridded DEM (b). Note: Although the TIN neighbourhood in Figure 4-2a is shown as a square area for clarity, in practice the TIN should be created from points of significant elevation and breaks of slope, not regularly spaced points. See Table 4-2 for an explanation of the colours and numbers used.

Neighbourhood Order: Description
  0.0 (Area 1, Blue): The centre triangle only.
  0.5 (Area 2, Green): Half neighbours. All triangles that adjoin the centre triangle at an edge (always 3 for a TIN).
  1.0 (Areas 2, 3, Green & Yellow): Full neighbours. The full neighbours are composed of the half neighbours shown above, which join at an edge, plus all neighbours that join at the vertices (at least 6 for a TIN).
  1.5 (Areas 2, 3, & 4, Green, Yellow & Orange): The full neighbours, as described above, with the half neighbours of the outside triangles included.
  2.0 (Areas 2, 3, 4, & 5, Green, Yellow, Orange & Red): The full neighbours, with the full neighbours of each outer triangle joined in.

Table 4-2. Description of neighbours in Figure 4-2a.

4.1.2.1 Triangle Neighbours

In Cause&Effect, the software package in which I am developing the Landform Classification System, NetTINs are stored using the quad-edge data structure (Guibas and Stolfi, 1985). Guibas and Stolfi designed this data structure for three-dimensional computer graphics, in particular the display of polygons in three-dimensional space. It also works very well for the analysis of triangles in NetTINs. In a quad-edge data structure, all polygons are broken down into their constituent edges. Each edge has four pieces of information associated with it: the to-node, the left polygon, the from-node and the right polygon. With this information and the location of the vertices in three-dimensional space, it is possible to construct the topology for the triangles in a NetTIN.

Using the quad-edge data structure enables the half neighbours of each triangle to be calculated. Once a particular triangle is selected, for example triangle 162 in Figure 4-3, the edges in which 162 occurs as either the left or right triangle can quickly be identified, which tells us that edges 38, 54 and 86 make up triangle 162. Taking the left and right triangles for each of these edges, and removing the duplicates and polygon 162 from the combined list, tells us that triangles 183, 32 and 56 are the half neighbours of 162.

Figure 4-3. Triangle 162 is constructed using a quad-edge data structure.
Edges 38, 54 and 86 form the edges of the triangle. Edge 54 has a from-node (F = 82), a to-node (T = 154), as well as a left (L = 32) and right (R = 162) polygon (measured when facing the to-node). Dashed lines show the edges of adjoining triangles (based on Guibas and Stolfi, 1985).

We can extend this procedure to identify the full neighbours (those that adjoin the triangle of interest at either an edge or a vertex). By obtaining a list of all of the from-nodes and to-nodes for the edges that make up triangle 162, the algorithm can identify that it uses nodes 44, 82 and 154 (see Figure 4-3). Other triangles join 162 at these vertices; by finding all of the edges that use these nodes and the triangles that use these edges, we obtain a list of all triangles that join triangle 162 at a node.

Since the quad-edge data structure cannot store triangle record numbers sequentially, the record number for each triangle needs to be adjusted once the system has identified all of the half and full neighbours for each triangle in a NetTIN. In addition, the NetTIN contains a number of triangles on the outside, which create a convex hull around the data points; this is a requirement of the NetTIN data structure. Both of these changes are required to number the Triangle Neighbours in the same way as other Cause&Effect functions used in the LCS.

4.1.2.2 Convexity Neighbours

Horizontal and vertical convexities require a modified version of the Triangle Neighbours mentioned above. They require information about the "left," "right," "up," and "down" neighbours of each triangle (Figure 4-4). If I determine the aspect of each triangle and then draw a line in that direction to the next adjoining triangle, that triangle is the neighbour in the "down" direction. I can determine the "right," "up," and "left" neighbours by rotating this line 90°, 180°, and 270° (see Figure 3-8).
This algorithm uses the same "left" and "right" terminology as does the Triangle Neighbours algorithm, above, but the meanings are different. The Convexity Neighbours algorithm always identifies the left and right triangles using the assumption that the observer is facing downhill. The "up" and "down" neighbour triangles are used to calculate vertical convexity, and the "left" and "right" neighbours are used to calculate horizontal convexity.

Figure 4-4. Horizontal convexity is measured along the slope, with "left" and "right" triangles being compared, and vertical convexity is measured along the line of steepest descent, with "up" and "down" triangles being compared. Here (a) and (d) are concave and (b) and (c) are convex.

4.1.2.3 Topographic Neighbours

Topographic Neighbours are similar in implementation to the Triangle Neighbours described in Table 4-1, except that the neighbourhoods are much larger in area. The areas between the ridges and streams in the NetTIN form regions. These ridge-centric areas are the Topographic Neighbours. Thus, for simple conical hills such as kames, a single topographic neighbourhood covers the entire landform. The system may divide more complex landforms, which are composed of multiple ridges, into several topographic components. This variable was added in an effort to "inform" the ANN about which triangles were in the same area, in order to help agglomerate the patchwork of classified triangles in the raw ANN output into more cohesive regions.

4.2 Morphometric Analysis

The use of a NetTIN data structure and the novel way in which I have calculated neighbourhoods from it have required me to modify the published morphometric algorithms. This section briefly discusses the algorithms for the morphometric variables used in the Landform Classification System.

4.2.1 Elevation

Direct measurements of elevation are the basis of NetTIN construction.
The Delaunay triangulation process used in the construction of the NetTIN discards redundant elevation measurements (van Kreveld, 1997), and creates a triangulated network out of the remaining points. The algorithm chooses points based on their importance in representing the topography of an area. The triangulation process removes the data points that do not contribute much to the overall shape of the NetTIN, so the resulting triangles represent areas of roughly uniform slope and aspect. To calculate elevation, the algorithm must first interpolate each triangle's centroid from its three vertices. The program then extracts the Z coordinate from the centroid to provide an elevation value for the triangle as a whole. Elevation is measured in metres above Mean Sea Level.

4.2.2 Slope

Slope is a measure of the steepness of the land surface. The procedure for creating a NetTIN results in triangles that represent areas of nearly uniform slope and aspect, as discussed above. Slope can be determined by simply examining the orientation of the triangles in a NetTIN. The Cause&Effect path_down_to_net (<Tin>) command calculates a flow path from the centre of each triangle downhill to the nearest watercourse. The first segment of the flow path, by definition, extends along the surface of the triangle from the centre to the edge. The algorithm uses the start and end points of these segments to calculate the amount of rise and run, and by dividing rise by run, it calculates the slope of the triangle. Slope is measured in degrees, from 0° (flat) to 90° (vertical).

4.2.3 Aspect

Like slope, aspect is relatively easy to extract from each triangle in a NetTIN. First, the algorithm extracts a list of triangles from the NetTIN. The Cause&Effect polygon_unit_normals (**<Point>) command uses these to produce 1 m long vectors that are normal to the surface of each triangle.
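The per-triangle quantities described so far (centroid elevation, the unit normal, and the slope and aspect that follow from it) can be sketched with plain vector arithmetic. This is an illustration only, not the Cause&Effect commands: the function name is hypothetical, the coordinate convention (X east, Y north, Z up) is an assumption, and slope is taken from the normal vector rather than from the rise and run of the first flow-path segment, which gives the same angle for a planar triangle.

```python
import math
import numpy as np

def triangle_metrics(p0, p1, p2):
    """Return (centroid elevation, slope in degrees, aspect bearing in degrees).

    Assumes X = east, Y = north, Z = up (an assumed convention).
    """
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))

    # Elevation: Z coordinate of the centroid of the three vertices.
    centroid = (p0 + p1 + p2) / 3.0

    # Unit normal, flipped if necessary so that it points upward.
    normal = np.cross(p1 - p0, p2 - p0)
    if normal[2] < 0:
        normal = -normal
    normal = normal / np.linalg.norm(normal)

    # Slope: angle between the normal and the vertical, 0 (flat) to 90 (vertical).
    slope_deg = math.degrees(math.acos(min(1.0, normal[2])))

    # Aspect: direction of the normal in the X-Y plane, as a bearing
    # measured clockwise from north.
    aspect_deg = math.degrees(math.atan2(normal[0], normal[1])) % 360.0
    return centroid[2], slope_deg, aspect_deg

# A triangle lying in the plane z = y: it rises to the north, so it faces south.
elev, slope_deg, aspect_deg = triangle_metrics((0, 0, 0), (1, 0, 0), (0, 1, 1))
print(elev, slope_deg, aspect_deg)
```

For the example triangle the sketch gives a slope of 45° and an aspect of 180° (facing due south), with a centroid elevation of one third of a unit.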
The direction of the normal in the X-Y plane, which is equal to the aspect of each triangle, is determined for each normal vector. The LCS measures aspect as a bearing (0-359°) clockwise from True North.

4.2.4 Standard Deviation of Elevation

The standard deviation of elevation calculation makes use of the first order Triangle Neighbours to obtain all of the elevations for the triangles in the neighbourhood around each triangle. The sdev (**0.0) function calculates the standard deviation of these values, and assigns the result back to the centre triangle.

4.2.5 Mean Elevation

The algorithm for mean elevation organizes the elevations using the first order neighbours of each triangle in the NetTIN. It then applies the mean (**0.0) function to obtain the mean elevation of all the neighbours, and assigns this value back to the centre triangle in each neighbourhood. Mean elevation is measured in metres above mean sea level.

4.2.6 Skewness of Elevation

The algorithm for skewness of elevation uses the first order neighbours to return all of the elevation values in each neighbourhood. The algorithm uses the skew (**0.0) function to return the skewness of the elevation distribution within each neighbourhood.

4.2.7 Horizontal Convexity

The calculation of horizontal and vertical convexity in the LCS uses a highly modified version of the algorithm proposed by Blaszczynski (1997). Instead of using the nine neighbouring cells in a DEM, it uses the Convexity Neighbours for the NetTIN, which I described in Section 4.1.2.2.

Horizontal and vertical convexity calculations are a little difficult in the LCS. The majority of morphometric variables are based on area, so the use of the NetTIN triangles, which are areal features, works well. However, convexity functions, which use the values of adjacent triangles, create some issues.
For horizontal convexity, the algorithm must combine two values: the angle between the triangle and the "left" triangle, and the angle between the triangle and the "right" triangle. Of course, there can only be a single value for the horizontal convexity of each triangle. A number of options were available to address this discrepancy. These included:

• Subtracting one value from the other
• Taking the mean of the two values
• Picking the higher or the lower value of the two
• Systematically discarding one of the convexity values

The same problem exists for vertical convexity, in that there are two angles formed using the central triangle and the "top" and "bottom" triangles. To combine these two angles and calculate horizontal convexity, I chose the first option and subtracted one angle from the other to obtain the difference. This, in effect, assumes that the centre triangle does not exist, and that the "left" and "right" (or "top" and "bottom") triangles are adjoining. The algorithm then assigns the difference between these angles to the centre triangle. This algorithm for horizontal convexity must compensate for angles that cross 0°, such that the difference between 350° and 20° is 30°, not 330° (Figure 4-5). Horizontal convexity is measured in degrees; concave angles range from -180° to 0°, convex angles range from 0° to 180°, and flat surfaces have an angle of 0°.

Figure 4-5. Horizontal Convexity. The triangles shown are viewed from above. A convex surface with a horizontal convexity of 30° (a) and a concave surface with a horizontal convexity of -30° (b). Illumination is from the right.

Using this method is a conservative approach that smoothes the NetTIN, since two values are always combined into one. The same would be true if we were to take the mean of the two values. Picking the lower value effectively smoothes the DEM, while picking the higher value accentuates ridges and valleys.
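The compensation for angles that cross 0° amounts to a wrapped, signed difference between two bearings. The following is a minimal sketch of that arithmetic only; the function name is hypothetical, and which neighbour supplies the first and second angle (and therefore which sign means convex) is an assumption, since the text does not spell out the ordering.

```python
def signed_angle_difference(first_deg, second_deg):
    """Return second_deg - first_deg, wrapped into the range [-180, 180).

    Which neighbour is "first" and which is "second" is an assumed ordering.
    """
    return (second_deg - first_deg + 180.0) % 360.0 - 180.0

print(signed_angle_difference(350.0, 20.0))  # crosses 0 degrees
print(signed_angle_difference(20.0, 350.0))  # same pair, opposite order
```

With bearings of 350° and 20° the sketch returns 30 rather than 330, and reversing the order returns -30, which corresponds to the convex and concave cases shown in Figure 4-5.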
Systematically choosing either the left or the right angle preserves most of the angles, but introduces a spatial offset, since a value from the edge of the triangle is being "moved" to the centre.

4.2.8 Vertical Convexity

The vertical convexity algorithm is similar to the horizontal convexity algorithm, except that it subtracts the slope of the "up" triangle from the slope of the "down" triangle. It is important to compensate for the aspect of the triangles. The algorithm must reverse the slope of the "up" triangle if the aspect of the "up" triangle is more than 90° away from the aspect of the "down" triangle. The system expresses vertical convexity in degrees; values range from -90° to 90°. Negative values indicate concave surfaces, positive values indicate convex surfaces and a value of 0° indicates a flat surface.

4.2.9 X-Coordinate

The X-Coordinate was originally included in an effort to group together features that lie in close proximity to one another. The X-Coordinate is measured in metres east of a false easting (UTM projection, Zone 10N). I removed this variable from further analysis in December 2004, when I realized that this coordinate negatively affected all classifications of map sheets other than the one used for training.

4.2.10 Y-Coordinate

As with the X-Coordinate, the Y-Coordinate was included in order to allow the Artificial Neural Network to have some "understanding" of spatial autocorrelation, by allowing features that are close together to be grouped using this variable. The Y-Coordinate is measured in metres north of the equator (UTM projection, Zone 10N). This variable was removed from analysis in December 2004, because it negatively affected all classifications of map sheets to the north and south of map sheet 093G.066, which was used for training.

4.2.11 Topographic Component

In December 2004, it became apparent that there were some problems with my attempts to classify landforms based solely on a classification of the individual triangles. Although the system created high densities of classified triangles in areas, there was no cohesiveness to the distribution; it was apparent that the ANN was treating the triangles in a statistical rather than a geographical fashion. Many unclassified triangles were present even in areas having an overwhelming majority of classified triangles, and the results were not intuitive or visually pleasing.

One way to resolve this problem is to provide the ANN with the "Topographic Components" that make up the landforms. A Topographic Component is simply a Topographic Neighbour that I have used as a morphometric variable. The algorithm gives every triangle within a particular Topographic Neighbourhood the same value. Simple conical hills such as kames consist of single topographic components, but more complex hills, which have breaklines or complex ridges, are composed of many topographic components. Using topographic components should allow the ANN to identify that the triangles in a topographic component belong to a landform, so that particular topographic components become predictors of landform existence or absence.

4.2.12 Mean Available Relief

The Landform Classification System determines mean available relief based on the Dury algorithm (1951, quoted in Mark, 1975). Mark defined mean available relief as "the average height of the land above the stream-line surface" (Mark, 1975, p. 169). This algorithm calculates mean triangle elevation using the Triangle Neighbours. In addition, the algorithm clips out all rivers within the boundaries of the neighbourhood. It then subtracts the mean triangle elevation from the mean elevation of the rivers to obtain the mean available relief within each Triangle Neighbourhood. Mean available relief is measured in metres.
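
The mean available relief calculation for a single neighbourhood can be sketched as follows. This is an illustrative Python outline, not the LCS implementation: the function name, the plain lists of elevations, and the handling of neighbourhoods without rivers are all assumptions. The sketch returns land minus river, so that the value is the height of the land above the streams in line with Mark's definition; that sign convention is also an assumption.

```python
from statistics import mean


def mean_available_relief(triangle_elevations_m, river_elevations_m):
    """Mean height (m) of the land above the streams in one neighbourhood.

    Hypothetical helper: inputs are plain lists of elevations in metres for
    the triangles and the clipped river vertices of a single neighbourhood.
    """
    if not triangle_elevations_m or not river_elevations_m:
        # Assumption: no value is produced where either list is empty.
        return None
    return mean(triangle_elevations_m) - mean(river_elevations_m)


# Three triangles averaging 720 m above two river vertices averaging 702 m.
print(mean_available_relief([710.0, 720.0, 730.0], [700.0, 704.0]))  # 18.0
```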

4.2.13 Drainage Relief

Drainage relief is similar to mean available relief, except that instead of subtracting the mean triangle elevation from the mean river elevation within a neighbourhood, the algorithm subtracts the mean elevation of all ridges within the neighbourhood from the mean river elevation within that neighbourhood. This results in higher elevation differences than those created by the mean available relief variable. The Cause&Effect ridge_edges (<Tin>) command identifies all ridges (i.e. where triangles meet at a sharp edge and point downslope in different directions) in a NetTIN. The algorithm next calculates the elevation difference between the ridges and the bottom of the streams for each triangle on the surface and takes the mean. Drainage relief is measured in metres.

4.2.14 Roughness Factor

The roughness factor is the density of elevation changes in an area. The LCS calculates the roughness factor using Hobson's method (Mark, 1975). With this method, the algorithm creates unit normal vectors for each triangle in the NetTIN using the Cause&Effect polygon_unit_normals (**<Point>) function. The program scales each normal vector by the area of each triangle, and then sums the scaled normal vectors within each first order neighbourhood. It then divides these values by the sums of all of the vectors in the NetTIN, and converts the results to percentages.

4.2.15 Triangle Area

In the LCS, the NetTIN data structure simplifies calculating triangle area because most of the necessary calculations were performed during NetTIN creation. The algorithm determines triangle area by using the Cause&Effect polygon_area (*<Polygon>) command to determine the area of each polygon in the NetTIN. Triangle area is measured in square metres.

4.2.16 Peak Density

The LCS calculates peak density with the peaks (<Tin>) function, which identifies all of the peaks on the map sheet.
The function defines a peak as any triangle vertex on the map that has an elevation that is greater than or equal to the elevation of all vertices with which it is connected. The algorithm selects and counts the peaks within each triangle's neighbourhood. It then divides the count by the total area of the neighbourhood, which is determined by summing the area of all triangles within the first order neighbourhoods, to calculate the peak density in peaks per hectare.

4.2.17 Ridginess

The system calculates ridginess by dividing the length of ridges by the area of the neighbourhood. The Cause&Effect ridge_edges (<Tin>) command returns all ridges within the NetTIN. The algorithm then calculates the area within each neighbourhood by summing the areas of all the triangles. Ridginess is measured as the length of ridges (in metres) per hectare.

4.2.18 Positive Openness

The LCS calculates both positive and negative openness following Yokoyama et al. (2002). Their approach involves extending lines out a set distance from each triangle centre at 45-degree increments. For each triangle, the algorithm drapes eight 1000m long radiating lines over the surface of the TIN, which allows the angle above the horizon to be calculated for each point created along the lines. The maximum angle above the horizon is determined for each of the eight radiating lines, and the algorithm averages these values to give the positive openness. Positive openness expresses the average zenith angle; it describes the average amount of sky that is visible from the visible horizon to the zenith. Positive openness is measured in degrees.

4.2.19 Negative Openness

The LCS calculates negative openness in the same fashion as positive openness; the system takes the mean of each radiating line's nadir angle (the maximum angle below the horizon).
The algorithm experiences an edge effect, which seems to be the result of the radiating lines near the edge "dropping off" the TIN. Negative openness expresses the average amount of ground visible from the nadir to the visible horizon, and is measured in degrees.

4.2.20 Drainage Density

The system calculates drainage density by first summing the total length of rivers within each neighbourhood and then dividing this value by the area of all the triangles within the neighbourhood. Double-line rivers, such as the Fraser River on sheet 093G.067, have each of their sides counted as a separate river. This is not presently a critical problem, but I will need to address it in future if I expand the system to classify fluvial landforms. Drainage density is expressed as metres of river per hectare.

4.2.21 Mean Orientation

Mean orientation, according to the literature, refers specifically to entire landforms. In the LCS, I make use of the Topographic Neighbours instead of the outline of the entire landform. Mean orientation is relatively simple in concept, but its implementation is less obvious. Because bearings range from 0° to 360°, it is difficult to obtain meaningful statistics on angles. Batschelet (1981) describes how to obtain meaningful statistics for angles, where the mean of 358° and 2° is not 180°, but 0°. The technique involves creating a unit vector for the particular angle, summing the vectors, and then determining the angle that connects the start and end points. This approach was used to create two functions: circular_mean (**0.0) and circular_SD (**0.0) (standard deviation).

4.2.22 Standard Deviation of Orientation

The algorithm for the standard deviation of orientation uses the circular_SD (**0.0) function described above to calculate the standard deviation of the orientation for all triangles within a Topographic Neighbourhood, and returns this value to the central triangle.
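
The unit-vector technique described in Section 4.2.21 can be sketched as follows. This is a minimal Python illustration, not the Cause&Effect circular_mean and circular_SD functions themselves. The spread measure shown is Batschelet's angular deviation, sqrt(2(1 - r)), which is one candidate; the thesis does not state which formula circular_SD uses, so that choice and the name angular_deviation are assumptions.

```python
import math


def circular_mean(angles_deg):
    """Mean direction in degrees, from the sum of the unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 gives the direction of the summed vector; wrap it to 0..360.
    return math.degrees(math.atan2(s, c)) % 360.0


def angular_deviation(angles_deg):
    """Batschelet's angular deviation, sqrt(2 * (1 - r)), in degrees."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    r = math.hypot(s, c) / n  # mean resultant length, between 0 and 1
    # max() guards against a tiny negative value caused by rounding.
    return math.degrees(math.sqrt(max(0.0, 2.0 * (1.0 - r))))


bearings = [358.0, 2.0]
print(circular_mean(bearings))      # within rounding of 0 (or 360), not 180
print(angular_deviation(bearings))  # small, since the bearings nearly agree
```

Because of floating-point rounding, the mean of 358° and 2° may print as a value a hair away from 0° or 360°; both describe the same direction.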
The system should calculate the standard deviation of orientation for entire landforms, but calculating it within a Topographic Neighbourhood is a reasonable strategy for use in the LCS.

4.3 Principal Component Analysis

Principal Component Analysis (PCA) is a multivariate statistical technique that simultaneously examines the distributions of multiple sets of data. It creates new synthetic variables, can reduce the number of variables involved in an analysis or allow for the examination of data and identification of clusters (Shaw and Wheeler, 1985). In this thesis, PCA helps me to identify which variables significantly overlap, so that only one of the overlapping variables can be included in the analysis. An added benefit of PCA is that it helps us to understand the data, which is a prerequisite to successful analysis of the data using ANNs (Meech, 2003). Another technique to help evaluate which variables are best is Discriminant Analysis, the use of which is described in Section 7.2.3.2.

PCA creates a series of synthetic variables known as eigenvectors that best explain the distribution of a series of data points in multidimensional space (Figure 4-6). This is similar to how a regression curve traces the best line through a plot of two variables. The first eigenvector is the best "straight line" that passes through all of the data. The variance remaining after the creation of the first eigenvector is isolated, and the PCA algorithm creates a second eigenvector to explain it. It creates successive eigenvectors until it has explained a majority of the variance, and what then remains we can consider to be measurement error. Because PCA begins by extracting those eigenvectors with the most variance, it favours those input variables that have the most variance (SAS Institute, 2005).

Figure 4-6. How eigenvectors characterize the distribution of data.
The first eigenvector (a) shows the trend for a majority of the data. The second eigenvector (b) shows the trend for much of what remains; and the third eigenvector (c) shows the trend for the data points that were not well explained by the first or second. Eigenvectors are commonly created in multi-dimensional space, with one dimension for each variable in the PCA (after Shaw and Wheeler, 1985).

Each morphometric variable will have some level of correlation with each of the eigenvectors that the PCA has created. A component matrix shows the correlation strength between each variable and eigenvector. Variables that are related to an eigenvector will have positive values, inverse relations will have negative values and variables with no relationship to an eigenvector will have values of 0.0 (Shaw and Wheeler, 1985). I am only concerned with the absolute value of the correlation when evaluating the PCA results; a negative correlation simply implies that the inverse of the variable is highly correlated with an eigenvector. Whether the sign is positive or negative, a strong correlation indicates that the eigenvector represents the data well.

This thesis uses PCA to choose which morphometric variables to keep for analysis with the Fuzzy ARTMap ANN. More than one morphometric variable may be highly correlated with a single eigenvector. If two morphometric variables are highly correlated with a particular eigenvector, then they essentially show the same information, and I remove the variable with the lower correlation value from the analysis.

I selected TRIM map sheet 093G.066, which contains 4 eskers, 1 kame and 88 drumlins, for the PCA portion of this thesis. In total, the TIN for 093G.066 contains 27,910 triangles, of which 193 form eskers, 11 form the kame, 8911 form drumlins and 18,795 form other landforms.
I used the Statistical Package for Social Sciences (Statistical Package for Social Sciences [SPSS], 2002) to load the data and run the PCA. I first used PCA to examine the entire map, and then I used it on the subsets of triangles that made up the eskers, kame and drumlins.

I evaluated component matrices for each of the four runs to identify the most significant morphometric variables in each. First, I examined each column to locate the correlation with the highest absolute value. I then matched each value with the morphometric variable that produced it. The Artificial Neural Network was then provided with these morphometric variables for analysis. Next, I removed those variables that had a significant overlap with those already selected, using an arbitrary cut-off value of 0.1. In other words, if the highest-ranking variable for an eigenvector had a correlation of 0.680, I eliminated all overlapping variables having correlations between 0.580 (0.680 - 0.1) and 0.680. Those variables that I did not explicitly select or remove from the analysis were included to round out the variables chosen.

In some cases, I excluded a morphometric variable because it overlapped another variable for an important eigenvector, even though it was the most important variable for an eigenvector of lesser importance. In cases such as this, the analysis of the more important eigenvector overrides the analysis of the less important eigenvector, and the variable remains excluded.

Tables 4-3 through 4-6 show the component matrices produced by the PCA for the entire sheet, the eskers, the kame and the drumlins, respectively. Each column represents an eigenvector; the leftmost is the most important, and the rightmost is least important. Each row shows the correlation between a single morphometric variable and each of the eigenvectors.
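
The selection rule described above can be expressed as a short routine. This is a sketch in Python under stated assumptions, not the procedure as it was actually carried out (the evaluation was done by inspecting the SPSS output). The function name classify_variables, the three labels, and the "first decision wins" handling of variables that appear under more than one eigenvector are assumptions, and the variable names and correlations in the example are made up.

```python
def classify_variables(loadings, cutoff=0.1):
    """Label each variable as 'dominant', 'overlap' or 'remaining'.

    loadings maps a variable name to its correlations with the
    eigenvectors, most important eigenvector first.
    """
    n_components = len(next(iter(loadings.values())))
    status = {}  # first decision wins: earlier eigenvectors override later ones
    for k in range(n_components):
        # Only the absolute value of each correlation matters.
        strength = {name: abs(row[k]) for name, row in loadings.items()}
        top = max(strength, key=strength.get)
        status.setdefault(top, "dominant")
        for name, value in strength.items():
            # Small tolerance so a gap of exactly `cutoff` counts as overlap.
            if name != top and strength[top] - value <= cutoff + 1e-9:
                status.setdefault(name, "overlap")
    for name in loadings:
        status.setdefault(name, "remaining")
    return status


# Made-up correlations for four hypothetical variables and two eigenvectors.
example = {
    "VAR_A": [0.680, 0.100],
    "VAR_B": [0.600, 0.700],
    "VAR_C": [-0.300, -0.650],
    "VAR_D": [0.200, 0.050],
}
print(classify_variables(example))
```

In this example VAR_B has the highest correlation with the second eigenvector, but it stays excluded because it already overlapped VAR_A on the first, more important eigenvector. VAR_D is neither selected nor removed, so it is kept to round out the set.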
The component matrices depict the highest correlation values and the morphometric variables associated with these correlations in large red text, overlapping variables (to be excluded) in black italics, and the remaining variables in green Arial font. Table 4-7 is a summary of the results for all the component matrices.

Component Matrix (% Variance = % of variance explained)

Component       1      2      3      4      5      6      7      8      9     10     11     12
% Variance 17.942 10.639  6.850  5.981  5.768  5.315  5.063  5.063  4.954  4.663  4.656  3.754
ELEV         .802   .531  -.118  -.023  -.016   .018   .067   .011   .013  -.062  -.053   .001
SLOPE        .076  -.008   .456  -.168  -.559   .260  -.120  -.001  -.115  -.082  -.006   .063
ASPECT       .132  -.071   .217  -.556   .165   .078   .180  -.064   .035   .549  -.153   .289
SDELEV       .752  -.323   .140   .093  -.022   .024  -.136  -.091  -.041   .067   .163   .111
MEANELEV     .796   .541  -.126  -.025  -.018   .020   .064  -.003   .016  -.064  -.052   .004
SKEWELEV    -.142   .161  -.256  -.257  -.140   .118   .094  -.037  -.136   .118   .858  -.065
ROUGHFAC    -.447   .563   .126   .136   .016  -.014  -.091   .007   .020   .141   .000   .086
MEANORIE     .052   .286   .030  -.213   .095  -.255  -.640   .176  -.071   .426  -.024  -.193
SDORIENT    -.139   .196   .387  -.315   .282   .019   .422   .077   .053  -.108  -.021  -.572
HORICONV     .071  -.041   .104  -.037   .079  -.123   .168   .866  -.204  -.126   .078   .331
VERTCONV    -.010  -.020   .055  -.072  -.240   .281  -.139   .297   .843   .004   .068  -.055
MEANAVRE     .680  -.255   .342   .244   .165  -.045  -.013  -.047   .068   .125   .195  -.046
DRRELIEF     .692  -.279   .280   .270   .097  -.001   .072   .019   .020   .106   .112  -.121
PEAKDENS    -.312   .324   .072   .572  -.096   .037  -.065   .116  -.031   .209   .043  -.088
RIDGINES    -.267   .363   .448   .289   .007   .114   .163   .055  -.108   .208   .052  -.072
DRDENS      -.277   .348   .278   .086   .211   .007   .146  -.296   .217  -.051   .127   .507
OPZENITH    -.006  -.028  -.104   .061  -.165  -.792   .272  -.020   .324   .130   .096   .003
OPNADIR     -.022  -.053  -.342   .133   .654   .316  -.079   .100   .150   .053   .076   .018
X           -.422  -.357   .429  -.090   .263  -.212  -.344  -.049   .028  -.275   .182   .013
Y           -.330  -.588  -.173   .181  -.164   .161   .265   .058  -.044   .396  -.086  -.059

Extraction Method: Principal Component Analysis. 12 components extracted.

Table 4-3. Principal Component Analysis Component Matrix for Entire TIN (Minimum Eigenvalue 0.8). Most significant regression values and variables are shown in large red text, overlapping variables are shown in black italics and remaining variables are in green Arial font (SPSS, 2002).

Component Matrix (% Variance = % of variance explained)

Component       1      2      3      4      5      6      7      8
% Variance 22.223 18.900 13.063  9.131  6.649  5.972  4.917  4.299
ELEV         .892  -.324   .102  -.042   .238   .025   .032   .004
SLOPE        .369   .848  -.089  -.182   .007  -.105  -.178   .097
ASPECT       .045   .444  -.106  -.710  -.140   .189  -.015   .246
SDELEV       .116   .365   .766  -.139   .084   .188   .031   .114
MEANELEV     .912  -.334   .087  -.050   .175   .027   .003  -.003
SKEWELEV    -.066   .021   .225  -.609  -.387   .134   .018  -.308
ROUGHFAC     .097   .061  -.770   .400  -.331  -.143  -.004  -.092
MEANORIE     .409   .079  -.090   .120   .246   .616   .309  -.216
SDORIENT    -.010  -.226  -.493  -.036   .462   .372  -.330  -.098
HORICONV     .152   .190   .057   .381  -.278   .555   .213   .342
VERTCONV    -.219  -.293   .237   .270   .092   .141  -.586   .409
MEANAVRE    -.002   .636   .474   .413   .092  -.024   .096  -.178
DRRELIEF    -.170   .597   .418   .408   .270  -.239   .071  -.094
PEAKDENS    -.279   .516  -.612   .165  -.003   .206  -.008  -.134
RIDGINES     .185   .070  -.407  -.095   .204  -.265   .493   .501
DRDENS       .340  -.039   .152   .325  -.665   .123  -.083   .079
OPZENITH    -.533  -.727   .184   .087   .032   .029   .197   .065
OPNADIR     -.315  -.843   .193   .019  -.101   .026   .205  -.076
X           -.943   .138  -.027  -.088   .100   .134   .066   .072
Y           -.940   .201  -.001  -.069   .098   .152   .084   .045

Extraction Method: Principal Component Analysis. 8 components extracted.

Table 4-4. Principal Component Analysis Component Matrix for Eskers (Minimum Eigenvalue 0.8). Most significant regression values and variables are shown in large red text, overlapping variables are shown in black italics and remaining variables are in green Arial font (SPSS, 2002).

Component Matrix (% Variance = % of variance explained)

Component       1      2      3      4
% Variance 50.518 31.223  9.848  7.139
ELEV         .875  -.393   .271  -.077
SLOPE        .869   .450  -.040   .194
ASPECT      -.716   .523   .359   .180
SDELEV      -.945   .099  -.306   .022
MEANELEV     .997  -.025   .072   .019
SKEWELEV     .212   .818   .462  -.251
ROUGHFAC     .173   .920  -.313   .162
MEANORIE     .892  -.184   .395   .096
SDORIENT     .96*   .055  -.088  -.249
HORICONV    -.410   .705   .568  -.016
VERTCONV     .245   .444   .545   .613
MEANAVRE    -.395   .879  -.053  -.155
DRRELIEF    -.673   .684  -.046  -.257
PEAKDENS     .325   .825  -.428   .172
RIDGINES     .510   .683  -.475   .199
DRDENS       .834  -.040  -.166  -.525
OPZENITH    -.866  -.471   .132  -.088
OPNADIR     -.895  -.428   .108   •058
X            .952  -.024   .214  -.147
Y            .186  -.749  -.235   .590

Extraction Method: Principal Component Analysis. 4 components extracted.

Table 4-5. Principal Component Analysis Component Matrix for Kames (Minimum Eigenvalue 0.8). Most significant regression values and variables are shown in large red text, overlapping variables are shown in black italics and remaining variables are in green Arial font (SPSS, 2002).

Component Matrix (% Variance = % of variance explained)

Component       1      2      3      4      5      6      7      8      9     10     11     12
% Variance 14.267 12.577 10.893  6.954  6.236  6.030  5.518  5.066  4.895  4.756  4.644  4.121
ELEV         .240   .913   .029   .066   .119  -.211   .073  -.060  -.051   .036   .011  -.022
SLOPE        .904  -.239  -.024  -.211  -.040  -.091   .003  -.027  -.028  -.037  -.019  -.060
ASPECT       .020   .016   .101  -.098   .209   .113   .629   .181   .498  -.085   .334  -.194
SDELEV       .228   .111   .629   .001  -.381  -.027  -.049   .102   .171   .176   .007   .150
MEANELEV     .234   .915  -.011   .077   .080  -.216   .100  -.066  -.057   .040   .012  -.030
SKEWELEV    -.101   .071  -.351  -.362  -.110  -.240  -.135   .170   .328   .185  -.006   .610
ROUGHFAC     .220  -.109  -.457   .589  -.116   .032  -.003   .083   .230  -.128   .067  -.034
MEANORIE     .185   .204  -.209   .312  -.167   .286  -.231  -.030   .008  -.613   .228   .250
SDORIENT     .036  -.073  -.066  -.173   .756   .249   .127  -.119  -.300  -.077   .017   .312
HORICONV     .002   .128   .120  -.068   .341  -.046  -.283   .414   .305  -.306  -.606  -.181
VERTCONV     .021  -.005   .030   .019  -.014  -.134  -.022   .820  -.463   .006   .303  -.017
MEANAVRE     .233  -.086   .680   .373   .071   .196   .033   .010  -.060   .026  -.147   .269
DRRELIEF     .208  -.047   .757   .238   .162   .091  -.117   .004   .112   .129   .138   .054
PEAKDENS     .003  -.150  -.255   .464   .174  -.219  -.254  -.088  -.085   .361  -.017  -.186
RIDGINES     .214  -.142  -.320   .374   .384  -.089  -.025   .130   .266   .281   .104   .172
DRDENS       .079  -.094  -.230   .292  -.244   .113   .625   .133  -.189   .083  -.496   .176
OPZENITH    -.875   .297   .162   .175   .092   .052  -.049   .049   .058   .011   .011   .054
OPNADIR     -.872   .169   .037   .143  -.093   .078   .074   .003  -.005   .023   .013   .021
X            .061   .084  -.226  -.165  -.060   .792  -.206   .145   .063   .354   .000  -.138
Y           -.314  -.713   .201   .037   .067  -.346   .037  -.044   .017  -.151   .051   .003

Extraction Method: Principal Component Analysis. 12 components extracted.

Table 4-6. Principal Component Analysis Component Matrix for Drumlins (Minimum Eigenvalue 0.8).
Most significant regression values and variables are shown in large red text, overlapping variables are shown in black italics and remaining variables are in green Arial font (SPSS, 2002).

Morphometric Variable / Full TIN / Drumlins / Eskers / Kames
[The check marks and the column alignment did not survive in this text; the surviving marks are listed for each row in source order.]
Elevation: X X •
Slope: V •/ •
Aspect: X •
Standard Deviation of Elevation: X • X X
Mean Elevation: X X
Skewness of Elevation: • •
Roughness Factor: X
Mean Orientation: s •
Standard Deviation of Orientation: X • X
Horizontal Convexity: X
Vertical Convexity: X
Mean Available Relief: • X • X
Drainage Relief: • • •
Peak Density: • • X
Ridginess: X • X X
Drainage Density: • X •
Positive Openness: X • •
Negative Openness: X X •
X-Coordinate: X X
Y-Coordinate: • X •

Table 4-7. Summary of Morphometric Variables with the Highest Correlations to Eigenvectors, as Identified by PCA. Red check marks indicate that these variables have the highest correlation for a component, green dots indicate remaining variables that have no overlap, and black crosses indicate overlapping variables that were eliminated.

4.4 Geography of 093G.066

In section 4.5, I discuss the appearance of different morphometric variables on map sheet 093G.066, and speculate about the geomorphological significance of the PCA results that are described above. Some of the images for each morphometric variable shown in the next section are meaningless without contextual information. Figure 4-7 shows the contours and streams on map sheet 093G.066.

Figure 4-7. Contours and water features for map sheet 093G.066. Index contours are thick dark grey lines, and streams are dark blue. Lakes and swamps are light blue filled areas.

A drumlinized till plain that blankets the top of the Fraser Plateau covers the majority of map sheet 093G.066 (Figure 4-7).
The Fraser River is located about 7km to the east of this map sheet. Major features in the area include Nadsilnich Lake, Mount Baldy Hughes and Shesta Lake. The entire map sheet is very flat, with the exception of Mount Baldy Hughes (a flyggberg) and the depression in which Nadsilnich Lake is located. The only other landforms are the eskers, kames, drumlins, moraines, swamps and bogs. The only major stream on the sheet is McCorkall Creek, which is a minor tributary of the Fraser River. In the hillshaded images presented in Figure 4-8, it is easy to see the hummocky appearance and the direction of glacial flow.

Figure 4-8 shows the landforms that I identified on this map sheet, based on information from topographic maps and field surveys. Drumlins are marked in blue in Figure 4-8a. The uncored drumlins were created by glacial flow, and all of the large, rocky hills in the area, such as Mount Baldy Hughes, have been sculpted by glacial flow. In order to ensure that predictions made with the Landform Classification System are defensible, I have taken a deliberately conservative approach with the landforms that I have identified. Only if I am certain of the landform type have I included it in the training sets shown. Figure 4-8d shows an isometric view of the entire map sheet to help put the landforms into context. It shows the general topography of the map sheet, highlighting Mount Baldy Hughes, the depression that contains Nadsilnich Lake and the till plain between these landforms.

Figure 4-8. Hillshaded views of 093G.066 showing landforms identified during training: drumlins (a), eskers (b), the kame (c), and an isometric view of the map sheet from the southeast (d). Illumination is from the southwest; vertical exaggeration is 7x.

4.5 Geomorphological Significance of PCA Results

The results in Table 4-7 (p.
59) present some challenges as I move from the realm of Statistics into the realm of Geography. Much of what I can see in Table 4-7 makes geographical sense, although a few of the correlations require some thought and others are inexplicable. Attempting to explain the meaning of each of the correlations can help shed light on how to classify landforms, but it is a highly speculative endeavour.

It is important not to go to extremes when trying to explain the geomorphic significance of morphometric variables. A correlation between a morphometric variable and a particular area on the map sheet does not prove that there is a causal relationship between the two. Of all the morphometric variables related to elevation, only the hypsometric integral has a proven relationship to geomorphic processes. It is related to the Davisian "age" of the landscape (Mark, 1975).

The relationships between eigenvectors and morphometric variables are quite complex and cumbersome to explain. In the interest of conciseness, I will simply refer to a "dominant variable" or "dominant correlate" when, in a strict sense, what I really mean is a "variable that has the highest correlation with a particular eigenvector." The PCA process creates eigenvectors that explain the most variance first, and so the variables that are extracted will be those that have the greatest variability (SAS Institute, 2005).

The data presented below have been normalized to fit a range of 0.0 to 1.0, using a linear stretch based on "theoretical" minimum and maximum possible values. Although the range of each variable has been scaled, the amount of variability within each morphometric variable should remain constant.

Included with each morphometric variable described below is an image showing how the variable appears on TRIM map sheet 093G.066, which is 25km southwest of Prince George.
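
The linear stretch used for the normalization mentioned above amounts to a one-line rescaling. The Python sketch below is illustrative only: the function name linear_stretch is an assumption, and the example ranges are the theoretical limits quoted for horizontal convexity (Section 4.2.7) and vertical convexity (Section 4.2.8).

```python
def linear_stretch(value, theoretical_min, theoretical_max):
    """Map value linearly so theoretical_min -> 0.0 and theoretical_max -> 1.0."""
    return (value - theoretical_min) / (theoretical_max - theoretical_min)


# Horizontal convexity ranges from -180 to 180 degrees,
# vertical convexity from -90 to 90 degrees.
print(linear_stretch(30.0, -180.0, 180.0))  # a mildly convex triangle
print(linear_stretch(0.0, -90.0, 90.0))     # a flat surface maps to 0.5
```

Because the stretch uses fixed theoretical limits rather than the observed minimum and maximum, the same raw value maps to the same normalized value on every map sheet.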
There are 27,910 triangles on map sheet 093G.066, and each of these is shaded according to its morphometric value. Each map shows the distribution of values from minimum (white) to maximum (black). In some cases, I have altered the distribution of colours to compensate for undesirable shading caused by outliers in the data.

4.5.1 Elevation

On sheet 093G.066, the image shows the depression that is home to Nadsilnich Lake in white, and it shows progressively higher elevations in darker colours. Elevation values range from a low of 696m to a high of 1124m at the summit of Mount Baldy Hughes.

Since elevation is considered by Evans (1972) to be the most important morphometric variable, it makes sense that this shows up as the highest scoring correlation for the entire NetTIN; however, it is odd that it does not show up as a dominant correlate for eskers and drumlins (see Table 4-7, p. 59). Upon closer examination, however, I see that elevation is highly correlated with the first eigenvector for eskers, and the second eigenvector for the drumlins, scoring only slightly below the dominant correlate in each case.

When I consider that PCA favours variables with high variability, it is easy to explain why the entire NetTIN shows elevation as a significant variable, because elevation is highly variable over the NetTIN. Individual landforms tend to show a much narrower band of elevation values, which may explain why elevation is not a significant variable for any of the landforms.

4.5.2 Slope

On map sheet 093G.066, slope is highest in the upland areas, particularly along a cliff at the crest of Mount Baldy Hughes and along the valley of McCorkall Creek in the southeast corner of the map sheet. The remainder of the till plain is relatively flat, and shows the U-shaped areas of steep slope that are found along the sides and back of the drumlins, relative to the direction of ice flow.
Note the red-circled area, which is the lone kame in the study area (see Figure 4-8c, p. 61).

Slope is a dominant correlate for drumlins and eskers and a strong correlate over the entire NetTIN (see Table 4-7, p. 59). On kames, however, it is not a dominant correlate. Both eskers and drumlins, with their concave and convex slopes, will have many slope values, which explains why these landforms have dominant correlations. Kames, being conical in shape, should have little variability in slope, and thus should not have slope as a dominant correlate. The entire map sheet, with nearly flat areas as well as steep slopes around Mount Baldy Hughes, also has slope as a dominant variable, which is not surprising.

4.5.3 Aspect

The image of aspect clearly shows the degree to which glacial flows have sculpted the land surface. Every landform in the image, including Mount Baldy Hughes, shows a southwest-northeast orientation.

Aspect is a dominant correlate for drumlins and eskers, but not for kames or the entire NetTIN (see Table 4-7, p. 59). Elongated landforms, such as eskers and drumlins, should have two dominant aspects (one for each side), whereas kames, being conical in shape, should show all aspect values evenly. Although the PCA algorithm has a high correlation for the NetTIN, presumably because there is a relatively even distribution of aspect values in the image at right, this variable overlaps with peak density, so I removed it from further consideration. Kames should definitely have aspect as a dominant variable, but eskers and drumlins should not, since they are more elongated than kames. The PCA results are at odds with these expectations.

4.5.4 Standard Deviation of Elevation

The image of standard deviation of elevation highlights the steeper slopes on the map sheet. Mount Baldy Hughes and the depression containing Nadsilnich Lake show a higher standard deviation of elevation than the remainder of the till plain.

The standard deviation of elevation is not a dominant variable for any of the landforms, or for the NetTIN as a whole (see Table 4-7, p. 59). Drumlins were not highly correlated with the standard deviation of elevation, presumably because these landforms have a similar height distribution throughout the map sheet. For eskers, and the entire NetTIN, both of which exhibit increased variability in their height distributions, the standard deviation of elevation had a high correlation, but the variable overlapped with another. For eskers, the standard deviation of elevation overlapped with the roughness factor, and for the NetTIN, it overlapped with elevation. For kames, the variable was eliminated because it overlapped with mean elevation, but I do not understand why it initially had a high correlation with an eigenvector, since the triangles for the single kame are all at essentially the same elevation.

4.5.5 Mean Elevation

The image of mean elevation looks similar to the image for elevation, but shows the smoothing that is introduced by calculating the mean of the triangles in the first order neighbourhood. The broader physical trends on the map sheet are made evident by the removal of the "noise" that happens when the mean is taken. Mount Baldy Hughes, the depression around Nadsilnich Lake, and the valley of McCorkall Creek all show up prominently in this image.

Mean elevation is the dominant correlate for drumlins and kames, but was eliminated because it overlapped other variables for the entire NetTIN and for eskers (see Table 4-7, p. 59). Because PCA favours high variability, I would expect that the entire NetTIN and the drumlins would have mean elevation as a dominant variable. Since there is only a single kame, having a narrow range of elevation values, it seems strange that mean elevation was highly correlated with an eigenvector for this landform.
Furthermore, it is even stranger that mean elevation was eliminated for kames because it overlapped with the X-Coordinate. The mean elevation values should be highly variable over the entire map sheet, and thus this variable should show up as a dominant correlate for the entire NetTIN. Mean elevation was eliminated for the NetTIN because it overlaps elevation.

4.5.6 Skewness of Elevation

As with the standard deviation of elevation, the skewness of elevation shows a mottled appearance over sheet 093G.066. In this image, the results do not have any immediately recognizable meaning. The lightest areas have a positive skew whereas the darkest areas have a negative skew. The medium grey areas have an approximately normal distribution. The skewness of elevation favours the entire map sheet and drumlins, but ignores eskers and kames (see Table 4-7, p. 59). This makes sense because these landforms have characteristic shapes, with characteristic elevation distributions, and each exists in a narrow elevation band on the surface of the till plain. The PCA should choose the drumlins, which have the most variable shape of all the landforms. The entire map sheet has a broad variety of landforms, and hence a broader range of skewness values. This explains why PCA makes skewness of elevation a dominant variable for drumlins and the entire map sheet, but not for any of the other landforms.

4.5.7 Horizontal Convexity

The image of horizontal convexity for map sheet 093G.066 doesn't make a great deal of sense. The horizontal convexity values appear essentially random. Horizontal convexity should be a dominant variable for all landforms that are sinuous. This is at odds with the PCA results, in which horizontal convexity is a dominant variable for drumlins, kames, eskers, and the entire NetTIN (see Table 4-7, p. 59). For eskers, horizontal convexity overlapped with mean orientation, which makes some sense.
The PCA might have selected the drumlins because they are elongated, but why the kame was chosen, when it should have uniform convexity, is unknown. The wide range of landforms might explain why horizontal convexity is a dominant variable for the entire map sheet.

4.5.8 Vertical Convexity

Over much of map sheet 093G.066, vertical convexity shows little variation. Valleys such as McCorkall Creek contain both high (dark) and low (light) vertical convexity values. Light areas bordering dark ones indicate the presence of cliffs in the image at right. Vertical convexity was a dominant correlate for all landforms and the NetTIN (see Table 4-7, p. 59). This makes sense, given that eskers and drumlins have convex forms on top and concave forms at their bases. Since drumlins dominate the map sheet, it is not surprising that the entire map sheet should show vertical convexity as a dominant correlate. Kames, being conical in shape, should have a uniform vertical convexity, so why this variable was initially selected for them by the PCA is unknown. Vertical convexity was removed from further analysis for kames because it overlapped with horizontal convexity for these landforms.

4.5.9 X-Coordinate

The X-Coordinate, not surprisingly, increases uniformly from west to east in this image from map sheet 093G.066. The X-Coordinate of the centre of the selected triangles was initially a dominant correlate for all landforms and the NetTIN (see Table 4-7, p. 59). It was eliminated as a variable for the entire NetTIN because it overlapped with slope, and for kames because it overlapped with mean elevation. Why it was initially a dominant variable for the kame, which has such a narrow range of X-Coordinate values, is unknown.
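The terms used throughout this section, a variable being a "dominant correlate" of an eigenvector or being eliminated because it "overlapped" another variable, can be made concrete with a small sketch. This section does not spell out the exact selection rule, so the code below is only one possible reading: it assumes that a variable is a candidate when the absolute value of its loading on a retained principal component reaches a threshold, and that when several candidates share a component, the strongest is kept and the others are treated as overlapping it. The threshold of 0.7, the function names, the variable names and the sample data are all hypothetical, and a simple Jacobi eigenvalue routine stands in for whatever PCA software was actually used.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns in `rows` (one row per triangle).

    Assumes no column is constant (a constant column has zero variance).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             / (n * sds[i] * sds[j]) for j in range(p)] for i in range(p)]


def jacobi_eigen(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix,
    found with cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v


def select_variables(names, rows, n_components, threshold=0.7):
    """Return (selected, overlaps) under the assumed selection rule.

    selected: one dominant variable per retained component.
    overlaps: maps each eliminated variable to the variable it overlapped.
    """
    eigvals, vecs = jacobi_eigen(correlation_matrix(rows))
    order = sorted(range(len(names)), key=lambda j: eigvals[j], reverse=True)
    selected, overlaps = [], {}
    for comp in order[:n_components]:
        scale = math.sqrt(max(eigvals[comp], 0.0))
        # Loading = correlation between a variable and the component.
        loading = {names[i]: abs(vecs[i][comp]) * scale for i in range(len(names))}
        strong = [name for name in names if loading[name] >= threshold
                  and name not in selected and name not in overlaps]
        if not strong:
            continue
        keep = max(strong, key=loading.get)
        selected.append(keep)
        for name in strong:
            if name != keep:
                overlaps[name] = keep
    return selected, overlaps


# Hypothetical data: "roughness" is an exact linear function of "slope",
# while "x_coordinate" is uncorrelated with both.
names = ["slope", "roughness", "x_coordinate"]
rows = [[1, 3, 1], [2, 5, -1], [3, 7, -1], [4, 9, 1]]
selected, overlaps = select_variables(names, rows, n_components=2)
print(selected, overlaps)
```

With this made-up data, the two perfectly correlated variables load on the same component, so one of them is kept and the other is reported as overlapping it, while the uncorrelated variable is selected on its own component.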
4.5.10 Y-Coordinate

Like the X-Coordinate, the Y-Coordinate increases monotonically from south to north, and the variation created by different triangle sizes can be seen if the image is examined closely. The entire map sheet, the drumlins and the eskers should have the Y-Coordinate as a dominant variable, but the kame should not, because the range of Y-Coordinates for the single kame is very narrow. What we see is that the Y-Coordinate is a correlate for the entire map sheet and the eskers, but not for the drumlins or the kame (see Table 4-7, p. 59). I eliminated the Y-Coordinate for eskers from further analysis because it overlapped with the X-Coordinate. It makes sense that the Y-Coordinate was not a dominant variable for kames, but it should have been for drumlins, because they cover the entire map sheet.

4.5.11 Topographic Component

The image at right shows map sheet 093G.066 being broken up into multiple, elongated topographic components. The sheet consists of 1993 topographic components. On conical hills, such as kames, the topographic component encompasses the entire landform, but more complex landforms, which have breaklines or crests that split and coalesce, may be broken into multiple topographic components. The topographic component was not included in the Principal Component Analysis results, because I introduced it after performing the PCA, in an effort to combine the individually classified triangles into recognizable landforms.

4.5.12 Mean Available Relief

In the image at right, the mean available relief shading clearly shows the eastern slopes of Mount Baldy Hughes, and highlights the western slopes of the depression that contains Nadsilnich Lake. The light-coloured areas show quite clearly that much of this map sheet has low topographic relief.
Mean available relief was a dominant correlate for drumlins and kames, but I eliminated it for drumlins because it overlapped with drainage relief (see Table 4-7, p. 59). For kames, this variable was eliminated because it overlapped with the roughness factor. This variable was not selected by the PCA for the eskers or for the entire NetTIN. The eskers may be drained well enough that there are no surface streams near these landforms, so the mean available relief, which requires the existence of streams, may be zero for many eskers. The entire NetTIN may be uniform enough (see image) that the Principal Component Analysis did not select it. Perhaps, because the entire landscape was glaciated only a few thousand years ago, there has been insufficient time for differential erosion rates to cause large differences in the mean available relief through the process of downcutting.

4.5.13 Drainage Relief

In the image at right, the drainage relief variable shows where flows off slopes are likely to be most erosive. The image highlights the eastern slopes of Mount Baldy Hughes, indicating that there are high ridges in close proximity to streams. This means that water flowing along the slopes between the two acquires energy because of the elevation difference between the high slopes and the rivers, resulting in high erosivity. The obvious difference between this image and the one above is the slopes along the western edge of the depression that contains Nadsilnich Lake, where I see low drainage relief, but moderate to high mean available relief, which indicates that erosion is less likely to be an issue in this area, despite its steep topography. The drainage relief variable is dominant for the drumlins, but is of no significance for the map sheet or the other landforms (see Table 4-7, p. 59). This seems to be a function of the availability of watercourses near the landforms.
Small streams outline and separate the drumlins from each other, so each drumlin will have its own drainage values. Since eskers and kames tend to be composed of sandy materials that are highly porous, few rivers form, and drainage relief for many of these features will be zero. As with mean available relief, drainage relief is not a significant correlate for the entire map sheet, which, once again, is likely to be a product of the "age" of the landscape.

4.5.14 Roughness Factor

The roughness factor measures the density of elevation changes in an area. The image of the roughness factor at right shows many areas of high roughness (dark) that are interspersed with circular-shaped areas of low roughness (light). Areas of low roughness correspond with the rounded hills that can be seen in Figure 4-8 (p. 61). Interestingly, Nadsilnich and Shesta Lakes show up as areas of moderate roughness, possibly because they are found at the edge of the map sheet. The roughness factor is a significant variable for eskers, kames and drumlins, but was eliminated for the entire NetTIN because of an overlap with the Y-Coordinate (see Table 4-7, p. 59). This makes sense, since these landforms are the dominant sources of roughness on the till plain, and the NetTIN is dominated by these landforms.

4.5.15 Triangle Area

At right, the image clearly shows that there is a large amount of variation in the area of the triangles found on TRIM sheet 093G.066. Since the Delaunay Triangulation algorithm that constructs the NetTIN ignores input points that add little information to the existing triangulation, each triangle in the NetTIN represents an area of nearly uniform slope. Where the triangles are large, there is little variation in the land surface, and where the triangles are small, the amount of variation in the land surface is large.
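Because each triangle is a planar facet, its area and slope follow directly from its three vertices. The sketch below illustrates this with a cross product. It is an assumption-laden illustration rather than the LCS code: the thesis does not say here whether triangle area is measured in plan view or along the sloping surface, so both are returned, and the function name and sample coordinates are hypothetical.

```python
import math


def triangle_metrics(a, b, c):
    """Planimetric area, surface area and slope (degrees) of a TIN triangle.

    a, b, c are (x, y, z) vertices in metres.
    """
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    # Normal vector n = u x v; its length is twice the surface area.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    plan_area = abs(nz) / 2.0
    surface_area = math.sqrt(nx * nx + ny * ny + nz * nz) / 2.0
    if plan_area == 0.0:
        raise ValueError("degenerate triangle (vertical or collinear in plan view)")
    # Slope is the angle between the facet normal and the vertical.
    slope_deg = math.degrees(math.atan2(math.hypot(nx, ny), abs(nz)))
    return plan_area, surface_area, slope_deg


# Hypothetical triangle rising 10 m over a 10 m run.
plan, surface, slope = triangle_metrics((0.0, 0.0, 100.0),
                                        (10.0, 0.0, 100.0),
                                        (0.0, 10.0, 110.0))
print(plan, surface, slope)
```

For the sample triangle, the plan area is half of a 10 m by 10 m square, and the facet is inclined at 45 degrees, so the surface area is larger than the plan area.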
I added the triangle area to the Landform Classification System after running the Principal Component Analysis.

4.5.16 Peak Density

Peak density produces an image that shows the peaks and ridges within map sheet 093G.066. The dark areas not only represent mountain peaks, but also the summits of drumlins and kames, as well as points of high elevation on eskers. In the image of peak density at right, I see that there are locations on the map with extremely high peak densities and other locations with peak densities of nearly zero. Thus, it is understandable why the PCA selects the peak density as a dominant morphometric variable for the entire map sheet (see Table 4-7, p. 59). Individual landforms tend to have a narrower range of peak density values, with eskers having the highest values, followed by drumlins and kames. Peak density was selected as a dominant variable for kames, but was eliminated from further analysis because it overlapped the roughness factor. As a result, none of the landforms has peak density as a dominant morphometric variable.

4.5.17 Ridginess

At right, the ridginess image shows the same general distribution as peak density (see above). The image is darker than for peak density because, in addition to peaks, the areas that contain only ridges are also included in this image. Ridginess was selected as a dominant morphometric variable for the NetTIN as well as for eskers and kames; however, this variable was eliminated from further consideration because it overlaps with other morphometric variables (see Table 4-7, p. 59). For the NetTIN, ridginess overlapped with slope, for eskers it overlapped with vertical convexity, and for kames, it overlapped with horizontal convexity. Drumlins, being both large and smooth landforms, tend to have uniformly low ridginess values, which explains why ridginess was not selected for them by the PCA.
There would tend to be a greater range of roughness values in the other landforms, and in the NetTIN as a whole, so it makes sense that ridginess was a dominant correlate for these landforms.

4.5.18 Positive Openness

Positive openness (Yokoyama et al., 2002) highlights those areas that have the most open view of the sky. In the image at right, high positive openness values (dark) occur over much of map sheet 093G.066, particularly on ridges and peaks. Low positive openness values are shaded lightly. These are found in the stream valleys, the areas between drumlins, at Nadsilnich Lake and in the areas between the peaks of Mount Baldy Hughes. Positive openness was an important correlate only for drumlins and the map sheet as a whole, but was eliminated from further consideration for drumlins because it overlapped with slope (see Table 4-7, p. 59). Since the image of positive openness shows uniformly high values for eskers, kames and drumlins, the PCA should ignore this variable for the landforms.³ Why it was selected for drumlins by the PCA is unknown. The entire map sheet shows much variability in positive openness values, which explains why the PCA selected this as a dominant correlate.

³ Although this variable was eliminated from further analysis by the PCA, having uniform values within landforms and dissimilar values over the entire NetTIN may make this variable useful within the ANN. In Section 7.2.3.2, this variable was selected when we tested Discriminant Analysis as an alternative to PCA for selecting variables.

4.5.19 Negative Openness

Negative openness (Yokoyama et al., 2002) shows areas where the view of the ground surface is most dominant. Note that the image at right is not the inverse of the positive openness image (above), and that there is a strong edge effect.
The negative openness image highlights the peaks of Mount Baldy Hughes, and shows the larger drumlins quite clearly as light oval patches with a southwest-northeast orientation. Contrast this image with the image for drainage density (above), which highlights the area surrounding the drumlins. Negative openness was a dominant correlate for the entire NetTIN, and for drumlins and eskers (see Table 4-7, p. 59). Because it overlapped slope for both drumlins and eskers, it was removed from further consideration for these landforms. The kame was never selected by the PCA, which is what I would have expected for drumlins and eskers, because all of these landforms appear to have uniformly low negative openness in the image above. The NetTIN overall shows more variation, so it makes sense that the PCA selected negative openness for it.

4.5.20 Drainage Density

The image of drainage density shows the many small streams that surround the drumlins on this map sheet. Since this variable only considers the length of streams per unit area, not their importance, it highlights the smaller streams on the map, even though there may be much larger streams present that drain larger areas. The PCA results for drainage density make little sense. Drainage density was a dominant correlate for drumlins and eskers, but the variable was removed from further consideration for drumlins because it overlapped with aspect (see Table 4-7, p. 59). Eskers were not selected for other drainage-based variables, such as drainage relief, presumably because there are few streams near the eskers. For both the NetTIN and for kames, the PCA did not select drainage density as a dominant correlate of an eigenvector. In the image at right, the drumlins have uniformly low drainage density values, so the PCA should never have identified drainage density for the drumlins. The image shows a lot of variation over the entire NetTIN, so it should have been selected, but was not.
That leaves the kame, which is the only landform that makes sense, because the drainage density near the kame should be uniform, causing the PCA not to select it.

4.5.21 Mean Orientation

The mean orientation for sheet 093G.066 shows the broader trends of the land surface, particularly the slopes along the east and west side of the depression around Nadsilnich Lake, and on the northwest side of Mount Baldy Hughes. Mean orientation, as used in the PCA, was originally calculated using the first order neighbours of each triangle. This was at odds with the definition used in the literature, which specified that the mean orientation was to be calculated for each landform. The algorithm to calculate mean orientation was revised to make use of the Topographic Neighbours, which isolate individual hills in the NetTIN. Even though I have changed the size of the neighbourhoods to calculate mean orientation better, the patterns are similar to what they were when mean orientation was calculated using the Triangle Neighbours. Mean orientation is a dominant variable for the full NetTIN, drumlins and eskers, but not for the kames (see Table 4-7, p. 59). This may simply be because there is only a single mean orientation value for the one kame on map sheet 093G.066, whereas the eskers, drumlins and the entire map sheet show much more variation.

4.5.22 Standard Deviation of Orientation

In the image at right, most of the map has a mottled appearance, which indicates that the standard deviation of orientation is highly variable. I can see two areas of lower standard deviation of orientation along the slopes bordering Nadsilnich Lake, and along the eastern and northern slopes of Mount Baldy Hughes. In both of these areas, there are large, uniform slopes. When I ran the Principal Component Analysis, the algorithm for standard deviation of orientation calculated the standard deviation using first order neighbours.
This was incorrect, because I should be calculating this variable for entire landforms. Since I replaced the Triangle Neighbours with Topographic Neighbours, however, the algorithm is now much closer to the original definition. The standard deviation of orientation is a dominant variable for drumlins, kames and the NetTIN (see Table 4-7, p. 59). I eliminated this variable for the NetTIN because of an overlap with slope, and for kames, because of an overlap with mean elevation. This implies that the terrain surrounding the kames and drumlins is highly variable, unlike eskers, which Tipper (1971) suggests are often formed in meltwater channels, which are relatively flat and uniform. The image at right shows some areas of very high and low standard deviation of orientation, which fall outside the area occupied by my landforms of interest. This explains why the standard deviation of orientation was a correlate for the entire NetTIN.

4.6 Summary

In this chapter, I described the use of Principal Component Analysis (PCA) to identify which variables to preserve or discard. Based on the PCA results, I have divided the original 22 morphometric variables into three groups, one for each landform (see Table 4-7, p.
59):

Eskers: Slope, Aspect, Skewness of Elevation, Roughness Factor, Mean Orientation, Standard Deviation of Orientation, Vertical Convexity, Mean Available Relief, Drainage Relief, Peak Density, Drainage Density, Positive Openness, X-Coordinate⁴

Kames: Elevation, Slope, Aspect, Mean Elevation, Skewness of Elevation, Roughness Factor, Mean Orientation, Horizontal Convexity, Drainage Relief, Drainage Density, Positive Openness, Negative Openness, Y-Coordinate⁴

Drumlins: Slope, Aspect, Standard Deviation of Elevation, Mean Elevation, Skewness of Elevation, Roughness Factor, Mean Orientation, Standard Deviation of Orientation, Horizontal Convexity, Vertical Convexity, Drainage Relief, Peak Density, Ridginess, X-Coordinate⁴, Y-Coordinate⁴

These variables will be used to train a Fuzzy ARTMap Artificial Neural Network to see whether it can identify eskers, kames and drumlins.

⁴ These were later removed from the analysis when I realized that they biased the analysis in favour of the map sheet on which the training was performed.

5 Field Work

The problem is essentially one of how to produce a generic classification of landforms that can be applied automatically, and virtually without alteration to a wide variety of landscapes. Opinions vary regarding the most efficacious way of producing such a classification (Macmillan et al., 2000b, p. 82)

In 2003 and 2004, I made trips to my Study Area, which is 25km southwest of Prince George, British Columbia. These trips were required to confirm the landforms described in the TRIM map sheets, and to collect data to ensure the landform classifications produced by the Landform Classification System are correct.

5.1.1 Study Area

In this thesis, eskers, kames and drumlins are the landforms of choice for initial classification. After a cursory survey of the areas containing drumlins and eskers in British Columbia, I chose a study area on the Interior Plateau near Prince George, British Columbia (Figure 5-1).
This area has the largest concentration of drumlins in British Columbia. The areas to the southwest and northeast of Prince George were strongly affected by glaciers during the last ice age, and contain thousands of drumlins. In our study area 25km southwest of the city, eskers, kames, and drumlins lie within a belt that is roughly 30km by 60km in extent running north-northwest to south-southeast (54°15' N, 123°20' W to 53°20' N, 122°40' W).

Figure 5-1. The location of the Study Area in British Columbia, Canada.

5.2 2003 Field Season

During the summer of 2003, Steven Janssens and I (Figure 5-2) took a series of short hikes away from the main logging roads in the area to examine some of the landforms in the study area. On our last day, the clouds had rolled in and rain began to fall, so we did what we could from the car that day.

Figure 5-2. The 2003 team on our first day of fieldwork, June 23, 2003. Steven Janssens is at the left and Brad Maguire is at the right (photo: B. Maguire).

5.2.1 Objectives

A series of 12 unusual landform configurations had been identified from a preliminary examination of the TRIM maps and some early NetTINs of the study area. Table 5-1 shows the questions that I wanted answered during this visit to the study area. In addition to examining the 12 landforms listed in Table 5-1, our secondary objectives were to perform a cursory check of the extent of the esker mapping shown on the TRIM sheet, to examine the landforms first hand, and to take photographs of the landforms and the study area.

5.2.2 Equipment

We located each of the landforms in question by finding its UTM coordinates with a Magellan Blazer 12 handheld GPS unit (Magellan, 2004). The datum for these points had to be converted from North American Datum 1983 (NAD83) to North American Datum 1927 (NAD27) prior to the transfer to allow the data to be entered into the GPS unit, which only supported NAD27.
The Blazer 12 can store up to 500 waypoints, but has no capability for storing tracks, which would have been useful for recording data as we ran traverses along the top of eskers. To map the eskers, Steven and I worked closely together, with Steven reading the UTM coordinates off the GPS receiver while I recorded the coordinates on a Palm Pilot. Prior to embarking on this task, I created a template for the coordinates using Cause&Effect's table format. This enabled us to directly download the coordinates from the Palm Pilot onto a PC as a text file, and I then read this file into Cause&Effect to produce a record of the points along the tops of the eskers. I then reprojected the results from NAD27 back to NAD83 within Cause&Effect for use with the TRIM data.

Site  UTM X-Coordinate (NAD 27)  UTM Y-Coordinate (NAD 27)  Query
A1    501142.0                   5930726.0                  Is this feature an esker or a drumlin?
A2    503491.0                   5937882.5                  The TRIM shows 2 eskers joining into 1 here. Is the TRIM map correct?
A3    502461.0                   5936983.0                  The TRIM shows a horseshoe-shaped esker here. Is it real?
A4    500306.0                   5936818.5                  The TRIM shows an esker here. Is it real?
A5    500089.0                   5934814.5                  Is this an esker or a drumlin?
A6    504995.5                   5937001.5                  The TRIM shows an esker overlain by a drumlin here. Is this real?
A7    503010.0                   5933779.5                  Where does this drumlin end at its northernmost extremity?
A8    511916.5                   5937672.0                  What is the material between drumlins here? Bedrock?
A9    509286.0                   5928252.0                  Is this an esker, a drumlin or an esker on top of a drumlin?
B1    479076.0                   5968031.5                  Is this really a depression between eskers?
B2    479338.5                   5968108.5                  Is this really a depression between eskers?
B3    479440.5                   5968247.5                  Is this really a depression between eskers?

Table 5-1. Questions to be answered during Summer 2003 fieldwork.
Our method of transportation was my venerable old Honda Civic (Figure 5-2), which received a thorough beating, and was unable to venture far beyond the main line logging roads because of its low ground clearance. We once managed to "hang up" the car (i.e. some of the wheels were no longer on the ground), at the entrance to the McKenzie Lake Forest Service Campground. After this, we were careful about the areas into which we took the Honda, for fear of being stranded.

5.2.3 Findings

Because of the limitations of our mode of transportation, we were only able to make it to 7 of the 12 sites in question (Table 5-2). The remainder of these were too remote to access without a four-wheel drive vehicle.

5.3 2004 Field Season

In 2004, we returned to the study site with another team member, Richard Bader. This time, we substituted a rented Jeep Grand Cherokee for the Honda Civic used during 2003, which allowed us to access the majority of logging roads in the area. Our team planned five full days of fieldwork, from July 5th to 9th.

5.3.1 Objectives

The main objective of the 2004 field season was to further assess the accuracy of the TRIM map sheets with respect to the eskers, which had been a source of concern since the project began. In particular, I was concerned that there might be many eskers that had been missed during the TRIM mapping. To determine this, we divided the study area into 4x4km grid cells, and attempted to enter each grid cell using our vehicle. The theory was that there was a reasonable chance that we would see or cross any unmapped eskers.

Site A1
  Query: Is this feature an esker or a drumlin?
  Result: This is a drumlin. The sides are much less steep and the feature is several hundred metres wide.
Site A2
  Query: The TRIM shows 2 eskers joining into 1 here. Is the TRIM map correct?
  Result: The TRIM shows an "h" configuration. This is a braided esker that has 2 former confluences in an "H" shape; the westernmost esker was the only one picked up by the photogrammetrist. There is a smaller esker to the east at the bend in the 'h'. Maximum height here is only about 30m, but the eskers show up because the area is so flat.
Site A3
  Query: The TRIM shows a horseshoe-shaped esker here. Is it real?
  Result: The easternmost esker runs N-S and a larger esker branches off to the west. There is a peat bog in the centre of the horseshoe.
Site A7
  Query: Where does this drumlin end at its northernmost extremity?
  Result: The drumlin breaks into several small hills at x=503184.0, y=5934394.0.
Site B1
  Query: Is this really a depression between eskers?
  Result: Yes, it is a depression. There is no stream, and the area is very dry. Local vegetation is pine and alder.
Site B2
  Query: Is this really a depression between eskers?
  Result: Yes, it is a depression. I sighted a lake from the esker to the north (we didn't actually make it to the site).
Site B3
  Query: Is this really a depression between eskers?
  Result: Yes, there's a depression with a small lake here.

Table 5-2. Answers to 2003 field work questions.

In addition, I wanted to get some information on the size of the eskers and drumlins in the study area, since the ones that we mapped north of the study area in 2003 were more than 100m high (Figure 5-4). Where possible, we also attempted to complete the assessment of the unusual landforms from 2003, and to take some better pictures of the landforms in the area.

Figure 5-3. The 2004 field crew: Steven Janssens (left), Richard Bader (middle), and Brad Maguire (right) (Photo: B. Maguire).

Figure 5-4. Richard Bader shows off his GPS mapping and bug swatting skills on top of a drumlin (a). Tracks recorded using the GPS when measuring the crest and boundaries of this drumlin (b). Note: the Z values on the track come from the GPS; this is not the result of data being draped onto the NetTIN.
The GPS values were elevated 20m above the surface for clarity in this illustration (photo: B. Maguire).

5.3.2 Equipment

In 2004, we made use of Steven Janssens' digital camera, which enabled us to take many more photographs, and we downloaded these to my laptop computer at the end of each day. One nice feature of the digital camera was that it allowed us to assess the quality of the photographs in the field to ensure that they were suitable for later use. We also came with 1:20,000 plots of the TRIM map sheets, showing all roads, trails and cut lines that we could (theoretically) use to reach the areas in question. Two further pieces of equipment extended our capabilities considerably during the 2004 field season. I purchased two Garmin RINO 130 GPS/radio units for mapping the location of landforms (Garmin, 2004). These units were vastly superior to the Magellan Blazer 12 unit that we used in 2003. The RINOs had the following features that proved valuable during our fieldwork:

• Accuracy of < 3m 95% of the time using Wide-Area Augmentation System (WAAS) (Garmin, 2004)
• The ability to store tracks with X, Y, and Z coordinates
• 24 Mb of internal memory
• The ability to download directly to a computer
• The ability to upload TRIM map sheets and view them on the GPS screen
• Support for the NAD 83 datum
• Two-way voice communication and transmission of position between units

These features were very useful, since they allowed us to automatically map esker crests and boundaries as 3D tracks. The voice communication feature allowed us to have two teams in the field, and we were able to coordinate our mapping efforts using the radio and the automatic transmission of each unit's GPS position to the other unit.⁵

⁵ This proved to be very valuable on Thursday, July 8, when, while working alone, I became disoriented and confused during a long traverse of an esker, and had to rely on the other team to radio me the directions to their location.
It later turned out that I had begun to traverse an adjoining unmapped esker that was parallel to the one that I thought I was on, but the thick undergrowth made it impossible to determine this on the ground.

The 24 Mb of memory on each unit also allowed us to download subsets of the TRIM data to the RINOs, so that we had moving maps of our location as we traversed the landforms, as well as the locations of the landforms in question that we missed during the 2003 field season. At the end of each field day, we were able to download information from the RINOs to a laptop computer at the field camp. The rented Jeep Grand Cherokee had a minimum ground clearance of 19.6cm (7.7 inches) (Jeep, 2004), which made it easier to navigate logging roads.

5.3.3 Results

Despite the rental of a superior vehicle for the 2004 field season, weather conditions conspired to limit its utility. Most of the week was very rainy, and some roads were impassable, particularly by week's end (Figure 5-5). This forced us to do more hiking than we had planned.

Figure 5-5. Returning to our vehicle, which could not proceed further because of deep mud in a gulley (Photo: R. Bader).

The goal of accessing the 4x4km grid cells resulted in our driving over more than 800km of logging roads in four days, and included several tens of kilometres of hiking. In cases where no roads went to known esker locations or the roads were blocked, we hiked in to the eskers, if the distance was not too far from the vehicle. Richard Bader proved to be an accomplished (and fearless) off road driver; Steven Janssens sat in the passenger seat with one GPS unit, mapping the peaks of drumlins and the location of gravel pits as we saw them; and I sat in the back seat, highlighted our route on the TRIM plots, and determined which side roads we should follow to access the grid cells.
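Whether every grid cell has been entered can be checked mechanically from recorded GPS track points. The following is a minimal sketch of that bookkeeping, not a description of how coverage was actually tallied in the field (the route was highlighted by hand on the TRIM plots). The grid origin, the grid dimensions, the function names and the track points are all hypothetical; only the 4 km cell size comes from the text.

```python
import math

CELL_SIZE_M = 4000.0  # 4 km x 4 km grid cells, as described in Section 5.3.1


def cell_of(x, y, origin_x, origin_y, cell_size=CELL_SIZE_M):
    """Column and row index of the grid cell containing UTM point (x, y)."""
    return (math.floor((x - origin_x) / cell_size),
            math.floor((y - origin_y) / cell_size))


def coverage(track_points, origin_x, origin_y, n_cols, n_rows):
    """Return (visited, unvisited) sets of (column, row) grid cells."""
    all_cells = {(c, r) for c in range(n_cols) for r in range(n_rows)}
    visited = {cell_of(x, y, origin_x, origin_y) for x, y in track_points}
    visited &= all_cells  # ignore points that fall outside the grid
    return visited, all_cells - visited


# Hypothetical 2 x 2 grid (8 km x 8 km) and three hypothetical track points.
origin = (496000.0, 5928000.0)
track = [(497500.0, 5929000.0), (501000.0, 5929500.0), (501500.0, 5933000.0)]
visited, unvisited = coverage(track, origin[0], origin[1], n_cols=2, n_rows=2)
print(sorted(visited), sorted(unvisited))
```

Because the sketch tests only the recorded points, a track that clips the corner of a cell between two GPS fixes would be missed; densifying the track or intersecting line segments with the cells would be needed to catch that case.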
In all of our driving, we did not find any solitary eskers that had not been mapped. However, in many cases, single eskers were mapped on the TRIM sheet when in fact, there was an esker train. Despite this, the mapped eskers proved to be good indicators that eskers or esker trains could be found in the area. We did find one error of commission in the TRIM data: one esker was mapped that did not exist. Curiously, when later viewing the area from a neighbouring drumlin, we could easily make out a line of higher trees, which the photogrammetrist must have mistaken for an esker.

Although checking out the eskers using the 4 km² grid cells was labour intensive, it enabled me to calm my doubts about the quality of the esker mapping in the TRIM data.

The only landform from the 2003 field season that we were able to reach in 2004 was point A5, at UTM coordinates X=500089.0, Y=5934814.5 (NAD 27). From the TIN, it was unclear as to whether this was an esker or a drumlin, but upon closer inspection, we determined that it was an esker. This was one of the landforms that we mapped in detail to obtain height and length information (Figure 5-6). The crest of the esker (the longest line shown in Figure 5-6) is 548m long, and the height is 3.39m on the west side, but only 0.54m on the east side. The esker formed on the side of a drumlin, so it was laid down at an angle.

Figure 5-6. GPS Tracks for Esker at A5 showing differential height of east slope versus west slope. East slope: mean elevation difference between the two highlighted tracks (thick red line) is 0.54 m (a). West slope: mean elevation difference between the highlighted tracks (thick red line) is 3.39 m (b). Location on map sheet 093G.056 (red section, marked with arrow) (c).

Two other eskers were mapped in detail during the summer of 2004.
The first, on map sheet 093G.056, had a mean height of 6.22 m (Figure 5-7). The measured section was 325m long, and the esker itself was 440 metres in length. The esker is one of many found in an esker train, which has a total length of nearly 2 km.

Figure 5-7. Esker examined using GPS receiver on map sheet 093G.056. Sections drawn with a thick red line were used to obtain mean elevation (a), and (b) indicates the location of the esker on map sheet 093G.056 (black arrow).

We measured two sections of another esker on map sheet 093G.066 (Figure 5-8). We found it to have a mean height of 3.63m in the northerly section, and 3.75m in the southerly section. The length of the esker is just over 1.5 km.

Unfortunately, barricades on the road and poor weather conditions prevented us from accessing more territory in 2004. We discovered a large farm (Alma Farm) on the western slopes leading down to the Fraser River, but unfortunately, due to time constraints, we were unable to obtain permission to enter the farm property to continue our search there. Because our vehicle was a rental and lacked a winch, we were unwilling to risk losing it in mud.

5.4 Remaining Work

There are no plans for a 2005 trip to the field, but the following items are worth investigating, although none is critical for the completion of this thesis:

1. Visit the remaining sites in the 2003 list of landforms to investigate.
2. Obtain permission from Alma Farms to look for landforms on their lands.
3. Investigate the kame that I identified on sheet 093G.066 to verify that it is real, and look at some other promising sites to see if they, too, are kames.
4. Try to access some of the grid cells that we could not reach during the 2004 field season. Drier weather would help considerably with this task.
5. Confirm or disprove the existence of any landforms that are predicted by the LCS.

Figure 5-8.
We measured the esker on map sheet 093G.066 at two locations. The section shown with the thick red line in (a) had a mean height of 3.63m, and the section shown with the thick red line in (b) had a mean height of 3.75m. The location of the esker is shown by the arrow in (c).

5.5 Summary

Fieldwork conducted during the summers of 2003 and 2004 resolved most of the doubts that I had about the quality of the 1:20,000 TRIM mapping of the study area. We found that the data have sufficient resolution to represent the landforms that are being studied in this project. In particular, my concerns about the locations of eskers in the TRIM data were resolved.

Although we were unable to resolve all of the questions that were set out in 2003, enough of the landforms in question were examined first hand that the remaining unvisited sites are no longer of great concern. Although another season of field studies might be contemplated, I now have a reasonably strong understanding of the landforms and the character of the study area, and further field studies are only warranted if additional critical questions about the study area come to light.

The fieldwork conducted during the summers of 2003 and 2004 helped me to verify the landforms that I had identified on the NetTIN, and laid the foundation for my test of the classification abilities of the LCS.

6 Software Design

The visual perception of topographic form should be possible to simulate using numerical methods and digital elevations (Pike, 1988b, p. 492).

6.1 Introduction

In previous chapters, I presented a broad outline of this project, together with the results of a literature search. I discussed why this project was undertaken, and described the location of the field study.
I described the morphometric variables that I used in this project, showed what they mean in a geographical sense, and briefly described the algorithms for these variables. In this chapter, I deepen the discussion, and present my proposed solution to the problem.

6.2 Overview of Research

In this project, I am developing the Landform Classification System (LCS), which will allow a computer to automatically identify landforms over many map sheets with sufficient rapidity to enable the classification of province-sized areas. I will make use of Facet Decision Systems' Cause&Effect package (Facet, 2004) to construct the LCS.

Within the Prince George study area, three TRIM map sheets (093G.056, 093G.057, 093G.067) covering approximately 420 km² in total will be classified, based on the landforms identified on a fourth map sheet (093G.066). The user of the LCS delineates landforms using a three-dimensional hillshaded view that includes additional superimposed TRIM and field survey data. Although the system will use only one of the four map sheets for training, I will manually delineate landforms on all four sheets, so that the system can compare the results of the classification with this "known" data. Of course, it is impossible to perfectly delineate the landforms, given the variability in landform size and shape and the limitations imposed by the accuracy of the NetTIN, so the classification accuracy is limited by the accuracy of the delineation.

The advantage of this method is that it is not necessary to spend a lot of time examining each landform to determine which morphometric variable values best describe it. Essentially, the user points at the landform, and the ANN examines all of the landforms on the training map to determine which morphometric variable values best describe that landform.

The landform classification depends on the morphometric variables that the system calculates for each triangle in the NetTIN.
These are the key to any classification; if the morphometric variables are calculated incorrectly or I have chosen the wrong set of variables, the system's classification abilities will be limited.

The system will analyze the reduced set of morphometric variables using a Fuzzy ARTMap Artificial Neural Network. This ANN will map combinations of morphometric variables to landform types through a training process on one of the TRIM map sheets. The software will use the trained ANN to create predictions of the landform types for each triangle in the NetTIN based on the morphometric variables on three unclassified map sheets.

Error Matrices (see Appendix B) will help to quantify the accuracy of the predictions, by enabling Overall Accuracy, Producer's Accuracy and User's Accuracy to be calculated (Congalton, 1991). These will be helpful in assessing how well the classifications are working, but are susceptible to overestimating accuracy values, because even a completely random selection of triangles is likely to correctly identify some landforms. To resolve this, the system will employ Kappa Analysis (Congalton, 1991) as the primary tool for the assessment of classification accuracy. Kappa Analysis produces accuracy values that compensate for the agreement expected from a random distribution.

6.3 Software Capabilities

Once I have assembled the data for this project, I will build a series of computer software modules to facilitate the processing and analysis of the data. These will form the core of the Landform Classification System. Figure 6-1 describes the general flow of data through the software; I describe the individual modules in the following section.
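The accuracy measures named in Section 6.2 can be sketched as follows. This is a minimal illustration and not the Error Matrix Module itself: the function name, the row and column convention (rows are the classifier's output, columns are the reference data) and the example counts are all assumptions made for this sketch.

```python
from typing import List, Tuple


def accuracy_measures(
    matrix: List[List[int]],
) -> Tuple[float, List[float], List[float], float]:
    """Return (overall, producer's, user's, KHAT) for a square error matrix.

    Assumed convention: matrix[i][j] is the number of triangles that were
    classified as class i and whose reference ("known") class is j.
    Every class is assumed to have at least one triangle in each total.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # classified totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference totals
    diagonal = [matrix[i][i] for i in range(k)]

    overall = sum(diagonal) / n
    producers = [diagonal[i] / col_totals[i] for i in range(k)]
    users = [diagonal[i] / row_totals[i] for i in range(k)]

    # Agreement expected by chance, from the marginal totals alone.
    chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    khat = (overall - chance) / (1.0 - chance)
    return overall, producers, users, khat


# Hypothetical counts for two classes: landform, not landform.
example = [[40, 10],
           [5, 945]]
overall, producers, users, khat = accuracy_measures(example)
print(f"Overall {overall:.2%}, KHAT {khat:.2%}")
```

With these hypothetical counts the Overall Accuracy is much higher than the KHAT value, because most of the agreement comes from the very large "not landform" class, which is the effect that Kappa Analysis is meant to compensate for.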
[Figure 6-1 flowchart, modules in order: Data Loading and Formatting; Morphometric Variables; Principal Component Analysis; Selection of a Subset of Morphometric Variables; Identification of Landforms for Training; Selection of a Subset of Triangles to Train; Assembly of Data From Different Map Sources; Artificial Neural Network; Calculation of Accuracy.]

Figure 6-1. Overview of software modules and the flow of data through them. The Principal Component Analysis module and the Artificial Neural Network will use external software products.

6.3.1 Data Loading and Formatting

Data loading and formatting are the most basic functions performed by my software. This module will import and format the TRIM data to create a NetTIN. A minimum distance of 5m between points will be used for the mass points, and the breaklines will be filtered so that vertices are no closer than 5m apart. This will ensure that the NetTIN represents the land surface accurately, but does not become too large because of the density of data in the breaklines.

The system will drape supplementary TRIM data including contours, rivers, eskers, roads and water features onto the NetTIN for display. It will also drape symbols representing landform locations that we collected during the field study to provide assistance in identifying drumlins and eskers.

6.3.2 Morphometric Variables

The Morphometrics Module will calculate the 22 variables that were shortlisted in Chapter 2 (Figure 6-2). The software will calculate these by using the information contained either within a single triangle or within a central triangle and its neighbours. All morphometric values will be "returned" to the central triangle, so that a table of values can be built with 22 columns, one for each variable, and as many rows as there are triangles. The order of the rows will correspond to the order of the triangles in the NetTIN.
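As a rough illustration of the single-triangle case described in Section 6.3.2, the sketch below derives slope, aspect and area from the three vertices of one triangle and builds a table with one row per triangle, in triangle order. It is not the Cause&Effect implementation: the function name is hypothetical, and the definitions used here (slope as the angle between the triangle's normal and the vertical, aspect as the compass bearing of the downslope direction, area as the three-dimensional surface area) are assumptions that may differ from those used in the LCS.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x east, y north, z elevation), in metres


def triangle_measures(a: Point, b: Point, c: Point) -> Tuple[float, Optional[float], float]:
    """Return (slope in degrees, aspect in degrees or None if flat, area in m^2)."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    # Normal vector from the cross product of two edges.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz < 0:  # make the normal point upwards, whatever the vertex order
        nx, ny, nz = -nx, -ny, -nz
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("degenerate triangle: vertices are collinear")
    area = length / 2.0
    slope = math.degrees(math.acos(min(1.0, nz / length)))
    # The horizontal part of the upward normal points downslope.
    if math.hypot(nx, ny) == 0.0:
        aspect = None  # a flat triangle has no aspect
    else:
        aspect = math.degrees(math.atan2(nx, ny)) % 360.0  # clockwise from north
    return slope, aspect, area


# Two hypothetical triangles; the table keeps the order of the triangles.
triangles: List[Tuple[Point, Point, Point]] = [
    ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 10.0)),  # rises to the north
    ((0.0, 0.0, 5.0), (10.0, 0.0, 5.0), (0.0, 10.0, 5.0)),   # flat
]
table = [triangle_measures(*t) for t in triangles]
```

Variables that depend on neighbouring triangles (for example, the mean or standard deviation of elevation) would need the adjacency information stored in the NetTIN, which is outside the scope of this sketch.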
[Figure 6-2 screenshot: the morphometrics table for one map sheet (point resolution 5.0m, line resolution 5.0m; only the first 500 records are shown), with one row per triangle and columns including elevation, slope, aspect, area, verticalConvexity, horizontalConvexity, sdElevation, meanElevation and skewnessElevation, and a progress bar reading 30.0% while the table loads.]

Figure 6-2. The Morphometrics Module Display.
Note the progress bar, which allows the user to monitor the progress of morphometric variable generation, which can be time consuming.

6.3.3 Selection of a Subset of Morphometric Variables

Since the objective of using Principal Component Analysis is to reduce the number of morphometric variables that will be used, the LCS contains the Morphometric Sets Module, which allows the user to select a subset of the 22 morphometric variables that were programmed (Figure 6-3). Morphometric Sets are similar to Pike's "geometric signature," which is a combination of morphometric variables and particular ranges of values that work together to uniquely identify a particular landform (Pike, 1998a).

[Figure 6-3 screenshot: the Morphometric Sets tree (Landforms, Drumlin, with entries such as drumlinDiscriminant and drumlinsMod) and the editing window for the set drumlinsMod (category: Continental Glaciation; author: Brad Maguire), documented as "Drumlin columns, as per PCA. X and Y have been removed to prevent problems on non-training sheets".]

Figure 6-3. The Morphometric Sets Module User Interface (left), and the editing window (right), with which morphometric variables can be selected.

The selected morphometric variables will then be passed on to later modules and eventually to the ANN.
In addition to the groups of variables for eskers, kames and drumlins that were identified using PCA, three other groups will be tested:

1. The combined set of morphometric variables that were judged to be most effective for all the landforms combined (see Table 4-3)
2. All of the 22 morphometric variables that have been programmed
3. Significant variables for eskers, kames, and drumlins identified using discriminant analysis, an alternative to PCA.

By comparing the classification accuracies that result from the use of these different sets of variables, I can confirm the utility of the Principal Component Analysis, and can check whether the combined set of morphometric variables works for all of the landforms. Once the most effective set is established, it will be "frozen" and used for production runs.

6.3.4 Identification of Landforms for Training

In order to identify landforms on a NetTIN, the ANN must first be provided with some training data. In this project, the landforms from map sheet 093G.066 are used for training. The Landform Trainer module takes advantage of Cause&Effect's 3D data display capabilities to help identify landforms. By displaying the hillshaded NetTIN, draped TRIM vector data (contours, rivers and lakes) and supplemental information gathered from our fieldwork (mapped eskers and drumlin locations), the identification of landforms becomes much easier (Figure 6-4).

Figure 6-4. Training of landforms on a hillshaded view of a NetTIN. Here drumlins (dark blue areas) are being trained by selecting triangles on a NetTIN. Background information draped onto the NetTIN includes contours (dark grey lines), roads (red lines), streams (dark blue lines) and swamps (light green lines).

Using the Landform Trainer module, a modified version of the Cause&Effect ThreeDee viewer, the user can rotate the NetTIN in all three dimensions and zoom in on areas of interest.
By rotating the NetTIN, the user can gain a better understanding of the three-dimensional character of the model than from hillshading alone. Of course, both the angle of illumination and the vertical exaggeration of the NetTIN can be altered "on the fly." The Landform Trainer can also provide the UTM coordinates of the location that the user points to, or it can provide supplemental information about the individual triangle selected.

For training, the Landform Trainer allows the operator to surround an area with a polygon, and the system can either classify or remove the classification of all the triangles within it. In addition, individual polygons can be toggled on and off. When the user is satisfied that the correct triangles have been selected, the results can be transferred to the Lesson Database for cataloguing and storage (Figure 6-5). This allows a library of training triangles to be built up for each map sheet and landform.

[Figure 6-5 screenshot: the Lesson Database tree of training sets, grouped by landform (Drumlin, Esker, Kame) with one entry per map sheet and resolution, and the editing window for the drumlin training set for sheet 093G.057 at 5m resolution (process: Continental Glaciation; landform type: Drumlin; author: Brad Maguire).]
[Training set details shown in the editing window: map sheet 093G.057; point resolution 5.0; line resolution 5.0; total triangles 33604; selected triangles 4388.]

Figure 6-5. The Lesson Database Module User Interface (left) and the editing window (right), from which training sets are transferred to and from the Landform Trainer.

Although drumlins are relatively easy to identify on the NetTIN, eskers and kames are more difficult, since eskers tend to be quite small and kames are difficult to locate because they are scarce. How this will impact classification accuracies remains to be seen. The classification of drumlins should work quite well, but kames may be missed because they are small and hard to spot. I expect that the larger eskers will be identified, but the classification will become less reliable as the height of the eskers decreases.

6.3.4.1 Exporting Data to the ANN

Just as not all morphometric variables will be passed on to the ANN for training, not all of the triangles will be passed on either. The Export to ANN module produces training sets for the Fuzzy ARTMap ANN. This module takes the selected morphometric variables and the triangles identified by the Landform Trainer to produce a set of data that can be used for training. The data output settings control what is sent on to the ANN for analysis. The data output settings include:

1. Sending all of the triangles on each map
2. Sending only the triangles that have been visually identified as belonging to a landform
3. Sending the triangles that belong to a landform plus an equal number of entries that do not belong to a landform, chosen at random
4. Sending a subset of 1250 triangles that have been visually identified as belonging to a landform, plus an equal number of triangles that do not belong to any landform.
5.
Sending all of the triangles for each map, but zeroing all of the training data; this is only used when triangles are exported for testing, to ensure that the training examples do not "contaminate" the expected results if something goes wrong with the ANN.

In Chapter 7, each of these subsets, except for the last, will be tested to determine which results in the best training accuracies.

6.3.5 Assembly of Data from Different Map Sheets

The Import From ANN module reads the results from the Fuzzy ARTMap ANN, and makes the results available to other modules in the LCS. Although only a single map will be used for the training of the ANN, I will be testing the results on three map sheets. This requires the Assemble Data module to assemble the ANN classification results, the training data for each map sheet and all required NetTINs together. To prevent all these data from becoming mismatched, the system counts the elements of each to ensure that they match. If they do not, the module returns an error message and will not proceed further.

6.3.6 Artificial Neural Network

Originally, I had planned to integrate a Fuzzy ARTMap Artificial Neural Network directly into the LCS, but was unable to find any suitable software that could be easily compiled on a Sun workstation and integrated into the system. Eventually, I chose a version of the Fuzzy ARTMap ANN written by Aaron Garrett at Jacksonville State University in Jacksonville, Alabama (Garrett, 2004) for this thesis. This version runs on a PC under the MATLAB package (Mathworks, Inc., 2004).

Work is underway at Facet Decision Systems to directly integrate a Fuzzy ARTMap ANN into the Cause&Effect language. The ART Gallery, by Lars Liden (Liden, 1995), is currently being modified so that it can be compiled on the Solaris operating system, on which Cause&Effect runs.
Unfortunately, the ART Gallery has no classification routine6, so this will have to be written before the ART Gallery can be employed.

6 This was independently confirmed in October 2004, by Byron Berglund at Facet Decision Systems (Berglund, 2004).

6.3.7 Calculation of Accuracy

In order to objectively assess the accuracy of the results from the ANN and to ensure that incremental changes to the system continue to improve the accuracy values, I have constructed the Error Matrix Module to automatically report accuracy levels. This module compares the known classifications for one or more sheets with the classifications that are generated by the ANN. These comparisons are used to generate an Error Matrix (Congalton, 1991), which in turn is used to calculate Overall Accuracy, User's Accuracy and Producer's Accuracy. The module also uses the Error Matrix as a data source for Kappa Analysis (Figure 6-6). Note that in Figure 6-6, an Overall Accuracy value of 96.61% produces a K score of only 79.40%. These data can also be displayed as a 3D map, so that their accuracy can be assessed visually (Figure 6-7).

[Figure 6-6 screenshot: the Error Matrix window, reporting Overall Accuracy 96.61% and KHAT Accuracy 79.40%, the Producer's and User's Accuracy for the categories Drumlin and Not Drumlin, and the error matrix of expected results against ANN results with row and column totals.]

Figure 6-6. Window showing Error Matrix in Landform Classification System. Overall, K, User's and Producer's Accuracy values are also displayed.

Figure 6-7. Three-dimensional map view of classified results. Sheet 093G.066 (left half) was used for training and sheet 093G.067 (right half) was used to visually examine which areas were classified.
The operator can rapidly rotate the image in any plane to obtain a better understanding of where on the landforms classified triangles were placed.

6.4 Summary

A prototype software package has been constructed to facilitate the evaluation of procedures that will enable the automated classification of landforms to take place. The system allows users to identify landforms on the surface of a NetTIN, and the morphometric variables that have been calculated for these landforms are used to train a Fuzzy ARTMap Neural Network. The trained ANN can then be used to classify morphometric variables that have been calculated for other map sheets.

This system contains a module to automatically calculate Error Matrices, Producer's, User's, and Overall Accuracies, as well as perform Kappa Analysis. This module will allow the immediate evaluation of changes to the software parameters, such that the accuracy of the ANN analysis can gradually be "ratcheted up" until the system classifies landforms with the greatest accuracy permitted by the software design.

7 Results

"You see those hills, Mr. Palmer?" "I do." "You may not know it, but it is possible from their contours, the lack of trees, and their general appearance to say that they are made of chalk." "That is a remarkable skill. I congratulate you-" "No, No, that is not the point. It is the pattern that is everything." (Winchester, 2001, p. 98).

7.1 Introduction

By October 2004, I had completed the Landform Classification System to its original design, which was described in Chapter 6. Seven parameters controlled the accuracy of the model, and through a "ratcheting" process, I was able to systematically modify each to improve the overall results. Training and classification times for the ANN were exceedingly long, averaging about 24 hours, and accuracy values were very low.
I altered the system several times, and was eventually able to improve classification accuracies significantly.

7.2 The "Ratcheting" Process

Ratcheting is a process to gradually improve classification accuracies by systematically altering values and retaining only the changes that prove to be beneficial. Just as a ratchet wrench can only turn in one direction, the ratcheting process ensures that I make constant, incremental improvements to accuracy levels.

There were seven parameters to be tuned; the most effective way to determine optimal values for each was to cycle through them and alter one parameter while holding the remaining six constant. The seven parameters that I altered were:

• The density of line and point data used to make up the NetTIN
• The number of triangles presented in the training sets
• The morphometric variables presented to the ANN for training
• The Base Vigilance setting used for the Fuzzy ARTMap Artificial Neural Network
• The learning rate used by Fuzzy ARTMap
• The number of votes (1, 3, or 5) used in a voting system
• The choice of a post-processing filter to improve the appearance and accuracy of the ANN results

Because drumlins were the easiest and most plentiful landforms to identify, I decided to use only drumlins for the tests. This assumes that changes made to improve drumlin classification accuracies will also be applicable to kames and eskers7. I restricted my initial tests to map sheet 093G.066, and added sheets 093G.056, 093G.057 and 093G.067 at a later stage. Although this was somewhat time consuming, it was the only way that I could be sure that the classification worked well on multiple sheets; in early tests, I found that classifications that worked well on one sheet did not necessarily work well on another.
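The one-parameter-at-a-time search described in this section can be sketched as follows. This is only an illustration under stated assumptions: the function name, the parameter names and the toy scoring function are hypothetical, and in practice each score would be a K value obtained from a full (and slow) training and classification run.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def ratchet(
    initial: Params,
    candidates: Dict[str, List[float]],
    score: Callable[[Params], float],
    max_cycles: int = 5,
) -> Tuple[Params, float]:
    """Alter one parameter at a time and keep only the changes that raise the score."""
    best = dict(initial)
    best_score = score(best)
    for _ in range(max_cycles):
        improved = False
        for name, values in candidates.items():
            for value in values:
                if value == best[name]:
                    continue
                trial = dict(best)
                trial[name] = value  # all other parameters are held constant
                trial_score = score(trial)
                if trial_score > best_score:  # the ratchet never turns backwards
                    best, best_score = trial, trial_score
                    improved = True
        if not improved:
            break  # a full cycle produced no gain
    return best, best_score


# Toy stand-in for an accuracy measure; it peaks at vigilance 0.2 and 3 votes.
def toy_score(p: Params) -> float:
    return -abs(p["base_vigilance"] - 0.2) - abs(p["votes"] - 3)


best, best_score = ratchet(
    {"base_vigilance": 0.5, "votes": 1},
    {"base_vigilance": [0.0, 0.2, 0.5, 0.9], "votes": [1, 3, 5]},
    toy_score,
)
```

A search of this kind treats the parameters as independent, so it can miss combinations of settings that are only beneficial together.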
Although each parameter change showed promise when made individually, the results were disappointing when I combined the individual improvements. This is likely because two or more of the parameters work in combination, so it is not possible to isolate them using the "ratcheting" process.

7 In hindsight, this may not have been a good assumption, and may be one of the reasons why the drumlin results presented below are much better than for kames and eskers.

7.2.1 Point Density Changes

There are two parameters that control the number of triangles that are created in the NetTIN for each map sheet. These are the density of the mass points and the point density in the breaklines from which the network is created. Mass points are irregularly spaced, but relatively evenly distributed points that are used to create the general land surface in the NetTIN. The breakline points are then used to create a network, which helps to give the NetTIN better definition. From now on, these two parameters will be simply referred to as the point density and line density values.

In Section 3.4.1, my examination of the landforms to be used for this thesis led me to calculate that a minimum resolution of 30.5m in the X and Y axes and less than 1.0m in the Z-axis would be sufficient to allow landforms to be identified. Since the published accuracy value for the gridded TRIM data was 25.0m, I adopted an initial default point density of 25.0m and line density of 10.0m. The line density value was based on an estimate, since I knew that the breaklines had higher point densities than the mass points.

In order to determine appropriate, but still manageable resolution settings for this application, I varied the line and point resolution values to determine how many triangles were created in the NetTIN in each case.
When I reduced the point and line density values from 25.0m and 10.0m respectively to 1.0m each, I was surprised to find that the number of triangles in the NetTIN grew nearly sixfold (Table 7-1). Although using the published TRIM DEM accuracy values seemed to make sense as point and line resolution settings, this exercise showed that it is important not to confuse DEM accuracy values with irregular point density values.

The NetTIN created with the 1m point and line densities had a vastly better appearance than the one that was created using the 25.0m values, and allowed eskers to be visually identified much more easily. Drumlins, tending to be larger, were not as affected by this increase in the resolution (see Figure 3-9, p. 39). When the resolution was increased, some very small drumlins were added and the boundaries of some of the larger drumlins needed to be altered, but overall, the drumlins defined at a lower resolution were a good representation of what was visible at a higher resolution (see Figure 3-9, p. 39). This shows that the NetTIN represents the land surface quite well, even when the number of points is varied significantly.

Although it is tempting to use every single mass and breakline point, calculating morphometric variables for the huge number of resulting triangles takes many hours and frequently causes the application to crash. I was able to create the NetTIN and calculate the neighbouring triangles for all map sheets at a 1.0m resolution, but I was unable to complete the morphometric variable calculations for even a single sheet.8

A line and point resolution of 5.0m was chosen for the NetTINs to be created for the final analysis, because it provides relatively good resolution without creating an unduly large NetTIN. At a 5.0m point and breakline density,

8 Running on a 4 processor Sun workstation with 40 Gb of memory, these calculations caused Cause&Effect to crash four times in a row.
This is likely the result of a "memory leak" problem that is known to exist in version 3.6 of Cause&Effect, and is a good argument in favour of upgrading to version 4.0 in a future release. Version 4.0 should also significandy improve the speed of calculations. Brad Maguire, UBC Geography April 21,2005 Towards a Landform Geodatabase 94 eskers can still be seen relatively clearly on the D E M , which implies that this resolution is sufficiently high for my analysis. In an ideal situation, I would run a complete analysis for every entry shown in Table 7-1 to assess the effect of different point and line resolutions on classification accuracy. However, since the process of creating a NetTIN (1-8 hours depending on resolution), calculating morphometric variables (1-12 hours), and defining training sets (1 day) is so time-consuming, this is not really a practical series of tests to run. I ran analyses with three different mass point and breakline density settings, however, and the results are presented in Table 7-2. Resolution Changes for Sheet 093G.066 Number of Triangles Created Point Density (m) Line Density (m) 10.0 7.5 5.0 1.0 25.0 21172 N/A 31164 77160 20.0 21401 N/A 31343 77329 15.0 21927 2541*5 31847 77773 10.0 23576 N/A 33372 79270 5.0 30608 N/A 40924 86174 1.0 59048 N/A 68962 115106 Table 7-1. Resolution Changes for Sheet 093G.066. The accuracy values for the highlighted entries are tested in Table 7-2. Although Table 7-2 shows that the highest resolution data result in the best accuracies, it is not clear from the limited number of tests where we obtain the best trade-off between resolution and accuracy. Considering that there is only a 2.15% increase in the K value between the 15/7.5m test and the 5.0/5.0m test, it seems likely that the optimal value lies somewhere between these values, but this cannot be accurately determined without further tests, so I have adopted the 5.0m line and point resolution settings for the remainder of this thesis. 
Training Set                      Mass Point      Breakline       Base       Learning  Overall   KHAT
                                  Resolution (m)  Resolution (m)  Vigilance  Rate      Accuracy  Accuracy
Sht6667_25m_finalSetXY_allCols.m  25.0            10.0            0.2        1.0       87.95%    48.02%
Sht6667_15m_finalSetXY_allCols.m  15.0            7.5             0.2        1.0       89.36%    64.00%
Sht6667_5m_finalSetXY_allCols.m   5.0             5.0             0.2        1.0       89.69%    66.15%

Table 7-2. Overall and K accuracies for low, medium and high-resolution data sets.

7.2.2 Training Sets

In order for the Fuzzy ARTMap ANN to learn which groups of morphometric variables represent which landforms, it must be trained. Training consists of presenting the ANN with a number of examples. Each example consists of a single triangle's known classification (esker, kame, drumlin, or not classified) together with its selected morphometric variable values. After the ANN has been presented with many examples, it gradually learns to "see" what the different landforms look like.

There are many ways in which the examples can be presented to the ANN. Each group of examples to be used for training is referred to as a "training set." Twelve training sets were examined to see which produced the highest accuracy results. The following list is a summary of the contents of each training set:

• The All Rows training set exports data for all the triangles to the ANN.
• The Training Only training set exports only those triangles that "belong" to the landform. It excludes all triangles that do not "belong" to the landform.
• The Training + Random set doubles the number of triangles in the Training Only set; the new triangles are chosen at random from the unclassified triangles.
• Train_250 through Train_5000 are subsets of the Training + Random training set. In these sets, 250 to 5000 rows are taken from the "Training" part and an equal number are taken from the "Random" part.

Table 7-3 shows the results for drumlins when I tested these training sets on map sheets 093G.056, 093G.057 and 093G.067 after training on 093G.066.

Training Set (file)                     Landform  Variable Set  Training Set       Overall Accuracy  KHAT Accuracy
sheet66_drumlins_drumlinsAll.m          Drumlins  drumlins      All Rows           62.30%            10.37%
sheet66_drumlins_drumlins_trainto.m     Drumlins  drumlins      Training only      21.04%            0.00%
sheet66_drumlins_drumlins_traintr.m     Drumlins  drumlins      Training + Random  50.80%            3.59%
sheet66_drumlins_drumlins_train_250.m   Drumlins  drumlins      train_250          64.92%            16.99%
Sheet66_drumlins_drumlins_train_500.m   Drumlins  drumlins      train_500          44.51%            2.76%
sheet66_drumlins_drumlins_train_750.m   Drumlins  drumlins      train_750          70.79%            13.93%
sheet66_drumlins_drumlins_train_1000.m  Drumlins  drumlins      train_1000         50.15%            3.99%
sheet66_drumlins_drumlins_train_1250.m  Drumlins  drumlins      train_1250         66.74%            14.84%
sheet66_drumlins_drumlins_train_1500.m  Drumlins  drumlins      train_1500         61.78%            16.87%
sheet66_drumlins_drumlins_train_1750.m  Drumlins  drumlins      train_1750         43.84%            2.41%
sheet66_drumlins_drumlins_train_2000.m  Drumlins  drumlins      train_2000         52.18%            6.45%
sheet66_drumlins_drumlins_train_4000.m  Drumlins  drumlins      train_4000         60.53%            11.37%
sheet66_drumlins_drumlins_train_5000.m  Drumlins  drumlins      train_5000         55.34%            10.04%

Table 7-3. Overall and K accuracies for different training sets, with all other variables remaining equal.

Table 7-3 shows no clear association between accuracy values and the number of rows used for the training.
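The construction of the Train_250 through Train_5000 sets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the row representation and the use of random sampling within each part are hypothetical, since the text specifies only that an equal number of rows is taken from the "Training" and "Random" parts.

```python
import random

def build_balanced_training_set(landform_rows, unclassified_rows, n_rows, seed=0):
    """Take n_rows from the landform ("Training") part and an equal number
    from the unclassified ("Random") part, then mix them.

    Assumption: rows are drawn at random within each part; the thesis does
    not state how the rows of each subset were chosen.
    """
    if n_rows > len(landform_rows) or n_rows > len(unclassified_rows):
        raise ValueError("not enough rows to build a balanced set")
    rng = random.Random(seed)
    training_part = rng.sample(landform_rows, n_rows)
    random_part = rng.sample(unclassified_rows, n_rows)
    combined = training_part + random_part
    rng.shuffle(combined)  # avoid presenting all landform rows first
    return combined

# Hypothetical rows: (known classification, triangle id)
landform = [("drumlin", i) for i in range(1000)]
unclassified = [("unclassified", i) for i in range(5000)]
train_250 = build_balanced_training_set(landform, unclassified, 250)
```

Keeping the two parts the same size means that neither class dominates the examples presented to the ANN, whatever the total number of rows.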
An earlier run using a different training set (Table 7-4) showed a clear relationship between the K accuracy and the number of rows that were used in the training set. For this reason, I employed 1250 rows in producing the results; since the new results call this into question, I will have to re-examine the number of rows in the training set later.

Training Set (file)                      Landform  Variable Set          Training Set  Overall Accuracy  KHAT Accuracy
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_250     55.15%            7.82%
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_500     55.07%            6.13%
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_750     56.21%            7.27%
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_1250    56.95%            9.26%
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_1500    56.05%            7.73%
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_1750    55.27%            7.09%
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_2000    55.79%            7.80%
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_4000    54.46%            7.53%
sheet565767_drumlin_discriminant_test.m  Drumlins  Drumlin_discriminant  Train_5000    54.09%            6.71%

Table 7-4. A previous run of the analysis presented in Table 7-3 with a different variable set showed a clear peak in K accuracy at 1250 training rows.

I created the Train_250 through Train_5000 sets after I discovered a second way in which a Fuzzy ARTMap Neural Network can memorize data. Recall that in Section 3.8, I discussed how memorization can occur if the vigilance parameter in the ANN is set too high. This is not the only way in which the Fuzzy ARTMap ANN can memorize training data.
Even with low vigilance values, I have found that Fuzzy ARTMap may memorize training data when presented with many training sets (Figure 7-1). The solution is to significantly reduce the number of rows that I used to train the ANN.

Figure 7-1. Map sheets 093G.066 (left) and 093G.067 (right) showing differences in classification accuracy. The ANN classified sheet 093G.066 perfectly because the results were memorized, but sheet 093G.067 is only sparsely classified, missing entire drumlins on its west side.

7.2.3 Morphometric Variables

After encountering problems with the settings that came out of the ratcheting process, I took a second look at the morphometric variables used in the LCS, in an effort to improve the classification accuracies produced by the system. This involved modifying some morphometric variables, and reconsidering the variable sets that were used for the training of the ANN.

7.2.3.1 Changes to Selected Variables

Although the Principal Component Analysis indicated that the Y-Coordinate was a significant morphometric variable, I later realized that it led to higher accuracies on 093G.066, but at the expense of the accuracies on all other sheets. The reasoning that led to the inclusion of the X and Y-Coordinates in my initial list of morphometric variables was that the ANN would be able to compare the coordinates of classified triangles with those of unclassified triangles in the same area. If a triangle is not classified by the ANN, but is surrounded by 100 triangles that are, surely the ANN would recognize that most features in that area are classified, and would change the classification of the single triangle.

This logic has three faults. First, the classification is done in a single pass by the ANN; it looks at all the values, and determines whether each triangle falls into a "classified" or an "unclassified" cluster.
The ANN never goes back to reconsider the cluster to which it has assigned a particular triangle. Second, all variables are treated equally. For the ANN to override its classification based on the X and Y-Coordinates, it would have to elevate the importance of these two morphometric variables above the other variables in the system. Third, the ANN uses absolute values, not relative values. This means that when the ANN is trained that a triangle at a particular X and Y-Coordinate should be classified, that "knowledge" cannot be transferred to another map sheet, because triangles on that map sheet will have different X and Y-Coordinates. When I included the Y-Coordinate in the drumlin classification, the results contained horizontal bands of classified triangles on map sheet 093G.067, to the east of the map sheet used for learning.

The same logical errors stymied my efforts to use the Topographic Component to remove unclassified triangles. In this case, I was assuming that if a majority of triangles within a topographic component were classified as belonging to a landform, the ANN would correct the remaining values, so that all triangles within a topographic component would become either classified or unclassified. Unfortunately, this tended to skew the results on the testing map sheets, particularly when topographic components had the same value on the testing sheet as on the training sheet. The topographic component still has value, but the Fuzzy ARTMap ANN cannot use it to amalgamate triangles. It is used in the initial calculation of some morphometric variables and in the creation of a topographic filter, as described in Section 7.2.7.

A second variable added late in the process was the triangle area.
Since the size of triangles in the NetTIN is inversely proportional to the complexity of the underlying terrain, I added this variable as another way of looking at the surface roughness. Unlike the roughness morphometric variable, the triangle area can be calculated very rapidly by the LCS. This variable proved to be useful, increasing K values by about 4%.

7.2.3.2 Changes to Variable Sets

The LCS allows me to vary the morphometric variables that I present to the ANN. I created a number of different variable sets, in addition to FinalSet, Eskers, Kames and Drumlins, which were the sets that I created with the results of the Principal Component Analysis (PCA).

AllColumns includes every morphometric variable that I programmed into the LCS. This variable set was created early on as a "sanity check," to make sure that the variable sets chosen using PCA (FinalSet and Drumlins) resulted in higher accuracy values than all of the variables used together.

Because of the problems that I have found with the use of the X and Y-Coordinates as morphometric variables, I created the EskersMod, KamesMod and DrumlinsMod variable sets. These are the same as the Eskers, Kames and Drumlins variable sets, except that I removed the X and Y-Coordinates. FinalSetHistoric, which resulted from an error in the choice of input variables based on Principal Component Analysis, was preserved because it actually resulted in high K accuracy values.

A suggestion from Dr. Brian Klinkenberg to look at discriminant analysis resulted in a completely different set of variables. I used a MATLAB (Mathworks, 2004) program (Kiefte, 1999) to perform discriminant analysis on the data produced by the LCS. Because the LCS already exports data to MATLAB, it was relatively easy to reuse these data within the discriminant analysis program. From the discriminant analysis, I produced two data sets: DrumlinDiscriminant and DrumlinDiscriminant2.
To create DrumlinDiscriminant, I selected a cut-off correlation value of 0.7, which yielded 3 variables (see the list below). I created DrumlinDiscriminant2 from the same analysis, but reduced the cut-off correlation value to 0.5, which yielded 10 variables. Unfortunately, as with the variables chosen with the help of PCA, these morphometric variables did not work particularly well with the neural net classification. The K score achieved by DrumlinDiscriminant2 was 7.10% (with no filter in place). The results for DrumlinDiscriminant were even lower. Both were much worse than the results produced by the Drumlins variable set, which I created using the results from PCA.

A separate attempt was made to select the variables visually, using the images produced in Chapter 4. For this test, I chose those variables whose images best showed the drumlins. These included slope, mean orientation, drainage density and positive openness.

In total, twelve sets of variables were defined. I assigned the following variables to each variable set:

• FinalSet: elevation, slope, skewness of elevation, mean orientation, horizontal convexity, vertical convexity, mean available relief, drainage relief, peak density, drainage density, positive openness, negative openness, and Y-Coordinate
• Eskers: slope, aspect, skewness of elevation, roughness factor, mean orientation, standard deviation of orientation, vertical convexity, mean available relief, drainage relief, peak density, drainage density, positive openness, and X-Coordinate
• Kames: elevation, slope, aspect, mean elevation, skewness of elevation, roughness factor, mean orientation, horizontal convexity, drainage relief, drainage density, positive openness, negative openness, and Y-Coordinate
• Drumlins: slope, aspect, standard deviation of elevation, mean elevation, skewness of elevation, roughness factor, mean orientation, standard deviation of orientation, horizontal convexity, vertical convexity, drainage relief, peak density, ridginess, X-Coordinate, and Y-Coordinate
• AllColumns: elevation, slope, aspect, standard deviation of elevation, mean elevation, skewness of elevation, roughness factor, mean orientation, standard deviation of orientation, horizontal convexity, vertical convexity, mean available relief, drainage relief, peak density, ridginess, drainage density, positive openness, negative openness, X-Coordinate, and Y-Coordinate
• EskersMod: slope, aspect, skewness of elevation, roughness factor, mean orientation, standard deviation of orientation, vertical convexity, mean available relief, drainage relief, peak density, drainage density, and positive openness
• KamesMod: elevation, slope, aspect, mean elevation, skewness of elevation, roughness factor, mean orientation, horizontal convexity, drainage relief, drainage density, positive openness, and negative openness
• DrumlinsMod: slope, aspect, standard deviation of elevation, mean elevation, skewness of elevation, roughness factor, mean orientation, standard deviation of orientation, horizontal convexity, vertical convexity, drainage relief, peak density, and ridginess
• FinalSetHistoric: elevation, slope, skewness of elevation, mean orientation, horizontal convexity, vertical convexity, peak density, positive openness, negative openness, and Y-Coordinate
• DrumlinDiscriminant: roughness, positive openness, negative openness
• DrumlinDiscriminant2: aspect, roughness factor, mean orientation, standard deviation of orientation, horizontal convexity, vertical convexity, positive openness, negative openness
• VisualSelection: slope, drainage density, positive openness

Table 7-5 shows the accuracy values for seven of the variable sets that pertain to drumlins. I trained the ANN with each variable set, using map sheet 093G.066 for training, and sheets 093G.056, 093G.057 and 093G.067 for testing.
In Table 7-5, the variable set Drumlins, which is landform-specific, resulted in the highest K accuracy score. The K accuracy value of DrumlinsMod is 1.09% lower than that for the Drumlins variable set. I expect that this slight drop in accuracy values on map sheets close to the sheet used for training will become less pronounced for map sheets further away from the training sheet, as the deleterious effects of having the X and Y-Coordinates in the variable set become more pronounced.

Training Set (file)                         Landform  Variable Set          Training Set  Filter  Overall Accuracy  KHAT Accuracy
Drumlin66_finalSetHistoric_train1250.m      Drumlin   FinalSetHistoric      Train1250     None    56.62%            5.66%
Drumlin66_finalSet_train1250.m              Drumlin   FinalSet              Train1250     None    61.96%            7.68%
Drumlin66_drumlins_train1250.m              Drumlin   Drumlins              Train1250     None    66.72%            16.29%
Drumlin66_drumlinsMod_train1250.m           Drumlin   DrumlinsMod           Train1250     None    67.38%            15.20%
Drumlin66_allColumns_train1250.m            Drumlin   AllColumns            Train1250     None    69.82%            8.01%
Drumlin66_drumlinDiscriminant2_train1250.m  Drumlin   DrumlinDiscriminant2  Train1250     None    56.63%            7.10%
Drumlin66_visualSelection_train1250.m       Drumlin   VisualSelection       Train1250     None    54.87%            4.78%

Table 7-5. Overall and K accuracies for different sets of morphometric variables, with all other parameters remaining equal.

It is in some ways unfortunate that the Drumlins variable set produced the highest accuracy values. Because each landform-specific variable set uses different morphometric variables, the LCS will be limited to classifying only a single landform at a time, unless two or more landforms coincidentally make use of the same set of morphometric variables. This has two implications.
The first is that one of the key advantages of the Fuzzy ARTMap ANN is no longer relevant: if only a single landform can be classified at a time, then there is no advantage in the ability of the Fuzzy ARTMap to perform incremental training, since there will never be a second landform added to the Artificial Neural Network. Second, although classifications are quite rapid, having to perform many, possibly hundreds, of classifications in succession will adversely affect the final performance of the Landform Classification System.

7.2.4 Base Vigilance

Base Vigilance (ρ̄) is a parameter used by the Fuzzy ARTMap ANN to control the size and number of clusters that are created to represent the input values. The base vigilance allows the range of vigilance values to be "tuned" before the ANN is run. Vigilance values (ρ) are adjusted automatically within the ANN, so that small clusters can be created for unusual combinations of input values and large clusters can be created for more common combinations.

Vigilance is a measure of how closely the Fuzzy ARTMap Artificial Neural Network matches a triangle's parameters with a cluster of values. High vigilance values lead to the creation of many clusters, and reduce the amount of generalization that the ANN uses in representing the input classes. Taken to an extreme, high vigilance values lead to the ANN "memorizing" the input data, where the ANN recognizes nothing but the learned examples. As vigilance values are reduced, more and more generalization is permitted, but the probability of classification errors also increases (Carpenter et al., 1997).

Table 7-6 shows how decreasing the base vigilance value increases the classification accuracies for map sheets 093G.066 and 093G.067, when 093G.066 was used to train the Artificial Neural Network.
Training Set                                                          Landform  Map Sheets          Base Vigilance  Overall Accuracy  KHAT Accuracy
Sheet_6667_allRows_finalSetHistoric_Zeroed_sigmoidal_normalization.m  Drumlins  093G.066, 093G.067  0.75            82.91%            32.70%
Sheet_6667_allRows_finalSetHistoric_Zeroed_sigmoidal_normalization.m  Drumlins  093G.066, 093G.067  0.5             84.62%            35.46%
Sheet_6667_allRows_finalSetHistoric_Zeroed_sigmoidal_normalization.m  Drumlins  093G.066, 093G.067  0.4             85.68%            38.15%
Sheet_6667_allRows_finalSetHistoric_Zeroed_sigmoidal_normalization.m  Drumlins  093G.066, 093G.067  0.2             85.82%            38.33%
Sheet_6667_allRows_finalSetHistoric_Zeroed_sigmoidal_normalization.m  Drumlins  093G.066, 093G.067  0.1             85.82%            38.33%

Table 7-6. Effect on Accuracy of Alterations to the Base Vigilance Level.

Decreasing the base vigilance level from the default value of 0.75 has a beneficial effect on prediction accuracy, but reducing the value below 0.2 does not increase the Overall or K Accuracy values any further. A vigilance value of 0.2 seems to be a reasonable compromise between processing time and accuracy, so I have used this to produce the final results.

7.2.5 Learning Rate

The Learning Rate (β) controls the speed at which the ANN incorporates new data. If the rate is set to 0.0, no learning occurs whatsoever. Conversely, if the learning rate is set to 1.0, learning is immediate. Within the ANN, the learning rate parameter controls the proportion of existing values to update values, so if the learning rate were 0.5, the updated value would be calculated by taking half of the existing value and half of the updated value (Carpenter et al., 1997).

Most authors who use the Fuzzy ARTMap Artificial Neural Network employ a "fast learning" system, where the learning rate (β) is set to 1.0 so that the Artificial Neural Network learns each example presented to it on the first pass.
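The vigilance test of Section 7.2.4 and the learning-rate update described above can be illustrated with a short Python sketch. It assumes the commonly published Fuzzy ART formulation, in which an input is compared with a cluster's weights using the element-wise minimum; the thesis does not reproduce these equations, so the function names and the example values are hypothetical and the code is not the LCS implementation.

```python
def fuzzy_and(pattern, weights):
    """Element-wise minimum of an input pattern and a cluster's weights."""
    return [min(p, w) for p, w in zip(pattern, weights)]

def match_value(pattern, weights):
    """Degree of match between a pattern and a cluster (0.0 to 1.0).

    Assumes inputs are scaled to [0, 1] and the pattern is not all zeros.
    """
    return sum(fuzzy_and(pattern, weights)) / sum(pattern)

def passes_vigilance(pattern, weights, vigilance):
    """A cluster is accepted only if its match reaches the vigilance level."""
    return match_value(pattern, weights) >= vigilance

def update_weights(pattern, weights, learning_rate):
    """Blend the existing weights with the updated values.

    learning_rate = 1.0 is "fast learning"; 0.0 leaves the weights unchanged.
    """
    target = fuzzy_and(pattern, weights)
    return [learning_rate * t + (1.0 - learning_rate) * w
            for t, w in zip(target, weights)]

# Hypothetical two-variable example
pattern = [0.2, 0.8]
weights = [0.6, 0.6]
match = match_value(pattern, weights)            # 0.8
accepted_low = passes_vigilance(pattern, weights, 0.75)
accepted_high = passes_vigilance(pattern, weights, 0.9)
half_learned = update_weights(pattern, weights, 0.5)
fast_learned = update_weights(pattern, weights, 1.0)
```

With a learning rate of 0.5, each new weight is half of the existing value plus half of the updated value, which is the behaviour described in the text; raising the vigilance from 0.75 to 0.9 causes the same cluster to be rejected, which would force the network to look for (or create) a tighter cluster.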
If the input data are perfect, this is obviously the best way to go, but what happens when the input data have errors? If the last training triangle presented to the ANN is misclassified, how will this affect the accuracy of the classification, and will a reduction in the learning rate reduce the impact of incorrect data on the classification accuracy?

To answer this question, I tried reducing the learning rate from 1.0 to 0.1 to see whether the change would improve the accuracy of the final classification. The training set for drumlins on 093G.066 was used to train the ANN, which was then tested on 093G.066 and 093G.067 to assess the classification accuracy. Table 7-7 shows the effects of the learning rate changes on overall and K accuracy values.

Training Set                             Landform  Map Sheets          Base Vigilance  Learning Rate  ANN Training Time (CPU seconds)  Overall Accuracy  KHAT Accuracy
Sheet_66_allRows_final13vars_theonorm.m  Drumlins  093G.066, 093G.067  0.2             1.00           ~5000                            89.89%            54.38%
Sheet_66_allRows_final13vars_theonorm.m  Drumlins  093G.066, 093G.067  0.2             0.99           16849                            90.05%            54.62%
Sheet_66_allRows_final13vars_theonorm.m  Drumlins  093G.066, 093G.067  0.2             0.97           20944                            90.08%            54.89%
Sheet_66_allRows_final13vars_theonorm.m  Drumlins  093G.066, 093G.067  0.2             0.95           20797                            89.93%            54.28%
Sheet_66_allRows_final13vars_theonorm.m  Drumlins  093G.066, 093G.067  0.2             0.90           25982                            90.00%            54.42%
Sheet_66_allRows_final13vars_theonorm.m  Drumlins  093G.066, 093G.067  0.2             0.85           26940                            89.98%            54.61%
Sheet_66_allRows_final13vars_theonorm.m  Drumlins  093G.066, 093G.067  0.2             0.80           ~40000                           89.78%            54.13%

Table 7-7. Effects of learning rate changes on classification accuracy.

As I decreased the learning rate, the training time increased steadily, but the accuracy levels changed very little (Figure 7-2). The highest K value was 54.89% at a learning rate of 0.97.
However, the trade-off for that 0.61% increase in accuracy was a more than fourfold increase in training time when compared with a learning rate of 1.0. I will employ a learning rate of 0.97 to produce the results only if I can train the system quickly with this setting; otherwise I will use a value of 1.0.

Figure 7-2. Graphs of accuracy and training time changes for various learning rates. (Axes: Accuracy, 0.00% to 100.00%, against Learning Rate, 0.80 to 1.00; series: Overall Accuracy and KHAT Accuracy.)

7.2.6 Voting

The classification results of Fuzzy ARTMap and other Artificial Neural Networks are dependent on the order in which training data are presented to them. One way to reduce this dependence is to use a "voting" system, in which the results from multiple classifications are amalgamated, and the most common choice is returned (Carpenter et al., 1997).

I made two modifications to the Landform Classification System to allow for voting. The first was to alter the routine that exports training data to the ANN, so that the system could create multiple versions of the training data with different orders. When this is used, the Cause&Effect shuffle(<AnyList>, 0) function shuffles the training data many times to produce an odd number (typically 3 or 5) of alternative output files. An odd number of files is necessary to avoid the possibility of a tied vote. The second was to allow multiple result sets to be loaded and combined, once the classification is complete, to produce a final result set. In this way, the most common result becomes the final classification for each triangle.
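The combination step described above can be sketched as a simple majority vote. This is a minimal illustration in Python rather than the Cause&Effect routine used in the LCS; the function name and the dictionary representation of a result set (triangle identifier mapped to its classification) are assumptions made for the example.

```python
from collections import Counter

def majority_vote(result_sets):
    """Combine an odd number of result sets into a final classification.

    Each result set maps a triangle id to its classification. With two
    possible classifications, an odd number of runs cannot produce a tie.
    """
    if len(result_sets) % 2 == 0:
        raise ValueError("use an odd number of runs to avoid tied votes")
    final = {}
    for triangle_id in result_sets[0]:
        votes = Counter(run[triangle_id] for run in result_sets)
        final[triangle_id] = votes.most_common(1)[0][0]
    return final

# Three hypothetical runs trained on differently shuffled data
runs = [
    {1: "drumlin", 2: "unclassified", 3: "drumlin"},
    {1: "drumlin", 2: "drumlin", 3: "unclassified"},
    {1: "unclassified", 2: "unclassified", 3: "unclassified"},
]
combined = majority_vote(runs)
```

Each run costs a full training pass, so the number of runs multiplies the training time directly.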
Table 7-8 shows that there is more than a 12% increase in the K values as we move from a single run to the majority of 3 runs, and a further jump of about 3% as we move from the majority of 3 to the majority of 5 runs. The trade-off for these accuracy improvements is a tripling (or quintupling) of the ANN training times. Because of this, I will limit the voting to the majority of 3 runs when performing the final analysis.

Training Set                              Landform  Map Sheets          Base Vigilance  Learning Rate  Overall Accuracy  KHAT Accuracy  Comment
Sht6667_15m_finalSet_allRows_Zeroed.m     Drumlins  093G.066, 093G.067  0.2             1.0            89.78%            54.13%         Single run
Sht6667_15m_finalSet_allRows_Zeroed_v3.m  Drumlins  093G.066, 093G.067  0.2             1.0            90.42%            66.60%         Combined results for 3 votes
Sht6667_15m_finalSet_allRows_Zeroed_v5.m  Drumlins  093G.066, 093G.067  0.2             1.0            91.51%            69.45%         Combined results for 5 votes

Table 7-8. Overall and K accuracies for a single run, combined results from 3 votes, and combined results from 5 votes.

7.2.7 Filters

I originally designed the Landform Classification System to amalgamate groups of triangles by using their X and Y-Coordinates to determine the proximity of each triangle to its neighbours. Unfortunately, this approach was unsuccessful, as it did not cause the triangles to agglomerate, and additionally caused classification problems on map sheets other than 093G.066.

Because the Landform Classification System now bases its results on individual triangles, the results do not form uniform areas (Figure 7-3a). What we see are clusters of triangles that have missing triangles in the centre, and many individual triangles that are found in isolation. These results are not very intuitive. Joseph K. Berry has suggested that this might be because TINs maximize the difference between adjoining triangles as a byproduct of the Delaunay triangulation, since the algorithm minimizes the amount of variation within each triangle (Berry, 2005). The patchwork results produced by the LCS are a problem of both aesthetics and classification accuracy.
Three filters were written to aggregate triangles that are close together into uniform areas, and to remove isolated triangles (Figure 7-3b, c, and d).

Figure 7-3. Effect of filters on a section of map sheet 093G.056. The raw data (a) has many disaggregated triangles; majority_filter_thin (b) produces the best accuracy improvements by removing a majority of the lone triangles and doing some aggregation. majority_filter_thick (c) removes fewer triangles and aggregates more of them to produce a better looking, but less accurate, result than majority_filter_thin. The topographic filter (d) takes a majority within regions surrounding ridgelines, and produces the most accurate and best looking result.

The majority of filters work by comparing each triangle with its neighbours. This is an ideal use for the half neighbours that were discussed in Chapter 4; the half neighbours of a triangle are those adjoining triangles that share an edge with the triangle in question.

The majority filters work by examining how many of the half neighbours are identified as belonging to a landform. With majority_filter_thin (Figure 7-3b), if a triangle is surrounded by three selected triangles, then it becomes selected, but if only 0 or 1 of its neighbours are selected, then it is removed from the selection. The majority_filter_thick filter (Figure 7-3c) performs two aggregations. If two or three neighbouring triangles are selected, then the central triangle becomes selected. This process is repeated, and then triangles with no selected neighbours are removed. Although majority_filter_thick produces a more visually appealing result than majority_filter_thin, it actually reduces the classification accuracy instead of increasing it, as can be seen in Table 7-9.
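The two majority filters described above can be sketched as follows. This Python version is an illustration only: the adjacency dictionary, the triangle identifiers and the choice to evaluate every triangle against the selection as it stood at the start of each pass are assumptions, since the text does not say whether the LCS updates triangles simultaneously or in sequence.

```python
def count_selected(triangle_id, half_neighbours, selected):
    """Number of edge-sharing neighbours that are currently selected."""
    return sum(1 for n in half_neighbours[triangle_id] if n in selected)

def majority_filter_thin(half_neighbours, selected):
    """Select triangles with three selected neighbours; drop those with 0 or 1."""
    result = set(selected)
    for triangle_id in half_neighbours:
        n = count_selected(triangle_id, half_neighbours, selected)
        if n == 3:
            result.add(triangle_id)
        elif n <= 1:
            result.discard(triangle_id)
    return result

def majority_filter_thick(half_neighbours, selected):
    """Two aggregation passes, then removal of triangles with no selected neighbours."""
    current = set(selected)
    for _ in range(2):
        grown = set(current)
        for triangle_id in half_neighbours:
            if count_selected(triangle_id, half_neighbours, current) >= 2:
                grown.add(triangle_id)
        current = grown
    return {t for t in current
            if count_selected(t, half_neighbours, current) > 0}

# Hypothetical adjacency: triangle 0 shares edges with 1, 2 and 3, and so on
half_neighbours = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
selected = {1, 2, 3, 5}
thin = majority_filter_thin(half_neighbours, selected)
thick = majority_filter_thick(half_neighbours, selected)
```

On this small example the thin filter keeps only the central triangle, while the thick filter fills in every gap, which mirrors the difference in behaviour between panels (b) and (c) of Figure 7-3.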
The Topographic Filter works by providing additional context information, so that each triangle has some "knowledge" of what the area around it looks like. One way of providing this is to locate the ridges and valleys in the NetTIN. Although it is not always correct to assume that a landform consists of a ridge or a valley, for many landforms this is a reasonable assumption. By associating each triangle with the nearest ridge or valley, we are essentially dividing the entire NetTIN into a series of "landform facets." If a threshold percentage of the triangles within a landform facet have already been classified, the Topographic Filter classifies all of the triangles within that facet (Figure 7-3d).

Although filters can do much to clean up the output from the ANN, it should be noted that they are not a panacea. I would much rather have a perfect result come out of the ANN, as a product of careful parameter selection, than have to "patch up" an imperfect result with a filter. For this reason, the changes are first tested with the filtering turned off, and then the filter is activated to see how much it improves the results.

Training Set                       Landform  Map Sheets                    Base Vigilance  Learning Rate  Overall Accuracy  KHAT Accuracy  Comment
Sheet565767_drumlins_train_1500.m  Drumlins  093G.056, 093G.057, 093G.067  0.2             1.00           61.78%            16.87%         No filter
Sheet565767_drumlins_train_1500.m  Drumlins  093G.056, 093G.057, 093G.067  0.2             1.00           62.76%            18.31%         Majority_filter_thin
Sheet565767_drumlins_train_1500.m  Drumlins  093G.056, 093G.057, 093G.067  0.2             1.00           47.97%            10.65%         Majority_filter_thick
Sheet565767_drumlins_train_1500.m  Drumlins  093G.056, 093G.057, 093G.067  0.2             1.00           74.19%            28.62%         Topographic

Table 7-9. Changes in classification accuracies with majority filters.

7.3 Final Classification Accuracies

The following results are based on the DrumlinsMod, EskersMod, and KamesMod groups of morphometric variables. A total of 1250 triangles were used in the training.
The learning parameter for the Fuzzy ARTMap ANN was set to 0.97 and the base vigilance parameter was set to 0.2. The ANN was trained on 093G.066 and tested on 093G.056, 093G.057 and 093G.067. The majority of 3 votes was taken, and the topographic filter was used to clean up the results.

7.3.1 Eskers

Although the K accuracy values for the eskers shown in Table 7-10 are quite low, the results shown in Figure 7-4 are quite intriguing. Figure 7-4b shows a number of landforms that were previously unidentified on map sheet 093G.056.

Overall Accuracy: 95.85%
KHAT Accuracy: 3.99%

Class | Producer's Accuracy | User's Accuracy
Esker | 19.19% | 2.83%
Non-Esker | 96.29% | 99.53%

Table 7-10. Classification accuracies for eskers. Note: these values are based on the classification of map sheets 093G.056, 093G.057 and 093G.066 with the topographic filter (0.5 threshold).

Figure 7-4. Final classification results for eskers. Unfiltered ANN output (a), filtered output (b) and expected eskers (c) for map sheets 093G.056, 093G.057 and 093G.066.

The LCS has identified several previously unknown eskers on map sheet 093G.056 (Figure 7-5). Visually, these are very similar in appearance to the known eskers on the NetTIN, but there is no indication that the photogrammetrist who collected the TRIM data identified these landforms as eskers. It is possible that these features are merely extremely elongated drumlins, but their extremely high length to width ratio is not consistent with other drumlins on the map. I will have to confirm the existence of these eskers by conducting further field studies or by examining airphotos.

Figure 7-5. Close-up hillshaded views of previously unidentified potential eskers. Isometric view at UTM coordinates 510,758E, 5,928,872N (a). Isometric view at UTM coordinates 510,855E, 5,938,199N (b).
Plan view at 511,195E, 5,932,660N (c). Isometric view at 508,299E, 5,931,290N (d). Isometric view at 512,024E, 5,929,368N (e). Isometric view at 510,357E, 5,935,201N (f). All coordinates are UTM Zone 10, NAD 83.

7.3.2 Kames

In the images of the kame classification shown in Figure 7-6, the system has predicted dozens of kames; this prediction disagrees with our field observations. Despite this, the K value calculated in Table 7-11 is 0.00%, which is a result of having no kames in the expected data to which the classified results are compared, as is shown in Figure 7-6c. In cases such as this, the value of the analysis is a result of being able to make new, focused predictions of landform locations that can be verified using other methods, as has been demonstrated above with the esker classification.

Overall Accuracy: 98.47%
KHAT Accuracy: 0.00%

Class | Producer's Accuracy | User's Accuracy
Kame | 0.00% | 0.00%
Non-Kame | 98.47% | 100.00%

Table 7-11. Classification accuracies for kames (Topographic Filter used, threshold = 0.99). In this case, the low K Accuracy is the result of making a comparison where no kames are expected.

With an uninformative K value, the Overall Accuracy becomes more important, since it is not affected in the same way as K when no landforms are expected. In Figure 7-6b, I increased the threshold on the topographic filter to 0.99 to produce more focused predictions from the broad shading shown in Figure 7-6. Unfortunately, despite being more focused, they are no more accurate. The system has misclassified numerous drumlins as kames. Unlike the case for the eskers, zooming in on the predicted kames did not reveal any obviously kame-shaped candidates. The difference between the results for eskers and those for kames is most likely due to the difference in the number of training features used.
For eskers, 193 triangles were used for training, while for kames, only 11 triangles were used. It appears that I need to find another study area with more kames for further training.

7.3.3 Drumlins

By far the best classification results obtained in this thesis were for drumlins (Table 7-12). In Figure 7-7a we see that the raw classification has the same general distribution that we see in the expected results (Figure 7-7c). When the topographic filter is applied (Figure 7-7b), many of the drumlins are identified on the till plain of map sheet 093G.056. On the other map sheets, however, the raw classification was not sufficiently dense to create a majority within each topographic component. We can see this on map sheet 093G.057, where the system only identified about five drumlins, and on map sheet 093G.067, where the system identified none.

Some of the accuracy of this classification may be due to the methods used for training, rather than being a result of the classification. Since drumlins are easy to identify visually, the expected drumlins shown in Figure 7-7c are relatively more accurate than the expected kames and eskers shown above.

A band of larger drumlins on map sheet 093G.056 was missed by the classification. These are circled in the lower left hand corner of the image in Figure 7-7a, running south-southeast to north-northwest. This band may have been omitted from the classification because the drumlins in this area are different in character from those that are found on 093G.066, which was the map sheet used for training.

Figure 7-6. Final classification results for kames. Unfiltered (a), filtered (b) and expected kames (c) for map sheets 093G.056, 093G.057 and 093G.067.

Overall Accuracy: 74.78%
KHAT Accuracy: 26.36%

Class | Producer's Accuracy | User's Accuracy
Drumlin | 44.23% | 40.85%
Non-Drumlin | 82.92% | 84.79%

Table 7-12.
Classification accuracies for drumlins (topographic filter, threshold = 0.4). Notice the high K value relative to eskers and kames.

Figure 7-7. Final classification results for drumlins: initial, unfiltered results from the ANN with unclassified zone circled (a), filtered results (b), expected results for map sheets 093G.056, 093G.057, and 093G.067 (c).

7.4 Summary

Although I was able to obtain relatively high accuracy values for drumlins after modifying the classification conditions somewhat, there is still much work to be done before the LCS can identify landforms reliably. The differences in both the number of classified triangles and the extent of the classified areas between drumlins and the other landforms make the analysis difficult. For Kappa Analysis to produce meaningful results, there must be sufficient landforms available for training, and at least one landform in the area to be used for testing.

Although my efforts to ratchet up accuracy values by altering one of the seven parameters at a time were not wholly successful, they did provide enough information to help obtain some interesting results for eskers and kames. Although they appear on the error matrix as misclassifications, the LCS predicted six promising eskers, where none were known previously. This shows that the LCS can generate landform predictions, which it was designed to do. It remains to be seen whether or not the predicted eskers are real; follow-up field or airphoto interpretation work will be required to confirm these predictions.

8 Conclusions

If I have seen further it is by standing on the shoulders of giants.
- Sir Isaac Newton

In July 2002, when I began this project, I envisioned a system that could rapidly classify all of the landforms in a large area, such as the Province of British Columbia. While the Landform Classification System, as it became known, met some of the criteria that were required for successful commercialization, there remain some significant obstacles before all of the lofty goals of 2002 can be met. Much of the system works relatively rapidly and produces acceptable results, but it appears that some unknown key ingredient is still missing, which prevents the Artificial Neural Network from producing results like those that I envisioned in 2002.

8.1 Successes

In its current form, the Landform Classification System (LCS) is able to load TRIM data and create accurate NetTINs using the breaklines in the TRIM data. Even with the highest resolution data being used, Triangle Neighbours (but not morphometrics) can be calculated in several hours. Using the 5m point and line resolution, 22 morphometric variables can be calculated in a few hours. Further refinement of the software should be able to increase the speed of the system significantly. The approach of using a NetTIN to represent the land surface works quite well, and the morphometric variables that were calculated make logical and visual sense.

Training data can be created and managed with the application, and the training data, together with the morphometric variables, can be easily combined into a training set. Training sets can be exported to the Fuzzy ARTMap ANN for analysis. Once the trained ANN has performed a classification, the results can be imported back into the LCS, where they can be used to create the Error Matrix and calculate accuracy values. This, together with the onscreen display of a map of the classified results, allows the classification accuracy to be rapidly assessed.
This approach seems to work relatively well, and the results in Chapter 7 show that the system works, although not yet perfectly.

The Fuzzy ARTMap ANN is able to "see" the shape of the landforms reasonably well. Eskers and drumlins tend not to overlap with each other or other landforms, such as the flyggberg on map sheet 093G.056, 12 km south of Mt. Baldy Hughes. It is important to note that this flyggberg was not confused with the drumlins, even though both have similar shapes. This implies that the morphometric variables contain enough information to allow the system to differentiate landforms with similar morphologies.

The system is much less successful in making use of the spatial context of the landforms to combine groups of triangles into recognizable landforms. Although the topographic filter compensates for this during post-processing, a more rigorous ANN classification that examines the spatial context should improve classification accuracies significantly.

The system is easily able to handle individual TRIM map sheets, each covering roughly 140 km². During testing, four map sheets could be viewed at a time, which made edge matching in the training sets very easy. This shows that the LCS is capable of processing large data sets in reasonably sized pieces.

The advantages of the LCS over a human interpreter of topographic data are underscored by the discovery of six possible eskers (see Section 7.3.1). Despite looking at the NetTINs for hours, I spotted none of these candidate eskers until they were highlighted in the LCS results.

Although I have achieved many successes in this project, commercialization of the Landform Classification System is still a long way off. Further development of the system should improve its reliability and ability to discriminate landforms. However, I have reached the end of this project's scope as it was laid out in 2002.
The following two sections discuss some major changes to the software and procedures that will bring this project much closer to completion.

8.2 Software Improvements Required

The following changes reflect the discoveries made during this project. Some of the ideas that I had in 2002 need to be revised, now that I have a better idea of where the project must go. These changes will form the core for Version 2.0 of the Landform Classification System.

8.2.1 Problems with NetTIN Construction

One surprising limitation of our NetTINs is that the Delaunay triangulation does not seem to take into account Z values when calculating which points to preserve in the triangulation. Lischinski (1994) states that Delaunay triangulation is strictly a 2D process. A modified Delaunay triangulation algorithm that either accounts for Z values, or disproportionately weights them, would be helpful in producing more detailed NetTINs. This would ensure that the NetTIN better represents small landforms such as eskers. Verbree & Oosterom (2001) discuss this problem and propose a solution, which uses scanlines to create a Tetrahedronized Irregular Network (TEN).

8.2.2 The Triangle as the Basis for Morphometric Calculations

My decision to not explicitly group triangles into regions (as done by Miliaresis & Argialas, 1999; Miliaresis & Argialas, 2002) before calculating morphometric variables appears to be wrong. I had expected that the Fuzzy ARTMap ANN would be able to account for the position and classification of triangles, but including the X and Y coordinates (and later the topographic component) created more problems than it solved. Triangles are treated in isolation, which leads to the patchwork of classified triangles in the final result. Because of the failure of this approach, the LCS has no concept of topology. While it is true that the Triangle Neighbours and majority filters fix this problem to some extent, they are imperfect solutions.
Some explicit method of associating a triangle with its neighbours before calculating morphometric variables would be a better solution. The first order neighbours that are used in the calculation of morphometric variables can be replaced with Topographic Neighbours to create larger neighbourhoods for each variable, which reflect the topography of the area. Currently, only the standard deviation of orientation and the mean orientation make use of the Topographic Neighbours. By modifying morphometric variables so that they use Topographic Neighbours, much of the patchwork appearance of the current results may be resolved. This may eliminate the need to use filters entirely.

8.2.3 New Morphometric Variables

Topographic Neighbours can allow us to create a new morphometric variable. With them, I can determine the mean aspect for the triangles making up the landform. If we create vectors for each triangle in a Topographic Neighbourhood, with the direction equal to each triangle's aspect, and the magnitude equal to the area of each triangle, then summing the vectors should create a resultant with a direction that is close to the direction of ice flow for streamlined features such as drumlins (Figure 8-1).

Figure 8-1. The sum of the triangle orientations in a drumlin (dashed line, right) is roughly equivalent to the direction of ice flow. The resultant of the vector addition (red dashed line, left) shows the direction of ice flow.

In the LCS, slope and aspect are treated separately. One way to combine these variables in a meaningful way is to create a hillshaded image of the NetTIN. Since the human eye requires a hillshaded image in order to identify landforms, shading the triangles in the NetTIN according to their illumination may prove useful as another morphometric variable.
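The area-weighted vector sum proposed for the mean aspect of a Topographic Neighbourhood can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the LCS: the function name is hypothetical, aspect is assumed to be measured in degrees clockwise from north, and each triangle is supplied as an (aspect, area) pair.

```python
import math
from typing import Iterable, Tuple


def resultant_aspect(triangles: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Sum one vector per triangle (direction = aspect, magnitude = area).

    Returns (direction in degrees clockwise from north, magnitude).
    Assumed convention: aspect 0 = north, 90 = east.
    """
    east = 0.0
    north = 0.0
    for aspect_deg, area in triangles:
        theta = math.radians(aspect_deg)
        east += area * math.sin(theta)    # easting component of this triangle
        north += area * math.cos(theta)   # northing component of this triangle
    magnitude = math.hypot(east, north)
    direction = math.degrees(math.atan2(east, north)) % 360.0
    return direction, magnitude


# Hypothetical neighbourhood: two equal-area flanks facing slightly south of
# due east and due west. Their sideways components cancel in the sum.
flanks = [(100.0, 1.0), (260.0, 1.0)]
direction, magnitude = resultant_aspect(flanks)
```

For the two hypothetical flanks the east and west components cancel and the resultant points south, along the long axis of the feature. When the magnitude is close to zero (for example, on a symmetric cone) the direction is not meaningful, so a real implementation would probably need to test the magnitude before using the direction.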
8.2.4 Changes to Cache Files

Right now, the NetTIN and the Triangle Neighbours, Convexity Neighbours, and Topographic Neighbours are stored externally in tin_*.xdr cache files, and the morphometric variables are stored in morph_*.xdr cache files. Using these cache files keeps the size of the application down, since they are only loaded when required. One more type of cache file needs to be created to store training data. Currently, these data are stored directly in the Lesson Database module, so that every time a lesson is added, the size of the application grows. Moving the training data into a cache file (tentatively to be called train_*.xdr) will allow the LCS to handle virtually any size of project without having the application grow too large to be loaded.

8.2.5 Fixing The Ratchet

The poor classification results that I obtained after the completion of the ratcheting process are worrisome. The results are poor, likely because it is not possible to completely adjust the parameters in isolation. In a worst-case scenario, the seven parameters create a huge number of combinations that will have to be examined individually in order to obtain the best combination of them for each landform.

If we consider the multidimensional "surface" formed by the seven parameters that we were ratcheting, it appears in hindsight that it has many local minima and maxima. This would explain why I had so much difficulty with the ratcheting process, and why I was able to produce a K accuracy value of 36.64% on January 18, 2005, but was unable to later reproduce these results, presumably because something that was apparently inconsequential had changed.

The alternative to manually ratcheting values for several months more is to use the power of computers to test a finite, but large number of combinations in order to determine reasonable settings for the LCS.
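To give a sense of the scale involved, the number of combinations on a regular grid can be counted and enumerated directly. The sketch below assumes, purely for illustration, that each of the seven parameters is sampled at five levels; the placeholder parameter names and the level values are hypothetical and are not taken from the LCS.

```python
import itertools
import math
from typing import Dict, Iterator, List


def grid_size(grid: Dict[str, List[float]]) -> int:
    """Number of distinct parameter combinations on the grid."""
    return math.prod(len(values) for values in grid.values())


def iter_grid(grid: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield every combination as a {parameter name: value} dictionary."""
    names = list(grid)
    for combo in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


# Hypothetical grid: seven placeholder parameters, five levels each.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
grid = {f"param_{i}": list(levels) for i in range(1, 8)}

total = grid_size(grid)            # 5 ** 7 = 78125 combinations
first = next(iter_grid(grid))      # every parameter at its first level
```

Even this coarse grid yields 78,125 combinations, each of which would require a complete training and classification run, which is why manual adjustment of one parameter at a time cannot cover the parameter space.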
Three approaches might help to discover a better set of parameters for predicting landforms: a Heuristic Search, a Monte Carlo method, and a modified Genetic Algorithm.

The first method is a heuristic search, in which a complete range of values (divided into large enough increments) is examined using a brute force approach. This has the potential of identifying combinations of parameters that I overlooked during the manual ratcheting process, since the computer has no biases about what makes a good combination of parameters. Unfortunately, it will take a long time to work through all of the combinations with this approach.

The second approach would be to use a Monte Carlo approach, in which values are randomly varied, and those combinations that produce the best values are recorded. This approach is likely to take less time than the heuristic search, but may not explore the parameter space completely.

The final approach is a variation of a genetic algorithm, where the system treats the parameter settings as a genome, and keeps only those "genes" (parameters) that help increase "fitness" (accuracy values). This is likely the fastest of the three approaches, but may fall into local minima.

I will have to change some programming to enable the LCS to run automatic tests, no matter which approach I try. The system will have to be able to alter its parameters, produce a complete classification, and then analyze the results and store promising combinations of settings. This is not really a practical option until the Fuzzy ART Neural Network has been fully integrated into the LCS.

8.2.6 The Need for an Integrated ANN

Despite its significant successes, the LCS is significantly limited by the lack of an integrated Artificial Neural Network. Work is currently underway at Facet Decision Systems to embed this technology directly into the source code for Cause&Effect.
Although the ANN might be written in the Cause&Effect language [9], ANNs typically require thousands of iterations, and this is not one of the strengths of this language. Cause&Effect is an ideal platform for the analysis of NetTINs and the calculation of neighbourhoods and morphometric variables, but it is much too slow at performing loops, which are required by all the ANN algorithms published to date. Cause&Effect itself is written in the C computer language, which is very good at performing loops, and the direct integration of an ANN package into the Cause&Effect language would side-step its limitations.

The Fuzzy ARTMap implementation that is being prepared for Cause&Effect will only be available in Version 4.0. This will require the LCS to be upgraded from Cause&Effect Version 3.6 to Version 4.0. This upgrade will solve a number of problems, such as a "memory leak" that causes occasional crashes, and it will add some functions to significantly improve the speed of the morphometric variable calculations.

8.2.7 Variations on the Fuzzy ARTMap ANN

I remain convinced that the Fuzzy ARTMap ANN is the best way of classifying landforms in the LCS. A key finding of Chapter 7, however, was that each landform being tested must have a unique set of morphometric variables to obtain the best classification results. This means one of the key advantages of the Fuzzy ARTMap ANN, namely its ability to add classifications on the fly, is no longer required. If I can find another classification technique that permits higher classification accuracies than Fuzzy ARTMap, and allows for training by example, then the whole classification system used in the LCS will have to be rethought.

In its current form, some shortcomings in the Fuzzy ARTMap have become apparent, and it may be worthwhile to consider how Fuzzy ARTMap can be redeployed to produce better results.
As I discussed in Section 7.2.3.1, the Fuzzy ARTMap ANN lacks some basic perceptual abilities that humans take for granted. The basic problem is that Fuzzy ARTMap does all of its analysis in a single pass, and never goes back to reconsider the decisions that it has made. While this allows classifications to be made very rapidly, it also can lead to results that appear hopelessly naive to a human being. For example, although the Fuzzy ARTMap ANN may create a cluster of 100 classified triangles, it may leave a single triangle out of the middle of the cluster. A human, upon seeing this, would intuitively question their assumptions and would probably add the missing triangle.

To accomplish this using Fuzzy ARTMap, it may be necessary to set up a second ANN to re-classify the results after the Fuzzy ARTMap has completed its classification. This two-level system of ANNs might be able to solve a problem that defeats a single Fuzzy ARTMap ANN. This type of application has been created before: Hammerstrom (1993a) discusses how engineers at Sharp, Inc. created a three-level hierarchy of ANNs to allow for the optical character recognition of Japanese written characters.

[9] Before deciding to use the Fuzzy ARTMap ANN, I programmed a feed-forward Backpropagation ANN using Cause&Effect, but it could only manage about one iteration per second. Typically, many thousands of iterations are required by this type of ANN, so this level of performance was unacceptably poor.

8.2.8 The Need for Topological Functions

Once the LCS can produce acceptable landform classifications, amalgamating the resulting data will require tools for polygon construction and labeling. Cause&Effect currently lacks an extensive set of tools for constructing vector topology. At present, such work must be done in a vector GIS package.
The LCS has a function to amalgamate classified triangles to produce a vector line map showing the boundaries between classified landforms and unclassified areas. This, together with the labels identifying whether an area is classified as a landform or not, can be exported as separate ESRI Shape Files. These files can then be imported into ArcGIS for joining, edge matching, and polygon topology construction. Unfortunately, this step introduces a great deal of human involvement, which slows down the classification process and introduces the possibility of human error.

8.3 Procedural Improvements Required

A number of procedural changes are required in addition to the software changes mentioned in the previous section. Although the LCS is a software product, without the correct procedures, it cannot be used effectively.

8.3.1 Rerunning PCA

Since I ran the Principal Component Analysis for this thesis, a number of changes have been made to morphometric variables (standard deviation of orientation, mean orientation), new variables have been created (triangle area, topographic component), and further changes have been proposed (see above). All of these changes have the potential to improve classification accuracies further, but PCA needs to be re-run to determine which variables are now most suitable for the classification of eskers, kames and drumlins.

8.3.2 The Role of the Expert in Training Landforms

It became apparent in this project that landforms to be used for training must be reviewed by at least one expert. Although it is easy to come up with a first approximation of the training landforms, the vagueness of the landform definitions commonly used, and the degree to which the definitions overlap, make outlining a precise set of landforms difficult. In this project, there were a number of drumlins that were suspicious, because they occurred by themselves.
Having one or more experts examine the training landforms would probably have helped to improve the accuracy of the final classification.

8.3.3 Determining the Optimal Number of Rows in a Training Set

In Section 7.2.2, I found no correlation between the number of rows in a training set and the accuracy of the classification that it produced. In an earlier test, run on January 11, 2005 using the DrumlinDiscriminant variable set, I found a strong correlation between the number of rows used in the training set and the accuracy of the classification results. The accuracy levels peaked when 1250 classified triangles and 1250 non-classified triangles were used. I adopted this value for the remainder of the tests described in Chapter 7.

On February 18, 2005, I was unable to reproduce the results in Table 7-3. One difference in the February 18 test was that I used the Drumlins variable set instead of DrumlinDiscriminant. I am certain that having too many rows of training data leads to memorization in the Fuzzy ARTMap ANN, but it would be better to have a good idea of how many rows produce optimal classification results. It may be that the accuracy levels are a function of both the number of rows in the training set and the number of morphometric variables that are presented to the ANN for training.

8.3.4 Voting and Filter Usage

Voting should be used as a tactic of last resort. Although we saw in the previous chapter that a system of 3 votes could increase K accuracy levels by 12% and a system of 5 votes could add a further 3%, the cost of this improvement is high. Each additional vote requires all of the ANN training and classification to be performed again. I believe that voting and filtering schemes should not be employed until we are sure that the classification accuracies are as high as possible by other means.
Voting and filtering are ways of "fudging" the data, and mask fundamental underlying problems in the classification procedure.

8.3.5 Rare Landforms

The relatively poor classification results for eskers and kames need to be considered carefully. There are two possible explanations why these results are so poor:

1. The relief of these landforms is too low for them to be adequately represented on the NetTIN.
2. There may simply be too few examples available to train the ANN properly for these landforms.

Problems with the representation of the landforms on the NetTIN may be resolved by incorporating the changes to the Delaunay triangulation algorithm that were described in Section 8.2.1. In addition, the speed improvements that are promised for Cause&Effect Version 4.0 may make it feasible to construct NetTINs with point and line resolutions finer than the 5.0m and 5.0m that are currently used.

In cases where the landforms are rare, it might be worthwhile to find another training area that has more of the landforms in question, create a collection of "textbook cases," or even use artificial, mathematically defined landforms so that sufficient landforms are available for training.

8.3.6 The Effectiveness of Error Matrices

Although Error Matrices and the accuracy values that are generated from them are effective tools for summarizing the effectiveness of classifications, this project has highlighted some of the shortcomings of this technique.

Since Error Matrices are based on comparisons between classifications and "known" values, the known values must be perfectly accurate. Unfortunately, it is not possible to obtain perfect data for comparisons, even if we had covered every square metre of the study area on foot. When "new" landforms are identified, such as with the eskers, the Error Matrix treats them as errors. There is no category for unexpected and potentially valuable discoveries!
Similarly, the case for kames shows that any results, no matter how valuable or worthless they may be, produce a K value of 0.0% when no landforms are known to exist in an area.

The values generated from an Error Matrix do not consider the spatial distribution of triangles. As I have shown with the topographic filter, solid clusters of triangles are much more valuable than "random" distributions of triangles in an area. The Error Matrix needs to be modified to take into account the expected spatial distribution of features on the land surface.

In short, Error Matrices, and the accuracy values that are derived from them, present a concise measure of the accuracy of a classification, but they represent a statistical view, not the full picture. Images showing the distribution of the triangles are also required to be able to assess the quality of the results in a geographical sense.

8.3.7 Organizational Issues

This is the largest single project that I have ever attempted. In hindsight, a number of organizational issues have come to the fore, and these should be noted.

Firstly, I spent a lot of time programming software before I was sure what the final product would look like. While it is true that a framework was required in which to perform my research, too much of the system was "hardwired" early on. When the problems with the ratcheting process became evident in late 2004, it was difficult to make the necessary changes to the software. Some of the changes were poorly programmed "kludges" that were attached to relatively well-written code. The number of these has now become a burden, and before further work on the LCS can proceed, the code needs to be cleaned up and streamlined.

Secondly, although I was thorough with my literature search, I should have spent more time thinking about the core problems of this project.
The limited amount of success by other authors in using morphometric variables should have been a clue that the entire concept behind these might be questionable. Recall that Evans (1972) stated, "Many of the operational definitions used in geomorphometry are extremely poor as measures of the intended concepts (p. 17)". I need to conduct more research into the process by which humans identify landforms. Insights into which combinations of shape, size, colour and texture humans use to visually identify landforms might prove helpful in refining the procedures used by the LCS.

It is a maxim in private industry that committees never make discoveries, and it is only through the actions of individuals that we make progress. This project showed the truth of this statement, but it also showed that once an individual has shown the way, projects often become too large to be handled by individuals. The next major step, to build Version 2.0 of the Landform Classification System, will have to be finished as a team effort.

8.4 Prospects for Commercialization

If the K classification accuracy of the LCS can be raised to about 80% for eskers, kames, and drumlins, then the prospects for its commercialization are excellent. At present, this represents roughly a tripling of the K accuracy value for drumlins (26.36%), and a very large increase in the accuracy levels for eskers and kames. As I mentioned in the introduction to this chapter, there still appears to be a missing key ingredient in the LCS. If it can be found, then nothing further prevents the LCS from becoming a commercial product.

One question that needs to be asked is whether the LCS represents a scalable solution. If a really large project were to be attempted, such as classifying the landforms for all of Canada, or even identifying all landforms on Mars or the Moon to support future exploration efforts, could the LCS handle the problem?
Because the LCS works on a single map sheet at a time, there are no serious scalability issues in the design of the system. Although Cause&Effect only runs on Sun workstations, this does not present a serious problem. As more sheets must be processed, more powerful computers can be substituted for the current 4-processor machine; the top of the line Sun workstation now supports up to 106 processors (Sun, 2004). If we reach the limitations of current servers, we can simply add additional ones.

If the scale of the project is large enough, it might be reasonable to replace a single human operator with some sort of batch process, but with the LCS, the amount of work that a full-time human operator can perform is substantial. Although I have not tried this, I estimate that a single operator could give commands to process at least 20 map sheets simultaneously, if enough computers are available.

If an ANN is integrated into the LCS, I estimate that a sheet with a 5 m point and line resolution could be classified in about six hours. Given that there are 7027 TRIM sheets for the Province of BC (Base Mapping and Geomatic Services Branch, 2004), the province could be classified for one or more landforms in less than five years using a single server (7027 sheets × 6 hours ≈ 42,000 hours, or about 4.8 years of continuous operation). If we increase this to 20 servers under the control of a single operator, the task could be completed in about 90 days.

8.5 Further Papers and Projects

There are a number of follow-up studies to this thesis that would make interesting scientific papers. Perhaps the most obvious of these would be to compare the Triangle Neighbours that I have developed in this project with the types of neighbourhoods typically calculated for Digital Elevation Models. Which type of neighbourhood calculation best represents reality? What are the advantages or disadvantages of the Triangle Neighbours?
In Section 4.2.7, I described the four possible ways that Horizontal and Vertical Convexity could be calculated by combining the angles to the left and right triangles. I chose to subtract one value from the other, because it is a meaningful way of combining the two angles, but it does so by effectively ignoring the central triangle, and assuming that the two adjoining triangles are directly connected. There are three other techniques (systematically discarding one value, picking the higher or the lower value of the two, or taking the mean of the two angles) that need to be examined to determine which technique is the most accurate, and which technique has the fewest undesirable side effects.

Since slope is such a critical morphometric variable, it would be interesting to compare slope measurements from the field with slope values that have been interpreted from a NetTIN and a DEM. Incorrect slope values directly affect the value of any geomorphological work that is performed using DTMs; this effect should be quantified so that it is a known factor in geomorphological studies.

If the Delaunay triangulation process needs to be revised to better handle changes in the Z value of points, this would be a fundamental change in the way that NetTINs are constructed. If the method for constructing NetTINs can be altered to better portray individual landforms, this would potentially benefit all future users of NetTINs.

In this thesis, I have made use of TRIM data exclusively. I have examined the characteristics of SRTM and USGS 7.5-minute DEM data, and they seem to be suitable for use with the LCS. A full-scale test will be required, however, to see how these data sources affect accuracy values.

Some method needs to be devised to provide a quick assessment of accuracy values, as does the Error Matrix, but expected spatial distributions of data need to be accounted for.
Perhaps the Error Matrix could be improved with some sort of compactness index, which indicates how well clustered are the patterns that result from a classification.

A more in-depth project would be to examine the ways in which humans identify landforms. If I can identify the algorithms by which geomorphologists identify landforms, this information might prove valuable in refining the process for identifying landforms in the Landform Classification System. This might be an opportunity for collaboration with psychologists and computer scientists.

The use of morphometrics has been limited to identifying landforms in this thesis. In Chapter 3, I discussed how morphometrics might be employed to provide additional information, such as average grain size, based on the slope of the identified landforms. Although the Landform Geodatabase that will be created by the LCS will contain mostly just the type and outline of the landforms on a map sheet, I might be able to augment this information to provide more information for the user of the Landform Geodatabase.

References

Afbeen, Tommy (2002). Determining Who is Delaying the System. Mainframe Week (15), April 17, 2002. Available online at http://www.mainframeweek.com/journals/articles/0015/Determining+who+is+delaying+the+system (November 16, 2004).
Armstrong, John E. & Tipper, Howard W. (1948). Glaciation of north-central British Columbia. American Journal of Science, 246, 283-310.
Base Mapping and Geomatic Services Branch (2004). Trim Overview. http://srmwww.gov.bc.ca/bmgs/trim/trim/trim/inrtey.html (December 7, 2004).
Batschelet, Edward (1981). Circular Statistics in Biology. New York: Academic Press.
Bennett, Matthew R., & Glasser, Neil F. (1996). Glacial geology: ice sheets and landforms. Chichester: John Wiley and Sons.
Berglund, Byron (Head Programmer at Facet Decision Systems, Inc., 2004). Personal Communication. October 29, 2004.
Berry, Joseph K. (W.M. Keck Scholar, University of Denver, 2005). Personal Communication. February 16, 2005.
Blaszczynski, J. (1997). Landform characterization with geographic information systems. Photogrammetric Engineering and Remote Sensing, 63(2), 183-191.
Brabyn, L.K. (1997). Classification of macro landforms using GIS. ITC Journal, 1, 25-40.
Brown, Daniel G., Lusch, David P., & Duda, Kenneth A. (1998). Supervised classification of types of glaciated landscapes using digital elevation data. Geomorphology, 21, 233-250.
Carpenter, G.A., Gjaja, S., Gopal, S. and Woodcock, C. (1997). ART neural networks for remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 30(2), 308-325.
Carpenter, G. & Grossberg, S. (1992). A self-organizing neural network for supervised learning, recognition and prediction. IEEE Communication Magazine, 30(9), 38-49.
Chorowicz, J., Kim, J., Manoussis, S., Rudant, J., Foin, P. & Veillet, Y. (1989). A new technique for recognition of geological and geomorphological patterns in digital elevation models. Remote Sensing of Environment, 29, 229-239.
Clark, Audrey N. (1985). Longman dictionary of geography. Harlow, Essex: Longman.
Congalton, R.G. (1991). A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of the Environment, 37, 35-46.
Davies, J.L. (1969). Landforms of Cold Climates. Canberra: Australian National University Press.
Dikau, Richard (1989). The application of a digital relief model to landform analysis in geomorphology. In J. Raper (Ed.), Three Dimensional Applications in Geographic Information Systems (pp. 51-77). London: Taylor and Francis.
Doornkamp, John C. & King, Cuchlaine A.M. (1971). Numerical analysis in geomorphology: an introduction. London: Edward Arnold.
Endreny, T.A. & Wood, E.F. (2001). Representing elevation uncertainty in runoff modeling and flowpath mapping. Hydrological Processes, 15, 2223-2236.
ESRI (2004). ESRI GIS and Mapping Software. www.esri.com (October 21, 2004).
Evans, Ian S. (1972). General geomorphometry, derivatives of altitude, and descriptive statistics. In R.J. Chorley (Ed.), Spatial Analysis in Geomorphology (pp. 17-90). London: British Geomorphological Research Group.
Evans, Will, McAllister, Mike, and Snoeyink, Jack (1996). The NetTIN data structure. Unpublished research, implemented in Facet Decision Systems' Cause&Effect language.
Facet Decision Systems (2002). Facet Watersheds Data. Vancouver: Facet Decision Systems Inc. http://www.facet.com/projects/watersheds data.html (Aug. 18, 2002).
Facet (2004). Facet Decision Systems Inc. http://www.facet.com (October 21, 2004).
Farr, Tom (SRTM Deputy Project Scientist, 2002). Personal Communication. October 25, 2002.
Foody, G.M., McCullough, M.B. & Yates, W.B. (1995). Classification of remotely sensed data by an artificial neural network: issues related to training data characteristics. Photogrammetric Engineering and Remote Sensing, 61(4), 391-401.
Garmin (2004). Rino 130 2-way Radio and Personal Navigator Owner's Manual. http://www.garmin.com/manuals/Rino130 OwnersManual.pdf (Oct. 31, 2004).
Garner, H.F. (1974). The origin of landscapes. New York: Oxford University Press.
Garrett, Aaron (2003). Fuzzy ARTMap Neural Network Implementation Version 1.0 (Computer Software). http://www.mathworks.nl/matlabcentral/fileexchange/loadFile.do?objectId=4306&objectType=file (October 24, 2004).
Geographic Data BC (1992). British Columbia Specifications and Guidelines for Geomatics Content Series Volume 3 Digital Baseline Mapping at 1:20 000. Release 2.0. Victoria: Ministry of Environment, Lands and Parks, Geographic Data BC, Province of British Columbia.
Guzzetti, F., & Reichenbach, P. (1994). Towards a definition of topographic divisions for Italy. Geomorphology, 11, 57-74.
Hammerstrom, Dan (1993a). Neural Networks at Work.
IEEE Spectrum, 30(6) (June, 1993), 26-32.
Hammerstrom, Dan (1993b). Working with Neural Networks. IEEE Spectrum, 30(7) (July, 1993), 46-53.
Hart, M.G. (1986). Geomorphology: Pure and Applied. London: George Allen and Unwin.
Hawkins, Dave (President, Facet Decision Systems, Inc., 2002). Personal Communication. October, 2002.
Hobson, R.D. (1972). Surface roughness in topography: quantitative approach. In Chorley, R.J. (Ed.), Spatial analysis in geomorphology (pp. 221-245). New York: Harper and Row.
Hora, Z.D. (1988). Sand and Gravel Study 1985: Transportation Corridors and Populated Areas (Open File 1988-27). Victoria, B.C.: Mineral Resources Division, Geological Survey Branch.
Jeep (2004). Jeep website (untitled). http://www.jeep.com/4x4/index.html?context=grand cherokee-4 wheel drive&type=top (Oct. 31, 2004).
Jones, M. Tim (2003). AI Application Programming. Hingham, Mass.: Charles River Media, Inc.
Kidner, D.B. & Jones, C.B. (1991). Implicit triangulations for large terrain databases. In Proceedings of the second European Conference on GIS (EGIS '91). Brussels, April 1991.
Kiefte, Michael (1999). Discrim (Computer Software). MATLAB format. Available at http://www.mathworks.com/matlabcentral/fileexchange/loadCategory.do?objectType=category&objectId=6 (February 1, 2005).
Kreveld, Marc van (1997). Digital Elevation Models and TIN Algorithms. In Algorithmic Foundations of Geographical Information Systems (pp. 37-78). New York: Springer-Verlag.
Laurini, Robert & Thompson, Derek (1992). Fundamentals of spatial information systems. London: Harcourt Brace Jovanovich.
Lay, Douglas (1940). Fraser River tertiary drainage-history in relation to placer gold deposits. Bulletin No. 3. Victoria, B.C.: British Columbia Department of Mines.
Leahy, P. Patrick (1998). Government's chief geologist stresses value of rocks and dirt (press release). Menlo Park, CA: United States Geological Survey.
Liden, Lars (1995). The ART Gallery: A Neural Network Simulation Package (Computer Software). Windows Operating System. Available at http://cns-web.bu.edu/pub/laliden/WWW/nnet.html (October 24, 2004).
Liden, Lars H. (1995). The ART Gallery Documentation. V.1.0 8/02/95. http://cns-web.bu.edu/pub/laliden/WWW/nnet.frame.html (April 18, 2004).
Lischinski, Dani (1994). Incremental Delaunay triangulation. In Heckbert, Paul (Ed.), Graphics Gems IV (pp. 47-59). New York: Academic Press.
MacMillan, R.A., Pettapiece, W.W., Nolan, S.C. & Goddard, T.W. (2000a). A generic procedure for automatically segmenting landforms into landform elements using DEMs, heuristic rules and fuzzy logic. Fuzzy Sets and Systems, 113(1), 81-109.
MacMillan, R.A. & Pettapiece, W.W. (2000b). Alberta landforms: quantitative morphometric descriptions and classification of typical Alberta Landforms. Swift Current, Saskatchewan: Minister of Supply and Services Canada.
Magellan (2004). Blazer 12 User Manual. http://www.magellangps.com/assets/manuals/oldprod/manual Blazer12.pdf (Oct. 31, 2004).
Maguire, B. (2003a). Artificial Neural Networks for NetTIN Classification. Unpublished Geography 516 class paper.
Maguire, B. (2003b). Artificial Neural Network Assisted Bedform Identification. Unpublished Geography 508 class paper.
Maguire, Brad (2004). Neural Networks for Gravel Deposit Prediction. Unpublished class paper, MINE 578.
Mark, D.M. (1975). Geomorphic Parameters: A Review and Evaluation. Geographiska Annaler, 57A(3), 165-177.
Martin, C.L. Scott (Programmer, Facet Decision Systems, Inc., 2004). Personal Communication. August 5, 2004.
Mather, Paul M. (1972). Areal classification in geomorphology. In Chorley, R.J. (Ed.), Spatial analysis in geomorphology (pp. 305-322). New York: Harper and Row.
Mathworks Inc. (2004). MATLAB Release 14 (Computer Software). Windows XP Operating System. Available at http://www.mathworks.com.
Meech, John (Professor, School of Mining Engineering, University of British Columbia, 2003). Personal Communication.
Meech, John (2004). Class Notes. MINE 578.
Miliaresis, G.Ch. & Argialas, D.P. (1999). Segmentation of physiographic features from the global digital elevation model/GTOPO30. Computers & Geosciences, 25, 715-728.
Miliaresis, G.Ch., & Argialas, D.P. (2002). Quantitative representation of mountain objects extracted from the global digital model (GTOPO30). International Journal of Remote Sensing, 23, 949-964.
Ministry of Environment, Lands and Parks (1992). British Columbia Specifications and Guidelines for Geomatics, Content Series Volume 3: Digital Baseline Mapping at 1:20,000, Release 2.0, January 1992. Available at http://srmwww.gov.bc.ca/bmgs/trim/lto20specs/specs20.pdf (December 16, 2004).
Ministry of Sustainable Resource Management (2003). Terrain Resource Information Management Program. http://srmwww.gov.bc.ca/bmgs/trim/ (Dec. 31, 2003).
Mitchell, Colin (1991). Terrain Evaluation. Burnt Mill, Harlow, Essex: Longman.
NASA (2003). Research opportunities in space science - 2003 NRS 03-OSS-01. http://research.hq.nasa.gov/code_s/nra/current/ma-03-oss-01/appendA2.html#A.2.5 (January 3, 2004).
Openshaw, Stan & Openshaw, Christine (1997). Artificial Intelligence in Geography. Chichester: John Wiley and Sons.
Pike, Richard (1988a). Toward geometric signatures for geographic information systems. Proceedings, international symposium on geographic information systems, pp. 89-103.
Pike, Richard (1988b). The geometric signature: quantifying landslide-terrain types from digital elevation models. Mathematical Geology, 20(5), 491-511.
Pike, Richard J. (2002). A bibliography of terrain modeling (geomorphometry), the quantitative representation of topography - Supplement 4.0. USGS Open-File Report 02-465.
http://geopubs.wr.usgs.gov/open-file/of02-465/of02-465.pdf (December 2, 2003).
Rice, R.J. (1977). Fundamentals of Geomorphology. London: Longman Group.
Quackenbush, Paul (Head, Base Mapping and Data Exchange, 2005). Personal Communication. March 1, 2005.
SAS Institute (2005). SAS/STAT User's Guide. http://ww.id.unizh.ch/software/unk/statmath/sas/sasdoc/stat/chap26/sect20.htm (March 15, 2005).
Seto, Karen C., & Liu, Weiguo (2003). Comparing ARTMap neural network with the maximum-likelihood classifier for detecting urban change. Photogrammetric Engineering and Remote Sensing, 69(9), 981-990.
Shaw, John (1994). Hairpin erosional marks, erosional vortices and subglacial erosion. Sedimentary Geology, 91, 269-283.
Shaw, Gareth & Wheeler, Dennis (1985). Statistical techniques in geographical analysis. Chichester: John Wiley and Sons.
Smalley, I.J. & Unwin, D.J. (1968). The formation and shape of drumlins and their orientation in drumlin fields. Journal of Glaciology, 7, 377-390.
SPSS (2002). Statistical Package for the Social Sciences, release 11.5.0. Chicago, IL: SPSS Inc.
Statsoft Inc. (2003). Principal Components and Factor Analysis. http://www.statsoft.com/textbook/stfacan.html (September 3, 2004).
Strahler, A.N. (1956). Quantitative slope analysis of erosional topography. Bulletin of the Geological Society of America, 67, 571-596.
Strahler, Alan H. & Strahler, Arthur N. (1992). Modern Physical Geography. Fourth Edition. New York: Wiley.
Struik, L.C., Fuller, E.A., & Lynch, T.E. (1990). Geology of Prince George (east half), British Columbia (93G east): Descriptive notes and fossil list to accompany maps and geology, bedrock geology, mineral occurrences, fossil localities. Vancouver: Geological Survey of Canada.
Story, M. & Congalton, R. (1986). Accuracy assessment: a user's perspective. Photogrammetric Engineering and Remote Sensing, 52(3), 397-399.
Summerfield, Michael A. (1991).
Global Geomorphology: an introduction to the study of landforms. Burnt Mill, Harlow, Essex: Longman Scientific and Technical.
Sun Microsystems (2004). Sun High End Servers. http://www.sun.com/servers/highend/ (December 7, 2004).
Tachikawa, Y., Takasao, T., Shiiba, M. (1996). TIN-based topographic modelling and runoff prediction using a basin geomorphic information system. In HydroGIS 96: Application of Geographic Information Systems in Hydrology and Water Resource Management. Wallingford, Oxfordshire: IAHS Press.
Tipper, H.W. (1971). Glacial geomorphology and pleistocene history of central British Columbia. Ottawa: Department of Energy, Mines, and Resources.
USGS (2002a). Final Data Coverage Maps. http://srtm.usgs.gov/Data/coveragemaps.html (September 9, 2002).
USGS (2002b). Shuttle Radar Topography Mission: Quick Facts. http://srtm.usgs.gov/mission/quickfacts.html (September 9, 2002).
USGS (2004). Obtaining SRTM Data. http://srtm.usgs.gov/data/obtainingdata.html (February 25, 2004).
USGS (2005). Digital Elevation Models (DEMs). http://edc.usgs.gov/products/elevation/dem.html (April 17, 2005).
Verbree & Oosterom (2001). Scanline forced Delaunay TENs for surface representation. International Archives of Photogrammetry and Remote Sensing, 34(3/4). Available online at http://www.isprs.org/commission3/annapolis/pdf/Verbree.pdf (April 19, 2005).
Vernon, Peter (1966). Pleistocene ice flow over the Ards/Strangford Lough area, County Down, Ireland. Journal of Glaciology, 6, 401-409.
Way, Douglas S. (1973). Terrain Analysis: A guide to site selection using aerial photographic interpretation. Stroudsburg, PA: Dowden, Hutchinson & Ross, Inc.
Williams, Paul W. (1972). The analysis of spatial characteristics of karst terrains. In Chorley, R.J. (Ed.), Spatial analysis in geomorphology (pp. 135-163). New York: Harper and Row.
Winchester, Simon (2001). The Map That Changed the World. New York: Harper Collins.
Wood, Joseph (1996).
The geomorphological characterisation of digital elevation models. Leicester, UK: University of Leicester. http://www.geog.le.ac.uk/jwo/research/dem char/thesis/index.html (Aug. 12, 2002).
WSI (Wilhelm-Schickhard Institute for Computer Science, University of Tubingen, Germany) (2002). Java Neural Network Simulator 1.1 (Computer Software). Java Operating System. Available at http://www-ra.informatik.uni-tuebingen.de/downloads/JavaNNS/.
Yokoyama, R., Shirasawa, M., & Pike, R.J. (2002). Visualising topography by openness - a new application of image processing to digital elevation models. Photogrammetric Engineering and Remote Sensing, 68(3), 257-285.
Zevenbergen, L.W. and Thorne, C.R. (1987). Quantitative analysis of land surface topography. Earth Surface Processes and Landforms, 12, 47-56.

Appendix A: Glossary

The literature on Geographical Information Systems and Geomorphometry contains multiple, overlapping names and definitions for data structures. In this thesis, the following definitions will be used.

Artificial Neural Network (ANN): An Artificial Intelligence technique that is loosely based on the synapses and neurons of the human brain. Artificial Neural Networks are very good at pattern recognition, generalization, and working with noisy data.
Backpropagation: A technique whereby the outputs of an ANN are compared with actual known values during a training session. The difference between those values is used to adjust the connections throughout the ANN, resulting in increased classification accuracy on subsequent classification attempts.
Backpropagation Network: A type of ANN that uses at least one hidden layer. Backpropagation Networks are based on Perceptrons, but have additional processing capabilities because of the presence of the hidden layer(s).
Breakline: A line on a topographic map that indicates a ridge top, valley bottom or other significant break in slope.
In TRIM data, breaklines have Z values for each vertex so that the breaklines can be used as an input in the construction of TINs.
Cell: A square sub-unit of a raster. Cells are indivisible; they represent only a single dominant attribute of the land surface that they enclose.
Concavity: See Convexity.
Connection: Connections and Nodes are the core parts of an ANN. Each connection has a weight, which determines how much of the signal from the source neuron is attenuated before it reaches the destination neuron.
Convexity: The difference in curvature between two surfaces. In this thesis, convexity is given a positive value, from 180° to greater than 0°, and concavity is folded in with convexity, and is assigned a negative value from less than 0° to -180°. Flat surfaces have a convexity of 0°.
Digital Elevation Model (DEM): A DEM is a specific variety of raster, in which each cell contains an elevation value (Figure A-1). In this thesis, the altitude matrix or lattice, a set of elevation points which are laid out on a square grid, will be considered to be the equivalent of a DEM, since it contains the same elevation information, and since an altitude matrix is mathematically equivalent to a DEM.

[Figure A-1: the same 4 × 4 grid of elevation values (42 40 39 38 / 43 41 40 39 / 44 43 41 39 / 44 44 41 40) drawn once as a Digital Elevation Model (DEM) and once as an Altitude Matrix.]
Figure A-1. Digital Elevation Models and Altitude Matrices contain exactly the same elevation information, and can be treated interchangeably.

Digital Terrain Model (DTM): A digital terrain model is a generic term for any data structure that is used to store continuous elevation information across a surface. DTMs may consist of DEMs, TINs, NetTINs, altitude matrices, or lattices.
Discriminant Analysis (DA): A traditional classification technique that makes use of decision rules to map inputs into an output space.
Discriminant Analysis is less sensitive to non-normal data than is the Maximum Likelihood Classifier.
Drumlin: A teardrop or elliptically shaped hill produced by the scouring of glaciers and the deposition of the material as glacial till. The aerodynamic shape of drumlins is the result of glacial flow over top of the deposits, once they have been laid down.
Epoch: One complete pass of an ANN through a set of training data.
Error Matrix: A technique developed in Remote Sensing for the quantitative determination of classification accuracy. For each classification category, expected values are compared with actual values to determine the accuracy of the classification. An Overall Accuracy value can be determined by combining all of the classes.
Esker: A long, sinuous, steep-sided ridge formed by a river running beneath the surface of a glacier. The river lays down a bed of gravel along its path, and when the glacier melts, this material is deposited onto the ground.
Factor Analysis: A statistical technique to determine which variables are significant in explaining the variation of a particular variable in question. Principal Component Analysis is one type of factor analysis.
Fuzzy ARTMap: A modern Artificial Neural Network that combines Fuzzy Logic with ANN techniques (Carpenter and Grossberg, 1992). Fuzzy ARTMap allows for training by example, generalization, rapid and incremental learning, variable morphology, and the extraction of explicit classification rules.
Fuzzy Logic: An alternative to Boolean Logic in which a variable can lie partly in one class and partly in another. Building up these membership functions can lead to an unequivocal classification that allows for significant variability in the input variables.
Geomorphometric Variables: See Morphometric Variables.
Kame: Conical or sharp-ridged hills formed in crevasses or depressions on the surface of a glacier. Sands or gravels predominate in poorly stratified layers (Way, 1973).
Maximum Likelihood Classifier (ML): A traditional statistical classifier, in which inputs are mapped to outputs. Where pixels are found to belong to more than one class, they are assigned to the class to which they have the highest probability of belonging (PCI, 2003).
Mass Points: In TRIM data, point data intended to be used for the creation of DTMs. Mass points do not include significant elevation peaks and pits (these are encoded as spot heights), but indicate the general land shape. Points are collected by photogrammetry in a line, but are not placed on a regular grid.
Morphometrics: The study of measurements of shape.
Morphometric Variables: Variables that describe different aspects of surface shape, for example, elevation, slope, aspect, plan and profile complexity, and hypsometric integral. Technically, these should be called geomorphometric variables, since they pertain to the shape of the Earth's surface.
Multi-Layer Perceptron (MLP): A type of Perceptron that has one or more hidden layers.
Network-Integrated Triangulated Irregular Network (NetTIN): (See also Triangulated Irregular Network) A vector data structure for storing representations of the Earth's surface, which is used for hydrological studies. In addition to the incorporation of breaklines, a NetTIN includes river network data.

[Figure A-2, two panels. Left: raw TIN data structure (elevation point X,Y,Z triplets shown). Right: TIN data structure may include streams (blue); no X,Y,Z triplets are shown.]
Figure A-2. Network-Integrated Triangulated Irregular Networks (NetTINs) use irregularly spaced X, Y, Z triplets to form triangles, each of which has an implicit slope and aspect value.

Neural Network: See Artificial Neural Network.
Neuron: See Node.
Node (Artificial Intelligence): A core part of the structure of an ANN. ANNs consist of nodes and connections. Nodes are where the inputs from the connection weights are summed.
An activation function is applied to the summed weights to calculate the output for each node.
Node (Graph Theory): A representation of a point, which terminates an edge or joins multiple edges. Directed edges start at a from-node and end at a to-node. See Vertex.
PERCEPTRON: The first Artificial Neural Network, created by Rosenblatt in 1958.
Perceptron: A type of Artificial Neural Network with a single input and output layer. The PERCEPTRON (see above) was the first example of this type of ANN, and the name has come to describe all ANNs of this type.
Point: A feature that represents a single value in a single place. Points are usually defined in terms of their X, Y, and Z Cartesian coordinates.
Principal Component Analysis (PCA): A particular type of Factor Analysis. PCA is a statistical technique to determine which variables are significant in explaining the variation of a particular variable in question. Synthetic variables (called eigenvectors) are created from the variables provided to explain the variation in the particular variable. The contribution of each variable to the synthetic variable can be shown, and the most important of these can be chosen to best represent the variation of the variable in question.
Raster: A raster is a grid of square cells. Each cell has an individual value, which may represent any variable that is continuously distributed across a surface, such as SO2 values, temperature, or elevation values.
Shuttle Radar Topography Mission (SRTM): A program to create a DEM for much of the world, flown on the Space Shuttle in 2000. During this mission, the entire globe between 60° North Latitude and 56° South Latitude was mapped in 11 days. Data has recently been released for Canada, and has been available for the United States since 2002.
Sigmoid Transfer Function: A function within a Node that takes all of the summed inputs and "squashes" them into a range of 0 to 1 (or -1 to +1).
Terrain Resource Information Management (TRIM): The mapping program for the Province of British Columbia, Canada.
Triangulated Irregular Network (TIN; see also NetTIN): A TIN is a data structure that represents elevation using a series of irregularly spaced triangles.
Vertex: The end point of a line, or the connection between two or more lines.
Weight: A value assigned to a connection between neurons that is used to attenuate the signal from one neuron to the next. In an ANN, the weights are how the learning of the network is stored.

Appendix B: Calculation of Statistics

All of the landforms are areal features at the scale of this analysis (1:20,000). This enables the use in this thesis of some of the remote sensing tools that have been developed by other researchers. In particular, the Error Matrix is a way of comparing a classification with known values to assess the accuracy of the classification.

B.1 Error Matrices

Congalton (1991) describes the utility of Error Matrices and other measures that are derived from them in comparing classified remote sensing images with "real world" data. Not only can the Error Matrix (Table B-1) provide us with an overall measure of the accuracy of the classification, but it can also provide the individual accuracies for each class.

                                Comparison (Expected) Classification
                                X       Y       Z       Total
Study (Observed)        X       94      33      33      160
Classification          Y       48      136     25      209
                        Z       27      27      95      149
                        Total   169     196     153     518

Overall Accuracy = (94 + 136 + 95) / 518 = 62.7%

Table B-1. Sample Error Matrix for three classes (X, Y, and Z) and the formula for the calculation of Overall Accuracy (based on Congalton, 1991).

The Error Matrix shows the number of observed triangles (or pixels) versus what is expected.
In Table B-1, the first column indicates that 94 triangles that we expected to belong to Category X were observed in Category X, 48 triangles that we expected to belong to Category X were observed in Category Y, and 27 triangles that we expected to belong to X were actually found in Z. The values on the diagonal (94, 136, and 95) are counts of triangles that are correctly classified as X, Y, and Z. We sum the triangles on the diagonal and divide these by the total number of triangles to obtain the Overall Accuracy.

From the Error Matrix, it is easy to determine the errors of omission and commission. The Producer's Accuracy is a measure of the errors of omission. Its name comes from the fact that these calculations are used when a producer of a classification wants to know how well it worked. Using the values found in Table B-1, we divide the number of correctly classified pixels by the total number of pixels that we found in the comparison classification. Thus, for Table B-1, the Producer's Accuracy is as follows:

X    94/169 = 55.6%
Y    136/196 = 69.4%
Z    95/153 = 62.1%

The User's Accuracy shows the errors of commission; we use it when we want to know the likelihood that a pixel on the map is correct according to the comparison classification. Again, drawing from Table B-1, the User's Accuracy is:

X    94/160 = 58.8%
Y    136/209 = 65.1%
Z    95/149 = 63.8%

Other researchers have made a number of improvements to the basic Error Matrix to address some shortcomings in the original procedure. The first is to normalize the Error Matrix. Congalton (1991) describes an "iterative proportional fitting procedure" (p. 37) to ensure that each column and row sums to 1.0. This factors the omission and commission errors into the final accuracy calculations, unlike the original Error Matrix design. The normalization procedure also makes it much easier to convert values into percentages.
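
As a rough illustration, the Overall, Producer's, and User's Accuracy values quoted above can be recomputed from the sample matrix in Table B-1. The sketch below is not part of the LCS; the function name accuracy_measures and the list-of-rows layout (rows are the study classification, columns are the comparison classification) are assumptions made for this example.

```python
def accuracy_measures(matrix):
    """Return (overall, producers, users) for a square error matrix.

    matrix[i][j] is the number of triangles observed in class i (row,
    study classification) that were expected in class j (column,
    comparison classification). This layout is an assumption of the sketch.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    diagonal = [matrix[k][k] for k in range(n_classes)]

    # Overall Accuracy: correctly classified triangles over all triangles.
    overall = sum(diagonal) / total
    # Producer's Accuracy (omission): diagonal over the column (expected) total.
    producers = [diagonal[k] / col_totals[k] for k in range(n_classes)]
    # User's Accuracy (commission): diagonal over the row (observed) total.
    users = [diagonal[k] / row_totals[k] for k in range(n_classes)]
    return overall, producers, users


# Counts taken from Table B-1.
TABLE_B1 = [
    [94, 33, 33],
    [48, 136, 25],
    [27, 27, 95],
]

overall, producers, users = accuracy_measures(TABLE_B1)
print(f"Overall accuracy: {overall:.1%}")
for name, p, u in zip("XYZ", producers, users):
    print(f"{name}: producer's {p:.3f}, user's {u:.3f}")
```

Dividing the diagonal by column totals gives the Producer's Accuracy, and dividing by row totals gives the User's Accuracy, so transposing the matrix would swap the two measures.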
In this thesis, I also use Kappa Analysis to assess the overall accuracy of the classification.

B.2 Kappa Analysis

Kappa Analysis is a measure of accuracy that takes into account the fact that even random distributions of triangles will result in some correctly classified triangles. It produces a statistic known as K (written $\hat{K}$ and also spelled KHAT). A K score of 0.0% indicates a completely random result (in which some triangles are still classified correctly by chance), and a score of 100.0% indicates a perfect classification. Negative values of K indicate negative correlations between the classified results and those that are expected. The formula for calculating the K score is:

$$
\hat{K} = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} \left( x_{i+} \, x_{+i} \right)}{N^{2} - \sum_{i=1}^{r} \left( x_{i+} \, x_{+i} \right)}
$$

Where:
r = the number of rows in the Error Matrix
x_{ij} = the number of observations in row i and column j (so x_{ii} is the diagonal entry for class i)
x_{i+} = the marginal total for row i
x_{+j} = the marginal total for column j (so x_{+i} is the total for column i)
N = the total number of observations
(Congalton, 1991)
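As a worked illustration of the K formula from Congalton (1991), the sketch below applies it to the sample Error Matrix in Table B-1. The function name kappa is hypothetical and the code is not taken from the LCS. It assumes a square matrix with study classes in rows and comparison classes in columns.

```python
# Sample Error Matrix from Table B-1 (rows: study, columns: comparison).
ERROR_MATRIX = [
    [94, 33, 33],
    [48, 136, 25],
    [27, 27, 95],
]


def kappa(matrix):
    """K-hat = (N * sum(x_ii) - sum(x_i+ * x_+i)) / (N^2 - sum(x_i+ * x_+i))."""
    r = len(matrix)
    row_totals = [sum(row) for row in matrix]                              # x_i+
    column_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]  # x_+i
    n = sum(row_totals)                                                    # N
    diagonal = sum(matrix[i][i] for i in range(r))                         # sum of x_ii
    chance = sum(row_totals[i] * column_totals[i] for i in range(r))
    return (n * diagonal - chance) / (n * n - chance)


print(f"K = {kappa(ERROR_MATRIX):.1%}")
```

For the sample matrix, the diagonal sums to 325, N is 518, and the sum of the products of the marginal totals is 90,801, so K = 77,549 / 177,523, or about 43.7%. This is noticeably lower than the Overall Accuracy of 62.7% for the same matrix, because the chance agreement has been removed.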
