TOPOGRAPHIC CHARACTERIZATION FOR DEM ERROR MODELLING

by

YANNI XIAO

B.Sc., Lanzhou University, 1984
M.Sc., The Chinese Academy of Sciences, 1987

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Department of Geography)

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
March 1996
© Yanni Xiao, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Geography
The University of British Columbia
Vancouver, Canada

ABSTRACT

Digital Elevation Models (DEMs) have been in use for more than three decades and have become a major component of geographic information processing. The intensive use of DEMs has given rise to many accuracy investigations. The accuracy estimate is usually given in the form of a global measure such as the root-mean-square error (RMSE), mostly from a producer's point of view. Seldom are the errors described in terms of their spatial distribution or of how the resolution of the DEM interacts with the variability of terrain. There is a wide range of topographic variation present in different terrain surfaces. Thus, in defining the accuracy of a DEM, one ultimately needs to know the global and local characteristics of the terrain and how the resolution interacts with them.
In this thesis, DEMs of various resolutions (i.e., 10 arc-minutes, 5 arc-minutes, 2 km, 1 km, and 50 m) in the study area (Prince George, British Columbia) were compared to each other and their mismatches were examined. Based on the preliminary test results, some observations were made regarding the relations among the spatial distribution of DEM errors, DEM resolution and the roughness of terrain. A hypothesis was proposed that knowledge of the landscape characteristics might provide some insights into the nature of the inherent error (or uncertainty) in a DEM. To test this statistically, the global characteristics of the study area surfaces were first examined by measures such as grain and those derived from spectral analysis, nested analysis of variance and fractal analysis of DEMs. Some important scale breaks were identified for each surface, and this information on the global surface characteristics was then used to guide the selection of the moving window sizes for the extraction of the local roughness measures. The spatial variation and complexity of the various study area surfaces were characterized by means of seven local geomorphometric parameters. The local measures were extracted from DEMs with different resolutions and using different moving window sizes. Multivariate cluster analysis was then used for automated terrain classification, in which relatively homogeneous terrain types at different scale levels were identified. Several different variable groups were used in the cluster analysis, and the different classification results were compared to each other and interpreted in relation to each roughness measure. Finally, the correlations between the DEM errors and each of the local roughness measures were examined, and the variation of DEM errors within the various terrain clusters resulting from the multivariate classifications was statistically evaluated.
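As an illustration of how a local roughness measure can be extracted with a moving window, the following Python sketch computes local relief, taken here as the difference between the highest and the lowest elevation inside a square window centred on each grid cell. It is a minimal sketch under stated assumptions rather than the procedure used in this thesis: the function name `local_relief`, the list-of-lists grid, the sample elevations and the choice to leave border cells undefined are all hypothetical.

```python
def local_relief(dem, window):
    """Return max - min elevation in a window x window neighbourhood.

    Assumption (not from the thesis): cells closer to the edge than half
    a window are left as None instead of being padded or extrapolated.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            # Collect every elevation inside the window centred on (r, c).
            cells = [dem[i][j]
                     for i in range(r - half, r + half + 1)
                     for j in range(c - half, c + half + 1)]
            out[r][c] = max(cells) - min(cells)
    return out


# Hypothetical 4 x 4 grid of elevations in metres, for illustration only.
dem = [
    [10, 12, 15, 20],
    [11, 14, 18, 25],
    [12, 16, 22, 30],
    [13, 18, 26, 40],
]
relief = local_relief(dem, 3)
```

Changing `window` from 3 to a larger odd value smooths the measure over a wider neighbourhood, which is why the choice of window size matters for the local measures discussed in this abstract.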
The effectiveness of using different moving window sizes for the extraction of the local measures and the appropriateness of different variable groups for terrain classification were also evaluated. The major conclusion of this study is that knowledge of topographic characteristics does provide some insights into the nature of the inherent error (or uncertainty) in a DEM and can be useful for DEM error modelling. The measures of topographic complexity are related to the observed patterns of discrepancy between DEMs of differing resolution, but there are variations from case to case. Several patterns can be identified in the relation between DEM errors and the roughness of terrain. First of all, the DEM errors (or elevation differences) do show certain consistent correlations with each of the local roughness variables. With most variables, the general pattern is that the higher the roughness measure, the more points there are with high absolute elevation differences (i.e., a horn-shaped scatter of points indicating heteroscedasticity). Further statistical test results indicate that the DEM errors in the study area do show significant variation between the clusters resulting from terrain classifications based on different variable groups and window sizes. Cluster analysis was considered successful in grouping the areas according to their overall roughness and useful in DEM error modelling. In general, the rougher the cluster, the larger the DEM error (measured with either the standard deviation of the elevation differences or the mean of the absolute elevation differences in each cluster). However, part of the total variation in the DEM errors could not be accounted for by the cluster structure derived from multivariate classification. This could be attributed to the random errors inherent in any of the DEMs and to the errors introduced in the interpolation process.
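The two per-cluster error measures named above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `cluster_error_stats`, the cluster labels and the sample elevation differences are hypothetical, and the population form of the standard deviation is assumed because the abstract does not say which form is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_error_stats(differences, labels):
    """Summarize elevation differences within each terrain cluster.

    differences: elevation differences (test DEM minus reference DEM).
    labels: the cluster label of each point, in the same order.
    """
    groups = defaultdict(list)
    for diff, label in zip(differences, labels):
        groups[label].append(diff)
    return {
        label: {
            "n": len(values),
            # Assumption: population standard deviation.
            "sd": pstdev(values),
            "mean_abs": mean(abs(v) for v in values),
        }
        for label, values in groups.items()
    }


# Hypothetical elevation differences (m) and cluster labels.
differences = [-2.0, 2.0, -10.0, 10.0, 0.0, 20.0]
labels = ["smooth", "smooth", "rough", "rough", "rough", "rough"]
stats = cluster_error_stats(differences, labels)
```

With these made-up values the "rough" cluster has both the larger standard deviation and the larger mean absolute difference, which is the kind of pattern the abstract reports for the study area.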
Another conclusion is that the multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more successful than using only a single roughness measure in characterizing the overall roughness of terrain. When the DEM error modelling results for surfaces with different global characteristics are compared, the size of the moving window used in geomorphometric parameter abstraction also has a certain impact on the modelling results. This shows that some understanding of the global characteristics of the surface is useful in selecting appropriate or optimal window sizes for the extraction of local measures for DEM error modelling. Finally, directions for further research are suggested.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS

1. INTRODUCTION
    1.1 Overview
    1.2 Background of Digital Terrain Modelling
        1.2.1 Digital terrain models
        1.2.2 Digital Terrain Models in geographic modelling
        1.2.3 Topographic data availability
        1.2.4 Defining accuracy and error
    1.3 Statement of the Problem
    1.4 Thesis Outline

2. ERROR IN DIGITAL TERRAIN MODELS
    2.1 Overview
    2.2 Error Detection and Rectification in Digital Terrain Models
        2.2.1 Errors in USGS gridded DEMs
        2.2.2 Gross-error detection and correction
    2.3 Accuracy Estimation and Error Modelling
        2.3.1 Global accuracy measure
            2.3.1.1 Terminology
            2.3.1.2 Topographic map accuracy standards
            2.3.1.3 Effects of check points on the reliability of accuracy estimates
            2.3.1.4 Interpolation accuracy
        2.3.2 Spatial variation of interpolation errors
        2.3.3 Propagation of DEM error
    2.4 Summary

3. STUDY AREA DATA AND SOME PRELIMINARY TESTS
    3.1 Study Area Selection
    3.2 Database Implementation
        3.2.1 NGDC10
        3.2.2 NGDC5
        3.2.3 WSC2
        3.2.4 EMR1
        3.2.5 TRIM 93G and TRIM 93H
    3.3 Some Preliminary Tests
        3.3.1 Topographic profile comparison
        3.3.2 DEM spatial mismatch
            3.3.2.1 Two comparisons between DEMs for the whole study area
            3.3.2.2 Four comparisons between DEMs for the two subareas
    3.4 Observations Based on Preliminary Tests
    3.5 Thesis Hypothesis

4. METHODOLOGIES FOR TOPOGRAPHIC CHARACTERIZATION
    4.1 Overview
    4.2 Extraction of the Local Geometric Measures
        4.2.1 Local relief (LR)
        4.2.2 Standard deviation of altitude (SD)
        4.2.3 Slope and aspect (α and β)
        4.2.4 Roughness factor (RF)
        4.2.5 Slope curvature (SC)
        4.2.6 Number of higher points (HP)
        4.2.7 Hypsometric integral (HI)
    4.3 Global Surface Characterization
        4.3.1 Grain
        4.3.2 Spectral analysis of terrain
        4.3.3 Nested analysis of variance
        4.3.4 Description of Terrain as a Fractal Surface
    4.4 Hierarchical Terrain Classification
        4.4.1 Principles of terrain classification
        4.4.2 Concept of distance
        4.4.3 Variable standardization
        4.4.4 Terrain classification
    4.5 What is next?

5. TOPOGRAPHIC CHARACTERIZATION RESULTS
    5.1 The Role of Scale in Topographic Characterization
        5.1.1 Grain determination
        5.1.2 Fourier interpretation
        5.1.3 Examining scale effects by nested analysis of variance
        5.1.4 Interpreting the variograms
        5.1.5 A comparison of various global surface characterization methods and their results
    5.2 Local Surface Characteristics
    5.3 Multivariate Classification Results
        5.3.1 Variable selection
        5.3.2 Grouping of homogeneous areas
        5.3.3 Interpretation of classification results
        5.3.4 Comparisons of different classification results

6. DEM ERROR MODELLING
    6.1 Overview
    6.2 A New Approach to DEM Error Modelling
        6.2.1 DEM errors and local roughness measures
        6.2.2 DEM errors in various terrain clusters
            6.2.2.1 WSC2 DEM errors as compared to TRIM DEM
            6.2.2.2 EMR1 DEM errors as compared to TRIM DEM
            6.2.2.3 NGDC5 and WSC2 DEM errors as compared to EMR1 DEM
        6.2.3 Significance tests of DEM error variation
            6.2.3.1 Significance tests of WSC2 and EMR1 DEM errors in the two subareas
            6.2.3.2 Significance test of NGDC5 and WSC2 DEM errors in the whole study area
    6.3 Summary

7. SUMMARY AND CONCLUSIONS
    7.1 Summary
    7.2 Conclusions
    7.3 A Practical Guide for the Users
    7.4 Discussions and Future Research

BIBLIOGRAPHY
APPENDIX A: A SELECTION OF DIFFERENT DIGITAL TOPOGRAPHIC DATA PROVIDERS
APPENDIX B: TRIM DATA MOEP ASCII FILE FORMAT

LIST OF TABLES

Table 2.1 Probability of errors in the univariate case
Table 2.2a Probability of errors in the bivariate case (MSEP method)
Table 2.2b Probability of errors in the bivariate case (CSE method)
Table 3.1 A brief description of each sample data set
Table 3.2 The 11 parameters in the NGDC 10' global data set
Table 3.3 Summary statistics of the study area NGDC10, NGDC5, WSC2 and EMR1 DEM data
Table 3.4 Specifications of EMR1 DEM data collection for the study area
Table 3.5 TRIM data collection density for the study area
Table 3.6 Statistics of two subset TRIM DEMs
Table 3.7 Summary statistics of six DEM comparisons (elevation differences)
Table 4.1 Scale variance for the artificial data in Figure 4.7
Table 5.1a Scale variances for TRIM 93G DEM data (trim93g.256)
Table 5.1b Scale variances
for TRIM 93H DEM data (trim93h.256)
Table 5.2 Scale variances for EMR1.128(1) and EMR1.128(2)
Table 5.3 A summary of the important scales in the study area surfaces identified by various global methods
Table 5.4a The rank-order correlation matrix of seven variables for TRIM 93G (values above the diagonal) and TRIM 93H (values below the diagonal) subareas (window size: 7x7)
Table 5.4b The rank-order correlation matrix of seven variables for the whole study area. Values above the diagonal are for window size 5x5 and values below the diagonal are for window size 9x9
Table 5.5a Summary statistics of local relief values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5b Summary statistics of slope values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5c Summary statistics of slope curvature values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5d Summary statistics of hypint values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5e Summary statistics of standard deviation of elevation values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5f Summary statistics of highpt values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5g Summary statistics of rough values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.6a Summary statistics of local relief values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6b Summary statistics of slope values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6c Summary statistics of slope curvature values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6d Summary statistics of hypint values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6e Summary statistics of standard deviation of elevation values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6f Summary statistics of highpt values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6g Summary statistics of rough values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 6.1a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas
Table 6.1b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas
Table 6.1c Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas
Table 6.1d Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas
Table 6.2a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas
Table 6.2b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas
Table 6.2c Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas
Table 6.2d Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas
Table 6.3a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas
Table 6.3b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas
Table 6.3c Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas
Table 6.3d Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas
Table 6.4a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas
Table 6.4b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas
Table 6.4c Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas
Table 6.4d Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas
Table 6.5a Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas
Table 6.5b Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas
Table 6.5c Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas
Table 6.5d Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas
Table 6.6a Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas
Table 6.6b Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas
Table 6.6c Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas
Table 6.6d Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas
Table 6.7a Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas
Table 6.7b Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas
Table 6.7c Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas
Table 6.7d Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas
Table 6.8a Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas
Table 6.8b Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas
Table 6.8c Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas
Table 6.8d Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas
Table 6.9a Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area
Table 6.9b Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area
Table 6.10a Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area
Table 6.10b Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area
Table 6.11a Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area
Table 6.11b Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area
Table 6.12a Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area
Table 6.12b Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area
Table 6.13a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area
Table 6.13b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area
Table 6.14a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area
Table 6.14b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area
Table 6.15a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area
Table 6.15b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area
Table 6.16a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area
Table 6.16b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area

LIST OF FIGURES

Figure 1.1 The five generic forms of DTMs
Figure 2.1 Comparison of DTM surfaces with the 'true' terrain surface
Figure 2.2 Factors influencing the performance of a DTM
Figure 3.1 5' DEM image of British Columbia
Figure 3.2 Study area map sheet numbers
Figure 3.3 Statistical summary of the study area 10', 5', 2 km and 1 km DEM data
Figure 3.4 WSC data coding method
Figure 3.5 WSC 2 km DEM data points
Figure 3.6 EMR1 DEM data points
Figure 3.7 Full-size and subset DEM images for subarea TRIM 93G
Figure 3.8 Full-size and subset DEM images for subarea TRIM 93H
Figure 3.9 Histograms of TRIM93G.SUB and TRIM93H.SUB DEMs
Figure 3.10 Slope histograms for subareas TRIM 93G and TRIM 93H
Figure 3.11 Registration of various study area DEM data sets
Figure 3.12 Four DEM images of the whole study area and the DEM images of the two subareas (93G and 93H)
Figure 3.13 Visual comparison of 10' and 1 km topographic profiles
Figure 3.14 Comparison results of the study area DEMs NGDC5 and EMR1
Figure 3.15 Comparison results of
the study area DEMs WSC2 and EMR1
Figure 3.16 Comparison results of EMR1 and TRIM DEMs for subarea 93G
Figure 3.17 Comparison results of EMR1 and TRIM DEMs for subarea 93H
Figure 3.18 Comparison results of WSC2 and TRIM DEMs for subarea 93G
Figure 3.19 Comparison results of WSC2 and TRIM DEMs for subarea 93H
Figure 3.20 Comparison results of WSC2 and TRIM DEMs for subarea 93G using three different interpolators
Figure 3.21 Comparison results of WSC2 and TRIM DEMs for subarea 93H using three different interpolators
Figure 3.22 Differences between the 50 m DEMs interpolated from WSC2 DEM using different interpolators (ncp=0, 2, 4) for subarea 93G
Figure 3.23 Differences between the 50 m DEMs interpolated from WSC2 DEM using different interpolators (ncp=0, 2, 4) for subarea 93H
Figure 4.1 Forms of surface roughness after Mark [1974]
Figure 4.2 3 by 3 neighbourhood of elevation points for slope calculation
Figure 4.3 Vector strength and vector dispersion
Figure 4.4 2-D spectrum of B.C. 5' DEM
Figure 4.5 Hierarchical structure of a square matrix
Figure 4.6 A numeric example of a 16 by 16 even hierarchy (after Moellering and Tobler, 1972)
Figure 4.7 Self-similar surfaces generated with a fractional Brownian process using a range of H values
Figure 4.8 Generic sequence of automated terrain classification
Figure 4.9 Major steps for the study area surface characterization and classification
Figure 5.1a Graphs used to determine the surface grain for the two subareas 93G and 93H
Figure 5.1b Graph used to determine the surface grain for the whole study area
Figure 5.2a 2-D spectrum of the study area derived from 5' DEM
Figure 5.2b 2-D spectrum of the study area derived from EMR1 DEM
Figure 5.2c 2-D spectrum of subarea 93G
Figure 5.2d 2-D spectrum of subarea 93H
Figure 5.3a The spectrum plot of an east/west topographic profile from TRIM93G.SUB DEM
Figure 5.3b The spectrum plot of a north/south topographic profile from TRIM93G.SUB DEM
Figure 5.3c The spectrum plot of an east/west topographic profile from TRIM93H.SUB DEM
Figure 5.3d The spectrum plot of a north/south topographic profile from TRIM93H.SUB DEM
Figure 5.4a The spectrum plot of topographic profile emr.profile1 from EMR1 DEM
Figure 5.4b The spectrum plot of topographic profile emr.profile2 from EMR1 DEM
Figure 5.4c The spectrum plot of topographic profile emr.profile3 from EMR1 DEM
Figure 5.4d The spectrum plot of topographic profile emr.profile4 from EMR1 DEM
Figure 5.4e The spectrum plot of topographic profile emr.profile5 from EMR1 DEM
Figure 5.4f The spectrum plot of topographic profile emr.profile6 from EMR1 DEM
Figure 5.5a The spectrum computed for TRIM93G and TRIM93H data. Each aggregated in an even manner to form a hierarchy
Figure 5.5b The spectrum computed for EMR1.128(1) and EMR1.128(2) data. Each aggregated in an even manner to form a hierarchy
Figure 5.6a Variogram plots of TRIM 93G and 93H DEMs
Figure 5.6b Variogram plot of EMR1 DEM
Figure 5.7a Grayscale image representations of local relief for 93G and 93H
Figure 5.7b Grayscale image representations of the standard deviation of elevations for 93G and 93H
Figure 5.7c Grayscale image representations of slope and aspect for 93G and 93H
Figure 5.7d Grayscale image representations of roughness factor for 93G and 93H
Figure 5.7e Grayscale image representations of slope curvature for 93G and 93H
Figure 5.7f Grayscale image representations of the number of DEM points that are higher than the center point of the moving window (subarea: 93G and 93H, window size: 7x7 and 21x21)
Figure 5.7g Grayscale image representations of hypsometric integral for 93G and 93H
Figure 5.8a Grayscale image representations of local relief and standard deviation of elevations for the whole study area
Figure 5.8b Grayscale image representations of slope, slope curvature and roughness factor for the whole study area
Figure 5.8c Grayscale image representations of HI and HP for the whole study area
Figure 5.9a The scattergrams of all the variable pairs (window size: 7x7) for subareas 93G and 93H
Figure 5.9b The scattergrams of all the variable pairs for the whole study area
Figure 5.10a Classification results (3 classes) based on variable group (3-5) and moving window size (21x21) for subareas 93G and 93H
Figure 5.10b Classification results (3 classes) based on variable group (3-6) and moving window size (21x21) for subareas 93G and 93H
Figure 5.10c Classification results (3 classes) based on variable group (2) and moving window size (21x21) for subareas 93G and 93H
Figure 5.10d Classification results (3 classes) based on variable group (5) and moving window size (21x21) for subareas 93G and 93H
Figure 5.11a Classification results (3 classes) based on variable group (3-5) and moving window size (7x7) for subareas 93G and 93H
Figure 5.11b Classification results (3 classes) based on variable group (3-6) and moving window size (7x7) for subareas 93G and 93H
Figure 5.11c Classification results (3 classes) based on variable group (2) and moving window size (7x7) for subareas 93G and 93H
Figure 5.11d Classification results (3 classes) based on variable group (5) and moving window size (7x7) for subareas 93G and 93H
Figure 5.12a Classification results (3 classes) based on variable group (3-5) and moving window sizes (5x5) and (9x9) for the whole study area
Figure 5.12b Classification results (3 classes) based on variable group (3-6) and moving window sizes (5x5) and (9x9) for the whole study area
Figure 5.12c Classification results (3 classes) based on variable group (2) and moving window sizes (5x5) and (9x9) for the whole study area
Figure 5.12d Classification results (3 classes) based on variable group (5) and moving window sizes (5x5) and (9x9) for the whole study area
Figure 5.13a Classification error matrices and KHAT statistics for 12 comparisons of classification results for subarea 93G (nclass=3)
Figure 5.13b Classification error matrices and KHAT statistics for 4 comparisons (difference between the two window sizes) of classification results for subarea 93G (nclass=3)
Figure 5.13c Classification error matrices and KHAT statistics for 12 comparisons of classification results for subarea 93H (nclass=3)
Figure 5.13d Classification error matrices and KHAT statistics for 4 comparisons (difference between the two window sizes) of classification results for subarea 93H (nclass=3)
Figure 5.13e Classification error matrices and KHAT statistics for 12 comparisons of classification results for the whole study area (nclass=3)
Figure 5.13f Classification error matrices and KHAT statistics for 4 comparisons (difference between the two window sizes) of classification results for the whole study area (nclass=3)
Figure 5.13g Summary of the KHAT statistics for all classification comparisons for the whole study area and the two subareas (nclass=3)
Figure 6.1a The scattergrams between WSC2 DEM error and local relief (window size 7x7) for the two subareas
Figure 6.1b The scattergrams between WSC2 DEM error and local relief (window size 21x21) for the two subareas
Figure 6.2a The scattergrams between WSC2 DEM error and slope (window size 7x7) for the two subareas
Figure 6.2b The scattergrams between WSC2 DEM error and slope (window size 21x21) for the two subareas
Figure 6.3a The scattergrams between WSC2 DEM error and 'curv' (window size 7x7) for the two subareas
Figure 6.3b The scattergrams between WSC2 DEM error and variable 'curv' (window size 21x21) for the two subareas
Figure 6.4a The scattergrams between WSC2 DEM error and 'hypint' (window size 7x7) for the two subareas
Figure 6.4b The scattergrams between WSC2 DEM error and variable 'hypint' (window size 21x21) for the two subareas
Figure 6.5a The scattergrams between WSC2 DEM error and 'std' (window size 7x7) for the two subareas
Figure 6.5b The scattergrams between WSC2 DEM error and variable 'std' (window size 21x21) for the two subareas
Figure 6.6a The scattergrams between WSC2 DEM error and 'highpt' (window size 7x7) for the two subareas
Figure 6.6b The scattergrams between WSC2 DEM error and variable 'highpt' (window size 21x21) for the two subareas
Figure 6.7a The scattergrams between WSC2 DEM error and 'rough' (window size 7x7) for the two subareas
Figure 6.7b The scattergrams between WSC2 DEM error and variable 'rough' (window size 21x21) for the two subareas
Figure 6.8a Histogram of WSC2 DEM errors (3 clusters, variable group: 3-5, window size: 7x7) for the two subareas
Figure 6.8b Histogram of WSC2 DEM errors (4 clusters, variable group: 3-5, window size: 7x7) for the two subareas
Figure 6.9a The significance test results for WSC2 DEM errors for the two subareas (variable group: 3-5, window size: 7x7 and 21x21)
Figure 6.9b The significance test results for WSC2 DEM errors for the two subareas (variable group: 3-6, window size: 7x7 and 21x21)
Figure 6.9c The significance test results for WSC2 DEM errors for the two subareas (variable group: 2, window size: 7x7 and 21x21)
Figure 6.9d The significance test results for WSC2 DEM errors for the two subareas (variable group: 5, window size: 7x7 and 21x21)
Figure 6.10a The significance test results for EMR1 DEM errors for the two subareas (variable group: 3-5, window size: 7x7 and 21x21)
Figure 6.10b The significance test results for EMR1 DEM errors for the two subareas (variable group: 3-6, window size: 7x7 and 21x21)
Figure 6.10c The significance test results for EMR1 DEM errors for the two subareas (variable group: 2, window size: 7x7 and 21x21)
Figure 6.10d The significance test results for EMR1 DEM errors for the two subareas (variable group: 5, window size: 7x7 and 21x21)
Figure 6.11a The significance test results for NGDC5 DEM errors for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 5x5)
Figure 6.11b The significance test results for NGDC5 DEM errors for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 9x9)
Figure 6.12a The significance test results for WSC2 DEM errors (as compared to EMR1 DEM) for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 5x5)
Figure 6.12b The significance test results for WSC2 DEM errors (as compared to EMR1 DEM) for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 9x9)
Figure A.1 The areal coverage of EMR GGGD digital terrain files
Figure A.2 British Columbia Geographic System (BCGS)

ACKNOWLEDGEMENTS

I wish to gratefully thank my supervisor, Dr. Brian Klinkenberg, for his very valuable support, guidance, advice and encouragement throughout my years as a student at the University of British Columbia and as a junior professor at the University of Lethbridge.
I would also like to acknowledge the help of my advising committee members: Dr. Tom Poiker of Simon Fraser University and Dr. Michael Church of the University of British Columbia. Their ideas, suggestions, expertise and willingness to help are very much appreciated. To Dr. Rostam Yazdanni of the B.C. Ministry of Environment, Lands and Parks and Ms. Susan Rountree of the Water Survey of Canada, my thanks for providing some of the sample test data. My great appreciation also goes to Dr. Brian Klinkenberg of the University of British Columbia and Dr. Fionn Murtagh of the European Southern Observatory, Germany, for kindly providing some of the computer programs and codes. Thanks are also due to my best friend Rose Klinkenberg for her kindness in offering me my very first home in Canada and for her continuous encouragement and moral support. Finally, the many valuable suggestions and comments on the manuscript, especially from Drs. T. Poiker, M. Church, P. Gong, I. Saunders and B. Klinkenberg, are very much appreciated. This thesis would not have been possible without the welcoming resources of the University of Lethbridge and encouragement from Drs. R. Barendregt and R. Rogerson.
1. INTRODUCTION
1.1 Overview
Topographic information about a geographic area is essential in any type of environmental research. It provides an integrating framework for a wide range of land resources, notably surface materials, soils, water, and vegetation, and thus for an equally wide range of users, including disciplines as diverse as forestry, archaeology, and civil engineering [Mitchell, 1991]. Digital representation of topography is particularly suitable for computer-based numeric analysis or modelling, so that 'real world' problems may be approached in a representative and efficient manner through automated means. There are many examples of natural resource applications that would benefit from including topography as a variable in the modelling and prediction of phenomena.
Advantages of digitally encoding elevation data include fast manipulation, the ability to handle a large volume of data, and, of course, ease of integration with computerized mapping, geographic information systems (GIS) and remote sensing techniques [Theobald, 1989].
The conceptual framework for a computerized representation of topography is the Digital Terrain Model (DTM). It describes a terrain form and its evolution with the help of a model. Such a model defines observable quantities and the relations between them [Frederiksen et al., 1985]. DTMs can be used as an analytical tool for quantitative characterization of topography, and so they are increasingly being used in a wide range of applications which are dependent on topographic characteristics. This is also partly because recent developments in photogrammetric methods in general, and orthophoto mapping in particular, have resulted in the increasing availability and the mass production of digital topographic data. These data carry the potential for breakthroughs in the characterization of topography.
Worldwide, institutions are now involved in the collection of digital topographic data. Users either directly acquire those data that have been produced or collect their own to meet their special needs. Digital topographic data are collected at different scales or spatial resolutions with different standards and accuracy, and they are applied to various geographic models at different scales for different purposes.
Scale has always been of primary concern in geography. Considerations of scale underlie geographical study in all of its aspects. An essential part of the definition of an environmental system is to define its geographic scale [Church and Mark, 1980]. The term 'scale,' however, has many usages in geographical research, each with a different perspective.
It can refer to size or area, the preciseness of measurement, the level of generalization, level of data detail or resolution, relative degree or extent, and so on [Griffiths and Doiron, 1971]. Nonetheless, all perspectives are interrelated. This diversity of usage is also reflected in this thesis.
There is a wide range of scales at which geographic models operate, from a global-scale (or macroscale) atmospheric general circulation model (GCM) to a regional-scale (or mesoscale) hydrologic model or a detailed (or microscale) representation of soil/slope processes. Each attempts to model different scale-governing or scale-governed processes. Haggett [1965] states that one of the characteristic features of geographical research is its concern with a particular scale of reality. The statement summarizes the essence of what has come to be known as the 'scale problem' in geography [DeLotto, 1989]. This concept is often dichotomized by terms such as "local" and "global." Burrough [1983] illustrates this point by suggesting that variations in soil may arise from processes acting at vastly different scales, such as geology, erosion, and earthworms. Scale has especially long been a fundamental concern in geomorphology. As indicated by Church and Mark [1980], size and form of landscape features are inextricably connected through the function and scale of geomorphic systems.
Global coverage of topographic data at fine resolution is unlikely to be available and may even be unnecessary. Data at coarse resolution are highly generalized, with much high-frequency spatial variation (i.e., geographic detail) removed. In general, mesoscale features will not be represented within global-scale models. A system which carries too much geographic detail is wasteful, and one which carries too little geographic detail is not of much use [Tobler, 1988].
Considering land evaluation research, for example, the data resolutions needed can range from fine to coarse depending on whether an intensive or reconnaissance evaluation is required. The resolution is governed by the purpose of the application and who the eventual users will be. It should be fine enough to show all necessary detail but coarse enough to avoid an excessive amount of data.
Just as there is no such thing as an absolutely accurate map, so the absolutely accurate terrain model does not exist [Shearer, 1990]. A DTM is merely a digital representation of a continuous topographic surface in the real world. It is known that any abstraction of reality will contain discrepancies from its source. All digital terrain models will thus contain inaccuracies to a greater or lesser extent, depending on a number of factors. It is, therefore, desirable to specify the accuracy of the terrain model in such a way that one could assess its suitability to the chosen or intended application. However, this is seldom addressed and, if it is, the accuracy estimate is usually constrained to one of the following: (i) an estimate of the root-mean-square error (RMSE) in the measurement of elevations; (ii) a description of possible source errors due to human mistakes during data acquisition; and (iii) a description of systematic error of striping, or random noise created during the photogrammetric processes. All these are considered from a producer's perspective. Seldom is the accuracy estimated, from a user's point of view, in terms of the spatial pattern (or distribution) of errors or how the spatial resolution of a DTM interacts with all kinds of topographic characteristics [Theobald, 1989]. As a result, the 'fitness for use' of a DTM in a specific application is not known and cannot be justified.
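The limitation of a single global RMSE noted above can be illustrated with a small sketch. The two 4x4 error grids below are hypothetical values invented for illustration (they are not data from this thesis); both yield the same RMSE even though the errors are distributed very differently in space.

```python
import math


def rmse(errors):
    """Root-mean-square error of a flat list of elevation errors (metres)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def flatten(grid):
    """Turn a list-of-rows grid into one flat list of values."""
    return [value for row in grid for value in row]


# Hypothetical error grids (DEM elevation minus reference elevation, metres).
# 'uniform' spreads a 5 m error over every cell.
uniform = [[5.0] * 4 for _ in range(4)]

# 'clustered' is error-free except for a single 20 m discrepancy in one cell,
# as might occur where coarse sampling misses a sharp ridge.
clustered = [[0.0] * 4 for _ in range(4)]
clustered[0][0] = 20.0

rmse_uniform = rmse(flatten(uniform))
rmse_clustered = rmse(flatten(clustered))

# The global measure cannot tell the two grids apart, although the largest
# local error differs by a factor of four.
max_uniform = max(abs(e) for e in flatten(uniform))
max_clustered = max(abs(e) for e in flatten(clustered))
```

A user interested in one particular location would judge these two grids very differently, which is the sense in which a producer's global figure says little about fitness for a specific application.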
1.2 Background of Digital Terrain Modelling
1.2.1 Digital terrain models
Although the concept of representing terrain using spot heights has been around for many decades, the idea of creating digital models of the terrain is a relatively recent development. The term Digital Terrain Model (DTM) has historically been the generic term used to refer to any digital representation of a topographic surface. It had its origin in work performed by Miller in the Photogrammetry Laboratory at the Massachusetts Institute of Technology during the late 1950s [Miller, 1957; Miller and LaFlamme, 1958], but only later was its potential thoroughly explored. Miller and his colleagues were conducting research for the U.S. Bureau of Public Roads. The objective was to expedite highway design by digital computation based upon photogrammetrically-acquired topographic data. Their definition was as follows:
"The digital terrain model (DTM) is simply a statistical representation of the continuous surface of the ground by a large number of selected points with known XYZ coordinates in an arbitrary coordinate field."
There exist several other terms, e.g., Digital Elevation Model (DEM), Digital Height Model (DHM) and Digital Ground Model (DGM), which are also commonly used. Although in practice these terms are presumed to be synonymous, in reality they often refer to quite distinct products [Kennie and Petrie, 1990].
Burrough [1986], for example, prefers the term Digital Elevation Model (DEM) "for models containing only elevation data" because, he argues, "the term 'terrain' often implies attributes of a landscape other than the altitude of the land surface." This can be seen in the definition of 'land' given by Townshend [1981, p.6], in which 'land' and 'terrain' are treated as synonyms:
"Land is an area of the Earth's surface which is characterized by a distinctive assemblage of attributes and interlinking processes in space and time of soil and other surface materials, their atmosphere and water, the landforms, vegetation and animal populations, as well as the results of human activity; also included to the extent that they directly influence the characteristics of the land under consideration are suprasurface properties of the atmosphere, subsurface geological characteristics and the nature of the immediately surrounding land and water. We follow Christian and Stewart (1968) in treating 'land' and 'terrain' as synonyms."
However, many others tend to use the term DEM to represent only gridded matrices of elevations, which have also been called Altitude Matrices [Evans, 1980], and to use the term DTM to refer to any digital representation of topography. Throughout this thesis, the term DEM refers to a regular array of elevations. The grids usually conform either to the graticule of latitude and longitude or to some grid system such as UTM (Universal Transverse Mercator). Those grids oriented to latitude and longitude are usually referred to as arc-second or arc-minute data. This format is non-square except at the equator, where a unit of latitude and a unit of longitude are equal in length. The term DTM will still be used to represent generally any digital model of a topographic surface, and the terms 'terrain' and 'topographic surface' will be used interchangeably.
The development of DTMs has experienced a tremendous growth in the past three decades or so.
This was especially reflected in the Digital Terrain Models Symposium held in St. Louis in 1978 by the American Society of Photogrammetry [American Society of Photogrammetry, 1978]. Many data structures, which may be broadly defined as collections of objects (data) together with the relations among them [Mark, 1978, 1979], or models, have been developed to digitally represent the continuous variation of surface elevations over an area. Among them, the contour or profile method, the uniform or variable grid structure, and the triangulated irregular network (TIN) structure are the most popular ones (see Figure 1.1). Each has its capabilities and limitations with respect to data acquisition, storage efficiency, retrieval and processing speeds, and accuracy of representation of the surface.
Because contour lines are drawn on most existing topographic maps to describe the surfaces, they are a ready source of data for digital terrain models. Extensive efforts have been made to capture them automatically using scanners [Burrough, 1986]. Unfortunately, as pointed out by Boehm [1967] and Peucker et al. [1978], while contour encoding can be adaptive to terrain variability (as shown in Figure 1.1), it is very inefficient for many types of computations. Digitized contours, created either manually by table digitizing or automatically by raster scanning, are not especially suitable for computing slope or other derivatives, and so they are usually converted to other formats such as regular grids for further analysis.
Another standard cartographic method of portraying the surface is to use a series of parallel profiles showing elevations. A profile trace results from the intersection of a vertical plane with the surface and can be built from a stereo-photo model or from an existing topographic map. The profiles are usually a derived product used for slope analysis.
The most common form of DTM, and in many ways the simplest, is the regular grid with the sample points located at the intersections of two orthogonal sets of regularly-spaced parallel lines. The data are usually obtained from quantitative measurements from stereoscopic aerial photographs made on analytical stereo-plotters, such as the Gestalt Photo Mapper (GPM-2) system [Kelly et al., 1977], or alternatively produced by interpolation from irregularly or regularly spaced data points. The simplicity of the grid data structure lies in that only the altitude of the surface at each sample point needs to be measured and stored within the computer. The geographic locations are determined by the grid spacing, and are implicit in the sequential position of the altitude value in the data array. Another advantage of this structure is that the neighbours of a given data point, which are often required in the calculation of some surface parameters for geographic modelling purposes, can be obtained readily from the positions of points within the data array.
Figure 1.1 The five generic forms of DTMs. The contour lines are shown in every case for comparison purposes (partly after Carter [1988]).
The major disadvantage of the regular grid is its inability to adapt to areas of varying relief complexity without changing the grid size; hence, there is a tendency toward data redundancy in areas of uniform terrain. This problem can be largely solved by the practice of 'progressive sampling' [Makarovic, 1973], in which stereoscopic aerial photographs are automatically or semiautomatically scanned at grids of increasing fineness in areas of complex relief; thus, a variable instead of uniform grid structure is used. This procedure directly links the terrain characteristics (i.e., the local roughness of the terrain relief) with the sampling density.
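The two advantages of the regular grid described above (implicit locations and readily available neighbours) can be sketched in a few lines. This is a minimal illustration only: the function names, the origin convention (row 0 at the northern edge, column 0 at the western edge), the coordinates, and the elevation values are all assumptions made for the example.

```python
def cell_location(row, col, x_origin, y_origin, spacing):
    """Ground coordinates of a grid node, derived only from its array position.

    Assumes row 0 is the northern edge, column 0 the western edge, and the
    same spacing in both directions (a square grid such as UTM).
    """
    return (x_origin + col * spacing, y_origin - row * spacing)


def neighbours(dem, row, col):
    """Elevations of the (up to eight) nodes adjacent to dem[row][col]."""
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < len(dem) and 0 <= c < len(dem[0])
            if (dr, dc) != (0, 0) and inside:
                values.append(dem[r][c])
    return values


def local_relief(dem, row, col):
    """Elevation range within the 3x3 window centred on dem[row][col]."""
    window = neighbours(dem, row, col) + [dem[row][col]]
    return max(window) - min(window)


# Hypothetical 3x4 altitude matrix (metres), stored row by row.
# Only elevations are stored; no coordinates are kept with the values.
dem = [
    [120.0, 125.0, 131.0, 140.0],
    [118.0, 122.0, 129.0, 137.0],
    [115.0, 119.0, 126.0, 133.0],
]

location = cell_location(1, 2, 500000.0, 5980000.0, 50.0)
relief = local_relief(dem, 1, 1)
```

Because neighbours are found by index arithmetic alone, moving-window measures such as local relief need no search or topological bookkeeping, which is part of the appeal of the grid for numerical analysis.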
The TIN is a data structure designed by Poiker (formerly Peucker) and his co-workers [Peucker et al., 1978] for digital elevation modelling that avoids the redundancy of the regular grids and which at the same time is more efficient for many types of computation than systems that are based only on digitized contours. A TIN is a terrain model that uses a set of connected triangular patches based on the triangulation of irregularly spaced observation points (e.g., Delaunay triangulation). These irregularly distributed points are assumed to be sample points from a single-valued surface and they should be critical points on the surface such as peaks, pits, and passes along lines of high information content (e.g., ridges and channel lines) [Peucker and Chrisman, 1975]. The location of these "coordinate random, but surface specific" sample points would, therefore, be dictated solely by the surface being modelled [Peucker et al., 1978].
Discussion and comparison of different data structures for DTMs have been given by Boehm [1967], Peucker [1977, 1978], Peucker et al. [1978], Mark [1974, 1975a, 1978, 1979], and Kumler [1992]. Extensive research has also been done on the types of data structures that work best for operations on surfaces [Peucker, 1978]. Boehm, for example, compared the following five forms of topographic information storage: (i) contour points sorted on x; (ii) contour tree ordering; (iii) uniform grid; (iv) uniform grid, incremental altitude; and (v) variable-mesh grid, incremental altitude. Peucker [1977, 1978], Mark [1975a, 1978] and Kumler [1992, 1994] mainly compared the methods of regular grid DEM (surface-random sampling) and TIN (surface-specific sampling). Mark [1975a] and Peucker [1978] showed that rectangular grids need significantly more data points than TINs consisting of surface-specific points to be able to represent geomorphometric parameters to the same degree of precision.
Kumler [1992, 1994], however, concluded that for a wide range of different types of terrain, the DEM is not only the most efficient sampling scheme in the absence of prior terrain knowledge, but is also the most accurate for a given volume of digital storage. None of the TINs produced in his experiment represented the surface more accurately than a comparably sized DEM. In every study area he examined, the standard gridded DEM produced by the U.S. Geological Survey (USGS) yielded more accurate estimates of the surface elevation than any of the TINs derived from digitized contours or subsets of points sampled from DEMs. In a few cases, the differences were statistically insignificant, suggesting that a TIN is comparable to the DEM. However, in most cases the DEMs were clearly superior. These same conclusions were reached regardless of which error measure (i.e., mean, maximum, RMSE, or 90th percentile of error) was considered. In fact, it appears that the per-point overhead associated with the irregular TIN structure tends to outweigh the advantage of variable resolution. This reverses a longstanding presumption in the GIS field, and could weaken the argument for the TIN model significantly [Goodchild, 1992]. Even the author himself found the above conclusions surprising. However, as mentioned earlier, Kumler only tested the TINs derived from digitized contours or subsets of points sampled from DEMs. Since a TIN was clearly designed to be surface specific, a different conclusion might result if a DEM were compared to a TIN derived from the stereomodel itself or with prior knowledge of the terrain.
So why use gridded data? While gridded DTMs may not be, in some cases, an optimal source of topographic information, they are often the only source, as elevation matrices continue to be the predominant type of data distributed by many government mapping agencies.
Although there exist different forms of DTMs or different data structures for DTMs, the regular grid DTMs (regular grid matrices of elevations) are the most easily obtainable and available form, and will continue to be for some time. This is because of the convenience of collecting topographic data using photogrammetric methods, and the ease with which matrices can be handled computationally, in particular for numerical analysis or modelling purposes in raster-based GIS and image processing systems. There is a shortage of algorithms for these kinds of purposes that can be conveniently implemented on irregular structures. Furthermore, the regular geometric structure of grids ensures predictable distances between points, and an even information distribution. This is especially valuable when distance and spatial context are to be considered in terrain classification analysis using DTMs [Weibel and DeLotto, 1988]. It is also valuable when data are being assembled for general purposes (as in topographic mapping) with no particular application in view, but many different ones anticipated. Therefore, only the regular grid DTMs will be considered later in this research.
1.2.2 Digital Terrain Models in geographic modelling
Traditionally DTMs have been developed to aid in computerized mapping of topography, the determination of earthwork cut-and-fill volumes for road design and other surveying and civil engineering projects, and military applications. As a result they are usually limited to large scale (cartographically speaking) work with high density sampling of the terrain surface. Nowadays DTMs are being widely used in geographic modelling because of topography's predominant role in the processes being modelled and the increasing availability of digital topographic data.
Some examples of geographic models which include a DTM as a useful component include: the simulation of wind flow patterns over coastal and mountainous regions [Tesche and Bergstrom, 1978], geometric and radiometric correction of remotely sensed imagery [Wong, 1984], quantification of landslide-terrain types for geological hazard analysis [Pike, 1988a], energy balance simulation over rugged terrain [Dozier and Outcalt, 1979], the study of mesoscale runoff variability [Krasovskaia, 1988], and global climate modelling. These applications of DTMs to scales other than large scale make it necessary to extend the analysis of the metric performance of digital terrain representations to smaller scales.
Natural phenomena, such as topography, generally possess a property recently made prominent through the study of Mandelbrot's fractals [Mandelbrot, 1977, 1982]: the closer one looks, the more the self-similar detail. The amount of resolvable detail is closely related to scale. Since the scales (in terms of spatial coverage) at which various geographic models are applied vary from global to regional, and the type of topography often varies from one area to another, the user needs to be able to know whether available digital topographic data adequately meet the needs of different modelling analyses.
1.2.3 Topographic data availability
The tremendous development of DTMs is also evidenced by the fact that an increasing number of digital topographic data sets are becoming available. Acquisition of machine-processable topographic data is the first step in the creation of a DTM. Digital topographic data may be obtained in a number of ways: from ground surveys, from photogrammetric stereomodels of either aerial photographs or SPOT satellite imagery, from existing topographic maps, or from other systems such as altimeters carried in aircraft or spacecraft and global positioning systems (GPS). In practice, the accuracy and cost aspects are the limiting factors [Blais, 1988].
Currently, direct photogrammetry from aerial photographs or digitizing from contour maps are the usual methods for DTM data collection [Aronoff, 1989]. The development of various new instruments has made some methods more accurate, less laborious, faster, and/or less expensive than others in capturing topographic data.
Worldwide, mapping organizations collect, produce and distribute digital topographic data covering either the entire world, a country, a province, a local municipality, or a specific region. Since terrain is almost unchanging and its components are simple and readily observable, worldwide cover could be achieved more easily than in most other thematic surveys. Nationwide DTM collection projects are currently being pursued in most industrialized countries [Weibel and Heller, 1990]. For instance, the National Geophysical Data Center (NGDC) in the United States has a variety of topographic data sets available for use in geoscience applications. The data were obtained from U.S. Government agencies, academic institutions, and private industries. The data coverage ranges from regional to worldwide; data collection methods range from digitization of existing topographic maps to satellite remote sensing (e.g., SPOT).
A selection of different digital topographic data providers or distributors around the world is given in Appendix A. Seven data providers and/or distributors are profiled: NGDC (the U.S. National Geophysical Data Center), USGS (the U.S. Geological Survey), IGN (the French national mapping agency), OS (the Ordnance Survey of the United Kingdom), EMR (the Department of Natural Resources of Canada), WSC (Water Survey of Canada), and TRIM (Terrain Resource Information Management program of British Columbia). It is apparent that each DTM is produced differently, not only in its data acquisition method or data structure, but also in its geo-referencing coordinate system and spatial resolution.
As pointed out earlier, different resolutions are required for applications at different scales. The data providers listed here operate in territories of different size and probably characteristically serve different kinds of projects. Thus it is possible that their methods of data acquisition and quality control might vary accordingly. Therefore, it is fair to assume that each DTM would have a different accuracy and would suit a different research objective and operational mandate.
Although the purpose to which the DTM will be applied should govern the resolution, accuracy, and precision of the DTM [Carter, 1988], it would appear that most DTMs reflect primarily producers' concerns, not the end-users' objectives. That is, the end-users have little say, in most cases, in the specifications of any digital data collection projects. Finally, it should also be noted that despite progress made in the mass production of large matrices and networks of terrain elevations, severe shortcomings in DTM quality remain, and DTM coverage of many parts of the world is either still lacking, of coarse resolution, or prohibitively expensive.
1.2.4 Defining accuracy and error
A DTM is only an approximation of the real world topographic surface. As Miller's definition clearly indicates, a DTM is a digital representation of a continuous surface. It represents a topographic surface in terms of a set of discrete spatial coordinates obtained by sampling the surface. Assumptions about surface behaviour between sampled points, needed for reconstructing the original surface from the sampled values, are inherent in the definition. The amount of information transferred from the input data to the data reconstructed from sampled point data determines the fidelity of the digital terrain model: the greater the amount of information transferred, the higher the fidelity [Makarovic, 1972].
The variations or discrepancies between the original surface and the digital representation of that surface are often referred to as sampling error. How accurately a topographic surface is represented by a DTM essentially depends on the combined interaction of three influencing factors: the sampling density, that is, the size of the interval at which values are picked out in relation to the variability of the surface; the measuring error introduced by sampling an analogue quantity and converting it to digital form; and the interpolation method employed for reconstructing a continuous surface from the sampled discrete data [Tempfli, 1980]. That is,
\[ f(x,y) \xrightarrow{\text{sampling}} f_i(x,y) \xrightarrow{\text{measuring}} g_i(x,y) \xrightarrow{\text{interpolation}} \hat{f}(x,y) \tag{1.1} \]
Sampling (the determination of the sampling strategy), measuring (the determination of the discrete sample values) and interpolation (the reconstruction of the surface) transform $f(x, y)$ into $\hat{f}(x, y)$. The difference between $f(x, y)$ and $\hat{f}(x, y)$ is referred to as the error of reconstruction [Tempfli, 1980].
Sampling is the selection of part of an aggregate to represent the whole. A sampling process implements a compression of the input data. In the case of a profile, this process can be formulated as
\[ f_i(x) = \sum_{k}^{n} f(k\,\Delta x)\,\delta(x - k\,\Delta x) \tag{1.2} \]
where $\delta(x)$ is the unit impulse function and $\Delta x$ is the sampling interval. Ideally, input data should be compressed in such a way that the reestablished information will not be reduced significantly. The reconstruction of the input should be essentially a reverse process of the compression. In practice, however, some loss of information is unavoidable and can usually be accepted.
The discrete signal measured at each sample point is $g_i(x) = f_i(x) + m_i(x)$, where $m_i(x)$ is the measuring error and is composed of various contributions such as instrumental errors and observational errors.
Sampling error, as usually defined, includes the information loss due to sampling itself ($f_i$) and the error introduced by measurement ($g_i$).
Reconstruction of the input (i.e., the continuous topographic surface) is achieved by spatial interpolation, a procedure used to estimate the value of properties at unsampled points within the area covered by sampled points. The rationale behind spatial interpolation is the very common observation that, in general, points that are close together in space are more likely to have similar values than points further apart. This is also known as Tobler's First Law of Geography [Tobler, 1970].
Many different interpolation methods have been devised. Most interpolation methods are linear with respect to their parameters and are space invariant. They can be formulated as Equation (1.3) [equation body not recoverable from the source] for an infinite record length. The interpolation is characterized by its 'weighting function' $a(t)$. Instead of using the weighting function in the spatial domain, the linear, space invariant system can also be described in the frequency domain by its 'transfer function' (also called 'frequency response') $A(v)$, the Fourier transform of $a(t)$. The advantage of this transition from the spatial domain to the frequency domain is that the relationship between the system input and output can be defined by a multiplication instead of the convolution in the spatial domain. In the absence of a measuring error, sampling and interpolation can be combined into a single linear (but not space invariant) system and its transfer function can be computed as the ratio of output to input signal amplitudes [Tempfli, 1980].
Research has shown that a complex interpolation does not significantly compensate for the information lost through sampling, but may increase the cost. The loss of information depends primarily on the density of sampling. The information which has been lost through sampling cannot be significantly regained by interpolation.
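The dependence of information loss on sampling density can be illustrated with a one-dimensional sketch of the sampling and interpolation chain described above. Everything here is an assumption made for illustration: the 'true' profile is a synthetic sum of two sine waves, measuring error is taken to be zero, and piecewise-linear interpolation stands in for whichever method a producer might actually use.

```python
import math


def terrain(x):
    """Hypothetical 'true' profile f(x): a broad hill plus short-wavelength roughness."""
    hill = 100.0 * math.sin(2.0 * math.pi * x / 1000.0)      # 1000 m wavelength
    roughness = 10.0 * math.sin(2.0 * math.pi * x / 80.0)    # 80 m wavelength
    return hill + roughness


def sample(f, length, dx):
    """Sample f at k*dx for k = 0, 1, ... (no measuring error is added)."""
    n = int(length // dx)
    return [(k * dx, f(k * dx)) for k in range(n + 1)]


def interpolate(samples, x):
    """Piecewise-linear estimate of the surface at x from the bracketing samples."""
    dx = samples[1][0] - samples[0][0]
    k = min(int(x // dx), len(samples) - 2)
    (x0, z0), (x1, z1) = samples[k], samples[k + 1]
    t = (x - x0) / (x1 - x0)
    return z0 + t * (z1 - z0)


def reconstruction_rmse(f, length, dx, check_step=1.0):
    """RMSE between f and its reconstruction, checked every check_step metres."""
    samples = sample(f, length, dx)
    n_checks = int(samples[-1][0] / check_step) + 1
    squared = []
    for i in range(n_checks):
        x = i * check_step
        squared.append((interpolate(samples, x) - f(x)) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Error of reconstruction for three sampling intervals along a 1000 m profile.
rmse_fine = reconstruction_rmse(terrain, 1000.0, 10.0)
rmse_medium = reconstruction_rmse(terrain, 1000.0, 40.0)
rmse_coarse = reconstruction_rmse(terrain, 1000.0, 200.0)
```

In this sketch the 40 m interval samples the 80 m roughness component only at its zero crossings, so that component is lost entirely and no choice of interpolator applied to those samples could bring it back.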
The word 'error' is used here in its widest sense to include not only 'mistakes' but also the statistical concept of error, meaning 'variation' [Burrough, 1986] or 'uncertainty.' Two major types of error may be recognized: random and systematic. Random errors are unpredictable variations on either side of the 'true' value. Systematic errors, on the other hand, are consistently biased either above or below the true value. Blunders, a third type of error, are gross errors that are usually easily detected and are assumed to be an insignificant source of error in the context of this thesis.
'Error' is closely related to the concepts of 'accuracy' and 'precision.' Accuracy can be defined in terms of the magnitude of the difference between the reported value and the true value, or the value accepted as being true. For data to be accurate, they must be unbiased. The bias clearly represents undetected systematic error. Precision has different meanings with respect to different levels of measurement (i.e., nominal, ordinal, interval and ratio) [Klinkenberg and Xiao, 1990]. For interval/ratio data there is a statistical definition for precision which is usually a measure of the dispersion about the sample mean for replicate observations. This definition of precision is sometimes referred to as relative precision [Kirby, 1985]. In mathematics or computer science, precision refers to the ability to measure or represent data with a certain number of significant digits, a number which should be commensurate with the data collection process and the nature of the underlying phenomenon. This form of precision is sometimes referred to as absolute precision and is the one used for DTMs. It is clear that with DTMs both high accuracy and absolute precision are required in order to have a good quality product. For nominal/ordinal data, such as land-use information, precision defines 'detail of classification' [Campbell, 1987].
This definition is obviously irrelevant to DTMs.

Following the above definitions, DTM accuracy refers to the degree to which a DTM differs from the real surface which it supposedly represents. Understanding DTM errors requires not only an accuracy measure for each individual point but also an evaluation of the spatial variation of the inaccuracy. Only by looking at the distribution of the inaccuracy can we quantify the nature of the error. A DTM may be said to be "inaccurate" with respect to a specific point on the surface if the information presented differs from ground truth. This is true with respect to an assigned location and in a non-statistical sense. If sampling repeatedly over the surface reveals errors of a similar sort, that is, if the errors are revealed to be systematic, then the surface represented by the DTM is said to be inaccurate or, more specifically, biased. On the other hand, if the errors are revealed to be random, then it would be more correct to refer to the DTM as relatively imprecise.

It is important that the DTM, in whatever form, should provide an accurate representation of the real terrain for the intended application. It is difficult, however, to test how accurately any digital model actually captures the real world, for normally we do not have any independent model of the real world against which to test our digital model [Carter, 1988]. The only 'true' situation in this context is the terrain surface itself, and since this condition of 'absolute' accuracy cannot be attained by measurement, the accuracy of any field survey data, photogrammetric measurement or completed map can be assessed only by making comparisons with measurements made to a known higher order of accuracy. Thus, in most cases, the accuracy being defined is strictly relative rather than absolute [Shearer, 1990]. Perhaps because of this, there are only a few developed procedures to account for errors in a DTM.
1.3 Statement of the Problem

At present, there exist few geographic information systems which routinely estimate and report measures of the error inherent in any of their products [Goodchild and Wang, 1988; Goodchild, 1992]. Along with many other automated processes, there seems to be a tendency to lose sight of the quality and accuracy of the original data and the effects of subsequent manipulation on those data [Theobald, 1989]. Only recently have the problems of error analysis attracted the attention they deserve. For example, the accuracy of spatial databases was chosen as the subject of the first research initiative of the National Center for Geographic Information and Analysis (NCGIA) in the United States [Goodchild, 1989]. It has been realized that the error pattern of a digital product is important, both for the information it yields itself and as input into more advanced manipulation and analysis. Inaccurate products can lead to false inferences, bad decisions, and even litigation [Goodchild, 1988].

DTMs are usually a fundamental component of the spatial database in most GISs and they are used in a very wide range of geographic analyses. Therefore, there is a need to develop a set of procedures to assess, as much as possible, the performance of DTMs in representing and characterizing the actual topographic surface so that the much broader question of error analysis in spatial data can be fully addressed.

Some previous research on DTMs did address the issue of accuracy, and the issue has been dealt with by many other authors [Bethel and Mikhail, 1983; Carter, 1989; Caruso, 1987; Felgueiras and Goodchild, 1995; Fisher, 1991; Frederiksen, 1981; Hannah, 1979; Leberl, 1973; Li, 1991; Shearer, 1990; Tempfli, 1980; Tempfli and Makarovic, 1979; Torlegard et al., 1986; Wehde, 1982].
Classically, the accuracy of a DTM is assessed by comparing height values derived from the DTM with height values of corresponding check points obtained by measurement of the terrain surface to a known higher order of accuracy. For example, a DTM derived from digitized contours could be checked by measurements made by field survey or photogrammetric measurement. The data obtained from such a comparison consist of height differences at the tested points, which may then be analyzed to yield statistical expressions of the accuracy, such as the root-mean-square error or the standard error [Shearer, 1990]. The accuracy measures have been expressed either as an RMSE of the vertical measurement of height or, following the specification of the topographic map accuracy standard for contour lines and spot elevations, by stating, for example, that "at least 90 percent of all elevations determined from the contours shall be accurate within one-half the contour interval, and the remaining 10 percent shall be accurate within one contour interval."

This kind of accuracy specification is useful for applications in automated mapping, surveying and civil engineering, in which the scales are usually large (1:1,000 to 1:50,000). But in geographic modelling applications, scales are relatively small and many other aspects of topography are of interest as well. So the issue of the 'level of detail' becomes important to the user. The map can be checked and found to be within tolerance (the so-called required accuracy), but a specific user still does not know whether or not it fits the needs of a particular analysis [Ostman, 1987]. Therefore, the concept of spatial resolution is important in error/accuracy analysis, because a model could be useless owing to inadequate detail for the particular purpose [Klinkenberg and Xiao, 1990], or it could be locally inaccurate but regionally accurate.
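The two classical expressions of accuracy mentioned above, an RMSE of the height differences and a contour-interval tolerance test, can be sketched as follows. This is a minimal illustration: the function names and the ten height differences are hypothetical, and the RMSE uses n as the divisor.

```python
import math


def rmse(differences):
    """Root-mean-square of the height differences (DTM height minus check height)."""
    return math.sqrt(sum(d * d for d in differences) / len(differences))


def meets_contour_standard(differences, contour_interval):
    """True if at least 90% of the differences are within half the contour interval
    and all of them are within one full contour interval."""
    n = len(differences)
    within_half = sum(1 for d in differences if abs(d) <= contour_interval / 2.0)
    within_full = sum(1 for d in differences if abs(d) <= contour_interval)
    return 10 * within_half >= 9 * n and within_full == n


# Hypothetical height differences (m) at ten check points.
diffs = [1.0, -2.0, 3.0, -4.0, 0.5, -1.5, 2.5, 6.0, -3.0, 0.0]
global_rmse = rmse(diffs)
passes_10m = meets_contour_standard(diffs, 10.0)
passes_4m = meets_contour_standard(diffs, 4.0)
```

Both results are single numbers or flags for the whole data set. Neither says where on the surface the larger differences occur.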
The construction of a DTM error model that goes beyond what we have now is, thus, very necessary. Accuracy of a DTM entails not only the average departure of points in the DTM from the real ground surface, but also the distribution and the non-random spatial component of errors, which have rarely been formally examined. It is difficult to extract much meaning from a single global accuracy measure such as RMSE because data uncertainty will almost always vary spatially across the elevation surface. That is, the nature of the error will determine the usefulness of the DTM for a particular purpose. Furthermore, the spatial variation of error is influenced not only by the characteristics of the topography being modeled but also by the algorithm that produces the elevation model [Wood and Fisher, 1993]. For example, the types of errors associated with interpolating photogrammetrically-derived spot elevations probably differ from those produced by interpolating contour data [Carter, 1989; Hannah, 1979; Wood and Fisher, 1993].

Whether or not a DTM is acceptable for a given application depends upon the objective of the research and the precision required, as well as the resolution of the sampling method and its sensitivity to the variability of the terrain [Theobald, 1989]. So, a selected spatial resolution should reflect a desired level of detail guided by the actual geographic detail of the phenomenon and the application. According to the sampling theorem, terrain variations with a wavelength less than twice the sampling interval are not represented (the frequency corresponding to this limiting wavelength is called the Nyquist frequency). Therefore, a feature can be detected only if the sampling interval is no larger than half the size of the smallest feature to be detected. Specification of the Nyquist frequency presupposes that the structure of the perturbation in question is known (i.e., usually, in theory, sinusoidal).
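The consequence of the sampling theorem stated above can be illustrated with a sinusoidal profile. In this sketch the 120 m wavelength, the 100 m sampling interval and the function names are hypothetical choices made for the illustration.

```python
import math


def sample_sine(wavelength, interval, count):
    """Heights of a unit-amplitude sinusoid sampled every `interval` metres."""
    return [math.sin(2.0 * math.pi * k * interval / wavelength) for k in range(count)]


def smallest_resolvable_wavelength(interval):
    """Shortest wavelength represented at a given sampling interval (twice the interval)."""
    return 2.0 * interval


# A 120 m undulation sampled every 100 m lies below the 200 m limit.
short = sample_sine(120.0, 100.0, 12)
# Its samples coincide with those of an inverted 600 m undulation (an alias).
alias = sample_sine(600.0, 100.0, 12)
indistinguishable = all(abs(s + a) < 1e-9 for s, a in zip(short, alias))
```

From the samples alone the 120 m feature cannot be told apart from a 600 m one, so it is not merely lost but misrepresented as a broader landform.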
In terrain, the features are apt to be irregular. In practice, then, more than the theoretical minimum information is required to characterize them. This implies that one must know the spatial size of the features in which one is interested before one starts to collect data. Additionally, one cannot expect a collection of digital geographic data to be suitable for all kinds of problem [Tobler, 1988]. The interaction between sampling frequency and terrain variation is, thus, important and needs to be understood, because the terrain features captured at a certain resolution depend upon the relative prominence of large- and small-scale topographic features.

In summary, it is evident that DTMs are becoming more and more important in various geographic modelling applications and digital topographic data are becoming more readily available. Meanwhile, there is a growing realization that error analysis is important in determining the usefulness of a DTM. However, current measures of DTM inaccuracy are inadequate. Therefore, there is a need to develop new methods of investigating the error in DTMs: methods that are spatially explicit and relate the spatial pattern of error to the characteristics of topography.

1.4 Thesis Outline

This introductory chapter has defined the problem and provided some background information on digital terrain modelling. The rest of the thesis is organized as follows:

Chapter 2 gives a literature review of previous research on DTM error modelling and accuracy estimation. Sources of error in DTMs are first identified and some gross-error detection and correction techniques are introduced. Then, common terminologies used in DTM error modelling are described and general empirical and analytical accuracy estimation approaches are reviewed.

Chapter 3 describes the collection of some digital topographic data and the implementation of a true multi-scale DEM database for a study area chosen for this research.
Some pre-processing such as data extraction, conversion and interpolation is done for the proper registration of the multiple data sets. A number of observations are made regarding DEM errors and their spatial distribution based on some preliminary test results of comparing DEMs of differing resolutions. This chapter concludes with the statement of the thesis hypothesis and prepares for further exploration of the relation between DEM errors and terrain characteristics.

Chapter 4 discusses the methodologies used for DEM error analysis. First, characterization of the variation and complexity of terrain is discussed. Both local and global characteristics are examined. Local characterization is made by means of general geomorphometric parameters such as local relief and slope. Global descriptions are made using grain measures, spectral analysis, nested analysis of variance, and fractal analysis of DEMs. Then, a multivariate statistical analysis method based on local roughness measures for automated hierarchical terrain classification is introduced.

Chapter 5 shows the results of the topographic characterization (both global and local) and classification of the study area surfaces and provides interpretations of the characterization results.

Chapter 6 presents a new approach to the problem of DEM error modelling. Based on the results from Chapter 3 (DEM errors) and Chapter 5 (topographic characterization), quantitative relations between the extent and the spatial pattern of DEM errors, spatial resolution, and the terrain characteristics are analyzed. The hypothesis proposed in Chapter 3 is tested in this chapter.

Chapter 7 concludes the thesis with a summary of the methods presented, the study results and recommendations for future research.

2. ERROR IN DIGITAL TERRAIN MODELS

2.1 Overview

Any particular point in a DTM may not actually be at the elevation recorded for it, and the sources of this error may be multitudinous [Fisher, 1991].
Evaluating the accuracy of a DTM requires consideration of how the DTM is created. A DTM created directly from an existing topographic map can be no more accurate than the source map from which it was derived [Carter, 1988]. Sources of inaccuracy include the original survey by field workers or photogrammetrists, the expertise of the cartographer who generated the map, and of the digitizer operator who converted the contours from analogue to digital form, or, if generated directly from aerial photographs, of the photogrammetrist. The error may be caused by faulty calibration of the measuring instrument, limited precision of the data format, human mistakes in the reading of the instrument and the copying of the figures, or poor interpolation. Therefore, in generating a DTM from a topographic map by digitizing, there are at least three stages at which error may be introduced: map compilation, DTM generation from the map, and comparison of DTM elevations with those directly measured from the map. The last of these and a combination of the first two also occur if the DTM is generated directly by photogrammetry [Fisher, 1991].

Error in DTMs is widely acknowledged, and has been the subject of some studies. The rest of this chapter summarizes the previous research in the areas of error detection and rectification in digital terrain models, and DTM accuracy estimation and error modelling.

2.2 Error Detection and Rectification in Digital Terrain Models

2.2.1 Errors in USGS gridded DEMs

As mentioned in section 1.2.3, the USGS is one of the two major DEM data producers for the United States. Because of the wide use and study of USGS DEM data sets, errors detected in these data sets are specifically discussed in this section.

The USGS DEM User's Guide specifies that the root-mean-square error (RMSE) should be included with all their DEM data [USGS, 1987].
Accuracy testing of DEMs by the USGS consists of comparing the known elevations of at least twenty control points (usually acquired from a map) to the elevations of these points as interpolated from the DEM. The RMSE is then calculated in the Z dimension. The RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \delta_{z_i}^{2}} \tag{2.1}$$

where $\delta_{z_i}$ is the elevation difference at each of the n individual test points.

In addition, it should be noted that most USGS source maps are commonly stated as conforming to the National Map Accuracy Standard, which states that "at no more than 10 percent of the elevations tested will a contour be in error by more than one half the contour interval," as established by comparison with survey data.

Based on the source of the DEM and, in part, on the RMSE calculation, the USGS identifies three levels of DEM quality [USGS, 1986].

• Level I DEMs are of the lowest quality and contain no points with elevations in error by more than 50 m. The maximum RMSE permitted for the whole elevation model is 15 m. DEMs derived from profiling high-altitude aerial photography, such as was done with the Gestalt Photo Mapper II (GPM-2) instruments, typically fall within this level.

• Level II DEMs have a maximum RMSE of 7 m (half a contour interval) and contain no points with elevations in error by more than twice the contour interval of the source map. DEMs acquired by contour digitizing typically fall within this level.

• Level III DEMs have a maximum RMSE of 7 m and contain no points with elevations in error by more than the contour interval of the source map. Digital Line Graph (DLG) DEMs, which incorporate hypsographic and hydrographic data, fall within this level.

Caruso [1987] discusses the standards employed by the USGS to evaluate the accuracy of 7.5-arc-minute quadrangle gridded DEMs. The USGS classifies errors in their DEMs into three categories. 'Blunders' are gross errors that sneak into DEMs at creation.
They should be easily detected and are, therefore, usually removed from the DEM in the editing stage prior to general release, so they rarely get into a published DEM. 'Systematic errors' are non-random errors associated with specific procedures that introduce biases and artifacts into the DEM. An example is 'striping,' an artificially high level of spatial autocorrelation in elevation values along one axis of the DEM. Although such systematic errors are often easily detected, they are not always correctable. 'Random errors,' the third error category, result from measurement error and, unlike systematic errors, reduce precision but do not introduce bias.

Carter [1989] also identifies a number of different basic types of error in USGS gridded DEMs but presents an alternative taxonomy. He defines 'relative' and 'global' error based on the extent of the error. The former refers to a situation in which a number of single elevation values are obviously inconsistent relative to their neighbors which, as a group, give an adequate representation of the surface. The latter, global errors, are thought of as those situations where the general form of the land surface is adequately defined by the digital data, but the total model departs significantly from the source map or the actual land surface. Global errors are particularly problematic when matching neighboring 7.5-arc-minute DEMs. The author identifies four types of relative error common in USGS DEMs and demonstrates how these are often difficult to detect and correct.

The United Kingdom's Ordnance Survey often supplies the RMSE of its digital contour data as well as the likely RMSE of interpolation. Stereo imagery from SPOT Image Corporation is now capable of supporting the generation of a DEM as a standard product on a 10 m grid [Gugan and Dowman, 1988]. Studies have shown the error in these products is less than 10 m RMSE in all three dimensions [Swann et al., 1988].
The implication of the error reporting used by the USGS, the United Kingdom's OS, and, in fact, probably all mapping organizations, is that the error at any point occurs independently of that at any other point. That is, aspatial statistical measures form the basis of their error reporting procedure. This assumption of independence obviously does not hold: 'striping' and Carter's 'relative errors', as identified above, are two examples where the error is apparently spatially related.

2.2.2 Gross-error detection and correction

Hannah [1979] presents a method for detecting and correcting gross errors in DEMs produced by computer correlation of stereo images, one of the typical methods of matching pairs of corresponding image points for a stereoscopic view of the terrain. The basic problem to be solved in automatic digital stereo mapping is the determination of the precise geometric positions of corresponding points (features) on the focal planes of the stereo pair. Once these points have been matched up, it is a straightforward process to photogrammetrically intersect the corresponding rays through the points using the appropriate sensor model to produce digital terrain data in terms of XYZ coordinates [Panton, 1978]. Algorithms have been developed by Hannah [1979] to detect and correct errors resulting from mismatched sub-areas of the two images, a problem which can occur for a variety of image- and terrain-related reasons, including sensor noise, low contrast in portions of the images, relief-induced distortions between the images, and the presence of ambiguities as a result of identical objects or highly periodic textures on the terrain. Based on the assumption that any data points causing sharp discontinuities in the elevations or sudden changes in the surface slopes can be suspected of being errors, the algorithms focus on the use of constraints on both the allowable slope and the allowable change in slope in local areas around each point.
Relaxation-like techniques are employed in the iteration of error detection and correction. This method succeeds in identifying and correcting gross errors. However, problems may be encountered along steep ridge lines and an extreme level of surface smoothing sometimes occurs.

Norvelle [1992] presents some techniques used in the U.S. Army Engineer Topographic Laboratories for enhancing the accuracy of DEMs extracted automatically from stereo images using digital correlation methods. Window shaping operations are performed within the image correlation process to reduce the size and shape differences between small corresponding areas on stereo images and contribute to a more accurate determination of stereo image correspondence and, consequently, DEM values. A new DEM correction technique, the Iterative Orthophoto Refinements (IOR) method, is also introduced. It can be used to edit and correct DEM values based on the geometric relation between pairs of orthophotos and the DEM used to produce them.

Bethel and Mikhail [1983] discuss a method for detecting gross errors, or blunders, in DTMs based on mathematical modelling of the terrain surface by tensor-product B-splines. The authors view this procedure as the first stage of an on-line quality assessment system for DTMs. The procedure is based on fitting the tensor-product of two one-dimensional B-splines locally over the DTM. Residuals are then computed and a statistical test is performed to yield an overall assessment of the presence of outliers (i.e. gross errors) in the DTM. Specific outliers are then identified in a candidate subgroup of the residuals and flagged as gross errors. Tests performed on actual DTMs and on a set of synthetic surfaces, generated using known mathematical functions perturbed with pseudo-random numbers of known variance, reveal that this approach is especially effective in the case of multiple blunders of relatively large magnitude.
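The general idea shared by these methods, comparing each elevation with a value predicted from its surroundings and flagging unusually large residuals, can be sketched as follows. This is not the procedure of Hannah [1979] or of Bethel and Mikhail [1983]: the local surface model is replaced here by the median of the neighbouring cells, the statistical test by a robust threshold on the residuals, and the function name, the threshold of 3 and the test grid are all assumptions made for the illustration.

```python
import statistics


def flag_blunders(grid, threshold=3.0):
    """Return (row, col) cells whose residual from the neighbour median is an outlier."""
    rows, cols = len(grid), len(grid[0])
    residuals = {}
    for r in range(rows):
        for c in range(cols):
            # Up to eight surrounding cells, fewer along the grid edges.
            neighbours = [grid[i][j]
                          for i in range(max(0, r - 1), min(rows, r + 2))
                          for j in range(max(0, c - 1), min(cols, c + 2))
                          if (i, j) != (r, c)]
            residuals[(r, c)] = grid[r][c] - statistics.median(neighbours)
    values = list(residuals.values())
    centre = statistics.median(values)
    # Median absolute deviation, scaled to be comparable with a standard deviation.
    mad = statistics.median([abs(v - centre) for v in values])
    scale = 1.4826 * mad if mad > 0 else 1.0
    return sorted(cell for cell, v in residuals.items()
                  if abs(v - centre) / scale > threshold)


# Hypothetical 5 x 5 sloping surface (m) with one 50 m blunder at the centre.
surface = [[100.0 + 2.0 * r + c for c in range(5)] for r in range(5)]
clean_flags = flag_blunders(surface)
surface[2][2] += 50.0
blunder_flags = flag_blunders(surface)
```

A single isolated spike is found easily in this way. As noted for Hannah's method, genuine sharp features such as ridge lines can produce large residuals too, so a fixed threshold risks flagging real terrain.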
2.3 Accuracy Estimation and Error Modelling

2.3.1 Global accuracy measure

2.3.1.1 Terminology

As mentioned before (section 1.3), most DTM quality evaluation techniques involve estimating a global accuracy with regard to a reference. In empirical tests on the accuracy of DTMs, a set of check points is used as the 'ground truth' and the points derived from the constructed DTM surface are checked against the corresponding check points. The differences between the two heights at each test point are then obtained. These differences are used to compute statistical values, such as the mean difference and the standard deviation of the differences, which are used as measures of DTM accuracy [Li, 1991]. Shearer [1990] provides the following terminology, with the height differences represented by $v$ (the residuals at n individual test points being $v_1, \ldots, v_n$):

• Algebraic mean

The mean is computed as follows:

$$\bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i \tag{2.2}$$

This expression takes into account the sign of the residuals, and will tend to zero if there are similar magnitudes of positive and negative values. If the differences are truly random, a result of $\bar{v} = 0$ would give no indication of what may be quite large differences in height at individual points. On the other hand, if a significant positive or negative value resulted from this computation, this would indicate that there was a systematic component in the residual values, which means that one surface was systematically higher or lower than the other (i.e., biased).

The three possible situations which may result from the comparison of height values in a DTM with 'true' heights on the terrain surface are illustrated in Figure 2.1. The diagrams show profiles only. Situation A indicates that one surface is uniformly higher than the other. Situation B illustrates quite random values of $v$ in terms of both magnitude and sign. In the case of situation C, the magnitude of $v$ varies quite considerably but the majority of the differences are in one direction, which reflects a combination of random and systematic effects.

Figure 2.1 Comparison of DTM surfaces with the 'true' terrain surface.

• Mean absolute error

The mean absolute error is computed in the same way as the algebraic mean, but the sign of the residuals is ignored. That is:

$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n} |v_i| \tag{2.3}$$

Because the sign is not considered, the result can never be negative and will reflect the average magnitude of the residuals. Shearer [1990] notes that "50% of the residual values will lie in the range -ME to +ME."

• Root-mean-square error

The problem of having positive and negative values cancel each other out is also obviated through the use of the root-mean-square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} v_i^{2}}{n-1}} \tag{2.4}$$

Properly, (n-1) rather than n is used in this case because dividing by (n-1) rather than n yields an unbiased estimate of the population parameters. However, some authors use n because, if the number of points tested is large, the difference in the result will be of no significance. Assuming $\bar{v} = 0$ and that there is a normal distribution, 68.27% (approximately two-thirds) of the residual values will fall in the range -RMSE to +RMSE. The term standard error is frequently employed in mapping to describe this expression of accuracy.

• Standard deviation

The standard deviation of the residuals is also a commonly used statistical expression, and is computed as follows:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (v_i - \bar{v})^{2}}{n-1}} \tag{2.5}$$

Again, when the number of tested points is large, the above expression can be, and often is, rewritten in the form

$$S = \sqrt{\frac{\sum_{i=1}^{n} (v_i - \bar{v})^{2}}{n}} \tag{2.6}$$

or

$$S = \sqrt{\mathrm{RMSE}^{2} - \bar{v}^{2}} \tag{2.7}$$

It is clear from the above that if $\bar{v} = 0$, then S = RMSE, and the root-mean-square error can replace the standard deviation.
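The four statistics above can be computed directly from a list of residuals. The sketch below follows equations (2.2) to (2.5), using the (n-1) divisor for the RMSE and the standard deviation; the function name and the two residual sets, loosely modelled on situations A and B, are hypothetical.

```python
import math


def accuracy_statistics(residuals):
    """Algebraic mean, mean absolute error, RMSE and standard deviation of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    mean_abs = sum(abs(v) for v in residuals) / n
    rmse = math.sqrt(sum(v * v for v in residuals) / (n - 1))
    std = math.sqrt(sum((v - mean) ** 2 for v in residuals) / (n - 1))
    return {"mean": mean, "mean_abs": mean_abs, "rmse": rmse, "std": std}


# Situation A: one surface uniformly 2 m higher than the other (pure bias).
uniform = accuracy_statistics([2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
# Situation B: differences that balance in sign (no bias).
balanced = accuracy_statistics([-3.0, 3.0, -1.0, 1.0, -2.0, 2.0])
```

For the uniform offset the standard deviation is zero while the RMSE is not, so the RMSE alone cannot separate bias from scatter. For the balanced residuals the mean is zero and the RMSE and standard deviation coincide.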
This explains the confusion which may arise from the fact that both the root-mean-square error and the standard deviation are often referred to as the standard error in the context of mapping accuracy (see below).

2.3.1.2 Topographic map accuracy standards

As DTMs are often derived from topographic maps, some topographic map accuracy standards need to be reviewed. There are two principal methods of representing height information in conventional mapping, both of which may be used as input data for DTMs: spot heights and contours. In both cases, the accuracy has to be considered in terms of both horizontal position and vertical height value. For example, a spot height may be incorrectly positioned but correct in height, correct in position but incorrect in height, or, most likely, incorrect in both position and height [Shearer, 1990].

Spot heights are normally assessed with respect to accuracy in terms of RMSE values related to horizontal position and height. Contours are linear features, and their accuracy is not quite so simple to define as that of point-related data. The common practice is to establish a tolerance (RMSE) value for permissible errors in contours, and then to check that points on the map fall within such a tolerance. The accuracy specified is determined with reference to the interrelated factors of scale, terrain slope and vertical interval, and is based on the RMSE values established for points. The normally accepted standard for contours is about three times that which can be attained for measured points. It is normally expressed in terms of probability related to the vertical interval. For example, "90% of tested points should be within one-half of the vertical interval of their true value."

In the United States, the definitive accuracy standard for topographic maps is the National Map Accuracy Standard (NMAS), which is currently applied to the USGS topographic map series.
This standard is based on compliance with a horizontal and a vertical accuracy standard which define the limits of acceptable error in the horizontal and vertical map dimensions. Compliance testing is based on a comparison of at least twenty well-defined map points relative to a survey of higher accuracy. The term "well-defined points" pertains to features that can be sharply identified as discrete points. The horizontal accuracy standard states that, at most, ten percent of the map points tested may have a horizontal error greater than 1/30 inch for map scales larger than 1:20,000, or 1/50 inch for scales of 1:20,000 or smaller. The vertical accuracy standard states that at most ten percent of the map points tested may have a vertical error greater than one-half of the contour interval of the map. In checking elevations taken from the map, the apparent vertical error may be decreased by assuming a horizontal displacement within the permissible horizontal error for a map of that scale [Thompson and Davey, 1953].

The American Society of Civil Engineers [1983] has proposed the Engineering Map Accuracy Standard (EMAS) as an alternative to NMAS for large-scale maps. EMAS gives a statistical expression of map accuracy based on errors in the XYZ coordinates of at least twenty well-defined and well-distributed sample points. The mean and the standard deviation (referred to as the mean error and the standard error respectively) are calculated for each of the x, y and z dimensions. Compliance testing for EMAS is performed by comparing the computed mean errors and standard errors in the x, y and z dimensions to their respective maximum acceptable limits (referred to as the 'limiting' errors). A t-test and a χ²-test are performed for the degree of bias and the degree of imprecision, respectively, in each dimension. Specific values for the limiting mean error and the limiting standard error are not provided as they are assumed to be application-specific.
EMAS is intended to facilitate accuracy testing for a variety of special-purpose maps. However, the limiting horizontal and vertical standard errors can be computed from the horizontal and vertical map accuracy standards of NMAS. The limiting vertical standard error is defined by

$$s_z = 0.608\,\mathrm{VMAS} \tag{2.8}$$

where VMAS = the vertical map accuracy standard in Z (i.e. one-half the contour interval). The limiting horizontal standard error is defined by

$$s_x = s_y = 0.466\,\mathrm{CMAS} \tag{2.9}$$

where CMAS = the circular (or horizontal) map accuracy standard in X, Y expressed at full (ground) scale (i.e. the appropriate fraction, 1/30 inch or 1/50 inch, weighted by the denominator of the scale representative fraction).

The constants in equations (2.8) and (2.9) are derived from the 90th percentile of a univariate and a bivariate normal distribution respectively [Rosenfield, 1971], the two major types of distribution used in mapping. Table 2.1 lists the probability of errors for different multiples of the standard deviation in the univariate case. Tables 2.2a-b give the probability of errors in the bivariate case using two different methods: the mean square error of position (MSEP) and the circular standard error (CSE). The information in these tables is derived from cumulative standard normal distribution table values, which assume uncorrelated error and are available in most statistics books. From Table 2.1, for example, it can be seen that a 1.645 multiple of the standard deviation gives a ninety percent confidence level for the univariate case, whereas from Table 2.2b it can be seen that, for the circular bivariate case, a 2.146 multiple of the standard deviation gives a ninety percent confidence level.

These relations are used to define three classes of map accuracy by the American Society of Photogrammetry [1985], whose proposed accuracy standard for large-scale maps was similar to EMAS until changed by USGS.
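The constants in equations (2.8) and (2.9) can be checked numerically as the reciprocals of the 90% multiples quoted above. The sketch assumes, as the text does, normally distributed and uncorrelated errors, with equal standard deviations in x and y for the circular case (so that the radial error follows a Rayleigh distribution); the function and variable names are hypothetical.

```python
import math
from statistics import NormalDist


def univariate_multiple(probability):
    """Multiple k of sigma such that P(|error| <= k * sigma) equals `probability`."""
    return NormalDist().inv_cdf(0.5 + probability / 2.0)


def circular_multiple(probability):
    """Multiple k of sigma such that P(radial error <= k * sigma) equals `probability`,
    for equal, uncorrelated normal errors in x and y."""
    return math.sqrt(-2.0 * math.log(1.0 - probability))


k_linear = univariate_multiple(0.90)      # about 1.645
k_circular = circular_multiple(0.90)      # about 2.146
vertical_constant = 1.0 / k_linear        # about 0.608, the factor in equation (2.8)
horizontal_constant = 1.0 / k_circular    # about 0.466, the factor in equation (2.9)
```

In other words, if 90% of errors must fall within VMAS and the 90% limit is 1.645 standard deviations, the largest admissible standard deviation is VMAS / 1.645, and similarly CMAS / 2.146 in the circular case.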
Merchant [1987] gives a more detailed discussion of the American Society of Photogrammetry standard. The limiting vertical standard error for Class 1 maps is defined in accordance with the N M A S vertical accuracy standard (i.e. one-half the contour interval). The limiting horizontal standard error is more stringent than the N M A S horizontal accuracy standard, and corresponds approximately to a C M A S of 0.54 mm (equivalent to about 1/47 inch) expressed at full (ground) scale of the map. The limiting standard errors for lower-accuracy Class 2 and 3 maps are defined by multiplying the Class 1 limiting standard error by a factor equal to the accuracy class. That is: • A Class 2 map is one with a limiting R M S E twice that of a Class 1 map. • A Class 3 map is one with a limiting R M S E three times that of a Class 1 map. 41 Table 2.1 Probability of errors in the univariate case Multiple of standard deviation % of points falling within this multiple (probability) Common term 1.000 68.27 Standard error 1.645 90.00 Linear map accuracy standard 2.570 99.00 3.000 99.73 'Maximum' (Near certainty) error 42 Table 2.2a Probability of errors in the bivariate case (MSEP method) Multiple of standard deviation % of points falling within this multiple (probability) Common term 1.000 63.21 Mean square error of position 1.520 90.00 Mean square error of position 2.140 99.00 (Rejection level) 2.470 99.78 Mean square error of position Table 2.2b Probability of errors in the bivariate case (CSE method) Multiple of standard deviation % of points falling within this multiple (probability) 1.000 39.35 Circular standard error 2.146 90.00 Circular map accuracy standard 3.035 99.00 (Rejection level) 3.500 99.78 Circular near certainty error 43 Common term Merchant [1987] presents a revised version of the American Society of Photogrammetry standard, which is referred to as the American Society of Photogrammetry and Remote Sensing (ASPRS) spatial accuracy specification for large scale 
topographic maps. The revised standard expresses accuracy in terms of a limiting RMSE in each dimension. Hence the systematic bias associated with the mean error is not removed in computing map accuracy. Hypothesis testing for precision is performed by comparing the calculated RMSE to the limiting horizontal or vertical RMSE. The limiting horizontal RMSE is computed in the same fashion as the limiting standard errors in the American Society of Photogrammetry standard. However, the limiting vertical RMSE is somewhat more stringent, and is defined as one-third of the contour interval for well-defined points.

Another alternative to NMAS for specifying the accuracy of height representation on maps is provided by Koppe's formula, which accounts for the effects of terrain slope on mean vertical or horizontal error [Shearer, 1990]:

Mean vertical error

$$M_v = \pm(A + B \tan\alpha) \qquad (2.11)$$

Mean horizontal error

$$M_h = \pm(B + A \cot\alpha) \qquad (2.12)$$

where $\alpha$ is the slope angle. Coefficients A and B are empirically derived constants for a particular map in relation to the scale and accuracy requirements, where A represents the vertical error when the terrain slope is zero and B is related to the horizontal error at a given point. Equation (2.11) shows that a given degree of horizontal error will produce a greater degree of vertical error as the terrain slope rises. Thus any increase in the terrain slope is associated with an increase in vertical error. Regression analysis can be used to estimate A and B from the observed vertical error and terrain slope associated with a sample of points, although this may not always produce a satisfactory answer. Once the coefficients in Koppe's formula have been estimated for a particular map, the horizontal errors in points or contour lines can be determined. Koppe's formula offers a number of advantages over NMAS as a statement of map accuracy.
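The regression step mentioned above might be sketched as follows. This is a minimal illustration, assuming that the magnitude of the vertical error at each sample point is regressed on the tangent of the slope angle by ordinary least squares; the function names and the sample values are hypothetical.

```python
import math


def fit_koppe(slopes_deg, vertical_errors):
    """Least-squares estimate of A and B in Mv = A + B * tan(slope).

    slopes_deg      : terrain slope at each sample point, in degrees
    vertical_errors : observed vertical errors (magnitudes are used)
    """
    x = [math.tan(math.radians(s)) for s in slopes_deg]
    y = [abs(e) for e in vertical_errors]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def koppe_vertical(a, b, slope_deg):
    """Mean vertical error predicted by equation (2.11) for a given slope."""
    return a + b * math.tan(math.radians(slope_deg))


# Hypothetical sample: errors generated from A = 1.5 m and B = 3.0 m.
slopes = [0.0, 5.0, 10.0, 20.0, 30.0]
errors = [1.5 + 3.0 * math.tan(math.radians(s)) for s in slopes]
a_hat, b_hat = fit_koppe(slopes, errors)
print(round(a_hat, 3), round(b_hat, 3), round(koppe_vertical(a_hat, b_hat, 15.0), 3))
```

With real check-point data the residual scatter around the fitted line indicates how well a single pair of coefficients describes the map.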
In contrast to Koppe's formula, NMAS is simply a statement of compliance with an accuracy test rather than a statistical expression of accuracy, and it ignores the effects of terrain slope on vertical accuracy.

Standards for Canadian topographic maps do not appear to be as well defined. The Canadian Surveys and Mapping Branch designs its maps so that "on Class A maps the contours are accurate to one-half a contour interval." If it is assumed that this represents a 95% confidence level, the allowable root-mean-square height error can be estimated as 0.255 times the contour interval (i.e. 0.5/1.96). Points on Canadian Class A maps are to appear within 0.5 mm of their true positions at map scale, which represents 25 m on the ground for 1:50,000 scale maps [Mark, 1974].

2.3.1.3 Effects of check points on the reliability of accuracy estimates

In the case of the above-mentioned empirical standards on DTM accuracy, it is clear that the final DTM accuracy figures, such as the mean error and the standard error estimated from the test results, are affected by the characteristics of the set of check points used as the ground truth. Li [1991] has investigated the effects of the check points used in the empirical tests on the reliability of the DTM accuracy estimates. The concept of reliability in this context might be defined as the degree of correctness to which the DTM accuracy figures have been estimated. In any case, it is apparent that the DTM accuracy results obtained from empirical tests are not absolutely certain and one can accept these results only to a certain confidence level.

A set of check points can be characterized by three main parameters: (i) the sample size (i.e. the number of test points); (ii) the measurement accuracy of the sample points; and (iii) the distribution of the test points.
It seems obvious that the inclusion of more check points in the test will lead to a more reliable result, although more points may lead to more chance of correlated error. However, a large number of check points can be costly to produce and, in some cases, even impossible to provide in the context of DTM accuracy testing. Therefore, an important question which arises is whether a large number of check points is necessary. If not, then what is the minimum number of check points required for a given degree of reliability for the accuracy estimates? From statistical theory, the sample size required for the accuracy estimates depends upon the variation associated with the random variable. The smaller the variation, the smaller the sample size that is needed to achieve the degree of accuracy required for the accuracy estimates. The required minimum sample size also depends on the degree of accuracy requirement itself and the spatial autocorrelation of the data. Some equations are given by Li to provide a general guide to the values required in practice.

The reliability of the estimated DTM accuracy figures is also affected by the accuracy of check points, which is usually specified in terms of RMSE or standard deviation. Only if the sample size is increased and the accuracy of the check points is improved at the same time can the reliability of the final estimates be improved.

Another important concern with the check points used for the DTM accuracy test is their distribution, that is, their locations and patterns. In some tests, the check points are in a grid pattern. The question is raised as to whether such a pattern is suitable. Ley [1986] states that "an accuracy assessment of a DTM should be based on a sample of heights taken from the entire model." He also points out that such "a sample of points should include both the recorded (measured) and interpolated heights." Li suggests that the check points be sampled randomly from the entire testing area.
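The dependence of the minimum sample size on the variation of the errors and on the accuracy requirement can be illustrated with a textbook relation. The sketch below is not Li's set of equations; it assumes independent, normally distributed errors with a known standard deviation, ignores spatial autocorrelation, and uses hypothetical names throughout.

```python
import math
from statistics import NormalDist


def min_check_points(sigma, tolerance, confidence=0.95):
    """Smallest n for which the mean error is estimated to within
    +/- tolerance at the given confidence level, assuming independent
    normal errors with standard deviation sigma (same units as tolerance)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sigma / tolerance) ** 2)


# Hypothetical case: errors with a 5 m standard deviation, mean error
# wanted to within 1 m at 95 % confidence.
print(min_check_points(5.0, 1.0))
```

Halving the standard deviation of the errors reduces the required sample size roughly fourfold, which is the behaviour described in the text.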
It is worth noting that although geodetic and other higher order control points would be the most accurate vertical check points, they are very sparsely distributed and, therefore, would not satisfy the minimum sample size requirement. The development of GPS technology may improve this situation, but not without its limitations.

2.3.1.4 Interpolation accuracy

The goodness of fit of a DTM to reality depends on the terrain itself, on the sampling pattern and density, and on the method of interpolating a new point from the measurements [Leberl, 1973] (see Figure 2.2).

[Figure 2.2 Factors influencing the performance of a DTM: sampling density, measuring pattern, method of interpolation and type of terrain all act on the accuracy of terrain representation.]

Interpolation accuracy depends on the nature of the interpolation algorithm, the complexity of the underlying surface, and the distance between control points relative to the frequency of spatial variation. Many of the earliest studies on interpolation accuracy focus on the effects of the number and spatial distribution of sample points [Veregin, 1989]. According to Morrison [1969], the ability of a given set of sample points to capture the variation present in the surface is inversely related to the degree to which the points are clustered in space. In his opinion, systematic sampling is preferable to stratified random sampling, and stratified random sampling is preferable to random sampling. These conclusions are based on the notion that a dispersed sample is more capable of capturing surface variation, provided that the sampling interval does not coincide with a periodicity of the terrain.

Random sampling means that each element in the population has an equal chance of being selected. In a spatial context, the simple random sample is derived by establishing a coordinate system with some specified resolution, and taking two random numbers to establish the location of each point.
The spatial systematic random sample, on the other hand, establishes an initial location and a fixed interval, and points are selected at that fixed interval in both directions across the area to be sampled. The systematic sample overcomes one of the perceived problems of the simple random sample, namely that it may not give good coverage of the spatial sampling space. On the other hand, if there is some regular periodicity in the population, systematic sampling is likely to reproduce only a part of that periodicity or, alternatively, to bias the sample by overrepresenting the periodicity [Clark and Hosking, 1986]. In a stratified random sampling design, the elements of the population are allocated into subareas or strata before the sample is taken, and then each stratum is randomly sampled.

Rhind [1971] argues that those conclusions by Morrison are unwarranted, since there is no guarantee that a more dispersed sample will yield a more accurate interpolated surface. Surface-specific sampling, in which sample points are selected if they define critical points in the surface, may exhibit a high degree of spatial clustering but produce more accurate results than systematic sampling. Accuracy may also be affected in systematic sampling if periodicity is evident in the surface, since the systematic random sampling method rests on the assumption that there is no periodic trend in the ordering of the individual elements [Clark and Hosking, 1986]. Thus, the appropriateness of a given sampling method depends, in part, on the nature of the surface.

The effects of the nature of the surface on interpolation accuracy have been explored by Morrison [1968; 1971]. In an empirical test, four different synthetic surfaces were constructed, each described by a finite set of mathematical terms.
Each of the four surfaces was sampled using the six sampling methods described in Morrison [1969], that is, unaligned/aligned random sampling, unaligned/aligned stratified random sampling, and unaligned/aligned systematic sampling. For each surface/sampling method combination, four samples of different sizes (25, 49, 100, and 144) were selected, yielding a total of 96 sets of sample points. For each of these sets, interpolation was performed with ten different interpolation methods. Interpolation accuracy for each of the resulting 960 interpolated surfaces was assessed by comparing observed and interpolated values for a set of 100 grid points. Two indices of accuracy were defined: the correlation between the observed and interpolated values, and the standard deviation of the residuals.

For each of the two accuracy indices, three-way analysis of variance was employed to test the effects of interpolation method, sample size and sampling method on interpolation accuracy. For the second index of accuracy, the sampling and interpolation methods were observed to have the greatest effect on accuracy. Sample size and the first-order interactions between the three factors were observed to be only marginally significant. Similar results were obtained for the alternate index of accuracy (i.e., the correlation coefficient), except that sample size was observed to have a significant effect. Comparison of interpolation accuracy for different sampling methods revealed that higher levels of accuracy were associated with unaligned methods. Variations in accuracy associated with sampling method were also observed to have less significant impact as surface complexity increased, due to an overall decline in accuracy for more complex surfaces. An increase in accuracy was also observed as sample size increased, although this effect, again, was found to be less significant for complex surfaces.
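The three basic sampling designs discussed above can be sketched in a few lines. This is a simplified illustration over a rectangular area, covering only simple random, aligned systematic and unaligned stratified random sampling; the function names, the area dimensions and the random seed are assumptions, and the code is not a reconstruction of Morrison's six methods.

```python
import random


def simple_random(n, width, height, rng):
    """n points, each located by two independent random coordinates."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def aligned_systematic(nx, ny, width, height, rng):
    """Random initial location, then a fixed interval in both directions."""
    dx, dy = width / nx, height / ny
    x0, y0 = rng.uniform(0, dx), rng.uniform(0, dy)
    return [(x0 + i * dx, y0 + j * dy) for j in range(ny) for i in range(nx)]


def stratified_random(nx, ny, width, height, rng):
    """One random point inside each of the nx * ny rectangular strata."""
    dx, dy = width / nx, height / ny
    return [(rng.uniform(i * dx, (i + 1) * dx),
             rng.uniform(j * dy, (j + 1) * dy))
            for j in range(ny) for i in range(nx)]


rng = random.Random(0)               # fixed seed so the sketch is repeatable
samples = {
    "random": simple_random(25, 100.0, 100.0, rng),
    "systematic": aligned_systematic(5, 5, 100.0, 100.0, rng),
    "stratified": stratified_random(5, 5, 100.0, 100.0, rng),
}
for name, points in samples.items():
    print(name, len(points))
```

Plotting the three point sets side by side shows the clustering of the simple random sample and the even coverage of the other two designs.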
It should be noted that some of Morrison's findings are partly attributable to the use of synthetic surfaces exhibiting an unrealistically high degree of smoothness (i.e., without any breaks). This strong condition on the surface smoothness limits the usefulness of the findings. In the real world, irregular functions with breaks such as those encountered in topography are, in fact, more frequent than smooth surfaces.

The factors considered by Morrison in his study have since been examined in greater detail by other authors. Shepard [1984], for example, examined the effects of variations in sample size on interpolation accuracy. The author interpolated the grid point values for a 60 x 64 grid using a sample of between 4 and 272 points randomly selected from the grid. The relative RMSE was computed for all 3840 grid points on each interpolated surface. The relative RMSE is defined as the ratio of the RMSE to the standard deviation of the observed grid point values. Regression results indicated a close fit ($r^2 = 0.997$) between the relative RMSE and sample size according to a relationship of the form:

$$\mathrm{RMSE}_r = 2.83\, n^{-0.56} \qquad (2.13)$$

where $\mathrm{RMSE}_r$ = the relative RMSE; and $n$ = the sample size. Hence as $n$ increases, $\mathrm{RMSE}_r$ decreases at a declining rate.

In a similar study, MacEachren and Davidson [1987] examined the effects of sample size on interpolation accuracy for surfaces of varying complexity. Six surfaces were defined, each of which portrayed topographic elevation values for a 103 x 103 grid. Eight samples of points were obtained for each surface using unaligned stratified random sampling in which sample size varied between 100 and 2025 points. Accuracy was calculated as the mean absolute deviation between actual and interpolated values for all 10,609 grid points on the surface. The mean absolute deviation, d, is defined in the same way as mean error in Equation 2.3.
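Reading Equation 2.13 as a power law with coefficient 2.83 and exponent -0.56, the declining rate of improvement with sample size can be tabulated, and the same form can be fitted to other data by linear regression in log-log space. The following is a minimal sketch; the function names and the chosen sample sizes are illustrative assumptions rather than part of the cited studies.

```python
import math


def relative_rmse(n, a=2.83, b=0.56):
    """Relative RMSE predicted by a power law of the form a * n**(-b)."""
    return a * n ** (-b)


def fit_power_law(sizes, errors):
    """Fit error = a * n**(-b) by least squares on log(error) vs log(n)."""
    lx = [math.log(n) for n in sizes]
    ly = [math.log(e) for e in errors]
    k = len(lx)
    mx, my = sum(lx) / k, sum(ly) / k
    slope = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
             / sum((x - mx) ** 2 for x in lx))
    return math.exp(my - slope * mx), -slope


sizes = [4, 16, 64, 272]
for n in sizes:
    print(n, round(relative_rmse(n), 3))

# Recover the coefficients from values generated by the same power law.
a_hat, b_hat = fit_power_law(sizes, [relative_rmse(n) for n in sizes])
print(round(a_hat, 2), round(b_hat, 2))
```

Because the exponent is smaller than one in magnitude, each doubling of the sample size reduces the relative RMSE by the same factor, so the absolute gain shrinks as the sample grows.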
Regression results revealed a close fit between d and n according to an equation of the form

$$d = a\, n^{-b} \qquad (2.14)$$

Coefficient b was observed to be relatively constant over all surfaces, with a mean value of approximately 0.3. Coefficient a was found to be directly related to the range of elevation values on the surface, such that the value of a was lower for surfaces with smaller elevation range. The authors examined the spatial distribution of error and observed that errors tended to be more clustered in space for smaller sample sizes.

The study of the effects of various factors on grid DTM interpolation accuracy by Leberl [1973] was based on a numerical test. Leberl compared six terrain models created using different interpolation procedures and with grid spacing varying from 10 m up to 450 m. For grid DTMs, comparison of the different interpolation algorithms led to the conclusion that "linear prediction" (or "least squares interpolation"), "moving averages" and "patchwise polynomial interpolation" provided the highest accuracy. Consideration of computational complexity, however, indicated that the method with "moving averages" was comparatively expensive, so that the other two remained as the most effective interpolation methods. In general, however, the difference between interpolation methods was fairly small. In a comparison using the 2 x 2, 4 x 4, or 6 x 6 surrounding reference points for interpolation of a new point, it was concluded that no gain might be expected by using more than the 4 x 4 reference points. On the other hand, use of 4 x 4 points tended to be slightly superior to the use of only the four closest reference points. It was also shown that a linear relation existed between accuracy of interpolation and sampling density. The slope of the linear regression equation was correlated with the terrain type.
Obviously, a problem one immediately faces when "different types" of terrain must be evaluated is how to describe a terrain type concisely and quantitatively. In Leberl's study, a simple indicator for the terrain type, namely the "normalized standard deviation of terrain relief," was used.

The effects of different interpolation methods have been explored by other authors as well. Hundreds of interpolation algorithms exist and each, of course, offers a slightly different result [Kvamme, 1989]. Leberl [1973], for example, evaluated the effects of varying the weighting function in interpolation methods based on distance-weighted averaging. Accuracy was assessed for each interpolated surface as the mean absolute error. Variations in weight (defining the influence of a sample point on a grid point as a function of the distance between them) for surface-specific sampling were observed to produce small but systematic changes in the accuracy of the interpolated surface. The aligned systematic sample of points yielded a mean absolute error less than or approximately equal to that of the surface-specific sample.

Braile [1978] examined the effects of four different interpolation methods on interpolation accuracy. The interpolation methods were applied to a sample of 200 random points derived from a digitized map. The RMSE was computed for each interpolated surface as an index of accuracy. The RMSE was observed to vary significantly for the four interpolated surfaces, with the nth-order polynomial interpolation method yielding the lowest error, and one of the distance-weighted averaging methods yielding the highest error.

Tempfli and Makarovic [1979] made a general and uniform evaluation of the performance of three classes of interpolation methods: piecewise polynomials, moving averages, and linear-least-squares algorithms. Transfer functions, which represent a generalized sampling theorem, were determined numerically for different interpolation methods and their variants.
Transfer functions establish a relation between sampling density and transfer ratio. The latter is a measure of fidelity of the reconstructed data which is determined by the amount of information transferred from the input data to the data reconstructed from sampled point data. Transfer functions were shown to be useful in identifying the appropriate weight function and limiting distance, and in quantifying the parameters used in the interpolation algorithms. It was concluded that fidelity is strongly affected by the sampling density. The complexity of the interpolation procedure may also have a significant impact if applied with careful consideration. However, the chance for incorrect application is greater for more elaborate interpolation methods than for simple ones.

Of the two variants of piecewise polynomial methods tested by Tempfli and Makarovic [1979], the third degree polynomial method performed slightly better than the simple linear interpolation within the entire range of sampling density considered. The moving average methods were represented by several variants of the polynomials of the zero and the second degree. The two basic versions provided nearly equal results, though the second degree polynomials seemed to be slightly superior. The best variant of the second degree polynomials performed as well as the third degree piecewise polynomial. Similarly, the best variant of the weighted mean (0-degree polynomial) performed as well as the simple linear interpolation. Fidelity was found to be far more sensitive to changes in the parameter values of a weight function than to the different types of weight functions.

Kvamme [1989] investigated three elevation interpolation algorithms that produced alternative and slightly different DEMs from digitized contour lines. The three interpolation methods examined were the 'steepest ascent' algorithm, the 'weighted average' algorithm, and the 'vertical scan' technique.
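The role of the weight function parameter in a distance-weighted average can be made concrete with a generic inverse-distance-weighted interpolator. The sketch below is not any of the specific variants tested in the studies cited above; the weight function (inverse distance raised to a power), the function name and the sample values are assumptions for illustration.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples : list of (xs, ys, zs) sample points
    power   : exponent of the weight function w = 1 / d**power
    """
    numerator = 0.0
    denominator = 0.0
    for xs, ys, zs in samples:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return zs                 # exact at a sample location
        w = 1.0 / d ** power
        numerator += w * zs
        denominator += w
    return numerator / denominator


# Hypothetical sample points: (x, y, elevation in metres).
points = [(0.0, 0.0, 100.0), (100.0, 0.0, 140.0), (0.0, 100.0, 120.0)]
for p in (1.0, 2.0, 4.0):
    print(p, round(idw(20.0, 20.0, points, power=p), 2))
```

Raising the exponent gives the nearest sample more influence, so the estimate at (20, 20) moves toward the elevation of the closest point as the power increases.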
Some interpolation artifacts were described by the author, such as the 'bench-like' effect along ridges and drainages, a result of using the 'weighted average' method in regions within horseshoe-shaped contours, and a 'stair-step' effect, a result of using the 'vertical scan' algorithm.

Boehm [1967] also discussed different surface interpolation methods from digital contour representation and grid representation. He concluded that the most reliable method is the weighted distance interpolation, which is equivalent to the extension of bilinear interpolation between unequally spaced data points. Higher order interpolation methods can in many cases yield errors as great as or greater than bilinear interpolation.

As indicated in section 1.2.4, production of a DTM involves sampling and interpolation. The transfer function of this process defines the fidelity of the reconstructed surface. The transfer function allows for a comparative evaluation of different interpolation procedures and can also be used for determining an adequate sampling interval. If the required quality of a DTM is specified by a standard error, or if the laws of error propagation need to be applied for evaluating the accuracy of the derived product, then additional knowledge is required. It becomes necessary to characterize the terrain by its power spectrum and also the measuring error, which then allows expression of the accuracy of the reconstruction in terms of a variance, or its square root, the standard error.

Tempfli [1980] gave a demonstration of DTM accuracy estimation using spectral analysis. To compare the variance estimates with the actual mean square discrepancies, computer generated surfaces were used. Three profiles were selected from the contour plots and three interpolation methods were applied. Reconstructing two surfaces by bilinear interpolation was also studied.
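Bilinear interpolation on a regular grid, referred to several times above, can be sketched as follows. This is a minimal illustration that assumes a square grid stored as rows of elevations with its origin at the first node and non-negative query coordinates inside the grid; the function name and the sample grid are hypothetical.

```python
def bilinear(grid, spacing, x, y):
    """Bilinear interpolation of a regular grid at ground position (x, y).

    grid    : list of rows, grid[row][col]; row index increases with y
    spacing : grid spacing in ground units; grid[0][0] lies at (0, 0)
    """
    # Index of the lower-left node of the cell containing (x, y),
    # clamped so that points on the far edges use the last cell.
    col = min(int(x // spacing), len(grid[0]) - 2)
    row = min(int(y // spacing), len(grid) - 2)
    tx = x / spacing - col
    ty = y / spacing - row
    z00 = grid[row][col]
    z10 = grid[row][col + 1]
    z01 = grid[row + 1][col]
    z11 = grid[row + 1][col + 1]
    return (z00 * (1 - tx) * (1 - ty) + z10 * tx * (1 - ty)
            + z01 * (1 - tx) * ty + z11 * tx * ty)


# Hypothetical 3 x 3 grid of elevations (metres) at 50 m spacing.
dem = [[100.0, 110.0, 130.0],
       [105.0, 120.0, 140.0],
       [115.0, 125.0, 150.0]]
print(bilinear(dem, 50.0, 25.0, 25.0))
print(bilinear(dem, 50.0, 100.0, 100.0))
```

The interpolated surface passes through every grid node and is continuous across cell boundaries, but its slope changes abruptly there, which is one source of the artifacts noted in the literature.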
The results showed that a higher accuracy was attained by linear interpolation than by the chosen moving average for all three profiles and for all the sampling intervals used. The chosen linear least squares interpolator performed better than the linear interpolator only when the sampling density was high. The accuracy of the reconstructed profiles decreased, in general, with an increasing sampling interval. It is clear that the sampling density is the decisive influencing factor for the accuracy estimate, not the interpolation method used, assuming a suitable parameter choice is provided.

Fractals are a relatively recent field of research and yet a great deal has been written about the fractal nature of topographic surfaces [Klinkenberg and Goodchild, 1992]. The fractal dimension concept is used by Polidori et al. [1991] for grid DTM quality assessment. The authors claim that most DTM evaluation techniques, which consist of estimating a global accuracy with regard to a set of check points, do not detect artifacts such as those caused by digitizing and resampling. By computing the fractal dimension value at different scales and in different directions based on a fractional Brownian motion (fBm) surface model, interpolation artifacts, like excessive smoothness and directional tendency, may be revealed. The DEM used in the study was interpolated (40-meter interval) from digitized contour data (from a 1:25,000 scale topographic map). Estimates of the fractal dimension (D) made from the DEM were observed to be lower when obtained over short distance intervals (1 to 5 pixels, i.e. up to 200 m, D = 2.07) than when estimates were obtained using longer distance intervals (10 to 30 pixels, i.e. 400-1200 m, D = 2.25). The authors state that this difference is due to the smoothing inherent in the interpolation process that was used to obtain the gridded data from the initial contours.
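One common way to estimate D under the fBm model is from the log-log slope of the variogram: if the mean squared elevation difference grows as the lag raised to the power 2H, the fractal dimension of the surface is D = 3 - H. The sketch below follows that general approach and is not a reconstruction of the procedure of Polidori et al.; the function names are hypothetical and only differences along grid rows are used.

```python
import math


def row_variogram(grid, lags):
    """Mean squared elevation difference along rows for each lag (pixels)."""
    result = []
    for h in lags:
        diffs = [(row[i + h] - row[i]) ** 2
                 for row in grid for i in range(len(row) - h)]
        result.append(sum(diffs) / len(diffs))
    return result


def fractal_dimension(lags, variogram):
    """Surface fractal dimension D = 3 - H, where 2H is the least-squares
    slope of log(variogram) against log(lag). Variogram values must be
    positive."""
    lx = [math.log(h) for h in lags]
    ly = [math.log(v) for v in variogram]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return 3.0 - slope / 2.0


# A tilted plane is perfectly smooth, so its estimated dimension is 2.
plane = [[float(c) for c in range(40)] for _ in range(5)]
short_lags = [1, 2, 3, 4, 5]
print(fractal_dimension(short_lags, row_variogram(plane, short_lags)))
```

Applying the estimator separately to short and long lag ranges, and to rows and columns, gives the scale-dependent and direction-dependent values discussed in the text.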
Goodchild and Tate [1992], however, argue that the central conclusion of this research, that fractal analysis can provide an effective index of DEM quality, appears to be unwarranted based on the results presented and in the light of the available literature. One of their arguments, for example, is that real surfaces are rarely pure fBm, and often contain clear departures from the fBm model in the form of local linear trends and scale dependencies. In order to detect and prove the existence of an interpolation smoothing effect over short distance intervals, D at short distance intervals must be shown to have been significantly reduced beyond the variation expected by chance.

A number of authors have proposed techniques for assessing the interactions between interpolation and measurement error associated with sample points. These techniques often suggest ways in which interpolation methods can be devised to minimize the effects of such interactions. A bibliographic review is given by Veregin [1989].

2.3.2 Spatial variation of interpolation errors

Many measures of error that have been proposed, such as root-mean-square error for ratio data and the classification error matrix¹ for nominal data, have no spatial dimension and they are of little use in the error study of spatial data [Goodchild and Gopal, 1989]. How the error is distributed across the area of any one DEM is still unknown, and factors that may affect the distribution of error are largely unresearched.

For effectiveness and accuracy, many GIS studies depend on reliable digital elevation models. To understand the impact of error in a DTM on error-sensitive GIS-type applications, the following is important [Wood and Fisher, 1993]:
1) identify the occurrence of uncertainty in spatial data,
2) identify the spatial distribution of this uncertainty, and
3) identify the effect this uncertainty will have on subsequent GIS operations.
For the above reasons, the study of spatial variation of error has become the focus of recent research [Theobald, 1989; Shearer, 1990; Csillag et al., 1992; Wood and Fisher, 1993]. Wood and Fisher [1993], for example, describe a visualization method that can be used to identify the spatial variation in DTM interpolation accuracy produced by interpolating digital contour data from Ordnance Survey 1:10,000 digitized contours at 10-meter vertical intervals. Four methods of interpolation are applied to the rasterized contour data with a horizontal resolution of 10 m. The interpolation processes are then visualized using different techniques such as shaded relief maps, aspect maps, Laplacian maps, and convexity maps. Each of the techniques emphasized a particular facet of the elevation model. For instance, the shaded relief maps and, especially, convexity maps, showed clearly some terracing effects in the topography due to the distribution of the original contour lines used to represent the surface. Slope direction maps indicated some interpolation artifacts in flatter regions which related to the original profile directions used in a one-dimensional spline fitting interpolator. The visualization and examination of the distribution of root-mean-square error produced by this interpolator also identified patterns of accuracy loss related to the profile directions used in interpolation.

¹ Error matrix: a common means of reporting site-specific accuracy for a classification, using a two-dimensional array that cross-references the classification results with the reference data. The correctly classified objects are indicated by the diagonal elements and omission and commission errors are indicated by the off-diagonal elements.

Shearer [1990] demonstrates two possible methods of graphical representation of accuracy in digital terrain models showing error magnitude and distribution.
Based on the difference values (or residuals) obtained by comparing two gridded DEMs derived from the same digitized contour data, but processed using different interpolation packages, errors at grid nodes can be displayed by means of proportional circles. Different colours can be employed to differentiate between positive and negative residuals. Such representations have the advantage of clearly indicating grid nodes where serious, and perhaps anomalous, errors occur. An alternative and commonly applied method is to plot the errors as contours. Comparison of such diagrammatic representations with, for example, a plot of the original input contours, can be informative with respect to the occurrence and magnitude of errors in relation to such factors as terrain slopes and distribution of input data.

2.3.3 Propagation of DEM error

As reviewed in the above sections, most of the studies on error in DTMs have concentrated on the nature and description of the error, rather than its propagation into derivative products resulting from GIS-type operations. Fisher [1991] examined how RMSEs as reported in each USGS gridded DEM propagated into DEM-derived products such as the viewshed, that is, the area observable from a viewing location as opposed to that which is invisible. A Monte Carlo simulation and testing approach was used for studying the propagation of DEM error, in which repeated error fields with varying parameters were added to the original DEM, and the viewshed was determined in the resulting noisy DEM. In the absence of any other information on error structure, the assumption of independence implied by the USGS error reporting was used in the test. It assumes that the RMSE is equivalent to the standard deviation of a normal distribution. Results showed that the area of the viewshed calculated in the original DEM may significantly overestimate the viewshed area.
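The Monte Carlo approach described above can be illustrated in a much reduced form. The sketch below works on a single elevation profile rather than a full grid and simply counts the cells visible from the first cell; the independent normal error model follows the assumption stated in the text, while the function names, the profile, the RMSE value and the number of runs are hypothetical.

```python
import random


def visible_count(profile, spacing, observer_height=0.0):
    """Number of cells visible from the first cell of an elevation profile.

    A cell is visible when the slope of the sight line to it is at least
    as steep as the steepest sight line to any nearer cell."""
    eye = profile[0] + observer_height
    max_slope = float("-inf")
    count = 0
    for i in range(1, len(profile)):
        slope = (profile[i] - eye) / (i * spacing)
        if slope >= max_slope:
            count += 1
            max_slope = slope
    return count


def monte_carlo_visibility(profile, spacing, rmse, runs=1000, seed=0):
    """Mean visible count over repeated realizations in which independent
    normal errors with standard deviation rmse are added to every cell."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        noisy = [z + rng.gauss(0.0, rmse) for z in profile]
        total += visible_count(noisy, spacing)
    return total / runs


# Hypothetical profile (metres) at 30 m spacing with a 7 m RMSE.
profile = [500.0, 480.0, 470.0, 495.0, 490.0, 520.0, 510.0, 505.0]
print(visible_count(profile, 30.0))
print(monte_carlo_visibility(profile, 30.0, rmse=7.0, runs=500))
```

Comparing the deterministic count with the Monte Carlo mean indicates how sensitive the visibility result is to the assumed elevation error.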
2.4 Summary

A literature review of the previous research on DTM error modelling and accuracy estimation has been conducted. Various sources of error in DTMs were identified and different approaches taken by researchers for the evaluation of DTM errors were briefly discussed.

Generally, an aspatial approach has been taken to the study of error in DTMs. Most studies use a global measure of DTM accuracy, such as the RMSE as used by the USGS. The spatial distribution of errors and its interaction with scale and resolution have rarely been examined. However, a few authors have observed that there appears to be a link between terrain complexity and the spatial distribution of DTM errors. While some recent studies have begun to actively investigate the spatial variation of DTM error, none so far has quantitatively examined the relation between terrain complexity and DTM error. This thesis will try to fill this gap and present a new approach to this issue.

In the next chapter, the selection of the study area and the implementation of the sample database will be discussed. Some preliminary test results will also be presented.

3. STUDY AREA DATA AND SOME PRELIMINARY TESTS

In order to have a better understanding of the issue of scale, accuracy, and spatial pattern of errors in digital topographic modelling, a true multi-scale DEM data set was developed. In this chapter, a study area is selected and some preliminary investigations based on the sample database are conducted.
The principal objectives of this chapter are threefold: 1) to demonstrate the availability of digital topographic data from different organizations, at different resolutions, and with different formats; 2) to compare different DEMs and to investigate the interaction between the terrain variation and the spatial resolution and their influence on DEM error and its spatial pattern, and thus to form the hypothesis; and 3) to provide empirical data for testing the thesis hypothesis and for a more detailed evaluation of the role of scale in topographic characterization, terrain classification, and DEM error modelling (discussed in Chapters 4, 5 and 6).

3.1 Study Area Selection

Several DEMs within a study area in central British Columbia were chosen to demonstrate the effects of DEM resolution on accuracy and the interrelations among the spatial pattern of errors, resolution and the complexity of terrain. The study area is north and east of Prince George and covers approximately 198 km (E/W) by 175 km (N/S). Figure 3.1 shows a DEM image of British Columbia (5' resolution) and the location of the selected study area.

[Figure 3.1 DEM image of British Columbia (5' resolution) showing the location of the selected study area.]

The areal extents of the study area are 53.5°-55°N and 120°-123°W, corresponding to the boundaries of 36 NTS 1:50,000 scale sheets (each generally defined by a quadrangle area of 15 minutes of latitude and 30 minutes of longitude). All the map sheet numbers included are illustrated in Figure 3.2 for identification purposes.

There are two main reasons why this area is selected. They are:

1) the availability of different DEMs in the area. Since some larger scale digital topographic data programs such as the B.C. TRIM 1:20,000 project are completed only for selected areas, the selection of the study area is thus limited.
2) the general characteristics of terrain. British Columbia is essentially a mountainous region except for the northeast corner which includes a small portion of the Interior Plains. Although the province is largely mountainous, there are extensive plateaus, large plains and basins, and areas of prairie [Holland, 1964]. As seen in Figure 3.1, the study area is located in the Rocky Mountain area with high terrain variability. Of the 21 major physiographic divisions of landforms in British Columbia defined by Holland [1964], three occur in this area including the Interior Plateau and the Rocky Mountain Trench, two very different physiographic divisions. The selection of an area with variable sub-terrain types will allow an analysis of the correlation between D E M errors and terrain complexity. 65 93 J/15 93 J / 1 6 93 1/13 93 1/14 93 1/15 93 I 16 93 J / 1 0 93 J / 9 93 1/12 93 1/11 93 1/10 93 1/9 93 J / 7 93 J / 8 93 1/5 93 1/6 93 1/7 93 1/8 93 J / 2 93 J / 1 93 1/4 93 1/3 93 1/2 93 1/1 93 G/15 93 G/16 93 H/13 93 H/14 93 H/15 93 H/16 93 G/10 93 G / 9 93 H/12 93 H / l l 93 H/10 93 H / 9 123°00' 122°30' 122°00' Figure 3.2 121°30' 121°00' 120°30' 120°00' Study area NTS map sheet numbers. 66 93 H/16 93 H/16 W E 3.2 Database Implementation Different D E M data sets for the above selected study area were obtained from the mapping institutions mentioned in section 1.2.3, and a multi-scale D E M database was constructed on a Unix workstation. In particular, five digital topographic data sets (i.e., N G D C 10, NGDC5, WSC2, E M R 1 , and T R I M DEMs) were ordered from N G D C , WSC, E M R and B.C. Ministry of Environment, Lands and Parks respectively. A brief description of each data set is given in Table 3.1 including their spatial resolution, matrix dimension, geo-referencing coordinate system and elevation mode. 
Because of the differences in both resolution and format seen in Table 3.1, several computer programs were written (mainly in Fortran or Splus; see footnote 2) to extract or transfer the necessary information. The following five sections introduce each digital file and its format, and describe the data preprocessing procedures used for the database implementation.

Footnote 2: Splus is a graphic and statistical data analysis software package developed by StatSci.

[Table 3.1 Description of the study area DEM data sets: spatial resolution, matrix dimension, geo-referencing coordinate system and elevation mode.]

3.2.1 NGDC10

The NGDC 10' (i.e., 10-arc-minute) global data set was produced originally by the US Navy Fleet Numeric Oceanographic Center (FNOC) and consists of a 1080 x 2160 geographic (i.e., Latitude/Longitude) grid containing a number of parameters including maximum, minimum and modal elevations, recorded to the nearest 100 feet (see Table 3.2).

Table 3.2 The 11 parameters in the NGDC 10' global data set

   Format   Field contents
   F7.2     Longitude (+/- 180°)
   F6.2     Latitude (+/- 90°)
   I3       Modal height in hundreds of feet
   I3       Maximum height in hundreds of feet
   I3       Minimum height in hundreds of feet
   I2       Number of significant ridges
   I2       Direction of ridges in 10's of degrees (0-18)
   I2       Primary surface type code
   I2       Secondary surface type code
   I3       Percentage of surface covered by water
   I3       Percentage of surface covered by urban development

The modal, minimum and maximum elevations for the area (45°N-65°N, 140°W-110°W) were extracted to cover the whole province of British Columbia. Each of the three elevation matrices has a dimension of 120 x 180, from which a 10 x 19 sub-matrix was then extracted for the study area chosen for this research. A statistical summary of the 10' study area data set (modal elevations only) is shown in Table 3.3.
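
The record layout of Table 3.2 can be illustrated with a short sketch. The thesis used Fortran and Splus for this step, so the Python below is only an illustration and not the original program. It assumes that the eleven fields are packed back to back with no separators, in the order F7.2 F6.2 I3 I3 I3 I2 I2 I2 I2 I3 I3. The field names, the helper functions and the sample record are hypothetical.

```python
# Hypothetical reader for one NGDC 10' record laid out as in Table 3.2.
# Assumption: the eleven fields are packed back to back with no separators.

FIELDS = [
    # (name, width, converter); widths follow F7.2 F6.2 I3 I3 I3 I2 I2 I2 I2 I3 I3
    ("longitude", 7, float),
    ("latitude", 6, float),
    ("modal_height", 3, int),
    ("max_height", 3, int),
    ("min_height", 3, int),
    ("n_ridges", 2, int),
    ("ridge_direction", 2, int),
    ("primary_surface", 2, int),
    ("secondary_surface", 2, int),
    ("pct_water", 3, int),
    ("pct_urban", 3, int),
]

FEET_TO_METRES = 0.3048


def parse_record(line):
    """Split one fixed-width record into a dict of named values."""
    record = {}
    start = 0
    for name, width, convert in FIELDS:
        record[name] = convert(line[start:start + width])
        start += width
    return record


def heights_in_metres(record):
    """Convert the three heights from hundreds of feet to metres."""
    return {
        key: record[key] * 100 * FEET_TO_METRES
        for key in ("modal_height", "max_height", "min_height")
    }


# Made-up record for illustration only (not taken from the data set).
sample = "%7.2f%6.2f%3d%3d%3d%2d%2d%2d%2d%3d%3d" % (
    -122.5, 54.17, 23, 31, 20, 2, 14, 5, 3, 4, 0)
parsed = parse_record(sample)
print(parsed["longitude"], parsed["latitude"], heights_in_metres(parsed))
```

Keeping the widths in one table makes it easy to adjust the sketch if the actual file turns out to use blank separators or a different field order.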
The descriptive statistics calculated include the mean, standard deviation, maximum, median, minimum, upper quartile and lower quartile of elevation. A histogram of the study area 10' modal elevations is shown in Figure 3.3. From the summary statistics and the histogram of the DEM data, the overall DEM characteristics can be observed. These statistics are also used for a general comparison of the various study area DEMs, and to identify obvious problems, if any, in the DEM data sets before any further analysis.

3.2.2 NGDC5

The 5' northern hemisphere average elevation data (also known as ETOPO5) were originally prepared by DMA from arithmetic averages of data digitized from contour maps, and are now distributed by NGDC as part of the Global Ecosystems Database. The data set consists of a 2160 x 4320 geographic (i.e., Latitude/Longitude) centroid-registered grid with elevation data expressed to the nearest meter. The British Columbia 5' mean elevation matrix (200 x 350) (see Figure 3.1) and the study area 5' data (19 x 37) were extracted from this data set. Some summary statistics of the study area 5' elevation data are listed in Table 3.3. A histogram of the study area 5' elevations is shown in Figure 3.3.

[Table 3.3 Descriptive statistics of the study area DEM data sets.]

A comparison of the statistical summaries of the NGDC10 and NGDC5 DEMs, as shown in Table 3.3 and Figure 3.3, indicates the closeness of the two data sets. No apparent problem was found in either one.
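
The seven summary statistics reported in Table 3.3 can be computed along the following lines. This is a minimal Python sketch and not the Splus code used in the thesis. It assumes the sample standard deviation and the default quartile rule of Python's statistics module, either of which may differ slightly from the conventions behind Table 3.3. The elevations are made up.

```python
import statistics


def describe(elevations):
    """Return the seven summary statistics listed above for a list of elevations."""
    lower, median, upper = statistics.quantiles(elevations, n=4)
    return {
        "mean": statistics.mean(elevations),
        "std_dev": statistics.stdev(elevations),  # sample standard deviation (assumption)
        "maximum": max(elevations),
        "median": median,
        "minimum": min(elevations),
        "upper_quartile": upper,
        "lower_quartile": lower,
    }


# Made-up elevations in metres, for illustration only.
demo = [600, 650, 700, 800, 1200, 1500, 2100]
summary = describe(demo)
for name, value in summary.items():
    print(f"{name:>15}: {value:.1f}")
```

Running the same function on each DEM gives directly comparable rows, which is how a table of this kind supports a quick check for gross problems in a data set.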
3.2.3 WSC2

For the long-range development of hydrometric networks in British Columbia, the Water Survey of Canada, in conjunction with Shawinigan Engineering Company, Limited, carried out a deliberate and systematic study of data collection starting in 1970. The UTM grid on topographic maps was used as the basic unit for data collection and for determining physiographic characteristics in the development of the physiographic and hydrologic data banks. This provided the tool for multiple regressions between hydrologic and physiographic characteristics. Considering the mountainous terrain and the small size of many of the ungaged rivers in British Columbia, a grid interval of 2 km was selected as the basic unit for the abstraction of the parameters. The data abstracted from the NTS 1:50,000 map sheets included the information needed to compute mean elevation, average land slope, stream density, and area of lakes and swamps [Kreuder, 1979]. The mean elevation is the elevation at the center of the 2 km x 2 km square, which is assumed to represent the average elevation of the square. The abstracted data were written and color coded into each square on the map sheet as shown in Figure 3.4. The data were then transferred from the map sheets to computer cards in a format which would allow verification of data, identification and verification of map sheets, and the transfer of the data onto magnetic tape.

Figure 3.4 Sample of WSC coded grid square (after Kreuder, 1979). Elevation in feet was coded in red in the upper right corner, the number of contour crossings (E/W and N/S) in blue in the upper left corner, the number of dots for areas of lakes and swamps in brown in the lower left corner, and the number of divider intervals for measuring length of streams in purple in the lower right corner.
The 2 km raw physiographic data for the study area (covering the 36 1:50,000 map sheets shown in Figure 3.2) and the Fortran programs used for mean elevation data extraction were obtained directly from WSC. The corresponding UTM coordinates for the study area range from 501,000 to 697,000 m in easting (X) and from 5,929,000 to 6,097,000 m in northing (Y) in UTM zone 10, which resulted in an 85 x 99 elevation matrix. A plot of all the data points (see Figure 3.5) indicated that, in addition to some missing values along the border, five data values inside the area were also missing. These were later interpolated by taking the average of their surrounding data points. The descriptive statistics of the data set are shown in Table 3.3 and a histogram of WSC2 elevations is shown in Figure 3.3.

3.2.4 EMR1

The EMR1 DEM data for the study area were acquired from the Earth Physics Branch of the federal Department of Natural Resources (formerly Energy, Mines and Resources). The data were digitized from available topographic and bathymetric maps and distributed on magnetic tapes with a minimum data retrieval unit of a 15' of latitude by 30' of longitude quadrangle, corresponding to the boundaries of the NTS 1:50,000 maps. For the purpose of this research, elevation and water depth data for 36 quadrangles were ordered to cover the 1°30' x 3° study area (see Figure 3.2). Data are stored by quadrangle, and some quadrangles are duplicated to give fresh water depths instead of land elevations. There is header information for each quadrangle. Spot values are listed by column from south to north and west to east within the quadrangle. The spot elevations are collected on an arc-minute grid with an average spacing of about one kilometer. More specifically, the number of rows is 28 and is constant for all quadrangles, but the number of columns varies for different latitude ranges.
Table 3.4 gives the detailed specifications of data collection for the study area. Several Fortran programs were written in order to extract elevations and their coordinates. Because the number of columns differs between the upper part (latitude > 54.25°) and the lower part of the study area, as indicated in Table 3.4, two elevation matrices were prepared in Splus. The dimensions are 84 x 192 for the upper part and 84 x 198 for the lower part. Figure 3.6 illustrates the arrangement of spot elevation collecting points for the upper and lower parts. Because of space limitations, only points within the range of 54°N to 54.5°N and 121°W to 122°W were plotted.

[Table 3.4 Specifications of EMR1 data collection for the study area.]

In order to be comparable with other DEM data sets, and to merge the upper and lower parts into one DEM for the whole study area, the coordinate system was transformed from geographic coordinates (longitude, latitude) into UTM coordinates (easting, northing) using the SPANS GIS program. Then a DEM of dimension 160 x 188 with a resolution of exactly 1 km was interpolated for the study area from the elevation data in both the upper and lower parts using the Splus interpolation function interp. The algorithm is based on a method of bivariate interpolation and smooth surface fitting for irregularly distributed data points developed by Akima [1978a]. In this method the x-y plane is triangulated, that is, divided into a number of triangular patches, each having projections of three data points in the plane as its vertices, and a bivariate fifth-degree polynomial in x and y is applied to each triangle. Estimated values of partial derivatives at each data point are used in determining the polynomial. The method does not smooth the data and is an exact interpolator; that is, the resulting surface passes through all the given points.

In order to examine the impact of the interpolation, and later for comparison purposes, the descriptive statistics of both the raw and the interpolated study area EMR1 DEM data sets are given in Table 3.3 together with statistics for the other study area DEM data sets. The interpolated 1 km EMR1 DEM is almost identical to the raw EMR1 elevation data in terms of summary statistics. The grayscale images and histograms of the raw and interpolated EMR1 elevation data were also compared to each other and no apparent differences were found. It can, therefore, be concluded that the impact of the interpolation is insignificant, provided that the spatial resolution does not change much after the interpolation. The variation caused by different interpolators can, thus, be ignored in future analyses for the purpose of comparing DEMs of different resolutions. The EMR1 DEM data set will, from now on, refer to the interpolated data only. The histogram of the study area EMR1 elevations is shown in Figure 3.3 together with the histograms for the NGDC10, NGDC5 and WSC2 DEM data.
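
The EMR1 grid geometry described earlier in this section can be checked with a little arithmetic. The Python sketch below is an illustration only. The 28 rows per quadrangle and the 84 x 192 and 84 x 198 matrix sizes come from the text. The 32 and 33 columns per quadrangle are inferred from those totals, and the kilometre conversion uses a rough spherical approximation, so both are assumptions.

```python
import math

ROWS_PER_QUAD = 28          # stated in the text
QUAD_LAT_MIN = 15.0         # quadrangle height in arc-minutes
QUAD_LON_MIN = 30.0         # quadrangle width in arc-minutes
QUADS_NS_PER_HALF = 3       # 0.75 degrees of latitude per half / 15'
QUADS_EW = 6                # 3 degrees of longitude / 30'

# Columns per quadrangle are inferred from the 192 and 198 column totals.
COLS_PER_QUAD = {"upper": 192 // QUADS_EW, "lower": 198 // QUADS_EW}

KM_PER_ARCMIN_LAT = 1.852   # roughly one nautical mile (approximation)


def half_dimensions(part):
    """Rows and columns of the elevation matrix for one half of the study area."""
    return ROWS_PER_QUAD * QUADS_NS_PER_HALF, COLS_PER_QUAD[part] * QUADS_EW


def spacing_km(part, latitude_deg):
    """Approximate north-south and east-west point spacing in kilometres."""
    dlat = QUAD_LAT_MIN / ROWS_PER_QUAD
    dlon = QUAD_LON_MIN / COLS_PER_QUAD[part]
    ns = dlat * KM_PER_ARCMIN_LAT
    ew = dlon * KM_PER_ARCMIN_LAT * math.cos(math.radians(latitude_deg))
    return ns, ew


for part, mid_latitude in (("upper", 54.625), ("lower", 53.875)):
    print(part, half_dimensions(part), spacing_km(part, mid_latitude))
```

Under these assumptions both halves come out close to 1 km in each direction, which agrees with the stated average spacing and suggests why the column count changes with latitude.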
It can be seen from Table 3.3 and Figure 3.3 that the WSC2 and EMR1 DEMs are very similar in terms of their summary statistics and histograms.

3.2.5 TRIM 93G and TRIM 93H

For the 1:20,000 scale TRIM DEM data described in Appendix A, only two smaller regions within the study area were chosen, given both budget and computer power constraints: one relatively flat subarea (herein referred to as TRIM 93G or just 93G) and one rougher subarea (TRIM 93H or 93H). Each subarea contains four adjacent map sheets, whose map numbers are as follows:

• TRIM 93G - 93G.089, 93G.090, 93G.099, and 93G.100
• TRIM 93H - 93H.088, 93H.089, 93H.098, and 93H.099

As shown in Figure A.2 in Appendix A, a map number in the TRIM project consists of the appropriate NTS 1:250,000 lettered map number followed by the numbers of each successive breakdown (001-100), each separated by a period.

According to British Columbia Specifications and Guidelines for Geomatics [MOEP, 1990], all TRIM mapping is presented on the UTM coordinate system (based on the 1983 North American Datum) and the accuracy requirements reflect the standards set under the North Atlantic Treaty Organization (NATO) Standardization Agreement (STANAG) for the evaluation of Land Maps. For DEM data in particular, the following accuracy specifications are stated:

1) ninety percent of all well defined planimetric features shall be coordinated to within 10 m (i.e., 0.5 mm x 20,000) of their true position.
2) ninety percent of all discrete spot elevations and DEM points shall be accurate to within 5 m of their true elevation.
3) ninety percent of all points interpolated from the DEM (including contour data) shall be accurate to within 10 m of their true elevation.
4) true position/elevation is defined as the coordinates which would be obtained from positioning with high order ground methods.
5) accuracies relating to elevations relate to ground not sufficiently obscured by vegetation or other features to cause significant error.

The original spot elevation data for the selected eight map sheets were collected in random mode. Appendix B gives the MOEP ASCII file format for TRIM data. Submissions from the Digital Mapping Group Limited (DMG) give all coordinates (X-easting, Y-northing, and Z-elevation) to the nearest meter, which conforms to the accuracy requirements. A Fortran program was written to extract the XYZ coordinates of all the points from each of the eight DEM files. The eight DEM point files were then read into Splus separately for further processing. Table 3.5 lists the X, Y coordinate ranges, the number of elevation points (N), the random point collection density (D), and the average spacing between points (S) for each map sheet. The density is computed by

$$D = \frac{N}{A} \qquad (3.1)$$

where A is the area covered by each map sheet. The average spacing, another way of looking at density, is computed by

$$S = 1.0746 \sqrt{\frac{A}{N}} \qquad (3.2)$$

when a hexagonal spacing is assumed, which is the arrangement when the N points are equidistant. Given that the points are randomly distributed, this formula provides only an approximation of the true spacing.

[Table 3.5 X and Y coordinate ranges, number of elevation points, collection density and average spacing for each of the eight TRIM map sheets.]
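
The density and spacing calculation can be sketched in a few lines of Python. The sketch assumes that equation 3.1 is the ratio of the number of points to the sheet area, D = N / A, and that equation 3.2 has the form S = 1.0746 sqrt(A / N). The constant 1.0746 equals sqrt(2 / sqrt(3)), the value that follows from each point of a hexagonal arrangement occupying an area of (sqrt(3) / 2) S^2. The function names and the sample sheet are hypothetical.

```python
import math

HEX_FACTOR = math.sqrt(2.0 / math.sqrt(3.0))  # about 1.0746


def point_density(n_points, area):
    """Equation 3.1 (as assumed here): D = N / A."""
    return n_points / area


def average_spacing(n_points, area):
    """Equation 3.2 (as assumed here): S = 1.0746 * sqrt(A / N), hexagonal arrangement."""
    return HEX_FACTOR * math.sqrt(area / n_points)


# Made-up sheet: 30,000 points over 150 km^2 (area in m^2 so S comes out in m).
n, area_m2 = 30000, 150.0e6
density_per_km2 = point_density(n, area_m2) * 1.0e6
spacing_m = average_spacing(n, area_m2)
print(round(density_per_km2, 1), round(spacing_m, 1))
```

With these made-up numbers the density is 200 points per square kilometre and the spacing is close to 76 m, so a density of about 200 points/km² and an average spacing of about 75 m describe the same sampling intensity.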
It should be noted that the values listed in Table 3.5 do not include the breaklines, ridgelines, and planimetric data which are also included in TRIM data and are supplements to the point elevations.

As indicated in Appendix A, the TRIM DTM data are collected either in a random mode or in a grid mode by stereo compilation. The DTM points are defined to fit a grid spacing of 50 m in steep terrain (average slope > 25°) and 75 m in less complex terrain when captured in a grid pattern. When a random point collection pattern is used, the average spacing between sampling points will be approximately 75 m in steep terrain and 100 m in flatter, less complex terrain. When supplemented by breaklines, ridgelines, and planimetric data, a 25 m grid DEM can be derived [Balser, 1989]. Therefore, in order to create a regular grid DEM for future analyses, the random elevation point data were, in this case, interpolated into a 50 m grid format for each of the two subareas (TRIM 93G and TRIM 93H) using the Splus interpolation function interp. As discussed in the previous section, the variation that might be caused by the interpolation can be ignored. Two elevation matrices of the same size (438 x 520) were created. In order to reduce the computational demand in further analyses, a smaller subset DEM of size 300 x 320 was extracted from each of the two 438 x 520 elevation matrices. The interpolated full-size DEM images of the two subareas (TRIM 93G and 93H), along with their smaller subset DEMs (TRIM93G.SUB and TRIM93H.SUB), are shown in Figures 3.7 and 3.8 respectively. The histograms of the two subset DEMs are displayed in Figure 3.9. Their descriptive statistics are shown in Table 3.6. A comparison of the two subset DEMs reveals the overall differences between the two subareas in terms of their general roughness or complexity, with TRIM93H.SUB having a much larger elevation variance.
As shown in Table 3.6, the standard deviation of elevations for TRIM93H.SUB is 409 m whereas for TRIM93G.SUB it is only 85 m.

[Table 3.6 Descriptive statistics of the two subset DEMs TRIM93G.SUB and TRIM93H.SUB.]

As indicated in Appendix A, when TRIM elevation data (X, Y, Z) are read in a random pattern, the density specifications are as follows:

1) in areas where the average slope of the terrain is less than 25°, the average spacing between points will be approximately 100 m, giving approximately 120 points/km²;
2) in areas where the average slope of the terrain is more than 25°, the average spacing between points will be approximately 75 m, giving approximately 200 points/km².

In order to find out whether the original random point data collection for the two subareas (TRIM 93G and TRIM 93H) actually meets the density specifications set by MOEP [1990], slopes were derived from the TRIM DEMs (see section 4.2.3 in the next chapter for the detailed procedure). Figure 3.10 displays the slope histogram for each subarea. It can be seen from Table 3.5 that the data collection for all four map sheets in the TRIM 93G subarea conforms with the above density specifications for rougher terrain. Although the average slope for this whole subarea is only 6°, there are areas with slopes up to 54°, as shown in Figure 3.10. This probably explains why a high density of over 200 points/km² was used for each of the four map sheets in this flatter subarea. On the other hand, the collection density for all four map sheets in the TRIM 93H subarea falls slightly short, although this is a rougher area with a higher average slope than TRIM 93G (see Figure 3.10). The slope values in this subarea range from 0° to 82° and the average slope is 23°. It can be concluded that the point collection density specifications were generally met for both subareas, but that there are some inconsistencies from one area to another. Cheong [1992] discussed in more detail this problem associated with TRIM DTM data point collection and the specifications set by MOEP.

3.3 Some Preliminary Tests

In order to compare DEMs of different resolutions and formats, geometric registration of all the DEMs from the various sources is the first necessary step. Figure 3.11 shows the registration of the various study area data sets introduced in section 3.2. Only data points from NGDC10, NGDC5 and WSC2 are displayed in this figure, since EMR1 has too many points to show in the limited space. The four DEM images of the study area as represented at different resolutions and the DEM images of the two subareas are displayed in Figure 3.12.
Also shown in this figure (on the image of the EMR1 DEM) are the boundaries of the two TRIM subareas (93G and 93H), in order to indicate their locations in relation to the whole study area.

Most of the data manipulation and graphic display used here were accomplished using the Splus program. The GIS packages ARC/INFO and SPANS were also used for conversion between geographic coordinates and UTM coordinates, which is sometimes necessary for the registration and comparison of different DEMs.

In addition to showing images of different DEMs and their descriptive statistics, there are two other ways to further examine, visually and quantitatively, the differences between DEMs of various spatial resolutions: topographic profile comparison and DEM spatial mismatch evaluation.

3.3.1 Topographic profile comparison

Several terrain profiles can be extracted from DEMs and comparisons made with regard to the topographic variation revealed by different DEM data sets.
For example, the NGDC10 data set provides maximum, modal and minimum elevations for each 10' by 10' grid area, while EMR1 gives a spot elevation every 0.535714' by 0.909091' (about 1 km by 1 km). Figure 3.13 shows the variation revealed by EMR1 within a 10' band (west/east direction from 123°W to 120°W) around latitude 54.08°. From the elevation variation shown by the 10' profiles and the EMR1 band it can be seen that two distinctive regions exist in the area. One portion of the surface is relatively flat (from 123°W to 121.7°W), where the 10' maximum/modal/minimum profiles and the EMR1 spot elevations are all close to each other; the other portion is very rough (from 121.7°W to 120°W), where a great amount of variation is evident. Also note that there are many EMR1 points outside the minimum/maximum bounds given by the 10' data, particularly near changes in roughness.

3.3.2 DEM spatial mismatch

As indicated before, DEM accuracy assessment is classically carried out by comparing some sample points of the DEM to data of a higher accuracy, and an RMSE is computed to estimate the DEM accuracy [Polidori et al., 1991]. In fact, however, it is necessary to know not only the global measure of the average departure of points on the DEM from the 'real' ground surface, but also the spatial distribution of errors. To compare DEMs at different resolutions, therefore, it is important to examine both the extent and the spatial pattern of the mismatch between them and to study the relation between the mismatch and the resolution.

The mismatch between two DEMs is determined by calculating the elevation differences between all the data points on the DEM of higher resolution and those interpolated from the DEM of lower resolution. The Splus function interp is used for the interpolation.
The mismatch between two DEMs can also be taken as the DEM error of the one with lower resolution, with the DEM of higher resolution considered as the reference or accepted 'truth.' In the rest of this section, NGDC5 and WSC2 DEM errors as compared to the EMR1 DEM for the whole study area are determined first, and then WSC2 and EMR1 DEM errors as compared to the TRIM DEMs are examined for the two subareas.

3.3.2.1 Two comparisons between DEMs for the whole study area

As indicated earlier, the study area DEM NGDC5 has a 5' resolution and a Latitude/Longitude georeferencing coordinate system. EMR1 has a spatial resolution of 1 km in UTM. To determine their mismatch, the higher resolution DEM EMR1 is taken as the reference data set or accepted 'truth.' NGDC5 elevations are then used to interpolate a 1 km DEM. Finally, the 1 km DEM interpolated from the lower resolution NGDC5 is compared to the actual 1 km DEM EMR1, and a 160 x 188 elevation difference matrix is calculated for the whole study area (a few missing data points can be observed due to border effects during interpolation). Figure 3.14 shows the comparison results for the study area DEMs NGDC5 and EMR1. The display includes a grayscale image showing the spatial variation of elevation differences between the two DEMs and a histogram of the elevation differences.

Following the same procedure, the 2 km DEM WSC2 is also compared to the 1 km EMR1 DEM. The spatial pattern of the mismatch between the WSC2 and EMR1 DEMs and the histogram of their discrepancies are shown in Figure 3.15. From the mismatch patterns in these two figures, it is noted that for both comparisons the DEM errors are not evenly or randomly distributed but show certain correlations with the variation of terrain revealed by the study area DEM images.
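
The mismatch procedure described above can be sketched as follows. This is a simplified Python illustration rather than the method used in the thesis: it substitutes plain bilinear interpolation on a regular grid for the Akima-based Splus interp function, it assumes that both grids share the same origin and extent, and the sign convention (interpolated minus reference) is an assumption. All names and the demo surfaces are hypothetical.

```python
import math
import statistics


def bilinear(grid, x0, y0, step, x, y):
    """Bilinearly interpolate a regular grid (grid[row][col], origin x0, y0) at (x, y)."""
    fx = (x - x0) / step
    fy = (y - y0) / step
    col = min(int(fx), len(grid[0]) - 2)
    row = min(int(fy), len(grid) - 2)
    tx, ty = fx - col, fy - row
    z00, z01 = grid[row][col], grid[row][col + 1]
    z10, z11 = grid[row + 1][col], grid[row + 1][col + 1]
    return ((1 - tx) * (1 - ty) * z00 + tx * (1 - ty) * z01
            + (1 - tx) * ty * z10 + tx * ty * z11)


def mismatch(fine, fine_step, coarse, coarse_step):
    """Differences (coarse interpolated minus reference) at every reference node."""
    diffs = []
    for r, row in enumerate(fine):
        for c, z_ref in enumerate(row):
            z_interp = bilinear(coarse, 0.0, 0.0, coarse_step,
                                c * fine_step, r * fine_step)
            diffs.append(z_interp - z_ref)
    return diffs


def summarize(diffs):
    """Global summary of the elevation differences."""
    return {
        "n": len(diffs),
        "mean": statistics.mean(diffs),
        "std_dev": statistics.pstdev(diffs),
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "minimum": min(diffs),
        "maximum": max(diffs),
    }


def plane(x, y):
    """Made-up tilted surface used for both demo grids."""
    return 100.0 + 10.0 * x + 5.0 * y


coarse = [[plane(2 * c, 2 * r) for c in range(3)] for r in range(3)]  # 2-unit grid
fine = [[plane(c, r) for c in range(5)] for r in range(5)]            # 1-unit grid
fine[1][1] += 20.0  # a local peak that the coarse grid cannot see

stats = summarize(mismatch(fine, 1.0, coarse, 2.0))
print(stats)
```

In the demo the coarse grid reproduces the tilted plane exactly, so the only non-zero difference sits at the unsampled peak. This mirrors the observation that mismatch concentrates where the terrain varies at a finer scale than the coarser DEM can resolve.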
3.3.2.2 Four comparisons between DEMs for the two subareas

The comparison of the EMR1 DEM with the 50 m TRIM DEMs is also done for the two subareas 93G and 93H. In this case, the higher resolution TRIM DEMs are considered as the reference data sets. First, a sub-matrix is extracted from the EMR1 DEM for each corresponding TRIM subarea. Then, a DEM of 50 m grid resolution is interpolated from the EMR1 elevation points in each sub-matrix. Finally, differences between the TRIM DEM and the DEM interpolated from EMR1 are calculated and a 300 x 320 difference matrix is derived for each subarea. The results are shown in Figures 3.16 and 3.17 for 93G and 93H respectively. In addition to the spatial variation of elevation differences between the high resolution TRIM DEM and the low resolution EMR1 DEM, each figure also displays a histogram of elevation differences and a 3-D perspective view of the spatial variation of elevation differences (only every fourth line is sampled for clearer visualization). Following the same procedure, Figures 3.18 and 3.19 display the comparison results of the WSC2 and TRIM DEMs for the two subareas 93G and 93H.

In order to test whether or not the interpolation procedure used in the DEM creation or transformation has any significant impact on DEM accuracy/error analysis, a test was done for both subareas 93G and 93H using different interpolation methods in Splus. Test results are shown in Figures 3.20 and 3.21. Figure 3.20 shows the spatial variation of elevation differences between the TRIM93G.SUB and WSC2 DEMs and a histogram of differences using each of the three different interpolators.
The three interpolators tested are linear interpolation in the triangles bounded by the original data points (i.e., ncp=0; see footnote 3), cubic interpolation with 2 additional points used in computing partial derivatives at each data point (i.e., ncp=2), and cubic interpolation with 4 additional points used in computing partial derivatives (i.e., ncp=4) [Akima, 1978a,b]. The standard deviations of elevation differences derived using the three different interpolators are very close: 29, 31, and 30 m respectively. Figure 3.21 gives similar results for subarea 93H, where the standard deviations of elevation differences are 186, 183, and 172 m respectively for the three interpolation methods.

Footnote 3: ncp is a parameter of the Splus interp function which refers to the number of additional points to be used in computing partial derivatives at each data point for determining the polynomial to fit to the surface in each triangle. A number between 3 and 5 (inclusive) is often recommended.

Furthermore, the three different 50 m DEMs interpolated from the 2 km WSC2 DEM using different interpolators for each of the two subareas were subtracted from each other.
The spatial pattern and the magnitude of their differences are shown in Figures 3.22 and 3.23 for subareas 93G and 93H respectively. It should be noted that the same vignetting appears in Figures 3.22 and 3.23, suggesting that the same spurious structure has been introduced by the interpolators and is likely to be somewhat related to the variability of terrain. From the above test results, however, it is evident that the magnitude of the differences between two DEMs caused by using different interpolators is much less than the difference between DEM data sets of differing resolution. It is, thus, concluded that the effects of different interpolation methods can be ignored in the above interpolation procedure when comparing a lower resolution DEM with a higher one.

3.4 Observations Based on Preliminary Tests

Table 3.7 summarizes some basic statistics (e.g., mean and standard deviation of elevation differences) of all six comparisons as discussed in section 3.3.

Table 3.7 Basic statistics of elevation differences for the six comparisons. [The Mean (m), Maximum (m) and Minimum (m) columns are illegible in this copy and are omitted; the standard deviations are those quoted in the text.]

Comparison          Number of points     Std. dev. (m)
NGDC5 & EMR1        30080 (188x160)      262
WSC2 & EMR1         30080 (188x160)      83
EMR1 & TRIM 93G     96000 (320x300)      21
WSC2 & TRIM 93G     96000 (320x300)      31
EMR1 & TRIM 93H     96000 (320x300)      126
WSC2 & TRIM 93H     96000 (320x300)      183

From the above preliminary test results, some observations can be made:

1) All six means of elevation differences are small relative to the standard deviation (see Table 3.7), and the histograms of elevation differences all show a fairly symmetrical distribution of the discrepancies about each mean (see Figures 3.14 to 3.19). This indicates that there is no obvious systematic error component in the elevation difference values. Furthermore, statistical normality tests indicate that none of the histograms follows a normal distribution.

2) The standard deviation calculated for each histogram indicates the 'width' of the frequency distribution of the elevation differences and provides a global measure of DEM accuracy.

3) The standard deviation of discrepancies for each comparison varies (see Table 3.7) and is related to the spatial resolutions of the two DEMs being compared. For example, the standard deviation of discrepancies between the 2 km WSC2 and 1 km EMR1 study area DEMs is 83 m, but it is 262 m between the 5' NGDC5 and 1 km EMR1 DEMs. Between the 1 km EMR1 DEM and the 50 m TRIM DEM for subarea 93G, the standard deviation of elevation differences is 21 m, whereas between the 2 km WSC2 and TRIM DEMs it is 31 m. For subarea 93H, the standard deviation of elevation differences between the 1 km EMR1 DEM and the 50 m TRIM DEM is 126 m, and it is 183 m between the 2 km WSC2 and 50 m TRIM DEMs. It is obvious from the above preliminary tests that the coarser the DEM resolution, the lower the global accuracy.
This is true no matter how rough or flat the surface is (i.e., 93H or 93G) and no matter which DEM is used as 'truth', that is, as the one with the highest resolution (i.e., the 1 km EMR1 or the 50 m TRIM DEMs), for accuracy evaluation.

4) Based on a visual inspection of the spatial mismatch plots (see Figures 3.14 to 3.19), elevation difference values appear to vary spatially across the surface in the study area. The spatial pattern of mismatch is not random, but closely related to the variation and complexity of terrain. In other words, the mismatch pattern visually appears to reflect the terrain variability. Apparently, the rougher the terrain, the more pronounced the mismatch pattern and the larger the discrepancies. As seen in Table 3.7, for example, the standard deviation of discrepancies between EMR1 and TRIM in the flatter subarea (93G) is about 21 m, much lower than that in the rougher subarea (93H), which is close to 126 m. The standard deviations of discrepancies between the WSC2 and TRIM DEMs for the two subareas follow a similar pattern, with a standard deviation of 31 m in the flatter area and 183 m in the rougher area.

3.5 Thesis Hypothesis

In order to investigate quantitatively the various relations observed above, the following question needs to be answered: are measures of topographic complexity significantly related to observed patterns of mismatch between DEMs of differing resolution? In other words, does knowledge of the landscape characteristics provide some insights into the nature of the inherent error (or uncertainty) in a DEM? Based on the above preliminary observations, the following hypothesis is to be tested in this thesis:

Knowledge of topographic characteristics provides insights into the nature of DEM errors and can be useful for DEM error modelling.

To test this hypothesis, first it is necessary to extract some geometric measures from DEMs which characterize the variation and complexity of topographic surfaces.
Then a multivariate classification is necessary to automatically identify relatively homogeneous terrain classes based on various roughness measures. Concepts and methodologies involved in topographic characterization and terrain classification are discussed in Chapter 4.

4. METHODOLOGIES FOR TOPOGRAPHIC CHARACTERIZATION

4.1 Overview

While the idea of utilizing geometric measures or signatures to characterize the nature of topography and to classify terrain is relatively new to GIS [Pike, 1988a,b; Weibel and DeLotto, 1988], the conceptual basis has been developing in geomorphology for some time [Langbein et al., 1947]. Geomorphology is concerned with the description of the form of the Earth and its genesis and, therefore, supplies knowledge on terrain forms and characterization. Geomorphometry, a sub-discipline of geomorphology, is specially devoted to developing quantitative description of landforms [Evans, 1972; Mark, 1975b]. Terrain forms are often classified according to their genesis: by uplift, erosion, sedimentation and other processes. They are also classified by their magnitude and extent, as well as their roughness, all of which are related to the genesis of the terrain [Frederiksen et al., 1985].

Evans [1972] distinguished two aspects of geomorphometry, namely: (i) 'specific geomorphometry,' which is concerned with the identification of named landforms such as drumlins, ridges, peaks, or karst topography; and (ii) 'general geomorphometry,' which attempts to provide a geometric description of the landforms that is applicable to any continuous rough surface. Only the latter will be the focus of this study because the roughness or complexity of the terrain is a major factor influencing DEM accuracy, as observed in previous chapters.

A considerable number of parameters have been proposed to describe the spatial variations of the terrain in general terms and to represent several different attributes of terrain geometry.
Examples of studies which incorporated geomorphometric parameters into a terrain classification procedure include Wood and Snell [1960], Mather [1972], Pike [1988a,b], and Weibel and DeLotto [1988]. These studies, along with Evans [1972] and Mark [1974, 1975b], provide a theoretical basis concerning geometric measures that have been developed, as well as many of their characteristics. For instance, Mark [1975b] discusses geomorphometric parameters in terms of three components or dimensions of a topographic surface: the vertical dimension (e.g., relief), the horizontal dimension (e.g., texture, grain, and drainage density), and slope and its derivatives (e.g., local convexity), which relate the vertical and horizontal dimensions, together with a few less important variables. Evans [1972] describes, at any point on a surface, five measures of important geometric properties. They are: (i) altitude, $z$; (ii) slope gradient, $z'_v$; (iii) aspect, $z'_h$; (iv) vertical convexity, $z''_v$; (v) horizontal convexity, $z''_h$. Areal parameters of surface geometry are described by their frequency distributions, which may be summarised by statistical moments such as standard deviation and skewness of altitude and standard deviation of gradient. In addition to classical geomorphometric parameters, some other measures have also been proposed to describe topographic variation. Examples are: the number of points higher than the center point of the moving window, the entropy of altitude, and fractal dimension.

There is no single 'magic measure' that is sufficient to characterize the complexity or roughness of terrain. In a general sense, roughness refers to the irregularity of a topographic surface. In the description of natural terrains, roughness parameters should be established such that they can be used to describe surface irregularities ranging from a few centimetres to tens of meters. However, a single concise definition of surface roughness is probably impossible [Hobson, 1972].
As observed by Stone and Dugunji [1965], and Hobson [1972], roughness cannot be completely defined by any single measure, which usually describes only some aspect of the physical or mathematical properties of a surface, but must be represented by a 'roughness vector' or a set of parameters. One area may be rougher than another because it has a finer texture, a higher relief, an irregularity of ridge spacing, or sharp ridges (see Figure 4.1) [Mark, 1974]. It is, therefore, necessary to develop multiple signatures to quantitatively describe the terrain.

Fractal dimension (D) is a non-integer value which ranges from 2 to 3 and increases as the surface progressively changes from a plane (D=2) to a surface so folded that it would fill a volume (D=3). It was once expected to emerge as the most promising single parameter to measure terrain variation. However, empirical measurement of fractal dimension leads to different results depending on which measurement method is used, and it is observed that the concept of self-similarity (a form of invariance with respect to changes in scale) is not easily applicable to terrain [Mandelbrot, 1986; Klinkenberg and Goodchild, 1992]. Therefore no single fractal dimension is sufficient to represent the scale-dependent (and thus non-fractal) roughness of topographic surfaces [Weibel and DeLotto, 1988].

Figure 4.1 Forms of surface roughness after Mark [1974]. The numbers refer to the relation of the amplitudes and wavelengths between the profiles (e.g., the amplitude of the second profile is twice that of the first and the wavelength of the first profile is twice that of the third).

In the following sections, methodologies for quantitative topographic characterization for DEM error modelling will be reviewed and presented first. Then, the principles of hierarchical terrain classification will be discussed. Characterization of the variation and complexity of terrain will be achieved by means of both local measures and global measures.
The local measures include general geomorphometric parameters such as slope and local relief. The global characteristics will be identified using the grain measure, spectral analysis, nested analysis of variance and fractal analysis of DEMs. The rationale for determining both local and global measures of surface roughness from DEMs is that: (i) the local measures are required for automatic terrain classification, that is, to identify various terrain clusters and, therefore, to relate the spatial pattern of DEM errors to the roughness variation of the surface; and (ii) the global measures are used to provide 'context' and to allow comparisons to be made with regard to the DEM error modelling for various surfaces (93G and 93H) and resolutions.

4.2 Extraction of the Local Geometric Measures

While mass-production of DEMs has all but replaced the laborious topographic data capture that once restricted the quantitative description of land form, new computer methods are automating the characterization of topography [Pike et al., 1988]. Automation is helpful in providing broad coverage by standardised methods and permitting comparisons of parameters for different areas. Review of the literature of general geomorphometry shows that all of its parameters may be defined in terms of altitude [Evans, 1972]. For example, relief is usually expressed as range in altitude, texture is the spacing of maxima or minima (or the shortest significant wavelength), grain is the longest significant wavelength, slope is the first derivative (rate of change) of altitude, convexity is the second derivative of altitude (rate of change of slope), and the hypsometric integral represents a measurement of the interrelation of area and altitude.

Geometric measures in general geomorphometry have often been extracted and evaluated from gridded DTMs through a method which is similar to texture analysis in image processing [Haralick et al., 1973].
That is, a sampling window of a certain size (e.g., 5x5 or 7x7) is moved through the elevation matrix. As the window visits each matrix element, geomorphometric parameters are calculated from all elevation elements in the window, and these values are assigned to the center point. Then, each point that is visited by the window is characterized by a multivariate description of the local surface geometry.

As the primary purpose of this study is not to formulate new geometric measures or to evaluate the relative capabilities of different parameters, the following seven groups of established parameters for the description of the roughness of terrain are identified and reviewed based on the results of the previously mentioned studies. Each of these parameters is later extracted from the study subarea DEMs using different moving window sizes to demonstrate the feasibility of the concepts and techniques that are presented.

4.2.1 Local relief (LR)

'Relief' is a concept usually used to describe the vertical extent of topography. There is no generally accepted definition of 'relief' [Evans, 1972], but range in altitude is most commonly used and is referred to as 'local relief' by Mark [1974] or 'relative relief' by Smith [1935]. Local relief is calculated as the difference between the highest and lowest elevations. That is,

\[ LR = z_{\max} - z_{\min} \tag{4.1} \]

Local relief is always defined with respect to some particular area and there have been various ways of defining the area within which range is to be measured. In most cases, local relief is determined for arbitrarily-bounded sample areas such as squares, circles, or latitude-longitude quadrangles; local relief has also frequently been determined for drainage basins and 'hills' [Mark, 1974].
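The moving-window extraction of local relief can be illustrated with a minimal sketch. It assumes a DEM held as a list of equal-length rows and leaves edge cells (where the window would extend past the grid) undefined; the function name `local_relief`, the parameter `size`, and the sample elevations are hypothetical and are not taken from the thesis.

```python
def local_relief(dem, size=3):
    """Local relief (equation 4.1) in a size x size moving window.

    dem  : list of equal-length rows of elevations (assumed layout)
    size : odd window width in grid cells (hypothetical parameter)
    Cells whose window would cross the grid edge are left as None.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = [dem[i][j]
                      for i in range(r - half, r + half + 1)
                      for j in range(c - half, c + half + 1)]
            # LR = z_max - z_min, assigned to the window's center cell
            out[r][c] = max(window) - min(window)
    return out


# Illustrative 4 x 4 elevation matrix (made-up values, in metres)
dem = [
    [100, 102, 105, 110],
    [101, 104, 108, 115],
    [103, 107, 112, 120],
    [106, 111, 118, 128],
]
relief = local_relief(dem, size=3)
```

The same loop can carry any other window statistic (for instance the standard deviation of altitude) by replacing the single line that computes the range.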
4.2.2 Standard deviation of altitude (SD)

Evans [1972] observed that most relief measures depend upon the use of extreme values of the distribution of elevations within the sample area, as in LR above, and would therefore be sensitive to even minor variations in estimations of these values. To describe the dispersion of a distribution or, in particular, to characterize the vertical dimension of a topographic surface, the standard deviation is a more powerful and stable statistic than the range since it is based on all values and not just the two extremes. Evans, thus, proposed use of the standard deviation of altitude as the relief measure. But he also noted that "... the autocorrelation of altitude admittedly makes range more reliable than it is for random variables, since on a continuous surface all intermediate values between the extremes must be represented [p.31]." Most of the previous studies did not use the statistically preferable standard deviation, generally because of its high computational demand with manual methods. Automation of parameter extraction from DEMs using computers has made the effort required to calculate standard deviation insignificant. The standard deviation of altitude is calculated by the following formula:

\[ SD = \sqrt{\frac{\sum_{i=1}^{n} (z_i - \bar{z})^2}{n - 1}} \tag{4.2} \]

where $\bar{z}$ is the mean elevation within the moving window and $n$ is the number of grid points within the window.

4.2.3 Slope and aspect (α and β)

Slope measures have been considered as the most important type of parameters by geomorphologists and are, thus, most widely used. Evans [1972], for example, stated that "... slope is perhaps the most important aspect of surface form, since surfaces are composed completely of slopes, and slope angles control the gravitational force available for geomorphic work." Unlike relief and most other geomorphometric parameters, which are normally defined for sample areas, slope is theoretically defined at every point.
Slope at a point is defined in terms of a plane, characterized by its gradient and aspect, tangential to the surface at that point [Evans, 1972]. Mathematically, the tangent of slope (tan α) is the first derivative of altitude (i.e., the rate of change of altitude with distance). In practice, however, slope is generally measured over a finite distance [Mark, 1974] and, thus, involves considerable sampling problems. The size of area over which slope is measured will influence the values obtained. The effects of different sampling intervals on slope calculations have been discussed by Gerrard and Robinson [1971]. In a discrete DEM representation of a continuous surface, slope calculations will depend on the resolution of the grid. Obviously, the accuracy of slope calculations at a point will decrease with coarser grid resolutions. The rate of this decrease in accuracy will depend on the spatial variability of the surface. Certainly deriving slope from the square grid method is safe if, in accord with the sampling theorem, the grid size is less than half the shortest wavelength of variability present. Evans [1972] concluded that grid sizes of 20, 50 or 100 m are suitable for the study of mesorelief. If elevation is sampled with a resolution of 1 km or coarser, Evans suggested that slope and convexity be measured separately at each point, and not derived from the altitude matrix.

There are various techniques that could be used to calculate slope and aspect in a DEM. Most methods are based on a 3 by 3 moving window which traverses the DEM. Following DeLotto [1989], the method used in this study is illustrated in Figure 4.2. This method defines gradient as the maximum gradient of either the steepest drop or the steepest rise from the center cell to the eight nearest cells.
Aspect is then the direction (in 45° intervals) of the maximum gradient.

z1 (315°)    z2 (0°)        z3 (45°)
z4 (270°)    z5 (center)    z6 (90°)
z7 (225°)    z8 (180°)      z9 (135°)

Figure 4.2 3 by 3 neighborhood of points for slope calculation.

That is, given the elevations for a 3 by 3 neighbourhood, a simple slope angle calculation for the center point can be obtained for each of the four directions with edge-contact neighbors with the formula:

\[ \alpha_i = \arctan\left(\frac{z_i - z_5}{d}\right) \tag{4.3} \]

where $z_i$ is the elevation of each of the appropriate neighbors and $d$ is the grid resolution. For each of the four other directions with corner-contact neighbors, the formula is:

\[ \alpha_i = \arctan\left(\frac{z_i - z_5}{\sqrt{2}\, d}\right) \tag{4.4} \]

Once these eight angles have been calculated at a point, the slope for that point is then defined as the maximum of the eight angles, that is, $\alpha = \max(\alpha_i)$, and the aspect (β) is the direction of the gradient clockwise from north as seen in Figure 4.2. For example, if the maximum gradient is in the direction of cell $z_3$ then aspect = 45°. This is only a crude measure of aspect and is likely to be influenced by DEM error.

4.2.4 Roughness factor (RF)

The study of the dispersion of slope angle and direction using three-dimensional vector analysis gives rise to another surface roughness parameter. Hobson [1972] uses the distribution of planes to describe the three-dimensional orientation of surfaces within an area, treating the perpendiculars to slope units as vectors and applying well-established mathematical approaches to the analysis of directional data. As illustrated in Figure 4.3 (after Hobson [1972]), the test area is simulated by a set of intersecting planar surfaces or triangular facets formed by inserting diagonals into a regular grid in an altitude matrix. Normals to these planes are represented by unit vectors. Vector mean, vector strength and vector dispersion are computed using methods defined by Fisher [1953].
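The facet construction just described can be sketched as follows. This is only an illustration under stated assumptions: each grid cell is split along one fixed diagonal (the diagonal direction actually used is not given here), normals are oriented upward, and the names `facet_normals` and `resultant_length` are hypothetical. The resultant length R and the number of vectors N returned by the sketch are the quantities on which the vector strength and dispersion measures are built.

```python
import math


def facet_normals(dem, d):
    """Upward unit normals of the two triangular facets in each grid cell.

    dem : list of equal-length rows of elevations (assumed layout)
    d   : grid spacing, in the same units as the elevations
    Each cell is split along the diagonal joining its (r, c+1) and
    (r+1, c) corners; this choice is an assumption of the sketch.
    """
    normals = []
    for r in range(len(dem) - 1):
        for c in range(len(dem[0]) - 1):
            z00, z01 = dem[r][c], dem[r][c + 1]
            z10, z11 = dem[r + 1][c], dem[r + 1][c + 1]
            # Cross products of the facet edges, divided by d:
            # facet at corner (r, c) and facet at corner (r+1, c+1)
            for nx, ny in ((z00 - z01, z00 - z10), (z10 - z11, z01 - z11)):
                length = math.sqrt(nx * nx + ny * ny + d * d)
                normals.append((nx / length, ny / length, d / length))
    return normals


def resultant_length(normals):
    """Length R of the vector sum of the unit normals."""
    sx = sum(n[0] for n in normals)
    sy = sum(n[1] for n in normals)
    sz = sum(n[2] for n in normals)
    return math.sqrt(sx * sx + sy * sy + sz * sz)


# A tilted plane: all normals parallel, so R equals N
plane = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
plane_normals = facet_normals(plane, d=10)
r_plane = resultant_length(plane_normals)

# An 'egg-carton' surface: normals point in many directions, so R < N
rough = [[0, 10, 0], [10, 0, 10], [0, 10, 0]]
rough_normals = facet_normals(rough, d=10)
r_rough = resultant_length(rough_normals)
```

For the tilted plane the resultant length equals the number of facets, while for the rough surface the horizontal components cancel and R falls well below N.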
Vector strength indicates the length of the resultant sum of the unit vectors (R) and is obtained by using the direction cosine method. Dispersion (1/K), on the other hand, indicates the variability or spread of the unit vectors in space and is similar, in some respects, to 'standard deviation.' K is calculated as:

\[ K = \frac{N - 1}{N - R} \tag{4.5} \]

where N is the number of vectors. As a surface approaches planarity (a roughness of zero), the vectors will become parallel, R will approach N, and K will become infinite. Thus the inverse of K gives an intuitive roughness measure. Vector strength is usually high and vector dispersion low in areas characterized by similar elevations (see Figure 4.3-B) or equal rates of elevation change, whereas non-systematic elevation changes yield low vector strength and high vector dispersion (see Figure 4.3-C).

Extending Hobson's work, Mark [1974] proposes that the best measure of vector dispersion roughness is the roughness factor (RF), defined by:

\[ RF = 100 - L(\%) \tag{4.6} \]

where L(%) is 100(R/N), the vector strength in per cent. In the case of large N, RF will approximately equal 100 times the inverse of K. The following equation can also be derived and will be the method used in this study:

\[ RF = 100\,(1 - \cos\alpha) \tag{4.7} \]

4.2.5 Slope curvature (SC)

The slope curvature (or the local convexity according to Evans [1972]) is the rate of change of slope, that is, mathematically, the first derivative of slope or the second derivative of altitude. It describes the convexity, concavity or straightness of slope profiles and comprises downslope (vertical) convexity and cross-slope (horizontal) convexity. It can be obtained by fitting a quadratic trend surface to the 3 by 3 neighbourhood of each sample point in the altitude matrix and then differentiating the resulting quadratic function for the surface twice, just as slope and aspect can be obtained by fitting a linear surface.
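As an illustration of the quadratic-fit idea, the sketch below computes the second derivatives of the least-squares quadratic surface fitted to a 3 by 3 window. The closed-form coefficients are the standard least-squares solution for a regular 3 by 3 grid; they are supplied here as an assumption, since the text does not list them. The function name and the cell numbering (row-major, north row first, x increasing eastward, y northward) are likewise assumptions of the sketch.

```python
def quadratic_second_derivatives(win, d):
    """Second derivatives of the least-squares quadratic fitted to a
    3 x 3 window of elevations.

    win : 3 rows of 3 elevations, north row first (assumed layout)
    d   : grid spacing
    Returns (zxx, zyy, zxy) for the fitted surface
    z = A x^2 + B y^2 + C xy + D x + E y + F, i.e. (2A, 2B, C).
    """
    (z1, z2, z3), (z4, z5, z6), (z7, z8, z9) = win
    # Outer columns minus twice the middle column
    zxx = (z1 + z3 + z4 + z6 + z7 + z9 - 2 * (z2 + z5 + z8)) / (3 * d * d)
    # Outer rows minus twice the middle row
    zyy = (z1 + z2 + z3 + z7 + z8 + z9 - 2 * (z4 + z5 + z6)) / (3 * d * d)
    # Difference of the two diagonals' corner sums
    zxy = (z3 + z7 - z1 - z9) / (4 * d * d)
    return zxx, zyy, zxy


# z = x^2 sampled at x, y in {-1, 0, 1}: curvature only in x
trough = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]
d2_trough = quadratic_second_derivatives(trough, d=1)

# A tilted plane z = 3x + 2y + 7 has no curvature at all
plane = [[3 * x + 2 * y + 7 for x in (-1, 0, 1)] for y in (1, 0, -1)]
d2_plane = quadratic_second_derivatives(plane, d=1)
```

Downslope and cross-slope convexity would then be obtained by combining these second derivatives with the fitted gradient; that step is omitted from the sketch.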
Standard deviation of slope is another expression of degree of curvature, although it does not distinguish convexity from concavity. That is

\[ SC = \sqrt{\frac{\sum_{i=1}^{n} (\alpha_i - \bar{\alpha})^2}{n - 1}} \tag{4.8} \]

4.2.6 Number of higher points (HP)

A number of non-standard measures of the vertical component of topography have been reported in the literature. One of the measures, for example, involves computation of differences in elevation between the center point of a moving window and the computed mean for that window. Another example involves measurement of the mean difference between the center cell and the elevation of cells in a window that are higher than the center cell. The number of points in a window that are higher than the center point (HP) is another such measure used in some studies [Weibel and DeLotto, 1989]. This measure emphasizes terrain discontinuities even more than drainage density or profile convexity do.

4.2.7 Hypsometric integral (HI)

In order to have an adequate measure of the 'dissection' or 'aeration' of a landscape, that is, the extent to which it has been opened up, especially by erosion [Clarke, 1966], geomorphologists have proposed several measures to describe aspects of the distribution of landmass with elevation. Most of these measures are based on the hypsometric curve, that is, the cumulative frequency curve of elevation, which was first developed by Imamura [1937]. The hypsometric integral, proposed by Strahler [1952] to represent the area under the dimensionless hypsometric curve (also termed the relative or percentage hypsometric curve by Mark [1974]), is now the most widely used coefficient of dissection. It is given by:

\[ HI = \int_{0}^{1} a(h)\, dh \tag{4.9} \]

where $a(h)$ is the relative area above a height and $h$ is the relative height (altitude above the lowest point), defined by:

\[ h = \frac{z - z_{\min}}{z_{\max} - z_{\min}} \tag{4.10} \]

where $z$ is the actual elevation, and $z_{\min}$ and $z_{\max}$ are the lowest and highest elevations, respectively, within the area.
In an effort to reduce the tedium inherent in the calculation of the hypsometric integral measure, Pike and Wilson [1971] showed that the elevation-relief ratio (E) of Wood and Snell [1960] is mathematically equivalent to the hypsometric integral. Therefore, it can be estimated as:

\[ HI = E = \frac{\bar{z} - z_{\min}}{z_{\max} - z_{\min}} \tag{4.11} \]

where $\bar{z}$ is the mean elevation within the area. According to Wood and Snell [1960], the measure E expresses the relative proportion of upland to lowland within a sample region. As pointed out by Evans [1972], this expression is also termed the "relative coefficient of massiveness" by Merlin [1965].

Pike and Wilson [1971, p.1081] stated that "... experience has shown that a sample of 40 to 50 elevations will ensure accuracy of E to, on the average, 0.01, the value to which area-altitude parameters customarily are read." Mark [1974] pointed out that Evans's [1972] estimates of HI using grid values, at least for the smaller sub-matrices such as 3 by 3 (9 points), are probably in serious error. Two extensive investigations of terrain geometry by Wood and Snell [1960] and Pike [1963] showed that topographic samples might resemble one another with respect to local relief, average slope, or other geometric aspects, and yet vary appreciably in appearance as demonstrated by different values of E. Identical E values can also represent dissimilar terrain types. It is, thus, necessary to refer to complementary geometric parameters to obtain a comprehensive and meaningful description of a topographic surface. E usually ranges from 0.15 to 0.85, with values tending to cluster between 0.40 and 0.60. Low values occur in terrains characterized by isolated relief features standing above extensive level surfaces, whereas high E values describe broad, somewhat level, surfaces broken by occasional depressions [Pike and Wilson, 1971].
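A minimal sketch of the elevation-relief ratio in equation (4.11) is given below, applied to two made-up nine-point samples that mimic the two terrain types just described. The function name and the sample values are hypothetical; with only nine points the estimates are illustrative rather than reliable, in line with the caution about small sub-matrices noted above.

```python
def elevation_relief_ratio(elevations):
    """Elevation-relief ratio E = (mean - min) / (max - min), eq. (4.11).

    Returns None for a perfectly flat sample, where E is undefined.
    """
    z_min, z_max = min(elevations), max(elevations)
    if z_max == z_min:
        return None
    z_mean = sum(elevations) / len(elevations)
    return (z_mean - z_min) / (z_max - z_min)


# Isolated peak standing above a level surface: low E expected
isolated_peak = [20, 20, 20, 20, 100, 20, 20, 20, 20]
e_peak = elevation_relief_ratio(isolated_peak)

# Level upland broken by a single depression: high E expected
broken_plateau = [100, 100, 100, 100, 20, 100, 100, 100, 100]
e_plateau = elevation_relief_ratio(broken_plateau)
```

The two samples share the same local relief (80 m) yet give E of about 0.11 and 0.89, which shows why E is informative only alongside complementary measures.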
In summary, some of the seven variables discussed above might actually describe a similar aspect of the roughness of terrain and are, thus, redundant because of correlation. If variables are highly correlated, they add little new information to the multivariate description of the surface characteristics. Therefore, a variable selection process is necessary before they are used in further analyses. According to Pike [1988a,b], five groups of geometric measures might be distinguished: (i) statistics of altitude; (ii) variables of the power spectrum of altitude; (iii) statistics of slope at a variable horizontal length; (iv) statistics of slope at a constant horizontal length; and (v) statistics of slope curvature at constant length. Various studies by Evans [1972], Mark [1975b] and Pike [1986] have shown that correlation between variables of different groups is negligible (i.e., they are complementary and explain different aspects of topography). Correlations among variables within the same group, however, are significant and thus redundancies exist.

4.3 Global Surface Characterization

While the 'moving window' method used above is computationally attractive and has been employed in many studies of terrain classification, the scale-dependent nature of topography limits its usefulness. Several studies have demonstrated the sensitivity of general geomorphometric parameters, such as slope, to changes in DEM resolution [Dubayah and Davis, 1988; Corbett and Gersmehl, 1987]. Besides, selection of the size of the moving window has, in many instances, been a somewhat arbitrary decision [Mark, 1975b], with the only criterion being a priori knowledge of the approximate size of landforms [Weibel and DeLotto, 1989].
If the moving window is too small, the statistical significance of the resulting parameters will be too low because of the small sample size; if the window is too large, smaller landforms may not be discriminated because geometric signatures are averaged out by the excessive size of the window. Different global characterization techniques, such as the grain measure, spectral analysis, nested analysis of variance and fractal analysis, have been proposed to guide the selection of an appropriate window size based on the surface characteristics. The following sections discuss these techniques.

4.3.1 Grain

The above-mentioned local derivatives of elevation at a point, and moments of their distribution over an area, cover all geomorphometric concepts except for horizontal variation. Grain has been used generally to describe in some way the scale of horizontal variations in the terrain. Grain is used to refer to the size of area over which the other geomorphometric parameters are to be measured and is dependent on the spacing of major ridges and valleys [Wood and Snell, 1960]. The grain of a surface can be determined by calculating the local relief within concentric circles around a randomly-located point. According to Wood and Snell, the grain exists at a 'knick point' in the plot of local relief against circle diameter, which is the point where the curve representing relief stops increasing dramatically with distance. The diameter at the knick point will then be the grain. It indicates the characteristic wavelength or optimal sampling area of the topographic surface.

4.3.2 Spectral analysis of terrain

Periodic landforms have been described often in the literature. It has been shown that regular spacing of landform elements may actually be a dominant aspect in many types of topography, including meandering river valleys, ridge-and-valley terrain, sand ripples and dunes, and some drumlin fields.
Information about horizontal variations in the topography may be obtained by study of the autocorrelation properties of elevation and its derivatives [Evans, 1972]. Topographic signatures could be made more discriminating by including a variable that is sensitive to the periodicity of terrain.

The spectrum is a quantity well-known to electrical engineers. It is used in spectral analysis to examine the power content at different frequencies or wave lengths in signals. Spectral analysis was first used on terrain by engineers to determine surface roughness for military purposes [Pike and Rozema, 1975]. The basic premise of spectral analysis is that by transforming the data from the time, h(t), or space, h(x), domain into the frequency domain, H(f), certain types of interpretation and manipulation may be more easily performed. For every space domain feature there is a frequency equivalent which consists of one or more sinusoidal frequency components having distinctive wavelengths, phase leads or lags, and directions. Both domains contain exactly the same information [Robinson, 1973]. The representation of the terrain in the frequency domain greatly simplifies the separation of various surface forms or features of various sizes and amplitudes.

Fourier transformation is most commonly used for frequency analysis. The two-dimensional Fourier transformation and inverse Fourier transformation equations can be expressed as:

\[ H(f_1, f_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)\, e^{-2\pi i (f_1 x + f_2 y)}\, dx\, dy \tag{4.12} \]

\[ h(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} H(f_1, f_2)\, e^{2\pi i (f_1 x + f_2 y)}\, df_1\, df_2 \tag{4.13} \]

where the data surface $h(x, y)$ has coordinates $(x, y)$, and the spectral array $H$, which is complex, has frequency (or wave number) coordinates $(f_1, f_2)$. For discretely sampled data, a summation sign would replace the integral and equation (4.12) becomes:

\[ H(n_1, n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} h(k_1, k_2)\, e^{-2\pi i \left( \frac{k_1 n_1}{N_1} + \frac{k_2 n_2}{N_2} \right)} \tag{4.14} \]

where: $h(k_1, k_2)$ = elevation at $(k_1, k_2)$ on the surface; $N_1$, $N_2$ = total number of elevation sample points along each direction.
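To make the discrete transform in equation (4.14) concrete, the sketch below evaluates the double sum directly for a small synthetic surface. It is a naive implementation for illustration only (its cost grows with the square of the number of grid points) and is not the Splus routine used in the thesis; the sign convention of the exponent is taken to follow equation (4.12), and the function name `dft2` and the test surface are hypothetical.

```python
import cmath
import math


def dft2(h):
    """Direct evaluation of the two-dimensional discrete Fourier
    transform of a rectangular elevation array h (list of rows)."""
    n1, n2 = len(h), len(h[0])
    spectrum = [[0j] * n2 for _ in range(n1)]
    for a in range(n1):          # wave-number index along direction 1
        for b in range(n2):      # wave-number index along direction 2
            total = 0j
            for k1 in range(n1):
                for k2 in range(n2):
                    phase = -2 * math.pi * (a * k1 / n1 + b * k2 / n2)
                    total += h[k1][k2] * cmath.exp(1j * phase)
            spectrum[a][b] = total
    return spectrum


# Synthetic 'ridge-and-valley' surface: one cosine wave of wavelength
# 4 cells running along direction 2, constant along direction 1
surface = [[math.cos(2 * math.pi * k2 / 4) for k2 in range(4)]
           for _ in range(4)]
spectrum = dft2(surface)
amplitude = [[abs(v) for v in row] for row in spectrum]
```

All of the amplitude falls at wave-number indices (0, 1) and (0, 3), the pair corresponding to the single wavelength and orientation present in the surface, while every other entry is zero to within rounding error.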
The two-dimensional Fourier transformation involves the fitting, by using the least-squares criterion, of a set of parallel sinusoidal waves of varying wave length and orientation. The transformed coefficients define the amplitude, phase (point at which the particular wave intersects the origin) and orientation of each wave. The original elevation data array is replaced by a new array of the same size which describes the power information content (or magnitudes) at different wave lengths in the fluctuations of the terrain surface. The coordinates are now wave length rather than distance from an origin. The amplitude of the wave is a measure of the variability at that wave length. The transformed array, therefore, presents a ranking of the variability as a function of magnitude and direction of scale [Rayner, 1972]. The inverse Fourier transformation can simply transform the ranked variabilities back to space domain data.

The continuous variance spectrum can be estimated from the autocorrelation results by integrating the autocovariance function for all lag intervals according to one of the Fourier functions. Theoretically, the actual continuous spectrum of a random series can only be determined from an infinite record. Most spectra, however, are derived from only small samples of much larger data populations. Therefore, many statistical considerations are important in the selection of an appropriate procedure to estimate the true spectrum from a sample of finite length.

First, spectral theory assumes that the series examined is stationary; that is, the mean and higher moments of the series are time (or space) invariant. The statistical properties of a topographic profile must remain unaffected by any change in its location or orientation.
Long and regional topographic trends may cause nonstationarity and distort other parts of the spectrum by dramatically increasing the variance in all wave length bands, and, thus, should be removed or filtered out before the variance spectrum can be estimated.

The second consideration is the aliasing problem in spectral analysis, which involves the distortion of the 'true' spectrum by wave lengths less than twice the sampling interval. Aliasing error is inherent in all topographic profiles or surfaces which are constructed from sampled data. The distance between sample points limits the range of frequencies that can be determined. The highest detectable frequency, the 'Nyquist,' has a wavelength of twice the sampling interval. Sampling theory shows that frequencies higher than this limit may be present in the data set but they cannot be detected or measured. Their amplitudes, however, appear as aliased additions to the amplitudes of lower and presumably more accurately measured frequencies. Aliasing can often be overcome by choosing a sampling interval no larger than half the size of the smallest feature to be detected. Alternatively, it can be minimized by using a digital band pass filter, which extracts a limited band of frequencies. Filtering is accomplished by multiplying the amplitudes of the frequency components by some value. Usually the multiplier is zero for frequencies that are to be deleted and unity for those that are to be retained unchanged [Robinson, 1973].

The third consideration is tapering the finite-length series. It is often desirable to taper a random series at each end by multiplying the series by a 'data window,' analogous to multiplying the correlation function by a lag window [Otnes and Enochson, 1978]. It is equivalent to applying a convolution operation to the 'raw' Fourier transform.
The purpose of tapering, when looked at from its frequency domain effect, is to suppress large side lobes in the effective filter obtained with the raw transform. When viewed from the space domain, the effect of tapering is to 'round off' potential discontinuities at each end of the finite segment of a long series being analyzed.

By means of the spectrum, a statistical plot of the wave length of terrain undulations against the variance (square of the amplitude) of the different size-ranges of terrain undulations, spectral analysis can ascertain and express the spacing of topographic periodicities and measure two other landform properties: absolute terrain roughness and the relation of large-scale to small-scale roughness characteristics [Pike and Rozema, 1975].

Two kinds of spectral analysis can be used in topographic studies: the one-dimensional spectrum for a topographic profile, and the two-dimensional spectrum for a matrix of elevations. Each method has its advantages and disadvantages. The major advantage of two-dimensional terrain spectral analysis is that it explicitly retains any directionality in the topography [Steyn and Ayotte, 1985] and thus provides the most complete representation of the scales of terrain variability. Anisotropy is often encountered in areas with regional geologic or structural fabrics. Fourier domain approaches can be used in which topographic anisotropy would be plainly visible in two-dimensional power spectra, that is, the isopleth plots of spectral amplitude in wavenumber space. Perfectly circular isopleths centred on the (0, 0) point in wavenumber space would represent a totally directionless topography. Any systematic deviation from circularity would then indicate directional bias.

Figure 4.4 illustrates an example of two-dimensional spectral analysis of the B.C. 5' DEM. A standard two-dimensional discrete Fourier transform function, fft, in the Splus software package was used for the analysis.
Before the Fourier transform analysis, the data arrays were first averaged, and the average elevation subtracted from each point in order to remove the large amplitude spike at zero wavenumber which represents the average terrain elevation. Then all topographic trends were removed and a cosine taper was applied to the resulting DEM matrices in order to reduce the variance introduced to the spectral estimates by the edge discontinuity. The spectrum in Figure 4.4 has symmetry about the origin and it demonstrates a considerable directional bias. It shows marked anisotropy, with variance contributions extending to much greater wavenumbers in a northeast to southwest direction. Such a pattern represents a strongly ordered terrain with a series of ridges and valleys running in a northwest to southeast direction which, in this case, represents the structural lineation of the Coast and Rocky Mountains and the trellis drainage pattern.

One disadvantage of two-dimensional spectral analysis is that, instead of providing a single measure of surface wave length, it produces a matrix of power at various wave lengths and directions. One-dimensional spectral techniques are, therefore, of continuing value, for dealing with either the average of the two-dimensional spectrum over all directions, or the spectrum of profiles across or along the 'grain' (i.e., the longest significant wave length in the topography) of the surface. In the case of directionless topography, an isotropic spectrum can simply be collapsed to the simpler, one-dimensional case. The one-dimensional spectrum organizes the geometry of a terrain profile according to different sizes (or wave lengths) of topographic undulations.
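The preprocessing sequence described above (mean and trend removal, cosine tapering, transformation, and averaging of the two-dimensional spectrum over all directions) can be sketched in Python as follows. This is only an illustration under stated assumptions: the trend is taken to be a least-squares plane, the taper is a separable Hann window, and the synthetic surface `z` and all function names are hypothetical. The thesis analysis itself used the Splus fft function.

```python
import numpy as np

def preprocess(z):
    """Remove the mean and a least-squares plane, then apply a cosine taper."""
    rows, cols = z.shape
    y, x = np.mgrid[0:rows, 0:cols]
    # Design matrix for the plane a + b*x + c*y (the mean is absorbed by a).
    A = np.column_stack([np.ones(z.size), x.ravel(), y.ravel()])
    coef, *_ = np.linalg.lstsq(A, z.ravel(), rcond=None)
    resid = z - (A @ coef).reshape(z.shape)
    # Separable Hann (cosine-bell) window: an assumed choice of taper.
    taper = np.outer(np.hanning(rows), np.hanning(cols))
    return resid * taper

def power_spectrum_2d(z):
    """Squared amplitude of the 2-D DFT, zero wavenumber shifted to the centre."""
    return np.abs(np.fft.fftshift(np.fft.fft2(z))) ** 2

def radial_average(power):
    """Average the 2-D spectrum over all directions at each integer radius."""
    rows, cols = power.shape
    y, x = np.indices(power.shape)
    r = np.hypot(y - rows // 2, x - cols // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / counts

# Hypothetical surface: a regional tilt plus parallel ridges of wavelength
# 8 samples along each axis, so elevation is constant along x + y = const.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
z = 500.0 + 2.0 * xx + 3.0 * yy + 20.0 * np.sin(2 * np.pi * (xx + yy) / 8.0)

P = power_spectrum_2d(preprocess(z))
peak = tuple(int(i) for i in np.unravel_index(np.argmax(P), P.shape))
radial = radial_average(P)
print(peak, int(np.argmax(radial)))
```

In this synthetic case the power concentrates along the wavenumber diagonal perpendicular to the ridges, which is the same kind of directional signature discussed for Figure 4.4. The radial average discards that directionality and keeps only the dominant wavenumber.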
Since the greatest vertical variability within an original topographic profile is almost always associated with the longer wave lengths, the spectral density function of the variance (i.e., the Fourier transform of the autocovariance function) diminishes rapidly with decreasing topographic wave length or increasing frequencies. According to Pike and Rozema [1975], each spectrum plot conveys mainly the following information:

1) Spikes in the plotted spectrum curve indicate periodicities of topography. A prominent peak in a topographic spectrum indicates that the original profile contains an unusually high number of terrain undulations within a restricted horizontal scale range, or perhaps that a few undulations at that wave length have unusually high local relief. While one would imagine that this periodic behaviour could be most easily detected by simply examining the original terrain profiles, Pike and Rozema explain that "the advantage of interpreting spectral peaks lies not in corroborating evident regularities, but rather in identifying unexpected or less obvious periodicities in the topography."

2) The overall displacement of the curve above the horizontal axis measures variance in elevation of the profile and indicates absolute roughness of the original topography at the indicated wave lengths.

3) The slope of the curve expresses the relative importance of large-scale and small-scale landforms. Steeper spectral curves mean proportionately less roughness in small topographic features than in larger features. A lower spectral slope indicates that small topographic features are rougher than large features, a common natural condition.

In general, according to Frederiksen et al. [1985], if the slope of the spectrum is larger than 2.5, the landscape is smooth because of the absence of high amplitudes at high frequencies. On the other hand, a slope less than 2.0 indicates a rough surface with relatively large variations at high frequencies.
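The spectral slope referred to in item 3) can be estimated by fitting a straight line to the one-dimensional spectrum on log-log axes. The sketch below is a hedged illustration only: the profile is a synthetic random walk (for which the slope should come out close to 2), the Hann taper and the frequency cut-off `max_freq` are assumed choices, and the function name `spectral_slope` is hypothetical.

```python
import numpy as np

def spectral_slope(profile, max_freq=0.1):
    """Negative log-log slope of the periodogram below max_freq (cycles/sample)."""
    n = len(profile)
    z = (profile - np.mean(profile)) * np.hanning(n)   # remove mean, then taper
    power = np.abs(np.fft.rfft(z)) ** 2                # raw periodogram
    freq = np.fft.rfftfreq(n)                          # cycles per sampling interval
    keep = (freq > 0) & (freq <= max_freq)             # drop zero frequency and high end
    slope, _ = np.polyfit(np.log10(freq[keep]), np.log10(power[keep]), 1)
    return -slope

rng = np.random.default_rng(0)
profile = np.cumsum(rng.standard_normal(4096))   # synthetic random-walk "profile"
s = spectral_slope(profile)
print(f"estimated spectral slope: {s:.2f}")
```

Under the rule of thumb quoted from Frederiksen et al. [1985], a value above 2.5 would be read as a smooth landscape and a value below 2.0 as a rough one.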
4.3.3 Nested analysis of variance

Another way of examining scale effects in digital terrain modelling is through the use of a nested analysis of variance technique. This method was first proposed by Moellering and Tobler [1972]. It is analogous to spectral analysis but more adapted to the types of data available in geography. That is, it can be used not only for situations in which the data are available at uniform intervals of space, but also for any nested hierarchical geographical data which are irregularly spaced. Human society generally is arranged into nested hierarchies such as governmental units (townships, counties, provinces, and nations) or census tracts. The geographical hierarchy orders different levels by areal size, and this can be taken as a surrogate for scale or resolution. Analyzing the data at different levels of the hierarchy is thus equivalent to analyzing the data at different geographical scales [Moellering and Tobler, 1972].

Analysis of variance (ANOVA) is done through a hierarchical investigation of a region, where different classes are not different zones in the region as in more conventional applications of ANOVA, but rather are different resolutions of the same zone. The idea is to find the resolutions (scales) where significant variability exists in the data, assuming that the best resolution for studying a phenomenon is the one "where the action is", or where it is not. This may give clues to the scale at which the phenomena of concern are operating.

Regular grids in a DEM present a simplified situation for the application of a nested ANOVA. Grouping of the grid cells is a natural way to form a hierarchy from such data, following the same concept as a quadtree structure. Figure 4.5 illustrates the nested hierarchical structure of a square matrix. As shown in the figure, the square grid cells nest in a ratio of 4:1 from the lower level to the higher level.
The four levels shown would serve as classes in a nested ANOVA procedure. Using these levels as classes, the variance at each resolution can be identified as a percent of the total variance by comparing the mean at one level to the mean of the level just above it. The total sum of squares is given by:

$$SS_{total} = \sum_i \sum_j \sum_k (X_{ijk} - \bar{X}_{...})^2 \tag{4.15}$$

where $X_{ijk}$ is the data value at each observation and $\bar{X}_{...}$ is the grand mean. The sum of squares at each level is given by:

$$\begin{aligned}
SS_1 &= \sum_i \sum_j \sum_k (\bar{X}_{i..} - \bar{X}_{...})^2 \\
SS_2 &= \sum_i \sum_j \sum_k (\bar{X}_{ij.} - \bar{X}_{i..})^2 \\
SS_3 &= \sum_i \sum_j \sum_k (X_{ijk} - \bar{X}_{ij.})^2
\end{aligned} \tag{4.16}$$

where each $\bar{X}$ represents the mean of the area as shown in Figure 4.5. The total sum of squares will equal the sum of components at each level. That is,

$$SS_{total} = SS_1 + SS_2 + SS_3 \tag{4.17}$$

The above equation shows how the variation around the grand mean may be partitioned into parts attributable to the various scale levels.

Figure 4.6 shows a set of artificially constructed data used by Moellering and Tobler to demonstrate this technique. The results of a nested ANOVA on that data are shown in Table 4.1, a typical analysis of variance table as developed by R.A. Fisher. In contrast to an ordinary analysis of variance procedure, the significance test using the F-ratio was omitted from the table by the authors because they assumed a complete enumeration of the population rather than just a sample. As is clearly indicated in the table, all of the "action" is on levels 1 and 4 while there is no "action" on levels 2 and 3.

Figure 4.5 Hierarchical structure of a square matrix (Levels 0 to 3).
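The partition of the total sum of squares among scale levels can be sketched in Python for a square grid whose side is a power of two. The array `z` below rebuilds the 16 by 16 artificial data set of Moellering and Tobler (Figure 4.6) from its repeating 2 by 2 patterns. The function names and the quadtree-style block averaging are illustrative assumptions rather than the original authors' code.

```python
import numpy as np

def block_means(z, size):
    """Mean of each size-by-size block of a square array."""
    n = z.shape[0]
    return z.reshape(n // size, size, n // size, size).mean(axis=(1, 3))

def nested_anova(z):
    """Sum of squares and degrees of freedom at each level of a 4:1 hierarchy."""
    n = z.shape[0]
    n_levels = int(np.log2(n))
    parent = np.full((1, 1), z.mean())          # level 0: the grand mean
    table = []
    for level in range(1, n_levels + 1):
        size = n // 2 ** level                  # block side at this level
        current = block_means(z, size)
        expanded = np.kron(parent, np.ones((2, 2)))   # parent mean for each block
        ss = size * size * np.sum((current - expanded) ** 2)
        df = 4 ** level - 4 ** (level - 1)
        table.append((level, ss, df))
        parent = current
    return table

# Artificial data of Figure 4.6: 2/5 and 5/8 checkerboards arranged by quadrant.
low = np.tile([[2.0, 5.0], [5.0, 2.0]], (4, 4))
high = np.tile([[5.0, 8.0], [8.0, 5.0]], (4, 4))
z = np.block([[low, high], [high, low]])

table = nested_anova(z)
total = sum(ss for _, ss, _ in table)
for level, ss, df in table:
    print(level, ss, 100.0 * ss / total, df)
```

Run on these data, the sketch should reproduce the sums of squares in Table 4.1: 576 at levels 1 and 4, and zero at levels 2 and 3.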
2 5 2 5 2 5 2 5 5 8 5 8 5 8 5 8
5 2 5 2 5 2 5 2 8 5 8 5 8 5 8 5
2 5 2 5 2 5 2 5 5 8 5 8 5 8 5 8
5 2 5 2 5 2 5 2 8 5 8 5 8 5 8 5
2 5 2 5 2 5 2 5 5 8 5 8 5 8 5 8
5 2 5 2 5 2 5 2 8 5 8 5 8 5 8 5
2 5 2 5 2 5 2 5 5 8 5 8 5 8 5 8
5 2 5 2 5 2 5 2 8 5 8 5 8 5 8 5
5 8 5 8 5 8 5 8 2 5 2 5 2 5 2 5
8 5 8 5 8 5 8 5 5 2 5 2 5 2 5 2
5 8 5 8 5 8 5 8 2 5 2 5 2 5 2 5
8 5 8 5 8 5 8 5 5 2 5 2 5 2 5 2
5 8 5 8 5 8 5 8 2 5 2 5 2 5 2 5
8 5 8 5 8 5 8 5 5 2 5 2 5 2 5 2
5 8 5 8 5 8 5 8 2 5 2 5 2 5 2 5
8 5 8 5 8 5 8 5 5 2 5 2 5 2 5 2

Figure 4.6 A numerical example of a 16 by 16 even hierarchy (after Moellering and Tobler, 1972)

Table 4.1 Scale variance for the artificial data in Figure 4.6

Scale level   Sum of squares   Percent of total SS   DF    Mean square   Scale variance component
4             576.0            50.0                  192   3.0           3.0
3             0.0              0.0                   48    0.0           0.0
2             0.0              0.0                   12    0.0           0.0
1             576.0            50.0                  3     192           3.0
0             1152.0           100.0                 255   -             -

4.3.4 Description of Terrain as a Fractal Surface

Terrain models can be described by the concept of self-similarity and fractals. 'Fractals,' or fractional dimensions, were first presented by Mandelbrot [1975] as a new mathematical basis for describing many complex scale-invariant natural patterns. Mandelbrot introduced the term 'fractal' specifically for temporal or spatial phenomena that are continuous but not differentiable, and that exhibit partial correlations over many scales [Burrough, 1981]. By studying various natural phenomena such as coastlines, Mandelbrot observed a form of invariance with respect to changes in scale over a wide range, and he introduced the concept of self-similarity to describe this phenomenon [Frederiksen et al., 1985].

By definition, self-similarity is a property of certain curves where each part of the curve is indistinguishable from the whole, or where the form of the curve is invariant with respect to scale. That is, a spatial feature is said to be self-similar if any part of the feature, when appropriately enlarged, is indistinguishable from the whole feature.
In the case of natural phenomena such as coastlines, self-similarity must be interpreted statistically; that is, each part of the curve should be statistically indistinguishable from the whole [Goodchild, 1980]. Either in a strict or a statistical sense, fractals are invariant as a result of transformations of scale [Mandelbrot, 1977].

This concept may be illustrated when measuring coastline length. Any objective method used to measure the length of the irregular line has an implied sampling interval in the form of the diameter of a wheel rolled along the line or the step size of a pair of dividers. It has been shown that over a wide range of scales there is a tendency for the length of a coastline to behave in a predictable manner such that when length is plotted against sampling interval on logarithmic scales the points tend to follow a straight line. Not only does more detail become apparent at larger scales, but it tends to do so at a predictable rate. Suppose that the length of the coastline is measured using dividers with step size $s_1$ and that $n_1$ such steps are found to span the line; then the length of the line will be estimated by $n_1 s_1$. If the process is repeated with a smaller step size, $s_2$, another length estimate, $n_2 s_2$, will be obtained and it will be greater than or equal to the previous estimate. The more irregular the line, the greater the increase in length between the two estimates. One may define the Hausdorff-Besicovitch dimension (D) of an irregular line as follows [Goodchild and Mark, 1987]:

$$D = \frac{\log(n_2 / n_1)}{\log(s_1 / s_2)} \tag{4.18}$$

A similar definition can be applied to the situation of using a square grid to measure the area of an irregular patch or a rough surface. The value of D characterizes the intricacy or the jaggedness of the entity. Lines have dimensions varying from 1 to 2 while surfaces are described by values of D ranging from 2 to 3.
As D increases towards the upper limit of the range, the entity becomes so rough and irregular that the line would take up the whole of a plane space and the surface would fill a volume space.

A fractal, according to Mandelbrot's original definition [1977], is a set for which the Hausdorff-Besicovitch dimension (D) strictly exceeds the topological dimension in Cartesian geometry (e.g., 0 for points; 1 for lines; 2 for areas); for such functions, D is often termed the fractal dimension. It appears, however, that most real spatial entities such as coastlines are not fractals in the pure sense of having a constant D but in a looser sense of exhibiting the behaviour associated with noninteger dimensions [Goodchild and Mark, 1987]. Mandelbrot's revised definition of fractals states more broadly that a fractal is a shape made of parts similar to the whole in some way.

Self-similarity is an important concept of fractal geometry and fractal scaling is easily demonstrated using self-similar fractals. In this compound term, while 'fractal' points to disorder and is compatible with intractable irregularity, 'self-similar' points to a kind of strict order [Mandelbrot, 1977]. However, the concepts of 'fractal' and 'self-similarity' should not be confused. In fact, not all self-similar objects are fractals. A straight line would be such an example.

Several different but related methods have been proposed to determine the fractal dimension of surfaces: the dividers method, the cell counting method, and the variogram method [Goodchild, 1980; Burrough, 1981; Shelberg et al., 1982; Shelberg et al., 1983; Mark and Aronson, 1984; Eastman, 1985; Roy et al., 1987; Clarke and Schweizer, 1991; Klinkenberg and Goodchild, 1992]. Among them, the cell counting and the variogram methods use the elevation values directly while the dividers method uses the contour lines derived from the elevations.
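As a small worked illustration of the dividers estimate in equation (4.18), the following Python sketch computes D from two divider measurements. The two cases used here, a Koch curve (where reducing the step size by a factor of 3 multiplies the number of steps by 4) and a straight line, are standard textbook examples chosen for illustration, and the function name `divider_dimension` is hypothetical.

```python
import math

def divider_dimension(s1, n1, s2, n2):
    """Equation (4.18): D = log(n2 / n1) / log(s1 / s2), with s2 < s1."""
    if not (s1 > s2 > 0 and n1 > 0 and n2 > 0):
        raise ValueError("expected step sizes s1 > s2 > 0 and positive step counts")
    return math.log(n2 / n1) / math.log(s1 / s2)

# Koch curve: a step one third as long needs four times as many steps.
d_koch = divider_dimension(1.0, 4, 1.0 / 3.0, 16)

# Straight line: halving the step size exactly doubles the number of steps.
d_line = divider_dimension(1.0, 10, 0.5, 20)

print(round(d_koch, 4), round(d_line, 4))
```

The straight line returns D = 1, equal to its topological dimension, which is consistent with the remark above that a self-similar object need not be a fractal.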
There is still controversy in the literature over the effectiveness of some of the different measurement methods and how much variation to expect between different methods. Therefore, the search for more robust methods of estimating fractal dimension is a topic of current research [e.g., Clarke and Schweizer, 1991].

Surface fractal models may be created through fractional Brownian motion (fBm) [Mandelbrot, 1975; Goodchild, 1982; Burrough, 1983] which in practice may be generated from fractional Gaussian noise. The fractional Brownian functions are characterized by variograms of the form:

$$E[(Z(x) - Z(x+d))^2] = C\,|d|^{2H} \tag{4.19}$$

where C is a constant, E[.] denotes the mathematical expectation of the argument, d is the distance (or lag) between two points, and Z(x), Z(x+d) are the elevation values observed at locations x and x+d respectively. A variogram is often shown as a plot of squared differences between paired observations, averaged by distance bins, against distance between paired observations.

The essence of the variogram technique is that the statistical variation of the elevations between samples is some function of the spacing between them. This technique is the principal tool of a method of spatial modelling known as regionalized variable theory and it is well known for determining the degree of spatial dependence between samples in the geosciences to model geological, geochemical and geophysical phenomena [Davis, 1986]. The term regionalized variable was coined by Matheron [1963] to describe variables whose variation in space is erratic and often unpredictable from one point to another. Yet, the behaviour of these variables is not completely random; values taken at neighboring points are related by a complex set of correlations reflecting the structure of the underlying phenomena. That is, a regionalized variable has properties intermediate between a purely deterministic process and a purely random one [Carr, 1995].
The variogram takes on the form of a power function in which the parameter H varies between 0 and 1, and relates to the fractal dimension of the surface through

$$D = 3 - H \tag{4.20}$$

H is an indicator of the surface complexity [Burrough, 1981]. The smaller H is, the larger D is and the more irregular the surface. Conversely, the larger H is, the smaller D is and the smoother the surface. In the case of profiles

$$D = 2 - H \tag{4.21}$$

The fractional Brownian model (fBm) provides a method of generating irregular, self-similar surfaces that resemble topography and have known fractional dimension [Mandelbrot, 1975]. It allows simulation of terrain with specific surface characteristics. Figure 4.7 shows example realizations of fBm for various values of H.

If the surface has fractal characteristics, then H can be estimated from the slope, $\beta$, of the best-fitting line produced from the plot of the mean-squared difference in the elevations as a function of the distance between samples on log-log scales. That is, $H = \beta/2$ or $D = 3 - (\beta/2)$. The ordinate-intercept of the best-fitting line represents the expected difference in elevation for points a unit distance apart and is referred to as the gamma value by Klinkenberg and Goodchild [1992]. It has been shown that gamma captures some aspect of the magnitude of the roughness of the surface and has significant correlations with the standard deviations of the elevations. The combination of the fractal dimension and the gamma value is anticipated to capture the essential characteristics of the land surfaces.

Of course, a real fractal surface is ultimately unmappable because by definition it includes variation at all spatial resolutions from zero to infinity. Fractal characteristics can only be assessed within a limiting maximum resolution (corresponding to the overall size of the sample) and minimum resolution (corresponding to the sampling interval) [Goodchild and Tate, 1992].
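The variogram estimate of H described above can be sketched for a single profile, for which D = 2 - H by equation (4.21). The following Python fragment is a minimal illustration under stated assumptions: the profile is a synthetic random walk (ordinary Brownian motion, for which H should be close to 0.5), the lags are an arbitrary geometric sequence, and the function names are hypothetical.

```python
import numpy as np

def variogram(z, lags):
    """Mean squared elevation difference at each lag along a profile."""
    return np.array([np.mean((z[lag:] - z[:-lag]) ** 2) for lag in lags])

def profile_fractal_dimension(z, lags):
    """Fit the log-log variogram; return D = 2 - H, H, and the fitted intercept."""
    gamma = variogram(z, lags)
    beta, intercept = np.polyfit(np.log10(lags), np.log10(gamma), 1)
    H = beta / 2.0                      # slope of the log-log line is 2H
    return 2.0 - H, H, intercept

rng = np.random.default_rng(1)
z = np.cumsum(rng.standard_normal(20000))      # synthetic Brownian profile
lags = np.array([1, 2, 4, 8, 16, 32, 64])

D, H, intercept = profile_fractal_dimension(z, lags)
print(f"H = {H:.2f}, D = {D:.2f}, log10 intercept = {intercept:.2f}")
```

The fitted intercept is the log of the mean squared difference at unit lag. Restricting the lags to the range between the sampling interval and a fraction of the profile length reflects the resolution limits noted above.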
If the statistical behaviour of the land surface is not similar to that of a fractional Brownian surface, then the variogram derived dimension may not represent the fractal dimension. One way of judging the goodness of fit of the self-similarity model is to observe the linearity of the plots used in the determination of the fractal dimension [Yokoya et al., 1989]. This can be accomplished by applying an interactive least-squares line fitting procedure during the analysis stage.

Self-similarity implies that the spatial structures inherent within a landscape repeat themselves at all scales, that is, a part of the whole resembles the whole [Clarke, 1987]. Empirical research results show, however, that the self-similar fractal model provides a very good fit for some land surfaces but an imperfect fit for some others [Klinkenberg and Goodchild, 1992]. As shown by Mark and Aronson [1984], it is not likely that fractal characteristics are displayed by terrain as a whole. Real surfaces are rarely pure fBm, but often contain departures from the fBm model in the form of local linear trends and scale dependencies [Goodchild and Tate, 1992]. Real surfaces are often multi-fractal in that D is observed to vary both spatially and with scale. This is because earth-forming and earth-moving processes have been at work on almost all landscapes, for varying amounts of time, and as a result the landscape bears the 'forms' or manifestation of the scales at which these processes operate or operated in the past [Clarke, 1987]. Topography may have many statistical properties similar to fractional Brownian surfaces, but different dimensions may apply over different scale ranges separated by distinct scale breaks. For scale ranges between adjacent breaks, surface behaviour should be that predicted by the fractal model; the breaks represent characteristic horizontal scales at which surface behaviour changes substantially.
These scale breaks are especially important for digital terrain modelling, since they represent scales at which there is a distinct change in the relation between sampling interval and associated error [Mark and Aronson, 1984].

Mandelbrot's D can be used as a useful indicator of the complexity of autocorrelations over many scales for natural phenomena. For surfaces with high fractal dimension, the autocovariance is low, and points cannot be accurately predicted (interpolated) from the elevations of neighboring points. Thus, considerable information is lost as the sampling interval is increased. Conversely, when the fractal dimension is low, the surface is 'smooth,' and elevations can be interpolated from their neighbors; the rate at which precision decreases as the sampling interval is increased would be much slower [Mark and Aronson, 1984].

However, although many natural phenomena do display certain degrees of statistical self-similarity over many spatial scales, there are others that seem to be structured and have their levels of variability clustered at particular scales. This behaviour does not exclude them from the fractal concept. Mandelbrot considers that it is quite acceptable to have a series of zones of distinct dimensions connected by transition zones. If this is reasonable, it means that the examination of D values would be useful for trying to separate scales of variation that might be the result of particular natural processes. Moreover, identifying such scales could be of enormous practical value because one could then tailor sampling to a particular scale range of the phenomenon in question, therefore improving the efficiency of expensive field investigations and the resulting interpolations [Burrough, 1981].

Research by Klinkenberg and Goodchild [1992] also shows that the fractal dimension is capturing an aspect of the land surface which is not reflected in the traditional morphometric parameters.
Therefore, it will be combined with other parameters to quantitatively characterize the terrain surface. As shown by Goodchild [1980], some error in computer representations of surfaces is related to sampling interval through the fractal dimension. Changes in the slope of the best fit line indicate breaks in scaling where fractal dimension changes. Thus those breaks would be of key significance in assessing the efficiency of sampling densities for digital elevation models.

4.4 Hierarchical Terrain Classification

4.4.1 Principles of terrain classification

Classification, the ordering of the basic units (usually referred to as operational taxonomic units, OTUs, by numerical taxonomists) into groups on the basis of their relations and a specified set of criteria, is one of the fundamental procedures in any scientific discipline. Geographers have traditionally been concerned with the description and differentiation of the earth's surface [Mather, 1972]. In the late 1960s and early 1970s there was a considerable amount of effort devoted to the development and application of numerical methods of classification (numerical taxonomy) to geographical data [Burrough, 1986]. Cluster analysis is the name given to a bewildering assortment of techniques designed to perform classification by assigning observations to groups so that each group is approximately homogeneous and distinct from the other groups [Davis, 1986].

In general, two types of approaches are identified in areal classification or clustering: hierarchical and nucleated. Hierarchical groupings usually involve a varying threshold which denotes the level of connection necessary for potential group entrance. OTUs that are highly connected are grouped together at an early stage, and as the threshold is lowered more OTUs or groups merge to form classes at different levels of generalization until all OTUs are joined to form a single class.
The result is a hierarchy of classes and is typically illustrated in the form of a linkage tree diagram or dendrogram. This will be useful in the examination of the role of scale and, therefore, will be the method used in this research. In hierarchical clustering, partitions are achieved by cutting a dendrogram after examining the difference between fusion levels or deciding on the appropriate number of clusters. Hierarchical classification techniques may be further subdivided into agglomerative methods, which proceed by examining the similarities between individual data points before fusing them into groups, and divisive methods, which consider the whole data set first and then examine the best ways to successively subdivide it into finer groups [Dunn and Everitt, 1982]. A large number of algorithms for cluster analysis have been developed and many are today available in the form of well-documented computer packages (e.g., Splus).

The nucleated clustering method can be used for the classification of objects which can be represented as points in Euclidean space of some number of dimensions. The aim is to partition a set of n points into m groups so as to minimize the total within-group variance (i.e., sum of squares) about the m nuclei. It operates on the similarity between the observations and a set of m arbitrary starting points that serve as initial group centroids. The observation closest or most similar to a starting point is combined with it to form a cluster. Observations are iteratively added to the nearest cluster, whose centroid is then recalculated for the expanded cluster. This appears to be a more reasonable method of obtaining compact clusters since long chains of points are avoided.

4.4.2 Concept of distance

Most clustering methods use a concept of 'distance' in the variable (or property) space as a measure of similarity. The closer the data points in m-dimensional space, the more similar the data points.
The property values form a multivariate description that can be partitioned into groups through a number of different criteria. Thus, the range of methods for calculating similarities and for establishing linkages between items and groups (e.g., single-, average-, or complete-linkage) leads to a large number of possible classification strategies. Ideally, if the data are well-structured, then different methods should yield similar results [Burrough, 1986]. The goal of different grouping criteria is usually to minimize the distance within groups and maximize the distance between groups.

The similarity between a pair of data points with a set of m properties in the data space can be estimated in several ways. The method used will depend on the types of property studied, and on the way in which the information has been coded. The complement of the similarity of two units is their dissimilarity, and in many cases it will be this measure that is determined from the data. Sokal and Sneath [1963] identify three types of measures of similarity or dissimilarity: coefficients of association, correlation coefficients and distance measures. Spence and Taylor [1970] add a fourth type of measure: the probability approach.

Coefficients of association involve presence/absence (i.e., binary) data arranged into a 2x2 frequency table of OTU against OTU. The best known measure of association is the $\chi^2$ statistic and it has been widely used in taxonomy, particularly in divisive approaches. When dealing with data other than on a binary scale some other measure of similarity must generally be used.

Correlation measures exist for all levels of measurement scale. For example, the non-parametric measures (the contingency coefficient and Spearman's rank correlation coefficient) can be used for nominal and ordinal data respectively.
The most commonly used correlation measure of similarity for interval/ratio data is the Pearson's product-moment correlation coefficient described in most statistical textbooks. It should be pointed out, however, that not all taxonomists believe this coefficient to be a suitable measure of similarity.

Distance is a more restrictive measure of dissimilarity. It measures identical items by a zero value, and by increasingly large (positive) values as the proximity of the items decreases [Murtagh and Heck, 1987]. Given the coordinates of any two points on a plane the distance between them can be easily calculated by using the Pythagorean sum of squares equation. This equation can be generalized to measure distances between points in m-dimensional space, given their coordinates. Thus the simplest distance measure is:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (X_{ik} - X_{jk})^2} \tag{4.22}$$

where $d_{ij}$ is the distance between points i and j in a space of m dimensions, with $X_{ik}$ being the value of property k for point i. Such an expression arises from the assumption that the m properties are represented by m orthogonal axes (i.e., axes at right angles to each other). As one would expect, a low distance indicates the two points are similar or "close together," whereas a large distance indicates higher dissimilarity.

The Euclidean distance measure defined above is conceptually the most straightforward and has been the dissimilarity measure most widely used in numerical taxonomy. It is important to note, however, that this measure should only be used where the space is orthogonal [Spence and Taylor, 1970]. Otherwise many of the properties of Euclidean geometry will no longer apply, including Pythagoras's theorem. However, in practice, because of the correlation of different properties examined, the assumption of orthogonality will not be justified and the Euclidean distance will be a poor measure of distance between OTUs.
The problem of correlated properties can be overcome either by using oblique coordinate axes and a measure such as Mahalanobis' generalized distance or by transforming to principal component axes [Dunn and Everitt, 1982]. Mahalanobis' distance measure has the advantage that it allows for correlations between variables. Principal component analysis is a mathematical technique for examining the relations between a number of individuals, each having a set of properties. The original data are transformed into a set of new properties, called principal components, that are linear combinations of the original variables. The principal components have the property that they are orthogonal to (independent of) each other.

4.4.3 Variable standardization

Another important point to make clear is that Euclidean distance calculated directly on the raw data may make little sense if the m properties observed on each OTU have different scales or measurement units, as is often the case. Some scaling of the data will be required before using a distance. Thus all properties should be standardized prior to computing Euclidean distances. By this is meant expressing each property in units of standard deviations. This standardization transformation is done for each property dimension (k) simply by subtracting the mean of each variable distribution ($\bar{x}_k$) from each raw data value ($x_{ik}$) and dividing by the standard deviation ($s_k$) of all the data values of that property. That is:

$$z_{ik} = \frac{x_{ik} - \bar{x}_k}{s_k} \qquad (4.23)$$

This is also called the standard score (or Z score) for an element. It ensures that each variable is weighted equally. Otherwise, the distance will be influenced most strongly by the variable which has the greatest magnitude.

4.4.4 Terrain classification

Automated terrain classification involves the partitioning of an area into homogeneous topographic regions through quantitative interpretation of a digital terrain model.
The classification of elements in a DEM is performed in the same way as the classification of a remotely sensed image in digital picture processing. In remote sensing, land cover is classified on the basis of spectral reflectance values over different wavelength bands. In the case of terrain classification, a variety of geomorphometric measures are employed instead of spectral signatures. The geomorphological data matrix (which consists of various geometric measures) is used as input for data processing and statistical classification using a hierarchical clustering method. The result of this procedure is a set of terrain units homogeneous in terms of their measured differentiating characteristics.

Weibel and DeLotto [1988] presented a generic sequence of steps involved in the process of automated terrain classification. It consists of three steps: specification of variables, extraction of geometric signature, and multivariate classification. These steps are shown graphically in Figure 4.8. Different variables may require that different extraction techniques be applied, and a different classification algorithm may be used depending on the type of variables selected. The first two steps (i.e., specification of variables and extraction of geometric signature) have been discussed and illustrated in earlier sections.

Multivariate classification consists of the following two steps: (i) preprocessing for variable selection and/or transformation, and (ii) grouping of homogeneous areas into distinctive classes. Both steps are affected by a priori knowledge. A logical structure is necessary to indicate how a set of variables over a group of observations can be ordinated, scaled according to some measure of similarity, and finally grouped or divided. The purpose of preprocessing is to remove multicollinearity between different variables. Variables that are used in a multivariate classification model should not be highly interdependent.
Only if the variables are not correlated may it be assumed that each variable contributes an individual share to the explanation of topographic variation and is, thus, useful. Two different methods may be used to achieve this: variable selection based on correlation analysis, or variable transformation (i.e., ordination) by a variety of multidimensional scaling methods such as principal components/factor analysis. Factor analysis transforms a highly interrelated and interdependent data set into a series of basic orthogonal or independent dimensions, so it can be used to disentangle unknown interdependencies in a particular set of data [Spence and Taylor, 1970]. Each new dimension represents an independent or orthogonal axis of variation on which an individual variable or observation can be rated. However, the new uncorrelated synthetic variables thus derived have to be interpreted anew since no one-to-one causal relationship exists between the original and the derived variables. Variable selection only eliminates significantly correlated variables; the remaining variables are part of the set of original variables. Interpretation of classification results may therefore be more easily accomplished if only variable selection is used [Weibel and DeLotto, 1988]. That is, the classification results can be analyzed in relation to the original roughness variables so that the characteristics of various terrain clusters are fully understood. For this reason, the variable selection method is used in this research.

[Figure 4.8 Generic sequence of automated terrain classification. Elements: digital terrain model; specification of variables; a priori knowledge; extraction of geometric signature; classification (preprocessing, grouping); classes of homogeneous terrain characteristics.]

4.5 What is next?
In the following chapter, a global characterization of the surface variabilities will be accomplished using the various methods discussed in section 4.3 in order to identify the significant scale levels or breaks in the study area surfaces. Those significant scales will then be used to guide the selection of the moving window sizes that will be tested for the extraction of the local roughness measures. Finally, a multivariate statistical analysis, based on the local geomorphometric measures derived from the study area DEMs, will be used for automated hierarchical terrain classification in which relatively homogeneous terrain types at different scale levels will be identified. Some detailed test results based on the study area multi-scale DEM data sets will then be presented. Figure 4.9 summarizes the steps that will be followed in the next chapter for the study area surface characterization and classification.

[Figure 4.9 Major steps for study area topographic characterization. Elements: DEM; global characterization (grain, spectra, ANOVA, variogram); a priori knowledge; specification of local measures; selection of the moving window sizes; extraction of the local measures; variable group selection; hierarchical clustering; interpretation of the terrain clusters.]

5. TOPOGRAPHIC CHARACTERIZATION RESULTS

5.1 The Role of Scale in Topographic Characterization

Because of the wide range of topographic variation present in different landscapes, elevation is no exception to the 'scale problem.' In the case of digital terrain modelling, since the degree of roughness of elevation data is important when trying to make interpolations from sample point data, such as by least-squares fitting or kriging, it is worth examining the elevation data beforehand to see if the data contain evidence of variation over different scales, and how important these scales might be [Burrough, 1981].
As indicated in section 4.1, it is useful to determine some global characteristics of the surface from the study area DEMs. Firstly, they can be used to examine the relation between general surface characteristics and significant scale breaks. It is hoped that some general conclusions can be made with respect to the determination of appropriate/optimal window sizes for the extraction of local measures. Secondly, they may be helpful when making comparisons with regard to the usefulness of a DEM error model for various surfaces (e.g., 93G, 93H, and EMR1) and different resolutions (e.g., 50 m and 1 km). This section will examine the role of scale in topographic characterization and interpret some global characterization results. The global characteristics of the whole study area and the two subareas will be identified using the grain measure, spectral analysis, nested analysis of variance and fractal analysis of the EMR1 and TRIM DEMs using the methodologies discussed in section 4.3.

5.1.1 Grain determination

As discussed in section 4.3.1, grain can be used to guide the selection of the size of area over which the other local geomorphometric parameters are to be measured since it indicates the characteristic wavelength or optimal sampling area of the topographic surface. It is determined by the 'knick point' in the plot of local relief against circle diameter, that is, the point where the line representing local relief levels off and no longer increases dramatically with distance. Figure 5.1a shows the graphs used to determine the grain of each subarea in the study area based on the TRIM DEMs. From these two graphs it can be seen that there exist several 'knick points' in each plot. The flatter subarea TRIM 93G has a grain of greater than 15 km (this represents the limits of the data due to the size of each subarea); however, there are some minor 'knick points' at about 9 km, 4.6 km, 3 km, and 1 km.
The rougher subarea TRIM 93H has a smaller grain of 9 km and there is also a 'knick point' at about 4 km. For the whole study area, the grain is determined from the EMR1 DEM and the result is shown in Figure 5.1b. The curve in Figure 5.1b indicates some 'knick points' at about 155 km, 110 km, 46 km, and 9 km. Based on the size of the 'knick points' and the consistency of the values across data sets, it appears that 9 and 46 km are the significant scales in the study area surface.

5.1.2 Fourier interpretation

The scales at which "forms" exist in the landscape can be examined using Fourier analysis [Clarke, 1987] since the basis of this technique is to separate the characteristic 'forms' by scale. The Fourier interpretation of scale is the spatial wavelength, and this provides an operational definition for the study of processes in terms of scale, also known as spectral analysis. As indicated in section 4.3.2, the spectrum gives a measure of the proportion of a process which occurs at any one of a large number of scales. In the case of one-dimensional spectral analysis, the relation of large-scale to small-scale roughness is expressed in the slope of the spectrum plot, and the spacing of topographic periodicities is indicated by the humps in the spectrum.

Before using one-dimensional spectral analysis, the directionality of the study area topography needs to be examined so that several representative profiles can be extracted for the analysis. The two-dimensional spectrum plot of the study area derived from the 5' DEM is shown in Figure 5.2a, which indicates an anisotropy in the study area similar to that in the spectrum of the whole of B.C. estimated from the 5' DEM (see Figure 4.4). Figure 5.2b presents another two-dimensional spectrum of the study area, derived from the 1 km EMR1 DEM. Again, anisotropy is evident. Figures 5.2c and 5.2d show the two-dimensional spectrum plots of the TRIM 93G and TRIM 93H subareas. The spectrum of the TRIM 93G subarea in Figure 5.2c indicates some directional bias along two axes (i.e., north-south and east-west directions), while the spectrum of the TRIM 93H subarea in Figure 5.2d displays a circular pattern of isopleths and, thus, no apparent anisotropy, a reflection of the dendritic drainage pattern in the area. The maximum wavenumbers resolvable by the above analyses are 0.5 km⁻¹ for the EMR1 DEM and 10 km⁻¹ for the TRIM DEMs because of their resolutions (i.e., 1 km EMR1 and 50 m TRIM DEMs).

[Figures 5.2a-d: two-dimensional spectrum plots of the study area (5' DEM and 1 km EMR1 DEM) and of the TRIM 93G and TRIM 93H subareas.]

Considering the characteristics of the two-dimensional spectrum plots for various surfaces presented above, several representative profiles were extracted from each DEM. Using the spec.pgram spectral analysis function in Splus, one-dimensional spectra of four different representative topographic profiles extracted from the TRIM 93G and TRIM 93H subareas were produced. Figure 5.3a displays a representative east/west direction topographic profile (i.e., prof.93g.ew) and its one-dimensional spectrum plot for the TRIM 93G subarea. Figure 5.3b shows a representative north/south direction topographic profile (i.e., prof.93g.ns) and its one-dimensional spectrum plot for the TRIM 93G subarea. Figures 5.3c and 5.3d show representative east/west and north/south direction topographic profiles (i.e., prof.93h.ew and prof.93h.ns) and their spectrum plots for the TRIM 93H subarea.

[Figures 5.3a-d: representative topographic profiles (elevation, m) and their one-dimensional spectra for the TRIM 93G and TRIM 93H subareas.]

For the whole study area, six different profiles along various directions (i.e., emr.profile1 to emr.profile6) were extracted from the EMR1 DEM and examined. The first profile (i.e., emr.profile1) was extracted along an east/west direction. The second one (i.e., emr.profile2) was extracted along a southwest/northeast direction 30 degrees from the east/west direction. The third one (i.e., emr.profile3) was also along a southwest/northeast direction but 60 degrees from the east/west direction. The fourth one (i.e., emr.profile4) was along a north/south direction. The fifth one (i.e., emr.profile5) was along a northwest/southeast direction 30 degrees from the north/south direction. The sixth one (i.e., emr.profile6) was also along a northwest/southeast direction but 60 degrees from the north/south direction. In order to make the results easily comparable, each profile keeps the same sampling interval of 1 km. Therefore, the profiles that are not along the grid were interpolated from the original DEM. Figures 5.4a-f show these six profiles and their one-dimensional spectrum plots.

Plots of log amplitude (or power) against log wavelength often show a linear decline from high variability at long wavelengths (low frequencies) to low variability at short wavelengths. The above resulting spectral functions can be analyzed and interpreted either visually or numerically. Based on the spectral analysis results shown in Figures 5.3a-d, the following interpretation can be made regarding some scale-related properties for the two subareas. First of all, the two profiles from the TRIM 93H subarea show, as expected, much greater overall roughness than the profiles from the TRIM 93G subarea. From Figures 5.3a and 5.3b, the spectrum plots of an east/west and a north/south topographic profile from the TRIM 93G subarea, several prominent peaks can be identified. They appear around frequencies 0.125, 0.17, 0.23, 0.34, and 0.46. Since the sampling interval of the original profiles is 50 m, the above frequencies roughly correspond to wavelengths of 400 m, 300 m, 220 m, 150 m, and 105 m. From Figures 5.3c and 5.3d, the following three prominent peaks are identified for the TRIM 93H subarea: 280 m, 170 m, and 110 m.
[Figures 5.4a-f: the six EMR1 topographic profiles (elevation, m) and their one-dimensional spectra.]

Based on the spectral analysis results for the whole study area (i.e., Figures 5.4a-f), it appears that the following important horizontal scales exist in the topography: 6.7 km, 5.6 km, 4.5 km, 3 km and 2.2 km. A major advantage of one-dimensional spectral analysis over other global measures is thus its capability of identifying unexpected or less obvious periodicities in the topography. The strength of each of the above periodicities varies, as reflected by the width and height of the spectral peak.

If the hills and valleys traversed by profiles were randomly dispersed, both horizontally and vertically, then the resulting variance spectra would be linear, featureless plots. It can be seen from the above spectra, however, that linear regressions would not be good fits to them. These poor fits result from the presence of significant peaks within a restricted wavelength band. In Figure 5.3a, for example, three least-squares best fit lines could be roughly drawn to fit different portions of the spectral section and two break points can be identified at around 720 m and 166 m. In Figure 5.3b, two break points are around 1 km and 166 m. Therefore, three scale breaks around 1 km, 720 m, and 166 m are important for subarea 93G. Similarly, the important scale breaks for subarea 93H are around 1 km, 333 m, and 172 m. For the whole study area surface, the break points are around 6 km and 3 km. The identification of these scale breaks and multiple peaks is useful in that they indicate at what scale levels the original terrain contains higher variability. This information can then be used to guide the selection of appropriate sampling interval and moving window size. Interestingly, though, the spectra appear to identify more high frequencies than the grain plots because of the use of logarithms for the spectra.
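The conversion from a spectral peak to a horizontal scale used in this section (wavelength equals sampling interval divided by frequency in cycles per sampling interval) can be illustrated with a short Python sketch. It computes a raw periodogram with NumPy's FFT and makes no attempt to reproduce the options of the Splus spec.pgram function; the function name `strongest_wavelengths` and the synthetic profile are assumptions introduced purely for illustration.

```python
import numpy as np


def strongest_wavelengths(profile, spacing_m, n_ordinates=1):
    """Wavelengths (m) of the strongest raw-periodogram ordinates.

    profile   : 1-D array of elevations sampled at a constant interval
    spacing_m : sampling interval in metres
    """
    z = np.asarray(profile, dtype=float)
    z = z - z.mean()                      # remove the mean (zero frequency)
    power = np.abs(np.fft.rfft(z)) ** 2   # raw periodogram
    freq = np.fft.rfftfreq(z.size)        # cycles per sampling interval
    # Skip the zero frequency, then rank the remaining ordinates by power.
    order = np.argsort(power[1:])[::-1][:n_ordinates] + 1
    # Wavelength = sampling interval / frequency.
    return spacing_m / freq[order]


# Synthetic profile (assumed): 512 samples at 50 m with a 400 m undulation.
x = np.arange(512) * 50.0
profile = 10.0 * np.sin(2.0 * np.pi * x / 400.0)
strongest = strongest_wavelengths(profile, spacing_m=50.0)
print(strongest)  # wavelength of the dominant periodicity, in metres
```

With a 50 m sampling interval, a peak at frequency 0.125 corresponds to a 400 m wavelength, which is the first of the TRIM 93G values quoted above. Real profiles would normally be detrended and tapered before such an analysis, and neighbouring ordinates belonging to the same broad peak would need to be grouped.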
5.1.3 Examining scale effects by nested analysis of variance

Following the example by Moellering and Tobler [1972], as mentioned in section 4.3.3, nested ANOVA was done for two 256 by 256 DEM matrices extracted from the TRIM 93G and TRIM 93H subareas (i.e., trim93g.256 and trim93h.256). The matrices used in this analysis are 256 x 256 instead of 300 x 320 because the nested ANOVA can be applied only to a 2^n x 2^n square matrix. The results of the scale-variance calculation for the two subareas are given in Tables 5.1a and 5.1b respectively. The scale-variance components may also be graphed as a magnitude versus frequency graph or spectrum as in Figure 5.5a. As can be seen from the graphs, it is evident that for both subareas the greatest amount of the scale-variance is at level 1, the biggest grid cell size, and that the importance of the scale-variance components diminishes quickly as the cell sizes get smaller. This is typical for most natural topographic surfaces because, as mentioned earlier in section 4.3.2, and as evident in Figure 5.5a, the greatest vertical variability within a surface is almost always associated with the longer wavelengths. For both subareas, the most detailed three scale levels (6, 7 and 8) contain no more than 2% of the total sum of squares. However, there are some differences between the spectra of the two subareas. First of all, the variance at each scale level is much higher for TRIM 93H than for the TRIM 93G subarea because of the greater roughness of the former surface area. Secondly, the importance of the scale-variance components for TRIM 93H does not diminish as quickly as that of TRIM 93G when the scale level increases.
For TRIM 93H, scale levels 1 and 2 are almost equally important whereas for TRIM 93G, level 1 alone contains more than 50% of the total sum of squares and the importance of level 2 drops quickly compared to level 1. Since the TRIM DEM resolution is 50 m, scale level 0 corresponds to grid size 12.8 km (i.e., 256 x 50 m); scale level 1 corresponds to grid size 6.4 km; level 2 corresponds to 3.2 km; level 3 is 1.6 km; level 4 is 800 m; level 5 is 400 m; level 6 is 200 m; level 7 is 100 m; and level 8 is 50 m. The results of nested ANOVA indicate that for both subareas not much more terrain variability, hence precision, will be gained by sampling at intervals smaller than 200 m. However, as the sampling interval is coarsened, the rate at which information is lost would be slower for TRIM 93G than for TRIM 93H.

Table 5.1a Scale variances for TRIM 93G DEM data (trim93g.256)

Scale Level   Sum of Squares (SS)   Percent of Total SS   Degrees of Freedom   Mean Square   Scale-Variance Component
1             1.30e+08              50.80                 3                    4.35e+07      2653.40
2             6.68e+07              26.01                 12                   5.56e+06      1358.38
3             3.34e+07              12.99                 48                   6.95e+05      678.56
4             1.55e+07              6.02                  192                  8.05e+04      314.38
5             6.64e+06              2.59                  768                  8.65e+03      135.13
6             2.66e+06              1.04                  3072                 8.65e+02      54.08
7             1.05e+06              0.41                  12288                8.54e+01      21.36
8             3.76e+05              0.15                  49152                7.65          7.65
Total         2.57e+08              100.00                65535                -             -

Table 5.1b Scale variances for TRIM 93H DEM data (trim93h.256)

Scale Level   Sum of Squares (SS)   Percent of Total SS   Degrees of Freedom   Mean Square   Scale-Variance Component
1             4.39e+09              33.74                 3                    1.46e+09      89348.82
2             4.27e+09              32.77                 12                   3.55e+08      86785.24
3             2.33e+09              17.90                 48                   4.86e+07      47415.16
4             1.27e+09              9.78                  192                  6.63e+06      25897.08
5             4.94e+08              3.80                  768                  6.44e+05      10056.94
6             1.87e+08              1.43                  3072                 6.07e+04      3794.74
7             5.85e+07              0.45                  12288                4.76e+03      1189.77
8             1.70e+07              0.13                  49152                3.46e+02      346.21
Total         1.30e+10              100.00                65535                -             -

[Figure 5.5a: scale-variance spectra for the TRIM 93G and TRIM 93H subareas (variance and % sum of squares against scale level).]

[Figure 5.5b: scale-variance spectra for EMR1.128(1) and EMR1.128(2) (variance and % sum of squares against scale level).]

Table 5.2 Scale variances for EMR1.128(1) and EMR1.128(2)

Scale       Sum of Squares (SS)   % of Total SS   Degrees of Freedom   Mean Square   Scale-Variance Component
1           439804493             25.94           3                    1.46e+08      35791.4
2           337921972             19.93           12                   2.81e+07      27500.2
3           243234662             14.34           48                   5.06e+06      19794.5
4           214700121             12.66           192                  1.11e+06      17472.3
5           218902869             12.91           768                  2.85e+05      17814.4
6           157121768             9.26            3072                 5.11e+04      12786.6
7           83735469              4.93            12288                6.81e+03      6814.4
Total (1)   1695421355            100             16383                -             -

1           644961726             29.05           3                    2.14e+08      52487.1
2           256899001             11.57           12                   2.14e+07      20906.5
3           320717165             14.45           48                   6.68e+06      26100.0
4           285311366             12.85           192                  1.48e+06      23218.7
5           328478483             14.79           768                  4.27e+05      26731.6
6           252269124             11.36           3072                 8.21e+04      20529.7
7           130813349             5.89            12288                1.06e+04      10645.6
Total (2)   2219450213            100             16383                -             -

For the whole study area, two different 128 by 128 matrices were extracted from the original 160 by 187 EMR1 DEM and a nested ANOVA was done on each to examine the scale-variance at various scale levels. These two matrices were named EMR1.128(1) and EMR1.128(2). They were extracted from the northwest corner and the center part of the original EMR1 DEM respectively in order to see what impact the different locations would have on the results.
Since the EMR1 DEM has a resolution of 1 km, scale level 0 corresponds to grid size 128 km, scale level 1 corresponds to size 64 km, and scale level 7 represents a grid size of 1 km. The results of scale variance analysis for each of the two square matrices are displayed graphically in Figure 5.5b. Table 5.2 contains the numerical results of scale variances. Based on these results, several observations can be made with regard to the study area surface variability. Again, in both cases, the greatest amount of the scale-variance is at level 1, the biggest grid cell size (i.e., about 26% in case one and 29% in case two). When compared to the TRIM 93G and TRIM 93H subareas, however, the importance of the scale-variance components does not diminish as quickly as the cell sizes get smaller. Each of the scale levels from 2 to 6 (i.e., from 32 to 2 km) still has above 9% of the total sum of squares. The two cases tested for the whole study area mainly differ at the first two scale levels (i.e., 1 and 2). EMR1.128(2) exhibits a more 'interesting' pattern of scale variances with a drop at scale level 2. This difference is probably caused by the inclusion of more rough area in EMR1.128(2) (i.e., greater total sum of squares as seen in Table 5.2).
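The scale-variance calculation discussed in this section can be sketched in Python as follows. This is one reading of the nested ANOVA that reproduces the degrees of freedom in Tables 5.1a and 5.1b (3, 12, 48, ...) and the relation visible there between the mean square and the scale-variance component (the mean square divided by the number of cells in a block at that level); it is not the program used to produce the tables. The function name `scale_variance` and the random test surface are assumptions.

```python
import numpy as np


def scale_variance(z):
    """Nested (hierarchical) ANOVA of a 2^n x 2^n elevation matrix.

    Returns one row per scale level k = 1..n with
    (level, sum of squares, degrees of freedom, mean square, scale variance).
    """
    z = np.asarray(z, dtype=float)
    size = z.shape[0]
    n = size.bit_length() - 1
    if z.shape != (size, size) or 2 ** n != size:
        raise ValueError("matrix must be square with side 2^n")

    def block_means(side):
        # Mean elevation of each non-overlapping side x side block.
        b = size // side
        return z.reshape(b, side, b, side).mean(axis=(1, 3))

    rows = []
    parent = block_means(size)               # level 0: the grand mean
    for k in range(1, n + 1):
        side = size // 2 ** k                # block side (cells) at level k
        child = block_means(side)
        # Deviation of each block mean from the mean of its parent block.
        dev = child - np.kron(parent, np.ones((2, 2)))
        cells = side * side                  # cells represented by one block
        ss = cells * (dev ** 2).sum()
        df = 4 ** k - 4 ** (k - 1)
        ms = ss / df
        rows.append((k, ss, df, ms, ms / cells))
        parent = child
    return rows


# Hypothetical 16 x 16 surface used only to exercise the function.
rng = np.random.default_rng(0)
demo = rng.normal(size=(16, 16)).cumsum(axis=0).cumsum(axis=1)
table = scale_variance(demo)
for level, ss, df, ms, sv in table:
    print(level, round(ss, 2), df, round(ms, 2), round(sv, 2))
```

Because every block mean is compared with the mean of the block that contains it, the sums of squares over all levels add up to the total sum of squares about the grand mean, which is what allows each level to be reported as a percentage of the total.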
In connection with the interpretations from grain and spectral analyses, the following observations can also be made regarding the global characteristics of the study area surfaces: (i) significant variance in subarea 93H occurs at scale levels 1 and 2 (possibly reflecting the variability observed in the grain plots at 9 km); (ii) scale level 2 (i.e., 3.2 km) in 93H contains great variability which may be related to the 4 km minor 'knick point' as identified earlier in grain analysis; (iii) the variability at scale level 2 (i.e., 32 km) in the central region in the EMR1 DEM drops a great deal from level 1 (i.e., 64 km) indicating a major scale break in between (probably corresponding to the significant scale 46 km as identified in grain analysis); (iv) for EMR1 more variability is evident at level 5 than 4, indicating one or more minor scale breaks between 4 and 8 km (compare to the scale breaks 4.5, 5.6, and 6.7 km as identified in the spectral analysis).

5.1.4 Interpreting the variograms

In this research, the variogram method is adopted from Klinkenberg and Goodchild [1992] in order to determine the fractal dimension of the study area surfaces. Figure 5.6a shows the surface variograms derived from the two subarea DEMs TRIM93G.SUB and TRIM93H.SUB (each has a dimension of 300x320). First, scattergrams of the elevation difference variance squared against horizontal distance were plotted in the log-log space. For each one, the maximum distance was divided into 30 distance classes defined such that in log-log space the classes would be of equal width on the x-axis. This practice improves the reliability of the linear regression [Klinkenberg and Goodchild, 1992]. Then, for each of the variograms, the linearity of the plot was examined and breaks of slope were identified by visual inspection.
That is, the first and last points which appeared to enclose a reasonably linear cluster were identified and straight lines were least-squares fitted to the intervening parts of the scattergrams and plotted. Finally, the slopes were calculated for all the lines and the associated parameters, including fractal dimension (D), gamma value and break distance, were printed.

In the fractal plots for the two subareas some non-linearity is present and the self-similar fractal model does not perfectly describe the study area landscape surfaces at the scales considered in this research. As can be seen from surface variograms derived from the two subarea DEMs, shown in Figure 5.6a, both plots used in the determination of the fractal dimension display more than one linear segment. For both plots there are four least-squares best fit lines. These fractal elements indicate scale-delimited regions of self-similar topography. It is evident from the above figure that the variogram plot of TRIM 93G demonstrates a better overall linearity than that of TRIM 93H. The scale breaks for TRIM 93G are not as distinctive as for TRIM 93H.

The fractal dimension (D) for TRIM 93G changes from 2.28 to 2.35 to 2.29 and then to 2.25 for the four different scale ranges. For TRIM 93H, the changes in the D values are more dramatic, from 2.42 to 2.24 to 2.44 and then to 2.72. The gamma value indicates some aspect of the magnitude of the surface roughness. Since TRIM 93H is overall much rougher than TRIM 93G, the gamma values of the TRIM 93H variogram plot are higher, as expected: 6.68, 6.75, 7.81, and 10.18 versus only 3.05, 3.16, 2.56, and 1.8 for TRIM 93G. The break distance represents the maximum distance to which each linear regression line could be fitted. Clearly, TRIM 93G has some minor breaks around 631 m, 1.7 km, 3.9 km and a much longer break distance over 19.8 km which is beyond the limit of the study area size.
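The fitting step described above can be sketched in Python under a stated assumption: for a surface whose variogram follows a power law, variance proportional to distance^(2H), the slope of the log-log line is 2H and the fractal dimension is D = 3 - slope/2. The helper names `log_distance_classes` and `fractal_dimension`, and the synthetic variogram segment, are hypothetical; the intercept is returned for convenience only and is not claimed to be the gamma value reported here.

```python
import numpy as np


def log_distance_classes(d_min, d_max, n_classes=30):
    """Class boundaries of equal width on a logarithmic distance axis."""
    return np.logspace(np.log10(d_min), np.log10(d_max), n_classes + 1)


def fractal_dimension(distances, variances):
    """Fit a straight line in log-log space and convert its slope to D.

    Assumes the surface variogram follows variance ~ distance^(2H),
    so that the fitted slope is 2H and D = 3 - slope / 2.
    Returns (D, intercept of the fitted line).
    """
    log_d = np.log10(np.asarray(distances, dtype=float))
    log_v = np.log10(np.asarray(variances, dtype=float))
    slope, intercept = np.polyfit(log_d, log_v, 1)
    return 3.0 - slope / 2.0, intercept


# Hypothetical variogram segment: an exact power law with H = 0.7.
edges = log_distance_classes(50.0, 5000.0)
centres = np.sqrt(edges[:-1] * edges[1:])   # geometric class mid-points
variances = 0.02 * centres ** (2 * 0.7)
d_value, intercept = fractal_dimension(centres, variances)
print(round(d_value, 2))
```

In practice the function would be applied separately to each visually identified linear segment, giving one D value per scale range in the manner of the four values listed for each subarea.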
For TRIM 93H, some major breaks exist at around 102 m, 420 m, 1.4 km and 7.2 km. The minimum distance to which a fractal element could be assigned is, of course, set by the TRIM DEM grid spacing of 50 m.

Following the same methodology, the surface variogram plot of the study area EMR1 DEM was derived and is shown in Figure 5.6b. It demonstrates a poor overall linearity and a number of scale breaks exist. Three linear segments could be fitted. The first major scale break identified is between 3 and 4 km (around 3.8 km) for the whole study area surface, with a fractal dimension of 2.26 below this level and of 2.84 above. The other two scale breaks identified are around 33 km and 166 km. It should be noted that the scale breaks identified by the variogram method and the other methods as discussed above are not precise and they provide only a general picture of the global characteristics of the surface. They should therefore be used only as a guideline for the selection of the appropriate sampling intervals or window sizes.

[Figure 5.6b Surface variogram derived from EMR1 DEM.]

5.1.5 A comparison of various global surface characterization methods and their results

There are definite links between grain measures, spectral analysis and the methodology of both nested analysis of variance and fractal analysis using the variogram technique. Each method has its advantages and disadvantages in the study of scale effects and is capable of identifying only a limited range of scales. Spectral analysis provides the most comprehensive interpretation of terrain according to the frequencies of undulations, although assumptions about stationarity of the elevation data do not make it especially amenable for terrain. The anisotropic nature of topography also makes it difficult to interpret the meaning of the two-dimensional spectrum.
Therefore, one-dimensional spectral techniques need to be used for dealing with either the average of the two-dimensional spectrum over all directions, or the spectrum of some representative profiles across or along the direction of the longest significant wavelength in the topography. The spectra are able to identify more scale breaks at shorter wavelengths. However, the variograms and associated fractal scales seem more capable of detecting scale breaks over a wider range. The self-similar fractal model has been found to provide a very good fit for some terrain surfaces, but a not so good fit for some others [Klinkenberg and Goodchild, 1992]. The limitations of fractal analyses are comparable to those of autocorrelation, spectral and regionalized variable analysis.

Both grain and variogram analyses describe the same attribute of topography, spatial autocorrelation of elevation. Pike et al. [1989] suggest that the geostatistical parameters termed "range" and "sill" on elevation variograms are similar to "grain" and "relief at grain" on grain graphs (local relief versus area diameter). Grain values might be obtained from topographic variograms which, based on formal geostatistics, may give a much needed theoretical basis for the relief/grain concept. One problem with the grain measure determined from a grain graph is that the choice of the center of the sampling area is arbitrary. Estimates of local relief from unit cells of one size do not represent a wide spectrum of terrain types with equal fidelity. This problem reflects the varied dominance of topography by local features that differ widely in relief and spacing [Pike et al., 1989].

The nested ANOVA can be useful in identifying the scales where significant terrain variability exists, but it is limited in the level of information it provides. Besides, it can only handle 2^n x 2^n square matrices and it assumes a complete enumeration, not a sample.
Based on the above global characterization results for the two subareas and the whole study area surface using various methods, some of their general characteristics can be summarized as follows (see also Table 5.3):

Table 5.3 A summary of the important scales in the study area surfaces identified by various global methods

Method      93G                                93H                                   EMR1
Grain       Major: > 15 km                     Major: 9 km                           Major: 46 km, 9 km
            Minor: 9 km, 4.6 km, 3 km, 1 km    Minor: 4 km                           Minor: 155 km, 110 km
Spectra     Major: 1 km                        Major: 1 km                           Major: 6 km, 3 - 4 km, 2.2 km
            Minor: 720 m, 250 m, 166 m         Minor: 333 m, 172 m
ANOVA                                          Major: 3.2 km                         Major: 4 km, 16 km, 4 - 8 km, 32 - 64 km
            Minor: 200 m                       Minor: 200 m
Variogram   Major: > 19.8 km                   Major: 7.2 km, 1.4 km, 420 m, 102 m   Major: 166 km, 33 km, 3.8 km
            Minor: 3.9 km, 1.7 km, 631 m

1) Overall, the self-similar fractal model is found to provide an imperfect fit for the whole study area EMR1 DEM. A series of distinct fractal dimensions connected by transition zones are evident from its variogram plot. That means the levels of surface variability are clustered at some particular scale ranges. The multiple 'knick points' in the grain graph, the resulting nonlinear variance spectra from spectral analysis, and the results from nested ANOVA all indicate this similar characteristic. For the whole study area, there appear to be several scale breaks. For example, a scale break of about 4 km is evident from the variogram plot, ANOVA analysis and the spectra. Also a scale break of 9 km is identified from the grain graph. Such scales could be of great value because one could then tailor sampling to a particular scale range in order to capture the maximum amount of surface variability with minimum sampling effort. In addition, these scale breaks can be used to help identify the appropriate/optimal sampling interval and moving window size in the characterization of surface roughness for DEM error modelling.
Because the study area EMR1 DEM has a resolution of 1 km and the moving window size has to be an odd number of cells (e.g., 3x3, 5x5, 7x7, and 9x9), the above two scale breaks can be best characterized by window sizes (5x5) and (9x9). In Chapter 6, DEM error modelling results based on local roughness measures extracted using these two characteristic window sizes (i.e., 5x5 and 9x9) will be compared to each other. Because a major scale break exists below 5 km for the whole study area surface, the smaller window size (5x5) should be one of the better choices for extracting local geomorphometric measures from the EMR1 DEM.

The two subareas demonstrate some very different global characteristics. For example, the variogram plot of TRIM 93G demonstrates a better linearity than that of TRIM 93H, which indicates a better fit of the self-similar fractal model over the scale range being studied. Both the grain graphs and the variogram plots of the two subareas show that TRIM 93H has a shorter characteristic horizontal scale at 8 km ± 1 km. The variogram plot of TRIM 93H inflects at a distance of around 7.2 km whereas the variogram for TRIM 93G does not inflect until at least 19.8 km (its computation stopped at a distance of around 19.8 km). Similar observations can be made on the grain graphs of the two subareas. For TRIM 93H, a major "knick" point was found between 8 and 9 km whereas for TRIM 93G no major "knick" point was found until at least 15 km (the maximum range for the study area). Since a scale break at around 1 km was identified for both subareas by the spectral analysis and by the grain measure for subarea 93G, a window size of (21x21) will be chosen for the extraction of the local measures in the following section (note that TRIM DEMs have a resolution of 50 m).
For comparison purposes, a smaller window of size (7x7) will also be tested in order to see the effects of the scale break at 420 m identified in subarea 93H but not in 93G. Because of the higher terrain variability at this short scale range in subarea 93H, one would expect the smaller moving window size to characterize the local roughness for DEM error modelling better for subarea TRIM 93H than for TRIM 93G. This will be tested in Chapter 6 by examining the effects of using two different window sizes.

The next two sections (5.2 and 5.3) will present some results of local characterizations of surface roughness using the 'moving window' technique on several DEMs in the study area. Based on the global characteristics of the two subareas, two different window sizes will be used for the extraction of the local roughness measures from the 50 m TRIM DEMs: (7x7) and (21x21). For the whole study area, window sizes (5x5) and (9x9) will be tested for the extraction of the local parameters from the 1 km EMR1 DEM. Various terrain clusters will then be identified using a multivariate analysis of the local measures.

5.2 Local Surface Characteristics

In order to identify various terrain classes and, therefore, to be able to quantify the relations between the spatial pattern of DEM errors and the variation of the surface characteristics, some local geomorphometric measures need to be extracted from the study area DEMs. First, the seven variables identified in section 4.2 are derived from the two subarea TRIM DEMs (i.e., TRIM93G.SUB and TRIM93H.SUB). Then, in order to examine the issue of scale and resolution in the characterization of surface roughness, the same set of variables is also derived from the 1 km EMR1 DEM for the whole study area. Several Fortran and Splus programs, based on the methodologies discussed in section 4.2, were used to extract the measures.
Some of the results for the two subareas using TRIM DEMs are displayed in Figures 5.7a-g. Two different window sizes (i.e., 7x7 and 21x21) for the extraction of parameters were used on the 50 m resolution TRIM DEMs. Because of the edge effects of using a moving window, a matrix of size only (280 x 300) was created for each derived variable based on the original (300 x 320) DEMs.

[Figures 5.7a-g: grayscale images of the local roughness measures for subareas 93G and 93H.]
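The moving-window extraction and its edge effect can be sketched as follows. This is a minimal illustration and not the Fortran or Splus code used in this study: the function names are hypothetical, only local relief and the standard deviation of elevation are computed, and the population form of the standard deviation (dividing by the number of cells) is an assumption.

```python
import math

def window_footprint_m(window, resolution_m):
    """Ground distance covered by one side of the moving window."""
    return window * resolution_m

def valid_shape(rows, cols, window):
    """Size of the output when only windows lying fully inside the DEM are kept."""
    return rows - window + 1, cols - window + 1

def local_relief_and_sd(dem, window):
    """Local relief (max - min) and standard deviation of elevation per window."""
    out_rows, out_cols = valid_shape(len(dem), len(dem[0]), window)
    relief = [[0.0] * out_cols for _ in range(out_rows)]
    sd = [[0.0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        for c in range(out_cols):
            cells = [dem[r + i][c + j] for i in range(window) for j in range(window)]
            mean = sum(cells) / len(cells)
            relief[r][c] = max(cells) - min(cells)
            # Population standard deviation (an assumption of this sketch).
            sd[r][c] = math.sqrt(sum((z - mean) ** 2 for z in cells) / len(cells))
    return relief, sd

# Tiny synthetic DEM: elevation rises by 1 m per row.
toy_dem = [[float(r)] * 5 for r in range(5)]
toy_relief, toy_sd = local_relief_and_sd(toy_dem, 3)
```

With these helpers, a (21x21) window on a 50 m DEM spans 1050 m, close to the 1 km scale break, and a (300 x 320) DEM yields a (280 x 300) output, which matches the matrix size reported above.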
For the whole study area using the EMR1 DEM, some of the results of local measures are shown in Figures 5.8a-c. Since the EMR1 DEM has a resolution of 1 km, smaller window sizes (i.e., 5x5 and 9x9) were used for the extraction of the seven local measures. Each resultant grayscale image has a dimension of (140 x 167) extracted from the center of the original EMR1 DEM of size (160 x 187). A detailed description of each figure is provided below.

Figure 5.7a shows continuous gray tone image representations of local relief (LR) measured from the TRIM93G.SUB and TRIM93H.SUB DEMs using two different window sizes (7x7) and (21x21). The original DEM images of the two subareas are also shown in the figure so that the spatial pattern of the DEM and that of the local relief measure can be compared. Evidently, the patterns of some physical features in the landscape are visible in both the DEM images and the local relief images. For example, features such as the Willow River valley and the Wansa Creek valley (which cut north-south from the top of the region) are evident in subarea 93G. In subarea 93H, the patterns of the predominant McGregor River valley (which cuts southwest-northeast through the region) and the Buchanan Creek valley (which connects to the McGregor River valley from the east) are evident. Figure 5.7b displays graytone image representations of the standard deviation of altitude (SD) calculated in two different moving window sizes from the TRIM93G.SUB and TRIM93H.SUB DEMs. Again, the patterns of the above river valleys can be recognized in the figures. The results of slope angle and aspect calculations from the TRIM93G.SUB and TRIM93H.SUB DEMs are shown in Figure 5.7c.
[Figures 5.8a-c: grayscale images of the local roughness measures for the whole study area (EMR1 DEM).]

Slope and aspect values were first derived based on elevations in a (3x3) neighbourhood and then the slope angle values were averaged for a (7x7) or (21x21) moving window. The results of the original slope and aspect values based on the (3x3) window and the slope values averaged for a (7x7) moving window are presented in the figure for each of the subareas. The aspect values are not used in any further analyses but are included in the above figure for visual examination of the results of the original slope and aspect calculation.
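The two-step slope calculation (a 3x3 estimate followed by window averaging) might be sketched as below. The central-difference gradient is an assumption made for illustration; the formula actually used is the one given in section 4.2, which is not reproduced here, and the function names are hypothetical.

```python
import math

def slope_degrees(dem, cell_size):
    """Slope angle in degrees at interior cells, from central differences
    over the 3x3 neighbourhood (assumed method for this sketch)."""
    rows, cols = len(dem), len(dem[0])
    out = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell_size)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2.0 * cell_size)
            row.append(math.degrees(math.atan(math.hypot(dzdx, dzdy))))
        out.append(row)
    return out

def window_mean(grid, window):
    """Average of a grid over every window that lies fully inside it."""
    rows, cols = len(grid), len(grid[0])
    n = window * window
    return [[sum(grid[r + i][c + j] for i in range(window) for j in range(window)) / n
             for c in range(cols - window + 1)]
            for r in range(rows - window + 1)]

# Inclined plane rising 50 m per 50 m cell: the slope is 45 degrees everywhere.
plane = [[50.0 * c for c in range(11)] for _ in range(11)]
plane_slope = slope_degrees(plane, 50.0)      # 9 x 9 interior cells
plane_slope_avg = window_mean(plane_slope, 7) # 3 x 3 after 7x7 averaging
```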
The slope values averaged for the moving window (21x21) were also obtained for future analysis but are not displayed due to space limitations. Figure 5.7d shows the grayscale image representations of the roughness factor (RF) derived for the two subareas. It was calculated for each point based on the slope value according to Equation 4.7 and then averaged for the moving window of size (7x7) or (21x21). The slope curvature measures (SC) derived for the two subareas using two different moving window sizes are shown in Figure 5.7e. Note that there is evidence of the valley patterns in all of the above figures. Figure 5.7f gives graytone image representations of HP (i.e., the number of DEM points that are higher than the center point of the moving window) derived from the two TRIM subarea DEMs using two different moving window sizes. Finally, Figure 5.7g displays graytone image representations of HI (i.e., hypsometric integral), estimated from the TRIM93G.SUB and TRIM93H.SUB DEMs using two different window sizes (i.e., 7x7 and 21x21). Similarly, for the whole study area, Figure 5.8a shows image representations of local relief and standard deviation of elevations extracted from the 1 km EMR1 DEM using window sizes (5x5) and (9x9), and it also shows image representations of slope and roughness factor derived from the window size (3x3). Slope, slope curvature, and roughness factor measures derived using window sizes (5x5) and (9x9) are displayed in Figure 5.8b. Figure 5.8c gives the results of HP and HI measures derived from the EMR1 DEM using window sizes (5x5) and (9x9). The original EMR1 DEM image is also shown in the figure for comparing the spatial patterns. Clearly, the spatial patterns of the Fraser River valley, McGregor Plateau and Misinchinka Ranges in the region are evident in all the figures. From the above results it can be observed visually that none of these measures varies randomly; all are related to some degree to the spatial variation of terrain.
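The two window-based counts and ratios described above, HP and HI, can be illustrated for a single window. HP follows the definition given in the text. For HI, the sketch assumes the elevation-relief ratio, (mean - min) / (max - min), which is a widely used estimate of the hypsometric integral; whether it matches the estimator of section 4.2 exactly is not confirmed here, and the function names are hypothetical.

```python
def highpt(window_cells):
    """HP: number of cells in a square window that are higher than its centre cell."""
    centre = window_cells[len(window_cells) // 2][len(window_cells[0]) // 2]
    return sum(1 for row in window_cells for z in row if z > centre)

def hypsometric_integral(window_cells):
    """Elevation-relief ratio (mean - min) / (max - min); an assumed estimate of HI.
    Returns None for a perfectly flat window, where the ratio is undefined."""
    cells = [z for row in window_cells for z in row]
    zmin, zmax = min(cells), max(cells)
    if zmax == zmin:
        return None
    return (sum(cells) / len(cells) - zmin) / (zmax - zmin)

# A pit: every neighbour is higher than the centre, and most of the window is high.
pit = [[5.0, 5.0, 5.0], [5.0, 1.0, 5.0], [5.0, 5.0, 5.0]]
# A peak: no neighbour is higher than the centre, and most of the window is low.
peak = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]
```

For a (7x7) window HP can range from 0 to 48, so HP responds to the position of the centre point within its surroundings rather than to the amount of relief.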
There is no doubt that they all exhibit structure which allows one to infer that there is systematic spatial variation of terrain attributes. However, each measure seems to be capturing a different aspect of the surface roughness. Some measures appear to show more similar patterns (e.g., local relief and standard deviation of altitude) than others do (e.g., HP). The different window sizes also have an impact on the results. Since the 'moving window' technique is an example of a low-pass spatial filter, the larger the size of the window, the 'smoother' the result.

5.3 Multivariate Classification Results

In section 5.2 seven commonly used local geometric measures were identified and extracted from study area DEMs. These local variables include: local relief (relief), standard deviation of altitude (std), slope (slope), roughness factor (rough), slope curvature (curv), number of points higher than center point (highpt), and hypsometric integral (hypint). Each of the parameters extracted above is believed to represent one or several aspects of the surface roughness property. Clearly, there is room for differing views on parameters to be chosen for what is essentially the same problem. Murtagh and Heck [1987] identified the following criteria that would help in making the decision towards selecting a particular set of parameters in a particular case: (i) the quality of the data; (ii) the computational ease of measuring certain parameters; (iii) the relevance and importance of the parameters measured relative to the data analysis output (e.g., the classification); (iv) the importance of the parameters relative to theoretical models under investigation; and (v) in the case of very large sets of data, storage considerations may preclude extravagance in the number of parameters used. Surely, criteria (iii) and (iv) are much more important than (i), (ii) and (v).
5.3.1 Variable selection

Among all those identified, it is possible that some of the variables are redundant because of correlation. If variables are highly correlated, they add little new information to a multivariate analysis. Also, as indicated in section 4.4.2, highly correlated variables can create problems in some multivariate analyses. In the limiting case where the variables are perfectly correlated, for example, the transpose of the matrix of variables has no inverse and a determinant of zero will result. Thus, the first step in a multivariate analysis is usually to cross-correlate all variables (i.e., relief, slope, curv, hypint, std, highpt, rough) and to identify those that are significantly correlated. In order to find out if significant correlations exist among the seven variables used in this study, the scattergrams of all the variable pairs were examined and their correlation coefficients were calculated. Figure 5.9a displays the scattergrams of all the variable pairs for the two subareas TRIM 93G and TRIM 93H (N.B. only the result of using window size 7x7 and only a random sample of n=40 data points is displayed for each variable for clearer visualization). Figure 5.9b shows the scattergrams of all the variable pairs for the whole study area. Again, only a random sample of 40 is shown due to the visualization constraints. Table 5.4a lists the rank-order correlation matrix of the seven variables (window size 7x7) for each subarea. Because of the skewed distributions of most variables (hence a statistical lack of normality in the majority of the morphometric parameters) and the obvious non-linear relation between some of the variables (e.g., slope and rough), Spearman's rank-order correlation coefficient (r_s) was used rather than Pearson's linear correlation coefficient. As shown in Figure 3.10, for example, the histogram of slope values for each of the two subareas is not normally distributed but positively skewed.
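The reason for preferring a rank-based coefficient when the relation is monotonic but non-linear can be shown with a small sketch. Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks, with tied values given their average rank; the function names and the toy data are hypothetical and are not taken from the study area.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank-order coefficient: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but strongly non-linear relation (toy data).
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_toy = [v ** 3 for v in x_toy]
r_linear = pearson(x_toy, y_toy)   # below 1 because the relation is curved
r_rank = spearman(x_toy, y_toy)    # 1 because the ordering is preserved
```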
Spearman's rank-order correlation provides a measure very similar to Pearson's correlation coefficient, with the same range of values and an identical interpretation. However, it is based on the differences of ranks between the two variables rather than the values themselves and their covariance. The diagonal of each correlation matrix table, from the top left corner, is the correlation of each variable with itself (r=+1.0). The triangle of the data below that line contains the same information as the triangle above the line (row for column). Therefore, for comparison purposes, the results for subarea TRIM 93G are shown above the diagonal line and below the line are the results for subarea TRIM 93H. Table 5.4b lists the rank-order correlation results of the variables extracted from the whole study area EMR1 DEM. Above the diagonal line are the results for window size (5x5) and below the line are the results for window size (9x9).

Figure 5.9a The scattergrams of all the variable pairs (window size: 7x7) for subareas 93G (above the diagonal) and 93H (below the diagonal) (n=40).

Figure 5.9b The scattergrams of all the variable pairs (above the diagonal for window size 5x5 and below the diagonal for window size 9x9) for the whole study area (n=40).

[Tables 5.4a and 5.4b: rank-order correlation matrices of the seven variables (relief, slope, curv, hypint, std, highpt, rough).]

It can be seen from the scattergrams in Figure 5.9a and the correlation matrix in Table 5.4a that, for both subareas, variables rough and slope are highly correlated and variable relief is highly correlated with std. This is no surprise given that rough is derived directly from slope and that both relief and std are descriptions of elevation range (see sections 4.2.1 to 4.2.4); both belong to one group of geomorphometric measures as identified by Pike [1988a,b]. Variables slope and rough are also highly correlated with both relief and std. Based on Figure 5.9b and Table 5.4b, the same conclusions can be drawn for the variables extracted from the EMR1 DEM for the whole study area using the different window sizes. Considering these correlations, only variable groups 3-5 (i.e., curv, hypint, and std), 3-6 (i.e., curv, hypint, std, and highpt), 2 (i.e., slope only) and 5 (i.e., std only) are used for hierarchical clustering in the following section. The largest subset group chosen to be tested includes only variables #3 to #6 because the other variables #1, #2, and #7 (relief, slope, and rough) are all highly correlated with variable #5 (std). Variables relief and std both describe elevation range (the basic topographic measure of the vertical dimension of the terrain) and they are highly correlated with each other. However, the variable std was included in the above subset group (i.e., 3-6) instead of relief mainly because the former is more statistically sound and stable (see discussions in section 4.2.2).
Although the calculation of relief is computationally less demanding, the automatic extraction of the variables from DEMs using current computer power has made this consideration insignificant. Variable group 3-5 was also chosen to be tested because variable #6 (highpt) appears to emphasize more local terrain discontinuities and is very different from all other variables. It would be interesting to compare the clustering results with and without this variable included in the group. Finally, two single variable groups (i.e., 2 or 5 only) were selected as well in order to compare the differences between classifications using single-variable and multivariate signatures, that is, to see whether a variable such as slope or std alone can be effective in characterizing the roughness of terrain for the purpose of DEM error modelling.

5.3.2 Grouping of homogeneous areas

Two different grouping techniques are widely used in image analysis: supervised classification (i.e., discriminant function analysis), and unsupervised classification (i.e., cluster analysis). Discriminant analysis uses a priori knowledge for the selection of a discriminant function which separates individual classes in multi-dimensional variable space. Supervised classifiers require that the objectives of the classification and the class properties be known and clear; the number of groups in a discriminant function is set prior to the analysis. On the other hand, cluster analysis requires little or no prior knowledge about the number and nature of classes in the area. The number of clusters that will emerge from a classification scheme cannot ordinarily be predetermined. Thus, cluster analysis is commonly used to get a first impression of the regional partition [Weibel and DeLotto, 1988]. For the purpose of classifying the roughness of terrain at different scale levels, unsupervised hierarchical clustering seems to be the more appropriate method.
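The idea of unsupervised hierarchical clustering on a multivariate signature can be sketched as follows. This is a deliberately simple agglomerative scheme that repeatedly merges the two clusters with the closest centroids after standardizing each variable; both the standardization and the centroid linkage are assumptions of this sketch, the clustering procedure actually used is the one described in Chapter 4, and the sample values are invented for illustration.

```python
import math

def standardize(rows):
    """Convert each variable (column) to z-scores so that no variable dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column would give a zero deviation; fall back to 1.0 in that case.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def agglomerate(points, nclass):
    """Merge the two clusters with the closest centroids until nclass remain.
    Returns one label (1..nclass) per input point."""
    clusters = [[i] for i in range(len(points))]
    ndim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(ndim)]

    while len(clusters) > nclass:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k + 1
    return labels

# Invented (curv, hypint, std) signatures for nine grid points in three terrain types.
signatures = [
    [0.9, 0.57, 2.0], [1.0, 0.55, 2.2], [0.8, 0.56, 1.9],      # smooth
    [2.9, 0.42, 10.1], [3.0, 0.43, 10.3], [2.8, 0.41, 9.9],    # intermediate
    [6.3, 0.57, 17.0], [6.2, 0.58, 16.8], [6.4, 0.56, 17.2],   # rough
]
terrain_labels = agglomerate(standardize(signatures), 3)
```

Because the merging is hierarchical, stopping at a different nclass only splits or joins existing clusters. The naive search above is cubic in the number of points and is only meant for small illustrative samples.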
Although the classification itself is point-based and not based on context-sensitivity, the variables are extracted from a local neighbourhood (i.e., a moving window) and thus reflect texture at different scales. Figures 5.10a-d and 5.11a-d show results of 8 different classification tests for each of the two subareas TRIM 93G and 93H. Variations tested include different moving window sizes used in the variable extraction (window = 21x21 or 7x7), and different variable groups used for clustering (variables = 3-5, 3-6, 2 or 5 only). Figure 5.10a, for example, displays the three terrain clusters derived from variable group (3-5) and moving window size (21x21) for the two subareas. In order to make comparison of the figures easier, the roughest cluster (see the discussion below) is always shown as black and the least rough cluster is shown as white. Because of limited computing power, only a submatrix of size 220 x 240 (extracted from the center of the original variable matrix of size 280 x 300 in each subarea) was used in the classification (see Figure 5.7a). Figures 5.12a-d show some examples of classification results for the whole study area. Variations tested include window size (5x5) or (9x9) for the variable extraction, and variable group (3-5), (3-6), (2) or (5) for the multivariate clustering. Figure 5.12a, for example, displays the three terrain clusters derived from variable group (3-5) and moving window sizes (5x5) and (9x9). Not all the classification results are presented in the above figures.

[Figures 5.10a-d, 5.11a-d and 5.12a-d: terrain classification maps for subareas 93G and 93H and for the whole study area.]

It is visually evident, from the examples of the terrain classification results for the whole study area and the two subareas, that the various terrain clusters are not randomly distributed but closely related to the variation and complexity of terrain. In other words, the spatial pattern of the clusters appears to reflect, to some degree, the variability and roughness characteristics of the study area surfaces. As can be seen in Figure 5.10a, for example, the roughest cluster (shown as black) for subarea 93G includes the area around the Willow River Valley where there is high terrain variability. The area on the south-west side of the region is the second roughest cluster (shown as dark gray). The area on the north-east side of the region is the smoothest (shown as white) of all three terrain classes derived from variable group (3-5) and window size (21x21), and it obviously corresponds to a very flat area in the region. Similarly, for subarea 93H, the three terrain clusters seem to follow the patterns of the McGregor River Valley and various ridges around Mt. Sir Alexander in the region.
From Figures 5.12a-d the general patterns of the Fraser River Valley and the McGregor Plateau on the south-west side and the Misinchinka and Hart Ranges on the north-east side are reflected in the terrain classification maps derived using various window sizes and variable groups. However, the various clustering results do look somewhat different from one another for each surface. For example, depending on the size of the moving window originally used for variable extraction, a different classification outcome results. If "smooth" variables (i.e., derived by using a larger moving window) are used (e.g., Figures 5.10a-d), the classification becomes more homogeneous than with "rough" variables (i.e., derived by using a smaller moving window) (e.g., Figures 5.11a-d). Likewise, the number of variables in the geometric signature used for the multivariate clustering also affects the classification result (e.g., compare Figures 5.10a-d, Figures 5.11a-d, or Figures 5.12a-d). For example, the classification results based on variable group (3-6) tend to show the roughest cluster as being more "peppery" when compared to the results from other variable groups. This is probably because variable #6 (highpt) emphasizes more local terrain discontinuities and is very different from all other variables.

5.3.3 Interpretation of classification results

The multivariate classification results presented above need to be analyzed in relation to the original variables in order to fully understand the characteristics of various terrain clusters and to aid in the interpretation of the classification results. This is done by generating the summary statistics of each local roughness variable within the terrain clusters. Table 5.5a gives the mean and the standard deviation of local relief values and the number of points in each terrain cluster at various hierarchical levels with different numbers of resulting clusters (nclass = 2, 3, 4, 5, or 6) for both subareas (93G and 93H).
In this table, the terrain clusters were derived from variable group (3-5) and the variables were extracted using window size (7x7). Tables 5.5b-g show the results of summary statistics for local roughness variable slope, curv, hypint, std, highpt, and rough respectively (window size: 7x7; variable group: 35; subarea: 93G and 93H). 242 Table 5.5a Summary statistics of local relief values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H. Class* # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 42616 17.1 9.7 33340 231.0 117.9 2 10184 40.6 13.0 19460 95.4 41.1 1 10184 40.6 13.0 24142 262.1 109.3 2 24665 20.2 10.5 9198 149.4 99.2 3 17951 12.8 6.3 19460 95.4 41.1 1 24665 20.2 10.5 20025 267.9 111.9 2 8563 37.2 9.8 9198 149.4 99.2 3 17951 12.8 6.3 19460 95.4 41.1 4 1621 58.7 13.0 4117 233.7 90.1 1 17711 24.9 8.2 9198 149.4 99.2 2 8563 37.2 9.8 19460 95.4 41.1 3 17951 12.8 6.3 13670 212.9 69.1 4 1621 58.7 13.0 6355 386.2 93.4 5 6954 8.0 3.7 4117 233.7 90.1 1 8563 37.2 9.8 19460 95.4 41.1 2 17951 12.8 6.3 13670 212.9 69.1 3 6654 22.4 7.4 6355 386.2 93.4 4 1621 58.7 13.0 6219 99.8 54.8 5 11057 26.5 8.3 4117 233.7 90.1 6 6954 8.0 3.7 2979 252.8 91.2 * Note that the classes do not remain consistent down the table. That is, class 1 at one level does not necessarily remain class 1 at the next level. 243 Table 5.5b Summary statistics of slope values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H. 
# of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 42616 3.62 1.76 33340 29.46 11.37 2 10184 7.78 2.69 19460 15.04 5.41 1 10184 7.78 2.69 24142 32.78 9.82 2 24665 4.18 1.86 9198 20.74 10.50 3 17951 2.84 1.27 19460 15.04 5.41 1 24665 4.18 1.86 20025 33.02 9.93 2 8563 7.02 1.69 9198 20.74 10.50 3 17951 2.84 1.27 19460 15.04 5.41 4 1621 11.79 3.34 4117 31.60 9.18 1 17711 5.05 1.38 9198 20.74 10.50 2 8563 7.02 1.69 19460 15.04 5.41 3 17951 2.84 1.27 13670 28.60 6.77 4 1621 11.79 3.34 6355 42.55 8.91 5 6954 1.96 0.75 4117 31.60 9.18 1 8563 7.02 1.69 19460 15.04 5.41 2 17951 2.84 1.27 13670 28.60 6.77 3 6654 4.64 1.40 6355 42.55 8.91 4 1621 11.79 3.34 6219 15.72 7.21 5 11057 5.30 1.31 4117 31.60 9.18 6 6954 1.96 0.75 2979 31.22 8.35 Class 244 Table 5.5c Summary statistics of slope curvature values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H. # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 42616 1.55 0.73 33340 8.53 3.78 2 10184 3.41 1.63 19460 3.69 1.48 1 10184 3.41 1.63 24142 8.11 3.36 2 24665 1.65 0.79 9198 9.63 4.53 3 17951 1.40 0.61 19460 3.69 1.48 1 24665 1.65 0.79 20025 7.25 2.61 2 8563 2.87 0.90 9198 9.63 4.53 3 17951 1.40 0.61 19460 3.69 1.48 4 1621 6.26 1.65 4117 12.27 3.51 1 17711 1.96 0.71 9198 9.63 4.53 2 8563 2.87 0.90 19460 3.69 1.48 3 17951 1.40 0.61 13670 6.51 2.21 4 1621 6.26 1.65 6355 8.84 2.69 5 6954 0.88 0.29 4117 12.27 3.51 1 8563 2.87 0.90 19460 3.69 1.48 2 17951 1.40 0.61 13670 6.51 2.21 3 6654 2.14 0.93 6355 8.84 2.69 4 1621 6.26 1.65 6219 7.07 2.63 5 11057 1.85 0.50 4117 12.27 3.51 6 6954 0.88 0.29 2979 14.99 2.55 Class 245 Table 5.5d Class Summary statistics of hypint values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H. 
nclass | Class | # of points (93G) | Mean (93G) | Std.dev (93G) | # of points (93H) | Mean (93H) | Std.dev (93H)
2 | 1 | 42616 | 0.48 | 0.11 | 33340 | 0.45 | 0.10
2 | 2 | 10184 | 0.44 | 0.11 | 19460 | 0.49 | 0.06
3 | 1 | 10184 | 0.44 | 0.11 | 24142 | 0.50 | 0.08
3 | 2 | 24665 | 0.55 | 0.07 | 9198 | 0.32 | 0.05
3 | 3 | 17951 | 0.38 | 0.07 | 19460 | 0.49 | 0.06
4 | 1 | 24665 | 0.55 | 0.07 | 20025 | 0.47 | 0.05
4 | 2 | 8563 | 0.42 | 0.09 | 9198 | 0.32 | 0.05
4 | 3 | 17951 | 0.38 | 0.07 | 19460 | 0.49 | 0.06
4 | 4 | 1621 | 0.57 | 0.14 | 4117 | 0.62 | 0.06
5 | 1 | 17711 | 0.54 | 0.08 | 9198 | 0.32 | 0.05
5 | 2 | 8563 | 0.42 | 0.09 | 19460 | 0.49 | 0.06
5 | 3 | 17951 | 0.38 | 0.07 | 13670 | 0.47 | 0.04
5 | 4 | 1621 | 0.57 | 0.14 | 6355 | 0.47 | 0.07
5 | 5 | 6954 | 0.57 | 0.05 | 4117 | 0.62 | 0.06
6 | 1 | 8563 | 0.42 | 0.09 | 19460 | 0.49 | 0.06
6 | 2 | 17951 | 0.38 | 0.07 | 13670 | 0.47 | 0.04
6 | 3 | 6654 | 0.62 | 0.05 | 6355 | 0.47 | 0.07
6 | 4 | 1621 | 0.57 | 0.14 | 6219 | 0.32 | 0.05
6 | 5 | 11057 | 0.49 | 0.04 | 4117 | 0.62 | 0.06
6 | 6 | 6954 | 0.57 | 0.05 | 2979 | 0.31 | 0.06

Table 5.5e Summary statistics of standard deviation of elevation values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass | Class | # of points (93G) | Mean (93G) | Std.dev (93G) | # of points (93H) | Mean (93H) | Std.dev (93H)
2 | 1 | 42616 | 4.37 | 2.56 | 33340 | 61.48 | 33.27
2 | 2 | 10184 | 11.23 | 4.09 | 19460 | 25.22 | 11.00
3 | 1 | 10184 | 11.23 | 4.09 | 24142 | 70.56 | 31.09
3 | 2 | 24665 | 5.24 | 2.78 | 9198 | 37.66 | 26.28
3 | 3 | 17951 | 3.18 | 1.60 | 19460 | 25.22 | 11.00
4 | 1 | 24665 | 5.24 | 2.78 | 20025 | 72.55 | 31.51
4 | 2 | 8563 | 10.14 | 2.88 | 9198 | 37.66 | 26.28
4 | 3 | 17951 | 3.18 | 1.60 | 19460 | 25.22 | 11.00
4 | 4 | 1621 | 16.99 | 4.68 | 4117 | 60.84 | 26.93
5 | 1 | 17711 | 6.50 | 2.19 | 9198 | 37.66 | 26.28
5 | 2 | 8563 | 10.14 | 2.88 | 19460 | 25.22 | 11.00
5 | 3 | 17951 | 3.18 | 1.60 | 13670 | 56.70 | 18.17
5 | 4 | 1621 | 16.99 | 4.68 | 6355 | 106.65 | 26.75
5 | 5 | 6954 | 2.04 | 0.94 | 4117 | 60.84 | 26.93
6 | 1 | 8563 | 10.14 | 2.88 | 19460 | 25.22 | 11.00
6 | 2 | 17951 | 3.18 | 1.60 | 13670 | 56.70 | 18.17
6 | 3 | 6654 | 5.64 | 1.87 | 6355 | 106.65 | 26.75
6 | 4 | 1621 | 16.99 | 4.68 | 6219 | 24.86 | 13.83
6 | 5 | 11057 | 7.02 | 2.21 | 4117 | 60.84 | 26.93
6 | 6 | 6954 | 2.04 | 0.94 | 2979 | 64.40 | 26.02

Table 5.5f Summary statistics of highpt values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.
nclass | Class | # of points (93G) | Mean (93G) | Std.dev (93G) | # of points (93H) | Mean (93H) | Std.dev (93H)
2 | 1 | 42616 | 20.80 | 9.93 | 33340 | 23.63 | 5.84
2 | 2 | 10184 | 23.56 | 7.26 | 19460 | 23.59 | 5.26
3 | 1 | 10184 | 23.56 | 7.26 | 24142 | 22.56 | 5.45
3 | 2 | 24665 | 19.45 | 9.54 | 9198 | 26.42 | 5.90
3 | 3 | 17951 | 22.66 | 10.15 | 19460 | 23.59 | 5.26
4 | 1 | 24665 | 19.45 | 9.54 | 20025 | 23.27 | 4.84
4 | 2 | 8563 | 23.63 | 7.03 | 9198 | 26.42 | 5.90
4 | 3 | 17951 | 22.66 | 10.15 | 19460 | 23.59 | 5.26
4 | 4 | 1621 | 23.15 | 8.35 | 4117 | 19.14 | 6.82
5 | 1 | 17711 | 20.24 | 8.80 | 9198 | 26.42 | 5.90
5 | 2 | 8563 | 23.63 | 7.03 | 19460 | 23.59 | 5.26
5 | 3 | 17951 | 22.66 | 10.15 | 13670 | 23.31 | 4.97
5 | 4 | 1621 | 23.15 | 8.35 | 6355 | 23.17 | 4.54
5 | 5 | 6954 | 17.43 | 10.95 | 4117 | 19.14 | 6.82
6 | 1 | 8563 | 23.63 | 7.03 | 19460 | 23.59 | 5.26
6 | 2 | 17951 | 22.66 | 10.15 | 13670 | 23.31 | 4.97
6 | 3 | 6654 | 17.71 | 9.50 | 6355 | 23.17 | 4.54
6 | 4 | 1621 | 23.15 | 8.35 | 6219 | 26.38 | 6.33
6 | 5 | 11057 | 21.77 | 7.97 | 4117 | 19.14 | 6.82
6 | 6 | 6954 | 17.43 | 10.95 | 2979 | 26.50 | 4.88

Table 5.5g Summary statistics of rough values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass | Class | # of points (93G) | Mean (93G) | Std.dev (93G) | # of points (93H) | Mean (93H) | Std.dev (93H)
2 | 1 | 42616 | 0.29 | 0.25 | 33340 | 17.44 | 11.06
2 | 2 | 10184 | 1.23 | 1.01 | 19460 | 4.09 | 2.63
3 | 1 | 10184 | 1.23 | 1.01 | 24142 | 20.30 | 10.55
3 | 2 | 24665 | 0.36 | 0.28 | 9198 | 9.95 | 8.61
3 | 3 | 17951 | 0.18 | 0.15 | 19460 | 4.09 | 2.63
4 | 1 | 24665 | 0.36 | 0.28 | 20025 | 20.48 | 10.82
4 | 2 | 8563 | 0.92 | 0.42 | 9198 | 9.95 | 8.61
4 | 3 | 17951 | 0.18 | 0.15 | 19460 | 4.09 | 2.63
4 | 4 | 1621 | 2.88 | 1.52 | 4117 | 19.44 | 9.09
5 | 1 | 17711 | 0.48 | 0.25 | 9198 | 9.95 | 8.61
5 | 2 | 8563 | 0.92 | 0.42 | 19460 | 4.09 | 2.63
5 | 3 | 17951 | 0.18 | 0.15 | 13670 | 14.95 | 6.53
5 | 4 | 1621 | 2.88 | 1.52 | 6355 | 32.35 | 8.39
5 | 5 | 6954 | 0.08 | 0.05 | 4117 | 19.44 | 9.09
6 | 1 | 8563 | 0.92 | 0.42 | 19460 | 4.09 | 2.63
6 | 2 | 17951 | 0.18 | 0.15 | 13670 | 14.95 | 6.53
6 | 3 | 6654 | 0.43 | 0.27 | 6355 | 32.35 | 8.39
6 | 4 | 1621 | 2.88 | 1.52 | 6219 | 5.40 | 4.01
6 | 5 | 11057 | 0.50 | 0.23 | 4117 | 19.44 | 9.09
6 | 6 | 6954 | 0.08 | 0.05 | 2979 | 19.44 | 7.88

From Table 5.5a, it appears that at level 2 (i.e., nclass = 2) for subarea 93G, cluster {2} is "rougher" than cluster {1} (i.e., in an order of {2,1}) in terms of "local relief". The average local relief value is 40.6 m for cluster {2} and only 17.1 m for cluster {1}.
For subarea 93H, cluster {1} is clearly much "rougher" than cluster {2} at level 2 because the average local relief value is 231.0 m for cluster {1} and only 95.4 m for cluster {2}. The order of the clusters according to local relief values for subarea 93H is thus {1,2}. At level 3 (i.e., nclass = 3) for subarea 93G, cluster {1} is now the roughest, followed by cluster {2} and then {3} (i.e., an order of {1,2,3}). At level 3 for subarea 93H, cluster {1} is likewise the roughest, followed by {2} and then {3} (i.e., an order of {1,2,3}). Following the same notion, the interpretation of each cluster (derived from variable group 3-5 and window size 7x7) at the various hierarchical levels (2 to 6) in terms of local relief can be summarized for both subareas. For subarea 93G, the orders at the successive levels are: {2,1}, {1,2,3}, {4,2,1,3}, {4,2,1,3,5}, and {4,1,5,3,2,6}. For subarea 93H, the orders are: {1,2}, {1,2,3}, {1,4,2,3}, {4,5,3,1,2} and {3,6,5,2,4,1}. It should be noted, however, that the differences between local relief values in some of the clusters become less and less significant at higher levels. For example, comparing clusters {1} and {4} at level 6 for subarea 93H, the difference between the means of local relief values in these two clusters is only 4.4 m, whereas the standard deviation of local relief values within each cluster is as high as 41.1 m for {1} and 54.8 m for {4}.
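The ordering used here amounts to sorting the cluster labels by their mean value of the chosen variable. As a minimal sketch (the function name `rank_clusters` is illustrative and not part of the original analysis), the level 6 orders for local relief can be reproduced from the means in Table 5.5a:

```python
def rank_clusters(means):
    """Return cluster labels ordered from roughest (largest mean) to least rough."""
    return sorted(means, key=means.get, reverse=True)


# Mean local relief (m) per cluster at level 6 (nclass = 6), taken from Table 5.5a.
relief_93g = {1: 37.2, 2: 12.8, 3: 22.4, 4: 58.7, 5: 26.5, 6: 8.0}
relief_93h = {1: 95.4, 2: 212.9, 3: 386.2, 4: 99.8, 5: 233.7, 6: 252.8}

order_93g = rank_clusters(relief_93g)
order_93h = rank_clusters(relief_93h)
print(order_93g)  # [4, 1, 5, 3, 2, 6]
print(order_93h)  # [3, 6, 5, 2, 4, 1]
```

Sorting on the mean alone ignores the within-cluster spread, which is why neighbouring clusters in an order (such as {4} and {1} for 93H) may differ by far less than one standard deviation.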
Similarly, the characteristics of all seven local roughness variables within each terrain cluster derived from variable group (3-5) and window size (7x7) can be summarized as follows for each subarea, based on information from Tables 5.5a-g:

1) For subarea 93G
"relief": {2,1} {1,2,3} {4,2,1,3} {4,2,1,3,5} {4,1,5,3,2,6}
"slope": {2,1} {1,2,3} {4,2,1,3} {4,2,1,3,5} {4,1,5,3,2,6}
"curv": {2,1} {1,2,3} {4,2,1,3} {4,2,1,3,5} {4,1,3,5,2,6}
"hypint": {1,2} {2,1,3} {4,1,2,3} {4,5,1,2,3} {3,4,6,5,1,2}
"std": {2,1} {1,2,3} {4,2,1,3} {4,2,1,3,5} {4,1,5,3,2,6}
"highpt": {2,1} {1,3,2} {2,4,3,1} {2,4,3,1,5} {1,4,2,5,3,6}
"rough": {2,1} {1,2,3} {4,2,1,3} {4,2,1,3,5} {4,1,5,3,2,6}

2) For subarea 93H
"relief": {1,2} {1,2,3} {1,4,2,3} {4,5,3,1,2} {3,6,5,2,4,1}
"slope": {1,2} {1,2,3} {1,4,2,3} {4,5,3,1,2} {3,5,6,2,4,1}
"curv": {1,2} {2,1,3} {4,2,1,3} {5,1,4,3,2} {6,5,3,4,2,1}
"hypint": {2,1} {1,3,2} {4,3,1,2} {5,2,3,4,1} {5,1,2,3,4,6}
"std": {1,2} {1,2,3} {1,4,2,3} {4,5,3,1,2} {3,6,5,2,1,4}
"highpt": {1,2} {2,3,1} {2,3,1,4} {1,2,3,4,5} {6,4,1,2,3,5}
"rough": {1,2} {1,2,3} {1,4,2,3} {4,5,3,1,2} {3,5,6,2,4,1}

From the ordering of the terrain clusters according to the various local roughness variables, several observations can be made. First, for both subareas, the orders of the clusters at the various levels according to variable slope are identical to the orders given by variable rough (at least from levels 2 to 6). This is no surprise because the two variables are highly positively correlated (r = 0.99), as indicated in section 5.3.1. Second, for subarea 93G, the orders given by relief, slope, std, and rough are the same at all levels (2 to 6). For subarea 93H, they are the same for levels 2 to 5, with only a small difference among them at level 6. Again, this can be accounted for by their positive correlations (r_s ranging from 0.93 to 0.99, as seen in Table 5.4a).
Third, as a result of the relatively high correlations between curv and the above four variables (r_s ranging from 0.74 to 0.86, as seen in Table 5.4a) in the flatter subarea, the orders of clusters given by curv for subarea 93G are the same as those given by the above four variables for levels 2 to 5, with only a slight difference at level 6. For the rougher subarea 93H, the variable curv is not as highly correlated with variables relief, slope, std and rough (r_s ranging from 0.51 to 0.64, as seen in Table 5.4a). As a result, the orders given by curv for subarea 93H are quite different from those given by the above four variables. Fourth, since the variable highpt for subarea 93G has slightly positive correlations with variables relief, slope, std, rough, and curv (r_s ranging from 0.14 to 0.16, as seen in Table 5.4a), the orders given by this variable have some similarity to those given by the above five variables. For the rougher subarea 93H, the variable highpt has no or slightly negative correlations with the five variables (r_s ranging from -0.07 to 0.03, as seen in Table 5.4a). Therefore, the orders given by highpt are very different from those given by the above five variables and are sometimes even reversed. Finally, for both subareas, because of some negative correlations between highpt and hypint (-0.23 and -0.27 respectively for 93G and 93H), the orders given by these two variables are mostly reversed, especially for subarea 93H. At level 6 for subarea 93H, for example, cluster {5} is the roughest and {6} is the least rough cluster according to hypint, whereas according to highpt the reverse is true, with cluster {6} being the roughest.

The above observations can be used to understand or interpret the characteristics of each of the terrain clusters at various levels derived from variable group (3-5) and window size (7x7) in relation to the original local roughness measures.
For example, take the classification results (3 classes; variable group 3-5; window size 7x7) for the two subareas as shown in Figure 5.11a. In subarea 93G, cluster {1} seems to represent the roughest area overall. This cluster has, on average, the highest local relief, the highest slope, the highest slope curvature, the highest standard deviation of elevations, the highest highpt (i.e., number of points that are higher than the center point of the moving window), and the highest roughness factor. Only the mean hypsometric integral value in this cluster is not the highest of the three. On the other hand, cluster {3} seems to represent the least rough area in 93G. All seven variables have the lowest average values in this cluster, except that the highpt value in this cluster is the second lowest of the three. For subarea 93H, it is more difficult to point out which cluster at this level is the roughest. However, it is clear that cluster {1} indicates an area with relatively high average relief, slope, hypint, std, and rough values, a medium curv value, and a relatively low highpt value. Cluster {3}, on the other hand, represents an area with relatively low average relief, slope, curv, std, and rough values, and medium hypint and highpt values.

For terrain clusters derived from variable group (3-5) and variables extracted using window size (21x21), the summary statistics of each variable within the various terrain clusters for the two subareas were calculated, but are not shown.
From these summary statistics, the characteristics of all seven local roughness variables within each terrain cluster derived from variable group (3-5) and window size (21x21) can be summarized as follows:

1) For subarea 93G
"relief": {1,2} {3,1,2} {1,4,2,3} {5,3,4,1,2} {5,2,3,4,6,1}
"slope": {1,2} {3,1,2} {4,1,2,3} {5,3,4,1,2} {5,2,4,3,6,1}
"curv": {1,2} {3,1,2} {4,1,2,3} {3,5,4,1,2} {2,5,3,4,6,1}
"hypint": {1,2} {3,1,2} {4,2,3,1} {3,1,5,2,4} {2,6,4,5,1,3}
"std": {1,2} {3,1,2} {1,4,2,3} {5,4,3,1,2} {5,3,2,4,6,1}
"highpt": {2,1} {2,1,3} {1,3,4,2} {4,2,5,3,1} {3,1,5,4,2,6}
"rough": {1,2} {3,1,2} {4,1,2,3} {3,5,4,1,2} {2,5,4,3,6,1}

2) For subarea 93H
"relief": {1,2} {3,2,1} {2,1,3,4} {1,3,4,2,5} {2,3,4,5,1,6}
"slope": {1,2} {3,2,1} {2,1,3,4} {4,1,3,2,5} {2,4,3,5,1,6}
"curv": {1,2} {3,2,1} {2,1,4,3} {1,4,3,5,2} {2,4,5,3,6,1}
"hypint": {1,2} {2,1,3} {1,3,2,4} {4,3,2,1,5} {4,3,1,2,5,6}
"std": {1,2} {3,2,1} {2,1,3,4} {1,3,4,2,5} {2,3,4,5,1,6}
"highpt": {2,1} {1,3,2} {4,3,2,1} {5,2,1,3,4} {6,1,5,2,3,4}
"rough": {1,2} {3,2,1} {2,1,3,4} {4,1,3,2,5} {2,4,3,5,1,6}

Based on the ordering of the terrain clusters according to the various local roughness variables as listed above, similar interpretations can be made regarding the characteristics of each of the terrain clusters at various levels derived from variable group (3-5) and window size (21x21) in relation to the original local roughness measures. It should be noted, however, that the variable hypint agrees more with the others for both subareas when using the larger window size (21x21) than when using the smaller one (7x7). This is probably because, as mentioned in section 4.2, the estimation of the hypsometric integral based on only a few points (i.e., 7x7) within the moving window is not very accurate. Thus, it might not be a suitable "local measure" to use when the moving window size is small.
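One way to put a number on how closely two of the orderings listed above agree is a rank correlation computed on the position of each cluster in the two orders. The sketch that follows is an illustration only: the orderings in this section are compared by inspection, Spearman's formula is swapped in here as a convenient summary, and the helper name `order_agreement` is hypothetical. The inputs are the level 6 orders for subarea 93H with window size 21x21.

```python
def order_agreement(order_a, order_b):
    """Spearman rank correlation between two orderings of the same cluster labels.

    Returns 1.0 for identical orders and -1.0 for exactly reversed orders.
    """
    n = len(order_a)
    pos_a = {cluster: i for i, cluster in enumerate(order_a)}
    pos_b = {cluster: i for i, cluster in enumerate(order_b)}
    # Sum of squared differences in rank position, cluster by cluster.
    d2 = sum((pos_a[c] - pos_b[c]) ** 2 for c in order_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Level 6 orders (roughest to least rough), subarea 93H, window size 21x21.
relief = [2, 3, 4, 5, 1, 6]
std = [2, 3, 4, 5, 1, 6]
hypint = [4, 3, 1, 2, 5, 6]
highpt = [6, 1, 5, 2, 3, 4]

print(order_agreement(relief, std))               # 1.0
print(round(order_agreement(hypint, highpt), 2))  # -0.83
```

The strongly negative value for hypint against highpt matches the earlier remark that the orders given by these two variables are mostly reversed.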
For terrain clusters derived from different variable groups (i.e., 3-6, 2 or 5), the summary statistics of the various local roughness measures are not shown. The characteristics of all seven local roughness variables within each terrain cluster derived from different variable groups and window sizes can be summarized in the same way as above.

For the whole study area, the summary statistics of each variable within the various hierarchical terrain clusters derived from variable group (3-5) and two different window sizes (5x5 and 9x9) are presented in Tables 5.6a-g. The rankings (roughest to least rough cluster) of each local roughness variable at different levels (2 to 6) are summarized as follows:

1) For window size 5x5
"relief": {2,1} {1,2,3} {2,3,1,4} {1,2,5,3,4} {1,2,5,6,3,4}
"slope": {2,1} {1,2,3} {2,3,1,4} {1,2,5,3,4} {1,5,2,6,3,4}
"curv": {2,1} {1,2,3} {3,2,1,4} {2,1,5,4,3} {2,1,5,6,4,3}
"hypint": {2,1} {3,1,2} {4,2,1,3} {3,1,5,4,2} {5,3,1,6,4,2}
"std": {2,1} {1,2,3} {2,3,1,4} {1,2,5,3,4} {1,2,5,6,3,4}
"highpt": {1,2} {2,1,3} {3,1,4,2} {2,4,5,3,1} {2,4,6,1,3,5}
"rough": {2,1} {1,2,3} {2,3,1,4} {1,2,5,3,4} {1,2,5,6,3,4}

2) For window size 9x9
"relief": {1,2} {2,3,1} {1,2,4,3} {2,3,1,5,4} {1,5,2,6,4,3}
"slope": {1,2} {2,3,1} {1,2,4,3} {2,3,1,5,4} {1,2,5,6,4,3}
"curv": {1,2} {3,2,1} {2,1,4,3} {2,1,3,5,4} {5,1,2,6,4,3}
"hypint": {1,2} {2,1,3} {1,3,2,4} {3,4,2,1,5} {2,3,1,6,5,4}
"std": {1,2} {2,3,1} {1,2,4,3} {2,1,3,5,4} {1,5,2,6,4,3}
"highpt": {2,1} {3,1,2} {2,4,3,1} {1,5,4,2,3} {5,4,6,3,1,2}
"rough": {1,2} {2,3,1} {1,2,4,3} {2,3,1,5,4} {1,5,2,6,4,3}

From the above ordering of the terrain clusters according to the various local roughness variables, observations similar to those made for the two subareas can also be made for the whole study area.
These observations can then be used to interpret the characteristics of each of the terrain clusters at various levels derived from variable group (3-5) and window size (5x5) or (9x9) in relation to the original local roughness measures. For example, take the classification results (3 classes; variable group 3-5; window size 5x5) as shown in Figure 5.12a. Cluster {1} (shown as black in the figure) represents the roughest area overall. This cluster has, on average, the highest local relief, the highest slope, the highest slope curvature, the highest standard deviation of elevations, and the highest roughness factor. Only the mean hypsometric integral value and the highpt value in this cluster are not the highest of the three. On the other hand, cluster {3} (shown as white in the figure) seems to represent the least rough area. All seven variables have the lowest average values in this cluster except the

Table 5.6a Summary statistics of local relief values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.
nclass | Class | # of points (5x5) | Mean (5x5) | Std.dev (5x5) | # of points (9x9) | Mean (9x9) | Std.dev (9x9)
2 | 1 | 12816 | 273.1 | 182.1 | 15399 | 923.3 | 254.7
2 | 2 | 10564 | 753.1 | 212.2 | 7981 | 284.8 | 133.9
3 | 1 | 10564 | 753.1 | 212.2 | 7981 | 284.8 | 133.9
3 | 2 | 7512 | 333.5 | 194.2 | 10674 | 969.9 | 263.3
3 | 3 | 5304 | 187.6 | 119.1 | 4725 | 817.8 | 197.1
4 | 1 | 7512 | 333.5 | 194.2 | 10674 | 969.9 | 263.3
4 | 2 | 8313 | 768.1 | 201.6 | 4725 | 817.8 | 197.1
4 | 3 | 2251 | 697.6 | 239.4 | 4951 | 245.3 | 106.9
4 | 4 | 5304 | 187.6 | 119.1 | 3030 | 349.2 | 148.0
5 | 1 | 8313 | 768.1 | 201.55 | 4725 | 817.8 | 197.1
5 | 2 | 2251 | 697.6 | 239.41 | 5885 | 1086.5 | 253.3
5 | 3 | 5304 | 187.6 | 119.15 | 4789 | 826.7 | 196.2
5 | 4 | 4100 | 185.7 | 89.75 | 4951 | 245.3 | 106.9
5 | 5 | 3412 | 511.2 | 124.55 | 3030 | 349.2 | 148.0
6 | 1 | 4452 | 859.2 | 199.96 | 5885 | 1086.5 | 253.3
6 | 2 | 2251 | 697.6 | 239.41 | 4789 | 826.7 | 196.2
6 | 3 | 5304 | 187.6 | 119.15 | 4951 | 245.3 | 106.9
6 | 4 | 4100 | 185.7 | 89.75 | 3030 | 349.2 | 148.0
6 | 5 | 3861 | 663.2 | 144.23 | 2133 | 966.6 | 149.1
6 | 6 | 3412 | 511.2 | 124.55 | 2592 | 695.4 | 139.0

Table 5.6b Summary statistics of slope values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass | Class | # of points (5x5) | Mean (5x5) | Std.dev (5x5) | # of points (9x9) | Mean (9x9) | Std.dev (9x9)
2 | 1 | 12816 | 4.68 | 3.21 | 15399 | 11.06 | 3.15
2 | 2 | 10564 | 12.59 | 3.03 | 7981 | 2.84 | 1.21
3 | 1 | 10564 | 12.59 | 3.03 | 7981 | 2.84 | 1.21
3 | 2 | 7512 | 5.69 | 3.46 | 10674 | 12.01 | 2.83
3 | 3 | 5304 | 3.25 | 2.12 | 4725 | 8.91 | 2.73
4 | 1 | 7512 | 5.69 | 3.46 | 10674 | 12.01 | 2.83
4 | 2 | 8313 | 13.01 | 2.78 | 4725 | 8.91 | 2.73
4 | 3 | 2251 | 11.06 | 3.38 | 4951 | 2.70 | 1.21
4 | 4 | 5304 | 3.25 | 2.12 | 3030 | 3.07 | 1.18
5 | 1 | 8313 | 13.01 | 2.78 | 4725 | 8.91 | 2.73
5 | 2 | 2251 | 11.06 | 3.38 | 5885 | 13.19 | 2.49
5 | 3 | 5304 | 3.25 | 2.12 | 4789 | 10.57 | 2.54
5 | 4 | 4100 | 3.04 | 1.38 | 4951 | 2.70 | 1.21
5 | 5 | 3412 | 8.88 | 2.34 | 3030 | 3.07 | 1.18
6 | 1 | 4452 | 14.27 | 2.73 | 5885 | 13.19 | 2.49
6 | 2 | 2251 | 11.06 | 3.38 | 4789 | 10.57 | 2.54
6 | 3 | 5304 | 3.25 | 2.12 | 4951 | 2.70 | 1.21
6 | 4 | 4100 | 3.04 | 1.38 | 3030 | 3.07 | 1.18
6 | 5 | 3861 | 11.55 | 2.02 | 2133 | 10.16 | 2.67
6 | 6 | 3412 | 8.88 | 2.34 | 2592 | 7.88 | 2.33

Table 5.6c Summary statistics of slope curvature values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.
nclass | Class | # of points (5x5) | Mean (5x5) | Std.dev (5x5) | # of points (9x9) | Mean (9x9) | Std.dev (9x9)
2 | 1 | 12816 | 1.77 | 1.04 | 15399 | 4.36 | 1.09
2 | 2 | 10564 | 4.40 | 1.10 | 7981 | 1.55 | 0.84
3 | 1 | 10564 | 4.40 | 1.10 | 7981 | 1.55 | 0.84
3 | 2 | 7512 | 2.19 | 1.05 | 10674 | 4.34 | 1.05
3 | 3 | 5304 | 1.17 | 0.66 | 4725 | 4.40 | 1.19
4 | 1 | 7512 | 2.19 | 1.05 | 10674 | 4.34 | 1.05
4 | 2 | 8313 | 4.11 | 0.88 | 4725 | 4.40 | 1.19
4 | 3 | 2251 | 5.47 | 1.15 | 4951 | 1.21 | 0.48
4 | 4 | 5304 | 1.17 | 0.66 | 3030 | 2.10 | 0.99
5 | 1 | 8313 | 4.11 | 0.88 | 4725 | 4.40 | 1.19
5 | 2 | 2251 | 5.47 | 1.15 | 5885 | 4.77 | 0.92
5 | 3 | 5304 | 1.17 | 0.66 | 4789 | 3.81 | 0.95
5 | 4 | 4100 | 1.52 | 0.80 | 4951 | 1.21 | 0.48
5 | 5 | 3412 | 3.01 | 0.66 | 3030 | 2.10 | 0.99
6 | 1 | 4452 | 4.54 | 0.79 | 5885 | 4.77 | 0.92
6 | 2 | 2251 | 5.47 | 1.15 | 4789 | 3.81 | 0.95
6 | 3 | 5304 | 1.17 | 0.66 | 4951 | 1.21 | 0.48
6 | 4 | 4100 | 1.52 | 0.80 | 3030 | 2.10 | 0.99
6 | 5 | 3861 | 3.61 | 0.70 | 2133 | 5.49 | 0.72
6 | 6 | 3412 | 3.01 | 0.66 | 2592 | 3.50 | 0.58

Table 5.6d Summary statistics of hypint values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass | Class | # of points (5x5) | Mean (5x5) | Std.dev (5x5) | # of points (9x9) | Mean (9x9) | Std.dev (9x9)
2 | 1 | 12816 | 0.43 | 0.11 | 15399 | 0.42 | 0.10
2 | 2 | 10564 | 0.45 | 0.12 | 7981 | 0.39 | 0.12
3 | 1 | 10564 | 0.45 | 0.12 | 7981 | 0.39 | 0.12
3 | 2 | 7512 | 0.36 | 0.07 | 10674 | 0.47 | 0.07
3 | 3 | 5304 | 0.52 | 0.07 | 4725 | 0.31 | 0.06
4 | 1 | 7512 | 0.36 | 0.07 | 10674 | 0.47 | 0.07
4 | 2 | 8313 | 0.49 | 0.09 | 4725 | 0.31 | 0.06
4 | 3 | 2251 | 0.28 | 0.06 | 4951 | 0.47 | 0.07
4 | 4 | 5304 | 0.52 | 0.07 | 3030 | 0.27 | 0.06
5 | 1 | 8313 | 0.49 | 0.09 | 4725 | 0.31 | 0.06
5 | 2 | 2251 | 0.28 | 0.06 | 5885 | 0.43 | 0.06
5 | 3 | 5304 | 0.52 | 0.07 | 4789 | 0.52 | 0.06
5 | 4 | 4100 | 0.34 | 0.06 | 4951 | 0.47 | 0.07
5 | 5 | 3412 | 0.39 | 0.06 | 3030 | 0.27 | 0.06
6 | 1 | 4452 | 0.45 | 0.08 | 5885 | 0.43 | 0.06
6 | 2 | 2251 | 0.28 | 0.06 | 4789 | 0.52 | 0.06
6 | 3 | 5304 | 0.52 | 0.07 | 4951 | 0.47 | 0.07
6 | 4 | 4100 | 0.34 | 0.06 | 3030 | 0.27 | 0.06
6 | 5 | 3861 | 0.55 | 0.06 | 2133 | 0.28 | 0.06
6 | 6 | 3412 | 0.39 | 0.06 | 2592 | 0.34 | 0.05

Table 5.6e Summary statistics of standard deviation of elevation values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.
nclass | Class | # of points (5x5) | Mean (5x5) | Std.dev (5x5) | # of points (9x9) | Mean (9x9) | Std.dev (9x9)
2 | 1 | 12816 | 76.6 | 52.5 | 15399 | 240.1 | 71.6
2 | 2 | 10564 | 216.8 | 66.8 | 7981 | 65.3 | 28.7
3 | 1 | 10564 | 216.8 | 66.8 | 7981 | 65.3 | 28.7
3 | 2 | 7512 | 93.9 | 57.0 | 10674 | 253.9 | 73.0
3 | 3 | 5304 | 52.1 | 32.4 | 4725 | 208.9 | 57.4
4 | 1 | 7512 | 93.9 | 57.0 | 10674 | 253.9 | 73.0
4 | 2 | 8313 | 222.7 | 63.9 | 4725 | 208.9 | 57.4
4 | 3 | 2251 | 194.9 | 72.5 | 4951 | 59.4 | 26.1
4 | 4 | 5304 | 52.1 | 32.4 | 3030 | 74.9 | 30.1
5 | 1 | 8313 | 222.7 | 63.9 | 4725 | 208.9 | 57.4
5 | 2 | 2251 | 194.9 | 72.5 | 5885 | 293.8 | 62.5
5 | 3 | 5304 | 52.1 | 32.4 | 4789 | 204.9 | 52.0
5 | 4 | 4100 | 50.1 | 24.3 | 4951 | 59.4 | 26.1
5 | 5 | 3412 | 146.7 | 36.6 | 3030 | 74.9 | 30.1
6 | 1 | 4452 | 255.2 | 62.0 | 5885 | 293.8 | 62.5
6 | 2 | 2251 | 194.9 | 72.5 | 4789 | 204.9 | 52.0
6 | 3 | 5304 | 52.1 | 32.4 | 4951 | 59.4 | 26.1
6 | 4 | 4100 | 50.1 | 24.3 | 3030 | 74.9 | 30.1
6 | 5 | 3861 | 185.2 | 41.7 | 2133 | 250.9 | 47.8
6 | 6 | 3412 | 146.7 | 36.6 | 2592 | 174.4 | 38.5

Table 5.6f Summary statistics of highpt values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass | Class | # of points (5x5) | Mean (5x5) | Std.dev (5x5) | # of points (9x9) | Mean (9x9) | Std.dev (9x9)
2 | 1 | 12816 | 12.3 | 5.57 | 15399 | 39.3 | 20.6
2 | 2 | 10564 | 11.5 | 5.62 | 7981 | 41.3 | 19.2
3 | 1 | 10564 | 11.5 | 5.62 | 7981 | 41.3 | 19.2
3 | 2 | 7512 | 13.4 | 5.34 | 10674 | 36.4 | 20.6
3 | 3 | 5304 | 10.8 | 5.52 | 4725 | 45.8 | 19.0
4 | 1 | 7512 | 13.4 | 5.34 | 10674 | 36.4 | 20.6
4 | 2 | 8313 | 10.6 | 5.54 | 4725 | 45.8 | 19.0
4 | 3 | 2251 | 14.6 | 4.74 | 4951 | 38.6 | 19.4
4 | 4 | 5304 | 10.8 | 5.52 | 3030 | 45.7 | 18.0
5 | 1 | 8313 | 10.6 | 5.54 | 4725 | 45.8 | 19.0
5 | 2 | 2251 | 14.6 | 4.74 | 5885 | 38.5 | 20.1
5 | 3 | 5304 | 10.8 | 5.52 | 4789 | 33.8 | 20.8
5 | 4 | 4100 | 13.8 | 5.17 | 4951 | 38.6 | 19.4
5 | 5 | 3412 | 13.0 | 5.50 | 3030 | 45.7 | 18.0
6 | 1 | 4452 | 11.46 | 5.25 | 5885 | 38.5 | 20.1
6 | 2 | 2251 | 14.59 | 4.74 | 4789 | 33.8 | 20.8
6 | 3 | 5304 | 10.78 | 5.52 | 4951 | 38.6 | 19.4
6 | 4 | 4100 | 13.81 | 5.17 | 3030 | 45.7 | 18.0
6 | 5 | 3861 | 9.67 | 5.71 | 2133 | 48.8 | 17.8
6 | 6 | 3412 | 12.96 | 5.50 | 2592 | 43.4 | 19.5

Table 5.6g Summary statistics of rough values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.
nclass | Class | # of points (5x5) | Mean (5x5) | Std.dev (5x5) | # of points (9x9) | Mean (9x9) | Std.dev (9x9)
2 | 1 | 12816 | 0.55 | 0.67 | 15399 | 2.30 | 1.15
2 | 2 | 10564 | 2.83 | 1.27 | 7981 | 0.19 | 0.16
3 | 1 | 10564 | 2.83 | 1.27 | 7981 | 0.19 | 0.16
3 | 2 | 7512 | 0.76 | 0.76 | 10674 | 2.60 | 1.14
3 | 3 | 5304 | 0.26 | 0.33 | 4725 | 1.63 | 0.85
4 | 1 | 7512 | 0.76 | 0.76 | 10674 | 2.60 | 1.14
4 | 2 | 8313 | 2.93 | 1.22 | 4725 | 1.63 | 0.85
4 | 3 | 2251 | 2.47 | 1.37 | 4951 | 0.16 | 0.13
4 | 4 | 5304 | 0.26 | 0.33 | 3030 | 0.25 | 0.19
5 | 1 | 8313 | 2.93 | 1.22 | 4725 | 1.63 | 0.85
5 | 2 | 2251 | 2.47 | 1.37 | 5885 | 3.07 | 1.11
5 | 3 | 5304 | 0.26 | 0.33 | 4789 | 2.02 | 0.88
5 | 4 | 4100 | 0.21 | 0.18 | 4951 | 0.16 | 0.13
5 | 5 | 3412 | 1.42 | 0.66 | 3030 | 0.25 | 0.19
6 | 1 | 4452 | 3.50 | 1.27 | 5885 | 3.07 | 1.11
6 | 2 | 2251 | 2.47 | 1.37 | 4789 | 2.02 | 0.88
6 | 3 | 5304 | 0.26 | 0.33 | 4951 | 0.16 | 0.13
6 | 4 | 4100 | 0.21 | 0.18 | 3030 | 0.25 | 0.19
6 | 5 | 3861 | 2.28 | 0.75 | 2133 | 2.13 | 0.83
6 | 6 | 3412 | 1.42 | 0.66 | 2592 | 1.21 | 0.61

hypint value. The summary statistics of each variable within the various terrain clusters and the rankings of each local roughness variable at different levels (2 to 6) for the other cases (i.e., terrain clusters derived using variable group 3-6, 2, or 5) are not presented here. Similar patterns can be expected. From examining Tables 5.5a-5.6g, it is also noticeable that for most variables the greatest dichotomy occurs at nclass=2, but not for hypint, and that highpt never exhibits significant grouping. One must remember that the classifications are not based upon these particular variables.

5.3.4 Comparisons of different classification results

In order to see how similar or dissimilar the cluster groups are, the different classification results can be compared to each other quantitatively. To do that, a classification error matrix was calculated first for each comparison (only the nclass=3 results were considered in the analyses below). That is, two sets of classification results derived using different variable groups or window sizes were cross-tabulated in a 3 by 3 two-dimensional array. Note that each error matrix was generated by using a total enumeration of the study area data and not just a sample.
Therefore, the sampling scheme is not a concern in this study. By contrast, when classification maps are sampled and compared to "true" reference data for accuracy assessment (as in land cover maps generated from remotely sensed data), an appropriate sampling scheme is important and it is necessary to consider the spatial complexity or the spatial autocorrelation in the data [Congalton, 1988a,b].

Once the classification error matrices had been generated, a test of agreement between two or more error matrices (classifications) was done using the KHAT (or kappa) statistic to see if the classification results are different from each other [Cohen, 1960; Bishop et al., 1975; Congalton et al., 1983; Lillesand and Kiefer, 1994]. The KHAT statistic is computed as

$$\hat{k} = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+}\, x_{+i}} \qquad (5.1)$$

where
r: the number of rows in the error matrix
x_{ii}: the number of observations in row i and column i (on the major diagonal)
x_{i+}: total of observations in row i
x_{+i}: total of observations in column i
N: total number of observations included in the matrix

KHAT represents the ratio of beyond-random agreement to expected disagreement in a random case. It serves as an indicator of the extent to which the percentage correct values of an error matrix are due to "true" agreement versus "chance" agreement. As the true agreement (observed) approaches 1 and the agreement due to chance approaches 0, KHAT approaches 1 [Lillesand and Kiefer, 1994]. The value of KHAT is equal to 0 for random agreement and 1 for perfect agreement. It is negative when agreement is less than that expected in a random case. Landis and Koch [1977] have suggested three ranges of agreement for the KHAT statistic. These are poor (KHAT < 0.4), good (0.4 - 0.75) and excellent (KHAT > 0.75). These ranges will be adopted later in the discussion of the classification comparison results.
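Equation 5.1 and the three agreement ranges translate directly into a few lines of code. The following is a minimal sketch, not the program used in this study: the function names `khat` and `agreement_range` and the 2 by 2 example matrix are hypothetical, and the matrix is assumed to be square with rows and columns already in corresponding order.

```python
def khat(matrix):
    """KHAT (kappa) statistic of a square error matrix, following equation 5.1."""
    r = len(matrix)
    n = sum(sum(row) for row in matrix)            # N: total observations
    diagonal = sum(matrix[i][i] for i in range(r)) # sum of x_ii
    row_totals = [sum(row) for row in matrix]      # x_i+
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]  # x_+i
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    return (n * diagonal - chance) / (n * n - chance)


def agreement_range(k):
    """Ranges of agreement as quoted above from Landis and Koch [1977]."""
    if k < 0.4:
        return "poor"
    if k <= 0.75:
        return "good"
    return "excellent"


# Hypothetical 2 by 2 error matrix, for illustration only.
example = [[20, 5],
           [10, 15]]
k = khat(example)
print(round(k, 2), agreement_range(0.59))  # 0.4 good
```

A matrix with all observations on the major diagonal gives KHAT = 1, and a matrix whose cells equal the products of their marginal proportions gives KHAT = 0.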
The error matrices and the KHAT statistics of 16 different comparisons (window size: 7x7 or 21x21; variable group: 3-5, 3-6, 2, or 5 only) for subarea 93G are summarized and presented in Figures 5.13a-b. Note that only one factor (either the window size or the variable group) differs while the other factor is held constant when comparing two classifications. The first comparison shown in Figure 5.13a, for example, is between the two variable groups (3-5) and (3-6) but with the same window size (7x7). If the window size and the variable group were both changing, it would be difficult to interpret the comparison results. Also worth noting is that, since the class numbers (i.e., 1, 2 and 3 in this case) directly resulting from cluster analysis do not follow the same order for classifications derived using a different window size or variable group, the major diagonals in the original error matrices would not necessarily represent the correct correspondence between terrain clusters from two different classifications.
Figure 5.13a Error matrices and KHAT statistics of 12 comparisons of various classifications (differences between variable groups) for subarea 93G.

Figure 5.13b Error matrices and KHAT statistics of 4 different comparisons of various classifications (differences between the two window sizes for each of the 4 variable groups) for subarea 93G.

For each error matrix in the above figures, therefore, the
correct correspondences were found first according to the interpretation of classification results as discussed in section 5.3.3, and then the KHAT statistic was calculated for each error matrix. Figures 5.13c-d give the error matrices and the KHAT statistics of all 16 comparisons of various classifications for subarea 93H. For the whole study area, the error matrices and the KHAT statistics of all 16 comparisons of various classifications (i.e., variable group: 3-5, 3-6, 2, and 5; window size: 5x5 and 9x9) were obtained and are shown in Figures 5.13e-f.

The reason for computing the KHAT statistics was to see how similar the different classification groups were, when using the different clustering variables and window sizes, for each particular study area. The higher the KHAT value, the more similar the resultant groups. All the KHAT values (nclass=3 only) for each of the three study area surfaces as shown in Figures 5.13a-f are summarized in Figure 5.13g for easier discussion.

From the above quantitative comparisons of various terrain classification results derived from different variable groups and window sizes, two points can be made. First, there are some similarities between the different classifications. For example, the KHAT statistic for subarea 93G is 0.59 (good agreement) when the classifications using variable groups (3-5) and (3-6), with a window size (21x21), are compared to each other. For the same comparison (i.e., variable group 3-5 versus variable group 3-6 with window size 21x21) in subarea 93H, the KHAT value is 0.51. For the whole study area, KHAT is as high as 0.73 (close to excellent agreement) for the comparison between classifications using variable groups (2) and (5) with window size (5x5).
This comes as no surprise because variable 2 (i.e., slope) is highly

Figure 5.13c Error matrices and KHAT statistics of 12 comparisons of various classifications (differences between variable groups) for subarea 93H.

Figure 5.13d Error matrices and KHAT statistics of 4 comparisons of various classifications (differences between the two window sizes for each of the 4 variable groups) for subarea 93H.
Figure 5.13e Error matrices and KHAT statistics of 12 comparisons of various classifications (differences between variable groups) for the whole study area.

Figure 5.13f Error matrices and KHAT statistics of 4 comparisons of various classifications (differences between the two window sizes for each of the 4 variable groups) for the whole study area.
273 subarea 93 G subarea 93H 2 3-5 3-6 1.00 0.37 0.40 0.34 0.51 1.00 0.39 0.41 0.62 0.49 0.36 1.00 0.60 1.00 0.44 0.45 0.56 1.00 0.33 0.32 0.19 0.50 2 3-5 3-6 5 1.00 0.10 0.34 0.42 1 0.59 1.00 0.47 0.42 <N 0.21 0.19 1.00 0.32 0.23 0.57 0.18 0.49 1 <-A 5 0.56 0.51 the whole study area 3-5 3-6 1 1.00 0.35 0.27 0.40 i 0.51 1.00 0.50 0.54 0.67 0.51 1.00 0.73 0.48 0.51 0.47 0.44 0.70 0.57 0.65 <•<-. Figure 5.13g 2 5 1.00 A summary of K H A T statistics for the two subareas (93G and 93H) and the whole study area. The values above the diagonal in each matrix box are for the smaller window size (i.e., 7x7 for the two subareas and 5x5 for the whole study area) and the values below the diagonal are for the larger window size (i.e., 21x21 for the two subareas and 9x9 for the whole study area). The values below each matrix box are for the comparisons between the two window sizes for each of the 4 variable groups. 274 correlated with variable 5 (i.e., the standard deviation of elevations) and therefore the classifications resulting from these two variable groups are quite similar. Second, the number of geomorphometric measures used in the multivariate clustering and the size of the moving window originally used in the extraction of those measures have different impacts on the classification result. Take the comparison of the classification results derived using variable group (3-5) and variable group (3-6) for example. The classifications show greater difference or poorer agreement when the smaller moving window was used in the variable extraction. As seen in Figure 5.13g, K H A T is only 0.10 for subarea 93G and 0.37 for subarea 93H with the smaller window size 7x7; and 0.35 for the whole study area with the smaller window size 5x5. They tend to be more similar if the larger moving window (i.e., 21x21 for the two subareas and 9x9 for the whole study area) was used in the variable extraction ( K H A T is 0.59, 0.51, and 0.51 respectively). 
This is probably because variable #6 (i.e., highpt) emphasizes local terrain discontinuities more strongly, as indicated in section 4.2.6. The addition of this variable to the multivariate signatures used for the clustering would only make more of a difference if smaller window sizes were used for the extraction of the variables. When comparing the classifications from different window sizes (i.e., 7x7 or 21x21 for the two subareas and 5x5 or 9x9 for the whole study area), it seems that for variable group (3-5) in the two subareas the classification difference is relatively large (i.e., a low KHAT value of 0.18 for subarea 93G and 0.19 for 93H), whereas for other variable groups and surfaces the KHAT values range from 0.32 to 0.70 for the comparisons between different window sizes. All the above similarities and differences between various classifications can be accounted for by the nature of the terrain surfaces (both local and global characteristics) and the ability of each variable to capture different aspects of the variability of terrain. Since the KHAT values are generally in the poor to good range for various comparisons (few indicate a close to excellent agreement), the classification results based on all four different variable groups (3-5, 3-6, 2, 5) and two different moving window sizes (7x7 and 21x21 for the two subareas; 5x5 and 9x9 for the whole study area) will be examined later for the DEM error modelling.

As discussed and demonstrated above, the resulting classifications can be visually examined, quantitatively compared to each other, and interpreted in relation to the original local roughness measures from which they were derived. The various classification results based on different variable groups and moving window sizes for different surfaces should also be statistically evaluated in conjunction with the observed spatial pattern of mismatch between DEMs of differing resolution shown in section 3.3.
This will be done in Chapter 6 to examine the effectiveness of using multivariate hierarchical classification to quantify the relation between DEM errors and the roughness of terrain. The thesis hypothesis stated in Chapter 3 will then be tested.

6. DEM ERROR MODELLING

6.1 Overview

As reviewed in Chapter 2, most DEM accuracy investigations were conducted along three lines: (i) describing or identifying the possible sources of gross errors; (ii) evaluating the effect of varying densities of data constituting the DEM or that of different interpolation methods; and (iii) comparing the "products" (e.g., spot height, contours) derived from a DEM with those obtained directly from terrestrial and photogrammetric surveying procedures, mostly from a producer's point of view. The accuracy estimate is usually constrained to a global measure such as RMSE. Seldom are the errors described in terms of their spatial domain or how the resolution of the DEM interacts with the relief variability.

In addition, studies have demonstrated the scale-dependent nature of the terrain (e.g., Goodchild, 1982; Mark and Aronson, 1984; Roy et al., 1987). That is, specific landforms occur over only a limited range of scales and there is a wide range of topographic variation present in different terrain surfaces. Thus, in defining the accuracy of a DEM, one also needs ultimately to know the global and local characteristics of the terrain and how the resolution interacts with them.

As seen from the preliminary study results in Chapter 3, DEM errors and their spatial distribution appear to be related to the DEM resolution and its interaction with the variability of terrain. In order to fully understand the issue of uncertainty in digital terrain representation and the role of scale in it, some quantitative investigations of these relations are required. The rest of this chapter is organized around three tasks.
Firstly, the correlations between the DEM errors and each of the seven local variables are examined. Then, the resulting terrain roughness classification is statistically evaluated in conjunction with the observed spatial pattern of mismatch between DEMs of differing resolution. Finally, the quantitative relations between the extent and the spatial pattern of DEM errors, spatial resolution, and the terrain characteristics are analyzed.

6.2 A New Approach to DEM Error Modelling

A new approach to DEM error modelling is taken in this research to demonstrate the importance of scale in terrain characterization and to show how DEM accuracy is related to the DEM resolution and how the spatial variation of DEM errors (i.e., elevation differences) is related to the roughness of terrain. In this section, the terrain classification results presented in section 5.3 are evaluated in conjunction with the study area DEM errors presented in section 3.3. The idea is to evaluate the resulting classification and to examine the effectiveness of using multivariate hierarchical classification to quantify the relation between DEM errors and topographic complexity. This is done by visual inspection and quantitative analysis to find out if DEM errors show significant variation within and between different clusters resulting from terrain classification. If classification results do show significant correlation with DEM errors, then it would suggest that hierarchical terrain classification could be used to model DEM errors and their spatial distribution.

6.2.1 DEM errors and local roughness measures

Before examining the DEM errors in various terrain clusters derived from local roughness measures, the correlations between the DEM errors and each of the seven local variables are examined first.
Figure 6.1a, for example, displays two scattergrams between DEM errors (WSC2 DEM errors as compared to the TRIM DEM in section 3.3) and variable relief (derived using window size 7x7) for the two subareas (93G and 93H). For visualization purposes, a random sample of only 8000 points is taken from the original 52800 (i.e., 220 x 240) points and shown in the scattergram. Figure 6.1b shows two scattergrams between WSC2 DEM errors and variable relief derived using window size (21x21). The scattergrams between WSC2 DEM errors and other local roughness measures (i.e., slope, curv, hypint, std, highpt, and rough) derived using two different window sizes (7x7) and (21x21) for the two subareas are shown in Figures 6.2a-b to 6.7a-b.

[Figures 6.1a-b to 6.7a-b: scattergrams between WSC2 DEM errors and each local roughness measure (relief, slope, curv, hypint, std, highpt, and rough) for subareas 93G and 93H, derived using window sizes 7x7 (a) and 21x21 (b).]

Several observations can be made based on the above scattergrams. Firstly, the positive and negative elevation differences are roughly symmetrical in the scattergrams between WSC2 DEM error and variables relief, slope, curv, hypint, std, and rough, no matter what the global and local characteristics are. This is in accordance with the relatively symmetrical histogram distributions of elevation differences derived from the six DEM comparisons as discussed in section 3.3. The above observation indicates that most local roughness variables can be useful only in relating the absolute amount of DEM error to the roughness measure but not in distinguishing the positive elevation differences from the negative ones.

Secondly, the elevation differences do show certain consistent correlations with various local roughness variables. With most variables except highpt, the pattern seems to be that the higher the roughness measure, the more points with higher absolute elevation differences. This horn-shaped scatter of points indicates heteroscedasticity (i.e., the lack of equal variance).

Thirdly, the scattergrams between WSC2 DEM error and variable highpt do present a different pattern. As seen in Figure 6.6b, for example, there seem to be more points with positive elevation differences when the variable highpt is low, but more points with negative elevation differences when the variable highpt is high. This suggests that the variable highpt may be the only one that is capable of indicating the sign of the elevation differences.

Finally, there are some differences between the scattergrams for the two subareas and the two window sizes because of the different global characteristics of each surface. The scattergrams between elevation differences and local measures for other cases, such as for EMR1 DEM errors as compared to TRIM DEMs for the two subareas, NGDC5 and WSC2 DEM errors as compared to EMR1 DEM for the whole study area, and for variables derived using different window sizes, are not presented but similar patterns were observed.

6.2.2 DEM errors in various terrain clusters

This section examines DEM errors within various terrain clusters derived from the multivariate classification.
Both WSC2 and EMR1 DEM errors (as compared to the 50 m TRIM DEM) are evaluated for both subarea surfaces (93G and 93H) in order to investigate the nature of DEM errors in surfaces with different global characteristics. In addition, NGDC5 and WSC2 DEM errors (as compared to the 1 km EMR1 DEM) within each terrain cluster are evaluated for the whole study area in order to examine the role of resolution or scale in the study of DEM error.

6.2.2.1 WSC2 DEM errors as compared to TRIM DEM

Firstly, WSC2 DEM errors (i.e., the elevation differences resulting from comparison of 2 km WSC2 and 50 m TRIM DEMs) for the two subareas (see Figures 3.20 and 3.21) were evaluated to demonstrate the technique. Figure 6.8a shows a histogram of WSC2 DEM errors within each of the three terrain classes based on geomorphometric variable group (3-5) and moving window size (7x7) for subareas 93G and 93H (the classification results were shown in Figure 5.11a). Figure 6.8b gives the histogram of WSC2 DEM errors within each of the four classes derived from the same variable group and window size as above (i.e., variable group: 3-5, window size: 7x7). Table 6.1a lists some summary statistics including the number of data points, mean, and standard deviation of WSC2 DEM errors in each terrain class for both 93G and 93H subareas. In this table, WSC2 DEM errors at different hierarchical levels of terrain clusters (only levels 2 to 6 are shown due to space limitation) are summarized (variable group: 3-5, window size: 7x7). As mentioned earlier in section 6.2.1, most local roughness variables can only be useful in relating the absolute amount of DEM error to the roughness measure but not in distinguishing the positive elevation differences from the negative ones.
Therefore, the absolute WSC2 DEM errors were also examined and their summary statistics within each terrain cluster (variable group: 3-5, window size: 7x7) are given in Table 6.1b for the two subareas. For moving window size (21x21) and variable group (3-5), the summary statistics of WSC2 DEM errors are listed in Table 6.1c and the statistics of the absolute WSC2 DEM errors within each terrain cluster are summarized in Table 6.1d.

[Figures 6.8a-b: histograms of WSC2 DEM errors within each terrain class (variable group: 3-5, window size: 7x7) for subareas 93G and 93H.]

Table 6.1a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      42616     2.7      21.9      33340     15.9     221.3
2      2      10184     2.3      31.2      19460     5.0      136.7
3      1      10184     2.3      31.2      24142     53.3     222.5
3      2      24665     6.4      23.0      9198      -82.4    184.8
3      3      17951     -2.4     19.1      19460     5.0      136.7
4      1      24665     6.4      23.0      20025     47.5     220.9
4      2      8563      2.4      31.2      9198      -82.4    184.8
4      3      17951     -2.4     19.1      19460     5.0      136.7
4      4      1621      1.6      31.1      4117      81.5     228.4
5      1      17711     8.6      24.6      9198      -82.4    184.8
5      2      8563      2.4      31.2      19460     5.0      136.7
5      3      17951     -2.4     19.1      13670     42.7     204.2
5      4      1621      1.6      31.1      6355      58.0     252.8
5      5      6954      1.0      17.0      4117      81.5     228.4
6      1      8563      2.4      31.2      19460     5.0      136.7
6      2      17951     -2.4     19.1      13670     42.7     204.2
6      3      6654      12.2     23.5      6355      58.0     252.8
6      4      1621      1.6      31.1      6219      -49.4    148.9
6      5      11057     6.4      25.0      4117      81.5     228.4
6      6      6954      1.0      17.0      2979      -151.4   228.4
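As a minimal sketch (not the procedure actually used in this research), the per-cluster summaries reported in Tables 6.1a-d could be computed from a list of elevation differences and a parallel list of cluster labels. The function name cluster_error_stats, the toy data, and the choice of the population standard deviation are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_error_stats(errors, labels):
    """Summarize signed and absolute elevation differences per cluster.

    errors: elevation differences (coarse DEM minus reference DEM).
    labels: cluster id of each point, same length as errors.
    """
    groups = defaultdict(list)
    for err, lab in zip(errors, labels):
        groups[lab].append(err)

    stats = {}
    for lab, vals in sorted(groups.items()):
        abs_vals = [abs(v) for v in vals]
        stats[lab] = {
            "n": len(vals),                # number of points in the cluster
            "mean": mean(vals),            # mean signed error
            "std": pstdev(vals),           # population std.dev (assumption)
            "abs_mean": mean(abs_vals),    # mean absolute error
            "abs_std": pstdev(abs_vals),   # std.dev of absolute errors
        }
    return stats


# Hypothetical toy data: a smooth cluster (1) and a rough cluster (2).
errors = [3.0, -3.0, 1.0, -1.0, 10.0, -10.0, 20.0, -20.0]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
stats = cluster_error_stats(errors, labels)
for lab, s in stats.items():
    print(lab, s)
```

In the toy data both clusters have a mean signed error of zero, so only the standard deviation and the mean absolute error separate them, which is the reason both kinds of table are reported.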
To examine the effect of using different geomorphometric variable groups on terrain classification and hence on DEM error modelling, classification results based on a different variable group were also considered for the evaluation of WSC2 DEM errors as compared to TRIM DEMs for the two subareas. For example, more results were obtained for the two subareas by comparing WSC2 DEM errors within various terrain clusters derived from geomorphometric variable group (3-6) and different moving window sizes (7x7 or 21x21). With variable group (3-6), the statistics of WSC2 DEM errors and the absolute WSC2 DEM errors within each terrain cluster are summarized in Tables 6.2a-b for window size (7x7) and in Tables 6.2c-d for window size (21x21). As indicated earlier, variable group (3-6) includes curv, hypint, std, and highpt. Although these variables are not highly correlated and each represents some aspect of the roughness of terrain, it may be interesting to see how the results will differ if only one variable is used in the classification. In order to make this comparison, terrain clusters derived from slope (or 2) and std (or 5) are used as well for the examination of WSC2 DEM errors.
With variable slope, the summary statistics of WSC2 DEM errors and the absolute WSC2 DEM errors in each cluster are given in Tables 6.3a-d for two different window sizes (7x7 and 21x21). Tables 6.4a-d list the summary results for variable std.

Table 6.1b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      42616     16.55    14.55     33340     170.55   141.88
2      2      10184     22.85    21.37     19460     103.29   89.62
3      1      10184     22.85    21.37     24142     182.16   138.51
3      2      24665     17.96    15.70     9198      140.07   146.06
3      3      17951     14.60    12.54     19460     103.29   89.62
4      1      24665     17.96    15.70     20025     179.92   136.66
4      2      8563      22.53    21.74     9198      140.07   146.06
4      3      17951     14.60    12.54     19460     103.29   89.62
4      4      1621      24.55    19.19     4117      193.06   146.71
5      1      17711     19.78    16.94     9198      140.07   146.06
5      2      8563      22.53    21.74     19460     103.29   89.62
5      3      17951     14.60    12.54     13670     166.31   125.89
5      4      1621      24.55    19.19     6355      209.21   153.30
5      5      6954      13.33    10.66     4117      193.06   146.71
6      1      8563      22.53    21.74     19460     103.29   89.62
6      2      17951     14.60    12.54     13670     166.31   125.89
6      3      6654      20.78    16.31     6355      209.21   153.30
6      4      1621      24.55    19.19     6219      111.96   109.87
6      5      11057     19.18    17.29     4117      193.06   146.71
6      6      6954      13.33    10.66     2979      198.77   188.63

Table 6.1c Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      32226     4.3      27.3      30528     19.8     231.7
2      2      20574     0.0      17.2      22272     0.9      126.1
3      1      29720     3.6      27.7      22272     0.9      126.1
3      2      20574     0.0      17.2      17471     84.8     208.3
3      3      2506      13.1     19.8      13057     -67.1    232.9
4      1      12681     -2.9     32.6      17471     84.8     208.3
4      2      17039     8.4      22.2      13057     -67.1    232.9
4      3      20574     0.0      17.2      17147     12.6     126.7
4      4      2506      13.1     19.8      5125      -38.1    115.6
5      1      17039     8.4      22.2      13057     -67.1    232.9
5      2      20574     0.0      17.2      17147     12.6     126.7
5      3      2506      13.1     19.8      12113     87.8     197.6
5      4      9555      -4.9     28.0      5358      78.1     230.8
5      5      3126      3.2      43.3      5125      -38.1    115.6
6      1      20574     0.0      17.2      17147     12.6     126.7
6      2      2506      13.1     19.8      6245      -114.7   276.5
6      3      9555      -4.9     28.0      12113     87.8     197.6
6      4      13138     8.5      23.9      5358      78.1     230.8
6      5      3126      3.2      43.3      6812      -23.4    172.9
6      6      3901      7.8      14.7      5125      -38.1    115.6

Table 6.1d Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      32226     20.41    18.61     30528     182.11   144.62
2      2      20574     13.62    10.48     22272     95.94    81.78
3      1      29720     20.40    19.07     22272     95.94    81.78
3      2      20574     13.62    10.48     17471     184.04   129.33
3      3      2506      20.49    11.95     13057     179.52   162.82
4      1      12681     23.72    22.58     17471     184.04   129.33
4      2      17039     17.93    15.50     13057     179.52   162.82
4      3      20574     13.62    10.48     17147     99.23    79.79
4      4      2506      20.49    11.95     5125      84.95    87.23
5      1      17039     17.93    15.50     13057     179.52   162.82
5      2      20574     13.62    10.48     17147     99.23    79.79
5      3      2506      20.49    11.95     12113     179.24   120.83
5      4      9555      20.85    19.32     5358      194.88   146.19
5      5      3126      32.49    28.74     5125      84.95    87.23
6      1      20574     13.62    10.48     17147     99.23    79.79
6      2      2506      20.49    11.95     6245      228.92   192.87
6      3      9555      20.85    19.32     12113     179.24   120.83
6      4      13138     19.59    16.21     5358      194.88   146.19
6      5      3126      32.49    28.74     6812      134.23   111.48
6      6      3901      12.34    11.18     5125      84.95    87.23

Table 6.2a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      24188     1.7      27.6      30135     28.9     226.1
2      2      28612     3.4      20.4      22665     -10.9    138.7
3      1      28612     3.4      20.4      21374     -4.1     222.2
3      2      22589     1.5      27.3      22665     -10.9    138.7
3      3      1599      3.7      30.3      8761      109.5    215.0
4      1      22589     1.5      27.3      22665     -10.9    138.7
4      2      16793     6.0      21.5      16869     24.0     210.3
4      3      1599      3.7      30.3      8761      109.5    215.0
4      4      11819     -0.3     18.1      4505      -109.3   233.8
5      1      16793     6.0      21.5      16869     24.0     210.3
5      2      11188     -5.6     25.5      8761      109.5    215.0
5      3      11401     8.5      27.3      8893      -49.6    137.0
5      4      1599      3.7      30.3      13772     14.1     134.0
5      5      11819     -0.3     18.1      4505      -109.3   233.8
6      1      11188     -5.6     25.5      8761      109.5    215.0
6      2      10099     8.1      22.9      8893      -49.6    137.0
6      3      11401     8.5      27.3      13092     1.8      198.3
6      4      1599      3.7      30.3      13772     14.1     134.0
6      5      11819     -0.3     18.1      4505      -109.3   233.8
6      6      6694      2.9      18.7      3777      100.9    231.8

Table 6.2b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      24188     20.12    18.90     30135     177.54   142.98
2      2      28612     15.77    13.37     22665     103.51   93.02
3      1      28612     15.77    13.37     21374     171.48   141.41
3      2      22589     19.82    18.90     22665     103.51   93.02
3      3      1599      24.40    18.41     8761      192.32   145.69
4      1      22589     19.82    18.90     22665     103.51   93.02
4      2      16793     16.92    14.54     16869     166.88   130.26
4      3      1599      24.40    18.41     8761      192.32   145.69
4      4      11819     14.15    11.29     4505      188.70   175.99
5      1      16793     16.92    14.54     16869     166.88   130.26
5      2      11188     18.97    17.94     8761      192.32   145.69
5      3      11401     20.64    19.76     8893      104.65   101.38
5      4      1599      24.40    18.41     13772     102.78   87.19
5      5      11819     14.15    11.29     4505      188.70   175.99
6      1      11188     18.97    17.94     8761      192.32   145.69
6      2      10099     18.49    15.76     8893      104.65   101.38
6      3      11401     20.64    19.76     13092     154.38   124.43
6      4      1599      24.40    18.41     13772     102.78   87.19
6      5      11819     14.15    11.29     4505      188.70   175.99
6      6      6694      14.54    12.11     3777      210.20   140.38

Table 6.2c Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      21257     -0.7     28.0      31940     24.8     227.2
2      2      31543     4.9      20.5      20860     -8.0     126.9
3      1      31543     4.88     20.46     19509     -38.28   231.63
3      2      18926     -2.4     28.37     20860     -8.0     126.88
3      3      2331      13.0     20.2      12431     123.85   179.77
4      1      18926     -2.4     28.37     20860     -8.0     126.88
4      2      13602     -5.15    14.8      13096     -24.27   259.99
4      3      17941     12.49    20.85     12431     123.85   179.77
4      4      2331      13.0     20.2      6413      -66.89   154.8
5      1      13602     -5.15    14.8      13096     -24.27   259.99
5      2      17941     12.49    20.85     12431     123.85   179.77
5      3      10152     -4.0     23.6      9461      -33.25   94.29
5      4      8774      -0.55    32.9      11399     12.9     145.30
5      5      2331      13.0     20.2      6413      -66.89   154.8
6      1      17941     12.49    20.85     12431     123.85   179.77
6      2      10152     -4.0     23.6      9461      -33.25   94.29
6      3      8774      -0.55    32.92     5010      -179.86  267.35
6      4      6741      -7.1     14.66     11399     12.9     145.30
6      5      2331      13.0     20.2      6413      -66.89   154.8
6      6      6861      -3.2     14.75     8086      72.1     202.2

Table 6.2d Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      21257     20.53    19.08     31940     178.33   142.87
2      2      31543     15.90    13.78     20860     95.89    83.48
3      1      31543     15.90    13.78     19509     178.47   152.54
3      2      18926     20.52    19.74     20860     95.89    83.48
3      3      2331      20.61    12.39     12431     178.12   126.21
4      1      18926     20.52    19.74     20860     95.89    83.48
4      2      13602     12.54    9.46      13096     202.64   164.69
4      3      17941     18.44    15.84     12431     178.12   126.21
4      4      2331      20.61    12.39     6413      129.12   108.49
5      1      13602     12.54    9.46      13096     202.64   164.69
5      2      17941     18.44    15.84     12431     178.12   126.21
5      3      10152     18.19    15.63     9461      75.48    65.57
5      4      8774      23.22    23.34     11399     112.83   92.47
5      5      2331      20.61    12.39     6413      129.12   108.49
6      1      17941     18.44    15.84     12431     178.12   126.21
6      2      10152     18.19    15.63     9461      75.48    65.57
6      3      8774      23.22    23.34     5010      245.98   208.14
6      4      6741      13.21    9.56      11399     112.83   92.47
6      5      2331      20.61    12.39     6413      129.12   108.49
6      6      6861      11.88    9.32      8086      175.78   123.30

Table 6.3a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      32005     5.08     27.42     39256     -1.37    164.99
2      2      20795     -1.16    16.61     13544     50.16    258.08
3      1      27447     5.04     25.86     13544     50.16    258.08
3      2      4558      5.30     35.42     23103     -11.02   139.41
3      3      20795     -1.16    16.61     16153     12.44    195.04
4      1      4558      5.30     35.42     23103     -11.02   139.41
4      2      20795     -1.16    16.61     16153     12.44    195.04
4      3      14486     5.77     27.35     5818      55.85    281.82
4      4      12961     4.23     24.07     7726      45.88    238.58
5      1      20795     -1.16    16.61     16153     12.44    195.04
5      2      14486     5.77     27.35     5818      55.85    281.82
5      3      680       -13.11   30.11     10239     -28.92   113.91
5      4      3878      8.53     35.30     12864     3.22     155.31
5      5      12961     4.23     24.07     7726      45.88    238.58
6      1      14486     5.77     27.35     5818      55.85    281.82
6      2      680       -13.11   30.11     10239     -28.92   113.91
6      3      3878      8.53     35.30     12864     3.22     155.31
6      4      12961     4.23     24.07     7726      45.88    238.58
6      5      12420     -2.57    14.36     8438      19.45    206.13
6      6      8375      0.93     19.30     7715      4.76     181.85

Table 6.3b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      32005     20.70    18.70     39256     123.41   109.53
2      2      20795     13.25    10.09     13544     210.56   157.43
3      1      27447     19.71    17.48     13544     210.56   157.43
3      2      4558      26.62    23.96     23103     104.49   92.94
3      3      20795     13.25    10.09     16153     150.46   124.73
4      1      4558      26.62    23.96     23103     104.49   92.94
4      2      20795     13.25    10.09     16153     150.46   124.73
4      3      14486     20.81    18.65     5818      232.75   168.41
4      4      12961     18.49    15.98     7726      193.85   146.43
5      1      20795     13.25    10.09     16153     150.46   124.73
5      2      14486     20.81    18.65     5818      232.75   168.41
5      3      680       25.51    20.66     10239     84.62    81.55
5      4      3878      26.81    24.49     12864     120.31   98.26
5      5      12961     18.49    15.98     7726      193.85   146.43
6      1      14486     20.81    18.65     5818      232.75   168.41
6      2      680       25.51    20.66     10239     84.62    81.55
6      3      3878      26.81    24.49     12864     120.31   98.26
6      4      12961     18.49    15.98     7726      193.85   146.43
6      5      12420     11.85    8.50      8438      162.35   128.48
6      6      8375      15.32    11.76     7715      137.45   119.15

Table 6.3c Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      29393     0.70     17.09     30265     20.63    234.96
2      2      23407     5.02     30.29     22535     0.05     119.39
3      1      23407     5.02     30.29     22535     0.05     119.39
3      2      17218     -2.29    13.98     23238     15.61    204.48
3      3      12175     4.94     19.95     7027      37.23    314.88
4      1      20555     5.30     27.30     23238     15.61    204.48
4      2      17218     -2.29    13.98     7027      37.23    314.88
4      3      2852      3.03     46.41     11407     -22.37   91.65
4      4      12175     4.94     19.95     11128     23.03    138.62
5      1      17218     -2.29    13.98     7027      37.23    314.88
5      2      2852      3.03     46.41     11407     -22.37   91.65
5      3      10574     5.30     28.98     14877     -0.23    197.76
5      4      12175     4.94     19.95     11128     23.03    138.62
5      5      9981      5.31     25.40     8361      43.80    213.04
6      1      2852      3.03     46.41     11407     -22.37   91.65
6      2      10574     5.30     28.98     14877     -0.23    197.76
6      3      12175     4.94     19.95     11128     23.03    138.62
6      4      9481      -2.35    12.97     5413      24.30    304.59
6      5      9981      5.31     25.40     8361      43.80    213.04
6      6      7737      -2.23    15.12     1614      80.63    343.72

Table 6.3d Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas

              Subarea 93G                  Subarea 93H
Level  Class  # points  Mean     Std.dev   # points  Mean     Std.dev
2      1      29393     13.75    10.17     30265     185.43   145.75
2      2      23407     22.81    20.55     22535     92.48    75.51
3      1      23407     22.81    20.55     22535     92.48    75.51
3      2      17218     11.50    8.28      23238     162.72   124.82
3      3      12175     16.93    11.65     7027      260.56   180.65
4      1      20555     21.13    18.08     23238     162.72   124.82
4      2      17218     11.50    8.28      7027      260.56   180.65
4      3      2852      34.90    30.73     11407     71.87    61.10
4      4      12175     16.93    11.65     11128     113.61   82.69
5      1      17218     11.50    8.28      7027      260.56   180.65
5      2      2852      34.90    30.73     11407     71.87    61.10
5      3      10574     21.91    19.69     14877     158.07   118.83
5      4      12175     16.93    11.65     11128     113.61   82.69
5      5      9981      20.31    16.16     8361      170.98   134.42
6      1      2852      34.90    30.73     11407     71.87    61.10
6      2      10574     21.91    19.69     14877     158.07   118.83
6      3      12175     16.93    11.65     11128     113.61   82.69
6      4      9481      11.11    7.10      5413      248.87   177.25
6      5      9981      20.31    16.16     8361      170.98   134.42
6      6      7737      11.97    9.50      1614      299.76   186.38
In order to examine the relation between the D E M errors and the "roughness", the above summary statistics of WSC2 D E M errors in each terrain cluster need to be related back to the roughness characteristics of each cluster as interpreted in section 5.3.3. Therefore, for each case (subarea: 93G or 93H, window size: 7x7 or 21x21, variable group: 3-5, 3-6, 2, or 5) the clusters at each hierarchical level are ranked from the one with the highest D E M error to the one with the lowest D E M error. Based on the observations made in section 6.2.1 when examining the scattergrams between WSC2 D E M errors and each of the seven local variables (e.g., roughly symmetrical distribution of the positive and negative elevation differences for most variables), the standard deviation of WSC2 D E M errors in each cluster can be considered as a measure of the overall amount of error in that cluster. Whereas for the absolute WSC2 D E M errors in each cluster, the mean absolute error should serve as such a measure. As shown in Figure 6.8a, for example, cluster #1 in subarea 93G is the roughest as identified in section 5.3.3 and it corresponds to the largest standard deviation of elevation differences (i.e., the most spread-out histogram of WSC2 D E M errors). Cluster #3 represents the least rough class in the subarea (as seen in section 5.3.3) and it corresponds to a histogram with the smallest standard deviation. The same is true for subarea 93H. That is, the ranking of the three clusters based on WSC2 D E M errors appears to be consistent with the ranking of the clusters by "roughness." Based on the information presented in Tables 6. 
1a-d to 6.4a-d, the ranking of the clusters according to the standard deviation-based WSC2 DEM errors (first line) and the absolute WSC2 DEM errors (second line) can be summarized as follows for various cases:

1) For subarea 93G

variable group: 3-5, window size: 7x7
{2,1} {1,2,3} {2,4,1,3} {2,4,1,3,5} {1,4,5,3,2,6}
{2,1} {1,2,3} {4,2,1,3} {4,2,1,3,5} {4,1,3,5,2,6}

variable group: 3-5, window size: 21x21
{1,2} {1,3,2} {1,2,4,3} {5,4,1,3,2} {5,3,4,2,1,6}
{1,2} {3,1,2} {1,4,2,3} {5,4,3,1,2} {5,3,2,4,1,6}

variable group: 3-6, window size: 7x7
{1,2} {3,2,1} {3,1,2,4} {4,3,2,1,5} {4,3,1,2,6,5}
{1,2} {3,2,1} {3,1,2,4} {4,3,2,1,5} {4,3,1,2,6,5}

variable group: 3-6, window size: 21x21
{1,2} {2,1,3} {1,3,4,2} {4,3,2,5,1} {3,2,1,5,6,4}
{1,2} {3,2,1} {4,1,3,2} {4,5,2,3,1} {3,5,1,2,4,6}

variable group: 2, window size: 7x7
{1,2} {2,1,3} {1,3,4,2} {4,3,2,5,1} {3,2,1,4,6,5}
{1,2} {2,1,3} {1,3,4,2} {4,3,2,5,1} {3,2,1,4,6,5}

variable group: 2, window size: 21x21
{2,1} {1,3,2} {3,1,4,2} {2,3,5,4,1} {1,2,5,3,6,4}
{2,1} {1,3,2} {3,1,4,2} {2,3,5,4,1} {1,2,5,3,6,4}

variable group: 5, window size: 7x7
{1,2} {1,3,2} {3,4,2,1} {2,3,1,4,5} {1,2,6,4,3,5}
{1,2} {1,3,2} {3,4,2,1} {2,3,1,4,5} {1,2,6,4,3,5}

variable group: 5, window size: 21x21
{2,1} {1,2,3} {2,3,1,4} {1,2,4,5,3} {6,3,1,4,5,2}
{2,1} {1,2,3} {2,3,1,4} {1,4,2,5,3} {6,3,4,1,5,2}

2) For subarea 93H

variable group: 3-5, window size: 7x7
{1,2} {1,2,3} {4,1,2,3} {4,5,3,1,2} {3,6,5,2,4,1}
{1,2} {1,2,3} {4,1,2,3} {4,5,3,1,2} {3,6,5,2,4,1}

variable group: 3-5, window size: 21x21
{1,2} {3,2,1} {2,1,3,4} {1,4,3,2,5} {2,4,3,5,1,6}
{1,2} {2,3,1} {1,2,3,4} {4,1,3,2,5} {2,4,3,5,1,6}

variable group: 3-6, window size: 7x7
{1,2} {1,3,2} {4,3,2,1} {5,2,1,3,4} {5,6,1,3,2,4}
{1,2} {3,1,2} {3,4,2,1} {2,5,1,3,4} {6,1,5,3,2,4}

variable group: 3-6, window size: 21x21
{1,2} {1,3,2} {2,3,4,1} {1,2,5,4,3} {3,6,1,5,4,2}
{1,2} {1,3,2} {2,3,4,1} {1,2,5,4,3} {3,1,6,5,4,2}

variable group: 2, window size: 7x7
{2,1} {1,3,2} {3,4,2,1} {2,5,1,4,3} {1,4,5,6,3,2}
{2,1} {1,3,2} {3,4,2,1} {2,5,1,4,3} {1,4,5,6,3,2}

variable group: 2, window size: 21x21
{1,2} {3,2,1} {2,1,4,3} {1,5,3,4,2} {6,4,5,2,3,1}
{1,2} {3,2,1} {2,1,4,3} {1,5,3,4,2} {6,4,5,2,3,1}

variable group: 5, window size: 7x7
{1,2} {1,3,2} {2,4,3,1} {1,3,2,4,5} {6,5,2,1,3,4}
{1,2} {1,3,2} {2,4,3,1} {1,3,2,4,5} {6,5,2,1,3,4}

variable group: 5, window size: 21x21
{2,1} {1,3,2} {2,4,3,1} {1,4,2,5,3} {4,6,3,1,5,2}
{2,1} {1,3,2} {2,4,3,1} {1,4,2,5,3} {6,4,3,1,5,2}

Table 6.4a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas

Class   # of points (93G)   Mean (93G)   Std.dev (93G)   # of points (93H)   Mean (93H)   Std.dev (93H)
1       25255               5.2          28.2            23109               38.6         233.9
2       27545               0.3          19.0            29691               -9.0         154.0

1       7953                6.9          31.7            9386                52.5         259.9
2       27545               0.3          19.0            29691               -9.0         154.0
3       17302               4.4          26.4            13723               29.1         213.8

1       27545               0.3          19.0            29691               -9.0         154.0
2       17302               4.4          26.4            2681                35.8         291.3
3       1782                0.8          33.4            13723               29.1         213.8
4       6171                8.7          30.9            6705                59.1         245.9

1       17302               4.4          26.4            2681                35.8         291.3
2       1782                0.8          33.4            13723               29.1         213.8
3       6171                8.7          30.9            6705                59.1         245.9
4       14745               2.7          21.2            17792               3.5          169.9
5       12800               -2.5         15.6            11899               -27.6        124.2

1       1782                0.8          33.4            13723               29.1         213.8
2       6171                8.7          30.9            6705                59.1         245.9
3       14745               2.7          21.2            17792               3.5          169.9
4       10696               3.0          25.0            11899               -27.6        124.2
5       12800               -2.5         15.6            2238                44.8         288.7
6       6606                6.6          28.5            443                 -9.6         300.7

Table 6.4b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas

Class   # of points (93G)   Mean (93G)   Std.dev (93G)   # of points (93H)   Mean (93H)   Std.dev (93H)
1       25255               21.23        19.26           23109               186.57       146.25
2       27545               14.59        12.12           29691               114.00       103.87

1       7953                23.73        22.07           9386                213.09       157.74
2       27545               14.59        12.12           29691               114.00       103.87
3       17302               20.08        17.70           13723               168.43       134.88

1       27545               14.59        12.12           29691               114.00       103.87
2       17302               20.08        17.70           2681                244.54       162.29
3       1782                25.27        21.87           13723               168.43       134.88
4       6171                23.28        22.11           6705                200.52       154.12

1       17302               20.08        17.70           2681                244.54       162.29
2       1782                25.27        21.87           13723               168.43       134.88
3       6171                23.28        22.11           6705                200.52       154.12
4       14745               16.23        13.83           17792               128.26       111.44
5       12800               12.69        9.45            11899               92.67        87.12

1       1782                25.27        21.87           13723               168.43       134.88
2       6171                23.28        22.11           6705                200.52       154.12
3       14745               16.23        13.83           17792               128.26       111.44
4       10696               19.27        16.16           11899               92.67        87.12
5       12800               12.69        9.45            2238                243.16       161.81
6       6606                21.39        19.89           443                 251.54       164.70

Table 6.4c Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas

Class   # of points (93G)   Mean (93G)   Std.dev (93G)   # of points (93H)   Mean (93H)   Std.dev (93H)
1       36025               2.0          19.6            37778               10.7         157.6
2       16775               3.9          31.3            15022               14.6         265.5

1       16775               3.9          31.3            15022               14.6         265.5
2       22479               4.0          22.2            22374               3.3          127.9
3       13546               -1.2         13.8            15404               21.5         192.3

1       22479               4.0          22.2            22374               3.3          127.9
2       6296                4.7          38.5            5711                -14.6        296.4
3       10479               3.5          26.0            15404               21.5         192.3
4       13546               -1.2         13.8            9311                32.5         243.0

1       6296                4.7          38.5            5711                -14.6        296.4
2       10479               3.5          26.0            15404               21.5         192.3
3       13546               -1.2         13.8            12237               -13.6        106.0
4       11958               4.9          25.6            9311                32.5         243.0
5       10521               3.0          17.5            10137               23.8         147.6

1       10479               3.5          26.0            15404               21.5         192.3
2       13546               -1.2         13.8            12237               -13.6        106.0
3       5057                3.2          34.8            9311                32.5         243.0
4       11958               4.9          25.6            4468                6.7          296.5
5       10521               3.0          17.5            10137               23.8         147.6
6       1239                10.5         50.3            1243                -91.0        283.3

Table 6.4d Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas

Class   # of points (93G)   Mean (93G)   Std.dev (93G)   # of points (93H)   Mean (93H)   Std.dev (93H)
1       36025               15.19        12.58           37778               120.67       101.94
2       16775               23.30        21.21           15022               208.87       164.57

1       16775               23.30        21.21           15022               208.87       164.57
2       22479               17.57        14.10           22374               98.89        81.15
3       13546               11.22        8.09            15404               152.31       119.30

1       22479               17.57        14.10           22374               98.89        81.15
2       6296                28.64        26.13           5711                236.96       178.59
3       10479               20.09        16.81           15404               152.31       119.30
4       13546               11.22        8.09            9311                191.63       152.82

1       6296                28.64        26.13           5711                236.96       178.59
2       10479               20.09        16.81           15404               152.31       119.30
3       13546               11.22        8.09            12237               82.31        68.12
4       11958               20.21        16.38           9311                191.63       152.82
5       10521               14.57        10.15           10137               118.90       90.56

1       10479               20.09        16.81           15404               152.31       119.30
2       13546               11.22        8.09            12237               82.31        68.12
3       5057                26.53        22.81           9311                191.63       152.82
4       11958               20.21        16.38           4468                235.24       180.52
5       10521               14.57        10.15           10137               118.90       90.56
6       1239                37.25        35.44           1243                243.16       171.38

By comparing the above rankings of the clusters based on WSC2 DEM errors to the "roughness" rankings based on the local variables presented in section 5.3.3, several observations can be made. First, a fairly consistent relation exists between DEM errors and the roughness of the terrain clusters for all the cases tested. That is, the general pattern seems to be that the rougher the cluster, the larger the DEM error (measured with either the standard deviation of the elevation differences or the mean of the absolute elevation differences in each cluster). For example, with variable group (3-5) and window size 7x7 in subarea 93G, cluster #1 at level 3 is the roughest and it has the largest standard deviation of WSC2 DEM errors and the highest mean of the absolute WSC2 DEM errors. Cluster #2 is the second roughest and it has the second largest standard deviation of WSC2 DEM errors and the second highest mean of the absolute WSC2 DEM errors. Cluster #3 is the least rough and it has the smallest WSC2 DEM errors. With variable group (3-6) and window size 7x7 in subarea 93G, cluster #3 at level 3 is the roughest and it has the largest standard deviation of WSC2 DEM errors and the highest mean of the absolute WSC2 DEM errors. Cluster #2 is the second roughest and it has the second largest standard deviation of WSC2 DEM errors and the second highest mean of the absolute WSC2 DEM errors. Cluster #1 is the least rough and it has the smallest WSC2 DEM errors.
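The per-cluster summaries and rankings discussed above can be reproduced along the following lines. This is only a minimal sketch: the function names `cluster_error_stats` and `rank_clusters` are hypothetical, the toy data are invented for illustration, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which divisor was used.

```python
import numpy as np

def cluster_error_stats(errors, labels):
    """Return {cluster: (n, mean, std, mean_abs)} for signed elevation differences."""
    errors = np.asarray(errors, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for c in np.unique(labels):
        e = errors[labels == c]
        # ddof=1 (sample standard deviation) is an assumption of this sketch.
        stats[int(c)] = (e.size, e.mean(), e.std(ddof=1), np.abs(e).mean())
    return stats

def rank_clusters(stats, key):
    """Cluster ids ordered from the largest to the smallest value of the measure."""
    idx = {"std": 2, "mean_abs": 3}[key]
    return sorted(stats, key=lambda c: stats[c][idx], reverse=True)

# Toy data (not from the thesis): signed DEM errors and their cluster labels.
errors = [-30.0, 30.0, -10.0, 10.0, -2.0, 2.0, 40.0, -40.0]
labels = [1, 1, 2, 2, 3, 3, 1, 1]

stats = cluster_error_stats(errors, labels)
by_std = rank_clusters(stats, "std")        # ranking on the "first line"
by_abs = rank_clusters(stats, "mean_abs")   # ranking on the "second line"
```

Ranking by `"std"` corresponds to the first line of each case listed above and ranking by `"mean_abs"` to the second line.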
Second, the rankings of the clusters based on the original WSC2 DEM errors (measured with the standard deviation of the elevation differences) and on the absolute WSC2 DEM errors (measured with the mean of the absolute elevation differences) are fairly consistent (at least at the first four levels) for most cases, especially with variable group (2) and group (5). This is because of the relatively symmetrical distribution of the positive and negative elevation differences. However, with variable groups (3-5) and (3-6), the rankings of the clusters based on the two different measures in some cases (e.g., window size 21x21 for subarea 93G) show consistency only at the first two levels. This can probably be attributed to the inclusion of the variables curv, hypint and/or highpt in the classification of terrain clusters and the characteristics of these variables. As indicated in section 6.2.1, for example, there seem to be more DEM points with positive elevation differences when the variable highpt is low, but more points with negative elevation differences when the highpt value is high. When there is an asymmetrical distribution of positive and negative elevation differences, the rankings of the clusters based on the original WSC2 DEM errors and on the absolute WSC2 DEM errors will likely show some inconsistency. It is, thus, desirable to examine both the original DEM errors and the absolute DEM errors.

Finally, it should be noted that the differences between DEM errors in some clusters become insignificant at certain hierarchy levels for various cases. For example, the standard deviations of WSC2 DEM errors in clusters #2 and #4 at level 4 for subarea 93G with variable group 3-5 and window size 7x7 (see Table 6.1a) are only slightly different (i.e., 31.2 m in #2 versus 31.1 m in #4). The ranking of the clusters in this case then becomes meaningless. The significance test of DEM error variation will be carried out later in section 6.2.3.
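The agreement between the two rankings can also be checked mechanically. The sketch below uses a hypothetical helper, `agreeing_levels`, which reports the cluster counts (2 to 6) at which the standard deviation-based and the absolute-error rankings list the clusters in exactly the same order. The rankings are transcribed from the lists given earlier for subarea 93G with window size 21x21. Exact order equality is a deliberately strict criterion chosen for this illustration.

```python
def agreeing_levels(rank_a, rank_b):
    """Cluster counts at which both rankings list the clusters in the same order."""
    return [len(a) for a, b in zip(rank_a, rank_b) if a == b]

# Subarea 93G, window size 21x21, variable group 3-5 (transcribed from the text).
std_g35 = [[1, 2], [1, 3, 2], [1, 2, 4, 3], [5, 4, 1, 3, 2], [5, 3, 4, 2, 1, 6]]
abs_g35 = [[1, 2], [3, 1, 2], [1, 4, 2, 3], [5, 4, 3, 1, 2], [5, 3, 2, 4, 1, 6]]

# Subarea 93G, window size 21x21, variable group 2 (transcribed from the text).
std_g2 = [[2, 1], [1, 3, 2], [3, 1, 4, 2], [2, 3, 5, 4, 1], [1, 2, 5, 3, 6, 4]]
abs_g2 = [[2, 1], [1, 3, 2], [3, 1, 4, 2], [2, 3, 5, 4, 1], [1, 2, 5, 3, 6, 4]]

print(agreeing_levels(std_g35, abs_g35))  # [2]
print(agreeing_levels(std_g2, abs_g2))    # [2, 3, 4, 5, 6]
```

For variable group (3-5) the two orderings coincide only for the two-cluster split, whereas for variable group (2) they coincide for every split, in line with the observation above.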
6.2.2.2 EMR1 DEM errors as compared to TRIM DEM

In order to find out if the above test results can be generalized for DEMs with different resolutions, the evaluation of EMR1 DEM errors (derived by comparing EMR1 with TRIM DEMs; see Figures 3.18 and 3.19) within each terrain cluster was also conducted for the two subareas (93G and 93H) in different cases (i.e., window size 7x7 or 21x21, variable group 3-5, 3-6, 2 or 5 only). With variable group (3-5), the summary statistics of EMR1 DEM errors and the absolute EMR1 DEM errors in each cluster are listed in Tables 6.5a-b for window size (7x7) and in Tables 6.5c-d for window size (21x21). With variable group (3-6), the results are summarized in Tables 6.6a-d. Tables 6.7a-d and Tables 6.8a-d show the results of summary statistics of EMR1 DEM errors and the absolute EMR1 DEM errors for the case of using only a single variable slope (i.e., variable #2) or std (i.e., variable #5) in the classification. The rankings of the clusters based on EMR1 DEM errors and the absolute EMR1 DEM errors for various cases are not presented, but similar observations can be made as in the above section for WSC2 DEM errors.
320 Table 6.5a Summary statistics of E M R l D E M errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 42616 1.19 13.55 33340 38.73 143.12 2 10184 0.38 23.39 19460 2.21 83.29 1 10184 0.38 23.39 24142 69.42 148.80 2 24665 4.16 14.56 9198 -41.83 84.68 3 17951 -2.89 10.76 19460 2.21 83.29 1 24665 4.16 14.56 20025 50.64 143.72 2 8563 0.07 22.80 9198 -41.83 84.68 3 17951 -2.89 10.76 19460 2.21 83.29 4 1621 2.04 26.21 4117 160.74 139.00 1 17711 5.26 15.64 9198 -41.83 84.68 2 8563 0.07 22.80 19460 2.21 83.29 3 17951 -2.89 10.76 13670 25.17 130.27 4 1621 2.04 26.21 6355 105.44 155.51 5 6954 1.38 10.89 4117 160.74 139.00 1 8563 0.07 22.80 19460 2.21 83.29 2 17951 -2.89 10.76 13670 25.17 130.27 3 6654 8.44 15.42 6355 105.44 155.51 4 1621 2.04 26.21 6219 -43.72 79.91 5 11057 3.34 15.46 4117 160.74 139.00 6 6954 1.38 10.89 2979 -37.88 93.76 321 Table 6.5b Summary statistics of the absolute E M R 1 D E M errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 42616 10.23 8.98 33340 112.03 97.13 2 10184 16.93 16.15 19460 61.10 56.66 1 10184 16.93 16.15 24142 127.08 103.98 2 24665 11.33 10.07 9198 72.54 60.50 3 17951 8.71 6.96 19460 61.10 56.66 1 24665 11.33 10.07 20025 117.97 96.46 2 8563 16.06 16.19 9198 72.54 60.50 3 17951 8.71 6.96 19460 61.10 56.66 4 1621 21.48 15.16 4117 171.39 125.65 1 17711 12.43 10.86 9198 72.54 60.50 2 8563 16.06 16.19 19460 61.10 56.66 3 17951 8.71 6.96 13670 104.15 82.20 4 1621 21.48 15.16 6355 147.68 116.15 5 6954 8.53 6.92 4117 171.39 125.65 1 8563 16.06 16.19 19460 61.10 56.66 2 17951 8.71 6.96 13670 104.15 82.20 3 6654 13.67 11.06 6355 147.68 116.15 4 1621 21.48 15.16 6219 69.05 59.41 5 11057 11.68 10.67 4117 171.39 125.65 6 6954 8.53 6.92 2979 79.81 62.09 322 Table 6.5c Summary 
statistics of E M R l D E M errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas Class #of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 32226 2.31 18.56 30528 51.20 148.59 2 20574 -0.95 10.28 22272 -10.27 70.80 1 29720 1.65 18.19 22272 -10.27 70.80 2 20574 -0.95 10.28 17471 78.67 155.04 3 2506 10.06 20.92 13057 14.45 130.75 1 12681 -1.86 21.76 17471 78.67 155.04 2 17039 4.27 14.45 13057 14.45 130.75 3 20574 -0.95 10.28 17147 -3.43 72.50 4 2506 10.06 20.92 5125 -33.18 59.31 1 17039 4.27 14.45 13057 14.45 130.75 2 20574 -0.95 10.28 17147 -3.43 72.50 3 2506 10.06 20.92 12113 54.05 149.58 4 9555 -5.06 14.00 5358 134.32 152.77 5 3126 7.93 34.58 5125 -33.18 59.31 1 20574 -0.95 10.28 17147 -3.43 72.50 2 2506 10.06 20.92 6245 20.91 145.53 3 9555 -5.06 14.00 12113 54.05 149.58 4 13138 3.58 15.39 5358 134.32 152.77 5 3126 7.93 34.58 6812 8.53 115.26 6 3901 6.61 10.32 5125 -33.18 59.31 , 323 Table 6.5d Summary statistics of the absolute EMR1 D E M errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 32226 13.68 12.76 30528 121.01 100.29 2 20574 8.14 6.36 22272 55.23 45.48 1 29720 13.24 12.60 22272 55.23 45.48 2 20574 8.14 6.36 17471 137.32 106.64 3 2506 18.91 13.48 13057 99.18 86.43 1 12681 15.27 15.63 17471 137.32 106.64 2 17039 11.72 9.47 13057 99.18 86.43 3 20574 8.14 6.36 17147 56.55 45.50 4 2506 18.91 13.48 5125 50.80 45.15 1 17039 11.72 9.47 13057 99.18 86.43 2 20574 8.14 6.36 17147 56.55 45.50 3 2506 18.91 13.48 12113 127.83 94.63 4 9555 11.59 9.34 5358 158.77 127.17 5 3126 26.49 23.61 5125 50.80 45.15 1 20574 8.14 6.36 17147 56.55 45.50 2 2506 18.91 13.48 6245 111.42 95.93 3 9555 11.59 9.34 12113 127.83 94.63 4 13138 12.34 9.88 5358 158.77 127.17 5 3126 26.49 23.61 6812 87.96 74.96 6 3901 9.64 7.58 5125 50.80 45.15 324 Table 6.6a Summary statistics of E M R 
l D E M errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) #of points (93H) Mean (93H) Std.dev (93H) 1 24188 -0.38 18.90 30135 53.05 144.69 2 28612 2.24 12.78 22665 -11.67 81.15 1 28612 2.24 12.78 21374 17.20 129.02 2 22589 -0.71 18.28 22665 -11.67 81.15 3 1599 4.24 25.72 8761 140.54 143.56 1 22589 -0.71 18.28 22665 -11.67 81.15 16793 4.65 13.65 16869 26.15 133.15 3 1599 4.24 25.72 8761 140.54 143.56 4 11819 -1.17 10.51 4505 -16.34 105.69 1 16793 4.65 13.65 16869 26.15 133.15 2 11188 -7.29 14.07 8761 140.54 143.56 3 11401 5.73 19.60 8893 -40.53 74.33 4 1599 4.24 25.72 13772 6.96 79.90 5 11819 -1.17 10.51 4505 -16.34 105.69 1 11188 -7.29 14.07 8761 140.54 143.56 2 10099 6.66 14.46 8893 -40.53 74.33 3 11401 5.73 19.60 13092 -0.74 117.85 4 1599 4.24 25.72 13772 6.96 79.90 5 11819 -1.17 10.51 4505 -16.34 105.69 6 6694 1.61 11.70 3777 119.39 140.86 2 ' 325 Table 6.6b Summary statistics of the absolute E M R 1 D E M errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 24188 13.67 13.07 30135 117.56 99.65 2 28612 9.70 8.62 22665 60.95 54.83 1 28612 9.70 8.62 21374 101.64 81.31 2 22589 13.12 12.76 22665 60.95 54.83 3 1599 21.49 14.76 8761 156.41 126.10 1 22589 13.12 12.76 22665 60.95 54.83 2 16793 10.71 9.67 16869 106.12 84.57 3 1599 21.49 14.76 8761 156.41 126.10 4 11819 8.27 6.60 4505 84.87 65.07 1 16793 10.71 9.67 16869 106.12 84.57 2 11188 12.16 10.16 8761 156.41 126.10 3 11401 14.05 14.82 8893 64.09 55.34 4 1599 21.49 14.76 13772 58.93 54.41 5 11819 8.27 6.60 4505 84.87 65.07 1 11188 12.16 10.16 8761 156.41 126.10 2 10099 11.80 10.69 8893 64.09 55.34 3 11401 14.05 14.82 13092 94.21 70.81 4 1599 21.49 14.76 13772 58.93 54.41 5 11819 8.27 6.60 4505 84.87 65.07 6 6694 9.06 7.59 3777 147.41 111.21 326 Table 6.6c Summary statistics of E M R l D E M 
errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 21257 -1.89 19.09 31940 57.63 142.95 2 31543 3.01 13.02 20860 -24.28 68.17 1 31543 3.01 13.02 19509 14.62 132.25 2 18926 -3.32 18.32 20860 -24.2 68.17 3 2331 9.71 21.14 12431 125.14 132.64 1 18926 -3.32 18.32 20860 -24.2 68.17 2 13602 -4.45 8.52 13096 36.71 145.60 3 17941 8.67 13.00 12431 125.14 132.64 4 2331 9.71 21.14 6413 -30.49 82.98 1 13602 -4.45 8.52 13096 36.71 145.60 2 17941 8.67 13.00 12431 125.14 132.64 3 10152 -6.46 12.79 9461 -32.00 51.41 4 8774 0.31 22.59 11399 -17.87 78.87 5 2331 9.71 21.14 6413 -30.49 82.98 1 17941 8.67 13.00 12431 125.14 132.64 2 10152 -6.46 12.79 9461 -32.00 51.41 3 8774 0.31 22.59 5010 -6.15 117.46 4 6741 -4.40 7.90 11399 -17.87 78.87 5 2331 9.71 21.14 6413 -30.49 82.98 6 6861 -4.50 9.09 8086 63.27 154.74 327 Table 6.6d Summary statistics of the absolute EMR1 D E M errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 21257 13.95 13.18 31940 117.04 100.30 2 31543 9.88 9.01 20860 56.86 44.78 1 31543 9.88 9.01 19509 102.28 85.11 2 18926 13.32 13.02 20860 56.86 44.78 3 2331 19.02 13.41 12431 140.19 116.63 1 18926 13.32 13.02 20860 56.86 44.78 2 13602 7.77 5.68 13096 119.10 91.46 3 17941 11.48 10.60 12431 140.19 116.63 4 2331 19.02 13.41 6413 67.94 56.56 1 13602 7.77 5.68 13096 119.10 91.46 2 17941 11.48 10.60 12431 140.19 116.63 3 10152 11.29 8.83 9461 45.60 39.87 4 8774 15.67 16.28 11399 66.20 46.46 5 2331 19.02 13.41 6413 67.94 56.56 1 17941 11.48 10.60 12431 140.19 116.63 2 10152 11.29 8.83 9461 45.60 39.87 3 8774 15.67 16.28 5010 92.78 72.30 4 6741 7.48 5.09 11399 66.20 46.46 5 2331 19.02 13.41 6413 67.94 56.56 6 6861 8.05 6.19 8086 135.40 98.05 328 Table 6.7a Summary statistics of E M R l D E M errors in each 
terrain cluster (window size: 7x7, variable group: 2) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 32005 2.11 18.47 39256 -1.39 100.09 2 20795 -0.61 10.73 13544 102.56 156.63 1 27447 1.93 16.36 13544 102.56 156.63 2 4558 3.25 27.97 23103 -4.71 82.45 3 20795 -0.61 10.73 16153 3.34 120.79 1 4558 3.25 27.97 23103 -4.71 82.45 2 20795 -0.61 10.73 16153 3.34 120.79 3 14486 3.07 18.42 5818 145.09 167.01 4 12961 0.65 13.60 7726 70.53 140.06 1 20795 -0.61 10.73 16153 3.34 120.79 2 14486 3.07 18.42 5818 145.09 167.01 3 680 -6.31 26.85 10239 -20.22 57.63 4 3878 4.93 27.83 12864 7.65 96.02 5 12961 0.65 13.60 7726 70.53 140.06 1 14486 3.07 18.42 5818 145.09 167.01 2 680 -6.31 26.85 10239 -20.22 57.63 3 3878 4.93 27.83 12864 7.65 96.02 4 12961 0.65 13.60 7726 70.53 140.06 5 12420 -1.56 9.59 8438 11.87 127.30 6 8375 0.79 12.09 7715 -5.98 112.51 329 Table 6.7b Summary statistics of the absolute EMR1 D E M errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 32005 13.56 12.72 39256 74.51 66.85 2 20795 8.37 6.75 13544 147.59 115.17 1 27447 12.27 11.00 13544 147.59 115.17 2 4558 21.33 18.38 23103 60.09 56.64 3 20795 8.37 6.75 16153 95.14 74.48 1 4558 21.33 18.38 23103 60.09 56.64 2 20795 8.37 6.75 16153 95.14 74.48 3 14486 13.76 12.62 5818 178.35 130.89 4 12961 10.61 8.53 7726 124.43 95.42 1 20795 8.37 6.75 16153 95.14 74.48 2 14486 13.76 12.62 5818 178.35 130.89 3 680 21.73 16.97 10239 44.27 42.08 4 3878 21.25 18.62 12864 72.68 63.21 5 12961 10.61 8.53 7726 124.43 95.42 1 14486 13.76 12.62 5818 178.35 130.89 2 680 21.73 16.97 10239 44.27 42.08 3 3878 21.25 18.62 12864 72.68 63.21 4 12961 10.61 8.53 7726 124.43 95.42 5 12420 7.75 5.86 8438 101.23 78.09 6 8375 9.29 7.78 7715 88.49 69.72 330 Table 6.7c Summary statistics of E M R l D E M errors in each terrain cluster (window size: 21x21, 
variable group: 2) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 29393 0.14 10.95 30265 52.47 149.65 2 23407 2.17 20.50 22535 -11.26 67.96 1 23407 2.17 20.50 22535 -11.26 67.96 2 17218 -0.76 8.61 23238 25.09 129.58 3 12175 1.42 13.48 7027 143.01 173.95 1 20555 1.58 16.73 23238 25.09 129.58 2 17218 -0.76 8.61 7027 143.01 173.95 3 2852 6.41 37.59 11407 -17.01 51.38 4 12175 1.42 13.48 11128 -5.36 81.11 1 17218 -0.76 8.61 7027 143.01 173.95 2 2852 6.41 37.59 11407 -17.01 51.38 3 10574 1.61 17.67 14877 6.98 113.72 4 12175 1.42 13.48 11128 -5.36 81.11 5 9981 1.55 15.66 8361 57.31 148.45 1 2852 6.41 37.59 11407 -17.01 51.38 2 10574 1.61 17.67 14877 6.98 113.72 3 12175 1.42 13.48 11128 -5.36 81.11 4 9481 -0.91 8.06 5413 120.58 164.38 5 9981 1.55 15.66 8361 57.31 148.45 6 7737 -0.57 9.24 1614 218.26 183.83 331 Table 6.7d Summary statistics of the absolute EMR1 D E M errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 29393 8.61 6.75 30265 122.80 100.35 2 23407 15.16 13.97 22535 53.59 43.28 1 23407 15.16 13.97 22535 53.59 43.28 2 17218 7.10 4.93 23238 105.18 79.73 3 12175 10.76 8.24 7027 181.05 133.91 1 20555 13.18 10.42 23238 105.18 79.73 2 17218 7.10 4.93 7027 181.05 133.91 3 2852 29.45 24.21 11407 41.54 34.70 4 12175 10.76 8.24 11128 65.95 47.51 1 17218 7.10 4.93 7027 181.05 133.91 2 2852 29.45 24.21 11407 41.54 34.70 3 10574 13.87 11.06 14877 92.17 66.96 4 12175 10.76 8.24 11128 65.95 47.51 5 9981 12.44 9.64 8361 128.33 94.09 1 2852 29.45 24.21 11407 41.54 34.70 2 10574 13.87 11.06 14877 92.17 66.96 3 12175 10.76 8.24 11128 65.95 47.51 4 9481 6.96 4.16 5413 164.15 120.88 5 9981 12.44 9.64 8361 128.33 94.09 6 7737 7.26 5.73 1614 237.72 157.84 332 Table 6.8a Summary statistics of E M R l D E M errors in each terrain cluster (window size: 7x7, variable group: 5) for the two 
subareas Class #of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 25255 2.7 19.4 23109 59.5 151.1 2 27545 -0.5 11.7 29691 -1.4 93.3 1 7953 5.3 24.9 9386 105.9 156.9 2 27545 -0.5 11.7 29691 -1.4 93.3 3 17302 1.5 16.2 13723 27.7 138.3 1 27545 -0.5 11.7 29691 -1.4 93.3 2 17302 1.5 16.2 2681 136.6 155.5 3 1782 4.4 30.7 13723 27.7 138.3 4 6171 5.6 22.9 6705 93.7 155.9 1 17302 1.5 16.2 2681 136.6 155.5 2 1782 4.4 30.7 13723 27.7 138.3 3 6171 5.6 22.9 6705 93.7 155.9 4 14745 0.4 12.8 17792 9.4 104.8 5 12800 -1.4 10.2 11899 -17.5 69.6 1 1782 4.4 30.7 13723 27.7 138.3 2 6171 5.6 22.9 6705 93.7 155.9 3 14745 0.4 12.8 17792 9.4 104.8 4 10696 0.8 14.8 11899 -17.5 69.6 5 12800 -1.4 10.2 2238 136.0 155.2 6 6606 2.6 18.1 443 139.7 157.1 333 Table 6.8b Summary statistics of the absolute EMR1 D E M errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 25255 14.20 13.52 23109 125.79 102.76 2 27545 9.06 7.38 29691 67.94 63.93 1 7953 18.34 17.64 9386 148.84 117.14 2 27545 9.06 7.38 29691 67.94 63.93 3 17302 12.29 10.60 13723 110.03 88.24 1 27545 9.06 7.38 29691 67.94 63.93 2 17302 12.29 10.60 2681 166.39 123.09 3 1782 23.63 20.10 13723 110.03 88.24 4 6171 16.82 16.55 6705 141.83 113.93 1 17302 12.29 10.60 2681 166.39 123.09 2 1782 23.63 20.10 13723 110.03 88.24 3 6171 16.82 16.55 6705 141.83 113.93 4 14745 9.84 8.16 17792 79.40 69.06 5 12800 8.16 6.25 11899 50.80 50.74 1 1782 23.63 20.10 13723 110.03 88.24 2 6171 16.82 16.55 6705 141.83 113.93 3 14745 9.84 8.16 17792 79.40 69.06 4 10696 11.49 9.41 11899 50.80 50.74 5 12800 8.16 6.25 2238 165.71 122.95 6 6606 13.59 12.17 443 169.80 123.88 334 Table 6.8c Summary statistics of E M R l D E M errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) # of points (93H) Mean (93H) Std.dev (93H) 1 
36025 0.16 11.8 37778 8.7 102.6 2 16775 2.9 22.2 15022 67.0 163.2 1 16775 2.9 22.2 15022 67.0 163.2 2 22479 0.45 13.19 22374 -2.2 76.7 3 13546 -0.3 9.1 15404 24.5 129.8 1 22479 0.45 13.19 22374 -2.2 76.7 2 6296 4.37 28.47 5711 82.3 165.2 3 10479 2.0 17.35 15404 24.5 129.8 4 13546 -0.3 9.1 9311 57.6 161.3 1 6296 4.37 28.47 5711 82.3 165.2 2 10479 2.0 17.35 15404 24.5 129.8 3 13546 -0.3 9.1 12237 -12.1 62.3 4 11958 0.15 13.6 9311 57.6 161.3 5 10521 0.8 12.66 10137 9.7 89.6 1 10479 2.0 17.4 15404 24.5 129.8 2 13546 -0.3 9.1 12237 -12.1 62.3 3 5057 1.0 23.3 9311 57.6 161.3 4 11958 0.2 13.6 4468 81.3 170.8 5 10521 0.8 12.7 10137 9.7 89.6 6 1239 18.1 40.96 1243 86.1 143.2 335 Table 6.8d Summary statistics of the absolute EMR1 D E M errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas Class # of points (93G) Mean (93G) Std.dev (93G) #of points (93H) Mean (93H) Std.dev (93H) 1 36025 9.21 7.44 37778 75.77 69.70 2 16775 16.48 15.18 15022 137.26 110.88 1 16775 16.48 15.18 15022 137.26 110.88 2 22479 10.28 8.29 22374 57.47 50.86 3 13546 7.43 5.29 15404 102.34 83.45 1 22479 10.28 8.29 22374 57.47 50.86 2 6296 21.04 19.68 5711 142.25 117.63 3 10479 13.75 10.79 15404 102.34 83.45 4 13546 7.43 5.29 9311 134.20 106.42 1 6296 21.04 19.68 5711 142.25 117.63 2 10479 13.75 10.79 15404 102.34 83.45 3 13546 7.43 5.29 12237 46.78 42.95 4 11958 10.54 8.67 9311 134.20 106.42 5 10521 9.98 7.83 10137 70.37 56.38 1 10479 13.75 10.79 15404 102.34 83.45 2 13546 7.43 5.29 12237 46.78 42.95 3 5057 17.86 14.93 9311 134.20 106.42 4 11958 10.54 8.67 4468 145.47 120.94 5 10521 9.98 7.83 10137 70.37 56.38 6 1239 33.99 29.15 1243 130.69 104.11 336 6.2.2.3 NGDC5 and WSC2 D E M errors as compared to E M R l D E M In the preceding two sections, WSC2 and E M R l D E M errors as compared to T R I M DEMs for two different surfaces (93G and 93H) were examined in relation to the terrain clusters derived using different variable groups and window sizes. 
The terrain clusters used in the above analyses were all based on local measures extracted from the 50 m TRIM DEMs for the two subareas. In order to investigate the sensitivity of the above DEM error models, NGDC5 and WSC2 DEM errors as compared to EMR1 DEM for the whole study area were also examined within various terrain clusters based on local roughness measures extracted from the 1 km EMR1 DEM. Table 6.9a presents the summary statistics (i.e., number of points, mean, and standard deviation) of NGDC5 DEM errors within each terrain cluster derived using variable group (3-5) and two different moving window sizes (5x5) and (9x9) for the whole study area. The results for the absolute NGDC5 DEM errors (variable group: 3-5) are presented in Table 6.9b. Based on the above two tables, the rankings of the clusters according to NGDC5 DEM errors (first line) and the absolute NGDC5 DEM errors (second line) can be summarized as follows for the two window sizes:

1) variable group: 3-5, window size: 5x5
{2,1} {1,2,3} {2,3,1,4} {1,5,2,4,3} {1,5,6,2,4,3}
{2,1} {1,2,3} {3,2,1,4} {2,1,5,4,3} {2,5,1,6,4,3}

2) variable group: 3-5, window size: 9x9
{1,2} {2,3,1} {1,2,4,3} {2,3,1,5,4} {1,2,5,6,4,3}
{1,2} {2,3,1} {1,2,4,3} {2,1,3,5,4} {5,1,2,6,4,3}

Table 6.9a Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area

Class   # of points (5x5)   Mean (5x5)   Std.dev (5x5)   # of points (9x9)   Mean (9x9)   Std.dev (9x9)
1       12816               60.64        190.53          15399               -15.95       323.33
2       10564               -57.08       341.92          7981                52.59        135.26

1       10564               -57.08       341.92          7981                52.59        135.26
2       7512                100.11       202.62          10674               -101.23      314.36
3       5304                4.73         155.69          4725                176.71       252.82

1       7512                100.11       202.62          10674               -101.23      314.36
2       8313                -147.41      308.44          4725                176.71       252.82
3       2251                276.51       236.52          4951                16.40        116.49
4       5304                4.73         155.69          3030                111.72       142.78

1       8313                -147.41      308.44          4725                176.71       252.82
2       2251                276.51       236.52          5885                -59.56       335.42
3       5304                4.73         155.69          4789                -152.43      277.98
4       4100                115.39       158.19          4951                16.40        116.49
5       3412                81.75        244.36          3030                111.72       142.78

1       4452                -71.54       327.30          5885                -59.56       335.42
2       2251                276.51       236.52          4789                -152.43      277.98
3       5304                4.73         155.69          4951                16.40        116.49
4       4100                115.39       158.19          3030                111.72       142.78
5       3861                -234.89      258.92          2133                259.64       256.85
6       3412                81.75        244.36          2592                108.46       227.88

Table 6.9b Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area

Class   # of points (5x5)   Mean (5x5)   Std.dev (5x5)   # of points (9x9)   Mean (9x9)   Std.dev (9x9)
1       12816               145.26       137.39          15399               259.61       193.37
2       10564               280.03       204.32          7981                103.01       102.22

1       10564               280.03       204.32          7981                103.01       102.22
2       7512                170.71       148.09          10674               262.97       199.78
3       5304                109.20       111.06          4725                252.02       177.83

1       7512                170.71       148.09          10674               262.97       199.78
2       8313                272.78       206.03          4725                252.02       177.83
3       2251                306.82       195.58          4951                83.60        82.75
4       5304                109.20       111.06          3030                134.72       121.31

1       8313                272.78       206.03          4725                252.02       177.83
2       2251                306.82       195.58          5885                274.45       201.80
3       5304                109.20       111.06          4789                248.87       196.37
4       4100                140.68       136.19          4951                83.60        82.75
5       3412                206.80       153.68          3030                134.72       121.31

1       4452                262.68       207.91          5885                274.45       201.80
2       2251                306.82       195.58          4789                248.87       196.37
3       5304                109.20       111.06          4951                83.60        82.75
4       4100                140.68       136.19          3030                134.72       121.31
5       3861                284.42       203.25          2133                309.13       194.45
6       3412                206.80       153.68          2592                205.03       147.13

With window size 5x5, cluster #1 at level 3 is the roughest (as seen in section 5.3.3) and it has the largest standard deviation of NGDC5 DEM errors and the highest mean of the absolute NGDC5 DEM errors. Cluster #2 is the second roughest and it has the second largest standard deviation of NGDC5 DEM errors and the second highest mean of the absolute NGDC5 DEM errors. Cluster #3 is the least rough and it has the smallest NGDC5 DEM errors.
With window size 9x9, cluster #2 at level 3 is the roughest (as seen in section 5.3.3) and it has the largest standard deviation of NGDC5 DEM errors and the highest mean of the absolute NGDC5 DEM errors. Cluster #3 is the second roughest and it has the second largest standard deviation of NGDC5 DEM errors and the second highest mean of the absolute NGDC5 DEM errors. Cluster #1 is the least rough and it has the smallest NGDC5 DEM errors. For variable groups (3-6), (2), and (5), the statistics of NGDC5 DEM errors and the absolute NGDC5 DEM errors within various terrain clusters are summarized in Tables 6.10a-b, 6.11a-b, and 6.12a-b respectively. The rankings of the clusters are not presented. Tables 6.13a-6.16b give the summary results for WSC2 DEM errors (as compared to EMR1 DEM) within different terrain clusters for the whole study area in various cases (i.e., window size: 5x5 or 9x9; variable group: 3-5, 3-6, 2, or 5). Again, both the original (including the positive and negative elevation differences) and the absolute (ignoring the signs) WSC2 DEM errors (as compared to EMR1 DEM) were examined.
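As a reading aid for the paired tables of original and absolute errors, the standard identities below relate the two kinds of summary to the root-mean-square error. The notation is introduced here for illustration only and is not the thesis's own: $e_i$ are the $N$ signed elevation differences in a cluster, $\bar{e}$ is their mean, and $s$ is their standard deviation computed with divisor $N$ (with divisor $N-1$ the first identity holds only approximately). The last line additionally assumes zero-mean, normally distributed errors, which the data need not satisfy.

```latex
\begin{align*}
\mathrm{RMSE}^2 &= \frac{1}{N}\sum_{i=1}^{N} e_i^2 = \bar{e}^{\,2} + s^2, \\
\frac{1}{N}\sum_{i=1}^{N} \lvert e_i \rvert &\le \mathrm{RMSE}, \\
\mathrm{E}\,\lvert e \rvert &= \sigma\sqrt{2/\pi} \approx 0.80\,\sigma
  \quad \text{if } e \sim \mathcal{N}(0, \sigma^2).
\end{align*}
```

Under the zero-mean normal assumption the mean absolute error is a fixed multiple of the standard deviation, so the two measures rank clusters identically. A cluster mean far from zero breaks that proportionality, which is one way the two rankings can diverge.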
340 Table 6.10a Summary statistics of NGDC5 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area Class #of points (5x5) Mean (5x5) Std.dev (5x5) # of points (9x9) Mean (9x9) Std.dev (9x9) 1 13206 -46.90 327.32 14697 -23.95 325.88 2 10174 77.99 164.71 8683 60.59 144.26 1 10174 77.99 164.71 8683 60.59 144.26 2 10004 40.03 301.38 10129 99.64 282.03 3 3202 -318.52 246.40 4568 -298.00 237.37 1 10004 40.03 301.38 10129 99.64 282.03 2 6006 122.85 168.71 6420 22.65 121.81 3 4168 13.36 134.67 4568 -298.00 237.37 4 3202 -318.52 246.40 2263 168.22 148.64 1 4982 90.66 323.85 6420 22.65 121.81 2 6006 122.85 168.71 7014 77.18 245.23 3 5022 -10.19 268.04 3115 150.21 345.80 4 4168 13.36 134.67 4568 -298.00 237.37 5 3202 -318.52 246.40 2263 168.22 148.64 1 6006 122.85 168.71 7014 77.18 245.23 2 5022 -10.19 268.04 3115 150.21 345.80 3 4168 13.36 134.67 4279 7.22 123.66 4 3202 -318.52 246.40 4568 -298.00 237.37 5 3290 -30.01 310.01 2263 168.22 148.64 6 1692 325.31 196.41 2141 53.49 111.84 341 Table 6.10b Summary statistics of the absolute NGDC5 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area Class #of points (5x5) Mean (5x5) Std.dev (5x5) # of points (9x9) Mean (9x9) Std.dev (9x9) 1 13206 266.05 196.35 14697 262.52 194.55 2 10174 128.41 129.31 8683 110.74 110.53 1 10174 128.41 129.31 8683 110.74 110.53 2 10004 242.73 183.05 10129 238.43 180.60 3 3202 338.88 217.55 4568 315.94 212.89 1 10004 242.73 183.05 10129 238.43 180.60 2 6006 152.13 142.86 6420 88.73 '86.47 3 4168 94.24 97.12 4568 315.94 212.89 4 3202 338.88 217.55 2263 173.20 142.80 1 4982 272.71 196.76 6420 88.73 86.47 2 6006 152.13 142.86 7014 206.51 153.10 3 5022 213.00 163.01 3115 310.31 214.07 4 4168 94.24 97.12 4568 315.94 212.89 5 3202 338.88 217.55 2263 173.20 142.80 1 6006 152.13 142.86 7014 206.51 153.10 2 5022 213.00 163.01 3115 310.31 214.07 3 4168 94.24 97.12 4279 89.90 85.21 4 3202 338.88 217.55 
4568 315.94 212.89 5 3290 243.39 194.29 2263 173.20 142.80 6 1692 329.71 188.93 2141 86.39 88.91 342 Table 6.11a Summary statistics of NGDC5 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area Class #of points (5x5) Mean (5x5) Std.dev (5x5) # of points (9x9) Mean (9x9) Std.dev (9x9) 1 14002 -37.85 324.67 14413 -26.70 327.54 2 9378 75.08 157.60 8967 62.34 145.73 1 9948 -13.26 297.16 10312 -59.44 341.57 2 9378 75.08 157.60 8967 62.34 145.73 3 4054 -98.20 377.22 4101 55.61 272.45 1 9378 75.08 157.60 8967 62.34 145.73 2 4054 -98.20 377.22 3194 -93.74 395.17 3 4701 29.57 281.06 7118 -44.05 313.38 4 5247 -51.63 305.85 4101 55.61 272.45 1 4054 -98.20 377.22 3194 -93.74 395.17 2 4701 29.57 281.06 7118 -44.05 313.38 3 5247 -51.63 305.85 4101 55.61 272.45 4 4046 81.77 187.85 5435 76.64 173.64 5 5332 70.00 129.83 3532 40.33 82.00 1 4701 29.57 281.06 7118 -44.05 313.38 2 5247 -51.63 305.85 4101 55.61 272.45 3 1246 -131.48 434.22 5435 76.64 173.64 4 4046 81.77 187.85 2536 -92.54 372.48 5 5332 70.00 129.83 658 -98.36 472.82 6 2808 -83.44 348.02 3532 40.33 82.00 343 Table 6.11b Summary statistics of the absolute NGDC5 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area Class # of points (5x5) Mean (5x5) Std.dev (5x5) # of points (9x9) Mean (9x9) Std.dev (9x9) 1 14002 262.86 194.27 14413 264.85 194.54 2 9378 121.48 125.36 8967 111.81 112.34 1 9948 241.46 173.70 10312 281.25 202.72 2 9378 121.48 125.36 8967 111.81 112.34 3 4054 315.39 229.00 4101 223.61 165.24 1 9378 121.48 125.36 8967 111.81 112.34 2 4054 315.39 229.00 3194 330.50 235.97 3 4701 228.21 166.66 7118 259.15 181.60 4 5247 253.33 178.95 4101 223.61 165.24 1 4054 315.39 229.00 3194 330.50 235.97 2 4701 228.21 166.66 7118 259.15 181.60 3 5247 253.33 178.95 4101 223.61 165.24 4 4046 152.02 137.33 5435 141.12 126.92 5 5332 98.31 109.95 3532 66.71 62.45 1 4701 228.21 166.66 7118 259.15 181.60 2 5247 253.33 178.95 
4101 223.61 165.24 3 1246 371.36 260.43 5435 141.12 126.92 4 4046 152.02 137.33 2536 314.23 220.30 5 5332 98.31 109.95 658 393.22 279.98 6 2808 290.56 208.88 3532 66.71 62.45

Table 6.12a Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       14052        -36.44   321.35    14851        -18.61   326.12
2       9328         73.55    167.47    8529         52.81    142.58

1       6314         -55.75   340.24    11672        -4.79    296.88
2       7738         -20.67   304.19    3179         -69.33   412.34
3       9328         73.55    167.47    8529         52.81    142.58

1       7738         -20.67   304.19    3179         -69.33   412.34
2       9328         73.55    167.47    8529         52.81    142.58
3       1938         -89.97   360.31    6666         5.52     272.99
4       4376         -40.60   329.87    5006         -18.53   325.51

1       9328         73.55    167.47    8529         52.81    142.58
2       1938         -89.97   360.31    6666         5.52     272.99
3       4376         -40.60   329.87    957          -125.92  442.39
4       4399         -9.39    294.20    5006         -18.53   325.51
5       3339         -35.54   316.30    2222         -44.95   396.32

1       1938         -89.97   360.31    6666         5.52     272.99
2       4376         -40.60   329.87    957          -125.92  442.39
3       4399         -9.39    294.20    5006         -18.53   325.51
4       6523         74.28    143.43    4407         58.43    172.65
5       2805         71.85    213.16    2222         -44.95   396.32
6       3339         -35.54   316.30    4122         46.82    100.65

Table 6.12b Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       14052        258.89   193.81    14851        262.73   194.09
2       9328         126.71   131.91    8529         107.65   107.38

1       6314         275.23   207.61    11672        241.98   172.05
2       7738         245.56   180.70    3179         338.92   244.80
3       9328         126.71   131.91    8529         107.65   107.38

1       7738         245.56   180.70    3179         338.92   244.80
2       9328         126.71   131.91    8529         107.65   107.38
3       1938         292.24   229.07    6666         221.77   159.26
4       4376         267.70   196.93    5006         268.88   184.35

1       9328         126.71   131.91    8529         107.65   107.38
2       1938         292.24   229.07    6666         221.77   159.26
3       4376         267.70   196.93    957          377.11   263.09
4       4399         237.22   174.22    5006         268.88   184.35
5       3339         256.54   188.36    2222         322.47   234.65

1       1938         292.24   229.07    6666         221.77   159.26
2       4376         267.70   196.93    957          377.11   263.09
3       4399         237.22   174.22    5006         268.88   184.35
4       6523         109.56   118.68    4407         136.40   120.88
5       2805         166.60   151.12    2222         322.47   234.65
6       3339         256.54   188.36    4122         76.91    80.04

Table 6.13a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       12816        1.90     49.66     15399        1.03     103.49
2       10564        -1.09    114.65    7981         -0.36    26.41

1       10564        -1.09    114.65    7981         -0.36    26.41
2       7512         5.11     55.30     10674        -0.84    115.41
3       5304         -2.64    39.90     4725         5.24     69.24

1       7512         5.11     55.30     10674        -0.84    115.41
2       8313         -6.76    121.36    4725         5.24     69.24
3       2251         19.86    82.14     4951         -1.03    26.82
4       5304         -2.64    39.90     3030         0.72     25.69

1       8313         -6.76    121.36    4725         5.24     69.24
2       2251         19.86    82.14     5885         1.34     123.55
3       5304         -2.64    39.90     4789         -3.52    104.49
4       4100         2.62     25.93     4951         -1.03    26.82
5       3412         8.10     76.87     3030         0.72     25.69

1       4452         -3.15    128.79    5885         1.34     123.55
2       2251         19.86    82.14     4789         -3.52    104.49
3       5304         -2.64    39.90     4951         -1.03    26.82
4       4100         2.62     25.93     3030         0.72     25.69
5       3861         -10.92   112.05    2133         7.73     70.57
6       3412         8.10     76.87     2592         3.20     68.07

Table 6.13b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       12816        30.20    39.46     15399        70.87    75.43
2       10564        80.04    82.09     7981         17.71    19.59

1       10564        80.04    82.09     7981         17.71    19.59
2       7512         34.20    43.75     10674        82.85    80.35
3       5304         24.54    31.57     4725         43.80    53.87

1       7512         34.20    43.75     10674        82.85    80.35
2       8313         87.93    83.91     4725         43.80    53.87
3       2251         50.88    67.47     4951         18.53    19.41
4       5304         24.54    31.57     3030         16.37    19.81

1       8313         87.93    83.91     4725         43.80    53.87
2       2251         50.88    67.47     5885         87.59    87.14
3       5304         24.54    31.57     4789         77.02    70.70
4       4100         17.44    19.37     4951         18.53    19.41
5       3412         54.34    54.97     3030         16.37    19.81

1       4452         91.14    91.05     5885         87.59    87.14
2       2251         50.88    67.47     4789         77.02    70.70
3       5304         24.54    31.57     4951         18.53    19.41
4       4100         17.44    19.37     3030         16.37    19.81
5       3861         84.24    74.67     2133         41.55    57.56
6       3412         54.34    54.97     2592         45.66    50.58

Table 6.14a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       13206        -0.47    109.55    14697        0.40     105.23
2       10174        1.88     34.33     8683         0.81     29.93

1       10174        1.88     34.33     8683         0.81     29.93
2       10004        15.00    104.31    10129        19.62    94.57
3       3202         -48.81   111.47    4568         -42.21   114.73

1       10004        15.00    104.31    10129        19.62    94.57
2       6006         8.38     37.45     6420         -1.24    30.77
3       4168         -7.49    26.59     4568         -42.21   114.73
4       3202         -48.81   111.47    2263         6.62     26.58

1       4982         -0.70    103.01    6420         -1.24    30.77
2       6006         8.38     37.45     7014         24.48    91.62
3       5022         30.58    103.25    3115         8.67     100.04
4       4168         -7.49    26.59     4568         -42.21   114.73
5       3202         -48.81   111.47    2263         6.62     26.58

1       6006         8.38     37.45     7014         24.48    91.62
2       5022         30.58    103.25    3115         8.67     100.04
3       4168         -7.49    26.59     4279         -6.16    27.68
4       3202         -48.81   111.47    4568         -42.21   114.73
5       3290         -14.63   115.56    2263         6.62     26.58
6       1692         26.36    64.59     2141         8.58     34.10

Table 6.14b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       13206        76.46    78.46     14697        72.42    76.34
2       10174        21.91    26.49     8683         19.38    22.82

1       10174        21.91    26.49     8683         19.38    22.82
2       10004        72.05    76.90     10129        64.53    71.86
3       3202         90.24    81.62     4568         89.90    82.83

1       10004        72.05    76.90     10129        64.53    71.86
2       6006         24.12    29.85     6420         20.12    23.31
3       4168         18.73    20.31     4568         89.90    82.83
4       3202         90.24    81.62     2263         17.31    21.23

1       4982         67.75    77.60     6420         20.12    23.31
2       6006         24.12    29.85     7014         65.66    68.42
3       5022         76.31    75.98     3115         61.98    79.00
4       4168         18.73    20.31     4568         89.90    82.83
5       3202         90.24    81.62     2263         17.31    21.23

1       6006         24.12    29.85     7014         65.66    68.42
2       5022         76.31    75.98     3115         61.98    79.00
3       4168         18.73    20.31     4279         18.91    21.13
4       3202         90.24    81.62     4568         89.90    82.83
5       3290         81.54    83.17     2263         17.31    21.23
6       1692         40.93    56.49     2141         22.52    27.00

Table 6.15a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       14002        0.84     108.00    14413        1.05     106.47
2       9378         0.13     27.68     8967         -0.24    28.19

1       9948         0.90     91.73     10312        0.60     117.75
2       9378         0.13     27.68     8967         -0.24    28.19
3       4054         0.67     140.15    4101         2.16     70.54

1       9378         0.13     27.68     8967         -0.24    28.19
2       4054         0.67     140.15    3194         4.33     140.76
3       4701         3.28     74.18     7118         -1.07    105.78
4       5247         -1.23    104.96    4101         2.16     70.54

1       4054         0.67     140.15    3194         4.33     140.76
2       4701         3.28     74.18     7118         -1.07    105.78
3       5247         -1.23    104.96    4101         2.16     70.54
4       4046         0.20     36.40     5435         -0.58    33.77
5       5332         0.08     18.48     3532         0.28     16.19

1       4701         3.28     74.18     7118         -1.07    105.78
2       5247         -1.23    104.96    4101         2.16     70.54
3       1246         3.11     164.93    5435         -0.58    33.77
4       4046         0.20     36.40     2536         4.33     130.94
5       5332         0.08     18.48     658          4.31     173.61
6       2808         -0.41    127.64    3532         0.28     16.19

Table 6.15b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       14002        75.56    77.17     14413        73.85    76.70
2       9378         18.62    20.47     8967         18.76    21.04

1       9948         65.55    64.18     10312        83.27    83.25
2       9378         18.62    20.47     8967         18.76    21.04
3       4054         100.13   98.06     4101         50.16    49.64

1       9378         18.62    20.47     8967         18.76    21.04
2       4054         100.13   98.06     3194         100.55   98.59
3       4701         53.63    51.35     7118         75.51    74.07
4       5247         76.22    72.16     4101         50.16    49.64

1       4054         100.13   98.06     3194         100.55   98.59
2       4701         53.63    51.35     7118         75.51    74.07
3       5247         76.22    72.16     4101         50.16    49.64
4       4046         25.82    25.66     5435         23.42    24.34
5       5332         13.16    12.98     3532         11.59    11.30

1       4701         53.63    51.35     7118         75.51    74.07
2       5247         76.22    72.16     4101         50.16    49.64
3       1246         118.98   114.21    5435         23.42    24.34
4       4046         25.82    25.66     2536         93.55    91.70
5       5332         13.16    12.98     658          127.53   117.77
6       2808         91.76    88.71     3532         11.59    11.30

Table 6.16a Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       14052        0.73     107.45    14851        0.92     104.81
2       9328         0.29     29.80     8529         -0.09    29.41

1       6314         -0.37    125.22    11672        -0.27    98.29
2       7738         1.62     90.39     3179         5.30     125.79
3       9328         0.29     29.80     8529         -0.09    29.41

1       7738         1.62     90.39     3179         5.30     125.79
2       9328         0.29     29.80     8529         -0.09    29.41
3       1938         2.65     134.68    6666         0.56     90.78
4       4376         -1.71    120.79    5006         -1.39    107.48

1       9328         0.29     29.80     8529         -0.09    29.41
2       1938         2.65     134.68    6666         0.56     90.78
3       4376         -1.71    120.79    957          0.76     149.67
4       4399         1.64     83.40     5006         -1.39    107.48
5       3339         1.60     98.85     2222         7.26     113.95

1       1938         2.65     134.68    6666         0.56     90.78
2       4376         -1.71    120.79    957          0.76     149.67
3       4399         1.64     83.40     5006         -1.39    107.48
4       6523         0.12     22.47     4407         -0.50    35.86
5       2805         0.68     42.20     2222         7.26     113.95
6       3339         1.60     98.85     4122         0.35     20.35

Table 6.16b Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area

Class   5x5 window                      9x9 window
        # of points  Mean     Std.dev   # of points  Mean     Std.dev
1       14052        74.84    77.10     14851        72.00    76.17
2       9328         19.40    22.63     8529         19.16    22.31

1       6314         88.81    88.27     11672        67.42    71.52
2       7738         63.44    64.40     3179         88.78    89.25
3       9328         19.40    22.63     8529         19.16    22.31

1       7738         63.44    64.40     3179         88.78    89.25
2       9328         19.40    22.63     8529         19.16    22.31
3       1938         96.09    94.39     6666         62.80    65.55
4       4376         85.59    85.24     5006         73.58    78.35

1       9328         19.40    22.63     8529         19.16    22.31
2       1938         96.09    94.39     6666         62.80    65.55
3       4376         85.59    85.24     957          108.38   103.17
4       4399         58.17    59.78     5006         73.58    78.35
5       3339         70.37    69.43     2222         80.34    81.11

1       1938         96.09    94.39     6666         62.80    65.55
2       4376         85.59    85.24     957          108.38   103.17
3       4399         58.17    59.78     5006         73.58    78.35
4       6523         15.47    16.29     4407         23.99    26.66
5       2805         28.54    31.09     2222         80.34    81.11
6       3339         70.37    69.43     4122         13.99    14.79

6.2.3 Significance tests of DEM error variation

From Figures 6.8a-b and Tables 6.1a to 6.16b, it is evident that WSC2 and EMR1 DEM errors (as compared to 50 m TRIM DEMs) for the two subareas, and NGDC5 and WSC2 DEM errors (as compared to the 1 km EMR1 DEM) for the whole study area, differ from cluster to cluster. It was observed earlier in section 6.2.2 that, in general, the rougher the cluster, the larger the standard deviation of the elevation differences or the mean of the absolute elevation differences, indicating a greater amount of DEM error. Note, however, that the differences between DEM errors in some clusters become insignificant at certain hierarchical levels in various cases. In order to examine the differences quantitatively, the following statistical test was used to test for a difference between two population variances:

$$\frac{s_1^2}{s_2^2} \sim F_{n_1-1,\,n_2-1} \tag{6.1}$$

where the subscript $(n_1 - 1)$ refers to the number of degrees of freedom in the numerator and $(n_2 - 1)$ refers to the number of degrees of freedom in the denominator. This most commonly used test for variances takes the ratio of the two sample variances ($s_1^2/s_2^2$) and tests it against an F statistic. Strictly speaking, this test is valid only for normal parent populations. There are some indications, however, that the results also apply to a large extent to other types of parent populations, provided they do not differ from the normal population too markedly [Kmenta, 1986]. Since the DEM errors are quite symmetrically distributed (i.e., not too nonnormal) and the numbers of observations are large, moderate departures from the assumption of normality are tolerable. It is, thus, safe to use the above test here to examine quantitatively the differences between the standard deviations (or variances) of DEM errors in various clusters.
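The variance-ratio test in equation (6.1) can be sketched from summary statistics alone. The example below uses the two-cluster values for the 5x5 window from Table 6.16a. It is only an illustration: the function name is hypothetical, and the exact F-distribution tail probability is replaced by Fisher's large-sample normal approximation to one half of ln F, which is an assumption of this sketch (reasonable here because each cluster holds thousands of points) and not necessarily the procedure used in the thesis.

```python
from math import log, sqrt
from statistics import NormalDist


def variance_ratio_test(s1, n1, s2, n2):
    """Return (F, two-sided p) for H0: the two population variances are equal.

    s1, s2 are sample standard deviations; n1, n2 are sample sizes.
    ASSUMPTION: the p-value uses Fisher's normal approximation to
    0.5 * ln(F), which is adequate only for large degrees of freedom.
    """
    f_ratio = (s1 / s2) ** 2
    df1, df2 = n1 - 1, n2 - 1
    z = 0.5 * log(f_ratio)
    z_mean = 0.5 * (1.0 / df2 - 1.0 / df1)      # approximate mean of z under H0
    z_sd = sqrt(0.5 * (1.0 / df1 + 1.0 / df2))  # approximate std. dev. of z under H0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z - z_mean) / z_sd))
    return f_ratio, p_value


# Table 6.16a, two-cluster level, 5x5 window: (std.dev, number of points)
f_ratio, p_value = variance_ratio_test(107.45, 14052, 29.80, 9328)
significant = p_value < 0.01  # same significance level as used in the text
```

With these inputs the variance ratio is about 13, far beyond anything attributable to sampling variation at these sample sizes, so the two clusters differ significantly in error variance.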
6.2.3.1 Significance tests of WSC2 and EMR1 DEM errors in the two subareas

The standard deviations of the original WSC2 DEM errors (as compared to TRIM DEMs) in various terrain clusters in each case (variable group: 3-5, 3-6, 2, or 5; window size: 7x7 or 21x21; number of clusters: 2, 3, 4, 5, 6, 7 or 8; and subarea: 93G or 93H) were examined using the above F statistic. Figure 6.9a illustrates the different levels of partition (2 to 8 clusters) from the hierarchical terrain clustering based on variable group (3-5) and two different window sizes (7x7 and 21x21) for the two subareas (93G and 93H). The connectivity information from one level to the next was derived from the data (i.e., the number of points in each cluster) presented in Tables 6.1a-d. The figure illustrates how terrain cluster {1} at any one partition level is separated successively into two finer clusters at the next higher level. The clusters at different levels are hierarchically nested and the partition level controls the scale of the investigation. If two points belong to the same cluster at a higher level $L_1$, it is clear that they must also belong to the same cluster at a lower level $L_2$ ($L_2 < L_1$).

[Figure 6.9a. The significance test results for WSC2 DEM errors for the two subareas (variable group: 3-5, window size: 7x7 and 21x21). Panels: 93G (WSC2), window 7x7; 93G (WSC2), window 21x21; 93H (WSC2), window 7x7; 93H (WSC2), window 21x21.]

At each partition level in Figure 6.9a, terrain clusters that are not significantly different (at α=0.01) from all other clusters, in terms of the standard deviation of WSC2 DEM errors, are indicated by shaded circles.
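The way connectivity between partition levels can be recovered from cluster sizes alone may be sketched as follows. The point counts are the 5x5-window values from Table 6.16a and are used purely for illustration (the figures in this section were built from Tables 6.1a-d). The helper name `find_split` is hypothetical, and the sketch assumes that each step to the next level splits exactly one cluster into two.

```python
from collections import Counter


def find_split(parent_sizes, child_sizes):
    """Identify which cluster was split between two consecutive levels.

    Returns (size of the split parent, sorted sizes of its two children).
    ASSUMPTION: exactly one cluster is divided in two per level, and
    cluster sizes are enough to tell the clusters apart.
    """
    parents = Counter(parent_sizes)
    children = Counter(child_sizes)
    vanished = list((parents - children).elements())  # sizes only at the coarser level
    appeared = list((children - parents).elements())  # sizes only at the finer level
    if len(vanished) != 1 or len(appeared) != 2:
        raise ValueError("levels are not related by a single binary split")
    if vanished[0] != sum(appeared):
        raise ValueError("child sizes do not add up to the parent size")
    return vanished[0], tuple(sorted(appeared))


# Number of points per cluster at levels 2, 3 and 4 (Table 6.16a, 5x5 window)
level2 = [14052, 9328]
level3 = [6314, 7738, 9328]
level4 = [7738, 9328, 1938, 4376]

split_2_to_3 = find_split(level2, level3)
split_3_to_4 = find_split(level3, level4)
```

Here the cluster of 14052 points at level 2 is divided into clusters of 6314 and 7738 points at level 3, and the 6314-point cluster is divided again at level 4, which matches the nesting described in the text.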
It can be seen in Figure 6.9a that for case #1 (subarea 93G; variable group: 3-5; and window size: 7x7), WSC2 DEM errors in some clusters (clusters {2} and {4}) start to show no significant differences at partition level 4. For case #2 (subarea 93G; variable group: 3-5; and window size: 21x21), it starts at level 7 (clusters {6} and {7}). For case #3 (subarea 93H; variable group: 3-5; and window size: 7x7), it starts at level 6 (clusters {5} and {6}) and for case #4 (subarea 93H; variable group: 3-5; and window size: 21x21) it starts at level 5 (clusters {1} and {4}). Note that at levels 2 and 3 all clusters are significantly different from each other for every case tested above.

The above results indicate that WSC2 DEM errors (elevation differences as compared to TRIM DEMs) in each of the two subareas do show some significant variations between different clusters resulting from terrain classification based on variable group (3-5) and the two different window sizes. Furthermore, variation of WSC2 DEM errors between terrain clusters becomes statistically insignificant at different partition levels for the different window sizes used for local measure extraction and for the two subareas with distinct global characteristics. For the flatter surface (i.e., 93G), variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 7 for window size (21x21) and level 4 for window size (7x7). For the rougher surface (i.e., 93H), as can be seen in Figure 6.9a, variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 6 for window size 7x7 versus level 5 for window size 21x21.

Finally, it should be noted that in some cases even though the standard deviations (or variances) in two clusters were found not significantly different (i.e., equal population standard deviations), their means are apparently very dissimilar.
This can be tested statistically by using a t-test. That is,

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} \tag{6.2}$$

As seen in Table 6.1a and Figure 6.9a, for example, clusters {5} and {6} at level 6 for subarea 93H (variable group: 3-5, window size: 7x7) were not different in terms of their standard deviations of WSC2 DEM errors (both have the same value of 228.4 m). However, their means (81.5 and -151.4 m respectively) were found to be significantly different (at α=0.01).

In order to examine the effect of different variable groups on the above results, significance test results for WSC2 DEM errors in terrain clusters derived from variable groups (3-6), (2) and (5) (for both subareas 93G and 93H and both window sizes 7x7 and 21x21) are summarized in Figures 6.9b-d respectively. Based on information presented in Figure 6.9b it can be seen that, with variable group (3-6) used for classification, the variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 7 for window size (7x7) and level 3 for window size (21x21) for subarea 93G. For the rougher surface 93H, the variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 5 for window size (7x7).
For window size (21x21), the variation of WSC2 DEM errors between terrain clusters is significant at all the levels examined (levels 2 to 8). For the case of variable group (2), as shown in Figure 6.9c, the variation of WSC2 DEM errors between some of the terrain clusters in subarea 93G becomes statistically insignificant at partition level 7 for both window sizes (7x7) and (21x21). In subarea 93H, it is significant at all the levels from 2 to 8 for window size (7x7), but it starts to become insignificant at level 8 for window size (21x21).

[Figure 6.9b. The significance test results for WSC2 DEM errors for the two subareas (variable group: 3-6, window size: 7x7 and 21x21). Panels: 93G and 93H, each for window sizes 7x7 and 21x21.]

[Figure 6.9c. The significance test results for WSC2 DEM errors for the two subareas (variable group: 2, window size: 7x7 and 21x21). Panels: 93G and 93H, each for window sizes 7x7 and 21x21.]

[Figure 6.9d. The significance test results for WSC2 DEM errors for the two subareas (variable group: 5, window size: 7x7 and 21x21). Panels: 93G and 93H, each for window sizes 7x7 and 21x21.]
With variable group (5) (see Figure 6.9d), the variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 7 for case #1 (subarea 93G; variable group: 5; and window size: 7x7), level 5 for case #2 (subarea 93G; variable group: 5; and window size: 21x21), and level 6 for both case #3 (subarea 93H; variable group: 5; and window size: 7x7) and case #4 (subarea 93H; variable group: 5; and window size: 21x21).

For EMR1 DEM errors as compared to TRIM DEMs for the two subareas (93G and 93H), significance test results at various hierarchical levels (2 to 8) were also obtained for the four different variable groups (3-5, 3-6, 2 and 5) and the two different window sizes (7x7 and 21x21). Figures 6.10a-d summarize the test results for EMR1 DEM errors in various cases.

[Figure 6.10a. The significance test results for EMR1 DEM errors for the two subareas (variable group: 3-5, window size: 7x7 and 21x21). Panels: 93G and 93H, each for window sizes 7x7 and 21x21.]

[Figure 6.10b. The significance test results for EMR1 DEM errors for the two subareas (variable group: 3-6, window size: 7x7 and 21x21). Panels: 93G and 93H, each for window sizes 7x7 and 21x21.]

[Figure 6.10c. The significance test results for EMR1 DEM errors for the two subareas (variable group: 2, window size: 7x7 and 21x21). Panels: 93G and 93H, each for window sizes 7x7 and 21x21.]

[Figure 6.10d. The significance test results for EMR1 DEM errors for the two subareas (variable group: 5, window size: 7x7 and 21x21). Panels: 93G and 93H, each for window sizes 7x7 and 21x21.]

From the above significance test results of WSC2 and EMR1 DEM errors within various terrain clusters (i.e., Figures 6.9a-d and Figures 6.10a-d), it can be seen that there is some variation from case to case in terms of the best variable group and window size for DEM error modelling in each subarea. As shown in Figures 5.10a-d or Figures 5.11a-d in section 5.2.2.2, the number of variables in the geometric signature used for the multivariate clustering affects the classification result. Similarly, significance test results (e.g., Figures 6.9a-d or 6.10a-d) also show variation from case to case depending on the geomorphometric variable group used in deriving the terrain clusters. The size of moving window used in geomorphometric parameter abstraction also has an impact on the test results. Generally speaking, however, for both subareas all four different variable groups and the two different window sizes demonstrated some usefulness in classifying the terrain and relating the DEM errors to the variability of the topographic surface.
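As a worked illustration of the comparison of cluster means by the t-test in equation (6.2), the sketch below computes the statistic from summary values. Two points are assumptions rather than statements of the thesis procedure: the standard error is taken in its unpooled form, and the p-value comes from a normal approximation, which is reasonable only because the clusters contain thousands of points. The inputs are classes 2 and 3 at the three-cluster level of Table 6.14a (whole study area, 5x5 window), chosen because the subarea tables are not reproduced in this part of the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, two-sided p) for H0: the two population means are equal.

    ASSUMPTIONS: unpooled standard error, and a normal approximation to
    the t distribution (acceptable for very large samples only).
    """
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t_stat = (mean1 - mean2) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Table 6.14a, three-cluster level, 5x5 window: (mean, std.dev, number of points)
t_stat, p_value = two_sample_t(15.00, 104.31, 10004, -48.81, 111.47, 3202)
means_differ = p_value < 0.01
```

The two clusters have similar standard deviations (about 104 m and 111 m) but means of opposite sign, and the test statistic is large enough that the difference in means is significant at the 0.01 level.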
6.2.3.2 Significance test of NGDC5 and WSC2 DEM errors in the whole study area

For NGDC5 and WSC2 DEM errors as compared to EMR1 DEM for the whole study area, the same analysis procedure as described above was followed. The F test was done for different cases (window size: 5x5 or 9x9; variable group: 3-5, 3-6, 2, or 5; number of clusters: 2 to 8) to examine the significance of difference between DEM errors (in terms of the standard deviation of NGDC5 or WSC2 DEM errors) in different terrain clusters. Significance test results of NGDC5 and WSC2 DEM errors in the whole study area are presented in Figures 6.11a-b and 6.12a-b for the different cases (window size: 5x5 or 9x9; variable group: 3-5, 3-6, 2, or 5). Figure 6.11a shows the test results for NGDC5 DEM errors (as compared to EMR1 DEM) with four different variable groups for window size (5x5). Figure 6.11b gives the test results for NGDC5 DEM errors for window size (9x9). Figures 6.12a-b present the results for WSC2 DEM errors (as compared to EMR1 DEM) with four different variable groups for window sizes (5x5) and (9x9).

[Figure 6.11a. The significance test results for NGDC5 DEM errors for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 5x5). One panel per variable group.]

[Figure 6.11b. The significance test results for NGDC5 DEM errors for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 9x9). One panel per variable group.]

[Figure 6.12a. The significance test results for WSC2 DEM errors (as compared to EMR1 DEM) for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 5x5). One panel per variable group.]

[Figure 6.12b. The significance test results for WSC2 DEM errors (as compared to EMR1 DEM) for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 9x9). One panel per variable group.]

Based on these results, several observations can be made. First, for both NGDC5 and WSC2 DEM errors (as compared to EMR1 DEM), all four different variable groups and the two different window sizes were capable of classifying the terrain and relating the DEM errors to the variability of the topographic surface. The standard deviation of NGDC5 or WSC2 DEM errors shows great variation from cluster to cluster. For NGDC5 DEM errors, the variation between some of the clusters becomes statistically insignificant at level 8 in most cases except for one case with variable group (3-5) and window size (5x5). Although the differences in variances in this case become statistically insignificant at level 5 (clusters {2, 5} and clusters {3, 4}), their means are actually significantly different (276.51 m versus 81.75 m and 4.73 m versus 115.39 m as seen in Table 6.9a).
For WSC2 DEM errors, as summarized in Tables 6.13a-b to 6.16a-b, the variation between some of the clusters becomes statistically insignificant at levels ranging from 5 to beyond 8 for the various cases. Second, for the characterization of the spatial patterns of NGDC5 and WSC2 DEM errors, the differences among the capabilities of the four variable groups are small, especially with window size (9x9). This might be largely because of the relatively high KHAT values (ranging from 0.47 to 0.67 for window size 9x9) as observed in Figure 5.16g for the comparisons between classifications using different variable groups. The same is true for the two different window sizes, especially with variable groups (3-6), (2), and (5). Again this is probably because of the high agreement (KHAT value ranging from 0.65 to 0.70) in the classifications using the two different window sizes with these three variable groups. Overall, variable group (2) seems to be slightly superior to the other variable groups in its capability of differentiating the terrain clusters for DEM error modelling. This suggests that a multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more effective, as expected earlier.

6.3 Summary

This chapter examined the correlations between the DEM errors and each of the seven local roughness measures and statistically evaluated the observed spatial pattern of mismatch between DEMs of differing resolution in conjunction with the terrain clusters resulting from various multivariate classifications. In summary, from Figures 6.9a-d to 6.12a-b and Tables 6.1a-d to 6.16a-b, certain patterns can be identified in terms of the relation between DEM errors and terrain roughness.
First of all, the results indicate that both WSC2 and EMR1 DEM errors (elevation differences as compared to TRIM DEMs) in each of the two subareas and NGDC5 and WSC2 DEM errors (elevation differences as compared to EMR1 DEM) in the whole study area do show some significant variation between different clusters. Cluster analysis was, therefore, successful to a large degree in grouping the areas according to their overall roughness, and useful in DEM error modelling.

Another statement that can be made is that the multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more successful than using only a single roughness measure such as slope and std in characterizing the overall roughness of terrain. This is especially true when the overall amount of DEM errors in each cluster is characterized by the standard deviation of the errors or the mean of the absolute DEM errors in the cluster. Overall, variable group (2) appears to be the best group for the classification of terrain for the examination of the magnitude and spatial pattern of DEM errors. The clusters formed using variable group (3-5) seem to be the least discriminating of all four groups. However, variable groups (3-5) and (3-6) do seem to be more capable of differentiating the positive and negative elevation differences than the single variable groups. As can be seen in Tables 6.1a-d to 6.16a-b, a comparison of the summaries of the original and the absolute DEM errors in various cases shows a more consistent ranking of error clusters for the two single variable groups (2) and (5) than for the two multivariate groups (3-5) and (3-6) (also see section 6.2.2).

When comparing the DEM error modelling results for surfaces with different global characteristics, the size of moving window used in the geomorphometric parameter abstraction appears to have some impact on the modelling results.
In the two subareas tested, for example, for the flatter surface (i.e., 93G), the larger moving window size (i.e., 21x21) with variable group (3-5) works better than the smaller window size (i.e., 7x7) when extracting local geometric measures for hierarchical terrain classification, if better is defined as being more capable of differentiating various error clusters by variance. With variable group (3-6), the smaller window size (7x7) works better than the larger window size (21x21) for subarea 93G.

Based on the various test results presented in this chapter, it would appear that the answer to the question raised in Chapter 3 is generally positive and, therefore, the thesis hypothesis can be accepted. That is, knowledge of the landscape characteristics does provide some insights into the nature of the inherent error (or uncertainty) in a DEM and can be useful for DEM error modelling. It is not possible to provide further blanket statements about the overall fit of the DEM error model because of the variation from case to case, and the fact that some of the variation could not be explained by the cluster structure. Outright rejection of a DEM error model based on topographic characterization is certainly not warranted, but neither is a blind application. It seems that the variability in various test results is more a function of the methods used to obtain the terrain clusters (i.e., different moving window sizes and variable groups) than it is a reflection of any theoretical inadequacy of the model.

Another important point to make is that the part of the DEM error variation that was not explained by the cluster structure could be attributed to the interpolation errors introduced when comparing DEMs to the one with higher resolution to determine DEM errors.
As indicated in Chapter 3 (see Figures 3.22 to 3.25), the interpolation errors (i.e., mismatch caused by using different interpolators) are minor when compared to the DEM errors (the difference between the DEM data sets of differing resolution), and the effects of different interpolation methods can be ignored in the interpolation procedure for comparison of a lower resolution DEM with a higher one. Nevertheless, there are still some differences between DEMs of the same resolution but interpolated using different methods, as seen in Figures 3.24 and 3.25 for subareas 93G and 93H. Therefore, more research is needed to understand the relation between interpolation errors and topographic characteristics and the impact of this on DEM error modelling.

7. SUMMARY AND CONCLUSIONS

7.1 Summary

The intensive use of DEMs for a wide range of geographic analyses has given rise to many accuracy investigations. These are usually conducted along three lines: (1) describing or identifying possible sources of gross errors, (2) evaluating the effect of varying densities of sampling data or that of different interpolation methods, and (3) comparing the "products" (e.g., spot height, contours) derived from a DEM with those obtained directly from terrestrial and photogrammetric surveying procedures, mostly from a producer's perspective. The DEM accuracy estimate is usually restricted to a global measure such as root-mean-square error (RMSE). Seldom are the errors examined in terms of their spatial distribution pattern or how the resolution of the DEM interacts with the topographic surface variability. There is a wide range of topographic variation present in any terrain surface. Thus, in defining the accuracy of a DEM, one would ultimately like to know the spatial variation of the terrain and how the resolution interacts with this variation.
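The contrast drawn above between a single global accuracy figure and a description of how errors vary across terrain can be illustrated with a small sketch. The error values and cluster labels below are invented for illustration only, the function names are hypothetical, and the population form of the standard deviation is an arbitrary choice of this sketch.

```python
from math import sqrt
from statistics import mean, pstdev


def global_rmse(errors):
    """Root-mean-square error over all points, ignoring where they lie."""
    return sqrt(mean(e * e for e in errors))


def per_cluster_summary(errors, labels):
    """Summarize elevation differences separately for each terrain cluster."""
    groups = {}
    for error, label in zip(errors, labels):
        groups.setdefault(label, []).append(error)
    return {
        label: {
            "n": len(values),
            "mean": mean(values),
            "std": pstdev(values),  # ASSUMPTION: population standard deviation
            "mean_abs": mean(abs(v) for v in values),
        }
        for label, values in groups.items()
    }


# Hypothetical elevation differences (m): cluster 1 is flat, cluster 2 is rough
errors = [2.0, -2.0, 1.0, -1.0, 40.0, -60.0, 50.0, -30.0]
labels = [1, 1, 1, 1, 2, 2, 2, 2]

rmse = global_rmse(errors)
summary = per_cluster_summary(errors, labels)
```

In this invented case both clusters have a mean error of zero, so only the standard deviation and the mean absolute error reveal that nearly all of the global RMSE comes from the rough cluster.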
This thesis has two primary objectives: (1) to describe and analyze the spatial variation of DEM errors and to attempt to explain the pattern of this variation based on topographic characterization, in order to investigate the relation between DEM errors and the roughness characteristics of terrain; and (2) to examine the role of scale in topographic characterization and to see what effects the global characteristics of topographic surfaces have on the results of DEM error modelling.

The present research has accomplished several tasks. First, DEMs of various resolutions (i.e., 10-arc-minute NGDC10, 5-arc-minute NGDC5, 2 km WSC2, 1 km EMR1, and 50 m TRIM 93G and 93H) in a study area around Prince George, British Columbia, were compared to each other and their mismatches were examined (i.e., NGDC5 and WSC2 DEM errors as compared to the 1 km EMR1 DEM for the whole study area, and WSC2 and EMR1 DEM errors as compared to the 50 m TRIM DEMs for the two subareas). Based on the preliminary test results, some observations were made regarding the relations among the spatial distribution of DEM errors, DEM resolution and the roughness of terrain. A hypothesis was then formed suggesting that knowledge of the landscape characteristics might provide some insights into the nature of the inherent error and/or uncertainty in a DEM.

To test this statistically, the variation and complexity of the study area surfaces were characterized by means of general geomorphometric measures such as "local relief" and "slope." Seven local roughness variables were identified and extracted from the study area DEMs using the moving window technique. For the whole study area, the measures were derived from the 1 km EMR1 DEM, and two different moving window sizes (i.e., 5x5 and 9x9) were tested. For the two subareas (93G and 93H), the measures were derived from the 50 m TRIM DEMs using two different window sizes (7x7 and 21x21).
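The moving window technique mentioned above can be sketched as follows for two of the simpler roughness measures. This is only an illustration under stated assumptions: local relief is taken as the maximum minus the minimum elevation in the window, std as the population standard deviation of the window elevations, and cells without a full window are dropped. The exact definitions and edge handling used in the thesis are not given in this chapter, and the function names are hypothetical.

```python
import statistics

def window_values(dem, row, col, size):
    """Elevations in the size x size window centred on (row, col)."""
    half = size // 2
    return [dem[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]

def local_measures(dem, size):
    """Local relief and std for every cell that has a full window.

    Assumed definitions: relief = max - min, std = population standard
    deviation. The output grids are smaller than the input by size - 1
    cells in each dimension.
    """
    half = size // 2
    relief, std = [], []
    for r in range(half, len(dem) - half):
        relief_row, std_row = [], []
        for c in range(half, len(dem[0]) - half):
            vals = window_values(dem, r, c, size)
            relief_row.append(max(vals) - min(vals))
            std_row.append(statistics.pstdev(vals))
        relief.append(relief_row)
        std.append(std_row)
    return relief, std

# Hypothetical 3x5 ramp rising 10 m per cell from west to east.
ramp = [[10.0 * c for c in range(5)] for _ in range(3)]
relief, std = local_measures(ramp, 3)
print(relief)  # one row of three cells, each with 20 m of relief
print(std)
```

Calling `local_measures` with a larger odd `size` (for example 7 or 21 on a sufficiently large grid) gives the same measures at a coarser scale, which is how the effect of window size on the classification could be explored.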
In order to select the appropriate moving window sizes, the global characteristics of terrain were examined to identify the important scale breaks, which indicate the scales at which the original terrain contains higher variability. The global characteristics were described by measures such as grain, and those derived from spectral analysis, nested analysis of variance and fractal analysis. A multivariate cluster analysis was then used for automated hierarchical terrain classification in which relatively homogeneous terrain units at different scale levels were identified. Several different variable groups were tested in the cluster analysis and the different classification results were compared to each other and interpreted in relation to each roughness measure. Finally, the correlations between the DEM errors and each of the local roughness measures were examined, and the variation of DEM errors within the terrain clusters resulting from multivariate classifications was statistically evaluated. The effectiveness of using different moving window sizes for the extraction of the local measures and the appropriateness of different variable groups for terrain classification were also evaluated.

7.2 Conclusions

The major conclusion of this study is that knowledge of topographic characteristics does provide some insights into the nature of the inherent error and/or uncertainty in a DEM and can be useful for DEM error modelling. The measures of topographic complexity are related to the observed patterns of discrepancy between DEMs of differing resolution, but there are variations from case to case. Several patterns can be identified in the relation between DEM errors and the roughness of terrain.
First of all, results indicate that both WSC2 and EMR1 DEM errors (elevation differences as compared to TRIM DEMs) in each of the two subareas, and NGDC5 and WSC2 DEM errors (elevation differences as compared to the EMR1 DEM) in the whole study area, do show significant variation among the clusters resulting from terrain classifications based on different variable groups and window sizes. Cluster analysis was, therefore, considered successful in grouping the areas according to their overall roughness and, thus, useful in DEM error modelling. However, some of the total variation of the DEM errors could not be accounted for by the cluster structure derived from multivariate classification.

The second conclusion of this thesis is that the multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more successful than using only a single roughness measure such as slope or std (i.e., standard deviation of elevations) in characterizing the overall roughness of terrain. This is especially true when the overall amount of DEM error in each cluster is characterized by the standard deviation of the errors or the mean of the absolute DEM errors in the cluster. Overall, the variable slope (i.e., #2) proves to be the best single variable for the classification of terrain for DEM error modelling. Variable group (3-5) (i.e., curv, hypint, and std) seems to be the worst of all four groups. However, variable groups (3-5) (i.e., curv, hypint, and std) and (3-6) (i.e., curv, hypint, std, and highpt) are more capable of differentiating the positive and negative elevation differences. In this sense, the assertion made in Chapter 4 regarding the necessity of using multiple signatures to quantitatively describe the terrain proves to be true.
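The two per-cluster error summaries named above, the standard deviation of the errors and the mean of the absolute errors, can be computed as in the following sketch. The error values and the cluster labels "smooth" and "rough" are invented for illustration, the function name is hypothetical, and the population form of the standard deviation is an assumption; the significance tests applied in the thesis are not reproduced here.

```python
import statistics
from collections import defaultdict

def error_stats_by_cluster(errors, labels):
    """Summarize signed DEM errors within each terrain cluster.

    errors: signed elevation differences, one per cell.
    labels: cluster label of each cell, in the same order.
    """
    groups = defaultdict(list)
    for err, label in zip(errors, labels):
        groups[label].append(err)
    return {
        label: {
            "n": len(vals),
            "mean": statistics.fmean(vals),
            "std": statistics.pstdev(vals),  # assumed: population form
            "mean_abs": statistics.fmean([abs(v) for v in vals]),
        }
        for label, vals in groups.items()
    }

# Hypothetical errors (metres) for eight cells in two clusters.
errors = [1.0, -1.0, 2.0, -2.0, 20.0, -20.0, 30.0, -30.0]
labels = ["smooth"] * 4 + ["rough"] * 4

for label, stats in error_stats_by_cluster(errors, labels).items():
    print(label, stats)
```

In this invented example the signed mean is zero in both clusters while the standard deviation and the mean absolute error differ strongly, which suggests why the latter two are the more informative measures of the amount of error in a cluster.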
When comparing the DEM error modelling results for surfaces with different global characteristics, the size of the moving window used in geomorphometric parameter abstraction also has a certain impact on the modelling results. Some understanding of the global characteristics of the surfaces is, therefore, useful in the selection of appropriate/optimal window sizes for the extraction of local measures for DEM error modelling. Based on the important scale breaks identified in the surfaces and the scale levels being considered, it can be expected that certain window sizes will work better than others in characterizing the local roughness and will, hence, be more useful in differentiating the DEM error clusters.

Finally, considering the advantages and disadvantages of the various global surface characterization methods, variogram analysis appears to be the most appropriate and useful method for identifying important scale breaks in the surface.

7.3 A Practical Guide for the Users

Based on the above conclusions about DEM error modelling, a practical guide to the understanding of DEM errors can be provided from a user's point of view. Following the steps for topographic characterization illustrated in Figure 4.9, the user should first examine the global characteristics of the surface using variogram analysis of the study area DEM to find out where the important scale breaks are. Care should be taken, though, when extracting linear segments from the variogram plot. One or two of the other global characterization methods, such as the grain measure and spectral analysis, can be used in addition to the variogram analysis to confirm or identify other important scales of variation. Nested analysis of variance can provide only general and limited information on the surface characteristics and is, thus, of little practical use.
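As a rough illustration of the variogram step, the sketch below computes an empirical semivariance gamma(h) = sum((z_i - z_j)^2) / (2 N(h)) for integer lags along the rows and columns of a gridded DEM, together with the slope of log(gamma) against log(h) between successive lags. Reading a scale break as a change in that log-log slope is assumed here as one common interpretation; it may not match the procedure used in the thesis, and the function names and the test surface are hypothetical.

```python
import math

def empirical_variogram(dem, max_lag):
    """Semivariance for lags 1..max_lag (in cells), pooling row and column pairs."""
    rows, cols = len(dem), len(dem[0])
    gamma = {}
    for h in range(1, max_lag + 1):
        sq_sum, n = 0.0, 0
        for r in range(rows):
            for c in range(cols):
                if c + h < cols:  # pair along the row
                    sq_sum += (dem[r][c] - dem[r][c + h]) ** 2
                    n += 1
                if r + h < rows:  # pair along the column
                    sq_sum += (dem[r][c] - dem[r + h][c]) ** 2
                    n += 1
        if n:  # skip lags larger than the grid
            gamma[h] = sq_sum / (2 * n)
    return gamma

def loglog_slopes(gamma):
    """Slope of log(gamma) against log(lag) between successive lags."""
    lags = sorted(h for h, g in gamma.items() if g > 0)
    return {(a, b): (math.log(gamma[b]) - math.log(gamma[a]))
                    / (math.log(b) - math.log(a))
            for a, b in zip(lags, lags[1:])}

# Hypothetical 4x4 surface rising one unit per cell from west to east.
ramp = [[float(c) for c in range(4)] for _ in range(4)]
gamma = empirical_variogram(ramp, 3)
print(gamma)
print(loglog_slopes(gamma))
```

On a real DEM the slopes would be inspected over many lags, and the caution above about extracting linear segments applies: with few pairs at long lags the estimates become unstable.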
The information on the surface global characteristics can then be used to guide the selection of an optimal window size for the extraction of local measures such as slope from the DEM. Since the single variable slope has proven to be sufficient in characterizing the local roughness of terrain for the purpose of DEM error modelling, a multivariate terrain classification would not be necessary. A hierarchical clustering using a single local variable such as slope would result in terrain clusters with varying degrees of roughness at various scale levels. As demonstrated in this study, the DEM errors/uncertainties show significant variation from cluster to cluster. The rougher the cluster (i.e., the greater its average slope), the larger the standard deviation of the errors or the mean of the absolute DEM errors in the cluster. Therefore, the spatial pattern of DEM errors/uncertainties can be understood using the results of topographic characterization. In addition, this information can be used to guide the surface generalization process. For areas with greater uncertainty, a higher degree of generalization would be appropriate, whereas for areas with less uncertainty, a lower degree of generalization would be justifiable.

7.4 Discussions and Future Research

There is no single standardized set of variables that can be used in every instance to classify terrain. For example, an application that is attempting to classify an area's susceptibility to landslides will use different measures than one that is trying to establish physiographic regions. Theoretically, the selection of variables used in a classification should reflect known properties of the phenomena under investigation. The development of numeric variables to quantitatively express terrain complexity and roughness has been a strongly interdisciplinary effort.
Often, mathematicians and computer scientists are responsible for developing new measures, while applications and testing methods are developed in fields such as geography, geology, and psychology [DeLotto, 1989]. In geography, the pioneering work of Strahler [1950] and Bunge [1962] has led to a tradition of surface quantification for geographic representations and modelling. Many of these measures, as described in Chapter 4, have proven to be useful in capturing the information that is required for successful DEM error modelling. There is, however, still a need to search for and develop new measures for more effective quantitative topographic characterization. It is possible, though, that we will never find measures which can capture "everything." Some of the DEM uncertainty could be a result of "randomness" in the topography which is not quantifiable.

Just as the measures used to characterize the complexity of topography are slightly inadequate, so are the classification algorithms used to identify relatively homogeneous terrain clusters. Specific landforms only occur over a limited range of scales. This observational scale-dependency of terrain has proven to be an impediment to the successful implementation of automated terrain classification methods. There is also still room for improving the DEM error models used in this study, which will probably require new methods of spatial statistical analysis.

Finally, considerable future research is needed in the following areas:

1) In order to reach more generalized conclusions on the usefulness of topographic characterization for DEM error modelling, more tests are necessary to investigate and compare the effectiveness of using other variable groups and moving window sizes for DEM error modelling for various surfaces with different resolutions and distinctive global characteristics.
2) In addition, the relation between surface roughness and window sizes should be examined further using a variety of surfaces of varying roughness and a variety of window sizes. This could be done using synthetic surfaces (e.g., surfaces of known fractal dimension).

3) As mentioned in Chapter 6, the part of the total variation of DEM errors that was not explained by the cluster structure could partially be attributed to the interpolation errors introduced when comparing a DEM of coarse resolution to one of finer resolution. Although it has been demonstrated that the mismatch caused by using different interpolators is relatively small compared to the mismatch between DEMs of differing resolution, and the effects of different interpolation methods were therefore ignored when comparing a lower resolution DEM with a higher resolution one, there are still some differences between DEMs of the same resolution interpolated using different methods. Therefore, more research is needed in order to understand the nature of interpolation errors, their relation to the topographic characteristics, and the impact of this relation on DEM error modelling.

BIBLIOGRAPHY

Akima, H. 1978a. "A Method of Bivariate Interpolation and Smooth Surface Fitting for Irregularly Distributed Data Points." ACM Transactions on Mathematical Software, Vol.4, No.2, pp.148-159.
Akima, H. 1978b. "ALGORITHM 526: Bivariate Interpolation and Smooth Surface Fitting for Irregularly Distributed Data Points." ACM Transactions on Mathematical Software, Vol.4, No.2, pp.160-164.
American Society of Civil Engineers (Committee on Cartographic Surveying, Surveying and Mapping Division). 1983. Map Uses, Scales and Accuracies for Engineering and Associated Purposes, New York.
American Society of Photogrammetry. 1978. Proceedings of the Digital Terrain Models Symposium, ASP-ACSM, St. Louis, Missouri, May 9-11.
American Society of Photogrammetry (Committee for Specifications and Standards, Professional Practice Division). 1985. "Accuracy Specification for Large-Scale Line Maps." Photogrammetric Engineering and Remote Sensing, Vol.51, No.2, pp.195-199.
Aronoff, S. 1989. Geographic Information Systems: A Management Perspective, Ottawa: WDL Publications.
Balser, R. 1989. "Terrain Resource Information Management (TRIM) - A Standardized GeoReferencing Database for the Province of British Columbia." Proceedings of GIS'89 Symposium, Vancouver, British Columbia.
Bethel, J.S. and E.M. Mikhail. 1983. "On-Line Quality Assessment in DTM." Technical Papers, American Congress on Surveying and Mapping - American Society of Photogrammetry Fall Convention, pp.576-584.
Bishop, Y.S., S. Fienberg, and P. Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge: MIT Press. 575p.
Blais, J.A.R. 1988. "Digital Terrain Modelling for Spatial Information Systems." Proceedings of the Third International Seminar on Trends and Concerns of Spatial Sciences, Laval University, Quebec.
Boehm, B.W. 1967. "Tabular Representations of Multivariate Functions - With Applications to Topographic Modelling." Proceedings A.C.M. National Meeting, pp.403-415.
Braile, L.W. 1978. "Comparison of Four Random to Grid Methods." Computers and Geosciences, Vol.4, No.4, pp.341-349.
Bunge, W. 1962. Theoretical Geography, Lund, Sweden: C.W.K. Gleerup.
Burrough, P.A. 1981. "Fractal Dimensions of Landscapes and Other Environmental Data." Nature, Vol.294, 19 November, pp.240-242.
Burrough, P.A. 1983. "Multiscale Sources of Spatial Variation in Soil: I. The Application of Fractal Concepts to Nested Levels of Soil Variation." Journal of Soil Science, Vol.34, pp.577-598.
Burrough, P.A. 1986. Principles of Geographic Information System for Land Resources Assessment, Oxford University Press.
Campbell, J.B. 1987. Introduction to Remote Sensing, London: Guilford Press.
Carr, J.R. 1995. Numerical Analysis for the Geological Sciences, New Jersey: Prentice Hall.
Carter, J.R. 1988. "Digital Representation of Topographic Surfaces." Photogrammetric Engineering and Remote Sensing, Vol.54, No.11, pp.1577-1580.
Carter, J.R. 1989. "Relative Errors Identified in USGS Gridded DEMs." Proceedings of AutoCarto 9, American Congress on Surveying and Mapping, Bethesda, Maryland, pp.255-265.
Caruso, V.M. 1987. "Standards for Digital Elevation Models." Technical Papers, American Society of Photogrammetry and Remote Sensing - American Congress on Surveying and Mapping Annual Convention, 4, pp.159-166.
Cheong, A.L. 1992. Quantifying Drainage Basin Comparisons Within a Knowledge-Based System Framework, M.Sc. thesis, University of British Columbia, Vancouver, B.C., 133p.
Christian, C.S. and G.A. Stewart. 1968. "Methodology of Integrated Surveys." Proceedings of the Conference on Aerial Surveys and Integrated Studies, Toulouse, Unesco, pp.233-280.
Church, M. and D.M. Mark. 1980. "On Size and Scale in Geomorphology." Progress in Physical Geography, Vol.4, pp.342-390.
Clark, W.A.V. and P.L. Hosking. 1986. Statistical Methods for Geographers, John Wiley & Sons, Inc.
Clarke, J.I. 1966. "Morphometry from Maps." G.H. Dury (ed.): Essays in Geomorphology, Heinemann, London, pp.235-274.
Clarke, K.C. 1987. "Scale-Based Simulation of Topography." Proceedings of Auto-Carto 8, American Congress on Surveying and Mapping, pp.680-688.
Clarke, K.C. and D.M. Schweizer. 1991. "Measuring the Fractal Dimension of Natural Surfaces Using a Robust Fractal Estimator." Cartography and Geographic Information Systems, Vol.18, No.1, pp.37-47.
Cohen, J. 1960. "A Coefficient of Agreement for Nominal Scales." Educational and Psychological Measurement, Vol.20, No.1, pp.37-40.
Congalton, R.G. 1988a. "Using Spatial Autocorrelation Analysis to Explore the Errors in Maps Generated from Remotely Sensed Data." Photogrammetric Engineering and Remote Sensing, Vol.54, No.5, pp.587-592.
Congalton, R.G. 1988b. "A Comparison of Sampling Schemes Used in Generating Error Matrices for Assessing the Accuracy of Maps Generated from Remotely Sensed Data." Photogrammetric Engineering and Remote Sensing, Vol.54, No.5, pp.593-600.
Congalton, R.G., R.G. Oderwald, and R.A. Mead. 1983. "Assessing Landsat Classification Accuracy Using Discrete Multivariate Analysis Statistical Techniques." Photogrammetric Engineering and Remote Sensing, Vol.49, No.12, pp.1671-1678.
Corbett, J.D. and P.J. Gersmehl. 1987. "Terrain Data for a Water Resources GIS." D.A. Brown and P.J. Gersmehl (eds.): File Structure Design and Data Specifications for Water Resources Geographic Information Systems, pp.11-5 to 11-28.
Csillag, F., A. Kummert, and M. Kertesz. 1992. "Resolution, Accuracy, and Attributes: Approaches for Environmental Geographical Information Systems." Computer, Environment, and Urban Systems, Vol.16, pp.289-297.
Davis, J.C. 1986. Statistics and Data Analysis in Geology, John Wiley & Sons, Inc.
DeLotto, J.S. 1989. The Role of Scale in Automated Terrain Classification, M.A. thesis, the State University of New York at Buffalo, Buffalo, New York, 53p.
Dozier, J. and S.I. Outcalt. 1979. "An Approach toward Energy Balance Simulation over Rugged Terrain." Geographic Analysis, Vol.11, No.1, pp.65-85.
Dubayah, R. and F.W. Davis. 1988. "Factors Influencing the Utility of Digital Elevation Models in Ecological Research." Proceedings of the 3rd International Symposium on Spatial Data Handling, Sydney.
Dunn, G. and B.S. Everitt. 1982. An Introduction to Mathematical Taxonomy, Cambridge University Press.
Eastman, J.R. 1985. "Single-Pass Measurement of the Fractal Dimension of Digitized Cartographic Lines." Paper presented at the annual meeting of the Canadian Cartographic Association, New Brunswick.
Evans, I.S. 1972. "General Geomorphometry, Derivatives of Altitude, and Descriptive Statistics." R.J. Chorley (ed.): Spatial Analysis in Geomorphology, London: Methuen & Co. Ltd., pp.17-90.
Evans, I.S. 1980. "An Integrated System of Terrain Analysis and Slope Mapping." Zeitschrift für Geomorphologie, N.F., Suppl.-Bd.36, pp.274-295.
Felgueiras, C.A. and M.F. Goodchild. 1995. "A Comparison of Three TIN Surface Modelling Methods and Associated Algorithms." Technical Report 95-2, National Center for Geographic Information and Analysis.
Fisher, P. 1991. "First Experiments in Viewshed Uncertainty: The Accuracy of the Viewshed Area." Photogrammetric Engineering and Remote Sensing, Vol.57, No.10, pp.1321-1327.
Fisher, R.A. 1953. "Dispersion on a Sphere." Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, Vol.217, pp.295-305.
Frederiksen, P. 1981. "Terrain Analysis and Accuracy Prediction by Means of the Fourier Transformation." Photogrammetria, 36: 145-157.
Frederiksen, P., O. Jacobi and K. Kubik. 1985. "A Review of Current Trends in Terrain Modelling." ITC Journal, 1985-2, pp.101-106.
Gerrard, A.J.W. and D.A. Robinson. 1971. "Variability in Slope Measurements. A Discussion of the Effects of Different Recording Intervals and Micro-Relief in Slope Studies." Transactions of the Institute of British Geographers, Vol.54, pp.45-54.
Gittings, B. 1994. "Digital Elevation Data Catalogue." GIS-L email network.
Goodall, D.W. 1966. "Deviant Index: A New Tool for Numerical Taxonomy." Nature, Vol.210, pp.216.
Goodchild, M.F. 1980. "Fractals and the Accuracy of Geographical Measures." Mathematical Geology, Vol.12, No.2, pp.85-97.
Goodchild, M.F. 1982. "The Fractional Brownian Process as a Terrain Simulation Model." Modelling and Simulation, Vol.13, pp.1133-1137.
Goodchild, M.F. 1988. "The Issue of Accuracy in Global Databases." H. Mounsey (ed.): Building Databases for Global Science, London: Taylor & Francis Ltd., pp.31-48.
Goodchild, M.F. 1989. "National Center for Geographic Information and Analysis, Research Initiative One: Accuracy of Spatial Databases." Report of Specialist Meeting, University of California, Santa Barbara.
Goodchild, M.F. 1992. Research Initiative 1: Accuracy of Spatial Databases. Final Report, National Center for Geographic Information and Analysis, University of California, Santa Barbara.
Goodchild, M.F. and S. Gopal. 1989. The Accuracy of Spatial Databases, London: Taylor & Francis Ltd.
Goodchild, M.F. and D.M. Mark. 1987. "The Fractal Nature of Geographic Phenomena." Annals of the Association of American Geographers, 77(2), pp.265-278.
Goodchild, M.F. and N.J. Tate. 1992. "Forum: Description of Terrain as a Fractal Surface, and Application to Digital
Title | Topographic characterization for DEM error modelling |
Creator |
Xiao, Yanni |
Date Issued | 1996 |
Description | Digital Elevation Models have been in use for more than three decades and have become a major component of geographic information processing. The intensive use of DEMs has given rise to many accuracy investigations. The accuracy estimate is usually given in a form of a global measure such as root-mean-square error (RMSE), mostly from a producer's point of view. Seldom are the errors described in terms of their spatial distribution or how the resolution of the DEM interacts with the variability of terrain. There is a wide range of topographic variation present in different terrain surfaces. Thus, in defining the accuracy of a DEM, one needs ultimately to know the global and local characteristics of the terrain and how the resolution interacts with them. In this thesis, DEMs of various resolutions (i.e., 10 arc-minutes, 5 arc-minutes, 2 km, 1 km, and 50 m) in the study area (Prince George, British Columbia) were compared to each other and their mismatches were examined. Based on the preliminary test results, some observations were made regarding the relations among the spatial distribution of DEM errors, DEM resolution and the roughness of terrain. A hypothesis was proposed that knowledge of the landscape characteristics might provide some insights into the nature of the inherent error (or uncertainty) in a DEM. To test this statistically, the global characteristics of the study area surfaces were first examined by measures such as grain and those derived from spectral analysis, nested analysis of variance and fractal analysis of DEMs. Some important scale breaks were identified for each surface and this information on the surface global characteristics was then used to guide the selection of the moving window sizes for the extraction of the local roughness measures. The spatial variation and complexity of various study area surfaces was characterized by means of seven local geomorphometric parameters. 
The local measures were extracted from DEMs with different resolutions and using different moving window sizes. Then the multivariate cluster analysis was used for automated terrain classification in which relatively homogeneous terrain types at different scale levels were identified. Several different variable groups were used in the cluster analysis and the different classification results were compared to each other and interpreted in relation to each roughness measure. Finally, the correlations between the DEM errors and each of the local roughness measures were examined and the variation of DEM errors within various terrain clusters resulting from multivariate classifications were statistically evaluated. The effectiveness of using different moving window sizes for the extraction of the local measures and the appropriateness of different variable groups for terrain classification were also evaluated. The major conclusion of this study is that knowledge of topographic characteristics does provide some insights into the nature of the inherent error (or uncertainty) in a DEM and can be useful for DEM error modelling. The measures of topographic complexity are related to the observed patterns of discrepancy between DEMs of differing resolution, but there are variations from case to case. Several patterns can be identified in terms of relation between DEM errors and the roughness of terrain. First of all, the DEM errors (or elevation differences) do show certain consistent correlations with each of the various local roughness variables. With most variables, the general pattern is that the higher the roughness measure, the more points with higher absolute elevation differences (i.e., horn-shaped scatter of points indicating heteroscedasticity). 
Further statistical test results indicate that various DEM errors in the study area do show significant variation between different clusters resulting from terrain classifications based on different variable groups and window sizes. Cluster analysis was considered successful in grouping the areas according to their overall roughness and useful in DEM error modelling. In general, the rougher the cluster, the larger the DEM error (measured with either the standard deviation of the elevation differences or the mean of the absolute elevation differences in each cluster). However, there is still some of the total variation of various DEM errors that could not be accounted for by the cluster structure derived from multivariate classification. This could be attributed to the random errors inherent in any of the DEMs and the errors introduced in the interpolation process. Another conclusion is that the multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more successful than using only a single roughness measure in characterizing the overall roughness of terrain. When comparing the DEM error modelling results for surfaces with different global characteristics, the size of the moving window used in geomorphometric parameter abstraction also has certain impact on the modelling results. It shows that some understanding of the global characteristics of the surface is useful in the selection of appropriate/optimal window sizes for the extraction of local measures for DEM error modelling. Finally, directions for further research are suggested. |
Extent | 37255724 bytes |
Subject |
Digital mapping - Quality control Geographic information systems |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-03-19 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0087747 |
URI | http://hdl.handle.net/2429/6235 |
Degree |
Doctor of Philosophy - PhD |
Program |
Geography |
Affiliation |
Arts, Faculty of Geography, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1996-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |