UBC Theses and Dissertations

Topographic characterization for DEM error modelling Xiao, Yanni 1996


TOPOGRAPHIC CHARACTERIZATION FOR DEM ERROR MODELLING

by

YANNI XIAO

B.Sc., Lanzhou University, 1984
M.Sc., The Chinese Academy of Sciences, 1987

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Department of Geography)

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
March 1996
© Yanni Xiao, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Geography
The University of British Columbia
Vancouver, Canada

ABSTRACT

Digital Elevation Models (DEMs) have been in use for more than three decades and have become a major component of geographic information processing. The intensive use of DEMs has given rise to many accuracy investigations. The accuracy estimate is usually given in the form of a global measure such as root-mean-square error (RMSE), mostly from a producer's point of view. Seldom are the errors described in terms of their spatial distribution or how the resolution of the DEM interacts with the variability of terrain. There is a wide range of topographic variation present in different terrain surfaces. Thus, in defining the accuracy of a DEM, one needs ultimately to know the global and local characteristics of the terrain and how the resolution interacts with them.

In this thesis, DEMs of various resolutions (i.e., 10 arc-minutes, 5 arc-minutes, 2 km, 1 km, and 50 m) in the study area (Prince George, British Columbia) were compared to each other and their mismatches were examined. Based on the preliminary test results, some observations were made regarding the relations among the spatial distribution of DEM errors, DEM resolution and the roughness of terrain. A hypothesis was proposed that knowledge of the landscape characteristics might provide some insights into the nature of the inherent error (or uncertainty) in a DEM. To test this statistically, the global characteristics of the study area surfaces were first examined by measures such as grain and those derived from spectral analysis, nested analysis of variance and fractal analysis of DEMs. Some important scale breaks were identified for each surface, and this information on the global surface characteristics was then used to guide the selection of the moving window sizes for the extraction of the local roughness measures. The spatial variation and complexity of the various study area surfaces were characterized by means of seven local geomorphometric parameters. The local measures were extracted from DEMs with different resolutions and using different moving window sizes. Multivariate cluster analysis was then used for automated terrain classification, in which relatively homogeneous terrain types at different scale levels were identified. Several different variable groups were used in the cluster analysis, and the different classification results were compared to each other and interpreted in relation to each roughness measure. Finally, the correlations between the DEM errors and each of the local roughness measures were examined, and the variation of DEM errors within the various terrain clusters resulting from multivariate classifications was statistically evaluated.
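
As an illustration of how local roughness measures can be extracted with a moving window, the following is a minimal sketch rather than the thesis's own implementation. It assumes that local relief is the elevation range inside the window and that the standard deviation is the population form; the function names, the 3x3 window and the toy grid are hypothetical.

```python
from statistics import pstdev


def window_values(dem, row, col, size):
    """Return the elevations inside a size x size window centred on (row, col)."""
    half = size // 2
    return [dem[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def local_measures(dem, size=3):
    """Compute local relief and SD of elevation for each cell whose window fits in the grid."""
    half = size // 2
    rows, cols = len(dem), len(dem[0])
    relief, sd = {}, {}
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            vals = window_values(dem, r, c, size)
            relief[(r, c)] = max(vals) - min(vals)  # assumed definition: range of elevation
            sd[(r, c)] = pstdev(vals)               # assumed: population standard deviation
    return relief, sd


# Toy 4 x 4 grid (hypothetical elevations in metres): flat except for a ridge on the east edge.
dem = [
    [100, 100, 100, 100],
    [100, 100, 100, 130],
    [100, 100, 100, 160],
    [100, 100, 100, 190],
]
relief, sd = local_measures(dem, size=3)
```

Changing `size` changes the scale at which roughness is measured, which is why the choice of window size matters for the resulting measures.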
The effectiveness of using different moving window sizes for the extraction of the local measures and the appropriateness of different variable groups for terrain classification were also evaluated.

The major conclusion of this study is that knowledge of topographic characteristics does provide some insights into the nature of the inherent error (or uncertainty) in a DEM and can be useful for DEM error modelling. The measures of topographic complexity are related to the observed patterns of discrepancy between DEMs of differing resolution, but there are variations from case to case. Several patterns can be identified in the relation between DEM errors and the roughness of terrain. First of all, the DEM errors (or elevation differences) do show certain consistent correlations with each of the various local roughness variables. With most variables, the general pattern is that the higher the roughness measure, the more points with higher absolute elevation differences (i.e., a horn-shaped scatter of points indicating heteroscedasticity). Further statistical test results indicate that the various DEM errors in the study area do show significant variation between the different clusters resulting from terrain classifications based on different variable groups and window sizes. Cluster analysis was considered successful in grouping the areas according to their overall roughness and useful in DEM error modelling. In general, the rougher the cluster, the larger the DEM error (measured with either the standard deviation of the elevation differences or the mean of the absolute elevation differences in each cluster). However, some of the total variation in the DEM errors could not be accounted for by the cluster structure derived from multivariate classification. This could be attributed to the random errors inherent in any of the DEMs and the errors introduced in the interpolation process.
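
The two per-cluster error measures named above (the standard deviation of the elevation differences and the mean of the absolute elevation differences) can be sketched as follows. This is an illustrative sketch only: the function name, the cluster labels and the sample differences are hypothetical, and the population form of the standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def error_by_cluster(differences, clusters):
    """Summarise elevation differences within each terrain cluster."""
    grouped = defaultdict(list)
    for diff, label in zip(differences, clusters):
        grouped[label].append(diff)
    return {
        label: {
            "sd": pstdev(vals),                     # assumed: population SD of the differences
            "mean_abs": mean(abs(v) for v in vals), # mean of the absolute differences
        }
        for label, vals in grouped.items()
    }


# Hypothetical elevation differences (metres) between two DEMs at eight grid points,
# with a hypothetical cluster label for each point.
differences = [1, -1, 2, -2, 10, -10, 20, -20]
clusters = ["smooth"] * 4 + ["rough"] * 4
summary = error_by_cluster(differences, clusters)
```

With these made-up values the "rough" cluster has both the larger standard deviation and the larger mean absolute difference, which is the kind of contrast the abstract reports between smoother and rougher clusters.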

Another conclusion is that the multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more successful than using only a single roughness measure in characterizing the overall roughness of terrain. When comparing the DEM error modelling results for surfaces with different global characteristics, the size of the moving window used in geomorphometric parameter abstraction also has a certain impact on the modelling results. This shows that some understanding of the global characteristics of the surface is useful in the selection of appropriate/optimal window sizes for the extraction of local measures for DEM error modelling. Finally, directions for further research are suggested.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS

1. INTRODUCTION
  1.1 Overview
  1.2 Background of Digital Terrain Modelling
    1.2.1 Digital terrain models
    1.2.2 Digital Terrain Models in geographic modelling
    1.2.3 Topographic data availability
    1.2.4 Defining accuracy and error
  1.3 Statement of the Problem
  1.4 Thesis Outline

2. ERROR IN DIGITAL TERRAIN MODELS
  2.1 Overview
  2.2 Error Detection and Rectification in Digital Terrain Models
    2.2.1 Errors in USGS gridded DEMs
    2.2.2 Gross-error detection and correction
  2.3 Accuracy Estimation and Error Modelling
    2.3.1 Global accuracy measure
      2.3.1.1 Terminology
      2.3.1.2 Topographic map accuracy standards
      2.3.1.3 Effects of check points on the reliability of accuracy estimates
      2.3.1.4 Interpolation accuracy
    2.3.2 Spatial variation of interpolation errors
    2.3.3 Propagation of DEM error
  2.4 Summary

3. STUDY AREA DATA AND SOME PRELIMINARY TESTS
  3.1 Study Area Selection
  3.2 Database Implementation
    3.2.1 NGDC10
    3.2.2 NGDC5
    3.2.3 WSC2
    3.2.4 EMR1
    3.2.5 TRIM 93G and TRIM 93H
  3.3 Some Preliminary Tests
    3.3.1 Topographic profile comparison
    3.3.2 DEM spatial mismatch
      3.3.2.1 Two comparisons between DEMs for the whole study area
      3.3.2.2 Four comparisons between DEMs for the two subareas
  3.4 Observations Based on Preliminary Tests
  3.5 Thesis Hypothesis

4. METHODOLOGIES FOR TOPOGRAPHIC CHARACTERIZATION
  4.1 Overview
  4.2 Extraction of the Local Geometric Measures
    4.2.1 Local relief (LR)
    4.2.2 Standard deviation of altitude (SD)
    4.2.3 Slope and aspect (a and p)
    4.2.4 Roughness factor (RF)
    4.2.5 Slope curvature (SC)
    4.2.6 Number of higher points (HP)
    4.2.7 Hypsometric integral (HI)
  4.3 Global Surface Characterization
    4.3.1 Grain
    4.3.2 Spectral analysis of terrain
    4.3.3 Nested analysis of variance
    4.3.4 Description of Terrain as a Fractal Surface
  4.4 Hierarchical Terrain Classification
    4.4.1 Principles of terrain classification
    4.4.2 Concept of distance
    4.4.3 Variable standardization
    4.4.4 Terrain classification
  4.5 What is next?

5. TOPOGRAPHIC CHARACTERIZATION RESULTS
  5.1 The Role of Scale in Topographic Characterization
    5.1.1 Grain determination
    5.1.2 Fourier interpretation
    5.1.3 Examining scale effects by nested analysis of variance
    5.1.4 Interpreting the variograms
    5.1.5 A comparison of various global surface characterization methods and their results
  5.2 Local Surface Characteristics
  5.3 Multivariate Classification Results
    5.3.1 Variable selection
    5.3.2 Grouping of homogeneous areas
    5.3.3 Interpretation of classification results
    5.3.4 Comparisons of different classification results

6. DEM ERROR MODELLING
  6.1 Overview
  6.2 A New Approach to DEM Error Modelling
    6.2.1 DEM errors and local roughness measures
    6.2.2 DEM errors in various terrain clusters
      6.2.2.1 WSC2 DEM errors as compared to TRIM DEM
      6.2.2.2 EMR1 DEM errors as compared to TRIM DEM
      6.2.2.3 NGDC5 and WSC2 DEM errors as compared to EMR1 DEM
    6.2.3 Significance tests of DEM error variation
      6.2.3.1 Significance tests of WSC2 and EMR1 DEM errors in the two subareas
      6.2.3.2 Significance test of NGDC5 and WSC2 DEM errors in the whole study area
  6.3 Summary

7. SUMMARY AND CONCLUSIONS
  7.1 Summary
  7.2 Conclusions
  7.3 A Practical Guide for the Users
  7.4 Discussions and Future Research

BIBLIOGRAPHY
APPENDIX A: A SELECTION OF DIFFERENT DIGITAL TOPOGRAPHIC DATA PROVIDERS
APPENDIX B: TRIM DATA MOEP ASCII FILE FORMAT

LIST OF TABLES

Table 2.1  Probability of errors in the univariate case
Table 2.2a  Probability of errors in the bivariate case (MSEP method)
Table 2.2b  Probability of errors in the bivariate case (CSE method)
Table 3.1  A brief description of each sample data set
Table 3.2  The 11 parameters in the NGDC 10' global data set
Table 3.3  Summary statistics of the study area NGDC10, NGDC5, WSC2 and EMR1 DEM data
Table 3.4  Specifications of EMR1 DEM data collection for the study area
Table 3.5  TRIM data collection density for the study area
Table 3.6  Statistics of two subset TRIM DEMs
Table 3.7  Summary statistics of six DEM comparisons (elevation differences)
Table 4.1  Scale variance for the artificial data in Figure 4.7
Table 5.1a  Scale variances for TRIM 93G DEM data (trim.93g.256)
Table 5.1b  Scale variances for TRIM 93H DEM data (trim93h.256)
Table 5.2  Scale variances for EMR1.128(1) and EMR1.128(2)
Table 5.3  A summary of the important scales in the study area surfaces identified by various global methods
Table 5.4a  The rank-order correlation matrix of seven variables for TRIM 93G (values above the diagonal) and TRIM 93H (values below the diagonal) subareas (window size: 7x7)
Table 5.4b  The rank-order correlation matrix of seven variables for the whole study area. Values above the diagonal are for window size 5x5 and values below the diagonal are for window size 9x9
Table 5.5a  Summary statistics of local relief values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5b  Summary statistics of slope values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5c  Summary statistics of slope curvature values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5d  Summary statistics of hypint values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5e  Summary statistics of standard deviation of elevation values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5f  Summary statistics of highpt values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.5g  Summary statistics of rough values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H
Table 5.6a  Summary statistics of local relief values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6b  Summary statistics of slope values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6c  Summary statistics of slope curvature values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6d  Summary statistics of hypint values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6e  Summary statistics of standard deviation of elevation values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6f  Summary statistics of highpt values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 5.6g  Summary statistics of rough values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area
Table 6.1a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas
Table 6.1b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas
Table 6.1c  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas
Table 6.1d  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas
Table 6.2a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas
Table 6.2b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas
Table 6.2c  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas
Table 6.2d  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas
Table 6.3a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas
Table 6.3b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas
Table 6.3c  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas
Table 6.3d  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas
Table 6.4a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas
Table 6.4b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas
Table 6.4c  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas
Table 6.4d  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas
Table 6.5a  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas
Table 6.5b  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas
Table 6.5c  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas
Table 6.5d  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas
Table 6.6a  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas
Table 6.6b  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas
Table 6.6c  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas
Table 6.6d  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas
Table 6.7a  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas
Table 6.7b  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas
Table 6.7c  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas
Table 6.7d  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas
Table 6.8a  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas
Table 6.8b  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas
Table 6.8c  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas
Table 6.8d  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas
Table 6.9a  Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area
Table 6.9b  Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area
Table 6.10a  Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area
Table 6.10b  Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area
Table 6.11a  Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area
Table 6.11b  Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area
Table 6.12a  Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area
Table 6.12b  Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area
Table 6.13a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area
Table 6.13b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area
Table 6.14a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area
Table 6.14b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area
Table 6.15a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area
Table 6.15b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area
Table 6.16a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area
Table 6.16b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area

LIST OF FIGURES

Figure 1.1  The five generic forms of DTMs
Figure 2.1  Comparison of DTM surfaces with the 'true' terrain surface
Figure 2.2  Factors influencing the performance of a DTM
Figure 3.1  5' DEM image of British Columbia
Figure 3.2  Study area map sheet numbers
Figure 3.3  Statistical summary of the study area 10', 5', 2 km and 1 km DEM data
Figure 3.4  WSC data coding method
Figure 3.5  WSC 2 km DEM data points
Figure 3.6  EMR1 DEM data points
Figure 3.7  Full-size and subset DEM images for subarea TRIM 93G
Figure 3.8  Full-size and subset DEM images for subarea TRIM 93H
Figure 3.9  Histograms of TRIM93G.SUB and TRIM93H.SUB DEMs
Figure 3.10  Slope histograms for subareas TRIM 93G and TRIM 93H
Figure 3.11  Registration of various study area DEM data sets
Figure 3.12  Four DEM images of the whole study area and the DEM images of the two subareas (93G and 93H)
Figure 3.13  Visual comparison of 10' and 1 km topographic profiles
Figure 3.14  Comparison results of the study area DEMs NGDC5 and EMR1
Figure 3.15  Comparison results of the study area DEMs WSC2 and EMR1
Figure 3.16  Comparison results of EMR1 and TRIM DEMs for subarea 93G
Figure 3.17  Comparison results of EMR1 and TRIM DEMs for subarea 93H
Figure 3.18  Comparison results of WSC2 and TRIM DEMs for subarea 93G
Figure 3.19  Comparison results of WSC2 and TRIM DEMs for subarea 93H
Figure 3.20  Comparison results of WSC2 and TRIM DEMs for subarea 93G using three different interpolators
Figure 3.21  Comparison results of WSC2 and TRIM DEMs for subarea 93H using three different interpolators
Figure 3.22  Differences between the 50 m DEMs interpolated from WSC2 DEM using different interpolators (ncp=0, 2, 4) for subarea 93G
Figure 3.23  Differences between the 50 m DEMs interpolated from WSC2 DEM using different interpolators (ncp=0, 2, 4) for subarea 93H
Figure 4.1  Forms of surface roughness after Mark [1974]
Figure 4.2  3 by 3 neighbourhood of elevation points for slope calculation
Figure 4.3  Vector strength and vector dispersion
Figure 4.4  2-D spectrum of B.C. 5' DEM
Figure 4.5  Hierarchical structure of a square matrix
Figure 4.6  A numeric example of a 16 by 16 even hierarchy (after Moellering and Tobler, 1972)
Figure 4.7  Self-similar surfaces generated with a fractional Brownian process using a range of H values
Figure 4.8  Generic sequence of automated terrain classification
Figure 4.9  Major steps for the study area surface characterization and classification
Figure 5.1a  Graphs used to determine the surface grain for the two subareas 93G and 93H
Figure 5.1b  Graph used to determine the surface grain for the whole study area
Figure 5.2a  2-D spectrum of the study area derived from 5' DEM
Figure 5.2b  2-D spectrum of the study area derived from EMR1 DEM
Figure 5.2c  2-D spectrum of subarea 93G
Figure 5.2d  2-D spectrum of subarea 93H
Figure 5.3a  The spectrum plot of an east/west topographic profile from TRIM93G.SUB DEM
Figure 5.3b  The spectrum plot of a north/south topographic profile from TRIM93G.SUB DEM
Figure 5.3c  The spectrum plot of an east/west topographic profile from TRIM93H.SUB DEM
Figure 5.3d  The spectrum plot of a north/south topographic profile from TRIM93H.SUB DEM
Figure 5.4a  The spectrum plot of topographic profile emr.profile1 from EMR1 DEM
Figure 5.4b  The spectrum plot of topographic profile emr.profile2 from EMR1 DEM
Figure 5.4c  The spectrum plot of topographic profile emr.profile3 from EMR1 DEM
Figure 5.4d  The spectrum plot of topographic profile emr.profile4 from EMR1 DEM
Figure 5.4e  The spectrum plot of topographic profile emr.profile5 from EMR1 DEM
Figure 5.4f  The spectrum plot of topographic profile emr.profile6 from EMR1 DEM
Figure 5.5a  The spectrum computed for TRIM93G and TRIM93H data. Each aggregated in an even manner to form a hierarchy
Figure 5.5b  The spectrum computed for EMR1.128(1) and EMR1.128(2) data. Each aggregated in an even manner to form a hierarchy
Figure 5.6a  Variogram plots of TRIM 93G and 93H DEMs
Figure 5.6b  Variogram plot of EMR1 DEM
Figure 5.7a  Grayscale image representations of local relief for 93G and 93H
Figure 5.7b  Grayscale image representations of the standard deviation of elevations for 93G and 93H
Figure 5.7c  Grayscale image representations of slope and aspect for 93G and 93H
Figure 5.7d  Grayscale image representations of roughness factor for 93G and 93H
Figure 5.7e  Grayscale image representations of slope curvature for 93G and 93H
Figure 5.7f  Grayscale image representations of the number of DEM points that are higher than the center point of the moving window (subarea: 93G and 93H, window size: 7x7 and 21x21)
Figure 5.7g  Grayscale image representations of hypsometric integral for 93G and 93H
Figure 5.8a  Grayscale image representations of local relief and standard deviation of elevations for the whole study area
Figure 5.8b  Grayscale image representations of slope, slope curvature and roughness factor for the whole study area
Figure 5.8c  Grayscale image representations of HI and HP for the whole study area
Figure 5.9a  The scattergrams of all the variable pairs (window size: 7x7) for subareas 93G and 93H
Figure 5.9b  The scattergrams of all the variable pairs for the whole study area
223  Figure 5.10a Classification results (3 classes) based on variable group (3-5) and moving window size (21x21) for subareas 93G and 93H  229  Figure 5.10b Classification results (3 classes) based on variable group (3-6) and moving window size (21x21) for subareas 93G and 93H  230  Figure 5.10c Classification results (3 classes) based on variable group (2) and moving window size (21x21) for subareas 93G and 93H  231  Figure 5.10d Classification results (3 classes) based on variable group (5) and moving window size (21x21) for subareas 93G and 93H Figure 5.11a Classification results (3 classes) based on variable group (3-5) and  xxiv  232  moving window size (7x7) for subareas 93G and 93H  233  Figure 5.11b Classification results (3 classes) based on variable group (3-6) and moving window size (7x7) for subareas 93G and 93H  234  Figure 5.11c Classification results (3 classes) based on variable group (2) and moving window size (7x7) for subareas 93G and 93H Figure 5 . l i d  235  Classification results (3 classes) based on variable group (5) and moving window size (7x7) for subareas 93G and 93H  236  Figure 5.12a Classification results (3 classes) based on variable group (3-5) and moving window sizes (5x5) and (9x9) for the whole study area  237  Figure 5.12b Classification results (3 classes) based on variable group (3-6) and moving window sizes (5x5) and (9x9) for the whole study area  238  Figure 5.12c Classification results (3 classes) based on variable group (2) and moving window sizes (5x5) and (9x9) for the whole study area  239  Figure 5.12d Classification results (3 classes) based on variable group (5) and moving window sizes (5x5) and (9x9) for the whole study area  240  Figure 5.13a Classification error matrices and K H A T statistics for 12 comparisons of classification results for subarea 93G (nclass=3)  267  Figure 5.13b Classification error matrices and K H A T statistics for 4 comparisons (difference between the two window sizes) of classification 
results for subarea 93G (nclass=3)  268  Figure 5.13c Classification error matrices and K H A T statistics for 12 comparisons of classification results for subarea 93H (nclass=3)  xxv  270  Figure 5.13d Classification error matrices and K H A T statistics for 4 comparisons (difference between the two window sizes) of classification results for subarea 93H (nclass=3) Figure 5.13e  271  Classification error matrices and K H A T statistics for 12 comparisons of classification results for the whole study area (nclass=3)  Figure 5.13f  272  Classification error matrices and K H A T statistics for 4 comparisons (difference between the two window sizes) of classification results for the whole study area (nclass=3)  273  Figure 5.13g Summary of the K H A T statistics for all classification comparisons for the whole study area and the two subareas (nclass=3) Figure 6.1a  The scattergrams between WSC2 D E M error and local relief (window size 7x7) for the two subareas  Figure 6.1b  282  The scattergrams between WSC2 D E M error and slope (window size 21x21) for the two subareas  Figure 6.3a  283  The scattergrams between WSC2 D E M error and 'cwrv' (window size 7x7) for the two subareas  Figure 6.3b  281  The scattergrams between WSC2 D E M error and slope (window size 7x7) for the two subareas  Figure 6.2b  280  The scattergrams between WSC2 D E M error and local relief (window size 21x21) for the two subareas  Figure 6.2a  274  284  The scattergrams between WSC2 D E M error and variable 'cwrv' (window size 21x21) for the two subareas  xxvi  285  Figure 6.4a  The scattergrams between WSC2 D E M error and 'hypinf (window size 7x7) for the two subareas  Figure 6.4b  286  The scattergrams between WSC2 D E M error and variable 'hypinf (window size 21x21) for the two subareas  Figure 6.5a  287  The scattergrams between WSC2 D E M error and 'sta" (window size 7x7) for the two subareas  Figure 6.5b  288  The scattergrams between WSC2 D E M error and variable 'std' (window size 
21x21) for the two subareas
Figure 6.6a  The scattergrams between WSC2 DEM error and 'highpf' (window size 7x7) for the two subareas
Figure 6.6b  The scattergrams between WSC2 DEM error and variable 'highpf' (window size 21x21) for the two subareas
Figure 6.7a  The scattergrams between WSC2 DEM error and 'rough' (window size 7x7) for the two subareas
Figure 6.7b  The scattergrams between WSC2 DEM error and variable 'rough' (window size 21x21) for the two subareas
Figure 6.8a  Histogram of WSC2 DEM errors (3 clusters, variable group: 3-5, window size: 7x7) for the two subareas
Figure 6.8b  Histogram of WSC2 DEM errors (4 clusters, variable group: 3-5, window size: 7x7) for the two subareas
Figure 6.9a  The significance test results for WSC2 DEM errors for the two subareas (variable group: 3-5, window size: 7x7 and 21x21)
Figure 6.9b  The significance test results for WSC2 DEM errors for the two subareas (variable group: 3-6, window size: 7x7 and 21x21)
Figure 6.9c  The significance test results for WSC2 DEM errors for the two subareas (variable group: 2, window size: 7x7 and 21x21)
Figure 6.9d  The significance test results for WSC2 DEM errors for the two subareas (variable group: 5, window size: 7x7 and 21x21)
Figure 6.10a  The significance test results for EMR1 DEM errors for the two subareas (variable group: 3-5, window size: 7x7 and 21x21)
Figure 6.10b  The significance test results for EMR1 DEM errors for the two subareas (variable group: 3-6, window size: 7x7 and 21x21)
Figure 6.10c  The significance test results for EMR1 DEM errors for the two subareas (variable group: 2, window size: 7x7 and 21x21)
Figure 6.10d  The significance test results for EMR1 DEM errors for the two subareas (variable group: 5, window size: 7x7 and 21x21)
Figure 6.11a  The significance test results for NGDC5 DEM errors for the whole study area (variable
group: 3-5, 3-6, 2, and 5; window size: 5x5)
Figure 6.11b  The significance test results for NGDC5 DEM errors for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 9x9)
Figure 6.12a  The significance test results for WSC2 DEM errors (as compared to EMR1 DEM) for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 5x5)
Figure 6.12b  The significance test results for WSC2 DEM errors (as compared to EMR1 DEM) for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 9x9)
Figure A.1  The areal coverage of EMR GGGD digital terrain files
Figure A.2  British Columbia Geographic System (BCGS)

ACKNOWLEDGEMENTS

I wish to gratefully thank my supervisor, Dr. Brian Klinkenberg, for his very valuable support, guidance, advice and encouragement throughout my years as a student at the University of British Columbia and as a junior professor at the University of Lethbridge. I would also like to acknowledge the help of my advising committee members: Dr. Tom Poiker of Simon Fraser University and Dr. Michael Church of the University of British Columbia. Their ideas, suggestions, expertise and willingness to help are very much appreciated. To Dr. Rostam Yazdanni of B.C. Ministry of Environment, Lands and Parks and Ms. Susan Rountree of Water Survey of Canada my thanks for providing some of the sample test data. My great appreciation also goes to Dr. Brian Klinkenberg of the University of British Columbia and Dr. Fionn Murtagh of European Southern Observatory, Germany for kindly providing some of the computer programs and codes. Thanks are also due to my best friend Rose Klinkenberg for her kindness in offering me my very first home to stay in Canada and her continuous encouragement and moral support. Finally, a lot of very valuable suggestions and comments on the manuscript, especially from Drs. T. Poiker, M. Church, P. Gong, I.
Saunders and B. Klinkenberg, are very much appreciated. This thesis would not have been possible without the welcoming resources of the University of Lethbridge and encouragement from Drs. R. Barendregt and R. Rogerson.

1.  INTRODUCTION

1.1  Overview

Topographic information about a geographic area is essential in any type of environmental research. It provides an integrating framework for a wide range of land resources, notably surface materials, soils, water, and vegetation, and thus for an equally wide range of users, including disciplines as diverse as forestry, archaeology, and civil engineering [Mitchell, 1991]. Digital representation of topography is particularly suitable for computer-based numeric analysis or modelling, so that 'real world' problems may be approached in a representative and efficient manner through automated means. There are many examples of natural resource applications that would benefit from including topography as a variable in the modelling and prediction of phenomena. Advantages of digitally encoding elevation data include fast manipulation, the ability to handle a large volume of data, and, of course, ease of integration with computerized mapping, geographic information systems (GIS) and remote sensing techniques [Theobald, 1989].

The conceptual framework for a computerized representation of topography is the Digital Terrain Model (DTM). It describes a terrain form and its evolution with the help of a model. Such a model defines observable quantities and the relations between them [Frederiksen et al., 1985]. DTMs can be used as an analytical tool for quantitative characterization of topography, and so they are increasingly being used in a wide range of applications which are dependent on topographic characteristics.
This is also partly because recent developments in photogrammetric methods in general, and orthophoto mapping in particular, have resulted in the increasing availability and the mass production of digital topographic data. These data carry the potential for breakthroughs in the characterization of topography.

Worldwide, institutions are now involved in the collection of digital topographic data. Users either directly acquire those data that have been produced or collect their own to meet their special needs. Digital topographic data are collected at different scales or spatial resolutions with different standards and accuracy, and they are applied to various geographic models at different scales for different purposes.

Scale has always been of primary concern in geography. Considerations of scale underlie geographical study in all of its aspects. An essential part of the definition of an environmental system is to define its geographic scale [Church and Mark, 1980]. The term 'scale,' however, has many usages in geographical research, each with a different perspective. It can refer to size or area, the preciseness of measurement, the level of generalization, level of data detail or resolution, relative degree or extent, and so on [Griffiths and Doiron, 1971]. Nonetheless, all perspectives are interrelated. This diversity of usage is also reflected in this thesis.

There is a wide range of scales at which geographic models operate, from a global-scale (or macroscale) atmospheric general circulation model (GCM) to a regional-scale (or mesoscale) hydrologic model or a detailed (or microscale) representation of soil/slope processes. Each attempts to model different scale-governing or scale-governed processes. Haggett [1965] states that one of the characteristic features of geographical research is its concern with a particular scale of reality.
The statement summarizes the essence of what has come to be known as the 'scale problem' in geography [DeLotto, 1989]. This concept is often dichotomized by terms such as 'local' and 'global.' Burrough [1983] illustrates this point by suggesting that variations in soil may arise from processes acting at vastly different scales, such as geology, erosion, and earthworms. Scale has long been an especially fundamental concern in geomorphology. As indicated by Church and Mark [1980], size and form of landscape features are inextricably connected through the function and scale of geomorphic systems.

Global coverage of topographic data at fine resolution is unlikely to be available and may even be unnecessary. Data at coarse resolution are highly generalized, with much high-frequency spatial variation (i.e., geographic detail) removed. In general, mesoscale features will not be represented within global-scale models. A system which carries too much geographic detail is wasteful, and one which carries too little geographic detail is not of much use [Tobler, 1988]. Considering land evaluation research, for example, the data resolutions needed can range from fine to coarse depending on whether an intensive or reconnaissance evaluation is required. The resolution is governed by the purpose of the application and who the eventual users will be. It should be fine enough to show all necessary detail but coarse enough to avoid an excessive amount of data.

Just as there is no such thing as an absolutely accurate map, so the absolutely accurate terrain model does not exist [Shearer, 1990]. A DTM is merely a digital representation of a continuous topographic surface in the real world. It is known that any abstraction of reality will contain discrepancies from its source. All digital terrain models will thus contain inaccuracies to a greater or lesser extent, depending on a number of factors.
It is, therefore, desirable to specify the accuracy of the terrain model in such a way that one can assess its suitability for the chosen or intended application. However, this is seldom addressed and, if it is, the accuracy estimate is usually constrained to one of the following: (i) an estimate of the root-mean-square error (RMSE) in the measurement of elevations; (ii) a description of possible source errors due to human mistakes during data acquisition; and (iii) a description of the systematic error of striping, or of random noise created during the photogrammetric processes. All of these are considered from a producer's perspective. Seldom is the accuracy estimated, from a user's point of view, in terms of the spatial pattern (or distribution) of errors or how the spatial resolution of a DTM interacts with all kinds of topographic characteristics [Theobald, 1989]. As a result, the 'fitness for use' of a DTM in a specific application is not known and cannot be justified.

1.2  Background of Digital Terrain Modelling

1.2.1  Digital terrain models

Although the concept of representing terrain using spot heights has been around for many decades, the idea of creating digital models of the terrain is a relatively recent development. The term Digital Terrain Model (DTM) has historically been the generic term used to refer to any digital representation of a topographic surface. It had its origin in work performed by Miller in the Photogrammetry Laboratory at the Massachusetts Institute of Technology during the late 1950s [Miller, 1957; Miller and LaFlamme, 1958], but only later was its potential thoroughly explored. Miller and his colleagues were conducting research for the U.S. Bureau of Public Roads. The objective was to expedite highway design by digital computation based upon photogrammetrically-acquired topographic data.
Their definition was as follows:

    The digital terrain model (DTM) is simply a statistical representation of the continuous surface of the ground by a large number of selected points with known XYZ coordinates in an arbitrary coordinate field.

There exist several other terms, e.g., Digital Elevation Model (DEM), Digital Height Model (DHM) and Digital Ground Model (DGM), which are also commonly used. Although in practice these terms are presumed to be synonymous, in reality they often refer to quite distinct products [Kennie and Petrie, 1990]. Burrough [1986], for example, prefers the term Digital Elevation Model (DEM) "for models containing only elevation data" because, he argues, "the term 'terrain' often implies attributes of a landscape other than the altitude of the land surface." This can be seen in the definition of 'land' given by Townshend [1981, p.6], in which 'land' and 'terrain' are treated as synonyms:

    Land is an area of the Earth's surface which is characterized by a distinctive assemblage of attributes and interlinking processes in space and time of soil and other surface materials, their atmosphere and water, the landforms, vegetation and animal populations, as well as the results of human activity; also included to the extent that they directly influence the characteristics of the land under consideration are suprasurface properties of the atmosphere, subsurface geological characteristics and the nature of the immediately surrounding land and water. We follow Christian and Stewart (1968) in treating 'land' and 'terrain' as synonyms.

However, many others tend to use the term DEM to represent only gridded matrices of elevations, which have also been called Altitude Matrices [Evans, 1980], and to use the term DTM to refer to any digital representation of topography.

Throughout this thesis, the term DEM refers to a regular array of elevations.
The grids usually conform to either the graticule of latitude and longitude or to some grid system such as UTM (Universal Transverse Mercator). Those grids oriented to latitude and longitude are usually referred to as arc-second or arc-minute data. This format is non-square except at the equator, where units of latitude and longitude are equal in length. The term DTM will still be used to represent generally any digital model of a topographic surface, and the terms 'terrain' and 'topographic surface' will be used interchangeably.

The development of DTMs has experienced tremendous growth in the past three decades or so. This was especially reflected in the Digital Terrain Models Symposium held in St. Louis in 1978 by the American Society of Photogrammetry [American Society of Photogrammetry, 1978]. Many data structures, which may be broadly defined as collections of objects (data) together with the relations among them [Mark, 1978, 1979], or models, have been developed to digitally represent the continuous variation of surface elevations over an area. Among them, the contour or profile method, the uniform or variable grid structure, and the triangulated irregular network (TIN) structure are the most popular ones (see Figure 1.1). Each has its capabilities and limitations with respect to data acquisition, storage efficiency, retrieval and processing speeds, and accuracy of representation of the surface.

Because contour lines are drawn on most existing topographic maps to describe the surfaces, they are a ready source of data for digital terrain models. Extensive efforts have been made to capture them automatically using scanners [Burrough, 1986]. Unfortunately, as pointed out by Boehm [1967] and Peucker et al. [1978], while contour encoding can be adaptive to terrain variability (as shown in Figure 1.1), it is very inefficient for many types of computations.
Digitized contours, created either manually by table digitizing or automatically by raster scanning, are not especially suitable for computing slope or other derivatives, and so they are usually converted to other formats such as regular grids for further analysis. Another standard cartographic method of portraying the surface is to use a series of parallel profiles showing elevations. A profile trace results from the intersection of a vertical plane with the surface and can be built from a stereo-photo model or from an existing topographic map. The profiles are usually a derived product used for slope analysis.

Figure 1.1  The five generic forms of DTMs. The contour lines are shown in every case for comparison purposes (partly after Carter [1988]).

The most common form of DTM, and in many ways the simplest, is the regular grid with the sample points located at the intersections of two orthogonal sets of regularly-spaced parallel lines. The data are usually obtained from quantitative measurements from stereoscopic aerial photographs made on analytical stereo-plotters, such as the Gestalt Photo Mapper (GPM-2) system [Kelly et al., 1977], or alternatively produced by interpolation from irregularly or regularly spaced data points. The simplicity of the grid data structure lies in that only the altitude of the surface at each sample point needs to be measured and stored within the computer. The geographic locations are determined by the grid spacing, and are implicit in the sequential position of the altitude value in the data array. Another advantage of this structure is that the neighbours of a given data point, which are often required in the calculation of some surface parameters for geographic modelling purposes, can be obtained readily from the positions of points within the data array.
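
The two properties of the regular grid described above (implicit locations and easy neighbour lookup) can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `GridDEM`, its fields, the row-major storage order, the origin coordinates and the 25 m spacing are all assumptions made for the example, not details taken from any data set discussed in this thesis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GridDEM:
    """Hypothetical regular-grid DEM: only elevations are stored."""
    x0: float        # x coordinate of column 0 (assumed origin)
    y0: float        # y coordinate of row 0 (assumed origin)
    spacing: float   # grid spacing, assumed equal in x and y
    ncols: int       # number of columns in the grid
    z: List[float]   # elevations stored in row-major order

    @property
    def nrows(self) -> int:
        return len(self.z) // self.ncols

    def location(self, i: int) -> Tuple[float, float]:
        """Recover (x, y) of the i-th value from its array position alone."""
        row, col = divmod(i, self.ncols)
        return (self.x0 + col * self.spacing, self.y0 + row * self.spacing)

    def neighbours(self, i: int) -> List[int]:
        """Array indices of the (up to eight) neighbours of the i-th value."""
        row, col = divmod(i, self.ncols)
        found = []
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = row + dr, col + dc
                # Skip the cell itself and anything outside the grid.
                if (dr or dc) and 0 <= r < self.nrows and 0 <= c < self.ncols:
                    found.append(r * self.ncols + c)
        return found


# A made-up 3 x 3 grid of elevations in metres.
dem = GridDEM(x0=500000.0, y0=6000000.0, spacing=25.0, ncols=3,
              z=[100.0, 102.0, 105.0,
                 101.0, 104.0, 108.0,
                 103.0, 107.0, 112.0])
print(dem.location(5))     # coordinates of the cell holding 108.0
print(dem.neighbours(4))   # indices of the cells around the centre cell
```

No coordinate is stored for any cell; both the position and the neighbourhood follow from the index and the grid spacing, which is the storage and retrieval advantage the text attributes to this structure.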
The major disadvantage of the regular grid is its inability to adapt to areas of varying relief complexity without changing the grid size; hence, there is a tendency toward data redundancy in areas of uniform terrain. This problem can be largely solved by the practice of 'progressive sampling' [Makarovic, 1973], in which stereoscopic aerial photographs are automatically or semiautomatically scanned at grids of increasing fineness in areas of complex relief; thus, a variable instead of uniform grid structure is used. This procedure directly links the terrain characteristics (i.e., the local roughness of the terrain relief) with the sampling density.

The TIN is a data structure designed by Poiker (formerly Peucker) and his co-workers [Peucker et al., 1978] for digital elevation modelling that avoids the redundancy of the regular grids and which at the same time is more efficient for many types of computation than systems that are based only on digitized contours. A TIN is a terrain model that uses a set of connected triangular patches based on the triangulation of irregularly spaced observation points (e.g., Delaunay triangulation). These irregularly distributed points are assumed to be sample points from a single-valued surface and they should be critical points on the surface such as peaks, pits, and passes along lines of high information content (e.g., ridges and channel lines) [Peucker and Chrisman, 1975]. The location of these "coordinate random, but surface specific" sample points would, therefore, be dictated solely by the surface being modelled [Peucker et al., 1978].

Discussion and comparison of different data structures for DTMs have been given by Boehm [1967], Peucker [1977, 1978], Peucker et al. [1978], Mark [1974, 1975a, 1978, 1979], and Kumler [1992]. Extensive research has also been done on the types of data structures that work best for operations on surfaces [Peucker, 1978].
Boehm, for example, compared the following five forms of topographic information storage: (i) contour points sorted on x; (ii) contour tree ordering; (iii) uniform grid; (iv) uniform grid, incremental altitude; and (v) variable-mesh grid, incremental altitude. Peucker [1977, 1978], Mark [1975a, 1978] and Kumler [1992, 1994] mainly compared the methods of regular grid DEM (surface-random sampling) and TIN (surface-specific sampling). Mark [1975a] and Peucker [1978] showed that rectangular grids need significantly more data points than TINs consisting of surface-specific points to be able to represent geomorphometric parameters to the same degree of precision.

Kumler [1992, 1994], however, concluded that for a wide range of different types of terrain, the DEM is not only the most efficient sampling scheme in the absence of prior terrain knowledge, but is also the most accurate for a given volume of digital storage. None of the TINs produced in his experiment represented the surface more accurately than a comparably sized DEM. In every study area he examined, the standard gridded DEM produced by the U.S. Geological Survey (USGS) yielded more accurate estimates of the surface elevation than any of the TINs derived from digitized contours or subsets of points sampled from DEMs. In a few cases, the differences were statistically insignificant, suggesting that a TIN is comparable to the DEM. However, in most cases the DEMs were clearly superior. These same conclusions were reached regardless of which error measure (i.e., mean, maximum, RMSE, or 90th percentile of error) was considered. In fact, it appears that the per-point overhead associated with the irregular TIN structure tends to outweigh the advantage of variable resolution. This reverses a longstanding presumption in the GIS field, and could weaken the argument for the TIN model significantly [Goodchild, 1992]. Even the author himself found the above conclusions surprising.
However, as mentioned earlier, Kumler only tested TINs derived from digitized contours or subsets of points sampled from DEMs. Since the TIN was clearly designed to be surface specific, a different conclusion might result if a DEM were compared to a TIN derived from the stereomodel itself or with prior knowledge of the terrain.

So why use gridded data? While gridded DTMs may not be, in some cases, an optimal source of topographic information, they are often the only source, as elevation matrices continue to be the predominant type of data distributed by many government mapping agencies. Although there exist different forms of DTMs or different data structures for DTMs, the regular grid DTMs (regular grid matrices of elevations) are the most easily obtainable and available form, and will continue to be for some time. This is because of the convenience of collecting topographic data using photogrammetric methods, and the ease with which matrices can be handled computationally, in particular for numerical analysis or modelling purposes in raster-based GIS and image processing systems. There is a shortage of algorithms for these kinds of purposes that can be conveniently implemented on irregular structures. Furthermore, the regular geometric structure of grids ensures predictable distances between points, and an even information distribution. This is especially valuable when distance and spatial context are to be considered in terrain classification analysis using DTMs [Weibel and DeLotto, 1988]. It is also valuable when data are being assembled for general purposes (as in topographic mapping) with no particular application in view, but many different ones anticipated. Therefore, only the regular grid DTMs will be considered later in this research.
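
As a hedged illustration of why a regular grid is convenient when spatial context matters, the following minimal sketch computes local relief (maximum minus minimum elevation) in a square moving window. The function name `local_relief`, the choice to leave edge cells undefined, and the small elevation matrix are assumptions made purely for illustration.

```python
from typing import List, Optional


def local_relief(z: List[List[float]], window: int) -> List[List[Optional[float]]]:
    """Local relief (max - min elevation) in a window x window neighbourhood.

    Cells whose window would extend beyond the grid are left as None
    (an assumed edge treatment; other treatments are possible).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    nrows, ncols = len(z), len(z[0])
    out: List[List[Optional[float]]] = [[None] * ncols for _ in range(nrows)]
    for r in range(half, nrows - half):
        for c in range(half, ncols - half):
            # On a regular grid the neighbourhood is just an index range.
            cells = [z[rr][cc]
                     for rr in range(r - half, r + half + 1)
                     for cc in range(c - half, c + half + 1)]
            out[r][c] = max(cells) - min(cells)
    return out


# Made-up 4 x 4 elevation matrix (metres), steepening toward one corner.
z = [[10.0, 12.0, 15.0, 20.0],
     [11.0, 13.0, 17.0, 24.0],
     [12.0, 15.0, 20.0, 30.0],
     [13.0, 18.0, 25.0, 38.0]]
relief = local_relief(z, 3)
for row in relief:
    print(row)
```

Because the spacing is constant, a window of a given size always covers the same ground area, so values computed at different cells are directly comparable.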
1.2.2  Digital Terrain Models in geographic modelling

Traditionally DTMs have been developed to aid in computerized mapping of topography, the determination of earthwork cut-and-fill volumes for road design and other surveying and civil engineering projects, and military applications. As a result they are usually limited to large scale (cartographically speaking) work with high density sampling of the terrain surface. Nowadays DTMs are being widely used in geographic modelling because of topography's predominant role in the processes being modelled and the increasing availability of digital topographic data. Examples of geographic models which include a DTM as a useful component include: the simulation of wind flow patterns over coastal and mountainous regions [Tesche and Bergstrom, 1978], geometric and radiometric correction of remotely sensed imagery [Wong, 1984], quantification of landslide-terrain types for geological hazard analysis [Pike, 1988a], energy balance simulation over rugged terrain [Dozier and Outcalt, 1979], the study of mesoscale runoff variability [Krasovskaia, 1988], and global climate modelling. These applications of DTMs to scales other than large scale make it necessary to extend the analysis of the metric performance of digital terrain representations to smaller scales.

Natural phenomena, such as topography, generally possess a property recently made prominent through the study of Mandelbrot's fractals [Mandelbrot, 1977, 1982]: the closer one looks, the more self-similar detail there is. The amount of resolvable detail is closely related to scale. Since the scales (in terms of spatial coverage) at which various geographic models are applied vary from global to regional, and the type of topography often varies from one area to another, the user needs to be able to know whether available digital topographic data adequately meet the needs of different modelling analyses.
1.2.3  Topographic data availability

The tremendous development of DTMs is also evidenced by the fact that an increasing number of digital topographic data sets are becoming available. Acquisition of machine-processable topographic data is the first step in the creation of a DTM. Digital topographic data may be obtained in a number of ways: from ground surveys, from photogrammetric stereomodels of either aerial photographs or SPOT satellite imagery, from existing topographic maps, or from other systems such as altimeters carried in aircraft or spacecraft and global positioning systems (GPS). In practice, the accuracy and cost aspects are the limiting factors [Blais, 1988]. Currently, direct photogrammetry from aerial photographs and digitizing from contour maps are the usual methods for DTM data collection [Aronoff, 1989]. The development of various new instruments has made some methods more accurate, less laborious, faster, and/or less expensive than others in capturing topographic data.

Worldwide, mapping organizations collect, produce and distribute digital topographic data covering either the entire world, a country, a province, a local municipality, or a specific region. Since terrain is almost unchanging and its components are simple and readily observable, worldwide cover could be achieved more easily than in most other thematic surveys. Nationwide DTM collection projects are currently being pursued in most industrialized countries [Weibel and Heller, 1990]. For instance, the National Geophysical Data Center (NGDC) in the United States has a variety of topographic data sets available for use in geoscience applications. The data were obtained from U.S. Government agencies, academic institutions, and private industries. The data coverage ranges from regional to worldwide; data collection methods range from digitization of existing topographic maps to satellite remote sensing (e.g., SPOT).
A selection of different digital topographic data providers or distributors around the world is given in Appendix A. Seven data providers and/or distributors are profiled: NGDC (the U.S. National Geophysical Data Center), USGS (the U.S. Geological Survey), IGN (the French national mapping agency), OS (the Ordnance Survey of the United Kingdom), EMR (the Department of Natural Resources of Canada), WSC (Water Survey of Canada), and TRIM (Terrain Resource Information Management program of British Columbia). It is apparent that each DTM is produced differently, not only in its data acquisition method or data structure, but also in its geo-referencing coordinate system and spatial resolution. As pointed out earlier, different resolutions are required for applications at different scales. The data providers listed here operate in territories of different size and probably characteristically serve different kinds of projects. Thus it is possible that their methods of data acquisition and quality control might vary accordingly. Therefore, it is fair to assume that each DTM would have a different accuracy and would suit a different research objective and operational mandate. Although the purpose to which the DTM will be applied should govern the resolution, accuracy, and precision of the DTM [Carter, 1988], it would appear that most DTMs reflect primarily producers' concerns, not the end-users' objectives. That is, the end-users have little say, in most cases, in the specifications of any digital data collection projects. Finally, it should also be noted that despite progress made in the mass production of large matrices and networks of terrain elevations, severe shortcomings in DTM quality remain, and DTM coverage of many parts of the world is either still lacking, of coarse resolution, or prohibitively expensive.

1.2.4  Defining accuracy and error

A DTM is only an approximation of the real world topographic surface.
As Miller's definition clearly indicates, a DTM is a digital representation of a continuous surface. It represents a topographic surface in terms of a set of discrete spatial coordinates obtained by sampling the surface. Assumptions about surface behavior between sampled points, needed for reconstructing the original surface from the sampled values, are inherent in the definition.

The amount of information transferred from the input data to the data reconstructed from sampled point data determines the fidelity of the digital terrain model: the greater the amount of information transferred, the higher the fidelity [Makarovic, 1972]. The variations or discrepancies between the original surface and the digital representation of that surface are often referred to as sampling error. How accurately a topographic surface is represented by a DTM essentially depends on the combined interaction of three influencing factors: the sampling density, that is, the size of the interval at which values are picked out in relation to the variability of the surface; the measuring error introduced by sampling an analogue quantity and converting it to digital form; and the interpolation method employed for reconstructing a continuous surface from the sampled discrete data [Tempfli, 1980]. That is,

$$f(x,y) \xrightarrow{\text{sampling}} f_t(x,y) \xrightarrow{\text{measuring}} g_t(x,y) \xrightarrow{\text{interpolation}} \hat{f}(x,y) \qquad (1.1)$$

Sampling (the determination of the sampling strategy), measuring (the determination of the discrete sample values) and interpolation (the reconstruction of the surface) transform $f(x,y)$ into $\hat{f}(x,y)$. The difference between $f(x,y)$ and $\hat{f}(x,y)$ is referred to as the error of reconstruction [Tempfli, 1980].

Sampling is the selection of part of an aggregate to represent the whole. A sampling process implements a compression of the input data. In the case of a profile, this process can be formulated as

$$f_t(x) = \sum_{k}^{n} f(k\,\Delta x)\,\delta(x - k\,\Delta x) \qquad (1.2)$$

where $\delta(x)$ is the unit impulse function and $\Delta x$ is the sampling interval.
Ideally, input data should be compressed in such a way that the reestablished information will not be reduced significantly. The reconstruction of the input should be essentially a reverse process of the compression. In practice, however, some loss of information is unavoidable and can usually be accepted.

The discrete signal measured at each sample point is $g_t(x) = f_t(x) + m_t(x)$, where $m_t(x)$ is the measuring error and is composed of various contributions such as instrumental errors and observational errors. Sampling error, as usually defined, includes the information loss due to sampling itself ($f_t$) and the error introduced by measurement ($g_t$).

Reconstruction of the input (i.e., the continuous topographic surface) is achieved by spatial interpolation, a procedure used to estimate the value of properties at unsampled points within the area covered by sampled points. The rationale behind spatial interpolation is the very common observation that, in general, points that are close together in space are more likely to have similar values than points further apart. This is also known as Tobler's First Law of Geography [Tobler, 1970]. Many different interpolation methods have been devised. Most interpolation methods are linear with respect to their parameters and are space invariant. They can be formulated, for an infinite record length, as a convolution of the sampled values with a weighting function (1.3). The interpolation is characterized by its 'weighting function' $a(t)$. Instead of using the weighting function in the spatial domain, the linear, space invariant system can also be described in the frequency domain by its 'transfer function' (also called 'frequency response') $A(v)$, the Fourier transform of $a(t)$. The advantage of this transition from the spatial domain to the frequency domain is that the relationship between the system input and output can be defined by a multiplication instead of the convolution in the spatial domain.
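
The chain of sampling and interpolation described above can be illustrated with a small numerical sketch for a profile. It rests on several assumptions that are not from the source: the test profile (two superimposed sinusoids) is invented, the measuring error $m_t$ is set to zero, the samples are taken at $x = k\,\Delta x$ for $k = 0, \ldots, n$, and the interpolator is plain linear interpolation written as a weighted sum $\sum_k g_t(k\,\Delta x)\,a(x - k\,\Delta x)$ with the triangular weighting function $a(t) = \max(0,\, 1 - |t|/\Delta x)$. All function names are hypothetical.

```python
import math
from typing import Callable, List


def profile(x: float) -> float:
    """Invented terrain profile: a long swell plus a shorter ripple (metres)."""
    return (50.0 * math.sin(2.0 * math.pi * x / 1000.0)
            + 5.0 * math.sin(2.0 * math.pi * x / 120.0))


def sample(f: Callable[[float], float], dx: float, n: int) -> List[float]:
    """Sampling: keep f only at x = k*dx, k = 0..n (no measuring error)."""
    return [f(k * dx) for k in range(n + 1)]


def triangle(t: float, dx: float) -> float:
    """Weighting function a(t) of linear interpolation."""
    return max(0.0, 1.0 - abs(t) / dx)


def reconstruct(samples: List[float], dx: float, x: float) -> float:
    """Interpolation as a weighted sum of the samples: sum_k g_k * a(x - k*dx)."""
    return sum(g * triangle(x - k * dx, dx) for k, g in enumerate(samples))


def reconstruction_rmse(dx: float, length: float = 1000.0, step: float = 1.0) -> float:
    """RMSE between the profile and its reconstruction, checked every `step` metres."""
    n = int(round(length / dx))
    samples = sample(profile, dx, n)
    m = int(round(length / step))
    squared = [(reconstruct(samples, dx, j * step) - profile(j * step)) ** 2
               for j in range(m + 1)]
    return math.sqrt(sum(squared) / len(squared))


for dx in (10.0, 50.0, 100.0):
    print(f"sampling interval {dx:6.1f} m -> reconstruction RMSE "
          f"{reconstruction_rmse(dx):.3f} m")
```

In this sketch the error of reconstruction grows as the sampling interval approaches and then exceeds half the wavelength of the shorter ripple, which is one concrete way to see how sampling density interacts with the variability of the surface.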
In the absence of a measuring error, sampling and interpolation can be combined into a single linear (but not space invariant) system, and its transfer function can be computed as the ratio of output to input signal amplitudes [Tempfli, 1980].

Research has shown that a complex interpolation does not significantly compensate for the information lost through sampling, but may increase the cost. The loss of information depends primarily on the density of sampling. The information which has been lost through sampling cannot be significantly regained by interpolation.

The word 'error' is used here in its widest sense to include not only 'mistakes' but also the statistical concept of error, meaning 'variation' [Burrough, 1986] or 'uncertainty.' Two major types of error may be recognized: random and systematic. Random errors are unpredictable variations on either side of the 'true' value. Systematic errors, on the other hand, are consistently biased either above or below the true value. Blunders, a third type of error, are gross errors that are usually easily detected and are assumed to be an insignificant source of error in the context of this thesis.

'Error' is closely related to the concepts of 'accuracy' and 'precision.' Accuracy can be defined in terms of the magnitude of the difference between the reported value and the true value, or the value accepted as being true. For data to be accurate, they must be unbiased. The bias clearly represents undetected systematic error. Precision has different meanings with respect to different levels of measurement (i.e., nominal, ordinal, interval and ratio) [Klinkenberg and Xiao, 1990].

For interval/ratio data there is a statistical definition for precision which is usually a measure of the dispersion about the sample mean for replicate observations. This definition of precision is sometimes referred to as relative precision [Kirby, 1985].
In mathematics or computer science, precision refers to the ability to measure or represent data with a certain number of significant digits, a number which should be commensurate with the data collection process and the nature of the underlying phenomenon. This form of precision is sometimes referred to as absolute precision and is the one used for DTMs. It is clear that with DTMs both high accuracy and absolute precision are required in order to have a good quality product. For nominal/ordinal data, such as land-use information, precision defines 'detail of classification' [Campbell, 1987]. This definition is obviously irrelevant to DTMs.

Following the above definitions, DTM accuracy refers to the degree to which a DTM differs from the real surface which it supposedly represents. Understanding DTM errors requires not only an accuracy measure for each individual point but also an evaluation of the spatial variation of the inaccuracy. Only by looking at the distribution of the inaccuracy can we quantify the nature of the error. A DTM may be said to be "inaccurate" with respect to a specific point on the surface if the information presented is different from ground truth. This is true with respect to an assigned location and in a non-statistical sense. If sampling repeatedly over the surface reveals errors of a similar sort, that is, if the errors are revealed to be systematic, then the surface represented by the DTM is said to be inaccurate or, more specifically, biased. On the other hand, if the errors are revealed to be random, then it would be more correct to refer to the DTM as relatively imprecise.

It is important that the DTM, in whatever form, should provide an accurate representation of the real terrain for the intended application.
It is difficult, however, to test how accurately any digital model actually captures the real world, for normally we do not have any independent model of the real world against which to test our digital model [Carter, 1988]. The only 'true' situation in the context here is the terrain surface itself, and since this condition of 'absolute' accuracy cannot be attained by measurement, the accuracy of any field survey data, photogrammetric measurement or completed map can be assessed only by making comparisons with measurements made to a known higher order of accuracy. Thus, when defining accuracy, in most cases this is strictly relative rather than absolute accuracy [Shearer, 1990]. Perhaps because of this there are only a few developed procedures to account for errors in a DTM.

1.3  Statement of the Problem

At present, there exist few geographic information systems which routinely estimate and report measures of the error inherent in any of their products [Goodchild and Wang, 1988; Goodchild, 1992]. Along with many other automated processes, there seems to be a tendency to lose sight of the quality and accuracy of the original data and the effects of subsequent manipulation on those data [Theobald, 1989]. Only recently have the problems of error analysis attracted the attention they deserve. For example, the accuracy of spatial databases was chosen as the subject of the first research initiative of the National Center for Geographic Information and Analysis (NCGIA) in the United States [Goodchild, 1989]. It has been realized that the error pattern of a digital product is important, both for the information it yields itself and as input into more advanced manipulation and analysis. Inaccurate products can lead to false inferences, bad decisions, and even litigation [Goodchild, 1988]. DTMs are usually a fundamental component of the spatial databases in most GISs and they are used in a very wide range of geographic analyses.
Therefore, there is a need to develop a set of procedures to assess, as much as possible, the performance of DTMs in representing and characterizing the actual topographic surface so that the much broader question of error analysis in spatial data can be fully addressed.

Some previous research on DTMs did address the issue of accuracy, and the issue has been dealt with by many other authors [Bethel and Mikhail, 1983; Carter, 1989; Caruso, 1987; Felgueiras and Goodchild, 1995; Fisher, 1991; Frederiksen, 1981; Hannah, 1979; Leberl, 1973; Li, 1991; Shearer, 1990; Tempfli, 1980; Tempfli and Makarovic, 1979; Torlegard et al., 1986; Wehde, 1982]. Classically, the accuracy of a DTM is assessed by comparison of height values derived from the DTM with height values of corresponding check points obtained by measurement of the terrain surface to a known higher order of accuracy. For example, a DTM derived from digitized contours could be checked by measurements made by field survey or photogrammetric measurement. The data obtained from such a comparison will consist of height differences at the tested points which may then be analyzed to yield statistical expressions of the accuracy, such as the root-mean-square error or the standard error [Shearer, 1990]. The accuracy measures were expressed as either an RMSE of the vertical measurement of height or, following the specification of topographic map accuracy standard for contour lines and spot elevations, by stating, for example, that "at least 90 percent of all elevations determined from the contours shall be accurate within one-half the contour interval, and the remaining 10 percent shall be accurate within one contour interval." This kind of accuracy specification is useful for applications in automated mapping and surveying and civil engineering, in which the scales are usually large (1:1,000 to 1:50,000).
But in geographic modelling applications, scales are relatively small and many other aspects of topography are of interest as well. So the issue of the 'level of detail' becomes important to the user. The map can be checked and found to be within tolerance (the so-called required accuracy), but a specific user still does not know whether or not it fits the needs of a particular analysis [Ostman, 1987]. Therefore, the concept of spatial resolution is important in error/accuracy analysis because a model could be useless because of inadequate detail for the particular purpose [Klinkenberg and Xiao, 1990] or it could be locally inaccurate but regionally accurate. A DTM error model that goes beyond the global measures now available is therefore needed.

Accuracy of a DTM entails not only the average departure of points in the DTM from the real ground surface, but also the distribution and the non-random spatial component of errors, which have rarely been formally examined. It is difficult to extract much meaning from a single global accuracy measure such as RMSE because data uncertainty will almost always vary spatially across the elevation surface. That is, the nature of the error will determine the usefulness of the DTM for a particular purpose. Furthermore, the spatial variation of error is influenced not only by the characteristics of the topography being modeled but also by the algorithm that produces the elevation model [Wood and Fisher, 1993]. For example, the types of errors associated with interpolating photogrammetrically-derived spot elevations probably differ from those produced by interpolating contour data [Carter, 1989; Hannah, 1979; Wood and Fisher, 1993].

Whether or not a DTM is acceptable for a given application depends upon the objective of the research and the precision required, as well as the resolution of the sampling method and its sensitivity to the variability of the terrain [Theobald, 1989].
So, a selected spatial resolution should reflect a desired level of detail guided by the actual geographic detail of the phenomenon and the application. According to the sampling theorem, terrain variations with a wavelength less than twice the sampling interval are not represented (the frequency corresponding to a wavelength of exactly twice the sampling interval is called the Nyquist frequency). Therefore, a feature can be detected only if the sampling interval is no more than half the size of the smallest feature to be detected. Specification of the Nyquist frequency presupposes that the structure of the perturbation in question is known (i.e., usually, in theory, sinusoidal). In terrain, the features are apt to be irregular. In practice, then, more than the theoretical minimum information is required to characterize them. This implies that one must know the spatial size of the features in which one is interested before one starts to collect data. Additionally, one cannot expect a collection of digital geographic data to be suitable for all kinds of problems [Tobler, 1988]. The interaction between sampling frequency and terrain variation is, thus, important and needs to be understood because terrain features captured at a certain resolution are dependent upon the relative prominence of large- and small-scale topographic features.

In summary, it is evident that DTMs are becoming more and more important in various geographic modelling applications and digital topographic data are becoming more readily available. Meanwhile, there is a growing realization that error analysis is important in determining the usefulness of a DTM. However, current measures of DTM inaccuracy are inadequate. Therefore, there is a need to develop new methods of investigating the error in DTMs: methods that are spatially explicit and relate the spatial pattern of error to the characteristics of topography.
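The sampling-theorem argument made earlier in this section can be illustrated with a short sketch for the idealized sinusoidal case. The function names and the 40 m and 30 m figures are hypothetical, chosen only for illustration. Real terrain is irregular, so this shows the theoretical limit rather than a practical rule for choosing a grid spacing.

```python
import math


def is_resolved(wavelength, dx):
    """True if a sinusoid of this wavelength is longer than twice the interval dx."""
    return wavelength > 2.0 * dx


def apparent_wavelength(wavelength, dx):
    """Wavelength at which a sinusoid appears once sampled at interval dx.

    The true spatial frequency is folded back into the band below the
    Nyquist frequency 1 / (2 * dx); resolved wavelengths are unchanged.
    """
    f = 1.0 / wavelength
    fs = 1.0 / dx
    f_alias = abs(f - round(f / fs) * fs)
    return math.inf if f_alias == 0.0 else 1.0 / f_alias


if __name__ == "__main__":
    # Hypothetical ridges spaced 40 m apart, sampled on a 30 m grid.
    print(is_resolved(40.0, 30.0))          # False: 40 m is less than 2 * 30 m
    print(apparent_wavelength(40.0, 30.0))  # approximately 120 m
    # The samples of the 40 m wave coincide with those of an inverted 120 m wave.
    for k in range(6):
        x = 30.0 * k
        print(round(math.sin(2 * math.pi * x / 40.0), 6),
              round(-math.sin(2 * math.pi * x / 120.0), 6))
```

In this idealized case the 40 m feature is not merely lost: it reappears in the sampled data as a spurious 120 m undulation, which is one reason the feature size of interest has to be known before data are collected.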
1.4  Thesis Outline

This introductory chapter has defined the problem and provided some background information on digital terrain modelling. The rest of the thesis is organized as follows:

Chapter 2 gives a literature review on previous research on DTM error modelling and accuracy estimation. Sources of error in DTMs are first identified and some gross-error detection and correction techniques are introduced. Then, common terminologies used in DTM error modelling are described and general empirical and analytical accuracy estimation approaches are reviewed.

Chapter 3 describes the collection of some digital topographic data and the implementation of a true multi-scale DEM database for a study area chosen for this research. Some pre-processing such as data extraction, conversion and interpolation is done for the proper registration of the multiple data sets. A number of observations are made regarding DEM errors and their spatial distribution based on some preliminary test results of comparing DEMs of differing resolutions. This chapter concludes with the statement of the thesis hypothesis and prepares for further exploration of the relation between DEM errors and terrain characteristics.

Chapter 4 discusses the methodologies used for DEM error analysis. First, characterization of the variation and complexity of terrain is discussed. Both local and global characteristics are examined. Local characterization is made by means of general geomorphometric parameters such as local relief and slope. Global descriptions are made using grain measures, spectral analysis, nested analysis of variance, and fractal analysis of DEMs. Then, a multivariate statistical analysis method based on local roughness measures for automated hierarchical terrain classification is introduced.
Chapter 5 shows the results of the topographic characterization (both global and local) and classification of the study area surfaces and provides interpretations of the characterization results.

Chapter 6 presents a new approach to the problem of DEM error modelling. Based on the results from Chapter 3 (DEM errors) and Chapter 5 (topographic characterization), quantitative relations between the extent and the spatial pattern of DEM errors, spatial resolution, and the terrain characteristics are analyzed. The hypothesis proposed in Chapter 3 is tested in this chapter.

Chapter 7 concludes the thesis with a summary of the methods presented, the study results and recommendations for future research.

2.  ERROR IN DIGITAL TERRAIN MODELS

2.1  Overview

Any particular point in a DTM may not be actually at the elevation recorded for it, and the sources of this error may be multitudinous [Fisher, 1991]. To evaluate the accuracy of a DTM requires consideration of how the DTM is created.

The DTM created directly from an existing topographic map can be no more accurate than the source map from which it was derived [Carter, 1988]. Sources of inaccuracy include the original survey by field workers or photogrammetrists, the expertise of the cartographer who generated the map, and of the digitizer operator who converted the contour from analogue to digital form, or, if generated directly from aerial photographs, of the photogrammetrist. The error may be caused by faulty calibration of the measuring instrument, limited precision of the data format, human mistakes in the reading of the instrument and the copying of the figures, or poor interpolation. Therefore, in generating a DTM from a topographic map by digitizing, there are at least three stages at which error may be introduced: map compilation, DTM generation from the map, and comparison of DTM elevations with those directly measured from the map.
The last of these and a combination of the first two also occur if the DTM is generated directly by photogrammetry [Fisher, 1991].

Error in DTMs is widely acknowledged, and has been the subject of some studies. The rest of this chapter will summarize the previous research in the areas of error detection and rectification in digital terrain models, and DTM accuracy estimation and error modelling.

2.2  Error Detection and Rectification in Digital Terrain Models

2.2.1  Errors in USGS gridded DEMs

As mentioned in section 1.2.3, the USGS is one of the two major DEM data producers for the United States. Because of the wide use and study of USGS DEM data sets, errors detected in these data sets are specifically discussed in this section. The USGS DEM User's Guide specifies that the root-mean-square error (RMSE) should be included with all their DEM data [USGS, 1987]. Accuracy testing of DEMs by the USGS consists of comparing the known elevations of at least twenty control points (usually acquired from a map) to the elevations of these points as interpolated from the DEM. The RMSE is then calculated in the Z dimension. The RMSE is defined as

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \delta_{z_i}^{2}}{n}} \tag{2.1}$$

where $\delta_{z_i}$ is the elevation difference at each of the n individual test points.

In addition, it should be noted that most USGS source maps are commonly stated as conforming to the National Map Accuracy Standard, which states that "at no more than 10 percent of the elevations tested will a contour be in error by more than one half the contour interval," as established by comparison with survey data. Based on the source of the DEM and, in part, on the RMSE calculation, the USGS identifies three levels of DEM quality [USGS, 1986].

• Level I DEMs are of the lowest quality and contain no points with elevations in error by more than 50 m. The maximum RMSE permitted for the whole elevation model is 15 m.
DEMs derived from profiling high-altitude aerial photography, such as was done with the Gestalt Photo Mapper II (GPM-2) instruments, typically fall within this level.
• Level II DEMs have a maximum RMSE of 7 m (half a contour interval) and contain no points with elevations in error by more than twice the contour interval of the source map. DEMs acquired by contour digitizing typically fall within this level.
• Level III DEMs have a maximum RMSE of 7 m and contain no points with elevations in error by more than the contour interval of the source map. Digital Line Graph (DLG) DEMs, which incorporate hypsographic and hydrographic data, fall within this level.

Caruso [1987] discusses the standards employed by the USGS to evaluate the accuracy of 7.5-arc-minute quadrangle gridded DEMs. The USGS classifies errors in their DEMs into three categories. 'Blunders' are gross errors that sneak into DEMs at creation. They should be easily detected and are, therefore, usually removed from the DEM in the editing stage prior to general release and rarely get into a published DEM. 'Systematic errors' are non-random errors associated with specific procedures that introduce biases and artifacts into the DEM. An example is 'striping,' an artificially high level of spatial autocorrelation in elevation values along one axis of the DEM. Although such systematic errors are often easily detected, they are not always correctable. 'Random errors,' the third error category, result from measurement error and, unlike systematic errors, reduce precision but do not introduce bias.

Carter [1989] also identifies a number of different basic types of error in USGS gridded DEMs but presents an alternative taxonomy. He defines 'relative' and 'global' error based on the extent of the error.
The former refers to a situation in which a number of single elevation values are obviously inconsistent relative to their neighbors which, as a group, give an adequate representation of the surface. The latter, global errors, are thought of as those situations where the general form of the land surface is adequately defined by the digital data, but the total model departs significantly from the source map or the actual land surface. Global errors are particularly problematic when matching neighboring 7.5-arc-minute DEMs. The author identifies four types of relative error common in USGS DEMs and demonstrates how these are often difficult to detect and correct.

The United Kingdom's Ordnance Survey often supplies the RMSE of its digital contour data as well as the likely RMSE of interpolation. Stereo imagery from SPOT Image Corporation is now capable of supporting the generation of a DEM as a standard product on a 10 m grid [Gugan and Dowman, 1988]. Studies have shown the error in these products is less than 10 m RMSE in all three dimensions [Swann et al., 1988].

The implication of the error reporting used by the USGS, the United Kingdom's OS, and, in fact, probably all mapping organizations, is that the error at any point occurs independently of that at any other point. That is, aspatial statistical measures form the basis of their error reporting procedure. This is obviously not the case: 'striping' and Carter's 'relative errors', as identified above, are two examples where the error is apparently spatially related.

2.2.2  Gross-error detection and correction

Hannah [1979] presents a method for detecting and correcting gross errors in DEMs produced by computer correlation of stereo images, one of the typical methods of matching pairs of corresponding image points for a stereoscopic view of the terrain.
The basic problem to be solved in automatic digital stereo mapping is the determination of the precise geometric positions of corresponding points (features) on the focal planes of the stereo pair. Once these points have been matched up, it is a straightforward process to photogrammetrically intersect the corresponding rays through the points using the appropriate sensor model to produce digital terrain data in terms of XYZ coordinates [Panton, 1978]. Algorithms have been developed by Hannah [1979] to detect and correct errors resulting from having mismatched sub-areas of the two images, a problem which can occur for a variety of image- and terrain-related reasons, including sensor noise, low contrast in portions of the images, relief-induced distortions between the images, and the presence of ambiguities as a result of identical objects or highly periodic textures on the terrain. Based on the assumption that any data points causing sharp discontinuities in the elevations or sudden changes in the surface slopes can be suspected of being errors, the algorithms focus on the use of constraints on both the allowable slope and the allowable change in slope in local areas around each point. Relaxation-like techniques are employed in the iteration of error detection and correction. This method succeeds in identifying and correcting gross errors. However, problems may be encountered along steep ridge lines and an extreme level of surface smoothing sometimes occurs.

Norvelle [1992] presents some techniques used in the U.S. Army Engineer Topographic Laboratories for enhancing the accuracy of DEMs extracted automatically from stereo images using digital correlation methods. Window shaping operations are performed within the image correlation process to reduce the size and shape differences between small corresponding areas on stereo images and contribute to a more accurate determination of stereo image correspondence and, consequently, DEM values.
A new DEM correction technique, the Iterative Orthophoto Refinements (IOR) method, is also introduced. It can be used to edit and correct DEM values based on the geometric relation between pairs of orthophotos and the DEM used to produce them.

Bethel and Mikhail [1983] discuss a method for detecting gross errors, or blunders, in DTMs based on mathematical modelling of the terrain surface by tensor-product B-splines. The authors view this procedure as the first stage of an on-line quality assessment system for DTMs. The procedure is based on fitting the tensor-product of two one-dimensional B-splines locally over the DTM. Residuals are then computed and a statistical test is performed to yield an overall assessment of the presence of outliers (i.e. gross errors) in the DTM. Specific outliers are then identified in a candidate subgroup of the residuals and flagged as gross errors. Tests performed on a set of synthetic surfaces generated using known mathematical functions perturbed with pseudo-random numbers of known variance and actual DTMs reveal that this approach is especially effective in the case of multiple blunders of relatively large magnitude.

2.3  Accuracy Estimation and Error Modelling

2.3.1  Global accuracy measure

2.3.1.1  Terminology

As mentioned before (section 1.3), most DTM quality evaluation techniques involve estimating a global accuracy with regard to a reference. In empirical tests on the accuracy of DTMs, a set of check points is used as the 'ground truth' and then the points derived from the constructed DTM surface are checked against the corresponding check points. After that, the differences of the two heights at each test point are obtained. These differences are used to compute statistical values such as the mean difference and standard deviation of the differences which are used as a measure of DTM accuracy [Li, 1991].
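The empirical check-point procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function name `accuracy_statistics` is hypothetical, the residual is taken as DTM height minus check-point height, and every statistic divides by n as in equation (2.1).

```python
import math


def accuracy_statistics(dtm_heights, check_heights):
    """Global accuracy statistics from paired DTM and check-point heights.

    Assumptions: residual = DTM height - check height, and all sums are
    divided by n (the large-sample convention).
    """
    residuals = [d - c for d, c in zip(dtm_heights, check_heights)]
    n = len(residuals)
    mean = sum(residuals) / n                       # signed mean: reveals bias
    mean_abs = sum(abs(v) for v in residuals) / n   # average magnitude
    rmse = math.sqrt(sum(v * v for v in residuals) / n)
    sd = math.sqrt(sum((v - mean) ** 2 for v in residuals) / n)
    return {"mean": mean, "mean_abs": mean_abs, "rmse": rmse, "sd": sd}


if __name__ == "__main__":
    # Hypothetical heights in metres at four check points.
    stats = accuracy_statistics([102.0, 98.0, 101.0, 105.0],
                                [100.0, 100.0, 100.0, 100.0])
    print(stats)
```

For these made-up values the residuals are 2, -2, 1 and 5 m, so the signed mean is 1.5 m and the standard deviation is 2.5 m, while the RMSE is about 2.9 m. Under the divide-by-n convention the squared standard deviation equals the squared RMSE minus the squared mean, which shows how a single RMSE figure mixes systematic and random components.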
Shearer [1990] provides the following terminology, given the height differences as represented by $v_i$ (the residuals at the n individual test points being $v_1, \ldots, v_n$):

• Algebraic mean
The mean is computed as follows:

$$\bar{v} = \frac{\sum_{i=1}^{n} v_i}{n} \tag{2.2}$$

This expression takes into account the sign of the residuals, and will tend to zero if there are similar magnitudes of positive and negative values. If the differences are truly random, a result of $\bar{v} = 0$ would give no indication of what may be quite large differences in height at individual points. On the other hand, if a significant positive or negative value resulted from this computation, this would indicate that there was a systematic component in the residual values, which means that one surface was systematically higher or lower than the other (i.e., biased). The three possible situations which may result from the comparison of height values in a DTM with 'true' heights on the terrain surface are illustrated in Figure 2.1. The diagrams show profiles only.

Situation A indicates that one surface is uniformly higher than the other. Situation B illustrates quite random values of $v$ in terms of both magnitude and sign. In the case of situation C, the magnitude of $v$ varies quite considerably but the majority of differences are in one direction, which reflects a combination of random and systematic effects.

Figure 2.1  Comparison of DTM surfaces with the 'true' terrain surface.

• Mean absolute error
The mean absolute error is computed in the same way as for the algebraic mean, but the sign of the residuals is ignored. That is:

$$ME = \frac{\sum_{i=1}^{n} |v_i|}{n} \tag{2.3}$$

Because the sign is not considered, the result will never be negative and will reflect the average magnitude of the residuals. Shearer [1990] notes that "50% of the residual values will lie in the range -ME to +ME."
• Root-mean-square error
The problem of having positive and negative values cancel each other out is also obviated through the use of the root-mean-square error (RMSE):

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} v_i^{2}}{n-1}} \tag{2.4}$$

Properly, (n-1) rather than n is used in this case because in dividing by (n-1) rather than n, an unbiased estimate of population parameters is obtained. However, some authors use n because if the number of points tested is large, the difference in the result will be of no significance. Assuming $\bar{v} = 0$ and that there is a normal distribution, then 68.27% (approximately two-thirds) of the residual values will fall in the range -RMSE to +RMSE. The term standard error is frequently employed in mapping to describe this expression of accuracy.

• Standard deviation
The standard deviation of residuals is also a commonly used statistical expression, and is computed as follows:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (v_i - \bar{v})^{2}}{n-1}} \tag{2.5}$$

Again, when the number of tested points is large, the above expression can be, and often is, rewritten in the form

$$S = \sqrt{\frac{\sum_{i=1}^{n} (v_i - \bar{v})^{2}}{n}} \tag{2.6}$$

or

$$S = \sqrt{RMSE^{2} - \bar{v}^{2}} \tag{2.7}$$

It is clear from the above that if $\bar{v} = 0$, then S = RMSE, and the root-mean-square error can replace the standard deviation. This explains the confusion which may arise from the fact that both root-mean-square error and standard deviation are often referred to as the standard error in the context of mapping accuracy (see below).

2.3.1.2  Topographic map accuracy standards

As DTMs are often derived from topographic maps, some topographic map accuracy standards need to be reviewed. There are two principal methods of representing height information in conventional mapping, both of which may be used as input data for DTMs: spot heights and contours. In both cases, the accuracy has to be considered in terms of both horizontal position and vertical height value.
For example, a spot height may be incorrectly positioned but correct in height, correct in position but incorrect in height, or, most likely, incorrect in both position and height [Shearer, 1990].

Spot heights are normally assessed with respect to accuracy in terms of RMSE values related to horizontal position and height. Contours are linear features, and their accuracy is not quite so simple to define as point-related data. The common practice is to establish a tolerance (RMSE) value for permissible errors in contours, and then to check that points on the map fall within such a tolerance. The accuracy specified is determined with reference to the interrelated factors of scale, terrain slope and vertical interval, and is based on the RMSE values established for points. The normally accepted standard for contours is about three times that which can be attained for measured points. It is normally expressed in terms of probability related to the vertical interval. For example, "90% of tested points should be within one-half of the vertical interval of their true value."

In the United States, the definitive accuracy standard for topographic maps is the National Map Accuracy Standard (NMAS), which is currently applied to the USGS topographic map series. This standard is based on compliance with a horizontal and a vertical accuracy standard which defines the limit of acceptable error in the horizontal and vertical map dimensions. Compliance testing is based on a comparison of at least twenty well-defined map points relative to a survey of higher accuracy. The term "well-defined points" pertains to features that can be sharply identified as discrete points. The horizontal accuracy standard states that, at most, ten percent of the map points tested may have a horizontal error greater than 1/30 inch for map scales smaller than 1:20,000, or 1/50 inch for scales of 1:20,000 or greater.
The vertical accuracy standard states that at most ten percent of the map points tested may have a vertical error greater than one-half of the contour interval of the map. In checking elevations taken from the map, the apparent vertical error may be decreased by assuming a horizontal displacement within the permissible horizontal error for a map of that scale [Thompson and Davey, 1953].

The American Society of Civil Engineers [1983] has proposed the Engineering Map Accuracy Standard (EMAS) as an alternative to NMAS for large-scale maps. EMAS gives a statistical expression of map accuracy based on errors in the XYZ coordinates of at least twenty well-defined and well-distributed sample points. The mean and the standard deviation (referred to as the mean error and the standard error respectively) are calculated for each of the x, y and z dimensions. Compliance testing for EMAS is performed by comparing the computed mean errors and standard errors in the x, y and z dimensions to their respective maximum acceptable limits (referred to as the 'limiting' errors). For each dimension, a t-test is performed for the degree of bias and a $\chi^2$-test for the degree of imprecision. Specific values for the limiting mean error and the limiting standard error are not provided as they are assumed to be application-specific. EMAS is intended to facilitate accuracy testing for a variety of special-purpose maps. However, the limiting horizontal and vertical standard errors can be computed from the horizontal and vertical map accuracy standards of NMAS. The limiting vertical standard error is defined by

$$s_z = 0.608\,VMAS \tag{2.8}$$

where VMAS = the vertical map accuracy standard in Z (i.e. one-half the contour interval). The limiting horizontal standard error is defined by

$$s_x = s_y = 0.466\,CMAS \tag{2.9}$$

where CMAS = the circular (or horizontal) map accuracy standard in X, Y expressed at full (ground) scale (i.e.
the appropriate fraction, 1/30 inch or 1/50 inch, weighted by the denominator of the scale representative fraction).

The constants in equations (2.8) and (2.9) are derived from the 90th percentile of a univariate and a bivariate normal distribution respectively [Rosenfield, 1971], the two major types of distribution used in mapping. Table 2.1 lists the probability of errors for different multiples of the standard deviation in the univariate case. Tables 2.2a-b give the probability of errors in the bivariate case using two different methods: the mean square error of position (MSEP) and the circular standard error (CSE). The information in these tables is derived from cumulative standard normal distribution table values, which assume uncorrelated error and are available in most statistics books. From Table 2.1, for example, it can be seen that a multiple of 1.645 standard deviations gives a ninety percent confidence level in the univariate case, whereas from Table 2.2b it can be seen that, in the circular bivariate case, a multiple of 2.146 standard deviations gives a ninety percent confidence level. The constants in equations (2.8) and (2.9) are the reciprocals of these two multipliers (1/1.645 ≈ 0.608 and 1/2.146 ≈ 0.466).

These relations are used to define three classes of map accuracy by the American Society of Photogrammetry [1985], whose proposed accuracy standard for large-scale maps was similar to EMAS until changed by USGS. Merchant [1987] gives a more detailed discussion of the American Society of Photogrammetry standard. The limiting vertical standard error for Class 1 maps is defined in accordance with the NMAS vertical accuracy standard (i.e. one-half the contour interval). The limiting horizontal standard error is more stringent than the NMAS horizontal accuracy standard, and corresponds approximately to a CMAS of 0.54 mm (equivalent to about 1/47 inch) expressed at full (ground) scale of the map.
The limiting standard errors for lower-accuracy Class 2 and 3 maps are defined by multiplying the Class 1 limiting standard error by a factor equal to the accuracy class. That is:
• A Class 2 map is one with a limiting RMSE twice that of a Class 1 map.
• A Class 3 map is one with a limiting RMSE three times that of a Class 1 map.

Table 2.1  Probability of errors in the univariate case

Multiple of standard deviation    % of points within this multiple (probability)    Common term
1.000                             68.27                                             Standard error
1.645                             90.00                                             Linear map accuracy standard
2.570                             99.00
3.000                             99.73                                             'Maximum' (near certainty) error

Table 2.2a  Probability of errors in the bivariate case (MSEP method)

Multiple of standard deviation    % of points within this multiple (probability)    Common term
1.000                             63.21                                             Mean square error of position
1.520                             90.00                                             Mean square error of position
2.140                             99.00                                             (Rejection level)
2.470                             99.78                                             Mean square error of position

Table 2.2b  Probability of errors in the bivariate case (CSE method)

Multiple of standard deviation    % of points within this multiple (probability)    Common term
1.000                             39.35                                             Circular standard error
2.146                             90.00                                             Circular map accuracy standard
3.035                             99.00                                             (Rejection level)
3.500                             99.78                                             Circular near certainty error

Merchant [1987] presents a revised version of the American Society of Photogrammetry standard, which is referred to as the American Society of Photogrammetry and Remote Sensing (ASPRS) spatial accuracy specification for large scale topographic maps. The revised standard expresses accuracy in terms of a limiting RMSE in each dimension. Hence the systematic bias associated with the mean error is not removed in computing map accuracy. Hypothesis testing for precision is performed by comparing the calculated RMSE to the limiting horizontal or vertical RMSE.
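
The multipliers in Tables 2.1, 2.2a and 2.2b, and the constants in equations (2.8) and (2.9), can be checked numerically. The following Python sketch is an illustration only and is not taken from the cited standards. It assumes zero-mean, uncorrelated normal errors with equal standard deviations in X and Y, so that the radial error follows a Rayleigh distribution; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def univariate_probability(k):
    """P(|error| <= k * sigma) for a zero-mean normal error (Table 2.1)."""
    return 2.0 * NormalDist().cdf(k) - 1.0


def circular_probability(k):
    """P(radial error <= k * sigma_c) for a circular normal error (Table 2.2b).

    Assumes equal, uncorrelated standard deviations sigma_c in X and Y.
    """
    return 1.0 - math.exp(-0.5 * k * k)


def msep_probability(k):
    """P(radial error <= k * sigma_p), with sigma_p = sqrt(2) * sigma_c (Table 2.2a)."""
    return 1.0 - math.exp(-k * k)


# Multipliers that enclose 90 % of the errors.
k_linear = NormalDist().inv_cdf(0.95)            # two-sided 90 % interval
k_circular = math.sqrt(-2.0 * math.log(0.10))    # Rayleigh 90th percentile

print(f"linear 90% multiplier   : {k_linear:.3f}")
print(f"circular 90% multiplier : {k_circular:.3f}")
print(f"constant in (2.8)       : {1.0 / k_linear:.3f}")
print(f"constant in (2.9)       : {1.0 / k_circular:.3f}")
```

Under these assumptions the two multipliers agree with the 90% rows of Tables 2.1 and 2.2b, and their reciprocals agree with the constants used for the limiting standard errors.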
The limiting horizontal RMSE is computed in the same fashion as the limiting standard errors in the American Society of Photogrammetry standard. However, the limiting vertical RMSE is somewhat more stringent, and is defined as one-third of the contour interval for well-defined points.

Another alternative to NMAS for specifying the accuracy of height representation on maps is provided by Koppe's formula, which accounts for the effects of terrain slope on mean vertical or horizontal error [Shearer, 1990]:

$$ M_v = \pm (A + B \tan\alpha) \qquad (2.11) $$

$$ M_h = \pm (B + A \cot\alpha) \qquad (2.12) $$

where M_v is the mean vertical error, M_h is the mean horizontal error, and α is the slope angle. Coefficients A and B are empirically derived constants for a particular map in relation to the scale and accuracy requirements, where A represents the vertical error when the terrain slope is zero and B is related to the horizontal error at a given point. Equation (2.11) shows that a given degree of horizontal error will produce a greater degree of vertical error as the terrain slope rises. Thus any increase in the terrain slope is associated with an increase in vertical error. Regression analysis can be used to estimate A and B based on the observed vertical error and terrain slope associated with a sample of points, although this may not always produce a satisfactory answer. Once the coefficients in Koppe's formula have been estimated for a particular map, the horizontal errors in points or contour lines can be determined. Koppe's formula offers a number of advantages over NMAS as a statement of map accuracy. In contrast to Koppe's formula, NMAS is simply a statement of compliance with an accuracy test rather than a statistical expression of accuracy, and it ignores the effects of terrain slope on vertical accuracy.

Standards for Canadian topographic maps do not appear to be as well defined.
The Canadian Surveys and Mapping Branch designs its maps so that "on Class A maps the contours are accurate to one-half a contour interval." If it is assumed that this represents a 95% confidence level, the allowable root-mean-square height error can be estimated as 0.255 times the contour interval (0.5/1.96 ≈ 0.255 for normally distributed errors). Points on Canadian Class A maps are to appear within 0.5 mm of their true positions at map scale; this would represent 25 m on the ground for 1:50,000 scale maps [Mark, 1974].

2.3.1.3  Effects of check points on the reliability of accuracy estimates

In the case of the above-mentioned empirical standards on DTM accuracy, it is clear that the final DTM accuracy figures, such as the mean error and the standard error estimated from the test results, are affected by the characteristics of the set of check points used as the ground truth. Li [1991] has investigated the effects of the check points used in the empirical tests on the reliability of the DTM accuracy estimates. The concept of reliability in this context might be defined as the degree of correctness to which the DTM accuracy figures have been estimated. In any case, it is apparent that the DTM accuracy results obtained from empirical tests are not absolutely certain, and one can accept these results only to a certain confidence level.

A set of check points can be characterized by three main parameters: (i) the sample size (i.e. the number of test points); (ii) the measurement accuracy of the sample points; and (iii) the distribution of the test points. It seems obvious that the inclusion of more check points in the test will lead to a more reliable result, although more points may lead to more chance of correlated error. However, a large number of check points can be costly to produce and, in some cases, even impossible to provide in the context of DTM accuracy testing. Therefore, an important question which arises is whether a large number of check points is necessary.
If not, then what is the minimum number of check points required for a given degree of reliability for the accuracy estimates? From statistical theory, the sample size required for the accuracy estimates depends upon the variation associated with the random variable. The smaller the variation, the smaller the sample size that is needed to achieve the degree of accuracy required for the accuracy estimates. The required minimum sample size also depends on the degree of accuracy requirement itself and on the spatial autocorrelation of the data. Some equations are given by Li to provide a general guide to the values required in practice.

The reliability of the estimated DTM accuracy figures is also affected by the accuracy of the check points, which is usually specified in terms of RMSE or standard deviation. Only if the sample size is increased and the accuracy of the check points is improved at the same time can the reliability of the final estimates be improved. Another important concern with the check points used for the DTM accuracy test is their distribution, that is, their locations and pattern. In some tests, the check points are in a grid pattern. The question is raised as to whether such a pattern is suitable. Ley [1986] states that "an accuracy assessment of a DTM should be based on a sample of heights taken from the entire model." He also points out that such "a sample of points should include both the recorded (measured) and interpolated heights." Li suggests that the check points be sampled randomly from the entire testing area. It is worth noting that although geodetic and other higher order control points would be the most accurate vertical check points, they are very sparsely distributed and, therefore, would not satisfy the minimum sample size requirement. The development of GPS technology may improve this situation, but not without its limitations.
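
The dependence of reliability on the number of check points can be illustrated with a small simulation. The Python sketch below is not Li's formulation; it assumes independent, zero-mean, normally distributed DTM errors with a hypothetical standard deviation of 5 m, and it ignores both check-point measurement error and spatial autocorrelation. The names rmse and rmse_spread are illustrative. The simulated spread of the RMSE estimate is compared with the standard large-sample approximation 1/sqrt(2n).

```python
import math
import random
import statistics


def rmse(errors):
    """Root-mean-square error of a list of height differences."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_spread(n_check, true_sigma=5.0, trials=1000, seed=0):
    """Relative spread of RMSE estimates over repeated samples of n_check points.

    Assumes independent zero-mean normal errors (hypothetical values);
    real check-point errors may be spatially autocorrelated.
    """
    rng = random.Random(seed)
    estimates = [
        rmse([rng.gauss(0.0, true_sigma) for _ in range(n_check)])
        for _ in range(trials)
    ]
    return statistics.stdev(estimates) / true_sigma


for n in (20, 100, 500):
    approx = 1.0 / math.sqrt(2.0 * n)  # large-sample approximation
    print(f"n = {n:4d}  simulated = {rmse_spread(n):.3f}  approx = {approx:.3f}")
```

Under these simplifying assumptions the spread falls only with the square root of the sample size, which is one way to see why very large sets of check points bring diminishing returns.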
2.3.1.4  Interpolation accuracy

The goodness of fit of a DTM to reality depends on the terrain itself, on the sampling pattern and density, and on the method of interpolating a new point from the measurements [Leberl, 1973] (see Figure 2.2).

[Figure 2.2  Factors influencing the performance of a DTM: sampling density, measuring pattern, method of interpolation and type of terrain together determine the accuracy of terrain representation.]

Interpolation accuracy depends on the nature of the interpolation algorithm, the complexity of the underlying surface, and the distance between control points relative to the frequency of spatial variation. Many of the earliest studies on interpolation accuracy focus on the effects of the number and spatial distribution of sample points [Veregin, 1989]. According to Morrison [1969], the ability of a given set of sample points to capture the variation present in the surface is inversely related to the degree to which the points are clustered in space. In his opinion, systematic sampling is preferable to stratified random sampling, and stratified random sampling is preferable to random sampling. These conclusions rest on the notion that a dispersed sample is more capable of capturing surface variation, provided that the sampling interval does not coincide with a periodicity in the terrain.

Random sampling means that each element in the population has an equal chance of being selected. In a spatial context, the simple random sample is derived by establishing a coordinate system with some specified resolution, and taking two random numbers to establish the location of each point. The spatial systematic random sample, on the other hand, establishes an initial location and a fixed interval, and points are selected at that fixed interval in both directions across the area to be sampled. The systematic sample overcomes one of the perceived problems of the simple random sample, namely that it may not give good coverage within the spatial sampling space.
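
The sampling designs discussed here can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical 100 x 100 unit area, not the procedure used by Morrison or by Clark and Hosking; the function names are invented, and drawing exactly one point per stratum in the stratified design is an assumption made for brevity.

```python
import random


def simple_random_sample(n, width, height, rng):
    """Draw n points, each located by two independent random coordinates."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def systematic_sample(interval, width, height, rng):
    """Pick a random initial location, then step at a fixed interval in both directions."""
    x0 = rng.uniform(0, interval)
    y0 = rng.uniform(0, interval)
    points = []
    y = y0
    while y < height:
        x = x0
        while x < width:
            points.append((x, y))
            x += interval
        y += interval
    return points


def stratified_random_sample(cells_x, cells_y, width, height, rng):
    """Split the area into cells_x * cells_y strata and draw one random point in each."""
    dx, dy = width / cells_x, height / cells_y
    return [
        (rng.uniform(i * dx, (i + 1) * dx), rng.uniform(j * dy, (j + 1) * dy))
        for j in range(cells_y)
        for i in range(cells_x)
    ]


rng = random.Random(42)
random_pts = simple_random_sample(25, 100.0, 100.0, rng)
systematic_pts = systematic_sample(20.0, 100.0, 100.0, rng)
stratified_pts = stratified_random_sample(5, 5, 100.0, 100.0, rng)
print(len(random_pts), len(systematic_pts), len(stratified_pts))
```

Plotting the three point sets makes the coverage argument visible: the simple random sample tends to leave gaps and clusters, while the systematic and stratified samples spread points over the whole area.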
On the other hand, if there is some regular periodicity in the population, systematic sampling is likely to reproduce only a part of that periodicity or, alternatively, to bias the sample by over-representing it [Clark and Hosking, 1986]. In a stratified random sampling design, the elements of the population are allocated into subareas or strata before the sample is taken, and then each stratum is randomly sampled.

Rhind [1971] argues that Morrison's conclusions are unwarranted, since there is no guarantee that a more dispersed sample will yield a more accurate interpolated surface. Surface-specific sampling, in which sample points are selected if they define critical points in the surface, may exhibit a high degree of spatial clustering but produce more accurate results than systematic sampling. Accuracy may also be affected in systematic sampling if periodicity is evident in the surface, since the systematic random sampling method rests on the assumption that there is no periodic trend in the ordering of the individual elements [Clark and Hosking, 1986]. Thus, the appropriateness of a given sampling method depends, in part, on the nature of the surface.

The effects of the nature of the surface on interpolation accuracy have been explored by Morrison [1968; 1971]. In an empirical test, four different synthetic surfaces were constructed, each described by a finite set of mathematical terms. Each of the four surfaces was sampled using the six sampling methods described in Morrison [1969], that is, unaligned/aligned random sampling, unaligned/aligned stratified random sampling, and unaligned/aligned systematic sampling. For each surface/sampling method combination, four samples of different sizes (25, 49, 100, and 144) were selected, yielding a total of 96 sets of sample points. For each of these sets, interpolation was performed with ten different interpolation methods.
Interpolation accuracy for each of the resulting 960 interpolated surfaces was assessed by comparing observed and interpolated values for a set of 100 grid points. Two indices of accuracy were defined: the correlation between the observed and interpolated values, and the standard deviation of the residuals.

For each of the two accuracy indices, three-way analysis of variance was employed to test the effects of interpolation method, sample size and sampling method on interpolation accuracy. For the second index of accuracy, the sampling and interpolation methods were observed to have the greatest effect on accuracy. Sample size and the first-order interactions between the three factors were observed to be only marginally significant. Similar results were obtained for the alternate index of accuracy (i.e., the correlation coefficient), except that sample size was observed to have a significant effect. Comparison of interpolation accuracy for different sampling methods revealed that higher levels of accuracy were associated with unaligned methods. Variations in accuracy associated with sampling method were also observed to have less significant impact as surface complexity increased, due to an overall decline in accuracy for more complex surfaces. An increase in accuracy was also observed as sample size increased, although this effect, again, was found to be less significant for complex surfaces. It should be noted that some of Morrison's findings are partly attributable to the use of synthetic surfaces exhibiting an unrealistically high degree of smoothness (i.e., without any breaks). This strong condition on the surface smoothness limits the usefulness of the findings. In the real world, irregular functions with breaks, such as those encountered in topography, are in fact more frequent than smooth surfaces.

The factors considered by Morrison in his study have since been examined in greater detail by other authors.
Shepard [1984], for example, examined the effects of variations in sample size on interpolation accuracy. The author interpolated the grid point values for a 60 x 64 grid using a sample of between 4 and 272 points randomly selected from the grid. The relative RMSE was computed for all 3840 grid points on each interpolated surface. The relative RMSE is defined as the ratio of the RMSE to the standard deviation of the observed grid point values. Regression results indicated a close fit (r² = 0.997) between the relative RMSE and sample size according to a relationship of the form:

$$ \mathrm{RMSE}_r = 2.83\, n^{-0.56} \qquad (2.13) $$

where:
RMSE_r = the relative RMSE; and
n = the sample size.

Hence as n increases, RMSE_r decreases at a declining rate.

In a similar study, MacEachren and Davidson [1987] examined the effects of sample size on interpolation accuracy for surfaces of varying complexity. Six surfaces were defined, each of which portrayed topographic elevation values for a 103 x 103 grid. Eight samples of points were obtained for each surface using unaligned stratified random sampling, in which sample size varied between 100 and 2025 points. Accuracy was calculated as the mean absolute deviation between actual and interpolated values for all 10,609 grid points on the surface. The mean absolute deviation, d, is defined in the same way as mean error in Equation 2.3. Regression results revealed a close fit between d and n according to an equation of the form

$$ d = a\, n^{-b} \qquad (2.14) $$

Coefficient b was observed to be relatively constant over all surfaces, with a mean value of approximately 0.3. Coefficient a was found to be directly related to the range of elevation values on the surface, such that the value of a was lower for surfaces with a smaller elevation range. The authors examined the spatial distribution of error and observed that errors tended to be more clustered in space for smaller sample sizes.
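
Power-law relations such as (2.13) and (2.14) are usually estimated by linear regression on logarithms. The Python sketch below is a hypothetical illustration, not the procedure of either study: it evaluates equation (2.13) at a few arbitrarily chosen sample sizes within Shepard's range and then recovers the coefficients of the form (2.14) with an ordinary least-squares fit in log-log space. The function names are invented for the example.

```python
import math


def shepard_relative_rmse(n):
    """Relative RMSE predicted by equation (2.13) for sample size n."""
    return 2.83 * n ** -0.56


def fit_power_law(sizes, errors):
    """Least-squares fit of errors = a * sizes**(-b) in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Illustrative sample sizes only; the original sizes are not listed in the text.
sizes = [4, 16, 64, 136, 272]
errors = [shepard_relative_rmse(n) for n in sizes]
a, b = fit_power_law(sizes, errors)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Because the inputs here are generated from (2.13) itself, the fit simply returns the original coefficients; with real interpolation errors the same routine would give estimates of a and b together with scatter about the fitted line.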
The study of the effects of various factors on grid DTM interpolation accuracy by Leberl [1973] was based on a numerical test. Leberl compared six terrain models created using different interpolation procedures and with grid spacing varying from 10 m up to 450 m. For grid DTMs, comparison of the different interpolation algorithms led to the conclusion that "linear prediction" (or "least squares interpolation"), "moving averages" and "patchwise polynomial interpolation" provided the highest accuracy. Consideration of computational complexity, however, indicated that the "moving averages" method was comparatively expensive, so that the other two remained as the most effective interpolation methods. In general, however, the difference between interpolation methods was fairly small. In a comparison using the 2 x 2, 4 x 4, or 6 x 6 surrounding reference points for interpolation of a new point, it was concluded that no gain might be expected by using more than the 4 x 4 reference points. On the other hand, use of 4 x 4 points tended to be slightly superior to the use of only the four closest reference points. It was also shown that a linear relation existed between accuracy of interpolation and sampling density. The slope of the linear regression equation was correlated with the terrain type. An immediate problem when "different types" of terrain must be evaluated is how to describe a terrain type concisely and quantitatively. In Leberl's study, a simple indicator for the terrain type, namely the "normalized standard deviation of terrain relief," was used.

The effects of different interpolation methods have been explored by other authors as well. Hundreds of interpolation algorithms exist and each, of course, offers a slightly different result [Kvamme, 1989]. Leberl [1973], for example, evaluated the effects of varying the weighting function in interpolation methods based on distance-weighted averaging.
Accuracy was assessed for each interpolated surface as the mean absolute error. Variations in weight (defining the influence of a sample point on a grid point as a function of the distance between them) for surface-specific sampling were observed to produce small but systematic changes in the accuracy of the interpolated surface. The aligned systematic sample of points yielded a mean absolute error less than or approximately equal to that of the surface-specific sample.

Braile [1978] examined the effects of four different interpolation methods on interpolation accuracy. The interpolation methods were applied to a sample of 200 random points derived from a digitized map. The RMSE was computed for each interpolated surface as an index of accuracy. The RMSE was observed to vary significantly for the four interpolated surfaces, with the nth-order polynomial interpolation method yielding the lowest error, and one of the distance-weighted averaging methods yielding the highest error.

Tempfli and Makarovic [1979] made a general and uniform evaluation of the performance of three classes of interpolation methods: piecewise polynomials, moving averages, and linear-least-squares algorithms. Transfer functions, which represent a generalized sampling theorem, were determined numerically for different interpolation methods and their variants. Transfer functions establish a relation between sampling density and transfer ratio. The latter is a measure of fidelity of the reconstructed data, which is determined by the amount of information transferred from the input data to the data reconstructed from sampled point data. Transfer functions were shown to be useful in identifying the appropriate weight function and limiting distance, and in quantifying the parameters used in the interpolation algorithms. It was concluded that fidelity is strongly affected by the sampling density.
The complexity of the interpolation procedure may also have a significant impact if applied with careful consideration. However, the chance for incorrect application is greater for more elaborate interpolation methods than for simple ones.

Of the two variants of piecewise polynomial methods tested by Tempfli and Makarovic [1979], the third degree polynomial method performed slightly better than the simple linear interpolation within the entire range of sampling density considered. The moving average methods were represented by several variants of the polynomials of the zero and the second degree. The two basic versions provided nearly equal results, though the second degree polynomials seemed to be slightly superior. The best variant of the second degree polynomials performed as well as the third degree piecewise polynomial. Similarly, the best variant of the weighted mean (0-degree polynomial) performed as well as the simple linear interpolation. Fidelity was found to be far more sensitive to changes in the parameter values of a weight function than to the different types of weight functions.

Kvamme [1989] investigated three elevation interpolation algorithms that produced alternative and slightly different DEMs from digitized contour lines. The three interpolation methods examined were the 'steepest ascent' algorithm, the 'weighted average' algorithm, and the 'vertical scan' technique. Some interpolation artifacts were described by the author, such as the 'bench-like' effect along ridges and drainages, a result of using the 'weighted average' method in regions within horseshoe-shaped contours, and a 'stair-step' effect, a result of using the 'vertical scan' algorithm.

Boehm [1967] also discussed different surface interpolation methods from digital contour representation and grid representation.
He concluded that the most reliable method is the weighted distance interpolation, which is equivalent to the extension of bilinear interpolation between unequally spaced data points. Higher order interpolation methods can in many cases yield errors as great as or greater than bilinear interpolation.

As indicated in section 1.2.4, production of a DTM involves sampling and interpolation. The transfer function of this process defines the fidelity of the reconstructed surface. The transfer function allows for a comparative evaluation of different interpolation procedures and can also be used for determining an adequate sampling interval. If the required quality of a DTM is specified by a standard error, or if the laws of error propagation need to be applied for evaluating the accuracy of the derived product, then additional knowledge is required. It becomes necessary to characterize the terrain by its power spectrum and also the measuring error, which then allows expression of the accuracy of the reconstruction in terms of a variance, or its square root, the standard error. Tempfli [1980] gave a demonstration of DTM accuracy estimation using spectral analysis. To compare the variance estimates with the actual mean square discrepancies, computer generated surfaces were used. Three profiles were selected from the contour plots and three interpolation methods were applied. Reconstructing two surfaces by bilinear interpolation was also studied. The results showed that a higher accuracy was attained by linear interpolation than by the chosen moving average for all three profiles and for all the sampling intervals used. The chosen linear least squares interpolator performed better than the linear interpolator only when the sampling density was high. The accuracy of the reconstructed profiles decreased, in general, with an increasing sampling interval.
It is clear that the sampling density is the decisive influencing factor for the accuracy estimate, not the interpolation method used, assuming a suitable parameter choice is provided.

Fractals are a relatively recent field of research and yet a great deal has been written about the fractal nature of topographic surfaces [Klinkenberg and Goodchild, 1992]. The fractal dimension concept is used by Polidori et al. [1991] for grid DTM quality assessment. The authors claim that most DTM evaluation techniques, which consist of estimating a global accuracy with regard to a set of check points, do not detect artifacts such as those caused by digitizing and resampling. By computing the fractal dimension value at different scales and in different directions based on a fractional Brownian motion (fBm) surface model, interpolation artifacts, like excessive smoothness and directional tendency, may be revealed. The DEM used in the study was interpolated (40-meter interval) from digitized contour data (from a 1:25,000 scale topographic map). Estimates of the fractal dimension (D) made from the DEM were observed to be lower when obtained over short distance intervals (1 to 5 pixels, i.e. < 200 m, D = 2.07) than when estimates were obtained using longer distance intervals (10 to 30 pixels, i.e. 400-1200 m, D = 2.25). The authors state that this difference is due to the smoothing inherent in the interpolation process that was used to obtain the gridded data from the initial contours. Goodchild and Tate [1992], however, argue that the central conclusion of this research (that fractal analysis can provide an effective index of DEM quality) appears to be unwarranted based on the results presented, and in the light of the available literature. One of their arguments, for example, is that real surfaces are rarely pure fBm, and often contain clear departures from the fBm model in the form of local linear trends and scale dependencies.
In order to detect and prove the existence of an interpolation smoothing effect over short distance intervals, D at short distance intervals must be shown to have been significantly reduced beyond the variation expected by chance.

A number of authors have proposed techniques for assessing the interactions between interpolation and measurement error associated with sample points. These techniques often suggest ways in which interpolation methods can be devised to minimize the effects of such interactions. A bibliographic review is given by Veregin [1989].

2.3.2  Spatial variation of interpolation errors

Many measures of error that have been proposed, such as root-mean-square error for ratio data and the classification error matrix for nominal data¹, have no spatial dimension and are of little use in the error study of spatial data [Goodchild and Gopal, 1989]. How the error is distributed across the area of any one DEM is still unknown, and factors that may affect the distribution of error are largely unresearched. For effectiveness and accuracy, many GIS studies depend on reliable digital elevation models. To understand the impact of error in a DTM on error-sensitive GIS-type applications, the following steps are important [Wood and Fisher, 1993]:
1)  identify the occurrence of uncertainty in spatial data,
2)  identify the spatial distribution of this uncertainty, and
3)  identify the effect this uncertainty will have on subsequent GIS operations.

For the above reasons, the study of spatial variation of error has become the focus of recent research [Theobald, 1989; Shearer, 1990; Csillag et al., 1992; Wood and Fisher, 1993]. Wood and Fisher [1993], for example, describe a visualization method that can be used to identify the spatial variation in DTM interpolation accuracy produced by interpolating digital contour data from Ordnance Survey 1:10,000 digitized contours at 10-meter vertical intervals.
Four methods of interpolation are applied to the rasterized contour data with a horizontal resolution of 10 m. The interpolation processes are then visualized using different techniques such as shaded relief maps, aspect maps, Laplacian maps, and convexity maps. Each of the techniques emphasizes a particular facet of the elevation model. For instance, the shaded relief maps and, especially, the convexity maps showed clearly some terracing effects in the topography due to the distribution of the original contour lines used to represent the surface. Slope direction maps indicated some interpolation artifacts in flatter regions, which were related to the original profile directions used in a one-dimensional spline fitting interpolator. The visualization and examination of the distribution of root-mean-square error produced by this interpolator also identified patterns of accuracy loss related to the profile directions used in interpolation.

¹ Error matrix: a common means of reporting site specific accuracy for a classification, with a two dimensional array that cross-references the classification results with the reference data. The correctly classified objects are indicated by the diagonal elements, and omission and commission errors are indicated by the off-diagonal elements.

Shearer [1990] demonstrates two possible methods of graphical representation of accuracy in digital terrain models showing error magnitude and distribution. Based on the difference values (or residuals) obtained by comparing two gridded DEMs derived from the same digitized contour data, but processed using different interpolation packages, errors at grid nodes can be displayed by means of proportional circles. Different colours can be employed to differentiate between positive and negative residuals. Such representations have the advantage of clearly indicating grid nodes where serious, and perhaps anomalous, errors occur.
An alternative and commonly applied method is to plot the errors as contours. Comparison of such diagrammatic representations with, for example, a plot of the original input contours can be informative with respect to the occurrence and magnitude of errors in relation to such factors as terrain slopes and distribution of input data.

2.3.3  Propagation of DEM error

As reviewed in the above sections, most of the studies on error in DTMs have concentrated on the nature and description of the error, rather than its propagation into derivative products resulting from GIS-type operations. Fisher [1991] examined how RMSEs as reported in each USGS gridded DEM propagated into DEM-derived products such as the viewshed, that is, the area observable from a viewing location as opposed to that which is invisible. A Monte Carlo simulation and testing approach was used for studying the propagation of DEM error, in which repeated error fields with varying parameters were added to the original DEM, and the viewshed was determined in the resulting noisy DEM. In the absence of any other information on error structure, the assumption of independence implied by the USGS error reporting was used in the test. It assumes that the RMSE is equivalent to the standard deviation of a normal distribution. Results showed that the area of the viewshed calculated in the original DEM may significantly overestimate the viewshed area.

2.4  Summary

A literature review of the previous research on DTM error modelling and accuracy estimation has been conducted. Various sources of error in DTMs were identified and different approaches taken by researchers for the evaluation of DTM errors were briefly discussed.

Generally, an aspatial approach has been taken to the study of error in DTMs. Most studies use a global measure of DTM accuracy, such as the RMSE as used by the USGS. The spatial distribution of errors and its interaction with scale and resolution have rarely been examined.
However, a few authors have observed that there appears to be a link between terrain complexity and the spatial distribution of DTM errors. While some recent studies have begun to actively investigate the spatial variation of DTM error, none so far has quantitatively examined the relation between terrain complexity and DTM error. This study attempts to fill this gap and presents a new approach to the issue.

In the next chapter, the selection of the study area and the implementation of the sample database will be discussed. Some preliminary test results will also be presented.

3.  STUDY AREA DATA AND SOME PRELIMINARY TESTS

In order to have a better understanding of the issue of scale, accuracy, and spatial pattern of errors in digital topographic modelling, a true multi-scale DEM data set was developed. In this chapter, a study area is selected and some preliminary investigations based on the sample database are conducted. The principal objectives are threefold:
1)  to demonstrate the availability of digital topographic data from different organizations, at different resolutions, and with different formats;
2)  to compare different DEMs and to investigate the interaction between the terrain variation and the spatial resolution and their influence on DEM error and its spatial pattern, and thus to form the hypothesis; and
3)  to provide empirical data for testing the thesis hypothesis and for a more detailed evaluation of the role of scale in topographic characterization, terrain classification, and DEM error modelling (discussed in Chapters 4, 5 and 6).

3.1  Study Area Selection

Several DEMs within a study area in central British Columbia were chosen to demonstrate the effects of DEM resolution on accuracy and the interrelations among the spatial pattern of errors, resolution and the complexity of terrain. The study area is north and east of Prince George and covers approximately 198 km (E/W) by 175 km (N/S).
Figure 3.1 shows a DEM image of British Columbia (5' resolution) and the location of the selected study area. The areal extents of the study area are 53.5°-55°N and 120°-123°W, corresponding to the boundaries of 36 NTS 1:50,000 scale sheets (each generally defined by a quadrangle area of 15 minutes of latitude and 30 minutes of longitude). All the map sheet numbers included are illustrated in Figure 3.2 for identification purposes.

[Figure 3.1: DEM image of British Columbia (5' resolution) showing the location of the selected study area]

There are two main reasons why this area was selected:
1)  the availability of different DEMs in the area. Since some larger scale digital topographic data programs, such as the B.C. TRIM 1:20,000 project, are completed only for selected areas, the selection of the study area is thus limited.
2)  the general characteristics of terrain. British Columbia is essentially a mountainous region except for the northeast corner, which includes a small portion of the Interior Plains. Although the province is largely mountainous, there are extensive plateaus, large plains and basins, and areas of prairie [Holland, 1964]. As seen in Figure 3.1, the study area is located in the Rocky Mountain area with high terrain variability. Of the 21 major physiographic divisions of landforms in British Columbia defined by Holland [1964], three occur in this area, including the Interior Plateau and the Rocky Mountain Trench, two very different physiographic divisions. The selection of an area with variable sub-terrain types will allow an analysis of the correlation between DEM errors and terrain complexity.
    93 J/15   93 J/16   93 I/13   93 I/14   93 I/15   93 I/16
    93 J/10   93 J/9    93 I/12   93 I/11   93 I/10   93 I/9
    93 J/7    93 J/8    93 I/5    93 I/6    93 I/7    93 I/8
    93 J/2    93 J/1    93 I/4    93 I/3    93 I/2    93 I/1
    93 G/15   93 G/16   93 H/13   93 H/14   93 H/15   93 H/16
    93 G/10   93 G/9    93 H/12   93 H/11   93 H/10   93 H/9

    W   123°00'   122°30'   122°00'   121°30'   121°00'   120°30'   120°00'   E

Figure 3.2  Study area NTS map sheet numbers.

3.2  Database Implementation

Different DEM data sets for the above selected study area were obtained from the mapping institutions mentioned in section 1.2.3, and a multi-scale DEM database was constructed on a Unix workstation. In particular, five digital topographic data sets (i.e., NGDC10, NGDC5, WSC2, EMR1, and TRIM DEMs) were ordered from NGDC (both NGDC data sets), WSC, EMR and the B.C. Ministry of Environment, Lands and Parks respectively. A brief description of each data set is given in Table 3.1, including their spatial resolution, matrix dimension, geo-referencing coordinate system and elevation mode.

Because of differences in both resolution and format as seen from Table 3.1, some computer programs were written (mainly in Fortran or Splus, a graphic and statistical data analysis software package developed by StatSci) to extract or transfer necessary information. The following five sections introduce each digital file and its format, and describe the data preprocessing procedures used for the database implementation.

3.2.1  NGDC10

The NGDC 10' (i.e., 10-arc-minute) global data set was produced originally by the US Navy Fleet Numerical Oceanography Center (FNOC) and consists of a 1080 x 2160 geographic (i.e. Latitude/Longitude) grid containing a number of parameters including maximum, minimum and modal elevations, recorded to the nearest 100 feet (see Table 3.2).

[Table 3.1: spatial resolution, matrix dimension, geo-referencing coordinate system and elevation mode of each study area DEM data set]
Table 3.2  The 11 parameters in the NGDC 10' global data set

    Format    Field contents
    F7.2      Longitude (+/- 180°)
    F6.2      Latitude (+/- 90°)
    I3        Modal height in hundreds of feet
    I3        Maximum height in hundreds of feet
    I3        Minimum height in hundreds of feet
    I2        Number of significant ridges
    I2        Direction of ridges in 10's of degrees (0-18)
    I2        Primary surface type code
    I2        Secondary surface type code
    I3        Percentage of surface covered by water
    I3        Percentage of surface covered by urban development

The modal, minimum and maximum elevations for the area (45°N-65°N, 140°W-110°W) were extracted to cover the whole province of British Columbia. Each of the three elevation matrices has a dimension of 120 x 180, from which a 10 x 19 sub-matrix was then extracted for the study area chosen for this research. A statistical summary of the 10' study area data set (modal elevations only) is shown in Table 3.3. The descriptive statistics calculated include mean elevation, standard deviation, maximum elevation, median elevation, minimum elevation, upper quartile and lower quartile. A histogram of the study area 10' modal elevations is shown in Figure 3.3. From the summary statistics and the histogram of DEM data, the overall DEM characteristics can be observed. These statistics are also used for a general comparison of various study area DEMs, and to identify obvious problems, if any, in the DEM data sets before doing any further analysis.
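The descriptive statistics listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Fortran or Splus code used in this research: the function name `describe`, the sample elevations, the use of the sample standard deviation and the "inclusive" quartile method are all assumptions made for the example.

```python
import statistics


def describe(elevations):
    """Return the summary statistics named in the text for a flat list of elevations."""
    values = sorted(elevations)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    lower_q, median, upper_q = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (assumed)
        "min": values[0],
        "lower_quartile": lower_q,
        "median": median,
        "upper_quartile": upper_q,
        "max": values[-1],
    }


# Hypothetical elevations in metres, invented purely for illustration.
sample = [700, 750, 800, 900, 1000, 1200, 1500, 1800, 2100]
summary = describe(sample)
```

A real DEM matrix would first be flattened into a single list of elevations before being passed to such a function.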
3.2.2  NGDC5

The 5' northern hemisphere average elevation data (also known as ETOPO5) were originally prepared by DMA from arithmetic averages of data digitized from contour maps, and now are distributed by NGDC as part of the Global Ecosystems Database. The data set consists of a 2160 x 4320 geographic (i.e. Latitude/Longitude) centroid-registered grid with elevation data expressed to the nearest meter. The British Columbia 5' mean elevation matrix (200 x 350) (see Figure 3.1) and the study area 5' data (19 x 37) were extracted from this data set. Some summary statistics of the study area 5' elevation data are listed in Table 3.3. A histogram of the study area 5' elevations is shown in Figure 3.3. A comparison of the statistical summary of NGDC10 and NGDC5 DEMs as shown in Table 3.3 and Figure 3.3 indicates the closeness of the two data sets. No apparent problem was found in either one.

[Table 3.3: descriptive statistics of the study area DEM data sets]

[Figure 3.3: histograms of the study area NGDC10, NGDC5, WSC2 and EMR1 elevations (number of points versus elevation)]

3.2.3  WSC2

For the long range development of hydrometric networks in British Columbia, the Water Survey of Canada, in conjunction with Shawinigan Engineering Company, Limited, carried out a deliberate and systematic study of data collection starting in 1970. The UTM grid on topographic maps was used as the basic unit for data collection and for determining physiographic characteristics in the development of the physiographic and hydrologic data banks.
This provided the tool for multiple regressions between hydrologic and physiographic characteristics. Considering the mountainous terrain and the small size of many of the ungaged rivers in British Columbia, a grid interval of 2 km was selected as the basic unit for the abstraction of the parameters. The data which were abstracted from the NTS 1:50,000 map sheets included information to compute mean elevation, average land slope, stream density, and area of lakes and swamps [Kreuder, 1979]. The mean elevation is the elevation at the center of the 2 km x 2 km square, which is assumed to represent the average elevation of the square. The data that were abstracted were written and color coded into each square on the map sheet as shown in Figure 3.4. The data were then transferred from the map sheets to computer cards in a format which would allow verification of data, identification and verification of map sheets, and the transfer of the data onto magnetic tape.

Figure 3.4  Sample of WSC coded grid square (after Kreuder, 1979). Elevation in feet was coded in red in the upper right corner, the number of contour crossings (E/W and N/S) coded in blue in the upper left corner, the number of dots for areas of lakes and swamps coded in brown in the lower left corner, and the number of divider intervals for measuring length of streams coded in purple in the lower right corner.

The 2 km raw physiographic data for the study area (covering 36 1:50,000 map sheets as shown in Figure 3.2) and the Fortran programs used for mean elevation data extraction were obtained from WSC directly. The corresponding UTM coordinates for the study area range from 501,000 to 697,000 m in easting (X) and from 5,929,000 to 6,097,000 m in northing (Y) in UTM zone 10, which resulted in an 85 x 99 elevation matrix.
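The 85 x 99 dimension follows directly from the coordinate ranges and the 2 km grid interval. The arithmetic can be checked with the short Python sketch below; the function name `grid_dimensions` is hypothetical, and the sketch assumes that grid nodes are counted inclusively at both ends of each coordinate range.

```python
def grid_dimensions(x_min, x_max, y_min, y_max, spacing):
    """Return (rows, cols) for nodes spaced `spacing` apart, both ends inclusive."""
    cols = (x_max - x_min) // spacing + 1  # nodes along easting
    rows = (y_max - y_min) // spacing + 1  # nodes along northing
    return rows, cols


# UTM zone 10 ranges quoted in the text (metres) and the 2 km grid interval.
rows, cols = grid_dimensions(501_000, 697_000, 5_929_000, 6_097_000, 2_000)
print(rows, cols)  # prints: 85 99
```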
A plot of all the data points (see Figure 3.5) indicated that in addition to some missing values along the border, five data values inside the area were also missing. They were later interpolated by taking the average of their surrounding data points. The descriptive statistics of the data set are shown in Table 3.3 and a histogram of WSC2 elevations is shown in Figure 3.3.

[Figure 3.5: plot of the WSC2 data points in the study area]

3.2.4  EMR1

The EMR1 DEM data for the study area were acquired from the Earth Physics Branch of the federal Department of Natural Resources (formerly Energy, Mines and Resources). The data were digitized from available topographic and bathymetric maps and distributed on magnetic tapes with a minimum data retrieval unit of a 15' of latitude by 30' of longitude quadrangle corresponding to the boundaries of the NTS 1:50,000 maps. For the purpose of this research, elevations and water depth data for 36 quadrangles were ordered to cover the 1°30' x 3° study area (see Figure 3.2). Data are stored by quadrangle, and some quadrangles are duplicated to give fresh water depths instead of land elevations. There is header information for each quadrangle. Spot values are listed by column from south to north and west to east within the quadrangle. The collection of the spot elevations is in arc-minute format and the average spacing is about one kilometer. More specifically, the number of rows is 28 and is constant for all quadrangles, but the number of columns varies for different latitude ranges. Table 3.4 gives the detailed specifications of data collection for the study area.

Several Fortran programs were written in order to extract elevations and their coordinates. Because of the different numbers of columns in the upper part (latitude > 54.25°) and the lower part of the study area as indicated in Table 3.4, two elevation matrices were prepared in Splus. The dimensions are 84 x 192 for the upper part and 84 x 198 for the lower part.
Figure 3.6 illustrates the location arrangement of spot elevation collecting points for the upper and lower part. Because of the space limitation, only points within the range of 54°N to 54.5°N and 121°W to 122°W were plotted.

[Table 3.4: specifications of EMR1 data collection for the study area]

In order to be comparable with other DEM data sets, and to merge the upper and lower parts into one DEM for the whole study area, the coordinate system was transformed from geographic coordinates (longitude, latitude) into UTM coordinates (easting, northing) using the SPANS GIS program. Then a DEM of dimension 160 x 188 with a resolution of exactly 1 km was interpolated for the study area from the elevation data in both upper and lower parts using the Splus interpolation function interp. The algorithm is based on a method of bivariate interpolation and smooth surface fitting for irregularly distributed data points developed by Akima [1978a]. In this method the x-y plane is triangulated or divided into a
number of triangular patches, each having projections of three data points in the plane as its vertices, and a bivariate fifth-degree polynomial in x and y is applied to each triangle. Estimated values of partial derivatives at each data point are used in determining the polynomial. The method does not smooth the data and gives an exact interpolator. That is, the resulting surface passes through all the given points. In order to examine the impact of the interpolation, and later for comparison purposes, the descriptive statistics of both raw and interpolated study area EMR1 DEM data sets are given in Table 3.3 together with statistics for other study area DEM data sets. The interpolated 1 km EMR1 DEM is almost identical to the raw EMR1 elevation data in terms of their summary statistics. The grayscale images and histograms of the raw and interpolated EMR1 elevation data were also compared to each other and no apparent differences were found. It can, therefore, be concluded that the impact of the interpolation is insignificant, provided that the spatial resolution does not change much after the interpolation. The variation caused by different interpolators can, thus, be ignored in future analyses for the purpose of comparing DEMs of different resolutions. The EMR1 DEM data set will, from now on, refer to the interpolated data only.

[Figure 3.6: location arrangement of EMR1 spot elevation collecting points for the upper and lower parts of the study area (longitude versus latitude, in degrees)]
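The exactness property described above, a surface that passes through all the given points, can be illustrated with a much simpler scheme than the fifth-degree polynomial of Akima: linear interpolation within a single triangle using barycentric weights. The Python sketch below is a stand-in for illustration only and is not the algorithm of the Splus interp function; the function name `barycentric_interpolate` and the sample triangle are hypothetical.

```python
def barycentric_interpolate(triangle, point):
    """Linearly interpolate z at `point` from a triangle of (x, y, z) vertices."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    px, py = point
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    # Barycentric weights: each is 1 at its own vertex and 0 at the other two.
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle: (easting, northing, elevation) in metres.
tri = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 300.0)]
at_vertices = [barycentric_interpolate(tri, (x, y)) for x, y, _ in tri]
at_centroid = barycentric_interpolate(tri, (10.0 / 3.0, 10.0 / 3.0))
```

Because each weight equals one at its own vertex, the interpolated value at a data point is the data value itself, which is what makes an interpolator of this kind exact rather than smoothing.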
The histogram of the study area EMR1 elevations is shown in Figure 3.3 together with the histograms for NGDC10, NGDC5 and WSC2 DEM data. It can be seen from Table 3.3 and Figure 3.3 that WSC2 and EMR1 DEMs are very similar in terms of their summary statistics and histograms.

3.2.5  TRIM 93G and TRIM 93H

For the 1:20,000 scale TRIM DEM data as described in Appendix A, only two smaller regions within the study area were chosen, given both budget and computer power constraints: one relatively flatter subarea (herein referred to as TRIM 93G or just 93G) and one rougher subarea (TRIM 93H or 93H). Each subarea contains four adjacent map sheets and their map numbers are as follows:
• TRIM 93G - 93G.089, 93G.090, 93G.099, and 93G.100
• TRIM 93H - 93H.088, 93H.089, 93H.098, and 93H.099
As shown in Figure A.2 in Appendix A, a map number in the TRIM project consists of the appropriate NTS 1:250,000 lettered map number followed by the numbers of each successive breakdown (001-100), each separated by a period.

According to British Columbia Specifications and Guidelines for Geomatics [MOEP, 1990], all TRIM mapping is presented on the UTM coordinate system (based on the 1983 North American Datum) and the accuracy requirements reflect those standards set under the North Atlantic Treaty Organization (NATO) Standardization Agreement (STANAG) for the evaluation of Land Maps. For DEM data in particular, the following accuracy specifications are stated:
1)  ninety percent of all well defined planimetric features shall be coordinated to within 10 m (i.e. 0.5 mm x 20,000) of their true position.
2)  ninety percent of all discrete spot elevations and DEM points shall be accurate to within 5 m of their true elevation.
3)  ninety percent of all points interpolated from the DEM (including contour data) shall be accurate to within 10 m of their true elevation.
4)  true position/elevation is defined as the coordinates which would be obtained from positioning with high order ground methods.
5)  accuracies relating to elevations relate to ground not sufficiently obscured by vegetation or other features to cause significant error.

The original spot elevation data for the selected eight map sheets were collected in random mode. Appendix B gives the MOEP ASCII file format for TRIM data. Submissions from the Digital Mapping Group Limited (DMG) give all coordinates (X-easting, Y-northing, and Z-elevation) to the nearest meter, which conforms to accuracy requirements. A Fortran program was written to extract the XYZ coordinates of all the points from each of the eight DEM files. The eight DEM point files were then read into Splus separately for further processing. Table 3.5 lists the X, Y coordinate ranges, the number of elevation points (N), the random point collection density (D), and the average spacing between points (S) for each map sheet. The density is computed by

$$D = \frac{N}{A} \tag{3.1}$$

where A is the area covered by each map sheet. The average spacing, another way of looking at density, is computed by

$$S = 1.0746 \sqrt{\frac{A}{N}} \tag{3.2}$$

when a hexagonal spacing is assumed, which is the arrangement when the N points are equidistant. Given that the points are randomly distributed, this formula provides only an
approximation of the true spacing. It should be noted that the values listed in Table 3.5 do not include the breaklines, ridgelines, and planimetric data which are also included in TRIM data and are supplements to the point elevations.

[Table 3.5: X and Y coordinate ranges (m), number of elevation points (N), point collection density (D) and average spacing (S, in m) for each of the eight TRIM map sheets]

As indicated in Appendix A, the TRIM DTM data are collected either in a random mode or a grid mode by stereo compilation. The DTM points are defined to fit a grid spacing of 50 m in steep terrain (average slope > 25°) and 75 m in less complex terrain when captured in a grid pattern. When a random point collection pattern is used, the average spacing between sampling points will be approximately 75 m in steep terrain and 100 m in flatter, less complex, terrain. When supplemented by breaklines, ridgelines, and planimetric data, a 25 m grid DEM can be derived [Balser, 1989].

Therefore, in order to create a regular grid DEM for future analyses, the random elevation point data were, in this case, interpolated into 50 m grid format for each of the two subareas (TRIM 93G and TRIM 93H) using the Splus interpolation function interp. As discussed in the previous section, the variation that might be caused by the interpolation can be ignored. Two elevation matrices of the same size (438 x 520) were created. In order to reduce the computational demand in further analyses, a smaller subset DEM of size 300 x 320 was extracted from each of the two 438 x 520 elevation matrices. The interpolated full-size DEM images of the two subareas (TRIM 93G and 93H) along with their smaller subset DEMs (TRIM93G.SUB and TRIM93H.SUB) are shown in Figures 3.7 and 3.8 respectively. The histograms of the two subset DEMs are displayed in Figure 3.9. Their descriptive statistics are shown in Table 3.6. A comparison of the two subset DEMs reveals the overall differences between the two subareas in terms of their general roughness or complexity, with TRIM93H.SUB having a much larger elevation variance.
As shown in Table 3.6, the standard deviation of elevations for TRIM93H.SUB is 409 m whereas for TRIM93G.SUB it is only 85 m.

[Figure 3.7: DEM images of subarea TRIM 93G and its subset TRIM93G.SUB (easting versus northing, in m)]

[Figure 3.8: DEM images of subarea TRIM 93H and its subset TRIM93H.SUB (easting versus northing, in m)]

As indicated in Appendix A, when reading TRIM elevation data (X, Y, Z) in a random pattern, the density specifications are as follows:
1)  in areas where the average slope of the terrain is less than 25°, the average spacing between points will be approximately 100 m, corresponding to approximately 120 points/km²;
2)  in areas where the average slope of the terrain is more than 25°, the average spacing between points will be approximately 75 m, corresponding to approximately 200 points/km².

In order to find out whether the original random point data collection for the two subareas (TRIM 93G and TRIM 93H) actually meets the density specifications set by MOEP [1990], slopes were derived from TRIM DEMs (see section 4.2.3 in the next chapter for the detailed procedure). Figure 3.10 displays the slope histogram for each subarea. It can be seen from Table 3.5 that the data collection for all four map sheets in the TRIM 93G subarea conforms with the above mentioned density specifications for rougher terrain. Although the average slope for this whole subarea is only 6°, there are areas with slopes up to 54°, as shown in Figure 3.10. This probably explains why a high density of over 200 points/km² was used for each of the four
map sheets in this flatter subarea. On the other hand, the collection density for all four map sheets in the TRIM 93H subarea falls slightly short, although this is a rougher area with a higher average slope than TRIM 93G (see Figure 3.10). The slope values in this subarea range from 0° to 82° and the average slope is 23°. It can be concluded then that the point collection density specifications were generally met for both subareas, but there exist some inconsistencies from one area to another. Cheong [1992] discussed in more detail this problem associated with TRIM DTM data point collection and the specifications set by MOEP.

[Figure 3.9: histograms of the TRIM93G.SUB and TRIM93H.SUB elevations (number of points versus elevation)]

[Table 3.6: descriptive statistics of the TRIM93G.SUB and TRIM93H.SUB subset DEMs]

[Figure 3.10: slope histograms for the TRIM 93G and TRIM 93H subareas (number of points versus slope)]
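The relation between point density and average spacing used in this section (equations 3.1 and 3.2) can be checked numerically against the quoted specifications. The Python sketch below assumes that equation 3.2 has the form S = 1.0746 sqrt(A/N), with the constant taken as sqrt(2/sqrt(3)), the value for points on a regular hexagonal (equilateral triangular) lattice; the function names are hypothetical.

```python
import math

# Constant for a hexagonal point arrangement; rounds to the 1.0746 of equation 3.2.
HEX_FACTOR = math.sqrt(2.0 / math.sqrt(3.0))


def point_density(n_points, area_km2):
    """Equation 3.1: D = N / A, in points per square kilometre."""
    return n_points / area_km2


def average_spacing_m(density_per_km2):
    """Equation 3.2 (assumed form): S = 1.0746 * sqrt(A / N), in metres."""
    area_per_point_m2 = 1_000_000.0 / density_per_km2
    return HEX_FACTOR * math.sqrt(area_per_point_m2)


spacing_flat = average_spacing_m(120.0)   # specification for slopes below 25 degrees
spacing_steep = average_spacing_m(200.0)  # specification for slopes above 25 degrees
```

Under these assumptions the two specified densities give spacings of roughly 98 m and 76 m, in line with the approximate 100 m and 75 m figures quoted in the specifications.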
3.3  Some Preliminary Tests

In order to compare DEMs of different resolutions and formats, geometric registration of all the DEMs from various sources is the first necessary step. Figure 3.11 shows the registration of various study area data sets as introduced in section 3.2. Only data points from NGDC10, NGDC5 and WSC2 are displayed in this figure since EMR1 has too many points to show in the limited space. The four DEM images of the study area as represented at different resolutions and the DEM images of the two subareas are displayed in Figure 3.12. Also shown in this figure (on the image of EMR1 DEM) are the boundaries of the two TRIM subareas (93G and 93H) in order to indicate their locations in relation to the whole study area. Most of the data manipulation and graphic display used here were accomplished using the Splus program. The GIS packages ARC/INFO and SPANS were also used for conversion
between geographic coordinates and UTM coordinates, which is sometimes necessary for the registration and comparison of different DEMs.

[Figure 3.11: registration of the study area data sets, showing NGDC10, NGDC5 and WSC2 data points (longitude versus latitude, in degrees)]

[Figure 3.12: DEM images of the study area at four resolutions and of the two TRIM subareas]

In addition to showing images of different DEMs and their descriptive statistics, there are two other ways to further visually and quantitatively examine the differences between DEMs of various spatial resolutions: topographic profile comparison and DEM spatial mismatch evaluation.

3.3.1  Topographic profile comparison

Several terrain profiles can be extracted from DEMs and comparisons made with regard to topographic variation revealed by different DEM data sets. For example, the NGDC10 data set provides maximum, modal and minimum elevations for each 10' by 10' grid area while EMR1 gives a spot elevation every 0.535714' by 0.909091' (about 1 km by 1 km). Figure 3.13 shows the variation revealed by EMR1 within a 10' band (west/east direction from 123° to 120°W) around latitude 54.08°. From the elevation variation shown by the 10' profiles and EMR1 band it can be seen that two distinctive regions exist in the area. One portion of the surface is relatively flat (from 123°W to 121.7°W), where 10' maximum/modal/minimum profiles and EMR1 spot elevations are all close to each other; and the other portion is very rough (from 121.7°W to 120°W), where a great amount of variation is evident. Also note that there are many EMR1 points outside the minimum/maximum bounds given by 10' data, particularly near changes in roughness.

[Figure 3.13: elevation variation revealed by EMR1 within a 10' band around latitude 54.08°, with the 10' maximum, modal and minimum profiles]

3.3.2  DEM spatial mismatch

As indicated before, DEM accuracy assessment is classically carried out by comparing some sample points of the DEM to data of a higher accuracy, and an RMSE is computed to estimate the DEM accuracy [Polidori et al., 1991].
In fact, however, it is necessary to know not only the global measure of the average departure of points on the DEM from the 'real' ground surface, but also the spatial distribution of errors. To compare DEMs at different resolutions, therefore, it is important to examine both the extent and the spatial pattern of mismatch between them and to study the relation between the mismatch and the resolution. The mismatch between two DEMs is determined by calculating the elevation differences between all the data points on the DEM of higher resolution and those interpolated from the DEM of lower resolution. The Splus function interp is used for the interpolation.

The mismatch between two DEMs can also be assumed to be the DEM error for the one with lower resolution while considering the DEM with higher resolution as the reference or accepted 'truth.' It can be derived by calculating the elevation differences between the two DEMs. In the rest of this section, NGDC5 and WSC2 DEM errors as compared to EMR1 DEM for the whole study area will be determined first and then WSC2 and EMR1 DEM errors as compared to TRIM DEMs will be examined for the two subareas.

3.3.2.1  Two comparisons between DEMs for the whole study area

As indicated earlier, study area DEM NGDC5 has a 5' resolution and a Latitude/Longitude georeferencing coordinate system. EMR1 has a spatial resolution of 1 km in UTM. To determine their mismatch, the higher resolution DEM EMR1 is taken as the reference data set or accepted 'truth.' NGDC5 elevations are then used to interpolate a 1 km DEM. Finally, the interpolated 1 km DEM from lower resolution NGDC5 is compared to the actual 1 km DEM EMR1 and a 160 x 188 elevation difference matrix is calculated for the whole study area (a few missing data points can be observed due to the border effects during interpolation). Figure 3.14 shows the comparison results of the study area DEMs NGDC5 and EMR1.
The display includes a grayscale image showing spatial variation of elevation differences between two DEMs and a histogram of the elevation differences. Following the same procedure, 2 km DEM WSC2 is also compared to 1 km EMR1 DEM. The spatial pattern of the mismatch between WSC2 and EMR1 DEMs and the histogram of their discrepancies are shown in Figure 3.15. From the mismatch patterns present in the above two figures, it is noted that for both comparisons the DEM errors are not evenly or randomly distributed but show certain correlations with the variation of terrain revealed by the study area DEM images.

[Figure 3.14: mismatch between the NGDC5 and EMR1 DEMs (grayscale image and histogram of elevation differences)]

[Figure 3.15: mismatch between the WSC2 and EMR1 DEMs (grayscale image and histogram of elevation differences)]

3.3.2.2  Four comparisons between DEMs for the two subareas

The comparison of EMR1 DEM with 50 m TRIM DEMs is also done for the two subareas 93G and 93H. In this case, the higher resolution TRIM DEMs are considered as the reference data sets. First, a sub-matrix is extracted from EMR1 DEM for each corresponding TRIM subarea. Then, a DEM of a 50 m grid resolution is interpolated from EMR1 elevation points in each sub-matrix. Finally, differences between TRIM DEM and the interpolated DEM from EMR1 are calculated and a 300 x 320 difference matrix is derived for each subarea. The results are seen in Figures 3.16 and 3.17 respectively for 93G and 93H.
In addition to the spatial variation of elevation differences between the high resolution TRIM DEM and the low resolution EMR1 DEM, each figure also displays a histogram of elevation differences and a 3-D perspective view of the spatial variation of elevation differences (only a sampling of every four lines is used for clearer visualization). Following the same procedure, Figures 3.18 and 3.19 display the comparison results of WSC2 and TRIM DEMs for the two subareas 93G and 93H.

[Figures 3.16 to 3.21 appear here in the original: maps of elevation differences (Northing in m), histograms of the number of points against elevation difference, and 3-D perspective views.]

In order to test whether or not the interpolation procedure used in the DEM creation or transformation has any significant impact on DEM accuracy/error analysis, a test was done for both subareas 93G and 93H using different interpolation methods in Splus. Test results are shown in Figures 3.20 and 3.21. Figure 3.20 shows the spatial variation of elevation differences between the TRIM93G.SUB and WSC2 DEMs and a histogram of differences using each of the three different interpolators. The three interpolators tested are linear interpolation in the triangles bounded by original data points (i.e., ncp=0) [footnote 3], cubic interpolation with 2 additional points used in computing partial derivatives at each data point (i.e., ncp=2), and cubic interpolation with 4 additional points used in computing partial derivatives (i.e., ncp=4) [Akima, 1978a,b]. [Footnote 3: ncp is a parameter in the Splus interp function which refers to the number of additional points to be used in computing partial derivatives at each data point for determining the polynomial to fit to the surface in each triangle. A number between 3 and 5 (inclusive) is often recommended.] The standard deviations of elevation differences derived using the three different interpolators are very close: 29, 31, and 30 m respectively. Figure 3.21 gives similar results for subarea 93H, where the standard deviations of elevation differences are 186, 183, and 172 m respectively when using each of the three different interpolation methods. Furthermore, the three different 50 m DEMs interpolated from the 2 km WSC2 DEM using different interpolators for each of the two subareas were subtracted from each other. The spatial pattern and the magnitude of their differences are shown in Figures 3.22 and 3.23 for subarea 93G and 93H respectively.
It should be noted that, in Figures 3.22 and 3.23, the same vignetting appears, suggesting the same spurious structure has been introduced by the interpolators and is likely to be somewhat related to the variability of terrain. From the above test results, however, it is evident that the magnitude of differences between two DEMs caused by using different interpolators is much less than the difference between the DEM data sets of differing resolution. It is, thus, concluded that the effects of different interpolation methods can be ignored in the above interpolation procedure when comparing a lower resolution DEM with a higher one.

[Figures 3.22 and 3.23 appear here in the original: maps of the differences between the 50 m DEMs interpolated with different interpolators (Northing in m).]

3.4  Observations Based on Preliminary Tests

Table 3.7 summarizes some basic statistics (e.g., mean and standard deviation of elevation differences) of all six comparisons as discussed in section 3.3.

[Table 3.7 appears here in the original. Columns: NGDC5 & EMR1 and WSC2 & EMR1 (30080 points each, 188 x 160); EMR1 & TRIM 93G, WSC2 & TRIM 93G, EMR1 & TRIM 93H, and WSC2 & TRIM 93H (96000 points each, 320 x 300). Rows: number of points, mean (m), standard deviation (m), minimum (m), maximum (m). The numeric entries of the last four rows are not legible in this copy.]

From the above preliminary test results, some observations can be made:

1)  All six means of elevation differences are small relative to the standard deviation (see Table 3.7), and the histograms of elevation differences all show a fairly symmetrical distribution of the discrepancies about each mean (see Figures 3.14 to 3.19).
This indicates that there is no obvious systematic error component in the elevation difference values. Furthermore, statistical normality tests indicate that none of the histograms follows a normal distribution.

2)  The standard deviation calculated for each histogram indicates the 'width' of the frequency distribution of the elevation differences and provides a global measure of DEM accuracy.

3)  The standard deviation of discrepancies for each comparison varies (see Table 3.7) and is related to the spatial resolutions of the two DEMs being compared. For example, the standard deviation of discrepancies between the 2 km WSC2 and 1 km EMR1 study area DEMs is 83 m but is 262 m between the 5' NGDC5 and 1 km EMR1 DEMs. Between the 1 km EMR1 DEM and the 50 m TRIM DEM for subarea 93G, the standard deviation of elevation differences is 21 m, whereas between the 2 km WSC2 and TRIM DEMs it is 31 m. For subarea 93H, the standard deviation of elevation differences between the 1 km EMR1 DEM and the 50 m TRIM DEM is 126 m and is 183 m between the 2 km WSC2 and 50 m TRIM DEMs. It is obvious from the above preliminary tests that the coarser the DEM resolution, the lower the global accuracy. This is true no matter how rough or flat the surface is (i.e., 93H or 93G) and regardless of which DEM is taken as the highest-resolution 'truth' (i.e., 1 km EMR1 or 50 m TRIM DEMs) for accuracy evaluation.

4)  Based on a visual inspection of the spatial mismatch plots (see Figures 3.14 to 3.19), elevation difference values appear to vary spatially across the surface in the study area. The spatial pattern of mismatch is not random, but closely related to the variation and complexity of terrain. In other words, the mismatch pattern visually appears to reflect the terrain variability. Apparently, the rougher the terrain, the more pronounced the mismatch pattern and the larger the discrepancies.
As seen in Table 3.7, for example, the standard deviation of discrepancies between EMR1 and TRIM in the flatter subarea (93G) is about 21 m, much lower than that of discrepancies in the rougher subarea (93H), which is close to 126 m. The standard deviations of discrepancies between WSC2 and TRIM DEMs for the two subareas follow a similar pattern, with a standard deviation of 31 m in the flatter area and 183 m in the rougher area.

3.5  Thesis Hypothesis

In order to investigate quantitatively the various relations observed above, the following question needs to be answered: are measures of topographic complexity significantly related to observed patterns of mismatch between DEMs of differing resolution? In other words, does knowledge of the landscape characteristics provide some insights into the nature of the inherent error (or uncertainty) in a DEM? Based on the above preliminary observations, the following hypothesis is to be tested in this thesis:

Knowledge of topographic characteristics provides insights into the nature of DEM errors and can be useful for DEM error modelling.

To test this hypothesis, first it is necessary to extract some geometric measures from DEMs which characterize the variation and complexity of topographic surfaces. Then a multivariate classification is necessary to automatically identify relatively homogeneous terrain classes based on various roughness measures. Concepts and methodologies involved in topographic characterization and terrain classification are discussed in Chapter 4.

4.  METHODOLOGIES FOR TOPOGRAPHIC CHARACTERIZATION

4.1  Overview

While the idea of utilizing geometric measures or signatures to characterize the nature of topography and to classify terrain is relatively new to GIS [Pike, 1988a,b; Weibel and DeLotto, 1988], the conceptual basis has been developing in geomorphology for some time [Langbein et al, 1947].
Geomorphology is concerned with the description of the form of the Earth and its genesis and, therefore, supplies knowledge on terrain forms and characterization. Geomorphometry, a sub-discipline of geomorphology, is specially devoted to developing quantitative description of landforms [Evans, 1972; Mark, 1975b].

Terrain forms are often classified according to their genesis: by uplift, erosion, sedimentation and other processes. They are also classified by their magnitude and extent, as well as their roughness, all of which are related to the genesis of the terrain [Frederiksen et al, 1985]. Evans [1972] distinguished two aspects of geomorphometry, namely: (i) 'specific geomorphometry,' which is concerned with the identification of named landforms such as drumlins, ridges, peaks, or karst topography; and (ii) 'general geomorphometry,' which attempts to provide a geometric description of the landforms that is applicable to any continuous rough surface. Only the latter will be the focus of this study because the roughness or complexity of the terrain is a major factor influencing DEM accuracy, as observed in previous chapters.

A considerable number of parameters have been proposed to describe the spatial variations of the terrain in general terms and to represent several different attributes of terrain geometry. Examples of studies which incorporated geomorphometric parameters into a terrain classification procedure include Wood and Snell [1960], Mather [1972], Pike [1988a,b], and Weibel and DeLotto [1988]. These studies, along with Evans [1972] and Mark [1974, 1975b], provide a theoretical basis concerning geometric measures that have been developed, as well as many of their characteristics.
For instance, Mark [1975b] discusses geomorphometric parameters in terms of three components or dimensions of a topographic surface: the vertical dimension (e.g., relief), the horizontal dimension (e.g., texture, grain, and drainage density), and slope and its derivatives (e.g., local convexity), which relate the vertical and horizontal dimensions, together with a few less important variables. Evans [1972] describes, at any point on a surface, five measures of important geometric properties. They are: (i) altitude, $z$; (ii) slope gradient, $z'_v$; (iii) aspect, $z'_A$; (iv) vertical convexity, $z''_v$; (v) horizontal convexity, $z''_h$. Areal parameters of surface geometry are described by their frequency distributions, which may be summarised by statistical moments such as the standard deviation and skewness of altitude and the standard deviation of gradient. In addition to classical geomorphometric parameters, some other measures have also been proposed to describe topographic variation. Examples are: the number of points higher than the center point of the moving window, the entropy of altitude, and fractal dimension.

There is no single 'magic measure' that is sufficient to characterize the complexity or roughness of terrain. In a general sense, roughness refers to the irregularity of a topographic surface. In the description of natural terrains, roughness parameters should be established so that they can be used to describe surface irregularities ranging from a few centimetres to tens of meters. However, a single concise definition of surface roughness is probably impossible [Hobson, 1972]. As observed by Stone and Dugundji [1965], and Hobson [1972], roughness cannot be completely defined by any single measure, which usually describes only some aspect of the physical or mathematical properties of a surface, but must be represented by a 'roughness vector' or a set of parameters.
One area may be rougher than another because it has a finer texture, a higher relief, an irregularity of ridge spacing, or sharp ridges (see Figure 4.1) [Mark, 1974]. It is, therefore, necessary to develop multiple signatures to quantitatively describe the terrain. Fractal dimension ($D$), a non-integer value which ranges from 2 to 3 and increases when the surface progressively changes from a plane ($D=2$) to a surface so folded that it would fill a volume ($D=3$), was once expected to emerge as the most promising single parameter to measure terrain variation. However, empirical measurement of fractal dimension leads to different results, depending on which measurement method is used, and it is observed that the concept of self-similarity (a form of invariance with respect to changes in scale) is not easily applicable to terrain [Mandelbrot, 1986; Klinkenberg and Goodchild, 1992]. Therefore no single fractal dimension is sufficient to represent scale-dependent (and thus non-fractal) roughness of topographic surfaces [Weibel and DeLotto, 1988].

Figure 4.1  Forms of surface roughness after Mark [1974]. The numbers refer to the relation of the amplitudes and wavelengths between the profiles (e.g., the amplitude of the second profile is twice that of the first and the wavelength of the first profile is twice that of the third).

In the following sections, methodologies for quantitative topographic characterization for DEM error modelling will be reviewed and presented first. Then, the principles of hierarchical terrain classification will be discussed. Characterization of the variation and complexity of terrain will be achieved by means of both local measures and global measures. The local measures include general geomorphometric parameters such as slope and local relief. The global characteristics will be identified using the grain measure, spectral analysis, nested analysis of variance and fractal analysis of DEMs.
The rationale for determining both local and global measures of surface roughness from DEMs is that: (i) the local measures are required for automatic terrain classification, that is, to identify various terrain clusters and, therefore, to relate the spatial pattern of DEM errors to the roughness variation of the surface; and (ii) the global measures are used to provide 'context' and to allow for comparisons to be made with regard to the DEM error modelling for various surfaces (93G and 93H) and resolutions.

4.2  Extraction of the Local Geometric Measures

While mass-production of DEMs has all but replaced the laborious topographic data capture that once restricted the quantitative description of landform, new computer methods are automating the characterization of topography [Pike et al, 1988]. Automation is helpful in providing broad coverage by standardised methods and permitting comparisons of parameters for different areas. Review of the literature of general geomorphometry shows that all of its parameters may be defined in terms of altitude [Evans, 1972]. For example, relief is usually expressed as range in altitude, texture is the spacing of maxima or minima (or the shortest significant wavelength), grain is the longest significant wavelength, slope is the first derivative (rate of change) of altitude, convexity is the second derivative of altitude (rate of change of slope), and the hypsometric integral represents a measurement of the interrelation of area and altitude.

Geometric measures in general geomorphometry have often been extracted and evaluated from gridded DTMs through a method which is similar to texture analysis in image processing [Haralick et al, 1973]. That is, a sampling window of a certain size (e.g., 5x5 or 7x7) is moved through the elevation matrix.
As the window visits each matrix element, geomorphometric parameters are calculated from all elevation elements in the window, and these values are assigned to the center point. Then, each point that is visited by the window is characterized by a multivariate description of the local surface geometry.

As the primary purpose of this study is not to formulate new geometric measures or to evaluate the relative capabilities of different parameters, the following seven groups of established parameters for the description of the roughness of terrain are identified and reviewed based on the results of the previously mentioned studies. Each of these parameters is later extracted from the study subarea DEMs using different moving window sizes to demonstrate the feasibility of the concepts and techniques that are presented.

4.2.1  Local relief (LR)

'Relief' is a concept usually used to describe the vertical extent of topography. There is no generally accepted definition of 'relief' [Evans, 1972], but range in altitude is most commonly used and is referred to as 'local relief' by Mark [1974] or 'relative relief' by Smith [1935]. Local relief is calculated as the difference between the highest and lowest elevations. That is,

$$LR = z_{max} - z_{min} \tag{4.1}$$

Local relief is always defined with respect to some particular area and there have been various ways of defining the area within which range is to be measured. In most cases, local relief is determined for arbitrarily-bounded sample areas such as squares, circles, or latitude-longitude quadrangles; local relief has also frequently been determined for drainage basins and 'hills' [Mark, 1974].

4.2.2  Standard deviation of altitude (SD)

Evans [1972] observed that most relief measures depend upon the use of extreme values of the distribution of elevations within the sample area, as in LR above, and would therefore be sensitive to even minor variations in estimations of these values.
To describe the dispersion of a distribution or, in particular, to characterize the vertical dimension of a topographic surface, the standard deviation is a more powerful and stable statistic than the range since it is based on all values and not just the two extremes. Evans, thus, proposed use of the standard deviation of altitude as the relief measure. But he also noted that "... the autocorrelation of altitude admittedly makes range more reliable than it is for random variables, since on a continuous surface all intermediate values between the extremes must be represented [p.31]." Most of the previous studies did not use the statistically preferable standard deviation, generally because of its high computational demand with manual methods. Automation of parameter extraction from DEMs using computers has made the effort required to calculate standard deviation insignificant. The standard deviation of altitude is calculated by the following formula:

$$SD = \sqrt{\frac{\sum_{i=1}^{n} (z_i - \bar{z})^2}{n - 1}} \tag{4.2}$$

where $\bar{z}$ is the mean elevation within the moving window and $n$ is the number of grid points within the window.

4.2.3  Slope and aspect ($\alpha$ and $\beta$)

Slope measures have been considered as the most important type of parameters by geomorphologists and are, thus, most widely used. Evans [1972], for example, stated that "... slope is perhaps the most important aspect of surface form, since surfaces are composed completely of slopes, and slope angles control the gravitational force available for geomorphic work." Unlike relief and most other geomorphometric parameters, which are normally defined for sample areas, slope is theoretically defined at every point. Slope at a point is defined in terms of a plane, characterized by its gradient and aspect, tangential to the surface at that point [Evans, 1972]. Mathematically, the tangent of slope ($\tan \alpha$) is the first derivative of altitude (i.e. the rate of change of altitude with distance).
In practice, however, slope is generally measured over a finite distance [Mark, 1974] and, thus, involves considerable sampling problems. The size of area over which slope is measured will influence the values obtained. The effects of different sampling intervals on slope calculations have been discussed by Gerrard and Robinson [1971]. In a discrete DEM representation of a continuous surface, slope calculations will depend on the resolution of the grid. Obviously, the accuracy of slope calculations at a point will decrease with coarser grid resolutions. The rate of this decrease in accuracy will depend on the spatial variability of the surface. Certainly deriving slope from the square grid method is safe if, in accord with the sampling theorem, the grid size is less than half the shortest wavelength of variability present. Evans [1972] concluded that grid sizes of 20, 50 or 100 m are suitable for study of mesorelief. If elevation is sampled with a resolution of 1 km or coarser, Evans suggested that slope and convexity be measured separately at each point, and not derived from the altitude matrix.

There are various techniques that could be used to calculate slope and aspect in a DEM. Most methods are based on a 3 by 3 moving window which traverses the DEM. Following DeLotto [1989], the method used in this study is illustrated in Figure 4.2. This method defines gradient as the maximum gradient of either the steepest drop or the steepest rise from the center cell to the eight nearest cells. Aspect is then the direction (in 45° intervals) of the maximum gradient.

[Figure 4.2  3 by 3 neighborhood of points for slope calculation. Cells and aspect directions, clockwise from north: z1 (315°), z2 (0°), z3 (45°) in the top row; z4 (270°), z5 (center), z6 (90°) in the middle row; z7 (225°), z8 (180°), z9 (135°) in the bottom row.]
That is, given the elevations for a 3 by 3 neighbourhood, a simple slope angle calculation for the center point can be obtained for each of the four directions with edge-contact neighbors with the formula:

$$\alpha_i = \arctan\left(\frac{z_i - z_5}{d}\right) \tag{4.3}$$

where $z_i$ is the elevation of each of the appropriate neighbors, $z_5$ is the elevation of the center cell (see Figure 4.2), and $d$ is the grid resolution. For each of the four other directions with corner-contact neighbors, the formula is:

$$\alpha_i = \arctan\left(\frac{z_i - z_5}{\sqrt{2}\, d}\right) \tag{4.4}$$

Once these eight angles have been calculated at a point, the slope for that point is then defined as the maximum of the eight angles, that is, $\alpha = \max(\alpha_i)$, and the aspect ($\beta$) is the direction of the gradient clockwise from north as seen in Figure 4.2. For example, if the maximum gradient is in the direction of cell $z_3$ then aspect = 45°. This is only a crude measure of aspect and is likely to be influenced by DEM error.

4.2.4  Roughness factor (RF)

The study of the dispersion of slope angle and direction using three-dimensional vector analysis gives rise to another surface roughness parameter. Hobson [1972] uses the distribution of planes to describe the three-dimensional orientation of surfaces within an area, treating the perpendiculars to slope units as vectors and applying well-established mathematical approaches to the analysis of directional data. As illustrated in Figure 4.3 (after Hobson [1972]), the test area is simulated by a set of intersecting planar surfaces or triangular facets formed by inserting diagonals into a regular grid in an altitude matrix. Normals to these planes are represented by unit vectors. Vector mean, vector strength and vector dispersion are computed using methods defined by Fisher [1953]. Vector strength indicates the length of the resultant sum of the unit vectors ($R$) and is obtained by using the direction cosine method.
Dispersion ($1/K$), on the other hand, indicates the variability or spread of the unit vectors in space and is similar, in some respects, to 'standard deviation.' $K$ is calculated as:

$$K = \frac{N - 1}{N - R} \tag{4.5}$$

where $N$ is the number of vectors. As a surface approaches planarity (a roughness of zero), the vectors will become parallel, $R$ will approach $N$, and $K$ will become infinite. Thus the inverse of $K$ gives an intuitive roughness measure. Vector strength is usually high and vector dispersion low in areas characterized by similar elevations (see Figure 4.3-B) or equal rates of elevation change, whereas non-systematic elevation changes yield low vector strength and high vector dispersion (see Figure 4.3-C).

Extending Hobson's work, Mark [1974] proposes that the best measure of vector dispersion roughness is the roughness factor (RF), defined by:

$$RF = 100 - L(\%) \tag{4.6}$$

where $L(\%)$ is $100(R/N)$, the vector strength in per cent. In the case of large $N$, RF will approximately equal 100 times the inverse of $K$. The following equation can also be derived and will be the method used in this study:

$$RF = 100(1 - \cos\alpha) \tag{4.7}$$

4.2.5  Slope curvature (SC)

The slope curvature, or the local convexity according to Evans [1972], is the rate of change of slope, that is, mathematically the first derivative of slope or the second derivative of altitude. It describes the convexity, concavity or straightness of slope profiles and comprises downslope (vertical) convexity and cross-slope (horizontal) convexity. It can be obtained by fitting a quadratic trend surface to the 3 by 3 neighbourhood of each sample point in the altitude matrix and then differentiating the resulting quadratic function for the surface twice, just as slope and aspect can be obtained by fitting a linear surface.

Standard deviation of slope is another expression of degree of curvature, although it does not distinguish convexity from concavity.
That is

$$SC = \sqrt{\frac{\sum_{i=1}^{n} (\alpha_i - \bar{\alpha})^2}{n - 1}} \tag{4.8}$$

4.2.6  Number of higher points (HP)

A number of non-standard measures of the vertical component of topography have been reported in the literature. One of the measures, for example, involves computation of differences in elevation between the center point of a moving window and the computed mean for that window. Another example involves measurement of the mean difference between the center cell and the elevation of cells in a window that are higher than the center cell. The number of points in a window that are higher than the center point (HP) is another such measure used in some studies [Weibel and DeLotto, 1989]. This measure emphasizes terrain discontinuities even more than drainage density or profile convexity emphasize them.

4.2.7  Hypsometric integral (HI)

In order to have an adequate measure of the 'dissection' or 'aeration' of a landscape, that is, the extent to which it has been opened up, especially by erosion [Clarke, 1966], geomorphologists have proposed several measures to describe aspects of the distribution of landmass with elevation. Most of these measures are based on the hypsometric curve, that is, the cumulative frequency curve of elevation, which was first developed by Imamura [1937]. The hypsometric integral, proposed by Strahler [1952] to represent the area under the dimensionless hypsometric curve (also termed relative or percentage hypsometric curve by Mark [1974]), is now the most widely used coefficient of dissection. It is given by:

$$HI = \int_0^1 a(h)\, dh \tag{4.9}$$

where $a(h)$ is the relative area above a height and $h$ is the relative height (altitude above the lowest point), defined by:

$$h = \frac{z - z_{min}}{z_{max} - z_{min}} \tag{4.10}$$

where $z$ is the actual elevation, and $z_{min}$ and $z_{max}$ are the lowest and highest elevations, respectively, within the area.
In an effort to reduce the tedium inherent in the calculation of the hypsometric integral measure, Pike and Wilson [1971] showed that the elevation-relief ratio ($E$) of Wood and Snell [1960] is mathematically equivalent to the hypsometric integral. Therefore, it can be estimated as:

$$HI = E = \frac{\bar{z} - z_{min}}{z_{max} - z_{min}} \tag{4.11}$$

where $\bar{z}$ is the mean elevation within the area.

According to Wood and Snell [1960], the measure $E$ expresses the relative proportion of upland to lowland within a sample region. As pointed out by Evans [1972], this expression is also termed the "relative coefficient of massiveness" by Merlin [1965]. Pike and Wilson [1971, p.1081] stated that "... experience has shown that a sample of 40 to 50 elevations will ensure accuracy of E to, on the average, 0.01, the value to which area-altitude parameters customarily are read." Mark [1974] pointed out that Evans's [1972] estimates of HI using grid values, at least for the smaller sub-matrices such as 3 by 3 (9 points), are probably in serious error. Two extensive investigations of terrain geometry by Wood and Snell [1960] and Pike [1963] showed that topographic samples might resemble one another with respect to local relief, average slope, or other geometric aspects, and yet vary appreciably in appearance as demonstrated by different values of $E$. Identical $E$ values can also represent dissimilar terrain types. It is, thus, necessary to refer to complementary geometric parameters to obtain a comprehensive and meaningful description of a topographic surface. $E$ usually ranges from 0.15 to 0.85, with values tending to cluster between 0.40 and 0.60. Low values occur in terrains characterized by isolated relief features standing above extensive level surfaces, whereas high $E$ values describe broad, somewhat level, surfaces broken by occasional depressions [Pike and Wilson, 1971].
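The behaviour of the elevation-relief ratio described above can be illustrated with a short sketch of Equation 4.11. The function name `elevation_relief_ratio` and the toy elevation samples are hypothetical and chosen only for illustration; the 9-point samples are far smaller than the 40 to 50 elevations recommended by Pike and Wilson, so the values are indicative only.

```python
def elevation_relief_ratio(elevations):
    """E = (mean - min) / (max - min), as in Equation 4.11.

    Returns None for a perfectly flat sample, where relief is zero
    and the ratio is undefined.
    """
    z_min, z_max = min(elevations), max(elevations)
    if z_max == z_min:
        return None
    z_mean = sum(elevations) / len(elevations)
    return (z_mean - z_min) / (z_max - z_min)


# Hypothetical 9-point samples (elevations in metres).
isolated_peak = [100.0] * 8 + [200.0]   # one hill standing above a level plain
dissected_plateau = [200.0] * 8 + [100.0]  # level upland broken by one depression
uniform_ramp = [0.0, 50.0, 100.0]       # elevations spread evenly over the range

for name, sample in [("isolated peak", isolated_peak),
                     ("dissected plateau", dissected_plateau),
                     ("uniform ramp", uniform_ramp)]:
    print(name, round(elevation_relief_ratio(sample), 3))
```

The isolated peak yields a low ratio and the dissected plateau a high one, which matches the interpretation of low and high $E$ values given by Pike and Wilson [1971].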
In summary, among all the seven identified variables as discussed above, some of them might actually describe a similar aspect of the roughness of terrain and are, thus, redundant because of correlation. If variables are highly correlated, they add little new information to the multivariate description of the surface characteristics. Therefore, a variable selection process is necessary before they are used in further analyses. According to Pike [1988a,b], five groups of geometric measures might be distinguished: (i) statistics of altitude; (ii) variables of the power spectrum of altitude; (iii) statistics of slope at a variable horizontal length; (iv) statistics of slope at a constant horizontal length; and (v) statistics of slope curvature at constant length. Various studies by Evans [1972], Mark [1975b] and Pike [1986] have shown that correlation between variables of these groups is negligible (i.e., they are complementary and explain different aspects of topography). Correlations among variables within the same groups, however, are significant and thus redundancies exist.

4.3  Global Surface Characterization

While the 'moving window' method used above is computationally attractive and has been employed in many studies of terrain classification, the scale-dependent nature of topography limits its usefulness. Several studies have demonstrated the sensitivity of general geomorphometric parameters, such as slope, to changes in DEM resolution [Dubayah and Davis, 1988; Corbett and Gersmehl, 1987]. Besides, selection of the size of the moving window has, in many instances, been a somewhat arbitrary decision [Mark, 1975b], with the only criterion being a priori knowledge of the approximate size of landforms [Weibel and DeLotto, 1989].
If the moving window is too small, statistical significance of the resulting parameters would be too low because of a small sample size; if the window size is too large, smaller landforms may not be discriminated because geometric signatures are averaged out by the excessive size of the window. Different global characterization techniques, such as grain measure, spectral analysis, nested analysis of variance and fractal analysis, have been proposed to guide the selection of an appropriate window size based on the surface characteristics. The following sections discuss these techniques.

4.3.1  Grain

The above-mentioned local derivatives of elevation at a point, and moments of their distribution over an area, cover all geomorphometric concepts except for horizontal variation. Grain has been used generally to describe in some way the scale of horizontal variations in the terrain. Grain is used to refer to the size of area over which the other geomorphometric parameters are to be measured and is dependent on the spacing of major ridges and valleys [Wood and Snell, 1960]. The grain of a surface can be determined by calculating the local relief within concentric circles around a randomly-located point. According to Wood and Snell, the grain exists at a 'knick point' in the curve of local relief plotted against circle diameter, that is, the point where relief stops increasing dramatically with distance. The diameter at the knick point will then be the grain. It indicates the characteristic wavelength or optimal sampling area of the topographic surface.

4.3.2  Spectral analysis of terrain

Periodic landforms have been described often in the literature. It has been shown that regular spacing of landform elements may actually be a dominant aspect in many types of topography, including meandering river valleys, ridge-and-valley terrain, sand ripples and dunes, and some drumlin fields.
Information about horizontal variations in the topography may be obtained by studying the autocorrelation properties of elevation and its derivatives [Evans, 1972]. Topographic signatures could be made more discriminating by including a variable that is sensitive to the periodicity of terrain.

The spectrum is a quantity well known to electrical engineers. It is used in spectral analysis to examine the power content at different frequencies or wave lengths in signals. Spectral analysis was first used on terrain by engineers to determine surface roughness for military purposes [Pike and Rozema, 1975].

The basic premise of spectral analysis is that by transforming the data from the time, h(t), or space, h(x), domain into the frequency domain, H(f), certain types of interpretation and manipulation may be more easily performed. For every space domain feature there is a frequency equivalent which consists of one or more sinusoidal frequency components having distinctive wavelengths, phase leads or lags, and directions. Both domains contain exactly the same information [Robinson, 1973]. The representation of the terrain in the frequency domain greatly simplifies the separation of various surface forms or features of various sizes and amplitudes. Fourier transformation is most commonly used for frequency analysis. The two-dimensional Fourier transformation and inverse Fourier transformation equations can be expressed as:

\[ H(f_1, f_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \, e^{-2\pi i (f_1 x + f_2 y)} \, dx \, dy \tag{4.12} \]

\[ h(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} H(f_1, f_2) \, e^{2\pi i (f_1 x + f_2 y)} \, df_1 \, df_2 \tag{4.13} \]

where the data surface h(x, y) has coordinates (x, y), and the spectral array H, which is complex, has frequency (or wave number) coordinates (f_1, f_2).
For discretely sampled data, summations replace the integrals and equation (4.12) becomes:

\[ H(n_1, n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} h(k_1, k_2) \, e^{-2\pi i k_1 n_1 / N_1} \, e^{-2\pi i k_2 n_2 / N_2} \tag{4.14} \]

where:
h(k_1, k_2) = elevation at (k_1, k_2) on the surface;
N_1, N_2 = total number of elevation sample points along each direction.

The two-dimensional Fourier transformation involves the fitting, by using the least-squares criterion, of a set of parallel sinusoidal waves of varying wave length and orientation. The transformed coefficients define the amplitude, phase (point at which the particular wave intersects the origin) and orientation of each wave. The original elevation data array is replaced by a new array of the same size which describes the power information content (or magnitudes) at different wave lengths in the fluctuations of the terrain surface. The coordinates are now wave length rather than distance from an origin. The amplitude of the wave is a measure of the variability at that wave length. The transformed array, therefore, presents a ranking of the variability as a function of magnitude and direction of scale [Rayner, 1972]. The inverse Fourier transformation simply transforms the ranked variabilities back to space domain data.

The continuous variance spectrum can be estimated from the autocorrelation results by integrating the autocovariance function for all lag intervals according to one of the Fourier functions. Theoretically, the actual continuous spectrum of a random series can only be determined from an infinite record. Most spectra, however, are derived from only small samples of much larger data populations. Therefore, many statistical considerations are important in the selection of an appropriate procedure to estimate the true spectrum from a sample of finite length. First, spectral theory assumes that the series examined is stationary; that is, the mean and higher moments of the series are time (or space) invariant.
The statistical properties of a topographic profile must remain unaffected by any change in its location or orientation. Long regional topographic trends may cause nonstationarity and distort other parts of the spectrum by dramatically increasing the variance in all wave length bands, and thus should be removed or filtered out before the variance spectrum is estimated. The second consideration is the aliasing problem in spectral analysis, which involves the distortion of the 'true' spectrum by wave lengths less than twice the sampling interval. Aliasing error is inherent in all topographic profiles or surfaces which are constructed from sampled data. The distance between sample points limits the range of frequencies that can be determined. The highest detectable frequency, the 'Nyquist,' has a wavelength of twice the sampling interval. Sampling theory shows that frequencies higher than this limit may be present in the data set but they cannot be detected or measured. Their amplitudes, however, appear as aliased additions to the amplitudes of lower and presumably more accurately measured frequencies. Aliasing can often be overcome by choosing a sampling interval no larger than half the size of the smallest important feature to be detected. Alternatively, it can be minimized by using a digital band-pass filter, which extracts a limited band of frequencies. Filtering is accomplished by multiplying the amplitudes of the frequency components by some value. Usually the multiplier is zero for frequencies that are to be deleted and unity for those that are to be retained unchanged [Robinson, 1973]. The third consideration is tapering the finite-length series. It is often desirable to taper a random series at each end by multiplying the series by a 'data window,' analogous to multiplying the correlation function by a lag window [Otnes and Enochson, 1978]. It is equivalent to applying a convolution operation to the 'raw' Fourier transform.
The purpose of tapering, viewed from its frequency domain effect, is to suppress large side lobes in the effective filter obtained with the raw transform. Viewed from the space domain, the effect of tapering is to 'round off' potential discontinuities at each end of the finite segment of a long series being analyzed.

By means of the spectrum, a statistical plot of the wave length of terrain undulations against the variance (square of the amplitude) of the different size-ranges of terrain undulations, spectral analysis can ascertain and express the spacing of topographic periodicities and measure two other landform properties: absolute terrain roughness and the relation of large-scale to small-scale roughness characteristics [Pike and Rozema, 1975].

Two kinds of spectral analysis can be used in topographic studies: the one-dimensional spectrum for a topographic profile, and the two-dimensional spectrum for a matrix of elevations. Each method has its advantages and disadvantages. The major advantage of two-dimensional terrain spectral analysis is that it explicitly retains any directionality in the topography [Steyn and Ayotte, 1985] and thus provides the most complete representation of the scales of terrain variability. Anisotropy is often encountered in areas with regional geologic or structural fabrics. Topographic anisotropy is plainly visible in two-dimensional power spectra, that is, in isopleth plots of spectral amplitude in wavenumber space. Perfectly circular isopleths centred on the (0, 0) point in wavenumber space would represent a totally directionless topography. Any systematic deviation from circularity would then indicate directional bias.

Figure 4.4 illustrates an example of two-dimensional spectral analysis of the B.C. 5' DEM. The standard two-dimensional discrete Fourier transform function fft in the Splus software package was used for the analysis.
Before the Fourier transform analysis, the data arrays were first averaged, and the average elevation subtracted from each point in order to remove the large amplitude spike at zero wavenumber which represents the average terrain elevation. Then all topographic trends were removed and a cosine taper was applied to the resulting DEM matrices in order to reduce the variance introduced to the spectral estimates by the edge discontinuity. The spectrum in Figure 4.4 has symmetry about the origin and it demonstrates a considerable directional bias. It shows marked anisotropy, with variance contributions extending to much greater wavenumbers in a northeast to southwest direction. Such a pattern represents a strongly ordered terrain with a series of ridges and valleys running in a northwest to southeast direction which, in this case, represents the structural lineation of the Coast and Rocky Mountains and the trellis drainage pattern.

One disadvantage of two-dimensional spectral analysis is that, instead of providing a single measure of surface wave length, it produces a matrix of power at various wave lengths and directions. One-dimensional spectral techniques are, therefore, of continuing value for dealing with either the average of the two-dimensional spectrum over all directions, or the spectrum of profiles across or along the 'grain' (i.e., the longest significant wave length in the topography) of the surface. In the case of directionless topography, an isotropic spectrum simply collapses to the simpler, one-dimensional case. The one-dimensional spectrum organizes the geometry of a terrain profile according to different sizes (or wave lengths) of topographic undulations.
Since the greatest vertical variability within an original topographic profile is almost always associated with the longer wave lengths, the spectral density function of the variance (i.e., the Fourier transform of the autocovariance function) diminishes rapidly with decreasing topographic wave length or increasing frequencies. According to Pike and Rozema [1975], each spectrum plot conveys mainly the following information:

1)  spikes in the plotted spectrum curve mean periodicities of topography. A prominent peak in a topographic spectrum indicates that the original profile contains an unusually high number of terrain undulations within a restricted horizontal scale range, or perhaps a few undulations at that wave length may have unusually high local relief. While one would imagine that this periodic behaviour could be most easily detected by simply examining the original terrain profiles, Pike and Rozema explain that "the advantage of interpreting spectral peaks lies not in corroborating evident regularities, but rather in identifying unexpected or less obvious periodicities in the topography."

2)  the overall displacement of the curve above the horizontal axis measures variance in elevation of the profile and indicates absolute roughness of the original topography at the indicated wave lengths.

3)  the slope of the curve expresses the relative importance of large-scale and small-scale landforms. Steeper spectral curves mean proportionately less roughness in small topographic features than in larger features. A lower spectral slope indicates that small topographic features are rougher than large features, a common natural condition. In general, according to Frederiksen et al. [1985], if the slope of the spectrum is larger than 2.5, the landscape is smooth because of the absence of high amplitudes at high frequencies. On the other hand, a slope less than 2.0 indicates a rough surface with relatively large variations at high frequencies.
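As a minimal sketch of the one-dimensional case, the following Python fragment removes the mean from a profile, computes a raw discrete Fourier transform, and reads off the wave length of the strongest spectral peak. It is an illustration only, not the Splus procedure used in this research: the function names, the synthetic profile (a 50 m undulation at 800 m wave length plus a 5 m ripple at 400 m, sampled every 100 m) and the normalization of the power values are all assumptions, and no detrending or tapering is applied.

```python
import cmath
import math


def periodogram(profile, spacing=1.0):
    """Return (wave length, power) pairs for a 1-D elevation profile.

    The mean is subtracted first to remove the zero-frequency spike.
    Only harmonics up to the Nyquist limit (wave length = 2 * spacing)
    are returned. Power is |coefficient|^2 / n^2 (an assumed scaling).
    """
    n = len(profile)
    mean = sum(profile) / n
    z = [v - mean for v in profile]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(z[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        wavelength = n * spacing / k
        spectrum.append((wavelength, abs(coeff) ** 2 / n ** 2))
    return spectrum


def dominant_wavelength(spectrum):
    """Wave length of the largest spectral peak."""
    return max(spectrum, key=lambda pair: pair[1])[0]


# Hypothetical profile: 64 samples at 100 m spacing.
n, spacing = 64, 100.0
profile = [1000.0
           + 50.0 * math.sin(2 * math.pi * j * spacing / 800.0)
           + 5.0 * math.sin(2 * math.pi * j * spacing / 400.0)
           for j in range(n)]

spec = periodogram(profile, spacing)
print(dominant_wavelength(spec))   # wave length of the main periodicity
print(spec[-1][0])                 # shortest resolvable (Nyquist) wave length
```

In this constructed case the peak falls at the 800 m undulation, and the shortest wave length in the output is twice the sampling interval, in line with the Nyquist limit discussed above. A real profile would need trend removal and tapering first.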
4.3.3  Nested analysis of variance

Another way of examining scale effects in digital terrain modelling is through the use of a nested analysis of variance technique. This method was first proposed by Moellering and Tobler [1972]. It is analogous to spectral analysis but more adapted to the types of data available in geography. That is, it can be used not only for situations in which the data are available at uniform intervals of space, but also for any nested hierarchical geographical data which are irregularly spaced. Human society generally is arranged into nested hierarchies such as governmental units (townships, counties, provinces, and nations) or census tracts. The geographical hierarchy orders different levels by areal size, and this can be taken as a surrogate for scale or resolution. Analyzing the data at different levels of the hierarchy is thus equivalent to analyzing the data at different geographical scales [Moellering and Tobler, 1972]. Analysis of variance (ANOVA) is done through a hierarchical investigation of a region, where different classes are not different zones in the region as in more conventional applications of ANOVA, but rather are different resolutions of the same zone. The idea is to find the resolutions (scales) where significant variability exists in the data, assuming that the best resolution for studying a phenomenon is the one "where the action is", or where it is not. This may give clues to the scale at which the phenomena of concern are operating.

Regular grids in a DEM present a simplified situation for the application of a nested ANOVA. Grouping of the grid cells is a natural way to form a hierarchy from such data, similar to the concept of a quadtree structure. Figure 4.5 illustrates the nested hierarchical structure of a square matrix. As shown in the figure, the square grid cells nest in a ratio of 4:1 from the lower level to the higher level.
The four levels shown would serve as classes in a nested ANOVA procedure. Using these levels as classes, the variance at each resolution can be identified as a percent of the total variance by comparing the mean at one level to the mean of the level just above it. The total sum of squares is given by:

\[ SS_{total} = \sum_{i,j,k} \left( X_{ijk} - \bar{X}_{...} \right)^2 \tag{4.15} \]

where X_{ijk} is the data value at each observation and \bar{X}_{...} is the grand mean. The sum of squares at each level is given by:

\[ SS_1 = \sum_{i,j,k} \left( \bar{X}_{i..} - \bar{X}_{...} \right)^2, \qquad SS_2 = \sum_{i,j,k} \left( \bar{X}_{ij.} - \bar{X}_{i..} \right)^2, \qquad SS_3 = \sum_{i,j,k} \left( X_{ijk} - \bar{X}_{ij.} \right)^2 \tag{4.16} \]

where each \bar{X} represents the mean of the area as shown in Figure 4.5. The total sum of squares will equal the sum of components at each level. That is,

\[ SS_{total} = SS_1 + SS_2 + SS_3 \tag{4.17} \]

The above equation shows how the variation around the grand mean may be partitioned into parts attributable to the various scale levels.

Figure 4.6 shows a set of artificially constructed data used by Moellering and Tobler to demonstrate this technique. The results of a nested ANOVA on that data are shown in Table 4.1, a typical analysis of variance table as developed by R.A. Fisher. In contrast to an ordinary analysis of variance procedure, the significance test using the F-ratio was omitted from the table by the authors because they assumed a complete enumeration of the population rather than just a sample. As is clearly indicated in the table, all of the "action" is on levels 1 and 4 while there is no "action" on levels 2 and 3.

[Figure 4.5  Hierarchical structure of a square matrix (panels: Level 0, Level 1, Level 2, Level 3).]
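The partition of the total sum of squares can be checked numerically. The sketch below, in Python, builds a 16 by 16 surface with the same pattern as the artificial data of Moellering and Tobler (checkerboards of 2 and 5 in two opposite quadrants, of 5 and 8 in the other two) and computes the sum of squares at each of four nested levels. The function names and the choice of block sizes (8, 4, 2 and 1 cells on a side, for levels 1 to 4) are assumptions made for illustration.

```python
def block_means(grid, size):
    """Mean of each size-by-size block, expanded back to the full grid shape."""
    n = len(grid)
    out = [[0.0] * n for _ in range(n)]
    for r0 in range(0, n, size):
        for c0 in range(0, n, size):
            cells = [grid[r][c]
                     for r in range(r0, r0 + size)
                     for c in range(c0, c0 + size)]
            mean = sum(cells) / len(cells)
            for r in range(r0, r0 + size):
                for c in range(c0, c0 + size):
                    out[r][c] = mean
    return out


def nested_sums_of_squares(grid, block_sizes):
    """Sum of squares at each level of the hierarchy.

    Each level compares the block mean at that level with the mean of the
    level just above it, summed over every cell. block_sizes runs from the
    coarsest blocks to individual cells (size 1).
    """
    n = len(grid)
    upper = block_means(grid, n)          # grand mean in every cell
    result = []
    for size in block_sizes:
        lower = block_means(grid, size)
        result.append(sum((lower[r][c] - upper[r][c]) ** 2
                          for r in range(n) for c in range(n)))
        upper = lower
    return result


def checker(low, high, r, c):
    """Checkerboard value for cell (r, c)."""
    return low if (r + c) % 2 == 0 else high


grid = [[checker(2, 5, r, c) if (r < 8) == (c < 8) else checker(5, 8, r, c)
         for c in range(16)] for r in range(16)]

ss_levels = nested_sums_of_squares(grid, [8, 4, 2, 1])
ss_total = sum((v - 5.0) ** 2 for row in grid for v in row)
print(ss_levels, ss_total)
```

For this surface the sums of squares come out as 576, 0, 0 and 576 for levels 1 to 4, adding to the total of 1152, which agrees with the values reported in Table 4.1.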
2 5 2 5 2 5 2 5 5 8 5 8 5 8 5 8
5 2 5 2 5 2 5 2 8 5 8 5 8 5 8 5
2 5 2 5 2 5 2 5 5 8 5 8 5 8 5 8
5 2 5 2 5 2 5 2 8 5 8 5 8 5 8 5
2 5 2 5 2 5 2 5 5 8 5 8 5 8 5 8
5 2 5 2 5 2 5 2 8 5 8 5 8 5 8 5
2 5 2 5 2 5 2 5 5 8 5 8 5 8 5 8
5 2 5 2 5 2 5 2 8 5 8 5 8 5 8 5
5 8 5 8 5 8 5 8 2 5 2 5 2 5 2 5
8 5 8 5 8 5 8 5 5 2 5 2 5 2 5 2
5 8 5 8 5 8 5 8 2 5 2 5 2 5 2 5
8 5 8 5 8 5 8 5 5 2 5 2 5 2 5 2
5 8 5 8 5 8 5 8 2 5 2 5 2 5 2 5
8 5 8 5 8 5 8 5 5 2 5 2 5 2 5 2
5 8 5 8 5 8 5 8 2 5 2 5 2 5 2 5
8 5 8 5 8 5 8 5 5 2 5 2 5 2 5 2

Figure 4.6  A numerical example of a 16 by 16 even hierarchy (after Moellering and Tobler, 1972)

Table 4.1  Scale variance for the artificial data in Figure 4.6

Scale Level   Sum of Squares   Percent of Total SS   DF    Mean Square   Scale Variance Component
4             576.0            50.0                  192   3.0           3.0
3             0.0              0.0                   48    0.0           0.0
2             0.0              0.0                   12    0.0           0.0
1             576.0            50.0                  3     192           3.0
0             1152.0           100.0                 255   -             -

4.3.4  Description of Terrain as a Fractal Surface

Terrain models can be described by the concept of self-similarity and fractals. 'Fractals,' or fractional dimensions, were first presented by Mandelbrot [1975] as a new mathematical basis for describing many complex scale-invariant natural patterns. Mandelbrot introduced the term 'fractal' specifically for temporal or spatial phenomena that are continuous but not differentiable, and that exhibit partial correlations over many scales [Burrough, 1981]. By studying various natural phenomena such as coastlines, Mandelbrot observed a form of invariance with respect to changes in scale over a wide range, and he introduced the concept of self-similarity to describe this phenomenon [Frederiksen et al., 1985].

By definition, self-similarity is a property of certain curves where each part of the curve is indistinguishable from the whole, or where the form of the curve is invariant with respect to scale.
That is, a spatial feature is said to be self-similar if any part of the feature, when appropriately enlarged, is indistinguishable from the whole feature. In the case of natural phenomena such as coastlines, self-similarity must be interpreted statistically; that is, each part of the curve should be statistically indistinguishable from the whole [Goodchild, 1980]. Either in a strict or a statistical sense, fractals are invariant as a result of transformations of scale [Mandelbrot, 1977].

This concept may be illustrated by measuring coastline length. Any objective method used to measure the length of an irregular line has an implied sampling interval, in the form of the diameter of a wheel rolled along the line or the step size of a pair of dividers. It has been shown that over a wide range of scales there is a tendency for the length of a coastline to behave in a predictable manner, such that when length is plotted against sampling interval on logarithmic scales the points tend to follow a straight line. Not only does more detail become apparent at larger scales, but it tends to do so at a predictable rate. Suppose that the length of the coastline is measured using dividers with step size $s_1$ and that $n_1$ such steps are found to span the line; then the length of the line will be estimated by $n_1 s_1$. If the process is repeated with a smaller step size, $s_2$, another length estimate, $n_2 s_2$, will be obtained and it will be greater than or equal to the previous estimate. The more irregular the line, the greater the increase in length between the two estimates.

One may define the Hausdorff-Besicovitch dimension (D) of an irregular line as follows [Goodchild and Mark, 1987]:

\[ D = \frac{\log(n_2 / n_1)}{\log(s_1 / s_2)} \tag{4.18} \]

A similar definition can be applied to the situation of using a square grid to measure the area of an irregular patch or a rough surface. The value of D characterizes the intricacy or the jaggedness of the entity.
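As a worked illustration of equation (4.18), the following Python sketch evaluates D from two divider measurements. The function name and the example figures are assumptions: the first case uses the step counts of the Koch curve (4 steps of length 1/3, then 16 steps of length 1/9), and the second a straight line, for which halving the step simply doubles the count.

```python
import math


def divider_dimension(s1, n1, s2, n2):
    """Dimension D from two divider walks: n1 steps of size s1, n2 steps of size s2."""
    return math.log(n2 / n1) / math.log(s1 / s2)


# Koch curve: the length estimate grows from 4/3 to 16/9 as the step shrinks.
d_koch = divider_dimension(1 / 3, 4, 1 / 9, 16)

# Straight line: the length estimate is unchanged (10 * 1.0 = 20 * 0.5).
d_line = divider_dimension(1.0, 10, 0.5, 20)

print(round(d_koch, 4), d_line)
```

The Koch curve gives D = log 4 / log 3, about 1.26, while the straight line gives D = 1, since its estimated length does not increase with finer steps.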
Lines have dimensions varying from 1 to 2, while surfaces are described by values of D ranging from 2 to 3. As D increases towards the upper limit of the range, the entity becomes so rough and irregular that the line would take up the whole of a plane space and the surface would fill a volume space.

A fractal, according to Mandelbrot's original definition [1977], is a set for which the Hausdorff-Besicovitch dimension (D) strictly exceeds the topological dimension in Cartesian geometry (e.g., 0 for points; 1 for lines; 2 for areas); for such functions, D is often termed the fractal dimension. It appears, however, that most real spatial entities such as coastlines are not fractals in the pure sense of having a constant D but in a looser sense of exhibiting the behaviour associated with noninteger dimensions [Goodchild and Mark, 1987]. Mandelbrot's revised definition of fractals states more openly that a fractal is a shape made of parts similar to the whole in some way. Self-similarity is an important concept of fractal geometry and fractal scaling is easily demonstrated using self-similar fractals. In this compound term, while 'fractal' points to disorder and is compatible with intractable irregularity, 'self-similar' points to a kind of strict order [Mandelbrot, 1977]. However, the concepts of 'fractal' and 'self-similarity' should not be confused. In fact, not all self-similar objects are fractals. A straight line is one such example.

Several different but related methods have been proposed to determine the fractal dimension of surfaces: the dividers method, the cell counting method, and the variogram method [Goodchild, 1980; Burrough, 1981; Shelberg et al., 1982; Shelberg et al., 1983; Mark and Aronson, 1984; Eastman, 1985; Roy et al., 1987; Clarke and Schweizer, 1991; Klinkenberg and Goodchild, 1992].
Among them, the cell counting and the variogram methods use the elevation values directly, while the dividers method uses the contour lines derived from the elevations. There is still controversy in the literature over the effectiveness of some of the different measurement methods and how much variation to expect between different methods. Therefore, the search for more robust methods of estimating fractal dimension is a topic of current research [e.g., Clarke and Schweizer, 1991].

Surface fractal models may be created through fractional Brownian motion (fBm) [Mandelbrot, 1975; Goodchild, 1982; Burrough, 1983], which in practice may be generated from fractional Gaussian noise. The fractional Brownian functions are characterized by variograms of the form:

\[ E\left[ \left( Z(x) - Z(x+d) \right)^2 \right] = C \, |d|^{2H} \tag{4.19} \]

where C is a constant, E[.] denotes the mathematical expectation of the argument, d is the distance (or lag) between two points, and Z(x), Z(x+d) are the elevation values observed at locations x and x+d respectively. A variogram is often shown as a plot of squared differences between paired observations, averaged by distance bins, against distance between paired observations. The essence of the variogram technique is that the statistical variation of the elevations between samples is some function of the spacing between them. This technique is the principal tool of a method of spatial modelling known as regionalized variable theory, and it is well known in the geosciences for determining the degree of spatial dependence between samples when modelling geological, geochemical and geophysical phenomena [Davis, 1986]. The term regionalized variable was coined by Matheron [1963] to describe variables whose variation in space is erratic and often unpredictable from one point to another.
Yet, the behaviour of these variables is not completely random; values taken at neighboring points are related by a complex set of correlations reflecting the structure of the underlying phenomena. That is, a regionalized variable has properties intermediate between a purely deterministic process and a purely random one [Carr, 1995]. The variogram takes on the form of a power function in which the parameter H varies between 0 and 1, and relates to the fractal dimension of the surface through

\[ D = 3 - H \tag{4.20} \]

H is an indicator of the surface complexity [Burrough, 1981]. The smaller H is, the larger D is and the more irregular the surface. Conversely, the larger H is, the smaller D is and the smoother the surface. In the case of profiles

\[ D = 2 - H \tag{4.21} \]

The fractional Brownian model (fBm) provides a method of generating irregular, self-similar surfaces that resemble topography and have known fractional dimension [Mandelbrot, 1975]. It allows simulation of terrain with specific surface characteristics. Figure 4.7 shows example realizations of fBm for various values of H.

If the surface has fractal characteristics, then H is approximated from the slope, β, of the best-fitting line produced from the plot of the mean-squared difference in the elevations as a function of the distance between samples on log-log scales. That is, H = β/2 or D = 3 − (β/2). The ordinate-intercept of the best-fitting line represents the expected difference in elevation for points a unit distance apart and is referred to as the gamma value by Klinkenberg and Goodchild [1992]. It has been shown that gamma captures some aspect of the magnitude of the roughness of the surface and has significant correlations with the standard deviations of the elevations. The combination of the fractal dimension and the gamma value is anticipated to capture the essential characteristics of the land surfaces.
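The variogram estimate of H can be sketched in a few lines of Python. The fragment below computes the mean-squared elevation difference at several lags along a profile and fits a least-squares line on log-log scales; the function names, the choice of lags and the test profile are assumptions for illustration. A uniform ramp is used as a sanity check because its variogram is known exactly: the squared difference grows as the square of the lag, so β = 2, H = 1 and, for a profile, D = 2 − H = 1.

```python
import math


def variogram(profile, lags):
    """Mean-squared elevation difference at each lag (lags in sample units)."""
    values = []
    for d in lags:
        diffs = [(profile[i + d] - profile[i]) ** 2
                 for i in range(len(profile) - d)]
        values.append(sum(diffs) / len(diffs))
    return values


def loglog_fit(x, y):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return slope, my - slope * mx


# Hypothetical profile: a ramp rising 0.5 units per sample.
ramp = [0.5 * i for i in range(100)]
lags = [1, 2, 4, 8, 16]

beta, intercept = loglog_fit(lags, variogram(ramp, lags))
H = beta / 2
D_profile = 2 - H
print(beta, H, D_profile, math.exp(intercept))
```

The exponential of the intercept is the mean-squared difference at unit lag (0.25 for this ramp). For a real profile, the linearity of the log-log plot should be inspected before the slope is interpreted as a fractal parameter.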
Of course, a real fractal surface is ultimately unmappable because by definition it includes variation at all spatial resolutions from zero to infinity. Fractal characteristics can only be assessed within a limiting maximum resolution (corresponding to the overall size of the sample) and minimum resolution (corresponding to the sampling interval) [Goodchild and Tate, 1992].

If the statistical behaviour of the land surface is not similar to that of a fractional Brownian surface, then the variogram-derived dimension may not represent the fractal dimension. One way of judging the goodness of fit of the self-similarity model is to observe the linearity of the plots used in the determination of the fractal dimension [Yokoya et al., 1989]. This can be accomplished by applying an interactive least-squares line fitting procedure during the analysis stage.

Self-similarity implies that the spatial structures inherent within a landscape repeat themselves at all scales, that is, a part of the whole resembles the whole [Clarke, 1987]. Empirical research results show, however, that the self-similar fractal model provides a very good fit for some land surfaces but an imperfect fit for some others [Klinkenberg and Goodchild, 1992]. As shown by Mark and Aronson [1984], it is not likely that fractal characteristics are displayed by terrain as a whole. Real surfaces are rarely pure fBm, but often contain departures from the fBm model in the form of local linear trends and scale dependencies [Goodchild and Tate, 1992]. Real surfaces are often multi-fractal in that D is observed to vary both spatially and with scale. This is because earth-forming and earth-moving processes have been at work on almost all landscapes, for varying amounts of time, and as a result the landscape bears the 'forms' or manifestation of the scales at which these processes operate or operated in the past [Clarke, 1987].
Topography may have many statistical properties similar to fractional Brownian surfaces, but different dimensions may apply over different scale ranges separated by distinct scale breaks. For scale ranges between adjacent breaks, surface behaviour should be that predicted by the fractal model; the breaks represent characteristic horizontal scales at which surface behaviour changes substantially. These scale breaks are especially important for digital terrain modelling, since they represent scales at which there is a distinct change in the relation between sampling interval and associated error [Mark and Aronson, 1984].

Mandelbrot's D can be used as an indicator of the complexity of autocorrelations over many scales for natural phenomena. For surfaces with high fractal dimension, the autocovariance is low, and points cannot be accurately predicted (interpolated) from the elevations of neighboring points. Thus, considerable information is lost as the sampling interval is increased. Conversely, when the fractal dimension is low, the surface is 'smooth,' and elevations can be interpolated from their neighbors; the rate at which precision decreases as the sampling interval is increased would be much slower [Mark and Aronson, 1984].

However, although many natural phenomena do display certain degrees of statistical self-similarity over many spatial scales, there are others that seem to be structured and have their levels of variability clustered at particular scales. This behaviour does not exclude them from the fractal concept. Mandelbrot considers that it is quite acceptable to have a series of zones of distinct dimensions connected by transition zones. If this is reasonable, it means that the examination of D values would be useful for trying to separate scales of variation that might be the result of particular natural processes.
Moreover, identifying such scales could be of enormous practical value because one could then tailor sampling to a particular scale range of the phenomenon in question, thereby improving the efficiency of expensive field investigations and the resulting interpolations [Burrough, 1981].

Research by Klinkenberg and Goodchild [1992] also shows that the fractal dimension captures an aspect of the land surface which is not reflected in the traditional morphometric parameters. Therefore, it will be combined with other parameters to quantitatively characterize the terrain surface. As shown by Goodchild [1980], some error in computer representations of surfaces is related to sampling interval through the fractal dimension. Changes in the slope of the best-fit line indicate breaks in scaling where the fractal dimension changes. Thus those breaks would be of key significance in assessing the efficiency of sampling densities for digital elevation models.

4.4  Hierarchical Terrain Classification

4.4.1  Principles of terrain classification

Classification, the ordering of the basic units (usually referred to as operational taxonomic units, OTUs, by numerical taxonomists) into groups on the basis of their relations and a specified set of criteria, is one of the fundamental procedures in any scientific discipline. Geographers have traditionally been concerned with the description and differentiation of the earth's surface [Mather, 1972]. In the late 1960s and early 1970s there was a considerable amount of effort devoted to the development and application of numerical methods of classification (numerical taxonomy) to geographical data [Burrough, 1986].

Cluster analysis is the name given to a bewildering assortment of techniques designed to perform classification by assigning observations to groups so that each group is approximately homogeneous and distinct from the other groups [Davis, 1986].
In general, two types of approaches are identified in areal classification or clustering: hierarchical and nucleated. Hierarchical groupings usually involve a varying threshold which denotes the level of connection necessary for potential group entrance. OTUs that are highly connected are grouped together at an early stage, and as the threshold is lowered more OTUs or groups merge to form classes at different levels of generalization until all OTUs are joined to form a single class. The result is a hierarchy of classes and is typically illustrated in the form of a linkage tree diagram or dendrogram. This will be useful in the examination of the role of scale and, therefore, will be the method used in this research. In hierarchical clustering, partitions are achieved by cutting a dendrogram after examining the difference between fusion levels or deciding on the appropriate number of clusters. Hierarchical classification techniques may be further subdivided into agglomerative methods, which proceed by examining the similarities between individual data points before fusing them into groups, and divisive methods, which consider the whole data set first and then examine the best ways to successively subdivide it into finer groups [Dunn and Everitt, 1982]. A large number of algorithms for cluster analysis have been developed and many are today available in the form of well-documented computer packages (e.g., Splus).

The nucleated clustering method can be used for the classification of objects which can be represented as points in Euclidean space of some number of dimensions. The aim is to partition a set of n points into m groups so as to minimize the total within-group variance (i.e., sum of squares) about the m nuclei. It operates on the similarity between the observations and a set of m arbitrary starting points that serve as initial group centroids. The observation closest or most similar to a starting point is combined with it to form a cluster.
Observations are iteratively added to the nearest cluster, whose centroid is then recalculated for the expanded cluster. This appears to be a more reasonable method of obtaining compact clusters since long chains of points are avoided.

4.4.2  Concept of distance

Most clustering methods use a concept of 'distance' in the variable (or property) space as a measure of similarity. The closer the data points in m-dimensional space, the more similar they are. The property values form a multivariate description that can be partitioned into groups through a number of different criteria. Thus, the range of methods for calculating similarities and for establishing linkages between items and groups (e.g., single-, average-, or complete-linkage) leads to a large number of possible classification strategies. Ideally, if the data are well-structured, then different methods should yield similar results [Burrough, 1986]. The goal of the different grouping criteria is usually to minimize the distances within groups and maximize the distances between groups.

The similarity between a pair of data points with a set of m properties in the data space can be estimated in several ways. The method used will depend on the types of property studied, and on the way in which the information has been coded. The complement of the similarity of two units is their dissimilarity, and in many cases it will be this measure that is determined from the data. Sokal and Sneath [1963] identify three types of measures of similarity or dissimilarity: coefficients of association, correlation coefficients and distance measures. Spence and Taylor [1970] add a fourth type of measure, the probability approach.

Coefficients of association involve presence/absence (i.e., binary) data arranged into a 2x2 frequency table of OTU against OTU. The best known measure of association is the $\chi^2$ statistic and it has been widely used in taxonomy, particularly in divisive approaches.
When dealing with data other than on a binary scale some other measure of similarity must generally be used.

Correlation measures exist for all levels of measurement scale. For example, the non-parametric measures (the contingency coefficient and Spearman's rank correlation coefficient) can be used for nominal and ordinal data respectively. The most commonly used correlation measure of similarity for interval/ratio data is Pearson's product-moment correlation coefficient, described in most statistical textbooks. It should be pointed out, however, that not all taxonomists believe this coefficient to be a suitable measure of similarity.

Distance is a more restrictive measure of dissimilarity. It assigns a zero value to identical items and increasingly large (positive) values as the proximity of the items decreases [Murtagh and Heck, 1987]. Given the coordinates of any two points on a plane, the distance between them can be easily calculated by using the Pythagorean sum of squares equation. This equation can be generalized to measure distances between points in m-dimensional space, given their coordinates. Thus the simplest distance measure is:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left( X_{ik} - X_{jk} \right)^{2}}$$

where $d_{ij}$ is the distance between points i and j in a space of m dimensions, with $X_{ik}$ being the value of property k for point i. Such an expression arises from the assumption that the m properties are represented by m orthogonal axes (i.e., axes at right angles to each other). As one would expect, a low distance indicates the two points are similar or "close together," whereas a large distance indicates higher dissimilarity.

The Euclidean distance measure defined above is conceptually the most straightforward and has been the dissimilarity measure most widely used in numerical taxonomy. It is important to note, however, that this measure should only be used where the space is orthogonal [Spence and Taylor, 1970]. Otherwise many of the properties of Euclidean geometry will no longer apply, including Pythagoras's theorem.
However, in practice, because the properties examined are correlated, the assumption of orthogonality will not be justified and the Euclidean distance will be a poor measure of distance between OTUs. This problem can be overcome either by using oblique coordinate axes and a measure such as Mahalanobis' generalized distance or by transforming to principal component axes [Dunn and Everitt, 1982]. Mahalanobis' distance measure has the advantage that it allows for correlations between variables. Principal component analysis is a mathematical technique for examining the relations between a number of individuals, each having a set of properties. The original data are transformed into a set of new properties, called principal components, that are linear combinations of the original variables. The principal components have the property that they are orthogonal to (independent of) each other.

4.4.3  Variable standardization

Another important point to make clear is that Euclidean distance calculated directly on the raw data may make little sense if the m properties observed on each OTU have different scales or measurement units, as is often the case. Some scaling of the data will be required before using a distance. Thus all properties should be standardized prior to computing Euclidean distances. By this is meant expressing each property in units of standard deviations. This standardization transformation is done for each property dimension (k) simply by subtracting the mean of each variable distribution ($\bar{x}_k$) from each raw data value ($x_{ik}$) and dividing by the standard deviation ($s_k$) of all the data values of that property. That is:

$$z_{ik} = \frac{x_{ik} - \bar{x}_k}{s_k} \qquad (4.23)$$

This is also called the standard score (or Z score) for an element. It ensures that each variable is weighted equally. Otherwise, the distance will be influenced most strongly by the variable which has the greatest magnitude.
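The standardization of equation 4.23 and the Euclidean distance of section 4.4.2 can be illustrated with a minimal sketch. It is not the thesis's own code (the thesis used Splus); the function names, the use of the sample standard deviation (`ddof=1`), and the three-OTU data set are assumptions made purely for illustration.

```python
import numpy as np

def standardize(X):
    """Column-wise standard scores: (x_ik - mean_k) / s_k.

    Assumption: s_k is the sample standard deviation (ddof=1);
    the text does not say which estimator was used.
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def euclidean_distances(Z):
    """Matrix of pairwise Euclidean distances d_ij between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical data: three OTUs described by relief (m) and slope (degrees).
X = np.array([[100.0, 5.0],
              [900.0, 6.0],
              [120.0, 30.0]])

D_raw = euclidean_distances(X)               # dominated by the relief column
D_std = euclidean_distances(standardize(X))  # both properties weighted equally
```

On the raw data the distance from the first OTU to the second is far larger than to the third, simply because relief is measured in larger units. After standardization the two distances are of similar size, which is the equal weighting that the text describes.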
4.4.4  Terrain classification

Automated terrain classification involves the partitioning of an area into homogeneous topographic regions through quantitative interpretation of a digital terrain model. The classification of elements in a DEM is performed in the same way as the classification of a remotely sensed image in digital picture processing. In remote sensing, land-cover is classified on the basis of spectral reflectance values over different wavelength bands. In the case of terrain classification, a variety of geomorphometric measures are employed in place of spectral signatures. The geomorphological data matrix (which consists of various geometric measures) is used as input for data processing and statistical classification using a hierarchical clustering method. The result of this procedure is a set of terrain units homogeneous in terms of their measured differentiating characteristics.

Weibel and DeLotto [1988] presented a generic sequence of steps involved in the process of automated terrain classification. It consists of three steps: specification of variables, extraction of geometric signature, and multivariate classification. These steps are shown graphically in Figure 4.8. Different variables may require that different extraction techniques be applied, and a different classification algorithm may be used depending on the type of variables selected. The first two steps (i.e., specification of variables and extraction of geometric signature) have been discussed and illustrated in earlier sections. Multivariate classification consists of the following two steps: (i) preprocessing for variable selection and/or transformation, and (ii) grouping of homogeneous areas into distinctive classes. Both steps are affected by a priori knowledge.
A logical structure is necessary to indicate how a set of variables over a group of observations can be ordinated, scaled according to some measure of similarity, and finally grouped or divided. The purpose of preprocessing is to remove multicollinearity between different variables. Variables that are used in a multivariate classification model should not be highly interdependent. Only if the variables are not correlated may it be assumed that each variable contributes an individual share to the explanation of topographic variation and is, thus, useful.

Figure 4.8  Generic sequence of automated terrain classification. [Flowchart boxes, in order: digital terrain model; specification of variables; extraction of geometric signature; classification (preprocessing, grouping); classes of homogeneous terrain characteristics. Side input: a priori knowledge.]

Two different methods may be used to achieve this: variable selection based on correlation analysis, or variable transformation (i.e., ordination) by a variety of multidimensional scaling methods such as principal components/factor analysis. Factor analysis transforms a highly interrelated and interdependent data set into a series of basic orthogonal or independent dimensions, so it can be used to disentangle unknown interdependencies in a particular set of data [Spence and Taylor, 1970]. Each new dimension represents an independent or orthogonal axis of variation on which an individual variable or observation can be rated. However, the new uncorrelated synthetic variables thus derived have to be interpreted anew since no one-to-one causal relationship exists between the original and the derived variables. Variable selection only eliminates significantly correlated variables; the remaining variables are part of the set of original variables. Interpretation of classification results may therefore be more easily accomplished if only variable selection is used [Weibel and DeLotto, 1988].
That is, the classification results can be analyzed in relation to the original roughness variables so that the characteristics of various terrain clusters are fully understood. For this reason, the variable selection method is used in this research.

4.5  What is next?

In the following chapter, a global characterization of the surface variabilities will be accomplished using the various methods discussed in section 4.3 in order to identify the significant scale levels or breaks in the study area surfaces. Those significant scales will then be used to guide the selection of the moving window sizes that will be tested for the extraction of the local roughness measures. Finally, a multivariate statistical analysis, based on the local geomorphometric measures derived from the study area DEMs, will be used for automated hierarchical terrain classification in which relatively homogeneous terrain types at different scale levels will be identified. Some detailed test results based on the study area multi-scale DEM data sets will then be presented. Figure 4.9 summarizes the steps that will be followed in the next chapter for the study area surface characterization and classification.

Figure 4.9  Major steps for study area topographic characterization. [Flowchart boxes: DEM; global characterization (grain, spectra, ANOVA, variogram); a priori knowledge; specification of local measures; selection of the moving window sizes; extraction of the local measures; variable group selection; hierarchical clustering; interpretation of the terrain clusters.]

5.  TOPOGRAPHIC CHARACTERIZATION RESULTS

5.1  The Role of Scale in Topographic Characterization

Because of the wide range of topographic variation present in different landscapes, elevation is no exception to the 'scale problem.'
In the case of digital terrain modelling, since the degree of roughness of elevation data is important when trying to make interpolations from sample point data, such as by least-squares fitting or kriging, it is worth examining the elevation data beforehand to see if the data contain evidence of variation over different scales, and how important these scales might be [Burrough, 1981].

As indicated in section 4.1, it is useful to determine some global characteristics of the surface from the study area DEMs. Firstly, they can be used to examine the relation between general surface characteristics and significant scale breaks. It is hoped that some general conclusions can be made with respect to the determination of appropriate/optimal window sizes for the extraction of local measures. Secondly, they may be helpful when making comparisons with regard to the usefulness of a DEM error model for various surfaces (e.g., 93G, 93H, and EMR1) and different resolutions (e.g., 50 m and 1 km).

This section will examine the role of scale in topographic characterization and interpret some global characterization results. The global characteristics of the whole study area and the two subareas will be identified by applying the grain measure, spectral analysis, nested analysis of variance and fractal analysis to the EMR1 and TRIM DEMs, following the methodologies discussed in section 4.3.

5.1.1  Grain determination

As discussed in section 4.3.1, grain can be used to guide the selection of the size of area over which the other local geomorphometric parameters are to be measured since it indicates the characteristic wavelength or optimal sampling area of the topographic surface. It is determined by the 'knick point' in the plot of local relief against circle diameter, that is, the point where the curve of local relief levels off and stops increasing markedly with distance.
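The grain plot described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the single fixed circle center, and the use of a square-grid DEM array are hypothetical, and the 'knick point' itself is still picked by inspecting the resulting curve, as in the text.

```python
import numpy as np

def local_relief_curve(dem, cell_size, center, diameters):
    """Local relief (max minus min elevation) inside circles of growing diameter.

    dem       : 2-D array of elevations on a square grid
    cell_size : grid spacing, in the same units as `diameters`
    center    : (row, col) of the circle center (an arbitrary choice)
    diameters : iterable of circle diameters to test
    """
    rows, cols = np.indices(dem.shape)
    dist = np.hypot(rows - center[0], cols - center[1]) * cell_size
    relief = []
    for d in diameters:
        inside = dem[dist <= d / 2.0]      # cells within the circle
        relief.append(inside.max() - inside.min())
    return np.array(relief)

# Hypothetical check surface: a uniform ramp rising 1 m per 1 m cell eastward.
ramp = np.tile(np.arange(101.0), (101, 1))
curve = local_relief_curve(ramp, cell_size=1.0, center=(50, 50),
                           diameters=[10, 20, 40])
```

On a uniform ramp the relief grows in proportion to the diameter and never levels off, so there is no knick point; on real terrain the curve flattens once the circle spans the dominant ridge-to-valley spacing.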
Figure 5.1a shows the graphs used to determine the grain of each subarea in the study area based on the TRIM DEMs. From these two graphs it can be seen that there exist several 'knick points' in each plot. The flatter subarea TRIM 93G has a grain of greater than 15 km (this represents the limits of the data due to the size of each subarea); however, there are some minor 'knick points' at about 9 km, 4.6 km, 3 km, and 1 km. The rougher subarea TRIM 93H has a smaller grain of 9 km and there is also a 'knick point' at about 4 km. For the whole study area, the grain is determined from the EMR1 DEM and the result is shown in Figure 5.1b. The curve in Figure 5.1b indicates some 'knick points' at about 155 km, 110 km, 46 km, and 9 km. Based on the size of the 'knick points' and the consistency of the values across data sets, it appears that 9 and 46 km are the significant scales in the study area surface.

5.1.2  Fourier interpretation

The scales at which "forms" exist in the landscape can be examined using Fourier analysis [Clarke, 1987] since the basis of this technique is to separate the characteristic 'forms' by scale. The Fourier interpretation of scale is the spatial wavelength, and this provides an operational definition for the study of processes in terms of scale, also known as spectral analysis. As indicated in section 4.3.2, the spectrum gives a measure of the proportion of a process which occurs at any one of a large number of scales. In the case of one-dimensional spectral analysis, the relation of large-scale to small-scale roughness is expressed in the slope of the spectrum plot, and the spacing of topographic periodicities is indicated by the humps in the spectrum.

Before using one-dimensional spectral analysis, the directionality of the study area topography needs to be examined so that several representative profiles can be extracted for the analysis.
The two-dimensional spectrum plot of the study area derived from the 5' DEM is shown in Figure 5.2a, which indicates an anisotropy in the study area similar to that in the spectrum of the whole of B.C. estimated from the 5' DEM (see Figure 4.4). Figure 5.2b presents another two-dimensional spectrum of the study area, derived from the 1 km EMR1 DEM. Again, anisotropy is evident. Figures 5.2c and 5.2d show the two-dimensional spectrum plots of the TRIM 93G and TRIM 93H subareas. The spectrum of the TRIM 93G subarea in Figure 5.2c indicates some directional bias along two axes (i.e., north-south and east-west directions), while the spectrum of the TRIM 93H subarea in Figure 5.2d displays a circular pattern of isopleths and, thus, no apparent anisotropy, a reflection of the dendritic drainage pattern in the area. The maximum wavenumbers resolvable by the above analyses are 0.5 km$^{-1}$ for the EMR1 DEM and 10 km$^{-1}$ for the TRIM DEMs because of their resolutions (i.e., 1 km EMR1 and 50 m TRIM DEMs); these are the Nyquist limits of one cycle per two grid cells.

[Figures 5.2a-d: two-dimensional spectrum plots; recoverable axis labels: North-South, Northing (m).]

Considering the characteristics of the two-dimensional spectrum plots for various surfaces presented above, several representative profiles were extracted from each DEM. Using the spec.pgram spectral analysis function in Splus, one-dimensional spectra of four different representative topographic profiles extracted from the TRIM 93G and TRIM 93H subareas were produced. Figure 5.3a displays a representative east/west direction topographic profile (i.e., prof.93g.ew) and its one-dimensional spectrum plot for the TRIM 93G subarea. Figure 5.3b shows a representative north/south direction topographic profile (i.e., prof.93g.ns) and its one-dimensional spectrum plot for the TRIM 93G subarea.
Figures 5.3c and 5.3d show representative east/west and north/south direction topographic profiles (i.e., prof.93h.ew and prof.93h.ns) and their spectrum plots for the TRIM 93H subarea.

[Figures 5.3a-d: topographic profiles and their one-dimensional spectra; recoverable axis labels: Elevation (m), spectrum.]

For the whole study area, six different profiles along various directions (i.e., emr.profile1 to emr.profile6) were extracted from the EMR1 DEM and examined. The first profile (i.e., emr.profile1) was extracted along an east/west direction. The second one (i.e., emr.profile2) was extracted along a southwest/northeast direction 30 degrees from the east/west direction. The third one (i.e., emr.profile3) was also along a southwest/northeast direction but 60 degrees from the east/west direction. The fourth one (i.e., emr.profile4) was along a north/south direction. The fifth one (i.e., emr.profile5) was along a northwest/southeast direction 30 degrees from the north/south direction. The sixth one (i.e., emr.profile6) was also along a northwest/southeast direction but 60 degrees from the north/south direction. In order to make the results easily comparable, each profile keeps the same sampling interval of 1 km. Therefore, the profiles that are not along the grid were interpolated from the original DEM. Figures 5.4a-f show these six profiles and their one-dimensional spectrum plots.

Plots of log amplitude (or power) against log wavelength often show a linear decline from high variability at long wavelengths (low frequencies) to low variability at short wavelengths. The resulting spectral functions can be analyzed and interpreted either visually or numerically. Based on the spectral analysis results shown in Figures 5.3a-d, the following interpretation can be made regarding some scale-related properties for the two subareas.
First of all, the two profiles from the TRIM 93H subarea show, as expected, much greater overall roughness than the profiles from the TRIM 93G subarea. From Figures 5.3a and 5.3b, the spectrum plots of an east/west and a north/south topographic profile from the TRIM 93G subarea, several prominent peaks can be identified. They appear around frequencies 0.125, 0.17, 0.23, 0.34, and 0.46. Since the sampling interval of the original profiles is 50 m, and the wavelength is the sampling interval divided by the frequency, these frequencies roughly correspond to wavelengths of 400 m, 300 m, 220 m, 150 m, and 105 m. From Figures 5.3c and 5.3d, the following three prominent peaks are identified for the TRIM 93H subarea: 280 m, 170 m, and 110 m. Based on the spectral analysis results for the whole study area (i.e., Figures 5.4a-f), it appears that the following important horizontal scales exist in the topography: 6.7 km, 5.6 km, 4.5 km, 3 km and 2.2 km. A major advantage of one-dimensional spectral analysis over other global measures is thus its capability of identifying unexpected or less obvious periodicities in the topography.

[Figures 5.4a-f: EMR1 topographic profiles and their one-dimensional spectra; recoverable axis label: Elevation (m).]

The strength of each of the above periodicities varies, as reflected by the width and height of the spectral peak. If the hills and valleys traversed by profiles were randomly dispersed, both horizontally and vertically, then the resulting variance spectra would be linear, featureless plots. It can be seen from the above spectra, however, that linear regressions would not be good fits to them. These poor fits result from the presence of significant peaks within a restricted wavelength band. In Figure 5.3a, for example, three least-squares best-fit lines could be roughly drawn to fit different portions of the spectral section and two break points can be identified at around 720 m and 166 m. In Figure 5.3b, two break points are around 1 km and 166 m.
Therefore, three scale breaks around 1 km, 720 m, and 166 m are important for subarea 93G. Similarly, the important scale breaks for subarea 93H are around 1 km, 333 m, and 172 m. For the whole study area surface, the break points are around 6 km and 3 km. The identification of these scale breaks and multiple peaks is useful in that they indicate at what scale levels the original terrain contains higher variability. This information can then be used to guide the selection of an appropriate sampling interval and moving window size. Interestingly, the spectra appear to identify more high-frequency features than the grain plots, because of the use of logarithms for the spectra.

5.1.3  Examining scale effects by nested analysis of variance

Following the example by Moellering and Tobler [1972], as mentioned in section 4.3.3, nested ANOVA was done for two 256 by 256 DEM matrices extracted from the TRIM 93G and TRIM 93H subareas (i.e., trim93g.256 and trim93h.256). The matrices used in this analysis are 256 x 256 instead of 300 x 320 because the nested ANOVA can be applied only to a $2^n \times 2^n$ square matrix. The results of the scale-variance calculation for the two subareas are given in Tables 5.1a and 5.1b respectively. The scale-variance components may also be graphed as a magnitude versus frequency graph or spectrum as in Figure 5.5a. As the graphs show, for both subareas the greatest amount of the scale-variance is at level 1, the biggest grid cell size, and the importance of the scale-variance components diminishes quickly as the cell sizes get smaller. This is typical for most natural topographic surfaces because, as mentioned earlier in section 4.3.2, and as evident in Figure 5.5a, the greatest vertical variability within a surface is almost always associated with the longer wavelengths. For both subareas, the most detailed three scale levels (6, 7 and 8) contain no more than 2% of the total sum of squares.
However, there are some differences between the spectra of the two subareas. First of all, the variance at each scale level is much higher for the TRIM 93H than for the TRIM 93G subarea because of the greater roughness of the former surface area. Secondly, the importance of the scale-variance components for TRIM 93H does not diminish as quickly as that of TRIM 93G when the scale level increases. For TRIM 93H, scale levels 1 and 2 are almost equally important whereas for TRIM 93G, level 1 alone contains more than 50% of the total sum of squares and the importance of level 2 drops quickly compared to level 1.

Table 5.1a  Scale variances for TRIM 93G DEM data (trim93g.256)

Scale    Sum of Squares   Percent of   Degrees of   Mean       Scale-Variance
Level    (SS)             Total SS     Freedom      Square     Component
1        1.30e+08          50.80           3        4.35e+07    2653.40
2        6.68e+07          26.01          12        5.56e+06    1358.38
3        3.34e+07          12.99          48        6.95e+05     678.56
4        1.55e+07           6.02         192        8.05e+04     314.38
5        6.64e+06           2.59         768        8.65e+03     135.13
6        2.66e+06           1.04        3072        8.65e+02      54.08
7        1.05e+06           0.41       12288        8.54e+01      21.36
8        3.76e+05           0.15       49152        7.65           7.65
Total    2.57e+08         100.00       65535        -              -

Table 5.1b  Scale variances for TRIM 93H DEM data (trim93h.256)

Scale    Sum of Squares   Percent of   Degrees of   Mean       Scale-Variance
Level    (SS)             Total SS     Freedom      Square     Component
1        4.39e+09          33.74           3        1.46e+09   89348.82
2        4.27e+09          32.77          12        3.55e+08   86785.24
3        2.33e+09          17.90          48        4.86e+07   47415.16
4        1.27e+09           9.78         192        6.63e+06   25897.08
5        4.94e+08           3.80         768        6.44e+05   10056.94
6        1.87e+08           1.43        3072        6.07e+04    3794.74
7        5.85e+07           0.45       12288        4.76e+03    1189.77
8        1.70e+07           0.13       49152        3.46e+02     346.21
Total    1.30e+10         100.00       65535        -              -

[Figures 5.5a and 5.5b: scale-variance plots; recoverable axis labels: Variance, % sum of squares.]

Table 5.2  Scale variances for EMR1.128(1) and EMR1.128(2)

Scale      Sum of Squares   % of       Degrees of   Mean       Scale-Variance
           (SS)             Total SS   Freedom      Square     Component
EMR1.128(1)
1           439804493        25.94         3        1.46e+08   35791.4
2           337921972        19.93        12        2.81e+07   27500.2
3           243234662        14.34        48        5.06e+06   19794.5
4           214700121        12.66       192        1.11e+06   17472.3
5           218902869        12.91       768        2.85e+05   17814.4
6           157121768         9.26      3072        5.11e+04   12786.6
7            83735469         4.93     12288        6.81e+03    6814.4
Total (1)  1695421355       100        16383        -          -
EMR1.128(2)
1           644961726        29.05         3        2.14e+08   52487.1
2           256899001        11.57        12        2.14e+07   20906.5
3           320717165        14.45        48        6.68e+06   26100.0
4           285311366        12.85       192        1.48e+06   23218.7
5           328478483        14.79       768        4.27e+05   26731.6
6           252269124        11.36      3072        8.21e+04   20529.7
7           130813349         5.89     12288        1.06e+04   10645.6
Total (2)  2219450213       100        16383        -          -

Since the TRIM DEM resolution is 50 m, scale level 0 corresponds to grid size 12.8 km (i.e., 256 x 50 m); scale level 1 corresponds to grid size 6.4 km; level 2 corresponds to 3.2 km; level 3 is 1.6 km; level 4 is 800 m; level 5 is 400 m; level 6 is 200 m; level 7 is 100 m; and level 8 is 50 m.
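The scale-variance bookkeeping behind Tables 5.1a, 5.1b and 5.2 can be sketched as follows. This is an illustrative reconstruction, not the thesis's own program: the function names are hypothetical, and the decomposition (each block mean compared with the mean of its parent block, degrees of freedom 3 x 4^(k-1) at level k, and the component taken as the mean square divided by the number of cells in a level-k block) is inferred from the structure of the tables and from the nested design of Moellering and Tobler [1972].

```python
import numpy as np

def block_means(z, size):
    """Mean elevation of each non-overlapping size x size block."""
    n = z.shape[0]
    return z.reshape(n // size, size, n // size, size).mean(axis=(1, 3))

def grid_size(n, cell_size, level):
    """Grid size at a scale level, e.g. 256 cells of 50 m at level 1 -> 6400 m."""
    return n * cell_size / 2 ** level

def scale_variance(z):
    """Nested ANOVA of a 2^n x 2^n matrix.

    Returns one tuple per level:
    (level, sum of squares, degrees of freedom, mean square, component).
    """
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    levels = int(round(np.log2(n)))
    if z.shape != (n, n) or 2 ** levels != n:
        raise ValueError("matrix must be square with a power-of-two side")
    rows = []
    parent = np.array([[z.mean()]])            # level 0: the grand mean
    for k in range(1, levels + 1):
        size = n // 2 ** k                     # block side, in cells
        child = block_means(z, size)
        # Deviation of each block mean from the mean of its parent block.
        dev = child - np.kron(parent, np.ones((2, 2)))
        ss = size * size * np.sum(dev ** 2)
        df = 3 * 4 ** (k - 1)
        ms = ss / df
        rows.append((k, ss, df, ms, ms / (size * size)))
        parent = child
    return rows
```

Under these assumptions the level sums of squares add up to the total sum of squares about the grand mean, and the degrees of freedom add up to the number of cells minus one (65535 for a 256 x 256 matrix), which matches the totals in the tables.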
The results of nested ANOVA indicate that for both subareas not much more terrain variability, hence precision, will be gained by sampling at intervals smaller than 200 m. However, as the sampling interval is coarsened, the rate at which information is lost would be slower for TRIM 93G than for TRIM 93H.

For the whole study area, two different 128 by 128 matrices were extracted from the original 160 by 187 EMR1 DEM and a nested ANOVA was done on each to examine the scale-variance at various scale levels. These two matrices were named EMR1.128(1) and EMR1.128(2). They were extracted from the northwest corner and the center part of the original EMR1 DEM respectively in order to see what impact the different locations would have on the results. Since the EMR1 DEM has a resolution of 1 km, scale level 0 corresponds to grid size 128 km, scale level 1 corresponds to size 64 km, and scale level 7 represents a grid size of 1 km. The results of scale variance analysis for each of the two square matrices are displayed graphically in Figure 5.5b. Table 5.2 contains the numerical results of scale variances. Based on these results, several observations can be made with regard to the study area surface variability. Again, in both cases, the greatest amount of the scale-variance is at level 1, the biggest grid cell size (i.e., about 26% in case one and 29% in case two). When compared to the TRIM 93G and TRIM 93H subareas, however, the importance of the scale-variance components does not diminish as quickly as the cell sizes get smaller. Each of the scale levels from 2 to 6 (i.e., from 32 to 2 km) still has above 9% of the total sum of squares. The two cases tested for the whole study area mainly differ at the first two scale levels (i.e., 1 and 2). EMR1.128(2) exhibits a more 'interesting' pattern of scale variances with a drop at scale level 2.
This difference is probably caused by the inclusion of more rough area in EMR1.128(2) (i.e., greater total sum of squares as seen in Table 5.2).

In connection with the interpretations from grain and spectral analyses, the following observations can also be made regarding the global characteristics of the study area surfaces: (i) significant variance in subarea 93H occurs at scale levels 1 and 2 (possibly reflecting the variability observed in the grain plots at 9 km); (ii) scale level 2 (i.e., 3.2 km) in 93H contains great variability which may be related to the 4 km minor 'knick point' identified earlier in grain analysis; (iii) the variability at scale level 2 (i.e., 32 km) in the central region in the EMR1 DEM drops a great deal from level 1 (i.e., 64 km), indicating a major scale break in between (probably corresponding to the significant scale of 46 km identified in grain analysis); (iv) for EMR1 more variability is evident at level 5 than at level 4, indicating one or more minor scale breaks between 4 and 8 km (compare the scale breaks of 4.5, 5.6, and 6.7 km identified in the spectral analysis).

5.1.4  Interpreting the variograms

In this research, the variogram method is adopted from Klinkenberg and Goodchild [1992] in order to determine the fractal dimension of the study area surfaces. Figure 5.6a shows the surface variograms derived from the two subarea DEMs TRIM93G.SUB and TRIM93H.SUB (each has a dimension of 300x320). First, scattergrams of the elevation difference variance squared against horizontal distance were plotted in log-log space. For each one, the maximum distance was divided into 30 distance classes defined such that in log-log space the classes would be of equal width on the x-axis. This practice improves the reliability of the linear regression [Klinkenberg and Goodchild, 1992]. Then, for each of the variograms, the linearity of the plot was examined and breaks of slope were identified by visual inspection.
That is, the first and last points which appeared to enclose a reasonably linear cluster were identified, and straight lines were least-squares fitted to the intervening parts of the scattergrams and plotted. Finally, the slopes were calculated for all the lines, and the associated parameters, including fractal dimension (D), gamma value and break distance, were printed. In the fractal plots for the two subareas some non-linearity is present, and the self-similar fractal model does not perfectly describe the study area landscape surfaces at the scales considered in this research.

As can be seen from the surface variograms derived from the two subarea DEMs, shown in Figure 5.6a, both plots used in the determination of the fractal dimension display more than one linear segment. For both plots there are four least-squares best-fit lines. These fractal elements indicate scale-delimited regions of self-similar topography. It is evident from the figure that the variogram plot of TRIM 93G demonstrates a better overall linearity than that of TRIM 93H. The scale breaks for TRIM 93G are not as distinctive as for TRIM 93H. The fractal dimension (D) for TRIM 93G changes from 2.28 to 2.35 to 2.29 and then to 2.25 for the four different scale ranges. For TRIM 93H, the changes in the D values are more dramatic, from 2.42 to 2.24 to 2.44 and then to 2.72. The gamma value indicates some aspect of the magnitude of the surface roughness. Since TRIM 93H is overall much rougher than TRIM 93G, the gamma values of the TRIM 93H variogram plot are higher, as expected: 6.68, 6.75, 7.81, and 10.18 versus only 3.05, 3.16, 2.56, and 1.8 for TRIM 93G. The break distance represents the maximum distance to which each linear regression line could be fitted. Clearly, TRIM 93G has some minor breaks around 631 m, 1.7 km, 3.9 km and a much longer break distance over 19.8 km which is beyond the limit of the study area size.
For TRIM 93H, some major breaks exist at around 102 m, 420 m, 1.4 km and 7.2 km. The minimum distance to which a fractal element could be assigned is, of course, set by the TRIM DEM grid spacing of 50 m.

Following the same methodology, the surface variogram plot of the study area EMR1 DEM was derived and is shown in Figure 5.6b. It demonstrates poor overall linearity, and a number of scale breaks exist. Three linear segments could be fitted. The first major scale break identified is between 3 and 4 km (around 3.8 km) for the whole study area surface, with a fractal dimension of 2.26 below this level and 2.84 above. The other two scale breaks identified are around 33 km and 166 km.

Figure 5.6b  Surface variogram derived from EMR1 DEM.

It should be noted that the scale breaks identified by the variogram method and the other methods discussed above are not precise, and they provide only a general picture of the global characteristics of the surface. They should therefore be used only as a guideline for the selection of appropriate sampling intervals or window sizes.

5.1.5  A comparison of various global surface characterization methods and their results

There are definite links between grain measures, spectral analysis and the methodology of both nested analysis of variance and fractal analysis using the variogram technique. Each method has its advantages and disadvantages in the study of scale effects and is capable of identifying only a limited range of scales. Spectral analysis provides the most comprehensive interpretation of terrain according to the frequencies of undulations, although its assumption of stationarity in the elevation data makes it less than ideally suited to terrain. The anisotropic nature of topography also makes it difficult to interpret the meaning of the two-dimensional spectrum.
Therefore, one-dimensional spectral techniques need to be used, dealing with either the average of the two-dimensional spectrum over all directions or the spectrum of some representative profiles across or along the direction of the longest significant wavelength in the topography. The spectra are able to identify more scale breaks at shorter wavelengths. However, the variograms and associated fractal scales seem more capable of detecting scale breaks over a wider range. The self-similar fractal model has been found to provide a very good fit for some terrain surfaces but a poorer fit for others [Klinkenberg and Goodchild, 1992]. The limitations of fractal analyses are comparable to those of autocorrelation, spectral and regionalized variable analysis.

Both grain and variogram analyses describe the same attribute of topography: the spatial autocorrelation of elevation. Pike et al. [1989] suggest that the geostatistical parameters termed "range" and "sill" on elevation variograms are similar to "grain" and "relief at grain" on grain graphs (local relief versus area diameter). Grain values might be obtained from topographic variograms which, based on formal geostatistics, may give a much needed theoretical basis for the relief/grain concept. One problem with the grain measure determined from a grain graph is that the choice of the center of the sampling area is arbitrary. Estimates of local relief from unit cells of one size do not represent a wide spectrum of terrain types with equal fidelity. This problem reflects the varied dominance of topography by local features that differ widely in relief and spacing [Pike et al., 1989].

The nested ANOVA can be useful in identifying the scales where significant terrain variability exists, but it is limited in the level of information it provides. In addition, it can only handle 2^n x 2^n square matrices, and it assumes a complete enumeration, not a sample.
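
The variogram procedure of section 5.1.4 (log-spaced distance classes, a least-squares line fitted to a reasonably linear segment of the log-log plot, and a fractal dimension derived from the slope) can be sketched as follows. This is a minimal illustration, not the program used in this research. It assumes the relation D = 3 - b/2 between the log-log variogram slope b and the fractal dimension of a surface, which holds for fractional Brownian surfaces; the function names and the synthetic variogram are hypothetical.

```python
import math


def log_spaced_class_edges(min_dist, max_dist, n_classes=30):
    """Edges of distance classes that have equal width on a log axis."""
    ratio = (max_dist / min_dist) ** (1.0 / n_classes)
    return [min_dist * ratio ** i for i in range(n_classes + 1)]


def fit_loglog_line(distances, variances):
    """Least-squares fit of log10(variance) on log10(distance).

    Returns (slope, intercept) of the fitted straight line.
    """
    xs = [math.log10(d) for d in distances]
    ys = [math.log10(v) for v in variances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fractal_dimension(slope):
    """Assumed surface relation: D = 3 - slope / 2."""
    return 3.0 - slope / 2.0


# Synthetic variogram segment with a known log-log slope of 1.44
# (illustrative values only, not data from the study area).
distances = [50.0 * 2 ** k for k in range(8)]
variances = [0.01 * d ** 1.44 for d in distances]
slope, intercept = fit_loglog_line(distances, variances)
print(round(fractal_dimension(slope), 2))
```

In practice the fit would be repeated for each linear segment between the visually identified breaks of slope, giving one D value per scale range.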
Based on the above global characterization results for the two subareas and the whole study area surface using various methods, some of their general characteristics can be summarized as follows (see also Table 5.3):

Table 5.3  A summary of the important scales in the study area surfaces identified by various global methods

Method      93G                               93H                                   EMR1
Grain       Major: > 15 km                    Major: 9 km                           Major: 46 km, 9 km
            Minor: 9 km, 4.6 km, 3 km, 1 km   Minor: 4 km                           Minor: 155 km, 110 km
Spectra     Major: 1 km                       Major: 1 km                           Major: 6 km, 3 - 4 km, 2.2 km
            Minor: 720 m, 250 m, 166 m        Minor: 333 m, 172 m
ANOVA                                         Major: 3.2 km                         Major: 4 km, 16 km, 4 - 8 km, 32 - 64 km
            Minor: 200 m                      Minor: 200 m
Variogram   Major: > 19.8 km                  Major: 7.2 km, 1.4 km, 420 m, 102 m   Major: 166 km, 33 km, 3.8 km
            Minor: 3.9 km, 1.7 km, 631 m

1)  Overall, the self-similar fractal model is found to provide an imperfect fit for the whole study area EMR1 DEM. A series of distinct fractal dimensions connected by transition zones is evident from its variogram plot. That means the levels of surface variability are clustered at some particular scale ranges. The multiple 'knick points' in the grain graph, the resulting nonlinear variance spectra from spectral analysis, and the results from nested ANOVA all indicate this same characteristic. For the whole study area, there appear to be several scale breaks. For example, a scale break of about 4 km is evident from the variogram plot, ANOVA and the spectra. Also a scale break of 9 km is identified from the grain graph. Such scales could be of great value because one could then tailor sampling to a particular scale range in order to capture the maximum amount of surface variability with minimum sampling effort. In addition, these scale breaks can be used to help identify the appropriate/optimal sampling interval and moving window size in the characterization of surface roughness for DEM error modelling.
Because the study area EMR1 DEM has a resolution of 1 km and the moving window size has to be an odd number (e.g., 3x3, 5x5, 7x7, and 9x9), the above two scale breaks can be best characterized by window sizes (5x5) and (9x9). In Chapter 6, DEM error modelling results based on local roughness measures extracted using these two characteristic window sizes (i.e., 5x5 and 9x9) will be compared with each other. Because a major scale break exists below 5 km for the whole study area surface, one would expect the smaller window size (5x5) to be one of the better choices for extracting local geomorphometric measures from the EMR1 DEM.

2)  The two subareas demonstrate some very different global characteristics. For example, the variogram plot of TRIM 93G demonstrates a better linearity than that of TRIM 93H, which indicates a better fit of the self-similar fractal model over the scale range being studied. Both the grain graphs and the variogram plots of the two subareas show that TRIM 93H has a shorter characteristic horizontal scale at 8 km ± 1 km. The variogram plot of TRIM 93H inflects at a distance of around 7.2 km, whereas the variogram for TRIM 93G does not inflect until at least 19.8 km (its computation stopped at a distance of around 19.8 km). Similar observations can be made on the grain graphs of the two subareas. For TRIM 93H, a major "knick" point was found between 8 and 9 km, whereas for TRIM 93G no major "knick" point was found until at least 15 km (the maximum range for the study area). Since a scale break at around 1 km was identified for both subareas by the spectral analysis and by the grain measure for subarea 93G, a window size of (21x21) will be chosen for the extraction of the local measures in the following section (note that TRIM DEMs have a resolution of 50 m, so a 21x21 window spans about 1 km).
For comparison purposes, a smaller window of size (7x7) will also be tested in order to see the effects of the 420 m scale break identified in subarea 93H but not in 93G. Because of the higher terrain variability at this short scale range in subarea 93H, one would expect the smaller moving window to characterize local roughness for DEM error modelling better in subarea TRIM 93H than in TRIM 93G. This will be tested in Chapter 6 by examining the effects of using two different window sizes.

The next two sections (5.2 and 5.3) will present some results of local characterizations of surface roughness using the 'moving window' technique on several DEMs in the study area. Based on the global characteristics of the two subareas, two different window sizes will be used for the extraction of the local roughness measures from the 50 m TRIM DEMs: (7x7) and (21x21). For the whole study area, window sizes (5x5) and (9x9) will be tested for the extraction of the local parameters from the 1 km EMR1 DEM. Various terrain clusters will then be identified using a multivariate analysis of the local measures.

5.2  Local Surface Characteristics

In order to identify various terrain classes and, therefore, to be able to quantify the relations between the spatial pattern of DEM errors and the variation of the surface characteristics, some local geomorphometric measures need to be extracted from the study area DEMs. First, the seven variables identified in section 4.2 are derived from the two subarea TRIM DEMs (i.e., TRIM93G.SUB and TRIM93H.SUB). Then, in order to examine the issue of scale and resolution in the characterization of surface roughness, the same set of variables is also derived from the 1 km EMR1 DEM for the whole study area. Several Fortran and Splus programs, based on the methodologies discussed in section 4.2, were used to extract the measures.
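
The 'moving window' extraction can be illustrated with two of the seven measures, local relief and standard deviation of altitude. The sketch below is not the Fortran or Splus code used in this research. It assumes that local relief is the elevation range within the window, that the standard deviation is the population form (divisor n), and that cells whose window would extend past the grid edge are dropped; the function names are hypothetical.

```python
import math


def trimmed_shape(rows, cols, window):
    """Grid size left after dropping cells whose window leaves the DEM."""
    return rows - window + 1, cols - window + 1


def moving_window_measures(dem, window):
    """Local relief and standard deviation of altitude per window position.

    dem is a list of equal-length rows of elevations; window must be odd
    so that each window has a well-defined center cell.
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    out_rows, out_cols = trimmed_shape(len(dem), len(dem[0]), window)
    relief = [[0.0] * out_cols for _ in range(out_rows)]
    std = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):
        for j in range(out_cols):
            values = [dem[r][c]
                      for r in range(i, i + window)
                      for c in range(j, j + window)]
            mean = sum(values) / len(values)
            # Assumed definition: relief = highest minus lowest elevation.
            relief[i][j] = max(values) - min(values)
            # Assumed definition: population standard deviation.
            std[i][j] = math.sqrt(
                sum((v - mean) ** 2 for v in values) / len(values))
    return relief, std


# Small tilted plane as a stand-in for a DEM (illustrative values only).
dem = [[10 * r + c for c in range(6)] for r in range(5)]
relief, std = moving_window_measures(dem, 3)
print(len(relief), len(relief[0]))
```

With these edge rules a 300 x 320 grid and a 21x21 window leave a 280 x 300 grid of derived values.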
Some of the results for the two subareas using TRIM DEMs are displayed in Figures 5.7a-g. Two different window sizes (i.e., 7x7 and 21x21) for the extraction of parameters were used on the 50 m resolution TRIM DEMs. Because of the edge effects of using a moving window, a matrix of size only (280 x 300) was created for each derived variable based on the original (300 x 320) DEMs. For the whole study area using EMR1 DEM, some of the results of local measures are shown in Figure 5.8a-c. Since EMR1 DEM has a resolution of 1 km, smaller window sizes (i.e., 5x5 and 9x9) were used for the extraction of the seven local measures. Each resultant grayscale image has a dimension of (140 x 167) extracted from the center of the original EMR1 DEM of size (160 x 187). A detailed description of each figure is provided below.

Figure 5.7a shows continuous gray tone image representations of local relief (LR) measured from TRIM93G.SUB and TRIM93H.SUB DEMs using two different window sizes (7x7) and (21x21). The original DEM images of the two subareas are also shown in the figure so that the spatial pattern of the DEM and that of the local relief measure can be compared. Evidently, the patterns of some physical features in the landscape are visible in both DEM images and local relief images. For example, features such as the Willow River valley and the Wansa Creek valley (which cut north-south from the top of the region) are evident in subarea 93G.
In subarea 93H, the patterns of the predominant McGregor River valley (which cuts southwest-northeast through the region) and the Buchanan Creek valley (which connects to the McGregor River valley from the east) are evident. Figure 5.7b displays graytone image representations of the standard deviation of altitude (SD) calculated in two different moving window sizes from TRIM93G.SUB and TRIM93H.SUB DEMs. Again, the patterns of the above river valleys can be recognized in the figures. The results of slope angle and aspect calculations from TRIM93G.SUB and TRIM93H.SUB DEMs are shown in Figure 5.7c. Slope and aspect values were first derived based on elevations in a (3x3) neighbourhood and then the slope angle values were averaged for a (7x7) or (21x21) moving window. The results of the original slope and aspect values based on the (3x3) window and the slope values averaged for a (7x7) moving window are presented in the figure for each of the subareas. The aspect values are not used in any further analyses but are included in the above figure for visual examination of the results of the original slope and aspect calculation. The slope values averaged for the moving window (21x21) were also obtained for future analysis but are not displayed due to space limitations. Figure 5.7d shows the grayscale image representations of the roughness factor (RF) derived for the two subareas. It was calculated for each point based on the slope value according to Equation 4.7 and then averaged for the moving window of size (7x7) or (21x21). The slope curvature measures (SC) derived for the two subareas using two different moving window sizes are shown in Figure 5.7e. Note that there is evidence of the valley patterns in all above figures. Figure 5.7f gives graytone image representations of HP (i.e., the number of DEM points that are higher than the center point of the moving window) derived from the two TRIM subarea DEMs using two different moving window sizes.
Finally, Figure 5.7g displays graytone image representations of HI (i.e., hypsometric integral), estimated from TRIM93G.SUB and TRIM93H.SUB DEMs using two different window sizes (i.e., 7x7 and 21x21).

Similarly, for the whole study area, Figure 5.8a shows image representations of local relief and standard deviation of elevations extracted from the 1 km EMR1 DEM using window sizes (5x5) and (9x9); it also shows image representations of slope and roughness factor derived from the window size (3x3). Slope, slope curvature, and roughness factor measures derived using window sizes (5x5) and (9x9) are displayed in Figure 5.8b. Figure 5.8c gives the results of HP and HI measures derived from the EMR1 DEM using window sizes (5x5) and (9x9). The original EMR1 DEM image is also shown in the figure for comparing the spatial patterns. Clearly, the spatial patterns of the Fraser River valley, McGregor Plateau and Misinchinka Ranges in the region are evident in all the figures.

From the above results it can be observed visually that none of these measures varies randomly; all are somewhat related to the spatial variation of terrain. There is no doubt that they all exhibit structure, which allows one to infer that there is systematic spatial variation of terrain attributes. However, each measure seems to be capturing a different aspect of the surface roughness. Some measures appear to show more similar patterns (e.g., local relief and standard deviation of altitude) than others do (e.g., HP). The different window sizes also have an impact on the results. Since the 'moving window' technique is an example of a low-pass spatial filter, the larger the size of the window, the 'smoother' the result.

5.3  Multivariate Classification Results

In section 5.2 seven commonly used local geometric measures were identified and extracted from study area DEMs.
These local variables include: local relief (relief), standard deviation of altitude (std), slope (slope), roughness factor (rough), slope curvature (curv), number of points higher than center point (highpt), and hypsometric integral (hypint). Each of the parameters extracted above is believed to represent one or several aspects of the surface roughness property. Clearly, there is room for differing views on parameters to be chosen for what is essentially the same problem. Murtagh and Heck [1987] identified the following criteria that would help in making the decision towards selecting a particular set of parameters in a particular case: (i) the quality of the data; (ii) the computational ease of measuring certain parameters; (iii) the relevance and importance of the parameters measured relative to the data analysis output (e.g., the classification); (iv) the importance of the parameters relative to theoretical models under investigation; and (v) in the case of very large sets of data, storage considerations may preclude extravagance in the number of parameters used. Surely, criteria (iii) and (iv) are much more important than (i), (ii) and (v).

5.3.1  Variable selection

Among all those identified, it is possible that some of the variables are redundant because of correlation. If variables are highly correlated, they add little new information to a multivariate analysis. Also, as indicated in section 4.4.2, highly correlated variables can create problems in some multivariate analyses. In the limiting case where the variables are perfectly correlated, for example, the transpose of the matrix of variables has no inverse and a determinant of zero will result. Thus, the first step in a multivariate analysis is usually to cross-correlate all variables (i.e., relief, slope, curv, hypint, std, highpt, rough) and to identify those that are significantly correlated.
In order to find out if significant correlations exist among the seven variables used in this study, the scattergrams of all the variable pairs were examined and their correlation coefficients were calculated.

Figure 5.9a displays the scattergrams of all the variable pairs for the two subareas TRIM 93G and TRIM 93H (N.B. only the result of using window size 7x7 and only a random sample of n=40 data points is displayed for each variable for clearer visualization). Figure 5.9b shows the scattergrams of all the variable pairs for the whole study area. Again, only a random sample of 40 is shown due to the visualization constraints. Table 5.4a lists the rank-order correlation matrix of the seven variables (window size 7x7) for each subarea. Because of the skewed distributions of most variables (hence a statistical lack of normality in the majority of the morphometric parameters) and the obvious non-linear relation between some of the variables (e.g., slope and rough), Spearman's rank-order correlation coefficient (r_s) was used rather than Pearson's linear correlation coefficient. As shown in Figure 3.10, for example, the histogram of slope values for each of the two subareas is not normally distributed but positively skewed. Spearman's rank-order correlation provides a measure very similar to Pearson's correlation coefficient, with the same range of values and an identical interpretation. However, it is based on the differences of ranks between the two variables rather than the values themselves and their covariance. The diagonal of each correlation matrix table, from the top left corner, is the correlation of each variable with itself (r=+1.0). The triangle of the data below that line contains the same information as the triangle above the line (row for column). Therefore, for comparison purposes, the results for subarea TRIM 93G are shown above the diagonal line and below the line are the results for subarea TRIM 93H.
Table 5.4b lists the rank-order correlation results of the variables extracted from the whole study area EMR1 DEM. Above the diagonal line are the results for window size (5x5) and below the line are the results for window size (9x9).

Figure 5.9a  The scattergrams of all the variable pairs (window size: 7x7) for subareas 93G (above the diagonal) and 93H (below the diagonal) (n=40).

Figure 5.9b  The scattergrams of all the variable pairs (above the diagonal for window size 5x5 and below the diagonal for window size 9x9) for the whole study area (n=40).
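
The preference for Spearman's rank-order coefficient over Pearson's coefficient when a relation is monotone but non-linear can be illustrated with a short sketch. It computes r_s as Pearson's coefficient applied to average ranks, a standard formulation that handles ties. The sample values and the squared transform standing in for a non-linear slope-rough relation are hypothetical; Equation 4.7 is not reproduced here.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s: Pearson's coefficient computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical positively skewed slope sample and a monotone,
# non-linear transform of it (illustrative values only).
slope = [1.0, 3.0, 5.0, 10.0, 20.0, 35.0]
rough = [s ** 2 for s in slope]
print(round(pearson(slope, rough), 3), round(spearman(slope, rough), 3))
```

Because only the ordering of values enters the calculation, r_s reaches its maximum for any strictly increasing relation, whereas Pearson's coefficient is reduced by the curvature.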
It can be seen from the scattergrams in Figure 5.9a and the correlation matrix in Table 5.4a that, for both subareas, variables rough and slope are highly correlated and variable relief is highly correlated with std. This is no surprise given that rough is derived directly from slope and that both relief and std are descriptions of elevation range (see sections 4.2.1 to 4.2.4); they belong to one group of geomorphometric measures as identified by Pike [1988a,b]. Variables slope and rough are also highly correlated with both relief and std. Based on Figure 5.9b and Table 5.4b, the same conclusions can be drawn for the variables extracted from the EMR1 DEM for the whole study area using the different window sizes. Considering these correlations, only variable groups 3-5 (i.e., curv, hypint, and std), 3-6 (i.e., curv, hypint, std, and highpt), 2 (i.e., slope only) and 5 (i.e., std only) are used for hierarchical clustering in the following section.

The largest subset group chosen to be tested includes only variables #3 to #6 because the other variables #1, #2, and #7 (relief, slope, and rough) are all highly correlated with variable #5 (std). Variables relief and std both describe elevation range (the basic topographic measure of the vertical dimension of the terrain) and they are highly correlated with each other. However, the variable std was included in the above subset group (i.e., 3-6) instead of relief mainly because the former is more statistically sound and stable (see discussions in section 4.2.2). Although the calculation of relief is computationally less demanding, the automatic extraction of the variables from DEMs using current computer power has made this consideration insignificant.

Variable group 3-5 was also chosen to be tested because variable #6 (highpt) appears to emphasize more local terrain discontinuities and is very different from all other variables.
It would be interesting to compare the clustering results with and without this variable included in the group. Finally, two single-variable groups (i.e., 2 or 5 only) were selected as well in order to compare the differences between classifications using single-variable and multivariate signatures, that is, to see whether a variable such as slope or std alone can be effective in characterizing the roughness of terrain for the purpose of DEM error modelling.

5.3.2  Grouping of homogeneous areas

Two different grouping techniques are widely used in image analysis: supervised classification (i.e., discriminant function analysis) and unsupervised classification (i.e., cluster analysis). Discriminant analysis uses a priori knowledge for the selection of a discriminant function which separates individual classes in multi-dimensional variable space. Supervised classifiers require that the objectives of the classification and the class properties be known and clear; the number of groups in a discriminant function is set prior to the analysis. On the other hand, cluster analysis requires little or no prior knowledge about the number and nature of classes in the area. The number of clusters that will emerge from a classification scheme cannot ordinarily be predetermined. Thus, cluster analysis is commonly used to get a first impression of the regional partition [Weibel and DeLotto, 1988]. For the purpose of classifying the roughness of terrain at different scale levels, unsupervised hierarchical clustering seems to be the more appropriate method. Although the classification itself is point-based and not based on context-sensitivity, the variables are extracted from a local neighbourhood (i.e., a moving window) and thus reflect texture at different scales.

Figures 5.10a-d and 5.11a-d show results of 8 different classification tests for each of the two subareas TRIM 93G and 93H.
Variations tested include different moving window sizes used in the variable extraction (window = 21x21 or 7x7), and different variable groups used for clustering (variables = 3-5, 3-6, 2 or 5 only). Figure 5.10a, for example, displays the three terrain clusters derived from variable group (3-5) and moving window size (21x21) for the two subareas. In order to make comparison of the figures easier, the roughest cluster (see the discussion below) is always shown as black and the least rough cluster is shown as white. Considering the computer power constraint, only a submatrix of size 220 x 240 (extracted from the center of the original variable matrix of size 280 x 300 in each subarea) was used in the classification (see Figure 5.7a).

Figures 5.12a-d show some examples of classification results for the whole study area. Variations tested include window size (5x5) or (9x9) for the variable extraction and variable group (3-5), (3-6), (2) or (5) for the multivariate clustering. Figure 5.12a, for example, displays the three terrain clusters derived from variable group (3-5) and moving window sizes (5x5) and (9x9). Not all the classification results are presented in the above figures.
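
The idea of grouping grid points into three terrain clusters from a multivariate signature can be sketched with a small agglomerative (hierarchical) clustering routine. The linkage rule and the scaling of variables used in this research are specified in Chapter 4 and are not reproduced here; the sketch assumes average linkage, Euclidean distance and standardized variables, and the sample signatures for variable group 3-5 (curv, hypint, std) are hypothetical.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit standard deviation."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def agglomerate(rows, n_clusters):
    """Average-linkage agglomerative clustering; one label per row.

    Naive implementation intended for small samples only.
    """
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(rows[i], rows[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]

    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical (curv, hypint, std) signatures for nine grid points.
signatures = [
    (0.1, 0.50, 2.0), (0.2, 0.52, 3.0), (0.1, 0.49, 2.5),     # flat
    (1.0, 0.40, 20.0), (1.1, 0.42, 22.0), (0.9, 0.41, 21.0),  # moderate
    (3.0, 0.30, 60.0), (3.2, 0.28, 65.0), (2.9, 0.31, 62.0),  # rough
]
labels = agglomerate(standardize(signatures), 3)
print(labels)
```

A full-size variable matrix such as the 220 x 240 submatrix would need a far more efficient algorithm than this pairwise search.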
^  X CO en  o o o  DC I-  o  -a  a > d  .si  LO  a ~ T3  1)  s *  w bo IT, O  o o o o 1^  000S8  0008/ (w)  S  —'  XI O 60 <u 3  5 3  OOOfrZ  6 U J U U 0 N V\l±n  o3 >  cD X  3  O T 3  •a x H  cu <u ca  X  o o o o  CD  LO  CO cr>  1) h H C/3  i2 133os  „  E  co LU  O O  5  c/3  I  3 O  in o c  •jj 03  u 00089 (LU) 6U!U.UON  Win  000*9  o vi  a 3  60  232  OS  S  o  000SZ  M-l  H 03 1> 03  X  3  o o o o  00  X  CO CO  05  o o o  LO  ID  LU  l l  o o o o  00028  0008Z  OOOfrZ  (ui) 6U!L|UON tAlin  in  o o o o  CD  LO  CO CO LU  o o o  LO  c3 X>  u 0002Z  00089  0001^9  C3 LO  (LU) 6U!U.UON  Win  60  PH  233  g  —. OOOSZ  •  . 00089  (uu) 6U!U,UON  .  <—* 000^9  UH u  Win  3 00  235  co  o o o o  rx u  CO  'lo  X  & O  CO 03  o o o  co LU  5  o T3  g SO  O  B ^ T3  o o o o I-  S  0008Z  §  PH  s  00038  d  OOOvl  Win  (LU) BU|U.UON  > u  s  0  o  12  o o o o  co  LO  CD  c  1J  CO CD  CO  C  Ha  h-  o o o  LO  5 £  to  1)  ON  T3  C  o o  ••s aI ON  CJ <+3  0002Z  00089 (LU) 6 U | U U O N  Win  236  000*9  0H in  237  238  239  240  figures.  It is visually evident, from the examples of the terrain classification results for the whole study area and the two subareas, that various terrain clusters are not randomly distributed but closely related to the variation and complexity of terrain. In other words, the spatial pattern of the clusters appears to reflect, to some degree, the variability and roughness characteristics of the study area surfaces. As can be seen in Figure 5.10a, for example, the roughest cluster (shown as black) for subarea 93G includes the area around the Willow River Valley where there is high terrain variability. The area on the south-west side of the region is the second roughest cluster (shown as dark gray). The area on the north-east side of the region is the smoothest (shown as white) of all three terrain classes derived from variable group (3-5) and window size (21x21), and it obviously corresponds to a very flat area in the region. 
Similarly, for subarea 93H, the three terrain clusters seem to follow the patterns of the McGregor River Valley and various ridges around Mt. Sir Alexander in the region. From Figures 5.12a-d, the general patterns of the Fraser River Valley and the McGregor Plateau on the south-west side and the Misinchinka and Hart Ranges on the north-east side are reflected in the terrain classification maps derived using various window sizes and variable groups. However, the various clustering results do look somewhat different from one another for each surface. For example, the classification outcome depends on the size of the moving window originally used for variable extraction. If "smooth" variables (i.e., derived by using a larger moving window) are used (e.g., Figures 5.10a-d), the classification becomes more homogeneous than with "rough" variables (i.e., derived by using a smaller moving window) (e.g., Figures 5.11a-d). Likewise, the number of variables in the geometric signature used for the multivariate clustering also affects the classification result (e.g., compare the panels within Figures 5.10a-d, Figures 5.11a-d, or Figures 5.12a-d). For example, the classification results based on variable group (3-6) tend to show the roughest cluster as being more "peppery" when compared to the results from other variable groups. This is probably because variable #6 (highpt) emphasizes local terrain discontinuities more strongly and is very different from all other variables.

5.3.3  Interpretation of classification results

The multivariate classification results presented above need to be analyzed in relation to the original variables in order to fully understand the characteristics of the various terrain clusters and to aid in the interpretation of the classification results. This is done by generating the summary statistics of each local roughness variable within the terrain clusters.
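The per-cluster summary statistics described here can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `cluster_summary`, the toy values and the choice of the population standard deviation are all hypothetical, since the text does not say which estimator or software was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_summary(values, labels):
    """Return {cluster: (n_points, mean, std)} for one roughness variable.

    `values` holds one roughness value per grid point and `labels` the
    cluster number assigned to the same point. The population standard
    deviation (pstdev) is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for value, cluster in zip(values, labels):
        groups[cluster].append(value)
    return {c: (len(v), mean(v), pstdev(v)) for c, v in sorted(groups.items())}


# Hypothetical toy data: six grid points of local relief (m) in two clusters.
relief = [10.0, 12.0, 14.0, 40.0, 44.0, 36.0]
labels = [1, 1, 1, 2, 2, 2]

summary = cluster_summary(relief, labels)
for cluster, (n, m, s) in summary.items():
    print(f"cluster {cluster}: n={n}, mean={m:.1f}, std={s:.2f}")
```

Applied to every roughness variable and every hierarchical level, a routine of this kind would yield one table per variable with the same layout as the summary tables in this section.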
Table 5.5a gives the mean and the standard deviation of local relief values and the number of points in each terrain cluster at various hierarchical levels with different numbers of resulting clusters (nclass = 2, 3, 4, 5, or 6) for both subareas (93G and 93H). In this table, the terrain clusters were derived from variable group (3-5) and the variables were extracted using window size (7x7). Tables 5.5b-g show the summary statistics for the local roughness variables slope, curv, hypint, std, highpt, and rough respectively (window size: 7x7; variable group: 3-5; subarea: 93G and 93H).

Table 5.5a  Summary statistics of local relief values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass  Class*  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
2       1       42616              17.1        9.7            33340              231.0       117.9
2       2       10184              40.6        13.0           19460              95.4        41.1
3       1       10184              40.6        13.0           24142              262.1       109.3
3       2       24665              20.2        10.5           9198               149.4       99.2
3       3       17951              12.8        6.3            19460              95.4        41.1
4       1       24665              20.2        10.5           20025              267.9       111.9
4       2       8563               37.2        9.8            9198               149.4       99.2
4       3       17951              12.8        6.3            19460              95.4        41.1
4       4       1621               58.7        13.0           4117               233.7       90.1
5       1       17711              24.9        8.2            9198               149.4       99.2
5       2       8563               37.2        9.8            19460              95.4        41.1
5       3       17951              12.8        6.3            13670              212.9       69.1
5       4       1621               58.7        13.0           6355               386.2       93.4
5       5       6954               8.0         3.7            4117               233.7       90.1
6       1       8563               37.2        9.8            19460              95.4        41.1
6       2       17951              12.8        6.3            13670              212.9       69.1
6       3       6654               22.4        7.4            6355               386.2       93.4
6       4       1621               58.7        13.0           6219               99.8        54.8
6       5       11057              26.5        8.3            4117               233.7       90.1
6       6       6954               8.0         3.7            2979               252.8       91.2

* Note that the classes do not remain consistent down the table. That is, class 1 at one level does not necessarily remain class 1 at the next level.

Table 5.5b  Summary statistics of slope values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
2       1      42616              3.62        1.76           33340              29.46       11.37
2       2      10184              7.78        2.69           19460              15.04       5.41
3       1      10184              7.78        2.69           24142              32.78       9.82
3       2      24665              4.18        1.86           9198               20.74       10.50
3       3      17951              2.84        1.27           19460              15.04       5.41
4       1      24665              4.18        1.86           20025              33.02       9.93
4       2      8563               7.02        1.69           9198               20.74       10.50
4       3      17951              2.84        1.27           19460              15.04       5.41
4       4      1621               11.79       3.34           4117               31.60       9.18
5       1      17711              5.05        1.38           9198               20.74       10.50
5       2      8563               7.02        1.69           19460              15.04       5.41
5       3      17951              2.84        1.27           13670              28.60       6.77
5       4      1621               11.79       3.34           6355               42.55       8.91
5       5      6954               1.96        0.75           4117               31.60       9.18
6       1      8563               7.02        1.69           19460              15.04       5.41
6       2      17951              2.84        1.27           13670              28.60       6.77
6       3      6654               4.64        1.40           6355               42.55       8.91
6       4      1621               11.79       3.34           6219               15.72       7.21
6       5      11057              5.30        1.31           4117               31.60       9.18
6       6      6954               1.96        0.75           2979               31.22       8.35

Table 5.5c  Summary statistics of slope curvature values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
2       1      42616              1.55        0.73           33340              8.53        3.78
2       2      10184              3.41        1.63           19460              3.69        1.48
3       1      10184              3.41        1.63           24142              8.11        3.36
3       2      24665              1.65        0.79           9198               9.63        4.53
3       3      17951              1.40        0.61           19460              3.69        1.48
4       1      24665              1.65        0.79           20025              7.25        2.61
4       2      8563               2.87        0.90           9198               9.63        4.53
4       3      17951              1.40        0.61           19460              3.69        1.48
4       4      1621               6.26        1.65           4117               12.27       3.51
5       1      17711              1.96        0.71           9198               9.63        4.53
5       2      8563               2.87        0.90           19460              3.69        1.48
5       3      17951              1.40        0.61           13670              6.51        2.21
5       4      1621               6.26        1.65           6355               8.84        2.69
5       5      6954               0.88        0.29           4117               12.27       3.51
6       1      8563               2.87        0.90           19460              3.69        1.48
6       2      17951              1.40        0.61           13670              6.51        2.21
6       3      6654               2.14        0.93           6355               8.84        2.69
6       4      1621               6.26        1.65           6219               7.07        2.63
6       5      11057              1.85        0.50           4117               12.27       3.51
6       6      6954               0.88        0.29           2979               14.99       2.55

Table 5.5d  Summary statistics of hypint values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
2       1      42616              0.48        0.11           33340              0.45        0.10
2       2      10184              0.44        0.11           19460              0.49        0.06
3       1      10184              0.44        0.11           24142              0.50        0.08
3       2      24665              0.55        0.07           9198               0.32        0.05
3       3      17951              0.38        0.07           19460              0.49        0.06
4       1      24665              0.55        0.07           20025              0.47        0.05
4       2      8563               0.42        0.09           9198               0.32        0.05
4       3      17951              0.38        0.07           19460              0.49        0.06
4       4      1621               0.57        0.14           4117               0.62        0.06
5       1      17711              0.54        0.08           9198               0.32        0.05
5       2      8563               0.42        0.09           19460              0.49        0.06
5       3      17951              0.38        0.07           13670              0.47        0.04
5       4      1621               0.57        0.14           6355               0.47        0.07
5       5      6954               0.57        0.05           4117               0.62        0.06
6       1      8563               0.42        0.09           19460              0.49        0.06
6       2      17951              0.38        0.07           13670              0.47        0.04
6       3      6654               0.62        0.05           6355               0.47        0.07
6       4      1621               0.57        0.14           6219               0.32        0.05
6       5      11057              0.49        0.04           4117               0.62        0.06
6       6      6954               0.57        0.05           2979               0.31        0.06

Table 5.5e  Summary statistics of standard deviation of elevation values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
2       1      42616              4.37        2.56           33340              61.48       33.27
2       2      10184              11.23       4.09           19460              25.22       11.00
3       1      10184              11.23       4.09           24142              70.56       31.09
3       2      24665              5.24        2.78           9198               37.66       26.28
3       3      17951              3.18        1.60           19460              25.22       11.00
4       1      24665              5.24        2.78           20025              72.55       31.51
4       2      8563               10.14       2.88           9198               37.66       26.28
4       3      17951              3.18        1.60           19460              25.22       11.00
4       4      1621               16.99       4.68           4117               60.84       26.93
5       1      17711              6.50        2.19           9198               37.66       26.28
5       2      8563               10.14       2.88           19460              25.22       11.00
5       3      17951              3.18        1.60           13670              56.70       18.17
5       4      1621               16.99       4.68           6355               106.65      26.75
5       5      6954               2.04        0.94           4117               60.84       26.93
6       1      8563               10.14       2.88           19460              25.22       11.00
6       2      17951              3.18        1.60           13670              56.70       18.17
6       3      6654               5.64        1.87           6355               106.65      26.75
6       4      1621               16.99       4.68           6219               24.86       13.83
6       5      11057              7.02        2.21           4117               60.84       26.93
6       6      6954               2.04        0.94           2979               64.40       26.02

Table 5.5f  Summary statistics of highpt values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
2       1      42616              20.80       9.93           33340              23.63       5.84
2       2      10184              23.56       7.26           19460              23.59       5.26
3       1      10184              23.56       7.26           24142              22.56       5.45
3       2      24665              19.45       9.54           9198               26.42       5.90
3       3      17951              22.66       10.15          19460              23.59       5.26
4       1      24665              19.45       9.54           20025              23.27       4.84
4       2      8563               23.63       7.03           9198               26.42       5.90
4       3      17951              22.66       10.15          19460              23.59       5.26
4       4      1621               23.15       8.35           4117               19.14       6.82
5       1      17711              20.24       8.80           9198               26.42       5.90
5       2      8563               23.63       7.03           19460              23.59       5.26
5       3      17951              22.66       10.15          13670              23.31       4.97
5       4      1621               23.15       8.35           6355               23.17       4.54
5       5      6954               17.43       10.95          4117               19.14       6.82
6       1      8563               23.63       7.03           19460              23.59       5.26
6       2      17951              22.66       10.15          13670              23.31       4.97
6       3      6654               17.71       9.50           6355               23.17       4.54
6       4      1621               23.15       8.35           6219               26.38       6.33
6       5      11057              21.77       7.97           4117               19.14       6.82
6       6      6954               17.43       10.95          2979               26.50       4.88

Table 5.5g  Summary statistics of rough values (window size: 7x7) within each terrain cluster (variable group: 3-5) for the two subareas 93G and 93H.

nclass  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
2       1      42616              0.29        0.25           33340              17.44       11.06
2       2      10184              1.23        1.01           19460              4.09        2.63
3       1      10184              1.23        1.01           24142              20.30       10.55
3       2      24665              0.36        0.28           9198               9.95        8.61
3       3      17951              0.18        0.15           19460              4.09        2.63
4       1      24665              0.36        0.28           20025              20.48       10.82
4       2      8563               0.92        0.42           9198               9.95        8.61
4       3      17951              0.18        0.15           19460              4.09        2.63
4       4      1621               2.88        1.52           4117               19.44       9.09
5       1      17711              0.48        0.25           9198               9.95        8.61
5       2      8563               0.92        0.42           19460              4.09        2.63
5       3      17951              0.18        0.15           13670              14.95       6.53
5       4      1621               2.88        1.52           6355               32.35       8.39
5       5      6954               0.08        0.05           4117               19.44       9.09
6       1      8563               0.92        0.42           19460              4.09        2.63
6       2      17951              0.18        0.15           13670              14.95       6.53
6       3      6654               0.43        0.27           6355               32.35       8.39
6       4      1621               2.88        1.52           6219               5.40        4.01
6       5      11057              0.50        0.23           4117               19.44       9.09
6       6      6954               0.08        0.05           2979               19.44       7.88

From Table 5.5a, it appears that at level 2 (i.e., nclass = 2) for subarea 93G, cluster {2} is "rougher" than cluster {1} (i.e., in an order of {2,1}) in terms of "local relief". The average local relief value is 40.6 m for cluster {2} and only 17.1 m for cluster {1}. For subarea 93H, cluster {1} is obviously much "rougher" than cluster {2} at level 2 because the average local relief value is 231.0 m for cluster {1} and only 95.4 m for cluster {2}. The order of the clusters according to local relief values for subarea 93H is thus {1,2}. At level 3 (i.e., nclass = 3) for subarea 93G, cluster {1} is now the roughest, followed by cluster {2} and then {3} (i.e., in an order of {1,2,3}). At level 3 for subarea 93H, cluster {1} is likewise the roughest, followed by {2} and then {3} (i.e., an order of {1,2,3}). Following the same notion, the interpretation of each cluster (derived from variable group 3-5 and window size 7x7) at various hierarchical levels (2 to 6) in terms of "local relief" can be summarized for both subareas.
For subarea 93G, the orders at various levels are: {2,1}, {1,2,3}, {4,2,1,3}, {4,2,1,3,5}, and {4,1,5,3,2,6}. For subarea 93H, the orders are: {1,2}, {1,2,3}, {1,4,2,3}, {4,5,3,1,2} and {3,6,5,2,4,1}. Meanwhile, it should be noted that the differences between local relief values in some of the clusters become less and less significant at higher levels. For example, comparing clusters {1} and {4} at level 6 for subarea 93H, the difference between the means of local relief values in these two clusters is only 4.4 m but the standard deviation of local relief values within each cluster is as high as 41.1 m for {1} and 54.8 m for {4}.

Similarly, the characteristics of all seven local roughness variables within each terrain cluster derived from variable group (3-5) and window size (7x7) can be summarized as follows for each subarea based on information from Tables 5.5a-g:

1)  For subarea 93G

"relief":  {2,1}  {1,2,3}  {4,2,1,3}  {4,2,1,3,5}  {4,1,5,3,2,6}
"slope":   {2,1}  {1,2,3}  {4,2,1,3}  {4,2,1,3,5}  {4,1,5,3,2,6}
"curv":    {2,1}  {1,2,3}  {4,2,1,3}  {4,2,1,3,5}  {4,1,3,5,2,6}
"hypint":  {1,2}  {2,1,3}  {4,1,2,3}  {4,5,1,2,3}  {3,4,6,5,1,2}
"std":     {2,1}  {1,2,3}  {4,2,1,3}  {4,2,1,3,5}  {4,1,5,3,2,6}
"highpt":  {2,1}  {1,3,2}  {2,4,3,1}  {2,4,3,1,5}  {1,4,2,5,3,6}
"rough":   {2,1}  {1,2,3}  {4,2,1,3}  {4,2,1,3,5}  {4,1,5,3,2,6}

2)  For subarea 93H

"relief":  {1,2}  {1,2,3}  {1,4,2,3}  {4,5,3,1,2}  {3,6,5,2,4,1}
"slope":   {1,2}  {1,2,3}  {1,4,2,3}  {4,5,3,1,2}  {3,5,6,2,4,1}
"curv":    {1,2}  {2,1,3}  {4,2,1,3}  {5,1,4,3,2}  {6,5,3,4,2,1}
"hypint":  {2,1}  {1,3,2}  {4,3,1,2}  {5,2,3,4,1}  {5,1,2,3,4,6}
"std":     {1,2}  {1,2,3}  {1,4,2,3}  {4,5,3,1,2}  {3,6,5,2,1,4}
"highpt":  {1,2}  {2,3,1}  {2,3,1,4}  {1,2,3,4,5}  {6,4,1,2,3,5}
"rough":   {1,2}  {1,2,3}  {1,4,2,3}  {4,5,3,1,2}  {3,5,6,2,4,1}

From the ordering of the terrain clusters according to various local roughness variables, several observations can be made.
First, for both subareas, the orders of the clusters at various levels according to variable slope are identical to the orders given by variable rough (at least from levels 2 to 6). This is no surprise because the two variables are highly positively correlated (r = 0.99) as indicated in section 5.3.1. Second, for subarea 93G, the orders given by relief, slope, std, and rough are the same at all levels (2 to 6). For subarea 93H, they are the same for levels 2 to 5; there is only a small difference among them at level 6. Again, this can be accounted for by their positive correlations (r_s ranging from 0.93 to 0.99 as seen in Table 5.4a). Third, as a result of relatively high correlations between curv and the above four variables (r_s ranging from 0.74 to 0.86 as seen in Table 5.4a) in the flatter subarea, the orders of clusters given by curv for subarea 93G are the same as those given by the above four variables for levels 2 to 5, with only a slight difference at level 6. For the rougher subarea 93H, the variable curv is not as highly correlated with variables relief, slope, std and rough (r_s ranging from 0.51 to 0.64 as seen in Table 5.4a). As a result, the orders given by curv for subarea 93H are quite different from those given by the above four variables. Fourth, since the variable highpt for subarea 93G has slightly positive correlations with variables relief, slope, std, rough, and curv (r_s ranging from 0.14 to 0.16 as seen in Table 5.4a), the orders given by this variable have some similarity to those given by the above five variables. For the rougher subarea 93H, the variable highpt has no or slightly negative correlations with the five variables (r_s ranging from -0.07 to 0.03 as seen in Table 5.4a). Therefore, the orders given by highpt are very different from those given by the above five variables and are sometimes even reversed.
Finally, for both subareas, because of some negative correlations between highpt and hypint (-0.23 and -0.27 respectively for 93G and 93H), the orders given by these two variables are mostly reversed, especially for subarea 93H. At level 6 for subarea 93H, for example, cluster {5} is the roughest and {6} is the least rough cluster according to hypint, whereas according to highpt, the reverse is true with cluster {6} being the roughest.

The above observations can be used to understand or interpret the characteristics of each of the terrain clusters at various levels derived from variable group (3-5) and window size (7x7) in relation to the original local roughness measures. For example, take the classification results (3 classes; variable group 3-5; window size 7x7) for the two subareas as shown in Figure 5.11a. Subarea 93G cluster {1} seems to represent the roughest area overall. This cluster has, on average, the highest local relief, the highest slope, the highest slope curvature, the highest standard deviation of elevations, the highest highpt (i.e., number of points that are higher than the center point of the moving window), and the highest "roughness" factor. Only the mean hypsometric integral value in this cluster is not the highest of the three. On the other hand, cluster {3} seems to represent the least rough area in 93G. All seven variables have the lowest average values in this cluster, except that the highpt value in this cluster is the second lowest of the three. For subarea 93H, it is more difficult to point out which cluster at this level is the roughest. However, it is clear that cluster {1} indicates an area with relatively high average relief, slope, hypint, std and rough values, a medium curv value, and a relatively low highpt value. Cluster {3}, on the other hand, represents an area with relatively low average relief, slope, curv, std and rough values, and medium hypint and highpt values.
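The cluster orderings used in this interpretation follow directly from the tabulated cluster means. As a minimal sketch (the helper name `rank_clusters` is hypothetical, and sorting by descending mean is assumed to be how "roughest first" is defined), the level 6 local relief means from Table 5.5a can be ranked as follows:

```python
def rank_clusters(cluster_means):
    """Order cluster labels from the highest to the lowest mean value."""
    ranked = sorted(cluster_means.items(), key=lambda item: item[1], reverse=True)
    return [cluster for cluster, _ in ranked]


# Mean local relief (m) per cluster at nclass = 6, copied from Table 5.5a.
relief_93g = {1: 37.2, 2: 12.8, 3: 22.4, 4: 58.7, 5: 26.5, 6: 8.0}
relief_93h = {1: 95.4, 2: 212.9, 3: 386.2, 4: 99.8, 5: 233.7, 6: 252.8}

order_93g = rank_clusters(relief_93g)
order_93h = rank_clusters(relief_93h)
print(order_93g)  # [4, 1, 5, 3, 2, 6]
print(order_93h)  # [3, 6, 5, 2, 4, 1]
```

Both results match the level 6 "relief" orders listed for the two subareas. A ranking of this kind ignores the within-cluster spread, so closely spaced means (such as clusters {1} and {4} in 93H) should not be over-interpreted.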
For terrain clusters derived from variable group (3-5) and variables extracted using window size (21x21), the summary statistics of each variable within various terrain clusters for the two subareas were calculated, but are not shown. From these summary statistics, the characteristics of all seven local roughness variables within each terrain cluster derived from variable group (3-5) and window size (21x21) can be summarized as follows:

1)  For subarea 93G

"relief":  {1,2}  {3,1,2}  {1,4,2,3}  {5,3,4,1,2}  {5,2,3,4,6,1}
"slope":   {1,2}  {3,1,2}  {4,1,2,3}  {5,3,4,1,2}  {5,2,4,3,6,1}
"curv":    {1,2}  {3,1,2}  {4,1,2,3}  {3,5,4,1,2}  {2,5,3,4,6,1}
"hypint":  {1,2}  {3,1,2}  {4,2,3,1}  {3,1,5,2,4}  {2,6,4,5,1,3}
"std":     {1,2}  {3,1,2}  {1,4,2,3}  {5,4,3,1,2}  {5,3,2,4,6,1}
"highpt":  {2,1}  {2,1,3}  {1,3,4,2}  {4,2,5,3,1}  {3,1,5,4,2,6}
"rough":   {1,2}  {3,1,2}  {4,1,2,3}  {3,5,4,1,2}  {2,5,4,3,6,1}

2)  For subarea 93H

"relief":  {1,2}  {3,2,1}  {2,1,3,4}  {1,3,4,2,5}  {2,3,4,5,1,6}
"slope":   {1,2}  {3,2,1}  {2,1,3,4}  {4,1,3,2,5}  {2,4,3,5,1,6}
"curv":    {1,2}  {3,2,1}  {2,1,4,3}  {1,4,3,5,2}  {2,4,5,3,6,1}
"hypint":  {1,2}  {2,1,3}  {1,3,2,4}  {4,3,2,1,5}  {4,3,1,2,5,6}
"std":     {1,2}  {3,2,1}  {2,1,3,4}  {1,3,4,2,5}  {2,3,4,5,1,6}
"highpt":  {2,1}  {1,3,2}  {4,3,2,1}  {5,2,1,3,4}  {6,1,5,2,3,4}
"rough":   {1,2}  {3,2,1}  {2,1,3,4}  {4,1,3,2,5}  {2,4,3,5,1,6}

Based on the ordering of the terrain clusters according to various local roughness variables as listed above, similar interpretations can be made regarding the characteristics of each of the terrain clusters at various levels derived from variable group (3-5) and window size (21x21) in relation to the original local roughness measures. It should be noted, however, that the variable hypint agrees more with the others for both subareas when using the larger window size (21x21) than when using the smaller one (7x7).
This is probably because, as mentioned in section 4.2, the estimation of the hypsometric integral based on only a few points (i.e., 7x7) within the moving window is not very accurate. Thus, it might not be a suitable "local measure" to use when the moving window size is small.

For terrain clusters derived from different variable groups (i.e., 3-6, 2 or 5), the summary statistics of the various local roughness measures are not shown. The characteristics of all seven local roughness variables within each terrain cluster derived from different variable groups and window sizes can be summarized in the same way as above.

For the whole study area, the summary statistics of each variable within various hierarchical terrain clusters derived from variable group (3-5) and two different window sizes (5x5 and 9x9) are presented in Tables 5.6a-g. The rankings (roughest to the least rough cluster) of each local roughness variable at different levels (2 to 6) are summarized as follows:

1)  For window size 5x5

"relief":  {2,1}  {1,2,3}  {2,3,1,4}  {1,2,5,3,4}  {1,2,5,6,3,4}
"slope":   {2,1}  {1,2,3}  {2,3,1,4}  {1,2,5,3,4}  {1,5,2,6,3,4}
"curv":    {2,1}  {1,2,3}  {3,2,1,4}  {2,1,5,4,3}  {2,1,5,6,4,3}
"hypint":  {2,1}  {3,1,2}  {4,2,1,3}  {3,1,5,4,2}  {5,3,1,6,4,2}
"std":     {2,1}  {1,2,3}  {2,3,1,4}  {1,2,5,3,4}  {1,2,5,6,3,4}
"highpt":  {1,2}  {2,1,3}  {3,1,4,2}  {2,4,5,3,1}  {2,4,6,1,3,5}
"rough":   {2,1}  {1,2,3}  {2,3,1,4}  {1,2,5,3,4}  {1,2,5,6,3,4}

2)  For window size 9x9

"relief":  {1,2}  {2,3,1}  {1,2,4,3}  {2,3,1,5,4}  {1,5,2,6,4,3}
"slope":   {1,2}  {2,3,1}  {1,2,4,3}  {2,3,1,5,4}  {1,2,5,6,4,3}
"curv":    {1,2}  {3,2,1}  {2,1,4,3}  {2,1,3,5,4}  {5,1,2,6,4,3}
"hypint":  {1,2}  {2,1,3}  {1,3,2,4}  {3,4,2,1,5}  {2,3,1,6,5,4}
"std":     {1,2}  {2,3,1}  {1,2,4,3}  {2,1,3,5,4}  {1,5,2,6,4,3}
"highpt":  {2,1}  {3,1,2}  {2,4,3,1}  {1,5,4,2,3}  {5,4,6,3,1,2}
"rough":   {1,2}  {2,3,1}  {1,2,4,3}  {2,3,1,5,4}  {1,5,2,6,4,3}

From the above ordering of the terrain clusters according to various local roughness variables,
several observations similar to those made for the two subareas can also be made for the whole study area. These observations can then be used to interpret the characteristics of each of the terrain clusters at various levels derived from variable group (3-5) and window size (5x5) or (9x9) in relation to the original local roughness measures. For example, take the classification results (3 classes; variable group 3-5; window size 5x5) as shown in Figure 5.12a. Cluster {1} (shown as black in the figure) represents the roughest area overall. This cluster has, on average, the highest local relief, the highest slope, the highest slope curvature, the highest standard deviation of elevations, and the highest roughness factor. Only the mean hypsometric integral value and the highpt value in this cluster are not the highest of the three. On the other hand, cluster {3} (shown as white in the figure) seems to represent the least rough area. All seven variables have the lowest average values in this cluster except the hypint value.

Table 5.6a  Summary statistics of local relief values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)
2       1      12816              273.1       182.1          15399              923.3       254.7
2       2      10564              753.1       212.2          7981               284.8       133.9
3       1      10564              753.1       212.2          7981               284.8       133.9
3       2      7512               333.5       194.2          10674              969.9       263.3
3       3      5304               187.6       119.1          4725               817.8       197.1
4       1      7512               333.5       194.2          10674              969.9       263.3
4       2      8313               768.1       201.6          4725               817.8       197.1
4       3      2251               697.6       239.4          4951               245.3       106.9
4       4      5304               187.6       119.1          3030               349.2       148.0
5       1      8313               768.1       201.55         4725               817.8       197.1
5       2      2251               697.6       239.41         5885               1086.5      253.3
5       3      5304               187.6       119.15         4789               826.7       196.2
5       4      4100               185.7       89.75          4951               245.3       106.9
5       5      3412               511.2       124.55         3030               349.2       148.0
6       1      4452               859.2       199.96         5885               1086.5      253.3
6       2      2251               697.6       239.41         4789               826.7       196.2
6       3      5304               187.6       119.15         4951               245.3       106.9
6       4      4100               185.7       89.75          3030               349.2       148.0
6       5      3861               663.2       144.23         2133               966.6       149.1
6       6      3412               511.2       124.55         2592               695.4       139.0

Table 5.6b  Summary statistics of slope values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)
2       1      12816              4.68        3.21           15399              11.06       3.15
2       2      10564              12.59       3.03           7981               2.84        1.21
3       1      10564              12.59       3.03           7981               2.84        1.21
3       2      7512               5.69        3.46           10674              12.01       2.83
3       3      5304               3.25        2.12           4725               8.91        2.73
4       1      7512               5.69        3.46           10674              12.01       2.83
4       2      8313               13.01       2.78           4725               8.91        2.73
4       3      2251               11.06       3.38           4951               2.70        1.21
4       4      5304               3.25        2.12           3030               3.07        1.18
5       1      8313               13.01       2.78           4725               8.91        2.73
5       2      2251               11.06       3.38           5885               13.19       2.49
5       3      5304               3.25        2.12           4789               10.57       2.54
5       4      4100               3.04        1.38           4951               2.70        1.21
5       5      3412               8.88        2.34           3030               3.07        1.18
6       1      4452               14.27       2.73           5885               13.19       2.49
6       2      2251               11.06       3.38           4789               10.57       2.54
6       3      5304               3.25        2.12           4951               2.70        1.21
6       4      4100               3.04        1.38           3030               3.07        1.18
6       5      3861               11.55       2.02           2133               10.16       2.67
6       6      3412               8.88        2.34           2592               7.88        2.33

Table 5.6c  Summary statistics of slope curvature values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)
2       1      12816              1.77        1.04           15399              4.36        1.09
2       2      10564              4.40        1.10           7981               1.55        0.84
3       1      10564              4.40        1.10           7981               1.55        0.84
3       2      7512               2.19        1.05           10674              4.34        1.05
3       3      5304               1.17        0.66           4725               4.40        1.19
4       1      7512               2.19        1.05           10674              4.34        1.05
4       2      8313               4.11        0.88           4725               4.40        1.19
4       3      2251               5.47        1.15           4951               1.21        0.48
4       4      5304               1.17        0.66           3030               2.10        0.99
5       1      8313               4.11        0.88           4725               4.40        1.19
5       2      2251               5.47        1.15           5885               4.77        0.92
5       3      5304               1.17        0.66           4789               3.81        0.95
5       4      4100               1.52        0.80           4951               1.21        0.48
5       5      3412               3.01        0.66           3030               2.10        0.99
6       1      4452               4.54        0.79           5885               4.77        0.92
6       2      2251               5.47        1.15           4789               3.81        0.95
6       3      5304               1.17        0.66           4951               1.21        0.48
6       4      4100               1.52        0.80           3030               2.10        0.99
6       5      3861               3.61        0.70           2133               5.49        0.72
6       6      3412               3.01        0.66           2592               3.50        0.58

Table 5.6d  Summary statistics of hypint values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)
2       1      12816              0.43        0.11           15399              0.42        0.10
2       2      10564              0.45        0.12           7981               0.39        0.12
3       1      10564              0.45        0.12           7981               0.39        0.12
3       2      7512               0.36        0.07           10674              0.47        0.07
3       3      5304               0.52        0.07           4725               0.31        0.06
4       1      7512               0.36        0.07           10674              0.47        0.07
4       2      8313               0.49        0.09           4725               0.31        0.06
4       3      2251               0.28        0.06           4951               0.47        0.07
4       4      5304               0.52        0.07           3030               0.27        0.06
5       1      8313               0.49        0.09           4725               0.31        0.06
5       2      2251               0.28        0.06           5885               0.43        0.06
5       3      5304               0.52        0.07           4789               0.52        0.06
5       4      4100               0.34        0.06           4951               0.47        0.07
5       5      3412               0.39        0.06           3030               0.27        0.06
6       1      4452               0.45        0.08           5885               0.43        0.06
6       2      2251               0.28        0.06           4789               0.52        0.06
6       3      5304               0.52        0.07           4951               0.47        0.07
6       4      4100               0.34        0.06           3030               0.27        0.06
6       5      3861               0.55        0.06           2133               0.28        0.06
6       6      3412               0.39        0.06           2592               0.34        0.05

Table 5.6e  Summary statistics of standard deviation of elevation values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)
2       1      12816              76.6        52.5           15399              240.1       71.6
2       2      10564              216.8       66.8           7981               65.3        28.7
3       1      10564              216.8       66.8           7981               65.3        28.7
3       2      7512               93.9        57.0           10674              253.9       73.0
3       3      5304               52.1        32.4           4725               208.9       57.4
4       1      7512               93.9        57.0           10674              253.9       73.0
4       2      8313               222.7       63.9           4725               208.9       57.4
4       3      2251               194.9       72.5           4951               59.4        26.1
4       4      5304               52.1        32.4           3030               74.9        30.1
5       1      8313               222.7       63.9           4725               208.9       57.4
5       2      2251               194.9       72.5           5885               293.8       62.5
5       3      5304               52.1        32.4           4789               204.9       52.0
5       4      4100               50.1        24.3           4951               59.4        26.1
5       5      3412               146.7       36.6           3030               74.9        30.1
6       1      4452               255.2       62.0           5885               293.8       62.5
6       2      2251               194.9       72.5           4789               204.9       52.0
6       3      5304               52.1        32.4           4951               59.4        26.1
6       4      4100               50.1        24.3           3030               74.9        30.1
6       5      3861               185.2       41.7           2133               250.9       47.8
6       6      3412               146.7       36.6           2592               174.4       38.5

Table 5.6f  Summary statistics of highpt values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)
2       1      12816              12.3        5.57           15399              39.3        20.6
2       2      10564              11.5        5.62           7981               41.3        19.2
3       1      10564              11.5        5.62           7981               41.3        19.2
3       2      7512               13.4        5.34           10674              36.4        20.6
3       3      5304               10.8        5.52           4725               45.8        19.0
4       1      7512               13.4        5.34           10674              36.4        20.6
4       2      8313               10.6        5.54           4725               45.8        19.0
4       3      2251               14.6        4.74           4951               38.6        19.4
4       4      5304               10.8        5.52           3030               45.7        18.0
5       1      8313               10.6        5.54           4725               45.8        19.0
5       2      2251               14.6        4.74           5885               38.5        20.1
5       3      5304               10.8        5.52           4789               33.8        20.8
5       4      4100               13.8        5.17           4951               38.6        19.4
5       5      3412               13.0        5.50           3030               45.7        18.0
6       1      4452               11.46       5.25           5885               38.5        20.1
6       2      2251               14.59       4.74           4789               33.8        20.8
6       3      5304               10.78       5.52           4951               38.6        19.4
6       4      4100               13.81       5.17           3030               45.7        18.0
6       5      3861               9.67        5.71           2133               48.8        17.8
6       6      3412               12.96       5.50           2592               43.4        19.5

Table 5.6g  Summary statistics of rough values (window size: 5x5 and 9x9) within each terrain cluster (variable group: 3-5) for the whole study area.

nclass  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)
2       1      12816              0.55        0.67           15399              2.30        1.15
2       2      10564              2.83        1.27           7981               0.19        0.16
3       1      10564              2.83        1.27           7981               0.19        0.16
3       2      7512               0.76        0.76           10674              2.60        1.14
3       3      5304               0.26        0.33           4725               1.63        0.85
4       1      7512               0.76        0.76           10674              2.60        1.14
4       2      8313               2.93        1.22           4725               1.63        0.85
4       3      2251               2.47        1.37           4951               0.16        0.13
4       4      5304               0.26        0.33           3030               0.25        0.19
5       1      8313               2.93        1.22           4725               1.63        0.85
5       2      2251               2.47        1.37           5885               3.07        1.11
5       3      5304               0.26        0.33           4789               2.02        0.88
5       4      4100               0.21        0.18           4951               0.16        0.13
5       5      3412               1.42        0.66           3030               0.25        0.19
6       1      4452               3.50        1.27           5885               3.07        1.11
6       2      2251               2.47        1.37           4789               2.02        0.88
6       3      5304               0.26        0.33           4951               0.16        0.13
6       4      4100               0.21        0.18           3030               0.25        0.19
6       5      3861               2.28        0.75           2133               2.13        0.83
6       6      3412               1.42        0.66           2592               1.21        0.61

The summary statistics of each variable within various terrain clusters and the rankings of each local roughness variable at different levels (2 to 6) for other cases (i.e., terrain clusters derived using variable group 3-6, 2, or 5) are not presented here. Similar patterns can be expected.

From examining Tables 5.5a-5.6g, it is also noticeable that for most variables the greatest dichotomy occurs at nclass = 2, but not for hypint, and that highpt never exhibits significant grouping. One must remember that the classifications are not based upon these particular variables.

5.3.4  Comparisons of different classification results

In order to see how similar or dissimilar the cluster groups are, the different classification results can be compared to each other quantitatively. To do that, a classification error matrix was first calculated for each comparison (only the nclass = 3 results were considered in the analyses below). That is, two sets of classification results derived using different variable groups or window sizes were cross-tabulated in a 3 by 3 two-dimensional array. Note that each error matrix was generated by using a total enumeration of the study area data and not just a sample. Therefore, the sampling scheme is not a concern in this study. By contrast, when classification maps are sampled and compared to "true" reference data for accuracy assessment (as with land cover maps generated from remotely sensed data), an appropriate sampling scheme is important and it is necessary to consider the spatial complexity or the spatial autocorrelation in the data [Congalton, 1988a,b].

Once the classification error matrices had been generated, a test of agreement between two or more error matrices (classifications) was then done using the KHAT (or kappa) statistic to see if the classification results are different from each other [Cohen, 1960; Bishop et al., 1975; Congalton et al., 1983; Lillesand and Kiefer, 1994].
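The cross-tabulation step can be illustrated with a short sketch. The function name `error_matrix` and the two toy label sequences are hypothetical; the only point shown is that every grid point is counted (a total enumeration), with rows indexed by the class in the first classification and columns by the class in the second.

```python
def error_matrix(labels_a, labels_b, classes):
    """Cross-tabulate two classifications of the same grid points.

    Entry [i][j] counts the points placed in classes[i] by the first
    classification and in classes[j] by the second one.
    """
    index = {c: k for k, c in enumerate(classes)}
    size = len(classes)
    matrix = [[0] * size for _ in range(size)]
    for a, b in zip(labels_a, labels_b):
        matrix[index[a]][index[b]] += 1
    return matrix


# Hypothetical toy example: six grid points classified twice into 3 classes.
classification_a = [1, 1, 2, 2, 3, 3]
classification_b = [1, 1, 2, 3, 3, 3]

matrix = error_matrix(classification_a, classification_b, classes=[1, 2, 3])
for row in matrix:
    print(row)
```

The sketch assumes that class numbers in the two classifications already refer to corresponding clusters; where they do not, the labels would have to be matched first.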
The KHAT statistic is computed as

$$\mathrm{KHAT}\ (\hat{k}) = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} (x_{i+} \cdot x_{+i})}{N^{2} - \sum_{i=1}^{r} (x_{i+} \cdot x_{+i})} \qquad (5.1)$$

where
$r$: the number of rows in the error matrix
$x_{ii}$: the number of observations in row i and column i (on the major diagonal)
$x_{i+}$: total of observations in row i
$x_{+i}$: total of observations in column i
$N$: total number of observations included in the matrix

KHAT represents the ratio of beyond-random agreement to expected disagreement in a random case. It serves as an indicator of the extent to which the percentage correct values of an error matrix are due to "true" agreement versus "chance" agreement. As the true agreement (observed) approaches 1 and the agreement due to chance approaches 0, KHAT approaches 1 [Lillesand and Kiefer, 1994]. The value of KHAT is equal to 0 for random agreement and 1 for perfect agreement. It is negative when agreement is less than that expected in a random case. Landis and Koch [1977] have suggested three ranges of agreement for the KHAT statistic. These are poor (KHAT < 0.4), good (0.4 - 0.75) and excellent (KHAT > 0.75). These ranges will be adopted later in the discussion of the classification comparison results.

The error matrices and the KHAT statistics of 16 different comparisons (window size: 7x7 or 21x21; variable group: 3-5, 3-6, 2, or 5 only) for subarea 93G are summarized and presented in Figures 5.13a-b. Note that only one factor (either the window size or the variable group) differs while the other factor is held constant when comparing two classifications. The first comparison shown in Figure 5.13a, for example, is between the two variable groups (3-5) and (3-6) but with the same window size (7x7). If the window size and the variable group were both changing, it would be difficult to interpret the comparison results.
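The cross-tabulation and Equation 5.1 can be sketched in a few lines of Python. This is a minimal illustration only, not the software used in the thesis: the function names (`error_matrix`, `khat`, `agreement_label`) are hypothetical, class labels are assumed to be integers 1..n, and the two label sequences are assumed to cover the same cells in the same order.

```python
from typing import List, Sequence


def error_matrix(labels_a: Sequence[int], labels_b: Sequence[int],
                 n_classes: int) -> List[List[int]]:
    """Cross-tabulate two classifications of the same cells.

    Rows index the class in classification A, columns the class in
    classification B; classes are assumed to be numbered 1..n_classes.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must cover the same cells")
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(labels_a, labels_b):
        matrix[a - 1][b - 1] += 1
    return matrix


def khat(matrix: Sequence[Sequence[int]]) -> float:
    """KHAT (kappa) statistic of a square error matrix, as in Equation 5.1."""
    r = len(matrix)
    n = sum(sum(row) for row in matrix)                    # N
    diagonal = sum(matrix[i][i] for i in range(r))         # sum of x_ii
    row_totals = [sum(row) for row in matrix]              # x_i+
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]  # x_+i
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    # Undefined (division by zero) if every observation falls in a single cell.
    return (n * diagonal - chance) / (n * n - chance)


def agreement_label(k: float) -> str:
    """Ranges quoted in the text: poor (< 0.4), good (0.4 - 0.75), excellent (> 0.75)."""
    if k < 0.4:
        return "poor"
    if k <= 0.75:
        return "good"
    return "excellent"
```

For an illustrative (made-up) 2 by 2 matrix `[[40, 10], [10, 40]]`, N = 100, the diagonal sum is 80 and the chance term is 50·50 + 50·50 = 5000, so KHAT = (8000 - 5000) / (10000 - 5000) = 0.6, which falls in the "good" range.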
Also worth noting is that the class numbers (i.e., 1, 2 and 3 in this case) that result directly from cluster analysis do not follow the same order for classifications derived using different window sizes or variable groups, so the major diagonals in the original error matrices would not necessarily represent the correct correspondence between terrain clusters from two different classifications. For each error matrix in the above figures, therefore, the correct correspondences were found first according to the interpretation of classification results as discussed in section 5.3.3, and then the KHAT statistic was calculated for each error matrix. Figures 5.13c-d give the error matrices and the KHAT statistics of all 16 comparisons of various classifications for subarea 93H. For the whole study area, the error matrices and the KHAT statistics of all 16 comparisons of various classifications (i.e., variable group: 3-5, 3-6, 2, and 5; window size: 5x5 and 9x9) were obtained and are shown in Figures 5.13e-f.

Figure 5.13a  Error matrices and KHAT statistics of 12 comparisons of various classifications (differences between variable groups) for subarea 93G.

Figure 5.13b  Error matrices and KHAT statistics of 4 different comparisons of various classifications (differences between the two window sizes for each of the 4 variable groups) for subarea 93G.

The reason for computing the KHAT statistics was to see how similar the different classification groups were for each particular study area when different clustering variables and window sizes were used. The higher the KHAT value, the more similar the resultant groups. All the KHAT values (nclass=3 only) for each of the three study area surfaces as shown in Figures 5.13a-f are summarized in Figure 5.13g for easier discussion. From the above quantitative comparisons of various terrain classification results derived from different variable groups and window sizes, two points can be made. First, there are some similarities between the different classifications.
For example, the KHAT statistic for subarea 93G is 0.59 (good agreement) when the classifications using variable groups (3-5) and (3-6), with a window size of 21x21, are compared to each other. For the same comparison (i.e., variable group 3-5 versus variable group 3-6 with a window size of 21x21) in subarea 93H, the KHAT value is 0.51. For the whole study area, KHAT is as high as 0.73 (close to excellent agreement) for the comparison between the classifications using variable groups (2) and (5) with a window size of 5x5. This comes as no surprise because variable 2 (i.e., slope) is highly correlated with variable 5 (i.e., the standard deviation of elevations) and therefore the classifications resulting from these two variable groups are quite similar.

Figure 5.13c  Error matrices and KHAT statistics of 12 comparisons of various classifications (differences between variable groups) for subarea 93H.

Figure 5.13d  Error matrices and KHAT statistics of 4 comparisons of various classifications (differences between the two window sizes for each of the 4 variable groups) for subarea 93H.

Figure 5.13e  Error matrices and KHAT statistics of 12 comparisons of various classifications (differences between variable groups) for the whole study area.

Figure 5.13f  Error matrices and KHAT statistics of 4 comparisons of various classifications (differences between the two window sizes for each of the 4 variable groups) for the whole study area.

Figure 5.13g  A summary of KHAT statistics for the two subareas (93G and 93H) and the whole study area. The values above the diagonal in each matrix box are for the smaller window size (i.e., 7x7 for the two subareas and 5x5 for the whole study area) and the values below the diagonal are for the larger window size (i.e., 21x21 for the two subareas and 9x9 for the whole study area). The values below each matrix box are for the comparisons between the two window sizes for each of the 4 variable groups.
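Because cluster numbers are arbitrary, the error matrices compared here had to be brought into correspondence before KHAT was computed. In the thesis this was done by interpreting the classification results (section 5.3.3). The sketch below shows a different, purely mechanical heuristic, not the thesis method: it tries every column ordering and keeps the one with the largest diagonal sum. The function name `align_columns` is hypothetical, and the heuristic can pick a different correspondence from one based on interpretation.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def align_columns(matrix: Sequence[Sequence[int]]) -> Tuple[List[List[int]], Tuple[int, ...]]:
    """Reorder the columns of a square error matrix to maximize its diagonal sum.

    Returns the reordered matrix and the permutation used, where entry i of
    the permutation is the original column that now sits in column i.
    Brute force over all orderings, so only suitable for a handful of classes.
    """
    r = len(matrix)
    best = max(
        permutations(range(r)),
        key=lambda p: sum(matrix[i][p[i]] for i in range(r)),
    )
    aligned = [[row[j] for j in best] for row in matrix]
    return aligned, best
```

With three classes only six orderings are tried, so brute force is adequate. The reordered matrix can then be passed to any KHAT routine in place of the original.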
Second, the number of geomorphometric measures used in the multivariate clustering and the size of the moving window originally used in the extraction of those measures have different impacts on the classification result. Take the comparison of the classification results derived using variable group (3-5) and variable group (3-6), for example. The classifications show greater difference, or poorer agreement, when the smaller moving window was used in the variable extraction. As seen in Figure 5.13g, KHAT is only 0.10 for subarea 93G and 0.37 for subarea 93H with the smaller window size of 7x7, and 0.35 for the whole study area with the smaller window size of 5x5. The classifications tend to be more similar if the larger moving window (i.e., 21x21 for the two subareas and 9x9 for the whole study area) was used in the variable extraction (KHAT is 0.59, 0.51, and 0.51 respectively). This is probably because variable 6 (i.e., highpt) emphasizes more local terrain discontinuities, as indicated in section 4.2.6. The addition of this variable to the multivariate signatures used for the clustering would only make more difference if smaller window sizes were used for the extraction of the variables. When comparing the classifications from different window sizes (i.e., 7x7 or 21x21 for the two subareas and 5x5 or 9x9 for the whole study area), it seems that for variable group (3-5) in the two subareas the classification difference is relatively large (i.e., a low KHAT value of 0.18 for subarea 93G and 0.19 for 93H). For the other variable groups and surfaces, the KHAT values range from 0.32 to 0.70 for the comparisons between different window sizes. All the above similarities and differences between various classifications can be accounted for by the nature of the terrain surfaces (both local and global characteristics) and the ability of each variable to capture different aspects of the variability of terrain.
Since the KHAT values are generally in the poor to good range for the various comparisons (few indicate close to excellent agreement), the classification results based on all four variable groups (3-5, 3-6, 2, 5) and both moving window sizes (7x7 and 21x21 for the two subareas; 5x5 and 9x9 for the whole study area) will be examined later for the DEM error modelling.

As discussed and demonstrated above, the resulting classifications can be visually examined, quantitatively compared to each other, and interpreted in relation to the original local roughness measures from which they were derived. The various classification results based on different variable groups and moving window sizes for different surfaces should also be statistically evaluated in conjunction with the observed spatial pattern of mismatch between DEMs of differing resolution shown in section 3.3. This will be done in Chapter 6 to examine the effectiveness of using multivariate hierarchical classification to quantify the relation between DEM errors and the roughness of terrain. The thesis hypothesis stated in Chapter 3 will then be tested.

6.  DEM ERROR MODELLING

6.1  Overview

As reviewed in Chapter 2, most DEM accuracy investigations were conducted along three lines: (i) describing or identifying the possible sources of gross errors; (ii) evaluating the effect of varying densities of data constituting the DEM or that of different interpolation methods; and (iii) comparing the "products" (e.g., spot heights, contours) derived from a DEM with those obtained directly from terrestrial and photogrammetric surveying procedures, mostly from a producer's point of view. The accuracy estimate is usually constrained to a global measure such as RMSE. Seldom are the errors described in terms of their spatial domain or how the resolution of the DEM interacts with the relief variability.
In addition, studies have demonstrated the scale-dependent nature of the terrain (e.g., Goodchild, 1982; Mark and Aronson, 1984; Roy et al., 1987). That is, specific landforms occur over only a limited range of scales and there is a wide range of topographic variation present in different terrain surfaces. Thus, in defining the accuracy of a DEM, one also needs ultimately to know the global and local characteristics of the terrain and how the resolution interacts with them.

As seen from the preliminary study results in Chapter 3, DEM errors and their spatial distribution appear to be related to the DEM resolution and its interaction with the variability of terrain. In order to fully understand the issue of uncertainty in digital terrain representation and the role of scale in it, some quantitative investigations of these relations are required.

The rest of this chapter is organized around three tasks. Firstly, the correlations between the DEM errors and each of the seven local variables are examined. Then, the resulting terrain roughness classification is statistically evaluated in conjunction with the observed spatial pattern of mismatch between DEMs of differing resolution. Finally, the quantitative relations between the extent and the spatial pattern of DEM errors, spatial resolution, and the terrain characteristics are analyzed.

6.2  A New Approach to DEM Error Modelling

A new approach to DEM error modelling is taken in this research to demonstrate the importance of scale in terrain characterization and to show how DEM accuracy is related to the DEM resolution and how the spatial variation of DEM errors (i.e., elevation differences) is related to the roughness of terrain. In this section, the terrain classification results presented in section 5.3 are evaluated in conjunction with the study area DEM errors presented in section 3.3.
The idea is to evaluate the resulting classification and to examine the effectiveness of using multivariate hierarchical classification to quantify the relation between DEM errors and topographic complexity. This is done by visual inspection and quantitative analysis to find out if DEM errors show significant variation within and between the different clusters resulting from terrain classification. If the classification results do show significant correlation with DEM errors, it would suggest that hierarchical terrain classification could be used to model DEM errors and their spatial distribution.

6.2.1  DEM errors and local roughness measures

Before examining the DEM errors in various terrain clusters derived from local roughness measures, the correlations between the DEM errors and each of the seven local variables are examined first. Figure 6.1a, for example, displays two scattergrams between DEM errors (WSC2 DEM errors as compared to TRIM DEM in section 3.3) and variable relief (derived using window size 7x7) for the two subareas (93G and 93H). For visualization purposes, a random sample of only 8000 points is taken from the original 52800 (i.e., 220 x 240) points and shown in the scattergram. Figure 6.1b shows two scattergrams between WSC2 DEM errors and variable relief derived using window size (21x21). The scattergrams between WSC2 DEM errors and the other local roughness measures (i.e., slope, curv, hypint, std, highpt, and rough) derived using the two window sizes (7x7) and (21x21) for the two subareas are shown in Figures 6.2a-b to 6.7a-b. Several observations can be made based on the above scattergrams. Firstly, the positive and negative elevation differences are roughly symmetrical in the scattergrams between WSC2 DEM error and variables relief, slope, curv, hypint, std, and rough, no matter what the global and local characteristics are.
This is in accordance with the relatively symmetrical histogram distributions of elevation differences derived from the six DEM comparisons as discussed in section 3.3. The above observation indicates that most local roughness variables can be useful only in relating the absolute amount of DEM error to the roughness measure, but not in distinguishing the positive elevation differences from the negative ones.

[Figures 6.1a-b to 6.7a-b: scattergrams between WSC2 DEM errors and each local roughness measure (relief, slope, curv, hypint, std, highpt, and rough), derived using window sizes 7x7 and 21x21, for subareas 93G and 93H.]

Secondly, the elevation differences do show certain consistent correlations with various local roughness variables. With most variables except highpt, the pattern seems to be that the higher the roughness measure, the more points there are with higher absolute elevation differences. This horn-shaped scatter of points indicates heteroscedasticity (the lack of equal variance). Thirdly, the scattergrams between WSC2 DEM error and variable highpt do present a different pattern. As seen in Figure 6.6b, for example, there seem to be more points with positive elevation differences when the variable highpt is low, but more points with negative elevation differences when the variable highpt is high.
This suggests that the variable highpt may be the only one that is capable of indicating the sign of the elevation differences. Finally, there are some differences between the scattergrams for the two subareas and the two window sizes because of the different global characteristics of each surface. The scattergrams between elevation differences and local measures for other cases, such as for EMR1 DEM errors as compared to TRIM DEMs for the two subareas, NGDC5 and WSC2 DEM errors as compared to EMR1 DEM for the whole study area, and for variables derived using different window sizes, are not presented, but similar patterns were observed.

6.2.2  DEM errors in various terrain clusters

This section examines DEM errors within various terrain clusters derived from the multivariate classification. Both WSC2 and EMR1 DEM errors (as compared to the 50 m TRIM DEM) are evaluated for both subarea surfaces (93G and 93H) in order to investigate the nature of DEM errors in surfaces with different global characteristics. In addition, NGDC5 and WSC2 DEM errors (as compared to the 1 km EMR1 DEM) within each terrain cluster are evaluated for the whole study area in order to examine the role of resolution or scale in the study of DEM error.

6.2.2.1  WSC2 DEM errors as compared to TRIM DEM

Firstly, WSC2 DEM errors (i.e., the elevation differences resulting from comparison of the 2 km WSC2 and 50 m TRIM DEMs) for the two subareas (see Figures 3.20 and 3.21) were evaluated to demonstrate the technique. Figure 6.8a shows a histogram of WSC2 DEM errors within each of the three terrain classes based on geomorphometric variable group (3-5) and moving window size (7x7) for subareas 93G and 93H (the classification results were shown in Figure 5.11a). Figure 6.8b gives the histogram of WSC2 DEM errors within each of the four classes derived from the same variable group and window size as above (i.e., variable group: 3-5, window size: 7x7).
Table 6.1a lists some summary statistics, including the number of data points, mean, and standard deviation of WSC2 DEM errors in each terrain class for both the 93G and 93H subareas. In this table, WSC2 DEM errors at different hierarchical levels of terrain clusters (only levels 2 to 6 are shown due to space limitations) are summarized (variable group: 3-5, window size: 7x7). As mentioned earlier in section 6.2.1, most local roughness variables can only be useful in relating the absolute amount of DEM error to the roughness measure, but not in distinguishing the positive elevation differences from the negative ones. Therefore, the absolute WSC2 DEM errors were also examined and their summary statistics within each terrain cluster (variable group: 3-5, window size: 7x7) are given in Table 6.1b for the two subareas. For moving window size (21x21) and variable group (3-5), the summary statistics of WSC2 DEM errors are listed in Table 6.1c and the statistics of the absolute WSC2 DEM errors within each terrain cluster are summarized in Table 6.1d.

[Figure 6.8a: histograms of WSC2 DEM errors within each of the three terrain classes (variable group: 3-5, window size: 7x7) for subareas 93G and 93H.]

[Figure 6.8b: histograms of WSC2 DEM errors within each of the four terrain classes (variable group: 3-5, window size: 7x7) for subareas 93G and 93H.]

Table 6.1a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      42616              2.7         21.9           33340              15.9        221.3
2      10184              2.3         31.2           19460              5.0         136.7

1      10184              2.3         31.2           24142              53.3        222.5
2      24665              6.4         23.0           9198               -82.4       184.8
3      17951              -2.4        19.1           19460              5.0         136.7

1      24665              6.4         23.0           20025              47.5        220.9
2      8563               2.4         31.2           9198               -82.4       184.8
3      17951              -2.4        19.1           19460              5.0         136.7
4      1621               1.6         31.1           4117               81.5        228.4

1      17711              8.6         24.6           9198               -82.4       184.8
2      8563               2.4         31.2           19460              5.0         136.7
3      17951              -2.4        19.1           13670              42.7        204.2
4      1621               1.6         31.1           6355               58.0        252.8
5      6954               1.0         17.0           4117               81.5        228.4

1      8563               2.4         31.2           19460              5.0         136.7
2      17951              -2.4        19.1           13670              42.7        204.2
3      6654               12.2        23.5           6355               58.0        252.8
4      1621               1.6         31.1           6219               -49.4       148.9
5      11057              6.4         25.0           4117               81.5        228.4
6      6954               1.0         17.0           2979               -151.4      228.4

To examine the effect of using different geomorphometric variable groups on terrain classification, and hence on DEM error modelling, classification results based on a different variable group were also considered for the evaluation of WSC2 DEM errors as compared to TRIM DEMs for the two subareas. For example, more results were obtained for the two subareas by comparing WSC2 DEM errors within various terrain clusters derived from geomorphometric variable group (3-6) and different moving window sizes (7x7 or 21x21).
With variable group (3-6), the statistics of WSC2 DEM errors and the absolute WSC2 DEM errors within each terrain cluster are summarized in Tables 6.2a-b for window size (7x7) and in Tables 6.2c-d for window size (21x21).

As indicated earlier, variable group (3-6) includes curv, hypint, std, and highpt. Although these variables are not highly correlated and each represents some aspect of the roughness of terrain, it may be interesting to see how the results will differ if only one variable is used in the classification. In order to make this comparison, terrain clusters derived from slope (or 2) and std (or 5) are used as well for the examination of WSC2 DEM errors. With variable slope, the summary statistics of WSC2 DEM errors and the absolute WSC2 DEM errors in each cluster are given in Tables 6.3a-d for two different window sizes (7x7 and

Table 6.1b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      42616              16.55       14.55          33340              170.55      141.88
2      10184              22.85       21.37          19460              103.29      89.62

1      10184              22.85       21.37          24142              182.16      138.51
2      24665              17.96       15.70          9198               140.07      146.06
3      17951              14.60       12.54          19460              103.29      89.62

1      24665              17.96       15.70          20025              179.92      136.66
2      8563               22.53       21.74          9198               140.07      146.06
3      17951              14.60       12.54          19460              103.29      89.62
4      1621               24.55       19.19          4117               193.06      146.71

1      17711              19.78       16.94          9198               140.07      146.06
2      8563               22.53       21.74          19460              103.29      89.62
3      17951              14.60       12.54          13670              166.31      125.89
4      1621               24.55       19.19          6355               209.21      153.30
5      6954               13.33       10.66          4117               193.06      146.71

1      8563               22.53       21.74          19460              103.29      89.62
2      17951              14.60       12.54          13670              166.31      125.89
3      6654               20.78       16.31          6355               209.21      153.30
4      1621               24.55       19.19          6219               111.96      109.87
5      11057              19.18       17.29          4117               193.06      146.71
6      6954               13.33       10.66          2979               198.77      188.63

Table 6.1c  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      32226              4.3         27.3           30528              19.8        231.7
2      20574              0.0         17.2           22272              0.9         126.1

1      29720              3.6         27.7           22272              0.9         126.1
2      20574              0.0         17.2           17471              84.8        208.3
3      2506               13.1        19.8           13057              -67.1       232.9

1      12681              -2.9        32.6           17471              84.8        208.3
2      17039              8.4         22.2           13057              -67.1       232.9
3      20574              0.0         17.2           17147              12.6        126.7
4      2506               13.1        19.8           5125               -38.1       115.6

1      17039              8.4         22.2           13057              -67.1       232.9
2      20574              0.0         17.2           17147              12.6        126.7
3      2506               13.1        19.8           12113              87.8        197.6
4      9555               -4.9        28.0           5358               78.1        230.8
5      3126               3.2         43.3           5125               -38.1       115.6

1      20574              0.0         17.2           17147              12.6        126.7
2      2506               13.1        19.8           6245               -114.7      276.5
3      9555               -4.9        28.0           12113              87.8        197.6
4      13138              8.5         23.9           5358               78.1        230.8
5      3126               3.2         43.3           6812               -23.4       172.9
6      3901               7.8         14.7           5125               -38.1       115.6

Table 6.1d  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      32226              20.41       18.61          30528              182.11      144.62
2      20574              13.62       10.48          22272              95.94       81.78

1      29720              20.40       19.07          22272              95.94       81.78
2      20574              13.62       10.48          17471              184.04      129.33
3      2506               20.49       11.95          13057              179.52      162.82

1      12681              23.72       22.58          17471              184.04      129.33
2      17039              17.93       15.50          13057              179.52      162.82
3      20574              13.62       10.48          17147              99.23       79.79
4      2506               20.49       11.95          5125               84.95       87.23

1      17039              17.93       15.50          13057              179.52      162.82
2      20574              13.62       10.48          17147              99.23       79.79
3      2506               20.49       11.95          12113              179.24      120.83
4      9555               20.85       19.32          5358               194.88      146.19
5      3126               32.49       28.74          5125               84.95       87.23

1      20574              13.62       10.48          17147              99.23       79.79
2      2506               20.49       11.95          6245               228.92      192.87
3      9555               20.85       19.32          12113              179.24      120.83
4      13138              19.59       16.21          5358               194.88      146.19
5      3126               32.49       28.74          6812               134.23      111.48
6      3901               12.34       11.18          5125               84.95       87.23

Table 6.2a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      24188              1.7         27.6           30135              28.9        226.1
2      28612              3.4         20.4           22665              -10.9       138.7

1      28612              3.4         20.4           21374              -4.1        222.2
2      22589              1.5         27.3           22665              -10.9       138.7
3      1599               3.7         30.3           8761               109.5       215.0

1      22589              1.5         27.3           22665              -10.9       138.7
2      16793              6.0         21.5           16869              24.0        210.3
3      1599               3.7         30.3           8761               109.5       215.0
4      11819              -0.3        18.1           4505               -109.3      233.8

1      16793              6.0         21.5           16869              24.0        210.3
2      11188              -5.6        25.5           8761               109.5       215.0
3      11401              8.5         27.3           8893               -49.6       137.0
4      1599               3.7         30.3           13772              14.1        134.0
5      11819              -0.3        18.1           4505               -109.3      233.8

1      11188              -5.6        25.5           8761               109.5       215.0
2      10099              8.1         22.9           8893               -49.6       137.0
3      11401              8.5         27.3           13092              1.8         198.3
4      1599               3.7         30.3           13772              14.1        134.0
5      11819              -0.3        18.1           4505               -109.3      233.8
6      6694               2.9         18.7           3777               100.9       231.8

Table 6.2b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      24188              20.12       18.90          30135              177.54      142.98
2      28612              15.77       13.37          22665              103.51      93.02

1      28612              15.77       13.37          21374              171.48      141.41
2      22589              19.82       18.90          22665              103.51      93.02
3      1599               24.40       18.41          8761               192.32      145.69

1      22589              19.82       18.90          22665              103.51      93.02
2      16793              16.92       14.54          16869              166.88      130.26
3      1599               24.40       18.41          8761               192.32      145.69
4      11819              14.15       11.29          4505               188.70      175.99

1      16793              16.92       14.54          16869              166.88      130.26
2      11188              18.97       17.94          8761               192.32      145.69
3      11401              20.64       19.76          8893               104.65      101.38
4      1599               24.40       18.41          13772              102.78      87.19
5      11819              14.15       11.29          4505               188.70      175.99

1      11188              18.97       17.94          8761               192.32      145.69
2      10099              18.49       15.76          8893               104.65      101.38
3      11401              20.64       19.76          13092              154.38      124.43
4      1599               24.40       18.41          13772              102.78      87.19
5      11819              14.15       11.29          4505               188.70      175.99
6      6694               14.54       12.11          3777               210.20      140.38

Table 6.2c  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      21257              -0.7        28.0           31940              24.8        227.2
2      31543              4.9         20.5           20860              -8.0        126.9

1      31543              4.88        20.46          19509              -38.28      231.63
2      18926              -2.4        28.37          20860              -8.0        126.88
3      2331               13.0        20.2           12431              123.85      179.77

1      18926              -2.4        28.37          20860              -8.0        126.88
2      13602              -5.15       14.8           13096              -24.27      259.99
3      17941              12.49       20.85          12431              123.85      179.77
4      2331               13.0        20.2           6413               -66.89      154.8

1      13602              -5.15       14.8           13096              -24.27      259.99
2      17941              12.49       20.85          12431              123.85      179.77
3      10152              -4.0        23.6           9461               -33.25      94.29
4      8774               -0.55       32.9           11399              12.9        145.30
5      2331               13.0        20.2           6413               -66.89      154.8

1      17941              12.49       20.85          12431              123.85      179.77
2      10152              -4.0        23.6           9461               -33.25      94.29
3      8774               -0.55       32.92          5010               -179.86     267.35
4      6741               -7.1        14.66          11399              12.9        145.30
5      2331               13.0        20.2           6413               -66.89      154.8
6      6861               -3.2        14.75          8086               72.1        202.2

Table 6.2d  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      21257              20.53       19.08          31940              178.33      142.87
2      31543              15.90       13.78          20860              95.89       83.48

1      31543              15.90       13.78          19509              178.47      152.54
2      18926              20.52       19.74          20860              95.89       83.48
3      2331               20.61       12.39          12431              178.12      126.21

1      18926              20.52       19.74          20860              95.89       83.48
2      13602              12.54       9.46           13096              202.64      164.69
3      17941              18.44       15.84          12431              178.12      126.21
4      2331               20.61       12.39          6413               129.12      108.49

1      13602              12.54       9.46           13096              202.64      164.69
2      17941              18.44       15.84          12431              178.12      126.21
3      10152              18.19       15.63          9461               75.48       65.57
4      8774               23.22       23.34          11399              112.83      92.47
5      2331               20.61       12.39          6413               129.12      108.49

1      17941              18.44       15.84          12431              178.12      126.21
2      10152              18.19       15.63          9461               75.48       65.57
3      8774               23.22       23.34          5010               245.98      208.14
4      6741               13.21       9.56           11399              112.83      92.47
5      2331               20.61       12.39          6413               129.12      108.49
6      6861               11.88       9.32           8086               175.78      123.30

Table 6.3a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1      32005              5.08        27.42          39256              -1.37       164.99
2      20795              -1.16       16.61          13544              50.16       258.08

1      27447              5.04        25.86          13544              50.16       258.08
2      4558               5.30        35.42          23103              -11.02      139.41
3      20795              -1.16       16.61          16153              12.44       195.04

1      4558               5.30        35.42          23103              -11.02      139.41
2      20795              -1.16       16.61          16153              12.44       195.04
3      14486              5.77        27.35          5818               55.85       281.82
4      12961              4.23        24.07          7726               45.88       238.58

1      20795              -1.16       16.61          16153              12.44       195.04
2      14486              5.77        27.35          5818               55.85       281.82
3      680                -13.11      30.11          10239              -28.92      113.91
4      3878               8.53        35.30          12864              3.22        155.31
5      12961              4.23        24.07          7726               45.88       238.58

1      14486              5.77        27.35          5818               55.85       281.82
2      680                -13.11      30.11          10239              -28.92      113.91
3      3878               8.53        35.30          12864              3.22        155.31
4      12961              4.23        24.07          7726               45.88       238.58
5      12420              -2.57       14.36          8438               19.45       206.13
6      8375               0.93        19.30          7715               4.76        181.85

Table 6.3b  Summary statistics of the
absolute WSC2 D E M errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)  1  32005  20.70  18.70  39256  123.41  109.53  2  20795  13.25  10.09  13544  210.56  157.43  1  27447  19.71  17.48  13544  210.56  157.43  2  4558  26.62  23.96  23103  104.49  92.94  3  20795  13.25  10.09  16153  150.46  124.73  1  4558  26.62  23.96  23103  104.49  92.94  2  20795  13.25  10.09  16153  150.46  124.73  3  14486  20.81  18.65  5818  232.75  168.41  4  12961  18.49  15.98  7726  193.85  146.43  1  20795  13.25  10.09  16153  150.46  124.73  2  14486  20.81  18.65  5818  232.75  168.41  3  680  25.51  20.66  10239  84.62  81.55  4  3878  26.81  24.49  12864  120.31  98.26  5  12961  18.49  15.98  7726  193.85  146.43  1  14486  20.81  18.65  5818  232.75  168.41  2  680  25.51  20.66  10239  84.62  81.55  3  3878  26.81  24.49  12864  120.31  98.26  4  12961  18.49  15.98  7726  193.85  146.43  5  12420  11.85  8.50  8438  162.35  128.48  6  8375  15.32  11.76  7715  137.45  119.15  308  Table 6.3c  Summary statistics of WSC2 D E M errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)  1  29393  0.70  17.09  30265  20.63  234.96  2  23407  5.02  30.29  22535  0.05  119.39  1  23407  5.02  30.29  22535  0.05  119.39  2  17218  -2.29  13.98  23238  15.61  204.48  3  12175  4.94  19.95  7027  37.23  314.88  1  20555  5.30  27.30  23238  15.61  204.48  2  17218  -2.29  13.98  7027  37.23  314.88  3  2852  3.03  46.41  11407  -22.37  91.65  4  12175  4.94  19.95  11128  23.03  138.62  1  17218  -2.29  13.98  7027  37.23  314.88  2  2852  3.03  46.41  11407  -22.37  91.65  3  10574  5.30  28.98  14877  -0.23  197.76  4  12175  4.94  19.95  11128  23.03  138.62  5  9981  5.31  25.40  8361  43.80  213.04  1  2852  
3.03  46.41  11407  -22.37  91.65  2  10574  5.30  28.98  14877  -0.23  197.76  3  12175  4.94  19.95  11128  23.03  138.62  4  9481  -2.35  12.97  5413  24.30  304.59  5  9981  5.31  25.40  8361  43.80  213.04  6  7737  -2.23  15.12  1614  80.63  343.72  309  Table 6.3d  Summary statistics of the absolute WSC2 D E M errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)  1  29393  13.75  10.17  30265  185.43  145.75  2  23407  22.81  20.55  22535  92.48  75.51  1  23407  22.81  20.55  22535  92.48  75.51  2  17218  11.50  8.28  23238  162.72  124.82  3  12175  16.93  11.65  7027  260.56  180.65  1  20555  21.13  18.08  23238  162.72  124.82  2  17218  11.50  8.28  7027  260.56  180.65  3  2852  34.90  30.73  11407  71.87  61.10  4  12175  16.93  11.65  11128  113.61  82.69  1  17218  11.50  8.28  7027  260.56  180.65  2  2852  34.90  30.73  11407  71.87  61.10  3  10572  21.91  19.69  14877  158.07  118.83  4  12175  16.93  11.65  11128  113.61  82.69  5  9981  20.31  16.16  8361  170.98  134.42  1  2852  34.90  30.73  11407  71.87  61.10  2  10574  21.91  19.69  14877  158.07  118.83  3  12175  16.93  11.65  11128  113.61  82.69  4  9481  11.11  7.10  5413  248.87  177.25  5  9981  20.31  16.16  8361  170.98  134.42  6  7737  11.97  9.50  1614  299.76  186.38  310  21x21). Tables 6.4a-d list the summary results for variable std.  In order to examine the relation between the D E M errors and the "roughness", the above summary statistics of WSC2 D E M errors in each terrain cluster need to be related back to the roughness characteristics of each cluster as interpreted in section 5.3.3. Therefore, for each case (subarea: 93G or 93H, window size: 7x7 or 21x21, variable group: 3-5, 3-6, 2, or 5) the clusters at each hierarchical level are ranked from the one with the highest D E M error to the one with the lowest D E M error. 
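The per-cluster summaries tabulated above, and the ranking step just described, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure actually used in this study: the function names `summarize_by_cluster` and `rank_clusters` are hypothetical, the sample standard deviation is assumed, and the short error list is synthetic.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_cluster(errors, labels):
    """Group elevation differences by cluster label and summarize each group."""
    groups = defaultdict(list)
    for err, cluster in zip(errors, labels):
        groups[cluster].append(err)
    summary = {}
    for cluster, vals in groups.items():
        summary[cluster] = {
            "n": len(vals),
            "mean": mean(vals),
            # Assumption: sample standard deviation (n - 1 denominator).
            "std": stdev(vals) if len(vals) > 1 else 0.0,
            "mean_abs": mean([abs(v) for v in vals]),
        }
    return summary


def rank_clusters(summary, measure):
    """Return cluster labels ordered from the largest to the smallest measure."""
    return sorted(summary, key=lambda c: summary[c][measure], reverse=True)


# Synthetic elevation differences (m) and cluster labels, for illustration only.
errors = [-30.0, 30.0, -10.0, 10.0, -2.0, 2.0, 0.0, 1.0]
labels = [1, 1, 2, 2, 3, 3, 3, 3]

summary = summarize_by_cluster(errors, labels)
std_ranking = rank_clusters(summary, "std")
abs_ranking = rank_clusters(summary, "mean_abs")
print(std_ranking, abs_ranking)
```

In this synthetic case both measures order the clusters identically, which is the situation expected when positive and negative differences are roughly balanced within each cluster.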
Based on the observations made in section 6.2.1 when examining the scattergrams between WSC2 DEM errors and each of the seven local variables (e.g., the roughly symmetrical distribution of positive and negative elevation differences for most variables), the standard deviation of WSC2 DEM errors in each cluster can be taken as a measure of the overall amount of error in that cluster. For the absolute WSC2 DEM errors in each cluster, the mean absolute error serves as that measure. As shown in Figure 6.8a, for example, cluster #1 in subarea 93G is the roughest, as identified in section 5.3.3, and it corresponds to the largest standard deviation of elevation differences (i.e., the most spread-out histogram of WSC2 DEM errors). Cluster #3 represents the least rough class in the subarea (as seen in section 5.3.3) and it corresponds to the histogram with the smallest standard deviation. The same is true for subarea 93H. That is, the ranking of the three clusters based on WSC2 DEM errors appears to be consistent with the ranking of the clusters by "roughness."

Table 6.4a  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  25255  5.2  28.2  23109  38.6  233.9
2  27545  0.3  19.0  29691  -9.0  154.0

1  7953  6.9  31.7  9386  52.5  259.9
2  27545  0.3  19.0  29691  -9.0  154.0
3  17302  4.4  26.4  13723  29.1  213.8

1  27545  0.3  19.0  29691  -9.0  154.0
2  17302  4.4  26.4  2681  35.8  291.3
3  1782  0.8  33.4  13723  29.1  213.8
4  6171  8.7  30.9  6705  59.1  245.9

1  17302  4.4  26.4  2681  35.8  291.3
2  1782  0.8  33.4  13723  29.1  213.8
3  6171  8.7  30.9  6705  59.1  245.9
4  14745  2.7  21.2  17792  3.5  169.9
5  12800  -2.5  15.6  11899  -27.6  124.2

1  1782  0.8  33.4  13723  29.1  213.8
2  6171  8.7  30.9  6705  59.1  245.9
3  14745  2.7  21.2  17792  3.5  169.9
4  10696  3.0  25.0  11899  -27.6  124.2
5  12800  -2.5  15.6  2238  44.8  288.7
6  6606  6.6  28.5  443  -9.6  300.7

Table 6.4b  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  25255  21.23  19.26  23109  186.57  146.25
2  27545  14.59  12.12  29691  114.00  103.87

1  7953  23.73  22.07  9386  213.09  157.74
2  27545  14.59  12.12  29691  114.00  103.87
3  17302  20.08  17.70  13723  168.43  134.88

1  27545  14.59  12.12  29691  114.00  103.87
2  17302  20.08  17.70  2681  244.54  162.29
3  1782  25.27  21.87  13723  168.43  134.88
4  6171  23.28  22.11  6705  200.52  154.12

1  17302  20.08  17.70  2681  244.54  162.29
2  1782  25.27  21.87  13723  168.43  134.88
3  6171  23.28  22.11  6705  200.52  154.12
4  14745  16.23  13.83  17792  128.26  111.44
5  12800  12.69  9.45  11899  92.67  87.12

1  1782  25.27  21.87  13723  168.43  134.88
2  6171  23.28  22.11  6705  200.52  154.12
3  14745  16.23  13.83  17792  128.26  111.44
4  10696  19.27  16.16  11899  92.67  87.12
5  12800  12.69  9.45  2238  243.16  161.81
6  6606  21.39  19.89  443  251.54  164.70

Table 6.4c  Summary statistics of WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  36025  2.0  19.6  37778  10.7  157.6
2  16775  3.9  31.3  15022  14.6  265.5

1  16775  3.9  31.3  15022  14.6  265.5
2  22479  4.0  22.2  22374  3.3  127.9
3  13546  -1.2  13.8  15404  21.5  192.3

1  22479  4.0  22.2  22374  3.3  127.9
2  6296  4.7  38.5  5711  -14.6  296.4
3  10479  3.5  26.0  15404  21.5  192.3
4  13546  -1.2  13.8  9311  32.5  243.0

1  6296  4.7  38.5  5711  -14.6  296.4
2  10479  3.5  26.0  15404  21.5  192.3
3  13546  -1.2  13.8  12237  -13.6  106.0
4  11958  4.9  25.6  9311  32.5  243.0
5  10521  3.0  17.5  10137  23.8  147.6

1  10479  3.5  26.0  15404  21.5  192.3
2  13546  -1.2  13.8  12237  -13.6  106.0
3  5057  3.2  34.8  9311  32.5  243.0
4  11958  4.9  25.6  4468  6.7  296.5
5  10521  3.0  17.5  10137  23.8  147.6
6  1239  10.5  50.3  1243  -91.0  283.3

Table 6.4d  Summary statistics of the absolute WSC2 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  36025  15.19  12.58  37778  120.67  101.94
2  16775  23.30  21.21  15022  208.87  164.57

1  16775  23.30  21.21  15022  208.87  164.57
2  22479  17.57  14.10  22374  98.89  81.15
3  13546  11.22  8.09  15404  152.31  119.30

1  22479  17.57  14.10  22374  98.89  81.15
2  6296  28.64  26.13  5711  236.96  178.59
3  10479  20.09  16.81  15404  152.31  119.30
4  13546  11.22  8.09  9311  191.63  152.82

1  6296  28.64  26.13  5711  236.96  178.59
2  10479  20.09  16.81  15404  152.31  119.30
3  13546  11.22  8.09  12237  82.31  68.12
4  11958  20.21  16.38  9311  191.63  152.82
5  10521  14.57  10.15  10137  118.90  90.56

1  10479  20.09  16.81  15404  152.31  119.30
2  13546  11.22  8.09  12237  82.31  68.12
3  5057  26.53  22.81  9311  191.63  152.82
4  11958  20.21  16.38  4468  235.24  180.52
5  10521  14.57  10.15  10137  118.90  90.56
6  1239  37.25  35.44  1243  243.16  171.38

Based on the information presented in Tables 6.1a-d to 6.4a-d, the ranking of the clusters according to the standard deviation-based WSC2 DEM errors (first line) and the absolute WSC2 DEM errors (second line) can be summarized as follows for various cases:

1)  For subarea 93G

variable group: 3-5, window size: 7x7
{2,1} {1,2,3} {2,4,1,3} {2,4,1,3,5} {1,4,5,3,2,6}
{2,1} {1,2,3} {4,2,1,3} {4,2,1,3,5} {4,1,3,5,2,6}
variable group: 3-5, window size: 21x21
{1,2} {1,3,2} {1,2,4,3} {5,4,1,3,2} {5,3,4,2,1,6}
{1,2} {3,1,2} {1,4,2,3} {5,4,3,1,2} {5,3,2,4,1,6}
variable group: 3-6, window size: 7x7
{1,2} {3,2,1} {3,1,2,4} {4,3,2,1,5} {4,3,1,2,6,5}
{1,2} {3,2,1} {3,1,2,4} {4,3,2,1,5} {4,3,1,2,6,5}
variable group: 3-6, window size: 21x21
{1,2} {2,1,3} {1,3,4,2} {4,3,2,5,1} {3,2,1,5,6,4}
{1,2} {3,2,1} {4,1,3,2} {4,5,2,3,1} {3,5,1,2,4,6}
variable group: 2, window size: 7x7
{1,2} {2,1,3} {1,3,4,2} {4,3,2,5,1} {3,2,1,4,6,5}
{1,2} {2,1,3} {1,3,4,2} {4,3,2,5,1} {3,2,1,4,6,5}
variable group: 2, window size: 21x21
{2,1} {1,3,2} {3,1,4,2} {2,3,5,4,1} {1,2,5,3,6,4}
{2,1} {1,3,2} {3,1,4,2} {2,3,5,4,1} {1,2,5,3,6,4}
variable group: 5, window size: 7x7
{1,2} {1,3,2} {3,4,2,1} {2,3,1,4,5} {1,2,6,4,3,5}
{1,2} {1,3,2} {3,4,2,1} {2,3,1,4,5} {1,2,6,4,3,5}
variable group: 5, window size: 21x21
{2,1} {1,2,3} {2,3,1,4} {1,2,4,5,3} {6,3,1,4,5,2}
{2,1} {1,2,3} {2,3,1,4} {1,4,2,5,3} {6,3,4,1,5,2}

2)  For subarea 93H

variable group: 3-5, window size: 7x7
{1,2} {1,2,3} {4,1,2,3} {4,5,3,1,2} {3,6,5,2,4,1}
{1,2} {1,2,3} {4,1,2,3} {4,5,3,1,2} {3,6,5,2,4,1}
variable group: 3-5, window size: 21x21
{1,2} {3,2,1} {2,1,3,4} {1,4,3,2,5} {2,4,3,5,1,6}
{1,2} {2,3,1} {1,2,3,4} {4,1,3,2,5} {2,4,3,5,1,6}
variable group: 3-6, window size: 7x7
{1,2} {1,3,2} {4,3,2,1} {5,2,1,3,4} {5,6,1,3,2,4}
{1,2} {3,1,2} {3,4,2,1} {2,5,1,3,4} {6,1,5,3,2,4}
variable group: 3-6, window size: 21x21
{1,2} {1,3,2} {2,3,4,1} {1,2,5,4,3} {3,6,1,5,4,2}
{1,2} {1,3,2} {2,3,4,1} {1,2,5,4,3} {3,1,6,5,4,2}
variable group: 2, window size: 7x7
{2,1} {1,3,2} {3,4,2,1} {2,5,1,4,3} {1,4,5,6,3,2}
{2,1} {1,3,2} {3,4,2,1} {2,5,1,4,3} {1,4,5,6,3,2}
variable group: 2, window size: 21x21
{1,2} {3,2,1} {2,1,4,3} {1,5,3,4,2} {6,4,5,2,3,1}
{1,2} {3,2,1} {2,1,4,3} {1,5,3,4,2} {6,4,5,2,3,1}
variable group: 5, window size: 7x7
{1,2} {1,3,2} {2,4,3,1} {1,3,2,4,5} {6,5,2,1,3,4}
{1,2} {1,3,2} {2,4,3,1} {1,3,2,4,5} {6,5,2,1,3,4}
variable group: 5, window size: 21x21
{2,1} {1,3,2} {2,4,3,1} {1,4,2,5,3} {4,6,3,1,5,2}
{2,1} {1,3,2} {2,4,3,1} {1,4,2,5,3} {6,4,3,1,5,2}

By comparing the above rankings of the clusters based on WSC2 DEM errors with the "roughness" rankings based on the local variables presented in section 5.3.3, several observations can be made.

First, a fairly consistent relation exists between DEM errors and the roughness of the terrain clusters for all the cases tested. That is, the general pattern seems to be that the rougher the cluster, the larger the DEM error (measured with either the standard deviation of the elevation differences or the mean of the absolute elevation differences in each cluster). For example, with variable group (3-5) and window size 7x7 in subarea 93G, cluster #1 at level 3 is the roughest and it has the largest standard deviation of WSC2 DEM errors and the highest mean of the absolute WSC2 DEM errors. Cluster #2 is the second roughest and it has the second largest standard deviation of WSC2 DEM errors and the second highest mean of the absolute WSC2 DEM errors. Cluster #3 is the least rough one and it corresponds to the cluster with the smallest amount of WSC2 DEM errors.
With variable group (3-6) and window size 7x7 in subarea 93G, cluster #3 at level 3 is the roughest and it has the largest standard deviation of WSC2 DEM errors and the highest mean of the absolute WSC2 DEM errors. Cluster #2 is the second roughest and it has the second largest standard deviation of WSC2 DEM errors and the second highest mean of the absolute WSC2 DEM errors. Cluster #1 is the least rough one and it corresponds to the cluster with the smallest amount of WSC2 DEM errors.

Second, the rankings of the clusters based on the original WSC2 DEM errors (measured with the standard deviation of the elevation differences) and on the absolute WSC2 DEM errors (measured with the mean of the absolute elevation differences) are fairly consistent (at least at the first four levels) for most cases, especially with variable group (2) and group (5). This is because of the relatively symmetrical distribution of the positive and negative elevation differences. However, with variable groups (3-5) and (3-6), the rankings of the clusters based on the two different measures in some cases (e.g., window size 21x21 for subarea 93G) show consistency only at the first two levels. This can probably be attributed to the inclusion of the variables curv, hypint and/or highpt in the classification of terrain clusters and the characteristics of these variables. As indicated in section 6.2.1, for example, there seem to be more DEM points with positive elevation differences when the variable highpt is low, but more points with negative elevation differences when the highpt value is high. When there is an asymmetrical distribution of positive and negative elevation differences, the rankings of the clusters based on the original WSC2 DEM errors and on the absolute WSC2 DEM errors will likely show some inconsistency. It is, thus, desirable to examine both the original DEM errors and the absolute DEM errors.
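The effect of an asymmetrical error distribution on the two rankings can be illustrated with values already tabulated. The sketch below takes the three-cluster rows for subarea 93G from Tables 6.2c and 6.2d (variable group 3-6, window size 21x21) and ranks the clusters by each measure. The helper `rank_desc` and the ratio |mean| / std.dev are hypothetical, introduced here only as a rough indicator of asymmetry; they are not measures used in this study.

```python
# Three-cluster rows for subarea 93G (variable group 3-6, window size 21x21).
std_dev = {1: 20.46, 2: 28.37, 3: 20.2}    # Std.dev (93G), Table 6.2c
mean_err = {1: 4.88, 2: -2.4, 3: 13.0}     # Mean (93G), Table 6.2c
mean_abs = {1: 15.90, 2: 20.52, 3: 20.61}  # Mean (93G), Table 6.2d


def rank_desc(values):
    """Return cluster labels ordered from the largest to the smallest value."""
    return sorted(values, key=values.get, reverse=True)


rank_std = rank_desc(std_dev)
rank_abs = rank_desc(mean_abs)
consistent = rank_std == rank_abs

# Hypothetical asymmetry indicator: size of the mean error relative to the spread.
bias_ratio = {c: abs(mean_err[c]) / std_dev[c] for c in std_dev}
most_biased = max(bias_ratio, key=bias_ratio.get)

print(rank_std, rank_abs, consistent, most_biased)
```

The two orderings differ, and the cluster that moves to the top under the mean absolute error is the one whose mean error is largest relative to its spread, which is in line with the explanation given above.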
Finally, it should be noted that the differences between DEM errors in some clusters become insignificant at certain hierarchy levels for various cases. For example, the standard deviations of WSC2 DEM errors in clusters #2 and #4 at level 4 for subarea 93G with variable group 3-5 and window size 7x7 (see Table 6.1a) are only slightly different (i.e., 31.2 m in #2 versus 31.1 m in #4). The ranking of the clusters in this case then becomes meaningless. The significance test of DEM error variation will be carried out later in section 6.2.3.

6.2.2.2  EMR1 DEM errors as compared to TRIM DEM

In order to find out if the above test results can be generalized for DEMs with different resolutions, the evaluation of EMR1 DEM errors (derived by comparing EMR1 with TRIM DEMs; see Figures 3.18 and 3.19) within each terrain cluster was also conducted for the two subareas (93G and 93H) in different cases (i.e., window size 7x7 or 21x21, variable group 3-5, 3-6, 2 or 5 only). With variable group (3-5), the summary statistics of EMR1 DEM errors and the absolute EMR1 DEM errors in each cluster are listed in Tables 6.5a-b for window size (7x7) and in Tables 6.5c-d for window size (21x21). With variable group (3-6), the results are summarized in Tables 6.6a-d. Tables 6.7a-d and Tables 6.8a-d show the summary statistics of EMR1 DEM errors and the absolute EMR1 DEM errors for the case of using only a single variable, slope (i.e., variable #2) or std (i.e., variable #5), in the classification. The rankings of the clusters based on EMR1 DEM errors and the absolute EMR1 DEM errors for various cases are not presented, but similar observations can be made as in the above section for WSC2 DEM errors.
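Since the EMR1 rankings are not listed, a reader deriving them from the tables may wish to flag near-ties of the kind noted earlier for clusters #2 and #4 (31.2 m versus 31.1 m). The sketch below ranks the six-cluster rows for subarea 93G in Table 6.5a by standard deviation and reports pairs that differ by less than a tolerance. The function `near_ties` and the 0.5 m tolerance are assumptions chosen for illustration; this is a screening aid only and does not replace the significance test of section 6.2.3.

```python
from itertools import combinations


def near_ties(values, tol):
    """Return pairs of cluster labels whose values differ by at most tol."""
    return [
        (a, b)
        for a, b in combinations(sorted(values), 2)
        if abs(values[a] - values[b]) <= tol
    ]


# Std.dev (93G) of EMR1 DEM errors, six-cluster rows of Table 6.5a.
std_93g = {1: 22.80, 2: 10.76, 3: 15.42, 4: 26.21, 5: 15.46, 6: 10.89}

ranking = sorted(std_93g, key=std_93g.get, reverse=True)
ties = near_ties(std_93g, tol=0.5)  # 0.5 m is an arbitrary, assumed tolerance

print(ranking)
print(ties)
```

Pairs reported by `near_ties` occupy adjacent positions in the ranking, so their relative order should be read with caution.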
Table 6.5a  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  42616  1.19  13.55  33340  38.73  143.12
2  10184  0.38  23.39  19460  2.21  83.29

1  10184  0.38  23.39  24142  69.42  148.80
2  24665  4.16  14.56  9198  -41.83  84.68
3  17951  -2.89  10.76  19460  2.21  83.29

1  24665  4.16  14.56  20025  50.64  143.72
2  8563  0.07  22.80  9198  -41.83  84.68
3  17951  -2.89  10.76  19460  2.21  83.29
4  1621  2.04  26.21  4117  160.74  139.00

1  17711  5.26  15.64  9198  -41.83  84.68
2  8563  0.07  22.80  19460  2.21  83.29
3  17951  -2.89  10.76  13670  25.17  130.27
4  1621  2.04  26.21  6355  105.44  155.51
5  6954  1.38  10.89  4117  160.74  139.00

1  8563  0.07  22.80  19460  2.21  83.29
2  17951  -2.89  10.76  13670  25.17  130.27
3  6654  8.44  15.42  6355  105.44  155.51
4  1621  2.04  26.21  6219  -43.72  79.91
5  11057  3.34  15.46  4117  160.74  139.00
6  6954  1.38  10.89  2979  -37.88  93.76

Table 6.5b  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  42616  10.23  8.98  33340  112.03  97.13
2  10184  16.93  16.15  19460  61.10  56.66

1  10184  16.93  16.15  24142  127.08  103.98
2  24665  11.33  10.07  9198  72.54  60.50
3  17951  8.71  6.96  19460  61.10  56.66

1  24665  11.33  10.07  20025  117.97  96.46
2  8563  16.06  16.19  9198  72.54  60.50
3  17951  8.71  6.96  19460  61.10  56.66
4  1621  21.48  15.16  4117  171.39  125.65

1  17711  12.43  10.86  9198  72.54  60.50
2  8563  16.06  16.19  19460  61.10  56.66
3  17951  8.71  6.96  13670  104.15  82.20
4  1621  21.48  15.16  6355  147.68  116.15
5  6954  8.53  6.92  4117  171.39  125.65

1  8563  16.06  16.19  19460  61.10  56.66
2  17951  8.71  6.96  13670  104.15  82.20
3  6654  13.67  11.06  6355  147.68  116.15
4  1621  21.48  15.16  6219  69.05  59.41
5  11057  11.68  10.67  4117  171.39  125.65
6  6954  8.53  6.92  2979  79.81  62.09

Table 6.5c  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  32226  2.31  18.56  30528  51.20  148.59
2  20574  -0.95  10.28  22272  -10.27  70.80

1  29720  1.65  18.19  22272  -10.27  70.80
2  20574  -0.95  10.28  17471  78.67  155.04
3  2506  10.06  20.92  13057  14.45  130.75

1  12681  -1.86  21.76  17471  78.67  155.04
2  17039  4.27  14.45  13057  14.45  130.75
3  20574  -0.95  10.28  17147  -3.43  72.50
4  2506  10.06  20.92  5125  -33.18  59.31

1  17039  4.27  14.45  13057  14.45  130.75
2  20574  -0.95  10.28  17147  -3.43  72.50
3  2506  10.06  20.92  12113  54.05  149.58
4  9555  -5.06  14.00  5358  134.32  152.77
5  3126  7.93  34.58  5125  -33.18  59.31

1  20574  -0.95  10.28  17147  -3.43  72.50
2  2506  10.06  20.92  6245  20.91  145.53
3  9555  -5.06  14.00  12113  54.05  149.58
4  13138  3.58  15.39  5358  134.32  152.77
5  3126  7.93  34.58  6812  8.53  115.26
6  3901  6.61  10.32  5125  -33.18  59.31

Table 6.5d  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  32226  13.68  12.76  30528  121.01  100.29
2  20574  8.14  6.36  22272  55.23  45.48

1  29720  13.24  12.60  22272  55.23  45.48
2  20574  8.14  6.36  17471  137.32  106.64
3  2506  18.91  13.48  13057  99.18  86.43

1  12681  15.27  15.63  17471  137.32  106.64
2  17039  11.72  9.47  13057  99.18  86.43
3  20574  8.14  6.36  17147  56.55  45.50
4  2506  18.91  13.48  5125  50.80  45.15

1  17039  11.72  9.47  13057  99.18  86.43
2  20574  8.14  6.36  17147  56.55  45.50
3  2506  18.91  13.48  12113  127.83  94.63
4  9555  11.59  9.34  5358  158.77  127.17
5  3126  26.49  23.61  5125  50.80  45.15

1  20574  8.14  6.36  17147  56.55  45.50
2  2506  18.91  13.48  6245  111.42  95.93
3  9555  11.59  9.34  12113  127.83  94.63
4  13138  12.34  9.88  5358  158.77  127.17
5  3126  26.49  23.61  6812  87.96  74.96
6  3901  9.64  7.58  5125  50.80  45.15

Table 6.6a  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  24188  -0.38  18.90  30135  53.05  144.69
2  28612  2.24  12.78  22665  -11.67  81.15

1  28612  2.24  12.78  21374  17.20  129.02
2  22589  -0.71  18.28  22665  -11.67  81.15
3  1599  4.24  25.72  8761  140.54  143.56

1  22589  -0.71  18.28  22665  -11.67  81.15
2  16793  4.65  13.65  16869  26.15  133.15
3  1599  4.24  25.72  8761  140.54  143.56
4  11819  -1.17  10.51  4505  -16.34  105.69

1  16793  4.65  13.65  16869  26.15  133.15
2  11188  -7.29  14.07  8761  140.54  143.56
3  11401  5.73  19.60  8893  -40.53  74.33
4  1599  4.24  25.72  13772  6.96  79.90
5  11819  -1.17  10.51  4505  -16.34  105.69

1  11188  -7.29  14.07  8761  140.54  143.56
2  10099  6.66  14.46  8893  -40.53  74.33
3  11401  5.73  19.60  13092  -0.74  117.85
4  1599  4.24  25.72  13772  6.96  79.90
5  11819  -1.17  10.51  4505  -16.34  105.69
6  6694  1.61  11.70  3777  119.39  140.86

Table 6.6b  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 3-6) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  24188  13.67  13.07  30135  117.56  99.65
2  28612  9.70  8.62  22665  60.95  54.83

1  28612  9.70  8.62  21374  101.64  81.31
2  22589  13.12  12.76  22665  60.95  54.83
3  1599  21.49  14.76  8761  156.41  126.10

1  22589  13.12  12.76  22665  60.95  54.83
2  16793  10.71  9.67  16869  106.12  84.57
3  1599  21.49  14.76  8761  156.41  126.10
4  11819  8.27  6.60  4505  84.87  65.07

1  16793  10.71  9.67  16869  106.12  84.57
2  11188  12.16  10.16  8761  156.41  126.10
3  11401  14.05  14.82  8893  64.09  55.34
4  1599  21.49  14.76  13772  58.93  54.41
5  11819  8.27  6.60  4505  84.87  65.07

1  11188  12.16  10.16  8761  156.41  126.10
2  10099  11.80  10.69  8893  64.09  55.34
3  11401  14.05  14.82  13092  94.21  70.81
4  1599  21.49  14.76  13772  58.93  54.41
5  11819  8.27  6.60  4505  84.87  65.07
6  6694  9.06  7.59  3777  147.41  111.21

Table 6.6c  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  21257  -1.89  19.09  31940  57.63  142.95
2  31543  3.01  13.02  20860  -24.28  68.17

1  31543  3.01  13.02  19509  14.62  132.25
2  18926  -3.32  18.32  20860  -24.2  68.17
3  2331  9.71  21.14  12431  125.14  132.64

1  18926  -3.32  18.32  20860  -24.2  68.17
2  13602  -4.45  8.52  13096  36.71  145.60
3  17941  8.67  13.00  12431  125.14  132.64
4  2331  9.71  21.14  6413  -30.49  82.98

1  13602  -4.45  8.52  13096  36.71  145.60
2  17941  8.67  13.00  12431  125.14  132.64
3  10152  -6.46  12.79  9461  -32.00  51.41
4  8774  0.31  22.59  11399  -17.87  78.87
5  2331  9.71  21.14  6413  -30.49  82.98

1  17941  8.67  13.00  12431  125.14  132.64
2  10152  -6.46  12.79  9461  -32.00  51.41
3  8774  0.31  22.59  5010  -6.15  117.46
4  6741  -4.40  7.90  11399  -17.87  78.87
5  2331  9.71  21.14  6413  -30.49  82.98
6  6861  -4.50  9.09  8086  63.27  154.74

Table 6.6d  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 3-6) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  21257  13.95  13.18  31940  117.04  100.30
2  31543  9.88  9.01  20860  56.86  44.78

1  31543  9.88  9.01  19509  102.28  85.11
2  18926  13.32  13.02  20860  56.86  44.78
3  2331  19.02  13.41  12431  140.19  116.63

1  18926  13.32  13.02  20860  56.86  44.78
2  13602  7.77  5.68  13096  119.10  91.46
3  17941  11.48  10.60  12431  140.19  116.63
4  2331  19.02  13.41  6413  67.94  56.56

1  13602  7.77  5.68  13096  119.10  91.46
2  17941  11.48  10.60  12431  140.19  116.63
3  10152  11.29  8.83  9461  45.60  39.87
4  8774  15.67  16.28  11399  66.20  46.46
5  2331  19.02  13.41  6413  67.94  56.56

1  17941  11.48  10.60  12431  140.19  116.63
2  10152  11.29  8.83  9461  45.60  39.87
3  8774  15.67  16.28  5010  92.78  72.30
4  6741  7.48  5.09  11399  66.20  46.46
5  2331  19.02  13.41  6413  67.94  56.56
6  6861  8.05  6.19  8086  135.40  98.05

Table 6.7a  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  32005  2.11  18.47  39256  -1.39  100.09
2  20795  -0.61  10.73  13544  102.56  156.63

1  27447  1.93  16.36  13544  102.56  156.63
2  4558  3.25  27.97  23103  -4.71  82.45
3  20795  -0.61  10.73  16153  3.34  120.79

1  4558  3.25  27.97  23103  -4.71  82.45
2  20795  -0.61  10.73  16153  3.34  120.79
3  14486  3.07  18.42  5818  145.09  167.01
4  12961  0.65  13.60  7726  70.53  140.06

1  20795  -0.61  10.73  16153  3.34  120.79
2  14486  3.07  18.42  5818  145.09  167.01
3  680  -6.31  26.85  10239  -20.22  57.63
4  3878  4.93  27.83  12864  7.65  96.02
5  12961  0.65  13.60  7726  70.53  140.06

1  14486  3.07  18.42  5818  145.09  167.01
2  680  -6.31  26.85  10239  -20.22  57.63
3  3878  4.93  27.83  12864  7.65  96.02
4  12961  0.65  13.60  7726  70.53  140.06
5  12420  -1.56  9.59  8438  11.87  127.30
6  8375  0.79  12.09  7715  -5.98  112.51

Table 6.7b  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 2) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  32005  13.56  12.72  39256  74.51  66.85
2  20795  8.37  6.75  13544  147.59  115.17

1  27447  12.27  11.00  13544  147.59  115.17
2  4558  21.33  18.38  23103  60.09  56.64
3  20795  8.37  6.75  16153  95.14  74.48

1  4558  21.33  18.38  23103  60.09  56.64
2  20795  8.37  6.75  16153  95.14  74.48
3  14486  13.76  12.62  5818  178.35  130.89
4  12961  10.61  8.53  7726  124.43  95.42

1  20795  8.37  6.75  16153  95.14  74.48
2  14486  13.76  12.62  5818  178.35  130.89
3  680  21.73  16.97  10239  44.27  42.08
4  3878  21.25  18.62  12864  72.68  63.21
5  12961  10.61  8.53  7726  124.43  95.42

1  14486  13.76  12.62  5818  178.35  130.89
2  680  21.73  16.97  10239  44.27  42.08
3  3878  21.25  18.62  12864  72.68  63.21
4  12961  10.61  8.53  7726  124.43  95.42
5  12420  7.75  5.86  8438  101.23  78.09
6  8375  9.29  7.78  7715  88.49  69.72

Table 6.7c  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  29393  0.14  10.95  30265  52.47  149.65
2  23407  2.17  20.50  22535  -11.26  67.96

1  23407  2.17  20.50  22535  -11.26  67.96
2  17218  -0.76  8.61  23238  25.09  129.58
3  12175  1.42  13.48  7027  143.01  173.95

1  20555  1.58  16.73  23238  25.09  129.58
2  17218  -0.76  8.61  7027  143.01  173.95
3  2852  6.41  37.59  11407  -17.01  51.38
4  12175  1.42  13.48  11128  -5.36  81.11

1  17218  -0.76  8.61  7027  143.01  173.95
2  2852  6.41  37.59  11407  -17.01  51.38
3  10574  1.61  17.67  14877  6.98  113.72
4  12175  1.42  13.48  11128  -5.36  81.11
5  9981  1.55  15.66  8361  57.31  148.45

1  2852  6.41  37.59  11407  -17.01  51.38
2  10574  1.61  17.67  14877  6.98  113.72
3  12175  1.42  13.48  11128  -5.36  81.11
4  9481  -0.91  8.06  5413  120.58  164.38
5  9981  1.55  15.66  8361  57.31  148.45
6  7737  -0.57  9.24  1614  218.26  183.83

Table 6.7d  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 2) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  29393  8.61  6.75  30265  122.80  100.35
2  23407  15.16  13.97  22535  53.59  43.28

1  23407  15.16  13.97  22535  53.59  43.28
2  17218  7.10  4.93  23238  105.18  79.73
3  12175  10.76  8.24  7027  181.05  133.91

1  20555  13.18  10.42  23238  105.18  79.73
2  17218  7.10  4.93  7027  181.05  133.91
3  2852  29.45  24.21  11407  41.54  34.70
4  12175  10.76  8.24  11128  65.95  47.51

1  17218  7.10  4.93  7027  181.05  133.91
2  2852  29.45  24.21  11407  41.54  34.70
3  10574  13.87  11.06  14877  92.17  66.96
4  12175  10.76  8.24  11128  65.95  47.51
5  9981  12.44  9.64  8361  128.33  94.09

1  2852  29.45  24.21  11407  41.54  34.70
2  10574  13.87  11.06  14877  92.17  66.96
3  12175  10.76  8.24  11128  65.95  47.51
4  9481  6.96  4.16  5413  164.15  120.88
5  9981  12.44  9.64  8361  128.33  94.09
6  7737  7.26  5.73  1614  237.72  157.84

Table 6.8a  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  25255  2.7  19.4  23109  59.5  151.1
2  27545  -0.5  11.7  29691  -1.4  93.3

1  7953  5.3  24.9  9386  105.9  156.9
2  27545  -0.5  11.7  29691  -1.4  93.3
3  17302  1.5  16.2  13723  27.7  138.3

1  27545  -0.5  11.7  29691  -1.4  93.3
2  17302  1.5  16.2  2681  136.6  155.5
3  1782  4.4  30.7  13723  27.7  138.3
4  6171  5.6  22.9  6705  93.7  155.9

1  17302  1.5  16.2  2681  136.6  155.5
2  1782  4.4  30.7  13723  27.7  138.3
3  6171  5.6  22.9  6705  93.7  155.9
4  14745  0.4  12.8  17792  9.4  104.8
5  12800  -1.4  10.2  11899  -17.5  69.6

1  1782  4.4  30.7  13723  27.7  138.3
2  6171  5.6  22.9  6705  93.7  155.9
3  14745  0.4  12.8  17792  9.4  104.8
4  10696  0.8  14.8  11899  -17.5  69.6
5  12800  -1.4  10.2  2238  136.0  155.2
6  6606  2.6  18.1  443  139.7  157.1

Table 6.8b  Summary statistics of the absolute EMR1 DEM errors in each terrain cluster (window size: 7x7, variable group: 5) for the two subareas

Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)
1  25255  14.20  13.52  23109  125.79  102.76
2  27545  9.06  7.38  29691  67.94  63.93

1  7953  18.34  17.64  9386  148.84  117.14
2  27545  9.06  7.38  29691  67.94  63.93
3  17302  12.29  10.60  13723  110.03  88.24

1  27545  9.06  7.38  29691  67.94  63.93
2  17302  12.29  10.60  2681  166.39  123.09
3  1782  23.63  20.10  13723  110.03  88.24
4  6171  16.82  16.55  6705  141.83  113.93

1  17302  12.29  10.60  2681  166.39  123.09
2  1782  23.63  20.10  13723  110.03  88.24
3  6171  16.82  16.55  6705  141.83  113.93
4  14745  9.84  8.16  17792  79.40  69.06
5  12800  8.16  6.25  11899  50.80  50.74

1  1782  23.63  20.10  13723  110.03  88.24
2  6171  16.82  16.55  6705  141.83  113.93
3  14745  9.84  8.16  17792  79.40  69.06
4  10696  11.49  9.41  11899  50.80  50.74
5  12800  8.16  6.25  2238  165.71  122.95
6  6606  13.59  12.17  443  169.80  123.88

Table 6.8c  Summary statistics of EMR1 DEM errors in each terrain cluster (window size: 21x21, variable group: 5) for the
two subareas  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  # of points (93H)  Mean (93H)  Std.dev (93H)  1  36025  0.16  11.8  37778  8.7  102.6  2  16775  2.9  22.2  15022  67.0  163.2  1  16775  2.9  22.2  15022  67.0  163.2  2  22479  0.45  13.19  22374  -2.2  76.7  3  13546  -0.3  9.1  15404  24.5  129.8  1  22479  0.45  13.19  22374  -2.2  76.7  2  6296  4.37  28.47  5711  82.3  165.2  3  10479  2.0  17.35  15404  24.5  129.8  4  13546  -0.3  9.1  9311  57.6  161.3  1  6296  4.37  28.47  5711  82.3  165.2  2  10479  2.0  17.35  15404  24.5  129.8  3  13546  -0.3  9.1  12237  -12.1  62.3  4  11958  0.15  13.6  9311  57.6  161.3  5  10521  0.8  12.66  10137  9.7  89.6  1  10479  2.0  17.4  15404  24.5  129.8  2  13546  -0.3  9.1  12237  -12.1  62.3  3  5057  1.0  23.3  9311  57.6  161.3  4  11958  0.2  13.6  4468  81.3  170.8  5  10521  0.8  12.7  10137  9.7  89.6  6  1239  18.1  40.96  1243  86.1  143.2  335  Table 6.8d  Summary statistics of the absolute EMR1 D E M errors in each terrain cluster (window size: 21x21, variable group: 5) for the two subareas  Class  # of points (93G)  Mean (93G)  Std.dev (93G)  #of points (93H)  Mean (93H)  Std.dev (93H)  1  36025  9.21  7.44  37778  75.77  69.70  2  16775  16.48  15.18  15022  137.26  110.88  1  16775  16.48  15.18  15022  137.26  110.88  2  22479  10.28  8.29  22374  57.47  50.86  3  13546  7.43  5.29  15404  102.34  83.45  1  22479  10.28  8.29  22374  57.47  50.86  2  6296  21.04  19.68  5711  142.25  117.63  3  10479  13.75  10.79  15404  102.34  83.45  4  13546  7.43  5.29  9311  134.20  106.42  1  6296  21.04  19.68  5711  142.25  117.63  2  10479  13.75  10.79  15404  102.34  83.45  3  13546  7.43  5.29  12237  46.78  42.95  4  11958  10.54  8.67  9311  134.20  106.42  5  10521  9.98  7.83  10137  70.37  56.38  1  10479  13.75  10.79  15404  102.34  83.45  2  13546  7.43  5.29  12237  46.78  42.95  3  5057  17.86  14.93  9311  134.20  106.42  4  11958  10.54  8.67  4468  145.47  120.94  5  10521  
9.98  7.83  10137  70.37  56.38  6  1239  33.99  29.15  1243  130.69  104.11

6.2.2.3  NGDC5 and WSC2 DEM errors as compared to EMR1 DEM

In the preceding two sections, WSC2 and EMR1 DEM errors as compared to TRIM DEMs for two different surfaces (93G and 93H) were examined in relation to the terrain clusters derived using different variable groups and window sizes. The terrain clusters used in the above analyses were all based on local measures extracted from 50 m TRIM DEMs for the two subareas. In order to investigate the sensitivity of the above DEM error models, NGDC5 and WSC2 DEM errors as compared to the EMR1 DEM for the whole study area were also examined within various terrain clusters based on local roughness measures extracted from the 1 km EMR1 DEM. Table 6.9a presents the summary statistics (i.e., number of points, mean, and standard deviation) of NGDC5 DEM errors within each terrain cluster derived using variable group (3-5) and two different moving window sizes (5x5 and 9x9) for the whole study area. The results for the absolute NGDC5 DEM errors (variable group: 3-5) are presented in Table 6.9b.
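The per-cluster summaries reported in these tables (number of points, mean, and standard deviation of the signed and the absolute errors) can be computed along the following lines. This is a minimal sketch, not the procedure actually used in this study: the function name `cluster_error_stats`, the layout of the two input lists, and the use of the sample (n-1) standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def cluster_error_stats(errors, labels):
    """Summarise signed and absolute DEM errors for each terrain cluster.

    errors: elevation differences, one value per grid point (assumed layout).
    labels: terrain cluster number of each grid point (assumed layout).
    """
    groups = defaultdict(list)
    for err, lab in zip(errors, labels):
        groups[lab].append(err)

    stats = {}
    for lab in sorted(groups):
        signed = groups[lab]
        absolute = [abs(e) for e in signed]
        stats[lab] = {
            "n": len(signed),
            "mean": mean(signed),
            "std": stdev(signed),        # sample standard deviation (assumed)
            "abs_mean": mean(absolute),
            "abs_std": stdev(absolute),
        }
    return stats


# Toy data: errors cancel in sign within each cluster but differ in magnitude.
example = cluster_error_stats([10, -10, 20, -20], [1, 1, 2, 2])
for lab, row in example.items():
    print(lab, row)
```

In the toy data the signed means are zero in both clusters while the absolute means are not, which is why the signed and the absolute errors are tabulated separately.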
Based on the above two tables, the rankings of the clusters according to NGDC5 DEM errors and the absolute NGDC5 DEM errors can be summarized as follows for the two window sizes. In each case the first row ranks the clusters at partition levels 2 to 6 by the standard deviation of NGDC5 DEM errors, and the second row by the mean of the absolute NGDC5 DEM errors, from largest to smallest:

1)  variable group: 3-5, window size: 5x5

{2,1} {1,2,3} {2,3,1,4} {1,5,2,4,3} {1,5,6,2,4,3}
{2,1} {1,2,3} {3,2,1,4} {2,1,5,4,3} {2,5,1,6,4,3}

2)  variable group: 3-5, window size: 9x9

{1,2} {2,3,1} {1,2,4,3} {2,3,1,5,4} {1,2,5,6,4,3}
{1,2} {2,3,1} {1,2,4,3} {2,1,3,5,4} {5,1,2,6,4,3}

Table 6.9a  Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area

                      (5x5 window)                    (9x9 window)
Level  Class      Points      Mean   Std.dev     Points      Mean   Std.dev
    2      1       12816     60.64    190.53      15399    -15.95    323.33
    2      2       10564    -57.08    341.92       7981     52.59    135.26

    3      1       10564    -57.08    341.92       7981     52.59    135.26
    3      2        7512    100.11    202.62      10674   -101.23    314.36
    3      3        5304      4.73    155.69       4725    176.71    252.82

    4      1        7512    100.11    202.62      10674   -101.23    314.36
    4      2        8313   -147.41    308.44       4725    176.71    252.82
    4      3        2251    276.51    236.52       4951     16.40    116.49
    4      4        5304      4.73    155.69       3030    111.72    142.78

    5      1        8313   -147.41    308.44       4725    176.71    252.82
    5      2        2251    276.51    236.52       5885    -59.56    335.42
    5      3        5304      4.73    155.69       4789   -152.43    277.98
    5      4        4100    115.39    158.19       4951     16.40    116.49
    5      5        3412     81.75    244.36       3030    111.72    142.78

    6      1        4452    -71.54    327.30       5885    -59.56    335.42
    6      2        2251    276.51    236.52       4789   -152.43    277.98
    6      3        5304      4.73    155.69       4951     16.40    116.49
    6      4        4100    115.39    158.19       3030    111.72    142.78
    6      5        3861   -234.89    258.92       2133    259.64    256.85
    6      6        3412     81.75    244.36       2592    108.46    227.88

Table 6.9b  Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area

                      (5x5 window)                    (9x9 window)
Level  Class      Points      Mean   Std.dev     Points      Mean   Std.dev
    2      1       12816    145.26    137.39      15399    259.61    193.37
    2      2       10564    280.03    204.32       7981    103.01    102.22

    3      1       10564    280.03    204.32       7981    103.01    102.22
    3      2        7512    170.71    148.09      10674    262.97    199.78
    3      3        5304    109.20    111.06       4725    252.02    177.83

    4      1        7512    170.71    148.09      10674    262.97    199.78
    4      2        8313    272.78    206.03       4725    252.02    177.83
    4      3        2251    306.82    195.58       4951     83.60     82.75
    4      4        5304    109.20    111.06       3030    134.72    121.31

    5      1        8313    272.78    206.03       4725    252.02    177.83
    5      2        2251    306.82    195.58       5885    274.45    201.80
    5      3        5304    109.20    111.06       4789    248.87    196.37
    5      4        4100    140.68    136.19       4951     83.60     82.75
    5      5        3412    206.80    153.68       3030    134.72    121.31

    6      1        4452    262.68    207.91       5885    274.45    201.80
    6      2        2251    306.82    195.58       4789    248.87    196.37
    6      3        5304    109.20    111.06       4951     83.60     82.75
    6      4        4100    140.68    136.19       3030    134.72    121.31
    6      5        3861    284.42    203.25       2133    309.13    194.45
    6      6        3412    206.80    153.68       2592    205.03    147.13

With window size 5x5, cluster #1 at level 3 is the roughest (as seen in section 5.3.3) and it has the largest standard deviation of NGDC5 DEM errors (341.92 m) and the highest mean of the absolute NGDC5 DEM errors (280.03 m). Cluster #2 is the second roughest and it has the second largest standard deviation of NGDC5 DEM errors (202.62 m) and the second highest mean of the absolute NGDC5 DEM errors (170.71 m). Cluster #3 is the least rough and it has the smallest NGDC5 DEM errors (155.69 m and 109.20 m respectively). With window size 9x9, cluster #2 at level 3 is the roughest (as seen in section 5.3.3) and it has the largest standard deviation of NGDC5 DEM errors (314.36 m) and the highest mean of the absolute NGDC5 DEM errors (262.97 m). Cluster #3 is the second roughest and it has the second largest standard deviation (252.82 m) and the second highest mean of the absolute errors (252.02 m). Cluster #1 is the least rough and it has the smallest NGDC5 DEM errors (135.26 m and 103.01 m respectively).
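The cluster rankings quoted above follow directly from the tabulated values. As a minimal sketch, the helper below (the name `rank_clusters` is hypothetical) orders class numbers from the largest to the smallest value; the inputs are the level-4, 5x5-window columns of Tables 6.9a and 6.9b.

```python
def rank_clusters(values):
    """Return 1-based class numbers ordered from largest to smallest value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [i + 1 for i in order]


# Level 4, window size 5x5, variable group 3-5 (classes 1 to 4).
std_signed = [202.62, 308.44, 236.52, 155.69]   # Std.dev column, Table 6.9a
mean_abs = [170.71, 272.78, 306.82, 109.20]     # Mean column, Table 6.9b

print(rank_clusters(std_signed))  # [2, 3, 1, 4]
print(rank_clusters(mean_abs))    # [3, 2, 1, 4]
```

The two outputs match the level-4 entries {2,3,1,4} and {3,2,1,4} in the rankings for window size 5x5.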
For variable groups (3-6), (2), and (5), the statistics of NGDC5 DEM errors and the absolute NGDC5 DEM errors within various terrain clusters are summarized in Tables 6.10a-b, 6.11a-b, and 6.12a-b respectively. The rankings of the clusters are not presented.

Tables 6.13a-6.16b give the summary results for WSC2 DEM errors (as compared to the EMR1 DEM) within different terrain clusters for the whole study area in various cases (i.e., window size: 5x5 or 9x9; variable group: 3-5, 3-6, 2, or 5). Again, both the original (including the positive and negative elevation differences) and the absolute (ignoring the signs) WSC2 DEM errors (as compared to the EMR1 DEM) were examined.

Table 6.10a  Summary statistics of NGDC5 DEM errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area

                      (5x5 window)                    (9x9 window)
Level  Class      Points      Mean   Std.dev     Points      Mean   Std.dev
    2      1       13206    -46.90    327.32      14697    -23.95    325.88
    2      2       10174     77.99    164.71       8683     60.59    144.26

    3      1       10174     77.99    164.71       8683     60.59    144.26
    3      2       10004     40.03    301.38      10129     99.64    282.03
    3      3        3202   -318.52    246.40       4568   -298.00    237.37

    4      1       10004     40.03    301.38      10129     99.64    282.03
    4      2        6006    122.85    168.71       6420     22.65    121.81
    4      3        4168     13.36    134.67       4568   -298.00    237.37
    4      4        3202   -318.52    246.40       2263    168.22    148.64

    5      1        4982     90.66    323.85       6420     22.65    121.81
    5      2        6006    122.85    168.71       7014     77.18    245.23
    5      3        5022    -10.19    268.04       3115    150.21    345.80
    5      4        4168     13.36    134.67       4568   -298.00    237.37
    5      5        3202   -318.52    246.40       2263    168.22    148.64

    6      1        6006    122.85    168.71       7014     77.18    245.23
    6      2        5022    -10.19    268.04       3115    150.21    345.80
    6      3        4168     13.36    134.67       4279      7.22    123.66
    6      4        3202   -318.52    246.40       4568   -298.00    237.37
    6      5        3290    -30.01    310.01       2263    168.22    148.64
    6      6        1692    325.31    196.41       2141     53.49    111.84

Table 6.10b  Summary statistics of the absolute NGDC5 DEM errors in each terrain cluster (window size: 5x5 and
9x9; variable group: 3-6) for the whole study area  Class  #of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  13206  266.05  196.35  14697  262.52  194.55  2  10174  128.41  129.31  8683  110.74  110.53  1  10174  128.41  129.31  8683  110.74  110.53  2  10004  242.73  183.05  10129  238.43  180.60  3  3202  338.88  217.55  4568  315.94  212.89  1  10004  242.73  183.05  10129  238.43  180.60  2  6006  152.13  142.86  6420  88.73  '86.47  3  4168  94.24  97.12  4568  315.94  212.89  4  3202  338.88  217.55  2263  173.20  142.80  1  4982  272.71  196.76  6420  88.73  86.47  2  6006  152.13  142.86  7014  206.51  153.10  3  5022  213.00  163.01  3115  310.31  214.07  4  4168  94.24  97.12  4568  315.94  212.89  5  3202  338.88  217.55  2263  173.20  142.80  1  6006  152.13  142.86  7014  206.51  153.10  2  5022  213.00  163.01  3115  310.31  214.07  3  4168  94.24  97.12  4279  89.90  85.21  4  3202  338.88  217.55  4568  315.94  212.89  5  3290  243.39  194.29  2263  173.20  142.80  6  1692  329.71  188.93  2141  86.39  88.91  342  Table 6.11a  Summary statistics of NGDC5 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area  Class  #of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  14002  -37.85  324.67  14413  -26.70  327.54  2  9378  75.08  157.60  8967  62.34  145.73  1  9948  -13.26  297.16  10312  -59.44  341.57  2  9378  75.08  157.60  8967  62.34  145.73  3  4054  -98.20  377.22  4101  55.61  272.45  1  9378  75.08  157.60  8967  62.34  145.73  2  4054  -98.20  377.22  3194  -93.74  395.17  3  4701  29.57  281.06  7118  -44.05  313.38  4  5247  -51.63  305.85  4101  55.61  272.45  1  4054  -98.20  377.22  3194  -93.74  395.17  2  4701  29.57  281.06  7118  -44.05  313.38  3  5247  -51.63  305.85  4101  55.61  272.45  4  4046  81.77  187.85  5435  76.64  173.64  5  5332  70.00  129.83  3532  40.33  82.00  1  4701  29.57  
281.06  7118  -44.05  313.38  2  5247  -51.63  305.85  4101  55.61  272.45  3  1246  -131.48  434.22  5435  76.64  173.64  4  4046  81.77  187.85  2536  -92.54  372.48  5  5332  70.00  129.83  658  -98.36  472.82  6  2808  -83.44  348.02  3532  40.33  82.00  343  Table 6.11b  Summary statistics of the absolute NGDC5 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  14002  262.86  194.27  14413  264.85  194.54  2  9378  121.48  125.36  8967  111.81  112.34  1  9948  241.46  173.70  10312  281.25  202.72  2  9378  121.48  125.36  8967  111.81  112.34  3  4054  315.39  229.00  4101  223.61  165.24  1  9378  121.48  125.36  8967  111.81  112.34  2  4054  315.39  229.00  3194  330.50  235.97  3  4701  228.21  166.66  7118  259.15  181.60  4  5247  253.33  178.95  4101  223.61  165.24  1  4054  315.39  229.00  3194  330.50  235.97  2  4701  228.21  166.66  7118  259.15  181.60  3  5247  253.33  178.95  4101  223.61  165.24  4  4046  152.02  137.33  5435  141.12  126.92  5  5332  98.31  109.95  3532  66.71  62.45  1  4701  228.21  166.66  7118  259.15  181.60  2  5247  253.33  178.95  4101  223.61  165.24  3  1246  371.36  260.43  5435  141.12  126.92  4  4046  152.02  137.33  2536  314.23  220.30  5  5332  98.31  109.95  658  393.22  279.98  6  2808  290.56  208.88  3532  66.71  62.45  344  Table 6.12a  Summary statistics of NGDC5 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area  Class  #of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  14052  -36.44  321.35  14851  -18.61  326.12  2  9328  73.55  167.47  8529  52.81  142.58  1  6314  -55.75  340.24  11672  -4.79  296.88  2  7738  -20.67  304.19  3179  -69.33  412.34  3  9328  73.55  167.47  8529  52.81  142.58  1  7738  -20.67  304.19  3179  -69.33  412.34  2  
9328  73.55  167.47  8529  52.81  142.58  3  1938  -89.97  360.31  6666  5.52  272.99  4  4376  -40.60  329.87  5006  -18.53  325.51  1  9328  73.55  167.47  8529  52.81  142.58  2  1938  -89.97  360.31  6666  5.52  272.99  3  4376  -40.60  329.87  957  -125.92  442.39  4  4399  -9.39  294.20  5006  -18.53  325.51  5  3339  -35.54  316.30  2222  -44.95  396.32  1  1938  -89.97  360.31  6666  5.52  272.99  2  4376  -40.60  329.87  957  -125.92  442.39  3  4399  -9.39  294.20  5006  -18.53  325.51  4  6523  74.28  143.43  4407  58.43  172.65  5  2805  71.85  213.16  2222  -44.95  396.32  6  3339  -35.54  316.30  4122  46.82  100.65  345  Table 6.12b  Summary statistics of the absolute NGDC5 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area  Class  #of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  14052  258.89  193.81  14851  262.73  194.09  2  9328  126.71  131.91  8529  107.65  107.38  1  6314  275.23  207.61  11672  241.98  172.05  2  7738  245.56  180.70  3179  338.92  244.80  3  9328  126.71  131.91  8529  107.65  107.38  1  7738  245.56  180.70  3179  338.92  244.80  2  9328  126.71  131.91  8529  107.65  107.38  3  1938  292.24  229.07  6666  221.77  159.26  4  4376  267.70  196.93  5006  268.88  184.35  1  9328  126.71  131.91  8529  107.65  107.38  2  1938  292.24  229.07  6666  221.77  159.26  3  4376  267.70  196.93  957  377.11  263.09  4  4399  237.22  174.22  5006  268.88  184.35  5  3339  256.54  188.36  2222  322.47  234.65  1  1938  292.24  229.07  6666  221.77  159.26  2  4376  267.70  196.93  957  377.11  263.09  3  4399  237.22  174.22  5006  268.88  184.35  4  6523  109.56  118.68  4407  136.40  120.88  5  2805  166.60  151.12  2222  322.47  234.65  6  3339  256.54  188.36  4122  76.91  80.04  346  Table 6.13a  Summary statistics of WSC2 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study 
area  Class  # of point (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  12816  1.90  49.66  15399  1.03  103.49  2  10564  -1.09  114.65  7981  -0.36  26.41  1  10564  -1.09  114.65  7981  -0.36  26.41  2  7512  5.11  55.30  10674  -0.84  115.41  3  5304  -2.64  39.90  4725  5.24  69.24  1  7512  5.11  55.30  10674  -0.84  115.41  2  8313  -6.76  121.36  4725  5.24  69.24  3  2251  19.86  82.14  4951  -1.03  26.82  4  5304  -2.64  39.90  3030  0.72  25.69  1  8313  -6.76  121.36  4725  5.24  69.24  2  2251  19.86  82.14  5885  1.34  123.55  3  5304  -2.64  39.90  4789  -3.52  104.49  4  4100  2.62  25.93  4951  -1.03  26.82  5  3412  8.10  76.87  3030  0.72  25.69  1  4452  -3.15  128.79  5885  1.34  123.55  2  2251  19.86  82.14  4789  -3.52  104.49  3  5304  -2.64  39.90  4951  -1.03  26.82  4  4100  2.62  25.93  3030  0.72  25.69  5  3861  -10.92  112.05  2133  7.73  70.57  6  3412  8.10  76.87  2592  3.20  68.07  347  Table 6.13b  Summary statistics of the absolute WSC2 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-5) for the whole study area  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  12816  30.20  39.46  15399  70.87  75.43  2  10564  80.04  82.09  7981  17.71  19.59  1  10564  80.04  82.09  7981  17.71  19.59  2  7512  34.20  43.75  10674  82.85  80.35  3  5304  24.54  31.57  4725  43.80  53.87  1  7512  34.20  43.75  10674  82.85  80.35  2  8313  87.93  83.91  4725  43.80  53.87  3  2251  50.88  67.47  4951  18.53  19.41  4  5304  24.54  31.57  3030  16.37  19.81  1  8313  87.93  83.91  4725  43.80  53.87  2  2251  50.88  67.47  5885  87.59  87.14  3  5304  24.54  31.57  4789  77.02  70.70  4  4100  17.44  19.37  4951  18.53  19.41  5  3412  54.34  54.97  3030  16.37  19.81  1  4452  91.14  91.05  5885  87.59  87.14  2  2251  50.88  67.47  4789  77.02  70.70  3  5304  24.54  31.57  4951  18.53  19.41  4  4100  17.44  19.37  3030 
 16.37  19.81  5  3861  84.24  74.67  2133  41.55  57.56  6  3412  54.34  54.97  2592  45.66  50.58  348  Table 6.14a  Summary statistics of WSC2 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  13206  -0.47  109.55  14697  0.40  105.23  2  10174  1.88  34.33  8683  0.81  29.93  1  10174  1.88  34.33  8683  0.81  29.93  2  10004  15.00  104.31  10129  19.62  94.57  3  3202  -48.81  111.47  4568  -42.21  114.73  1  10004  15.00  104.31  10129  19.62  94.57  2  6006  8.38  37.45  6420  -1.24  30.77  3  4168  -7.49  26.59  4568  -42.21  114.73  4  3202  -48.81  111.47  2263  6.62  26.58  1  4982  -0.70  103.01  6420  -1.24  30.77  2  6006  8.38  37.45  7014  24.48  91.62  3  5022  30.58  103.25  3115  8.67  100.04  4  4168  -7.49  26.59  4568  -42.21  114.73  5  3202  -48.81  111.47  2263  6.62  26.58  1  6006  8.38  37.45  7014  24.48  91.62  2  5022  30.58  103.25  3115  8.67  100.04  3  4168  -7.49  26.59  4279  -6.16  27.68  4  3202  -48.81  111.47  4568  -42.21  114.73  5  3290  -14.63  115.56  2263  6.62  26.58  6  1692  26.36  64.59  2141  8.58  34.10  349  Table 6.14b  Summary statistics of the absolute WSC2 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 3-6) for the whole study area  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  13206  76.46  78.46  14697  72.42  76.34  2  10174  21.91  26.49  8683  19.38  22.82  1  10174  21.91  26.49  8683  19.38  22.82  2  10004  72.05  76.90  10129  64.53  71.86  3  3202  90.24  81.62  4568  89.90  82.83  1  10004  72.05  76.90  10129  64.53  71.86  2  6006  24.12  29.85  6420  20.12  23.31  3  4168  18.73  20.31  4568  89.90  82.83  4  3202  90.24  81.62  2263  17.31  21.23  1  4982  67.75  77.60  6420  20.12  23.31  2  6006  24.12  29.85  7014  65.66  68.42  3  5022  
76.31  75.98  3115  61.98  79.00  4  4168  18.73  20.31  4568  89.90  82.83  5  3202  90.24  81.62  2263  17.31  21.23  1  6006  24.12  29.85  7014  65.66  68.42  2  5022  76.31  75.98  3115  61.98  79.00  3  4168  18.73  20.31  4279  18.91  21.13  4  3202  90.24  81.62  4568  89.90  82.83  5  3290  81.54  83.17  2263  17.31  21.23  6  1692  40.93  56.49  2141  22.52  27.00  350  Table 6.15a  Summary statistics of WSC2 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area  Class  #of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  14002  0.84  108.00  14413  1.05  106.47  2  9378  0.13  27.68  8967  -0.24  28.19  1  9948  0.90  91.73  10312  0.60  117.75  2  9378  0.13  27.68  8967  -0.24  28.19  3  4054  0.67  140.15  4101  2.16  70.54  1  9378  0.13  27.68  8967  -0.24  28.19  2  4054  0.67  140.15  3194  4.33  140.76  3  4701  3.28  74.18  7118  -1.07  105.78  4  5247  -1.23  104.96  4101  2.16  70.54  1  4054  0.67  140.15  3194  4.33  140.76  2  4701  3.28  74.18  7118  -1.07  105.78  3  5247  -1.23  104.96  4101  2.16  70.54  4  4046  0.20  36.40  5435  -0.58  33.77  5  5332  0.08  18.48  3532  0.28  16.19  1  4701  3.28  74.18  7118  -1.07  105.78  2  5247  -1.23  104.96  4101  2.16  70.54  3  1246  3.11  164.93  5435  -0.58  33.77  4  4046  0.20  36.40  2536  4.33  130.94  5  5332  0.08  18.48  658  4.31  173.61  6  2808  -0.41  127.64  3532  0.28  16.19  351  Table 6.15b  Summary statistics of the absolute WSC2 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 2) for the whole study area  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  14002  75.56  77.17  14413  73.85  76.70  2  9378  18.62  20.47  8967  18.76  21.04  1  9948  65.55  64.18  10312  83.27  83.25  2  9378  18.62  20.47  8967  18.76  21.04  3  4054  100.13  98.06  4101  50.16  49.64  1  9378  18.62  20.47  8967  
18.76  21.04  2  4054  100.13  98.06  3194  100.55  98.59  3  4701  53.63  51.35  7118  75.51  74.07  4  5247  76.22  72.16  4101  50.16  49.64  1  4054  100.13  98.06  3194  100.55  98.59  2  4701  53.63  51.35  7118  75.51  74.07  3  5247  76.22  72.16  4101  50.16  49.64  4  4046  25.82  25.66  5435  23.42  24.34  5  5332  13.16  12.98  3532  11.59  11.30  1  4701  53.63  51.35  7118  75.51  74.07  2  5247  76.22  72.16  4101  50.16  49.64  3  1246  118.98  114.21  5435  23.42  24.34  4  4046  25.82  25.66  2536  93.55  91.70  5  5332  13.16  12.98  658  127.53  117.77  6  2808  91.76  88.71  3532  11.59  11.30  352  Table 6.16a  Summary statistics of WSC2 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area  Class  #of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  14052  0.73  107.45  14851  0.92  104.81  2  9328  0.29  29.80  8529  -0.09  29.41  1  6314  -0.37  125.22  11672  -0.27  98.29  2  7738  1.62  90.39  3179  5.30  125.79  3  9328  0.29  29.80  8529  -0.09  29.41  1  7738  1.62  90.39  3179  5.30  125.79  2  9328  0.29  29.80  8529  -0.09  29.41  3  1938  2.65  134.68  6666  0.56  90.78  4  4376  -1.71  120.79  5006  -1.39  107.48  1  9328  0.29  29.80  8529  -0.09  29.41  2  1938  2.65  134.68  6666  0.56  90.78  3  4376  -1.71  120.79  957  0.76  149.67  4  4399  1.64  83.40  5006  -1.39  107.48  5  3339  1.60  98.85  2222  7.26  113.95  1  1938  2.65  134.68  6666  0.56  90.78  2  4376  -1.71  120.79  957  0.76  149.67  3  4399  1.64  83.40  5006  -1.39  107.48  4  6523  0.12  22.47  4407  -0.50  35.86  5  2805  0.68  42.20  2222  7.26  113.95  6  3339  1.60  98.85  4122  0.35  20.35  353  Table 6.16b  Summary statistics of the absolute WSC2 D E M errors in each terrain cluster (window size: 5x5 and 9x9; variable group: 5) for the whole study area  Class  # of points (5x5)  Mean (5x5)  Std.dev (5x5)  # of points (9x9)  Mean (9x9)  Std.dev (9x9)  1  
14052  74.84  77.10  14851  72.00  76.17  2  9328  19.40  22.63  8529  19.16  22.31  1  6314  88.81  88.27  11672  67.42  71.52  2  7738  63.44  64.40  3179  88.78  89.25  3  9328  19.40  22.63  8529  19.16  22.31  1  7738  63.44  64.40  3179  88.78  89.25  2  9328  19.40  22.63  8529  19.16  22.31  3  1938  96.09  94.39  6666  62.80  65.55  4  4376  85.59  85.24  5006  73.58  78.35  1  9328  19.40  22.63  8529  19.16  22.31  2  1938  96.09  94.39  6666  62.80  65.55  3  4376  85.59  85.24  957  108.38  103.17  4  4399  58.17  59.78  5006  73.58  78.35  5  3339  70.37  69.43  2222  80.34  81.11  1  1938  96.09  94.39  6666  62.80  65.55  2  4376  85.59  85.24  957  108.38  103.17  3  4399  58.17  59.78  5006  73.58  78.35  4  6523  15.47  16.29  4407  23.99  26.66  5  2805  28.54  31.09  2222  80.34  81.11  6  3339  70.37  69.43  4122  13.99  14.79

6.2.3  Significance tests of DEM error variation

From Figures 6.8a-b and Tables 6.1a to 6.16b, it is evident that WSC2 and EMR1 DEM errors (as compared to 50 m TRIM DEMs) for the two subareas, and NGDC5 and WSC2 DEM errors (as compared to the 1 km EMR1 DEM) for the whole study area, differ from cluster to cluster. It was observed earlier in section 6.2.2 that, in general, the rougher the cluster, the larger the standard deviation of the elevation differences or the mean of the absolute elevation differences, indicating a greater amount of DEM error. Note, however, that the differences between DEM errors in some clusters become insignificant at certain hierarchical levels in various cases. In order to examine the differences quantitatively, the following statistical test for the difference between two population variances was used:

$$\frac{s_1^2}{s_2^2} \sim F_{n_1-1,\,n_2-1} \qquad (6.1)$$

where the subscript $(n_1 - 1)$ refers to the number of degrees of freedom in the numerator and $(n_2 - 1)$ refers to the number of degrees of freedom in the denominator.
This most commonly used test for variances takes the ratio of the two sample variances ($s_1^2/s_2^2$) and tests it against an F statistic. Strictly speaking, this test is valid only for normal parent populations. There are some indications, however, that the results also apply to a large extent to other types of parent populations, provided they do not differ from the normal population too markedly [Kmenta, 1986]. Since the DEM errors are quite symmetrically distributed (i.e., not too nonnormal) and the numbers of observations are large, moderate departures from the assumption of normality are tolerable. It is thus safe to use the above test here to examine quantitatively the differences between the standard deviations (or variances) of DEM errors in various clusters.

6.2.3.1  Significance tests of WSC2 and EMR1 DEM errors in the two subareas

The standard deviations of the original WSC2 DEM errors (as compared to TRIM DEMs) in various terrain clusters in each case (variable group: 3-5, 3-6, 2, or 5; window size: 7x7 or 21x21; number of clusters: 2, 3, 4, 5, 6, 7 or 8; and subarea: 93G or 93H) were examined using the above F statistic. Figure 6.9a illustrates the different levels of partition (2 to 8 clusters) from the hierarchical terrain clustering based on variable group (3-5) and two different window sizes (7x7 and 21x21) for the two subareas (93G and 93H). The connectivity information from one level to the next was derived from the data (i.e., the number of points in each cluster) presented in Tables 6.1a-d. The figure illustrates how terrain cluster {1} at any one partition level is separated successively into two finer clusters at the next higher level. The clusters at different levels are hierarchically nested and the partition level controls the scale of the investigation. If two points belong to the same cluster at a higher level $L_1$, they must also belong to the same cluster at any lower level $L_2 < L_1$.
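Recovering the connectivity between successive partition levels from point counts alone can be sketched as follows. The sketch assumes that each step splits exactly one cluster into two and that the counts being matched are distinct; the function name `find_split` is hypothetical, and the counts are the 5x5-window columns of Table 6.9a (levels 3 and 4), used purely for illustration rather than the Table 6.1a-d data discussed here.

```python
from collections import Counter


def find_split(parent_counts, child_counts):
    """Identify which cluster at one level splits into which two at the next.

    Both arguments list the number of points per class, in class order.
    Returns (parent_class, (child_class_a, child_class_b)), all 1-based.
    """
    # Counts present at both levels belong to clusters carried over unchanged.
    carried = Counter(parent_counts) & Counter(child_counts)
    split_parent = list((Counter(parent_counts) - carried).elements())
    new_children = list((Counter(child_counts) - carried).elements())

    if len(split_parent) != 1 or len(new_children) != 2:
        raise ValueError("expected exactly one cluster split into two")
    if sum(new_children) != split_parent[0]:
        raise ValueError("child counts do not sum to the parent count")

    parent_class = parent_counts.index(split_parent[0]) + 1
    child_classes = tuple(
        i + 1 for i, n in enumerate(child_counts) if n in new_children
    )
    return parent_class, child_classes


level3 = [10564, 7512, 5304]
level4 = [7512, 8313, 2251, 5304]
print(find_split(level3, level4))  # (1, (2, 3))
```

Here cluster {1} at level 3 (10564 points) is the one that splits, into clusters {2} and {3} at level 4 (8313 + 2251 points), while the other two clusters are carried over and renumbered.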
Figure 6.9a  The significance test results for WSC2 DEM errors for the two subareas (variable group: 3-5, window size: 7x7 and 21x21). Panels: 93G (7x7), 93G (21x21), 93H (7x7), 93H (21x21); each panel shows the cluster numbers at successive partition levels.

At each partition level in Figure 6.9a, terrain clusters whose standard deviation of WSC2 DEM errors is not significantly different (at α = 0.01) from that of at least one other cluster are indicated by shaded circles. It can be seen in Figure 6.9a that for case #1 (subarea 93G; variable group: 3-5; window size: 7x7), WSC2 DEM errors in some clusters (clusters {2} and {4}) start to show no significant differences at partition level 4. For case #2 (subarea 93G; variable group: 3-5; window size: 21x21), this starts at level 7 (clusters {6} and {7}). For case #3 (subarea 93H; variable group: 3-5; window size: 7x7), it starts at level 6 (clusters {5} and {6}), and for case #4 (subarea 93H; variable group: 3-5; window size: 21x21) it starts at level 5 (clusters {1} and {4}). Note that at levels 2 and 3 all clusters are significantly different from each other in every case tested above.

The above results indicate that WSC2 DEM errors (elevation differences as compared to TRIM DEMs) in each of the two subareas do show significant variation between the clusters resulting from terrain classification based on variable group (3-5) and the two different window sizes. Furthermore, the partition level at which the variation of WSC2 DEM errors between terrain clusters becomes statistically insignificant depends on the window size used for local measure extraction and differs between the two subareas, which have distinct global characteristics.
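The pairwise comparison of cluster variances in equation (6.1) can be sketched as follows. Because exact F tables are not available in the Python standard library, this sketch swaps in Fisher's large-sample normal approximation to ln F, which is reasonable only for cluster sizes as large as those involved here; it is not the exact F test used in this study. The function name `variance_ratio_test` is hypothetical, and the example values (clusters {2} and {3} at level 3 of Table 6.9a, 5x5 window) are chosen for illustration and are not results reported in this section.

```python
import math
from statistics import NormalDist


def variance_ratio_test(s1, n1, s2, n2, alpha=0.01):
    """Two-sided test of equal variances based on the ratio s1^2 / s2^2.

    Approximation (assumed here): for large degrees of freedom, ln(F) is
    roughly normal with mean near 0 and variance 2/df1 + 2/df2.
    Returns (F ratio, approximate p-value, significant at alpha?).
    """
    f_ratio = (s1 * s1) / (s2 * s2)
    df1, df2 = n1 - 1, n2 - 1
    z = math.log(f_ratio) / math.sqrt(2.0 / df1 + 2.0 / df2)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return f_ratio, p_value, p_value < alpha


# Std.dev and number of points for two clusters (Table 6.9a, level 3, 5x5).
f_ratio, p_value, significant = variance_ratio_test(202.62, 7512, 155.69, 5304)
print(round(f_ratio, 3), significant)
```

Two clusters with identical sample standard deviations give a ratio of exactly 1 and are never flagged as different, whatever their sizes, which corresponds to the shaded circles in Figure 6.9a.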
For the flatter surface (i.e., 93G), variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 7 for window size (21x21) and level 4 for window size (7x7). For the rougher surface (i.e., 93H), as can be seen in Figure 6.9a, variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 6 for window size 7x7 versus level 5 for window size 21x21. Finally, it should be noted that in some cases, even though the standard deviations (or variances) in two clusters were found not to be significantly different (i.e., equal population standard deviations), their means are apparently very dissimilar. This can be tested statistically by using a t-test. That is,

$$ t = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{\dfrac{s_i^2}{N_i} + \dfrac{s_j^2}{N_j}}} \qquad (6.2) $$

As seen in Table 6.1a and Figure 6.9a, for example, clusters {5} and {6} at level 6 for subarea 93H (variable group: 3-5, window size: 7x7) were not different in terms of their standard deviations of WSC2 DEM errors (both have the same value of 228.4 m). However, their means (81.5 and -151.4 m respectively) were found to be significantly different (at α = 0.01).

In order to examine the effect of different variable groups on the above results, significance test results for WSC2 DEM errors in terrain clusters derived from variable groups (3-6), (2) and (5) (for both subareas 93G and 93H and both window sizes 7x7 and 21x21) are summarized in Figures 6.9b-d respectively. Based on information presented in Figure 6.9b, it can be seen that, with variable group (3-6) used for classification, the variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 7 for window size (7x7) and level 3 for window size (21x21) for subarea 93G. For the rougher surface 93H, the variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 5 for window size (7x7).
For window size (21x21), the variation of WSC2 DEM errors among the terrain clusters is significant at all the levels examined (levels 2 to 8).

[Figure 6.9b. The significance test results for WSC2 DEM errors for the two subareas (variable group: 3-6; window sizes: 7x7 and 21x21). Panels: 93G (7x7), 93G (21x21), 93H (7x7), and 93H (21x21).]

[Figure 6.9c. The significance test results for WSC2 DEM errors for the two subareas (variable group: 2; window sizes: 7x7 and 21x21). Panels: 93G (7x7), 93G (21x21), 93H (7x7), and 93H (21x21).]

[Figure 6.9d. The significance test results for WSC2 DEM errors for the two subareas (variable group: 5; window sizes: 7x7 and 21x21). Panels: 93G (7x7), 93G (21x21), 93H (7x7), and 93H (21x21).]

For the case of variable group (2), as shown in Figure 6.9c, the variation of WSC2 DEM errors between some of the terrain clusters in subarea 93G becomes statistically insignificant at partition level 7 for both window sizes (7x7) and (21x21).
In subarea 93H, the variation is significant at all the levels from 2 to 8 for window size (7x7), but it starts to become insignificant at level 8 for window size (21x21).

With variable group (5) (see Figure 6.9d), the variation of WSC2 DEM errors between some of the terrain clusters becomes statistically insignificant at partition level 7 for case #1 (subarea 93G; variable group: 5; and window size: 7x7), level 5 for case #2 (subarea 93G; variable group: 5; and window size: 21x21), and level 6 for both case #3 (subarea 93H; variable group: 5; and window size: 7x7) and case #4 (subarea 93H; variable group: 5; and window size: 21x21).

For EMR1 DEM errors as compared to TRIM DEMs for the two subareas (93G and 93H), significance test results at various hierarchical levels (2 to 8) were also obtained for the four different variable groups (3-5, 3-6, 2 and 5) and the two different window sizes (7x7 and 21x21). Figures 6.10a-d summarize the test results for EMR1 DEM errors in various cases.

[Figure 6.10a. The significance test results for EMR1 DEM errors for the two subareas (variable group: 3-5; window sizes: 7x7 and 21x21). Panels: 93G (7x7), 93G (21x21), 93H (7x7), and 93H (21x21).]

[Figure 6.10b. The significance test results for EMR1 DEM errors for the two subareas (variable group: 3-6; window sizes: 7x7 and 21x21). Panels: 93G (7x7), 93G (21x21), 93H (7x7), and 93H (21x21).]

[Figure 6.10c. The significance test results for EMR1 DEM errors for the two subareas (variable group: 2; window sizes: 7x7 and 21x21). Panels: 93G (7x7), 93G (21x21), 93H (7x7), and 93H (21x21).]

[Figure 6.10d. The significance test results for EMR1 DEM errors for the two subareas (variable group: 5; window sizes: 7x7 and 21x21). Panels: 93G (7x7), 93G (21x21), 93H (7x7), and 93H (21x21).]

From the above significance test results of WSC2 and EMR1 DEM errors within various terrain clusters (i.e., Figures 6.9a-d and Figures 6.10a-d), it can be seen that there is some variation from case to case in terms of the best variable group and window size for DEM error modelling in each subarea. As shown in Figures 5.10a-d or Figures 5.11a-d in section 5.2.2.2, the number of variables in the geometric signature used for the multivariate clustering affects the classification result. Similarly, significance test results (e.g., Figures 6.9a-d or 6.10a-d) also show variation from case to case depending on the geomorphometric variable group used in deriving the terrain clusters. The size of the moving window used in geomorphometric parameter abstraction also has an impact on the test results.
Generally speaking, however, for both subareas all four different variable groups and the two different window sizes demonstrated some usefulness in classifying the terrain and relating the DEM errors to the variability of the topographic surface.

6.2.3.2  Significance test of NGDC5 and WSC2 DEM errors in the whole study area

For NGDC5 and WSC2 DEM errors as compared to the EMR1 DEM for the whole study area, the same analysis procedure as described above was followed. The F test was done for different cases (window size: 5x5 or 9x9; variable group: 3-5, 3-6, 2, or 5; number of clusters: 2 to 8) to examine the significance of difference between DEM errors (in terms of the standard deviation of NGDC5 or WSC2 DEM errors) in different terrain clusters.

Significance test results of NGDC5 and WSC2 DEM errors in the whole study area are presented in Figures 6.11a-b and 6.12a-b for the different cases (window size: 5x5 or 9x9; variable group: 3-5, 3-6, 2, or 5). Figure 6.11a shows the test results for NGDC5 DEM errors (as compared to the EMR1 DEM) with four different variable groups for window size (5x5). Figure 6.11b gives the test results for NGDC5 DEM errors for window size (9x9). Figures 6.12a-b present the results for WSC2 DEM errors (as compared to the EMR1 DEM) with four different variable groups for window sizes (5x5) and (9x9).

[Figure 6.11a. The significance test results for NGDC5 DEM errors for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 5x5). Panels: one per variable group.]

[Figure 6.11b. The significance test results for NGDC5 DEM errors for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 9x9). Panels: one per variable group.]

[Figure 6.12a. The significance test results for WSC2 DEM errors (as compared to the EMR1 DEM) for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 5x5). Panels: one per variable group.]

[Figure 6.12b. The significance test results for WSC2 DEM errors (as compared to the EMR1 DEM) for the whole study area (variable group: 3-5, 3-6, 2, and 5; window size: 9x9). Panels: one per variable group.]

Based on these results, several observations can be made. First, for both NGDC5 and WSC2 DEM errors (as compared to the EMR1 DEM), all four different variable groups and the two different window sizes were capable of classifying the terrain and relating the DEM errors to the variability of the topographic surface. The standard deviation of NGDC5 or WSC2 DEM errors shows great variation from cluster to cluster.
For NGDC5 DEM errors, the variation between some of the clusters becomes statistically insignificant at level 8 in most cases, except for one case with variable group (3-5) and window size (5x5). Although the differences in variances in this case become statistically insignificant at level 5 (clusters {2, 5} and clusters {3, 4}), their means are actually significantly different (276.51 m versus 81.75 m and 4.73 m versus 115.39 m, as seen in Table 6.9a). For WSC2 DEM errors, as summarized in Tables 6.13a-b to 6.16a-b, the variation between some of the clusters becomes statistically insignificant at levels ranging from 5 to beyond 8 for the various cases. Second, for the characterization of the spatial patterns of NGDC5 and WSC2 DEM errors, the differences among the capabilities of the four variable groups are small, especially with window size (9x9). This might be largely because of the relatively high KHAT values (ranging from 0.47 to 0.67 for window size 9x9) observed in Figure 5.16g for the comparisons between classifications using different variable groups. The same is true for the two different window sizes, especially with variable groups (3-6), (2), and (5). Again, this is probably because of the high agreement (KHAT values ranging from 0.65 to 0.70) in the classifications using the two different window sizes with these three variable groups. Overall, variable group (2) seems to be slightly superior to the other variable groups in its capability of differentiating the terrain clusters for DEM error modelling. This suggests that a multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more effective, contrary to what was expected earlier.
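The comparison of cluster means noted above, for clusters whose error variances do not differ significantly, can be sketched with a two-sample t statistic of the kind referred to in Equation 6.2. This is a minimal illustration under stated assumptions: the unpooled form of the standard error is assumed, the function names are hypothetical, and because the numbers of observations per cluster are large, the critical value is approximated from the standard normal distribution instead of Student's t.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_t(sample_i, sample_j):
    """t statistic for the difference between two cluster means.

    Assumes the unpooled standard error sqrt(s_i^2/N_i + s_j^2/N_j).
    """
    n_i, n_j = len(sample_i), len(sample_j)
    standard_error = sqrt(variance(sample_i) / n_i + variance(sample_j) / n_j)
    return (mean(sample_i) - mean(sample_j)) / standard_error


def means_differ(sample_i, sample_j, alpha=0.01):
    """Two-sided test of equal means at significance level alpha.

    Large-sample assumption: the t statistic is compared with a standard
    normal critical value, which is only reasonable for large clusters.
    """
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(two_sample_t(sample_i, sample_j)) > z_critical


# Illustrative errors (m): same spread, means 10 m apart; not thesis data.
offsets = [-2.0, -1.0, 0.0, 1.0, 2.0] * 10
cluster_i = [81.5 + d for d in offsets]
cluster_j = [71.5 + d for d in offsets]
```

With these illustrative samples the two clusters have identical variances, so a variance-ratio test would not separate them, while `means_differ(cluster_i, cluster_j)` still flags the 10 m offset between their means.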
6.3  Summary

This chapter examined the correlations between the DEM errors and each of the seven local roughness measures and statistically evaluated the observed spatial pattern of mismatch between DEMs of differing resolution in conjunction with the terrain clusters resulting from various multivariate classifications.

In summary, from Figures 6.9a-d to 6.12a-b and Tables 6.1a-d to 6.16a-b, certain patterns can be identified in terms of the relation between DEM errors and terrain roughness. First of all, the results indicate that both WSC2 and EMR1 DEM errors (elevation differences as compared to TRIM DEMs) in each of the two subareas and NGDC5 and WSC2 DEM errors (elevation differences as compared to the EMR1 DEM) in the whole study area do show some significant variation between different clusters. Cluster analysis was, therefore, successful to a large degree in grouping the areas according to their overall roughness, and useful in DEM error modelling. Another statement that can be made is that the multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more successful than using only a single roughness measure, such as slope or std, in characterizing the overall roughness of terrain. This is especially true when the overall amount of DEM error in each cluster is characterized by the standard deviation of the errors or the mean of the absolute DEM errors in the cluster. Overall, variable group (2) appears to be the best group for the classification of terrain for the examination of the magnitude and spatial pattern of DEM errors. The clusters formed using variable group (3-5) seem to be the least discriminating of all four groups. However, variable groups (3-5) and (3-6) do seem to be more capable of differentiating the positive and negative elevation differences than the single variable groups.
As can be seen in Tables 6.1a-d to 6.16a-b, a comparison of the summaries of the original and the absolute DEM errors in various cases shows more consistent rankings of error clusters for the two single variable groups (2) and (5) than for the two multivariate groups (3-5) and (3-6) (also see section 6.2.2). When comparing the DEM error modelling results for surfaces with different global characteristics, the size of the moving window used in the geomorphometric parameter abstraction appears to have some impact on the modelling results. In the two subareas tested, for example, for the flatter surface (i.e., 93G), the larger moving window size (i.e., 21x21) with variable group (3-5) works better than the smaller window size (i.e., 7x7) when extracting local geometric measures for hierarchical terrain classification, if better is defined as being more capable of differentiating various error clusters by variance. With variable group (3-6), the smaller window size (7x7) works better than the larger window size (21x21) for subarea 93G.

Based on the various test results presented in this chapter, it would appear that the answer to the question raised in Chapter 3 is generally positive and, therefore, the thesis hypothesis can be accepted. That is, knowledge of the landscape characteristics does provide some insights into the nature of the inherent error (or uncertainty) in a DEM and can be useful for DEM error modelling. It is not possible to provide further blanket statements about the overall fit of the DEM error model because of the variation from case to case, and the fact that some of the variation could not be explained by the cluster structure. Outright rejection of a DEM error model based on topographic characterization is certainly not warranted, but neither is a blind application.
It seems that the variability in various test results is more a function of the methods used to obtain the terrain clusters (i.e., different moving window sizes and variable groups) than it is a reflection of any theoretical inadequacy of the model. Another important point is that the part of the DEM error variation that was not explained by the cluster structure could be attributed to the interpolation errors introduced when comparing DEMs to the one with higher resolution to determine DEM errors. As indicated in Chapter 3 (see Figures 3.22 to 3.25), the interpolation errors (i.e., mismatch caused by using different interpolators) are minor when compared to the DEM errors (the difference between the DEM data sets of differing resolution), and the effects of different interpolation methods can be ignored in the interpolation procedure for comparison of a lower resolution DEM with a higher one. Nevertheless, there are still some differences between DEMs of the same resolution but interpolated using different methods, as seen in Figures 3.24 and 3.25 for subareas 93G and 93H. Therefore, more research is needed to understand the relation between interpolation errors and topographic characteristics and the impact of this on DEM error modelling.

7.  SUMMARY AND CONCLUSIONS

7.1  Summary

The intensive use of DEMs for a wide range of geographic analyses has given rise to many accuracy investigations. These are usually conducted along three lines: (1) describing or identifying possible sources of gross errors, (2) evaluating the effect of varying densities of sampling data or that of different interpolation methods, and (3) comparing the "products" (e.g., spot heights, contours) derived from a DEM with those obtained directly from terrestrial and photogrammetric surveying procedures, mostly from a producer's perspective. The DEM accuracy estimate is usually restricted to a global measure such as root-mean-square error (RMSE).
Seldom are the errors examined in terms of their spatial distribution pattern or how the resolution of the DEM interacts with the topographic surface variability. There is a wide range of topographic variation present in any terrain surface. Thus, in defining the accuracy of a DEM, one would ultimately like to know the spatial variation of the terrain and how the resolution interacts with this variation.

This thesis has two primary objectives: (1) to describe and analyze the spatial variation of DEM errors and to attempt to explain the pattern of this variation based on topographic characterization in order to investigate the relation between DEM errors and the roughness characteristics of terrain; (2) to examine the role of scale in topographic characterization and to see what effects the global characteristics of topographic surfaces have on the results of DEM error modelling.

The present research has accomplished several tasks. First, DEMs of various resolutions (i.e., 10-arc-minute NGDC10, 5-arc-minute NGDC5, 2 km WSC2, 1 km EMR1, and 50 m TRIM 93G and 93H) in a study area around Prince George, British Columbia, were compared to each other and their mismatches were examined (i.e., NGDC5 and WSC2 DEM errors as compared to the 1 km EMR1 DEM for the whole study area, and WSC2 and EMR1 DEM errors as compared to 50 m TRIM DEMs for the two subareas). Based on the preliminary test results, some observations were made regarding the relations among the spatial distribution of DEM errors, DEM resolution and the roughness of terrain. A hypothesis was then formed suggesting that knowledge of the landscape characteristics might provide some insights into the nature of the inherent error and/or uncertainty in a DEM. To test this statistically, characterization of the variation and complexity of the study area surfaces was conducted by means of general geomorphometric measures such as "local relief" and "slope."
Seven local roughness variables were identified and extracted from study area DEMs using the moving window technique. For the whole study area, the measures were derived based on the 1 km EMR1 DEM and two different moving window sizes (i.e., 5x5 and 9x9) were tested. For the two subareas (93G and 93H) the measures were derived from the 50 m TRIM DEMs using two different window sizes (7x7 and 21x21). In order to select the appropriate moving window sizes, the global characteristics of terrain were examined to identify the important scale breaks which indicate at which scales the original terrain contains higher variability. The global characteristics were described by measures such as grain, and those derived from spectral analysis, nested analysis of variance and fractal analysis. A multivariate cluster analysis was then used for automated hierarchical terrain classification in which relatively homogeneous terrain units at different scale levels were identified. Several different variable groups were tested in the cluster analysis and different classification results were compared to each other and interpreted in relation to each roughness measure. Finally, the correlations between the DEM errors and each of the local roughness measures were examined, and the variation of DEM errors within various terrain clusters resulting from multivariate classifications was statistically evaluated. The effectiveness of using different moving window sizes for the extraction of the local measures and the appropriateness of different variable groups for terrain classification were also evaluated.

7.2  Conclusions

The major conclusion of this study is that knowledge of topographic characteristics does provide some insights into the nature of the inherent error and/or uncertainty in a DEM and can be useful for DEM error modelling.
The measures of topographic complexity are related to the observed patterns of discrepancy between DEMs of differing resolution, but there are variations from case to case. Several patterns can be identified in the relation between DEM errors and the roughness of terrain. First of all, results indicate that both WSC2 and EMR1 DEM errors (elevation differences as compared to TRIM DEMs) in each of the two subareas and NGDC5 and WSC2 DEM errors (elevation differences as compared to the EMR1 DEM) in the whole study area do show significant variation among clusters as a result of terrain classifications based on different variable groups and window sizes. Cluster analysis was, therefore, considered successful in grouping the areas according to their overall roughness and, thus, useful in DEM error modelling. However, some of the total variation of various DEM errors could not be accounted for by the cluster structure derived from multivariate classification.

The second conclusion of this thesis is that the multivariate approach to the classification of topographic surfaces for DEM error modelling is not necessarily more successful than using only a single roughness measure such as slope or std (i.e., standard deviation of elevations) in characterizing the overall roughness of terrain. This is especially true when the overall amount of DEM error in each cluster is characterized by the standard deviation of the errors or the mean of the absolute DEM errors in the cluster. Overall, variable slope (i.e., #2) proves to be the best single variable for the classification of terrain for DEM error modelling. Variable group (3-5) (i.e., curv, hypint, and std) seems to be the least effective of all four groups. However, variable groups (3-5) (i.e., curv, hypint, and std) and (3-6) (i.e., curv, hypint, std, and highpt) are more capable of differentiating the positive and negative elevation differences.
In this sense, the assertion made in Chapter 4 regarding the necessity of using multiple signatures to quantitatively describe the terrain proves to be true. When comparing the DEM error modelling results for surfaces with different global characteristics, the size of the moving window used in geomorphometric parameter abstraction also has a certain impact on the modelling results. Some understanding of the global characteristics of the surfaces is, therefore, useful in the selection of appropriate/optimal window sizes for the extraction of local measures for DEM error modelling. Based on the important scale breaks identified in the surfaces and the scale levels being considered, it can be expected that certain window sizes will work better than others in characterizing the local roughness and will, hence, be more useful in differentiating various DEM error clusters. Finally, considering the advantages and disadvantages of various global surface characterization methods, the variogram analysis appears to be the most appropriate and useful method for the purpose of identifying important scale breaks in the surface.

7.3  A Practical Guide for the Users

Based on the above conclusions about DEM error modelling, a practical guide to the understanding of DEM errors can be provided from a user's point of view. Following the steps for topographic characterization as illustrated in Figure 4.9, the user should first examine the global characteristics of the surface using the variogram analysis of the study area DEM to find out where the important scale breaks are in the surface. Care should be taken, though, when extracting linear segments from the variogram plot. One or two of the other global characterization methods, such as the grain measure and the spectral analysis, can be used in addition to the variogram analysis to confirm or identify other important scales of variation.
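The first step of the guide, computing a variogram of the study area DEM, can be sketched as follows. This is a minimal illustration and not the variogram procedure used in this thesis: the function name `empirical_variogram` is hypothetical, the DEM is assumed to be a regular grid held as a list of equal-length rows, lags are measured in grid cells, and pairs along rows and columns are pooled into one omnidirectional estimate.

```python
def empirical_variogram(dem, max_lag):
    """Return {lag: semivariance} for lags 1..max_lag (in grid cells).

    dem is a list of equal-length rows of elevations. For each lag h, the
    semivariance is half the mean squared elevation difference over all
    pairs of cells separated by h along a row or along a column.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    semivariance = {}
    for lag in range(1, max_lag + 1):
        total, count = 0.0, 0
        for r in range(n_rows):
            for c in range(n_cols):
                if c + lag < n_cols:  # pair along the row
                    total += (dem[r][c] - dem[r][c + lag]) ** 2
                    count += 1
                if r + lag < n_rows:  # pair along the column
                    total += (dem[r][c] - dem[r + lag][c]) ** 2
                    count += 1
        if count:
            semivariance[lag] = total / (2 * count)
    return semivariance


# Illustrative 4x4 surface tilted in one direction only; not thesis data.
tilted_plane = [[float(c) for c in range(4)] for _ in range(4)]
```

Plotting the logarithm of the semivariance against the logarithm of the lag, and looking for lags at which the slope of the plot changes, is one way to locate candidate scale breaks; how those linear segments are delimited remains a judgment call, as cautioned above.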
Nested analysis of variance can provide only general and limited information on the surface characteristics, and is, thus, of little practical use. The information on the surface global characteristics can then be used to guide the selection of an optimal window size for the extraction of local measures such as slope from the DEM. Since the single variable slope has proven to be sufficient in characterizing the local roughness of terrain for the purpose of DEM error modelling, a multivariate terrain classification would not be necessary. A hierarchical clustering using a single local variable such as slope would result in terrain clusters with varying degrees of roughness at various scale levels.

As demonstrated in this study, the DEM errors/uncertainties show significant variation from cluster to cluster. The rougher the cluster (i.e., relatively greater average slope values), the larger the standard deviation of the errors or the mean of the absolute DEM errors in the cluster. Therefore, the spatial pattern of DEM errors/uncertainties can be understood using the results of topographic characterization. In addition, this information can be used to guide the surface generalization process. For areas with greater uncertainty, a higher degree of generalization would be appropriate, whereas for areas with less uncertainty, a lower degree of generalization would be justifiable.

7.4  Discussions and Future Research

There is no single standardized set of variables that can be used in every instance to classify terrain. For example, an application that is attempting to classify an area's susceptibility to landslides will use different measures than one that is trying to establish physiographic regions. Theoretically, the selection of variables used in a classification should reflect known properties of the phenomena under investigation.
The development of numeric variables to quantitatively express terrain complexity and roughness has been a strongly interdisciplinary effort. Often, mathematicians and computer scientists are responsible for developing new measures, while applications and testing methods are developed in fields such as geography, geology, and psychology [DeLotto, 1989]. In geography, the pioneering work of Strahler [1950] and Bunge [1962] has led to a tradition of surface quantification for geographic representations and modelling. Many of these measures, as described in Chapter 4, have proven to be useful in capturing the information that is required for successful DEM error modelling. There is, however, still a need to search for and develop new measures for more effective quantitative topographic characterization. It is possible, though, that we will never find measures which can capture "everything." Some of the DEM uncertainty could be a result of "randomness" in the topography which is not quantifiable.

Just as the measures used to characterize the complexity of topography are slightly inadequate, so are the classification algorithms used to identify relatively homogeneous terrain clusters. Specific landforms only occur over a limited range of scales. This observational scale-dependency of terrain has proven to be an impediment to the successful implementation of automated terrain classification methods. There is also still room for improving the DEM error models used in this study. These improvements will probably require new methods of spatial statistical analysis.
Finally, considerable future research is needed in the following areas:

1)  In order to reach more generalized conclusions on the usefulness of topographic characterization for DEM error modelling, more tests are necessary to investigate and compare the effectiveness of using other variable groups and moving window sizes for DEM error modelling for various surfaces with different resolutions and distinctive global characteristics.

2)  In addition, further examination of the relation between surface roughness and window sizes using a variety of surfaces of varying roughness and a variety of window sizes should be carried out. This could be done using synthetic surfaces (e.g., surfaces of known fractal dimension).

3)  As mentioned in Chapter 6, the part of the total variation of DEM errors that was not explained by the cluster structure could be attributed in part to the interpolation errors introduced when comparing a DEM of coarse resolution to one of finer resolution. It has been demonstrated that the mismatch caused by using different interpolators is relatively small compared to the mismatch between the DEMs of differing resolution, and the effects of different interpolation methods were therefore ignored in the interpolation procedure for comparison of a lower resolution DEM with a higher one. Nevertheless, there are still some differences between DEMs of the same resolution but interpolated using different methods. Therefore, more research is needed in order to understand the nature of interpolation errors, their relation to the topographic characteristics, and the impact of this relation on DEM error modelling.

BIBLIOGRAPHY

Akima, H. 1978a. "A Method of Bivariate Interpolation and Smooth Surface Fitting for Irregularly Distributed Data Points." ACM Transactions on Mathematical Software, Vol.4, No.2, pp.148-159.

Akima, H. 1978b.
" A L G O R I T H M 526: Bivariate Interpolation and Smooth Surface Fitting for Irregularly Distributed Data Points." ACM Transactions on Mathematical Software, Vol.4, No.2, pp. 160-164. American Society of Civil Engineers (Committee on Cartographic Surveying, Surveying and Mapping Division). 1983. Map Uses, Scales and Accuracies for Engineering and Associated Purposes, New York. American Society of Photogrammetry. 1978. Proceedings of the Digital Terrain Models Symposium, A S P - A C S M , St. Louis, Missouri, May 9-11. American Society of Photogrammetry (Committee for Specifications and Standards, Professional Practice Division). 1985. "Accuracy Specification for Large-Scale Line Maps." Photogrammetric Engineering and Remote Sensing, Vol.51, No.2, pp.195-199. Aronoff, S. 1989. Geographic Information Systems: A Management Perspective, Ottawa: W D L Publications. Balser, R. 1989. "Terrain Resource Information Management (TRIM)--A Standardized GeoReferencing Database for the Province of British Columbia." Proceedings of GIS'89 Symposium, Vancouver, British Columbia. Bethel, J.S. and E . M . Mikhail. 1983. "On-Line Quality Assessment in D T M . " Technical Papers, American Congress on Surveying and Mapping—American Society of Photogrammetry Fall Convention, pp.576-584. Bishop, Y.S., S. Fienberg, and P. Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge: MIT Press. 575p. Blais, J.A.R. 1988. "Digital Terrain Modelling for Spatial Information Systems." Proceedings of the Third International Seminar on Trends and Concerns of Spatial Sciences, Laval University, Quebec. Boehm, B.W. 1967. "Tabular Representations of Multivariate Functions—With Applications to Topographic Modelling." Proceedings A.C.M. National Meeting, pp.403-415. Braile, L . W . 1978. "Comparison of Four Random to Grid Methods." Computers and Geosciences, Vol.4, No.4, pp.341-349. 387  Bunge, W. 1962. Theoretical Geography, Lund, Sweden: C.W.K.Gleerup. Burrough, P.A. 1981. 
"Fractal Dimensions of Landscapes and Other Environmental Data." Nature, Vol.294, 19 November, pp.240-242. Burrough, P.A. 1983. "Multiscale Sources of Spatial Variation in Soil: I. The Application of Fractal Concepts to Nested Levels of Soil Variation, Journal of Soil Science, Vol.34, pp.577598. Burrough, P.A. 1986. Principles of Geographic Information System for Land Resources Assessment, Oxford University Press. Campbell, J.B. 1987. Introduction to Remote Sensing, London: Guilford Press. Carr, J.R. 1995. Numerical Analysis for the Geological Sciences, New Jersey: Prentice Hall. Carter, J.R. 1988. "Digital Representation of Topographic Surfaces." Photogrammetric Engineering and Remote Sensing, Vol.54, No. 11, pp. 1577-1580. Carter, J.R. 1989. "Relative Errors Identified in USGS Gridded DEMs." Proceedings ofAutoCarto 9, American Congress on Surveying and Mapping Bethesda, Maryland, pp.255-265. Caruso, V . M . 1987. "Standards for Digital Elevation Models." Technical Papers, American Society of Photogrammetry and Remote Sensing—American Congress on Surveying and Mapping Annual Convention, 4, pp. 159-166. Cheong, A . L . 1992. Quantifying Drainage Basin Comparisons Within a Knowledge-Based System Framework, M.Sc. thesis, University of British Columbia, Vancouver, B.C., 133p. Christian, C.S. and G.A. Stewart. 1968. "Methodology of Integrated Surveys." Proceedings of the Conference on Aerial Surveys and Integrated Studies, Toulouse, Unesco, pp.233-280. Church, M . and D . M . Mark. 1980. "On Size and Scale in Geomorphology." Progress in Physical Geography, Vol.4, pp.342-390. Clark, W . A . V . and P.L. Hosking. 1986. Statistical Methods for Geographers, John Wiley & Sons, Inc. Clarke, J.I. 1966. "Morphometry from Maps." G.H. Dury (ed.): Essays in Geomorphology, Heinemann, London, pp.235-274. Clarke, K . C . 1987. "Scale-Based Simulation of Topography." Proceedings of Auto-Carto 8, American Congress on Surveying and Mapping, pp.680-688.  388  Clarke, K . C . and D . 
M . Schweizer. 1991. "Measuring the Fractal Dimension of Natural Surfaces Using a Robust Fractal Estimator." Cartography and Geographic Information Systems, Vol.18, N o . l , pp.37-47. Cohen, J. 1960. " A Coefficient of Agreement for Nominal Scales." Educational and Psychological Measurement, Vol.20, N o . l , pp.37-40. Congalton, R.G. 1988a. "Using Spatial Autocorrelation Analysis to Explore the Errors in Maps Generated from Remotely Sensed Data." Photogrammetric Engineering and Remote Sensing, Vol.54, No.5, pp.587-592. Congalton, R . G . 1988b. " A Comparison of Sampling Schemes Used in Generating Error Matrices for Assessing the Accuracy of Maps Generated from Remotely Sensed Data." Photogrammetric Engineering and Remote Sensing, Vol.54, No.5, pp.593-600. Congalton, R.G., R.G. Oderwald, and R.A. Mead. 1983. "Assessing Landsat Classification Accuracy Using Discrete Multivariate Analysis Statistical Techniques." Photogrammetric Engineering and Remote Sensing, Vol.49, No. 12, pp. 1671-1678. Corbett, J.D. and P.J. Gersmehl. 1987. "Terrain Data for a Water Resources GIS." D . A . Brown and P.J. Gersmehl (eds.): File Structure Design and Data Specifications for Water Resources Geographic Information Systems, pp.11-5 to 11-28. Csillag, F., A . Kummert, and M . Kertesz. 1992. "Resolution, Accuracy, and Attributes: Approaches for Environmental Geographical Information Systems." Computer, Environment, and Urban Systems, Vol.16, pp.289-297. Davis, J.C. 1986. Statistics and Data Analysis in Geology, John Wiley & Sons, Inc. DeLotto, J.S. 1989. The Role of Scale in Automated Terrain Classification, M . A . thesis, the State University of New York at Buffalo, Buffalo, New York, 53p. Dozier, J. and S.I. Outcalt. 1979. "An Approach toward Energy Balance Simulation over Rugged Terrain." Geographic Analysis, Vol.11, N o . l , pp.65-85. Dubayah, R. and F.W. Davis. 1988. "Factors Influencing the Utility of Digital Elevation Models in Ecological Research." 
Proceedings of the 3rd International Symposium on Spatial Data Handling, Sydney.
Dunn, G. and B.S. Everitt. 1982. An Introduction to Mathematical Taxonomy, Cambridge University Press.
Eastman, J.R. 1985. "Single-Pass Measurement of the Fractal Dimension of Digitized Cartographic Lines." Paper presented at the annual meeting of the Canadian Cartographic Association, New Brunswick.
Evans, I.S. 1972. "General Geomorphometry, Derivatives of Altitude, and Descriptive Statistics." R.J. Chorley (ed.): Spatial Analysis in Geomorphology, London: Methuen & Co. Ltd., pp.17-90.
Evans, I.S. 1980. "An Integrated System of Terrain Analysis and Slope Mapping." Zeitschrift fur Geomorphology, N.F., Suppl.-Bd.36, pp.274-295.
Felgueiras, C.A. and M.F. Goodchild. 1995. "A Comparison of Three TIN Surface Modelling Methods and Associated Algorithms." Technical Report 95-2, National Center for Geographic Information and Analysis.
Fisher, P. 1991. "First Experiments in Viewshed Uncertainty: The Accuracy of the Viewshed Area." Photogrammetric Engineering and Remote Sensing, Vol.57, No.10, pp.1321-1327.
Fisher, R.A. 1953. "Dispersion on a Sphere." Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, Vol.217, pp.295-305.
Frederiksen, P. 1981. "Terrain Analysis and Accuracy Prediction by Means of the Fourier Transformation." Photogrammetria, 36: 145-157.
Frederiksen, P., O. Jacobi and K. Kubik. 1985. "A Review of Current Trends in Terrain Modelling." ITC Journal, 1985-2, pp.101-106.
Gerrard, A.J.W. and D.A. Robinson. 1971. "Variability in Slope Measurements. A Discussion of the Effects of Different Recording Intervals and Micro-Relief in Slope Studies." Transactions of the Institute of British Geographers, Vol.54, pp.45-54.
Gittings, B. 1994. "Digital Elevation Data Catalogue." GIS-L email network.
Goodall, D.W. 1966. "Deviant Index: A New Tool for Numerical Taxonomy." Nature, Vol.210, pp.216.
Goodchild, M.F. 1980. "Fractals and the Accuracy of Geographical Measures." Mathematical Geology, Vol.12, No.2, pp.85-97.
Goodchild, M.F. 1982. "The Fractional Brownian Process as a Terrain Simulation Model." Modelling and Simulation, Vol.13, pp.1133-1137.
Goodchild, M.F. 1988. "The Issue of Accuracy in Global Databases." H. Mounsey (ed.): Building Databases for Global Science, London: Taylor & Francis Ltd., pp.31-48.
Goodchild, M.F. 1989. "National Center for Geographic Information and Analysis, Research Initiative One: Accuracy of Spatial Databases." Report of Specialist Meeting, University of California--Santa Barbara.
Goodchild, M.F. 1992. Research Initiative 1: Accuracy of Spatial Databases. Final Report, National Center for Geographic Information and Analysis, University of California--Santa Barbara.
Goodchild, M.F. and S. Gopal. 1989. The Accuracy of Spatial Databases, London: Taylor & Francis Ltd.
Goodchild, M.F. and D.M. Mark. 1987. "The Fractal Nature of Geographic Phenomena." Annals of the Association of American Geographers, 77(2), pp.265-278.
Goodchild, M.F. and N.J. Tate. 1992. "Forum: Description of Terrain as a Fractal Surface, and Application to Digital