Toward Verification of a Natural Resource Uncertainty Model

by

Trevor John Davis
B.Sc., University of Victoria, 1992
M.Sc., University of Victoria, 1994

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Department of Geography)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
June, 1999
© Trevor John Davis, 1999

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Geography
The University of British Columbia
Vancouver, Canada

Abstract

Natural resource management models simplify reality for the purpose of planning or management. In much the same way, an uncertainty model simplifies the many uncertainties that pervade the natural resource management model. However, though a number of uncertainty models have been developed, there has been little work on verifying such models against the uncertainty they purport to represent. The central research question addressed by this work is: can a natural resource management uncertainty model be verified in order to evaluate its utility in real-world management?

Methods to verify uncertainty models are developed in two areas: uncertainty data models, and uncertainty propagation through process models. General methods are developed, and then applied to a specific case study: slope stability uncertainty in the southern Queen Charlotte Islands. Verification of two typical uncertainty data models (of classified soils and continuous slope) demonstrates that (in this case) both expert opinion inputs and published error statistics underestimate the level of uncertainty that exists in reality. Methods are developed to recalibrate the data models, and the recalibrated data are used as input to an uncertainty propagation model. Exploratory analysis methods are then used to verify the output of this model, comparing it with a high-resolution mass wastage database—itself developed using a new set of tools incorporating uncertainty visualisation.

Exploratory data analysis and statistical analysis of the verification show that, given the nature of slope stability modelling, it is not possible to directly verify variability in the model outputs, due to the existing distribution of slope variability. However, the verification work indicates that the information retained in uncertainty-based process models allows increased predictive accuracy—in this case, of slope failure. It is noted that these verified models and their data increase real-world management and planning options at all levels of resource management. Operational utility is demonstrated throughout this work. Increased strategic planning utility is discussed, and a call is made for integrative studies of uncertainty model verification at this level.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements

Chapter One: Introduction
1.1. The Problem
1.2. Major Questions
1.3. Research Organisation
1.4. Contribution to Knowledge

Chapter Two: Background
2.1. Introduction
2.1.1. GIS and Uncertainty
2.1.2. Chapter Layout
2.2. Error and Uncertainty
2.2.1. Uncertainty Defined
2.2.2. Quality
2.2.3. Subdivisions of Uncertainty
2.2.3.1. Positional Uncertainty
2.2.3.1.1. Registration
2.2.3.1.2. Other Sources
2.2.3.1.3. Lines and Areas
2.2.3.2. Attribute Uncertainty
2.2.3.3. Temporal Uncertainty
2.2.4. Subdivision by Source
2.2.4.1. Inherent Uncertainty
2.2.4.2. Data Collection and Input Uncertainty
2.2.4.3. Data Interpretation
2.2.4.4. Data Entry
2.2.4.5. Data Manipulation Uncertainty
2.2.4.6. Propagation
2.2.4.7. Generalisation Issues
2.2.5. Measures of Uncertainty
2.3. Uncertainty Modelling
2.3.1. Modelling
2.3.1.1. Modelling with GIS
2.3.1.2. Methods of Modelling
2.3.1.2.1. Bayesian Probability
2.3.1.2.2. Dempster-Shafer's Theory of Evidence
2.3.1.2.3. Non-Monotonic Logic
2.3.1.2.4. Fuzzy Sets
2.3.1.2.5. Linking Fuzzy Sets with Attribute Data
2.3.1.2.6. Combining Fuzzy Classifications
2.3.1.2.7. Cardinal Values
2.3.2. Propagation of Uncertainty
2.3.2.1. Arithmetic Propagation
2.3.2.2. Monte Carlo
2.3.3. Uncertainty in Continuous Data
2.3.4. Summary
2.4. Communication of Uncertainty
2.5. Uncertainty in Forestry Data and Models
2.5.1. Uncertainty in Soil and Terrain Modelling
2.5.1.1. Soil
2.5.1.2. Elevation
2.6. Research Gaps
2.6.1. Uncertainty Modelling
2.6.2. Validation
2.6.3. Linking Uncertainty Management with Decision Making
2.7. Summary

Chapter Three: Modelling and Storing Measures of Uncertainty in Inventory
3.1. Introduction
3.2. Methodology
3.2.1. The Corridor of Transition Model
3.2.1.1. The Semantic Import Model
3.2.1.1.1. Polygon Centres
3.2.1.1.2. Polygon Boundaries
3.2.2. DEM Randomisation
3.2.3. Combining Error and Uncertainty
3.3. Slope Stability Modelling
3.3.1. Combining and Summarising
3.4. Case Study
3.5. The Boundary Model and Attribute Uncertainty
3.6. The Monte Carlo Procedure
3.7. Results
3.7.1. Problems and Work Required
3.7.1.1. Verifying Parameters
3.7.1.2. Prediction
3.7.1.3. Reporting and Communicating Uncertainty
3.8. Summary

Chapter Four: Verification of Model Inputs
4.1. Introduction
4.2. Background
4.2.1. Fuzzy Classification
4.2.2. Maximum Likelihood
4.2.3. Continuous Classes - Fuzzy Clustering
4.2.3.1. Background - Fuzzy Clustering
4.2.3.2. Applying Fuzzy Clustering to Confirmatory Sampling
4.2.3.3. Structure of the Classes in Attribute Space
4.2.3.4. Nature of the Sample
4.2.3.5. Metrics and Measures for Membership Values
4.2.4. Summary of Theoretical Work
4.3. Application to Parameter Verification and Tuning
4.3.1. Samples Required
4.3.2. Methodology
4.3.2.1. Cross-Correlation
4.3.2.2. Data Summary
4.3.3. Results
4.3.4. Applying Changes
4.4. Discussion
4.5. Calibration of Continuous Data
4.6. Conclusions

Chapter Five: Evaluation of Uncertainty Model Output
5.1. Introduction
5.2. Methodology
5.3. Results
5.3.1. Comparison of Means
5.3.2. Alternative Realisations
5.3.3. Variance
5.3.4. Expected vs. Actual
5.3.5. Zonal Spatial Limits
5.3.6. Spatial Constraints
5.3.7. Comparison: Old vs. New
5.4. Discussion
5.5. Conclusions

Chapter Six: Discussion
6.1. Introduction
6.2. Resource Management
6.2.1. Strategic Level
6.2.2. Tactical Level
6.2.3. Operational Level
6.3. Uncertainty Model Validation
6.4. Further Research
6.5. Summary

Chapter Seven: Conclusions
7.1. Research Timeline
7.2. Research Questions

Bibliography

Appendix A: The Hard k-Means and Fuzzy c-Means Algorithms
Appendix B: GPS Accuracy Statistics
Appendix C: Cross Correlograms and Significance Tests for Sample Transects
Appendix D: Uncertainty Visualisation in the Development of a New Data Update Tool for GIS
Appendix E: Development of the Mass Wastage Database

List of Tables

3.1. Misclassification matrices derived from the SI model
3.2. Soil characteristics and estimated standard deviations
4.1. Calculation of the Mahalanobis distance for one sample
4.2. Maximum values of correlation coefficients
4.3. Original and updated misclassification matrices
5.1. Summary statistics for slope stability predictions, based on mean values

List of Figures

1.1. A view of data flow and feedback (verification) loops in resource management
2.1. Location probability of a survey coordinate in 2-D space
2.2. Epsilon boundary model of a digitised line
2.3. Examples of probability density functions for line or digitising error
2.4. Boolean and fuzzy classification models
2.5. Four probabilistic functions of spatial boundary uncertainty
3.1. Alternative centroid models for variable polygon shapes
3.2. The variable ridge model of a polygon's centre
3.3. The 'corridor of transition' model for spatial boundary uncertainty
3.4. Perspective view of the fuzzy surface representing soil type 1
3.5. Perspective view of the fuzzy surface representing soil type 1
3.6. Location of the Louise Island study site
3.7. Detail of transition corridors
3.8. The three types of surfaces resulting from the uncertainty modelling routine
3.9. Maximum likelihood summary of slope stability factor-of-safety
3.10. The spatial distribution of standard deviation of slope stability factor-of-safety
3.11. An example of an application-specific data summary
3.12. The worst-case-scenario summary
4.1. Simplified view of p-dimensional attribute space, fuzzy classes and samples
4.2. Notional distribution of individuals in attribute space
4.3. Hard classes and continuous classes
4.4. Mahalanobis distance in a 3-D attribute space
4.5. Classes viewed as structures in (A,B,C) attribute space
4.6. A new individual at an intergrade position between class A and B
4.7. Centroids
4.8. A sample hyper-polygon
4.9. Sample to class distance defined using fuzzy sets in attribute space
4.10. An overview of transect locations on the Louise Island test site
4.11. Idealised transects and the effects of shifting them within uncertainty bounds
4.12. Simplified examples of cross-correlograms
4.13. The sequence of polygons 'encountered' on each transect
4.14. Differences between measured slope and TRIM modelled slope
5.1. The Lyell Island study area
5.2. Relative frequency using an ML realisation
5.3. Relative frequency using a worst-case realisation
5.4. Factor-of-safety values for slide zones relative to number of cells
5.5. Factor-of-safety values for non-slide cells relative to number of cells
5.6. The previous two figures graphed using cumulative values
5.7. A scattergraph of variance vs. factor-of-safety for slide zones
5.8. Population (random subset) vs. standard deviation
5.9. Worst case realisation factor-of-safety vs. standard deviation
5.10. Worst case realisation using the upper 50% of slide zones
5.11. Relative frequency of the relative position in each slide
5.12. Position of low FS predicted areas relative to slides
5.13. A comparison of predictive accuracy between the original and updated models
5.14. The location of differences between the two models relative to slide zones
6.1. Typical graph of normally distributed uncertainty

Acknowledgements

This project was funded in part by Forest Renewal British Columbia (project #HQ96078-RE). The author wishes to thank the following (in chronological order): Dr. Olaf Niemann for providing the original data and contacts; Dr. Peter Keller for assistance with contacts, grant management and considerable help with brainstorming; my supervisor, Dr. Brian Klinkenberg, who started the ball rolling, brainstormed, edited, and performed all the other thankless supervisory duties; Todd Golumbia and his crew at Gwaii Haanas AMB/Parks Canada for logistical support (and some delicious fish)—specifically, Warden Debbie Gardiner for support on Lyell (and for climbing all those slides) and Warden Dennie Chretien for assisting with the aerial video work; Pat Barrier, Gwaii Haanas GIS analyst, for making data available and also helping with logistics; Macmillan Bloedel and their employees and contractors who assisted with data and logistics on Louise Island; Dr. Rosaline Canessa for assisting with field work and deterring bears; John Barker and others at Western Forest Products for making data available and supporting the research extensions; the staff at the Malcolm Knapp Research Forest for logistical and field assistance; and, throughout it all, my wife Wanda.

Chapter One
Introduction

1.1. THE PROBLEM
Effective natural resource management requires significant amounts of information. To be useful, this information must be up to date and accurate, must cover the entire management area, and must be in a useable form. However, resource management is rarely practised in an optimal information environment. If data are simply missing, out of date, or in the wrong form, the solutions to these problems are often straightforward; however, data accuracy issues are a far less tractable problem. A key ingredient in increasing the effectiveness of natural resource management decision-making is the development of efficient and realistic models of data accuracy.

A model of data accuracy differs from a measure of accuracy. The latter allows only evaluation, while the former allows both evaluation and further manipulation. A number of models have been developed that purport to realistically represent one or more aspects of data accuracy. However, a crucial deficiency exists in this research area. When a standard model is developed, an important phase in the development process is the testing and verification of the model. The question must be asked: does the new representation of the resource or resource-based process (the model) accurately represent the resource or process in the real world? Without this type of verification process the utility of the model is in question. The entire field of data accuracy modelling suffers from a notable lack of verification work.

The research presented in this document focuses on the issue of accuracy (or, to be more precise, 'uncertainty') model verification. The central research question is: can a natural resource management uncertainty model be verified in order to evaluate its utility in real-world management? For example, a model that claims to represent uncertainty in slope stability might be used to reduce the amount of road construction in an area where uncertainty is high. The question becomes: is this justified by the actual uncertainty?

Answering the apparently simple research question involves addressing a wide range of issues.
For one, an uncertainty model can operate at any one of a number of levels, from the representation of an attribute of a particular resource (such as soil cohesion) through to the representation of aspects of the human decision making process (such as risk management in resource allocation). The uncertainties at each level are quite different in nature. Verification at each level will require unique methods. Therefore, it will be necessary to focus on particular aspects.

A second issue in answering the principal research question is that there is no straightforward yes or no solution. There exists no single simple statistic to determine if uncertainty as modelled matches the actual level of uncertainty. An uncertainty model can represent 'soft' information such as 'to what degree can I trust this value', or 'what is the level of risk associated with this decision' (as compared to easily verified 'hard' data such as 'percent slope' or 'soil class'). Therefore, it will be necessary to make use of surrogate measures and exploratory analysis to approach the answer to the principal question.

Nevertheless, the research question is a crucial one. The research presented herein represents a major step in an overall research program intended to integrate uncertainty management into natural resource decision making. This research focuses on modelling and verifying uncertainty at both the data gathering and information product modelling stages. In the process of developing the verification methods and results, some initial work is also performed on the integration of uncertainty models into real world management decision making. Presentation of this latter work serves to highlight the need for integrative research; however, such research can only be effective once all the pieces of the puzzle are in place.
[Figure 1.1. A view of data flow and feedback (verification) loops in resource management. The shaded boxes represent metadata flows. The figure shows four stages—data gathering, data modelling, creation of information products, and utilisation of information products for management and decision making—each paired with a metadata stream and a verification/evaluation feedback loop. Contributions to metadata modelling keyed to the figure: 1. measure and record uncertainty in data (Ch. 3); 1a. evaluate fuzzy and mathematical model parameters (Ch. 4); 2. fuzzy model/mathematical model of data uncertainty (Ch. 3); 2a. evaluate process model (slope stability) uncertainty (Ch. 5); 3. propagation through process model (Ch. 3); 3a. discuss future research (Ch. 6); 4. (touch on) visualisation and risk management (Ch. 3, 6).]

1.2. MAJOR QUESTIONS

The organisation of this research, and how it fits into the flow of information in resource management, is presented in Figure 1.1. This figure represents a simplified typical information flow from initial data gathering through to the decision making stage. Work on uncertainty management superimposes another level onto each box in which the focus shifts to metadata (shaded sections). Metadata is data about data—information that explains the sources, fitness for use, and other similar factors. The feedback loops, representing data verification exercises, are well established for standard data collection. For example, the output of a forest growth model would be calibrated through field checks. However, the feedback loops and verification of metadata do not typically occur.

The first box (data gathering) represents the initial stages at which raw data gathering takes place. The 'data modelling' stage (box two) represents the reduction of these data to a useable form, such as classification or downscaling (although these first two stages are often combined in tasks such as remote sensing). The principal purpose of metadata at the data modelling stage is to quantify how well the data model represents reality. The feedback loop from the data modelling to the gathering stage is typically performed through procedures such as classification accuracy checks or other types of spot sampling. However, the focus of the metadata feedback loop is to determine if the predicted variability matches the variability on the ground. Such variability may be a function of the data gathering (e.g., sensor precision or resolution) or of the classification stage (e.g., reduction from cardinal to interval data). These metadata can be based on a wide variety of items, from the inherent variability in the classification scheme through to spatial variability in accuracies caused by polygonal structures. The inherent complexity of quantified metadata creates an equally complex problem in verification work.
The third box in Figure 1.1 represents information products. These can include anything from simple data overlays (e.g., a soil and ownership layer combination) through to complex simulation models that use many inputs. Again, the feedback loops are well established for standard modelling—one simply compares model output with reality through a sampling scheme. However, metadata propagation procedures for both simple and complex models are a relatively new area of inquiry. This refers to the process of combining the metadata associated with the major inputs to a model in such a manner that the resultant is representative of the model's output uncertainty. There are no established methods for comparing these complex metadata with the variability that exists in reality. While a number of metadata propagation models have been attempted (e.g., Dunn et al. 1990), there have been few or no attempts to verify their utility.

The act of decision-making based on information products is represented in the fourth box. The feedback loop here is the evaluation of the decisions—which would typically lead to better information products. The metadata loop here focuses on evaluating the quantification, summary and presentation of 'risk' as it affects decisions.

The research presented in this document includes contributions to each of the stages presented in Figure 1.1, as summarised at the bottom of the figure. The key contributions focus on the first two feedback loops (1a and 2a). While the third loop (decision evaluation) will be discussed and demonstrated, a proper evaluation would require a far larger project scope. The specific questions asked in this research are:

1. What are appropriate methods for modelling data uncertainty in natural resource management, making use of information typically available?

2. How appropriate are these methods (1), and how can this 'appropriateness' be determined? Specific questions include:

2a. How effective is gathering metadata from expert opinion?

2b. How effective is gathering metadata from published variability statistics?
3. What are appropriate methods for propagating these metadata through to information products (i.e., using a typical type of natural resource model)?

4. How appropriate are these methods (3), and how can this 'appropriateness' be determined?

5. What are some of the implications of the methods outlined above for resource management decision making?

The principal focus of this work is answering questions #2 and #4—those concentrating on verification of metadata. The principal focus of the discussions evolving from these answers is the implications for management—question #5. Questions #1 and #3 will be addressed; however, the discussion will partially draw upon research conducted previously by the author.

This research makes use of specific models and a specific resource sector. The models (slope stability) and the resource sector (forestry management) have been chosen for their broad applicability. The theory and procedures developed herein can be applied to a wide range of models and management regimes. Links to the broader field of 'resource management' are noted throughout the document.

1.3. RESEARCH ORGANISATION

This dissertation is organised as follows: after this introduction, the second chapter contains the background and specific research justification for all that follows. It is demonstrated that the traditional methods of modelling natural resources are inadequate. Uncertainty, uncertainty modelling, uncertainty propagation, and links with management and decision making in the resource sector are each examined in turn. The various research fields are described with an eye to highlighting the factors that link them.

The third chapter presents the methodology and results of earlier uncertainty modelling work undertaken by the author—work that developed the basic forms of the model used in the remainder of this document. In this chapter, one specific method of modelling and propagating natural resource uncertainty is used to highlight the potential of the field.
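As a concrete illustration of the kind of procedure Chapter Three develops, the sketch below pushes uncertain inputs through a simplified, dry infinite-slope factor-of-safety model by Monte Carlo sampling. It is a minimal sketch only: the input distributions, parameter values, and the simplified factor-of-safety equation are assumptions made for this example, not the model or data used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000  # Monte Carlo realisations for one raster cell

# Illustrative input distributions (means and spreads are assumed values)
cohesion = rng.normal(5.0, 1.5, n).clip(min=0.0)  # soil cohesion, kPa
phi = np.radians(rng.normal(34.0, 3.0, n))        # friction angle
slope = np.radians(rng.normal(30.0, 2.0, n))      # slope angle from the DEM
gamma, depth = 16.0, 1.2                          # unit weight (kN/m^3), soil depth (m)

# Simplified dry infinite-slope model: FS = resisting / driving forces
fs = (cohesion + gamma * depth * np.cos(slope)**2 * np.tan(phi)) / (
    gamma * depth * np.sin(slope) * np.cos(slope))

# Summaries analogous to the mean and standard-deviation surfaces of Chapter Three
print(f"mean FS = {fs.mean():.2f}, sd = {fs.std():.2f}, P(FS < 1) = {(fs < 1).mean():.3f}")
```

Run cell by cell across a raster, summaries of this kind yield the mean, standard-deviation and worst-case surfaces of the sort presented in Chapter Three.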
The following chapter presents new theoretical and applied work on verifying uncertainty models and sampling for uncertainty values. The model presented in the preceding chapter is ground-truthed, and techniques are developed to tune the model so that its results match these ground data.

Chapter Five moves to the next stage of verification—with a focus on the metadata generated by the modelling procedure. The output of the model developed in Chapter Three is verified using field data. Methods are developed to address the comparison of variability-focused metadata with yes-or-no confirmation data.

The following chapter takes a further step up in metadata complexity by examining some possible methods of integrating uncertainty management into real-world decision making. This discussion chapter presents examples developed during the verification work, which also serve to demonstrate some possible applications of this research in decision support. Implications for management in both forestry and resource management in general are discussed. Recommendations for future research are also presented here. The final chapter summarises the overall conclusions.

1.4. CONTRIBUTION TO KNOWLEDGE

This research contributes to knowledge in the field of uncertainty analysis on both theoretical and applied fronts. Two major theoretical areas are explored: 1) the integration of confirmatory sampling into a spatially variable uncertainty model, with a focus on the representation of fuzzy classes in attribute space and the development of several measures of comparing samples and classes; and 2) the development of methods for verification of uncertainty model output, with a focus on comparing complex multivariate variability data with ground samples. As a secondary contribution, procedures are also developed for integrating oblique data with planimetric
C h a p t e r T w o w i l l d e t a i l h o w t h e s e t h e o r e t i c a l c o n t r i b u t i o n s f it i n t o t h e o v e r a l l d i s c i p l i n e o f u n c e r t a i n t y r e s e a r c h . A p p l i e d c o n t r i b u t i o n s i n c l u d e : 1) v e r i f i c a t i o n o f ' e x p e r t k n o w l e d g e ' o n s o i l s p a t i a l s t r u c t u r e i n a s l o p e s t a b i l i t y m o d e l ; 2) d e v e l o p m e n t a n d e v a l u a t i o n o f a s e t o f t o o l s fo r m e a s u r i n g s p a t i a l p a r a m -e t e r s f r o m o b l i q u e i m a g e d a t a ; 3) t e s t i n g o f a s l o p e s t a b i l i t y u n c e r t a i n t y m o d e l u s i n g a t e m p o r a l l a n d s l i d e d a t a b a s e d e v e l o p e d w i t h t h e s e t o o l s ; a n d 4) e v a l u a t i o n o f t h e m o d e l l i n g t e c h n i q u e s a n d t o o l s d e v e l o p e d i n 1 - 3 t h r o u g h a c a s e s t u d y . U n c e r t a i n t y i s a c r u c i a l i s s u e i n r e s o u r c e i n v e n t o r y d a t a , a s w e l l a s a l m o s t a l l o t h e r t y p e s o f s p a t i a l d a t a . Y e t i t i s u n l i k e l y t h a t a g e n e r a l p u r p o s e ' e r r o r b u t t o n ' w i l l e v e r b e d e v e l o p e d — t h e p r o b l e m s f a c e d a r e t o o d i v e r s e a n d a l s o too a p p l i c a t i o n - s p e c i f i c . T h e r e a r e s o m a n y w a y s o f a p -p r o a c h i n g t h e p r o b l e m s o f u n c e r t a i n t y m o d e l s , v i s u a l i s a t i o n , s a m p l i n g , e t c . , t h a t d e v e l o p m e n t s i n o n e o f t h e s e a r e a s i s r a r e l y a p p l i c a b l e to o t h e r s . T h e r e f o r e , b y f o c u s i n g o n a c r u c i a l s e t o f p r o b -l e m s (ver i f icat ion) a n d i n t e g r a t i n g t h e m i n t o a ' c r a d l e - t o - g r a v e ' u n c e r t a i n t y m a n a g e m e n t t a s k , it i s h o p e d t h a t t h i s r e s e a r c h w i l l b o t h d e m o n s t r a t e a n d i n c r e a s e t h e u t i l i t y o f u n c e r t a i n t y m o d e l -l i n g for n a t u r a l r e s o u r c e m a n a g e m e n t i n g e n e r a l . 8 Chapter Two Background 2 . 1 .INTRODUCTION 'Gulliver's Travels' contains the story of a cartographer from a small kingdom. In his quest for greater and greater mapping accuracy, he created maps at larger and larger scales. Eventually, he found himself working at a 1:1 scale; unfortunately, there was no kingdom left to describe—it was filled with his creation. Maps and spatial databases are an abstraction of reality. Details are filtered out in order to clarify information. The 1:1 scale map rather defeats this purpose (as well as being somewhat difficult to fold). Sampling, filtering—in fact any abstraction—leaves a gap between representation and reality. This gap, in essence, is data uncertainty. In the days when cartographers risked their lives in leaky boats trying to chart the unknown reaches of the world, uncertainty was, to say the least, very high. Yet no ship captain expected to use these charts in any precise way. It was sufficient to know that there was a big piece of land somewhere in a general westerly direction. Blank areas or straight lines meant unknowns. If a map of a particular piece of coastline was drawn in some detail, you would expect that it roughly corresponded to reality, yet you would be foolish to bet your life on the location of a particular shoal. Uncertainty was built into the structure and conventions of map making. 
In any case—any map was better than no map at all.

As cartography matured and information on maps became more precise, the issue of uncertainty still remained largely part of the map-making process. If the source data did not support it, then a competent cartographer would simply not draw a 1:5,000 map. Within a map, the thickness and style of lines could be used to indicate local spatial uncertainty, or uncertainty in a particular class of objects. Other visual techniques could be utilised to draw attention to degraded or missing information, or source data that had a limited life-span. While control of map making remained within the hands of cartographers, uncertainties were largely understood.

Then came the computer 'revolution'; the control of spatial data began to slip out of the cartographers' hands. In a short span of years the production of maps—formerly solely the realm of specialists—became possible using simply a home computer; a six-year old (or a newspaper columnist) became able to turn out professional looking maps with the touch of a button. Yet such software is only capable of imitating what was the simplest part of the cartographer's job: the mechanical drawing of the map. Communication skills are not so easily emulated. Computers also took the analysis of spatial data out of the cartographer's hands. Automated area and perimeter calculations soon gave way to data overlays, topographic analysis and spatial statistics—all of which are available to anyone who knows which button to press. In the areas of both cartography and spatial analysis the control of data uncertainty was wrested from the arms of cartographers, and soon began to be a problem; or, to be more precise, a non-problem—it was virtually ignored.

2.1.1. GIS AND UNCERTAINTY

Some cartographic and most analytical tasks formerly performed by cartographers have been passed on to geographic information systems (GIS). These often massive programs have revolutionised the manipulation of spatial data. However, some of the basic assumptions and structures built into GIS foster this 'non-problem' of data uncertainty. A GIS enables spatial data to be viewed and manipulated at virtually any scale. It is, in fact, a scaleless working environment. If addressed at all, scale information will normally only accompany data for display purposes (e.g., what size to print a label). A typical GIS stores data at a resolution that is capable of locating a point down to a tolerance of less than the width of a hydrogen atom (using double precision with a local co-ordinate system). It also typically reports all information: co-ordinates, areas, etc., with the same excessive precision. These two characteristics—lack of scale and the reporting of extreme precision—virtually eliminated the implicit recognition of uncertainty found in manual cartography.

After a number of years and a considerable amount of frustration on the part of users, GIS and spatial analysis researchers began to examine how this implicit uncertainty could be made explicit. It took many more years for basic conceptual work to appear, in which the nature of spatial data uncertainty was examined, terms were defined, and eventually standards put in place. This field of inquiry is still in its infancy, due in part to the complexity of the problem, and also in part to the reluctance of users to accept the fact that an answer of lesser precision can be more 'correct'.
2.1.2. CHAPTER LAYOUT

This chapter examines the current state of research into error and uncertainty in spatial data as it relates to the various elements of resource inventory, with a specific focus on forestry aspects. It does not attempt to trace the evolution of thought in this field, as this 'evolution' does not represent a co-ordinated effort towards a clear goal. Instead, what appears in the literature is a haphazard series of incremental steps on many diverse fronts towards numerous disparate goals. Therefore, the chapter is organised in a manner reminiscent of the overall document. Terms and basic concepts are defined and presented first, followed by a discussion of some of the major areas of application. Relevant research into the modelling of uncertainty is then presented, with later sections focusing on output issues relevant to this added dimension of data. The final section details research into uncertainty in natural resource management—in particular forestry data—and its implications for this current research.

2.2. ERROR AND UNCERTAINTY

Geographic information systems can be found on the desks of public utility planners, natural resource scientists, and almost anyone else concerned with spatially referenced data. Utility management focuses primarily on straight lines and unambiguous locations. Detailed GIS co-ordinates and precise analytical routines cater to a utility manager's desire to see the world in black-and-white. In contrast, when the same data structures and models are utilised in the resource sector, they potentially bear little resemblance to the spatial characteristics of the information being captured or modelled. Here, the entities in question might be better represented by shades of grey. This dichotomy between imprecise reality and its precise digital representation has given rise to a rapidly growing research area in which spatial data experts grapple with the implications of uncertainty and error analysis, while cartographers focus on the special problems of communicating uncertainty. As this field has developed, researchers have fanned out across a broad front: advancing error analysis (Chrisman 1989; Chrisman 1991), locational and feature uncertainty analysis (Burrough et al. 1992), error propagation methods (Heuvelink and Burrough 1993), the visualisation of uncertainty (MacEachren 1992; Goodchild et al. 1994), and numerous related topics.

The self-referential problem of uncertainty about uncertainty terminology has been a notable stumbling block in this avenue of inquiry. The primary terms, namely 'error' and 'accuracy', are commonly used interchangeably—compounding the problem. Most writers would agree that 'error' refers to deviations from a 'true' value. Almost all resource data contain some degree of error; however, as the 'true' value is generally unknown, the error cannot be easily quantified and stated in the same manner as errors in a numerical model might be. The term 'accuracy', as defined by Buttenfield and Beard (1994), is a more easily quantifiable alternative. In their definition it refers to measures of discrepancy from a modelled or assumed value.

2.2.1. UNCERTAINTY DEFINED

Here, the term 'uncertainty' is utilised to include both of the above, as well as to extend these concepts. In its broadest sense, uncertainty refers to knowledge of possible deviation from a 'true' value, but without precise knowledge of the magnitude.
It is not as broad a term as data 'quality', which, in its commonly accepted definition (Guptill and Morrison 1995), includes items that cannot be subjected to test or verification, such as lineage or completeness. Uncertainty may exist for many reasons: inability to measure precisely, alterations in values during processing (e.g., manipulation or classification) or, at a more fundamental level, natural variability in the phenomena being measured. Uncertainty is not necessarily an absolute, since the resolution of the dataset or analysis may be a factor.

2.2.2. QUALITY

Uncertainty is the focus of this work, yet the other elements of data quality play important supporting roles. Before delving into the subdivisions of uncertainty, it is important to step back and put it in context with other metadata elements. As introduced earlier, a U.S. committee on data standards (NCDCDS 1988) produced an influential document that attempts to categorise the major elements of data quality. It includes the following:

1. lineage: the history of the data and the operations performed on it;
2. completeness: the extent of data coverage (spatial or attribute) relative to the complete real-world object (e.g., the subset of soil attributes in the database relative to all possible attributes);
3. positional accuracy: the closeness of spatial co-ordinates to the 'true' values (or values accepted as true);
4. attribute accuracy: as above, but with reference to the attributes of the spatial location;
5. logical consistency: for example, the appropriateness of a chosen classification scheme; and
6. temporal information: includes references to periodicity, the temporal range (shelf life) of the data, and other relevant descriptive temporal elements.

These categories function well in a descriptive sense, but are not oriented towards implementation of quality tracking or analysis. In fact, the descriptive nature of these categories has prompted many agencies to implement metadata through descriptive add-ons to their spatial and attribute GIS layers or data products. The US Geological Survey (USGS) and many Canadian government agencies have taken this approach. Given the metadata files and some interpretative information, a knowledgeable user can make general decisions about the utility of a data product for their purposes—sometimes. Lack of standards makes it difficult to compare products from different agencies, or at times even within a single agency. Positional accuracy can usually be described with a small set of numbers; however, an item such as 'logical consistency' can be interpreted in many ways.

Three items from this list do lend themselves to a more quantitative implementation: positional accuracy, attribute accuracy, and temporal information. Only when metadata such as these are stored quantitatively does it become possible to mathematically manipulate these values, to follow them through overlays or models, and to present them visually. A qualitative understanding of the implications of these metadata is still an important ingredient, for only then can a user determine data's fitness for use; however, quantitative metadata expands the utility of this information considerably.
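As a miniature illustration of the difference quantitative storage makes, the sketch below carries the three quantifiable elements as a structured record attached to a layer, rather than as descriptive text. The class and field names are hypothetical, not drawn from any agency standard, and the root-sum-square overlay rule assumes independent errors.

```python
from dataclasses import dataclass

@dataclass
class QuantitativeMetadata:
    """Quantifiable quality elements carried alongside a GIS layer (hypothetical schema)."""
    horizontal_rmse_m: float   # positional accuracy (RMS error, metres)
    attribute_accuracy: float  # e.g., proportion of samples correctly classified
    survey_date: str           # temporal reference (ISO date)
    shelf_life_years: float    # period over which the data remain valid

soils = QuantitativeMetadata(12.5, 0.72, "1996-07-15", 10.0)
terrain = QuantitativeMetadata(5.0, 0.90, "1995-05-01", 20.0)

# Because the values are numeric they can be manipulated: e.g., a crude overlay
# rule that combines positional errors, assuming they are independent.
overlay_rmse = (soils.horizontal_rmse_m**2 + terrain.horizontal_rmse_m**2) ** 0.5
print(f"combined positional RMSE: {overlay_rmse:.1f} m (assuming independent errors)")
```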
A focus on numerical aspects of quality brings the discussion back to the realm of uncertainty.

2.2.3. SUBDIVISIONS OF UNCERTAINTY

By limiting this discussion to the measurement of natural resource data, uncertainty can be subdivided into three broad areas: positional, attribute, and temporal. Positional uncertainty is a well defined topic area in geomatics. Those making positional measurements, notably surveyors, are accustomed to imagining a bell curve of uncertainty that exists along both the horizontal axes (Figure 2.1) and on the vertical axis as well. Much of surveying science is designed to minimise these uncertainty envelopes. However, when the study of uncertainty is expanded to more complex objects than 'points', numerous other issues appear. For example, how does uncertainty vary along the line drawn between two known points? How does uncertainty behave in an overlay of two data layers? The following sections present the primary sources of uncertainty in each of these areas, and provide an overview of research into these topics.

[Figure 2.1. Location probability of a survey coordinate in 2-D space.]

2.2.3.1. POSITIONAL UNCERTAINTY

Uncertainty in position is certainly the most tractable of the issues discussed here. It is the basic problem that cartographers dealt with through line size and scale, and one that continues to occupy much of the efforts of geomaticians and surveyors. The three co-ordinates used to define a point in space may be mathematically compared with 'true' values, providing simple measures of positional uncertainty. Of course, the elusiveness of 'true' values compounds this problem, as does measurement accuracy, data entry error, etc.

2.2.3.1.1. Registration

Positional uncertainty is an issue at several points in the processing of spatial data. The first of these is the registration of the data source to a reference value. This might involve registering an air photo using survey markers and ground control points, or the registration of a satellite image to a reference dataset. Survey markers are undoubtedly the best spatial reference point available. First and second order control points are established with extremely high accuracy, using mathematics that correct for the earth's curvature, as well as triangulation within the survey grid. The co-ordinates of a control point are subject to errors in the reference datum and ellipsoid; however, the magnitude of these errors is very small in a local context.

High order survey control points are rarely used in the registration of satellite images or air photos. Lower order control points, GPS derived control points, or existing planimetric dataset points are more commonly used. Each of these sources is subject to various inaccuracies. Lower order control points are not subject to strict controls over placement, and may have positional errors significantly higher than their 'parents'. GPS points are subject to the many types of error associated with GPS data (see Owens and McConville 1996), and existing datasets have already been subject to registration, and so act to compound uncertainty.

The process of image registration might involve simply shifting the image until the control points line up with minimal error, skewing the image (independent x and y stretching), or 'rubber-sheeting', which allows every co-ordinate in the image to shift. Each of these methods generates non-uniform registration error for every point in the image. However, these values are normally summarised in a single value such as root mean squared (RMS) error (and then typically ignored).
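To make concrete what is lost in that single summary value, the sketch below fits a least-squares affine registration to a set of control points and keeps the per-point residuals alongside the usual RMS figure. The coordinates are invented for illustration.

```python
import numpy as np

# Control points: image (source) coordinates and their ground (target) positions.
# All values are illustrative only.
src = np.array([[10, 12], [205, 18], [198, 240], [15, 233], [110, 120]], float)
dst = np.array([[1002, 498], [1197, 509], [1185, 731], [1004, 720], [1100, 612]], float)

# Least-squares affine fit: dst ~ [x, y, 1] @ coeffs
design = np.hstack([src, np.ones((len(src), 1))])
coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)

residuals = dst - design @ coeffs              # registration error at each point
per_point = np.linalg.norm(residuals, axis=1)  # error magnitude per control point
rms = np.sqrt((per_point**2).mean())           # the usual single summary value

print("per-point errors:", np.round(per_point, 2))
print(f"RMS error: {rms:.2f} (the per-point values are what usually gets discarded)")
```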
Recent research has begun to address this loss of information during registration. New methods of performing and summarising registration accuracy have been developed (Mather 1995; Buiten and van Putten 1997), and methods of employing multiple representations are proving useful (Djamdji 1993; Fonseca and Manjunath 1996). A key element is not to simply perform the best possible registration, but to also maintain the uncertainty information for later processing (e.g., Delavar 1997).

2.2.3.1.2. Other Sources

Positional uncertainty also occurs at later stages of spatial data processing; however, typically the entities being manipulated are of a higher order than points. Both line and area entities (vectors and polygons) are built out of point data, yet involve uncertainties of a different nature (see below). Raster datasets are somewhat simpler; however, uncertainties generated during the manipulation of raster datasets can be complex in nature.

2.2.3.1.3. Lines and Areas

During the period of transition from paper to digital datasets (still continuing in some sectors), manual digitising was the primary method of vector data input. Studies of digitising uncertainty constitute a major part of the field of uncertainty analysis. One of the first important works in this field (Perkal 1966) introduced the concept of an 'epsilon band', which later led to the epsilon distance model of cartographic lines (Peucker 1975; Chrisman 1982). In this model the assumption is that a cartographic line (i.e., the proper location of a feature) is surrounded on each side by an area of constant width epsilon, and that the digitised representation of the line will lie somewhere within that area (Figure 2.2).

[Figure 2.2. Epsilon boundary model of a digitised line, where the true location of the line is assumed to lie within the epsilon band.]
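A minimal reading of the epsilon band model is that a digitised line is acceptable when no vertex deviates from the reference line by more than epsilon. The sketch below tests this with plain point-to-segment distances; both lines and the band width are invented for illustration.

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Distance from point p to the segment from a to b."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab), 0.0, 1.0)  # parameter of closest point on ab
    return np.linalg.norm(p - (a + t * ab))

def max_deviation(digitised, reference):
    """Largest distance from any digitised vertex to the reference polyline."""
    segs = list(zip(reference[:-1], reference[1:]))
    return max(min(point_segment_dist(p, a, b) for a, b in segs) for p in digitised)

reference = np.array([[0, 0], [4, 1], [8, 0], [12, 2]], float)       # 'true' line
digitised = np.array([[0.1, -0.2], [3.8, 1.3], [8.2, 0.2], [11.9, 1.8]], float)

epsilon = 0.5
dev = max_deviation(digitised, reference)
print(f"max deviation {dev:.2f}; within epsilon band: {dev <= epsilon}")
```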
However, database vectors are also used to represent more abstract structures such as soil polygons. Here, uncertainties in spatial locations still apply, but are generally ren-dered insignificant due to the magnitude of uncertainty generated through the abstraction of reality: namely attribute uncertainty. 2.2.3.2. ATTRIBUTE UNCERTAINTY Positional and attribute information are stored separately in most spatial data models. The un-certainties in each are often determined by quite different processes, and in this case are termed separable (Goodchild 1991). For example, the boundary of a clearcut might be digitised from an orthophoto, with the accompanying registration and digitising error; yet the attributes are deter-17 mined through interpretation and field sampling. Uncertainties in these values have no bearing on spatial uncertainty. However, initial forest inventories do not necessarily contain obvious spatial discontinuities be-tween stands. In a typical procedure the stand polygons might be outlined on imagery, followed by an iterative process of ground surveys leading to changes in stand boundaries. Here, positional and attribute uncertainty are not separable, as the process of determining attributes is linked to the process of determining boundaries. A forest cover map would typically contain a mix of sepa-rable and inseparable uncertainties. Most research treats both types as completely separable, although a few studies (notably Mark and Csillag 1989; Brimicombe 1993) attempt a synthesis. It should be noted that many of these problems of separability are partially a result of, or com-pounded by, the data model utilised. Spatial data may be modelled in two basic ways: as discrete objects (e.g., vector model or object-oriented model) or as continuous fields (e.g., raster model). When fields are used, the attributes modelled are usually not sharply bounded, and so separabil-ity is less of an issue (Goodchild 1989). In contrast to positional uncertainty, attribute uncertainty has received considerably less atten-tion in research and analysis. This is hardly surprising, as the dimensions of the problem are vast, and many of the uncertainties resist easy quantification. Attribute uncertainty can arise in several general ways. Error or imprecision in field measurements is perhaps the simplest to deal with. More complex are uncertainties generated due to the way the data object represents com-plex reality. For example, a point object might be used to represent a city at a particular scale. Polygons are often used to represent transitions between soil types. The transition, which extends over some distance, is represented by a sharp discontinuity. The attributes assigned to such a polygon will have varying degrees of validity throughout its spatial extent (Mark and Csillag 1989). The simplification of reality necessary for data storage and modelling imposes these types of attribute uncertainty. Simplification of attributes themselves also generates uncertainties. The process of classification splits a continuous reality into discontinuous parcels for ease of sampling, storage and analysis. 18 However, the taxonomy involved may produce an incomplete or even misleading description of the attribute. Problems include: Internal purity: the degree to which a random sample matches the class descriptor can be abys-mally low for some data types—notably soils. Class boundaries: some classes may be functionally similar, visibly similar, or both. Other classes may be extremely distinct. 
Rigid class boundaries do not allow this distinction. A sample is either class A or class B, even if its properties are similar to both.

Sampling error: field or laboratory error introduces a random element into classification or study comparison.

Although multivariate statistical techniques can often ameliorate such problems, the resulting indices or principal component scores are not necessarily easy to interpret (Burrough 1989). Once attributes are combined in a modelling scenario these uncertainties can play havoc with the results. Within a particular speciality the user often has some implicit understanding of the attribute uncertainties. However, when complex environmental models draw from numerous disciplines for their source data, a lack of attention to uncertainty at the source leaves the model's user with no choice but to trust the datasets implicitly. It then becomes difficult to estimate even the general variability in the results.

Some researchers have proposed alternative data structures that better describe attributes without abandoning the concept of a categorical coverage. Spline values might be used to describe environmental gradients within a polygon (Herring 1991). Fuzzy classification methods that recognise the variability within and between classes have been proposed (Hall and Wang 1992; Burrough et al. 1992) and implemented in numerous disciplines, such as forestry (Palubinskas 1994) and earth science (Du and Lee 1996).

Others have noted the inadequacy of vector-based categorical data structures to model many types of natural resources. Vector structures were the only alternative when computing power was minimal relative to the amount of data in a large inventory. Today, as many data sources are raster based, the advantages of continuous data structures are becoming obvious. Researchers such as Mark and Csillag (1989) point to the advantages of a native raster structure to represent both spatial and attribute uncertainty. Implementations include soil databases (Rogowski 1996), model propagation (Mowrer 1995) and fire modelling (Delavar 1997), among others.

2.2.3.3. TEMPORAL UNCERTAINTY

Other than subsurface geologic maps, most natural resource databases are dynamic to some degree. The nature and extent of temporal uncertainty will vary with the resource being mapped. The principal sources of temporal uncertainty include: 1) gradual change, such as tree growth, succession, or urban expansion; 2) cyclical change, such as variations in deciduous canopy between summer and winter; and 3) uncertainty due to measurement, where measurements may be spread over time, or analysis is delayed relative to measurement. The first of these, gradual change, is of primary interest in forestry and most other types of GIS analysis.

Forestry is a particularly good example of temporal change in a heterogeneous environment. Gradual change in forest inventory is normally accounted for using a periodic inventory cycle. In British Columbia (BC) the cycle is approximately two years; cycle times in other areas vary (e.g., the Province of Quebec uses ten years). Natural forests are spatially heterogeneous, making stand delineation an uncertain and often unrepeatable exercise. Estimates of map accuracy from photointerpretation of forested areas indicate that disagreement is as high as forty to fifty percent (Edwards and Lowell 1996). The natural heterogeneous forest also changes in different ways, and at different rates, making it a challenge to model.
Temporal uncertainty models for forestry must focus on several issues. Of particular importance is the variability in stand boundaries over time, due to both change in the forest and photointerpretation uncertainty. Also important is the uncertainty in model results. For example, a growth model is based on data collected at a particular date from point samples within a forest stand polygon. The output of this model develops greater uncertainty as time passes, that uncertainty being based solely on model precision. While this uncertainty is commonly reported, it is rarely integrated with uncertainty in data collection, uncertainty in volume estimates, uncertainty in other stand attributes, and boundary variability. The resulting change in uncertainty over time is a function of spatial and attribute variability over time, as well as built-in model uncertainty.

Although pure computer research deals with temporal uncertainty databases separately, as in data structures (Kanazawa 1994) or computer vision (Rohrer and Sparks 1993), natural resource research focuses on integration of spatial and temporal elements. Examples include forestry work (Lowell et al. 1996), soils (Or and Hanks 1992), and fisheries (Hougard and Valdez 1994).

2.2.4. SUBDIVISION BY SOURCE

Spatial, attribute and temporal uncertainties are clearly linked in complex ways. Although this three-way subdivision may be useful for basic research, from an operational perspective it is more beneficial to subdivide uncertainties by their sources. Where uncertainties are currently addressed in spatial database management it is usually in this manner. From a source perspective there are three main areas: inherent uncertainty, uncertainty in data collection and input, and uncertainty in data manipulation.

2.2.4.1. INHERENT UNCERTAINTY

Natural vagueness, also referred to as conceptual error (Veregin 1989), inherent uncertainty (Lanter and Veregin 1992) and inherent property error (Maffini et al. 1989), occurs in data that possess no standard for comparison of measurement. Beyond a certain point, increases in sampling density or in the precision of instruments used will not result in any increase in information content. Natural vagueness may be due to natural variations in the source or due to an inability of the chosen data model to fully encompass all properties of the source. Soil polygons are a commonly cited example of this problem (e.g., Burrough 1986b; Kollias and Voliotis 1991; Burrough 1993) due to the inability of the polygonal structure to address gradual changes over space. Other examples of natural vagueness include mobile species, seasonal fluctuations, and quantities that simply cannot be measured with available techniques. For example, radar images of an ice pack may show well-defined lines, yet the constant movement of the ice introduces uncertainty.

2.2.4.2. DATA COLLECTION AND INPUT UNCERTAINTY

Two field scientists, independently studying the same resource, will rarely generate the same data. This is a well-recognised phenomenon, and many data gathering techniques are designed specifically to minimise this observer bias. However, it remains an issue in uncertainty management because there is rarely sufficient time or money to complete the number of samples needed to minimise such bias. When 'judgement calls' are made, they introduce a subjective element that is very hard to quantify without repeating the entire process.
The problem of observer bias will vary in severity between different classes of resource survey and different resources. Rapid reconnaissance surveys, such as shore-zone typing (e.g., Howes et al. 1994), will be particularly susceptible to this effect.

2.2.4.3. DATA INTERPRETATION

The precision of sampling instruments is rarely a problem in natural resource surveys. Data interpretation uncertainty arises when the data generated by those instruments are manipulated into a form suitable for storage and analysis. Field samples often require the application of a variety of inference techniques to estimate the distribution between sample sites. For example, soil or forestry point samples on the ground are combined with spatial parameters inferred from remote sensing to derive a polygonal distribution. Although the point samples may be precise to the nth decimal place, their purpose is to describe a local average of conditions. A good sampling scheme will pick up the main components of this variation; however, between the samples the data are always inferred. Once the procedures are complete, an inferred data point is indistinguishable from a sampled one.

The data gathering and interpretation stages are also subject to mistakes in the sampling process. Although commonly termed 'errors', the narrow definitions employed in this research field require a separate term to describe accidents or mistakes, rather than deviations from a known value. The term most commonly used is 'blunders' (although this departs from the dictionary definition, which focuses on 'gross mistakes'). Such blunders can occur in the field (mislabelling, misreading, wrong position, etc.) or in lab analysis. The statistical likelihood of such blunders can be estimated, providing further information about uncertainty in source data.

When remotely sensed data are the primary data source there are a number of other inference uncertainties to account for. Air photos commonly have several geometric problems that must be corrected prior to use, including tilt displacement, radial displacement and topographic displacement. They are also subject to distortions such as atmospheric refraction, lens irregularities, film or print shrinkage and image motion. The process of correction is often one of inference based on other information (e.g., correcting topographic displacement with elevation data) and can introduce its own uncertainties into the process.

Satellite images can only measure the reflectance of an object to radiation at various wavelengths, and therefore introduce uncertainty in the interpretation of these data. There are also a number of factors that intervene between the source and the sensor, such as atmospheric haze, path radiance, or variations in solar angle. Satellite and other digital remote sensing devices are also subject to many of the geometric distortions mentioned above. Once again, while some of these distortions can be corrected, each correction introduces uncertainty into the data collection process.

2.2.4.4. DATA ENTRY

When data are entered into a system using some manual means, there are a number of opportunities for uncertainty to appear. Digitising, one of the more studied sources of uncertainty (e.g., Chen and Finn 1994 or Youcai and Wenbao 1997), can include registration skewing, variability in line location and outright blunders such as mislabelling.
The source map being digitised is also a source of uncertainty due to printing registration problems, stretching of the medium or thickness of lines. For example, the area covered by lines (i.e., underneath them) on a map represents an area of uncertainty. In one study, Burrough (1986b) notes that as much as ten percent of the total map area of a 1:25,000 soil map consists of lines.

2.2.4.5. DATA MANIPULATION UNCERTAINTY

The process of manipulating data can include measures as simple as classifying a cardinal measure, or as complex as the simulation of an ecosystem. While the process of digital data manipulation itself produces rounding errors, such problems are dwarfed by uncertainties introduced through simplification and combination of different types of data.

The process of classification is rarely simple; it involves a number of subjective elements. The choice of the appropriate classification procedure is not always straightforward; different procedures often produce different numbers of classes and different class boundaries. Even if the class divisions are mathematically obvious, there is still the question of appropriate representation of the source. The purpose of the classification is also a factor. Class divisions are often chosen based on their appropriateness for a particular purpose, such as soil classes for slope stability analysis or forest classes for maximising profitability. However, such special-purpose classified data are often made available for other purposes. The secondary user is then faced with suboptimal class definitions and therefore a heightened level of uncertainty.

A related problem (although also related to issues discussed under 'data gathering') is that of cell value averaging. An individual pixel of a remotely sensed image represents the reflectance of all surface features within its bounds, as well as some from adjacent pixels due to factors such as atmospheric haze and viewing angle. If the resolution of the sensor is appropriate for the phenomena being measured, then this is not a crucial issue. All too commonly, however, the pixel is larger than the target. This basic variability in reflectance leads to many of the classification problems discussed above. Although it can be reduced by processes such as spectral unmixing (e.g., Mathieu et al. 1994), it still remains an issue for uncertainty management.

2.2.4.6. PROPAGATION

Much of the functionality of a GIS lies in its ability to combine two or more maps for the purpose of analysis. The complexity of this combination ranges from Boolean functions between raster layers, through topological polygon overlay, and right up to the integration of environmental simulation models within or closely linked to a GIS. Even one of the most basic GIS functions, topological overlay, generates data uncertainty that is difficult to both understand and quantify. Sliver polygons that result from such overlays may be spurious or may actually represent information. Determining which is often a difficult task. Attempts to estimate cumulative errors (e.g., Newcomer and Szajgin 1984) have led to the general conclusion that, at best, the accuracy resulting from digital overlays is less than the accuracy of the least accurate input layer. This upper bound occurs when all uncertainties spatially coincide. At worst, when they are not coincident, accuracy can be much lower.
At the upper end of the complexity scale, spatial analysis and simulation models often perform an elaborate series of operations in order to make their projections. Trying to derive statistical estimators of the uncertainty propagated through such models is generally an intractable problem (Mowrer 1995; 1997).

2.2.4.7. GENERALISATION ISSUES

Uncertainty is also a function of the difference between the scale of the source data and the scale of use. A substantial difference between the two can, and often does, lead to problems at the analytical stage. The definition of 'substantial' depends upon the nature of the data, the type of analysis, and the traditions of the discipline. Some data are relatively scale invariant. The 49th parallel that divides much of the U.S. and Canada is a line that will appear the same no matter what the scale of the map. In contrast, a coastline or a road system would be displayed with a greater or lesser degree of complexity depending upon the operational scale. This type of scale variability is not only a feature of data display, it is also important in analysis. A soil or forestry map at a 1:5,000 scale would present different attributes than a map at 1:250,000. Although the resource itself remains the same, the type of analysis performed on a handful of forest stands would be substantially different in nature from one performed on an entire forest district.

Data gathered at one scale can often be utilised at another if generalisation procedures are performed. Normally, one would only move from large to small scale; however, specialised procedures may allow some movement in the other direction. Generalisation involves a complex set of data manipulation procedures that can act on both spatial and attribute data, and as such generate data uncertainty. However, as the scale of analysis changes so does the tolerance for such uncertainty. Generalisation may, therefore, result in a decrease in data uncertainty in some cases.

2.2.5. MEASURES OF UNCERTAINTY

Measures of uncertainty are dependent upon the type of data under consideration. Spatial data uncertainty will typically be measured using standard circles (assuming x and y are dependent) or error ellipses (x and y independent). The z dimension is normally reported separately, as the data source is usually independent of the others. A more complex analysis might include the distribution of error, typically assumed to be a normal curve. Unfortunately, many of the simpler measures do not lend themselves to further analysis. For example, the USGS type of spatial standard (also employed by the BC government for their terrain data) uses statements such as "Ninety percent of all well-defined planimetric features shall be co-ordinated to within 10 metres of their true position" (SRMB 1990, pp. 4-5). In the absence of additional information one could assume that the other ten percent might be located anywhere.

Analysis of continuous thematic data might involve statistical measures such as dispersion, measures similar to those employed for spatial data, or implicit measures such as monthly precipitation graphs indicating climatological variability (Buttenfield 1991). Categorical data often utilise an index of classification accuracy computed from a classification error matrix. The matrix consists of a cross-tabulation of estimated and actual thematic values for a sample of points. In such a matrix, element c_ij represents the number of points assigned to class i that actually belong to class j.
Accuracy indices include: a) the kappa (or khat) statistic, which accounts for correct classifications that occur by chance alone (Hudson and Ramm 1987; for examples see Stehman 1996; Naesset 1996); b) user's and producer's accuracy, which account for the accuracy of individual thematic classes (Aronoff 1982); and c) the PCC statistic (proportion of points correctly classified), which may be viewed as the probability that a point selected at random from a layer is correctly classified (Lanter and Veregin 1992). Alternatives to the matrix approach include area comparisons between polygons and ground survey results, or computation of positional error in polygon boundaries arising from classification error (e.g., Hord and Brooner 1976).
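A minimal sketch of two of these indices, computed from a small error matrix whose values are invented for illustration (rows are mapped classes, columns are reference classes):

    def pcc_and_kappa(matrix):
        """Proportion correctly classified (PCC) and the kappa statistic from
        a square classification error matrix, where matrix[i][j] is the
        number of points mapped as class i that actually belong to class j."""
        n = float(sum(sum(row) for row in matrix))
        observed = sum(matrix[i][i] for i in range(len(matrix))) / n
        # Agreement expected by chance: sum of the products of the marginal
        # row and column proportions.
        chance = sum((sum(matrix[i]) / n) * (sum(row[i] for row in matrix) / n)
                     for i in range(len(matrix)))
        return observed, (observed - chance) / (1.0 - chance)

    # Hypothetical three-class error matrix (120 sample points).
    m = [[45, 4, 1],
         [6, 30, 4],
         [2, 5, 23]]
    print(pcc_and_kappa(m))  # PCC is about 0.82; kappa is about 0.72

Because kappa discounts chance agreement, it is lower than the raw PCC for the same matrix; the gap widens as the class marginals become more uneven.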
The principal problem with all of these measures is that they simply report a generalised account of uncertainty at a certain stage of the process. The numbers may be useful for ascertaining whether data or results are useful for a particular task, but do not offer much help in examining the spatial variability of uncertainty, what occurs when data are combined, or what the implications of the uncertainty are for a particular task or decision. Such tasks are only possible when uncertainty becomes part of the data modelling process.

2.3. UNCERTAINTY MODELLING

The term 'uncertainty modelling' is used rather loosely in the research literature. It is important to distinguish between the modelling of uncertainty, and uncertainty integrated into the modelling process. The modelling of uncertainty is the same as other types of scientific modelling: it is an approximation of how some aspect of the world works. In general, the modelling of uncertainty refers to the concepts, methods, algorithms and data structures that allow uncertainty to be represented in a useable format, compressing the complexity of the real world. The measures of uncertainty introduced above are models of uncertainty; however, they constitute a very high degree of data compression: often a single number represents an entire layer of data. More complex methods of modelling uncertainty will recognise spatial, attribute and/or temporal variability.

The goal of uncertainty modelling is the appropriate representation of uncertainty within a data structure. It is difficult to specify an 'appropriate' model if the context is not specified. This problem represents one of the major drawbacks of 'pure' research into models of uncertainty. In contrast, the integration of uncertainty into environmental modelling maintains a focus on the environmental model itself. Uncertainty models are an essential part of the process; however, the choice of model(s) is based on a number of other factors. Uncertainty estimates and models are utilised with an eye to their integration with other types of data, their ability to function within the modelling software, and the possibilities for propagating the information through to the environmental model's results. Other important constraints include the overall purpose of the modelling exercise (which determines the degree of uncertainty tolerance) and the overall complexity of the process relative to the computing facilities available.

This section will focus on both the structures used in uncertainty models and the process of environmental modelling as it relates to uncertainty management.

2.3.1. MODELLING

Most branches of science concentrate on the specific physics and chemistry required to develop functional models. Earth sciences have been no exception; hydrologic, erosion and other models, simple or complex, have focused primarily on the processes, not the distribution of properties. Only recently has it become expedient to extend such 'lumped' process models to distributed models that attempt to include spatial distribution or transport across the landscape. Huge increases in data availability through satellite imagery, coupled with exponential increases in computing power, have enabled this recent shift.

However, these specialists do not necessarily understand all the ramifications of spatially distributed data. The spatial distribution of properties can be complex. Scale changes can affect these properties substantially. Changes over time also affect the properties of distributed models. Although surveyors, spatial analysts and some hydrologists have studied a number of these topics, traditional divisions between the branches of science have slowed the cross-fertilisation necessary to properly develop distributed models.

2.3.1.1. MODELLING WITH GIS

Distributed models will typically be linked in some way with a GIS. The model may be run externally, using the GIS as a data source and method of display, or internally, utilising standard analytical functions. The former method is advantageous when the model is complex, has been previously developed in a particular programming language, or requires specialised hardware. The latter (running the model inside the GIS) has the advantage of minimising data translation problems; however, most GIS programs only provide a range of general-purpose functions. It may be difficult to express a complex model in terms of such functions and, even if such expression is possible, the generalisation may also lead to suboptimal computation time.

One of the crucial issues in this migration of models to a distributed, GIS-linked form is the lack of techniques for determining model reliability. Often, the only criterion of quality in GIS-linked models is the cartographic display of the results (Burrough 1993). Even standard non-distributed models suffer from this problem. Users, particularly non-specialists, often accept the simulated results without adequate validation. In fact, there are few standard methods for validating models, and for some models validation is difficult or impossible. Even if a simulation model is validated it may, through reliance on empirical relationships, not describe the underlying process correctly. This dearth of techniques for analysing, propagating and reporting error in distributed models has been addressed by a recent surge of research. Some methods utilise derivations of standard techniques, while others borrow from related fields. The following section details some of the more common techniques available to address the modelling of uncertainty for the purpose of determining the reliability of natural resource models.

2.3.1.2. METHODS OF MODELLING

In the sections above a number of methods of modelling spatial uncertainty in basic GIS entities have been discussed. Uncertainty in points, lines and polygons (the traditional reductionist entities) can be addressed with a number of statistical measures. Often a single measure refers to all entities within a data layer; however, the standard circular error, epsilon band or other measure could be stored as an attribute of each object.
The methods of modelling uncertainty presented thus far focus on describing the variability between where an entity is in space and where its database representation places it. Attribute uncertainty is a different matter entirely. There are two basic types of attribute value: classified (nominal or ordinal) values such as soil classes, or continuous (cardinal) values such as elevation. Classified values are the more intractable of the two for uncertainty management. Although some classifications refer to easily-defined, sharply-bounded areas such as bedrock zones or lakes, in many cases of ordinal uncertainty there is no 'true' value for comparison, and problems introduced earlier, such as internal purity, class boundaries and sampling error, increase the complexity of the problem of representing a complex reality. This section describes some methods that are available to address ordinal attribute uncertainty. A number of these methods are drawn from expert system analysis, which focuses on 'ill-structured problems' using non-dichotomous structures (i.e., more-or-less structures rather than yes-or-no). The method of fuzzy set theory, which forms the basis for the model presented in the following chapter, is discussed in detail. The section concludes with a presentation of several alternatives for propagating uncertainty models.

The quantification of uncertainty has been studied primarily in reference to expert systems. Under development in many different fields, expert systems utilise a series of carefully formulated rules to come to a specific conclusion or offer a set of alternatives. Uncertainty metadata is required to navigate some of the more complex decision-making functions, such as determining the strength of rules, when to apply them, and how to resolve conflicts between the rules (Winston 1984). The methods developed during the evolution of expert systems can also be applied in quantifying and manipulating classified attribute uncertainty in resource data. There are four approaches that have been commonly used to generate such metadata: Bayesian probability, the Dempster-Shafer theory of evidence, non-monotonic logic, and fuzzy set theory.

2.3.1.2.1. Bayesian Probability

Probability theory is the earliest formal approach applied to quantifying uncertainty and, therefore, has received the most attention in expert system design. This theory translates uncertainty into a rigorously formal definition easily utilised by expert system designers. The probability of a hypothesis represents a number between zero and one indicating the belief in that hypothesis. If we have an observation and wish to compute the probability that a hypothesis is true given that observation, we can do so if we have two items of information:

1. the likelihood that the observation will occur if the hypothesis is true; and
2. the prior probability of that hypothesis being true.

This is termed the 'conditional probability', and is calculated using the following formula, Bayes' Theorem (Stoms 1987):

    P(H|D) = P(D|H) P(H) / P(D)    (2.1)

where H is the hypothesis and D is the observation.
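A minimal sketch of this calculation, with invented probabilities; the denominator P(D) is expanded over H and its negation, which assumes the two hypotheses are exhaustive:

    def posterior(likelihood, prior, likelihood_not_h):
        """Bayes' Theorem (equation 2.1): P(H|D), with P(D) expanded as
        P(D|H)P(H) + P(D|not H)P(not H)."""
        p_d = likelihood * prior + likelihood_not_h * (1.0 - prior)
        return likelihood * prior / p_d

    # Hypothetical example: D is an observed landslide scar on imagery,
    # H is the hypothesis that the slope class is unstable.
    print(posterior(0.8, 0.1, 0.2))  # about 0.31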
The mathematical manipulation of probability with Bayesian methods has several drawbacks when applied to uncertainty. First, the theory assumes that probabilities can be assigned with great precision by experts in a consistent way, often an unreasonable expectation. Second, there exists no consistent and fully objective method of rating the probability assignments; some may be based on thorough research while others may simply be guesswork. When we know nothing, the theory requires us to assume equal probabilities. A third criticism, voiced by Gordon and Shortliffe (1992), is that committing partial belief to a hypothesis commits the remaining belief to its negation, which can be counter-intuitive. Bayesian probability theory is therefore best at dealing with uncertainty due to randomness or variability rather than vagueness or imprecision (Stoms 1987; Zimmerman 1990).

2.3.1.2.2. Dempster-Shafer's Theory of Evidence

Dempster and Shafer's theory (Shafer 1976) focuses on the quality of evidence rather than the truth of a hypothesis. A zero-to-one rating is applied relating to the chance that evidence demonstrates the truth of the hypothesis. Evidence is accumulated to narrow down the hypothesis set using convergence of evidence. Two functions are applied: Bel[H] measures the probability that the evidence implies H, and is therefore the lower bound on the probability that H is true. A plausibility measure, PL[H] = 1 - Bel[not H], represents the upper bound: the degree to which the evidence fails to refute the hypothesis. The range between the two, [Bel[H], PL[H]], indicates the incompleteness of evidence for H due to uncommitted support. Utilising the notation [lower bound, upper bound], this implies the following:

• H[0,1] - no knowledge at all concerning H;
• H[0,0] - H is certainly false;
• H[1,1] - H is certainly true;
• H[.25,1] - evidence provides partial support for H;
• H[0,.85] - evidence provides partial support for not-H;
• H[.25,.85] - simultaneous partial support for H and not-H.

Support may be distributed among several hypotheses when evidence does not support a single one. The reliability of the source may be accounted for by discounting evidence for all hypotheses. Dempster's rule of combination allows pooling of multiple pieces of independent evidence, focusing on the intersection of their independent conclusions. For example, suppose a set of airborne multispectral data has been gathered regarding vegetative reflectance of a specific tree species, and the probabilities for the cause of a specific anomaly have been estimated as follows:

• p(a) = .25 that the trees are water-stressed;
• p(b) = .15 that the trees are nutrient-stressed;
• p(c) = .40 that the trees are insect-stressed; and
• p(d) = .20 that the cause is unknown (represents distributed support for the above).

The belief value for p(a) (the trees are water-stressed) is [.25,.45], the second measure being derived from 1 - .15 (nutrients) - .40 (insects). The calculated belief values could then be combined with other values generated from evidence such as soil samples or rainfall data utilising Dempster's Rule.
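A minimal sketch of the belief-interval calculation above, assuming each cause is a singleton hypothesis and the 'unknown' mass represents uncommitted support (combination of multiple evidence sources via Dempster's Rule is omitted):

    def belief_interval(masses, hypothesis):
        """[Bel, PL] for a singleton hypothesis. Bel is the mass committed
        to the hypothesis; PL is one minus the mass committed to the
        competing singletons, so the uncommitted mass widens the interval."""
        bel = masses[hypothesis]
        pl = 1.0 - sum(m for h, m in masses.items()
                       if h not in (hypothesis, 'unknown'))
        return bel, pl

    # Tree-stress example from the text.
    masses = {'water': 0.25, 'nutrient': 0.15, 'insect': 0.40, 'unknown': 0.20}
    print(belief_interval(masses, 'water'))  # (0.25, 0.45)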
This theory has been criticised for its lack of attention to the content of evidence. If two pieces of evidence conflict there is no mechanism for addressing the reasons for the conflict. Instead, conflicts generate indeterminate results or a diffusion of support among multiple hypotheses.

2.3.1.2.3. Non-Monotonic Logic

Non-monotonic logic utilises non-quantitative reasoning modelled along the lines of certain human decision-making processes. When evidence is lacking, a logical conclusion is to expect default values. Multiple inferences are allowed, generated from a set of default axioms (Cohen et al. 1985). A list of observations is also required, separated into those that would prove the assumption true, and those that would prove it false. These act as the rules that drive the system.

A system utilising non-monotonic logic begins the process of reasoning to a conclusion by making decisions based on rules. General rules such as 'increased timber production reduces wildlife habitat' might be applied at one particular branch point in a land-use decision. However, if this inference is proven false at some later point by an observed increase in wildlife, the system backtracks to the branch point that led to the false inference and continues searching until a consistent set of assumptions and facts is found. This method has several drawbacks that limit its application in real-world situations: it contains no method of deciding which assumption to reject from a set of contradictory ones, nor does it recognise degree of conflict; a direct contradiction is addressed in a manner similar to minor inconsistencies. However, it does provide a method of dealing with incomplete evidence by making best use of defaults.

2.3.1.2.4. Fuzzy Sets

Fuzzy sets have a superficial similarity to Bayesian probability; they represent an uncertainty gradient using numbers between zero and one. However, fuzzy sets are considerably different in concept and, therefore, in application. The numbers represent a degree of membership in a set rather than the chances of probability theory. The implications of this 'degree of membership' mirror the nature of imprecise data, making fuzzy set theory a prime candidate for inclusion in an uncertainty model.

In 1965 L.A. Zadeh introduced fuzzy set theory to a sceptical audience of mathematicians. It has since blossomed into an industry that produces billions of dollars worth of fuzzy products. At the heart of the difference between classical and fuzzy set theory is something Aristotle called the law of the excluded middle. In standard set theory, an object either does or does not belong to a set; there is no middle ground. The number seven belongs to the set of odd numbers and not at all to the set of even numbers. This principle preserves the structure of logic and avoids the contradiction that an object both is and is not a thing at the same time.

Multivalent or fuzzy sets allow degrees of membership. Items belong only partially to a fuzzy set. They may also belong to more than one set. Fuzzy set theory does not contradict classical set theory, but acts as a generalisation in situations where the class boundaries are not, or cannot be, sharply defined. Applications are numerous: fuzzy sets allow a mathematical way to express vagueness in language, a structure for acting on imprecise information and, key to this discussion, a method of combining and manipulating imprecise input sets. For example, the concept of 'moderately well-drained soil' does not require a strict class allocation, but might be better served by a quantitative judgement that allows partial membership. 'Fuzzy logic' refers to the rules for manipulating these non-standard class functions as defined by the mathematics of fuzzy set theory.

As with class intervals in crisp sets, the choices governing membership functions in fuzzy sets determine the utility of the model. The function utilised should ensure that the grade of membership is maximised at the centre of the set and falls off in an appropriate way to the regions outside the set. Burrough (1989) utilises a common function that can be adapted to specific requirements:

    MF(x) = 1 / (1 + a(x - c)^2)    for 0 <= x <= P    (2.2)

where a is a parameter governing the shape of the function and c defines the value of the property x at the function's centre. By varying the value of a, the form of the function and the position of the crossover point (usually 0.5, where the Boolean-style maximum likelihood would shift from one class to another) can be easily controlled.
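A minimal sketch of this membership function (equation 2.2), with illustrative parameter values chosen so that the 0.5 crossover falls a fixed distance either side of the class centre; the 'moderate slope' class and its parameters are invented for demonstration:

    def fuzzy_membership(x, a, c):
        """Bell-shaped fuzzy membership function after equation 2.2:
        MF(x) = 1 / (1 + a * (x - c)**2). The grade is 1.0 at the class
        centre c and falls off at a rate controlled by a."""
        return 1.0 / (1.0 + a * (x - c) ** 2)

    # Hypothetical class 'moderate slope' centred at 20 degrees; a = 0.04
    # places the 0.5 crossover 5 degrees either side of the centre.
    for slope in (10, 15, 20, 25, 30):
        print(slope, round(fuzzy_membership(slope, 0.04, 20.0), 2))
        # grades: 0.2, 0.5, 1.0, 0.5, 0.2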
In Figure 2.4, the difference between fuzzy and crisp sets, as well as variations in membership function parameters, are illustrated. The first three models (a-c) show several interpretations of symmetric Boolean and fuzzy function comparisons. The latter two (d-e) show asymmetric functions. Other fuzzy concepts, such as 'very low' or 'close to', might be represented by decaying functions.

[Figure 2.4. Boolean and fuzzy classification models. The attribute value (x) is graphed against the fuzzy membership function value. The broken lines show the envelopes of the fuzzy classes for each model, while the solid lines and shaded areas indicate the related Boolean sets. (Source: Burrough et al. 1992)]

Fuzzy logic has found its primary application in control circuitry. The most famous example is a subway car controller used in Sendai, Japan. Utilising fuzzy rules, each of which is defined as a membership function (e.g., 'apply more brake pressure when the train is moving downhill'), the system operates trains more smoothly and with greater energy efficiency than human operators (Kosko and Isaka 1993). Another area of application is in speech recognition: teaching computers to interpret the fuzzy concepts inherent in human language (e.g., Zadeh 1970).

In the context of natural resource management, fuzzy logic has been primarily applied in two areas: 1) answering complex queries that combine Boolean maps using fuzzy rules and produce fuzzy output; and 2) using fuzzy membership functions to reclassify existing data and submitting the results to simple or complex queries. The former is illustrated by a number of land classification applications, including Zhang et al. (1988) and Wang et al. (1990). Fuzzy sets are combined with a multi-criteria methodology by Banai (1993) in reference to land classification. Mendoza and Sprouse (1989) describe the generation of forest planning alternatives utilising fuzzy rules, while Kollias and Voliotis (1991) use fuzzy rules to retrieve soil information. The second type of application is demonstrated by Burrough's (1989; 1991) soil survey and land evaluation and McBratney and Moore's (1985) climatic classification model.
Suryana (1993) applies fuzzy reclassification to erosion hazard management and land suitability mapping.

2.3.1.2.5. Linking Fuzzy Sets With Attribute Data

Published papers that link fuzzy set theory with geographic information can be divided into two broad classes. Most deal with fuzzy representation and analysis of site attributes. Classes may be assigned using fuzzy classifiers, supplied with fuzzy limits, or grouped and analysed with fuzzy logic. However, a few authors have ventured to broaden the application of fuzzy sets into the representation of the spatial distribution of geographic phenomena. Fuzzy boundaries, neighbourhoods, and contiguity of results are examples of the spatial applications touched on by these latter authors.

While there are undoubtedly numerous possible methods of deriving fuzzy membership functions, two distinct groupings have arisen in the geographic literature. Robinson (1988), drawing on terms coined by Buckles and Petry (1985), defined these two groupings as the Similarity Relation (SR) and the Semantic Import (SI) models.

The first is similar to cluster analysis and numerical taxonomy in that the value of the membership function is a function of the classifier used. Robinson termed this the 'Similarity Relation Model'. Fuzzy clustering, introduced by Ruspini (1969), provides a way around some of the representational difficulties of conventional clustering where, for example, stray points or 'bridges' between sets can cause problems. The fuzzy-c-means method introduced by Dunn (1974), for instance, gives points membership values in inverse relation to their distances from cluster centres. Operational examples include McBratney and Moore's (1985) utilisation of the fuzzy-c-means method to perform climatic classifications, as well as the implementation of such clustering algorithms in the GIS package 'IDRISI' (Clark University, Worcester, Massachusetts).
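A minimal sketch of the membership calculation at the heart of fuzzy-c-means, assuming fixed cluster centres and the common fuzziness exponent m = 2; this is one step of the full iterative algorithm, which also re-estimates the centres, and the rainfall values are invented for illustration:

    def fcm_memberships(x, centres, m=2.0):
        """Fuzzy-c-means membership grades of a point x (a scalar here, for
        simplicity) in each cluster, inversely related to its distance from
        each cluster centre. Grades sum to 1 across clusters."""
        dists = [abs(x - c) for c in centres]
        # A point sitting exactly on a centre belongs fully to that cluster.
        if 0.0 in dists:
            return [1.0 if d == 0.0 else 0.0 for d in dists]
        exp = 2.0 / (m - 1.0)
        return [1.0 / sum((d_i / d_j) ** exp for d_j in dists) for d_i in dists]

    # Memberships of a rainfall value of 700 mm in clusters centred at
    # 500, 800 and 1200 mm (a hypothetical climatic classification).
    print([round(u, 2) for u in fcm_memberships(700.0, [500.0, 800.0, 1200.0])])
    # [0.19, 0.78, 0.03]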
Such classification methods have been found to be most useful when performing exploratory data analysis; when the researcher has little information on classification training, the SR model offers an alternative method of automatically grouping data. For further details on this computationally complex method the reader is referred to Bezdek et al. (1984) or McBratney and Moore (1985).

A mathematically simpler approach is to use an a priori membership function with which individual items may be assigned a membership grade. This is known as the 'Semantic Import Model' (Robinson 1987; Burrough 1989). The concept of semantic import refers to this model's ability to represent semantic classifiers such as 'close to', 'nearly', or 'rarely' in numerical form. Natural resource scientists are often aware of such classifiers, but have become accustomed to translating them into precise cut-offs for typical numerical analysis.

A membership function can be structured in several ways. A symmetric function (Figure 2.4 a-c) might be useful for a situation where we want to distinguish between 'light' and 'moderate' rainfall. The parameters are adjusted to create proper centring and a smooth crossover between the two functions. An asymmetric function (Figure 2.4 d-e) might be applied when the function can be truncated on one side. 'Insufficient' versus 'sufficient' rainfall for crop growth might qualify as two asymmetric functions.

2.3.1.2.6. Combining Fuzzy Classifications

Logical models that assess complex issues such as land suitability for agriculture require that data from a variety of sources be combined in various ways. Boolean classifiers and Boolean logic have traditionally been utilised in everything from simple overlays to complex models of runoff and erosion (e.g., De Roo et al. 1989). For example, the Structured Query Language (SQL) interface built into many information systems allows class combinations using the operators AND, OR, NOT, etc.

Unfortunately, when data contain some degree of uncertainty, considerable information loss can occur in these strict combinations. Several studies (Marsman and de Gruijter 1984; 1986; Drummond 1988) provide examples of comparisons between derived attributes and ground checks. The 'quality' estimates in these studies were often abysmally low. However, the degree of misclassification was rarely serious, because the attribute values causing the misclassification were often only slightly outside the defined class limits. The information existed; however, the map's quality was underestimated by the Boolean matching process (Burrough et al. 1992).

A multivariate fuzzy set does not produce such strict boundaries. Data that have been transformed from original observations to fuzzy class values can be combined with a single 'joint fuzzy membership function' or JMF (Burrough et al. 1992) as follows:

    Result (JMF) = MIN(MF_A, MF_B, MF_C)    (2.3)

where MF_A, MF_B and MF_C represent the membership functions of three different spatially-concurrent attributes. The minimum value of the single membership functions for each attribute value gives the JMF.
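A minimal sketch of equation 2.3 applied cell by cell to three co-registered fuzzy membership layers; the grids and their values are invented for illustration:

    def joint_membership(layers):
        """Joint fuzzy membership function (equation 2.3): the cell-wise
        minimum of several co-registered fuzzy membership grids."""
        return [[min(vals) for vals in zip(*rows)] for rows in zip(*layers)]

    # Three 2 x 2 membership grids for spatially-concurrent attributes,
    # e.g. slope, drainage and soil depth suitability.
    mf_a = [[0.9, 0.4], [0.7, 1.0]]
    mf_b = [[0.6, 0.8], [0.9, 0.3]]
    mf_c = [[1.0, 0.5], [0.2, 0.9]]
    print(joint_membership([mf_a, mf_b, mf_c]))
    # [[0.6, 0.4], [0.2, 0.3]]

Unlike a Boolean AND of reclassified maps, the JMF preserves how close each cell came to satisfying all criteria, rather than reducing it to a pass/fail value.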
The great majority of existing papers in this field focus on applications of this classification/JMF technique. Two streams of research have emerged. The first focuses on physical applications such as land or crop suitability analysis (Drummond 1988; Burrough 1989; Suryana 1993), erosion hazard assessment (Suryana 1993) and soil-property analysis (Burrough 1989; Burrough et al. 1992). The second emphasises spatial decision support modelling, including generating decision-making alternatives (Mendoza and Sprouse 1989; Banai 1993) and linear programming (Mendoza and Sprouse 1989).

2.3.1.2.7. Cardinal Values

Resource data may be modelled using either discrete or continuous data structures. Furthermore, the data may be stored as ordinal classes or as cardinal values. Fuzzy sets, as well as the other methods presented above, are of use with either type of data structure, but focus particularly on ordinal classes. Data that are stored as cardinal values cannot be referred to using membership values or single probabilities. Uncertainty in these values is a numerical distribution, and must be simulated using a strictly numerical method such as probability distributions. Ordinal and cardinal values must be dealt with differently when uncertainty is propagated through modelling procedures. Although reduction to ordinal classes is a possible solution, this reduces the potential to model the data mathematically.

2.3.2. PROPAGATION OF UNCERTAINTY

Study of the propagation of uncertainty through GIS and spatial modelling processes is an important, even crucial, task. There is considerable justification for such a statement. Goodchild (1991:121) notes that "currently we lack comprehensive methods of describing error, modelling its effects as it propagates through GIS operations, and reporting it in connection with the results." Lanter and Veregin (1992:825) note that most research into error modelling has been carried out "in isolation from the broader context of error propagation modelling in a GIS environment". For example, Fisher's viewshed error analysis (e.g., Fisher 1991a; 1992; 1994) and Veregin's (1996) buffer operation propagation error modelling concentrate on very specific, spatially oriented operations. Work on isolated operations provides necessary input; however, generic propagation modelling for environmental models requires more universal methods.

A model of the propagation of uncertainty through spatial data processing may utilise two basic approaches (Joy et al. 1994). An analytical approach, such as that employed by Heuvelink et al. (1989), uses mathematical functions. However, standard propagation theory (Taylor 1982) restricts such mathematical analysis to operations that are continuously differentiable. The alternative is the Monte Carlo method.

2.3.2.1. ARITHMETIC PROPAGATION

The simplest way of propagating uncertainty through a model is through basic arithmetic relations. However, this only holds true for errors in cardinal values that are both random and independent (for examples see Burrough 1986a). A simple model such as A+B will yield a significantly different relative error than A-B, particularly if A and B are nearly equal: the absolute errors of the two results are the same, but the difference itself approaches zero. However, when variables are correlated, or the model involves a product or quotient, partial differentiation of a Taylor expansion is required (Taylor 1982).
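A minimal sketch of these standard propagation rules for independent random errors, illustrating how the relative error of a difference explodes when the operands are nearly equal; the input values are invented for illustration:

    import math

    def err_sum_or_diff(sa, sb):
        """Absolute standard error of A+B or A-B for independent errors:
        identical for both operations (errors add in quadrature)."""
        return math.sqrt(sa ** 2 + sb ** 2)

    def rel_err_product(a, sa, b, sb):
        """Approximate relative standard error of A*B (or A/B) for
        independent errors: relative errors add in quadrature."""
        return math.sqrt((sa / a) ** 2 + (sb / b) ** 2)

    a, sa = 100.0, 5.0
    b, sb = 95.0, 5.0
    s = err_sum_or_diff(sa, sb)        # about 7.07 for both A+B and A-B
    print(s / (a + b))                 # relative error of the sum: ~3.6%
    print(s / (a - b))                 # relative error of the difference: ~141%
    print(rel_err_product(a, sa, b, sb))  # relative error of the product: ~7.3%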
The rules of thumb for mathematical model error propagation (Alonso 1968) include suggestions to:

• avoid intercorrelated variables;
• try to avoid multiplication or division; and
• avoid as far as possible taking differences or raising variables to powers.

These restrictions, though not absolute, place severe limitations on the development of environmental models. Nonetheless, arithmetic propagation is utilised in some studies. For example, Burrough (1993) uses a second-order Taylor series to model heavy metal sediment levels in the Netherlands. The model used consists of one environmental variable and one terrain variable. Similar studies have been carried out by other members of Burrough's research group (e.g., Heuvelink 1995).

2.3.2.2. MONTE CARLO

A more universal method of uncertainty propagation is found in Monte Carlo simulation. This method, though computationally intensive, is totally independent of the uncertainty models used and the nature and sequence of GIS or modelling operations employed. It is generally applicable to error propagation problems in a GIS context (Openshaw 1989; Fisher 1996; Heuvelink and Burrough 1993).

The Monte Carlo method was introduced by von Neumann and Ulam during World War II as a code word for the secret work at Los Alamos. The method was applied to simulating random neutron diffusion in fissionable material. Later it was expanded to evaluating complex integrals or solving certain equations that were not amenable to analytical solutions (Rubenstein 1981). Monte Carlo methods are now applied in a variety of disciplines in order to solve complex problems, from radiation transport to river modelling. Recent leaps forward in computer processing power have made this 'brute force' method increasingly appealing.

Monte Carlo techniques allow for replication of an experiment. Replicating implies re-running the experiment or simulation numerous times with selected changes in the input parameters. Input data uncertainty is assumed to be characterised by an error model that represents reasonable estimates of the possible values. A single simulation involves randomly selecting a value from each of the input error models, completing a series of analytical operations, and storing the results. The process is repeated M times, and the M result maps are summarised to present some sort of confidence interval around the mean of all the simulations.
The basic algorithm is as follows (a minimal implementation sketch is given at the end of this subsection):

1) determine the types, levels and error characteristics of each source data set;
2) replace the observed data with a set of random variables drawn from an appropriate probability distribution designed to represent the uncertainty of the inputs;
3) apply a sequence of (GIS) operations to the data; uncertainties in models and equations may also be simulated by randomisation, if possible;
4) save the results;
5) repeat steps 2 to 4 M times; and
6) compute summary statistics.

An important consideration is determining the appropriate value of M. Most authors suggest rather small values, on the order of 20-30. Openshaw (1989) notes that one should not place too much emphasis on significance tests of the results, as classical inference really is not appropriate due to multiple significance testing problems and the absence of a formal experimental design. He suggests that any significance tests be used as a guide to action rather than a precise test of a hypothesis. If one makes the broad assumption that the target value (the Boolean result) is the most accurate, then he suggests stopping the simulation once the target result is ranked higher than fifth. It may also be appropriate to simply watch for the appearance of the target value in one of the tails of the output distribution, indicating a skewed result.

As a Monte Carlo simulation is totally independent of the uncertainty models and data manipulation occurring, it is an ideal platform for addressing propagation through complex GIS processes. Openshaw et al. (1990) ran such a simulation on a study of radiation waste dump sites. In this case, they chose to perturb spatial components of the data such as node and vertex locations. Fisher (1991c) focuses on soil inclusions: randomising grid cells in a soil coverage to simulate the differences between reality, where small soil inclusions occur, and the smoothed soil model used by standard Boolean processing. He offers uncontrolled (totally random) and controlled algorithms, the latter requiring knowledge of inclusion probabilities for the various soil types present. Emmi and Horton (1995) use Monte Carlo procedures to examine uncertainty in risk assessment for earthquakes, while Kunkel and Wendland (1997) use similar techniques to model groundwater residence.
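A minimal sketch of steps 2 to 6 of the algorithm above, using a toy 'model' (a slope threshold applied to a short elevation transect) and normally distributed elevation error; the data, threshold and error level are all invented for illustration:

    import random
    import statistics

    def run_model(elev):
        """Toy analytical operation: the fraction of segments steeper than
        a threshold, using simple differences between neighbouring cells."""
        steep = sum(1 for i in range(len(elev) - 1)
                    if abs(elev[i + 1] - elev[i]) > 5.0)
        return steep / (len(elev) - 1)

    def monte_carlo(elev, sigma, m=30):
        """Steps 2-6: perturb the inputs, run the model, repeat M times,
        and summarise the distribution of results."""
        results = []
        for _ in range(m):
            perturbed = [z + random.gauss(0.0, sigma) for z in elev]  # step 2
            results.append(run_model(perturbed))                      # steps 3-4
        return statistics.mean(results), statistics.stdev(results)    # step 6

    elev = [100.0, 103.0, 109.0, 112.0, 113.0, 121.0]  # elevation transect
    print(monte_carlo(elev, sigma=2.0))

The summary here is a mean and standard deviation of the model output; in a distributed application the same loop would be run per cell, producing a map of output variability.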
The Monte Carlo technique is simple in principle; proper implementation is more difficult. It should be noted that, in the studies summarised above, a considerable amount of the simulator's effort has gone into choosing the randomised variables in a manner appropriate to the application.

2.3.3. UNCERTAINTY IN CONTINUOUS DATA

The various error models discussed earlier can be utilised in either a discrete or continuous spatial data model. In a discrete model a single measure would typically be applied to each object. For example, one number would describe an entire polygon's spatial uncertainty, while another number would describe its classification uncertainty. However, many environmental models implicitly assume a continuous spatial variation, as do interpolation techniques such as kriging. The continuous data model used in most GIS is a raster. Although raster structures discretise the continuous nature of the reality or models they represent, they do so to a far lesser degree than the standard discrete entities of points, lines and polygons.

Fuzzy set theory has been utilised in three primary ways in geographic data analysis: fuzzy class memberships, fuzzy rules for class combinations or queries, and fuzzy spatial boundaries between entities. This latter application represents an important bridge between discrete and continuous models. There has been some research in this area although, as Heuvelink and Burrough (1993) point out, there is little experience regarding how to deal with these transition zones between source polygons.

Polygon boundaries are represented as lines on categorical maps. This belies the fact that these lines are fundamentally different from all other geographic linear features. Works such as Thomas Poiker's classic "A Theory of the Cartographic Line" (Peucker 1975) focus on the relationships between linear elements on maps and their computer representations. Cartographic generalisation focuses on complex or dynamic linear features such as shorelines or rivers. The lines that represent boundaries between the classes in categorical maps have received considerably less attention. These types of boundaries exist in the real world to varying degrees. For example, forest types are divided by transition zones whose widths vary widely (Joy et al. 1994). Even before addressing positional error, the 'sharp' boundary between a clearcut and mature forest is a 'corridor of transition' that is a minimum of several metres wide. Soil type divisions other than bedrock boundaries or fault lines can exhibit considerably more variability in their boundary widths.

The inclusion of these spatial constraints in attribute classification is a necessity in a spatial uncertainty model. The traditional separation of 'attribute' and 'geometry' is consistent with an entity-relationship model of phenomena in which geometry defines objects which then have attributes and relationships (Mark and Csillag 1989). However, the two are tightly intertwined. The line geometry is still an artefact of the attribute classification process.
In Figure 2.5 four models of probabilistic functions are presented. The first two have traditionally been applied to line position; however, Mark and Csillag (1989) extend this model to include membership probabilities. They focus on developing the third model (c). This work is even more applicable to fuzzy membership functions, as a 'probability' of 0.25 admits a 75% possibility of anything else, whereas a similar fuzzy membership refers to a degree of class membership. Mark and Csillag assume that the probability of membership at any given point near the boundary can be approximated by some family of parametric curves. They admit the strong likelihood of an asymmetric function that varies with location along a boundary; however, they utilise a simple cumulative normal function as a first approximation. The fourth model presented in Figure 2.5 is the 'corridor of transition' model developed in Davis (1994) and utilised in the following chapter.

Figure 2.5. Four probabilistic functions of spatial boundary uncertainty: a) the sharp Boolean boundary; b) the epsilon model; c) the membership probability model; and d) the corridor of transition model. The arrow indicates the same spatial position (the Boolean boundary) in each. (Source: Davis, 1994; Mark and Csillag, 1989)

The shift from probabilistic line functions to a fuzzy 'possibility' is a simple one. On a multiple-category map the lines may be said to be the areas of least spatial attribute certainty. In theory, they represent a series of points where likelihoods collide. Utilising the fuzzy data model it is apparent that this line represents a series of points in space where, rather than there being an equal probability of one or the other class being present, the membership values of the two equalise. Therefore, rather than defining a standard curve for the boundary model, the fuzzy classification techniques discussed earlier in this chapter can be applied to create an attribute-oriented fuzzy boundary model.
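As an illustration of model (c), the following sketch evaluates the kind of cumulative normal membership curve Mark and Csillag use as a first approximation. The width parameter sigma is an assumed value for demonstration, not one drawn from their work or from this thesis.

```python
import math

def boundary_membership(d, sigma=10.0):
    """Membership probability for a class as a function of signed distance d
    (metres) from the mapped boundary: a cumulative normal curve that passes
    through 0.5 exactly on the line (model (c) in Figure 2.5)."""
    return 0.5 * (1.0 + math.erf(d / (sigma * math.sqrt(2.0))))

# Membership rises from ~0 outside, through 0.5 at the line, to ~1 inside
for d in (-20, -10, 0, 10, 20):
    print(d, round(boundary_membership(d), 3))
```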
2.3.4. SUMMARY

The above section has included discussions of modelling and propagation using discrete entities, continuous entities, spatial location, and the use of both ordinal and cardinal attribute data. This multitude of factors that make up uncertainty modelling and propagation serves to highlight the multidimensional aspect of this field of inquiry. There is no single answer—no 'uncertainty button' that can be tacked on to a GIS. The nature and context of the data and the target application define the type of uncertainty modelling and propagation necessary.

2.4. COMMUNICATION OF UNCERTAINTY

Harking back to the big picture—the management of uncertainty for decision-making—the final step in the process is communicating this uncertainty to the target audience. When dealing with lumped models the uncertainty can be communicated with a simple numerical summary. It may still be a challenge to understand and communicate the implications of a standard deviation, fuzzy index, or other similar measure; however, the dimensions of the problem are relatively limited in comparison to spatial data. A distributed uncertainty model will probably have a variable spatial distribution of uncertainty: higher in some areas than others, or higher for certain features. A graphical method is necessary to communicate this information. This section briefly describes some of the background and current research in this field.

Research into uncertainty visualisation has gradually extended the graphic variables offered by Bertin (1985) into new realms opened up by computer displays. The first option is the use of static variables, such as changing the 'focus' of a particular object to represent uncertainty (MacEachren 1992; 1994). Use of a colour variable such as decreasing colour saturation values or changing hue is another possibility. Other potential static variables such as fog or texture change are discussed by Goodchild, Buttenfield, and Wood (1994).

Dynamic cartography (map animation) offers advantages over static displays in terms of information density. Simple dynamic variables such as 'duration' in a flashing symbol map or shifting pixel map (DiBiase et al.
1992; Fisher 1994) may be used to express degree of certainty in displayed information. Allowing the user to toggle between the actual data and uncertainty information is another possibility (MacEachren 1994; van der Wel et al. 1994). A third alternative is displaying individual realisations of a model to express variability (Goodchild et al. 1994). This concept may be extended to a dynamic display of the full range of realisations. Displaying such realisations has the advantage of drawing attention to the possible effects of the uncertainty—a key issue for the user who lacks understanding of the statistical basis for the uncertainty model. For example, watching the slope stability values for the area above a road change from 'safe' to 'unsafe' has a more dramatic impact than simply reporting standard deviation values numerically.

A variety of variables are available for manipulation. Static visualisation might include the common technique of pairing uncertainty maps with the data they represent, or using Bertin's (1985) variables in combination to depict data and uncertainty in a single map—the best candidates being value, colour and texture (Goodchild et al. 1994). MacEachren's (1992) concept of defocusing symbols fits this category, as does Fisher's (1994) use of interactive sound. Another of Bertin's variables, arrangement, could be utilised to represent uncertainty as the z-axis in a perspective view. In dynamic visualisation, some of the available choices include display animation using various realisations, Fisher's (1993; 1994) use of randomisation of pixels, the use of sound, text and images in a multimedia role to support map presentations (Beard and Buttenfield 1991), or interactive manipulation—where the user is involved in altering parameters and viewing results in real time (MacEachren 1994).

The exploratory nature of many of the studies cited above has typically led to exclusion of the human factor. Cartographic research has made steady progress in understanding how humans interact with static maps. Rapid advances in visualisation technology require equally rapid advances in understanding the psychology of these different displays. Some such advances are summarised in Hearnshaw and Unwin (1994). However, applications that focus on the display of uncertainty information have been rare. Numerous studies such as Evans (1994) are required to compare the efficacy of the many possible realisations of uncertainty display.

2.5. UNCERTAINTY IN FORESTRY DATA AND MODELS

Forestry inventory data, growth models and decision models are each subject to many of the uncertainties and issues presented above. A number of the uncertainty models have been applied to various components of the forestry resource sector; however, conservative attitudes and a certain amount of inertia in forestry agencies and businesses have meant slow adoption of new techniques.
Uncertainty modelling has the potential to allow more timely and informed decisions, growth and yield models that better reflect reality, and inventory data that incorporate less abstraction and are easier to maintain. Using the classification of uncertainty sources and issues presented in the previous section, the following is a summary of forestry-related issues and relevant research:

Positional uncertainty—Field sampling relies on accuracy in the absolute position of sample sites and plots. Some forestry research (Bilodeau et al. 1993; Lowell 1997) has examined alternative sampling strategies that potentially lead to reductions in attribute uncertainty. The spatial positions of buffers and boundaries of forest stands are subject to uncertainties in the various data gathering techniques. The effect this has on area calculations has been examined (Magnussen 1996).

Temporal uncertainty—Inventory and sampling take place at variably spaced intervals. Between these times uncertainty gradually increases in items such as volume estimates via growth models. Therefore, the accuracy of the current inventory is dependent on frequency of update. The more sophisticated decision models predict change in social and economic systems; uncertainty in these values also increases over time.

Attribute uncertainty—There is variability in the size-volume relations used in estimators. Considerable work has been undertaken in determining the nature of this variability (e.g., Cunia and Wharton 1986). Polygons are not homogeneous; there is also variability in species within a polygon. Accuracy assessments have been carried out by some agencies (California Dept. of Forestry 1992; BC Ministry of Forests 1995). Methods such as fuzzy classification have been proposed as a solution (Capra et al. 1995). Work on stochastic models such as Joy et al. (1994) deals with multiple uncertainties through Monte Carlo simulation.

Boundary uncertainty—The standard polygonal system of forest inventory distorts a more continuous reality. Although this simplifies data gathering and maintenance, accuracy of inventory suffers.
There is also the issue of unrepeatable boundary delineation due to interpretation uncertainty. Research on fuzzy representation of boundaries has also taken place in the forestry sector. For example, Lowell (1993a) creates fuzzy boundaries using Voronoi area-stealing techniques, and evaluates them through comparisons with existing maps and samples.

Satellite classification uncertainty—As satellite images become a more substantial data source for inventory, uncertainty in classification becomes an issue. Work such as Capra et al. (1995) proposes alternatives such as fuzzy classification techniques for forestry management.

Digitising uncertainty—During the transition from paper to digital maps, digitising uncertainty was an issue. It is less so presently, as more and more data are collected digitally and are available in digital form.

Propagation—Growth models and decision support models have grown considerably in complexity over recent years. Propagation of uncertainty through these models is an issue. Alternative models such as that proposed by Thompson and Vertinsky (1991) include uncertainties through recognition of spatial constraints and long-term dynamics. Van Kooten et al. (1990) utilise Monte Carlo techniques to propagate uncertainty through a growth model; however, they found that a lack of data on variability hampered this type of modelling.

Generalisation—Data are generalised for various reasons in the course of inventory, producing summaries and setting up models, although there is little forestry-specific research on this issue.

Modelling with GIS—Growth models that were formerly 'lumped' (i.e., non-spatial) are gradually being incorporated into GIS as techniques become available (e.g., Thompson and Vertinsky 1991). This spatial orientation of forest models is becoming important for many reasons, from financial to ecological. Similarly, management decisions must be increasingly sensitive to spatial concerns. For example, the decision to harvest a particular stand may depend upon the stand's proximity to streams, areas of ecological concern, or urban areas.
Outside of a GIS, incorporating a single spatial factor can be a major issue (e.g., Liu and Herrington 1996). Adjacency—the management of activities occurring in nearby stands—will also affect decision making. Critiques of simplistic decision-making models in use in the industry (Marshall 1986; Mendoza and Sprouse 1989) have led to studies and implementations that deal with economic issues (Cleaves 1994; Liu 1995), planting decisions (Reed 1991), land allocation (Van Kooten et al. 1996) and overall risk management (Fight and Bell 1994; Mulder and Corns 1995).

Visualisation—The complexity of the current forest management decision-making environment makes data visualisation an important issue. The visualisation of both data and its associated uncertainty is often the only way to make sense of complex issues. Work such as Orland (1994) demonstrates this importance to forest planning and decision making.

2.5.1. UNCERTAINTY IN SOIL AND TERRAIN MODELLING

Both soil and terrain models are also crucial components of forest inventory data. Uncertainty in these data may affect growth models (through soil parameters, slope and aspect), profitability and harvest decision models (through accessibility factors), and slope stability, among other factors.

2.5.1.1. SOIL

Soil data are natural candidates for uncertainty studies. Pedologists have long been dissatisfied with traditional mapping systems, as soil polygons are poor representations of the continuous variation found in reality. Soil types are rarely delineated by sharp boundaries. There is also considerable uncertainty in sampling, since most data are inferred from vegetation types. Shifting analysis over to the standard discrete entities of GIS did little to ameliorate this problem. However, the digital environment offered new alternatives for both conceptual and physical data structures. Uncertainty modelling techniques could be used to represent spatial variability, attribute uncertainty, sample uncertainty, and uncertainty in classification.

Soil scientists paved the way for many other applications that also deal with uncertain data. Burrough's publications dealing with fuzzy sets for soil data (1986b; 1989; 1991) were some of the first applications of this methodology to GIS data.
This work led to recognition of the utility of fuzzy methods in classification (Burrough et al. 1992; Odeh et al. 1992) and to experiments in visualising soil data uncertainty (Fisher 1993; Maclean et al. 1993). Soil sampling schemes were also improved through explicit recognition of variability (Domburg et al. 1994; McBratney 1994). As techniques developed for the propagation of uncertainty in GIS-based models, soil data became a favourite example. Both attribute (Goodchild 1994; Stein 1994) and spatial characteristics (Rogowski 1996a; 1996b) of soils have been modelled using a variety of uncertainty models.

2.5.1.2. ELEVATION

Digital elevation data are used for a large and rapidly growing number of applications. In particular, elevation and its derivatives—slope and aspect—are used in forestry for growth models, slope stability models, models to visualise alternative strategies, viewshed modelling, and others. There are several options for gathering elevation data. Ground surveys are the most expensive, but also (potentially) the most accurate. Ground surveys may utilise traditional surveying techniques or, more commonly, GPS-derived readings. Elevation data may also be gathered from stereo photogrammetry, or read directly from synthetic aperture radar (SAR) data.

Uncertainty in elevation data and its derivatives is almost always in reference to cardinal data, and since there is a 'true' value for reference (subject to geoid variability), the most common problem is numerical error. This assumes that the spatial location of an elevation datum is known absolutely; however, spatial uncertainty can also create variability. This latter problem may be addressed as with other types of spatial uncertainty (e.g., Monckton 1993).

Error typically propagates through a series of readings and operations on its way to a final elevation, slope or aspect value. For example, in surveying elevation, errors in each reading of a transect are cumulative, instrument error must be added in, and these errors are added to the error of the reference benchmark. However, these errors are minor relative to the errors generated through interpolation between sample points.

In photogrammetry, or its digital equivalent, error is introduced when the stereo correspondence produces mismatches. These can result from a variety of conditions, including low contrast, clouds, relief distortions between images, periodic terrain textures, or the presence of vegetation. Poor alignment of the images, the hardware, or the image software can also generate errors. Some of these problems can be reduced through post-processing techniques. However, post-processing can introduce its own subtle errors due to assumptions in the algorithms used (e.g., Hannah 1981). SAR data are particularly dependent upon registration and algorithmic accuracy; over the past decade algorithms have improved and errors have been reduced, and SAR may soon join the ranks of regular sources of topographic data.

Elevation data are commonly interpolated into DEMs, which can then be manipulated via GIS and used for a variety of applications. The interpolation and the manipulations also introduce uncertainty into the data. Interpolation uncertainty has been studied for a variety of procedures, including contours (Wood 1994) and satellite-derived data (Sasowsky et al. 1992). The accuracy of the DEMs themselves has been the subject of many studies.
Typically, due to the difficulty in accurately ground-truthing the models, different sources are compared (e.g., Brown and Bara 1994; Garcia 1994; Felicisimo 1994), although the advent of accurate GPS data has led to some ground-truthed studies (Adkins 1994). The surfaces derived from DEMs are typically used as input to various environmental models. These uncertainties have been studied by themselves (Skidmore 1989), and in relation to viewshed modelling (Fisher 1991a; 1992; 1994), feature extraction (Lee et al. 1992), hydrologic modelling (Bruneau and Gascuel-Odoux 1995), and fire modelling (Delavar 1997), among many others. Monte Carlo simulation has been applied to DEM error propagation modelling (under the new name of 'stochastic imaging'), in which constrained random values are applied to the entire surface (e.g., Flamm and Turner 1994; Journel 1996).

2.6. RESEARCH GAPS

There are a number of specific aspects of uncertainty management that have been insufficiently researched. Overall, there is a glaring lack of integrated studies that examine 'cradle-to-grave' uncertainty—from sources through to final modelling such as decision support. It is rare that a specific modelling routine will host only one type of uncertainty. Spatial, temporal and thematic uncertainties of both cardinal and ordinal data interact in unpredictable ways in complex environmental or decision models. It is important to study these interactions in real-world scenarios.

2.6.1. UNCERTAINTY MODELLING

Uncertainty modelling research is still in its infancy, and justification for research in this field is not difficult to come by. In a general sense, the importance of the accuracy issue in spatial data is epitomised by that issue being the subject of the first of the 12 research initiatives undertaken by the National Center for Geographic Information and Analysis (NCGIA, 1989). More specifically, Burrough et al. (1992) point to the lack of work on the problems of fuzzy spatial mapping: boundaries, neighbourhood and contiguity analysis. In an earlier work, Burrough (1986b) noted that:

It is remarkable that there have been so few studies on the whole problem of residual variation and how errors arise or are propagated in GIS processing, and what the effects of these errors might be on the results of the studies made.
(p. 103)

In his summary paper for the 'bible' of spatial accuracy ("Accuracy of Spatial Databases", Goodchild and Gopal 1992), Stan Openshaw (1992) states that:

There is clearly an urgent need for basic research to resolve many of the [GIS error and propagation] issues. [This includes] developing a better understanding of error propagation through spatial databases, identifying and classifying operations most sensitive to error, and providing basic tools to handle error in a variety of situations. (p. 264)

He goes on to indicate the necessity of utilising the latest high-speed processing technology to investigate the extent of the problem in real-world applications.

2.6.2. VALIDATION

One aspect is almost completely lacking in uncertainty research: validation of uncertainty models. General evaluations such as Eberbach (1993) and Liu and Herrington (1996) that look at the consequences or specific costs of uncertainty base their conclusions on the propagation of uncertainty values. These may be derived in numerous ways, including expert evaluations via semantic analysis, variability data recorded in field surveys, or simply rough estimates. Mathematical propagation models such as those discussed in Burrough (1986b) and implemented in Heuvelink (1995) can produce incredibly large uncertainty values if subtraction of inputs is part of the model. Other propagation models are based on numerous assumptions, such as the validity of stochastic landscape simulation (e.g., Journel 1996) or the Gaussian distribution of error terms. It is rare that uncertainty in the model inputs is validated; rarer still that outputs are confirmed.

The primary difficulty is a lack of techniques for sampling uncertainty. Dealing with cardinal values is a straightforward matter, though certainly time-consuming: one simply compares reality with samples and generates an error distribution. However, if both spatial and attribute uncertainty are addressed, this sample point may actually be located elsewhere, complicating matters considerably. Ordinal data are even more complex. For example, how does one confirm a fuzzy membership value in a soil class? How can boundary uncertainty values be confirmed under both spatial and attribute uncertainty?

If the results of uncertainty models are to be considered useful in decision support, there must be some methods available to indicate that they tell the 'truth'. Although it is accurate to state that a particular uncertainty model or representation is more 'honest' than standard Boolean methods (e.g., Lowell 1993b), it would assist the credibility of this research field if it were possible to compare the degree of 'honesty'.

2.6.3. LINKING UNCERTAINTY MANAGEMENT WITH DECISION MAKING

Although uncertainty in decision making is a common research area, links between data uncertainty and decision making are rarely investigated.
There are two major aspects: developing methods of summarising uncertainty for decision makers (and evaluating these methods), and developing methods for evaluating the effectiveness of decisions made using such inputs.

The first of these aspects focuses primarily on the visualisation of uncertainty. MacEachren (1992) indicates that cartographers have spent little time investigating methods of presenting uncertainty information. This is also indicated by a 200-page report on the "Visualization of Spatial Data Quality" (Beard and Buttenfield 1991) produced as part of another NCGIA research initiative. The bulk of the position papers in this work indicate a need for applied research in this area, the major impediment being that uncertainty models are required before they may be visualised. The models themselves are perhaps the most neglected area of research, yet the models, the propagation algorithms, and the visualisation methods can only be effectively developed in concert with each other. Evaluation of their effectiveness is also a crucial issue.

The second aspect—evaluating the effectiveness of decisions made using uncertainty information—is also a neglected area. This may be principally due to the immaturity of this research field; however, the lack may also be due to the immensity of the problem. It is difficult to compare the implications of decisions made under differing information environments without performing substantial double-blind experiments. Before committing resources to such experimentation, it is important to have a thorough understanding of uncertainty management at all levels. Therefore, while this current research examines some of the implications of decision making using uncertainty management, verification and evaluation of such decisions are not implemented, for the reasons discussed above.

2.7. SUMMARY

This chapter has attempted to summarise a number of related research fields that, together, focus on uncertainty modelling in spatial data and natural resource inventory. A number of research gaps have been noted, including the need for better understanding of uncertainty behaviour and propagation in spatial databases, the need for applied research to allow interpretation of uncertainty, and a requirement for validation and applied tests of uncertainty models.

The following chapter makes the first inroads into these research gaps by presenting an uncertainty model developed in previous research undertaken by the author. The basic components of this model are used as data sources during the new research described in upcoming chapters. Rather than introduce these components piecemeal into the discussion of new research methodology, a summary of the previous research is presented in the following chapter to facilitate understanding of data lineage.

Chapter Three: Modelling and Storing Measures of Uncertainty in Inventory

3.1. INTRODUCTION

There are a number of uncertainty models that could be used in the process of developing sampling, evaluation and verification techniques (e.g., Fisher 1991, Heuvelink 1995). Much of the work undertaken in this dissertation makes use of an uncertainty model developed by the author during previous research (Davis 1994; Davis and Keller 1997a). The purpose of that research was to develop the basic components of an uncertainty model that could parallel a standard process model. The theoretical data model used was fuzzy sets, implemented in a raster environment.
The principal focus of the research was on dealing with the troublesome conversion of standard vector and raster data, based on a Boolean concept of reality (something either is or isn't), into a fuzzy representation. The process model utilised in the test case of the research was the infinite slope stability model.

Upcoming chapters will make use of various components of the uncertainty model. Building on the groundwork established in the previous research, Chapter Four will focus on the development of uncertainty sampling and evaluation techniques (applicable to a variety of uncertainty models), while Chapter Five will include evaluation of uncertainty model output. Throughout the remainder of this dissertation, the fuzzy-set data concept (introduced in the previous chapter and further discussed herein) is used extensively.

In these upcoming chapters, the specific details of the uncertainty model used are not of great importance during discussion of the sampling, evaluation and verification concepts and of the development of theoretical techniques (i.e., various models could have been used). Nevertheless, when these concepts and techniques are implemented, an understanding of the examples presented in upcoming chapters will require some degree of understanding of the details of the underlying data model. Therefore, this chapter provides a capsule overview of the previous research. Sufficient detail is provided for the reader to understand the purpose and major components of the model. Minor components are, for the most part, omitted. The reader is referred to the original document for clarification (Davis 1994).

To summarise, the previous research included two major stages:

1) Making use of information such as expert opinion and published statistics, metadata were generated that describe uncertainty in a number of data layers (including soil type, percent slope, and ground cover). These metadata focus on the spatial variability of uncertainty, rather than simply summary measures, and are therefore described as an 'uncertainty model' of input data.

2) By utilising a combination of fuzzy set analysis and Monte Carlo simulation, an uncertainty model for slope stability was developed. The uncertainties modelled in stage (1) were propagated through a slope stability modelling procedure.

In order to address the various types of uncertainty, the research used a variety of techniques, including constrained DEM randomisation, the coding of expert opinion as fuzzy classifiers, fuzzy set manipulation, and uncertainty parameter estimates from compiled laboratory data. Two major shortcomings were noted in the conclusion: 1) the visualisation of the results is a key component of understanding such a complex dataset—work is required on designing and implementing effective visualisation tools for this type of uncertainty data in order to feed into work on decision making under uncertainty; and 2) although a number of alternative uncertainty models had been generated by various researchers, there existed no way of evaluating or comparing these models. Both of these items require considerable research work—both in laboratory development and field data gathering—and were felt to be well beyond the scope of the original project.

3.2. METHODOLOGY

The central focus of the previous research was the production of spatially variable uncertainty estimates in the output of a typical resource modelling procedure: slope stability assessment.
This procedure was chosen due to the variety of data types required as input: soil and forestry polygons, DEM and slope surfaces, and laboratory-derived soil attribute data. The methods developed emphasised three new elements: 1) an asymmetric, spatially variable polygon boundary model termed the 'corridor of transition' model; 2) refinements to and applications of a theoretical DEM randomisation procedure (proposed by Goodchild 1980); and 3) the combination of fuzzy values and variability data in the same modelling procedure.

The first stages of the modelling procedure required that each of the major uncertainties in the inputs be identified and numerically modelled. These are identified as follows:

1. Classification uncertainty in classified values such as soil type or forest type;
2. Data collection uncertainty (e.g., 10% of polygons misclassified);
3. Spatial uncertainties (e.g., certainty in classification decreases near polygon boundaries);
4. Error envelopes around derived items (e.g., soil cohesion for Type 1 = X±A); and
5. Error envelopes around continuous mapped values such as elevation.

These five general groups can be split into two major types. The first three focus on uncertainties in classified values, while the latter two focus on error—where cardinal numbers are available (or can be derived). The two groups differ conceptually, and so must be addressed in different ways.

The first three deal specifically with classes and polygons. One of the principal issues that had to be addressed revolves around the polygon data structure. Polygons are somewhat appropriate for describing information such as forest cutblock boundaries, but less useful in describing the boundaries of tree stands, and even less useful in delineating the distribution of soil or slope. In the process of reducing a continuous reality to a polygon data structure, a great deal of uncertainty is introduced into the data. The first task in generating an uncertainty model for polygon data is to model exactly how much uncertainty is generated in this process and where it is located.

3.2.1. THE CORRIDOR OF TRANSITION MODEL

Uncertainty in resource data is often spatially variable, and in such cases would be best represented by a continuous surface. However, the most common method of storing such resource data (polygons) is a representation that contains abrupt transitions between homogeneous areas. The 'corridor of transition' (COT) model is a term developed to describe a procedure for estimating both the level and the spatial location of uncertainty generated by the data reduction process leading to polygon formation. Essentially, this procedure takes a polygonal surface, information about what assumptions were used in forming the polygons, and information about the data gathering and classification system, and generates information about the level of uncertainty in each type of polygon at every point in the mapped area.

3.2.1.1. THE SEMANTIC IMPORT MODEL

Ideally, most of the required information would be available from the original data used to generate the polygons. However, it is rare that such information would be available to the typical resource modeller—who normally has to make do with a map generated by others, and possibly some minimal metadata. The modeller could go out in the field and perform extensive resampling of the data; however, as before, this scenario is unlikely given the costs of current sampling techniques.
While less accurate, a reasonable substitute can be found in expert knowledge regarding the resource (note that in the following discussion, soil data are used as an example; the procedures discussed are equally applicable to ground cover data or any other spatially variable phenomena).

Soil scientists, familiar with the area being modelled, can be polled for information regarding the intermixing of soil types, the characteristics of soil inclusions, and other generalised data. Burrough (1989) utilised such methods in a study of fuzzy soil classification. His 'Semantic Import Model' (Robinson 1988) was utilised in the previous research to translate semantic classifiers into numerical form. Similarly, textual soil survey results often contain such non-mathematical data, but this information is lost when the soil maps are produced. Using the techniques of Semantic Import, this qualitative information can be translated into a fuzzy index or Certainty Factor (CF), where 0 <= CF <= 1. This technique has been used in a number of soil survey studies, including the Robinson and Burrough works noted above, as well as by Suryana (1993), who modelled crop suitability using expert opinions on the certainty of a variety of soil factors.

For example, the soil class 'sandy/silty morainal blanket' is described as 'very easily confused' with the class 'silty morainal veneer'. In this case, the term 'very easily confused' would be translated as a low certainty factor ('0.2') when quantifying the classification certainty of one relative to the other. A scale of phrases linked to numerical values is utilised (for more information on the method see Robinson 1988 and Davis 1994). In the same way, the level of certainty in boundary delineation is also captured. For example, there is generally high certainty in delineating the boundary between a bedrock extrusion and another soil class, but low certainty when the boundary is between two surficially similar soil types. Note that the data derived from Semantic Import were not used to replace existing quantitative data, but to enhance detail regarding the level of certainty to be assigned to these numbers.
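The translation step itself is mechanically simple, as the following sketch shows. The phrase-to-CF scale below is a hypothetical example of the kind of lookup the Semantic Import model defines; apart from the '0.2' for 'very easily confused' cited above, the numeric values are illustrative and not taken from the thesis.

```python
# Hypothetical Semantic Import scale: linguistic confusion phrases
# mapped onto certainty factors (0 <= CF <= 1). Only the 0.2 entry
# is taken from the worked example in the text.
SEMANTIC_SCALE = {
    "never confused": 0.95,
    "rarely confused": 0.8,
    "sometimes confused": 0.5,
    "easily confused": 0.35,
    "very easily confused": 0.2,
}

def certainty_factor(phrase: str) -> float:
    """Translate an expert's semantic classifier into a numeric CF."""
    return SEMANTIC_SCALE[phrase.lower()]

# e.g., 'sandy/silty morainal blanket' vs 'silty morainal veneer'
print(certainty_factor("very easily confused"))  # -> 0.2
```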
The concept of 'certainty factor' is simply another name for fuzzy set theory. Fuzzy sets are a way of quantifying degrees of membership in a set. If the 'set' in question is the class 'sandy/silty morainal blanket', then this number would refer to the degree to which a particular sample is like the ideal class. It may also refer to our degree of certainty that the value at a particular point would fall into the bounds of this class. Fuzzy set theory is introduced in greater detail in Section 2.3.1.2.4.

The behaviour of fuzzy thematic indices (the 'certainty factor') in the 'transition corridor' between polygons can be referred to as a thematic spatial interaction model. Spatial interaction models, such as the gravity model, have a long history in human geography. The concept has application on the physical side of the discipline as well. The transition corridor model is defined as follows.

3.2.1.1.1. Polygon Centres

In addressing the areas that bound a polygon, it is necessary to focus on zones both inside and outside the polygon. In defining such zones, a very precise definition of a polygon's spatial structure is necessary—with particular emphasis on the polygon centre.

One basic assumption of this transition corridor model is that polygons are most similar to their classified type at their physical centres, and least 'pure' at or at some distance beyond their borders, subject to the constraints of the Semantic Import (SI) tables (such as Table 3.1). This contention is supported by Burrough et al. (1992) in reference to soils, while Joy et al. (1994) support such a model regarding forest stands.

Soil Type    1     2     3     4     5     6     7     8     9     10
    1      0.85  0.35  0.25  0.15  0.15  0.10  0.15  0.15  0.15  0.35
    2            0.85  0.25  0.15  0.15  0.10  0.15  0.15  0.15  0.35
    3                  0.90  0.15  0.35  0.10  0.15  0.15  0.15  0.25
    4                        0.80  0.15  0.10  0.40  0.40  0.35  0.25
    5                              0.90  0.10  0.15  0.15  0.15  0.20
    6                                    0.90  0.10  0.10  0.10  0.25
    7                                          0.40  0.35  0.25
    8                                                0.80  0.35  0.25
    9                                                      0.85  0.25
   10                                                            0.90

Forest Type    1     2
    1                0.1
    2               0.95

Table 3.1. Misclassification matrices derived from the SI model. The value at cell (i, j) represents the possibility that type i would be misclassified as type j. The chart's trace contains the maximum certainty values.

The physical centre of the polygon is an important part of this model, so adopting it requires defining what the centre actually is. Circular polygons have an easily determined centre; however, the more common irregular polygon does not. The most common measure of polygon centrality is the centroid. The centroid of a polygon does not necessarily define a practical spatial centre—a non-circular polygon can easily have a centroid located outside its borders. A method used by Lowell (1993a), in which centroids were digitised at the visual centres of polygons (by eye), is impractical in a real-world data set—too many polygons exist for such a manual technique. In automating the procedure it became necessary to redefine the polygon in a way that accounts for irregularities in its shape. For example, in computer image analysis and object recognition research it is often necessary to 'skeletonise' an image segment in order to compare (and recognise) a generalised shape (e.g., see Choras 1993; Brandt 1994). A skeleton is composed of lines that run parallel or at right angles to the contours defining an object's 3-D shape. Another relevant application is the construction of extended Voronoi networks (Gold 1992). These networks capture the spatial relationships between objects in an intuitive manner. Although current research has implemented them for points, extensions of the concept may be applicable to defining the relationships between the points surrounding a polygon.

In the current application, although the polygons are only 2-D, the effective 'shape' of a polygon can be defined by treating it, in essence, as a 3-D object. An even slope is drawn from the boundary to the central region. Where the various slopes intersect—there lies the centre. The centre is now defined as a line, rather than a point, with varying degrees of centrality.

The method utilised in this 'corridor of transition' is as follows. A spatial model was defined in which the 'centre' of an irregular polygon is delineated by a series of points that are located the maximum-minimum distance from any polygon boundary; that is, each point in the series is as far from an edge as it can be without being closer to another edge. As illustrated in Figure 3.1, a circular polygon has only one point at a maximum-minimum distance from its boundaries. A 'sausage' polygon has a clear line, each point on which is the same distance from its closest boundaries. In the case of the irregular polygon, the centre ridge is more complex.

Figure 3.1. Alternative centroid models for variable polygon shapes.

If the distances to the edge are treated as an elevation (using a slope of one), the resulting ridge-like structure can be visualised as shown in Figure 3.2. The elevation of the ridge varies with the max-min distance to the nearest boundary. Utilising a raster surface model, the algorithm used to produce this 'max-min ridge' creates a stepped surface by filling cells inward from the boundaries. The details are as follows:
If the distances to the edge are treated as an elevation (using a slope of one), the result-ing ridge-like structure can be visualised as shown in Figure 3.2. The elevation of the ridge varies with max-min distance to the nearest boundary. Utilising a raster surface model, the algorithm used to produce this 'max-min ridge' creates a stepped surface by filling the cell from its boundaries inward. The details are as follows: Figure 3.1. Alternative centroid models for variable polygon shapes 61 Figure 3 .2 . The variable ridge model of a polygon's centre, using the z dimension and a perspective view for illustrative purposes. The cen-tral ridge, and its relative height, defines the local thickness of the polygon and provides a 'direction to centre' for later algorithms. 1) Produce a raster surface of polygon boundaries (set at '1'); all other cells remain empty; 2) Begin a loop through all cells with Current_Depth initialized at '2'; 3) If Current_Cell is empty AND adjacent to a filled cell AND the filled cell's value = Current_Depth -1 (i.e., the current cell is the next step up on the stepped surface); 3a) Fill Current_Cell with Current_Depth; 3b) If an adjacent cell meets the criteria of (3), move and fill as above; 3c) Repeat until criteria cannot be met; (note that special processing is required for very narrow structures) 4) Repeat for all cells in surface; 5) Increment Current_Depth and repeat loop (i.e., next step up in the surface); 6) Continue until all cells have been assigned a value; Although similar to a raster distance function (i.e., 'distance from every pixel to the nearest bound-ary'), this algorithm provides a more evenly stepped surface as required by later procedures, particularly in areas with narrow polygon extensions. 3.2.1.1.2. Polygon Boundaries With the polygon centre defined, the next stage is to incorporate the 'expert opinion' information into the boundary model. To reiterate, the boundary between two polygons is assumed to be the point where the possibility of a sample falling into one or the other category is equal. If one polygon is Soil Type 5,and the other Type 2, a sample at their bounds would have equal possibility 62 of being 5 or 2. The boundary model deals with what the possibilities are in all other areas. If, for example, the two types are very dissimilar, and could be easily distinguished during a soil survey, then it is likely that the boundary is well placed, and a sample taken a short distance within the Type 5 polygon would likely show Type 5 soil. In contrast, if the two soil types are quite similar, then the opposite would be true. By applying the 'expert opinion' data coded via semantic import, the entire surface of the (originally) polygonal map can be coded with the 'certainty factor' that quantifies this possibility for every soil type. This procedure must also capture other information into the boundary model, such as overall classification error. Standard boundary models (Figure 2.5) are inadequate to the task of incorporating the certainty factors—also termed 'fuzzy metadata'—defined by the Semantic Import (SI) work. Even Mark and Csillag's (1989) parametric functions (Figure 2.5-c) suffer from an (admitted) series of broad as-sumptions—chief of which is symmetry. There is no reason to assume that two classes will blend evenly at their polygon's boundaries with a similar membership function slope on both sides. In defining the membership function(s) that occur around the polygon boundaries, four primary items are of interest. 1. 
3.2.1.1.2. Polygon Boundaries

With the polygon centre defined, the next stage is to incorporate the 'expert opinion' information into the boundary model. To reiterate, the boundary between two polygons is assumed to be the point where the possibility of a sample falling into one or the other category is equal. If one polygon is Soil Type 5 and the other Type 2, a sample at their bounds would have equal possibility of being 5 or 2. The boundary model deals with what the possibilities are in all other areas. If, for example, the two types are very dissimilar, and could be easily distinguished during a soil survey, then it is likely that the boundary is well placed, and a sample taken a short distance within the Type 5 polygon would likely show Type 5 soil. In contrast, if the two soil types are quite similar, then the opposite would be true. By applying the 'expert opinion' data coded via Semantic Import, the entire surface of the (originally) polygonal map can be coded with the 'certainty factor' that quantifies this possibility for every soil type. This procedure must also capture other information in the boundary model, such as overall classification error.

Standard boundary models (Figure 2.5) are inadequate to the task of incorporating the certainty factors—also termed 'fuzzy metadata'—defined by the Semantic Import (SI) work. Even Mark and Csillag's (1989) parametric functions (Figure 2.5c) suffer from an (admitted) series of broad assumptions—chief of which is symmetry. There is no reason to assume that two classes will blend evenly at their polygons' boundaries with a similar membership function slope on both sides. In defining the membership function(s) that occur around the polygon boundaries, four primary items are of interest:

1. What are the certainty factors involved in classifying each cover type? This will indicate the maximum certainty that can be associated with a particular type. For example, even in the centre of a polygon there would still be some degree of uncertainty in the class due to variability on the ground.

2. What are the minimum certainty factors; that is, what is the likelihood that Soil Type B has been misclassified and is actually Soil Type A? In this case the focus is on misclassification rather than variability. As this will vary for different soil-type relationships, a matrix of values is required.

3. How do two spatially adjacent types interact in the transition corridor? If the two are similar there may be a gentle gradient, while dissimilar types may have sharper barriers. For example, on the boundary between two similar soil type polygons, there might be a large area where a random sample would have similar possibilities of showing one class or the other. This information also requires a matrix of values.

4. The attribute 'blurring' in the transition corridor between polygons is likely dependent upon polygon size. An estimation of the size factor is also required.

Utilising such SI-derived data, the following model was defined (Figure 3.3). Keep in mind that the 'slopes' referred to in the definition are changes in the certainty factor of a classification, and not a measure of actual intermingling (although the two might coincide).

For a given point on a polygon boundary, the directions of internal and external slope were determined from the max-min ridge map. The maximum intrusion distance was determined from the values derived from the SI matrix. One effect of this limit is to cause small polygons (with a maximum width below this distance) to be 'influenced' (a shift in the internal fuzzy structure) by their larger neighbours. A final intrusion limit was also determined from the SI data. For example, a value of 0.9 sets the final intrusion distance as 90% of the distance to the maximum intrusion line. The SI data were also used to set the extrusion distance in a similar manner. Note, however, that the spatial constraints on the extrusion result from the characteristics of the adjacent, rather than the current, polygon.

Figure 3.3. The 'corridor of transition' model for spatial boundary uncertainty - cross-section and plan views.

The internal and external slopes were calculated and applied to the fuzzy surface using a 0.5 index at the polygon boundary. At this boundary an idealised sample taken from the surface should be 'equally' similar to the polygons on both sides of the line. This is termed the 'coinciding of possibility' by Mark and Csillag (1989); or, in Boolean-style probability terms, there is a 50% chance of a sample belonging to either polygon. The following algorithm was devised, making use of 'soil type' as an example.

1) Create a surface for each soil type, where 0 <= Cell_Value <= 1;
2) Initialise surfaces using SI values. For every cell in every surface:
   2a) Assign the related value from the SI table. For example, if Current_Surface is Type B, and Current_Cell was originally assigned Type C, then use the SI value for the possibility of C being misclassified as B. If Current_Cell was originally assigned B, then use the maximum certainty factor (SI trace).
3) For every surface; for every cell in Current_Surface:
   3a) If Current_Cell is a polygon boundary, assign it 0.5 and determine the number of adjacent polygon types.
For each Adjacent_Type:
   (i) Calculate the distance to the internal 'ridge' from Current_Cell using the line of maximum slope from the max-min ridge surface.
   (ii) Calculate the internal slope, where rise/run = maximum internal fuzzy value / SI-derived maximum intrusion distance (see Figure 3.3).
   (iii) Loop around Current_Cell, assigning SI values based on slope and distance to Current_Cell; when inside a Current_Surface polygon, overwrite if < the current value; when outside a Current_Surface polygon, overwrite if > the current value.

In summary, this 'transition corridor' procedure has taken the original Boolean model of polygons, where there are sharp bounds between class A and class B, and—using a number of values derived from expert opinion—created a constrained smoothing between classes.
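The cross-sectional behaviour this algorithm produces (Figure 3.3) can be sketched in one dimension. The function below is a simplification that assumes linear internal and external slopes and ignores the plan-view geometry; the CF bounds and the intrusion/extrusion distances would in practice come from the SI matrices, and the values in the demonstration loop are illustrative only.

```python
def cot_membership(d, max_cf, min_cf, intr, extr):
    """1-D cross-section of the corridor-of-transition fuzzy surface for
    one class. d is the signed distance from the mapped boundary (positive
    = inside the class's polygon). The certainty factor is 0.5 on the line,
    ramping up to max_cf at the intrusion distance inside the polygon, and
    down to min_cf at the extrusion distance outside it."""
    if d >= 0:  # inside: internal slope toward the polygon centre
        return min(max_cf, 0.5 + (max_cf - 0.5) * d / intr)
    # outside: external slope into the neighbouring polygon
    return max(min_cf, 0.5 + (0.5 - min_cf) * d / extr)

# CF profile crossing from a neighbour (d < 0) into the polygon (d > 0)
for d in (-30, -15, 0, 15, 30):
    print(d, round(cot_membership(d, max_cf=0.85, min_cf=0.15,
                                  intr=30.0, extr=30.0), 3))
```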
If kriging is utilised to generate the DEM, then variance values for every cell in the model can be saved. Combining this variance with the original spot height error, a final error value can be generated for every cell. It is assumed that this error follows a normal error curve (Goodchild 1980; SRMB 1990).

By making use of this error curve and a constrained randomisation procedure, it is possible to generate an 'equally likely' elevation surface in which each cell is provided with a new height (within its error bounds). However, there is one major problem: working with individual cells ignores the autocorrelation present in an elevation model. If two adjacent cells are assigned values from opposite ends of their error envelopes, artefactual roughness has been created, decreasing the overall autocorrelation index. This problem was addressed by Goodchild (1980); however, the original algorithm suggested by Goodchild brings the original and new autocorrelation indices together through a constrained random swapping of cells, and was intended for datasets other than DEMs. The procedure described here utilises a series of constrained smoothing passes over the dataset to gradually reduce short-range variability. In so doing, it lessens the artefactual peaks and troughs generated in the randomisation procedure. In the end, a new elevation surface is created in which each cell's value falls within the original elevation value's error envelope, and the autocorrelation index of the entire surface matches the original surface to within a stated tolerance.

This process uses the following algorithm:

1) Generate a variogram from the original DEM and fit a curve;
2) Generate a blank DEM. For every cell:
   2a) Determine the distance (D) from the cell to the closest elevation spot height;
   2b) Substitute D in the variogram curve, and derive the standard deviation for Current_Cell (SDc);
   2c) Add SDc to the SD of the closest spot height, giving the actual SD for the cell's elevation (SDe);
   2d) Generate a normal-constrained random number based on SDe, using the original DEM value as a mean;
3) Repeat for every cell in the DEM;
4) Determine a spatial autocorrelation index (Moran's Index: I) for the original DEM (Iorig) and for the 'equally likely' surface (Iel);
5) If |Iorig - Iel| > specified tolerance T:
   5b) Smooth the new DEM (using 0.1 * |Old_Value - 9-cell-window mean|) and repeat (5).
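A compact sketch of this algorithm follows. It assumes NumPy, a precomputed per-cell standard deviation surface standing in for steps 1-2c, and rook's-case neighbours for Moran's Index (the text does not specify a weighting scheme); the clip to a two-standard-deviation envelope is one possible reading of the error-bound constraint.

import numpy as np

rng = np.random.default_rng()

def morans_i(z):
    """Approximate Moran's I for a raster using rook's-case neighbours."""
    z = z - z.mean()
    cross = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    pairs = z[:, 1:].size + z[1:, :].size
    return (cross / pairs) / (z * z).mean()

def equally_likely_dem(dem, sd, tol=0.001):
    """Steps 2d-5b: randomise each cell within its error envelope, then apply
    constrained smoothing passes until Moran's I matches the original."""
    i_orig = morans_i(dem)
    new = rng.normal(dem, sd)                       # step 2d, every cell at once
    while abs(i_orig - morans_i(new)) > tol:        # step 5
        pad = np.pad(new, 1, mode='edge')
        mean9 = sum(pad[r:r + dem.shape[0], c:c + dem.shape[1]]
                    for r in range(3) for c in range(3)) / 9.0
        new += 0.1 * (mean9 - new)                  # step 5b smoothing pass
        new = np.clip(new, dem - 2 * sd, dem + 2 * sd)
    return new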
3.2.3. COMBINING ERROR AND UNCERTAINTY

Application of the procedures developed in the previous two sections resulted in data structures that carry considerably more information content than the original polygonal maps and raster DEM. This information might be used to provide uncertainty estimates in simple GIS queries such as 'area of class A' or 'what is here?'

The fuzzy datasets generated through the transition corridor procedure can be combined using a fuzzy math function known as the 'Joint Membership Function', described in more detail in Section 2.3.1.2.4. The error values can be combined using a Monte Carlo procedure as described above. However, more complex resource modelling procedures require the development of methods of combining fuzzy class membership data and cardinal error data. These procedures are presented in the context of slope stability modelling.

3.3. SLOPE STABILITY MODELLING

The infinite slope stability equation is a commonly used measure of the stability of surficial materials. This model utilises data with a variety of potential appended uncertainties, namely soil cohesion and other soil properties, forest cover and root depth, and the slope of the soil plane based on an elevation model. The resulting 'factor-of-safety' value is a relative number only—comparable only within a particular application.

FS = [Cr + Cs + cos²α (q0 + γ(D − Dw) + (γsat − γw)Dw) tan φ′] / [sin α cos α (q0 + γ(D − Dw) + γsat Dw)]   (3.1)

where:
FS = factor of safety
D = total soil thickness
Cr = tree root cohesion
Cs = soil cohesion
γ = moist soil unit weight
γw = water unit weight
α = slope of ground surface
Dw = saturated soil thickness
q0 = tree surcharge
φ′ = internal angle of friction
γsat = saturated soil unit weight

3.3.1. COMBINING AND SUMMARISING

Fuzzy set membership values for soil and forestry classes can be easily combined with the fuzzy 'AND' of the JMF function. The cardinal DEM and soil attribute error data can be propagated through the equation using Monte Carlo methods. Combining the two requires a multiple-stage simulation procedure, resulting in a number of output maps representing degrees of certainty in each particular realisation. Rather than simply generating factor-of-safety data, these results can be utilised to present information relevant to the particular application of the model.

Each cell in the map is a member of all soil classes and all forest classes, with varying degrees of membership—some close to zero. Each realisation requires a different set of soil parameters to be applied in the FS equation (3.1). The Monte Carlo procedure must, therefore, be repeated for every possible combination of forest and soil class. The following algorithm summarises this procedure (a sketch in code appears at the end of this section):

1) For each soil type; for each forest cover type:
2) Generate an 'equally likely' DEM based on error estimates (as discussed above);
3) Derive a slope map from the DEM;
4) For every cell, randomise all the derived variables based on the current soil and forest cover types;
5) Apply the factor-of-safety formula to every cell;
6) Repeat #2-5 M times; and
7) Compute summary statistics for the M maps.

The number of Monte Carlo runs (M) required to properly represent the distribution of the uncertainty is a subject of debate in the literature. In this case, a significance test does not really apply, as there is no formal experimental design. The only value to test against is the Boolean result, which, technically, should be the mean of the resulting Monte Carlo frequency distribution. However, as proposed by Openshaw (1989), there is no reason why such a statistic might not be used as a guide rather than a precise test. Hope (1968) showed that only 19 realisations were required to yield statistically useful results, and Openshaw refers to an M of 20-30 if only summary statistics are required. For the purpose of the following case study, a conservative value of M=50 was chosen.

To test the consistency of the Monte Carlo algorithm at this M, the 50-run algorithm was repeated 20 times for a limited subset of test data and the results plotted (see Davis 1994). The curves were consistent, and the Boolean value never approached the tails, indicating that M=50 was sufficient for the operation. Too small an M value would be indicated by exceedingly random lines, while identical curves would indicate an unnecessarily large M value. This visual method was suggested by Openshaw (1989).
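A minimal sketch of steps 2-7 in Python (not the original implementation): the 'equally likely' DEM is approximated by per-cell normal draws without the smoothing constraint of Section 3.2.2, the slope derivation uses a simple finite-difference gradient, and sample_params is a hypothetical caller-supplied sampler returning one random draw of the Equation 3.1 variables for the current soil/forest combination.

import numpy as np

rng = np.random.default_rng()

def slope_map(dem, cell=25.0):
    """Step 3: ground-surface slope (radians) from a raster DEM."""
    gy, gx = np.gradient(dem, cell)
    return np.arctan(np.hypot(gx, gy))

def factor_of_safety(alpha, D, Dw, Cr, Cs, g, g_sat, q0, phi, g_w=9.81):
    """Equation 3.1; angles in radians, weights and cohesions in consistent
    units (the FS value is relative in any case)."""
    load = q0 + g * (D - Dw)
    num = Cr + Cs + np.cos(alpha)**2 * (load + (g_sat - g_w) * Dw) * np.tan(phi)
    den = np.sin(alpha) * np.cos(alpha) * (load + g_sat * Dw)
    return num / den

def simulate(dem, dem_sd, sample_params, M=50):
    """Steps 2-7 for one soil/forest combination. sample_params() must return
    a dict of randomised values for D, Dw, Cr, Cs, g, g_sat, q0 and phi,
    e.g. normal draws for most variables and log-normal draws for cohesions."""
    stack = []
    for _ in range(M):
        dem_el = rng.normal(dem, dem_sd)   # step 2 (smoothing passes omitted)
        alpha = slope_map(dem_el)          # step 3
        v = sample_params()                # step 4
        stack.append(factor_of_safety(alpha, **v))   # step 5
    stack = np.array(stack)
    return stack.mean(axis=0), stack.std(axis=0)     # step 7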
3.4. CASE STUDY

In order to demonstrate these techniques a case study was implemented. As a test-of-concept, the purpose of this case study was not to verify the actual numbers involved, but to demonstrate how these uncertainty and error management techniques could be used to extract additional information from existing data and knowledge. The case study served to illustrate how slope stability data can be turned into information useful in a decision support context, but did not actually involve any additional interpretation of the output for decision support.

An 8500 hectare study site was selected. The area is located on Louise Island, on the east side of Moresby Island in the Queen Charlotte Group, British Columbia, Canada, at 53°N, 132°W (see Figure 3.6). The area is a forest company test site, and was selected for: 1) the availability of data, and 2) the availability of experts with experience in the region.

Figure 3.6. Location of the Louise Island study site.

3.5. THE BOUNDARY MODEL AND ATTRIBUTE UNCERTAINTY

Soil data were imported from existing maps digitised during previous Boolean slope stability studies. Though a factor, positional uncertainty of linework was minimised and then ignored in order to simplify this test of concept. Ten classes of soil were defined. Forestry data were gathered from digital forest cover maps and reduced to three classes: cut within 3-10 years, forested, and other. Maps were rasterised at a 25m resolution.

Several soil and forestry experts were consulted regarding SI data for the transition corridor model. A sample of the resulting values for soil types is presented in Table 3.2. The values shown represent the estimated likelihood of misclassification based on surface indicators; the trace represents the maximum certainty values. A similar table was generated for class-to-class intrusion (overlap). Tables for the considerably simpler forest classes were also generated.

The transition corridor algorithm detailed above was then implemented for each soil and forest class, resulting in a fuzzy surface similar to Figure 3.5 for each. Although many small polygons appear in the foreground, note the 'plateaus' apparent on the larger structures, indicating central areas of polygons not affected by the boundary model. In contrast, the equivalent Boolean map (Figure 3.4) shows nothing but plateaus and cliff-like transitions. Cross-sections of two different boundary types are presented in Figure 3.7.

Figure 3.7. Detail of transition corridors between (a) bedrock and soil type 1; and (b) two similar soil types (1 and 3).

3.6. THE MONTE CARLO PROCEDURE

The 'non-spatial' items required for Equation 3.1 were gathered from an extensive literature review undertaken by the US Forest Service Intermountain Research Station (Hammond et al. 1992) while developing their slope stability modelling system. Soil types found in the Louise Island study site could be successfully matched with the classification system used by the USFS, and means and standard deviations of the relevant data were calculated. The USFS study found that, for the most part, the values for each variable were normally distributed, with the exceptions of soil cohesion and root cohesion, which were log-normal. The values are presented in Table 3.2.

The Monte Carlo simulation process must be repeated for every possible combination of soil and forest cover. In the limited test-of-concept there were only 10 soil and 3 forest types, requiring 30 different simulation runs. However, more complex models could require significantly more simulations. In such a case, it would be prudent to determine in advance which classified data value combinations are incompatible or very unlikely (e.g., bedrock and mature forest) and eliminate these from the procedure.
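A sketch of that pruning step (the class labels and the excluded pairing are illustrative only; the thesis does not list which combinations were dropped):

def realisations(soil_classes, forest_classes, incompatible):
    """All soil/forest pairings for the multi-stage simulation, with
    incompatible or very unlikely combinations pruned in advance."""
    return [(s, f) for s in soil_classes for f in forest_classes
            if (s, f) not in incompatible]

# 10 soil classes x 3 forest classes = 30 runs before pruning
combos = realisations(range(1, 11), ('cut', 'forested', 'other'),
                      incompatible={(6, 'forested')})  # e.g. bedrock + forest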
Table 3.2. Soil characteristics and estimated standard deviations (primary source: Hammond et al. 1992).

                                               Dry Weight (gm/cc)         Cohesion (kg/m2)               Angle of Friction (deg)    Soil Thickness
Class  Description                       UCS   low    high   mean  sd     low    high    mean    sd      low  high  mean  sd       mean  sd
1      Sandy/Silty Morainal Blanket      SM    1.121  2.051  1.586 0.155  0      3197.7  1598.9  533.0   32   46    39    2.33     1.5   0.1
2      Silty Morainal Blanket/Veneer     ML    0.961  1.922  1.442 0.160  1221   1855.2  1537.8  105.8   23   43    33    3.33     1.5   0.1
3      Rubbly, Silty Colluvium/Morainal  ML-MH 1.378  1.474  1.426 0.016  390.6  1464.6  927.6   179.0   29   32    30.5  0.50     1.5   0.1
4      Gravelly, Silty, Fluvial          GM    1.762  2.083  1.922 0.053  488.2  2099.3  1293.7  268.5   33   43    38    1.67     1.5   0.1
5      Rubbly Colluvial                  GW    1.570  2.051  1.810 0.080  0      0.0     0.0     0.0     28   39    33.5  1.83     1.5   0.1
6      Bedrock                           --    --     --     0.000 0.000  --     --      0.0     0.0     88   89    88.5  0.17     0     0
7      Gravelly Silty Fluvial            GM    1.762  2.083  1.922 0.053  488.2  2099.3  1293.7  268.5   33   43    38    1.67     1.5   0.1
8      Gravelly Silty Fluvial            GM    1.762  2.083  1.922 0.053  488.2  2099.3  1293.7  268.5   33   43    38    1.67     1.5   0.1
9      Silty Fluvial                     ML    0.993  1.089  1.041 0.016  0      0.0     0.0     0.0     22   30    26    1.33     1.5   0.1
10     Silty Morainal w. Bedrock         MH    1.121  1.442  1.282 0.053  390.6  1464.6  927.6   179.0   27   47    37    3.33     1     0.3

Elevation data consisting of a semi-regular grid of elevation spot heights (British Columbia TRIM data; SRMB 1990), produced from stereo-photogrammetry, were utilised. The average spacing between data points is 28m; therefore, a raster grid spacing of 25m was chosen to minimise the interpolation required. The stated error parameters for spot height data are (SRMB 1990):

• 90% of all determinate DEM points vertically accurate within ±5 metres.
• 90% of all indeterminate points vertically accurate within ±20m, 90% of the time.

The vertical accuracy of the primary elevation points was calculated as follows: assuming that the error at each point is normally distributed (an assumption supported by Fisher 1989, 1991b), 90% of the area under a normal curve is contained within ±1.28 standard deviations of the mean. As this 1.28 refers to a 5m elevation difference for determinate points, one standard deviation for a specific point is calculated as 5/1.28, or 3.91m. One standard deviation for indeterminate points can be calculated as SD = 20/1.28, or 15.62m.

Block kriging using Surfer 4.0 was used to interpolate from these points to a regular grid. Kriging was chosen as the interpolation method for two reasons: its high accuracy, and its ability to produce variance maps of the derived values. The published error values were combined with the variance for each data point to produce a final variance value for each interpolated cell.

The variance values were utilised in running the 'equally likely' DEM algorithm; in most cases, three to five smoothing iterations were required to bring Moran's Index within 0.001 of the original. Slopes were then derived from the DEM.
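The error chain of this section (spec-to-sigma conversion, then combination with the kriging output) can be sketched as follows. The direct addition of standard deviations follows step 2c of the algorithm in Section 3.2.2; the conversion of the kriging variance to a standard deviation is assumed to be done by the caller.

import numpy as np

def sd_from_90pct(tolerance_m, z=1.28):
    """Spot-height standard deviation from a '90% within +/- tolerance'
    specification, using the 1.28 conversion adopted in the text."""
    return tolerance_m / z

sd_det = sd_from_90pct(5.0)      # determinate points: ~3.91 m
sd_indet = sd_from_90pct(20.0)   # indeterminate points: ~15.62 m

def cell_sd(kriging_sd, spot_sd):
    """Per-cell elevation SD: kriging-derived SD plus the SD of the nearest
    spot height, added directly as in step 2c of Section 3.2.2."""
    return kriging_sd + spot_sd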
3.7. RESULTS

The entire run of 1500 Monte Carlo simulations (50 runs × 30 type combinations) at a 25m resolution cannot be feasibly stored on most contemporary GIS platforms. A series of initial test runs of the slope stability model were performed on a representative section of the data in order to determine the shape of the output curve. When fitted to a curve, the results indicate that a normal function would suffice to properly describe the resulting realisations of the slope stability values. Three maps were generated for each type combination: certainty factor, factor-of-safety mean, and factor-of-safety standard deviation. For example, as illustrated in Figure 3.8, a particular cell might be assigned the following values:

1. For realisation Soil = 3 and Forest = 2: CF = 1.2, FS = 2.1, SD = 1.4
2. For realisation Soil = 4 and Forest = 2: CF = 6.5, FS = 6.2, SD = 1.5

Figure 3.8. An illustration of the three types of surfaces resulting from the uncertainty modelling routine (certainty factor, factor-of-safety mean, and factor-of-safety standard deviation), showing two of the many realisations: #1 (Soil = 3, Rubbly/Silty Colluvium; Forest = 2, Forested) and #2 (Soil = 4, Gravelly/Silty Fluvial; Forest = 2, Forested). (Note that these are typical planimetric grids—the image is for illustrative purposes only.)

This means that, in the first case, the likelihood that the soil type is 3 and the forest type is 2 is rather low (CF = 1.2); if this actually is the case then the factor-of-safety is relatively low (2.1, indicating a high likelihood of failure), and there is a reasonably high certainty in this factor-of-safety prediction (standard deviation = 1.4). The second realisation, where the soil is Type 4 and the forest is Type 2, is much more likely. In this case, the factor-of-safety is relatively high (i.e., safe), and the standard deviation indicates a relatively high certainty in this FS value. It is necessary to examine all realisations to determine the most likely combinations, and it is important to note that certainty factors and standard deviations are relative to the entire study area.

The standard slope stability results (i.e., Boolean results) may be extracted using a maximum likelihood filter, in which only the highest certainty factor for each cell, and its associated values, are retained. Essentially, this maximum likelihood summary tosses out all the additional information generated by the new procedures discussed above, and returns to the Boolean representation. The maximum likelihood map for the entire region is displayed in Figure 3.9. By incorporating the standard deviation information into the analysis (Figure 3.10), it is possible to use the maximum likelihood information in different ways. For example, as demonstrated in Figure 3.11, areas in which slope instability is highly possible (low standard deviation and low factor-of-safety) are highlighted.

The uncertainty model's real utility is in its retention of information about realisations that do not quite 'make the grade' in the maximum likelihood filter. For instance, using the example values presented above, if the CF for realisation one was 6.2 rather than 1.2, the low factor-of-safety associated with this very likely realisation would be important. If this realisation should represent reality, then the cell is particularly unstable and perhaps should be avoided for road construction or harvesting. Realisation two would give the opposite results. This type of 'less-than-maximum-likelihood' analysis is illustrated in Figure 3.12. Here, the lowest factor-of-safety with a reasonable likelihood is retained, rather than the maximum likelihood. This type of data summary might be termed a 'worst case analysis', and would be useful when potential danger is the issue.

Figure 3.11. An example of an application-specific data summary: highlighting the areas in which slope instability is highly possible.

Figure 3.12. The worst-case-scenario summary, in which the 'most dangerous' realisation that has a reasonable likelihood of occurring is utilised (on a cell-by-cell basis).
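Both summaries reduce to simple operations over the stacked realisation surfaces. A sketch, assuming NumPy stacks and a hypothetical cf_floor threshold standing in for 'reasonable likelihood':

import numpy as np

def summarise(cf, fs, sd, cf_floor):
    """cf, fs, sd: (k, rows, cols) stacks holding the certainty factor, FS
    mean and FS standard deviation for each of the k realisations. Returns
    the maximum-likelihood FS and SD maps, plus a 'worst case' map keeping
    the lowest FS among realisations whose CF reaches cf_floor."""
    rows, cols = np.indices(cf.shape[1:])
    best = cf.argmax(axis=0)            # most likely realisation per cell
    ml_fs = fs[best, rows, cols]
    ml_sd = sd[best, rows, cols]
    worst = np.where(cf >= cf_floor, fs, np.inf).min(axis=0)
    return ml_fs, ml_sd, worst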
These multiple surfaces can be used in many other ways. Although the research in this dissertation stops short of incorporating the information into management schemes, some of the relevant issues are introduced in Chapter Six.

3.7.1. PROBLEMS AND WORK REQUIRED

3.7.1.1. VERIFYING PARAMETERS

A typical modelling procedure results in an unequivocal answer. It may be right, it may be wrong, or it may be somewhere in between, but the interpretation of the values is normally straightforward. In the uncertainty model, however, equivocal values are more the norm. In this case study, soil classes, forest classes, and model results all have certainty factors assigned to them during the procedure. Standard deviations were also assigned to each cell of the results. What do these values actually mean on the ground?

The field of uncertainty modelling clearly lacks procedures for verifying or comparing the results of the models developed. A straightforward Boolean soil classification can be verified with a sampling scheme, and perhaps summarised in a classification error matrix. If errors are too high, then the classification scheme may need to be modified. However, there are no existing procedures to verify, for example, a certainty factor of 6.5 in a particular soil class. Procedures are needed to demonstrate the utility of uncertainty modelling in a quantitative format, rather than simply relying on generic statements such as "a truer representation of reality" (Ramlal 1996) that have little value in practice. Given such procedures, it would then be possible to re-tune model parameters to better represent the uncertainty in a particular area, or to modify the model's inputs derived from semantic import for the same purpose. In the following chapter such procedures are developed and tested through field verification of the uncertainty model.

3.7.1.2. PREDICTION

A second problem with this particular modelling procedure is the difficulty in verifying its output (as opposed to verifying input, as discussed above). Slope stability models are inherently difficult to verify unless considerably detailed data are available. A single sample (e.g., an image) of an area gives only one temporal slice. Mass movement takes place over time; areas grow back and obscure the evidence of past mass movement; and some mass movement is delayed for many years beyond the peak initiation time. Both temporal and spatial detail are required to properly evaluate slope stability.

In a similar manner to the previous problem, there are no standard procedures for verifying certainty factors in multiple realisations resulting from the uncertainty model. We need to determine if this model predicted mass movement correctly, but of greater importance is: did it predict mass movement uncertainty correctly? Procedures are required for this problem as well. These procedures are developed and discussed in Chapter Five. There, the model developed in this chapter and a second model (based on new parameters) developed in the following chapter are applied to a separate test area. The model results are validated using a high-resolution temporal model of mass movement.
3.7.1.3. REPORTING AND COMMUNICATING UNCERTAINTY

The Louise Island test case provided a platform for demonstrating the complexities of uncertainty propagation, and for developing one particular approach to handling this complexity. One underlying purpose of this procedure was to maintain a maximum amount of information through to the final results of the environmental model, whatever it may be. As one author puts it, "it is a mistake to round inventory data or classify it [prior to] final presentation" (Iles 1994:12). This multiplicity of results should then lead to a further stage in model processing: summarising the results for the particular application at hand. For example, when slope stability data are used in a harvesting profitability model, the key issue is: what areas are too steep or unstable to cut? When road building is the issue, the question becomes: what areas have a medium to high probability of catastrophic failure? The data required of the slope stability model would be somewhat different in each of these two cases, and summarising procedures will use different parameters in each situation.

It is particularly difficult to understand spatially variable, multidimensional model results using simple summary statistics. Although maps are an improvement, they too are inadequate to the task, particularly if the target audience is not familiar with the underlying science. Decision support models making use of this information have as basic criteria clear, concise, understandable summaries of many types of data. If uncertainty models are to fit into this framework then it will be necessary to place considerable emphasis on communication of the model results. Examples of simple types of communication were offered above. A more extensive discussion of communicating uncertainty in this particular project can be found in Davis and Keller (1997b); however, as discussed in that paper, a considerable research program would be required to implement uncertainty communication in real-world management. The research discussed in this dissertation is but one step in that direction.

3.8. SUMMARY

In this chapter a previously developed uncertainty model has been described, and its application to slope stability modelling on Louise Island, B.C. has been presented. Procedures were developed to address the conversion from a Boolean-polygon to a fuzzy-continuous data model, and to apply this fuzzy model to a typical process modelling procedure.

At this stage, the results are of limited utility due to a lack of procedures for visualising, and therefore enhancing understanding of, the data. The model's spatial constraints are primarily based on information gathered from semantic import (SI), so there is also a lack of proof that the model's inputs actually describe uncertainty correctly. Furthermore, at this stage there are no data to determine if the prediction of uncertainty in the model outputs is actually correct; only with extensive landslide data could this be addressed. The following chapter focuses on the second of these problems: evaluating the parameters of the uncertainty model obtained through semantic import and other secondary methods.

Chapter Four

Verification of Model Inputs

4.1. INTRODUCTION

The uncertainty model described in Chapter Three uses parameters obtained from both numerical analysis of resource attributes and semantic import of expert opinion.
The metadata available for the modelling procedures described are somewhat typical of resource modelling in general: metadata are not gathered during resource surveys, and so must be extracted through analysis or estimated from other sources (e.g., Burrough 1989; Livingstone and Raper 1994). Only in highly controlled studies is a wealth of metadata likely to be available.

In typical resource modelling, data derived from secondary sources are usually subjected to some type of procedure to confirm their utility in the current modelling scenario. For example, in standard slope stability modelling, the slope values derived from photogrammetry might be spot-checked in the field to determine their accuracy. However, one of the principal problems with studies of uncertainty in resource models is the difficulty in obtaining these confirmatory data.

In essence, uncertainty modelling uses additional data (either retained or from new sources) to increase knowledge about the potential variability of databases, modelling procedures or decision models. However, the use of an inappropriate uncertainty model or propagation procedure can lead to under- or over-estimating this variability. Such mistakes can potentially be as significant as ignoring uncertainty altogether. Even if the appropriate procedures have been used and the estimates of variability make sense, there is no easy way of confirming this fact.

There are two principal areas where confirmatory procedures are currently lacking: a) fuzzy numbers, and b) classification uncertainty.

Fuzzy Numbers: There is some difficulty in understanding and utilising fuzzy numbers outside of database manipulations. For example, what does a 0.7 certainty factor for a sandy/silty morainal soil actually look like? In theory, it refers to a sample that is 'somewhat like' the ideal class. Membership values make sense in manipulating data; however, confirming this number with a sample is more difficult. There are no established procedures to compare such a fuzzy classification with a confirmatory sample.

Classification Uncertainty: Fuzzy classes and fuzzy classification methods are utilised in many resource management disciplines. However, the majority of the effort in resource management uncertainty analysis involves either creating fuzzy classifications from a series of samples (e.g., remotely sensed images), or assigning fuzzy class memberships to samples based on a training dataset. When faced with the typical situation of existing definitions of class structure and existing polygon-based resource databases, fuzzy class membership routines do not necessarily make sense. The data used to establish these classes are not available at that point. In comparing samples with classes one is faced with a 'black box', where attribute I of sample A falls between parameters b and c, and so sample A belongs to class X (and only class X). There is often insufficient information to support the notions of class 'purity' required in fuzzy classification.

In addition to these two problems, there is also the issue of tuning an uncertainty model. In standard discrete natural resource models, confirmatory sampling might be used on a random (or systematic) basis to determine if polygons were classified correctly. Parameters could then be redefined to fit reality. In a standard distributed model, one might perform transect samples to determine if the spatial structure of the attribute(s) being modelled is accurate.
For example, a transect between two forest stands could establish if the spatial distribution of species between the stands matches the model (e.g., gradual change or abrupt change). However, in a distributed uncertainty model this transect involves a series of changing membership values. If we focus on a complex system such as soil class—which is based on multiple attributes—the model might indicate that "a sample at point (x,y) should belong to class 'gravelly, silty fluvial' with CF (certainty factor, also known as fuzzy membership value) of 0.7, and class 'sandy morainal' with CF of 0.5". There are no established methods for sampling such a transect and then comparing it with these modelled values. Fuzzy classification systems provide a starting point; however, their assumptions do not necessarily apply.

This chapter addresses this issue of sampling in order to calibrate an uncertainty model. The methods explored are extensions to existing fuzzy classification techniques, adapted and expanded to address confirmatory sampling. Existing techniques are reviewed, new extensions are developed, and the implications for uncertainty modelling in general are addressed. A subset of the techniques is then applied to the model developed in Chapter Three. Samples taken within the study area are used to re-calibrate the most crucial parameters of the uncertainty model. The differences between the new data models and the originals are then discussed (implementation of the data models in the process model and comparisons with the original will occur in the following chapter). The sampling and allocation issues discussed herein have considerable relevance to uncertainty models in general, particularly those utilising expert opinion as input.

4.2. BACKGROUND

The principal questions addressed in this chapter are: 1) how can fuzzy classification structures be compared with confirmation samples? 2) how well did expert opinion function as an input to generate the distribution of uncertainty represented by the fuzzy structures (the transition corridor model)? 3) how well does metadata gathered from published statistics represent the actual uncertainty on the ground (focusing on major model inputs)? and 4) how can these confirmation data be used to recalibrate the model?

Although most of the physical effort involved in answering these questions is concentrated on numbers two and three, it is the first question that consumes most of this chapter. Fuzzy classes represent a unique and often highly appropriate way of looking at the world. However, when the focus is on specific attributes, such classes are also a considerable abstraction. Given this focus on fuzzy classification, the first point of business is to delve into the topic in greater detail than provided in the introduction to fuzzy sets in Chapter Two.

4.2.1. FUZZY CLASSIFICATION

The model utilised in this work is based in part on the theory of fuzzy sets (see §2.3.1.2.4. for background details). Four main application areas of fuzzy sets have appeared in resource analysis. These are:

1. Fuzzy rules: Rather than encoding the steps in decision making as a series of IF-THEN statements, fuzzy rules are a set of parameters that are applied all at once, and the decision is made through a weighting system. This more closely emulates the human decision process, and is the most common application of fuzzy set mathematics (e.g., Bouille 1992). Fuzzy rule applications are numerous.
For example, the Tokyo subway system uses a braking system based on fuzzy rules. The speed of braking is determined through a fuzzy decision-making process based on simultaneous evaluation of dozens of separate inputs (speed, weight, weather, etc.). In cases such as this, the fuzzy system has been found to provide smoother, more efficient operation than standard computer-assisted hardware. The fuzzy decision-making process can be encoded in hardware, speeding the process by exponential factors.

2. Fuzzy class definitions: Fuzzy classifications allow a blurring between standard classes by defining a class boundary as a function, rather than the hard boundaries of an IF-THEN statement (e.g., Burrough 1989). A sample might belong to two (or more) classes to varying degrees. Figure 2.4 demonstrates how the edges of 'hard' classes are blurred by a fuzzy classification system.

3. Fuzzy queries: Standard spatial database queries involve hard numbers. For example, determining the suitability of an area for agriculture requires queries such as 'IF RAINFALL < 200mm AND DRAINAGE = GOOD THEN...'. Fuzzy queries apply fuzzy set theory to the analysis of spatial data, allowing the fuzzy semantics of a query such as 'What areas are NEAR the river, NORTH of the town and SUITABLE for agriculture?' When humans ask such a question, they are actually setting a series of fuzzy constraints on the query. For example, the word 'near' has different meanings depending upon the scale and the purpose of the analysis; we do not mentally picture a sharp cut-off when using this word. Fuzzy queries serve to translate this type of meaning into an actual spatial data query.

4. Fuzzy classification systems: A variety of methods have been developed to segment complex environmental sample sets into classes. Fuzzy set theory has been applied through algorithms such as fuzzy-c-means (Bezdek et al. 1984), in which classes and sample memberships in classes are determined through iterative minimisation of a fuzzy function. This system has proven useful in several areas, including remote sensing (Du and Lee 1996; Foody 1996) and soil classification (McBratney and DeGruijter 1992).

In the work discussed in this chapter the focus is on using the latter item—fuzzy classification systems—as well as the second item—fuzzy class definitions—to verify the uncertainty inherent in the major inputs to the slope stability model. In essence, it involves defining how far a particular sample is from its modelled class in fuzzy attribute space. In this section, soils are used as the principal example, as forestry data are represented by a much simpler classification system and the other major inputs to the slope stability model are based on cardinal data.

The model discussed in the previous chapter uses fuzzy values to define to what degree a particular point on the ground (actually, a cell in the raster structure) belongs to each of the soil classes. In this case, the fuzzy value refers to how much we expect a ground sample at that site would be like each ideal class. For example, a value of 0.8 for class 1 indicates high similarity, while (in the same cell) a value of 0.2 for class 4 indicates low similarity. We would expect that an average sample taken in the cell would be similar to the ideal definition of class 1, and dissimilar to the ideal definition of class 4. However, the key question is: how is it possible to make this comparison?
In a normal confirmation sampling situation one would gather and analyse samples, classify them, and then compare the modelled class with the sampled class. If the comparison did not fall within a class's parameters, then the cell is deemed misclassified. In a more complex situation the classes might be defined using a fuzzy system (we are now referring to 'fuzzy class definitions'—number two in the above list). In this case, small differences such as the sand content falling marginally outside the class bounds would not disqualify the sample. The comparison between the sample and the class would not be a yes or no, but a fuzzy number—a degree of belonging.

However, the definition of a class typically involves a number of different attributes. This comparison must therefore summarise how the sample and the class compare relative to all of these class components. If the comparison is made using a graphical method, the graph must have as many axes as there are attributes. This is termed p-dimensional attribute space, where p is the number of attributes used to define the class. Figure 4.1 illustrates a simplified view of this space, showing just two attribute axes, three classes and two samples. Note how the fuzzy class definitions blur the class boundaries. Essentially, the uncertainty model discussed in the previous chapter has generated a prediction for this fuzzy number. The purpose of the verification is to see if the number corresponds with reality.

Figure 4.1. Simplified (p=2) view of p-dimensional attribute space, fuzzy classes and samples.

Soil systems are particularly suited to fuzzy classification. As Fridland (1974) notes (quoted by Odeh et al. 1992:506): "in terms of classification, the soil cover is liable to be either continuous (with gradual transitions between soils, though closely related soil forms) or discrete (with sharp transitions between soils and very dissimilar neighbouring soils)." This complexity is apparent at all scales of soil analysis (Webster 1985). Soil science was one of the first natural resource-based applications of fuzzy classification (e.g., Burrough 1989), and this discipline continues to be a favourite area of application for these techniques.

In the sections that follow, two terms will be used to address uncertainty verification: classification and allocation. 'Classification' refers to building a new set of classes based on detailed data (e.g., a series of samples using cardinal values), while 'allocation' refers to fitting a new sample (with its cardinal values) into previously defined classes. The term 'allocation' is the more correct of the two in this context, as classes will have already been defined when verification commences. However, the methods available for allocation are all drawn or extended from classification procedures; therefore, much of the preliminary discussion below will focus on this latter term.

There are a number of possible ways of setting up classes or deciding to which class a new sample belongs. Most of these methods assign one class (and only one class) to a particular sample. There are, however, several techniques that allow multiple class memberships. The following sections will introduce or elaborate on classification methods that, in the process of setting up classes, utilise some type of multiple class technique that is potentially useful in the process of allocation.
4.2.2. MAXIMUM LIKELIHOOD

The maximum likelihood (ML) classification algorithm is a decision rule that assigns a set of measurements to a class based on probability. It is commonly used as a supervised classification procedure in remote sensing, yet it is equally applicable to assigning a sample to a class in attribute space. The ML rule is normally used to assign a pixel/sample X to a single class; however, if the decision stage is removed, a series of probabilities can be assigned to X indicating the probability of membership in every class. The ML classification algorithm is as follows:

Decide that X is in class c if, and only if, pc > pi, where i = 1, 2, 3, ..., m possible classes   (4.1)

and

pc = [−0.5 loge(det(Vc))] − [0.5 (X − Mc)ᵗ Vc⁻¹ (X − Mc)]   (4.2)

where Mc is the mean measurement vector (i.e., the set of measurements in attribute space to class centroids), Vc is the covariance matrix of class c, and det(Vc) is the determinant of the covariance matrix (Odeh et al. 1992). To retain probabilities for all classes (rather than create hard boundaries) the first decision rule is removed.

In remote sensing classification the primary problem with this routine is the assumption that each class has an equal probability of occurring in the terrain. If this is not the case then the decision rule can be changed by weighting each class by its a priori probability.

There are two primary problems with this algorithm when applied to the allocation of new individuals to existing classes. First, the ML routine assumes a Gaussian distribution of all statistics. This assumption is sensible for remote sensing applications, since in supervised classification the training sites may be chosen with this restriction in mind. However, there is no indication that all attributes of soil or forestry classes are distributed this evenly; indeed, there is some evidence otherwise (Odeh et al. 1992; discussed in the following section). Secondly, the assumptions of 'probability' are not the same as fuzzy 'possibility'. Though similar in notation, the two are considerably different in application: the former deals with yes/no answers, while the latter deals in similarity. For a full discussion of the difference the reader is referred to Yoshinari et al. (1993) or Bezdek (1992).
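With the decision rule removed, Equation 4.2 can be evaluated for every class. A minimal sketch, assuming NumPy vectors and per-class covariance matrices supplied by the caller:

import numpy as np

def ml_scores(x, means, covs):
    """Equation 4.2 for every class, with the Equation 4.1 decision (the
    argmax) left to the caller so that a value is retained for all m classes.
    x: (p,) sample vector; means: list of (p,) class mean vectors;
    covs: list of (p, p) class covariance matrices."""
    scores = []
    for M, V in zip(means, covs):
        d = x - M
        scores.append(-0.5 * np.log(np.linalg.det(V))
                      - 0.5 * d @ np.linalg.inv(V) @ d)
    return np.array(scores)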
4.2.3. CONTINUOUS CLASSES - FUZZY CLUSTERING

Starting with the work of Ruspini (1969), Dunn (1974) and Bezdek (1974), several methods for constructing continuous classification systems have been developed, in which the reduction to a single class membership per sample does not (necessarily) occur. Collectively, these methods are referred to as fuzzy clustering. The most popular and well-studied method is known as the fuzzy-c-means or fuzzy-k-means algorithm (the two are identical but use different notation). Fuzzy-c-means is a direct generalisation of hard-k-means (Hartigan 1975). This method is based on minimising the within-class sum-of-squares error function. The details of both methods are presented in Appendix A.

4.2.3.1. BACKGROUND - FUZZY CLUSTERING

Fuzzy clustering was developed for geo-statistics and soil science due to problems encountered in restricting class boundaries to regions (in attribute space) with a small probability density. The gradual changes found in reality were poorly represented by hard classes. For example, individuals could be very close to each other in all attribute values but be split into different classes due to hard-and-fast rules. Figure 4.2 details some of the possible distributions of soil samples (individuals) in simple 2-D attribute space; hard boundaries are only useful in some of the cases.

Figure 4.2. Notional distribution of individuals in attribute space: (a) hierarchical, (b) clusters with directed lines, (c) weakly clustered, (d) equal density (Source: McBratney and DeGruijter 1992).

Although ordination methods (e.g., principal components analysis or multidimensional scaling) offer ways to represent data using a continuous model and a simplified dimensional space, these methods are less than ideal for non-linear class structures such as those found in soil analysis (McBratney and DeGruijter 1992). Fuzzy clustering using continuous classes is better suited to non-linearity.

Continuous classes provide better representation of individuals located interstitially between classes (intergrades) than do standard discontinuous classes. Instead of trying to expand the nearest class to include them (or simply calling them 'exclusions'), intergrades are given partial membership in all nearby classes based on the distance in attribute space to each class centroid (Figure 4.3). The key difference between partial (or fuzzy) memberships and other types is this series of memberships: a particular sample is not locked out of all other classes once its maximal membership value is determined. As with other fuzzy values, a particular class membership indicates to what degree the sample is 'like' the idealised or chosen class, rather than a probability of membership.

Figure 4.3. Hard classes and continuous classes. In (a), the interstitial sample belongs to no class, while in (b), its membership is defined based on centroid distances.

The concept of using a distance metric to allocate an individual sample in attribute space is demonstrated in Figure 4.4. The spheres around the classes represent the boundaries used in 'hard' classification to exclude all others; in fuzzy classification, such boundaries become gradients.

Figure 4.4. Mahalanobis distance in a 3-D attribute space. The spheres indicate Boolean class divisions that are discounted in fuzzy clustering.

The classification algorithms (such as those discussed in Appendix A) do not define any specific method of calculating the distance to the class centroids. This distance can be calculated using several possible metrics. One option is the Euclidean norm, which gives equal weight to all axes and ignores any dependencies among them. For example, in a two-attribute soil class, the distance from a new sample to the class centroid would be measured using a simple Pythagorean equation, as demonstrated in Figure 4.3. Additional attributes would simply increase the number of dimensions in the calculation. However, this simple concept of a circular (2-D), spherical (3-D) or hyper-spherical (>3-D) class would require that all attribute variables be linear. It is possible to normalise the various axes in order to approach the spherical class shape, but non-linearity will distort the imaginary spherical class. In fact, studies have shown that such regularly shaped classes are a rare occurrence when modelling soil systems (Odeh et al. 1992). Non-linearity in variables will require a class shape that might be termed a 'hyper-polygon'. In Figure 4.5 the concept of a hyper-polygon class is demonstrated in three dimensions.

Figure 4.5. Classes viewed as structures in (A,B,C) attribute space.
Classes are not necessarily spherical groupings (e.g., yellow class); they may be represented by more complex objects, here termed 'hyper-polygons' (e.g., red class).

The other problem with the Euclidean norm is that, in practice, many of the axes will be dependent on one another to some degree. Therefore, in most cases a more appropriate distance metric is the Mahalanobis norm. This metric is utilised in many natural resource studies (e.g., Abel et al. 1992; Leese and Main 1994). It is capable of compensating for a non-spherical shape of the class in attribute space and, additionally, can account for dependencies in the variables. It makes use of the pooled within-classes variance-covariance matrix:

d(xi, c) = [(xi − c)ᵗ Σ⁻¹ (xi − c)]^½   (4.3)

where xi is the vector of attributes, c is the vector of centroids, and Σ is the variance-covariance matrix (Odeh et al. 1992).

4.2.3.2. APPLYING FUZZY CLUSTERING TO CONFIRMATORY SAMPLING

Functionally, the work discussed in this chapter differs from the fuzzy clustering derived from fuzzy-c-means algorithms. We are not interested in defining classes; these were defined during the original soil survey. The focus is instead on allocating new samples (individuals in attribute space) to existing classes. The methods will draw on the algorithms described above; however, classification (in the sense of defining classes) is no longer the issue, so iterative procedures are not necessary.

There has been relatively little research performed on alternative methods of allocation. Some of the basic methods are reviewed by Sneath (1979), Payne and Preece (1980) and McBratney (1994). These authors note three primary methods of allocating new individuals to existing classes: diagnostic keys, diagnostic tables, and distance in attribute space.

Diagnostic Key: Keys force a user to make a sequence of tests, each having different possible outcomes. After a series of tests the unknown individual will be fitted into a known class. Keys make use of a tree structure of decision making, although the tree can have a variety of topologies. Keys are normally restricted to standard 'hard' classification systems such as a standard soil taxonomy.

Diagnostic Table: This is a two-way table used for identifying the class of an unknown individual. For example, the rows would represent the class and the columns the range of attribute values required. An unknown may belong to more than one class, or none. This system is often used in biology for nested classification systems (e.g., order, group, subgroup).

Distance in Attribute Space (taxonomic distance): These methods use some measure of the distance from the individual to a class centre in attribute space. Methods such as discriminant analysis and pattern recognition can be considered part of this group. For example, neural network-based methods (e.g., Skidmore et al. 1997), though they use a network structure rather than points in attribute space, allocate individuals based on example rather than a set of predetermined rules.

A fuzzy classification system such as fuzzy-c-means (and its extensions) utilises this type of distance measure as an integral part of the analysis. The Euclidean or Mahalanobis distance metrics, minus the iterative steps, can be used to determine the distance in attribute space.
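A sketch of this third method, using Equation 4.3 with a pooled variance-covariance matrix (NumPy assumed; class names and centroid vectors are supplied by the caller):

import numpy as np

def mahalanobis(x, c, pooled_cov):
    """Equation 4.3: attribute-space distance from sample x to centroid c."""
    d = x - c
    return float(np.sqrt(d @ np.linalg.inv(pooled_cov) @ d))

def allocate(x, centroids, pooled_cov):
    """Distance to every existing class; no single-class decision is made,
    so all candidate memberships are retained for the allocation step."""
    return {name: mahalanobis(x, c, pooled_cov)
            for name, c in centroids.items()}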
4.2.3.3. STRUCTURE OF THE CLASSES IN ATTRIBUTE SPACE

The algorithms used in classification procedures group the samples (individuals) in a wide variety of ways. However, once classes are established (even as an intermediate step in an iterative procedure) the most common method of measuring how 'close' an individual is to a particular class is to use the class centroid as a target (as illustrated in Figure 4.4). This makes perfect sense when shuffling centroids in the iterative classification process, particularly when variances are equalised and classes are somewhat spherical. However, in the process of allocation—particularly regarding soil or other non-linear sets of attributes—the centroid may not be the best target for a quantified degree of membership.

In a simple example using spherical class structures, Figure 4.6a shows a new individual located in an intergrade position between homogeneous classes. Though it is located equidistant from each centroid (and therefore has equal memberships in both classes using Euclidean or Mahalanobis measures), it is clearly 'closer' to class A than class B. Whether this fact is of functional utility depends upon the application.

Figure 4.6. (a) A new individual at an intergrade position between classes A and B is given equal membership if MD values are the same, despite the variations in class size. In (b), MD is smaller for class C, but the fuzzy structure of the classes indicates that D may be the better choice (the dotted line represents the location of the class bounds if the fuzzy classes were to be 'hardened').

In another example (Figure 4.6b) the intergrade individual is located between two classes with fuzzy attribute definitions. It is clearly closer to the centroid of class C; however, class C is defined internally with a gentle membership slope. It may be more appropriate to assign the individual to class D due to its higher internal 'density' (the dotted line indicates where the class bounds would be located if the fuzzy classes were 'hardened').

Another issue is the classification system itself. Using soils as an example, standard classification uses a general-purpose taxonomy where soils are divided into classes based on many different attributes. When a classification subset is used for a specific purpose such as agriculture or slope stability modelling, only certain attributes may be of interest. The classes thus defined may have no 'pure' centre. There may be no ideal combination of attributes that defines the perfect 'sandy, morainal blanket'; the definition is simply a range of values (with or without a fuzzy boundary). The 'pure' centroid as a target makes little sense in this situation.

Yet another problem with centroids is the non-linearity of some environmental classification systems (notably soils). In spatial data analysis the spatial centroid of a polygon is only an appropriate summary device if the polygon is regular. Various distortions in polygon shape can lead to a centroid that is highly inappropriate (Figure 4.7). A similar situation exists with hyper-polygons in attribute space. The class centroid may, therefore, be a poor measure of centrality.

Figure 4.7. Centroids are not an ideal measure of polygon location when polygons are non-circular.

What is clearly required is a more complex measure of 'belonging' to a class than a simple centroid. However, the sample that we are trying to allocate may also be represented by more than a simple point in attribute space. The nature of the sample will also determine what methods are required.
4.2.3.4. NATURE OF THE SAMPLE

To reiterate, simple measures that generate classification or allocation statistics for a sample in attribute space may be insufficient due to complicated class structure. A more complex method may also be required due to the nature of the sample itself. This individual (the sample) may not simply be a point in attribute space, but instead a region of higher geometry (i.e., a hyper-spheroid or polygon rather than a point). The nature of this region is determined by two major factors: 1) sample uncertainty, and 2) the nature of continuous models.

Sample Uncertainty: The single measurement of each attribute represented by a point sample would normally appear as a zero-dimensional point in attribute space. This is the standard way of dealing with samples in most classification schemes, including those that incorporate fuzzy clustering. However, the precision of the tools used, the possible errors in laboratory analysis, and the nature of the attribute being measured all contribute to sample uncertainty (see §2.2.3.1. and §2.2.3.3.). This uncertainty in essence blurs the sample point in attribute space. This will complicate any measure of attribute distance to a class, for the sample itself may be represented by a region or by fuzzy boundaries.

Continuous Model: The second problem is the resolution of a continuous model. A raster-based model of a continuous data layer (a common method of representation) has a specific cell size that defines the model's resolution. In a remote sensing application the reflectance of a pixel is an average of all occurrences within it (and some from neighbouring pixels). In a raster data model the value of a cell is normally the 'typical', most common, or effective contents of the cell (e.g., a raster model of transportation might assign a cell a value of 'road' even if only 10% of the cell contained a road). There are a number of other methods of determining how a cell should be coded, based on what is inside the cell or what its neighbours contain (for a full listing see Chrisman 1997). On the ground, a simple point sample within this cell would not be an appropriate method of testing a model; rather, a series of samples, or a sample in a typical location within the cell, would be more sensible. Such a sampling scheme would generate something more than a point in attribute space. In extreme cases, where variability is high or inclusions are common (e.g., soil inclusions, heterogeneous forests), a sample hyper-spheroid or polygon would be appropriate. In Figure 4.8 a sample hyper-spheroid with three classes in attribute space is demonstrated. Note how the centroid-to-centroid membership measures become increasingly inadequate as the complexity of the situation increases.

Figure 4.8. A sample hyper-polygon (lower right) and three classes in attribute space with Mahalanobis measures between their centroids.

4.2.3.5. METRICS AND MEASURES FOR MEMBERSHIP VALUES

If the data are available, the situation in attribute space can be modelled in quite complex ways. The metrics used to make measurements can be based on the Euclidean or Mahalanobis types discussed herein, or could also be based on numerous others generated in fields such as remote sensing, pedology, biology, and most other natural sciences (e.g., Manhattan or Minkowski metrics). The measures and the axes being measured may be complicated by transformations such as those used in principal components analysis.
The classes could be defined by anything from a black box with rigid boundaries to a hyper-polygon composed of a complex function. Samples could be points, blurred points, or equally complex hyper-polygons with discontinuous structures. Therefore, even when using a relatively simple metric such as Mahalanobis, the measurements that are used to characterise class memberships may not be optimal if the standard centroid-to-centroid method is used. There are a number of alternatives, the choice of which would depend upon the amount and type of data available, the nature of the feature being modelled, and the purpose of the modelling. In the case of soil modelling these might include a) class boundary inclusion, b) alternative vector measures, and c) various combinations.

Inclusion of class boundaries: The boundary of a class in attribute space provides additional information regarding a sample's potential membership in that class. Depending on the class model utilised, the boundary may be either a rigid yes/no line or a function that tapers off. The rigid boundary provides an obvious measurement point; the tapering boundary has many. In the case of classes that differ in size (on one or more attribute axes) the use of the class boundary is a more accurate measure of relative membership than the centroid (Figure 4.6). When the class shape does not approximate a hyper-spheroid it may also be a more accurate measure. Measurement to a hard boundary could be accomplished as follows. Each class is represented by a membership function, which (for comparison) in hard partitioning is in the form of:

mA(x) = 0 for x < bmin or x > bmax   (4.4a)
mA(x) = 1 for bmin ≤ x ≤ bmax   (4.4b)

where b is the set of values defining the class, and mA(x) is the 'grade of membership' of x in A (either a yes or a no when using hard partitioning). For one attribute, the distance to the border is simply the minimum of |x − b|. If x is within the bounds of the class then the distance equals zero. The grade of membership mA(x) will be defined as the inverse of this distance function. With multiple classes the Euclidean distance function becomes:

d = Σi=1..p [min(|xi − bi|)]²   (4.5)

where p is the number of classes, x is the sample value and b is the boundary location for the class. Adding the covariance matrix to the equation would incorporate the Mahalanobis distance.

In the case of continuous or fuzzy classes, the class membership function is more complex than defined in Equation 4.4. A typical fuzzy class membership function might be defined as (Burrough 1989):

mA(x) = 1 / [1 + a(x − c)²]   (4.6)

where a is a parameter governing the shape of the function and c defines the value of the property at the centroid. In this case the distance function will be more complex, as a cutoff value is required to 'harden' the boundary. In any case, the resulting number will reflect the distance in attribute space from the class; however, it will not take into account the continuous nature of the class. For this, a more complex measure will be required.
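A sketch of the hard-boundary measure for a single class defined by per-attribute bounds. Equation 4.5 in the text aggregates across classes; here the per-attribute shortfalls for one class are summed, and 1/(1 + d) is one convenient (assumed, not prescribed) inversion of the distance into a membership grade.

import numpy as np

def boundary_distance(x, b_min, b_max):
    """Distance from sample vector x to the nearest point of a hard class
    'box' in attribute space; zero when x lies inside on every attribute.
    x, b_min, b_max: (p,) arrays, p = number of attributes."""
    shortfall = np.clip(b_min - x, 0.0, None)   # below the lower bound
    excess = np.clip(x - b_max, 0.0, None)      # above the upper bound
    return float(np.sqrt(((shortfall + excess) ** 2).sum()))

def hard_membership(x, b_min, b_max):
    """Grade of membership as an inverse of distance; 1.0 inside the class,
    as in Equation 4.4b."""
    return 1.0 / (1.0 + boundary_distance(x, b_min, b_max))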
Vector Measures: If the class, the sample or both are considered fuzzy sets, then metrics for fuzzy set distance are appropriate. Several authors have developed such metrics for the measurement of physical distance (Preparata and Shamos 1985; Altman 1994). The metric developed by Altman (1994) returns a fuzzy set as a measure of the distance between two fuzzy regions. These authors focus on raster datasets, and so are actually dealing with stepped, or discretised, continuous values. The sets being compared do not have infinite memberships, as might be found in a truly continuous structure (e.g., object-oriented data structures with fuzzy sets defined by functions). This restriction simplifies calculations considerably, and is appropriate for the current study.

Preparata and Shamos (1985) note that there are several distance metrics available to measure inter-group distance, such as the Mahalanobis, Euclidean, Manhattan, and Minkowski Lp metrics. However, as noted above, one of the problems with a predefined class is the difficulty in utilising a centroid or group mean, due to non-linearity in the attributes (attribute space) or problems in assuming normality. These authors make use of a nearest-neighbour metric. Altman (1994) extends this metric to return a fuzzy set from the calculation. The nearest neighbour fuzzy distance metric (NNFD) between regions A and B is defined as:

dist(A, B) = ∪_{(a,b) ∈ A×B} min(μ_A(a), μ_B(b)) / d₂(a,b)   (4.7)

where d₂(a,b) is the distance between elements a and b using one of the metrics defined above, and the fuzzy-set notation μ/e assigns membership grade μ to element e (Altman 1994). This NNFD metric results in a fuzzy set of distances and membership values. These values may be summarised graphically or 'hardened' in an application-specific manner.
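For the discretised raster case this metric can be computed by simple enumeration. The sketch below is an illustrative implementation (not Altman's original code): it pairs every non-zero-membership cell of the two regions and records a (distance, membership) pair for each, yielding the discretised fuzzy set of Equation 4.7.

```python
import numpy as np

def nnfd(region_a, region_b, cell_size=1.0):
    """Nearest-neighbour fuzzy distance (Equation 4.7): returns the fuzzy
    set of distances as a list of (distance, membership) pairs, where
    membership is min(mu_A(a), mu_B(b)) and distance is Euclidean d2(a, b).
    Inputs are 2-D arrays of membership values in [0, 1]."""
    pairs = []
    for a in np.argwhere(region_a > 0):
        for b in np.argwhere(region_b > 0):
            d = cell_size * np.hypot(*(a - b))          # Euclidean d2(a, b)
            mu = min(region_a[tuple(a)], region_b[tuple(b)])
            pairs.append((d, mu))
    return pairs
```

Plotting distance against membership for every pair reproduces graphs of the kind shown at the bottom of Figure 4.9.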
For example, in Figure 4.9 the nearest neighbour fuzzy distance results for a simple set of two classes and a new individual are illustrated. The area represented by the 9 x 9 raster map is a simplified two-dimensional attribute space (all three raster maps represent the same attribute space—the objects are separated out for clarity). The object in the first view is a class, but defined using a fuzzy representation rather than hard bounds (a hard-bounded class would be represented by all ones). In the second view a second, smaller class is shown. The third view shows a fuzzy sample. The two graphs on the bottom show the resulting distance from the sample to each of the classes. Instead of the single value that would result from a standard distance measure, each graph contains a fuzzy set of values. This set represents the distance from all of the sample to the entirety of the class. For example, the values above zero in class two (top middle raster) are contained within nine cells. The 'new individual' (sample) is contained within four cells. The graph on the lower right contains points that describe the physical distance between each of the four sample cells and the nine class cells (horizontal axis) and also describe the fuzzy overlay value, min(a,b), for the two (vertical axis).

Figure 4.9. Sample to class distance defined using fuzzy sets in a simplified 2-D attribute space.

Other Measures: A variety of combinations of the above measures are possible. For example, the distance from a sample with variability to a well-behaved (i.e., Gaussian on all axes) class could be accomplished with a vector series of measurements to the centroid, summarised (if appropriate, i.e., normal distribution) using a mean distance and a standard deviation. For example, if the second graph in Figure 4.9 was represented by a line around the boundary (and the raster was smoothed at a higher resolution), the result would be approximately a normal curve that could be represented by summary numbers (as long as the class was 'well-behaved'). Weighted measures would be appropriate when the various axes (attributes) contribute to the class definition to greater or lesser degrees. Although normalising of the attribute axes deals with numerical and number scale differences, there should be a way of de-emphasising non-crucial attributes in a similar manner to weighted fuzzy classification rules.

4.2.4. SUMMARY OF THEORETICAL WORK

The sections above have introduced the concepts of fuzzy classification systems and the allocation of new members to existing sets using a variety of methodologies. Several possible distance metrics were introduced. Existing methods of fuzzy allocation based on centroids of classes were extended to include recognition of class boundaries using both hard and fuzzy classification functions. Several other methods were discussed that allow a fuzzy sample and a fuzzy class to be compared, resulting in either scalars or new fuzzy vectors.

4.3. APPLICATION TO PARAMETER VERIFICATION AND TUNING

The distance metrics described in the previous sections are primarily useful in allocating new individuals in a fuzzy classification scheme. Application of one or more of these metrics will allow an uncertainty model to be verified using confirmatory sampling procedures.
This is an important step in determining if the model adequately represents reality (adequate for the purpose of the model). In this section an appropriate distance metric is chosen and used to allocate new samples into the fuzzy classification scheme introduced in the previous chapter. The samples and allocation procedures allow the spatial structure of the uncertainty to be verified. In some uncertainty models the structure may have been inferred from other studies, or given some default smoothing value (e.g., Mark and Csillag 1989). In the scheme developed for the previous research, expert opinions about uncertainty levels and structure were coded using the Semantic Import (SI) Model (see §3.2.1.1). Parameter verification (and subsequent re-tuning) will allow a new uncertainty model to be produced that reflects the actual levels and structure of variability at the site.

4.3.1. SAMPLES REQUIRED

The primary purpose of parameter verification in this context is to verify and update the spatial constraints of the uncertainty model. Classification uncertainty is not included as an explicit part of the model (e.g., fuzzy class functions, boundaries in attribute space, etc.). The classes utilised are considerably simpler than those used in standard soil taxonomy, for they focus specifically on soil cohesion parameters. The classes are predefined, and have rigid boundaries. However, spatial and classification uncertainty constraints create a model of continuous spatial variation of fuzzy membership values. The key elements of the model that are subject to verification are the overall levels of uncertainty (maximum and minimum—referring to the possibility of misclassification) for each of the soil classes, as well as the spatial variation in uncertainty across the boundaries of the original polygonal layer. Sampling will therefore focus on these polygon boundaries as well as on the 'purity' of a class in the centres of the original polygons.

For simplicity of sampling and analysis seven attributes are defined, each of which can be represented by a percentage. The target classes are defined using these seven attributes, based on the original subdivision of a standard soil survey (see Davis 1994 for details on the original survey). The attributes include relative percentages of different grain size classes, as well as values for general origins: morainal, fluvial, etc. Due to the huge number of samples required, values were chosen that could be quickly estimated in the field (after calibrating the estimates; discussed further below).

4.3.2. METHODOLOGY

The purpose here is to verify the spatial structure of uncertainty used as input to the uncertainty model of slope stability. There are two main types of data (as discussed in §3.2): classified (formerly polygonal) data—run through the transition corridor model—and non-classified continuous data. Initially, this verification will focus on the classified data, in particular the soil data, as it is the most important and most variable classified input. The transition corridor model makes use of two data sources: the original polygon locations and expert opinion about classification uncertainty and the spatial structure of that uncertainty. The purpose of this current exercise is to verify the predicted spatial structure and, later, to revise the structure based on the results. The first step in verifying the spatial structure of the uncertainty is to design a ground-sampling scheme that will capture this uncertainty in an efficient manner.
The transition corridor model algorithm is designed to work, as much as possible, at right angles to polygon boundaries. Therefore, the most efficient way of sampling would be along transects that follow this direction. In ideal circumstances the transect locations would be chosen randomly within the study area. The transects would then be oriented to pass directly through polygon centres and to cross their boundaries at right angles. This would represent the most efficient way of verifying the spatial structure of the soil uncertainty model (due to the assumptions of the interpolation procedures; see §3.2.1 and Figure 3.3). Also, a substantial number of transects would be used. However, several practical constraints limited the number of transects and their locations. Transportation logistics limited the available areas to those reasonably close to roads. Time constraints and the high intensity of the sampling required to delineate spatial structure limited the number of transects possible. The extreme and often impassable nature of the terrain made it difficult to travel along an ideal transect line; suboptimal opportunistic transects were often required. The presence of a large number of bears in the area also precluded work in deep forest cover or valley bottoms. These limitations no doubt reduced the efficiency of the comparative procedures (the limitations are discussed in greater detail below).

The data were collected in a representative section of the test site described in Chapter Three (Figure 3.6). A total of 171 samples were collected using four transects, each crossing a number of original polygons (Figure 4.10). Due to the low incidence of certain soil types in the study area, only three of the original ten soil classes are part of the sampling (Types 1, 2 and 3 in Table 3.2). These three comprise 79% of the total study area, and over 92% of the forested land.

Figure 4.10. An overview of transect locations on the Louise Island test site. The circled numbers represent the transect numbers, while the smaller numbers represent soil types for the original polygons (for reference).

Spatial locations of the initial control points for each transect were recorded using GPS co-ordinates. Intermediate stations were surveyed using hand-held compass transits and tapes. To match the resolution of the spatial model, sample locations were separated by 25 metres. Pit locations were chosen based on the typical surficial soil type within the surrounding 25 x 25 metre area. Estimated percentages of other soil type inclusions in the local area were also recorded. The soil was sampled for grain size as a percentage in seven classes, as well as for its relation to three origin classes: colluvial, morainal and fluvial.

There are two potential drawbacks to this type of sampling. First, by matching the resolution of the spatial model (at 25 metres), the samples may suffer from grid mismatching (ignoring spatial uncertainty for the moment), creating a mixed-pixel problem. Smaller sample separations would not eliminate, but would considerably reduce, this problem. Although a separation of 10 metres was originally planned in the sampling scheme, initial test sampling work (prior to beginning the first transect) indicated that 10 metre separations were highly redundant. The spatial resolution of the resource itself is at least 25 metres in the areas sampled. The sample spacing for the actual transects was therefore increased accordingly.
Also, as the target of the verification scheme is the relative structure of the surface (rather than a point-by-point comparison), this mismatching of grids becomes less of an issue; mismatches are subsumed by structural smoothing (discussed in the next section).

The second potential drawback is spatial uncertainty. Tests using a GPS unit in the field indicated that 95% of all readings would fall within ±110 metres of the true location (Appendix B details the method used; note that differential GPS, though preferable, was unavailable in this region). Spatial uncertainty in the original soil data is not known; however, minimum mapping units used in the survey indicate that resolution is on the order of 25 metres, and so well within the bounds of the GPS uncertainty. Therefore, each of the sample 'cells' could be misaligned with the soil model by up to (approximately) four cells distance. Note, however, that this only applies to the entire transect (i.e., a global transformation). Within the transect itself the uncertainty of location is several orders of magnitude smaller (approximately one to three metres—no further quantification was performed due to the small magnitude of this uncertainty relative to the others).

The direction of the transect relative to local polygon boundaries has considerable influence on the spatial uncertainty. In Figure 4.11 (an idealised demonstration), the modelled values (using the fuzzy model illustrated in Figure 3.3) are represented by the shaded squares, while the original polygon boundary is the wavy line. The transect at 90° to the polygon boundary (B) is only effectively misaligned along its length; misalignment parallel to the polygon border would have little effect on the results. However, a parallel transect's (A) misalignment would have the opposite effect. The only area where the parallel transect would be substantially uncertain is in the neighbourhood of the original polygon boundaries because, if this transect were on the wrong side of the line, the relevant interpolation algorithm would be quite different, and so the values might shift drastically. If the parallel transect is located well within the current polygon (i.e., > ~100m from the boundary), the sideways variability would be less significant (although still a factor). Moreover, as demonstrated in Figure 4.11, the primary shift would be in magnitude. The analytical methods discussed below focus on relative change and are less sensitive to differences in absolute magnitude, so the problems of parallel transects are further minimised.

Figure 4.11. Idealised transects and the effects of shifting them within uncertainty bounds. Transect A, running roughly parallel to the original polygon boundary, is minimally affected by shifts along its length, while sideways shifts create changes in magnitude. Transect B, at 90° to the boundary, is significantly affected by shifts along its length, though sideways shifts produce minimal change.

Nevertheless, the issue of transect uncertainty when close to a modelled boundary is an important one. However, it is only relevant to spatial uncertainty issues (which may or may not be related to attribute uncertainty; see Goodchild 1991). Addressing this spatial uncertainty makes the task of analysis considerably more complex.
To address the uncertainty in a truly comprehensive manner it would be necessary to generate numerous (anywhere from 12-25 or more, depending upon the assumptions used) realisations of the model in a manner akin to the Monte Carlo techniques discussed in the previous chapter. However, if suboptimal transects are eliminated, a reasonable solution (i.e., only slightly less accurate and much easier to interpret) is to perturb (i.e., offset) the transect locations only along their length. If the transects are at 90° to the boundary then this shift will (potentially) align the models and the sample if they correspond to some degree. Elements of a transect that parallel a boundary will not be affected by the shift (other than in absolute magnitude) because, in the model, paralleling a boundary generates a (roughly) straight line (Figure 4.11, upper graphs). The methodology used to implement this 'reasonable solution' is described below.

4.3.2.1. CROSS-CORRELATION

For comparison of the sample transect and the modelled values a cross-correlation statistic is utilised. The purpose of cross-correlation is to compare two or more data series and determine the strength of the relation between them. The offset (often termed 'lag' in reference to cross-correlation) at which the two are maximally equivalent can also be determined. Designating n* as the number of positions in the series, and Y₁ and Y₂ as corresponding values from sample and model, the cross-correlation for match position m is (Davis 1986):

r_m = [n* ΣY₁Y₂ − (ΣY₁)(ΣY₂)] / (√[n* ΣY₁² − (ΣY₁)²] · √[n* ΣY₂² − (ΣY₂)²])   (4.8)

The significance test (using standard t-test tables, with df = n* − 2) is:

t = r_m √(n* − 2) / √(1 − r_m²)   (4.9)

A value of 1.0 indicates perfect correspondence, while a -1.0 indicates perfect negative correspondence. Two random, independent series would generate a value of zero. The match position is varied within the bounds of spatial uncertainty, and the resulting set of values plotted in a cross-correlogram (Appendix C).

There is significantly more short-range variability on the transects than is apparent in the model. In order to better represent the general variability of the model, a series of smoothing passes are performed using a moving average equation based on orthogonal polynomials. Initial tests indicated that the relevant range of smoothing would be ±2 pixels to ±4 pixels (larger amounts of smoothing led to decreases in correspondence). Therefore, the smoothing equations utilised are limited to those that function within this range:

Avg(5): Ŷ_t = (1/35)[17Y_t + 12(Y_{t+1} + Y_{t−1}) − 3(Y_{t+2} + Y_{t−2})]
Avg(7): Ŷ_t = (1/21)[7Y_t + 6(Y_{t+1} + Y_{t−1}) + 3(Y_{t+2} + Y_{t−2}) − 2(Y_{t+3} + Y_{t−3})]   (4.10)
Avg(9): Ŷ_t = (1/231)[59Y_t + 54(Y_{t+1} + Y_{t−1}) + 39(Y_{t+2} + Y_{t−2}) + 14(Y_{t+3} + Y_{t−3}) − 21(Y_{t+4} + Y_{t−4})]
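To make the procedure concrete, the following sketch (an illustrative reconstruction, not the software used in the thesis) computes r_m as the Pearson correlation of the overlapping portions of the two series at each match position, applies the Avg(n) weights of Equation 4.10, and assembles a cross-correlogram over the ±4-cell uncertainty range.

```python
import numpy as np

def r_m(y1, y2, m):
    """Cross-correlation at match position m (Equation 4.8): the Pearson
    correlation of the overlapping portions of the two series."""
    n = len(y1)
    a = y1[m:] if m >= 0 else y1[:n + m]
    b = y2[:n - m] if m >= 0 else y2[-m:]
    return np.corrcoef(a, b)[0, 1]

def t_stat(r, n_star):
    """Significance test of Equation 4.9 (df = n* - 2)."""
    return r * np.sqrt(n_star - 2) / np.sqrt(1 - r ** 2)

# Orthogonal-polynomial moving-average weights of Equation 4.10.
AVG = {
    5: np.array([-3, 12, 17, 12, -3]) / 35.0,
    7: np.array([-2, 3, 6, 7, 6, 3, -2]) / 21.0,
    9: np.array([-21, 14, 39, 54, 59, 54, 39, 14, -21]) / 231.0,
}

def smooth(y, window=5):
    """One smoothing pass (series ends are zero-padded for brevity)."""
    return np.convolve(y, AVG[window], mode="same")

def correlogram(sample, model, max_lag=4, window=5):
    """Cross-correlogram within the bounds of spatial uncertainty."""
    s = smooth(np.asarray(sample, dtype=float), window)
    mo = smooth(np.asarray(model, dtype=float), window)
    return {m: r_m(s, mo, m) for m in range(-max_lag, max_lag + 1)}
```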
To further clarify this issue, a set of idealised cross correlograms is illustrated in Figure 4.12, with the correlograms on the left, and the data they are based on placed on the right. The two values being compared are a modelled transect of an attribute and a measured transect of the same attribute. In the first pair (a), the model and the measured values line up exactly. This is reflected in the cross correlogram by a steep rise to a value of 1 at 0 (zero) offset. In (b), the modelled and measured values are similar, although the amplitude is different. The cross correlogram gives the same results, as it is relatively insensitive to amplitude variations. In (c), the measured value is the exact opposite of the modelled value. Here, the correlogram is reversed, showing -1 at offset zero, indicating perfect negative correspondence. In (d), modelled and measured are similar in pattern, but are offset from each other by 2 distance units. This is reflected in the cross correlogram by a steep rise at an offset of +2. In example (e), a more complex situation is pictured. Although the two are in perfect correspondence—generating a steep rise to 1 on the left graph—they would also be in perfect correspondence if the entire 'measured' curve were to be shifted left or right by 6 distance units. This offset correspondence gives rise to the peaks in the correlogram at +6 and -6. The correlogram also dips at +3 and -3, because an offset of +3 or -3 would give rise to high negative correspondence. In reality, the situation is rarely this unambiguous. For example, in (f) the effect of offsetting the modelled values would be minimal; therefore the correlogram shows high correspondence at all offsets. There is little information regarding pattern correspondence that can be gained from this graph.

The purpose of smoothing is to eliminate short-range variation that causes a reduction in correspondence between measured and modelled values. In Figure 4.12 (g) and (h) the effects of this smoothing are demonstrated. However, smoothing is not automatically called for. In cases where short-term variability is the norm, smoothing would decrease overall correspondence.

4.3.2.2. DATA SUMMARY

The Mahalanobis distance (MD) metric is used to determine the fuzzy allocation statistics for each sample. In this instance the centroid of each class is used as a target. While this situation is not ideal, a lack of information regarding the detail of class structure makes it necessary. Due to the generalisations involved in simplifying the original soil classes, considerable overlap in class attribute space is expected. A continuous class structure is therefore utilised, with the fuzzy value φ = 2. This is the standard value utilised as a first approximation in fuzzy-c-means implementations (Appendix A explains the significance of this value). Variations in this value have been explored by Odeh et al. (1992) (also see Appendix A). Ideally, this allocation procedure would take place in concert with the primary soil survey. In that case the detailed class definitions would be readily available from earlier calculations.

The MD values assigned to each sample are then inverted (distance is the effective opposite of fuzzy membership), normalised (using the maximum value in the dataset) and scaled (globally) for comparison with the fuzzy membership values assigned by the uncertainty model. Scaling is not strictly necessary, as the cross-correlation procedures ignore magnitude; however, scaling assists in visual comparison of the model and sample datasets. An example of the original data and calculated allocation values is shown in Table 4.1.

Table 4.1. Calculation of the Mahalanobis distance for one sample.

             BR    Cob   Peb.  Sand  Silt  Fluv.  Mor.  Coll.
Class 3       0     0     30    20    70   100     0     0
Weighting   0.17  0.17  0.17  0.17  0.17  0.05   0.05  0.05
Sample #17    0     0     20    40    40     0   100     0

Membership = √(d_A² · w_A + d_B² · w_B + …), where d = sample-to-centroid distance and w = weighting of attribute
           = √((30−20)²·0.17 + (20−40)²·0.17 + (70−40)²·0.17 + (100−0)²·0.05 + (100−0)²·0.05) = 35

Normalise, scale and invert: (Max − Value)/Max × Scale Factor = (66 − 35)/66 × 0.8 = 0.37

where 'Max' is the largest membership value in the dataset and 'Scale Factor' is determined by visually lining up result graphs (it has no significance in cross-correlation statistics).
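The Table 4.1 worked example can be reproduced in a few lines. The sketch below simply restates that calculation; the value 66 for 'Max' and the scale factor 0.8 are taken from the table itself.

```python
import numpy as np

# Attribute order: BR, Cob, Peb., Sand, Silt, Fluv., Mor., Coll.
weights   = np.array([0.17, 0.17, 0.17, 0.17, 0.17, 0.05, 0.05, 0.05])
class_3   = np.array([0, 0, 30, 20, 70, 100, 0, 0])
sample_17 = np.array([0, 0, 20, 40, 40, 0, 100, 0])

def weighted_distance(sample, centroid, w):
    """Weighted sample-to-centroid distance of Table 4.1."""
    return np.sqrt(np.sum(w * (sample - centroid) ** 2))

def allocate(distance, d_max=66.0, scale=0.8):
    """Normalise, scale and invert the distance into a membership value."""
    return (d_max - distance) / d_max * scale

d = weighted_distance(sample_17, class_3, weights)
print(round(d), round(allocate(d), 2))   # -> 35 0.37
```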
4.3.3. RESULTS

The resulting cross-correlograms for the transects are displayed in Appendix C, as are calculated significance tests of cross-correlation at the 5% level. Smoothed values are also calculated in order to determine if large-scale variability is affecting the correlation coefficients. In most cases the smoothed values perform better than the raw data. The cross-correlograms summarise a great deal of information, and are used here to develop general observations about the behaviour of the transect samples relative to the model.

The transects are individually summarised below. Each of the paragraphs refers to the soil types defined in Table 3.2, where Type 1 = sandy/silty morainal blanket, Type 2 = silty morainal blanket/veneer, and Type 3 = rubbly, silty colluvium/morainal. In the discussion, a 'positive correlation coefficient for Type 1' would refer to a strong positive correlation between a) the Mahalanobis Distance values for Type 1 based on samples along the transect, and b) the values generated from the uncertainty model. The 'spatial offset' refers to the peak (or trough) at which the correlation coefficient is maximised.

There is some difficulty in applying standard terminology in a comparison of MD sample data and fuzzy modelled data. Using a fictitious example, in the original Boolean polygonal soil model, soil Type 2 is not present at location A—whose map co-ordinates are (3,7); however, the fuzzy model would represent it with some value (e.g., 0.2), based on possibilities of misclassification and other data discussed in the previous chapter, indicating that there is a small possibility that whatever is there would be misclassified as Type 2. The ground sample data run through the MD manipulation will also report some value at (3,7) for Type 2, even if the presence of bedrock makes the value zero. Therefore, when a type is termed 'not present', this only refers to the original polygonal data—its uncertainty value is still present. Unfortunately, it is not possible to use the cross-correlation statistic to directly compare samples and model for these 'not present' types (and therefore compare the misclassification estimates for all soil types). This is because the calculations are not sensitive to absolute values, only relative changes. Therefore, two reasonably flat graphs (such as the graph typically generated by a 'not present' type) will show high correlation at all offsets, even if they differ somewhat in absolute values. This type of correlation is demonstrated in Figure 4.12(f). It may be possible to compare such values using simple averages; however, this extension to the work is not pursued herein (if it were possible to sample all soil types this work would have merit; with only one parameter to test—misclassification—the test would be of little use in model tuning). Therefore, the focus of this analysis will remain on the three types deemed 'present' by the original polygonal soil map.

The following observations refer to the graphs (Figure C.3—'Original') of Appendix C. Note that correlations beyond an offset of +4 or -4 are considered spurious, as the spatial uncertainty is ~100m (4 x 25m cells). The graphs are extended to -6 and +6 to assist in trend visualisation. The paths of the transects are illustrated in Figure 4.13.
Transect 1: Comparison of samples and model along this transect indicates that Soil Types 1 and 2 both demonstrate substantial positive correlation coefficients (see Appendix C), and both of these exhibit a similar spatial offset (~+2). Soil Type 3 shows a weaker positive correlation, but at a very different offset (-4). It shows a strong negative correlation at +2.

Figure 4.13. The sequence of polygons 'encountered' (i.e., using the Boolean model) on each transect: (a) Transect #1, (b) Transect #2, (c) Transect #3, and (d) Transect #4. The transects have been individually scaled to fit a standard length.

The following general conclusions can be drawn from this transect analysis. The +2 offset common to all three cross correlograms indicates that the transition between 'soil units' (using the term to refer to groups of similar soils) is likely being modelled well, though at a 50m (+2) offset. For soil Types 1 and 2 the model and the samples appear to correspond well; however, there is an apparent significant problem in one of the soil type definitions—possibly Type 3—indicated by the mirrored coefficients. This means that the modelled values are changing in the opposite direction to that expected, though they are changing at the expected spatial location.

Transect 2: Again, both Types 1 and 2 demonstrate significant positive coefficients between samples and model, and the offsets fall within the same range. The range in which the offsets are high is broad, indicating that the transitions between soil units may not be very distinct. This may be explained by the fact that this transect spends some of its length in the vicinity of a 1/3 boundary. Because it does not cross the 1/3 boundary at the ideal 90°, the transition between the two on the graph is less distinct. Again, Type 3 demonstrates some negative correlation; almost the reverse of Types 1 and 2. This lends further evidence to the apparent soil type definition problem for Type 3.

Transect 3: This transect passes through a number of smaller areas, back and forth between Types 1 and 3 (Type 2 is 'not present', though is included in the discussion as noted above). In this case, correlation for Type 1 is highly negative at an ~0 offset; Type 2 is highly variable, with a small peak at 0, but a small negative peak at +3; and Type 3 has a positive peak (~0.4) around offset 0. As with transect 1, it appears that the common offset is 0, and again there is an apparent definition problem causing a 'mirrored' negative offset of Types 1 and 3; however, it is Type 1 that receives the negative value this time. This provides some evidence that the type definition problem may be related to confusion or overlap between Types 1 and 3. Nevertheless, there is also a polygon size influence occurring. The polygons encountered in this transect are smaller than in others (Fig. 4.13c). The results of cross-correlation may also be influenced by a poor representation of small polygons in the soil model and derived uncertainty model (due to cell size and implied scale of analysis).
For example, at some points on the transect a 100m offset is sufficient to 'hop' from one Type 1 polygon into another, providing spurious results, as demonstrated in Figure 4.12e.

Transect 4: This transect crosses a 1-3 boundary between two large masses. Both Types 1 and 2 show peaks at a 0 offset, while Type 3 shows a somewhat flat line. Type 3 is obviously poorly represented in the uncertainty model; the flat line indicates that the single boundary present on the transect may also be poorly modelled.

These results indicate that the definition of the spatial structure of Soil Type 3 may be inadequate. This soil type (rubbly, silty, colluvium/morainal) is the only one defined that contains a wide variety of materials. The other types are much more specific. It appears that, on the ground, Type 3 and Type 1 have greater similarity than their definitions indicate. Classification confusion between the two is higher than predicted by experts. Overall, there are a number of points where the correlation peaks (or troughs) are located at very similar offsets. Although this suggests that the samples and model are in general agreement about the locations of boundaries, there are too many dependencies between the soil types (both samples and model) for this to be considered statistically significant, or for the significance to be investigated.

4.3.4. APPLYING CHANGES

In general, the results indicate that expert definitions of classification accuracy and spatial behaviour of uncertainty in soil distribution are not ideal. Five of the cross-correlations did not achieve significance on the t-test (Appendix C); some of the more specific problems are noted above. In particular, the spatial boundaries and overall misclassification estimates between the models of Types 1 and 3 are apparently in error to some degree; a number of other parameters may also be suboptimal. However, though there is some indication of the direction in which these parameters should move (e.g., the misclassification values between Types 1 and 3 should increase), there is no direct way of determining the degree to which they should move.

The purpose of the routines described in this section is to attempt to re-set the parameters (originally defined by expert opinion) so that they provide a better match between the model and the samples. Given the complexity of the situation—the fact that a change in one parameter would affect others—there is no simple, direct way to effect such a change. One possible way of addressing this problem is an exhaustive search of all the possible parameter settings, with a check of cross-correlation statistics at every step. The target is to find a set of model parameters that achieves the highest overall cross-correlation for all transects simultaneously. An iterative procedure is therefore applied to the soil data subset of the uncertainty model. The parameters defining misclassification and spatial behaviour (Table 3.1) for these three soil types are varied within reasonable bounds. Cross-correlations for all transects are calculated at each stage. When the parameters converge on the highest overall correlation coefficient for each transect the procedure is ended. Essentially, this procedure takes all potential values that the 'expert opinion' input could possibly have, and builds a new set of cross correlograms for each of the many combinations (approximately 129) for each transect.
It searches for the set of values that give the highest overall correlation in a maximum number of the graphs (though ensuring that the result is reached through convergence, rather than through erroneous peaks). 'Reasonable bounds' were used to reduce the number of nested iterations required to complete the procedure. They were chosen based on the original expert opinion estimates of the values (Table 3.1). For example, the possibility of Type 3 being misclassified as Type 2 was defined originally as 0.25. The values were therefore varied between 0.05 and 0.6 (an initial guess at possible parameter bounds). The results of the procedure discussed below were checked to determine if these limits were adequate (i.e., did any new misclassification values approach these limits). No changes were required.

The algorithm used is as follows:
1) define the bounds of the fuzzy misclassification and intrusion matrices (Table 3.1) (in the absence of such 'expert input' the range of 0.1 to 0.9 would be used) and initialise with the lowest value;
2) (re-)calculate the correlation coefficients for all soil types;
3) if a coefficient is > the previously stored maximum value, store this value and append the fuzzy values used in the calculation (retain previous maximum values);
4) repeat (2-3) with the next set of fuzzy values; continue until all combinations have been attempted (i.e., through nested iteration);
5) examine results and discard any maximum that shows no evidence of convergence.

The results of this procedure show considerable similarity for each transect (Table 4.2). For example, the misclassification matrix values calculated for Soil Type 1 on transect 1 and transect 4 (two transects crossing similar boundaries) are as follows:

Transect 1: Maximum correlation coefficient: 0.92 using (0.50, 0.15, 0.40)
(the three values in parentheses represent the relative possibility of misclassification for Types 1-3 respectively, as discussed in §3.2.1; they are summarised in Table 4.2)
Transect 4: Maximum correlation coefficient: 0.41 using (0.65, 0.15, 0.65)

For Soil Type 3 the values are also quite similar:

Transect 1: Maximum correlation coefficient: 0.85 using (0.70, 0.10, 0.50)
Transect 4: Maximum correlation coefficient: 0.56 using (0.60, 0.10, 0.50)

In other words, the procedure described above was used to determine the model parameters that best describe the field data. This was done individually for each of the transects. The parameters chosen by the procedure (the three values in parentheses above) were quite similar for different transects, indicating independent confirmation of the values.

Table 4.2. Maximum values of correlation coefficients obtained through iterative calculations.

Type  Transect  MaxCC  Lag  Avg  Misclass. Values    Comment
 1       1      0.92    2    9   0.50, 0.15, 0.40    large 1/3 boundaries
 1       2      0.56    2    5   0.60, 0.50, 0.50    one large 1/3
 1       3      0.55   -1    0   0.50, 0.15, 0.70    small 1/3, small inclusions of 1
 1       4      0.41   -4    5   0.65, 0.15, 0.65    one large 1/3
 2       1      0.83    3    9   0.70, 0.50, 0.60    one section at the edge
 2       2      0.76    0    9   0.25, 0.50, 0.10    one large section
 2       3      0.35   -4    0   0.70, 0.50, 0.55    no 2 present
 2       4      0.70    0    9   0.30, 0.50, 0.20    no 2 present
 3       1      0.85    3    9   0.70, 0.10, 0.50    large 1/3 boundaries
 3       2      0.50    2    9   0.55, 0.60, 0.50    one large 1/3
 3       3      0.56    1    9   0.15, 0.10, 0.55    small 1/3, small inclusions of 1
 3       4      0.56    1    9   0.60, 0.10, 0.50    one large 1/3
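A compact sketch of this exhaustive search follows. It reuses the `correlogram` helper sketched in §4.3.2.1 and assumes a hypothetical `run_model(params, transect_id)` callback standing in for a re-run of the transition corridor model with a candidate misclassification triple; the convergence screening of step 5 is left as a manual check.

```python
import itertools
import numpy as np

def search_parameters(transects, run_model,
                      grid=np.arange(0.05, 0.65, 0.05)):
    """Nested iteration of steps 1-4: score every misclassification triple
    for the three soil types by the correlogram peaks it produces, and
    keep the best.  `transects` maps transect id -> sampled MD series."""
    best_score, best_params = -np.inf, None
    for params in itertools.product(grid, repeat=3):
        peaks = [max(correlogram(sample, run_model(params, tid)).values())
                 for tid, sample in transects.items()]
        score = np.mean(peaks)        # overall correlation across transects
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score   # step 5 (convergence check) is manual
```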
The results of this procedure were then used to reset the original misclassification matrices in the uncertainty model. Original values and updated values are displayed in Table 4.3.

Table 4.3. Original (a) and updated (b) misclassification matrices, and the difference between the two (c). The value at c_ij represents the possibility that type i would be misclassified as type j. The table's trace contains maximum certainty values.

(a) Original
Soil Class     1     2     3
    1        0.85  0.35  0.25
    2        0.35  0.85  0.25
    3        0.25  0.25  0.90

(b) Updated
Soil Class     1     2     3
    1        0.60  0.25  0.55
    2        0.25  0.55  0.30
    3        0.51  0.10  0.55

(c) Difference
Soil Class     1     2     3
    1       -0.25 -0.10  0.30
    2       -0.10 -0.30  0.05
    3        0.26 -0.15 -0.35

In general, it is evident that all maximum certainty values have decreased substantially, and that classes one and three have been considerably 'blurred'. Note also the lack of symmetry in the second table. According to the optimisation procedure results, the possibility of mistaking Type 3 for Type 1 is different than the possibility of mistaking Type 1 for Type 3.

4.4. DISCUSSION

These confirmatory sampling and fuzzy allocation procedures indicate that, in this soil and slope stability modelling scenario, expert opinion has not provided an accurate assessment of classification uncertainty. However, there are only 171 samples over four transects available to base this conclusion on. Sampling for spatial structure is very time and resource consuming relative to spot sampling; it would be difficult to gather sufficient amounts of transect datasets to even approach statistical certainty. When 40 or 50 individual soil pits are subsumed into one statistic (such as the cross-correlation peak or t-test for one transect), it becomes difficult to generate a sufficient n to satisfy most statistical tests. Moreover, most standard statistical comparison techniques require independence of samples; by its very nature the samples composing a transect are not independent, although the transects themselves are independent from one another.

Here, cross-correlation is used to compare sampled transects with modelled transects. Cross-correlation provides two results: the strength of the relations between the two series, and the offset in distance between them at their position of maximum correspondence. In this section, the parameters of the model generating the modelled transects were reset using an iterative procedure. Resetting the parameters of the model has increased the strength of the relations between the sample transects and the model. Details are provided in Appendix C.
However, this 'increase in strength of the relations' is only a statistical comparison between the actual and modelled versions of a slope stability model input (soil). The true test of these confirmatory samples and changes in model parameters will be a comparison of the model's final results—slope stability uncertainty—with some actual slope stability data. Only then will it be possible to see if the changes wrought by this procedure have any real value in increasing the predictive capability of the slope stability uncertainty model. A high-resolution dataset of slope failures is required to perform this task. The following chapter is devoted to analysis of both realisations of the uncertainty model (i.e., pre- and post-calibration) using such a dataset.

4.5. CALIBRATION OF CONTINUOUS DATA

The work discussed above focuses on calibration of classified, polygonal data in a spatially-oriented uncertainty model. Although the focus is on soils, the techniques developed or adapted are of use with any type of data that have been broken into classes and 'shoe-horned' into a polygonal structure (i.e., they have a more continuous distribution in reality than polygons would indicate). This section focuses on calibrating data of a different nature: continuous data represented by cardinal values. This type of calibration or verification is not typically a difficult procedure, as values can be directly compared between the model and samples. The only major complicating factors are spatial uncertainty causing mismatches between the two, and issues of sampling scale.

The two principal inputs to the slope stability model are soil type and slope; the previous section used the former as an example, while this section addresses calibration of the latter (again, as an example of a typical set of continuous values). In the work discussed in Chapter Three, the level of uncertainty in the slope values was estimated from the published error statistics for the data source and from the Kriging function used to generate the elevation model. These values were propagated through the slope function using Monte Carlo techniques, and a final value and variance were derived for each cell in the model (details are in §3.2.2).

Typically, calibration of a continuous value does not require advanced methods, just a numerical comparison. However, this calibration is of particular interest due to a heavy reliance on elevation data in many sectors of natural resource management. Many decisions are made based on these data (e.g., sight-lines, slope stability, road placement, growth models), and few independent studies have been undertaken to verify the elevation data or their derived products such as slope.
There are no known studies from the test region or, in fact, the Queen Charlotte Islands in general. This type of work has been undertaken by several authors elsewhere (e.g., Xiao 1996; Ruiz 1995), who generally conclude that elevation error is specific to regions, terrain, methods used, etc. Again, it must be noted that the calibration focuses on values falling within the estimated uncertainty bounds; in the case of continuous values, however, this comparison is much easier to achieve than with classified values. The soil sampling effort described in the previous sections also included slope measurements at all sample points. Additional slope measurements were made on an opportunistic basis, providing a grand total of 240 sample points. The purpose of this sampling was not to perform a thorough statistical analysis of elevation data. Instead, the specific purpose is to characterise any significant, consistent deviations from the elevation-model-based slope that fall outside the error estimates for this particular area, and to use this information to perform a correction of slope values. However, if significant deviation does exist, this fact implies that other areas where similar data are used may also be subject to deviations outside the published error statistics.

Slope values were sampled using a hand-held inclinometer with a tested accuracy of ±3°. Sample values were gathered as above—characterising the local 25 x 25 metre zone. By focusing on the average slope over this area, rather than a 'spot' slope measurement, the value recorded for the sample eliminates short-range variations in slope. This fits the assumptions of the elevation model that is being calibrated (where 25 metre cells are used to best characterise source spot heights at an average 30 metre spacing). Positional uncertainty is an issue, as was noted with the offsets used for soil transects above. Initial tests were performed in which the comparisons discussed in the following paragraph were repeated using random shifts in cell location within the bounds of locational uncertainty. However, these tests indicated that the shifts had no significant effect on the results. The averaging nature of slope calculations (where a cell only has a slope relative to the eight cells that surround it) is likely the reason for this lack of significant change. Therefore, positional uncertainty was ignored in the following procedure.

The variance in the slope values was derived as discussed in Chapter Three. To reiterate, elevation values were gathered from published spot height data derived from photogrammetry. The elevation model was produced using Kriging, and elevation uncertainty was determined by combining the variance output from the Kriging procedure with the published error statistics for the spot heights. The predicted variance in slope was determined through a Monte Carlo procedure, in which an 'equally likely' DEM was produced, a slope surface derived, and then the procedure repeated (n = 50). The predicted slope variance values were saved on a cell-by-cell basis, but are summarised in the figure discussed in the following paragraph using a set of lines (those above and below the zero line).
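A minimal sketch of this Monte Carlo procedure is given below. It assumes independent Gaussian perturbation of each cell by the combined elevation standard deviation (ignoring the spatial correlation implied by the Kriging variance) and a simple finite-difference slope operator; both are simplifications of the actual procedure of §3.2.2.

```python
import numpy as np

def slope_surface(dem, cell=25.0):
    """Slope in degrees from a DEM via finite-difference gradients,
    each cell relative to its neighbourhood."""
    gy, gx = np.gradient(dem, cell)
    return np.degrees(np.arctan(np.hypot(gx, gy)))

def monte_carlo_slope_variance(dem, elev_sd, n=50, seed=0):
    """Per-cell slope variance: perturb the DEM with its elevation
    standard deviation (scalar or per-cell array), derive slope, and
    repeat n times (n = 50 in the procedure described above)."""
    rng = np.random.default_rng(seed)
    slopes = [slope_surface(dem + rng.normal(0.0, elev_sd, dem.shape))
              for _ in range(n)]
    return np.var(slopes, axis=0)
```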
The trend-line is represented by: Y = -0.0006.*3 + 0.037* 2 - 1.02* + 9.8; JP = 0.73 (4.11) This formula (4.11) can be used to correct calculated slope values for this particular region. It would be unwise to use this formula to generalise outside of the study area and its environs, as it is unclear whether the extreme terrain, the contractors who produced the elevation spot heights, or the slope modelling routine assumptions are to blame. For example, the photogrammetric equipment and algorithms used to derive elevations for the source data might have been miscalibrated, or designed for less extreme terrain. The abrupt changes between forested and cleared areas may not have been compensated for (stem density tends to be very high in this area). The slope algorithm used in the GIS could also have a bearing on the error; such derived values are rarely checked in the field. This would be an area for further study. Some related work has been performed by Ruiz (1995) and Xiao (1996). Nevertheless, whatever the cause, within this type of terrain and in areas where the same photo-grammetric equipment and routines were used to gather elevation data, this slope correction should provide more accurate values for stability modelling. This correction factor was applied to the dataset, and the corrected values will be employed in the following chapter when implement-ing a slope stability model on an island near to the current study area. This correction has reduced variability, but the variability of the corrected slope values are still greater than the published statistics indicate (demonstrated by the vertical range in Figure 4.14—which still fall outside the variability line after the trend is straightened out). Therefore, the overall variability values are redefined for the calibrated version of the model, based on these post-correction val-ues. 4 . 6 . C O N C L U S I O N S The calibration and verification of uncertainty models was noted as a neglected area in the devel-opment of such models. Two major areas were identified: the verification of classified 'fuzzy' val-ues and the verification of continuous values. Several procedures were proposed, including a 118 number of possible extensions to the Mahalanobis distance metric for dealing with class variabil-ity in attribute space. Sample 'fuzziness' was also noted as an issue, and a complex measure of distance were adapted for use in measuring and determining fuzzy sample allocation statistics. The principal questions addressed in this chapter were: 1) how can fuzzy classification structures be compared with confirmation samples?; 2) how well did expert opinion function as an input to generate the distribution of uncertainty represented by the fuzzy structures (the transition corri-dor model)?; 3) how well does metadata gathered from published statistics represent the actual uncertainty on the ground? (focusing on major model inputs); and 4) how can these confirmation data be used to recalibrate the model? Methods for comparing fuzzy classification structures with confirmation samples were developed using the extensions to the Mahalanobis distance metric; other issues were presented, such as sample variability and complex distance issues. A subset of these methods was utilised to cali-brate the uncertainty model developed in Chapter Three using field transect data. It was deter-mined that the expert opinion input for classification uncertainties did not adequately describe the necessary values, due in part to the nature of the classes utilised. 
4.6. CONCLUSIONS

The calibration and verification of uncertainty models was noted as a neglected area in the development of such models. Two major areas were identified: the verification of classified 'fuzzy' values and the verification of continuous values. Several procedures were proposed, including a number of possible extensions to the Mahalanobis distance metric for dealing with class variability in attribute space. Sample 'fuzziness' was also noted as an issue, and a complex measure of distance was adapted for use in measuring and determining fuzzy sample allocation statistics.

The principal questions addressed in this chapter were: 1) how can fuzzy classification structures be compared with confirmation samples?; 2) how well did expert opinion function as an input to generate the distribution of uncertainty represented by the fuzzy structures (the transition corridor model)?; 3) how well does metadata gathered from published statistics represent the actual uncertainty on the ground (focusing on major model inputs)?; and 4) how can these confirmation data be used to recalibrate the model?

Methods for comparing fuzzy classification structures with confirmation samples were developed using the extensions to the Mahalanobis distance metric; other issues were presented, such as sample variability and complex distance issues. A subset of these methods was utilised to calibrate the uncertainty model developed in Chapter Three using field transect data. It was determined that the expert opinion input for classification uncertainties did not adequately describe the necessary values, due in part to the nature of the classes utilised.

The metadata gathered from published statistics were tested against ground data for the major continuous input to the model: slope percentage. Analysis of field data indicated that the continuous values in the slope dataset were outside their expected zone, as determined by stochastic simulation using source data error statistics. However, a clear trend was found to be present in the model-sample comparisons, and therefore global corrections were possible. The model was recalibrated using these data. The soil uncertainty values were also recalibrated based on the confirmatory data using an iterative procedure, utilising cross-correlation analysis incorporating sample spatial uncertainty.

Having introduced the uncertainty model and its application to slope stability modelling in Chapter Three, and having presented methods for field-verifying and updating the inputs and parameters of such a model, at this point there remain several crucial unanswered questions. First, how useful is the uncertainty model? The principal model parameters (though not all) have now been calibrated, but currently the model results are simply predictions of possibility of slope failure and level of certainty in that prediction. The basic question is: does the uncertainty model accurately predict uncertainty in the slope stability model? For example, in areas where slides have not occurred, did the model either predict a low factor-of-safety (FS) or predict a high FS with high uncertainty? A second hypothesis follows from the work in this chapter: does the updated version of the model better predict slides and slide uncertainty than the original?

Chapter Five
Evaluation of Uncertainty Model Output

5.1. INTRODUCTION

The confirmatory sampling undertaken in Chapter Four addresses the problem of tuning uncertainty model parameters. This type of information is useful for a purely descriptive inventory model, or when inventory information is used as input to a process model such as slope stability. However, much of the work in Chapters Three and Four refers to the output of a slope stability process model—one of the essential items of information used to make forest management decisions. The model itself predicts relative levels of stability, while the uncertainty model predicts the variability in these results. This chapter focuses on confirming the latter through use of a highly accurate landslide database.

The work discussed in this chapter is applicable to many types of inventory uncertainty models; however, slope stability model evaluation is important in its own right. Slope stability models are rarely evaluated in a data-rich environment. Typically, they are tested on very limited areas, then applied in a wide variety of situations (Christian et al. 1994). Proper evaluation of a model requires detailed landslide data that cover a wide temporal swath. This type of information is rarely gathered at a scale appropriate to mass wastage analysis (Aung 1992). Even though this information is available here, the relative nature of the model's predictions (i.e., there is no 'slide/no slide' cut-off line) precludes a complete evaluation. True 'evaluation' of a model that makes use of relative predictions requires something similar to evaluate it against—typically another model or another version of the current model. Evaluating it against reality is more difficult.
There are few guidelines for developing methods of evaluating their predictive capability. In the case of the slope stability uncertainty model there are several issues that circumvent simple analysis. These include: 1. Binary events predicted on a cardinal scale. The predictions made regarding slope stabil-ity are on a cardinal (i.e., approximately one to six) scale. In the absence of prior studies that calibrate these numbers for the region, there is no obvious cut-off line between 'slides will occur' and 'slides won't occur'. However, this information is to be compared with just such yes/no events. Simple summaries and tests of significance are therefore not available to fully evaluate model performance. 2. Evaluating predictions with variance. The uncertainty model would no doubt be consid-ered successful if all mass wastage zones were predicted with low factor-of-safety values and tight standard deviations. However, it could be considered equally successful if very few slides were predicted, but associated variance was very high. Although in such a case the predictions for mass wastage would be useless, the uncertainty model would be illuminat-ing the fact that the source data are of insufficient quality to support predictions (in itself an important output). This considerably complicates the process of evaluation. 3. Grid model. The raster model used for this procedure involves some different assumptions them the vector model used to gather the mass wastage information. For one, the resolution of the raster model is different than the vectors (25m vs. lm or less). Slide zones smaller than a 25 metre pixel will therefore be poorly modelled, leading to inaccuracies. Similarly, the smoothing of slide boundaries will also affect model predictive accuracy. Another raster issue is the multiple predictions applied to slide areas larger than one cell. This variability must be addressed in the evaluation. However, a raster model is required by the slope 122 stability uncertainty modelling procedure, whose focus is the continuous variation of uncer-tainty across the landscape, and therefore these raster-vector issues must be addressed. 4. Spatially variable variance. For similar reasons to the point above, multiple cell slides will not only incorporate variance between cell predictions, but also variance in each cell's pre-diction. 5. Multiple realisations. The fuzzy model allows multiple realisations of its output; the user must choose the appropriate way of visualising or using these data. Therefore, there is no single answer to the question of model confirmation. 6. Incomplete data. Although the mass wastage database used in this evaluation has both high spatial resolution and a wide temporal extent, it does not capture every possible slide. There are undoubtedly still some areas that have yet to slide due to past forestry activity, and pre-logging slides are only partially captured (i.e., they are outside the temporal extent of the model). This will contribute to apparent inaccuracies in model predictions. 7. Autocorrelation. As with most spatial models, contiguous spatial units cannot typically be considered as independent samples. In the case of slope stability analysis, if one cell con-tains a slide, it is very likely that the one below or above it (on the slope) also contains a slide. It is almost as likely that the ones beside it are slide zones as well. This lack of independence violates many statistical test assumptions. 
It is therefore necessary to rely on a number of descriptive or exploratory techniques in their stead. This evaluation relies primarily upon techniques of exploratory spatial data analysis (ESDA; see Keller 1994), coupled with standard statistical tests where appropriate. It provides no single answer to a hypothesis of predictability. Instead, it offers comparisons of a number of model realisations and methods of summarising model predictions. As with many other uncertainty analysis techniques, this lack of a single answer may frustrate those accustomed to seeing the resource analysis world in black and white. However, this is balanced by the increase in information content regarding the process model, the source data, and the field site itself.

5.2. METHODOLOGY

The slope stability uncertainty model developed and calibrated on Louise Island (Chapter Four) is applied to data of similar source and resolution on Lyell Island, 50km to the south (Figure 5.1). The two islands have similar terrain and similar soils, and were subject to similar resource extraction methods. There is reason to believe that slope processes are similar on both (B. Peters pers. comm.). Although it would be ideal to evaluate the model at the original test site, the second site was used for practical reasons: funding was available for mass wastage database development only for the latter.

Figure 5.1. The Lyell Island study area.

Soil and elevation data are processed in a manner similar to that described in Chapter Three. The calibrated version of the model (calibration includes both soil parameters and updated slope values based on field testing) is run using the methodology described therein, and results are produced at the same spatial resolution (25m). A second set of results is produced using the original (uncalibrated) model parameters and slope values to facilitate comparative evaluation.

The mass wastage data are taken from a database developed for this work. The database covers over twenty years of slide history in the study area, has a spatial resolution of approximately 1 metre, and an average accuracy level of under 3 metres. The tools used to develop the database make use of uncertainty visualisation techniques in a data fusion tool that merges oblique frames with planimetric data. The details of the system and its development are provided in Appendix D. This development represents a new way of entering data into a GIS through the use of uncertainty tracking and uncertainty visualisation, and it was an integral part of building the database used in this chapter.

The landslide data were gathered for two complementary purposes. The first is the uncertainty model confirmation discussed herein. The second was to provide a baseline and initial high resolution dataset for ongoing monitoring of landslide stabilisation efforts. A summarised version of the case study is presented in Appendix E; the full case study is described in Davis et al. (1998). Appendix E also includes (in context) details of construction of the database used in this chapter. All slides in the database are utilised (going back to those visible in 1976), including those that have since stabilised and grown back. Divisions between spatially contiguous slides are dissolved, and a raster database is produced at the working resolution of 25m.

Two general hypotheses are proposed:

1. H0: Predictions in slide zone cells are not significantly different than predictions for the population (all cells).
2. H0: Predictions made by the original version of the model (prior to parameter calibration) are not significantly different than those made by the calibrated version produced in Chapter Four.

These hypotheses are tested using summary values in several different contexts. The exploratory analysis methodology uses the following sequence:

1. Analysis of data means;
2. Presentation of alternative realisations of the uncertainty model;
3. Description and analysis of data variance;
4. Graphing and comparison of expected vs. actual values;
5. Incorporation of spatial constraints.

5.3. RESULTS

The infinite slope stability model predicts where areas of greater or lesser slope stability occur based on local slope, soil type, and ground cover. The model of uncertainty in slope stability introduced in Chapter Three is a generalisation of this model, in which the mean value of the maximum likelihood output corresponds to the original (Boolean) model. The uncertainty model contains a considerable amount of additional information about alternate possibilities and variance in the results. With this additional information comes an increased responsibility to understand the details of the model, the assumptions built in, and the implications of the data in order to properly analyse and communicate this information.

In this discussion of results a number of different methods are used to compare the model predictions with the landslide data. The discussion progresses from a simple non-spatial statistical comparison of means through to a number of different methods of dealing with variance in the results and spatial constraints on the model.

5.3.1. COMPARISON OF MEANS

An initial step in comparing the predictions and the landslide data is simply to see whether the slide areas (mapped as discussed in Appendix D) have predicted factor-of-safety values (slope stability predictions) that are significantly different than non-slide areas. This involves comparing the mean factor-of-safety values for the population (all cells) and for the slide cells.

As noted in Chapter Three, the uncertainty model can be viewed using a variety of 'realisations'. For each cell in the model, all possible combinations of inputs (soil and forest classes) are stored with their associated likelihood. A 'realisation' of the uncertainty model involves choosing from among these possibilities in a structured manner. The realisation utilised initially in this means comparison is maximum likelihood (ML), in which the most likely value for each cell, as defined by the fuzzy overlay value, is fixed. This realisation produces the same numbers as would be obtained using the standard (Boolean) version of the slope stability model. The values are presented in Table 5.1, row 2. Other realisations presented in this table will be discussed in upcoming sections. The values in this table are based on the calibrated version of the model as produced in the previous chapter. They are presented together here to facilitate comparison at later stages of analysis.

     As Cells                                                                  Mean    SD      N      Z-test   Probability*
 1   Slides: Max Likelihood Realisation, Std. Dev. Surface                     0.040   0.020   1801
 2   Slides: Max Likelihood Realisation, Factor-of-Safety Surface              1.61    0.48    1801   -18.33   0.99
 3   Slides: Worst Case Realisation, Std. Dev. Surface                         0.030   0.023   1593
 4   Slides: Worst Case Realisation, Factor-of-Safety Surface                  1.19    0.65    1593   -44.9    0.99
 5   Slides: Worst Case Realisation, Std. Dev. Sfc, upper 50% of slides        0.025   0.017   683
 6   Slides: Worst Case Realisation, Factor-of-Safety Sfc, upper 50% of slides 1.09    0.57    683    -33.58   0.99
 7   Population: Max Likelihood Realisation, Std. Dev. Surface                 0.051   0.024   1825
 8   Population: Max Likelihood Realisation, Factor-of-Safety Surface          1.87    0.61    1825
 9   Population: Worst Case Realisation, Std. Dev. Surface                     0.34    0.68    1825
10   Population: Worst Case Realisation, Factor-of-Safety Surface              1.00    0.94    1825
11   As Slide Units                                                            1.55    0.22    154    -6.65    0.99

* Probability that slide cells are a different population than non-slide cells.

Table 5.1. Summary statistics for slope stability predictions, based on mean values. Rows 1 through 10 ('As Cells') show various realisations of the uncertainty model, summarising values either for 'slide' cells (cells containing mapped slides) or for 'population' values covering the entire area. The paired rows (e.g., 3 and 4) show summaries for the standard deviation surface and the prediction surface of a particular realisation. The realisations listed here are introduced one at a time as the chapter progresses.
Although the Z-test statistic is not entirely appropriate, due primarily to the lack of random sampling (the population is roughly normally distributed based on a chi-squared test at 15% significance), the very high value displayed (-18.33) indicates that the mean predictions of factor-of-safety in slide zones are significantly different from the population (i.e., the slope stability analysis is generally capable of distinguishing slide zones from non-slide zones). However, the probabilities associated with the Z statistic cannot be reliably estimated due to the violation of statistical assumptions. Later in this chapter a sampling method is discussed that bypasses some of these violations.
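The comparison of means in Table 5.1 can be reproduced in a few lines. The sketch below is illustrative only: the array names are hypothetical, the exact Z formulation used here is not specified in the text (a standard large-sample form is assumed), and, as noted above, spatial dependence among cells means the resulting probabilities should not be taken at face value.

    # Minimal sketch of the slide-vs-population comparison of means.
    # Assumes fs (factor-of-safety predictions) and slide_mask (True where
    # a mapped slide overlaps the cell) are aligned NumPy arrays.
    import numpy as np

    def z_comparison(fs, slide_mask):
        slide = fs[slide_mask]      # FS predictions in mapped slide cells
        pop = fs.ravel()            # FS predictions for all cells (the population)
        # Large-sample Z statistic for the difference between the slide mean
        # and the population mean (an assumed, standard formulation).
        se = np.sqrt(slide.var(ddof=1) / slide.size + pop.var(ddof=1) / pop.size)
        return slide.mean(), pop.mean(), (slide.mean() - pop.mean()) / se

    # Illustrative synthetic data only; not the thesis dataset.
    rng = np.random.default_rng(1)
    fs = rng.normal(1.87, 0.61, 50_000)        # population-like FS surface
    slide_mask = rng.random(50_000) < 0.04     # stand-in slide locations
    fs[slide_mask] -= 0.26                     # slides tend toward lower FS
    print(z_comparison(fs, slide_mask))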
These simple summaries do very little to explain the detailed information contained in the slope stability predictions. A first step in expanding the analysis is to look at the relative frequencies of the results. A pair of histograms illustrating the factor-of-safety (FS) predictions for the slide zones and the non-slide zones is presented in Figure 5.2. The peak of the graph is chosen as a rough dividing line between 'slide' predictions and 'non-slide' predictions. This means that:

a) everything under the slide curve (light line) on the left side of the divide represents cells that were predicted as 'will slide' and did actually contain a slide (correct prediction);
b) everything under the slide curve on the right side of the divide represents areas that were designated 'safe' but actually contained a slide;
c) everything under the 'non-slide' curve on the left represents areas that were predicted as slides but did not slide; and
d) everything under the 'non-slide' curve on the right represents areas predicted as 'safe' that are safe.

Figure 5.2. Relative frequency of slide zone and non-slide factor-of-safety values using an ML realisation.

Keep in mind that this is a relative-frequency histogram, so the absolute values of the curves are not an issue (i.e., in an absolute graph the 'slide' curve would be tiny relative to the 'non-slide' curve). This highlights an important distinction in slope stability modelling: there are two types of wrong answer, comparable to Type I and Type II errors in hypothesis testing. Predicting that an area will fail when it does not (hereafter called Type A) creates only an economic problem (e.g., trees in the area are not harvested when they could have been). However, not predicting a slide that does occur (Type B) can have more than economic consequences. Damage to personnel, equipment and infrastructure can occur, in addition to environmental damage such as stream degradation and loss of soil. In this ML realisation, nearly one-half (47%) of slide cells were predicted as 'safe' (using the rough estimator discussed above) and therefore fall into the crucial Type B class.

In a typical Boolean analysis this would represent the final result. One-third of the cells would be poorly predicted, and we would go back and try to revise the slope stability model to increase predictive accuracy. However, the uncertainty model has retained a considerable amount of data regarding alternative realisations that can assist in reducing this Type B prediction error.

5.3.2. ALTERNATIVE REALISATIONS

As introduced and briefly demonstrated in Chapter Three, there is a wide range of possible realisations of slope stability uncertainty. The inclusion in this model of spatially variable certainty factor values allows more than the most probable value for each cell to be displayed. For example, the 'worst-case scenario' (WC) realisation utilises the (application-specific) lowest reasonable value for factor-of-safety rather than the most probable (where 'reasonable' is typically defined through an iterative process using a sliding scale). The WC version of the model allows error to be focused on the side of caution, by decreasing the possibility of Type B error at the expense of Type A. Note, however, that the acceptable balance between these two types of error must be determined by the application. As discussed in earlier chapters, this 'tolerance of risk' depends upon a number of external factors. The uncertainty model has retained sufficient information that the analyst can make this type of trade-off in the final stages of analysis (i.e., visualisation for decision support) rather than at the earliest data gathering stages (as with standard Boolean models).
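Extracting such realisations is mechanically simple once the per-cell candidate outcomes are available. The sketch below assumes a simplified, hypothetical data structure in which each cell stores a list of (factor-of-safety, certainty) pairs, one per stored input combination; the 0.55 certainty cut-off matches the value used in the text below.

    # Sketch: extracting maximum likelihood (ML) and worst case (WC)
    # realisations from per-cell candidate outcomes. The per-cell list of
    # (fs, certainty) pairs is a hypothetical stand-in for the model's
    # stored soil/forest class combinations.

    def ml_realisation(cell):
        # Most likely outcome: the candidate with the highest certainty wins.
        fs, _ = max(cell, key=lambda pair: pair[1])
        return fs

    def wc_realisation(cell, min_certainty=0.55):
        # Pessimistic outcome: lowest FS among 'reasonable' candidates.
        candidates = [fs for fs, cf in cell if cf >= min_certainty]
        # Fall back to the ML value if nothing passes the certainty cut.
        return min(candidates) if candidates else ml_realisation(cell)

    cell = [(1.8, 0.70), (1.2, 0.60), (0.9, 0.30)]   # illustrative cell
    print(ml_realisation(cell))   # 1.8
    print(wc_realisation(cell))   # 1.2 (0.9 is excluded: certainty < 0.55)

Because the full candidate list is retained for every cell, other realisations (a 'best case', for instance) are one-line variations of the same selection.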
A WC realisation was produced using the lowest factor-of-safety value predicted for each cell at a reasonable (i.e., >0.55) certainty factor. Figure 5.3 shows the same values as Figure 5.2, but using this realisation. If we use the same dividing point as the previous figure (FS = 1.6, the dotted line on the right), only 20% of the slide area now lies on the right side (predicted as 'safe', using area under the curve). Even if the dividing line is moved to the new graph peak (reflecting the lower mean FS that results from pessimistic predictions; see Table 5.1, rows 3 and 4), the right side still holds only 32%. Type B error has been reduced. However, this is evidently at the expense of Type A error: the Type A curve (the 'non-slide' line on the left of the divide) has increased in relative area from Figure 5.2.

Figure 5.3. Relative frequency of slide zone and non-slide factor-of-safety values using a worst-case realisation.

A variety of other realisations are possible, as the uncertainty model delays 'hardening' the data into a specific state for as long as possible in the analysis process. The purpose of the analysis will determine the appropriate realisation, as there is no 'best' way of looking at the data. For example, a road building project would have different requirements from a harvesting risk model, and a seismic crew utilising the data would have their own specifications for acceptable uncertainty.

5.3.3. VARIANCE

While alternative realisations make use of some of the unique characteristics of an uncertainty model, the variance values have not, as yet, been utilised. The variance values represent, on a cell-by-cell basis, the spread of output generated by the Monte Carlo simulations in the uncertainty-based slope stability model (see §3.3.1). Once a realisation is chosen, each cell in that realisation has a slope stability prediction (factor-of-safety value) and an associated standard deviation; the standard deviation in each cell will differ between realisations. This standard deviation value helps to determine with what certainty a particular prediction has been made, and how this uncertainty varies with both spatial and attribute variables.

In order to generate a summary of how standard deviation (SD) behaves relative to factor-of-safety, the SD values are thresholded at decreasing values and a histogram similar to Figure 5.2 is generated for each threshold (Figure 5.4). The largest SD is approximately 0.16; therefore, the 'SD=0.16' curve is virtually identical to the 'slide' curve in Figure 5.2. The 'SD=0.14' curve represents a histogram of all cells with an SD of 0.14 or less. This thresholding is continued down to the small SD=0.02 'bump' at the lower left (around FS=0.8).

Figure 5.4. Factor-of-safety values for slide zones relative to number of cells, using a series of standard deviation thresholds as separate curves.
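Figure 5.4 can be approximated with a short thresholding loop. The sketch below is illustrative only: the array names are hypothetical, matplotlib is assumed to be available, and each curve plots the relative frequency of FS values for all cells whose SD falls at or below the threshold.

    # Sketch: cumulative SD-threshold histograms of factor-of-safety,
    # in the manner of Figure 5.4. fs and sd are assumed to be aligned
    # 1-D NumPy arrays for the slide cells (hypothetical names).
    import numpy as np
    import matplotlib.pyplot as plt

    def sd_threshold_curves(fs, sd, thresholds, bins):
        for t in thresholds:
            subset = fs[sd <= t]              # cells at or below this SD
            if subset.size == 0:
                continue
            counts, edges = np.histogram(subset, bins=bins)
            centres = 0.5 * (edges[:-1] + edges[1:])
            # Normalise so each curve reads as a relative frequency.
            plt.plot(centres, counts / counts.max(), label=f"SD={t:.2f}")
        plt.xlabel("Factor of Safety")
        plt.ylabel("Relative frequency")
        plt.legend()
        plt.show()

    # Illustrative synthetic inputs only.
    rng = np.random.default_rng(2)
    fs = rng.normal(1.6, 0.5, 2000)           # stand-in slide-cell FS values
    sd = rng.uniform(0.0, 0.16, 2000)         # stand-in per-cell SDs
    sd_threshold_curves(fs, sd, np.arange(0.02, 0.17, 0.02),
                        bins=np.arange(0.0, 3.7, 0.2))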
Generally the graph shows that:

1. Some very dangerous areas are predicted with high certainty, but there are not very many of these.
2. In the zone where most slides occur (FS 1.3-1.8; medium danger), slides can only be predicted with intermediate rather than high certainty.
3. As predictions move towards the safer side of the series (FS > 1.6), the uncertainty in the prediction increases, demonstrated by the increasing spread on the graph.

Figure 5.5. Factor-of-safety values for non-slide cells relative to number of cells, using a series of standard deviation thresholds as separate curves.

Viewing this same type of graph using the non-slide data (Figure 5.5), it is apparent that the distribution of uncertainty in these data differs from that exhibited by the slide data. The graphs are similar in dangerous areas; however, overall the non-slide data demonstrate more uncertainty. Figure 5.6 highlights how the variance increases gradually for the non-slide cells towards the right of the curve (b), but quickly for the slide cells (a). These variations will be examined in further detail below.

Figure 5.6. The previous two figures graphed using cumulative values: (a) details the slide cells, while (b) shows the non-slide cells.

5.3.4. EXPECTED VS. ACTUAL

The sections above have looked principally at descriptive statistical and graphic methods for examining differences between slide and non-slide areas. However, the evaluation of any type of model typically uses a comparison of expected vs. actual values. The problem is that, as introduced above, there is a range of expected values. High uncertainty and low predictive success must be considered as valuable (though of less practical use) as low uncertainty and high predictive success. For example, if one of the sources for the model introduced low resolution or low quality data, this would be represented in the results as high uncertainty and low predictive success. In this case the uncertainty model is highlighting a data problem rather than problems with the model itself. Even though the slide areas are poorly predicted, as an uncertainty model the process is a success. Low predictive success with low uncertainty would indicate poor performance of the slope stability model (or of some other factor such as calibration).

A second problem is that direct comparisons of slide areas and predictions are difficult, due to the necessity of juxtaposing binary (slide) data with cardinal predictions and their associated variability (enumerated in greater detail above). One possible way of addressing both of these problems is to determine what the expected distribution would be for both factor-of-safety and variance, and then graph the expected zone. Success would be determined by graphing both the population and the slide cells and determining the percentage that fall into the 'expected' area. This possible range of expected values is represented graphically as the darkly shaded area in Figure 5.7, using a graph of FS variance vs. FS (the shading represents a general tendency rather than specific numeric values).

Figure 5.7. A scatter graph of variance vs. factor-of-safety for slide zones should fall within these general bounds if the predictive accuracy of the uncertainty model is good.
However, a plot of the population (Figure 5.8a) shows that the population already exhibits this tendency (i.e., it falls roughly within the shaded area of Figure 5.7). Generally, high variance is always associated with safe areas, while low variance is associated with unsafe areas. Re-examining the standard deviation figure in Chapter Three (Figure 3.10), this association is graphically apparent: the highest variability exists on valley bottoms, most probably due to exaggerations of minor slope variations in the modelling process.

Figure 5.8. (a) Population (random subset) vs. standard deviation; (b) slide cells (all) vs. standard deviation.

The fact that the population already exhibits the expected distribution makes an 'inside or outside the line' type of uncertainty model evaluation difficult. There are few cells that have a high FS and a low variance. This difficulty is compounded by a lack of specific numeric boundaries for the prediction region, making it impossible to compare slide predictions with the population on the basis of scatter plot shapes. (Note that the plot in Figure 5.8a displays a random subset of the population with a count equal to that of Figure 5.8b, for the purpose of comparison and visual clarity.) Therefore, it is necessary to drop the stipulation noted above that high variability and high factor-of-safety equals predictive success. It will instead be necessary to focus specifically on predictions of slides with low variability and low FS (i.e., on whether slide cells appear in the lower left side of the curve: the 'unsafe' area).

A comparison of the population and the slide cells (Figure 5.8) shows a slight shift in concentration to the left (a decrease in mean FS in slide cells), and the disappearance of most safe outliers (far right). However, there is no substantial change in the concentration of cells. (The two apparent 'nuclei' or concentrations on the lower left are most probably remnants of the centroids, in soil attribute space, of the two most prevalent soil types.)
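This focus on the lower left of the scatter plot can be made explicit by counting the proportion of cells below chosen FS and SD cut-offs. The sketch below uses hypothetical array names and illustrative cut-off values; as noted above, the analysis itself deliberately avoids fixing numeric boundaries, so the cut-offs here are assumptions.

    # Sketch: fraction of cells in the 'unsafe' lower-left region of the
    # FS vs. SD scatter (low FS and low SD). Arrays and cut-offs are
    # hypothetical/illustrative.
    import numpy as np

    def unsafe_fraction(fs, sd, fs_cut=1.6, sd_cut=0.08):
        in_region = (fs < fs_cut) & (sd < sd_cut)
        return in_region.mean()    # proportion of cells inside the region

    rng = np.random.default_rng(3)
    fs_pop, sd_pop = rng.normal(1.87, 0.61, 5000), rng.uniform(0, 0.16, 5000)
    fs_sld, sd_sld = rng.normal(1.61, 0.48, 1800), rng.uniform(0, 0.16, 1800)
    print("population:", unsafe_fraction(fs_pop, sd_pop))
    print("slides:    ", unsafe_fraction(fs_sld, sd_sld))

A larger slide fraction than population fraction in this region is precisely what the worst-case and zonal refinements below are designed to amplify.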
However, when the worst-case data are introduced (Figure 5.9), the degree of concentration in the lower left is further increased. Predictive success has gone up (from the viewpoint of reducing variability). However, this is not the final comparison. There are still additional manipulations that are possible, due both to the large amount of information in the uncertainty model and to the spatially variable nature of that model.

Figure 5.9. Worst case realisation factor-of-safety vs. standard deviation.

5.3.5. ZONAL SPATIAL LIMITS

The WC realisation places constraints on the attributes in order to better focus on the overall goals of the model. Spatial constraints can also help this focus. Thus far in the analysis the comparisons have been between the predicted and actual values for the entire areas designated as mass wastage. However, within an individual slide there are different zones where different processes occur. Of particular concern are the differences between the upper and lower areas of a slide. A certain percentage of the lower section of any slide can be considered the deposition zone. It consists of terrain features that are not conducive to continuing the slide, such as stable soils or a reduction in slope. This area is not technically part of the slide process, but it is certainly part of the disturbed region. The upper area of the slide is typically the initiation zone, and is more involved in the process of the landslide. Therefore, although the dividing line will vary, the upper slide areas should be predicted with greater accuracy than the lower areas, all else being equal.

As a generalisation, the lower 50% of each slide zone was removed from the 'slide' dataset. As a result, the concentration in the lower left of the scatter plot substantially increases (Figure 5.10). This is reflected by a drop in the mean factor-of-safety value from 1.20 (WC, full slide) to 1.09 (WC, upper slide) in Table 5.1.

Figure 5.10. Worst case realisation using the upper 50% of slide zones.
With this latter refinement of predictions some of the minor variations become more visible. For example, in comparing the latter graph of upper slides with the maximum likelihood realisation, it is apparent that only one of the two main 'nuclei' shifts to the left. The WC realisation predicts the most failure-prone slide areas with equal probability to the ML realisation; only the 'medium danger' predictions are shifted down (left).

The rough approximation of '50%' to divide slides is effective; however, there is likely a more effective division point that will further increase predictive accuracy. Therefore, all slide zone cells (observed) were coded based on their position in the local slide area. For example, a cell at elevation 100m on a slide ranging from 50m to 125m in elevation was assigned a 'position value' of 0.66 (66%). The slide cells were then divided into two groups based on the factor-of-safety predictions, making use of the graph peak (Figure 5.2, value of FS=1.6) as a dividing line between 'slide predicted' and 'no slide predicted' (note that, as mentioned earlier, due to the relative nature of the FS predictions this division is not necessarily ideal). The 'position values' for the 'slide predicted' cells are graphed in Figure 5.11 (relative frequency).

Figure 5.11. Relative frequency of the relative position in each slide for all correctly predicted cells (based on an FS=1.6 division). The thin line is a moving average of the raw data (thick line).

It is evident from the moving average overlay (thin line) that there is no clear single break-point on this graph by which to divide the slide zones by position. Of those that are evident, the small slope break at ~30% would remove 20% of the correctly classified cells from the set, while the break just above 50% would remove 38% of the slide-predicted cells. From the data graphed here it is evident that any division in this general range will increase predictive accuracy, but at the expense of Type B errors (as discussed earlier).
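The position-value coding described above is a small amount of per-slide arithmetic. A minimal sketch, assuming each slide is available as a list of its cell elevations (a hypothetical structure):

    # Sketch: relative 'position value' of each cell within its slide,
    # computed from the cell elevation and the slide's elevation range.
    def position_values(cell_elevations):
        lo, hi = min(cell_elevations), max(cell_elevations)
        span = hi - lo
        if span == 0:                  # degenerate single-elevation slide
            return [0.0 for _ in cell_elevations]
        return [(z - lo) / span for z in cell_elevations]

    # The example from the text: a cell at 100m on a slide spanning
    # 50m to 125m receives a position value of 0.66 (66%).
    print(position_values([50.0, 100.0, 125.0]))   # [0.0, 0.66..., 1.0]
    # Keeping only cells with a position value >= 0.5 reproduces the
    # 'upper 50%' zonal constraint used earlier in this section.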
In order to further examine this issue, a visual comparison of prediction accuracy and slide position is compiled (for three major slide areas) in Figure 5.12. In this figure it is evident that, generally, predictions fall in the upper sections of slides, particularly the large and/or long slides. However, medium and small slides would typically be either included or excluded from the predicted set in their entirety. The strong influence of soil polygon divisions is also evident in this Boolean realisation of slide prediction, providing evidence indicating why the graph in Figure 5.11 shows no clear division. Note, however, that this binary division into 'predicted' and 'not predicted' is a somewhat arbitrary slice into a complex range of predictions. Extensions to this analysis might focus on how prediction behaves spatially as the division line slides up or down (i.e., as the dotted line in Figure 5.2 moves left or right).

Figure 5.12. Position of low-FS predicted areas (grey cells) relative to slides (dark lines) and their position and orientation on slopes (using 50m contours, thin lines). The 25m cell size provides a relative scale indicator. (a) is a draped perspective view of one side of the Gogit valley; (b) is a plan view of an intensive slide region in the centre of the island; (c) is a plan view of the Powrivco Valley (the initial test area).

5.3.6. SPATIAL CONSTRAINTS

Cell-by-cell predictions of slide zones violate the independent sample assumption required for most statistical significance tests. One method of overcoming this problem is subsampling the slide areas at regular intervals to reduce spatial dependence (based on a semivariogram sill). However, this dataset is already operating at the limit of its resolution: many slides are composed of fewer than four pixels. Subsampling would effectively reduce the dataset so that predictions would focus only on large slide areas, biasing the results. Another option is to treat each individual slide as one event, and compile statistics based on this reduced summary dataset. Slide area FS values were reduced to one mean value per slide zone. For comparison, a series of areas of similar size to each slide were located randomly within the island's boundary, and statistics were compiled as with the slide zones. Although the Z-test score was substantially reduced, even with a large decrease in N this still indicates a very high probability that the slides are not part of the background population (Table 5.1, row 11).
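The 'one event per slide' summary behind row 11 of Table 5.1 reduces to a grouped mean. The sketch below assumes per-cell FS values and integer slide labels (hypothetical names, with contiguous slide cells sharing a label); the matching randomly-located reference areas described above would be summarised in the same way.

    # Sketch: collapsing cell-level FS predictions to one mean value per
    # slide, reducing the effect of spatial dependence between cells.
    import numpy as np

    def per_slide_means(fs, slide_ids):
        ids = np.unique(slide_ids)
        return np.array([fs[slide_ids == i].mean() for i in ids])

    # Illustrative synthetic data shaped loosely like Table 5.1.
    rng = np.random.default_rng(4)
    fs = rng.normal(1.61, 0.48, 1800)          # cell-level slide FS values
    slide_ids = rng.integers(0, 154, 1800)     # 154 slides, as in row 11
    means = per_slide_means(fs, slide_ids)
    print(means.size, means.mean(), means.std(ddof=1))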
5.3.7. COMPARISON: OLD VS. NEW

A secondary purpose of this evaluation exercise is to determine whether the changes made to the model in Chapter Four resulted in any increase in predictive accuracy. The following methodology is used:

1. Run the uncertainty model described in Chapter Three using the Lyell Island soil and forestry datasets (the soil dataset is described in Appendix E; the forestry dataset was generated using the 1990 orthophoto coverage).

2. Reset the uncertainty model parameters using the numbers determined by the allocation routines in Chapter Four. Generally, these routines determined that soil types one and three have a higher likelihood of misclassification with each other than the original parameters accounted for. Other minor changes were also made, but only in reference to the three most common soil types.

3. Repeat the model run and compare results.

The graph in Figure 5.13 compares the two runs using a maximum likelihood realisation. It is apparent that changes in the model did not affect the slide zones to any significant degree. Exploratory spatial analysis indicates that the areas affected by the differences in the model (Figure 5.14, shaded pixels) are not located in slide zones to any large degree. The shaded zones in this figure are located principally in bedrock zones (cross-hatched polygons), classified as Type six. This type is not present to any significant degree within mass wastage zones, where the dominant types consist of sandy/silty colluvial or morainal blankets. Bedrock generates a very high factor-of-safety in the infinite slope stability model: so high that minor variations in model parameters can cause significant numerical differences between runs. However, these variations are all between very 'safe' values, and so are of little consequence.

Figure 5.13. A comparison of predictive accuracy between the original and updated uncertainty models.

Figure 5.14. The location of differences between the two models relative to soil polygons. Significant differences are indicated by shade variations. Soil type six (bedrock) is highlighted with cross-hatching. Bedrock and significant variation tend to coincide.

Changes in the model (parameter and slope calibration) resulted in minor variations over the entire surface of the study area. Although these changes had little impact on the slide predictions, they may affect other applications that do not focus on soil cohesion or weight. Although exploration of the implications of this information would be best left to specialists in geomorphology or soil science, it is apparent that this type of comparison is potentially useful for exploring the details of environmental models. Here, although the model may now be more representative of reality, the effort expended to make it so did not translate into increased utility. This type of analysis is, in a sense, the spatial equivalent of a non-spatial sensitivity analysis (such as the type used by Hammond et al. 1992 to determine that slope and cohesion are the most sensitive components of the infinite slope stability model).
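The comparison itself amounts to differencing the two runs and asking where the differences fall. A sketch of the cross-tabulation behind Figures 5.13 and 5.14, with hypothetical array names and an arbitrary, illustrative tolerance for what counts as a 'difference':

    # Sketch: locating differences between two model runs and checking
    # whether the cells whose predictions changed coincide with slides.
    import numpy as np

    def difference_summary(fs_old, fs_new, slide_mask, tol=0.05):
        changed = np.abs(fs_new - fs_old) > tol   # the 'shaded pixels'
        frac_changed = changed.mean()             # share of all cells changed
        frac_slide_cells_changed = changed[slide_mask].mean()
        return frac_changed, frac_slide_cells_changed

    # Illustrative synthetic surfaces only.
    rng = np.random.default_rng(5)
    fs_old = rng.normal(1.87, 0.61, 20_000)
    fs_new = fs_old + rng.normal(0.0, 0.03, 20_000)   # minor recalibration
    slide_mask = rng.random(20_000) < 0.05
    print(difference_summary(fs_old, fs_new, slide_mask))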
5.4. DISCUSSION

This evaluation of the uncertainty model has certainly generated more questions than it has answered. In the simplest case, the two null hypotheses proposed in the introduction to this chapter have been addressed: the first (predictions in slide zones are not significantly different than predictions for the population) was rejected, and the second (predictions made by the original version of the model are not significantly different than those made by the updated version) was not rejected. The only general conclusions that can be drawn from these two items of information are that 1) the slope stability model works better than random assignment, and 2) calibration of the spatial constraints on the soil model input had no significant effect on predictions in this particular environment.

Delving deeper into the spatial and attribute structure of the model, it becomes apparent that realisations of the model other than maximum likelihood increase the prediction success. When 'success' is redefined using the Type A and B errors (as defined above), realisations such as the worst-case scenario can be used to concentrate the error within the less important of the two. However, it is difficult to compare realisations on other than a summary level, for each creates an entirely unique population.

Variance data allow some more specific conclusions to be drawn about prediction accuracy. For example, the most unsafe areas were predicted with the highest certainty, while uncertainty in poor predictions (high FS in slide zones) increases with increasing FS. This generally confirms the expected performance of the model; however, the numerical significance of this correspondence cannot be directly ascertained.

A graphical analysis of actual vs. expected model results shows that the population data already exhibit the expected shape, eliminating some possible methods of analysis. However, the model, particularly in its spatially constrained realisations, has sufficient predictive success (slides vs. not) to allow evaluation to concentrate on this particular aspect, and to ignore the possibility of low success at high levels of uncertainty (a secondary method of defining uncertainty model 'success'). The use of zonal spatial limits creates the greatest increase in slide prediction success, virtually eliminating high FS values from the slide cells.

One of the greatest difficulties with this type of analysis (as with any mass wastage modelling) is the problem of comparing the relative predictions of the model with the binary events of landslides. Most studies fall back on simple percentages and cut-off lines for evaluation methods (e.g., a cut-off FS of 1.7 classifies 70% of slides correctly). The methods used here offer more information that can be used to evaluate other models or alternative realisations of this model.

As for the new questions generated, there are numerous aspects of the data, the model and the underlying process(es) that can be illuminated through exploratory analysis. These might include:

• Working backwards by determining the realisation for each cell that gave the best predictive accuracy, then determining why the certainty factor was not optimised. This would highlight problems with the soil database, the classification system or the model itself.

• The use of border spatial constraints, such as 'nibbling' the edges of the slide zones to reduce inaccuracies caused by data errors (such as vector-raster conversion) or by physical process (such as the outside edges of slides being 'dragged along for the ride' by the main failure). Similarly, a higher resolution dataset could also reduce these problems, though at the cost of reducing the spatial extent of the area modelled or of increasing the processing requirements.

• Examining where the average deposition area begins on a slide, and increasing model accuracy in this way. By thresholding the model using different spatial constraints, variations in prediction accuracy could be used to determine the optimal way to represent a landslide in the database. Areas to investigate might include percentage area, slope effects, type of soil and the nature of the deposition zone (slope effects have been noted by Fannin and Rollerson 1996).
5.5. CONCLUSIONS

This chapter has presented an evaluation of the slope stability uncertainty model developed and originally implemented on Louise Island. A high resolution database, detailing landslide timing and location on Lyell Island, was used to this end. For the most part, the evaluation focused on exploratory analysis of the data. Comparisons between various realisations of the model led to the conclusion that the worst-case-scenario version, coupled with spatial constraints limiting analysis to the upper 50% of slide zones, was the most effective for predicting slides with low variance. The second possible type of correct prediction (high FS with low variance) was not analysed due to the parameters of the population. A comparison of the model using its original parameters and a second run of the model with parameters updated through ground-truthing was performed; the changes did not appear to improve the model's predictive capability.

The development, testing and application of uncertainty modelling, as presented in the previous three chapters, is perhaps the simplest part of managing uncertainty in forest inventory. The crucial, most difficult step is widespread implementation of such tools in resource management. Certain aspects of management would require changes if uncertainty in data were to be recognised and addressed. The following chapter 'caps' much of the technical work by briefly pointing out the utility of this research in a management context.

Chapter Six
Discussion

6.1. INTRODUCTION

This dissertation has primarily been concerned with the development of techniques for storing, propagating, and in particular, verifying certain types of metadata. The work in this chapter focuses on the implications of these metadata, metadata manipulation techniques, and their verification in the realm of real-world natural resource management. The discussion will initially revisit the arguments used to justify this line of research, and then broaden out into the integration of uncertainty models into natural resource management, with a specific focus on forestry. The importance of uncertainty model verification will be highlighted throughout.

The broad discussion of uncertainty modelling in Chapter Two highlighted the fact that, in many natural resource applications, the traditional methods of analysis, based on a Boolean approach to data, are inadequate. A variety of uncertainty modelling methods are possible, one of which is demonstrated in Chapter Three. Much of the work in the remaining chapters has focused on verifying the inputs and outputs of this model, as well as on the development of general techniques that allow this type of verification to occur in other uncertainty models. The final research question posed in the introduction is 'what are some of the implications [of these methods and techniques] for resource management decision making?' This is a rather open question, and it has not been posed with the expectation of generating a complete answer. However, as one of the underlying themes of this research is the juxtaposition of uncertainty research with real-world data, the work would be incomplete without juxtaposing the general process of uncertainty modelling and verification with real-world decision making.

There is a need for the integration of uncertainty modelling, both spatial and non-spatial, with natural resource management decision making. The first part of this chapter will focus on justifying this statement.
To begin, there are already several areas where basic types of uncertainty modelling exist in the decision making process. These include high level 'risk management' and lower level metadata. Considerable research has focused on the former, and the topic will only be addressed peripherally here. The latter is a rapidly growing area of resource data management (in fact, of most spatial data management); however, this effort is directed primarily at the capture and storage stages, rather than at the 'how do we use this information?' stage. A discussion of this topic, the data management aspects of metadata, can be found in Chapter Two (§2.4). Here, the issue is the implications for management. The latter part of this chapter includes a discussion of other 'future research' directions highlighted or generated by the research in this document.

6.2. RESOURCE MANAGEMENT

Although 'resource management' is a term that has been part of the discussion in most of the preceding chapters, here the term will be broken down into specific components. As yet, the term has not been explicitly defined. One simple, generic definition is "a series of deliberate interventions in system processes" (Iles 1994). Although this 'intervention' could include forcibly doing nothing (e.g., halting urban expansion to preserve certain habitat), more typically it involves a sequence of planned activities introduced at different points in space and time that focus on creating or maintaining a particular state of the resource (Iles 1994). The goal, or 'state of the resource', is usually defined externally (e.g., by government policy or societal values). In any case, the 'activities' (the management strategy) are used to move towards this goal.

In delineating the types of problems that uncertainty management and verification might address, it is necessary to partition the topic of management into the three commonly accepted principal phases (or levels) often applied to both planning and management: strategic, tactical and operational. At each of these levels the problems, opportunities and constraints are somewhat different. The following sections focus on the management of natural resources, with most examples drawn from the forestry sector.

6.2.1. STRATEGIC LEVEL

At the strategic level the concept of uncertainty management has received a considerable amount of attention. Most strategic management and planning occurs at a summary level, with little spatial specificity. At the strategic level, 'risk management' is quite a common component of decision support, usually in the form of identifying risks in various categories that are associated with alternative decisions or plans (e.g., Marshall 1986, Pollard 1994). For example, the economic risk of a shift in harvesting strategies might be determined using models of uncertainty in the economy, uncertainties in public policy change, and various other inputs. Decisions might be made in order to minimise risk, or to minimise risk while maximising profitability or stability. In a more specific example, the expected value of forestry restoration programs is often calculated using the expected value approach to risk management, in which biomass additions and other values are calculated using the 'best available evidence' to assess the relative probabilities of different decisions (Scarfe 1997). Conservative estimates are used to ensure that societal aversion to environmental risks is captured in the estimate.
Rather than simply guessing at these conservative values (e.g., a -3% per annum social discount rate; see Scarfe 1999), uncertainty analysis incorporated with risk management would provide this conservative approach with some actual variability data.

Uncertainty in the types of spatial models dealt with in this dissertation is rarely a direct input to strategic level planning. Most planning at this level involves tabular data, such as summaries of inventory by region or by type (e.g., Traas 1994). However, uncertainty management at the spatially-specific (lower) levels is a crucial, and often ignored, component of these summary data. Even when the component uncertainties have been estimated, it is often difficult to combine them into an overall statistic. A clear example is the set of calculations that leads to an allowable annual cut (AAC) determination for regions of British Columbia. In these documents (e.g., BCMOF 1996), summary statistics are typically perturbed by a standard amount (±10% in the example) and the calculations are repeated twice to set lower and upper bounds. Strategic decisions are then made based on the sensitivity of the model to each of the inputs individually. Uncertainty modelling and propagation have a clear role in a) identifying the actual variability in the figures, b) combining the various input variabilities through uncertainty propagation, and c) providing realistic estimates of the overall sensitivity of the models used.

Uncertainty models also have a place in long-term strategic forecasting. In attempts to determine the future state of a resource based on the present situation and a chosen strategic management strategy, uncertainty modelling can play a role (in addition to that discussed above) in deciding just how far to trust the model. One question is: when do the data become 'buried' in their own uncertainty? Another is: what is the actual uncertainty in the long-term supply of a resource? These are crucial strategic planning questions that properly verified uncertainty management models can help address.
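The contrast between the ±10% perturbation described above and full uncertainty propagation can be illustrated with a toy calculation. Everything in the sketch below is assumed (the inputs, their distributions, and the simple product-form model); it shows only the mechanics of combining input variabilities by Monte Carlo simulation rather than perturbing a point estimate one input at a time.

    # Sketch: propagating input uncertainty through a toy summary
    # calculation (volume = area x volume/ha x recovery fraction),
    # compared with fixed +/-10% bounds. All numbers are illustrative.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 100_000
    area = rng.normal(12_000, 900, n)        # ha; assumed inventory error
    vol_per_ha = rng.normal(350, 40, n)      # m3/ha; assumed sampling error
    recovery = rng.normal(0.85, 0.05, n)     # assumed operational variability

    volume = area * vol_per_ha * recovery    # Monte Carlo realisations
    lo, mid, hi = np.percentile(volume, [5, 50, 95])
    print(f"propagated 5th/50th/95th percentiles: {lo:.3g} {mid:.3g} {hi:.3g}")

    # The standard +/-10% treatment perturbs the point estimate instead:
    point = 12_000 * 350 * 0.85
    print(f"point estimate with +/-10%: {0.9 * point:.3g} .. {1.1 * point:.3g}")

The propagated interval reflects the combined, verified variability of the inputs, whereas the fixed perturbation reflects only an assumed level of error applied uniformly.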
6.2.2. TACTICAL LEVEL

The tactical level of management or planning generally involves a short or medium term viewpoint, a greater detail of planning than the strategic level, usually more spatial specificity, and an increased reliance on spatially-specific data. In the forest sector, tactical planning focuses on inventory and inventory updates, the allocation of resources, and various short-range (i.e., 2-5 year) management plans. Uncertainty management has the potential to be very useful at this level due to a) the increased reliance on spatial data, b) the increased variety of data required, and c) the emergence of multi-stakeholder decision-making systems.

As plans become more complex and begin to draw on spatially-specific information, spatial uncertainty management becomes increasingly important. For example, in forest inventory and inventory updates there is currently little information available regarding the spatial distribution of uncertainty. Updates are generally performed at fixed intervals and on large segments of the inventory (delineated by political or cartographic boundaries). The incorporation of a verified uncertainty model has the potential to allow identification of specific areas that have high uncertainty, leading to spatially-specific updates. For example, uncertainty might be relatively high in areas that were originally inventoried from smaller (than normal) scale photos, or in areas where growth is relatively fast. This type of update has the potential to decrease costs significantly (see Appendix D for further discussion of this topic).

At the tactical level a wider variety of data are required than at higher planning levels. For example, where strategic plans required a rough summary of the area covered by wetlands, tactical plans require knowledge of exact area values, and of how these wetlands are distributed relative to other items. Similarly, tactical planning brings a number of different models together, such as growth and yield, slope stability, and regeneration models. Even if the uncertainty in each individual model is understood, what is generally lacking is the ability to bring these together at the landscape level. Verified uncertainty propagation techniques offer a way to perform this task.

Another relevant aspect of tactical level planning is the gradual shift from a relatively simple process, with one agency, company or individual making decisions, to complex multiple-stakeholder processes. Although this is not the case in all sectors of resource management, in those where multiple-stakeholder processes have been implemented the requirement to thoroughly justify planning decisions has increased substantially. These often-adversarial processes commonly involve different interest groups bringing their own version of the resource data to the table, leading to arguments about data veracity (e.g., Basta 1990). In fact, they may all be looking at the same data, with each group displaying a different tail of the variance curve. An understanding of data uncertainty, coupled with uncertainty models in which the actual distribution of that curve has been verified, could lead to a decrease in contention over this particular issue. Perhaps, in recognising the level of uncertainty in their data and models, such decision-makers (or committees) might implement greater levels of conservatism in their decisions.

This section has introduced several areas where uncertainty management might assist tactical level planning in resource management. There are also secondary issues: areas in which secondary effects of uncertainty management would widen the scope of planning. One example is the development of temporal models (discussed in greater detail in Appendix D). Inventories used on a tactical time frame quite often are composed of different versions (e.g., a 1995 version and a 1999 version). It is difficult, often impossible, to perform efficient studies of broad scale change-over-time using such a system. This may be due to scale changes (greater/lesser detail), variation in interpretation (e.g., polygon boundaries redrawn) and other similar issues. Temporal-oriented data storage (such as the simple example in Appendix E, or more complex temporal models such as those discussed in Langran 1992) increases the potential for change-over-time modelling. However, uncertainty in these temporal objects is a crucial factor that needs to be addressed. For example, if 10m satellite data are added to a 1m airphoto-based temporal inventory, the level of uncertainty in the new information increases. If temporal databases do not track uncertainty using verified models, they stand to decrease their utility for analysis.

6.2.3. OPERATIONAL LEVEL

At the operational level of management and planning, spatially specific uncertainty management has a decided, though often different, role to play.
Much of this role hinges on the fact that resource sectors such as forestry currently have smaller profit margins than they have had historically and, therefore, decreased room for error in operational planning. One example is the decreasing supply of old-growth timber, leading to operations in forests of marginal profitability. Uncertainty modelling can potentially support operational planning in several areas: basic data gathering, modelling at fine scales, and highly specific inventories.

Uncertainty modelling has the potential to assist operational level data gathering in a manner similar to that described in the previous section, although with a focus on large-scale data. By pointing out specific areas of high uncertainty (such as the polygon boundaries highlighted in the worst case scenario model in Chapter Three), operational planning can focus on gathering additional data only in areas of high uncertainty.

An example of the role that uncertainty modelling can play in fine scale operational modelling is found in Chapter Three. Knowledge of uncertainty in slope stability might lead to different decisions regarding road placement or the timing and location of harvesting.
At operational levels validation is also important. Currently, the '15.72157 ha of mature timber' type of answers that result from GIS-based analysis and modelling are mistrusted—with good reason. GIS and associated spatial (and non-spatial) models must prove that their estimates are sound before they will be accepted at the operational level.

[Figure 6.1. Typical graph of normally distributed uncertainty.]

For example, knowing that the level of harvestable timber is distributed as graphed in Figure 6.1 might make for quite different operational planning than simply knowing the central number. The validity of those outside figures is as important as the central one.
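As an indication of what reporting the whole curve might look like, the short sketch below summarises a set of Monte Carlo realisations as a central value plus its tails. The realisation count and the distribution parameters are invented stand-ins, not output from the thesis model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for model output: 1,000 realisations of total harvestable
# area (ha), each produced by resampling the uncertain inputs.
realisations = rng.normal(loc=15.7, scale=2.1, size=1000)

central = np.median(realisations)
low, high = np.percentile(realisations, [5, 95])
print(f"harvestable timber: {central:.1f} ha (90% interval {low:.1f}-{high:.1f} ha)")
```

An operational plan built against the lower bound rather than the central figure is conservative in exactly the sense argued for above.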
Through tests of expert opinion estimates of risk or uncertainty, we end up with both better models and, through feedback, better understanding of uncertainty. This benefit cannot easily be measured. However, when planners, managers or scholars are forced to revise their estimates of the quality of the data they work with every day, they may also be forced to revise their methods, plans and research tactics. In the same way, confirmation of their quality estimates may also have positive benefits.

6.4. FURTHER RESEARCH

The sections above have each included several recommendations for further research directions. Each of the chapters has also included specific suggestions for areas in which further work might enhance understanding of uncertainty, or lead off into separate research streams. This section summarises these points and indicates the relative priorities of the suggestions, with the presentation progressing from general to specific items.

Research Programs. In reviewing the research literature in the general area of uncertainty management of spatial data (Chapter Two), it soon becomes apparent that this field particularly lacks research programs. Many laudable individual projects exist; however, the diverse range of applications (as indicated in documents such as the Proceedings of the International Symposium on Spatial Accuracy of Natural Resource DataBases, Congalton 1994) are rarely tied together in integrative programs. Verification of uncertainty metadata and models is a crucial element of such a program, and it is therefore hoped that the research presented in this document will assist in the development of such areas of study.

Integration into Management. As noted in the sections above, there are a number of specific areas where properly verified uncertainty models and uncertainty propagation routines would be of use in real-world management. Potentially useful areas of research include the verification of the probabilities and possibilities associated with risk management scenarios through tests where various outcomes of a decision are followed through to their conclusion. At tactical or operational levels of management, uncertainty verification work is required on a more task-specific basis, with the goal of allowing a manager to expect to encounter a specific level of uncertainty, enabling her to put detailed contingency plans in place to deal with all likely eventualities.

Issues of uncertainty communication noted in Chapters Two and Three also have considerable impact on the integration of uncertainty modelling in natural resource management. When visualisation tools are developed (e.g., Fisher 1991b, Goodchild et al. 1994), this process usually ceases at the demonstration stage. Verification research is required to determine a) if what the visualisation routines/tools indicate is correct, and b) that the impression of uncertainty levels provided by these tools to non-technical users is in line with the actual level of uncertainty. Work such as this is underway (e.g., Antle, in prep.); further efforts are required in various resource sectors and tasks. A shift in 'spatial understanding' regarding uncertainty can only be judged through its effects on policy, resource decisions, scientific hypothesis generation or other bottom-line items.

The integration of verified uncertainty models into real-world management is undoubtedly the most crucial research issue to be discussed in this section. Using this general issue as a target, an efficient research program can be planned and executed, tying in the various tasks of modelling, verifying and decision-support integration. Initial test cases such as the projects discussed in Davis (1994) and in Appendix E are needed to make decision-makers aware of the issues and importance of this work. However, concerted, application-specific research is required in order to develop applications for real-world, day-to-day management.

Uncertainty Model Input Verification. The research discussed in Chapter Four focused on verifying several of the inputs to a specific uncertainty model, based on a more general approach to verifying uncertainty models of classified data. There are several possible research avenues that derive from this and the work on continuous data, including:

• Performing exhaustive sampling in limited areas in order to verify (and therefore better define) the behaviour of class uncertainty in attribute space.

• Performing comparative studies of slope representation in terrain models in order to determine the probable reasons for the noted variations from estimated uncertainty. It was noted in Chapter Four that, given the available data, it would be difficult to determine whether the extreme terrain, the elevation data contractors, the slope modelling routine assumptions, the GIS algorithm, or some other factor (or a combination) is to blame for the discrepancy. A comparative study might be undertaken using one or more of the following: increased intensity of slope sampling, a variety of comparative areas in different locations and with different types of terrain and ground cover, a comparison of photogrammetric techniques, and GIS algorithm evaluation in variable terrain (see the sketch after this list). Although some studies exist that compare various types of elevation models (e.g., Sasowsky et al. 1992) or GIS techniques (e.g., Skidmore 1989), there is a lack of detailed ground evaluation studies, which has led to a potentially 'unhealthy' reliance on inventories such as TRIM.
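As an indication of what such a ground evaluation involves, the sketch below derives slope from a terrain model using a simple central-difference routine (one of several neighbourhood algorithms whose behaviour in steep terrain could be compared) and summarises its disagreement with co-located ground measurements. Both the DEM and the 'field' values here are synthetic stand-ins; only the form of the comparison is the point.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope from a gridded DEM via central differences."""
    dzdy, dzdx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

rng = np.random.default_rng(1)
dem = np.cumsum(rng.normal(0.0, 3.0, (50, 50)), axis=0)   # synthetic 25 m DEM tile
modelled = slope_degrees(dem, cell_size=25.0)

# Stand-in for clinometer measurements taken at the same cells.
field = modelled + rng.normal(0.0, 4.0, modelled.shape)

rmse = np.sqrt(np.mean((modelled - field) ** 2))
print(f"slope RMSE against field samples: {rmse:.1f} degrees")
```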
Uncertainty Model Output Evaluation. In Chapter Five the uncertainty model was evaluated using the Lyell Island data. A number of new questions were generated that might be answered through exploratory analysis or with a more extensive dataset. These include:

• Working backwards by determining the realisation for each cell that gave the best predictive accuracy, then determining why the certainty factor was not optimised. This would highlight problems with the soil database, the classification system and/or the model itself. This would require a more extensive dataset than is currently available for the island in order to bring secondary factors into play.

• The use of border spatial constraints, such as 'nibbling' the edges of the slide zones to reduce inaccuracies caused by vector-raster conversion (a minimal sketch follows this list). Similarly, a higher resolution dataset could also reduce these problems, though at the cost of reducing the spatial extent of the area modelled, or increasing the processing requirements.

• Examining where the average deposition area begins on a slide, and increasing model accuracy in this way. By thresholding the model using different spatial constraints, variations in prediction accuracy could be used to determine the optimal way to represent a landslide in the database. Areas to investigate might include percentage area, slope effects, type of soil and nature of the deposition zone. Once again, a higher resolution and more detailed dataset would be required for accurate determination of these factors. Some work on this topic has been conducted by Fannin and Rollerson (1996), who noted that deposition is generally triggered by a distinct change in slope gradient.
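The 'nibbling' suggested in the second item above is, in effect, a morphological erosion of the rasterised slide zone. The sketch below shows one plausible implementation on a toy raster rather than the Lyell data; the structuring element and the single-cell nibble depth are assumptions.

```python
import numpy as np
from scipy.ndimage import binary_erosion

# Toy rasterised slide zone (True = slide); its outermost cells may be
# artifacts of vector-raster conversion rather than real slide area.
slide = np.zeros((8, 8), dtype=bool)
slide[2:6, 2:7] = True

# Peel one cell from the perimeter before scoring prediction accuracy,
# so boundary conversion artifacts are excluded from the comparison.
core = binary_erosion(slide, structure=np.ones((3, 3)))
print(slide.sum(), "cells ->", core.sum(), "core cells")  # 20 -> 6
```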
Other Areas. A number of the techniques developed during the mass wastage database production and testing (Appendices D and E) led to possibilities for further specific research. These include:

• The 'crossings index' was presented as a rough indicator of mis-registration under certain circumstances. In the text of Appendix D it was noted that a possible extension study would further explore the implications of line crossings relative to scale, digitising accuracy, and other factors. It was also noted that a more intensive study would be required to determine the absolute implications of the crossings index number (rather than the relative implications explored in the text).

• In Appendix D an attempt was made at automating the image registration procedure. It was noted that new types of search algorithms would be required to complete this procedure in reasonable time, quite possibly requiring some degree of understanding of the human vision process. A logical extension of this work would be a study of the visual clues used by a manual system operator to register images in a small number of steps, and the translation of this into a computer procedure—possibly using expert systems or other learning algorithms.

• A direct comparison between different terrain types focusing on similar targets would enable quantification of the apparent increase in accuracy of the ODFS registration in extreme terrain noted in Appendix D.

Communication of Uncertainty. Uncertainty models, such as the model presented and verified earlier, are typically complex assemblages of data. Communicating these data to both analysts and decision-makers presents a challenge in the area of data visualisation. No standard, proven methods for the display of uncertainty data exist. A number of techniques have been discussed in the literature; however, few implementations exist, and the majority of these refer to artificial or sample datasets. Exploratory visualisation of practical datasets focusing on real-world problems is required to further develop this research field. More specifically, an understanding of the implications of the results of an uncertainty modelling procedure cannot easily take place without visualising uncertainty measures in concert with the original data. For example, slope stability can be easily classified and displayed. However, there are many unknowns regarding how uncertainty in these data can be effectively communicated to a user accustomed to seeing crisp, Boolean-style data. Considerable research is needed in this area.

More specifically, the numerous dimensions involved in a multiple uncertainty representation place an increased burden on the spatial analyst. The database is far more flexible than in the Boolean case, but flexibility is coupled with complexity. Uncertainty and error can be combined at the summary/display stage, but only in an environment where the users' needs are thoroughly understood. The wide variety of possible representations allows the database to provide just about any answer desired. Therefore, considerable work is required to determine just what 'reasonable' uncertainty/error values are and how these translate into reality in the field. At this point the concepts of risk analysis and acceptable risk come into play. Once the display has been calibrated with field data (e.g., green: safe, yellow: a reasonable possibility of failure, and red: near certainty of failure) using techniques such as those discussed in this document, it becomes possible for the user to set the desired risk level (e.g., a 5% chance of being wrong) and proceed with an analysis. The term 'risk' implies that there are social, economic, or other factors interacting with the spatial uncertainty metadata in a decision-making context. This concept of 'acceptable risk level' may be easier for most users to interpret than the quantified uncertainty values used as internal representations in the database.

The added dimensions available through the use of dynamic visualisation tools also place an increased burden on the cartographer. The purpose here is not simply effective communication of a particular message. The 'message' imparted through visualising uncertainty information is far less tangible, and therefore far more difficult to evaluate. A shift in 'spatial understanding' regarding uncertainty can only be judged through its effects on policy, resource decisions, scientific hypothesis generation or other bottom-line items.

In addition to the issues discussed above, visualisation tool development would benefit from user evaluations—not simply regarding cartographic communication, but through a simulated decision-making scenario. The effectiveness of these tools can only be properly judged through a cross-comparison of decisions made based on different techniques, as well as comparisons with a control group using static, Boolean-based maps (for example see Antle, in preparation). It is likely that experience and innate understanding of uncertainty are already incorporated into many types of Boolean-based decisions. It will, however, be difficult to make predictions outside of particular application areas.

Comments on Further Research/Implementation. Although this work has focused specifically on a slope stability model implementation of uncertainty modelling, many of the techniques developed and implemented are also highly applicable to other types of natural resource modelling. First, the uncertainty model itself is generic in nature, in that it can encompass a wide variety of types of uncertainty, and acts as a shell around the process model. The basic requirements are: a good understanding of the nature of uncertainty in each of the model inputs, and the ability to interpret the extensive output of the procedure. In essence, the entire uncertainty modelling procedure forces the investigator to develop a thorough and complete understanding of the data she works with.
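As a sketch of this shell structure, the fragment below wraps an unchanged process model in repeated draws over its uncertain inputs, and applies the acceptable-risk classes discussed above only at the summary stage. The toy factor-of-safety formula, the parameter values and the thresholds are all placeholders, not the Chapter Three model.

```python
import numpy as np

rng = np.random.default_rng(0)

def process_model(slope_deg, cohesion_kpa):
    """Stand-in process model: a toy 'factor of safety' that falls as
    slope steepens and rises with soil strength."""
    return cohesion_kpa / (1.5 * slope_deg)

def uncertainty_shell(n, slope_mu, slope_sd, cohesion_lo, cohesion_hi):
    """Wrap the process model in Monte Carlo draws over its inputs."""
    slopes = rng.normal(slope_mu, slope_sd, n)            # verified variability
    cohesions = rng.uniform(cohesion_lo, cohesion_hi, n)  # expert-opinion range
    return process_model(slopes, cohesions)

fs = uncertainty_shell(5000, slope_mu=32.0, slope_sd=4.0,
                       cohesion_lo=20.0, cohesion_hi=60.0)
p_fail = float(np.mean(fs < 1.0))  # share of realisations predicting failure

# Internal uncertainty translated into an acceptable-risk display class.
label = "red" if p_fail > 0.50 else "yellow" if p_fail > 0.05 else "green"
print(f"P(failure) = {p_fail:.2f} -> {label}")
```

The thresholds (here 5% and 50%) are exactly where field calibration of the kind described above would enter.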
This research has indicated that the process of gathering expert opinion can be fraught with error, and that published error statistics are not necessarily trustworthy. The effort required to determine the actual values for uncertainty must be balanced with the depth of analysis required for the application at hand. There are many other possible ways of gathering this information; however, they too should be viewed with some suspicion if a precise accounting of uncertainty is necessary.

Overall, it is important to identify and then concentrate on the specific types of uncertainty that affect the resource model to the greatest extent. Some of this information might be gained from a sensitivity analysis, while some will simply be common sense. However, because of the potentially multiplicative nature of uncertainty, the other types should at least be estimated and included wherever possible.

6.5. SUMMARY

The discussion in this chapter has touched on a number of resource management areas where uncertainty management, and particularly uncertainty model validation, may be of use in solving problems or increasing efficiency. The issues are different at different planning or management levels; therefore, the discussion was partitioned into the three main levels of planning: strategic, tactical and operational.

At each level, some of the more obvious implications of incorporating verified, spatially variable uncertainty models were discussed. As with the remainder of this document, forestry was utilised as the principal example of resource management. This sector's heavy reliance on data and models of a highly spatially variable resource with numerous associated uncertainties makes it a prime example. No doubt, in other sectors there are many other implications of data uncertainty and verification for resource management. It is hoped that these examples will bring some of the relevant issues to the forefront.

Chapter Seven: Conclusions

7.1. SUMMARY OF STUDY

The new research discussed in this document and its appendices took place over the span of four years. The major tasks can be broken down as follows:

Louise Island - Model Input Verification: Approximately six weeks were spent in the field gathering the data for this phase. Two of those weeks were used to gather existing data from various agencies in the Queen Charlotte Islands, two were spent on Louise Island and two on Lyell Island performing preliminary work for the next phase. In the lab, two months' work went into developing the systems and performing the preliminary analysis; another four weeks were spent developing the visualisation routines for analysis and reporting. Development of the conceptual work took place over an extended period.

Lyell Island - Model Output Verification: Lyell Island is located approximately 100km from the nearest roads, and 200km from the nearest fuel supplies. Access required 2-10 hours of boat travel (weather dependent). A single water circumnavigation of the island required 3-4 hours in good weather. Therefore, much of the field effort involved in this research focused on logistics. During the second field season three weeks were spent on Lyell (author and assistant). All travel on the island was on decommissioned roads (by foot). Survey equipment was carried to fourteen of the major landslides, and the slides were physically surveyed. A typical survey of a single slide required 5-6 hours of hiking and 2-3 hours actually on the slide.
Extremely remote slides were accessed via zodiac landings on highly exposed beaches. The aerial survey work described in Appendices D and E required a single day of effort, and approximately one week of logistics and preparation. The system was also tested in a separate area (in work that is not described herein), with another two weeks of field time required.

Development and analysis of the Lyell data took place over two years. Approximately four months of full-time work (two months with an assistant) went into developing the orthophotos and the baseline database of mass wastage. Another three months were required for code development of the ODFS system described in Appendix D. Approximately two months were required to perform the rehabilitation case study described in Appendix E. As with the Louise analysis, the final analysis and conceptual work took place over an extended period.

7.2. RESEARCH QUESTIONS

The research presented in this dissertation has focused on the issue of uncertainty model verification. Specifically, the central research question was: can a natural resource management uncertainty model be verified in order to evaluate its utility in real-world management?

In the introduction it was noted that there can be no simple yes or no solution, as there exists no simple statistic to determine if uncertainty as modelled equals uncertainty as sampled. As the research has shown, the issue of uncertainty model verification is a complex one; yet, through techniques such as exploratory data analysis, it has been possible to address the principal research question. The question was addressed by breaking it down into a series of manageable questions, each of which focuses on one of the 'verification boxes' in Figure 1.1. The questions are as follows.

1. What are appropriate methods for modelling data uncertainty in natural resource management, making use of information typically available?

This question was addressed in previous research, summarised in Chapter Three, and used as a model base for the following chapters. It was noted that there are several types of uncertainty that must remain conceptually separate, but may be brought together in data summaries and queries of the resulting uncertainty model. A number of methods were discussed, and fuzzy sets were chosen for the test case (but only for particular types of uncertainty).

2. How appropriate are these methods, and how can this 'appropriateness' be determined? Specific questions include:

2a. How effective is gathering metadata from expert opinion?

2b. How effective is gathering metadata from published variability statistics?

The effectiveness of these two inputs to an uncertainty model was determined through ground verification of the modelled information. Methods were developed to allow comparison of sampled values with classification uncertainty levels, allowing the 'appropriateness' of the model to be determined. This development was the principal focus of Chapter Four, and was used in a test case to verify the model described in Chapter Three. Metadata gathered using expert opinion on soil uncertainty were found to underestimate uncertainty in all cases, with specific soil types exhibiting greater uncertainty than others. Therefore, it was concluded that expert opinion is not necessarily an ideal input to this particular model, and should be looked on with some trepidation in similar exercises.
The results pointed to the apparent fact that soil scientists do not necessarily have a strong grasp of the overall level of uncertainty in the data they regularly employ. Uncertainty verification was shown to have considerable importance in 'tuning' this model input.

Tests on the effectiveness of gathering data from published variability statistics also showed that, in the most important input to the test case model (slope stability), the published values underestimated the variability found in reality. However, the tests also indicated that, in this case, corrections could be applied that would reduce this margin.

The question of 'appropriateness' is not one that can be answered with a direct yes or no. The methods of modelling uncertainty did not, in their initial 'laboratory value' state, effectively reflect uncertainty on the ground. However, even prior to verification, they still—by definition—reflected the ground condition better than a standard Boolean model. The focus of the research was on the latter half of Question Two: 'how can this appropriateness be determined?' The methods developed are applicable to a wide range of uncertainty models and data sources, and will allow the appropriateness of other types of data and models to be determined and compared.

3. What are appropriate methods for propagating these metadata through to information products (i.e., using a typical type of natural resource model)?

4. How appropriate are these methods, and how can this 'appropriateness' be determined?

The third question was addressed in Chapter Three, where a combination of techniques (fuzzy joint membership function and Monte Carlo simulation) was utilised to propagate dissimilar types of uncertainty through a typical model. The focus of this research was addressing the fourth question through the development of techniques to determine the appropriateness of this uncertainty propagation. Again, methods were proposed and developed that would be applicable in a variety of situations. Here, they were tested using the output of the slope stability uncertainty model. The principal issue was addressing the fact that the model predictions incorporate multiple realisations and variability data, while the verification data were black or white. The results showed that the uncertainty model and propagation methods were highly appropriate in developing information products for a typical natural resource model. The information retained in the uncertainty modelling system allowed the slope stability model predictions to be of greater use in predicting slope failure with high certainty than the typical Boolean model. The fact that the uncertainty level in the Boolean model (in a typical application) is an unknown value serves to highlight the utility of uncertainty modelling procedures in general.

In general, the methods used to answer Questions Two through Four also demonstrate appropriateness through greater flexibility. The uncertainty modelling and propagation techniques, when properly verified, serve to increase the number of questions that can be asked of the data—both the model source and results. This could lead to either increased utility for research, or wider practical applicability of data and models.

5. What are some of the implications of the methods outlined in the above questions for resource management decision making?

The final question is a more general one, and was answered in the previous chapter.
The implications are many and varied, and were broken down into three levels of planning. The implications of uncertainty modelling and verification at the strategic level are primarily non-spatial, and revolve principally around the task of 'risk management'. Verified uncertainty models would allow the various possible outcomes of a decision model to be assigned metadata, increasing decision model utility. At the tactical and operational levels the spatial variability of uncertainty becomes more important. Here, one of the crucial implications of uncertainty modelling is in understanding the data used for planning, possibly allowing planners to understand that, in some cases, many of the presented scenarios fall within the range of possibility. Verified uncertainty models may also allow planning for leeway in strategic and operational plans.

Finally, returning to the overall question, it is apparent from the above research that a natural resource uncertainty model can be verified in order to determine its utility in real-world management and, furthermore, that one of the principal utilities of such a model is to allow greater flexibility and understanding of datasets and resource models by real-world managers. However, this increased flexibility is combined with increased complexity. Communication becomes a crucial tool in bringing uncertainty models to the desktop of real-world managers.

The research presented in this document represents a major step in an overall research program intended to integrate uncertainty management into natural resource decision making. Uncertainty models, uncertainty model verification, and resource model uncertainty verification feed directly into management integration. With another piece added to the puzzle, it is hoped that research into the process of integration will continue to grow.

Bibliography

ABEL, D.J., YAP, S.K., WALKER, G., CAMERON, M.A., AND ACKLAND, R.G., 1992. Support in spatial information systems for unstructured problem-solving. In Proceedings of the Fifth International Symposium on Spatial Data Handling.
ADKINS, K.F., 1994. Accuracy assessment of elevation data sets using GPS. Photogrammetric Engineering and Remote Sensing, 60(2):195-202.
ALONSO, W., 1968. Predicting best with imperfect data. Journal of the American Institute of Planners, July:248-55.
ALTMAN, D., 1994. Fuzzy set theoretic approaches for handling imprecision in spatial analysis. International Journal of Geographical Information Systems, 8(3):271-289.
ANTLE, A., in preparation. Interactive visualisation tools for data and metadata. Ph.D. Dissertation, University of British Columbia, Department of Geography.
ARONOFF, S., 1982. Classification accuracy: a user approach. Photogrammetric Engineering & Remote Sensing, 48(8):1299-1307.
AUNG, H., 1992. Probabilistic analysis of earthen slope stability (slope failure). Ph.D. Dissertation, Marquette University. 159pp.
BANAI, R., 1993. Fuzziness in geographic information systems: contributions from the analytic hierarchy process. International Journal of Geographical Information Systems, 7(4):315-329.
BASTA, D., 1990. Information technology for multiple use decision making. In Coastal Ocean Space Utilization, edited by R.B. Abel and S.D. Halsey (Elsevier Science Publishing Co.), pp. 309-313.
BC MINISTRY OF ENVIRONMENT, 1988. Terrain Classification System for British Columbia (revised edition). (Victoria, BC: Gov't of BC).
BC MINISTRY OF FORESTS, 1996.
Queen Charlotte Islands Timber Supply Area Rationale for (AAC) Determination. (Victoria, BC: Queen's Printer), 44 p.
BEARD, M.K. AND BUTTENFIELD, B.P. (eds), 1991. NCGIA Research Initiative 7: Visualization of Spatial Data Quality. NCGIA Technical Paper 91-26 (NCGIA).
BERTIN, J., 1985. Graphical Semiology (Madison, Wisconsin: University of Wisconsin Press).
BEZDEK, J.C., 1974. Numerical taxonomy with fuzzy sets. Journal of Mathematical Biology, 34:1-15.
BEZDEK, J.C., 1992. Fuzzy models: what are they, and why? IEEE Transactions on Fuzzy Systems, 1(1):1-5.
BEZDEK, J.C., EHRLICK, R., AND FULL, W., 1984. FCM: the fuzzy c-means clustering algorithm. Computers in Geoscience, 10:191-203.
BILODEAU, J.-M., BEDARD, Y., AND LOWELL, K.E., 1993. Using a Geographic Information System to plan a forest inventory that respects the spatial distribution of forest strata. Northern Journal of Applied Forestry, 10(4):161.
BLAKEMORE, M., 1983. Generalization and error in spatial databases. Cartographica, 21(2/3):131-139.
BOUILLE, F., 1992. Fuzzy neural processing by an object-oriented expert system: application to geographic information systems. In International Fuzzy Engineering Symposium, Fuzzy Engineering Toward Human Friendly Systems. Yokohama, Japan, Nov 1991. (Tokyo: IOS Press), pp. 574-585.
BRANDT, J.W., 1994. Convergence and continuity criteria for discrete approximations of the continuous planar skeleton. Computer Vision, Graphics and Image Processing, 59(1):116-125.
BRIMICOMBE, A.J., 1993. Combining positional and attribute uncertainty using fuzzy expectation in a GIS. In Proceedings, GIS/LIS Annual Conference. Minneapolis, Minnesota, November 2-4. (Bethesda, Maryland: American Society for Photogrammetry and Remote Sensing), pp. 73-81.
BROWN, D.G. AND BARA, T.J., 1994. Recognition and reduction of systematic error in elevation and derivative surfaces from 7.5 minute DEMs. Photogrammetric Engineering and Remote Sensing, 60(2):189-194.
BRUNEAU, P. AND GASCUEL-ODOUX, C., 1995. Sensitivity to space and time resolution of a hydrological model using digital elevation data. Hydrological Processes, 9:69-81.
BUCKLES, B.P. AND PETRY, F.E., 1985. Generalized database and information systems. Unpublished manuscript.
BUITEN, H.J. AND VAN PUTTEN, B., 1997. Quality assessment of remote sensing image registration - analysis and testing of control point residuals. ISPRS Journal of Photogrammetry and Remote Sensing, 52(2):57-73.
BURROUGH, P.A., 1986a. Five reasons why geographical information systems are not being used efficiently for land resource assessment. In Auto-Carto London, edited by M. Blakemore. London, 14-19 Sept. (AutoCarto London), pp. 139-148.
BURROUGH, P.A., 1986b. Principles of Geographical Information Systems for Land Resources Assessment (Oxford: Clarendon).
BURROUGH, P.A., 1989. Fuzzy mathematical methods for soil survey and land evaluation. Journal of Soil Science, 40:477-492.
BURROUGH, P.A., 1991. Soil Information Systems. In Geographic Information Systems: Principles and Applications, 2, edited by D.J. Maguire, M.F. Goodchild, and D.W. Rhind (Harlow, Essex, England: Longman Scientific), pp. 153-168.
BURROUGH, P.A., 1993. Spatial data quality and error analysis issues: GIS functions and environmental modeling. In GIS and Environmental Modelling Conference, Colorado. Manuscript.
BURROUGH, P.A., MACMILLAN, R.A., AND VAN DEURSEN, W., 1992. Fuzzy classification methods for determining land suitability from soil profile observations and topography. Journal of Soil Science, 43:193-210.
BUTTENFIELD, B.P., 1991. Visualizing cartographic metadata. In NCGIA Research Initiative 7: Visualization of Spatial Data Quality, edited by M.K. Beard and B.P. Buttenfield (NCGIA), pp. C17-C24.
BUTTENFIELD, B. AND BEARD, M.K., 1994. Graphical and geographical components of data quality. In Visualization in Geographical Information Systems, edited by H.M. Hearnshaw and D.J. Unwin (New York: John Wiley and Sons), pp. 150-157.
BUZUG, T.M., WEESE, J., FASSNACHT, C., AND LORENZ, C., 1997. Image registration: convex weighting functions for histogram-based similarity measures. Lecture Notes in Computer Science, 1205:203-225.
CALIFORNIA DEPT. OF FORESTRY, 1992. Implications of sample size and sampling methodology for forestland map accuracy assessments. Manuscript.
CAPRA, A., NICOSIA, O., AND SCICOLONE, B., 1995. Application of fuzzy set theory to drainage and salinity problems. Rivista di Ingegneria Agraria, 26(4):200-207.
CHEN, Z.Z. AND FINN, J.T., 1994. The estimation of digitizing error and its propagation with possible application to habitat mapping. In International Symposium on Spatial Accuracy of Natural Resource DataBases: Unlocking the Puzzle, edited by R.G. Congalton. Williamsburg, Virginia, May 16-20 1994. (Bethesda, Maryland: American Society for Photogrammetry and Remote Sensing), pp. 57-66.
CHORAS, R.S., 1993. Image coding by morphological skeleton transformation. Lecture Notes in Computer Science, 71:216-222.
CHRISMAN, N.R., 1982. A theory of cartographic error and its measurement in digital databases. In Proceedings, Fifth International Symposium on Computer-Assisted Cartography (AUTO-CARTO 8). Falls Church, Virginia (ASPRS & ACSM), pp. 159-168.
CHRISMAN, N.R., 1989. Modelling error in overlaid categorical maps. In The Accuracy of Spatial Databases, edited by M. Goodchild and S. Gopal (Bristol, PA: Taylor & Francis), pp. 21-34.
CHRISMAN, N.R., 1991. The error component of spatial data. In Geographical Information Systems: Principles and Applications, 1, edited by D.J. Maguire, M.F. Goodchild, and D.W. Rhind (Harlow, U.K.: Longman Scientific and Technical), pp. 165-174.
CHRISMAN, N.R., 1997. Exploring Geographic Information Systems (New York: John Wiley and Sons), 298 pp.
CHRISTIAN, J.T., LADD, C.C., AND BAECHER, G.B., 1994. Reliability applied to slope stability analysis. Journal of Geotechnical Engineering - ASCE, 120(12):2180-2207.
CLAGUE, J.J., 1989. Quaternary geology of the Queen Charlotte Islands. In The Outer Shores, edited by G.G.E. Scudder and N. Gessler (Skidegate, Queen Charlotte: Queen Charlotte Islands Museum), pp. 65-74.
CLEAVES, D.A., 1994. Assessing uncertainty in expert judgements about natural resources. USDA Forest Service General Technical Report SO-110 (USDA Forest Service).
CONGALTON, R.G. (ed.), 1994. International Symposium on Spatial Accuracy of Natural Resource DataBases: Unlocking the Puzzle. Williamsburg, Virginia, May 16-20 1994. (Bethesda, Maryland: American Society for Photogrammetry and Remote Sensing).
CONGALTON, R.G. AND MACLEOD, R.D., 1994. Change detection accuracy assessment on the NOAA Chesapeake Bay pilot study. In International Symposium on Spatial Accuracy of Natural Resource DataBases: Unlocking the Puzzle, edited by R.G. Congalton.
Williamsburg, Virginia, May 16-20 1994. (Bethesda, Maryland: American Society for Photogrammetry and Remote Sensing), pp. 78-87.
CROWLEY, J.L. AND DEMAZEAU, Y., 1993. Principles and techniques for sensor data fusion. Signal Processing, 32(1):5-27.
CSAPLOVICS, E., 1992. Analysis of colour infrared aerial photography and SPOT satellite data for monitoring land cover change of a heathland region of the Causse du Larzac (Massif Central, France). International Journal of Remote Sensing, 13(3):441-449.
CUNIA, T. AND WHARTON, E.H., 1986. Estimating Tree Biomass Regressions and their Error. Proceedings of the Workshop on Tree Biomass Regression Functions and Their Contribution to the Error of Forest Inventory Estimates. Syracuse, NY, May 26-30. (USDA Forest Service). 450pp.
DAVIS, J.C., 1986. Statistics and Data Analysis in Geology (New York: John Wiley and Sons).
DAVIS, T.J., 1994. Modeling and Visualizing Spatial Uncertainty using Fuzzy Logic and Monte Carlo Simulation. M.Sc. Thesis: University of Victoria. 131pp.
DAVIS, T.J. AND KELLER, C.P., 1997a. Modelling uncertainty in natural resource analysis using fuzzy sets and Monte Carlo simulation: slope stability prediction. International Journal of Geographical Information Science, 11(5):409-434.
DAVIS, T.J. AND KELLER, C.P., 1997b. Modelling and visualising multiple spatial uncertainties. Computers and Geosciences, 23(4):397-408.
DAVIS, T.J., KELLER, C.P., AND KLINKENBERG, B., 1998. A preliminary evaluation of the Lyell Island Rehabilitation Project. Report for the Archipelago Management Board, Gwaii Haanas. 72pp.
DE ROO, A.J.P., HAZELHOFF, L., AND BURROUGH, P.A., 1989. Soil erosion modelling using 'ANSWERS' and geographic information systems. Earth Surface Processes and Landforms, 14:517-532.
DELAVAR, M.R., 1997. Development of probability maps to assess the accuracy and reliability of information in the output of a GIS system. Ph.D. Dissertation: University of New South Wales (Australia).
DIBIASE, D., MACEACHREN, A.M., KRYGIER, J.B., AND REEVES, C., 1992. Animation and the role of map design in scientific visualization. Cartography and Geographic Information Systems, 19(4):201-221.
DJAMDJI, J.-P., 1993. Geometrical registration of images: the multiresolution approach. Photogrammetric Engineering and Remote Sensing, 59(5):645.
DOMBURG, P., DE GRUIJTER, J.J., AND BRUS, D.J., 1994. A structured approach to designing soil survey schemes with prediction of sampling error from variograms. Geoderma, 62:151-164.
DRUMMOND, J., 1988. Fuzzy sub-set theory applied to environmental planning in GIS. In EuroCarto 7, edited by J.C. Muller. Enschede, Netherlands, Sep 20-22 1988. (ITC), pp. 121-134.
DU, L. AND LEE, J.S., 1996. Fuzzy classification of earth terrain covers using complex polarimetric SAR data. International Journal of Remote Sensing, 17(4):809-826.
DUNN, J.C., 1974. A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters. Journal of Cybernetics, 3:22-57.
DUNN, R., HARRISON, A.R., AND WHITE, J.C., 1990. Positional accuracy and measurement error in digital databases of land use: an empirical study. International Journal of Geographical Information Systems, 4(4):385-398.
DUTTON, G., 1992. Handling positional uncertainty in spatial databases. In Proceedings of the Fifth International Symposium on Spatial Data Handling, edited by P. Bresnahan, E. Corwin, and D. Cowen. Charleston, South Carolina, Aug. 3-7. (University of South Carolina), pp. 460-468.
EBERBACH, E., 1993.
Representing spatial and temporal uncertainty. Lecture Notes in Computer Science, 682:129.
ECOSAT GEOBOTANICAL SURVEYS, 1989. Forest Harvesting Activities and LANDSAT Thematic Mapper Analysis of Lyell Island. Report submitted to Parks Canada, Queen Charlotte, BC.
EDWARDS, G. AND LOWELL, K.E., 1996. Modeling uncertainty in photointerpreted boundaries. Photogrammetric Engineering and Remote Sensing, 62(4):377-391.
EHLSCHLAEGER, C.R. AND GOODCHILD, M.F., 1994. Uncertainty in spatial data: defining, visualizing and managing data errors. In Proceedings: GIS/LIS. Phoenix, Arizona, Oct. 25-27. (Bethesda, Maryland: ASPRS/ACSM), pp. 240-247.
EMMI, P.C. AND HORTON, C.A., 1995. A Monte Carlo simulation of error propagation in a GIS-based assessment of seismic risk. International Journal of Geographical Information Systems, 9(4):447-461.
ESTEP, K.W., MACINTYRE, F., AND NOJI, T.T., 1994. Seal sizes and habitat conditions assessed from aerial photography and video analysis. ICES Journal of Marine Science, 51:253-261.
ESTES, J.E., STOW, D., AND JENSEN, J.R., 1982. Monitoring land use and land cover changes. In Remote Sensing for Resource Management, edited by C. Johannsen and J. Sanders (Ankeny, Iowa: Soil Conservation Society of America), pp. 100-110.
EVANS, B., 1997. Dynamic display of spatial data-reliability: does it benefit the map user? Computers and Geosciences, 23(4):254-270.
EVANS, D.L., 1994. Some considerations for AVHRR forest classification accuracy assessments. In International Symposium on Spatial Accuracy of Natural Resource DataBases: Unlocking the Puzzle, edited by R.G. Congalton. Williamsburg, Virginia, May 16-20 1994. (Bethesda, Maryland: American Society for Photogrammetry and Remote Sensing), pp. 161-167.
EVERITT, J.H., ESCOBAR, D.E., AND JUDD, F.W., 1991. Evaluation of airborne video imagery for distinguishing black mangrove (Avicennia germinans) on the Lower Texas Gulf Coast. Journal of Coastal Research, 7(4):1169-1173.
EVERITT, J.H., ESCOBAR, D.E., AND VILLARREAL, R., 1993. Integration of airborne video, global positioning system and geographic information system technologies for detecting and mapping two woody legumes on rangelands. Weed Technology, 7:981-987.
FANNIN, R.J. AND ROLLERSON, T.P., 1996. Assessing debris flow behavior in coastal British Columbia: runout behavior. FERIC Special Report SR-116.
FELICISIMO, A.M., 1994. Parametric statistical method for error detection in digital elevation models. ISPRS Journal of Photogrammetry and Remote Sensing, 49(4):29-33.
FERGUSON, R.L., 1977. Linear regression in geography. Concepts and Techniques in Modern Geography #15 (Norwich, England: Geo-Books). 44p.
FERGUSON, R.L., WOOD, L.L., AND GRAHAM, D.B., 1993. Monitoring spatial change in seagrass habitat with aerial photography. Photogrammetric Engineering and Remote Sensing, 59(6):1033-1038.
FIGHT, R.D. AND BELL, E.F., 1994. Coping with uncertainty: a conceptual approach for timber management planning. USDA Forest Service General Technical Report PNW-59. 20pp.
FISHER, P.F., 1991a. First experiments in viewshed uncertainty: the accuracy of the viewshed area. Photogrammetric Engineering and Remote Sensing, 57(10):1321-1327.
FISHER, P.F., 1991b. Modeling and visualizing uncertainty in GIS. In NCGIA Research Initiative 7: Visualization of Spatial Data Quality, edited by M.K. Beard and B.P. Buttenfield (NCGIA), pp. C63-C70.
FISHER, P.F., 1991c. Modelling soil map-unit inclusions by Monte Carlo simulation.
International Journal of Geographical Information Systems, 5(2):193-208.
FISHER, P.F., 1992. First experiments in viewshed uncertainty: simulating fuzzy viewsheds. Photogrammetric Engineering and Remote Sensing, 58(3):345-352.
FISHER, P.F., 1993. Visualizing uncertainty in soil maps by animation. Cartographica, 30(2-3):20-29.
FISHER, P.F., 1994. Visualization of the reliability in classified remotely sensed images. Photogrammetric Engineering and Remote Sensing, 60(7):905-910.
FISHER, P.F., 1996. Extending the applicability of viewsheds in landscape planning. Photogrammetric Engineering and Remote Sensing, 62(11):1297-1302.
FLAMM, R.O. AND TURNER, M.G., 1994. Alternative model formulations for a stochastic simulation of landscape change. Landscape Ecology, 9(1):37-46.
FONSECA, L.M. AND MANJUNATH, B.S., 1996. Registration techniques for multisensor remotely sensed imagery. Photogrammetric Engineering and Remote Sensing, 62(9):1049-1056.
FOODY, G.M., 1996. Approaches for the production and evaluation of fuzzy land cover classifications from remotely-sensed data. International Journal of Remote Sensing, 17(7):1317-1340.
FRIDLAND, V.M., 1974. Structure of the soil mantle. Geoderma, 12:35-41.
GARCIA, M.A., 1994. Terrain modeling with uncertainty for geographic information systems. Proceedings - SPIE - The International Society for Optical Engineering, 2357/PT1:273.
GIMBARZEVSKY, P., 1988. Mass wasting on the Queen Charlotte Islands: a regional inventory. (Victoria, BC: BC Ministry of Forests and Lands), Land Management Report #29.
GOLD, C.M., 1991. Applications of dynamic Voronoi data structures. In Second European Conference on GIS. Brussels, April, pp. 1090-1098.
GOODCHILD, M.F., 1980. Algorithm 9: simulation of autocorrelation for aggregate data. Environment and Planning A, 12:1073-1081.
GOODCHILD, M.F., 1989. Modeling error in objects and fields. In The Accuracy of Spatial Databases, edited by M. Goodchild and S. Gopal (Bristol, PA: Taylor & Francis), pp. 107-114.
GOODCHILD, M.F., 1991. Issues of quality and uncertainty. In Advances in Cartography, edited by J.C. Muller (New York: Elsevier Applied Science Series), pp. 140-172.
GOODCHILD, M.F., 1994. GIS error models and visualization techniques for spatial variability of soils. In 15th World Congress of Soil Science. Acapulco, Mexico, Jul 1994. (Mexican Society of Soil Science; ISSS), pp. 683-698.
GOODCHILD, M.F., BUTTENFIELD, B., AND WOOD, J., 1994. Introduction to visualizing data quality. In Visualization in Geographical Information Systems, edited by H.M. Hearnshaw and D.J. Unwin (New York: John Wiley and Sons), pp. 141-149.
GOODCHILD, M.F., CHIH-CHANG, L., AND LEUNG, Y., 1994. Visualizing fuzzy maps. In Visualization in Geographical Information Systems, edited by H.M. Hearnshaw and D.J. Unwin (New York: John Wiley & Sons, Inc.), pp. 158-167.
GORDON, J. AND SHORTLIFFE, E.H., 1992. The Dempster-Shafer theory of evidence: rule-based expert systems. In The MYCIN Experience of the Stanford Heuristic Programming Project, edited by B.G. Buchanan and E.H. Shortliffe (Reading, MA: Addison-Wesley).
GOTTESFELD-BROWN, L., 1992. A survey of image registration techniques. ACM Computing Surveys, 24(4):325-375.
GOVERNMENT OF B.C., 1994. Forest Practices Code of British Columbia Act. S.B.C. 1994, c. 41.
GRAHAM, L.A., 1993. Airborne video for near-real-time vegetation mapping. Journal of Forestry, (Aug):28-34.
GUPTILL, S.C. AND MORRISON, J.L., 1995. Elements of Spatial Data Quality (New York: Elsevier).
HAINING, R.P., GRIFFITH, D.A., AND BENNETT, R.J., 1983. Simulating two-dimensional autocorrelated surfaces. Geographical Analysis, 15:247-253.
HALL, G.B. AND WANG, F., 1992. Comparison of Boolean and fuzzy classification methods in land suitability analysis by using geographical information systems. Environment & Planning A, 24(4):497-516.
HAMMOND, C., HALL, D., MILLER, S., AND SWETIK, P., 1992. Level I Stability Analysis (LISA) Documentation. Intermountain Research Station, General Technical Report INT-285 (Ogden, Utah: United States Forest Service, Department of Agriculture).
HANNAH, M.J., 1981. Error detection and correction in digital terrain models. Photogrammetric Engineering and Remote Sensing, 47(1):63-69.
HARRIS, P.M., 1995. The integration of geographic data with remotely sensed imagery to improve classification in an urban area. Photogrammetric Engineering and Remote Sensing, 61(8):993.
HARTIGAN, J.A., 1975. Clustering Algorithms (New York: Wiley).
HEARNSHAW, H.M. AND UNWIN, A.R., 1994. Visualization in Geographical Information Systems (New York: John Wiley & Sons).
HERRING, J.R., 1991. Using spline functions to represent distributed attributes. In Proceedings: Auto-Carto 10. Baltimore, Maryland, March, pp. 46-57.
HEUVELINK, G.B.M., 1995. Identification of an error model for quantitative spatial attributes under different models of spatial variation. In International Symposium: Spatial Accuracy of Natural Resource Data Bases: Unlocking the Puzzle, edited by R. Congalton. Williamsburg, VA, May 1994. (Bethesda, MD: ASPRS), p. 267.
HEUVELINK, G.B.M., BURROUGH, P.A., AND STEIN, A., 1989. Propagation of errors in spatial modelling with GIS. International Journal of Geographical Information Systems, 3(4):303-322.
HEUVELINK, G.B. AND BURROUGH, P.A., 1993. Error propagation in cartographic modelling using Boolean logic and continuous classification. International Journal of Geographical Information Systems, 7(3):231-246.
HOPE, A.C.A., 1968. A simplified Monte Carlo significance test procedure. Journal of the Royal Statistical Society, B30:583-598.
HORD, R.M. AND BROONER, W., 1976. Land-use map accuracy criteria. Photogrammetric Engineering & Remote Sensing, 42(5):671-677.
HOUGARD, T.R. AND VALDEZ, R.A., 1994. A multi-temporal, multi-accuracy fisheries GIS database for the Colorado River in Grand Canyon, Arizona. In International Symposium on Spatial Accuracy of Natural Resource DataBases: Unlocking the Puzzle, edited by R.G. Congalton. Williamsburg, Virginia, May 16-20 1994. (Bethesda, Maryland: American Society for Photogrammetry and Remote Sensing), pp. 67-77.
HOWES, D., HARPER, J., AND OWENS, E., 1994. British Columbia physical shore-zone mapping system. (Victoria: Resources Inventory Committee), 71 p.
HUDSON, W.D. AND RAMM, C.W., 1987. Correct formulation of the kappa coefficient of agreement. Photogrammetric Engineering & Remote Sensing, 53(4):421-422.
HUNTER, G.J., 1995. A new model for handling vector data uncertainty in geographic information systems. In 33rd Annual Conference - Information Technology Linking the Americas: Your Network to an Expanded World, edited by M. Sailing. San Antonio, TX, Jul 1995, pp. 410-419.
HUNTER, G.J., ROBEY, B.H., AND GOODCHILD, M.F., 1996. Experimental development of a model of vector uncertainty. In Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, edited by H.T. Mowrer, R.L. Czaplewski, and R.H. Hamre.
Fort Collins, Colorado, May 21-23. (Fort Collins, Colorado: Rocky Mountain Forest and Range Experimental Station), pp. 217-224.
ILES, K., 1994. Directions in forest inventory. Journal of Forestry, 92(12):12.
JOURNEL, A.G., 1996. Modelling uncertainty and spatial dependence: stochastic imaging. International Journal of Geographical Information Systems, 10(5):517-522.
JOY, M.W., KLINKENBERG, B., AND CUMMING, S., 1994. Handling uncertainty in a spatial forest model integrated with GIS. In GIS '94. Vancouver, B.C., February. (Ministry of Supply and Services, Canada), pp. 360-365.
KANAZAWA, K., 1994. GOO: a database for temporal uncertainty management. In Proceedings of the 10th Biennial Conference of the Canadian Society for Computational Studies of Intelligence. Banff, Alberta, May, pp. 197-204.
KELLER, C.P., 1994. Exploratory spatial data analysis (ESDA) - the next revolution in GIS. In GIS '94. Vancouver, B.C., February. (Ministry of Supply and Services, Canada), pp. 298-302.
KELLER, J.M., 1997. Fuzzy set theory in computer vision: a prospectus. Fuzzy Sets and Systems, 90(2):177-185.
KING, D. AND VLCEK, J., 1990. Development of a multispectral video system and its application in forestry. Canadian Journal of Remote Sensing, 16(1):15-22.
KOLLIAS, V.J. AND VOLIOTIS, A., 1991. Fuzzy reasoning in the development of geographical information systems. FRSIS: a prototype soil information system with fuzzy retrieval capabilities. International Journal of Geographical Information Systems, 5(2):209-223.
KONTOES, C., WILKINSON, G.G., BURRILL, A., GOFFREDO, S., AND MEGIER, J., 1993. An experimental system for the integration of GIS data in knowledge-based image analysis for remote sensing of agriculture. International Journal of Geographical Information Systems, 7(3):247.
KOSKO, B. AND ISAKA, S., 1993. Fuzzy logic. Scientific American, July:76-81.
KUNKEL, R. AND WENDLAND, F., 1997. WEKU - a GIS-supported stochastic model of groundwater residence times in upper aquifers for the supraregional groundwater management. Environmental Geology, 30(1-2):1-9.
LANGRAN, G., 1992. Time in Geographic Information Systems (Philadelphia: Taylor & Francis).
LANTER, D.P. AND VEREGIN, H., 1992. A research paradigm for propagating error in layer-based GIS. Photogrammetric Engineering & Remote Sensing, 58(6):825-833.
LEE, J., SNYDER, P.K., AND FISHER, P.F., 1992. Modeling the effect of data errors on feature extraction from digital elevation models. Photogrammetric Engineering and Remote Sensing, 58(10):1461-1467.
LEESE, M.N. AND MAIN, P.L., 1994. The efficient computation of Mahalanobis distances and their interpretation in archaeometry. Archaeometry, 36(2):307.
LIU, R. AND HERRINGTON, L.P., 1996. The expected cost of uncertainty in geographic data. Journal of Forestry, 94(12):27-31.
LIU, R., 1995. The effects of spatial data errors on the GIS-based forest management decisions. Ph.D. Dissertation: State University of New York College of Environmental Science & Forestry. 224p.
LIVINGSTONE, D. AND RAPER, J., 1994. Modelling environmental systems with GIS: theoretical barriers to progress. In Innovations in GIS 1, edited by M.F. Worboys (Bristol, PA: Taylor & Francis), pp. 229-240.
LODIN, M. AND SKEA, D., 1996. A new method for evaluating positional map accuracy. In Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, edited by H.T. Mowrer, R.L. Czaplewski, and R.H. Hamre. Fort Collins, Colorado, May 21-23. (Fort Collins, Colorado: Rocky Mountain Forest and Range Experimental Station).
LOWELL, K.E., 1993a. Initial studies on the quantification and representation of uncertainty in forestry data. In GIS '93 Symposium. Vancouver, BC, Feb 1993, pp. 791-796.
LOWELL, K.E., 1993b. Predictive model development and evaluation with unknown spatial units. Photogrammetric Engineering and Remote Sensing, 59(10):1509-1515.
LOWELL, K.E., 1997. An empirical evaluation of spatially based forest inventory samples. Canadian Journal of Forest Research, 27(3):352.
LOWELL, K.E., EDWARDS, G., AND LANGRAN-KUCERA, G., 1996. Modeling heterogeneity and change in natural forests. Geomatica, 50(4):425-440.
MACDOUGALL, E.B., 1975. The accuracy of map overlays. Landscape Planning, 2:23-30.
MACEACHREN, A.M., 1992. Visualizing uncertain information. Cartographic Perspectives, 13:11-17.
MACEACHREN, A.M., 1994. Time as a cartographic variable. In Visualization in Geographical Information Systems, edited by H.M. Hearnshaw and D.J. Unwin (New York: John Wiley & Sons, Inc.), pp. 115-130.
MACLEAN, A.L., D'AVELLO, T.P., AND SHETRON, S.G., 1993. The use of variability diagrams to improve the interpretation of digital soil maps in a GIS. Photogrammetric Engineering and Remote Sensing, 59(2):223-228.
MAFFINI, G., ARNO, M., AND BITTERLICH, W., 1989. Observations and comments on the generation and treatment of error in digital GIS data. In The Accuracy of Spatial Databases, edited by M. Goodchild and S. Gopal (Bristol, PA: Taylor & Francis), pp. 55-68.
MAGNUSSEN, S., 1996. A co-ordinate-free area variance estimator for forest stands with a fuzzy outline. Forest Science, 42(1):76-85.
MARK, D.M. AND CSILLAG, F., 1989. The nature of boundaries on 'area-class' maps. Cartographica, 26(1):65-77.
MARSHALL, P.L., 1986. Risk and uncertainty in the analysis of timber supply. Report #85-12, Dept. of Forest Resource Management, Faculty of Forestry, UBC. 12 p.
MARSMAN, B. AND DE GRUIJTER, J.J., 1984. Dutch soil survey goes into quality control. In Soil Information Systems Technology, edited by P.A. Burrough and S.W. Bie (Pudoc, Wageningen, The Netherlands), pp. 127-134.
MARSMAN, B. AND DE GRUIJTER, J.J., 1986. Quality of soil maps: a comparison of survey methods. Soil Survey Papers No. 15 (Netherlands: Netherlands Soil Survey).
MASCARENHAS, N.D., BANON, G.J., AND CANDEIAS, A.L., 1996. Multispectral image data fusion under a Bayesian approach. International Journal of Remote Sensing, 17(8):1457-1471.
MATHER, P.M., 1995. Map-image registration accuracy using least-squares polynomials. International Journal of Geographical Information Systems, 9(5):543.
MATHIEU, S., LEYMARIE, P., AND BERTHOD, M., 1994. Determination of proportions of land use blend in pixels of a multispectral satellite image. IEEE Transactions, 1154-1156.
MCBRATNEY, A.B., 1994. Allocation of new individuals to continuous soil classes. Australian Journal of Soil Research, 32:623-633.
MCBRATNEY, A.B. AND DE GRUIJTER, J.J., 1992. A continuum approach to soil classification by modified fuzzy k-means with extragrades. Journal of Soil Science, 43:159-175.
MCBRATNEY, A.B. AND MOORE, A.W., 1985. Application of fuzzy sets to climatic classification. Agricultural and Forest Meteorology, 35:165-185.
MENDOZA, G.A. AND SPROUSE, W., 1989. Forest planning and decision making under fuzzy environments: an overview and illustration. Forest Science, 35(2):481-502.
MINISTRY OF FORESTS, BC, 1995. Computer assisted methods to assess positional accuracy of forest cover map updates. Manuscript. 42 p.
MONCKTON, C., 1993.
An investigation into the spatial structure of error in digital elevation data. In First National Conference on GIS Research (Bristol, PA: Taylor and Francis), pp. 201-214.
MORRISON, P.H., 1975. Ecological and geomorphological consequences of mass movements in the Alder Creek watershed and implications for forest land management. B.Sc. Thesis: Eugene, Oregon: University of Oregon.
MOWRER, H.T., 1995. Monte Carlo techniques for propagating uncertainty through simulation models and raster-based GIS. In International Symposium: Spatial Accuracy of Natural Resource Data Bases: Unlocking the Puzzle, edited by R. Congalton. Williamsburg, VA, May 1994. (Bethesda, MD: ASPRS), pp. 179-188.
MOWRER, H.T., 1997. Propagating uncertainty through spatial estimation processes for old-growth subalpine forests using sequential Gaussian simulation in GIS. Ecological Modelling, 98(1):73-86.
MUCHONEY, D.M. AND HAACK, B., 1994. Change detection for monitoring forest defoliation. Photogrammetric Engineering and Remote Sensing, 60(10):1243-1251.
MULDER, J.A. AND CORNS, I.G.W., 1995. NAIA: a decision support system for predicting ecosystems from existing land resource data. AI Applications, 9(1):49-60.
NAESSET, E., 1996. Use of the weighted kappa coefficient in classification error assessment of thematic maps. International Journal of Geographical Information Systems, 10(5):591-603.
NCDCDS (NATIONAL COMMITTEE FOR DIGITAL CARTOGRAPHIC DATA STANDARDS), 1988. The proposed standard for digital cartographic data. The American Cartographer, 15(1):9-140.
NEWCOMER, J.A. AND SZAJGIN, J., 1984. Accumulation of thematic map errors in digital overlay analysis. The American Cartographer, 11(1):58-62.
ODEH, I.O., MCBRATNEY, A.B., AND CHITTLEBOROUGH, D.J., 1992. Soil pattern recognition with fuzzy c-means: applications to classification and soil-landform interrelationships. Soil Science Society of America Journal, 56:505-516.
OPENSHAW, S., 1989. Learning to live with errors in spatial databases. In The Accuracy of Spatial Databases, edited by M. Goodchild and S. Gopal (Bristol, PA: Taylor & Francis), pp. 263-276.
OPENSHAW, S., CHARLTON, M., AND CARVER, S., 1990. Error propagation: a Monte Carlo simulation. In Handling Geographical Information, edited by I. Masser and M. Blakemore (UK: Longman Scientific), pp. 78-95.
OPENSHAW, S., WAUGH, D., AND CROSS, A., 1994. Some ideas about the use of map animation as a spatial analysis tool. In Visualization in Geographical Information Systems, edited by H.M. Hearnshaw and D.J. Unwin (New York: John Wiley and Sons), pp. 131-139.
OR, D. AND HANKS, R.J., 1992. Spatial and temporal soil water estimation considering soil variability and evapotranspiration uncertainty. Water Resources Research, 28(3):803-814.
ORLAND, B., 1994. Visualization techniques for incorporation in forest planning geographic information systems. Landscape and Urban Planning, 30:83-97.
OWENS, T. AND MCCONVILLE, D., 1996. A method for measuring the spatial accuracy of coordinates collected using the global positioning system. In Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Second International Symposium, edited by H.T. Mowrer, R.L. Czaplewski, and R.H. Hamre. Fort Collins, Colorado, May 21-23. (Fort Collins, Colorado: USDA Forest Service), pp. 351-358.
PALUBINSKAS, G., 1994. Comparison of texture-based and fuzzy classification approaches for regenerating tropical forest mapping using Landsat TM.
PEIPE, J., 1995. High resolution still video camera for industrial photogrammetry. Photogrammetric Record, 85:135.
PERKAL, J., 1966. An attempt at objective generalization. Geodezja i Kartografia, VII(2):130-142.
PEUCKER, T.K., 1975. A theory of the cartographic line. In Proceedings, International Symposium on Computer-Assisted Cartography (AUTO-CARTO II), September 21-25 (US Department of Commerce and the ACSM), pp. 508-518.
PHINN, S.R., STOW, D.A., AND ZEDLER, J.B., 1996. Monitoring wetland habitat restoration in Southern California using airborne multispectral video data. Restoration Ecology, 4(4):412.
PIETRZYK, U., HERHOLZ, K., AND HEISS, W.-D., 1995. An interactive technique for three-dimensional image registration: validation for PET, SPECT, MRI and CT brain studies. The Journal of Nuclear Medicine, 35(12):2011.
POLLARD, S.J.T., 1994. Embracing uncertainty: developing risk assessment methodologies for environmental management in Europe. In 2nd International Symposium on Environmental Contamination in Central and Eastern Europe, Budapest, Sep 1994, pp. 574-576.
PREPARATA, F.P. AND SHAMOS, M.I., 1985. Computational Geometry (Berlin: Springer-Verlag).
PRICE, K.P., 1992. Shrub dieback in a semiarid ecosystem: the integration of remote sensing and geographic information systems for detecting vegetation change. Photogrammetric Engineering and Remote Sensing, 58(4):455.
RAMLAL, B., 1996. A data quality model for the representation of mixed spatial variation. PhD Dissertation: University of Maine. 266 p.
REED, W.A., 1991. Planting decisions in the face of uncertainty. FEPA Research Unit Working Paper #156, University of BC (Vancouver, BC: UBC). 30 p.
RIDD, M.K. AND LIU, J., 1998. A comparison of four algorithms for change detection in an urban environment. Remote Sensing of Environment, 63(2):95-102.
RINGROSE, S. AND MATHESON, W., 1992. The use of Landsat MSS imagery to determine the aerial extent of woody vegetation cover change in the west-central Sahel. Global Ecology and Biogeography Letters, 2(1):16-25.
ROBBINS, B.D., 1997. Quantifying temporal change in seagrass areal coverage: the use of GIS and low resolution aerial photography. Aquatic Botany, 58(3):259-265.
ROBINOVE, C.J., CHAVEZ, P.S., GEHRING, D., AND HOLMGREN, R., 1981. Arid land monitoring using LANDSAT albedo images. Remote Sensing of Environment, 11:133-156.
ROBINSON, V.B., 1988. Some implications of fuzzy set theory applied to geographic databases. Computers, Environment and Urban Systems, 12:89-98.
ROGOWSKI, A.S., 1996. Incorporating soil variability into a spatially distributed model of percolate accounting. In Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, edited by H.T. Mowrer, R.L. Czaplewski, and R.H. Hamre, Fort Collins, Colorado, May 21-23 (Fort Collins, Colorado: Rocky Mountain Forest and Range Experimental Station), pp. 178-185.
ROGOWSKI, A.S., 1996. Quantifying soil variability in GIS applications: spatial distribution of soil properties. International Journal of Geographical Information Systems, 10(4):455-475.
ROHRER, W.H. AND SPARKS, D.L., 1993. Express saccades: the effects of spatial and temporal uncertainty. Vision Research, 33(17):2447-2460.
ROOD, K.M., 1984. An aerial photograph inventory of the frequency and yield of mass wasting on the Queen Charlotte Islands, B.C. (Victoria, BC: BC Ministry of Forests, Land Management Report #34).
RUBINSTEIN, R.Y., 1981. Simulation and the Monte Carlo Method (New York: John Wiley & Sons).
RUIZ, M.H., 1995. A model of error propagation from digital elevation models to viewsheds. PhD Dissertation: University of Florida. 175 p.
RUSINEK, H., TSUI, W.-H., AND LEVY, A.V., 1993. Principal axes and surface fitting methods for three-dimensional image registration. The Journal of Nuclear Medicine, 34(11):2019-2026.
RUSPINI, E.H., 1969. A n