STATISTICAL A V A L A N C H E FORECASTING USING M E T E O R O L O G I C A L D A T A F R O M B E A R PASS, BRITISH COLUMBIA, C A N A D A by JAMES A N T O N Y F L O Y E R B.Sc.(Hons.), The University of Edinburgh, UK, 1998 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF T H E REQUIREMENTS FOR T H E D E G R E E OF M A S T E R OF SCIENCE in T H E F A C U L T Y OF G R A D U A T E STUDIES (Department of Geography) We accept this thesis as conforming te-the, requir^ standard, T H E UNIVERSITY OF BRITISH C O L U M B I A August 2003 © James Antony Floyer, 2003 In p r e s e n t i n g t h i s t h e s i s i n p a r t i a l f u l f i l m e n t of the requirements f o r an a d v a n c e d d e g r e e a t t h e U n i v e r s i t y o f B r i t i s h C o l u m b i a , I a g r e e t h a t t h e L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e f o r r e f e r e n c e and s t u d y . I f u r t h e r a g r e e t h a t p e r m i s s i o n f o r e x t e n s i v e c o p y i n g o f t h i s t h e s i s f o r s c h o l a r l y p u r p o s e s may be g r a n t e d by t h e h e a d o f my d e p a r t m e n t o r by h i s o r h e r r e p r e s e n t a t i v e s . I t i s u n d e r s t o o d t h a t copying or p u b l i c a t i o n of t h i s t h e s i s f o r f i n a n c i a l gain s h a l l not be a l l o w e d w i t h o u t my w r i t t e n p e r m i s s i o n . Department'of The U n i v e r s i t y o f B r i t i s h V a n c o u v e r , Canada Columbia Abstract A framework is presented for a numerical prediction scheme with a prediction accuracy of greater than 70% at Bear Pass, British Columbia, Canada. One way analysis of variance and canonical discriminant analysis are used to identify the principal variables that allow discrimination between avalanche and non-avalanche time periods. The optimum variable set is then used in a Mahalanobis-metric nearest neighbours model to determine the observations in the historical database that most closely represent conditions of the forecast period. The information about these neighbours can be used by the forecaster to aid in making an avalanche forecast. In this study, a univariate analysis of the meteorological and snowpack data from The Pass is used to assess the contribution of each of the variables for use in avalanche prediction. Using these data, the snow-climate of the area is assessed and it is shown that Bear Pass may be classified as a maritime snow climate, although some elements of a transitional snow-climate are evident. A canonical discriminant analysis is performed for both a two-group discrimination into avalanche and non-avalanche periods and for a three-group discrimination into wet, dry and non-avalanche periods. For the three-group discrimination, a two-stage discrimination is preferred, involving first discriminating into wet-avalanche or dry-avalanche periods and then discriminating between wet or dry-avalanche and non-avalanche periods. The analysis is performed for all areas in a combined analysis and also for individual sub-areas defined within The Pass. Improvements on classification rates to three out of the four sub-areas are found compared to the analysis for the whole Pass. ii Various parameters that affect the nearest neighbour model performance are analysed. The effect of group size distribution is assessed, as well as the number and threshold number of neighbours that form the basis for the forecast decision. The effect of dimensionality on the model is also analysed. The Mahalanobis-metric nearest neighbours model is compared to Cornice, a nearest neighbours model developed for avalanche forecasting in Scotland. Classification rates for the two models are found to be very similar. Finally, preliminary findings from a new method using time patterns of predictor variables is presented. The method makes use of time series data from remote weather stations. Sections of the time series are similar to present conditions are identified through a deconvolution process. It is hoped that this method may offer improvements to the conventional nearest neighbours method by implicitly assuming that meteorological conditions are correlated in time. iii Table of Contents ABSTRACT II TABLE OF CONTENTS LIST O F T A B L E S IV VIII LIST O F FIGURES IX ACKNOWLEDGEMENTS XI NOTES O N T H E INCLUSION O F E X T R A C T S F R O M PREVIOUS W O R K C H A P T E R 1. INTRODUCTION XII 1 1.1 GEOGRAPHIC CHARACTERISTICS 1 1.2 TERRAIN CHARACTERISTICS 2 1.3 A V A L A N C H E FREQUENCY ANALYSIS ALONG T H E PASS 3 1.4 INDIVIDUAL AREAS WITHIN T H E PASS 5 C H A P T E R 2. REVIEW O F A V A L A N C H EPREDICTION M O D E L S 8 2.1 INTRODUCTION 8 2.2 DETERMINATION OF THE MOST SIGNIFICANT METEOROLOGICAL FACTORS 9 2.3 DISCRIMINANT ANALYSIS 2.4 2.5 2.3.1 Discrimination between wet and dry-avalanche periods 2.3.2 Variable selection in discriminant analysis CLUSTER ANALYSIS 2.4.1 Buser 's nearest neighbours model 2.4.2 Models using the Euclidean distance metric. 2.4.3 Models using the Mahalanobis distance metric SUMMARY AND CONCLUSIONS C H A P T E R 3. PREPARATION OF T H E DATABASE 11 11 12 14 14 75 16 17 19 3.1 INTRODUCTION 19 3.2 WEATHER AND SNOWPACK DATA 20 iv 3.3 3.4 CONVERTING CATEGORICAL DATA INTO NUMERICAL DATA 23 3.3.1 Wind speed. 23 3.3.2 Wind direction 23 3.3.3 Precipitation type 25 3.3.4 Snow surface condition 25 A V A L A N C H E OCCURRENCE DATA 25 3.4.1 Canadian avalanche size classification scheme 3.4.2 Avalanche activity index 3.4.3 Moisture index 3.4.4 Filtering of avalanches by size 3.4.5 Filtering of avalanches by trigger type C H A P T E R 4. 26 26 26 27 27 UNIVARIATE ANALYSIS O F VARIABLES 29 4.1 INTRODUCTION 29 4.2 SNOW-CLIMATE OF BEAR PASS 29 4.2.1 Snowpack depth 30 4.2.2 Air temperature 4.2.3 Snow density 31 4.2.4 Discussion and summary 33 31 A3 A V A L A N C H E PROBABILITY PLOTS 34 4.4 ANALYSIS OF VARIANCE 35 4.5 ANALYSIS OF INDIVIDUAL VARIABLES 40 4.5.1 Precipitation type 40 4.5.2 Snowfall rate 40 4.5.3 Maximum temperature 4.5.4 Present temperature 41 4.5.5 Minimum temperature 41 4.5.6 Relative humidity 42 4.5.7 New snow board 42 : •' 41 v 4.5.8 Interval snow board 42 4.5.9 Storm snow board 42 4.5.10 Surface snow condition 43 4.5.11 Snowpack depth 43 4.5.12 Water-equivalent new precipitation 44 4.5.13 Foot penetration 44 4.5.14 Windspeed. 44 4.5.15 Wind direction 45 4.5.16 Lagged temperature variables 4.6 SUMMARY AND CONCLUSIONS C H A P T E R 5. 45 46 DISCRIMINANT ANALYSIS 48 5.1 INTRODUCTION 48 5.2 M E T H O D OF ANALYSIS 48 5.3 TWO-GROUP DISCRIMINATION 50 5.4 5.5 5.3.1 Canonical coefficients 50 5.3.2 Interactions between variables 5.3.3 Optimal variables 54 5.3.4 Discussion of variables 55 5.3.5 Classification rates 56 5.3.6 Analysis of individual areas 57 THREE-GROUP DISCRIMINATION 51 59 5.4.1 Direct three-group classification 59 5.4.2 Indirect three-group classification 61 CONCLUSIONS C H A P T E R 6. 64 NEAREST NEIGHBOURS ANALYSIS 6.1 INTRODUCTION 6.2 DISTANCE METRIC 67 67 .- 68 vi 6.3 SELECTION CRITERIA FOR DETERMINING GROUP MEMBERSHIP 70 6.4 LEARNING AND TESTING DATA SETS 71 6.5 RESULTS OF THE NEAREST NEIGHBOURS ANALYSIS 72 6.6 6.7 6.5.1 Groups of equal size 72 6.5.2 Groups of unequal size 75 6.5.3 The effect of dimensionality 76 6.5.4 Wet and dry-avalanche classification COMPARISON WITH CORNICE MODEL 6.6.1 Rationale 6.6.2 Differences in mode of operation.'..:. 6.6.3 Comparison of results CONCLUSIONS C H A P T E R 7. 7.1 CONCLUSIONS RECOMMENDATIONS FOR FURTHER WORK - TIME PATTERNS OF PREDICTOR VARIABLES 78 79 79 79 81 81 83 85 7.1.1 Motivation 85 7.1.2 Proposed method 7.1.3 Deconvolution 7.1.4 Determining the desired output 7.1.5 Testing the method 7.1.6 Selecting the nearest neighbours 89 7.1.7 Limitations with the method. 90 7.1.8 Conclusions 85 86 88 88 92 BIBLIOGRAPHY 94 APPENDIX A: A V A L A N C H E P A T H S A L O N G B E A R PASS 97 APPENDIX B: S O F T W A R E A N D P R O G R A M M I N G 101 APPENDIX C: M E T A D A T A F O R T H E DIGITAL E L E V A T I O N M O D E L O F B E A R PASS 101 vii List of Tables Table 1. Variables recorded at the Endgoal manual weather observation site. 21 Table 2. Canadian avalanche size classification scheme with typical factors 26 Table 3. Snow-climate zones for the United States based on 15 seasons of data. 30 Table 4. Group means, F-ratio and p-values from the ANOVA shown for each variable. 46 Table 5. CDA results using all variables for discrimination between avalanche and non-avalanche periods.. 57 Table 6. Total correlation matrix for the variables used in the CDA for separation into avalanche and nonavalanche groups 53 Table 7. CDA results for the best variable group for discrimination between avalanche and non-avalanche periods 54 Table 8. Classification matrix (using the test sample) for discrimination into avalanche and non-avalanche groups for both individual areas andfor the whole pass 58 Table 9. Classification matrix for the testing data set for the'direct three-group classification into wet / dry / non-avalanche groups 61 Table 10. Results for the indirect three-group classification scheme. 62 Table 11. Optimum variable sets for wet/dry, wet/ non and dry / non discriminations. 63 Table 12. Comparison of classification rates between nearest neighbours models 81 Table 13. List of the avalanche paths that comprise each of the individual areas along Bear Pass viii 97 List of Figures Figure 1. Map ofBritish Columbia showing Bear Pass, Kootenay Pass and Rogers Pass avalanche areas.. 2 Figure 2. Digital elevation model (DEM) of the main section of Bear Pass showing the avalanche risk based on the number of avalanches that impacted the highway for each path between 1982 and 2002 4 Figure 3. Average snowpack depth at the Endgoal /Little Entrance weather observation site for 20 seasons (November - April) between 1982 and 2002 32 Figure 4. Average air temperature measured at the Endgoal / Little Entrance weather observation site for 20 seasons (November — April) between 1982 and 2002 34 Figure 6. Probability plot for precipitation type for three levels ofAAL 36 Figure 8. Probability plot for maximum temperature for three levels ofAAL 36 Figure 9. Probability plot for present temperature for three levels ofAAL 36 Figure 10. Probability plot for minimum temperature for three levels ofAAL 37 Figure 11. Probability plot for relative humidity for three levels ofAAL 37 Figure 12. Probability plot for the new snow boardfor three levels ofAAL 37 Figure 13. Probability plotfor the interval snow boardfor three levels ofAAL 37 Figure 14. Probability plot for the storm snow boardfor three levels ofAAL 38 Figure 15. Probability plot for the snow surface condition for three levels ofAAL 38 Figure 16. Probability plot for the snowpack depth for three levels ofAAL 38 Figure 17. Probability plotfor the water-equivalent new precipitation for three levels ofAAL 38 Figure 18. Probability plot foot penetration for three levels ofAAL 39 Figure 19. Probability plot for wind speedfor three levels ofAAL 39 Figure 20. Probability plot for wind direction for three levels ofAAL 39 Figure 21. Avalanche probability vs. avalanche occurrences for 1-Jan to 31-Mar 1995 57 Figure 22. Canonical scores plotfor the direct wet / dry / non-avalanche discrimination 60 Figure 23. Schematic representation of the nearest neighbours to a time period in 3D space. ix 68 Figure 24. The effect ofvarying the threshold number ofneighbours, k for groups of even size. 74 Figure 25. The effect of varying the number ofneighbours, k on the classification rate. 75 T Figure 26. The effect of varying the threshold number of neighbours, k for groups of uneven size. 76 T Figure 27. The effect of dimensionality on the classification rate for the avalanche / non-avalanche group classification. 78 Figure 28. Time series ofpresent temperature data for 2000from the Disraeli remote weather station. 86 Figure 29. Deconvolved output for the original time series deconvolved by the optimum inversefilterof the wavelet. 89 Figure 30. Deconvolved output for the original time series deconvolved by the optimum inversefilterfor a wavelet defined in a low-amplitude portion of the time series 91 Figure 31. Map ofBear Pass showing the location of all avalanche paths 100 X Acknowledgements This research was supported by the Natural Sciences and Engineering Research Council of Canada and Canadian Mountain Holidays Chair in Snow and Avalanche Science at the University of British Columbia, Canada. Thanks are extended to the Technicians of the Ministry of Transportation who assisted in collecting and collating the database used in this project. Doug Wilson of the Ministry of Transportation is especially thanked for imparting his extensive knowledge of the Bear Pass highway operation and for identifying the individual areas within the pass. Ross Purves, Keith Morrison, Graham Moss and Blythe Wright are credited with producing the Cornice nearest neighbours model, which is an excellent piece of software used for avalanche forecasting in Scotland. It is compared here to a similar model presented in this study. Ross is especially thanked for his support in ensuring that the Bear Pass data could be loaded successfully into the model. Acknowledgement and gratitude is extended to Dr D. McClung, for providing thoughtful discussions on many aspects of the avalanche-forecasting problem. On a personal level, my thanks go out to the friends who kept me sane during the writing of this thesis and in particular to Verena Blasy, whose energy and spirit is a constant inspiration. xi Notes o n the inclusion of extracts from previous w o r k Some sections in this study have been presented previously in Floyer and McClung (2003, in press). In all cases, the script has been edited to ensure consistency of style within this manuscript. I confirm that I was the principle author of this paper and that the publishers of this work (Elsevier B.V., Co Clare, Ireland) have granted permission for reproduction of the material in this study. Much of Chapter 5, Discriminant analysis is taken from Floyer and McClung (2003, in press), although additional information regarding classification into wet and dry-avalanche periods, as well as an expansion on the mathematics behind the analysis are presented in this study. Other sections that contain extensive extracts from this paper are: Section 1.4, Individual areas within The Pass and Section 4.4, Analysis of variance. Limited extracts from this paper are contained in Chapter 3, Preparation of the database, although the methods used for processing the data are described in far greater detail in this study. xii C H A P T E R 1. INTRODUCTION 1.1 Geographic characteristics Bear Pass is located on Highway 37A between Meziadin Junction and Stewart on the west coast of northern British Columbia, Canada (see Figure 1). This highway serves the small coastal communities of Stewart, British Columbia and Hyder, Alaska; the only other access to these communities is by boat. As a result, members of these communities rely heavily on this highway for access to provisions and services. However, due to a combination of steepsided mountains and high annual snowfall, Bear Pass contains some of the most extreme avalanche terrain in the region, which poses a serious potential hazard to travellers on the highway during winter months. The area experiences a wet climate with temperatures moderated by proximity to the Pacific Ocean. This climate leads to high snowfall rates, which average 8.5 m per winter season. The highway is at a relatively low elevation, with the highest point along The Pass being at 450 m above sea level. To manage the avalanche hazard, the Ministry of Transportation maintains an active avalanche program, based out of Stewart. A combination of road defences, control work through the use of explosives and road closures ensure public safety during periods of high avalanche risk. Avalanche risk assessment is primarily done using conventional avalanche forecasting techniques (LaChapelle, 1980). If the risk is assessed to be too high for public use and control work is not possible, then the highway will be closed to traffic. Road closures, although safe, are unpopular. As a result, forecasters are continually looking for ways to 1 make a more accurate forecast, which allows them to minimise the amount of road closures while maintaining an acceptable level of risk to the travelling public. Numerical prediction models may be able to assist with this by increasing the reliability of avalanche forecasts. A scheme that is suitable for avalanche forecasting at Bear Pass is presented in this study. Figure 1. Map of British Columbia showing Bear Pass, Kootenay Pass and Rogers Pass avalanche areas. 1.2 Terrain characteristics Bear Pass contains dramatic terrain, with elevation differences between the highway and the highest start zones of up to 1900m. Forecasters have identified 78 separate avalanche paths 2 along Bear Pass (see Appendix A). A prominent feature in the centre of The Pass is the Bear River Glacier, which terminates in a lake very close to the highest point along the pass. Most of the avalanche paths lie within the main section of Pass, which stretches about 20 km east and west from this point. This east-west orientation creates a funnelling effect along The Pass that results in winds that tend to blow either from the east or from the west. This funnelling effect means that direct wind loading through deposition of snow on lee slopes is less of an important a factor than for other avalanche prone areas. However, cross loading of slopes which promotes avalanche activity by slab formation does frequently occur and this can be enhanced by the outflow of katabatic winds from the Bear River Glacier (personal communication, Doug Wilson). There are two rivers that flow out from either side of The Pass, the Bear River to the west and Strohn Creek to the west. Both of these rivers widen into broad, previously glaciated valleys with distance away from the central of The Pass. This has the effect that some of the avalanche paths are set well back from the centre of the valley where the road runs. It will require larger magnitude avalanches from these paths to cause a hazard to the highway. 1.3 Avalanche frequency analysis along The Pass In order to assess the frequency aspect of avalanche risk along The Pass, the total number of avalanches that impacted the highway from 1982 to 2002 was calculated for each path along the highway. This information is displayed in Figure 2. This may be interpreted as indicating the number of highway impacting avalanches with a 20-year return period or less for each avalanche path along The Pass. 3 •113-626 Figure 2. Digital elevation model (DEM) of the main section of Bear Pass showing the avalanche risk based on the number of avalanches that impacted the highway for each path between 1982 and 2002. The Little Bears area, which is not shown, lies to the south west of this map. Information on the source of this map is found in Appendix C. The distribution of avalanche activity is not uniform along The Pass. Much of the avalanche activity occurs along only a few avalanche paths. In particular, three short paths in the centre of The Pass on the north side account for about 30% of the avalanche occurrences that impact the highway. Also on the north side of the highway, there is a concentration of high frequency avalanche paths at the eastern limit of The Pass. On the south side, two discrete avalanche paths produce frequent highway-impacting avalanches. The westernmost of these, the George Copper Path, produces avalanches due to serac fall from a hanging glacier. This type of avalanche may release at any time of the year and is not predictable from meteorological observations. 4 Bear Pass is known for its high magnitude avalanches. A large avalanche that sweeps across the highway may be particularly hazardous to a vehicle that happened to be in the path of the oncoming avalanche. However, the likelihood of a vehicle being hit directly by such an avalanche is far less than that of a vehicle running into the resulting deposit. For motorists, even a small amount of snow deposited on the highway from an avalanche can be hazardous, causing vehicles lose control and skid. Avalanche risk is principally determined by the frequency of avalanche events rather than by their magnitude because frequency can vary over several orders of magnitude. 1.4 Individual areas within The Pass The response of the most active avalanche paths along The Pass to changing meteorological conditions is well known. More useful for the forecaster is information about the less frequent avalanches that can be much harder to predict. During certain conditions, therefore, it might be advantageous to group avalanche paths in such a way that excludes certain high activity avalanche paths from the analysis. In this way, the forecast will not be biased on an abundance of information from only a few avalanche paths. Another reason for dividing The Pass up into individual areas is to group avalanche paths that share common features and are therefore likely to produce avalanches during a similar set of conditions. Four distinct areas within The Pass were identified on the basis of having avalanche paths of similar aspect, size and start zone elevation. Information was gleaned from the Ministry of Transportation database, including the avalanche atlas for Bear Pass (Province of British Columbia, 1982) and from personal communication with Ministry of Transportation 5 avalanche forecasters. The paths that comprise each of the individual areas are listed in Appendix A. The Little Bears area is the area closest to Stewart and contains five east-facing avalanche paths on a portion of the highway that runs north-south. These paths have vertical falls (the distance between the starting zone and the runout zone) of 1400-1900 m. They are characterised by broken slopes with shallow gullies and some immature vegetation is present in the paths. Return periods for avalanches which affect the highway range from 3-10 years. The Summit S luffs is a small area of short, low elevation avalanche paths located approximately in the middle of The Pass. This area faces south and is characterised by frequent lower magnitude avalanche occurrences. Vertical falls range from 160-205 m. The slopes forming these paths are open and of uniform aspect and angle. Since the highway crosses the lower part of these paths, several avalanches each year affect the highway from each of these paths. The South Aspect area contains the remainder of the avalanche paths on the north side of the highway that face south. Most of these paths are large with vertical falls ranging from 9001660 m. However, there are some smaller paths towards the west of The Pass. The return periods for these paths are much greater, which distinguishes them from the Summit Sluffs paths. Some of the larger paths are capable of producing large avalanches. Such avalanches occur infrequently, with return periods of up to 20 years. The smaller paths are vegetated in their runout zones and produce small 'tongues' of debris that flow onto the highway approximately once every five years. The North Aspect area contains the comparatively large avalanche paths on the south side of the highway. Vertical falls range from 980-1900 m and the paths are characterised by 6 complex, rocky terrain. Most of these avalanche paths have runout zones that are a significant distance from the highway, in some cases up to 1 km. This means that only large avalanches running down these paths will impact the highway. 7 C H A P T E R 2. REVIEW O F A V A L A N C H E PREDICTION M O D E L S 2.1 Introduction Conventional forecasting techniques as described by LaChapelle (1980) and McClung and Schaerer (1993, p. 164) may be complemented by the use of computer models in an avalanche forecasting operation (Fohn, 1998). Computer models provide a useful tool to verify a forecaster's opinion about the prevailing avalanche conditions. In addition, they allow the forecaster to 'test out' possible scenarios by assessing the impact of changes to one or more of the present weather variables. Two statistical models for forecasting avalanche hazard are considered in this study. These are: discriminant analysis and nearest neighbours analysis. The discriminant analysis yields more accurate classification rates than the method of the nearest neighbours. Thus, this method might be preferred for avalanche forecasting. However, in practice, discriminant analysis modes have not been widely adopted for day-to-day avalanche forecasting. Fohn (1998) argues that, "For avalanche forecasters „avalanche day"-probabilities are difficult to interpret in terms of avalanche danger." The output probability gives no indications as to the type, size or the location of avalanche activity that is likely. The method of the nearest neighbours addresses some of these issues by providing direct information about the meteorological conditions and avalanche occurrences for the time periods that most closely relate to conditions of the present. This allows forecasters to incorporate historical information, which they may otherwise have been unable to recall, into their own assessment of the avalanche hazard for the time period being forecasted. Nearest 8 neighbour models have proved to be popular tools to aid forecasters in making decisions about the avalanche hazard (Buser, 1989). Nearest neighbour models are currently used, amongst other locations, for regional-scale avalanche forecasting in Switzerland (Brabec and Meister, 2001) and for local-scale forecasting in Parsenn, Switzerland (Gassner et al, 2000), Snowbasin, Utah, U S A (Gassner et al, 2000) and in 5 areas in Scotland (Purves et al, 2002). 2.2 Determination of the most significant meteorological factors Atwater (1954) proposed a set of 10 contributory factors for avalanche hazard evaluation based on a decade of observations in the U.S.. Listed amongst these are: new-snow fall, water-equivalent new precipitation, type and intensity of new snow, wind action and temperature. Atwater also notes that there must be a reasonable base of old snow, "sufficient to cover ground obstructions so that it becomes easier for new snow to slide over them". Atwater's discussion of the character of the old snow, however, does not fully reflect the complexity of the natural system. He considers that the character of the old snow surface plays a conflicting role: "a loose snow surface promotes good cohesion with a fresh fall but allows deeper penetration of any avalanche that starts, while a crusted or windpacked surface means poor cohesion with the new snow but restricts the avalanche to the new layer." We now understand that surfaces that offer poor cohesion may result in weak layers that may be buried by subsequent falls of new snow. These weak layers may persist in a snowpack for some time, especially in areas which have a continental snow climate characterised by cold temperatures and low density snow (e.g. Jamieson et al, 2001). Perla (1970) performed a univariate analysis on data from Alta, Utah, U S A using Atwater's contributory factors as a template for his investigation. He considered 20 years of storm plots 9 from 1950 to 1969. By considering only data from storm cycles, when the risk of avalanching was high, avalanche probability values reported by Perla for the different variables are higher than if all the observation periods had been included in the analysis. Perla concluded that "avalanche hazard varies considerably with precipitation and wind direction, only slightly with temperature change, and seems to have no definite relationship to wind speed and snow settlement." This appears to indicate that as long as wind is present and blowing in a direction that is conducive to slab formation on avalanche-prone slopes, then how hard the wind is blowing does not affect the avalanche hazard in any way. It should be noted that all of Perla's slopes were of a similar, southern aspect. McClung and Tweedy (1993) presented a univariate study on factors important to numerical avalanche forecasting at Kootenay Pass, British Columbia, Canada. Their results were consistent with Perla's findings, with the exception that in their case, wind direction was also found to be a significant variable. In addition, McClung and Tweedy listed new-snow normal stress and new-snow density as significant indicators of avalanche conditions, variables that were not included in Atwater or Perla's studies. The new-snow normal stress is defined as the weight per unit area of new snow as measured on the new-snow board. The variance of this variable can be fully explained by considering the product of the depth of new snow and the density of the new snow. Information relating to the density of new snow is also included in the water-equivalent new precipitation, although this variable is able to measure rain events as well as snowfall. Therefore, it appears as though there are redundant variables in McClung and Tweedy's analysis, although in their concluding remarks they warn "multivariate analysis will probably reveal intercorrelations between precipitation variables and snow variables." (McClung and Tweedy, 1993, p 322.) 10 Despite the dangers of performing a univariate analysis on a group of highly correlated variables, this approach gives a good starting point for deciding which variables are the most suitable for inclusion in a numerical avalanche model. The philosophy can be summed up as: if it isn't significant in a univariate analysis, then it won't be significant in a multivariate analysis. Hence, this approach acts as a primary selection of variables that will then be used for further analysis in a multivariate sense. 2.3 The Discriminant Analysis use of multivariate discriminant analysis for forecasting avalanches from meteorological data was first presented by Judson and Erickson (1973) and has been applied (amongst others) to studies in the San Juan Mountains, U S A (Bovis, 1977), the Parsenn area, Switzerland (Obled and Good, 1980) and Kootenay Pass, Canada (McClung and Tweedy, 1994). The basic method employed involves deriving discriminant functions from a database of past meteorological and avalanche observations which are subsequently applied to a new time period to classify it as an avalanche or a non-avalanche period. In all cases, the output is a probability that the case in question belongs to a particular avalanche or non-avalanche group. 2.3.1 Discrimination between wet and dry-avalanche periods Some studies attempt to distinguish between wet-snow and dry-snow avalanches. Bovis (1977) made this distinction based on defining a wet slide season and a dry slide season. Different functions were constructed for each. A clear limitation of this approach is that the region may revert back into dry-avalanche conditions even after the wet slide season has begun, although Bovis does allow that a dry slide function could be used in the wet slide 11 season. Bovis' analysis was also performed for each individual season, rather than performing one analysis on all the seasons' data. This resulted in discriminant functions built using one season's data that did not apply well to another. Another consequence of this approach was that results for each season's forecast were based on a very small sample size, as low as n= 10 in some instances. McClung and Tweedy (1994) determine whether the wet or dry discriminant function should be used for each time period using a moisture index. This is calculated from data about the moisture content of the snow from the avalanche occurrences for that period. In their method, a primary discrimination is done using a separate function to designate the forecast period as either a wet-avalanche or dry-avalanche period. Depending on the outcome of the first discrimination, a second discrimination distinguishes between either wet and non-avalanche groups or dry and non-avalanche groups. Their method also allows the function to be biased by an expert's opinion using Bayesian probabilities. In this study, it is argued later that the snow-climate of Bear Pass is not conducive to discriminating between wet and dry-avalanche periods. Although making a wet / dry distinction is considered in this study, it is rejected in favour of the simpler avalanche / nonavalanche designation. 2.3.2 Linear V a r i a b l e selection in d i s c r i m i n a n t discriminant analysis reduces analysis a multivariate system to a one-dimensional discrimination by a method that maximises the principal axis in terms of groups separation (e.g. Manly, 1994, p. 108). As a result, this method is fairly robust to the problem of dimensionality. Nevertheless, several authors have presented ways to select the optimum 12 variable set for discriminant analysis. Judson and Erikson (1973) performed a univariate analysis similar to that of Perla (1970) to determine the variable set for the discriminant analysis. McClung and Tweedy selected variables for their discriminant analysis based on univariate F-ratios. The F-ratio is the ratio between the mean square (a measure of between group variance) and the mean square error (a measure of within group variance). Bovis (1977) used a stepwise discriminant analysis that adds variables to the group based on the Fratio until the point where addition of further variables no longer improves the discriminant function. However, stepwise discriminant analysis has been more recently criticised (Whitaker, 1997) for the tendency to calculate "horrendously wrong degrees of freedom", which results in inaccurate values for the F-ratios. This leads to the tendency to select different variable sets based on which variable was first selected into the analysis. Floyer and McClung (2003, in press) presented an arguably superior method for selecting variables for a discriminant analysis model by analysing the standardised canonical coefficients as well as including information from univariate F-ratios to decide on an optimum variable set. This so-called canonical discriminant analysis allows the contribution of the individual variables to the discriminant function to be assessed. A principal argument of this study is that rather than selecting a variable set for a discriminant analysis, the discriminant analysis itself may be used to select the optimum variable set. This variable set is then used in a nearest neighbours analysis, which is shown in this study to be sensitive to the choice of input variables. The method of variable selection adopted in this study is a combination of univariate analysis and canonical discriminant analysis, similar to that presented by Floyer and McClung. 13 2.4 Cluster Analysis 2.4.1 Buser's nearest neighbours model Forecasting models that make use of cluster techniques are commonly referred to as nearest neighbours models. Buser (1983) developed the first nearest neighbours model for use in avalanche forecasting, which operates by finding the cases in the database that most closely correspond to the conditions of the forecast period. The forecaster is then provided with detailed information about what happened during these similar days in the historical record. This framework has proved to be very popular, and has been used as a basis for a great number of forecasting models since (e.g. Buser, 1987; Kristensen and Larsson, 1994; McClung and Tweedy, 1994 and other recent model listed previously in Section 2.3). Buser employed an Euclidean distance metric for deteimining the distances to the neighbours, and selected the variables for his study based on a previous study by Obled and Good (1980). However, three criticisms can be levied at the method described by Buser. Firstly, the choice of the Euclidean distance metric may not be optimal when dealing with inter-correlated variables. Secondly, insufficient consideration was given to the choice of the number of neighbours as well as the threshold value of the number of neighbours required to be associated with avalanches for an avalanche period to be predicted. (Buser refers to this term as the discrimination value. It is argued here, that this nomenclature may be confused for either the classification value for the discriminant analysis or the canonical values of the discriminant functions and thus the term threshold value is used in this study.) Third, the variables were chosen on an ad-hoc basis. 14 Buser's model, called NXD, has been subsequently revised (Buser et al, 1987 and Gassner et al, 2000). The third version of the model, NXD3, included additional variables, such as "the weighted sum of new snow on the preceding three days", "the amount of new snow of the previous day" and the "wind direction and velocity of the previous day" (Buser et al, 1987, p 560). The rationale for the inclusion of these "elaborate variables" (Gassner et al, 2000, p 53) is that it allows memory effects to be included in the model. Buser reports an increase in the prediction rate for avalanche days as a result of including these variables, however notes a corresponding decrease in the prediction rate for non-avalanche days. This appears to have simply shifted the bias towards predicting avalanche rather than non-avalanche days rather than leading to a genuine improvement in the classification rate, which lends further weight to the need for a thorough study on the effect of varying the number and threshold value of the neighbours. 2.4.2 M o d e l s using the Euclidean distance metric When using the Euclidean distance metric, consideration needs to be given to the weighting and the scaling of the variables included in the model. In it's latest version, NXD2000 requires variable weights to be set by an expert forecaster with knowledge of the local area (Gassner et al, 2000). Although this introduces expert knowledge into the model, it is not necessarily mathematically optimised and also leads to the possibility of over-fitting the model. This may occur since adjustments that a human forecaster will make to variable weights will likely be as a result of events that happened most recently in the past. Biasing the model based on this most recent information may not lead to successful generalisation of the model for unknown future situations. 15 A better approach to the problem of variable selection is to mathematically find an optimum set of variables. Purves et al (2002) has developed a model that uses a genetic learning algorithm to search for the optimum set of variable weights using the classification rate as the metric for success. Although the variable weights are not set according to any particular physical rationale, the model is optimised according to the classification rate, which is, in itself a justifiable rational on which to base variable weights. In addition, the full use of the historical database is considered, making the model robust to generalisation for seasons of slightly different character. 2.4.3 M o d e l s using the Mahalanobis distance metric McClung and Tweedy (1994) argue that the Mahalanobis distance metric is a superior choice over the Euclidean metric. They claim that this distance metric "retains the assumption that the variables are correlated and distances are calculated in discriminant space in contrast to the implicit assumption of no correlation between variables if a Euclidean metric is chosen." (McClung and Tweedy, 1994, p 355.) On this basis, the Mahalanobis distance metric is also adopted for this study. A consequence of this is that variable weights need not be set. In fact, setting variable weights will not change the outcome of the model at all. (See Section 6.5.3 for a further discussion on this topic.) Therefore, we need a different method for selecting the variables that are to be included in the nearest neighbours analysis. As stated previously, the method adopted by this study will use the best variable group determined from a combination of the univariate analysis and canonical discriminant analysis. 16 2.5 Summary and conclusions The univariate analysis is felt to offer a good starting point for deciding which variables offer good discrimination between avalanche and non-avalanche periods. However, if a univariate analysis alone is used, redundancy will result when dealing with inter-correlated variables. By extending the variable selection process into the multivariate domain using canonical discriminant analysis, a more robust method for selecting variables is presented and used in this study. Variable selection is crucial to the performance of the nearest neighbours model, especially in view of the choice of the Mahalanobis distance metric. This study will explore both the benefits and limitations of using this distance metric. Furthermore, in order to understand how to optimise the performance of a nearest neighbours model, there is a need for a thorough study into the effects of the following criteria for nearest neighbours models: a) The effect of dimensionality, i.e. how many variables may be included in the analysis before model performance starts to drop. b) The effect of varying the number of neighbours considered by the model. c) The effect of varying the threshold number of neighbours that represents the number of neighbours that need to be associated with avalanche conditions for that period to be forecasted as an avalanche period. Finally, there is need for a comparison study between different nearest neighbour models using different distance metrics. In this study, a comparison between the Mahalanobis-metric nearest neighbours model presented in this study and the Euclidean-metric nearest 17 neighbours model, Cornice, used for avalanche prediction in Scotland aims to address this present shortfall. 18 C H A P T E R 3. PREPARATION O F T H E D A T A B A S E 3.1 Introduction The data used in this study consisted of manually recorded weather and snowpack measurements from November 1982 to April 2002. The manual weather station in The Pass was relocated in 1994, so the data set is a combination data set from two different locations. However, the old and the new stations were run concurrently for a period of two years to ensure that the recorded values for the new station were consistent with those of the old. In addition to the manual weather station, there are three remote weather stations that record maximum, minimum and present temperature variables as well as wind speed and wind direction. The Windy high and Windy Low recording sites have been in position since 1986 and the Disraeli recording site has been in position since 1991. A n analysis was performed to test the suitability of the data recorded by these remote weather stations in a discriminant analysis, however, the data were found not to be suitable. The reason is that a large number of records with errors are retained in the database. For example, if an anemometer becomes iced up with rime, the wind speed will be recorded as 0 km h" until the rime either melts or 1 the station is visited by a maintenance crew. In either way, it may be a number of days before the problem is remedied and in the meantime, erroneous recordings are introduced into the database. Recent data quality (since 2000) was found to be superior for the remote weather station, possibly as a result of upgrades to the instrumentation. However, for both the discriminant analysis and the nearest neighbours analysis, a long historical database is desired. This makes the models more robust to seasonal changes to weather and avalanche observations. A two-year record is considered to be insufficient to allow suitable model 19 generalisation for forecasting unknown future events. A possible technique fort the future, permitting the use of hourly remote weather data involves wavelet analysis of time series data. This is discussed in Section 7.1. 3.2 Weather and snowpack data A list of the variables that are recorded at the Endgoal manual weather station is shown in Table 1. It should be noted that foot penetration was used in place of ram penetration for this study. Ram penetration is arguably a more scientific measure of the surface penetrability of the snow, since a penetrometer with a consistent weight and drop height is used for each measurement. Foot penetration is measured as the depth of the forecaster's footprint with all the forecaster's weight applied statically to one foot. Foot penetration varies between forecasters due to differences in foot size and body weight. However, ram penetration has only been recorded at Bear Pass from 1996 through 2002. For the time periods where both variables were recorded, the Pearson correlation coefficient between ram penetration and foot penetration was found to be 0.82. Therefore the two are highly correlated and we accept that foot penetration may be used as a surrogate for ram penetration. 20 Table 1. Variables recorded at the Endgoal manual weather observation site. Variable Units precipitation type categorical snowfall rate cm h"' maximum temperature °C present temperature °C minimum temperature °C relative humidity % new snow board cm interval snow board cm storm snow board cm snow surface condition categorical snowpack depth cm water equivalent new precipitation mm foot penetration cm wind speed categorical wind direction categorical Memory effects were introduces into the model using lagged temperature variables. In this study, these lagged variables are simply the maximum, minimum and present temperature values for two 12-hour periods prior to the period of the avalanche observation. In addition, a present temperature trend variable was introduced, defined as the difference between the present temperature and the present temperature lagged by two 12-hour time periods. The database was thoroughly screened for incomplete records. These records were discarded from the analysis. Records have been kept at Bear Pass since 1976. However, significant data gaps were found in the observation record from the period 1976 to 1982. In addition, the average number of avalanches recorded since 1982 is substantially higher than the number recorded prior to 1982. It is likely that this difference is due not to the fact that there were less avalanche occurrences prior to 1982 but rather that the diligence in recording avalanche 21 occurrences has increased subsequent to this time. Thus, data prior to November 1982 were discarded from the database. Following Bovis (1976) and McClung and Tweedy (1994), the transformation X —> \n{X +1) was applied to the precipitation and snow board variables to counter a t t positive skewness in the distribution of these variables. The transformed variables show a very nearly Gaussian distribution with the exception of the snow boards which have a spike at zero, indicating a large number of days with no snow observed. To counter this problem, data records were eliminated where the total snowpack depth was zero, since, even though starting zones might contain some snow, avalanches would be unlikely to occur on such days. A problem with this action is that it also excludes days when no snow is recorded at the weather station but avalanche conditions were present on the higher slopes. Thus the prediction capability of the models will be limited for avalanches that occur during the very beginning or at the end of the season. However, with the zero snowpack depth days removed, the overall prediction rate increased by about 2%. This overall increase in model performance justifies the removal of these records, despite leading to a possible reduction in model performance at the margins of the avalanche season. Time periods of 12 hours were used for this study, since, for most days in the database, there is a morning and an afternoon weather observation (i.e. two records per day). More recently however, only the morning weather observations were available. Where only morning observations were recorded, these 24-hour records were used simply as if they had been recorded over a 12-hour period and no attempt was made to artificially convert them to 12hour periods (for example by dividing the new precipitation by 2). Thus with this assumption, the new record still indicates the change in conditions since the last weather 22 reading was taken. This is potentially a major limitation with the models, especially since it is likely that weather observations in The Pass will continue to be taken only in the morning and in theftiturethey may be phased out entirely in favour of automated systems. 3.3 Converting categorical data into numerical data The four variables recorded in the database in categorical form were required to be converted into numerical form in order for them to be included in the numerical analysis. Each of these variables was treated in a different way. 3.3.1 Wind speed Wind speed is recorded in categorical form as either calm, light, moderate, strong or extreme. Physical interpretation of each of these categories is simple, since each represents a range of numerical values. According to the observation guidelines and recording standards for weather, snowpack and avalanches (CAA, 1995), these categories should correspond to wind speeds of 0 km h" , 1-25 km h" , 26-40 km h" , 41-60 km h" and >60 km h" respectively. By 1 1 1 1 1 assuming that an observer will tend to assign a qualitative category based on a Gaussian distribution around the mean quantitative value of each category, this variable may be converted to numerical form by ascribing this mean value to each category. Values assigned to each category were: calm = 0; light = 13; moderate = 33; strong = 50; extreme = 70. 3.3.2 W i n d direction Wind direction is a difficult variable to deal with due to its radial distribution. It is a simple operation to assign each category (i.e. N , N E , E , etc..) the mean wind direction value in degrees for that category (i.e. 0°, 45°, 90°, e t c . ) . However, this operation leaves us with the 23 problem that although the north and north west directions represent fairly similar directions, their numerical values are totally different (0° and 270° respectively). The problem of this 'data jump' may be alleviated by applying a transformation using either sine or cosine functions. This, however, leads to a new problem, that of duplication. For example, if the transformation 1 + cos(#) is applied, where 0 is the wind direction in radians, then the transformed value for west is the same as the transformed value for east (both take the value of 1). North and south values, however, are independent. Similarly, the transformation 1 + sin(0) leads to duplication of north and south but retains independence for east and west values. It is possible to run the analysis using both the sine and cosine transformations, since the information contained within these two transformations is mutually independent. Another approach is to analyse the distribution of wind directions and choose a transformation that minimises the consequence of duplicating certain wind directions. Chelani et al (2002) suggest a transformation function of the form 1 + sin(# + <j>), where (f> takes the value orthogonal to the wind direction with the greatest degree of variance between itself and its diametrical opposite (in their case <j> = /r/4 ). The most common wind direction recorded at Bear Pass was easterly, with the second most common wind direction being westerly. This distribution reflects the east-west orientation of The Pass, which creates a wind funnelling effect. Therefore we require <j> to be orthogonal to this, with result that ^ = 0. For Bear Pass, therefore, the best transform to use for the wind direction data is 1 + sin(#). 24 3.3.3 Precipitation type There is no physical-based numerical interpretation in the case of the precipitation type. Therefore, it was necessary to use a rank-based transformation scheme for converting the categories to numerical values. The parameter for determining the rank of each category was chosen as the avalanche probability. This was calculated as the fraction of time periods that resulted in avalanche conditions for all cases where that category was observed (see Section 4.3 and Figure 6). The values that were assigned were: no precipitation = 0; rain = 1; snow = 2; freezing rain = 3; mixed rain and snow = 4. 3.3.4 S n o w surface condition Values were assigned to the different classes of surface snow condition using a similar method described for the precipitation type above. The values that were assigned were: surface hoar = 1; old snow = 2; ice crust = 3; other = 4; new snow = 5; wind crust = 6; slush = 7. 3.4 Avalanche occurrence data Each recorded avalanche occurrence was assigned to one of the 12-hour weather observation time periods according to the date and time of the occurrence. Following McClung and Tweedy (1994), occurrences falling between OOOOh midnight and 1200h midday were grouped with the morning weather observations; the remainder were grouped with afternoon records. Out of 3200 weather observations, approximately 20% were found to have avalanche occurrences associated with them. 25 3.4.1 C a n a d i a n avalanche size classification The scheme Canadian avalanche size-classification scheme was used throughout this study for describing the size of avalanche occurrences. This scheme is an absolute scale, which allows comparisons of avalanche size in terms of their potential destructiveness to be made between different avalanche paths. Table 2 lists the Canadian size-classification scheme. In addition to the classes listed, half-sizes, from 1.5 to 4.5 are recorded in the database. Table 2. Canadian avalanche size classification scheme with typical factors. From McClung and Schaerer (1981) and CAA (1995). Size Avalanche destructive potential Typical mass Typical path length Typical impact pressures 1 Relatively harmless to people. <10t 10m 1 kPa 2 Could bury, injure or kill a person. 10 t 100 m lOkPa 3 Could bury and destroy a car, damage a truck, destroy a small building or break a few trees. 10 t 1000 m 100 kPa 4 Could destroy a railway car, large truck, several buildings, or a forest up to 4 hectares. 10 t 2000 m 500 kPa 5 Largest snow avalanche known. Could destroy a village or a forest of 40 hectares. 10 t 3000 m 1000 3.4.2 2 3 4 s kPa A v a l a n c h e activity index Following McClung and Tweedy (1993), an avalanche activity index (AAI) was defined as the sum of avalanche sizes within each time period. Defined in this way, the AAI is an index of both magnitude and frequency of avalanching. 3.4.3 The Moisture index moisture content of each avalanche is recorded in the database as either: 1, dry; 2, moist; 3, wet. In order to be able to distinguish between periods characterised by wet-avalanches 26 and periods characterised by dry-avalanches, a moisture index was calculated for each time period. Following McClung and Tweedy (1994), this was defined as the average value of moisture content for all avalanches recorded in a time period. Periods with a moisture index of <1.5 were categorised as dry-avalanche periods; periods with a moisture index >1.5 were categorised as wet-avalanche periods. 3.4.4 Filtering of a v a l a n c h e s b y size Avalanche occurrences were filtered by size for each individual avalanche path. For each path, a minimum size of avalanche that was considered hazardous to the highway was set using the forecasters' expert knowledge of the paths affecting The Pass. Filtering by size in this way does not necessarily improve the numerical forecasting prediction rates. However, it provides a scheme compatible with the forecaster's prediction objectives in that the forecasted avalanche periods will be those likely to produce avalanches of sufficient magnitude to impact the highway. The threshold sizes for avalanches to be included in the analysis for each path are given in Appendix A . 3.4.5 Filtering of avalanches by trigger type Avalanche occurrences were also filtered according to the avalanche trigger type. After much consideration, a scheme was used that included all occurrences from the Summit Sluffs area and only naturally triggered avalanches for all other paths. The justification for this is that control work on the Summit Sluffs paths is generally conducted either after paths have already produced avalanches or when avalanches on the paths are considered imminent. In either case, the control work is conducted using roadside case charges that may be detonated in most weather conditions. Explosive control work in the remainder of The Pass is done 27 either from an artillery position or using helicopter control missions. These operations are often carried out during weather periods that would not normally produce natural avalanches. This is especially true for helicopter control missions where clear weather is preferred for flights. If the controlled avalanches were included in the record, they would skew the model toward predicting avalanches on fine weather days. 28 C H A P T E R 4. UNIVARIATE ANALYSIS O F VARIABLES 4.1 Introduction The first aspect considered in this section of the analysis was the overall snow climate of the region. This was done by analysing mean annual values for three variables: snow depth, temperature and new-snow density. It is argued that although it is not a clear-cut case, Bear Pass may be best characterised as a maritime snow-climate according to the snow-climate classification described by Armstrong and Armstrong (1987). Following studies by Atwater (1954), Perla (1970) and McClung and Tweedy (1993), an analysis of individual meteorological and snowpack variables from Bear Pass was undertaken. A n analysis of this type is an important starting point for determining which factors are the most useful for avalanche forecasting in Bear Pass. This analysis also aided in assigning numerical values to two of the categorical variables in Section 3.3. The results were found to show many similarities with previous studies. Notable exceptions are the effects of wind speed and wind direction, which appear not to be important factors for Bear Pass. 4.2 Snow-climate of Bear Pass Armstrong and Armstrong (1987) classified snow-climates into three categories according to average air temperatures, average snow depth and average new-snow density measurements. These categories are: maritime, transitional and continental. Table 3 shows the values reported by Armstrong and Armstrong (1987) for a mix of recording stations for 15 years of records in the U.S.. Based on this classification scheme, the snow-climate for Bear Pass can 29 be classified as maritime, although some aspects of a transitional snow-climate are observed. A discussion of the variables use to assess the snow climate is presented below. Table 3. Snow-climate zones for the United States based on 15 seasons of data. A season is November-April. (After Armstrong and Armstrong, 1987.) Maritime Transitional Continental Average snow depth (cm) 186 170 113 Average air temperature (°C) -1 -5 -7 120 85 70 Average new snow density (kg m") 3 4.2.1 Snowpack depth Bear Pass has winter snowfall rates that average a total of 8.5 m snow per year . The mean 1 average snowpack depth was found to be 172 cm. Comparison with Table 3 indicates that this is closest to the transitional snow-climate type. There is a reasonable degree of variability in the annual average snowpack depth. Figure 3 shows the mean annual snowpack depth for 20 seasons between 1982 and 2002. The season of 1991-92 had the greatest average snowpack depth of 222 cm while the season of 1983-84 had the lowest average snowpack depth of 119 cm. The standard deviation for the sample was calculated as 28 cm. By comparison with Kootenay Pass, this annual snowfall is relatively low. McClung and Tweedy (1993) state values of 12 m and 186 cm respectively for the total annual snow and average annual snowpack depth at Kootenay Pass. Despite this high snowfall, Kootenay Pass is classified as a transitional snow-climate. It may be surmised, therefore, that rather than This assumes a winter season runningfromNovember to April. This value is stated here to be consistent with the snow climate classification scheme described by Armstrong and Armstrong (1987). (See Section 4.2.) If all annual snowfall is included, the total snowfall was found to be 9.7 m snow per year. 1 30 Bear Pass experiencing lower than average snowfall, Kootenay Pass experiences exceptionally high snowfall. This is possibly due to it's higher elevation at 1775 m. 4.2.2 Air temperature The mean average air temperature was found to be -2.6 °C. This is closest to the maritime snow-climate zone. Figure 4 shows that the range of average annual temperatures was between 0.7 °C (1991-92) and -4.8 °C (2001-02). The standard deviation was calculated as 1.7 °C. 4.2.3 Snow density Snow density is not a recorded parameter at Bear Pass. Instead, the average foot penetration was analysed and by comparison with data from Kootenay Pass, where both foot penetration and snow density are both recorded, an average snow density was inferred. The average foot penetration was found to be 28 cm at Bear Pass and 36 cm at Kootenay Pass. It is likely that the lower average foot penetration at Bear Pass stems from a combination of a lower average annual snowfall and a greater snow density, due to higher average air temperatures. With no firm data available, all it is possible to say about the average snow density is that it is likely to be greater than that of Kootenay Pass, which McClung and Tweedy (1993) reported as 92 kg m" . 3 31 250 CO ••a0 0 CD CM CO CO CO CO 00 00 m CO CD 4 00 0 0 CD CO 00 CD in CO CO 00 CD CO CO CD 00 CD 0 0 0 C0 D CD 0 0 CD o CD CD CO CD CO CD 00 00 T— CD CD o CO CO CM CD CD CD CO CO CD CD CM CD CO CD CD CO CD CO LO CD CD 4 CO CD CD CD CD in CO CD t-CD CT) CO CD CT) 00 CT) CD t^CD CD CD CD CD CO CD CO O O o CN CM. O CT) CD CD o CM o o CM o o — t o CM CM o o Season Figure 3. Average snowpack depth at the Endgoal / Little Entrance weather observation site for 20 seasons (November - April) between 1982 and 2002. The overall mean is shown as a dashed line. Season CO 0 0 CD co CD CM CO 0 0 CD _6 .1 0 0 CD IO 0 0 CD 4 CO h- CO CD 0 0 0 0 0 OO 0 CD CD CD CD in 0 0 0 0 CD CD CD CO CD l-i. 0 0 CD 00 0 0 CD O CD CD CD 0 0 CD CD CD CM CD CD CO CD CD CD CD to CD CD CD CD CD O CD CD CD CD CM CD CD CO CD CD CD CD 4 in CD CD CD CD CO CD CD 0 0 CD CD CD CD CD 1^- CD 00 O) CD CD O O O CM CD CD CD x — O o CM O O O CM CM O O CM T— o o CM 1 Figure 4. Average air temperature measured at the Endgoal / Little Entrance weather observation site for 20 seasons (November - April) between 1982 and 2002. The overall mean is shown as a dashed line. 32 4.2.4 Discussion and summary The snow-climate at Bear Pass shows similarities to both the maritime and transitional categories as defined by Armstrong and Armstrong (1987). The average snowpack depth was found to be closest to the transitional snow-climate whereas the average air temperature was found to be closest to the maritime snow-climate. The average snow density lies between that of transitional and maritime. The proximity of Bear Pass to the Pacific Ocean would lead one to expect the area to belong to the maritime snow-climate zone. There are two possible reasons why the snow-climate at Bear Pass is not clearly defined as maritime and has some characteristics of a transitional snow-climate zone. Firstly, the shape of the coastline is complicated in this region, meaning that despite having a coastal port, Stewart, which is the western-most terminus of Bear Pass, is approximately 200 km east of the first land mass that air travelling east across the Pacific would encounter. Thus, it is likely that there is a rain shadow effect that reduces the amount of precipitation that reaches Bear Pass, with much of the precipitation being released over the Ketchikan Peninsula and the Misty Fjords National Monument. Secondly, Armstrong and Armstrong defined the classification scheme using data collected from the U.S. but no stations from Alaska were considered. Hence, all of the data used to generate the classification scheme is from stations well to the south of Bear Pass. The northern location of Bear Pass means that, despite the warming effect of the nearby areas of water, temperatures will be colder than further south and precipitation rates will likely be lower as a result. 33 It is argued that the best classification for the snow-climate of Bear Pass is maritime. This is in view of the consideration of the recorded values described above, the location of Bear Pass near the Pacific Ocean and the fact that the original classification scheme was determined using more southerly recording stations. Bear Pass does show some aspects of a transitional snow-climate, however, which will certainly influence the type and characteristics of avalanche occurrences at Bear Pass. 4.3 Avalanche probability plots Avalanche probability plots can be useful for determining how the avalanche hazard varies for each variable. A probability plot was generated for each of the variables listed in Table 1. In order to generate the plots, each variable was categorised according to the distribution of its values. This distribution was determined by inspection of the scatter plots. A typical scatter plot is show in Figure 5. 100 -, 0 . 50 100 150 200 , 250 300 Snowpack depth (cm) Figure 5. Scatter plot for avalanche activity against snowpack depth. 34 Once the variables had been categorised, the probability of avalanching for each category was determined. This was calculated as the fraction of time periods for which the avalanche activity index (AAI - see Section 3.4.2) was greater or equal to a certain level. Three levels of A A I were considered: AAI>1; AAI>3; AAI>10. The probability plots are shown in Figures 5 to 19. A discussion of the individual variables that includes a discussion on the analysis of variance is found in Section 4.5. 4.4 Analysis of variance In addition to the probability plots, one way analysis of variance (ANOVA) was used to test whether the population means differ between two or more groups (e.g. Manly, 1994, p. 44). The F-ratio is the ratio between the mean square (a measure of between group variance) and the mean square error (a measure of within group variance). The F-ratio will approximate unity for variables that do not discriminate between groups. A n assumption for this test is that variables are normally distributed. Group means for avalanche and non-avalanche groups as well as the F-ratio are shown in Table 4. This table also lists values for the lagged temperature variables and the present temperature trend variable. 35 —.—AAI>1 —.—AAI>1 __x__AAI>3 __x__AAI>3 0.8 ...+... AAI>10 ...+... AAI>10 JQ CO S3 0.6 2 a. <a sz 0.4 o c J2 co > < 0.2 R S FR R/S -15 Precipitation type -10 -5 0 5 10 15 Maximum temperature (°C) Figure 6. Probability plot for precipitation type for three levels of AAI. Data are plotted for the following categories: 0 = no precipitation; R = rain; S = snow; FR =freezingrain; R/S = mixed rain and snow. Figure 8. Probability plot for maximum temperature for three levels of AAI. Data points are the midpoints of the categories used to determine avalanche probability 1 -AAI>1 0.8 -AAI>1 -x--AAI>3 __x__AAI>3 0.8 ..+... AAI>10 ...+... AAI>10 JD n 0.6 2 CL <D sz 0.4 o J? (0 > < 1 0.2 -15 2. Snowfall rate (cm h") 1 Figure 7. Probability plot for snowfall rate for three levels of AAI. -10 -5 0 5 10 15 Present temperature (°C) Figure 9. Probability plot for present temperature for three levels of AAI. Data points are the midpoints of the categories used to determine avalanche probability. 36 Minimum temperature (°C) New snow board (cm) Figure rO. Probability plot for minimum temperature for three levels of AAI. Data points are the midpoints of the categories used to determine avalanche probability. Figure 12. Probability plot for the new snow board for three levels of AAI. Data points are the midpoints of the categories used to determine avalanche probability with an extra category for when the new snow board depth = 0. 1 -AAI>1 -AAI>1 __*__AAI>3 ,0.8 0.8 ...+... AAI>10 JO CD __x__AAI>3 AAI>10 „ -° 0.16 4o 0.4 c _CD CD > < 0.2 + 0 20 40 60 80 100 Relati\© humidity (%) Figure 11. Probability plot for relative humidity for three levels of AAI. TJata points are the midpoints of the categories used to determine the avalanche probability. 10 20 30 40 Interval snow board (cm) Figure 13. Probability plot for the interval snow board for three levels of AAI. Data points are the midpoints of the categories used to determine avalanche probability with an extra category for when the interval snow board depth = 0. 37 Figure 14. Probability plot for the storm snow board for three levels of AAI. JData points are the midpoints of the categories used to determine avalanche probability. . 0.8 Figure 16. Probability plot for the snowpack depth for three levels of AAI. Data points are the midpoints of the categories used to determine avalanche probability with an extra category for when the snowboard depth = 0. AAI>1 — __*__AAl2:3 0.8 ...+... AAI>10 co XI o 0.6 I o 0.4 c sz CJ c > -X--AAI>3 • -+.-- AAI>10 06 CD JO CO < AAI>1 JO CO > 0.2 < SH OS ICE OTH NS WC SL Snow surface condition Figure, 15. Probability plot for the snow surface condition for three levels of AAI. Data are plotted for the following categories: SH = surface hoar; OS = old snow; ICE = ice crusts; OTH = other; NS = new snow; WC = wind crust; SL = slush. 0.2 0 1-10 11-20 21-30 >30 Water-equivalent new precipitation (mm) Figure 17. Probability plot for the water-equivalent new precipitation for three levels of AAI. Data were categorised according to the groups shown on the abscissa. 38 0 10 20 30 40 50 60 70 Foot penetration (cm) Cm N NE E S E S SW W NW Wind direction Figure 18. Probability plot foot penetration for three levels of AAI. Data points are the midpoints of the categories used to determine avalanche probability. Figure 20. Probability plot for wind direction for three levels of AAI. Data were categorised according to the groups shown on the abscissa (Cm = Calm, i.e. wind speed = 0). -AAI>1 0.8 __X__AAI>3 ...+... AAI>10 0.6 co o 0.4 c sz _CD CO > < 0.2 x- x- Calm Light x-Moderate -X Strong Wind speed Figure 19. Probability plot for wind speed for three levels of AAI. Data were categorised according to the groups shown on the abscissa. 39 4.5 Analysis of individual variables 4.5.1 Precipitation type Figure 6 shows that precipitation type does not have a strong influence on avalanche probability. There is a slight increase in avalanche probability for freezing rain and mixed rain and snow, however, both of these categories contain only small- samples (6 and % respectivefy) so the probabilityestimates,are not reliable for Ihesecategpries. The avalanche probability is lower when there is no precipitation, which is expected but it should be noted that the probability does not drop to zero. Since the avalanche occurrences have been filtered so that only explosive triggered avalanche tracks in the Summit Sluffs area are included in the analysis, some of this probability, especially for AAI>10, derives from natural avalanche releases after a storm, possibly during periods of warming. Table 4 shows that the groups are not well separated according to precipitation type, with an F-ratio of 3.2. This gives a probability that group separation is significant of 0.0832, which is rejected at a significance level of 5% (a = 0.05). As a result, precipitation type is dropped from the analysis. 4.5.2 S n o w f a l l rate This shows an almost linear trend for all three curves (Figure 7), with avalanche probability increasing dramatically as the snowfall rate increases. This indicates that this is an important variable for avalanche prediction. This is confirmed in Table 4, with an F-ratio of 34.0, which is significant at a = 0.05 with a p-value of O.001. 40 4.5.3 The Maximum temperature probability plot for maximum temperature (Figure 8) shows that there is a peak in avalanche probability at 1.5 °C for AAI>3 and AAI>10. For AAI>1 this peak is at a higher temperature of 3 °C. This may reflect that there are more smaller avalanches that result from wet-snow releases that are triggered by solar warming events. Table 4 shows that this variable is significant for group selection with an F-ratio of 27.4. 4.5.4 Present temperature Figure 9 shows a peak in the avalanche probability for present temperature at 0.5 °C for all levels of AAI and for AAI>1 and AAI>3 shows a secondary peak at 4.5 °C. This bimodal distribution could indicate the two dominant types of avalanche activity, dry slab releases and wet snow avalanches respectively. Table 4 confirms that this variable is significant for group selection with a F-ratio of 64.4. 4.5.5 The Minimum temperature minimum temperature shows a peak in avalanche probability at 0 °C (Figure 10). This is different to the group mean given by the results of the A N O V A (Table 4), which gives the group mean for avalanche periods as -3.12 °C. This difference may be due to the fact that there are fewer avalanche periods when the temperature is around 0 °C but when minimum temperatures are around this value, there is a greater likelihood of encountering avalanche conditions. This variable is found to be significant for group selection. 41 4.5.6 Relative humidity Figure 11 shows that the avalanche probability increases slightly as relative humidity increases. This rising trend is weak. Group separation using this variable is, however, significant at a = 0.05 with a p-value of O.001 (Table 4). 4.5.7 New snow board The new snow variable shows strong trends for all three curves in Figure 12. The curve for AAI>10 has a flatter response until 25 cm of new snow and then increases dramatically for 35 cm of new snow. This may reflect the fact that widespread avalanche activity that includes higher magnitude avalanches requires a higher threshold of new snow. The results of the A N O V A in Table 4 indicate that this variable is significant for determining group selection. 4.5.8 Interval s n o w b o a r d The probability plot for the interval snow board (Figure 13) is very similar to that of the new snow board. This is expected, since the interval snow board will often measure the same snow depth as the new snow board. The situation where this might not be the case is during storm observations when the interval board may be cleared between standard observations. The results of the A N O V A in Table 4 indicate that this variable is significant for determining group selection and although it is suspected that this variable may be redundant, it is retained at this stage in the analysis. 4.5.9 Storm snow board The trend for storm snow (Figure 14) shows a rising trend with an inflection point between 40 and 60 cm of storm snow. McClung and Tweedy (1993) reported a similar trend in the 42 total storm snow variable, where an inflection point for AAI>10 was noted between 20 and 40 cm of total new snow. The origin of this inflection point is somewhat unclear; it could have been introduced through recording practices. It is possible that for big storms, the snow is allowed to accumulate for longer on the storm board, whereas for smaller, less intensive periods of snowfall the storm board is likely to be wiped clear of snow. This variable is significant for determining group selection, with an F-ratio of 109.2 (Table 4). 4.5.10 Surface s n o w condition Figure 15 shows that the avalanche probability is highest when a wind crust is present on the surface or there is slush. These two surface snow conditions are indicative of conditions suitable for slab avalanche formation and wet snow avalanches respectively. The A N O V A , however, shows that the variable gives only a small degree of group separation, which is not significant for determining group selection (Table 4). This variable is therefore dropped from the analysis. 4.5.11 S n o w p a c k d e p t h The avalanche probability appears to increase in a linear fashion with the depth of the snowpack (Figure 16). The physical justification for this is could be that a higher concentration of storm events occur during the central part of the winter season, when the depth of the snowpack is greatest. Atwater (1954) proposed that a minimum snowpack depth of 2 feet (60 cm) was required before avalanches could occur in order to cover obstacles to allow snow to slide. Figure 16 shows that, although less frequent, avalanches do occur in Bear Pass when the snowpack is less than 60 cm and no sharp drop in the avalanche probability is noted below this value. The snowpack depth measured at the observation site 43 will almost certainly be lower than that in the start-zones, which explains why this threshold is not seen in the data. The A N O V A results (Table 4) confirm that snowpack depth gives good group separation and is significant for determining group membership. 4.5.12 Water-equivalent n e w precipitation Figure 17 shows that the water-equivalent new precipitation shows a strong rising trend for all levels of AAI. This is confirmed by the results of the A N O V A (Table 4), which give an Fratio of 198.0 which has a significance of >0.001. This variable is a strong indicator of whether avalanches are likely to occur. 4.5.13 F o o t penetration Foot penetration (Figure 18) shows a rising trend with avalanche probability for all levels of AAI. Table 4 shows that the variable gives good group separation with a group mean for avalanche periods of 38.5 cm and a group mean for non-avalanche periods of 26.0 cm. The F-ratio is 89.1 which is significant at a = 0.05. 4.5.14 W i n d s p e e d The very weak trend evident in Figure 19 indicates a decrease in avalanche probability with increasing wind speed. This trend is confirmed by the results of the A N O V A in Table 4, which shows the group mean for the avalanche group to be 3.23 whereas the group mean for the non-avalanche group is 2.84. This is counter intuitive according to what we know about the effects of wind loading on the avalanche hazard (for example McClung and Schaerer, 1993, p 27). Two explanations for this are possible. It either means that in all cases when a 44 strong wind blows, it blows from a direction that causes wind scouring on all avalanche slopes or that the wind gauge is faulty or poorly located. Given that most slope aspects are represented along The Pass, the former is highly unlikely. It is therefore suspected that the wind recordings may not be accurate and we reject this variable from the analysis. 4.5.15 W i n d direction The curve in Figure 20 shows a flat response except when the wind blows from the north west. Confidence in this result is not high, however, since this wind direction was only reported for 7 cases in the database. According to the results of the A N O V A (Table 4), this variable is not significant in determining the avalanche hazard at Bear Pass. 4.5.16 L a g g e d temperature variables Table 4 shows that both the lagged present temperature variable and the lagged minimum temperature variable are significant for determining group membership, with F-ratios of 25.5 and 32.3 respectively. The lagged maximum temperature variable was not found to be significant, which suggests that the maximum temperature is useful for predicting avalanches that release due to solar warming during the day. These instances are short-lived and apply only to that particular day. For this reason, the lagged maximum temperature variable is not important for determining avalanche occurrences. The present temperature trend variable was found to be significant for determining group membership, with an F-ratio of 18.8. 45 Table 4. Group means, F-ratio and p-values from the ANOVA shown for each variable. For the group means, no indicates the group of non-avalanche periods and yes indicates the group of avalanche periods. Sample sizes for all variables was 865. Group means Variable no yes 1.92 2.31 3.2 0.0832 0.46 0.57 34.0 <0.001 -1.63 0.32 27.4 <0.001 -4.31 -1.32 64.4 <0.001 -6.32 -3.12 74.6 <0.001 relative humidity (%) 85.2 90.2 34.7 <0.001 new snow board (cm) 0.62 1.26 86.2 <0.001 interval snow board (cm) 0.60 1.11 60.4 O.001 storm snow board (cm) 0.99 1.92 109.2 <0.001 snow surface condition 4.03 4.52 3.1 0.0912 snowpack depth (cm) 156.5 182.7 65.4 <0.001 water equivalent new precipitation (mm) 2.08 6.90 198.0 <0.001 foot penetration (cm) 26.0 38.5 89.1 <0.001 wind speed (k 3.23 2.84 3.8 0.0515 1.65 1.72 3.7 0.0753 -6.45 -3.99 32.3 O.001 -4.40 -2.13 25.5 <0.001 -1.60 -0.63 4.0 0.0461 present temperature trend (°C per time period) -0.01 0.94 18.8 <0.001 precipitation type snowfall rate (cm h") 1 maximum temperature present temperature (°C) (°C) minimum temperature (°C) hr") 1 wind direction minimum temperature lagged present temperature lagged Summary and (°C) (°C) maximum temperature lagged 4.6 F-ratio p-value (°C) conclusions From the above analysis, significant variables are defined. Although most variables were found to be statistically significant for separating periods into avalanche and non-avalanche periods, it is clear that some variables have a higher ability to discriminate between the groups than others. Variables pertaining to the amount of snowfall, such as snowfall rate, water equivalent new snow and new snow board depth show a strong ability to discriminate 46 between groups. The temperature variables are also good indicators for avalanche activity although the trend for temperature variables is not linear. Foot penetration and snowpack depth give good group separation. Humidity permits group separation but is not as significant as some of the other variables. Variables that were not found to be significant for discriminating between avalanche and non-avalanche groups were: precipitation type; snow surface condition; wind speed; wind direction and the lagged maximum temperature variable. These variables are rejected from the analysis at this stage. The finding that wind speed and wind direction are not significant variables at Bear Pass is maybe surprising, since accumulation and wind loading on lee slopes can lead to slab formation. The most likely explanation is that the location of the anemometer is not in a position where meaningful wind measurements may be made. The Pass trends east-west and the weather station is located towards the east end of The Pass in the valley bottom. The low elevation of the anemometer means that recorded winds are not representative of start zone conditions. In addition, an anemometer in this position should record easterly or westerly winds relatively accurately but record northerly or southerly winds with less accuracy due to the parallel shape of the valley which creates a funnelling effect for the wind. Most of the paths face either north or south, meaning that the winds most important for avalanche formation are not recorded accurately. 47 C H A P T E R 5. DISCRIMINANT ANALYSIS 5.1 Introduction A canonical discriminant analysis (CDA) was performed using the significant variables from the previous analysis. This method is similar to that presented by Bovis (1977), Obled and Good (1980) and McClung and Tweedy (1994), although in this case a greater emphasis is placed on the coefficients of the canonical variates. The technique involves an analysis of the variables in multivariate space, which is necessary for a proper understanding of the interactions between the variables. Separate analyses were performed for the two-group case, distinguishing between avalanche and non-avalanche periods and for the three-group case, distinguishing between wet, dry and non-avalanche periods. The two-group case was analysed for both The Pass as a whole and for the individual areas within The Pass. The analysis for the individual areas within The Pass was not performed for the three-group case, as this would have resulted in unacceptably low sample sizes. 5.2 Method of analysis CDA seeks Z = A X +A X to X X 2 2 find +...A X , m m a linear combination of the predictor variables, that exhibits the largest difference between group means relative to the within-group variance. Following Devroye et al (1996, p 46), for the two group case this amounts to maximising the function: 48 (A X, - A X ) T T 2 2 G(A) = XI(A X -A X,.) T T 2 (1) y where X , , X are the sample means of groups one and two respectively (avalanche and non2 avalanche) and A = (A ,A ,...,A ), T x 2 m the coefficients for the predictor variables. The subscript /' refers to the groups and the subscript j refers to the n cases within each of the groups. The denominator of Equation 1 is equivalent to the product of the scatter matrices for the two groups. If the variance-covariance matrices for each group are homogeneous, then we may replace this term with the pooled sample variance-covariance matrix, given by: S= 1 n +n x 2 -2 where C, is the variance-covariance matrix for each of the groups. Once the coefficients A ,...,A l m for the m variables have been calculated, the value of Z is compared to that of the group means and a probability of group membership calculated based on the relative distance of this point from each of the group means. For C D A it is advantageous to use group sizes that are approximately equal. This ensures that the a priori probability of forecasting an avalanche period is the same as that for a nonavalanche period, which should result in parity between classification rates for avalanche and non-avalanche groups. In each analysis, equal group size was achieved through eliminating records from the group with the largest number of samples using a random number generator. In addition to equalising the group sizes, a random sample of 35% of the time periods was initially withheld from the analysis and used as an independent testing set. A l l classification rates reported in this study are based on values achieved using an independent testing set. 49 5.3 Two-group discrimination The analysis was first performed for the two-group case of discriminating between avalanche and non-avalanche time periods. Sample sizes for this analysis were 865 and 465 for the learning and testing data sets respectively. In order to test the assumption that the variance-covariance matrices are homogeneous across groups, a value for Wilks' lambda (Schatzoff, 1966) was calculated. Wilks' Lambda gives information on multivariate group separation with a value close to zero indicating that the two groups are well separated and a value of unity indicating no group separation. The corresponding F-statistic and the likelihood ratio statistic were also calculated for testing the null hypothesis that the variance-covariance matrices are equivalent. Since we have only two groups, these values were reliable (Wilkinson, 1990, p. 270). The results gave a value for Wilks' Lambda of 0.74, indicating that the group means are not well separated. The value of the F-statistic was 28.0 with a p-value of 0.001 indicating that the group separation that is present, however, is significant. Therefore, we may retain the null hypothesis that the variance-covariance matrices are similar for the groups. 5.3.1 Canonical coefficients The standardised canonical coefficients give information on the relative importance of each variable in the discriminant function. These coefficients are listed in Table 5 for the twogroup classification scheme. The total canonical correlation was 0.51 indicating that 51% of the variance of the original variables is expressed by the canonical variates. Also shown in Table 5 are the canonical scores for the group means. These indicate the position of the groups relative to the canonical variates. 50 Table 5. CDA results using all variables for discrimination between avalanche and non-avalanche periods. Canonical coefficients standardised by within-group variance are shown. Also shown are the canonical scores for group means. Variable present temperature lagged -0.69 present temperature 0.68 water-equivalent new precipitation 0.59 interval snow board -0.55 snowpack depth 0.53 minimum temperature 0.52 new snow board 0.43 foot penetration 0.43 maximum temperature -0.28 minimum temperature lagged 0.24 present temperature trend 0.19 storm snow board snowfall rate relative humidity Group Non-Avalanche Avalanche 5.3.2 Standardised canonical coefficient -0.15 0.14 -0.004 Canonical scores for group means -0.58 0.6 Interactions between variables Many of the variables were found to be highly correlated with each other. Table 6 shows the total Pearson correlation matrix for all variables used in the C D A . When variables are highly correlated, standardised canonical scores can become unstable (Whitaker, 1997). This is because the scores are derived from the amount of unique variance explained by each variable. In the case of two highly correlated variables the scores may 'flip' between preferring one variable or the other due to a relatively small change in the model parameters 51 (for example removing a third variable from the analysis that also has some correlation with these two variables). One example of this interaction is between the new snow board and storm snow board variables. These variables are highly correlated with each other (0.96). The new snow board has a canonical score of 0.43 indicating that it is directly correlated with the avalanche group. The interval snow board, however, which often records exactly the same values as the new snow board, has a canonical score of -0.55 and is therefore inversely correlated with the avalanche group. A possible explanation for this is that the portion of the variance unique to the interval snow board arises from periods when there is less snow on the interval snow board than the new snow board (presumably because it has been cleared as additional weather observations are taken during a storm). Therefore, the canonical score for the interval snow board is negative. For the two snow boards, the interval board was rejected for use, since readings of this board are subject to variations during different weather conditions. There is also an apparent two-way link between the present temperature and the present temperature lagged by 24 hours. There is also a high correlation between these variables (0.85). Again from an analysis of the group means, one would expect both the present temperature and the lagged present temperature to show a direct correlation with the avalanche group of time periods. The canonical analysis reveals that the best discrimination occurs when there is a high present temperature and a lower present temperature 24 hours ago. This corresponds physically to a rising temperature trend. Many of the temperature variables were removed from the analysis due to problems with collinearity. This does not necessarily mean that these variables had no ability to discriminate 52 between avalanche and non-avalanche periods. Instead, it is likely that most of their discrimination power was already accounted for by another, slightly stronger variable. present temp lagged present temperature water eq. new precipitation interval snow board snowpack depth minimum temperature new snow board foot penetration maximum temperature maximum temp lagged present temp trend storm snow board relative humidity snowfall rate storm snow board present temp trend maximum temp lagged maximum temperature foot penetration new snow board minimum temperature snowpack depth interval snow board water-equivalent new precipitation present temperature present temp lagged Table 6. Total Pearson correlation matrix for the variables used in the CDA for separation into avalanche and non-avalanche groups. 1.00 0.85 1.00 0.06 0.16 1.00 -0.15 -0.04 0.54 -0.01 0.02 -0.06 -0.01 0.18 1.00 -0.01 0.00 1.00 -0.15 -0.03 0.56 0.96 0.00 -0.01 -0.27 -0.21 0.35 0.57 0.02 -0.17 0.61 0.90 0.92 0.07 -0.16 0.05 0.89 -0.16 -0.35 0.92 0.83 0.00 -0.22 0.02 0.83 -0.23 -0.40 0.92 1.00 -0.21 0.21 0.23 0.26 0.06 0.09 0.28 0.15 0.05 -0.21 -0.07 0.05 0.50 0.74 -0.03 0.11 0.78 0.63 -0.10 -0.20 0.28 1.00 snowfall rate -0.17 -0.14 0.35 0.44 -0.15 -0.08 0.44 0.34 -0.21 -0.22 0.06 0.38 1.00 relative humidity -0.08 -0.05 0.31 0.35 -0.07 0.17 0.28 -0.07 -0.11 0.07 0.42 0.31 0.90 0.94 1.00 1.00 0.36 1.00 1.00 1.00 1.00 53 5.3.3 O p t i m a l variables An optimum variable set was sought comprising variables that offered good ability to discriminate between avalanche and non-avalanche periods. The analysis was repeated a number of times to test the effect of altering the variable set used in the analysis. It was found that the dominant variables performed consistently well in the analysis despite the removal or inclusion of other variables. Only these were retained in the final variable selection. Table 7 shows the standardised canonical coefficients for the best variable group for the avalanche / non-avalanche discrimination. These variables were: the water equivalent new precipitation, present temperature, snowpack depth, foot penetration and the present temperature trend. Table 7. CDA results for the best variable group for discrimination between avalanche and non-avalanche periods. Standardised canonical coefficients for the best variable group are shown as well as canonical scores for group means. Variable Standardised canonical coefficient water equivalent new precipitation 0.57 present temperature 0.52 snowpack depth 0.44 foot penetration 0.43 present temperature trend 0.19 Group Non-Avalanche Avalanche Canonical scores for group means -0.55 0.57 54 5.3.4 Discussion of variables It is expected that increased precipitation should allow discrimination between avalanche and non-avalanche periods, since loading of avalanche prone slopes by new snow (or rain) is frequently sufficient to trigger avalanches on these slopes (e.g. McClung and Schaerer, 1993, p. 152). From the analysis, the water equivalent new precipitation (recorded using a precipitation gauge) was found to be a superior variable in discriminating between avalanche and non-avalanche periods than the new snow board. This is probably due to in part to the increased precision of the precipitation gauge over measuring the height of snow on the new snow board and in part due to the fact that the precipitation gauge will record rain events, which can be an avalanche trigger. The positive canonical score for the present temperature variable indicates a tendency for avalanches to occur during warmer conditions. In a relatively warm, maritime climate, avalanches due to the formation of depth hoar over prolonged cold weather periods are usually not frequent enough to affect an analysis such as the present one. Instead, avalanches appear to occur most frequently during warmer periods, characterised by precipitation-laden storms, which can include rain or snow. The analysis shows that more avalanches occur when the snowpack depth is large. Probably the most important reason for this correlation is the fact that most avalanches occur during midwinter and late winter storms, at which time the snowpack depth is greatest. In other words, this is simply the main snowfall season. Foot penetration has a positive canonical score, indicating the greater the penetration of the foot into the snow, the more likely avalanches are to occur. This could simply indicate the presence of new snow, since a large amount of freshly fallen snow will give a high foot 55 penetration reading. However, foot penetration is only weakly correlated with the water equivalent new precipitation (0.35). It could be that if significant settlement occurs after a storm cycle, then the snowpack becomes relatively stable and a low foot penetration reading results. Conversely, if settlement does not occur, then instabilities in the snowpack will remain and foot penetration readings will be high. Although the canonical score for the temperature trend variable is relatively low (0.19), it is not highly correlated with the other temperature variables (i.e. explains unique variance) and it performed consistently (i.e. did not behave in an unstable way when altering the variable group) making this variable important in the analysis. Once again, this variable indicates a tendency for avalanches in Bear Pass to occur during periods of warming temperatures. 5.3.5 The Classification rates discriminant function built using the variables above were used to classify each day into either an avalanche day or a non-avalanche day. The results using the testing data set give classification rates of 73% for avalanche periods and 72% for non-avalanche periods. The prediction rates for the testing data (as previously stated, a random sample of 35% of the records) are only marginally lower than those for the learning data set (75% for both avalanche and non-avalanche cases), which indicates that the model is not too closely fitted to the learning data set. This gives confidence in the prediction ability of the model in a fieldtesting environment. Figure 21 illustrates how the calculated avalanche probability differs from the observed avalanche occurrences for a small subset of the data from 1 January to 31 March 1995. st st Mostly, the calculated probabilities fit well to the observed avalanche activity. There are, 56 however, periods where the model fails to predict an avalanche occurrence, most notably at the start of this 'season', when avalanches occurred with calculated avalanche probabilities between 0.2 and 0.4. 1 i I" 0.8 o 0.6 o 0.4 (0 to £ 0.2 - Q_ c No < 0 0 60 30 90 Days Figure 21. Avalanche probability vs. avalanche occurrences for 1-Jan to 31-Mar 1995. Avalanche probabilities (+) are calculated from the discriminant analysis. Avalanche occurrences ( •) are grouped by whether (yes) or not (no) there were any avalanche occurrences during that time period 5.3.6 Analysis of individual areas In addition to testing all paths, the method was also applied to the individual areas within The Pass (described in Section 1.4). The variables that showed consistently high discrimination power across all of the groups were: the water equivalent new precipitation, the snowpack depth, and the foot penetration. The minimum temperature was found to be an important variable for the Summit Sluffs area whereas the present temperature was found to be important for the three other areas. The new snow board depth was found to be important for both the North Aspect area and the Little Bears area. 57 The importance of the minimum temperatures in the Summit Sluffs area is possibly due to the tendency of the area to produce small avalanches (all avalanche occurrences from the Summit Sluffs paths were included since even a small avalanche has the ability to impact the highway). Colder minimum temperatures may indicate clear weather periods where sun can play a role in triggering avalanches. The importance of the new snow board variable for the North Aspect and Little Bears areas is possibly due to the fact that these areas contain large avalanche paths that require comparatively large avalanches to impact the highway. Avalanches of this magnitude normally occur during times of unusually high snowfall. The results of the classification by individual area are given in Table 5. All of the areas, with the exception of the North Aspect area, perform better than the all-area model. The North Aspect area is the most heterogeneous area containing avalanche paths in complicated terrain that produce high magnitude but infrequent avalanches. It is this type of avalanche that is the hardest to predict. The Summit Sluffs area performs best with an overall classification rate of 79%. This area represents the most homogeneous area within The Pass and produces more easily predictable avalanches. Table 8. Classification matrix (using the test sample) for discrimination into avalanche and non-avalanche groups for both individual areas and for the whole pass. Area Sample Size Classification Rates Avalanche NonOverall avalanche Little Bears 130 78% 77% 78% Summit Sluffs 171 78% 80% 79% South Aspect 273 77% 71% 74% North Aspect 224 64% 76% 70% All Areas 465 73% 72% 72% 58 5.4 Three-group discrimination In addition to performing the discriminant analysis to distinguish between avalanche and non-avalanche periods, the avalanche periods were further classified as either wet-avalanche or dry-avalanche periods according to their moisture index (see Section 3.4.2) to give a threegroup classification scheme. The rational for this is that wet-snow avalanches and dry-snow avalanches occur in response to different sets of meteorological conditions. Wet-snow avalanches will tend to release during warmer periods, particularly during times of increased solar activity and are not always associated with new snowfall. Dry-snow avalanches usually occur during colder conditions, although are often associated with a rising temperature trend, and frequently result from overloading of weak-layers during periods of intense snowfall. There are two possible approaches to the problem of three-group discrimination. The first is the 'direct' approach, which attempts to classify the period directly into one of the three categories. The second, 'indirect' approach first categorises the period as either a wet or a dry-avalanche period and then performs a second discrimination to determine the final outcome. 5.4.1 Direct three-group classification The linear discriminant function described in Section 5.2 can be generalised to the case of the multi-class problem (e.g. Duda et al, 2001, p 121). The decision of group membership is still based on finding which group's mean value (in multivariate space) is closest to the values for the forecast period. For a three-group classification, we can define a principle axis and a secondary, orthogonal axis of group separation. Therefore, we can graphically display the results by plotting the 59 canonical 'scores', calculated by projecting each sample's values onto these axes. A canonical scores plot of this type for the three-group discrimination at Bear Pass is shown in Figure 22. Avalanche Groups Non o , Wet Dry -4.0 -1.8 0.4 2.6 FACTOR(1) 4.8 Figure 22. Canonical scores plot for the direct wet / dry / non-avalanche discrimination. Factor (1) represents the principle axis of group separation, factor (2) represents the secondary axis of group separation. Bivariate confidence ellipses represent one standard deviation awayfromthe mean for each factor. The results of the direct three-group discrimination are given in Table 9. The sample sizes for the analysis were 362 for the learning set and 180 for the testing set. The overall model performance was 64%. Classification rates between the groups were not even, however, with the dry-avalanche group showing the best classification rate of 73%, the wet-avalanche group performing the next best with 65% and the non-avalanche group performing the worst with 53% of cases correctly identified. 60 Table 9. Classification matrix for the testing data set for the direct three-group classification into wet / dry / non-avalanche groups. No. cases classified as: Group 5.4.2 Wet Dry Non Percentage of cases correctly classified Wet 39 10 11 65% Dry 5 44 11 73% Non 15 13 32 53% Total 59 67 54 64% Indirect three-group classification An alternative approach to distinguishing between three groups follows that of McClung and Tweedy (1994). There are two rounds of classification, each one involving a two-group discrimination. Firstly, a discrimination between wet and dry-avalanche periods is performed. If the result shows conditions are found to be wet, then the second round of discrimination decides whether a wet-avalanche period or a non-avalanche period is likely. If conditions are found to be dry, the second round of discrimination decides whether a dryavalanche or a non-avalanche period is likely. The advantage to this approach is that a different discriminant function is used for the different group classifications. A n optimum variable set may be determined for each of the three possible discriminations. If a wet or a dry-avalanche is successfully predicted in the first round, the discriminant function in the second round is optimised to discriminate between this type of avalanche and a non-avalanche period. There is a price to pay for this, however. If the case in the first round is misclassified, the consequence is that the 61 discriminant function used in the second round is not designed for discriminating that type of avalanche. Table 10 shows the results of the indirect three-group classification scheme for the two rounds of discrimination. For the second round, classification rates are shown for both the case of a successful and an unsuccessful first round (wet / dry) discrimination. Good discrimination is achieved between wet and dry-avalanche periods, with a classification rate of 80%. The classification rates for the wet / non-avalanche groups and the dry / nonavalanche groups are also higher than for the simple case of the avalanche / non-avalanche classification, with rates of 74% and 77% observed respectively. The optimum variable sets for the different discriminant functions are shown in Table 11. Table 10. Results for the indirect three-group classification scheme. Percentage of cases correctly classified Discrimination First round Wet / Dry 80% Wet/Non 74% Dry / Non 77% Wet /Non 59% Dry / Non 59% Second round First round correctly classified First round incorrectly classified Overall classification 72% When we take into account the cost of misclassification in the first round, a different picture emerges. The overall classification rate takes into account the proportion of cases misclassified in the first round and then assumes that these would have been classified using 62 the incorrect discriminant function in the second round. The total classification rate was found to be 72%, which is the same as for the two-group discrimination between avalanche and non-avalanche periods. Table 11. Optimum variable sets for wet / dry, wet / non and dry / non discriminations. For each discrimination, the first group has a positive group mean canonical score (implying positive correlation to the variable canonical coefficients) and the second group has a negative mean canonical score (implying inverse correlation to the variable canonical coefficients). Discrimination Wet / Dry Wet/Non -0.54 storm snow board maximum temperature 0.49 new storm board 0.45 foot penetration -0.29 new snow board -0.18 minimum temperature 0.69 snowpack depth 0.63 water-equivalent precipitation Dry / Non Standardised canonical coefficient Variable new 0.48 foot penetration 0.24 present temperature 0.16 present temperature -0.79 foot penetration 0.47 new snow board 0.33 snowpack depth 0.24 storm snow board 0.23 If the classification rate of the first round of discrimination could in some way be increased with a more accurate determination of whether wet or dry-avalanche conditions are likely, then this scheme will give improvements to the overall model classification rates. McClung and Tweedy (1994) suggest a framework for allowing an expert forecaster to bias the results of the first round discrimination. This is done by applying Bayes formula (Equation 6, p 72), assigning the forecaster's weight as an a priori probability. 63 McClung and Tweedy state a wet / dry-avalanche classification rate of 87%. This is higher than the value of 80% found in this analysis. It appears therefore, that avalanches are more readily stratified in terms of moisture content at Kootenay Pass than at Bear Pass. The warmer temperatures at Bear Pass mean that wet-snow avalanches are more likely to occur at any time during the season. Another problem lies in the reporting of the moisture content of avalanche occurrences. This is frequently misreported in the database (personal communication, Doug Wilson) due to the large differences in elevation between the highway and the start zones. This may result in a dry slab avalanche release appearing as a wetavalanche when it is deposited on the highway, where conditions are substantially warmer. For these reasons, discriminating between wet and dry-avalanche periods for numerical forecasting is not advantageous at Bear Pass. 5.5 Conclusions Based on the discriminant analysis, a classification scheme with a performance of at least 72% may be constructed for Bear Pass. In addition, optimum variable sets for four possible classification schemes were presented. The most important variables across all of these schemes were found to be: water equivalent new precipitation, snowpack depth and foot penetration and temperature. Whether the maximum, present or minimum was the most important temperature variable was dependent on the individual classification scheme. In addition, the storm snow board was found to be an important variable when dry-avalanches were considered. The instabilities in the canonical variates that occur when subtracting or adding variables from the model clearly illustrate one of the principal dangers of determining the principal 64 variables in an analysis of this type. Wilkinson (1990, p. 335) refers to piecemeal selection of variables (specifically stepwise discriminant analysis, which was not used in this analysis) as a 'fishing expedition' and advises against the indiscriminate use of this kind of 'artificial intelligence'. Therefore, any variables selected through the use of this statistical method, especially when deviations from a Gaussian distribution are suspected, must be rationalised by a physical justification to be included. Also, collinearity must be analysed to eliminate redundant variables. There is good justification in dividing the Pass into smaller areas, as prediction rates for all areas except the North Aspect area improved when this division was made. These improvements are achieved despite a reduction in sample size for the individual areas. Some differences in the principal variables were noted between the different areas but most of the principal variables were consistent among the areas. Three-group discrimination was performed to distinguish between wet, dry and nonavalanche periods. The indirect approach, with two rounds of discrimination, was found to be significantly better than the direct approach with only one round of discrimination. The direct approach gave an overall classification rate of 64% whereas the indirect approach gave an overall classification rate of 72%. It might be possible to improve on this classification rate if expert knowledge is used to assist with the forecast. Performing separate analyses on wet and dry-avalanche periods, however, only improves classification rates if it is known with certainty that a particular time period belongs to either a wet or a dry-avalanche group. The principle reason for carrying out a discriminant analysis of this kind may in fact not relate to its ability to forecast a probability that avalanches will occur. This is on account of the fact that a skilled forecaster with good knowledge of the local area will be able to assess 65 the avalanche hazard to a higher degree of accuracy than a numerical model. What humans are less apt to do, however, is to recall information from the past in a structured and calculated way. This leads us to the method of the nearest neighbours, which is able to perform this task in a reliable way using all of the available data in the database. However, variable selection is difficult to achieve using a nearest neighbours approach. The most important aspect of the analysis above, therefore, was in determining which variables to use as input variables for the nearest neighbours model and how they are correlated. 66 C H A P T E R 6. N E A R E S T NEIGHBOURS ANALYSIS 6.1 Introduction The optimum variables determined from the canonical discriminant analysis are used in a nearest neighbours model. The basic idea behind the method of the nearest neighbours is conceptually simple. A specified number of time periods are sought that most closely relate to conditions of the period being forecasted for. These 'neighbours' are then queried as to whether or not avalanches occurred during these time periods and if a certain percentage of the periods (or higher) had avalanche occurrences, then an avalanche period is forecasted; if not, a non-avalanche period is forecasted. A schematic representation of this method is shown in Figure 23. An experienced avalanche forecaster will have more detailed knowledge of the local area than the few parameters recorded in the weather and snowpack observations. For example, only a small amount of snowfall may have been reported by the weather station. The forecaster, however, knows that if there is a northerly wind then there will be additional accumulation on a certain south-facing slope, which will make this particular area more prone to avalanching than the model predicts. Alternatively a forecaster may know that a certain slope is safe, despite a model predicting a high avalanche risk for that area, due to previous control work that has already mitigated the avalanche danger on that slope. In these instances, the forecaster is taking the information from the model and fine-tuning it by adding information based on their own knowledge of the area. Therefore, a model that is useful to the forecaster is one where as much information as possible is available about the 67 nature of the likely avalanche hazard. This information is available from the output of the nearest neighbours model. Figure 23. Schematic representation of the nearest neighbours to a time period in 3D space. Current time period is shown as a solid circle, time periods corresponding to avalanche periods are asterisks and time periods corresponding to non-avalanche periods are open circles. The 10 nearest neighbours in 3D space are joined to the time period with a solid line. In this case, 6 out of 10 of the neighbours are associated with avalanche conditions. Therefore, avalanche conditions would be considered likely for the time period being tested. 6.2 Distance Metric A primary consideration in a multivariate nearest neighbours analysis is selecting an appropriate distance metric, so that it is possible to decide which neighbours are the closest to 68 the case being considered. The simplest multivariate distance metric is the Euclidean, with the distance between the i * and j * period with m variables defined as: m This metric does not take into intercorrelations between the variables into account. In addition, all variable carry equal weights regardless of their importance to predicting avalanches. A n enhancement is to include a weighting scheme and / or a scaling scheme to ensure that the most important variables for forecasting avalanche are given the highest priority in the model and to prevent variables with larger values from dominating the calculated distance. The scaled and weighted Euclidean distance metric may be written: m D =^CMX -X ) 2 y 2 ik jk (4) where C is a matrix of variable scaling coefficients and X is a matrix of variable weighting coefficients. A better approach, which is the method used in this study, is to use the Mahalanobis distance metric, defined as: where, X = (X ,X ,...X ) T l 2 m and S is the pooled variance-covariance matrix. Including information on the variance-covariance matrix is advantageous in two ways. Firstly differences in scale of the different co-ordinate axes are automatically corrected for. Secondly, the correlations between the variables are accounted for. McClung and Tweedy (1994) point out that the Mahalanobis distance metric reduces to the Euclidean metric when S is set as the identity matrix, which is the case when all variables are uncorrelated. Clearly 69 implicitly making this assumption for a set of meteorological and snow variables that are correlated with each other is not justifiable in any sense. Given the advantages to using the Mahalanobis distance metric over the Euclidean distance metric, it is maybe surprising how many studies have made use of the Euclidean distance metric. This maybe stems from the fact that it is easier to assign and modify weights to individual variables when using the Euclidean distance metric (if this is done for the Mahalanobis distances then the changes will be nullified through changes to the variance-covariance matrix). This gives an apparent feeling of 'control' to the model, allowing forecasters to 'tweak' the model if they feel a certain variable should be weighted more than another. This, however, leads to the possibility of adapting a model to account for features that occur in the short term, which may then reduce the applicability of the model to general situations in the long term. Although by using the Mahalanobis distance metric, problems with scaling and intercorrelations between variables are avoided, the problem of dimensionality still remains. This is the problem whereby adding additional variables into the analysis can lead to a reduction in model performance. This problem is discussed further in Section 6.5.3. 6.3 Selection criteria for determining group membership There are two parameters that need to be set as selection criteria for determining group membership. These are: the number of neighbours considered, k and the threshold number of neighbours, kx, which represents the minimum number of neighbours out of the possible k neighbours that need to show signs of avalanching in order for the forecast period to be classified as an avalanche period. This may be expressed as a ratio, however, determining a suitable value for k is still important for maximising the performance of the model. 70 In choosing a value for k, it is necessary to consider the following. We want to use a sufficiently large value of k to obtain a reliable estimate for our prediction. This is sometimes referred to as the bias (e.g. Hand and Vinciotti, 2003). On the other hand, we want the k nearest neighbours to be as close as possible to the original in multivariate space, so that the probability of group membership at that particular location is accurately reflected by the location of the predictor neighbours. If the neighbours are too far removed from the case being predicted, which will be the case for the more distant neighbours if too high a value for k is chosen, the decision boundaries will be imprecise. This can be expressed as ensuring that we do not lose too much of the variance of the predictor system by defining our neighbour spheres to be too large. Enas and Choi (1986) suggest that the choice of k in the order of 9/8 NL 1 /ft to N L . As we have approximately 1300 samples when the group size is equal and approximately 3200 samples when all cases are included, we expect optimum values for the choice of k to lie approximately within the range of 6 to 15 and 8 to 21 respectively. 6.4 Learning and testing data sets As for the discriminant analysis, the data were divided up into learning and testing data sets so that an independent test could be performed to judge the model performance. The testing data set comprised 35% of the total number of cases for each model run. To ensure totally random designation of cases into the learning and testing data sets, a different seed for the random number generator was used in an incremental fashion every time a model run was performed. Independent testing of the model is important to prevent over-reporting of classification rates. If independent testing is not carried out, then the cases that are used to predict group 71 membership are also used to determine the probability density function that determines group membership. Clearly this will lead to an overestimate of the classification rate. 6.5 Results of the nearest neighbours analysis Three sets of results are presented for the nearest neighbours analysis. Results are presented for groups of equal size, then for groups of unequal size where all cases are retained in the analysis and finally for groups divided up according to wet-avalanche and dry-avalanche categories. In addition, the data were entered into a different nearest neighbours model, Cornice and a comparison analysis performed. The results confirm that by setting an appropriate threshold level, there is no difference in performance between analysing groups of equal and unequal size. Dividing the groups into wet-avalanche and dry-avalanche categories improves classification rates only if an expert forecaster can exceed the ability of the model to determine whether a particular time period belongs to a wet or a dry-avalanche period. Comparisons with the Cornice model yield very similar results despite a different approach in calculating the distances between the neighbours in multivariate space. The individual methods and results for these analyses are discussed below. 6.5.1 G r o u p s of equal size The nearest neighbour method can be viewed as an attempt to estimate the a posteriori probability of group membership from a set of samples (Duda et al, 2001, p. 184). We can express this in terms of Bayes rule, which states that the a posteriori probability of group membership, P(G \ X) is: t 72 where P(G ) is the a priori probability of group membership and P{X \G ) is the likelihood t t ratio of conditional probabilities. For equal group sizes, P(G ) = P(G ) = 0.5. When this is X the case, the a priori term, P(G ), t 2 drops out of the equation above and the probability of group membership depends only on the likelihood ratio of conditional probabilities, P(X \G ). The probability density function that defines the likelihood ratio is determined by t the N L learning samples, which are grouped in some organised fashion, i.e. they are not random. Cases that clearly belong to one particular group will have almost all of their neighbours belonging to that group. Cases for which group membership is less clear will have some neighbours belonging to one group and some neighbours belonging to the other. When there are equal numbers in each group, the optimum threshold for deciding which group a case belongs to is simply a majority decision. The case is assigned to the group that has the greatest number of neighbours belonging to it. It makes sense, in this case, to use an odd value for k, since this avoids the problem of what to do in the event of a tie. Figure 24 illustrates the optimum value for the threshold value kj for equal group sizes and k=T5. 73 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Threshold number of neighbours, k (k=15) T Figure 24. The effect of varying the threshold number of neighbours, k for groups of even size. k=15. The two solid curves represent 1) the percentage of avalanche periods correctly classified ( • ) and 2) the percentage of non-avalanche periods correctly classified ( + ). The dashed curve represents the overall classification rate for avalanche and non-avalanche periods. T Following the discussion in Section 6.3 regarding choosing an optimum value for k, the effect on the classification rate was analysed for a range of different values of k=l to k=10T. (Note, k=l is the special case of the single nearest neighbour.) The result of varying k in this way is shown in Figure 25. The optimum value of k was found to be 21, with a classification rate of 71.5%. Note that the value of k is slightly higher than was predicted in Section 6.3 from an analysis of the work by Enas and Choi (1986). It is possible that this difference is due to the use of the Mahalanobis distance metric as opposed to the Euclidean distance metric used by Enas and Choi. 74 0 10 20 30 40 50 60 70 80 90 100 Number of neighbours, k Figure 25. The effect of varying the number of neighbours, k on the classification rate. For all cases, the value of k was taken as (k+l)/2. The shape of the curve indicates that the penalty for underestimating k is greater than for overestimating k. T 6.5.2 Groups of unequal size When group sizes are uneven, the probability of group membership will be biased towards the group with the most number of cases if the majority decision is used as the threshold to determine group membership. To counter this, the threshold value k j may be varied so that the importance of having an avalanche neighbour is regarded by the model as greater than having a non-avalanche neighbour. Figure 26 shows the results of the analysis for groups of uneven size. The optimum criteria for avalanche group membership was level was found to be when avalanches occurred in 7 out of 30 periods (k-r=7 and k=30), which gave a classification rate of 71.5%. This model performance is the same as for when the model was run for groups of equal size. 75 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 Threshold number of neighbours, k (k=30) T Figure 26. The effect of varying the threshold number of neighbours, k for groups of uneven size. k=30. The two solid curves represent 1) the percentage of avalanche periods correctly classified ( • ) and 2) the percentage of non-avalanche periods correctly classified (+ ). The dashed curve represents the overall classification rate for avalanche and non-avalanche periods. T 6.5.3 The T h e effect of dimensionality effect of dimensionality (e.g. Duda et al, 2001, p. 107) is a potential problem that applies to the method of the nearest neighbours. For a given sample size, the classification rate will tend to suffer if the number of variables included becomes too high. There are two reasons for this. Firstly, the number of learning samples required increases exponentially with the complexity of the system. Secondly, variables that are less important in determining group membership may mask the effects of those that are. For a system described by 20 variables with only two of these significant for determining group membership, then by including all 20 variables, the variance of the two significant variables only accounts for a small 76 percentage of the total variance of the system. This limitation is not widely reported in avalanche forecasting literature. One possible solution to this problem is to assign weights based on the importance of the variable. This is not possible, however, if the Mahalanobis distance metric is used, since any change in the weight of a particular variable will incur corresponding changes to the variance-covariance matrix. The second and only viable solution in the case of the Mahalanobis distance metric, is to ensure that the optimum group of variables for the system has been selected. The variable groups for the nearest neighbours model runs were selected based on the C D A , however the information on the standardised canonical coefficients is not carried over to the nearest neighbours analysis. To test the effect of increasing the dimensionality of the nearest neighbours analysis, variables were added according to their importance. This importance was assessed through a combination of the F-ratio (Section 4.4) and the standardised canonical coefficients (Section 5.3.1). The classification rate was recorded for each successive variable added to the analysis. The results are shown in Figure 27, which shows that the optimum classification rate is achieved when 5 to 6 variables are used in the analysis. Any more than this and the performance of the model diminished substantially. A prediction rate of 67.9% was observed when the best 8 variables were used and of 62.8% when the best 12 variables were used. Clearly, this is a major limitation that needs to be considered when designing a nearest neighbours forecasting model. 77 72 No. of variables Figure 27. The effect of dimensionality on the classification rate for the avalanche / non-avalanche group classification. Groups of uneven size were used with k=30 and kr=7. 6.5.4 W e t a n d dry-avalanche classification The nearest neighbours model was also run for separate analyses to distinguish between wet and non-avalanche periods and between dry and non-avalanche periods. The variables used in this section of the analysis were those determined using CDA and are listed in Table 11. An increase in the classification rates were observed if the records were partitioned into wet and dry-avalanche periods. The classification rate for wet / non-avalanche periods was found to be 73% and the classification rate for dry / non-avalanche periods was found to be 75%. It was shown, however, that the cost of misclassifying a wet avalanche period as a dry avalanche period or vice versa is high. In each case, classification rate was shown to drop to 58%. It was shown that if we rely on the discriminant analysis to decide whether a day belongs to a wet or a dry-avalanche period, then the overall classification rate is the same as 78 if we had not tried to partition the cases into wet / dry avalanche groups. If an expert forecaster is able to more accurately distinguish between wet / dry avalanche periods then it would be advantageous to partition the database in this way. 6.6 Comparison with Cornice model 6.6.1 Rationale The nearest neighbours model was compared with the 'Cornice' nearest neighbour model developed for use by the Scottish Avalanche Information Service (SAIS) and presently in use in five avalanche area in Scotland. This was done in order to compare two nearest neighbour models that have different mathematical frameworks on the same data set. In this way, it may be determined which, if any, is the superior method for nearest neighbours forecasting. 6.6.2 Differences in m o d e of operation Cornice uses a weighted Euclidean distance metric as opposed to the Mahalanobis distance metric used by the model presented in this study. In addition, Cornice makes use of a total of four periods of data when making a forecast, the forecast period itself and three previous periods weighted by successively decreasing values. As a result, if three previous time periods are not present in the database then a forecast cannot be made for that period. The number of neighbours that the model considers, k is fixed at 10, and the threshold number of avalanches, kj, is fixed at 3. Both these numbers are ad-hoc values, reflecting the rule of thumb considered by the forecasters in Scotland to represent an avalanche period. Only the case of prediction into avalanche and non-avalanche periods was considered. Since kj is set at 0.3k and this cannot be varied, the whole data set was imported into Cornice. In 79 the previous analysis, parity between predicting avalanche and non-avalanche days was achieved for uneven group sizes using the whole data set and similar proportional values for k and kr. This approach was also successful for Cornice and good similarity between classification rates for avalanche and non-avalanche periods was achieved. Once the meteorological data and avalanche occurrences were loaded into Cornice, the next stage was to set the scales and weights for the model. Cornice has an automatic function that performs each of these tasks, which makes these tasks quick to perform. It also allows rapid updating of the model as new observations are added to the database. The scaling function eliminates the effect that some variables are recorded with a scale that has a greater magnitude than others. The weighting function uses a genetic learning algorithm to converge on the optimum linear combination of weights that maximise the classification rate of the model. The reader is referred to Purves et al (2002) for more details on this algorithm. Convergence, in practice, is rapid, taking about a minute to compute the optimum set of weights. Cornice has a 'batch forecast' function that calculates a forecast for each period in the database and compares the results based on the observed avalanche occurrences for the periods. This is equivalent to determining a classification matrix. A criticism of this function in Cornice, however, is that it performs this batch forecast on the learning data set, meaning that this is not an independent test of model performance. As a result, model performance is overstated by the batch forecast routine. In order to obtain an independent value for the batch forecast, 4 seasons from the database (approximately 20%) were withheld and painstakingly entered and an individual forecast 80 made for each period. The results were compared with the avalanche occurrence record and tallied to form a classification matrix. 6.6.3 C o m p a r i s o n of results Table 12 compares the results of the comparison between Cornice and the Mahalanobismetric model. Classification rates for Cornice are listed for both the built-in batch testing function and the independent test of model performance. In general, the results of the two models are very similar. When comparing between the independent test results, the Mahalanobis-metric model was found to perform 0.4% better than Cornice. The batch testing routine in Cornice was found to overestimate the model performance by about 1% as a result of testing the same periods used to construct the model. Table 12. Comparison of classification rates between nearest neighbours models. Model Classification rate yes no overall Cornice (batch forecast) 72.5 72.0 72.2 Cornice (independent test) 70.7 71.1 71.1 Mahalanobis-metric 71.5 71.5 71.5 6.7 Conclusions The nearest neighbours method was found to give a prediction rate of 71.5%. This classification rate is for all areas within The Pass and reflects testing using an independent data set. This is only 0.5% lower than the prediction rate achieved through the discriminant analysis. 81 No difference was found in the prediction rates when avalanche and non-avalanche groups of even or uneven size were used. However, when uneven group sizes were used, the threshold value, kj, for the number of avalanche-neighbours required for the period to be forecasted as an avalanche period needed to be adjusted so that an even classification rate between avalanche and non-avalanche periods could be achieved. The nearest neighbours model was found to be sensitive to the number of input variables with model performance being reduced if either too many or too few variables were used. The optimum number of variables was found to be between 5 and 6. A comparison with the Cornice model found the classification rates between the two models to be very similar. When an independent data set was used for testing model performance, the classification rate for the Mahalanobis-metric model presented in this study was 0.5% higher than for Cornice. Due to constraints of time, the nearest neighbours model was not tested for individual areas within The Pass. It is suspected, however, that models may be built for these areas that achieve similar classification rates as reported for the discriminant analysis in Section 5.3.6. 82 C H A P T E R 7. CONCLUSIONS It has been shown in this study that avalanche activity may be predicted from meteorological and snowpack data at Bear Pass with a success rate of 72%. The scheme requires first selecting an optimum variable set using univariate analysis and canonical discriminant analysis techniques, before using those variables to make an avalanche forecast using one of two statistical methods. The method of discriminant analysis gives a probability of avalanching as an output; the method of the nearest neighbours makes a simple yes / no decision based on a threshold value but in addition outputs a list of time periods which show the most similarity to the forecast period. Classification rates for the discriminant analysis were shown to be slightly superior to those for the nearest neighbours method, and rates were enhanced further for three out of four areas by geographically subdividing The Pass into individual areas. It is argued that the true value of a forecasting scheme such as this is not in the numerical value of the calculated avalanche probability but rather in the information that can be gleaned from analysing similar instances in the past. Human forecasters use both intuitive and deductive reasoning when making decisions regarding the prevailing avalanche hazard. The information on which intuitive reasoning is based is not readily described or analysed by the individual but nonetheless plays an important role in the decision making process. The incorporation of this intuitive knowledge is one of the central reasons why expert human forecasters consistently make better avalanche forecasts than computer models. Computers have no ability for intuitive reasoning. However, they do have an infallible memory, something human forecasters do not. The nearest neighbours model is effective as it uses the 83 computer's ability to accurately recall events to provide information that stimulates both deductive and intuitive reasoning by the human forecaster. In essence, the best aspects of computer modelling and human forecasting are merged. For this reason, an operational nearest neighbours model at Bear Pass would be a useful tool for avalanche forecasters to have available to them. Many parameters were assessed and adjusted to try and optimise the performance of the models described. It is possible to spend a lifetime tweaking and transforming variables to gain a small fraction of a percent increase in the classification rate. No doubt, with the right tweaking, the data in this study could be optimised just a little bit more. It is my opinion, however, that the models presented perform to a level very close to the maximum that is possible given the available data. In order to enhance model performance further, higher quality data are required for both meteorological observations and avalanche occurrences. Data of improved quality and relevance (in that it is recorded with greater proximity to avalanche start zones) are beginning to become available from high elevation remote weather stations. Time series analysis on hourly data recorded by these remote weather stations will likely provide the next avenue for analysis for the purpose of improving numerical avalanche forecasting. 84 7.1 Recommendations for further work - time patterns of predictor variables 7.1.1 Motivation It is felt that the limit of model performance using the conventional discriminant analysis and nearest neighbour techniques described above from data collected from a twice-daily manual weather station has been reached. In addition, discriminant analysis and nearest neighbours techniques make the assumption that meteorological conditions over successive time periods are not correlated with each other. This is at odds to what we observe about the avalanche phenomenon, where conditions for avalanching need to be built up through a complicated sequence of events. The build-up and the avalanche events, therefore, are most certainly correlated in time. A possible method is proposed here which addresses both the desire to use hourly data to improve model resolution and the need for a time series approach that assumes that events are correlated in time. In this way, it may be possible to improve the nearest neighbour prediction over the more conventional methods described above. 7.1.2 Proposed method The method proposed contains a wavelet analysis of meteorological data in the form of a time series. Only the univariate case is considered, where present temperature is assumed to be an indicator for predicting avalanche conditions. For a working model, generalisation to the multivariate case will likely be necessary. The method identifies sections of time in the past that were the most similar to present conditions. 85 A wavelet is defined to represent a short section of the time series from the most recent past. The length, n, of the wavelet can be varied; a reasonable value to choose may be on the order of magnitude of a typical storm event, say three days or 72 one-hour time periods. This is illustrated in Figure 28, which is a plot of hourly temperature readings from 2000 for the Disraeli remote Weather station in Bear Pass. In this case the time series extends back to the start of the season (1 November 2000). For a true analysis, the time period should include as many seasons' data as possible. Results at the start and the end of each season will have to be discarded, however, due to the joining of discontinuous data. 5 Days from start of seaon (1-Nov-2000) Figure 28. Time series of present temperature data for 2000 from the Disraeli remote weather station. As an example, a wavelet is defined as the last 3-day time period of this time series. 7.1.3 Deconvolution The method of deconvolution reduces a signal x(t) of a certain shape to a desired output e(t) by applying a filter a(t). The inverse filter a(t) deconvolves the wavelet x(t) into the desired output e(t). The wavelet has previously been defined as the most recent n-elements of the 86 time series for the variable being considered (in this case present temperature). The desired output is a unit spike at the start of the section of the time series, with zeros everywhere else. This allows sections of the time series that have a similar response to that of the wavelet to be identified. In practise, due to power considerations, this spike must be set at the time that best reflects the energy characteristics of the wavelet (see Section 7.1.4). Two out of the three elements in the system are now known, which permits a solution to be found for the inverse filter a(t). The inverse filter a(t) is defined such that: x(t) = a(t)*x(t), (7) where * is the convolution operator (e.g. Robinson, 1986, p 45). From Robinson and Treitel, 1980, the general form of the matrix equation for an optimum filter of length n is: Ka(t) = g(t), (8) where R is the autocorrelation matrix of the wavelet and g(t) is the crosscorrelation vector of the input and desired output. In expanded form, this can be written as: r r 0 .. x n-\ go r n-2 r n-\ n-2 r a \ V R Si _Sn-\ _ o . r r = a(t) The inverse filter, a(t) is optimal in the sense that it least-squares minimises the energy of the error between the actual and desired outputs. 87 7.1.4 Determining the desired output In order for the inverse to be considered stable (see Robinson and Treitel, 1980, p 87), the desired output must reflect the energy characteristics of the wavelet. Consider a spike at the first time period t = 0, with zeros making up the remainder of the time series. This is a minimum-energy wavelet, as all the power contained in the wavelet is at the start. Conversely, a maximum-energy n-length wavelet will contain n-1 zeros followed by a spike at the time period t = n -1 (for a zero-based time series). In the general case, the point of maximum energy may be at any point along the wavelet, and thus the spike is set to reflect this point. In practice, this may be done in an iterative manner, finding the n solutions to Equation 9 representing the different position of the spike in g(t) between t = 0 and t = n - 1 . The solution chosen is the one that minimises the error between the desired output and the deconvolved output. This error, E, is defined as: n-1 When the whole time series is convolved with the inverse filter then the output should give a series of spikes which correspond to sections of the time series that are similar to that of the wavelet. The exact location of the spike will depend on the offset found to give the least energy error solution for the desired output. By subtracting this offset from the time of the spike, the start time for the period of similar conditions may be found. 7.1.5 Testing the method Using the hourly minimum temperature data and the wavelet defined above, an inverse filter was found for each possible desired output series, varying the position of the unit spike for 88 all values of / . The case that gave the least error between the deconvolved output and the desired output was chosen as the overall optimum inverse filter. The whole time series was then deconvolved by this optimum inverse filter. The output yielded a series of spikes. The position of these output spikes was corrected by subtracting the offset of the unit spike in the desired output g(t). As a result, the output spikes indicate the start of similar sections of the time series as the wavelet. Figure 29 shows the results of this test. 5 0 10 20 30 40 50 60 70 80 90 100 110 Days from start of season (1-Nov-2000) Figure 29. Deconvolved output for the original time series deconvolved by the optimum inverse filter of the wavelet. Original time series and wavelet both shown. Also indicated (arrows) are the first time periods of the four most similar sections of the time series determined from the four highest spikes of the deconvolved output. 7.1.6 Selecting the nearest neighbours This is similar in principle to selecting the closest neighbours in the nearest neighbours model. The nearest k cases can simply be selected as the k highest output spikes from the deconvolved output. Figure 29 shows the start of the four time periods that were identified as 89 the most similar sections of the time series to the original wavelet. The original wavelet is characterised by a sharp negative peak. There is a small temperature rise and then a second, subsidiary negative peak before the temperature once again rises. The four similar sections of the time series identified also show similar profiles in their time series. Note that one of the sections actually has an output that is higher than that of the original wavelet. This section has a similar shape to the wavelet and a consistently higher magnitude value (meaning that the sign of the value is not important) for the present temperature for each corresponding time period. When this is the case, the deconvolved output will exceed that for the wavelet itself. 7.1.7 Limitations with the method One of the major limitations with this approach is that it contains the assumption of an identifiable signal contained within the data that otherwise comprise random noise (i.e. similar to a Gaussian distribution with a mean of zero). This is not the case for the temperature data from Bear Pass. It would be possible to remove the mean from the data, possibly on a seasonal basis, however, the problem with detecting a signal above the noise remains for wavelets with low-amplitude signals. Figure 30 shows the same series deconvolved with a different wavelet, this time taken from a low-amplitude period from day 23 to 25. Spurious spikes can be seen that represent periods that show a similar shape to the wavelet but have a higher magnitude. As a result of this limitation, this technique is best suited for either extreme magnitude, or dramatic events with a steep gradient in some predictor variable. For avalanche forecasting, such events could be: a prolonged period of extreme cold temperature that could indicate 90 development of depth hoar; a sharp warming event associated with the passing of a warm front leading to a significant amount of new snow; a sharp increase in temperature during a warm, clear day, leading to wet-snow releases due to solar insolation. O 10 20 30 40 50 60 70 80 90 100 110 Days from start of season (1-Nov-2000) Figure 30. Deconvolved output for the original time series deconvolved by the optimum inverse filter for a wavelet defined in a low-amplitude portion of the time series. Original time series and wavelet both shown. Extending the method to the multivariate case is not trivial. Normalisation of the variable in terms of its scale is already achieved, since the total power of the output should approximate unity (since the desired output is simply a string of zeros with one spike with unit amplitude at one point along the time series). However, the distribution of power along the output will depend on the error associated with the inverse filter, with power leaking into other time periods if the error is not zero. This will affect the amplitude of the output spikes. One possible solution may be to devise a distance metric based on the relative amplitude of the output spikes along the whole time series compared to the output spike corresponding to the original section of the time series that defined the wavelet. The amplitude ratios for each 91 variable could then be combined in a linear model or even a high-order neural network model to determine the optimum weighting scheme for the variables, using the classification rates of periods into avalanche and non-avalanche periods as a metric of success. A further problem, which relates to this ability to classify periods as either avalanche or nonavalanche stems from the fact that most of the avalanches are not precisely recorded in time. The field observer records the time of observation and the estimated age in hours of the avalanche deposit. This estimate, especially for avalanches that occur during the night, when fewer observations are made, will frequently have a temporal accuracy of greater than 1 hour (the temporal accuracy of the meteorological observations). Therefore, it will be necessary to spread the avalanche occurrences over a broader period of time, based on a probability distribution of avalanching centred over the best estimate of the occurrence time. In this way, a small error in the recording time of the avalanche will not significantly impact the classification as an avalanche or non-avalanche period. A n investigation into classification into avalanche and non-avalanche periods has not yet been undertaken for this method. 7.1.8 Conclusions The method may show considerable promise for determining time sections in the past where similar conditions occurred to those of the present. A section of time is identified as opposed to a single event, allowing sequences of events to be sought. In this way, the method contains the assumption that meteorological recordings are correlated in time and that the sequence of these recordings is responsible for producing avalanche conditions. There are, however, limitations to the applicability of this method. The method assumes a strong, identifiable signal and as such, it may only be applicable to high magnitude or 92 changing events with rapidly changing meteorological predictors. More testing is required to fully examine this limitation. Even with this limitation, there are many conditions that produce avalanches that fulfil these criteria and therefore, this method could be an important avenue for investigation for prediction of avalanches from high-resolution meteorological data. Much more work is required to validate this procedure as a working model for avalanche forecasting. Further investigation is required into the limitations of the method through assumptions about the nature of the time series. Work is also required to extend the method to a multivariate case with a definition of a metric with which to select the closest cases. In addition, a suitable hourly avalanche record needs to be determined, so that the output spikes may be matched to this record to determine whether avalanches are likely or not for the forecast period. 93 Bibliography Atwater, M . M . 1954. Snow avalanches. Scientific American, 190, No. 1, 26-31. Armstrong, R.L. and B.R. Armstrong. 1987. Snow-avalanche climates of the western United States: A comparison of maritime, intermountain and continental conditions. Avalanche Formation, Movement and Effect (Proceedings of the Davos Symposium, September 1986), IAHS Publication no. 162, 281-294. Bovis, M.J. 1976. Avalanche release and snow characteristics, San Juan Mountains, Colorado. Institute of Arctic and Alpine Research, Occasional Paper no. 19, 83-130. Bovis, M.J. 1977. Statistical forecasting of snow avalanches, San Juan Mountains, southern Colorado. Journal ofGlaciology, 18(78), 87-99. Brabeck, B. and R. Meister. 2001. A nearest-neighbor model for regional avalanche forecasting. Annals of Glaciology, 32, 130-134. Buser, O. 1983. Avalanche forecast with the method of nearest neighbours: an interactive approach. Cold Regions Science and Technology, 8(2), 155-163. Buser, O., M . Butler and W. Good. 1987. Avalanche Formation, Movement and Effect (Proceedings of the Davos Symposium, September 1986), IAHS Publication no. 162, 557569. Buser, O. 1989. Two years experience of operational avalanche forecasting using the nearest neighbours method. Annals of Glaciology, 13, 31-34. C A A . 1995. Observation Guidelines and Recording Standards for Weather, Snowpack and Avalanches. Canadian Avalanche Association, Revelstoke, BC, Canada. Chelani, A.B., D.G. Gajghate and M.Z. Hason. 2002. Prediction of ambient PMio and toxic metals using artificial neural networks. Journal of the Air and Waste Management Association, 52, 805-810. Devroye, L., L . Gyorfi and G. Lugosi. 1996. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York. Duda, R.O., P.E. Hart and D.G. Stork. 2001. Pattern Classification (second edition). WileyInterscience, New York. Enas, G.G. and S.C. Choi. 1986. Choice of the smoothing parameter and efficiency of knearest neighbour classification. Computational Mathematics with Applications, 12A, 235244. and D . M . McClung. 2003 (in press). Numerical avalanche prediction: Bear Pass, British Columbia, Canada. To be published in: Cold Regions Science and Technology. 94 Fohn, P.M.B. 1998. A n overview of avalanche forecasting models and methods. Norwegian Geotechnical Institute (NGI) publication no. 203, 25 Years of Snow Avalanche Research, Voss, Norway, 12-16 May 1998, 19-27. Gassner, M . , H.J. Etter and K. Birkeland. 2000. Proceedings of the International Snow Science Workshop in Big Sky Montana, October 2000, 52-59. Hand, D.J. and V . Vinciotti. 2003. Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recognition Letters 24, 1555-1562 Jamieson, B., T. Geldsetzer and C. Stethem. 2001. Case study of a deep slab instability and associated dry slab avalanches. Proceedings of the International Snow Science Workshop in Big Sky Montana, October 2000, 101-108. Judson, A . and B.J. Erikson. 1973. Predicting avalanche intensity from weather data: a statistical analysis. USFS research paper RM-112. LaChapelle, E.R. 1980. The fundamental processes in conventional avalanche forecasting. Journal of Glaciology, 26(94), 75-84. Manly, B.F.J. 1994. Multivariate Statistical Methods, a primer (second edition). Chapman & Hall, Boca Raton. McClung, D . M . and P. Schaerer. 1981. Snow Avalanche Size Classification Proceedings of Avalanche Workshop 1980. National Research Council, Associate Committee on Geotechnical Research; Technical Memorandum No. 133, 12-27. McClung, D . M . and P. Schaerer. 1993. The Avalanche Handbook. The Mountaineers, Seattle. McClung, D . M . and J. Tweedy. 1993. Characteristics of avalanching: Kootenay Pass, British Columbia, Canada. Journal of Glaciology, 39(132), 316-322. McClung, D . M . and J. Tweedy. 1994. Numerical avalanche prediction: Kootenay Pass, British Columbia, Canada. Journal of Glaciology, 40(135), 350-358. Obled, C. and W. Good. 1980. Recent developments of avalanche forecasting by discriminant analysis techniques: a methodological review and some applications to the Parsenn area (Davos, Switzerland). Journal of Glaciology, 25(92), 315-346. Perla, R J . 1970. On contributory factors in avalanche hazard evaluation. Canadian Geotechnical Journal, 7, 414-419. Province of British Columbia, 1982. Snow avalanche atlas, Bear Pass, Stweart-Hyder, B.C. Victoria B.C. Ministry of Transportation and highways. 202pp. 95 Purves, R. K. Morrison, G. Moss and B. Wright. 2002. Cornice - development of a nearest neighbours model applied in backcountry avalanche forecasting in Scotland. Proceedings of the International Snow Science Workshop in Penticton, BC Canada,29 September - 4 October, 2002, 117-122. Schatzoff, M . 1966. Exact distributions of Wilk's likelihood ratio criterion. Bimetrika, 53, 347-358. Whitaker, J.S. 1997. Use of Stepwise Methodology in Discriminant Analysis. Paper presented at the annual meeting of the Southwest Educational Research Association, Austin, January, 1997. Wilkinson, L. 1990. SYSTAT: The System for Statistics. Evanston, IL: SYSTAT, Inc. 96 Appendix A: Avalanche paths along Bear Pass Table 13. List of the avalanche paths that comprise each of the individual areas along Bear Pass (Province of British Columbia, 1982). The list of paths not used in the analysis is also included. Paths were not included either because the path was considered to pose no threat to the highway or in the case of the George Copper path, because the path is subjected to avalanching thorough icefall, which is not predictable from meteorological data. Group Track No Track Name Trigger Size threshold 12.9 Little Bear#l Natural only 1.5 13.7 Little Bear #2 Natural only 1.5 14.2 Little Bear #3 Natural only 1.5 14.7 Little Bear #4 Natural only 1 21.9 Canyon Slide Natural only 2.5 24.9 Gladstone Natural only 2.5 25 Disraeli Natural only 2.5 26.1 Chocolate Bar #1 Natural only 2 26.4 Chocolate Bar #2 Natural only 2 28.3 Cullen Natural only 2 28.6 Junction South #1 Natural only 1.5 28.7 Junction South #2 Natural only 1.5 29 Junction South #3 Natural only 1.5 29.5 South Valley #1 Natural only 1.5 29.61 South Valley #2 Natural only 1.5 29.81 South Valley #3 Natural only 2 30.9 South Valley #4 Natural only 2 33.5 Kettle Hole South Natural only 2.5 35.1 Strohn Natural only 2.5 35.5 East Strohn Natural only 3 37.3 Cornice Natural only 3 Little Bears North Aspect 37.7 Entrance Natural only 3 40.2 Eagle Natural only 2.5 40.4 Gunner Natural only 2.5 43.5 Surprise #1 Natural only 3 28 Short Slope #2 All 0 31.9 Summit Sluffs #1 All 0 32.4 Summit Sluffs #2 All 0 33.2 Summit Sluffs #3 All 0 Summit Sluffs 97 Table 13 continued. Group Track No Track Name Trigger Size threshold 29.1 Junction North #1 Natural only 0 29.3 Junction North #2 Natural only 0 29.6 North Valley #1 Natural only 0 29.8 North Valley #2 Natural only 0 29.9 North Valley #3 Natural only 0 South Aspect 30.1 North Valley #4 Natural only 0 30.3 North Valley #5 Natural only 0 30.5 North Valley #6 Natural only 0 30.6 North Valley #7 Natural only 0 30.7 North Valley #8 Natural only 0 31.4 Old Post #5 North Natural only 0 33.6 Kettle Hole North Natural only 2.5 34.1 Yvonne Peak Natural only 2.5 35.2 Yvonne Peak East Natural only 2.5 35.6 Old Post #6 North Natural only 2 36.2 Addition Natural only 2 36.4 Old Post #7 North Natural only 2 37.6 Mounds #1 Natural only 2 37.8 Mounds #2 Natural only 2 38.1 North Creek Fork Natural only 1.5 38.4 Little Entrance #1 Natural only 0 38.5 Blindman Natural only 1.5 38.6 Lindsay's Natural only 1.5 38.8 Little Entrance #2 Natural only 0 39.3 Second Revision #1 Natural only 1.5 39.4 Second Revision #2 Natural only 1.5 39.5 Little Entrance #3 Natural only 0 39.8 Little Entrance #4 Natural only 0 39.9 Highway Fill #1 Natural only 1.5 40.1 Highway Fill #2 Natural only 1.5 40.7 Three Gully North Natural only 1.5 40.9 Misty Natural only 1.5 41 Camel Natural only 1 41.2 Cracker Natural only 1 41.4 Sunny Boy #1 Natural only 0 41.5 Sunny Boy #2 Natural only 0 41.7 Endgoal Natural only 0 Table 13 continued Group Track No Track Name Trigger Size threshold 4.2 Dahlie N/A N/A 16 Bunting N/A N/A 20 American N/A N/A 20.2 Canyon Washout #1 N/A N/A 20.8 Champion N/A N/A 21.7 Canyon Washout #2 N/A N/A 21.8 Cutline N/A N/A Not Used 25.2 Rufus Creek N/A N/A 27.2 George Copper N/A N/A 27.7 Short Slope #1 N/A N/A 42.7 Lower Windy West N/A N/A 43 Lower Windy East N/A N/A o o ••-rNCTir ra xi a ra CNCH * % CO <L >> O ro== D) Ui = as <B s r » > • > s > > 3 c c SZ SZ c 3 OO 3 O tf) 5 0 0 2 Z >, >l ^ > % >lZ — — — — — — — — LD ffi (I ^ tr |3>>>>>>>>S r § § 0 0 0 0 0 o o o 32 ^ n Z Z Z Z Z Z Z Z O ^ •5 r-pi ^ ro tc co m i q ID N •sr ciScyicyJcxioSooooo*— CNMOitNCMnnronnon SS 0) es fa Q . £3 y B 3 "3 55 £ o — £1 3 s O D d £ O C|H 33 O O J 2 -S g> j 1 £ .23 .3 H-» ra OH o E ft ft ca ra (Li ft gr o — *A OT . -a —c o « I 1& tf) •3» s —u Appendix B: Software and The programming files containing the meteorological records and avalanche occurrences were supplied by the Ministry of Transportation as Microsoft Excel files. As a result, Excel was used to organise and edit the database. A great many macros were developed to filter and edit the data using the built-in programming language of Excel known as Visual Basic for Applications (VBA). This had the advantage that it interfaces with the programming language Visual Basic (VB) The which was used to develop the nearest neighbours model. univariate analysis and canonical discriminant analysis was performed using the statistical software package Systat (version 10). The nearest neighbours model was programmed using stand-alone Visual Basic. The model features the ability to make forecasts for either a single period at a time or to perform a batch forecast for the whole data set. A function to exclude a random sample for use as an independent testing set was written and included in this program. A p p e n d i x C : Metadata for the digital elevation m o d e l of B e a r P a s s The digital elevation model (DEM) shown in Figure 2 was produced by digitising contour data from a 1:50000 NTS map. Using ArcGIS software (version 8.2), a triangulated integrated network (TIN) was produced based on these digitised contours and then a 10 m raster grid interpolated from this. The location of the avalanche paths were digitised by staff from the Ministry of Transportation, coloured according to the avalanche risk and then superimposed onto the DEM. 101
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Statistical avalanche forecating using meteorological...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Statistical avalanche forecating using meteorological data from Bear Pass, British Columbia, Canada Floyer, James Antony 2003
pdf
Page Metadata
Item Metadata
Title | Statistical avalanche forecating using meteorological data from Bear Pass, British Columbia, Canada |
Creator |
Floyer, James Antony |
Date Issued | 2003 |
Description | A framework is presented for a numerical prediction scheme with a prediction accuracy of greater than 70% at Bear Pass, British Columbia, Canada. One way analysis of variance and canonical discriminant analysis are used to identify the principal variables that allow discrimination between avalanche and non-avalanche time periods. The optimum variable set is then used in a Mahalanobis-metric nearest neighbours model to determine the observations in the historical database that most closely represent conditions of the forecast period. The information about these neighbours can be used by the forecaster to aid in making an avalanche forecast. In this study, a univariate analysis of the meteorological and snowpack data from The Pass is used to assess the contribution of each of the variables for use in avalanche prediction. Using these data, the snow-climate of the area is assessed and it is shown that Bear Pass may be classified as a maritime snow climate, although some elements of a transitional snow-climate are evident. A canonical discriminant analysis is performed for both a two-group discrimination into avalanche and non-avalanche periods and for a three-group discrimination into wet, dry and non-avalanche periods. For the three-group discrimination, a two-stage discrimination is preferred, involving first discriminating into wet-avalanche or dry-avalanche periods and then discriminating between wet or dry-avalanche and non-avalanche periods. The analysis is performed for all areas in a combined analysis and also for individual sub-areas defined within The Pass. Improvements on classification rates to three out of the four sub-areas are found compared to the analysis for the whole Pass. Various parameters that affect the nearest neighbour model performance are analyzed. The effect of group size distribution is assessed, as well as the number and threshold number of neighbours that form the basis for the forecast decision. The effect of dimensionality on the model is also analyzed. The Mahalanobis-metric nearest neighbours model is compared to Cornice, a nearest neighbours model developed for avalanche forecasting in Scotland. Classification rates for the two models are found to be very similar. Finally, preliminary findings from a new method using time patterns of predictor variables is presented. The method makes use of time series data from remote weather stations. Sections of the time series are similar to present conditions are identified through a deconvolution process. It is hoped that this method may offer improvements to the conventional nearest neighbours method by implicitly assuming that meteorological conditions are correlated in time. |
Extent | 6636954 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-10-28 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0091085 |
URI | http://hdl.handle.net/2429/14259 |
Degree |
Master of Science - MSc |
Program |
Geography |
Affiliation |
Arts, Faculty of Geography, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2003-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2003-0372.pdf [ 6.33MB ]
- Metadata
- JSON: 831-1.0091085.json
- JSON-LD: 831-1.0091085-ld.json
- RDF/XML (Pretty): 831-1.0091085-rdf.xml
- RDF/JSON: 831-1.0091085-rdf.json
- Turtle: 831-1.0091085-turtle.txt
- N-Triples: 831-1.0091085-rdf-ntriples.txt
- Original Record: 831-1.0091085-source.json
- Full Text
- 831-1.0091085-fulltext.txt
- Citation
- 831-1.0091085.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0091085/manifest